Scientific Reports
Semantic clause retrieval for trademark law using transformer encoders and lexical baselines: a cross-domain agri-robotics compliance case study
  • Article
  • Open access
  • Published: 05 March 2026

  • Muhammad Asfand E Yar1,
  • Qadeer Hashir1,
  • M. Hassan Tanveer2,
  • Razvan C. Voicu2 &
  • Adeel Khalid3 

Scientific Reports (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Engineering
  • Mathematics and computing

Abstract

Clause-level retrieval is a recurring bottleneck in legal research and compliance workflows: relevant obligations, exceptions, procedures, and enforcement conditions are often buried in long statutes and regulatory texts, and users may not know the exact terminology needed for keyword search. We present an application-oriented semantic clause retrieval pipeline that indexes documents at the clause level and ranks candidates using off-the-shelf sentence-transformer encoders with cosine similarity. Standard lexical baselines are included to contextualize performance under the same top-k retrieval and expert relevance judgment protocol. We evaluate the approach in a cross-domain setting spanning trademark statute retrieval on Trademark Ordinance data and a scoped agri-robotics compliance corpus covering regulatory and standards-oriented requirements. The trademark benchmark serves as the primary quantitative evaluation, while the agri-robotics component is used to assess cross-domain transfer under a bounded query set without overstating generalization. In addition to aggregate ranking metrics, we report query-level analysis to characterize model behavior and common failure modes, including high-similarity but decision-irrelevant matches that arise from procedural or definitional overlap.
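
To illustrate the clause-level indexing and top-k cosine ranking described in the abstract, the following minimal sketch implements a TF-IDF lexical baseline in plain Python. The toy clauses, the query, the smoothed IDF weighting, and all function names are assumptions made for illustration only; this is not the authors' implementation. In a dense variant, the sparse vectors would presumably be replaced by sentence-transformer embeddings while the cosine ranking step stays the same.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split on non-alphanumeric characters."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return cleaned.split()


def vectorize(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms unseen at index time are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}


def build_tfidf_index(clauses):
    """Return (idf, vectors) for a list of clause strings."""
    tokenized = [tokenize(c) for c in clauses]
    n = len(tokenized)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumption) keeps weights positive for common terms.
    idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [vectorize(toks, idf) for toks in tokenized]
    return idf, vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k(query, clauses, idf, vectors, k=3):
    """Rank all clauses against the query and return the k best (clause, score)."""
    q = vectorize(tokenize(query), idf)
    scored = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    # Sort by descending score; ties are broken by clause position.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [(clauses[i], score) for score, i in scored[:k]]


# Hypothetical toy clauses, not quoted from the Trade Mark Ordinance.
CLAUSES = [
    "An application for registration of a trade mark shall be made to the Registrar.",
    "A person infringes a registered trade mark by using an identical mark in trade.",
    "The operator shall ensure the robot stops when a person enters the safety zone.",
]

IDF, VECTORS = build_tfidf_index(CLAUSES)
RESULTS = top_k("infringement by use of an identical mark", CLAUSES, IDF, VECTORS, k=2)
for clause, score in RESULTS:
    print(f"{score:.3f}  {clause}")
```

Note that the query term "infringement" does not match the indexed token "infringes" in this lexical sketch; the clause is retrieved only through other shared terms. This is the kind of vocabulary mismatch that motivates comparing lexical baselines against embedding-based ranking.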

Data availability

The legal texts used in this study are publicly available as direct PDF links. Trade Mark Ordinance (Pakistan): https://ipo.gov.pk/system/files/Trade_Mark_Ordinance_2001_0.pdf. Pakistan Code, relevant document (direct PDF): https://pakistancode.gov.pk/pdffiles/administratora4ef8d40e3d97faef49343d2242e0c3a.pdf. Both links have been tested and are accessible without login or special permissions.

Acknowledgements

The authors would like to express their sincere gratitude to the Higher Education Commission (HEC) for the financial support provided under the National Research Program for Universities (NRPU), project reference No. 15401. This support played a pivotal role in the successful execution of our research project. We also extend our appreciation to the reviewers and editors for their valuable feedback and contributions, which significantly enhanced the quality of this paper. Additionally, we would like to acknowledge the collaborative efforts of the research team at Bahria University, Islamabad, which contributed to the successful completion of this research endeavor.

Funding

The research presented in this paper was supported by the Higher Education Commission (HEC) under the National Research Program for Universities (NRPU) with project reference No. 15401. Dr. Muhammad Asfand-E-Yar is the PI of the project.

Author information

Authors and Affiliations

  1. Center of Excellence in Artificial Intelligence (CoE-AI), Department of Computer Science, Bahria University, Islamabad, Pakistan

    Muhammad Asfand E Yar & Qadeer Hashir

  2. Department of Robotics and Mechatronics Engineering, Kennesaw State University, Kennesaw, USA

    M. Hassan Tanveer & Razvan C. Voicu

  3. Department of Industrial and Systems Engineering, Kennesaw State University, Kennesaw, USA

    Adeel Khalid

Contributions

M.A.Y. and Q.H. developed the methodology, performed the experiments, and analyzed the results. M.H.T. and R.C.V. contributed to the design of the semantic search framework and interpretation of results. A.K. provided project supervision, research guidance, and critical revisions to the manuscript. M.A.Y. and Q.H. wrote the main manuscript text. All authors reviewed and approved the final manuscript.

Corresponding authors

Correspondence to Muhammad Asfand E Yar or M. Hassan Tanveer.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Asfand E Yar, M., Hashir, Q., Tanveer, M.H. et al. Semantic clause retrieval for trademark law using transformer encoders and lexical baselines: a cross-domain agri-robotics compliance case study. Sci Rep (2026). https://doi.org/10.1038/s41598-026-43098-3

  • Received: 19 September 2025

  • Accepted: 02 March 2026

  • Published: 05 March 2026

  • DOI: https://doi.org/10.1038/s41598-026-43098-3

Keywords

  • Semantic search
  • Information retrieval
  • BM25
  • TF–IDF
  • Sentence embeddings
  • Legal NLP
  • Trademark law
  • Agri-robotics compliance
Scientific Reports (Sci Rep)

ISSN 2045-2322 (online)

© 2026 Springer Nature Limited
