Abstract
Clause-level retrieval is a recurring bottleneck in legal research and compliance workflows. Relevant obligations, exceptions, procedures, and enforcement conditions are often buried in long statutes and regulatory texts, and users may not know the exact terminology needed for keyword search. We present an application-oriented semantic clause retrieval pipeline that indexes documents at the clause level and ranks candidates using off-the-shelf sentence-transformer encoders with cosine similarity. Standard lexical baselines are evaluated under the same top-k retrieval and expert relevance-judgment protocol to contextualize performance. We evaluate the approach in a cross-domain setting spanning trademark statute retrieval on Trade Mark Ordinance data and a scoped agri-robotics compliance corpus covering regulatory and standards-oriented requirements. The trademark benchmark serves as the primary quantitative evaluation, while the agri-robotics component is used to assess cross-domain transfer on a bounded query set, and its results are not taken as evidence of broad generalization. In addition to aggregate ranking metrics, we report query-level analysis to characterize model behavior and common failure modes, including high-similarity but decision-irrelevant matches that arise from procedural or definitional overlap.
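The ranking step described above can be illustrated with a minimal, dependency-free sketch. It assumes clauses have already been segmented and encoded: in practice the clause and query vectors would come from a sentence-transformer encoder (for example, `SentenceTransformer(model_name).encode(texts)` in the sentence-transformers library), but the specific encoders and clause-segmentation rules are not given in this section, so the clause texts and three-dimensional vectors below are hypothetical toy inputs, not data from the Ordinance. The lexical baseline is shown as Okapi BM25 with the commonly used defaults `k1 = 1.5` and `b = 0.75`; these settings are assumptions, not the authors' reported configuration. Precision@k stands in for the top-k relevance-judgment metrics, with the set of relevant clause indices playing the role of expert judgments.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def dense_top_k(query_vec, clause_vecs, k):
    """Return indices of the k clauses whose embeddings are most cosine-similar to the query."""
    scored = [(cosine(query_vec, vec), i) for i, vec in enumerate(clause_vecs)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # highest score first, ties by index
    return [i for _, i in scored[:k]]


def tokenize(text):
    """Lowercase and split on any non-alphanumeric character."""
    return "".join(ch.lower() if ch.isalnum() else " " for ch in text).split()


def bm25_top_k(query, clauses, k, k1=1.5, b=0.75):
    """Return indices of the top-k clauses under Okapi BM25 (assumed lexical baseline)."""
    docs = [tokenize(c) for c in clauses]
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    doc_freq = Counter(term for d in docs for term in set(d))
    query_terms = tokenize(query)
    scored = []
    for i, doc in enumerate(docs):
        term_freq = Counter(doc)
        score = 0.0
        for term in query_terms:
            tf = term_freq[term]
            if tf == 0:
                continue
            # BM25 IDF with +1 inside the log so it stays non-negative.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Saturating term frequency with document-length normalization.
            norm = 1 - b + b * len(doc) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * norm)
        scored.append((score, i))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k retrieved clause indices that were judged relevant."""
    return sum(1 for i in ranked[:k] if i in relevant) / k


# Toy usage: hypothetical clause texts and hand-made 3-d "embeddings" (not real encoder output).
clauses = [
    "The Registrar may refuse registration of a trade mark that is devoid of distinctive character.",
    "An application for registration shall be filed in the prescribed form.",
    "A registered trade mark is infringed by use of an identical sign in the course of trade.",
]
clause_vecs = [[0.9, 0.1, 0.2], [0.2, 0.9, 0.1], [0.3, 0.1, 0.9]]
query = "when can registration of a trade mark be refused"
query_vec = [0.8, 0.2, 0.1]

print(dense_top_k(query_vec, clause_vecs, k=2))  # [0, 1]
print(bm25_top_k(query, clauses, k=3))  # [0, 2, 1]
print(precision_at_k(bm25_top_k(query, clauses, k=3), relevant={0}, k=2))  # 0.5
```

Breaking ties by index keeps rankings deterministic. In this toy case BM25 ranks the infringement clause second purely on shared surface terms such as "trade" and "mark", even though it says nothing about refusal, a small illustration of how term overlap alone can pull in decision-irrelevant clauses.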
Data availability
The source documents used in this study are publicly available as direct PDF links: Trade Mark Ordinance (Pakistan): https://ipo.gov.pk/system/files/Trade_Mark_Ordinance_2001_0.pdf. Pakistan Code (relevant document, PDF): https://pakistancode.gov.pk/pdffiles/administratora4ef8d40e3d97faef49343d2242e0c3a.pdf. Both links were verified to be accessible without login or special permissions.
References
Hadfield, G. K. Legal barriers to innovation: The growing economic cost of professional control over corporate legal markets. Stan. L. Rev. 60, 1689 (2007).
Burke-White, W. International legal pluralism. Mich. J. Int’l L. 25, 963 (2003).
Raz, J. Legal principles and the limits of law. Yale L.J. 81, 823 (1971).
Gillan, S. & Starks, L. T. Corporate governance, corporate ownership, and the role of institutional investors: A global perspective. Weinberg Cent. for Corp. Gov. Work. Pap. (2003).
Kim, Y. K., Lee, K., Park, W. G. & Choo, K. Appropriate intellectual property protection and economic growth in countries at different levels of development. Res. Policy 41, 358–375 (2012).
Moens, M.-F., Uyttendaele, C. & Dumortier, J. Information extraction from legal texts: The potential of discourse analysis. Int. J. Hum. Comput. Stud. 51, 1155–1171 (1999).
Arewa, O. B. Open access in a closed universe: Lexis, Westlaw, law schools, and the legal information market. Lewis & Clark L. Rev. 10, 797 (2006).
Livermore, M. A. et al. Law search in the age of the algorithm. Mich. St. L. Rev. 1183 (2020).
Kamilaris, A., Kartakoullis, A. & Prenafeta-Boldú, F. X. A review on the practice of big data analysis in agriculture. Comput. Electron. Agric. 143, 23–37 (2017).
Bechar, A. & Vigneault, C. Agricultural robots for field operations: Current status and future trends. Biosyst. Eng. 149, 94–111 (2016).
van Wynsberghe, A. & Donhauser, J. Ethics of using autonomous robots in agriculture. J. Agric. Environ. Ethics 30, 267–285 (2017).
Wolf, T. et al. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 conference on empirical methods in natural language processing: system demonstrations, 38–45 (2020).
McGinnis, J. O. & Pearce, R. G. The great disruption: How machine intelligence will transform the role of lawyers in the delivery of legal services. Actual Probl. Econ. Law https://doi.org/10.21202/1993-047x.13.2019.2.1230-1250 (2019).
Fowler, J. H., Johnson, T. R., Spriggs, J. F., Jeon, S. & Wahlbeck, P. J. Network analysis and the law: Measuring the legal importance of precedents at the US Supreme Court. Polit. Anal. 15, 324–346 (2007).
Winkels, R., Boer, A., Vredebregt, B. & Van Someren, A. Towards a legal recommender system. In JURIX, 169–178 (2014).
Neale, T. Citation analysis of Canadian case law. J. Open Access L. 1, 1 (2013).
Sugathadasa, K. et al. Legal document retrieval using document vector embeddings and deep learning. In Intelligent Computing: Proceedings of the 2018 Computing Conference, Volume 2, 160–175 (Springer, 2019).
Food and Agriculture Organization of the United Nations. Digital agriculture (2022). Available at https://www.fao.org/digital-agriculture/en/.
IEEE. Ethically aligned design: A vision for prioritizing human well-being with autonomous and intelligent systems, first edition (2019). Available at https://ethicsinaction.ieee.org/.
Alschner, W. & Skougarevskiy, D. Consistency and legal innovation in the BIT universe. Stanf. Public Law Work. Pap. (2015).
Siino, M., Falco, M., Croce, D. & Rosso, P. Exploring LLMs applications in law: A literature review on current legal NLP approaches. IEEE Access https://doi.org/10.1109/ACCESS.2025.3533217 (2025).
Siino, M. Exploring the use of LLMs in the Italian legal domain: A survey on recent applications. Comput. Law Secur. Rev. 58, 106164 (2025).
Chalkidis, I. & Kampas, D. Deep learning in law: Early adaptation and legal word embeddings trained on large corpora. Artif. Intell. Law 27, 171–198 (2019).
Dhanani, J., Mehta, R. & Rana, D. Effective and scalable legal judgment recommendation using pre-learned word embedding. Complex Intell. Syst. 8, 3199–3213 (2022).
Naderi, N. & Hirst, G. Argumentation mining in parliamentary discourse. In Principles and Practice of Multi-Agent Systems: International Workshops: IWEC 2014, Gold Coast, QLD, Australia, December 1-5, 2014, and CMNA XV and IWEC 2015, Bertinoro, Italy, October 26, 2015, Revised Selected Papers 15, 16–25 (Springer, 2016).
Landthaler, J., Waltl, B., Holl, P. & Matthes, F. Extending full text search for legal document collections using word embeddings. In JURIX, 73–82 (2016).
Vaswani, A. et al. Attention is all you need. Advances in neural information processing systems 30 (2017).
Bambroo, P. & Awasthi, A. LegalDB: Long DistilBERT for legal document classification. In 2021 International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT), 1–4 (IEEE, 2021).
Khan, W. et al. Part of speech tagging in Urdu: Comparison of machine and deep learning approaches. IEEE Access 7, 38918–38936 https://doi.org/10.1109/ACCESS.2019.2897327 (2019).
Shaukat, S., Shaukat, A., Shahzad, K. & Daud, A. Using TREC for developing semantic information retrieval benchmark for Urdu. Inf. Process. Manag. 59, 102939 (2022).
Reimers, N. & Gurevych, I. Making monolingual sentence embeddings multilingual using knowledge distillation. arXiv preprint arXiv:2004.09813 (2020).
de Oliveira, R. S. & Nascimento, E. G. S. Brazilian court documents clustered by similarity together using natural language processing approaches with transformers. arXiv preprint arXiv:2204.07182 (2022).
Hoppe, C., Pelkmann, D., Migenda, N., Hötte, D. & Schenck, W. Towards intelligent legal advisors for document retrieval and question-answering in German legal documents. In 2021 IEEE Fourth International Conference on Artificial Intelligence and Knowledge Engineering (AIKE), 29–32 (IEEE, 2021).
Daud, A. Using time topic modeling for semantics-based dynamic research interest finding. Knowl. Based Syst. 26, 154–163 (2012).
Nejadgholi, I., Bougueng, R. & Witherspoon, S. A semi-supervised training method for semantic search of legal facts in Canadian immigration cases. In JURIX, 125–134 (2017).
Wehnert, S. et al. Legal norm retrieval with variations of the BERT model combined with TF-IDF vectorization. In Proceedings of the eighteenth international conference on artificial intelligence and law, 285–294 (2021).
Daud, A., Li, J., Zhou, L. & Muhammad, F. Knowledge discovery through directed probabilistic topic models: A survey. Front. Comput. Sci. China 4, 280–301 (2010).
European Union. Access to European Union law – digital and robotic agriculture (2020). Available at https://eur-lex.europa.eu/.
United States Department of Agriculture. Science and research – robotics in agriculture (2023). Available at https://www.usda.gov/topics/science/research.
International Organization for Standardization. Robotics – safety requirements for service robots (ISO 22143:2019) (2019). Available at https://www.iso.org/standard/72341.html.
Conneau, A., Kiela, D., Schwenk, H., Barrault, L. & Bordes, A. Supervised learning of universal sentence representations from natural language inference data. arXiv preprint arXiv:1705.02364 (2017).
Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
Chalkidis, I., Fergadiotis, M., Malakasiotis, P., Aletras, N. & Androutsopoulos, I. LEGAL-BERT: The muppets straight out of law school. arXiv preprint arXiv:2010.02559 (2020).
Geng, S., Lebret, R. & Aberer, K. Legal transformer models may not always help. arXiv preprint arXiv:2109.06862 (2021).
Danopoulos, D., Kachris, C. & Soudris, D. Approximate similarity search with FAISS framework using FPGAs on the cloud. In International Conference on Embedded Computer Systems, 373–386 (Springer, 2019).
Salton, G. & Buckley, C. Term-weighting approaches in automatic text retrieval. Inf. Process. Manag. 24, 513–523 (1988).
Robertson, S. & Zaragoza, H. The probabilistic relevance framework: BM25 and beyond. Found. Trends Inf. Retr. 3, 333–389 (2009).
Turpin, A. & Scholer, F. User performance versus precision measures for simple search tasks. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, 11–18 (2006).
Zhu, J., Wu, J., Luo, X. & Liu, J. Semantic matching based legal information retrieval system for COVID-19 pandemic. Artif. Intell. Law 1–30 (2023).
Raza, S. A COVID-19 search engine (CO-SE) with transformer-based architecture. Healthc. Anal. 2, 100068 (2022).
Acknowledgements
The authors would like to express their sincere gratitude to the Higher Education Commission (HEC) for the financial support provided under the National Research Program for Universities (NRPU), project reference No. 15401. This support played a pivotal role in the successful execution of our research project. We also extend our appreciation to the reviewers and editors for their valuable feedback and contributions, which significantly enhanced the quality of this paper. Additionally, we would like to acknowledge the collaborative efforts of the research team at Bahria University, Islamabad, which contributed to the successful completion of this research endeavor.
Funding
The research presented in this paper was supported by the Higher Education Commission (HEC) under the National Research Program for Universities (NRPU) with project reference No. 15401. Dr. Muhammad Asfand-E-Yar is the principal investigator of the project.
Author information
Contributions
M.A.Y. and Q.H. developed the methodology, performed the experiments, and analyzed the results. M.H.T. and R.C.V. contributed to the design of the semantic search framework and interpretation of results. A.K. provided project supervision, research guidance, and critical revisions to the manuscript. M.A.Y. and Q.H. wrote the main manuscript text. All authors reviewed and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Asfand E Yar, M., Hashir, Q., Tanveer, M.H. et al. Semantic clause retrieval for trademark law using transformer encoders and lexical baselines: a cross-domain agri-robotics compliance case study. Sci Rep (2026). https://doi.org/10.1038/s41598-026-43098-3
DOI: https://doi.org/10.1038/s41598-026-43098-3