Scientific Reports
The construction and refined extraction techniques of knowledge graph based on large language models
  • Article
  • Open access
  • Published: 10 February 2026


  • Li Peng1,2,
  • Pei Yang1,
  • Ye Juexiang3 &
  • Li Yuangan4 

Scientific Reports (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Computer science
  • Software

Abstract

With the growing need for intelligent decision-support systems, the development of high-quality knowledge graphs has become essential for improving operational efficiency and decision reliability. However, the specialized nature, distributed sources, and sensitive aspects of this knowledge present unique challenges to conventional knowledge management approaches. Current general-purpose large language models often struggle with domain-specific text comprehension, particularly in accurately interpreting technical parameters and operational guidelines. To address these limitations, this paper introduces a framework for building and refining specialized knowledge graphs using adapted large language models. Our approach involves fine-tuning base LLMs with domain-specific datasets, enabling them to better handle complex terminology and semantic nuances. The framework incorporates a multimodal knowledge integration pipeline that combines rule-based systems with ontological structures to extract and link entities from diverse data sources, creating an adaptive knowledge network. Experimental results demonstrate that our fine-tuned model achieves substantial gains in relationship extraction accuracy, while the resulting knowledge graph shows strong performance in semantic coherence and operational reasoning assessments, offering robust support for critical decision-making processes. This research presents a novel approach for effective knowledge integration and cross-functional collaboration in specialized domains.
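The abstract's extract-and-link pipeline (an LLM pulls entity-relation triples from text, which are then merged into a knowledge network) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the fine-tuned domain LLM is replaced by a hypothetical single-pattern extractor, and the `Triple`, `extract_triples`, and `part_of` names are invented for this sketch.

```python
# Sketch of an extract-then-link knowledge-graph pipeline.
# The LLM extraction step is mocked with a one-pattern rule; in the paper,
# a fine-tuned domain LLM would emit the (head, relation, tail) triples.

from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    head: str
    relation: str
    tail: str


def extract_triples(sentence: str) -> list[Triple]:
    # Hypothetical stand-in for the fine-tuned LLM extractor:
    # recognizes only "X is part of Y", for illustration.
    marker = " is part of "
    if marker in sentence:
        head, tail = sentence.rstrip(".").split(marker, 1)
        return [Triple(head.strip(), "part_of", tail.strip())]
    return []


def build_graph(sentences: list[str]) -> dict[str, set[tuple[str, str]]]:
    # Link extracted entities into an adjacency map:
    # entity -> {(relation, neighbour), ...}. Duplicate triples from
    # different sources merge automatically because edges live in a set.
    graph: dict[str, set[tuple[str, str]]] = {}
    for s in sentences:
        for t in extract_triples(s):
            graph.setdefault(t.head, set()).add((t.relation, t.tail))
            graph.setdefault(t.tail, set())  # register the tail entity too
    return graph


if __name__ == "__main__":
    kg = build_graph([
        "The turbine blade is part of the rotor assembly.",
        "The rotor assembly is part of the engine.",
    ])
    for entity, edges in kg.items():
        print(entity, "->", sorted(edges))
```

Swapping the rule-based extractor for an LLM call changes only `extract_triples`; the linking step stays the same, which is why the paper can combine rule-based and model-based sources in one pipeline.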

Data availability

The datasets generated during the current study are available from the corresponding author on reasonable request. Restrictions apply to the availability of these data, which were used under license for the current study and so are not publicly available.


Acknowledgements

The results and knowledge included herein have been obtained owing to support from the Harbin Institute of Technology, project no. 2024M071077003.


Author information

Authors and Affiliations

  1. Northwestern Polytechnical University, Xi’an, China

    Li Peng & Pei Yang

  2. Chinese Aeronautical Establishment, Beijing, China

    Li Peng

  3. Harbin Institute of Technology, Harbin, China

    Ye Juexiang

  4. Beihang University, Beijing, China

    Li Yuangan


Contributions

Li Peng and Pei Yang designed the algorithms and experiments and wrote the main manuscript text. Ye Juexiang developed the code and implemented the experiments. Li Yuangan reviewed and proofread the manuscript.

Corresponding author

Correspondence to Li Peng.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article


Cite this article

Peng, L., Yang, P., Juexiang, Y. et al. The construction and refined extraction techniques of knowledge graph based on large language models. Sci Rep (2026). https://doi.org/10.1038/s41598-026-38066-w


  • Received: 29 May 2025

  • Accepted: 28 January 2026

  • Published: 10 February 2026

  • DOI: https://doi.org/10.1038/s41598-026-38066-w


Keywords

  • Knowledge graph
  • Adapted large language model
  • Multimodal knowledge integration
  • Operational decision support
  • Dynamic knowledge network

Scientific Reports (Sci Rep)

ISSN 2045-2322 (online)


© 2026 Springer Nature Limited
