Abstract
With the growing need for intelligent decision-support systems, the development of high-quality knowledge graphs has become essential for improving operational efficiency and decision reliability. However, the specialized nature, distributed sources, and sensitive aspects of this knowledge present unique challenges to conventional knowledge management approaches. Current general-purpose large language models often struggle with domain-specific text comprehension, particularly in accurately interpreting technical parameters and operational guidelines. To address these limitations, this paper introduces a framework for building and refining specialized knowledge graphs using adapted large language models. Our approach involves fine-tuning base LLMs with domain-specific datasets, enabling them to better handle complex terminology and semantic nuances. The framework incorporates a multimodal knowledge integration pipeline that combines rule-based systems with ontological structures to extract and link entities from diverse data sources, creating an adaptive knowledge network. Experimental results demonstrate that our fine-tuned model achieves substantial gains in relationship extraction accuracy, while the resulting knowledge graph shows strong performance in semantic coherence and operational reasoning assessments, offering robust support for critical decision-making processes. This research presents a novel approach for effective knowledge integration and cross-functional collaboration in specialized domains.
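The abstract describes a pipeline that combines rule-based systems with ontological typing to extract and link entities into a knowledge graph. As a minimal, hypothetical sketch of that idea (the patterns, entity types, and sample sentence below are invented for illustration and are not the authors' implementation), rule-based entity extraction followed by naive triple linking might look like:

```python
import re

# Hypothetical illustration only: toy ontology of entity types, each backed
# by a regex rule. A real system would derive these from domain ontologies
# and a fine-tuned LLM rather than hand-written patterns.
ENTITY_PATTERNS = {
    "Component": r"\b(turnout|axle|brake pad)\b",
    "Parameter": r"\b(\d+(?:\.\d+)?\s*(?:mm|kN))\b",
    "Action": r"\b(inspect|replace|lubricate)\b",
}

def extract_entities(text):
    """Return (entity_type, surface_form) pairs found by the rule patterns."""
    found = []
    for etype, pattern in ENTITY_PATTERNS.items():
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            found.append((etype, match.group(1)))
    return found

def link_triples(entities):
    """Naive linking: relate every Action to every co-occurring Component."""
    actions = [e for t, e in entities if t == "Action"]
    components = [e for t, e in entities if t == "Component"]
    return [(a, "applies_to", c) for a in actions for c in components]

sentence = "Inspect the turnout and replace the brake pad when wear exceeds 2.5 mm."
entities = extract_entities(sentence)
triples = link_triples(entities)
```

Here `extract_entities` plays the role of the rule-based layer and `link_triples` stands in for the relation-extraction step; in the paper's framework, both would be refined by the domain-adapted LLM before the triples enter the graph.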
Data availability
The datasets generated during the current study are available from the corresponding author on reasonable request. Restrictions apply to the availability of these data, which were used under license for the current study, and so they are not publicly available.
Acknowledgements
The results and knowledge included herein have been obtained owing to support from the Harbin Institute of Technology, project no. 2024M071077003.
Funding
The results and knowledge included herein have been obtained owing to support from the Harbin Institute of Technology, project no. 2024M071077003.
Contributions
Li Peng and Pei Yang designed the algorithms and experiments and wrote the main manuscript text. Ye Juexiang developed the code and implemented the experiments. Li Yuangan reviewed and proofread the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
Cite this article
Peng, L., Yang, P., Juexiang, Y. et al. The construction and refined extraction techniques of knowledge graph based on large language models. Sci Rep (2026). https://doi.org/10.1038/s41598-026-38066-w