Performance evaluation of GPT-4o on South Korean national exams for building mechanical equipment maintenance

Choi, Haneul; Lee, Jehyun; Kim, Jonghun

doi:10.1038/s41598-025-16118-x


Article
Open access
Published: 19 August 2025


Haneul Choi¹,
Jehyun Lee² &
Jonghun Kim¹

Scientific Reports volume 15, Article number: 30436 (2025)


Abstract

This study evaluates the applicability of large language models (LLMs) to the maintenance of mechanical equipment in buildings by assessing GPT-4o's performance on two South Korean national certification exams: Engineer Energy Management (EEM) and Engineer Air-Conditioning Refrigerating Machinery (EACRM). GPT-4o achieved average scores of 80.6 and 81.25 on the EEM and EACRM exams, respectively, and passed in all five attempts. The model performed well on both non-calculation and calculation problems and answered consistently, with an average response consistency of 97%. Despite these strengths, three key limitations were identified: weak advanced reasoning, difficulty with legal questions, and poor interpretation of scientific figures. Experimental results indicate that advanced reasoning can be improved with reasoning-optimized models and that accuracy on legal questions can be significantly enhanced with retrieval-augmented generation (RAG), whereas figure interpretation remains dependent on advances in visual recognition capabilities. These findings suggest that GPT-4o possesses foundational knowledge applicable to building mechanical equipment maintenance, but certain limitations must be addressed before practical implementation. This study provides a foundation for future research on integrating LLMs into industrial applications, such as maintenance management software, to improve maintenance efficiency and alleviate workforce shortages.
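The abstract reports an average response consistency of 97% across repeated attempts, but the metric's exact definition belongs to the Methods and is not given in this section. The sketch below assumes one plausible definition, purely for illustration: for each question, the share of attempts that agree with the most frequent answer, averaged over all questions. The function names `question_consistency` and `average_consistency` are hypothetical.

```python
from collections import Counter
from typing import Sequence


def question_consistency(answers: Sequence[str]) -> float:
    """Share of attempts agreeing with the most frequent answer to one question.

    Assumed definition for illustration; the paper's Methods may differ.
    """
    if not answers:
        raise ValueError("need at least one attempt per question")
    modal_count = Counter(answers).most_common(1)[0][1]
    return modal_count / len(answers)


def average_consistency(answers_by_question: Sequence[Sequence[str]]) -> float:
    """Mean of the per-question consistency values over an exam."""
    if not answers_by_question:
        raise ValueError("need at least one question")
    scores = [question_consistency(a) for a in answers_by_question]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Two synthetic questions answered in five attempts each (not data from the study).
    demo = [["A", "A", "A", "A", "A"], ["B", "B", "C", "B", "B"]]
    print(f"Average consistency: {average_consistency(demo):.0%}")
```

Under this definition, a question answered identically in every attempt scores 1.0, and one dissenting answer out of five lowers it to 0.8.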

Introduction

Over the past two decades, research has documented a substantial increase in global energy consumption, driven primarily by rapid urbanization, population growth, economic expansion, and the rising demands and expectations of building occupants and owners¹. According to the latest report by the International Energy Agency (IEA), global final energy consumption reached 445 exajoules (EJ) in 2023, of which the building sector accounted for 125 EJ, approximately 28% of the total². Among the energy-consuming services in buildings, Heating, Ventilation, and Air Conditioning (HVAC) systems are the most energy-intensive, accounting for approximately 38% of total building energy consumption globally. The second-largest contributor is Domestic Hot Water (DHW), at approximately 13%¹. Together, these two services, collectively classified as mechanical equipment, account for roughly half (approximately 51%) of total building energy consumption, underscoring the critical role of efficient mechanical equipment management in curbing global energy consumption.
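The shares quoted above follow from simple ratios and sums. The sketch below reproduces them using only the figures in the text; the constant and function names are illustrative. Note that the IEA totals and the HVAC/DHW shares come from different sources (references 2 and 1), so the sketch does not combine them further.

```python
# Figures quoted in the text.
GLOBAL_FINAL_ENERGY_EJ = 445.0   # IEA, global final energy consumption in 2023
BUILDINGS_EJ = 125.0             # IEA, building sector share of that total
HVAC_SHARE_OF_BUILDINGS = 0.38   # HVAC share of building energy (ref. 1)
DHW_SHARE_OF_BUILDINGS = 0.13    # DHW share of building energy (ref. 1)


def building_share_of_total(buildings_ej: float, total_ej: float) -> float:
    """Fraction of global final energy consumed by buildings."""
    return buildings_ej / total_ej


def mechanical_share(hvac_share: float, dhw_share: float) -> float:
    """Combined HVAC + DHW share of building energy ("mechanical equipment")."""
    return hvac_share + dhw_share


if __name__ == "__main__":
    b = building_share_of_total(BUILDINGS_EJ, GLOBAL_FINAL_ENERGY_EJ)
    m = mechanical_share(HVAC_SHARE_OF_BUILDINGS, DHW_SHARE_OF_BUILDINGS)
    print(f"Buildings: {b:.1%} of global final energy")
    print(f"Mechanical equipment: {m:.0%} of building energy")
```

The first ratio is about 28.1%, and the two service shares sum to 51%, which is why "roughly half" is the accurate summary.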

South Korea also spends heavily on energy for mechanical equipment. Approximately USD 19.2 billion is spent annually on energy for mechanical equipment, representing approximately 71% of total energy consumption in domestic buildings³. Consistent with global trends, mechanical equipment accounts for a notably high share of energy consumption in South Korean buildings. In response, the Mechanical Equipment Act⁴ was enacted in 2018 and came into effect in 2020. The South Korean government introduced this legislation to improve the efficiency of mechanical equipment management and maintenance in buildings, with the aim of reducing annual energy costs by 10%. This reduction is projected to yield savings of approximately USD 1.92 billion per year while contributing to national efforts on energy efficiency and greenhouse gas emission reduction.
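The projected savings figure is consistent with reading the 10% target as a reduction of the annual energy bill (not a reduction compounding every year): 10% of USD 19.2 billion is USD 1.92 billion. A minimal sketch of that arithmetic, with illustrative names:

```python
MECH_ENERGY_COST_USD = 19.2e9  # annual energy cost of mechanical equipment (from the text)
TARGET_REDUCTION = 0.10        # reduction target under the Mechanical Equipment Act


def annual_savings(annual_cost_usd: float, reduction_rate: float) -> float:
    """Savings from cutting a fixed annual cost by a given fraction."""
    if not 0.0 <= reduction_rate <= 1.0:
        raise ValueError("reduction_rate must be between 0 and 1")
    return annual_cost_usd * reduction_rate


if __name__ == "__main__":
    savings = annual_savings(MECH_ENERGY_COST_USD, TARGET_REDUCTION)
    print(f"Projected savings: USD {savings / 1e9:.2f} billion per year")
```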

A key provision of the Mechanical Equipment Act is the mandatory appointment of a mechanical equipment maintenance manager in designated buildings. These managers are certified professionals holding nationally recognized qualifications; they are responsible for ensuring occupant comfort while optimizing energy consumption through the inspection, management, and operation of mechanical equipment. According to the Enforcement Decree of the Mechanical Equipment Act⁵, buildings with a total floor area exceeding 10,000 m² or residential complexes with more than 500 units must appoint a number of maintenance managers determined by building size. For instance, buildings with a total floor area exceeding 60,000 m² must employ one principal maintenance manager and one assistant maintenance manager, whereas buildings with a total floor area between 10,000 m² and 15,000 m² must employ one principal maintenance manager. Currently, more than 40,000 buildings in South Korea are subject to this requirement.
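The staffing rule can be pictured as a lookup on floor area. The sketch below encodes only the two bands quoted above and deliberately returns `None` for cases this section does not specify (the 15,000 to 60,000 m² bands and the unit-based rules for residential complexes). The boundary handling (strict "exceeding" vs inclusive limits) is an assumption, and `Staffing` and `required_managers` are hypothetical names, not part of the Decree.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Staffing:
    principal: int
    assistant: int


def required_managers(floor_area_m2: float, residential_units: int = 0) -> Optional[Staffing]:
    """Partial lookup of the maintenance-manager requirements quoted in the text.

    Returns None for covered cases whose staffing band is not given here.
    Boundary handling (> vs >=) is an assumption.
    """
    if residential_units > 500:
        return None  # covered, but unit-based staffing rules are not quoted here
    if floor_area_m2 <= 10_000:
        return Staffing(0, 0)  # not covered by the floor-area criterion quoted here
    if floor_area_m2 > 60_000:
        return Staffing(principal=1, assistant=1)
    if floor_area_m2 <= 15_000:
        return Staffing(principal=1, assistant=0)
    return None  # 15,000 to 60,000 m²: intermediate bands not given in this section


if __name__ == "__main__":
    for area in (8_000, 12_000, 30_000, 70_000):
        print(area, required_managers(area))
```

Returning `None` rather than guessing keeps the sketch honest about which parts of the Decree are actually stated here.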

Despite these regulatory efforts, the field of mechanical equipment maintenance faces two critical workforce challenges. First, the supply of qualified professionals is insufficient relative to the number of buildings subject to the regulation. Many building owners and operators are concerned about the financial burden of appointing maintenance managers, estimated at USD 38,000 per manager per year, and suburban and rural areas face acute labor shortages. Second, the workforce is aging: as of August 2024, data from the Korea Mechanical Construction Contractors Association⁶ indicate that 91% of mechanical equipment maintenance managers are aged 40 years or older. Consequently, although the Mechanical Equipment Act has expanded the scope and importance of mechanical equipment maintenance, the sector continues to face significant labor shortages.
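To put the per-manager cost in perspective, a rough aggregate can be formed from two figures given in the text: more than 40,000 regulated buildings and about USD 38,000 per manager per year. The sketch assumes, for illustration only, exactly one manager per building; larger buildings need more, so this is a lower-bound scenario rather than a reported estimate. The function name is hypothetical.

```python
def annual_staffing_cost(buildings: int, managers_per_building: float,
                         cost_per_manager_usd: float) -> float:
    """Rough aggregate cost: buildings x managers per building x cost per manager."""
    return buildings * managers_per_building * cost_per_manager_usd


if __name__ == "__main__":
    # Assumed lower-bound scenario: 40,000 buildings, one manager each, USD 38,000 per manager.
    total = annual_staffing_cost(40_000, 1, 38_000)
    print(f"About USD {total / 1e9:.2f} billion per year")
```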

To address these workforce challenges, recent research and industry initiatives have explored artificial intelligence (AI) and robotics as supplementary solutions⁷,⁸,⁹. In particular, Large Language Models (LLMs) such as GPT¹⁰ have attracted increasing attention. LLMs are proficient at comprehending complex technical documents and extracting pertinent information, and they have been deployed across various industries to assist human professionals. For example, Amazon has used LLM-based chatbots to automate routine inquiries, reducing the workload of customer service representatives and enabling 24-hour support¹¹. Similarly, NVIDIA and Hippocratic AI have announced plans to develop an LLM-based virtual nurse capable of providing human-level video consultations to mitigate the nursing workforce shortage in the United States¹². Empirical studies have also reported measurable productivity gains attributable to LLMs: developers using GitHub Copilot completed programming tasks 56% faster¹³, and customer service agents using AI-based conversational assistants showed an average productivity increase of 14%¹⁴.

Given these advancements, an important question arises: can LLMs be used effectively in mechanical equipment maintenance? Specifically, could LLMs support tasks such as interpreting technical manuals, analyzing operational data, and assisting maintenance managers? A fundamental approach to evaluating the applicability of LLMs in a new domain is to assess their domain-specific knowledge. As summarized in Table 1, previous studies have demonstrated LLM proficiency across diverse disciplines. Healthcare is one of the most extensively studied domains, and LLMs have achieved high scores on various medical certification exams worldwide¹⁵,¹⁶,¹⁷,¹⁸,¹⁹. In the legal domain, GPT-4 ranked in the top 10% of examinees on the U.S. Bar Exam with a 90% accuracy rate²⁰. LLMs have also performed well on domain-specific assessments in fields such as agriculture²¹ and engineering²²,²³,²⁴. More recently, a study reported that various LLMs, including GPT, possess a solid understanding of HVAC design, highlighting their growing potential in technical fields²⁴.

Table 1 Research reviews of LLM performance across domains.

In this context, the present study evaluates the applicability of LLMs to mechanical equipment maintenance in South Korea. Specifically, we investigate whether GPT, a representative LLM, possesses the knowledge required for mechanical equipment maintenance tasks. Because certification as a mechanical equipment maintenance manager requires passing a national technical qualification examination, we assess GPT's performance on the relevant exam questions. If GPT passes the examination, this would provide evidence that the model possesses the foundational knowledge required for the professional role. Beyond determining whether GPT can pass the exam, this study analyzes its problem-solving process in detail to identify strengths and limitations relevant to mechanical equipment maintenance. The research framework is illustrated in Fig. 1.
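Deciding whether an attempt "passes" reduces to a rule over subject scores. The pass criterion is not stated in this section; the sketch below assumes the rule commonly applied to written exams for Korean Engineer-grade national technical qualifications (every subject at least 40 out of 100, and an unweighted subject average of at least 60). Treat both thresholds, the equal weighting, and the names `overall_score` and `passes` as assumptions to be checked against the paper's Methods.

```python
from statistics import mean
from typing import Mapping

# Assumed pass rule (not stated in this section): each subject >= 40, subject mean >= 60.
SUBJECT_MINIMUM = 40.0
OVERALL_MINIMUM = 60.0


def overall_score(subject_scores: Mapping[str, float]) -> float:
    """Unweighted mean of subject scores, each on a 0-100 scale."""
    return mean(list(subject_scores.values()))


def passes(subject_scores: Mapping[str, float]) -> bool:
    """Apply the assumed two-part pass rule to one exam attempt."""
    if not subject_scores:
        raise ValueError("no subject scores given")
    return (min(subject_scores.values()) >= SUBJECT_MINIMUM
            and overall_score(subject_scores) >= OVERALL_MINIMUM)


if __name__ == "__main__":
    # Synthetic subject scores for one attempt (not data from the study).
    attempt = {"subject 1": 70, "subject 2": 50, "subject 3": 60}
    print(overall_score(attempt), passes(attempt))
```

The two-part check matters: a high average does not rescue an attempt if any single subject falls below the per-subject minimum.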

To the best of our knowledge, this is the first study to systematically evaluate the performance of LLMs on domain-specific national certification exams related to building mechanical equipment maintenance. Unlike prior work that has focused primarily on general language understanding or broad technical applications, this study examines how LLMs perform on highly specialized certification exams that directly reflect real-world engineering knowledge requirements. Although the certification exams analyzed in this study are specific to South Korea, the engineering knowledge they assess is largely applicable beyond national boundaries. The findings can inform practical methods for LLMs to assist mechanical equipment maintenance managers. Integrated into maintenance tasks, LLMs could help address workforce shortages in the short term and improve maintenance efficiency in the long term, ultimately contributing to building energy savings.

Fig. 1
