Introduction

With the rapid development of the global economy and continuous technological advancements, the gap between industrial demands and educational training objectives has become increasingly evident. To narrow this gap, the concept of industry-education integration has emerged, aiming to strengthen cooperation between the industrial and educational sectors, achieving a close alignment between talent cultivation goals and societal needs. Industry-education integration not only promotes the renewal of educational content and methods but also provides students with more abundant practical opportunities and employment pathways1. However, as the model of industry-education integration is being promoted and deepened, how to accurately evaluate undergraduate students’ professional core competencies in this context has become a key concern for educators and the industrial sector alike.

The cultivation and evaluation of professional core competencies is an important component of higher education quality assurance systems and is directly related to students’ comprehensive quality and career development abilities2. In the context of industry-education integration, this evaluation faces more complex challenges, including the diversification of evaluation standards, scientific methods of evaluation, and the applicability of evaluation results. These challenges require the evaluation system to comprehensively and accurately reflect students’ professional core competencies while adapting to the constantly changing industrial demands. Educational evaluation is a crucial part of the educational quality assurance mechanism and serves as an important tool for supervising and improving higher education quality3,4,5. Scientifically evaluating the professional core competencies of talent cultivation not only provides an objective measure of an institution’s talent development level but also serves as a “steering tool” to guide universities in continuous improvement, thus laying the foundation for the cultivation of outstanding undergraduate talents with a sense of national and social responsibility and both virtue and talent. Practical and feasible evaluation indicators are the foundation and key to effective evaluation6,7,8.

In the context of global higher education reform, scholars have conducted research from multiple dimensions, such as critical thinking, feedback literacy, collaborative learning, authentic assessment, employability, and innovative assessment methods, enriching the understanding of the connotation of students’ core competencies and exploring evaluation paths. Critical thinking is considered an essential component of 21st-century university students’ core competencies. Din (2020), through research on Pakistani undergraduates, found that critical reading skills could serve as an effective indicator for measuring critical thinking levels9. This study pointed out that teaching and evaluation designs should consciously guide students to engage in in-depth reading, critical understanding, and logical reasoning, thereby improving their cognitive levels and problem-solving abilities. This finding provides a clear direction for the “higher-order thinking” dimension in the core competency evaluation system.

In the context of diverse talent development, constructing an effective layered and categorized education assessment mechanism is a challenge that universities need to address. Feuchter and Preckel (2022) explored the impact of ability grouping on gifted students’ learning experiences and found that full-time ability grouping significantly reduced students’ feelings of boredom and burnout during the learning process, thus stimulating their intrinsic motivation to learn10. This study provides empirical references for how higher education can balance individual differences and fairness in curriculum design and evaluation mechanisms.

In the context of new teaching models increasingly emphasizing the learner’s central role, student feedback literacy has become an essential part of ability development. Chong (2021) re-examined the generation mechanism of student feedback literacy from an ecological perspective, arguing that it is not only the product of teaching behaviors but also influenced by multiple ecological factors such as the teacher’s role, teaching context, and course objectives11. Ibarra-Sáiz et al. (2020) emphasized that through a systematic peer evaluation process, students could develop judgmental awareness and self-regulation abilities while receiving feedback12. These two studies collectively emphasize that the evaluation indicator system should include dimensions reflecting students’ “metacognitive abilities,” “feedback comprehension and application abilities,” and “self-regulated learning paths.”

Traditional evaluation methods often focus on outcome-based judgments and fail to accurately reflect the development of students’ true competencies. Ajjawi et al. (2020) proposed that authentic assessment, as a bridge connecting school education and industrial practice, is especially crucial in the context of industry-education integration13. Their research found that the complexity of work situations requires higher education assessment mechanisms to be more aligned with real-world contexts, thus providing positive guidance for students’ overall quality formation. Similarly, Sokhanvar et al. (2021), through a systematic literature review, pointed out that authentic assessment not only enhances students’ learning engagement and satisfaction but also improves their vocational skills and employability14. Therefore, the evaluation indicator system should include indicators such as “task-oriented” and “practical relevance,” which reflect problem-solving abilities in real-world contexts.
Römgens et al. (2020), after reviewing research on employability in higher education and the workforce, proposed that “employability” should be regarded as a dynamic competency set, rather than a static outcome15. Their research breaks down employability into dimensions such as “basic skills, attitude characteristics, reflective abilities, networking, and adaptability,” emphasizing that universities should enhance students’ cross-context adaptability through systematic course design and assessment mechanisms. Lavi et al. (2021) also pointed out that students in STEM fields generally recognize that teaching methods significantly help the development of their communication, collaboration, and critical thinking skills, thus verifying the close relationship between teaching methods and the development of employability16. In terms of higher education evaluation system reform, Meijer et al. (2020) focused on assessment literacy in collaborative learning and pointed out that current evaluation methods mostly remain at the level of “task completion” and ignore core dimensions such as “cognitive contribution,” “role responsibility,” and “collective knowledge construction” during the learning process17. Bin Mubayrik (2020) further explored the integration trends of formative and summative assessments in adult education, proposing that dynamic, blended evaluation mechanisms should be built in diverse learning contexts to enhance learner participation and sustained learning motivation18. These studies provide theoretical support for building a comprehensive, diversified undergraduate core competency evaluation model.

Based on the existing research, it can be observed that current studies have accumulated rich results in identifying key student competencies, pathways for competency improvement, and optimization of assessment mechanisms. However, there are still the following deficiencies: (1) Research mainly focuses on a single competency dimension, lacking the construction of a systematic and complete core competency framework; (2) The application of core competency assessment models in engineering, information technology, teacher training, and other fields is still insufficient; (3) In China, there is a lack of theoretical support and empirical exploration for constructing a localized industry-education integration competency evaluation model.

In the practical implementation of industry–education integration, industry stakeholders possess frontline expertise and contextual awareness regarding competency profiling, job capability requirements, and employment standards. To ensure the usability and transferability of the evaluation system, this study incorporated industry stakeholder perspectives from the problem-definition stage onward: (1) During conceptualization and framework development, high-frequency competency elements and observable evidence from the employer side were collected to align evaluation indicators with real workplace needs. (2) In the expert consultation and interview phases, substantive participation from enterprise representatives was ensured, and indicators related to job fit, task orientation, and collaborative education were embedded throughout the curriculum–teaching–evaluation process. (3) During the empirical phase, data were collected using jointly rated scales by universities and enterprises, thereby enhancing the interpretability and operability of evaluation results from the employer’s perspective. This design contributes to narrowing the evaluation gap between educational outcomes and job requirements, fostering a closer alignment between higher education talent cultivation and industrial competency demands.

Therefore, this study intends to use the CIPP evaluation model19,20,21 as the theoretical basis to construct an evaluation indicator system for undergraduate students’ professional core competencies in the context of industry-education integration and, based on this, integrate the fuzzy comprehensive evaluation method to establish a comprehensive evaluation model. The research will use a certain undergraduate institution as a case study, collect evaluation data from students in typical engineering programs, and conduct empirical testing on the applicability and scientific nature of the model.

Definition of related concepts

Industry–education integration

Industry-education integration refers to the deep collaboration between the industrial sector and the education sector, aimed at narrowing the gap between educational training and industrial demands. This collaborative model typically involves multiple levels, including curriculum design, innovation in teaching methods, internship and training arrangements, complementary faculty resources, research project cooperation, and the joint development of talent cultivation models. Essentially, industry-education integration is an adaptation and optimization of the education system’s response to market demands. Through close collaboration with the industry, higher education institutions can gain a more direct understanding of industry development trends and skill requirements, allowing them to adjust curricula and training programs to ensure the practicality and forward-thinking nature of the knowledge and skills students acquire. At the same time, businesses can engage in the educational process, helping universities cultivate high-quality talent that meets industry needs while also building their future talent pool. Industry-education integration is not only a reform in the educational sector but also a necessity for socio-economic development. It helps build a bridge for the transfer of knowledge and skills, drives innovation in educational content and methods, and cultivates more high-quality talent that meets the requirements of the times.

Professional core competencies

Professional core competencies refer to the essential set of knowledge, skills, attitudes, and values required to achieve professional literacy and competence in a specific academic field. These competencies are indispensable for individuals to succeed within a particular professional domain and are an important goal in the educational process. Professional core competencies encompass not only the mastery of professional knowledge but also the ability to solve practical problems, critical thinking, innovation, communication and collaboration, professional ethics, and the ability to engage in lifelong learning. In the educational field, the cultivation of professional core competencies is a key focus of curriculum design and teaching activities, aimed at ensuring students can smoothly transition into their professional careers after graduation and effectively adapt to and promote the development of the relevant industry. The evaluation and enhancement of professional core competencies are critical components of higher education quality assurance systems. By establishing clear competency goals, designing effective teaching programs, and developing evaluation mechanisms, students’ knowledge construction, skills development, and personal growth in their professional fields can be promoted.

CIPP evaluation model

The CIPP model, proposed by Stufflebeam in the late 1960s, is a comprehensive evaluation model used to improve programs, projects, products, or systems22,23,24. The model consists of four evaluation phases:

(1) Context Evaluation: Aims to understand the needs, problems, and opportunities and to determine the rationality of goals and objectives.

(2) Input Evaluation: Assesses resources, strategies, and plans to select the most effective action plans.

(3) Process Evaluation: Monitors the implementation process to ensure the effective execution of the plan and makes timely adjustments.

(4) Product Evaluation: Evaluates outcomes and effects, including both expected and unexpected, positive and negative results.

The CIPP evaluation model is characterized by its process-oriented, comprehensive, and feedback-based nature. The model has been widely applied in teaching evaluation, curriculum system evaluation, teacher evaluation, and the construction of educational evaluation systems.

Research methodology for the evaluation index system

Formation of the research team

The research team consists of 7 members: one professor specializing in undergraduate teaching research, one associate professor engaged in undergraduate education research, one doctoral student, one lecturer, and three master’s students. The faculty members are responsible for resource coordination, selecting interviewees, consulting experts, and reviewing the content of the questionnaire items. The graduate students are responsible for the literature review, constructing the questionnaire item pool, developing interview outlines, designing expert consultation questionnaires, organizing interview materials, and compiling expert feedback. Finally, the evaluation index system was finalized through collaborative group discussion.

Construction of the expert consultation questionnaire

Literature review

This study is based on the CIPP evaluation theory model, using Chinese keywords such as “Industry-Education Integration, Undergraduates, Talent Development, Education Evaluation, Quality Evaluation, Education Standards, Professional Core Competencies, Evaluation Indicators, Delphi,” and English keywords such as “Integration of Industry and Education, Undergraduates, Talent Development, Education Evaluation, Quality Evaluation, Education Standards, Professional Core Competencies, Evaluation Indicators, Delphi.” These keywords were used to search for relevant articles in databases such as CNKI, Wanfang, VIP, PubMed, and Web of Science. The search period was from April 7, 2004, to April 7, 2024.

Based on the results of the literature review, categories of professional core competency evaluation indicators for undergraduates were extracted. Reference was made to the standards of regional accreditation agencies in the United States, ABET accreditation standards, AACSB accreditation standards, as well as China’s “Ministry of Education Undergraduate Teaching Work Review and Evaluation Standards” and the comprehensive evaluation index systems for undergraduate programs in some provinces. On this basis, the initial draft framework of the undergraduate professional core competency evaluation questionnaire, organized around the CIPP model, was constructed together with its initial pool of items.

Interviews with undergraduate education experts

On the basis of the literature analysis, the research team developed an interview outline as follows:

(1) Could you share your understanding of professional core competencies for undergraduates in the context of industry-education integration?

(2) What core competencies do you think undergraduates need to develop under the framework of industry-education integration?

(3) What factors do you believe influence the quality of talent cultivation in undergraduate institutions?

(4) What conditions should undergraduate institutions meet to cultivate professionals with core competencies? Among these, which are the most important?

A purposive sampling method was used to select teaching staff and teaching management personnel from undergraduate institutions, as well as enterprise representatives for the interviews. Expert inclusion criteria: (1) Master’s degree or higher, associate professor or higher title, with at least 10 years of work experience in undergraduate education or management; (2) Bachelor’s degree or higher, intermediate or higher title, with at least 5 years of work experience in undergraduate teaching; (3) Voluntary participation in the study. Expert exclusion criteria: withdrawal during the interview process. Based on the inclusion criteria, 10 experts, including faculty members, teaching management staff, and enterprise personnel, were selected for interviews. The basic information of the experts is shown in Table 1.

Table 1 Basic information of interviewed experts.

Finally, the interview results were analyzed and coded. After multiple group discussions, the content of the questionnaire based on the CIPP model was supplemented and refined to develop the initial draft of the undergraduate professional core competency evaluation questionnaire.

Construction of the consultation questionnaire

The final consultation questionnaire was constructed based on the four primary indicators, twelve secondary indicators, and seventy-four tertiary indicators identified in the undergraduate professional core competency evaluation questionnaire. The questionnaire consists of the following sections:

(1) Introduction and Instructions: A brief introduction to the research background and objectives.

(2) Core Competency Evaluation Indicator System Consultation Form: This includes the various levels of indicators and their importance ratings. The importance of each indicator was evaluated using the Likert 5-point scale25,26, with scores ranging from 1 to 5, from low to high importance. There is a section for modifying and adding items.

(3) Expert Basic Information Survey: This section gathers information on the expert’s gender, education, age, and title.

(4) Expert Authority Survey: This includes questions regarding the expert’s familiarity with the consultation issues and the basis for their judgments.

Implementation of expert consultation

The key to the Delphi method27,28 is the selection of experts. In this study, 21 experts from schools and enterprises in six provinces and cities (Guangzhou, Shanghai, Fujian, Xi’an, Ningxia, and Jiangxi) were invited to participate. The expert inclusion criteria were: a bachelor’s degree or higher; an associate professor or higher title; at least 15 years of work experience in higher education teaching or enterprise work; at least 10 years of experience in teaching management in higher education; and voluntary participation in the study. According to the literature, the number of experts participating in a Delphi consultation should be between 15 and 50. This study followed the recommendation of McMillan et al.29 and invited 21 experts.

This study covers six provinces and cities, including Guangzhou, Shanghai, Fujian, Xi’an, Ningxia, and Jiangxi, with selection based on: (1) regional economic and industrial structure differences (Eastern/Central/Western regions, with a balance of manufacturing and services); (2) differences in the types and levels of universities (a mix of research-oriented, application-oriented, and regional universities); and (3) the industry concentration and university–industry collaboration foundation corresponding to the target disciplines (such as Mechanical Engineering). Due to constraints in research funding and expert availability, full coverage of all regions was not achievable. To control for regional bias, stratified comparisons of expert ratings from different regions were conducted in the two rounds of the Delphi method, and sensitivity analyses were performed. After removing any regional sub-sample, the changes in Kendall’s W and the weights of key indicators remained within the preset threshold, ensuring the stability of the conclusions.

From January 2024 to April 2024, the consultation questionnaire was sent to experts via WeChat or email, with responses requested within 2–3 weeks; two rounds of expert consultation were conducted. The threshold (boundary-value) method was used to screen indicators. For each indicator, the mean importance rating and the coefficient of variation (CV) were calculated across experts. The mean threshold was set at the mean of the indicator means minus their standard deviation, and the CV threshold at the mean of the indicator CVs plus their standard deviation. An indicator met the mean criterion if its mean was above the mean threshold, and met the CV criterion if its CV was below the CV threshold. Indicators failing both criteria were deleted, while those failing one criterion were discussed by the group and either modified or deleted based on expert opinions.
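As a concrete illustration of this screening rule, the following minimal Python sketch applies the mean and CV thresholds to a simulated experts-by-indicators rating matrix; the data and the retain/discuss/delete bookkeeping are assumptions for demonstration, not the study’s actual ratings.

```python
import numpy as np

rng = np.random.default_rng(0)
ratings = rng.integers(3, 6, size=(19, 10)).astype(float)  # toy 1-5 scores: experts x indicators

means = ratings.mean(axis=0)                       # mean importance per indicator
cvs = ratings.std(axis=0, ddof=1) / means          # coefficient of variation per indicator

mean_threshold = means.mean() - means.std(ddof=1)  # mean criterion: keep if above
cv_threshold = cvs.mean() + cvs.std(ddof=1)        # CV criterion: keep if below

for i, (m, cv) in enumerate(zip(means, cvs)):
    failures = int(m < mean_threshold) + int(cv > cv_threshold)
    verdict = {0: "retain", 1: "discuss", 2: "delete"}[failures]
    print(f"indicator {i}: mean={m:.2f}, CV={cv:.3f} -> {verdict}")
```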

Statistical methods

Descriptive statistics were used to summarize the experts’ basic information. Expert reliability was assessed using the expert engagement level, the expert authority coefficient (Cr), the degree of consensus in expert opinions, and the coordination coefficient (Kendall’s W). The expert authority coefficient was calculated as the average of the expert judgment basis coefficient (Ca) and the expert familiarity coefficient (Cs), using the formula: Cr = (Ca + Cs) / 2. The assignment of Ca and Cs values is shown in Tables 2 and 3. The concentration of expert opinions was expressed by the mean and standard deviation of the importance ratings of the indicators, and coordination by Kendall’s W and the coefficient of variation (CV).
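To make the Cr calculation concrete, the sketch below reproduces the first-round computation from the tallies that appear in the formulas in the following subsections; since Tables 2 and 3 are not reproduced here, the option labels in the comments are assumptions read off those formulas, and small rounding differences from the reported values may occur.

```python
n_experts = 19

# Ca: sum over the four judgment bases of (count of experts choosing an
# option) x (option weight), averaged over experts; counts and weights
# follow the first-round formula reported below.
ca_terms = [
    (7, 0.3), (12, 0.2),            # theoretical analysis (assumed label)
    (17, 0.5), (2, 0.4),            # practical experience (assumed label)
    (5, 0.1), (14, 0.1),            # reference to literature (assumed label)
    (5, 0.1), (4, 0.1), (10, 0.1),  # subjective judgment / intuition (assumed label)
]
ca = sum(count * weight for count, weight in ca_terms) / n_experts

# Cs: familiarity self-assessment (0.9 = very familiar, 0.7 = familiar).
cs = (5 * 0.9 + 14 * 0.7) / n_experts

cr = (ca + cs) / 2  # authority coefficient
# prints Ca=0.926, Cs=0.753, Cr=0.839; the paper's 0.927/0.840 reflect intermediate rounding
print(f"Ca={ca:.3f}, Cs={cs:.3f}, Cr={cr:.3f}")
```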

Table 2 Expert judgment basis assignment table (Ca).
Table 3 Expert familiarity level assignment table (Cs).

The Analytic Hierarchy Process (AHP) method was used to calculate the weights of the indicators and to establish a hierarchical model. The mean importance ratings from the second round of indicator evaluation were used to determine the Saaty scale values, and a judgment matrix was then constructed to calculate the weights of each level and indicator in the hierarchy. Consistency checks were performed to ensure the reliability of the pairwise comparisons. A p-value of less than 0.05 was considered to indicate statistically significant differences.
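As an illustration of this weighting step, the sketch below derives weights and a consistency ratio for a hypothetical 4 × 4 judgment matrix over the four primary indicators, using the standard eigenvector method and Saaty’s random index values; the matrix entries are illustrative, not the study’s actual judgments.

```python
import numpy as np

def ahp_weights(A: np.ndarray) -> tuple[np.ndarray, float]:
    """Principal-eigenvector AHP weights plus the consistency ratio CR."""
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)
    w = np.abs(eigvecs[:, k].real)
    w /= w.sum()
    ci = (eigvals[k].real - n) / (n - 1)  # consistency index
    ri = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}[n]
    return w, ci / ri

# Hypothetical reciprocal judgment matrix for the four primary indicators,
# with entries on the Saaty 1-9 scale.
A = np.array([
    [1.0, 2.0, 1/2, 1.0],
    [1/2, 1.0, 1/3, 1/2],
    [2.0, 3.0, 1.0, 2.0],
    [1.0, 2.0, 1/2, 1.0],
])
w, cr = ahp_weights(A)
print("weights:", np.round(w, 3), "CR:", round(cr, 4))  # CR < 0.1 => acceptable consistency
```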

Evaluation indicator system results

General information of experts

A total of 21 experts were invited to participate in the study. In the first round of consultation, 19 experts participated, and in the second round, 18 experts took part. The invited experts included individuals from university academic affairs offices, university teaching and research development centers, and enterprise leaders. Their research fields covered university teaching, related enterprises, and university teaching management. The average age of the experts in the first round of consultation was (51.64 ± 6.39) years, and their average work experience was (30.84 ± 6.91) years. In the second round of consultation, the average age of the experts was (52.28 ± 5.90) years, and their average work experience was (31.56 ± 6.35) years. Detailed expert information is shown in Table 4.

Table 4 Expert general information.

As shown in Table 4, the proportion of full professors is relatively high, which was intended to strengthen methodological rigor and inter-agency coordination efficiency during the early stages of indicator selection. To reduce the structural bias introduced by the overrepresentation of full professors, two control strategies were employed: (1) stratified comparison of authority coefficients, in which the Cr and W values were calculated separately for full and associate professors and for the enterprise and university sub-samples, with results showing no significant inter-group differences; and (2) resampling sensitivity analysis, in which a reweighting experiment with a 0.8 weight reduction for the full professor group left the rankings of first- and second-level indicator weights unchanged (a sketch of this reweighting check follows below). Furthermore, during the expert interview phase, coverage from the industry side (including project management, technology, HR, and recruitment departments) was ensured to strengthen the industry perspective.
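As a rough sketch of the reweighting experiment under stated assumptions (simulated ratings, hypothetical group labels, and a 0.8 weight for the full-professor group), the code below checks whether the indicator ranking by mean importance changes after reweighting.

```python
import numpy as np

rng = np.random.default_rng(2)
ratings = rng.integers(3, 6, size=(18, 10)).astype(float)  # experts x indicators (toy)
is_full_prof = rng.random(18) < 0.6                         # hypothetical group labels

weights = np.where(is_full_prof, 0.8, 1.0)                  # down-weight full professors
baseline = ratings.mean(axis=0)
reweighted = (weights[:, None] * ratings).sum(axis=0) / weights.sum()

unchanged = np.array_equal(np.argsort(-baseline), np.argsort(-reweighted))
print("indicator ranking unchanged after reweighting:", unchanged)
```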

Expert engagement level and authority coefficient

The expert engagement level was represented by the effective response rate of the surveys and the rate at which experts provided feedback. The response rates for the two rounds of the questionnaire were 90.48% and 94.74%, respectively. The expert feedback rates were 73.68% and 22.22%, respectively. These results indicate a relatively high level of expert involvement in this study. Detailed results are shown in Table 5.

Table 5 Expert engagement level.

Based on the self-assessment results below, the Ca values for the two rounds of consultation were 0.927 and 0.922, respectively, and the Cs values were 0.753 and 0.767. The resulting expert authority coefficients (Cr) were 0.840 and 0.845, both exceeding 0.7, indicating a high level of expert authority and high reliability of the consultation results.

Expert judgment basis coefficient (Ca)

The Ca coefficient represents the extent to which experts score the importance of each indicator in the consultation questionnaire based on their personal theoretical analysis, practical experience, domestic and international references, or subjective judgment. The results are shown in Table 6.

First Round:

$$C_{\mathrm{a}} = \frac{7 \times 0.3 + 12 \times 0.2}{19} + \frac{17 \times 0.5 + 2 \times 0.4}{19} + \frac{5 \times 0.1 + 14 \times 0.1}{19} + \frac{5 \times 0.1 + 4 \times 0.1 + 10 \times 0.1}{19} = 0.927$$

Second Round:

$$C_{\mathrm{a}} = \frac{8 \times 0.3 + 10 \times 0.2}{18} + \frac{14 \times 0.5 + 4 \times 0.4}{18} + \frac{5 \times 0.1 + 13 \times 0.1}{18} + \frac{4 \times 0.1 + 7 \times 0.1 + 7 \times 0.1}{18} = 0.922$$
Table 6 Expert judgment basis self-assessment results.

Expert familiarity level

The familiarity level of the experts with the consultation content was divided into five levels. The self-assessment results for expert familiarity are shown in Table 7.

First Round:

$$C_{\mathrm{s}} = \frac{5 \times 0.9 + 14 \times 0.7}{19} = 0.753$$

Second Round:

$$C_{\mathrm{s}} = \frac{6 \times 0.9 + 12 \times 0.7}{18} = 0.767$$
Table 7 Expert familiarity level self-assessment results.

Expert authority coefficient (Cr)

The expert authority level results are shown in Table 8.

Table 8 Expert authority coefficient.

Kendall’s W and the consistency of the weight vector were recalculated after stratifying experts by category (enterprise/university) and title level (full/associate professor). The stratified W coefficients did not differ significantly between groups in either round (P > 0.05), and the overall CR was below 0.1. After reweighting and leave-one-out tests addressing the relatively high proportion of full professors, the rankings of the top five indicators remained stable, indicating that the conclusions are robust to the expert structure.

Expert opinion concentration and coordination coefficients

The concentration of expert opinions was represented by the average importance value, standard deviation, and the expert feedback rate. The results from the first round of consultation showed that the importance values of the indicator system ranged from 3.74 to 4.89, with standard deviations between 0.32 and 1.08. The expert feedback rate was 73.68%. The results from the second round showed that the importance values of the indicator system ranged from 3.83 to 4.94, with standard deviations between 0.24 and 1.03. The expert feedback rate was 22.22%. These results suggest that expert opinions on the various indicators became more consistent over time.

The coordination of expert opinions was represented by the coefficient of variation (CV) and Kendall’s concordance coefficient (Kendall’s W). The results from the first round showed that the CV values ranged from 0.000 to 0.270. The second round results showed that the CV values ranged from 0.048 to 0.290. The Kendall’s W coefficients for the indicators in both rounds of consultation were 0.182 and 0.244 (both P < 0.01), indicating a good level of coordination among expert opinions. Moreover, the Kendall’s W in the second round was higher than in the first round, suggesting that expert opinions became more unified in the second round, thereby enhancing the reliability of the consultation results. The results are shown in Table 9.
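For reference, the sketch below computes Kendall’s W and its chi-square significance test from an experts-by-indicators rating matrix; it uses simulated data and omits the tie correction that a full analysis of heavily tied Likert ratings would include.

```python
import numpy as np
from scipy.stats import rankdata, chi2

def kendalls_w(ratings: np.ndarray) -> tuple[float, float]:
    """ratings: (m experts x n indicators). Returns (W, chi-square p-value)."""
    m, n = ratings.shape
    ranks = np.vstack([rankdata(row) for row in ratings])   # rank indicators within each expert
    s = ((ranks.sum(axis=0) - ranks.sum() / n) ** 2).sum()  # squared deviations of rank sums
    w = 12 * s / (m ** 2 * (n ** 3 - n))                    # no tie correction in this sketch
    p = chi2.sf(m * (n - 1) * w, df=n - 1)                  # large-sample significance test
    return w, p

rng = np.random.default_rng(1)
ratings = rng.integers(3, 6, size=(18, 12)).astype(float)   # toy 1-5 importance scores
w, p = kendalls_w(ratings)
print(f"Kendall's W = {w:.3f}, p = {p:.4f}")
```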

Table 9 Coordination degree of expert opinions and significance test for two rounds of consultation.

Expert consultation opinions and results

Based on the indicator selection criteria and expert consultation feedback, indicators were deleted and modified through group discussions across the two rounds of consultation. In the first round, 74 tertiary indicators were screened. The selection process followed a “deletion and modification” approach based on boundary values and expert opinions: (1) Quantitative criteria: for each tertiary indicator, the mean importance score and the coefficient of variation (CV) were calculated, and indicators outside the acceptable range were placed in the “deletion candidate pool.” (2) Qualitative criteria: each indicator was checked against three factors: alignment with the CIPP dimensions, observability and collectability (data availability, consistency in measurement), and redundancy with retained indicators (high overlap or conceptual inclusion). (3) Integration of expert recommendations: items with conceptual overlap or unclear definitions were prioritized for merging or rewriting, while items with unstable measurement criteria or strong dependency on external data were deleted.

First round consultation results

The mean importance scores for the primary indicators ranged from 4.37 to 4.79, with a CV ranging from 0.087 to 0.114. These indicators met the inclusion criteria and were retained without modification. The mean importance scores for the secondary indicators ranged from 4.26 to 4.89, with a CV ranging from 0.064 to 0.153. Two indicators were merged, such as combining “Faculty Team” and “Teacher Quality” into “Faculty Development.” Six indicators were modified, such as changing “Quality of Students” to “Attractiveness of Student Source.” The mean importance scores for the tertiary indicators ranged from 4.00 to 4.84, with a CV ranging from 0.077 to 0.246. Eight indicators were deleted, including “Teaching Competence of Faculty” and “Achievement of Talent Cultivation Goals.” Ten new indicators were added, including “Social Recognition” and “Implementation of Educational Reform.” Eight indicators were merged, such as combining “Teaching Formats,” “Teaching Methods,” and “Construction of Teaching Informatization” into “Smart Teaching Level.” Fifteen indicators were modified, such as changing “Reasonable Course Arrangement” to “Balance Between Theoretical and Practical Courses.” After these revisions, the second-round consultation questionnaire included 4 primary indicators, 11 secondary indicators, and 71 tertiary indicators.

Second round consultation results

The mean importance scores for the primary indicators ranged from 4.33 to 4.83, with a CV ranging from 0.079 to 0.136. These indicators met the inclusion criteria and were retained without modification. The mean importance scores for the secondary indicators ranged from 4.33 to 4.94, with a CV ranging from 0.049 to 0.159. No expert suggestions were made, and the indicators met the inclusion criteria, so they were retained without modification. The mean importance scores for the tertiary indicators ranged from 3.89 to 4.94, with a CV ranging from 0.049 to 0.216. Three indicators were deleted, including “Course Completion Rate” and “Number of Students Awarded in Other Categories.” Five indicators were modified, such as changing “Teaching Environment” to “Classroom Teaching Atmosphere.”

Through two rounds of expert consultation and indicator screening, the final talent development evaluation indicator system for undergraduates in the context of industry-education integration consists of 4 primary indicators, 11 secondary indicators, and 68 tertiary indicators. The detailed system is shown in Table 10.

Weighting of core competency evaluation indicators for undergraduate programs in the context of industry-education integration

The Analytic Hierarchy Process (AHP) was used to determine the weights of indicators at each level. A random consistency test (CR < 0.1) showed that the weight distribution of the indicators at each level is reasonable and scientifically sound. The details are shown in Table 10.

Table 10 Industry–education integration perspective core competency evaluation indicator system for undergraduate programs and weights.

Explanation of key tertiary indicators

A1 Training Objectives. A1-1 Alignment of Training Objectives with Professional Development and Institutional Positioning: This measures the degree to which the training objectives align with the institution’s educational positioning, professional development plans, and industry trends. A1-2 Guiding and Operability of Training Objectives: This assesses whether the training objectives are detailed into actionable graduation requirements and course goals. A1-3 Degree of Reflection of Industry–Education Integration Concept: This measures how much the training objectives emphasize university–industry collaboration and authentic situational practice.

A2 Attractiveness of Student Recruitment. A2-1 First-choice Enrollment Rate: The proportion of students applying to the program as their first choice. A2-2 Admission Line and Provincial Control Line Ratio: Reflects the academic level of the incoming students. A2-3 Enrollment Rate: The proportion of new students who actually enroll.

A3 Social Reputation. A3-1 Social Recognition: The overall evaluation of the program by employers and industry associations. A3-2 Social Awareness: The program’s external influence and visibility.

B1 Faculty Development. B1-5 Teaching Competence of Faculty: The faculty’s skills in teaching design, feedback, and evaluation. B1-8 Professional Ethics and Conduct of Faculty: Adherence to rules and regulations and their role in student development. B1-10 Proportion of Faculty with Industry Backgrounds: The proportion of faculty members with practical experience in industry.

B2 Educational Support. B2-4 University–Industry Joint Laboratories/Training Centers: The number and effectiveness of co-constructed platforms. B2-10 Career Planning and Development Support: Systematic career education. B2-11 Social Practice Activities: The extent and quality of social practices that are aligned with the industry.

C1 Curriculum Development. C1-2 Alignment of Course Objectives with Training Objectives: Completeness of the course objective support matrix. C1-3 Course Content Support for Graduation Requirements: The degree to which teaching content aligns with graduation requirements. C1-9 Proportion of Practical Teaching in the Curriculum: The proportion of practical hours to total hours.

C2 Teaching Organization. C2-3 Level of Smart Teaching: The degree to which information technology and data-driven teaching applications are integrated. C2-4 Student Participation in University–Industry Collaboration Projects: The proportion and quality of student participation in training and projects.

C3 Teaching Quality Assurance. C3-2 Student Performance Assessment Methods: The balance between formative and summative evaluations. C3-5 Student Satisfaction with Teaching: The level of satisfaction with the classroom and courses. C3-7 Teaching Quality Monitoring and Evaluation: The effectiveness of monitoring mechanisms and closed-loop improvement processes.

D1 Capability to Gain Honors. D1-1 Competition Award Coverage: The proportion of students who participate in and win awards in competitions. D1-2 Participation in Innovation and Entrepreneurship Projects: The proportion of students who engage in and succeed in innovation and entrepreneurship projects.

D2 Core Competencies of Undergraduates. D2-4 Communication Skills: Written and oral expression, and cross-team collaboration. D2-5 Autonomous Learning Ability: Self-regulation and continuous learning. D2-8 Professional Qualification Examination Pass Rate: The rate at which students pass professional qualification exams and obtain certifications.

D3 Graduate Quality. D3-3 Employment (Including Further Study) Rate: The proportion of graduates who have secured employment or further study. D3-4 Employment and Major Compatibility: The degree to which graduates’ employment matches their field of study. D3-6 Graduate Employment Satisfaction: The degree of satisfaction with job placement and career development. D3-7 Internship and Employer Satisfaction: The employer’s evaluation of student internships and graduates.

Discussion on the evaluation indicator system

Analysis of the scientific nature of the evaluation indicator system

This study uses the CIPP (Context, Input, Process, Product) evaluation model as its framework. Based on literature analysis and semi-structured interviews, it draws on international standards for undergraduate education, combines China’s national teaching quality standards, integrates evaluation systems from several provincial undergraduate programs, and references a series of national educational evaluation guidelines to establish a preliminary evaluation indicator system for the core competencies of undergraduate students in the context of industry-education integration. The Delphi method proved effective for forecasting, evaluating, and determining the evaluation indicators. During the two rounds of expert consultation, the research strictly followed the steps and methods of the Delphi process, and after each round the results of the previous consultation were fed back to the experts, ensuring that the evaluation indicator system was constructed scientifically. Furthermore, this study employed the Analytic Hierarchy Process (AHP) to calculate the weights of the indicators, reducing the influence of the experts’ subjective factors and making the consultation results more scientific and reasonable.

Reliability analysis of the evaluation indicator system

This study invited 21 experts from schools and enterprises in six provinces and cities—Guangzhou, Shanghai, Fujian, Xi’an, Ningxia, and Jiangxi—to participate in the consultation. The experts come from a wide range of regions, which helps to minimize the bias and limitations of the consultation results. Among the experts involved in both rounds of consultation, there are staff members from university academic affairs offices, teaching research and development centers, heads of undergraduate institutions, and senior engineers from enterprises. All experts have more than 15 years of work experience, hold at least associate senior titles, and serve as graduate advisors. Their research fields include undergraduate education, teaching research, and management, and each expert has achieved significant accomplishments in these areas, allowing them to provide professional guidance for this study. The effective response rate for the consultation questionnaires in both rounds exceeded 90%, and the experts’ authority coefficients were above 0.8, indicating a high level of expert engagement and authority, thus making the consultation results reliable. The Kendall’s concordance coefficients for the two rounds of consultation were 0.182 and 0.244 (both P < 0.01), suggesting a good level of agreement among expert opinions. The higher Kendall’s concordance coefficient in the second round indicates that the experts’ opinions became more aligned, further confirming the credibility of the consultation results. In summary, the evaluation indicator system for undergraduate core competencies in the context of industry-education integration constructed in this study demonstrates strong reliability.

The necessity of constructing the evaluation indicator system for undergraduate professional core competencies in the context of industry–education integration

The necessity of constructing an evaluation indicator system for undergraduate professional core competencies in the context of industry-education integration lies in its ability to precisely capture the alignment between educational outputs and industry demands, promote innovation in educational content and methods, and enhance the quality and efficiency of talent cultivation. This system should include indicators such as mastery of theoretical knowledge, ability to solve practical problems, innovation and research and development capabilities, vocational skills, teamwork and communication abilities, and professional ethics. These indicators not only reflect students’ disciplinary and professional competencies but also embody their comprehensive qualities and abilities to adapt to future work environments. By using these evaluation indicators, educational institutions can better adjust their training programs, strengthen collaborations with the industry, and provide solid support for students’ career development and for the continuous innovation within industries.

Construction of the evaluation model

Based on the established evaluation indicator system for undergraduate professional core competencies in the context of industry-education integration, this paper further applies the Fuzzy Comprehensive Evaluation Method30,31,32 to develop the corresponding evaluation model. The Fuzzy Comprehensive Evaluation Method is suitable for complex evaluation problems that involve multiple levels and indicators, and that are difficult to describe using precise numerical values33,34. The core of this method lies in using membership functions to reflect the relationship between each evaluation object and its corresponding evaluation level. It then combines the indicator weights to compute the comprehensive evaluation results35,36.

Model structure and process

The fuzzy comprehensive evaluation model constructed in this study uses a multi-level fuzzy comprehensive evaluation method, which includes the following main steps: (1) determine the evaluation indicator set (U); (2) set the evaluation level set (V); (3) construct the membership degree matrix (R); (4) determine the indicator weight vector (W); and (5) perform the fuzzy comprehensive calculation and compute the comprehensive evaluation results.

Setting the indicator set and evaluation level set

As shown in Table 10, the evaluation indicator system for undergraduate core competencies constructed in this study includes 4 first-level indicators, 11 second-level indicators, and 68 third-level indicators. The third-level indicators serve as the final evaluation units, forming the evaluation indicator set (U). The evaluation level set (V) is set to a five-level evaluation scale.

Setting the membership function

This study uses the fuzzy membership scoring method, converting the scores given by experts or evaluation subjects for each third-level indicator (on a 1–5 Likert scale) into membership degrees for the five-level evaluation scale. Let the score for a particular indicator be x, and its corresponding fuzzy membership vector be:

$$R_{\mathrm{i}} = \left( r_1,\ r_2,\ r_3,\ r_4,\ r_5 \right)$$
(1)

When x = 1, Ri = [1, 0, 0, 0, 0]; when x = 2, Ri = [0, 1, 0, 0, 0]; when x = 3, Ri = [0, 0, 1, 0, 0]; when x = 4, Ri = [0, 0, 0, 1, 0]; and when x = 5, Ri = [0, 0, 0, 0, 1]. By aggregating the rating data from each evaluation subject, the frequency of each indicator across the five levels is counted, and the membership degree matrix R is normalized.
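A minimal sketch of this conversion, assuming a raters-by-indicators array of 1–5 scores: each score becomes a one-hot vector over the five levels, and pooling across raters and normalizing by the rater count yields each indicator’s row of R.

```python
import numpy as np

def membership_matrix(scores: np.ndarray) -> np.ndarray:
    """scores: (raters x indicators) integers in 1..5 -> R: (indicators x 5), rows sum to 1."""
    n_raters = scores.shape[0]
    r = np.stack([(scores == level).sum(axis=0) for level in range(1, 6)], axis=1)
    return r / n_raters

scores = np.array([[5, 4, 3],
                   [4, 4, 3],
                   [5, 5, 4],
                   [4, 3, 2]])        # 4 raters, 3 indicators (toy data)
print(membership_matrix(scores))      # e.g. indicator 0 -> [0, 0, 0, 0.5, 0.5]
```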

Constructing the weight vector

Based on the weight results for the third-level indicators calculated using the AHP method, the weight vector W is constructed. The weight values have satisfied the consistency test requirements and can be directly used for subsequent calculations.

Fuzzy comprehensive calculation

The membership vector B for the comprehensive evaluation can be calculated using the following formula:

$$B = W \times R$$
(2)

The final fuzzy evaluation result can be determined using the maximum membership principle to identify the corresponding level, or the fuzzy expected value method can be used for further quantification. The formula for calculating the fuzzy expected value E is as follows:

$$E = B \times V^{T}$$
(3)

where V = (1, 2, 3, 4, 5).

As shown in Fig. 1, the overall evaluation results are divided into five categories: Very Poor, Poor, Average, Good, and Excellent, based on the intervals of the E value. The interval standards in the figure are derived from the experience of universities in assessing student performance. For a maximum score of 100, a score of 60 is typically considered passing, 80 and above is regarded as good, and 90 and above is considered excellent. Using this proportion, the interval segmentation as shown in Fig. 1 can be obtained.
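Putting Eqs. (2) and (3) together, the sketch below computes B, E, and a grade for illustrative W and R; the E-value cut points (2.0, 3.0, 4.0, and 4.5, i.e., the 60/80/90 pass/good/excellent convention mapped onto the 1–5 range, with an assumed Poor/Very Poor boundary) are inferences from the proportions described above, since Fig. 1’s exact intervals are not reproduced here.

```python
import numpy as np

W = np.array([0.4, 0.35, 0.25])              # illustrative AHP weights (3 indicators)
R = np.array([[0.0, 0.0, 0.1, 0.5, 0.4],     # illustrative membership rows, Eq. (1)
              [0.0, 0.1, 0.2, 0.4, 0.3],
              [0.0, 0.0, 0.3, 0.4, 0.3]])
V = np.array([1, 2, 3, 4, 5])                # evaluation level set

B = W @ R                                    # Eq. (2): comprehensive membership vector
E = B @ V                                    # Eq. (3): fuzzy expected value

def grade(e: float) -> str:
    """Assumed E-value intervals scaled from the 60/80/90 pass/good/excellent convention."""
    cuts = [2.0, 3.0, 4.0, 4.5]
    labels = ["Very Poor", "Poor", "Average", "Good", "Excellent"]
    return labels[int(np.searchsorted(cuts, e, side="right"))]

print(f"B = {np.round(B, 3)}, E = {E:.2f}, grade = {grade(E)}")  # E ≈ 4.09 -> Good
```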

Fig. 1 The division of evaluation results based on the E value.

Application of the evaluation model—a case study of the mechanical engineering program at a university

To validate the practical application of the evaluation indicator system and fuzzy comprehensive evaluation model, this study selects the 2019 cohort of undergraduate students from the Mechanical Engineering program at a certain university as the evaluation subject. The university is a “Double First-Class” institution, and its Mechanical Engineering program has been actively advancing industry-education integration reforms in recent years, with a strong foundation in practical teaching and abundant university-industry cooperation resources.

Implementation process

(1) Sample Selection: The study targeted the 2019 cohort of undergraduate students in the Mechanical Engineering program. A random sample of 132 students was selected, and the core competency evaluation questionnaires were distributed and collected, resulting in 132 valid samples.

(2) Questionnaire Design: Based on the 68 tertiary indicators in Table 10, a Likert 5-point scale evaluation questionnaire was designed. The scoring data were provided jointly by program teachers and mentors from cooperating enterprises.

(3) Data Processing: The scoring data were transformed into frequency distributions over the five levels and then normalized. A fuzzy membership degree matrix R was constructed for each of the 132 students, and the weight vector W was taken from the AHP results.

(4) Fuzzy Comprehensive Evaluation: The fuzzy evaluation vector B and fuzzy expected value E were calculated for each student, and the corresponding evaluation levels were determined (a computational sketch of this per-student pipeline follows below).
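Under the same illustrative assumptions as the earlier sketches (simulated scores, stand-in weights, and the assumed E-value intervals), the following code runs the per-student pipeline and tallies the resulting level distribution for a cohort of 132 students.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(3)
n_students, n_raters, n_indicators = 132, 4, 68
W = rng.random(n_indicators); W /= W.sum()       # stand-in for the AHP weight vector
V = np.arange(1, 6)

def grade(e: float) -> str:                      # assumed E-value intervals (see Fig. 1)
    labels = ["Very Poor", "Poor", "Average", "Good", "Excellent"]
    return labels[int(np.searchsorted([2.0, 3.0, 4.0, 4.5], e, side="right"))]

levels = []
for _ in range(n_students):
    scores = rng.integers(3, 6, size=(n_raters, n_indicators))     # joint ratings (toy)
    R = np.stack([(scores == k).mean(axis=0) for k in V], axis=1)  # per-student membership matrix
    levels.append(grade((W @ R) @ V))
print(Counter(levels))                           # cohort-level distribution, as in Fig. 2
```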

Results and analysis

The distribution of core competency levels for the 132 students in the Mechanical Engineering program at the university is shown in Fig. 2. The results indicate that the program has achieved positive outcomes in the context of industry-education integration. Specifically, 79.6% of students have core competencies at the “Good” level or above, with only a few students falling within the “Poor” or lower range. This reflects the overall high quality of education.

Fig. 2 Distribution of core competency evaluation levels.

Further analysis of the average expected values of the primary indicators is shown in Fig. 3. ‘Process Evaluation’ and ‘Input Evaluation’ score the highest, indicating that the program has a solid foundation in areas such as course organization, faculty development, and practical teaching. However, ‘Context Evaluation’ and ‘Product Evaluation’ scores are slightly lower, suggesting the need to further strengthen social reputation building and the feedback mechanism for tracking graduate quality.

Fig. 3 Fuzzy expected values of primary indicators.

Application value

(1) As a Teaching Quality Diagnostic Tool: This model can serve as an important tool in the internal quality assurance system of higher education institutions, used to regularly monitor the development of students’ core competencies in various programs.

(2) Serving Talent Development Program Optimization: By analyzing evaluation results, it can guide the optimization of curriculum design, strengthen practical teaching, and enhance the collaboration between universities and enterprises in talent cultivation.

(3) Promoting Deep Collaboration Between Universities and Enterprises: The evaluation results can serve as an important basis for enterprises to assess the teaching quality and talent development level of cooperating universities, thus promoting deeper involvement of enterprises in educational reform.

In conclusion, the empirical research combining real evaluation data with the fuzzy comprehensive evaluation model has verified the effectiveness, scientific soundness, and practical application value of the evaluation system and model. The model demonstrates strong generalizability and can be extended to more programs and universities in the future.

Suggestions for the promotion and application in different disciplines and types of universities

Based on the empirical results of the Mechanical Engineering program, this section further discusses the pathways for promoting the evaluation indicator system and fuzzy comprehensive evaluation model in different disciplines and types of universities.

General path for interdisciplinary promotion

Establish a “Universal Benchmark—Discipline Mapping” Dual-layer Structure: Begin with the CIPP four-dimensional universal indicator set as the primary benchmark, covering common elements such as alignment with training objectives, curriculum-support alignment, teaching quality assurance, and graduation outcomes. Then, perform “discipline mapping” based on the core competencies and external accreditation requirements, replacing secondary and tertiary indicators with “strongly correlated observable indicators” relevant to the specific discipline.

Use of “Secondary AHP + Ontological Indicator Database” for Weight Calibration: Retain the general weight structure while organizing internal and external experts in the discipline to conduct a second round of AHP to meet the differentiated requirements of “evidence strength—availability—decision sensitivity” for different disciplines. The indicators will be consolidated into a reusable ontological indicator database, facilitating cross-disciplinary sharing and version iteration.

Contextualized Membership Function Setting: The membership function for fuzzy comprehensive evaluation should not be “one-size-fits-all.” It is recommended to use piecewise/triangular/trapezoidal functions based on the data distribution characteristics of each discipline (such as papers, patents, clinical skills assessments, teaching performance), and establish dynamic thresholds based on historical baselines, ensuring that grade divisions maintain group stability while accommodating developmental comparisons.
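As one possible realization of this recommendation, the sketch below defines a triangular membership function whose (a, b, c) parameters would be calibrated per discipline from historical score distributions; the values shown are illustrative only.

```python
import numpy as np

def triangular(x: np.ndarray, a: float, b: float, c: float) -> np.ndarray:
    """Membership rising linearly on [a, b] and falling on [b, c]; zero outside [a, c]."""
    left = (x - a) / (b - a)
    right = (c - x) / (c - b)
    return np.clip(np.minimum(left, right), 0.0, 1.0)

# Illustrative calibration: a "Good" band on a 100-point scale that a
# discipline could shift or widen based on its historical baseline.
x = np.array([55.0, 65.0, 75.0, 85.0, 95.0])
print(triangular(x, a=60.0, b=80.0, c=95.0))  # -> approx [0., 0.25, 0.75, 0.667, 0.]
```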

Evidence Triangle Validation and Formative Feedback: In addition to model outputs (E-value and grade), retain the “evidence triangle” (quantitative data, process archives, and external peer reviews) to enhance interpretability. Integrate with teaching meetings and curriculum-graduation requirement alignment analyses for ongoing formative improvement.

Stratified promotion strategies for different types of universities

Research-oriented/“Double First-Class” Universities: With the focus on “research-teaching-industry collaboration”, enhance the input dimension by strengthening the development and quality of high-level research platforms and university-enterprise joint laboratories. The process dimension should emphasize research-course co-construction, academic norms, scientific innovation training, and international learning experiences, while the outcome dimension highlights high-quality papers/patents/competitions, and international competence. It is recommended that such universities moderately increase the weight of process evaluation and outcome evaluation and align with the key dimensions of the “Double First-Class Evaluation Method” to maintain consistency between policies, institutions, and disciplines.

Application-oriented Undergraduate Universities: Focusing on “industry scenarios—project-based teaching—employment quality”, emphasize university-industry joint courses, real project practices, dual-track enterprise mentors, and the alignment of job competency profiles with courses and achievement loops. In the outcome dimension, set strong constraints on indicators such as employment quality, job compatibility, job adaptation speed, and performance. Refer to national guidelines on the transformation of local application-oriented universities and incorporate social service and regional industry contribution into the weighted items of the background and outcome dimensions.

Regional/Resource-constrained Universities: Adopt a strategy of “lightweight indicators + key evidence prioritization”, focusing on building several “easily implementable” evidence points (such as course-objective support matrices, practical hours ratios, university-industry project lists, and graduate outcomes and satisfaction) and performing annual micro-adjustments to weights using phased AHP. Additionally, compensate for shortages in equipment and training facilities by leveraging alliances and shared resources (regional industry-education integration communities, industry associations).

Private/Industry-specific Universities: Strengthen the “market feedback and governance” dimension, incorporating indicators such as employer repurchase cooperation rates, part-time mentor structures, and tuition-investment-output efficiency analysis to ensure governance transparency and sustainable educational quality.

In summary, the constructed evaluation indicator system and fuzzy comprehensive evaluation model possess strong transfer potential across different disciplines and types of universities. Through a systematic path of “universal benchmark—discipline mapping—weight calibration—evidence triangle—platform governance”, it ensures the scientificity and comparability of evaluations, while respecting the contextual differences across disciplines and institutions, achieving authentic and effective evaluation and continuous improvement.

Conclusions

In the context of the ongoing deepening of higher education reform and the integration of industry and education, the establishment of a scientific, systematic, and application-oriented evaluation system for undergraduate core competencies is of significant importance for improving the quality of talent cultivation in universities, optimizing the allocation of educational resources, and strengthening the university-enterprise collaboration mechanism. However, current research on the systematic evaluation of undergraduate core competencies under the perspective of industry-education integration is still insufficient. Existing evaluation systems generally suffer from incomplete structure, inadequate theoretical support, and weak quantitative capabilities, highlighting the urgent need for systematic research with both theoretical depth and practical breadth. This study is based on the CIPP evaluation model, considering four dimensions: Context, Input, Process, and Product. By using literature review, expert interviews, Delphi method, and Analytical Hierarchy Process (AHP), an evaluation indicator system for undergraduate professional core competencies was constructed. Subsequently, the fuzzy comprehensive evaluation method was introduced to establish a multi-level quantitative evaluation model. An empirical analysis was conducted using the mechanical engineering program at a “Double First-Class” university as a case study. The following main conclusions were drawn:

(1) A theoretically supported and well-structured evaluation indicator system was developed: based on the CIPP model framework, the system covers four primary indicators—Context, Input, Process, and Product—and includes 11 secondary indicators and 68 tertiary indicators. It reflects a full-chain evaluation concept, from the educational environment to the educational process and student outcomes, demonstrating strong systematization and completeness.

(2) The scientific and authoritative nature of the indicator system was ensured using the Delphi and AHP methods: the study invited 21 experts from universities and enterprises across six provinces and cities to participate in two rounds of consultation. The expert authority coefficient (Cr) was at least 0.840, and Kendall’s coefficient of concordance was significant (P < 0.01), ensuring the representativeness, rationality, and scientific soundness of the selected indicators. The AHP method was used to construct the indicator weight system, quantifying the influence of each indicator and enhancing the operability of the evaluation system.

(3) A quantitative and operable evaluation model based on the fuzzy comprehensive evaluation method was established: the model effectively handles multi-level, multi-dimensional, and highly fuzzy evaluation problems. By constructing a membership degree matrix for the indicators and performing weighted comprehensive calculations, it provides an overall judgment of students’ competency levels, demonstrating good adaptability and universality.

(4) The empirical study validated the feasibility and application value of the evaluation model: using the Mechanical Engineering program at a certain university as a sample, the results showed that the model could clearly differentiate students’ core competency levels. Analysis of the primary indicator scores found that the program performed well in areas such as practical teaching and faculty investment, while outcome tracking and external reputation require further improvement. This demonstrates the model’s core competency assessment function while indicating where the institution should strengthen its efforts within the specific discipline.