Introduction

In the 1970s, classical test theory (CTT), which relies on raw scores, dominated the field of measurement research. However, CTT is heavily dependent on both the test items and the participants, with scores varying across different participants and items. Item response theory (IRT) emerged as an alternative, offering the ability to estimate participants’ ability on a common scale. From the 1970s to the 1980s, a substantial number of studies applied IRT to practical evaluation problems1. Despite its advantages over CTT, IRT remains a summary assessment, only capable of reporting participants’ abilities in a single dimension. Critically, the IRT model “has little connection with the basic process, strategy and knowledge structure of project solving concerned by cognitive theory”2. Neither CTT nor IRT can reflect the psychological characteristics or cognitive processes involved in participants’ responses to test items, nor can they capture participants’ mastery of specific, fine-grained knowledge points. To address these limitations, cognitive diagnostic models (CDMs) were developed. CDMs overcome the shortcomings of traditional theories and align with practical educational requirements. CDMs are “designed to detect a student’s specific knowledge structure or operational skills in a certain area, thereby providing detailed diagnostic information about the student’s cognitive strengths and weaknesses”3. Such an assessment presents test questions to students in the form of items, with the student’s latent traits serving as assessment attributes. Based on the students’ responses, psychological assessment models are used to infer students’ mastery of various attributes, laying the groundwork for personalized learning4. In essence, cognitive diagnostic assessment (CDA) is an evaluation method that applies cognitive diagnostic theory and models to analyze test data and generate detailed diagnostic insights5.

In mathematics education, it has been suggested that mathematics teachers, mathematicians, psychometricians, and educational statisticians cooperate in research projects to recognize the potential value of additional conceptual discussion and secondary analysis directly applicable to the existing school system6. Although the Trends in International Mathematics and Science Study (TIMSS) was not originally developed for diagnostic purposes, its items, though used in a non-diagnostic assessment, can still be applied for diagnostic purposes7. The structure of CDMs within the TIMSS assessment is clear and suitable for CDA, which allows the non-diagnostic TIMSS test to be transformed into a cognitive diagnostic assessment tool8. Mathematical cognitive processes refer to students’ understanding and operational processes involving mathematical knowledge and skills. As one of the two core areas assessed in the TIMSS test, this domain has garnered significant attention. TIMSS categorizes the assessment of cognitive domains into three major components: knowing, applying, and reasoning, providing a comprehensive framework for evaluating students’ mathematical cognition.

Several cognitive diagnostic studies have used TIMSS data to explore students’ mathematical performance. Tatsuoka et al. used the rule space model (RSM) to sample and analyze the mathematical performance of Grade 8 students in the 1999 TIMSS assessment9. Dogan et al. used RSM to study the mathematical performance of Turkish students on TIMSS-R, using a Q-matrix containing 23 attributes to compare the distribution of attribute mastery between Turkish and American students10. The results showed that Turkish students had weaker mastery than American students of attributes such as P10 (quantitative reading), S4 (approximation/estimation), S6 (mode and relationship) and S10 (solving open problems)10. Eunkyoung et al. conducted a similar study comparing attribute mastery among students in South Korea, the Czech Republic and the United States11. Using the TIMSS-1999 dataset, Birenbaum et al. also compared the attribute mastery of students in the United States, Singapore and Israel12. Using a dataset from the same year, Chen conducted a CDA with three specific analyses: the calculation of classification rates, multiple regression analysis, and the comparison of attribute mastery probabilities across four test booklets4. While most prior research focused on cross-national comparisons of students’ knowledge mastery, it rarely provided individual-level analyses. Among the few studies that did, the analyses were limited to simple descriptive statistics, with no application of rigorous inferential statistical methods. This gap underscores the need for further research to provide deeper insights into individual-level cognitive diagnostic analyses.

TIMSS, as the largest international educational assessment, involves over 60 countries globally. Its purpose is to monitor achievement trends in mathematics and science among Grade 4 and 8 students from participating countries worldwide, as well as to assess curriculum implementation and identify promising teaching practices. It provides cross-national comparative data that informs educational policies. Regrettably, Mainland China, despite being the most populous nation, has not yet participated in this assessment. Therefore, by obtaining our own data through CDA, we aim to explore how Mainland Chinese students would perform under the TIMSS framework. We seek to identify differences between the results of China and those countries that perform exceptionally well and to further analyze the pathways and progressions in students’ learning.

Methods

Attribute

Tatsuoka regarded attributes as production rules, item types, procedural operations or more general cognitive tasks. In this study, cognitive attributes were defined as the cognitive processes necessary for students to complete the test items13. These attributes form an ordered classification, constructed according to the order of students’ cognitive development and starting from the cognition students engage in when completing a test task. As one of the two core content areas of TIMSS, cognition has received much attention in this field. TIMSS-2015 divided the assessment of the cognitive domain into three parts: knowing, applying, and reasoning. Based on these three dimensions of the TIMSS evaluation, seven specific attributes were formed (Table 1).

Table 1 Attribute definitions of cognitive process.

Q-matrix

The Q-matrix is a relational table that represents which attributes each test item examines, where 0 indicates that the attribute is not examined and 1 indicates that it is. In this study, a set of publicly available TIMSS-2015 test items was selected as the evaluation items, and twenty experts were selected as the item calibration expert group: eight doctoral students majoring in mathematics education, six front-line middle school mathematics teachers, two provincial-level distinguished teachers, and four university experts in mathematics education. They calibrated the attributes examined by the test items, for example, item 6:

figure a

the symmetric image of a shaded image with respect to a line.

The calibration result for this item is RM (Representation Modeling).

Twenty experts coded independently and the final calibration results are presented as follows (Table 2).

Table 2 Q-matrix of TIMSS cognitive process.

According to Table 2, attributes OS and GP were each examined by only three items, whereas every other attribute was examined by four or more items.

Samples

Since Mainland China has not participated in the TIMSS test, this study conducted an independent assessment involving 4,733 Grade 8 students from Gansu, Guizhou, Guangdong, and Shanghai. These regions were deliberately chosen to ensure the representativeness and diversity of the sample, reflecting the significant economic and educational disparities between China’s central-western and eastern regions. Gansu and Guizhou represent less economically developed areas in the northwest and southwest, respectively, while Guangdong and Shanghai were selected as examples of more economically advanced regions in the southeast and along the eastern coast.

The assessment utilized a set of 28 items from TIMSS-2015 to evaluate the mathematical cognitive processes of these students. Additionally, to provide a broader context and comparative insights, secondary data from TIMSS assessments of high-performing regions—Singapore, Japan, South Korea, Taiwan, and Hong Kong—were analyzed. This approach allowed for an in-depth examination of Mainland Chinese students’ performance in relation to international benchmarks.

Model selection

The key to a correct diagnosis is choosing an appropriate diagnostic model based on different cognitive assessment assumptions5,14,15. To select the best-fitting model, this study estimated seven commonly used models and selected the one with the best fit through model comparison. The seven models are as follows. The Deterministic Input, Noisy ‘And’ Gate (DINA) model16 assumes that, to answer an item correctly, a test-taker must master all the attributes required by the item; lacking even one attribute significantly lowers the probability of a correct answer, making this a non-compensatory cognitive diagnostic model. The Deterministic Input, Noisy ‘Or’ Gate (DINO) model17 assumes that mastering at least one of the attributes required by an item is enough for a high probability of a correct answer, so a mastered attribute can compensate for missing ones; it is therefore a compensatory cognitive diagnostic model. The reduced reparametrized unified model (RRUM)18 introduces a penalty parameter: if a test-taker has not mastered a specific attribute, the penalty parameter reduces their probability of correctly answering the item, making the model partially compensatory. The additive cognitive diagnostic model (ACDM)19 assumes that mastering each attribute increases the probability of answering the item correctly in a linear additive fashion; it exhibits partial compensatory effects, as attributes contribute additively to the probability of success. The log-linear CDM (LCDM)20 is a saturated cognitive diagnostic model that integrates the categorical latent variable approach of cognitive diagnostic models with item response theory (IRT), offering flexible and comprehensive modeling of item responses. The linear logistic model (LLM)21 is a linear logistic regression model with cognitive diagnostic capabilities, extending traditional linear models to accommodate cognitive diagnosis. Finally, the mixed model22 selects the optimal reduced model for each item by evaluating a combination of models, enabling a tailored approach to cognitive diagnosis.
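
To make the contrast between the conjunctive and disjunctive assumptions concrete, the following minimal sketch implements the standard DINA and DINO item response functions with guessing and slipping parameters. The function names, the example Q-matrix row and the parameter values are illustrative assumptions and are not taken from this study.

```python
from typing import Sequence


def dina_prob(alpha: Sequence[int], q_row: Sequence[int],
              guess: float, slip: float) -> float:
    """P(correct) under DINA: every attribute required by q_row must be mastered."""
    has_all = all(a == 1 for a, required in zip(alpha, q_row) if required == 1)
    return 1.0 - slip if has_all else guess


def dino_prob(alpha: Sequence[int], q_row: Sequence[int],
              guess: float, slip: float) -> float:
    """P(correct) under DINO: at least one required attribute must be mastered."""
    has_any = any(a == 1 for a, required in zip(alpha, q_row) if required == 1)
    return 1.0 - slip if has_any else guess


# Hypothetical item requiring attributes 1 and 2; the student masters 1 and 3.
q_row = [1, 1, 0]
alpha = [1, 0, 1]
p_dina = dina_prob(alpha, q_row, guess=0.2, slip=0.1)  # one required attribute missing
p_dino = dino_prob(alpha, q_row, guess=0.2, slip=0.1)  # one required attribute present
```

For this hypothetical student, the DINA probability falls back to the guessing parameter, whereas the DINO probability equals one minus the slipping parameter.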

The evaluation was implemented using the G-DINA package in R, and the results are shown in Table 3:

Table 3 Parameter statistics of different models of TIMSS cognitive process.

According to Table 3, the model comparison shows that the AIC and BIC of the mixed model are smaller than those of the other models, indicating that the mixed model fits the data in this study best.
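
As a reminder of how the two information criteria are computed, the sketch below applies the usual definitions AIC = -2 log L + 2k and BIC = -2 log L + k ln N. The candidate names, log-likelihoods and parameter counts are hypothetical placeholders and are not the values in Table 3; only the sample size of 4,733 comes from this study.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: -2 log L + 2k."""
    return -2.0 * log_likelihood + 2.0 * n_params


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: -2 log L + k ln(N)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


# Hypothetical (log-likelihood, parameter count) pairs, NOT values from Table 3.
candidates = {"model_A": (-1000.0, 10), "model_B": (-990.0, 30)}
N_STUDENTS = 4733  # sample size reported in this study

# The preferred model is the one with the smallest criterion value.
best_by_aic = min(candidates, key=lambda m: aic(*candidates[m]))
best_by_bic = min(candidates, key=lambda m: bic(*candidates[m], N_STUDENTS))
```

Both criteria penalize the number of parameters, and BIC does so more heavily for a sample of this size, so a more flexible model is preferred only if its likelihood gain outweighs the penalty.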

Inspection of analytical tools

Item fitting analysis

The fit between each item and the model is an important factor in cognitive diagnostic evaluation. Some studies have shown that the fit between the CDM and the test items directly determines the accuracy of the model’s diagnosis23. Item fit was assessed with the root mean square error of approximation (RMSEA), which measures the deviation between the model-predicted response probabilities and the response proportions in each latent class. The RMSEA of item j is computed as24:

$$\mathrm{RMSEA}_{j}=\sqrt{\sum_{k}\sum_{c}\pi\left(\theta_{c}\right)\left(P_{j}\left(\theta_{c}\right)-\frac{n_{jkc}}{N_{jc}}\right)^{2}}$$

\(\pi\left(\theta_{c}\right)\) represents the classification probability of latent class c, and \(P_{j}\) represents the probability estimated by the item response function. \(n_{jkc}\) refers to the expected number of respondents in latent class c who respond in category k on item j, and \(N_{jc}\) refers to the expected number of respondents in latent class c on item j.

The closer the RMSEA is to 0, the smaller the deviation and the better the fit. Oliveri and von Davier set the critical value of RMSEA to 0.1; an RMSEA > 0.1 indicates poor item fit25. By this standard, among the cognitive process items only Item 16 and Item 21 fit poorly (RMSEA > 0.1), and the fit of the remaining items is acceptable.
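
The item-level RMSEA can be sketched numerically as follows. This is a minimal illustration of the formula above and not the G-DINA implementation: it assumes that the predicted probabilities are indexed by response category and latent class, and that \(N_{jc}\) is obtained by summing \(n_{jkc}\) over categories. The function name and the toy numbers are hypothetical.

```python
import numpy as np


def item_rmsea(class_probs, predicted, expected_counts) -> float:
    """Item-level RMSEA for one item.

    class_probs:     shape (C,), pi(theta_c) for each latent class c
    predicted:       shape (K, C), model probability of category k in class c
    expected_counts: shape (K, C), expected counts n_jkc
    """
    pi = np.asarray(class_probs, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    n = np.asarray(expected_counts, dtype=float)
    class_totals = n.sum(axis=0)   # N_jc, assumed to be the sum over categories
    proportions = n / class_totals  # n_jkc / N_jc
    return float(np.sqrt(np.sum(pi * (pred - proportions) ** 2)))


# Toy dichotomous item with two latent classes (hypothetical numbers).
pi = [0.5, 0.5]
pred = [[0.8, 0.3],   # category 0
        [0.2, 0.7]]   # category 1
perfect = item_rmsea(pi, pred, [[80, 30], [20, 70]])
misfit = item_rmsea(pi, pred, [[70, 30], [30, 70]])
```

In the toy example the first set of counts reproduces the predicted probabilities exactly, while the second deviates by 0.1 in one latent class, which places the item on the 0.1 boundary.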

Absolute fitting analysis

The process of model comparison is essentially a relative model-fit evaluation. To comprehensively assess model performance, conducting an absolute fit analysis for each model is equally important. In the context of education evaluation, absolute fit analysis holds particular importance as it focuses on how well the model fits the observed response data independently, without comparing it to other models26. The absolute fit index adopted in this study is the limited-information root mean square error of approximation (RMSEA2)27, which is an absolute fit index commonly used in CDMs. RMSEA2 differs from the conventional RMSEA in that it uses only two kinds of moments: univariate and bivariate28. The formula is as follows:

$$\mathrm{RMSEA2}_{j}=\sqrt{\sum_{k}\sum_{c}\pi\left(\alpha_{c}\right)\left(P_{j}\left(\alpha_{c}\right)-\frac{\widehat{n}_{jkc}}{\widehat{N}_{jc}}\right)^{2}}$$

where k represents the attribute, c represents a latent class with the specific attribute combination \(\alpha_{c}\), \(\pi\left(\alpha_{c}\right)\) is the estimated proportion of latent class \(\alpha_{c}\), \(P_{j}\) is the estimated response function, \(\widehat{n}_{jkc}\) is the expected number of students in category k on item j possessing \(\alpha_{c}\), and \(\widehat{N}_{jc}\) is the expected number of students possessing \(\alpha_{c}\) on item j29. The mean RMSEA2 is the average of the RMSEA2 values of all items and represents the overall fit of the model. At present, there is no unified standard for RMSEA2 in CDMs. Some studies hold that, in multidimensional item response theory, RMSEA2 < 0.089 generally indicates adequate fit and RMSEA2 < 0.05 indicates good fit28. Hu et al. regard RMSEA2 < 0.05 as the standard for model fit in CDMs30. Using the G-DINA package in R, the RMSEA2 for the cognitive process attributes was 0.0299 < 0.05, so the mixed model shows good absolute fit to the data.

Reliability analysis

The reliability of CDA can be investigated from two perspectives: first, by treating the test as a traditional assessment and calculating the alpha coefficient under CTT; second, by assessing test-retest consistency. Templin et al. obtained a test-retest index by calculating the correlation between the attribute mastery probabilities of the same participants in two successive measurements, under the assumption that participants’ attribute mastery probabilities remain unchanged31. In this study, the CDA platform flexCDMs, developed by the team of Tu Dongbo, was used to evaluate reliability32. The data yielded α = 0.9079 > 0.7, indicating high reliability under CTT. The Templin reliability indices for the attributes across the three dimensions are: RR = 0.9857, CM = 0.9911, OS = 0.9826, RM = 0.9882, PI = 0.9923, AE = 0.9656, GP = 0.9541, Mean = 0.9799. The test-retest reliability of every TIMSS cognitive process attribute exceeds 0.9, so the test has high reliability.
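
For readers who wish to reproduce the CTT side of this check, the sketch below computes Cronbach’s alpha from a scored response matrix and recomputes the mean of the seven attribute-level indices reported above. The function name and the toy response matrix are illustrative assumptions; the last attribute is labeled GP here, following the attribute list in the figure captions, and the study itself used flexCDMs rather than this code.

```python
import numpy as np


def cronbach_alpha(scores) -> float:
    """Cronbach's alpha for a (respondents x items) matrix of item scores."""
    x = np.asarray(scores, dtype=float)
    n_items = x.shape[1]
    item_variances = x.var(axis=0, ddof=1).sum()   # sum of item variances
    total_variance = x.sum(axis=1).var(ddof=1)     # variance of total scores
    return n_items / (n_items - 1) * (1.0 - item_variances / total_variance)


# Toy matrix (hypothetical): two perfectly consistent items.
toy_alpha = cronbach_alpha([[0, 0], [1, 1]])

# Attribute-level test-retest indices as reported in the text.
attribute_reliability = {
    "RR": 0.9857, "CM": 0.9911, "OS": 0.9826, "RM": 0.9882,
    "PI": 0.9923, "AE": 0.9656, "GP": 0.9541,
}
mean_reliability = sum(attribute_reliability.values()) / len(attribute_reliability)
```

Rounded to four decimals, the recomputed mean agrees with the reported value of 0.9799.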

Results

Analysis of attribute mastery probability

The data from the four provinces (cities) were analyzed. For the international comparative analysis, the top five countries (regions) in TIMSS-2015 Grade 8 mathematics achievement, namely Singapore, South Korea, Taiwan, Hong Kong and Japan, were selected for comparison with Chinese mainland.

The data from the four provinces and cities in Chinese mainland were evaluated, and the results of the statistical analysis are shown in Fig. 1.

Fig. 1

Line chart of attribute mastery in the mathematical cognitive process of Grade 8 students in four provinces (cities) of China. Recollection and recognition (RR), Calculation and measurement (CM), Operation and solution (OS), Representation modeling (RM), Process implementation (PI), Analysis and evaluation (AE), Generalization and Proof (GP).

Figure 1 illustrates the mastery of each attribute of the mathematical cognitive process among Grade 8 students in the four provinces (cities) and overall. On the whole, students’ mastery of the cognitive process attributes is relatively balanced, at about 70%. By comparison, mastery of Operation and Solution is low, only slightly above 60%, while mastery of Recollection and Recognition, Calculation and Measurement, and Generalization and Proof is good, above 75%.

Comparing provinces (cities), students in Shanghai perform significantly better on all seven cognitive process attributes than those in the other provinces, with mastery generally above 85% and even above 90% for Recollection and Recognition and for Generalization and Proof. Only their mastery of Operation and Solution is slightly lower, at 83.2%. The results for Gansu, Guizhou and Guangdong are broadly consistent with one another and lower than the overall level; only Gansu’s Recollection and Recognition and Representation Modeling attributes are higher than the overall level. Students in Guangdong have the lowest mastery probabilities for Recollection and Recognition, Calculation and Measurement, and Generalization and Proof, and students in Guizhou have the lowest mastery probability for Analysis and Evaluation. Students in Guizhou, Gansu and Guangdong have almost the same mastery of Operation and Solution, at only about 61%. Operation and Solution and Analysis and Evaluation show low mastery probabilities in all provinces and cities.

Building on the analysis of the mainland China data, a further evaluation compared mainland China with the top five countries (regions). The results, presented in Fig. 2, were obtained through statistical analysis and centering of the data.

Fig. 2

Standardized distribution of eighth-grade mathematics cognitive process attribute mastery in different countries or regions. Recollection and recognition (RR), Calculation and measurement (CM), Operation and solution (OS), Representation modeling (RM), Process implementation (PI), Analysis and evaluation (AE), Generalization and Proof (GP).
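
The centering behind a figure of this kind might be sketched as follows. The exact transformation is not specified in the text, so this sketch assumes a column-wise z-score: each attribute is centered on its mean across countries or regions and divided by its standard deviation. The function name and the input values are hypothetical.

```python
import numpy as np


def standardize_columns(values) -> np.ndarray:
    """Z-score each column (attribute) across rows (countries or regions)."""
    x = np.asarray(values, dtype=float)
    return (x - x.mean(axis=0)) / x.std(axis=0, ddof=0)


# Hypothetical mastery probabilities: rows are regions, columns are attributes.
mastery = [[0.60, 0.70],
           [0.80, 0.90],
           [0.70, 0.50]]
z = standardize_columns(mastery)
```

After this transformation, a positive value means that a region is above the cross-region average on that attribute, and a negative value means it is below.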

According to Fig. 2, Chinese mainland shows a high level of mastery on all seven cognitive process attributes. Apart from Process Implementation and Analysis and Evaluation, Chinese mainland performs best on the other five attributes, with the highest values among the six countries or regions. Singapore has the advantage in Process Implementation. Japan has some advantage in Representation Modeling, where its value is second only to that of Chinese mainland, but its other six attributes are below average. South Korea is slightly above average on Representation Modeling and Process Implementation, but its other attributes are below average, and its Analysis and Evaluation value is the lowest among the six countries or regions. In Taiwan, Representation Modeling is far lower than in the other countries or regions, while the other six attributes are close to the average and relatively balanced. Hong Kong is above average on Recollection and Recognition, Operation and Solution, and Generalization and Proof, especially Recollection and Recognition. However, its performance on Calculation and Measurement, Process Implementation, and Analysis and Evaluation is weak, and its Calculation and Measurement value is the lowest among the six countries or regions.

Overall, students in Chinese mainland show clear superiority in mastering the cognitive process attributes, ranking first on almost all of them. Only on Process Implementation and Analysis and Evaluation do they rank second, narrowly behind Singapore. Their advantages are most pronounced in Calculation and Measurement, Operation and Solution, and Representation Modeling, where they far exceed students from the other five countries or regions.

Analysis of advanced learning

Based on students’ response data, the CDM estimates each student’s mastery of the different attributes and classifies each attribute as mastered (marked 1) or not mastered (marked 0). Each student’s mastery of the attributes therefore forms a multidimensional vector of 0s and 1s, usually called a knowledge state5. Then, using the three-parameter logistic (3PL) IRT model and the mirt package in R, each student’s ability (θ) under item response theory is calculated33. Students are grouped into classes by knowledge state, and the mean ability of all students in a class is taken as the ability value of that knowledge state. The clustering results and ability values are shown in Table 4:

Table 4 Cognitive process attribute values of TIMSS.
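
The two steps that produce a table of this kind, converting mastery probabilities into a knowledge state and averaging ability within each state, can be sketched as follows. The 0.5 cutoff, the function names and the toy ability values are assumptions made for illustration; the study obtained these quantities from the G-DINA and mirt packages in R.

```python
from collections import defaultdict
from typing import Dict, Sequence


def knowledge_state(mastery_probs: Sequence[float], cutoff: float = 0.5) -> str:
    """Dichotomize attribute mastery probabilities into a 0/1 string."""
    return "".join("1" if p >= cutoff else "0" for p in mastery_probs)


def mean_ability_by_state(states: Sequence[str],
                          thetas: Sequence[float]) -> Dict[str, float]:
    """Average the IRT ability of all students sharing a knowledge state."""
    groups = defaultdict(list)
    for state, theta in zip(states, thetas):
        groups[state].append(theta)
    return {state: sum(vals) / len(vals) for state, vals in groups.items()}


# Hypothetical students: mastery probabilities for seven attributes and theta.
state_a = knowledge_state([0.6, 0.6, 0.1, 0.1, 0.1, 0.4, 0.4])
ability = mean_ability_by_state(
    ["1100000", "1100000", "0000000"], [0.2, 0.4, -1.0]
)
```

Each distinct knowledge state then carries a single ability value, which is what allows the states to be ordered along the ability scale.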

As shown in Table 4, the ability value corresponding to the knowledge state (0000000) is the smallest, at -1.42, and the ability value corresponding to the knowledge state (1111111) is the largest, at 0.56. The ability range from -1.5 to 0.6 was therefore divided into five levels of width 0.42 each, yielding the learning progress chart shown in Fig. 3:

Fig. 3

Advanced learning diagram of cognitive process attributes.
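
The division of the ability range into five equal levels can be written out as a small helper. The range of -1.5 to 0.6 and the level width of 0.42 come from the text; the function name and the convention that a value on a boundary belongs to the higher level are assumptions.

```python
def ability_level(theta: float, lower: float = -1.5, upper: float = 0.6,
                  n_levels: int = 5) -> int:
    """Map an ability value to a level from 1 to n_levels using equal-width bins."""
    if not lower <= theta <= upper:
        raise ValueError("theta lies outside the binned ability range")
    width = (upper - lower) / n_levels  # (0.6 - (-1.5)) / 5 = 0.42
    # The top edge is folded into the highest level by the min().
    return min(n_levels, int((theta - lower) / width) + 1)


lowest_state_level = ability_level(-1.42)   # ability of the all-zero knowledge state
highest_state_level = ability_level(0.56)   # ability of the all-one knowledge state
```

With these bins, the knowledge state with the smallest ability value falls in the first level and the one with the largest ability value falls in the fifth.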

As illustrated in the learning path diagram in Fig. 3, the majority of knowledge states fall in the third level, with some distributed across the second and fourth levels, and only one knowledge state in each of the first and fifth levels. This distribution indicates that students did not progress in a linear or equidistant manner in their development of cognitive processes. Instead, there appear to be periods of significant leaps, particularly in the initial and final stages of learning. Using the same methodology for constructing advanced learning paths based on foundational knowledge and skills, the attributes associated with knowledge states at various ability levels were extracted to define advanced learning. The steps to derive advanced cognitive dimensions are summarized in Table 5 below.

Table 5 Classification of learning advanced level in TIMSS cognitive process attribute.

The cognitive process of students is related to several elements including their interest in learning and their own learning characteristics, the logical structure of subject content and teachers’ teaching. Therefore, the learning path reflected by the above data is the result of comprehensive factors, which can provide some help for guiding students’ learning and teachers’ teaching35.

Personalized analysis

In the personalized analysis, this study selected four students, numbered GZBS051, GSLS252, GDZD451 and SHYH026, as research participants (referred to here as Jack, Sarah, John and Lucas, respectively). They share the same total score under traditional test theory, but their knowledge states are (0100001), (1001010), (1100001) and (1100000), respectively. Jack and Lucas have mastered only two attributes, while Sarah and John have mastered three. The types of attributes mastered also differ greatly, as shown in Fig. 4:

Fig. 4

Comparative analysis of four students’ mastery of dimension attributes of cognitive process with the same total score. GZBS051: Jack, GSLS252: Sarah, GDZD451: John, SHYH026:Lucas.
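
The four knowledge states can be decoded into attribute names with a few lines of code. The sketch assumes that the positions in each state follow the attribute order RR, CM, OS, RM, PI, AE, GP used in the figure captions; the function name is hypothetical.

```python
from typing import List

# Assumed attribute order, following the figure captions.
ATTRIBUTES = ("RR", "CM", "OS", "RM", "PI", "AE", "GP")


def mastered_attributes(state: str) -> List[str]:
    """Return the names of the attributes marked 1 in a knowledge state."""
    if len(state) != len(ATTRIBUTES):
        raise ValueError("knowledge state must have one digit per attribute")
    return [name for name, bit in zip(ATTRIBUTES, state) if bit == "1"]


# Knowledge states of the four students as reported in the text.
students = {
    "Jack": "0100001",
    "Sarah": "1001010",
    "John": "1100001",
    "Lucas": "1100000",
}
profiles = {name: mastered_attributes(state) for name, state in students.items()}
```

Under this ordering, the decoded profiles show that students with the same total score master different numbers and types of attributes.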

As Fig. 4 shows, Jack’s mastery probabilities for Calculation and Measurement and for Generalization and Proof are about 0.7. Although these probabilities are high, the two attributes are not fully mastered and need to be strengthened. According to his knowledge state, Jack has not mastered Recollection and Recognition, but he has a certain probability of mastery and thus some foundation. Sarah has high mastery probabilities, above 0.8, for Recollection and Recognition, Representation Modeling, and Analysis and Evaluation, while her mastery probabilities for the other attributes are close to 0. John has high mastery probabilities, above 0.9, for Calculation and Measurement and Generalization and Proof, but his mastery probability for Recollection and Recognition is only slightly higher than 0.5, indicating partial mastery; his mastery probabilities for the other attributes are very low and can be regarded as non-mastery. Lucas’s mastery probabilities for Recollection and Recognition and for Calculation and Measurement are about 0.6. In terms of knowledge state, Lucas has mastered these attributes, but they need to be further strengthened. For Analysis and Evaluation and for Generalization and Proof, Lucas’s mastery probability is about 0.4, which means he has some foundation in these attributes but has not yet reached the threshold for mastery.

Discussion

This study collected data from 4,733 Grade 8 students across four provinces (cities) in Mainland China and utilized CDA to analyze their performance across seven cognitive process attributes. Additionally, the top five countries in mathematics achievement from TIMSS-2015 for Grade 8 were selected for comparison, allowing for a detailed evaluation of Mainland China’s results against these high-performing countries. The analysis thoroughly explored the data from three perspectives: attribute mastery, advanced learning progression, and personalized learning analysis. These findings serve as a pre-test for Mainland China’s participation in TIMSS and provide a more standardized research framework for applying cognitive diagnostic methodologies.

First, the findings indicate that Grade 8 students in Mainland China demonstrate a significant advantage in mastering mathematical cognitive processes, particularly in Calculation and Measurement (CM), Operation and Solution (OS), and Representation Modeling (RM). These traditional domains are critical for solving routine problems, achieving procedural fluency, and applying mathematical concepts in structured scenarios. This reflects the effectiveness of the current curriculum and instructional practices in mainland China, which emphasize fundamental mathematical skills and applications. Such strengths suggest that students are well-prepared for standardized tests and structured problem-solving, showcasing the success of a system designed to ensure proficiency in foundational mathematics. However, while Mainland Chinese students demonstrate strong procedural fluency, the curriculum may need to integrate more inquiry-based and exploratory learning to foster higher-order thinking skills, such as critical reasoning and innovative problem-solving36,37,38. These findings underscore the need for a balanced educational approach that combines mastery of fundamental skills with opportunities for advanced mathematical reasoning and creativity.

Moreover, the results suggest that students’ cognitive development in mathematics is not uniform or steady but rather characterized by periods of accelerated progress, particularly during the early and advanced stages of learning. This irregular progression highlights the complexity of cognitive development39,40,41, which is influenced by multiple factors such as students’ intrinsic interest in learning42,43, individual learning characteristics43, the logical structure of the subject content45, and the effectiveness of teaching methods46. These insights emphasize the importance of tailoring instruction to support diverse learning trajectories. Educators can design targeted interventions and scaffolded learning experiences to address varying rates of cognitive development, while curriculum developers can align educational content with students’ natural learning paths, fostering more effective and personalized learning environments.

Finally, the personalized analysis of students’ mastery probabilities, as illustrated in Fig. 4, highlights the diverse strengths and weaknesses among individual learners, underscoring the importance of personalized instruction47,48,49. For instance, Jack and Lucas require targeted support to strengthen their foundational skills in Calculation and Measurement and Representation Modeling, while Sarah and John show varying degrees of proficiency across attributes that could benefit from focused teaching strategies. By leveraging data-driven insights, educators can design adaptive learning pathways that provide feedback-driven support, enabling students to build on their strengths while addressing areas of weakness. Such approaches not only enhance individual learning outcomes but also inform broader curriculum development aimed at equitable educational opportunities.

CDA has gained prominence among educational researchers for its ability to enrich traditional evaluation methods by providing detailed diagnostic information. Nichols highlighted that CDA offers educators and decision-makers insights into students’ problem-solving strategies, conceptual understanding, and mastery of domain-specific principles50. Previous studies applying CDA to TIMSS data have demonstrated its potential to uncover nuanced differences in mathematics performance across nations. For example, studies comparing South Korean and American students have revealed disparities in problem-solving and reasoning, as well as the impact of teacher guidance on student learning51,52. However, many of these studies faced limitations in attribute construction and the depth of data analysis. This study addresses these gaps by improving attribute construction and fully leveraging the diagnostic potential of the data, providing a more comprehensive understanding of students’ mathematical cognitive processes.

Conclusion

With the application of cognitive diagnostic theory and the adaptation of the TIMSS test, this study developed a cognitive diagnostic tool to analyze students’ mathematical cognitive processes. Through an in-depth analysis of data from Mainland China, meaningful conclusions were drawn, providing insights into students’ mathematical learning. Moreover, international comparisons highlighted both the strengths and weaknesses of mathematics learning among Mainland Chinese students. Importantly, the knowledge states derived from the CDA allowed for the construction of student learning paths and progressions. While these results are theoretically sound and offer reasonable explanations for longstanding educational concerns, they remain data-driven findings. Further empirical validation, particularly through practical testing or longitudinal assessment, is necessary to confirm their alignment with real-world educational contexts.