Methods for measuring career readiness of high school students: based on multidimensional item response theory and text mining

Wang, Peng; Zheng, Yuanxin; Zhang, Mingzhu; Yin, Kexin; Geng, Fei; Zheng, Fangxiao; Ma, Junchi; Wu, Xiaojie

doi:10.1057/s41599-024-03436-0


Article
Open access
Published: 16 July 2024


Humanities and Social Sciences Communications volume 11, Article number: 922 (2024)



Abstract

In contemporary society, career readiness is of paramount importance in individual life, directly influencing initial employment, job satisfaction, and the sense of career identity. Drawing on multidimensional item response theory and text mining, this study explores methods for assessing high school students’ career readiness by revising the “Career Readiness Questionnaire – Adolescent Version” and applying text mining techniques. In Study 1, valid responses from 1261 participants were collected through cluster sampling. Using the Bayesian multivariate item response theory (BMIRT) parameter estimation program and the R language, the career readiness measurement tool was revised, yielding a concise scale that meets psychometric requirements. The findings indicated that career readiness is better described by the multidimensional graded response model than by the bifactor model. For the original 36 items, discrimination parameters fell within [1.59, 3.84], difficulty parameters within [−2.91, 2.24], and the peaks of the maximum information functions within [0.24, 2.35]. After the six items with the lowest peaks (Items 4, 5, 6, 31, 32, and 33) were removed, the remaining 30 items formed the concise Chinese version of the “Career Readiness Questionnaire – Adolescent Version,” with discrimination parameters within [1.45, 3.38], difficulty parameters within [−3.31, 1.76], and maximum information function peaks within [0.50, 2.64]. Building on the valid participants from Study 1, Study 2 matched questionnaire data with essay texts, yielding 1012 valid participants. Using text mining, machine learning models were constructed to predict high school students’ career readiness from their essays. The results of Study 2 showed that the revised lexicon extracted features more accurately.
Building on this, the machine learning models for essay texts performed very well in predicting career readiness, with random forest outperforming the other algorithms. This study offers schools and parents a novel way to understand the career readiness of high school students and provides a convenient and effective tool for educational activities related to students’ career development.


Introduction

In today’s competitive society, parents hope that their children will excel both academically and professionally. Career success is crucial for individuals and is closely related to many facets of life (Gao and Ding, 2022; Zhan, 2020).

Brown and Lent (2012) characterized career readiness as the diverse attitudes, behaviours, and abilities required to master occupational tasks, transitions, and challenges. Patton and Skorikov (2007) conceptualized career readiness as a component of the career identity commitment process, emphasizing its critical role in late adolescence and in the transition to adult careers. Career readiness is correlated with various factors that influence professional success (Salomon et al. 2024). For instance, Lau et al. (2020) found a direct correlation between a positive self-concept and job preparation skills. A longitudinal study by Liu (2016) revealed that the dimensions of career readiness, namely readiness in abilities, mindset, and behaviour, significantly predicted the initial employment quality of vocational school graduates. Furthermore, Gysbers (2013) argued that students who are well prepared for their careers show more active engagement, greater psychological resilience, and a firm commitment to a self-defined professional future, which adds meaning and purpose to their lives. Most previous research on career readiness has focused on college students or other students about to enter the workforce, such as nursing students (He et al. 2021), college athletes (August, 2020), students in specialized art schools (Li and Qin, 2021), and students with special needs (Binghashayan et al. 2022; Lombardi et al. 2023). High school students, however, have been largely excluded from career readiness research. They stand at the threshold of adult society and need adequate preparation to secure future career success (Super, 1980; Castellano et al. 2017; Kenny et al. 2023). Consequently, high school students’ level of career readiness is considered a pivotal factor in how smoothly they adapt to adult society (Crespo et al. 2013; Hirschi et al. 2009).

Item response theory (IRT) is a modern measurement theory that emerged after classical test theory (CTT) and is widely applied in psychological and educational measurement. Compared with CTT, IRT has distinct advantages in data handling and item-level analysis (Embretson, 1996; Hirsch et al. 2023). Unlike CTT, IRT assumes that an individual’s performance on a specific item is determined by his or her standing on a latent trait. IRT uses item characteristic curves (ICCs) to depict the probability of a given response to an item as a function of that latent trait. Traditional IRT models, however, are unidimensional: they assume that performance is influenced by only one latent ability. In practice, performance is often affected by multiple latent abilities (Reckase, 2006; Kang and Xin, 2010; Alam et al. 2023), which led to the development of multidimensional item response theory (MIRT). MIRT integrates factor analysis and unidimensional IRT, using nonlinear functions to relate item parameters and respondents’ multidimensional latent traits to their response patterns. Compared with unidimensional IRT, MIRT handles multidimensional data better and improves measurement accuracy and precision by taking the correlations between dimensions into account. This advantage is particularly pronounced when a test has few items and therefore few items per dimension (Wang et al. 2006; Quansah et al. 2024).
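To make the item characteristic curve concrete for the 5-point items used in this study, the sketch below computes category response probabilities under Samejima’s graded response model, the unidimensional building block of the MGRM. It assumes the logistic form without the scaling constant D, and the parameter values are hypothetical, not estimates from this study. When each item loads on a single dimension, as in the MGRM adopted below, θ is simply the trait of that item’s own dimension.

```python
import numpy as np

def grm_category_probs(theta, a, b):
    """Category probabilities for one graded-response item.

    theta : latent trait value(s)
    a     : discrimination (slope)
    b     : increasing thresholds b_1 < ... < b_{K-1} for K categories
    Returns an array of shape (len(theta), K).
    """
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    b = np.asarray(b, dtype=float)
    # Cumulative curves P*(X >= k | theta), k = 1..K-1 (logistic, no D constant)
    p_star = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b[None, :])))
    # Pad with P*(X >= 0) = 1 and P*(X >= K) = 0
    ones = np.ones((theta.size, 1))
    zeros = np.zeros((theta.size, 1))
    p_star = np.hstack([ones, p_star, zeros])
    # Each category probability is the gap between adjacent cumulative curves
    return p_star[:, :-1] - p_star[:, 1:]

# A hypothetical 5-point Likert item (4 thresholds)
probs = grm_category_probs([-2.0, 0.0, 2.0], a=2.0, b=[-2.0, -0.5, 0.5, 2.0])
print(probs.round(3))
print(probs.sum(axis=1))  # each row sums to 1 (up to rounding)
```

Plotting each column of `probs` against a fine grid of θ values gives the category characteristic curves of the item.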

In the development and revision of measurement scales, the successful application of MIRT has effectively addressed the limitations of applying unidimensional models to multidimensional data, providing a powerful tool for more efficient measurement (Jiménez et al. 2023; Sepehrinia et al. 2024). Dodeen and Al-Darmaki (2016) revised the UAE National Marital Satisfaction Scale, and the results of the shortened scale were very close to those of the original. Zang et al. (2012) used IRT to revise the Parent–Youth Companion Attachment Scale; the new scale showed significantly higher test information function peaks and greater reliability and effectively assessed the attachment status of Miao ethnic junior high school students in China. Ma et al. (2023) revised the Multidimensional-Multi-Attribution Causality Scale based on MIRT, and the revised scale exhibited improved psychometric properties. Sarah et al. (2018) showed that the results of the multidimensional graded response model supported the scoring of the two subscales of the Cushing QoL questionnaire, helping to improve scale quality in the health sciences. Sukhawaha et al. (2016) used MIRT and exploratory factor analysis to develop a 35-item scale with 4 factors (stressors, pessimism, suicide, and depression) and validated it through confirmatory MIRT analysis of 450 adolescents; the scale effectively distinguished individuals who had attempted suicide from those without suicidal tendencies. These studies affirm the effectiveness of IRT-based methods in the development of psychological scales.

In the era of big data, online spaces have accumulated rich data on individual behaviour and expression, opening new opportunities for psychological science (Idris and Ken, 2018; Yue et al. 2013; Raghavendra et al. 2024). With the prevalence of social media and online learning platforms, researchers can collect more open and naturalistic text data to study individual behaviours and psychological traits (Feldman and Sanger, 2008; Morrow, 2024; Macanovic and Przepiorka, 2024). Using dictionaries such as Linguistic Inquiry and Word Count (LIWC), researchers can extract lexical features from text posted by Facebook users to predict their Big Five personality traits (Markovikj et al. 2013; Tay et al. 2020). In China, researchers have drawn on the large user populations of online social media platforms such as Weibo and Zhihu, using the Chinese version of LIWC to extract lexical features from user-generated content and examine users’ psychological states (Wang et al. 2016). They have also explored the thematic characteristics of public mental health topics on online platforms and analysed public anxiety and depression (Cheng et al. 2017; Mi et al. 2021). In addition, researchers have used texts such as essays to predict personality traits and other psychological characteristics. Hirsh and Peterson (2009) studied the relationship between word usage and the Big Five personality traits using participants’ self-narratives. Zhang et al. (2017) drew on the life design paradigm to design an essay task titled “My Career Story” to predict college students’ career adaptability. Luo et al. (2021) modelled the shyness traits and shy language styles of primary school students using their essays, diaries, and comments. Overall, the integration of text mining and psychological research is deepening, and its scope continues to expand.
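Dictionary-based tools such as LIWC essentially count how often words from each predefined category occur in a text. The sketch below illustrates that idea as the share of tokens falling in each category. The two-category lexicon is a hypothetical toy, not LIWC itself or the revised lexicon used in Study 2, and the function assumes the text has already been split into words (Chinese essays would first need word segmentation, for example with a segmenter such as jieba).

```python
from collections import Counter

def lexicon_features(tokens, lexicon):
    """Share of tokens in each lexicon category (LIWC-style word counting).

    tokens  : list of words from one text (already segmented)
    lexicon : dict mapping category name -> set of words
    """
    n = len(tokens)
    counts = Counter(tokens)
    features = {}
    for category, words in lexicon.items():
        hits = sum(counts[w] for w in words)  # Counter returns 0 for absent words
        features[category] = hits / n if n else 0.0
    return features

# Hypothetical toy lexicon; real dictionaries have far larger categories
toy_lexicon = {
    "future": {"will", "plan", "goal"},
    "social": {"friend", "family", "teacher"},
}
essay = "my goal is to plan my career with help from my family and teacher".split()
print(lexicon_features(essay, toy_lexicon))
```

Feature vectors of this kind, one per text, are what downstream prediction models are trained on.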

Previous research indicates that career readiness scales have predominantly been developed or revised within the CTT framework. For instance, Lei et al. (2000) investigated the psychological state of career readiness and its influencing factors among university students in Chongqing. Bai (2021) analysed the current state of career readiness among maritime graduates. Vanessa et al. (2022) sought to enhance students’ career readiness through career guidance, while Tian (2022) studied the career attitudes of new students in rural high schools in Hangzhou. The use of IRT to develop or refine career readiness scales remains relatively unexplored. This study therefore investigated assessment tools for high school students’ career readiness using MIRT and text mining. By using MIRT to improve the quality of the measurement tool and integrating text mining, it sought to identify the optimal machine learning model for automated prediction of career readiness. The ultimate goal is to make career readiness measurement more convenient and effective.

Study 1: Revising the ‘Career Readiness Questionnaire – Adolescent Version’ using multidimensional item response theory

Method

Participants

Using randomized cluster sampling, we selected second-year high school students from 37 classes in one province, a total of 1585 individuals. After obtaining approval from the school and informed consent from the students and their guardians, we administered the assessments online to each class as a group. A total of 324 participants were excluded for invalid responses (questionnaires in which 99% of the answers were the same option), leaving 1261 valid participants (M_age = 16.23, SD_age = 0.69). Of these, 607 were male (48.1%) and 654 were female (51.9%).
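The exclusion rule above is a straight-lining screen. A minimal sketch follows, assuming the rule is applied to each respondent’s full response vector (the text does not state which items it covered). Note that for a 36-item block a 99% threshold is met only when every answer is identical, because 35/36 ≈ 0.97.

```python
from collections import Counter

def is_straightliner(responses, threshold=0.99):
    """Flag a respondent whose most frequent answer makes up >= threshold of all answers."""
    if not responses:
        return False
    most_common_count = Counter(responses).most_common(1)[0][1]
    return most_common_count / len(responses) >= threshold

def screen(respondents, threshold=0.99):
    """Split a dict {respondent_id: responses} into valid and excluded ids."""
    valid, excluded = [], []
    for rid, resp in respondents.items():
        (excluded if is_straightliner(resp, threshold) else valid).append(rid)
    return valid, excluded

print(is_straightliner([3] * 36), is_straightliner([3] * 35 + [4]))  # True False
```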

Instruments

Career Readiness Scale-Adolescent Version (CRS-A)

Marciniak et al. (2020) developed the CRS-A, which comprises 36 items organized into 12 dimensions: occupational expertise (OE), labour market knowledge (LMK), soft skills (SS), career involvement (CI), career confidence (CCo), career clarity (CCl), social support from school (SS-S), social support from family (SS-Fa), social support from friends (SS-Fr), networking (Nw), career exploration (CE), and self-exploration (SE). Each dimension has 3 items, and no items are reverse-scored. Items are rated on a 5-point Likert scale ranging from 1 (strongly disagree) to 5 (strongly agree).

Occupational Identity Scale (OIS)

The OIS, developed by Mauer and Gysbers (1990), consists of 18 items, none of which are reverse-scored, rated on a 5-point scale (1 = strongly disagree, 5 = strongly agree). An example item is “Deciding on a career is a long-term and challenging issue for me.” In this study, Cronbach’s α for the OIS was 0.92.

Career Decision Self-Efficacy Scale (CDSES)

The career decision self-efficacy subscale of the CDSES, developed by Lent (2013), comprises 8 items, none of which are reverse-scored, rated on a 10-point scale (1 = not confident at all, 10 = extremely confident). An example item is “How confident are you in your ability to find a career that suits your personality?” In this study, Cronbach’s α for this subscale was 0.96.
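The internal-consistency values reported for the OIS and the CDSES subscale are Cronbach’s α coefficients. As a reminder of what that statistic computes, the sketch below implements the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of the total score), using sample variances throughout; the toy data are hypothetical.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_respondents, k_items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)      # sample variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)  # sample variance of the summed score
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)

# Toy data: four respondents answering two items
print(round(cronbach_alpha([[1, 2], [2, 1], [3, 4], [4, 3]]), 3))  # 0.75
```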

Research process

First, two bilingual translators translated the original scale into Chinese and performed multiple rounds of back-translation. Four psychology professors, together with psychology master’s students, then evaluated and revised the content. This produced an equivalent Chinese measurement tool with the same number of items and the same scoring method as the original scale. The initial scale was then administered, and the data were randomly split into two halves. One half (Dataset A) was used for MIRT analysis: a multidimensional graded response model and a bifactor model were constructed and compared, and items with inappropriate discrimination, difficulty, or item information functions (IIFs) were deleted. The other half (Dataset B) was used for validation: the model was re-estimated on the retained items, and their discrimination, difficulty, and IIFs were re-evaluated. This yielded the formal scale. Finally, criterion-related validity analysis was conducted to further examine the validity of the revised scale (Yao, 2003).
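The random split into Dataset A and Dataset B can be sketched as a seeded permutation of respondent IDs. The seed and the convention of giving the extra respondent to the second half are assumptions for illustration; the exact sizes of the two halves are not reported here.

```python
import numpy as np

def split_half(ids, seed=2024):
    """Randomly split respondent ids into two halves (hypothetical Dataset A / Dataset B)."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.asarray(ids))
    cut = len(shuffled) // 2
    return shuffled[:cut], shuffled[cut:]

dataset_a, dataset_b = split_half(np.arange(1261))
print(len(dataset_a), len(dataset_b))  # 630 631
```

Fixing the seed makes the split reproducible, so the calibration and validation halves can be regenerated exactly.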

Analytical approach

Data were managed and analysed in SPSS 26.0. MIRT analysis was performed with the Bayesian multivariate item response theory (BMIRT) program developed by Yao (2003), and the “mirt” package in R was used to estimate the ICCs and IIFs (Chalmers, 2012).

Results

Parameter Estimation Results Based on Dataset A

Comparison between the Multidimensional Graded Response Model (MGRM) and bifactor model

Given the dimensional structure of the original scale and the need to compare the two models, this study used information criteria to assess model fit. We fitted the MGRM and the bifactor model and compared them using the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). In general, the model with the smallest AIC is preferred (Cheng et al. 2015); the BIC penalizes additional parameters more heavily than the AIC does. The MGRM yielded an AIC of 43658.03 and a BIC of 79428.02, whereas the bifactor model yielded an AIC of 45303.01 and a BIC of 84030.02. Because both the AIC and the BIC were lower for the MGRM, it was selected for further in-depth analysis.
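The comparison above rests on the standard definitions AIC = 2k − 2 ln L and BIC = k ln N − 2 ln L, where k is the number of free parameters, L the maximized likelihood, and N the sample size. The BIC penalty per parameter (ln N) exceeds the AIC penalty (2) whenever N > e² ≈ 7.4, which is why the BIC is the stricter criterion. The sketch below encodes those definitions and a simple rule that prefers a model only when it is lower on every criterion; the helper `prefer` is a hypothetical illustration, and the input values are those reported in the text.

```python
import math

def aic(log_lik, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_lik

def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: k ln N - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_lik

def prefer(criteria_a, criteria_b):
    """Return 'A' or 'B' if one model is lower on every criterion, else None."""
    if all(x < y for x, y in zip(criteria_a, criteria_b)):
        return "A"
    if all(y < x for x, y in zip(criteria_a, criteria_b)):
        return "B"
    return None

# (AIC, BIC) as reported for each model
mgrm = (43658.03, 79428.02)
bifactor = (45303.01, 84030.02)
print(prefer(mgrm, bifactor))  # A (the MGRM)
```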

MGRM parameter estimation results for Dataset A

(1) Item Difficulty and Item Discrimination

The comparison of model fit indicated that the MGRM outperformed the bifactor model. In this MGRM, each item’s response is assumed to be influenced by only one ability, so each item has a single discrimination parameter on its corresponding dimension. In BMIRT, a Q-matrix specifies which dimension each item measures, which guides the estimation of the model parameters (Tu et al. 2011).
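A between-item Q-matrix has one row per item and one column per dimension, with a single 1 in each row. The sketch below builds it for the 36-item CRS-A and then drops the six removed items. It assumes that items are ordered in consecutive blocks of three, following the dimension order listed under Instruments. The paper does not state the item order, but this assumption is consistent with the removed items (4 to 6 and 31 to 33 map to LMK and CE, the two dimensions absent from the later 10-dimension results) and with the renumbering reported for Dataset B (13 to 10, 10 to 7, 25 to 22).

```python
import numpy as np

DIMENSIONS = ["OE", "LMK", "SS", "CI", "CCo", "CCl",
              "SS-S", "SS-Fa", "SS-Fr", "Nw", "CE", "SE"]

def build_q_matrix(n_dims, items_per_dim):
    """Between-item Q-matrix: row i has a single 1 in the column of the dimension item i measures."""
    n_items = n_dims * items_per_dim
    q = np.zeros((n_items, n_dims), dtype=int)
    for i in range(n_items):
        q[i, i // items_per_dim] = 1  # assumes consecutive blocks of items per dimension
    return q

q_full = build_q_matrix(len(DIMENSIONS), 3)  # 36 x 12

# Drop the six removed items (1-based item numbers from the text)
removed = [4, 5, 6, 31, 32, 33]
keep = [i for i in range(36) if i + 1 not in removed]
q_short = q_full[keep]

# Drop dimensions that no longer have any items
dims_kept = [d for d, col in zip(DIMENSIONS, q_short.T) if col.sum() > 0]
q_short = q_short[:, q_short.sum(axis=0) > 0]

# Map original 1-based item numbers to their new 1-based numbers
renumber = {orig + 1: new + 1 for new, orig in enumerate(keep)}

print(q_short.shape)               # (30, 10)
print(dims_kept)                   # ['OE', 'SS', 'CI', 'CCo', 'CCl', 'SS-S', 'SS-Fa', 'SS-Fr', 'Nw', 'SE']
print(renumber[13], renumber[25])  # 10 22
```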

According to IRT, excessively low discrimination means that an item differentiates poorly between participants of different ability levels, whereas overly high discrimination can adversely affect the overall test results. Therefore, following the recommendations of Yang et al. (2008), items with a discrimination (a) of a ≤ 0.3 or a ≥ 4 were to be removed. For difficulty (b), following Zan et al. (2008), items with b ≤ −4 or b ≥ 4 were to be removed. Table 1 presents the parameter estimates for all items. All discrimination estimates fell within [1.59, 3.84], and all difficulty estimates fell within [−2.91, 2.24], so both parameters were distributed within acceptable limits. Therefore, no items were removed at this stage.
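The screening cut-offs above can be written as a small check. For a graded item with several thresholds, the text does not say whether every threshold was checked; the sketch assumes each threshold is tested against the difficulty bounds. The function name and its reporting format are hypothetical.

```python
def flag_item(a, thresholds, a_bounds=(0.3, 4.0), b_bounds=(-4.0, 4.0)):
    """Return the reasons an item would be removed under the stated cut-offs (empty list = keep)."""
    reasons = []
    if a <= a_bounds[0] or a >= a_bounds[1]:
        reasons.append(f"discrimination a={a}")
    for b in thresholds:  # assumption: every threshold of a graded item is checked
        if b <= b_bounds[0] or b >= b_bounds[1]:
            reasons.append(f"difficulty b={b}")
    return reasons

# The extreme values reported for Dataset A trigger no removal
print(flag_item(3.84, [-2.91, 2.24]))  # []
```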
Table 1 MGRM Parameter Estimation Results for Dataset A.
(2) Item Information Functions

The IIF shows how much measurement precision an item provides at each level of θ; higher information values indicate greater precision and reliability (Dodeen and Al-Darmaki, 2016). As shown in Fig. 1 and Table 2, the maximum information function peaks of the 36 items fell within [0.24, 2.35]. Items 13, 15, and 14 had higher peaks than the other items, whereas Items 6, 5, and 33 had among the lowest, indicating that the former provided more information. Balancing item information functions, item content, and the original dimensional structure of the scale, we removed the six items with the lowest maximum information function peaks (Items 4, 5, 6, 31, 32, and 33). The remaining 30 items formed the Chinese abridged version of the “Career Readiness Questionnaire – Adolescent Version”.
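The study obtained IIFs with the mirt package in R. As a language-neutral illustration of what those curves represent, the Python sketch below uses the standard graded-response information formula, I(θ) = Σₖ [Pₖ′(θ)]² / Pₖ(θ), and locates the peak on a θ grid. The grid range, its resolution, and the example parameters are assumptions, not values from Table 1 or Table 2. For a two-category item the formula reduces to a²P(1 − P), which peaks at a²/4 when θ = b, a convenient sanity check.

```python
import numpy as np

def grm_item_information(theta, a, b):
    """Graded-response item information: sum over categories of P_k'(theta)^2 / P_k(theta)."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    b = np.asarray(b, dtype=float)
    p_star = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b[None, :])))
    ones, zeros = np.ones((theta.size, 1)), np.zeros((theta.size, 1))
    p_star = np.hstack([ones, p_star, zeros])   # cumulative curves P*(X >= k), k = 0..K
    dp_star = a * p_star * (1.0 - p_star)       # derivative of each cumulative curve
    p = p_star[:, :-1] - p_star[:, 1:]          # category probabilities
    dp = dp_star[:, :-1] - dp_star[:, 1:]       # derivatives of category probabilities
    return np.sum(dp ** 2 / p, axis=1)

def information_peak(a, b, grid=np.linspace(-4, 4, 1601)):
    """Maximum of the item information function over a theta grid, and where it occurs."""
    info = grm_item_information(grid, a, b)
    i = int(np.argmax(info))
    return info[i], grid[i]

# Hypothetical parameters for a 5-point item
peak, where = information_peak(2.5, [-2.0, -0.8, 0.3, 1.5])
print(round(peak, 2), round(where, 2))
```

Ranking items by `peak` mirrors the criterion used to choose which items to drop.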

Table 2 Item Information Functions Table for Dataset A.


Validation analysis of the parameter estimation results for Dataset A based on Dataset B

MGRM parameter estimation results for Dataset B

(1) Item Difficulty and Item Discrimination

To check the stability of the MGRM results obtained from Dataset A, we conducted the same parameter estimation on Dataset B. Based on the parameter estimates and IIF analysis from Dataset A, 6 items from 2 dimensions had been removed, so the confirmatory parameter estimation retained 30 items covering 10 dimensions. As Table 3 shows, the discrimination estimates for all items in the abbreviated scale fell within [1.45, 3.38], and the difficulty estimates fell within [−3.31, 1.76]. In Dataset B, both the discrimination and difficulty distributions fell within the acceptable ranges (Yang et al. 2008; Zan et al. 2008). These results suggest that the findings based on Dataset A are reliable.
Table 3 MGRM Parameter Estimation Results for Dataset B.
(2) Item Information Functions

The IIF results for Dataset B are presented in Table 4 and Fig. 2. The maximum information function peaks of the 30 items in the abbreviated scale fell within [0.50, 2.64], a higher range than that of the 36 items in the original scale ([0.24, 2.35]). In the Dataset A analysis, original Items 13, 15, and 14 ranked in the top three in maximum information. In the abbreviated scale, original Items 13, 14, and 15 (renumbered 10, 11, and 12, respectively, after the removal of some items) remained the three most informative items. The maximum information provided by the retained items was relatively stable: for instance, that of original Item 10 (now Item 7) rose from 0.64 to 0.65, and that of original Item 25 (now Item 22) fell from 1.44 to 1.26. All items provided a maximum information of at least 0.5, and no item fell markedly below the others (in the original scale, Items 5 and 6 had noticeably lower maximum information than the other items). Based on these comparisons, we consider the performance of the 30 items in the abbreviated scale excellent and suitable for subsequent research.

Table 4 Item Information Functions Table for Dataset B.


Results of the ability parameter estimation of the participants

The participants’ ability parameters were estimated with the MGRM on Dataset B. As Table 5 shows, the mean ability scores of all participants across the 10 dimensions ranged from 0.14 to 0.27. Among the 10 career readiness dimensions, participants scored highest on CCl, SS-S, and OE (0.27, 0.25, and 0.25, respectively) and lowest on SS-Fr, Nw, and SS-Fa (0.14, 0.15, and 0.18, respectively). One advantage of MIRT is that it estimates each participant’s ability on multiple dimensions. This allows researchers to analyse in depth each participant’s abilities across the test dimensions, thereby serving a cognitive-diagnostic function. Table 5 also reports the ability scores of selected participants in Dataset B across the 10 dimensions. Participant “7023747” performed well on all dimensions, whereas participants “7023742” and “7023718” performed poorly on all dimensions. Participant “7023733” performed poorly on five dimensions (OE, SS, CCl, SS-Fr, and Nw) but well on the other five (CI, CCo, SS-S, SS-Fa, and SE).
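To illustrate how a single ability score can be derived from a participant’s responses, the sketch below computes an expected a posteriori (EAP) estimate for one dimension from that dimension’s items under the graded response model, with a standard normal prior on an evenly spaced quadrature grid. This is a simplified stand-in, not the procedure used in the study: BMIRT’s estimation is Bayesian and treats all dimensions jointly, whereas this sketch scores each dimension separately. The item parameters are hypothetical, and responses are coded 0 to 4 for a 5-point scale.

```python
import numpy as np

def grm_category_probs(theta, a, b):
    """Graded-response category probabilities, shape (len(theta), K)."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    b = np.asarray(b, dtype=float)
    p_star = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b[None, :])))
    ones, zeros = np.ones((theta.size, 1)), np.zeros((theta.size, 1))
    p_star = np.hstack([ones, p_star, zeros])
    return p_star[:, :-1] - p_star[:, 1:]

def eap_theta(responses, items, n_points=61):
    """EAP estimate (posterior mean and SD) of theta on one dimension.

    responses : observed categories (0-based), one per item
    items     : list of (a, thresholds) tuples, one per item
    """
    nodes = np.linspace(-4, 4, n_points)
    prior = np.exp(-0.5 * nodes ** 2)  # unnormalized standard normal prior
    likelihood = np.ones_like(nodes)
    for x, (a, b) in zip(responses, items):
        likelihood *= grm_category_probs(nodes, a, b)[:, x]
    posterior = prior * likelihood
    posterior /= posterior.sum()
    mean = float(np.sum(nodes * posterior))
    sd = float(np.sqrt(np.sum((nodes - mean) ** 2 * posterior)))
    return mean, sd

# Three hypothetical items measuring the same dimension
items = [(2.0, [-2.0, -0.5, 0.5, 2.0])] * 3
for pattern in ([0, 0, 0], [2, 2, 2], [4, 4, 4]):
    print(pattern, eap_theta(pattern, items))
```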

Table 5 Results of the Ability Parameter Estimation of Participants.


Criterion-related validity analysis

Career readiness, occupational identity, and career decision self-efficacy are, to some extent, distinct constructs. However, previous research indicates that occupational identity and career decision self-efficacy are correlated with career readiness (Nam et al. 2008; Praskova et al. 2015). We therefore used the OIS and the CDSES as criterion measures to comprehensively evaluate the revised scale.

The results of the criterion-related validity analysis are presented in Table 6. The domain score of the revised career readiness scale correlated significantly with the total scores of the occupational identity scale (r = 0.305) and the career decision self-efficacy scale (r = 0.682). Correlations between the domain score of the revised scale and the dimensions of the occupational identity scale ranged from 0.203 to 0.346, supporting the scale’s criterion-related validity. The correlations among the dimensions of the revised career readiness scale ranged from 0.283 to 0.702 (mean = 0.459). Correlations between the dimensions of the revised scale and those of the occupational identity scale ranged from 0.016 to 0.422 (mean = 0.190), so the dimensions of the revised scale correlated more strongly with one another than with the occupational identity dimensions. Finally, correlations between the dimensions of the revised scale and the total score of the career decision self-efficacy scale ranged from 0.215 to 0.682 (mean = 0.442). These findings suggest that career readiness is related to, yet distinct from, occupational identity and career decision self-efficacy.
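The argument for distinctiveness compares the average correlation among the revised scale’s own dimensions with the average correlation between those dimensions and another scale’s dimensions. The sketch below computes both averages from two blocks of dimension scores. It assumes simple arithmetic means of Pearson r (no Fisher z transformation), which the text does not specify, and the toy data are hypothetical.

```python
import numpy as np

def mean_block_correlations(block_x, block_y):
    """Mean off-diagonal correlation within block_x, and mean correlation between the blocks.

    block_x : (n, p) dimension scores of one scale
    block_y : (n, q) dimension scores of another scale
    """
    x = np.asarray(block_x, dtype=float)
    y = np.asarray(block_y, dtype=float)
    p = x.shape[1]
    r = np.corrcoef(np.hstack([x, y]), rowvar=False)  # (p + q) x (p + q) correlation matrix
    within = r[:p, :p][np.triu_indices(p, k=1)]       # unique pairs inside block_x
    between = r[:p, p:]                               # every block_x vs block_y pair
    return within.mean(), between.mean()

# Toy data: two perfectly correlated dimensions versus one external dimension
c1 = np.array([1, 2, 3, 4, 5], dtype=float)
x = np.column_stack([c1, 2 * c1])
y = np.array([[2], [1], [4], [3], [5]], dtype=float)
print(mean_block_correlations(x, y))
```

A within-block mean clearly above the between-block mean, as with 0.459 versus 0.190 in the text, is the pattern read as evidence of distinctiveness.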

Table 6 Criterion-related Validity Analysis.
