Introduction

The importance of creativity in the 21st century cannot be overstated, as it plays a crucial role in driving innovation, problem-solving, and personal and societal progress1. According to Plucker et al., creativity refers to the interaction of abilities, processes, and environment through which an individual or group can produce a perceived product of novelty and usefulness defined in a social context2. Creativity develops significantly during adolescence, with distinct developmental trajectories for insight, verbal divergent thinking, and visual divergent thinking3. As Barbot and Heuser assert, creativity functions as a vital component of adolescent self-identity, enabling teenagers to cultivate a positive self-image through creative expression4. In a sample of 146 late adolescents, Giancola et al. found that adolescents’ openness and creativity contribute to their ability to think critically about today’s ecological issues and adopt environmentally friendly behaviors5.

Historically, the study of creativity has been approached from two main perspectives: domain-general theories that focus on universal aspects of creativity, such as divergent thinking and personality traits, and domain-specific theories that emphasize the role of expertise, knowledge, and skills specific to a particular domain6,7. These distinct approaches, however, have led to a need for a more comprehensive framework that can reconcile both perspectives.

In response to this need, the Amusement Park Theoretical (APT) model of creativity8,9 offers a unified framework that integrates both domain-general and domain-specific perspectives, providing a more comprehensive understanding of the nature of creativity. The APT model suggests that creativity operates at different levels of specificity, with domain-general factors (e.g., intelligence, motivation) acting as the foundation for creativity across all domains, while domain-specific factors (e.g., expertise, knowledge) influence creativity within specific areas. In this model, creativity is likened to an amusement park, with general thematic areas represented by different sections of the park and more specific domains represented by individual rides within each section. The model incorporates domain-general factors as the park’s general requirements that apply to all visitors, regardless of the specific rides they choose, while domain-specific factors are represented by the unique requirements and experiences of each individual ride. By integrating both domain-general and domain-specific perspectives, the APT model acknowledges the importance of universal factors that contribute to creativity across domains while also recognizing the role of domain-specific expertise and knowledge in shaping creative outcomes within specific fields, allowing for a more nuanced understanding of the complex nature of creativity and its manifestation in various contexts.

Building upon the APT model, Kaufman10 developed the Kaufman Domains of Creativity Scale (K-DOCS), a self-report measure that assesses individuals’ self-perceived creativity across five distinct domains: Self/Everyday, Scholarly, Performance, Mechanical/Scientific, and Artistic. The scale consists of 50 items that ask respondents to rate their creativity in various areas of life, such as daily problem-solving, academic pursuits, performing arts, engineering, and visual arts9. The five domains assessed by the K-DOCS align with the APT model’s conceptualization of creativity as a multi-faceted construct, providing a valuable tool for researchers and practitioners to assess creativity across these domains and gain insights into individuals’ creative strengths and preferences. Notably, the K-DOCS can effectively assess an individual’s level of creativity in a specific domain even if they have not directly experienced the behaviors described in the domain-specific items. This is achieved by asking respondents to imagine themselves in those scenarios11.

The K-DOCS has been widely used in creativity research and has demonstrated good psychometric properties, including internal reliability and factorial validity10,12,13,14,15,16,17. Regarding factorial validity, Kapoor et al. applied confirmatory factor analysis (CFA) to a large sample of 22,013 American participants and found that the K-DOCS demonstrated good reliability and construct validity, with the five- and nine-factor models fitting well for both men and women, whereas the one-factor model was not recommended15. In another CFA study, Kapoor, Zheng, et al. supported both the five- and nine-factor structures of the K-DOCS across ethnic groups16. However, in Miroshnik et al.’s validation of the K-DOCS in a Russian-speaking sample, exploratory and confirmatory factor analyses indicated that the five-factor structure fit best, whereas the nine-factor structure was not supported13.

As for criterion validity, Miroshnik et al., using the Big Five Inventory–2 as the criterion measure, found that openness to experience was correlated with the K-DOCS13. Moreover, using the brief version of the Chinese Big Five Personality Inventory (CBF-PI-B) and a creative self-efficacy scale as criterion measures, Zhang et al. found that openness to experience and creative self-efficacy were positively correlated with the K-DOCS factors17.

Although the 50-item K-DOCS has been verified as a reliable and valid scale, it has limitations that have led to the creation of shortened versions. Firstly, some items have been found inappropriate for certain populations, with Tan et al. removing four items and Kandemir and Kaufman removing three items due to low factor loadings in Malaysian18 and Turkish samples19, respectively. Additionally, completing 50 items takes participants more time to respond and could be burdensome, resulting in lowered administration efficiency20. To address these issues, researchers have developed various shorter versions to facilitate more practical applications21, including a 20-item English version20, a 42-item Turkish version22, and a 16-item Chinese version17.

The use of shortened creativity scales offers several advantages. They allow for more efficient data collection, reducing participant fatigue and potentially increasing response rates. Moreover, shorter scales provide greater flexibility in research designs, enabling researchers to examine the relationships between creativity and other variables more comprehensively within time-constrained studies. This increased efficiency and flexibility can lead to more diverse and nuanced investigations of creativity across various contexts and populations. Among these shortened versions, the 20-item short form of the K-DOCS (20-K-DOCS) was recently developed by Tan et al.20 by selecting the four items with the highest factor loadings from each of the five dimensions of the original K-DOCS. In their study involving 1,409 Malaysian undergraduates, Tan et al. found that the 20-K-DOCS was a psychometrically equivalent short form of the original K-DOCS, maintaining its strong psychometric properties while offering a more concise assessment of creativity.

To date, the reliability and validity of several shortened versions, including the 16-item Chinese and 42-item Turkish versions, have been established in various studies17,22. However, the 20-K-DOCS is relatively new and has primarily been validated with university student populations. Despite its demonstrated suitability for university students, the applicability of the 20-K-DOCS to mid-adolescents (aged 15–18) remains unknown. This gap in the literature presents an important area for further investigation, as understanding the psychometric properties of the 20-K-DOCS in younger populations could significantly enhance its utility in educational and developmental research. This focus is particularly crucial as adolescence, being a transitional phase from childhood to adulthood, is often seen as a critical period for the development of creativity4,23. The importance of this period stems from two main factors: firstly, it is marked by significant improvements in cognitive abilities thought to be related to creativity, such as abstract reasoning and problem-solving, along with the gradual maturation of emotional and social skills24; secondly, adolescents’ creative identities are more susceptible to influence from emotions, relationships, and social experiences25. The various values, criteria, and behaviors they encounter during this period can either support or suppress the development of their creative identity. Given the importance of this developmental stage, special attention needs to be paid to the creativity of this population to support its development effectively. Therefore, testing the suitability of the 20-K-DOCS for mid-adolescents is not only necessary but also potentially valuable for enhancing our understanding of creativity during this crucial developmental period.

Furthermore, another notable gap in the literature on the 20-K-DOCS is the scarcity of studies employing Rasch analysis to evaluate its psychometric properties. Item Response Theory (IRT), of which Rasch analysis is a specific model, and Classical Test Theory (CTT) are two primary approaches used to analyze and validate measurement scales in psychometrics. Embretson and Reise highlighted a key distinction between these approaches24. While CTT focuses on overall test scores, IRT provides a more nuanced perspective by considering the interaction between individual item characteristics and respondent traits26. Specifically, IRT posits that the probability of a correct response to an item is a function of both the respondent’s ability and the item’s difficulty. This relationship allows for a more precise estimation of respondent abilities and item properties.
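For reference, the relationship described above is commonly written, in the dichotomous Rasch model, as shown below. The symbols are notation introduced here for illustration and are not taken from the source: \theta_n denotes the ability (trait level) of respondent n and b_i the difficulty of item i.

```latex
P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}
```

When ability equals difficulty (\theta_n = b_i), the probability is 0.5; it rises as ability exceeds difficulty. Polytomous extensions for rating-scale items apply the same logic to each step between adjacent response categories.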

Given that Rasch analysis, a form of IRT, offers a more comprehensive understanding of a scale’s functioning at the item level, it provides detailed information on item difficulty and fit, yielding psychometric insights unavailable through CTT alone27. Additionally, it allows for the examination of scale unidimensionality, a crucial assumption in psychological measurement28. Surprisingly, this approach has seldom been used to assess the K-DOCS or its shortened versions. In contrast, other creativity measures, such as the Creative Achievement Questionnaire29 and the Academic Writing Creativity scale30, have benefited from Rasch analysis, providing valuable insights into their psychometric properties. The lack of Rasch analysis in K-DOCS literature represents a significant opportunity for future research. To provide a more comprehensive picture of the 20-K-DOCS’ psychometric properties and further establish its validity, it is necessary to extend the analysis beyond CTT and employ Rasch analysis. This approach would allow researchers to gain a more nuanced understanding of the scale’s functioning and its suitability for assessing creativity across various domains.

To address these knowledge gaps, this study aims to comprehensively evaluate the psychometric properties of the 20-K-DOCS using both CTT and Rasch analysis, while also assessing its concurrent and known-groups validity. Concurrent validity will be examined through the scale’s relationship with the Creative Self-Efficacy (CSE) scale. This approach is predicated on the understanding that creative self-efficacy, defined as an individual’s belief in their creative potential, serves as a critical driver of creative motivation12. This line of inquiry is supported by Sternberg’s7 assertion that creativity development is inextricably linked to motivation, and Bandura’s31 postulation that a robust sense of self-efficacy is fundamental to creative productivity and the discovery of new knowledge. Empirical studies on the K-DOCS have consistently revealed significant correlations between creative self-efficacy and creativity, with correlation coefficients ranging from 0.23 to 0.48 (ref. 20), 0.30 to 0.46 (ref. 32), and 0.25 to 0.41 (ref. 17), underscoring the close relationship between these constructs. Furthermore, the study will investigate the known-groups validity of the 20-K-DOCS, a crucial indicator of construct validity given the absence of a gold standard for creativity measurement. Known-groups validity is supported when a scale differentiates between groups known to vary on the variable under investigation33. Through this multifaceted evaluation, incorporating CTT, Rasch analysis, concurrent validity assessment, and known-groups validation, this study endeavors to provide a comprehensive examination of the 20-K-DOCS, thereby contributing to the ongoing refinement and validation of creativity measurement instruments.

Method

Participants and data collection

To investigate the psychometric properties of the 20-K-DOCS among adolescents aged 15–18, a convenience sampling method was employed, beginning in January 2024. Assistance was sought from school administrators at two vocational secondary schools and two vocational high schools in a province of mainland China. To ensure adherence to ethical data collection procedures, the research purpose was initially presented to the classroom teachers of these schools through the administrators’ arrangement. The teachers were then encouraged to discuss the purpose with the students’ parents, either through face-to-face meetings or online chat groups. Electronic consent was obtained from parents who allowed their children to participate in the survey. Subsequently, participating students completed the online questionnaire in their school’s computer lab.

The inclusion criteria for participation in the study were as follows: (i) absence of clinically diagnosed mental illnesses, and (ii) provision of electronic consent forms by parents. The exclusion criterion was based on responding too quickly (less than 400 s) and failing an attention test question, which required respondents to select their favorite color. If a respondent selected a non-relevant response instead of a color, they were excluded from the analysis. Ultimately, 3,079 respondents were included in the study’s analysis. There were no missing data due to the platform’s functionality, which required all responses to be complete before submission.

It needs to be clarified that rather than offering course credits or other direct incentives to students, our recruitment approach mainly relied on school administrators to encourage teachers to promote student participation in the questionnaire survey. This method aimed to increase voluntary involvement while reducing potential bias from external rewards. To show appreciation for their support, our research team provided educational benefits to the participating schools. Specifically, we arranged for our team members, who are experienced in vocational skills competitions, to deliver two lectures on competition strategies to all students in each school. Additionally, we offered personalized coaching sessions for student teams preparing for national-level competitions. This arrangement not only aided our data collection but also supported the educational goals of the participating schools, creating a mutually beneficial research partnership.

Table 1 presents the demographic information of the participants, revealing an average age of 16.48 years (SD = 0.93), with a majority being male (57.8%, n = 1780). Furthermore, a higher proportion were enrolled in vocational high schools (61.1%, n = 1881). The distribution of participants across different grades was relatively even, with the exception of a notably smaller number in the third year of vocational high school (12.0%, n = 371). Regarding their fields of study, participants were enrolled in a diverse range of programs, with engineering being the most common, representing 33.1% (n = 1019) of the participants. This aligns with the characteristics of vocational education. The study was approved by the Jiangxi Psychological Consultant Association (IRB ref: JXSXL-2022-CL15). Informed consent was obtained from all individual participants included in the study.

Table 1 Participant characteristics (N = 3,079).

Instruments

20-item Kaufman domains of creativity scale

The 20-K-DOCS is a shortened version of the K-DOCS. As mentioned earlier, Tan et al.20 developed this version by selecting the four items with the highest factor loadings from each domain of the original K-DOCS. Therefore, the oblique five-factor structure is consistent between the shortened version and the original version. The scale includes five domains: Self/Everyday, Scholarly, Performance (encompassing writing and music), Mechanical/Scientific, and Artistic, with 4 items per domain, totaling 20 items. The instructions are as follows: “Compared to people of your age and life experience, how would you rate your creativity in the following behaviors? For behaviors you have not specifically engaged in, assess your creative potential based on your performance in similar tasks.” The scale employs a 5-point Likert scale, ranging from 1 (very little creativity) to 5 (very creative). In the study by Tan et al.20, the Cronbach’s alpha coefficients for the five domains of the scale were 0.73 (Self/Everyday), 0.72 (Scholarly), 0.87 (Performance), 0.84 (Mechanical/Scientific), and 0.81 (Artistic), respectively.

Creative self-efficacy scale

In the present study, we utilized the self-belief in creative thinking strategies subscale from the CSE scale to assess participants’ creative self-efficacy. This subscale was chosen not only because it contains no reverse-scored items, which could lead to confusion in the analysis, but, more importantly, because it had the highest Cronbach’s alpha internal consistency coefficient (0.83) among the three subscales34. The other two subscales of the CSE scale were included in the online survey; however, they were not analyzed because they contained items that required reverse scoring. The self-belief in creative thinking strategies subscale, developed by Hung35, comprises four items. Representative items include: “When facing novelty problems, I believe I can imagine a lot of solutions very quickly” and “When facing a challenging task, I believe I can imagine a lot of related knowledge.” Responses were recorded using a 4-point Likert scale, ranging from 1 (strongly disagree) to 4 (strongly agree). In Hung’s34 original study, the Cronbach’s alpha coefficient for the four items was 0.83, as noted above. In the current study, the McDonald’s omega for the CSE scale was 0.93, demonstrating high reliability.

Data analysis

Descriptive statistics for the 20-K-DOCS were computed, along with the Pearson correlations among its subscales and their associations with the CSE scale. In addition to these basic analyses, a multivariate analysis of variance (MANOVA) was conducted to compare 20-K-DOCS subscale scores among participants based on their experience in the Vocational Schools Skills Competition, testing the scale’s known-groups validity. The MANOVA was chosen to simultaneously analyze all dependent variables (the 20-K-DOCS subscales), accounting for potential interrelationships among them. Participants were categorized by their highest level of competition: national, provincial, municipal, school, or no participation. This competition, a significant event in China’s vocational education system with nearly 90 categories, progressively evaluates participants’ work from the school level to the national level.

The known-groups validity test is based on the hypothesis that participants with higher-level competition experience, particularly at the national level, should demonstrate higher 20-K-DOCS scores compared to those with less or no experience. This expectation stems from creativity being a key factor in competition success, the progressive selection process favoring creative individuals, and the potential for competitions to foster creative skill development and enhance creative self-efficacy. If confirmed, this would provide evidence for the 20-K-DOCS scale’s ability to differentiate between groups theoretically expected to have varying levels of creativity, thus partially supporting its construct validity. This approach is particularly valuable given the absence of a gold standard measure for creativity.

Subsequent analyses employed multidimensional Rasch modeling to examine the 20-K-DOCS items, accounting for the correlations among the five subscales. This approach allows for the simultaneous calibration of all subscales, utilizing the inter-subscale correlations to enhance measurement precision. The initial phase of analysis compared unidimensional and multidimensional Rasch models. Within this framework, we assessed whether the Partial Credit Model (PCM) or the Andrich Rating Scale Model (RSM) was more appropriate for the data. Following this, the unidimensionality of each subscale and the monotonicity of response categories were evaluated. To assess unidimensionality, we applied the criterion that the eigenvalue of the first component in the principal component analysis (PCA) of the residuals should be less than 2.00 (ref. 36). Monotonicity was examined by calculating the category boundary parameters for each of the five subscales.
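As an illustration of the PCM and the monotonicity check described above, the following minimal sketch computes category probabilities for a single item and tests whether its step parameters are ordered. The function names and the step values are hypothetical and are not taken from the study; under the RSM, the same formula applies, but the step structure is shared across items rather than estimated per item.

```python
import math


def pcm_category_probs(theta, step_difficulties):
    """Category probabilities for one item under the Partial Credit Model.

    theta: person location in logits.
    step_difficulties: item-specific step parameters delta_1..delta_m in logits.
    Returns a list of probabilities for categories 0..m.
    """
    # Cumulative sums of (theta - delta_j); the empty sum for category 0 is 0.
    cumulative = [0.0]
    for delta in step_difficulties:
        cumulative.append(cumulative[-1] + (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(cumulative)
    weights = [math.exp(c - peak) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def steps_are_ordered(step_difficulties):
    """True when the step parameters increase strictly from one step to the next."""
    return all(a < b for a, b in zip(step_difficulties, step_difficulties[1:]))


# Hypothetical step parameters for a 5-category item (illustrative values only).
example_steps = [-1.5, -0.5, 0.4, 1.6]
probs = pcm_category_probs(0.0, example_steps)
print([round(p, 3) for p in probs], steps_are_ordered(example_steps))
```

Disordered step parameters, such as a third step located above the fourth, would make `steps_are_ordered` return `False`, which corresponds to the kind of monotonicity violation the category boundary parameters are inspected for.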

After the prerequisites for Rasch analysis were met, item validity was evaluated using information-weighted (INFIT) and outlier-sensitive (OUTFIT) fit statistics, with acceptable values for valid item responses ranging from 0.50 to 1.50 (ref. 37). The multidimensional Wright map was used to assess item targeting, offering a comprehensive visual representation of items from all five subscales in a single display. A person separation reliability threshold of 0.50, as suggested by Boone et al.38, was applied to examine the scale’s ability to differentiate participants with high and low creativity traits.
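The INFIT and OUTFIT mean squares referred to above can be sketched as follows, assuming the model-expected scores and model variances for each person-item encounter are already available from a fitted Rasch model. The function names and the example values are hypothetical and serve only to show the arithmetic.

```python
def fit_mean_squares(observed, expected, variances):
    """Return (infit, outfit) mean squares for one item.

    observed[n]: observed response of person n.
    expected[n]: model-expected score for person n.
    variances[n]: model variance of the response for person n.
    """
    squared_residuals = [(x - e) ** 2 for x, e in zip(observed, expected)]
    # OUTFIT: unweighted mean of squared standardized residuals,
    # so unexpected responses far from a person's level weigh heavily.
    outfit = sum(r / v for r, v in zip(squared_residuals, variances)) / len(observed)
    # INFIT: squared residuals weighted by model variance (information).
    infit = sum(squared_residuals) / sum(variances)
    return infit, outfit


def within_range(value, low=0.50, high=1.50):
    """Check a mean square against the 0.50 to 1.50 range used in the text."""
    return low <= value <= high


# Hypothetical dichotomous example: expected scores are success probabilities,
# so the model variance of each response is p * (1 - p).
observed = [1, 0, 1, 1]
expected = [0.8, 0.3, 0.6, 0.5]
variances = [p * (1 - p) for p in expected]
infit, outfit = fit_mean_squares(observed, expected, variances)
print(round(infit, 3), round(outfit, 3), within_range(infit), within_range(outfit))
```

Values near 1.0 indicate responses about as variable as the model predicts; values well above 1.0 suggest noise, and values well below 1.0 suggest overly predictable responses.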

Following item diagnostics, model fit assessment focused on factorial, convergent, and discriminant validity using a CFA approach. This included testing several models: a unidimensional model, an oblique five-factor model, a nine-factor model (based on the original K-DOCS structure), a second-order model, a bifactor model (comprising one general factor and five specific factors), an Exploratory Structural Equation Modeling (ESEM) five-factor model, and an ESEM bifactor model. It should be noted that while the nine-factor model was included for comparison with the original K-DOCS structure, it was expected to be psychometrically problematic for the 20-item version due to having factors with too few items.

In addition to the common structures of second-order model and oblique five- and nine-factor models in K-DOCS, the rationale for incorporating both bifactor and ESEM approaches was to provide a comprehensive assessment of the 20-K-DOCS’ psychometric properties and underlying structure. The bifactor model, represented by a general factor and specific factors corresponding to the proportional structure of the 20-K-DOCS, was employed to address high factor correlations and determine the unique contributions of general and specific factors39. Simultaneously, ESEM was utilized to overcome limitations of traditional CFA, particularly in avoiding inflated factor correlations40. Unlike CFA, which constrains cross-loadings to zero, ESEM allows estimation of all possible cross-loadings while maintaining a confirmatory framework. This approach provides a more realistic representation of complex constructs, often yielding reduced and more accurate factor correlations, improved model fit, and a balance between exploratory flexibility and confirmatory hypothesis-testing40. By allowing items to load on multiple factors, ESEM mitigates issues of misspecification and biased parameter estimates common in CFA, especially for multidimensional constructs like creativity. Both bifactor and ESEM methods are well-suited for the 20-K-DOCS, enabling a nuanced examination of its factor structure while avoiding pitfalls of overly restrictive modeling, thus allowing for a more comprehensive understanding of the scale’s underlying structure.

Model fit was assessed using established criteria: Comparative Fit Index (CFI) and Non-Normed Fit Index (NNFI; also known as the Tucker-Lewis Index, TLI) values above 0.90, Root Mean Square Error of Approximation (RMSEA) below 0.06, and Standardized Root Mean Square Residual (SRMR) below 0.08, following Brown41. The mean- and variance-adjusted weighted least squares (WLSMV) estimation was employed, given the ordinal nature of the 20-K-DOCS, as it provides robust estimates for non-normal distributions typical in Likert scales42.
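The cutoffs listed above can be applied mechanically once the fit indices have been obtained from the estimation software. The sketch below assumes the indices are supplied as plain numbers; the function name and the example values are hypothetical and do not reproduce any result from the study.

```python
def check_fit(cfi, nnfi, rmsea, srmr):
    """Compare fit indices with the cutoffs stated in the text.

    Returns a dict mapping each index name to True when its criterion is met.
    """
    return {
        "CFI": cfi > 0.90,
        "NNFI": nnfi > 0.90,
        "RMSEA": rmsea < 0.06,
        "SRMR": srmr < 0.08,
    }


# Hypothetical fit indices for illustration only.
result = check_fit(cfi=0.95, nnfi=0.94, rmsea=0.05, srmr=0.04)
print(result, all(result.values()))
```

Returning the per-index outcome, rather than a single pass or fail flag, makes it easy to see which criterion a poorly fitting model misses.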

The selection of the most suitable model considered the enhanced model fit potentially afforded by bifactor, ESEM five-factor and ESEM bifactor models, as evidenced by notable improvements in RMSEA/SRMR and TLI/CFI indices surpassing 0.015 and 0.01, respectively43,44. After identifying the model that adequately represented the factor structure, the internal reliability of the 20-K-DOCS subscales was assessed using McDonald’s omega coefficients.
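The two steps in this paragraph can be sketched as follows. The first function flags, index by index, whether a candidate model improves on a baseline by more than the stated margins (0.01 for CFI/TLI, 0.015 for RMSEA/SRMR); how the per-index flags are combined into a final decision is left open here, since the text does not specify it. The second function uses the common formula for McDonald’s omega from standardized loadings, assuming a single factor with uncorrelated residuals. All names and numbers are hypothetical.

```python
def fit_improvements(baseline, candidate, gain=0.01, drop=0.015):
    """Flag notable improvements of `candidate` over `baseline`.

    CFI and TLI must rise by more than `gain`;
    RMSEA and SRMR must fall by more than `drop`.
    """
    return {
        "CFI": candidate["CFI"] - baseline["CFI"] > gain,
        "TLI": candidate["TLI"] - baseline["TLI"] > gain,
        "RMSEA": baseline["RMSEA"] - candidate["RMSEA"] > drop,
        "SRMR": baseline["SRMR"] - candidate["SRMR"] > drop,
    }


def mcdonald_omega(loadings):
    """McDonald's omega from standardized loadings on one factor.

    Assumes uncorrelated residuals, so each residual variance is 1 - loading**2.
    """
    common = sum(loadings) ** 2
    unique = sum(1 - lam ** 2 for lam in loadings)
    return common / (common + unique)


# Hypothetical fit indices and loadings for illustration only.
five_factor = {"CFI": 0.93, "TLI": 0.92, "RMSEA": 0.060, "SRMR": 0.050}
esem_five = {"CFI": 0.97, "TLI": 0.96, "RMSEA": 0.040, "SRMR": 0.020}
flags = fit_improvements(five_factor, esem_five)
omega = mcdonald_omega([0.8, 0.8, 0.8, 0.8])
print(flags, round(omega, 3))
```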

Upon selecting the best-fit model, convergent and discriminant validity were further examined based on standardized factor loadings and Average Variance Extracted (AVE). An item’s loading significance and magnitude were used as indicators of convergent validity for the construct, deemed satisfactory with an AVE above 0.50, in line with Fornell and Larcker45. Additionally, the 20-K-DOCS’s multidimensional factor discriminant validity was evaluated, with sufficient discriminant validity established when the AVE for two factors exceeded the squared correlation between those factors46.
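A minimal sketch of the two checks described above is given below, assuming standardized loadings and a latent factor correlation have already been estimated. AVE is computed as the mean of the squared standardized loadings, and the discriminant validity check compares each factor’s AVE with the squared correlation between the two factors. The function names and the example values are hypothetical.

```python
def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized loadings of one factor."""
    return sum(lam ** 2 for lam in loadings) / len(loadings)


def discriminant_validity_ok(ave_a, ave_b, correlation):
    """Fornell-Larcker check: both AVEs must exceed the squared factor correlation."""
    shared_variance = correlation ** 2
    return ave_a > shared_variance and ave_b > shared_variance


# Hypothetical standardized loadings for two four-item factors.
ave_first = average_variance_extracted([0.8, 0.7, 0.9, 0.6])
ave_second = average_variance_extracted([0.8, 0.8, 0.7, 0.8])
print(round(ave_first, 3), ave_first > 0.50)
print(discriminant_validity_ok(ave_first, ave_second, correlation=0.60))
```

Because the comparison uses the squared correlation, highly correlated factors can fail the check even when each factor’s AVE comfortably exceeds 0.50.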

In the subsequent phase of the CFA, we broadened our investigation to assess the concurrent validity of the 20-K-DOCS in relation to CSE, utilizing the superior model identified in the CFA phase. Employing SEM, we examined the path coefficients stemming from the latent constructs within the 20-K-DOCS towards CSE. The evaluative criteria applied to this model remained consistent with those established in the initial CFA, adhering to the same estimation procedures.

Results

Descriptive statistics, Pearson correlations, and internal reliability

Among the five subscales of the 20-K-DOCS, Self/Everyday Creativity had the highest mean score (M = 13.81, SD = 3.87), followed by Scholarly Creativity (M = 13.16, SD = 4.01), Artistic Creativity (M = 13.03, SD = 4.15), Performance Creativity (M = 12.12, SD = 4.57), and Mechanical/Scientific Creativity (M = 12.07, SD = 4.46). All subscales showed significant positive correlations with creative self-efficacy (r = .45 to .69, p < .001; see Table 2).

Table 2 Pearson correlations between 20-K-DOCS and CSE scale (N = 3079).

A MANOVA was conducted to examine whether scores on the five 20-K-DOCS subscales differed by previous experience in the Vocational Schools Skills Competition. The overall MANOVA revealed a significant multivariate main effect for previous experience, Wilks’ λ = 0.99, F(20, 10183) = 1.76, p = .020. Given the significance of the overall test, the univariate main effects were examined. Significant univariate main effects were found for Performance Creativity (F = 4.29, p = .002), Mechanical/Scientific Creativity (F = 4.81, p < .001), and Artistic Creativity (F = 4.82, p < .001), but not for Self/Everyday Creativity and Scholarly Creativity. Post hoc analyses using the Games-Howell test indicated that participants with national-level competition experience scored higher than those who had not participated in the competition. For Performance Creativity, national-level participants (n = 78, M = 13.74, SD = 5.45) scored higher than non-participants (n = 2664, M = 12.00, SD = 4.57). Similarly, for Mechanical/Scientific Creativity, national-level participants (n = 78, M = 13.77, SD = 5.48) outperformed non-participants (n = 2664, M = 11.95, SD = 4.44). For Artistic Creativity, national-level participants (n = 78, M = 14.72, SD = 4.97) also scored higher than non-participants (n = 2664, M = 12.93, SD = 4.13). These findings provide partial support for the known-groups validity of the 20-K-DOCS, a component of construct validity. The results indicate that individuals who demonstrated exceptional creativity by reaching the national level of competition scored higher in specific domains than those who did not participate, aligning with the theoretical expectation that groups with different levels of creative achievement should be distinguishable by the scale.

Multidimensional Rasch analysis

Table S1 in the supplementary materials compares unidimensional and multidimensional Rasch models using RSM and PCM. The results indicate that PCM demonstrated superior fit in both unidimensional and multidimensional contexts, with lower AIC and BIC values. Likelihood ratio tests confirmed significantly lower deviance for PCM compared to RSM in both cases (p < .001). Within the PCM framework, the multidimensional Rasch model outperformed its unidimensional counterpart, exhibiting lower AIC and BIC values. A further likelihood ratio test revealed significantly lower deviance for the multidimensional PCM (p < .001). These findings suggest that the multidimensional Rasch model using PCM provides the most appropriate fit to the data, supporting its use in subsequent analyses.

Unidimensionality of each subscale was confirmed by PCA of the residuals. First component eigenvalues ranged from 1.51 to 1.81 across the five subscales, all below the 2.00 threshold, indicating no significant secondary dimensions. Table 3 presents the thresholds for the response categories and reveals that several items did not demonstrate monotonic category increases. Specifically, items 1, 2, and 3 in the Subscale of Self/Everyday Creativity, as well as all items in the Subscales of Performance Creativity and Artistic Creativity, exhibited violations of monotonicity in steps 3 and 4. To address this issue, we adopted the approach of collapsing response categories 4 and 5 into a single category. The revised items demonstrated satisfactory monotonicity, and consequently, we utilized these modified items to conduct further analyses of factorial and concurrent validity.
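The category-collapsing step described above amounts to a simple recode of the raw responses before the models are re-estimated. The sketch below assumes responses are stored as integers from 1 to 5; the function name and the example data are hypothetical.

```python
def collapse_top_categories(responses, merge_from=4):
    """Recode responses so that every value at or above `merge_from` becomes `merge_from`.

    With the default, categories 4 and 5 are merged into a single category 4,
    turning a 5-point item into a 4-point item.
    """
    return [min(response, merge_from) for response in responses]


# Hypothetical responses to one item (illustrative values only).
raw_item = [1, 2, 3, 4, 5, 5]
revised_item = collapse_top_categories(raw_item)
print(revised_item)
```

After recoding, the step parameters have to be re-estimated on the revised items and their ordering checked again, since collapsing changes the number of steps per item.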

Table 3 Item difficulties and response category boundary parameters for the five factors of the 20-K-DOCS.

Moreover, Table 3 also shows that the infit/outfit of all items fell within the acceptable range of 0.5 to 1.5, suggesting that each item possessed validity and provided optimal information for the trait being measured. Within each subscale, the items identified as the most challenging were item 2 for the Self/Everyday Creativity subscale, item 5 for the Scholarly Creativity subscale, and items 12, 13, and 19 for the Performance Creativity, Mechanical/Scientific Creativity, and Artistic Creativity subscales, respectively.

The multidimensional Wright Map analysis, as depicted in Fig. 1, reveals a distribution of items across different difficulty levels for each subscale. While there is a concentration of items near the zero point (the mean difficulty level), the map also shows some spread of item difficulties. This distribution indicates that the test items cover a range of ability levels, with many items appropriately targeted to correspond with the average ability of the participant group. However, the map also highlights measurement gaps, particularly at the extreme ends of the latent ability spectrum. This suggests that while the scale is generally well-calibrated for the cohort to which it was administered, there may be limitations in assessing individuals with very high or very low levels of creativity in each dimension. The presence of these gaps indicates potential areas where additional items might be needed to fully capture the breadth of the constructs being measured, especially at the extremes of the ability range.

Fig. 1 Wright map of the five subscales for the 20-item K-DOCS. SEC, Self/everyday creativity; SC, Scholarly creativity; PC, Performance creativity; M/SC, Mechanical/scientific creativity; AC, Artistic creativity.

The person separation reliability, quantified by the Expected A Posteriori (EAP) reliability coefficient, exceeded 0.85 for each subscale within the multidimensional Rasch model. This indicates that the scales demonstrate adequate precision to differentiate between varying levels of the latent traits for most of the study participants. However, in light of the Wright Map analysis, it is important to note that this high reliability may not extend to participants at the extreme ends of the ability spectrum.

Factorial, convergent, and discriminant validity

Table 4 provides a detailed summary of the model fit indices for various structural models using the revised items (those with category collapsing). These models include the one-factor model, the oblique five-factor model, the nine-factor model, the second-order model, the bifactor model, the ESEM five-factor model, and the ESEM bifactor model. The findings reveal that the oblique five-factor, nine-factor, ESEM five-factor, and ESEM bifactor models all exhibited acceptable model fit. In contrast, the one-factor, second-order, and bifactor models did not fit the data well. Notably, the improvement in model fit from the ESEM five-factor model to the ESEM bifactor model was negligible, as the differences between these models did not meet the predefined significance thresholds. Given the principle of parsimony, which prefers simpler models when the fit is similarly good, the ESEM five-factor model appears to be a better choice than the ESEM bifactor model. Most importantly, the factor correlations in the oblique five-factor model, which ranged from 0.71 to 0.93, were notably reduced in the ESEM five-factor model to a range of 0.34 to 0.69, highlighting the efficacy of ESEM in addressing overly high factor correlations. In addition, although the nine-factor model fit better than the five-factor model, two of its dimensions contain only a single item, so their psychometric properties cannot be properly tested. Accordingly, the oblique five-factor model and the ESEM five-factor model were identified as adequately representing the factor structure for further analysis. Moreover, the subscales demonstrated excellent internal consistency reliability in both the oblique five-factor and ESEM five-factor models. The McDonald’s omega coefficients ranged from 0.91 to 0.95 for the oblique five-factor model and from 0.80 to 0.83 for the ESEM five-factor model, indicating strong reliability across both approaches.
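To make the reliability coefficient more concrete, the sketch below computes McDonald's omega from standardized loadings under a simple congeneric one-factor model with uncorrelated errors, so that each uniqueness equals one minus the squared loading. This is an illustrative simplification and may differ from how omega was obtained in the study, particularly for the ESEM solution, where cross-loadings are present. The loadings are hypothetical and do not come from the tables.

```python
from typing import Sequence


def mcdonald_omega(loadings: Sequence[float]) -> float:
    """Omega for one factor from standardized loadings.

    Assumes uncorrelated errors and no cross-loadings, so the
    uniqueness of each item is 1 - loading**2.
    """
    common = sum(loadings) ** 2
    unique = sum(1.0 - lam ** 2 for lam in loadings)
    return common / (common + unique)


# Hypothetical standardized loadings for a four-item subscale.
example_loadings = [0.85, 0.88, 0.90, 0.86]
print(round(mcdonald_omega(example_loadings), 2))  # roughly 0.93
```

Under these assumptions, lower target loadings translate directly into a lower omega.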

Table 4 Model fit across various factor structures in the 20-K-DOCS.

In the oblique five-factor structure of the 20-K-DOCS, the factor loadings for each item within the subscales surpassed 0.80, indicating robust item-specific contributions to their respective factors. Subsequent calculations of AVE produced values ranging from 0.74 to 0.82, affirming a commendable level of convergent validity for the five-factor model of the 20-K-DOCS. Nevertheless, the pronounced correlations among the latent variables challenge the discriminant validity within this structural framework.
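The tension between convergent and discriminant validity noted above can be illustrated with a short sketch. AVE is computed as the mean squared standardized loading, and discriminant validity is checked with the Fornell-Larcker criterion, which requires the squared correlation between two factors to be smaller than each factor's AVE. The choice of this criterion is an assumption made for illustration and may not be the one applied in the study. The loadings are hypothetical.

```python
from typing import Sequence


def average_variance_extracted(loadings: Sequence[float]) -> float:
    """AVE: mean of the squared standardized loadings of a factor's items."""
    return sum(lam ** 2 for lam in loadings) / len(loadings)


def fornell_larcker_ok(ave_a: float, ave_b: float, correlation: float) -> bool:
    """True when the squared factor correlation is below both AVE values."""
    return correlation ** 2 < min(ave_a, ave_b)


# Hypothetical loadings for two four-item subscales (all above 0.80).
ave_a = average_variance_extracted([0.85, 0.88, 0.90, 0.86])
ave_b = average_variance_extracted([0.84, 0.87, 0.89, 0.91])
print(round(ave_a, 2))  # 0.76

# A correlation of 0.93 violates the criterion; 0.69 satisfies it.
print(fornell_larcker_ok(ave_a, ave_b, 0.93))  # False
print(fornell_larcker_ok(ave_a, ave_b, 0.69))  # True
```

This shows how high loadings can produce satisfactory AVE values while very high factor correlations still undermine discriminant validity.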

Conversely, the ESEM outcomes, as detailed in Table 5, revealed that the majority of target loadings exceeded 0.50, while cross-loadings were generally negligible, predominantly under 0.30, with a few exceptions. Specifically, Items 4, 9, and 13 exhibited cross-loadings exceeding 0.30. This pattern largely supports convergent validity, as items loaded more strongly on their designated factors than on non-target factors, with the aforementioned exceptions warranting further consideration.
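The screening rule described above can be sketched as follows: an item is flagged when any non-target loading exceeds 0.30 in absolute value, and it is then checked whether the target loading remains the largest. The item labels, the loading matrix, and the function names are hypothetical and are not taken from Table 5.

```python
from typing import Dict, List, Sequence


def flag_cross_loadings(
    loadings: Dict[str, Sequence[float]],
    targets: Dict[str, int],
    cutoff: float = 0.30,
) -> List[str]:
    """Return items with at least one non-target loading above the cutoff."""
    flagged = []
    for item, row in loadings.items():
        target = targets[item]
        if any(abs(v) > cutoff for j, v in enumerate(row) if j != target):
            flagged.append(item)
    return flagged


def target_dominates(row: Sequence[float], target: int) -> bool:
    """True when the target loading is larger than every non-target loading."""
    return all(abs(row[target]) > abs(v) for j, v in enumerate(row) if j != target)


# Hypothetical standardized loadings on three factors.
example = {
    "item_a": [0.72, 0.05, -0.02],
    "item_b": [0.35, 0.58, 0.10],
    "item_c": [0.12, 0.08, 0.66],
}
example_targets = {"item_a": 0, "item_b": 1, "item_c": 2}

print(flag_cross_loadings(example, example_targets))  # ['item_b']
print(all(target_dominates(example[i], example_targets[i]) for i in example))  # True
```

In this toy case the flagged item still loads most strongly on its own factor, which mirrors the situation described for the few exceptions in the text.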

Table 5 Standardized factor loadings derived from ESEM models.

Concurrent validity

In the exploration of the concurrent validity of the 20-K-DOCS in relation to CSE, the SEM analysis, based on the oblique five-factor model, yielded the following results: χ² (df) = 3427.05 (237), CFI = 0.986, NNFI = 0.984, RMSEA = 0.066, and SRMR = 0.026. Notably, the factors of Artistic Creativity (β = 0.39, t = 9.51, p < .001), Scholarly Creativity (β = 0.20, t = 2.13, p = .033), and Mechanical/Scientific Creativity (β = 0.15, t = 3.29, p < .001) demonstrated significant positive associations with CSE, while Self/Everyday Creativity and Performance Creativity showed no significant relationship (see Figure S1 in the supplementary file).

The ESEM five-factor model (see Figure S2 in the supplementary file) also showed excellent model fit: χ² (df) = 835.29 (177), CFI = 0.997, NNFI = 0.996, RMSEA = 0.035, SRMR = 0.011. Within this model, all five path coefficients to CSE were positive and significant, in descending order of magnitude: Artistic Creativity (β = 0.34, t = 13.43, p < .001), Scholarly Creativity (β = 0.21, t = 6.23, p < .001), Mechanical/Scientific Creativity (β = 0.18, t = 7.85, p < .001), Self/Everyday Creativity (β = 0.17, t = 5.90, p < .001), and Performance Creativity (β = 0.09, t = 4.20, p < .001).

Discussion

This study comprehensively assessed the psychometric properties of the 20-K-DOCS on a large sample of Chinese adolescents aged 15–18. The results demonstrated excellent internal consistency for each subscale, and satisfactory infit and outfit statistics supported the notion that the items within each subscale effectively measured their respective factors. Furthermore, the person separation reliability for all subscales was greater than 0.80, suggesting that the scale can effectively differentiate between different levels of creativity facets. The five-factor structure was confirmed by both CFA and ESEM. Concurrent validity was also established, with positive associations found between the five facets of creativity in the K-DOCS and creative self-efficacy47.

Notably, this study utilized multidimensional Rasch analysis to diagnose each item of the 20-item K-DOCS, a method rarely employed in previous psychometric studies of the K-DOCS. To our knowledge, only Susanto et al.48 have applied Rasch analysis to the original K-DOCS using Indonesian students, though their sample size of less than 100 participants limited the reliability of their results. In our study, with a larger and more suitable sample size, we employed multidimensional Rasch analysis to evaluate the 20-item K-DOCS. While we confirmed the general validity of each item, our analysis also revealed nuanced insights into the scale’s performance across different levels of the measured constructs. Our findings indicate that while the 20-K-DOCS demonstrates overall validity, it functions most effectively for individuals within the average to moderately high or low range of creative abilities. For participants at the extremes of the creativity spectrum in each dimension, the measurement precision may be reduced due to the lack of items targeting these exceptional ability levels. This insight, rather than contradicting the scale’s validity, highlights an area for potential refinement. Further development of the 20-K-DOCS could focus on enhancing its ability to discriminate across the full spectrum of creative abilities in each dimension, particularly by adding items that better target very high and very low levels of creativity. Such improvements could increase the scale’s overall measurement precision and utility, especially for assessing individuals with exceptional creative abilities.

Moreover, the unidimensional structure of the 20-item K-DOCS did not fit the data well, providing initial support for the multifaceted nature of creativity rather than a singular construct. Poor fit of the one-factor structure has also been found in previous studies of the K-DOCS (50-item English version15,16; 16-item Chinese version17). Further supporting this perspective, our analyses using bifactor and ESEM bifactor models indicated that the addition of a general factor did not significantly improve model fit. Specifically, the improvements in model fit indices were minimal when a general factor was included, suggesting that the general factor lacks robustness. To our knowledge, this is the first time the bifactor approach has been used to assess the factor structure of the K-DOCS. Our findings echo those of Kapoor et al.16, who suggested that it is not recommended to summate K-DOCS scores on all items to obtain a unitary “creativity” score. In this vein, our findings favor the domain-specific perspective of creativity, highlighting the importance of considering distinct facets rather than an overall construct (i.e., domain-generality). This finding is consistent with previous research, which has shown that creativity is better understood as a set of domain-specific factors rather than a single, overarching construct9,10. This aligns with the arguments of Kaufman and Baer49, who proposed that creativity manifests differently across various domains and that a domain-specific approach is more appropriate.

However, it is important to note that the correlations between the creativity facets in the ESEM five-factor model were still moderate, suggesting some overlap among these distinct factors. This finding aligns with the recent work of Acar et al.50, who tested the K-DOCS using a meta-analytic structural equation modeling approach. Their results supported both a general domain of creativity and a two-factor solution consisting of Arts and Sciences factors using a second-order CFA. Although our study diverges from Acar et al.'s findings in that the second-order CFA model did not fit well in our analysis, we still advocate for a conservative conclusion that considers both domain-generality and domain-specificity. This perspective aligns more closely with the APT model of creativity8. Given that bifactor models and second-order CFAs are rarely reported in creativity research, we acknowledge that further evidence is needed to definitively establish the most appropriate structural model for creativity as measured by the 20-K-DOCS. This cautious approach recognizes the complexity of creativity as a construct and the need for continued investigation into its underlying structure.

Surprisingly, the five creativity facets of the 20-item K-DOCS were excessively highly correlated, contrary to most studies of the K-DOCS17,20,51. It is worth considering whether the unique characteristics of the adolescent participants in this study (including their age and their enrollment in vocational education) led to this exception, and future research should investigate this further, given that most studies of the K-DOCS have been conducted with university students or general adult populations50. Despite the excessive correlations found in the zero-order correlations and traditional CFA, the application of ESEM in this study offers a way to address them. Previous research on the K-DOCS encountered cross-loading issues with items13 and resolved this by removing the problematic items. However, using ESEM, a more effective solution may be to retain the original items. The use of ESEM was particularly effective in addressing the issue of excessive correlations between factors, thereby enhancing discriminant validity. This aligns with the recommendations of Alamer and Marsh44 and van Zyl and ten Klooster39, who highlighted that ESEM can effectively mitigate high factor correlations that often compromise the discriminant validity of measurement models. In our study, the factor correlations in the oblique five-factor model ranged from 0.70 to 0.92 but were substantially reduced in the ESEM five-factor model to a range of 0.33 to 0.68. This reduction underscores the utility of ESEM in providing a clearer differentiation between the distinct facets of creativity. Furthermore, echoing previous studies15,19, our analysis found that the nine-factor model provided an acceptable fit. However, due to the presence of single-item subscales in this structure, a comprehensive assessment was not feasible in the current study. This limitation presents an opportunity for future research to further investigate and validate the nine-factor structure using a version of the K-DOCS that includes multiple items for each subscale.

Lastly, the five creativity facets of the 20-item K-DOCS demonstrated significant associations with creative self-efficacy, with artistic creativity exhibiting the strongest correlation. This finding aligns with the results reported by Tan et al.20, suggesting that the relationships between different domains of creativity and other creativity-related variables may vary. Moreover, our study provided evidence for known-groups validity. Students who had participated in national-level competitions demonstrated significantly higher scores in both mathematical and artistic creativity compared to those who had not engaged in such competitions. This observation further underscores the differential impact of creativity dimensions on individuals and their roles in creativity-related tasks. To further advance our understanding of this complex construct, future research should continue to explore and validate the domain-specific factors of creativity across diverse contexts, employing rigorous methodologies and considering potential moderating variables.

This study has important implications for future research on creativity from both theoretical and practical perspectives. Theoretically, by thoroughly evaluating the psychometric properties of the K-DOCS, it contributes to the growing body of literature on this scale and provides valuable insights into the nature of creativity in Chinese adolescents, extending earlier work conducted mainly with university students. In particular, the application of multidimensional Rasch analysis and ESEM offers a nuanced understanding of the scale’s structure and item functioning. While our findings largely support the use of the 20-item K-DOCS as a reliable and valid measure of domain-specific creativity and underscore the importance of considering both domain-generality and domain-specificity in creativity research, they also highlight areas that require further investigation. Specifically, the multidimensional Rasch analysis identified some items with non-monotonically increasing category thresholds, and the ESEM analysis revealed cross-loading issues for certain items. These findings suggest that while the 20-K-DOCS is a promising tool, there is room for refinement. Future research should address these issues, potentially through item revision or the development of additional items to enhance the scale’s psychometric properties. Furthermore, continued exploration of the interplay between domain-general and domain-specific aspects of creativity could lead to a more comprehensive understanding of this complex construct.

Practically speaking, this research has two implications for practitioners using the 20-K-DOCS. First, the assessment tool is most reliable when evaluating individuals in the average to moderately high or low range of the creativity spectrum, and its precision may be reduced at the extremes. Second, to obtain meaningful insights, practitioners should avoid using composite scores and instead analyze each creative domain independently, as this better captures domain-specific creative abilities and can better inform targeted interventions or development strategies.

Conclusions and limitations

This study provides robust evidence for the psychometric properties of the 20-item K-DOCS in a large sample of Chinese adolescents, supporting the multifaceted and domain-specific nature of creativity. Our findings demonstrate the scale’s overall validity and reliability for measuring domain-specific creativity. The differential associations between creativity facets and creative self-efficacy, along with higher mathematical and artistic creativity scores among national-level competition participants, underscore the importance of domain-specific considerations in creativity research. However, our analysis also identified areas for potential improvement, particularly in measuring extreme levels of creativity and in the functioning of some items (i.e., non-monotonic thresholds and cross-loading issues). These results suggest directions for future scale development and refinement to improve measurement across the full spectrum of creative abilities.

This study has some limitations that should be considered. Firstly, the sample consisted of Chinese adolescents aged 15–18, which may limit the generalizability of the findings to other age groups and cultural contexts. Particularly, the exceptionally high correlations among the creativity facets raise the question of whether the vocational school setting or the developmental stage of the participants contributed to this finding. Future studies should investigate the psychometric properties of the 20-item K-DOCS in diverse populations to determine if these high correlations persist across different contexts and age groups. Secondly, the criterion variable in this study was limited to creative self-efficacy, without extending to other relevant constructs. To establish a more comprehensive understanding of the 20-item K-DOCS’s criterion validity, future research should examine its relationships with a broader range of creativity-related variables, such as creative achievement, divergent thinking, and personality traits associated with creativity. Thirdly, the 20-item version adopted in the current study has limited sensitivity at the extremes of the creativity spectrum. Future research could explore alternative item selection strategies or develop supplementary items to better capture exceptional levels of creativity, expanding the scale’s applicability across a broader range of creative capacities. Lastly, this study relied on self-report measures, which may be subject to response biases, despite the inclusion of an attention check. Self-report assessments are inherently vulnerable to social desirability bias and other forms of response distortion, which could potentially influence the results. Future research could incorporate other assessment methods, such as performance-based tasks or observer ratings, to provide a more comprehensive and objective evaluation of creativity.