Abstract
A clinically useful characterization of the cognitive aging process requires the development of valid and robust behavioral tests, with an emphasis on explaining and understanding typical inter-individual variability in cognition. Here, using a dataset that includes behavioral scores collected with the National Institutes of Health Toolbox Cognition Battery (NIHTB-CB) and other auxiliary tests, we examined (1) the differences between young and old adults across different cognitive domains, (2) the strength of across-subject correlations in behavioral test scores, (3) the consistency of low-dimensional behavioral representations across age using factor analysis, and (4) the accuracy of behavioral scores in predicting participants’ age. Our results revealed that (1) elderly females had better verbal episodic memory scores than elderly males, (2) across-subject correlations between behavioral tests varied with age group, (3) although a three-factor model explained the behavioral data in both age groups, some tasks loaded onto different factors in the two groups, and (4) the age-performance relationship (i.e., a regression model linking age to cognitive scores) estimated in one group could not be extrapolated to predict age in the other group, indicating an inconsistency in age-performance relationships across groups. These findings suggest that executive function tests might tap into different cognitive processes in different age groups, which might ultimately mean that a statistically significant between-group difference in test performance does not always reflect differences in the same underlying cognitive processes. Overall, this study calls for more caution when interpreting age-related differences and similarities between age groups with different cognitive abilities, even when the same tests are used.
Introduction
As we age, gradual and natural changes can be observed and quantified across diverse cognitive and affective dimensions. Perhaps most importantly, the rate of such changes varies considerably across people, suggesting different aging trajectories caused by either healthy (normal) aging or pathological (abnormal) aging processes that are mediated by a complex interplay between several mechanisms1. Accurate characterization of aging trajectories along the cognitive dimension relies on the accurate assessment of decline in cognitive abilities, including measures of working memory, processing speed, and executive function2, where age-related changes in processing speed are believed to account for changes in several cognitive domains3. Such gradual cognitive decline might occur while most language skills remain intact, and semantic knowledge can even improve with age4. To offer an accurate characterization of pathological aging at the individual level, it is important to define what ‘normal’ aging is, that is, the range of performance on cognitive tests that could be considered normal or typical. In other words, it is important to fully characterize normal aging to enable clinically useful diagnostic decisions about abnormal aging.
One potential cause of cognitive aging is neural dedifferentiation5,6, which reflects brain responses becoming less precise and selective as people get older. Neural dedifferentiation has been negatively associated with cognitive performance specifically on tasks that require fluid processing and memory in healthy aging6. It has also been reported as an age-related increase in the correlation between different cognitive abilities7,8,9,10. Thus, understanding the role of neural dedifferentiation in cognitive performance is essential to explaining aging trajectories, but dedifferentiation might reflect both age-dependent and age-independent factors (see discussion in6). Therefore, it is of paramount importance to study cognitive aging with tests that are sensitive to age-dependent factors. Here, we show how some widely used cognitive tests can tap into different processes depending on age.
Healthy aging versus pathological aging is commonly reported as significant differences in performance on well-documented cognitive tests, many of them already available in existing validated neuropsychological batteries such as the National Institutes of Health Toolbox Cognitive Battery (NIHTB-CB). Such cognitive tests are not independent, i.e., they are strongly correlated and can thus be assigned to a common underlying domain or construct (e.g., memory, executive function, language, etc.). Indeed, prior work has shown that cognitive scores in different tests can be reduced to a few dimensions (components or latent variables), meaning that such tests are highly dependent across individuals11,12. Consequently, individual scores across many cognitive tests can be reduced to a behavioral representation (profile) with few dimensions. Such a low-dimensional behavioral representation can explain the common variance in the observed measures across subjects, thereby offering a lower-dimensional space into which individual cognitive scores can be projected. One question of interest concerns the consistency of such low-dimensional behavioral spaces across age. Are the identified components similar or different across the lifespan? Do tests measure the same abilities in groups with different demographics? Are behavioral scores, with performance measured as accuracy or reaction time, sensitive enough to subtle age-induced variations? Do age-related differences in behavioral scores reflect changes in domain-specific or domain-general processes13?
These questions are of paramount importance to the study of aging and to the development of reliable neuropsychological assessment tools. To study whether a battery measures the same constructs across age, previous studies have examined measurement invariance, which reflects the equivalence of a given construct across age groups. This is because measurement invariance underpins valid interpretation of group differences in cognitive scores14. Measurement invariance is typically assessed at four nested levels: (1) configural invariance estimates the consistency of the latent cognitive structure, (2) metric invariance measures the similarity of the strength of the relationship between the observed behavioral scores and the assumed cognitive domain (e.g., factor loadings), (3) scalar invariance tests whether the starting points (intercepts) are equivalent for the tested groups, and (4) at the most detailed level, strict invariance examines whether the unexplained variance is consistent across groups. For instance, Kiselica et al. (2020) examined the invariance of a hypothesized factor model (attention, visual, processing speed/executive, memory, and language factors) for the Uniform Dataset (UDS) neuropsychological battery using confirmatory factor analysis across sex, age and education on a sample aged 68.33 ± 10.26 years15. They showed that the factor model achieved at minimum metric invariance over a 1-year follow-up assessment and across different demographic groups. In a 4-year longitudinal study with a diverse sample of participants aged over 54 years, a 2-factor model of memory and executive function showed strict invariance of the neuropsychological battery used in the Brazilian Longitudinal Study of Adult Health (ELSA-Brasil)16. Similarly, another study reported metric invariance in healthy older adults aged 65.7 ± 8.2 years over an average period of two years17.
Using the NIHTB-CB tests, a 5-factor model (vocabulary, reading, episodic memory, working memory and executive function/processing speed) and a second-order two-factor model (fluid vs. crystallized skills) were found to consistently explain the measures recorded from different age groups (aged 20–60 years and 60–85 years)18.
In contrast, other studies showed differences across groups when behavioral data were reduced to a low-dimensional space. For instance, by examining the latent structure of executive function across 3 age groups19, it has been shown that a three-factor model of shifting, inhibition, and updating was a better fit for behavioral data in young adults but not in the older groups. Instead, the data of the older groups were better explained by a two-factor model of updating/inhibition and shifting19. Likewise, a recent study evaluated the factor structure of the NIHTB-CB tests in a sample of 411 cognitively healthy and impaired participants aged between 45 and 94 years old20. A 2-factor model (fluid vs. crystallized skills) was a good fit for participants aged 65 and above, while younger participants’ performance was better explained by a three-factor model of language, executive functions, and memory20. These studies suggested that measurement invariance was not met across the tested groups, raising concerns about the consistency, across age, of the constructs measured by these tests as identified by the factorial representations. This raises questions about the validity of comparisons in cognitive scores between young and old adults, considering that the tests used to measure these cognitive scores might relate to different behavioral dimensions or factors across age groups.
In this context, we examined the assumption of measurement invariance for some well-established behavioral tests administered to both healthy young and older adults. Specifically, we examined quantitative differences between young and older subjects using validated tests from the NIHTB-CB battery and the Mini-Mental State Examination (MMSE) test, in addition to validated supplementary tests. We also highlighted differences in cognitive scores that might be influenced by sex or education. Correlations, in the across-subject dimension, between the different tasks were assessed for each group, with an emphasis on their variability between and within each domain (e.g., executive function, memory). Lastly, with factor analyses, we investigated the qualitative equivalence of the constructs measured by validated tests between the young and old groups. Our analysis revealed that not all tasks were functionally (psychometrically) equivalent in both age groups.
Methods
Data description
This study used the cognitive measures from the neurocognitive aging dataset21,22. The dataset was collected from 163 younger adults, aged 18–34 years (males: 70, females: 93), and 120 older adults, aged 60–89 years (males: 54, females: 66), at two different recording sites. All participants had a minimum of 12 years of schooling. The administered battery included tasks from the NIHTB-CB23: the Rey Auditory Verbal Learning, Picture Sequence Memory, Flanker Inhibitory Control and Attention, Dimensional Change Card Sort (DCCS), List Sorting Working Memory (LSWM), Picture Vocabulary, and Oral Reading Recognition tests. In addition, validated auxiliary tasks were also included: the Verbal Paired Associates from the Wechsler Memory Scale-IV24, the Associative Recall Paradigm25, Shipley-2 Vocabulary26, Trail Making Test B-A (TMT)27, the Reading Span Task28, and the Symbol Digit Modalities Test (SDMT)29. The Mini-Mental State Examination (MMSE) test was also administered to all participants. These tasks were chosen to target the episodic memory, semantic memory, executive functions and processing speed domains21; a detailed description of the tasks can be found in Table S1 of the Online Resource. Demographic data including age, gender, and education level were also collected.
Data preparation
Raw performance measures in each task were obtained21, where each task was evaluated using either accuracy, reaction time, or a composite score of accuracy and reaction time as instructed by the tests’ manuals (Table S1). For the purposes of this study, the following preprocessing steps were carried out: First, the Trail Making B-A score was given as the time difference between the two parts of the test, so we reversed the sign of this score such that, on every test, a higher score reflected better performance14. Second, missing values were imputed using Multivariate Imputation by Chained Equations (MICE) for each age group separately. In the current version of the dataset, less than 5% of each variable was missing, with totals of 16.56% in the young group and 20.83% in the older group. Third, raw scores were transformed to Z-scores for all measures and used for all the following steps, except when comparing task performance between the two age groups with 2-sample t-tests. Lastly, outliers were detected and removed using the boxplot method with a threshold of 3 (resulting in 17 outliers removed from the young group and 8 from the elderly group). All variables followed a normal distribution as indicated by the kurtosis and skewness estimates. However, multivariate non-normality was observed in the young adults group.
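The sign reversal, z-scoring, and boxplot outlier steps above can be sketched as follows. This is an illustrative reconstruction, not the authors’ code (the paper’s analyses were run in R); the column name `trails_b.a` is taken from the figure captions, and the multiplier `k = 3` mirrors the boxplot threshold of 3.

```python
import numpy as np
import pandas as pd

def preprocess(df: pd.DataFrame, reverse_cols=("trails_b.a",), k: float = 3.0):
    """Sign-reverse time-based scores, z-score all measures, and flag outliers
    with the boxplot rule (values outside [Q1 - k*IQR, Q3 + k*IQR])."""
    out = df.copy()
    # Reverse time-difference scores so that higher always means better.
    for col in reverse_cols:
        if col in out:
            out[col] = -out[col]
    # Z-score each measure across participants.
    z = (out - out.mean()) / out.std(ddof=1)
    # Boxplot rule with multiplier k: flag rows with any out-of-fence value.
    q1, q3 = z.quantile(0.25), z.quantile(0.75)
    iqr = q3 - q1
    outlier = ((z < q1 - k * iqr) | (z > q3 + k * iqr)).any(axis=1)
    return z[~outlier], outlier
```

Note that MICE imputation would precede these steps; scikit-learn’s experimental `IterativeImputer` is one possible stand-in for R’s `mice`.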
Data analysis
We implemented three analyses (noted Analysis A, B and C) to address different but complementary questions.
Analysis A involved between-group comparisons of scores for each task and the assessment of across-subject correlations between tasks. Specifically, multiple t-tests were used to assess performance differences between the two age groups and between genders at a Bonferroni-corrected significance level of p < 0.05. Pairwise Pearson r correlation, in the across-subject dimension, was calculated to identify relationships between tasks as well as correlations between tasks and other variables (age, years of education) for each age group separately. Our rationale here is that tasks that covary across participants (i.e., their scores rising and falling together across participants) can be considered part of the same domain (positive correlation) or opposite domains (negative correlation). This analysis yielded a 17 × 17 correlation matrix for each group. The statistical difference between the correlation matrices was assessed using Fisher’s method at a Bonferroni-corrected significance level of p < 0.05. Finally, we also tested whether all participants, regardless of their age, could be clustered into two groups on the basis of their behavioral scores only. For that aim, K-means clustering was used to reclassify participants into groups based on their cognitive performance rather than their age. Specifically, the z-scores of the cognitive measures from all participants were submitted to the K-means clustering method with a pre-specified number of clusters of two. The clustering provided a means to examine whether the behavioral score pattern (range, variance, correlation) is specific to each age group, i.e., whether the data-driven groups obtained with K-means match the original young and older groups. A good match between the data-driven K-means groups and the original age groups would provide strong evidence that variation in behavioral scores does indeed reflect age-related effects.
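Assuming “Fisher’s method” here refers to the standard Fisher r-to-z comparison of two independent correlations (rather than p-value combination), the per-cell test can be sketched as a minimal standalone function:

```python
import math

def compare_correlations(r1: float, n1: int, r2: float, n2: int) -> float:
    """Two-tailed p-value for H0: rho1 == rho2, via Fisher's r-to-z transform.

    z_i = atanh(r_i); under H0, z1 - z2 is approximately normal with
    standard error sqrt(1/(n1-3) + 1/(n2-3)).
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-tailed p-value from the standard normal CDF (via erf).
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
```

Applied to every off-diagonal cell of the two 17 × 17 matrices, this yields the set of p-values that would then be Bonferroni-corrected.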
Analysis B identified the low-dimensional behavioral representations (i.e., constructs) of the included cognitive measures using factor analyses. Confirmatory factor analysis was used to fit a baseline model in each group using a robust maximum likelihood estimator to account for the violation of the multivariate normality assumption30. The cognitive ability model was based on the Cattell-Horn-Carroll (CHC) theoretical model as well as on prior work on the factor structure of the NIHTB-CB battery18,20,31,32. The hypothesized model contained three factors: (1) long-term storage and retrieval (Glr) (Verbal Paired Associates: Immediate Recall, Delayed Recall, Delayed Free Recall; Associative Recall; NIH Cognition Rey Auditory; NIH Cognition Picture Sequence Memory), (2) crystallized intelligence (Gc) (Shipley Vocabulary; NIH Cognition Picture Vocabulary; NIH Cognition Oral Reading Recognition), (3) visual processing (Gv)/processing speed (Gs) (Trail Making Task: B-A; NIH Cognition Flanker Inhibitory Control and Attention; NIH Cognition Dimensional Change Card Sort; NIH Cognition List Sort Working Memory). This model was developed while following the rule of thumb that at least three indicators are required per factor to identify a factor model properly33. Processing speed could not be identified as a separate factor, which is a common problem when an insufficient number of tests is administered to support an independent factor34,35. We relied on the Comparative Fit Index (CFI), Tucker-Lewis Index (TLI), Root Mean Square Error of Approximation (RMSEA), and Standardized Root Mean Square Residual (SRMR) to judge the fit of the model. CFI and TLI values larger than 0.9 and RMSEA and SRMR values smaller than 0.07 would indicate an adequate fit36,37. Next, we tested for measurement invariance using Multi-Group Confirmatory Factor Analysis (MGCFA), including configural, metric, scalar, and strict models.
These models were compared statistically using the difference in chi-square statistics38.
Since full measurement invariance could not be established across groups (see Results Section below), a follow-up Exploratory Factor Analysis (EFA) was conducted to model the underlying structure of the cognitive measures for each age group separately, without imposing strict theoretical constraints39,40. The factorability of the data was evaluated using the Kaiser-Meyer-Olkin (KMO) test and Bartlett’s sphericity test41. KMO represents the percentage of variance among variables that could be considered common variance, with higher KMO values indicating the suitability of the data for factor analysis42. Bartlett’s sphericity test determines whether a correlation matrix differs considerably from the identity matrix43, with significant results indicating the suitability of the variables in the correlation matrix for factor analysis. Several extraction methods can be used for EFA, such as Maximum Likelihood (ML) and Principal Axis Factoring (PAF). Here, we used PAF as it is recommended for relatively small samples44,45,46, with oblique rotation (ProMax) for easier interpretation of the results. To determine the optimal number of factors, a technique called Parallel Analysis (PA) was implemented44. It compares the results of factor analyses conducted on the original data to those from randomly generated synthetic data with the same dimensions as the original dataset, over multiple iterations (here we performed 100 iterations). The optimal number of factors is the largest number of leading components whose eigenvalues from the original data exceed the corresponding mean eigenvalues from the synthetic data. Results from PA were also compared to the scree plot, and all factor counts from the smallest to the largest suggested by these two methods were tested. The best model was selected based on interpretability, where each factor should have a minimum of 3 variables loading saliently on that factor20,47.
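The parallel analysis logic can be sketched as follows. This is a simplified PCA-eigenvalue version for illustration (the paper used PAF), assuming normal random data as the comparison baseline:

```python
import numpy as np

def parallel_analysis(data: np.ndarray, n_iter: int = 100, seed: int = 0) -> int:
    """Horn's parallel analysis: retain the leading factors whose observed
    correlation-matrix eigenvalues exceed the mean eigenvalues obtained from
    random normal data of the same shape."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    # Eigenvalues of the observed correlation matrix, largest first.
    obs = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    # Average eigenvalue spectrum over n_iter random datasets.
    rand = np.zeros(p)
    for _ in range(n_iter):
        noise = rng.standard_normal((n, p))
        rand += np.sort(np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False)))[::-1]
    rand /= n_iter
    # Count leading components where the observed eigenvalue beats chance.
    keep = 0
    while keep < p and obs[keep] > rand[keep]:
        keep += 1
    return keep
```

On data generated from a single latent factor plus noise, the function recovers one factor, matching the retention rule described above.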
Furthermore, a hierarchical clustering analysis with correlation as the distance metric was performed to verify the EFA results in each group. The final step involved comparing the resulting factor model across the two age groups. All factor analyses were performed in RStudio (v1.1.456) using the lavaan package.
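The verification step, clustering the test scores (columns) with 1 − r as the distance between measures, can be sketched as follows; this is an illustrative Python equivalent of the R analysis, with average linkage as an assumed choice since the linkage method is not reported:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def cluster_measures(data: np.ndarray, n_clusters: int) -> np.ndarray:
    """Hierarchical clustering of measures (columns) using 1 - r distance;
    returns a cluster label per measure."""
    corr = np.corrcoef(data, rowvar=False)
    dist = 1.0 - corr
    np.fill_diagonal(dist, 0.0)  # remove floating-point residue on the diagonal
    condensed = squareform(dist, checks=False)
    tree = linkage(condensed, method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")
```

Measures that load on the same factor in the EFA should end up in the same cluster here, which is what the hierarchical analysis was used to confirm.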
In Analysis C, regression models were built to assess how well age can be predicted from behavioral scores. First, an Ordinary Least Squares (OLS) regression with lasso regularization and leave-one-out cross-validation was used to predict the age of participants based on their performance on all tasks (age as the dependent variable and behavioral scores as the independent variables). Second, we tested whether the relationships between the cognitive measures and age observed in one age group hold true in the other age group. This was implemented in two stages: (1) a regression model was fitted using the behavioral scores as independent variables and age as the dependent variable in one age group (e.g., the young adults group), and (2) that estimated regression model was then used to predict the age of the other group (e.g., the older adults group). Given the recent interest in brain age48,49, and assuming that the behavioral tests reflect brain function, this analysis helped us (1) assess how well age is reflected in the cognitive scores measured with the neuropsychological assessment tool used, and (2) test whether scores from old adults represent a natural progression (extrapolation) of scores from the young group. The latter would essentially verify whether the age-behavior relationship is relatively consistent across different age groups.
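Both stages of Analysis C can be sketched with scikit-learn. This is a minimal reconstruction under stated assumptions: the regularization strength `alpha` is hypothetical (the paper does not report it), and the function and variable names are ours:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import r2_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict

def loocv_age_r2(scores: np.ndarray, age: np.ndarray, alpha: float = 0.1) -> float:
    """Leave-one-out cross-validated R^2 for predicting age from test scores
    with a lasso-regularized linear model (within-group analysis)."""
    pred = cross_val_predict(Lasso(alpha=alpha), scores, age, cv=LeaveOneOut())
    return r2_score(age, pred)

def cross_group_mae(train_X, train_age, test_X, test_age, alpha: float = 0.1) -> float:
    """Fit on one age group and predict the other group's age: the
    extrapolation test of stage (2)."""
    model = Lasso(alpha=alpha).fit(train_X, train_age)
    return float(np.mean(np.abs(model.predict(test_X) - test_age)))
```

If the age-behavior relationship differs between groups, the cross-group mean absolute error will be large even when the within-group cross-validated R² is reasonable, which is the pattern reported in the Results.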
Results
Analysis A
All task scores were significantly different between the age groups at the corrected p < 0.05 level. As expected, young adults had higher scores on the MMSE, episodic memory and executive function measures, whereas older adults had significantly better performance on the semantic knowledge tasks. This pattern of differences was also reported by the original authors and held true after controlling for gender, education and recording site21. However, when we examined gender effects within each age group, we found that in the older adult group, female participants had higher scores on the verbal episodic memory tasks (Immediate Recall (p < 0.01), Delayed Recall (p = 0.02), Delayed Free Recall (p = 0.05) and Associative Recall (p = 0.01)) compared to male participants.
Figure 1 shows the correlations between the tasks, including age and years of education, for each age group. Two semantic memory tasks were moderately correlated with education, while the symbol digit modality test and the dimensional change card sorting were moderately negatively correlated with age in the older adult group only. In addition, using Fisher’s method, significantly stronger correlations were observed between the list sorting task and the semantic memory tasks in the young adults than in the older participants (Fig. 2).
Spearman correlation matrix of the neurocognitive measures along with age and education for each age groups. vpa_imm_tot: Verbal Paired Associates Immediate Recall, vpa_delay: Verbal Paired Associates Delayed Recall, vpa_fr_delay: Verbal Paired Associates Delayed Free Recall, associative_recall: Associative Recall, nihcog_rey: NIH Cognition Rey Auditory Verbal Learning: Immediate Recall, nihcog_psm: NIH Cognition Picture Sequence Memory, shipley_vocab: Shipley Vocabulary, nihcog_orr: NIH Cognition Oral Reading Recognition, nihcog_pva: NIH Cognition Picture Vocabulary, nihcog_lswm: NIH Cognition List Sorting Working Memory, nihcog_dccs: NIH Cognition Dimensional Change Card Sort, nihcog_flanker: NIH Cognition Flanker Inhibitory Control and Attention, trails_b.a: Trail Making Task: B-A, sdmt_oral: Symbol Digits Modality Test.
Variables with significant correlation differences (p < 0.05 corrected) between the two age groups. vpa_imm_tot: Verbal Paired Associates Immediate Recall, vpa_delay: Verbal Paired Associates Delayed Recall, vpa_fr_delay: Verbal Paired Associates Delayed Free Recall, associative_recall: Associative Recall, shipley_vocab: Shipley Vocabulary, nihcog_pva: NIH Cognition Picture Vocabulary, nihcog_lswm: NIH Cognition List Sort Working Memory.
Interestingly, the two groups resulting from the data-driven K-means clustering were significantly different in age, with a 90% match to the original age groups. Specifically, Group 1 comprised 13 young and 133 older adults, whereas Group 2 comprised 100 young and 10 older adults.
Analysis B
The proposed cognitive ability model, based on the Cattell-Horn-Carroll (CHC) theoretical model, was tested with CFA in the two groups. CFA results indicated poor fit in both young adults (\(\chi^2_{(74)} = 184.424\), p < 0.001; RMSEA = 0.1; SRMR = 0.126; CFI = 0.771; TLI = 0.719) and older adults (\(\chi^2_{(74)} = 125.741\), p < 0.001; RMSEA = 0.08; SRMR = 0.089; CFI = 0.905; TLI = 0.883). Modification indices were examined, and two pairs of error terms were allowed to correlate in both groups (nihcog_dccs-nihcog_flanker, vpa_imm_tot-vpa_delay); the scores in each pair originate from the same domain and share the same scoring method. This resulted in the following fit indices: young adults (\(\chi^2_{(72)} = 99.246\), p = 0.018; RMSEA = 0.052; SRMR = 0.071; CFI = 0.94; TLI = 0.924) and older adults (\(\chi^2_{(72)} = 112.015\), p = 0.002; RMSEA = 0.069; SRMR = 0.076; CFI = 0.932; TLI = 0.914), suggesting that the model shows acceptable fit in both groups according to the predefined goodness-of-fit criteria.
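As a sanity check, the reported RMSEA values can be approximately reproduced from the chi-square statistics with the conventional (non-robust) point-estimate formula; small deviations are expected because the models were fitted with a robust ML estimator, and the group sizes after outlier removal (146 young, 112 older) are our inference from the Methods:

```python
import math

def rmsea(chi2: float, df: int, n: int) -> float:
    """Point estimate of RMSEA from the model chi-square:
    sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))
```

For example, `rmsea(99.246, 72, 146)` gives roughly 0.051, close to the reported 0.052 for the young adults’ modified model.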
Next, we tested the model with the modification indices using MGCFA. The configural invariance model demonstrated a good fit to the data. The test of the next higher level of invariance, metric invariance, requires that the factor loadings between items and constructs are invariant across groups. This model was rejected according to the chi-square difference test. Partial invariance was tested at this level by relaxing constraints on the variable loadings showing the largest modification index, one variable at a time. First, the loadings of the LSWM from the NIHTB-CB were freely estimated; however, the chi-square test still indicated a lack of metric invariance. After re-examining the modification indices, the Verbal Paired Associates Delayed Recall test loadings were freely estimated across the two groups. This resulted in an improved fit with p = 0.076. Scalar (strong) invariance was established after constraining the intercepts of the model from the previous step to be equal in both groups. Lastly, partial strict invariance was shown at p = 0.054; see Table 1 for the results of the chi-square difference tests.
Next, EFA was conducted to model the underlying structure of the cognitive measures for each group separately. Bartlett and KMO statistics for both groups’ correlation matrices fell within the acceptable ranges for factor analysis: young adults (KMO = 0.72, Bartlett’s test p < 0.001) and older adults (KMO = 0.73, Bartlett’s test p < 0.001). PA resulted in a 3-factor solution for both groups, while the scree plot supported a 4-factor model. Both were evaluated in each group. The 4-factor models were not clearly interpretable because of cross-loadings of some variables (Table S2 of the Online Resource). Hence, the 3-factor models were selected for both age groups. The observed variables’ loadings on the resulting factors are shown in Table 2 along with the eigenvalues, cumulative explained variance, and internal consistency. The factors in both groups could be interpreted as semantic memory, episodic memory, and executive function factors. However, there were differences in the executive function loadings between the two groups. In the young adults’ group, the Symbol Digit Modality Task was associated with the episodic memory tasks, and the List Sorting Working Memory and Trail Making B-A tasks were associated with the semantic memory tasks, thus resulting in a factor with two loadings only. On the other hand, in the older adults’ group, both tasks were associated with the executive functions factor. Also, the Trail Making B-A test had low loading scores in both groups. The resulting clusters from the hierarchical clustering analysis confirmed the EFA results (Figure S1 of the Online Resource).
Analysis C
The leave-one-out cross-validation analysis revealed that the cognitive measures incorporated in the study accounted for 39.6% of the variance of older adults’ age, whereas for the group of young adults, they accounted for a comparatively lower 18.5%. This means that age was better predicted by the behavioral scores in the older group than in the younger group. The results from the cross-group (mixed) regression models are shown in Fig. 3 (i.e., predicting participants’ age in one age group from the regression model fitted on the other age group). The performance of these models was evaluated using the mean absolute error, and both models had very low predictive power, with an error higher than 45 years. This very low predictive power cannot be due to differences in intercepts only, as there was no correlation between true and predicted age (see regression lines of the mixed models in Fig. 3). This implies that the relationship between behavioral scores and age in the young group cannot simply be extrapolated to predict age in the older group, suggesting that age-behavior relationships are age-group-dependent.
Different OLS regression models to predict participants’ age from their cognitive measures. (A) Model fitted on the young adults’ data to predict their age; (B) model fitted on the elderly data to predict the young group’s age; (C) model fitted on the young data to predict the elderly group’s age; (D) model fitted on the elderly data to predict their age.
Discussion
Fundamentally, a between-group comparison of behavioral scores assumes that the tests used to collect these scores measure the same concepts in both groups. There is a rich literature in psychometrics on construct validity, which deals with measurement validity by ensuring that multiple observable or measurable indicators (e.g., test scores) adequately reflect the constructs of interest (e.g., cognitive abilities). Here, we showed that, despite using the same tests in both age groups, some executive function tests did not load onto the same factors in both groups. Our study calls for more caution when comparing scores between different age groups, even when using well-studied tests such as the NIHTB-CB battery.
Overall, the comparisons in task performance between the two age groups were consistent with previous work2,50, including a decline in executive functions presumably due to a significant decrease in brain connectivity51,52, which also mediates53,54 a decline in episodic memory55,56,57. Our results also confirmed previous reports on healthy aging showing relative preservation or even improvement of semantic memory in older people, in particular in tasks that rely on vocabulary storage56,58,59. Although sex differences in cognitive performance have not been consistent across studies60,61, our results revealed better performance in females than males on verbal episodic memory tasks in the older group. This has important clinical implications, as verbal memory scores are typically used in the diagnosis of amnestic mild cognitive impairment (aMCI) and Alzheimer’s disease (AD) dementia, often without sex adjustment. This might relate to a different underlying neuropathology in the transition from aMCI to AD dementia in women, thus introducing a sex bias in the detection of MCI62,63. This sex effect in the older adult group might be due to differences in comorbidities along the aging process, as women tend to have a lower prevalence of focal lesions (microbleeds, lacunes, and cortical infarcts) than men64.
The difference in the associations (i.e., correlations) between the cognitive scores of the two age groups (Fig. 2) might reflect age-related changes in brain networks7,65. Specifically, there was an increase in the shared variance among verbal episodic memory tasks with increasing age, which is a novel observation not reported in previous studies. On the other hand, the association between the list sorting working memory task and the two semantic memory tasks (Shipley vocabulary and picture vocabulary) did not persist with advancing age. A recent study supports this observation, in which the capacity of visual working memory was found to depend on semantic knowledge66. Similarly, the regression results further highlight the complex interplay between age and cognition. Cognitive scores in late adulthood were more sensitive to chronological age than in young adulthood, suggesting that age-related differences in cognition are more pronounced in later life. These results are in line with previous research showing that cognition is not a unitary construct and that cognitive abilities have different developmental trajectories that can vary significantly across the lifespan67,68.
Both Analyses B and C provided evidence that the constructs or domains measured by well-validated cognitive tests differed across age groups. The cross-group regression models (Analysis C) suggested that brain-behavior relationships might not be the same for both age groups. This can explain why the model used to predict age from behavioral scores in one age group failed to accurately predict age in the other age group. Likewise, our factor analysis (Analysis B) offered important insights into the main constructs measured by the NIHTB-CB battery and other auxiliary tests across the lifespan. In contrast to previous reports on similar tasks (e.g.,18), full measurement invariance of the hypothesized CHC model could not be established, likely due to the multifaceted nature of the targeted executive function processes and the limited number of tests available to measure them. Overall, the NIHTB-CB fluid tests have recently been found to have poor to adequate construct validity, with low correlations between the executive function tests in particular69,70. Although partial strict invariance was achieved here, other studies have argued that partial invariance is insufficient to assure meaningful comparisons, given that the fit relied on data-driven modification indices with a high risk of overfitting the sample data71. Indeed, simulation studies have suggested that modification indices might limit the generalizability of the resulting model72,73,74. Likewise, Steinmetz showed that, while partial invariance allows meaningful comparisons of latent means, it may be unreliable for comparing observed scores75. We acknowledge that interpreting invariance for group comparison is not always straightforward (e.g.,76,77,78). Furthermore, our EFA results revealed inconsistent loadings and latent cognitive models across the two age groups.
More specifically, processing speed, executive function and working memory scores were scattered across the three factors in the young adults' group, whereas in the older group they aggregated into a factor separate from the episodic and semantic memory tests. Notably, these three domains are known to account for age-related cognitive decline79. This shift in task loadings could be explained by the fact that memory impairment may manifest at a later age, or at a different rate, than executive function deficits; consequently, these two cognitive constructs may become more distinct with advancing age80. Indeed, recent work has shown that performance in executive functions is better explained by a 2-factor model in older adults compared to a 3-factor model in young adults19,81. Moreover, it has been suggested that as people get older, they rely more on general executive control processes, owing to the reduction and reallocation of cognitive resources, compared with younger adults19. Recent studies have suggested that typical executive function tests may not accurately measure individual differences in executive functions, but rather reflect differences in information processing speed82. Other findings have shown that measures of executive functions do not reveal a distinct latent variable unless they are combined with tasks that assess other cognitive processes, such as processing speed or working memory capacity83. We note that the existing literature acknowledges the difficulty of estimating and interpreting the factor structure of executive functions; overall, it suggests that measures of executive functions might be either unreliable or invalid in measuring individual differences in executive functions82,84.
These findings from the three analyses have implications for how to interpret differences in cognitive scores between young and older adults. For example, the List Sorting Working Memory task was found to have variant intercepts and loadings between the two age groups, and it was also associated with different factors in the two groups. Therefore, a significant difference in performance on that task between young and older adults might reflect different constructs being measured in each group. It is interesting to note that prior work has shown that the List Sorting Working Memory test can load onto different factors, for instance loading together with other working memory tasks18 despite significant correlations with executive function tasks85. We acknowledge that the NIHTB-CB battery has been tested and validated in different groups86, and there is already evidence in the current literature of its clinical usefulness for assessing cognitive abilities in diverse populations87,88,89,90,91,92. However, in line with previous studies, our findings point to the complexity of identifying and measuring individual differences in executive functions82,84, and raise questions about the validity of the constructs measured by the NIHTB-CB20,70. Fundamentally, any comparison between young and older adults should reflect the same construct or domain to ensure valid and interpretable differences: mean differences have no clear interpretation if there is no common structure93.
Several factors could explain our findings. For instance, our analysis did not explicitly account for the influence of different subcomponents of a given construct that may vary across age groups94,95, which could translate into the same factor being dominated by different subcomponents. Furthermore, the older group might show more pronounced heterogeneity than the younger group, owing for instance to significant inter-individual differences in the rates of decline in abilities96. Although our K-means clustering yielded a grouping that matched the original age groups, we cannot rule out that subgroups might exist within the older group. This possibility raises interesting questions (hypotheses) that warrant future research. For example, if the older group can be subdivided into homogeneous subgroups, one can hypothesize that a subgroup with the same identified low-dimensional representation (i.e. the same factors) as the young group would most likely follow a healthy aging process, whereas a subgroup with a different low-dimensional representation might indicate a high risk of pathological aging for the individuals belonging to that subgroup.
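The K-means check mentioned above can be sketched as follows. This is a minimal illustration on synthetic scores (not the study's data), using a plain implementation of Lloyd's algorithm; in practice a library routine (e.g. scikit-learn's KMeans) would typically be used.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic behavioral matrix (subjects x tests), not the study's data:
# two groups with shifted mean performance, mimicking young vs. older adults.
young = rng.normal(loc=0.8, scale=0.3, size=(60, 5))
old = rng.normal(loc=0.2, scale=0.3, size=(60, 5))
X = np.vstack([young, old])
true_group = np.array([0] * 60 + [1] * 60)

def kmeans(X, k, n_iter=50, n_restarts=5, seed=0):
    """Lloyd's algorithm with random restarts; keeps the lowest-inertia run."""
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_restarts):
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            centroids = np.array([X[labels == j].mean(axis=0)
                                  if np.any(labels == j) else centroids[j]
                                  for j in range(k)])
        inertia = np.sum((X - centroids[labels]) ** 2)
        if inertia < best_inertia:
            best_inertia, best_labels = inertia, labels
    return best_labels

labels = kmeans(X, k=2)
# Agreement with the age grouping, up to an arbitrary label permutation.
agreement = max(np.mean(labels == true_group), np.mean(labels != true_group))
print(f"cluster / age-group agreement: {agreement:.2f}")
```

High agreement here means the clusters recover the age grouping; the caveat in the text is that such a match at k = 2 does not exclude finer subgroup structure at larger k.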
There are some conceptual and methodological issues that need to be acknowledged. First, the cross-sectional design of this study limits inferences about any causal associations between age differences and cognitive ability, in addition to the poor interpretability and validity of individual differences due to confounding effects from within-subject associations97. Second, executive functions involve a heterogeneous set of processes that entertain complex relationships with processing speed and working memory98,99. This might complicate the interpretation of factor analysis results, making executive function tests not well suited for factor analyses (see discussion in7). Third, the dataset14 used here has missing values that we imputed for each age group separately. Fourth, the performance indicators used could also account for the non-invariance between the models of the two age groups: the common time component in the Flanker, DCCS and TMT scores (see scoring criteria in Table S1) may have contributed to the age differences in the latent factor structure, as reaction speed differs significantly between the two age groups; a similar pattern has been reported previously20. Fifth, the similarity between tests was computed as a correlation across subjects. While this covariance-based approach is well suited to capturing variability across subjects (e.g.100), it is also notoriously sensitive to outliers101; we therefore down-weighted their influence before computing the correlations between tests. Sixth, measurement invariance is necessary for an uncomplicated interpretation of validity correlations and differences in test scores across groups102. However, observed correlations (cf. Fig. 2) may also vary across age groups because of (1) real inherent differences in correlations between the same invariant constructs, or (2) existing differences in the reliability of the test scores.
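As an illustration of the fifth point, limiting the influence of outliers before computing correlations can be sketched as below. Winsorizing is used here as a simple stand-in for the robust down-weighting procedure (the study follows the approach of Devlin et al.101); the data are synthetic.

```python
import numpy as np

def winsorize(x, p=0.05):
    """Clip the tails of x to its p-th and (1-p)-th quantiles."""
    lo, hi = np.quantile(x, [p, 1 - p])
    return np.clip(x, lo, hi)

def robust_corr(x, y, p=0.05):
    """Pearson correlation after winsorizing both variables."""
    return np.corrcoef(winsorize(x, p), winsorize(y, p))[0, 1]

rng = np.random.default_rng(2)
n = 80
x = rng.normal(size=n)
y = 0.7 * x + rng.normal(scale=0.5, size=n)
x[0], y[0] = 8.0, -8.0  # one extreme outlier distorts the plain estimate

plain = np.corrcoef(x, y)[0, 1]
robust = robust_corr(x, y)
print(f"plain r = {plain:.2f}, robust r = {robust:.2f}")
```

A single extreme subject is enough to mask a strong across-subject association in the plain Pearson estimate, which is why some form of down-weighting is needed before comparing test-by-test correlation patterns across groups.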
Put another way, different patterns of correlations between the same set of cognitive scores can be observed in different age groups even under measurement invariance, meaning that variability in correlations does not necessarily reflect differences in cognitive constructs. Last but not least, the generalizability of the 3-factor structure, identified here in a data-driven way with exploratory factor analysis, needs future investigation. Our main assumptions during the identification of the 3-factor structure were that (1) test scores based on reaction times and accuracy accurately measured task performance in both age groups, (2) correlated tests represented a common factor, (3) weakly correlated tests represented distinct factors, and (4) each identified factor was equated with one broad construct. Discrepancies between studies in the exact underlying factorial structure of the NIHTB-CB battery might be explained by differences in how those assumptions were satisfied.
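One common data-driven way to choose the number of factors before an EFA is the Kaiser criterion (retain factors with correlation-matrix eigenvalues above 194), which can be sketched as follows. The scores below are synthetic, generated from a known 3-factor structure; this illustrates the criterion only, not the study's actual analysis pipeline.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200

# Synthetic scores from a known 3-factor structure (not the study's data):
# three latent abilities, each driving three of nine observed tests.
latent = rng.normal(size=(n, 3))
loadings = np.zeros((9, 3))
for f in range(3):
    loadings[3 * f:3 * f + 3, f] = 0.8
scores = latent @ loadings.T + rng.normal(scale=0.5, size=(n, 9))

# Kaiser criterion: retain factors whose correlation-matrix eigenvalue > 1.
R = np.corrcoef(scores, rowvar=False)
eigvals = np.linalg.eigvalsh(R)[::-1]  # descending order
n_factors = int(np.sum(eigvals > 1.0))
print(f"retained factors: {n_factors}")
```

When the generating structure is clean, the criterion recovers the true factor count; with real test batteries, assumptions (1)-(4) above determine whether the retained factors map onto broad constructs.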
In conclusion, our study shows that age affects the factorial structure of cognitive assessments in neurotypical adults. It raises questions about the validity of significant differences between age groups when the tests used measure different constructs. It also highlights the challenge of comprehensively measuring age-dependent changes in cognitive abilities with a set of tests that span multiple cognitive dimensions. For instance, some tasks, such as the List Sorting Working Memory, can behave differently across age groups, presumably because their successful execution places variable demands on different processes in each age group. In this context, it is important that users of psychometrically validated cognitive assessment batteries are aware of the tests that are not factorable, as well as of the impact of demographics on task performance (here, age and gender were shown to affect some tests). Overall, existing test batteries have been extremely useful in the clinical setting for both diagnostic and prognostic purposes. Future work needs to investigate how the constructs (factors) measured by these test batteries generalize across populations with diverse demographics, spoken languages, affective skills, socioeconomic backgrounds and co-morbidities.
Data availability
The neurocognitive aging data used in this study are publicly available within the Open Science Framework project “Goal-Directed Cognition in Older and Younger Adults”: https://doi.org/10.17605/OSF.IO/YHZXE.
References
Stern, Y., Barnes, C. A., Grady, C., Jones, R. N. & Raz, N. Brain reserve, cognitive reserve, compensation, and maintenance: operationalization, validity, and mechanisms of cognitive resilience. Neurobiol Aging83, 124–129. https://doi.org/10.1016/j.neurobiolaging.2019.03.022 (2019).
Murman, D. L. The Impact of Age on Cognition. Semin Hear.36(3), 111. https://doi.org/10.1055/S-0035-1555115 (2015).
Salthouse, T. A., Atkinson, T. M. & Berish, D. E. Executive functioning as a potential mediator of age-related cognitive decline in normal adults. J Exp Psychol Gen132(4), 566–594. https://doi.org/10.1037/0096-3445.132.4.566 (2003).
Haitas, N., Amiri, M., Wilson, M., Joanette, Y. & Steffener, J. Age-preserved semantic memory and the CRUNCH effect manifested as differential semantic control networks: An fMRI study. PLoS One16(6), e0249948. https://doi.org/10.1371/JOURNAL.PONE.0249948 (2021).
Logan, J. M., Sanders, A. L., Snyder, A. Z., Morris, J. C. & Buckner, R. L. Under-recruitment and nonselective recruitment: dissociable neural mechanisms associated with aging. Neuron33(5), 827–840. https://doi.org/10.1016/S0896-6273(02)00612-8 (2002).
Koen, J. D., Srokova, S. & Rugg, M. D. Age-related neural dedifferentiation and cognition. Curr. Opin. Behav. Sci.32, 7–14. https://doi.org/10.1016/j.cobeha.2020.01.006 (2020).
Bock, O., Haeger, M. & Voelcker-Rehage, C. Structure of executive functions in young and in older persons. PLoS One. 14 (5). https://doi.org/10.1371/journal.pone.0216149 (2019).
Baltes, P. B. & Lindenberger, U. Emergence of a powerful connection between sensory and cognitive functions across the adult life span: a new window to the study of cognitive aging? Psychol. Aging. 12 (1), 12–21. https://doi.org/10.1037/0882-7974.12.1.12 (1997).
de Frias, C. M., Lövdén, M., Lindenberger, U. & Nilsson, L. G. Revisiting the dedifferentiation hypothesis with longitudinal multi-cohort data. Intelligence35(4), 381–392. https://doi.org/10.1016/J.INTELL.2006.07.011 (2007).
Anstey, K. J., Hofer, S. M. & Luszcz, M. A. Cross-sectional and longitudinal patterns of dedifferentiation in late-life cognitive and sensory function: the effects of age, ability, attrition, and occasion of measurement. J Exp Psychol Gen132(3), 470–487. https://doi.org/10.1037/0096-3445.132.3.470 (2003).
Bisogno, A. L. et al. A low-dimensional structure of neurological impairment in stroke. Brain Commun.3 (2). https://doi.org/10.1093/braincomms/fcab119 (2021).
Halai, A. D., Woollams, A. M. & Lambon Ralph, M. A. Using principal component analysis to capture individual differences within a unified neuropsychological model of chronic post-stroke aphasia: revealing the unique neural correlates of speech fluency, phonology and semantics. Cortex. https://doi.org/10.1016/j.cortex.2016.04.016 (2016).
Baciu, M. et al. Strategies and cognitive reserve to preserve lexical production in aging. Geroscience43(4), 1725–1765. https://doi.org/10.1007/S11357-021-00367-5 (2021).
Avila, J. F. et al. Measurement Invariance of Neuropsychological Measures of Cognitive Aging Across Race/Ethnicity by Sex/Gender Groups. Neuropsychology34(1), 3. https://doi.org/10.1037/NEU0000584 (2020).
Kiselica, A. M., Webber, T. A. & Benge, J. F. The uniform dataset 3.0 neuropsychological battery: Factor structure, invariance testing, and demographically adjusted factor score calculation. J Int Neuropsychol Soc26(6), 576–586. https://doi.org/10.1017/S135561772000003X (2020).
Bertola, L. et al. Longitudinal measurement invariance of neuro-psychological tests in a diverse sample from the ELSA-Brasil study. Brazilian J. Psychiatry. 43 (3), 254–261. https://doi.org/10.1590/1516-4446-2020-0978 (2021).
Moreira, P. S. et al. Longitudinal measurement invariance of memory performance and executive functioning in healthy aging. PLoS One. 13 (9). https://doi.org/10.1371/journal.pone.0204012 (2018).
Mungas, D. et al. Factor Structure, Convergent Validity, and Discriminant Validity of the NIH Toolbox Cognitive Health Battery (NIHTB-CHB) in Adults. J. Int. Neuropsychol. Soc.20(6), 579–587. https://doi.org/10.1017/S1355617714000307 (2014).
Glisky, E. L. et al. Differences between young and older adults in unity and diversity of executive functions. Aging Neuropsychol Cogn28(6), 829–854. https://doi.org/10.1080/13825585.2020.1830936 (2021).
Ma, Y. et al. Latent Factor Structure and Measurement Invariance of the NIH Toolbox Cognition Battery in an Alzheimer’s Disease Research Sample. J. Int. Neuropsychol. Soc.27(5), 412–425. https://doi.org/10.1017/S1355617720000922 (2021).
Spreng, R. N. et al. Neurocognitive aging data release with behavioral, structural and multi-echo functional MRI measures. Sci. Data9(1), 1–11. https://doi.org/10.1038/s41597-022-01231-7 (2022).
Spreng, R. N. Goal-Directed Cognition in Older and Younger Adults. [Online]. Available: https://osf.io/yhzxe/
Gershon, R. C. et al. NIH toolbox for assessment of neurological and behavioral function. Neurology80(11 Supplement 3), S2–S6. https://doi.org/10.1212/WNL.0B013E3182872E5F (2013).
Wechsler Adult Intelligence Scale, 4th Edition (WAIS-IV) | Pearson. Accessed: 03 Oct 2023. [Online]. Available: https://www.pearsonassessments.com/store/usassessments/en/Store/Professional-Assessments/Cognition-%26-Neuro/Wechsler-Adult-Intelligence-Scale-%7C-Fourth-Edition/p/100000392.html
Brainerd, C. J. et al. Dual-retrieval models and neurocognitive impairment. J Exp Psychol Learn Mem Cogn40(1), 41–65. https://doi.org/10.1037/A0034057 (2014).
Kaya, F., Delen, E. & Bulut, O. Test Review: Shipley-2 Manual. J Psychoeduc Assess30(6), 593–597. https://doi.org/10.1177/0734282912440852 (2012).
Reitan, R. M. Validity of the trail making test as an indicator of organic brain damage. Percept Mot Skills8(3), 271–276. https://doi.org/10.2466/PMS.1958.8.3.271 (1958).
Daneman, M. & Carpenter, P. A. Individual differences in working memory and reading. J Verbal Learning Verbal Behav19(4), 450–466. https://doi.org/10.1016/S0022-5371(80)90312-6 (1980).
Symbol Digit Modalities Test. Accessed: 12 Nov 2023. [Online]. Available: https://psycnet.apa.org/doiLanding?doi=10.1037%2Ft27513-000
Li, C. H. Confirmatory factor analysis with ordinal data: Comparing robust maximum likelihood and diagonally weighted least squares. Behav. Res. Methods48(3), 936–949. https://doi.org/10.3758/S13428-015-0619-7/TABLES/6 (2016).
Flanagan, D. P. & Dixon, S. G. The Cattell-Horn‐Carroll Theory of cognitive abilities, in Encyclopedia of Special Education, Wiley, https://doi.org/10.1002/9781118660584.ese0431. (2014).
Caemmerer, J. M., Keith, T. Z. & Reynolds, M. R. Beyond individual intelligence tests: application of Cattell-Horn-Carroll Theory. Intelligence. 79https://doi.org/10.1016/j.intell.2020.101433 (2020).
Floyd, F. J. & Widaman, K. F. Factor analysis in the development and refinement of clinical assessment instruments. Psychol. Assess.7(3), 286–299. https://doi.org/10.1037/1040-3590.7.3.286 (1995).
Gross, A. L., Khobragade, P. Y., Meijer, E. & Saxton, J. A. Measurement and Structure of Cognition in the Longitudinal Aging Study in India–Diagnostic Assessment of Dementia. J. Am. Geriatr. Soc.68, S11. https://doi.org/10.1111/JGS.16738 (2020).
Park, L. Q. et al. Confirmatory factor analysis of the ADNI neuropsychological battery. Brain Imaging Behav6(4), 528–539. https://doi.org/10.1007/s11682-012-9190-3 (2012).
Hu, L. T. & Bentler, P. M. Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Struct. Equ Model.6 (1), 1–55. https://doi.org/10.1080/10705519909540118 (1999).
Browne, M. W. & Cudeck, R. Alternative ways of assessing Model Fit. Sociol. Methods Res.21 (2), 230–258. https://doi.org/10.1177/0049124192021002005 (1992).
Bollen, K. A. Structural equations with latent variables. Struct. Equations Latent Variables. 1–514. https://doi.org/10.1002/9781118619179 (1989).
Flora, D. B. & Flake, J. K. The purpose and practice of exploratory and confirmatory factor analysis in psychological research: Decisions for scale development and validation. Can J Behaviour Sci49(2), 78–88. https://doi.org/10.1037/CBS0000069 (2017).
Schmitt, T. A. Current methodological considerations in exploratory and confirmatory factor analysis. J. Psychoeduc Assess.29 (4), 304–321. https://doi.org/10.1177/0734282911406653 (2011).
KMO and Bartlett’s Test - IBM Documentation. Accessed: 20 May 2023. [Online]. Available: https://www.ibm.com/docs/en/spss-statistics/28.0.0?topic=detection-kmo-bartletts-test
Kaiser, H. F. An index of factorial simplicity. Psychometrika39(1), 31–36. https://doi.org/10.1007/BF02291575/METRICS (1974).
Bartlett, M. S. A Note on the Multiplying Factors for Various χ2 Approximations. J. R Stat. Soc. Ser. B Stat. Methodol.16(2), 296–298. https://doi.org/10.1111/J.2517-6161.1954.TB00174.X (1954).
Watkins, M. W. Exploratory factor analysis: a guide to best practice. J. Black Psychol.44 (3), 219–246. https://doi.org/10.1177/0095798418771807 (2018).
Santos, N. C. et al. Exploring the factor structure of neurocognitive measures in older individuals. PLoS One10(4), e0124229. https://doi.org/10.1371/journal.pone.0124229 (2015).
Asbee, J., Kelly, K., McMahan, T. & Parsons, T. D. Factor analysis of the virtual reality Stroop task. J Clin Exp Neuropsychol44(8), 604–617. https://doi.org/10.1080/13803395.2022.2150749 (2022).
Velicer, W. F. & Fava, J. L. Effects of variable and subject sampling on factor pattern recovery. Psychol. Methods. 3 (2), 231–251. https://doi.org/10.1037/1082-989X.3.2.231 (1998).
Lu, H., Li, J., Fung, A. W. T. & Lam, L. C. W. Diversity in verbal fluency performance and its associations with MRI-informed brain age matrices in normal ageing and neurocognitive disorders. CNS Neurosci. Ther.29 (7). https://doi.org/10.1111/CNS.14144 (2023).
Korbmacher, M. et al. Bio-psycho-social factors’ associations with brain age: a large-scale UK Biobank diffusion study of 35,749 participants. Front. Psychol.14https://doi.org/10.3389/FPSYG.2023.1117732 (2023).
Salthouse, T. A. Selective review of cognitive aging. J. Int. Neuropsychol. Soc.16 (5), 754. https://doi.org/10.1017/S1355617710000706 (2010).
Fjell, A. M., Sneve, M. H., Grydeland, H., Storsve, A. B. & Walhovd, K. B. The disconnected brain and executive function decline in aging. Cerebral Cortex27(3), 2303–2317. https://doi.org/10.1093/CERCOR/BHW082 (2017).
Idowu, M. I. & Szameitat, A. J. Executive function abilities in cognitively healthy young and older adults—A cross-sectional study. Front. Aging Neurosci.15, 976915. https://doi.org/10.3389/FNAGI.2023.976915/BIBTEX (2023).
Isingrini, M. & Taconnat, L. Episodic memory, frontal functioning, and aging. Rev Neurol (Paris)164(Suppl 3), S91–S95. https://doi.org/10.1016/S0035-3787(08)73297-1 (2008).
Guillaume, C. et al. Early age-related changes in episodic memory retrieval as revealed by event-related potentials. Neuroreport20(2), 191–196. https://doi.org/10.1097/WNR.0B013E32831B44CA (2009).
Tromp, D., Dufour, A., Lithfous, S., Pebayle, T. & Després, O. Episodic memory in normal aging and Alzheimer disease: Insights from imaging and behavioral studies. Ageing Res Rev24(Pt B), 232–262. https://doi.org/10.1016/J.ARR.2015.08.006 (2015).
Luo, L. & Craik, F. I. M. Aging and memory: a cognitive approach. Can. J. Psychiatry. 53, 346–353. https://doi.org/10.1177/070674370805300603 (2008).
Liampas, I. et al. Longitudinal episodic memory trajectories in older adults with normal cognition. Clin. Neuropsychol.37 (2), 304–321. https://doi.org/10.1080/13854046.2022.2059011 (2023).
Verhaeghen, P. Aging and vocabulary scores: a meta-analysis. Psychol Aging18(2), 332–339. https://doi.org/10.1037/0882-7974.18.2.332 (2003).
Hoffman, P. An individual differences approach to semantic cognition: divergent effects of age on representation, retrieval and selection. Sci. Rep.8 (1). https://doi.org/10.1038/s41598-018-26569-0 (2018).
Pauls, F., Petermann, F. & Lepach, A. C. Gender differences in episodic memory and visual working memory including the effects of age. Memory21(7), 857–874. https://doi.org/10.1080/09658211.2013.765892 (2013).
Golchert, J. et al. Women outperform men in verbal episodic memory even in oldest-old age: 13-year longitudinal results of the AgeCoDe/AgeQualiDe study. J Alzheimer’s Dis69(3), 857–869. https://doi.org/10.3233/JAD-180949 (2019).
Sundermann, E. E. et al. Improving Detection of Amnestic Mild Cognitive Impairment with sex-specific cognitive norms. J. Alzheimers Dis.84 (4), 1763–1770. https://doi.org/10.3233/JAD-215260 (2021).
Sundermann, E. E. et al. Female advantage in verbal memory. Neurology87(18), 1916–1924. https://doi.org/10.1212/WNL.0000000000003288 (2016).
Vinke, E. J. et al. Trajectories of imaging markers in brain aging: the Rotterdam Study. Neurobiol. Aging71, 32–40. https://doi.org/10.1016/J.NEUROBIOLAGING.2018.07.001 (2018).
Bagarinao, E. et al. Reorganization of brain networks and its association with general cognitive performance over the adult lifespan. Sci. Rep.9(1), 11352. https://doi.org/10.1038/s41598-019-47922-x (2019).
Starr, A., Srinivasan, M. & Bunge, S. A. Semantic knowledge influences visual working memory in adults and children. PLoS One15(11), e0241110. https://doi.org/10.1371/journal.pone.0241110 (2020).
Hartshorne, J. K. & Germine, L. T. When does cognitive functioning peak? The asynchronous rise and fall of different cognitive abilities across the life span. Psychol Sci26(4), 433–443. https://doi.org/10.1177/0956797614567339 (2015).
Nichols, E. S., Wild, C. J., Owen, A. M. & Soddu, A. Cognition across the lifespan: investigating age, sex, and other sociodemographic influences. Behav. Sci.11(4), 51. https://doi.org/10.3390/bs11040051 (2021).
Scott, E. P., Sorrell, A. & Benitez, A. Psychometric Properties of the NIH Toolbox Cognition Battery in healthy older adults: reliability, validity, and Agreement with Standard Neuropsychological tests HHS Public Access. J. Int. Neuropsychol. Soc.25 (8), 857–867. https://doi.org/10.1017/S1355617719000614 (2019).
Ott, L. R. et al. Construct validity of the NIH Toolbox Cognitive Domains: a comparison with conventional neuropsychological assessments. Neuropsychology. 36 (5), 468–481. https://doi.org/10.1037/neu0000813 (2022).
MacCallum, R. C., Roznowski, M. & Necowitz, L. B. Model modifications in covariance structure analysis: the problem of capitalization on chance. Psychol Bull (1992). Accessed: 15 Aug 2024. [Online]. Available: https://psycnet.apa.org/fulltext/1992-25917-001.html
MacCallum, R. C. Specification searches in covariance structure modeling. Psychol Bull (1986). Accessed: 15 Aug 2024. [Online]. Available: https://psycnet.apa.org/fulltext/1986-26275-001.html
French, B. F. & Finch, W. H. Multigroup confirmatory factor analysis: locating the invariant referent sets. Struct Equ Modeling15(1), 96–113. https://doi.org/10.1080/10705510701758349 (2008).
Jorgensen, T. D. Applying permutation tests and multivariate modification indices to configurally invariant models that need respecification. Front Psychol8, 250494 (2017).
Steinmetz, H. Analyzing observed composite differences across groups: is partial measurement invariance enough? Methodology. 9 (1), 1–12. https://doi.org/10.1027/1614-2241/A000049 (2013).
Tse, W. W. Y., Lai, M. H. C. & Zhang, Y. Does strict invariance matter? Valid group mean comparisons with ordered-categorical items. Behav Res Methods56(4), 3117–3139. https://doi.org/10.3758/S13428-023-02247-6 (2024).
Shi, D., Song, H. & Lewis, M. D. The impact of partial factorial invariance on cross-group comparisons. Assessment26(7), 1217–1233. https://doi.org/10.1177/1073191117711020 (2017).
Robitzsch, A. & Lüdtke, O. Why full, partial, or approximate measurement Invariance are not a prerequisite for meaningful and valid Group comparisons. Struct. Equ Model.30 (6), 859–870. https://doi.org/10.1080/10705511.2023.2191292 (2023).
Rozas, A. X. P., Juncos-Rabadán, O. & González, M. S. R. Processing speed, inhibitory control, and working memory: three important factors to account for age-related cognitive decline. Int J Aging Hum Dev66(2), 115–130. https://doi.org/10.2190/AG.66.2.b (2008).
Ferguson, H. J., Brunsdon, V. E. A. & Bradford, E. E. F. The developmental trajectories of executive function from adolescence to old age. Sci. Rep.11, 1382. https://doi.org/10.1038/s41598-020-80866-1 (2021).
Karr, J. E. et al. The unity and diversity of executive functions: a systematic review and re-analysis of latent variable studies. Psychol. Bull, 144 (11) 1147. https://doi.org/10.1037/BUL0000160 (2018).
Löffler, C., Frischkorn, G. T., Hagemann, D., Sadus, K. & Schubert, A. L. The common factor of executive functions measures nothing but speed of information uptake. Psychol Res. https://doi.org/10.1007/s00426-023-01924-7 (2024).
Schubert, A. L. & Rey-Mermet, A. Does process overlap theory replace the issues of general intelligence with the issues of attentional control?. J Appl Res Mem Cogn8(3), 277–283. https://doi.org/10.1016/j.jarmac.2019.06.004 (2019).
Hedge, C., Powell, G. & Sumner, P. The reliability paradox: Why robust cognitive tasks do not produce reliable individual differences. Behav. Res. Methods50(3), 1166–1186. https://doi.org/10.3758/S13428-017-0935-1/TABLES/5 (2018).
Tulsky, D. S. et al. NIH Toolbox Cognition Battery (NIHTB-CB): list sorting test to measure working memory. J. Int. Neuropsychol. Soc.20 (6), 599–610. https://doi.org/10.1017/S135561771400040X (2014).
Fox, R. S. et al. Uses of the NIH toolbox® in clinical samples: a scoping review. Neurol Clin Pract12(4), 307–319. https://doi.org/10.1212/CPJ.0000000000200060 (2022).
Shields, R. H. et al. Validation of the NIH Toolbox Cognitive Battery in intellectual disability. Neurology94(12), e1229–e1240. https://doi.org/10.1212/WNL.0000000000009131 (2020).
Zelazo, P. D. et al. NIH toolbox cognition battery (CB): validation of executive function measures in adults. J. Int. Neuropsychol. Soc.20 (6), 620–629. https://doi.org/10.1017/S1355617714000472 (2014).
Heaton, R. K. et al. Reliability and validity of composite scores from the NIH Toolbox Cognition Battery in adults. J. Int. Neuropsychol. Soc.20 (6), 588–598. https://doi.org/10.1017/S1355617714000241 (2014).
Hackett, K. et al. Utility of the NIH Toolbox for assessment of prodromal Alzheimer’s disease and dementia. Alzheimers Dement. (Amst)10, 764–772. https://doi.org/10.1016/J.DADM.2018.10.002 (2018).
Halter, C. M. et al. Construct validation of NIH Toolbox Cognition Battery premorbid cognitive functioning scores in Black and White older americans with and without mild cognitive impairment. J. Int. Neuropsychol. Soc.https://doi.org/10.1017/S1355617723000425 (2023).
Carlozzi, N. E. et al. Construct validity of the NIH Toolbox Cognition Battery in individuals with stroke. Rehabil Psychol62(4), 443–454. https://doi.org/10.1037/REP0000195 (2017).
Fischer, R. & Karl, J. A. A primer to (cross-cultural) multi-group invariance testing Possibilities in R. Front. Psychol, 10, (2019). https://doi.org/10.3389/FPSYG.2019.01507
McAlister, C. & Schmitter-Edgecombe, M. Executive function subcomponents and their relations to everyday functioning in healthy older adults. J Clin Exp Neuropsychol38(8), 925–940. https://doi.org/10.1080/13803395.2016.1177490 (2016).
Maldonado, T., Orr, J. M., Goen, J. R. M. & Bernard, J. A. Age differences in the subcomponents of executive functioning. J Gerontol B Psychol Sci Soc Sci75(6), e31–e55. https://doi.org/10.1093/GERONB/GBAA005 (2020).
Idowu, M. I. & Szameitat, A. J. Executive function abilities in cognitively healthy young and older adults-A cross-sectional study. Front. Aging Neurosci.15https://doi.org/10.3389/FNAGI.2023.976915 (2023).
Hsu, S., Poldrack, R., Ram, N. & Wagner, A. D. Observed correlations from cross-sectional individual differences research reflect both between-person and within-person correlations. PsyArXivhttps://doi.org/10.31234/osf.io/zq37h (2022).
McCabe, D. P., Roediger, H. L., McDaniel, M. A., Balota, D. A. & Hambrick, D. Z. The relationship between working memory capacity and executive functioning: evidence for a common executive attention construct. Neuropsychology24(2), 222. https://doi.org/10.1037/A0017619 (2010).
Albinet, C. T., Boucard, G., Bouquet, C. A. & Audiffren, M. Processing speed and executive functions in cognitive aging: how to disentangle their mutual relationship?. Brain Cogn.79(1), 1–11. https://doi.org/10.1016/J.BANDC.2012.02.001 (2012).
Seghier, M. L. & Price, C. J. Dissociating functional brain networks by decoding the between-subject variability. Neuroimage45(2), 349–359. https://doi.org/10.1016/J.NEUROIMAGE.2008.12.017 (2009).
Devlin, S. J., Gnanadesikan, R. & Kettenring, J. R. Robust estimation and outlier detection with correlation coefficients. Biometrika62(3), 531–545. https://doi.org/10.1093/BIOMET/62.3.531 (1975).
Bowden, S. C. et al. Measurement invariance of core cognitive abilities in heterogeneous neurological and community samples. Intelligence32(4), 363–389. https://doi.org/10.1016/J.INTELL.2004.05.002 (2004).
Funding
This work was funded by Khalifa University [grant numbers FSU-2022-006 and RC2-2018-022].
Author information
Authors and Affiliations
Contributions
O.H. was involved in the data analysis, figures preparation, interpretation of the results, and manuscript writing. M.L.S was involved in the conception of the study, interpretation of the results, manuscript writing and funding.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Hatahet, O., Seghier, M.L. The validity of studying healthy aging with cognitive tests measuring different constructs. Sci Rep 14, 23880 (2024). https://doi.org/10.1038/s41598-024-74488-0
This article is cited by
- Alzheimer disease seen through the lens of sex and gender. Nature Reviews Neurology (2025)