Main

The physical and social environment of the USA is characterized by racism—a long-standing and deeply entrenched system of biased patterns and practices that infuse all aspects of society1,2. The impact of racism on health is multidimensional and multi-generational, and despite efforts to dismantle (and counter-efforts to uphold) its impact, racism uniquely harms health across the life course1,2. In the USA, racism is targeted to and harms members of racialized groups, including Black, American Indian, Hispanic American, Asian, Native Hawaiian, Pacific Islander, Middle Eastern/North African and multiracial groups, although its specific form and function probably vary. Empirical data documenting the impact of racism on the health of Black Americans have been extensively described for over a century, including in the seminal work of W. E. B. Du Bois in 1900, Visualizing Black America, declaring that “the problem of the twentieth century is the problem of the color line”3. Although the exact terminology and definitions regarding various forms of racism in the medical literature are actively evolving, systemic racism (SR) has been described as reflecting the normalization and legitimization of historical and contemporary discrimination that produces adverse outcomes for racial minorities4,5. Importantly, the concept of SR is crucial to the recognition that racism is situated within a system of institutions, practices, social structures and beliefs of an entire society, not only within institutions or among individuals. As noted by Hardimon6, “systemic racism is a form of racism that is inter-institutional and inter-structural”.

Building on the extensive body of literature on racism and health, we define SR as operating across three broad domains: interpersonal, structural and cultural (Fig. 1). Interpersonal racism refers to differential, inequitable and discriminatory treatment of targeted racial groups by individuals, whether deliberately or unintentionally1,2. It is characterized by prejudice and discrimination that involve differential assumptions about the ability, motives and intentions of others, as well as differential actions towards others based on their race. Two prominent and validated measures of interpersonal racism7 are the Experiences of Discrimination developed by Krieger8 and the Everyday Discrimination Scale developed by Williams et al. in 19979. Self-reported experience of discrimination is associated with adverse health outcomes2,7,10,11.

Fig. 1: Conceptual overview of mechanisms linking SR and health outcomes.

The dashed arrows indicate mutually reinforcing dynamics between component dimensions of SR. HPA, hypothalamic–pituitary–adrenal; CNS, central nervous system.

Structural racism refers to laws, policies and processes that are implicitly or explicitly codified in American institutions and have discriminatory impact1,2. Even when expressly prohibited by law, covert discrimination and inequitable application of policies have been documented in the criminal, employment and housing systems and in health outcomes12,13,14. A prime example of structural racism is the Jim Crow laws, which enforced legal racial discrimination in southern US states until 1964 and remain linked to adverse health outcomes decades after their end15,16,17,18,19. Another example is racial residential segregation or residential redlining, also a well-established predictor of adverse health outcomes20,21. More recently, Brown et al. developed a latent construct of state-level structural racism that spans multiple domains, reflecting the multifaceted and interconnected nature of structural racism as a system of subordination and superordination22,23. They found that a latent measure of structural racism provides a better model fit and explains more variation in health than summative indices (or single indicators) of structural racism, which ignore relationships among items and assume that they equally reflect the construct22,23. Several other recent studies illustrate the utility of latent measures of structural racism24,25.

Cultural racism focuses on the ideology, or the system of beliefs and ideas shared by a majority, as well as the tangible cultural manifestations of those beliefs in arts, customs and ways of life26. It describes broadly held ideas, biases and stereotypes that provide the basis for the system of white supremacy24,27, which are then used to justify inequitable and harmful societal customs. In essence, racist ideology fuels racist actions reflected in the dominant culture. It provides racially oppressive systems the support and justification needed to function6. Williams et al. defined cultural racism as belief systems (that is, ideologies), images and norms (that is, customs) that devalue, marginalize and subordinate people of colour2, and Michaels et al. as the water surrounding the iceberg upholding other dimensions of racism26. Self-reported data on ideologies can be difficult to measure due to social desirability bias and self-censoring in survey interviews; however, tangible expressions of ideology in the form of hate crimes and public symbols of the Confederacy—cultural expressions of ideology—can be empirically measured. The racial animus index, first described by Stephens-Davidowitz in 201428, is one indicator of cultural racism that has recently been adapted for studies of health outcomes among Black Americans29,30,31. Derived from an internet-search-based proxy of area racism, it utilizes the proportion of internet searches on racist terms, similar to studies of sensitive social subjects (for example, religion and firearms) shown to reflect underlying cultural beliefs more accurately than conventional survey tools28, given perceptions of online anonymity. Higher racial animus was associated with increased risk of all-cause mortality29 and depression31 among Black Americans.

The literature on cultural racism at the population level is in its infancy, with only one prior multidimensional validated measure of cultural racism to our knowledge, which was entirely based on self-report indicators and largely subject to social desirability bias32. The purpose of this analysis is thus to develop and validate a measure of cultural racism that is theory-informed and derived from a combination of self-report and non-self-report indicators, as well as clarify its unique role, separate from and in combination with other SR domains, in harming the health of Black Americans. Here we describe the approach and results of a latent cultural racism measure, including robust assessments of validity and reliability. Latent measures are useful for assessing broader aspects of directly unobservable constructs such as culture or ideology, while minimizing the risk of biased estimates. To demonstrate the empirical utility of the cultural racism measure (defined predominantly on data from 2017 due to data availability) and estimate its contribution to Black–white US health disparities, we examine its association with five key health outcomes from 2018 to 2021: overall mortality, cardiovascular-specific mortality, diabetes-specific mortality, cancer-specific mortality and life expectancy at birth. We have made our cultural racism measure publicly available to advance scientific inquiry on its uniquely harmful contributions to various health outcomes in the USA.

Results

Indicators of state-level cultural racism

We leveraged established theoretical and conceptual frameworks describing anti-Black cultural racism in selecting our key study indicators. We began with Bonilla-Silva’s 1997 reframing of racism as a “racial ideology of a racialized social system”27 and Michaels and colleagues’26 decomposition of cultural racism into three unique dimensions: (1) values and beliefs of white superiority; (2) shared social consciousness encompassing shared attitudes, biases and prejudice about different racial groups; and (3) expression in language, symbol and media of the dominant society26. Finally, we incorporated Cogburn’s33 identification of distinct frameworks that describe cultural expressions of ideological racism as (1) symbolic boundaries, cultural processes that maintain health inequities, racial frames and ideologies (for example, colour-blind racism) that invisibly shape decision-making (for example, low white support for Medicaid expansion), stereotype threats that evoke physiological and psychological responses, and situational threats (for example, implicit bias) that produce chronic stress; (2) status hierarchies, the informal culture that shapes appropriate behaviour within institutions centred on white normativity, and placement of physical symbols of presumed superiority (for example, symbols of the Confederacy); and (3) collective imaginaries, cultural tools that convey false notions of white superiority and Black inferiority (for example, hate groups), including racialized symbols, imagery and language.

To develop the cultural racism factor (CRF) at the population level, we identified nine indicators aligned with prior frameworks and hypothesized to be components of a latent measure that will encapsulate elements of both racist ideologies and cultural expressions of those ideologies. While cultural racism probably operates at multiple geographic levels (individual, neighbourhood, county and state), this analysis focuses on cultural racism at the state level because states exert a powerful influence as a key institution that shapes population health and allocation of resources22, and relevant data are most consistently available at the state level. In addition, while the form and function of cultural racism probably vary by target group (for example, Black versus American Indian versus Hispanic), this analysis focuses on anti-Black racism given the extensive literature clarifying its theoretical and conceptual underpinnings for Black Americans. The nine indicators evaluated are described below and summarized in Table 1.

  1.

    Public symbols of the Confederacy: This refers to publicly sponsored symbols honouring Confederate leaders or soldiers or the Confederate States of America, including monuments and statues, flags, holidays and other observances, names of schools, highways, parks, bridges, counties, cities, lakes, dams, roads, military bases and public works. Such symbols convey presumed superiority of the white race and reflect racial ideologies that serve as a source of psychosocial stress to racialized groups33. We obtained data on the number of public symbols of the Confederacy for each US state in 2017 from the Southern Poverty Law Center34. We excluded public symbols in the dataset that had a removal date before 2017. However, we separately examined public symbols regardless of removal date in sensitivity analyses. We normalized to the state population size by dividing the counts in each state by the total population from the US Census Bureau for 2017, multiplied by 100,000.

  2.

    Number of hate groups: Hate groups are characterized by collectively organizing and promoting biases, as reflected in their official statements or principles, statements of leaders, or activities, to attack an entire class of people35. Hate groups reflect norms of hate and degradation that target racialized groups35. We obtained the number of hate groups in each state for 2017 from the Southern Poverty Law Center36 and normalized to state population size.

  3.

    Hate crime rate: Hate crimes refer to individual criminal acts motivated by bias against a protected class, including race35. Evidence suggests that hate crime rate at the state level reflects the extent to which a greater climate of hate is directed towards racialized groups, driven by an underlying perception of members of those groups as threats to white individuals35. States differ in their conceptual definitions of what legally constitutes hate crimes and also in their reporting patterns, contributing to under-reporting and potential misclassification35. We estimated the hate crime rate at the state level in 2017 using data from the Federal Bureau of Investigation Hate Crime Statistics37, defined as the percentage of hate crimes in a given state that were classified as anti-race bias.

  4.

    Online racial animus: This refers to the racially charged online search rate in a geographic area, based on the percentage of Google search queries that include racially charged language28,31. We estimated online racial animus for 2017 in each state using Google Trends; we searched for four racial slurs typically directed at Black Americans (see Supplementary Method 1 for the list of racial slurs included). Google Trends38 reports an index of search activity, which “measures the fraction of queries that include the term in question in the chosen geography at a particular time relative to the total number of queries at that time”31, scaled to range between 0 and 100.

  5.

    Racial resentment: This refers to a set of racialized, and particularly anti-Black, beliefs and attitudes against policies seeking to redress racial discrimination, fuelled by the ideology that racial inequality is due to Black Americans violating traditional American values such as self-reliance, work ethic and discipline39. This type of racial resentment has been shown to influence public opposition to political and policy preferences that support social welfare and affirmative action39. Smith et al.39 developed estimates of racial resentment for US states by applying multilevel regression and post-stratification on pooled public opinion survey data (1988–2016) from the American National Election Survey, which included the standard four-question battery to measure racial resentment. Here we used the Smith et al. state-level measure of racial resentment for 2016 (the most recent year available), considering each state’s racialized composition (see Supplementary Method 1 for additional methodological details).

  6.

    Anti-immigrant sentiments: This refers to public opinion towards immigrants, shown to impact immigration policies at the state level40,41. Anti-immigrant sentiments strongly reflect the prevailing ideological climate in a state, driven by political narratives and media reporting that paint negative stereotypes of immigrant characteristics including race40. We used a measure of anti-immigration sentiments developed by Butz and Kehrberg41, who applied the multilevel regression and post-stratification method on two national surveys, the American National Election Study and the General Social Survey, for the years 2004, 2006 and 2008. We derived an aggregate measure of anti-immigrant sentiments as the average of estimated anti-immigrant sentiments for each state across the three time periods (see Supplementary Method 1 for additional methodological details).

  7.

    Lack of support for Medicaid expansion among white non-Hispanic (NH) residents: This refers to the level of public support for Medicaid expansion, shown to be racialized, with racialized groups having appreciably higher levels of support for welfare policies such as Medicaid expansion than white Americans42. Moreover, levels of white public support for Medicaid expansion notably drove state adoption decisions, with support by racialized groups having little to no impact42. Grogan and Park42 previously estimated state-level Medicaid expansion support by applying the multilevel regression and post-stratification methodology to four national-level survey datasets from the Kaiser Family Foundation monthly tracking poll, which specifically assessed support for Medicaid expansion between July 2012 and April 2013. Here we obtained data on the prevalence of state-level Medicaid expansion support among white residents (see Supplementary Method 1 for additional methodological details), reverse coded to capture the lack of white public support for Medicaid expansion.

  8.

    Lack of community-oriented policing (COP) implementation: COP refers to a philosophy that aims to empower communities, with an emphasis on finding solutions to community problems such as crime, quality of life and neighbourhood conditions43. The key components of COP are community partnerships, community mobilization, organizational transformation and problem-solving43. The use of COP strategies in law enforcement agencies in the USA dates back to the mid-1990s and has since increased in popularity as a means to improve police–citizen interactions43. In the context of our study, COP is a negative predictor of cultural racism; however, we do not assume that all COP programmes are unambiguously good or necessarily effective, or warrant anything less than critical attention. Prior studies have measured COP using community policing data from the Law Enforcement Management and Administrative Statistics (LEMAS) survey43,44. We obtained data on COP implementation for each US police department/agency in the LEMAS dataset for 2016, the most recent data prior to our study period. State-level prevalence of the lack of COP implementation was estimated as the proportion of police departments in a state that did not implement COP, assessed on the basis of the police department/agency’s response to a survey question regarding having a specialized unit with full-time or designated personnel for COP45 (see Supplementary Method 1 for additional methodological details).

  9.

    Implicit bias of white NH residents: Implicit bias refers to prejudicial behaviours or judgements that are automatically activated in an individual’s mind even when the underlying stereotype or attitude is not consciously endorsed46. Implicit bias is widely measured using the white–Black Implicit Association Test (IAT)47, which assesses the relative strength of an individual’s implicit associations, attitudes or stereotypes for white Americans relative to the strength of the associations for Black Americans46. Here we estimated aggregate implicit bias by averaging at the state level IAT scores for white NH residents who completed Harvard University’s Project Implicit test in 201748 (see Supplementary Method 1 for additional methodological details).
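Several of the indicators above rest on simple state-level transformations: counts scaled per 100,000 residents, a support percentage reverse coded into lack of support, and the share of agencies without a COP unit. The following is a minimal Python sketch of those three steps, not the authors' code. The function names and input shapes are hypothetical, and the `100 - pct` form of reverse coding is an assumption, since the text says only that the variable was reverse coded.

```python
def rate_per_100k(count, population):
    """Scale a state-level count (e.g. Confederate symbols, hate groups)
    to a rate per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 100_000 / population


def reverse_code_percent(pct_support):
    """Turn percentage support into percentage lacking support.
    Assumption: reverse coding on a 0-100 scale is 100 - value."""
    if not 0 <= pct_support <= 100:
        raise ValueError("percentage must lie in [0, 100]")
    return 100.0 - pct_support


def share_without_cop(agency_has_cop_unit):
    """Proportion of agencies in a state reporting no COP unit.
    `agency_has_cop_unit` is an iterable of booleans, one per agency."""
    flags = list(agency_has_cop_unit)
    if not flags:
        raise ValueError("need at least one agency")
    return sum(1 for has_unit in flags if not has_unit) / len(flags)


if __name__ == "__main__":
    # Illustrative inputs only; these are not values from the study data.
    print(rate_per_100k(30, 3_000_000))
    print(reverse_code_percent(40.0))
    print(share_without_cop([True, False, False, True, False]))
```

Keeping each transformation in its own function makes it easy to swap the denominator year or the reverse-coding rule without touching the rest of a pipeline.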

Table 1 Cultural racism indicators and state-level descriptive statistics for study variables (n = 50 states and DC)

Model development and psychometric properties

Distributions of CRF indicators

The distributions of all nine CRF indicator variables, definitions and data sources are presented in Table 1. Across states on average, there were 0.4 hate groups per 100,000 (minimum (min), 0.05 in NM; maximum (max), 3.75 in DC); the online racial animus index was 69 (min, 46 in HI; max, 100 in WV); the percentage of anti-immigrant sentiment was 53% (min, 40% in CA; max, 64% in NC); the anti-race hate crime rate was 0.57 (min, 0.25 in WY; max, 0.93 in AL); the lack of COP implementation prevalence was 60% (min, 0% in DC; max, 100% in HI); the prevalence of racial resentment was 0.63 (min, 0.54 in VT; max, 0.70 in AR); there were 0.79 public symbols of the Confederacy per 100,000 people, ranging from 0 in 17 states (AK, CO, CT, HI, IL, KS, ME, MI, ND, NE, NH, NJ, RI, UT, VT, WI and WY) to a maximum of 5.56 in MS; white NH resident implicit bias index was 0.36 (min, 0.29 in OR; max, 0.42 in ND); and the lack of white support for Medicaid expansion was 55% (min, 46.68% in HI; max, 61.22% in OK).

Confirmatory factor analysis models

Model fit statistics for the four specified confirmatory factor analysis (CFA) models (Models 1–4) are shown in Table 2. Model 1 included all nine CRF indicators and no correlated errors; this model had poor fit statistics (comparative fit index (CFI) and Tucker Lewis index (TLI), <0.9; root mean square error of approximation (RMSEA), >0.1). Model 3 excluded two poorly loading indicators (hate groups per 100,000 and anti-race hate crime) and specified two pairs of correlated error terms; this model had the best fit statistics (CFI = 0.96; TLI = 0.94; Bayesian information criterion (BIC), 938.24; RMSEA = 0.09). Model 2 (all nine indicators and two pairs of correlated errors) and Model 4 (excluding the two poorly loading indicators and specifying a two-factor model) both had fit statistics that were inferior to those of Model 3. Model 3 was considered the final model as it satisfied both statistical and theoretical considerations of model fit. The factor loadings of all seven indicators in Model 3 were statistically significant, with absolute values ranging from a minimum of 0.37 for white implicit bias to 0.90 for a lack of white support for Medicaid expansion. In this final model, we retained white resident implicit bias (factor loading, 0.37; 90% confidence interval (CI), (0.17, 0.56)), following expert recommendation to account for the sampling distribution of factor loading scores, in this case with an upper limit >0.40 (ref. 49). The average variance extracted (AVE) for the final CFA model was 0.45 (90% CI, (0.38, 0.54)). The final CFA model that produced the CRF, along with indicator variables and loadings, is graphically presented in Fig. 2, and factor loadings with 90% CIs are summarized in Extended Data Table 1.

Table 2 CRF latent models and fit statistics
Fig. 2: Final one-factor CFA model.

The final model accounted for correlated errors between (1) online racial animus and lack of white support for Medicaid expansion and (2) the number of Confederate symbols per 100,000 and racial resentment. CFA loadings are indicated by solid arrows from the latent construct to each indicator; the standardized covariance of correlated errors is indicated with dashed arrows between indicators with correlated errors. The sample size is 51 (that is, all 50 US states and DC).

CRF distribution

The CRF distribution is shown in Fig. 3. Across states, there was a mean (s.d.) CRF value of 0 (1.03), ranging from −2.33 in Hawaii to 1.99 in West Virginia. Spatial mapping of the CRF across all 50 states (top panel of Fig. 3) indicates that the CRF was higher in the southeastern, southwestern and north-central US regions, including 17 states (WY, SD, NE, KS, MO, OK, AR, LA, MS, AL, GA, SC, TN, KY, WV, IN and TX) with CRF values ≥0.46, the highest CRF tertile.

Fig. 3: Distribution of CRF scores for each US state and DC.

The top panel shows the geographical distribution of the CRF, while the bar chart in the lower panel shows the value for each state and DC.

CRF tests for reliability

The CRF showed excellent reliability, defined in terms of both internal consistency, which measures the overall degree to which component indicators are inter-correlated, and homogeneity, which measures the extent to which the indicators capture a single underlying construct. The CRF had an average inter-item correlation of 0.31 (s.d., 0.31) for the seven indicators in the final model, well within the range (0.15 to 0.5) reflecting good internal consistency with minimal redundancy50. Evidence of homogeneity for the CRF is supported by better fit statistics for the one-factor final model than for two-factor models (Table 2). To assess test–retest reliability and construct stability, we compared the current CRF with a version of the CRF defined using indicators from 2004. Although only four indicators were available in 2004, we observed a strong correlation between the CRF in both periods (r = 0.83; 90% CI, (0.75, 0.89); P = 3.2 × 10⁻¹⁴; Extended Data Fig. 1). Lastly, the CRF showed evidence of construct reliability based on McDonald’s omega51, estimated using structural equation modelling (0.89; 95% CI, (0.83, 0.93)), a measure that is especially useful for evaluating internal consistency for multidimensional constructs and latent variables where item loadings may vary50.
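The average inter-item correlation reported above can be illustrated with a short Python sketch. This is not the authors' implementation: the function names and the dictionary-of-columns input are hypothetical, and the sketch assumes plain Pearson correlations averaged over all unique indicator pairs.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def inter_item_summary(columns):
    """Mean and s.d. of pairwise correlations across indicators.
    `columns` maps an indicator name to its list of state-level values;
    at least three indicators are needed for the s.d. to be defined."""
    rs = [pearson(columns[a], columns[b]) for a, b in combinations(columns, 2)]
    return mean(rs), stdev(rs)


def within_guideline(avg_r, low=0.15, high=0.5):
    """Check an average inter-item correlation against the 0.15-0.5 range
    cited in the text."""
    return low <= avg_r <= high


if __name__ == "__main__":
    # Toy data for three made-up indicators; not values from the study.
    toy = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 3, 2, 1]}
    print(inter_item_summary(toy))
    print(within_guideline(0.31))  # 0.31 is the value reported in the text
```

With seven indicators there are 21 unique pairs, so the mean and s.d. summarize 21 correlations.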

CRF tests for validity

We conducted a series of validity tests to determine whether the CRF latent measure performs as expected in relation to other metrics and to determine its empirical contribution above and beyond other measures of SR. We assessed external validity (Fig. 4) by evaluating whether the CRF varied by state-level Confederate or Jim Crow law status. We observed significantly higher median (25th percentile, 75th percentile) CRF values in Confederate states (1.04 (0.43, 1.38)) than in non-Confederate states (−0.24 (−1.06, 0.30)), and in Jim Crow law states (0.76 (0.14, 1.49)) than in non–Jim Crow law states (−0.44 (−1.28, 0.10)), showing strong external validity by Confederate (Wilcoxon rank-sum statistic, W = 77, P = 1.1 × 10⁻³) and Jim Crow law (W = 95, P = 2.1 × 10⁻⁵) status. We assessed convergent validity (Extended Data Fig. 2) by comparing the CRF with the only other existing anti-Black cultural racism index by Price et al.32. The two measures showed a significant and positive, but not perfect, correlation (r = 0.66; 90% CI, (0.50, 0.77); P = 1.7 × 10⁻⁷), highlighting that they are similar but not identical. We assessed incremental validity (Extended Data Table 2) by including the CRF in models along with the anti-Black racism index of Price et al. and three established structural racism measures (state racism index (SRI)52, standardized structural racism factor (SSRF)24 and Index of Concentration in the Extremes for Racialized Economic Segregation (ICERACEINC)21). The CRF consistently had significant and large effect sizes in all five models (by itself, with the anti-Black racism index and with each structural racism measure), highlighting a distinct contribution of the CRF above and beyond the other measures. We assessed discriminant validity using three separate lines of evidence.
First, the CRF was negatively correlated with two measures of structural racism (ICERACEINC: r = −0.5; 90% CI, (−0.66, −0.31); P = 2 × 10⁻⁴; SSRF: r = −0.47; 90% CI, (−0.66, −0.23); P = 3.1 × 10⁻³) but not the third (SRI: r = 0.15; 90% CI, (−0.09, 0.37); P = 0.29; Extended Data Fig. 3). However, the absolute values for all three correlation estimates were less than 0.67 (square root of the AVE for the CFA model from which the CRF was derived), supporting discriminant validity for the CRF relative to structural racism53,54. Second, the correlation between the CRF and the anti-Black cultural racism index of Price et al. (r = 0.66; 90% CI, (0.50, 0.77); d.f. = 49; P = 1.7 × 10⁻⁷) was significantly larger than the correlation between the CRF and each structural racism measure, specifically the SSRF (r = −0.47; 90% CI, (−0.66, −0.23); d.f. = 35; P = 3.1 × 10⁻³), the SRI (r = 0.15; 90% CI, (−0.09, 0.38); d.f. = 48; P = 0.29; z = 3.43; P = 3 × 10⁻⁴) and ICERACEINC (r = −0.5; 90% CI, (−0.66, −0.31); d.f. = 49; P = 2 × 10⁻⁴; z = 8.34; P = 0.00), according to a one-tailed Pearson and Filon (1898) test of significance for the difference between two overlapping correlations. This test is appropriate when considering a directional hypothesis comparing whether one correlation is significantly greater or significantly smaller, but not both. The convergent validity tests provided empirical evidence that the CRF is similar to the anti-Black cultural racism measure, as expected, while the divergent validity tests provided evidence that the CRF was indeed not similar to the structural racism measures examined, as expected. The third line of evidence for discriminant validity showed that the CRF was not correlated with the state-level normalized count of aviation accidents/incidents reported by the National Transportation Safety Board for the year 2017 (r = −0.17; 90% CI, (−0.39, 0.06); d.f. = 49; P = 0.22), a variable for which there is no theoretical expectation of a relationship with cultural racism (Supplementary Fig. 1).
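The first line of evidence above compares each correlation with the square root of the AVE. The arithmetic can be sketched in Python as follows, using only the values reported in the text. The function names are hypothetical. The Fisher z-transform interval is an assumption, since the text does not state how its CIs were computed, although for r = 0.66 and n = 51 it agrees with the reported 90% CI to two decimals.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist

AVE = 0.45  # average variance extracted, as reported for the final CFA model

# Correlations between the CRF and each structural racism measure, as reported.
STRUCTURAL_R = {"ICE_raceinc": -0.50, "SSRF": -0.47, "SRI": 0.15}


def below_sqrt_ave(ave, correlations):
    """For each measure, is |r| smaller than sqrt(AVE)?
    This is the criterion the text uses for discriminant validity."""
    threshold = sqrt(ave)
    return {name: abs(r) < threshold for name, r in correlations.items()}


def fisher_ci(r, n, level=0.90):
    """Approximate CI for a Pearson correlation via Fisher's z-transform.
    Assumption: the source does not state which CI method it used."""
    z = atanh(r)
    se = 1.0 / sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return tanh(z - crit * se), tanh(z + crit * se)


if __name__ == "__main__":
    print(round(sqrt(AVE), 2))
    print(below_sqrt_ave(AVE, STRUCTURAL_R))
    print(tuple(round(v, 2) for v in fisher_ci(0.66, 51)))
```

The criterion is deliberately simple: a construct should share more variance with its own indicators (the AVE) than with any other construct (the squared correlation).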

Fig. 4: Distribution of state-level CRF values by history of Jim Crow laws and history of Confederacy.

a, The CRF and history of Jim Crow laws. b, The CRF and history of Confederacy. The CRF value for each state is indicated using state abbreviation labels. In each box plot, the centre line indicates the median, the centre solid dot indicates the mean, the box limits indicate the upper and lower quartiles, and the whiskers indicate values within 1.5 times the interquartile range. The sample size is 51.

Associations between CRF and health outcomes

As presented in Table 3, we tested the associations between CRF and five separate health outcomes (age-adjusted all-cause mortality per 100,000 for 2018–2021, age-adjusted cancer-specific mortality per 100,000, age-adjusted cardiovascular-specific mortality per 100,000, age-adjusted diabetes-specific mortality per 100,000 and life expectancy at birth) for NH Black and NH white adults. The CRF was significantly associated with all health outcomes examined except for cancer mortality, irrespective of statistical adjustments. In the demographics-adjusted model, each unit increase in the CRF was associated with 136 per 100,000 (95% CI, (90, 182)) additional all-cause deaths. In the fully adjusted model that additionally adjusted for socio-economic status and percentage NH Black, each unit increase in the CRF was associated with 111 per 100,000 (64, 159) additional all-cause deaths. For cardiovascular-specific mortality, each unit increase in the CRF was associated with 40 (21, 59) additional deaths in the demographics-adjusted model, which was attenuated to 28 (7, 49) cardiovascular deaths in the fully adjusted model. For diabetes-specific mortality, each unit increase in the CRF was associated with 6 per 100,000 (3, 9) and 4 per 100,000 (1, 7) additional deaths in the demographics-adjusted and fully adjusted models, respectively. For cancer-specific mortality, each unit increase in the CRF was associated with 11 per 100,000 (5, 17) additional cancer-specific deaths in the demographics-adjusted model, but this effect became attenuated and non-significant in the fully adjusted model to 7 per 100,000 (0, 14). For life expectancy at birth, each unit increase in the CRF was associated with a two-year decline (−2 (−3, −1)) in the demographics-adjusted model. An expanded table showing covariate estimates is presented in Extended Data Table 3, while predictions based on the demographics-adjusted models are graphically presented in Extended Data Fig. 4. 
These associations across a range of health outcomes provide further evidence of predictive validity of the CRF.

Table 3 Association between the CRF and age-adjusted (per 100,000) all-cause and cause-specific mortality among US NH Black and NH white adults

CRF effect measure modification

We next evaluated whether the associations between the CRF and each health outcome varied by race, geographical region and population density (Extended Data Table 4a,b). Interaction terms between the CRF and race, region and population density were non-significant for all-cause mortality, cardiovascular-specific mortality and life expectancy at birth (P_int > 0.05). However, the CRF was associated with significantly higher cancer-specific mortality rates in the South (~19 per 100,000; 95% CI, (10, 28); P = 4.1 × 10⁻⁴), but not in the Midwest, West or Northeast (P > 0.05; Fig. 5a). Furthermore, diabetes-specific mortality showed an interaction effect with race (P_int < 0.01), driven by a stronger association of the CRF with diabetes-specific mortality among NH Black adults (~8 per 100,000; 95% CI, (2, 14); P = 6.8 × 10⁻³) than among NH white adults (~6 per 100,000; 95% CI, (3, 8); P = 1.0 × 10⁻⁴; Fig. 5b). While the CRF was associated with adverse health outcomes for both NH Black and NH white adults across the range of outcomes studied, the observed mortality and life expectancy disparities between the groups persisted across the entire spectrum of the CRF.

Fig. 5: Interaction effects for the CRF and cancer mortality by region and for the CRF and diabetes mortality by race.

a, The CRF and cancer mortality by region. b, The CRF and diabetes mortality by race. The results are from random-effects linear regression models for cause-specific deaths per 100,000 including interaction terms and adjusted for demographic factors (total population size, study year and geographic region). The sample size is 204 (that is, four observations for each of 50 US states and DC). The double segment on the y axis in a indicates truncation of values between zero and the next value on the y axis, and the y-axis ranges differ between a and b. The lines in the plots represent estimated β coefficients, while the shaded error bands represent 95% CIs around the estimated β coefficients, indicating the range within which the true values are expected to fall with 95% certainty. The width of the bands reflects the uncertainty in the estimates, with narrower bands signifying greater precision. Fully reported statistical results for the interaction models on which a and b are based are presented in Supplementary Table 6.

Sensitivity analyses

To determine the robustness of our main results to various analytical assumptions, we conducted a series of sensitivity analyses. First, we examined the sensitivity of the CRF model to different specifications of police–citizen interaction indicators. We observed that the lack of COP implementation indicator outperformed various specifications of police brutality indicators (police killings of unarmed NH Black Americans considered as a proportion, rate per 100,000 or Black–white ratio; descriptive statistics are summarized in Supplementary Table 1), on the basis of factor loading and statistical significance (Supplementary Method 2a and Supplementary Table 2). While the three variables derived from police killing data had poor factor loadings and did not reach statistical significance in the CFA base model, the lack of COP implementation indicator loaded significantly on the latent factor in Model 1 (Supplementary Method 2a and Supplementary Table 2). Second, regarding the exclusion of public symbols of the Confederacy with removal dates prior to 2017, Model 3 (the final CFA model) outperformed the sensitivity model that included an alternative version of the number of public symbols of the Confederacy per 100,000 from which no exclusions were made (that is, it included symbols with removal dates prior to 2017; Supplementary Method 2b and Supplementary Table 3). Finally, we present empirical estimates for the associations between each CRF measure indicator and age-adjusted mortality, showing significant associations for all but two (implicit bias of white NH residents and number of Confederate symbols per 100,000) of the seven indicator variables retained in the final CFA model (Extended Data Table 5).

Discussion

We developed a latent measure of anti-Black cultural racism at the state level, measurably advancing the scientific literature by providing a theory-informed, validated, empirical measure that can be used across a variety of health outcomes alone or in combination with other domains of SR to investigate health disparities. The CRF considers the inextricable function of racist ideologies and beliefs in driving cultural racism, which in combination with other domains—structural and interpersonal racism—sustain a fundamentally racist US system. The CRF was associated with higher overall, cardiovascular, diabetes and cancer mortality and contributed to lower life expectancy at birth. The adverse impact of the CRF was significant and non-trivial for both NH Black and NH white adults; however, across all levels of the CRF, NH Black adults experienced higher mortality rates. These findings highlight that cultural racism is a contemporary, consistent and universally harmful social exposure for US adults that must be addressed.

SR, including its interpersonal, cultural and/or structural domains, is harmful to health and operates via mechanisms that include racial resentment, discriminatory policies leading to consistent underinvestment in racialized groups, and greater exposure to harmful social, physical and environmental conditions. These conditions shape and/or modify community and individual health practices by limiting access to health-promoting resources and increasing harmful coping behaviours, leading to poor health outcomes. SR also drives harmful psychosocial and biophysical exposures, extensively described with the ecosocial theory and concept of weathering55,56,57,58. At issue are chronic psychosocial stress59,60,61 and adverse biophysical exposures (for example, greater exposure to air pollution), premature physiologic wear and tear or allostatic load, hyperactivation of the hypothalamic–pituitary–adrenal axis, and subsequent dysregulation of the neuroendocrine, cardiovascular, immune and metabolic systems62,63,64,65,66. In fact, SR has been associated with changes in epigenetic mechanisms67, telomere length and gene expression68,69,70, including a recent study showing that exposure to racial discrimination may account for over 50% of race-related differences in pro-inflammatory transcription-factor activity71.

Cultural racism, specifically, is thought to operate through four distinct pathways1: (1) by driving societal policies that maintain unequal access to resources; (2) by leading to individual-level conscious and unconscious bias that results in discrimination against non-dominant racial groups; (3) by causing stereotype threat, which is the activation of anxieties and expectations in stigmatized groups when negative stereotypes are evoked; and (4) by causing internalized racism, which is the acceptance of negative stereotypes as true by some members of stigmatized groups1. Cultural racism complements and upholds structural and interpersonal racism domains, as described in the well-regarded “water surrounding the iceberg” analogy26 as well as the analogy of a “bird cage that locks members of a racially defined group into subordinate positions”6. Thus, efforts to dismantle structural and interpersonal racism are likely to be ineffective without also understanding and addressing the context of cultural racism.

The CRF latent measure makes a distinct contribution to the empirical literature as it was a stronger predictor of health outcomes than individual indicators alone and remained robust to statistical adjustments for various state-level confounders. It also addresses the notably sparse empirical literature regarding the unique contribution of cultural racism at the population level in driving adverse health outcomes26,33,72. The few existing studies rely on single indicators or have methodological limitations that limit generalizability. For example, two studies have reported that the Google search rate for racially charged terms is positively associated with a variety of adverse health outcomes29,30, and cultural racism measured as area-level racial bias was shown to be associated with increased death rates, more so for Black than for white adults73. Price et al. developed a 31-item index of anti-Black cultural racism, based on survey responses from three sources (2 items from the American National Election Survey, 9 from the General Social Survey and 20 from Project Implicit)32. However, the reliance on solely self-reported individual-level survey responses to measure racial attitudes makes this index vulnerable to social desirability bias and misclassification32, while a focus on only anti-Black racial bias may not account for harmful pro-white ideologies that exist even when social actors do not explicitly acknowledge anti-Black sentiments (that is, colour-blind racism)74. While these studies are informative and match theoretical speculations of a major role for cultural racism in health outcomes, we provide empirical evidence that a latent measure capturing multiple indicators of ideology, attitudes and behaviour, including both self-report and non-self-report data, is feasible and important.

We found consistent associations between the CRF and mortality among Black and white adults, in line with prior theoretical frameworks highlighting distinct pathways through which cultural racism can negatively impact health1. As expected, a higher CRF was associated with higher mortality for Black adults than for white adults, an association that was consistent across all levels of CRF exposure, supporting the idea that cultural racism contributes tangibly to, but does not fully explain, the racial gap in mortality. We also observed that the CRF increased all-cause, cancer-specific, cardiovascular-specific and diabetes-specific mortality rates for white adults. Cultural racism thus harms the health of both Black and white adults. This finding is distinct from studies of structural racism, which has been shown to be consistently detrimental for Black adults with mixed findings for white adults11,22,75,76. However, the pattern is consistent with the theory of universal harm described in relation to structural sexism, where all genders experience detrimental health effects from social systems that foster toxic constructions of masculinity77.

Several mechanisms may be driving the observed associations between the CRF and the health outcomes examined. First, racial resentment has been shown to drive lack of support for policies deemed beneficial to Black Americans, including social welfare programmes such as Medicaid expansion. In fact, empirical evidence indicates that white lack of support for Medicaid expansion based on white racial resentment strongly predicted state-level adoption of Medicaid expansion policies, thereby also harming white access to health care42. Furthermore, it is well documented that cuts to, and inadequate funding for, federal and state safety nets and other social programmes end up denying white Americans access to these resources and programmes in addition to everyone else78. In addition, racialized opposition to various civil and voting rights policies, including the American Rescue Plan Act, has been well documented. For example, Alabama and Arkansas, two states in the top tertile of the CRF score, also received failing grades for support of these policies according to the 2021 Civil Rights Scorecard79. Another potential mechanism less well clarified is status threat, which refers to perceived threats to white Americans’ social status80, a mechanism that not only fuels cultural racism but also potentially drives racial resentment that triggers physiological and behavioural stress response mechanisms80. Finally, we observed that while the CRF was higher in southern states with established histories of structural racism (for example, Jim Crow laws), the CRF was also high in Midwest and northeastern states, highlighting that while historical context matters in shaping the form and function of racism, it is important to consider the role of cultural racism as a contemporary form of racism. Brown and Homan note that there are “distinct contemporary racialized regimes, characterized by different manifestations and modalities of racism across place”24. 
The CRF is an important contribution to this emerging literature that emphasizes the need to consider region-specific differences in the form and mechanisms by which various domains of racism, potentially interacting with other racialized or class dynamics, impact and harm the health of Black and white Americans even in regions with traditionally lower levels of structural racism.

Study design limitations and considerations

There are several methodological considerations relevant to the interpretation of our findings, broadly categorized as (1) the choice of the geographic level of assessment, (2) the validity of the selected indicators, (3) the time point of assessment and (4) interactions between cultural racism and other SR domains.

Regarding the geographic level of assessment, this study focused on state-level measures for both pragmatic and conceptual reasons. First, data availability was most consistent at the state level, making it possible to obtain indicators across all nine domains of interest. Moreover, states have been shown to powerfully shape population health through policies that influence the social, electoral81 and policy environment22, and, if validated in future studies as a key driver of health disparities, state-level interventions to address SR can have major impacts.

Regarding the validity of the selected indicators, as with most studies that rely on secondary/administrative data, there are inherent limitations. For example, online racial animus was measured using an index of racially charged online search rate. While this measure broadly captures all Google queries containing racial slurs, it cannot distinguish whether each search is actually motivated by racist attitudes and does not imply that all racist individuals conduct such racially charged searches29. Similarly, the anti-race hate crime rate sourced from the Federal Bureau of Investigation’s Crime Data Explorer online platform is limited by the lack of consistency across states in laws and definitions of hate crime and in reporting standards. Moreover, since this source only includes crimes known to police, it probably underestimates the true rates35. The prevalence of lack of COP implementation is another indicator with inherent limitations given that it is derived from police department survey responses, which are not immune to social desirability and inaccuracies. To account for this, we explored whether the use of three state-level alternative measures derived from unarmed-citizens-killed-by-police data could be better indicators. However, adding these measures did not improve model fit compared with lack of COP implementation prevalence, suggesting that lack of COP implementation probably better captures the culture of police departments/agencies, while unarmed-citizens-killed-by-police-derived measures may be better suited in other SR domains82. Last, data from Project Implicit were used to compute state-level implicit racial bias; however, this data source is nationally non-representative and oversamples white persons, especially NH white women with higher education levels (≥4 years of college)83. 
Also, our estimate of implicit racial bias was computed from IAT scores, for which construct and discriminant validity as a measure of individual differences in implicit social cognition has recently come under question, with some evidence suggesting that the IAT more effectively measures sensitive attitudes rather than implicit attitudes84.

Regarding the time point of assessment for CRF indicators, this study was designed to be cross-sectional, which does not allow for examining changes over time or temporal heterogeneity of effects. Our rationale for computing the CRF with indicators estimated in the year 2017 is that this was the most recent year of data where most indicators were available, and we could analyse health outcomes after this date (2018–2021). When data were not available for that year, we used estimates for indicators within the decade prior to 2017. For example, for modelled estimates abstracted from published literature for NH white support levels for Medicaid expansion, data were available for the period between 2012 and 201342. Similarly, for anti-immigrant sentiments, modelled estimates were averaged across 2004, 2006 and 200841, a nine-year gap from our target year of 2017. For racial resentment, data were abstracted for 2016. As this area of study matures, we anticipate that additional data years will become available to fine-tune these measures41. Future research should explore whether our approach to selecting domains for measurement to create latent variables is applicable to other time periods.

It is also important to note that our conceptual framework defines cultural racism as a single domain, separate from, even as it is connected to, structural and interpersonal racism, with these three domains together comprising the overarching SR construct. Each domain is distinct and uniquely harmful, and collectively probably exerts multiplicative effects whereby exposure to multiple domains simultaneously increases the risk of adverse health outcomes to a greater degree than a single domain alone. One first step towards interrogating these pathways is addressing the lack of an empirical cultural racism measure. Future steps are needed to explore more granular geographic assessments (for example, at the county, city and/or census-tract level), additional time points of assessment (using more recent data to examine temporal changes, for example since the upsurge of anti-racist advocacy following the murder of George Floyd, and current anti-DEI backlash) and evaluating life-course trajectories (that is, at infancy, early life and adulthood) by historical generation (for example, born before or after Jim Crow was in effect).

Finally, we acknowledge that the state-level nature of the empirical assessment of mortality outcomes limits our ability to make causal inferences. Certain features of our study were mainly driven by data availability, including our focus on Black (that is, the oppressed group) and white (that is, the dominant group) Americans, and mortality and life expectancy at birth as health outcomes. Given the dearth of studies using multidimensional measures of cultural racism, this leaves room for future studies to investigate other health outcomes that might be impacted by cultural racism, including physiological and psychological outcomes, and for future refinements of a CRF that captures cultural racism specifically targeted to non-Black individuals. Moreover, it would be a valuable contribution to the literature for future studies to consider how cultural racism impacts the health of other racial and ethnic minorities. To this end, we have deposited values for our CRF and its component indicators in a publicly available repository (https://doi.org/10.5281/zenodo.15609627) and provided detailed descriptions for these indicators (Supplementary Table 5).

In conclusion, our study contributes insights that substantially move the field towards improved clarity about the measurement and impact of cultural racism. To make a meaningful impact in reversing the societal harms due to racism, shared understanding based on validated empirical measurement is critically important and can inform effective and durable interventions to eliminate racialized health inequities and improve US health outcomes overall.

Methods

No human participants were directly involved in the study, and all data used were de-identified and obtained from publicly available sources.

CRF development

Descriptive statistics were summarized for all nine CRF indicators to describe the means, standard deviations and ranges of all indicators (Table 1). The CRF was conceptualized using a reflective measurement model85 in which covariation among the nine indicators was hypothesized to be caused by and reflect variation in the underlying latent construct. Therefore, on the basis of theoretical considerations and estimated pairwise correlations between indicators (Extended Data Table 6), we used CFA to examine the extent to which these nine indicators reflect the underlying construct, estimating different measurement models with varying specifications (Table 2). The base model (Model 1) included all nine indicators loaded onto a single factor, with no correlated errors between indicators. Next, the base model was improved (Model 2) by specifying correlated errors between (1) the lack of support for Medicaid expansion among white NH residents and online racial animus, and (2) the number of public symbols of the Confederacy per 100,000 and racial resentment. Specifying correlated errors explicitly accounts for common sources of covariation among indicators that are independent of the effect captured by the latent construct. In the third model (Model 3), the base model was further improved by excluding any indicators that loaded poorly or insignificantly on the latent factor in addition to specifying correlated errors. Last, a two-factor model was conducted (Model 4), allowing indicators to load on two latent constructs, one for self-report indicators and one for non-self-report indicators. Also, any indicators that loaded poorly or did not load significantly on the base model were excluded from the two-factor model, and one pair of correlated errors between racial resentment and the lack of support for Medicaid expansion among white NH residents was specified.
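
To make the role of correlated errors concrete, the following minimal sketch (not the authors' analysis code, which was written in R) computes the model-implied correlation between two standardized indicators under a single-factor reflective model with the factor variance fixed to 1. The function name, the loadings and the residual covariance are hypothetical values chosen only for illustration.

```python
def implied_correlation(loading_i, loading_j, residual_cov=0.0):
    """Model-implied correlation between two standardized indicators
    in a one-factor model with the factor variance fixed to 1.

    The shared latent factor contributes loading_i * loading_j.
    A correlated error adds covariation that the factor does not explain.
    """
    return loading_i * loading_j + residual_cov


# Hypothetical standardized loadings (illustrative, not taken from the paper).
lam_a, lam_b = 0.8, 0.5

# Base-model logic: all covariation flows through the latent factor.
without_corr_error = implied_correlation(lam_a, lam_b)

# Modified-model logic: a correlated error absorbs extra shared variation.
with_corr_error = implied_correlation(lam_a, lam_b, residual_cov=0.15)
```

Under these assumptions, any observed correlation beyond the product of the loadings would otherwise be forced into the loadings themselves, which is one reason a correlated-error specification can improve fit.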

Fit statistics were evaluated for all CFA models using the model χ2, CFI, TLI, RMSEA and BIC. Good fit of CFA models is indicated by a non-significant χ2, CFI and TLI > 0.90, and RMSEA < 0.05 (ref. 86) as well as smaller values of the BIC. Our final CRF selection was based on satisfying good-fit statistics and consistency with the underlying theoretical conceptualization and was informed by expert recommendation to contextualize fit indices by considering construct reliability87, and thus combined information from multiple fit indices (χ2, CFI and TLI) as well as composite reliability.
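
As an illustration of how the CFI, TLI and RMSEA thresholds above can be checked, the sketch below uses the common textbook formulas based on the model and baseline χ2 statistics. It is an assumption that these match the exact conventions of the software used in the study (for example, some implementations use N rather than N − 1 in the RMSEA denominator), and all input values are hypothetical.

```python
import math


def fit_indices(chi2_model, df_model, chi2_base, df_base, n_obs):
    """Compute CFI, TLI and RMSEA from model and baseline chi-square values.

    Uses the N - 1 convention for RMSEA (an assumption; some software uses N).
    """
    # Estimated noncentrality, floored at zero.
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, 0.0)

    denom = max(d_model, d_base)
    cfi = 1.0 if denom == 0 else 1.0 - d_model / denom

    ratio_base = chi2_base / df_base
    ratio_model = chi2_model / df_model
    tli = (ratio_base - ratio_model) / (ratio_base - 1.0)

    rmsea = math.sqrt(d_model / (df_model * (n_obs - 1)))
    return {"cfi": cfi, "tli": tli, "rmsea": rmsea}


def meets_thresholds(indices):
    """Apply the cut-offs stated in the text: CFI and TLI > 0.90, RMSEA < 0.05."""
    return indices["cfi"] > 0.90 and indices["tli"] > 0.90 and indices["rmsea"] < 0.05


# Hypothetical chi-square values for a model fitted to 51 state-level units.
indices = fit_indices(chi2_model=13.0, df_model=12, chi2_base=200.0, df_base=21, n_obs=51)
good_fit = meets_thresholds(indices)
```

The sketch leaves out the χ2 significance test and the BIC comparison, which the text also uses when choosing among models.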

Descriptive statistics for the resulting CRF from the best CFA model were computed. In addition, we conducted a series of tests to assess the validity and reliability of the CRF, evaluated evidence to support the intended interpretation of the latent measure and proposed use, and conducted sensitivity analyses to assess the potential impact of methodological limitations inherent in the selected indicators and to investigate the robustness of our model to these issues (Supplementary Method 3).

We conducted the following tests to assess construct validity:

  1.

    External validity: We examined the distribution of the CRF on the basis of whether states had or had not (1) been part of the Confederacy that sought to uphold slavery and (2) passed Jim Crow laws (that is, laws legalizing racialized discrimination) in the aftermath of the defeat of the Confederacy16. CRF distributions by Jim Crow and Confederacy status were compared using Wilcoxon rank-sum tests.

  2.

    Convergent validity: To assess the convergent validity of our CRF, we estimated its correlation with the anti-Black cultural racism index of Price et al.32, the only other previously published multidimensional measure of cultural racism.

  3.

    Criterion-related validity: We focused on predictive validity as a specific example of criterion-related validity and assessed predictive validity of the CRF by quantifying associations between the CRF and four mortality outcomes (all-cause, cancer-specific, cardiovascular-specific and diabetes-specific) and one social outcome (life expectancy). All four mortality rates for NH Black and NH white US adults ages 25+ from 1 January 2018 to 31 December 2021 for all 50 US states were downloaded from the Centers for Disease Control and Prevention’s WONDER system (Underlying Cause of Death, 2018–2021, Single Race Request)88. Mortality data were determined from death certificates, with underlying cause of death based on physician reporting on the death certificate. Age-adjusted mortality rates per 100,000 persons were analysed with the ‘2000 US standard’ used as the standard population88. Deaths were retrieved with ICD-10 codes C00–C97 for malignant neoplasms for cancer-specific mortality, codes I00–I99 (diseases of the circulatory system) for cardiovascular-specific mortality and codes E10–E14 (diabetes mellitus) for diabetes-specific mortality. Death counts between 0 and 9 are suppressed; therefore, some states were assigned ‘unreliable’ death rates and excluded from further analyses. Life expectancy at birth for 2018–2020 was downloaded from the National Center for Health Statistics’ National Vital Statistics System website for life expectancy. Further details on outcome queries are provided in Supplementary Method 4. On the basis of prior literature23,89,90, we considered a variety of state-level covariates for statistical adjustment including geographical region, population size (log-transformed), percentage of the population that is NH Black, a measure of income inequality (that is, the Gini coefficient) and percentage of the state population with income below the federal poverty level. 
Data for state population size, percentage of the population that is NH Black and percentage of the state population below poverty were obtained from the US Census Bureau’s American Community Survey for 2017. We also obtained data for population density, defined as the average population per square mile of land at the state level, a proxy measure for rural status, from the US Census Bureau’s 2010 Decennial Census. Income inequality was sourced from web-published updates described in refs. 91,92. Descriptive statistics were summarized for covariates in terms of mean, standard deviation, range for continuous variables and percentages for categorical variables. For each of the 50 states and Washington, DC, NH Black and NH white mortality rates were obtained for all four mortality outcomes considered. Crude and adjusted random-effect models were conducted to separately examine the relationship of the CRF, both as a continuous and as a dichotomized variable (derived from tertiles; highest versus low/medium levels combined), with all-cause and cancer-specific mortality rates for NH Black and NH white adults. Adjusted models controlled for potential state-level confounders including demographics (that is, log-transformed total population, study period and geographical region), socio-economics (that is, income inequality and percentage below the poverty level) and percentage NH Black population. The demographics-adjusted model was considered as our main analytic model as it did not include covariates that were potentially related to the CRF. Potential effect measure modification based on race, geographic region and population density was assessed by testing for interaction effects on predicted health outcomes, followed by stratified models when an interaction term was statistically significant.

  4.

    Incremental validity: To assess the incremental validity of the CRF, we used the only other existing multidimensional cultural racism measure, as well as structural racism as our frame of reference for conceptual comparisons. In addition to the anti-Black racism index of Price et al., three measures of structural racism were selected to assess incremental validity: ICERACEINC21, SSRF24 and SRI52. Using all-cause mortality as the primary outcome, we estimated the amount of variance explained in the outcome by a linear mixed-effects model containing each of the racism measures independently, while minimally adjusting for demographic variables. Next, we estimated the amount of variance explained when the CRF was added to each model containing one of the anti-Black racism index and the structural racism measures, and documented the effect sizes for each of the racism measures in the models.

  5.

    Discriminant validity: The CRF is a homogeneous construct, theorized to be a component of SR, distinct from but related to and mutually reinforcing other unique dimensions of SR, particularly structural racism. We therefore assessed the CRF’s discriminant validity of concept in three different ways. First, we did this in relation to structural racism, by evaluating the absolute magnitude of the correlation between the CRF and each of three measures of structural racism, using Pearson’s correlation test. Tests of discriminant validity typically focus on the absolute magnitude of the correlation rather than the direction of the correlation between measures of two constructs93. We compared correlation absolute values to the square root of the AVE for the CFA model from which the construct was derived, with the expectation that as long as the correlations are less than the square root of the AVE, there is no major concern for discriminant validity53,54. While this expectation is typically applied to constructs in the same model53,54, here we used this expectation as a guide to evaluate the extent of discrimination between the underlying concept of the CRF and that of structural racism. Second, we considered convergent and discriminant validity as part of a continuum and compared correlations for discriminant validity to correlations for convergent validity with the hypothesis that convergent validity correlations will be significantly greater than correlations of discriminant validity93,94. We therefore used Pearson and Filon’s 1898 test (cocor.dep.groups.overlap function in the cocor package95 version 1.1-4 in R) to evaluate whether the convergent validity correlation between the CRF and the anti-Black racism index of Price et al. was significantly greater than the correlations between the CRF and each of the three structural racism measures. 
Third, we evaluated discriminant validity by testing whether the CRF was significantly correlated with a variable that it is theoretically not supposed to be associated with—the state-level count of aviation incidents/accidents reported by the National Transportation Safety Board for the year 201796. This count was normalized by dividing by the number of Part 139 airports in each state as reported by the Bureau of Transportation Statistics for 201797.
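
The AVE-based comparison described under discriminant validity can be sketched as follows. This is a minimal illustration, assuming standardized factor loadings so that the AVE is the mean of the squared loadings. The loadings, measure names and correlation values are hypothetical and are not the study's estimates.

```python
import math


def average_variance_extracted(loadings):
    """AVE for standardized loadings: the mean of the squared loadings."""
    return sum(l * l for l in loadings) / len(loadings)


def ave_criterion(loadings, correlations):
    """Check whether each absolute correlation is below sqrt(AVE).

    Returns a dict mapping measure name -> True when there is no major
    concern for discriminant validity under this rule of thumb.
    """
    threshold = math.sqrt(average_variance_extracted(loadings))
    return {name: abs(r) < threshold for name, r in correlations.items()}


# Hypothetical standardized loadings for a latent construct.
loadings = [0.7, 0.6, 0.8, 0.5]

# Hypothetical correlations with three other measures (sign is ignored).
correlations = {"measure_a": -0.45, "measure_b": 0.30, "measure_c": 0.70}

result = ave_criterion(loadings, correlations)
```

Only the absolute magnitude of each correlation enters the comparison, which mirrors the point made above that tests of discriminant validity focus on magnitude rather than direction.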

We conducted the following tests to assess construct reliability:

  1.

    Average inter-item correlation: This correlation is used for assessing internal consistency and is computed as the average of pairwise correlations between all indicators that load on the latent construct.

  2.

    Test–retest reliability: We assessed this for four of the seven CFA indicators due to data availability across two time points. For the four indicators with values available in 2004 and 2017 (the number of public symbols of the Confederacy per 100,000, anti-immigrant sentiments, racial resentment and implicit bias of white NH residents), we estimated the latent construct, specifying the CFA model as closely as possible to match the final CFA model determined with 2017 data. We tested for correlations between the 2004 and 2017 CFA to evaluate test–retest reliability of the CRF.

  3.

    Composite reliability: This was chosen as a measure of internal consistency of the CRF, because it does not assume equal factor loadings across indicators (that is, it does not require tau equivalence)98.
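
Two of the reliability measures listed above can be sketched as follows. This is a hedged illustration and not the authors' code. The composite reliability formula shown assumes standardized loadings and uncorrelated errors, which is a simplification relative to a model with correlated errors, and all data values are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_inter_item_correlation(indicators):
    """Mean of the pairwise correlations between all indicators."""
    pairs = list(combinations(indicators.values(), 2))
    return sum(pearson(x, y) for x, y in pairs) / len(pairs)


def composite_reliability(loadings):
    """Composite reliability from standardized loadings.

    Each residual variance is taken as 1 - loading**2, and errors are
    assumed to be uncorrelated (an assumption of this sketch).
    """
    sum_loadings = sum(loadings)
    residual = sum(1.0 - l * l for l in loadings)
    return sum_loadings ** 2 / (sum_loadings ** 2 + residual)


# Hypothetical indicator values for five units.
indicators = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 6, 8, 10],
    "x3": [1, 3, 2, 5, 4],
}
avg_r = average_inter_item_correlation(indicators)

# Hypothetical standardized loadings; they are allowed to differ.
cr = composite_reliability([0.7, 0.6, 0.8, 0.5])
```

Because each loading enters the formula separately, the loadings are free to differ, which is the property the text cites as the reason for choosing composite reliability.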

Sensitivity analyses

Sensitivity analyses were conducted to evaluate:

  1.

    Potential misclassification due to under-reporting and/or selection bias in police department self-report COP data

  2.

    The impact of excluding public symbols with removal dates before versus after 2017

Details of the sensitivity analyses are provided in Supplementary Method 2.

Missing values

Three indicators had missing values for DC (racial resentment, anti-immigrant sentiments and lack of white NH support for Medicaid expansion), and anti-race hate crime was missing for HI. Missing values for DC and HI were imputed to the average value on the basis of non-missing values for the relevant indicator. For the number of public symbols of the Confederacy per 100,000, there were only 34 states with values, indicating that the remaining states had no such symbols. We therefore set the value for the other 17 states to 0. For health outcomes, three had missing values. Cancer-specific mortality rates were missing for NH Black adults in ID, SD, VT and WY in 2018–2019 and in MT, ND, SD, VT and WY in the 2020–2021 study period. For cardiovascular-specific mortality, rates were missing for NH Black adults in MT, ND, VT and WY in the 2018–2019 study period. Diabetes-specific mortality rates were missing for NH Black adults in AK in both study periods and in UT for 2018–2019 only, and for NH white adults in DC for 2020–2021. We did not impute missing outcome values but instead omitted any missing values from further analyses.
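
The two handling rules for indicator values described above (mean imputation for missing indicator values, and zero for states with no recorded Confederate symbols) can be sketched as follows. The function names and all numeric values are hypothetical and serve only to illustrate the logic.

```python
def impute_with_mean(values):
    """Replace missing entries (None) with the mean of the non-missing entries."""
    observed = [v for v in values.values() if v is not None]
    mean = sum(observed) / len(observed)
    return {unit: (mean if v is None else v) for unit, v in values.items()}


def fill_absent_with_zero(values, all_units):
    """Assign 0 to units that have no recorded value for the indicator."""
    return {unit: values.get(unit, 0.0) for unit in all_units}


# Hypothetical indicator with a missing value for DC.
racial_resentment = {"AL": 0.6, "NY": 0.2, "DC": None}
imputed = impute_with_mean(racial_resentment)

# Hypothetical symbol rates recorded for only some units.
symbols_per_100k = {"AL": 2.5}
filled = fill_absent_with_zero(symbols_per_100k, ["AL", "NY", "DC"])
```

Outcome values are treated differently in the text: missing mortality rates were omitted from the analyses, not imputed, so neither helper would apply to them.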

Lastly, the CRF was mapped to visualize its spatial distribution across all US states. No statistical methods were used to predetermine sample sizes, but our sample sizes are similar to those reported in previous publications that conducted state-level analyses using US data. All data analyses were performed using the R statistical computing program, version 4.2.2 (ref. 99), including the packages lavaan100 version 0.6-19, performance101 version 0.15.0, ggeffects102 version 2.3.0, nlme version 3 (ref. 103), DiagrammeR version 1.0.11 (ref. 104) and ggplot2 version 2 (ref. 105), and the source code x.boot.R for estimating CIs for the final CFA model106.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.