Main

Socioeconomic status (SES) stratifies society, with deep impacts on wealth1, health2, family and life course3. Various disciplines, including economics, demography, public health and sociology, have operationalized this multidimensional construct, focusing on the ‘big three’ indicators: educational attainment; income, earnings and wealth; and occupational status. Here we conduct a genome-wide association study (GWAS) on sociologically informed occupational status measures. We exploit our findings to advance understanding and quantitative modelling of status attainment processes across the life course and their complex relationship with health.

The deeply engrained intergenerational transmission of SES and inequalities across generations4,5 has motivated social and medical scientists to consider whether genetics plays a role in SES6,7,8 and, more recently, SES-related stratification and non-genetic inheritance, which biases genetic effects on a phenotype9,10. So far, the focus has primarily been on educational attainment11,12 and income13, with less attention to the heritability of occupational status. However, family studies indicate moderate heritability of occupational status comparable to other SES measures in the range of 0.30–0.40 (refs. 14,15,16,17,18). Molecular genetic research on SES proxies has focused on educational attainment6,7,19,20,21 and income22,23, neglecting occupational status. SES measures are important since they introduce gene–environment correlations which affect GWAS results24 and influence the patterns of genetic correlations of mental health traits25. This calls for a more nuanced and holistic understanding of SES that goes beyond educational attainment and income. While SES measures are intertwined, the dimensions are clearly analytically and empirically distinct26, and individuals may, for example, trade off income for other types of status, in particular occupations. Educational attainment may therefore not necessarily translate into economic success.

We extend previous work of a GWAS on broadly skill-based occupational groups using the UK Biobank, which identified 30 independent single-nucleotide polymorphisms (SNPs) associated with 9 very broad categories of the UK Standard Occupational Classification (SOC) and an SNP heritability of 0.085 (ref. 27). Since occupation in the UK Biobank is richly measured using 353 categories, we go beyond the existing GWAS by drawing from decades of sociological theory and measurement of occupational stratification. Sociological measures are preferable since purely skill-based measures suffer from inconsistent operationalization and lack theoretical and substantive thinking about the underlying mechanisms of status attainment, ignoring, for example, social prestige and other status factors1 (Box 1).

Sociologists consider occupation as the primary social and economic role held by most adults outside their immediate family or household, often even as ‘the single most important dimension in social interaction’ (p. 203)28. It is a long-term stable indicator of an individual’s social position in society alongside income, consumption, division of labour and social reproduction1. Adequately measuring occupational status is complex, with generations of sociologists dedicated to mapping this complex qualitative trait on a continuous scale29. The three conceptual approaches to measuring occupational status consider either socioeconomic differences between occupations, inter-occupational social interaction, or ascribed prestige of different jobs30.

In our analyses, we focus on three different measures of occupational status, championed by different theoretical traditions in sociology. First, the International Socioeconomic Index (ISEI)31, is a status measure constructed from scaling weights that maximize the (indirect) influence of education on income through occupation. Second, the Standard International Occupational Prestige Scale (SIOPS)28, is a prestige-based measure based on public opinion surveys where a representative population is tasked with ranking occupations by their relative social standing. Third, the Cambridge Social Interaction and Stratification Scale (CAMSIS)1, measures the distance between occupations on the basis of the frequency of social interactions between them (operationalized as husband-and-wife combinations). This measure is based on the notion that differential association is a function of social stratification, with partners and friends more likely to be selected from within the same group. Although these measures are championed by different theoretical traditions in sociology, empirically they have substantial but not perfect correlations32, alluding to an underlying latent factor of occupational status.

The current study investigates molecular genetic associations with ISEI, SIOPS and CAMSIS. Analyses were conducted on 273,157 (130,952 males and 142,205 females) individuals in the UK Biobank33, identifying 106 independent SNPs, and replicated in the UK’s National Child Development Study (NCDS; N = 4,899; 2,525 females and 2,374 males). Genomic structural equation modelling (GSEM)34 suggests a general genetic factor across all SES measures of occupational status, income and education. An overview of the study is provided in Extended Data Fig. 1.

The integration of molecular genetics into such a core topic of social science research promises a richer understanding of the role of biological and social factors as well as the improvement of quantitative modelling and understanding of social processes of status attainment and transmission. We thus utilize our GWAS discovery results for various sociogenomic investigations. While there is limited research that has identified a potential biological basis from GWAS findings for complex behavioural traits19,35, there has been some progress towards understanding potential pathways. This is particularly the case for psychiatric and addiction-related phenotypes and type 2 diabetes36,37,38. Accordingly, we investigated how social and psychological mechanisms play a role in the genetics of occupational status, including childhood career aspirations, non-cognitive39 and cognitive traits27. We then examined to what extent polygenic scores (PGSs) for occupational status predict the phenotype within and between families, their genetic penetrance of careers across the life course and the role common genetic variants play as a confounder of the intergenerational transmission of occupational status. Additional analyses explore the complex relationship between occupational status and health outcomes and how parental occupational status confounds the genetic prediction of general health. Our findings show that ignoring genetic data in parent–offspring SES transmission and quantitative stratification research in general leads to biased results in non-experimental studies, while the interplay between genes and the environment remains complex.

Results

Heritability, discovery and genetic links among SES measures

The main analyses were conducted on individuals from the UK Biobank on the three phenotypic measures of occupational status: CAMSIS (N = 273,157), SIOPS (N = 271,769) and ISEI (N = 271,769; Methods). Linkage disequilibrium score regression (LDSC)-based SNP heritability (\({h}_{{\rm{SNP}}}^{2}\))40 was significantly different from zero for all occupational measures, and ~50% larger for CAMSIS (\({h}_{{\rm{SNP}}}^{2}=0.145\), s.e. = 0.0066) compared with SIOPS (\({h}_{{\rm{SNP}}}^{2}=0.105\), s.e. = 0.0052) and ISEI (\({h}_{{\rm{SNP}}}^{2}=0.109\), s.e. = 0.0056, see Fig. 1). This is within the range of \({h}_{{\rm{SNP}}}^{2}\) for other status measures estimated in the UK Biobank (Methods), such as education (\({h}_{{\rm{SNP}}}^{2}=0.153\), s.e. = 0.0056) and income (\({h}_{{\rm{SNP}}}^{2}=0.092\), s.e. = 0.0041), and for CAMSIS nearly twice as high as for previously reported occupational measures27. Genome-based restricted maximum likelihood (GREML) analyses confirmed these results (Supplementary Table 1).

Fig. 1: Comparison of SNP-heritability estimates of occupational status measures vs income and education.

LD score-based SNP-heritability estimates of occupational status measures CAMSIS (N = 273,157), SIOPS (N = 271,769) and ISEI (N = 271,769) compared to income (N = 353,673) and education (N = 404,420). Each bar is a single estimate of SNP heritability and each error bar indicates the s.e. of the estimate (95% confidence intervals (CIs) are presented).

The GWASs identified 106 independent SNPs for CAMSIS, including 56 also found for ISEI and 51 for SIOPS on the basis of an R2 threshold of 0.1 and a window size of 1,000 kb (see Fig. 2 Manhattan plot), one of which (only significant for CAMSIS) was found on the X chromosome. We identified 11,206 SNPs in LD with our autosomal lead SNPs (Methods) and conducted an exhaustive phenome-wide association study (PheWAS) using the GWAS catalogue and the IEU OpenGWAS Project database. While we observe a substantial overlap with other socioeconomic status-related traits, 8 of our variants (rs12137794, rs17498867, rs10172968, rs7670291, rs26955, rs2279686, rs72744938, rs62058104) have not yet been linked to any status-related trait. For three variants (rs7670291, rs26955, rs72744938) not even suggestive associations (P < 5 × 10−6) with status traits are discernible. For two of these, we find strong links to platelet count. A full list of all implicated phenotypes is provided in Supplementary Table B8. The only non-autosomal hit (rs146852038) has previously been linked to the age of first sexual intercourse and educational attainment41.
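The notion of "independent SNPs" used above can be illustrated with a minimal greedy clumping sketch. It is not the authors' pipeline: the function `clump`, the toy SNP records and the `toy_r2` lookup are hypothetical, and the genome-wide significance cut-off of 5 × 10−8 is a conventional assumption. Only the R2 threshold of 0.1 and the 1,000 kb window are taken from the text.

```python
from typing import NamedTuple


class Snp(NamedTuple):
    rsid: str
    chrom: int
    pos: int    # base-pair position
    p: float    # GWAS P value


def clump(snps, r2, r2_max=0.1, window_bp=1_000_000, p_max=5e-8):
    """Greedy clumping: walk through significant SNPs from most to least
    significant and keep a SNP as a lead only if it is not in LD
    (r2 > r2_max within window_bp) with an already selected lead."""
    candidates = sorted((s for s in snps if s.p < p_max), key=lambda s: s.p)
    leads = []
    for s in candidates:
        in_clump = any(
            s.chrom == lead.chrom
            and abs(s.pos - lead.pos) <= window_bp
            and r2(s.rsid, lead.rsid) > r2_max
            for lead in leads
        )
        if not in_clump:
            leads.append(s)
    return leads


# Toy data (hypothetical identifiers and values).
toy = [
    Snp("rsA", 1, 100_000, 1e-12),
    Snp("rsB", 1, 150_000, 1e-9),    # close to rsA and in strong LD with it
    Snp("rsC", 1, 5_000_000, 1e-10), # same chromosome, outside the window
    Snp("rsD", 2, 100_000, 1e-3),    # not genome-wide significant
]
ld = {frozenset(("rsA", "rsB")): 0.8}


def toy_r2(a, b):
    """Pairwise LD lookup; unlisted pairs are treated as unlinked."""
    return ld.get(frozenset((a, b)), 0.0)


leads = clump(toy, toy_r2)
print([s.rsid for s in leads])
```

In this toy example rsB is absorbed into the clump led by rsA, while rsC is retained because it lies outside the 1,000 kb window.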

Fig. 2: Manhattan plot of the GWASs for occupational status measures.

Manhattan plot with autosomal SNP position on the x axis and the logarithm of the P value on the y axis of the GWASs for occupational status measures CAMSIS (N = 273,157), ISEI (N = 271,769) and SIOPS (N = 271,769).

We then replicated these hits using the National Child Development Study (NCDS), an ongoing study of a British birth cohort born in 1958 (Methods). This dataset was chosen because it is a similar UK cohort, which matters because previous research has demonstrated genetic variation by country and birth cohort for complex behavioural phenotypes42. Despite the notable disparity in sample size, with 4,899 individuals in the NCDS compared with 273,157 in our discovery sample, our results surpassed the expected sign concordance and achieved a higher than anticipated number of significant hits at P = 0.05 (Supplementary Information Section 7.4). This replication result underscores the robustness of our findings, even when subjected to smaller-sample constraints.
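A simple way to think about sign concordance is a one-sided binomial test against chance. The sketch below is a simplification and not the procedure of Supplementary Information Section 7.4, which compares against an expected concordance rather than 50%. The count of concordant SNPs is hypothetical; only the total of 106 lead SNPs comes from the text.

```python
from math import comb


def binom_sf(k, n, p=0.5):
    """Upper tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_snps = 106        # lead SNPs from the discovery GWAS
n_same_sign = 80    # hypothetical number with matching effect direction
p_value = binom_sf(n_same_sign, n_snps)
print(f"{n_same_sign}/{n_snps} concordant, one-sided P = {p_value:.2e}")
```

Under the null of no shared signal, each replication effect has a 50% chance of matching the discovery sign, so a concordance rate well above one half is evidence of replication even when individual SNPs are not significant in the smaller sample.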

To investigate the functional implications of the genetic variants associated with occupational status, we performed gene-based and gene-set analyses using multivariate analysis of genomic annotation (MAGMA; Methods)43. We observe that genes implicated by our SNPs are expressed in the brain, including the pituitary gland. No other tissue showed significant enrichment for gene expression.

We also jointly analysed the highly correlated occupational status measures together with income and education to increase statistical power using multitrait analysis of GWAS44 (MTAG; Methods) resulting in 731, 646 and 653 variants passing the significance threshold for CAMSIS, ISEI and SIOPS, respectively.

Genetic correlations (Fig. 3, lower triangle) between the three measures were close to 1 and thus stronger than the phenotypic correlations (upper triangle), which ranged between 0.80 and 0.90. The genetic correlations with educational attainment and household income were almost twice as high (0.81–0.97) as their phenotypic counterparts (0.32–0.44). Considering these high genetic correlations, it is unsurprising that we found strong evidence for a common genetic factor of occupational status using genomic structural equation modelling (GSEM)34, with high loadings for all three measures (standardized path coefficients of 0.99, 0.99 and 0.99, for CAMSIS, ISEI and SIOPS, respectively; Supplementary Information Section 11). We furthermore provide evidence for a common genetic factor of SES including income and education (see Supplementary Fig. 6).

Fig. 3: Phenotypic and genetic correlations of occupational status measures and other SES indicators.

Upper triangle: phenotypic correlations. Lower triangle: genetic correlations. Correlations of occupational status measures and other SES indicators are based on LD score regression. N = 246,492 for phenotypic correlations. Darker blue circles indicate stronger positive correlations.

Polygenic prediction

We assessed the out-of-sample predictive performance of the PGSs using two data sources. The first sample comprised a subset of siblings from the UK Biobank, for which we conducted an additional GWAS excluding individuals from the discovery analysis. The second sample consisted of the aforementioned NCDS.

MTAG-based out-of-sample predictions, which incorporate occupational status measures with household income and educational attainment, were slightly higher in the NCDS compared with the UK Biobank, with an incremental R2 of 0.097 (s.e. = 0.0035) in NCDS across all observations (0.075, s.e. = 0.00287 in the UK Biobank) for CAMSIS, 0.065 (s.e. = 0.0032; 0.054, s.e. = 0.0025 in the UK Biobank) for ISEI and 0.067 (s.e. = 0.0031; 0.053, s.e. = 0.00248 in the UK Biobank) for SIOPS (Fig. 4). As expected, polygenic scores based on PRSice2 and SBayesR weights have smaller but comparable incremental R2 values in both UK data sets across all measures of occupational scores (Fig. 4).

Fig. 4: Out-of-sample polygenic prediction performance within UK Biobank and NCDS.

Incremental R2 compared to a baseline model consisting of 10 principal components, sex and age. Bars denote 95% CIs. N = 24,579 for CAMSIS and 24,472 for ISEI and SIOPS in the UK Biobank; for NCDS average performance over different ages, N = 5,389; 5,312; 5,211; 4,902; and 4,263 for CAMSIS at ages 33, 42, 46, 50 and 55; and corresponding N = 5,449; 5,293; 5,197; 4,892; and 4,252 for ISEI/SIOPS.
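Incremental R2 is the gain in explained variance when the PGS is added to a baseline model. The following self-contained sketch illustrates the calculation on simulated data. It is an illustration under stated assumptions rather than the authors' code: the helper names are hypothetical, the baseline here contains only sex and age (the principal components are omitted for brevity), and all effect sizes are invented.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(columns, y):
    """R2 of an OLS regression of y on the covariate columns plus an intercept."""
    cols = [[1.0] * len(y)] + columns
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(b * c[i] for b, c in zip(beta, cols)) for i in range(len(y))]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def incremental_r2(covariates, pgs, y):
    """R2 of (covariates + PGS) minus R2 of the covariates-only baseline."""
    return r_squared(covariates + [pgs], y) - r_squared(covariates, y)


# Simulated data with invented effect sizes.
rng = random.Random(0)
n = 2000
sex = [float(rng.random() < 0.5) for _ in range(n)]
age = [rng.uniform(40, 70) for _ in range(n)]
pgs = [rng.gauss(0, 1) for _ in range(n)]
status = [0.3 * g + 0.2 * s + 0.01 * a + rng.gauss(0, 1)
          for g, s, a in zip(pgs, sex, age)]

inc = incremental_r2([sex, age], pgs, status)
print(f"incremental R2 = {inc:.3f}")
```

Because the baseline model is nested in the full model, the increment cannot be negative in-sample; it isolates the variance attributable to the PGS beyond the covariates.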

The longitudinal data in the NCDS reveal changes in the PGS effects across the life course or career trajectories, respectively. First, we were able to examine PGS prediction of occupational status across the life course at ages 33, 42, 46, 50 and 55 (Supplementary Information Section 13).

By leveraging the NCDS activity calendar data, we delineated comprehensive career trajectories over 30 years, from the onset of participants’ professional lives. When stratified by PGS quintiles, parental SES and sex, these trajectories revealed a notable interplay between polygenic scores and social factors (Fig. 5). Individuals who started their careers in the lower end of occupational status scores but ranked high in the PGS consistently advanced in their careers over the years. Conversely, those who initially held higher occupational status but had lower PGSs exhibited a steady decline in their professional trajectories, as measured by occupational status scores (Supplementary Fig. 12). While our focus is on CAMSIS, similar patterns were evident for SIOPS and ISEI, underscoring the consistency of our findings (Supplementary Figs. 13 and 14). These results further highlight the importance of understanding how and why societal structures and factors correlate with genotypes and jointly predict career trajectories.

Fig. 5: Mean percentile of the CAMSIS occupational status distribution across career stratified by sex, parental education and the CAMSIS PGS.

N = 201,939 time points from 5,475 individuals. Parental education measured as Low = no qualifications, Medium = lower secondary and High = upper secondary/degree. Bars denote 95% CIs.

Disentangling direct, indirect and demographic effects

GWAS population estimates include a combination of direct effects (inherited genetic variation) and indirect effects or gene–environment correlations and can be further influenced by assortative mating. We conducted multiple analyses to better understand the relative importance of these dimensions in relation to our estimates (Supplementary Information Section 12)45,46.

First, we investigated the predictive performance of our scores between more than 29,500 siblings in the UK Biobank, a common design to identify direct genetic effects. Notably, traits related to socioeconomic status or other non-clinical outcomes tend to exhibit considerable within-family effect reductions45, potentially affecting their practical utility47. Our analysis supports these previous studies, showing a reduction in effects for occupational status measures of more than 50% in total, with results for other SES measures (education and income) in a similar range (see Fig. 6 for the ratio of population and within-family models and Methods).

Fig. 6: Ratio of standardized beta coefficients for the effect of the respective PGS on the phenotype based on within-sibship, adoption and parental control models to the population estimate for CAMSIS, SIOPS and ISEI.

Ratios based on within-sibship, adoption and parental-control models. N = 24,579 for CAMSIS, 24,472 for ISEI and SIOPS (within-sibship); N = 3,398 for CAMSIS and 3,414 for ISEI and SIOPS (adoption); N = 13,972 for CAMSIS and 13,973 for ISEI and SIOPS (parental control). Each estimate is the ratio of standardized beta coefficient of within-sibship, adoption or parental-control model PGS, βreduced, to the beta coefficient of population-based PGS, βPGS. The error bars represent 95% CIs calculated with the bootstrap method (1,000 repetitions).
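The ratio βreduced/βPGS and its bootstrap interval can be sketched as follows for the within-sibship case. This is a minimal illustration, not the authors' implementation: the sibling-difference model, the use of raw rather than standardized slopes, the family-level resampling, the percentile interval and the simulated effect sizes are all assumptions made for the example. Only the 1,000 bootstrap repetitions are taken from the caption.

```python
import random


def slope(x, y):
    """OLS slope of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def beta_ratio(pairs):
    """pairs: list of sibling pairs ((pgs1, y1), (pgs2, y2)).
    Returns the within-sibship slope divided by the population slope."""
    pgs = [g for pair in pairs for g, _ in pair]
    y = [v for pair in pairs for _, v in pair]
    beta_pop = slope(pgs, y)
    d_pgs = [p[0][0] - p[1][0] for p in pairs]
    d_y = [p[0][1] - p[1][1] for p in pairs]
    beta_within = slope(d_pgs, d_y)
    return beta_within / beta_pop


def bootstrap_ci(pairs, reps=1000, seed=1):
    """Percentile 95% CI, resampling whole sibling pairs with replacement."""
    rng = random.Random(seed)
    stats = sorted(beta_ratio(rng.choices(pairs, k=len(pairs))) for _ in range(reps))
    return stats[int(0.025 * reps)], stats[int(0.975 * reps) - 1]


# Simulated sibling pairs: a direct effect of the PGS plus a family-level
# (indirect) effect that inflates the population estimate.
rng = random.Random(0)
pairs = []
for _ in range(1000):
    fam = rng.gauss(0, 1)                          # shared family component
    sibs = []
    for _ in range(2):
        g = (fam + rng.gauss(0, 1)) / 2 ** 0.5     # PGS, sibling correlation 0.5
        y = 0.2 * g + 0.2 * fam + rng.gauss(0, 1)  # invented effect sizes
        sibs.append((g, y))
    pairs.append(tuple(sibs))

point = beta_ratio(pairs)
lo, hi = bootstrap_ci(pairs)
print(f"ratio = {point:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Resampling whole families rather than individuals keeps siblings together, which preserves the within-family structure that the ratio depends on.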

This discrepancy between the unrelated population and within-family estimates can be attributed to indirect family effects or assortative mating. Indirect effects include the (heritable) social transmission of economic resources, and cultural and social capital, as well as social-psychological factors such as parental expectations, which represent passive gene–environment correlation. To quantify the role of indirect effects, we use two research designs. First, we adjust the best-performing PGS prediction in the NCDS for parental SES (measured as parental occupational status at age 11). Second, we conduct an adoption prediction study. In an adoption design, children are raised by non-biological parents, thereby providing a unique opportunity to examine the influence of genetic factors while minimizing the effects of passive gene–environment correlation. We re-ran our GWAS for occupational status, while excluding the set of 3,414 respondents of British-European genetic ancestry in the UK Biobank who stated that they were adopted and for whom occupational information was available. Results from both designs are remarkably similar, with the parental SES adjustment showing an effect attrition of 21% for all three measures, and the adoptee prediction resulting in an effect reduction of 23%, 22% and 27% for CAMSIS, ISEI and SIOPS, respectively. Notably, our results concur with ref. 48, where the extent of attenuation for cognitive and non-cognitive skills was considerably smaller in an adoption design than in a sibling design.

The observed remaining discrepancy between the population estimate controlling for indirect effects and within-family estimates could be attributed to attrition in the within-family design due to assortative mating, which attenuates the within-family effect. Recent findings by economic historians have demonstrated strong partner matching on occupational status within the United Kingdom dating back to at least the 1750s49. By employing a method first proposed by ref. 6, we demonstrate that, even in the absence of indirect effects, within-family effects are plausibly anticipated to be attenuated by 21–27% (Supplementary Information Section 12). We find further support for attenuation by directly analysing the spousal PGS correlation, which substantially exceeds what could be expected from simple phenotypic assortment (Supplementary Information Section 12). Accordingly, assortative mating closes the observed gap between both estimates. Under the assumption of additive effect reduction due to assortative mating and indirect effects, all three methods consistently estimate the proportion of direct population effects to be within the range of 73–79%. This convergence of findings underscores the importance of accounting for biases related to partner matching when examining the role of genetic factors in occupational status. It furthermore motivates the inclusion of parental SES for robustness in the application of PGS analyses downstream of the population GWAS.
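One way to read the additive assumption is as a simple decomposition. The notation is introduced here for illustration only and does not appear in the original: \(\Delta_{\text{within}}\) is the total within-family effect reduction, \(\delta_{\text{indirect}}\) the share lost to indirect effects, \(\delta_{\text{AM}}\) the share lost to assortative mating, and \(\pi_{\text{direct}}\) the direct share of the population effect.

\[
\Delta_{\text{within}} \approx \delta_{\text{indirect}} + \delta_{\text{AM}},
\qquad
\pi_{\text{direct}} = 1 - \delta_{\text{indirect}}
\]

With the reported indirect-effect attrition, \(\delta_{\text{indirect}} \in [0.21,\, 0.27]\), this gives

\[
\pi_{\text{direct}} \in [\,1 - 0.27,\; 1 - 0.21\,] = [0.73,\; 0.79],
\]

which matches the 73–79% range stated above. Whether this is exactly how the authors combined the three estimates is an assumption of this reading.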

Social mechanisms linking genetics and occupational status

A pertinent question to consider is which traits serve as mediators of the association between an individual’s genome and occupational status. Evidence from twin studies indicates that both cognitive and non-cognitive traits play a mediating role in the relationship between genetic and social outcomes50.

Building on previous behavioural phenotype GWASs and the literature41, we identified five traits that are potential mediators of the general genetic factor of SES: cognitive performance6, attention-deficit/hyperactivity disorder (ADHD; as a proxy for behavioural disinhibition)51, openness to experience52, risk tolerance53 and neuroticism54. In a multivariate genetic regression model (Supplementary Information Section 11.3), overall, we can explain 70% of the genetic association with occupational status. Among these mediators, the associations are generally similar for all three measures of occupational status. Of these, the strongest effects are observed for cognitive performance. However, when we introduce ADHD and openness to experience into the models, these associations are slightly reduced. The importance of ADHD is increased by the introduction of risk tolerance. In contrast to ADHD and neuroticism, risk tolerance positively correlates with the SES factor, when controlling for the other potential mediators (see Supplementary Tables 3–5).

In the NCDS data, we tested the mediating effects of adolescent phenotypic measures of cognitive ability, externalizing behaviour, internalizing behaviour, scholastic motivation, occupational aspiration and subjective health for the occupational status PGS (Fig. 7, Methods and Supplementary Information Section 15). Depending on the career stage of the respondents indicated by NCDS waves, these variables explained 56–74% of the link between our PGSs and occupational status (Fig. 7). As expected, cognitive ability was the main mediator, explaining 33–51% of the association depending on respondents’ age. Scholastic motivation explained 8–11%, occupational aspiration 7–11% and other non-cognitive traits up to 5%. The overall mediation by subjective health was minimal. Effect reductions are proportional when adjusting for parental SES to control for passive gene–environment correlation or indirect effects, respectively (Supplementary Information Section 11.3).
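The "share explained" figures can be understood as the relative drop in the PGS coefficient once mediators are added to the regression. The sketch below uses this difference-in-coefficients reading, which is an assumption about how the percentages are defined; the function name and the two coefficients are hypothetical and are not estimates from the study.

```python
def share_explained(beta_total, beta_adjusted):
    """Proportion of a PGS effect accounted for by mediators:
    1 - (coefficient with mediators) / (coefficient without mediators)."""
    if beta_total == 0:
        raise ValueError("total effect must be non-zero")
    return 1.0 - beta_adjusted / beta_total


beta_total = 0.25      # hypothetical PGS coefficient without mediators
beta_adjusted = 0.10   # hypothetical PGS coefficient with mediators included
share = share_explained(beta_total, beta_adjusted)
print(f"{share:.0%} of the PGS association is explained by the mediators")
```

The same ratio applies whenever an effect is compared before and after adjustment, for example the PGS coefficient with and without a control for parental SES.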

Fig. 7: Mediation model results of polygenic prediction of occupational status in NCDS through the life course.

N = 3,169; 3,111; 3,075; 2,881; and 2,499 for CAMSIS at ages 33, 42, 46, 50 and 55; and corresponding N = 3,196; 3,100; 3,068; 2,878; and 2,494 for SIOPS and ISEI. Separate linear regression models with two-sided tests. Stars indicate the significance level based on P values: no star, P > 0.05; *0.01 ≤ P < 0.05; **0.001 ≤ P < 0.01; ***P < 0.001.

Intergenerational transmission

Given that parental status is strongly associated with their offspring’s status, the study of intergenerational status transmission has a long tradition, often focusing on educational attainment55,56. In the NCDS data, the phenotypic correlation between paternal occupation at age 11 and offspring occupational status at various ages for all three measures was substantial (~0.30). Including a PGS to control for genetic inheritance and identify social effects reduced the intergenerational correlation of occupational status by 11%. However, this is probably an underestimation given the power limitations of GWAS in capturing full SNP heritability. Rescaling the results to estimated SNP heritability57, up to 38% of the intergenerational correlation is due to common genetic inheritance; 62% is due to other factors, which include social inheritance and possibly the effects of rare genetic variants58 not captured by SNP-heritability estimates (see also Fig. 8 and Supplementary Information Section 16 for estimates by age).

Fig. 8: Percentage of genetic confounding in the intergenerational transmission of occupational status in NCDS through the life course.

Percentages based on the predictive validity of polygenic scores (GWAS heritability) and an extrapolation of their effect to the variance explained by common SNPs (SNP heritability). N = 3,875; 3,835; 3,747; 3,550; and 3,079 for CAMSIS at ages 33, 42, 46, 50 and 55; and corresponding N = 3,902; 3,797; 3,718; 3,522; and 3,053 for SIOPS/ISEI.

Genetic confounding between occupational status and health

Occupational status is correlated with various health outcomes and higher-SES individuals typically live longer and are in better health2. Understanding to what extent this association between occupational status and health is causal is essential, for example, for designing effective health intervention strategies. The observed association could partly be driven by endogeneity since individuals with better health also potentially secure better jobs or have higher performance at work. Controlling for genetic associations reduces biases arising from genetic endogeneity, including those due to potential direct pleiotropic effects59. We therefore investigate to what extent the occupational status PGS confounds the observed relationship between occupational status and general health as well as mental health in the NCDS data (see Supplementary Tables 14 and 16 for regression estimates). Similar to the intergenerational transmission of status, we find significant genetic confounding in the observed relationship.

To better understand the degree to which the genotypic effect of occupational status on general and mental health might incorporate indirect effects, we analysed the health of the respondents on the basis of their occupational status PGSs with and without parental occupational status at age 11 as a control variable. In accordance with previous results, we found that taking parental occupational status into account reduced the PGS prediction of general health on average across ages and outcomes by 19.5% and of mental health by 23.7%, demonstrating the importance of considering parental SES indicators for the genetic study of offspring’s health outcomes (see Supplementary Tables 15 and 17).

Discussion

Analysing data from 273,157 individuals from the UK Biobank, we identified 106 independent SNPs associated with occupational status measures, 8 of which have not been previously reported in related SES GWASs. Our study provides PGSs that are associated with occupational status in two samples of individuals of European ancestry in the United Kingdom, with an out-of-sample prediction of 5–8% depending on the status measure and up to 9% depending on career stage. Genetic associations derived from CAMSIS were ~50% larger than for SIOPS and ISEI and twice as high as measures applied previously27. This is likely since SIOPS and ISEI are based on multicountry data from the 1970s and 1980s, and CAMSIS was constructed within the United Kingdom where our sample is located. CAMSIS conceptually focuses on social interactions, in contrast to, for example, purely skill-based measures. A potential reason for this observation may be genetic selection into interaction networks of friends60,61. A particular feature of CAMSIS is the inclusion of spousal networks. As Fisher stated, referring to past historical periods and particular contexts: “[P]revailing opinion, mutual interest and the opportunities for social intercourse, have proved themselves sufficient, in all civilized societies, to lay on the great majority of marriages the restriction that the parties shall be of approximately equal social class”62. Evidence for genetic assortative mating has been demonstrated for political views63 and educational attainment64,65, supporting strong phenotypic evidence of assortative mating by SES, race/ethnicity and religion, also showing that this has evolved with demographic change66. The heritability of CAMSIS might partly capture effects of assortative mating on the phenotype of the individual. 
However, high genetic correlations between CAMSIS, SIOPS and ISEI may point to the benefits of a more granular and exact measure of the same latent phenotype in CAMSIS and construction of measures within similar populations42. We also provide the R package ‘ukbjobs’ to equip researchers using the UK Biobank to employ more-standard, well-defined sociologically informed measures67.

Our study not only demonstrates the genetic interdependence of occupational status measures, but also reveals a strong genetic correlation between educational attainment, income and occupational status, identifying a common genetic factor of SES. Notably, genetic correlations among SES indicators surpass phenotypic correlations by a factor of two to three. This outcome represents an outlier from the conjecture of ref. 68, which states that phenotypic correlations can serve as proxies for genetic correlations—a notion that finds empirical support in both animals and humans69,70.

The deviation might have several reasons, including trade-offs between investments into different dimensions of SES. Higher education does not always guarantee high income or occupational status, since labour market conditions, personal networks, ethnicity and gender can influence career trajectories5. Higher occupational status does not always bring a high income or demand high education, and may vary across cultures and social contexts71. Certain genetic traits may be associated with individuals achieving higher levels in particular areas through a mechanism known as vertical pleiotropy (that is, mediated pleiotropy)23. For instance, genetic factors correlate with cognitive abilities, personality traits and mental health, which may, in turn, impact educational attainment, income and occupational status. Environmental factors such as family background, social norms, cultural expectations and chance also shape SES. Environmental differences in individual cases can lead to more heterogeneity and thus weaker phenotypic correlations, and subsequently have a completely different causal pathway in influencing health and behavioural outcomes.

We have shown that the prediction attrition within families is in part due to indirect genetic effects or genetic nurture, respectively, which also consistently contribute to the latent factor for constructed SES measures. Moreover, a mounting body of evidence suggests that strong assortative mating on this latent factor has been present for multiple generations49,72. Notably, a higher spousal correlation has been observed for the genetic predictor of educational attainment than for the actual phenotype64,73. This phenomenon may partially account for why genetic variants display a stronger predictive power for occupational status between families, as opposed to within families where the variation in these variants is more limited.

We integrated the polygenic signal for occupational status into occupational mobility and social stratification research and vice versa, with crucial implications for both sides. First, intergenerational mobility in social status is of great interest, not only for social scientists but also for policymakers, public health and epidemiology, and is related to questions of equality of opportunity and societies’ degree of openness5,18,74. In addition to cognitive skills, we showed that scholastic motivation, occupational aspiration, personality traits and behavioural disinhibition (proxied by ADHD) also mediate the association between genetics and occupational status. It is also vital to note that around one-third of the polygenic signal remains unexplained in each of our approaches, although it is likely that this is at least in part a result of the incomplete overlap of mediating variables between both analyses. We need further investigations to better understand the role of genetics in status inheritance and evaluate the interpretation of heritability as a pure merit measure in the context of questions addressing equality of opportunity.

Second, there are important considerations related to the intergenerational transmission of SES. It has been a common assumption in heritability studies of educational attainment that genetic influences are stable in absolute terms, while environmentally driven inequalities tend to reduce with lower intergenerational correlations8. Extrapolating results from PGSs, we show that the intergenerational correlation for occupational status is up to 38% due to genetic inheritance—this is even stronger than for educational attainment55,56. This suggests that social stratification researchers need to adjust their sole focus on intergenerational correlations to also explicitly consider gene–environment correlation in statistical modelling. We note that the applied extrapolation assumes SNP-heritability levels but could still represent an underestimation since PGSs have a lower prediction compared with SNP heritability. However, the latter is still smaller than the heritability estimated from twin models; hence, SNP heritability as measured here remains conservative compared with previous studies55. The discrepancy between SNP and twin heritability might be due to rare genetic variants, higher environmental homogeneity within families and nonlinear genetic effects42,75.

Third, we highlight questions about the causality of the relationship between health and occupational status and SES in general2. It is plausible to assume that higher status causally leads to better health, for example, due to a higher living standard, nutrition and better knowledge about and access to health care systems, among others2. At the same time (heritable) poor health might force an individual into a lower-status occupation, or genetics might have direct pleiotropic effects on education and health or related factors, leading to an overestimation of a direct, phenotypic causal effect. The question of causality, however, is paramount for designing targeted policy interventions, and genetic confounding needs to be considered. It is also relevant to quantify their potential impact and clarify claims in social mobility research regarding genetically driven, health-related confounders. We show that the association between occupational status and health is up to 25% confounded by common genetic effects and therefore substantially upward biased when genetic factors are not considered.

Fourth, combining theoretical, measurement and modelling perspectives of the social sciences and genetics is not only important for the interpretation of status in social science theory and modelling, but also for genetic research75. First, the discovery of indirect parental effects revealed the importance of social influences correlated with genotypes in the discovery of genetic effects on education9. We show that controlling for parental occupational status strongly reduces the prediction of general and mental health by the occupational status PGS. While genetic prediction of health based on our PGSs is comparatively small (1%) and confounding effects may not entirely generalize to other regions of the genome important for health outcomes, further investigation is required to understand whether and how parental SES measures should be integrated in population GWASs. Second, the continued use of measures that have a strong theoretical, conceptual and measurement basis, such as occupational status in social stratification research, underlines the importance of precision phenotypes. In contrast to a previous GWAS that relied solely on a skill-based minimal occupational classification27, our occupational status phenotypes, which have been developed by sociologists over decades, doubled the heritability using CAMSIS, increased SNP discovery more than threefold and also provide a consistently meaningful interpretation of the outcome variable. This also emphasizes the genetic relevance of socially theorized measures and of social factors included in them, such as potential interaction or social prestige.

Finally, our findings embrace an interdisciplinary perspective when studying social stratification, mobility and status transmission. By further studying the underlying latent factor of individual socioeconomic status indicators, we can foster a better understanding of the genetic correlates of socioeconomic status and its broader implications for society. It is imperative to comprehend the role of indirect effects and passive gene–environment correlations in this puzzle, as well as the causes and consequences of assortative mating on these relationships. The dynamic nature of the intergenerational transmission of socioeconomic status would be best served by a more comprehensive and rigorous social, historical and genetic approach.

Our study also has its limitations. The UK Biobank represents only 5.5% of the approached target population and over-represents non-smokers and individuals with fewer mental health problems, lower BMI, higher education and residence in less economically deprived areas45,76,77,78,79. Consequently, participation bias affects the UK Biobank, limiting its generalizability and introducing the potential that observed genetic associations may be influenced by the characteristics of the subset of individuals who chose to participate in the UK Biobank80. We do show how our measures of occupational status differ from UK census data (Supplementary Information Section 6). We can expect environmental heterogeneity across different populations to challenge our findings. Accordingly, we use the NCDS sample, another UK dataset with different potential selection biases, to strengthen our analytic approach. While PGS predictions are nearly identical in our two British-European genetic ancestry populations, previous research has demonstrated that for educational attainment, only 50% of genetic effects are universal across seven Western populations42. Population genetic heterogeneity also limits the scope of this study beyond UK residents, since, similar to the majority of GWAS so far, we focus on European-ancestry individuals in a Western country81,82. Integrating other ancestries and more diverse temporal, geographical and socioeconomic contexts is a priority for future work. The reduction of PGS prediction within families also emphasizes the relevance of recent initiatives for discovery designs using family data and the need to further study the role of assortative mating in within-family effect reduction45. This is particularly important since parental genetic factors could influence their children’s occupational status through non-genetic mechanisms, and these effects might not be adequately captured when considering only the child’s PGS. Therefore, we recognize the importance of including both parents’ PGSs as control variables to estimate genetic confounding effects, but this was not possible using the current data. This underscores the need for additional research with multigenerational genetic and social survey data. Despite these limitations, the current study offers many new insights into the interplay between genetics and occupational status scores along with socioeconomic status.

Methods

This Article has Supplementary Information with details about data and methods and additional detailed analyses. Extended Data Fig. 1 also provides an overview. We have also built the R package ‘ukbjobs’ that allows researchers to construct CAMSIS, ISEI and SIOPS occupational scores directly from the UK Biobank data (https://github.com/tobiaswolfram/ukbjobs).

Ethics approval

This research was conducted using the UK Biobank under application 32696 and NCDS under application GDAC_2021_16_TROPF, with ethics approval from the University of Oxford under application SOC_R2_001_C1A_21_60. Both the UK Biobank and NCDS applications were specific to the scope of this paper. For the UK Biobank approval, we received approval for a scope extension to ensure transparency, allowing us to expand from our focus on non-standard occupations to also include occupational status. Here we specified that our plan was “to perform GWAS analysis using employment histories from the UK Biobank to construct sociologically informed measures of occupational status”. We specified that we would construct sociologically informed measures of occupational status (CAMSIS, SIOPS and ISEI) for our GWAS and noted that the analysis would be accompanied by NCDS genetic and phenotypic data. For the NCDS application, we specified not only the information mentioned above but also the set of polygenic prediction analyses. We also preregistered our analysis plan (https://osf.io/djbr2/), which was updated for replication (https://osf.io/x6va5).

UK Biobank

For both the discovery and prediction of occupational status measures, education and income, we used data from the UK Biobank. The UK Biobank is a large-scale biomedical database and research resource containing in-depth genetic and health information from 502,655 individuals recruited between 2006 and 2010. The database is globally accessible to approved researchers. Details of the UK Biobank genotyping procedure can be found elsewhere83. After phenotype selection and genetic quality control (performed in PLINK v.1.9, v.2), we conducted our analyses on 273,157 individuals (130,952 males, 142,205 females).

NCDS

As a second, longitudinal UK prediction sample, we used the NCDS, which follows 17,000 children born in Great Britain in 1 week in 1958. NCDS was originally designed to examine the social and obstetric factors associated with stillbirth and death in early infancy. Overall, there were ten waves available (birth: 1958, age 7: 1965, age 11: 1969, age 16: 1974, age 23: 1981, age 33: 1991, age 42: 2000, age 46: 2004, age 50: 2008 and age 55: 2013).

Phenotyping

‘Socioeconomic differences’-based indices measure the ‘attributes of occupations that convert a person’s main resource (education) into a person’s main reward (income)’31. The most commonly used measure is occupational prestige, termed the International Socioeconomic Index (ISEI)31, which is constructed from scaling weights that maximize the (indirect) influence of education on income through occupation.

Other prestige-based measures are the result of public opinion surveys in which representative samples of the population are tasked with ranking occupations by their relative social standing. Treiman85 demonstrated that prestige-based measures, which emerged at a similar time as socioeconomic indices84, were surprisingly constant over time and across cultures, consolidating their use in social scientific research. The Standard International Occupational Prestige Scale (SIOPS or Treiman-prestige)28 remains another commonly used metric in this tradition.

Lastly, occupational status indicators derived from ‘social interaction’ focus on the heterogeneity of associations between occupants of different jobs, following the tradition of refs. 86,87. They are based on the idea that differential association is a function of social stratification since members of a group are more likely to interact within that group than with out-group members. Thus, acquaintances, friends and spouses are much more likely to be selected from within the same group than from outside. A group of Cambridge sociologists reversed this approach to measure social structure on the basis of interactions. The Cambridge Social Interaction and Stratification Scale (CAMSIS) measures the distance between occupations on the basis of the frequency of social interactions (operationalized as husband-and-wife combinations) between them1.

Information on occupational status scales was merged to the occupational classification scheme utilized in the UK Biobank (the Standard Occupational Code version 2000 (SOC2000))88. CAMSIS-based status could be directly merged using the data available in ref. 89. ISEI and SIOPS (as provided by the R package ‘strat’, R software v.4.2.0, v.4.1.2)90, however, use the less granular ISCO-88 scale, so a mapping from ISCO to SOC was employed91. If multiple job codes for a respondent were available, the most recent job was used.

‘Income’ was measured as in ref. 23, using a coarse, 5-level ordinal household income variable. Educational attainment was defined as years of education and coded according to the scheme provided in ref. 6.

The prestige of ‘current’ or ‘most recent occupation’ is treated as a continuous measure. In the initial discovery analysis using the UK Biobank, respondents were asked to provide job titles for the current or the most recent job held. The job information was coded using the four-digit UK SOC2000. We built a procedure to link the UK SOC2000 to ISCO-88(COM) and then derive ISEI and SIOPS from ISCO-88(COM). All phenotypes were inverse-normal rank transformed before analysis. In the NCDS, the SOC2000 code of the respondent’s occupation (as well as their father’s when they were 11 years old), is also available, thus the same procedure was applied.
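The inverse-normal rank transformation mentioned above can be illustrated with a minimal sketch. The source does not state which rank offset or tie-handling rule was used, so the Blom offset (c = 3/8) and average ranks for ties below are assumptions, and the input scores are made-up values rather than UK Biobank data.

```python
from statistics import NormalDist


def inverse_normal_rank_transform(values, c=3.0 / 8.0):
    """Rank-based inverse-normal transform with average ranks for ties.

    The offset c = 3/8 (Blom) is an assumption; the source does not state one.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of tied values starting at sorted position i.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()
    # Map each rank to a quantile strictly inside (0, 1), then to a z-score.
    return [nd.inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in ranks]


# Hypothetical status scores, purely for illustration.
scores = [35.2, 61.0, 48.7, 61.0, 72.4]
transformed = inverse_normal_rank_transform(scores)
```

The transform keeps the ordering of the original scores while forcing an approximately standard normal marginal distribution, which limits the influence of skewed or bounded scales on the association tests.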

In the NCDS, we looked at ‘health’ measured at ages 23, 33, 42, 46, 50 and 55. Participants were asked to rate their general health on a scale from 1 (excellent) to 4 (poor) (age 23 and 33), 1 (excellent) to 5 (very poor) (age 42) and 1 (excellent) to 5 (poor) (age 46, 50 and 55).

For each time point, the outcome was treated as metric and standardized to have a mean of zero and a standard deviation of 1. We then regressed it on the CAMSIS, ISEI and SIOPS PGSs, respectively, while controlling for sex and 10 principal components to correct for population stratification.
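As a rough illustration of this regression, the sketch below standardizes a self-rated health outcome and regresses it on a PGS with sex and 10 principal components as covariates. All data are simulated stand-ins, not NCDS data, and the variable names and effect sizes are hypothetical; ordinary least squares is assumed as the estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Simulated stand-ins (not NCDS data): a PGS, sex and 10 ancestry PCs.
pgs = rng.standard_normal(n)
sex = rng.integers(0, 2, size=n).astype(float)
pcs = rng.standard_normal((n, 10))

# Self-rated health on a 1-5 scale (higher = worse), weakly related to the
# PGS by construction.
latent = -0.15 * pgs + 0.1 * sex + rng.standard_normal(n)
health = np.clip(np.round(3 + latent), 1, 5)


def standardize(x):
    """Centre to mean 0 and scale to standard deviation 1."""
    return (x - x.mean()) / x.std(ddof=1)


y = standardize(health)
X = np.column_stack([np.ones(n), standardize(pgs), sex, pcs])

# Ordinary least squares; beta[1] is the PGS coefficient, in SD units of
# health per SD of the PGS.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
pgs_effect = beta[1]
```

Repeating the same fit for each age and each of the three PGSs would give one standardized coefficient per time point and status measure.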

Discovery

An analysis plan was preregistered and uploaded in February 2021 (https://osf.io/329pr/) and updated in February 2023 (https://osf.io/x6va5). All calculations were based on mixed-model association tests as implemented in the programme FastGWA92, with association testing based on v.3 imputed data. Following the preregistered analysis plan, each regression, run on the sample of British-European genetic ancestry, included the following covariates: the first 10 genomic principal components, age at assessment and age², UK Biobank (UKB) assessment centre at recruitment, sex and genotyping array (BiLEVE or Axiom). Chromosomes were analysed separately. To speed up the calculation of summary statistics, a minimum minor allele frequency (MAF) filter of 0.01 was imposed, leaving 10.2 million SNPs for the analysis. We supplemented our autosomal analyses with association analyses of SNPs on the X chromosome in a joint association analysis of both sexes.

PheWAS

All 1000 Genomes SNPs in linkage disequilibrium (r² > 0.6 for European ancestry) with the 106 independent SNPs were identified using FUMA93. For these 11,206 SNPs, 1,005,470 phenotypical associations reaching at least suggestive significance (P < 5 × 10−6) in the GWAS catalogue and the IEU OpenGWAS project were collected94. All variants with at least one genome-wide significant link to a trait associated with education, income or any other socioeconomic outcome were removed, leaving 8 hits (rs12137794, rs17498867, rs10172968, rs7670291, rs26955, rs2279686, rs72744938, rs62058104) that have not yet been linked to any SES-related trait. For three variants (rs7670291, rs26955, rs72744938), not even suggestive associations (P > 5 × 10−6) were found.

Univariate LDSC

Univariate LDSC regression (performed in Python v.3.8.4, v.3.9.15) was applied to the summary statistics from the GWAS to quantify the degree to which population stratification influenced the results and to estimate heritability. For this, GWA test statistics were regressed onto the LD score of each SNP. LD scores computed in individuals of European genetic ancestry were used, and weights were downloaded from https://utexas.app.box.com/s/vkd36n197m8klbaio3yzoxsee6sxo11v. SNPs were included if they had a MAF of >0.01 and an imputation quality score of >0.9 and were available in the LD score file. Intercepts for all three occupational status measures were close to 1 (CAMSIS: 1.1193, s.e. = 0.013; SIOPS: 1.0845, s.e. = 0.011; ISEI: 1.0993, s.e. = 0.0125).
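The logic of regressing test statistics on LD scores can be sketched as follows. Under the standard LD score regression model, the expected chi-square statistic of SNP j is approximately intercept + N·h²·ℓ_j/M, so the slope yields SNP heritability and an intercept above 1 signals confounding such as population stratification. The sketch uses simulated data and an unweighted least-squares fit; the LDSC software additionally applies regression weights and a block jackknife for standard errors, which are omitted here, and all numeric settings below are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
m_ref = 1_000_000      # M: SNPs used to compute LD scores (assumed)
n_samples = 100_000    # N: GWAS sample size (assumed)
n_reg = 200_000        # SNPs entering the regression (assumed)
true_h2, true_intercept = 0.10, 1.05

# Simulated LD scores and chi-square statistics whose expectation follows
# E[chi2_j] = intercept + N * h2 * ld_j / M.
ld = rng.gamma(shape=2.0, scale=50.0, size=n_reg)
expected = true_intercept + n_samples * true_h2 * ld / m_ref
chi2 = expected * rng.chisquare(df=1, size=n_reg)

# Unweighted linear fit of chi-square on LD score (slope first, then intercept).
slope, intercept_est = np.polyfit(ld, chi2, deg=1)

# Rescale the slope to the heritability scale.
h2_est = slope * m_ref / n_samples
```

In this framing, the reported intercepts of roughly 1.08 to 1.12 correspond to `intercept_est`, while `h2_est` plays the role of the SNP-heritability estimate.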

MAGMA

To investigate the functional implications of the genetic variants associated with occupational status, we performed gene-based and gene-set analyses using MAGMA43. We used FUMA93 to annotate, prioritize, visualize and interpret GWAS results, to run MAGMA on our summary statistics and to map SNPs to genes. We tested whether the genes prioritized by FUMA were enriched for expression in 30 general tissue types (GTEx v.8) and 53 specific tissue types (GTEx v.8) using MAGMA’s gene-set analysis. We observed a strong expression in all brain tissues compared with other tissues. No other tissue showed significant enrichment for gene expression.

MTAG

MTAG (in Python v.2.7)44 was used to meta-analyse all three occupational status measures with a secondary GWAS on household income in UKB and a secondary GWAS on educational attainment in UKB (for the validation subsample of siblings in UKB) or the third GWAS meta-analysis for education6 excluding 23andMe participants as well as the NCDS cohort (for the validation using the NCDS data). This allowed us to leverage the high genetic correlations between the occupational status measures and income/education (see above) to increase power and detect variants, and improve prediction as outlined above and in Supplementary Information Section 10.

GSEM

We used the infrastructure provided by the GenomicSEM package34 to compute LDSC-based genetic covariances and correlations between our occupational status measures and education and income. SNPs were included using similar criteria as specified for univariate LDSC. Covariance structures between the three measures of occupational status were used as input in a genomic structural equation model to analyse their loading on a joint factor of occupational status (Supplementary Information Section 11.1). We furthermore applied a multivariate genetic regression model to the genetic covariance matrix of each of our occupational status measures and cognitive performance, ADHD, openness to experience, risk tolerance and neuroticism (Supplementary Information Section 11.3).

Prediction analyses

Overall, we constructed three types of polygenic scores for each phenotype (Supplementary Information Section 10): (1) Pruning and thresholding using PRSice95, (2) SBayesR96 and (3) MTAG + SBayesR. In our prediction analyses, we residualized for sex, age (only in UKB) and 10 principal components before calculating the R2. For the within-family analysis in UK Biobank, we identified a sample of siblings and computed family-fixed-effects regressions with both polygenic scores as well as phenotypes standardized beforehand, and interpreted the change in coefficients (Supplementary Information Section 12).
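A minimal sketch of these two steps, using simulated sibling pairs rather than UKB data, is given below. It assumes that only the phenotype is residualized on the covariates before computing R², and it implements the family fixed effects by demeaning within families, which gives the same slope as including one dummy per family. Principal components are left out for brevity, and all effect sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n_fam = 3000

# Simulated sibling pairs: each PGS is a shared family component plus an
# individual deviation, so that the PGS has unit variance overall.
fam_pgs = rng.standard_normal(n_fam) * np.sqrt(0.5)
pgs = fam_pgs[:, None] + rng.standard_normal((n_fam, 2)) * np.sqrt(0.5)
sex = rng.integers(0, 2, size=(n_fam, 2)).astype(float)
age = rng.uniform(40, 70, size=(n_fam, 2))

# A direct effect (0.15) plus a family-level effect (0.30) shared by siblings.
status = (0.15 * pgs + 0.30 * fam_pgs[:, None] + 0.1 * sex + 0.01 * age
          + rng.standard_normal((n_fam, 2)))


def residualize(y, covariates):
    """Return the residuals of y after an OLS fit on the covariates."""
    X = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta


def standardize(x):
    return (x - x.mean()) / x.std(ddof=1)


y_res = residualize(status.ravel(), np.column_stack([sex.ravel(), age.ravel()]))
g = pgs.ravel()

# Variance in the residualized phenotype explained by the PGS.
r2 = np.corrcoef(y_res, g)[0, 1] ** 2

# Standardize both variables, then compare population and within-family slopes.
y_std = standardize(y_res).reshape(n_fam, 2)
g_std = standardize(g).reshape(n_fam, 2)
between = np.cov(g_std.ravel(), y_std.ravel())[0, 1] / np.var(g_std, ddof=1)

# Family fixed effects: subtract each family's mean from both variables.
y_w = y_std - y_std.mean(axis=1, keepdims=True)
g_w = g_std - g_std.mean(axis=1, keepdims=True)
within = (g_w * y_w).sum() / (g_w ** 2).sum()
```

In this simulation the within-family coefficient is smaller than the population coefficient because the family-level component is absorbed by the fixed effects, which mirrors the change in coefficients interpreted in the text.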

Mediation analyses

NCDS respondents were asked at age 11 about the type of job they would like to do in the future. We coded these jobs to SOC2000, constructed their occupational status and ran mediation models in lavaan in R (v.4.2.0, v.4.1.2)97 to quantify the share of the association between PGS and occupational status that can be attributed to occupational aspirations. We tested a comprehensive multiple mediation model, introducing cognitive ability, internalizing behaviour, scholastic motivation and externalizing behaviour as additional mediators (Supplementary Fig. 8).
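The quantity estimated by such a mediation model can be illustrated with a single-mediator, product-of-coefficients sketch. This is not the lavaan model used in the study: it uses plain least squares on simulated data, with hypothetical path values, to show how the share of the PGS association attributable to a mediator such as occupational aspiration is obtained.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000

# Simulated stand-ins (not NCDS data); path values are hypothetical.
pgs = rng.standard_normal(n)
aspiration = 0.3 * pgs + rng.standard_normal(n)                 # path a
status = 0.2 * pgs + 0.4 * aspiration + rng.standard_normal(n)  # paths c', b


def ols(y, *predictors):
    """Return OLS slopes for the predictors (the intercept is dropped)."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]


(a,) = ols(aspiration, pgs)            # PGS -> mediator
direct, b = ols(status, pgs, aspiration)  # PGS -> outcome, mediator -> outcome
(total,) = ols(status, pgs)            # PGS -> outcome without the mediator

indirect = a * b
share_mediated = indirect / total
```

With several mediators, as in the multiple mediation model described above, each mediator contributes its own product of paths, and the unexplained remainder corresponds to the direct path.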

Confounding analyses

Within NCDS, information on the paternal occupation when respondents were 11 years old was used to estimate the correlation between paternal and offspring occupational status at various ages for all three measures. We integrated the approach of scaling the variance explained by polygenic scores, outlined in ref. 57, into a mediation model to test which share of the intergenerational correlation for each of the three metrics was confounded by the corresponding polygenic score, assuming that it explained either only the amount of variance observed in our prediction analysis or the full SNP heritability (Supplementary Information Section 14).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.