Main

Type 2 diabetes (T2D) is becoming a major public health concern in Africa, reflecting the complex interplay of genetic, environmental and socioeconomic factors1,2,3. According to the International Diabetes Federation, the number of people with T2D worldwide is predicted to rise by 51%, from 463 million in 2019 to 700.2 million by 2045 (ref. 4). A substantial increase of 143% is anticipated in Africa, with numbers expected to rise from 19.4 million in 2019 to 47.1 million in 2045 (ref. 4). Hemoglobin A1c (HbA1c), also known as glycated hemoglobin5, provides an estimate of the average blood glucose level over a period of 2–3 months by measuring the percentage of hemoglobin with attached glucose6,7. An HbA1c level of 6.5% or higher on two separate tests typically indicates diabetes. Levels between 5.7% and 6.4% suggest prediabetes, and values below 5.7% are considered normal8. Combining proteomic and genomic data to map blood-based protein quantitative trait loci (pQTLs) has identified hundreds of associations between genetic variants and protein levels9,10,11,12,13. Only a fraction of individuals of African ancestry in the diaspora has been included in proteomics studies to date12,14, and continental Africans remain largely underrepresented.

To address this, we measured 2,873 proteins using the Olink PEA Explore assay in the plasma samples of 163 individuals with prediabetes or T2D (cases) (defined as HbA1c > 5.7%) and 362 normoglycemic controls (defined as HbA1c < 5.7%) (Table 1) from a subset of the Uganda Genome Resource (UGR), hereafter referred to as Uganda Genome Resource Proteomics Data (UGR-PD). We performed differential protein expression analysis between the two groups and carried out proteomic genetic association analysis to identify sequence variants influencing protein levels. We subsequently examined the role of the identified pQTLs in T2D using colocalization and Mendelian randomization (MR) analyses.

Table 1 Clinical characteristics of the study participants

First, we studied the association between protein levels and cardiometabolic traits measured in the UGR-PD (Supplementary Table 1). A total of 208 proteins were associated with HbA1c, 42 with high-density lipoprotein (HDL) and 46 with low-density lipoprotein (LDL) at a false discovery rate (FDR) of 5% (Fig. 1). Some of the associated proteins have known links to glucose metabolism. For example, ERCC1, which was associated with HbA1c (Padj = 6.77 × 10−7) and HDL (Padj = 1.91 × 10−2), has been shown to affect glucose tolerance in a progeroid animal model, in which an autoinflammatory response leads to fat loss and insulin resistance15.

Fig. 1: Association of protein levels with clinical traits.

The y axis represents the association’s FDR-adjusted −log10(P); the x axis of each plot represents the effect size estimated using linear regression. The horizontal red dashed line indicates the multiple testing adjusted significance threshold with associations above the line considered statistically significant. GGT, gamma-glutamyl transferase; SBP, systolic blood pressure.

Next, we sought to identify differentially expressed proteins (DEPs) between cases and controls. DEPs were defined based on an absolute log2(fold change) > 0.5 (roughly a 1.4-fold change) in expression levels at an FDR of 5%. This led to the identification of 88 DEPs. Among these, 57 were significantly upregulated, with log2 fold changes ranging from 0.50 to 1.18, while 31 proteins were downregulated with log2 fold changes between −0.51 and −1.17 (Fig. 2a and Supplementary Table 2). EGF-like repeats and discoidin I-like domains 3 (EDIL3), associated with processes such as cell adhesion, migration and vascular development, showed the most significant upregulation (Padj = 1.2 × 10−13). EDIL3 is differentially expressed in the adipose tissue of insulin-resistant and insulin-sensitive individuals16,17, and is involved in angiogenesis18,19,20. Impaired angiogenesis has been implicated in the progression of diabetic retinopathy and nephropathy21,22. The DEPs were primarily enriched in Gene Ontology terms such as chemokine receptor binding and chemokine and cytokine activity (Supplementary Table 3). We further compared cases and controls with regard to adipokines, biomarkers of obesity and proteins linked to pancreatic function before and after adjusting for obesity to disentangle obesity-driven signals from those independently associated with disease status (Fig. 2b). In the unadjusted model, leptin (LEP) was significantly upregulated in cases compared to controls (log(fold change) = 0.759, Padj = 1.62 × 10−5). C-X-C motif chemokine ligand 5 (CXCL5) showed the highest upregulation in cases (log(fold change) = 1.056, Padj = 1.76 × 10−7). Resistin and interleukin-18 were significantly downregulated in cases compared to controls (log(fold change) = −0.292, Padj = 8.51 × 10−3 and log(fold change) = −0.367, Padj = 5.89 × 10−4, respectively). Additionally, angiopoietin-like protein 2 was elevated in cases (log(fold change) = 0.426, Padj = 0.00153), while inflammatory markers such as tumor necrosis factor and interleukin-6 showed nonsignificant expression level differences between cases and controls. However, upon adjusting for obesity, the CXCL5 and LEP signals were attenuated, indicating that their differential expression may be mediated by obesity (Fig. 2b).

Fig. 2: Proteomic profiling identifies differentially expressed proteins linked to type 2 diabetes.

a, Volcano plot showing DEPs, with significantly overexpressed proteins annotated in red and downregulated proteins in blue, using a linear model implemented in limma. The black horizontal dashed line represents the −log10(FDR) cutoff corresponding to a 5% false discovery rate. b, Comparison of cases and controls with regard to adipokines and other proteins that are biomarkers of obesity and central adiposity before and after adjusting for obesity. The log(fold change), a measure of protein expression changes between patients with T2D and controls, was calculated as the base-2 logarithm of the ratio of the mean expression in patients with T2D to the mean expression in controls. c, Scatter plot of the comparison of the top significant DEPs with UGR-PD on the y axis and UKB-PPP on the x axis.

The comparison of significant DEPs in UGR-PD with the same set of proteins in the UK Biobank Pharma Proteomics Project (UKB-PPP) using the T2D definition described in ref. 23 (ncases (T2D) = 2,461 and ncontrols = 50,553) showed some population-specific differences in log(fold change). For instance, proteins such as apolipoprotein F (APOF), tumor necrosis factor superfamily member 12 and lipoprotein lipase (LPL) are significantly upregulated in patients with T2D compared to controls in the UGR-PD but not in the UKB-PPP. Lysophosphatidylcholine acyltransferase 2 and interleukin-8 are more strongly downregulated in patients with T2D compared to controls in the UGR-PD. Proteins such as prolylcarboxypeptidase, LEP, EDIL3 and apolipoprotein A-IV (APOA4) showed the same trend of expression between patients with T2D and controls in the two populations (Fig. 2c).

Among the significant DEPs in the UGR-PD, eight have T2D-associated genome-wide association study (GWAS) hits within 40 kb (Table 2), although none of the significant DEPs showed evidence of colocalization with T2D. The association of these proteins with T2D and the nearby GWAS signals strengthens the hypothesis that these proteins could have a causal or mediatory role in the pathophysiology of T2D in this population.

Table 2 Significant DEPs with a T2D GWAS hit within 40 kb of the transcription site of the gene encoding the protein

After quality control, we undertook pQTL analysis with up to 15.8 million imputed variants with a minor allele frequency (MAF) > 0.05 for 2,873 proteins. We identified 399 independent associations after multiple testing correction at P value thresholds of P < 1.46 × 10−6 and P < 2.2 × 10−10 for cis- and trans-pQTLs, respectively (Supplementary Table 4). We identified 346 (86.7%) cis-pQTLs and 53 (13.3%) trans-pQTLs. Seven proteins had both cis-pQTLs and trans-pQTLs. We also identified four trans-pQTLs located within a pleiotropic locus.

To determine the uniqueness of the pQTLs identified in the UGR-PD, we compared them against the pQTLs of 47 genome-wide pQTL studies (Supplementary Table 5). We identified six independent cis-pQTLs and 31 independent trans-pQTLs that were not previously reported in any population (Supplementary Table 6), and 362 pQTLs reported in prior studies (Supplementary Table 7). We compared our pQTL findings against the African ancestry data of the UKB-PPP and found that 16.7% (58 of 346) of the discovered cis-pQTLs and all trans-pQTLs have not been reported previously (Supplementary Table 8). We tested the conditionally independent UGR-PD pQTLs for replication in the UKB-PPP. Of the 399 pQTLs, we were able to test 392 in the UKB-PPP data. Of these, 303 replicated at P ≤ 1.2 × 10−4 (Bonferroni-corrected threshold) and 270 also had the same effect estimate direction (Supplementary Table 9).

We examined the relevance of the previously identified pQTLs with T2D and associated risk factors, such as lipid traits, blood pressure and cardiovascular disease, by cross-referencing with the GWAS Catalog and ref. 24. Of the 362 previously identified pQTLs (Supplementary Table 7), six were associated with T2D or T2D-related traits (Supplementary Table 10).

One hundred and fifty-one identified pQTLs overlapped or fell within a 500-kb window of T2D-associated GWAS variants (Supplementary Table 11). Only one of these pQTLs (rs6075339) colocalized with a T2D signal. rs901886 (ICAM5) located on chromosome 9 overlapped with multiple T2D-associated variants, including rs74956615 and rs34536443, which have been implicated in immune regulation and inflammation25,26, processes known to contribute to T2D pathophysiology. rs62068711 (DPEP1) on chromosome 16 also overlaps with rs12920022, a variant previously linked to T2D risk27, suggesting a potential role of dipeptidase-related pathways in glucose metabolism. Furthermore, a pleiotropic pQTL, rs532436, identified near SELE, IL-7R and ALPI in our study is also associated with a GWAS hit (rs529565) for ABO protein levels28. The association of rs532436 with multiple proteins (for example, ABO, SELE, IL-7R) suggests that this variant may affect upstream regulatory mechanisms (for example, transcription factor binding, chromatin accessibility) influencing the expression of multiple genes (Fig. 3).

Fig. 3: Three-dimensional Manhattan plot of identified cis-pQTLs.

a, Proteins are shown on the x axis, chromosome location is shown on the y axis and the −log10(P) of each association is shown on the z axis. b, Scatter plot of pQTL variant location against the location of the gene encoding the target protein. Each dot represents an independent variant. cis-pQTLs are colored in red, while trans-pQTLs are colored in blue. A multiple testing correction threshold was used for both cis and trans-pQTLs. c, Summary of the identified pQTLs showing their functional consequences. d, Proportion of variance explained by the conditionally independent pQTLs categorized into bins.

Next, we performed colocalization analysis to determine the shared risk variants between pQTLs and T2D using a large multi-ancestry GWAS29. We found one colocalizing signal with strong evidence for a shared T2D risk variant. Specifically, we observed a posterior probability (PP4 = 95.5%) for colocalization between a T2D-associated variant and a pQTL (rs6075339) regulating the expression of the signal regulatory protein alpha (SIRPα) protein (Fig. 4a,b). Genetic studies have implicated SIRP signaling in diabetes pathogenesis. For example, a single-nucleotide polymorphism in human SIRPγ, encoding a SIRP family receptor that also binds CD47, was associated with type 1 diabetes30.

Fig. 4: LocusZoom plots of the colocalizing SIRPα pQTL and T2D risk variant.

a,b, LocusZoom plots of the colocalizing SIRPα pQTL (a) and T2D risk variant (b). Top: T2D GWAS P values. Bottom: pQTL P values for the same region. c, MR forest plot for proteins causally associated with T2D. The effect estimates represent the odds ratio of T2D per unit change of protein level and the error bars represent the 95% confidence intervals around the estimated effects. These were estimated using a Wald ratio estimate. d,e, PheWAS plots for TFPI (d) and ACE (e). SNP, single-nucleotide polymorphism.

We undertook an MR analysis to examine the causal relationship between the identified cis-pQTLs and T2D. We found 18 proteins to be causally associated with T2D. Our MR results showed that genetically increased angiotensin-converting enzyme (ACE), CA13, MLN, SERPINA5 and WFIKKN1 levels were associated with an increased risk of T2D. Proteins such as ADH1B, CNTN2, COMT, CPM, GHR, ICAM5 and ILR6 showed a protective effect on T2D risk (Fig. 4c and Supplementary Table 12). ACE is an essential component of the renin–angiotensin system and it has a crucial role in the development of insulin resistance31. By increasing insulin sensitivity and decreasing inflammation, ACE inhibitors, which are frequently used to treat hypertension, have been demonstrated in clinical studies and meta-analyses to lower the incidence of new-onset T2D in people at high risk32. The COMT variant rs4680 is associated with lower HbA1c and protection from T2D33. This corroborates our MR findings, where the COMT pQTL rs4680 showed a protective effect against T2D. While no other significant pQTLs identified through MR were directly associated with T2D, several proteins (TFPI, LTA, GHR and ADH1B) encoded by genes within which these pQTLs reside have been linked to T2D or T2D-related traits (Supplementary Table 13).

In line with its established function in blood pressure regulation, the pQTL rs4363 showed significant associations with cardiovascular traits in the phenome-wide association study (PheWAS), such as high blood pressure and hypertension. Furthermore, its associations with Alzheimer’s disease (neurological domain) and T2D (metabolic domain) indicate wider involvement in metabolic and neurodegenerative processes. It also showed some significant associations with anthropometric traits, such as height and standing height. rs3213739 exhibited significant associations with the waist–hip ratio (anthropometric domain) and the resting heart rate and pulse rate (cardiovascular domain), highlighting its role in body composition and metabolism (Fig. 4d,e and Supplementary Table 14).

Lastly, we assembled a list of 1,804 postulated effector genes for T2D from nine GWAS studies. If a gene coding for any of the proteins associated with the identified pQTLs in our study was found in the curated list, we defined such gene/protein as reported; if not, we classified them as previously unresolved. We identified 320 proteins previously unresolved as potentially linked to effector genes for T2D based on these GWAS signals (Supplementary Table 15).

Our work takes a first step toward addressing the underrepresentation of continental African individuals in genetics and proteomics studies. Thus, we were able to delineate the molecular landscape of 2,873 unique proteins in a context that might be pivotal to understanding drivers of T2D pathophysiology, identified 58 African-ancestry-specific cis-pQTLs that have not been reported previously and identified 18 proteins that are causally associated with T2D. The generalizability of these findings may be limited to the continent because the population was drawn from a single demographic group within Africa. Hence, there is a need to include more ancestrally diverse populations in future studies.

In this study, we used the Olink targeted proteomic assay, which has some limitations; for example, only a subset of the full proteome is studied and the binding affinity of the assay antibodies may be affected by missense variants. While HbA1c is a highly standardized and accurate test with lower intraindividual variability compared to fasting glucose, in individuals of African ancestry, using HbA1c as a blood sugar level indicator may not capture the full spectrum of the metabolic conditions associated with T2D because of the prevalence of hemoglobinopathies and other red blood cell disorders, such as glucose-6-phosphate dehydrogenase (G6PD) deficiency. In individuals with G6PD deficiency, there is increased susceptibility to hemolysis, which may reduce HbA1c levels and potentially lead to a missed T2D diagnosis34,35.

The DEP analysis of adipokines and metabolic proteins between cases and controls revealed differences in the role these proteins have in obesity, inflammation and pancreatic function. LEP was significantly upregulated in cases, which is consistent with its known association with adiposity and metabolic regulation36. Previous studies linked circulating LEP levels with insulin resistance and T2D development37; experimental models suggest that it may influence beta cell function and glucose metabolism38,39.

Population-specific differences in protein expression were observed when DEPs were compared between the UGR-PD and UKB-PPP cohorts. Some proteins were upregulated in patients with T2D compared to controls in one cohort but not in the other, while other proteins were downregulated in one cohort but upregulated in the other. These differences suggest that factors beyond disease status may influence variation in protein expression. Ancestral genetic variation is one potential explanation, as genetic diversity affects gene regulation and metabolic pathways40. Additionally, environmental factors, including diet, lifestyle and exposure to infections, may contribute to disparities in protein expression profiles. Lastly, variations in T2D disease progression, comorbidities or medication use across the two cohorts could also have a role. Some significant DEPs had a T2D GWAS hit within a 500-kb window; however, none colocalized with T2D. This finding is consistent with the possibility that genetic variants close to genes encoding T2D-associated proteins influence disease risk via protein-mediated pathways. Proteins such as LEP, LPL, EIF5A and CCL25 have several GWAS hits within ±500 kb of their encoding genes, which suggests that these proteins may mediate genetic predisposition to T2D.

Some of the identified pQTLs were associated with T2D or relevant to T2D via association with other cardiometabolic traits, including lipid and blood pressure traits. Previous studies found rs532436 and rs505922 to be associated with T2D, HDL cholesterol levels, triglycerides (TGs) and diastolic blood pressure (DBP)41,42,43 across diverse ancestral populations. In addition, rs77924615 has been linked to cardiovascular disease and blood pressure traits44,45, supporting its potential contribution to metabolic syndrome, a key risk factor for T2D. The association of rs10460181, rs2455069 and rs12721054 with lipid traits46,47,48 corroborates previous findings that lipid dysregulation has a vital role in developing insulin resistance and T2D49,50. According to the MR results, the COMT pQTL rs4680 had a protective effect against T2D. This is consistent with a study conducted in the Women’s Genome Health Study, which found that the high-activity G-allele of rs4680 was linked to lower HbA1c levels and a slight decrease in the risk of T2D in women of European ancestry33.

In conclusion, the associations and causally associated proteins identified offer promising avenues for developing targeted therapies and personalized treatment strategies for T2D, contributing to improved management and prevention of this global health challenge. Our findings demonstrate the utility and discovery opportunities afforded by including individuals of African ancestry in large-scale proteomic studies.

Methods

Ethics

The study was approved by the Uganda Virus Research Institute Research and Ethics Committee (UVRI REC no. GC/127/907) and the Uganda National Council for Science and Technology (no. UNCST HS2527ES).

Study population

Participants were selected from the UGR, a subset of the General Population Cohort (GPC). As described previously51,52, the GPC is a population-based cohort of over 22,000 people from 25 nearby communities in the remote Southwest Ugandan sub-county of Kyamulibwa, which is a part of the Kalungu district. We selected 528 samples from the UGR-PD based on age, sex and HbA1c. After hemolysis of anticoagulated whole blood, the concentrations of total hemoglobin and HbA1c were measured using the turbidimetric inhibition immunoassay quantitative hemoglobin A1c Gen51. In addition to the genotype quality control described in ref. 51, we applied a Hardy–Weinberg equilibrium threshold of P < 1 × 10−6.

Association with clinical characteristics

We used linear regression to determine the association between protein levels and systolic blood pressure, DBP, alanine, albumin, alkaline phosphatase, aspartate aminotransferase, bilirubin, cholesterol, gamma-glutamyl transferase, HDL, LDL, TGs and hemoglobin A1c. All P values were FDR-corrected.
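
The FDR correction step can be illustrated with a minimal sketch. It assumes the Benjamini–Hochberg procedure (the text names this procedure for the DEP analysis but does not specify it for the trait associations) and uses hypothetical P values rather than study data.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values, in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P values for four protein-trait pairs (not study data).
raw_p = [0.01, 0.04, 0.03, 0.005]
adjusted_p = benjamini_hochberg(raw_p)
# Flag associations that pass a 5% FDR.
significant = [p < 0.05 for p in adjusted_p]
```

In practice the raw P values would come from one regression per protein-trait pair, and the correction would be applied across all tests for a given trait.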

DEPs and functional enrichment

We determined DEPs between cases and controls using limma53; we used a Benjamini–Hochberg FDR for multiple testing54. DEPs are defined as proteins with an FDR < 5% and an absolute log2(fold change) greater than 0.5 (roughly a 1.4-fold change). To better understand the functional impact of the proteins, we used the enrichr tools from clusterProfiler55.
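
As an illustration of the DEP definition above, the following sketch labels a protein from its log2(fold change) and adjusted P value. The function name and the treatment of the cutoff as an absolute value are assumptions made for illustration; the limma model itself is not reproduced here.

```python
def classify_dep(log2_fc, p_adj, fc_cutoff=0.5, fdr_cutoff=0.05):
    """Label a protein as up/downregulated or not significant.

    Assumes the fold change cutoff applies to the absolute log2 value.
    """
    if p_adj >= fdr_cutoff or abs(log2_fc) <= fc_cutoff:
        return "not significant"
    return "upregulated" if log2_fc > 0 else "downregulated"


# A log2(fold change) of 0.5 corresponds to about a 1.41-fold change.
linear_fold_change = 2 ** 0.5

# CXCL5 and LEP values are taken from the results text.
cxcl5_label = classify_dep(1.056, 1.76e-7)
lep_label = classify_dep(0.759, 1.62e-5)
# Hypothetical values, for illustration only.
down_label = classify_dep(-0.8, 0.001)
small_effect_label = classify_dep(0.3, 0.001)
```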

Proteomics quality control

The Olink’s proximity extension assay technology56 was used to measure the plasma level of 2,978 proteins in 528 samples across eight Olink panels. The levels of protein expression were measured logarithmically as Normalized Protein eXpression units. We adjusted all phenotypes using a linear regression for age, sex, plate number and sample collection season, followed by an inverse-normal transformation of the residuals. During the quality control process, we excluded one sample because the PCR plate well was empty; an additional two samples were further excluded because of a missingness greater than 40%. For assay quality control, 40 assays were excluded because they did not have Normalized Protein eXpression values. Additionally, we excluded 31 assays that had a fraction of assay warning greater than 15%. No assay was excluded because of limit of detection. In all, 525 samples and 2,873 assays remained after quality control and were subsequently used for further analysis.
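
The inverse-normal transformation of residuals can be sketched as follows. This is a rank-based version with an offset of 0.5 and average ranks for ties; both choices are assumptions, because the text does not state which variant was used. The input values are hypothetical residuals.

```python
from statistics import NormalDist


def inverse_normal_transform(values, offset=0.5):
    """Map values to standard normal quantiles based on their ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    normal = NormalDist()
    # (rank - offset) / n always lies strictly between 0 and 1.
    return [normal.inv_cdf((r - offset) / n) for r in ranks]


# Hypothetical regression residuals for three samples.
transformed = inverse_normal_transform([2.0, 1.0, 3.0])
```

The transformed values depend only on the ordering of the residuals, which is what makes the approach robust to skewed protein-level distributions.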

Single-point association

Covariates such as sex, age, plate and mean protein expression per sample were regressed out using R’s lm function. Residuals were then converted into z-scores and used for the association analysis. We used the single-point-analysis-pipeline v.0.0.2 (dev branch) (https://github.com/hmgu-itg/single-point-analysis-pipeline/tree/dev) to perform the association analysis for single-nucleotide polymorphisms with a MAF > 0.05. GCTA v.1.93.2 beta was used to conduct a mixed linear model association analysis; the genetic relationship matrix function within the GCTA software was used to estimate the genetic relationships among individuals. We then used GCTA-COJO, designed for approximate conditional and joint stepwise model selection, to identify independent associated variants at each locus.

Significance threshold

The cis significance threshold was determined by multiplying the Bonferroni-corrected P values by the number of proteins tested; values over 1 were capped at 1. The Bonferroni-corrected P values were estimated using eigenMT57. eigenMT calculates Meff as the number of ranked eigenvalues from the adjusted genotype correlation matrix needed to account for 99% of the detected genotype variability. Subsequently, the corrected P values were adjusted for multiple testing by applying the FDR method. Q values were then calculated using the qvalue package, allowing for the identification of a subset of significant associations based on a q < 0.05. Finally, the cis threshold for significance in the pQTL analysis was determined by averaging the smallest nonsignificant P value and the largest significant P value. This method resulted in a cis threshold of P = 1.462 × 10−6. The trans threshold was calculated based on the effective number of variants (Neff) and the effective number of protein traits (Meff). The Neff was derived by performing linkage disequilibrium pruning with the indep 500 5 0.2 parameters in Plink v.1.9 (ref. 58). This resulted in an Neff of 452,593 unique variants. The Meff was calculated using the Meff function and Gao method in the poolr R package59. The resulting trans P value threshold was 2.227 × 10−10. Variants within 1 megabase (Mb) upstream or downstream of the encoding genes are referred to as cis-pQTLs, while trans-pQTLs are those found beyond 1 Mb relative to the encoding gene. Ensembl’s Variant Effect Predictor was used to determine the functional impact of the variants.
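
The threshold arithmetic and the cis/trans definition can be sketched as below. Several pieces are assumptions rather than reported details: the trans threshold is written as alpha / (Neff × Meff) with alpha = 0.05, the Meff value of 496 is an illustrative number that is not given in the text (chosen because it lands close to the reported threshold), and all function names and coordinates are hypothetical.

```python
def midpoint_threshold(p_values, is_significant):
    """Average the largest significant and smallest nonsignificant P value."""
    largest_sig = max(p for p, s in zip(p_values, is_significant) if s)
    smallest_nonsig = min(p for p, s in zip(p_values, is_significant) if not s)
    return (largest_sig + smallest_nonsig) / 2


def trans_threshold(n_eff_variants, m_eff_proteins, alpha=0.05):
    """Bonferroni-style threshold (assumed form: alpha / (Neff * Meff))."""
    return alpha / (n_eff_variants * m_eff_proteins)


def classify_pqtl(variant_chrom, variant_pos, gene_chrom, gene_start, gene_end,
                  window=1_000_000):
    """Return 'cis' if the variant lies within 1 Mb of the encoding gene."""
    if variant_chrom != gene_chrom:
        return "trans"
    if gene_start - window <= variant_pos <= gene_end + window:
        return "cis"
    return "trans"


# Hypothetical P values with q-value-based significance calls.
cis_cutoff = midpoint_threshold([1e-7, 1e-6, 2e-6, 1e-3],
                                [True, True, False, False])

# Neff is reported in the text; Meff = 496 is an illustrative assumption.
trans_cutoff = trans_threshold(452_593, 496)

# Hypothetical coordinates.
near_gene = classify_pqtl("9", 1_500_000, "9", 2_000_000, 2_050_000)
far_from_gene = classify_pqtl("9", 5_000_000, "9", 2_000_000, 2_050_000)
```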

Comparison of pQTLs to prior published data

To determine the uniqueness of our pQTLs, we used an in-house-built database of previously identified signals of 46 genome-wide pQTL studies, including the UKB-PPP12. We evaluated novelty by identifying new loci and new variants. New loci were defined as those with no published variants within ±1 Mb of our variants. For variants at known loci, we checked their rsIDs against those previously reported. Variants with no prior matches were further conditioned (gcta-cojo-cond) in the context of other known variants at that locus. These were classified as new if the significance of their association P value (cis-pQTL: P < 1.462 × 10−6 and trans-pQTL P < 2.227 × 10−10) persisted even after adjusting for other known variants.
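
The novelty rules above can be expressed as a small decision function. This is a simplified sketch: the tuple layout, labels and rsIDs are hypothetical, and the conditional analysis itself (run with GCTA-COJO in the study) is only represented by a label indicating that it is needed.

```python
def classify_novelty(variant, published, window=1_000_000):
    """Classify a pQTL against previously published variants.

    `variant` and each entry of `published` are (rsid, chrom, pos) tuples.
    """
    rsid, chrom, pos = variant
    # Published variants on the same chromosome within +/- 1 Mb.
    nearby = [p for p in published
              if p[1] == chrom and abs(p[2] - pos) <= window]
    if not nearby:
        return "new locus"
    if any(p[0] == rsid for p in nearby):
        return "known variant"
    # Known locus, unmatched rsID: conditional analysis decides novelty.
    return "needs conditional analysis"


# Hypothetical published signals and query variants.
published_signals = [("rsA", "1", 1_000_000), ("rsB", "2", 5_000_000)]
label_known = classify_novelty(("rsA", "1", 1_000_000), published_signals)
label_conditional = classify_novelty(("rsC", "1", 1_400_000), published_signals)
label_new_locus = classify_novelty(("rsD", "3", 1_000_000), published_signals)
```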

Colocalization analysis

We performed Bayesian colocalization analysis using the Coloc.fast function (https://github.com/tobyjohnson/gtx) between our pQTL signals and multi-ancestry T2D GWAS summary statistics29 from the DIAGRAM database. We used default priors and took a posterior probability of PP.H4 ≥ 0.8 as evidence of a shared causal variant (ref. 60). To increase statistical power and strengthen the robustness of our findings, a multi-ancestry GWAS (n = 2,535,601) was selected for the colocalization analysis rather than the largest African-specific meta-analysis (n = 154,160). The much larger sample sizes available in the multi-ancestry GWAS data facilitate higher resolution for signal localization and enhance the capacity to detect genetic associations.

MR

To identify putative causal effects, we performed a two-sample MR analysis using the cis-pQTL data in the UGR-PD as exposure and the multi-ancestry T2D GWAS meta-analysis29 as the outcome. The analyses were conducted using the TwoSampleMR package61. We used the previously defined independent cis-pQTLs as genetic instrumental variables and considered only those with an F-statistic greater than ten. As all proteins had at most one independent cis-pQTL, we applied the Wald ratio estimate. The use of single instrumental variables limits the sensitivity analyses for assessing MR assumptions. Therefore, we assessed consistency in the direction of effects using the African T2D GWAS meta-analysis29. We chose the multi-ancestry T2D GWAS meta-analysis for the primary results to maximize statistical power, acknowledging that the population structure of the African T2D GWAS meta-analysis is also not entirely homogeneous with the UGR-PD. Moreover, we corroborated our findings with a colocalization analysis. However, differences in linkage disequilibrium structures between the pQTLs and T2D GWAS data reduced the power to detect colocalizing signals.
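
For a single instrument, the Wald ratio and the instrument-strength filter reduce to a few lines of arithmetic. The sketch below is not the TwoSampleMR implementation; it assumes the common first-order approximation for the standard error, approximates the F-statistic as (beta / SE)² for the exposure association, and uses hypothetical effect sizes.

```python
import math


def f_statistic(beta_exposure, se_exposure):
    """Approximate single-instrument F-statistic as (beta / SE) squared."""
    return (beta_exposure / se_exposure) ** 2


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Wald ratio estimate and first-order standard error (log-odds scale)."""
    estimate = beta_outcome / beta_exposure
    se = abs(se_outcome / beta_exposure)
    return estimate, se


def odds_ratio_with_ci(estimate, se, z=1.96):
    """Convert a log-odds estimate to an odds ratio with a 95% interval."""
    return (math.exp(estimate),
            math.exp(estimate - z * se),
            math.exp(estimate + z * se))


# Hypothetical pQTL (exposure) and T2D GWAS (outcome) effects.
strength = f_statistic(beta_exposure=0.5, se_exposure=0.05)
estimate, se = wald_ratio(beta_exposure=0.5, beta_outcome=0.1, se_outcome=0.02)
if strength > 10:  # instrument-strength filter described in the text
    or_point, or_lower, or_upper = odds_ratio_with_ci(estimate, se)
```

An odds ratio above 1 would correspond to increased T2D risk per unit increase in genetically predicted protein level, and below 1 to a protective effect.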

PheWAS

The PheWAS module of the GWAS Atlas62, a comprehensive database that integrates the findings of GWAS across several phenotypes and traits, was used to carry out the PheWAS. The analysis aimed to methodically assess a protein’s association with several phenotypes and traits. To account for the large number of tests, the module performs multiple testing corrections and organizes phenotypes into specified trait groups (such as metabolic, cardiovascular and immunological). A Bonferroni-corrected P = 1.05 × 10−5 was used to determine whether an association was significant.

Identification of effector genes

To find putative effector genes for T2D, we compiled effector genes associated with the T2D GWAS. This dataset was curated from nine papers published in the Type 2 Diabetes Knowledge Portal, resulting in a collection of 1,804 distinct effector genes. For classification purposes, proteins that were documented in our curated list were labeled ‘reported’. Those not found on the list were classified as ‘unresolved’.
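
The labeling rule amounts to a set-membership check, sketched below. The gene symbols are placeholders rather than entries from the curated list, and the case-insensitive matching is an assumption.

```python
def classify_effector_status(pqtl_genes, curated_effector_genes):
    """Label each pQTL gene as 'reported' or 'unresolved'."""
    # Case-insensitive matching is an assumption made for this sketch.
    curated = {gene.upper() for gene in curated_effector_genes}
    return {gene: ("reported" if gene.upper() in curated else "unresolved")
            for gene in pqtl_genes}


# Placeholder gene symbols (not taken from the curated list).
curated_list = ["GENE_A", "GENE_B", "GENE_C"]
status = classify_effector_status(["GENE_A", "gene_c", "GENE_X"], curated_list)
n_unresolved = sum(1 for label in status.values() if label == "unresolved")
```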

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.