Main

A long history of studies in human genetics has investigated how the powerful evolutionary pressure of endemic malaria has resulted in the selection of genetic variants that moderate the effect of the disease. In 1948, J. B. S. Haldane first suggested that heterozygous carriers of alleles that confer β-thalassaemia, a life-threatening anaemia at that time, could reduce mortality during malaria infection, and this ‘heterozygote advantage’ might explain the increased frequency of such alleles4.Since this initial proposal—commonly referred to as the malaria hypothesis—the increased frequencies, in areas where malaria was historically endemic, of not only β-thalassaemia, but also α-thalassaemia, sickle-cell anaemia (caused by the haemoglobin S (HbS) variant), haemoglobin C (HbC) disease, haemoglobin E (HbE) disease and ovalocytosis have all been hypothesized to be the result of selection5. Malaria resistance has also been proposed to account for the high frequency of alleles causing deficiency of the enzyme glucose-6-phosphate dehydrogenase (G6PD), and genetic studies have provided evidence for positive selection of the G6PD-deficiency A allele6.

Broader genome-wide association studies (GWASs) have now identified additional DNA variants that are associated with the clinical severity of malaria7,8,9,10,11,12, some of which—such as variation in the ATP2B4 gene13—are also associated with features of red blood cells (RBCs), the cells that host key stages of the malaria parasite’s life cycle in a human host. However, the mechanisms by which these variants and genes can both affect RBC parameters and ameliorate malaria remain mostly unknown.

Here we focus on explaining the molecular action of one allelic variant in the CCND3 gene region, in terms of its effects on both blood-cell traits and resistance to malaria. Our first finding was an association signal with increased levels of haemoglobin A2 (HbA2), marked by the derived T allele of single-nucleotide polymorphism (SNP) rs112233623 close to CCND3, the gene encoding the cyclin D3 protein1. Notably, this allelic variant showed a much higher frequency (minor allele frequency (MAF) = 10.1%) in a population cohort from Sardinia14, where the association was first detected, than in a sample set from northern Europe1 (MAF = 1%).

Within the same gene region, the CHARGE consortium study (composed mainly of northern European individuals)15 had previously found an association signal with reduced mean corpuscular volume (MCV) and increased RBC count, marked by the derived A allele of SNP rs9349205, located 160 bp upstream of rs112233623. Following up on this genetic finding, another group experimentally demonstrated the erythroid-cell enhancer activity of the core region surrounding SNP rs9349205, and a further increase in enhancer activity specifically for rs9349205-A, compared with the baseline ancestral G allele16. The researchers also showed that Ccnd3-knockout mice had a decreased RBC count and increased MCV, effects opposite to those of rs9349205-A in humans16. After further examination of human and mouse primary erythroid cells, the team proposed that cyclin D3 affects the number of cell divisions during the terminal differentiation of erythroid precursors, and thereby influences the size and number of erythrocytes, such that a lower level of CCND3 leads to fewer but larger RBCs16.

A more recent study in a large UK Biobank sample used statistical fine mapping in the CCND3 gene region and stepwise conditional analysis to identify rs9349205-A as the primary association for decreased RBC count and rs112233623-T as a putative independent associated variant with opposite effects2.

In addition to its effects on the cell cycle, the cyclin D3–CDK6 complex has been found in cancer cells to phosphorylate and thereby reduce the activity of the rate-limiting glycolytic pathway enzymes 6-phosphofructokinase (PFK1) and pyruvate kinase M2 (PKM2)3. Consequently, glucose-derived carbon is shunted into the pentose phosphate and serine pathways, leading to increased formation of NADPH, the source of reducing power to neutralize ROS3. To our knowledge, no investigation has so far reported whether these metabolic effects in cancer cells are recapitulated in RBCs and their precursors, and, if so, whether they are modified by rs112233623-T and rs9349205-A.

We therefore began this study with a series of related questions. How are the known actions of CCND3 related to the known erythrocyte trait effects of variant rs112233623-T? Does this allelic variant affect the known CCND3 erythroid enhancer, and, in particular, does it affect the binding of specific transcription factors? What effects are seen when rs112233623-T and rs9349205-A are inherited together on the same DNA molecule (haplotype)? Is the higher frequency of rs112233623-T in Sardinian individuals, compared with other European cohorts, more likely to be explained by random drift or positive selection during the evolutionary history of this population? And if selective mechanisms were involved, what was the selective pressure? In the present work, we address all of these queries, and we outline a role for reduced CCND3 function in malaria resistance and identify its probable mechanism. Our results suggest that inhibiting CCND3 could constitute a therapeutic modality for malaria.

Genetic associations in the CCND3 enhancer

Extending our previous work on the genetic architecture of haemoglobin levels in 6,305 individuals1, we examined genetic associations in the CCND3 gene region for a larger set of haematological traits in 6,824 individuals from the SardiNIA general population cohort (Methods). In contrast to findings in cohorts from northern Europe, where the association plots for the haematological traits RBC and MCV in this gene region are led by rs9349205-A (ref. 2), in Sardinia, rs112233623-T dominates the association profile, alongside another variant, rs113267280-G, that is in strong linkage disequilibrium (LD) with it (r2 = 0.96). These two variants form the credible set for HbA2, RBC MCV and mean corpuscular haemoglobin (MCH) associations (Methods, Extended Data Fig. 1 and Supplementary Table 1), and it is not possible to determine from the genetic association data which variant is primarily associated with these traits and which is secondarily associated owing to high LD. Still, there is evidence—such as a higher index of predicted deleterious effect for rs112233623-T than for rs113267280-G, using the combined annotation-dependent depletion (CADD) score (https://cadd.gs.washington.edu/score) (16.21 versus 2.27, respectively)—suggesting that rs112233623-T is the primarily associated variant, as was originally proposed for the HbA2 association1.

Focusing on this SNP and integrating previously reported associations1,2, the derived T allele of rs112233623 is associated with increased HbA2 (P = 4.03 × 10−22, β = +0.31), MCV (P = 3.77 × 10−29, β = +0.36) and MCH (P = 2.48 × 10−24, β = +0.32), and decreased RBC count (P = 1.18 × 10−16, β = −0.27), as well as with increased fetal haemoglobin (HbF) (P = 4.32 × 10−03, β = +0.09) (Table 1 and Supplementary Information). The phenotypic effects on MCV and RBC count are similar to the haematological profile seen in Ccnd3-knockout mice16, suggesting that the association signal marked by rs112233623-T results from reduced CCND3 activity, probably owing to decreased expression.

Table 1 Association results for the rs112233623-T variant

Of note, in the SardiNIA cohort, the derived allele A of rs9349205 associates with increased MCV (P = 8.8 × 10−5, β = +0.09) and MCH (P = 1.4 × 10−4, β = +0.09), and decreased RBC count (P = 1.6 × 10−2, β = −0.06). This is seemingly the same direction of effects as rs112233623-T, and opposite to the decrease in MCV and increase in RBC count previously reported for rs9349205-A in other populations2,15. However, the apparent contradiction can be resolved with conditional association analysis. After controlling for the effect of rs112233623-T, rs9349205-A was indeed associated with a reduction in MCV (P = 2.3 × 10−2, β = −0.05) and MCH (P = 3.3 × 10−2, β = −0.05), and an increased RBC count (P = 1.6 × 10−2, β = +0.06), consistent with findings in other populations and with increased CCND3 expression driven by rs9349205-A (refs. 16,17).

Similarly, in single-variant analyses, we find that rs9349205-A is associated with increased HbA2 (P = 1.4 × 10−9, β = +0.14), again with the same direction of effect as rs112233623-T. However, after conditional association analyses accounting for the effect of rs112233623-T, the association of rs9349205-A with HbA2 completely disappeared (P = 0.64), in contrast to the opposite effects of rs112233623-T on RBC count, MCV and MCH traits. This suggests that, unlike the putative decrease of CCND3 expression due to rs112233623-T, which has a strong positive effect on HbA2 levels, the increase of CCND3 expression due to rs9349205-A alone might not have any primary effect on HbA2 levels.

To better understand the contrasting association results for rs9349205-A and haematological traits in the SardiNIA cohort, compared with other European cohorts, and why conditional analysis adjusting for the effect of rs112233623-T renders the associations consistent, it is useful to examine the allele and haplotype frequencies of the two SNPs in various populations (for example, Sardinian individuals from the SardiNIA study cohort and European (EUR) individuals from the 1000 Genomes Project (1KG)18). The derived rs9349205-A allele is common both in SardiNIA and in 1KG EUR (frequency 21.8% and 24.4%, respectively), whereas the derived rs112233623-T allele is common in SardiNIA (frequency 10.1%) but much rarer in 1KG EUR, especially in the northern Europe subset (frequency 1%; Fig. 1). In Sardinian individuals, the two variants show considerable LD (r2 = 0.4, D′ = 1); the rs112233623-T allele is always found on an rs9349205-A background (Extended Data Fig. 2 and Supplementary Table 2), and roughly half of the individuals carrying rs9349205-A carry rs112233623-T.

Fig. 1: Frequency of the rs112233623-T and rs93429205-A alleles in various human populations.
Fig. 1: Frequency of the rs112233623-T and rs93429205-A alleles in various human populations.
Full size image

The frequencies of the rs112233623-T and rs93429205-A alleles in Sardinian and other populations were calculated in individuals who had undergone whole-genome sequencing, from the SardiNIA cohort (1,577 unrelated samples51; red circles) and from 1KG (2,504 unrelated samples18; grey circles). The allele frequencies for each variant, calculated under a binomial model, are shown as point estimates for each sampled population, with error bars indicating the 95% confidence interval. Any error bars that are not visible remain within the point symbol. Abbreviations on the right denote ancestry groups: EAFR, East African; EAS, East Asian; SAS, South Asian; WAFR, West African.

In 1KG EUR, the rs112233623-T allele is also observed in an rs9349205-A background, but because it is much rarer, nearly all of the rs9349205-A-carrying haplotypes contain the alternative rs112233623-C allele (Extended Data Fig. 2 and Supplementary Table 2). A plausible scenario is that the derived rs112233623-T allele arose on a haplotype with the rs9349205-A-derived allele background, and subsequently the derived|derived allele haplotypic configuration rose in frequency locally in Sardinia. The difference in the discovery-association results between Sardinian and northern European cohorts (shown above) can thus be explained by the higher frequency of the rs9349205-A|rs112233623-T haplotype in Sardinian individuals, in which the expected effect of rs9349205-A could be reversed by a larger inferred opposite effect of rs112233623-T under a simple additive model (Supplementary Information).

This evidence together suggests that rs112233623-T is the primary variant underlying an association with several blood-cell traits that might be mediated by changes in CCND3 expression. Next, therefore, we sought to assess the effects of rs112233623-T on CCND3 expression and cell-cycle progression in erythroblast cultures.

rs112233623-T and reduced CCND3 expression

To assess the effects of rs112233623-T on CCND3 erythroid expression, we used quantitative PCR with reverse transcription (RT–qPCR) and western blot analysis in in-vitro-differentiated primary erythroblasts derived from SardiNIA cohort volunteers who were homozygous for either the wild-type (WT) rs112233623-C allele (CC genotype) or the derived rs112233623-T allele (TT genotype). We focused on the basophilic stage of erythroblast differentiation, which corresponds to the peak of CCND3 expression19,20 (Extended Data Fig. 3). Compared with erythroblasts from individuals with the rs112233623-CC genotype, CCND3 expression was reduced at both the mRNA and the protein level in erythroblasts from individuals with the rs112233623-TT genotype (Fig. 2a and Extended Data Fig. 4a).

Fig. 2: Influence of the rs112233623-T allele on CCND3 expression and the cell cycle in erythroid cultures.
Fig. 2: Influence of the rs112233623-T allele on CCND3 expression and the cell cycle in erythroid cultures.
Full size image

a, Relative expression of CCND3 mRNA in cultured erythrocyte progenitors (erythroblasts) from individuals who are homozygous for the WT allele rs112233623-C (CC), compared with those who are homozygous for the derived allele rs112233623-T (TT). The expression of CCND3 was quantified by RT–qPCR relative to the expression of the endogenous control GAPDH at day 7 of the in-vitro-differentiated erythroid culture (for each genotype, n = 11 individuals). In the box plots, the line across the box indicates CCND3 median expression, the box ranges from the first quartile to the third quartile of the distribution and the whiskers extend from each box to the most extreme data points. A Wilcoxon–Mann–Whitney test was used, with the level of significance indicated by asterisks (**P < 0.01; one-sided P = 9.62 × 10−3). b, Histograms showing the distribution of cell-cycle stages, reported as G0–G1, S and G2–M, in erythroblasts derived from individuals homozygous for the derived rs112233623-T allele versus the WT rs112233623-C allele at two time points of erythroid differentiation: day 7 (left) and day 9 (right). Data are mean ± s.e.m.; n = 4 per genotype. A two-sided Student’s t-test was used, with the level of significance indicated by asterisks (*P < 0.05; NS, not significant). The statistical results for all comparisons are provided in Supplementary Table 3. c,d, Analysis of erythroblasts derived from individuals homozygous for the WT rs112233623-C allele (c) or the derived rs112233623-T allele (d) at day 9 of erythroid differentiation. In each panel, from left to right: representative flow-cytometry plots with propidium iodide staining, showing the distribution of cell-cycle phases (G0–G1, S and G2–M); representative flow-cytometry dot plots of GPA+-gated erythroblasts, showing the expression of Band 3 and integrin α4 (data are mean ± s.e.m.; n = 4 per genotype); and cytospin preparations stained with May–Grünwald–Giemsa (scale bars, 20 μm; 50× magnification).

Source Data

Using erythroblasts derived from volunteers in the same cohort, we further tested whether the reduction in CCND3 expression associated with the rs112233623-T allele inhibits the transition from G1 to S phase, as previously described for the loss of function of Ccnd3 in knockout mice16. We measured cell-cycle levels at two time points of differentiation—day 7 and day 9—corresponding mainly to the basophilic and to the polychromatophilic stages of erythroblast differentiation, respectively; at the latter polychromatophilic stage, CCND3 expression declines (Extended Data Fig. 3), according to published RNA-sequencing (RNA-seq) data19,20. In erythroblasts with the rs112233623 TT genotype, a higher percentage of cells in G0–G1 and a lower percentage of cells in S and G2–M phases were observed at day 7, and these differences became significant at day 9 of differentiation (Fig. 2b–d and Extended Data Fig. 4c,d). These results were confirmed by the differential growth curves observed for the two genotypes (Extended Data Fig. 4b).

These results support a model in which the rs112233623-T allele is associated with reduced CCND3 expression, which, in turn, leads to reduced cell-cycle progression during erythropoiesis, a reduction in the number of mitoses during erythropoiesis, a higher MCV and a lower RBC count. We next sought to understand the mechanism by which of rs112233623-T affects CCND3 expression.

An inhibitory site in the CCND3 enhancer

We next investigated how the activity of the CCND3 enhancer region, where rs112233623 and rs9349205 are located, might be altered or disarmed by rs112233623-T to account for the reduced expression of CCND3 and resulting haematological associations. We cloned a 372-bp DNA fragment of the enhancer surrounding rs9349205 and rs112233623, carrying either the WT configuration (the G|C allelic combination across both SNPs, respectively, in the haplotype) or the combined derived alleles (the A|T allelic combination) of these SNPs, into a luciferase expression vector. The constructs were then tested for their ability to activate expression of luciferase in human umbilical cord-derived erythroid progenitor 2 (HUDEP-2) cells, a model that can recapitulate the development of erythroid cells21. In these transfection experiments, the derived A|T allele combination showed a tenfold reduction in enhancer activity, compared with the WT G|C allele combination (Extended Data Fig. 5a), despite experimental findings showing that the A-derived allele of rs9349205 alone increases activity16. These functional results are thus consistent with our genetic evidence that the action of rs112233623-T overrides the opposite effect of rs9349205-A. They are also in agreement with the reduced expression of CCND3 mRNA in erythroblasts derived from individuals who are homozygous for the rs112233623-T allele (shown above), and with the results of another study, which found that shorter constructs that retained the T allele had reduced activity in luciferase transactivation experiments in K562 erythroleukaemia cells2.

The luciferase reporter results suggested that the rs112233623-T allele modifies transcription-factor binding. We investigated this possibility further by in silico inference of putative transcription-factor-binding sites in the sequence surrounding the rs112233623 variant (https://bio.tools/alggen). We identified a consensus binding sequence for SMAD3 (CAGACA) in the WT sequence, the fifth nucleotide of which corresponds to rs112233623-C. SMAD3 is a member of the SMAD family of transcription factors, which are involved in late erythroid maturation22. Notably, the change rs112233623C>T in the binding sequence (CAGATA) would be expected to abolish SMAD3 binding while creating a putative binding site (AGATA[A/G]) for GATA1 (Extended Data Fig. 5b). GATA1 is a key member of the GATA family of transcription factors, which are involved in haematopoietic development23, and acts as a repressor in this context (see below).

The predictions of transcription-factor binding from the in silico analysis were experimentally confirmed in vitro using labelled oligonucleotide probes containing either the WT (C) or the derived (T) allele of rs112233623 in electrophoretic mobility shift assays (EMSAs) in cell extracts expressing either SMAD3 or GATA1 protein (Methods). By supershift analysis with antibodies and competition analysis, we found that the C>T transition in rs112233623 indeed reduced the binding of SMAD3 (Extended Data Fig. 5c) and increased the binding of GATA1 (Extended Data Fig. 5d).

The differential binding of SMAD3 to the region surrounding rs112233623 was confirmed by chromatin immunoprecipitation (ChIP) assays in erythroblasts from volunteers homozygous for either WT rs112233623-C or the derived rs112233623-T allele (Extended Data Fig. 5e). However, in the ChIP assays, GATA1 binding remained relatively unchanged, presumably compensated by a nearby GATA motif (Extended Data Fig. 5f).

We next confirmed the proposed effects of the variants on the erythroid-specific enhancer of CCND3 in the presence of SMAD3. Luciferase expression constructs containing the CCND3 enhancer bearing the rs112233623-C allele were expressed in HUDEP-2 cells at higher levels after transactivation with SMAD3, especially when the coactivator p300 was also provided24. Conversely, the level of expression of constructs bearing the rs112233623-T allele showed no response to transactivation (Extended Data Fig. 5a).

By contrast, the same CCND3 luciferase constructs expressed in HUDEP-2 cells but cotransfected with GATA1 in combination with its coactivator FOG125 showed an even greater reduction in luciferase activity of the CCND3 enhancer bearing the rs112233623-T derived allele. The activity of the CCND3 enhancer was also reduced, albeit to a lesser extent, for luciferase constructs containing the WT rs112233623-C allele (Extended Data Fig. 5a). A plausible explanation for this intrinsically slight repressive effect of GATA1 on the WT enhancer is the presence of an additional canonical GATA1 site in the WT enhancer fragment, which was shown to be occupied by the GATA1 protein in previous ChIP–seq assays16.

Together, these data confirm that the rs112233623-T derived allele of the CCND3 erythroid enhancer decreases the expression and impairs the activity of CCND3 through reduced binding of the activator SMAD3 and increased binding of the repressor GATA1.

Blocking TGFβ–SMAD reduces the levels of cyclin D3

Because SMAD3 and the highly related SMAD2 are intermediate effectors of TGFβ signalling26, we thought that the TGFβ–SMAD3 pathway would be likely to modulate the expression of CCND3 in erythroid cells. We verified this hypothesis by analysing the expression of SMAD3 and CCND3 in HUDEP-2 cells at various time points after treatment with TGFβ. qPCR analysis showed that induction by TGFβ led to a rapid increase in SMAD3 expression, followed by a substantial increase in the expression of CCND3 (Extended Data Fig. 6a). This result is consistent with a previous study, in which TGFβ increased the expression of CCND3 in MCF-7 cells27. Moreover, treating HUDEP-2 cells simultaneously with both TGFβ and SB-505124—a specific inhibitor of the phosphorylation of the TGFβR1 receptors ALK4 and ALK5—prevents SMAD3 phosphorylation and thereby precludes any subsequent TGFβ-mediated increase in the expression of CCND3.

To further validate the stimulation of CCND3 by SMAD3 in HUDEP-2 cells, while accounting for potential compensatory actions of SMAD228, we inactivated both SMAD3 and SMAD2 by CRISPR–Cas9 targeted editing. This resulted in an overall reduction of 60% in the expression of the target CCND3 gene, with an even greater decrease of 86% at the protein level (cyclin D3/β-actin ratio; Extended Data Fig. 6b, bottom). These findings are in agreement with previous observations of a significant reduction in Ccnd3 expression in mouse stromal cells when both Smad2 and Smad3 were knocked down29.

Selection for low erythroid levels of cyclin D3

Given its phenotypic effects, the evolutionary history of the CCND3 rs112233623-T allele is of particular interest. The rs112233623-T allele is common in the SardiNIA cohort (MAF of around 10%), less frequent in cohorts from mainland Italy (MAF of around 2%), rarer still in cohorts from northern Europe and South Asia (MAF < 1%) and thus far undetected in cohorts from Africa and East Asia (Fig. 1). We used evidence from allele frequency differentiation and haplotype variation (Methods) to assess whether the higher frequency of rs112233623-T in the SardiNIA cohort is likely to be the result of positive selection that favoured this specific variant in the genetic history of Sardinia, or whether patterns of variation at the locus are best explained by random genetic drift. Consistent with positive selection on this variant, the observed differentiation in allele frequency between SardiNIA and 1KG EUR samples is greater than that seen for the vast majority of 1,075 genomic variants matched for allele frequency in Sardinia and other pertinent features (FST = 0.094, 99.9th ‘genomic percentile’ of differentiation; Extended Data Fig. 7a). Using a more specific metric for positive selection based on the length of the core haplotype around the variant of interest (integrated haplotype score; iHS30), the haplotype carrying rs112233623-T shows a level of extended homozygosity in SardiNIA that is in the tail of values observed in matched variants across the genome (iHS = 2.16, 99.1th genomic percentile; Extended Data Fig. 7b,d). Likewise, the haplotype diversity is reduced in individuals from the SardiNIA cohort, compared with the 1KG EUR cohort (cross-population extended haplotype homozygosity (xp-EHH) = 1.99, 99.4th genomic percentile; Extended Data Fig. 7c). Reconstruction of allele frequency trajectories over time using a local ancestral recombination graph surrounding rs112233623 (using CLUES31; Methods) showed a rise in the frequency of rs112233623-T towards the present time, with a paired selection coefficient point estimate for this allelic variant that was among the highest values obtained for comparable genomic variants (s = 0.026, 99.5th genomic percentile; Extended Data Fig. 8 and Supplementary Information). Together, these observations support the inference that the haplotype carrying the rs112233623-T decrease of expression (DoE) allele had a beneficial effect on fitness in the genetic history of Sardinia.

CCND3 DoE impairs P. falciparum through ROS

Given the already described phenotypic effects on blood-cell traits, and the plausible evidence for positive selection on the CCND3 DoE rs112233623-T variant, we wondered whether the variant is associated with malaria outcomes—especially because malaria was more prevalent in Sardinia than it was anywhere else in Europe until its eradication there around the 1950s32,33. To investigate this possibility, we assessed the effect of this variant on the P. falciparum life cycle in vitro, using RBCs derived from study participants. We also tested the CCND3 increase of expression (IoE) rs9349205-A variant, given its opposite biological—and thus possibly opposite evolutionary—effects, as well as the combination of both alleles. To do this, we used the SardiNIA general population cohort, recalling individuals who were homozygous for the haplotypes carrying the variants of interest (Fig. 3a). The individuals included 17 homozygotes for the CCND3 IoE rs9349205-A and the DoE rs112233623-T derived alleles (IoE/IoE|DoE/DoE); 14 homozygotes for the IoE allele rs9349205-A and for the WT allele rs112233623-C (IoE/IoE|WT/WT); and, as a control group, 7 homozygotes for the WT basal expression alleles rs112233623-C and rs9349205-G (WT/WT|WT/WT). To reduce background variation and avoid confounding, the individuals in these categories were selected from among those who do not carry other reported protective malaria variants that are common in Sardinia (for example, β- or α-globin or the G6PD-deficient Mediterranean (G6PD-Med) variants; Methods). We also included an additional group of five male individuals hemizygous for the G6PD-Med allele (rs5030868-A) and carrying the WT genotype at rs9349205 and rs112233623 (WT/WT|WT/WT, G6PD-Med); this permitted us to compare the CCND3 variants of interest with a variant that has been previously proposed to confer protection against malaria derived from P. falciparum and Plasmodium vivax34,35,36,37.

Fig. 3: Effects of genotype variation at rs112233623 and rs9349205 in RBCs infected with P. falciparum.
Fig. 3: Effects of genotype variation at rs112233623 and rs9349205 in RBCs infected with P. falciparum.
Full size image

a, Schematic representation of the CCND3 erythroid enhancer, showing the relative positions of the SNPs based on the CCND3 full-length coding sequence (chromosome 6, NC_000006.12, GRCh38.p14). b, Level of P. falciparum parasitaemia at each of three life cycles (48, 96 and 144 hpi) in RBC samples from donors with the following genotypes: homozygotes for the IoE rs9349205-A allele and the WT basal expression rs112233623-C allele (IoE/IoE|WT/WT; n = 14) in red; homozygotes for the rs9349205-G and rs12233623-C WT alleles (WT/WT|WT/WT; n = 7) in orange; homozygotes for the rs9349205-A IoE and rs112233623-T DoE alleles (IoE/IoE|DoE/DoE; n = 17) in blue; and homozygotes for the rs9349205-G and rs112233623-C WT alleles, and hemizygotes for the G6PD Mediterranean allele (G6PD-Med; n = 5), in green. In the box plots, the horizontal line indicates the median level of parasitaemia, the box denotes the interquartile range (IQR; 25th–75th percentile) and the whiskers extend to the most extreme data points within 1.5 × IQR of the box, with points beyond this range plotted individually as outliers. c, Percentage distribution of different parasite stages (rings, trophozoites, schizonts and dead parasites) over three life cycles in RBCs. Samples are as in b, with genotypes at the top of the graphs. The first cycle is defined as 24–48 hpi, the second cycle as 72–96 hpi and the third cycle as 120–144 hpi. Data are mean ± s.d. for each stage. d, Percentage of parasite growth inhibition, normalized to the control group (WT/WT|WT/WT), at each of three life cycles (48, 96 and 144 hpi, respectively) in the same samples as in b. Data are mean ± s.d.; colours correspond to the genotypes in b. e, Box plots of ROS (H2O2) concentration in RBCs from donors with the indicated genotypes, coloured as in b (n = 10, 7, 14 and 5 biologically independent samples, respectively). f, Correlation between P. falciparum parasitaemia and ROS (H2O2) levels in infected RBCs at 144 hpi. Values for each donor are indicated by dots, with colours corresponding to genotypes as in b. All P values in b,d,e are from two-sided non-parametric Wilcoxon–Mann–Whitney tests, with significance indicated by asterisks (*P < 0.05, **P < 0.01, ***P < 0.001 and ****P < 0.0001). Statistics for all comparisons are provided in Supplementary Table 3.

Source Data

To assess genotypic effects on malaria resistance, RBCs purified from recalled SardiNIA donors were infected with the Uganda Palo Alto strain of P. falciparum (FUP) and maintained in culture for three blood-stage cycles, as described in the Methods.

Infected RBCs from individuals with the IoE/IoE|DoE/DoE genotype (that is, homozygous for the haplotype carrying the CCND3 IoE and the DoE alleles together) exhibited progressively impaired P. falciparum growth, as indicated by reduced parasitaemia levels when compared with the WT/WT|WT/WT control group (Fig. 3b and Supplementary Table 6). Notably, the inhibition of parasite growth in IoE/IoE|DoE/DoE RBCs, which was already evident during the first cycle and suggestive of reduced invasion efficiency, became even more evident from the second cycle onwards, with parasites being retained in the schizont stage, leading to delayed merozoite egress and reinvasion of uninfected RBCs (Fig. 3c). Furthermore, the growth restriction became more pronounced in the third cycle, manifesting as a marked reduction in viable parasites, compared with the control group (WT/WT|WT/WT; Fig. 3c). A significant inhibition of parasite growth was observed in IoE/IoE|DoE/DoE RBCs (Fig. 3d), with comparable effects seen in hemizygous G6PD-Med erythrocytes (WT/WT|WT/WT G6PD-Med) (Fig. 3b–d).

By contrast, RBCs from individuals carrying the IoE/IoE|WT/WT genotype (that is, homozygous for the haplotype split carrying the IoE allele but not the DoE allele) showed the highest level of parasitaemia, faster parasite development and a lack of inhibition of parasite growth, compared with the other genotypes (Fig. 3b–d).

The differences in parasite development between the various genotypes are also evident in photomicrographs of blood smears (Extended Data Fig. 9) and in the videos available at https://figshare.com/s/d89f8f9f90d3a76eccb9 (ref. 38).

Consistent results with regard to parasitaemia levels were also obtained by repeating the same experiments in RBCs from a smaller group of recalled individuals carrying the same genotypes of interest as those studied in the previous experiments, but infected in parallel with the P. falciparum FUP strain and two other P. falciparum isolates (3D7 and DD2). This experiment suggests that the observed associations, including those between DoE rs112233623-T and P. falciparum growth, are unlikely to be influenced by genetic differences in P. falciparum (Extended Data Fig. 10).

Together, these in vitro results show that parasite growth is substantially reduced in RBCs of DoE rs112233623-T carriers. Furthermore, the results suggest a modest—if any—detrimental effect of the IoE rs9349205-A on parasitaemia during malaria infection when it is not on the same haplotype as rs112233623-T.

Given the reported effects of inhibition of cyclin D3 on oxidative stress in various cancer-cell lineages3, we hypothesized that rs112233623-T-mediated reduction of CCND3 expression might result in an increase in ROS in infected RBCs, and that this could eventually contribute to parasite clearance. This would thus be analogous to the effect seen when G6PD, the first enzyme in the pentose phosphate pathway, is lacking. Opposite effects in the production of ROS would then be expected in infected RBCs from carriers of rs9349205-A who do not also carry rs112233623-T. To test these hypotheses, we used liquid chromatography and mass spectrometry (LC–MS) to measure the levels of ROS (hydrogen peroxide; H2O2) in RBCs from donors with different CCND3 and G6PD genotypes (see above). RBCs purified from individuals homozygous for the DoE rs112233623-T allele indeed showed a significant increase in the concentration of ROS, compared with RBCs purified from individuals homozygous for the WT rs112233623-C allele—although not as extreme as that seen in individuals hemizygous for G6PD deficiency (G6PD-Med; Fig. 3e). By contrast, a reduction in ROS levels was observed in individuals homozygous for the IoE rs9349205-A allele, compared with those who were homozygous for the WT rs9349205-G allele (and thus also homozygous for WT rs112233623-C). When we correlated ROS levels with P. falciparum growth data, we observed a significant inverse correlation between ROS and parasite development during the third cycle (144 hours post-infection; hpi), with individuals from different CCND3 genotype categories clustering coherently (Fig. 3f).

These findings suggest that the reduction of P. falciparum growth in RBCs and the increased levels of ROS observed in the context of reduced CCND3 expression and G6PD-Med deficiency are likely to reflect a shared mechanism of resistance based on increased oxidative stress.

Discussion

Here we provide direct experimental evidence showing that the genetic variant rs112233623-T has a causal role in haematological trait variation and confers protection against malaria. We show that rs112233623-T abolishes a DNA-binding site for the activating transcription factor SMAD3 and creates a binding site for the repressive transcription factor GATA1. The consequences are reduced expression of CCND3 and, as owing to the lower level of cyclin D3 protein, inhibition of the G1–S transition in RBC precursors. The steady-state number of RBCs therefore decreases, whereas the volume of each cell increases. At the same time, HbA2—and, to a lesser extent, HbF—increases, highlighting a specific window during terminal erythropoiesis in which the shortening of the cell cycle reduces (albeit to varying degrees) the typical increase in the δ-globin/β-globin and γ-globin/β-globin synthesis ratios as erythroblasts complete their maturation39,40.

We infer that rs112233623-T is likely to have arisen on a background haplotype that also included another variant, rs9349205-A, which, when not co-inherited with rs112233623-T, has an opposite though weaker effect on CCND3 expression and haematological phenotypes. Capitalizing on the unique genetic architecture of the CCND3 gene region in the Sardinian population, we show that when the two variants are co-inherited, the smaller effect of rs9349205-A is overwhelmed by the stronger effect of rs112233623-T, which reduces CCND3 expression and has a knockout-like phenotypic effect on RBC traits (Fig. 4). The findings at the population level are also supported at the molecular level, and specifically in transfection experiments of erythroid cell lines using cloned DNA fragments with the two allelic variants together, in which we observed a marked reduction in enhancer activity, compared with the increase in activity reported for the derived A allele of rs9349205 alone. There is a probable mechanistic basis for the observed effects. Previous results have shown that rs9349205-A enhances CCND3 expression during erythroid differentiation by fostering an increase of chromatin accessibility at the enhancer17. On the basis of our finding that SMAD3 can no longer bind to the binding site after it has been altered by rs112233623-T, any effect of rs9349205-A on chromatin accessibility would be irrelevant on an rs112233623-T background.

Fig. 4: Influence of different rs9349205–rs112233623 haplotypes on the expression and function of CCND3.
Fig. 4: Influence of different rs9349205–rs112233623 haplotypes on the expression and function of CCND3.
Full size image

The white oblongs represent the WT G and C alleles at the rs9349205 and rs112233623 loci, respectively. The derived alleles at these loci are indicated by red and blue oblongs, respectively. The top dotted arrow depicts an earlier mutation (occurring before modern humans migrated out of Africa) from rs9349205-G to rs9349205-A. The bottom dotted arrow shows a more recent mutation (after modern humans left Africa) from rs112233623-C to rs112233623-T. The corresponding effects of these mutations and derived haplotypes on gene expression and function are also illustrated schematically.

The rise in the frequency of rs112233623-T in the Sardinian population seems to have occurred rapidly—sufficiently rapidly to suggest that positive selection of rs112233623-T occurred in populations with Sardinian ancestry (Extended Data Figs. 7 and 8). Consistent with the crucial function of cyclin D3 in RBC precursors, we considered whether past endemic malaria in Sardinia was the primary driver of this positive selection. Using cohort-based association studies to test for a possible protective effect of rs112233623-T against malaria infection is not possible, because the variant is absent in areas of the world where malaria is still endemic, and malaria was eradicated in Sardinia, where the allele is common, in the early 1950s. Nonetheless, we were able to identify an antimalarial effect in vitro by infecting cultured erythrocytes from Sardinian individuals with varying rs112233623 genotypes, as well as other variants of interest, with P. falciparum. This was made feasible by the high frequency of the T allele in Sardinia and the availability of a bioresource of thousands of genetically profiled individuals. We found that in RBCs of individuals homozygous for the CCND3 DoE rs112233623-T variant, there was a substantial inhibition of Plasmodium development, ultimately leading to parasite death. This was correlated with a substantial increase in the levels of ROS in these cells, comparable with that seen in individuals hemizygous for G6PD deficiency (Fig. 3).

The strong inhibitory effect of the rs112233623-T variant on parasitaemia in vitro, together with evidence for its positive selection in a population with a long history of endemic malaria, supports the hypothesis that this variant is protective against severe malaria, which is inherently associated with high levels of parasitaemia. This predicted defence against severe malaria by increased ROS production taps into the same pathway as G6PD deficiency, and is one of the main proposed mechanisms of action of commonly used antimalarial drugs such as quinolines, atovaquone and artemisinin and its derivatives41,42. All of these drugs have been shown to increase oxidative stress to levels that compromise malarial parasites, resulting in damage to cellular components, including lipids, proteins and DNA, and ultimately to cell death via apoptosis or necrosis. In fact, documented increases of ROS in individuals with the sickle-cell trait or β-thalassaemia43,44 raise the possibility that oxidative stress, which has been regarded as a feature of pathophysiology of other RBC disorders, is also involved in their mediation of resistance to malaria.

In addition to the ROS effects, we propose that the antimalarial action of reduced CCND3 expression is also mediated by its effects on the cell cycle of erythroblasts in the bone marrow, the preferential niche for the commitment and development of P. falciparum gametocytes. Previous research has shown that the maturation time of gametocytes in erythroblasts (around 10 days) is longer than the time that it normally takes for erythroblasts to develop from orthochromatophilic stages to reticulocytes in vitro (about 3–4 days)45. A prolonged period of erythroblast development is promoted by Plasmodium to ensure its gametocytogenesis through the release of factors that affect both infected and uninfected erythroblasts45, contributing to the ineffective erythropoiesis and anaemia observed in malaria. The reported lower number of erythroblast divisions when cyclin D3 expression is reduced16 might thus counteract the delay in the differentiation of erythroblasts that is normally imposed and needed by P. falciparum.

A genetically inherited reduction in CCND3 expression could thus prevent severe anaemia—a major cause of death—in two ways: first, by alleviating peripheral haemolysis through increased ROS levels; and second, by ameliorating ineffective erythropoiesis by reducing cell divisions during terminal erythropoiesis in the bone marrow. This latter effect could also prevent the sexual maturation of the parasite, thereby hindering disease transmission.

The decrease of CCND3 expression might have a similar protective role in P. vivax infections in the bone marrow46, and a comparable protective effect of increased ROS production in peripheral blood during P. vivax infections is consistent with the reported protection against severe P. vivax infection conferred by the G6PD-Med variant35 (rs5030868-A).

In terms of the therapeutic relevance of our findings, because CCND3 is expressed during erythroblast maturation but not in mature erythrocytes, the antimalarial effects of CCND3 inhibition are likely to be related mainly to its action in the bone marrow. These include immediate cell-cycle effects—moderating ineffective erythropoiesis—and delayed metabolic effects against peripheral haemolysis through increased ROS in erythrocytes, which will depend on the daily replacement rate of erythrocytes from the bone marrow. However, compared with the steady state, erythrocyte turnover is also expected to be significantly increased by inhibition of CCND3 in the bone marrow and by the erythropoietic stress caused by haemolysis during malaria infection. Our findings thus provide a rationale for inhibiting cyclin D3 as early as possible during parasite infection. Cyclin D3 inhibition, including in combination therapies with artemisinin, could thus be used to help treat malaria and reduce its transmission and disease burden, which remain a major health problem in many regions of the world47.

In addition, anti-cyclin-D3 therapies could also help to control Burkitt’s lymphoma—an aggressive non-Hodgkin B cell lymphoma that has been clearly linked to malaria in highly malaria-endemic areas of Africa48—both indirectly, by controlling the spread of malaria, and more directly, by counteracting the effect of known somatic mutations increasing the activity of CCND3, which have been shown to promote its pathogenesis49.

The requirement for cyclin D3 in the development of immune cells50 suggests that caution is needed in interventions with antagonizing therapies, but the possibility that cyclin D3 can tolerate some degree of inhibition for a defined period of time is supported by data showing that even Ccnd3-knockout mice are vital and develop into adulthood16.

Methods

Cell lines and primary cell cultures

All cells were cultured under 5% CO2 at 37 °C. HEK-293T and COS7 cells (ATCC) were cultured in Dulbecco’s modified Eagle’s medium (DMEM) with 10% heat-inactivated fetal bovine serum (FBS) (Gibco), 1% l-glutamine and 1% penicillin–streptomycin (Sigma). HUDEP-2 cells (provided by Y. Nakamura and R. Kurita) were cultured as described21.

CD34+ haematopoietic stem and progenitor cells were isolated from peripheral blood from donors and sorted using CD34+ MACS (Miltenyi) according to the manufacturer’s protocol. The cells were differentiated using a three-phase erythroid in vitro differentiation protocol as previously described52,53. In brief, the cells were expanded in StemSpan medium (Stem Cell Technologies) supplemented with 100 ng ml−1 stem cell factor (SCF) (PeproTech), 100 ng ml−1 Flt-3 (PeproTech), 100 ng ml−1 thrombopoietin (TPO) (PeproTech) and 20 ng ml−1 IL-3 (PeproTech) for a total of six days.

After six days, the cells were reseeded in phase I differentiation medium (days 1–7) consisting of Iscove’s modified Dulbecco’s medium (IMDM) (Gibco) supplemented with 2% human plasma (Stem Cell Technologies), 3% human serum (Sigma), 3 units ml−1 heparin (Sigma), 10 µg ml−1 insulin (Sigma), 3 units ml−1 erythropoietin (Janssen-Cilag), 10 ng ml−1 SCF (PeproTech), 1 ng ml−1 IL-3 (PeproTech) and 200 µg ml−1 holo-transferrin (Sigma), at a density of around 0.5 × 106–1 × 106 cells per ml for a total of seven days. Phase II medium (days 8–12) included the same cytokines except for IL-3. During phase III (days 13 and beyond) holo-transferrin was increased to 1 mg ml−1 and SCF was withdrawn. Erythroid differentiation was monitored by fluorescence-activated cell sorting (FACS) analysis and May–Grünwald–Giemsa (RAL Diagnostics) staining of cytospin slides. Images were captured from cytospin slides using a PrimoStar 3 microscope (Zeiss) coupled with an Axiocam 208 camera (Zeiss). For growth-curve analysis, cultured erythrocyte progenitor cells were seeded in the same amounts and then counted at the indicated days both by haemocytometer (Tali, Invitrogen) and by microscope with trypan blue staining (Invitrogen) to exclude dead cells. To test for TGFβ stimulation, HUDEP-2 cells were stimulated with 5 ng ml−1 TGFβ (Sigma) and sampled for SMAD3 and CCND3 mRNA expression profiling at various time points after the treatment (0 min, 60 min, 120 min, 240 min and 300 min). The inhibitory effects on this pathway were evaluated with 20 μM SB-505124 (Sigma) in 0.2% DMSO, as previously described54. All of the cells used tested negative for mycoplasma contamination.

Human blood samples

Whole-blood samples were collected from volunteers enrolled in the SardiNIA general population cohort14, at the project centre in Lanusei (Ogliastra, Sardinia). All participants gave informed consent for the study protocols, which were approved by the Sardinian Regional Ethics Committee (protocol 2171/CE). These participants have agreed in principle to be recalled for functional follow-up experiments based on their genotypes at loci of interest, such as those reported in this study. The samples collected for the experiments reported in this study were specifically selected to ensure representation of homozygous genotypes at the loci of interest, rs112233623 and rs9349205 close to the CCND3 gene. For the malaria experiments, individuals who were carriers of the most frequent allelic variants in Sardinia that have previously been associated with protection against malaria were excluded from the collection. These variants include the β039 thalassaemia mutation (rs11549407-A) in the HBB gene (frequency = 0.051), the 3.7-kb α-thalassaemia deletion (NG_000006.1:g.34164_37967del3804; frequency = 0.14) and the Mediterranean deficiency variant (rs5030868-A) in the G6PD gene (frequency = 0.083), with the exception of a small group of samples that were selected to be hemizygous for the G6PD Mediterranean deficiency variant. We also tried to ensure a balance in terms of sex in the various genotype groups being compared (Supplementary Information).

Whole blood was collected using heparin vacutainer tubes (BD Biosciences) for gene-expression and cell-cycle experiments based on the generation of erythrocyte progenitor cells, and acid–citrate–dextrose (ACD) vacutainer tubes (BD Biosciences) for P. falciparum infection experiments. The blood was kept at ambient temperature in vitro before processing for the specific experiments, all of which began on the same day. Further details about sample collection are provided in the Supplementary Information, and the overlap of donors across the functional and P. falciparum growth experiments is detailed in Supplementary Table 5.

Flow-cytometry staining

All staining steps were performed at 4 °C, with efforts made to minimize sample exposure to ambient light. For flow-cytometry analysis, 3 × 105 in-vitro-differentiated erythroid progenitor cells were washed twice in PBS and resuspended in staining buffer (1% FBS in PBS). Cells were surface-stained for 20 min with the following antibodies: 1:50 PC7-conjugated CD235a (glycophorin A, Beckman Coulter), 1:100 PE-conjugated anti-Band3 (IBGRL) and 1:20 APC-conjugated anti-CD49d (integrin α4 chain, Beckman Coulter). After washing, cells were resuspended in 500 μl FACSFlow (BD Biosciences). For the single-cell population, CD235a–PC7 versus FSC-A was used to gate on CD235a–PC7-positive cells. Erythroblast maturation was evaluated by flow cytometry through the simultaneous detection of surface markers integrin α4, glycophorin A (GPA) and Band 3, using anti-CD49d–APC, anti-CD235a–PC7 and anti-Band3–PE monoclonal antibodies, respectively. Data were collected using BD FACS Accuri C6 Plus (BD Biosciences). For cell-cycle analysis at each time point, cultured erythroid progenitor cells were collected, washed in PBS, resuspended in permeabilizing solution and incubated at 37 °C for 30 min. Propidium iodide (BD Biosciences) was then added to a final concentration of 10 μg ml−1, and cells were incubated for an additional 10 min at room temperature before analysis. Cells were acquired on a FACSCanto (BD Biosciences) coupled with FACSDiva software (v.6.1.3).

The data analysis was performed using FlowJo software (v.10.10.0).

Luciferase reporter analysis

Firefly luciferase reporter constructs (pGL4.23, Promega) were generated by cloning the variant of interest centred in 372 nucleotides of genomic context upstream of the minimal promoter. Expression plasmids were prepared by subcloning human cDNA clones for each of SMAD3, p300 (clone provided by D. Barilà) and GATA1 (clone provided by F. Grosveld) into the pEF5-HA vector (vector provided by N. Bottini). The expression plasmid pCMV6-Entry FOG1 was purchased from Origene. To avoid erroneous normalization of transfection efficiency, we used a Renilla mutant control plasmid (pRLSV40-mut-GATA) in which the GATA response elements had been mutagenized, thus rendering pRL-SV40 unresponsive to GATA stimulation. Primer sequences are listed in Supplementary Table 4.

Firefly constructs (2,000 ng) were cotransfected with the pRLSV40-mut-GATA construct (300 ng) alone or with the expression plasmid (1,000 ng) into 625,000 HUDEP-2 cells using the Neon electroporation system (Invitrogen, 1200 V, 20 msec, two pulses). After 48 h, luciferase activity was measured using a Synergy 2 plate reader (Biotek) with a Dual-Glo luciferase assay system (Promega) according to the manufacturer’s instructions. Firefly luciferase levels were normalized to the pRLSV40-mut-GATA Renilla luciferase activity for each sample, and all values were then plotted relative to the empty pGL4 vector.

EMSAs

EMSAs were performed as previously described55. Radiolabelled double-stranded oligonucleotides were prepared by annealing oligonucleotides containing the rs112233623 WT (C) or the derived (T) allelic variants. Small-scale nuclear extracts were prepared as previously described56 from COS7 cells that had been treated with TGFβ (5 ng ml−1 for 16 h) as previously described57 and then transfected with the eukaryotic expression vectors pEF5HA-SMAD3 or pEF5HA-GATA1, or empty pEF5HA vector as a control. The presence and correct size of the proteins of interest in all extracts were verified in western blot assays. For supershift experiments, EMSA reactions were preincubated for 30 min on ice with specific antibody (anti-HA.1, BioLegend; anti-GATA1, Santa Cruz Biotechnology). Competitor cold oligonucleotides were added in 25×, 50× and 100× molar excess. The protein–DNA interaction products were resolved on a 5% non-denaturing acrylamide gel (37.5:1 acrylamide:bis-acrylamide, 0.25× TBE). Representative full-scan blots are provided in the Supplementary Information and in the source data. Primer sequences are listed in Supplementary Table 4.

ChIP assay

ChIP was performed by a SimpleChIP Plus Enzymatic Chromatin IP Kit (Magnetic Beads, Cell Signaling Technology, 9005S). In brief, cells (4 × 106 per immunoprecipitation) were fixed with 1% formaldehyde for 10 min at room temperature, and the reaction was quenched with glycine at a final concentration of 125 mM. Nuclei were prepared, and chromatin was incubated with micrococcal nuclease at 37 °C for 15 min, which was followed by an appropriate amount of sonication. Immunoprecipitations were performed using 10 μg of an antibody against SMAD3 or GATA1, or 1 μg with nonspecific rabbit IgG as negative control (2729S Cell Signaling Technology), according to the manufacturer’s instructions, at 4 °C for 12–16 h. The next day, the immunocomplexes were rotationally incubated with 30 µl of ChIP-Grade Protein G Magnetic Beads for two hours at 4 °C and then washed three times using low-salt wash buffer and once with high-salt wash buffer at 4 °C for 5 min per wash. Chromatin was eluted by ChIP elution buffer for 30 min at 65 °C with gentle vortex mixing (1,200 rpm) and cross-links were reversed by treatment with 5 M NaCl and proteinase K for two hours at 65 °C. ChIP DNA was purified and subsequently quantified by qPCR using primers designed to amplify a region surrounding rs112233623 in the CCND3 enhancer and SimpleChIP Human α Satellite Repeat Primers (4486, Cell Signaling Technology). Data were finally presented as percentages of the input DNA. Primer sequences are listed in Supplementary Table 4.

Genome editing by CRISPR–Cas9

Single guide RNAs (sgRNAs) targeting SMAD3 and SMAD2 were developed using a CRISPR design tool kindly provided by the Zhang laboratory (Broad Institute, MIT) (see Supplementary Table 4 for sgRNA sequences). The annealed oligos were ligated into the BsmbI-digested LentiViral (LV) construct LentiCRISPRV2 (Addgene). Lentiviral vectors were obtained by transfecting HEK-293T cells with 10 μg LentiCRISPRV2 (expressing both sgRNA and Cas9 or Cas9 only), 7.5 μg psPAX2, 2.5 μg pMD2G-VSVG and 1.25 μM pRSV-REV, using polyethylenimine (PEI; ref. 58) (Polysciences). After 72 h, viral particles in the supernatants were collected by centrifugation at 20,000 rpm for two hours at 4 °C. HUDEP-2 cells were then transduced with CRISPR–Cas9 by centrifugation at 2,800 rpm for 90 min at 30 °C with virus at a multiplicity of infection of 40. Forty-eight hours after transduction, cells were positively selected with the addition of 1 μg ml−1 of puromycin to the medium. To validate biallelic CRISPR indel detection at the genomic target site, DNA was extracted from puromycin-selected HUDEP-2 cells with the QIAamp DNA Mini Kit (QIAGEN), PCR-amplified (primer sequences are listed in Supplementary Table 4) and sequenced by the Sanger method in a 3130xl Genetic Analyzer (Applied Biosystems). The percentage of indels was calculated using TIDE v.3.3.0 (https://tide.nki.nl/)59.

RT–qPCR analysis

RNA was extracted from cells using the RNeasy extraction kit (QIAGEN) according to the manufacturer’s instructions. First-strand cDNA synthesis was performed with the SuperScript III kit (Thermo Fisher Scientific) for equivalent amounts of starting RNA from all samples. The cDNA was analysed on an ABI 7700 machine using SYBR green master mix (Applied Biosystems), and normalized to GAPDH or B2M (β2-microglobulin) as an internal control. All samples were done in triplicate. PCR cycle conditions were 95 °C for 5 min, and 40 cycles of 95 °C for 10 s, 54 °C for 10 s, 72 °C for 15 s. Analyses of Ct values were performed as previously described60. PCR primer pairs are listed in Supplementary Table 4.

Analysis of RNA-seq data from public datasets

Raw RNA-seq data (FASTQ files) of An et al.19 and Ludwig et al.20 were downloaded from the NCBI Sequence Read Archive (SRA) (https://www.ncbi.nlm.nih.gov/sra). RNA-seq reads were aligned using STAR software (v.2.7.10b)61 against a transcriptome reference generated by RSEM software (v.1.3.1)62. The transcriptome reference was built from the hs37d5 human genome assembly (https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/) and GENCODE v.14 gene annotation (https://www.gencodegenes.org) using the rsem-prepare-reference function of RSEM. Normalized gene-expression levels (fragments per kilobase of transcript per million mapped reads; FPKM) were computed with the rsem-calculate-expression function of RSEM. Box plots were generated with R (v.3.5.3).

Western blotting

Western blotting was done following standard protocols. Proteins were collected in RIPA buffer (Millipore) supplemented with 2 mM PMSF (Sigma), and cOmplete Mini Protease Inhibitor Cocktail (Roche). Protein yield was determined with the Bradford protein assay (Bio-Rad), and equal amounts of total protein were separated by SDS–PAGE (Thermo Fisher Scientific). Proteins were transferred onto a polyvinylidene difluoride (PVDF; Amersham Biosciences) membrane in a wet transfer system (Bio-Rad). Membranes were then incubated with HRP-conjugated secondary antibodies (Santa Cruz Biotechnology), developed with an ECL system (Amersham Biosciences) according to the manufacturer’s instructions and visualized using photographic film. The ratio between cyclin D3 and β-actin was calculated using Image J1.52a (https://imagej.net/ij/). Representative full-scan blots are provided in the Supplementary Information and in the source data.

Reagents and antibodies

Recombinant TGFβ and TGFβ type I receptor inhibitor SB431542 were purchased from Sigma-Aldrich (T7039 and S4696). The following antibodies were used: PC7-conjugated anti-human CD235a (GPA–PC7, 1:50) supplied by Beckman Coulter, clone 11E4B-7-6, A71564; PE-conjugated anti-Band3 (Band 3–PE, 1:100) supplied by IBGRL, clone BRIC 6, 9439PE; APC-conjugated anti-CD49d (integrin α4–APC, 1:20) supplied by Beckman Coulter, clone HP2/1, B01682; anti-human GATA1 (GATA1, 1:10) supplied by Santa Cruz Biotechnology, clone N1, sc-266X; anti-influenza haemagglutinin epitope tag HA.11 (SMAD3–HA, 1:10) supplied by BioLegend, clone 16B12, MMS-101P; anti-human cyclin D3 (cyclin D3, 1:250) supplied by Santa Cruz Biotechnology, clone D-7, sc-6283; anti-human SMAD2 (SMAD2, 1:500) supplied by Santa Cruz Biotechnology, clone S-20, sc-6200; anti-β-actin (β-actin, 1:1,000) supplied by Santa Cruz Biotechnology, clone C4, sc-47778; anti-cyclin D3 (cyclin D3, 1:2,000) supplied by Cell Signaling Technology, clone DCS22, 2936S; rabbit anti-mouse IgG–HRP (1:20,000) supplied by Santa Cruz Biotechnology, sc2005; goat anti-rabbit IgG–HRP (1:20,000) supplied by Santa Cruz Biotechnology, sc2004; rabbit anti-goat IgG–HRP (1:20,000) supplied by Santa Cruz Biotechnology, sc-2768; anti-human SMAD3 (SMAD3, 10 µg in 100 µl for immunoprecipitation and 1:1,000 for western blot) supplied by Cell Signaling Technology, clone C67H9, 9523; anti-human GATA1 (GATA1, 10 µg in 100 µl) supplied by Abcam, ab11852; and normal rabbit IgG, (IgG, 1 µg in 100 µl) supplied by Cell Signaling Technology, 2729S.

Plasmodium falciparum culture

In vitro studies were done using the genetically stable reference FUP strain (mycoplasma-free) according to previously reported protocols63. The FUP strain of P. falciparum selected for our in vitro study (provided by E. Schwarzer) has been used extensively in both in vitro and in vivo studies to investigate various aspects of malaria infection, including RBC invasion, growth and development in the host. Compared with other strains, the FUP strain tends to maintain its genetic integrity during in vitro culture, making it a reliable choice for long-term studies64. To evaluate the reproducibility of our findings across genetically distinct parasite backgrounds, we performed additional experiments using the 3D7 (African) and DD2 (Southeast Asian) isolates.

Infection of RBCs from genotyped individuals

Fresh genotyped venous blood (about 25 ml) was collected from genotyped donors into ACD vacutainer tubes (BD Biosciences) and immediately processed. RBCs were separated from plasma and leukocytes by three washings in RPMI 1640 containing 25 mM HEPES (Gibco) and supplemented with 2 mM glutamine, 20 mM glucose, 27 μg ml−1 hypoxanthine and 32 μg ml−1 gentamicin (Sigma) (pH 7.2). The washed genotyped RBCs were used to maintain the parasite in vitro cultures in the same medium. Parasite cultures were synchronized as described previously65. Cultures were incubated at 37 °C under a 5% O2, 5% CO2, 90% N2 (v/v) atmosphere as previously described66.

To synchronize the genotyped cultures by parasite stage, schizonts (mature forms of Plasmodium) were collected from the 40–60% interface after passing a mixed-stage culture through a discontinuous Percoll mannitol density gradient by centrifugation at 5,000g for 30 min. The ring-stage parasites (early parasite stage) from the bottom fraction were also kept for further synchronous culturing. To start plasmodium cultures, density-isolated schizonts were added to erythrocytes in growth medium to a density of 2.5% parasites per erythrocyte.

All experiments involving in vitro malaria infections were performed in parallel with two initial haematocrit settings: 2% and 1%. This parallel experimental design provided the option to select between the two settings for subsequent short- and long-term parasite culture analysis, circumventing the complex and unpredictable process of repeating experiments from a small pool of donors with the relevant multilocus genotype combination, who are difficult to resample.

Samples with an initial haematocrit of 2% used 240 µl of RBCs with 2.5% schizont parasitaemia. We added 240 µl of packed erythrocytes every 48 h, coupled with replacement of the growth medium. This led to a gradual increase in haematocrit up to approximately 6% by the third cycle.

For 1% initial haematocrit samples, we used 2.5% schizont parasitaemia to infect 120 µl of packed erythrocytes and the same amount of RBCs was added to the cultures every 48 h, alongside the replacement of the growth medium. In this case, the final haematocrit level reached approximately 3% by the third cycle.

These two settings were used for all replicates at each time point (24, 48, 72, 96, 120 and 144 hpi). Because analysis of long-term cultures was performed to allow a wider range of observations, experiments were selected that began with an initial haematocrit level of 1%.

This is because the haematocrit level affects the glucose level significantly: as the haematocrit increases, so does glucose consumption, resulting in glucose depletion; in turn, the glucose level affects parasitaemia significantly, because the parasites rely heavily on glycolysis for energy and survival. Especially in the late cycles of long-term cultures, when the haematocrit level increases proportionally with the progressive addition of erythrocytes, we expect that parasitaemia will be less (negatively) affected by glucose consumption by both uninfected and infected erythrocytes at the selected lower haematocrit setting.

For each condition, two thin blood smears were independently counted by three trained microscopists in a blinded manner according to the latest guidelines of the World Health Organization (Malaria Microscopy Quality Assurance Manual). The resulting six counts were averaged to obtain a single arithmetic mean per condition.

Total parasitaemia, stages of morphology, parasite viability and inhibition rate were determined using Diff-Quick (RAL Diagnostics)-stained thin blood smears and light microscopy (Carl Zeiss Microscope Primo Star, objective lenses: 20×/0.50 PlanF and 100×/1.25 Oil Plan-ACHROMAT). The percentage of total parasitaemia was determined as follows: (number of parasitized erythrocytes/total number of erythrocytes) × 100. For each sample, at least 5,000 cells were counted. Damaged and pyknotic forms were counted separately and excluded from the total parasitaemia count.

The percentage of ring parasitaemia was determined as follows: (number of rings (single and multiple)/total number of infected RBCs) × 100.

The percentage of trophozoite parasitaemia was determined as follows: (number of trophozoites (single and multiple)/total number of infected RBCs) × 100.

The percentage of schizont parasitaemia was determined as follows: (number of schizonts (most mature form of P. falciparum asexual stages)/total number of infected RBCs) × 100.

The percentage of dead parasites was determined as follows: (number of pyknotic forms (blue dots inside RBCs)/total number of infected RBCs) × 100.

The percentage of parasite growth inhibition was calculated as follows: (1 – parasitaemia value)/(parasitaemia WT/WT|WT/WT average value) × 100.

The experiments were done in five batches to ensure, as far as possible, that individuals recruited for each experimental batch had a similar proportion of the different genotypes, especially IoE and DoE, to control for potential batch effects (Supplementary Information). A table listing the batch assignment for each donor used in the infection experiments is provided in the source data. We also ensured that all assays were performed under strictly controlled, standardized and homogeneous conditions to maintain consistency and comparability between batches, using exactly the same experimental protocols, including identical reagent batches, instrument settings and environmental parameters. This approach ensured the validity, accuracy and reproducibility of the results and minimized the variability associated with sample processing over time.

LC–MS measurements for ROS detection

Plasma was separated from whole blood by centrifugation at 200g for 5 min at 4 °C. White blood cells and pellets were thereby removed and the collected RBCs were transferred into a fresh microcentrifuge tube. The RBCs were then diluted 100-fold in PBS and probed with 20 μM coumarin boronic acid (Cayman Chemical). Before LC–MS analysis, 100 μl of methanol was added to the samples, which were then centrifuged at 5,000g for 30 min at 4 °C. LC–MS analysis was performed on an UltiMate 3000 UPLC system (Thermo Fisher Scientific) as previously described67. The limit of quantitation was calculated from calibration curves. For hydrogen peroxide species, the limit of detection was set to 11 nM, and the limit of quantification was set to 37 nM.

Statistical analyses

Targeted genetic association

Association signals for HbA2 and HbF levels (g dl—1), as well as MCV, RBC and MCH, were detected for the CCND3 gene region in a cohort of up to 6,824 samples from the SardiNIA cohort, as previously reported1 in a subset of 6,305 individuals from the same cohort in an analysis of haemoglobin levels.

Analyses used a genetic map based on 7,928 genotyped samples, with the OmniExpress Illumina array and iScan system using the Infinium HTS protocol, and imputed on a genome-wide scale using a Sardinian sequence-based reference panel of 3,514 individuals, as previously described51. The target region of ±0.5 Mb around the gene was analysed using a linear mixed model that accounts for cryptic relatedness and population stratification, implemented in the q.emmax function of the EPACTS program (http://genome.sph.umich.edu/wiki/EPACTS). Before the analysis, phenotype levels were normalized with inverse-normal transformation and adjusted for covariates of sex, age and age2 and for the presence of at least one of the 3 β0 mutations (β039 (rs11549407), HBB:c.20delA (rs63749819) and HBB:c.315+1G>A (rs33945777)), all directly genotyped or imputed. Moreover, HbF outliers with values higher than 5% were not considered in the association analysis, as previously reported1. An association signal was considered significant at a P-value threshold of 6.9 × 10−9 (ref. 51). Conditional analyses were performed for each trait by adding the most associated SNP as a covariate to the model adjusted for age, age2, sex and β0: no independent variant was found at the locus. To detect a causal variant, or causal variants, in the associated region, fine mapping was done by applying credible set analyses to the traits showing a significant association signal in the region (HbA2, MCV, MCH and RBC) using FINEMAP v.1.4.2 software; the --cond option and settings for one causal variant (--n-causal-snps 1) and for a maximum of ten causal variants (--n-causal-snps 10) for each association profile were used, as previously described68. Fine-mapping analyses converged for all tested traits after one causal SNP was detected, finding a single credible set at the 95% summed posterior probability. Variants in the credible set were then annotated for likely deleterious effect using the CADD score69 (using Ensembl GRCh37 release 111, January 2024).

Population genetic analyses

To investigate the hypothesis that the high allele frequency of rs112233623 in the Sardinian population results from positive selection rather than from genetic drift, standard metrics based on allele frequency (FST) and haplotype differentiation patterns (iHS and xp-EHH) were calculated as previously described70. In particular, a subset of 1,577 unrelated samples belonging to the SardiNIA whole-genome sequence reference panel51 was studied. This subset was defined by computing the genome-wide proportion of pairwise identity by descent (IBD) (π) on a random set of one million SNPs with a MAF > 0.05 in the 1KG population sample; for each pair of individuals with π > 0.05, the offspring was preferentially removed if in a trio; otherwise, the individual with the largest summed value of π across all other relationships with π > 0.05 was removed. Analyses on SardiNIA were then compared with those on 500 EUR samples included in 1KG phase 3 (ref. 18). To estimate the FST statistic, the Weir–Cockerham formula implemented in VCFtools v.0.1.12b (https://vcftools.github.io/index.html) was used to compare SardiNIA with 1KG EUR samples. For the haplotype-based analyses, variants with a MAF < 0.01 and a Hardy–Weinberg equilibrium (HWE) test P < 10−4 in Sardinian samples were first removed; this conservative filter facilitated better reconstruction of haplotypes by avoiding genotyping errors. The extended haplotype homozygosity (EHH) and the unstandardized iHS and xp-EHH values were then calculated using selscan71. Standardized statistics were obtained using the norm tool included in selscan. Finally, to establish a statistical benchmark for each test statistic, we constructed an empirical background distribution based on randomly selected variants from sequenced SardiNIA genomes that matched our target variant rs112233623 for three features: MAF ± 2% in SardiNIA; a measure of background selection (B score (ref. 72) ± 50 units); and local recombination rate in a 50-kb region around the variant (±0.5 cM per Mb; ref. 70) for a total of 13,134 variants. To exclude background variants in LD, variants with a correlation (r2) ≥ 0.1 were then filtered using the pruning procedure implemented in plink (https://www.cog-genomics.org/plink/1.9/), to obtain 1,075 variants that were used for the empirical background distribution used to calculate the genomic percentile values. Percentiles are calculated for standardized values for FST and xp-EHH tests, and for absolute standardized values for iHS. A high percentile value (such as 99%) suggests that for the statistic calculated, the variant shows more extreme signatures of positive selection than do 99% of the genomic background variants.

Coalescent-based analyses of recent positive selection

Description of dataset and preprocessing steps

For the purposes of coalescent-based analyses, we used the whole-genome sequencing dataset described previously73,74 (n = 3,514). This dataset includes individuals from the SardiNIA Project51, as well as from two case–control studies conducted in Sardinia. Related individuals were pruned from this dataset on the basis of the proportion of pairwise shared IBD, as described73. Given the computational demands, in a first step we sampled 100 unrelated individuals at random, followed by a second step in which we randomly selected a larger sample of 300 unrelated individuals for coalescent-based analyses.

Construction of genealogies using Relate

Genealogies were constructed using the Relate method (v.1.1.9), as described previously75. In brief, this method approximates the ancestral recombination graph by inferring a series of marginal trees along the genome. For all analyses using Relate, we followed the pipeline described in the online documentation (https://myersgroup.github.io/relate/). First, the ConvertFromVcf module was used to convert files from VCF format to the haps/sample format used by Relate, and the provided script (PrepareInputFiles.sh) was used to prepare the data. Then Relate was run with the mutation rate set to 1.25 × 10−8 and the haploid effective population size set to 30,000 to generate anc/mut files. The script EstimatePopulationSize.sh was then used to estimate coalescence rates, which were then used to re-estimate branch lengths in the anc/mut files with the script ReEstimateBranchLengths.sh. Marginal trees were visualized using the provided script TreeViewMutation.sh.

Genealogies for the sampled Sardinian individuals

We first constructed a genealogy for the 100 sampled individuals described in the ‘Description of dataset and preprocessing steps’ subsection above, using data from chromosome 6. As a precursor to using CLUES to infer selection coefficients across the genome, we repeated the Relate analysis using data from each autosome (1–22). We then used the script SampleBranchLengths.sh to sample branch lengths using Markov chain Monte Carlo (MCMC) for each variant in the SNP set of the 1,075 selected matching variants. We used the coalescence rates inferred by Relate and these sampled branch lengths as input and set the mutation rate to 1.25 × 10−8 per bp per generation. This step was repeated for each variant for which we estimated a selection coefficient (see below).

Multi-population genealogy

We also constructed a genealogy using Relate for a merged dataset including the 100 individuals described in the ‘Description of dataset and preprocessing steps’ subsection above, 107 Tuscan individuals from Italy18 and 108 Yoruba individuals from Ibadan, Nigeria18. We followed the same procedure described in the ‘Construction of genealogies using Relate’ subsection above to construct and visualize this genealogy for variants on chromosome 6, including rs112233623 and rs9349205.

Estimation of allele frequency trajectory and selection coefficient for rs112233623 using CLUES

We used the CLUES method31 to jointly infer the allele frequency trajectory and selection coefficient for the DoE variant (rs112233623). To do so, we followed the steps described above to run Relate tree inference for the sampled individuals described in the ‘Description of dataset and preprocessing steps’ subsection above and sample branch lengths from the trees. We then used the CLUES script inference.py to run CLUES with the sampled branch lengths (--times) and coalescence rates (--coal) outputted by Relate as input to the script. The CLUES-provided script plot_traj.py was then used to visualize the allele frequency trajectory.

Empirical distribution of selection coefficient estimates

To contextualize the results of the selection inference procedure, we repeated the CLUES analysis for the 1,075 SNPs selected to match by MAF, B score (background selection) and recombination rate with that of the DoE variant rs112233623. The HLA region was excluded from this SNP set. Of the 1,075 matched SNPs, 1,035 and 1,054 mapped onto the marginal tree inferred by Relate using 100 and 300 samples, respectively, and the remaining variants were excluded from further analysis.

Statistical analyses in experiments

On the basis of data distributions and sample sizes, parametric t-tests or non-parametric Wilcoxon–Mann–Whitney tests were applied to compare means or medians, respectively, and to establish statistical significance. To test a specific hypothesis, the one-tailed alternative-hypothesis test was used. Statistical analyses were performed using R v.3.5.3 (2019-03-11).

Correlations were calculated using the Pearson’s product moment correlation coefficient and P values were computed under the two-side hypothesis (cor.test function in R).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.