The equitable representation of global genetic diversity in preclinical cellular models such as human induced pluripotent stem cells (hiPSCs) has been advocated for by many in recent years1,2,3. This has been driven by the recognition of extraordinary genetic diversity across the globe, especially within historically underrepresented population groups, including those on the African continent. This genetic diversity must be translated into physiologically relevant models to realise equitable benefits from advances in preclinical research for all population groups. Since the (a) efficacy of treatments from vaccines to gene therapies (both ex vivo and in vivo), (b) preclinical testing of small molecules, (c) diagnostic systems, and (d) deconvolution of the molecular basis of disease could all depend on individual genetic background, the impact on healthcare is bound to be expansive.

Whilst not yet market-approved, over 100 clinical trials for hiPSC-based cellular therapies are ongoing4. In addition, as a cellular and disease modelling resource, hiPSC technologies are being utilized to advance preclinical research across myriad fields within drug discovery and regenerative medicine, allowing for the functional investigation of individual donor genetics as they relate to specific diseases and treatment outcomes. Africa’s extensive genetic diversity is associated with a unique disease burden which could be repositioned to instead offer a rich resource for the development of therapeutics related to human leucocyte antigen (HLA) matching. Allogeneic strategies represent a significant proportion of current hiPSC clinical trials, yet few studies are actively taking HLA matching into consideration to broaden population applicability (work in Japan5 is one example that does). Given the increasing evidence of HLA diversity6,7, a wider representation of population samples will be required to ensure equitable benefit from advances in cellular therapeutic intervention for historically underrepresented and underserved population groups.

The scale of African genetic diversity

Africa harbours the greatest genetic diversity in the world. Viewed from within the continent, this diversity spans more than 2000 ethnolinguistic groups, which constitute four larger population structures (Afroasiatic, Khoisan, Niger-Congo, and Nilo-Saharan) and are not adequately represented in research. The extent of this diversity is shown to some degree in Fig. 1.

Fig. 1: Graphic representation of ethnolinguistic diversity in Africa.

Overlaid Geo-referencing of Ethnic Groups (GREG) and Language families (Felix 2001) layers, displayed using Harvard WorldMap24. The language family layers are shown within the key and reproduced with permission from Marc Leo Felix25, whilst the GREG layers are described in ref. 26, derived from ref. 27.

The initial 1000 Genomes Project dataset revealed that typical African genomes harbour ~25% more genetic variant sites per genome than any other population, as well as the highest number of common variants (>5%) defined as globally rare (<0.5%)8. Several initiatives such as the Human Heredity and Health in Africa (H3Africa) program have made great strides towards increasing the representation of African genomes to more closely reflect Africa’s 17.5% share of the global population9. In addition, version four of the genome aggregation database (gnomAD V4.010) currently includes three times as many African genomes as were deposited in V2.0. Even so, the proportion of African genomes in the database has fallen (from 8.8% to 4.6%) because the number of deposited genomes of European ancestry has grown faster.
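The gnomAD figures above can look contradictory at first glance, so a short worked calculation may help. The sketch below is illustrative only: it assumes the "three-fold increase" is exactly 3x and that the 8.8% and 4.6% shares refer to the same kind of count (genomes deposited). It groups all non-African genomes together rather than isolating those of European ancestry, and the function names are hypothetical.

```python
# Figures quoted in the text. The fold change is taken as exactly 3
# for illustration; the real value may differ slightly.
AFRICAN_FOLD = 3.0   # growth in African genomes, V2.0 -> V4.0
SHARE_V2 = 0.088     # African share of the database at V2.0
SHARE_V4 = 0.046     # African share of the database at V4.0


def implied_total_fold(subgroup_fold: float, share_before: float, share_after: float) -> float:
    """Growth of the whole database implied by a subgroup's growth and its shares.

    share = subgroup / total, so
    total_after / total_before = subgroup_fold * share_before / share_after.
    """
    return subgroup_fold * share_before / share_after


def implied_rest_fold(subgroup_fold: float, share_before: float, share_after: float) -> float:
    """Growth of everything outside the subgroup, from the same three numbers."""
    total_fold = implied_total_fold(subgroup_fold, share_before, share_after)
    return total_fold * (1.0 - share_after) / (1.0 - share_before)


total_fold = implied_total_fold(AFRICAN_FOLD, SHARE_V2, SHARE_V4)
rest_fold = implied_rest_fold(AFRICAN_FOLD, SHARE_V2, SHARE_V4)

print(f"Implied growth of the whole database: {total_fold:.1f}x")
print(f"Implied growth of non-African genomes: {rest_fold:.1f}x")
```

Under these assumptions, the database as a whole would have grown roughly 5.7-fold and the non-African portion roughly 6-fold, about twice the growth rate of the African portion. This is why the African share falls even though the absolute count rises.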

It is therefore critical to ensure that equitable representation of African genetic diversity is addressed in hiPSC collections to mitigate the stark resource gap in the global research landscape. Ghosh and colleagues analysed the status of global ancestry reported for hiPSC collections in 20222, where lines of non-African ancestry were significantly more common than those from African populations. The authors eloquently outlined the importance of greater inclusivity and the subsequent challenges of accessing diverse lines for both academia and industry. For this commentary, we focus on evaluating the representation of iPSCs of African ancestry derived from ethnolinguistic groups (outlined above) from the African continent, based on data accessible in January 2024.

African hiPSC representation within global collections

Several organisations host hiPSC lines denoting population descriptors including ‘African’, ‘African American’, or ‘Black’. Of the larger global repositories, WiCell11 (United States of America) has reported the highest proportion, comprising approximately 15% of reported lines (Fig. 2A). Importantly, the majority (191) of the 203 available African American hiPSC lines hosted at the WiCell repository are each derived from individual donors, serving as a true reflection of the genetic diversity present within this repository. The HipSci repository12, based in Europe, hosts ten lines characterised as Black or African, derived from seven donors, with population definitions including donor ethnicity and predicted population. The human pluripotent stem cell registry (hPSCreg13) lists 68 registered hiPSC lines denoted to be of Black or African ethnicity, which are, however, derived from only 12 donors. Of the approximately 6000 lines registered on hPSCreg, 62% lack reported population descriptors referring to genetic ancestry, which perhaps reflects the lack of importance historically assigned to genetic diversity by the research community at large. Finally, the Coriell repository14 (United States of America) holds 17 Black or African American lines, each derived from an individual donor.
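Because several lines can come from one donor, line counts alone overstate the genetic diversity a collection holds. The sketch below is a simple illustration, not part of the original analysis. It computes lines per donor from the counts quoted above for HipSci, hPSCreg and Coriell. WiCell is omitted because the text does not give its exact donor count, and the `Collection` class and function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Collection:
    """Line and donor counts for lines described as Black or African (or African American)."""
    name: str
    lines: int
    donors: int

    @property
    def lines_per_donor(self) -> float:
        # 1.0 means every line comes from a distinct donor.
        return self.lines / self.donors


# Counts as quoted in the text (data accessible in January 2024).
collections = [
    Collection("HipSci", lines=10, donors=7),
    Collection("hPSCreg", lines=68, donors=12),
    Collection("Coriell", lines=17, donors=17),
]


def rank_by_donor_redundancy(items: list[Collection]) -> list[Collection]:
    """Sort collections so those with the most lines per donor come first."""
    return sorted(items, key=lambda c: c.lines_per_donor, reverse=True)


for c in rank_by_donor_redundancy(collections):
    print(f"{c.name}: {c.lines} lines from {c.donors} donors "
          f"({c.lines_per_donor:.2f} lines per donor)")
```

On these figures, hPSCreg averages between five and six lines per donor, whereas each Coriell line represents a distinct donor. Donor counts are therefore the more informative measure when assessing representation.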

Fig. 2: Global analysis of iPSC collections.

A Relative representation by identified genetic population groups within each repository and the hPSC registry. B Detailed sub-analysis of available African iPSC lines outlining clinical pathologies of each repository and associated donor representation.

Whilst several WiCell lines are derived from individuals with reported clinical phenotypes linked to a specific disease area of research, the majority (74%) are assigned “no disease reported” and could be used as background population control lines for disease modelling studies. At the time of writing, 57% (91) of these potential control lines are also not restricted for use to a specific disease area (Fig. 2B). It is important that such lines adhere to the International Society for Stem Cell Research recommendations for iPSC characterisation, to ensure quality control standards and confidence in their utility in scientific research studies15. To the best of our knowledge, only 12 of the 215 African American WiCell lines are annotated as lacking the necessary karyotyping standards and are usefully marked as such. The Coriell repository also hosts an additional 11 Black/African American hiPSC lines from 11 donors with no reported disease, representing an important resource that adds to the global number of potential control samples of diverse African representation.

WiCell hiPSC lines associated with a specific disease area are largely derived from patients with inherent blood and cardiac diseases. With sickle cell disease16 and cardiovascular diseases17 being predominant within African populations, this represents an important potential resource for identifying novel gene therapies and small molecule drugs which are relevant to population-specific disease aetiologies. HipSci, hPSCreg and Coriell add to this global resource by including hiPSCs representing diseases with varying degrees of genetic predisposition, including retinitis pigmentosa, Alzheimer’s disease, cystic fibrosis (CF), glycogen storage disorder, diabetes and epilepsy (Fig. 2B).

It is clear from these data that ancestral African genetics is present in global repositories. However, improved representation is needed to fully recapitulate Africa’s contributions to global genetic diversity. Three factors collectively necessitate increased diversity in accessible iPSC lines: increased international migration, the need to replace historically genetically homogeneous cell line experimental pipelines with the more diverse genetics inherent within the African population, and Africa’s reliance on pharmaceuticals.

Yet a concerning narrative still holds across the scientific community: that African hiPSCs from outside the continent itself, e.g. from African Americans, would sufficiently assuage concerns of diverse representation. The genetic diversity present in many African American populations is predominantly derived from West African populations18, with allele frequencies subsequently altered through genetic drift, founder effects and admixture (the joining, within a few generations, of previously distinct and geographically distant ancestries19). These effects collectively reinforce that neither this population group nor any other single minority population group outside the African continent can solely serve as a proxy for the entirety of genetic diversity encompassed within the diverse ethnolinguistic groups on the African continent.

To comprehend the implications of ethnolinguistic genetic diversity, we need only look at a few examples within the continent itself. A variant responsible for the metabolism of the antimalarial chloroquine has vastly differing allele frequencies across two South African population groups, at 16% in Tsonga-speakers vs 0.8% in Xhosa-speakers20. Other examples include the breadth of disease-causing mutations in spinal muscular atrophy, Duchenne muscular dystrophy, and CF, where rare or previously unknown mutations are increasingly recognized as being predominantly responsible for the disease phenotype21, offering a valuable resource for disease modelling. Indeed, the globally predominant CF disease-causing delF508 mutation is present in less than 50% of patients on the African continent, with an additional 21 uniquely African mutations distributed across the continent.

In addition, the equitable representation of patient-derived samples of idiopathic and/or complex disease, underpinned by diverse genetic variation across African ancestral groups, would serve as a critical turning point for the future impact of hiPSC technologies on the African continent. This would provide preclinical resources that not only facilitate multiple avenues of research related to disease pathophysiology and therapy but are also relevant to the continent’s diversity. These are some of the many demonstrations that justify the need to diversify hiPSC representation from within and across the African continent.

To the best of our knowledge, most of the samples in global repositories represent individuals from only a few African countries. However, Africa’s regional diversity is broader than a simple designation of ‘African descent’ suggests: a breakdown of ethnolinguistic sub-populations highlights the diversity even within a single African country22. This is critical to reinforce given the recent evidence of extraordinary regional diversity among 111 sub-Saharan ethnolinguistic and geographical groups across the continent23.

A local solution for a global problem

A requirement for the inclusion of African ancestry within hiPSC studies would represent a welcome step towards addressing existing disparities within the literature and global hiPSC resources, but there is still a clear need for a Pan-African approach to address equitable representation for the continent’s diverse population groups. At the same time, the symbolic inclusion of only a few ‘African ancestry’ hiPSC lines should be cautioned against. It rests on the common misconception that a handful of lines could adequately represent an entire continent which harbours more genetic diversity than the rest of the world combined.

Whilst the number of lines derived from ancestral African population groups is increasing, more still needs to be done to ensure that cellular models such as hiPSCs that are representative of African genetic diversity are locally generated, regulated and made globally accessible. At the time of publication, only five lines generated in Africa have been submitted to hPSCreg, which highlights the necessity for investing resources within the continent. To convert this vision into scalable action, initiatives that seek to grow the cellular representation of the African diaspora should be strongly advocated for and accelerated to implementation, ensuring that they are rooted in a collaborative network of governmental, clinical, and scientific stakeholders.

A lack of resources (skills and funding) has led to the under-representation of samples spanning African ethnolinguistic groups in global collections. Our goal is to ensure that representation of the African genetic diaspora is incorporated into global R&D pipelines to secure a future of equitable healthcare for the continent. Whilst this requires consultation with established global programmes to include diverse hiPSCs, we, as contributors to African research, reinforce that community consultation (with research bodies and the general population) through public lectures is an absolute requirement to ensure Africa is not harvested of her diversity without due benefit to the local population.

Partnerships that support the financially sustainable localisation of iPSC generation in under-represented regions, with associated training of local scientists, are needed. We envisage that with a public–private partnership supported by the government, a mutually beneficial structure could be financially sustainable in the long term, with industry players supporting the generation and expansion of diverse cellular lines encompassing both bespoke clinical cohorts and much-needed background controls, accessible by academia. Providing access to African-derived iPSCs must be balanced by the need to prevent exploitation of this resource. Commercial licensing of the African lines to industry is an essential first step towards improving accessibility to these resources for African researchers. Another strategy worth considering would be to ensure African research interests are equitably represented within the IP positions defined for industry use. The optimal structure may involve a variation on such strategies but must, at its inception, ensure that African genetic diversity is not exploited. In addition, considerations for equitable IP could provide an opportunity for additional long-term sustainability and benefit sharing. To do this, we need an established iPSC centre of excellence in Africa which can serve as a launchpad for the diffusion of skills and technologies to promote the uptake of hiPSC technologies across the continent. Through training workshops and visiting researcher programmes, this could be expanded across the continent to establish strategically positioned regional nodes, creating an African-led hiPSC network to drive the adequate representation of African genetic diversity in global stem cell repositories.

We acknowledge that the lack of adequate representation within global hiPSC resources available to the academic and industry community is not an African problem alone. Investment is required to ensure that global population diversity is fairly represented within current repositories, as this is critical to ensuring equitable benefit and impact from advances within the hiPSC field for all population groups.

With the right partners, the localisation of such endeavours on the continent would strive to build capability and capacity, whilst ensuring that Africa’s extraordinary genetic diversity is harnessed for global accessibility and consequently, impact.