Background

The International Health Cohorts Consortium (IHCC) was established in 2018 at the request of the leaders of the Heads of International Research Organizations (HIROs) and through a collaboration between the Global Genomic Medicine Collaborative (G2MC) and the Global Alliance for Genomics and Health (GA4GH). It is a global initiative aimed at closing gaps in genomic databases to enhance representation of different ancestral groups, promoting collaboration across academic and industry partners, and supporting cutting-edge research in areas that impact global health1. The mission of the IHCC is to forge cohort connections that revolutionize population health science by providing sustainable data infrastructure, cultivating a collaborative research environment, and promoting policies and best practices that foster connectivity, interoperability, and reciprocity.

IHCC membership criteria prioritize large population cohorts, capability for longitudinal health follow-up, broad participant selection, biological sample collection, and a commitment to data-sharing. The IHCC also recognizes the value of smaller cohorts representing underrepresented or unique groups.

Projects supported by the IHCC are focused on the biological, environmental, and social determinants of health and disease. Its core objectives are to (1) accelerate research, (2) harmonize data, (3) educate researchers, (4) enhance public health, and (5) foster innovation. Collaborators can utilize IHCC resources, including the public IHCC Data Atlas (https://atlas.ihccglobal.org/), to drive distinct areas of research in collaboration with other cohorts, and/or to access samples and data from populations of interest.

In this Comment, we summarize the breadth of IHCC members’ resources, invite prospective partners to join us in addressing global challenges in health research, and propose a federated template for constructive collaboration.

A large number of IHCC resources are available

Participants from member cohorts span a broad range of ages, ethnicities, and geographic locations, with 35% of locations (N = 24) self-identifying as a low- and middle-income country (LMIC), per World Bank criteria2.

A members’ survey was disseminated from November 2021 to March 2022 and again from November 2022 to March 2023. In total, it was completed by 69 of the 89 cohort members. Respondents reflected the diversity of the IHCC as a whole and were from Africa (N = 7), Asia (N = 19), Australia (N = 2), North America (N = 19), South America (N = 3), and Europe (N = 19). Of the 69 member cohorts who responded, 45 (65%) hold genomic data on some/all of their cohort. Forty-one sites had collected genotype data from ~12,834,036 research participants. Collectively, the cohorts represent ~34 million unique research participants with available data/samples (Fig. 1).
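The survey figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts quoted in the preceding paragraph; the variable names and the `percent` helper are illustrative, and the response rate is derived here rather than reported in the text.

```python
# Counts copied from the survey paragraph above.
respondents_by_region = {
    "Africa": 7,
    "Asia": 19,
    "Australia": 2,
    "North America": 19,
    "South America": 3,
    "Europe": 19,
}
total_members = 89       # cohort members invited to respond
respondents = 69         # cohorts that completed the survey
with_genomic_data = 45   # respondents holding genomic data


def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# The regional counts should add up to the number of respondents.
region_total = sum(respondents_by_region.values())

# Derived figure (not stated in the text): share of members who responded.
response_rate = percent(respondents, total_members)

# Share of respondents holding genomic data (reported as 65%).
genomic_share = percent(with_genomic_data, respondents)

print(region_total, response_rate, genomic_share)
```

The regional counts sum to the 69 respondents, and 45 of 69 rounds to the reported 65%.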

Fig. 1: Broad breakdown of underlying cohorts as reported by 69 unique IHCC member cohorts.

A: Breakdown of ~33,872,619 unique participants across 69 reporting sites. Participants are categorized by sub-population for All, LMIC, and non-LMIC member cohorts. B: Breakdown of resources/datatypes for unique participants as reported by member cohorts.

Several million biospecimens are available for immediate research use, subject to project-specific informed consent and ethical oversight, and country-specific legislation. Cohorts with ongoing recruitment continuously add to these collections (Fig. 2).

Fig. 2: Breakdown of approximate number and types of biosamples available as reported by member sites.

Across the 69 responding cohorts, millions of biospecimens are available for immediate research use, subject to project-specific informed consent and ethical oversight, and country-specific legislation. Cohorts with ongoing recruitment continuously add to these sample collections, ensuring that both new and existing studies can access fresh or archived samples as needed.

Relevant phenotype data are largely broad-based (i.e., not phenotype-specific). Demographic data informative of social determinants of health are widely available. Additionally, detailed clinical outcomes address chronic disease, mental health, and lifestyle. Ten (14%) IHCC cohorts (including two from LMICs) reported linkage to participants’ electronic medical records (EMR).

IHCC encourages data sharing and collaboration

The IHCC supports open and timely access to research findings, adopting the Framework for Responsible Sharing developed by the Global Alliance for Genomics and Health (GA4GH)3. This assumes that data donors (or their legal representatives) have provided consent for data use and sharing, and that ethics oversight consistent with local, national, and relevant international laws has been applied, as well as culturally appropriate ethics standards and best practices for governing future data use. The intention and willingness to collaborate with other members and beyond is a prerequisite to membership and fundamental to the IHCC research enterprise.

IHCC facilitates LMIC/LRS representation and support

Since its founding, a core component of the IHCC's mission has been to enhance inclusion of diverse cohorts from LMICs/low-resource settings (LRS) and to contribute to closing the existing ancestral and minority population gaps in databases. LRSs are characterized by significant constraints in both financial and human resources, encompassing clinical and non-clinical personnel. These contexts are further defined by an underdeveloped organizational and physical infrastructure. In addition to supporting LMIC/LRS collaborations, the IHCC explicitly supports diversity across age, ethnicity, sex, rural/urban setting, and socioeconomic status, in terms of both constituent cohorts and investigator development.

The Global Cohort Atlas allows robust and simple data sharing

The IHCC Global Cohort Atlas serves as a centralized resource for discovering available data across cohorts. It enables data harmonization and sharing across different platforms. Participation in the Atlas is a requirement for any projects funded by the IHCC.

In 2020, the Atlas was launched as the first open global directory of large-scale human cohorts, with cohort information gathered from surveys and data dictionaries. It enables the findability of cohort data by providing users with a single entry point to cross-query cohort data dictionaries to discover cohorts, phenotypes, or variables of interest. Searchable dimensions include participant disease status, data use policy, sample collection parameters, genotypes, and demographic/health variables from cohort data dictionaries. The Atlas encourages data interoperability by providing harmonized descriptions of cohorts. All of the Atlas’s cohort public metadata (cohort descriptors, data dictionaries, and variables) are openly available (https://github.com/IHCC-cohorts).
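The kind of cross-query described above can be sketched in a few lines. This is a minimal illustration over in-memory records, not the Atlas's actual interface or schema: the `CohortRecord` fields, the `find_cohorts` function, and the three example cohorts are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CohortRecord:
    """Hypothetical, simplified stand-in for one cohort's public metadata."""
    name: str
    country: str
    data_use_policy: str
    variables: set = field(default_factory=set)  # data dictionary variable names


def find_cohorts(records, required_variables, policy=None):
    """Return names of cohorts whose dictionary contains every required
    variable and, if a policy is given, whose data use policy matches it."""
    required = set(required_variables)
    return [
        record.name
        for record in records
        if required <= record.variables
        and (policy is None or record.data_use_policy == policy)
    ]


# Made-up records for illustration only; these are not real Atlas entries.
records = [
    CohortRecord("Cohort A", "Country X", "open", {"age", "sex", "bmi", "genotype"}),
    CohortRecord("Cohort B", "Country Y", "controlled", {"age", "sex", "genotype"}),
    CohortRecord("Cohort C", "Country Z", "controlled", {"age", "bmi"}),
]

genotyped = find_cohorts(records, ["age", "genotype"])
controlled_bmi = find_cohorts(records, ["bmi"], policy="controlled")
print(genotyped)
print(controlled_bmi)
```

A single entry point of this kind lets a user filter on variables and policy at once, rather than inspecting each cohort's data dictionary separately.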

Cohorts are classified into three levels of detail based on the cohort variables collected and harmonized. High-level cohort fields are collected upon joining IHCC (Level 1), structured cohort descriptors are collected through the IHCC annual cohort surveys (Level 2), and complete semantic harmonization of the cohort data dictionary variables is carried out in collaboration with cohort data managers (Level 3). This multi-level approach has enabled us to lower the bar for inclusion in the Atlas (Level 1), balancing the curation overhead required to reach the most comprehensive Level 3.
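The three levels can be expressed as a simple classification rule. The sketch below assumes that the levels are cumulative (a cohort reaches Level 3 only after providing Level 2 survey descriptors), which the text implies but does not state; the `AtlasLevel` and `atlas_level` names are illustrative and are not part of the Atlas codebase.

```python
from enum import IntEnum


class AtlasLevel(IntEnum):
    """Levels of detail described in the text."""
    LEVEL_1 = 1  # high-level fields collected on joining the IHCC
    LEVEL_2 = 2  # structured descriptors from the annual cohort survey
    LEVEL_3 = 3  # data dictionary variables semantically harmonized


def atlas_level(has_survey_descriptors: bool, dictionary_harmonized: bool) -> AtlasLevel:
    """Return the highest level a member cohort has reached.

    Assumption: levels are cumulative, so Level 3 requires Level 2.
    Every member cohort is at least Level 1.
    """
    if has_survey_descriptors and dictionary_harmonized:
        return AtlasLevel.LEVEL_3
    if has_survey_descriptors:
        return AtlasLevel.LEVEL_2
    return AtlasLevel.LEVEL_1


new_member = atlas_level(False, False)
surveyed = atlas_level(True, False)
harmonized = atlas_level(True, True)
print(new_member.name, surveyed.name, harmonized.name)
```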

The Cohort Atlas has grown to 89 cohorts from 43 countries with 12 harmonized cohorts (Level 3) through collaborations with projects such as CINECA and the Davos Alzheimer’s Collaborative (DAC).

Starting a collaboration

Collaborations typically start with a feasibility assessment through the Atlas and outreach to cohort leaders for project alignment.

The IHCC-DAC Alzheimer’s pilot aimed to integrate existing cohort data worldwide to form a global Alzheimer’s disease (AD) resource. By uniting diverse cohorts with varying degrees of genomic and phenotypic data, the program targeted early detection of disease onset and progression. Its planning, initiation, and development constitute a milestone-driven working model for future IHCC programs (Table 1).

Table 1 The IHCC-DAC model for program development

In the planning stages, over 70 experts defined specific recommendations for the “Must have” minimum set of components and approaches, as well as suggestions for desirable “Nice to have” components. A full mapping and assessment was undertaken of existing cohorts whose already-enrolled participants were willing and able to contribute. Twenty IHCC cohorts comprising several million individuals were identified. Each completed a resource survey aligned with the scientific plan, which allowed the IHCC to assess readiness, gaps, and support needs. The group also cataloged further cohorts/participants that offer congruent capabilities, as well as specialized sub-cohorts focused on narrow aspects such as brain autopsy or novel biomarker strategies. In the pilot phase, the IHCC approached cohort leads to contribute data for the assessment of AD polygenic risk in diverse populations in comparison with European ancestry cohorts, including the UK Biobank4.

We envision that this model is adaptable to any number of phenotypes/projects, whereby global data on diverse and heterogeneous populations will yield new discoveries.

Existing pilots and publications

Our existing pilots illustrate the diversity of research areas that IHCC resources can support. Several cohorts are using polygenic risk scores (PRS) to assess genetic susceptibility to various diseases, including cardiovascular diseases, diabetes, and dementia. These pilot studies are crucial for understanding PRS application in underrepresented populations5. Similarly, several cohorts are focused on mental health disorders, including studies to identify genetic and environmental contributors to conditions such as depression, schizophrenia, and anxiety, particularly in non-European populations6,7,8,9. The IHCC continues to support studies pioneering the use of metabolomics in population health. This research is crucial for understanding how metabolic pathways contribute to chronic diseases such as diabetes and cardiovascular conditions in diverse populations10,11. Another IHCC-supported health initiative studies the impact of opioids, known carcinogens, on cancer burden. Member cohorts have formed a global collaboration to harmonize data on opioid use and cancer risk12,13. Similarly, IHCC cohorts are collaborating on the study of early-life adiposity linked to cancer risk in diverse genetic ancestries14,15,16. Finally, cohorts from Africa have developed a resource for coronavirus host genomics studies. This multi-collaborator strategic partnership was designed to provide harmonized demographic, clinical, and genetic information specific to COVID-19 in Black South Africans17.

Future research directions

We aim to continue our support for the pilot program and to expand to new areas in the coming year. Our subgroups remain active and continue to address gaps in global health research. Future efforts will focus on improving the understanding of mental health disorders across LMIC populations, where the burden of mental health conditions is rising but remains under-researched. We also plan to enhance members’ metabolomics capabilities, particularly by developing standards for cross-cohort data integration. New initiatives include unique opportunities to further assess the relationship between climate and health. With over 70% of IHCC cohorts collecting environmental data, we are well-powered to lead in this increasingly important domain. Other areas for IHCC expansion include cancer genomics, infectious disease research, and pharmacogenomics. These fields represent new frontiers where diverse global data will be important for identifying genetic and environmental contributors to health.

Conclusions

The IHCC is a unique resource and transformative global initiative aimed at uniting large-scale, longitudinal population cohort studies to address some of the most pressing challenges in population health. The consortium’s commitment to diversity, inclusion, and representation addresses existing gaps in genomic databases and health disparities. Through initiatives such as the Global Cohort Atlas, standardized data-sharing frameworks, and federated analysis models, the IHCC has enabled collaborations that transcend geographical and technological barriers while respecting ethical and regulatory considerations.

IHCC’s scientific strategy has catalyzed novel research, including the application of PRS, studies on mental health and metabolomics, and the development of global frameworks for Alzheimer’s disease research. As it continues to grow, the IHCC remains focused on advancing its mission to support sustainable, inclusive, and high-impact research.

The IHCC invites researchers, policymakers, and industry partners to join its efforts in building a collaborative ecosystem that leverages population cohort data to drive innovation and equity in health outcomes worldwide. Researchers interested in joining can contact ihccinfo@ihccglobal.org.