Rationale

The onset and progression of neurological and psychiatric disorders are intricately tied to the molecular processes within brain tissues. Research has demonstrated that the levels of transcription and protein, irrespective of the presence of post-translational modifications (PTMs), exhibit a high degree of specificity in brain tissues, making it challenging to use peripheral tissues as a proxy [1, 2]. Non-primates and non-human primates have been used to study the physiology and pathophysiology of the human brain. While the findings have been informative, significant gaps remain in accurately simulating the human brain, particularly for high-level cognitive functions such as self-cognition and theory of mind [3,4,5]. Furthermore, model animals often exhibit limited interspecies conservation in intergenic regions [3, 6], restricting the understanding of noncoding regulations. Therefore, a human-based molecular atlas that is specific to brain tissues or cell types is indispensable [2].

Population-level molecular atlases for humans, such as the 1000 Genomes Project, have greatly propelled biomedical research by providing a deep catalog of genetic diversity across global populations [7,8,9]. However, genetic variant-disease associations established without considering regulatory molecules leave the biology of regulation unexplained. A population-level multi-omics reference panel would be fundamental for elucidating the distribution and diversity of molecules, their interrelationships, and their associations with aging and various diseases [1, 10,11,12]. A multi-omics reference panel for human brain tissues was lacking until recent efforts through projects such as GTEx, ROSMAP, PsychENCODE, MSBB, and the study by the Knight-ADRC [1, 3, 13,14,15]. The GTEx project has constructed a joint genome-transcriptome atlas, showing transcriptional distribution across 50 human tissues and revealing the effects of genetic variation on transcriptional regulation [1]. This has facilitated the discovery of potential regulatory molecules that mediate the effects of variants on diseases, as identified from genome-wide association studies (GWAS), thus partially addressing the “missing biology” between variants and diseases [1]. However, the sample sizes for brain tissues in GTEx are limited, with only 100–200 samples, predominantly from individuals of European ancestry [1]. The Religious Orders Study/Memory and Aging Project (ROSMAP) has analyzed the genomes, epigenomes, transcriptomes, and proteomes of hundreds of human brain tissue samples, primarily from individuals of European ancestry, thereby creating a multi-omics atlas mainly focused on understanding the etiology of Alzheimer’s disease (AD) [10, 11, 13, 16,17,18]. The PsychENCODE project aims to uncover the genetic and molecular mechanisms of psychiatric disorders through omics data from brain tissues, yet it has a limited representation of Asian samples [19,20,21,22].
The Mount Sinai/JJ Peters VA Medical Center Brain Bank (MSBB–Mount Sinai NIH Neurobiobank) integrates multiple omics data across various brain regions alongside quantitative measures of neuritic plaque density and clinical dementia ratings to advance understanding of Alzheimer’s disease pathogenesis [15]. Our companion study has suggested that the molecular mechanisms underlying neuropsychiatric disorders might differ between Asian and European populations [23]. It also underscored that increasing genetic ancestral diversity improves the power to detect trait-associated genetic elements more efficiently than increasing the sample size within a single-ancestry reference panel [23]. Therefore, there is an urgent need to fill the gap in large-scale multi-omics brain atlases for Asian populations.

Due to the unique nature of brain tissues, obtaining population-scale samples through hospital- or community-based recruitment is very challenging. National brain banks offer a great opportunity for enabling systematic profiling of the human brain molecular features [24]. Led by the National Health and Disease Human Brain Tissue Resource Center and in collaboration with multiple centers within the China Human Brain Bank Consortium, we propose the China Brain Multi-omics Atlas Project (CBMAP). In Phase I of the project, we collected brain tissue samples from over 1000 Chinese donors. This project will provide a comprehensive landscape for the molecular profiles across the genome, epigenome, transcriptome, proteome, and metabolome for human brain samples (Fig. 1). It is worth mentioning that CBMAP will also encompass multiple PTMs that have not been captured in any of the previous studies at the population scale. PTMs are critical mechanisms of protein functional regulation, altering protein properties, localization, stability, and interactions through the addition or removal of specific chemical groups, or through protein cleavage [25, 26]. These modifications play essential roles in maintaining dynamic cellular functions and biological processes. Changes in the phosphorylation of tau protein are among the most characteristic manifestations of AD [27]. In addition, we will generate spatial omics and single-nucleus 3D structure of chromatin data for selected samples to provide a higher resolution and deeper insights into the molecular underpinnings of brain-related disorders. The 3D structure of chromatin is critical for gene regulation and cellular function. Chromatin regulates precise gene expression through topologically associating domains (TADs) and spatial interactions between enhancers and promoters [28]. These mechanisms are especially important in the complex development and function of the brain. 
Given the currently limited research on brain PTMs and 3D chromatin structures, a comprehensive study could reveal previously unknown mechanisms underlying the etiology and progression of brain diseases.

Fig. 1: An overview of the China Brain Multi-omics Atlas Project (CBMAP).

The project aims to build a multi-omics atlas for the genome, epigenome, transcriptome, proteome, and metabolome. We will study the etiology of multiple brain-related disorders, mainly Alzheimer’s disease (AD), Parkinson’s disease (PD), cerebrovascular disease (CVD), amyotrophic lateral sclerosis (ALS), and schizophrenia (SCZ). The logo in the upper right corner is composed of a Chinese knot and a DNA double helix structure, which resemble the shapes of brain tissue and blood vessels. Chinese knots are more than just handicrafts; they are also symbols of traditional Chinese culture, representing people’s wishes for happiness, safety, health, longevity, and love.

CBMAP is building a multi-omics reference map of over 1000 human brains (Phase I) to understand the molecular networks and features of cognition, aging, and brain-related disorders from population-level profiles, in support of the China Brain Project (CBP) [5].

Methods

Sample and tissue collection

As illustrated in Fig. 2, Phase I of the CBMAP was initiated by the China Human Brain Bank Consortium following a standardized operational protocol for human brain banking [29,30,31]. Three members of the consortium take part in Phase I of the CBMAP, contributing over 1000 donors in total: the National Health and Disease Human Brain Tissue Resource Center at Zhejiang University (ZJU) in southeastern China, the National Human Brain Bank for Development and Function at Peking Union Medical College (PUMC) in northern China, and the Xiangya Medical School Brain Bank at Central South University (CSU) in central China. These three brain banks are among the earliest established and currently hold the top-ranked brain sample collections in China. The Phase I collection covers the Yangtze River Delta urban cluster (ZJU) in the southeast, the Beijing-Tianjin-Hebei metropolitan area (PUMC) in the north, and the Central Yangtze River region (CSU) in central China, encompassing populations of 111 million, 245 million, and 110 million, respectively, for a total population coverage of approximately 466 million people (2022 China Urban Development Potential Ranking).

Fig. 2: Sample distribution.

Brain tissue samples for the CBMAP were collected from multiple brain banks from the China Human Brain Bank Consortium. Phase I of the CBMAP includes the brain banks of Zhejiang University, Peking Union Medical College, and Central South University. Recruitment is ongoing for the remaining brain banks. As shown in panel (A), a deeper red indicates a higher population density. These brain banks cover most of China’s densely populated areas. The age distribution, collection year distribution, and neuropathological diagnosis distribution of the Phase I samples are shown in panels (B), (C), and (D), respectively. ADNC Alzheimer’s disease neuropathological change, LBD Lewy body disease, CVD cerebrovascular diseases, PART primary age-related tauopathy, LATE limbic-predominant age-related TDP-43 encephalopathy, ARTAG aging-related tau astrogliopathy.

All donors or their families provided informed consent through voluntary donation agreements, permitting the use of their information and biospecimens for future studies. All procedures involving human participants were performed in accordance with the ethical guidelines of the institutional and national research regulations. The study protocol was approved by the Ethics Committee of Zhejiang University School of Medicine (2020–005 and 2024–007). During sample collection, we recorded the donor’s age, sex, time of death, and disease history. This project only included samples with an ischemia time of less than 24 h and excluded samples with highly pathogenic infections such as HIV. Donors under 18 years of age were not included in this project. Fetal and child donors will be incorporated in a future extension of the CBMAP, focusing on developmental processes.

In Phase II, the sample size will be expanded to approximately 2000 donors, by incorporating contributions from members of the China Human Brain Bank Consortium, including the Hebei Medical University Brain Bank in northern China, the Fudan University Brain Bank, and the Anhui Medical University Brain Bank in eastern China [32]. We are also actively working to integrate brain banks from the southwestern and northwestern regions of China as well as remote areas, e.g. Hainan, into the project to ensure comprehensive representation of most, if not all, populations in China.

Brain regions

In Phase I of the CBMAP, we prioritized the region at the intersection of Brodmann Area 9 (BA9) and the superior frontal gyrus. This region primarily overlaps with the dorsolateral prefrontal cortex (DLPFC), which is instrumental in managing various cognitive processes, including working memory, cognitive flexibility, and planning [33]. The broad functional spectrum of the DLPFC implicates it in neurodegenerative diseases such as Alzheimer’s disease and in a range of psychiatric disorders [3]. Several large-scale projects, including ROSMAP, PsychENCODE, and GTEx, have also treated this brain region as a primary focus of their research [1, 3, 17]. In subsequent phases of the CBMAP, we will extend our research to encompass additional brain regions, including the hippocampus, striatum, amygdala, substantia nigra, and hypothalamus, thereby constructing a more comprehensive molecular atlas.

Neuropathological diagnosis

Samples collected in this project come with definitive pathological diagnoses, including primary age-related tauopathy (PART), limbic-predominant age-related TDP-43 encephalopathy (LATE), aging-related tau astrogliopathy (ARTAG), Alzheimer’s disease neuropathological change (ADNC), Lewy body disease (LBD), cerebrovascular diseases (CVD), and pathologically healthy controls. These uniformly applied pathological diagnostic procedures are organized by the National Health and Disease Human Brain Tissue Resource Center at Zhejiang University, following a unified process of quality controls [32, 34,35,36]. The neuropathological diagnoses were conducted by examining multiple brain regions based on standard procedures [34], and are not limited to the prefrontal cortex. The molecular features of pathological conditions and the pathological condition-specific regulation patterns (including genetic regulations) will be studied.

Genomics and epigenomics

We will conduct whole-genome sequencing (WGS). Considering factors such as sample size, frequency of rare variants, and sequencing costs, the sequencing depth for Phase I is set at 10–20×. DNA methylation will be assessed using the Infinium MethylationEPIC v2.0 BeadChip. Additionally, we will carry out single-nucleus ATAC sequencing (snATAC-seq) on selected samples to capture cell-type-specific chromatin accessibility. To explore interactions among regulatory elements, we plan to conduct capture-Hi-C [37]. Moreover, we will employ a single-cell/nucleus tri-omic approach, ChAIR (chromatin accessibility, interaction, and RNA simultaneously), for single-cell 3D epigenomic and transcriptional regulatory panoramic scanning [38].
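To give a sense of scale for the 10–20× target, the short sketch below converts a mean depth into an approximate number of read pairs, using the standard relation depth = (number of reads × read length) / genome size. The genome size (3.1 Gb) and the 150-bp paired-end read length are illustrative assumptions and are not specified by the project.

```python
import math

# Assumed values for illustration only (not stated in the protocol).
GENOME_SIZE_BP = 3_100_000_000   # approximate human genome size
READ_LENGTH_BP = 150             # paired-end reads, 2 x 150 bp


def read_pairs_for_depth(depth, genome_size_bp=GENOME_SIZE_BP,
                         read_length_bp=READ_LENGTH_BP):
    """Read pairs needed so that pairs * 2 * read_length / genome_size >= depth."""
    return math.ceil(depth * genome_size_bp / (2 * read_length_bp))


for depth in (10, 20):
    pairs = read_pairs_for_depth(depth)
    print(f"{depth}x mean depth: about {pairs / 1e6:.0f} million read pairs")
```

The estimate ignores duplicates, unmapped reads, and uneven coverage, so the raw read count needed in practice would be somewhat higher.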

Transcriptomics

To profile the transcriptome, we will conduct ribo-free bulk RNA-seq to cover as many samples as possible with RIN ≥ 5. In addition, we will conduct single-nucleus RNA sequencing (snRNA-seq) for selected samples to profile cell-type-specific regulations and to support the deconvolution-based cell-type proportion and cell-type level gene expression estimation. Random primers-based techniques including snRandom-seq will be implemented to capture the low-quality RNAs [39]. Long-read RNA-seq will be utilized to achieve a higher quality of identification of splicing events [40, 41]. Furthermore, we are conducting spatial transcriptomics on representative samples and regions to study the etiology of neurodegenerative diseases [42].
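As an illustration of what deconvolution-based cell-type proportion estimation involves, the following minimal sketch models a bulk expression vector as a non-negative mixture of cell-type signature profiles and solves the resulting non-negative least-squares problem by projected gradient descent. This is a generic textbook formulation, not the method CBMAP will use; the signature values, cell-type labels, and the function name `estimate_proportions` are hypothetical.

```python
def estimate_proportions(signature, bulk, n_iter=2000):
    """Estimate cell-type proportions p >= 0 minimizing ||S p - b||^2.

    signature[g][k]: expression of gene g in cell type k (e.g., from snRNA-seq).
    bulk[g]: bulk expression of gene g.
    """
    n_genes, n_types = len(signature), len(signature[0])
    # Precompute S^T S and S^T b.
    gram = [[sum(signature[g][i] * signature[g][j] for g in range(n_genes))
             for j in range(n_types)] for i in range(n_types)]
    stb = [sum(signature[g][i] * bulk[g] for g in range(n_genes))
           for i in range(n_types)]
    # The trace bounds the largest eigenvalue of S^T S, so 1/trace is a safe step.
    step = 1.0 / sum(gram[i][i] for i in range(n_types))
    p = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        grad = [sum(gram[i][j] * p[j] for j in range(n_types)) - stb[i]
                for i in range(n_types)]
        # Gradient step followed by projection onto the non-negative orthant.
        p = [max(0.0, p[i] - step * grad[i]) for i in range(n_types)]
    total = sum(p)
    return [x / total for x in p] if total > 0 else p


# Hypothetical marker-gene signatures for three cell types (columns).
cell_types = ["neuron", "astrocyte", "microglia"]
signature = [
    [10, 1, 0], [9, 0, 1],   # neuron-enriched genes
    [1, 8, 0], [0, 9, 1],    # astrocyte-enriched genes
    [0, 1, 10], [1, 0, 9],   # microglia-enriched genes
]
true_props = [0.5, 0.3, 0.2]
# Noise-free simulated bulk profile: a weighted sum of the signatures.
bulk = [sum(s * p for s, p in zip(row, true_props)) for row in signature]

estimated = estimate_proportions(signature, bulk)
for name, value in zip(cell_types, estimated):
    print(f"{name}: {value:.3f}")
```

With real data the bulk profile is noisy and the signature matrix is itself estimated, so dedicated deconvolution tools add gene selection, normalization, and uncertainty estimates on top of this core idea.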

The representative samples are selected based on research questions. To probe disease-related transcriptomic features, we will select cases with clear pathological/clinical diagnoses (including different disease stages and comorbidities) and control samples matched by potential confounding factors. For identifying the molecular features of natural physiological processes (e.g., aging), we will select samples based on the distribution of age, sex, and region. These two parts will cover 100–200 samples of prefrontal cortex.
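One simple way to select controls matched on potential confounders is greedy 1:1 matching. The sketch below pairs each case with the unused control of the same sex whose age is closest, within a maximum age gap. The choice of sex and age as matching variables, the 5-year caliper, and all donor records are assumptions made for illustration; the actual selection procedure may use other factors and methods.

```python
def match_controls(cases, controls, max_age_gap=5):
    """Greedy 1:1 matching on sex (exact) and age (nearest within max_age_gap)."""
    available = list(controls)
    pairs = []
    for case in cases:
        candidates = [c for c in available
                      if c["sex"] == case["sex"]
                      and abs(c["age"] - case["age"]) <= max_age_gap]
        if not candidates:
            continue  # no acceptable control: the case stays unmatched
        best = min(candidates, key=lambda c: abs(c["age"] - case["age"]))
        pairs.append((case["id"], best["id"]))
        available.remove(best)  # each control is used at most once
    return pairs


# Hypothetical donor records.
cases = [
    {"id": "case1", "sex": "F", "age": 80},
    {"id": "case2", "sex": "M", "age": 65},
    {"id": "case3", "sex": "M", "age": 30},
]
controls = [
    {"id": "ctrl1", "sex": "F", "age": 78},
    {"id": "ctrl2", "sex": "F", "age": 81},
    {"id": "ctrl3", "sex": "M", "age": 69},
    {"id": "ctrl4", "sex": "M", "age": 90},
]

matched = match_controls(cases, controls)
print(matched)
```

Greedy matching depends on the order of the cases and is not guaranteed to maximize the number of matched pairs; optimal matching algorithms can be substituted when that matters.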

Proteomics, post-translational modifications, and metabolomics

Protein and post-translational modification (PTM) abundance will be measured using Liquid Chromatography-Mass Spectrometry (LC-MS). This will enable us to investigate protein and PTM profiles of serine/threonine phosphorylation, ubiquitination, and acetylation modifications. Additionally, we plan to examine other PTMs including lactylation to understand the hypoxia-related molecular consequences. For the samples planned for spatial transcriptomics and single-nucleus transcriptomics, we will also conduct spatial proteomics and PTM (at least for phosphorylomics) measurements to study the potential molecular functions within specific microenvironments across multiple omics layers.

In addition, we plan to perform metabolomics analysis on brain tissue homogenates and high-resolution spatial metabolomics. Homogenate metabolomics will leverage widely targeted LC-MS technology, complemented by targeted metabolomics for verification.

Data visualization

We will develop a data visualization portal to facilitate access by the scientific community. Results generated from the analysis, such as QTL signals and case-control differential expression, will be made available on the portal, with interactive query and download functionalities.

The current stage overview

In Phase I, we collected tissue samples from 1187 Chinese donors from the brain banks at Zhejiang University (N = 443), Peking Union Medical College (N = 568), and Central South University (N = 176). In the pilot stage of Phase I, we have completed the bulk-level measurement for the genome, epigenome, transcriptome, proteome, and phosphoproteome profile in a subset of samples (Fig. S1) and established the data processing and quality control pipeline. High-throughput data generation is underway for all available samples. In addition, we are generating snRNA-seq data, snATAC-seq data, and single nucleus-level 3D epigenomic and transcriptional regulatory maps for representative tissue samples, with particular interest in neurodegenerative diseases including AD. Moreover, we have finished pathological diagnoses including PART, LATE, ARTAG, ADNC, LBD, and CVD for most samples.

As shown in Fig. 2A, these samples represent the most densely populated regions in the northern, southeastern, and central parts of China. Among the donors, 37.4% are females, with a median age of 78 years and an interquartile range (IQR) of 66 (P25)–87 (P75) years (Fig. 2B). Since 2012, the number of donors has generally increased year by year (Fig. 2C). The detection rates of age-related neuropathological changes such as PART, LATE, and ARTAG among donors are 36.4, 46.1, and 12.2%, respectively (Fig. 2D). The detection rates for ADNC, LBD, and CVD are 47.6, 13.7, and 62.6%, respectively (Fig. 2D). In addition, 24 donors, aged 27–96, have clinical records with a schizophrenia diagnosis.

In addition, we evaluated the potential impact of sample quality-related characteristics (including RIN and PMI) on molecular profiles based on the pilot data. We performed principal component analysis and examined the distribution of RIN (Fig. S1A–D) and PMI (Fig. S1E–H) along the top PCs of molecular traits, including the epigenome (DNA methylation), transcriptome, proteome, and phosphoproteome. In line with observations in the GTEx v8 prefrontal cortex BA9 samples (Fig. S2), the RNA degradation level is captured by PC1 of the gene expression profile. No clear patterns emerged between the other molecular profiles and sample quality characteristics. These preliminary results suggest that the gene expression profile is sensitive to RIN, whereas PMI (within 24 h) plays a limited role in these molecular profiles. RIN, PMI, age, and sex will be included as covariates in regression models. Compared with ROSMAP, preliminary cis-eQTL analysis in CBMAP samples revealed a greater number of SNP-gene expression associations at FDR < 0.05 (Fig. S3). A concordant trans-eQTL pattern was observed between CBMAP and ROSMAP (Fig. S3). Together, these observations indicate good tissue sample quality.
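The kind of check described above, relating a sample quality metric to the leading principal component of a molecular profile, can be sketched as follows. The example uses simulated data in which expression depends on RIN, computes PC1 sample scores by power iteration, and reports the correlation between PC1 and RIN. All values, dimensions, and function names are hypothetical, and the code is a generic illustration rather than the project’s pipeline.

```python
import math
import random


def first_pc_scores(matrix, n_iter=100, seed=0):
    """Sample scores on PC1 of a samples x features matrix (power iteration)."""
    n, m = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in matrix]
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(m)]  # random starting direction
    for _ in range(n_iter):
        scores = [sum(row[j] * v[j] for j in range(m)) for row in centered]
        # Multiply by X^T X without forming the covariance matrix explicitly.
        w = [sum(centered[i][j] * scores[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [sum(row[j] * v[j] for j in range(m)) for row in centered]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Simulated data: 60 samples, 20 genes whose expression tracks RIN plus noise.
rng = random.Random(42)
rin = [rng.uniform(5.0, 9.0) for _ in range(60)]
loadings = [rng.uniform(0.5, 1.5) for _ in range(20)]
expression = [[load * q + rng.gauss(0, 0.3) for load in loadings] for q in rin]

scores = first_pc_scores(expression)
pc1_rin_corr = pearson(scores, rin)
# The sign of a principal component is arbitrary, so report the absolute value.
print(f"|corr(PC1, RIN)| = {abs(pc1_rin_corr):.2f}")
```

A strong correlation in such a check is what motivates including the quality metric as a covariate in downstream regression models.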

Comparison with other projects

Compared with the GTEx (we only refer to brain tissues in this context), ROSMAP, PsychENCODE, MSBB, and the project by Knight-ADRC [1, 3, 13, 14], the CBMAP stands out in several key aspects (Table S1): (1) we have included over 1000 donors in Phase I and will increase the sample size to 2000 in the next phase; (2) all the samples are Chinese, a population scarcely represented in previous studies; (3) alongside single-nucleus sequencing, bulk-level high-throughput measurements will be systematically conducted across the entire molecular spectrum, from the genome to the metabolome; (4) high-resolution spatial omics will be profiled for representative samples; (5) single nucleus-level 3D genome interaction will be revealed; (6) multiple post-translational modifications will be profiled and associated with other molecular layers, population characteristics, and diseases.

We have also compared the sample characteristics across the projects. In Phase I, we took samples from the BA9 region and conducted RIN value measurement, which showed that the median and interquartile range of RIN values were comparable to those of the GTEx project’s prefrontal cortex (PFC) BA9 region samples and the ROSMAP project’s PFC region samples (Fig. 3A). The median and interquartile range of postmortem interval (PMI) for CBMAP samples were lower than those of the GTEx project and comparable to those of the ROSMAP project (Fig. 3B). Notably, the CBMAP covers a much broader age range than either GTEx or ROSMAP (Fig. 3C). Only minimal differences were observed across CBMAP centers (Fig. 3D, E, F).

Fig. 3: Distribution of RNA integrity, postmortem interval (PMI), and sample age across different datasets.

(A), (B), and (C) show the RIN values, PMI (hours), and age for the CBMAP, GTEx (prefrontal cortex BA9), and ROSMAP projects, respectively. (D), (E), and (F) display the distribution of RIN, PMI, and age for the three centers in the CBMAP Phase I. The violin plots illustrate the distribution characteristics of the data. The box spans the interquartile range (IQR), with the median indicated by a horizontal line inside each box. Whiskers extend to the smallest and largest values within 1.5 times the IQR. Outliers beyond this range are plotted as individual points. CSU Central South University, PUMC Peking Union Medical College, ZJU Zhejiang University.
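For readers unfamiliar with the box plot convention described in the caption, the sketch below computes the median, quartiles, whisker ends, and outliers for a small list of made-up values. The quartile interpolation rule (`method="inclusive"`) is an assumption; plotting libraries differ slightly in how they interpolate quartiles.

```python
from statistics import median, quantiles


def box_plot_summary(values):
    """Median, quartiles, whisker ends, and outliers under the 1.5 x IQR rule."""
    data = sorted(values)
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "median": median(data),
        "q1": q1,
        "q3": q3,
        # Whiskers end at the most extreme observations inside the fences.
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Made-up values with one extreme observation.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary)
```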

The regulation of molecules will be compared across projects and ancestries. In addition, leveraging multi-ancestry datasets will improve the resolution of variant- and gene-level fine-mapping.

Goals of the project

(1) To characterize various molecular elements in the Chinese population and highlight potential differences within the Chinese population, as well as between Chinese and other populations.

(2) To establish a comprehensive atlas of molecules and co-regulation modules associated with aging and neuropsychiatric disorders (including comorbidities) across multiple layers, and to investigate potential molecular subtypes of brain-related diseases and clinical manifestations.

(3) To generate integrated maps of spatial omics, single-nucleus 3D epigenetic interactions, and transcriptional regulation for representative samples, providing deep insights into cellular features underlying brain-related conditions.

(4) To profile multiple PTMs, uncovering population-level protein crosstalk that has not yet been extensively studied in European populations. This also includes identifying PTM quantitative trait loci (ptmQTLs) and elucidating their local and distal regulatory mechanisms.

(5) To elucidate the genetic architecture of various molecules, which may exhibit ancestry-specific, disease-specific, and age-specific characteristics. With a comprehensive spectrum of omics, we aim to estimate the configurations [43] and potential causal mechanisms of regulatory cascades initiated by genetic variants [44].

(6) Under the hypothesis of “different ancestries, same pathogenic genetic element(s),” we aim to enhance mapping resolution in the study of disease-associated molecules by integrating multi-ancestry data [45,46,47].

Prospects

The power of genetics

The completion of the Asian reference panel will enable the bridging of SNPs and diseases by GWAS-based transcriptome-wide association study (TWAS), proteome-wide association study (PWAS), and other genetics-informed studies [48]. The availability of data from Asian populations will facilitate cross-ancestry fine-mapping, enabling more accurate identification of trait-associated molecules [45, 47, 48]. Unlike case-control studies, which can be biased by confounding factors and reverse causation, research based on germline variants offers a natural advantage in causal inference, making it a powerful method for identifying pathogenic molecules. Such genetics-based probing provides crucial support for drug development and repurposing. Recent studies have shown that the success rate of drug development backed by genetic evidence is 2.6 times greater than that without genetic evidence [49], offering a strong basis for future research leveraging the power of genetics [50].

A living cohort of potential donors

The project is supported by the National Health and Disease Human Brain Tissue Resource Center, which is actively constructing a living cohort of potential donors. This setting will enable more accurate pre-mortem data collection, including assessments of cognitive functions, lifestyle preferences, environmental factors, and laboratory measurements. Such comprehensive data collection will be invaluable for future research, particularly in exploring gene-environment interactions [5]. Additionally, this initiative enables the collection of peripheral samples, including blood, cerebrospinal fluid, and skin fibroblasts, which can be used to generate induced pluripotent stem cells (iPSCs). These iPSCs will be used to create in vitro models that mimic human conditions, facilitating subsequent validation [5].

CBMAP Phase II

In Phase II of CBMAP, we aim to significantly expand and diversify our resources to provide deeper insights into the molecular landscape of the human brain. Building on the foundation of Phase I, we plan to increase the atlas’s sample size from ~1000 to ~2000. Additional brain banks from Hebei, Anhui, and Shanghai, as well as brain banks from more remote areas, such as Hainan, will be included to ensure comprehensive representation of most, if not all, populations in China. During Phase II, we will specifically gather brain samples from donors under 18 and fetal brain samples. Multi-omics analysis will be conducted on 300–500 samples from key brain regions, including the hippocampus, striatum, amygdala, substantia nigra, and hypothalamus, with a focus on several major neurological diseases. Additionally, 5–10 samples will undergo whole-brain spatial transcriptomics and proteomics mapping. This will be complemented by high-resolution MRI-based 3D coordinate mapping for each tissue sample.

We also plan to include living cohorts, such as specialized disease cohorts (e.g., mental disorders and amyotrophic lateral sclerosis [ALS]) and brain-disease-free aging cohorts. We will integrate the health and medical records of these living cohorts, along with molecular characteristics of peripheral samples (e.g., blood and feces), to investigate the connections between peripheral and brain tissues and their relationships with diseases.

The goals for Phase II include establishing a larger and more regionally representative multi-omics atlas of Chinese brain tissue, validating the findings from Phase I, and uncovering new molecular features. We also aim to identify disease-related spatial molecular signatures across multiple brain regions and complete a comprehensive spatial molecular feature map of the entire brain. Importantly, by integrating peripheral tissue data with post-mortem brain samples, we seek to create a network of molecular profiles that bridges living and post-mortem analyses, advancing our understanding of systemic molecular interconnections.

Project organization and data sharing

The CBMAP is spearheaded by the China National Health and Disease Human Brain Tissue Resource Center, leveraging the China Human Brain Bank Consortium for tissue collection. A centralized hub, supported by specialized cores for sample collection, pathology, high-throughput data analysis, methodology development, data visualization, data management, and sharing, ensures the efficient execution and scalability of the project (Fig. S4).

For data sharing, the raw sequencing data will be uploaded to the GSA platform of the China National Bioinformatics Center (https://ngdc.cncb.ac.cn/gsa/). Researchers can apply for data use either through the National Health and Disease Human Brain Tissue Resource Center (http://zjubrainbank.zju.edu.cn/index) or via the GSA platform. All applications will be reviewed and approved by the academic committee and the CBMAP project team.

Conclusion

The CBMAP is pioneering a comprehensive molecular atlas of the human brain, spanning a wide range of ages and standardized pathological diagnoses. By mapping the entire central dogma, including multiple PTMs, the metabolome, spatial omics, and single-nucleus 3D genome structures, we are laying a crucial foundation for unraveling the complex mechanisms underlying brain function. Our progress in this endeavor brings us closer to understanding the mysteries of the human mind and advancing our ability to address brain-related disorders.