Main

With an exponential increase in launches since 2019, space is rapidly becoming more accessible1. Multiple commercial and state-sponsored groups are developing roadmaps to construct space stations, moon bases, Mars colonies and other permanent establishments beyond Earth1. Although innovation across the aerospace sector makes these ambitions technologically achievable, the biomedical challenges for crews in these extraplanetary habitats still need to be addressed, as humans did not evolve to survive in such extreme environments. The clinical consequences of this evolutionary mismatch of spaceflight exposure and adaptation have revealed a plethora of challenges to long-term space habitation, including a loss in bone density and muscle mass2, spaceflight-associated neuro-ocular syndrome3, perturbed immune function4 and spaceflight anaemia5. These physiological changes during spaceflight appear to imprint on the health status in humans; a chief example of this long-term effect is the increased risk of cardiovascular pathology observed in astronauts compared with age-matched controls6.

Before long-term space habitation is feasible, these biomedical challenges must be understood and mitigated. However, the aetiologies driving them are not understood, with the low number of astronauts yielding limited opportunities for in-depth biomolecular characterization. For example, two of the largest multi-omic studies to date have been the NASA Twins Study, which has published an in-depth molecular and cognitive profile of a single astronaut7, and the Japanese Aerospace Exploration Agency (JAXA) Cell-Free Epigenome (CFE) project, which has profiled cell-free DNA (cfDNA) and cfRNA in six astronauts8,9. Thus, achieving statistical power requires integrating data from other cohorts, such as the MARROW study, and other missions5,10. Furthermore, these analyses are complicated by the substantial variation in physiological responses to spaceflight among astronauts. Therefore, there is a need for increasingly large, detailed multi-omic profiles of astronauts to characterize the diversity of physiological shifts as a function of spaceflight11.

To achieve this end, we have leveraged the burgeoning commercial spaceflight industry. With the launch of SpaceX’s 2021 Inspiration4 (I4) mission, a cohort of all-civilian astronauts successfully completed a high elevation (585 km), 3-day orbital mission within a SpaceX Dragon capsule. Using recently developed protocols, the crew participated in a range of biospecimen collections before, during and after their mission12. We used the I4 biospecimens to deeply profile the effect of the stressors of spaceflight (for example, microgravity and space radiation) on crew physiology and health. We also compared these results with previous missions and control datasets, creating the largest-to-date molecular atlas of the effect of spaceflight on the human body, encompassing almost 3,000 samples and over 75 billion sequenced nucleic acids. Collectively, these resources are referred to as SOMA, and the samples are linked to a Cornell Aerospace Medicine Biobank (CAMbank) that stores viably frozen specimens for future, additional analyses.

As with Earth-based cohorts13, these accessible data — when profiled and aggregated at scale — will enable the development of both personalized and general medical guidance for astronauts14. A large group of subject matter experts in artificial intelligence has recently released recommendations focused on the importance of generating and archiving space data into the NASA Open Science Data Repository (OSDR)15,16 to enable autonomous and intelligent precision space health systems, and to monitor, aggregate and assess biomedical statuses for future deep space missions17. In addition, the study of the parallels between the physiological effects of spaceflight and ageing, chronic disease and immune system disorders using omics data can pave the way for therapeutics applicable to conditions on Earth.

Here we present a detailed guide to the SOMA resource, which includes the 2,911 samples collected during the I4 mission11,12, as well as spatial transcriptomics data, long-read profiles of astronaut RNA, microbiome data, exosome profiles and in-depth immune diversity maps. Additional spaceflight data were annotated and compiled into the SOMA portal to help contextualize gene, protein or metabolite dynamics, including data from the NASA Twins Study7, JAXA’s CFE mission8,9, single-cell RNA sequencing (scRNA-seq) data after simulated microgravity on peripheral blood mononuclear cells (PBMCs)18, and rat or mouse spaceflight data matched to human orthologues. In addition to rigorous dataset annotations, we detail (1) a comparison of conclusions on NASA Twins Study and flight dynamics comparing short-duration and long-duration missions, (2) cell-type-specific responses to spaceflight previously undocumented in astronauts, (3) cfRNA expression profiles showing haematological responses during recovery from spaceflight, and (4) additional analyses on individual responses to spaceflight from proteomic, transcriptomic and microbiome data. Data and samples generated in this study are available through SOMA (https://soma.weill.cornell.edu), NASA OSDR (https://osdr.nasa.gov/bio/) and CAMbank (https://cambank.weill.cornell.edu/), which offer an unprecedented view of the multi-system omics changes before, during and after spaceflight.

Comprehensive astronaut data resource

To generate a comprehensive profile of the physiological changes of the I4 crew (29, 38, 42 and 51 years of age), 13 biospecimen sample types were collected and processed, including whole blood, serum, PBMCs, plasma, extracellular vesicles and particles (EVPs) derived from plasma, dried blood spots, oral swabs, nasal swabs, skin biopsies, skin swabs, capsule (SpaceX Dragon) swabs, urine and stool specimens12. After collection, samples were subject to a battery of multi-omic assays, including clinical (CLIA) whole-genome sequencing, a clonal haematopoiesis panel, direct RNA-seq (dRNA-seq), single-nucleus RNA-seq (snRNA-seq), single-nucleus assay for transposase-accessible chromatin with sequencing (snATAC-seq), single-cell B cell repertoire (BCR) and T cell repertoire (TCR) V(D)J sequencing, untargeted plasma proteomics (liquid chromatography–tandem mass spectrometry), untargeted plasma metabolomics, cfDNA sequencing, cfRNA, metagenomics, metatranscriptomics and spatially resolved transcriptomics. In addition, chemokine, cytokine and cardiovascular biomarkers were quantified, and a CLIA lab (Quest Diagnostics) was used to perform a complete blood count and comprehensive metabolic panel (Fig. 1a). Datasets were generated across ten timepoints: three pre-flight (L−92, L−44 and L−3), three in-flight (flight day 1 (FD1), FD2 and FD3), one immediately post-flight (R+1) and three recovery (R+45, R+82 and R+194) spanning 289 days (Fig. 1b). Assays were performed on all crew members unless otherwise noted (Fig. 1b and Supplementary Table 1).
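As a rough check on the sampling timeline, the timepoint labels can be mapped to day offsets from launch. The sketch below is illustrative only and is not taken from the study's code: it assumes that L−n denotes n days before launch, that FD1 is the launch day, and that R+n denotes n days after a return that occurs 3 days after launch. The name `day_offset` is hypothetical, and an ASCII hyphen stands in for the minus sign in the labels.

```python
# Illustrative sketch: convert timepoint labels to day offsets from launch.
# Assumptions (not from the study's code): L-n = n days before launch,
# FDn = flight day n (FD1 is launch day, day 0), R+n = n days after return,
# and return occurs MISSION_DAYS after launch.
MISSION_DAYS = 3

TIMEPOINTS = ["L-92", "L-44", "L-3", "FD1", "FD2", "FD3",
              "R+1", "R+45", "R+82", "R+194"]


def day_offset(label: str) -> int:
    """Return days relative to launch (day 0) for a timepoint label."""
    if label.startswith("L-"):
        return -int(label[2:])
    if label.startswith("FD"):
        return int(label[2:]) - 1
    if label.startswith("R+"):
        return MISSION_DAYS + int(label[2:])
    raise ValueError(f"unrecognized timepoint label: {label}")


offsets = [day_offset(t) for t in TIMEPOINTS]
span_days = max(offsets) - min(offsets)
print(span_days)  # 289 under the assumptions above
```

Under these assumptions the first and last collections are 92 + 3 + 194 = 289 days apart, which agrees with the span quoted in the text.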

Fig. 1: Compendium of astronaut omic data and time-series analysis paradigms.

a, Omics and biochemical assays were performed on blood (whole blood, serum, PBMCs, plasma, plasma-derived EVPs and dried blood spots), oral (microbiome swabs), nasal (microbiome swabs), skin (biopsy and microbiome swabs), environmental (env.; microbiome swabs) and excrement (excrem.; urine and stool) samples. b, The timepoints of this study are separated into four different categories: pre-flight (L−92, L−44 and L−3), in-flight (FD1, FD2 and FD3), post-flight (R+1) and recovery (R+45, R+82 and R+194). The coloured circles indicate which assay was performed at each timepoint. Assays were performed on all crew members, unless denoted with an asterisk. c, Indicator for which assay types have been previously performed in spaceflight studies, broken down by the NASA Twins Study, JAXA studies and anonymized NASA cohort studies. Anon., anonymized.

A total of 2,911 samples were banked, with 1,194 samples processed for sequencing, imaging and biochemical analysis (Supplementary Table 1). These results and assays subsume and expand on work and protocols from previous missions, including the JAXA CFE study, the NASA Twins Study and some NASA astronauts (Fig. 1c). This latter category spans studies primarily from the International Space Station (ISS) that lack certain metadata, primarily duration spent in space and launch dates, to maintain astronaut anonymity. These studies include chemokine/cytokine biomarker panels (n = 46 astronauts), comprehensive metabolic panels, telomere length quantitative PCR (qPCR) and ISS-surface metagenomic profiling (Fig. 1c).

The SOMA resources were first compared with the NASA OSDR database, which contains all publicly accessible human omics data from spaceflight and ground analogue studies. OSDR hosts 76 human omics studies, of which 11 are from human primary cells exposed to spaceflight. The other studies encompassed cell line and ground studies, including high-altitude studies (Extended Data Fig. 1a and Supplementary Table 2), which were all merged with the SOMA dataset. Once merged, the total number of sequenced nucleic acid molecules from this study represents a more than tenfold increase in the total amount of human omics data in the OSDR (Extended Data Fig. 1b), across all spaceflight studies, ground studies, cell line and primary cell experiments (Extended Data Fig. 1 and Supplementary Tables 2 and 3).

The data from the missions were then divided into three analysis timeframes: (1) flight profiles, (2) recovery profiles and (3) longitudinal profiles (Extended Data Fig. 1c). Flight profiles reveal the most immediate effect of spaceflight, recovery profiles catalogue changes that occur after return to Earth, and the longitudinal profiles identify changes that have not returned to baseline after returning to Earth. We focused on several outputs for the resource, including first calculating differentially expressed genes (DEGs) for (1) PBMC snRNA-seq, (2) whole-blood dRNA-seq, (3) skin spatially resolved transcriptomics, and (4) cfRNA. We also mapped differentially methylated genes from whole-blood dRNA-seq, differentially accessible regions from PBMC snATAC-seq, isotype identification from TCR and BCR V(D)J sequencing, differentially abundant proteins from plasma and EVP proteomics, differential metabolites from liquid chromatography–mass spectrometry metabolomics, and microbial differentials from metagenomic and metatranscriptomic assays (Extended Data Fig. 2), with all raw and processed data annotated in the OSDR (Supplementary Table 4).

I4 reproduces NASA Twins Study

Telomere elongation has been previously described in three astronauts who stayed for 6 months to 1 year aboard the ISS7,19,20, but it was unclear how quickly such a phenotype appeared in astronauts. The average telomere length in all I4 crew members increased during spaceflight (17–22% longer), and this trend was statistically significant (mixed-effects linear model P = 0.0048; Fig. 2a). This finding is particularly notable, given the shorter mission duration (3 days total) and higher elevation of the I4 mission than the ISS studies, indicating that telomere length dynamics respond much more rapidly to spaceflight than previously observed.

Fig. 2: Telomere and cytokine Twins Study comparison.

a, Normalized average telomere lengths for I4 crew members, pre-flight, during flight and post-flight, determined by qPCR analyses of blood (DNA) collected on dried blood spot (DBS) cards (n = 32 samples for 4 independent participants across 8 timepoints). Two-sided P values were derived using a mixed-effects linear model that incorporated fixed effects for different timepoints (pre-flight, in-flight, post-flight and recovery) and random effects to account for variations among participants. The centre of the boxplots represents the median, the box hinges encompass the first and third quartiles, and the whiskers extend to the smallest and largest values no further than 1.5 × the interquartile range (IQR) away from the hinges. b, Changes in downregulated (DN; purple) and upregulated (UP; orange) gene expression log2 fold-change directionality post-flight from the Twins Study versus I4 in CD19 B cells, CD4+ T cells and CD8+ T cells (statistical significance was determined by a two-sided Wilcoxon rank-sum test). The number of genes is shown below the violin plots. The centre white dot represents the median, and the white line shows the range of the first and third quartiles. c, Relative cytokine/chemokine abundance pre-flight, post-flight and during recovery in the I4 crew versus the NASA Twins Study and anonymized NASA astronaut cohorts for CCL2, IL-10 and IL-6. MLBT, multiplexing LASER bead technology. Pre, pre-flight median; Post, post-flight (R+1). d, Relative abundance of BDNF and IL-19 pre-flight, post-flight and during recovery in the I4 crew. In panels c and d, the two-sided P values and adjusted q values were derived using a mixed-effects model that incorporated fixed effects for different timepoints (pre-flight, in-flight, post-flight and recovery) and random effects to account for variations among participants, except in the Twins Study, which had a single participant (n = 1). P values with an asterisk have a q > 0.05 after multiple testing correction.

We then compared the DEGs and cytokine changes from the Twins Study with those observed in the snRNA-seq data from the I4 mission, and with the expected DEGs of the assay from replicate negative control donor PBMCs (see Methods). The cross-mission DEG comparison highlighted a consistent response between both types of T cells, including CD4+ and CD8+ markers (552 and 608 DEGs, respectively, both P < 2.2 × 10^−16), across both sorted T cells and single-cell annotated cells (Fig. 2b). Conversely, B cells were less responsive to spaceflight, as expected from previous work in the Twins Study7, which showed B cells as either not significant or less responsive to spaceflight. For the four overlapping cytokines measured in the Twins Study with our panel, we found significant increases in three: IL-6 (P = 0.014), IL-10 (P = 0.021) and CCL2 (P = 0.040) (Fig. 2c); these cytokines also showed similar increases in other long-duration (more than 6 months) crews (Fig. 2c). We also ran a differential analysis of all cytokines detected on the I4 mission, to detect any differences from the Twins Study. Indeed, the levels of BDNF showed a statistically significant decrease (P = 0.00011, q = 0.0153), and IL-19 levels showed a statistically significant increase (P = 0.00015, q = 0.0153) during the post-flight (R+1) timepoint that returned to baseline during recovery (R+45 and R+82; Fig. 2d).

Distinct RNA fingerprints of spaceflight

Beyond recapitulating known biomarkers of spaceflight, the atlas integrated newer assays that were not available in previous missions, with a particular emphasis on RNA profiling. The first novel assay was spatially resolved transcriptomics on skin biopsies, which were obtained from all crew members during one pre-flight timepoint (L−44) and the day after landing back on Earth (R+1). The 4-mm biopsies were stained with markers for DNA, PanCK, FAP and α-SMA and then processed with the NanoString/Bruker GeoMx Digital Spatial Profiler, where regions of interest were selected based on the tissue structures identified by the fluorescence staining (Extended Data Fig. 3a). After filtering out outliers, the RNA counts were used for downstream data analysis, generating 95 regions of interest across four skin compartments: outer epidermis, inner epidermis, outer dermis and vasculature (Extended Data Fig. 3b,c). This analysis revealed a distinct set of DEGs, including JAK–STAT signalling, and melanocyte signatures (Fig. 3).

Fig. 3: Body-wide tissue stress map with cfRNA.

a, Cell-type deconvolution using BayesPrism with Tabula Sapiens as a reference. Top ten cell types by average fraction across all samples with all remaining cell types summed together as ‘other’. b, Cell type of origin for hepatocytes, endothelial cells, haematopoietic stem cells and melanocytes, which all show increased abundance during post-flight and recovery timepoints. c, Cell proportion changes in different layers of the skin from spatially resolved transcriptomics on skin biopsies. Predicted melanocyte abundance changes are significant in the inner epidermal and outer dermal skin compartments. In panels b and c, n = 4 independent participants across 7 timepoints. The centre of the boxplots represents the median, the box hinges encompass the first and third quartiles, and the whiskers extend to the smallest and largest values no further than 1.5 × IQR away from the hinges. NS, not significant; **P ≤ 0.01; ***P ≤ 0.001.

A second RNA assay for spaceflight integrated into SOMA was cfRNA profiling, which has recently been established as a dynamic tool for mapping temporal alterations in cfRNA composition and cell lysis21. However, bulk cfRNA had not been utilized to measure the response to spaceflight until the JAXA CFE study8,9 and the I4 mission12. Using principal component analysis, we identified a distinct separation in cfRNA profiles pre-flight versus post-flight and recovery for I4, suggesting a systemic physiological shift probably induced by space travel (Extended Data Fig. 3d). This was further reflected in the differential abundance of cfRNA genes across various timepoints, revealing specific patterns of noncoding expression (Extended Data Fig. 3e) and RNA types (Extended Data Fig. 3f) that correspond with the spaceflight timeline. The cell-type proportions inferred from the cfRNA profiles also exhibited spaceflight-associated variation over time and showed variation distinct from a set of healthy blood donor controls (n = 35; Fig. 3a and Supplementary Table 5). Cell types that showed significant post-flight shifts in proportion included hepatocytes, kidney endothelial cells, haematopoietic stem cells and melanocytes (Fig. 3b and Supplementary Table 5). Of note, the melanocyte cell proportions that demonstrated significant changes post-flight (Fig. 3c) were also found in the spatial skin transcriptomics data, providing additional evidence of adaptive skin responses to the space environment.

A third novel RNA method applied to these spaceflight samples focused on RNA isoforms and RNA modifications (epitranscriptome), through dRNA-seq on the Oxford Nanopore Technologies PromethION and deep RNA-seq (more than 400 million reads per sample) on the Ultima Genomics UG100. These data quantified genes that were differentially expressed and displayed differential N6-methyladenosine methylation (Extended Data Fig. 4a,b), or both, and were analysed for enriched Gene Ontology pathways. We identified a set of sites (set M-I; Extended Data Fig. 4b) that undergoes hypomethylation during recovery, another set (set M-II) that is detectably hypermethylated from spaceflight, and a set (set M-III) that exhibits novel hypermethylation during recovery and longitudinally as well. The common pathways in all three sets (Extended Data Fig. 4b) showed evidence of radiation and telomere response, including ‘TSAI response to radiation therapy’ and ‘Wiemann telomere shortening’22. In addition, the set of downregulated pathways after landing (recovery) was distinct, including genes associated with breathing regulation (for example, CO2 and O2 take-up and release by erythrocytes), which matches those pathways associated with crew response ranges (below, Fig. 5). Although further studies are needed to validate and delineate the potential mechanisms of these RNA dynamics (cfRNA, spatial and RNA modifications), these pathways suggest a potential relationship between RNA expression and methylation in regulating haematological and dermatological functions upon return to Earth.

Gene regulatory changes during recovery

Leveraging the time-series data, we next analysed PBMC gene expression from snRNA-seq to discern whether unique DEGs are present at each timepoint. We examined DEGs from immediately post-flight (FP1) through the recovery profiles (RP1 and RP2) to observe how gene expression profiles are re-established after spaceflight. The number of DEGs was used to quantify the severity and recovery of crew response, and we also compared identified DEGs from a negative control group, including two healthy donors and a read-depth- and cell-count-matched permutation group (Extended Data Fig. 5), with an average of more than 700 cells per crew member, per timepoint, per cell type. The DEG count (adjusted P < 0.05, |log2 fold change (FC)| > 0.5) decreased from FP1 to RP1 in CD16+ monocytes, dendritic cells, natural killer cells, ‘other’ T cells (but not CD4+ or CD8+), uncategorized (‘other’) cells and in pseudobulk (calculated from additive counts across cell types to represent PBMCs). However, the DEG count increased in B cells, CD4+ T cells, CD8+ T cells and CD14+ monocytes (Fig. 4a); by RP2, all cell types had lower DEG counts than at FP1 and began to approach the expected noise range of single-cell DEGs (Fig. 4a).
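The DEG criterion quoted above (adjusted P < 0.05 and |log2FC| > 0.5) amounts to a simple two-condition filter. The following minimal sketch shows one way such a filter could be written; the `GeneResult` type, the function names and the toy values are hypothetical and are not drawn from the study's pipeline or data.

```python
# Illustrative sketch of the DEG threshold quoted in the text:
# adjusted P < 0.05 and |log2 fold change| > 0.5.
# Gene names and values below are invented for demonstration only.
from typing import Iterable, NamedTuple


class GeneResult(NamedTuple):
    gene: str
    log2fc: float
    padj: float


def is_deg(result: GeneResult,
           padj_cutoff: float = 0.05,
           lfc_cutoff: float = 0.5) -> bool:
    """True if the gene passes both the significance and effect-size cutoffs."""
    return result.padj < padj_cutoff and abs(result.log2fc) > lfc_cutoff


def count_degs(results: Iterable[GeneResult]) -> int:
    """Count genes passing the DEG criterion."""
    return sum(is_deg(r) for r in results)


toy_results = [
    GeneResult("GENE_A", 0.9, 0.001),   # passes both cutoffs
    GeneResult("GENE_B", -0.7, 0.03),   # passes (downregulated)
    GeneResult("GENE_C", 0.4, 0.0001),  # fails: effect size too small
    GeneResult("GENE_D", 1.2, 0.2),     # fails: not significant
]
print(count_degs(toy_results))  # 2
```

Counting DEGs per cell type and per profile (FP1, RP1, RP2) with a filter of this kind would yield the type of summary described for Fig. 4a.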

Fig. 4: Recovery profile dynamics in PBMCs.

a, Number of DEGs from PBMC snRNA-seq for each cell type during the flight, recovery and longitudinal profiles (adjusted two-sided P < 0.05, |log2FC| > 0.5). NK, natural killer. b, Fraction of DEGs shared with FP1 at RP1, RP2 and LP2 for each cell type. c, Directionality of log2FC between FP1 and RP1 and RP2 for DEGs present in both profiles. d, Bar chart of pathways of DEGs present in RP1 that were absent in FP1 in monocytes and T cells. Bars are shaded by the false discovery rate (FDR) value, and the enrichment ratio is on the x axis.

As DEG counts were higher at RP1 than at FP1 for several cell types, we hypothesized the introduction of distinct DEGs after landing back on Earth. First, we calculated the percentage of DEGs unique to FP1 at both RP1 and RP2, focusing on cell-type-specific DEGs (Fig. 4b). By LP2 (L−92, L−44 and L−3 versus R+82), we observed that nearly all differential gene expression was driven by the same set of genes that were differentially expressed at FP1, with the exception of CD14+ monocytes, which had a unique longitudinal profile not strongly connected with the DEG responses in other cells. With the exception of CD14+ monocytes, the RP1 profile is distinct in the abundance of DEGs not present in FP1, but this perturbation almost entirely (more than 95% of genes) disappeared after flight (RP2 and LP2). From the DEGs shared with FP1, we observed that gene expression directionality reversed for nearly all DEGs between FP1 and each recovery profile (Fig. 4c), indicating a return to baseline for those DEGs. The exceptions were two genes from the CD14+ monocyte population, AHR (log2FC of 0.742 (FP1) and log2FC of 0.808 (RP1)) and PELI1 (log2FC of 0.539 (FP1) and log2FC of 0.560 (RP1); Fig. 4c), which had a positive log2FC at both FP1 and RP1.
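The two quantities discussed here, the overlap of a later profile with FP1 and the sign of the log2FC in each profile, can be illustrated with a small sketch. The AHR and PELI1 values are those quoted in the text; GENE_X, GENE_Y and GENE_Z are invented placeholders, the function names are hypothetical, and the choice of denominator (DEGs in the later profile) is an assumption that may differ from the one used for Fig. 4b.

```python
# Illustrative sketch: overlap with FP1 and log2FC direction reversal.
# AHR and PELI1 values are quoted from the text; GENE_X/Y/Z are invented.
from typing import Dict


def shared_fraction(reference: Dict[str, float],
                    later: Dict[str, float]) -> float:
    """Fraction of DEGs in `later` that are also DEGs in `reference`."""
    if not later:
        return 0.0
    return len(reference.keys() & later.keys()) / len(later)


def direction_reversed(reference: Dict[str, float],
                       later: Dict[str, float]) -> Dict[str, bool]:
    """For each shared gene, True if the log2FC sign flipped between profiles."""
    shared = reference.keys() & later.keys()
    return {gene: reference[gene] * later[gene] < 0 for gene in shared}


fp1 = {"AHR": 0.742, "PELI1": 0.539, "GENE_X": 1.10, "GENE_Y": -0.80}
rp1 = {"AHR": 0.808, "PELI1": 0.560, "GENE_X": -0.95, "GENE_Z": 0.70}

print(shared_fraction(fp1, rp1))
flips = direction_reversed(fp1, rp1)
print(sorted(g for g, flipped in flips.items() if not flipped))
```

In this toy input, AHR and PELI1 keep a positive log2FC in both profiles, as reported for CD14+ monocytes, whereas the placeholder GENE_X reverses direction.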

From the DEGs that were unique to FP1, we next quantified the uniformity of the pathway enrichment between two PBMC lineages: T cells (CD4+ and CD8+) and monocytes (CD14+ and CD16+; Supplementary Table 6). Of note, 211 (70.8%) genes were unique to individual cell types and 87 (29.2%) were shared between two or more cell types (Extended Data Fig. 6a). To identify enriched pathways among the gene set for each cell type, overrepresentation analysis was performed on Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways (Supplementary Table 6). Even though 70.8% of genes were unique to each cell type, their response was more convergent; unique pathways only ranged from 4.2% (CD4+ T cells) to 42.5% (CD14+ monocytes; Extended Data Fig. 6b). Yet some pathways were cell-type specific, such as the circadian entrainment pathway in CD14+ monocytes, with genes GNG2, GNAS, PRKCB, CREB1 and CAMK2D driving the overrepresentation from the gene set for this pathway (Fig. 4d). In addition, our data suggest that overrepresented pathways are sometimes more associated with cell lineage (for example, all T cells) than with the individual cell type (for example, CD4). This is evident from the longevity-regulating pathway that shares the PRKACB, NFKB1 and PIK3CA genes between both CD4+ and CD8+ T cell populations (Fig. 4d).
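Overrepresentation analysis is commonly framed as an upper-tail hypergeometric test on the overlap between a gene set and a pathway, with the enrichment ratio defined as observed over expected overlap. The sketch below illustrates that general technique using only the standard library; it is not the tool or settings used in this study, and the universe size, pathway size, gene-set size and overlap are hypothetical numbers.

```python
# Illustrative sketch of overrepresentation analysis (ORA) as an
# upper-tail hypergeometric test. All numbers below are hypothetical.
from math import comb


def ora_pvalue(n_universe: int, n_pathway: int,
               n_selected: int, n_overlap: int) -> float:
    """P(X >= n_overlap) when drawing n_selected genes without replacement
    from n_universe genes, of which n_pathway belong to the pathway."""
    upper = min(n_pathway, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def enrichment_ratio(n_universe: int, n_pathway: int,
                     n_selected: int, n_overlap: int) -> float:
    """Observed overlap fraction divided by the fraction expected by chance."""
    return (n_overlap / n_selected) / (n_pathway / n_universe)


# Hypothetical example: 6 of 300 selected genes fall in a 100-gene pathway
# out of a 20,000-gene background.
p = ora_pvalue(20000, 100, 300, 6)
ratio = enrichment_ratio(20000, 100, 300, 6)
print(p, ratio)
```

In practice, P values from many pathways would then be corrected for multiple testing before pathways are reported as overrepresented.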

In addition, we found that certain pathways can be enriched across specific cell types belonging to different lineages. The inflammatory mediator regulation of the ‘TRP channel’ pathway, for example, was overrepresented in both CD8+ T cells and CD14+ monocytes, with each cell type contributing different gene sets associated with the pathway (Fig. 4d). Other pathways, such as the IL-17 signalling pathway, contain mixtures of shared genes and unique genes and are significantly overrepresented across all cell types (Fig. 4d). This also indicates that although cell types are distinct, they appear to have a set of core pathways in response to spaceflight, with 30–60% of overrepresented pathways shared both within and between lineages (Extended Data Fig. 6b).

To examine the chromatin accessibility and regulatory landscape in each cell population, we then analysed the transcription factor-binding site (TFBS) motif accessibility changes from snATAC-seq data to identify the top motifs in flight and recovery profiles (Extended Data Fig. 7). We first observed that increased gene expression was correlated with more accessibility at the transcription start sites, and closed chromatin was associated with downregulated genes (all P < 0.05, Wilcoxon rank-sum test, for all cell types; Extended Data Fig. 7a). The motif accessibility changes associated with recovery profiles recapitulated the trend in gene expression and chromatin accessibility data, which showed lower accessibility in regions of the genome with closed chromatin and higher expression in regions of more accessibility (Extended Data Fig. 7b). In addition, we showed the top five upregulated and downregulated TFBS motifs per cell type in FP1, RP1 and RP2, which revealed both common and distinct motifs and their accessibility across cell types (Extended Data Fig. 7c–e). These data also provided further evidence of cell-type specificity in the differences of chromatin and transcription factor accessibility dynamics.

Intra-individual spaceflight responses

To understand individual variation during spaceflight, the coefficient of variation was measured across microbial, proteomic, cytokine and gene expression normalized count data, calculated by time interval (pre-flight, in-flight, post-flight and recovery). Normalized coefficient of variation scores (see Methods) were calculated for each body area from the microbial swabs from both metagenomic and metatranscriptomic data. The oral and forearm microbial variation (Extended Data Fig. 8a,b) showed Rothia mucilaginosa and Staphylococcus epidermidis as leading variable strains, but each body site has its own distinct set of species with a high coefficient of variation (other body sites are shown in Extended Data Fig. 8c–j and Supplementary Table 7). Similarly, abundance-standardized coefficient of variation scores were calculated for EVP proteomic, plasma proteomic, metabolomic, RNA-seq, dRNA-seq, and cytokine abundance normalized protein and gene counts (Extended Data Fig. 9a–f). To characterize the participant-to-participant variation and null distributions, we also calculated the differentials for FP1 along with label permutation testing on these calculations to confirm that post-flight coefficient of variation was not higher than other timepoints (see Methods; Supplementary Table 8).
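For orientation, the plain coefficient of variation is the sample standard deviation divided by the mean, computed here per time interval across participants. The sketch below shows only this basic quantity; the normalization and abundance standardization described in the Methods are not reproduced, and the feature values are invented for illustration.

```python
# Illustrative sketch: plain coefficient of variation (CV) per time interval.
# The study's normalized / abundance-standardized scores are NOT reproduced
# here; the values below are invented.
from statistics import mean, stdev
from typing import Dict, Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Sample standard deviation divided by the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / m


# Hypothetical normalized abundance of one feature in four participants.
toy_feature: Dict[str, Sequence[float]] = {
    "pre-flight": [10.0, 11.0, 9.0, 10.0],
    "in-flight": [12.0, 8.0, 15.0, 9.0],
    "post-flight": [14.0, 9.0, 16.0, 7.0],
    "recovery": [10.0, 10.5, 9.5, 10.0],
}

cv_by_interval = {
    interval: coefficient_of_variation(vals)
    for interval, vals in toy_feature.items()
}
for interval, cv in cv_by_interval.items():
    print(f"{interval}: {cv:.3f}")
```

Because the CV is scale-free, it allows variation to be compared across features measured in different units, which is presumably why it was chosen for a cross-assay comparison.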

We then utilized the data across missions (I4, Twins Study and JAXA), assays (scRNA-seq, cfRNA, bulk RNA-seq and proteomics) and cell types (CD4, CD8, CD14 and CD16) to find the most recurrent pathways associated with spaceflight. The most enriched pathways were antigen binding, haemoglobin, cytokine signalling and immune activation, which confirmed the DEG-related signatures and also validated the signatures from our I4 mission (Fig. 5). Moreover, the NASA Twins Study data and cfRNA profiles from the JAXA study confirmed many of the top-ranked pathways, while also showing differences between bulk blood RNA and protein markers and purified cell populations. These data indicate that leveraging both bulk and sorted cell populations can help to clarify signals coming from crew blood dynamics related to spaceflight.

Fig. 5: Pathway enrichment of most variable genes.

Enriched pathways in post-flight compared with pre-flight across various assays and missions, analysed using fast gene set enrichment analysis (fGSEA). The colour represents the normalized enrichment score, whereas the dot size indicates Benjamini–Hochberg adjusted q values. Only the pathways with unadjusted P < 0.01 are shown. The barplot shows the total number of comparisons with q < 0.05 for every pathway, coloured by the direction of the enrichment. The column ‘mixed cell type’ refers to whole blood for I4 data and lymphocyte-depleted cells for the NASA Twins Study. GOBP, gene ontology biological process; GOMF, gene ontology molecular function; HP, human phenotype; LPS, lipopolysaccharides; NPC, neural progenitor cells; PID, pathway interaction database; WP, wikipathways.
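The Benjamini–Hochberg adjustment mentioned in this legend converts a set of P values into q values by scaling each P value by the number of tests over its rank and then enforcing monotonicity. A minimal sketch of the standard procedure follows; it is a generic implementation with invented input values, not the code used to produce the figure.

```python
# Illustrative sketch of the Benjamini-Hochberg procedure.
# Input P values are invented for demonstration.
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return BH-adjusted q values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so each q value is the minimum of
    # p * m / rank over this rank and all higher ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


toy_pvalues = [0.01, 0.04, 0.03, 0.005]
toy_qvalues = benjamini_hochberg(toy_pvalues)
print([round(q, 4) for q in toy_qvalues])
```

The monotonicity step is the reason two tests with different P values can receive the same q value.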

Access to datasets and crew samples

This study introduces full accessibility of astronaut data to the scientific community. Datasets are accessible through an online web portal and scientific data repositories, as well as controlled access to more sensitive data (for example, genetic sequence data) and physical specimens. The online web portal splits the data into three data browsers: the SOMA browser, the I4 single-cell browser and the microbiome browser (Extended Data Fig. 10a–c). The SOMA browser (https://soma.weill.cornell.edu/apps/SOMA_Browser/) enables visualization of gene expression (bulk RNA-seq, snRNA-seq, cfRNA-seq and spatially resolved transcriptomics), mass spectrometry (plasma proteomics, plasma metabolomics and plasma EVP proteomics) and microbial (metagenomic and metatranscriptomic) data. Gene expression and protein abundance are visualized for each astronaut by timepoint, with fold-change values, statistical significance (q value) and summary tables. Any selected gene is also then plotted across data for other missions, including the JAXA CFE study, the NASA Twins Study, mouse spaceflight datasets from NASA OSDR and Genelab, and control cohorts. For additional granularity, the single-cell browser (https://soma.weill.cornell.edu/apps/I4_Multiome/) provides visualizations specific to single-cell gene expression (scRNA-seq) and chromatin accessibility (scATAC-seq) data, and includes quality metrics, cell-type annotations, gene co-expression and chromatin accessibility magnitude estimates. Finally, the microbiome browser contains metagenomic and metatranscriptomic boxplots (https://soma.weill.cornell.edu/apps/I4_Microbiome/) from each timepoint of the study, spanning eight skin locations, deltoid swabs collected pre-skin biopsy, swabs of the SpaceX Dragon capsule and stool samples.

The remaining biospecimens from this study have been preserved and catalogued for continued use by the scientific community (https://cambank.weill.cornell.edu/). These samples include venous blood plasma, venous blood serum, viably frozen PBMCs, vacutainer red blood cell pellets, urine (both crude and with nucleic acid preservative), extracted saliva nucleic acids (DNA and RNA), extracted whole-blood total RNA, extracted skin swab nucleic acids (DNA and RNA) and extracted stool nucleic acids (DNA and RNA). A subset of these samples will be available for additional assays and hypothesis testing by other groups, and the remainder are allocated for long-term biobanking. These data and specimen resources for astronauts can help enable larger cohorts for increased statistical power and also for new biomedical technologies that will emerge in the future.

Discussion

Overall, these data represent a comprehensive clinical and multi-omic resource from commercial and non-commercial astronaut cohorts, creating, to our knowledge, the first-ever aerospace medicine biobank, while providing a platform for private citizens to contribute to future astronaut biomedical studies. In addition, we have demonstrated that short-duration, high-elevation (585 km) spaceflight results in broad-ranging molecular changes, some of which mirror those observed during longer-duration spaceflight, including elevated cytokines, telomere elongation and gene expression changes for immune activation, DNA damage response and oxidative stress. Although more than 95% of markers return to baseline in the months after the mission ended, some proteins, genes and cytokines appear to be activated only in the recovery period after spaceflight and persist post-flight for at least 3 months.

These results collectively indicate a dynamic recovery profile that substantially reverses the direction of differential gene expression in multiple key biological pathways from the post-flight timepoint (R+1) and afterward. This suggests that re-adaptation to Earth activates a range of restorative mechanisms that help the body to recover, at least in part, from the physiological stress imposed by exposure to the space environment. The systematic analysis of the molecular and cellular changes observed post-flight affords us a unique opportunity to capture naturally occurring health-restoring mechanisms, which can be used for therapeutic target discovery. Furthermore, we observed a nuanced regulatory landscape, in which enriched recovery pathways are both unique to individual cell types and span cell types in unique combinations. We have carefully indexed various profiles for flight, recovery and longitudinal analysis that are annotated in processed data files available in the NASA OSDR. The RP1 profile indicates that we need more frequent sampling in the 1–2 months directly after flight to untangle the gene and pathway responses during re-adaptation from spaceflight. This will be especially true during longer missions, in which re-adaptation will probably be more intense.

Of note, these data are, to our knowledge, the first-ever joint single-nucleus chromatin profiles (RNA and ATAC) for astronauts, and they also leverage new methods that can track gene expression and epigenetic changes within the same cells. This single-cell, dual-measurement assay provides new data on the molecular changes and regulatory response to spaceflight (for example, chromatin and TFBS accessibility), and the data revealed distinct levels of stress and adaptation by each cell type. Specifically, the T cell and monocyte cell populations (CD14 and CD16) had the largest changes in expression and response of any cell type. The differences between PBMC subpopulations also suggest that single-cell sequencing can be helpful for delineating unique cell-type responses in future studies. Indeed, although the immune and haematopoietic systems both show thousands of transient changes at the gene expression level, the chromatin architecture is distinctly disrupted, in both scale and duration, in these CD14+ and CD16+ monocyte populations. These cell types were also found to be disrupted in the NASA Twins Study7,23, and thus represent key cell types to be studied for future missions.

The cfRNA, dRNA and spatial RNA profiles revealed unique profiles that differed based on recovery and longitudinal analysis, suggesting that the multi-omic footprint of spaceflight is much wider than previously observed. Although some focused studies of cfRNA have shown that it can be used to detect mitochondrial increase in blood related to spaceflight7,8,9,24, to our knowledge, it had never been applied as a ‘full-body molecular scan’ to detect differential tissue and cell stress. Of note, the SOMA resource enables comparisons of the cfRNAs and cell-encapsulated RNAs, as well as the exosomal fractions, all within the same framework, which is essential for delineating and ranking the cells and tissues in the body that are the most disrupted by spaceflight. Similar to the utility of single-cell data23, the granularity of seeing cell lysis from all tissues across the body in one assay makes cfRNA profiling an ideal addition to some of the proposed standard measures for spaceflight monitoring. So far, cfRNA data indicate that each part of the body may show its own transcriptional response, shedding rate or RNA excretion rates, and thus each tissue should be examined on its own and then compared with other sites and assays. Similarly, in the microbiome data, the taxonomic classifications yielded differences in variation by body site, timepoint and crew member (Extended Data Fig. 8), indicating the need for multi-site and multi-omic sampling for ideal understanding of microbiome changes associated with spaceflight.

However, this study is not without limitations. Although cell-specific comparisons could be made between the I4 mission and previous studies, the assays and collection protocols were slightly different (for example, column purification versus droplet-sorted cells), and thus comparisons will be imperfect. In addition, although the same tube types and methods were used whenever possible, cross-mission comparisons will inevitably include noise from various other types of technical variation, including batches of library preparation and extraction kits, slightly different collection intervals (L−3 versus L−10) and different sequencing or profiling technologies (for example, Illumina versus Ultima or liquid chromatography–tandem mass spectrometry versus NULISASeq). Finally, as our comparisons to other NASA and JAXA datasets span, at most, 64 astronauts, these data are not sufficient to guide medical interventions or inference of mechanisms. As such, these SOMA data and resources should be viewed as preliminary molecular maps of the response of the human body to spaceflight.

Future directions

Nonetheless, the data from the I4 mission will be an invaluable resource for future studies. Deep analyses of secretome profiling25, single-nucleus multiome26, viral activation and ecological restructuring27, skin spatial transcriptomic profiling28, epitranscriptomic profiling22, and genome integrity and clonal haematopoiesis29 have already leveraged this resource. In future missions, additional biomedical profiling can help to delineate the short-term and long-term health effects of spaceflight, including changes in telomere length dynamics, DNA methylation and non-coding RNAs, as well as in additional sample types, such as hair follicles, tears, sperm and other biospecimens. Indeed, the remaining samples from the I4 mission have been biobanked for just this reason and to help the scientific community tackle future objectives12. In addition, aliquots of DNA, RNA, protein, serum, urine and stool from the protocols performed in this paper are covered by consent for release and are available on request (https://cambank.weill.cornell.edu/); researchers can then append new results to this extensible SOMA repository.

Differences between the biomedical and cellular responses of crew members may be caused by several factors, including inter-individual genetic differences (Fig. 5), the duration of the mission, the higher flight elevation (585 km), the unique environment of the SpaceX Dragon capsule, or a combination of these and other factors. To address these hypotheses, other molecular assays can be informative for future missions, including other epigenetics and chromatin conformation assays. For example, we have investigated using the method cleavage under targets and release using nuclease (Cut&Run) to profile histone modifications using 5,000 and 10,000 T cells as assay input (Supplementary Fig. 3a), down from the 10 million cells recommended in the original protocol30. T cells were collected from C002 during the R+194 recovery timepoint, but showed high variance at lower input. Nonetheless, the combination of current modifications profiled can be used to annotate active enhancer regions in the genome (Supplementary Fig. 3b), which is a novel profile for astronauts, and opens the door to better understanding of the gene-regulatory changes induced by spaceflight for each crew member.

Molecular changes described here can also help to guide research and countermeasure development, but are only the start of the process of mitigating risks, especially as year-long and multi-year space habitat missions will represent greater biomedical challenges. To aid in this effort, the SOMA resource will continue to expand as further samples are sequenced from the I4 mission11,12 and samples are collected from future missions that travel farther into space, and then compared with other longitudinal, multi-omics cohorts31. Biospecimen samples have been collected and processed from Polaris and Axiom crews, and this Atlas also represents an open call for research participation for astronauts from any commercial or governmental programs.

Finally, although the multi-omic data and resources from the I4 crew represent the largest release of data from astronauts to date, the I4 crew is still a small cohort, and represents only the first step towards resolving the many hazards and needs for long-term missions32, building towards enough statistical power and contextualization of normal human biological variation33,34. Fortunately, one of the crew members for I4 will continue to donate to the SOMA Biobank for multiple missions, including the Polaris Dawn mission, and data and samples from additional crews (for example, long-term Twins Study follow-up, Axiom Saudi, JAXA and Malta missions) are now being collected and integrated. These cross-mission datasets create a unique opportunity for long-term, in-depth analysis of the effect of spaceflight on the human body. Such data are especially important as missions travel farther away from Earth35 and for longer periods of time, because the data and biomedical discoveries can help to prepare commercial and state-sponsored agencies for lunar, Mars and exploration-class missions.

Methods

Institutional Review Board statement

All participants consented at an informed consent briefing at SpaceX, and samples were collected and processed under the approval of the Institutional Review Board at Weill Cornell Medicine, under protocol 21-05023569. All crew members consented to data and sample sharing.

I4 data compendium data generation

Full methodology for sample preparation, nucleic acid/protein extraction, sequencing, mass spectrometry and analysis is reported in Supplementary Note 1. Sample collection has been previously reported12.

I4 versus NASA Twins Study cytokine analysis

Differential abundance analyses of cytokines and other analytes were conducted using a mixed-effects model. This model included fixed effects for time, treated as a categorical variable with levels corresponding to preflight (L−92, L−44 and L−3), immediate return (R+1) and recovery periods (R+45, R+82 and R+194). In addition, participant-specific effects were incorporated as random effects in the model. The P values for the coefficients obtained from this model were adjusted for multiple comparisons using the Benjamini–Hochberg procedure.
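The multiple-comparison step described above can be illustrated with a minimal sketch of the Benjamini–Hochberg adjustment. This is not the code used in the study: the function name and the example P values are hypothetical, and fitting the mixed-effects model itself is outside the scope of the sketch.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical P values for four model coefficients (illustrative only).
example_p = [0.01, 0.04, 0.03, 0.005]
q_values = benjamini_hochberg(example_p)
```

Each raw P value is scaled by the number of tests divided by its rank, and the running minimum ensures that a smaller raw P value never receives a larger adjusted value than a larger one.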

10x Genomics snRNA-seq negative control data

We acquired cryovials of PBMCs from a healthy man 22 years of age (AllCells) and stored them in vapour-phase liquid nitrogen cryotanks. Two cryovials from this donor were thawed and processed on two different days in the same week, at the same laboratory, using the 10x Genomics demonstrated protocol called Nuclei Isolation for Single Cell Multiome ATAC + Gene Expression Sequencing (CG000365). From each day’s nuclei suspension, ATAC and gene expression libraries were generated (in technical triplicates on the same chip) according to the Chromium Next GEM Single Cell Multiome ATAC + Gene Expression User Guide (CG000338) and sequenced on a NovaSeq 6000 Sequencer. The resulting single-nucleus GEX and single-nucleus ATAC files were aligned using the cellranger-arc pipeline (v2.0.2) from 10x Genomics against the human reference genome hg38. Quality control and cell annotation were performed on the snGEX gene-cell matrices using the R Seurat package (v4.2.0)36. Subpopulations were clustered and labelled using a publicly available Azimuth human PBMC reference37 in conjunction with Seurat’s supervised clustering functionality. DEG analysis for labelled subpopulations was performed using the FindMarkers functionality of Seurat, with a log fold change (logFC) cut-off point of 0.5. P values of the resulting genes were calculated using the Wilcoxon rank-sum test and deemed significant at P < 0.05. Correlation of chromatin accessibility and gene expression changes in FP1 was characterized by summing the accessibility in a promoter window consisting of the transcription start site ± 500 bp for every cell in a given cell type across every gene, normalizing it to counts per million, taking the log2 difference between the different timepoints and matching it to the log2FC from the differential RNA analysis.
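The promoter accessibility comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study: the function names, gene labels and counts are hypothetical; the counts are assumed to be already summed over the transcription start site ± 500 bp window across all cells of one cell type; counts per million is computed here against the total of the promoter counts; and the pseudocount is an added assumption to avoid taking the logarithm of zero.

```python
import math


def promoter_cpm(promoter_counts):
    """Scale summed promoter counts per gene to counts per million."""
    total = sum(promoter_counts.values())
    return {gene: 1e6 * count / total for gene, count in promoter_counts.items()}


def log2_accessibility_change(pre_counts, post_counts, pseudocount=1.0):
    """log2 difference in promoter accessibility between two timepoints."""
    pre = promoter_cpm(pre_counts)
    post = promoter_cpm(post_counts)
    # Only genes measured at both timepoints can be compared.
    return {
        gene: math.log2(post[gene] + pseudocount) - math.log2(pre[gene] + pseudocount)
        for gene in pre
        if gene in post
    }


# Hypothetical summed promoter counts for one cell type (illustrative only).
pre_flight = {"GENE_A": 100, "GENE_B": 100}
post_flight = {"GENE_A": 300, "GENE_B": 100}
accessibility_log2fc = log2_accessibility_change(pre_flight, post_flight)
```

The resulting per-gene values can then be paired with the log2FC from the differential RNA analysis for the same cell type to assess their correlation.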

Recovery profile analysis

DEGs from each cell type from the PBMC 10x Genomics single-cell data were filtered for |logFC| > 0.5 and adjusted P < 0.05 for the FP1, RP1, RP2 and LP2 profiles. An UpSet plot of shared genes was generated using Intervene38. Overrepresented pathways were calculated using WebGestalt39 with the default parameters and the KEGG database. Pathways were included in the analysis if they ranked among the 40 lowest false discovery rate values or had a false discovery rate under 0.05.
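The DEG filter and the gene sharing that underlies the UpSet plot can be illustrated with a short sketch. The record layout, gene labels and values below are hypothetical; in the study, the filtering was applied to Seurat output and the intersections were computed with Intervene.

```python
def filter_degs(records, logfc_cutoff=0.5, padj_cutoff=0.05):
    """Keep genes with |logFC| above the cut-off and adjusted P below it."""
    return [
        row["gene"]
        for row in records
        if abs(row["logFC"]) > logfc_cutoff and row["padj"] < padj_cutoff
    ]


# Hypothetical differential expression results for one cell type.
results_by_profile = {
    "FP1": [
        {"gene": "GENE_A", "logFC": 1.2, "padj": 0.001},
        {"gene": "GENE_B", "logFC": -0.3, "padj": 0.01},
        {"gene": "GENE_C", "logFC": -0.9, "padj": 0.02},
    ],
    "RP1": [
        {"gene": "GENE_A", "logFC": -0.8, "padj": 0.03},
        {"gene": "GENE_C", "logFC": 0.7, "padj": 0.2},
    ],
}

degs_by_profile = {
    profile: set(filter_degs(rows)) for profile, rows in results_by_profile.items()
}
# Genes passing the filter in both profiles (one intersection of an UpSet plot).
shared_fp1_rp1 = degs_by_profile["FP1"] & degs_by_profile["RP1"]
```

In this toy example, GENE_A passes in both profiles with opposite signs, which is the kind of reversal between flight and recovery discussed in the main text.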

Transcription factor motif accessibility analysis

chromVAR40 was used for the analysis of sparse chromatin accessibility from 10x Genomics single-cell data. The FindMarkers function of the Seurat package was used for differential analysis, with mean.fxn set to rowMeans and fc.name = “avg_diff”. The top five differentially accessible transcription factor motifs from each cell type from the PBMC 10x Genomics single-cell data for the FP1, RP1 and RP2 profiles were selected for visualization by heatmaps. Heatmaps were generated using the ComplexHeatmap R package.

Calculation of individual variation

To distinguish the analytes that varied with the greatest magnitude between individual crew members (C001, C002, C003 and C004), a coefficient of variation (CV) calculation was applied across measurements collected for each: metagenomic species, metatranscriptomic species, plasma protein abundance, EVP protein abundance, metabolite abundance, cytokine abundances (Alamar Bio) and gene expression values (Oxford Nanopore dRNA-seq and Ultima RNA-seq). CV calculations were performed using the formula ‘np.std(x, ddof = 0)/np.mean(x)’, where ‘x’ is the array of normalized analyte values and ‘np’ refers to the numpy scientific computing package (v1.26). CV calculations were split into pre-flight (L−92, L−44 and L−3), post-flight (R+1) and recovery (R+45 and R+82) intervals. The metagenomic and metatranscriptomic data also include the in-flight (FD2 and FD3) interval.
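As a minimal worked example of the CV formula, the sketch below uses the Python standard library, in which statistics.pstdev(x)/statistics.fmean(x) is numerically equivalent to ‘np.std(x, ddof = 0)/np.mean(x)’. The abundance values are hypothetical, and the layout of one value per crew member per interval is an assumption made for illustration.

```python
from statistics import fmean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation (ddof = 0) divided by the mean."""
    return pstdev(values) / fmean(values)


# Hypothetical normalized abundances of one analyte, one value per crew
# member (C001-C004) in each interval; the numbers are illustrative only.
analyte_by_interval = {
    "pre-flight": [10.0, 12.0, 8.0, 10.0],
    "post-flight": [10.0, 16.0, 4.0, 10.0],
    "recovery": [10.0, 11.0, 9.0, 10.0],
}

cv_by_interval = {
    interval: coefficient_of_variation(values)
    for interval, values in analyte_by_interval.items()
}
```

Because the mean is the same in all three toy intervals, the differences in CV here reflect only the spread between crew members.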

Before performing the CV calculation, data were first normalized. The ‘limma’ R package (v3.52) was applied to the plasma proteomic, EVP proteomic and metabolite data. DESeq2 (v1.36.0) was applied to the Oxford Nanopore dRNA-seq and Ultima RNA-seq data. For microbiome data, to measure individual contributions per astronaut and for each tissue type (‘armpit’, ‘forearm’, ‘nasal’, ‘oral’, ‘post-auricular’, ‘T-zone’, among others), we averaged MetaPhlAn4, species-level, relative abundance values across our collected sample points. CVs were calculated for individual genes, proteins and microbial organisms.

An abundance standardized CV calculation was also performed to correct for the mean/CV relationship in the assays. First, the normalized abundance values across all the analytes of a given dataset were split into approximately 100 mean-abundance quantiles (M_q, where q is 1–100, or fewer if 100 quantiles would have resulted in fewer than 10 points per quantile group). For each quantile group, the mean and the standard deviation of CV values for the analytes belonging to M_q were calculated (CVmean_q and CVs.d._q, respectively) to prepare the mean/CV distribution reference set across all timepoints and samples. Then, for a given timepoint, the CV calculated for each analyte was matched to the corresponding CVmean_q and CVs.d._q based on cross-referencing to the M_q interval overlap. Finally, the abundance standardized CV was calculated as the z score of the CV as (CV − CVmean_q)/(CVs.d._q). Permutation testing of the post-flight versus pre-flight difference was performed by repeating this procedure 10,000 times on shuffled sample labels and taking the difference each time. Rank and z score of the observed value were calculated against the random permutations to order the genes and used as input to fGSEA for pathway enrichment analysis. Gene Ontology analysis on abundance standardized CVs was performed with Enrichr41,42 using the default settings.
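The quantile-based standardization can be sketched as below. This is a simplified illustration rather than the study code: the function name, bin construction and toy data are hypothetical; the reference distribution is built from the same CV values that are standardized (the study built it across all timepoints and samples and then matched each timepoint to it); and the population standard deviation is assumed within each quantile group.

```python
from statistics import fmean, pstdev


def abundance_standardized_cv(mean_abundance, cv, n_bins=100, min_per_bin=10):
    """z score of each analyte's CV within its mean-abundance quantile group."""
    n = len(cv)
    # Use fewer groups when 100 would leave fewer than min_per_bin analytes each.
    n_bins = max(1, min(n_bins, n // min_per_bin))
    # Analyte indices ordered from lowest to highest mean abundance.
    order = sorted(range(n), key=lambda i: mean_abundance[i])
    z_scores = [0.0] * n
    for b in range(n_bins):
        members = order[b * n // n_bins:(b + 1) * n // n_bins]
        group_cv = [cv[i] for i in members]
        cv_mean_q = fmean(group_cv)
        cv_sd_q = pstdev(group_cv)
        for i in members:
            # A group with no spread in CV carries no information; use 0.
            z_scores[i] = (cv[i] - cv_mean_q) / cv_sd_q if cv_sd_q > 0 else 0.0
    return z_scores


# Hypothetical data for 20 analytes (illustrative only).
toy_mean_abundance = [float(i) for i in range(20)]
toy_cv = [0.10 + 0.01 * (i % 5) for i in range(20)]
toy_z = abundance_standardized_cv(toy_mean_abundance, toy_cv)
```

With 20 analytes and a minimum of 10 per group, the sketch falls back to two quantile groups, mirroring the rule that fewer than 100 groups are used when the data are too sparse.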

Additional methods and details

In-depth methods and protocol information for all assays in Fig. 1a is located in Supplementary Note 1.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.