Introduction

Mammalian spermatozoa manifest a unique model for genome packaging in the nucleus. During spermatogenesis, round spermatids undergo substantial morphological changes via the loss of most of the cytoplasm and the formation of a highly compact nuclear head with a species-specific shape. For example, human sperm has an oval head like a paddle, while mouse and rat sperm have falciform heads. To fit into this highly compressed nucleus volume, sperm genomes have a compaction ratio about 6-fold greater than that of somatic genomes1. This compaction is achieved by the replacement of canonical DNA packaging proteins with sperm-specific ones known as protamines, which is believed to further organize chromosomes into higher-order spatial architectures2. Although this chromatin condensation is considered essential for sperm functions, our understanding of the sperm nuclear organization is still limited.

A handful of studies using high-throughput chromatin conformation capture assays have revealed that sperm chromosomes can be partitioned into higher-order structures, such as A/B compartments, topologically associating domains (TADs) and chromatin loops, similar to what has been observed in somatic cells3,4,5,6,7,8,9. However, recent studies have suggested that these results might be compromised by somatic cell-free DNA contamination introduced during bulk sample collection10,11,12,13. This problem could potentially be resolved by analyzing individual sperm using single-cell Hi-C (scHi-C) technologies, which have emerged as an effective tool to study cell-type-specific chromatin organization during the past few years. One additional advantage of scHi-C is the direct reconstruction of whole-genome structures, which allows visualization and analysis of chromosome conformation in 3D nuclear space, revealing features hidden in bulk measurements such as chromosome size, radial positioning, and chromatin intermingling. Nevertheless, it is challenging to obtain sufficient DNA contacts for successful 3D genome reconstruction from a single sperm cell, partially because the highly condensed sperm chromatin is poorly accessible to Hi-C reaction enzymes. In addition, whether this approach could be applied to reconstruct the morphologically distinctive sperm nucleus remains to be explored.

Here, we developed an improved scHi-C protocol which greatly increased the number of contacts yielded from single sperm. Leveraging this approach, we analyzed thousands of sperm cells from mouse and human samples. The high number of DNA contacts allowed us to reconstruct single-sperm 3D whole-genome structures that faithfully reflected typical morphologies for both mouse and human sperm. Our single-sperm data thus accurately depicted the 3D genome architecture of bona fide sperm nuclei, which exhibited distinctive higher-order chromatin organization in comparison with somatic cells.

Results

Reconstruction of sperm genome structures through optimized single-cell Hi-C

Our initial test on mouse sperm with standard single-cell Hi-C procedures produced only ~2000 DNA contacts per cell, which was not sufficient for reliable reconstruction of genome structures. We reasoned that this was mainly due to the incomplete decondensation of sperm chromatin after formaldehyde fixation, which protected genomic DNA from subsequent enzymatic reactions. We therefore tested various decondensation conditions and found that treatment of sperm samples with dithiothreitol (DTT), urea, and heparin significantly increased the number of contacts by more than 50-fold (Fig. 1a and Supplementary Fig. 1a). We did not observe any significant bias toward accessible nuclear regions after treatment (Supplementary Fig. 1b). To validate whether this protocol could recapitulate higher-order chromatin structures matching standard Hi-C, we applied it to sequence a bulk sample of mouse embryonic stem cells (mESCs), obtaining a total of ~120 million DNA contacts. The resulting DNA contact map and patterns of A/B compartments and TADs were all highly concordant with previously published data (Supplementary Fig. 1c–f)14. We further sequenced single mESCs and successfully reconstructed 3D genome structures based on DNA contact maps, which exhibited well-known chromatin architectures such as chromosome territories, A/B compartments, and radial preference (Supplementary Fig. 1g–i)15,16. These results suggested that our modified protocol did not disrupt higher-order chromatin organization and produced highly reliable chromatin interaction data.

Fig. 1: 3D genome structures of single mouse and human sperm.

a Schematic of the chromatin conformation capture protocol used to generate a single-cell contact matrix. b, c 3D genome structures of representative mouse (b) and human (c) sperm nucleus colored according to chromosomes. The same structures after rotation by 90° are shown beside. Each particle represents 20 kb of chromatin. d, e Superimpositions of aligned mouse (d) and human (e) sperm structure silhouettes. Each single-cell silhouette is generated by the projection of a sperm 3D structure to a 2D plane and the selection of the outermost layer of pixels with a side length of 2 particle radii. f, g Serial cross sections of mouse (f) and human (g) sperm structures. The upper panels show examples of single nucleus cross sections. The lower panels show the average density maps aggregated by aligned single nuclei.

We next attempted to reconstruct 3D genome structures using single sperm data. We sequenced a total of 1000 sperm from two C57BL/6J mice, and 1042 human sperm from two different donors. After filtering out cells with low data quality, we used 1969 cells to calculate single-cell 3D genome structures, among which 1756 cells yielded structures at 20 kb resolution (defined as root-mean-square deviation < 2 particle radii, Supplementary Data 1). Figure 1b and c show two representative cells from a mouse and a human, respectively. Remarkably, the reconstructed genome structures clearly delineated canonical nuclear morphologies of both species without using any prior information (Supplementary Fig. 2a). The mouse sperm structure had a distinct apical hook near the acrosome pole and a shallow depression of the nucleus in the caudal pole where the sperm tail was linked to the head by the connecting piece. In contrast, the human sperm structure exhibited a smooth oval shape in the front view and a pear shape in the side view.
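The resolution criterion above (root-mean-square deviation < 2 particle radii) can be illustrated with a minimal sketch. It assumes that each cell has several replicate reconstructions already superimposed in a common frame, that coordinates are expressed in units of particle radii, and that the median pairwise RMSD is the summary statistic; these choices and all function names are illustrative assumptions, not the pipeline used in this study.

```python
import math
from itertools import combinations
from statistics import median


def rmsd(a, b):
    """Root-mean-square deviation between two equally sized lists of (x, y, z) points."""
    if len(a) != len(b):
        raise ValueError("structures must contain the same particles")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(squared / len(a))


def passes_resolution(replicates, max_rmsd=2.0):
    """True if the median pairwise RMSD among replicate structures is below max_rmsd.

    Assumption: replicates are pre-aligned and measured in particle radii.
    """
    values = [rmsd(a, b) for a, b in combinations(replicates, 2)]
    return median(values) < max_rmsd
```

For example, two replicates that differ by a uniform shift of one particle radius have an RMSD of 1 and would pass, whereas a shift of three radii would not.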

To align different single cells to the same 3D Cartesian coordinate system, we fit each reconstructed structure to an oriented bounding box to find their head-tail, dorsal-ventral, and left-right axes (Supplementary Fig. 2b, c). Based on the reconstructed structures, we quantified the morphological heterogeneity of sperm cells by calculating the variation in sperm structure for both mouse and human samples. This analysis revealed slightly higher structural variability in human sperm reconstructions (Supplementary Fig. 2d). Superimposition of single-cell structures faithfully reproduced the morphological differences between mouse and human sperm nuclei (Fig. 1d, e), suggesting that the shape of the reconstructed nuclei was common to the majority of samples. Notably, randomly merging 5% of DNA interactions from other cells resulted in structures showing a substantial violation of sperm morphologies (Supplementary Fig. 3), suggesting that data generated using the single-cell strategy is free of major contamination. Consistent with previously reported imaging results, most mouse sperm had a single large chromocenter in a relatively fixed position, while human sperm had multiple chromocenters randomly positioned within a larger area (Fig. 1f, g and Supplementary Fig. 4)17,18. Together, these results suggested that single-cell Hi-C data faithfully depicted sperm genome structures, which prompted us to further investigate the higher-order organization of sperm genomes in 3D space.
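One simple way to approximate the axis-finding step is principal component analysis of the particle coordinates. The sketch below recovers only the longest axis of a structure (used here as a stand-in for the head-tail axis) by power iteration on the 3 × 3 covariance matrix. It is a simplification of an oriented bounding box fit under stated assumptions, and all names are hypothetical.

```python
import math


def centroid(points):
    """Mean (x, y, z) position of a list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def covariance(points):
    """3 x 3 covariance matrix of the coordinates."""
    c = centroid(points)
    n = len(points)
    cov = [[0.0] * 3 for _ in range(3)]
    for p in points:
        d = [p[i] - c[i] for i in range(3)]
        for i in range(3):
            for j in range(3):
                cov[i][j] += d[i] * d[j] / n
    return cov


def longest_axis(points, iterations=200):
    """Unit vector along the direction of greatest variance (power iteration)."""
    cov = covariance(points)
    s = 1.0 / math.sqrt(3.0)
    v = [s, s, s]  # arbitrary unit start vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate structure: all points coincide
            break
        v = [x / norm for x in w]
    return v
```

The remaining two axes could be obtained in the same way after projecting out the first axis. The sign of each axis (for example, which end is the head) is not determined by this procedure and would need an additional rule.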

Radial arrangement of sperm chromosomes

Previous studies have found that chromosomes are organized into territories with non-random positioning in somatic cells, as revealed by both Hi-C and imaging-based technologies such as chromosome painting19,20. Reconstructed sperm structures revealed that chromosomes also occupied distinct territories in the nuclei (Fig. 2a, b), but the extent of chromosome intermingling was higher than in somatic cells (Supplementary Fig. 5). We found that the intermingling of sperm chromosomes was influenced by both GC content and radial position (Supplementary Fig. 7c, d), similar to the pattern observed in somatic cells16. To examine the nuclear positioning of sperm chromosomes, we calculated the frequency of particles from a specific chromosome located inside each voxel of the 3D structure and plotted the average spatial distribution of each chromosome across all cells (Fig. 2c, d and Supplementary Fig. 6). We immediately noticed that although autosomes showed different extents of radial preferences within the nucleus, sex chromosomes in sperm were exclusively located in the nuclear center. This unusual pattern most likely reflected the post-meiotic sex chromatin (PMSC), which is known as a silent compartment formed by sex chromosomes alongside chromocenters during meiosis and persists into mature sperm21. We found that sperm autosomes tended to have an elongated rod-like shape, while PMSC clearly exhibited a more compact and rounded configuration (Supplementary Fig. 7a). In comparison with other chromosomes similarly located in the nuclear center, such as chromosome 19 in human sperm, PMSC had a much lower level of chromosome intermingling (Supplementary Fig. 7b), suggesting that they occupied a nuclear domain spatially separated from other chromosomes.
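The voxel-based occupancy described above can be sketched as follows. This is one possible reading, assuming that structures have already been aligned to a shared coordinate system and that the quantity of interest is the fraction of cells in which a voxel contains at least one particle of the chosen chromosome; the voxel size and data layout are hypothetical.

```python
from collections import Counter


def voxel_of(point, voxel_size):
    """Map an (x, y, z) coordinate to integer voxel indices."""
    return tuple(int(c // voxel_size) for c in point)


def chromosome_occupancy(cells, chrom, voxel_size=1.0):
    """Fraction of cells in which each voxel holds at least one particle of `chrom`.

    `cells` is a list of structures; each structure is a list of
    (chromosome_name, (x, y, z)) tuples in a shared, pre-aligned frame.
    """
    counts = Counter()
    for structure in cells:
        occupied = {voxel_of(xyz, voxel_size)
                    for name, xyz in structure if name == chrom}
        counts.update(occupied)  # each voxel counted once per cell
    return {voxel: n / len(cells) for voxel, n in counts.items()}
```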

Fig. 2: Spatial distribution of chromosomes in mouse and human sperm nucleus.

a, b 3D genome structures of mouse (a) and human (b) sperm with expanded views of separate chromosomes. c, d Cross sections showing the average spatial distribution of chromosomes in mouse (c) and human (d) nuclei. Pixels containing particles from fewer than 500 (mouse) or 700 (human) cells are discarded for visualization. Color intensity is adjusted for each panel to account for the content of different chromosomes. e, f Average spatial distribution of centromeres and telomeres for mouse (e) and human (f) sperm, shown as in (c, d). Panels on the right show the differences between the distributions of telomeres and centromeres.

We further quantified the radial position across the whole genome by calculating the distance of each 1-Mb particle to the nuclear center of mass. Despite the asymmetrical shape of sperm nuclei, radial positions of sperm autosomes were generally similar to those of round somatic cells, and inversely correlated with the GC content (Supplementary Fig. 8a–c). Human sperm showed a higher Pearson’s correlation (r = 0.78) than mouse sperm (r = 0.57) with their somatic counterparts. Notable differences were found near centromere and telomere regions, especially between mouse sperm and mESCs (Supplementary Fig. 8d). While many mESCs tended to prefer a Rabl configuration (centromeres clustering on one side of the nuclear envelope and telomeres on the other), this was rarely seen in mouse sperm, where centromeres and telomeres moved toward the nuclear center (Supplementary Fig. 9a–c). On average, centromeres in mouse sperm were more enriched in the nuclear center than telomeres, but in single cells, a centromere could also be seen at the nuclear periphery (Fig. 2e). In contrast, human sperm tended to exhibit the opposite configuration, with centromeres showing a higher chance of being located around the nuclear periphery relative to telomeres (Fig. 2f). This configuration was also consistently seen in human somatic cells (Supplementary Fig. 9d, e). Taken together, these findings indicate that the radial arrangement of sperm chromosomes is largely conserved during male gametogenesis. Notable exceptions include the relocation of sex chromosomes from the nuclear periphery to the nuclear center during prophase I of meiosis, as well as the repositioning of centromeres and telomeres, which may be associated with the formation of sperm chromocenters.
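The radial-position measurement and its correlation with GC content can be sketched as below. The example assumes one structure given as a list of particle coordinates with a matching list of per-particle GC fractions; the function names and toy values are hypothetical and serve only to make the calculation concrete.

```python
import math


def center_of_mass(points):
    """Unweighted mean position of all particles."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def radial_distances(points):
    """Distance of every particle to the center of mass of the structure."""
    c = center_of_mass(points)
    return [math.dist(p, c) for p in points]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Toy structure in which GC-rich particles sit closer to the center.
particles = [(0, 0, 0), (1, 0, 0), (-1, 0, 0), (3, 0, 0), (-3, 0, 0)]
gc_fraction = [0.6, 0.5, 0.5, 0.3, 0.3]
r = pearson(radial_distances(particles), gc_fraction)  # negative: inverse correlation
```

In practice the radial distances would be averaged over many cells before being compared with GC content or with somatic-cell profiles.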

Polarity dependent heterochromatin strength

Both mouse and human chromatin interaction matrices showed a characteristic “checkerboard” pattern, suggesting that sperm genomes were organized into A/B compartments (Fig. 3a, b). We identified A/B compartments by using Principal Component Analysis. Similar to somatic cells, the eigenvectors were correlated with gene density and the GC content (Supplementary Fig. 10a). However, compared with somatic cells, many fine-scale compartment structures merged together in sperm cells, leading to A/B compartments that were larger in size and fewer in number (Supplementary Fig. 10b). Saddle plots suggested that the strength of compartmentalization was weaker in sperm than in mESC or human GM12878 cells, and mouse sperm had lower saddle scores than human sperm (Supplementary Fig. 10c). For both mouse and human sperm, compartmentalization of chromosome X was weaker than that of autosomes (Supplementary Fig. 10d), again indicating that PMSC had a distinct chromatin state.
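The compartment call can be illustrated with a toy example: compute the Pearson correlation between rows of a contact matrix, take the leading eigenvector (E1), and orient its sign using GC content. The sketch uses plain power iteration and omits the distance normalization (observed/expected) that real analyses apply first; the function names and the GC-based sign convention are assumptions for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(contacts):
    """Pairwise Pearson correlation between rows of a square contact matrix."""
    n = len(contacts)
    return [[pearson(contacts[i], contacts[j]) for j in range(n)] for i in range(n)]


def first_eigenvector(matrix, iterations=100):
    """Leading eigenvector of a symmetric positive semi-definite matrix."""
    n = len(matrix)
    v = [float(i + 1) for i in range(n)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def compartment_e1(contacts, gc_content):
    """E1 oriented so that positive values follow high GC content (compartment A)."""
    e1 = first_eigenvector(correlation_matrix(contacts))
    if pearson(e1, gc_content) < 0:
        e1 = [-x for x in e1]
    return e1
```

On a six-bin checkerboard matrix in which bins 0 to 2 contact each other more often than they contact bins 3 to 5, E1 is positive for the first three bins and negative for the rest when the first three bins are the GC-rich ones.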

Fig. 3: Compartmentalization of mouse and human sperm.

a, b Contact correlation heatmaps showing pairwise correlations between 1-Mb regions along mouse (a) and human (b) chromosomes. Compartments A and B are classified using normalized principal component eigenvectors (E1). E1 values for mESC and GM12878 are also shown for comparison. c, d Left: cross sections of example mouse (c) and human (d) single sperm showing compartmentalization measured by scA/B values. Right: average distribution of compartmentalization shown on a cross section of mouse (c) and human (d) sperm nucleus. e, f Cross sections showing the spatial-dependent fluctuation of compartment strength in mouse (e) and human (f) sperm. The middle 1-Mb region (50 20-kb bins) of each compartment B is selected to generate the plots. For each selected bin, the corresponding particles in different cells are ranked according to their scA/B values. The spatial positions of the particles from the top 100 cells are then plotted as the “strong B particles” (left), and those from the bottom 100 cells are plotted as the “weak B particles” (right).

On reconstructed structures, regions belonging to the same compartments tended to cluster together, suggesting the existence of compartments in single cells (Supplementary Fig. 11a, b). To quantify the single-cell compartmentalization, we calculated the compartment value for each genomic bin as the rank-normalized CpG frequency of all loci that it contacted (scA/B)16. The spatial distribution of sperm scA/B values was similar to that of somatic cells, with the heterochromatic compartment B located mainly around the nuclear periphery and the euchromatic compartment A occupying the inner nuclear regions (Fig. 3c, d). Interestingly, we further found that fluctuations of compartment strength were related to spatial position within the sperm nucleus. Particles with stronger B compartments were not evenly distributed around the nuclear periphery, but enriched at the two poles of the sperm head, especially the caudal pole (Fig. 3e, f and Supplementary Fig. 11c, d). As mature sperm are largely transcriptionally inert22, the variation of compartmentalization might not be related to transcriptional activity. Sperm DNA has been reported to be anchored to the base of the nucleus through a structure named the nuclear annulus23. It is possible that this polarity-dependent heterochromatin strength is associated with chromatin compaction during the elongation of the sperm head.
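A minimal sketch of the scA/B calculation follows, assuming that the contacts of one cell are given as pairs of bin indices and that CpG frequency is available per bin. Rank normalization is implemented here as rank scaled to the interval [0, 1], which is an assumption; the function names are hypothetical.

```python
def rank_normalize(values):
    """Replace each value by its rank scaled to [0, 1] (ties broken by input order).

    Requires at least two values.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for rank, idx in enumerate(order):
        ranks[idx] = rank / (len(values) - 1)
    return ranks


def sc_ab(contacts, cpg_frequency):
    """Single-cell compartment value per bin.

    `contacts` is a list of (bin_i, bin_j) pairs observed in one cell.
    Returns {bin: mean rank-normalized CpG frequency of the bins it contacts}.
    """
    cpg_rank = rank_normalize(cpg_frequency)
    partners = {}
    for i, j in contacts:
        partners.setdefault(i, []).append(j)
        partners.setdefault(j, []).append(i)
    return {b: sum(cpg_rank[p] for p in ps) / len(ps)
            for b, ps in partners.items()}
```

Bins that mostly contact CpG-rich loci receive values near 1 (A-like), and bins that mostly contact CpG-poor loci receive values near 0 (B-like).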

Mouse and human sperm do not contain TADs

TADs are self-interacting domains identified along the diagonal of Hi-C contact maps24. Compared to compartments, TADs have been found to be highly conserved across various cell types or even different species25. However, whether sperm genomes are organized into TADs has been controversial. TADs can be detected in studies using sperm samples obtained via tissue dissection and sperm swim-out, but not in ejaculated human sperm or mouse sperm isolated by fluorescence-activated cell sorting3,4,5,6,7,8,9,26,27. Two recent studies proposed that TADs observed in dissected mouse and monkey samples might result from contamination with cell-free somatic chromatin12,13. Since exogenous DNA contacts severely disrupted the reconstructed sperm genome structures (Supplementary Fig. 3), data derived from our single-cell structures must be free of contamination and thereby provide a bona fide sperm chromatin interaction profile.

We thus used reconstructed genome structures to generate 3D proximity maps, defined as the percentage of cells where a pair of particles were within a specific 3D distance. We found that at the megabase scale where TADs were often found, neither mouse nor human 3D proximity maps showed the characteristic “square” patterns of TAD structures (Fig. 4a). There was also a lack of contact enrichment averaged over all TADs (or loop dots) identified in somatic cells (Fig. 4b and Supplementary Fig. 12a, b). A few hundred insulation boundaries could still be identified in sperm, but the boundary strength was much weaker (Supplementary Fig. 12c, d). Most of these boundaries seemed to result from a switch of A/B compartments (Supplementary Fig. 12e, f). Finally, compared with somatic cells with well-defined TADs, sperm cells were depleted in contacts less than 1 Mb (Supplementary Fig. 12g), which was just within the size range of TADs.
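The 3D proximity map defined above can be sketched as follows. The example assumes that every cell contains the same ordered particles and that coordinates are in units of particle radii, with the 3-radii threshold taken from the Fig. 4 legend; the function name and data layout are hypothetical.

```python
import math


def proximity_map(cells, max_distance=3.0):
    """Percentage of cells in which particles i and j lie within max_distance.

    `cells` is a list of structures; each structure is a list of (x, y, z)
    coordinates for the same ordered set of particles.
    """
    n = len(cells[0])
    counts = [[0] * n for _ in range(n)]
    for structure in cells:
        for i in range(n):
            for j in range(i, n):
                if math.dist(structure[i], structure[j]) < max_distance:
                    counts[i][j] += 1
                    if i != j:
                        counts[j][i] += 1  # keep the matrix symmetric
    return [[100.0 * c / len(cells) for c in row] for row in counts]
```

Unlike a raw contact count, each entry is bounded between 0 and 100 and does not depend on how many ligation products were captured per cell.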

Fig. 4: Mouse and human sperm do not contain TADs.

a 3D proximity maps for mouse (left) and human (right) chromosome, shown as the percentage of cells with 3D distance between two loci less than 3 particle radii. b Heatmaps showing the average interaction frequencies of mESC TADs in mouse sperm (left) and GM12878 TADs in human sperm (right). c, d Pairwise 3D distance maps showing TAD-like domains in mouse (c) and human (d) single sperm. Chromatin structures corresponding to the selected regions are shown below, with different sections painted with different colors. e, f Heatmaps showing single-cell insulation scores in mouse sperm (e) and mESC (f) (n = 90 cells). Each line represents a single cell. The bottom curves show the average insulation scores of single cells. TAD boundaries are marked with circles.

Although TADs were not observed in aggregated sperm data, we found that TAD-like structures still existed in single cells (Fig. 4c, d). We calculated single-sperm insulation scores on pairwise 3D distance matrices derived from reconstructed genome structures. The number and strength of insulation boundaries at the single-cell level were similar between sperm and somatic cells (Supplementary Fig. 13). However, the boundaries in sperm were randomly distributed along the genome and therefore, when averaged together, did not show any preferential positions (Fig. 4e, f). This observation was similar to what has been found in single HCT116 cells after cohesin depletion, which abolished TADs at the population-average level, but did not eliminate TAD-like structures in single cells28.
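A single-cell boundary call on a pairwise 3D distance matrix can be sketched as below. The exact insulation-score definition is not given in this section, so the example uses an assumed variant: for each position, the mean 3D distance between the bins immediately upstream and downstream, with boundaries called at strict local maxima. The window size and all names are hypothetical.

```python
import math


def distance_matrix(structure):
    """Pairwise Euclidean distances between consecutive particles of one chromosome."""
    return [[math.dist(p, q) for q in structure] for p in structure]


def separation_profile(dist, window=2):
    """Mean distance between the `window` bins upstream and downstream of each position.

    Position b refers to the junction between bins b-1 and b; positions too close
    to the chromosome ends are left as None.
    """
    n = len(dist)
    profile = [None] * n
    for b in range(window, n - window + 1):
        up = range(b - window, b)
        down = range(b, b + window)
        total = sum(dist[i][j] for i in up for j in down)
        profile[b] = total / (window * window)
    return profile


def boundaries(profile):
    """Positions where the cross-junction distance is a strict local maximum."""
    calls = []
    for b in range(1, len(profile) - 1):
        left, mid, right = profile[b - 1], profile[b], profile[b + 1]
        if None not in (left, mid, right) and mid > left and mid > right:
            calls.append(b)
    return calls
```

Averaging such profiles over cells gives a flat curve when boundary positions vary randomly from cell to cell, and a peaked curve when they recur at the same loci.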

The absence of ensemble TADs in sperm could be caused by the low binding of CTCF, or suppression of cohesin-mediated loop extrusion by protamines. Alternatively, it is possible that fine-scale chromatin structures in sperm samples were disrupted during the Hi-C procedures. We therefore performed genome architecture mapping (GAM), a ligation-free method for capturing genome-wide chromatin interactions29,30, on human sperm samples. The GAM linkage matrix was highly similar to the Hi-C contact map (Pearson’s correlation r = 0.74; range 0.53 to 0.91 for individual chromosomes excluding chromosome Y, Fig. 5a and Supplementary Fig. 14a). Notably, consistent with the single-cell Hi-C results, the GAM contact matrix also suggested a lack of TAD structure patterns (Fig. 5b and Supplementary Fig. 14b–d). Collectively, these results strongly suggest that mouse and human sperm do not contain TADs.

Fig. 5: GAM independently reproduces sperm chromatin architecture identified by single-cell Hi-C.

a Comparison of the GAM linkage matrix and aggregated single-cell Hi-C contact maps of human sperm. E1 values at 1 Mb resolution calculated from GAM or Hi-C data are shown below. b A 10-Mb zoom-in view of panel (a). Insulation scores (IS) of GAM or Hi-C data are shown below.

X and Y sperm are highly similar in chromosome organization

As haploid cells, each sperm carries either an X or Y chromosome. Recent studies have suggested negligible or no differences between these two types of sperm in their shape, size, mobility, and so on, except for their DNA contents31. Whether this difference in DNA content affects the 3D genome organization remains unclear. We focused our analysis on the homozygous mouse sperm to exclude differences caused by genome sequence polymorphism. At single-cell resolution, we were able to distinguish between X and Y sperm by the ratio of X and Y contacts (Supplementary Fig. 15a, b). We found that X and Y sperm were highly similar in terms of autosomal DNA contact matrices, radial arrangement, compartment strength, and contact decay profiles (Supplementary Fig. 15c–f). Uniform manifold approximation and projection (UMAP) visualization using scA/B values, scHiCluster, or Fast-Higashi embeddings all suggested a uniformly mixed clustering of X and Y sperm cells (Supplementary Fig. 16a). We also tried to identify differential chromatin interactions using our previously developed computational strategy32. Only a limited number of differential interactions were identified, which was not significantly different from results between two randomly grouped cells (Supplementary Fig. 16b). Together, these results suggested that the global genome spatial organization was highly similar between X and Y chromosome-bearing sperm.
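Distinguishing X from Y sperm by the ratio of X and Y contacts can be sketched as follows. The example assumes contacts are given as pairs of chromosome names and uses a hypothetical cutoff of 90% to label a cell; the cutoff, the chromosome naming, and the function name are illustrative assumptions.

```python
def classify_sperm(contacts, min_fraction=0.9):
    """Label one cell as 'X', 'Y' or 'ambiguous' from its sex-chromosome contacts.

    `contacts` is a list of (chrom_a, chrom_b) pairs. A contact counts toward
    chrX (or chrY) if either end falls on that chromosome.
    """
    x = sum(1 for a, b in contacts if "chrX" in (a, b))
    y = sum(1 for a, b in contacts if "chrY" in (a, b))
    total = x + y
    if total == 0:
        return "ambiguous"  # no sex-chromosome contacts captured
    if x / total >= min_fraction:
        return "X"
    if y / total >= min_fraction:
        return "Y"
    return "ambiguous"
```

Cells labeled ambiguous could indicate doublets or mapping noise and would be natural candidates for exclusion before comparing the two groups.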

Discussion

In this study, we developed an optimized single-cell Hi-C assay specifically tailored for mammalian sperm samples. This method enabled us to capture a greater number of chromatin contacts from individual sperm cells and reconstruct high-resolution 3D whole-genome structures. The resulting genome models faithfully reflected the distinctive morphological features of mouse and human sperm nuclei, offering an accurate and detailed representation of sperm chromatin architecture.

The organization of chromosome territories within the highly compact sperm nucleus has long been a subject of interest. Earlier studies using fluorescence in situ hybridization (FISH) revealed segmental distribution patterns of chromosomes within the sperm nucleus. In mice, smaller chromosomes were observed at the base of the nucleus, while larger chromosomes were found in the ventral region33. In humans, chromosomes were found to arrange both radially and longitudinally, with the X chromosome aligned near the top of the sperm head34. However, these findings were constrained by the limited number of analyzed cells, the selection of chromosomes and FISH probes, and z-axis resolution35. In contrast, our reconstructed single-cell 3D genome structures provide a comprehensive and high-resolution view of sperm genome organization, overcoming these limitations. Our data suggest that in both humans and mice, the positioning of sperm autosomes largely mirrors that observed in somatic cells (Supplementary Fig. 8). However, sex chromosomes are uniquely positioned at the center of the sperm nucleus, reflecting the distinct organization of PMSC (Fig. 2c, d). Another critical factor influencing chromosome positioning is the localization of centromeres and telomeres. Earlier studies by Zalensky and collaborators proposed a “hairpin-loop model,” suggesting that centromeres cluster at the nucleus center while telomeres are confined to the periphery36,37. However, 3D genome structures reveal that in single sperm, both centromeres and telomeres can be distributed throughout the nucleus (Supplementary Fig. 9c, e), supporting a refined version of the model proposed recently17.

At a finer scale, the organization of sperm chromosomes into hierarchical structures remains a debated topic. Earlier studies using bulk sperm samples reported that mouse sperm chromatin exhibited A/B compartments and TAD structures, similar to those found in somatic cells3,4,5,6,7,8,9. Using flow-sorted sperm samples, Vara and colleagues confirmed the presence of compartments in mouse sperm but did not detect well-defined TADs27. Similarly, Chen et al. identified A/B compartments in human sperm but noted the absence of TADs26. More recently, Yin et al. attributed inconsistencies among previous studies to contamination by cell-free somatic chromatin introduced during tissue dissection12. They concluded that mature mouse sperm lack evidence of A/B compartments or TADs, though their findings varied depending on whether samples were treated with DNase I. In our study, we addressed potential contamination issues by isolating single sperm cells using FACS, thereby ensuring the high purity of samples. In addition, we avoided treatments, such as DNase I, that could compromise nuclear integrity. Further evidence supporting the absence of contamination in our data is the successful reconstruction of sperm genome structures, which is only possible with uncontaminated single-cell data (Fig. 1b, c and Supplementary Fig. 3). Compared to previously published bulk studies, our aggregated single-sperm contact matrices showed stronger correlations with datasets from Vara et al.27, Chen et al.26 and Yin et al.12 (without DNase treatment) (Supplementary Fig. 17).

Our results indicate that sperm genomes can be partitioned into A/B compartments, although the compartmentalization strength is weaker compared to somatic cells and differs between humans and mice (Fig. 3a, b and Supplementary Fig. 10). Similar to somatic cells, B compartments are predominantly located at the nuclear periphery, while A compartments are more central (Fig. 3c, d). A key distinction in the sperm genome, however, is the absence of TADs (Fig. 4a, b and Supplementary Fig. 12a–c). To rule out artifacts caused by formaldehyde crosslinking, which may be less effective for protamine-rich sperm chromatin, we conducted validation experiments using GAM29,30, which does not require pre-treatments typical of Hi-C experiments. The results from GAM strongly corroborated those obtained from single-cell Hi-C experiments (Fig. 5 and Supplementary Fig. 14). Since it has been found that human sperm lack CTCF expression26, it is plausible that TAD structures are disrupted during the histone-to-protamine exchange. The precise molecular mechanisms underlying this process during spermatogenesis remain to be elucidated. The single-cell Hi-C techniques developed in this study offer an effective approach for investigating the dynamic chromatin remodeling that occurs during spermatogenesis. In addition, these methods hold significant potential for advancing research into male infertility linked to abnormalities in sperm chromatin organization.

Methods

Ethics

All experimental procedures involving animals and human participants were conducted in strict accordance with international ethical standards. Animal research was approved by the Institutional Animal Care and Use Committee (IACUC) of Peking University (BIOPIC-XingD-1), and human research was conducted under the ethical oversight of the Peking University Institutional Review Board (IRB00001052-24079) and the Ethics Committee of Center for Reproductive Medicine, Cheeloo College of Medicine, Shandong University (2024-R22).

Animals

All animal experimental procedures were carried out following the Institutional Animal Care and Use Committee (IACUC) guidelines of Peking University. Two 10–12-week-old C57BL/6J male mice were purchased from Peking University Laboratory Animal Center. All mice were housed in a specific pathogen-free facility at the Laboratory Animal Center of Peking University on a 12 h light/dark cycle with ambient temperature regulated at 20–25 °C and relative humidity maintained at 30–70%. Animals were provided ad libitum access to food and water during the housing period.

Mouse sperm collection and treatment

Mouse sperm cells were collected from the cauda epididymis of C57BL/6J male mice. The cauda epididymis was isolated and washed twice with 1 × PBS, placed into a culture dish, briefly dissected to remove fat tissues, then punctured with a needle and gently squeezed using two forceps. The sperm clot was collected in 4 mL of HTF medium (with BSA) (Nanjing AIBI Bio-Technology, M1135) in a 5 mL tube. Sperm were incubated in HTF for 30 min in a CO2 incubator at 37 °C, and then the upper ~2.45 mL was recovered in a new 5 mL tube for DTT treatment. The sperm were incubated in 2 mM DTT for 15 min in a CO2 incubator at 37 °C. Next, the sperm cell suspension was centrifuged and washed twice with 1 × PBS. The pellet was resuspended in 500 μL 1 × PBS, and the cell number was adjusted to ~2 million.

Human germ cell collection and treatment

The aim and protocols of this study have been reviewed and approved by the Institutional Review Board of Reproductive Medicine, Shandong University, and Peking University. Written informed consent (including agreement for identifier publication) was obtained from both participants, who received no compensation for their involvement. Donors were 30–50-year-old males undergoing assisted reproduction due to female factors. According to the routine semen analysis results provided by Shandong University Affiliated Reproductive Hospital, all semen parameters (including sperm motility, count, morphology, seminal fluid enzymatic analysis, basic endocrine evaluations, genetic testing, and cytogenetic analysis) were within normal ranges. Semen samples were collected via masturbation into a sterile plastic container after 2–7 days of abstinence and then incubated at 37 °C for 30 min to liquefy. Healthy semen was chosen in accordance with the 5th edition of the WHO laboratory manual for the examination and processing of human semen. Density gradient centrifugation was used to select sperm with high motility. Briefly, a two-layer gradient was prepared with 2 mL each of 45% and 90% SpermGrad (Vitrolife Sweden AB, V. Frölunda, Sweden) in 15 mL centrifuge tubes. The semen samples were then transferred to the top of the gradient in individual tubes and centrifuged at 300 × g for 20 min. The resulting supernatant, containing seminal plasma, dead sperm, low-motility sperm, and debris, was discarded, and the pellet was resuspended in 3 mL of sperm rinse, followed by centrifugation at 260 × g for 10 min. The resulting supernatant was again discarded, and the sperm pellet was resuspended in 0.5 mL of G-IVF and washed twice with 1 × PBS. Subsequent steps were similar to the treatment of mouse sperm.

Single sperm nuclei Hi-C library generation

The generation of sperm cell Hi-C libraries was optimized from the canonical single-cell in situ Hi-C protocol. Cells were fixed directly in 500 μL 1 × PBS by adding 33.33 μL 16% formaldehyde (Thermo Fisher, 28906) for 10 min with rotation at room temperature. Then, 50 μL of 2% BSA was added to quench the formaldehyde. The tube was centrifuged at 2500 × g for 5 min at 4 °C. The supernatant was removed, and the pellet was resuspended in 200 μL ice-cold Wash Buffer (0.1 mg/mL BSA (NEB, B9200S), 1 × PBS). The tube was centrifuged at 2500 × g for 5 min at 4 °C. The supernatant was removed, and the pellet was washed with 200 μL sperm decondensation buffer (5 mM HEPES, 10 mM EDTA, 0.2% Igepal CA 630, 5 mM NaCl, 1.2 M Urea, 10 mM DTT) with 20 μL protease inhibitor (Sigma, P8340), and centrifuged at 2500 × g for 5 min at 4 °C. The supernatant was removed, and the pellet was resuspended in 200 μL sperm decondensation buffer with 20 μL protease inhibitor and 1 mg/mL Heparin. The tube was incubated at 42 °C for 1 h and then centrifuged at 2500 × g for 5 min. The supernatant was removed, and the pellet was resuspended in 150–200 μL ice-cold Hi-C lysis buffer (10 mM Tris pH 8.0, 10 mM NaCl, 0.2% Igepal CA 630) with 20 μL protease inhibitor and incubated on ice for 15 min. A 20 μL aliquot, containing approximately 250,000 cells, was transferred to a new tube for the subsequent steps. Then, 22 μL of 1% SDS was added, and the sample was incubated at 65 °C for 10 min for further permeabilization. After incubation, 22 μL of 10% Triton X-100 was added to quench the SDS. The tube was gently vortexed to mix well, avoiding excessive foaming, and then incubated at 37 °C for 15 min. Next, 28 μL digestion buffer (1 × rCutSmart buffer, 100 U MboI restriction enzyme (NEB, R0147M)) was added, and chromatin was digested at 37 °C overnight. The tube was incubated at 65 °C for 20 min to inactivate MboI and then cooled to room temperature.
Then, 92 μL ligation mix buffer (18 μL 10 × T4 DNA ligase buffer (NEB, B0202S), 1 μL 20 mg/mL BSA (NEB, B9200S), 12 μL 1 U/μL T4 DNA ligase (Life Tech, 15224-025) and 61 μL water) was directly added, and the sample was incubated at room temperature for 2 h. After ligation, 800 μL 1 × PBS was added, and about 1 mL of nucleus suspension was filtered through a 40 μm cell strainer. Nuclei were then stained with 15 μL DAPI (VECTASHIELD, H-1200-10) for 5 min at room temperature, then held at 4 °C until sorting. Single nuclei (1n, 1c) were FACS-sorted into individual wells of an empty 96-well plate (Thermo Fisher AB1400L) with a BD FACSAria™ Cell Sorter (65011040) (Supplementary Fig. 18). The whole plate can be stored at −80 °C until genome amplification. Single nucleus lysis and preamplification followed the same homemade protocol as previously published32. Briefly, 4 μL Lysis Buffer (50 mM NaCl, 20 mM Tris pH 8.0, 0.15% Triton X-100, 1 mM EDTA, 25 mM DTT, 0.54 mg/mL Qiagen Protease (QIAGEN 19157)) was added to each well with a multichannel pipette, and the plate was incubated at 50 °C for 3 h and 70 °C for 1 h. After incubation, 36 μL Amplification Mixture (4 μL 10 × ThermoPol buffer, 1.2 μL 10 mM dNTP, 0.6 μL 50 μM GAT5 primer, 0.6 μL 50 μM GAT5-7N primer, 0.3 μL 100 mM MgSO4, 1 μL Deep Vent (exo-) (NEB M0259L) and 28.3 μL water) was added and incubated with the following program: Step 1: 95 °C × 5 min; Step 2: 4 °C × 50 s, 10 °C × 50 s, 20 °C × 50 s, 30 °C × 50 s, 40 °C × 45 s, 50 °C × 45 s, 65 °C × 4 min, 95 °C × 20 s, 58 °C × 20 s and repeat Step 2 for an additional 9 times; Step 3: 95 °C × 1 min; Step 4: 95 °C × 20 s, 58 °C × 30 s, 72 °C × 3 min and repeat Step 4 for an additional 14 times; Step 5: 72 °C × 5 min. The sequence of the GAT5 primer was GTAGGTGTGAGTGATGGTTGAGGTAGT. The sequence of the GAT5-7N primer was GTAGGTGTGAGTGATGGTTGAGGTAGTNNNNNNN. The single-cell preamplification product was diluted with water to 5 ng/μL.
Then, 2 μL of each diluted sample was transposed by the addition of 3 μL Transposition Mixture (1 μL 5 × TTBL, 0.25 μL TTE Mix V50 (Vazyme TD501), 1.75 μL water) and incubated at 55 °C for 10 min. Transposition was stopped by adding 1.25 μL 0.2% SDS, and then the plate was incubated at room temperature for 10 min. Then, 2 μL 5 μM i5 unique dual (UD) index and 2 μL 5 μM i7 unique dual (UD) index were added to each well. Next, the sperm Hi-C library was amplified for 10 cycles of PCR with 9.75 μL KAPA PCR mix (Roche, KK2102) (4 μL 5 × KAPA HiFi GC Buffer, 0.6 μL dNTP Mix (10 mM each), 4 μL KAPA HiFi DNA Polymerase (1 U/μL) and 4.75 μL water). The PCR program was Step 1: 72 °C × 3 min; Step 2: 98 °C × 30 s; Step 3: 98 °C × 15 s, 60 °C × 30 s, 72 °C × 2 min and repeat Step 3 for an additional 9 times; Step 4: 72 °C × 5 min. The barcoded single-cell libraries were then pooled and purified with 0.6 × and 0.15 × VAHTS DNA Clean Beads (Vazyme, N411-02). The final libraries were sequenced with paired-end 150 bp reads on a NovaSeq 6000 (Illumina) platform according to the manufacturer’s instructions.

mESC Hi-C library generation

The mouse embryonic stem cell (mESC) line was derived from blastocysts at embryonic day 3.5 from F1 hybrid mouse embryos (C57BL/6J × CAST/EiJ) and cultured on a 6-well plate coated with 0.1% gelatin in an ES cell medium containing serum. The Hi-C procedures were performed as described above, including the decondensation treatment.

GAM library preparation

The GAM protocol used in this study was similar to that previously described, with slight modifications29,30. Fresh human sperm samples (approximately 5–6 million) were fixed directly in 1 mL 1 × PBS by adding 6.29 μL 16% formaldehyde (final 0.1%) for 10 min with rotation at room temperature and quenched with 100 μL 2% BSA. After centrifugation at 3000 × g for 5 min, the cell pellet was washed with Wash Buffer and used for ultrathin nuclear cryosectioning. Cryosection preparation was performed by a specialist at the Protein Preparation Center. Briefly, the samples were rinsed three times with PBS, each for 5 min. The samples were then embedded in 12% gelatin at 37 °C for 10 min, followed by centrifugation at 10,000 rpm for 1 min. After centrifugation, the samples were pre-cooled on ice to allow the gelatin to solidify. The gelatin-embedded cell clumps were subsequently cut into 0.5 mm³ pieces. These pieces were put into 2.3 M sucrose and incubated in a 4 °C cold room overnight. Cryosections of 150 nm were cut with glass knives using a Leica FC7 ultracut cryotome, collected in sucrose droplets (2.3 M in PBS), and transferred to steel frame PEN (polyethylene naphthalate) membrane slides (Leica) that had been ultraviolet-treated for 45 min prior to use. Slides were washed in 1 × PBS (three times, 5 min each), then with nuclease-free water (three times, 5 min each). Cresyl violet staining was performed with sterile-filtered cresyl violet (1% w/v in water, Sigma-Aldrich, C5042) for 15 min, followed by two washes with water (30 s each) and air drying for at least 15 min. The lysis buffer (1 × single-cell lysis buffer, as outlined in the recipe for Hi-C library generation, 6 mg/mL QIAGEN Protease) was prepared in advance and added to the 0.2 mL PCR tube caps (10 µL each) (Axygen). Up to three nuclear profiles (NPs) were laser microdissected (Leica LMD7000) into each 0.2 mL PCR tube cap and stored at 4 °C before whole genome amplification.

Whole genome amplification of DNA from microdissected nuclear profiles was performed with an in-house protocol. In brief, NPs were lysed directly in the PCR caps for 4 h at 60 °C. After protease inactivation at 75 °C for 30 min, the PCR tubes were turned the right way up and centrifuged at 3000 × g for 5 min to collect the extracted DNA at the bottom of the tube. Then, 10 μL of each lysed sample was transposed by the addition of 20 μL Transposition Mix (15 μL 2 × Transposition buffer (20 mM Tris pH 8.0 and 10 mM MgCl2, each from a 1 M stock, 16% w/v PEG8000), 0.2 μL TTE Mix V50 (diluted 100-fold) (Vazyme TD501), 4.8 μL water) and incubated at 55 °C for 10 min. Transposition was stopped by adding 1.5 μL SDS mix (0.9 μL 200 mM EDTA, 0.6 μL 2% SDS), and then the tube was incubated at 55 °C for 10 min. Then, 1 μL 25 μM i5 unique dual (UD) index and 1 μL 25 μM i7 unique dual (UD) index were added to each tube. Next, the samples were amplified for 26 cycles of PCR with 16.5 μL KAPA PCR mix (Roche, KK2102) (10 μL 5 × KAPA HiFi GC Buffer, 1.5 μL dNTP Mix (10 mM each), 2.5 μL 25 mM MgCl2, 1 μL KAPA HiFi DNA Polymerase (1 U/μL) and 1.5 μL water). The barcoded nuclear profile libraries were then pooled and purified with 0.9 × VAHTS DNA Clean Beads (Vazyme, N411-02). The final libraries were sequenced with paired-end 150 bp reads on a NovaSeq 6000 (Illumina) platform according to the manufacturer’s instructions (Supplementary Data 2).

Published data

Mouse embryonic stem cell (mESC) in situ Hi-C data were downloaded from GSE96107 (ref. 14). GM12878 in situ Hi-C data were downloaded from GSE63525 (ref. 38). GM12878 Dip-C data were downloaded from GSE117876 (ref. 16) and reprocessed from raw data with custom scripts. Mouse sperm Hi-C data were downloaded from GSE240248 (ref. 12), GSE132054 (ref. 27) and GSE119805 (ref. 4, reanalyzed by ref. 7). Human sperm Hi-C data were downloaded from the Genome Sequence Archive with the accession number CRA000108 (ref. 26). Centromere and telomere regions of the human and mouse genomes were downloaded from the UCSC genome annotation database (ftp://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/CytoBandIdeo.txt.gz, https://hgdownload.cse.ucsc.edu/goldenPath/mm10/database/gap.txt.gz).

Bulk Hi-C data analysis

Hi-C data processing. mESC bulk Hi-C reads were processed using the distiller-nf workflow (https://github.com/open2c/distiller-nf). In brief, Hi-C reads were first aligned to the GRCm38 reference genome using the “mem” command of BWA39 (version 0.7.17). Pairtools40 (version 1.0.2) was then used to remove PCR duplicates and extract contacts. Only pairs with both ends having a MAPQ score of ≥ 30 were retained. The resulting contacts file was transformed into a contact matrix and stored in the cool format using cooler (version 0.9.3)41. Iterative correction was applied using the “cooler balance” command to suppress biases in the data.

Similarity of two contact matrices. The Pearson Correlation Coefficient (PCC) and Stratum-adjusted Correlation Coefficient (SCC) were used to measure the similarity between two contact matrices42. For PCC calculations, we used 200 kb resolution contact matrices, excluding contacts from sex chromosomes and contacts over a distance greater than 5 Mb. To compute SCC, we used a Python reimplementation (https://github.com/cmdoret/hicreppy, version 0.1.0) of the original algorithm on 1 Mb resolution contact matrices. The “htrain” subcommand was used to determine the optimal h-value. Then, the “scc” subcommand was used to calculate the SCC of the two contact matrices in cool format using this optimized parameter. Sex chromosomes and contacts with distances greater than 10 Mb were excluded from these calculations.
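The PCC comparison described above can be sketched for a single chromosome as follows. This is a minimal illustration, not the authors' script: the function names are hypothetical, the matrices are plain nested lists, and including the main diagonal is an assumption that the text does not specify.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def matrix_pcc(a, b, bin_size=200_000, max_dist=5_000_000):
    """PCC of two square contact matrices of one chromosome.

    Only upper-triangle bin pairs separated by at most max_dist are compared
    (assumption: the diagonal is kept).
    """
    max_offset = max_dist // bin_size
    n = len(a)
    xs, ys = [], []
    for i in range(n):
        for j in range(i, min(n, i + max_offset + 1)):
            xs.append(a[i][j])
            ys.append(b[i][j])
    return pearson(xs, ys)
```

For a genome-wide value, the selected entries of all autosomes would be concatenated before calling `pearson`, which excludes the sex chromosomes as described.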

Contact decay profiles. The P(s) curve was calculated from the 50 Kb resolution, iteratively corrected contact matrix in cool format using cooltools (version 0.7.0).

Assignment of A/B compartment. We calculated the compartmentalization of bulk Hi-C samples from the 1 Mb corrected contact matrix with the “eigs-cis” subcommand of cooltools43 (version 0.7.0). The contact matrix was first divided by an expected matrix to normalize the distance effect. We then calculated the autocorrelation matrix of this normalized matrix and performed principal component analysis on it. Components were sorted according to the variance they explained. The first component was selected to represent the A/B compartmentalization. The sign of the resulting vector was adjusted according to its correlation with the GC content distribution of the genome. For some chromosomes (for example, chr14 in the bulk mESC sample), the second or third component was chosen because visual inspection revealed better correspondence with the correlation matrix.

TAD identification. The Insulation Score (IS) was used to find and quantify the boundaries of topological-associated domains (TADs). We used the “insulation” subcommand of cooltools (version 0.7.0) to calculate the insulation score and boundary strength of each genomic position in 100 Kb, 200 Kb, and 400 Kb windows separately from the contact matrix in 20 Kb bins. Positions with boundary strength higher than the Li threshold of the overall boundary strength distribution were selected as TAD boundaries.

Loop identification. We used “pyHICCUPS” command provided by hicpeaks (https://github.com/XiaoTaoWang/HiCPeaks, version 0.3.5) to identify loops from 20 Kb contact matrix in cool format.

Single-cell Hi-C data analysis

Single-cell Hi-C data processing. Hi-C reads of mouse (sperm, mESC) and human (sperm) single-cell samples were mapped to the GRCm38 and GRCh38 reference genomes, respectively, using BWA-mem2. Contacts were extracted with the “hickit” package (https://github.com/lh3/hickit, version r291). Contact ends with more than 10 other contact ends within 1 Kb were regarded as promiscuous and removed with custom scripts. Contacts with fewer than 5 other contacts in their 10 Mb L0.5 neighborhood were regarded as isolated contacts and also removed, as in ref. 16. For all cell types, samples with more than 30,000 remaining contacts were kept for downstream analysis. For diploid samples such as mESCs, the haplotype of contacts was phased according to SNP sites and imputed. For each sample, each chromosome was treated as a polymer chain evenly divided into 20 kb sections, with each section treated as one particle of the chain; the contacts of the sample were used as restraints to search for the optimal x, y, z spatial coordinates of the particles with the Force-Directed Graph (FDG) algorithm implemented in “hickit”. Five models with different random initial seeds were generated simultaneously, and the three models with the lowest mean mutual root mean square deviation (RMSD) were selected as candidates. Samples with an RMSD greater than 2 between any two candidate structures were excluded from the structure-based analysis. For samples that passed quality control, we used the average mutual RMSD of the three candidate structures as the overall RMSD and randomly selected one model from the three candidates to represent the 3D genome structure. The results were saved in 3dg format, with each line listing in order the chromosome name, start position, and x, y, and z coordinates, delimited by tab characters. Because too little information supported their placement, the particles with the lowest 6% of internal contacts were discarded.
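The candidate-selection step can be illustrated with a short sketch. It assumes that the pairwise RMSDs between the five replicate models have already been computed (after superposition) and stored in a symmetric matrix. The function name and the interpretation of "lowest mean mutual RMSD" as the best-scoring triple are assumptions for illustration, not the authors' implementation.

```python
from itertools import combinations

def pick_candidates(rmsd, n_keep=3, max_rmsd=2.0):
    """Choose the n_keep models with the lowest mean pairwise RMSD.

    rmsd: symmetric nested list, rmsd[i][j] = RMSD between models i and j.
    Returns (model indices, mean pairwise RMSD, passes_qc), where passes_qc
    is False if any pair of chosen models has RMSD above max_rmsd.
    """
    best = None
    for combo in combinations(range(len(rmsd)), n_keep):
        pairs = list(combinations(combo, 2))
        mean = sum(rmsd[i][j] for i, j in pairs) / len(pairs)
        if best is None or mean < best[1]:
            best = (combo, mean, pairs)
    combo, mean, pairs = best
    passes_qc = all(rmsd[i][j] <= max_rmsd for i, j in pairs)
    return combo, mean, passes_qc
```

The returned mean corresponds to the overall RMSD reported per sample, and one of the returned indices would then be drawn at random to represent the cell.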

Quantifying the intermingling level between chromosome territories. We measured the chromosomal diversity within a three-particle radius vicinity around each 20 Kb particle using the Shannon’s Index, akin to the methodology detailed in ref. 16.
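As a rough illustration of this metric, the sketch below computes a Shannon index from the chromosome labels of all particles within a fixed distance of each particle. It assumes coordinates are expressed in units of particle radii, uses the natural logarithm, counts the central particle itself, and relies on a brute-force neighbor search; all of these are choices made here for brevity rather than details taken from ref. 16.

```python
from collections import Counter
from math import dist, log

def shannon_index(labels):
    """Shannon index H = -sum(p * ln p) over the label frequencies."""
    total = len(labels)
    return -sum((c / total) * log(c / total) for c in Counter(labels).values())

def intermingling(particles, radius=3.0):
    """particles: list of (chrom, x, y, z) tuples.

    Returns one Shannon index per particle, computed from the chromosome
    labels of every particle within `radius` (the particle itself included).
    """
    scores = []
    for _, *p in particles:
        near = [chrom for chrom, *q in particles if dist(p, q) <= radius]
        scores.append(shannon_index(near))
    return scores
```

A particle surrounded only by its own chromosome scores 0, and the value grows as more chromosomes share the neighborhood. For whole-genome structures a spatial index (for example, a k-d tree) would replace the quadratic loop.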

Radius of gyration. We used the radius of gyration of a chromosomal structure as an indicator of its tendency to elongate, with a definition similar to those outlined in ref. 16 and ref. 15. The radius of gyration for each chromosome is a scalar quantity.
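For reference, a minimal sketch of the standard radius of gyration for one chromosome, assuming all 20 Kb particles carry equal mass (the function name is illustrative):

```python
from math import sqrt

def radius_of_gyration(coords):
    """Root mean square distance of particles from their centroid.

    coords: list of (x, y, z) tuples for the particles of one chromosome.
    """
    n = len(coords)
    center = [sum(c[k] for c in coords) / n for k in range(3)]
    mean_sq = sum(
        sum((c[k] - center[k]) ** 2 for k in range(3)) for c in coords
    ) / n
    return sqrt(mean_sq)
```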

Single-cell compartmentalization. We measure the compartmentalization of various regions in each single cell using scAB values defined similarly to those in ref. 16. Due to the sparsity of contacts, the 20 kb resolution scAB values utilized in the analysis are derived from structural calculations, considering particles within a distance of less than 3 particle radii as neighbors. For each single cell, the scAB is a vector with values ranging from 0 to 1.

Tendency to form Rabl conformation and the extent of centromere facing outward. These two metrics characterize the global features of the genome’s 3D structure, with definitions similar to those found in ref. 16. Regions within 5 Mb of centromeres and telomeres are utilized to represent both features.

Pseudobulk analysis. Single-cell contacts in pairs format were first converted into 1 Kb binned cool files using the “cload” command from cooler (version 0.9.3). Subsequently, these were piled up with the “merge” command and scaled to various resolutions via “zoomify”. Pseudobulk datasets for X and Y sperm were synthesized from their respective single-cell pairs files using the same methodology. Compartmentalization, TAD boundaries, and the P(s) curve for both mouse and human sperm were calculated from the pseudobulk cool files with cooltools, following the same procedures as those applied to bulk Hi-C data.

3D proximity map. We constructed 3D proximity maps from single-cell genomic structures using definitions identical to those in ref. 44. This was aimed at overcoming data sparsity when magnifying to visualize interactions within a specific genomic region at 20 Kb resolution.

Average TAD plot. We utilized publicly available mESC and GM12878 bulk Hi-C data to generate respective somatic TAD lists for mouse and human genomes. For each TAD block, we extended equal-length regions both upstream and downstream, then extracted corresponding segments from the 3D proximity maps of sperm. These proximity maps were first converted into observed/expected matrices (akin to the steps for compartment identification); then, paralleling the approach in ref. 45, a unified −0.25 power law with respect to distance was reintroduced to generate new “effective contact probability” matrices. These matrices were scaled to the same size and superimposed.

Single-cell TAD-like domains (scTLDs). We generate single-cell distance matrices by computing the distances between every pair of loci from single-cell genome structures. Subsequently, the distance matrix is inverted to correlate directly with interaction frequency, and then the insulation score for each bin is calculated, yielding a single-cell scIS (single-cell Insulation Score) vector. This calculation employs a window size of 10 bin lengths (200 Kb) for consistency. For the single-cell scIS, local minima are indicative of scTLD borders, and the peak prominence of scIS at these positions serves as a measure of the strength of these borders.

Uniform manifold approximation and projection (UMAP) of X and Y sperm. scAB embedding was performed as in ref. 16 with a custom script, using the scAB values of 1 Mb bins. For scHiCluster46 (version 1.2.24) embedding, a 100 Kb resolution contact matrix with distances within 10 Mb was utilized. Fast-Higashi47 (version 0.1.1) was run following the tutorial guidelines at 500 Kb resolution, with settings “filter = True” and “do_conv = False”.

Differential chromatin interaction (DI) analysis. We performed SimpleDiff32 on 200 Kb resolution reconstructed genome structures of mouse X and Y sperm. To gauge the distribution of DI numbers detected by the SimpleDiff algorithm between randomly assembled groups, we randomly drew 300 samples each from mouse X and Y sperm. From these, we iteratively selected 200 samples from X and Y sperm to constitute the X group and Y group, respectively. Then, we shuffled 50% of the samples between the X group and Y group to form the mix1 group and mix2 group. Finally, we computed the DI numbers between the X group and Y group, as well as between the mix1 group and mix2 group. This process was repeated 1000 times, and the two resulting DI count distributions were compared.
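One iteration of this group construction might look like the sketch below. It reads "shuffled 50%" as exchanging half of each group with the other, which is an assumption; `make_groups` and its parameters are hypothetical names, and the SimpleDiff call itself is left out.

```python
import random

def make_groups(x_pool, y_pool, group_size=200, swap_fraction=0.5, rng=random):
    """Build one set of X, Y, mix1 and mix2 groups.

    x_pool, y_pool: lists of sample identifiers (for example, 300 each).
    mix1 and mix2 are formed by exchanging swap_fraction of the X group
    with the same number of samples from the Y group.
    """
    x_group = rng.sample(x_pool, group_size)  # random order, no repeats
    y_group = rng.sample(y_pool, group_size)
    k = int(group_size * swap_fraction)
    mix1 = x_group[k:] + y_group[:k]
    mix2 = y_group[k:] + x_group[:k]
    return x_group, y_group, mix1, mix2
```

Repeating this 1000 times and counting DIs for X versus Y and for mix1 versus mix2 in each round yields the two distributions that are compared.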

Macromolecule graphics

The 3dg files were converted to mmCIF format with custom scripts and loaded into UCSF ChimeraX48 (version 1.8). Each chromosome was uniquely colored using the “color” command, with identically named chromosomes in human and mouse cells sharing a consistent color scheme. Cartoon representation, a white background, and ambient lighting were used for the rendered output images.

Alignment of 3D genome structures

Unlike spherical cells, human and mouse spermatozoa exhibit pronounced morphological features (for example, the elongated anterior of mouse sperm), implying that an intrinsic coordinate system exists internally for each nucleus (near or far from the head). To incorporate these characteristics in structural analyses, we endeavored to align the 3D genomic structures according to their shape features. This is fundamentally a rather challenging Simultaneous Pose and Correspondence (SPC) problem49, but we can leverage the unique characteristics of our samples to address it with a relatively simpler 2-step approach (Supplementary Fig. 2a).

Principal Axis Search. Notably, human and murine spermatozoa generally present with a slender, plate-like morphology, suggestive of a narrow distribution along the thickness axis. When viewed along this axis, akin to a lateral perspective, it becomes apparent that the sperm are significantly elongated in the direction spanning the acrosomal to the caudal pole, whereas they are notably shorter in the orthogonal direction. Together with the thickness axis initially mentioned, these three principal axes serve to delineate the sperm’s overall spatial arrangement. We adopt anatomical terminology, designating the longest axis as the head-tail (HT) axis, the shortest as the left-right (LR) axis, and the intermediate one as the dorsal-ventral (DV) axis. We found principal axes for every sample by calculating the Oriented Bounding Box (OBB) of their 20 Kb reconstructed structures. Subsequently, we aligned the centroids by subtracting the center of mass coordinates from the positional matrices of the structures and then applied the corresponding transformation matrices to rectify the orientation of the respective axes.

Direction Identification. It bears noting that these principal axes are non-directional. For murine spermatozoa, which possess a single plane of symmetry (perpendicular to the LR axis), aligning the principal axes leaves the sample in one of four orientations: the acrosomal pole facing upwards, downwards, the apical hook pointed to the left, or to the right. In contrast, for human spermatozoa, which exhibit two planes of symmetry (perpendicular to both the DV and LR axes), there are only two possible orientations: the acrosomal pole oriented upwards or downwards. We employed unsupervised learning to discriminate among the various orientations. For murine spermatozoa, we utilized the side view looking along the LR axis as input. The silhouette was extracted and converted into a binary (0,1) image, subsequently embedded using UMAP. From the scatter plot visualization, four clusters of samples became apparent. We then proceeded to cluster using the first three dimensions output by UMAP as inputs to the GMM algorithm. For human spermatozoa, we adopted the side view looking along the DV axis as input, following the identical procedure, which resulted in the segregation of samples into two groups. Flipping the samples to a uniform orientation was achieved by taking the negation of the coordinates corresponding to the respective axis (Supplementary Fig. 2b).

3D superimposition of genome features

The aligned samples were voxelized with a stride of 2 particle radii, assigning each 20 Kb particle to a voxel of 2 × 2 × 2 size based on its coordinates (for simplicity, particles were treated as points during assignment, ignoring their radii). For each sample and each genomic feature, the value of a voxel was represented by the average value of the particles it contained. To ensure the stability of voxel values, those containing fewer than 10 particles were set to missing values. To avoid occlusion when displaying the spatial distribution of genomic features, distributions were mostly shown for a single spatial slice. Slices were typically taken perpendicular to a principal axis (such as the LR axis), with a starting and ending position selected (both must be integer multiples of the stride). Voxels lying between these positions were selected, and for overlapping voxels (those sharing the same HT and DV values), an average was computed.
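The voxel assignment and averaging can be sketched as below. Coordinates are assumed to be in units of particle radii and already aligned; the function name and the use of `None` for missing values are illustrative choices.

```python
from collections import defaultdict
from math import floor

def voxelize(particles, stride=2.0, min_particles=10):
    """Average a per-particle feature inside cubic voxels.

    particles: iterable of (x, y, z, value); particles are treated as points.
    Returns {(i, j, k): mean value, or None if the voxel holds fewer than
    min_particles particles}.
    """
    bins = defaultdict(list)
    for x, y, z, value in particles:
        key = (floor(x / stride), floor(y / stride), floor(z / stride))
        bins[key].append(value)
    return {
        key: (sum(vals) / len(vals) if len(vals) >= min_particles else None)
        for key, vals in bins.items()
    }
```

A slice perpendicular to one axis then amounts to selecting voxel keys whose index along that axis lies between the chosen start and end, and averaging the non-missing values that share the other two indices.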

Nuclear shape. We first assign True to voxels containing particles and False to those without. To measure the extent of a nucleus as observed from one axis (e.g., the LR axis), spread out in two dimensions (HT and DV), instead of setting start and end points, we utilize all voxels, applying a logical OR operation on the values of overlapping voxels (those sharing the same HT and DV values). This generates a binary image representing the presence or “silhouette” of each nucleus. By finding the outer contours of the silhouettes for all nuclei and overlaying them, we can observe the distribution of nuclear shapes. If the silhouettes are simply overlaid, the result shows how many times each pixel is occupied by a nucleus. The frequency of occupation typically exhibits a bimodal distribution, with peaks corresponding to areas densely populated by most nuclei and sparsely populated “background” areas that few nuclei reach. For consistency, we set thresholds of 500 and 700 nuclei to delineate occupied regions from the background in mouse and human sperm, respectively. Due to the bimodal nature of the data, the impact of varying these specific thresholds is minimal.
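A small sketch of the silhouette and occupancy computation follows. It assumes voxel indices are ordered (HT, DV, LR), that the view is along the LR axis, and that a pixel counts as occupied when at least `threshold` nuclei cover it; whether the published threshold is inclusive is not stated, so that comparison is an assumption.

```python
from collections import Counter

def silhouette(occupied_voxels):
    """Project occupied voxels along the LR axis (logical OR over LR).

    occupied_voxels: iterable of (ht, dv, lr) indices of voxels that contain
    at least one particle. Returns the set of (ht, dv) pixels covered.
    """
    return {(ht, dv) for ht, dv, _ in occupied_voxels}

def occupied_region(silhouettes, threshold):
    """Pixels covered by at least `threshold` nuclei when silhouettes are overlaid."""
    counts = Counter()
    for pixels in silhouettes:
        counts.update(pixels)
    return {pixel for pixel, n in counts.items() if n >= threshold}
```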

Spatial density distribution. The spatial distribution of density can depict intracellular structures such as chromocenters and nucleoli. During the reconstruction of genomic structures, particles corresponding to regions of low mappability are removed, so chromocenters manifest as areas of low density. Density is directly represented by the number of particles contained within each voxel. Unlike for other genomic features, all voxel values are deemed valid regardless of the number of particles they contain. When displaying single-cell spatial slices, voxels of 1 × 1 × 1 size are used for aesthetic purposes.

Spatial distribution of chromosomes. For any given chromosome, we use the ratio of particles originating from that chromosome within each voxel as the voxel’s value. For each observation axis, we employ spatial slices ranging from − 10 to 10. We only display pixels falling within the occupied region, with those in the background region set to invalid values. The variation in chromosome lengths leads to significantly different ranges in frequency distributions across various chromosomes. To accommodate these differences, we adjusted the color range for each subplot individually.

Spatial distribution of centromeres and telomeres. We are unable to detect fragments emanating directly from within centromeric and telomeric regions; thus, we represent their locations using genomic regions within 5 Mb proximal to these areas. The value of each voxel is defined by the proportion of particles belonging to these regions out of the total particle count. For each observation axis, spatial slices between − 10 and 10 are utilized, and only pixels falling within the occupied region are displayed.

Spatial distribution of single-cell compartments. Each cell employs a distinct scAB reference value calculated from its structure. For each observation axis, spatial slices ranging from − 2 to 2 are utilized, and only pixels falling within the occupied region are displayed.

Simulated DNA contamination

We randomly selected 50 mouse sperm samples that passed quality control and were amenable to structural reconstruction, pooling their contacts together to create a “contamination pool.” For each sample, we sequentially removed 1, 3, 5, 10, 15, 20, 25, and 50% of contacts at random, replacing them with an equal number of contacts randomly drawn from the contamination pool. We then used these contaminated contacts to reconstruct the structures. To quantify the impact of contamination on the quality of structural reconstruction, we calculated the proportion of contacts within each cell that strongly violated the reconstructed structures15. A contact was considered to strongly violate the structure if the distance between the particles harboring the contact ends in the structure exceeded 4 particle radii. Following this, we applied the same alignment procedure to these structures, directly applying the UMAP embedding parameters obtained from training on normal samples. We observed the distribution of shapes for the contaminated samples by overlaying their contours.
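The contamination and violation steps can be sketched as follows. This is an illustration under stated assumptions: replacement contacts are drawn from the pool with replacement, contacts are represented as pairs of particle keys, and coordinates are in units of particle radii. The function names are hypothetical.

```python
import random
from math import dist

def contaminate(contacts, pool, fraction, rng=random):
    """Replace a random `fraction` of contacts with draws from `pool`."""
    n_swap = round(len(contacts) * fraction)
    kept = rng.sample(contacts, len(contacts) - n_swap)
    added = rng.choices(pool, k=n_swap)  # assumption: drawn with replacement
    return kept + added

def violation_rate(contacts, coords, max_dist=4.0):
    """Fraction of contacts whose two ends lie farther apart than max_dist.

    contacts: list of (particle_a, particle_b) keys.
    coords: dict mapping each particle key to its (x, y, z) position.
    """
    violated = sum(1 for a, b in contacts if dist(coords[a], coords[b]) > max_dist)
    return violated / len(contacts)
```

Running `contaminate` with fractions 0.01 to 0.5, reconstructing each contaminated contact set, and then calling `violation_rate` on the result mirrors the sequence described above.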

Fluctuation of compartmentalization in single-cells

For each compartment block identified in the AB compartment list derived from pseudobulk data, we selected the top 100 single-cell samples with the highest mean scAB values and the bottom 100 with the lowest mean scAB values within that region. Given that higher scAB values indicate a stronger propensity towards compartment A and lower values towards compartment B, samples with the largest scAB values in an A block were labeled as “strong A” samples for this A block, whereas those with the smallest values were termed “weak A” samples. Conversely, for B blocks, samples were classified as “weak B” and “strong B” respectively. Due to the inherent stochasticity of single cells, the lists of strong/weak A(B) samples corresponding to each A(B) block exhibited substantial variability. For each A(B) compartment block, we quantified the distribution bias of the most central 1 Mb region within the associated strong/weak A(B) samples. This was represented by the ratio of particles originating from this region. For compartment blocks smaller than 1 Mb, the entire block was used directly.
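The per-block sample selection and labeling can be written compactly. The sketch assumes the mean scAB of each cell within the block has already been computed; function names are illustrative.

```python
def strong_weak(mean_scab, n=100):
    """Return (cells with the n highest, cells with the n lowest) mean scAB.

    mean_scab: dict mapping cell id to its mean scAB within one block.
    """
    ranked = sorted(mean_scab, key=mean_scab.get)
    return ranked[-n:], ranked[:n]

def label_groups(block_type, highest, lowest):
    """Name the two groups according to the compartment type of the block."""
    if block_type == "A":
        return {"strong A": highest, "weak A": lowest}
    return {"weak B": highest, "strong B": lowest}
```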

GAM data analysis

Trim Galore (version 0.6.10) was used with default settings to automatically detect and remove adapters. We used bowtie2 (version 2.5.0) to map the reads to the reference genome. Low-quality and unmapped reads were filtered out using the “view” command from the samtools (version 1.17) package with options “-q 20 -F 4 -bS”. PCR duplicates were subsequently removed using the “markdup” command from samtools. Coverage across the genome was quantified in uniformly sized bins using the “multicov” command from bedtools (version 2.31.1), with bin sizes set at 50 Kb and 1 Mb. Positive bins were identified using the call_bins module from GAMtools (version 2.0.0) with a fixed threshold of 4. A normalized linkage disequilibrium (D’) matrix was generated using the “matrix” function from GAMtools, and only nuclear profiles with a mapping rate greater than 15% were utilized. The data were then converted into cool format using custom scripts. For the compartment analysis, we first corrected for distance effects in the 1 Mb D’ matrix to generate an autocorrelation matrix. We then performed principal component analysis and selected the component with the highest correlation to GC content among the first three principal components, as in ref. 29. TAD analysis was conducted using the same methods as for bulk Hi-C.
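For orientation, the textbook definition of normalized linkage disequilibrium between two genomic bins can be sketched as below, where each bin is a 0/1 vector recording whether it was detected as positive in each nuclear profile. This follows the standard D’ formula; the GAMtools implementation may differ in details such as the handling of undefined cases, and the function name is hypothetical.

```python
def normalized_linkage(seg_a, seg_b):
    """Standard D' between two bins across nuclear profiles.

    seg_a, seg_b: equal-length lists of 0/1 detection calls.
    D = f_ab - f_a * f_b, divided by its maximum attainable magnitude
    given f_a and f_b. Returns 0.0 when D is exactly zero.
    """
    n = len(seg_a)
    fa = sum(seg_a) / n
    fb = sum(seg_b) / n
    fab = sum(1 for a, b in zip(seg_a, seg_b) if a and b) / n
    d = fab - fa * fb
    if d > 0:
        d_max = min(fa * (1 - fb), fb * (1 - fa))
    elif d < 0:
        d_max = min(fa * fb, (1 - fa) * (1 - fb))
    else:
        return 0.0
    return d / d_max
```

Bins that are always co-segregated give 1, bins that never co-occur give −1, and independent bins give 0.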

Statistics and reproducibility

For single-cell Hi-C experiments in both humans and mice, we ensure two replicates per individual. The conclusions of the study are reproducible across different individuals. When analyzing single-cell Hi-C data, we aim for a pooled data depth sufficient to observe features such as compartments and TADs, leading us to collect approximately 1,000 samples, resulting in a total of 100 million contacts when pooled. Samples with fewer contacts are excluded from analysis as they may represent broken cells or insufficient amplification. Samples with insufficient contacts to reconstruct stable structures are excluded from structure-based analysis. For the GAM experiments, we use the SLICE model to predict the required number of nuclear profiles. Preliminary experiments show that increasing the number of NPs further improves data quality, so we ultimately use more samples. Samples with a mapping rate less than 15% are excluded from downstream analysis, as a low mapping rate can indicate empty samples. No statistical method was used to predetermine the sample size. The experiments were not randomized, and the investigators were not blinded to allocation during experiments and outcome assessment.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.