Extended Data Fig. 1: Quality control of scRNA-seq data of the human developing ovaries and testes.

a, Schematic representation of the computational workflow used to analyse scRNA-seq data. b, UMAP (uniform manifold approximation and projection) of the male and female human (left) and mouse (right) scRNA-seq datasets labelled by donor and sample. Dots from the same donor or sample share a colour. For female mouse scRNA-seq data, an additional UMAP is coloured by the study of origin. c, Bar plot showing the proportions of human (top) and mouse (bottom) cells profiled with scRNA-seq, coloured by lineage and classified by sex and developmental stage (indicated in post-conceptional weeks (PCW) or embryonic (E) / postnatal (P) days). d, Dot plots showing the variance-scaled, log-transformed expression of genes (X-axis) characteristic of the main lineages (Y-axis) detected in male and female human (top) and mouse (bottom) scRNA-seq datasets. The top layer groups marker genes by category. Lineages unique to developing ovaries and testes are highlighted with "*". e, Cell annotations from the Li et al. 2017 scRNA-seq analysis of human gonads, predicted on our human scRNA-seq dataset. Labels were transferred using scmap separately for females (left) and males (right), with a cutoff of 0.5. Cells that do not pass the 0.5 cutoff are labelled as “unassigned”. The colour legend for the main lineages matches that in Extended Data Fig. 1c. f, Box plot showing the predicted probabilities of human cell types transferred with a support vector machine (SVM) model onto manually annotated mouse cell types, either around the time of sex determination for both females and males (n = 29,297 cells; left), or considering all developmental stages combined for ovaries (n = 70,379 cells; middle) and testes (n = 32,889 cells; right) separately. The box extends from the lower to the upper quartile of the data, with a line at the median. The whiskers extend from the box to show the range of the data. Flyer points are those past the end of the whiskers.
CoelEpi = coelomic epithelium; E = embryonic day; Endo = endothelial; Epi = epithelial; FGC = fetal germ cells; P = postnatal day; PCW = post-conceptional weeks; SMC = smooth muscle cells; Soma = somatic; PV = perivascular; Mese = mesenchymal.
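The cutoff rule described for panel e can be illustrated with a minimal sketch. This is not the scmap implementation or the authors' code: the function name `assign_labels`, its parameters and the example scores are hypothetical, and treating a score exactly equal to the cutoff as passing is an assumption.

```python
from typing import List, Sequence


def assign_labels(
    predicted: Sequence[str],
    scores: Sequence[float],
    cutoff: float = 0.5,
    unassigned: str = "unassigned",
) -> List[str]:
    """Keep a transferred label only if its score passes the cutoff.

    Assumption: a score equal to the cutoff counts as passing.
    """
    if len(predicted) != len(scores):
        raise ValueError("predicted and scores must have the same length")
    return [
        label if score >= cutoff else unassigned
        for label, score in zip(predicted, scores)
    ]


# Hypothetical scores for three cells, for illustration only.
labels = assign_labels(["FGC", "Endo", "SMC"], [0.9, 0.5, 0.2])
print(labels)
```

Cells whose best-matching label scores below the cutoff are kept in the dataset but carry the “unassigned” label, so they remain visible on the UMAP without being attributed to a reference cell type.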