Fig. 2: Dynamics of the assembled human early development transcripts.

a Schematic for the in vitro differentiation of hPSCs into each of the three embryonic germ layers. b Number of transcripts assembled from each cell state and the merged transcript assembly. c PCA showing the relationships between the samples based on long-read quantification. In addition to the hPSC data generated in this study, the hPSC (H1) and (c11) samples are from previously published data12. d Boxplots showing the spliced transcript lengths (left) and number of exons per assembled transcript (right). For this and all subsequent boxplots, the orange line in the central bar shows the median, the central bar shows the second and third quartiles, and the whiskers show 1.5 times the interquartile range. Statistics are from a Kruskal-Wallis H-test on 5 hPSC replicates and 2 replicates for all other cell types. kbp = kilobase pairs. e Stacked bar chart showing the number of coding and noncoding transcripts in the hEDT assemblies (left plot), and the number of long reads assigned to each category (right plot). f Stacked bar chart showing the number of assembled transcripts based on similarity to the GENCODE (v43) assembly. A matching transcript completely matches a known transcript (including all exons and splice junctions), while a variant transcript has any base pair in common with a GENCODE exon. Novel transcripts do not overlap a GENCODE exon. g Bar chart of the number of novel coding and noncoding genes identified in each cell state. Because a gene might have more than one isoform, a gene could be both coding and noncoding. h Heatmap showing uniformly expressed and lineage-specific transcripts based on long-read quantification, normalized to the highest expression for each transcript. i Genome views of the ectoderm-specific loci FEZF1 (left panel) and PAX6 (right panel). j Single-cell RNA-seq expression from in vitro differentiation of hPSCs47, showing uniformly expressed and marker transcripts specific to the indicated cell states.
k Enrichment of SNPs with clinical relevance in novel transcripts compared to a matched random background of transcripts with equal lengths and exon structures. Clinically relevant SNPs were from Ensembl140. Statistics are from Fisher's exact test.
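
The matching / variant / novel categories used in panel f can be illustrated with a minimal sketch. This is not the pipeline used in the study. It assumes all exons lie on a single chromosome and strand, uses half-open coordinates, and treats "matching" as an identical exon chain, which is stricter than matching splice junctions alone. The names `Exon`, `overlaps`, and `classify` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: an exon is a half-open interval [start, end)
# on one chromosome and strand (an assumption made for this sketch).
Exon = Tuple[int, int]


def overlaps(a: Exon, b: Exon) -> bool:
    """Return True if two half-open intervals share at least one base pair."""
    return a[0] < b[1] and b[0] < a[1]


def classify(transcript: List[Exon], reference: List[List[Exon]]) -> str:
    """Label a transcript as 'matching', 'variant', or 'novel'.

    matching: exon chain identical to some reference transcript
    variant:  shares at least one base pair with any reference exon
    novel:    overlaps no reference exon
    """
    query = sorted(transcript)
    if any(sorted(ref) == query for ref in reference):
        return "matching"
    if any(overlaps(q, r) for ref in reference for r in ref for q in query):
        return "variant"
    return "novel"


# Toy reference annotation with a single two-exon transcript.
reference = [[(100, 200), (300, 400)]]
print(classify([(100, 200), (300, 400)], reference))
print(classify([(150, 200), (300, 400)], reference))
print(classify([(500, 600)], reference))
```

A real comparison against GENCODE would also need chromosome and strand handling and an interval index for speed; the sketch only shows the decision order, with the exact-match check applied before the overlap check.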