Fig. 5: Genes2Genes compares cell differentiation trajectories between healthy lung and disease lung in idiopathic pulmonary fibrosis.

a, Schematic of the healthy and IPF cell differentiation trajectories in focus: differentiation of alveolar type 2 (AT2) cells into alveolar type 1 (AT1) cells in the healthy lung (reference) versus into aberrant basaloid cells (ABCs) in the IPF lung (query). b, Aggregate alignment over the alignments of all highly variable genes (HVGs) (top). Stacked barplots represent reference and query cell-type compositions across 13 equispaced pseudotime points; boxed segments represent mismatches; black lines represent matches. The pairwise time point matrix between healthy and IPF pseudotime (bottom). Color represents the number of genes showing a match between the corresponding healthy and IPF time points. The white line represents the average alignment path. c, Aggregate alignment over the alignments of 88 ABC marker genes (Supplementary Fig. 3) plotted as in b, with the aggregate alignment schematic on top, and the pairwise time point matrix in the middle. Gene expression plots for three example ABC marker genes (KRT17, MMP7 and FN1) between IPF (blue) and healthy (green) data along pseudotime, plotting interpolated log1p-normalized (per-cell total raw transcript counts normalized to 10,000 and log1p-transformed) expression (y axis) against pseudotime (x axis) (bottom). Bold lines represent mean expression trends; faded data points are 50 random samples from the estimated expression distribution at each time point. Black dashed lines represent matches between time points. d, Aggregate alignment path (white) for all epithelial-mesenchymal transition (EMT) pathway genes, plotted on the pairwise time point matrix between healthy and IPF as in b, with the schematic on the right (top right). Heatmap of the smoothed (interpolated) and z-normalized mean log1p gene expression of genes in the EMT pathway along pseudotime (bottom right). e, Gene expression of CAMK1D between IPF (blue) and healthy (green) along pseudotime. Interpolated log1p-normalized expression (y axis) against pseudotime (x axis) as in c (top). Actual (non-interpolated) log1p-normalized gene expression plotted against pseudotime (bottom). The illustration in a was created using BioRender (https://biorender.com). All interpolations and statistics were generated using our G2G framework.
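The normalizations named in this caption can be illustrated with a minimal NumPy sketch. It is not the G2G implementation: the function names `log1p_normalize` and `z_normalize` and the toy count matrix are assumptions made for illustration. It shows per-cell scaling of raw counts to 10,000 followed by a log1p transform, per-gene z-normalization as used for the heatmap in d, and a grid of 13 equispaced pseudotime points as in b.

```python
import numpy as np

def log1p_normalize(counts, target_sum=10_000):
    """Scale each cell (row) to target_sum total counts, then apply log1p."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # leave all-zero cells at zero instead of dividing by 0
    return np.log1p(counts / totals * target_sum)

def z_normalize(trend):
    """Z-normalize each gene's expression trend (row) along its last axis."""
    trend = np.asarray(trend, dtype=float)
    mean = trend.mean(axis=-1, keepdims=True)
    std = trend.std(axis=-1, keepdims=True)
    std[std == 0] = 1.0  # constant trends map to zero rather than NaN
    return (trend - mean) / std

# Toy raw count matrix (hypothetical data): 2 cells x 3 genes.
counts = np.array([[10, 0, 90],
                   [5, 5, 0]])
norm = log1p_normalize(counts)

# Genes as rows, cells as columns, standing in for a gene-by-time-point matrix.
z = z_normalize(norm.T)

# 13 equispaced pseudotime points on [0, 1].
pseudotime_grid = np.linspace(0.0, 1.0, 13)
```

Undoing the log1p transform with `np.expm1` recovers per-cell totals of 10,000, which is a quick way to check the scaling step.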