Introduction

Fibrosing interstitial lung diseases (ILDs) are a heterogeneous group of scarring disorders of the lungs that cause progressive respiratory failure and eventually, in the absence of treatment or lung transplantation (LTx), premature death. Idiopathic pulmonary fibrosis (IPF) is the most common, the deadliest and the most difficult to treat of these diseases1,2,3. Although anti-fibrotic treatment has become available, advanced pulmonary fibrosis due to fibrosing ILD is now the leading indication for LTx worldwide4.

When a diagnosis cannot be reached non-invasively, the diagnostic approach to ILDs includes surgical lung biopsies (SLBs) and multidisciplinary discussion (MDD). MDD increases diagnostic confidence and agreement5, but its accuracy remains largely unknown. While the diagnosis of some types of fibrosing ILDs (such as non-specific interstitial pneumonia [NSIP] and chronic hypersensitivity pneumonitis [CHP]) often requires SLBs, IPF may be diagnosed based on well-defined clinical-radiographic criteria. However, when the radiographic presentation is atypical for IPF, SLB is still required.

Although SLB in the context of MDD is considered the gold standard in the diagnosis of ILDs, diagnostic agreement among pathologists is only modest, owing to the heterogeneity of findings6. In addition, more than one ILD pattern may be found in the same patient7. In a recent study, 16% of patients in an ILD cohort remained unclassifiable after SLB8. These findings highlight an urgent need for diagnostic tools with improved accuracy to identify the specific type of fibrosing ILD in individual patients.

Gene expression profiling has been highly effective in reclassifying clinically relevant disease phenotypes with similar histologic presentation9. A major limitation of the previous gene expression profiling studies in fibrosing ILDs10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25, however, is their use of whole-lung bulk tissue samples, in which any process that is not quantitatively prevalent may be overlooked, even if it is biologically of utmost importance26. As many genes are multifunctional, it is also important to assess their expression without losing spatial resolution27. Transcriptomic profiles of bulk tissue homogenates are unlikely to retain the complexity of certain types of fibrosing ILDs, such as IPF and CHP, which are characterized histologically by regional, temporal and cellular heterogeneity of the lung microenvironments (LMEs)28,29. LMEs consist of a dynamic population of cellular and non-cellular components, which form a multifaceted regulatory network that helps to maintain homeostasis. With the exception of a few studies with limited spatial resolution26,30,31,32, the transcriptional features of specific LMEs in fibrosing ILDs are unknown.

Recently, single-cell RNA-sequencing (sc-RNA-seq) methods have identified known and novel cell populations with markedly improved precision33,34,38. The generation of a high-resolution, comprehensive catalog of the changes in lung cellular composition is expected to lead to the development of novel cell-specific biomarkers. Nevertheless, the pathogenesis of fibrosing ILDs involves active interactions between different types of cells39. Epithelial-driven myofibroblast proliferation is a key example of this concept40. sc-RNA-seq can profile the molecular heterogeneity of the different cell types and their changes during disease. However, without spatial context, it is unclear how these different cell types coordinate tissue functions. Lastly, sc-RNA-seq is intrinsically prone to dissociation bias, which may lead to spurious changes in cell distribution and limits its ability to provide an exact numeric description of the changes in cell populations across disease conditions.

The use of spatially resolved transcriptomic data can identify links between LMEs and functions that cannot be established from sc-RNA-seq alone. Digital spatial profiling (DSP) with the GeoMx platform is a novel approach that generates a whole tissue image together with digital profiling data for thousands of transcripts (Fig. 1a). This analysis provides a quantitative assessment of the biological implications of tissue sample heterogeneity and generates spatially defined transcriptomic profiles.

Fig. 1

GeoMx workflow and summary of ILD conditions. (a) FFPE sections are stained with in situ hybridization probes and fluorescently tagged antibodies. Next, immunofluorescence imaging is used to select ROIs. These ROIs are exposed to UV light to photocleave oligonucleotide barcodes. Indexed libraries are generated from these barcodes and sequenced. (b) An alluvial diagram showing five conditions (left column), and matching annotations (right column). (c–g) Representative sections stained with immunofluorescence markers for indicated LMEs in each condition (blue: DNA; red: smooth muscle actin; yellow: CD45; green: pan-cytokeratin). Sampled ROIs are circled in white. Scale bars represent 50 µm.

In this study, we applied DSP to further our understanding of gene expression within the context of the spatial organization of different types of fibrosing ILDs and normal controls. As differences in pathology and gene expression associated with fibrosing ILDs are localized to specific LMEs, the ability to spatially localize gene expression at the interstitial level is critical to gain further insight into disease mechanisms.

Results

Selection of LMEs as ROIs

Demographic, clinical and functional characteristics of patients and their post-SLB outcomes are shown in Table 1. Detailed radiographic and pathologic features of each case are described in Supplementary Table S1. No post-operative complications were observed.

Table 1 Patient characteristics at the time of surgical lung biopsies, and their outcomes (n = 3 in each group).

Based on the pathologist’s annotations, we selected 179 regions of interest (ROIs) from 12 fibrosing ILD cases and 3 normal controls using the GeoMx platform. These ROIs were selected to capture four histologically distinct LMEs representative of each condition studied (Fig. 1b). Unique vs. shared LMEs (right column) across ILD conditions (left column) are depicted through colour-coded bands (by condition) connecting the two columns in the alluvial plot. Height of bands along the y-axis depicts the total number of ROIs for each LME. Representative immunofluorescence and H&E stained sections are shown in Fig. 1c–g and the appendix, respectively.

NanoString library preparation and exploratory analysis

The raw probe count and ROI annotation tables were obtained from GeoMx-DSP. Supplementary Fig. 1a shows no evidence of biased library sizes or nuclei counts across conditions or annotations. Based on a scatter plot of percent sequencing saturation vs. percent alignment, we removed 7 ROIs that fell below the 90% cutoff in either metric (Supplementary Fig. 1b, Supplementary Table 2). Next, we used standR to perform exploratory analysis and normalization41. We compared 3 different normalization schemes: counts per million (CPM), upper quartile (Q3) and Remove Unwanted Variation (RUV4)42 (Supplementary Fig. 1c–e). While all 3 methods generated similar distributions of relative log expression across ROIs (left panels), only RUV4 removed the patient batch effect seen in principal component analysis (PCA) plots, where ROIs are grouped by patient (middle panels). The PCA plot based on RUV4 normalization showed 3 distinct clusters corresponding to normal control, IPF and NSIP (Supplementary Fig. 1e, right). In contrast, CPM- and Q3-normalized PCA plots showed clusters grouped by patient rather than by condition (Supplementary Fig. 1c–d, middle and right). RUV4 also mitigated the batch effect associated with the time of sample preparation (Supplementary Fig. 1f). To perform downstream differential gene expression analysis with batch correction, we used the limma-voom pipeline with the factors of unwanted variation (W matrices) estimated by RUV4 as covariates in the linear model43,44. The design matrix is described in detail in the Methods.
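The QC filter and the two simpler normalization schemes compared above can be sketched in a few lines. This is a minimal Python/NumPy illustration under stated assumptions, not the standR implementation (standR is an R/Bioconductor package): the function names are hypothetical, counts are assumed to be a genes × ROIs matrix, and the Q3 variant shown (rescaling to the geometric mean of per-ROI upper quartiles) is an assumption about one common convention. RUV4, which additionally estimates factors of unwanted variation, is not shown.

```python
import numpy as np


def qc_pass(saturation, alignment, cutoff=90.0):
    """Boolean mask of ROIs that meet the cutoff (in percent) for BOTH
    sequencing saturation and alignment; an ROI failing either is dropped."""
    saturation = np.asarray(saturation, dtype=float)
    alignment = np.asarray(alignment, dtype=float)
    return (saturation >= cutoff) & (alignment >= cutoff)


def cpm(counts):
    """Counts per million: scale each ROI (column) by its library size."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=0) * 1e6


def q3_normalize(counts):
    """Upper-quartile (Q3) scaling: divide each ROI by its 75th-percentile
    count, then multiply by the geometric mean of those Q3 values so the
    result stays on a count-like scale (assumed convention)."""
    counts = np.asarray(counts, dtype=float)
    q3 = np.percentile(counts, 75, axis=0)
    return counts / q3 * np.exp(np.mean(np.log(q3)))


# Toy example: 3 genes x 2 ROIs; ROI 2 is ROI 1 sequenced twice as deeply.
counts = np.array([[10, 20], [30, 60], [60, 120]])
print(qc_pass([95, 85, 92], [91, 99, 80]))  # [ True False False]
print(cpm(counts))  # both columns become identical
```

Because CPM and Q3 rescale each ROI as a whole, neither can remove an effect that shifts particular genes in particular patients, which is one plausible reason why only RUV4 removed the patient batch effect here.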

Gene expression patterns in LMEs within each condition

We leveraged our dataset to assess tissue heterogeneity that cannot be resolved with bulk RNA-sequencing. First, we subset our data by disease condition and its associated LMEs to identify differentially expressed genes and gene sets between annotations. Within IPF, we observed 3 distinct clusters: uninvolved regions, fibroblastic foci (f.foci) and fibrotic/lymphoid regions (Fig. 2a). To identify the most striking differences between these LMEs, we performed pairwise comparisons and identified genes with the highest absolute log2 fold change and −log P values. Figure 2b shows volcano plots of −log adjusted P value vs. log2 fold change for each pairwise comparison. Collagen genes (COL1A1, COL1A2, COL5A1, COL8A1, COL3A1) were significantly upregulated in f.foci compared to fibrotic or uninvolved regions. On the other hand, surfactant genes (SFTPA1, SFTPA2, SFTPB, SFTPC) were downregulated in both fibrotic regions and f.foci compared to uninvolved regions (Supplementary Table S3).
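The cutoffs used for the volcano plots (adjusted P value < 0.05 and absolute log2 fold change ≥ 1, as stated in the figure legends) can be illustrated with a small sketch. It assumes raw P values and log2 fold changes are already available (in the study they come from limma-voom, which is not reproduced here) and uses Benjamini–Hochberg adjustment, which is an assumption about the adjustment method; the gene names are placeholders.

```python
import numpy as np


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted P values (false discovery rate)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity: each adjusted value is the minimum over larger ranks.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def call_degs(genes, log2fc, pvals, fc_cut=1.0, alpha=0.05):
    """Split genes into up- and downregulated lists using both cutoffs."""
    adj = bh_adjust(pvals)
    up, down = [], []
    for gene, fc, q in zip(genes, log2fc, adj):
        if q < alpha and fc >= fc_cut:
            up.append(gene)
        elif q < alpha and fc <= -fc_cut:
            down.append(gene)
    return up, down, adj


up, down, adj = call_degs(
    ["geneA", "geneB", "geneC", "geneD"],
    [2.0, -1.5, 0.5, 3.0],
    [1e-4, 1e-3, 1e-5, 0.5],
)
print(up, down)  # ['geneA'] ['geneB']
```

Note that geneC is significant but fails the fold-change cutoff, and geneD has a large fold change but is not significant; a gene must pass both thresholds to be colored in a volcano plot.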

Fig. 2

Gene expression profiles across pathologically distinct ROIs within each subtype of fibrosing ILDs. (a) PCA plot of ROIs in IPF grouped by annotation. (b) Volcano plots showing up (red) or downregulated (blue) genes for indicated comparisons between annotations in IPF. (c) PCA plot of ROIs in NSIP grouped by annotation. (d) Volcano plots showing up (red) or downregulated (blue) genes for indicated comparisons between annotations in NSIP. (e) PCA plot of ROIs in CHP grouped by annotation. (f) Volcano plots showing up (red) or downregulated (blue) genes for indicated comparisons between annotations in CHP. (g) GO “biological process” gene sets (y axis) that are overrepresented by significantly upregulated genes in indicated comparisons (x axis). Top horizontal labels indicate which condition a comparison belongs to. (h) GO “biological process” gene sets (y axis) that are overrepresented by significantly downregulated genes in indicated comparisons (x axis). Top horizontal labels indicate which condition a comparison belongs to. (b), (d), (f): horizontal and vertical dotted lines indicate adjusted P value of 0.05 and log2 fold change of 1, respectively. f.foci: fibroblastic foci. peripheral f: peripheral fibrosis. central f: central fibrosis. Inflam.: inflammatory.

Within NSIP, inflammatory regions formed a clearly distinct cluster, separate from peripheral fibrosis, central fibrosis and airways (Fig. 2c). Pairwise comparisons showed that fibrotic regions (both central and peripheral) upregulated surfactant genes (SFTPA2, SFTPB, SFTPC) and downregulated several chemokines (CXCL13, CCL19, CCL21) compared to inflammatory regions (Fig. 2d). No genes were differentially expressed between central and peripheral fibrotic regions in NSIP, consistent with their overlap in the PCA plot (Supplementary Table S4).

LMEs in CHP showed a degree of overlap between regions of airway inflammation, granulomas and fibrotic regions (Fig. 2e). Granulomas downregulated surfactant genes (SFTPA1, SFTPA2, SFTPB, SFTPC) compared to uninvolved regions (Fig. 2f). No genes were robustly upregulated in any of the comparisons. Only a few genes were modestly downregulated in granulomas compared to regions of airway inflammation (Supplementary Table S5).

Differential gene expression between spatially distant regions is driven both by the heterogeneous cell-type composition of ROIs and by deregulated gene expression. We sought to quantify the contribution of these two factors by performing reverse deconvolution with “SpatialDecon”45. First, we estimated the proportions of 21 lung cell types (binned into 10) in each ROI for each ILD condition (Supplementary Fig. 2a–c). These estimates appeared plausible; for example, f.foci and airways were enriched with fibroblasts and ciliated cells, respectively. Then, for each gene, we plotted the correlation between predicted and observed expression against the standard deviation (SD) of the residuals (Supplementary Fig. 2d–f). Differential expression of genes with high correlation and low SD is redundant with differential cell composition (lower right quadrant), that of genes with high correlation and high SD is confounded by differential cell composition (upper right quadrant), and that of genes with low correlation and high SD is independent of differential cell composition (upper left quadrant). We observed that a subset of differentially expressed genes between regions within IPF and within NSIP was confounded by cell composition (Supplementary Fig. 2d,e, Supplementary Tables S6 and S7). Most differentially expressed genes between CHP regions were redundant with differential cell composition (Supplementary Fig. 2f, Supplementary Table S8).
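The quadrant logic described above can be made concrete with a small sketch. It is not SpatialDecon (an R package whose reverse-deconvolution model differs); as a stand-in, it regresses each gene's expression across ROIs on the estimated cell-type proportions by ordinary least squares, then reports the correlation between fitted and observed values and the residual SD. The thresholds `cor_cut` and `sd_cut` are hypothetical and would need to be chosen from the data.

```python
import numpy as np


def reverse_decon_ols(expr, props):
    """Stand-in for reverse deconvolution (not the SpatialDecon model).

    expr:  genes x ROIs matrix of log-scale expression.
    props: ROIs x cell-types matrix of estimated proportions.
    Returns, per gene, the correlation between fitted and observed expression
    and the standard deviation of the residuals.
    """
    expr = np.asarray(expr, dtype=float)
    props = np.asarray(props, dtype=float)
    X = np.column_stack([np.ones(props.shape[0]), props])
    # lstsq tolerates the collinearity between the intercept and proportions
    # that sum to 1 (it returns the minimum-norm solution).
    coefs, *_ = np.linalg.lstsq(X, expr.T, rcond=None)
    fitted = (X @ coefs).T
    cors = np.array([np.corrcoef(f, y)[0, 1] for f, y in zip(fitted, expr)])
    resid_sd = (expr - fitted).std(axis=1, ddof=1)
    return cors, resid_sd


def quadrant(cor, sd, cor_cut=0.7, sd_cut=0.5):
    """Label a gene by the quadrant scheme described in the text."""
    if cor >= cor_cut and sd < sd_cut:
        return "redundant"    # explained by cell composition
    if cor >= cor_cut:
        return "confounded"   # partly explained by cell composition
    if sd >= sd_cut:
        return "independent"  # not explained by cell composition
    return "unassigned"       # low correlation and low SD (not discussed)


p1 = np.array([0.1, 0.3, 0.5, 0.7, 0.9, 0.2])
props = np.column_stack([p1, 1 - p1])          # 2 hypothetical cell types
expr = np.vstack([5 * p1,                      # tracks composition exactly
                  [1, -1, 1, -1, 1, -1]])      # unrelated to composition
cors, sds = reverse_decon_ols(expr, props)
print([quadrant(c, s) for c, s in zip(cors, sds)])  # ['redundant', 'independent']
```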

To determine whether the genes in each comparison perform similar biological functions, we performed over-representation analysis using Gene Ontology “biological process” gene sets46. We then combined the outputs into a single dot plot to facilitate a meta-comparison across pairwise comparisons. Figure 2g shows gene sets called from upregulated genes, with adjusted P values and gene ratios represented by color and size, respectively. Pairwise comparisons are indicated along the x axis (bottom), together with the disease condition they belong to (top). Consistent with the volcano plots, f.foci in IPF upregulated gene sets related to extracellular matrix organization compared to both fibrotic and uninvolved regions (Fig. 2g, left column). In contrast, gene sets related to B cell function, such as complement activation and phagocytosis, were upregulated in fibrotic compared to uninvolved regions. These gene sets were mutually exclusive with those upregulated in comparisons between LMEs in NSIP. Both central and peripheral fibrotic regions upregulated gene sets related to peptidase activity and cholesterol transport compared to inflammatory regions (Fig. 2g, right column). Upregulated genes annotated in these gene sets and shown in the volcano plots are listed in Supplementary Fig. 2g.
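Over-representation analysis asks whether a gene set contains more differentially expressed (DE) genes than expected by chance. A common formulation, assumed here since the GO tool used in the study is not reproduced, is a one-sided hypergeometric test; the gene ratio plotted as dot size is taken to be the fraction of DE genes that fall in the set. The gene sets and gene universe below are toy placeholders, and in practice the P values would also be adjusted across gene sets (for example with Benjamini–Hochberg).

```python
from math import comb


def hypergeom_sf(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric: N genes in the universe, K of them
    in the gene set, n DE genes drawn."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def over_representation(de_genes, gene_sets, universe):
    """Return (set name, gene ratio, P value) tuples sorted by P value."""
    universe = set(universe)
    de = set(de_genes) & universe
    if not de:
        return []
    results = []
    for name, members in gene_sets.items():
        members = set(members) & universe
        k = len(de & members)
        p = hypergeom_sf(k, len(de), len(members), len(universe))
        results.append((name, k / len(de), p))
    return sorted(results, key=lambda r: r[2])


universe = [f"g{i}" for i in range(10)]
gene_sets = {"set_A": ["g0", "g1", "g2"], "set_B": ["g5", "g6", "g7", "g8", "g9"]}
res = over_representation(["g0", "g1", "g2"], gene_sets, universe)
print(res[0][0], res[0][1], round(res[0][2], 6))  # set_A 1.0 0.008333
```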

We repeated the same analysis with downregulated genes from pairwise comparisons within each condition. A common list of gene sets (surfactant homeostasis, regulation of peptidase activity, etc.) was enriched among genes downregulated in both fibrotic regions and f.foci compared to uninvolved regions of IPF (Fig. 2h, left column). These were mutually exclusive with gene sets such as B cell activation, which were enriched among genes downregulated in both central and peripheral fibrotic regions compared to inflammatory regions in NSIP (Fig. 2h, middle column). These results were consistent with the downregulation of surfactant and chemokine genes in IPF and NSIP, respectively (Fig. 2b, d). Lastly, gene sets related to endothelial cell differentiation and development were uniquely enriched among genes downregulated in both fibrotic regions and granulomas compared to uninvolved regions in CHP (Fig. 2h, right column). Genes annotated in the surfactant homeostasis, B cell activation and endothelial cell differentiation gene sets are described in Supplementary Fig. 2h.

With this analysis, we identified high-resolution transcriptomic profiles of distinct LMEs within each of the 3 subtypes of fibrosing ILDs. Differential gene expression and over-representation analyses showed that mutually exclusive gene sets were enriched by up- or downregulated genes between LMEs in each condition.

Comparison between LMEs across disease conditions

Next, we sought to compare fibrotic regions across disease conditions to gain insight into their unique gene expression signatures, despite their common pathological description. Central and peripheral fibrotic regions of NSIP and f.foci of IPF each formed distinct clusters, while fibrotic regions in IPF and CHP largely overlapped (Fig. 3a). To identify the most striking differences between these LMEs, we selected the top 50 differentially expressed genes, sorted by their F statistic. To assess variability between individual ROIs, their normalized expression is shown as row-scaled Z scores in a heatmap (Fig. 3b, Supplementary Table S9). Agglomerative hierarchical clustering grouped the top 50 genes into 2 clusters. Based on the Z score and log2 fold change, the orange cluster (top) represented genes that were downregulated in both IPF (fibrosis and f.foci) and CHP (fibrosis) compared to central/peripheral fibrotic regions of NSIP. Conversely, the turquoise cluster (bottom) represented genes such as COL6A3, COL1A1 and COL8A1 that were upregulated uniquely in f.foci of IPF compared to the rest. Spatial deconvolution revealed that most of these differentially expressed genes were redundant with differential cell composition (Supplementary Fig. 3a,b). MXRA5, ASPN, COL12A1 and CD248 were upregulated in f.foci of IPF independently of cell composition (Supplementary Table S10). Based on over-representation analysis of pairwise comparisons, both IPF (fibrosis and f.foci) and CHP (fibrosis) significantly downregulated gene sets related to antigen processing/presentation, immune cell function and surfactant homeostasis compared to central/peripheral fibrotic regions of NSIP (Fig. 3c). Indeed, many of these genes were found in the orange gene cluster of the heatmap (Supplementary Fig. 3e). Even compared to fibrotic regions in CHP and NSIP, gene sets related to extracellular matrix organization were upregulated in f.foci of IPF (Fig. 3c), consistent with collagen gene upregulation in the turquoise cluster of the heatmap (Fig. 3b). These data reveal distinct gene expression profiles in fibrotic and fibroproliferative (f.foci) regions in IPF, CHP and NSIP.

Fig. 3

Gene expression profiles across subtypes of fibrosing ILDs. (a) PCA plot of ROIs (fibrosis or fibroblastic foci) across IPF, NSIP and CHP. (b) Two clusters of top differentially expressed genes (sorted by F statistic) between central fibrosis of NSIP, fibrosis of CHP, fibrosis of IPF, and fibroblastic foci of IPF. Expression is shown as row-scaled Z scores. Each column represents a unique ROI that belongs to indicated annotation-subtype. A row dendrogram is shown on the left. (c) GO “biological process” gene sets (y axis) that are overrepresented by significantly downregulated genes in indicated comparisons (x axis). (d) PCA plot of ROIs that represent hallmarks of each subtype of fibrosing ILDs. (e) Two clusters of top differentially expressed genes (sorted by F statistic) between inflammatory regions of NSIP, granuloma of CHP, and fibroblastic foci of IPF as in (b). (f) GO “biological process” gene sets (y axis) that are overrepresented by differentially expressed genes in indicated comparisons (x axis). (g),(h) Expression of differentially expressed genes between fibrosis/fibroblastic foci ROIs or hallmark ROIs in matched bulk samples, respectively. f.foci: fibroblastic foci.

Next, “hallmark” LMEs in IPF, CHP and NSIP, namely f.foci, granulomas and inflammatory regions, respectively47,48,49, were compared with each other. Their histological presentation is used clinically as a diagnostic marker for each ILD condition. Figure 3d shows 3 distinct clusters of these regions, grouped by annotation and condition. Among the top 50 differentially expressed genes between these regions, collagens were uniquely upregulated in f.foci of IPF (Fig. 3e, Supplementary Table S11). In contrast, inflammatory regions of NSIP upregulated ribosomal genes and chemokines. Granulomas of CHP showed relatively neutral expression of these genes in comparison. These genes were somewhat confounded by differential cell composition, but only a few were redundant with it (Supplementary Fig. 3c–d, Supplementary Table S12). Based on over-representation analysis, gene sets related to extracellular matrix organization and cytoplasmic translation were unique profiles that defined f.foci of IPF and inflammatory regions of NSIP, respectively (Fig. 3f, Supplementary Fig. 3f). In summary, this analysis shows that the LMEs used as diagnostic hallmarks of each subtype of fibrosing ILD exhibit largely distinct gene expression profiles on spatial transcriptomic analysis.

Direct transcriptional comparison between bulk tissue and DSP in matched cases

We hypothesized that spatially resolved differential gene expression patterns identified by DSP may be reproduced, at least to some extent, in bulk tissue samples collected from the same biopsies. To this end, we performed bulk RNA-seq of matched samples (2 IPF, 2 CHP and 3 NSIP cases included in the DSP analysis) and unmatched samples (2 NSIP and 1 CHP).

We visualized genes that were differentially expressed between regions of fibrosis or fibroproliferation in IPF, CHP and NSIP (Fig. 3b). In bulk tissue samples, the cluster of genes that were significantly upregulated in both fibrotic regions of NSIP was generally upregulated in NSIP compared to IPF or CHP (Fig. 3g). However, none of these genes met the statistical cutoff (Supplementary Table S13). Likewise, we observed high variability between the 3 IPF cases, as only one showed high expression of genes specific to f.foci (Fig. 3g). In addition, expression of genes differentially expressed between hallmark LMEs (Fig. 3e) was even more variable in bulk samples (Fig. 3h). Two out of the 5 CHP cases showed high expression of ribosomal genes, resembling the inflammatory regions of NSIP. Expression of PRMT8 and ZNF587 was generally higher in CHP and IPF than in NSIP, the opposite of their expression in the hallmark LMEs (Fig. 3e). Taken together, this comparison shows that our spatially resolved data provide more granular insight into gene expression patterns in pathologically relevant regions across fibrosing ILDs than bulk RNA-seq.

Comparison between uninvolved regions of IPF and CHP, and bona fide normal controls

Next, we hypothesized that uninvolved, normal-looking regions in fibrosing ILDs may be transcriptionally deregulated compared to truly normal controls. Indeed, we observed a striking separation between bona fide normal controls and uninvolved regions in both IPF and CHP (Fig. 4a, Supplementary Table S14). Unlike fibrotic regions in IPF and CHP, which largely overlapped (Fig. 3a), their uninvolved regions formed 2 distinct clusters. Consistent with the PCA plot, the top 50 differentially expressed genes between these regions were divided into 2 groups: those upregulated in uninvolved regions of IPF and CHP compared to bona fide normal controls, and vice versa (Fig. 4b). As expected, there were no striking differences in cell composition between these regions (Supplementary Fig. 4a), and the top 50 genes were not strongly confounded by it (Supplementary Fig. 4b, Supplementary Table S15). Wound healing and blood coagulation pathways were enriched among genes upregulated in uninvolved regions of IPF and CHP (Fig. 4c, Supplementary Fig. 4c). Conversely, pathways related to stress response to metal ions were over-represented among downregulated genes (Fig. 4c, Supplementary Fig. 4d). Pairwise comparisons showed that IPF and CHP largely shared their up- and downregulated genes relative to bona fide normal controls (Fig. 4d–e, Supplementary Table S16). Differentially expressed genes with larger fold changes were less dependent on cell composition (Supplementary Fig. 4e, Supplementary Table S17). In a direct comparison of IPF with CHP, only a few genes were differentially expressed (Fig. 4d). This is in line with the lower variance explained by the second principal component (PC) compared to the first PC, which separates uninvolved regions of IPF and CHP from bona fide normal controls (Fig. 4a).
Overall, our data suggest that significant transcriptional deregulation is already present in uninvolved regions of both IPF and CHP, compared to bona fide normal controls.
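The comparison between PC1 and PC2 refers to the fraction of total variance captured by each component. As a minimal sketch, assuming a genes × ROIs matrix of normalized log expression (and not reproducing the study's plotting code), PCA of ROIs via singular value decomposition gives both the ROI coordinates and these fractions:

```python
import numpy as np


def pca_rois(expr, n_components=2):
    """PCA of ROIs from a genes x ROIs log-expression matrix via SVD.

    Returns ROI scores (ROIs x components) and the fraction of total
    variance explained by each returned component.
    """
    X = np.asarray(expr, dtype=float).T   # ROIs x genes
    X = X - X.mean(axis=0)                # center each gene
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    explained = S ** 2 / np.sum(S ** 2)
    scores = U[:, :n_components] * S[:n_components]
    return scores, explained[:n_components]


# Toy data: gene 0 separates ROIs {0,1} from {2,3}; gene 1 varies weakly.
expr = np.array([[0.0, 0.0, 10.0, 10.0],
                 [0.0, 1.0, 0.0, 1.0],
                 [2.0, 2.0, 2.0, 2.0]])
scores, explained = pca_rois(expr)
print(np.round(explained, 3))  # [0.99 0.01]
```

The sign of each PC is arbitrary in SVD, so only the grouping of ROIs along a component, not its direction, is meaningful.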

Fig. 4

Gene expression profiles of uninvolved regions in IPF and CHP, and central regions in normal cases. (a) PCA plot of ROIs, grouped by condition. (b) Two clusters of top differentially expressed genes (sorted by F statistic) between uninvolved regions in IPF and CHP, and central regions in normal controls. Expression is shown as row-scaled Z scores. Each column represents a unique ROI that belongs to indicated annotation-subtype. A row dendrogram is shown on the left. (c) Top 5 overrepresented GO biological process pathways by significantly up or downregulated genes based on adjusted P value < 0.05 and absolute log2 fold change ≥ 1 in indicated comparisons. Dot size and color represent the number of genes in each pathway, and adjusted P value, respectively. (d) Volcano plots of differentially upregulated (red) or downregulated (blue) genes based on adjusted P value < 0.05 and absolute log2 fold change ≥ 1 for indicated pairwise comparisons between regions. (e) Venn diagrams of the number of up or downregulated genes in uninvolved regions of IPF or CHP compared to bona fide normal controls.

DSP gene expression signatures to classify previously unclassified cases

Modest diagnostic agreement and the frequency of unclassifiable ILD cases underscore the need for novel diagnostic tools and approaches6,8. Thus, we tested whether spatially resolved gene expression profiles can classify previously unclassifiable ILD cases as IPF, NSIP or CHP. Based on our earlier comparative analyses, we developed a decision tree (Fig. 5a). First, we classified an unclassified case (N4) as NSIP based on high expression of genes specific to central/peripheral fibrotic regions of NSIP compared to IPF or CHP (Fig. 5b). Using hierarchical clustering, we observed that all confirmed NSIP samples and case N4 clustered together, while most IPF and CHP cases clustered separately. IPF and CHP cases did not cluster separately from each other, as their expression of these genes was low compared to NSIP. This was consistent with our earlier observation that fibrotic regions in IPF and CHP clustered together on the PCA plot (Fig. 3a). In contrast, the uninvolved ROIs of IPF and CHP clustered separately on the PCA plot (Fig. 4a). We leveraged this to test whether the remaining unclassified cases would cluster with IPF or CHP. Figure 5c shows that the uninvolved regions from M26 and N5 clustered closer to confirmed CHP cases (circled in blue) than to IPF cases (circled in pink). Taken together, this analysis provides proof of principle that spatially resolved transcriptomic profiles can be used to classify previously unclassified cases as one of the known subtypes of fibrosing ILDs.
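The two-step decision tree can be written as a small function. This is a hypothetical illustration only: in the study, the calls were made by hierarchical clustering (Fig. 5b) and inspection of the PCA plot (Fig. 5c), whereas this sketch substitutes a score threshold for step 1 and a nearest-centroid rule in PC space for step 2. The function name, the threshold and all coordinates are made-up placeholders, not values from the data.

```python
import numpy as np


def classify_case(nsip_score, nsip_threshold, case_pcs, ipf_pcs, chp_pcs):
    """Hypothetical two-step classifier mirroring the decision tree.

    Step 1: nsip_score is, e.g., the mean Z score of NSIP-specific fibrosis genes
    in the case's fibrotic ROIs; above the threshold, the case is called NSIP.
    Step 2: otherwise, the centroid of the case's uninvolved ROIs in PC space is
    compared with the IPF and CHP centroids, and the closer label is returned
    (ties go to CHP).
    """
    if nsip_score > nsip_threshold:
        return "NSIP"
    case_centroid = np.mean(np.atleast_2d(case_pcs), axis=0)
    d_ipf = np.linalg.norm(case_centroid - np.mean(ipf_pcs, axis=0))
    d_chp = np.linalg.norm(case_centroid - np.mean(chp_pcs, axis=0))
    return "IPF" if d_ipf < d_chp else "CHP"


ipf = np.array([[-2.0, 0.0], [-1.8, 0.2]])   # placeholder PC1/PC2 coordinates
chp = np.array([[2.0, 1.0], [2.2, 0.8]])
print(classify_case(1.2, 0.5, [[0.0, 0.0]], ipf, chp))    # NSIP
print(classify_case(-0.3, 0.5, [[2.1, 0.9]], ipf, chp))   # CHP
```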

Fig. 5

Classification of unclassified cases with gene expression profiles in fibrosis or uninvolved regions. (a) Flowchart depicting a strategy to group unclassified cases as one of IPF, CHP, or NSIP. (b) Heatmap of top (sorted by F statistic) differentially expressed genes between fibrotic regions across conditions that were identified in Fig. 3b. Expression is shown as row-scaled Z scores. Columns are clustered based on the complete agglomeration method. Patient ID and condition are indicated by color bars above the heatmap. (c) PCA plot showing central regions in bona fide normal (triangle), and uninvolved regions in IPF (circle), CHP (square), and unclassified cases (diamond). Clusters of IPF and CHP are enclosed in pink or blue ellipses, respectively. Unc.: unclassified.

Discussion

Our study is the first to thoroughly characterize spatially resolved, whole-genome transcriptomic profiles of histologically distinct regions of IPF, NSIP, and CHP. Our work offers novel insight into differentially expressed genes and biological functions both within specific LMEs of fibrosing ILD, including normal-looking regions, and across disease conditions that traditional bulk RNA-seq cannot capture. We show that such spatially-resolved gene expression profiles are not easily reproducible in matched bulk tissue samples, but can potentially be leveraged to reclassify previously unclassified cases into IPF, NSIP or CHP.

Based on previous literature and our data, transcriptional heterogeneity is a typical feature of IPF. In line with a previous report, several collagen genes and related gene sets were upregulated in f.foci compared to uninvolved or fibrotic regions50. On the other hand, various immunoglobulins and chemokines related to B cell activity were uniquely upregulated in fibrotic regions. This is consistent with CD20-positive staining in tight aggregates in fibrotic regions, adjacent to f.foci, based on immunohistochemistry in a separate IPF cohort51. This is also consistent with previous reports of persistent inflammatory activity, observed even in advanced cases of IPF52. A previous study using laser capture microdissection suggested that the TGF-β1-induced TSC2/RHEB axis was strongly associated with collagen gene expression in myofibroblasts found in f.foci31. RHEB was mechanistically required for collagen expression upon TGF-β1 treatment in primary human lung fibroblasts in culture. Although collagens and TGF-β1 were significantly upregulated in f.foci compared to fibrosis or uninvolved regions in our data, TSC2 and RHEB were not differentially expressed between f.foci, fibrosis or uninvolved regions (Supplementary Table S3).

Furthermore, we provide new evidence for significant transcriptional deregulation in uninvolved, normal-looking regions, which, importantly, were remarkably distinct from bona fide normal controls. Luzina et al. reported that genes related to ECM organization were highly upregulated in both uninvolved and macroscopically scarred regions in IPF explants compared to healthy controls30. We found that this pathway was specifically upregulated in f.foci compared to both uninvolved and fibrotic regions. The authors also argued that genes related to ciliated epithelium were upregulated in scarred regions compared to uninvolved regions. However, one of the three normal controls also upregulated genes related to this gene set, which suggests substantial variability. We contend that, because they used macroscopically dissected tissue from explants, mildly diseased regions were likely captured as uninvolved regions. Our work is more in line with Blumhagen et al.53, who, using the NanoString platform, reported upregulation of various immunoglobulins in regions of dense fibrosis compared to uninvolved regions, echoing our findings.

Our analysis of uninvolved regions in IPF and CHP revealed several genes that were divergently deregulated, despite these regions being histologically normal-looking. Ideally, next-generation therapeutics would target deregulated pathways to prevent fibroproliferation before it occurs, as this process is irreversible54. For example, STK16, a serine-threonine kinase that was robustly upregulated in uninvolved regions of IPF and CHP compared to bona fide normal controls, can be targeted with a selective inhibitor55. STK16 has also been shown to promote VEGF expression, so it may act mechanistically and temporally upstream of fibrotic tissue remodelling56.

In NSIP, a histologically more homogeneous condition, we identified clearly distinct clusters of gene expression between inflammatory and fibrotic regions. This finding may help explain the variability in clinical outcomes and response to immunomodulatory therapy observed among NSIP cases57, which probably depends on the relative prevalence of the inflammatory vs. fibrotic component. Another clinically relevant finding was the aberrant upregulation of B cell activation in NSIP inflammatory regions, which could be targeted by rituximab, a monoclonal antibody against CD20 expressed on B cells. Rituximab has in fact recently been shown to halt progression in patients with NSIP and NSIP-like patterns58.

Ribosomal genes were also upregulated, and cytoplasmic translation gene sets significantly enriched, in NSIP inflammatory regions. Ribosome biogenesis is linked to anti-viral, pro-inflammatory signaling via High Mobility Group Box 2 (HMGB2), a dsDNA-binding protein, in human cytomegalovirus-infected cells59. HMGB family proteins act as pro-inflammatory molecules once released from the nucleus upon cellular injury60,61. Intratracheal HMGB1 administration has been shown to induce acute lung inflammation in mice62. Furthermore, HMGB1 expression is elevated in patients with CHP and NSIP compared to control subjects63. Taken together, these observations warrant future studies to determine whether inflammation in CHP and NSIP can be further targeted by inactivating this pathway.

In CHP, a histologically relatively heterogeneous condition, the transcriptional overlap between regions of airway inflammation, granulomas and fibrosis was an unexpected finding. De Sadeleer et al. characterized gene sets that were up- or downregulated in fibrotic HP explant lungs32. In line with this work, we also found that AGER, EMP2 and VEGFA were significantly downregulated in fibrotic regions compared to uninvolved regions. Lastly, the striking similarity between fibrotic regions of CHP and IPF, another histologically heterogeneous fibrosing ILD, was a surprising finding that may help explain both the high rate of disease progression64 and the frequent detection of a coexisting usual interstitial pneumonia pattern (typical of IPF) among cases of CHP65.

The identification of the most pathobiologically relevant molecular markers of each fibrosing ILD, derived from DSP, may have a major impact on the existing diagnostic algorithm. For example, we found that NSIP fibrotic regions were transcriptionally distinct from fibrotic regions of IPF or CHP, based on clustering and the upregulation of several genes. We leveraged this to show that one of the unclassified cases was in fact similar to NSIP. We then used the uninvolved regions of IPF and CHP to show that the 2 remaining unclassified cases were likely CHP. Thus, a DSP decision tree could potentially be used to categorize unclassified cases as IPF, CHP or NSIP, based on spatially resolved gene expression signatures from relevant LMEs. Unfortunately, we were not able to validate this scheme because no comparable NanoString dataset is publicly available. A similar study in an external cohort is warranted.
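
The two-step logic described above can be illustrated with a minimal sketch. This is not the procedure used in this study: it assumes, hypothetically, that each case is summarized by one mean expression profile per LME over a shared gene set, and it uses Pearson correlation to class centroids as a stand-in similarity measure. The dictionary keys (e.g. `NSIP_fibrosis`), function names and toy values are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def centroid(profiles):
    """Gene-wise mean of several reference profiles (one list per case)."""
    return [sum(values) / len(values) for values in zip(*profiles)]


def classify_unclassified(fibrotic, uninvolved, reference):
    """Hypothetical two-step decision tree.

    Step 1: compare the fibrotic-region profile with NSIP fibrosis versus
    pooled IPF/CHP fibrosis. Step 2 (only if not NSIP-like): compare the
    uninvolved-region profile with IPF versus CHP uninvolved lung.
    """
    r_nsip = pearson(fibrotic, centroid(reference["NSIP_fibrosis"]))
    r_ipf_chp = pearson(
        fibrotic,
        centroid(reference["IPF_fibrosis"] + reference["CHP_fibrosis"]),
    )
    if r_nsip > r_ipf_chp:
        return "NSIP"
    r_ipf = pearson(uninvolved, centroid(reference["IPF_uninvolved"]))
    r_chp = pearson(uninvolved, centroid(reference["CHP_uninvolved"]))
    return "IPF" if r_ipf >= r_chp else "CHP"


# Toy log-expression profiles over four hypothetical genes.
reference = {
    "NSIP_fibrosis": [[5, 1, 1, 1], [6, 1, 2, 1]],
    "IPF_fibrosis": [[1, 5, 1, 1], [1, 6, 1, 2]],
    "CHP_fibrosis": [[1, 5, 2, 1]],
    "IPF_uninvolved": [[1, 1, 5, 1]],
    "CHP_uninvolved": [[1, 1, 1, 5]],
}
print(classify_unclassified([5, 1, 1, 2], [1, 1, 5, 1], reference))  # NSIP
print(classify_unclassified([1, 5, 1, 1], [1, 1, 1, 6], reference))  # CHP
```

A real implementation would need many more genes, a principled choice of LME-specific signatures, and validation in an independent cohort, as noted above.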

Pathologic hallmark LMEs of each fibrosing ILD showed distinct gene signatures with DSP, but these could not be reproduced in matched bulk tissues from the same SLBs. Gene signatures to diagnose IPF have been proposed in smaller transbronchial biopsies (TBBs)66, but TBBs introduce a level of sampling variability that is difficult to control. As a result, relevant LMEs may be substantially underrepresented in individual TBB samples. It is therefore unsurprising that such bulk tissue-derived gene signatures did not predict disease progression67. In the future, spatial transcriptomics could be used to assess and validate how well relevant LMEs are represented in smaller biopsies.

Comparative gene expression data from fibrosing ILDs have been reported, but without spatial resolution22. The lack of reproducibility between gene expression based on NanoString DSP and matched bulk tissue data underscores the importance of spatial resolution. Others have characterized gene expression with spatial resolution, but only within IPF53 or with a custom panel68. Strengths of our study include a broader spectrum of fibrosing ILDs consisting entirely of treatment-naive cases, the inclusion of previously unclassifiable cases and normal controls, and the simultaneous comparison with bulk tissues from patient-matched cases. Our study design allowed us to remove advanced disease severity and treatment as confounding variables, both of which are known to affect gene expression13,69,70. Because of these inclusion criteria, our biological sample size is relatively small. In addition, the GeoMx platform prevented us from capturing contiguous areas, requiring instead discrete ROIs that are often spatially distant from one another. Finally, our study is not easily comparable to previous transcriptomic studies that combined profiling with specific cell enrichment approaches71,72,73.

In conclusion, we provide a detailed catalog of gene signatures of the most biologically relevant LMEs of fibrosing ILDs. DSP provides an unparalleled level of insight into the spatial transcriptomics of fibrosing ILDs, which bulk tissue analysis cannot match. We provide a rationale to further investigate a DSP-based procedure to reclassify previously unclassified cases, and to assess the transcriptional representativeness of small biopsies for clinical use in differential diagnosis. Finally, spatial transcriptomics may be used to identify new therapeutic targets in normal-looking but transcriptionally deregulated regions before they become fibrotic, thereby potentially preventing further loss of lung function.

Methods

Case selection

Treatment-naïve cases of IPF, NSIP, CHP and unclassifiable ILD (3 cases for each condition) were selected. All 12 patients underwent SLB for diagnostic purposes and were discussed at MDD meetings, where the final diagnosis was reached. Three normal control samples were obtained from patients with traumatic lung injury and no underlying lung disease. No patient had clinical or serologic features of autoimmune disease, and no patient received immunosuppressive or anti-fibrotic treatment prior to SLB (Table 1, Supplementary Table S1). Informed, written consent was obtained from each patient. To ensure high-quality outcome data, post-biopsy prospective follow-up data were systematically collected according to our registered observational study (NCT03836417). The study was approved by the Research Ethics Board of Western University (113419 and 123121). All methods were performed in accordance with relevant guidelines and regulations.

Digital spatial RNA profiling

The GeoMx Digital Spatial Profiler (DSP; NanoString®, Seattle, WA) was used to select regions from formalin-fixed, paraffin-embedded tissue sections at the University of Minnesota Genomics Center. Tissue sections were first stained with hematoxylin and eosin, and ROIs were pre-selected (Appendix). For each case, four types of ROIs corresponding to the LMEs of interest were chosen, according to condition:

IPF: 3 fibroblastic foci; 3 regions of fibrosis; 3 regions of lymphoid aggregates; 3 regions of uninvolved lung.

NSIP: 3 regions of peripheral fibrosis; 3 regions of central fibrosis; 3 regions of inflammation; 3 airways.

CHP: 3 granulomas; 3 regions of airway inflammation; 3 regions of fibrosis; 3 regions of uninvolved lung.

Unclassifiable: 3 regions of fibroblasts; 3 regions of fibrosis; 3 lymphoid aggregates; 3 regions of uninvolved lung. For one of the three cases, only two regions of uninvolved lung were sampled.

Normal controls: 3 central regions; 3 peripheral regions; 3 airways; 3 pleural regions.

The relationship between conditions and annotations is summarized in Fig. 1e.
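
As a quick consistency check, the sampling design above can be encoded as a small data structure and tallied. This sketch assumes three cases per condition (as stated under Case selection) and reads the note on unclassifiable ILD as one case contributing one fewer uninvolved-lung ROI. The counts are planned ROIs before quality control, and the variable and function names are illustrative only.

```python
# Planned ROIs per case, by condition and LME annotation (from the list above).
ROI_DESIGN = {
    "IPF": {"fibroblastic foci": 3, "fibrosis": 3, "lymphoid aggregates": 3, "uninvolved": 3},
    "NSIP": {"peripheral fibrosis": 3, "central fibrosis": 3, "inflammation": 3, "airway": 3},
    "CHP": {"granuloma": 3, "airway inflammation": 3, "fibrosis": 3, "uninvolved": 3},
    "Unclassifiable": {"fibroblasts": 3, "fibrosis": 3, "lymphoid aggregates": 3, "uninvolved": 3},
    "Normal": {"central": 3, "peripheral": 3, "airway": 3, "pleura": 3},
}
CASES_PER_CONDITION = 3
# Assumed: one unclassifiable case had only two uninvolved-lung ROIs.
SHORTFALLS = {("Unclassifiable", "uninvolved"): 1}


def planned_roi_count(design, cases_per_condition, shortfalls):
    """Total ROIs = (sum of per-case ROIs) x cases, minus documented shortfalls."""
    per_case_total = sum(sum(annotations.values()) for annotations in design.values())
    return per_case_total * cases_per_condition - sum(shortfalls.values())


print(planned_roi_count(ROI_DESIGN, CASES_PER_CONDITION, SHORTFALLS))  # 179
```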

Separate sections underwent immunofluorescence staining with morphology markers (DNA SYTO 13, pan-cytokeratin Alexa Fluor 532, alpha-smooth muscle actin PE-Cy5, and CD45 Alexa Fluor 594), combined with in situ RNA hybridization probes conjugated to barcoding oligonucleotides via a photocleavable linker (Fig. 1c–g). The GeoMx Human Whole Transcriptome Atlas (GeoMx Hu WTA, NanoString®) was used as the probe set; it is designed for comprehensive quantification of 18,677 genes. ROIs were identified by morphology markers and then sequentially exposed to UV light to cleave the barcoding oligonucleotides from the RNA probes. The released oligonucleotides were rapidly aspirated using a microcapillary without touching the sample and deposited into wells of a microtiter plate, and each well was indexed to its ROI on the tissue. The oligonucleotides were hybridized to NanoString® barcodes and sequenced on a NovaSeq 6000 at the McGill Genome Center.

DSP analysis

Probe count and sample metadata files were obtained from GeoMx DSP Control Center v3.0.0.113 (https://geomx.genomics.umn.edu). The standR package41 was used for data exploration and normalization in RStudio running R (v4.2.2). Briefly, the ‘readGeoMx’ command was used to create a SpatialExperiment object. ROIs with sequence alignment or sequencing saturation below 90% were removed (Supplementary Fig. 1b). The ‘plotRLExpr’ and ‘drawPCA’ commands were used to generate relative log expression (RLE) and PCA plots, respectively (Supplementary Fig. 1). Different normalization schemes were compared using the ‘geomxNorm’ command. Batch correction for patient was performed with RUV4 using ‘geomxBatchCorrection’42. Negative control genes were identified with ‘findNCGs’, and the number of factors of unwanted variation (k) was set to 4.
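
The ROI filtering and RLE diagnostics are performed in R with standR; the following Python sketch only illustrates the underlying arithmetic. It assumes that an ROI is dropped when either its alignment or its sequencing saturation falls below 90%, and it computes RLE as each gene's log expression minus that gene's median across ROIs. The field names `aligned_pct` and `saturation_pct` are hypothetical and do not correspond to standR column names.

```python
from statistics import median


def qc_filter(roi_metrics, threshold=90.0):
    """Keep ROIs whose alignment AND saturation are both >= threshold (%)."""
    return {
        roi: m for roi, m in roi_metrics.items()
        if m["aligned_pct"] >= threshold and m["saturation_pct"] >= threshold
    }


def relative_log_expression(log_expr):
    """RLE: subtract each gene's median (across ROIs) from its log expression.

    log_expr maps ROI name -> list of log expression values in a shared gene order.
    Well-normalized ROIs should have RLE distributions centred near zero.
    """
    rois = list(log_expr)
    n_genes = len(log_expr[rois[0]])
    gene_medians = [median(log_expr[r][g] for r in rois) for g in range(n_genes)]
    return {r: [log_expr[r][g] - gene_medians[g] for g in range(n_genes)] for r in rois}


metrics = {
    "ROI_1": {"aligned_pct": 95.2, "saturation_pct": 92.0},
    "ROI_2": {"aligned_pct": 88.1, "saturation_pct": 96.5},  # fails alignment
}
print(sorted(qc_filter(metrics)))  # ['ROI_1']
print(relative_log_expression({"a": [1.0, 2.0], "b": [3.0, 2.0], "c": [2.0, 5.0]}))
```

In an RLE plot, one box per ROI is drawn from these centred values; ROIs whose boxes sit far from zero or are unusually wide point to residual technical variation.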

For differential gene expression analysis, the limma-voom method was used43,44. The R script is available at https://github.com/kimsjune/ILD_DSP_2023. The design matrix was constructed as ~ 0 + anno_type + Sex + batch + ruv_W1 + ruv_W2 + ruv_W3 + ruv_W4, where anno_type encodes annotation and disease condition as a single grouped factor, and ruv_W1 to ruv_W4 are the 4 factors of unwanted variation estimated by ‘geomxBatchCorrection’, included as covariates. ROIs were considered biological replicates. Because comparisons were made both between and within patients, patient was treated as a random effect using ‘duplicateCorrelation’. The ‘makeContrasts’ command was used to define the contrasts between annotations from which differentially expressed genes were called. Top-ranking genes (by F statistic) shown in heatmaps were extracted from the fitted models with the ‘topTable’ command, using log2 fold change and Benjamini–Hochberg adjusted P value cutoffs of 1 and 0.05, respectively. Pairwise comparisons shown in volcano plots were performed similarly. ggplot2 (v3.4.4) and ComplexHeatmap (v2.15.4)74 were used to create heatmaps and volcano plots. The enrichplot and clusterProfiler packages and the Gene Ontology biological process database were used to identify over-represented gene sets and to display the output as dot plots and network plots46,75,76.
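
In the analysis itself, this filtering is done by limma's ‘topTable’. To illustrate the thresholds only, the sketch below applies the Benjamini–Hochberg step-up adjustment to a vector of raw P values and then keeps genes with |log2 fold change| ≥ 1 and adjusted P ≤ 0.05 for a single contrast. Gene names and values are invented, and the inclusive boundary handling (≥ and ≤) is an assumption.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, intended to mirror R's p.adjust(method = "BH")."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity with a running minimum.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_de_genes(genes, log_fc, pvalues, lfc=1.0, alpha=0.05):
    """Return genes passing |log2 FC| >= lfc and BH-adjusted P <= alpha."""
    adjusted = benjamini_hochberg(pvalues)
    return [
        gene for gene, fc, q in zip(genes, log_fc, adjusted)
        if abs(fc) >= lfc and q <= alpha
    ]


genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
log_fc = [2.0, -1.5, 0.5, 3.0]
pvalues = [0.001, 0.01, 0.001, 0.2]
print(call_de_genes(genes, log_fc, pvalues))  # ['GENE_A', 'GENE_B']
```

GENE_C is significant but fails the fold-change cutoff, while GENE_D has a large fold change but fails the adjusted P cutoff, which is why both criteria are applied together.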

Spatial deconvolution

We used ‘SpatialDecon’ to estimate the proportions of cell types in each ROI and their impact on differential gene expression45. Briefly, background noise was estimated from the 139 “NegProbe-WTX” negative control probes and the RUV4-normalized data using ‘derive_GeoMx_background’. A cell profile library generated from scRNA-seq data of healthy adult lungs was used77. To simplify the output, the 21 annotated cell types were binned into 10 groups (Supplementary Table S18). Cell abundance estimates were obtained with ‘spatialDecon’. For each comparison between ROIs, the corresponding subset of the normalized data and cell abundance estimates was analyzed with ‘reverseDecon’. For each gene, the correlation between predicted and observed expression was plotted against the standard deviation of the residuals to assess how much of the gene's expression is explained by cell composition.
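
The per-gene diagnostics described above can be sketched as follows. This is a simplified stand-in for ‘reverseDecon’, not its implementation: it fits each gene's expression across ROIs by ordinary least squares on the estimated cell-group abundances (plus an intercept), then reports the correlation between fitted and observed values and the standard deviation of the residuals. SpatialDecon's own model and data transformations may differ, and the toy data are invented.

```python
import numpy as np


def reverse_decon_metrics(expr, abundance):
    """Per-gene fit of expression on cell-group abundances.

    expr: genes x ROIs array of normalized expression.
    abundance: cell groups x ROIs array of estimated abundances
    (needs more ROIs than cell groups + 1 for a meaningful fit).
    Returns (correlation of fitted vs observed, SD of residuals), one value per gene.
    """
    n_rois = expr.shape[1]
    # Design matrix: intercept column plus one column per cell group (ROIs x (1 + groups)).
    design = np.column_stack([np.ones(n_rois), abundance.T])
    # Solve for all genes at once; coefs has shape (1 + groups) x genes.
    coefs, *_ = np.linalg.lstsq(design, expr.T, rcond=None)
    fitted = (design @ coefs).T
    residuals = expr - fitted
    cors = np.array([np.corrcoef(fitted[g], expr[g])[0, 1] for g in range(expr.shape[0])])
    resid_sd = residuals.std(axis=1, ddof=1)
    return cors, resid_sd


abundance = np.array([[0.1, 0.3, 0.5, 0.7, 0.9]])  # one cell group, five ROIs
expr = np.array([
    2.0 + 10.0 * abundance[0],  # fully explained by composition
    [1.0, 3.0, 2.0, 5.0, 4.0],  # only partly explained
])
cors, resid_sd = reverse_decon_metrics(expr, abundance)
print(cors.round(3), resid_sd.round(3))
```

Genes in the upper left of a correlation-versus-residual-SD plot (high correlation, low residual SD) are those whose differential expression may largely reflect differences in cell composition rather than cell-intrinsic regulation.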

Bulk RNA-seq processing and analysis

Of the 12 cases of fibrosing ILD analyzed with NanoString DSP, 7 had fresh-frozen whole lung tissue samples, obtained at the time of the same SLB, available for bulk RNA-seq analysis. Four additional samples, not matched to NanoString DSP, were also obtained at the time of SLB. Samples were immediately placed in RNAlater stabilization solution (AM7024, ThermoFisher) in the operating room, then flash-frozen in liquid nitrogen and stored at −80 °C until extraction. Thawed tissue was homogenized with a bead mill (10158-610, 10158-558, VWR). Total RNA was extracted with the Monarch Total RNA Miniprep kit with DNase I treatment (T2010S, New England Biolabs). TruSeq stranded mRNA library preparation and 100-bp paired-end sequencing were performed at the McGill Genome Center. Briefly, demultiplexed reads were aligned to hg38 with the GENCODE (v21) comprehensive annotation using STAR (v2.7.9a)78,79. Duplicate reads were removed with the ‘-F 1024’ option in samtools (v1.17)80. At least 15 million uniquely mapped reads were obtained for every sample. A count table was generated using HTSeq (v2.0.2)81. Nextflow was used to streamline these steps82. Differential gene expression analysis and visualization were performed with limma-voom and ggplot2 (v3.4.4), respectively, in RStudio running R (v4.2.2). The R script and design matrix are available at https://github.com/kimsjune/ILD_DSP_2023.
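
Three of the bulk RNA-seq steps reduce to simple arithmetic, sketched below for illustration only (the pipeline itself used samtools and limma-voom). The ‘-F 1024’ option excludes alignments whose SAM FLAG has bit 0x400 (PCR or optical duplicate) set; the depth check applies the 15-million-read threshold reported above; and voom works on log2 counts per million computed with a 0.5 pseudocount and a library size incremented by 1. The helper names are hypothetical, and `lib_size` is assumed to be the sample's (normalized) library size.

```python
from math import log2

DUPLICATE_FLAG = 0x400  # SAM FLAG bit 1024: read is a PCR or optical duplicate
MIN_UNIQUE_READS = 15_000_000  # depth threshold reported above


def drop_duplicates(flags):
    """Keep alignments whose FLAG lacks the duplicate bit (like samtools view -F 1024)."""
    return [flag for flag in flags if not (flag & DUPLICATE_FLAG)]


def passes_depth(n_uniquely_mapped, minimum=MIN_UNIQUE_READS):
    """True if a sample reaches the minimum number of uniquely mapped reads."""
    return n_uniquely_mapped >= minimum


def voom_log_cpm(count, lib_size):
    """log2 counts per million with voom's offsets: (count + 0.5) / (lib_size + 1) * 1e6."""
    return log2((count + 0.5) / (lib_size + 1) * 1e6)


print(drop_duplicates([99, 1123, 147, 1187]))  # [99, 147]
print(passes_depth(16_200_000))  # True
print(voom_log_cpm(7.5, 999_999))  # 8 CPM, i.e. approximately 3.0
```

The pseudocount keeps zero counts finite on the log scale, which is why voom adds 0.5 to each count before transforming.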

Statistical analysis of patient characteristics

ANOVA with Tukey’s multiple comparisons test (where applicable) was used to compare continuous clinical characteristics between ILD subtypes (Prism 10, GraphPad). Survival outcome and gender, as categorical variables, were compared using chi-squared tests on contingency tables.
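
For reference, the Pearson chi-squared statistic for an r × c contingency table compares observed counts with those expected under independence (row total × column total / grand total). The sketch below computes the uncorrected statistic and its degrees of freedom; the counts are invented, and Prism's handling of small tables (e.g. continuity corrections) is not reproduced here. A P value can then be obtained from the chi-squared distribution with `dof` degrees of freedom, for example with `scipy.stats.chi2.sf(stat, dof)`.

```python
def chi_squared_test_statistic(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: rows = ILD subtypes, columns = (alive, deceased) at follow-up.
table = [[3, 0], [2, 1], [1, 2], [3, 0]]
stat, dof = chi_squared_test_statistic(table)
print(round(stat, 3), dof)  # 4.889 3
```

With only a few patients per group, many expected counts are small, so the chi-squared approximation to the P value should be interpreted with caution.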