Genomic, transcriptomic, and immunogenomic landscape of over 1300 sarcomas of diverse histology subtypes

Soupir, Alex; Ospina, Oscar E.; Hampton, Oliver; Churchman, Michelle; Radmacher, Michael; Hedges, Dale; McKean, David; Agius, Phaedra; Zeeshan, Saman; Seligson, Nathan D.; Pollock, Raphael; Liebner, David; Chen, James L.; Tinoco, Gabriel; Salhia, Bodour; McCarter, Martin; Wilky, Breelyn A.; Miller, Benjamin J.; Cavnar, Michael J.; Groundland, John S.; Schneider, Bryan P.; Riedlinger, Gregory; Edge, Stephen B.; Moskaluk, Christopher A.; Cardona, Kenneth; Naqash, Abdul Rafeh; Gonzalez, Ricardo J.; Mullinax, John E.; Joyce, David M.; Binitie, Odion; Douglas Letson, G.; Naghavi, Arash O.; Druta, Mihaela; Reed, Damon R.; Siegel, Erin M.; Teer, Jamie K.; Fridley, Brooke L.; Brohl, Andrew S.

doi:10.1038/s41467-025-58678-6

Article
Open access
Published: 06 May 2025

Nature Communications volume 16, Article number: 4206 (2025)

Abstract

Given their rarity and diversity, a fundamental understanding of the genomic underpinnings for many sarcoma subtypes is still lacking. To better define the molecular landscape of this group of diseases, we perform matched whole exome sequencing and RNA sequencing on a cohort of 1340 sarcoma tumor specimens. We identify recurrent somatic mutations and observe an increased mutational burden in metastatic vs. primary samples (p < 0.001). We observe frequent copy number alterations including whole genome doubling, with this feature being more common in metastatic tumors (p = 0.026). Estimation of immune cell abundances followed by hierarchical clustering identifies five immune subtypes ranging from low to high and we observe inferior overall survival in immune deplete clusters compared to immune enriched (p < 0.01). Interestingly, GIST predominantly form a distinct “immune intermediate” cluster that is marked by a specific enrichment for NK cells (FDR < 0.01).

Introduction

The term sarcoma encompasses a diverse group of rare malignancies that arise from mesenchymal tissues. Though only accounting for approximately 1% of cancer incidence, there are more than 100 distinct sarcoma subtypes¹. These features of rarity and diversity pose significant challenges to both research efforts and the clinical management of this group of diseases. Historically, sarcomas have been grouped into broad categories (e.g. “soft tissue sarcoma”) despite highly divergent disease biology and clinical behavior amongst these diseases. Unfortunately, for most patients with advanced-stage sarcoma, treatment options are limited and non-curative. Cytotoxic anthracycline-based chemotherapy remains the standard first line systemic treatment for most sarcoma varieties².

In recent years, there has been an increasing emphasis on improving the precision of sarcoma diagnosis, treatment, and clinical management using molecular profiling. Clinical molecular profiling efforts have consistently demonstrated the ability to enhance or refine sarcoma diagnosis in a clinically meaningful percentage of patients and, in some cases, uncover opportunities for molecularly guided therapeutic strategies^3,4,5,6. For several of the more common sarcoma types, immunogenomic characterization might also help to select patients for checkpoint inhibitor therapy⁷, the subject of ongoing clinical study. For several of the rarest subtypes, there have been new diagnostic categorizations based largely on disease-defining genomic features and the resultant recognition of distinct clinical behaviors, for example round cell sarcomas with alternate non-EWS-ETS fusions^8,9.

Despite the widespread use of next generation sequencing (NGS) in both the lab and clinic, our fundamental molecular understanding of many sarcoma subtypes remains limited. Most initial comprehensive NGS efforts in adult sarcomas, such as The Cancer Genome Atlas (TCGA) project and cohort studies, have focused only on a handful of the more common sarcoma subtypes, and largely have characterized only primary tumor samples^10,11. Clinical molecular profiling efforts, while including broader diagnostic categories and tumor stages, are often limited to panel-based sequencing designed to capture molecular features relevant in common cancer types and therefore potentially less relevant to sarcomas^3,6. To further our understanding of the sarcoma molecular landscape across a broad group of sarcomas, we performed comprehensive profiling including whole exome sequencing and transcriptome sequencing on a cohort of >1,300 sarcomas, including both primary and metastatic tumor samples collected as part of the Oncology Research Information Exchange Network (ORIEN) collaborative. In addition to describing foundational genomic features and a landscape of gene expression patterns, we assessed the immunogenomic features of these tumors and the associated microenvironments captured by bulk sequencing and evaluated these immunogenomic features for prognostic relevance.

Results

A total of 1340 tumor samples from 1232 patients representing 42 different sarcoma subtypes were included in this analysis (Table 1). Of these, 1162 samples underwent tumor and germline whole exome sequencing (WES, Supplementary Table 1) and 974 samples underwent RNA sequencing (Supplementary Table 2), with 796 having data available from both. Patients with samples from the primary tumor (n = 971), a metastatic site (n = 225), or both (n = 36) were included in our cohort. Prior to disease-based clustering and analysis, we reclassified the diagnosis in 14 (1.0%) samples based on the presence of a pathognomonic driver fusion (Supplementary fig. 1; Supplementary Table 3).

Table 1 Cohort description

Somatic mutational analysis and tumor mutational burden

Somatic mutations were identified for all samples and were assessed for enrichment at the gene level per histology subtype. To provide an overview of the most recurrent and relevant findings, the top 10 most significantly mutated genes from each histology were intersected with the COSMIC Tier 1 Cancer Gene Census and used to create an Oncoplot (Fig. 1A, Supplementary Data 1). As expected, TP53 showed the highest mutation frequency across the entire cohort (18.2%). Additional known disease-specific mutational patterns were recapitulated, such as recurrent KIT and PDGFRA mutations in GIST.

**Fig. 1: COSMIC Tier 1 genes significantly mutated in sarcomas.**

Interestingly, the frequency of somatic mutations in these cancer-associated genes was significantly higher in metastatic tumors compared to primary tumors (median mutations in COSMIC Tier 1 genes 3 vs. 2, Wilcoxon Rank Sum p < 0.001). Notably, the higher mutational rate in known oncologic drivers in metastatic vs. primary samples was largely accounted for by an increase in frequency of the most common tumor suppressors TP53 (26% vs. 16%, Chi-squared p = 0.0007) and ATRX (8.9% vs 5.2%, Chi-squared p = 0.04) and not by an increase in frequency of driver kinase mutations such as KIT (11% vs. 12%, Chi-squared p = 0.88).

We observed a wide range of tumor mutational burden (TMB), both intra- and inter-histology, in our cohort (Fig. 1B). Undifferentiated pleomorphic sarcoma had the highest median mutations per MB (mut/MB; median = 1.3 mut/MB, range 0.06 to 28.2 mut/MB) amongst histologic types evaluated. Overall, we found that 98.3% of ORIEN samples demonstrated a low (< 5 mut/MB) or intermediate (>= 5 mut/MB and < 10 mut/MB) TMB (97.1% and 1.2%, respectively), which was not statistically different from the TCGA SARC cohort (94.6% and 2.1%, respectively, Fisher’s Exact p = 0.14; Fig. 1B). Average tumor mutational burden across the whole ORIEN cohort was 1.46 mut/MB, with a significantly higher burden observed in metastatic samples (1.88 mut/MB) compared to primary samples (1.34 mut/MB; Wilcoxon Rank Sum p < 0.001). For histology subtypes with more than 20 primary and more than 20 metastatic samples (GIST, Leiomyosarcoma, and Sarcoma, NOS), we compared TMB between disease sites. Leiomyosarcoma was the only one of the three histology subtypes to display a significant increase in mut/MB in metastatic disease (Wilcoxon Rank Sum p value = 0.022), while GIST (Wilcoxon Rank Sum p value = 0.065) and Sarcoma, NOS (Wilcoxon Rank Sum p value = 0.725) did not.

Microsatellite instability (MSI) was identified in 6 of 1162 WES samples (0.5%), including 4 sarcoma NOS, 1 leiomyosarcoma, and 1 malignant peripheral nerve sheath tumor. MSI tumors demonstrated a higher TMB (median = 11.0 mut/MB, range 1.1 to 56.5 mut/MB) than the overall cohort (median = 0.73 mut/MB, range 0 to 54.9 mut/MB). Non-silent mutations in mismatch repair (MMR) genes were identified in three of six MSI tumors, specifically in genes MLH3, MSH3, MSH6, PMS2, and RFC3.

Copy Number Variations and Whole-Genome Amplification

In many sarcoma subtypes, copy number variations (CNV) are thought to be the primary driving events, such as recurrent MDM2/CDK4 amplification in well-differentiated/de-differentiated liposarcoma (WD/DDLPS). To provide a landscape of CNV across our diverse cohort, we summarized CNV by chromosome arm with length-weighted average, plotted by sarcoma subtype and primary/metastatic status (Fig. 2). Known histology-specific arm changes, including recurrent 12q gain in WD/DDLPS and recurrent loss of 14q in gastrointestinal stromal tumor (GIST), were readily identified. Overall, chromosome/segment level copy number gains were more frequent than losses (segment length with “gain” greater than segment length with “loss” in 860 of 1162 samples; 74%). Whole genome doubling (WGD) was a common feature across many sarcoma subtypes and this feature was more frequent in metastatic samples as compared to primary samples for both the entire study cohort (23.4% vs. 17.0%, Chi-squared p = 0.026) as well as for GIST, specifically (22.5% vs. 9.15%, Chi-squared p = 0.038). Furthermore, several COSMIC Tier 1 genes showed patterns consistent with recurrent arm losses at the histology subtype level, such as heterozygous losses of PDGFB and FOXA1 in GIST samples (Supplementary fig. 2).

**Fig. 2: Copy number variation landscape of sarcomas.**

Transcriptional Landscape and Disease Clustering

Transcriptome sequencing was assessed for global expression patterns across sarcoma subtypes. As expected, assessment of gene expression similarities via dimension reduction (UMAP) showed that tumors assigned to the same histology tended to be closer in the multidimensional embedding, providing further support to the histological classification of the samples (Fig. 3A). We chose initially to focus on several of the more common histologies of our dataset including leiomyosarcoma, GIST, myxoid liposarcoma, and dedifferentiated and well-differentiated liposarcomas (WD/DDLPS) that formed visually distinct expression subgroups (Fig. 3B). We also noted that fusion-driven sarcoma subtypes including Ewing sarcoma, synovial sarcoma, desmoplastic small round cell tumor (DSRCT), and solitary fibrous tumors comprised transcriptomically distinct expression subgroups amongst this study cohort (Fig. 3A, Supplementary Fig. 3).

**Fig. 3: The gene expression landscape of sarcomas.**

To further probe the expression differences driving these subgroups, we performed differential expression analysis with limma¹² comparing each of these divergent subtypes to the remaining samples (Fig. 3C, Supplementary fig. 4, Supplementary Data 2). As expected, disease subtypes maintained higher expression levels of markers of the cell lineage of origin e.g. genes related to smooth muscle differentiation such as MYOCD, TAGLN, JPH2, and CSRP1 amongst the top differentially expressed genes in leiomyosarcoma and markers of adipogenesis in WD/DDLPS including HMGA2 and CIDEC. Known subtype-specific oncologic drivers were also amongst the top False Discovery Rate (FDR)-corrected differentially expressed genes, including KIT in GIST and MDM2 in WD/DDLPS.

Gene set enrichment analysis was performed using the differentially expressed genes in these histology subtypes (Fig. 3D). In GIST, the top upregulated pathways were IL2/STAT5 signaling and oxidative phosphorylation. Interestingly, in addition to adipogenesis and fatty acid metabolic pathways, DDLPS was highly enriched for multiple inflammatory pathways, including JAK/STAT3 signaling and TNFα signaling via NFκB. As expected, the leiomyosarcoma group was enriched for myogenesis but also several cell cycle related pathways as well as mTORC1 signaling. Myxoid liposarcoma was marked by a relative negative enrichment of most oncologic signaling pathways evaluated, highlighting the expected unique oncogenic program of a transcription factor fusion-driven malignancy.

Immunogenomic Profiling

To characterize the immune microenvironment across our diverse sarcoma cohort, we estimated the immune infiltration based on gene expression profiling of the bulk RNA sequencing followed by clustering analysis using these immune infiltration estimates from MCPCounter¹³. This analysis yielded five distinct immune groups (Fig. 4A) that exhibited a gradient of immune infiltration ranging from low immune cell enrichment or “cold” (Groups A and B) to high immune cell enrichment or “hot” (Groups D and E). Strikingly, the immune “intermediate” group C was almost exclusively composed of GIST (Fig. 4A, B). The enrichment scores for all 10 evaluated cell types were significantly different across the five immune groups (Kruskal-Wallis test FDR < 0.001). For most immune cell subtypes, there was a consistent and progressive increase from groups A to E, including in T (median score -1.18 to 1.21) and B cells (-0.71 to 0.81), dendritic cells (-1.17 to 0.89), and cells of monocyte lineage (-1.22 to 1.10). One notable exception is that Group C showed a specific infiltration enrichment for NK cells (Wilcoxon Rank Sum test FDR < 0.001) compared to all other groups except the most immune “hot” Group E, despite modest or intermediate enrichment of other immune cell subtypes. Group B was marked by a higher abundance of endothelial cells and fibroblasts compared to groups A and C.

**Fig. 4: Sarcomas samples classified according to the level of immune infiltration.**

The distribution of tumors amongst immune subgroups varied by histology (Fig. 4B). Notably, angiosarcoma and undifferentiated pleomorphic sarcoma (UPS), diseases that have been reported to be more responsive to checkpoint inhibitor therapy, clustered more predominantly in the immune hot groups D and E. Within individual sarcoma subtypes, metastatic vs. primary status did not significantly associate with a difference in the distribution of immune subgrouping, with the exception of dedifferentiated liposarcoma and synovial sarcoma. In DDLPS, metastatic samples were depleted of Group E tumors, and in synovial sarcoma, primary samples were predominantly assigned to Group A (Fisher’s test p < 0.05) (Fig. 4B, Supplementary fig. 5). In sarcomas with high immune infiltration (Group E) and with high fibroblast content (Group B), there was a trend towards higher TMB (median log(TMB + 1): 0.44 and 0.50, respectively) compared to other immune groups (log(TMB + 1): 0.39 – 0.43), but the difference was not statistically significant (Kruskal-Wallis test p = 0.58).

We performed differential expression analysis between immune subgroup A/B (“cold”), C (“intermediate/GIST”) and D/E (“hot”). Tumors in groups D and E (“hot”) showed upregulation of immune relevant genes PTPRC, IL2RG, CD53, CCL13, TRAV12-1, HLA genes, and multiple immunoglobulins (Fig. 4A and Supplementary Data 3). In contrast, we observed in Group A and B tumors upregulation of genes predominantly involved in tumor proliferation, such as C1QL4, PRAME, KLHDC8A, and CA9 (Fig. 4A). Gene set enrichment analysis (Fig. 4C) confirmed the inverse relationship between gene sets related to cell cycle/proliferation (consistently enriched in groups A/B) and those related to immune signaling and response (consistently enriched in groups D/E). We further identified enrichment of signaling pathways potentially contributing to the observed immunological clustering, including upregulation of MYC targets and Hedgehog signaling and downregulation of KRAS signaling in the immune cold groups A/B.

Differentially expressed genes in Group C were driven mostly by histology, predominantly resulting in genes also upregulated in the GIST histology. Hence, we tested for differentially expressed genes only among GIST specimens belonging to the various immune subgroups (Supplementary Data 4). Similar to the patterns observed across the entire study cohort, GISTs that cluster in groups A and B showed enriched expression of cell proliferation and apoptosis-related genes such as TYRO3 and CSRNP3, whereas GISTs that cluster in Groups D and E exhibited upregulation of genes involved in immune regulation and/or inflammation (e.g., CSF3R, ANXA1, ALOX5, ITPR3, CXCL1, CXCL3, ZEB2). In group C GIST tumors, we detected upregulated expression of immune-related genes associated with NK cells, including KLRB1, KLRC1, KLRC3, KLRF1, KLRK1, GZMA, GZMK, NCR1, NKG7.

Using matched clinical data, we performed survival analysis to evaluate the prognostic impact of immune subgroups. This analysis revealed a significant difference between groups in overall survival, whether considered for the entire cohort (Fig. 4D, Likelihood Ratio Test [LRT] p < 0.01) or stratified by primary/metastatic status (Supplementary fig. 6, LRT p < 0.01). Patients with tumors assigned to Group C had higher overall survival, which was expected given the predominance of GIST in this group. GIST has better expected survival than most other sarcoma subtypes, owing both to the availability of highly effective tyrosine kinase inhibitor therapy and to the excellent expected survival of the early-stage tumors in the cohort that were treated with curative surgery. Patients with tumors in the immune hot Groups D and E had higher overall survival compared to those with tumors in immune cold Groups A and B. After excluding GISTs from the survival analysis, the same pattern was observed, although it did not reach statistical significance (Supplementary fig. 6).

Discussion

In this study we analyze the mutational, transcriptomic and immunogenomic landscape of over 1,300 adult sarcomas of diverse histologic representation. The molecular profiling generated from this work provides a rich resource, expanding upon prior foundational work such as TCGA¹⁰ towards a more comprehensive molecular profiling of this set of diseases. We provide a broad scope of findings to further our molecular understanding of previously understudied sarcoma subtypes, perform cross-sarcoma comparisons to highlight the divergent biology inherent to sarcoma, and generate novel insights into the underpinnings of these diseases.

An overarching theme of our analysis is the observation of differences in molecular findings between primary and metastatic tumor samples. We observe a modest increase in mutational burden in metastatic samples that is primarily driven by an increase in inactivating mutations of tumor suppressor genes rather than an accumulation of additional oncogene drivers. Whole genome doubling (WGD), a known frequent copy number variation pattern in sarcoma, has previously been proposed as a potential negative prognostic feature⁶, though prior analyses have been limited by small sample sizes with matched clinical outcomes, particularly after subgrouping. Concordantly, we find WGD more frequently in metastatic versus primary samples in our cohort. Interestingly, we found this difference most pronounced in GIST, a tumor type primarily driven by activating kinase mutations, suggesting that CNV and WGD specifically might be an important source of secondary mutation leading to tumor progression in this disease.

Transcriptomic profiling of our cohort of sarcomas highlights the diversity of this group of diseases. As expected, sarcoma subtypes cluster predominantly by disease. The sarcomas that form the most distinct transcriptional clusters include those known to be driven by a transcription factor fusion such as Ewing sarcoma and myxoid liposarcoma. Comparing the expressional landscape of one sarcoma subtype to others may help to identify disease specific pathways of interest for further testing and therapeutic development, such as the identification of mTORC1 upregulation in leiomyosarcoma that was identified in this work and is already a target for therapeutic development¹⁴. To our knowledge this study represents the largest database of whole transcriptome sequencing for adult sarcomas reported in the literature to date and therefore a resource for hypothesis generating comparisons in additional sarcoma types beyond the scope of this initial landscape.

Success of immunotherapy in an increasing number of advanced solid tumors has led to a desire to better understand the tumor-immune microenvironment. In sarcoma, an immune subgroup schema has been proposed based on a combined analysis of RNA sequencing (from TCGA) and several microarray-based expression studies totaling 608 tumors and encompassing 6 sarcoma histologies⁷. Our work differs from this prior work in that it includes a greater diversity of sarcoma subtypes, is larger in sample size, and is restricted to the analysis of RNAseq. Concordant with this prior work, we find that immune cell enriched sarcoma clusters have a more favorable prognosis than immune cell deplete sarcoma clusters. Intriguingly, in our analysis, we find that most GIST form a unique cluster that has intermediate levels of immune cell enrichment overall but is specifically enriched for NK cell infiltrate. To our knowledge this specific association of GIST with NK cell infiltrate relative to other sarcoma subtypes has not been previously reported or studied, though NK cells have been reported to be relatively abundant in GIST based on prior transcriptomics analyses^15,16 as well as by immunohistochemistry¹⁷. NK cell activity in GIST has also been shown to be predictive of tyrosine kinase inhibitor efficacy and patient survival^17,18. Further study is indicated to assess the clinical relevance of our immune subtyping schema, compare and/or combine our data with prior datasets for refinement of sarcoma immune subtyping, and confirm immune associations identified in this study such as that between NK cell infiltrate and GIST.

We acknowledge several important limitations of our study. The diversity of our cohort, while overall a strength, equates to having small sample sizes (≤ 5 tumors each) for many of the less common disease subtypes. For the purposes of this initial landscape analysis, we bluntly categorized these lower frequency sarcomas into the category of “other,” which limits disease-specific insights for these rarer sarcomas. Additionally, while both primary and metastatic samples were available in this study, very few of these were matched samples from the same patient. Finally, extended clinical data including therapeutic outcomes were still being curated and therefore not available for this analysis, limiting the ability to assess for molecular determinants of response. Despite these limitations, we believe that our study has many strengths and fills an important gap in molecular profiling efforts in sarcoma.

In conclusion, we report a landscape analysis of whole exome sequencing and RNA sequencing performed on >1300 diverse sarcomas, representing one of the largest molecular profiling efforts in this disease group. We identify the mutational and copy number variation landscape and observe differences between primary and metastatic samples. We highlight expression pathways that are enriched in histologic subtypes that cluster most distinctly from others, providing a subtype-specific roadmap for further translational efforts. Finally, we define immune enriched or depleted sarcoma subgroupings that carry a prognostic impact.

Methods

Patient Population and Study Design

The study was approved by the Institutional Review Board (IRB) at Moffitt Cancer Center (Advarra, IRB#00000971, Pro00048786). The Oncology Research Information Exchange Network (ORIEN) is an alliance of 18 U.S. cancer centers established in 2014. All ORIEN alliance members utilize a standard Total Cancer Care® (TCC) protocol that is approved by their local IRB. As part of the TCC study, participants provide written informed consent to have their clinical data followed over time, to undergo germline and tumor sequencing, and to be contacted in the future by their provider if an appropriate clinical trial or other study becomes available. TCC is a prospective cohort study with a subset of patients enrolled to the ORIEN Avatar program, which includes research use only (RUO) grade whole-exome tumor sequencing, RNA sequencing, germline sequencing, and collection of deep longitudinal clinical data with lifetime follow up. Aster Insights, the commercial and operational partner of ORIEN, harmonizes all abstracted clinical data elements and molecular sequencing files into a standardized, structured format to enable aggregation of de-identified data for sharing across the Network. A total of 1242 ORIEN Avatar patients who were diagnosed with sarcoma and consented to the TCC protocol at 13 participating members of ORIEN were included in this study. No sex and/or gender analysis was carried out as this was felt to be out of the scope of this report.

DNA/RNA Preparation and Sequencing

Tumor DNA was purified from frozen tissue and germline DNA was purified from blood using QIASymphony DNA purification kit (Qiagen; Venlo, The Netherlands), followed by ultrasonication to an average size of 213 bp (Covaris; Woburn, MA). Alternatively, tumor DNA was purified from formalin fixed paraffin embedded (FFPE) tissue using truXTRAC FFPE DNA purification kit (Covaris, Woburn, MA), generating an average size of 165 bp. RNA was purified from frozen tissue with RNeasy Plus Mini (Qiagen; Venlo, The Netherlands) to generate an average size of 216 nt; alternatively, RNA was purified from FFPE tissue using truXTRAC FFPE RNA ultrasonification kit (Covaris; Woburn, MA) generating an average size of 165 nt.

DNA libraries were generated, then captured on custom WES arrays (IDT; Coralville, IA (38.7 Mb), or Nimblegen; Madison, WI (63.38 Mb)), with custom probes designed to increase capture of up to 676 cancer-related genes. Libraries were sequenced (100 bp, paired end reads) on Illumina NovaSeq 6000 (Illumina, San Diego, CA) to a target depth of 100x (200x for cancer-related genes) for germline WES, and 300x (600x for cancer-related genes) for tumor WES. Quality checks were performed for gender identity matches. Minimum hybrid selection threshold was set at >80% of bases with >100x coverage for tumor samples and >50x coverage for normal samples. RNA libraries were generated using TruSeq RNA Exome adapters (Illumina; San Diego, CA), captured on exome array, and sequenced (100 bp, paired end reads) at Hudson Alpha (Huntsville, AL) or sequenced (150 bp, paired end reads) at Fulgent (Temple City, CA) to a target depth of 100 M reads (50 M pairs).

WES Quality Control and Alignment

Adapter sequences were trimmed with BBDuk (version 38.46; sourceforge.net/projects/bbmap/). WES samples were processed using the Sentieon App (sentieon_release_201911)¹⁹. Fastq files were aligned to human reference genome (GRCh38/hg38) using BWA-mem²⁰. Resulting cram files underwent extensive quality control checks using Picard (v2.9.0).

Identification of Somatic Mutations

Somatic single nucleotide variants (SNVs) and insertions/deletions (INDELs) were called with TNseq (Sentieon; San Jose, CA) using matched tumor and germline cram files. Resulting vcf files were annotated with Annovar²¹ and Funcotator (GATK v4.1.6.0; https://gatk.broadinstitute.org/hc/en-us/articles/360037224432-Funcotator).

Filtering Somatic Mutations and Calculation of Tumor Mutation Burden

Somatic variants were filtered for population polymorphisms and recurrent sequencing artifacts using the Aster Insights Panel of Normals (PoN). The Aster Insights PoN is constructed from the ORIEN AVATAR germline variant catalog and is applied by checking every sample’s reported somatic mutation against the catalog to add a VCF “panel_of_normals” flag. A germline variant is included in the PoN if it is present in > 0.5% of the ORIEN AVATAR population of unrelated normal samples. For a variant to pass all filters, it must be absent from the PoN, have a variant allele fraction (VAF) greater than 0.04 (4%), have at least 1 ALT read in each of the F1R2 and F2R1 read orientations, and have more than 10 ALT reads in total across F1R2 and F2R1.
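
As an illustration only, the pass/fail criteria described above can be sketched in Python. The `Variant` record and its field names are hypothetical and are not taken from the Aster Insights pipeline; the thresholds are the ones stated in the text.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical minimal record for one somatic call (illustrative field names)."""
    in_pon: bool    # True if the call carries the panel_of_normals flag
    vaf: float      # variant allele fraction, on a 0..1 scale
    alt_f1r2: int   # ALT-supporting reads in the F1R2 orientation
    alt_f2r1: int   # ALT-supporting reads in the F2R1 orientation


def passes_filters(v: Variant) -> bool:
    """Return True only if the variant meets every criterion stated in the text."""
    return (
        not v.in_pon                          # absent from the PoN
        and v.vaf > 0.04                      # VAF strictly greater than 4%
        and v.alt_f1r2 >= 1                   # at least 1 ALT read in F1R2
        and v.alt_f2r1 >= 1                   # at least 1 ALT read in F2R1
        and (v.alt_f1r2 + v.alt_f2r1) > 10    # more than 10 ALT reads in total
    )
```

Requiring support in both read orientations is a common guard against strand-specific artifacts, which is presumably why the two counts are checked separately before they are summed.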

Protein coding lengths (bp) for all genes in HG38 were calculated using a custom python script. For each collapsed histology subtype, these lengths were modeled with robust regression against the frequency of samples that have mutations in that gene. Robust regression was used due to its ability to better handle outliers²². Oncoplots were constructed with maftools, ignoring variant classification of ‘silent’ or ‘unknown’²³. Histology subtypes were collapsed to “Other” if they were composed of fewer than 5 samples. The top 10 most significant genes from robust regression were selected for each histology subtype and intersected with COSMIC Tier 1 genes to identify cancer-related genes²⁴. TMB was calculated with ‘tcgaCompare’, adjusting for WES kit capture size. Comparisons of mutation frequency between primary and metastatic samples were performed with two-tailed Wilcoxon Rank Sum tests. Tumor mutation burden was classified as low (< 5 mut/MB), intermediate (>= 5 mut/MB and < 10 mut/MB), and high (>= 10 mut/MB)²⁵ for ORIEN and TCGA SARC samples, then compared with two-tailed Fisher’s Exact tests because the Chi-squared test produced a warning about an expected value being less than 5.
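
The TMB categories above can be written out as a short Python sketch. It is a simplification: the function names are hypothetical, and the capture-size adjustment is assumed here to be a plain division of the mutation count by the capture size in Mb, which may not match every detail of ‘tcgaCompare’.

```python
def tmb_per_mb(n_mutations: int, capture_size_mb: float) -> float:
    """Mutations per megabase, assuming a simple division by capture size."""
    if capture_size_mb <= 0:
        raise ValueError("capture_size_mb must be positive")
    return n_mutations / capture_size_mb


def classify_tmb(mut_per_mb: float) -> str:
    """Assign the low / intermediate / high category using the stated cutoffs."""
    if mut_per_mb < 5:
        return "low"
    if mut_per_mb < 10:
        return "intermediate"
    return "high"
```

For example, `classify_tmb(tmb_per_mb(50, 38.7))` evaluates a sample with 50 mutations on the 38.7 Mb capture kit and falls in the low category.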

Copy Number Calling

Somatic Copy Number Variation (CNV) was calculated by Sequenza (v3.0.0, https://sequenzatools.bitbucket.io/#/home)²⁶. Files were generated on the alignments of tumor and germline reads by sequenza-utils and processed by CNTools v1.30.0 to generate the final output for copy number at the gene level²⁷. Copy number was classified into homozygous deletion (HOMDEL, 0), heterozygous loss (HETLOSS, 1), normal (Normal, 2), gain (GAIN, 3), or amplification (AMP, 4 or greater). Samples were determined to have whole genome doubling (WGD) if more than 50% of all segment lengths were classified as amplified (i.e. CN = 4 or greater). Differences between the number of primary and metastatic samples with WGD were assessed using two-tailed Chi-squared tests for all samples, as well as for the top 5 histology subtypes by sample number for consistency. Arm-level copy number change was calculated as the weighted average of copy number segments on each arm (i.e. longer segments contribute more to arm-level copy number) and then classified as outlined above.
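
A minimal Python sketch of the copy number rules above follows. The segment representation (start, end, copy number) and the function names are assumptions made for illustration; they do not reflect the Sequenza or CNTools output formats. How a non-integer arm-level average is mapped back to a category is not stated in the text, so the sketch returns the average without classifying it.

```python
from typing import List, Tuple

# Hypothetical segment representation: (start_bp, end_bp, integer copy number)
Segment = Tuple[int, int, int]


def classify_cn(cn: int) -> str:
    """Map an integer copy number to the categories listed in the text."""
    if cn <= 0:
        return "HOMDEL"
    if cn == 1:
        return "HETLOSS"
    if cn == 2:
        return "Normal"
    if cn == 3:
        return "GAIN"
    return "AMP"  # 4 or greater


def has_wgd(segments: List[Segment]) -> bool:
    """WGD call: more than 50% of total segment length has CN of 4 or greater."""
    total = sum(end - start for start, end, _ in segments)
    amplified = sum(end - start for start, end, cn in segments if cn >= 4)
    return total > 0 and amplified / total > 0.5


def arm_level_cn(segments: List[Segment]) -> float:
    """Length-weighted average copy number over the segments of one arm."""
    total = sum(end - start for start, end, _ in segments)
    if total == 0:
        raise ValueError("arm has no covered length")
    return sum((end - start) * cn for start, end, cn in segments) / total
```

Weighting by length means that a short focal amplification has little effect on the arm-level value, whereas a gain spanning most of the arm dominates it.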

Microsatellite Instability

Microsatellite instability was scored using MSIsensor 2 (Niu, et al. (2014)). Briefly, the median number of microsatellites assessed per tumor-normal pair was 21,648 (range: 9954–34,873), and MSI scores are reported as the number of unstable microsatellites divided by the total number of microsatellites assessed. For downstream analyses, a threshold of 18% of sites being somatic (somatic microsatellites divided by total number of microsatellites) was used to separate high MSI and stable MSI. TMB was filtered to include non-silent mutations while removing transcriptional start site and splice site mutations before calculating differences between high and stable MSI samples.
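
The MSI score and cutoff described above amount to a simple ratio and comparison, sketched below in Python. The function names are hypothetical, and whether a score of exactly 18% counts as high is not stated in the text; the sketch assumes it does.

```python
def msi_score(n_unstable: int, n_assessed: int) -> float:
    """Fraction of assessed microsatellite sites that are unstable (somatic)."""
    if n_assessed <= 0:
        raise ValueError("n_assessed must be positive")
    return n_unstable / n_assessed


def msi_status(score: float, threshold: float = 0.18) -> str:
    """Label a sample as high or stable; inclusive threshold is an assumption."""
    return "high" if score >= threshold else "stable"
```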

RNA-seq Quality Control and Alignment

Adapter sequences were trimmed with BBDuk (version 38.46; sourceforge.net/projects/bbmap/) and fastq files were aligned to the human reference genome (GRCh38/hg38 with Gencode transcriptome version 32) using STAR (v2.7.3a, https://github.com/alexdobin/STAR)²⁸. The resulting RNA-seq expression data were then normalized at the TPM level using a linear scaling method based on the trimmed mean among protein-coding genes (mean of TPMs between the 35th and 95th quantiles) of each sample, followed by log2 transformation of scaled TPMs after adding 0.001, loess smoothing, and ‘ComBat’ normalization (sva R package, v3.34.0), adjusting for batch effects induced by sample preservation methods and sequencing capture kit (Supplementary Fig. 7)²⁹. The resulting isoform-level counts were then collapsed to gene level by summing the transcripts originating from a given gene. Gene counts mapped to Y RNAs, Metazoa SRP, 5.8S and 5S rRNA, 7SK, Vault, snoRNAs, MIR, RNA5, and RNU were removed.
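
The first two normalization steps (trimmed-mean scaling and log2 transformation) can be illustrated with a short Python sketch. Several details are assumptions rather than the study's implementation: that "linear scaling" means dividing each TPM by the sample's trimmed mean, that the 35th and 95th quantile cutoffs are inclusive, and the function names. Loess smoothing and ComBat are omitted.

```python
import math
import statistics
from typing import List


def trimmed_mean(tpms: List[float], lower_pct: int = 35,
                 upper_pct: int = 95) -> float:
    """Mean of the values lying between two percentiles (inclusive)."""
    cuts = statistics.quantiles(tpms, n=100, method="inclusive")
    lo, hi = cuts[lower_pct - 1], cuts[upper_pct - 1]
    kept = [v for v in tpms if lo <= v <= hi]
    if not kept:
        raise ValueError("no values fall between the requested percentiles")
    return sum(kept) / len(kept)


def scale_and_log(tpms: List[float], offset: float = 0.001) -> List[float]:
    """Divide by the trimmed mean (assumed scaling), then log2(x + offset)."""
    center = trimmed_mean(tpms)
    if center <= 0:
        raise ValueError("trimmed mean must be positive to scale")
    return [math.log2(v / center + offset) for v in tpms]
```

In practice the input would be the protein-coding TPMs of a single sample; a gene expressed exactly at the trimmed mean maps to log2(1.001), which is close to zero.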

To ensure that repeated sampling from the same patient would not overly bias the results of downstream RNA-seq analyses, we repeated selected analyses after removing repeated measures (by randomly selecting 1 tumor per patient when >1 tumor was available for analysis). The results closely mirrored the primary analysis, supporting the inclusion of all available tumors (Supplementary Fig. 8).

RNA Gene Fusion Prediction and Identification of Histology Subtype Changing Fusions

The STAR-Fusion and Arriba gene fusion algorithms were applied to the STAR aligner output files. Gene fusion predictions from STAR-Fusion (v1.8.0, https://github.com/STAR-Fusion/STAR-Fusion/wiki)³⁰ and Arriba (v1.1.0, https://github.com/suhrig/arriba)³¹ were merged into a single output file after removing duplicate putative fusion calls, low-confidence calls (only fusions supported by at least one junction read and at least one spanning read were reported), fusions with multiple breakpoints, and calls occurring within the same gene, within snoRNAs, within rRNAs, or within mitochondrial genes, which are regions considered to contribute to a high false-positive rate and to be generally uninformative. Merged putative fusion calls were then coalesced to the most dominant gene isoform within each set of reported calls that either shared an identical breakpoint or overlapped one another.
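
The merging and filtering logic can be sketched in Python as below. This is an illustration under stated assumptions, not the study pipeline: the record keys (`gene5`, `gene3`, `breakpoint5`, `breakpoint3`, `junction_reads`, `spanning_reads`, `caller`) are hypothetical and do not correspond to the native STAR-Fusion or Arriba output columns, the excluded gene set is supplied by the user, and isoform coalescing and multi-breakpoint filtering are left out.

```python
from typing import Dict, Iterable, List, Set


def merge_fusion_calls(calls: Iterable[dict],
                       excluded_genes: Set[str]) -> List[dict]:
    """Merge calls from several callers into one de-duplicated list.

    A call is dropped if it lacks at least one junction read and one
    spanning read, if both partners are the same gene, or if either
    partner is in `excluded_genes` (e.g. rRNA, snoRNA, mitochondrial).
    """
    merged: Dict[tuple, dict] = {}
    for call in calls:
        gene5, gene3 = call["gene5"], call["gene3"]
        if call["junction_reads"] < 1 or call["spanning_reads"] < 1:
            continue  # insufficient read support
        if gene5 == gene3:
            continue  # intragenic call
        if gene5 in excluded_genes or gene3 in excluded_genes:
            continue  # uninformative gene class
        key = (gene5, gene3, call["breakpoint5"], call["breakpoint3"])
        entry = merged.setdefault(key, {
            "gene5": gene5,
            "gene3": gene3,
            "breakpoint5": call["breakpoint5"],
            "breakpoint3": call["breakpoint3"],
            "callers": set(),
        })
        entry["callers"].add(call["caller"])  # record supporting callers
    return list(merged.values())
```

Keying on gene pair plus breakpoints means that the same event reported by both callers yields a single record whose `callers` field lists both tools.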

Fusions associated with histology reassignment were derived from Gounder et al.³ These were then intersected with our fusion calls, ignoring donor-acceptor gene order, to identify candidate samples in need of histology reassignment. The list of samples, along with clinical attributes, was then reviewed to determine whether a change in histology assignment was appropriate given fusion confidence and disease location.
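
Matching fusions while ignoring donor-acceptor order can be done by comparing unordered gene pairs, as in the following Python sketch. The function name and the two-entry reference table in the usage note are illustrative only and are not the list derived from Gounder et al.

```python
from typing import Dict, Iterable, List, Tuple

GenePair = Tuple[str, str]


def find_reassignment_candidates(
    sample_fusions: Dict[str, Iterable[GenePair]],
    reference: Dict[GenePair, str],
) -> Dict[str, List[str]]:
    """Return, per sample, the histologies suggested by its fusion calls.

    Gene pairs are compared as frozensets, so A-B and B-A are treated
    as the same fusion regardless of donor-acceptor order.
    """
    unordered_ref = {frozenset(pair): hist for pair, hist in reference.items()}
    candidates: Dict[str, List[str]] = {}
    for sample, fusions in sample_fusions.items():
        hits = {unordered_ref[frozenset(f)]
                for f in fusions if frozenset(f) in unordered_ref}
        if hits:
            candidates[sample] = sorted(hits)
    return candidates
```

With an illustrative reference of `{("EWSR1", "FLI1"): "Ewing sarcoma", ("SS18", "SSX1"): "Synovial sarcoma"}`, a sample carrying a call reported as `("FLI1", "EWSR1")` would still be flagged for review as a possible Ewing sarcoma. Flagged samples are candidates only; as described above, the final decision rested on manual review.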

Assessment of Gene Expression and Immune Infiltration

After batch correction and normalization procedures, we applied uniform manifold approximation and projection (UMAP) using the expression from the 10% of genes with the highest standard deviation to explore the gene expression similarities among sarcoma samples. The umap R package was used with spread=2 and random_state=12345. Given the separation of Leiomyosarcomas, Gastrointestinal stromal tumors (GIST), Myxoid liposarcomas, and liposarcomas (well- and dedifferentiated), the determination of differentially expressed genes between each of those sarcoma groups and the rest of the samples was completed using linear models as implemented in the limma package¹². The false discovery rate (FDR) correction was used to adjust for multiple comparisons. Adjusted p-values were used in the fgsea package to calculate enrichment scores for the Hallmark gene sets³².
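
Selecting the most variable genes before dimensionality reduction can be sketched in Python as follows. The function name and the dictionary layout (gene name mapped to a list of per-sample expression values) are hypothetical, and the UMAP step itself is not reproduced here.

```python
import statistics
from typing import Dict, List


def top_variable_genes(expression: Dict[str, List[float]],
                       fraction: float = 0.10) -> List[str]:
    """Return the genes with the highest standard deviation across samples.

    `fraction` is the share of genes to keep (10% in the text); at least
    one gene is always returned.
    """
    sds = {gene: statistics.stdev(values) for gene, values in expression.items()}
    n_keep = max(1, int(len(sds) * fraction))
    ranked = sorted(sds, key=sds.get, reverse=True)  # most variable first
    return ranked[:n_keep]
```

The returned gene names would then be used to subset the normalized expression matrix passed to UMAP.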

Immune infiltration scores were calculated with the deconvolution algorithm MCPcounter as conducted in a previous sarcoma study⁷. MCPcounter provides abundance scores for multiple populations of immune cells, as well as other stromal cell types. In order to detect sarcoma immune groups, the MCPcounter scores were used in hierarchical clustering with Ward’s linkage. Unsupervised detection of clusters was performed using hybrid Dynamic Tree Cuts with deepSplit=1.75³³. We collapsed the determined immune groups into three categories: Immune “cold” (Groups A and B), immune “hot” (Groups D and E), and Group C. Differential gene expression analysis was conducted among the three collapsed immune groups by comparing the samples from each group against the samples in the two other groups. We also used FDR-adjusted p-values to calculate GSEA scores for the Hallmark gene sets. Analyses and visualizations were generated in R³⁴ and using ggplot2³⁵ and ComplexHeatmap³⁶.
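
The collapsing of the five immune groups and the one-versus-rest comparison design can be made concrete with a small Python sketch. The labels and function name are hypothetical; the clustering and the limma models are not reproduced.

```python
from typing import Dict, List, Tuple

# Collapsing described in the text: A and B are "cold", D and E are "hot",
# and Group C is kept on its own.
COLLAPSED = {"A": "cold", "B": "cold", "C": "C", "D": "hot", "E": "hot"}


def one_vs_rest_contrasts(
    sample_groups: Dict[str, str],
) -> Dict[str, Tuple[List[str], List[str]]]:
    """For each collapsed group, list its samples and all remaining samples."""
    collapsed = {s: COLLAPSED[g] for s, g in sample_groups.items()}
    contrasts = {}
    for target in sorted(set(collapsed.values())):
        inside = sorted(s for s, g in collapsed.items() if g == target)
        outside = sorted(s for s, g in collapsed.items() if g != target)
        contrasts[target] = (inside, outside)
    return contrasts
```

Each contrast pairs the samples of one collapsed group with the samples of the two other groups, which is the comparison described above for the differential expression analysis.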

Immune group survival analysis

For survival analysis, the date of diagnosis was used as the left endpoint. For each patient, the sarcoma diagnosis most proximal in time to the date of sample collection was used to determine the diagnosis date (using a search window that extended five years prior to and up to one month beyond the date of sample collection). The date of death or date of last contact was used as the right endpoint (censored for date of last contact). Samples without a matching diagnosis record or unknown right endpoint were excluded. In total, 781 samples were included in the survival analysis. Cox proportional hazard models were fitted to assess the differences in survival of patients with sarcomas from each of the immune groups. In addition to testing for the effects of immune group assignment, the models included terms accounting for the sarcoma histology and tumor stage. To test for significant differences between the overall survival of each immune group, Likelihood ratio tests (LRTs) were conducted. Analyses were performed in R with the survival and survminer packages³⁷.
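
The selection of the diagnosis date and the construction of the survival endpoints can be sketched in Python. The function names are hypothetical, and approximating five years as 1825 days and one month as 30 days is an assumption of this sketch; the Cox models themselves were fitted in R and are not reproduced.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple


def pick_diagnosis_date(collection: date, diagnoses: List[date],
                        days_before: int = 1825,
                        days_after: int = 30) -> Optional[date]:
    """Diagnosis closest in time to sample collection within the window."""
    earliest = collection - timedelta(days=days_before)
    latest = collection + timedelta(days=days_after)
    eligible = [d for d in diagnoses if earliest <= d <= latest]
    if not eligible:
        return None  # sample would be excluded from the survival analysis
    return min(eligible, key=lambda d: abs((collection - d).days))


def survival_record(diagnosis: date, death: Optional[date],
                    last_contact: Optional[date]) -> Optional[Tuple[int, int]]:
    """Return (days of follow-up, event) with event=1 for death, 0 if censored."""
    if death is not None:
        return (death - diagnosis).days, 1
    if last_contact is not None:
        return (last_contact - diagnosis).days, 0
    return None  # unknown right endpoint, sample excluded
```

A sample with no diagnosis inside the window, or with neither a death date nor a last-contact date, yields `None` and would be dropped, mirroring the exclusion rule stated above.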

Statistics and Reproducibility

No statistical power analysis was completed to predetermine the study sample size. No data were excluded from analyses. The experiments were not randomized. All statistical analyses were completed using R and RStudio. Differential expression analysis was performed by fitting linear models using the limma (doi:10.1093/nar/gkv007) and edgeR (doi:10.1093/nar/gkaf018) R packages. The Kruskal-Wallis test was used to assess differences in immune cell enrichment scores across the five identified immune groups. For specific comparisons, the two-sided Wilcoxon Rank Sum test was applied to evaluate differences in NK cell infiltration between Group C and other groups. Two-sided Fisher’s Exact tests were used to determine the association between immune subgroup distribution and metastatic versus primary status for certain sarcoma subtypes. Survival analyses were conducted using Cox proportional hazard models, with likelihood ratio tests (LRTs) to assess differences in overall survival among immune groups. For all analyses, p-values were adjusted using the Benjamini-Hochberg method for multiple comparisons (i.e., False Discovery Rate or FDR), unless otherwise stated. All other statistical methods are described in their respective methods sections.
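
For reference, the Benjamini-Hochberg adjustment mentioned above can be written in a few lines of Python. This is a generic sketch of the standard step-up procedure, intended to match the behaviour of the BH option of `p.adjust` in R; it is not code taken from the study.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p-values
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

For the p-values 0.01, 0.04, 0.03, and 0.005, the adjusted values are approximately 0.02, 0.04, 0.04, and 0.02.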

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The data used in this research was generated through private funding by Aster Insights (www.asterinsights.com) in collaboration with the Oncology Research Information Exchange Network (ORIEN, www.oriencancer.org). Requests for access to the data used in this study can be submitted to the corresponding author and ResearchDataRequest@AsterInsights.com.

Code availability

Code for analyses can be found at: https://github.com/FridleyLab/landscape_sarcomas_orien and https://doi.org/10.5281/zenodo.14851789.

References

Kallen, M. E. & Hornick, J. L. The 2020 WHO Classification: What’s New in Soft Tissue Tumor Pathology? Am. J. Surg. Pathol. 45, e1–e23 (2021).
von Mehren, M. et al. Soft Tissue Sarcoma, Version 2.2022, NCCN Clinical Practice Guidelines in Oncology. J. Natl Compr. Canc Netw. 20, 815–833 (2022).
Gounder, M. M. et al. Clinical genomic profiling in the management of patients with soft tissue and bone sarcoma. Nat. Commun. 13, 3406 (2022).
Chang, W. et al. MultiDimensional ClinOmics for Precision Therapy of Children and Adolescent Young Adults with Relapsed and Refractory Cancer: A Report from the Center for Cancer Research. Clin. Cancer Res. 22, 3810–3820 (2016).
Boddu, S. et al. Clinical Utility of Genomic Profiling in the Treatment of Advanced Sarcomas: A Single-Center Experience. JCO Precis Oncol. 2, 1–8 (2018).
Nacev, B. A. et al. Clinical sequencing of soft tissue and bone sarcomas delineates diverse genomic landscapes and potential therapeutic targets. Nat. Commun. 13, 3405 (2022).
Petitprez, F. et al. B cells are associated with survival and immunotherapy response in sarcoma. Nature 577, 556–560 (2020).
Brohl, A. S. et al. The genomic landscape of the Ewing Sarcoma family of tumors reveals recurrent STAG2 mutation. Plos Genet 10, e1004475 (2014).
Bridge, J. et al. Clinicopathologic and Genomic Features of EWSR1-PATZ1 Fusion “Sarcoma”. Lab Invest 99 (2019).
Cancer Genome Atlas Research Network. Comprehensive and Integrated Genomic Characterization of Adult Soft Tissue Sarcomas. Cell 171, 950–965.e928 https://doi.org/10.1016/j.cell.2017.10.014 (2017).
Baird, K. et al. Gene expression profiling of human sarcomas: insights into sarcoma biology. Cancer Res. 65, 9226–9235 (2005).
Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).
Becht, E. et al. Estimating the population abundance of tissue-infiltrating immune and stromal cell populations using gene expression. Genome Biol. 17, 218 (2016).
Lacuna, K., Bose, S., Ingham, M. & Schwartz, G. Therapeutic advances in leiomyosarcoma. Front Oncol. 13, 1149106 (2023).
Mao, X. et al. Single-cell transcriptome analysis revealed the heterogeneity and microenvironment of gastrointestinal stromal tumors. Cancer Sci. 112, 1262–1274 (2021).
Zhang, Q. et al. New insight on the correlation of immune landscapes with immune markers expression in different risk classification of gastrointestinal stromal tumors. J. Gastroenterol. 58, 527–539 (2023).
Delahaye, N. F. et al. Alternatively spliced NKp30 isoforms affect the prognosis of gastrointestinal stromal tumors. Nat. Med. 17, 700–707 (2011).
Menard, C. et al. Natural killer cell IFN-gamma levels predict long-term survival with imatinib mesylate therapy in gastrointestinal stromal tumor-bearing patients. Cancer Res. 69, 3563–3569 (2009).
Donald, F., Rafael, A., Jessica, A. W. & Jeremy, S. E. The Sentieon Genomics Tools - A fast and accurate solution to variant calling from next-generation sequence data. bioRxiv, 115717. https://doi.org/10.1101/115717 (2017).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).
Venables, W. N. & Ripley, B. D. Modern Applied Statistics with S. Fourth edn, (Springer, 2002).
Mayakonda, A., Lin, D. C., Assenov, Y., Plass, C. & Koeffler, H. P. Maftools: efficient and comprehensive analysis of somatic variants in cancer. Genome Res. 28, 1747–1756 (2018).
Tate, J. G. et al. COSMIC: the Catalogue Of Somatic Mutations In Cancer. Nucleic Acids Res. 47, D941–D947 (2019).
Marabelle, A. et al. Association of tumour mutational burden with outcomes in patients with advanced solid tumours treated with pembrolizumab: prospective biomarker analysis of the multicohort, open-label, phase 2 KEYNOTE-158 study. Lancet Oncol. 21, 1353–1365 (2020).
Favero, F. et al. Sequenza: allele-specific copy number and mutation profiles from tumor sequencing data. Ann. Oncol. 26, 64–70 (2015).
CNTools: Convert segment data into a region by sample matrix to allow for other high level computational analyses v. 1.30.0 (2023).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Leek, J. T., Johnson, W. E., Parker, H. S., Jaffe, A. E. & Storey, J. D. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics 28, 882–883 (2012).
Haas, B. J. et al. Accuracy assessment of fusion transcript detection via read-mapping and de novo fusion transcript assembly-based methods. Genome Biol. 20, 213 (2019).
Uhrig, S. et al. Accurate and efficient detection of gene fusions from RNA sequencing data. Genome Res. 31, 448–460 (2021).
Korotkevich, G. et al. Fast gene set enrichment analysis. bioRxiv, 060012 https://doi.org/10.1101/060012 (2021).
Langfelder, P., Zhang, B. & Horvath, S. Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R. Bioinformatics 24, 719–720 (2008).
R: A language and environment for statistical computing v. 4.1.2; 4.3.0 (R Foundation for Statistical Computing, 2023).
ggplot2: Elegant graphics for data analysis (Springer-Verlag New York, 2016).
Gu, Z., Eils, R. & Schlesner, M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics 32, 2847–2849 (2016).
survminer: Drawing Survival Curves using ‘ggplot2’ (2021).

Acknowledgements

Funding for publication of this work was supported by the AmandaLee Sarcoma Fund, Moffitt Cancer Center.

Author information

These authors contributed equally: Alex Soupir, Oscar E. Ospina.

Authors and Affiliations

Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, Tampa, FL, USA
Alex Soupir, Oscar E. Ospina, Jamie K. Teer & Brooke L. Fridley
Aster Insights, Hudson, FL, USA
Oliver Hampton, Michelle Churchman, Michael Radmacher, Dale Hedges, David McKean & Phaedra Agius
Department of Biomedical and Health Informatics, School of Medicine, University of Missouri, Kansas City, MO, USA
Saman Zeeshan
Department of Pharmacotherapy and Translational Research, College of Pharmacy, University of Florida, Jacksonville, FL, USA
Nathan D. Seligson
Division of Surgical Oncology, The Ohio State University Comprehensive Cancer Center, College of Medicine, The Ohio State University Wexner Medical Center, Columbus, OH, USA
Raphael Pollock
Division of Medical Oncology, The Ohio State University Comprehensive Cancer Center, College of Medicine, The Ohio State University Wexner Medical Center, Columbus, OH, USA
David Liebner, James L. Chen & Gabriel Tinoco
Department of Biology, Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
Bodour Salhia
University of Colorado School of Medicine, Aurora, CO, USA
Martin McCarter & Breelyn A. Wilky
Department of Orthopaedics and Rehabilitation, University of Iowa, Iowa City, IA, USA
Benjamin J. Miller
Department of Surgery, University of Kentucky, Lexington, KY, USA
Michael J. Cavnar
University of Utah, Huntsman Cancer Institute, Salt Lake City, UT, USA
John S. Groundland
Indiana University Simon Comprehensive Cancer Center, Indianapolis, IN, 46202, USA
Bryan P. Schneider
Rutgers Cancer Institute of New Jersey, New Brunswick, NJ, USA
Gregory Riedlinger
Roswell Park Comprehensive Cancer Center, Buffalo, NY, USA
Stephen B. Edge
Department of Pathology, University of Virginia, Charlottesville, VA, USA
Christopher A. Moskaluk
Division of Surgical Oncology, Emory University, Atlanta, GA, USA
Kenneth Cardona
Medical Oncology/Phase 1 program, Stephenson Cancer Center, University of Oklahoma Health Sciences, Oklahoma City, OK, USA
Abdul Rafeh Naqash
Sarcoma Department, Moffitt Cancer Center, Tampa, FL, USA
Ricardo J. Gonzalez, John E. Mullinax, David M. Joyce, Odion Binitie, G. Douglas Letson, Mihaela Druta & Andrew S. Brohl
Department of Radiation Oncology, Moffitt Cancer Center, Tampa, FL, USA
Arash O. Naghavi
Department of Individualized Cancer Management, Moffitt Cancer Center, Tampa, FL, USA
Damon R. Reed
Department of Cancer Epidemiology, Moffitt Cancer Center, Tampa, FL, USA
Erin M. Siegel
Division of Health Services and Outcomes Research, Children’s Mercy, Kansas City, MO, USA
Brooke L. Fridley

Contributions

A.S. and O.E.O. drafted the manuscript and performed bioinformatics and statistical analysis. O.H., M.R., D.H., D.M., P.A., S.Z., N.D.S., and J.K.T. provided bioinformatics analysis and support. M.C. and E.M.S. provided coordination efforts for the study. R.P., D.L., J.L.C., G.T., B.S., M.M., B.A.W., B.J.M., M.J.C., J.S.G., B.P.S., G.R., S.B.E., C.A.M., K.C., A.R.N., R.J.G., J.E.M., D.M.J., O.B., G.D.L., A.O.N., M.D., and D.R.R. contributed to patient recruitment for biospecimens. B.L.F. supervised bioinformatics analysis and contributed to study design. A.S.B. conceived and designed the study, drafted the manuscript, performed data analysis, and supervised the study. All authors critically reviewed and edited the manuscript and approved the final version.

Corresponding author

Correspondence to Andrew S. Brohl.

Ethics declarations

Competing interests

O.H., M.C., M.R., D.H., D.M., P.A. and S.Z. are or were employed by Aster Insights. E.M.S. received research funding from the ORIEN Foundation. D.L. – advisory board with Aadi Biosciences and patent licensing (US 10,65,347 B2: Method for Predicting Prognosis) with MatchTx, Inc. J.L.C. is an employee of Tempus. G.T. – advisory board with SynOx, Deciphera, Daiichi Sankyo. B.A.W. – consulting or advisory role with Deciphera, Epizyme, Adcendo, Polaris, Boehringer Ingelheim, research funding from Exelixis, and travel expenses from Agenus. G.R. – advisory board with AstraZeneca. A.R.N. – advisory board with Foundation Medicine and NGM Biosciences, travel compensation from Foundation Medicine and Caris Life Sciences. D.R.R. – data safety monitoring committee for Eisai and Springworks. M.D. – consulting for Adaptimmune, Deciphera, and Aadi Biosciences. A.S.B. – advisory board with Deciphera, research funding (institution) from Merck. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Transparent Peer Review file

Reporting Summary

Description of Additional Supplementary Files

Supplementary Data 1-4

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Soupir, A., Ospina, O.E., Hampton, O. et al. Genomic, transcriptomic, and immunogenomic landscape of over 1300 sarcomas of diverse histology subtypes. Nat Commun 16, 4206 (2025). https://doi.org/10.1038/s41467-025-58678-6

Received: 07 May 2024
Accepted: 24 March 2025
Published: 06 May 2025
DOI: https://doi.org/10.1038/s41467-025-58678-6