Introduction

The use of molecular oncology assays to assist with diagnosis, classification, prediction of prognosis, and selection of therapy has become ubiquitous in recent years, especially in conjunction with the development and approval of numerous targeted therapies and immunotherapies1,2. Traditionally, molecular assays have examined DNA to identify actionable molecular alterations such as single-nucleotide variants, insertions and deletions, copy number variants, and gene rearrangements. This information has also been used to match cancer patients to clinical trials with the goal of improving survival. More recently, there has been an expanded use of RNA-based assays to identify targetable variants, such as gene fusions and splice alterations, and to apply gene expression signatures to generate diagnostic, prognostic, and predictive information to guide the care of patients with cancer. In addition, there has been a rapid increase in commercially available tests, with important implications for cross-platform reproducibility and validity.

One such commercial assay in pancreatic ductal adenocarcinoma (PDAC) is PurIST℠ from Tempus AI, Inc. (Chicago, IL, USA), which offers clinical genomics services including DNA sequencing and transcriptome profiling using exome capture3,4. PurIST (Purity Independent Subtyping of Tumors) is a single-sample classifier (SSC) algorithm developed to predict the molecular subtypes of PDAC based upon the relative expression of 8 pairs of genes5 and is being evaluated in clinical trials using the NanoString nCounter® technology6. PurIST, which uses tumor-intrinsic genes that characterize basal-like and classical subtype PDAC, was developed through training and validation on gene expression data from microarray and RNA-seq platforms5. We and others have consistently found that patients with basal-like subtype tumors have significantly shorter overall survival (OS)5,7,8,9,10,11,12,13. In addition, basal-like tumors are more resistant to systemic FOLFIRINOX (folinic acid, 5-fluorouracil, irinotecan, oxaliplatin) therapy but may respond better to gemcitabine with nab-paclitaxel (GnP), whereas classical tumors are sensitive to FOLFIRINOX5,10,11,12,14. Preclinical evidence also suggests that the basal-like state renders tumor cells more sensitive to RAS inhibitors15,16, lending importance to the PurIST subtypes in light of ongoing RAS inhibitor trials. More recently, our group reported that basal-like tumors are enriched in receptor tyrosine kinases such as epidermal growth factor receptor (EGFR) and that patients with basal-like but not classical tumors respond to EGFR inhibition17. Based on these observations, prospective clinical trials are underway using PurIST subtypes as integral biomarkers in the neoadjuvant (PANCREAS trial, NCT04683315) and metastatic settings (PANGEA trial, NCT06483555) to select therapies based on molecular subtype.

After reporting, using an internal database, that the PurIST classifier was compatible with their sequencing platform with respect to both survival and response to FOLFIRINOX therapy4, Tempus began offering PurIST℠ for clinical use in May 2023. While integrating cancer classifiers with commercial clinicogenomics platforms is attractive because it improves workflow by allowing clinicians to perform additional tests on samples that are already being sequenced, such tests should be analytically validated in a head-to-head comparison with a gold standard prior to being used for clinical decision-making. In fact, there have already been reports of inconsistent biomarker test results that may affect treatment allocation. The Friends of Cancer Research Homologous Recombination Deficiency (HRD) Harmonization Project found discordant HRD calling in ovarian cancer between the 17 assays that were tested (83% positive percent agreement, 80% negative percent agreement for clinical samples)18, which could impact the decision to treat with PARP inhibitors. Assays measuring tumor mutational burden (TMB), which is used to determine whether a patient should receive immunotherapy, have also come under scrutiny for inconsistent results19. Given that there are currently no data directly comparing cancer classifiers on different sequencing platforms, we sought to compare the performance of PurIST on the whole transcriptome RNA-seq that it was developed on versus exome capture RNA-seq in a commercial assay. In addition, we compared the results of whole transcriptome RNA-seq to the NanoString PurIST assay that is being used in two ongoing clinical trials.

Results

PurIST using exome capture RNA-seq overestimates basal-like subtype prevalence compared to whole transcriptome RNA-seq

Seventy-nine patients had subtyping results available from whole transcriptome and exome capture RNA-seq (Fig. 1A). Using whole transcriptome, 7 tumors (8.9%) were basal-like and 72 (91.1%) were classical. Using exome capture, 22 tumors (27.8%) were basal-like and 57 (72.2%) were classical. All 15 non-concordant samples were predicted to be classical using whole transcriptome but were basal-like with exome capture. Overall accuracy of subtyping from exome capture using whole transcriptome RNA-seq as reference was 81.0% (64/79). This resulted in a Cohen’s kappa coefficient of 0.402 with 95% confidence interval (CI) between 0.130 and 0.675, indicating only moderate agreement20.
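The agreement statistic above can be reproduced from the 2 × 2 counts implied by the text (7 tumors basal-like by both methods, 15 classical by whole transcriptome but basal-like by exome capture, 0 in the reverse direction, and 57 classical by both). The following is a minimal Python sketch of the standard Cohen's kappa formula; it is illustrative only and is not the fmsb R implementation used in the study, and the function and argument names are hypothetical.

```python
def cohens_kappa(both_basal, ref_classical_test_basal,
                 ref_basal_test_classical, both_classical):
    """Cohen's kappa for a 2 x 2 agreement table (reference vs. test method)."""
    n = (both_basal + ref_classical_test_basal
         + ref_basal_test_classical + both_classical)
    # Observed agreement: fraction of samples given the same call by both methods
    observed = (both_basal + both_classical) / n
    # Marginal basal-like counts for each method
    ref_basal = both_basal + ref_basal_test_classical
    test_basal = both_basal + ref_classical_test_basal
    # Agreement expected by chance from the marginal totals
    expected = (ref_basal * test_basal
                + (n - ref_basal) * (n - test_basal)) / n ** 2
    return (observed - expected) / (1 - expected)


# Counts implied by the text: whole transcriptome is the reference,
# exome capture is the test method.
kappa = cohens_kappa(7, 15, 0, 57)
accuracy = (7 + 57) / 79
print(round(kappa, 3), round(accuracy, 3))  # 0.402 0.81
```

The confidence interval is not computed here because the exact variance formula used by the study's software is not described in the text.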

Fig. 1: PurIST subtype prevalence and probabilities between sequencing methods.

A Confusion matrix comparing classical and basal-like subtype prevalence by whole transcriptome and exome capture with Cohen’s kappa coefficient. B Paired boxplots of basal-like probabilities between whole transcriptome and exome capture. Points connected by a line represent the same sample. Line color is gray if there is no subtype switch and orange if there is a subtype switch between the two methods. A one-tailed Wilcoxon Signed Rank test was used to determine whether there was a shift towards higher basal-like probability by exome capture RNA-seq. A small amount of random noise was added in the x and y directions (maximum 0.05 in the x direction and 0.02 in the y direction) for visualization purposes only. C Scatter plot of basal-like probabilities between whole transcriptome and exome capture. D Plot of basal-like probability from whole transcriptome and probability discrepancy (basal-like probability using whole transcriptome minus basal-like probability using exome capture). Color of point outline is subtype by whole transcriptome and point fill is subtype by exome capture. EC-seq, exome capture RNA-seq; WT-seq, whole transcriptome RNA-seq.

As the pattern of non-concordance was significantly biased towards basal-like calling with exome capture (P = 9.08e-7, binomial test comparing basal-like proportions of exome capture to whole transcriptome RNA-seq), we leveraged the PurIST reporting of subtype as a continuous variable, in which the basal-like probability, or basal-ness, of a sample can be evaluated5. Basal-like probabilities with exome capture were significantly higher than with whole transcriptome from the same samples (Paired Wilcoxon Signed Rank P = 1.491e-13, Fig. 1B), and overestimation of basal-like probabilities was more pronounced at lower basal-like probabilities, as demonstrated by larger probability discrepancies (Fig. 1C, D). There was no significant correlation between inferred tumor purity and probability discrepancies (Pearson’s correlation r = −0.060 [95% CI, −0.277–0.163] for ESTIMATE and 0.144 [95% CI, −0.080–0.354] for DECODER; P = 0.600 for ESTIMATE and 0.206 for DECODER), indicating that the subtype mismatch was not due to low tumor purity (Fig. S1A, B). The range of inferred tumor purity by DECODER for non-concordant samples was between 39.4% and 56.4% (range 31.4–76.2% for all samples).
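One way to read the binomial test above is as an exact test of the 22 basal-like calls among 79 exome capture samples against the whole transcriptome basal-like proportion (7/79) as the null probability. The Python sketch below computes the upper tail under that reading. Whether the reported P value was one- or two-sided is not stated, so the choice of tail and the function name are assumptions, and no specific output value is claimed.

```python
from math import comb


def binomial_upper_tail(successes, trials, null_prob):
    """Exact P(X >= successes) for X ~ Binomial(trials, null_prob)."""
    return sum(
        comb(trials, i) * null_prob ** i * (1 - null_prob) ** (trials - i)
        for i in range(successes, trials + 1)
    )


# Assumed setup: null proportion taken from whole transcriptome calls (7 of 79),
# observed count taken from exome capture calls (22 of 79).
p_upper = binomial_upper_tail(22, 79, 7 / 79)
print(f"{p_upper:.3g}")
```

Under this reading, a very small tail probability means that 22 basal-like calls would be highly unlikely if exome capture shared the whole transcriptome basal-like rate.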

Exome capture RNA-seq leads to shifts in gene-pair expression ratios

We next examined why the exome capture method may have led to higher basal-like probabilities and a higher prevalence of basal-like calls. PurIST relies on the top scoring pairs (TSP) method, in which the relative expression of the two genes in each of eight gene pairs (one basal-like and one classical gene per pair) is transformed into a basal-like probability. We found that the TSPs frequently switched from classical to basal-like with exome capture RNA-seq (Fig. 2A). The TSPs that were most frequently found to switch from classical to basal-like were KRT6A-ANXA10 (28/58 or 51% of cases; weight 1.031), BCAR3-GATA6 (53/73 or 73% of cases; weight 0.618), and ITGA3-LGALS4 (46/60 or 77% of cases; weight 0.059). Initially, we considered the possibility that exome capture might preferentially enhance detection of basal-like genes due to probe bias. However, when comparing the expression of PurIST genes between whole transcriptome and exome capture RNA-seq, we found that exome capture exhibited lower relative expression (see Methods) of 14 of 16 TSP genes spanning both classical and basal-like sets (Figs. 2B and S2), suggesting decreased global capture of PurIST genes. The three TSPs with frequent subtype switch from classical to basal-like demonstrated relatively similar ranked expression of the basal-like genes (KRT6A, BCAR3, ITGA3) between the two platforms but lower expression of the respective classical genes (ANXA10, GATA6, LGALS4).
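The mechanism described above can be illustrated with a toy TSP classifier in Python. This is a sketch under stated assumptions, not the PurIST implementation: only the three gene pairs and weights quoted in the text are included (PurIST uses eight pairs), the intercept is hypothetical, the logistic link is assumed for illustration, and the expression values are invented to show the effect.

```python
import math

# (basal-like gene, classical gene, weight); weights are those quoted in the text
GENE_PAIRS = [
    ("KRT6A", "ANXA10", 1.031),
    ("BCAR3", "GATA6", 0.618),
    ("ITGA3", "LGALS4", 0.059),
]
HYPOTHETICAL_INTERCEPT = -1.0  # assumption, not a PurIST parameter


def basal_like_probability(expression):
    """Sum the weights of pairs voting basal-like, then apply a logistic link."""
    score = HYPOTHETICAL_INTERCEPT
    for basal_gene, classical_gene, weight in GENE_PAIRS:
        # A pair votes basal-like when the basal-like gene is the higher of the two
        if expression[basal_gene] > expression[classical_gene]:
            score += weight
    return 1.0 / (1.0 + math.exp(-score))


# Invented values: the basal-like genes are unchanged between platforms, while
# the classical genes are measured lower on the second platform.
platform_a = {"KRT6A": 5, "ANXA10": 8, "BCAR3": 4, "GATA6": 6, "ITGA3": 7, "LGALS4": 9}
platform_b = {"KRT6A": 5, "ANXA10": 3, "BCAR3": 4, "GATA6": 2, "ITGA3": 7, "LGALS4": 4}

prob_a = basal_like_probability(platform_a)
prob_b = basal_like_probability(platform_b)
print(round(prob_a, 3), round(prob_b, 3))
```

Because each pair contributes only through the sign of a within-sample comparison, a platform-specific drop in the classical gene alone is enough to flip a pair and raise the basal-like probability, even when the basal-like gene is measured identically.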

Fig. 2: TSP subtype switch between sequencing methods.

A Sankey diagrams demonstrating whether each TSP is basal-like (gene A > gene B) or classical (gene A < gene B) in whole transcriptome (left bars) and exome capture (right bars). B Heatmap of gene expression by whole transcriptome (left) and exome capture (right) of the genes used in PurIST. Each column represents a sample, and columns are ordered by increasing basal-like probability from whole transcriptome RNA-seq. Genes (rows) are ordered by TSP weights. Gene expression was percentile rank normalized. EC-seq exome capture RNA-seq, TSP top scoring pairs of genes, WT-seq whole transcriptome RNA-seq.

PurIST subtypes from exome capture are less prognostic of overall survival compared to whole transcriptome

To examine whether the discordant subtype calls led to different clinical outcomes, we examined the OS of patients based on their tumor subtypes by both methods. Patients with basal-like subtype from whole transcriptome RNA-seq demonstrated significantly shorter median OS of 12 months (95% CI, 10–16 months) compared to 33 months (95% CI, 25–42 months) in patients with classical subtype (log-rank P < 0.0001) with increased mortality hazard ratio (HR) of 6.13 (95% CI, 2.52–14.9; Figs. 3A and S3A). With exome capture, patients who were classified as having basal-like tumors demonstrated numerically shorter median OS of 22 months (95% CI, 16–29 months) compared to 35 months (95% CI, 29–46 months) for those with classical tumors, but this did not reach statistical significance (log-rank P = 0.061, Fig. 3B). While HR for mortality was elevated with basal-like subtype (1.73, 95% CI, 0.98–3.08; Fig. S3A), this was also not statistically significant (P = 0.0604). Log likelihood value was higher with whole transcriptome than with exome capture (−212.75 versus −216.51, P < 2.2e-16), suggesting that subtypes from whole transcriptome RNA-seq can better explain OS than exome capture. Basal-like subtype was associated with shorter progression-free survival using both methods (whole transcriptome: 11 months [95% CI, 8–14 months] vs. 22 months [95% CI, 16–29 months] in classical, log-rank P = 0.00053; exome capture: 13 months [95% CI, 9–18 months] vs. 23 months [95% CI, 17–31 months] in classical; log-rank P = 0.034; Figs. 3C, D and S3B).

Fig. 3: Kaplan–Meier estimates of OS and PFS according to PurIST subtype by sequencing method.

A OS from whole transcriptome, B OS from exome capture, C PFS from whole transcriptome, and D PFS from exome capture. Three patients were missing PFS information. PFS progression-free survival, OS overall survival.

Higher concordance in subtypes with NanoString nCounter® platform

PurIST on the NanoString platform in a CLIA-certified laboratory is currently being utilized for clinical trials. In this setting, the NanoString PurIST assay has been shown to be highly concordant with whole transcriptome RNA-seq results, with 97% accuracy and a calculated Cohen’s kappa coefficient of 0.819 (almost perfect agreement)6. Matched samples from the first 40 patients in the PANCREAS trial (NCT04683315) with NanoString PurIST results were processed using whole transcriptome RNA-seq. Of the 40 tumor samples, 4 were basal-like and 36 were classical using RNA-seq. On the NanoString platform, 4 were basal-like and 36 were classical with one misclassification in each subtype category, resulting in an overall accuracy of 95% (38/40) and a Cohen’s kappa coefficient of 0.722 (95% CI, 0.347–1.097), indicating substantial agreement (Fig. 4A). There was no significant difference in paired basal-like probabilities between the two methods (Paired Wilcoxon Signed Rank P = 0.308; Fig. 4B). In addition, the switch between subtypes in the two discordant samples did not appear to be due to low tumor purity, as one of the samples had a high proportion of malignant cells by pathology (approximately 50%) and most basal-like probabilities remained stable even at lower tumor purities (Fig. 4C). There was no significant correlation between percent malignancy and probability discrepancy (Pearson’s r = −0.066 [95% CI, −0.373–0.255], P = 0.692). The PANCREAS trial is still ongoing, and survival data are not yet available.

Fig. 4: PurIST subtype prevalence and probabilities in PANCREAS trial.

A Confusion matrix comparing classical and basal-like subtype prevalence by whole transcriptome RNA-seq and NanoString nCounter® assay with Cohen’s kappa coefficient. B Paired dot plots of basal-like probabilities between whole transcriptome RNA-seq and NanoString. Points connected by a line represent the same sample. Line color is gray if there is no subtype switch, orange if the subtype switch is from classical to basal-like, and blue if the subtype switch is from basal-like to classical. Red crossbars represent median basal-like probabilities. A two-tailed Wilcoxon Signed Rank test was used to determine whether there was a difference in basal-like probability between platforms. A small amount of random noise was added in the x and y directions (maximum 0.05 in the x direction and 0.025 in the y direction) for visualization purposes only. C Plot of tumor purity and probability discrepancy (basal-like probability using whole transcriptome RNA-seq minus basal-like probability using NanoString). Color of point outline is subtype by RNA-seq and point fill is subtype by NanoString. WT-seq, whole transcriptome RNA-seq.

Discussion

The integration of laboratory developed tests (LDTs) into clinical practice provides unprecedented access to precision oncology with the potential to improve patient outcomes. However, these commercial tests often involve proprietary technologies and lack transparency, which may be barriers to interlaboratory reproducibility. Inconsistent results in other measures that could determine the treatment allocation for cancer patients, such as HRD and TMB, have already been reported18,19. As PurIST was developed prior to the ubiquitous availability of many commercial assays, we sought to determine if PurIST is compatible with tests that are now offered commercially. We report head-to-head comparisons of matched samples between different platforms used to measure gene expression. We found that PurIST subtyping using a commercial exome capture method overestimates the basal-like probability and the prevalence of basal-like tumors relative to whole transcriptome RNA-seq. Given that PurIST was originally developed on microarray and whole transcriptome RNA-seq data, we hypothesize that the observed non-concordance may result from differential gene enrichment in exome capture, potentially due to varying capture efficiencies. Additional factors influencing exome capture efficiency may include gene length and GC content. As a result, certain genes are measured with artificially higher or lower baseline levels of expression relative to the other gene in a TSP, which biases the overall subtype probability call. Lastly, differences in bioinformatic tools for transcript quantification may also contribute to the subtype discordance, as the commercial platform employs kallisto21 whereas our pipeline uses Salmon22. The overestimation of basal-like probability is clinically meaningful, as patients with basal-like tumors from exome capture sequencing did not have significantly worse OS compared to those with classical tumors in this study.

Single cell RNA-seq, spatial transcriptomics, and multiplex immunofluorescence methods have demonstrated that PDAC tumors are heterogeneous, such that even when a bulk tumor is classified as one subtype, it may contain both classical and basal-like cells as well as co-expressor cells9,23,24,25. In addition, PDAC is characteristically stroma-rich, with neoplastic cells often comprising only a minority of the tumor volume26. These features may complicate reliable subtyping. To address this, PurIST was specifically designed using genes expressed in cancer cells to circumvent the need for experimental or bioinformatic tumor enrichment and has demonstrated consistency between surgical specimens and biopsies5. In addition, subtype calls are not affected by tissue processing5, and PurIST can be used on RNA from fresh frozen as well as FFPE samples, which is useful given that clinical samples are commonly stored as FFPE blocks. PurIST also computes a basal-like probability that likely represents the degree of tumor basal-ness, which has been shown to correlate with response to systemic therapies5,27. In our retrospective cohort, we did not observe an association between discordant subtype calls and imputed tumor purity, suggesting that the differential classification was not driven by low tumor purity. Similarly, in the trial population, basal-like probabilities remained stable between whole transcriptome RNA-seq and NanoString even at low tumor purities, further supporting the robustness of PurIST subtyping in low-cellularity samples.

PDAC molecular subtypes have been validated in multiple clinical studies to be prognostic and predictive of response to systemic therapies5,11,12,14. Therefore, it is critical to accurately classify tumors to match patients to appropriate therapies. Recently published final results from the COMPASS prospective trial and PASS-01 randomized phase II trial confirmed that basal-like tumors are more resistant to FOLFIRINOX28,29, validating the importance of molecular subtypes for treatment selection. PurIST was developed to be compatible across multiple platforms including whole transcriptome RNA-seq, microarrays, and the NanoString assay with the goal of being translatable for clinical application, and its use on the NanoString platform has been tested against the gold standard whole transcriptome RNA-seq5,6. Discordant subtype calling on alternative platforms that have not been validated may have an impact on patient treatment decisions in the future. For example, 15/22 (68.2%) of the patients predicted to have basal-like tumors by exome capture would have mistakenly received GnP as part of the PANCREAS trial rather than the recommended FOLFIRINOX regimen11. In the setting of the PANGEA trial, these patients would receive GnP plus erlotinib, a small molecule inhibitor of EGFR, which may be specifically effective against basal-like subtype tumors but have no benefit for classical subtype tumors17.

Our study has several limitations. The sample size of our retrospective cohort is small with a low basal-like prevalence of 9%, which limits the reliability of statistical analyses of survival. Similarly, the small numbers preclude definitive conclusions regarding treatment response between subtypes, reflecting both the challenges of assembling a real-world cohort and the costs associated with running multiple transcriptomic assays for matched samples. Nevertheless, there is clear evidence of subtype shifts with exome capture that reduce observed survival differences between subtypes. The paired nature of our primary analysis, where we examine the difference between subtype calls within the same patient sample, also offers much greater statistical power for detecting, characterizing, and generalizing differences between platforms, analogous to the power gains typically seen with paired designs (e.g., paired t-tests) versus unpaired designs (e.g., unpaired two-sample t-tests). We also lack pathology-based estimates of malignant cell proportions in the retrospective cohort. To address this, we applied two independent bioinformatic pipelines for tumor purity estimation, which indicated that differential subtype calls are not attributable to low cellularity. Finally, the PANCREAS trial is ongoing, and treatment outcomes as well as survival data based on NanoString subtypes are not yet available. Results from this trial will be critical for further validation of our findings as well as additional recommendations regarding how to incorporate molecular subtypes into clinical practice.

In summary, our results suggest that classifiers such as PurIST that are predicated on the comparison of expression between genes may generate different classifications depending on sequencing platforms, with misleading results impacting clinical decision making. For replicable and accurate application of classifiers dependent on gene expression comparisons, assays should be carefully evaluated between platforms prior to adoption. Given the widespread use of commercial sequencing assays in clinical practice, it may be necessary to identify alternative TSPs, retrain PurIST to improve concordance with whole transcriptome RNA-seq, and/or consider an alternative threshold of the basal-like probability that determines the subtype classification. Retrained models may need to be specific to each exome-capture platform, as there may be differences in probe biases and other sources of between-gene technical variations. Studies that apply multiple platforms to samples from the same patient may facilitate the identification of alternate gene pairs that are more consistent between modalities for retraining. Monitoring of the performance of retrained platform-specific models would be necessary to ensure that such classifiers remain relevant with future changes or updates to these platforms over time, adding another layer of complexity in their application. Based on our findings, we recommend that PurIST not be used on exome capture data for clinical decision making until cross-platform validation of a retrained model is completed. Our study highlights how technical differences between platforms can substantially alter subtype classification and its prognostic value, underscoring the need for rigorous clinical and analytical validation such as head-to-head cross-platform comparisons with matched samples before clinical adoption of any LDT.

Methods

Study design

Retrospective cohort

The retrospective cohort included PDAC patients treated at a single center. The study was approved by the institutional review board at the Medical College of Wisconsin (Milwaukee, WI) and conducted in accordance with the Declaration of Helsinki. All patients who had their tumors sequenced with the Tempus xT/xR assay, which utilizes exome capture RNA-seq, were included. Whole transcriptome RNA-seq was performed from the same archival blocks except for one sample, in which the Tempus xT/xR assay was performed on a biopsy sample and RNA-seq was done on a surgical specimen, with concordant subtype between platforms. Patient demographics, treatment information, and clinical outcomes were collected retrospectively with a cut-off date of September 2024.

Trial cohort

In an independent cohort from the PANCREAS trial (ClinicalTrials.gov identifier: NCT04683315), the NanoString nCounter® assay was used in a Clinical Laboratory Improvement Amendments (CLIA)-certified laboratory to measure PurIST gene transcripts, and the remaining RNA samples were used in whole transcriptome RNA-seq. The trial was approved by the institutional review boards at both the Medical College of Wisconsin and HonorHealth Research Institute (Scottsdale, AZ) and was also conducted in accordance with the Declaration of Helsinki. All patients provided informed consent prior to inclusion in the trial. The NanoString assay directly quantifies RNA molecules and is the technology used for the PAM50-based Prosigna® breast cancer classifier21,30.

Data acquisition and processing, PurIST subtyping

Raw sequencing data from Tempus xT/xR were obtained and processed as previously described22,23,31,32. The NanoString assay was performed as previously described in a CLIA-certified laboratory at UNC Health6. Whole transcriptome RNA-seq was done as previously described5. Libraries from formalin-fixed paraffin-embedded (FFPE) tumors were prepared with the KAPA RNA HyperPrep Kit with RiboErase (Roche) per manufacturer protocol. Libraries were sequenced on a NextSeq 500 to obtain 60 million reads per sample. Tumor subtype was called using PurIST as described previously5. Normalized expression for Tempus xT/xR22,31 and RNA-seq5 or raw counts for NanoString were used. When comparing the relative expression of genes across platforms, the normalized expression values were adjusted using percentile ranking to ensure that the expression data were distributed on a common scale.
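The percentile ranking step can be sketched as follows in Python. This is a minimal illustration rather than the study's code: it assumes that values are ranked within a single vector (for example, one sample's expression values), that ties receive the average rank, and that ranks are divided by the number of values; none of these details is specified in the text, and the function name is hypothetical.

```python
def percentile_rank(values):
    """Map each value to its average rank divided by the number of values."""
    n = len(values)
    # Indices of the values in ascending order
    order = sorted(range(n), key=lambda idx: values[idx])
    ranks = [0.0] * n
    start = 0
    while start < n:
        # Extend the window across a run of tied values
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Tied values share the mean of the 1-based ranks they span
        average_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    return [rank / n for rank in ranks]


# Invented expression values on an arbitrary scale
print(percentile_rank([10.0, 250.0, 40.0, 40.0]))  # [0.25, 1.0, 0.625, 0.625]
```

Because only the ordering of values is retained, two platforms with different dynamic ranges are mapped onto the same (0, 1] scale before their relative expression is compared.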

Statistics

All analyses were performed in R (v4.4.2). Agreement between subtypes from different methods was examined using Cohen’s kappa (fmsb R package v0.7.6). Survival was analyzed using the Kaplan–Meier estimator and the Cox proportional hazards regression model (survival R package v3.7-0). Plots were generated with the ggplot2 (v3.4.4), survminer (v0.4.9), and ggsankey (v0.0.99999) R packages. Heatmaps were generated with the ComplexHeatmap (v2.20.0) R package. Tumor purity from RNA-seq data was inferred using ESTIMATE (Estimation of STromal and Immune cells in MAlignant Tumors using Expression data)24,33 with the tidyestimate (v1.1.1.9000) R package and using DECODER25,27 with the decoderr (v0.0.0.9000) R package.