Main

Early diagnosis of cancer is one of the best strategies to improve patient survival and decrease treatment-related side effects that contribute to poorer health; however, this strategy poses a risk of overtreatment6. Therefore, accurate biomarkers of early cancer progression are needed to stratify patients. Copy number (CN) alterations, although common in cancer, are rarely found in normal tissues, raising the question of whether these signals could help diagnose patients earlier.

This strategy can be tested in esophageal adenocarcinoma (EAC), which has a 5-year survival rate of less than 20%7. Its precursor tissue is known as Barrett’s esophagus (BE); however, the risk for a patient with BE progressing to EAC is only around 0.3% per annum8. Current surveillance programs focus on the presence and grade of dysplasia in BE patients as determined by histopathological examination of biopsies. Low- (LGD) and high-grade dysplasia (HGD) are used as surrogates for early cancer transformation and trigger intervention, commonly by endoscopic resection and radiofrequency ablation (RFA)9,10. Additional risk factors for progression include increasing age, male gender, greater length of the BE segment and tobacco use at the initial evaluation, although these are not yet part of the clinical guidelines11.

Improvements in risk assessment have focused on identifying individual molecular biomarkers, particularly p53 expression12,13,14,15,16 and DNA-methylation changes17,18. However, identification of mutational biomarkers for progression has been difficult, due to the low frequency of recurrent point mutations in either BE19 or EAC20,21. Instead, EAC and BE are characterized by early and frequent genomic (CN and structural) instability20,21,22,23,24. As ongoing genomic instability leads, to a large extent, to clonal diversity, multiple investigations have focused on the heterogeneity and diversity of BE tissues25 as markers of higher risk26,27,28,29.

We investigated genome-wide CN instability as a marker for risk of progression using shallow whole-genome sequencing (sWGS; average depth 0.4×) in a retrospective, demographically matched, case–control cohort of patients (n = 88), with all available endoscopy samples (n = 777) collected during clinical surveillance for BE (Fig. 1a). Shallow WGS was chosen as the protocol because it provides a genome-wide perspective on CNs and the level of genomic instability and has been optimized for use in formalin-fixed paraffin-embedded (FFPE) samples30.

Fig. 1: CN profiles in BE vary over space and time.

a, The case–control cohort design for the discovery patient cohort (n = 88). Nonprogressor (NP) patients had a minimum follow-up of 3 years; progressor (P) patients had a minimum 1-year follow-up; all patients start at nondysplastic Barrett’s esophagus (NDBE). Archival samples were collected from every available endoscopy over time, and along the length of the BE segment. b–d, Bar plots showing the adjusted CN values across the genome in 5-Mb windows, with relative (within each sample) gains shown in the positive y axis, and relative losses shown in the negative y axis. b, Genomic CN profiles of individual samples for an NP patient (top) and a P patient (bottom). The colors across the chromosomes in each sample are based on the location relative to the place it was taken in the esophagus (sample nearest to the esophageal–gastric junction at the bottom, up the BE segment), and the ideograms to the right of the plots show the samples that belong to a single endoscopy indicated by the year. Note the variability in the CN profiles within samples from the progressor patient in chromosomes 14 and 17, in contrast to the shared pattern across the NP patient in those regions. c,d, Distribution of relative CN values at each genomic segment across all samples in the NP and P patient groups. The gray in the middle is the median ± 1 s.d., indicating a probable diploid genome value. Purple and green show the range of relative gains and losses, respectively. In c all samples, regardless of pathology, are plotted and a large variation in the CN between P (n = 349) and NP (n = 424) patients is clear (that is, chromosomes 1, 4, 9 and 11). In d only NDBE samples from NP (n = 346) and P (n = 172) patient groups are plotted, and the P patients still show a much larger CN signal despite being pathologically indistinguishable.

CN patterns were examined at multiple levels of the esophagus to understand how patients who progress differ from nonprogressors. We observed that the genomes of individual progressive patients display a generalized disorder across the genome that varies between samples and over time (Fig. 1b). In addition, CN changes were not confined to cytological atypia (for example, LGD, HGD), because similar profiles were observed for the nondysplastic BE (NDBE) samples (n = 518; Fig. 1c,d).

The CN information and a measure of overall complexity (see Methods and also Extended Data Fig. 1) were used to generate a crossvalidated, elastic-net-regularized logistic regression model of progression and classification, with the endpoint HGD or intramucosal cancer (IMC; see Methods), and subsequently validated using an independent cohort of 76 patients (n = 213 samples), alongside an orthogonal validation of the Seattle BE Study SNP array samples (n = 1,272) from 248 patients31.

This model was designed to be independent of demographic risk factors11 because our cohort was matched for sex, BE segment length, age at diagnosis and smoking status (see Supplementary Table 1). We used the area under curve (AUC) of the receiver operating characteristic (ROC) to evaluate the model training performance. As the model included the diagnostic samples with the most extreme CN (for example, HGD and IMC), we additionally trained a model excluding these, and found that the AUC concordance was high (see Extended Data Fig. 2a), indicating that the model was not sensitive to extreme samples. Aggregating predictions either per endoscopy (mean or maximum sample predictions) or per patient (mean or maximum predictions excluding HGD/IMC samples) did not measurably increase the prognostic accuracy (see Extended Data Fig. 2b), suggesting that a single sample (for example, pooled four-quadrant biopsy) may be sufficient for prediction, which could be ideal for clinical application.
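The aggregation comparison described above can be illustrated with a minimal sketch. It is not the authors' code: the function name, the tuple layout and the example probabilities are all hypothetical, and only the two aggregation rules (mean or maximum of the sample predictions within an endoscopy) come from the text.

```python
from collections import defaultdict
from statistics import mean


def aggregate_per_endoscopy(samples, how="max"):
    """Collapse per-sample probabilities into one value per endoscopy.

    samples: iterable of (patient_id, endoscopy_id, probability) tuples
             (a hypothetical layout chosen for this sketch).
    how:     "max" or "mean", the two rules compared in the text.
    """
    if how not in ("max", "mean"):
        raise ValueError("how must be 'max' or 'mean'")
    grouped = defaultdict(list)
    for patient, endoscopy, prob in samples:
        grouped[(patient, endoscopy)].append(prob)
    reducer = max if how == "max" else mean
    return {key: reducer(probs) for key, probs in grouped.items()}


# Invented predictions for one patient with two endoscopies.
samples = [
    ("P1", "2010", 0.2), ("P1", "2010", 0.6),
    ("P1", "2012", 0.7), ("P1", "2012", 0.9),
]
by_max = aggregate_per_endoscopy(samples, "max")
by_mean = aggregate_per_endoscopy(samples, "mean")
```

The same grouping key could be reduced to the patient alone to obtain the per-patient aggregates.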

Using all sample predictions generated by the model we evaluated the relative risk (RR) across the cohort. Those samples with the highest RR were more than 20× more likely to progress than average, whereas those with the lowest RR were 10× less likely (Fig. 2a). This information enabled us to calibrate risk classifications based on the enrichment of samples from progressor or nonprogressor patients to maximize the sensitivity of our classes: ‘low’ (probability (Pr) ≤ 0.3; sensitivity = 0.87, specificity = 0.65), ‘moderate’ (0.3 < Pr < 0.5) or ‘high’ (Pr ≥ 0.5; sensitivity = 0.72, specificity = 0.82).
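The three risk classes can be written as a small lookup, sketched here under the assumption that ‘moderate’ is the open interval between the two quoted cutoffs. The function name is hypothetical; only the 0.3 and 0.5 thresholds are taken from the text.

```python
def classify_risk(pr):
    """Map a predicted probability of progression to a risk class.

    Thresholds follow the text: low (Pr <= 0.3), high (Pr >= 0.5),
    and moderate for anything strictly in between.
    """
    if not 0.0 <= pr <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if pr <= 0.3:
        return "low"
    if pr >= 0.5:
        return "high"
    return "moderate"
```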

Fig. 2: Genomic predictions of BE progression.

a, Histogram of the log(RR) (x axis) of cancer progression across all samples (n = 773) in the discovery cohort, based on the leave-one-patient-out predictions (number of samples, y axis). The samples with the highest RR are predicted to have >20× greater risk of progression (red), whereas the samples with the lowest RR are predicted at a 10× lower risk (blue). The inset shows the calibration of the predicted (x axis, ratio of progressor:nonprogressor (P:NP) patient samples) and mean observed probability (y axis) of progression, evaluated in deciles. The ‘low’ (blue) and ‘high’ (red) risks are enriched for nonprogressor and progressor patients, respectively. Error bars show the 95% confidence interval of the observed:predicted ratio of patients in each decile. b,c, Rate of sample risk classifications in the discovery cohort of 88 patients (n = 773 samples) (b) plotted per pathology (for example, NDBE, ID, LGD, HGD, IMC). The blue bar in ID is 3.1%, and the blue and yellow bars in HGD are each 2.7%. These show that our model can predict progression before pathological changes are visible in NDBE samples and that these predictions are consistent in the independent validation cohort of 76 patients (n = 213 samples) (c). d, Illustration of risk classes across all samples in the discovery cohort (n = 773). The row above the line shows nonprogressor patients (n = 43), whereas the row below the line shows progressor patients (n = 45). Each box of tiles denotes samples from a single patient, indicated by the study-allocated patient number above each box. On the x axis endoscopies are plotted from the baseline on the left, to the endpoint (HGD/IMC in progressors, last available for nonprogressors) endoscopy on the right. The y axis indicates the relative location of the sample, starting from the sample nearest the esophageal–gastric junction (EGJ) at the bottom up the length of the BE segment. Pop-out: heatmap for patient 69 zoomed in to show the axis labels. Absolute time and location will be different in each patient. All heatmaps showing axis labels and pathology are included in Extended Data Fig. 5.

Samples from patients who progressed were classified as ‘high risk’ for progression independent of histopathology (Fig. 2b). Most importantly, CN profiles in NDBE samples that belonged to progressor patients were classified as high risk in 60.5% (104/172), whereas in nonprogressor patients 64.7% (224/346) of samples were classified as ‘low risk’.

The model was then used to predict and classify risks per sample for the validation cohort (76 patients, 213 samples). Of samples from nonprogressor patients, 78/142 (55%) were classified as low risk, and 55/71 (77%) of samples from patients who progressed were classified as high risk. As in the discovery cohort, high-risk classification of progressor patient samples was largely independent of histopathology (Fig. 2c). Similarly, when we used our model to classify the historical Seattle study patient dataset (n = 248, samples = 1,272 SNP array) we again found that samples from progressors were classified as high risk regardless of pathology (see Extended Data Figs. 3 and 4). However, in this case the algorithm unsurprisingly suffers a loss of accuracy due to the differences in the methodology (see Supplementary Information for complete analysis and endpoint differences).

When sample classifications were plotted according to their spatial distribution in the segment and time of collection in the clinical history, strikingly concordant patterns emerged. Most progressive patient samples are classed as high risk throughout the disease history, whereas nonprogressive patient samples are consistently low risk (Fig. 2d and see also Extended Data Fig. 5). This concordance is evident when we plot the highest risk at each time point per patient (Fig. 3a). For patients who progress, 50% (8/16) of endoscopies had at least one sample classified as high risk ≥8 years before transformation. This classification is in accordance with current diagnostic guidelines that require only a single dysplastic sample to recommend treatment for a patient (Fig. 3b). Cases who lacked early CN patterns of progression acquired these over the following years, leading to 78% (18/23) of endoscopies with at least one high-risk sample 1–2 years before HGD/IMC diagnosis.

Fig. 3: Cancer risk over time.

a, Per-endoscopy, mean aggregated risks plotted per patient (y axis) over time (x axis) in the months since the initial endoscopy (time 0). The lines between each time point are colored by the maximum (between the initial and final endoscopy) risk classification. The right plot shows patients who progressed, with most patient endoscopies consistently classified as ‘high’ risk. Similarly, in the left plot showing nonprogressor patients, there is a group consistently predicted as ‘low’ risk. The interesting patients are the nonprogressors who have consistently been ‘high’ risk. Follow-up continues on these patients and it is possible that they may ultimately progress to HGD/IMC. b, The progressive patients, using the highest risk (similar to the current guidelines using the highest pathology grade), show that CN can identify 50% of high-risk endoscopies in patients >8 years before HGD or cancer. Bars are the ratio of ‘high’-risk endoscopies to all endoscopies for that time period; error bars indicate mean ± s.e.m.

More interesting were the patients who have not yet progressed but display a consistent pattern of high-risk endoscopies. Two patients were high risk in every sequenced sample, whereas the remaining patients displayed a mix of risks at each time point (Fig. 2d), presenting what could be clonal diversity in very early progression to EAC (follow-up for these patients continues) and resulting in consistent high risk over time (Fig. 3a).

Statistical algorithms can be improved by increasing the size of the dataset. We therefore conducted subsampling of the discovery cohort with increasing numbers of patients and model training as described in Methods. With each increment in the number of patients the predictive accuracy of the model increased, reaching a (crossvalidated) AUC of 0.89 (specificity = 0.83, sensitivity = 0.82) when combining all discovery and validation patients (n = 164; see Extended Data Fig. 6), indicating that a larger knowledge bank of CN and progression data from BE will continue to improve the precision of patient stratification and the sensitivity of the model, by adding stronger statistical signals and accounting for broader biological variation.

Current guidelines for the management of BE focus on the length of the BE segment and the presence or absence of LGD/HGD in any biopsy sample taken during endoscopy32,33. Most of our patients were under treatment before the current treatment recommendations for LGD, and hence we can compare a set of recommendations based on the current guidelines33 with our model applying similar criteria, but overlaying our risk classifications (Fig. 4a). We applied these recommendations across our entire discovery cohort (88 patients) and evaluated the first 2 endoscopies available excluding the endpoint (Fig. 4b and see also Supplementary Table 2). Using these criteria at the patient’s second surveillance endoscopy available (that is, several years before transformation), 54% of progressor patients (19/35) would have received earlier treatment. Only five of these patients had repeat LGD diagnoses that could recommend earlier treatment or more aggressive surveillance under current pathology-based guidelines. Of progressor patients, 40% (14/35) would continue to receive yearly surveillance per current guidelines. The remaining 6% (2/35) would have been recommended reduced surveillance (3–5 years), but they would not have been diagnosed any differently under current guidelines because they were consistently NDBE. One patient (13) may have had delayed treatment, but this would have occurred under current guidelines as well because no dysplasia was identified before transformation. Of patients who have not progressed, 51% (21/41) would have less frequent endoscopies, 32% (13/41) would continue to receive yearly surveillance per current guidelines and 17% (7/41) would have had potentially unnecessary treatment compared with current guidelines. Three patients from our discovery cohort are shown with the guidelines compared (Fig. 4c–e) as examples. Furthermore, the increasing sensitivity of the model as samples are taken closer to the endpoint is evident, because most progressor patients are recommended treatment at their penultimate endoscopy whereas none would be recommended longer surveillance times.

Fig. 4: CN profiling facilitates earlier treatment and reduced monitoring.

a, A schematic overview of surveillance guidelines based on the CN model risk classes. It is important to note that these guidelines would apply at each endoscopy, and that they use information from the previous endoscopy to determine the treatment or surveillance. b, This schematic was used to characterize the discovery cohort patients after their second endoscopy (many years before dysplastic transformation); patients with only a single sequenced endoscopy before their endpoint are excluded for a total n = 76 patients (see Supplementary Table 2). The y axis provides the four recommendations in order from the schematic in a. All bars show the total number of patients for the specific recommendation split between nonprogressor (blue) and progressor (red) patients. In ‘3- to 5-year surveillance’ at the top, the blue bar indicates the number of nonprogressor patients who would have reduced treatment needs over time (n = 21), whereas, in the ‘RFA’ recommendation at the bottom, the red bar shows those progressor patients who would have had earlier intervention (n = 19). All patients in the middle two groups would receive the same surveillance as current guidelines recommend. c–e, Individual patients with each sample plotted at the time of endoscopy and location within the esophagus. Samples are colored based on their risk class, and shapes inside the tiles describe diagnosed histopathology of the sample. Relevant clinical information is included above each endoscopy plot, including the length of the BE segment and patient age at diagnosis. British Society of Gastroenterology’s (BSG) recommendations for each patient are based on the 2014 BE management guidelines33 and shown in gray text; the blue text indicates CN model recommendations from the schematic in a. EGD, esophagogastroduodenoscopy. Below the patients are the overall follow-up recommendations for the current guidelines and the CN model.

Recent evidence from large-scale pan-cancer studies has suggested that genomic alterations are present many years before detectable disease1 in many cancer types. BE constitutes a known pre-malignant condition with historical follow-up to test whether genomic medicine can contribute to early cancer detection. Previous studies of BE progression have shown that genomic and epigenetic changes are present before cancer progression and differ in patients who do ultimately develop cancer, including: p53 expression12,14, DNA-methylation changes17,18, CN losses and copy neutral loss of heterozygosity26,28,34, and high clonal diversity27.

However, our analysis has shown that even highly variable CN profiles generated from the entire biopsy sample (not dissected or separated) translate into surprisingly stable predictions of a patient’s risk of progression. Furthermore, these single-sample predictions were as accurate as aggregated data from multiple biopsies across the entire endoscopy or patient, showing that, despite high levels of divergence, there are common patterns of CN alterations indicative of progressive disease. This level of predictive power using a genome-wide algorithm is more challenging to achieve with a focused biomarker approach given the disease heterogeneity.

Perhaps most interesting for biomarker investigations is that, although our statistical model selects some genomic regions of instability as features that are known to be early drivers of EAC (for example, TP53; see Extended Data Fig. 7), few other features have any clearly associated tumor-suppressor genes or other cancer-related activity (see Supplementary Table 3). The heterogeneous nature of BE would partly explain the differences between the features our model selects as contributing to progression and those found in previous studies28; however, there is currently no clear functional explanation for most of the features identified. It is likely that the sum of many small changes and the breakdown of gene-regulatory control fuel oncogenicity.

Although the present study provides good evidence that genomic changes can predict future cancer risk, it is limited by the relatively small number of patients in the cohort, particularly patients who progress. Future studies that include more longitudinal genomic data will improve the sensitivity and specificity estimates of this model.

Ultimately, the combined use of low-cost genomic technologies, standard clinical samples and statistical modeling presented here is an example of how genomic medicine can be implemented for early detection of cancer. This demonstrates that genomic risk stratification has a realistic potential to enable earlier intervention for high-risk conditions, and at the same time reduce the intensity of monitoring and even reduce overtreatment in cases of stable disease.

Methods

Patient cohorts

A nested case–control cohort of 90 patients was initially recruited to the present study from patients who had been under surveillance for BE in the east of England from 2001 to 2016 for a total of 632 person-years. Permission to analyze existing clinical diagnostic samples was approved by the North West Preston Research Ethics Committee (REC 14-NW-0252). Cases comprised 45 patients who progressed from NDBE to HGD or IMC with a minimum follow-up of 1 year (mean ± s.d. = 4.6 ± 3.7 years). Controls were 45 patients who had not progressed beyond LGD, starting from NDBE with a minimum follow-up of 3 years (6.7 ± 3.2 years). Cases and controls were matched for age, gender and length of BE segment (see Supplementary Table 1). Patients had endoscopies at intervals determined by clinical guidelines with four-quadrant biopsies taken every 2 cm of BE length (the Seattle protocol). One nonprogressor patient revoked consent before analysis and a second nonprogressor was later removed during analysis when multiple comorbidities affecting the esophagus were identified. A total of 777 samples were sequenced, with 773 passing our post-processing quality control. An additional 8 technical replicates from 2 patients were sequenced for comparison, but only one set of replicates was included in the 773-sample set.

An independent, unmatched cohort of 76 patients was subsequently selected from patients under surveillance for BE in the east of England from 2001 to 2018 for model validation. This cohort comprised 18 patients who had progressed from NDBE to HGD or IMC with a minimum follow-up of 1 year (6.1 ± 3.4 years) and 58 patients who had not progressed beyond LGD starting from NDBE, with a minimum follow-up of 1.5 years (5.4 ± 3.0 years). The earliest available endoscopy samples subsequent to initial BE diagnosis were obtained to assess future risk. No diagnostic endpoint samples (for example, HGD or IMC) were included. This cohort was selected from available samples with no attempt to match demographics; however, no significant differences were found between the groups (see Supplementary Table 4). A total of 219 samples were sequenced from this cohort, with 213 passing our post-processing quality control.

Each sample from both cohorts was graded by multiple expert gastrointestinal histopathologists using current clinical guidelines for IMC, HGD, LGD, indeterminate (ID) and NDBE. A single biopsy graded as HGD or IMC was considered the endpoint for progression because patients were immediately recommended for treatment in the clinic. Since 2014, patients with LGD are also routinely treated with RFA, making prospective analysis of the real rate of progression difficult.

All patients had previously given informed consent to be part of the following studies: the Progressor study (REC 10/H0305/52, Cambridge South Research Ethics Committee), Barrett’s Biomarker Study (REC 01/149, Cambridge Central Research Ethics Committee), OCCAMS (REC 07/H0305/52 and 10/H0305/1, Cambridge South Research Ethics Committee), BEST (REC 06/Q0108/272, Cambridge Central Research Ethics Committee), BEST2 (REC 10/H0308/71, Cambridge Central Research Ethics Committee), Barrett’s Gene Study (REC 02/2/57, the London Multi-centre Research Ethics Committee), Time & TIME 2 (REC 09/H0308/118, Cambridge Central Research Ethics Committee), the NOSE study (REC 08/H0308/272, Cambridge Central Research Ethics Committee) and the Sponge study (REC 03/306, Cambridge Central Research Ethics Committee).

All patient and sample metadata were collected by study nurses at NHS Addenbrooke’s Hospital, UK and collated in Microsoft Excel 2016 spreadsheets.

Patient samples from the Seattle Barrett’s Esophagus Study31, which uses SNP arrays as an orthogonal measure of CN with an endpoint of EAC, were also included for further validation (see Supplementary Information).

Tissue sample processing and p53 immunohistochemistry

FFPE tissue samples from routine surveillance endoscopies were processed from scrolls, without microdissection because this protocol aims to be clinically relevant. Following the Seattle protocol for endoscopic surveillance, four-quadrant biopsies were taken every 1–2 cm of the BE length at each endoscopy per patient. At each 1- to 2-cm length the quadrant biopsies were pooled for sequencing as a single sample to ensure that sufficient DNA (75 ng) was present.

An additional section at each level of the Barrett’s segment (n = 88 patients, n = 590 sections) was stained (immunohistochemistry) using a monoclonal antibody for wild-type and mutant p53 (NCL-L-p53-D07, ready-to-use solution, protein concentration 10 mg ml−1) at the NHS Addenbrooke’s Hospital, UK on the Leica BOND-MAX system using Bond Polymer Refine Detection reagents (Leica Microsystems UK Ltd.), and graded by an expert pathologist as aberrant (absent or overexpressed) or normal35,36.

Shallow WGS pipeline

Single-end 50-bp sequencing was performed at a depth of 0.4× on the Illumina HiSeq platform. Sequence alignment was performed using BWA37 v.0.7.15, and pre-processing of the reads for mappability, GC content and filtering was performed with quantitative DNA-sequencing (QDNAseq)30 using 50-kb bins. Only autosomal sequences are retained after filtering due to low-depth mappability and GC correction. Samples were segmented for CN analysis using the piecewise constant fit function in the R Bioconductor ‘copynumber’ v.1.16 package38. Input to this function was the GC-adjusted read counts from QDNAseq.

Post-processing quality control

Per-segment residuals were calculated and the overall variance across the median absolute deviation of the segment residuals was derived as a per-sample quality control measure. This measure was developed using an additional set of samples (n = 233), from fresh–frozen tumor tissue, FFPE cell-line tissue and FFPE patient samples. No relationship was found between sample age and data quality, and post-segmentation quality issues were not resolvable (see Extended Data Fig. 8). Therefore, samples with a mean variance of the segment residuals >0.008 were excluded from analysis. This excluded more than 73% (171/233) of the quality control samples across all sample types (FFPE patient, FFPE cell line, fresh–frozen tumor). In the discovery cohort we excluded 0.5% (4/777) of samples and in the validation cohort 2.7% (6/219) of samples.
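A residual-based quality filter of this kind might be sketched as follows. This is one simplified reading of the description, not the published implementation: it assumes each segment is summarized by the variance of its bin-level residuals and that these variances are averaged per sample, and it omits the median-absolute-deviation step mentioned above. All names and the example values are hypothetical; only the 0.008 cutoff is from the text.

```python
from statistics import mean, pvariance

QC_CUTOFF = 0.008  # exclusion threshold quoted in the text


def mean_residual_variance(segments):
    """Average, over segments, the variance of the bin-level residuals.

    segments: list of (fitted_value, [bin_values]) pairs, where the
              residual of a bin is its value minus the segment's fit.
    Assumption: population variance around the residual mean is used.
    """
    variances = []
    for fitted, bins in segments:
        residuals = [b - fitted for b in bins]
        variances.append(pvariance(residuals))
    return mean(variances)


def passes_qc(segments, cutoff=QC_CUTOFF):
    """Keep a sample only if its mean residual variance is within the cutoff."""
    return mean_residual_variance(segments) <= cutoff


# Invented examples: a tightly fitted sample and a noisy one.
clean = [(1.0, [1.0, 1.01, 0.99]), (0.5, [0.5, 0.52, 0.48])]
noisy = [(1.0, [0.7, 1.3, 1.0, 1.2])]
```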

Statistical methods

We encoded all CN data on a genome-wide scale by taking a per-sample weighted average across the segmented values per 5-Mb window, and mean standardizing per genomic window across the entire cohort. To evaluate chromosomal instability on a larger scale, we averaged the segmented values across chromosome arms and adjusted each 5-Mb window by the difference between the window and the arm. The resulting data were 589 5-Mb windows and 44 chromosome arms. We additionally included a measure of genomic complexity (cx) by summing, per sample, the 5-Mb windows that had CN values 2 s.d. from the mean.
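The window encoding can be illustrated with a short sketch under stated assumptions: the per-window average is weighted by the length of overlap between each segment and the window, standardization is a z-score per window across samples, and complexity counts windows whose standardized value is at least 2 s.d. from the mean in absolute terms. None of these function names come from the study, and the arm-level adjustment is not shown.

```python
from statistics import mean, pstdev

WINDOW = 5_000_000  # 5-Mb window size from the text


def window_average(segments, start, end):
    """Overlap-weighted mean of segment values within [start, end).

    segments: list of (seg_start, seg_end, value) on one chromosome.
    Returns NaN when no segment overlaps the window.
    """
    total = weight = 0.0
    for seg_start, seg_end, value in segments:
        overlap = min(end, seg_end) - max(start, seg_start)
        if overlap > 0:
            total += value * overlap
            weight += overlap
    return total / weight if weight else float("nan")


def standardize_columns(matrix):
    """z-score each window (column) across all samples (rows)."""
    columns = list(zip(*matrix))
    mus = [mean(c) for c in columns]
    # A constant window has s.d. 0; fall back to 1.0 to avoid dividing by zero.
    sds = [pstdev(c) or 1.0 for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in matrix]


def complexity(z_row, cutoff=2.0):
    """Count windows at least `cutoff` s.d. from the mean (assumed definition)."""
    return sum(1 for z in z_row if abs(z) >= cutoff)


# Invented example: two segments covering one 5-Mb window.
segments = [(0, 2_000_000, 1.0), (2_000_000, 5_000_000, 2.0)]
avg = window_average(segments, 0, WINDOW)
z = standardize_columns([[1.0, 2.0], [3.0, 2.0]])
```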

We performed elastic-net regression with the R glmnet39 package to fit regression models with varying regularization parameters. Fivefold crossvalidation, repeated 10×, was performed on a per-patient basis, removing all samples from 20% of patients in each fold. This process was performed in three conditions: using all samples; excluding HGD/IMC samples; and excluding LGD/HGD/IMC. The two exclusion conditions were performed to assess the contribution of dysplasia to the classification rate of the model.
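Splitting by patient rather than by sample keeps all samples from one individual on the same side of each fold. A minimal sketch of such a repeated, patient-level fivefold split is given below; the function names, the seed and the example cohort are hypothetical, and the model fitting itself (done with glmnet in the study) is not reproduced.

```python
import random


def patient_folds(patient_ids, k=5, repeats=10, seed=0):
    """Yield (repeat, fold, held_out_patients) for repeated k-fold CV.

    Folds are formed over unique patients, so every sample from a
    held-out patient is removed from training together.
    """
    rng = random.Random(seed)
    patients = sorted(set(patient_ids))
    for r in range(repeats):
        shuffled = patients[:]
        rng.shuffle(shuffled)
        for f in range(k):
            yield r, f, set(shuffled[f::k])


def split_samples(sample_patients, held_out):
    """Return (train_indices, test_indices) for one fold."""
    train = [i for i, p in enumerate(sample_patients) if p not in held_out]
    test = [i for i, p in enumerate(sample_patients) if p in held_out]
    return train, test


# Invented cohort: 10 patients with 3 samples each.
sample_patients = [f"P{i // 3}" for i in range(30)]
folds = list(patient_folds(sample_patients))
```

With 10 patients and k = 5, each fold holds out 2 patients (20%), and the 10 repeats give 50 train/test splits in total.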

The model was additionally tuned on two parameters: (1) QDNAseq bin size and (2) elastic-net regression penalty, between 0 (ridge) and 1 (lasso). We assessed the crossvalidation classification performance of the model at multiple QDNAseq bin sizes and multiple regression penalties. We selected the final QDNAseq bin size by comparing the leave-one-patient-out predictions from the discovery cohort with the model predictions for the validation. This was done to minimize the batch errors in the raw data (see Extended Data Figs. 9 and 10). For the regression penalty parameter, all models had a crossvalidation classification rate of 72–75%. We therefore selected the parameter that limited the number of non-zero coefficients (n = 74) and was not full lasso (for example, 0.9). Coefficients determining the log(RR) of change stemming from a unit change were calculated for each genomic region selected.
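For reference, the penalized objective that glmnet minimizes for a binomial (logistic) model is shown below. The symbols are glmnet's, not this article's: the penalty tuned between 0 (ridge) and 1 (lasso) corresponds to the mixing parameter α, whereas λ sets the overall regularization strength and is varied along a path.

```latex
\min_{\beta_0,\,\beta}\;
  -\frac{1}{N}\sum_{i=1}^{N}
    \Bigl[\, y_i\,(\beta_0 + x_i^{\top}\beta)
      - \log\!\bigl(1 + e^{\beta_0 + x_i^{\top}\beta}\bigr) \Bigr]
  \;+\; \lambda \Bigl[\, \tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2
      + \alpha\,\lVert\beta\rVert_1 \Bigr]
```

Here y_i is the progressor (1) or nonprogressor (0) label of sample i and x_i its vector of standardized CN features. Values of α close to 1 push many coefficients to exactly zero, which is consistent with the small number of non-zero coefficients reported.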

Subsequently, a leave-one-patient-out analysis (excluding all samples of an individual) was performed to generate predictions for all samples from a single individual and estimate the overall model accuracy using the AUC of the ROC with the R pROC40 package.
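The leave-one-patient-out loop and the AUC calculation can be sketched in a few lines. This is an illustration only: the study used the R pROC package, whereas the version below computes the AUC as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one (ties counted as one half), which equals the area under the empirical ROC curve. Function names and example values are hypothetical.

```python
def leave_one_patient_out(patient_ids):
    """Yield (patient, train_indices, test_indices), one patient at a time."""
    for patient in sorted(set(patient_ids)):
        test = [i for i, p in enumerate(patient_ids) if p == patient]
        train = [i for i, p in enumerate(patient_ids) if p != patient]
        yield patient, train, test


def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented held-out predictions for four samples.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
splits = list(leave_one_patient_out(["A", "A", "B", "C"]))
```

In practice, a model would be fitted on each set of training indices and used to score the held-out patient's samples; the pooled held-out scores are then passed to the AUC function.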

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.