Skip to main content

Thank you for visiting nature.com. You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

  • Review Article
  • Published:

Big data in basic and translational cancer research

Abstract

Historically, the primary focus of cancer research has been molecular and clinical studies of a few essential pathways and genes. Recent years have seen the rapid accumulation of large-scale cancer omics data catalysed by breakthroughs in high-throughput technologies. This fast data growth has given rise to an evolving concept of ‘big data’ in cancer, whose analysis demands large computational resources and can potentially bring novel insights into essential questions. Indeed, the combination of big data, bioinformatics and artificial intelligence has led to notable advances in our basic understanding of cancer biology and to translational advancements. Further advances will require a concerted effort among data scientists, clinicians, biologists and policymakers. Here, we review the current state of the art and future challenges for harnessing big data to advance cancer research and treatment.

This is a preview of subscription content, access via your institution

Access options

Buy this article

USD 39.95

Prices may be subject to local taxes which are calculated during checkout

Fig. 1: Considerations for using big data in translational applications and basic research.
Fig. 2: Prospective clinical studies guided by omics data to use off-label drugs.
Fig. 3: Data-driven artificial intelligence to support cancer diagnosis.
Fig. 4: Design of new kinase inhibitors using a generative artificial intelligence model.

Similar content being viewed by others

References

  1. Hanahan, D. & Weinberg, R. A. Hallmarks of cancer: the next generation. Cell 144, 646–674 (2011).

    Article  CAS  PubMed  Google Scholar 

  2. Weinstein, J. N. et al. The cancer genome atlas pan-cancer analysis project. Nat. Genet. 45, 1113–110 (2013).

    Article  PubMed  PubMed Central  Google Scholar 

  3. Edgar, R., Domrachev, M. & Lash, A. E. Gene expression omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 30, 207–210 (2002).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  4. Deng, J. et al. ImageNet: a large-scale hierarchical image database. 2009 IEEE Conf. Computer Vis. Pattern Recognit. https://doi.org/10.1109/cvprw.2009.5206848 (2009).

    Article  Google Scholar 

  5. Stuart, T. & Satija, R. Integrative single-cell analysis. Nat. Rev. Genet. 20, 257–272 (2019).

    Article  CAS  PubMed  Google Scholar 

  6. Ji, A. L. et al. Multimodal analysis of composition and spatial architecture in human squamous cell carcinoma. Cell 182, 1661–1662 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  7. Deshwar, A. G. et al. PhyloWGS: reconstructing subclonal composition and evolution from whole-genome sequencing of tumors. Genome Biol. 16, 35 (2015).

    Article  PubMed  PubMed Central  Google Scholar 

  8. Roth, A. et al. PyClone: statistical inference of clonal population structure in cancer. Nat. Methods 11, 396–398 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  9. Miller, C. A. et al. SciClone: inferring clonal architecture and tracking the spatial and temporal patterns of tumor evolution. PLoS Comput. Biol. 10, e1003665 (2014).

    Article  PubMed  PubMed Central  Google Scholar 

  10. Carter, S. L. et al. Absolute quantification of somatic DNA alterations in human cancer. Nat. Biotechnol. 30, 413–421 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  11. Minussi, D. C. et al. Breast tumours maintain a reservoir of subclonal diversity during expansion. Nature 592, 302–308 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  12. Laks, E. et al. Clonal decomposition and DNA replication states defined by scaled single-cell genome sequencing. Cell 179, 1207–1221.e22 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  13. Zhao, T. et al. Spatial genomics enables multi-modal study of clonal heterogeneity in tissues. Nature 601, 85–91 (2022).

    Article  CAS  PubMed  Google Scholar 

  14. Przybyla, L. & Gilbert, L. A. A new era in functional genomics screens. Nat. Rev. Genet. 23, 89–103 (2022).

    Article  CAS  PubMed  Google Scholar 

  15. Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  16. Subramanian, A. et al. A next generation connectivity map: L1000 platform and the first 1,000,000 profiles. Cell 171, 1437–1452.e17 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  17. Shalem, O., Sanjana, N. E. & Zhang, F. High-throughput functional genomics using CRISPR-Cas9. Nat. Rev. Genet. 16, 299–311 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  18. Gilbert, L. A. et al. Genome-scale CRISPR-mediated control of gene repression and activation. Cell 159, 647–661 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  19. Tsherniak, A. et al. Defining a cancer dependency map. Cell 170, 564–576.e16 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  20. Johannessen, C. M. et al. A melanocyte lineage program confers resistance to MAP kinase pathway inhibition. Nature 504, 138–142 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  21. Robertson, G. et al. Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat. Methods 4, 651–657 (2007).

    Article  CAS  PubMed  Google Scholar 

  22. Hafner, M. et al. CLIP and complementary methods. Nat. Rev. Methods Prim. 1, 20 (2021).

    Article  CAS  Google Scholar 

  23. Vidal, M., Cusick, M. E. & Barabási, A.-L. Interactome networks and human disease. Cell 144, 986–998 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  24. Kempfer, R. & Pombo, A. Methods for mapping 3D chromosome architecture. Nat. Rev. Genet. 21, 207–226 (2020).

    Article  CAS  PubMed  Google Scholar 

  25. Liu, R. et al. Evaluating eligibility criteria of oncology trials using real-world data and AI. Nature 592, 629–633 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  26. van der Laak, J., Litjens, G. & Ciompi, F. Deep learning in histopathology: the path to the clinic. Nat. Med. 27, 775–784 (2021).

    Article  PubMed  Google Scholar 

  27. Hosny, A., Parmar, C., Quackenbush, J., Schwartz, L. H. & Hjwl, A. Artificial intelligence in radiology. Nat. Rev. Cancer 18, 500–510 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  28. Gillies, R. J., Kinahan, P. E. & Hricak, H. Radiomics: images are more than pictures, they are data. Radiology 278, 563–577 (2016).

    Article  PubMed  Google Scholar 

  29. Jiang, P. et al. Signatures of T cell dysfunction and exclusion predict cancer immunotherapy response. Nat. Med. 24, 1550–1558 (2018). This integrative study of tumour immune evasion across many clinical datasets reveals that SERPINB9 expression consistently correlates with intratumoural T cell dysfunction and resistance to immune checkpoint blockade.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  30. Parkinson, H. et al. ArrayExpress — a public database of microarray experiments and gene expression profiles. Nucleic Acids Res. 35, D747–D750 (2007).

    Article  CAS  PubMed  Google Scholar 

  31. Gentles, A. J. et al. The prognostic landscape of genes and infiltrating immune cells across human cancers. Nat. Med. 21, 938–945 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  32. Tomlins, S. A. et al. Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. Science 310, 644–648 (2005). This compendium analysis across 132 gene expression datasets representing 10,486 microarray experiments identifies ERG and ETV1 fused with TMPRSS2 as highly expressed genes in six independent prostate cancer cohorts.

    Article  CAS  PubMed  Google Scholar 

  33. Jiang, L. et al. Direct tumor killing and immunotherapy through anti-serpinB9 therapy. Cell 183, 1219–1233.e18 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  34. Jiang, P. et al. Systematic investigation of cytokine signaling activity at the tissue and single-cell levels. Nat. Methods 18, 1181–1191 (2021). This study describes a transcriptomic data atlas collected from cytokine treatments in bulk cell cultures, which enables the inference of signalling activities in bulk and single-cell transcriptomics data to study human inflammatory diseases.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  35. Leek, J. T. et al. Tackling the widespread and critical impact of batch effects in high-throughput data. Nat. Rev. Genet. 11, 733–739 (2010).

    Article  CAS  PubMed  Google Scholar 

  36. Johnson, W. E., Li, C. & Rabinovic, A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8, 118–127 (2007).

    Article  PubMed  Google Scholar 

  37. Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  38. Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902.e21 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  39. Nygaard, V., Rødland, E. A. & Hovig, E. Methods that remove batch effects while retaining group differences may lead to exaggerated confidence in downstream analyses. Biostatistics 17, 29–39 (2016).

    Article  PubMed  Google Scholar 

  40. Boehm, K. M., Khosravi, P., Vanguri, R., Gao, J. & Shah, S. P. Harnessing multimodal data integration to advance precision oncology. Nat. Rev. Cancer 22, 114–126 (2022).

    Article  CAS  PubMed  Google Scholar 

  41. Huang, C. et al. Proteogenomic insights into the biology and treatment of HPV-negative head and neck squamous cell carcinoma. Cancer Cell 39, 361–379.e16 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  42. Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573–3587.e29 (2021). This study integrates multiple single-cell data modalities, such as gene expression, cell-surface protein levels and chromatin accessibilities, to increase the accuracy of cell lineage clustering.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  43. Klein, M. I. et al. Identifying modules of cooperating cancer drivers. Mol. Syst. Biol. 17, e9810 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  44. Hofree, M., Shen, J. P., Carter, H., Gross, A. & Ideker, T. Network-based stratification of tumor mutations. Nat. Methods 10, 1108–1115 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  45. Reyna, M. A. et al. Pathway and network analysis of more than 2500 whole cancer genomes. Nat. Commun. 11, 729 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  46. Zheng, F. et al. Interpretation of cancer mutations using a multiscale map of protein systems. Science 374, eabf3067 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  47. Paull, E. O. et al. A modular master regulator landscape controls cancer transcriptional identity. Cell 184, 334–351 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  48. Avila Cobos, F., Alquicira-Hernandez, J., Powell, J. E., Mestdagh, P. & De Preter, K. Benchmarking of cell type deconvolution pipelines for transcriptomics data. Nat. Commun. 11, 5650 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  49. Newman, A. M. et al. Robust enumeration of cell subsets from tissue expression profiles. Nat. Methods 12, 453–457 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  50. Newman, A. M. et al. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat. Biotechnol. 37, 773–782 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  51. Wang, K. et al. Deconvolving clinically relevant cellular immune cross-talk from bulk gene expression using CODEFACS and LIRICS stratifies patients with melanoma to anti-PD-1 therapy. Cancer Discov. 12, 1088–1105 (2022). Together with Newman et al. (2019), this study demonstrates that assembling gene expression profiles of diverse cell types from existing datasets can enable deconvolution of cell fractions and lineage-specific expression in a bulk-tumour expression profile.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  52. Kharchenko, P. V., Silberstein, L. & Scadden, D. T. Bayesian approach to single-cell differential expression analysis. Nat. Methods 11, 740–742 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  53. Suvà, M. L. & Tirosh, I. Single-cell RNA sequencing in cancer: lessons learned and emerging challenges. Mol. Cell 75, 7–12 (2019).

    Article  PubMed  Google Scholar 

  54. Zhang, Y. et al. A T cell resilience model associated with response to immunotherapy in multiple tumor types. Nat. Med. https://doi.org/10.1038/s41591-022-01799-y (2022). This study uses a computational model to repurpose a vast amount of single-cell transcriptomics data and identify biomarkers of tumour-resilient T cells and new therapeutic targets, such as FIBP, to potentiate cellular immunotherapies.

    Article  PubMed  PubMed Central  Google Scholar 

  55. Gopalan, V. et al. A transcriptionally distinct subpopulation of healthy acinar cells exhibit features of pancreatic progenitors and PDAC. Cancer Res. 81, 3958–3970 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  56. Newman, A. M. et al. An ultrasensitive method for quantitating circulating tumor DNA with broad patient coverage. Nat. Med. 20, 548–554 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  57. Heitzer, E., Haque, I. S., Roberts, C. E. S. & Speicher, M. R. Current and future perspectives of liquid biopsies in genomics-driven oncology. Nat. Rev. Genet. 20, 71–88 (2019).

    Article  CAS  PubMed  Google Scholar 

  58. Hastie, T., Friedman, J. & Tibshirani, R. The Elements of Statistical Learning (Springer, 2001).

  59. Ma, J. et al. Few-shot learning creates predictive models of drug response that translate from high-throughput screens to individual patients. Nat. Cancer 2, 233–244 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  60. Raghu, M., Zhang, C., Kleinberg, J. & Bengio, S. Transfusion: understanding transfer learning for medical imaging. Adv. Neural Inf. Process. Syst. 33, 3347–3357 (2019).

    Google Scholar 

  61. Zoph, B. et al. Rethinking pre-training and self-training. Adv. Neural Inf. Process. Syst. 34, 3833–3845 (2020).

    Google Scholar 

  62. Meier, F. A., Varney, R. C. & Zarbo, R. J. Study of amended reports to evaluate and improve surgical pathology processes. Adv. Anat. Pathol. 18, 406–413 (2011).

    Article  PubMed  Google Scholar 

  63. Nakhleh, R. E. Error reduction in surgical pathology. Arch. Pathol. Lab. Med. 130, 630–632 (2006).

    Article  PubMed  Google Scholar 

  64. Nakhleh, R. E. et al. Interpretive diagnostic error reduction in surgical pathology and cytology: guideline from the College of American Pathologists Pathology and Laboratory Quality Center and the Association of Directors of Anatomic and Surgical Pathology. Arch. Pathol. Lab. Med. 140, 29–40 (2016).

    Article  CAS  PubMed  Google Scholar 

  65. Raab, S. S. et al. The ‘Big Dog’ effect: variability assessing the causes of error in diagnoses of patients with lung cancer. J. Clin. Oncol. 24, 2808–2814 (2006).

    Article  PubMed  Google Scholar 

  66. Jiang, P., Sellers, W. R. & Liu, X. S. Big data approaches for modeling response and resistance to cancer drugs. Annu. Rev. Biomed. Data Sci. 1, 1–27 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  67. van’t Veer, L. J. et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature 415, 530–536 (2002).

    Article  Google Scholar 

  68. Sparano, J. A. et al. Adjuvant chemotherapy guided by a 21-gene expression assay in breast cancer. N. Engl. J. Med. 379, 111–121 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  69. Kalinsky, K. et al. 21-gene assay to inform chemotherapy benefit in node-positive breast cancer. N. Engl. J. Med. 385, 2336–2347 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  70. Cardoso, F. et al. 70-gene signature as an aid to treatment decisions in early-stage breast cancer. N. Engl. J. Med. 375, 717–729 (2016).

    Article  CAS  PubMed  Google Scholar 

  71. Filipits, M. et al. A new molecular predictor of distant recurrence in ER-positive, HER2-negative breast cancer adds independent information to conventional clinical risk factors. Clin. Cancer Res. 17, 6012–6020 (2011).

    Article  CAS  PubMed  Google Scholar 

  72. Parker, J. S. et al. Supervised risk predictor of breast cancer based on intrinsic subtypes. J. Clin. Oncol. 27, 1160–1167 (2009).

    Article  PubMed  PubMed Central  Google Scholar 

  73. Early Breast Cancer Trialists’ Collaborative Group (EBCTCG). Effects of chemotherapy and hormonal therapy for early breast cancer on recurrence and 15-year survival: an overview of the randomised trials. Lancet 365, 1687–1717 (2005).

    Article  Google Scholar 

  74. You, Y. N., Rustin, R. B. & Sullivan, J. D. Oncotype DX® colon cancer assay for prediction of recurrence risk in patients with stage II and III colon cancer: a review of the evidence. Surg. Oncol. 24, 61–66 (2015).

    Article  PubMed  Google Scholar 

  75. Klein, E. A. et al. A 17-gene assay to predict prostate cancer aggressiveness in the context of Gleason grade heterogeneity, tumor multifocality, and biopsy undersampling. Eur. Urol. 66, 550–560 (2014).

    Article  PubMed  Google Scholar 

  76. Kratz, J. R. et al. A practical molecular assay to predict survival in resected non-squamous, non-small-cell lung cancer: development and international validation studies. Lancet 379, 823–832 (2012).

    Article  PubMed  PubMed Central  Google Scholar 

  77. Beaubier, N. et al. Integrated genomic profiling expands clinical options for patients with cancer. Nat. Biotechnol. 37, 1351–1360 (2019).

    Article  CAS  PubMed  Google Scholar 

  78. Snyder, A. et al. Genetic basis for clinical response to CTLA-4 blockade in melanoma. N. Engl. J. Med. 371, 2189–2199 (2014).

    Article  PubMed  PubMed Central  Google Scholar 

  79. Van Allen, E. M. et al. Genomic correlates of response to CTLA-4 blockade in metastatic melanoma. Science 350, 207–211 (2015).

    Article  PubMed  PubMed Central  Google Scholar 

  80. Rizvi, N. A. et al. Cancer immunology. Mutational landscape determines sensitivity to PD-1 blockade in non-small cell lung cancer. Science 348, 124–128 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  81. Zehir, A. et al. Mutational landscape of metastatic cancer revealed from prospective clinical sequencing of 10,000 patients. Nat. Med. 23, 703–713 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  82. Li, M. Statistical methods for clinical validation of follow-on companion diagnostic devices via an external concordance study. Stat. Biopharm. Res. 8, 355–363 (2016).

    Article  Google Scholar 

  83. Litchfield, K. et al. Meta-analysis of tumor- and T cell-intrinsic mechanisms of sensitization to checkpoint inhibition. Cell 184, 596–614.e14 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  84. Bielski, C. M. et al. Widespread selection for oncogenic mutant allele imbalance in cancer. Cancer Cell 34, 852–862.e4 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  85. El Tekle, G. et al. Co-occurrence and mutual exclusivity: what cross-cancer mutation patterns can tell us. Trends Cancer Res. 7, 823–836 (2021).

    Article  Google Scholar 

  86. Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  87. Cheng, Y. et al. Targeting epigenetic regulators for cancer therapy: mechanisms and advances in clinical trials. Signal Transduct. Target. Ther. 4, 62 (2019).

    Article  PubMed  PubMed Central  Google Scholar 

  88. Rodon, J. et al. Genomic and transcriptomic profiling expands precision cancer medicine: the WINTHER trial. Nat. Med. 25, 751–758 (2019). This study describes the WINTHER trial, which prospectively matched patients with advanced cancer to therapy on the basis of DNA sequencing or RNA expression data from tumour biopsies and concluded that both data types were of value for improving therapy recommendations.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  89. Pleasance, E. et al. Whole genome and transcriptome analysis enhances precision cancer treatment options. Ann. Oncol. https://doi.org/10.1016/j.annonc.2022.05.522 (2022).

    Article  PubMed  Google Scholar 

  90. Massard, C. et al. High-throughput genomics and clinical outcome in hard-to-treat advanced cancers: results of the MOSCATO 01 trial. Cancer Discov. 7, 586–595 (2017).

    Article  CAS  PubMed  Google Scholar 

  91. Tuxen, I. V. et al. Copenhagen Prospective Personalized Oncology (CoPPO) — clinical utility of using molecular profiling to select patients to phase I trials. Clin. Cancer Res. 25, 1239–1247 (2019).

    Article  PubMed  Google Scholar 

  92. Horak, P. et al. Comprehensive genomic and transcriptomic analysis for guiding therapeutic decisions in patients with rare cancers. Cancer Discov. 11, 2780–2795 (2021).

    Article  CAS  PubMed  Google Scholar 

  93. Von Hoff, D. D. et al. Pilot study using molecular profiling of patients’ tumors to find potential targets and select treatments for their refractory cancers. J. Clin. Oncol. 28, 4877–4883 (2010).

    Article  Google Scholar 

  94. Kato, S. et al. Real-world data from a molecular tumor board demonstrates improved outcomes with a precision N-of-one strategy. Nat. Commun. 11, 4965 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  95. Hoefflin, R. et al. Personalized clinical decision making through implementation of a molecular tumor board: a German single-center experience. JCO Precis. Oncol. 1–16 https://doi.org/10.1200/po.18.00105 (2018).

  96. Irmisch, A. et al. The Tumor Profiler Study: integrated, multi-omic, functional tumor profiling for clinical decision support. Cancer Cell 39, 288–293 (2021).

    Article  CAS  PubMed  Google Scholar 

  97. Cohen, Y. C. et al. Identification of resistance pathways and therapeutic targets in relapsed multiple myeloma patients through single-cell sequencing. Nat. Med. 27, 491–503 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  98. Lee, J. S. et al. Synthetic lethality-mediated precision oncology via the tumor transcriptome. Cell 184, 2487–2502.e13 (2021). This study demonstrates that integrating information regarding synthetic lethal interactions with tumour transcriptomics profiles can accurately score drug-target importance and predict clinical outcomes for a broad category of anticancer treatments.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  99. Zhang, B. et al. The tumor therapy landscape of synthetic lethality. Nat. Commun. 12, 1275 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  100. Pathria, G. et al. Translational reprogramming marks adaptation to asparagine restriction in cancer. Nat. Cell Biol. 21, 1590–1603 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  101. Feng, X. et al. A platform of synthetic lethal gene interaction networks reveals that the GNAQ uveal melanoma oncogene controls the Hippo pathway through FAK. Cancer Cell 35, (2019).

  102. Lee, J. S. et al. Harnessing synthetic lethality to predict the response to cancer treatment. Nat. Commun. 9, 2546 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  103. Cheng, K., Nair, N. U., Lee, J. S. & Ruppin, E. Synthetic lethality across normal tissues is strongly associated with cancer risk, onset, and tumor suppressor specificity. Sci. Adv. 7, eabc2100 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  104. Sahu, A. D. et al. Genome-wide prediction of synthetic rescue mediators of resistance to targeted and immunotherapy. Mol. Syst. Biol. 15, e8323 (2019).

    Article  PubMed  PubMed Central  Google Scholar 

  105. Elemento, O., Leslie, C., Lundin, J. & Tourassi, G. Artificial intelligence in cancer research, diagnosis and therapy. Nat. Rev. Cancer 21, 747–752 (2021).

    Article  CAS  PubMed  Google Scholar 

  106. Raciti, P. et al. Novel artificial intelligence system increases the detection of prostate cancer in whole slide images of core needle biopsies. Mod. Pathol. 33, 2058–2066 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  107. Office of the Commissioner. FDA authorizes software that can help identify prostate cancer. https://www.fda.gov/news-events/press-announcements/fda-authorizes-software-can-help-identify-prostate-cancer (2021).

  108. Campanella, G. et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat. Med. 25, 1301–1309 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  109. Litjens, G. et al. 1399 H&E-stained sentinel lymph node sections of breast cancer patients: the CAMELYON dataset. GigaScience 7, giy065 (2018).

    Article  PubMed Central  Google Scholar 

  110. Ehteshami Bejnordi, B. et al. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA 318, 2199–2210 (2017).

    Article  PubMed  PubMed Central  Google Scholar 

  111. Wulczyn, E. et al. Deep learning-based survival prediction for multiple cancer types using histopathology images. PLoS ONE 15, e0233678 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  112. Coudray, N. et al. Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning. Nat. Med. 24, 1559–1567 (2018).

    Article  CAS  PubMed  Google Scholar 

  113. Kather, J. N. et al. Deep learning can predict microsatellite instability directly from histology in gastrointestinal cancer. Nat. Med. 25, 1054–1056 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  114. Ardila, D. et al. End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography. Nat. Med. 25, 954–961 (2019).

    Article  CAS  PubMed  Google Scholar 

  115. Hosny, A. et al. Deep learning for lung cancer prognostication: a retrospective multi-cohort radiomics study. PLoS Med. 15, e1002711 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  116. Zviran, A. et al. Genome-wide cell-free DNA mutational integration enables ultra-sensitive cancer monitoring. Nat. Med. 26, 1114–1124 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  117. Mathios, D. et al. Detection and characterization of lung cancer using cell-free DNA fragmentomes. Nat. Commun. 12, 5060 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  118. Beshnova, D. et al. De novo prediction of cancer-associated T cell receptors for noninvasive cancer detection. Sci. Transl. Med. 12, eaaz3738 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  119. Katzman, J. L. et al. DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Med. Res. Methodol. 18, 24 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  120. Ching, T., Zhu, X. & Garmire, L. X. Cox-nnet: an artificial neural network method for prognosis prediction of high-throughput omics data. PLoS Comput. Biol. 14, e1006076 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  121. Kann, B. H., Hosny, A. & Hjwl, A. Artificial intelligence for clinical oncology. Cancer Cell 39, 916–927 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  122. Kadir, T. & Brady, M. Saliency, scale and image description. Int. J. Comput. Vis. 45, 83–105 (2001).

    Article  Google Scholar 

  123. Zhou, B., Khosla, A., Lapedriza, A., Oliva, A. & Torralba, A. Learning deep features for discriminative localization. 2016 IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR) https://doi.org/10.1109/cvpr.2016.319https://www.computer.org/csdl/proceedings/cvpr/2016/12OmNqH9hnp (2016).

  124. Wulczyn, E. et al. Interpretable survival prediction for colorectal cancer using deep learning. NPJ Digit. Med. 4, 71 (2020). This study clusters similar image patches related to colorectal cancer survival prediction to reveal that high-risk survival predictions are associated with a tumour–adipose feature, characterized by poorly differentiated tumour cells adjacent to adipose tissue.

    Article  Google Scholar 

  125. Buel, G. R. & Walters, K. J. Can AlphaFold2 predict the impact of missense mutations on structure? Nat. Struct. Mol. Biol. 29, 1–2 (2022).

    Article  CAS  PubMed  Google Scholar 

  126. US Food and Drug Administration. Evaluation of automatic class III designation for Paige Prostate. https://www.accessdata.fda.gov/cdrh_docs/reviews/DEN200080.pdf (2021).

  127. Calcoen, D., Elias, L. & Yu, X. What does it take to produce a breakthrough drug? Nat. Rev. Drug Discov. 14, 161–162 (2015).

    Article  PubMed  Google Scholar 

  128. Jayatunga, M. K. P., Xie, W., Ruder, L., Schulze, U. & Meier, C. AI in small-molecule drug discovery: a coming wave? Nat. Rev. Drug Discov. 21, 175–176 (2022).

    Article  CAS  PubMed  Google Scholar 

  129. Pushpakom, S. et al. Drug repurposing: progress, challenges and recommendations. Nat. Rev. Drug Discov. 18, 41–58 (2019).

    Article  CAS  PubMed  Google Scholar 

  130. Jahchan, N. S. et al. A drug repositioning approach identifies tricyclic antidepressants as inhibitors of small cell lung cancer and other neuroendocrine tumors. Cancer Discov. 3, 1364–1377 (2013).

    Article  CAS  PubMed  Google Scholar 

  131. Kuenzi, B. M. et al. Predicting drug response and synergy using a deep learning model of human cancer cells. Cancer Cell 38, 672–684.e6 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  132. Ling, A. & Huang, R. S. Computationally predicting clinical drug combination efficacy with cancer cell line screens and independent drug action. Nat. Commun. 11, 5848 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  133. Aissa, A. F. et al. Single-cell transcriptional changes associated with drug tolerance and response to combination therapies in cancer. Nat. Commun. 12, 1628 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  134. Menden, M. P. et al. Community assessment to advance computational prediction of cancer drug combinations in a pharmacogenomic screen. Nat. Commun. 10, 2674 (2019).

    Article  PubMed  PubMed Central  Google Scholar 

  135. Carvalho, D. M. et al. Repurposing vandetanib plus everolimus for the treatment of ACVR1-mutant diffuse intrinsic pontine glioma. Cancer Discov. https://doi.org/10.1158/2159-8290.CD-20-1201 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  136. Zhavoronkov, A. et al. Deep learning enables rapid identification of potent DDR1 kinase inhibitors. Nat. Biotechnol. 37, 1038–1040 (2019). This study describes a deep generative AI model, which enabled the design of new inhibitors of the receptor tyrosine kinase DDR1 by modelling molecule structures from a compound library, existing DDR1 inhibitors, non-kinase inhibitors and patented drugs.

    Article  CAS  PubMed  Google Scholar 

  137. Ruthotto, L. & Haber, E. An introduction to deep generative modeling. GAMM-Mitteilungen 44, e202100008 (2021).

    Article  Google Scholar 

  138. Wallach, I., Dzamba, M. & Heifets, A. AtomNet: a deep convolutional neural network for bioactivity prediction in structure-based drug discovery. Preprint at https://arxiv.org/abs/1510.02855 (2015).

  139. Ma, J., Sheridan, R. P., Liaw, A., Dahl, G. E. & Svetnik, V. Deep neural nets as a method for quantitative structure–activity relationships. J. Chem. Inf. Model. 55, 263–274 (2015).

    Article  CAS  PubMed  Google Scholar 

  140. Dagogo-Jack, I. & Shaw, A. T. Tumour heterogeneity and resistance to cancer therapies. Nat. Rev. Clin. Oncol. 15, 81–94 (2018).

    Article  CAS  PubMed  Google Scholar 

  141. Bansal, M. et al. A community computational challenge to predict the activity of pairs of compounds. Nat. Biotechnol. 32, 1213–1222 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  142. Ahmadi, S. et al. The landscape of receptor-mediated precision cancer combination therapy via a single-cell perspective. Nat. Commun. 13, 1613 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  143. Eduati, F. et al. Prediction of human population responses to toxic compounds by a collaborative competition. Nat. Biotechnol. 33, 933–940 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  144. Gayvert, K. M., Madhukar, N. S. & Elemento, O. A data-driven approach to predicting successes and failures of clinical trials. Cell Chem. Biol. 23, 1294–1301 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  145. McDermott, M. B. A. et al. Reproducibility in machine learning for health research: still a ways to go. Sci. Transl. Med. 13, eabb1655 (2021).

    Article  PubMed  Google Scholar 

  146. AP News. Caris Precision Oncology Alliance partners with the National Cancer Institute, part of the National Institutes of Health, to expand collaborative clinical research efforts. Associated Press https://apnews.com/press-release/pr-newswire/technology-science-business-health-cancer-221e9238956a7a4835be75cb65832573 (2021).

  147. Alvi, M. A., Wilson, R. H. & Salto-Tellez, M. Rare cancers: the greatest inequality in cancer research and oncology treatment. Br. J. Cancer 117, 1255–1257 (2017).

    Article  PubMed  PubMed Central  Google Scholar 

  148. Park, K. H. et al. Genomic landscape and clinical utility in Korean advanced pan-cancer patients from prospective clinical sequencing: K-MASTER program. Cancer Discov. 12, 938–948 (2022).

    Article  PubMed  Google Scholar 

  149. Bailey, M. H. et al. Retrospective evaluation of whole exome and genome mutation calls in 746 cancer samples. Nat. Commun. 11, 4748 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  150. Ellrott, K. et al. Scalable open science approach for mutation calling of tumor exomes using multiple genomic pipelines. Cell Syst. 6, 271–281.e7 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  151. Zare, F., Dow, M., Monteleone, N., Hosny, A. & Nabavi, S. An evaluation of copy number variation detection tools for cancer using whole exome sequencing data. BMC Bioinforma. 18, 286 (2017).

    Article  Google Scholar 

  152. Pan-cancer analysis of whole genomes. Nature 578, 82–93 (2020).

  153. Gawad, C., Koh, W. & Quake, S. R. Single-cell genome sequencing: current state of the science. Nat. Rev. Genet. 17, 175–188 (2016).

    Article  CAS  PubMed  Google Scholar 

  154. Corces, M. R. et al. The chromatin accessibility landscape of primary human cancers. Science 362, eaav1898 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  155. Buenrostro, J. D. et al. Single-cell chromatin accessibility reveals principles of regulatory variation. Nature 523, 486–490 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  156. Furey, T. S. ChIP–seq and beyond: new and improved methodologies to detect and characterize protein–DNA interactions. Nat. Rev. Genet. 13, 840–852 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  157. Rotem, A. et al. Single-cell ChIP-seq reveals cell subpopulations defined by chromatin state. Nat. Biotechnol. 33, 1165–1172 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  158. Papanicolau-Sengos, A. & Aldape, K. DNA methylation profiling: an emerging paradigm for cancer diagnosis. Annu. Rev. Pathol. 17, 295–321 (2022).

    Article  PubMed  Google Scholar 

  159. Smallwood, S. A. et al. Single-cell genome-wide bisulfite sequencing for assessing epigenetic heterogeneity. Nat. Methods 11, 817–820 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  160. Cieślik, M. & Chinnaiyan, A. M. Cancer transcriptome profiling at the juncture of clinical translation. Nat. Rev. Genet. 19, 93–109 (2018).

    Article  PubMed  Google Scholar 

  161. Macosko, E. Z. et al. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell 161, 1202–1214 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  162. Zheng, G. X. Y. et al. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8, 14049 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  163. Ramsköld, D. et al. Full-length mRNA-Seq from single-cell levels of RNA and individual circulating tumor cells. Nat. Biotechnol. 30, 777–782 (2012).

    Article  PubMed  PubMed Central  Google Scholar 

  164. Gierahn, T. M. et al. Seq-Well: portable, low-cost RNA sequencing of single cells at high throughput. Nat. Methods 14, 395–398 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  165. Rao, A., Barkley, D., França, G. S. & Yanai, I. Exploring tissue architecture using spatial transcriptomics. Nature 596, 211–220 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  166. Ståhl, P. L. et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science 353, 78–82 (2016).

    Article  PubMed  Google Scholar 

  167. Rodriques, S. G. et al. Slide-seq: a scalable technology for measuring genome-wide expression at high spatial resolution. Science 363, 1463–1467 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  168. Lee, J. H. et al. Fluorescent in situ sequencing (FISSEQ) of RNA for gene expression profiling in intact cells and tissues. Nat. Protoc. 10, 442–458 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  169. Ellis, M. J. et al. Connecting genomic alterations to cancer biology with proteomics: the NCI Clinical Proteomic Tumor Analysis Consortium. Cancer Discov. 3, 1108–1112 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  170. Li, J. et al. TCPA: a resource for cancer functional proteomics data. Nat. Methods 10, 1046–1047 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  171. Stoeckius, M. et al. Simultaneous epitope and transcriptome measurement in single cells. Nat. Methods 14, 865–868 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  172. Bendall, S. C. et al. Single-cell mass cytometry of differential immune and drug responses across a human hematopoietic continuum. Science 332, 687–696 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  173. Jackson, H. W. et al. The single-cell pathology landscape of breast cancer. Nature 578, 615–620 (2020).

    Article  CAS  PubMed  Google Scholar 

  174. Keren, L. et al. A structured tumor-immune microenvironment in triple negative breast cancer revealed by multiplexed ion beam imaging. Cell 174, 1373–1387.e19 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  175. Schürch, C. M. et al. Coordinated cellular neighborhoods orchestrate antitumoral immunity at the colorectal cancer invasive front. Cell 183, 838 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  176. Beckonert, O. et al. Metabolic profiling, metabolomic and metabonomic procedures for NMR spectroscopy of urine, plasma, serum and tissue extracts. Nat. Protoc. 2, 2692–2703 (2007).

    Article  CAS  PubMed  Google Scholar 

  177. Jang, C., Chen, L. & Rabinowitz, J. D. Metabolomics and isotope tracing. Cell 173, 822–837 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  178. Ghandi, M. et al. Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature 569, 503–508 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  179. Uhlén, M. et al. Tissue-based map of the human proteome. Science 347, 1260419 (2015).

    Article  PubMed  Google Scholar 

  180. Fedorov, A. et al. NCI Imaging Data Commons. Cancer Res 81, 4188–4193 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  181. Cerami, E. et al. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov. 2, 401–404 (2012).

    Article  PubMed  Google Scholar 

  182. Goldman, M. J. et al. Visualizing and interpreting cancer genomics data via the Xena platform. Nat. Biotechnol. 38, 675–678 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  183. Jiang, P., Freedman, M. L., Liu, J. S. & Liu, X. S. Inference of transcriptional regulation in cancers. Proc. Natl Acad. Sci. USA 112, 7731–7736 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  184. Sun, D. et al. TISCH: a comprehensive web resource enabling interactive single-cell transcriptome visualization of tumor microenvironment. Nucleic Acids Res. 49, D1420–D1430 (2021).

    Article  CAS  PubMed  Google Scholar 

  185. Kristiansen, G. Markers of clinical utility in the differential diagnosis and prognosis of prostate cancer. Mod. Pathol. 31, S143–S155 (2018).

    Article  PubMed  Google Scholar 

Download references

Acknowledgements

The authors are supported by the intramural research budget of the US National Cancer Institute.

Author information

Authors and Affiliations

Authors

Contributions

P.J. and E.R. designed the scope and structure of the Review, assembled write-up components and finalized the manuscript. C.S. wrote the text on tumour evolution and heterogeneity. S.H. wrote the text on transcriptional dysregulation. P.J. wrote the sections related to spatial genomics and artificial intelligence. P.J., E.R. and K.A. wrote the section on cancer diagnosis and treatment decisions. S.S. and P.J. prepared Tables 14.

Corresponding authors

Correspondence to Peng Jiang or Eytan Ruppin.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Reviews Cancer thanks Itai Yanai, Anjali Rao and the other, anonymous, reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

Array Express: https://www.ebi.ac.uk/arrayexpress/

CAMELYON: https://camelyon17.grand-challenge.org/

cBioportal: https://www.cbioportal.org/

CCLE: https://depmap.org/portal/ccle/

CPTAC: https://proteomics.cancer.gov/data-portal

CytoSig: https://cytosig.ccr.cancer.gov/

DepMap: https://depmap.org/portal

DNA sequencing costs: https://www.genome.gov/about-genomics/fact-sheets/DNA-Sequencing-Costs-Data

DrugCombDB: http://drugcombdb.denglab.org/

FDC: https://curate.ccr.cancer.gov/

GDC: https://gdc.cancer.gov/

GENIE: https://www.aacr.org/professionals/research/aacr-project-genie

GEO: https://www.ncbi.nlm.nih.gov/geo

Human Protein Atlas: https://www.proteinatlas.org/humanproteome/pathology

ICGC: https://dcc.icgc.org/

IDC: https://datacommons.cancer.gov/repository/imaging-data-commons

LINCS: https://clue.io/

PCAWG: https://dcc.icgc.org/pcawg

PRECOG: https://precog.stanford.edu/

RABIT: http://rabit.dfci.harvard.edu/

TARGET: https://ocg.cancer.gov/programs/target/data-matrix

TCIA: https://www.cancerimagingarchive.net/

TCGA: https://gdc.cancer.gov/

TIDE: http://tide.dfci.harvard.edu/

TISCH: http://tisch.comp-genomics.org/

Tres: https://resilience.ccr.cancer.gov/

UCSC Xena: https://xena.ucsc.edu/

Glossary

Few-shot learning

A machine learning method that classifies new data using only a few training samples by transferring knowledge from large, related datasets.

Saliency map

A map of important image locations that support machine learning outputs.

Class activation map

A coarse-resolution map of important image regions for predicting a specific class using activations and gradients in the final convolutional layer.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Jiang, P., Sinha, S., Aldape, K. et al. Big data in basic and translational cancer research. Nat Rev Cancer 22, 625–639 (2022). https://doi.org/10.1038/s41568-022-00502-0

Download citation

  • Accepted:

  • Published:

  • Version of record:

  • Issue date:

  • DOI: https://doi.org/10.1038/s41568-022-00502-0

This article is cited by

Search

Quick links

Nature Briefing: Cancer

Sign up for the Nature Briefing: Cancer newsletter — what matters in cancer research, free to your inbox weekly.

Get what matters in cancer research, free to your inbox weekly. Sign up for Nature Briefing: Cancer