

Integration of mechanistic immunological knowledge into a machine learning pipeline improves predictions

Abstract

The dense network of interconnected cellular signalling responses that are quantifiable in peripheral immune cells provides a wealth of actionable immunological insights. Although high-throughput single-cell profiling techniques, including polychromatic flow and mass cytometry, have matured to a point that enables detailed immune profiling of patients in numerous clinical settings, the limited cohort size and high dimensionality of data increase the possibility of false-positive discoveries and model overfitting. We introduce a generalizable machine learning platform, the immunological Elastic-Net (iEN), which incorporates immunological knowledge directly into the predictive models. Importantly, the algorithm maintains the exploratory nature of the high-dimensional dataset, allowing for the inclusion of immune features with strong predictive capabilities even if not consistent with prior knowledge. In three independent studies our method demonstrates improved predictions for clinically relevant outcomes from mass cytometry data generated from whole blood, as well as a large simulated dataset. The iEN is available under an open-source licence.
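The idea of building prior knowledge into an elastic-net model can be illustrated with a minimal sketch. This is not the authors' implementation, and the exact form of the iEN prior is not given in the text above. The sketch assumes a generic approach: each feature column is multiplied by a hypothetical prior weight in [0, 1] before a standard elastic net is fitted, so features with low prior weight are effectively penalized more heavily but can still be selected if they are strongly predictive. The function names, the `prior` vector and the simulated data are all illustrative assumptions.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator used by the L1 part of the penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def elastic_net_cd(X, y, lam, alpha, n_iter=200):
    """Plain elastic net fitted by cyclic coordinate descent.

    Minimises (1/2n)||y - Xb||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||_2^2).
    No intercept and no standardisation, to keep the sketch short.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column can never enter the model
            resid += X[:, j] * beta[j]          # remove feature j's contribution
            rho = X[:, j] @ resid / n           # partial correlation with residual
            beta[j] = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            resid -= X[:, j] * beta[j]          # add the updated contribution back
    return beta


def prior_weighted_elastic_net(X, y, prior, lam, alpha):
    """Assumed weighting scheme: scale column j by prior[j], then fit.

    Because X[:, j] * prior[j] * b_w[j] == X[:, j] * b[j] with b = prior * b_w,
    a small prior[j] means a larger effective penalty on b[j].
    """
    beta_w = elastic_net_cd(X * prior, y, lam, alpha)
    return beta_w * prior  # coefficients on the original feature scale


# Simulated example: features 0 and 1 carry signal; the hypothetical prior
# fully trusts them, half-trusts features 2-4 and excludes feature 5.
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.1 * rng.standard_normal(40)
prior = np.array([1.0, 1.0, 0.5, 0.5, 0.5, 0.0])
beta = prior_weighted_elastic_net(X, y, prior, lam=0.1, alpha=0.5)
print(np.round(beta, 3))
```

With a prior of all ones the sketch reduces to an ordinary elastic net, and a prior of zero removes a feature entirely. Intermediate values leave the model free to keep a feature that the prior disfavours when the data support it, which is the exploratory behaviour the abstract describes.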

Fig. 1: The immunological Elastic-Net analysis pipeline.
Fig. 2: Integration of immunological priors.
Fig. 3: Prior knowledge and sparsification.
Fig. 4: Incorporation of prior knowledge improves predictions in two clinical studies and a simulated experiment.
Fig. 5: iEN is robust to errors in the prior knowledge tensor.

Data availability

Longitudinal term pregnancy raw data, processed data and source code for reproduction of the results are publicly available at http://flowrepository.org/id/FR-FCM-ZY3Q and http://flowrepository.org/id/FR-FCM-ZY3R for the original and validation studies, respectively. Similarly, chronic periodontitis raw data, processed data and source code for reproduction of the results are publicly available at https://flowrepository.org/id/FR-FCM-ZYT6.

Code availability

The iEN source code as well as scripts for reproduction of the results are available through: https://nalab.stanford.edu/immunological-elastic-net/ and https://github.com/Teculos/immunological-EN under an MIT licence with https://doi.org/10.5281/zenodo.3885868.

Acknowledgements

This study was supported by the March of Dimes Prematurity Research Center at Stanford (22-FY18-808), the Bill and Melinda Gates Foundation (OPP1112382, OPP1189911 and OPP1113682), the Department of Anesthesiology, Perioperative and Pain Medicine at Stanford University, the Robertson Foundation, NIH (1R01HL13984401A1, R35GM138353), the American Heart Association (18IPA34170507 and 19PABHI34580007), the Food and Drug Administration (FDA; HHSF223201610018C), the Burroughs Wellcome Fund (1019816) and the National Institutes of Health (NIH; R01AG058417, R01HL13984401, R21DE02772801 and R61NS114926). N.A., D.R.M. and G.P.N. were supported by US FDA contract no. HHSF223201610018C. N.S. was supported by a Stanford Immunology Training Grant (5 T32 AI07290-33). B.G. was supported by the NIH (K23GM111657) and the Doris Duke Charitable Foundation (2018100). D.G. and B.G. were supported by NIH R21DE02772801. L.P. and X.H. were supported by the Stanford Maternal and Child Health Research Institute. The authors are solely responsible for the content of this Article, which does not necessarily represent the official views of the US Department of Health and Human Services (HHS).

Author information

Contributions

A.C. conducted the data analysis and software development, generated figures and contributed to writing the manuscript. A.S.T. performed experiments, processed data, wrote the manuscript and produced figures. N.S., M.B., M.S.G., R.F., H.N., T.P., I.M., A.L.C., C.E. and M.X. contributed to the analysis plan, figure design and manuscript revisions. E.G., L.P., X.H., I.A.S., K.A. and D.G. designed and performed experiments. D.R.M., A.T., G.M.S., D.K.S., S.B., K.L.D., W.F., G.P.N., T.H. and R.T. contributed to the design and evaluation of the algorithm, and edited the manuscript. M.S.A. and B.G. coordinated the effort to collect and analyse biological data and interpreted the results, and contributed to writing the manuscript. N.A. conceived, designed and coordinated the study, interpreted data, and contributed to writing the manuscript. All authors read and approved the Article.

Corresponding author

Correspondence to Nima Aghaeepour.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Culos, A., Tsai, A.S., Stanley, N. et al. Integration of mechanistic immunological knowledge into a machine learning pipeline improves predictions. Nat Mach Intell 2, 619–628 (2020). https://doi.org/10.1038/s42256-020-00232-8

  • DOI: https://doi.org/10.1038/s42256-020-00232-8
