Abstract
The dense network of interconnected cellular signalling responses that are quantifiable in peripheral immune cells provides a wealth of actionable immunological insights. Although high-throughput single-cell profiling techniques, including polychromatic flow and mass cytometry, have matured to a point that enables detailed immune profiling of patients in numerous clinical settings, the limited cohort size and high dimensionality of data increase the possibility of false-positive discoveries and model overfitting. We introduce a generalizable machine learning platform, the immunological Elastic-Net (iEN), which incorporates immunological knowledge directly into the predictive models. Importantly, the algorithm maintains the exploratory nature of the high-dimensional dataset, allowing for the inclusion of immune features with strong predictive capabilities even if not consistent with prior knowledge. In three independent studies our method demonstrates improved predictions for clinically relevant outcomes from mass cytometry data generated from whole blood, as well as a large simulated dataset. The iEN is available under an open-source licence.
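The idea of building prior knowledge directly into an elastic net can be illustrated with a minimal sketch. The abstract does not give the iEN formulation, so this is not the authors' implementation. It assumes prior knowledge is encoded as a per-feature scaling factor `phi` (values above 1 for features consistent with prior knowledge), which lowers the effective penalty on those features while leaving all other features eligible for selection. The function names, `phi`, `lam` and `alpha` are hypothetical, and the solver is a plain coordinate-descent elastic net written with NumPy only.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator used by the L1 part of the penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def elastic_net_cd(X, y, lam, alpha, n_iter=200):
    """Coordinate descent for
    (1/2n)||y - X b||^2 + lam * (alpha * ||b||_1 + (1 - alpha)/2 * ||b||_2^2).
    """
    n, p = X.shape
    beta = np.zeros(p)
    resid = y - X @ beta              # residual for the current coefficients
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            denom = col_sq[j] + lam * (1.0 - alpha)
            if denom == 0.0:          # all-zero column with a pure L1 penalty
                continue
            resid += X[:, j] * beta[j]        # remove feature j's contribution
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam * alpha) / denom
            resid -= X[:, j] * beta[j]        # add the updated contribution back
    return beta


def prior_weighted_elastic_net(X, y, phi, lam, alpha):
    """Assumed encoding of prior knowledge: rescale column j by phi[j] before
    fitting, then map the coefficients back to the original feature scale."""
    beta_scaled = elastic_net_cd(X * phi, y, lam, alpha)
    return beta_scaled * phi


# Synthetic example: 40 samples, 10 features, signal in features 0 and 1.
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 10))
y = 0.3 * X[:, 0] + 0.3 * X[:, 1] + 0.5 * rng.standard_normal(40)

phi = np.ones(10)
phi[0] = 2.0   # hypothetical prior: feature 0 agrees with known biology

beta_plain = elastic_net_cd(X, y, lam=0.3, alpha=0.5)
beta_prior = prior_weighted_elastic_net(X, y, phi, lam=0.3, alpha=0.5)
```

With every entry of `phi` set to 1 the sketch reduces to an ordinary elastic net, so the prior only shifts the penalty rather than excluding features that lack prior support.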
Data availability
Longitudinal term pregnancy raw data, processed data and source code for reproduction of the results are publicly available at http://flowrepository.org/id/FR-FCM-ZY3Q and http://flowrepository.org/id/FR-FCM-ZY3R for the original and validation studies, respectively. Similarly, chronic periodontitis raw data, processed data and source code for reproduction of the results are publicly available at https://flowrepository.org/id/FR-FCM-ZYT6.
Code availability
The iEN source code and the scripts for reproducing the results are available at https://nalab.stanford.edu/immunological-elastic-net/ and https://github.com/Teculos/immunological-EN under an MIT licence (https://doi.org/10.5281/zenodo.3885868).
Acknowledgements
This study was supported by the March of Dimes Prematurity Research Center at Stanford (22-FY18-808), the Bill and Melinda Gates Foundation (OPP1112382, OPP1189911 and OPP1113682), the Department of Anesthesiology, Perioperative and Pain Medicine at Stanford University, the Robertson Foundation, NIH (1R01HL13984401A1, R35GM138353), the American Heart Association (18IPA34170507 and 19PABHI34580007), the Food and Drug Administration (FDA; HHSF223201610018C), the Burroughs Wellcome Fund (1019816) and the National Institutes of Health (NIH; R01AG058417, R01HL13984401, R21DE02772801 and R61NS114926). N.A., D.R.M. and G.P.N. were supported by US FDA contract no. HHSF223201610018C. N.S. was supported by a Stanford Immunology Training Grant (5 T32 AI07290-33). B.G. was supported by the NIH (K23GM111657) and the Doris Duke Charitable Foundation (2018100). D.G. and B.G. were supported by NIH R21DE02772801. L.P. and X.H. were supported by the Stanford Maternal and Child Health Research Institute. The authors are solely responsible for the content of this Article, which does not necessarily represent the official views of the US Department of Health and Human Services (HHS).
Author information
Contributions
A.C. conducted the data analysis and software development, generated figures and contributed to writing the manuscript. A.S.T. performed experiments, processed data, wrote the manuscript and produced figures. N.S., M.B., M.S.G., R.F., H.N., T.P., I.M., A.L.C., C.E. and M.X. contributed to the analysis plan, figure design and manuscript revisions. E.G., L.P., X.H., I.A.S., K.A. and D.G. designed and performed experiments. D.R.M., A.T., G.M.S., D.K.S., S.B., K.L.D., W.F., G.P.N., T.H. and R.T. contributed to the design and evaluation of the algorithm, and edited the manuscript. M.S.A. and B.G. coordinated the effort to collect and analyse biological data and interpreted the results, and contributed to writing the manuscript. N.A. conceived, designed and coordinated the study, interpreted data, and contributed to writing the manuscript. All authors read and approved the Article.
Ethics declarations
Competing interests
The authors declare no competing interests.
Supplementary information
Supplementary Information
Supplementary Figs. 1–10 and Tables 1–6.
About this article
Cite this article
Culos, A., Tsai, A.S., Stanley, N. et al. Integration of mechanistic immunological knowledge into a machine learning pipeline improves predictions. Nat Mach Intell 2, 619–628 (2020). https://doi.org/10.1038/s42256-020-00232-8
DOI: https://doi.org/10.1038/s42256-020-00232-8