Fig. 2: Comparison of predictive performance of deep learning CNN gene expression prediction models for crop plants under varying combinations of training data.
From: Deep learning the cis-regulatory code for gene expression in selected model plants

a Leaf model performance was estimated as prediction accuracy. For each crop plant reference species, A. thaliana (A. tha.), S. lycopersicum (S. lyc.), S. bicolor (S. bic.) and Z. mays (Z. may.), four gene expression prediction models were trained on different combinations and variations of the training data: single-species references (SSR), multi-species references (MSR), species-specific references with homologous sequences (SSRU) and shuffled-sequence controls (SSC) (Supplementary Data 5). Error bars represent 95% confidence intervals; asterisks indicate the significance of the two-sided t-test (p value: * ≤0.05, ** ≤0.01, *** ≤0.001); the number of observations is n = 5 (A. thaliana), n = 12 (S. lycopersicum), n = 10 (S. bicolor) and n = 10 (Z. mays) (Source Data).
b Root model performance was estimated as prediction accuracy (see the description of panel a).
c Cross-species performance of the leaf models was estimated by predicting the expression profiles of other species using species-specific models. The highest cross-species accuracy was 80.55%, obtained by testing the SSR model of S. bicolor on Z. mays. The lowest estimated accuracy was 66.62%, obtained by testing the SSR model of S. lycopersicum on S. bicolor.
d For the models generated with root transcript profiles, the highest cross-species accuracy was 81.68%, obtained by testing the SSR model of S. bicolor on Z. mays. The lowest estimated accuracy was 54.06%, obtained by testing the SSR model of S. bicolor on S. lycopersicum.
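
The asterisk notation and the accuracy percentages in this caption can be illustrated with a minimal sketch. It assumes that accuracy is the percentage of genes whose predicted expression class matches the observed class; the function names `significance_stars` and `accuracy_percent` are hypothetical and are not taken from the authors' code. Only the p value thresholds come from the caption.

```python
from typing import Sequence


def significance_stars(p_value: float) -> str:
    """Map a p value to the asterisk notation given in the caption."""
    if p_value <= 0.001:
        return "***"
    if p_value <= 0.01:
        return "**"
    if p_value <= 0.05:
        return "*"
    return ""  # not significant at the 0.05 level


def accuracy_percent(observed: Sequence[int], predicted: Sequence[int]) -> float:
    """Percentage of matching labels (assumed definition of accuracy)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("label sequences must be non-empty and of equal length")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)
```

For example, `significance_stars(0.004)` yields `"**"`, and a toy call such as `accuracy_percent([1, 0, 1, 1], [1, 0, 0, 1])` gives the share of correctly predicted labels for made-up data, not for any value reported in the figure.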