Fig. 3: Transcriptomic disorder can be quantified by entropy, which predicts fitness.

a Transcriptomic responses of wild-type T4 and VNC-adapted T4 to 1× MIC-wt of vancomycin. Differential expression (DE) of each gene over time is represented as a line. The response of the wild type is more disordered than the adapted response and has higher entropy. b Entropy captures disorder in a transcriptome and not simply high-magnitude changes. The top panel shows three hypothetical scenarios in which the DE of four individual genes is tracked over time. In scenarios 1 and 2, the individual genes are dependent on each other and follow similar transcriptional trajectories. In scenario 3, dependencies are largely absent and the overall changes in DE appear much more disordered. The bottom panel compares magnitude changes (blue, quantified as the sum of absolute DE) and entropy (red) for the three scenarios. While the largest changes in magnitude are in scenario 1, scenarios 1 and 2 both have relatively low entropy because of the dependencies among genes. In scenario 3, overall DE is similar to that in the other two scenarios, but the changes have lost much of their dependency and have become disordered, resulting in high entropy. c Selection of the regularization parameter ρ. Fivefold cross-validation was used to determine the best choice of ρ. Error (1 − accuracy) is reported as the mean ± standard deviation across n = 5 folds. The value of ρ that minimizes the mean cross-validation error is 1.5 (red dashed line). d Performance of temporal entropy-based fitness prediction, shown as receiver operating characteristic (ROC) curves plotting sensitivity against the false-positive rate across a range of thresholds for the training (black) and test (red) datasets. The area under the ROC curve (AUROC) shows how well the predictor can separate high and low fitness. The AUROC is 0.89 and 0.94 for the training and test sets, respectively.
e Performance of temporal entropy-based fitness prediction, shown as precision-recall (PR) curves plotting precision against recall across a range of thresholds for the training (black) and test (red) datasets. The area under the PR curve (AUPRC) shows how well the predictor can detect high-fitness cases. The AUPRC is 0.88 and 0.98 for the training and test sets, respectively. f Entropy of all experiments in the training (top panel) and test (bottom panel) sets. Each experiment is represented as an individual bar, colored according to the experimentally determined fitness outcome. Bars above the entropy threshold (Entropy = 1066.25) are predicted to be low fitness, and bars below the threshold are predicted to be high fitness. Accuracy is 0.97 for the training set and 0.84 for the test set.
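
The threshold rule in panel f and the ROC summary in panel d can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the entropy values and fitness labels are invented toy data, the function names are hypothetical, and treating low fitness as the positive class is an assumption, as is assigning a value exactly at the threshold to high fitness. Only the threshold 1066.25 is taken from the caption.

```python
from typing import List, Sequence, Tuple

ENTROPY_THRESHOLD = 1066.25  # threshold reported in panel f


def predict_low_fitness(entropy: float, threshold: float = ENTROPY_THRESHOLD) -> bool:
    """Return True (low fitness) when entropy lies above the threshold.

    Assumption: a value exactly at the threshold counts as high fitness.
    """
    return entropy > threshold


def accuracy(entropies: Sequence[float], is_low: Sequence[bool],
             threshold: float = ENTROPY_THRESHOLD) -> float:
    """Fraction of experiments whose predicted class matches the measured one."""
    correct = sum(predict_low_fitness(e, threshold) == y
                  for e, y in zip(entropies, is_low))
    return correct / len(entropies)


def roc_points(entropies: Sequence[float],
               is_low: Sequence[bool]) -> List[Tuple[float, float]]:
    """Sweep the threshold over observed entropies, from high to low.

    Returns (false-positive rate, sensitivity) pairs, with low fitness
    treated as the positive class (an assumption of this sketch).
    """
    n_pos = sum(is_low)
    n_neg = len(is_low) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(entropies), reverse=True):
        tp = sum(1 for e, y in zip(entropies, is_low) if e >= t and y)
        fp = sum(1 for e, y in zip(entropies, is_low) if e >= t and not y)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auroc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Invented toy data: six experiments, NOT values from the study.
toy_entropy = [350.0, 720.0, 990.0, 1100.0, 1480.0, 2100.0]
toy_is_low = [False, False, True, False, True, True]

toy_accuracy = accuracy(toy_entropy, toy_is_low)
toy_auroc = auroc(roc_points(toy_entropy, toy_is_low))
print(f"accuracy = {toy_accuracy:.3f}, AUROC = {toy_auroc:.3f}")
```

Accuracy depends on the single chosen threshold, whereas the AUROC summarizes the ranking across all thresholds, which is why the two metrics need not move together between the training and test sets. For real data, an established library routine for ROC curves would normally be preferable to this hand-rolled sweep.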