Fig. 2: Classification of mutant catalytic defects with machine learning algorithms. | Nature Communications

From: Widespread epistasis shapes RNA polymerase II active site function and evolution

a ROC curves of two multiple logistic regression models. Using 65 mutants with validated in vitro catalytic defects and conditional growth fitness, we trained two models to classify variants as GOF or LOF. The GOF AUROC is 0.9889 (P ≤ 0.0001) and the LOF AUROC is 0.9914 (P ≤ 0.0001). The predicted vs. observed graphs show GOF/LOF probabilities for the 65 known mutants; lines at 0.75 mark the probability threshold used to call a mutation GOF or LOF. Details are in Supplementary Table 6. Among the 6054 viable mutants, 1390 were classified as GOF (22.96%), 1702 as LOF (28.11%), and 2962 remained unclassified (48.93%).

b Left: t-SNE projection of all mutants (n = 15174) with perplexity = 50. Right: k-means clustering of all mutants. The t-SNE and k-means projections suggest that GOF mutants fall into 3 clusters (2, 14, and 16), LOF mutants into 2 clusters (3 and 18), and unclassified mutants into 2 clusters (11 and 15). Most ultra-sick/lethal mutants (fitness ≤ −6.5) are projected together into 13 clusters, likely due to significant noise from low read counts across conditions.

c Feature plot of viable mutations in t-SNE and k-means projections (n = 6054). Ultra-sick/lethal mutations were removed, and the viable mutants were projected with t-SNE (perplexity = 100) and k-means (10 clusters). GOF mutants were grouped into 4 clusters (4, 5, 7, and 10) and LOF mutants into 4 clusters (1, 3, 6, and 9). Each spot in the projection represents a mutant, colored by its fitness in selective conditions. GOF and LOF mutants in different clusters are associated with different phenotype patterns. GOF clusters 7 and 10 are defined by strong MPAS, while clusters 4 and 5 show slight MPAS, GalR, and MnS, but strong Lys+. Slight FormS is a common feature across the four GOF clusters. LOF clusters 3 and 6 show slight MnR, while clusters 1 and 9 are strongly MnR and GalR. Cluster 8, which mostly contains unclassified mutants, appears to be defined by Gal super-sensitivity, indicating a potential specific defect defining this cluster.
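The thresholding step in panel a can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the example probabilities, and the handling of edge cases are assumptions made for illustration. In particular, the caption does not say whether the 0.75 cutoff is inclusive, or how a mutant scoring above the cutoff in both models is treated; here the cutoff is taken as inclusive and such a mutant is left unclassified. The second part recomputes the reported percentages from the counts given in the caption.

```python
from collections import Counter

THRESHOLD = 0.75  # probability cutoff stated in the caption


def classify(p_gof: float, p_lof: float, threshold: float = THRESHOLD) -> str:
    """Label one mutant from the outputs of the two models.

    Assumptions (not stated in the caption): the cutoff is inclusive, and a
    mutant at or above the cutoff in both models stays unclassified.
    """
    is_gof = p_gof >= threshold
    is_lof = p_lof >= threshold
    if is_gof and not is_lof:
        return "GOF"
    if is_lof and not is_gof:
        return "LOF"
    return "unclassified"


def class_percentages(labels):
    """Return the percentage of each label, rounded to two decimals."""
    counts = Counter(labels)
    total = len(labels)
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


# Hypothetical probabilities (GOF model, LOF model), invented for illustration.
example = {
    "mutant_A": (0.92, 0.03),
    "mutant_B": (0.10, 0.81),
    "mutant_C": (0.40, 0.35),
    "mutant_D": (0.80, 0.78),
}
labels = {name: classify(p_gof, p_lof) for name, p_gof, p_lof in
          ((name, *probs) for name, probs in example.items())}

# Recompute the caption's percentages from its reported counts (n = 6054).
reported = ["GOF"] * 1390 + ["LOF"] * 1702 + ["unclassified"] * 2962
percentages = class_percentages(reported)
print(labels)
print(percentages)
```

Treating double-positive mutants as unclassified is the conservative choice in this sketch; a different tie-breaking rule would change only the final `return` branch.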