Fig. 7: Compound class distribution. | Nature Communications

From: Coverage bias in small molecule machine learning

Compound classes are not represented equally in the datasets. Here, all ClassyFire compound classes occurring in at least 5% of the biomolecular structures are investigated. The occurrences are shown for structures of the datasets BACE, BBBP, ClinTox, Delaney, Lipo, SIDER, Tox21, ToxCast, SMRT, and MS/MS. Some molecular structures could not be classified by ClassyFire and were discarded: this applies to seven structures from BBBP, one from SIDER, four from ToxCast, and 1275 biomolecular structures. Compound classes with at least 15 occurrences in all datasets are omitted; this applies to 65 compound classes. See Supplementary Fig. 9 for a larger subset of ClassyFire classes at a 1% cutoff.
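
The two selection rules in the caption can be illustrated with a minimal sketch. This is not the authors' code: the function name `select_classes`, the class names, and all counts below are hypothetical toy values, and the sketch assumes that "at least 15 occurrences in all datasets" means at least 15 occurrences in each individual dataset.

```python
def select_classes(reference_counts, n_reference, dataset_counts,
                   min_fraction=0.05, min_occurrences=15):
    """Return the compound classes that would be shown under the caption's rules.

    reference_counts: class -> number of biomolecular structures in that class
    n_reference:      total number of classified biomolecular structures
    dataset_counts:   dataset name -> (class -> occurrences in that dataset)
    """
    selected = []
    for cls, count in reference_counts.items():
        # Rule 1: keep only classes covering at least 5% of biomolecular structures.
        if count < min_fraction * n_reference:
            continue
        # Rule 2 (assumed reading): omit classes that reach the occurrence
        # threshold in every single dataset.
        well_covered = all(
            counts.get(cls, 0) >= min_occurrences
            for counts in dataset_counts.values()
        )
        if not well_covered:
            selected.append(cls)
    return sorted(selected)


# Hypothetical toy data, not taken from the paper.
reference_counts = {"Benzenoids": 300, "Steroids": 80, "Lipids": 20}
dataset_counts = {
    "A": {"Benzenoids": 50, "Steroids": 3},
    "B": {"Benzenoids": 40, "Steroids": 30},
}
shown = select_classes(reference_counts, 1000, dataset_counts)
print(shown)
```

In this toy example, the class below the 5% cutoff is skipped, the class that is well covered in both datasets is omitted, and only the class that is sparse in one dataset remains.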