Fig. 3: Evaluation of the effects of molecular weight and dataset size on PCA using molecular fingerprints. | Communications Chemistry

From: Structural Isomer Cumulative molecular fingerprinting method (SIC) for standardizing structural isomeric relationships

Structurally isomeric compounds with the same molecular formula were retrieved from PubChem. Red circles represent compounds with the formula C₆H₆O₂ (n = 377), and blue circles represent compounds with C₄₈H₈₉NO₁₈ (n = 31). The PCA distribution shows that conventional fingerprints are strongly influenced by dataset size and molecular weight, leading to biased chemical space representations.
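
The kind of analysis described in the caption can be sketched on synthetic data. The following is a minimal illustration, not the authors' pipeline: the bit vectors are random rather than real fingerprints, the vector length (64) and bit densities (0.10 and 0.60) are assumptions chosen only to mimic a small and a large molecule, and the helper names are hypothetical. Only the group sizes (377 and 31) come from the caption. The first principal axis is estimated by power iteration using the standard library alone.

```python
import random


def synthetic_fingerprints(n, n_bits, density, rng):
    """Return n random bit vectors; each bit is set with probability `density`.

    This is a stand-in for real molecular fingerprints (assumption).
    """
    return [[1.0 if rng.random() < density else 0.0 for _ in range(n_bits)]
            for _ in range(n)]


def first_principal_component(rows, n_iter=50):
    """Estimate the first principal axis by power iteration on centered data."""
    n, d = len(rows), len(rows[0])
    # Center on the pooled mean, which is weighted by group size.
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    v = [1.0 / d ** 0.5] * d  # unit-length starting vector
    for _ in range(n_iter):
        # w = X^T (X v), which is proportional to the covariance matrix times v.
        proj = [sum(x * vj for x, vj in zip(row, v)) for row in centered]
        w = [sum(p * row[j] for p, row in zip(proj, centered)) for j in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    scores = [sum(x * vj for x, vj in zip(row, v)) for row in centered]
    return v, scores


rng = random.Random(0)
# Group sizes follow the caption; densities are illustrative assumptions.
small = synthetic_fingerprints(377, 64, 0.10, rng)  # stands in for C6H6O2
large = synthetic_fingerprints(31, 64, 0.60, rng)   # stands in for C48H89NO18

axis, scores = first_principal_component(small + large)
mean_small = sum(scores[:377]) / 377
mean_large = sum(scores[377:]) / 31
separation = abs(mean_large - mean_small)
print(f"PC1 mean (n=377): {mean_small:.3f}")
print(f"PC1 mean (n=31):  {mean_large:.3f}")
print(f"separation along PC1: {separation:.3f}")
```

Because the data are centered on the pooled mean, the larger group necessarily sits closer to the origin along the first component, while the smaller group is pushed away from it. In this toy setup the first axis mostly tracks the difference in bit density between the two groups, which is one way that dataset size and molecular size can shape a PCA projection.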
