Fig. 2: Workflow for calculating the SIC.

Structurally isomeric compounds are grouped by molecular formula, and predefined substructures are extracted from each compound. For each substructure, the planar distance from the geometric center of the molecule (Lsub) is calculated and compared with the median distance for that substructure across all compounds in the group (Lmedian). If the deviation exceeds a defined threshold, the substructure is considered distinct, and its contribution to the cumulative structural difference is weighted by the molecular weight of the substructure (M). Red circle: geometric center of the molecule, computed from atomic coordinates. Red arrows: distances from substructures to the molecular center (Lsub). Blue arrows: median distances for each substructure across the group (Lmedian).
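
The workflow in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the caption does not give the threshold value, the exact deviation measure, or the weighting formula, so the sketch assumes an absolute deviation |Lsub - Lmedian|, a hypothetical threshold of 0.5 (in coordinate units), and a contribution of M × deviation. The input layout and all function names are likewise hypothetical.

```python
from collections import defaultdict
from math import hypot
from statistics import median


def geometric_center(atoms):
    """Mean of the planar (x, y) atomic coordinates."""
    xs = [x for x, _ in atoms]
    ys = [y for _, y in atoms]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def substructure_distances(compound):
    """Lsub: planar distance from each substructure to the molecular center."""
    cx, cy = geometric_center(compound["atoms"])
    return {
        name: hypot(x - cx, y - cy)
        for name, ((x, y), _mass) in compound["substructures"].items()
    }


def structural_differences(compounds, threshold=0.5):
    """Cumulative, mass-weighted structural difference per compound.

    ASSUMPTIONS (not given in the caption): the deviation is
    |Lsub - Lmedian|, and a distinct substructure contributes
    M * deviation to the total.
    """
    # Step 1: group isomers by molecular formula.
    groups = defaultdict(list)
    for compound in compounds:
        groups[compound["formula"]].append(compound)

    scores = {}
    for members in groups.values():
        # Step 2: Lsub for every substructure of every compound in the group.
        l_sub = {c["name"]: substructure_distances(c) for c in members}

        # Step 3: Lmedian for each substructure across the group.
        per_substructure = defaultdict(list)
        for distances in l_sub.values():
            for sub, dist in distances.items():
                per_substructure[sub].append(dist)
        l_median = {sub: median(v) for sub, v in per_substructure.items()}

        # Step 4: accumulate weighted deviations that exceed the threshold.
        for c in members:
            total = 0.0
            for sub, dist in l_sub[c["name"]].items():
                deviation = abs(dist - l_median[sub])
                if deviation > threshold:
                    mass = c["substructures"][sub][1]
                    total += mass * deviation
            scores[c["name"]] = total
    return scores


# Toy input: three hypothetical isomers whose atoms are centered on the origin.
# Each substructure maps to ((x, y) position, molecular weight M).
compounds = [
    {"name": "A", "formula": "X", "atoms": [(-2, 0), (2, 0)],
     "substructures": {"OH": ((1, 0), 17.0)}},
    {"name": "B", "formula": "X", "atoms": [(-2, 0), (2, 0)],
     "substructures": {"OH": ((1, 0), 17.0)}},
    {"name": "C", "formula": "X", "atoms": [(-2, 0), (2, 0)],
     "substructures": {"OH": ((0, 3), 17.0)}},
]
scores = structural_differences(compounds)
print(scores)  # {'A': 0.0, 'B': 0.0, 'C': 34.0}
```

In the toy data, Lmedian for the OH substructure is 1, so only compound C (Lsub = 3) exceeds the assumed threshold and receives a nonzero score of 17 × 2 = 34.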