Fig. 3: Existing data and the proposed novel product generation procedure.

a t-distributed stochastic neighbor embedding (t-SNE) map of chemical structures encoded with Morgan fingerprints. Molecules were collected by random sampling from the PubChem database and from the compounds registered in the mass spectra used in this research. Source data are provided as a Source Data file. b t-SNE map of the archived MS data used in this research (see Figure S20 for an enlarged version). Each point represents a unique mass spectrum. Colors indicate the instrument operators (coded by letters) who recorded the mass spectra. Operator C registers mass spectra for the entire institute. Source data are provided as a Source Data file. c Functional groups and ligands used in the generation process: NHC, N-heterocyclic carbene; Ar, aryl group; Nu, nucleophile; EWG, electron-withdrawing group. d Ion formulas are generated by complete enumeration of all functional groups and ligands for each core. e Bar chart of the number of detected ions, categorized by transformation type. Source data are provided as a Source Data file.
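
The complete enumeration described for panel d can be illustrated with a minimal Python sketch. It is not the authors' implementation: the core names, the second ligand, and the function `enumerate_candidates` are hypothetical placeholders, and only the abbreviations NHC, Ar, Nu, and EWG come from panel c. The sketch assumes that each candidate pairs one core with one functional group and one ligand.

```python
from itertools import product

# Hypothetical placeholder labels (assumptions for illustration only);
# the actual cores, functional groups, and ligands are those in panel c.
CORES = ["core_1", "core_2"]
FUNCTIONAL_GROUPS = ["Ar", "Nu", "EWG"]
LIGANDS = ["NHC", "ligand_2"]


def enumerate_candidates(cores, groups, ligands):
    """Return every (core, functional group, ligand) combination."""
    return [
        {"core": core, "group": group, "ligand": ligand}
        for core, group, ligand in product(cores, groups, ligands)
    ]


candidates = enumerate_candidates(CORES, FUNCTIONAL_GROUPS, LIGANDS)
# The number of candidates equals the product of the three list sizes.
print(len(candidates))
```

Under these assumptions the number of candidates grows multiplicatively with the size of each list, so the full set of candidates remains countable in advance for any fixed choice of cores, functional groups, and ligands.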