Fig. 1: Workflow of SigmaCCS. | Communications Chemistry

From: Highly accurate and large-scale collision cross sections prediction with graph neural networks

a Dataset curation: a curated dataset of 5597 experimental CCS values was used to train, validate and test the SigmaCCS model. It was obtained from CCSbase through a five-step cleaning pipeline. b Conformer generation: the molecular object of each molecule was constructed from its SMILES string, and the 3D conformer was generated with ETKDG and optimized with MMFF94. The attributes of each atom and bond in the molecule were calculated by RDKit. c Molecular graph construction: the molecular graph of each molecule was established by initializing the node attribute matrix, the edge attribute matrix, and the adjacency matrix from the molecule's connection table and the attributes calculated in the previous step. d Edge-conditioned convolution: the atom vector of each atom in the molecule was learned from the curated dataset with edge-conditioned convolution, and the molecular vector was generated from the atom vectors through global sum pooling. e Adduct encoding: the adduct ion type ([M+H]+, [M+Na]+, or [M-H]-) was encoded as a one-hot vector. The molecular vector and the one-hot vector of the adduct type were concatenated to obtain the feature vector. f CCS prediction: the feature vector was passed through the fully connected layers to the output layer to predict the CCS value. g Database generation: the SigmaCCS model was used to predict CCS values of 94,161,201 compounds in PubChem. CCS values were predicted for three adduct ions of each molecule, giving >280,000,000 predicted CCS values in the resulting CCS database. The complete workflow of SigmaCCS was implemented in Python (v3.7.7).
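
The pooling and encoding steps in panels d and e, and the count in panel g, can be illustrated with a minimal pure-Python sketch. It is not the SigmaCCS source code: the function names, the ordering of the one-hot positions, and the toy atom vectors are assumptions made for illustration, and the learned edge-conditioned convolution itself is not reproduced (the toy numbers merely stand in for its output).

```python
# Illustrative sketch only; names and values are hypothetical, not from SigmaCCS.
ADDUCTS = ["[M+H]+", "[M+Na]+", "[M-H]-"]  # adduct types named in the caption


def global_sum_pool(atom_vectors):
    """Sum per-atom vectors element-wise into a single molecular vector."""
    if not atom_vectors:
        raise ValueError("molecule has no atoms")
    return [sum(column) for column in zip(*atom_vectors)]


def one_hot_adduct(adduct):
    """Encode an adduct type as a one-hot vector (position order is an assumption)."""
    if adduct not in ADDUCTS:
        raise ValueError(f"unsupported adduct: {adduct}")
    return [1.0 if adduct == name else 0.0 for name in ADDUCTS]


def feature_vector(atom_vectors, adduct):
    """Concatenate the pooled molecular vector with the adduct one-hot vector."""
    return global_sum_pool(atom_vectors) + one_hot_adduct(adduct)


# Toy atom vectors: made-up numbers standing in for learned atom vectors.
toy_atoms = [[0.5, 1.0], [1.5, -1.0], [2.0, 3.0]]
features = feature_vector(toy_atoms, "[M+Na]+")
print(features)  # [4.0, 3.0, 0.0, 1.0, 0.0]

# Panel g: three adduct ions per compound.
n_compounds = 94_161_201
n_predictions = n_compounds * len(ADDUCTS)
print(n_predictions)  # 282483603
```

The last two lines check the arithmetic behind the caption: 94,161,201 compounds with three adduct ions each give 282,483,603 predictions, consistent with the stated >280,000,000.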
