Fig. 1: Overview of the t-SMILES algorithm. | Nature Communications

From: t-SMILES: a fragment-based molecular representation framework for de novo ligand design

a SMILES of the molecule Celecoxib. b The structural formula of Celecoxib, colored to indicate its fragments. c The molecule is broken down into fragments, each marked with a different color. The thumbnail illustrates the overall topological structure of the fragmented molecule. d AMT (Acyclic Molecular Tree) of the fragmented molecule. Each fragment is presented using both its SMILES code and its structural formula. e FBT (Full Binary Tree) of the fragmented molecule. Each tree node holds either a fragment or the newly introduced symbol “&”. “L” and “R” refer to the left and right sub-trees. “L1”–“L7” refer to the layers of the FBT. In L5, the new symbol “^” is used to separate two pieces in the t-SMILES string. f TSID (t-SMILES with ID and dummy atom) and TSDY (t-SMILES with dummy atom but without ID) codes of Celecoxib. The colors in the t-SMILES string match the corresponding fragments in the structural formula.

The molecular graph is first decomposed using a selected molecular fragmentation algorithm to build an AMT, which is then transformed into an FBT. Finally, the BFS (Breadth-First Search) algorithm is used to traverse the FBT and obtain its t-SMILES string. To reconstruct the molecule, the FBT is rebuilt from the t-SMILES string, transformed into an AMT, and the AMT is finally assembled back into the original molecule. In TSID, [n*] tokens indicate the joint points. Removing the IDs from the TSID code yields the TSDY code. The new symbol “&” marks empty tree nodes. TSSA (t-SMILES with shared atom) obtains its pieces in a different way; see Supplementary A.1 for the entire process and more examples. MMPA is used as the example algorithm for cutting molecules in both figures.
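The traversal and reconstruction steps described in the caption can be illustrated with a small, simplified sketch. It is not the authors' implementation: the `Node` class, the `encode` and `decode` helpers, and the choice to place “^” between every token (including each “&”) are assumptions made for illustration, and the node labels are placeholders rather than real fragment SMILES. It shows only how a breadth-first walk over a binary tree with “&” for empty nodes gives a string from which the same tree can be rebuilt.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

EMPTY = "&"  # marks an empty tree node (symbol taken from the caption)
SEP = "^"    # separator between pieces (symbol taken from the caption)


@dataclass
class Node:
    """Hypothetical tree node; `label` stands in for a fragment SMILES."""
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def encode(root: Optional[Node]) -> str:
    """Breadth-first traversal; missing children are written as EMPTY."""
    tokens = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node is None:
            tokens.append(EMPTY)
            continue
        tokens.append(node.label)
        # Always enqueue both slots so every real node has two entries.
        queue.append(node.left)
        queue.append(node.right)
    return SEP.join(tokens)


def decode(text: str) -> Optional[Node]:
    """Rebuild the tree by consuming tokens in the same breadth-first order."""
    tokens = iter(text.split(SEP))
    first = next(tokens)
    if first == EMPTY:
        return None
    root = Node(first)
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        for side in ("left", "right"):
            token = next(tokens)
            if token != EMPTY:
                child = Node(token)
                setattr(parent, side, child)
                queue.append(child)
    return root


# Placeholder labels A-D stand in for fragments.
tree = Node("A", Node("B", None, Node("C")), Node("D"))
code = encode(tree)
print(code)
assert decode(code) == tree  # the round trip restores the same tree
```

Because every non-empty node contributes exactly two child slots, the decoder needs no extra bracket or depth information: the position of each token in the breadth-first order fixes its place in the tree.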