Fig. 2: TernaryDB construction and visualization. | Nature Communications


From: SE(3)-equivariant ternary complex prediction towards target protein degradation


a Collection and cleaning of the ternary complex dataset. An initial search of the PDB for ternary structures yielded 46,797 PDB IDs, each containing at least two proteins and one small molecule. High-quality PDB IDs were retained based on criteria such as X-ray crystallography data, resolution, and R-free value. From this subset, 42,441 complexes were extracted, each comprising exactly two proteins and one small molecule. These complexes were further filtered by peptide chain length and number of contacts. Ultimately, 22,303 complexes met our criteria and were used to train our model.

b Histogram of the number of ligand atoms (excluding hydrogens) in the dataset.

c Histogram of cluster sizes in the dataset, clustered by protein sequence similarity.

d Distribution of protein source organisms in the dataset.

e Proteome-wide view of the collected dataset. ESM-1b (ref. 60) sequence embeddings of the two proteins in each complex are calculated and concatenated, then projected to two dimensions (2D) with uniform manifold approximation and projection (UMAP). Complexes similar to PROTAC- and MG(D)-involved ternary structures are shown as red and green squares, respectively.

f Chemical space covered by the dataset. Morgan fingerprints are converted to 1024-length vectors and visualized with a 2D UMAP. Points are colored by molecular weight (hydrogens excluded). PROTAC- and MG(D)-like molecules are highlighted as red and green squares, respectively.
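The cleaning steps described in panel a can be read as three successive filters: entry quality, composition (two proteins and one small molecule), and refinement by chain length and contacts. The following is a minimal sketch of that reading, not the authors' pipeline. The caption names the criteria but not their cutoff values, so every threshold below is a hypothetical placeholder, as are the `Entry` record, its field names, and the toy records. The sketch also assumes each record is already a single candidate complex, whereas the caption describes complexes being extracted from PDB entries.

```python
from dataclasses import dataclass

# Hypothetical cutoffs: the caption lists the criteria, not their values.
MAX_RESOLUTION = 3.5    # angstroms, placeholder
MAX_R_FREE = 0.30       # placeholder
MIN_CHAIN_LENGTH = 30   # residues, placeholder
MIN_CONTACTS = 10       # placeholder


@dataclass
class Entry:
    """Hypothetical record for one candidate complex."""
    entry_id: str
    method: str            # experimental method string
    resolution: float
    r_free: float
    chain_lengths: tuple   # residues per protein chain
    n_ligands: int         # small molecules in the complex
    n_contacts: int        # contacts, however they are counted


def is_high_quality(e: Entry) -> bool:
    # Step 1: keep X-ray structures within the resolution and R-free cutoffs.
    return (e.method == "X-RAY DIFFRACTION"
            and e.resolution <= MAX_RESOLUTION
            and e.r_free <= MAX_R_FREE)


def is_ternary(e: Entry) -> bool:
    # Step 2: exactly two proteins and one small molecule.
    return len(e.chain_lengths) == 2 and e.n_ligands == 1


def passes_refinement(e: Entry) -> bool:
    # Step 3: filter on peptide chain length and number of contacts.
    return (min(e.chain_lengths) >= MIN_CHAIN_LENGTH
            and e.n_contacts >= MIN_CONTACTS)


def clean(entries):
    """Apply the three filters in order and return every intermediate stage."""
    quality = [e for e in entries if is_high_quality(e)]
    ternary = [e for e in quality if is_ternary(e)]
    final = [e for e in ternary if passes_refinement(e)]
    return quality, ternary, final


# Made-up toy records, not real PDB entries.
demo = [
    Entry("TOY1", "X-RAY DIFFRACTION", 2.0, 0.22, (250, 180), 1, 40),
    Entry("TOY2", "ELECTRON MICROSCOPY", 3.0, 0.0, (250, 180), 1, 40),
    Entry("TOY3", "X-RAY DIFFRACTION", 2.1, 0.24, (250, 180, 90), 1, 40),
    Entry("TOY4", "X-RAY DIFFRACTION", 1.8, 0.20, (250, 12), 1, 40),
]
quality, ternary, final = clean(demo)
print(len(quality), len(ternary), len(final))  # prints: 3 2 1
```

Returning every stage rather than only the final list makes it easy to report how many records survive each step, in the same way the caption reports the counts after each stage.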
