Fig. 2: Protein 3D context differs from 1D sequence context. | Nature Communications

From: The 3D mutational constraint on amino acid sites in the human proteome

a To quantify the 3D spatial context of each amino acid site (i), our framework defines its contact set as the amino acid residues that are in contact with it (Cβ atoms within 8 Å). For the example index site (i), the contact set is (i, j1, j3, j4, j6). Numbers below the 1D sequence schematic represent residue sequence positions and illustrate that contact set residues may be distant in sequence from the index site.

b The COSMIS framework covers 80.3% of the reference human proteome. Defining the contact set of an amino acid site requires protein 3D structures. We used the PDB and SWISS-MODEL as our primary sources of protein 3D structures. For proteins with no structure in the PDB or SWISS-MODEL that met our criteria (Methods), we used models from the AlphaFold2 structure database. Numbers inside the pie chart represent the fractions of the human reference proteome (20,600 proteins) for which we used the corresponding protein structure resource to compute COSMIS scores (Supplementary Data 1).

c Contact sets capture long-range sites (separated by more than 15 residues along the 1D sequence) that interact in 3D. For example, residues j1 and j6 in panel a are not neighbors in 1D sequence but nevertheless form long-range contacts with the index site i. The bar plot shows the fraction of all 6.1 million sites with at least a certain fraction of long-range 3D contacts in their contact sets.

d Many neighboring sites in 1D sequence do not form 3D contacts with an index site. Defining the contact set eliminates these sites from consideration. For example, residues j2 and j5 in panel a are 1D sequence neighbors (within 15 residues) of the index site i but do not form 3D contacts with it. The bar plot shows the fraction of all 6.1 million sites that have at least a certain fraction of 1D sequence neighbors that do not form 3D contacts.

PDB Protein Data Bank, AF2 AlphaFold2. Source data are provided as a Source Data file.
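The quantities in panels a, c, and d can be illustrated with a minimal Python sketch. It is not the COSMIS implementation: the function names, the toy coordinates, and the choice to exclude the index site from the denominators are assumptions made for illustration. Only the 8 Å cutoff and the 15-residue window come from the caption. Real structures would also need a rule for residues without a Cβ atom (for example glycine), which is omitted here.

```python
import math

CONTACT_CUTOFF = 8.0  # Å, Cβ distance threshold stated in the caption
SEQ_WINDOW = 15       # residues; a separation above this counts as long-range


def contact_set(coords, i, cutoff=CONTACT_CUTOFF):
    """Indices of residues whose Cβ lies within `cutoff` of residue i (i included)."""
    return [j for j, c in enumerate(coords) if math.dist(coords[i], c) < cutoff]


def long_range_fraction(coords, i, window=SEQ_WINDOW):
    """Fraction of i's 3D contacts (excluding i) that are more than `window` residues away in sequence."""
    contacts = [j for j in contact_set(coords, i) if j != i]
    if not contacts:
        return 0.0
    return sum(abs(j - i) > window for j in contacts) / len(contacts)


def noncontact_neighbor_fraction(coords, i, window=SEQ_WINDOW):
    """Fraction of i's 1D sequence neighbors (within `window`) that are not in its contact set."""
    contacts = set(contact_set(coords, i))
    neighbors = [j for j in range(len(coords)) if j != i and abs(j - i) <= window]
    if not neighbors:
        return 0.0
    return sum(j not in contacts for j in neighbors) / len(neighbors)


# Hypothetical toy chain: 40 Cβ positions on a straight line, 3.8 Å apart,
# with residue 30 moved next to residue 0 to mimic a long-range contact.
toy = [(3.8 * k, 0.0, 0.0) for k in range(40)]
toy[30] = (0.0, 5.0, 0.0)

print(contact_set(toy, 0))  # [0, 1, 2, 30]
print(long_range_fraction(toy, 0))
print(noncontact_neighbor_fraction(toy, 0))
```

In this toy chain, residue 30 enters the contact set of residue 0 despite being 30 positions away in sequence (the panel c case), while most of the 15 nearest sequence neighbors lie beyond 8 Å and are excluded (the panel d case).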