Fig. 1: The bioinformatic framework of the G2P portal. | Nature Methods


From: Genomics 2 Proteins portal: a resource and discovery tool for linking genetic screening outputs to protein sequences and structures

Fig. 1

a, Schematic of data and method integration in the G2P portal and its two main modules, ‘Gene/Protein Lookup’ and ‘Interactive Mapping’. In the Gene/Protein Lookup module, connections among identifiers of human genes, transcripts, protein sequences and structures were established for the entire human proteome using an in-house API, G2P3D (see ‘Construction of G2P3D API’ in Methods for details). Variants from databases such as gnomAD⁹, ClinVar¹⁰ and HGMD¹¹ were then mapped onto protein sequences and structures by dynamically querying UniProtKB²¹ and structure databases (PDB⁶ and AlphaFoldDB²⁵), respectively. Additionally, protein feature annotations were retrieved from or computed with various databases and tools (UniProtKB, DSSP³⁰ and PhosphoSitePlus²⁷). All annotated protein sequences and structures, with their variants and features, can be viewed on the portal and downloaded in interoperable formats for further analyses. In the Interactive Mapping module, users can upload residue-wise protein annotations of variants and additional features and link genetic data to protein structural data. Users can access this module either by starting from a gene or by uploading an in-house protein structure. b, An example of G2P3D API output; the API links human genes (HGNC²²) to transcripts (Ensembl²³ and RefSeq²⁴), protein sequences (UniProtKB) and structures (PDB⁶ and AlphaFoldDB²⁵). In this example, AADAT has four Ensembl transcripts and four RefSeq transcripts; three Ensembl-RefSeq transcript pairs encode the canonical protein isoform (Q8N5Z0-1*), and the remaining pair (ENST00000509167/NM_001286682) encodes the noncanonical isoform, Q8N5Z0-2. The canonical isoform is further dynamically linked to multiple available PDB structures and to the AlphaFold structure. In the portal, variants are mapped onto both canonical and noncanonical protein isoforms, but only variants on the canonical isoform are mapped onto available protein structures.
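The linking logic in panel b can be sketched as a small data model. This is a hypothetical illustration, not the G2P3D API's actual schema or output format: the class and function names (`Transcript`, `GeneRecord`, `isoform_groups`, `map_variant`) are invented, and apart from AADAT, Q8N5Z0-1, Q8N5Z0-2, ENST00000509167 and NM_001286682, every identifier below is a placeholder. The sketch groups Ensembl-RefSeq transcript pairs by the isoform they encode and applies the caption's rule that variants are mapped onto every isoform's sequence but onto structures only for the canonical isoform.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Transcript:
    """One Ensembl-RefSeq transcript pair and the UniProtKB isoform it encodes."""
    ensembl_id: str
    refseq_id: str
    isoform: str


@dataclass
class GeneRecord:
    """Hypothetical gene -> transcript -> isoform -> structure record."""
    gene: str
    canonical_isoform: str
    transcripts: list[Transcript]
    # Structure identifiers; in this sketch they are linked to the canonical isoform only.
    structures: list[str] = field(default_factory=list)


def isoform_groups(record: GeneRecord) -> dict[str, list[tuple[str, str]]]:
    """Group transcript pairs by the protein isoform they encode."""
    groups: dict[str, list[tuple[str, str]]] = {}
    for t in record.transcripts:
        groups.setdefault(t.isoform, []).append((t.ensembl_id, t.refseq_id))
    return groups


def map_variant(record: GeneRecord, isoform: str, position: int) -> list[tuple[str, str, int]]:
    """List the targets a residue-level variant is mapped onto.

    Every isoform gets a sequence-level mapping; only the canonical
    isoform is additionally mapped onto the linked structures.
    """
    targets = [("sequence", isoform, position)]
    if isoform == record.canonical_isoform:
        targets += [("structure", s, position) for s in record.structures]
    return targets


# AADAT as described in panel b. Only ENST00000509167/NM_001286682 is named in
# the caption; the remaining transcript and structure IDs are hypothetical placeholders.
aadat = GeneRecord(
    gene="AADAT",
    canonical_isoform="Q8N5Z0-1",
    transcripts=[
        Transcript("ENST_PLACEHOLDER_1", "NM_PLACEHOLDER_1", "Q8N5Z0-1"),
        Transcript("ENST_PLACEHOLDER_2", "NM_PLACEHOLDER_2", "Q8N5Z0-1"),
        Transcript("ENST_PLACEHOLDER_3", "NM_PLACEHOLDER_3", "Q8N5Z0-1"),
        Transcript("ENST00000509167", "NM_001286682", "Q8N5Z0-2"),
    ],
    structures=["PDB_PLACEHOLDER_A", "PDB_PLACEHOLDER_B", "ALPHAFOLD_PLACEHOLDER"],
)

print(isoform_groups(aadat)["Q8N5Z0-2"])  # [('ENST00000509167', 'NM_001286682')]
print(len(map_variant(aadat, "Q8N5Z0-2", 100)))  # 1 (sequence only)
print(len(map_variant(aadat, "Q8N5Z0-1", 100)))  # 4 (sequence + 3 structures)
```

Attaching structures to the gene record and gating them on the canonical isoform keeps the sketch minimal; a fuller model might store structures per isoform so that noncanonical structures could be added later.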
