Fig. 1: Pipeline of DeepSCFold. | Nature Communications

From: High-accuracy protein complex structure modeling based on sequence-derived structure complementarity

a DeepSCFold takes multiple query sequences of a protein complex as input, with each component sequence being independently searched against protein databases to generate monomeric multiple sequence alignments (MSAs). A deep learning model is designed to assess structural similarity between each sequence homolog in the MSA and the query sequence, enabling the ranking and selection of appropriate homologs. The pairing of homologs from different monomeric MSAs is accomplished through two distinct strategies: one leveraging species information or manually curated annotations, and another utilizing a probability matrix derived from an interaction probability prediction model that scores potential homolog pairs. These paired MSAs are subsequently processed by AlphaFold-Multimer to generate potential complex structures. The resulting models undergo quality assessment, with the top-ranked model serving as a template for a single iteration of refinement, ultimately yielding the final predicted structure of the protein complex.

b Deep learning model architecture: The features of a pair of sequences are input into a multi-scale retention module (d) to generate sequence representations, which are then fed into a criss-cross attention module (e) to generate paired representations, followed by a down sample module (c) to generate the final predicted scores (pSS or pIA).

Source data are provided as a Source Data file.
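The homolog selection and probability-based pairing steps in panel a can be illustrated with a minimal Python sketch. It is not the DeepSCFold implementation: the function names, the `top_k` cutoff, the `threshold` value, and the greedy one-to-one pairing rule are all assumptions made for illustration, and the pSS and pIA values are taken as given inputs rather than predicted by a model.

```python
from typing import Dict, List, Sequence, Tuple


def select_homologs(pss: Dict[str, float], top_k: int) -> List[str]:
    """Rank homologs by predicted structural similarity (pSS) to the query,
    highest first, and keep the top_k. The cutoff is a hypothetical choice."""
    ranked = sorted(pss, key=lambda name: pss[name], reverse=True)
    return ranked[:top_k]


def pair_homologs(
    homologs_a: Sequence[str],
    homologs_b: Sequence[str],
    pia: Sequence[Sequence[float]],
    threshold: float = 0.5,
) -> List[Tuple[str, str, float]]:
    """Pair homologs from two monomeric MSAs using an interaction
    probability matrix, where pia[i][j] scores homologs_a[i] with
    homologs_b[j].

    Assumption: a greedy one-to-one assignment in order of descending
    probability, ignoring pairs below `threshold`.
    """
    candidates = [
        (pia[i][j], i, j)
        for i in range(len(homologs_a))
        for j in range(len(homologs_b))
        if pia[i][j] >= threshold
    ]
    # Highest probability first; ties broken by index for determinism.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))

    used_a, used_b = set(), set()
    pairs = []
    for prob, i, j in candidates:
        if i in used_a or j in used_b:
            continue  # each homolog is used in at most one pair
        used_a.add(i)
        used_b.add(j)
        pairs.append((homologs_a[i], homologs_b[j], prob))
    return pairs


# Toy scores (made up for illustration, not from the paper).
pss_a = {"A1": 0.9, "A2": 0.4, "A3": 0.7}
pss_b = {"B1": 0.8, "B2": 0.6}

selected_a = select_homologs(pss_a, top_k=2)  # ["A1", "A3"]
selected_b = select_homologs(pss_b, top_k=2)  # ["B1", "B2"]

# Rows follow selected_a, columns follow selected_b.
pia_matrix = [
    [0.2, 0.9],
    [0.7, 0.6],
]
paired = pair_homologs(selected_a, selected_b, pia_matrix)
```

In this toy case the pairing yields A1 with B2 and A3 with B1. The rows of sequences paired this way would form the paired MSA that is passed to the structure predictor.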
