Fig. 1: The ESL-PSC procedure. | Nature Communications

From: Evolutionary sparse learning reveals the shared genetic basis of convergent traits

The inputs are a set of orthologous protein multiple sequence alignments (MSAs) and a file specifying paired convergent and control species. A single set of species pairs can be given, or alternative species from the same convergent and control clades may be provided, in which case an ensemble of models is built from all possible combinations. The outputs (blue) are Species Trait Predictions and Gene Ranks. The Species Trait Predictions output gives a predicted phenotype, in the form of a Sequence Prediction Score, for each species in the input MSAs that was not used to build the given model. The Gene Ranks output is a list of the input genes (MSAs) ordered by their Group (gene) Sparsity Scores (GSSs; ref. 8), which measure the degree to which each gene informs ESL models of the genetic distinction between the convergent and ancestral traits (see the Methods section). The highest-ranking genes can then be tested for ontology enrichment to detect relevant pathways and biological categories that show abundant evidence of convergent molecular evolution.
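
As an illustration of how alternative species could expand into an ensemble, the following minimal Python sketch enumerates every combination of species pairs. It assumes that "all possible combinations" means choosing one convergent and one control species per clade pair; the function name, data layout, and species labels are hypothetical and are not taken from the ESL-PSC software.

```python
from itertools import product


def enumerate_species_combinations(clade_pairs):
    """Return every way to pick one (convergent, control) species per clade pair.

    clade_pairs: list of (convergent_alternatives, control_alternatives),
    where each element is a list of species names. This layout is an
    assumption made for illustration only.
    """
    # For each clade pair, list every (convergent, control) choice.
    per_pair_choices = [
        list(product(convergent, control)) for convergent, control in clade_pairs
    ]
    # One combination takes exactly one choice from every clade pair.
    return [list(combo) for combo in product(*per_pair_choices)]


# Hypothetical species labels: two clade pairs, each with one alternative.
clade_pairs = [
    (["conv_A1", "conv_A2"], ["ctrl_A1"]),
    (["conv_B1"], ["ctrl_B1", "ctrl_B2"]),
]

combos = enumerate_species_combinations(clade_pairs)
for combo in combos:
    print(combo)
```

Under this assumption, each printed combination would correspond to one model in the ensemble, so the ensemble size is the product of the number of choices available for each clade pair.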
