Fig. 2: Domain and GO associations using DomainGO-prob.
From: Domain-PFP allows protein function prediction using function-aware domain embedding representations

a Functional consistency of domain embeddings. Domain functional similarity, quantified by the Jaccard Index of GO terms, is plotted relative to the Manhattan distance between domain embeddings, computed on 100,000 random domain pairs. The three GO categories (MF, BP, and CC) are shown separately. b Functional coherence at the protein level. 1,000,000 random protein pairs were split into bins based on their embedding distance, and the mean funSim score for each bin was plotted. Bins with <100 protein pairs were discarded. The last bin includes protein pairs with a distance >25. The size of each circle indicates the number of protein pairs in the bin, and the color of each data point indicates the standard deviation of the funSim score. c Predicted scores of GO terms for domains in 34,832 InterPro2GO entries. The score distributions of GO terms for domains were taken from DomainGO-prob. Scores are shown for both the standard model (trained on the entire dataset) and the model trained in the adversarial manner (trained after removing the InterPro2GO pair information), represented by orange and blue bars, respectively.
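The quantities in panels a and b can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the bin width of 1.0, and the choice to score two unannotated domains as 0 are all assumptions made for illustration, and the funSim score is treated as a precomputed input rather than implemented here.

```python
from collections import defaultdict
from statistics import mean, pstdev


def jaccard_index(terms_a, terms_b):
    """Jaccard index of two GO-term collections: |A & B| / |A | B|."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    if not union:
        return 0.0  # assumption: two domains without annotations score 0
    return len(a & b) / len(union)


def manhattan_distance(u, v):
    """Manhattan (L1) distance between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    return sum(abs(x - y) for x, y in zip(u, v))


def bin_scores(pairs, bin_width=1.0, max_distance=25.0, min_pairs=100):
    """Group (distance, score) pairs into distance bins.

    Pairs at or beyond max_distance share one final overflow bin, and bins
    holding fewer than min_pairs pairs are dropped. bin_width is a
    hypothetical value; the caption does not state it.
    Returns {bin_index: (pair_count, mean_score, std_score)}.
    """
    overflow = int(max_distance // bin_width)
    bins = defaultdict(list)
    for distance, score in pairs:
        index = min(int(distance // bin_width), overflow)
        bins[index].append(score)
    return {
        index: (len(scores), mean(scores), pstdev(scores))
        for index, scores in sorted(bins.items())
        if len(scores) >= min_pairs
    }
```

For panel a, each random domain pair would contribute one point given by `manhattan_distance` on its embeddings and `jaccard_index` on its GO terms, computed separately per GO category. For panel b, the pair count in each bin would map to circle size and the standard deviation to color.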