Fig. 5

Empirical investigation of the Ghana data shows resemblance to the immune selection scenario. a Negative correlation between DBLα type frequencies and their number of similar genes for the upsB/upsC var genes in the parasite population (r = −0.040, p < 2.2e−16). The number of similar genes is calculated as the degree (k) of a focal gene in the gene similarity network, counting only amino acid similarities above 0.6. Histograms on the top and right of the plot show the distributions of k and DBLα-type frequencies, respectively. b Classification of networks generated with the agent-based model using Discriminant Analysis of Principal Components (ref. 66), projected onto a 2-D space formed by the two linear discriminant (LD) functions, using the top 10% of edge weights. The empirical network is far more likely to have been generated under an immune selection regime (posterior probability [PP] = 1) than under neutrality (PP = 3.57E−8) or generalized immunity (PP = 2.15E−9). The classification relies on comparisons of 34 network properties (see Supplementary Table 1), trained with 7000 simulated networks and verified on test sets of 800 networks. For each scenario, 100 combinations of different parameter settings were run. Infected hosts were sampled in October and the following June of each year, as for the empirical sampling, at the stationary stage of the simulations (i.e., the last 26 years in the simulations). Similarity networks were then built for randomly sampled parasites from the sampled hosts. Accuracy of network classifications is above 0.99 for each scenario (see Supplementary Table 2 for comparisons of accuracy and the classifications of the empirical network using different percentages of top edges in the network, and Supplementary Fig. 5 for motif properties of the empirical network compared with simulated ones).
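
The quantity plotted in panel a can be illustrated with a minimal sketch. It assumes pairwise amino acid similarities are already available as a dictionary keyed by gene pairs, counts a focal gene's degree k as the number of other genes with similarity strictly above 0.6, and then correlates k with type frequency. The function names, the toy genes, and the choice of a Pearson coefficient are assumptions made for illustration; the caption does not state which correlation statistic or data layout was used.

```python
from math import sqrt


def degree_by_gene(pair_similarity, threshold=0.6):
    """Return the degree k of each gene in a thresholded similarity network.

    pair_similarity maps an unordered gene pair (a, b) to a similarity score.
    An edge is counted only when the score is strictly above the threshold.
    """
    degree = {}
    for (gene_a, gene_b), score in pair_similarity.items():
        # Make sure every gene appears, even if it ends up with k = 0.
        degree.setdefault(gene_a, 0)
        degree.setdefault(gene_b, 0)
        if score > threshold:
            degree[gene_a] += 1
            degree[gene_b] += 1
    return degree


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: four gene types and their pairwise similarities.
toy_similarity = {
    ("A", "B"): 0.90,
    ("A", "C"): 0.70,
    ("A", "D"): 0.65,
    ("B", "C"): 0.80,
    ("B", "D"): 0.30,
    ("C", "D"): 0.20,
}
# Hypothetical frequencies of each type in the sampled population.
toy_frequency = {"A": 1, "B": 2, "C": 3, "D": 5}

k = degree_by_gene(toy_similarity)
genes = sorted(k)
r = pearson_r([toy_frequency[g] for g in genes], [k[g] for g in genes])
print(k, round(r, 3))
```

In this toy example the most frequent type has the fewest similar neighbours, so the coefficient is negative, which is the direction of the relationship reported in panel a.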