Abstract
Group A Streptococcus (GAS; Streptococcus pyogenes) is a bacterial pathogen for which no commercial human vaccine is available. Applying high-throughput DNA sequencing technology to vaccine design, we analyzed 2,083 globally sampled GAS genomes. The global GAS population structure reveals extensive genomic heterogeneity driven by homologous recombination and overlaid with high levels of accessory gene plasticity. We identified more than 290 clinically associated genomic phylogroups across 22 countries, highlighting the challenge of designing vaccines of global utility. To determine vaccine candidate coverage, we investigated all previously described GAS candidate antigens for gene carriage and gene sequence heterogeneity. Only 15 of 28 vaccine antigen candidates had both low naturally occurring sequence variation and high (>99%) coverage across this diverse GAS population. This technological platform for vaccine coverage determination is equally applicable to prospective GAS vaccine antigens identified in future studies.
Data availability
Illumina sequence reads and draft genome assemblies were deposited to the European Nucleotide Archive under the accession numbers specified in Supplementary Table 2. GenBank accession numbers for the 30 new GAS reference genomes are provided in Supplementary Table 5. To facilitate community accessibility and interrogation of the data presented in this study, the phylogenetic tree (Fig. 1a), PopPUNK phylogroup designations and associated metadata components have been uploaded to the interactive web interface Microreact (ref. 66) (https://microreact.org/project/5DEFpeck4). The PopPUNK database for assigning new genomes is available at https://doi.org/10.6084/m9.figshare.6931439.v1.
Code availability
The script for assessing antigenic variation from genome assemblies, as used in this study, is available at https://github.com/shimbalama/screen_assembly.
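As a rough illustration of the kind of coverage screen the abstract describes (gene carriage above 99% combined with low sequence variation), the sketch below summarizes one antigen across a set of genomes. It is not the screen_assembly script and makes no claim about how that script works. The function names, the input layout (one percent-identity value per genome, or None when the gene is absent) and the identity cut-off are hypothetical choices made for this example; only the >99% carriage figure comes from the abstract.

```python
from typing import Dict, List, Optional

# The carriage cut-off mirrors the ">99%" figure in the abstract. The identity
# cut-off is an illustrative assumption, not a value taken from the study.
CARRIAGE_CUTOFF = 0.99
IDENTITY_CUTOFF = 98.0


def summarize_antigen(identities: List[Optional[float]]) -> Dict[str, float]:
    """Summarize one antigen across genomes.

    `identities` holds, per genome, the percent identity of the best hit to a
    reference antigen sequence, or None when the gene was not detected.
    """
    if not identities:
        raise ValueError("no genomes supplied")
    present = [x for x in identities if x is not None]
    carriage = len(present) / len(identities)
    # Lowest identity among carriers is a crude proxy for sequence variation.
    min_identity = min(present) if present else 0.0
    return {"carriage": carriage, "min_identity": min_identity}


def passes_screen(summary: Dict[str, float]) -> bool:
    """True when carriage exceeds the cut-off and variation stays low."""
    return (
        summary["carriage"] > CARRIAGE_CUTOFF
        and summary["min_identity"] >= IDENTITY_CUTOFF
    )


if __name__ == "__main__":
    # Toy data: 1,000 genomes, 5 of which lack the gene.
    toy = [99.5] * 995 + [None] * 5
    summary = summarize_antigen(toy)
    print(summary, passes_screen(summary))
```

Using the minimum identity among carriers is a deliberately simple stand-in for sequence heterogeneity; a real analysis would more likely examine per-site amino acid variation, as in the streptolysin O and ScpA supplementary tables.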
Change history
19 July 2019
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
References
Carapetis, J. R., Steer, A. C., Mulholland, E. K. & Weber, M. The global burden of group A streptococcal diseases. Lancet Infect. Dis. 5, 685–694 (2005).
Walker, M. J. et al. Disease manifestations and pathogenic mechanisms of group A Streptococcus. Clin. Microbiol. Rev. 27, 264–301 (2014).
Watkins, D. A. et al. Global, regional, and national burden of rheumatic heart disease, 1990–2015. N. Engl. J. Med. 377, 713–722 (2017).
Henningham, A., Gillen, C. M. & Walker, M. J. Group A streptococcal vaccine candidates: potential for the development of a human vaccine. Curr. Top. Microbiol. Immunol. 368, 207–242 (2013).
Kotloff, K. L. et al. Safety and immunogenicity of a recombinant multivalent group A streptococcal vaccine in healthy adults: phase 1 trial. J. Am. Med. Assoc. 292, 709–715 (2004).
McNeil, S. A. et al. Safety and immunogenicity of 26-valent group A Streptococcus vaccine in healthy adult volunteers. Clin. Infect. Dis. 41, 1114–1122 (2005).
Brandt, E. R. et al. New multi-determinant strategy for a group A streptococcal vaccine designed for the Australian Aboriginal population. Nat. Med. 6, 455–459 (2000).
Sabharwal, H. et al. Group A Streptococcus (GAS) carbohydrate as an immunogen for protection against GAS infection. J. Infect. Dis. 193, 129–135 (2006).
Van Sorge, N. M. et al. The classical Lancefield antigen of group A Streptococcus is a virulence determinant with implications for vaccine design. Cell Host Microbe 15, 729–740 (2014).
Henningham, A. et al. Conserved anchorless surface proteins as group A streptococcal vaccine candidates. J. Mol. Med. (Berl.) 90, 1197–1207 (2012).
Valentin-Weigand, P., Talay, S. R., Kaufhold, A., Timmis, K. N. & Chhatwal, G. S. The fibronectin binding domain of the Sfb protein adhesin of Streptococcus pyogenes occurs in many group A streptococci and does not cross-react with heart myosin. Micro. Pathog. 17, 111–120 (1994).
Steer, A. C., Law, I., Matatolu, L., Beall, B. W. & Carapetis, J. R. Global emm type distribution of group A streptococci: systematic review and implications for vaccine development. Lancet Infect. Dis. 9, 611–616 (2009).
Beall, B., Facklam, R. & Thompson, T. Sequencing emm-specific PCR products for routine and accurate typing of group A streptococci. J. Clin. Microbiol. 34, 953–958 (1996).
Sanderson-Smith, M. et al. A systematic and functional classification of Streptococcus pyogenes that serves as a new tool for molecular typing and vaccine development. J. Infect. Dis. 210, 1325–1338 (2014).
Enright, M. C., Spratt, B. G., Kalia, A., Cross, J. H. & Bessen, D. E. Multilocus sequence typing of Streptococcus pyogenes and the relationships between emm type and clone. Infect. Immun. 69, 2416–2427 (2001).
Mostowy, R. et al. Efficient inference of recent and ancestral recombination within bacterial populations. Mol. Biol. Evol. 34, 1167–1182 (2017).
Lees, J. A. et al. Fast and flexible bacterial genomic epidemiology with PopPUNK. Genome Res. 29, 304–316 (2019).
Chochua, S. et al. Population and whole genome sequence based characterization of invasive group A streptococci recovered in the United States during 2015. MBio 8, e01422-17 (2017).
Croucher, N. J. et al. Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins. Nucleic Acids Res. 43, e15 (2015).
Marttinen, P. et al. Detection of recombination events in bacterial genomes from large population samples. Nucleic Acids Res. 40, e6 (2012).
Page, A. J. et al. Roary: rapid large-scale prokaryote pan genome analysis. Bioinformatics 31, 3691–3693 (2015).
Beres, S. B. et al. Genome-wide molecular dissection of serotype M3 group A Streptococcus strains causing two epidemics of invasive infections. Proc. Natl Acad. Sci. USA 101, 11833–11838 (2004).
Nasser, W. et al. Evolutionary pathway to increased virulence and epidemic group A Streptococcus disease derived from 3,615 genome sequences. Proc. Natl Acad. Sci. USA 111, E1768–E1776 (2014).
Turner, C. E. et al. Emergence of a new highly successful acapsular group A Streptococcus clade of genotype emm89 in the United Kingdom. MBio 6, e00622 (2015).
You, Y. et al. Scarlet fever epidemic in China caused by Streptococcus pyogenes serotype M12: epidemiologic and molecular analysis. EBioMedicine 28, 128–135 (2018).
Lees, J. A. et al. Sequence element enrichment analysis to determine the genetic basis of bacterial phenotypes. Nat. Commun. 7, 12797 (2016).
Lees, J. A., Galardini, M., Bentley, S. D., Weiser, J. N. & Corander, J. pyseer: a comprehensive tool for microbial pangenome-wide association studies. Bioinformatics 34, 4310–4312 (2018).
McIver, K. S., Subbarao, S., Kellner, E. M., Heath, A. S. & Scott, J. R. Identification of isp, a locus encoding an immunogenic secreted protein conserved among group A streptococci. Infect. Immun. 64, 2548–2555 (1996).
Henningham, A. et al. Virulence role of the GlcNAc side chain of the Lancefield cell wall carbohydrate antigen in non-M1-serotype group A Streptococcus. MBio 9, e02294-17 (2018).
Dale, J. B., Penfound, T. A., Chiang, E. Y. & Walton, W. J. New 30-valent M protein-based vaccine evokes cross-opsonic antibodies against non-vaccine serotypes of group A streptococci. Vaccine 29, 8175–8178 (2011).
Batzloff, M. R. et al. Protection against group A Streptococcus by immunization with J8-diphtheria toxoid: contribution of J8- and diphtheria toxoid-specific antibodies to protection. J. Infect. Dis. 187, 1598–1608 (2003).
Guilherme, L. et al. Towards a vaccine against rheumatic fever. Clin. Dev. Immunol. 13, 125–132 (2006).
Pandey, M. et al. Combinatorial synthetic peptide vaccine strategy protects against hypervirulent CovR/S mutant streptococci. J. Immunol. 196, 3364–3374 (2016).
Feil, S. C., Ascher, D. B., Kuiper, M. J., Tweten, R. K. & Parker, M. W. Structural studies of Streptococcus pyogenes streptolysin O provide insights into the early steps of membrane penetration. J. Mol. Biol. 426, 785–792 (2014).
Kagawa, T. F. et al. Model for substrate interactions in C5a peptidase from Streptococcus pyogenes: a 1.9 A crystal structure of the active form of ScpA. J. Mol. Biol. 386, 754–772 (2009).
Kelley, L. A., Mezulis, S., Yates, C. M., Wass, M. N. & Sternberg, M. J. The Phyre2 web portal for protein modeling, prediction and analysis. Nat. Protoc. 10, 845–858 (2015).
Yates, C. M., Filippis, I., Kelley, L. A. & Sternberg, M. J. SuSPect: enhanced prediction of single amino acid variant (SAV) phenotype using network features. J. Mol. Biol. 426, 2692–2701 (2014).
Croucher, N. J. et al. Population genomics of post-vaccine changes in pneumococcal epidemiology. Nat. Genet. 45, 656–663 (2013).
Bart, M. J. et al. Global population structure and evolution of Bordetella pertussis and their relationship with vaccination. MBio 5, e01074 (2014).
Courtney, H. S. et al. Trivalent M-related protein as a component of next generation group A streptococcal vaccines. Clin. Exp. Vaccin. Res. 6, 45–49 (2017).
Corander, J. et al. Frequency-dependent selection in vaccine-associated pneumococcal population dynamics. Nat. Ecol. Evol. 1, 1950–1960 (2017).
McNally, A. et al. Signatures of negative frequency dependent selection in colonisation factors and the evolution of a multi-drug resistant lineage of Escherichia coli. Preprint at bioRxiv https://doi.org/10.1101/400374 (2018).
Bao, Y. J., Shapiro, B. J., Lee, S. W., Ploplis, V. A. & Castellino, F. J. Phenotypic differentiation of Streptococcus pyogenes populations is induced by recombination-driven gene-specific sweeps. Sci. Rep. 6, 36644 (2016).
Vos, M. & Didelot, X. A comparison of homologous recombination rates in bacteria and archaea. ISME J. 3, 199–208 (2009).
Chewapreecha, C. et al. Dense genomic sampling identifies highways of pneumococcal recombination. Nat. Genet. 46, 305–309 (2014).
David, S. et al. Dynamics and impact of homologous recombination on the evolution of Legionella pneumophila. PLoS Genet. 13, e1006855 (2017).
Bergmann, R., Nerlich, A., Chhatwal, G. S. & Nitsche-Schmitz, D. P. Distribution of small native plasmids in Streptococcus pyogenes in India. Int. J. Med. Microbiol. 304, 370–378 (2014).
Woodbury, R. L. et al. Plasmid-borne erm(T) from invasive, macrolide-resistant Streptococcus pyogenes strains. Antimicrob. Agents Chemother. 52, 1140–1143 (2008).
Wescombe, P. A., Heng, N. C., Burton, J. P., Chilcott, C. N. & Tagg, J. R. Streptococcal bacteriocins and the case for Streptococcus salivarius as model oral probiotics. Future Microbiol. 4, 819–835 (2009).
Bensi, G. et al. Multi high-throughput approach for highly selective identification of vaccine candidates: the group A Streptococcus case. Mol. Cell Proteom. 11, 015693 (2012).
Ji, Y., Carlson, B., Kondagunta, A. & Cleary, P. P. Intranasal immunization with C5a peptidase prevents nasopharyngeal colonization of mice by the group A Streptococcus. Infect. Immun. 65, 2080–2087 (1997).
Rivera-Hernandez, T. et al. An experimental group A vaccine that reduces pharyngitis and tonsillitis in a nonhuman primate model. MBio 10, e00693-19 (2019).
Chaguza, C. et al. Recombination in Streptococcus pneumoniae lineages increase with carriage duration and size of the polysaccharide capsule. MBio 7, e01053-16 (2016).
Hanage, W. P. et al. Using multilocus sequence data to define the pneumococcus. J. Bacteriol. 187, 6223–6230 (2005).
Driebe, E. M. et al. Using whole genome analysis to examine recombination across diverse sequence types of Staphylococcus aureus. PLoS ONE 10, e0130955 (2015).
Enright, M. C., Day, N. P., Davies, C. E., Peacock, S. J. & Spratt, B. G. Multilocus sequence typing for characterization of methicillin-resistant and methicillin-susceptible clones of Staphylococcus aureus. J. Clin. Microbiol. 38, 1008–1015 (2000).
Coscolla, M. & Gonzalez-Candelas, F. Population structure and recombination in environmental isolates of Legionella pneumophila. Environ. Microbiol. 9, 643–656 (2007).
Diancourt, L., Passet, V., Verhoef, J., Grimont, P. A. & Brisse, S. Multilocus sequence typing of Klebsiella pneumoniae nosocomial isolates. J. Clin. Microbiol. 43, 4178–4182 (2005).
Wyres, K. L. et al. Distinct evolutionary dynamics of horizontal gene transfer in drug resistant and virulent clones of Klebsiella pneumoniae. PLoS Genet. 15, e1008114 (2019).
Seale, A. C. et al. Invasive group A Streptococcus infection among children, rural Kenya. Emerg. Infect. Dis. 22, 224–232 (2016).
Athey, T. B. et al. Deriving group A Streptococcus typing information from short-read whole-genome sequencing data. J. Clin. Microbiol. 52, 1871–1876 (2014).
Chalker, V. et al. Genome analysis following a national increase in scarlet fever in England 2014. BMC Genom. 18, 224 (2017).
Kapatai, G., Coelho, J., Platt, S. & Chalker, V. J. Whole genome sequencing of group A Streptococcus: development and evaluation of an automated pipeline for emm gene typing. PeerJ 5, e3226 (2017).
Ibrahim, J. et al. Genome analysis of Streptococcus pyogenes associated with pharyngitis and skin infections. PLoS ONE 11, e0168177 (2016).
Page, A. J. et al. Robust high-throughput prokaryote de novo assembly and improvement pipeline for Illumina data. Micro. Genom. 2, e000083 (2016).
Souvorov, A., Agarwala, R. & Lipman, D. J. SKESA: strategic k-mer extension for scrupulous assemblies. Genome Biol. 19, 153 (2018).
Seemann, T. Prokka: rapid prokaryotic genome annotation. Bioinformatics 30, 2068–2069 (2014).
Sumby, P. et al. Evolutionary origin and emergence of a highly successful clone of serotype M1 group A Streptococcus involved multiple horizontal gene transfer events. J. Infect. Dis. 192, 771–782 (2005).
He, M. et al. Emergence and global spread of epidemic healthcare-associated Clostridium difficile. Nat. Genet. 45, 109–113 (2013).
Arndt, D. et al. PHASTER: a better, faster version of the PHAST phage search tool. Nucleic Acids Res. 44, W16–W21 (2016).
Nguyen, L. T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).
Letunic, I. & Bork, P. Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees. Nucleic Acids Res. 44, W242–W245 (2016).
Argimon, S. et al. Microreact: visualizing and sharing data for genomic epidemiology and phylogeography. Micro. Genom. 2, e000093 (2016).
Huang, Y., Niu, B., Gao, Y., Fu, L. & Li, W. CD-HIT Suite: a web server for clustering and comparing biological sequences. Bioinformatics 26, 680–682 (2010).
Pettersen, E. F. et al. UCSF Chimera—a visualization system for exploratory research and analysis. J. Comput. Chem. 25, 1605–1612 (2004).
Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).
Weyrich, L. S. et al. Neanderthal behaviour, diet, and disease inferred from ancient DNA in dental calculus. Nature 544, 357–361 (2017).
Davies, M. R. et al. Emergence of scarlet fever Streptococcus pyogenes emm12 clones in Hong Kong is associated with toxin acquisition and multidrug resistance. Nat. Genet. 47, 84–87 (2015).
Koren, S. et al. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 27, 722–736 (2017).
Acknowledgements
This work was supported by National Health and Medical Research Council project and program grants for: Protein Glycan Interactions in Infectious Diseases and Cellular Microbiology; the Coalition to Accelerate New Vaccines Against Streptococcus (CANVAS; an Australian and New Zealand joint initiative); and The Wellcome Trust, UK. For part of this study, M.R.D. was supported by a National Health and Medical Research Council postdoctoral training fellowship (635250) and A.M. was a GENDRIVAX fellow funded by the European Union’s Seventh Framework Programme FP7/2007–2013/ under REA grant agreement number 251522. We acknowledge assistance from the sequencing and pathogen informatics core teams at The Wellcome Trust Sanger Institute. We acknowledge and thank the database curators of the S. pyogenes MLST and emm databases (especially D. Bessen). We dedicate this work to the memory of our friend and colleague Gusharan Singh Chhatwal.
Author information
Contributions
M.R.D., G.D. and M.J.W. conceived the project. M.R.D., A.M., J.A.Lacey, J.A.Lees, S.Duchene, P.R.S., M.T.G.H., S.Y.C.T., P.M.G., A.C.S., J.A.B., G.S.C., S.D.B., R.A.S., T.L., J.D.F., N.J.M., J.R.C., A.C.S., J.P., A.S., D.A.W., B.J.C. and M.J.W. designed the experiments. M.R.D., L.M., J.A.Lacey, J.A.Lees, S.David, A.M., R.J.T., K.A.W., S.R.H., T.R.-H., H.R.F., R.S.L.A.T., O.B., A.J.C., R.B., P.N.-S., N.J.M. and D.A.W. performed the experimental protocols. M.R.D., L.M., J.A.Lacey, J.A.Lees, S.Duchene, D.J.P., A.M., P.R.S., N.J.M., G.D. and M.J.W. analyzed the experimental results. M.R.D. and M.J.W. wrote the manuscript and all authors reviewed the manuscript.
Ethics declarations
Competing interests
A.S. is an employee of the GSK group of companies with a commercial interest in GAS vaccine development. These companies had no influence over study design.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Figs. 1–14 and Supplementary Tables 1, 5, 7 and 10–12
Supplementary Table 2
GAS strains used in this study
Supplementary Table 3
List of 890 core GAS genes identified as having recombinogenic signatures as defined by fastGEAR
Supplementary Table 4
List of 416 ‘non-recombinogenic’ core GAS genes and markers of selection pressure (ratio of non-synonymous (dN) to synonymous (dS) codon substitutions (dN/dS))
Supplementary Table 6
Frequency, size (length) and relative rates of recombination within 36 PopPUNK phylogroups
Supplementary Table 8
Position of amino acid variants within the streptolysin O (SLO) protein and the consensus sequence of the SLO mature protein (as plotted in Fig. 3a,c)
Supplementary Table 9
Position of amino acid variants within the C5a peptidase (ScpA) protein and the consensus sequence
About this article
Cite this article
Davies, M.R., McIntyre, L., Mutreja, A. et al. Atlas of group A streptococcal vaccine candidates compiled using large-scale comparative genomics. Nat Genet 51, 1035–1043 (2019). https://doi.org/10.1038/s41588-019-0417-8
DOI: https://doi.org/10.1038/s41588-019-0417-8