Abstract

Investigating the genomic epidemiology of major bacterial pathogens is integral to understanding transmission, evolution, colonization, disease, antimicrobial resistance and vaccine impact. Furthermore, the recent accumulation of large numbers of whole genome sequences for many bacterial species enhances the development of robust genome-wide typing schemes to define the overall bacterial population structure and the lineages within it. Using previously published data, we developed the Pneumococcal Genome Library (PGL), a curated dataset of 30 976 genomes and contextual data for carriage and disease pneumococci recovered between 1916 and 2018 in 82 countries. We leveraged the size and diversity of the PGL to develop a core genome multilocus sequence typing (cgMLST) scheme comprising 1222 loci. Finally, using multilevel single-linkage clustering, we stratified pneumococci into hierarchical clusters based on allelic similarity thresholds and defined these clusters with a taxonomic life identification number (LIN) barcoding system. The PGL, cgMLST scheme and LIN barcodes provide a high-quality genomic resource and fine-scale clustering approaches for the analysis of pneumococcal populations, which support the genomic epidemiology and surveillance of this leading global pathogen.
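The clustering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation or the algorithm used to assign the published LIN codes. It assumes toy four-locus allelic profiles (the actual scheme has 1222 loci), hypothetical mismatch thresholds of 2, 1 and 0, and hypothetical function names (`allelic_mismatches`, `single_linkage`, `barcodes`). Each threshold defines one level of single-linkage clustering, and the cluster labels at successive levels are joined into a barcode-like string.

```python
from itertools import combinations


def allelic_mismatches(a, b):
    """Count loci whose allele numbers differ, skipping loci missing (None) in either profile."""
    return sum(1 for x, y in zip(a, b) if x is not None and y is not None and x != y)


def single_linkage(profiles, threshold):
    """Return a cluster root per profile; profiles within `threshold` mismatches are linked transitively."""
    parent = list(range(len(profiles)))

    def find(i):
        # Follow parent pointers to the cluster root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(profiles)), 2):
        if allelic_mismatches(profiles[i], profiles[j]) <= threshold:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[max(ri, rj)] = min(ri, rj)
    return [find(i) for i in range(len(profiles))]


def barcodes(profiles, thresholds):
    """Build one label per level; thresholds must decrease so that clusters are nested."""
    levels = [single_linkage(profiles, t) for t in thresholds]
    labels_by_prefix = {}  # barcode prefix -> {cluster root: label within that prefix}
    codes = []
    for i in range(len(profiles)):
        prefix = ()
        for level in levels:
            labels = labels_by_prefix.setdefault(prefix, {})
            label = labels.setdefault(level[i], len(labels))
            prefix += (label,)
        codes.append("_".join(str(part) for part in prefix))
    return codes


# Hypothetical toy profiles: one allele number per locus.
profiles = [
    [1, 1, 1, 1],
    [1, 1, 1, 2],  # one mismatch from the first profile
    [1, 1, 3, 3],  # two mismatches from each of the first two profiles
    [5, 6, 7, 8],  # differs from every other profile at all four loci
]
print(barcodes(profiles, thresholds=[2, 1, 0]))
```

In this sketch, isolates that share a longer barcode prefix are more closely related, which is the property that makes a hierarchical barcode useful for naming lineages at several resolutions. Production LIN assignment also has to keep existing codes stable as new genomes are added, which this toy version does not address.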

Funding
This study was supported by the:
  • Wellcome Trust (Award 218205/Z/19/Z)
    • Principal Award Recipient: Martin C.J. Maiden
  • Wellcome Trust (Award 206394/Z/17/Z)
    • Principal Award Recipient: Angela B. Brueggemann

This is an open-access article distributed under the terms of the Creative Commons Attribution License. This article was made open access via a Publish and Read agreement between the Microbiology Society and the corresponding author’s institution.
/content/journal/mgen/10.1099/mgen.0.001280
2024-08-13
2024-12-12
Supplements

Supplementary material 1 (PDF)
Supplementary material 2 (Excel)