Abstract

Genetic distances between bacterial DNA sequences can be used to cluster populations into closely related subpopulations and as an additional source of information when detecting possible transmission events. Because bacterial genomes vary in gene content and order, reference-free methods offer more sensitive detection of genetic differences, especially among the closely related samples found in outbreaks. However, across longer genetic distances, frequent recombination can make the calculation and interpretation of these differences more challenging, requiring significant bioinformatic expertise and manual intervention during analysis. Here, we present a population analysis pipeline (PopPIPE), which combines rapid reference-free genome analysis methods to analyse bacterial genomes across these two scales, splitting whole populations into subclusters and detecting plausible transmission events within closely related clusters. We use k-mer sketching to split populations into strains, followed by split k-mer analysis and recombination removal to create alignments and subclusters within these strains. We first show that this approach creates high-quality subclusters on a population-wide dataset of Streptococcus pneumoniae. When applied to nosocomial vancomycin-resistant Enterococcus faecium samples, PopPIPE finds transmission clusters that are more epidemiologically plausible than those from core genome or multilocus sequence typing (MLST) approaches. Our pipeline is rapid and reproducible, creates interactive visualizations and can easily be reconfigured and re-run on new datasets. PopPIPE therefore provides a user-friendly pipeline for analyses spanning species-wide clustering to outbreak investigations.
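
The two distance scales named in the abstract can be illustrated with a toy sketch. This is not PopPIPE's code, nor the implementation of the tools it wraps: the function names, the k-mer sizes and the example sequences are all assumptions chosen for illustration. It uses exact k-mer sets where real tools use sketches of them, and it ignores reverse complements. The first function gives a coarse k-mer Jaccard distance of the kind used to separate strains; the second counts single-base differences with split k-mers, where two flanking sequences that match in both samples pin down a middle base that may differ.

```python
def kmers(seq, k):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a, b, k=5):
    """Coarse distance: 1 - |shared k-mers| / |all k-mers|.

    Assumption: exact sets are used here; sketching tools estimate
    this quantity from a small sample of hashed k-mers instead.
    """
    ka, kb = kmers(a, k), kmers(b, k)
    return 1 - len(ka & kb) / len(ka | kb)


def split_kmers(seq, flank=3):
    """Map (left flank, right flank) -> middle base.

    Flank pairs seen with more than one middle base within the same
    sequence are marked None (ambiguous) and ignored later.
    """
    width = 2 * flank + 1
    table = {}
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        key = (window[:flank], window[flank + 1:])
        middle = window[flank]
        if key in table and table[key] != middle:
            table[key] = None  # conflicting middle bases: ambiguous
        else:
            table.setdefault(key, middle)
    return table


def split_kmer_snps(a, b, flank=3):
    """Count flank pairs shared by both sequences whose middle bases differ."""
    ta, tb = split_kmers(a, flank), split_kmers(b, flank)
    return sum(
        1
        for key in ta.keys() & tb.keys()
        if ta[key] is not None and tb[key] is not None and ta[key] != tb[key]
    )


# Hypothetical example sequences differing by a single substitution.
sample_a = "ACGTTGCATGGACT"
sample_b = "ACGTTGCCTGGACT"
print(jaccard_distance(sample_a, sample_b))
print(split_kmer_snps(sample_a, sample_b))
```

One caveat of this toy version: two sequences that share no flank pairs at all also score zero differences, so split k-mer counts are only meaningful between samples already known to be closely related, which is why the coarse strain-level split comes first.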

Funding
This study was supported by the:
  • Biotechnology and Biological Sciences Research Council (Award BB/S019669/1)
    • Principal Award Recipient: Martin P. McHugh
  • Chief Scientist Office (Award SIRN/10)
    • Principal Award Recipient: Martin P. McHugh
  • Medical Research Council (Award MR/X020258/1)
    • Principal Award Recipient: Nicholas J. Croucher
  • Medical Research Council (Award MR/T016434/1)
    • Principal Award Recipient: Nicholas J. Croucher
  • Medical Research Council (Award MR/R015600/1)
    • Principal Award Recipient: Nicholas J. Croucher
  • Biotechnology and Biological Sciences Research Council (Award BB/Y513805/1)
    • Principal Award Recipient: John A. Lees
  • European Molecular Biology Laboratory
    • Principal Award Recipient: John A. Lees

This is an open-access article distributed under the terms of the Creative Commons Attribution License. This article was made open access via a Publish and Read agreement between the Microbiology Society and the corresponding author’s institution.
/content/journal/mgen/10.1099/mgen.0.001404
2025-04-28
2026-04-21

Supplements

Supplementary material 1
