Abstract

Escherichia coli is a highly diverse organism that includes a range of commensal and pathogenic variants found across many niches worldwide. In addition to causing severe intestinal and extraintestinal disease, E. coli is considered a priority pathogen due to high levels of observed drug resistance. The diversity in the E. coli population is driven by high genome plasticity and a very large gene pool. Together, these features have made E. coli one of the most well-studied organisms, as well as the source of commonly used laboratory strains. Today, there are thousands of sequenced E. coli genomes stored in public databases. Although these data are widely available, accessing them in order to perform analyses can still be a challenge. Collecting the relevant data requires querying different sources, which may store data in a range of formats, and the data often need further manipulation and processing before analyses can be applied and useful information extracted. In this study, we collated and intensively curated a collection of over 10 000 E. coli and Shigella genomes to provide a single, uniform, high-quality dataset. Shigella were included as they are considered specialized pathovars of E. coli. We provide these data in a number of easily accessible formats that can be used as the foundation for future studies addressing the biological differences between lineages and the distribution and flow of genes in the population at high resolution. The analysis we present highlights our limited understanding of the true diversity of the species, and the biased nature of our current knowledge of the genetic diversity of such a key pathogen.

Funding
This study was supported by:
  • European Research Council (Award 742158)
    • Principal Award Recipient: Jukka Corander
  • Wellcome Trust PhD Scholarship Grant (Award 204016)
    • Principal Award Recipient: Gerry Tonkin-Hill
  • Wellcome Sanger Institute PhD Studentship (Award NA)
    • Principal Award Recipient: Gal Horesh
  • Wellcome Sanger Institute (Award 206194)
    • Principal Award Recipient: Not applicable
  • This is an open-access article distributed under the terms of the Creative Commons Attribution License. This article was made open access via a Publish and Read agreement between the Microbiology Society and the corresponding author’s institution.

DOI: 10.1099/mgen.0.000499
2021-01-08
2024-09-20
