Discordance between different bioinformatic methods for identifying resistance genes from short-read genomic data, with a focus on Escherichia coli

doi:10.1099/mgen.0.001151

Volume 9, Issue 12

Research Article

Open Access

Timothy J. Davies^1,2, Jeremy Swann^1,2, Anna E. Sheppard^1,2, Hayleigh Pickford^1,2, Samuel Lipworth^1,2, Manal AbuOun^3, Matthew J. Ellington^2,4, Philip W. Fowler^1, Susan Hopkins^2,4, Katie L. Hopkins^2,5, Derrick W. Crook^1,2,6, Timothy E. A. Peto^1,2,6, Muna F. Anjum^3, A. Sarah Walker^1,2 and Nicole Stoesser^1,2,6
Affiliations:
1 Nuffield Department of Medicine, Oxford University, Oxford, UK
2 National Institute for Health Research (NIHR) Health Protection Research Unit on Healthcare Associated Infections and Antimicrobial Resistance at University of Oxford, Oxford, UK
3 Bacteriology, Animal and Plant Health Agency, Surrey, UK
4 Antimicrobial Resistance and Healthcare Associated Infections (AMRHAI) Division, UK Health Security Agency, London, UK
5 HCAI, Fungal, AMR, AMU and Sepsis Division, UK Health Security Agency, London, UK
6 Oxford University Hospitals NHS Foundation Trust, Oxford, UK
*Correspondence: Nicole Stoesser; Timothy J. Davies
Published: 15 December 2023 https://doi.org/10.1099/mgen.0.001151

Abstract

Several bioinformatics genotyping algorithms are now commonly used to characterize antimicrobial resistance (AMR) gene profiles in whole-genome sequencing (WGS) data, with a view to understanding AMR epidemiology and developing resistance prediction workflows using WGS in clinical settings. Accurately evaluating AMR in Enterobacterales, particularly Escherichia coli, is of major importance because E. coli is a common pathogen. However, robust comparisons of different genotyping approaches on relevant simulated and large real-life WGS datasets are lacking. Here, we used both simulated datasets and a large set of real E. coli WGS data (n=1818 isolates) to investigate genotyping methods systematically and in greater detail. Simulated constructs and real sequences were processed using four different bioinformatic programs (ABRicate, ARIBA, KmerResistance and SRST2, each run with the ResFinder database) and their outputs compared. For simulation tests where 3079 AMR gene variants were inserted into random sequence constructs, KmerResistance was correct for 3076 (99.9 %) simulations, ABRicate for 3054 (99.2 %), ARIBA for 2783 (90.4 %) and SRST2 for 2108 (68.5 %). For simulation tests where two closely related gene variants were inserted into random sequence constructs, KmerResistance identified the correct alleles in 35 338/46 318 (76.3 %) simulations, ABRicate in 11 842/46 318 (25.6 %), ARIBA in 1679/46 318 (3.6 %) and SRST2 in 2000/46 318 (4.3 %). In real data, across all methods, 1392/1818 (76 %) isolates had discrepant allele calls for at least one gene. In addition to highlighting areas for improvement in challenging scenarios (e.g. identifying AMR genes at <10× coverage, or identifying multiple closely related AMR genes present in the same sample), our evaluations identified some more systematic errors that could be readily resolved, such as repeated misclassification (i.e. naming) of genes as shorter variants of the same gene present within the reference resistance gene database. Such naming errors accounted for at least 2530/4321 (59 %) of the discrepancies seen in real data. Moreover, many of the remaining discrepancies were likely ‘artefactual’, with differences in reporting cut-offs accounting for at least 1430/4321 (33 %) discrepancies. Whilst we found that comparing outputs generated by running multiple algorithms on the same dataset could identify and resolve these algorithmic artefacts, the results of our evaluations emphasize the need for developing new and more robust genotyping algorithms to further improve accuracy and performance.
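The cross-tool comparison described in the abstract can be illustrated with a short sketch. The Python below is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes each tool's output for an isolate has already been parsed into a hypothetical `{gene: allele}` mapping, and the helper names `discrepant_genes` and `count_discordant_isolates` are invented for this example. Its two-way split ("presence" when at least one tool makes no call for a gene, "allele" when every tool calls the gene but names different alleles) is a deliberate simplification of the naming and cut-off categories reported in the study. Gene and allele names are placeholders.

```python
from typing import Dict

# Hypothetical parsed output for one isolate: {tool_name: {gene: allele}}.
# Parsing real ABRicate/ARIBA/KmerResistance/SRST2 reports is not shown.
IsolateCalls = Dict[str, Dict[str, str]]


def discrepant_genes(calls: IsolateCalls) -> Dict[str, str]:
    """Return {gene: kind} for every gene whose calls differ between tools.

    kind is "presence" if at least one tool made no call for the gene,
    otherwise "allele" (all tools called the gene but disagree on the allele).
    This two-way split is an illustrative simplification.
    """
    genes = set()
    for tool_calls in calls.values():
        genes.update(tool_calls)  # adds the gene names (dict keys)

    result: Dict[str, str] = {}
    for gene in sorted(genes):
        # None marks a tool that reported nothing for this gene.
        alleles = [tool_calls.get(gene) for tool_calls in calls.values()]
        if len(set(alleles)) > 1:
            result[gene] = "presence" if None in alleles else "allele"
    return result


def count_discordant_isolates(isolates: Dict[str, IsolateCalls]) -> int:
    """Count isolates with discrepant calls for at least one gene."""
    return sum(1 for calls in isolates.values() if discrepant_genes(calls))


# Invented example data; "gene_A"/"gene_B" are placeholders, not ResFinder names.
example: IsolateCalls = {
    "ABRicate": {"gene_A": "gene_A_1", "gene_B": "gene_B_1"},
    "ARIBA": {"gene_A": "gene_A_2", "gene_B": "gene_B_1"},
    "KmerResistance": {"gene_A": "gene_A_1"},
    "SRST2": {"gene_A": "gene_A_1", "gene_B": "gene_B_1"},
}
concordant: IsolateCalls = {
    "ABRicate": {"gene_A": "gene_A_1"},
    "ARIBA": {"gene_A": "gene_A_1"},
}

print(discrepant_genes(example))  # {'gene_A': 'allele', 'gene_B': 'presence'}
print(count_discordant_isolates({"iso1": example, "iso2": concordant}))  # 1
```

Counting isolates for which `discrepant_genes` returns a non-empty result corresponds to the isolate-level tally in the abstract (discrepant allele calls for at least one gene). Attributing each discrepancy to a naming error or a reporting cut-off, as the study did, would additionally need the reference database and each tool's thresholds, which this sketch does not model.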

Received: 27/06/2023
Accepted: 21/11/2023
Published Online: 15/12/2023

Keyword(s): antimicrobial resistance genotyping, Escherichia coli, genomics and resistance prediction

Funding

This study was supported by:

National Institute for Health Research Health Protection Research Unit (Award NIHR200915)
- Principal Award Recipient: A. Sarah Walker

This is an open-access article distributed under the terms of the Creative Commons Attribution License. This article was made open access via a Publish and Read agreement between the Microbiology Society and the corresponding author’s institution.

Microbial Genomics 9, 001151 (2023); https://doi.org/10.1099/mgen.0.001151
