Abstract

Minimum inhibitory concentrations (MICs) are the gold standard for quantitatively measuring antibiotic resistance. However, lab-based MIC determination can be time-consuming and suffers from low reproducibility, and interpretation as sensitive or resistant relies on guidelines which change over time. Genome sequencing and machine learning promise to allow MIC prediction as an alternative approach that overcomes some of these difficulties, although the predicted MIC must still be interpreted. Nevertheless, precisely how MIC data should be handled in predictive models remains unclear, since MICs are measured semi-quantitatively, with varying resolution, and are typically also left- and right-censored within varying ranges. We therefore investigated genome-based prediction of MICs in the pathogen using 4367 genomes with both simulated semi-quantitative traits and real MICs. As we were focused on clinical interpretation, we used interpretable rather than black-box machine learning models, namely Elastic Net, Random Forests, and linear mixed models. Simulated traits were generated accounting for oligogenic, polygenic, and homoplastic genetic effects with different levels of heritability. We then assessed how prediction accuracy was affected when MIC prediction was framed as a regression or as a classification problem. Our results showed that treating the MICs differently depending on the number of antibiotic concentration levels available was the most promising learning strategy. Specifically, to optimise both prediction accuracy and inference of the correct causal variants, we recommend treating MICs as continuous and framing the learning problem as a regression when the number of observed antibiotic concentration levels is large, and treating them as a categorical variable and framing the learning problem as a classification when the number of concentration levels is small.
Our findings also underline how predictive models can be improved when prior biological knowledge is taken into account, because the genetic architecture of each antibiotic resistance trait varies. Finally, we emphasise that expanding the population database is pivotal for the future clinical implementation of these models to support routine machine-learning-based diagnostics.
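The recommendation above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `parse_mic` and `choose_framing`, the label format (for example `<=0.25` or `>32`), and the threshold of six levels are all assumptions made for illustration, since the abstract does not state a cut-off. The sketch converts MIC labels to a log2 scale, records left- or right-censoring, and picks a framing from the number of distinct observed levels.

```python
import math


def parse_mic(label):
    """Parse an MIC label such as '<=0.25', '4' or '>32'.

    Returns (log2_value, censoring), where censoring is 'left', 'right'
    or 'none'. The label format is an assumption for this sketch.
    """
    text = label.strip().replace("≤", "<=").replace("≥", ">=")
    censoring = "none"
    # Check two-character prefixes before one-character ones.
    for prefix, kind in (("<=", "left"), (">=", "right"),
                         ("<", "left"), (">", "right")):
        if text.startswith(prefix):
            censoring = kind
            text = text[len(prefix):]
            break
    return math.log2(float(text)), censoring


def choose_framing(labels, min_levels_for_regression=6):
    """Pick 'regression' or 'classification' from the number of levels.

    The default threshold of 6 is a hypothetical placeholder, not a value
    taken from the study.
    """
    # A censored level (e.g. '>32') counts separately from the exact '32'.
    levels = {parse_mic(label) for label in labels}
    if len(levels) >= min_levels_for_regression:
        return "regression"
    return "classification"


if __name__ == "__main__":
    narrow_panel = ["<=1", "2", "4", ">4"]
    wide_panel = ["<=0.25", "0.5", "1", "2", "4", "8", "16", ">16"]
    print(choose_framing(narrow_panel))
    print(choose_framing(wide_panel))
```

In practice the threshold would need to be tuned per antibiotic, and a regression model would also need a strategy for the censored values, which this sketch only flags.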

Funding
This study was supported by the:
  • Medical Research Council (Award MR/T016434/1)
    • Principal Award Recipient: Not Applicable
  • Medical Research Council (Award MR/R015600/1)
    • Principal Award Recipient: Not Applicable
  • European Molecular Biology Laboratory
    • Principal Award Recipient: John A. Lees

This is an open-access article distributed under the terms of the Creative Commons Attribution License.
/content/journal/mgen/10.1099/mgen.0.001222
2024-03-26
2024-04-22

Supplements

Supplementary material 1

PDF

Supplementary material 2

EXCEL