
Abstract

Amplicon sequencing is a popular method for understanding the diversity of bacterial communities in samples containing multiple organisms, as exemplified by 16S rRNA sequencing. A second application multiplexes both primer sets and samples, allowing multiple targets in multiple samples to be sequenced in the same run. Multiple tools exist to process amplicon sequencing data produced on the short-read Illumina platform, but there are fewer options for long-read Oxford Nanopore Technologies (ONT) sequencing, or for processing data from environmental surveillance or other sources with many different organisms. We have developed AmpliconTyper (v0.1.28, DOI: 10.5281/zenodo.15045111) for analysing multiplex amplicon sequencing data generated on ONT devices from environmental (e.g. wastewater) or similarly complex samples. The tool uses machine learning to classify sequencing reads into target and non-target organisms with very high specificity and sensitivity. Users can train models on public and/or user-generated data and subsequently apply them to new data. The tool can also generate amplicon consensus sequences, identify SNPs and report their genotype implications, such as association with lineages or antimicrobial resistance (AMR). The tool is freely available via Bioconda and GitHub (https://github.com/AntonS-bio/AmpliconTyper). AmpliconTyper allows robust identification of target organism reads in ONT-sequenced environmental samples and can identify user-specified lineage or AMR markers.
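
The consensus and SNP-reporting steps mentioned in the abstract can be illustrated with a minimal sketch. This is not AmpliconTyper's implementation: it assumes reads are already aligned to the amplicon reference and padded to equal length, it uses a simple per-column majority vote, and the function names, the `min_depth` parameter and the `MARKERS` table are hypothetical names introduced only for illustration.

```python
from collections import Counter


def consensus(aligned_reads, min_depth=3):
    """Majority-vote consensus over equal-length, pre-aligned reads.

    Columns with fewer than `min_depth` called bases (A/C/G/T) become 'N'.
    Illustrative only; not the method used by AmpliconTyper.
    """
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")
    out = []
    for column in zip(*aligned_reads):
        bases = Counter(b for b in column if b in "ACGT")  # ignore gaps
        depth = sum(bases.values())
        if depth > 0 and depth >= min_depth:
            out.append(bases.most_common(1)[0][0])
        else:
            out.append("N")
    return "".join(out)


def call_snps(reference, consensus_seq):
    """Return (1-based position, reference base, alternative base) tuples."""
    return [
        (i + 1, ref, alt)
        for i, (ref, alt) in enumerate(zip(reference, consensus_seq))
        if alt != "N" and alt != ref
    ]


def annotate(snps, markers):
    """Attach a user-supplied interpretation to each SNP, if one exists."""
    return [
        (pos, ref, alt, markers.get((pos, alt), "unannotated"))
        for pos, ref, alt in snps
    ]


# Hypothetical marker table: (position, alternative base) -> implication.
MARKERS = {(5, "T"): "example AMR marker"}

reference = "ACGTACGT"
reads = ["ACGTTCGT", "ACGTTCGT", "ACGTTCGA", "ACG-TCGT"]

cons = consensus(reads)
snps = call_snps(reference, cons)
report = annotate(snps, MARKERS)
```

With these toy reads, the single read that disagrees at the last position is outvoted, while the substitution at position 5 is supported by every read and is matched against the hypothetical marker table.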

Funding
This study was supported by the:
  • Bill and Melinda Gates Foundation (Award INV047158)
    • Principal Award Recipient: Nicholas Grassly

This is an open-access article distributed under the terms of the Creative Commons Attribution License. This article was made open access via a Publish and Read agreement between the Microbiology Society and the corresponding author’s institution.
/content/journal/mgen/10.1099/mgen.0.001421
2025-09-10
2025-12-12

Supplements

Supplementary material 1
