Abstract

We have adopted an open bioinformatics ecosystem to address the challenges of bioinformatics implementation in public health laboratories (PHLs). Bioinformatics implementation for public health requires practitioners to undertake standardized bioinformatic analyses and generate reproducible, validated and auditable results. Data storage and analysis must be scalable, portable and secure, and the implementation must fit within the operational constraints of the laboratory. We address these requirements using Terra, a web-based data analysis platform with a graphical user interface that connects users to bioinformatics analyses without the use of code. We have developed bioinformatics workflows for use with Terra that specifically meet the needs of public health practitioners. These Theiagen workflows perform genome assembly, quality control and characterization, as well as phylogeny construction for insights into genomic epidemiology. Additionally, these workflows use open-source containerized software and the WDL workflow language to ensure standardization and interoperability with other bioinformatics solutions, whilst remaining adaptable by the user. They are all open source and publicly available in Dockstore, with the version-controlled code available in public GitHub repositories. They have been written to generate outputs in standardized file formats to allow for further downstream analysis and visualization with separate genomic epidemiology software. As evidence that this solution meets the requirements for bioinformatic implementation in public health, Theiagen workflows have collectively been used for over 5 million sample analyses in the last 2 years by over 90 public health laboratories in at least 40 different countries. Continued adoption of technological innovations and development of further workflows will ensure that this ecosystem continues to benefit PHLs.

  • This is an open-access article distributed under the terms of the Creative Commons Attribution License.
/content/journal/mgen/10.1099/mgen.0.001051
2023-07-10
2024-04-29