
Abstract

Reference-based alignment of short reads is a widely used technique in genomic analysis of the Mycobacterium tuberculosis complex (MTBC), and the choice of reference sequence impacts the interpretation of analyses. The most widely used reference genomes include the ATCC type strain (H37Rv) and the putative MTBC ancestral sequence of Comas et al., both of which are based on a lineage 4 sequence. As such, these reference sequences do not capture all of the structural variation known to be present in the ancestor of the MTBC. To better represent the base of the MTBC, we generated an imputed ancestral genomic sequence, termed MTBC0, from reference-free alignments of closed MTBC genomes. When used as a reference sequence in alignment workflows, MTBC0 mapped more short sequencing reads and called more pairwise SNPs relative to the Comas et al. sequence while exhibiting minimal impact on the overall phylogeny of the MTBC. The results also show that MTBC0 provides greater fidelity in capturing genomic variation and allows for the inclusion of regions absent from H37Rv in standard MTBC workflows without additional steps. The use of MTBC0 as an ancestral reference sequence in standard workflows modestly improved read mapping and SNP calling, and intuitively facilitates the study of structural variation and evolution in the MTBC.

Funding
This study was supported by the:
  • Canada Research Chairs (Award Tier 1 Canada Research Chair)
    • Principal Award Recipient: Marcel A. Behr
  • Huck Institutes of the Life Sciences (Award Chair in Global Health)
    • Principal Award Recipient: Vivek Kapur
  • Bill and Melinda Gates Foundation (Award OPP1176950)
    • Principal Award Recipient: Vivek Kapur
  • Canadian Institutes for Health Research (Award FDN-148362)
    • Principal Award Recipient: Marcel A. Behr
  • Fonds de Recherche du Québec - Santé (Award Clinician Scientist Training Program for Residents in Medical Specialties)
    • Principal Award Recipient: Luke B. Harrison

This is an open-access article distributed under the terms of the Creative Commons Attribution License.
DOI: 10.1099/mgen.0.001165
2024-01-04
2024-04-29
References

  1. Kohl TA, Utpatel C, Schleusener V, De Filippo MR, Beckert P et al. MTBseq: a comprehensive pipeline for whole genome sequence analysis of Mycobacterium tuberculosis complex isolates. PeerJ 2018; 6:e5895
  2. Cole ST, Brosch R, Parkhill J, Garnier T, Churcher C et al. Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Nature 1998; 393:537–544
  3. Comas I, Chakravartti J, Small PM, Galagan J, Niemann S et al. Human T cell epitopes of Mycobacterium tuberculosis are evolutionarily hyperconserved. Nat Genet 2010; 42:498–503
  4. Meehan CJ, Goig GA, Kohl TA, Verboven L, Dippenaar A et al. Whole genome sequencing of Mycobacterium tuberculosis: current standards and open issues. Nat Rev Microbiol 2019; 17:533–545
  5. Brosch R, Gordon SV, Marmiesse M, Brodin P, Buchrieser C et al. A new evolutionary scenario for the Mycobacterium tuberculosis complex. Proc Natl Acad Sci U S A 2002; 99:3684–3689
  6. Brites D, Loiseau C, Menardo F, Borrell S, Boniotti MB et al. A new phylogenetic framework for the animal-adapted Mycobacterium tuberculosis complex. Front Microbiol 2018; 9:2820
  7. Tettelin H, Masignani V, Cieslewicz MJ, Donati C, Medini D et al. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial “pan-genome”. Proc Natl Acad Sci U S A 2005; 102:13950–13955
  8. Gagneux S. Ecology and evolution of Mycobacterium tuberculosis. Nat Rev Microbiol 2018; 16:202–213
  9. Negrete-Paz AM, Vázquez-Marrufo G, Gutiérrez-Moraga A, Vázquez-Garcidueñas MS. Pangenome reconstruction of Mycobacterium tuberculosis as a guide to reveal genomic features associated with strain clinical phenotype. Microorganisms 2023; 11:1495
  10. Ngabonziza JCS, Loiseau C, Marceau M, Jouet A, Menardo F et al. A sister lineage of the Mycobacterium tuberculosis complex discovered in the African Great Lakes region. Nat Commun 2020; 11:2917
  11. Smit AFA, Hubley R, Green P. RepeatMasker Open-4.0; 2015 http://www.repeatmasker.org/
  12. Armstrong J, Hickey G, Diekhans M, Fiddes IT, Novak AM et al. Progressive Cactus is a multiple-genome aligner for the thousand-genome era. Nature 2020; 587:246–251
  13. Treangen TJ, Ondov BD, Koren S, Phillippy AM. The Harvest suite for rapid core-genome alignment and visualization of thousands of intraspecific microbial genomes. Genome Biol 2014; 15:524
  14. Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 2014; 30:1312–1313
  15. Coscolla M, Gagneux S, Menardo F, Loiseau C, Ruiz-Rodriguez P et al. Phylogenomics of Mycobacterium africanum reveals a new lineage and a complex evolutionary history. Microb Genom 2021; 7:1–14
  16. Vågene ÅJ, Honap TP, Harkins KM, Rosenberg MS, Giffin K et al. Geographically dispersed zoonotic tuberculosis in pre-contact South American human populations. Nat Commun 2022; 13:1195
  17. Hubisz MJ, Pollard KS, Siepel A. PHAST and RPHAST: phylogenetic analysis with space/time models. Brief Bioinform 2011; 12:41–51
  18. Tatusova T, DiCuccio M, Badretdin A, Chetvernin V, Nawrocki EP et al. NCBI prokaryotic genome annotation pipeline. Nucleic Acids Res 2016; 44:6614–6624
  19. Chiner-Oms Á, López MG, Moreno-Molina M, Furió V, Comas I. Gene evolutionary trajectories in Mycobacterium tuberculosis reveal temporal signs of selection. Proc Natl Acad Sci U S A 2022; 119:e2113600119
  20. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 2014; 30:2114–2120
  21. Wood DE, Lu J, Langmead B. Improved metagenomic analysis with Kraken 2. Genome Biol 2019; 20:257
  22. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 2009; 25:1754–1760
  23. Goig GA, Blanco S, Garcia-Basteiro AL, Comas I. Contaminant DNA in bacterial sequencing experiments is a major source of false genetic variability. BMC Biol 2020; 18:24
  24. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J et al. The sequence alignment/map format and SAMtools. Bioinformatics 2009; 25:2078–2079
  25. Paradis E, Schliep K. ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 2019; 35:526–528
  26. Revell LJ. phytools: an R package for phylogenetic comparative biology (and other things). Methods Ecol Evol 2012; 3:217–223
  27. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol 1990; 215:403–410
  28. Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES et al. Integrative genomics viewer. Nat Biotechnol 2011; 29:24–26
  29. Mostowy S, Onipede A, Gagneux S, Niemann S, Kremer K et al. Genomic analysis distinguishes Mycobacterium africanum. J Clin Microbiol 2004; 42:3594–3599
  30. Liu Z, Jiang Z, Wu W, Xu X, Ma Y et al. Identification of region of difference and H37Rv-related deletion in Mycobacterium tuberculosis complex by structural variant detection and genome assembly. Front Microbiol 2022; 13:984582
  31. Lee RS, Behr MA. Does choice matter? Reference-based alignment for molecular epidemiology of tuberculosis. J Clin Microbiol 2016; 54:1891–1895
  32. Bottai D, Frigui W, Sayes F, Di Luca M, Spadoni D et al. TbD1 deletion as a driver of the evolutionary success of modern epidemic Mycobacterium tuberculosis lineages. Nat Commun 2020; 11:684
  33. Achtman M, Zhou Z, Charlesworth J, Baxter L. EnteroBase: hierarchical clustering of 100 000s of bacterial genomes into species/subspecies and populations. Philos Trans R Soc Lond B Biol Sci 2022; 377:20210240
  34. Charif D, Lobry JR. SeqinR 1.0-2: a contributed package to the R project for statistical computing devoted to biological sequences retrieval and analysis. In Bastolla U, Porto M, Roman HE, Vendruscolo M (eds). Structural Approaches to Sequence Evolution: Molecules, Networks, Populations. Biological and Medical Physics, Biomedical Engineering. Springer; 2007 pp 207–232
Supplements

Supplementary material 1 (PDF)
Supplementary material 2 (Excel)