
Abstract

As genome sequencing efforts unveil the genetic diversity of the biosphere at unprecedented speed, there is a need to accurately describe the structural and functional properties of groups of extant species whose genomes have been sequenced, as well as those of their inferred ancestors, at any given taxonomic level of their phylogeny. Elaborate approaches for the reconstruction of ancestral states at the sequence level have been developed, subsequently augmented by methods based on gene content. While these sequence and gene-content reconstruction approaches have been deployed successfully, there has been less progress in the explicit inference of the functional properties of ancestral genomes, in terms of metabolic pathways and other cellular processes. Herein, we describe PathTrace, an efficient algorithm for parsimony-based reconstruction of the evolutionary history of individual metabolic pathways, which are pivotal representations of key functional modules of the cell. The algorithm is implemented as a five-step process in which pathways are represented as fuzzy vectors, where each enzyme is associated with a taxonomic conservation value derived from the phylogenetic profile of its protein sequence. The method is evaluated on a selected benchmark set of pathways against collections of genome sequences from key data resources. By deploying a pangenome-driven approach for pathway sets, we demonstrate that the inferred patterns are largely insensitive to noise, in contrast to gene-content reconstruction methods. In addition, the resulting reconstructions are closely correlated with the evolutionary distance of the taxa under study, suggesting that careful selection of target pangenomes, which serves as an internal control for an arbitrary selection of queries, is essential for maintaining the cohesiveness of the method and the consistency of the inference.
The PathTrace method is a first step towards the large-scale analysis of metabolic pathway evolution and towards a deeper understanding of the functional relationships reflected in emerging pangenome collections.
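The abstract names two ingredients: pathways encoded as fuzzy vectors of per-enzyme conservation values, and a parsimony-based reconstruction of those values at ancestral nodes. It does not give the exact scoring rule or parsimony criterion, so the Python sketch below rests on two labelled assumptions: (i) an enzyme's conservation value in a taxon is the fraction of that taxon's genomes in which the enzyme is detected, and (ii) ancestral values are inferred one enzyme at a time with squared-change parsimony on unit-length branches, a standard criterion for continuous characters swapped in here for illustration. The function names, EC numbers, genome IDs and tree are hypothetical; this is not the PathTrace implementation.

```python
from typing import Dict, List, Set


def fuzzy_pathway_vector(
    pathway_enzymes: List[str],
    taxon_genomes: List[str],
    hits: Dict[str, Set[str]],
) -> List[float]:
    """Conservation value in [0, 1] for each enzyme of a pathway in one taxon.

    Hypothetical scoring (assumption): the fraction of the taxon's genomes whose
    set of detected enzymes (e.g. from homology searches) contains that enzyme.
    """
    if not taxon_genomes:
        raise ValueError("a taxon needs at least one genome")
    n = len(taxon_genomes)
    return [
        sum(1 for g in taxon_genomes if enzyme in hits.get(g, set())) / n
        for enzyme in pathway_enzymes
    ]


def squared_change_parsimony(
    children: Dict[str, List[str]],
    leaf_values: Dict[str, float],
    tol: float = 1e-12,
    max_iter: int = 10_000,
) -> Dict[str, float]:
    """Ancestral values minimising the sum of squared changes over all edges.

    Illustrative criterion (assumption): leaves keep their observed values and
    every branch has unit length. At the optimum each internal node equals the
    mean of its neighbours, so Gauss-Seidel sweeps converge to the unique solution.
    """
    parent = {c: p for p, cs in children.items() for c in cs}
    internal = [node for node, cs in children.items() if cs]
    values = dict(leaf_values)
    start = sum(leaf_values.values()) / len(leaf_values)
    for node in internal:
        values[node] = start  # initial guess: mean of the leaf values
    for _ in range(max_iter):
        largest_step = 0.0
        for node in internal:
            neighbours = children[node] + ([parent[node]] if node in parent else [])
            new = sum(values[n] for n in neighbours) / len(neighbours)
            largest_step = max(largest_step, abs(new - values[node]))
            values[node] = new
        if largest_step < tol:
            break
    return values


# Hypothetical data: three taxa (A, B, C) with two genomes each, and a
# two-enzyme pathway fragment identified by EC numbers.
hits = {
    "a1": {"4.2.1.3", "1.1.1.42"}, "a2": {"4.2.1.3", "1.1.1.42"},
    "b1": {"4.2.1.3"}, "b2": {"4.2.1.3"},
    "c1": {"4.2.1.3", "1.1.1.42"}, "c2": {"4.2.1.3"},
}
taxa = {"A": ["a1", "a2"], "B": ["b1", "b2"], "C": ["c1", "c2"]}
pathway = ["4.2.1.3", "1.1.1.42"]
leaf_vectors = {t: fuzzy_pathway_vector(pathway, g, hits) for t, g in taxa.items()}
# A: [1.0, 1.0], B: [1.0, 0.0], C: [1.0, 0.5]

tree = {"root": ["A", "X"], "X": ["B", "C"]}  # hypothetical topology ((B, C)X, A)root
for i, enzyme in enumerate(pathway):
    anc = squared_change_parsimony(tree, {t: v[i] for t, v in leaf_vectors.items()})
    print(enzyme, round(anc["root"], 3), round(anc["X"], 3))
# prints:
# 4.2.1.3 1.0 1.0
# 1.1.1.42 0.7 0.4
```

Treating each enzyme as an independent continuous character keeps the reconstruction simple. Squared-change parsimony is used here only because it has a unique optimum that a few lines of code can find; a linear (Wagner) criterion or explicit branch lengths would be equally plausible choices.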

Funding
This study was supported by:
  • Christos A Ouzounis, FP7 Food, Agriculture and Fisheries, Biotechnology (Award 222886-2)

DOI: 10.1099/mgen.0.000429
2020-09-14
2020-10-20


Supplements

Supplementary material 1 (PDF)