
Abstract

The recent growth in publicly available sequence data has opened new opportunities for studying microbial evolution and spread. Because sequences accumulate faster than experimental studies can characterize protein function and the roles of individual amino acids, statistical tools that identify meaningful patterns in protein diversity are essential. Large sequence alignments from fast-evolving micro-organisms are particularly difficult to dissect with standard tools from phylogenetics and multivariate statistics, because biologically relevant functional signals are easily masked by neutral variation and noise. To meet this need, a novel computational method is introduced that is easily run in parallel on a computing cluster and can handle thousands of sequences with minimal subjective input from the user. The usefulness of this kind of machine learning is demonstrated by applying it to nearly 5000 haemagglutinin sequences of influenza A/H3N2. Antigenic and 3D structural mapping of the results shows that the method can recover the major jumps in antigenic phenotype that occurred between 1968 and 2013 and can identify specific amino acids associated with these changes. The method is expected to provide a useful tool for uncovering patterns of protein evolution.
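To make the idea of "amino acids associated with these changes" concrete, the sketch below scores each alignment column by the mutual information between a given grouping of sequences (for example, antigenic clusters) and the residue observed at that column. It is an illustrative stand-in under stated assumptions, not the computational method introduced in the article; the function name `site_mutual_information`, the toy alignment and the group labels are all hypothetical.

```python
from collections import Counter
from math import log2


def site_mutual_information(alignment, labels):
    """Return, for each alignment column, the mutual information (in bits)
    between a sequence's group label and the residue it carries there.

    Hypothetical helper for illustration only; not the article's method.
    """
    if not alignment or len(alignment) != len(labels):
        raise ValueError("need one label per aligned sequence")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must be aligned to the same length")

    n = len(alignment)
    label_counts = Counter(labels)
    scores = []
    for j in range(length):
        column = [seq[j] for seq in alignment]
        residue_counts = Counter(column)
        joint_counts = Counter(zip(labels, column))
        mi = 0.0
        # Sum p(g, a) * log2(p(g, a) / (p(g) * p(a))) over observed pairs.
        for (group, residue), count in joint_counts.items():
            p_joint = count / n
            p_group = label_counts[group] / n
            p_residue = residue_counts[residue] / n
            mi += p_joint * log2(p_joint / (p_group * p_residue))
        scores.append(mi)
    return scores


# Toy alignment: column 1 (K vs E) separates the two hypothetical groups,
# column 2 (T vs S) varies independently of them, columns 0 and 3 are constant.
alignment = ["AKTN", "AKSN", "AETN", "AESN"]
labels = ["old", "old", "new", "new"]
print(site_mutual_information(alignment, labels))  # [0.0, 1.0, 0.0, 0.0]
```

A plain score like this does not account for shared ancestry among the sequences, so neutral substitutions that happen to track the grouping also score highly; this is the kind of masking by neutral variation that the abstract describes.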

  • This is an open-access article distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the source is credited.
DOI: 10.1099/mgen.0.000025
2015-07-15
2024-03-28
