
Abstract

The recent growth in publicly available sequence data has introduced new opportunities for studying microbial evolution and spread. Because the pace of sequence accumulation tends to exceed the pace of experimental studies of protein function and the roles of individual amino acids, statistical tools to identify meaningful patterns in protein diversity are essential. Large sequence alignments from fast-evolving micro-organisms are particularly challenging to dissect using standard tools from phylogenetics and multivariate statistics because biologically relevant functional signals are easily masked by neutral variation and noise. To meet this need, a novel computational method is introduced that is easily executed in parallel in a cluster environment and can handle thousands of sequences with minimal subjective input from the user. The usefulness of this kind of machine learning is demonstrated by applying it to nearly 5000 haemagglutinin sequences of influenza A/H3N2. Antigenic and 3D structural mapping of the results shows that the method can recover the major jumps in antigenic phenotype that occurred between 1968 and 2013 and identify specific amino acids associated with these changes. The method is expected to provide a useful tool to uncover patterns of protein evolution.

/content/journal/mgen/10.1099/mgen.0.000025
2015-07-15
2019-10-23
