Abstract

The recent growth in publicly available sequence data has created new opportunities for studying microbial evolution and spread. Because sequences accumulate faster than experimental studies can characterize protein function and the roles of individual amino acids, statistical tools that identify meaningful patterns in protein diversity are essential. Large sequence alignments from fast-evolving micro-organisms are particularly challenging to dissect with standard tools from phylogenetics and multivariate statistics, because biologically relevant functional signals are easily masked by neutral variation and noise. To meet this need, a novel computational method is introduced that can easily be run in parallel in a cluster environment and can handle thousands of sequences with minimal subjective input from the user. The usefulness of this machine-learning approach is demonstrated by applying it to nearly 5000 haemagglutinin sequences of influenza A/H3N2. Antigenic and 3D structural mapping of the results shows that the method can recover the major jumps in antigenic phenotype that occurred between 1968 and 2013 and identify specific amino acids associated with these changes. The method is expected to provide a useful tool for uncovering patterns of protein evolution.

Microbial Genomics 1(1), 2015-07-15. DOI: 10.1099/mgen.0.000025