Abstract

Classification of viral sequences should be fast, objective, accurate and reproducible. Most methods that classify sequences use either pair-wise distances or phylogenetic relations, but cannot discern when a sequence is unclassifiable. The branching index (BI) combines distance and phylogeny methods to compute a ratio that quantifies how closely a query sequence clusters with a subtype clade. In the hypothesis-testing framework of statistical inference, the BI is compared with a threshold to test whether sufficient evidence exists for the query sequence to be classified among known sequences. If above the threshold, the null hypothesis of no support for the subtype relation is rejected and the sequence is taken as belonging to the subtype clade with which it clusters on the tree. This study evaluates statistical properties of the BI for subtype classification in hepatitis C virus (HCV) and human immunodeficiency virus-1 (HIV-1). Pairs of BI values with known positive- and negative-test results were computed from 10 000 random fragments of reference alignments. Sampled fragments were of sufficient length to contain phylogenetic signals that grouped reference sequences together properly into subtype clades. For HCV, a threshold BI of 0.71 yields 95.1 % agreement with reference subtypes, with equal false-positive and false-negative rates. For HIV-1, a threshold of 0.66 yields 93.5 % agreement. Higher thresholds can be used where lower false-positive rates are required. In synthetic recombinants, regions without breakpoints are recognized accurately; regions with breakpoints do not represent any known subtype uniquely. Web-based services for viral subtype classification with the BI are available online.
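The threshold selection described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the grid of candidate thresholds and the toy BI values are all assumptions made for illustration, and the computation of the BI itself (which the abstract does not specify) is left out. Given BI values from fragments with known positive- and negative-test results, the sketch picks the threshold at which the false-negative and false-positive rates are closest to equal and reports the resulting agreement.

```python
from typing import List, Tuple


def error_rates(pos: List[float], neg: List[float], threshold: float) -> Tuple[float, float]:
    """Return (false-negative rate, false-positive rate) at a BI threshold.

    Assumption: a query counts as classified only when its BI is strictly
    above the threshold, mirroring "if above the threshold" in the abstract.
    """
    fn = sum(1 for b in pos if b <= threshold) / len(pos)  # true members not accepted
    fp = sum(1 for b in neg if b > threshold) / len(neg)   # non-members accepted
    return fn, fp


def balanced_threshold(pos: List[float], neg: List[float]) -> float:
    """Pick the observed BI value where |FNR - FPR| is smallest.

    Candidates are the observed BI values themselves (an illustrative choice).
    """
    def imbalance(t: float) -> float:
        fn, fp = error_rates(pos, neg, t)
        return abs(fn - fp)

    candidates = sorted(set(pos) | set(neg))
    return min(candidates, key=imbalance)


def agreement(pos: List[float], neg: List[float], threshold: float) -> float:
    """Fraction of all test cases that are called correctly at the threshold."""
    fn, fp = error_rates(pos, neg, threshold)
    correct = (1.0 - fn) * len(pos) + (1.0 - fp) * len(neg)
    return correct / (len(pos) + len(neg))


# Hypothetical BI values, invented purely for demonstration (not study data).
toy_pos = [0.95, 0.90, 0.85, 0.80, 0.75, 0.60]
toy_neg = [0.10, 0.20, 0.30, 0.40, 0.50, 0.78]

t = balanced_threshold(toy_pos, toy_neg)
print("balanced threshold:", t)
print("error rates (FN, FP):", error_rates(toy_pos, toy_neg, t))
print("agreement:", agreement(toy_pos, toy_neg, t))
```

Raising the threshold above the balanced value trades a lower false-positive rate for a higher false-negative rate, which corresponds to the abstract's remark that higher thresholds can be used where fewer false positives are required.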

/content/journal/jgv/10.1099/vir.0.83657-0
2008-09-01
2024-10-10