Distortions of Taxonomic Structure from Incomplete Data on a Restricted Set of Reference Strains

Abstract

The paper examines how well taxonomic relationships can be estimated when the data are restricted to the similarities between each of the strains and a small subset of reference strains. Such data represent a strip from the similarity matrix rather than the complete matrix. The methods studied were: (i) minimum spanning trees, (ii) the definition of one group at a time, and (iii) the calculation of ‘derived matrices’. A derived matrix is a complete matrix obtained solely from the entries of the incomplete matrix, by treating these as quantitative character states. The data used were taxonomic distances based on morphological, biochemical and physiological results, and were selected from a previous study to provide good examples of salient patterns of taxonomic relationship. The results that were most similar to those from the complete data were given by derived matrices. Surprisingly little taxonomic distortion occurred, even if the reference strains were rather few, provided these were suitably chosen. Reference strains should be well dispersed, because distortion was considerable if all were very similar to one another. Ideally there should be a reference strain from each cluster, and aids to ensuring this are discussed.

The method has considerable potential for serological or nucleic acid pairing studies in which it is usually impracticable to obtain complete data on numerous strains.
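
The derived-matrix idea described above can be illustrated with a minimal sketch. The abstract does not state which coefficient was applied to the strip entries, so the root-mean-square Euclidean distance used here is an assumption, and the function name `derived_matrix`, the six two-dimensional "strains" and the choice of reference strains are all hypothetical.

```python
import numpy as np


def derived_matrix(strip):
    """Build a complete n x n matrix from an n x r strip of distances.

    Each row of `strip` holds one strain's distances to the r reference
    strains. The rows are treated as quantitative character states.
    Assumption (not from the source): the derived distance is the
    root-mean-square Euclidean distance between two rows.
    """
    strip = np.asarray(strip, dtype=float)
    # Pairwise row differences via broadcasting: shape (n, n, r).
    diff = strip[:, None, :] - strip[None, :, :]
    return np.sqrt((diff ** 2).mean(axis=2))


# Hypothetical data: two tight clusters of three strains each.
points = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],   # cluster A
    [1.0, 1.0], [1.1, 1.0], [1.0, 1.1],   # cluster B
])

# Complete distance matrix, available here only for comparison.
full = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)

# Strip of the matrix: distances to one reference strain per cluster.
references = [0, 3]
strip = full[:, references]

derived = derived_matrix(strip)

# Compare within-cluster and between-cluster derived distances.
within = np.concatenate([
    derived[:3, :3][np.triu_indices(3, k=1)],
    derived[3:, 3:][np.triu_indices(3, k=1)],
])
between = derived[:3, 3:].ravel()
print("largest within-cluster derived distance:", within.max())
print("smallest between-cluster derived distance:", between.min())
```

In this toy case the two clusters remain separated in the derived matrix. Two strains that are equidistant from every reference strain receive a derived distance of zero even though they differ in the complete matrix, which is one simple form of the distortion the paper examines.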

/content/journal/micro/10.1099/00221287-129-4-1045
1983-04-01