Estimating the Number of Species: A Review

J. Bunge and M. Fitzpatrick
Journal of the American Statistical Association
Vol. 88, No. 421 (Mar., 1993), pp. 364-373
DOI: 10.2307/2290733
Stable URL: http://www.jstor.org/stable/2290733
Page Count: 10

Abstract

How many kinds are there? Suppose that a population is partitioned into C classes. In many situations interest focuses not on estimation of the relative sizes of the classes, but on estimation of C itself. For example, biologists and ecologists may be interested in estimating the number of species in a population of plants or animals, numismatists may be concerned with estimating the number of dies used to produce an ancient coin issue, and linguists may be interested in estimating the size of an author's vocabulary. In this article we review the problem of statistical estimation of C. Many approaches have been proposed, some purely data-analytic and others based in sampling theory. In the latter case numerous variations have been considered. The population may be finite or infinite. If finite, samples may be taken with replacement (multinomial sampling) or without replacement (hypergeometric sampling), or by Bernoulli sampling; if infinite, sampling may be multinomial or Bernoulli, or the sample may be the result of random Poisson contributions of each class. Given a sampling model, one may approach estimation of C via a parametric or nonparametric formulation; in either case there may be frequentist and Bayesian procedures. We begin by discussing the existing literature on this problem (over 120 references), organizing it by sampling model, population specification, and philosophy of estimation.
We find that (a) the problem is quite resistant to statistical solution, essentially because no matter how many classes have been observed, there may still be a large number of very small unobserved classes; (b) many closely related estimation procedures have been developed independently and have not yet been compared; (c) there is not as yet a globally preferable estimator of C, although for some models there is an acceptable estimator (for some not even this is true); and (d) there are promising directions for research to pursue; for example, it appears possible to exploit estimates of the "coverage" of the sample (the total proportion of the population represented by the observed classes) to improve the accuracy of estimators of the number of classes. Finally, we make specific recommendations for future research, regarding parametric estimation, coverage-based estimation, resampling methods, Poisson process representation of sampling models, and frequentist decision theory.
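
To make the idea of coverage in point (d) concrete, here is a minimal sketch of one simple coverage-based calculation. It is not taken from the article. It assumes the Good-Turing coverage estimate, 1 - f1/n, where f1 is the number of classes observed exactly once in a sample of size n, and it divides the number of observed classes by that estimate, which is reasonable only if the classes are roughly equal in size. The function names are hypothetical.

```python
from collections import Counter


def good_turing_coverage(sample):
    """Good-Turing estimate of sample coverage: 1 - f1/n.

    f1 is the number of classes seen exactly once; n is the sample size.
    (Assumed estimator, used here only for illustration.)
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    class_counts = Counter(sample)
    f1 = sum(1 for count in class_counts.values() if count == 1)
    return 1.0 - f1 / n


def coverage_adjusted_classes(sample):
    """Estimate C as (observed classes) / (estimated coverage).

    Assumes roughly equal class sizes; returns infinity when every
    observation is a singleton and the estimated coverage is zero.
    """
    observed = len(set(sample))
    coverage = good_turing_coverage(sample)
    if coverage == 0:
        return float("inf")
    return observed / coverage


# Eight observations from five observed classes, three of them singletons.
sample = list("aaabbcde")
print(good_turing_coverage(sample))       # 0.625
print(coverage_adjusted_classes(sample))  # 8.0
```

When many small classes go unobserved, as finding (a) warns, an estimate of this kind can still fall well short of the true C, so it is best read as a starting point for such estimators and not as a recommended procedure.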
