A Bayesian Analysis of Some Nonparametric Problems
Thomas S. Ferguson
The Annals of Statistics
Vol. 1, No. 2 (Mar., 1973), pp. 209–230
Published by: Institute of Mathematical Statistics
Stable URL: http://www.jstor.org/stable/2958008
Page Count: 22
Abstract
The Bayesian approach to statistical problems, though fruitful in many ways, has been rather unsuccessful in treating nonparametric problems. This is due primarily to the difficulty in finding workable prior distributions on the parameter space, which in nonparametric problems is taken to be a set of probability distributions on a given sample space. There are two desirable properties of a prior distribution for nonparametric problems. (I) The support of the prior distribution should be large with respect to some suitable topology on the space of probability distributions on the sample space. (II) Posterior distributions given a sample of observations from the true probability distribution should be manageable analytically. These properties are antagonistic in the sense that one may be obtained at the expense of the other. This paper presents a class of prior distributions, called Dirichlet process priors, broad in the sense of (I), for which (II) is realized, and for which treatment of many nonparametric statistical problems may be carried out, yielding results that are comparable to the classical theory. In Section 2, we review the properties of the Dirichlet distribution needed for the description of the Dirichlet process given in Section 3. Briefly, this process may be described as follows. Let $\mathcal{X}$ be a space, $\mathcal{A}$ a σ-field of subsets, and $\alpha$ a finite non-null measure on $(\mathcal{X}, \mathcal{A})$. Then a stochastic process $P$ indexed by elements $A$ of $\mathcal{A}$ is said to be a Dirichlet process on $(\mathcal{X}, \mathcal{A})$ with parameter $\alpha$ if, for any measurable partition $(A_1, \dots, A_k)$ of $\mathcal{X}$, the random vector $(P(A_1), \dots, P(A_k))$ has a Dirichlet distribution with parameter $(\alpha(A_1), \dots, \alpha(A_k))$.
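The defining property can be checked numerically on a finite partition. The sketch below is an illustration, not code from the paper: it assumes the hypothetical choice $\alpha = c \cdot \mathrm{Uniform}[0, 1]$ (so $\alpha(\mathcal{X}) = c$) and draws $(P(A_1), \dots, P(A_k))$ by normalizing independent Gamma$(\alpha(A_i), 1)$ variables, a standard way to sample a Dirichlet vector. The helper names `partition_masses` and `dirichlet_sample` are hypothetical, and a zero parameter is treated here as a component fixed at zero.

```python
import random


def partition_masses(c, edges):
    """alpha(A_i) for the intervals between consecutive edges of [0, 1],
    assuming (hypothetically) alpha = c * Uniform[0, 1]."""
    return [c * (b - a) for a, b in zip(edges[:-1], edges[1:])]


def dirichlet_sample(alphas, rng=random):
    """One draw from Dirichlet(alphas): normalize independent Gamma(alpha_i, 1)
    variables. A zero parameter gives a component that is identically zero."""
    if any(a < 0 for a in alphas) or not any(a > 0 for a in alphas):
        raise ValueError("parameters must be nonnegative with at least one positive")
    gammas = [rng.gammavariate(a, 1.0) if a > 0 else 0.0 for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]


rng = random.Random(0)
alphas = partition_masses(2.0, [0.0, 0.25, 0.5, 1.0])  # [0.5, 0.5, 1.0]
print(dirichlet_sample(alphas, rng))  # one realization of (P(A_1), P(A_2), P(A_3))
```

Averaging many draws, each component should approach $\alpha(A_i)/\alpha(\mathcal{X})$, here 0.25, 0.25 and 0.5.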
$P$ may be considered a random probability measure on $(\mathcal{X}, \mathcal{A})$. The main theorem states that if $P$ is a Dirichlet process on $(\mathcal{X}, \mathcal{A})$ with parameter $\alpha$, and if $X_1, \dots, X_n$ is a sample from $P$, then the posterior distribution of $P$ given $X_1, \dots, X_n$ is also a Dirichlet process on $(\mathcal{X}, \mathcal{A})$, with parameter $\alpha + \sum_{i=1}^{n} \delta_{x_i}$, where $\delta_x$ denotes the measure giving mass one to the point $x$. In Section 4, an alternative definition of the Dirichlet process is given. This definition exhibits a version of the Dirichlet process that gives probability one to the set of discrete probability measures on $(\mathcal{X}, \mathcal{A})$. This is in contrast to Dubins and Freedman [2], whose methods for choosing a distribution function on the interval [0, 1] lead with probability one to singular continuous distributions. Methods of choosing a distribution function on [0, 1] that with probability one is absolutely continuous have been described by Kraft [7]. The general method of choosing a distribution function on [0, 1], described in Section 2 of Kraft and van Eeden [10], can of course be used to define the Dirichlet process on [0, 1].

Special mention must be made of the papers of Freedman and Fabius. Freedman [5] defines a notion of tail-free for a distribution on the set of all probability measures on a countable space $\mathcal{X}$. For a tail-free prior, the posterior distribution given a sample from the true probability measure may be computed fairly easily. Fabius [3] extends the notion of tail-free to the case where $\mathcal{X}$ is the unit interval [0, 1], but it is clear that his extension may be made to cover quite general $\mathcal{X}$. With such an extension, the Dirichlet process would be a special case of a tail-free distribution for which the posterior distribution has a particularly simple form. There are disadvantages to the fact that a $P$ chosen by a Dirichlet process is discrete with probability one. These appear mainly because, in sampling from a $P$ chosen by a Dirichlet process, we expect eventually to see one observation exactly equal to another.
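A direct consequence of the main theorem is that the predictive distribution of $X_{n+1}$ given $X_1, \dots, X_n$ is the normalized posterior parameter, $(\alpha + \sum_{i=1}^{n} \delta_{x_i}) / (\alpha(\mathcal{X}) + n)$. Sampling sequentially from this predictive (the scheme usually called the Blackwell-MacQueen Pólya urn) makes the discreteness visible: with probability $n / (\alpha(\mathcal{X}) + n)$ the next observation repeats an earlier one exactly. The sketch below is illustrative only; it again assumes the hypothetical choice $\alpha = c \cdot \mathrm{Uniform}[0, 1]$, and `posterior_masses` and `polya_urn_sample` are hypothetical names, not taken from the paper.

```python
import random


def posterior_masses(c, edges, data):
    """Posterior Dirichlet parameters (alpha + sum_i delta_{x_i})((a, b]) on the
    cells (a, b] between consecutive edges, assuming (hypothetically)
    alpha = c * Uniform[0, 1], so that alpha((a, b]) = c * (b - a)."""
    masses = []
    for a, b in zip(edges[:-1], edges[1:]):
        atoms = sum(1 for x in data if a < x <= b)  # unit point masses from the sample
        masses.append(c * (b - a) + atoms)
    return masses


def polya_urn_sample(n, c, rng):
    """Draw X_1, ..., X_n one at a time from the predictive distribution
    (alpha + sum_{i<=k} delta_{X_i}) / (c + k), i.e. sample from P without
    constructing P. Assumes alpha = c * Uniform[0, 1]."""
    xs = []
    for k in range(n):
        if rng.random() < c / (c + k):
            xs.append(rng.random())    # fresh value drawn from alpha / c
        else:
            xs.append(rng.choice(xs))  # exact repeat of an earlier observation
    return xs


print(posterior_masses(2.0, [0.0, 0.5, 1.0], [0.1, 0.2, 0.7]))  # [3.0, 2.0]
xs = polya_urn_sample(50, 1.0, random.Random(1))
print(len(set(xs)), "distinct values among", len(xs))
```

With $c = 1$ and 50 draws, only a handful of distinct values typically appear, which is the "doubling up" of observations that limits the Dirichlet process prior in problems such as goodness of fit.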
For example, consider the goodness-of-fit problem of testing the hypothesis $H_0$ that a distribution on the interval [0, 1] is uniform. If on the alternative hypothesis we place a Dirichlet process prior with parameter $\alpha$ itself a uniform measure on [0, 1], and if we are given a sample of size $n \ge 2$, the only nontrivial nonrandomized Bayes rule is to reject $H_0$ if and only if two or more of the observations are exactly equal. This is really a test of the hypothesis that a distribution is continuous against the hypothesis that it is discrete. Thus, there is still a need for a prior that chooses a continuous distribution with probability one and yet satisfies properties (I) and (II).

Some applications in which the possible doubling up of the values of the observations plays no essential role are presented in Section 5. These include the estimation of a distribution function, of a mean, of quantiles, of a variance and of a covariance. A two-sample problem is considered in which the Mann-Whitney statistic, equivalent to the rank-sum statistic, appears naturally. A decision-theoretic upper tolerance limit for a quantile is also treated. Finally, a hypothesis testing problem concerning a quantile is shown to yield the sign test. In each of these problems, useful ways of combining prior information with the statistical observations appear. Other applications exist. In his Ph.D. dissertation [1], Charles Antoniak finds a need to consider mixtures of Dirichlet processes. He treats several problems, including the estimation of a mixing distribution, bioassay, empirical Bayes problems, and discrimination problems.
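For the first applications listed, the posterior mean under the posterior Dirichlet process gives the estimates directly. For the distribution function, $\hat F_n(t) = p_n F_0(t) + (1 - p_n) F_n(t)$, where $F_0(t) = \alpha((-\infty, t]) / \alpha(\mathbb{R})$ is the prior guess, $F_n$ is the empirical distribution function, and $p_n = \alpha(\mathbb{R}) / (\alpha(\mathbb{R}) + n)$; the estimate of a mean has the same convex-combination form with the prior mean and the sample mean. The following is a minimal sketch under the hypothetical assumption $\alpha = c \cdot \mathrm{Uniform}[0, 1]$, so that $F_0$ is the uniform CDF and the prior mean is 1/2. The function names are illustrative.

```python
def uniform_cdf(t):
    """F_0 for the assumed base measure alpha = c * Uniform[0, 1]."""
    return min(max(t, 0.0), 1.0)


def bayes_cdf_estimate(t, c, base_cdf, data):
    """Posterior mean of P((-inf, t]): p_n * F_0(t) + (1 - p_n) * F_n(t)."""
    n = len(data)
    p_n = c / (c + n)  # weight on the prior guess; c plays the role of alpha(R)
    empirical = sum(1 for x in data if x <= t) / n if n else 0.0
    return p_n * base_cdf(t) + (1 - p_n) * empirical


def bayes_mean_estimate(c, base_mean, data):
    """Posterior mean of the mean of P: p_n * mu_0 + (1 - p_n) * sample mean."""
    n = len(data)
    p_n = c / (c + n)
    sample_mean = sum(data) / n if n else 0.0
    return p_n * base_mean + (1 - p_n) * sample_mean


data = [0.1, 0.2, 0.7]
print(bayes_cdf_estimate(0.5, 2.0, uniform_cdf, data))  # approximately 0.6
print(bayes_mean_estimate(2.0, 0.5, data))              # approximately 0.4
```

With $c = 2$ and three observations the prior receives weight 2/5. As $n$ grows the estimate approaches the empirical distribution function, and as $c$ grows it approaches $F_0$, which is one concrete way prior information and the observations are combined.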
The Annals of Statistics © 1973 Institute of Mathematical Statistics