Posterior Simulation in Countable Mixture Models for Large Datasets
Journal of the American Statistical Association
Vol. 105, No. 490 (June 2010), pp. 775-786
Stable URL: http://www.jstor.org/stable/29747082
Page Count: 12
Topics: Modeling, Simulations, Statistical models, Markov models, Computational statistics, Datasets, Statistics, Chromosomes, Comparative genomic hybridization, Bookkeeping
Mixture models, or convex combinations of a countable number of probability distributions, offer an elegant framework for inference when the population of interest can be subdivided into latent clusters having random characteristics that are heterogeneous between, but homogeneous within, the clusters. Traditionally, the different kinds of mixture models have been motivated and analyzed from very different perspectives, and their common characteristics have not been fully appreciated. The inferential techniques developed for these models usually necessitate heavy computational burdens that make them difficult, if not impossible, to apply to the massive datasets increasingly encountered in real-world studies. This paper introduces a flexible class of models called generalized Pólya urn (GPU) processes. Many common mixture models, including finite mixtures, hidden Markov models, and Dirichlet processes, are obtained as special cases of GPU processes. Other important special cases include finite-dimensional Dirichlet priors, infinite hidden Markov models, analysis of densities models, nested Chinese restaurant processes, hierarchical DP models, nonparametric density models, spatial Dirichlet processes, weighted mixtures of DP priors, and nested Dirichlet processes. An investigation of the theoretical properties of GPU processes offers new insight into asymptotics that form the basis of cost-effective Markov chain Monte Carlo (MCMC) strategies for large datasets. These MCMC techniques have the advantage of providing inferences from the posterior of interest, rather than an approximation, and are applicable to different mixture models. The versatility and impressive gains of the methodology are demonstrated by simulation studies and by a semiparametric Bayesian analysis of high-resolution comparative genomic hybridization data on lung cancer. The appendixes are available online as supplemental material.
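To make the urn idea concrete, the following is a minimal sketch of the classical Pólya urn (Chinese restaurant) scheme that underlies the Dirichlet process, one of the special cases the abstract lists. It is not the generalized Pólya urn process introduced in the paper, whose definition is not given here. The function name `polya_urn_labels` and the parameters `alpha` (concentration) and `seed` are assumptions chosen for illustration.

```python
import random


def polya_urn_labels(n, alpha, seed=0):
    """Assign n items to latent clusters with the classical Polya urn scheme.

    Illustrative only: this is the textbook urn for a Dirichlet process,
    not the generalized Polya urn (GPU) process of the paper.
    alpha > 0 is the concentration parameter (hypothetical name).
    """
    rng = random.Random(seed)
    labels = []   # labels[i] = cluster index of item i
    counts = []   # counts[k] = number of items currently in cluster k
    for i in range(n):
        # Item i joins existing cluster k with probability counts[k] / (i + alpha)
        # and opens a new cluster with probability alpha / (i + alpha).
        u = rng.random() * (i + alpha)
        acc = 0.0
        for k, c in enumerate(counts):
            acc += c
            if u < acc:
                break
        else:
            # No existing cluster was selected: open a new one.
            k = len(counts)
            counts.append(0)
        counts[k] += 1
        labels.append(k)
    return labels, counts


if __name__ == "__main__":
    labels, counts = polya_urn_labels(n=1000, alpha=1.0)
    print("number of clusters:", len(counts))
    print("largest cluster sizes:", sorted(counts, reverse=True)[:5])
```

Each item triggers a scan over all existing clusters, which hints at why naive samplers built on such urn schemes can become costly as the number of observations grows.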
Journal of the American Statistical Association © 2010 American Statistical Association