Posterior Simulation in Countable Mixture Models for Large Datasets

Subharup Guha
Journal of the American Statistical Association
Vol. 105, No. 490 (June 2010), pp. 775-786
Stable URL: http://www.jstor.org/stable/29747082
Page Count: 12

Abstract

Mixture models, or convex combinations of a countable number of probability distributions, offer an elegant framework for inference when the population of interest can be subdivided into latent clusters having random characteristics that are heterogeneous between, but homogeneous within, the clusters. Traditionally, the different kinds of mixture models have been motivated and analyzed from very different perspectives, and their common characteristics have not been fully appreciated. The inferential techniques developed for these models usually necessitate heavy computational burdens that make them difficult, if not impossible, to apply to the massive data sets increasingly encountered in real world studies. This paper introduces a flexible class of models called generalized Pólya urn (GPU) processes. Many common mixture models, including finite mixtures, hidden Markov models, and Dirichlet processes, are obtained as special cases of GPU processes. Other important special cases include finite-dimensional Dirichlet priors, infinite hidden Markov models, analysis of densities models, nested Chinese restaurant processes, hierarchical DP models, nonparametric density models, spatial Dirichlet processes, weighted mixtures of DP priors, and nested Dirichlet processes. An investigation of the theoretical properties of GPU processes offers new insight into asymptotics that form the basis of cost-effective Markov chain Monte Carlo (MCMC) strategies for large datasets. These MCMC techniques have the advantage of providing inferences from the posterior of interest, rather than an approximation, and are applicable to different mixture models. The versatility and impressive gains of the methodology are demonstrated by simulation studies and by a semiparametric Bayesian analysis of high-resolution comparative genomic hybridization data on lung cancer. The appendixes are available online as supplemental material.
