A Correlated Topic Model of Science

David M. Blei and John D. Lafferty
The Annals of Applied Statistics
Vol. 1, No. 1 (Jun., 2007), pp. 17-35
Stable URL: http://www.jstor.org/stable/4537420
Page Count: 19

Abstract

Topic models, such as latent Dirichlet allocation (LDA), can be useful tools for the statistical analysis of document collections and other discrete data. The LDA model assumes that the words of each document arise from a mixture of topics, each of which is a distribution over the vocabulary. A limitation of LDA is the inability to model topic correlation even though, for example, a document about genetics is more likely to also be about disease than X-ray astronomy. This limitation stems from the use of the Dirichlet distribution to model the variability among the topic proportions. In this paper we develop the correlated topic model (CTM), where the topic proportions exhibit correlation via the logistic normal distribution [J. Roy. Statist. Soc. Ser. B 44 (1982) 139-177]. We derive a fast variational inference algorithm for approximate posterior inference in this model, which is complicated by the fact that the logistic normal is not conjugate to the multinomial. We apply the CTM to the articles from Science published from 1990-1999, a data set that comprises 57M words. The CTM gives a better fit of the data than LDA, and we demonstrate its use as an exploratory tool of large document collections.
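The logistic normal construction mentioned in the abstract can be illustrated with a minimal sketch: draw a Gaussian vector, then map it onto the simplex with a softmax. This is only the generative step for the topic proportions, not the paper's variational inference algorithm. The three topic labels, the mean `MU`, the covariance `SIGMA`, and all function names are hypothetical choices made for illustration; none of the numbers come from the paper.

```python
import math
import random

# Hypothetical setup: three topics (genetics, disease, X-ray astronomy).
# SIGMA gives genetics and disease a positive covariance; values are illustrative.
MU = [0.0, 0.0, 0.0]
SIGMA = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]


def cholesky(sigma):
    """Return lower-triangular L with L L^T = sigma (sigma must be positive definite)."""
    k = len(sigma)
    low = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                low[i][j] = (sigma[i][j] - s) / low[j][j]
    return low


def sample_topic_proportions(mu, sigma, rng):
    """Draw eta ~ N(mu, sigma), then map eta to the simplex with a softmax."""
    low = cholesky(sigma)
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    eta = [mu[i] + sum(low[i][j] * z[j] for j in range(i + 1))
           for i in range(len(mu))]
    peak = max(eta)  # subtract the max before exponentiating, for stability
    weights = [math.exp(e - peak) for e in eta]
    total = sum(weights)
    return [w / total for w in weights]


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_topic_proportions(MU, SIGMA, rng) for _ in range(5000)]
    genetics, disease, astronomy = zip(*draws)
    print("corr(genetics, disease):  ", pearson(genetics, disease))
    print("corr(genetics, astronomy):", pearson(genetics, astronomy))
```

Under these assumed parameters, the genetics and disease proportions should co-vary more strongly than genetics and astronomy, because the covariance matrix couples them directly. A Dirichlet has no comparable parameter: its components are always negatively correlated, which is the limitation of LDA that the abstract describes.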
