Bayesian Variable Selection in Linear Regression

T. J. Mitchell and J. J. Beauchamp
Journal of the American Statistical Association
Vol. 83, No. 404 (Dec., 1988), pp. 1023-1032
DOI: 10.2307/2290129
Stable URL: http://www.jstor.org/stable/2290129
Page Count: 10

Abstract

This article is concerned with the selection of subsets of predictor variables in a linear regression model for the prediction of a dependent variable. It is based on a Bayesian approach, intended to be as objective as possible. A probability distribution is first assigned to the dependent variable through the specification of a family of prior distributions for the unknown parameters in the regression model. The method is not fully Bayesian, however, because the ultimate choice of prior distribution from this family is affected by the data. It is assumed that the predictors represent distinct observables; the corresponding regression coefficients are assigned independent prior distributions. For each regression coefficient subject to deletion from the model, the prior distribution is a mixture of a point mass at 0 and a diffuse uniform distribution elsewhere, that is, a "spike and slab" distribution. The random error component is assigned a normal distribution with mean 0 and standard deviation σ, where ln(σ) has a locally uniform noninformative prior distribution. The appropriate posterior probabilities are derived for each submodel. If the regression coefficients have identical priors, the posterior distribution depends only on the data and the parameter γ, which is the height of the spike divided by the height of the slab for the common prior distribution. This parameter is not assigned a probability distribution; instead, it is considered a parameter that indexes the members of a class of Bayesian methods. Graphical methods are proposed as informal guides for choosing γ, assessing the complexity of the response function and the strength of the individual predictor variables, and assessing the degree of uncertainty about the best submodel. 
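To make the role of γ concrete, here is a minimal sketch of a spike-and-slab prior for one coefficient. It rests on two assumptions that the abstract does not spell out. First, the spike is a point mass h0 at 0. Second, the slab is uniform on a hypothetical range [-f, f] with height h1 = (1 - h0)/(2f). With γ = h0/h1, it follows that h0 = γ/(γ + 2f). The function names `exclusion_prob` and `sample_spike_slab`, and the parameter `f`, are illustrative only. This is not the authors' code.

```python
import random


def exclusion_prob(gamma: float, f: float) -> float:
    """Prior probability that the coefficient is exactly 0 (the spike).

    Assumes (hypothetically) a point mass h0 at 0 plus a uniform slab on
    [-f, f] of height h1 = (1 - h0) / (2 f). Solving gamma = h0 / h1 for h0
    gives h0 = gamma / (gamma + 2 f).
    """
    if gamma < 0 or f <= 0:
        raise ValueError("gamma must be >= 0 and f must be > 0")
    return gamma / (gamma + 2.0 * f)


def sample_spike_slab(gamma: float, f: float, rng: random.Random) -> float:
    """Draw one coefficient from the assumed spike-and-slab prior."""
    h0 = exclusion_prob(gamma, f)
    if rng.random() < h0:
        return 0.0                  # spike: the term is deleted from the model
    return rng.uniform(-f, f)       # slab: diffuse uniform on [-f, f]


if __name__ == "__main__":
    rng = random.Random(0)
    gamma, f = 2.0, 1.0             # hypothetical values
    print(exclusion_prob(gamma, f))  # 0.5, since 2 / (2 + 2 * 1)
    draws = [sample_spike_slab(gamma, f, rng) for _ in range(10_000)]
    # Empirical fraction of exact zeros; should be close to 0.5.
    print(sum(d == 0.0 for d in draws) / len(draws))
```

Under these assumptions, γ → 0 makes deletion a priori impossible and γ → ∞ makes it certain. This is why a single γ can index a whole family of selection rules. Because f is measured in the units of the coefficient, rescaling a predictor changes the exclusion probability implied by a fixed γ. That is consistent with the abstract's point that results depend on how the variables are scaled.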
The following plots against γ are suggested: (a) posterior probability that a particular regression coefficient is 0; (b) posterior expected number of terms in the model; (c) posterior entropy of the submodel distribution; (d) posterior predictive error; and (e) posterior probability of goodness of fit. Plots (d) and (e) are suggested as ways to choose γ. The predictive error is determined using a Bayesian cross-validation approach that generates a predictive density for each observation, given all of the data except that observation, that is, a type of "leave one out" approach. The goodness-of-fit measure is the sum of the posterior probabilities of all submodels that pass a standard F test for goodness of fit relative to the full model, at a specified level of significance. The dependence of the results on the scaling of the variables is discussed, and some ways to choose the scaling constants are suggested. Examples based on a large data set arising from an energy-conservation study are given to demonstrate the application of the methods.
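Plots (a), (b), (c) and (e) are all simple functionals of a posterior distribution over submodels. The sketch below assumes that distribution has already been computed for one value of γ. It is represented as a dict mapping each submodel, given as a frozenset of included predictor indices with the intercept always present, to its posterior probability. All numbers below (posterior probabilities, residual sums of squares, n = 20) are hypothetical. The F critical values are approximate 5% table values for 16 denominator degrees of freedom; exact values can be obtained with, for example, `scipy.stats.f.ppf(0.95, dfn, 16)`. The leave-one-out predictive error (d) requires the paper's predictive density and is not reproduced here.

```python
import math
from typing import Dict, FrozenSet

Submodel = FrozenSet[int]  # included predictor indices; intercept always included


def prob_coef_zero(posterior: Dict[Submodel, float], j: int) -> float:
    """Plot (a): posterior probability that coefficient j is 0."""
    return sum(p for m, p in posterior.items() if j not in m)


def expected_terms(posterior: Dict[Submodel, float]) -> float:
    """Plot (b): posterior expected number of (non-intercept) terms."""
    return sum(p * len(m) for m, p in posterior.items())


def entropy(posterior: Dict[Submodel, float]) -> float:
    """Plot (c): entropy (natural log) of the submodel distribution."""
    return -sum(p * math.log(p) for p in posterior.values() if p > 0)


def goodness_of_fit_prob(posterior: Dict[Submodel, float],
                         rss: Dict[Submodel, float],
                         n: int, n_candidates: int,
                         f_crit: Dict[int, float]) -> float:
    """Plot (e): total posterior probability of submodels that pass an F test
    for goodness of fit relative to the full model.

    f_crit[dfn] is the critical F value (at the chosen significance level)
    for dfn numerator and n - (n_candidates + 1) denominator df.
    """
    full = frozenset(range(n_candidates))
    p_full = n_candidates + 1                 # predictors plus intercept
    dfd = n - p_full
    s2_full = rss[full] / dfd                 # residual mean square, full model
    total = 0.0
    for m, p in posterior.items():
        dfn = p_full - (len(m) + 1)           # number of deleted coefficients
        if dfn == 0:
            total += p                        # the full model passes trivially
            continue
        f_stat = ((rss[m] - rss[full]) / dfn) / s2_full
        if f_stat <= f_crit[dfn]:             # not rejected: submodel fits
            total += p
    return total


# Hypothetical posterior over submodels of 3 candidate predictors
# (submodels not listed are taken to have probability 0).
posterior = {
    frozenset(): 0.05,
    frozenset({0}): 0.40,
    frozenset({0, 1}): 0.30,
    frozenset({0, 2}): 0.15,
    frozenset({0, 1, 2}): 0.10,
}
# Hypothetical residual sums of squares from least-squares fits, n = 20.
rss = {
    frozenset(): 50.0,
    frozenset({0}): 12.0,
    frozenset({0, 1}): 10.5,
    frozenset({0, 2}): 11.0,
    frozenset({0, 1, 2}): 10.0,
}
# Approximate 5% critical values of F(dfn, 16) from standard tables.
f_crit = {1: 4.49, 2: 3.63, 3: 3.24}

for j in range(3):
    print(f"P(beta_{j} = 0 | data) = {prob_coef_zero(posterior, j):.2f}")
print(f"E[number of terms]   = {expected_terms(posterior):.2f}")
print(f"entropy              = {entropy(posterior):.3f}")
print(f"P(goodness of fit)   = "
      f"{goodness_of_fit_prob(posterior, rss, 20, 3, f_crit):.2f}")
```

Repeating these calculations over a grid of γ values and plotting each quantity against γ gives curves of the kind the abstract describes. In this toy example only the intercept-only submodel fails the F test, so the goodness-of-fit probability is the posterior mass of the other four submodels.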
