Biased and Unbiased Cross-Validation in Density Estimation

David W. Scott and George R. Terrell
Journal of the American Statistical Association
Vol. 82, No. 400 (Dec., 1987), pp. 1131-1146
DOI: 10.2307/2289391
Stable URL: http://www.jstor.org/stable/2289391
Page Count: 16

Abstract

Nonparametric density estimation requires the specification of smoothing parameters. The demands of statistical objectivity make it highly desirable to base the choice on properties of the data set. In this article we introduce some biased cross-validation criteria for selection of smoothing parameters for kernel and histogram density estimators, closely related to one investigated in Scott and Factor (1981). These criteria are obtained by estimating L2 norms of derivatives of the unknown density and provide slightly biased estimates of the average squared L2 error or mean integrated squared error. These criteria are roughly the analog of Wahba's (1981) generalized cross-validation procedure for orthogonal series density estimators. We present the relationship of the biased cross-validation procedure to the least squares cross-validation procedure, which provides unbiased estimates of the mean integrated squared error. Both methods are shown to be based on U statistics. We compare the two methods by theoretical calculation of the noise in the cross-validation functions and corresponding cross-validated smoothing parameters, by Monte Carlo simulation, and by example. Surprisingly large gains in asymptotic efficiency are observed when biased cross-validation is compared with unbiased cross-validation if the underlying density is sufficiently smooth. The theoretical results explain some of the small sample behavior of cross-validation functions: we show that cross-validation algorithms can be unreliable for sample sizes that are "too small." To aid the practitioner in the use of these appealing automatic cross-validation algorithms and to help facilitate evaluation of future algorithms, we must address some oftentimes controversial issues in density estimation: squared loss, the integrated squared error and mean integrated squared error criteria, adaptive density estimates, sample size requirements, and assumptions about the underlying density's smoothness. We conclude that the two cross-validation procedures behave quite differently, so one might well use both in practice.
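
To make the two criteria concrete, the following is a minimal sketch of how unbiased (least squares) and biased cross-validation scores might be computed for a one-dimensional kernel density estimate. The abstract gives no formulas, so everything here is an assumption for illustration: a Gaussian kernel, the closed-form pairwise sums commonly stated for that kernel, a simple grid search, and the hypothetical names `ucv`, `bcv`, and `select_bandwidth`. Both scores are sums over pairs of observations, which reflects the U-statistic structure mentioned above.

```python
import math
import random

SQRT_PI = math.sqrt(math.pi)


def ucv(data, h):
    """Least squares (unbiased) cross-validation score, Gaussian kernel assumed.

    Evaluates integral(f_hat^2) - (2/n) * sum_i f_hat_{-i}(x_i) in closed form.
    """
    n = len(data)
    conv_sum = 0.0  # pairs i < j of exp(-d^2/4): kernel convolved with itself
    loo_sum = 0.0   # pairs i < j of exp(-d^2/2): leave-one-out estimates
    for i in range(n):
        for j in range(i + 1, n):
            d2 = ((data[i] - data[j]) / h) ** 2
            conv_sum += math.exp(-d2 / 4.0)
            loo_sum += math.exp(-d2 / 2.0)
    integral_f2 = 1.0 / (2.0 * n * h * SQRT_PI) + conv_sum / (n * n * h * SQRT_PI)
    loo_term = 4.0 * loo_sum / (n * (n - 1) * h * math.sqrt(2.0 * math.pi))
    return integral_f2 - loo_term


def bcv(data, h):
    """Biased cross-validation score, Gaussian kernel assumed.

    Plugs an estimate of the squared L2 norm of the density's second
    derivative into the asymptotic mean integrated squared error.
    """
    n = len(data)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d2 = ((data[i] - data[j]) / h) ** 2
            total += (d2 * d2 - 12.0 * d2 + 12.0) * math.exp(-d2 / 4.0)
    return 1.0 / (2.0 * n * h * SQRT_PI) + total / (64.0 * n * n * h * SQRT_PI)


def select_bandwidth(criterion, data, grid):
    """Return the grid value of h that minimizes the given criterion."""
    return min(grid, key=lambda h: criterion(data, h))


# Illustrative usage on simulated standard normal data (not from the article).
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(100)]
grid = [0.05 * k for k in range(2, 41)]  # candidate bandwidths 0.10 to 2.00
print("UCV bandwidth:", select_bandwidth(ucv, sample, grid))
print("BCV bandwidth:", select_bandwidth(bcv, sample, grid))
```

A grid search is used only for simplicity. Because either score can have several local minima, plotting the score against the bandwidth before trusting the minimizer is a reasonable precaution, in keeping with the abstract's suggestion that both procedures be used in practice.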
