The Effect of Two-Stage Sampling on the F Statistic

C. F. J. Wu, D. Holt and D. J. Holmes
Journal of the American Statistical Association
Vol. 83, No. 401 (Mar., 1988), pp. 150-159
DOI: 10.2307/2288934
Stable URL: http://www.jstor.org/stable/2288934
Page Count: 10

Abstract

The assumption of iid observations that underlies many statistical procedures is called into question when analyzing complex survey data. The population structure--particularly the existence of clusters in two-stage samples that usually exhibit positive intracluster correlation--invalidates the independence assumption. Kish and Frankel (1974) investigated the impact of this fact on regression analysis by using the standard sample-survey-theory framework; Campbell (1977) and Scott and Holt (1982) used the linear model framework. In general, although ordinary least squares (OLS) procedures are unbiased but not fully efficient for estimation of the regression coefficients, serious difficulties can arise when using OLS estimators for second-order terms. Variances of the OLS estimators for the regression coefficients can be larger (sometimes much larger) than the usual OLS variance expression would indicate. Failure to consider this possibility leads to underestimation of variances, with consequences for confidence intervals. This article follows this effect through to the F statistic, because of its importance to hypothesis tests and confidence ellipsoids. Our major aim is to investigate the effect of intracluster correlation on the F statistic. We propose a diagnostic measure identifying when the ordinary F statistic is likely to be affected and give decomposition in terms of the contributions of the individual regressors and their cross-products, based on a similar decomposition for the projection matrix in Appendix A. We establish numerically and theoretically the effectiveness of this measure in understanding the degree of distortion of F by intracluster correlation. The measure leads to a correction for the F test for unknown intracluster correlation. This is a slightly simpler numerical procedure than the generalized least squares (GLS), since it does not require iteration. The correction is shown to perform at least as well as the GLS in a simulation study.
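The distortion the abstract describes can be illustrated with a small Monte Carlo sketch. This is not the authors' diagnostic measure or their corrected F test, neither of which is specified in the abstract; it only shows the ordinary F test rejecting a true null hypothesis too often when observations within a cluster are positively correlated. Everything in it is an assumption made for illustration: the function names, the 20 clusters of 10 units, a single cluster-level regressor, a random-intercept error model with intracluster correlation `rho`, and the approximate 5% critical value of 3.89 for F(1, 198).

```python
import random


def ols_f_statistic(x, y):
    """Ordinary F statistic for H0: slope = 0 in a simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    ss_reg = sxy ** 2 / sxx          # regression sum of squares (1 df)
    ss_res = syy - ss_reg            # residual sum of squares (n - 2 df)
    return ss_reg / (ss_res / (n - 2))


def rejection_rate(rho, n_clusters=20, cluster_size=10, n_reps=1000,
                   f_crit=3.89, seed=0):
    """Share of replications in which the ordinary F test rejects a true null.

    Illustrative assumptions: y has no dependence on x (true slope is 0),
    the error is a cluster effect with variance rho plus a unit-level term
    with variance 1 - rho, and f_crit approximates the 5% point of F(1, 198).
    """
    rng = random.Random(seed)
    # Cluster-level regressor: every unit in a cluster shares the same x.
    cluster_x = [rng.gauss(0.0, 1.0) for _ in range(n_clusters)]
    x = [cx for cx in cluster_x for _ in range(cluster_size)]
    rejections = 0
    for _ in range(n_reps):
        y = []
        for _ in range(n_clusters):
            u = rng.gauss(0.0, rho ** 0.5)  # shared within-cluster effect
            y.extend(u + rng.gauss(0.0, (1.0 - rho) ** 0.5)
                     for _ in range(cluster_size))
        if ols_f_statistic(x, y) > f_crit:
            rejections += 1
    return rejections / n_reps


if __name__ == "__main__":
    for rho in (0.0, 0.3):
        print(f"rho = {rho}: rejection rate = {rejection_rate(rho):.3f}")
```

With `rho = 0` the rejection rate should sit near the nominal 5% level; with a positive `rho` it should rise well above it, which is the underestimation of variances carried through to the F statistic. A cluster-level regressor is used because it makes the effect easy to see in a small simulation.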
