Procedures for the Identification of Multiple Outliers in Linear Models

Ali S. Hadi and Jeffrey S. Simonoff
Journal of the American Statistical Association
Vol. 88, No. 424 (Dec., 1993), pp. 1264-1272
DOI: 10.2307/2291266
Stable URL: http://www.jstor.org/stable/2291266
Page Count: 9

Abstract

We consider the problem of identifying and testing multiple outliers in linear models. The available outlier identification methods often do not succeed in detecting multiple outliers because they are affected by the observations they are supposed to identify. We introduce two test procedures for the detection of multiple outliers that appear to be less sensitive to this problem. Both procedures attempt to separate the data into a set of "clean" data points and a set of points that contain the potential outliers. The potential outliers are then tested to see how extreme they are relative to the clean subset, using an appropriately scaled version of the prediction error. The procedures are illustrated and compared to various existing methods, using several data sets known to contain multiple outliers. Also, the performances of both procedures are investigated by a Monte Carlo study. The data sets and the Monte Carlo indicate that both procedures are effective in the detection of multiple outliers in linear models and are superior to other methods, including methods based on robust fits (e.g., least median of squares residuals). In particular, the methods do not require presetting numbers of outliers to test for, do not require the efficiency level of an estimator, do not require Monte Carlo to determine cutoff values, are not highly computationally intensive, and are relatively resistant to both masking and swamping effects.
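
As a rough illustration of the testing step described in the abstract, the following sketch fits least squares on a presumed clean subset and scales each remaining point's prediction error by its estimated standard deviation. It is a minimal sketch under stated assumptions, not the authors' procedure: the function name `scaled_prediction_errors` and the toy data are hypothetical, the clean subset is taken as given rather than constructed, and the scaling used is the standard out-of-sample prediction-error formula, which may differ from the exact statistic in the paper.

```python
import numpy as np

def scaled_prediction_errors(X, y, clean_idx):
    """Scaled prediction errors of the points outside `clean_idx`,
    relative to a least-squares fit on the clean subset only.
    (Hypothetical helper; the clean subset is assumed to be given.)"""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    clean = np.zeros(len(y), dtype=bool)
    clean[clean_idx] = True

    # Fit on the clean subset so suspect points cannot influence the estimates.
    Xc, yc = X[clean], y[clean]
    beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    resid = yc - Xc @ beta
    dof = Xc.shape[0] - Xc.shape[1]
    sigma = np.sqrt(resid @ resid / dof)
    XtX_inv = np.linalg.inv(Xc.T @ Xc)

    # Prediction error for each held-out point, scaled by
    # sigma * sqrt(1 + x' (Xc'Xc)^{-1} x).
    Xo, yo = X[~clean], y[~clean]
    leverage = np.einsum("ij,jk,ik->i", Xo, XtX_inv, Xo)
    d = (yo - Xo @ beta) / (sigma * np.sqrt(1.0 + leverage))
    return np.flatnonzero(~clean), d

# Toy data (made up for illustration): a straight line with small
# alternating noise, and the last two responses shifted upward by 20.
x = np.arange(10.0)
y = 2.0 + 3.0 * x + 0.1 * (-1.0) ** np.arange(10)
y[8:] += 20.0
X = np.column_stack([np.ones_like(x), x])

idx, d = scaled_prediction_errors(X, y, clean_idx=np.arange(8))
print(idx, d)
```

Because the fit uses only the clean subset, the shifted points cannot pull the fitted line toward themselves, which is the masking problem the abstract refers to. Large absolute values of `d` mark candidates for testing; the choice of cutoff is left out here because the abstract does not specify it.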
