Model Uncertainty, Data Mining and Statistical Inference

Chris Chatfield
Journal of the Royal Statistical Society. Series A (Statistics in Society)
Vol. 158, No. 3 (1995), pp. 419-466
Published by: Wiley for the Royal Statistical Society
DOI: 10.2307/2983440
Stable URL: http://www.jstor.org/stable/2983440
Page Count: 48

Abstract

This paper takes a broad, pragmatic view of statistical inference to include all aspects of model formulation. The estimation of model parameters traditionally assumes that a model has a prespecified known form and takes no account of possible uncertainty regarding the model structure. This implicitly assumes the existence of a 'true' model, which many would regard as a fiction. In practice model uncertainty is a fact of life and likely to be more serious than other sources of uncertainty which have received far more attention from statisticians. This is true whether the model is specified on subject-matter grounds or, as is increasingly the case, when a model is formulated, fitted and checked on the same data set in an iterative, interactive way. Modern computing power allows a large number of models to be considered and data-dependent specification searches have become the norm in many areas of statistics. The term data mining may be used in this context when the analyst goes to great lengths to obtain a good fit. This paper reviews the effects of model uncertainty, such as too narrow prediction intervals, and the non-trivial biases in parameter estimates which can follow data-based modelling. Ways of assessing and overcoming the effects of model uncertainty are discussed, including the use of simulation and resampling methods, a Bayesian model averaging approach and collecting additional data wherever possible. Perhaps the main aim of the paper is to ensure that statisticians are aware of the problems and start addressing the issues even if there is no simple, general theoretical fix.
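
The bias in parameter estimates that can follow data-based modelling can be illustrated with a small simulation. The sketch below is not taken from the paper: the setup (a simple linear regression with a weak true slope, retained only when its t-statistic exceeds a cutoff) and all names (`slope_and_t`, `simulate`, `true_beta`, `t_cut`) are assumptions chosen for illustration. It compares the average slope estimate over all simulated data sets with the average over only those data sets where the slope was "selected" on the same data.

```python
import random
import statistics


def slope_and_t(x, y):
    """Least-squares slope of y on x (with intercept) and its t-statistic."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se = (rss / (n - 2) / sxx) ** 0.5  # standard error of the slope
    return b, b / se


def simulate(true_beta=0.2, n=30, reps=2000, t_cut=2.0, seed=0):
    """Return (mean of all estimates, mean of selected estimates, fraction selected).

    Hypothetical setup: y = true_beta * x + standard normal noise.
    A slope is 'selected' when |t| > t_cut on the same data used to fit it.
    """
    rng = random.Random(seed)
    all_est, kept_est = [], []
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [true_beta * xi + rng.gauss(0, 1) for xi in x]
        b, t = slope_and_t(x, y)
        all_est.append(b)
        if abs(t) > t_cut:
            kept_est.append(b)
    return (
        statistics.fmean(all_est),
        statistics.fmean(kept_est),
        len(kept_est) / reps,
    )


if __name__ == "__main__":
    mean_all, mean_kept, frac = simulate()
    print(f"true slope:                   0.2")
    print(f"mean estimate, all data sets: {mean_all:.3f}")
    print(f"mean estimate, selected only: {mean_kept:.3f}")
    print(f"fraction selected:            {frac:.3f}")
```

Under these assumptions the unconditional average should sit close to the true slope, while the average among selected fits is pulled away from it, because only data sets that happened to produce a large estimate pass the cutoff. Reporting the selected estimate as if the model had been fixed in advance ignores that selection step.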
