Assessing the Performance of Prediction Models: A Framework for Traditional and Novel Measures

Ewout W. Steyerberg, Andrew J. Vickers, Nancy R. Cook, Thomas Gerds, Mithat Gonen, Nancy Obuchowski, Michael J. Pencina and Michael W. Kattan
Epidemiology
Vol. 21, No. 1 (January 2010), pp. 128-138
Stable URL: http://www.jstor.org/stable/25662818
Page Count: 11

Abstract

The performance of prediction models can be assessed using a variety of methods and metrics. Traditional measures for binary and survival outcomes include the Brier score to indicate overall model performance, the concordance (or c) statistic for discriminative ability (or area under the receiver operating characteristic [ROC] curve), and goodness-of-fit statistics for calibration. Several new measures have recently been proposed that can be seen as refinements of discrimination measures, including variants of the c statistic for survival, reclassification tables, net reclassification improvement (NRI), and integrated discrimination improvement (IDI). Moreover, decision-analytic measures have been proposed, including decision curves to plot the net benefit achieved by making decisions based on model predictions. We aimed to define the role of these relatively novel approaches in the evaluation of the performance of prediction models. For illustration, we present a case study of predicting the presence of residual tumor versus benign tissue in patients with testicular cancer (n = 544 for model development, n = 273 for external validation). We suggest that reporting discrimination and calibration will always be important for a prediction model. Decision-analytic measures should be reported if the predictive model is to be used for clinical decisions. Other measures of performance may be warranted in specific applications, such as reclassification metrics to gain insight into the value of adding a novel predictor to an established model.
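
As a rough illustration of three of the measures named in the abstract, the following minimal Python sketch computes the Brier score, the c statistic, and the net benefit at a single decision threshold for a binary outcome. It uses the standard textbook definitions, which may differ in detail from the article's own formulations. The function names and the eight-observation toy data set are assumptions made up for this example; they are not taken from the article or its testicular cancer case study.

```python
# Toy data, invented for illustration only (not from the article's case study).
y = [1, 1, 1, 0, 0, 0, 1, 0]                   # observed outcomes (1 = event)
p = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]   # predicted probabilities


def brier_score(y, p):
    """Overall performance: mean squared difference between prediction and outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def c_statistic(y, p):
    """Discrimination: share of event/non-event pairs in which the event
    received the higher prediction. Ties count as one half."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    non_events = [pi for yi, pi in zip(y, p) if yi == 0]
    score = 0.0
    for e in events:
        for ne in non_events:
            if e > ne:
                score += 1.0
            elif e == ne:
                score += 0.5
    return score / (len(events) * len(non_events))


def net_benefit(y, p, threshold):
    """Decision-analytic measure at one threshold probability:
    TP/n - FP/n * threshold / (1 - threshold),
    where a prediction at or above the threshold counts as a positive call."""
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


print("Brier score:", round(brier_score(y, p), 4))
print("c statistic:", round(c_statistic(y, p), 4))
print("Net benefit at threshold 0.5:", round(net_benefit(y, p, 0.5), 4))
```

Evaluating net_benefit over a grid of thresholds and plotting the results against the threshold would give a simple decision curve. Calibration and the reclassification measures (NRI, IDI) are not covered by this sketch.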
