Unbiased Recursive Partitioning: A Conditional Inference Framework

Torsten Hothorn, Kurt Hornik and Achim Zeileis
Journal of Computational and Graphical Statistics
Vol. 15, No. 3 (Sep., 2006), pp. 651-674
Stable URL: http://www.jstor.org/stable/27594202
Page Count: 24

Abstract

Recursive binary partitioning is a popular tool for regression analysis. Two fundamental problems of exhaustive search procedures usually applied to fit such models have been known for a long time: overfitting and a selection bias towards covariates with many possible splits or missing values. While pruning procedures are able to solve the overfitting problem, the variable selection bias still seriously affects the interpretability of tree-structured regression models. For some special cases unbiased procedures have been suggested, however lacking a common theoretical foundation. We propose a unified framework for recursive partitioning which embeds tree-structured regression models into a well defined theory of conditional inference procedures. Stopping criteria based on multiple test procedures are implemented and it is shown that the predictive performance of the resulting trees is as good as the performance of established exhaustive search procedures. It turns out that the partitions and therefore the models induced by both approaches are structurally different, confirming the need for an unbiased variable selection. Moreover, it is shown that the prediction accuracy of trees with early stopping is equivalent to the prediction accuracy of pruned trees with unbiased variable selection. The methodology presented here is applicable to all kinds of regression problems, including nominal, ordinal, numeric, censored as well as multivariate response variables and arbitrary measurement scales of the covariates. Data from studies on glaucoma classification, node positive breast cancer survival and mammography experience are re-analyzed.
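
The separation of variable selection from split search, with a test-based stopping rule, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the absolute-covariance test statistic, the Monte Carlo permutation test, and the Bonferroni adjustment are all assumptions chosen for simplicity; the abstract states only that stopping criteria are based on multiple test procedures within a conditional inference framework.

```python
import random


def abs_cov(x, y):
    """Hypothetical test statistic: |sum((x - mean_x) * (y - mean_y))|."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return abs(sum((a - mx) * (b - my) for a, b in zip(x, y)))


def permutation_p_value(x, y, n_perm, rng):
    """Monte Carlo permutation p-value for association between x and y."""
    observed = abs_cov(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # permute the response, keep the covariate fixed
        if abs_cov(x, y_perm) >= observed:
            hits += 1
    # The +1 terms keep the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


def select_split_variable(covariates, y, alpha=0.05, n_perm=999, seed=0):
    """Return (name, adjusted_p_values); name is None when splitting should stop.

    Each covariate gets a single test regardless of how many split points it
    offers, so covariates with many possible splits gain no advantage.
    """
    rng = random.Random(seed)
    m = len(covariates)
    adjusted = {
        name: min(1.0, m * permutation_p_value(x, y, n_perm, rng))  # Bonferroni (assumed)
        for name, x in covariates.items()
    }
    best = min(adjusted, key=adjusted.get)
    if adjusted[best] > alpha:
        return None, adjusted  # no significant association: stop early
    return best, adjusted


# Usage with made-up data: one covariate drives the response, one does not.
signal = [float(v) for v in range(30)]
noise = [float((v * 7) % 5) for v in range(30)]
response = [2.0 * v + 0.5 * (-1) ** v for v in range(30)]

chosen, p_values = select_split_variable({"signal": signal, "noise": noise}, response)
stopped, stop_p = select_split_variable({"const": [1.0] * 30}, response)
```

In this sketch the split point within the chosen covariate would be searched only after the variable has been selected, and a node becomes terminal as soon as no adjusted p-value falls below `alpha`, which stands in for pruning.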
