Error Estimates for Pruning Regression Trees

Luís Torgo
1998


Abstract

This paper presents a comparative study of several methods for estimating the true error of tree-structured regression models. We evaluate these methods in the context of regression tree pruning, focusing on problems where large samples of data are available. We present two novel variants of existing estimation methods. The methods we compare follow different approaches to the estimation problem, and we evaluate them experimentally in twelve domains. The goal of this evaluation is to characterise how well each method selects the best possible tree among the alternative trees considered during pruning. The results show that certain estimators lead to very poor decisions in some domains. Our proposed variant of the holdout method obtained the best results in the experimental comparisons.
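
The selection task described in the abstract can be illustrated with a minimal sketch. It assumes plain holdout estimation with mean squared error, not the holdout variant proposed in this paper, and it uses hand-written piecewise-constant functions as hypothetical stand-ins for the trees considered during pruning. All names (`holdout_mse`, `select_by_holdout`, `four_leaves`, `two_leaves`, `root_only`) and the data are illustrative assumptions.

```python
from typing import Callable, List, Sequence, Tuple

# A "tree" is modelled here simply as a function from one input to a prediction.
Model = Callable[[float], float]


def holdout_mse(model: Model, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Mean squared error of `model` on held-out cases not used for growing."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


def select_by_holdout(
    candidates: Sequence[Model], xs: Sequence[float], ys: Sequence[float]
) -> Tuple[int, List[float]]:
    """Return the index of the candidate with the lowest holdout error estimate."""
    errors = [holdout_mse(m, xs, ys) for m in candidates]
    best = min(range(len(candidates)), key=errors.__getitem__)
    return best, errors


# Hypothetical candidates, ordered from the largest tree to the root-only tree.
def four_leaves(x: float) -> float:
    if x < 1:
        return 0.0
    if x < 3:
        return 2.0
    return 4.0


def two_leaves(x: float) -> float:
    return 1.0 if x < 2 else 3.0


def root_only(x: float) -> float:
    return 2.0


# Illustrative holdout sample (made-up values, not data from the paper).
holdout_x = [0.5, 1.5, 2.5, 3.5]
holdout_y = [1.0, 1.0, 3.0, 3.0]

best_index, errors = select_by_holdout(
    [four_leaves, two_leaves, root_only], holdout_x, holdout_y
)
print(best_index, errors)
```

In this toy setting the intermediate tree receives the lowest estimate, so it is the one selected. A poor estimator would rank the candidates differently and could pick an over-grown or over-pruned tree instead, which is the kind of bad decision the comparison is designed to expose.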