A Comparative Study of Reliable Error Estimators for Pruning Regression Trees

Luís Torgo
1998


Abstract

This paper presents a comparative study of several methods for estimating the true error of tree-structured regression models. We evaluate these methods in the context of regression tree pruning. Pruning is considered a key issue for obtaining reliable tree-structured models in real-world scenarios. The major step of a pruning process consists of obtaining accurate estimates of the error of alternative tree models. We experimentally evaluate four methods for obtaining these estimates in twelve domains. The goal of this evaluation is to characterise the performance of the methods in the task of selecting the best possible tree among the set of trees considered during pruning. The results of the comparison show that certain estimators lead to poor decisions in some domains. Our study shows that the Cross Validation variant that we propose is the best choice for the set-ups we have considered.