Regression by Classification

Luís Torgo and João Gama
1996


Abstract

We present a methodology that enables existing classification inductive learning systems to be used on regression problems. We achieve this by transforming regression problems into classification problems: the range of values of the continuous goal variable is divided into a set of intervals, which are then used as discrete classes. We provide several methods for discretizing the goal variable values. These methods are based on an iterative search for the final set of discrete classes. The search is guided by an N-fold cross-validation estimate of the prediction error that results from using a given set of discrete classes. We carried out an extensive empirical evaluation of our discretization methodologies using C4.5 and CN2 on four real-world domains. The results of these experiments demonstrate the quality of our discretization methods relative to other existing methods.

Our method is independent of the classification system used and is easily applied to other inductive algorithms. This generality makes it a powerful tool that extends the applicability of a wide range of existing classification systems.
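
To make the idea concrete, the following Python sketch illustrates the general pipeline summarized above: discretize the goal variable into intervals, train a classifier on the interval labels, decode each predicted class back to a number, and choose the number of classes by cross-validated error. It is only an illustration under stated assumptions, not the implementation evaluated here. In particular, equal-frequency intervals, the interval median as the decoded value, mean absolute error as the error measure, an exhaustive scan over candidate class counts (in place of an iterative search), and all function names, including the 1-nearest-neighbour stand-in that takes the place of C4.5 or CN2, are assumptions made for the example.

```python
from statistics import median


def equal_frequency_intervals(values, n_classes):
    """Split the target values into up to n_classes groups of roughly equal size.

    Returns one (lower, upper, median) tuple per non-empty group.
    Assumption: equal-frequency splitting is just one possible discretization.
    """
    ordered = sorted(values)
    n = len(ordered)
    intervals = []
    for k in range(n_classes):
        group = ordered[k * n // n_classes:(k + 1) * n // n_classes]
        if group:
            intervals.append((group[0], group[-1], median(group)))
    return intervals


def to_class(y, intervals):
    """Map a continuous value to the index of the first interval whose upper bound covers it."""
    for idx, (_, upper, _) in enumerate(intervals):
        if y <= upper:
            return idx
    return len(intervals) - 1  # values above every bound fall in the last class


def to_value(class_idx, intervals):
    """Decode a predicted class back to a number (assumption: the interval median)."""
    return intervals[class_idx][2]


def cv_error(X, y, n_classes, fit, n_folds=5):
    """Estimate the mean absolute error of discretize -> classify -> decode by n-fold CV."""
    total = 0.0
    for fold in range(n_folds):
        train = [i for i in range(len(y)) if i % n_folds != fold]
        test = [i for i in range(len(y)) if i % n_folds == fold]
        # Intervals are built from the training targets only.
        intervals = equal_frequency_intervals([y[i] for i in train], n_classes)
        labels = [to_class(y[i], intervals) for i in train]
        predict = fit([X[i] for i in train], labels)
        for i in test:
            total += abs(to_value(predict(X[i]), intervals) - y[i])
    return total / len(y)


def search_n_classes(X, y, fit, candidates=range(2, 11), n_folds=5):
    """Return the candidate number of classes with the lowest cross-validated error."""
    return min(candidates, key=lambda k: cv_error(X, y, k, fit, n_folds))


def fit_1nn(X_train, labels):
    """Hypothetical stand-in classifier: 1-nearest neighbour on one numeric feature."""
    def predict(x):
        nearest = min(range(len(X_train)), key=lambda i: abs(X_train[i] - x))
        return labels[nearest]
    return predict


if __name__ == "__main__":
    X = list(range(20))
    y = [2.0 * x for x in X]
    best = search_n_classes(X, y, fit_1nn)
    print("chosen number of classes:", best)
    print("estimated error:", cv_error(X, y, best, fit_1nn))
```

Because the classifier enters only through the `fit` argument, any learner that maps training examples and class labels to a prediction function could be substituted, which mirrors the independence from the underlying classification system noted above.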