Regression by Classification
Luís Torgo and João Gama
1996
Abstract
We present a methodology that enables the use of existing classification
learning systems on regression problems. We achieve this by transforming
regression problems into classification problems: the range of values of the
continuous goal variable is divided into a set of intervals that are used as
discrete classes. We provide several methods for
discretizing the goal variable values. These methods are based on the idea of
performing an iterative search for the set of final discrete classes. The
search algorithm is guided by an N-fold cross-validation estimate of the
prediction error that results from using a given set of discrete classes. We
carried out an extensive empirical evaluation of our discretization methods
using C4.5 and CN2 on four real-world domains. The results of these experiments show the
quality of our discretization methods compared to other existing methods.
Our method is independent of the classification system used and is easily
applicable to other inductive algorithms. This generality makes our method a
powerful tool that extends the applicability of a wide range of existing
classification systems.
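
The pipeline described above (discretize the goal variable, train a classifier on the resulting classes, and choose the set of classes by cross-validated prediction error) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the abstract: equal-frequency intervals as the discretization, the class median as the numeric prediction, mean absolute error as the error measure, a 1-nearest-neighbour stand-in for C4.5 or CN2, a fixed list of candidate class counts in place of the iterative search, and all function names.

```python
from statistics import median

def equal_frequency_cuts(y, k):
    """Return k-1 cut points splitting the sorted targets into k similar-sized groups."""
    ys = sorted(y)
    n = len(ys)
    return [ys[(i * n) // k] for i in range(1, k)]

def to_class(value, cuts):
    """Index of the interval that contains value (number of cut points <= value)."""
    return sum(value >= c for c in cuts)

def class_medians(y, cuts):
    """Median training target per class, used to map a predicted class to a number."""
    groups = {}
    for v in y:
        groups.setdefault(to_class(v, cuts), []).append(v)
    return {c: median(vs) for c, vs in groups.items()}

def nn_classify(x_train, labels, x):
    """Stand-in classifier (assumption): 1-nearest-neighbour on one numeric feature."""
    i = min(range(len(x_train)), key=lambda j: abs(x_train[j] - x))
    return labels[i]

def cv_error(x, y, k, folds=5):
    """N-fold cross-validation estimate of mean absolute error for k classes."""
    total = 0.0
    for f in range(folds):
        test = [i for i in range(len(x)) if i % folds == f]
        train = [i for i in range(len(x)) if i % folds != f]
        xt = [x[i] for i in train]
        yt = [y[i] for i in train]
        # Discretize using the training targets only, then learn on the classes.
        cuts = equal_frequency_cuts(yt, k)
        labels = [to_class(v, cuts) for v in yt]
        med = class_medians(yt, cuts)
        for i in test:
            pred = med[nn_classify(xt, labels, x[i])]
            total += abs(pred - y[i])
    return total / len(x)

def search_num_classes(x, y, candidates=(2, 3, 4, 6, 8)):
    """Pick the candidate number of classes with the lowest estimated error."""
    errors = {k: cv_error(x, y, k) for k in candidates}
    best = min(errors, key=errors.get)
    return best, errors

if __name__ == "__main__":
    xs = list(range(40))
    ys = [2.0 * v for v in xs]
    best_k, errs = search_num_classes(xs, ys)
    print(best_k, errs)
```

Because the classifier is only called through `nn_classify`, any other classification system could be substituted at that point without changing the discretization or the search, which is the kind of independence the abstract claims.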