
1. Introduction

Machine learning (ML) researchers have traditionally concentrated their efforts on classification problems. However, many interesting real-world domains demand regression tools. In this paper we present and evaluate a discretization methodology that extends the applicability of existing classification systems to regression domains. With this reformulation of regression we broaden the range of ML systems that can deal with these domains.

The idea of mapping regression into classification was originally used by Weiss & Indurkhya [19, 20] in their rule-based regression system. They used the P-class algorithm[1] for class discretization as part of their learning system. Their work clearly showed that it is possible to obtain excellent predictive results by transforming regression problems into classification problems and then applying a classification learning system. Our work builds on these results. Unlike Weiss & Indurkhya, we have focused our research on the discretization phase, and we do not supply a complete regression learning system as those authors did. We concentrated on two major goals related to the problem of class discretization: first, to provide alternative discretization methods; second, to enable the use of these methodologies with other classification systems. As to the first goal, extensive empirical evaluation on four real-world domains showed that two of our proposed discretization methodologies outperformed the method used in the cited work. These experiments also revealed that the best methodology depends on both the regression domain and the classification system used, which provides strong evidence in favor of our search-based discretization method. With respect to the second goal, we have used our methodologies with CN2 [2] and C4.5 [15]. Our discretization system is easily interfaced to any other classification algorithm[2].
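To make the general idea of class discretization concrete, the following minimal Python sketch maps a continuous target variable onto class labels and back. It is an illustration only: it is neither the P-class algorithm nor any of the methodologies proposed in this paper. The equal-frequency splitting strategy, the use of the interval median as the numeric value predicted for a class, and all function names are assumptions made for the example.

```python
from bisect import bisect_right
from statistics import median


def equal_frequency_cuts(targets, n_classes):
    """Return n_classes - 1 cut points that split the sorted targets into
    groups of roughly equal size (an assumed, illustrative strategy)."""
    ordered = sorted(targets)
    n = len(ordered)
    if n_classes < 2 or n < n_classes:
        raise ValueError("need at least 2 classes and one target per class")
    cuts = []
    for k in range(1, n_classes):
        i = k * n // n_classes
        # Place each cut halfway between two neighbouring sorted values.
        cuts.append((ordered[i - 1] + ordered[i]) / 2)
    return cuts


def to_class(value, cuts):
    """Map a numeric target to the index of the interval that contains it."""
    return bisect_right(cuts, value)


def class_representatives(targets, cuts):
    """Use the median of the training targets falling in each interval as
    the numeric value predicted for that class (an assumption)."""
    groups = {}
    for y in targets:
        groups.setdefault(to_class(y, cuts), []).append(y)
    return {label: median(ys) for label, ys in groups.items()}


# Hypothetical target values of a small regression training set.
targets = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 50.0, 55.0, 60.0]
cuts = equal_frequency_cuts(targets, 3)
labels = [to_class(y, cuts) for y in targets]
reps = class_representatives(targets, cuts)
```

In such a setup, a classification system would be trained on the original attributes paired with `labels`, and each class it predicts would be translated back into a number through `reps`.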

The next section gives a brief overview of the steps involved in solving regression problems by means of inductive classification algorithms. We then present our discretization methodology in Section 3. The experiments we carried out are described in Section 4. Finally, we describe some future work and present the conclusions of this paper.
