Regression by Classification - 3.1 The Wrapper Approach

3.1 The Wrapper Approach

The goal of the class discretization process is to obtain a discrete data set that enables the classification algorithm to learn a theory with the best possible regression accuracy. Changing the number of classes changes the input given to the classification system, and therefore changes its regression accuracy. It follows that the discretization process should take into account the classification system that will be used afterwards. In other words, the set of discrete classes is effectively a parameter of the classification algorithm. The wrapper approach [8, 9] is a well-known strategy that has mainly been used for feature subset selection ([8] and [10], among others) and parameter estimation [13]. Pazzani [14] used a similar approach for feature creation, a problem closely related to ours. The use of this approach to estimate a parameter of a learning algorithm is illustrated in Fig. 2:

Fig. 2. The wrapper approach.

The two main components of the wrapper approach are how new parameter settings are generated and how they are evaluated in the context of the target learning algorithm. The basic idea is to try different parameter settings and choose the one with the best estimated results. This best setting is the output of the wrapper process and is then used by the learning algorithm for the real evaluation on an independent test set.
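The two components can be made concrete with a small sketch. Here the candidate settings are any iterable (the generation component) and `estimate_error` is any function returning an estimated error for a setting (the evaluation component); both names, and the function `wrapper_select`, are hypothetical. The estimator is assumed to use only training data, so that the independent test set stays untouched until the chosen setting is finally evaluated.

```python
from typing import Callable, Iterable, TypeVar

Setting = TypeVar("Setting")


def wrapper_select(candidates: Iterable[Setting],
                   estimate_error: Callable[[Setting], float]) -> Setting:
    """Return the candidate setting with the lowest estimated error.

    Hypothetical helper: `estimate_error` must be computed on training data
    only (e.g. by cross-validation); the test set is reserved for the final
    evaluation of the selected setting.
    """
    best_setting, best_error, seen_any = None, float("inf"), False
    for setting in candidates:            # generation component
        error = estimate_error(setting)   # evaluation component
        # Strict '<' keeps the first setting among equally good ones.
        if not seen_any or error < best_error:
            best_setting, best_error, seen_any = setting, error, True
    if not seen_any:
        raise ValueError("no candidate settings were generated")
    return best_setting
```

For instance, if the parameter were the number of classes and a toy estimator returned `abs(k - 6)`, then `wrapper_select([2, 4, 6, 8], lambda k: abs(k - 6))` would select 6.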

Translating this scenario to our discretization problem, we have to find the discretization that gives the best results. Our method tries several possible discretization settings (i.e., sets of discrete classes) and chooses the one with the best estimated accuracy. To evaluate the candidate settings we use the well-known N-fold cross-validation (CV) test.
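A minimal sketch of how one candidate set of classes might be scored by N-fold CV is given below. Several details are assumptions not stated in this section: a candidate is represented as a sorted list of cut points on the target variable; each class is turned back into a numeric prediction using the median of its training targets; the error measure is the mean absolute error; and the classifier is any function with the hypothetical signature `classify(train_X, train_labels, test_X)`. The names `to_classes`, `cv_regression_error` and the toy `one_nn` classifier are illustrative only.

```python
import bisect
import random
import statistics


def to_classes(y, cut_points):
    """Map each continuous target to the index of the interval it falls into."""
    return [bisect.bisect_right(cut_points, v) for v in y]


def cv_regression_error(X, y, cut_points, classify, n_folds=10, seed=0):
    """Estimate the mean absolute regression error of one candidate set of
    classes (given by `cut_points`) with N-fold cross-validation."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]  # disjoint folds
    total_error = 0.0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        train_y = [y[i] for i in train]
        train_labels = to_classes(train_y, cut_points)
        # Assumed class representative: median of the training targets in the class.
        groups = {}
        for value, label in zip(train_y, train_labels):
            groups.setdefault(label, []).append(value)
        medians = {label: statistics.median(vs) for label, vs in groups.items()}
        fallback = statistics.median(train_y)
        predicted = classify([X[i] for i in train], train_labels,
                             [X[i] for i in fold])
        for i, label in zip(fold, predicted):
            total_error += abs(medians.get(label, fallback) - y[i])
    return total_error / len(y)


def one_nn(train_X, train_labels, test_X):
    """Toy 1-nearest-neighbour classifier (squared Euclidean distance)."""
    predictions = []
    for x in test_X:
        nearest = min(range(len(train_X)),
                      key=lambda k: sum((a - b) ** 2 for a, b in zip(x, train_X[k])))
        predictions.append(train_labels[nearest])
    return predictions
```

Each example is held out exactly once, so the returned value averages the error over the whole training set; this estimate is what the wrapper compares across candidate sets of classes.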

The search component of our wrapper approach consists of the procedure used to generate a new candidate set of classes (i.e., the search operators) and the search algorithm. We use a simple hill-climbing search algorithm coupled with a lookahead mechanism, which tries to avoid the well-known tendency of hill climbing to get stuck in local minima. The search keeps trying new candidate sets of classes until a given number of consecutive worse trials (the lookahead value) has occurred.
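The stopping rule can be sketched as follows, under the assumption that the search operator produces one new candidate from the most recently tried one (so the search may pass through a few worse settings) and returns `None` when no further candidate exists. A trial counts as "worse" when it does not improve on the best error found so far. The function name and parameters are hypothetical.

```python
from typing import Callable, Optional, Tuple, TypeVar

Setting = TypeVar("Setting")


def hill_climb_lookahead(initial: Setting,
                         next_candidate: Callable[[Setting], Optional[Setting]],
                         estimate_error: Callable[[Setting], float],
                         lookahead: int = 3) -> Tuple[Setting, float]:
    """Hill climbing that stops after `lookahead` consecutive trials that
    fail to improve on the best estimated error."""
    best, best_error = initial, estimate_error(initial)
    current = initial
    worse_in_a_row = 0
    while worse_in_a_row < lookahead:
        candidate = next_candidate(current)
        if candidate is None:        # the operator cannot generate more settings
            break
        current = candidate          # keep moving, even through worse settings
        error = estimate_error(current)
        if error < best_error:
            best, best_error = current, error
            worse_in_a_row = 0       # an improvement resets the counter
        else:
            worse_in_a_row += 1
    return best, best_error
```

With a lookahead of 1 this reduces to plain hill climbing, stopping at the first setting that does not improve. For example, with toy errors `{2: 5.0, 3: 4.0, 4: 4.5, 5: 3.0, ...}` over the number of classes and an operator that adds one class, a lookahead of 1 stops at 3 classes, while a lookahead of 2 crosses the worse setting at 4 and reaches 5.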

We provide two alternative ways of generating a new candidate discretization setting. Both can be combined with each of the three splitting strategies presented in Section 2.1, which gives six different discretization methods for creating a set of discrete classes with this wrapper approach.
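The count of six follows from pairing every splitting strategy with every way of generating candidates. A trivial sketch of this pairing is shown below; the labels are placeholders, since the actual strategies and generation methods are described elsewhere in the paper and are not named in this section.

```python
from itertools import product

# Placeholder labels (hypothetical): the real splitting strategies are
# described in Section 2.1, and the generation methods are not named here.
SPLITTING_STRATEGIES = ("splitting_strategy_1", "splitting_strategy_2",
                        "splitting_strategy_3")
CANDIDATE_GENERATORS = ("generator_1", "generator_2")

# Each discretization method pairs one splitting strategy with one way of
# generating new candidate sets of classes: 3 x 2 = 6 methods.
METHODS = list(product(SPLITTING_STRATEGIES, CANDIDATE_GENERATORS))
```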
