
5. Relations To Other Work And Future Directions

We have presented a general class discretization methodology and evaluated it in conjunction with two classification algorithms. Our goal is to experiment with further systems that have different characteristics.

Within the ML community there is other work in the area of continuous attribute discretization. This work usually performs a kind of pre-processing, trying to maximize the mutual information between the resulting discrete attribute and the classes (for instance [4] and [11]). This is a good strategy, but it is applicable only when the classes are given. Ours is a very different problem, as we are determining which classes to consider.
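
To make the contrast concrete, the following minimal sketch shows the kind of criterion such supervised pre-processing methods optimize: the information gained about the (given) classes by cutting a continuous attribute at a candidate threshold. The function names and the simple binary-split setting are illustrative assumptions, not the exact procedures of [4] or [11].

```python
import numpy as np


def entropy(labels):
    """Shannon entropy of a sequence of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def split_information_gain(values, labels, threshold):
    """Information gained about the classes by cutting the continuous
    attribute `values` at `threshold` -- the quantity that supervised
    discretization methods try to maximize (possible only when the
    class labels are already given)."""
    values, labels = np.asarray(values), np.asarray(labels)
    left, right = labels[values <= threshold], labels[values > threshold]
    if len(left) == 0 or len(right) == 0:
        return 0.0
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - remainder
```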

Within the ML field there are regression learning systems (for instance CART [1], M5 [16] and R2 [17]) that could be used on these domains. These systems do not transform regression into classification tasks. Weiss & Indurkhya have demonstrated [20] that this transformation can achieve good results when compared to these more "classical" methods. They have done this with their rule-based regression system, which learns with discrete classes, and have tested it on several domains (including the ones we have used). The results they report show that their system clearly outperforms CART, a nearest neighbor algorithm and the statistical method MARS [5]. These results were a key motivation for our work, as they indicate that it is possible to obtain good accuracy with classification systems on regression problems.

Their system is a two-step algorithm. First comes the discretization phase, where they use a method equivalent to our VNI+KM method. Then they apply their classification system to the resulting discrete data set. As their classification system was not available to us, we were not able to test our discretization methods together with it. However, we have tested their discretization method with CN2 and C4.5. The experiments showed that the best method depends both on the domain and on the classification system used (Table 3). This fact does not allow us to say definitely that our methods are always better than the VNI+KM method. Nevertheless, these results reinforce our search-based approach, which is able to choose the discretization method depending on both these factors. Table 4 also shows that on average both SIC+KM and VNI+EP are better than VNI+KM. This seems to indicate that these methodologies, together with Weiss & Indurkhya's classification system, could achieve even better overall regression accuracies when compared to the other "classical" regression methodologies.
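
For readers unfamiliar with this transformation, the sketch below illustrates the general two-step scheme: the continuous goal values are grouped into a fixed number of classes (here by a simple one-dimensional k-means, standing in only as an illustrative stand-in for the discretization methods discussed above), a classifier is learned on the discrete classes, and predictions are mapped back to numbers through the medians of the classes. The name `learn_classifier` is a hypothetical placeholder for any classification learner (e.g. CN2 or C4.5); this is an illustration of the overall pipeline, not Weiss & Indurkhya's system.

```python
import numpy as np


def kmeans_1d(y, k, iters=50, seed=0):
    """Simple 1-D k-means over the goal variable values `y`
    (an illustrative stand-in for a KM-style discretization)."""
    rng = np.random.default_rng(seed)
    centers = rng.choice(y, size=k, replace=False)
    for _ in range(iters):
        assign = np.argmin(np.abs(y[:, None] - centers[None, :]), axis=1)
        centers = np.array([y[assign == j].mean() if np.any(assign == j)
                            else centers[j] for j in range(k)])
    return assign


def regression_by_classification(X, y, k, learn_classifier):
    """Two-step scheme: (1) discretize the continuous goal values into
    k classes, (2) learn a classifier on the discrete classes.
    Predictions are mapped back to numbers via the median of the
    predicted class's interval."""
    y = np.asarray(y, dtype=float)
    classes = kmeans_1d(y, k)
    medians = {c: float(np.median(y[classes == c])) for c in np.unique(classes)}
    model = learn_classifier(X, classes)   # hypothetical classifier learner
    return lambda x: medians[model(x)]     # numeric prediction for a new case
```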

One possible future improvement to our work is to try other search algorithms (such as best-first search [6], simulated annealing [18] or even genetic-based search algorithms [7]).

Another interesting research topic has to do with the inability of classification systems to take advantage of the implicit ordering of the obtained discrete classes. Because of this, every error has the same cost. This is at odds with the evaluation measures used to calculate the accuracy of regression models. A possible way to overcome this drawback is to use a cost matrix in the learning phase, so that different errors are distinguished. This kind of error cost information is important even in the classification scenario for several domains [12]. We could use the distance between the medians of the intervals as the cost of confusing the corresponding classes. We have already implemented this idea together with a linear discriminant that is able to use cost matrices. We do not include this work here, as we still do not have experimental results.
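
As an illustration of this idea, the following minimal sketch builds such a cost matrix from the interval medians: the cost of predicting class j when the true class is i is the distance between the medians of the two intervals. It only shows the construction described above; how the matrix is exploited during learning depends on the particular cost-sensitive learner.

```python
import numpy as np


def median_distance_cost_matrix(interval_medians):
    """Cost matrix for ordered discrete classes: entry (i, j) is the
    absolute distance between the medians of intervals i and j, so
    confusing distant classes costs more than confusing adjacent ones."""
    m = np.asarray(interval_medians, dtype=float)
    return np.abs(m[:, None] - m[None, :])


# Example: three classes whose intervals have medians 2.0, 5.0 and 11.0.
print(median_distance_cost_matrix([2.0, 5.0, 11.0]))
```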

