
2.1 Methods For Splitting A Set Of Continuous Values

The key issue in a discretization process is the transformation of a set of continuous values into a set of intervals. These intervals can then be used as discrete classes. In this section we present three methods for performing this task. All of them take as input a set of values and the desired number of intervals, N.

· Equally probable intervals (EP)

This strategy creates a set of N intervals containing (as nearly as possible) the same number of elements.

· Equal width intervals (EW)

The original range of values is divided into N intervals of equal width.

· K-means clustering (KM)

In this method we try to build N intervals that minimize the sum of the distances of each element of an interval to the interval's gravity center [3]. This is essentially the P-class method described in [20]. The method starts from the EP partition and then tries to move the elements of each interval to adjacent intervals whenever such a move reduces this sum of distances.
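The three strategies can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the implementation used in our experiments: the function names (`ep_groups`, `ew_groups`, `km_groups`, `sse`) are hypothetical, KM measures distance as the squared Euclidean distance to the interval mean (the text above does not fix the distance), and its search is a simple greedy one that moves one boundary element at a time between adjacent intervals.

```python
from bisect import bisect_left


def ep_groups(values, n):
    """Equally probable (EP): n groups of (nearly) equal size."""
    xs = sorted(values)
    size, extra = divmod(len(xs), n)
    groups, start = [], 0
    for i in range(n):
        # the first `extra` groups receive one additional element
        end = start + size + (1 if i < extra else 0)
        groups.append(xs[start:end])
        start = end
    return groups


def ew_groups(values, n):
    """Equal width (EW): split [min, max] into n intervals of equal width."""
    xs = sorted(values)
    lo, hi = xs[0], xs[-1]
    width = (hi - lo) / n
    cuts = [lo + k * width for k in range(1, n)]
    groups = [[] for _ in range(n)]
    for v in xs:
        # bisect_left keeps a value equal to a cut in the interval on its left,
        # matching the right-closed intervals ]a .. b] used in the text
        groups[bisect_left(cuts, v)].append(v)
    return groups


def sse(group):
    """Sum of squared distances of the elements to the group's mean (assumed KM criterion)."""
    if not group:
        return 0.0
    m = sum(group) / len(group)
    return sum((x - m) ** 2 for x in group)


def km_groups(values, n):
    """KM sketch: start from EP, then move boundary elements between adjacent
    groups while a move lowers the total within-group sum of squared distances."""
    groups = ep_groups(values, n)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            left, right = groups[i], groups[i + 1]
            # only these two groups change, so comparing their cost is enough
            current = sse(left) + sse(right)
            candidates = []
            if len(left) > 1:   # move the largest value of `left` to `right`
                candidates.append((left[:-1], [left[-1]] + right))
            if len(right) > 1:  # move the smallest value of `right` to `left`
                candidates.append((left + [right[0]], right[1:]))
            for new_left, new_right in candidates:
                if sse(new_left) + sse(new_right) < current - 1e-12:
                    groups[i], groups[i + 1] = new_left, new_right
                    improved = True
                    break
    return groups


data = [1, 3, 6, 7, 8, 9.5, 10, 11]
print(ep_groups(data, 3))  # [[1, 3, 6], [7, 8, 9.5], [10, 11]]
print(ew_groups(data, 3))  # [[1, 3], [6, 7], [8, 9.5, 10, 11]]
print(km_groups(data, 3))  # [[1, 3], [6, 7, 8], [9.5, 10, 11]]
```

Every accepted move strictly lowers the total cost and there are finitely many contiguous partitions, so the KM loop terminates; like any such local search, it reaches a local optimum rather than a guaranteed global one.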

To illustrate these strategies, we show how each of them groups the set of values {1,3,6,7,8,9.5,10,11} into three intervals (N=3):

- EP gives the intervals [1 .. 6.5], ]6.5 .. 9.75] and ]9.75 .. 11], containing the values {1,3,6}, {7,8,9.5} and {10,11}, respectively.

- Using EW, we get [1 .. 4.33], ]4.33 .. 7.67] and ]7.67 .. 11], containing the values {1,3}, {6,7} and {8,9.5,10,11}.

- Finally, KM obtains the intervals [1 .. 4.5], ]4.5 .. 8.75] and ]8.75 .. 11], grouping the values into {1,3}, {6,7,8} and {9.5,10,11}.
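In this example, the EP and KM boundaries lie halfway between the last value of one interval and the first value of the next, whereas the EW boundaries come from dividing the range [1 .. 11] into equal parts. The sketch below recomputes the cut points under this midpoint rule and, assuming the squared Euclidean distance to the interval mean as the KM criterion, compares the within-interval sum of the three partitions. The helper names `midpoint_cuts` and `within_sum` are hypothetical.

```python
def midpoint_cuts(groups):
    """Cut points halfway between consecutive groups (sorted, non-empty groups)."""
    return [(a[-1] + b[0]) / 2 for a, b in zip(groups, groups[1:])]


def within_sum(groups):
    """Total squared distance of every value to the mean of its interval."""
    total = 0.0
    for g in groups:
        m = sum(g) / len(g)
        total += sum((x - m) ** 2 for x in g)
    return total


# groupings of {1,3,6,7,8,9.5,10,11} with N=3, as listed in the text
partitions = {
    "EP": [[1, 3, 6], [7, 8, 9.5], [10, 11]],
    "EW": [[1, 3], [6, 7], [8, 9.5, 10, 11]],
    "KM": [[1, 3], [6, 7, 8], [9.5, 10, 11]],
}

for name, groups in partitions.items():
    print(name, midpoint_cuts(groups), round(within_sum(groups), 4))
# EP [6.5, 9.75] 16.3333
# EW [4.5, 7.5] 7.1875   (midpoints; EW itself cuts at 4.33 and 7.67)
# KM [4.5, 8.75] 5.1667
```

For this data set, the KM grouping has the smallest within-interval sum of the three, which is the quantity its search tries to reduce.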

The problem with these strategies is that they assume we know the number of intervals that is appropriate for our problem. Our experiments show that this number depends not only on the domain we are dealing with, but also on the classification system that will be used after discretization. The methodology we present in this paper overcomes this difficulty by means of an iterative search approach.

