
3. Experiments

In this section we present some experiments carried out with R2. At the current stage of development our system is not yet computationally efficient enough for real-world, high-dimensional domains, so we decided to test it on a set of artificial data sets.

We now present the data sets used in our experiments, together with the specifications used to generate them:

· Linear 1 (L1) - 100 examples

This data set has one real-valued attribute with values randomly generated from the interval [-10..10]. The class of each case is obtained using the rules:

IF a1 >= 1 THEN y = -5 - 0.67 × a1 + e
IF a1 < 1 THEN y = 10 + 0.3 × a1 + e
where e is a random value in [0..2]
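
As an illustration, the following Python sketch generates a data set of this kind (the function name and the fixed seed are ours; the ranges and rules follow the specification above):

    import random

    def generate_l1(n=100, seed=0):
        """Generate the Linear 1 (L1) data set: one real-valued
        attribute in [-10, 10] and a piecewise linear class value."""
        rng = random.Random(seed)
        data = []
        for _ in range(n):
            a1 = rng.uniform(-10, 10)
            e = rng.uniform(0, 2)      # noise term in [0..2]
            if a1 >= 1:
                y = -5 - 0.67 * a1 + e
            else:
                y = 10 + 0.3 * a1 + e
            data.append((a1, y))
        return data

The remaining data sets can be generated following the same pattern, with more attributes and the corresponding rule sets.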

· Linear 2 (L2) - 100 examples

This domain has four real-valued attributes. The class is calculated as:

IF a3 >= 1 THEN y = -5 - 0.67 × a1
IF a3 < 1 THEN y = 10 + 3 × a1
OTHERWISE y = 100 - 0.5 × a1 + 0.3 × a3 + a2

· Non-linear 1 (NL1) [9] - 100 examples

The examples of this domain are described by 5 attributes. The first takes the values 1 and 2 with equal probability; the others are random real numbers. The classes are obtained using the rules:

IF a1 = 1 THEN y = 1 + 2 × a2 + a3 - e^(-2(a4+a5))
IF a1 = 2 THEN y = 1 - 1.2 × a2 - 3.1 × a3 + e^(-3(a4-a5))

· Non-linear 2 (NL2) - 100 examples

This domain uses three real-valued attributes and the classes are obtained by:

We compared R2 with M5. Different parameter settings allow M5 to simulate several other algorithms (such as CART or standard linear regression models). The variations of M5 that we tried were:

- Default parameter values (M5).

- Standard multiple linear regression models (M5a).

- Regression trees (M5b) - simulating CART, i.e. trees with average values in the leaves (no linear regression performed).

- Instance-based (M5c) - simulates an instance-based predictor, namely one similar to David Aha's IB1 [1].

- Full models, no smoothing (M5d) - with these settings M5 neither simplifies the regression models at the leaves nor performs smoothing [12] when classifying.

We performed a 10-fold cross-validation test [10] on each data set and collected averages of three measures of prediction error:

· Mean Absolute Error - same as MAD (see equation 1).

· Mean Squared Error - MSE

MSE = (1/n) Σ_i (y_i - ŷ_i)²   (5)

· Normalized Mean Squared Error - NMSE

NMSE = Σ_i (y_i - ŷ_i)² / Σ_i (y_i - ȳ)²   (6)

where y_i and ŷ_i are the true and predicted values of case i, ȳ is the average of the true values, and n is the number of cases.
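
For concreteness, these measures could be computed as follows (a minimal sketch; the function and variable names are ours):

    def mad(y_true, y_pred):
        """Mean Absolute Deviation (equation 1)."""
        return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

    def mse(y_true, y_pred):
        """Mean Squared Error (equation 5)."""
        return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

    def nmse(y_true, y_pred):
        """Normalized MSE (equation 6): the squared error relative to
        always predicting the mean of the true values; values below 1
        mean the model beats that baseline."""
        mean_y = sum(y_true) / len(y_true)
        denom = sum((y - mean_y) ** 2 for y in y_true)
        return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / denom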

Predicting the "class" of a given example involves checking which rules are satisfied by the example. If more than one rule is satisfied we have a conflicting-predictions problem. Several strategies exist for dealing with such conflicts [16], among them averaging the predictions, choosing the "best rule" prediction, and averaging the predictions weighted by the quality of the rules. At this initial stage of R2's development we decided to use the simplest (but not necessarily the worst [16]) strategy: the prediction for an example is obtained from the satisfied rule with the lowest MAD.
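
The following sketch shows this resolution strategy (the Rule representation is hypothetical; only the selection criterion comes from the text above):

    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class Rule:
        condition: Callable   # predicate testing whether an example satisfies the rule
        model: Callable       # regression model attached to the rule's conclusion
        mad: float            # MAD of this rule on its training cases

    def predict(example, rules):
        """Among all rules satisfied by the example, use the
        prediction of the rule with the lowest MAD."""
        fired = [r for r in rules if r.condition(example)]
        if not fired:
            return None       # no rule covers this example
        best = min(fired, key=lambda r: r.mad)
        return best.model(example)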

The results on the data sets presented above are shown in Table 1. For each system, the first line gives the average error over the 10 cross-validation runs and the second line the corresponding standard deviation.

             L1                  L2                  NL1                 NL2
       MAD   MSE   NMSE   MAD   MSE   NMSE   MAD   MSE   NMSE   MAD    MSE     NMSE
R2     0.7   2.9   0.04   0.9   26.1  0.82   0.9   1.4   0.66   21.1   2679.6  1.37
 (sd)  0.5   8.0   0.13   2.8   82.3  2.60   0.2   1.1   0.4    15.5   4389.6  0.80
M5     0.9   3.0   0.04   1.5   11.7  0.07   0.3   0.5   0.12   18.0   1669.9  2.71
 (sd)  0.42  6.7   0.09   0.9   23.8  0.13   0.2   0.7   0.13   9.8    1795.9  3.32
M5a    3.5   19.5  0.28   7.7   81.0  0.68   1.1   2.7   0.77   20.8   3360.6  0.87
 (sd)  0.7   7.8   0.09   1.4   21.5  0.30   0.6   3.9   0.28   17.8   4871.6  0.48
M5b    0.8   3.2   0.04   2.0   17.2  0.09   0.7   1.6   0.46   21.9   2816.9  3.33
 (sd)  0.5   8.4   0.12   1.3   31.7  0.12   0.3   1.6   0.33   16.5   3620.2  6.10
M5c    2.0   10.3  0.15   4.6   45.7  0.38   0.5   0.7   0.21   15.4   1267.9  2.72
 (sd)  0.7   6.3   0.09   1.4   22.4  0.21   0.2   0.9   0.23   11.5   1653.6  7.36
M5d    0.7   2.9   0.04   1.0   24.5  0.16   0.3   0.4   0.15   16.6   2280.7  4.69
 (sd)  0.4   7.9   0.13   1.2   34.9  0.22   0.1   0.5   0.17   14.6   3406.7  11.95

Table 1. Comparative results on the artificial data sets.

The results of R2 compare reasonably with those of the other systems on these artificial data sets. We believe that adding some strategy for eliminating terms [10] from the linear regression models could bring some improvement, and we hope that incorporating other non-linear models will give better results on non-linear data. The method for searching for good rules could also be improved in several respects: several issues in the current algorithm are solved in a somewhat ad hoc manner, and further research should provide better solutions. Nevertheless, the results indicate that our approach is valid, although it needs further improvement before it can clearly win over these ML-based systems [11].
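
As one possible realization of such a term-elimination strategy (our illustration only, not necessarily the method referenced in [10]), a greedy backward elimination guided by the training MAD could look like this:

    import numpy as np

    def backward_eliminate(X, y, tol=1e-4):
        """Starting from the full linear model, repeatedly drop the
        term whose removal increases the training MAD the least, as
        long as the increase stays below tol. Returns the indices of
        the retained columns of X."""
        def fit_mad(cols):
            coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            return np.mean(np.abs(y - X[:, cols] @ coef))
        kept = list(range(X.shape[1]))
        current = fit_mad(kept)
        while len(kept) > 1:
            trials = [(fit_mad([c for c in kept if c != j]), j) for j in kept]
            best_mad, term = min(trials)
            if best_mad - current > tol:
                break
            kept.remove(term)
            current = best_mad
        return kept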

We observed that the standard deviations vary quite a lot. This variability could derive from the small number of training cases used in each run (90 examples). The same can be said about the test cases, as our measures in each run were based on only 10 predictions. We confirmed that the standard deviation decreases significantly if we extend the size of the data: for instance, M5 on an extended L2 data set (1000 examples) obtained a MAD of 0.279 with a standard deviation of 0.062. The reason for generating such small data sets is that R2 is currently implemented in Prolog and suffers from several efficiency problems. Several parts of the system demand intensive numeric computation, namely matrix algebra, and Prolog is clearly not the ideal programming language for these tasks. For the specialization part, however, it is much more suitable than other languages due to the symbolic character of the task. We will need to reach some compromise in order to be able to deal with real-world problems.

