The goal of these experiments was to compare the models learned by ML methods with other types of systems from outside the symbolic learning field.
We compare our models to the 12 methods used in the DGOR competition. The comparison uses three error statistics: the mean absolute error (MAE), the root mean squared error (RMSE) and Theil's coefficient (TU). They are calculated using the following formulas:
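As an illustration of how these three statistics can be computed, the following is a minimal Python sketch. The function names are hypothetical. Theil's coefficient exists in more than one variant; the form used here, which relates the model's error to that of a no-change forecast, is an assumption and may differ from the definition used in the competition.

```python
import math

def mae(actual, predicted):
    """Mean absolute error over paired observations."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error over paired observations."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def theil_u(actual, predicted, last_observed):
    """Theil's U relative to a no-change forecast (assumed variant).

    The no-change forecast predicts each value as the previous actual value;
    last_observed is the final value of the learning period.
    """
    previous = [last_observed] + list(actual[:-1])
    model_sq = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    naive_sq = sum((a - q) ** 2 for a, q in zip(actual, previous))
    return math.sqrt(model_sq / naive_sq)

# Toy values for illustration only, not competition data.
actual = [10.0, 12.0, 11.0, 13.0]
predicted = [11.0, 11.0, 12.0, 12.0]
print(mae(actual, predicted), rmse(actual, predicted), theil_u(actual, predicted, 9.0))
```

Under this variant, a value below 1 means the model's errors are smaller than those of simply repeating the last observed value.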
The goal of the experiments was to predict the value of the goal variable for the next 12 time periods. Errors were calculated on the basis of those 12 predictions. In summary, for each of our 5 data sets we constructed one set of examples for each candidate attribute introduction strategy and, after learning, used the resulting model to predict those 12 future values.
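The setup described above can be sketched as follows. This is only a sketch under stated assumptions: the attribute introduction strategies are not described in this section, so the plain lag window used here is a hypothetical stand-in, and the helper names `split_holdout` and `lagged_examples` are invented for illustration.

```python
def split_holdout(series, horizon=12):
    """Split a series into a learning part and the last `horizon` values."""
    if len(series) <= horizon:
        raise ValueError("series must be longer than the forecast horizon")
    return series[:-horizon], series[-horizon:]

def lagged_examples(series, n_lags):
    """Build (attributes, goal) pairs whose attributes are the n_lags previous values.

    A plain lag window is an assumed stand-in for an attribute introduction strategy.
    """
    return [(series[i - n_lags:i], series[i]) for i in range(n_lags, len(series))]

series = list(range(1, 31))  # toy series of 30 periods, not competition data
learning, future = split_holdout(series)
examples = lagged_examples(learning, n_lags=3)
print(len(learning), len(future), examples[0])
```

Each strategy would yield a different set of examples from the same learning part, while the 12 held-out values stay fixed so that the errors remain comparable.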
The 12 methods used in the DGOR competition included smoothing average variants, Kalman filters, Box-Jenkins [1] methods, etc. In the table of results we refer to them as m1 to m12.
We summarize the results of the experiments in Table 1, which shows the MAE statistic. For each data set, the models are ordered from best (top) to worst.
ZR03                  ZR04                  ZR06                  ZR11                  ZR15
Model         MAE     Model         MAE     Model         MAE     Model         MAE     Model         MAE
m5            9.9     ZR04_t2       0.97    m10           3.28    m8            208     m11           26.8
m12           12.2    ZR04_t3       0.98    m11           3.38    m1            217     m4            28.8
m3            14.8    ZR04_t4       0.98    m7            3.73    m7            222     m8            30.9
m8            16.9    ZR04_t5       0.98    m5            3.97    m11           243     m5            36.5
m11           17.9    ZR04_t2d1     0.98    m12           3.98    m6            251     m6            37.3
m7            19.7    ZR04_t3d1     0.98    m3            4.1     m3            255     m7            40
m9            22.3    ZR04_t4d1     0.98    m8            4.15    m2            257     m1            40.4
m4            25.1    ZR04_t5d1     0.98    m1            4.26    m9            292     ZR15_t1       42.26
m10           25.9    ZR04_t3d2     0.98    m2            4.87    m12           292     m10           44.2
ZR03_t1       26.48   ZR04_t4d2     0.98    m9            5.33    m5            295     m12           45.8
m2            26.5    ZR04_t5d2     0.98    m4            5.65    m10           308     m3            46.8
m1            29.4    ZR04_sm3      0.98    ZR06_t3d1     6.64    m4            317     ZR15_t2d1     48.2
ZR03_t3d1     30.69   ZR04_t4sm4w   0.98    ZR06_t2d1     6.71    ZR11_t4d2     422.27  ZR15_t5       48.44
ZR03_t3d2     30.69   ZR04_t4sm4wv  0.98    ZR06_sm3      6.71    ZR11_t4sm4w   432.85  m2            48.6
ZR03_t2       31.63   m4            1       m6            6.79    ZR11_t4sm4wv  440.84  ZR15_t4       48.74
ZR03_t2d1     32.18   ZR04_t4sm4    1       ZR06_t5       6.83    ZR11_t4sm4    466.3   ZR15_t3d1     48.74
ZR03_smt5d2   32.67   m2            1.08    ZR06_t3d2     6.87    ZR11_t1       471.88  ZR15_t4d1     48.74
ZR03_t4d1     34      m10           1.08    ZR06_t4d1     6.88    ZR11_t2       477.8   ZR15_t5d1     48.74
ZR03_t5d1     34      m11           1.09    ZR06_t5d1     6.98    ZR11_t2d1     477.8   ZR15_t4d2     48.74
ZR03_t3       34.53   ZR04_smt5d2   1.16    ZR06_t5d2     6.98    ZR11_sm3      478.37  ZR15_t5d2     48.74
ZR03_t5d2     34.55   ZR04_t1       1.25    ZR06_t4       7.29    ZR11_t3d1     518.93  ZR15_sm3      48.74
ZR03_t5       35.09   m6            1.29    ZR06_t4sm4wv  7.29    ZR11_t4d1     526.02  ZR15_smt5d2   48.74
ZR03_t4d2     35.27   m5            1.38    ZR06_t3       7.53    ZR11_t5d2     546.57  ZR15_t4sm4    48.74
ZR03_t4       35.45   m12           1.44    ZR06_t4d2     7.58    ZR11_t5d1     579.36  ZR15_t4sm4w   48.74
ZR03_sm3      35.47   m8            1.45    ZR06_t4sm4    7.63    ZR11_t3d2     585.25  ZR15_t4sm4wv  48.74
ZR03_t4sm4wv  35.55   m3            1.54    ZR06_smt5d2   8.17    ZR11_t4       591.29  ZR15_t2       49.84
ZR03_t4sm4w   35.56   m7            1.57    ZR06_t4sm4w   8.19    ZR11_smt5d2   769.3   ZR15_t3       54.01
ZR03_t4sm4    36.81   m9            1.89    ZR06_t2       8.5     ZR11_t3       827.32  m9            58.6
m6            37.9    m1            1.9     ZR06_t1       8.96    ZR11_t5       841.71  ZR15_t3d2     64.82

Table 1. Summary of comparative results with other fields' methods.
The ordering for the other statistics is similar, so we omit those tables for space reasons.
As Table 1 shows, the results of our methods vary considerably from domain to domain. They range from the surprisingly good results of M5 on ZR04 to the very bad results on ZR11. This variation also occurs with the other methods, as there seems to be no clear winner across all problems. However, our results are generally not ranked in the best positions. This indicates that further research is needed before these ML-based models are ready for this kind of competition. We expect that tuning some of the learning parameters of M5 can improve these results. Automatic feature selection is another technique that may improve our rankings. We should also try other learning algorithms such as RETIS.
We have tried to understand the reason for such a bad outcome on the ZR11 data. We observed that M5 was using a branch of its learned model tree whose label was not a regression formula but an average value (which can happen in M5). As a result, the system made almost the same prediction for all 12 values of this data set. Because this prediction was an average, it could not fall outside the range of the values in the learning set. This clearly indicates that further investigation is needed, either on improving the behaviour of M5 through parameter tuning or on developing other systems that do not have this behaviour.
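The effect described above can be reproduced on toy data. The sketch below is not M5 itself: it only contrasts a leaf labelled with an average against a leaf labelled with a least-squares line, using hypothetical helper names and an invented, steadily growing series.

```python
def fit_constant_leaf(ys):
    """A leaf labelled with the average of the goal values that reach it."""
    mean = sum(ys) / len(ys)
    return lambda x: mean

def fit_linear_leaf(xs, ys):
    """A leaf labelled with a least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x

# Toy learning set with an upward trend (illustrative values only).
xs = [1, 2, 3, 4, 5, 6]
ys = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]

constant = fit_constant_leaf(ys)
linear = fit_linear_leaf(xs, ys)

future_xs = range(7, 19)  # the next 12 periods
constant_preds = [constant(x) for x in future_xs]
linear_preds = [linear(x) for x in future_xs]

print(constant_preds)  # the same value 12 times, inside the learning range
print(linear_preds)    # values that continue the trend beyond the learning range
```

On a trending series, the average-valued leaf repeats one value that lies within the learning range for all 12 periods, while the linear leaf follows the trend beyond it.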