Knowledge Integration and Forgetting - 3.1 Comparisons of Integration Method and Incremental Learning

3.1 Comparisons of Integration Method and Incremental Learning

An interesting question is whether one particular system, say Si, could achieve performance similar to that of TI had it been supplied with the data of the other systems. Let us use the name Si+ for the system that received all the training data in this manner. The performance of Si+ can then be compared with the performance of the integrated theory TI. A series of experiments was conducted to measure the performance of systems S1+ and S3+. The results are shown in Table 3.

Table 3. Performance of the Integrated Theory (TI) and Systems S1+ and S3+.

Examples       5    10    15    20    25    30    35    40    45    50
S1+ (%)     50.4  53.4  58.6  60.8  64.9  63.4  62.7  64.4  64.8  66.3
S3+ (%)     61.4  63.4  65.3  64.6  70.6  69.8  69.4  71.5  73.8  71.0
TI (%)      55.1  66.5  70.0  73.1  74.2  73.4  73.8  74.4  75.9  76.5

Table 3a. Performance. The first row of this table shows the performance of system S1+, which has been supplied with the data of systems S1 - S4. This system uses the IRule1 learning method. The second row shows the performance of system S3+, which uses the ITree1 learning method. The extra data help these systems improve their performance, but the performance of the integrated theory (TI) is better at all but the smallest training-set size.

Examples       5    10    15    20    25    30    35    40    45    50
S1+ (%)     13.5   7.2  10.1   9.0   6.2   7.4   7.3   8.5   7.4   7.1
S3+ (%)     14.1   9.9   9.7   7.7   7.5   6.6   5.4   5.0   5.8   7.8
TI (%)       9.4   8.9   8.6   5.4   6.6   8.0   6.8   6.7   7.4   6.6

Table 3b. Standard Deviations of Performance.

Examples       5    10    15    20    25    30    35    40    45    50
S1+            7    11    13    15    18    20    19    21    20    19
S3+            8    13    17    22    24    27    28    29    30    29
TI             5     7    10    12    14    16    17    19    20    20

Table 3c. Sizes of Theories (Numbers of Rules)

The following figure shows the results in graphical form:

Fig. 4. Performance of the Integrated Theory TI and Systems S1+ and S3+.

In this figure the performance of the integrated theory TI is contrasted with the performance of systems S1+ and S3+. The graph has been constructed on the basis of data shown in Table 3.
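The margins between TI and the two systems can be read directly off Table 3a. The short Python sketch below tabulates them; the variable and function names are illustrative only and do not come from the original experiments.

```python
# Accuracy (%) copied from Table 3a, one entry per training-set size.
examples = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
s1_plus = [50.4, 53.4, 58.6, 60.8, 64.9, 63.4, 62.7, 64.4, 64.8, 66.3]
s3_plus = [61.4, 63.4, 65.3, 64.6, 70.6, 69.8, 69.4, 71.5, 73.8, 71.0]
ti = [55.1, 66.5, 70.0, 73.1, 74.2, 73.4, 73.8, 74.4, 75.9, 76.5]


def margins(reference, other):
    """Per-size difference (reference - other) in percentage points."""
    return [round(r - o, 1) for r, o in zip(reference, other)]


gain_over_s1 = margins(ti, s1_plus)
gain_over_s3 = margins(ti, s3_plus)

for n, g1, g3 in zip(examples, gain_over_s1, gain_over_s3):
    print(f"{n:>3} examples: TI - S1+ = {g1:+5.1f}   TI - S3+ = {g3:+5.1f}")
```

A positive value means TI scored higher at that training-set size. The differences are best read alongside the standard deviations in Table 3b, which are of a similar magnitude.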

This series of experiments shows that the integrated theory still wins over the incremental methods, although the gains are modest. From 10 examples onward, TI leads S3+ by roughly 2 to 9 percentage points and S1+ by roughly 9 to 13 points; only at 5 examples does S3+ score higher than TI. These results are somewhat surprising. We would expect that a system that "sees all the data" has a better chance of arriving at a theory that "explains the reality". However, because knowledge integration works with several theories, it has a good chance of eliminating random variations caused by noise and of avoiding "overfitting". Perhaps this is why the integrated theory performs better than systems S1+ and S3+.
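The intuition that combining several theories can cancel random errors can be illustrated with a simple calculation. The sketch below is not the integration method used in these experiments: it substitutes plain majority voting, and it assumes hypothetical voters that err independently and all have the same accuracy p. Under those assumptions the probability that the majority is correct follows from the binomial distribution.

```python
from math import comb


def majority_accuracy(p, n):
    """Probability that a strict majority of n independent voters is correct.

    Assumption (not from the source): each voter is correct with the same
    probability p, and their errors are independent. n must be odd so that
    ties cannot occur.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd to avoid ties")
    # Sum the binomial probabilities of k correct voters for every k > n/2.
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


for n in (1, 3, 5):
    print(n, round(majority_accuracy(0.7, n), 4))
```

With p = 0.7 the majority of three voters is correct more often than any single voter, and five voters do better still. Real theories learned from overlapping data are unlikely to err independently, so any actual benefit would be smaller than this idealized figure.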
