
4 Experiments

Several experiments with YAILS were performed on real-world domains. The three medical domains chosen were obtained from the Jozef Stefan Institute, Ljubljana. This choice enables comparisons with other systems, as these datasets are very often used to test learning algorithms. In addition, the datasets have different characteristics, which makes the test more thorough. Table 1 shows the main characteristics of the datasets:

Table 1. Main characteristics of the datasets.

              Lymphography         Breast Cancer        Primary Tumour
==========================================================================
Dimension     148 exs./18 attrs.   288 exs./10 attrs.   339 exs./17 attrs.
              4 classes            2 classes            22 classes
Attributes    Symbolic             Symbolic+numeric     Symbolic
Noise         Low level            Noisy                Very noisy
Unknowns      No                   Yes                  Yes

The experiments had the following structure: each time, 70% of the examples were randomly chosen for training and the remaining 30% were left for testing; all tests were repeated 10 times and the averages calculated.
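The evaluation protocol described above can be sketched as follows. This is only an illustrative sketch, not the code used in the experiments: the names `repeated_holdout`, `train_and_score` and `majority_class_accuracy` are hypothetical, the majority-class learner merely stands in for YAILS, and the use of the sample standard deviation is an assumption.

```python
import random
import statistics
from collections import Counter


def repeated_holdout(examples, train_and_score, train_fraction=0.7,
                     repetitions=10, seed=0):
    """Return (mean, standard deviation) of accuracy over random splits."""
    rng = random.Random(seed)
    n_train = round(train_fraction * len(examples))
    accuracies = []
    for _ in range(repetitions):
        shuffled = list(examples)
        rng.shuffle(shuffled)  # a fresh random split on every repetition
        train, test = shuffled[:n_train], shuffled[n_train:]
        accuracies.append(train_and_score(train, test))
    # Sample standard deviation; the source does not say which form was used.
    return statistics.mean(accuracies), statistics.stdev(accuracies)


def majority_class_accuracy(train, test):
    """Hypothetical stand-in learner: predict the most frequent training class."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return sum(label == majority for _, label in test) / len(test)
```

Any learner can be plugged in through `train_and_score`, provided it takes a training list and a test list of `(attributes, label)` pairs and returns an accuracy between 0 and 1.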

Table 2 presents a summary of the results obtained on the three datasets (standard deviations are given in parentheses).

Table 2. Results of the experiments.

                          Lymphography   Breast Cancer   Primary Tumour
Accuracy                  85% (5%)       80% (3%)        34% (6%)
No. of Used Rules         14 (2.7)       13.9 (5.6)      37.2 (2.8)
Aver. Conditions / Rule   1.86 (0.2)     1.94 (0.13)     1.96 (0.22)

The results are very good on two of the datasets and the theories are sufficiently simple (see Table 3 for a comparison with other systems). This gives a clear indication of the advantages of redundancy. Note that YAILS is an incremental system, which means that all decisions are made in a stepwise fashion rather than with a general overview of all the data, as in non-incremental systems. Because of this, lower performance is generally accepted from incremental systems. This is not the case with YAILS (with the exception of Primary Tumour), as the following table shows:

Table 3. Comparative results.

             Lymphography            Breast Cancer           Primary Tumour
System       Accuracy  Complexity    Accuracy  Complexity    Accuracy  Complexity
YAILS        85%       14 cpxs.      80%       13.9 cpxs.    34%       37.2 cpxs.
Assistant    78%       21 leaves     77%       8 leaves      42%       27 leaves
AQ15         82%       4 cpxs.       68%       2 cpxs.       41%       42 cpxs.
CN2          82%       8 cpxs.       71%       4 cpxs.       37%       33 cpxs.

The results presented in Table 3 do not establish any ranking of the systems, as this would require tests of significance. Because the papers describing the other systems report no standard deviations, and the number of repetitions of their tests also differs, the table is merely informative. It should also be noted that AQ15 uses the VL-1 description language, which allows internal disjunctions in each selector. This means that, for instance, the 4 complexes obtained with AQ15 are much more complex than 4 complexes in the language used by YAILS (which does not allow internal disjunction).
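To make the point about internal disjunction concrete, the sketch below expands one complex whose selectors list several admissible values into the equivalent set of complexes in which every selector tests a single value. It is an illustration only: the function name, the attribute names and the values are hypothetical, and it assumes a target language whose selectors are simple equality tests, which may not match the YAILS language exactly.

```python
from itertools import product


def expand_internal_disjunctions(cpx):
    """Expand a complex with internal disjunctions into single-value complexes.

    `cpx` maps each attribute to the set of values its selector admits,
    e.g. {"colour": {"red", "blue"}} stands for [colour = red v blue].
    """
    attributes = sorted(cpx)
    value_lists = [sorted(cpx[a]) for a in attributes]
    # One plain complex per combination of admitted values.
    return [dict(zip(attributes, combo)) for combo in product(*value_lists)]


# Hypothetical complex: [colour = red v blue] & [size = small v large] & [shape = round]
with_disjunction = {
    "colour": {"red", "blue"},
    "size": {"small", "large"},
    "shape": {"round"},
}
plain = expand_internal_disjunctions(with_disjunction)
print(len(plain))  # 2 * 2 * 1 = 4 complexes without internal disjunction
```

Under these assumptions, a single complex with internal disjunctions can correspond to several complexes in a language without them, which is why the complex counts in Table 3 are not directly comparable across systems.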

4.1 The Effect of Redundancy
