Most existing algorithms, such as AQ [10] and CN2 [4], use a covering strategy during learning. This means that the algorithm attempts to cover all known examples, and whenever an example has been covered it is removed. These systems consider a rule useless if it covers only examples that are already covered by other rules. AQ16 [15] uses a set of weights to remove such rules (considered redundant), which enables it to produce simpler theories than if the redundant rules were left in.
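As an illustration, the covering loop common to these systems can be sketched as follows. This is a minimal Python sketch, not any system's actual implementation; the rule search (find_best_rule) and the coverage test (covers) are passed in as functions, since those are precisely where systems such as AQ and CN2 differ:

    def covering(examples, find_best_rule, covers):
        # Generic covering loop: learn rules until no uncovered examples remain.
        rules = []
        remaining = list(examples)
        while remaining:
            rule = find_best_rule(remaining)
            if rule is None:               # no acceptable rule found; stop early
                break
            rules.append(rule)
            # discard every example the new rule covers
            remaining = [e for e in remaining if not covers(rule, e)]
        return rules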
YAILS does not follow this covering approach. Whenever the current theory does not cover a new example, new rules are created. The goal of this procedure is to find a "good" rule that covers the example; however, the introduced rule may cover examples already covered by other rules. The only criterion used to accept a rule is its quality (see 2.2). As a consequence of this strategy, YAILS usually generates more rules than other systems.
The utility of such redundant rules can be questioned, and the problem becomes even more relevant if we are concerned with comprehensibility. Nevertheless, there are several advantages to keeping these rules. YAILS is an incremental learning system, so a rule that seems redundant now may become useful in the future; by not discarding such rules the system can save learning time. In addition, redundancy can be seen as a way of dealing with certain types of uncertainty that arise during classification. Suppose we have a rule that cannot be used to classify an example because it tests attributes whose values are unknown in that example. If redundant rules are admitted, one of them may still be applicable and classify the example, as the sketch below illustrates. The advantages of redundancy thus lie in efficiency and accuracy; the disadvantage is the larger number of rules in the theory, which harms comprehensibility.
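The following sketch illustrates the point about unknown values. It assumes, purely for illustration, that rules are conjunctions of attribute-value equality tests and that unknown values are represented by None; this is not the actual YAILS rule representation:

    from dataclasses import dataclass

    @dataclass
    class Rule:
        tests: dict        # attribute -> required value
        conclusion: str    # predicted class

    def classify(rules, example):
        for rule in rules:
            # a rule can only fire if every attribute it tests is known
            known = all(example.get(a) is not None for a in rule.tests)
            if known and all(example.get(a) == v for a, v in rule.tests.items()):
                return rule.conclusion
        return None        # no applicable rule was found

    # Two redundant rules for the same class: if 'colour' is unknown in the
    # example, the second rule can still classify it.
    rules = [Rule({"colour": "red"}, "pos"), Rule({"size": "big"}, "pos")]
    print(classify(rules, {"colour": None, "size": "big"}))   # -> pos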
YAILS uses a simple mechanism to control redundancy. Our goal is to obtain the advantages of redundancy while at the same time minimising the number of rules used for classification. The mechanism consists of splitting the learned rules into two sets: the foreground rules and the background rules. The split is guided by a user-definable parameter (minimal utility) which acts as a way of controlling redundancy. The utility of a rule is calculated as the ratio of the number of examples uniquely covered by the rule to the total number of examples covered by the rule (this measure is basically the same as the u-weights used in [15]).
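In code, and under the same illustrative rule representation as above, the utility measure amounts to the following; only the ratio itself is taken from the text, and covers is again an assumed coverage predicate:

    def utility(rule, all_rules, examples, covers):
        # examples covered by this rule
        covered = [e for e in examples if covers(rule, e)]
        if not covered:
            return 0.0
        # of those, the examples that no other rule covers
        unique = [e for e in covered
                  if not any(covers(r, e) for r in all_rules if r is not rule)]
        return len(unique) / len(covered)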
Given a value of minimal utility, YAILS performs the following iterative process:

    Let the initial set of Learned Rules be the Candidate Foreground (CF)
    REPEAT
        Calculate the utility of each rule in CF
        IF the lowest-utility rule in CF has utility less than the minimal utility
        THEN remove it from CF and put it in the Background Set of Rules
    UNTIL no rule was put in the Background Set
    The final CF is the Foreground Set

The higher the minimal utility threshold, the less redundant the theory in the foreground. The redundancy present in the foreground set of rules is here called static redundancy. YAILS uses only the foreground set of rules (FS) during classification. Only when it is unable to classify an example does it try to find a rule in the background set (BS). If such a rule is found, the system transfers it from the BS to the FS, so that in the end the FS contains all the rules used during the classification of the examples. This latter type of redundancy is called dynamic redundancy. The advantage of this strategy is that it minimises the introduction of redundant rules.
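Under the same assumptions, the iterative split can be sketched in Python as follows, with utility the function sketched above. Note that the utilities are recomputed on the shrinking candidate foreground at every pass, as in the pseudocode:

    def split_rules(learned_rules, examples, covers, min_utility):
        foreground = list(learned_rules)   # the candidate foreground (CF)
        background = []                    # rules demoted for low utility
        while foreground:
            worst = min(foreground,
                        key=lambda r: utility(r, foreground, examples, covers))
            if utility(worst, foreground, examples, covers) >= min_utility:
                break                      # every remaining rule is useful enough
            foreground.remove(worst)       # demote the least useful rule and
            background.append(worst)       # recompute utilities on the next pass
        return foreground, background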
YAILS can use different classification strategies. The "normal" strategy includes both static and dynamic redundancy. Another possibility is to use only static redundancy, thus disabling the use of the BS. Finally, it is also possible to use all the learned rules, disregarding the split referred to above; this latter strategy corresponds to the maximum level of redundancy. Notice that for the first two strategies it is always possible to set the level of static redundancy through the minimal utility parameter. Section 4 presents the results obtained with several datasets using different classification strategies, showing the effect of redundancy on both accuracy and comprehensibility.
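The three strategies can be summarised in a single sketch. The labels "normal", "static" and "full" below are our illustrative names rather than YAILS terminology, and classify is the function from the earlier sketch:

    def predict(example, foreground, background, classify, strategy="normal"):
        if strategy == "full":                       # maximum redundancy: every rule
            return classify(foreground + background, example)
        prediction = classify(foreground, example)   # static redundancy only
        if prediction is None and strategy == "normal":
            # dynamic redundancy: fall back on the background set, transferring
            # any rule that fires from the BS to the FS
            for rule in list(background):
                prediction = classify([rule], example)
                if prediction is not None:
                    background.remove(rule)
                    foreground.append(rule)
                    break
        return prediction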