3.1 Redundancy

YAILS learns a set of rules that may be redundant. The issue of redundancy is two-fold. On the one hand, redundancy increases the complexity of the learned theory, which is undesirable. On the other hand, it helps to solve several problems that arise in real-world domains, namely noise, unknown attribute values, etc.

Looking in more detail at the advantages of redundancy, we can see that in incremental learning programs the notion of redundancy is somewhat dangerous: what looks redundant at present can become useful as more experience arrives to the program. With respect to unknown attribute values, the existence of other (possibly redundant) rules that test other attributes can overcome the problem of classifying such examples. This is a real problem because rules are usually very small (2 or 3 conditions), and if the example to classify has an unknown value on one of these conditions' attributes then classifying it with that rule can be unreliable. Another issue is that we are dealing with induction, so we cannot be sure that two or more different rules are really redundant. They seem to be, given the examples seen during learning, but they may still cover different characteristics of the learning space which are satisfied by some examples and not by others. If we throw them away, we lose those differences.

The main problem of redundancy is the increase in complexity. This is a serious problem, since comprehensibility is an important goal of learning programs. In order to overcome this drawback, YAILS uses a mechanism of controlled redundancy [Torgo,1992]. The main idea of this mechanism is to split the learned rules into two different sets: the foreground set (FS) and the background set (BS). This splitting is done every time a classification task is demanded. YAILS uses only the FS to classify examples. When it is not able to classify an example, YAILS consults the BS. If there is a rule in the BS which would enable the classification, YAILS uses it and transfers it to the FS. This means that the FS always contains the rules responsible for the classification performance of YAILS.
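
To make the mechanism concrete, the following Python sketch illustrates how such a foreground/background classification scheme might be implemented. The Rule class, the matching test and the treatment of unknown attribute values are illustrative assumptions only; they are not taken from the YAILS implementation.

    # Minimal sketch of the FS/BS classification mechanism (illustrative only).
    from dataclasses import dataclass

    @dataclass
    class Rule:
        conditions: dict      # attribute -> required value
        conclusion: str       # predicted class

        def matches(self, example: dict) -> bool:
            # A missing (unknown) attribute value makes the rule inapplicable here;
            # other treatments of unknown values are possible.
            return all(example.get(attr) == val
                       for attr, val in self.conditions.items())

    def classify(example: dict, foreground: list, background: list):
        # Try the foreground set first.
        for rule in foreground:
            if rule.matches(example):
                return rule.conclusion
        # Fall back on the background set; a rule that fires is promoted to the FS,
        # so the FS always holds the rules responsible for classification performance.
        for rule in background:
            if rule.matches(example):
                background.remove(rule)
                foreground.append(rule)
                return rule.conclusion
        return None   # no rule applies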

The splitting of the learned theory into two sets is done using a utility criterion [Zhang&Michalski,1989]. A user-definable parameter states the minimum utility demanded of a rule in order for it to belong to the FS; the utility of each rule is computed following [Zhang&Michalski,1989].

The splitting of the theory is an iterative process, since the utility of one rule depends on the other rules.
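
The sketch below illustrates one possible form of this iterative splitting, reusing the Rule class sketched above. The utility measure shown (the share of a rule's covered examples that no other rule currently in the FS covers) and the threshold value are placeholders chosen for illustration; the actual measure used by YAILS is that of [Zhang&Michalski,1989] and is not reproduced here.

    # Sketch of the iterative FS/BS split driven by a minimum-utility threshold.
    # The utility definition below is a stand-in, not the measure actually used by YAILS.

    def covered(rule, examples):
        return {i for i, ex in enumerate(examples) if rule.matches(ex)}

    def split_theory(rules, examples, min_utility=0.2):
        foreground = list(rules)              # start with every rule in the FS
        changed = True
        while changed:                        # iterate: utilities depend on the current FS
            changed = False
            for rule in list(foreground):
                other_cov = set()
                for other in foreground:
                    if other is not rule:
                        other_cov |= covered(other, examples)
                cov = covered(rule, examples)
                utility = len(cov - other_cov) / len(cov) if cov else 0.0
                if utility < min_utility:
                    foreground.remove(rule)   # demote the rule to the background set
                    changed = True
        background = [r for r in rules if r not in foreground]
        return foreground, background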

