3.2 Weighted Flexible Matching

Systems like AQ16 [15] that strive to eliminate redundancy become more sensitive to the uncertainty inherent in real-world domains. A small number of rules means that few alternatives exist when classifying an example. If some condition of those rules is not satisfied, the rule cannot be used and the system is unable to classify the example. To minimise this undesirable effect, these systems use flexible matching. This mechanism basically allows rules to classify examples even though some of their conditions are not satisfied. With this strategy the systems can improve performance while keeping the theory simple. Nevertheless, flexible matching does not solve every problem. If the rules are very simple (one or two conditions) and an example has an unknown value, flexible matching is not sufficiently reliable, because a single unknown value then leaves half or more of the rule's conditions unverified. Such short rules are in fact quite frequent: on the "Lymphography" medical dataset, for instance, the resulting theory can have on average 2 to 3 conditions per rule. Flexible matching may fail to help in these situations, which is why YAILS uses both redundancy and flexible matching during classification.

To explain flexible matching in YAILS, we first need to describe the weights that YAILS associates with every condition of each rule during the learning phase. These weights express the relative importance of a particular condition with respect to the conclusion of the rule. YAILS measures this weight as the decrease in entropy produced by adding the condition to the rule:

Weight(c) = H(R-c) - H(R) (3)

where

c is a condition belonging to the conditional part of rule R,
R-c is the conjunction resulting from eliminating the condition c from the conditional part of R, and
H(x) is the entropy of event x.
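As an illustration only, the following Python sketch computes (3) under two assumptions not stated in this section: conditions are simplified to attribute = value tests, and H(x) is taken as the Shannon entropy (in bits) of the class distribution of the training examples that satisfy the conjunction x. The names `entropy`, `covered` and `condition_weight` are hypothetical.

```python
from collections import Counter
from math import log2

# Assumption: a condition is an (attribute, value) equality test and an
# example is a dict mapping attributes to values plus a "class" entry.

def entropy(examples):
    """Shannon entropy (bits) of the class distribution of `examples`."""
    n = len(examples)
    if n == 0:
        return 0.0
    counts = Counter(e["class"] for e in examples)
    # p * log2(1/p) avoids a negative zero when a single class has p = 1
    return sum((k / n) * log2(n / k) for k in counts.values())

def covered(conditions, examples):
    """Training examples that satisfy every condition of the conjunction."""
    return [e for e in examples
            if all(e.get(attr) == value for attr, value in conditions)]

def condition_weight(rule, i, examples):
    """Weight of rule[i] as in (3): H(R - c) - H(R)."""
    without_c = rule[:i] + rule[i + 1:]
    return entropy(covered(without_c, examples)) - entropy(covered(rule, examples))

if __name__ == "__main__":
    train = [
        {"a": 1, "b": 1, "class": "pos"},
        {"a": 1, "b": 0, "class": "neg"},
        {"a": 0, "b": 1, "class": "neg"},
        {"a": 1, "b": 1, "class": "pos"},
    ]
    rule = [("a", 1), ("b", 1)]
    print([round(condition_weight(rule, i, train), 3) for i in range(len(rule))])
```

Under this reading, a condition that sharply separates the classes among the examples covered by the rest of the rule receives a large weight, while a condition whose addition does not reduce the entropy receives a weight of zero or less.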

These values play an important role in flexible matching. Given an example to classify, YAILS calculates its Matching Score (MS) for each rule. This value is 1 if the example satisfies all the conditions of the rule, and a value between 0 and 1 otherwise. In effect, it is the ratio of the conditions matched by the example, with each condition weighted using (3). If the example has an unknown value for some condition, equation (2) is used as an approximation. The general formula for calculating MS values is the following:

(4)
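Assuming (4) takes the form of a weighted ratio, MS(e, R) = sum of w_c * m(c, e) over the conditions c of R, divided by the sum of the weights w_c, where m(c, e) is 1 for a satisfied condition, 0 for a violated one, and an approximation from equation (2) for an unknown value, a minimal Python sketch is given below. Equation (2) is not given in this section, so it is passed in as a hypothetical callable `unknown_estimate`; the function and type names are likewise illustrative, and non-negative weights are assumed.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical types: a condition is an (attribute, required value) pair;
# a missing attribute or a None value marks an unknown value in the example.
Condition = Tuple[str, object]
Example = Dict[str, Optional[object]]

def matching_score(
    rule: List[Condition],
    weights: List[float],
    example: Example,
    unknown_estimate: Callable[[Condition], float],
) -> float:
    """Assumed form of (4): weighted fraction of satisfied conditions.

    unknown_estimate(c) stands in for equation (2) and should return a value
    in [0, 1] approximating how likely the unknown value is to satisfy c.
    Weights are assumed to be non-negative.
    """
    if not rule:
        return 1.0  # an empty rule body is trivially satisfied
    if sum(weights) <= 0:
        weights = [1.0] * len(rule)  # degenerate weights: use an unweighted ratio
    score = 0.0
    for (attr, value), w in zip(rule, weights):
        observed = example.get(attr)
        if observed is None:
            match = unknown_estimate((attr, value))  # unknown value: approximate
        else:
            match = 1.0 if observed == value else 0.0
        score += w * match
    return score / sum(weights)
```

For instance, with weights 0.5, 0.3 and 0.2, an example that satisfies the first condition, violates the second and has an unknown value for the third (estimated at 0.5) gets a score of about 0.6 under this assumed form, while an example satisfying all three conditions gets exactly 1.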

To better illustrate the idea, consider the following example (condition weights in brackets):

That is, the matching score of the example relative to the rule is 93.27%.

Having calculated this value for all rules, YAILS disregards those whose MS is below some threshold. The remaining rules are the candidates for classifying the example. For each of them, the system calculates an Opinion Value (OV), the product of the rule's MS and the rule quality (Q) obtained during the learning phase. The example is assigned the classification of the rule with the highest OV. If this set of candidate rules is empty, no rule in FS is able to classify the example; in that case, the next step is to apply the same procedure to the background set.
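The selection step can be sketched in Python as follows, as an illustration rather than the YAILS code. Each rule is represented by a hypothetical `ScoredRule` record holding its conclusion, its quality Q and a function returning its MS for an example; the threshold is a plain parameter, and `best_opinion` and `classify` are illustrative names. Ties between equal OVs are broken in favour of the rule listed first, a choice this section does not specify.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class ScoredRule:
    """Hypothetical rule record used only for this sketch."""
    conclusion: str
    quality: float                   # rule quality Q from the learning phase
    ms: Callable[[dict], float]      # returns the rule's MS for an example

def best_opinion(rules: List[ScoredRule], example: dict,
                 threshold: float) -> Optional[str]:
    """Class of the candidate rule with the highest OV, or None if no candidate."""
    best_class: Optional[str] = None
    best_ov = float("-inf")
    for rule in rules:
        ms = rule.ms(example)
        if ms < threshold:
            continue                 # disregard rules whose MS is below the threshold
        ov = ms * rule.quality       # Opinion Value = MS x Q
        if ov > best_ov:             # strict '>' keeps the first rule on ties
            best_class, best_ov = rule.conclusion, ov
    return best_class

def classify(fs_rules: List[ScoredRule], background_rules: List[ScoredRule],
             example: dict, threshold: float) -> Optional[str]:
    """Try the rules in FS first, then fall back to the background set."""
    label = best_opinion(fs_rules, example, threshold)
    if label is None:
        label = best_opinion(background_rules, example, threshold)
    return label
```

Returning None when both sets yield no candidate leaves the decision about unclassifiable examples to the caller.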

The mechanisms of redundancy and weighted flexible matching are interconnected in YAILS. The user can control these mechanisms through the minimal utility parameter and the threshold referred to above. These two values allow YAILS to exhibit different behaviours. For instance, if you are interested in very simple theories, the minimal utility should be set near 1 and the flexible matching threshold to the lowest possible value, taking care that accuracy does not deteriorate. Conversely, if you are interested only in accuracy, you could set the minimal utility to a value near 0 and make flexible matching stricter by raising the threshold. Of course, suitable parameter settings depend on the type of domain. Section 4.1 presents some experiments with these parameters and their effect on accuracy and comprehensibility.