Rule Characterization

The qualitative characterization of a particular rule R consists of two lists. The first lists all the test cases (belonging to DI) that the rule classified correctly. The second lists all the examples that the rule covered but classified incorrectly.
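The two lists can be produced in a single pass over the test cases. The following is only a minimal sketch under stated assumptions, not the INTEG.3 implementation: the `Rule` class, its `covers` predicate, the case layout and the toy data are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical case layout: (attribute values, true class).
Case = Tuple[Dict[str, float], str]


@dataclass
class Rule:
    """Hypothetical rule: a condition over attributes plus a predicted class."""
    covers: Callable[[Dict[str, float]], bool]
    predicted: str


def characterize(rule: Rule, cases: List[Case]) -> Tuple[List[Case], List[Case]]:
    """Return (correctly classified cases, incorrectly covered cases)."""
    correct: List[Case] = []
    incorrect: List[Case] = []
    for attrs, true_class in cases:
        if not rule.covers(attrs):
            continue  # uncovered cases appear in neither list
        if rule.predicted == true_class:
            correct.append((attrs, true_class))
        else:
            incorrect.append((attrs, true_class))
    return correct, incorrect


# Toy test set standing in for DI (made-up data).
test_cases: List[Case] = [
    ({"temp": 39.5}, "flu"),
    ({"temp": 38.2}, "cold"),
    ({"temp": 36.8}, "healthy"),
    ({"temp": 40.1}, "flu"),
]
rule = Rule(covers=lambda a: a["temp"] > 38.0, predicted="flu")
correct, incorrect = characterize(rule, test_cases)
```

Keeping the cases themselves, rather than only their counts, is what makes this characterization qualitative: each list can be inspected to see which examples support or contradict the rule.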

The quantitative characterization of a rule R uses estimates of rule quality. These estimates are again based on tests made using DI. In INTEG.3, rule quality is calculated using the expression:

$$ Q_R = \mathit{Cons}_R \cdot e^{\,\mathit{Compl}_{C,R} - 1} \qquad (1) $$

where ConsR represents an estimate of consistency of rule R and ComplC,R an estimate of completeness.

Consistency and completeness are commonly used measures of the performance of learning algorithms [Michalski, 1983]. Consistency evaluates how well a rule classifies, and completeness measures how well a rule covers the universe of examples of the concept to which the rule belongs.

When doing classification, two types of errors can occur: errors caused by misclassification, sometimes referred to as errors of commission (EcR), and errors of omission (EoR), which arise whenever a rule fails to cover some case, that is, when no classification is actually predicted.
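The distinction between the two error types can be made concrete with a small counting sketch. It rests on assumptions not taken from INTEG.3: a rule is modelled as a hypothetical `covers` predicate plus a predicted class, and an error of omission is counted only when the rule fails to cover a case of its own concept.

```python
from typing import Callable, Dict, Iterable, Tuple


def count_outcomes(
    covers: Callable[[Dict[str, int]], bool],
    predicted: str,
    cases: Iterable[Tuple[Dict[str, int], str]],
) -> Tuple[int, int, int]:
    """Return (C_R, Ec_R, Eo_R) for one rule over labelled test cases."""
    c_r = ec_r = eo_r = 0
    for attrs, true_class in cases:
        if covers(attrs):
            if true_class == predicted:
                c_r += 1   # covered and correctly classified
            else:
                ec_r += 1  # error of commission: covered but misclassified
        elif true_class == predicted:
            # Error of omission (assumption: an uncovered case of the rule's concept).
            eo_r += 1
    return c_r, ec_r, eo_r


# Made-up cases for a hypothetical rule "legs == 4 -> mammal".
cases = [
    ({"legs": 4}, "mammal"),
    ({"legs": 4}, "reptile"),
    ({"legs": 2}, "mammal"),
    ({"legs": 2}, "bird"),
]
counts = count_outcomes(lambda a: a["legs"] == 4, "mammal", cases)
```

In this toy set the rule classifies one case correctly, commits one error of commission (the reptile) and one error of omission (the two-legged mammal); the bird is neither covered nor part of the rule's concept, so it contributes to no count.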

The estimate of the consistency of rule R is calculated using the formula:

$$ \mathit{Cons}_R = \frac{C_R}{C_R + Ec_R} \qquad (2) $$

where CR represents the number of correctly classified cases, and EcR the number of misclassifications.

As we can see, ConsR is the proportion of correctly classified cases among the cases the rule covers. The errors of omission (EoR) do not enter this expression. They play a role in ComplC,R, the completeness of rule R with respect to concept C, which is calculated as follows:

$$ \mathit{Compl}_{C,R} = \frac{C_R}{C_R + Eo_R} \qquad (3) $$

Notice that when estimating rule quality (1) we use the value of ComplC,R in the exponent of e. We wanted to give rule consistency and rule completeness different weights. In this way rule consistency, although it is the more important factor, is modulated by rule completeness. In other words, if two rules have equal consistency, the one that covers more cases (the more complete one) is preferred. Good results were obtained with this solution (as will be shown later). More details about this method, and comparisons with other methods of estimating rule quality, can be found in [Brazdil&Torgo,1990b].
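The effect of the exponent can be checked numerically. The sketch below is illustrative only and assumes that consistency is C_R / (C_R + Ec_R) and completeness is C_R / (C_R + Eo_R), which is one reading of the definitions above; the function names, the zero-denominator convention and the example counts are hypothetical.

```python
import math


def consistency(c_r: int, ec_r: int) -> float:
    """Assumed form of (2): correct cases over all covered cases."""
    covered = c_r + ec_r
    return c_r / covered if covered else 0.0  # 0.0 for an empty rule is a choice of this sketch


def completeness(c_r: int, eo_r: int) -> float:
    """Assumed form of (3): correct cases over all cases of the concept."""
    total = c_r + eo_r
    return c_r / total if total else 0.0  # same convention as above


def quality(c_r: int, ec_r: int, eo_r: int) -> float:
    """Expression (1): consistency scaled by e^(completeness - 1)."""
    return consistency(c_r, ec_r) * math.exp(completeness(c_r, eo_r) - 1.0)


# Two hypothetical rules with equal consistency (0.8) but different coverage.
q_narrow = quality(c_r=4, ec_r=1, eo_r=16)  # completeness 0.2
q_broad = quality(c_r=16, ec_r=4, eo_r=4)   # completeness 0.8
```

Under these assumptions completeness lies between 0 and 1, so the factor e^(Compl - 1) ranges from 1/e (about 0.37) to 1. A fully complete rule keeps its consistency as its quality, an incomplete rule is penalized by a bounded amount, and a rule with zero consistency always has zero quality, which matches the intent of weighting consistency more heavily.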
