2. The R2 system

R2 resembles many other propositional learning systems. It receives as input a set of examples, each described by a set of attributes[2] and labeled by a class (in this case a real number). The algorithm of R2 consists of two main steps. It starts by trying to develop a regression model for the set of examples under consideration. The system then checks whether, by specializing the domain of applicability of the resulting model, it obtains a better fit of the data. The resulting rule is added to the theory and the system repeats the process until a given criterion is reached. Figure 1 presents this algorithm:

UncoveredRegion <- AllDomain
WHILE CurrentCoverage < CoverageRequestedByUser DO
    Select an uncovered region of the Domain
    Build an Unconditional Model for that region
    REPEAT
        Restrict the domain of applicability of the current model
        Update the Model for the new domain of applicability
    UNTIL the resulting specialization is worse than the current model

Fig. 1 - R2 main algorithm.
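The loop of Fig. 1 can be sketched in Python. This is only a minimal sketch: the helper functions passed as parameters (`select_uncovered_region`, `build_model`, `specialize`) are hypothetical placeholders for the steps described in the remainder of this section, not part of R2 itself.

```python
# Sketch of the R2 main loop (Fig. 1). The three helpers are hypothetical
# stand-ins for the steps described in the text.

def r2(examples, requested_coverage,
       select_uncovered_region, build_model, specialize):
    theory = []                      # the learned set of rules
    covered = set()                  # indices of already covered examples
    while len(covered) / len(examples) < requested_coverage:
        # 1. find a condition defining a region with uncovered examples
        condition = select_uncovered_region(examples, theory, covered)
        region = [e for e in examples if condition(e)]
        # 2. build an unconditional regression model for that region
        rule = build_model(condition, region)
        # 3. keep restricting its domain of applicability while the
        #    specialization improves on the current model
        while True:
            candidate = specialize(rule, examples)
            if candidate is None:    # specialization is worse: stop
                break
            rule = candidate
        theory.append(rule)
        covered |= {i for i, e in enumerate(examples)
                    if rule.condition(e)}
    return theory
```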

Let us see how R2 selects an uncovered region. The goal of this step is to find a condition that defines a region of the domain containing examples not yet covered by the current rules. At the beginning of the learning process this condition is empty, as the entire domain is uncovered. As new rules are learned, R2 finds these uncovered regions by looking at the rule conditions. Imagine that we have the following rules in our theory:

R2 negates each condition of all rules and chooses the negated condition that is satisfied by the most uncovered examples. With the rules presented above, one of the conditions of the set { X1 <= 13, X2 > 0.5, X3 <= 5 }[3] is chosen for developing a regression model. Choosing the condition that covers the most yet-uncovered examples ensures a quicker convergence towards the terminating condition of the algorithm. The chosen condition is satisfied by a set of examples. Some of them are uncovered, but others may already be covered by other rules in the theory. For instance, if the chosen condition was X3 <= 5, it is possible that an example already covered by rule R0 also satisfies this condition. The consequence is that R2 can learn different rule models for the same data. This form of redundancy is similar to the one used in the classification system Yails [17]. On noisy data it has obvious advantages [17] when compared to the more common covering algorithms (such as CN2 [3] or AQ [10]).
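One way to realize this region-selection step is sketched below. The representation of a condition as an (attribute, operator, threshold) triple and of a rule as a list of such triples is an assumption for illustration, not R2's actual data structures.

```python
from operator import le, gt

# Sketch of uncovered-region selection: negate every condition of every
# rule and pick the negation satisfied by the most uncovered examples.
# The (attribute, operator, threshold) representation is illustrative.

NEGATE = {"<=": ">", ">": "<="}
OPS = {"<=": le, ">": gt}

def satisfies(example, condition):
    attr, op, threshold = condition
    return OPS[op](example[attr], threshold)

def select_uncovered_region(examples, rules, covered):
    """Return the negated condition covering the most uncovered examples."""
    if not rules:                    # empty condition: the whole domain
        return None
    candidates = [(attr, NEGATE[op], threshold)
                  for rule in rules
                  for (attr, op, threshold) in rule]
    uncovered = [e for i, e in enumerate(examples) if i not in covered]
    return max(candidates,
               key=lambda c: sum(satisfies(e, c) for e in uncovered))
```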

The next step of the algorithm consists of developing a regression model for the chosen condition. R2 uses a lattice of increasingly complex regression model languages as the basis of this process. The system builds a regression model using each of these languages and chooses the one with the smallest fitting error. At this stage R2 is very similar to a classical regression system: given a set of data, it tries to develop a regression model for it. The difference is that the system tries several types of regression models, but the algorithm is independent of the strategy followed for this task. As a consequence it is very easy either to enlarge the set of languages or even to apply any standard regression package to the data. Currently, as the system is in an initial stage of implementation, we are using a limited set of regression models. This set is equivalent to the one used by M5 or RETIS and is presented below:

y = ȳ
y = K0 + K1 × A1 + ... + Kn × An

where
    y is the variable representing the class
    ȳ is the average class value
    the Ki's are constant values
    and the Ai's are attribute values
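The model-selection step can be sketched with a two-level lattice: an average model and a linear least-squares model, in the spirit of the set described above. The pure-Python one-attribute least-squares fit below is only a self-contained illustration; R2's actual fitting procedure is not specified here.

```python
# Sketch of R2's model selection: fit every language in a (small) lattice
# and keep the model with the smallest fitting error. One-attribute
# least squares is used to keep the example self-contained.

def fit_average(xs, ys):
    mean = sum(ys) / len(ys)         # the constant model y = average(y)
    return lambda x: mean

def fit_linear(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:                     # degenerate data: fall back
        return fit_average(xs, ys)
    k1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    k0 = my - k1 * mx                # the linear model y = k0 + k1*x
    return lambda x: k0 + k1 * x

def best_model(xs, ys):
    """Try each model language; keep the one with smallest squared error."""
    def sse(model):
        return sum((model(x) - y) ** 2 for x, y in zip(xs, ys))
    return min((fit_average(xs, ys), fit_linear(xs, ys)), key=sse)
```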

The final step in the algorithm consists of trying to specialize the model previously built. This is an iterative search procedure that ends when no further improvements are possible according to a given evaluation criterion that will be explained later.

The set of candidate specializations is built in the following way. Given a rule covering a set of examples, R2 uses the example that the rule fits worst as the source for this process. The algorithm tries to exclude this example from the covered set by adding some condition to the rule. For each attribute of the domain, R2 checks the value of the example and tries conditions on that attribute that guarantee its exclusion. Imagine the following rule with its respective worst-fit example:

R2 would generate the set of candidate specializations of rule R3 by adding each of the following conditions:

    X1 < 10, X1 > 10
    X2 < 0.5, X2 > 0.5
    X3 < 0.2, X3 > 0.2
    X4 < 21, X4 > 21
    X5 < 6.32, X5 > 6.32
If any of the resulting specializations is better than R3 according to the evaluation criterion that will be presented below, R2 will consider it the new best rule and will proceed by applying the same worst-fit exclusion algorithm to this new rule.
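The generation of candidate specializations can be sketched as follows. The representation of the worst-fit example as a dictionary of attribute values and of a rule's conditions as a list of triples is assumed for illustration.

```python
# Sketch of candidate-specialization generation: for each attribute of the
# worst-fit example, propose the two strict-threshold conditions that
# exclude it from the rule's coverage. The representations are illustrative.

def candidate_specializations(rule_conditions, worst_example):
    """Yield rule_conditions extended by one condition excluding the example.

    worst_example maps attribute names to their values in the example
    (the class value is assumed to have been removed beforehand).
    """
    for attr, value in worst_example.items():
        for op in ("<", ">"):        # either side excludes the exact value
            yield rule_conditions + [(attr, op, value)]
```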
