
2. General Overview of Concept Learning

Typical concept learning algorithms, such as AQ [Michalski&Larson,1978] or ID3 [Quinlan,1983], perform learning from training sets of examples, producing concept descriptions in the form of production rules (AQ) or decision trees (ID3). Production rules have the form of if-then rules, where the if-part contains a description of an object while the then-part contains the classification of the object into one concept. In a decision tree the nodes represent attributes and the leaves represent the concepts. Each branch corresponds to a value of the attribute in its parent node. Notice that each path in a decision tree, from the root node to a leaf, can be viewed as a production rule.
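The correspondence between the two representations can be sketched as follows. This is an illustrative sketch only (not AQ or ID3 themselves); the tree, its attributes and concept names are invented for the example.

```python
# A decision tree as nested dicts: an internal node maps one attribute
# to {value: subtree}; a leaf is simply a concept name. The tree below
# is an invented toy example, not output of any real learner.
tree = {
    "shape": {
        "round": {"colour": {"red": "apple", "yellow": "lemon"}},
        "long": "banana",
    }
}

def paths_to_rules(tree, conditions=()):
    """Enumerate each root-to-leaf path as an if-then production rule."""
    if not isinstance(tree, dict):            # reached a leaf: emit a rule
        yield (conditions, tree)
        return
    (attribute, branches), = tree.items()     # exactly one attribute per node
    for value, subtree in branches.items():
        yield from paths_to_rules(subtree, conditions + ((attribute, value),))

for conds, concept in paths_to_rules(tree):
    if_part = " and ".join(f"{a} = {v}" for a, v in conds)
    print(f"if {if_part} then {concept}")
```

Each printed rule corresponds to one path from the root to a leaf, which is exactly the observation made above.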

Many algorithms for learning from examples have been published, and several commercial systems are already on the market. Among the criteria for evaluating these systems, the most commonly used are accuracy, simplicity and robustness against noise in the given examples.

Usually these characteristics are measured by means of some classification task. Given a set of examples, we divide it into a training set, which is used for learning, and a testing set, which is used to evaluate the learned theory. Accuracy is then calculated as the percentage of testing examples that are correctly classified by the learned theory. The algorithm should be robust in the sense that it should cope with the various forms of noise [Brazdil&Clark,1988], such as wrong attribute values, incorrect pre-classification of the examples given to the algorithm, etc. Simplicity can be expressed in terms of the number of rules (or tree branches) and the average length of the rules (or the average number of nodes on a path from the root to a leaf in the tree).
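The evaluation scheme just described can be sketched in a few lines. The "learner" below is a stand-in (a trivial majority-class predictor), an assumption made only to keep the sketch self-contained; the split fraction and seed are likewise arbitrary choices.

```python
import random

def split(examples, train_fraction=0.7, seed=0):
    """Divide the examples into a training set and a testing set."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def learn(training_set):
    """Stand-in learner: the 'theory' is just the most frequent concept."""
    concepts = [concept for _, concept in training_set]
    return max(set(concepts), key=concepts.count)

def accuracy(theory, testing_set):
    """Percentage of testing examples the theory classifies correctly."""
    correct = sum(1 for _, concept in testing_set if theory == concept)
    return 100.0 * correct / len(testing_set)

# Usage: examples are (description, concept) pairs.
examples = [({"colour": "red"}, "apple")] * 6 + [({"colour": "yellow"}, "lemon")] * 4
training, testing = split(examples)
theory = learn(training)
print(f"accuracy: {accuracy(theory, testing):.1f}%")
```

A real system would of course learn rules or a tree rather than a single class, but the measurement loop, split, learn, then score on the held-out set, is the same.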

The first learning programs that appeared were based on algorithms that processed the whole set of training examples at the same time, producing a theory. When a new example appeared, the program had to be re-run on the whole training set (plus the new example, of course).

Later, incremental algorithms for learning from examples were created. Programs such as ID4 [Schlimmer&Fisher,1986], the first incremental version of ID3, were developed, as well as several successors (ID5, ID5R, IDL, etc.). Some incremental versions of AQ also appeared (AQ15, AQ16, etc.).

Apart from eliminating the need to re-run the algorithm on the whole set, these systems also provide, at any moment, a theory which explains the known examples. This current theory can be used for classification at that particular stage of the learning process.
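The essential interface of such a system can be sketched as follows. This is an illustrative toy (not ID4 or AQ15): the "theory" is a simple table of attribute-value vote counts, chosen only because it is trivially updatable one example at a time.

```python
from collections import Counter

class IncrementalLearner:
    """Toy incremental learner: absorbs one pre-classified example at a
    time and can classify with its current theory at any moment."""

    def __init__(self):
        # (attribute, value, concept) -> number of examples seen
        self.votes = Counter()

    def update(self, example, concept):
        """Incorporate one new example; no re-processing of earlier ones."""
        for attribute, value in example.items():
            self.votes[(attribute, value, concept)] += 1

    def classify(self, example):
        """Classify using the theory as it stands right now."""
        tally = Counter()
        for (attribute, value, concept), n in self.votes.items():
            if example.get(attribute) == value:
                tally[concept] += n
        return tally.most_common(1)[0][0] if tally else None

# Usage: the learner is usable between updates, mid-stream.
learner = IncrementalLearner()
learner.update({"shape": "round"}, "apple")
learner.update({"shape": "long"}, "banana")
print(learner.classify({"shape": "round"}))
```

The point is the shape of the interaction: `update` touches only the new example, and `classify` is available at every stage of the learning process.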

Further, incremental systems do not require so many examples to develop a plausible theory: they require only those examples that lead to improvements. In this respect, higher efficiency is achieved (see [Markovitch&Scott,1989], although this idea was pointed out as early as [Mitchell,1977]). Nevertheless, given a set of examples, if we use it to learn with both a non-incremental system and an incremental one, the performance of the non-incremental system should be expected to be higher. This is natural, as the non-incremental system makes all its decisions with a view of all the examples, while the incremental one does not (only at the end does it reach the same position). Notice that this observation is valid only if the non-incremental and incremental algorithms are similar (say, one is an incremental version of the other, without any major strategy differences).

Finally, it seems that incremental systems can cope with at least some of the problems posed by flexible concepts. These are concepts whose meaning varies with time [Michalski,1990].

