1 M5 with its default parameters performs term simplification on linear regression models, whereas the system currently does not.

2 Currently, the system is restricted to numeric attributes.

3 We are not saying that these conditions represent all the uncovered areas of the search space. They are just portions of the uncovered area that can be efficiently found.

4 There is also a loss in terms of simplicity of the rule, but at this stage we are not considering this factor.

5 As the loss of coverage is a negative effect, we use the value (1-LossCOV) in this calculation.

6 Being a weighted average, there is in effect only one weight, as the quality formula has the form:

Q = GainMAD × weight + (1 - LossCOV) × (1 - weight)
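
As an illustration of why a single weight suffices, the quality formula can be sketched in a few lines of Python. The function and parameter names below are hypothetical, chosen only to mirror the symbols in the formula; the check that the weight lies in [0, 1] is our assumption and is not stated in the text.

```python
def rule_quality(gain_mad: float, loss_cov: float, weight: float) -> float:
    """Weighted average of the MAD gain and the retained coverage.

    gain_mad -- GainMAD in the formula
    loss_cov -- LossCOV in the formula (entered as 1 - loss_cov, since
                a loss of coverage is a negative effect)
    weight   -- the single free weight; assumed here to lie in [0, 1]
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight is assumed to lie in [0, 1]")
    return gain_mad * weight + (1.0 - loss_cov) * (1.0 - weight)


# Equal weighting of the two terms.
print(rule_quality(gain_mad=0.5, loss_cov=0.25, weight=0.5))  # 0.625
```

With weight = 1 the quality reduces to GainMAD alone, and with weight = 0 it reduces to (1 - LossCOV).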

7 Notice that it is impossible to use the same formulation for obvious reasons.

8 This is a rather ad hoc heuristic, although it has given reasonable results. We intend to study this issue in the near future.

9 Adaptation of the LEXP artificial problem presented in [8].

10 This type of test consists of dividing the original data set into 10 randomly chosen partitions and then performing 10 learning/testing runs. Each run uses one of the 10 partitions as the test set and the union of the remaining 9 as the learning set. The results of the test are averages over the 10 runs.
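
The procedure just described can be sketched as follows. This is a minimal illustration and not the code used in the experiments: the names `make_folds` and `cross_validate`, the fixed random seed, and the `learn` and `evaluate` callables are all hypothetical placeholders for whatever learner and error measure are being tested.

```python
import random
from statistics import mean


def make_folds(n, k=10, seed=0):
    """Split the indices 0..n-1 into k randomly chosen, disjoint partitions."""
    if n < k:
        raise ValueError("need at least k cases to form k partitions")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(data, learn, evaluate, k=10, seed=0):
    """Run k learning/testing runs and return the average test score."""
    folds = make_folds(len(data), k, seed)
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        # Learning set: the union of the remaining k - 1 partitions.
        train = [case for i, case in enumerate(data) if i not in held_out]
        test = [data[i] for i in test_idx]
        model = learn(train)
        scores.append(evaluate(model, test))
    return mean(scores)


# Toy usage: the "model" is the mean target value of the learning set,
# and the score is the mean absolute deviation on the test set.
data = [(x, 3.0) for x in range(50)]
learn = lambda train: mean(y for _, y in train)
evaluate = lambda model, test: mean([abs(y - model) for _, y in test])
print(cross_validate(data, learn, evaluate))  # 0.0, since every target is 3.0
```

Each case appears in exactly one test set, so every case is used once for testing and nine times for learning.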

11 See section 6.6 of the book Multivariate Analysis [4] for an excellent overview of several methods.

12 M5a is not an ML-based system, of course.