Footnotes

¹ M5 with its default parameters performs term simplification on linear regression models, while our system currently does not.

² Currently, the system is restricted to numeric attributes.

³ We do not claim that these conditions represent all the uncovered areas of the search space. They are only portions of the uncovered area that can be found efficiently.

⁴ There is also a loss in terms of the simplicity of the rule, but at this stage we are not considering this factor.

⁵ Since the loss of coverage is a negative effect, we use the value (1 - LossCOV) in this calculation.

⁶ Since this is a weighted average, there is in effect only one weight, as the quality formula has the form:

Q = GainMAD × weight + (1 - LossCOV) × (1 - weight)
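The quality score of footnote 6 can be illustrated with a small Python sketch. The function name `rule_quality` is hypothetical, and the sketch assumes (this is not stated in the text) that GainMAD, LossCOV and the weight are all normalised to [0, 1]. Because the two coefficients always sum to one, the single `weight` parameter fully determines the trade-off.

```python
def rule_quality(gain_mad: float, loss_cov: float, weight: float) -> float:
    """Hypothetical helper: Q = GainMAD * weight + (1 - LossCOV) * (1 - weight).

    Assumes gain_mad, loss_cov and weight are all normalised to [0, 1].
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    # Loss of coverage is a negative effect, so its complement is averaged in.
    # The coefficients (weight, 1 - weight) sum to one: a single free weight.
    return gain_mad * weight + (1.0 - loss_cov) * (1.0 - weight)


if __name__ == "__main__":
    print(rule_quality(gain_mad=0.4, loss_cov=0.2, weight=0.5))
```

With weight = 1 the score reduces to GainMAD alone; with weight = 0 it depends only on the retained coverage, 1 - LossCOV.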

⁷ Note that the same formulation cannot be used here, for obvious reasons.

⁸ This is a rather ad hoc heuristic, although it has given reasonable results. We intend to study this issue in the near future.

⁹ Adaptation of the LEXP artificial problem presented in [8].

¹⁰ This type of test consists of dividing the original data set into 10 randomly chosen partitions and then performing 10 learning/testing runs. Each run uses one of the 10 partitions as the test set and the union of the remaining 9 as the learning set. The results of the test are averages over the 10 runs.
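The 10-fold cross-validation procedure of footnote 10 can be sketched as follows. This is one possible Python implementation, not necessarily the one used in the experiments: `k_fold_cv` and the `learn_and_test` callback are hypothetical names, the random partitions are formed by shuffling with a fixed seed and dealing indices round-robin (so partition sizes differ by at most one), and the callback is assumed to return a single error score per run.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def k_fold_cv(
    data: Sequence[T],
    learn_and_test: Callable[[List[T], List[T]], float],
    k: int = 10,
    seed: int = 0,
) -> float:
    """Average score of k learning/testing runs (hypothetical helper)."""
    if not 2 <= k <= len(data):
        raise ValueError("need 2 <= k <= len(data)")
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)  # random assignment to partitions
    # Deal the shuffled indices round-robin into k disjoint partitions.
    folds = [indices[i::k] for i in range(k)]
    scores = []
    for i, test_idx in enumerate(folds):
        test_set = [data[j] for j in test_idx]
        # The union of the remaining k - 1 partitions is the learning set.
        learn_set = [data[j] for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(learn_and_test(learn_set, test_set))
    return mean(scores)  # the result is the average over the k runs


def mean_predictor_mae(learn: List[float], test: List[float]) -> float:
    """Toy learner: predict the learning-set mean, report MAE on the test set."""
    m = mean(learn)
    return mean(abs(y - m) for y in test)


if __name__ == "__main__":
    targets = [float(x) for x in range(50)]
    print(k_fold_cv(targets, mean_predictor_mae))
```

Any learner can be plugged in through `learn_and_test`; the toy `mean_predictor_mae` simply predicts the mean of the learning set and reports the mean absolute error on the test partition.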

¹¹ See section 6.6 of the book Multivariate Analysis [4] for an excellent overview of several methods.

¹² M5a is not an ML-based system, of course.