We have presented an empirical study of hybrid regression trees, comparing
several alternative models for regression tree leaves. With respect to existing
work on regression, we have added the possibility of using kernel models in the
leaves. Our experiments on 11 data sets have shown the advantage of these
models over existing approaches.
The experiments revealed that hybrid models can overcome some limitations of
the individual models: they achieve high accuracy across all tested domains
and, moreover, provide a symbolic representation of the unknown regression
function that improves our understanding of the domain.
Local models carry a significant computational cost on large data sets. Hybrid
models alleviate this problem by applying the local models only to subsets of
the input domain, with no significant loss in accuracy and even some gains when
the unknown regression surface has strong discontinuities.
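The idea can be illustrated with a minimal sketch, not the system studied in
this paper: a regression tree with axis-parallel splits chosen by SSE
reduction, where each leaf predicts with a Nadaraya-Watson kernel model over
the training cases that reached it. All names (`HybridTree`, `bandwidth`,
`min_leaf`) are hypothetical illustration choices.

```python
import numpy as np

class HybridTree:
    """Regression tree whose leaves hold Nadaraya-Watson kernel models.

    A minimal sketch: splits minimize total within-node SSE; each leaf
    predicts with a Gaussian kernel over its local training cases.
    """

    def __init__(self, max_depth=2, min_leaf=5, bandwidth=0.5):
        self.max_depth, self.min_leaf, self.h = max_depth, min_leaf, bandwidth

    def fit(self, X, y, depth=0):
        self.X, self.y, self.split = X, y, None
        if depth < self.max_depth and len(y) >= 2 * self.min_leaf:
            best = (np.inf, None, None)
            for j in range(X.shape[1]):
                for t in np.unique(X[:, j])[:-1]:
                    m = X[:, j] <= t
                    if m.sum() < self.min_leaf or (~m).sum() < self.min_leaf:
                        continue
                    # total SSE of the two candidate children
                    sse = y[m].var() * m.sum() + y[~m].var() * (~m).sum()
                    if sse < best[0]:
                        best = (sse, j, t)
            if best[1] is not None:
                _, j, t = best
                self.split = (j, t)
                m = X[:, j] <= t
                self.left = HybridTree(self.max_depth, self.min_leaf, self.h)
                self.right = HybridTree(self.max_depth, self.min_leaf, self.h)
                self.left.fit(X[m], y[m], depth + 1)
                self.right.fit(X[~m], y[~m], depth + 1)
        return self

    def predict_one(self, q):
        if self.split is not None:
            j, t = self.split
            return (self.left if q[j] <= t else self.right).predict_one(q)
        # leaf: Gaussian-kernel weighted average of the local training cases
        w = np.exp(-((self.X - q) ** 2).sum(axis=1) / (2 * self.h ** 2))
        return self.y.mean() if w.sum() == 0 else float(w @ self.y) / w.sum()
```

Because each kernel model sees only the cases in its leaf, prediction cost
scales with leaf size rather than the full training set, and the tree's splits
can absorb sharp discontinuities that a single global kernel model smooths over.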