Clustered Multivariate Regression

Luís Torgo and Joaquim Pinto da Costa
2000


Abstract

This paper describes a new method for dealing with multiple regression problems. The method integrates a clustering technique with regression trees, leading to what we call clustered regression trees. We use the clustering method to form sub-samples of the given data whose cases are similar in terms of the predictor variables. In doing so, we aim to facilitate the subsequent regression modeling process, under the assumption that the regression surface has a certain smoothness. For each cluster found, we obtain a separate regression tree. These clustered regression trees can be used to predict the response value for a query case by an averaging process based on the cluster membership probabilities of the case. We have carried out a series of experimental comparisons, which show that our proposal has a significant predictive accuracy advantage over the use of a single regression tree.
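
The pipeline outlined above (cluster the predictors, fit one tree per cluster, average the tree predictions using membership probabilities) can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper. Several choices are assumptions made only for illustration: k-means stands in for the clustering method, membership probabilities are taken as a softmax over negative squared distances to the cluster centers, the averaging is a probability-weighted mean of the tree predictions, and the regression tree is a tiny least-squares tree. All function names are hypothetical.

```python
import math

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def assign(X, centers):
    # Hard assignment: index of the nearest center for each case.
    return [min(range(len(centers)), key=lambda k: sq_dist(x, centers[k])) for x in X]

def kmeans(X, init_centers, iters=20):
    # Plain Lloyd iterations (an assumed stand-in for the clustering method).
    centers = [list(c) for c in init_centers]
    for _ in range(iters):
        labels = assign(X, centers)
        for k in range(len(centers)):
            members = [x for x, lab in zip(X, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return centers

def membership(x, centers):
    # Assumed soft membership: softmax over negative squared distances.
    d = [sq_dist(x, c) for c in centers]
    nearest = min(d)
    w = [math.exp(nearest - di) for di in d]
    total = sum(w)
    return [wi / total for wi in w]

def sse(values):
    # Sum of squared errors around the mean.
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def fit_tree(X, y, depth=2, min_leaf=2):
    # Tiny least-squares regression tree stored as nested dicts.
    if depth == 0 or len(y) < 2 * min_leaf:
        return {"value": sum(y) / len(y)}
    best = None
    for j in range(len(X[0])):
        for t in sorted(set(x[j] for x in X))[1:]:
            left = [i for i, x in enumerate(X) if x[j] < t]
            right = [i for i, x in enumerate(X) if x[j] >= t]
            if min(len(left), len(right)) < min_leaf:
                continue
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, j, t, left, right)
    if best is None:
        return {"value": sum(y) / len(y)}
    _, j, t, left, right = best
    return {
        "feature": j,
        "threshold": t,
        "left": fit_tree([X[i] for i in left], [y[i] for i in left], depth - 1, min_leaf),
        "right": fit_tree([X[i] for i in right], [y[i] for i in right], depth - 1, min_leaf),
    }

def predict_tree(tree, x):
    while "value" not in tree:
        side = "left" if x[tree["feature"]] < tree["threshold"] else "right"
        tree = tree[side]
    return tree["value"]

def fit_clustered_trees(X, y, init_centers, depth=2):
    # One regression tree per cluster of the predictor space.
    centers = kmeans(X, init_centers)
    labels = assign(X, centers)
    trees = []
    for k in range(len(centers)):
        # Fall back to all cases if a cluster is empty (illustrative choice).
        idx = [i for i, lab in enumerate(labels) if lab == k] or list(range(len(X)))
        trees.append(fit_tree([X[i] for i in idx], [y[i] for i in idx], depth))
    return centers, trees

def predict_clustered(x, centers, trees):
    # Average the per-cluster tree predictions, weighted by membership.
    probs = membership(x, centers)
    return sum(p * predict_tree(t, x) for p, t in zip(probs, trees))

# Toy data: two well-separated regions of a single predictor.
X = [[i / 10] for i in range(10)] + [[5 + i / 10] for i in range(10)]
y = [1.0] * 5 + [2.0] * 5 + [10.0] * 10
centers, trees = fit_clustered_trees(X, y, init_centers=[[0.0], [5.0]])
for query in ([0.2], [0.8], [5.3]):
    print(query, round(predict_clustered(query, centers, trees), 3))
```

In this toy setting a query that lies close to one cluster center receives almost all of its weight from that cluster's tree, whereas a query between the two clusters receives a blend of both trees. How sharply the weights fall off with distance is a property of the assumed softmax membership, not something specified in the abstract.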