Clustered Multivariate Regression
Luís Torgo and Joaquim Pinto da Costa
2000
Abstract
This paper describes a new method for dealing with multiple
regression problems. The method integrates a clustering technique
with regression trees, leading to what we call clustered
regression trees. We use the clustering method to form sub-samples of
the given data that are similar in terms of the predictor
variables. In doing so, we aim to facilitate the subsequent
regression modeling process, on the assumption that the regression
surface has a certain smoothness. For each cluster found, we obtain
a different regression tree. These clustered regression trees can be
used to predict the response value for a query case by an averaging
process based on the cluster membership probabilities of the case. In
a series of experimental comparisons, our proposal showed a
significant predictive accuracy advantage over the use of a single
regression tree.
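
The pipeline outlined in the abstract (cluster on the predictors, fit one regression tree per cluster, then average the tree predictions using the cluster membership probabilities of the query case) can be illustrated with a minimal sketch. Everything specific below is an assumption made for illustration and is not taken from the paper: plain k-means as the clustering technique, a softmax over negative squared distances as the membership probabilities, a small least-squares tree learner, and all function names and parameter defaults.

```python
import math
import random


def sqdist(a, b):
    """Squared Euclidean distance between two predictor vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def nearest(x, centroids):
    """Index of the centroid closest to x."""
    return min(range(len(centroids)), key=lambda j: sqdist(x, centroids[j]))


def kmeans(X, k, iters=20, seed=0):
    """Plain k-means (assumed stand-in for the paper's clustering method)."""
    rng = random.Random(seed)
    centroids = [list(x) for x in rng.sample(X, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in X:
            groups[nearest(x, centroids)].append(x)
        for j, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def memberships(x, centroids):
    """Assumed membership probabilities: softmax of negative squared distances."""
    d = [sqdist(x, c) for c in centroids]
    m = min(d)  # shift by the minimum for numerical stability
    w = [math.exp(-(di - m)) for di in d]
    total = sum(w)
    return [wi / total for wi in w]


def sse(vals):
    """Sum of squared deviations from the mean."""
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals)


def fit_tree(X, y, min_leaf=5, depth=0, max_depth=4):
    """Least-squares regression tree: a float leaf or (feature, threshold, left, right)."""
    mean = sum(y) / len(y)
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return mean
    best = None
    for f in range(len(X[0])):
        order = sorted(range(len(y)), key=lambda i: X[i][f])
        for cut in range(min_leaf, len(y) - min_leaf + 1):
            lo, hi = order[:cut], order[cut:]
            if X[lo[-1]][f] == X[hi[0]][f]:
                continue  # cannot split between equal feature values
            err = sse([y[i] for i in lo]) + sse([y[i] for i in hi])
            if best is None or err < best[0]:
                best = (err, f, (X[lo[-1]][f] + X[hi[0]][f]) / 2, lo, hi)
    if best is None:
        return mean
    _, f, thr, lo, hi = best
    left = fit_tree([X[i] for i in lo], [y[i] for i in lo], min_leaf, depth + 1, max_depth)
    right = fit_tree([X[i] for i in hi], [y[i] for i in hi], min_leaf, depth + 1, max_depth)
    return (f, thr, left, right)


def predict_tree(tree, x):
    """Walk from the root to a leaf and return the leaf mean."""
    while isinstance(tree, tuple):
        f, thr, left, right = tree
        tree = left if x[f] <= thr else right
    return tree


def fit_clustered_trees(X, y, k=2):
    """Cluster on the predictors, then fit one tree per cluster."""
    centroids = kmeans(X, k)
    trees = []
    for j in range(k):
        idx = [i for i, x in enumerate(X) if nearest(x, centroids) == j]
        if not idx:  # assumed fallback: an empty cluster uses all the data
            idx = list(range(len(X)))
        trees.append(fit_tree([X[i] for i in idx], [y[i] for i in idx]))
    return centroids, trees


def predict(model, x):
    """Average the per-cluster tree predictions, weighted by membership."""
    centroids, trees = model
    probs = memberships(x, centroids)
    return sum(p * predict_tree(t, x) for p, t in zip(probs, trees))
```

In this sketch, `fit_clustered_trees(X, y, k)` takes the predictors as a list of equal-length float lists and returns a `(centroids, trees)` pair, which `predict` then uses for a single query case. A query lying close to one centroid is dominated by that cluster's tree, while a query between clusters receives a blend of several trees.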