Clustered Partial Linear Regression
Luís Torgo and Joaquim Pinto da Costa
2003
Abstract
This paper presents a new method for the supervised learning task usually
known as multiple regression. The main distinguishing feature of our technique
is its multistrategy approach to this learning task.
We use a clustering method to form subsets of the training data before the
actual regression modeling takes place. This pre-clustering stage creates several
training sub-samples, each containing cases that lie near each other in the
multidimensional input space. As our experiments show, supervised learning within
each of these sub-samples is easier and more accurate. We call the resulting
method clustered partial linear regression. Predictions using these models are
preceded by a cluster membership query for each test case. The cluster membership
probability of a test case is used as a weight in an averaging process that
calculates the final prediction. This averaging process involves the predictions
of the regression models associated with the clusters to which the test case may
belong. We have tested this general multistrategy approach using several
regression techniques and we have observed significant accuracy gains in several 
data sets. We have also compared our method to bagging, which also uses an averaging
process to obtain predictions. This experiment showed that the two methods are
significantly different. Finally, we compare our method with several
state-of-the-art regression methods, showing that it is competitive.
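The pipeline summarized above has three steps: cluster the training inputs, fit one regression model per cluster, then predict with the average of the local models weighted by cluster membership probability, i.e. y(x) = sum_k P(k | x) * f_k(x), with the weights summing to 1. The following is a minimal Python/NumPy sketch under stated assumptions, not the authors' implementation. The abstract names neither the clustering algorithm, the membership model, nor a specific regression technique. Here k-means, Gaussian-kernel membership probabilities with a hypothetical bandwidth `sigma`, and ordinary least squares per cluster are illustrative choices. All function names are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means, an assumed stand-in for the clustering step."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign each training case to its nearest centroid in input space.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each non-empty cluster's centroid to the mean of its members.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels


def membership(X, centroids, sigma=1.0):
    """Soft membership P(cluster | x) from Gaussian kernels (an assumption)."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d2 / (2.0 * sigma ** 2)
    logits -= logits.max(axis=1, keepdims=True)  # avoid underflow of every weight
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)


def fit(X, y, k=2, seed=0):
    """Cluster the inputs, then fit one least-squares linear model per cluster.

    Assumes every cluster keeps at least d + 1 training cases.
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    centroids, labels = kmeans(X, k, seed=seed)
    coefs = []
    for j in range(k):
        A = np.column_stack([np.ones(np.sum(labels == j)), X[labels == j]])
        beta, *_ = np.linalg.lstsq(A, y[labels == j], rcond=None)
        coefs.append(beta)  # [intercept, slopes...]
    return centroids, np.array(coefs)


def predict(X, centroids, coefs, sigma=1.0):
    """Membership-weighted average of the per-cluster predictions."""
    X = np.asarray(X, dtype=float)
    P = membership(X, centroids, sigma)          # (n, k); rows sum to 1
    A = np.column_stack([np.ones(len(X)), X])    # (n, d + 1)
    local = A @ coefs.T                          # (n, k) per-cluster predictions
    return (P * local).sum(axis=1)


# Toy data: two input regions that follow different linear laws.
x1, x2 = np.linspace(0, 1, 20), np.linspace(10, 11, 20)
X = np.concatenate([x1, x2]).reshape(-1, 1)
y = np.concatenate([2 * x1, -3 * x2 + 40])
centroids, coefs = fit(X, y, k=2)
pred = predict([[0.5], [10.5]], centroids, coefs)
```

On this toy data each query point lies close to one centroid, so its prediction is essentially that cluster's linear model. A point midway between the two centroids would receive a roughly equal blend of both models. A larger `sigma` makes this blending smoother across cluster boundaries.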