## Contents of the ZIP file

The files in the ZIP file will be extracted into a directory containing all the code and data necessary to replicate the experiments reported in the paper "Re-sampling Strategies for Regression". The folder contains the following files:

- allDataSets.Rdata: the data sets used
- cvAuxs.R: the auxiliary functions used by the different learning routines to obtain the statistics estimated by CV
- extremesResampling.R: the SMOTE and under-sampling strategies
- newCVexps.R: the script for running a set of experiments
- PaperGraphs.R: the script to replicate the graphs in the paper

## Necessary software

To replicate these experiments you will need a working installation of R (see www.r-project.org if you need to download and install it).

In your R installation you also need to install the following additional R packages:

- DMwR
- e1071
- earth
- randomForest
- uba

All of the above packages except "uba" can be installed directly from the CRAN repository like any "normal" R package. From within R, issue the command:

    > install.packages(c("DMwR", "e1071", "earth", "randomForest"))

The package uba needs to be installed from a tar.gz file that you can download from http://www.dcc.fc.up.pt/~rpribeiro/uba/. Download the tar.gz file into your folder and then issue:

    > install.packages("uba_0.7.5.tar.gz",repos=NULL,dependencies=T)

To replicate the graphs in the paper you also need to install the packages:

- plyr
- ggplot2

## Running the experiments

To run all experiments, start R in the folder with the data and code and then issue the command:

    > source("newCVexps.R")

Alternatively, you may run the experiments directly from a Linux terminal (useful if you want to log out, because the experiments take a long time to run):

    $ nohup R --vanilla --quiet < newCVexps.R &

## Running a subset of the experiments

To run only a subset of the experiments, the script newCVexps.R must be edited.
The VARS parameters can be changed to pass other parameters to svm, randomForest or earth; the TODO variable can be reduced to a subset of the learning systems; the todoDSs variable can likewise be reduced to a subset of the data sets; finally, the parameters used in SMOTE and under-sampling can be changed in the variants defined in the main loop by removing, adding or changing values. After making the desired changes, run the experiments as described in the previous section.
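
Since the experiments take a long time to run, it may be worth checking the folder before starting (or restarting) a run. The following is a minimal sketch, not part of the distributed code: the function name `missing_files` is hypothetical, and the file list is simply the one given under "Contents of the ZIP file". It reports which of those files are absent from a directory and whether `R` is on the `PATH`.

```shell
#!/bin/sh
# Hypothetical pre-flight check; not shipped with the ZIP file.

# Print the name of every expected file that is absent from the given
# directory (default: current directory), one per line.
# Prints nothing when all files are present.
missing_files() {
    dir="${1:-.}"
    for f in allDataSets.Rdata cvAuxs.R extremesResampling.R newCVexps.R PaperGraphs.R; do
        [ -f "$dir/$f" ] || printf '%s\n' "$f"
    done
}

# Report on the directory given as first argument, or the current one.
missing="$(missing_files "${1:-.}")"
if [ -n "$missing" ]; then
    echo "Missing files:"
    echo "$missing"
else
    echo "All expected files found."
fi

# The experiments also need R itself to be reachable from the terminal.
command -v R >/dev/null 2>&1 || echo "R was not found on the PATH."
```

Saved under a name of your choice, for example `preflight.sh` (a hypothetical name), it can be run as `sh preflight.sh /path/to/folder` before issuing the `nohup` command shown earlier. It only checks that the files exist; it does not verify the installed R packages.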