Hands-on Ensembles

1. How would you obtain a random forest to forecast the value of alga a4?

library(randomForest)

## randomForest 4.6-14

## Type rfNews() to see new features/changes/bug fixes.

library(DMwR)

## Loading required package: lattice

## Loading required package: grid

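data(algae)

# Keep the 11 predictors (columns 1 to 11) and the target a4 (column 15)
algae <- algae[, c(1:11, 15)]

# randomForest() does not accept missing values by default, so drop incomplete rows
algae <- algae[complete.cases(algae), ]

# Regression forest for a4 using all remaining columns as predictors
rf.a4 <- randomForest(a4 ~ ., data = algae)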

2. Repeat the previous exercise, but now using a boosting model.

library(gbm)

## Loaded gbm 2.1.5

boost.a4 <- gbm(a4 ~ ., data = algae, n.trees = 100)

## Distribution not specified, assuming gaussian ...

3. Obtain the predictions of the two previous models for the data used to obtain them (train set). Draw a scatterplot comparing these predictions.

preds.rf <- predict(rf.a4, algae)
preds.boost <- predict(boost.a4, algae, n.trees = 100)

plot(preds.rf, preds.boost, xlab="Random Forest Predictions", ylab="Boosting Predictions")
abline(0,1, col="red")
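The scatterplot gives a visual impression of how the two models agree; a few numbers can complement it. The following is a minimal sketch in base R, not part of the original solution: the helper name `compare_preds` and the toy vectors are made up for illustration.

```r
# Hypothetical helper: summarise how closely two prediction vectors agree and,
# if the observed values are supplied, how far each one is from them.
compare_preds <- function(p1, p2, truth = NULL) {
  out <- c(correlation   = cor(p1, p2),
           mean_abs_diff = mean(abs(p1 - p2)))
  if (!is.null(truth)) {
    out <- c(out,
             rmse_1 = sqrt(mean((truth - p1)^2)),
             rmse_2 = sqrt(mean((truth - p2)^2)))
  }
  out
}

# Toy vectors, invented purely to show the call
truth <- c(1, 2, 3, 4)
p1    <- c(1.5, 2.5, 2.5, 3.5)
p2    <- c(1, 2, 3, 5)
res   <- compare_preds(p1, p2, truth)
print(res)

# With the objects from this exercise the call would be:
# compare_preds(preds.rf, preds.boost, algae$a4)
```

Keep in mind that these are predictions on the training data, so any error computed this way is likely to be optimistic. For the random forest, calling `predict(rf.a4)` without new data returns the out-of-bag predictions instead, which are usually a fairer basis for comparison.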

4. Load the data set testAlgae. It contains a data frame named test.algae with 140 extra water samples for which we want predictions. Use the previous two models to obtain predictions for a4 on these new samples. Check what happened to the test cases with NAs. Fill in the NAs in the test set (remember the function `knnImputation()` from the `DMwR` package) and repeat the experiment.

data(testAlgae)

preds.rf <- predict(rf.a4, test.algae)
preds.boost <- predict(boost.a4, test.algae, n.trees=100)

any(is.na(preds.rf)); any(is.na(preds.boost))

## [1] TRUE

## [1] FALSE
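The output above suggests that the random forest returns NA for at least some test cases, while the boosting model produces a value for every row. To check that the missing predictions are exactly the incomplete test cases, one can compare row indices. The sketch below is an illustration only: a linear model stands in for the random forest so that it runs without extra packages, and the toy data frames are invented.

```r
# Stand-in model: lm() replaces the random forest so the sketch needs only base R
train <- data.frame(x = c(1, 2, 3, 4, 5), y = c(2.1, 3.9, 6.2, 8.0, 9.8))
fit   <- lm(y ~ x, data = train)

# Toy test set with one incomplete row
test  <- data.frame(x = c(1.5, NA, 4.5))
preds <- predict(fit, test)

# Rows of the test set that contain at least one NA
na_rows <- which(!complete.cases(test))

# Rows for which the model returned no prediction
na_preds <- which(is.na(preds))

# TRUE when the missing predictions are exactly the incomplete rows
same_rows <- identical(na_rows, unname(na_preds))
print(same_rows)

# With the objects from this exercise the same check would be:
# identical(which(!complete.cases(test.algae)), unname(which(is.na(preds.rf))))
```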

test.algae <- knnImputation(test.algae)

preds.rf <- predict(rf.a4, test.algae)
preds.boost <- predict(boost.a4, test.algae, n.trees=100)

any(is.na(preds.rf)); any(is.na(preds.boost))

## [1] FALSE

## [1] FALSE
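To give an idea of what nearest-neighbour imputation does, here is a minimal base R sketch. It is not the implementation used by `knnImputation()`: the function name `knn_impute_sketch` is hypothetical, it handles numeric columns only, it does not scale the variables, and it fills each gap with the plain mean of the k nearest complete rows.

```r
# Hypothetical helper: fill NAs in a numeric data frame with the mean of the
# k nearest complete rows (Euclidean distance on the observed columns).
knn_impute_sketch <- function(df, k = 2) {
  x <- as.matrix(df)
  complete <- x[complete.cases(x), , drop = FALSE]
  for (i in which(!complete.cases(x))) {
    obs <- !is.na(x[i, ])  # columns observed in row i
    # Distance from row i to every complete row, using observed columns only
    diffs <- sweep(complete[, obs, drop = FALSE], 2, x[i, obs])
    d <- sqrt(rowSums(diffs^2))
    # Indices of the k closest complete rows
    nn <- order(d)[seq_len(min(k, nrow(complete)))]
    # Replace each missing value with the neighbours' column mean
    x[i, !obs] <- colMeans(complete[nn, !obs, drop = FALSE])
  }
  as.data.frame(x)
}

# Toy data, invented for illustration: the last row is missing column b
toy <- data.frame(a = c(1, 2, 10, 1.5), b = c(10, 20, 100, NA))
filled <- knn_impute_sketch(toy, k = 2)
print(filled)
```

In the toy data the two rows closest to the incomplete one (on column `a`) have `b` equal to 10 and 20, so the gap is filled with their mean. Note that the solution above imputes the test set using only the test cases themselves; `knnImputation()` also has a `distData` argument for searching neighbours in another data set, such as the training predictors.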

Nuno Moniz

04/02/2020
