Hands-on Tree-Based Models

1. Build a regression tree for the data set

ds <- read.csv("~/Desktop/Academia/Teaching/2020 - 2021/2º Semestre/DF_MSI/Classes/Class 5 - Predictive Modelling 1/wine.data")
head(ds)
##   Class Alcohol Malic_acid  Ash Ash_Alcalinity Magnesium Total_Phenols
## 1     1   14.23       1.71 2.43           15.6       127          2.80
## 2     1   13.20       1.78 2.14           11.2       100          2.65
## 3     1   13.16       2.36 2.67           18.6       101          2.80
## 4     1   14.37       1.95 2.50           16.8       113          3.85
## 5     1   13.24       2.59 2.87           21.0       118          2.80
## 6     1   14.20       1.76 2.45           15.2       112          3.27
##   Flavanoids Nonfla_Phenols Proanthocyanins ColorInt  Hue   OD Proline
## 1       3.06           0.28            2.29     5.64 1.04 3.92    1065
## 2       2.76           0.26            1.28     4.38 1.05 3.40    1050
## 3       3.24           0.30            2.81     5.68 1.03 3.17    1185
## 4       3.49           0.24            2.18     7.80 0.86 3.45    1480
## 5       2.69           0.39            1.82     4.32 1.04 2.93     735
## 6       3.39           0.34            1.97     6.75 1.05 2.85    1450
any(is.na(ds))
## [1] FALSE
library(DMwR)
## Loading required package: lattice
## Loading required package: grid
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
m <- rpartXse(Proline ~ ., ds, model=TRUE) # grow a regression tree and post-prune it using the X-SE rule

2. Obtain a graph of the obtained regression tree

library(rpart.plot)
## Loading required package: rpart
prp(m,type=3,extra=101) # type=3: separate left/right split labels; extra=101: show node counts and percentages

3. Apply the tree to the data used to obtain the model and calculate the mean squared error of the predictions

p <- predict(m, ds) # predictions on the same data used to fit the tree (resubstitution)
mse <- mean((ds$Proline - p)^2); mse
## [1] 31167.56
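To put this score in perspective (this comparison is not part of the original exercise), a useful sketch is to compute the MSE of a trivial baseline that always predicts the mean of `Proline`; a tree that is learning anything should beat it. This reuses the `ds` data frame loaded above.

```r
# Baseline: always predict the mean of the target variable.
p.base <- mean(ds$Proline)
mse.base <- mean((ds$Proline - p.base)^2) # equals the (population) variance of Proline
mse.base
```

The tree's resubstitution MSE should be well below this baseline value.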

4. Split the data set in two parts: 70% of the cases (training data) and the remaining 30% (test data). Use the larger part to obtain a regression tree and apply it to the other part. Calculate the mean squared error again and compare it with the previous score.

ind.train <- sample(1:nrow(ds),0.7*nrow(ds),replace = FALSE) # random split; results vary between runs unless a seed is set
train <- ds[ind.train,]
test <- ds[-ind.train,]

m <- rpartXse(Proline ~ ., train, model=TRUE)
p <- predict(m, test)
mse <- mean((test$Proline - p)^2); mse
## [1] 31818.95
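The test-set MSE above is slightly higher than the resubstitution MSE from step 3, as expected: evaluating on held-out data removes the optimistic bias of testing on the training set. Because `sample()` draws a different split each run, the exact value will change every time. A sketch of a reproducible version, assuming the DMwR package is installed as above and reusing the `ds` data frame:

```r
library(DMwR) # provides rpartXse()

set.seed(1234) # fix the RNG so the split -- and hence the MSE -- is repeatable
ind.train <- sample(1:nrow(ds), 0.7 * nrow(ds), replace = FALSE)
train <- ds[ind.train, ]
test  <- ds[-ind.train, ]

m.tr <- rpartXse(Proline ~ ., train, model = TRUE)
mse.train <- mean((train$Proline - predict(m.tr, train))^2) # resubstitution error
mse.test  <- mean((test$Proline  - predict(m.tr, test))^2)  # held-out error
c(train = mse.train, test = mse.test)
```

Typically the held-out MSE is the larger of the two; reporting both makes the gap between optimistic and honest error estimates explicit.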