0:00

This lecture is about bagging, which is short for bootstrap aggregating.

The basic idea is that when you fit complicated models,

sometimes if you average those models together, you get a

smoother model fit that gives you a better balance between

potential bias in your fit and variance in your fit.

0:19

So bootstrap aggregating has a very simple idea.

The basic idea is take your data and take resamples of the data set.

So, this is similar to the idea of bootstrapping, which you would have

learned about in the inference class that

is part of the data science specialization.

After you resample the cases with replacement, then

you recalculate your prediction function on that resampled data.

And then you either average the

predictions from all these repeated predictors that

you built, or you take a majority vote, or

something like that, when you're doing classification.
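The loop just described (resample the cases with replacement, refit, then average or vote) can be sketched in a few lines. Since the lecture's R code isn't reproduced in the transcript, this is a hedged Python illustration; `fit` and `predict` are placeholders for whatever base learner you plug in, not functions from the lecture.

```python
import numpy as np

def bag_predict(x, y, x_new, fit, predict, n_boot=10, seed=0):
    """Bootstrap aggregating: resample cases with replacement, refit the
    base learner each time, and average the resulting predictions.
    (For classification you would take a majority vote instead.)"""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), size=len(y))  # resample with replacement
        model = fit(x[idx], y[idx])                 # refit on the resample
        preds.append(predict(model, x_new))
    return np.mean(preds, axis=0)                   # average the predictions
```

For example, with a straight-line base learner you could pass `fit=lambda x, y: np.polyfit(x, y, 1)` and `predict=np.polyval`.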

0:50

The thing is that you get similar bias to what you would get from fitting any one

of those models individually, but reduced variability

because you've averaged a bunch of different predictors together.

This is most useful for non-linear functions.

So, we'll show an example with smoothing, but it's

also very useful for things like predicting with trees.

1:22

And then I look at the data set and I

can see it has four variables: ozone, radiation, temperature, and wind.

So the idea is that I'm going to try

to predict temperature as a function of ozone.

So the first thing that we can do is just show you an example of how this works.

So the basic idea is, I'm going to create a matrix

here, and it's going to have 10 rows and 155 columns.

Then what I'm going to do is, I'm going to resample the data set.

2:00

Then I'm going to create a new data set, ozone0, which is

the resampled data set for that particular iteration of the loop.

And that's just the subset of the data set corresponding to our random sample.

Then I'm going to reorder the data set every time by

the ozone variable, and you'll see why in just a minute.

Then I fit a loess curve each time, so a loess is

kind of a smooth curve that you can fit through the data.

It's very similar to the spline model fits that we

saw in a previous example on modeling with linear regression.

And so the basic idea is we're fitting

a smooth curve relating temperature to the ozone variable.

So temperature is the outcome, and ozone is the predictor, and each time I

use the resampled data set as the data set I'm building that predictor on.

And I use a common span for each time, the

span being a measure of how smooth that fit will be.

2:51

I then predict for every single loess curve the outcome

for a new data set for the exact same values.

I always predict for ozone values 1 to 155.

So the ith row of this ll object is now the prediction from

the loess curve fit to the ith resample of the ozone data.

So what have I done here?

I've resampled my data set ten different times, fit a smooth curve through it those

ten different times, and then what I'm

going to do is I'm going to average those values.
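Putting those steps together (resample, reorder by ozone, smooth, predict on the fixed grid 1 to 155, then average the rows), here is a hedged Python sketch of the same loop. The data are simulated stand-ins, and a low-degree polynomial stands in for R's loess smoother, since the point here is the resampling-and-averaging structure rather than the particular smoother.

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated stand-in for the ozone data: 155 observations.
ozone = np.arange(1, 156).astype(float)
temp = 60.0 + 0.2 * ozone + rng.normal(0.0, 5.0, size=ozone.size)

ll = np.empty((10, 155))                # 10 rows, 155 columns, as in the lecture
grid = np.arange(1, 156).astype(float)  # always predict at ozone values 1 to 155
for i in range(10):
    idx = rng.integers(0, 155, size=155)  # resample the rows with replacement
    o0, t0 = ozone[idx], temp[idx]
    order = np.argsort(o0)                # reorder by the ozone variable
    o0, t0 = o0[order], t0[order]
    coef = np.polyfit(o0, t0, deg=3)      # smoother standing in for loess()
    ll[i] = np.polyval(coef, grid)        # ith row = fit from the ith resample

bagged = ll.mean(axis=0)                  # average the ten smooth fits
```

In the plot the lecture describes next, `bagged` plays the role of the red line drawn on top of the ten gray resampled fits.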

3:23

So, here's what it looks like in this plot.

So, here I've plotted ozone on the x

axis, these are the observed ozone values versus temperature

on the y axis, those are the observed

temperature values, and each black dot represents an observation.

Each gray line here represents the fit with one resampled data set.

So you can see the gray lines have a lot of curviness to them.

They capture a lot of the variability in the data set.

But they also maybe over-capture some of the variability.

They're a little bit too curvy.

Once I've averaged those lines together I get something that's a little

bit smoother and is closer to the middle of the data set.

That's the red line.

So the red line is the bagged loess curve.

It's basically the average of multiple loess curves fitted to

the same data set, resampled every time.

4:07

There's a proof that shows that the bagging estimate will always have

lower variability but similar bias to the individual model fits that you do.

In the caret package there's some models that already perform bagging for you.

So if you're using the train function you

could set method to be bagEarth, treebag, or bagFDA.

And those are specific bagged models that

the caret package will fit for you.

4:34

Alternatively, you can actually build your own bagging function in caret.

This is a bit of an advanced use and so I recommend that you

read the documentation carefully if you're going to be trying to do that yourself.

The idea here though is you basically are going to

take your predictor variable and put it into one data frame.

So I'm going to make the predictors be a data frame that contains the ozone data.

Then you have your outcome variable.

Here it's going to be just the temperature variable from the data set.

And I pass this to the bag function in the caret package.

So I tell it, I want to use the predictors

from that data frame, this is my outcome, this

is the number of replications, that is, the number of

subsamples I'd like to take from the data set.

And then bagControl tells me something about how I'm going to fit the model.

So fit is the function that's going to be applied to fit the model every time.

This could be a call to the train function in the caret package.

Predict is the way that, given a particular

model fit, we'll be able to predict new values.

So this could be, for example, a call to the predict function from a trained model.

And then aggregate is the way that we'll put the predictions together.

So for example it could average the

predictions across all the different replicated samples.
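As a hedged sketch of that interface, here is a minimal Python analogue of caret's `bag()` with its `fit`, `predict`, and `aggregate` hooks. The names mirror the R arguments, but this is illustrative only, not caret's actual implementation.

```python
import numpy as np

def bag(predictors, outcome, B, fit, predict, aggregate, seed=0):
    """Minimal analogue of caret's bag() + bagControl(fit, predict, aggregate).

    fit(x, y) builds one model on a resample; predict(model, x_new) scores it;
    aggregate(list_of_predictions) combines them, e.g. by averaging."""
    rng = np.random.default_rng(seed)
    n = len(outcome)
    fits = []
    for _ in range(B):                     # B replications / bootstrap samples
        idx = rng.integers(0, n, size=n)   # one resample with replacement
        fits.append(fit(predictors[idx], outcome[idx]))
    def bagged_predict(x_new):
        return aggregate([predict(m, x_new) for m in fits])
    return bagged_predict
```

For instance, passing `aggregate=lambda preds: np.mean(preds, axis=0)` averages the predictions across all the replicated samples, as described above.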

If you look at this

custom bagged version of the conditional regression trees, you can

see that it gets some of the benefit that I

was showing you in the previous slide with bagged loess.

So the idea here is I'm plotting ozone

again on the x-axis versus temperature on the y-axis.

The little grey dots represent actual observed values.

The red dots represent the fit from a single conditional regression tree.

And so you can see that, for example, it doesn't capture the

trend that's going on down here very well; the red line is just flat,

even though there appears to be a trend upward in the data points here.

But when I average over ten different bagged

model fits with these conditional regression trees,

I see that there's an increase here in the values in

the blue fit, which is the fit from the bagged regression.

6:41

So we're going to look a little bit

at those different parts of the bagging function.

So in this particular case I'm using the ctreeBag function, which you

can look at if you've loaded the caret package in R.

So, for the fit part, it takes the data frame of predictors

and the outcome that we've passed, and it basically

uses the ctree function to train a conditional

regression tree on the data set.

The last command called here is the ctree command.

So it returns this model fit from the ctree function.

The prediction takes in the object.

So this is going to be an object from the ctree model fit.

And a new data set x, and it's going to get a new prediction.

So what you can see here is it basically calculates each time

the tree response or the outcome from the object and the new data.

It then calculates this probability matrix and

returns either the observed levels that it

predicts or just the predicted response from the model.

7:47

The aggregation then takes those values and averages

them together or puts them together in some way.

So here what this is doing is

it's basically getting the prediction from every

single one of these model fits, so that's across a large number of observations.

And then it binds them together into one data matrix, with

each row being equal to the predictions from one of the model fits.

And then it takes the median at every value.

So in other words it takes the median prediction from

each of the different model fits across all the bootstrap samples.
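That aggregation step is easy to mirror. Below is a hedged Python sketch of the idea, loosely modeled on what `ctreeBag$aggregate` does in R: bind each model's predictions together as the rows of one matrix, then take the median down every column.

```python
import numpy as np

def aggregate_median(preds):
    """Combine bootstrap predictions: one row per model fit, then the
    per-column median, i.e. the median prediction at every value."""
    mat = np.vstack(preds)           # row i = predictions from model fit i
    return np.median(mat, axis=0)    # median across the bootstrap samples
```

A median is a bit more robust than a mean here: one wild model fit can't drag the combined prediction very far.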

8:24

So bagging is very useful for nonlinear models, and it's widely used.

It's often used with trees.

And you can think of an extension to this as

being random forest, which we'll talk about in a future lecture.

Several models use bagging in caret's main train

function, like I told you about in a previous slide.

And you can also build your own specific bagging functions, for any

classification or prediction algorithm that you'd like to take a look at.

For further resources, I've linked to a couple

of different tutorials on bagging and boosting, as

well as the Elements of Statistical Learning which

has a lot more details about how bagging works.

But remember that the basic idea is to basically resample

your data, refit your nonlinear model, then average those model

fits together over resamples to get a smoother model fit

than you would have gotten from any individual fit on its own.