0:00

This lecture is going to be about the different plotting systems in R. Over time, R has developed three core plotting systems that differ from each other in a few ways and are useful for achieving different goals. I thought I'd talk about the three systems, what makes them different, and how they're useful for various types of plots.

So the first system that I'll talk about is usually referred to as the base plotting system. The base plotting system is the oldest system in R. It came with the original version of R.

0:32

The conceptual model that it uses for building plots is a kind of artist's palette model. The idea is that you have a blank canvas.

0:43

And then you add things to it one by one. So first you create a box with maybe some points in it. Then you add labels. Then maybe you add a regression line through it. And then maybe you add titles, axis ticks, and things like that. So you piece together this plot one element at a time. Every little piece of the plot takes another line of code, or another couple of lines of code.

This is an intuitive model, especially when you are exploring data, because you may not know right away what plot you want to make. So maybe you'll just throw some points on the canvas, then you'll add some colors, then you'll add some labels, and eventually you'll piece it all together. This is all well and good as long as you're keeping track of all the code that you used to make the plot. Then you can always reconstruct the plot later.
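As a rough sketch of that piece-by-piece process, the code below builds up a scatter plot one step at a time. The data are simulated purely for illustration, and the names `x`, `y`, and `fit` are assumptions rather than anything from the lecture.

```r
# Simulated data, made up for illustration only
set.seed(1)
x <- rnorm(50)
y <- 2 * x + rnorm(50)

# Step 1: put just the points on a blank canvas (no axes, no annotation)
plot(x, y, axes = FALSE, ann = FALSE)

# Step 2: add the box and the axis ticks
box()
axis(1)
axis(2)

# Step 3: add axis labels
title(xlab = "x", ylab = "y")

# Step 4: add a regression line through the points
fit <- lm(y ~ x)
abline(fit, lwd = 2)

# Step 5: add a main title
title(main = "Built up one piece at a time")
```

Each step is its own call, so keeping the script is what lets you rebuild the same plot later.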

1:28

The typical mode for this type of model is to use the plot function, so there's always a function that generates a plot. And then there are other functions that annotate the plot. These are functions that add things like text, lines, and labels to the plot. So there's the generation, and then there's the annotation.

1:48

The nice thing about the system is that it's very convenient and intuitive. I think many people think of building plots in this manner. But one of the drawbacks is that you can't go back: if you make a plot and you add something to it, you can't take it away.

2:10

It's also difficult to translate a new plot to another person. For example, if you develop a new kind of plot, there's really no way to translate those ideas to another person, because there's no conceptual language to use. Every plot is basically just a bunch of R code. So that's a bit of a drawback sometimes.

Another drawback with the base plotting system is that, because you have so much control over the system, you have to control everything, and you have to set everything very carefully if you don't like the default values.
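To give a feel for what setting things yourself can look like, here is a minimal sketch using `par()`, the base function for graphical parameters. The particular values chosen (margins, label orientation, plotting symbol) are arbitrary assumptions for illustration.

```r
# Change a few graphical parameters and keep the previous values.
# par() returns the old values of the parameters it changes.
old <- par(mar = c(4, 4, 2, 1),  # margins: bottom, left, top, right (in lines)
           las = 1,              # axis tick labels always horizontal
           pch = 19)             # solid circles as the default plotting symbol

# Simulated data, made up for illustration only
set.seed(1)
plot(rnorm(20), rnorm(20), main = "Custom margins, labels, and symbols")

# Restore the earlier settings so later plots are not affected
par(old)
```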

2:55

Here's a basic plot from the base system, showing the speed at which a car is moving and the distance it takes to bring the car to a full stop. You can see it's just a simple scatter plot with speed on the x axis and distance on the y axis. Base plots look like this. You could add a lot of other things. You could add a title. You could add labels in the plot. You could make the points a different color. You could make them different shapes. There are all kinds of options that you can choose from, and we'll talk about that in detail later.
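A plot like this one can be sketched as follows. The transcript doesn't name the data set, so this assumes R's built-in `cars` data, which records speed (mph) and stopping distance (ft). The title, color, symbol, and label text are illustrative choices.

```r
data(cars)

# The plain default scatter plot: speed on the x axis, distance on the y axis
with(cars, plot(speed, dist))

# The same plot with a title, axis labels, a color, and a different shape
with(cars, plot(speed, dist,
                main = "Speed and stopping distance",
                xlab = "Speed (mph)",
                ylab = "Stopping distance (ft)",
                col = "blue",
                pch = 17))       # filled triangles instead of open circles

# Add a text label inside the plot region, to the right of (5, 110)
text(5, 110, "each point is one observation", pos = 4)
```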

3:20

The second major plotting system in R is called the Lattice system, and it's implemented in the lattice package. The idea here is quite different from the base plotting system. Rather than piecing a plot together one by one through a series of commands, every plot is constructed with a single function call. The most commonly used function is going to be xyplot, but there are other functions like bwplot, etc. These functions basically construct an entire plot all at once. Therefore, you have to specify a lot of information in the call to the function, so that it has enough data to build the plot in an appropriate way.

The Lattice system is most useful for what are called coplots, or conditioning plots, where you want to look at the relationship between, let's say, x and y as it changes across levels of z. You're conditioning on different levels of z, and then you're looking at x and y in each one of those plots. These are sometimes called panel plots, because you're looking at the same thing in every panel, just for different levels of a third variable. And you can even combine variables so you can look at multiple factors.
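The conditioning idea maps onto the formula you pass to `xyplot`. The sketch below uses simulated data, and the data frame `d` and its columns `x`, `y`, `z`, and `g` are hypothetical names chosen for illustration.

```r
library(lattice)

# Simulated data, made up for illustration only
set.seed(2)
d <- data.frame(
  x = rnorm(120),
  z = gl(3, 40, labels = c("low", "mid", "high")),  # conditioning factor
  g = gl(2, 20, 120, labels = c("a", "b"))          # a second factor
)
d$y <- d$x * as.integer(d$z) + rnorm(120)

# y versus x, conditioned on z: one panel per level of z
p1 <- xyplot(y ~ x | z, data = d)

# Combining variables: one panel per combination of z and g
p2 <- xyplot(y ~ x | z * g, data = d)

# Lattice functions return an object; printing it draws the plot
print(p1)
print(p2)
```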

4:29

So the system is very useful, because you can put a lot of plots on a page very easily and very quickly, as long as you follow this conditioning model. Furthermore, a lot of the details that you would have to specify directly in the base plotting system are calculated automatically. Things like the margins and a lot of the spacing are calculated automatically, and as long as you can accept those defaults, most things will look quite nice.

4:57

The downside of the Lattice system is that sometimes it's going to be very awkward to specify an entire plot using a single function call. Sometimes it seems more natural to piece things together one by one, like in the base system.

5:10

It's also difficult to annotate a plot. After the plot's been generated, you can't add anything to it. It's done, and if you want to add something, you have to reconstruct the function call altogether.

5:22

There is a way to annotate each of the individual panels in a Lattice plot, but it requires a tricky and not very intuitive use of things like panel functions and subscripts. And finally, like I said, you can't add to a plot once it is done.
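For a sense of what a panel function looks like, here is a minimal sketch that draws the points and then adds a regression line in every panel. The data are simulated and the names `d`, `x`, `y`, and `z` are hypothetical. `panel.xyplot` and `panel.lmline` are functions from the lattice package.

```r
library(lattice)

# Simulated data, made up for illustration only
set.seed(3)
d <- data.frame(x = rnorm(90), z = gl(3, 30))
d$y <- d$x + rnorm(90)

# The annotation has to go inside the single call, as a panel function.
# It is run once per panel, with that panel's subset of x and y.
p <- xyplot(y ~ x | z, data = d,
            panel = function(x, y, ...) {
              panel.xyplot(x, y, ...)           # draw the points first
              panel.lmline(x, y, col = "red")   # then add a fitted line
            })
print(p)
```

Note that the line is part of the original call here. It isn't added after the fact the way `abline` would be in the base system.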

5:38

So here's a basic Lattice plot. I use some data from the Lattice package, and I basically plot life expectancy, so the average life expectancy in a state, versus the average per capita income in that state. This is data from the late 60s and early 70s. Then I condition on the region of the country that the state is in. The country is divided into four regions, and you can look at the relationship between income and life expectancy across states, by region. You can see that this type of panel plot is just a single function call in Lattice. I use the xyplot function, and it's very simple to construct. Something like this in the base plotting system would involve many different lines of code. It would be much more involved.
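A plot along these lines might be produced as sketched below. The transcript doesn't name the data set, so this assumes the `state.x77` matrix and the `state.region` factor that are available in a standard R installation. The four-panels-in-a-row layout is also an assumption.

```r
library(lattice)

# Combine the state statistics with each state's region.
# data.frame() turns the column name "Life Exp" into "Life.Exp".
state <- data.frame(state.x77, region = state.region)

# Life expectancy versus income, one panel per region, in a single call
p <- xyplot(Life.Exp ~ Income | region, data = state, layout = c(4, 1))
print(p)
```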

6:24

The last plotting system that I want to talk about is the ggplot2 system. This comes from the grammar of graphics, which lays out a set of principles for plotting and creates a kind of language or grammar for describing different aspects of a plot. So it's based on a well-grounded, rigorous theoretical system, and it's implemented in R in the ggplot2 package.

It splits the difference between the Lattice and base systems, so it mixes ideas from both. On the one hand, you can build the plot incrementally by adding things one by one, so that's like the base system. On the other hand, a lot of the aesthetic calculations are done automatically without you having to control them directly, so things like spacing and labels are all put in the right place. That's like the Lattice system. The ggplot2 system is also very useful for conditioning plots, just like Lattice, so you can make those kinds of panel plots.
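Here is a small sketch of both ideas together: the plot is built up by adding pieces with `+`, and the panels come from a facet. The data are simulated, and the names `d`, `x`, `y`, and `z` are hypothetical.

```r
library(ggplot2)

# Simulated data, made up for illustration only
set.seed(4)
d <- data.frame(x = rnorm(90),
                z = gl(3, 30, labels = c("low", "mid", "high")))
d$y <- d$x + rnorm(90)

# Build the plot incrementally, a bit like the base system
p <- ggplot(d, aes(x = x, y = y))       # data and axis mappings, nothing drawn yet
p <- p + geom_point()                   # add the points
p <- p + geom_smooth(method = "lm", formula = y ~ x, se = FALSE)  # add a fitted line
p <- p + facet_wrap(~ z)                # one panel per level of z, like Lattice
p <- p + labs(title = "y versus x, by level of z")

# Spacing, legends, and panel labels are worked out automatically
print(p)
```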

7:27

The ggplot2 system has a lot of defaults, and if you can accept them, it's quite handy. But you can always customize them if you don't like what the defaults are. If you know how to use the Lattice system, the transition to ggplot2 is not too difficult, although there are some differences that I'll talk about in the lecture on ggplot2.

7:51

So this is a typical default ggplot2 plot. Here I used the miles per gallon data set from the ggplot2 package. On the x axis I'm plotting the displacement, which is the size of the engine of a car, and on the y axis is the mileage on the highway for that car. You can see that, roughly, as the size of the engine increases, the mileage decreases. You can also see that the ggplot2 package creates plots with a slightly different aesthetic. There's a gray background with white grid lines, and the default is to use solid circles rather than open circles. These are things you can always customize if you want, but the defaults are a little different from the other two systems.
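A plot like this can be sketched with the `mpg` data set that ships with ggplot2, using its `displ` (engine displacement) and `hwy` (highway miles per gallon) columns. The exact call used for the plot in the lecture isn't shown in the transcript, so treat this as one plausible way to get it. The customized version is just an example of overriding the defaults.

```r
library(ggplot2)

# Default look: gray background, white grid lines, solid circles
p_default <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()
print(p_default)

# The same plot with the defaults overridden: open circles on a white background
p_custom <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(shape = 1) +
  theme_bw()
print(p_custom)
```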

8:42

So to summarize, there's the base plotting system, which uses the artist's palette model, where you add things one by one. There's the Lattice system, in which you specify an entire plot using a single function call. And then there's the ggplot2 system, which mixes ideas from both systems.

One of the important things to know when you're using these three systems is that you can't use them interchangeably. If you're going to use the base plotting system, you have to use all the functions associated with that system. Similarly, if you're going to use the ggplot2 system, you have to use all the functions associated with that system. You can't mix functions between systems, because otherwise the plotting will get confused. So you typically have to choose a system and go with it.

Â