So what's going on under the hood here? If you look at the model.matrix
function, you can see what R is actually doing.
It's taking that variable gender which was just a factor variable, and
it's turning it into ones and zeros.
It's one when the sample is male and zero when the sample is female.
And then it's using that zero-one variable as a quantitative variable to fit the actual
regression model, just like we did before.
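To make that concrete, here is a minimal sketch of the expansion in R. The six samples, the "M"/"F" labels, and the expression values in `y` are all made up for illustration; they are not the data from the lecture.

```r
# Hypothetical toy data: six samples with a gender factor
gender <- factor(c("M", "F", "F", "M", "M", "F"))

# model.matrix() expands the factor into an intercept column plus a 0/1 column.
# Levels are sorted alphabetically, so "F" is the baseline and the dummy
# column "genderM" is 1 for male samples and 0 for female samples.
mm <- model.matrix(~ gender)
print(mm)

# Hypothetical expression values, one per sample
y <- c(5.1, 3.9, 4.2, 5.6, 4.8, 4.0)

# Fitting on the factor and fitting on the 0/1 column give the same coefficients
genderM <- mm[, "genderM"]
fit_factor <- lm(y ~ gender)
fit_dummy <- lm(y ~ genderM)
print(all.equal(coef(fit_factor), coef(fit_dummy)))
```

Which level ends up as zero depends on the order of the factor levels, so it is worth checking `levels(gender)` on your own data.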
You can actually do this with even more categories.
So if you look at the tissue type variable,
it has a large number of categories here, and so
you can define multiple dummy variables, one for each tissue type.
So if you say, is the tissue type equal to adipose, then only for
the first sample is that true.
You could also say, is the tissue type equal to adrenal.
So that's only true for the second sample and so forth.
So you could define a whole series of dummy variables, and
R will actually do this for you.
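As a rough sketch of what those dummy variables look like, you could build them by hand and compare them with what R produces. Only adipose and adrenal come from the lecture; the other tissue names, and having exactly one sample per tissue, are assumptions for illustration.

```r
# Hypothetical tissue labels, one sample per tissue type
tissue <- factor(c("adipose", "adrenal", "bladder", "brain"))

# Hand-built dummy variables: each asks "is the tissue type equal to X?"
is_adipose <- tissue == "adipose"  # TRUE only for the first sample
is_adrenal <- tissue == "adrenal"  # TRUE only for the second sample
print(is_adipose)
print(is_adrenal)

# R builds the same indicators automatically, replacing the first
# level (adipose) with the intercept column
mm_tissue <- model.matrix(~ tissue)
print(mm_tissue)
```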
So if I relate the first gene's expression to this factor variable that has multiple
levels, R is going to fit a coefficient for every one of these levels except one.
So the baseline, the intercept term, is the level that didn't get its own coefficient in the model.
So you have to be a little bit clever here, you have to figure out which of
these tissue types doesn't actually appear in the list of coefficients.
So it turns out it's usually the first one, so adipose is missing, so
the intercept is sort of the average count in adipose.
And then you can interpret the adrenal estimate as being the difference in
average count between adrenal and adipose.
So if you add the two together, that's the average count for the adrenal samples.
So you can fit this more complicated example.
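Here is a small sketch of that interpretation, checking the coefficients against the group averages. The counts, the three samples per tissue, and the bladder level are hypothetical and chosen only to keep the arithmetic easy to follow.

```r
# Hypothetical data: three samples in each of three tissues
tissue <- factor(rep(c("adipose", "adrenal", "bladder"), each = 3))
count <- c(10, 12, 11, 20, 22, 21, 5, 7, 6)

# Fit the model on the multi-level factor
fit <- lm(count ~ tissue)
print(coef(fit))

# Average count within each tissue, for comparison
group_means <- tapply(count, tissue, mean)
print(group_means)

# The intercept is the average for the baseline level (adipose)
adipose_mean <- unname(coef(fit)["(Intercept)"])

# Intercept plus the adrenal coefficient is the average for adrenal
adrenal_mean <- unname(coef(fit)["(Intercept)"] + coef(fit)["tissueadrenal"])

print(all.equal(adipose_mean, unname(group_means["adipose"])))
print(all.equal(adrenal_mean, unname(group_means["adrenal"])))
```

If you would rather have a different tissue as the baseline, `relevel()` on the factor changes which level is absorbed into the intercept.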
The other thing you can do is adjust for variables,
which we talked about earlier.
And the way that you can do that is basically by expanding the model formula.
So here we're going to fit a model that has both an age term and a gender term.
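A minimal sketch of expanding the formula might look like the following. The ages, genders, and expression values are invented for illustration, and the variable names `age`, `gender`, and `expr` are assumptions rather than the names used in the lecture's dataset.

```r
# Hypothetical phenotype data for eight samples
age <- c(25, 32, 41, 47, 53, 60, 29, 38)
gender <- factor(c("F", "M", "F", "M", "F", "M", "M", "F"))

# Hypothetical expression values for one gene
expr <- c(4.7, 6.6, 6.2, 8.0, 7.3, 9.6, 6.3, 5.8)

# Adding a term to the right-hand side of the formula adjusts for it
fit_adj <- lm(expr ~ age + gender)
print(coef(fit_adj))

# The design matrix now has an intercept, an age column, and a 0/1 gender column
design <- model.matrix(~ age + gender)
print(colnames(design))
```

The gender coefficient here is read as the difference between male and female samples at the same age, which is what adjusting for age means in this model.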