0:03

Okay. Welcome back, troops.

We're going to start talking about Student's, or as I like to say, Gosset's t distribution.

So, the reason I call it Gosset's t distribution is that it's usually called Student's t distribution because Gosset published under the pseudonym Student in 1908.

So it was actually Gosset's distribution.

He laid the groundwork for the actual distribution, and then I believe that Fisher actually proved some of the finer-scale mathematics.

But I wanted to talk a minute about Gosset, because

he's a pretty interesting character in the annals of statistics.

So Gosset was a researcher and he worked at the Guinness Brewery in Ireland.

And when he created the t distribution, he was actually working for Guinness.

And at the time, he had actually several

really brilliant researchers working for him and he wouldn't

let them publish under their real names.

So that's how Gosset wound up publishing under a pseudonym and made this sort of landmark discovery as a researcher.

It's interesting, so the reason he came up

with this distribution is because for him, the

central limit theorem just was simply not rich

enough to describe the problems he was looking at.

So he was working with these small batches in the science

of brew making and it wasn't adequate

to assume that things were heading to infinity.

So he came up with this distribution and we're all the more fortunate for it.

One thing I really like about Gosset is whenever you read about him, he was

apparently a tremendously nice guy, extremely humble,

and he made several major discoveries in Statistics.

He came up with one of the first uses of the Poisson distribution.

And he also rose up pretty high in the Guinness company; he was the head brewmaster at, I think, its London brewery by the time he retired.

So anyway, he's a really interesting character.

And if you get a chance, you should read about him.

Any rate, so he came up with this wonderful distribution called, as

far as I would like it to be called, Gosset's t distribution.

And the t distribution is really kind of used when you have smaller sample sizes.

It assumes your data is Gaussian but it

tends to work even if your data is non-Gaussian.

And so the t distribution has degrees of freedom. It's indexed by something called degrees of freedom, and it looks like a normal distribution where someone kind of squashed it down at its peak and all the extra mass went out into its tails.

Well, it looks more and

more like a standard normal as the degrees of freedom get larger and larger.

2:40

So, how do you get a t distribution?

Say, you wanted to simulate it on a computer,

you would take a standard normal, say, Z here.

And you would divide it by an

independent Chi-squared divided by its degrees of freedom.

So where Z and Chi-squared here are independent standard normal and Chi-squared random variables, that's how you wind up with a t distribution.
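As a quick sanity check, this construction can be simulated in R (my own sketch, not from the lecture slides; the variable names are made up):

```r
# Build t draws from the definition: an independent standard normal
# divided by the square root of a chi-squared over its degrees of freedom.
set.seed(42)
df <- 5
nSim <- 100000
z <- rnorm(nSim)                  # standard normals
chsq <- rchisq(nSim, df = df)     # independent chi-squareds
tSim <- z / sqrt(chsq / df)       # Gosset's construction
# The simulated quantiles should match R's built-in t quantiles
quantile(tSim, c(0.025, 0.975))
qt(c(0.025, 0.975), df = df)
```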

So how is this useful?

On the next

slide, we'll look at how we apply this.

Well let's suppose that X1 to Xn are iid normal mu sigma squared, then x bar

minus mu divided by sigma over square root n is, of course, standard normal, right?

Because linear combinations of normal random variables are themselves normal.

So in this case, X bar is normal.

And because they're iid, we know exactly what the standard

deviation of x bar is.

It's sigma over square root n and we know that its mean is mu.

And so when we shift and scale our non-standard normal, subtracting mu and dividing by its standard deviation sigma over square root n, we get a standard normal.

Hopefully, this should not be news to you at this point in the class.

And then, we also know from earlier on in today's lecture, that n minus

1s squared over sigma squared is Chi-squared

with n minus 1 degrees of freedom.

So, if we take n minus 1

S squared over sigma squared, and divide it by

an additional n minus 1 and square root the whole

thing, we get S over sigma and we've taken

a Chi-squared and divided it by its degrees of freedom.

So S over sigma is the square root

of a Chi-squared divided by its degrees of freedom.

Therefore, if we take X bar minus mu divided by sigma

over square root n, and then divide the whole thing by

S over sigma which if we do the arithmetic works out to

be X bar minus mu divided by S over square root n.

We wind up with a standard normal divided by a

square root of a Chi-square divided by its degrees of freedom.
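A small simulation in R (again my own sketch; the variable names are made up) confirms that this quantity behaves like a t statistic with n minus 1 degrees of freedom:

```r
# For iid normal data, (xbar - mu) / (s / sqrt(n)) should follow
# a t distribution with n - 1 degrees of freedom.
set.seed(1)
mu <- 10; sigma <- 3; n <- 8
tStats <- replicate(20000, {
  x <- rnorm(n, mean = mu, sd = sigma)
  (mean(x) - mu) / (sd(x) / sqrt(n))
})
# The fraction above the upper 97.5% t quantile should be near 0.025
mean(tStats > qt(0.975, df = n - 1))
```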

4:32

Now there's one small thing that we're kind of fudging over.

We haven't shown that the X bar and s are independent, right?

They're from the same data, so it doesn't seem obvious that they're independent.

They are, it's just not immediately clear, so let's sweep that under the rug.

So, forget about that for the time being, take my word for it.

X bar and S are independent, so this exactly has Gosset's t distribution with n minus 1 degrees of freedom, and notice what we've basically accomplished.

So, we saw previously in constructing confidence intervals that X bar minus mu divided by sigma over square root n, that that's, you know, a nice kind of pivotal statistic to work with.

It's useful for generating confidence intervals, we'll

see that it's useful for doing hypothesis tests.

And all we've done is replaced sigma by S. And it's basically saying that we can take the unknown population variance and replace it with the known sample variance.

And we get a statistic whose distribution we know, okay?

And by the way, this statistic X bar minus mu S over square

root n, it also limits to a standard normal as n goes to infinity.

Which is, of course, what Gosset's t distribution does as the degrees of freedom go to infinity.

If you look at it, if you plot it it looks

more and more like a normal distribution as n goes to infinity.

So we haven't violated the central limit

theorem or anything like that in the process of doing this stuff.
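You can see this convergence directly from the quantiles (a small check of my own, not from the lecture slides):

```r
# t quantiles approach the standard normal quantile as df grows
qt(0.975, df = c(2, 10, 100, 10000))
qnorm(0.975)   # about 1.96
```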

So, let's actually use this distribution to create a confidence interval.

It's a statistic whose distribution, under the assumption of normality of the underlying data, does not depend on the parameter mu that we're interested in.

And therefore we can use it to create a confidence interval for mu.

So let's let tdf alpha be

the alphath quantile of the t distribution.

So, t n minus 1, 1 minus alpha over 2 is, say, the upper quantile from the relevant

t distribution and tn minus 1 alpha over two

is the lower quantile from the relevant t distribution.

And so this probability statement here, that 1 minus alpha is equal to the probability that this statistic lies between those two quantiles, is then, of course, true, right?

So the probability that this t random

variable lies between the alpha over 2 lower

quantile and the 1 minus alpha over two upper quantile is exactly 1 minus alpha.

Oh, and I should note here, by the way,

because the t distribution is symmetric, the alpha over

2 lower quantile is equal to the negative

of the 1 minus alpha over 2 upper quantile.

This is because the t distribution is symmetric about zero.

So that's why here instead of writing alpha over 2,

I wrote -tn minus 1, 1 minus alpha over 2.

And you'll see why I do that in a second.
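In R, this symmetry is easy to verify (my own illustrative check):

```r
# Symmetry about zero: the lower alpha/2 quantile equals the
# negative of the upper 1 - alpha/2 quantile.
alpha <- 0.05
df <- 9
qt(alpha / 2, df)         # lower quantile
-qt(1 - alpha / 2, df)    # the same value
```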

So anyway, this probability statement applies here, so we can just rearrange

terms and keep track of flipping our inequalities

around when we multiply by a negative sign.

And we get that X bar minus a t quantile times

a standard error is less than mu and X bar plus

a t quantile times a standard error is bigger than mu, that random interval contains mu with probability 1 minus alpha.

But if you look at the form of this interval when I wrote it out this way,

that happens to be X bar plus and minus the

upper quantile from the t distribution times the standard error.

And that's why I took only the upper quantile,

that way, we can write it as plus minus.
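That plus-or-minus form is simple to compute directly; here is a minimal R sketch with made-up data (the vector x is hypothetical):

```r
# t confidence interval: xbar +/- t_{n-1, 1 - alpha/2} * s / sqrt(n)
x <- c(5.1, 4.9, 6.0, 5.6, 5.2, 4.7, 5.8, 5.3)   # made-up data
n <- length(x)
alpha <- 0.05
ci <- mean(x) + c(-1, 1) * qt(1 - alpha / 2, n - 1) * sd(x) / sqrt(n)
ci
# Agrees with R's built-in t.test
t.test(x)$conf.int
```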

8:15

Okay, so that's how we wind up with these intervals.

Estimate plus or minus quantile times standard error.

And that's where it comes from.

This interval assumes

that the data are iid normal, though it's very robust to this assumption.

You know, whenever the data is kind of roughly symmetric

and mound shaped, the t confidence interval works amazingly well.

And if you want, you know, if you

have paired observations, people before and after a

treatment, for example, you can subtract them and

then create a t confidence interval on the difference.

So often, paired observations are analyzed using

this exact confidence interval technique by taking differences.

And often, differences tend to be much more Gaussian-looking,

they tend to be nice and symmetric about zero.

And then for large degrees of freedom, the t

quantiles become the same as the standard normal quantiles.

And so this interval just converges to the same interval that you get from the CLT.

Some more notes.

For skewed distributions, the kind of spirit

of the t interval assumptions are violated.

You could probably show that it still works kind of okay.

And the reason is that those quantiles, the t n minus 1, 1 minus alpha over 2 quantiles, are so far out there; you know, the t distribution is a very heavy-tailed distribution that shoves those quantiles way out, which makes the interval a lot wider.

And then it tends to work kind of conservatively

in a broad variety of settings.

But for skewed distributions, you're kind of violating

the, you know, the spirit of the t interval.

And you're often better off, you know, trying

some things like taking a natural log of your

data, if it's positive, to get it to be

more Gaussian-looking before you do a t confidence interval.

And we'll spend an entire lecture on the consequences

of logging data, so you can wait for that.

But, you know, I would just say, for skewed distributions, it kind of violates the intent of the t interval, so maybe consider things like taking logs.

And also, I'd say for skewed distributions, maybe it doesn't make as much sense to center the interval around the mean, in the way that we're doing with this t interval.

We're centering it right around the mean.

And then, the other thing: for discrete data, like binary data, I bet you could do simulation studies and show that the t interval actually probably works okay.

But, you know, we have lot of techniques for

binary data that make direct use of the binary data.

And you're better off for those using, for example, things based on Chi-squares or exact binomial intervals and that sort of thing.

Because, you know, you're so far from kind of the spirit and intent of the t interval that it's not worth using. Regardless, the t interval is an incredibly handy tool, and I'm sure, actually, in some of these cases it probably works fine. But you're so far from the assumptions at that point that you're better off using all these other techniques that have been developed for these other cases.

And that's enough discussion about the t

confidence interval, let's go through an example.

So maybe take a break, go have a Guinness, and we'll be back in a second. Okay, so welcome back.

So we're going to talk about Gosset's original data, which involve sleep data.

So try not to fall asleep while we're talking about it.

So, Gosset's original data appeared in this journal called Biometrika, with a k.

And Biometrika, interestingly enough, was founded by a person called Francis Galton.

So Galton was an interesting character. If you really want to read up on another, you know, absolutely brilliant, interesting character, read up on Galton. He was Charles Darwin's cousin. He invented the term and the concept of regression. He, you know, invented the term and the concept of correlation. And he invented lots of other things, some good, some bad. And he was just generally a rather interesting character.

So any rate, Biometrika was founded

by Francis Galton and that is where Gosset's original

paper appeared and that's where the sleep data occurred.

So at any rate, the sleep data shows the increase

in hours slept for ten patients, on two sleeping drugs.

So, R treats the data as two groups rather than paired, and I have to admit, I haven't taken the time to go through and figure out exactly why there's a discrepancy between Gosset's Biometrika paper, which treats the data as paired, and R, which treats it as two groups.

And, anyway, I haven't gone through the details

so I'm going to treat it exactly like Gosset's data.

So here is what it looks like as Gosset's data.

So we have patient one, two, up to ten. We have the two drugs and the difference.
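A sketch of the manual calculation in R, assuming the sleep data frame that ships with R (its extra and group columns):

```r
# Paired analysis, as in Gosset's Biometrika paper: difference in
# extra sleep between the two drugs for each of the ten patients
difference <- with(sleep, extra[group == 2] - extra[group == 1])
n <- length(difference)
# Manual interval: estimate +/- t quantile * standard error
ci <- mean(difference) + c(-1, 1) * qt(0.975, n - 1) * sd(difference) / sqrt(n)
ci   # roughly 0.70 to 2.46
```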

13:45

And this will give you our confidence interval manually.

But if you want to go the easier way, R actually has

a function, of course, to do the t confidence intervals because it's one

of the most popular statistical procedures.

So if you call t.test, and here, difference is the name of the vector that contains the differences.

And then this dollar sign grabs the relevant output.

So in this case, I want the confidence interval, so it's $conf.int.

If you omitted the dollar sign when you hit Return,

it would give you lots of information including the confidence interval.

Here it just returns exactly the confidence interval and

you get 0.7 to 2.5, basically. We've talked a lot about likelihoods, so I wanted to talk about how you can use the t distribution to create a likelihood.

So remember, we're in this kind of hard setting where we have data with two parameters, mu and sigma; you know, the likelihood inherently is a two-dimensional object. And we showed earlier a trick for figuring out how to get a likelihood for sigma.

And here I'm going to say, well here, you can do another trick and get another

likelihood for a single parameter but the single

parameter is a function of the two parameters.

So in this, the single parameter is mu divided

by sigma, which is actually quite an important parameter.

Mu divided by sigma is the mean in standard deviation units. So it's a unit-free quantity, and

it's often called the effect size and this is a

nifty little trick to create a likelihood for the effect size.

So if x is normal mu sigma squared and then this Chi-squared random

variable is a Chi-squared random variable with df degrees of freedom, then if you

take x divided by sigma and divide it by the square root of

a Chi-squared divided by its degrees of freedom, then notice we have not subtracted off mu in the numerator.

So, x over sigma still has a mean, in

this case, its mean is specifically mu over sigma.

So we have not taken a standard normal and divided it by the square root of an independent Chi-squared divided by its degrees of freedom.

We took a non-standard normal and divided by a square

root of an independent Chi-squared divided by its degrees of freedom.

Well so it can't work out to be a

t random variable because we haven't satisfied the definition

of a t random variable.

So it's what's called a non-central t random variable.

And in the specific case when mu is zero, we wind up with a t random variable.

And this non-central t random variable also has degrees of freedom

but then it has a second parameter called the non-centrality parameter.

In this case, the non-centrality parameter is mu over sigma.

17:34

The effect size values that we want to plot, let's say, go from 0 to 1, with a length of 1,000.

Our likelihood values then use the t density; in this case, R's dt function, the t density function, has an argument ncp, which stands for non-centrality parameter.

So here we have our t density.

We plug in our t statistic.

Our degrees of freedom are n minus 1, and then we loop over all of our

non-centrality effect sizes, and that creates a collection of likelihood values.

And then, we want our likelihood values to peak at one, so, instead of figuring out the exact maximum likelihood, let's just divide by the maximum over the 1,000 grid points.

And let's plot our effect size values against our likelihood values, make sure it's a line by doing type equals l, and draw reference lines at 1/8 and 1/16.
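Putting those steps together, here is a reconstruction of the described code in R (assuming the sleep-data differences from earlier; the names esVals, likVals, and tStat are my own):

```r
# Effect-size likelihood via the non-central t distribution
difference <- with(sleep, extra[group == 2] - extra[group == 1])
n <- length(difference)
tStat <- sqrt(n) * mean(difference) / sd(difference)  # observed t statistic
esVals <- seq(0, 1, length.out = 1000)                # effect size grid
# For the t statistic based on n observations, the non-centrality
# parameter is sqrt(n) times the effect size mu / sigma
likVals <- sapply(esVals, function(d) dt(tStat, df = n - 1, ncp = sqrt(n) * d))
likVals <- likVals / max(likVals)                     # scale so the peak is 1
plot(esVals, likVals, type = "l")
abline(h = c(1/8, 1/16))
```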