Now, what do we mean by that? We mean that, with respect to some sort of random mechanism, you can say that the estimates you've created represent the full population. So the interpretation can be in terms of repeated sampling. In other words, suppose we drew the sample over and over in the same way that we drew our particular probability sample, made an estimate from each one, and looked at that ensemble of estimates. Then the estimates would make sense; for example, they would average out to the population value.
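
A small simulation makes the repeated-sampling interpretation concrete. It is a sketch under made-up assumptions: a hypothetical population of normally distributed y-values, simple random sampling without replacement, and the sample mean as the estimator. Averaging the ensemble of estimates should land very close to the census mean.

```python
import random
import statistics

# Hypothetical finite population of N = 5,000 y-values (made up for illustration).
rng = random.Random(2024)
population = [rng.gauss(50, 10) for _ in range(5_000)]
census_mean = statistics.fmean(population)

def repeated_srs_means(pop, n, reps, rng):
    """Draw `reps` simple random samples of size n without replacement; return each sample mean."""
    return [statistics.fmean(rng.sample(pop, n)) for _ in range(reps)]

estimates = repeated_srs_means(population, n=100, reps=2_000, rng=rng)
average_of_estimates = statistics.fmean(estimates)

print(f"census mean:          {census_mean:.3f}")
print(f"average of estimates: {average_of_estimates:.3f}")
```

Any single estimate can miss by a point or two here; it is the average over the whole ensemble that sits on the census value.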

The interpretation can, on the other hand, be in terms of models. We have to do this for non-probability samples because we don't have a repeated sampling mechanism, one that we control, to fall back on. The modeling consists either of modeling how units show up in the sample, which can be done in a quasi-randomization way, or of modeling the structure of the y-values that we're measuring in the population. Either one is a possible basis for inference.
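
The two modeling routes can be sketched side by side in the simplest setting, where the population is split into cells with known counts. Everything here is hypothetical: the cell names, the counts, and the sample values. The quasi-randomization version assumes every unit in a cell had the same chance of joining the sample; the model-based version assumes y has a common mean within each cell. Those assumptions are exactly what would have to be defended in practice.

```python
# Hypothetical cells (e.g., age groups) with KNOWN population counts N_c,
# plus the y-values observed in a non-probability sample.
population_counts = {"18-34": 3000, "35-64": 5000, "65+": 2000}
sample_y = {
    "18-34": [12.0, 15.0, 9.0],
    "35-64": [20.0, 22.0, 18.0, 24.0],
    "65+": [30.0, 34.0],
}

def quasi_randomization_total(counts, sample):
    """Model how units show up: treat n_c / N_c as a pseudo inclusion probability
    for every unit in cell c and weight each observed y by its inverse."""
    total = 0.0
    for cell, ys in sample.items():
        pseudo_pi = len(ys) / counts[cell]
        total += sum(y / pseudo_pi for y in ys)
    return total

def superpopulation_model_total(counts, sample):
    """Model the y-values: assume a common mean within each cell, then add
    predictions for the N_c - n_c units that were not observed."""
    total = 0.0
    for cell, ys in sample.items():
        cell_mean = sum(ys) / len(ys)
        total += sum(ys) + (counts[cell] - len(ys)) * cell_mean
    return total

print(quasi_randomization_total(population_counts, sample_y))
print(superpopulation_model_total(population_counts, sample_y))
```

In this cell-based setup both estimators reduce to the sum over cells of N_c times the cell sample mean, so they agree; with richer models for participation or for y they generally will not.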

Now, in probability sampling, we say an estimator is unbiased if, over all the random samples that could be selected, the values we compute from those samples average out to the census value.
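
Written as a formula (the notation is introduced here for illustration): let $\mathcal{S}$ be the set of all samples the design could select, $p(s)$ the probability the design assigns to sample $s$, $\hat{\theta}(s)$ the estimate computed from $s$, and $\theta$ the census value. Unbiasedness with respect to the design says

```latex
E_p\bigl[\hat{\theta}\bigr] \;=\; \sum_{s \in \mathcal{S}} p(s)\,\hat{\theta}(s) \;=\; \theta .
```

Under simple random sampling of $n$ units out of $N$, for example, every possible sample has $p(s) = 1/\binom{N}{n}$, and the sample mean satisfies this identity.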

Another important concept is consistency. An estimator is said to be consistent if, as the sample size gets large, the estimator gets closer and closer to the census value. This is actually a more desirable property than unbiasedness, because it says that as the sample size increases, we get closer and closer to what we're trying to estimate. Unbiasedness, by contrast, can hold even if our estimates bounce around the population value by quite a large distance.
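
To see why consistency is the stronger requirement, the sketch below compares two estimators of a population mean under simple random sampling. The population is hypothetical, and the "first unit only" estimator is a deliberately silly example: because `random.sample` returns units in random order, the first unit is equally likely to be any member of the population, so the estimator is unbiased, yet it never gets more precise as n grows.

```python
import random
import statistics

rng = random.Random(7)
# Hypothetical population of 20,000 y-values.
population = [rng.gauss(50, 10) for _ in range(20_000)]

def spread_of_estimates(estimator, n, reps=500):
    """Standard deviation of an estimator's values over repeated SRS draws of size n."""
    values = [estimator(rng.sample(population, n)) for _ in range(reps)]
    return statistics.stdev(values)

def sample_mean(s):
    return statistics.fmean(s)

def first_unit_only(s):
    # Unbiased under SRS, but it ignores every other sampled unit.
    return s[0]

spreads = {
    n: (spread_of_estimates(sample_mean, n), spread_of_estimates(first_unit_only, n))
    for n in (10, 100, 1_000)
}
for n, (sd_mean, sd_first) in spreads.items():
    print(f"n={n:>5}: sd(sample mean)={sd_mean:6.3f}   sd(first unit)={sd_first:6.3f}")
```

Both estimators are centered on the census mean, but only the sample mean's spread shrinks as the sample grows.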

We want these properties to hold even for complicated statistics, like medians and quartiles.
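
One common way to estimate a median or quartile from a weighted sample is to invert the weighted empirical distribution function: take the smallest value at which the cumulative share of weight reaches p. The sketch below uses hypothetical values and weights, and this is only one of several quantile definitions in use (interpolating variants exist).

```python
def weighted_quantile(values, weights, p):
    """Smallest y such that the weighted share of sample values <= y is at least p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    pairs = sorted(zip(values, weights))   # sort by value, carrying each weight along
    total = sum(weights)
    cumulative = 0.0
    for y, w in pairs:
        cumulative += w
        if cumulative >= p * total:
            return y
    return pairs[-1][0]  # guard against floating-point shortfall at p = 1

incomes = [18, 25, 31, 40, 72]        # hypothetical sample values (in thousands)
weights = [120, 80, 150, 100, 50]      # hypothetical survey weights
quartiles = [weighted_quantile(incomes, weights, p) for p in (0.25, 0.5, 0.75)]
print(quartiles)  # [25, 31, 40]
```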

Now, as for probability samples, we met various kinds in Course 4, which was about sampling people and records. Examples you learned about there include simple random sampling, stratified simple random sampling, stratified systematic random sampling, two-stage sampling, and multi-stage sampling. We can also sample with probabilities proportional to some measure of size, which is often done for businesses or institutions. All of those are possibilities.
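
As a reminder of how one of these designs works, here is a sketch of systematic sampling with probability proportional to size (PPS), followed by the Horvitz-Thompson estimator of a total. The size measures and y-values are hypothetical, and the code assumes no unit is large enough to need certainty selection. This is one standard way to draw a PPS sample, not necessarily the exact procedure used in Course 4.

```python
import random
from itertools import accumulate

def pps_inclusion_probs(sizes, n):
    """pi_i = n * x_i / sum(x): inclusion probability proportional to the size measure."""
    total = sum(sizes)
    probs = [n * x / total for x in sizes]
    if max(probs) > 1:
        raise ValueError("a unit would have pi > 1; take it with certainty and sample the rest")
    return probs

def systematic_pps_sample(sizes, n, rng):
    """Lay units end to end on a line of length sum(x), pick a random start in
    [0, interval), then take every interval = sum(x) / n along the line."""
    cum = list(accumulate(sizes))
    interval = cum[-1] / n
    start = rng.random() * interval
    chosen, i = [], 0
    for k in range(n):
        point = start + k * interval
        # Unit i covers [cum[i] - sizes[i], cum[i]); advance until it contains point.
        while i < len(cum) - 1 and cum[i] <= point:
            i += 1
        chosen.append(i)
    return chosen

def horvitz_thompson_total(y, probs, sample):
    """Design-unbiased estimate of the population total: sum of y_i / pi_i over the sample."""
    return sum(y[i] / probs[i] for i in sample)

rng = random.Random(11)
sizes = [5, 10, 20, 15, 30, 20]   # hypothetical size measures, e.g. employees per business
y = [3 * x for x in sizes]         # a y that happens to be exactly proportional to size
probs = pps_inclusion_probs(sizes, n=3)
sample = systematic_pps_sample(sizes, 3, rng)
print(sample, horvitz_thompson_total(y, probs, sample), sum(y))
```

When y is exactly proportional to the size measure, every term y_i / pi_i is the same, so the estimate equals the true total for any sample drawn; that is the intuition for why PPS works well for businesses and institutions, where many y-variables track size.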

Non-probability samples, on the other hand, are often used to make inferences, and we have to think clearly about what we're doing there. As I said a minute ago, the unbiasedness and consistency properties then have to be defined with respect to some sort of model.

We also need to be able to estimate the population model from the sample. In that sense, our sample has to be projectable to the full population, even though we didn't obtain it in a random way. So if a sample has serious holes in coverage, it's hard to justify saying that its estimators are aiming at the right thing. They may be biased: they may be estimators, but not of the full population we're interested in. For example, suppose we have a volunteer web panel that contains no African-American women over 70 years old. If those women are an important part of the population, and they behave differently from the rest of the population with respect to whatever we're measuring, then we've got trouble.
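
A little arithmetic shows why a coverage hole cannot be fixed by collecting more data. The two groups, their population shares, and their means are hypothetical. As the panel grows, its estimate converges to the mean of the covered group only, even if that group is weighted to its correct share, so the gap is a bias, not sampling noise.

```python
# Hypothetical population split into two groups: (population share, mean of y).
groups = {
    "covered by panel": (0.90, 40.0),
    "never in panel": (0.10, 70.0),  # e.g. a demographic group the panel misses entirely
}

def population_mean(groups):
    return sum(share * mean for share, mean in groups.values())

def limit_of_panel_estimate(groups, covered):
    """What a panel's estimate converges to as it grows if it only ever contains
    the covered groups, even when those groups are weighted correctly among themselves."""
    covered_share = sum(groups[g][0] for g in covered)
    return sum(groups[g][0] * groups[g][1] for g in covered) / covered_share

truth = population_mean(groups)
panel = limit_of_panel_estimate(groups, ["covered by panel"])
print(f"population mean {truth:.1f}, panel target {panel:.1f}, bias {panel - truth:+.1f}")
```

The bias is the missing group's share times the difference between its mean and the covered group's mean, so it matters only when the missing group is sizable and behaves differently, which is exactly the condition stated above.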

Now, what types of non-probability samples are there? There are many; I've listed three general categories here. One is a convenience sample. For example, if you take all the students in your introductory psychology course and experiment on them in some way, that's a convenience sample. Those students don't represent the entire population of a country, or even a subset of the country.

Another way is to recruit people from those who visit a particular website or set of websites. A pop-up ad appears and asks whether you want to be part of a survey; you say yes and you do it. That would be a volunteer panel. A somewhat more organized version of this is called a river sample: you post ads on a carefully selected set of websites that people may visit, asking whether they want to be part of a panel that does surveys, and they go through some steps to actually get into the panel. But these are not random samples, or probability samples, of an entire finite population, because the researcher has no control over who shows up.

On the other hand, there are probability samples that suffer badly from nonresponse. In fact, they may have so much of it that you begin to wonder whether we should even treat them as probability samples. For example, if you do an overnight election poll by telephone in the US these days, it doesn't matter whether you include both landlines and cell phones; you'll still see the same phenomenon. Only about 5% of the people will actually answer the phone and cooperate with you. A 5% response rate is hardly what you'd call a good sample, and it's hardly what you'd be willing to defend as a probability sample.
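
A short calculation suggests why such a low response rate undermines the probability-sample argument: the respondents are effectively self-selected, and the respondent mean drifts toward the groups that respond more readily. The numbers are hypothetical, chosen so the overall response rate is 5%, and the sketch assumes response depends only on group membership.

```python
# Hypothetical population: two equal-sized groups whose response propensities differ.
# Each entry: (population share, response propensity, mean of y)
groups = [
    (0.5, 0.03, 40.0),
    (0.5, 0.07, 60.0),
]

overall_response_rate = sum(share * rho for share, rho, _ in groups)
population_mean = sum(share * ybar for share, _, ybar in groups)
# Expected respondent mean: each group counts in proportion to share * propensity.
respondent_mean = (
    sum(share * rho * ybar for share, rho, ybar in groups) / overall_response_rate
)
print(f"response rate {overall_response_rate:.0%}, population mean {population_mean:.1f}, "
      f"respondent mean {respondent_mean:.1f}")
```

Even though the original sample was drawn at random, the respondents over-represent the group that answers the phone more often, and the respondent mean is pulled toward that group's mean.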

Now, even if we draw a probability sample, we may have coverage errors, and we will certainly have coverage errors in non-probability samples. These can be undercoverage or overcoverage, depending on the frame we're drawing from.

What we try to do to combat that is something called calibrating the weights with auxiliary data. What we need are target-population control totals that we know: not necessarily values for every individual in the population, but at least grand totals for the population. We can then adjust our sample weights so that weighted estimates of these control variables match the population or census counts. If we do that, what we hope is that the sample can be projected to the target population using those covariates. We typically have to put a model-based interpretation on that.
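
The simplest version of calibration uses a single categorical variable and is known as post-stratification: within each category, every weight is scaled by the same factor so the weighted count matches the known control total. The sketch below uses hypothetical starting weights and control totals. Note the check for categories with no sample units: if a group never appears in the sample, no reweighting can reproduce its total.

```python
from collections import defaultdict

def poststratify(weights, cells, control_totals):
    """Scale every unit's weight within its cell so the weighted count in each
    cell equals the known population control total for that cell."""
    weighted_count = defaultdict(float)
    for w, c in zip(weights, cells):
        weighted_count[c] += w
    missing = set(control_totals) - set(weighted_count)
    if missing:
        raise ValueError(f"no sample units in cells {sorted(missing)}; cannot calibrate")
    return [w * control_totals[c] / weighted_count[c] for w, c in zip(weights, cells)]

base_weights = [100.0] * 6              # hypothetical starting weights
gender = ["F", "F", "M", "M", "M", "M"]
controls = {"F": 520.0, "M": 480.0}     # hypothetical census counts
w = poststratify(base_weights, gender, controls)
print(w)  # [260.0, 260.0, 120.0, 120.0, 120.0, 120.0]
```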

For example, if we're dealing with a human population, counts by age, race, ethnicity, and gender might be used as calibration variables. The units in our sample have to be expanded, using weights, to represent the full population, and at least we can do it in such a way that the weights reproduce the population control totals. That doesn't necessarily mean the weights do the same for all the other y-variables we're trying to estimate. But if we can do it for the control totals, that's a step in the right direction.
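
When several calibration variables are used at once and only their separate (marginal) totals are known, one common way to make the weights reproduce all of them is raking, also called iterative proportional fitting. Raking is one calibration method among several (linear or GREG calibration is another). The sketch assumes every category has at least one sample unit and that the control totals for different variables agree on the grand total; the data are hypothetical and only two variables are used to keep it short.

```python
from collections import defaultdict

def rake(weights, margins, controls, max_iter=100, tol=1e-8):
    """Cycle through the calibration variables, post-stratifying to each variable's
    control totals in turn, until no adjustment factor differs from 1 by more than tol."""
    w = list(weights)
    for _ in range(max_iter):
        max_change = 0.0
        for var, categories in margins.items():
            weighted = defaultdict(float)
            for wi, cat in zip(w, categories):
                weighted[cat] += wi
            for i, cat in enumerate(categories):
                factor = controls[var][cat] / weighted[cat]
                w[i] *= factor
                max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:
            return w
    raise RuntimeError("raking did not converge")

sex = ["F", "F", "F", "M", "M", "M"]
age = ["young", "old", "old", "young", "young", "old"]
controls = {
    "sex": {"F": 510.0, "M": 490.0},      # hypothetical control totals
    "age": {"young": 400.0, "old": 600.0},
}
w = rake([1.0] * 6, {"sex": sex, "age": age}, controls)
print([round(wi, 2) for wi in w])
```

After raking, the weighted counts by sex and by age both match their control totals, but nothing forces estimates of other y-variables to be right, which is the caveat noted above.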

We'll learn more about how to do that in later sections.
