A conceptual and interpretive public health approach to some of the most commonly used methods from basic statistics.

From the course by Johns Hopkins University

Statistical Reasoning for Public Health 1: Estimation, Inference, & Interpretation

From the lesson

Module 2A: Summarization and Measurement

Module 2A consists of two lecture sets that cover measurement and summarization of continuous data outcomes for both single samples, and the comparison of two or more samples. Please see the posted learning objectives for these two lecture sets for more detail.

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

So in the last section,

we worked really hard to come up with single number summaries,

both of individual samples and comparisons between samples extensively,

to quantify or give a best estimate for

some unknown underlying population quantity or comparison between populations.

But you may be thinking, John,

you know, I would feel more comfortable about some of

these depending on characteristics of the study.

For example, I would feel more comfortable and more confident about a comparison

based on hundreds of people in each of the groups

versus tens of people in each of the groups.

And I might push you and say, "Well,

why would you feel more comfortable?"

And you'd probably end up saying something like estimates

based on larger samples are more stable.

So, I want you to think about what you might mean by more stable,

and that's what we're going to investigate first in this module.

We're going to rigorously define something called

sampling variability or something that measures

the stability of a statistic based on

a single sample as an estimate of some underlying truth.

And we're going to look at sampling variability through the idea that just by chance,

in our sampling process,

we could get one of many different samples with

different elements all of the same size from a population.

And understanding how our estimate across these samples,

these different samples we could have gotten by chance,

would vary gives us some insight as to how stable our sample statistic,

like a sample mean,

proportion, or incidence rate, is as an estimate of

the underlying true quantity at the population level.
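
To make the idea of "many different samples of the same size" concrete, here is a minimal simulation sketch. It is not part of the lecture: the population (100,000 made-up values with mean around 120 and standard deviation around 15), the sample sizes of 10 and 200, and the helper name `sample_means` are all assumptions chosen for illustration. It draws many random samples of each size and compares how much the sample means vary.

```python
import random
import statistics


def sample_means(population, n, num_samples, rng):
    """Draw num_samples random samples of size n; return each sample's mean."""
    return [statistics.mean(rng.sample(population, n)) for _ in range(num_samples)]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population (an assumption, not data from the course).
population = [rng.gauss(120, 15) for _ in range(100_000)]

# 1,000 samples of each size, all drawn from the same population.
means_small = sample_means(population, 10, 1000, rng)
means_large = sample_means(population, 200, 1000, rng)

# The standard deviation of the sample means measures how much the
# estimate moves from one possible sample to the next.
spread_small = statistics.stdev(means_small)
spread_large = statistics.stdev(means_large)

print(f"spread of sample means, n=10:  {spread_small:.2f}")
print(f"spread of sample means, n=200: {spread_large:.2f}")
```

In a run like this, the means based on the larger samples should cluster more tightly around the population mean, which is one way to pin down what "more stable" means.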

And then we're going to show,

and you know, this may seem strange,

well, in real-life research we're only going to take

one sample from each of the populations we're studying,

so how can we have an understanding of how our estimate would vary

across multiple samples given that we only have one sample?

But we're going to show a powerful mathematical result that

will pretty much tell us what would have happened if

we had taken multiple samples and allow us to structure

that and quantify it based on the results of single samples.

And so we're going to build towards something called intervals,

confidence intervals that allow us to take our best estimate of a quantity,

like a mean or proportion from a single sample,

and then put uncertainty bounds on it to come up with an interval that reflects

our confidence about our ability to estimate

the underlying truth that we can't directly observe.

And this is going to become particularly critical when we start comparing

two or more populations through two or more samples,

and we want to put uncertainty bounds on the differences between those populations.

So we'll explore that in detail in this section as well.
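
As a rough preview of putting uncertainty bounds on a single-sample estimate, the sketch below computes an approximate 95% interval for a mean. The introduction above does not give a formula, so everything here is an assumption: the common large-sample approach of the estimate plus or minus 1.96 estimated standard errors, the made-up sample of 100 values, and the helper name `approx_95_ci`.

```python
import math
import random
import statistics


def approx_95_ci(sample):
    """Approximate 95% interval for a mean: estimate +/- 1.96 standard errors.

    Assumes a reasonably large sample so the normal approximation is sensible.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    # Estimated standard error: sample standard deviation over sqrt(n).
    se = statistics.stdev(sample) / math.sqrt(n)
    return mean - 1.96 * se, mean + 1.96 * se


rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical single sample of 100 measurements (an assumption).
sample = [rng.gauss(120, 15) for _ in range(100)]

low, high = approx_95_ci(sample)
print(f"sample mean: {statistics.mean(sample):.1f}")
print(f"approximate 95% interval: ({low:.1f}, {high:.1f})")
```

Only the one sample is used here; the width of the interval comes from the sample's own variability and its size, not from drawing additional samples.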
