A conceptual and interpretive public health approach to some of the most commonly used methods from basic statistics.

From the course by Johns Hopkins University

Statistical Reasoning for Public Health 1: Estimation, Inference, & Interpretation

135 ratings

From the lesson

Module 4B: Making Group Comparisons: The Hypothesis Testing Approach

Module 4B extends the hypothesis tests for two-population comparisons to "omnibus" tests for comparing means, proportions, or incidence rates between more than two populations with one test.

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

Greetings and welcome back. In this section, we'll look at computing the sample size necessary to achieve a desired level of precision, a desired margin of error for single population quantities like a population mean or a population proportion, for example.

So upon completion of this lecture section, you will be able to create a table relating sample size to precision for an estimate of a single population quantity, and solve for the necessary sample size to get a desired level of precision, i.e. margin of error. The idea is that, in order to justify a funding request for a larger study, a researcher needs to demonstrate both that the study allows for the estimation of outcomes with a good margin of error and that the study can be performed given the requested budget, that is, that the sample size request is reasonable.

Designing a study such that the results have a certain margin of error requires some speculation in advance about what the study results will be. So this is the sticky part about doing such computations: you have to have an educated guess, going into the process, of what your study results will be before you've done the study. So, where can this information come from?

Well, sometimes pilot studies are done, like some of the examples we gave in part A. These are low- to no-budget studies done on a restricted number of participants to get some data on the table in order to design a larger study.

It can also come from other research: researchers who studied a similar population for different purposes may have estimates of some of the quantities that you need.

Or, in the case where nobody's done any research related to what you're looking at, it comes from educated guesswork. That's hard to do, but sometimes it's the best that can be done.

So let me give you an example using the results from a pilot study to design a bigger study. Recall the length of stay study with 30 subjects. Suppose we were actually recruiting the subjects and following them up, and length of stay was the big outcome of interest. But we actually needed to do more than solicit their patient records, so this study could be costly. Well, recall that when we looked at the pilot results, the researcher had studied 30 persons.

And the average length of stay in the sample of 30 was 6.3 days. There was a fair amount of variability amongst the 30 length-of-stay measurements. And so the margin of error for that study was 2 times 7.5 days, the standard deviation, over the square root of the sample size, 30. So this is just the standard error in parentheses here. That turned out to be 2 times 1.4 days, or 2.8 days.

But suppose we use this pilot study as a starting point. In order to estimate the margin of error for a given sample size, we need an estimate of the standard deviation of individual values in the population we're studying. The working one we have, and it may not be a great one, but we'll talk about that in a minute, is the 7.5 days from this study on 30 people. Using that, we can write the margin of error for studies dealing with length of stay in the population we're sampling from as a function of sample size: for a given sample size n, our estimated margin of error would be 2 times the estimated standard deviation of 7.5, over the square root of n. So for example, based on these results, the estimated margin of error for a study with n equals 100 would be 2 times 7.5 over the square root of 100, which equals 2 times 0.75.

So we would be able to estimate the mean length of stay within a margin of error of plus or minus 1.5 days. If we thought that was a little wide and wanted to be more precise in our estimate, we could see what would happen to our margin of error if we looked at a study with 250 people.

We get a margin of error, when all the dust settles, and you can check my math, of plus or minus 0.95 days, so almost one day. So we would get our confidence interval by taking our mean estimate and adding and subtracting 0.95 days.
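The arithmetic in this example can be checked with a few lines of Python. This is only a minimal sketch of the rule of thumb used in the lecture (2 times the standard deviation over the square root of n); the function name `margin_of_error_mean` and the `multiplier` parameter are illustrative choices, not something from the course materials.

```python
import math

def margin_of_error_mean(sd, n, multiplier=2.0):
    """Approximate 95% margin of error for a sample mean: multiplier * sd / sqrt(n)."""
    return multiplier * sd / math.sqrt(n)

pilot_sd = 7.5  # standard deviation in days, taken from the 30-person pilot study

# Pilot study size, then the two candidate study sizes discussed in the lecture
for n in (30, 100, 250):
    moe = margin_of_error_mean(pilot_sd, n)
    print(f"n = {n:>3}: margin of error of about {moe:.2f} days")
```

The n = 30 line comes out near 2.7 days rather than 2.8, because the lecture rounds the standard error to 1.4 before doubling it.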

Following that type of logic, and you can use a spreadsheet program or something like that, you can easily make a table like this, where you look at the expected margin of error for different sample sizes. But given that our estimate of the standard deviation is based on a small study to begin with, there's some uncertainty in that. So it's usual practice to look at some other possibilities, both above and below the estimate, just to get a sense of what the possible margins of error are for combinations of sample size, allowing for the uncertainty in our estimate of the standard deviation.

So we might produce a table like this, and then say to our funding agency: suppose we desire a margin of error of one day or less. If you give us the funding to recruit 300 patients, we're pretty much good under all anticipated standard deviation scenarios. This one is a little above one day, but it's very close. So 300 will cover all the bases. But if you're not willing to pay for 300 subjects, if you pay for 200, or somewhere between 200 and 300, then for at least two of the standard deviation scenarios we'll be okay. However, if you cut our budget such that we can only sample 100 people, we're going to be way off the mark.
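A table of this kind can be sketched in Python instead of a spreadsheet. The transcript does not record the standard deviation scenarios or sample sizes shown on the slide, so the values below (6.0, 7.5 and 9.0 days; 100, 200 and 300 subjects) are assumptions chosen purely for illustration, as are the function names.

```python
import math

def margin_of_error_mean(sd, n, multiplier=2.0):
    """Approximate 95% margin of error for a sample mean."""
    return multiplier * sd / math.sqrt(n)

def precision_table(sds, sample_sizes):
    """Return {n: {sd: margin of error}} for every combination."""
    return {n: {sd: margin_of_error_mean(sd, n) for sd in sds}
            for n in sample_sizes}

# Assumed scenarios: 7.5 is the pilot estimate; 6.0 and 9.0 are hypothetical
# values below and above it, NOT the values from the lecture slide.
sd_scenarios = (6.0, 7.5, 9.0)
sample_sizes = (100, 200, 300)  # hypothetical candidate study sizes

table = precision_table(sd_scenarios, sample_sizes)
print("    n" + "".join(f"   SD={sd:<4}" for sd in sd_scenarios))
for n, row in table.items():
    print(f"{n:>5}" + "".join(f"{moe:>10.2f}" for moe in row.values()))
```

Reading across a row shows how sensitive the margin of error is to the standard deviation guess; reading down a column shows what extra subjects buy.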

You could also, if you were designing a study and really wanted a single point estimate of the sample size needed to get a desired margin of error, solve for it relatively easily, algebraically. If you recall, our estimated margin of error is a function of our sample size: it is two times our sample standard deviation over the square root of the sample size. And we could solve for the n that would give us a margin of error of 0.5 days. So for our data, it'd be 2 times the estimated 7.5 days standard deviation, divided by the square root of n, set equal to 0.5. Do a little algebra to solve that.
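In general form, setting 2 times SD over the square root of n equal to a target margin and rearranging gives n = (2 × SD / margin) squared. The sketch below assumes that rearrangement and rounds up to a whole subject; the function name `sample_size_for_mean` is illustrative.

```python
import math

def sample_size_for_mean(sd, target_moe, multiplier=2.0):
    """Smallest n for which multiplier * sd / sqrt(n) is at most target_moe."""
    n_exact = (multiplier * sd / target_moe) ** 2
    # Round to 6 decimals first so floating-point fuzz on an exact
    # answer does not push the result up by one subject.
    return math.ceil(round(n_exact, 6))

# Pilot standard deviation of 7.5 days, target margin of error of 0.5 days
print(sample_size_for_mean(sd=7.5, target_moe=0.5))
```

Halving the target margin of error quadruples the required sample size, which is why small improvements in precision can be expensive.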

And we get the square root of n, just rewriting it in the opposite order here, equals 2 times 7.5 over 0.5. When you do the math, that's 30. So the square root of our necessary sample size is 30. We square both sides and get n equals 30 squared, or 900 people. So we need an estimated 900 people to get an estimated margin of error of 0.5 days.

Let's look at another example. Recall the pilot study example from section A on 30 participants given a drug and followed to see who experienced a minor reaction. Nine subjects had the reaction, so in that study our estimated proportion was 30%. Our margin of error was 2 times the estimated standard error based on 30, which is the square root of the proportion who had the reaction times the proportion who didn't, over the sample size: the square root of 0.3 times 0.7 over 30. (I should have stuck a 30 in there, sorry.) But this actually is fortuitous, because the general version is what I was going to write down here. If we assume our starting guess for the expected proportion of people who have the reaction in the population as a whole is 30%, then the margin of error for other studies from the same population, for different sample sizes, is 2 times the square root of 0.3 times 0.7 over n.

So, for example, if we wanted to estimate, based on these pilot results, the margin of error for a study with 150 patients, it would be 2 times the square root of 0.3 times 0.7 over 150, which equals 0.075. So the margin of error here is plus or minus 7.5%. If we increase the sample size to 300, we get 2 times the square root of 0.3 times 0.7 over 300. If we do the math on that, it gives us a margin of error of about 5% (0.053 before rounding). So our confidence interval would be created by taking the resulting estimate from the study based on 300 and then adding and subtracting 5%.

So just like we did with the previous example, we could easily make a table that explores the margin of error as a function of both sample size and the expected proportion with the outcome. And again, since that 30% was only based on 30 persons, there's clearly a lot of uncertainty; as we stated before, the confidence interval was very wide. We wouldn't necessarily use all values in the confidence interval, but we would allow for at least a little bit of uncertainty in doing these computations. And we could then look at the trade-offs between margin of error and the expected proportion in such a table.
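The proportion version of the calculation, and a table over sample size and expected proportion, can be sketched the same way. The pilot estimate of 0.3 and the sample sizes 150 and 300 come from the lecture; the other proportions (0.2 and 0.4), the sample size of 600, and the function name are assumptions added for illustration.

```python
import math

def margin_of_error_proportion(p, n, multiplier=2.0):
    """Approximate 95% margin of error for a sample proportion:
    multiplier * sqrt(p * (1 - p) / n)."""
    return multiplier * math.sqrt(p * (1 - p) / n)

# Assumed scenarios: 0.3 is the pilot estimate; 0.2 and 0.4 are hypothetical
# alternatives allowing for uncertainty in that estimate.
p_scenarios = (0.2, 0.3, 0.4)
sample_sizes = (150, 300, 600)  # 600 is a hypothetical extra row

print("    n" + "".join(f"   p={p:.1f}" for p in p_scenarios))
for n in sample_sizes:
    cells = "".join(f"{margin_of_error_proportion(p, n):>8.1%}"
                    for p in p_scenarios)
    print(f"{n:>5}{cells}")
```

For a fixed sample size the margin of error grows as the expected proportion moves toward 0.5, so a guess that is too far from 0.5 can understate the sample size needed.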

We could also easily solve again for the sample size to get a desired margin of error. So for example, suppose we want to be able to estimate the proportion (sorry, not the mean; I'm a victim of my own cutting and pasting) with the reaction within plus or minus 2.5%. Well, we could set up an equation just like we did before. Our margin of error as a function of sample size, using the estimated proportion from the sample of size 30, is 2 times the square root of 0.3 times 0.7 over n, and we want this to equal 0.025.

So the first thing we might do is divide both sides by two. Then we get the square root of 0.3 times 0.7 over n equals 0.025 over 2, which is 0.0125. Square both sides to get rid of that square root, so we get 0.3 times 0.7 over n equals 0.0125 squared. (I need to fix this on the slide: when we square both sides, the right-hand side should be 0.0125 squared.) Do some cross multiplication, and we get 0.0125 squared times n equals 0.3 times 0.7, and then n equals 0.3 times 0.7 over 0.0125 squared. When you do this, you get 1,344, so a necessary sample size of over 1,300 to get a margin of error of plus or minus 2.5%. We need a fair number of subjects.

You've probably heard reference to margin of error frequently in the media when they talk about the results of a poll, and they say something like, this poll was conducted with a margin of error of plus or minus 3%, or plus or minus 2.5%. And they designed it using the exact same approach that we just used here. They figured out how many people they needed to poll to get the margin of error within that 3% or 2.5% range.

So in summary, in order to compute the margin of error for a given sample size, you will need an estimate of the standard deviation for a continuous measure, or of the proportion for a binary outcome. We didn't actually do an example with incidence rates. Thematically the computations are the same, but they're a little trickier because we also have to estimate the follow-up time.
What you need in order to compute a margin of error there is an estimate of the incidence rate itself for the time-to-event outcome, and then some estimate of the follow-up time in the study you propose. But the principle is exactly the same: the bigger the sample size, the smaller the margin of error. The estimates of these aforementioned quantities come from either a small pilot study, secondary results from other research, or educated guesswork. But for a single population-level quantity, it's pretty straightforward, if you have the appropriate estimate, to see what the margin of error looks like as a function of your sample size.
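The sample size calculation for a proportion worked through earlier can be sketched as a small function. It assumes the same rule of thumb used throughout the lecture (2 standard errors) and rounds up to a whole subject; the name `sample_size_for_proportion` is illustrative.

```python
import math

def sample_size_for_proportion(p, target_moe, multiplier=2.0):
    """Smallest n for which multiplier * sqrt(p * (1 - p) / n) is at most target_moe."""
    n_exact = p * (1 - p) / (target_moe / multiplier) ** 2
    # Round to 6 decimals first so floating-point fuzz on an exact
    # answer does not push the result up by one subject.
    return math.ceil(round(n_exact, 6))

# Pilot proportion of 0.3, target margin of error of plus or minus 2.5%
n_needed = sample_size_for_proportion(p=0.3, target_moe=0.025)
print(n_needed)
```

As with the mean, the required sample size scales with the inverse square of the target margin, so tightening the margin from 5% to 2.5% roughly quadruples the number of subjects.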
