Learn fundamental concepts in data analysis and statistical inference, focusing on one and two independent samples.

From the course by Johns Hopkins University

Mathematical Biostatistics Boot Camp 2

From the lesson

Techniques

This module is a bit of a hodge podge of important techniques. It includes methods for discrete matched pairs data as well as some classical non-parametric methods.

- Brian Caffo, PhD, Professor, Biostatistics

Bloomberg School of Public Health

Okay, so let's move on to another very important distribution, maybe not quite so important as the normal distribution, but awfully darn important. Maybe the second most important distribution. I think a compelling case could be made for the Poisson distribution being the second most important distribution. I don't know, I have leanings in that direction.

Okay, so the Poisson distribution is used to model counts. The Poisson mass function is this guy: lambda to the x, times e to the negative lambda, all over x factorial, where this is the probability that X takes the value x for parameter lambda. And it turns out that lambda is an easy parameter because it's just the mean: the expected value of X is lambda. But what's interesting about the Poisson random variable is that its variance is also lambda. Well, we knew its variance had to be some function of lambda, because the Poisson distribution only depends on the one parameter. But it is kind of neat that it works out that the mean and the variance are equal in this case.
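As a rough check of the mass function and of the claim that the mean and variance both equal lambda, here is a minimal Python sketch. The helper name `poisson_pmf`, the value `lam = 2.5`, and the truncation of the infinite sum at 100 terms are all assumptions made for illustration, not part of the lecture.

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for X ~ Poisson(lam): lam**x * exp(-lam) / x!."""
    return lam ** x * math.exp(-lam) / math.factorial(x)


lam = 2.5  # hypothetical rate, chosen only for illustration

# The support is 0, 1, 2, ...; we truncate at 100 terms because the
# remaining tail is negligible for a rate this small.
support = range(100)

total = sum(poisson_pmf(x, lam) for x in support)
mean = sum(x * poisson_pmf(x, lam) for x in support)
variance = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in support)

# total should be close to 1, and mean and variance both close to lam.
print(total, mean, variance)
```

Changing `lam` and rerunning should keep the mean and variance matching each other, as long as the truncation point stays well above the rate.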

And then notice in this case, x ranges from 0 to infinity. So you might use Poissons to model counts that are sort of unbounded. So, however many people show up at a bus stop: sure, you know there's some bound, there's only so many people in the world. But for all intents and purposes, that's an unbounded number; you can't really put a number on what the limit of that really is in a meaningful way. Of course it's bounded by the total number of people, but it's kind of conceptually unbounded in a different way than if I flip a coin five times. If I know I'm going to flip it five times, that's a different problem, right? We know that the greatest number of successes I can get is five. Whereas if you're modeling how many people show up at a bus stop, then you really don't know a realistic version of what that upper limit would be.

So, some uses for the Poisson. Event-time data: if you're counting anything like the number of people that show up at a bus stop, or the number of photons that are detected from a nuclear reactor per given unit time, these are all reasonable things to model with a Poisson. Radioactive decay is the classic one, because you can kind of demonstrate via some limiting arguments that radioactive decay does follow a Poisson distribution.

Survival data: it turns out there's this deep connection between a lot of these classic survival analysis models and Poisson models. Survival analysis is modeling the time until some event, classically the time until death when looking at diseases. Any kind of unbounded count data you're going to model as Poisson.

Contingency tables: if you collect a bunch of people, collect a bunch of characteristics on them, and just create cross-classified tables of how many people fell into each classification, that's called a contingency table. And it turns out that you usually model the counts in contingency tables with a Poisson.

And then binomials, which are clearly not Poisson: if you have n being large and p being small, then people tend to model them as Poisson. In fact, people do this so frequently that they don't even bother to mention that they are approximating a binomial with a Poisson. So say, for example, you're modeling a disease that is very rare and you have large sample sizes. Let's suppose you're studying autism and vaccination rates, so that the percent of the kids with autism is very small and the number of kids that get vaccinated is very large. If you were to model the autism rates, you would more likely do it with a Poisson than with a binomial. I take that example because I saw someone doing that just the other day; they were studying that question.

Okay. So this is where it comes from. The Poisson distribution comes about from so-called Poisson processes. In a Poisson process, you define lambda as the mean number of events per unit time. You let the window h that you're looking at be very small, and we assume that in that interval of length h the probability of an event occurring is lambda times h, while the probability of more than one event is negligible.

So imagine you are monitoring your bus stop. I'm going to look at windows of 0.1 second, and then only one person can show up at a time in that 0.1 second. So let's just assume no two people are showing up holding hands or something like that, and it's a commuter day where everyone's coming by themselves. If you take a small enough time window, you're only going to get one person showing up at the bus stop at any given time. And then whether or not a person shows up in one interval doesn't impact whether or not a person shows up in another interval. Maybe all of these assumptions are suspect for the bus stop example. But at any rate, these are the underlying assumptions for something to be a Poisson process.

And then if you take an interval and you count the number of events that occur in that interval, that's a Poisson random variable. And that's the original derivation of the Poisson random variable, through Poisson processes.
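The small-window assumptions can be tried out numerically. The following Python sketch is only an illustration under assumed values: the function name `simulate_count`, the rate of 2.5, the window length `h = 0.001`, the seed, and the 2000 repetitions are all hypothetical choices. Each tiny window independently holds at most one event, with probability lambda times h, and the counts over the whole interval should then have a mean and variance both near lambda times the interval length.

```python
import random


def simulate_count(rate: float, total_time: float, h: float,
                   rng: random.Random) -> int:
    """Count events over total_time by checking each window of length h.

    Every window independently contains one event with probability
    rate * h and never more than one, mirroring the assumptions of a
    Poisson process for small h.
    """
    n_windows = round(total_time / h)
    return sum(1 for _ in range(n_windows) if rng.random() < rate * h)


rng = random.Random(0)  # fixed seed so the run is repeatable
rate, total_time, h = 2.5, 1.0, 0.001  # illustrative values only

counts = [simulate_count(rate, total_time, h, rng) for _ in range(2000)]
sample_mean = sum(counts) / len(counts)
sample_var = sum((c - sample_mean) ** 2 for c in counts) / (len(counts) - 1)

# Both numbers should land near rate * total_time = 2.5.
print(sample_mean, sample_var)
```

This is a simulation, so the printed values are approximate; shrinking `h` or adding repetitions should bring them closer to the Poisson values.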

So I want to emphasize that the idea of studying rates and using the Poisson is really kind of highly tied together. The lambda parameter from a Poisson has a unit, right? If I'm looking at radioactive decay, lambda is the decay rate per unit time and t is the number of time units that I monitor, and the count X is modeled as Poisson with mean lambda times t. So if I wanted to look at radioactive decay per minute, t would be in minutes and lambda would be the rate per minute. So note, lambda is the expected value of X over t, the expected count per unit time, and t is the total monitoring time. We always use the Poisson distribution this way. Sometimes t is 1, in which case it drops out. But in our heads, we're always thinking of a Poisson random variable as being a model for rates.
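In symbols, the rate interpretation described here can be summarized as follows, using X for the count, lambda for the rate per unit time, and t for the total monitoring time:

```latex
X \sim \text{Poisson}(\lambda t), \qquad
E[X] = \lambda t, \qquad
\lambda = E\!\left[\frac{X}{t}\right]
```

When t = 1 the mean is just lambda, which is the case where t "drops out".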

Here's the Poisson approximation to the binomial, if you ever need to use it. We assume that X is binomial(n, p) and we set lambda equal to n times p. If n gets large and p gets small, but lambda stays constant, then you can approximate X as Poisson with parameter lambda.
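The lecture states this approximation without proof. A standard sketch of why it holds, substituting p = lambda / n into the binomial mass function and letting n grow with x and lambda fixed, is:

```latex
\binom{n}{x} \left(\frac{\lambda}{n}\right)^{x} \left(1 - \frac{\lambda}{n}\right)^{n-x}
= \frac{\lambda^{x}}{x!}
  \cdot \frac{n(n-1)\cdots(n-x+1)}{n^{x}}
  \cdot \left(1 - \frac{\lambda}{n}\right)^{n}
  \cdot \left(1 - \frac{\lambda}{n}\right)^{-x}
\;\longrightarrow\;
\frac{\lambda^{x} e^{-\lambda}}{x!}
\quad \text{as } n \to \infty
```

The second and fourth factors tend to 1 and the third tends to e to the negative lambda, which leaves the Poisson mass function.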

Let's do it. So imagine the number of people that show up at a bus stop is Poisson with a mean of 2.5 per hour, and you are watching the bus stop for... Oh, I'm sorry, this is not an example of a Poisson approximation of a binomial. This is just a regular Poisson.

So the number of people that show up at a bus stop is Poisson with a mean of 2.5 people per hour. We watch the bus stop for four hours. What's the probability that three or fewer people show up the whole time? That is exactly just a Poisson probability: ppois(3, lambda = 2.5 * 4). Remember that when we use the p function for a probability distribution, it does three or less, which is what we want, so we put three in there. Lambda is 2.5 times the number of hours that we monitored, 4, and that works out to be about 1%.

Okay, so see how we used it? Lambda was the events per unit time, and t was the number of units of time that we measured. Okay, so there's your Poisson example.
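The same bus stop calculation can be reproduced without R. This Python sketch sums the Poisson mass function directly; the helper name `poisson_cdf` and the variable names are assumptions for illustration, and the function is meant to play the role that `ppois` plays in the lecture.

```python
import math


def poisson_cdf(q: int, lam: float) -> float:
    """P(X <= q) for X ~ Poisson(lam), summing the mass function from 0 to q."""
    return sum(lam ** x * math.exp(-lam) / math.factorial(x)
               for x in range(q + 1))


rate_per_hour = 2.5  # expected arrivals per hour, from the lecture example
hours_watched = 4    # total monitoring time t

# The Poisson mean is rate * time = 10 expected arrivals over four hours.
prob = poisson_cdf(3, rate_per_hour * hours_watched)
print(round(prob, 4))  # about 0.0103, i.e. roughly 1%
```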

Let's go through an example of the Poisson approximating the binomial. We flip a coin with success probability 0.01, so p is small. We flip it 500 times. What's the probability of 2 or fewer successes? So again, pbinom(2, size = 500, prob = 0.01). Here's the exact calculation. It works out to be about 0.12. Here's the Poisson approximation: ppois(2, lambda = 500 * 0.01), so lambda is n times p, and that works out to be 0.1247. So pretty close. So that's the Poisson approximation of the binomial. Here we just showed you exactly how accurate it was in this specific case.
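For readers who want to check both numbers outside R, here is a small Python sketch of the same comparison. The helper names `binom_cdf` and `poisson_cdf` are hypothetical stand-ins for `pbinom` and `ppois`; `math.comb` requires Python 3.8 or later.

```python
import math


def binom_cdf(q: int, n: int, p: float) -> float:
    """P(X <= q) for X ~ Binomial(n, p), summing the exact mass function."""
    return sum(math.comb(n, x) * p ** x * (1 - p) ** (n - x)
               for x in range(q + 1))


def poisson_cdf(q: int, lam: float) -> float:
    """P(X <= q) for X ~ Poisson(lam)."""
    return sum(lam ** x * math.exp(-lam) / math.factorial(x)
               for x in range(q + 1))


n, p = 500, 0.01  # many flips, small success probability

exact = binom_cdf(2, n, p)        # exact binomial probability
approx = poisson_cdf(2, n * p)    # Poisson approximation with lambda = n * p

print(round(exact, 4), round(approx, 4))  # about 0.1234 and 0.1247
```

The two values differ by roughly 0.001 here; the gap would be expected to shrink as n grows and p shrinks with n times p held fixed.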

And then, in your regression class, you will actually cover modeling counts using kind of a Poisson version of regression. So it's a very convenient model, and it's great for modeling things like rates. And I would also mention this: the Poisson approximation to the binomial is so common that people don't even acknowledge that they're doing it. It's just sort of done immediately in a lot of applications, particularly certain epidemiological applications. If you're studying something like an infection that's rare relative to the size of the population you're studying, people just automatically use Poisson approximations.

Â Coursera provides universal access to the worldâ€™s best education,
partnering with top universities and organizations to offer courses online.