0:00

In this video, we're going to

define the binomial distribution, discuss its properties,

and list conditions required for a

random variable to follow a binomial distribution.

We will also calculate probabilities under the binomial distribution

using web applets, R, as well as doing hand calculations.

Finally, we're going to evaluate characteristics of the binomial

distribution, such as its mean and its standard deviation.

0:37

The Milgram experiments measured the willingness of participants to obey an authority figure

who instructed them to perform acts that conflicted with their personal conscience.

Here is the setup.

The experimenter orders the teacher to give

severe electric shocks to a learner each

time the learner answers a question incorrectly.

The teacher is the subject of the study, and

the learner is actually just an actor. The

electric shocks are not real, but a pre-recorded sound is

played each time the teacher administers an electric shock,

so the teachers actually think that they're really shocking the learner.

Milgram found that about 65% of people

would obey authority and give such shocks. Over the years, additional research

has suggested that this number is approximately consistent across communities and time.

1:29

Each person in Milgram's experiment can be thought of as a trial.

A person is labeled a success if she refuses to administer

a severe shock, and a failure if she administers such a shock.

Since only 35% of people refused to administer such

a shock, the probability of success is p = 0.35.

Note that we're free to define success and failure

however we like; we chose this labeling because in the remainder

of the analysis we're going to focus

on people who refused to administer the shock.

2:09

Suppose we randomly select four

individuals to participate in this experiment.

What is the probability that exactly one

of them will refuse to administer the shock?

Let's name our four individuals Anthony, Brittany, Clara, and Dorian.

We'll refer to them as A, B, C, and D, respectively.

We're interested in one out of four people refusing

to administer the shock, and there are multiple scenarios

where this can happen. So let's run through them.

In scenario number one, we have four people in our experiment.

Let's say that the first person refuses to administer the shock,

and the remaining three all administer the shock.

The probability associated with refusing is 0.35, and

the probability associated with each of the rest is 0.65.

We're saying that the first person will

refuse, and the second person will shock, and the third

person will shock, and the fourth person will shock. When we

say "and", and the trials are independent, which they are here because

these people are a random sample, we multiply the probabilities.

So the probability associated with this first scenario,

first person refuses and everybody else shocks, is 0.0961.
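Written out as a product, the first scenario is a short worked calculation. The event notation below is ours, added for illustration, and is not from the lecture slides:

```latex
P(\text{A refuses, B shocks, C shocks, D shocks})
  = 0.35 \times 0.65 \times 0.65 \times 0.65
  = 0.35 \times 0.65^{3}
  \approx 0.0961
```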

What other scenario can

we think of?

Well, again, we have these four people as our placeholders.

Let's say that the first person shocks, the second person refuses,

and the remainder of them shock.

The probability of shocking for the first person is 0.65, the

probability of refusing is 0.35, and the probability for each of the remaining two is 0.65 as well.

Once again, we multiply these probabilities, and we arrive

at the same probability for the overall scenario.

So even though the order has changed, the overall probability has not changed,

because the order in which you multiply numbers does not change the product.

4:03

Scenario three would be one where you have again four people.

The first two people shock, the third person refuses, and the last one shocks.

So the probabilities are 0.65 for the first two, 0.35 for

the third, and 0.65 for the last one.

Multiplying these probabilities once again gets us to the same answer.

Lastly, in scenario number four, we again have our four people.

This time the first three people,

A, B, and C, shock, and D refuses to shock.

The probability associated with shocking is 0.65 for the first three,

and the probability of refusing is 0.35 for the last person.

Once again we multiply the probabilities, since these are independent outcomes and

we're looking for the joint probability, and the answer once again is 0.0961.

So what's going on here?

What we're saying is that the possible scenarios are scenario number 1

or scenario number 2 or scenario number 3 or scenario number 4.

These are disjoint scenarios, disjoint outcomes:

no two of them can happen at the same time. Therefore, when we say "or", we add the

probabilities, and we find that the overall probability that exactly one

person out of four refuses to administer the shock is 0.3844.

We could have also arrived at this answer as the probability

of the first scenario, or any scenario, times the number of scenarios.

So if we hadn't gone through the scenarios one by one

for illustrative purposes, after we were done

with the first calculation we could have quickly

figured out how many scenarios there are and simply multiplied the probability

of one scenario by the number of scenarios to arrive at the same answer.
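The scenario-by-scenario reasoning above can be checked by brute force. The following is a minimal Python sketch, not something shown in the lecture; the function and constant names are made up for illustration. It lists every possible outcome for the four people, multiplies probabilities within a scenario (independent trials), and adds across scenarios (disjoint outcomes).

```python
from itertools import product

P_REFUSE = 0.35  # probability of "success" (refusing), from the lecture
N_PEOPLE = 4     # Anthony, Brittany, Clara, Dorian


def prob_exactly_k_refuse(k, n=N_PEOPLE, p=P_REFUSE):
    """Sum the probabilities of every scenario with exactly k refusals."""
    total = 0.0
    # Each scenario is a tuple such as (True, False, False, False),
    # where True means that person refuses.
    for scenario in product([True, False], repeat=n):
        if sum(scenario) != k:
            continue
        # Independent trials: multiply the per-person probabilities.
        prob = 1.0
        for refuses in scenario:
            prob *= p if refuses else (1 - p)
        # Disjoint scenarios: add their probabilities.
        total += prob
    return total


# Probability of any single one-refusal scenario, e.g. A refuses, B, C, D shock.
one_scenario = P_REFUSE * (1 - P_REFUSE) ** 3

print(round(one_scenario, 4))
print(round(prob_exactly_k_refuse(1), 4))
```

The unrounded total is about 0.3845. The 0.3844 quoted in the lecture comes from rounding each scenario to 0.0961 before multiplying by four.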

This is a perfect setting for the binomial distribution,

as this distribution describes the probability of having exactly

k successes in n independent Bernoulli trials with probability of success p.

We showed that this probability can be calculated as the product

of the number of scenarios and the probability of a single scenario.

The probability of a single scenario is simply p to

the k times 1 minus p to the n minus k.

Let's decipher what this means. This is the probability of success to

the power of the number of successes, that was our k,

multiplied by the probability of failure to the power of the number of failures.

To find the number of scenarios,

we actually enumerated each possible scenario,

but this was only feasible since there were only four of them.

And to be frank, it was a little tedious and boring as well.

If there were many more, say we were counting the scenarios for

four successes in 100 trials, this method would be very tedious and

also very error prone. Therefore we usually use an alternative

approach, namely the choose function, which calculates the number

of ways to choose k successes in n trials. To evaluate this function,

we divide n factorial by k factorial times

n minus k factorial. Let's give a couple of examples here.

7:18

Say you want to find how many scenarios yield one success in four trials.

Here n is 4 and k is 1; therefore, n choose k is 4 choose 1, which

is 4 factorial divided by 1 factorial times 4 minus 1 factorial.

Expanding these out, we get 4 times 3 times

2 times 1 in the numerator, and 1 factorial,

so that's 1, times 3 factorial, 3 times 2 times 1, in the denominator.

A little bit of simplification here,

and we find that there are four possible scenarios.

We already knew that from the earlier example anyway.

Let's take a look at another example.

How many scenarios yield two successes in nine trials?

n is equal to 9, k is equal to 2. Let's take a look to

see if we can enumerate these easily, just like we did with the earlier example.

The first two might be successes and the remainder failures.

The first one might be a success,

the second one a failure,

the third one a success, and the remainder failures.

We could also have the first and the fourth ones be successes

and the remainder failures, and this could go on and on.

Obviously this is not the way to go, so we'll use the

choose function.

In this case, 9 choose 2 is 9 factorial divided by 2 factorial times 7 factorial.

And we can expand this out to 9 times 8

times 7 factorial, divided by 2 times 1 times 7 factorial.

We purposefully didn't expand everything out, since

the 7 factorials cancel easily, and we're

left with 9 times 8 divided by 2.

So that's 72 divided by 2, a total of 36 scenarios.

These hand calculations are nice, but to

speed things up we can also use computation.

In R the associated function is also called

choose, and it takes two arguments, n and k.

So choose(9, 2),

that is, n is 9 and k is 2, yields the same

36 scenarios.
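For anyone following along without R, here is a rough Python equivalent of the two hand calculations. This is only a sketch: the lecture itself uses R's choose, and the Python helper of the same name below is written here for illustration. Python's standard library also provides math.comb, which computes the same quantity.

```python
from math import comb, factorial


def choose(n, k):
    """n! / (k! * (n - k)!), the formula described in the lecture."""
    return factorial(n) // (factorial(k) * factorial(n - k))


# One success in four trials, and two successes in nine trials.
print(choose(4, 1), comb(4, 1))
print(choose(9, 2), comb(9, 2))
```

Both lines should agree with the hand calculations: 4 scenarios and 36 scenarios.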

9:28

Putting all of this together, if p

represents the probability of success, 1 - p represents the probability of

failure, n represents the number of independent trials, and

k represents the number of successes, the probability

of k successes in n trials can be thought of as n choose k, that is, the number of

scenarios, times the probability of one scenario, made up of p to the kth power

times 1 minus p to the (n minus k)th power. And remember, n choose k is

simply n factorial divided by k factorial times n minus k factorial.
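In symbols, the spoken formula reads as follows. The name X for the number of successes is an assumption made here for notation; p, n, and k are as defined in the lecture:

```latex
P(X = k) = \binom{n}{k} \, p^{k} (1 - p)^{n - k},
\qquad
\binom{n}{k} = \frac{n!}{k! \, (n - k)!}
```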

10:07

Now that we know how to apply these formulas

and calculate binomial probabilities, let's pause for a moment,

step back, and think about what it takes

for a random variable to follow a binomial distribution.

One, the trials must be independent.

Two, the number of trials, n, must be fixed.

Three, each trial outcome must be classified

as either a success or a failure.

And four,

the probability of success, p, must be the same for each trial.

This last condition actually goes hand in hand with

the first one, because if you have independent trials, then

you can be reasonably certain that the probability of

success is going to be the same for each one.

According to a 2013 Gallup poll, worldwide only 13% of employees are engaged at work,

engaged meaning psychologically

committed to their jobs and likely to

be making positive contributions to their organizations.

Among a random sample of ten employees, what is

the probability that eight of them are engaged at work?

First, let's parse through the information that we're given.

We're told that we have ten employees, so n is equal

to 10. We're also told that 13% of employees are engaged,

so the probability of success is 0.13,

and the probability of failure is the complement of this, 0.87.

This value is going to come in handy during our calculations.

Lastly, we're looking for eight successes,

the probability that eight of them are engaged,

so we set our number of successes, k, equal to 8.

11:44

We can find this probability using the binomial distribution, because

we actually meet the conditions required for the binomial distribution.

We have a random sample of employees; therefore the

independent trials condition is met.

And since we have independent trials, the probability of

success is going to be 0.13 for each employee.

For each employee there are only

two possible outcomes: either they're engaged or

they're not engaged.

And lastly, we have a fixed number of trials, n is equal to 10.

Therefore, to find the probability of eight successes

in ten trials, we would first calculate the

number of scenarios using 10 choose 8, and

then multiply that by the probability of one scenario:

the probability of success, 0.13, to the 8th

power, the number of successes, times the probability

of failure, 0.87, to the power of the number of failures, 2.

Expanding out the choose function gives us 10 factorial in

the numerator and 8 factorial times 2 factorial in the denominator.

The rest of it is the same.

Expanding this even further, in the numerator we would get 10 times 9 times

8 factorial, and in the denominator 8 factorial times 2 times 1.

And again the rest of the equation is the same.

The 8 factorials are going to cancel.

Therefore, to calculate the number of scenarios, all

we need to do is multiply 10 by 9.

That's 90; dividing by 2 gives us 45 different scenarios, each

with a probability of 0.13 to the 8th times 0.87 to the 2nd power.

13:23

The result is a pretty tiny probability,

so if that's what you had guessed earlier, you're on the right track.

Why is this such a low probability?

Well, out of ten employees, we would expect far fewer

than eight to be engaged if the probability of success is only 13%.

That's why what we're looking for here is a highly

unlikely outcome, and highly unlikely means a very low probability.

13:51

We can also calculate the same probability using R.

And to use R here we would use the dbinom

function, where the first argument is the number of successes.

The second argument is our sample size or our number of trials, 10.

And the third argument is the probability of success, 0.13.

And once again we get to the very same tiny probability as our response.
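As a cross-check outside R, the same calculation can be sketched in Python using only the standard library. The function name binom_pmf is hypothetical, chosen here for illustration; its arguments follow the same order the lecture gives for dbinom (successes, trials, probability of success).

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    # Number of scenarios times the probability of a single scenario.
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Eight engaged employees out of ten, with p = 0.13.
prob = binom_pmf(8, 10, 0.13)
print(f"{prob:.3e}")
```

The result is on the order of 3 in a million, which matches the "very tiny probability" described above.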

One other approach is to use our distribution calculator applet.

So let's take a look to see how we

can work with the binomial distribution on this applet.

14:23

First, we select our distribution to be the binomial.

We can choose our n, and the default was actually

good enough: n is equal to 10 for our scenario.

And our probability of success is 0.13, so we slide that down to 0.13.

Remember that we're looking for eight successes, so we can

slide our slider for the number of successes over to 8.

However, we're not looking for a tail,

we're actually looking for exactly eight successes.

We can't even see the shaded area here anymore, because the probability of

eight successes is very, very low, so it's almost invisible on our plot here.

But we can see down here that the probability

is calculated to be that very same low probability.

Let's take a moment to look at the plot that we're seeing here.

Each bar

represents a possible outcome, and the height of each

bar in the plot represents the probability of that outcome.

So, for example, if we were looking for a different number of successes,

say, the probability of getting exactly two successes in 10 trials

with the probability of success at 0.13, we would be looking at

the height of the bar corresponding to a number of successes of 2.

And that would

be about 25%.
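Without the applet at hand, the bar heights can be approximated with a few lines of Python. This is only a rough stand-in for the plot, assuming n = 10 and p = 0.13 as in the example; the text bar chart is an illustration and not how the applet draws its plot.

```python
from math import comb

n, p = 10, 0.13

# Height of each bar: P(exactly k successes) for k = 0, 1, ..., n.
heights = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

for k, h in enumerate(heights):
    # Crude text bar chart: one '#' per percentage point, rounded.
    print(f"{k:2d} {h:.4f} {'#' * round(h * 100)}")
```

The bar for 2 successes comes out at roughly 0.25, while the bar for 8 successes rounds to zero characters, which is why the shaded area is invisible on the plot.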

Among a random sample of 100 employees, how

many would you expect to be engaged at work?

Remember, p is equal to 0.13.

Easy enough: the expected number of engaged

employees is going to be 100 times 0.13,

so n times p, which is 13. More formally, the expected value, or the mean,

of the binomial distribution is simply equal to n times p.

But this doesn't mean that in every random sample

of 100 employees exactly 13 will be engaged at work.

In some samples the number of engaged

employees will be fewer, and in others, more.

So how much would we expect this value to vary?

As usual, we can quantify the variability

around the mean using the standard deviation.

And for a binomial distribution, the standard deviation is defined as

the square root of n times p times 1 minus p.

Plugging in the values from the survey, we get the square root

of 100 times 0.13 times 0.87, which is about 3.36. This means that 13 out of

100 employees are expected to be

engaged at work, give or take approximately 3.36.

Note that the mean and the standard deviation of a

binomial might not always be whole numbers, and that's alright.

These values represent what we would expect to see on average.
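The two summary formulas are quick to verify numerically. A minimal Python sketch, using the values from the example (n = 100, p = 0.13); the variable names are ours:

```python
from math import sqrt

n, p = 100, 0.13

mean = n * p                  # expected number of engaged employees
sd = sqrt(n * p * (1 - p))    # standard deviation of that count

print(round(mean, 2), round(sd, 2))   # 13.0 3.36
```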
