0:00

>> We're now going to review some of the basic concepts from probability. We'll discuss expectations and variances, we'll discuss Bayes' theorem, and we'll also review some of the commonly used distributions from probability theory. These include the binomial and Poisson distributions, as well as the normal and log-normal distributions.

First of all, I just want to remind all of us what a cumulative distribution function is.

A CDF, a cumulative distribution function, is F of x. We're going to use F of x to denote the CDF, and we define F of x to be equal to the probability that a random variable X is less than or equal to little x. Okay.

For discrete random variables, we also have what's called a probability mass function, which we'll denote with little p. It satisfies the following properties: p is greater than or equal to 0, and for all events A, we have that the probability that X is in A is equal to the sum of p of x over all those outcomes x that are in the event A. Okay.

The expected value of a discrete random variable X is then given to us by this expression over here. It's the sum of the possible values of the random variable X, these are the xi's, weighted by their probabilities p of xi. So, that's the expected value of X.

To give you an example, suppose we toss a die. It takes on 6 possible values: 1, 2, 3, 4, 5, and 6, and it takes on each of these values with probability 1 6th. So, in this case, for example, the probability that X is greater than or equal to 4 is 1 6th for 4, plus 1 6th for 5, plus 1 6th for 6, so that's equal to 1 6th plus 1 6th plus 1 6th, which equals 1 half. Likewise, we can compute the expected value of X. In this case, it is equal to 1 6th times 1, plus 1 6th times 2, and so on, up to 1 6th times 6, and that comes out to be 3 and a half. Okay.
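As a quick check, this die calculation can be reproduced in a few lines of Python (a minimal sketch; the `pmf` dictionary is just one illustrative way to store the six equally likely outcomes):

```python
from fractions import Fraction

# Fair six-sided die: each outcome 1..6 has probability 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# P(X >= 4) = sum of p(x) over the outcomes in the event {4, 5, 6}.
p_at_least_4 = sum(p for x, p in pmf.items() if x >= 4)

# E[X] = sum of x * p(x) over all outcomes.
expected = sum(x * p for x, p in pmf.items())

print(p_at_least_4)  # 1/2
print(expected)      # 7/2, i.e. 3.5
```

Using `Fraction` keeps the arithmetic exact, so the answers come out as 1/2 and 7/2 rather than floating-point approximations.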

We also have the variance of a random variable. It's defined as the expected value of X minus the expected value of X, all squared. And if you expand this quantity out, you can see that you'll also get this alternative representation: the variance of X is also equal to the expected value of X squared, minus the expected value of X, all squared. Okay.
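Sticking with the die example, we can verify numerically that the definition of the variance and its expanded form agree (a sketch; `var_def` and `var_alt` are names introduced here for illustration):

```python
from fractions import Fraction

# Fair six-sided die, as before.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
mean = sum(x * p for x, p in pmf.items())  # E[X] = 7/2

# Definition: Var(X) = E[(X - E[X])^2]
var_def = sum((x - mean) ** 2 * p for x, p in pmf.items())

# Expanded form: Var(X) = E[X^2] - (E[X])^2
var_alt = sum(x ** 2 * p for x, p in pmf.items()) - mean ** 2

print(var_def, var_alt)  # both 35/12
```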

So, those are discrete random variables, probability mass functions, and so on.

So, let's look at a couple of distributions. The first distribution I want to talk about is the binomial distribution. We say that a random variable X has a binomial distribution, and we write it as X tilde binomial, or Bin(n, p), if the probability that X is equal to r is equal to n choose r, times p to the r, times 1 minus p to the n minus r. And for those of you who have forgotten, n choose r is equal to n factorial divided by r factorial times n minus r factorial.

The binomial distribution arises, for example, in the following situation. Suppose we toss a coin n times and we count the number of heads. Well then, the total number of heads has a binomial distribution. We're assuming here that these are independent coin tosses, so that the result of one coin toss has no impact or influence on the outcome of other coin tosses. The mean and variance of the binomial distribution are given to us by these quantities here: the expected value of X equals np, and the variance of X equals np times 1 minus p.
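The probability mass function just described, along with the stated mean and variance, can be checked directly (a sketch; the parameter values n = 10 and p = 0.3 are arbitrary illustrative choices):

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(X = r) for X ~ Bin(n, p): C(n, r) * p^r * (1 - p)^(n - r)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

n, p = 10, 0.3  # illustrative parameters

# Compute the mean and variance by summing over all possible values of r.
mean = sum(r * binomial_pmf(r, n, p) for r in range(n + 1))
var = sum(r ** 2 * binomial_pmf(r, n, p) for r in range(n + 1)) - mean ** 2

print(round(mean, 6))  # 3.0 == n * p
print(round(var, 6))   # 2.1 == n * p * (1 - p)
```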

Now, there's actually an interesting application of the binomial distribution to finance, and it arises in the context of analyzing fund manager performance. We'll actually return to this example later in the course, but let me just give you a little flavor of it now. Suppose, for example, a fund manager outperforms the market in any given year with probability p, and that she underperforms the market with probability 1 minus p. So, we're assuming here that the fund manager either outperforms or underperforms the market, only two possible outcomes, and that they occur with probabilities p and 1 minus p respectively.

Suppose this fund manager has a track record of ten years, and that she has outperformed the market in eight of these ten years. Moreover, let's assume that the fund manager's performance in any one year is independent of her performance in other years. So, a question that many of us would like to ask is the following: how likely is a track record as good as this, outperforming in eight years out of ten, if the fund manager had no skill? And, of course, if the fund manager had no skill, we could assume that p is equal to 1 half.

Okay. So, we can actually answer this question using the binomial model, or the binomial distribution. Let X be the number of outperforming years. Since the fund manager has no skill and there are ten years, the total number of outperforming years X is then binomial, with n equals 10, the ten years, and p equals a half, okay? So, we can then compute the probability that the fund manager does at least as well as outperforming in eight years out of ten by calculating the probability that X is greater than or equal to 8. What we're doing here is calculating the probability that the fund manager would have 8, 9, or 10 years out of 10 in which she outperformed the market. And that is given to us by the sum of these binomial probabilities here. So, these are the original binomial probabilities from the earlier slide, and we sum them from r equals 8 to n, and n, in this case, of course, is 10, okay? So, that's one way to try and evaluate whether the fund manager has just been lucky or not.
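Here is that calculation carried out directly (a sketch; the sum runs from r equals 8 up to n equals 10, exactly as described):

```python
from math import comb

n, p = 10, 0.5  # ten years, no skill

# P(X >= 8) = sum over r = 8..10 of C(n, r) * p^r * (1 - p)^(n - r)
prob = sum(comb(n, r) * p ** r * (1 - p) ** (n - r) for r in range(8, n + 1))

print(prob)  # 0.0546875, i.e. 56/1024 -- about a 5.5% chance with no skill
```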

One can compute this probability, and if it's very small, then you might conclude that the fund manager was not just lucky and that she had some skill. But actually, this opens up a whole can of worms. There are a lot of other related questions that are very interesting. Suppose there are M fund managers. How well should the best one do over the ten-year period if none of them had any skill? So, in this case, you don't have just one fund manager, as we had in this example so far; we now have M of them, okay? And it stands to reason that even if none of them had any skill, then as M gets large, you would expect at least one of them, or even a few of them, to do very well. Well, how can you analyze that? Again, you can use the binomial model, and what are called order statistics of the binomial model, to do this. And we'll actually return to this question later in the course.
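As a small preview of that analysis, one quantity we can already compute is the probability that at least one of M skill-free managers compiles a track record as good as 8-of-10, assuming their records are independent of each other (the M values chosen are illustrative):

```python
from math import comb

# q = P(a single skill-free manager outperforms in 8+ of the 10 years)
q = sum(comb(10, r) for r in range(8, 11)) / 2 ** 10  # 56/1024

# With M independent managers, the best record includes 8+ outperforming
# years with probability 1 - (1 - q)^M, which grows quickly with M.
probs = {M: 1 - (1 - q) ** M for M in (1, 10, 100)}
for M, p_best in probs.items():
    print(M, round(p_best, 4))
```

Even with no skill anywhere, a large pool of managers makes an impressive-looking track record almost inevitable.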

Okay. So, let's talk about another distribution that often arises in finance and financial engineering, and that is the Poisson distribution. We say that X has a Poisson lambda distribution, so lambda is the parameter of the distribution, if the probability that X equals r is equal to lambda to the power of r, times e to the minus lambda, divided by r factorial. And for those who have forgotten factorials, which I also used in the binomial model a moment ago, r factorial is equal to r times r minus 1, times r minus 2, all the way down to 2 times 1. Okay. So, this is the Poisson distribution.
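This probability mass function is easy to code up, and summing over r lets us check numerically the mean and variance facts stated next (a sketch; lambda equals 2.5 is an arbitrary illustrative value, and the infinite sums are truncated where the tail is negligible):

```python
from math import exp, factorial

def poisson_pmf(r, lam):
    """P(X = r) for X ~ Poisson(lambda): lambda^r * e^(-lambda) / r!."""
    return lam ** r * exp(-lam) / factorial(r)

lam = 2.5  # illustrative parameter

# Truncate the infinite sums at r = 99; the tail is negligible for this lambda.
mean = sum(r * poisson_pmf(r, lam) for r in range(100))
var = sum(r ** 2 * poisson_pmf(r, lam) for r in range(100)) - mean ** 2

print(round(mean, 6))  # 2.5
print(round(var, 6))   # 2.5
```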

The expected value and the variance of a Poisson random variable are identical, and both are equal to lambda. We'll actually just show this result here for the mean; it's very simple, and it's calculated as follows. We know that the expected value of X is equal to the sum of the possible values of X, so these are the r's, times the probability that X is equal to r, where r runs from 0 to infinity.

We can calculate that as follows. So, we have the summation of r times the probability that X equals r, which we know from up here, and we can substitute that down in here, and now we just evaluate the sum. The first thing to notice is that when r equals 0, this term in the sum is equal to 0. So, we can actually ignore the first element, the 0 element, and have the summation run from r equals 1. We then get this quantity here. We can cancel this r with the first r up here and write this as r minus 1 factorial.

We can also pull one of these lambdas out here, leaving us with lambda to the r minus 1. And now, if we look at this summation here, we see that it is the same sum we get by shifting the index: replacing r minus 1 with r, so that the summation runs from r equals 0, and replacing r minus 1 factorial with r factorial. This total, we see, is equal to the sum of the probabilities that X equals r, so this is the sum of the probabilities that X equals 0, X equals 1, X equals 2, and so on, and so this is equal to 1. The total sum of probabilities must be equal to 1, so the whole expression is equal to lambda.

Okay, let's talk a little bit now about Bayes' theorem. Let A and B be two events for which the probability of B is nonzero. Then the probability of A given B, and this is notation we'll use throughout the course, this vertical line means it's a conditional probability, so it's the probability of A given that B has occurred, well, this is equal to the probability of A intersection B divided by the probability of B. Alternatively, we can write this numerator, the probability of A intersection B, as the probability of B given A times the probability of A. So, this is another way to write Bayes' theorem. And finally, if we like, we can expand the denominator here, the probability of B, and write it as the summation of the probability of B given Aj, times the probability of Aj, where we sum over all the Aj's, and where the Aj's form a partition of the sample space. What do I mean by partition?

Well, I mean the following: Ai intersection Aj is equal to the null set for i not equal to j, and at least one Ai must occur. And, in fact, because Ai intersection Aj is equal to the null set for i not equal to j, I can actually replace this condition with the following: exactly one Ai must occur. Okay. So, that's Bayes' theorem.

Let's look at an example. Here's an example where we're going to toss 2 fair 6-sided dice. Y1 is going to be the outcome of the first toss, and Y2 will be the outcome of the second toss. X is equal to the sum of the two, and that's what we've plotted in the table here. So, for example, the 9 here comes from a 5 on the first toss and a 4 on the second toss: 5 plus 4 equals 9.

So, that's X equals Y1 plus Y2. The question we're interested in answering is the following: what is the probability of Y1 being greater than or equal to 4, given that X is greater than or equal to 8? Well, we can answer this using the conditional probability formula from the previous slide. So, this is equal to the probability that Y1 is greater than or equal to 4 and X is greater than or equal to 8, divided by the probability that X is greater than or equal to 8. Okay. So, how do we calculate these two quantities? Let's look at the numerator first of all. We need two events here: Y1 being greater than or equal to 4, and X being greater than or equal to 8.

Okay. So, the first event is clearly captured inside this box here, because this corresponds to Y1 being greater than or equal to 4; all of these outcomes correspond to that event. The event that X is greater than or equal to 8 corresponds to these outcomes over here. So therefore, the intersection of these two events, where Y1 is greater than or equal to 4 and X is greater than or equal to 8, is this area here, which is very light, so let me shade it a little bit darker.

So, it's this area here. Now, each of these cells is equally probable and occurs with probability 1 over 36. There are a total of 3 plus 4 plus 5 cells, that is, 12 cells here. So, the numerator occurs with probability 12 over 36.

And the denominator, the probability that X is greater than or equal to 8, well, that's what we highlighted in red here. For the probability of that occurring, well, there are the 12 cells plus these 3 additional outcomes, so 15 outcomes in total, giving 15 over 36. Dividing, 12 over 36 divided by 15 over 36 is 12 over 15, and that is equal to 4 over 5. So, that's our application of Bayes' theorem. Okay.
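Because the sample space here is just 36 equally likely cells, the whole calculation can be checked by brute-force enumeration (a minimal sketch):

```python
from fractions import Fraction

# Enumerate all 36 equally likely outcomes of two fair dice.
outcomes = [(y1, y2) for y1 in range(1, 7) for y2 in range(1, 7)]

both = sum(1 for y1, y2 in outcomes if y1 >= 4 and y1 + y2 >= 8)  # numerator event
cond = sum(1 for y1, y2 in outcomes if y1 + y2 >= 8)              # conditioning event

# P(Y1 >= 4 | X >= 8) = P(Y1 >= 4 and X >= 8) / P(X >= 8)
print(Fraction(both, cond))  # 4/5  (12 cells out of 15)
```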

So, let me talk a little about continuous random variables. We say a continuous random variable X has a probability density function, or PDF, f, if f of x is greater than or equal to 0 and, for all events A, the probability that X is in A, or the probability that the event A has occurred, is the integral of the density f of y dy over A.

The CDF, the cumulative distribution function, and the PDF are related as follows: capital F of x is equal to the integral from minus infinity to little x of f of y dy. And, of course, that's because we know that F of x, by definition, is equal to the probability that X is less than or equal to little x, so this, of course, is equal to the probability that minus infinity is less than or equal to X, which is less than or equal to little x. So, our event A here is the event that the random variable X lies between minus infinity and little x, and applying the definition above gives us what we have over here.

It's often convenient to recognize the following: the probability that X is in this little interval here, from x minus epsilon over 2 to x plus epsilon over 2, is equal to the integral from x minus epsilon over 2 to x plus epsilon over 2 of f of y dy, okay? And if you like, we can draw something like this. So, this could be the density, f of x. This is x here; maybe we've got some point here, which is little x, and this is x minus epsilon over 2, and this is x plus epsilon over 2. So, in fact, what we're saying is that the probability is this shaded area, and it's roughly equal to f of x times epsilon, where epsilon is the width of this interval here, okay? And, of course, the approximation clearly works much better as epsilon gets very small. Okay. So, those are continuous random variables.

Let me talk briefly about the normal distribution. We say that X has a normal distribution, or write X tilde N(mu, sigma squared), if it has this density function here: f of x equals 1 over the square root of 2 pi sigma squared, times the exponential of minus x minus mu, all squared, divided by 2 sigma squared. The mean and variance are given to us by mu and sigma squared respectively. The normal distribution is a very important distribution in practice. Its mean is at mu; its mode, the highest point in the density, is also at mu; and approximately 95% of the probability actually lies within plus or minus 2 standard deviations of the mean. So, this probability is approximately equal to 95% for a normal distribution.
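The 95% figure can be checked with the standard error-function identity: for a normal random variable, the probability of landing within 2 standard deviations of the mean is erf of 2 over root 2, independent of mu and sigma (a quick sketch):

```python
from math import erf, sqrt

# For X ~ N(mu, sigma^2), P(mu - 2*sigma <= X <= mu + 2*sigma)
# equals erf(2 / sqrt(2)), whatever the values of mu and sigma.
prob = erf(2 / sqrt(2))
print(round(prob, 4))  # 0.9545 -- approximately 95%
```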

Okay. So, this is a very famous distribution, and it arises an awful lot in finance. It certainly has its weaknesses, and we'll discuss some of them later in the course.

A related distribution is the log-normal distribution. We write that X has a log-normal distribution with parameters mu and sigma squared if the log of X is normally distributed with mean mu and variance sigma squared. The mean and variance of the log-normal distribution are given to us by these two quantities here, and again, the log-normal distribution plays a very important role in financial applications.
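The two quantities referred to here are the standard log-normal moments: the mean is e to the mu plus sigma squared over 2, and the variance is e to the sigma squared, minus 1, times e to the 2 mu plus sigma squared. A quick simulation sanity-checks the mean formula (a sketch; mu equals 0.1 and sigma equals 0.3 are arbitrary illustrative parameters):

```python
import random
from math import exp

mu, sigma = 0.1, 0.3  # illustrative parameters

# Closed-form moments of the log-normal distribution.
mean = exp(mu + sigma ** 2 / 2)
var = (exp(sigma ** 2) - 1) * exp(2 * mu + sigma ** 2)

# Monte Carlo sanity check: X = e^Z with Z ~ N(mu, sigma^2).
random.seed(0)
samples = [exp(random.gauss(mu, sigma)) for _ in range(200_000)]
mc_mean = sum(samples) / len(samples)

print(round(mean, 4), round(mc_mean, 4))  # the two means should be close
```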

Â