2:36

Or you can do the sum, substitute p sub n k in there again, and get back to P of z, u. Okay, so that's enumeration.

And again, this math over here is not really needed except to check your understanding of what's going on. Now let's look at the cumulated cost.

The way that we compute averages when using generating functions is to take the total cost in the objects of size n and divide that by the number of objects of size n. That's the average.

That's a little bit different calculation than working with probabilities, but since we have this powerful mechanism for counting, for getting counting results, it's natural to do it, and we can see from the ordinary BGF how this works. So we're interested in the total cost in objects of size n.

So, again, that's: for all values of k, we multiply the number of objects of size n and cost k by k; that gives the cost for all of those objects; and then we sum over all values of k. We'll just call that q sub n.

What's that value in terms of the generating function? Well, what we do is differentiate with respect to u, that's the marker variable for the cost, and then evaluate at u equals 1.

So that's the partial with respect to u of the ordinary bivariate generating function, evaluated at u equals 1. Just from the definition, that's the sum, over all objects, of the cost of the object times z to the size of the object, and again, if you collect all the objects of a given size n, that's exactly the same as q sub n z to the n.

So it's the coefficient of z to the n in the partial with respect to u of the OBGF, evaluated at u equals 1. Or you can look at this representation of the generating function, sorry, this one at the top: if you differentiate with respect to u, it brings down a k, so you get k times p sub n k, and that's the familiar way to think of the total cost. But in terms of the bivariate generating function, it's a simple partial derivative, and then evaluate at 1.

So that gives us what we need to calculate the mean cost of objects of size n: we take the coefficient of z to the n in the partial with respect to u of the OBGF, evaluated at u equals 1, and divide by the coefficient of z to the n in the OBGF evaluated at u equals 1. And that's, in many cases, a simple calculation.

And again, that's the average, which maybe you're more familiar with in the form over here, sum of k p sub n k over p sub n. And we can calculate the variance in the same way.

The definition of the variance is the sum on k of k minus the mean, squared, times the probability. In terms of the ordinary bivariate generating function, it's not hard to check that that's the second partial with respect to u, evaluated at u equals 1, plus the mean, minus the mean squared, just as with normal probability calculations.
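As a quick numerical sanity check (my own illustration, not part of the lecture), these formulas can be verified directly from the counts p sub n k. Here's a minimal Python sketch using bit strings, where the number of strings of length n with k zero bits is the binomial coefficient C(n, k):

```python
from math import comb

# Numerical check of the moment formulas for bit strings of length n,
# where p[k] = number of strings with k zero bits = C(n, k).
n = 10
p = [comb(n, k) for k in range(n + 1)]

total = sum(p)                                 # enumeration: p_n = 2^n
cum = sum(k * c for k, c in enumerate(p))      # accumulated cost q_n
mean = cum / total                             # mean = q_n / p_n

# second factorial moment: what the second partial derivative with
# respect to u delivers, divided by the count
m2 = sum(k * (k - 1) * c for k, c in enumerate(p)) / total
variance = m2 + mean - mean * mean             # m2 + mean - mean^2
print(total, cum, mean, variance)              # 1024 5120 5.0 2.5
```

The mean comes out to n over 2 and the variance to n over 4, as the binomial distribution predicts.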

So that's an overview of calculating moments from an ordinary bivariate generating function: just use partial derivatives with respect to the variable that marks the cost, and then evaluate at that variable equals 1. And we'll look at an example next.

So this is our bit strings example: what's the number of zero bits in a random binary bit string? We talked in the last section about our construction and the OBGF equation, 1 over (1 minus z(1 plus u)).

So now we'll start with this equation and calculate the average number of zero bits in a bit string. And again, we know the answer to this: it's going to be n over 2. But this will illustrate the calculation, and we'll use the very same straightforward method for more difficult problems. So, how many binary bit strings are there?

Well, we evaluate at u equals 1. So that's 1 over (1 minus 2z); then take the coefficient of z to the n, and that's 2 to the n, as expected.

To do the cumulated cost, the total number of zero bits in all bit strings of length n, we need to compute the partial with respect to u of the OBGF. So, to compute the partial with respect to u: it's 1 over (1 minus z minus zu), and all that's relevant is the minus zu. Write it as (1 minus z minus zu) to the minus 1 power; differentiating gives minus 1 over that quantity squared, times the minus z, and the two minuses cancel. So the partial with respect to u is, over here on the right, z over (1 minus z minus zu) squared.

And to calculate the accumulated cost, we just evaluate that at u equals 1. With u equals 1, that's z over (1 minus 2z) squared. So we want, for the accumulated cost, the coefficient of z to the n in z over (1 minus 2z) squared, and that's an elementary calculation from standard power series: it's n times 2 to the n minus 1.

And so now the average is just dividing those two, and that's n over 2, as expected. So it's just computing the partial with respect to u, which, you know, once you've practiced a few times and remembered the definitions, is not so difficult a calculation at all, and it's certainly one that can be done automatically nowadays.

We're not going to compute the variance, because there's an easier way to do it that we'll talk about in just a second.

Okay, now, I mentioned the idea of horizontal and vertical generating functions that go with bivariate generating functions, and so now I want to take a look at moment calculation using these methods.

So let's look at the moment calculations in terms of the horizontal ordinary generating function. That's a generating function that gives the cost for objects of a given size.

What we do is take the coefficient of z to the n in the bivariate generating function, and we'll call that little p sub n of u. It's a generating function for the cost in all the objects of size n, and it's just the sum, over all objects of size n, of u to the cost of that object. And again, in the cheat sheet over here, in terms of p sub n k, that's the sum on k of p sub n k times u to the k.

So it's the generating function for cost, and it's only got one variable. Since we're interested in knowing the number of objects of size n and the average cost of an object of size n, we can use that generating function directly to get the answer that we want.

So first, enumeration: if you evaluate that at u equals 1, you get the number of objects of size n. Evaluating at u equals 1 is summing, for all values of k, the number of objects of size n and cost k, and that's equal to the number of objects of size n. So this p sub n of u is useful both for enumeration and for accumulated cost. If you just differentiate the function and evaluate it at 1, then that's the sum of k p sub n k, which is your accumulated cost, q sub n. So we know that p sub n of u, evaluated at 1, enumerates; differentiate and evaluate at 1 to get the accumulated cost; and just divide those two to get the mean. An even simpler calculation, usually.

And then the calculation for the variance just involves the second derivative. And again, that's easy to check from the definition of the variance. So this method, using horizontal ordinary generating functions, is the one that we use most often to calculate average values and other moments of parameters.
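To make the horizontal method concrete, here's a small Python sketch (an illustration of mine, not the lecture's code) that treats p sub n of u as a coefficient list for the bit-strings example, where p sub n of u is (1 plus u) to the n:

```python
from math import comb

def poly_eval(coeffs, x):
    # Horner evaluation of sum_k coeffs[k] * x^k
    acc = 0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc

def poly_diff(coeffs):
    # coefficient list of the derivative polynomial
    return [k * c for k, c in enumerate(coeffs)][1:]

# Horizontal OGF for bit strings: p_n(u) = sum_k C(n,k) u^k = (1+u)^n
n = 12
p_n = [comb(n, k) for k in range(n + 1)]

count = poly_eval(p_n, 1)                   # enumeration: p_n(1) = 2^n
d1 = poly_diff(p_n)
cum = poly_eval(d1, 1)                      # accumulated cost: p_n'(1)
mean = cum / count
m2 = poly_eval(poly_diff(d1), 1) / count    # second factorial moment
variance = m2 + mean - mean ** 2
print(mean, variance)                       # n/2 = 6.0 and n/4 = 3.0
```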

Â 14:08

Now, we can also do moment calculations with vertical ordinary generating functions. I put this slide up for completeness; I'm not going to spend a lot of time on it, because in the lectures we're not actually going to do any derivations this way. But it's the same idea, just on the other variable, the vertical instead of the horizontal. If you take the coefficient of u to the k in the ordinary bivariate generating function, you get the generating function for the objects of cost k. It's the same idea, you're just holding the cost constant. That's the sum on n of p sub n k z to the n.

And so, to enumerate, we do it the same way, with the bivariate generating function. But the accumulated cost is the coefficient of z to the n in the sum on k of k times q sub k of z, where q sub k of z is the vertical generating function. So we have to extract another coefficient, but there are some problems where it's easier to do the calculation this way. And it gets down to the mean again, and you could do the variance the same way, but I'm going to skip it.

So again, this is just for completeness; this is not the way to determine the average number of zeros in a bit string, but it's mechanical how you might do it, and it illustrates the idea. If we pull out the coefficient of u to the k, we get the vertical generating function. In this case it's z to the k over (1 minus z) to the k plus 1, and then we enumerate the same way as before to get the 2 to the n at the end. But the cumulated cost is the coefficient of z to the n in the sum on k of k times the vertical OGF.

And that's a familiar geometric series with an extra factor. Here are the details, for those of you who are used to those sums: that sum evaluates to z over (1 minus 2z) squared. The coefficient of z to the n in that is n times 2 to the n minus 1, so we get n over 2 as before.

And again, in this case it's more work to get the same answer, but in some cases it's less work. It's just a matter of the order in which we extract the coefficients; usually it's better to use the horizontal, where we extract the coefficient of z to the n first.
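For the record, here's a tiny Python check (mine, for illustration) of that vertical calculation, using the fact that the coefficient of z to the n in z to the k over (1 minus z) to the k plus 1 is C(n, k):

```python
from math import comb

# Vertical check: [z^n] z^k/(1-z)^(k+1) = C(n, k), so the cumulated
# cost sum over k of k * C(n, k) should equal n * 2^(n-1).
n = 9
cum = sum(k * comb(n, k) for k in range(n + 1))
print(cum)              # n * 2^(n-1) = 2304
print(cum / 2 ** n)     # mean: n/2 = 4.5
```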

Â 16:51

And in this simple example we're always getting n over 2. So why the focus on the moments? Well, we're always interested in the average, what value can we expect, and this is classical in probability, so I'm just going to take a slide to mention these classical results.

So, if we have a random object of size N with mean mu sub N and standard deviation sigma sub N, then we have these famous inequalities, the Markov inequality and the Chebyshev inequality. They say that the probability of being much larger or smaller than the mean must decay, and an upper bound on the rate of the decay is measured in units given by the standard deviation. That's why the standard deviation is important. The Markov inequality says that you're not likely to be many times larger than the mean: the probability of exceeding t times the mean is at most 1 over t. The Chebyshev inequality says that the probability that you're t standard deviations away from the mean is less than 1 over t squared.

Actually, we can get much more accurate results than these for many of the things we study, because we have the full distribution. But still, the basic idea is the same, and what we say is that if the standard deviation is of smaller order of growth than the mean, then we call the distribution concentrated, because that means that as n gets larger, the random value tends towards the mean. We can think of the mean value as a typical value. We could calculate out some specific bounds, but usually it suffices, in the analysis, to learn that the distribution is concentrated, and we move on, knowing that we could do more detailed analysis if we wanted.

So this is just a formal statement of that: if a distribution is concentrated, then the value of your random variable over the mean approaches 1 in probability, in a very specific sense. So, when it's concentrated, we say the expected value is typical. That's the one that you really do expect to get.

And this is just an example for random bits. If you have 100,000,000 random bits, you expect about half of them to be one bits and about half of them to be zero bits. The standard deviation of that is the square root of N over 2, so it's only about 5,000. So that says that the probability that the random value you get is between 49,900,000 and 50,100,000 is better than 99%. That's the kind of result everyone wants in practice: with really high probability, we have a good idea of what the value of the random variable is going to be. That's what we care about for concentrated distributions.

So this is just a plot of this distribution, which is the binomial distribution. As n increases, there's another curve, and what it shows is that as n gets higher, the curves come tighter and tighter together towards the mean. And that's just true in many, many cases. In fact, this particular curve describes lots of distributions; we'll get to that later.

All right, so here's just another example, really generalizing our number-of-zero-bits example. Let's talk about the class of all M-words. That's just strings with, you know, every integer between 1 and M, or sequences of sets, as we talked about last time.

The size, again, is going to be the number of letters, the length of the word, but let's take as a parameter the number of occurrences of a particular letter in the word. So for our bivariate generating function: to construct it, we use u to mark one of the letters, a particular letter, and then we don't mark all the others. So that's (m minus 1)z plus uz. We had m equals 2 before, where we used z plus uz; now we have general m, so it's (m minus 1)z plus uz. And that immediately translates to 1 over (1 minus (m minus 1)z minus uz). We have (m minus 1)z and uz, and that's it.

So now that's our ordinary bivariate generating function. If we want to enumerate, we just evaluate it at u equals 1. Evaluating at u equals 1, we then want the coefficient of z to the n in 1 over (1 minus mz). That's m to the n, so that checks. And if we want the accumulated cost, we differentiate with respect to u and find the coefficient of z to the n. That's the same kind of calculation that we just did for m equals 2, and we get n times m to the n minus 1. So the average number of occurrences of a given letter, in a random M-word with n letters, is n over m.

And again, that's as expected. And we can also go ahead and compute the variance, and it's n over m minus n over m squared. The square root of that is proportional to the square root of n, so it's concentrated. And that's a direct analysis using the symbolic method, followed by calculations using partial derivatives: the basic method that we're going to use to evaluate parameters.

And I mentioned the practical application of hashing algorithms. This is from our algorithms books, or Knuth's books, or any book on algorithms: we organize a table of values by hashing them to random values between 0 and M minus 1. We put N items into M lists, and we're interested in the lengths of all the lists, because we're going to have to search the lists sequentially. So, for that strategy, what we're interested in is the average number of keys on a list. And again, it's trivial that the average is N over M, but that's also what we just proved with the symbolic method. The extra analysis to get the standard deviation tells us that it's concentrated, which means that as N increases, that value is going to be typical. And that's useful to know in practice: we're not likely to have the case where the keys all fall on one list and none fall on the others. So that's just one practical example among the many applications that we look at.
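As a rough illustration (a sketch of my own, not code from the lecture), simulating separate chaining shows the list lengths clustering near N over M, just as the concentration result predicts:

```python
import random

# Throw n keys into m lists at random, as in hashing with separate
# chaining, and look at how the list lengths cluster around n/m.
random.seed(42)
n, m = 100_000, 1_000
lengths = [0] * m
for _ in range(n):
    lengths[random.randrange(m)] += 1   # stand-in for a hash function

# mean list length is n/m = 100; min and max stay close to it
print(min(lengths), max(lengths))
```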

What we want to do is find the average value and show that the standard deviation is of small order, and that's the case in many, many of the applications that we study.

So that's the process of calculating moments from the ordinary bivariate generating function, again with the illustration from familiar binary bit strings, but also one practical application. Next, we'll look at applying these same kinds of calculations in other applications.
