
Over the years many natural cryptographic constructions were found to be insecure. In response, modern cryptography was developed as a rigorous science where constructions are always accompanied by a proof of security. The language used to describe security relies on discrete probability. In this segment and the next, I'll give a brief overview of discrete probability, and I point to this Wikibooks article over here for a longer introduction.

Discrete probability is always defined over a universe, which I'll denote by U, and this universe in our case is always going to be a finite set. In fact, very commonly our universe is going to be simply the set of all n-bit strings, which here is denoted by {0,1}^n. So for example, the set {0,1}^2 is the set of all 2-bit strings, which happens to be the strings 00, 01, 10, and 11. So there are four elements in this set, and more generally, in the set {0,1}^n there are 2^n elements.

Now, a probability distribution over this universe U is simply a function, which I'll denote by P, and what this function does is assign to every element in the universe a number between zero and one. This number is what I'll call the weight, or the probability, of that particular element in the universe. There's only one requirement on this function P, and that is that all the weights sum to one. That is, if I sum the probability of all elements x in the universe, what I end up with is the number one.

So let's look at a very simple example, going back to our 2-bit universe of 00, 01, 10, and 11. Consider the following probability distribution, which assigns to the element 00 the probability one half; to the element 01 we assign the probability one eighth; to 10 we assign the probability one quarter; and to 11 we assign the probability one eighth. We can see that if we sum up these numbers we in fact get one, which means that this P is in fact a probability distribution. What these numbers mean is that if I sample from this probability distribution, I'll get the string 00 with probability one half, I'll get the string 01 with probability one eighth, and so on.

Â two classic examples of probability distributions. The first one is what's

Â called the uniform distribution. The uniform distribution assigns to every

Â element in the universe, exactly the same weight. I'm gonna use U between two bars

Â to denote the size of the universe U. That is the number of elements in the universe,

Â and since we want the sum of all the weights to sum out to one, and we want all

Â these weights to be equal, what this means is that for every element X in the

Â universe, we assign a probability of one over U. So in particular if we look at our

Â example, the uniform distribution and the set of two [inaudible] strings, would

Â simply assign one-quarter the weight, one-quarter to each one of these strings

Â And clearly that, the sum of all the weights sums up to one. Well again, what

Â this means is that if I sample at random from this distribution, I'll get a uniform

Â sample across all our 2-bit strings So all of these 4-bit strings are equally likely

Â to be sampled by this distribution. Another distribution that's very common is

Â what's called a point distribution at the point X0 And what this point distribution

Â does is basically it puts all the weight on a single point, namely X0. So here we

Â assign to the point X0 all the weight, one And then to all other points in the

Â universe, we assign the weight zero And by the way, I want to point out that this,

Â inverted, A here should be read as, for all. So all this says is, that for all X

Â that are not equal to X zero, the probability of that X is zero. So again

Â going back to our example a point distribution for example, that would put

Â all its mass on the string 1-0, would assign probability one to the string 1-0

Â and zero to all other strings. So now if I sample from this distribution pretty much

Â I'm always guaranteed to always sample the string 1-0 and never sample any of the

Â other strings. So now we know what a distribution is, and I just want to make

Â one last point, and that is that because this universe U is always gonna be a

Â finite set up for us, we can actually write down the weights that the

Â distribution assigns to every element in U, and represent the entire distribution

Â as a vector. So, here for example, if you look at the universe of an all 3-bit

Â strings, we can literally write down the ways that the distribution assigns to the

Â string 000, then the way that distribution assigns to the string 001 And so on, and

Â so forth. We you can see that we can write this as a vector, in this case it will be

Â a vector of dimension eight, there will be, there are eight strings of 3-bits as a

Â result basically the entire distribution is captured by this vector of eight real

Â numbers, in the range of all zero to one. The next thing I wanna do is define the

Â concept of an event. So consider a subset A of our universe And I, I'll define the

Â probability of the subsets to be simply the sum of the weights of all the elements

Â in the set A. In other words, I'm summing over all X and A, the weights of these

Â elements X in the set A, Now because the sum over the entire universe of all

Â weights needs to be one. This means that if we sum, well if you look at the

Â probability of the entire universe, basically we get one. And if we look at

Â the probability of a subset of the universe, we're gonna get some number in

Â the interval zero to one And we say that the probability of this set A, is the sum

Â which is a number between zero and one. And I'll tell you that a subset A of the

Â universe is called an event. And the probability of the set A is called the

Â probability of that event. So let's look at a simple example. So suppose we look at

Â the universe u, which consists of all 8-bit strings, right? So the size of this

Â universe u is 256 because there are 256 8-bit strings. Essentially we're looking

Â at all byte values, all 256 possible byte values. Now let's define the following

Â event. Basically the event is gonna contain all bytes so all [inaudible]

Â extremes in the universe such that the two least significant bits of the byte happens

Â to be eleven So for example, if we look at 01011010 that's an element in the universe

Â that's not in the set A, but if we choose a zero to a one. Then that's an element of

Â the universe which gives in our set A. And now let's look at the uniform distribution

Â over the universe U and let me ask you what is the probability of the, of the

Â event A? So what is the probability that when we choose a random byte, the two

Â least significant bits of that byte happens to be one, one? Well the answer is

Â one-fourth, and the reason that's true is because it's not too difficult to convince

Â yourself that of the 256 eight bit strings, exactly 64 of them, one quarter

Â of them, end in one, one. And the probability of each string is, we're

Â looking at the uniform distribution or probability of each string is exactly one

Â over the size of the universe, mainly one over 256. And the product of these, you

Â know, 64 elements, each one has weight one over 256 is exactly one-fourth, which is

Â the probability of the event A that we're looking at. So a very simple bound on the

Â probability of events is called the union bound. So imagine we have two events a1

Â and a2. So these are both subsets of some universe U Snd we wanna know what is the

Â probability that either A1 occurs, or A2 occurs In other words, what is the

Â probability of the union of these two events? This little U here denotes the

Â union of the two sets. So the union bound tells us that the probability that either

Â A1 occurs or A2 occurs is basically less than the sum of the two probabilities. And

Â that's actually quite easy to see. So simply look at this picture here, you can

Â see that when we look at, at the sum of the two probabilities, we're basically

Â summing the probability of all the elements in A1, all the elements in A2 And

Â you realized, we kind of double-summed these elements in the intersec tion. They

Â get summed twice here on the right hand side And as a result, the sum of the two

Â probabilities is going to be larger or larger than or equal to, the actual

Â probability of the union of A1 and A2. So that's the classic union bound And in fact

Â I'll tell you that if the two events are disjoint, in other words they're

Â intersection is empty, in that case if we look at the sum, at the probability that

Â either A-1 or A02 happens, that's exactly equal to the sum of the two probabilities.

Â Okay? So we'll use these facts here and there throughout the course. So just to be

Â clear, the inequality holds always But when the two events are disjoint, then in

Â fact we get an equality over here. So let's look at a simple example. Suppose

Â our, event A1 is the set of all n-bit stings that happen to end in 1-1 And

Â suppose A2 is the set of all n-bit strings that happen to begin with 1-1. Okay, so N

Â thinks of it as H or some large number and I'm asking that what is the probability

Â that either a one happens or a two happens, In other words if I sample

Â uniformly from the universe U, what is the probability that either the least

Â significant bits are one, one or the most significant digits are one, one. Well as

Â we said that's basically the probability of the union of A1 and A2. We know that

Â the probability of each one of these events is one-quarter by what we just did

Â previous slide And therefore by the union [inaudible] the probability of the

Â [inaudible]. Is, you know, a quarter of the probability of A1, plus the

Â probability of A2, which is a quarter plus a quarter. And we just proved that the

Â probability of seeing 1-1 in the most significant bit, or 1-1 least significant

Â bit, is less than one-half. So that's a simple example of how we might go about

Â using the Union Bound to bound the probability that one of two events might

Â happen. The next concept we need to define, is what's called a random

Â variable. Now, random variables are fairly intuitive objects. But unfortunately the

Â formal definition of a random variable can be a little c onfusing. So what I'll do

Â is, I'll give an example, and hopefully that will be clear enough. So formally, a

Â random variable denoted say, by X. Is a function, from the universe into some set

Â V And we say that this set V is where the random variable takes its values. So let's

Â look at a particular example. So suppose we have a random variable x And this

Â random variable maps into the set 01. So the values of this random variable are

Â going to be either zero or one. So, one bit, basically. Now, this random variable

Â maps our universe, which is the center of all end bit binary strings, 01 to the end

Â And how does it do it? Well, given a particular sample in the universe, a

Â particular end-bit string y. What the random variable will do is simply output

Â the least significant bit of y And that's it. That's the whole random variable. So,

Â now let me ask you. Suppose we look at a uniform distribution on the set zero one

Â to the end. Let me ask you what is the probability that this random variable

Â output zero and what is the probability that a random variable outputs one? Well

Â you can see that the answers are half and half. Well let's just lead them through

Â why that's the case. So here we have a picture showing the universe and the

Â possible alpha space. And so in this case the variable can output a zero or a one.

Â When there is a variable output zero, there is a variable output zero when the

Â sample in the universe happens to be, to have its least inefficient [inaudible] bid

Â be set to zero. In variable one, output one When the sample in the universe

Â happens to have its least significant bit set to one. Well, if choose strings

Â uniformly at random, the probability that we choose a string that has its least

Â significant bits set to zero is exactly one half Which the random variable output

Â zero with a probability of exactly one-half. Similarly, if we choose a random

Â end-bit string, the probability that the least significant bit is equal to one is

Â also one-half. And so we say that the random variable output's one, also with

Â exactly proba bility of one-half. Now, more generally, if we have a random

Â variable taking values in a certain set v, then this random variable actually induces

Â a distribution on this set v. And here, I just wrote a, kind of a, in symbols, what

Â this distribution means But it's actually very easy to explain. Essentially, what it

Â says is that the variable outputs v Basically, with the same probability that

Â if we sample a random element in the universe, and, and then we apply the

Â function x. We ask, how likely is it that the output is actually=to v? So formally

Â we say that the probability that X outputs V, is the same as the probability of the

Â event That when we sample a random element in the universe, we fall into the pre

Â image of V under the function X And again, if this wasn't clear, it's not that

Â important. All you need to know is that a random variable takes values in a

Â particular set V, and in, induces a distribution on that set V. Now there's a

Â particularly important random variable called a uniform random variable. And it's

Â basically defined as you would expect. So let's say that U is some fine [inaudible]

Â set For example the set of all N bit binary strings, and we're gonna denote a

Â random variable R that's basically sample's uniformly from the set U by this

Â little funny arrow with a little R on top of it. In this, again the notes that the

Â random variable R is literally a uniform random variable sampled over the set U. So

Â in symbols what's this means is that for all elements A in the universe, the

Â probability that R is equal to A is simply R one over U. And if you want to stick to

Â the formal definition of a, of a uniform variable, it's not actually that important

Â But I would just say that formally the uniform random variable is an identity

Â function namely R [inaudible] is equal to X for all X in the universe So just to see

Â that this is clear, let me ask you a simple puzzle. Suppose we have a uniform

Â random variable over 2-bit strings, so over the set, 01, ten, one and now, let's

Â define a new random variable X to basicall y sum the first and second bits of R. That

Â is, X simply is the sum of R1 and R2, the first and second bits of R, treating those

Â bits as integers. So, for example, if, r happens to be 00, then x will be zero+0,

Â which is zero. So let me ask you, what is the probability that x is = to two? So

Â it's not difficult to see that the answer is exactly, one-fourth because, basically

Â the only way that x is equal to two is if r happens to be 1,1 but the probability

Â that r is equal to 1,1 is basically one-fourth because r is uniform over the

Â set of all two bit stings. The last concept I want to define in this segment

Â is what's called a randomized algorithm. So I'm sure you're all familiar with

Â deterministic algorithms. These are algorithms that basically take a

Â particular, input data, as input, and they always produce the same output, say Y. So

Â if we run the algorithm a hundred times on the same input, we'll always get the same

Â output. So you can think of a deterministic algorithm as a function that

Â given a particular input data, M, will always produce exactly the same output, A

Â of M. A randomized algorithm is a little different, in that, as before, it takes

Â the [inaudible] and as input, but it also has an implicit argument called R, where

Â this R is sampled anew every time the algorithm is run. And in particular this R

Â is sampled uniformly at random from the set of all N-bit strings, for some

Â arbitrary end. Now what happens is every time we run the algorithm on a particular

Â input M we're gonna get a different output because a different R is generated every

Â time. So the first time we run the algorithm we get one output. The second

Â time we run the algorithm a new R is generated and we get a different output.

Â The third time we run the algorithm a new R is generated and we get a third output

Â and so on. So really the way to think about a randomized algorithm is it's

Â actually defining a random variable. Right? So given a particular input

Â message, M, it's defining a random variable which is, defining a distribution

Â over the set of a [laugh] possible outputs of this algorithm, given the input, M. So

Â the thing to remember is that the output of a randomized algorithm changes every

Â time you run it And in fact, the algorithm defines a distribution and the set of all

Â possible outputs. So let's look at a particular example. So suppose we have a

Â randomized algorithm that takes as input a message M And of course it also takes an

Â implicate input which is this random string that is used to randomize its

Â operation. So now what the algorithm will do is simply will encrypt the message M

Â using the random string as input. So this basically defines a random variable. This

Â random variable takes values that are encryptions of the message M And really

Â what this random, random variable is it's a distribution over the set of all

Â possible encryptions of the message M under a uniform key. So the main point to

Â remember is that even though the inputs to a randomized algorithm might always be the

Â same every time you run the randomized algorithm you're gonna get a different

Â output. Okay So, that concludes this segment, and we'll see a bit more discrete

Â probability in the next segment.
