In the next three segments we will change gears a little bit and talk about the
definition of a PRG. This definition is a really good way to think of a PRG. And we
will see many applications for this definition. So consider a PRG with key space K that outputs n-bit strings. Our goal is to define what it means for the output of the generator to be indistinguishable from random. In other words, we're gonna define a distribution that basically is defined by choosing a random key in the key space. Remember that an arrow with an R above it means choosing uniformly from the set script K. And then we output, basically, the output of the generator. And what we'd like to say is that this distribution of pseudo-random strings is indistinguishable from a truly uniform distribution. In other words, if we just choose a truly uniform string in {0,1}^n and simply output this string, we'd like to say that these two distributions are indistinguishable from one another. Now if you think about it, this sounds really surprising, because if we draw a circle here of all possible strings in {0,1}^n, then the uniform distribution basically can output any of these strings with equal probability. That's the definition of the uniform distribution. However, for a pseudo-random distribution generated by this generator G, because the seed space is so small, the set of possible outputs is really, really small; it's tiny inside of {0,1}^n. And this is really all that the generator can output. And yet, what we're arguing is that an adversary who looks at the output of the generator in this tiny set can't distinguish it from the output of the uniform distribution over the entire set. That's the property that we're actually shooting for.
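To make the size gap concrete, here is a minimal Python sketch. The `toy_prg` below is a made-up stand-in for G (not a secure generator); it expands an 8-bit seed into a 16-bit string, so we can enumerate every string the generator can ever output and compare that count to the size of {0,1}^n:

```python
import hashlib

def toy_prg(seed: int, n: int = 16) -> str:
    """Stand-in 'generator': expands an 8-bit seed into an n-bit string.
    (Hypothetical; a real PRG would be built from a secure primitive.)"""
    digest = hashlib.sha256(seed.to_bytes(1, "big")).digest()
    bits = "".join(f"{byte:08b}" for byte in digest)
    return bits[:n]

n = 16
seed_space = range(256)                       # |K| = 2^8 possible seeds
image = {toy_prg(s, n) for s in seed_space}   # every string G can ever output

print(len(image))   # at most 256 distinct outputs...
print(2 ** n)       # ...sitting inside a space of 65536 strings
```

The image of the generator occupies at most a 2^8 / 2^16 = 1/256 fraction of the space, and yet the claim is that an efficient adversary cannot tell samples from it apart from uniform samples.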
So to understand how to define this concept of indistinguishability from random, we need the concept of a statistical test. So let me define what a statistical test on {0,1}^n is. I'm gonna denote these statistical tests by the letter A. A statistical test is basically an algorithm that takes as input an n-bit string and simply outputs zero or one. Now, I'll tell you that we're gonna think of zero as the statistical test saying the input you gave me is not random, and of one as the statistical test saying the input you gave me actually is random. Okay, so all this statistical test does is it basically takes the input x that was given to it, the n-bit string that was given to it, and decides whether it looks random or it doesn't look random. Let's
look at a couple of examples. So the first example basically will use the fact that for a random string, the number of zeros is roughly equal to the number of ones in that string. In other words, the statistical test is going to say one if and only if the number of zeros in the given string x and the number of ones in x are not too far apart. In other words, the difference between the number of 0's and the number of 1's, let's just say, is less than ten times the square root of n. Okay, if the difference is less than ten times the square root of n, the statistical test will say: hey, the string x looks random. If the difference happens to be much bigger than ten times the square root of n, that starts to look suspicious, and the test will say: hey, the string you gave me does not look random. Let's look at another similar example. We'll say
here, the statistical test will say one if and only if the number of times that we see two consecutive zeros inside of x is roughly what we'd expect for a random string. Let's think about this for a second. This test basically counts, in this string of n bits, the number of times that we see the pattern zero, zero: two consecutive zeros. Well, for a random string, we expect to see 0,0 with probability one fourth, and therefore in a random string we expect about n over four occurrences of 0,0. And so what the statistical test will do is say: well, if the number of zero-zeros is roughly n over four, that is, if the difference between this count and n over four is, say, less than ten times the square root of n, then we will say that x looks random. And if the gap is much bigger than ten times the square root of n, we'll say: hey, this string doesn't really look random, and the statistical test will output zero, okay? So here are two examples of statistical tests that, for random strings, will output one with very high probability. But for strings that don't look random, for example the all-zero string, neither one of these tests will output one. And in fact, the all-zero string does not look random. Let's
look at one more example of a statistical test, just to kinda show you that statistical tests can pretty much do whatever they want. So here's the third example. Let's say the statistical test outputs one if and only if what we'll call the maximum run of zeros inside the string x, that is, the longest sequence of consecutive zeros inside the string x, is not too long. In a random string, you expect the longest sequence of zeros to be roughly of length log n. So we'll say: if the longest sequence of zeros happens to be less than ten times log n, then this test will say that x is random. But if, all of a sudden, we see a run of zeros that, say, is much bigger than ten log n, then the statistical test will say the string is not random, okay? So this is another crazy thing that a statistical test might do. By the way, you'll notice that if you give this test the all-one string, so one, one, one, one, one, this test will also output one. In other words, this test will think that the all-one string is random, even though the all-one string is not particularly random.
Okay, so statistical tests don't have to get things right. They can do whatever they like: they can decide to output zero or one however they like. And similarly, there are many, many other statistical tests; there are literally hundreds of statistical tests that one can think of. And I can tell you that in the old days, basically, the way you would define that something looks random is you would say: hey, here's a battery of statistical tests, and all of them said that this string looks random; therefore, we say that the generator that generated the string is a good generator. In other words, this definition, which uses a fixed set of statistical tests, is actually not a good definition for security, or more generally for crypto. But before we talk about actually defining security, the next thing we talk about is how do we evaluate whether a statistical test is good or not? So to do that, we define the concept of
advantage. And so let me define the advantage. So here we have a generator that outputs n-bit strings, and we have a statistical test on n-bit strings. And we define the advantage of the statistical test A relative to the generator G, denoted Adv_PRG[A,G], as follows: basically, the difference between two quantities. The first quantity is: we ask how likely is this statistical test to output one when we give it a pseudo-random string, that is, k is chosen uniformly from the key space and we give the test the output G(k) of the generator. Versus: we ask how likely is the statistical test to output one when we give it a truly random string, so here r is truly random in {0,1}^n. And we look at the difference, in absolute value, between these two quantities. Now, you realize that because this is a difference of probabilities taken in absolute value, this advantage is always going to lie in the interval [0,1]. So let's think a little bit about what this advantage
actually means. So first of all, if the advantage happens to be close to one, well, what does that mean? That means that somehow the statistical test A behaves differently when we give it pseudo-random inputs, the outputs of the generator, from when we give it truly random inputs, right? It somehow behaved differently: in one case it output one with a certain probability, and in the other case it output one with a very different probability, okay? So somehow it was able to behave differently, and what that really means is that the statistical test could basically distinguish the output of the generator from random. Okay, so in some sense we'll say that this statistical test broke the generator G, because it was able to distinguish the output from random.
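As a sketch of how this plays out, here is a small Python experiment that estimates the advantage of the zeros-vs-ones balance test from the earlier example against a generator whose bits are biased toward zero. The biased generator is made up purely for illustration; for this test and this generator the estimated advantage comes out close to one, so the test breaks the generator:

```python
import random

N = 10_000     # output length; long enough for the bias to be detectable
TRIALS = 100   # samples per distribution

def biased_gen(rng: random.Random, n: int = N) -> str:
    """Hypothetical 'generator' whose output is biased: each bit is 0 w.p. 0.6."""
    return "".join("0" if rng.random() < 0.6 else "1" for _ in range(n))

def balance_test(x: str) -> int:
    """Statistical test A: output 1 iff |#0(x) - #1(x)| <= 10 * sqrt(n)."""
    return 1 if abs(x.count("0") - x.count("1")) <= 10 * len(x) ** 0.5 else 0

rng = random.Random(1)

# Estimate Pr[A(G(k)) = 1]: frequency of output 1 on pseudo-random (biased) inputs
p_pseudo = sum(balance_test(biased_gen(rng)) for _ in range(TRIALS)) / TRIALS

# Estimate Pr[A(r) = 1]: frequency of output 1 on truly uniform n-bit strings
p_random = sum(
    balance_test(f"{rng.getrandbits(N):0{N}b}") for _ in range(TRIALS)
) / TRIALS

adv = abs(p_pseudo - p_random)   # estimate of the advantage Adv[A, G]
print(adv)                       # close to 1: the test distinguishes this generator
```

Here the biased outputs have about 2000 more zeros than ones, far above the 10*sqrt(n) = 1000 threshold, so the test almost always says "not random" on pseudo-random inputs and almost always says "random" on uniform inputs.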
However, if the advantage is close to zero, well, what does that mean? That means that basically the statistical test behaves pretty much the same on pseudo-random inputs as it does on truly random inputs. And basically, there we would say that A could not distinguish the generator from random. Okay, so this hopefully gives you a little bit of intuition about why this concept of advantage is important: it basically tells us whether A was able to break the generator, namely distinguish it from random, or not able to break it. So let's look, first of all, at a very silly
example. Suppose we have a statistical test A that simply ignores its input and always outputs zero. Okay, always outputs zero. What do you think the advantage of this statistical test is relative to a generator G? So, I hope everybody said the advantage is zero. Let me just explain why that's the case. Well, if the statistical test always outputs zero, that means that on pseudo-random inputs it will never output one, so the probability that it outputs one is zero. Similarly, when we give it a truly random input, it still will never output one, so the probability that it outputs one is zero. And zero minus zero is zero, so its advantage is zero. So basically, a statistical test that ignores its input is not able to distinguish truly random inputs from pseudo-random inputs, obviously. Okay, now let's look at
a more interesting example. So suppose we have a generator G that satisfies a funny property: it so happens that for two-thirds of the keys, the first bit of the output of the generator happens to be one, okay? So if I choose a random key, with probability two-thirds the generator will output one as its first bit, okay? So that's the property of the generator that we're looking at. Now, let's look at the following statistical test. The statistical test basically says: if the most significant bit of the string you gave me is one, I'm gonna output one, meaning I think it's random. If the most significant bit of the string you gave me is zero, I'm gonna output zero. Okay, so now my question to you is: what is the advantage of this statistical test relative to the generator G? Okay, so remember, I just wrote down the definition here again, and I'll let you think about this for a second. So let me explain. Suppose we give the statistical test pseudo-random inputs. By definition of G, we know that with probability two-thirds, the input will start with the bit one. But if it starts with the bit one, then the statistical test will output one. In other words, the probability that this statistical test outputs one is exactly two-thirds. Now let's look at the case of a random string. If I give you a random string, how likely is it that the most significant bit of the random string is one? Well, for a random string, that happens exactly half the time, and so in this case the statistical test will output one with probability one-half. And so the overall advantage is two-thirds minus one-half, which is one-sixth, and one-sixth is actually a non-negligible number; that's actually a fairly large number, which basically means that this A was able to distinguish the output. We'll say that A breaks the generator G with advantage 1/6. Okay, which basically means that this generator is no good; it's broken. Okay, so now that we understand
what statistical tests are, we can go ahead and define what a secure pseudo-random generator is. So, basically, we say that a generator G is secure if essentially no efficient statistical test can distinguish its output from random. More precisely, what we'll say is that for all efficient statistical tests A, if I look at the advantage of the statistical test A relative to G, this advantage is negligible. So, in other words, it's very close to zero, and as a result this statistical test was not able to distinguish the output from random, and that has to be true for all efficient statistical tests. So this is a very pretty and elegant definition, which says that a generator is secure not only if a particular battery of statistical tests says that the output looks random, but if, in fact, all efficient statistical tests say the output looks random. Okay? One
thing I'd like to point out is that the restriction to efficient statistical tests is actually necessary. If we asked that all statistical tests, regardless of whether they're efficient or not, not be able to distinguish the output from random, then in fact that cannot be satisfied. So in other words, if we took out the requirement that the test be efficient, then this definition would be unsatisfiable. And I'll leave this as a simple puzzle for you to think about. But basically, the fact is that restricting this definition to only efficient statistical tests is actually necessary for it to be satisfiable. So now that we
have a definition, the next question is: can we actually construct a generator and then prove that it is in fact a secure PRG? In other words, prove that no efficient statistical test can distinguish its output from random. And it turns out that the answer is we actually can't. In fact, it's not known if there are any provably secure PRGs. I will just say very briefly that the reason is that if you could prove that a particular generator is secure, that would actually imply that P is not equal to NP. And I don't want to dwell on this, because I don't want to assume that you guys know what P and NP are. But I'll tell you, as a simple fact, that if P is equal to NP, then it's very easy to show that there are no secure PRGs. And so if you could prove to me that a particular PRG is secure, that would imply that P is not equal to NP. Again, I will leave this to you as a simple puzzle to think about. But even though we can't rigorously prove that a particular PRG is secure, we still have lots and lots of heuristic candidates, and we even saw some of those in the previous segments. Okay, now that we understand what a secure PRG is, I want to talk a little
bit about some applications and implications of this definition. And so the first thing I want to show you is that, in fact, a secure PRG is necessarily unpredictable. In a previous segment, we talked about what it means for a generator to be unpredictable. And we said that, basically, what that means is that, given a prefix of the generator's output, it's impossible to predict the next bit of the output. Okay, so we'd like to show that if a generator is secure, then necessarily it's unpredictable. And the way we're gonna do that is using the contrapositive. That is, we're gonna say that if you give me a generator that is predictable, then necessarily it's insecure; in other words, necessarily I can distinguish it from random. This is actually a very simple fact, so let's see how we would do it. So suppose you give me a predictor.
In other words, suppose you give me an efficient algorithm such that, if I give this algorithm the output of the generator, but only the first i bits of the output, it's able to predict the next bit of the output. In other words, given the first i bits, it's able to predict the (i+1)-st bit. And it does that with a certain probability, where we choose a random k from the key space. Now, clearly, a dumb predictor would be able to predict the next bit with probability one-half: simply guess the bit, and you'll be right with probability one-half. However, this algorithm A is able to predict the next bit with probability one-half plus epsilon, so it's bounded away from one-half. And, in fact, we require that this be true for some non-negligible epsilon. So, for example, epsilon = 1/1000 would already be a dangerous predictor, because it can predict the next bit, given a prefix, with non-negligible advantage. Okay, so
suppose we have such an algorithm. Let's see how we can use this algorithm to break our generator, in other words, to show that the generator is distinguishable from random and therefore insecure. So what we'll do is we'll define a statistical test. Let's define the statistical test B as follows. Basically, B, given a string x, will simply run algorithm A on the first i bits of the string x that it was given. And statistical test B is simply gonna ask: was A successful in predicting the (i+1)-st bit of the string? If it was successful, then it's gonna output one, and if it wasn't successful, then it's gonna output zero. Okay, this is our statistical test. Let's put it in a box so we can take it wherever we like, and we can run the statistical test on any n-bit string that's given to us as input. So now, let's look at what happens. Suppose we give the statistical test a truly random string, so a truly random string r, and we ask: what is the probability that the statistical test outputs one? Well,