In this segment, we're gonna study the security of the ElGamal public key

encryption system. So let me remind you that when we first presented the

Diffie-Hellman protocol, we said that the security is based on the assumption that

says that given G, G to the A, G to the B, it's difficult to compute the

Diffie-Hellman secret, G to the AB. This is basically what the attacker has to do.

He sees Alice's contribution. He sees Bob's contribution and then his goal is to

figure out what the Diffie-Hellman secret is. And we said that this problem is

difficult. The statement that the problem is difficult is what's called the

computational Diffie-Hellman assumption. So, let's take this assumption more

precisely. So, as usual we're going to look at a finite cyclic group of order N,

so G is some group, in your head you should be thinking ZP star, but there could

be other groups, for example, like an ellipctic curve group. And then we say that

the computational Diffie-Hellman assumption. I've often used the shorthand

CDH for Computational Diffie Hellman. We'll say this assumption holds in G if

the following condition holds, namely for all efficient algorithms. If we choose a

random generator, and then we choose random exponents A, B in ZN. Then when

we give the algorithm G, G to the A, and G to the B, the probability that the

algorithm will produce the Diffie Hellman secret is negligible. If this is true for

all efficient algorithms, then we say that the CDH assumption holds for G. CDH, as I

said, stands for Computational Diffie Hellman. As it turns out, this assumption

is not ideal for analyzing the security of the ElGamal system. And instead I'm gonna

go ahead and make a slightly stronger assumption called a hash Diffie-Hellman

assumption. Okay. So what is hash Diffie-Hellman assumption? Again, we are

focusing on a particular group G and now we're also gonna introduce a hash function

H that maps pairs of elements in G into the key space of some symmetric encryption

system. And then we say that the hash Diffie-Hellman a ssumption, or HDH for

short, holds for this pair, G comma H for the group in the hash function if the

following is true. Basically, if I choose a random generator and then I choose

random exponents A and B in ZN and then I also choose a random R and K, then the

following distributions are computationally indistinguishable. That

is, if I give you G, G to the A, G to the B, and then this hash value, this should

look familiar to you. This is the hash value that's used in the ElGamal system to

derive the symmetric encryption key. What we're saying is that this distribution is

indistinguishable from a distribution where you're also given. G, G to the A, G

to the B. But now, instead of giving the hash, you're given just really random key.

So what this assumption says is that the symmetric key that was derived during

encryption in the ElGamal system, essentially is indistinguishable from a

truly random key that is derived independently from the rest of the

parameters of the system. It's a truly independent random key, looks basically

the same as the key that we used, to derive from the G to the A and the G to

the B. Now I'd like to point out that the Hash Diffie-Hellman assumption is actually

a stronger assumption than the Computational Diffie-Hellman assumption.

The way to see that is using the contra positive, that is suppose the

Computational Diffie-Hellman assumption happens to be easy in the group G. Then I

claim that in fact the Hash Diffie-Hellman assumption is also easy in the group G. In

fact, I'll say for G, H and this is true in fact for all hash functions where the

image of the hash function. It contains at least two elements. In other words all I

want to say is that the Hash Diffie-Hellman assumption and it's easy for all

hash functions to go match everything to a single point. Which is true for almost all

hash functions of interest to us. So actually, this is a really simple

statement to prove. Basically, if the Computational Diffie-Hellman assumption is easy, what that

says is that, given G to the A and G to the B, I can actually calculate G to the AB

myself. Cuz I know the hash function H, I can calculate this complete value here. So

if you give me a tuple that's sampled from the left or sampled from the

right. I can very easily tell where it's from. If it's sampled from the left, then

once I've calculated G to the AB myself, I can just test that the last element in the

tuple is, in fact, the hash of G to the B and G to the AB. If it is, I know

the sample is from the left. If it isn't, I know the sample is from the right. Okay,

so this gives me pretty high advantage, in distinguishing these two distributions. So

CDH is easy, it's very easy to see that these distributions are distinguishable.

And this basically says that if the Hash Diffie-Hellman is in fact hard, then CDH

must also be hard. Which then we say, that therefore the Hash Diffie-Hellman is a

stronger assumption. There might be group where CDH is hard, but HDH is not hard.

Okay. So we say HDH is a stronger assumption. If you found this

discussion of weak assumption versus strong assumption confusing, it's not

really that important, it's fine to ignore it. The important thing that I want to

show you is in fact that the Hash Diffie-Hellman assumption is sufficient to

prove the semantic security of the ElGamal system. Before we do that let me quickly

ask you the following puzzle just to make sure the Hash Diffie-Hellman assumption is

clear. So imagine our key space is {0, 1} to the 128. So 128 bit strings and our

hash function will map pairs of group elements into this 128 byte strings.

Suppose it so happens that we chose a hash function Such that it always outputs

strings that begin with zero. In other words, of the 128 bit strings the most

significant bit of the output is always zero. If we chose such a hash function,

does the Hash Diffie-Hellman assumption hold for this pair, G comma H. So the

answer is no it doesn't hold. And the reason is because it now very easy to

distinguish the two distributions that we have here. The distribution on the left

an the distribution on the right. And the way you would distinguish the two is

basically if I choose a truly random element in K, in {0, 1} to the 128,

then mostly it can very well be zero with probability one half. However, the tuple

that's given to me is chosen from the left distribution, then the most significant

bit of the hash will always be zero and therefore if I see zero, I'm gonna say the

distribution is a distribution on the left. If I see one, I'm gonna say the

distribution is a distribution on the right. That will give me advantage one

half in distinguishing these two distributions. And so as a result, the

Hash Diffie-Hillman assumption is false in this case. So clearly you could see that,

even though CDH might be hard in a certain group G, if you choose a bad hash

function, then HDH will not hold for the pair G comma H. Okay. But suppose it so

happens that we choose a group G and a hash function H for which the Hash Diffie

Hellman assumption holds. And in fact, if you choose the hash function to be just

SHA-256, for all we know, the Hash Diffie Hellman assumption holds in the

group ZP star, and holds in elliptic curve groups. My goal then is to show you that

ElGamal is semantically secure under the Hash Diffie-Hellman assumption. So let me

quickly remind you how theElGamal system works. While we're gonna choose a

random generator G, we're gonna choose a random 'a' in ZN, the public key is

gonna be G, and G to the A, the secret key is simply gonna be A. And now here I wrote

shorthand for the ElGamal encryption. Basically, what to encrypt, what we do is

we choose a random B. We hash G as being H to the B. Remember this H to the B is the

value G to the AB. This is the Diffie Hellman secret. The hash function gave us

a symmetric encryption key K. We encrypt the message with K, and we output G to the

B and the symmetric cipher text. To decrypt we have to value U, and the Diffie

Hellman secret, G to the A. To derive a symmetric key, we decrypt the cipher text.

And then we output the plaintext message m. So let's see how to argue that,

in fact, ElGamal is semantically secure under this Hash Diffie-Hillman assumption is

fairly simple. So well we have to argue, remember we had, in semantic security, we

have these two experiments. One experiment, the attacker is given the

encryption of m0. In the other experiment, the attacker is given the encryption of m1.

So I wrote the two experiments here. Here you notice that the attacker starts by

sending off the public key to the adversary. The adversary then chooses two

equal length messages m0 and m1. And then if he gets the ElGamal encryption of M0,

and a kind of rough shorthand for what ElGamal ciphertext is, okay? Similarly, in

encryption one, he simply gets the encryption of M1 instead of M0, and

everything else is the same about these two experiments. Now, because of the Hash

Diffie-Hellman assumption, we know that even though the attacker got to see G, G

to the a and G to the b, we know that the output of the hash of G to the b, G to the

ab is indistinguishable from random. Therefore, if we replace the actual hash

function by a truly generated random key K that's independent of everything else, by

the Hash Diffie-Hellman assumption, the attacker can't distinguish these two

games. And again, it's a simple exercise for you to show that if he could

distinguish these two games, then he would break the Hash Diffie-Hellman assumption.

But then the same is true for experiment one. The attacker also could not

distinguish the output of the hash function from a truly random key, that was

used the generate the challenge cipher text. Okay. So now basically we look at

these two experiments and we realize that, wait a minute, what's going on in these

two experiments? Basically everything is the same except in one experiment, the

attacker gets the encryption under a symmetric encryption system of m0. In the

other one, he gets the encryption under a symmetric encryption system of m1 and the

key is chosen at random independently over everything else. So by the fact that the

symmetric encryption system is semantically secure, these two games are

indistinguishable. If they were distinguishable, then the attacker could

break the semantic security of this symmetric encryption system. So now we

kinda prove this, you know, this chain of equivalences. And that proves that the

original games, in fact, are indistinguishable, computationally

indistinguishable. And therefore, the ElGamal system is semantically secure.

Okay so it's like I said a very simple proof by pictures and you can make this

into a rigorous proof without too much work. But semantic security isn't enough,

what we really want is chosen ciphertext security, and the question is ElGamal chosen ciphertext

secure? So it turns out to prove the chosen ciphertext security of ElGamal we

actually need a stronger assumption, Hash Diffie-Hellman or Computational Diffie-Hellman

are actually not enough to prove chosen ciphertext security of the system as far

as we know. So to prove chosen-ciphertext security, I'm going to introduce a new

assumption called Interactive Diffie Hellman assumption. And actually,

technically we really don't like this assumption. And we're going to try to get

rid of this, later on. But for now, we just want to analyze the security of the

basic ElGamal system against chosen ciphertext attack. So to argue that it is

chosen ciphertext secure, here is what the assumption says. Basically the challenger

starts, you know, plays a game with the adversary, he generates a random G,

generates two exponents, and then he says to the adversary as usual, G, G to the a

and G to the b. Now the adversary's goal is basically to figure out the

Diffie-Helmman's secrets, mainly g to the ab. He outputs the value of V and he wins

the game if V happens to be equal to G to the AB. So, so far this looks identical to

the Computational Diffie-Hellman assumption, there's no difference - this is

the Computational Diffie-Hellman assumption except in Interactive Diffie-Hellman would

give the attacker a little bit more power. So because the attacker has more power,

it's harder to satisfy the assumption, which is why we say that this assumption

is stronger than Computational Diffie-Hellman. Now what is this power to be

given? We are given the ability to make queries. In particular, he gets to submit

two elements from the group G, so U1, V1 disappear from the group G. And then he's

told whether U1 to the A is equal to V1, so he gets one. If you wanted the A equals

to V1 get zero, otherwise it's kind of a bizarre type of query. What, how does it

be possibly help the attacker? But it turns out we need to be able to answer

these types of queries to the adversary in order to be able to prove chosen

ciphertext security. And in fact, he can do these type of queries again and again

and again. So he can issue as many queries like these as he wants and then the

assumption says that despite all these queries he still can't figure out the

Diffie-Hellman secret, namely despite making all these queries, the probability

that he outputs the Diffie-Hellman secret is negligible. Okay.

So clearly if this assumption holds, that the Computational Diffie-Hellman assumption

holds, because here, the adversary has more power, and as a result we say that this assumption

is stronger that Computational Diffie-Hellman, the thing, we really don't like about this

assumption, is the fact, that, it's interactive, and that the adversary is allowed to

make queries to the challenger, and as I said, we're gonna trying to get rid

of this interaction later on. But for now I'll tell you that under this Interactive

Diffie-Hellman assumption and under the assumption that the symmetric encryption

system provides authenticated encryption, and under the assumption that the hash

function is kind of an ideal hash function, what we call the random oracle,

then in fact the ElGamal system is chosen ciphertext secure and again this

RO here denotes the fact that it's in the random oracle model. Which is not

important, so much for our purposes. The main thing to remember that under

kind of this weird, interactive assumption we can actually prove that

ElGamal is a chosen ciphertext secure. And as far as we know this assumption

actually holds for the group ZP star. So what we're gonna do next is basically

answer this question of whether we can get rid of the interactive assumption. Can we

prove security strictly based on CDH? And similarly can we proof security without

relying on random oracles? Just you know without having to assume, that the hash

function is ideal. Just you know, can we prove security using a concrete hash

function. And we'll do that very briefly in the next segment.