This course will introduce you to the foundations of modern cryptography, with an eye toward practical applications.

Loading...

From the course by University of Maryland, College Park

Cryptography

452 ratings

This course will introduce you to the foundations of modern cryptography, with an eye toward practical applications.

- Jonathan KatzProfessor, University of Maryland, and Director, Maryland Cybersecurity Center

Maryland Cybersecurity Center

[SOUND].

In this lecture, we're going to take a brief detour from the topic of method

authentication, to study the important cryptographic primitive of hash functions.

Very roughly speaking, a Cryptographic hash function is a function that

maps arbitrary length inputs to short, fixed-length outputs.

And the output of a hash function is very often called a digest.

It turns out that you can consider two classes of hash functions.

Either functions that are keyed or functions that are unkeyed.

Formally speaking, it turns out that to meaningfully define security

properties keyed hash functions are needed.

But, in this lecture and throughout the course we're going to be

a little bit informal and work with unkeyed hash functions instead.

And the reason for doing that is that in practice,

hash functions that are used are in fact unkeyed.

And so it's easier just to stick to the case of unkeyed hash functions.

Let H be a hash function.

So again that means that it takes arbitrary length inputs.

Here denoted by 0,1*, and

produces some fixed-length output, here an output of length and bits.

A collision in H is a pair of distinct inputs, x and

x prime, whose digests are equal.

That is for which H of x is equal to H of x prime.

And we'll say that hash function H is collision resistant if it's

infeasible to find a collision in H.

Now if we give, if we given some hash function H,

what's the best generic collision attack that we can have?

And this is important because we'd like to understand what value we need for

the hash-function output length n.

Well, one thing we can observe, is that if we simply compute H of x1, H of x2,

H of x3, all the way up to H of x2 n plus 1, for

2 n plus 1, 2 to the n plus 1, distinct inputs.

Then we're guaranteed to find a collision.

Right, the output length of the hash function is only n bits.

That means that there are 2 to the n possible outputs of the hash function.

If we evaluate the hash function on 2 to the n plus 1 inputs,

then there must be some pair of inputs,

that are distinct by construction, which have the same hash value.

And so we're guaranteed to find a collision.

Is it possible to do better?

Well it turns out that it's possible to do much better.

And this is, by using a form of attack called a Birthday attack,

for a reason we'll explain in a moment.

So what you do in this case, for this generic attack,

is simply evaluate your hash function H, on 2 to the n over 2, distinct inputs.

Now if we do that, what's the probability of a collision?

Well now, you're not guaranteed to find a collision, when it could be the case,

that all of the hash values you've just computed, turn out to be distinct.

But we might hope that with some high probability,

we find two inputs that map to the same output.

And the probability with which that happens is related to the so

called birthday paradox.

In the birthday paradox, we ask how many people do we need,

in order to have a 50% chance, that some two people share a birthday.

And if you haven't seen this before it's worth stopping a moment to think about it.

And just to take a a guess or an estimate, as to how many people you think you need,

to have in a room or in a group of people say.

Such that with at least half probability, some pair of people in that room,

share a common birthday.

We can analyze both of these problems, by looking at the following balls and

bins experiment.

So here we have a bunch of bins, and

we're going to start throwing balls randomly into these bins.

This may look odd if you haven't seen this before, but

this is a scenario that's commonly encountered when discussing probability,

and when analyzing certain events.

Anyway for now let's just imagine we have a bunch of bins and

we're going to throw a bunch of balls in those bins.

And when we throw a ball, it lands in a uniform bin.

The number of bins, we'll denote by capital N.

And as I said we start throwing balls into bins.

So we throw the first ball, it happens to land into the third bin.

We throw the next ball it lands into the second to last bin and so on.

And, let's say we keep on doing this.

And here, we've managed to get two balls, in the same bin.

And if we fix some value for the number of balls, we can ask,

well what's the probability that we end up with two balls in the same bin?

And I claim that this is related to the two problems we've discussed on

the previous slide.

If we let the bins denote days of the year, and so we imagine that

there are 365 different bins, and we let the balls correspond to people.

And imagine that if we pick a random person,

their birthday will fall on a random day between January 1st and December 31st.

Then the probability with which two people in a room have the same birthday,

the probability with,

with which they share a birthday, corresponds exactly to the probability.

That when we throw a certain number of balls into these 365 bins,

that two of the balls end up in the same bin.

Turning to the question of hash functions, now we can let the number of bins be 2

to the little n, the number of possible outputs of our hash function.

And we can let the balls, denote computations of our hash function.

Or more precisely,

the output value that we obtain, when evaluating the function on some input.

And here we're going to make the heuristic assumption,

that we can model the hash function as a random function.

It turns out that if you don't like that assumption then actually that's

the worst case in terms of analyzing the time to collision.

And any deviation from random will only,

will only make it easier to find collisions.

But nevertheless in our analysis we're going to model the hash function as

choosing a random output for every input, for every distinct input you feed it.

And so again, the question of how many times we

have to evaluate the hash function in order to find a collision,

comes down to an analysis on this simple balls and bins experiment.

And the question we have is how many balls do we need,

in order to have a 50% chance of a collision?

Well, it turns out that it's possible to prove the following.

When the number above is about n to the one half the square root of n,

the probability of a collision is about 50%.

What that means is that if we look at the birthday problem,

then having 23 people in the room, suffice us to give a 50-50 probability that

some pair of people in that room will share a birthday.

And this is called a paradox, not because it's actually any paradox, but

because the number is much smaller than most people typically expect.

Turning to hash functions, this means that if we evaluate the hash function,

about 2 to the n over 2 times.

Which is the square root of the size of the range of the hash function,

which was 2 to the n.

Then we expect to find some pair of inputs, that hash to the same output.

That is we expect to find a collision with about 50% probability.

Now this is much better than what we had before in the trivial attack where we

simply evaluate h on 2 to the n plus 1 inputs.

We're doing here much better it's a little hard to see because we,

the, we're only dividing by half but that value is in the exponent.

And so this makes actually a significant difference in terms of

the time required to find a hash function collision.

Now what this tells us in terms of security, is that if we

want security against attackers, running in time, about 2 to the n,

then we need the output length of our hash function to be about 2n bits long.

So that is if we want, for example, security or

collision resistance against attackers running for about 2 to the 80 steps.

Then we need to take the output length of our hash func,

of our hash function to be about twice times 80, or 160 bits long.

And importantly, this is twice the length, that we needed for block cypher keys.

So if we want a block cipher to defend against brute force attacks,

running in time 2 to the 80.

We can use block cipher keys that are exactly 80 bits long.

And if there's no better, and there's no better attack,

generically speaking, than just trying every key one by one.

So that means that if we have an 80 bit block cipher key,

we get about 2 to the 80 security, if the block cipher is optimally secured.

On the other hand for a hash function,

we need the output length of the hash function to be twice that, or 160 bits.

And only then do we have a hope of getting security against 2 to the 80

time attackers.

Of course if there's a weakness in the hash function,

then it won't be as secure as we would like it to be.

The 160 bits is a lower bound on the length of the hash function output that we

need, in order to get the security that we want.

In practice, there are a few hash functions you

need to be aware of that are very commonly encountered.

MD5 was a hash function developed in 1991, and it has a 128-bit output length.

The output length alone would be considered too short, for

modern day security applications.

But independent of that it turns out that MD5 is no longer secure at all.

Collisions in MD5 were found in 2004.

And since then the time required to find collisions,

has only gotten better and better,

to the point where now it's possible to find collisions in a number of minutes.

And so MD5 should no longer be used.

I mention it here only because it does still appear from time to time in legacy

code, and you need to be aware of what it is and the fact that it's out there.

But you should not be using it in new code that you write.

SHA-1, is a standardized hash function introduced in 1995.

SHA-1 has a 160-bit output length, and so

ideally we can hope that it provides about 80-bit security.

Very recently, theoretical analyses have indicated some weaknesses in SHA-1.

And although currently no collision attacks on SHA-1 are known,

because of these theoretical weaknesses that have been discovered,

people have begun migrating away from SHA-1.

So even though SHA-1 is still quite common, and

you'll find it in lots of code out on the Internet.

The tendency nowadays is to begin moving away from SHA-1 toward more recently,

more recent hash functions and stronger hash functions.

Like SHA-2 and other hash functions we'll talk about next.

SHA-2 is in the same design family as SHA-1,

and it's also a hash function that's been standardized.

And it's available in either 256-bit or 512-bit output lengths.

So these already are offering more security than SHA-1 if you

assume that they're giving you collision resistance up to the optimal bound.

Very importantly, no known weaknesses are, are currently present in SHA-2.

And so for that reason,

it's generally believed to be safe to use SHA-2 instead of SHA-1, which as

we mentioned before had some theoretical weaknesses that have been discovered.

So as we were saying a moment ago, many people have begun migrating away from shy,

from SHA-1 and replacing usage of SHA-1 with SHA-2.

In the last year ,SHA-3 has been introduced.

This was a result of a public competition run from 2008 to 2, 2,

to 2012, which was run very much the way that the competition for

the advanced encryption standard was run.

So anybody in the world was allowed to submit a candidate.

There was a public selection process in which people would try to attack

submissions by different teams.

And eventually a winner was chosen by Nist.

And standardized as the next generation SHA that is, SHA-3.

The name of the submission was Keccak, and so

you often hear people refer to it by that name as well.

One thing interesting about Keccak,

is that its design was very different from the SHA family.

In fact, it's different both from MD-5, as well as from SHA-1 and SHA-2.

And this was actually a conscious decision on the part of Nist,

because they were trying to get a hash function that was constructed in

a very different way from the SHA family.

So that in case any weaknesses where ever discovered in the future,

in the way that SHA was designed,

the hope would be that those weaknesses would not be present in Keccak as well.

Keccak supports a variety of output lengths ranging from 224 bits to 512 bits,

and people can choose which output length they prefer for

their particular application.

So in this lecture we've

just explored a little bit about cryptographic hash functions.

And in the next lecture we'll come back to our topic of message authentication, and

see how it's possible to use collision resistant hash functions,

to obtain message authentication for long messages.

Coursera provides universal access to the world’s best education,
partnering with top universities and organizations to offer courses online.