0:00

In part one of this lesson, we looked at the preimage and second preimage attacks,

Â which are both order 2 to the n when carried out via brute force,

Â which is the only way they can be carried out against a random oracle.

Â Now we get to a very interesting attack that is quite non-intuitive.

Â In a hash collision attack, all we are interested in doing is finding

Â two messages that happen to have the same digest.

Â And we don't care what the specific digest is.

Â Intuitively, we probably expect this to be little different from a second

Â preimage attack.

Â After all, it would appear to be essentially the same thing.

Â We want two messages that have the same digest.

Â But appearances can be deceiving because the seemingly minor distinction that

Â we don't care what the actual digest is,

Â makes all the difference due to what is known as the birthday paradox.

Â And which is why this attack is usually referred to as a birthday attack.

Â Many practical breaks in real cryptosystems,

Â such as the breaking of WEP keys and

Â the recently demonstrated break in SHA-1, involve some kind of birthday attack.

Â So let's spend some time getting familiar with this.

Â The Birthday Paradox asks a very simple question that most people,

Â the first time they see it, can confidently guess an answer that

Â is almost always not even close to being correct.

Â How many randomly chosen people do I need to have in a room

Â before there is a 50/50 chance that at least two people have the same birthday?

Â Let's simplify the problem by ignoring leap years, so we have 365 possible

Â birthdays, and also assuming that the distribution of birthdays is uniform.

Â Before answering this question, let's look at the analog to a preimage attack

Â that would be how many people I have to gather in a room before there's

Â a 50/50 chance of there being someone that has a specific birthday?

Â I can make this a second preimage attack by just specifying that I want

Â to find someone that has the same birthday as me.

Â 2:11

We already talked through the solution to this problem, which for small d,

Â the number of days or the number of digests, is given exactly by

Â this expression, which requires about 253 people on average.

Â This may or may not seem too surprising to you,

Â since it might not seem that different from 183 people.

Â But it's important to understand why it is significantly more than half.

Â Remember that we are asking how many people we need to consider before finding

Â someone that has a specific birthday.

Â If you think about it, you'll realize that you we're actually never guaranteed of

Â finding someone that has that birthday no mater how many people we look at.

Â One way to think of this is that there's a finite,

Â albeit vanishingly small, possibility that no one else on this planet

Â happens to have been born on the same day of the year as me.

Â 3:04

Now let's consider the analog to the hash collision attack.

Â How many people do I need to consider before there is a 50/50 chance of finding

Â two people that happen to have the same birthday?

Â The first major difference is that because of the pigeonhole principle,

Â if I put 366 people in a room,

Â I am absolutely guaranteed that there are two people with the same birthday.

Â So as long as there are more messages than digests, I will find a collision.

Â But how many people do I need to be in that room before I have

Â a 50/50 chance of succeeding?

Â We can get at this by calculating the probability that

Â none of the k people currently in the room have the same birthday,

Â which we can do by adding people to the room one at a time.

Â If only one person is in the room,

Â then the likelihood that no two people in the room have the same birthday is 100%.

Â Or another way of saying this is that the one person can have any of

Â the 365 birthdays out of the 365 available birthdays.

Â When I put the second person into the room, the chance that they don't share

Â a birthday with the person already there is 364 divided by 365,

Â because they can have any birthday except one.

Â When I add the third person,

Â they can have any birthday except two, or 363 out of 365.

Â The next person can have any of 362 birthdays.

Â This pattern continues all the way down to the kth person.

Â We can now see mathematically why the 366th person

Â guarantees a solution, since the term associated with them is identically zero.

Â Meaning that the probability is zero

Â that no two people in the room have the same birthday.

Â We can write this using product notation.

Â We can also express it using factorial notation.

Â While there is no closed form solution for k, computing this probability using

Â one of these expressions is actually quite easy if D and

Â k are pretty small, especially if we use a simple spreadsheet.

Â We can work up incrementally from one person until the probability drops to less

Â than 50% since that means that the probability that at least two people do

Â have the same birthday has just risen above 50%.

Â Doing this, we find that the number of people needed to reach the 50/50

Â threshold is a surprisingly low 23 people.

Â 6:25

This leaves us with a very simple expression that we can solve

Â directly for k.

Â We see that we just need a little more than the square root of the number

Â of digests.

Â As a sanity check for D equals 365 days,

Â we get 22.5 people, which is in excellent agreement with our previous results.

Â So we can have very good confidence that we haven't done anything wrong.

Â For a hash function, we can write D in terms of the size of the digest.

Â The key observation is that the number of hashes that must be computed is on

Â the order of the square root of the number of possible digests.

Â It also means that while the difficulty of a hash collision attack is still

Â exponential in the size of the hash, the effective number of bits is

Â only half compared to the difficulty of a preimage or second preimage attack.

Â This is extremely significant.

Â An example in a prior module described the related key attack against the original

Â WIFI WEP encryption which had a 64-bit key, but

Â only 24 bits of it actually changed with each message.

Â This meant that the effective number of keys were about 16 million.

Â But since it was only necessary to find two packets that used the same key,

Â a birthday attack only required about 5,000 packets in order to recover the key.

Â An attack that could be carried out in real time.

Â Recall from another example, that if we calculate a million digests a second,

Â then it would take half a million years to carry out an exhaustive preimage or

Â second preimage attack against a 64-bit hash function,

Â though we would have an even chance of finding a solution in about 400,000 years.

Â A birthday attack, on the other hand,

Â would have an even chance of finding a collision in less than an hour and a half.

Â This is why adversaries go to great lengths to figure out ways to perform

Â birthday attacks against cryptosystems, and

Â why some form of birthday attack is often the first successful crack into a system.

Â The first hash collision involving SHA-1 was published in early 2017, and

Â was the result of finding weaknesses that reduced the expected number of digests

Â that needed to be calculated from the 2 to the 80th needed against a random oracle,

Â down to about 2 to the 63rd,

Â which still cost about $100,000 worth of computer time to carry out.

Â While expensive, however, this is well within the reach

Â of large criminal organizations, not to mention national intelligence agencies.

Â Whether this capability is worth the cost for either is debatable.

Â And for the foreseeable future,

Â it is still highly unlikely that anyone will actually be able to benefit from it.

Â But security specialists don't think in these terms if they don't have to.

Â And thus, the use of SHA-1 is very quickly on the way out.

Â For instance, most major Internet browsers announced several years ago,

Â well before the first and so far only actual break,

Â that they will no longer accept SHA-1 signed certificates by the end of 2017.

Â Present best practices recommend that a 160-bit hash function

Â is the smallest that should be used if any kind of birthday attack is conceivable,

Â but 256-bit is required for most classified cryptosystems.

Â 9:40

In the two parts of this lesson,

Â we've looked at the three attacks against the Random Oracle, and

Â seen that both preimage and second preimage attacks are ordered 2 to the n.

Â But the hash collision attack is order square root of 2 to the n,

Â which makes it extremely powerful.

Â Nearly all other attacks against real hash functions exploit

Â weaknesses in the algorithms in an attempt to reduce the amount of computational

Â effort to agree where some kind of brute force attack becomes feasible.

Â We are now at the point where we can discuss the properties that a real

Â cryptographic hash function should have.

Â