This course will introduce you to the foundations of modern cryptography, with an eye toward practical applications.

Loading...

From the course by University of Maryland, College Park

Cryptography

527 ratings

At Coursera, you will find the best lectures in the world. Here are some of our personalized recommendations for you

This course will introduce you to the foundations of modern cryptography, with an eye toward practical applications.

From the lesson

Week 1

Introduction to Classical Cryptography

- Jonathan KatzProfessor, University of Maryland, and Director, Maryland Cybersecurity Center

Maryland Cybersecurity Center

[SOUND] In this lecture, I wanted to go into

a little bit more detail about how it's possible to break the Vigenere cipher.

This lecture is really intended for

those of you who are going to be doing the first programming assignment, and so

I've actually tried to present exactly the necessary background you need.

In order to complete that assignment here.

Those of you who are not planning to do that assignment can feel free to skip

this video, although you may find it interesting as well,

even if you're not planning on doing that assignment.

Now when we introduced the Vigenère cipher,

we presented it in the following way.

We said that the key was a string of letters.

From the English alphabet and

the plaintext will also consist of letters from the English alphabet.

To encrypt a given plaintext with a key, what you do is simply shift every

character in the plaintext by the amount dictated by the next character of the key.

You can view this simple as addition modular 26,

because there are 26 letters in the English alphabet.

And if the key is sorted then the plaintext you simply wrap around in

the key as needed, and decryption just reverses the process.

So here was the example we gave, where we're encrypting the plain text,

tell him about me, using the key cafe, and so

you can view this as simply replicating the key cafe, as many times as necessary,

in order to complete to fill out the length of plaintext.

And then doing the appropriate addition, ie shifting,

modular 26 in order to get the cipher text as displayed her on the bottom line.

Now, I wanna present a variant of the Vigenere cipher, which is sort of

mathematically the same, although technically a little bit different.

And what we are gonna do here is we are gonna work with both plaintext and

keys viewed in terms of bytes rather than of letters over the English alphabet.

And so in particular what we are gonna do is we are gonna resume,

we start with an ASCII plaintext and we are gonna have a cipher text.

Which we're gonna represent in hex.

So ultimately, of course,

the plain text key and cipher text are all going to consist of bytes.

It's just that we're gonna view the plain text as nasty string and

it's easiest to view the key and cipher text as hex, digits or hex strings.

Now why would you want to do this?

Well, first of all, it turns out to be easier to implement.

So if you look at the code, actually for programming assignment one,

you'll see that it's fairly easy to implement this.

It's not that it's too much more difficult to implement the Vigenère cipher the way

we had just presented it, but it's slightly easier to do it this way.

I think more to the point, is that it's easier to use and the reason for

that is the Vigenère cipher as originally presented

was limited to plaintext that only consisted of lowercase English letters.

It did not support capital letters, it did not support punctuation,

it did not support spaces.

And if you were ever gonna do encryption and practice, you'd, of course,

like your encryption scheme to be able to handle arbitrary ASCII characters.

It turns out that what we're also gonna change is rather than doing addition,

modular 26, we're going to do byte-wise XOR.

So we're gonna take a byte of plaintext and a byte of the key, and

XOR them together to get the next byte of the ciphertext.

Now we could have done modular addition here too.

The number of possible bytes is 256, because a byte is of course eight bits,

and so we could have done addition, modular 256, and that would have been

maybe morally closer to the Vigenère cipher as we had originally presented it.

In fact, it doesn't really matter much from a mathematical point of view, and

it's easier to implement XOR,

it's faster to implement XOR than to implement modular 26 edition I suppose.

So let me give an example here.

This is just to specify exactly what's going on and

you can look in the code actually for programming assignment one.

I've given their code for encryption and

you can see how that lines up exactly with what I'm saying here.

So again we're gonna view the key as a string of bytes.

We are gonna view the plaintext as a string of ASCII characters, and

to encrypt, what we do is simply XOR, as I said,

the next character of the plaintext with the next character of the key.

And we wrap around in the key as needed, just like in the original Vigenere cipher.

And the decryption can reverse the process because XOR,

if invertible in the sense that if we take a plain text character,

XOR the next byte of key to get a cipher text character and then XOR the key

into the cipher text character, we'll recover the original plaintext character.

So here's a worked example.

Let's say our plaintext is the string "Hello!" so

this is a six character ASCII string.

Notice we have a capital letter here.

We have lower case letters here and we also have the exclamation point.

The key let's just take as the hex string.

A1 2F, so that's going to be a two byte key.

If we represent the plaintext "Hello!" as a sequence of hex characters and

you can do this just by looking in an ASCII table,

then you'll see that if we write it out in hex,

the plaintext is equal to hex 45 65 6C 6C 6F 21.

We're going to then XOR that with the key were again, the key is replicated as many

times as needed to fill out the length of the plain text.

So that means that in this particular case, we're going to XOR

the hex string on the previous line with the hex string A1 2F A1 2F.

Now it's easiest to see what's going on if you write everything out in terms of its

underlying binary representation.

So let's take the first pair of characters.

So we have the plain text character hex 48 and

the first character of the key, hex A1.

If we write those out in binary,

hex 48 is equal to 0100 1000 hex A1 is 10100001.

If you XOR those bit by bit you get the result 11101001.

And then if you convert that back into hex you get the hex string E9.

And if you do that for each character of the plaintext,

throwing it with the correspondent character of the key, you should find that

you get the cipher text, the six byte cipher text E9 4A CD 43 CE 0E.

Now how can we attack this variant Vigenere cipher?

And from now on, for the rest of this lecture,

we're going to assume that we're attacking this variant Rather than the original.

Well there are two steps involved.

So the first is determining the key length.

So, I didn't specify exactly how the key was chosen.

And in fact there are different ways you can imagine choosing the key.

But what I'm thinking of here, is that the key length is chosen uniformly in some

known interval and then after you fix the key length,

you then chose a uniform sequence of bytes of the specified length.

So when we're attacking the scheme,

we're gonna be given some cipher attacks, and we're gonna try to recover the key.

And we're gonna do that in two steps by A, first determining the length of the key,

how many bytes are in the key.

And then determining every byte of the key until we have the whole thing.

So let's go through these steps one by one.

When attacking the Vigenere cipher,

what we're going to make use of known plaintext letter frequencies for

the English language alphabet, we're gonna assume here that we're gonna

be given cipher text that results from encryption of English language test.

We've said already that if it's not the case,

if we're working with some other language.

Where the plaintext is from some other language, we can, of course,

do a corresponding attack, but just using different frequencies.

But here I'm assuming English,

though it won't matter in any of the mathematical equations that I show.

You can get these letter frequencies from Wikipedia.

You can also get them from calculating them yourself over some corpus.

But for our purposes it suffices to take the values that are given

on Wikipedia or in this chart.

So how do we determine the key length?

Well, first of all, let's let p i, for i ranging from 0 to 255,

be the frequency of the byte i in some English language text.

That is, p0 represents the frequency

of having the byte 0 in some English language plaintext.

Now of course, if you look at the ASCII table, you'll see that ASCII

characters corresponding to values less than 32 are non-printable characters.

And so if you're encrypting English language text,

you would have that p i is equal to 0 when i is less than 32.

And similarly, the value of p i for i greater than 127 is also 0.

The ASCII character with the value 127 is some other non-printable character,

and ASCII values are not even defined once you get above 128.

So the values of p i are going to be 0, the frequencies

of those bytes are going to be 0, when i is less than 32 or i is greater than 127.

As another example, if you look at the ASCII value corresponding

to the lowercase letter a, that's a 97.

And so p 97 is the frequency of the lowercase character a in

standard English texts, let's say.

And the main point here is, first of all, that this distribution can be calculated.

Here it wouldn't suffice to look at the distribution on Wikipedia

that only has English letters.

You would need also to take into account punctuation and

spaces, perhaps numerals, perhaps the difference between upper and lowercase.

But nevertheless you could calculate these from a known corpus.

And more to the point is that this distribution is very far from uniform.

As we've already seen, many of the p i's are 0.

Even the p i's that are non-0 are not uniformly distributed among those values.

An e is the most common letter, for example, and

a space would be relatively common as well.

And the key point we're gonna use in our attack is that if the length of the key

that was used to generate some ciphertext is N, then every Nth

character of the plaintext was encrypted using the same shift, or XOR in this case.

And so if we take every Nth character of the ciphertext and

then calculate frequencies of different bytes in every Nth character.

So we just imagine taking out every Nth character or

selecting out every Nth character of the ciphertext, and

then running your frequencies, your statistics over that.

Then what you expect to get are exactly the p i values, but

in some permuted order.

Because every time you have an e, let's say a lowercase e in the plaintext,

that's gonna get shifted by the same amount if you look at every Nth character

to some ciphertext character, some ciphertext byte.

And then you expect to see that corresponding byte appear with the same

frequency in the ciphertext, again, if you look at every Nth character.

So what you're doing is simply shuffling around the p i's, but

the values themselves should be replicated.

On the other hand, if you take every Mth character, where M is not a multiple of N,

then you're picking out characters that are all shifted by different key bytes.

And in that case, you would not expect the frequencies of those characters

to match up with the frequencies of the underlying plaintext.

And so, in particular, if you calculate the resulting frequencies

that you get from doing that, heuristically speaking, you

expect to get something that will be close to uniform or at least closer to uniform.

Because again, you're shifting your plaintext by different amounts

in this particular selection of ciphertext characters.

So the question of determining the key length boils down to distinguishing

the case where you have a bunch of frequencies that are permutations of

these known p i values,

versus the case where you've got a bunch of frequencies that are close to uniform.

There are various ways you could imagine distinguishing these two possibilities,

but the easiest is to do something like the following.

So say you've computed some candidate, some observed set of frequencies,

q 0 through q 255, where, again, q i here represents

the observed frequency of byte i in your selection of ciphertext characters.

Say, again, by picking out every Nth character of the ciphertext.

Calculate those frequencies and then just compute the summation of q i squared.

And let's see what we expect.

So if your q i values are close to uniform, i.e.,

each q i is about 1/256, well,

then the summation of q i squared is going to be the summation of 256 values that

are all approximately 1/256 squared, and that equals 1/256.

On the other hand, if the q i values are a permutation of the p i values,

then the summation of the q i squared will be about equal to, or exactly equal to

if they're exactly a permutation, the summation of the p i squared.

As I said earlier, you could compute the p i's from a known corpus, and

then compute what you expect to get for the sum of the p i's squared.

This would be somewhat annoying and time consuming to do, but

you certainly could do it.

But the key point, actually, is that this value,

the sum of the p i squared, will be significantly larger than 1/256.

And so this gives a way to determine the key length.

What you do is you simply try all possibilities for the key length, and

for every candidate value N, you pick out every Nth character of the ciphertext.

Compute the frequencies of each byte in that selection of characters.

Compute the summation of the frequency squared.

And then just look for

the maximum value among all the different choices of the key length.

And again, what you expect is that when you try the wrong value for the key

length, you should get approximately the summation of q i squared being 1/256.

And when you hit upon the right key length,

you expect to get something that's much larger than 1/256.

It will be close to the sum of p i squared, though again,

you don't really need to know what that value is.

You just need to look for the maximum.

So now that we have a way to get the length of the key,

we're left with determining each of the bytes of the key.

So let's assume we've done the first part correctly and

the key length N is now known.

Well, what we're gonna do, again,

is look at every Nth character of the ciphertext, pluck out every Nth character.

Note that we can do this starting with any character we want.

And in particular if we wanna figure out what the ith character or

byte of the key is, what we'll do is simply start at the ith character

of the ciphertext and then count every N characters from that and

pick out every Nth character starting from the ith character.

If we start from the first character and then get the first,

n plus first, two one plus first,etc., that will all correspond to a bunch of

ciphertext characters generated by shifting using the first byte of the key.

And so on, if we had started with the second character,

that would correspond to shifting or XORing with the second byte of the key.

And I'll call this the ith ciphertext stream.

So the sequence of every Nth character starting with the first will be the first

ciphertext stream, etc.

And again, the observation is that every byte in the stream was generated by XORing

some underlying plaintext with the same byte of the key.

What we'll then do is we'll take this resulting stream and

we'll try decrypting it using every possible byte value B.

So we'll try every possible value of that byte of the key ranging from 0 to 255.

And that will give us a candidate plaintext stream for

each possible value of that byte.

So we can imagine if we did them all at once, we would get 256 possible plaintext

streams that could result from decrypting the corresponding ciphertext stream.

Now these are all going to be gibberish because they don't correspond to adjacent

characters, they correspond to taking every nth character and if you took every

nth character of a plain text, you would get something which is gibberish.

However, you do still expect that the plain text frequencies are observed.

That is, even if you take every eleventh character of your plain text, you would

expect that the letter E will occur most frequently in that plaintext stream.

And so, when the guess B is correct, then you can deduce a couple of things.

So when you guess B for the shift value,

the value of that ith byte of the key is correct, then first one all,

every byte in the plaintext stream, that you've obtained,

should be a valid character that would appear in English text.

And, in particular, that means every byte in the plaintext stream

should take on a value between 32 and 127.

If it's outside that range,

then unless the plaintext came from something else that you didn't expect,

that would mean that your guess B could not possibly be correct,

because the English language text does not have any ASCII characters lower than 32 or

greater than 127.

Moreover, as we said earlier, the frequencies

of each of the letters would correspond to their frequencies in regular English text.

And so, in particular, if you just look at the frequencies of all the lower

case letters, say right as a fraction of all lower case letters over all so

the number of times that the lowercase E appears as a fraction of all lowercase

letters that should be close to known English letter frequencies.

And so what you can do is you can then try your guess B for the next byte of the key,

pluck out ever end character from your decipher text, everything by B

to get a candidate value for every nth character of plain text.

And then look and tabulate the observed frequencies for

each of the lowercase letters, say.

And we'll call these values qa through qv, so they just correspond,

those are the observed frequencies for the letter A among all lowercase letters,

the letter B among all lowercase letters,

up to the letter z over all lowercase letters.

And again when you guess B is correct, then you should find that

the summation of q i p i where PI are the known letter frequencies of the lower

case English letters should be equal to the sum of the PI squared.

And this is because you expect in this case QI to in fact be equal to PI right?

QI here is observed frequencies in a candid of plain text not in

the cipher text itself.

And that value is a known value you can calculate it for yourself or

you can take my word for it and use the value display here.

So again, what you expect is that when you hit upon the correct guess B for

the next byte of the key and then decrypt the Ith ciphertext

stream using that byte B, you should get a plain text stream, a sequence of bytes,

where first of all every byte in the stream is between 32 and 127.

Furthermore, the calculated frequencies of lower case letters in that stream

should give you some values, q a through q z,

such that the sum of q i p i is about .065.

In practice, you're not going to get exactly .065 and what you can do is simply

take the value of b that maximizes the value of summation q a p i.

Subject to this caveat of everything lying between 32 and 127 and

possibly others as well, other restrictions I mean as well.

So for example, if you happen to fee a huge amount of commas, that would be

a sign that you probably that you're guess for B is probably not correct.

And then you would try another one.

What's the time that we need to implement these attacks?

Well, let's say that we know that the key length is in the range of 1 to L.

So the determining the key length takes about time 256 times L.

These numbers are very roughly, they're just meant to be indicative.

There are different operations going on and so

this is not any kind of very specific measure.

But just in general, it's about 256 times L, because, for

each candidate value of the key length,

you're doing about 256 works to figure out all the observed frequencies, and

do the calculations, and then figure out which one is the maximum.

Once you've determined N, the actual length of the key.

Then, determining every byte of the key is done by

repeating N times the process of finding the next byte of the key.

And finding the next byte of the key takes about time 256 square because you're

enumerating overall 256 possible values for that byte.

And then, for each candidate value of a byte, you're doing about 256 work.

You're doing that.

That 256 squared work a total of N times.

N is at most L, so the work here is at most 256 squared times L,

and the point here is that it's linear in L.

So this really is not only polynomial time, but it's really quite efficient.

And it's instructive to compare this to the time it would require

to carry out a brute force key search.

So a brute force key search would require time about 256 to the L.

So the number of possible keys of length exactly L is 256 to the L,

if the key has L bytes in it and each byte takes on 256 values.

So we've gone here from the trivial brute-force key search, which takes

exponential time in L down to linear in L, just by being a bit clever in our attack.

Now I do want to point out a couple of things in practice,

now the first thing is that the attacks that I have described will become

more reliable as the ciphertext length grows larger, and the reason for

that is because when your ciphertext is larger, your plaintext is larger, and

when your plaintext is larger, you expect that the observed

letter frequencies in the plaintext will more closely correspond to

the letter frequencies tabulated on say Wikipedia.

And for very short ciphertexts and correspondingly short plain text, it's

possible that you have large deviations from those average letter frequencies.

But if you have a very short playing text of about 30 characters,

you might have no E's at all.

And then that would completely throw everything off.

But as you have hundreds of characters and

even thousands of characters, the attacks get better and better.

Now the attacks do still work for relatively short ciphertexts and

in particular I did it myself and the attacks I described do work

on the ciphertexts I've given you for homework assignment one.

However, the attacks don't quite work or may not quite work out of the box as I've

described them, and you may need to do a little bit of tweaking in the algorithms,

and a little bit of manual involvement might be needed.

You might have to even potentially guess.

You might get, for example, two different candidate bites for

the first bite of the key.

And then you have to look yourself and

see which one give you something which looks meaningful.

And if you have to do this in order to solve homework one,

you shouldn't feel you're doing something wrong,

it's perfectly fine to do that as long as you get the right answer in the end.

So, with that in mind, I'll just leave you to take

a look at the programming assignment and try your luck with it.

Please do feel free to use the discussion boards to ask questions and

to try to get help from both myself and other students in the class.

You shouldn't feel if you're stuck you should feel free to ask and

get yourself unstuck as quickly as you can.

And with that I just wish you good luck and

I do hope that you have fun with this assignment.

Coursera provides universal access to the world’s best education,
partnering with top universities and organizations to offer courses online.