In the last segment we talked about the CBC-MAC and the NMAC, but throughout
the segment we always assumed that the message length was a multiple of the block
size. In this segment, we're going to see what to do when the message length is not
a multiple of the block size. So recall that the encrypted CBC mac or ECBC-MAC for
short uses pseudorandom permutation F to actually compute the CBC function as we
discussed in the last segment. But in the last segment, we always assumed that the
message itself could be broken into an integer number of blocks for the block
cipher. And the question is what to do when the message length is not a multiple
of the block size. So here we have a message where the last block actually is
shorter than the full block and the question is how to compute the ECBC-MAC in
that case. So the solution of course is to pad the message and the first pad that
comes to mind is to simply pad the message with all zeros. In other words we take the
last block and just add zeros to it until the last block becomes as long as one full
block size. And so my question to you is whether the resulting MAC is secure. So
the answer is no, the MAC is not secure. And let me explain why, basically the
problem is that it's possible now to come up with messages so that message m and the
message m concatenated zero happen to have exactly the same Pad. And as a result once
we plug in both m and m||0 into ECBC we'll get the same tag out, which means that
both m and m||0 have the same tag and therefore the attacker can
mount an existential forgery. He would ask for the tag on the message m. And then he
would output as its forgery the tag and the message m||0. And it's
easy to say why that's the case. Basically, to be absolutely clear here, we have our
message m. Which after padding becomes m000. So we had to add three
0's to it. And here we have the message m zero, an m that ends with zero. And after
padding, we basically now have to add two 0's to it. And lo and behold, they become
the same pad, so that they're gonna have exactly the same tag which allows the
adversary to mount the existential forgery attack. So this is not a good idea. In
fact appending all 0s is a terrible idea. And if you think about concrete case where
this comes up imagine the automatic clearing house system used for clearing
checks. I might have a check for a $100 and the tag on that check. Well, now, the
attacker basically could append a zero to my check and make it a check for $1000,
and that wouldn't actually change the tag. So this ability to extend the message
without changing the tag actually could have pretty disastrous consequences. So I
hope this example convinces you that the padding function itself must be a one to
one function. In other words, it should be the case that two distinct messages always
map to two distinct padded messages. We shouldn't actually have a collision on the
padding function. Another way of saying it is that the padding function must be
invertible. That guarantees that the padding function is one to one. So a
standard way to do this was proposed by the International Standards Organization
ISO. What they suggested is basically, let's append the string 100000 to the end
of the message to make the message be a multiple of the block length. Now to see
that this padding is invertible, all we do is describe the inversion algorithm
which simply is gonna scan the message from right to left, until it hits the
first one and then it's gonna remove all the bits to the right of this one,
including the one. And you see that once we've removed the pattern this way, we
obtain the original message. So here's an example, so here we have a message where
the last block happens to be shorter than the block length, and then we
append the 1,0,0 string to it. It's very easy to see what the pad is,
simply look for the first one from the right, we can remove this pad and recover
the original message back. Now there's one corner case that's actually quite
important, and that is what do we do if the original message length is already the
multiple of a block size? In that case it's really very, very important that we
add an extra dummy block. That contains the pad 1000. And again, I can't tell you
how many products and standards have actually made this mistake where they
didn't add a dummy block and as a result, the MAC is insecure because there's an
easy existential forgery attack. And let me show you why. So suppose in case the
message is a multiple of a block length, suppose we didn't add a dummy block and we
literally MAC-ed this message here. Well, the result now is that if you look at
the message which is a multiple of the block size and a message which is not a
multiple of the block size but is padded to the block size, and imagine it so
happens that this message m prime one happens to end with 1-0-0. At this point,
you realize that here, the original message. Here, let me draw it this way.
You realize that the original message after padding. Would become identical to
the second message that was not padded at all. And as a result, if I ask for the tag
on this message over here, I would obtain also the tag on the second message that
happened to end in 1-0-0. Okay, so if we didn't add the dummy block, basically,
again, the pad would be not invertible, because two different messages, two
distinct messages, happen to map to the same padded result. Again, as a result,
the MAC becomes insecure. So to summarize, this ISO standard is a perfectly fine way
to pad, except you have to remember also to add a dummy block in case message is a
multiple of the block length to begin with. Now some of you might be wondering
if there is a padding scheme that never needs to add a dummy block, and the answer
is that if you look at a deterministic padding function, then it's pretty easy to
argue that there will always be cases where we need to pad, and the reason is
just literally the number of messages that are multiples of the block length is much
smaller than the total number of messages that need not be a multiple of the block
length. And as a result we can't have a one to one function from this bigger
set of all messages to the smaller set of messages which are a multiple of
the block length. There will always be cases where we have to extend the original
message and in this case that would correspond to adding this dummy padding
block. However, there is a very clever idea called CMAC which shows that using a
randomized padding function we can avoid having to ever add a dummy block. And so
let me explain how CMAC works. So CMAC actually uses three keys. And, in fact,
sometimes this is called a three key construction. So this first key, K, is
used in the CBC, the standard CBC MAC algorithm. And then the keys, K1 and K2,
are used just for the padding scheme at the very, very last block. And in fact in
the CMAC standard, the keys K1, K2 are derived from the key K by some sort of a
pseudo random generator. So the way CMAC works is as follows. Well, if the message
happens to not be a multiple of a block length, then we append the ISO padding to
it. But then we also XOR this last block with a secret key, K1, that the
adversary doesn't know. However, if the message is a multiple of the block length,
then of course, we don't append anything to it. But we XOR it with a different
key, K2, that, again, the adversary doesn't actually know. So it turns out,
just by doing that, it's now impossible to apply the extension attacks that we could
do on the cascade function, and on raw CBC. Because the poor
adversary actually doesn't know what is the last block that went into the
function. He doesn't know K1, and therefore, he doesn't know the value at this
particular point and as a result, he can't do an extension attack. In fact, this is a
provable statement. And so that this construction here is simply by XOR-ing
K1 or XOR-ing K2 is really a PRF. Despite not having to do a final
encryption step after the raw CBC function is computed. So, that's one
benefit, that there's no final encryption step. And the second benefit is that we resolve
this ambiguity between whether padding did happen or padding didn't happen by using
two different keys to distinguish the case that the message is a multiple of the block
length versus the case where it's not but we have a pad appended to the message. So
the two distinct keys resolve this ambiguity between the two cases, and as a
result, this padding actually is sufficiently secure. And as I said,
there's actually a nice security theorem that goes with CMAC that shows that the
CMAC construction really is a pseudo-random function, with the same security
properties as CBC-MAC. So I wanted to mention that CMAC is a federal standard
standardized by NIST and if you now, these days, wanted to use a CBC-MAC for
anything, you would actually be using CMAC as the standard way to do it. But
particularly in CMAC, the underlying block cypher is AES and that gives us a secure
CBC-MAC derived from AES. So that's the end on the segment and in the next segment
we'll talk about a parallel MAC.