0:04

In section 1.2, we're going to talk about Hash Pointers and their application.

A hash pointer is a kind of data structure that turns out to be used a lot in the systems that we're talking about. And a hash pointer is basically a simple thing: we're going to take a pointer to where some information is stored, and together with the pointer we're going to store a cryptographic hash of the information. So whereas a regular pointer gives you a way to retrieve the information, a hash pointer is going to let us ask to get the information back and also to verify that the information hasn't changed.
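That idea can be sketched in a few lines of Python (a minimal illustration, not anything from the lecture itself; the `HashPointer` class and its field names are our own, and the "pointer" part is simulated by holding the data directly):

```python
import hashlib

def H(data: bytes) -> str:
    """Cryptographic hash of the data (SHA-256 here)."""
    return hashlib.sha256(data).hexdigest()

class HashPointer:
    """A pointer to some stored information, plus the hash that
    information had when we last saw it."""
    def __init__(self, data: bytes):
        self.data = data      # stands in for 'where the information is stored'
        self.hash = H(data)   # hash of the information at the time we stored it

    def retrieve(self) -> bytes:
        """Get the information back, verifying it hasn't changed."""
        if H(self.data) != self.hash:
            raise ValueError("data has been tampered with")
        return self.data

ptr = HashPointer(b"some information")
assert ptr.retrieve() == b"some information"   # unmodified data verifies fine

ptr.data = b"tampered"    # an adversary changes the stored data...
try:
    ptr.retrieve()        # ...and the hash mismatch is detected
except ValueError:
    print("tampering detected")
```

The point is that `retrieve` doesn't just fetch the data; it recomputes the hash and compares it against the hash we remembered.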

Â 0:41

And we're going to draw a hash pointer in diagrams like this: we're going to have an H, and then an arrow that points to something. So anything drawn like this, think of it as being a hash pointer to this thing. It's a pointer to where it's stored, and it's also the hash of the value that this data had when we last saw it.

Â 1:01

And we can take hash pointers and use them to build all kinds of data structures. So a key idea here: take any data structure, a linked list or a binary search tree or something like that, and implement it with hash pointers instead of pointers as we normally would.

Â 1:25

So just like in a regular linked list, where you have a series of blocks and each block has data as well as a pointer to the previous block in the list, here the previous-block pointer will be replaced with a hash pointer. So it says where the previous block is and what the value of that entire previous block was. And we're going to remember the head of the list, like this, just as a regular hash pointer.

And a use case for a blockchain like this is a tamper-evident log. That is, we want to build a log data structure that stores a bunch of data, so that we can add data onto the end of the log, but if somebody goes later and messes with data that is earlier in the log, we're going to detect it. That's what tamper evidence means.

So to understand why a blockchain gives us this tamper-evident property, let's ask what happens if an adversary wants to go back later and tamper with data that's in the middle of the chain. So let's assume that an adversary wants to tamper with this block down here. He wants to change the data here, and he wants to do it in such a way that we, the holders of the hash pointer at the head here, won't be able to detect it.

Â 2:34

So the adversary changed the contents of this block, and therefore the hash here, which is a hash of this entire block, is not going to match up. Because the hash function is collision free, it must be the case that the hash of this block is now different. And so we could detect the inconsistency between this data and the hash pointer that we remembered before, unless the adversary also tampers with the hash pointer. If he tampers with this hash pointer, then he makes these two match up.

But now he's changed the contents of this block. And what that means is that when we come back later and hash the contents of this block, it's not going to match the hash that we remembered before, because the contents of the block have changed. And so we're going to detect the inconsistency between the contents of this block and this hash, unless the adversary also tampers with the block over here on the right. But now, when he does that, the hash of this block is not going to match the hash that we remembered up here, the hash that we're holding on to. And this the adversary can't tamper with, because this is the value that we remembered as being the head of the list.

And so the upshot of this is that if the adversary wants to tamper with data anywhere in this entire chain, then in order to keep the story consistent, he's going to have to tamper with hash pointers all the way back to the beginning. And he's ultimately going to run into a roadblock, because he won't be able to tamper with the head of the list. And so what this means is that just by remembering this hash pointer, we've essentially remembered a kind of hash, a tamper-evident hash, of the entire list all the way back to the beginning.

And so we can build a blockchain like this containing as many blocks as we want, going back to some special block at the beginning of the list, which we might call the genesis block. And that's a tamper-evident log, built out of a blockchain.
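The tamper-evident log just described can be sketched like this (a simplified model, not the lecture's own code: each block stores its data together with the hash of the entire previous block, the "pointer" part is just list position, and all names are our own):

```python
import hashlib

def H(data: bytes) -> str:
    """SHA-256 hash of the data."""
    return hashlib.sha256(data).hexdigest()

def build_log(items):
    """Append blocks in order; each block records the hash of the
    entire previous block. Returns the chain and the head hash pointer."""
    chain = []
    prev_hash = None                         # the genesis block has no predecessor
    for data in items:
        block = (data, prev_hash)
        chain.append(block)
        prev_hash = H(repr(block).encode())  # hash of the entire block just added
    return chain, prev_hash                  # prev_hash is now the remembered head

def verify_log(chain, head_hash):
    """Walk the chain from genesis, checking every hash pointer,
    and finally compare against the remembered head."""
    prev_hash = None
    for block in chain:
        _data, stored_prev = block
        if stored_prev != prev_hash:         # a hash pointer doesn't match
            return False
        prev_hash = H(repr(block).encode())
    return prev_hash == head_hash

chain, head = build_log([b"a", b"b", b"c"])
assert verify_log(chain, head)

chain[1] = (b"tampered", chain[1][1])        # adversary edits a block in the middle
assert not verify_log(chain, head)           # detected from the remembered head alone
```

Notice that the verifier holds nothing but the head hash pointer; any edit anywhere in the chain breaks some link on the way back to it.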

Â 4:22

Now, another useful data structure that we can build using hash pointers is a binary tree. We can build a binary tree with hash pointers, and this is called, in the jargon, a Merkle tree, after Ralph Merkle who invented it. And the idea is this: suppose we have a bunch of data blocks, which we'll draw across the bottom down here.

We're going to take consecutive pairs of these data blocks, and for these two data blocks we're going to build a data structure here that has two hash pointers, one to each of these blocks, and similarly all the way across. We'll then go another level up, and this block here will contain a hash pointer to each of these two children down here. And so on, all the way back up to the root of the tree. And then, just like before, we're going to remember just the hash pointer up here at the head of the tree. And we can then, if we want, traverse down through the hash pointers to any point in the list, and we can make sure that the data hasn't been tampered with.

Because just like I showed you with the blockchain, if an adversary tampers with some block down here at the bottom with the data, that will cause the hash pointer that's one level up not to match. So he'll have to tamper with that, and therefore he'll have to tamper with the hash pointer one level up from there. And eventually he'll get up to the top, where he won't be able to tamper with the hash pointer that we've remembered. So again, any attempt to tamper with any piece of data across the bottom will be detected just by remembering the hash pointer at the top.

Â 5:50

Now, another nice feature of Merkle trees, unlike the blockchain that we built before, is that if someone wants to prove to us that a particular data block is a member of the Merkle tree, all they need to show us is this amount of data. So if we remember just the root, and someone wants to convince us that this block is in the Merkle tree, they need to show us this block, and we can verify that the hash matches up. And then they need to show us this block, and we can verify that the hash of this matches that. They can show us this block, and we verify that the hash of this block matches this hash pointer. And then they show us the data. And just by verifying the hashes up to the root, we can verify that this data block was in the Merkle tree.

So that takes about log(n) items that we need to be shown, and it takes about log(n) time for us to verify it. And so even with a very large number of data blocks in the Merkle tree, we can still verify a proof of membership in a relatively short time.
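A membership proof of this kind can be sketched as follows (a simplified sketch, assuming the pairwise-hashing construction the lecture describes; the proof is just the log(n) sibling hashes along the path from the leaf up to the root, and the function names are our own):

```python
import hashlib

def H(data: bytes) -> bytes:
    """SHA-256 hash of the data."""
    return hashlib.sha256(data).digest()

def merkle_proof(blocks, index):
    """Collect the sibling hash at every level on the path to the root."""
    level = [H(b) for b in blocks]
    path = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])          # duplicate last hash on odd levels
        sibling = index ^ 1                  # the other block in our pair
        path.append((level[sibling], index % 2))  # (sibling hash, are-we-the-right-child?)
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return path

def verify_proof(data, path, root):
    """Hash our way up the path and compare against the remembered root."""
    h = H(data)
    for sibling, is_right in path:
        h = H(sibling + h) if is_right else H(h + sibling)
    return h == root

blocks = [b"block0", b"block1", b"block2", b"block3"]
# Recompute the root the same way the tree was built.
level = [H(b) for b in blocks]
while len(level) > 1:
    level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
root = level[0]

proof = merkle_proof(blocks, 2)              # only log(n) hashes, not all the data
assert verify_proof(b"block2", proof, root)  # membership verified against the root
assert not verify_proof(b"other", proof, root)
```

The verifier never sees the other data blocks; it only needs the claimed block, the sibling hashes, and the one root it already remembered.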

Â 6:54

So Merkle trees have various advantages. One advantage, of course, is that the tree can hold many items, but we just need to remember the one root hash, which is only 256 bits. We can verify membership in a Merkle tree in logarithmic time and logarithmic space. That's nice.

And there's a variant, which is a sorted Merkle tree. That's just a Merkle tree where we take the blocks at the bottom and sort them into some order: say, alphabetical order, lexicographic order, numeric order, or some order that we agree on.

Having done that, once we've sorted the Merkle tree, it's now possible to verify non-membership in the Merkle tree. That is, we can prove that a particular block is not in the Merkle tree. And the way we do that is simply by showing a path to the item that's just before where that item would be, and a path to the item that's just after where it would be. And then we can say: look, both of these items are in the Merkle tree, they're consecutive, and therefore there is nothing in between them, and so the thing that we are trying to prove non-membership of can't be there. So a Merkle tree is a binary search tree built with hash pointers; we can do logarithmic-time membership proofs, and non-membership proofs if we sort the tree, and it's very efficient.
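This non-membership argument can be sketched as follows (a simplified sketch that elides the two Merkle membership proofs themselves and just shows how the bracketing neighbors are found and checked; it also assumes the missing item falls strictly between the smallest and largest leaves):

```python
import bisect

def non_membership_witness(sorted_leaves, item):
    """Find the two consecutive leaves that bracket the missing item.
    Returns None if the item is actually present."""
    i = bisect.bisect_left(sorted_leaves, item)
    if i < len(sorted_leaves) and sorted_leaves[i] == item:
        return None                              # item is in the tree after all
    # Assumes 0 < i < len(sorted_leaves) for this sketch.
    return sorted_leaves[i - 1], sorted_leaves[i]

def check_non_membership(before, after, item):
    """In the real protocol we would also verify Merkle membership proofs
    for 'before' and 'after' and that they are adjacent leaves; here we
    just check the ordering argument: the item sits strictly between two
    consecutive leaves, so nothing can occupy the gap."""
    return before < item < after

leaves = sorted([b"apple", b"cherry", b"grape", b"plum"])
w = non_membership_witness(leaves, b"banana")
assert w == (b"apple", b"cherry")                # the consecutive bracketing pair
assert check_non_membership(*w, b"banana")       # so b"banana" cannot be in the tree
```

The sorting is what makes this work: without an agreed order on the leaves, "just before" and "just after" would be meaningless.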

Â 8:05

More generally, it turns out that we can use hash pointers in any pointer-based data structure, as long as the data structure doesn't have cycles. If there are cycles in the data structure, then we won't be able to make all the hashes match up. If you think about it, in an acyclic data structure we can start near the leaves, near the things that don't have any pointers coming out of them, compute the hashes of those, and then work our way back, sort of towards the beginning. But in a structure with cycles, there's no end that we can start with and compute back from.

So, for example, we can build a directed acyclic graph out of hash pointers, and we'll be able to verify membership in that DAG very efficiently, and it'll be easy to compute. So this is a general trick that you'll see over and over throughout the distributed data structures and throughout the algorithms that we talk about later in this lecture and in subsequent lectures.
