0:07

In this lecture, we'll see vector clocks,

Â which is another way of assigning timestamps to events in a distributed system.

Â Vector timestamps are used in cloud computing systems such as Riak,

Â which is a key-value store.

Â In this approach, each process uses a vector of integer clocks.

Â Think of this as an area of integer clocks.

Â This is as opposed to integer clocks by

Â themselves which were used in the Lamport timestamps approach.

Â Suppose there are N processes in the group numbered one through

Â N. Each vector at each process is going to have N elements.

Â So, process i would maintain the vector V sub i,

Â 1 through N. This vector has N elements.

Â The jth element of the vector at process i,

Â V sub i of j,

Â is i's knowledge of the latest events that have happened at process j,

Â because of whenever an event happens at process j and some message gets sent over to i,

Â then Vi of j might get updated.

Â However, j might also send messages to other processes,

Â and this might get really transitively to i,

Â and this would result in Vi of j also getting incremented.

Â You would see as we discuss the rules.

Â So, how do you increment vector clocks?

Â We discussed the rules for incrementing Lamport clocks.

Â Rules for incrementing vector clocks are simple,

Â they are similar to the Lamport clock but they're slightly different.

Â Whenever a process executes an instruction or sends a message to say this is process i,

Â it only increments the ith element of its vector clock.

Â So, only V sub i of i incremented

Â whenever a process executes an instruction or sends a message.

Â When a message is sent, it carries a send-event's vector timestamp.

Â The entire vector here is assigned as a timestamp to that send-event,

Â and this entire vector is carried along with that particular message.

Â What does a process do when it receives a message?

Â Well, when it receives a message with a vector timestamp in there,

Â it uses it to update its own local vector clock.

Â The ith element of the vector clock at

Â the receiving process for i is simply incremented by one,

Â because it stands for the number of events that have happened at process i.

Â For each of the other elements in the vector,

Â it simply takes the maximum of the corresponding element

Â in the incoming message and in the local loop clock,

Â and it takes the maximum of those two and it sets it to

Â be the corresponding jth element in the local clock itself.

Â Again, there are two simple rules.

Â You simply increment your local ith element for every event,

Â and whenever you receive a message you take the max for

Â every other element in the vector.

Â So, let's see this for our earlier example.

Â Again, the same example as we saw previously.

Â So, all the clock values here are initialized to be all zeros at all the processes.

Â Again, these are local clocks,

Â only getting updated by those processes.

Â So, the process P1 executes an instruction.

Â It simply assigns it an event (1, 0,

Â 0) because it increments the first element of the vector,

Â that's because process one is P1.

Â Similarly, when P3 sends a message to P2,

Â it increments the third element of the vector because it is P3,

Â and assigns this resulting vector as the timestamp to the send-event of this message.

Â The message also carries this timestamp (0,0,1),

Â so that when P2 receives it,

Â it can then use it to update its local clock.

Â Remember again, that when it receives this message,

Â P2 will simply update the second element of the vector by one and increments it by one.

Â And for every other element, it takes the max

Â from the corresponding elements in the local clock.

Â So, zero here and zero here, sorry.

Â Zero here and zero here result in zero for

Â the first element and one from the received messaged,

Â and zero in the local clock for the third element result in a max of one,

Â being set for the third element of the local clock.

Â So,(0,1,1) is assigned as the timestamp for the received event

Â of this particular message being received at P2 from P3.

Â Next, what happens in the system is that P1 sends a message to P2.

Â In order to do this,

Â it increments its first element from one to two,

Â (2,0,0) is assigned as the timestamp of the send-event here,

Â and that is carried with the message.

Â When the message is received at P2,

Â it increments its second element from one to two,

Â that's shown in blue here.

Â The other elements are all maxed.

Â So, the green element, the first element

Â is taken from the received message because that's

Â higher than the corresponding first element in the local clock.

Â And the third element is taken from the local clock,

Â that's highlighted in red, the one,

Â because that's higher than what is in the received message.

Â Okay, so (2,1,1) is the resulting timestamp of the received event over here,

Â and that is also updated to be P2's current clock.

Â So, if you play the rules forward,

Â you can see for yourself that

Â these timestamps are true and these follow the rules that is laid out earlier.

Â So, one of the reasons for preferring vector timestamp was that they obey causality.

Â So, let's see the rules for obeying causality.

Â Two vector timestamps, VT1 and VT2,

Â assigned to events E1 and E2 respectively,

Â are equal to each other if and only if the corresponding elements are all equal.

Â So, the ith element of VT1 equals the ith element of VT2,

Â for all i equaling one through

Â N. Vector timestamp VT1 is less than or equal to vector timestamp VT2,

Â if the ith element of VT1 is less than or equal to the corresponding ith element

Â of VT2, for all i.

Â Now, given this, we can say the two events E1 and E2,

Â with vector timestamps VT1 and VT2, are causally related.

Â That is, E1 happens before E2,

Â if and only if VT1 is strictly less than VT2.

Â Okay, this is different than less than or equal to.

Â VT1 is strictly less than VT2,

Â if and only if the following conditions are true: VT1 is less than or equal to VT2,

Â so the corresponding elements of VT1

Â are less than or equal to the corresponding elements of VT2.

Â But also, there exists some value of j such that,

Â the jth element of VT1 is strictly less than the jth element of VT2.

Â Okay, so, if these two conditions are true,

Â then VT1 is said to be less than VT2,

Â and is also true that the event to which VT1 is assigned as

Â a timestamp happens before the event to which VT2 is assigned as a timestamp.

Â Now, we can also say that two events are concurrent, VT1 and VT2,

Â if and only if neither VT1 is less or equal to VT2,

Â nor is VT2 less than or equal to VT1.

Â Okay. This means that VT1 and VT2 are concurrent with each other.

Â We denote it using three vertical bars between VT1 and VT2.

Â So, let's see our example again and verify that these conditions are,

Â in fact, true for some of the events in this particular run.

Â So, A happens before B,

Â and you notice that the timestamps assigned to them obey a causality.

Â So, (1,0,0) is less than or equal to (2,0,0) because, for all the elements,

Â the ith element of the left vector is

Â less than or equal to the ith element of the right vector,

Â and there is some index in this case one,

Â where the first element of the left vector

Â is strictly less than the first element of this second vector.

Â Similarly, B happens before F is obeyed by the fact that (2,0,0) is less than (2,2,1),

Â you have less than or equal to obeyed here,

Â and several elements are smaller in the left vector compared to the right vector.

Â Similarly, A happens before F again is obeyed because (1,0,0) is less than (2,2,1).

Â Similarly, you can verify that for these other

Â happens before relationships outlined here,

Â the corresponding vectors obey the causalities.

Â So, let's look at an example. F happens before J,

Â F is assigned a timestamp of (2,2,1) and G is assigned a timestamp of (5,3,3).

Â It's clear that all the elements of F are

Â strictly less than all the elements of G. That's not required,

Â you just need one of the elements to be

Â smaller while all the other elements are less than or equal to,

Â but in any case here,

Â the left vector (2,2,1) is strictly less than the right vector (5,3,3).

Â Similarly, for H and G,

Â you see that (0,0,1) is less than (2,3,1),

Â for H happens before J,

Â you see that (0,0,1) is less than (5,3,3).

Â Similarly, C happens before J is obeyed,

Â because (3,0,0) which is assigned to C is less than J's timestamp which is (5,3,3).

Â What about those concurrent events?

Â Well, C and F are concurrent events,

Â and you see that they get timestamps (3,0,0) and (2,2,1) respectively.

Â These obey causality because they are not causally,

Â we cannot say that one of the vector timestamps is less than the other. Why is this?

Â Well, because each of the vectors has at least one element

Â that is greater than the corresponding element in the other vector.

Â C's vector has the first element, three,

Â which is greater than the corresponding element in the other vector, two.

Â And F's vector has the third element, one,

Â which is greater than the corresponding third element in the C's vector which is zero.

Â Neither of these two vectors is related to each other by less than or equal to,

Â and so, we can say that C and F are,

Â in fact, concurrent with respect to each other.

Â Similarly, H and C are two other concurrent events,

Â and H's third element, one,

Â is greater than the third element in C,

Â and C's first element, three,

Â is greater than the first element in H,

Â which is having a value of zero over here.

Â Okay. So, in this way you can use the vector timestamps to

Â verify that two events are either causally related if the vector timestamps are,

Â in fact, one of them is strictly less than the other.

Â And, on the other hand,

Â if you cannot find a strictly less than relationship between two vector timestamps,

Â then you know for sure that they are concurrent events.

Â So, in Lamport timestamps,

Â we assigned integer clocks to events in a way that the timestamps obey causality.

Â But the problem there was that we could not distinguish

Â concurrent events based on their Lamport timestamps.

Â Vector timestamps come to the rescue,

Â where you can clearly identify when a pair of

Â events either are concurrent or are causally related to each other.

Â Vector timestamps obviously take

Â much more space at each process and also within the messages,

Â but you have the added advantage,

Â that you can identify concurrent events.

Â So, that wraps up our discussion of time and ordering.

Â Time is an essential problem to solve in anything in the distributed system,

Â because the different processes

Â have clocks that are not synchronized with respect to each other,

Â and yet we would like to assign timestamps to

Â events that happen at different processes in the distributed system.

Â One way to do this is to use time synchronization

Â to keep the clocks and processes close to each other,

Â if not identical, and then use these clocks to assign timestamps to events.

Â We discussed several algorithms for this including

Â the Cristian's algorithm, the NTP algorithm.

Â The Berkeley algorithm is essentially using synchronization only within the group,

Â but in all of these cases,

Â when you have a non-zero latencies in the underlying network,

Â you're going to have an error in the clock that you set.

Â So, the error is a function of the round-trip time.

Â You can avoid time synchronization issues all together

Â by instead assigning causal timestamps to events,

Â either Lamport timestamps or vector timestamps to events.

Â