0:13

So this failure detector is called SWIM, or the Scalable Weakly-consistent Infection-style Membership protocol. You'll see why it is called that in this lecture, as well as the next. Essentially, here, instead of using heartbeating, we use the reverse, which is pinging. This is a concept that all of you are familiar with.

0:30

Process pi runs the following protocol, which runs periodically every T' time units; T' is called the protocol period. At the beginning of the protocol period, pi picks one other process at random, call that process pj, and sends it a ping message. If the process pj receives a ping message, it responds back with an ack message immediately. If pi receives this ack, then it does nothing else for the remainder of the protocol period; it's satisfied. However, if it does not hear back an acknowledgement, which might happen if the acknowledgement is dropped or the original ping is dropped, then it tries to ping pj again, but instead of using the direct path, it uses indirect paths. It does this by sending indirect pings to K other randomly selected processes. This third process is one of them. When it receives this indirect ping, it then sends a direct ping to pj, which responds back with a direct acknowledgement, and then the third process sends an indirect acknowledgement back to pi. If pi receives at least one such indirect acknowledgement by the end of the protocol period, then it is happy and is satisfied. If it receives neither the direct acknowledgement from the beginning nor any indirect acknowledgements, then it marks pj as having failed.
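The protocol period just described can be sketched in a few lines of Python. This is a minimal sketch, not the full SWIM implementation: the `send_ping` and `send_indirect_ping` callbacks are hypothetical stand-ins for the real network layer, and timeouts are folded into their return values.

```python
import random

def run_protocol_period(pi, members, k, send_ping, send_indirect_ping):
    """One SWIM protocol period run by process pi (a sketch).

    send_ping(pi, pj) -> True if a direct ack arrived in time.
    send_indirect_ping(pi, helper, pj) -> True if helper relayed pj's ack.
    """
    # Pick one other process at random and ping it directly.
    pj = random.choice([m for m in members if m != pi])
    if send_ping(pi, pj):
        return (pj, "alive")
    # No direct ack: ask K other randomly selected processes to ping pj.
    helpers = random.sample([m for m in members if m not in (pi, pj)], k)
    for h in helpers:
        if send_indirect_ping(pi, h, pj):
            return (pj, "alive")
    # Neither a direct nor any indirect ack arrived: mark pj as failed.
    return (pj, "failed")
```

Under this sketch, pj is marked failed only when both the direct attempt and all K indirect attempts come back empty by the end of the period.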

1:41

So there are two things going on here. First of all, pi is giving pj a second chance to respond back to a ping; maybe the first ping was dropped. That's why we have the second stage. Also, the pi to pj path in the internet itself might be congested and might be dropping more packets than other paths. So pi uses other internet paths, via these indirect pingers, bypassing this potential congestion and giving pj a better chance of responding with an acknowledgement. So, essentially, you're giving a temporal chance to pj by sending a second ping, and also a spatial chance to pj by using indirect paths.

2:16

So, where does SWIM lie with respect to heartbeating? I have two axes here on this plot. The Y axis, the vertical axis, is the time to first detection. The X axis is the process load.

2:25

Here we fix the false positive rate and the message loss rate. For heartbeating, when you have a very low process load, in other words, when you have a low bound on the bandwidth that can be used, the detection time can be very high. If a constant load is your constraint, the detection time could be as high as order N. On the other hand, if your process load is order N, then your detection time could be constant. SWIM, on the other hand, gets both: a constant detection time in expectation, as well as a constant process load. We'll see this on the next few slides.

2:54

So the detection time in SWIM is, in expectation, e/(e-1) protocol periods; this is a constant, and it's independent of group size. The load on each process is a constant per period, because each process is sending out one ping and maybe another K indirect ping messages, and in expectation is receiving one direct ping and a few indirect ping messages. You can show by analysis that the load is in fact less than 8 times the optimal load, even when you have 15% packet loss.

3:26

The false positive rate that this protocol achieves is tunable: by increasing K, you can lower the false positive rate, and the false positive rate falls exponentially as the value of K increases.

Finally, you get completeness: when a process fails, it will eventually be selected for pinging, as long as there are non-faulty processes with this failed process in their membership list. This is the nice property of picking ping targets at random: when you have multiple pingers, eventually one of them will pick you.

4:01

But in fact, the expectation is even better: in expectation, you have e/(e-1) protocol periods until at least one of them pings you.

4:17

So, because you have an expected e/(e-1) protocol periods until detection, you can show that within order log N protocol periods, with high probability, at least one process will ping the failed process and will mark it as having failed.

4:33

So, the probability of mistake is exponential in -K: as you increase the value of K, the probability of mistake falls. It also depends on the message loss rate, which we marked as pml in the previous lecture. It also depends on the probability of failure: if you have many failures in the system, then the probability of mistake might go up, essentially because some of these indirect pingers might themselves be affected. However, PM(T) stays small, and it goes down as K is increased.

4:59

You can show that the load in the worst case is less than 28 times the optimal load, and in expectation the load is less than 8 times the optimal load, when you have 15% packet loss rates, which are pretty high for the internet, but for the analysis this is good.

5:17

So let's see for a moment why it's e/(e-1) protocol periods in expectation for the failure detection. Consider a process that has failed, and assume that all the other processes in the system have this failed process in their membership list, and each one of them is picking one ping target at random. The probability that the failed process will be picked as a target by a given other process pi is 1/N. The probability that it will not be picked as a target by that one process pi is 1 - 1/N. The probability that it will not be picked as a target by any of the other N-1 processes in the system is simply this quantity raised to the power N-1, that is, (1 - 1/N)^(N-1), and the probability that at least one of these non-faulty processes will ping it is just 1 minus this quantity. The second part of this equation is a well-known limit: as N grows very large, which is what we expect in data centers, (1 - 1/N)^(N-1) becomes e^-1.
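You can check this limit numerically; a small sketch makes the convergence concrete:

```python
import math

def detection_prob(n):
    """P(at least one of the other n-1 processes picks the failed
    process as its random ping target in a given protocol period)."""
    return 1 - (1 - 1 / n) ** (n - 1)

# As n grows, (1 - 1/n)^(n-1) approaches e^-1, so this probability
# approaches 1 - e^-1, about 0.632, independent of the group size.
```

Even for modest group sizes (n around 100), the value is already very close to 1 - e^-1.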

6:15

Essentially, what we are seeing is that in each protocol period you flip a coin with heads probability 1 - e^-1. If the coin turns up heads, then at least one other process in the group picks the failed process as a ping target and marks it as having failed.

6:33

So if you know your probability theory, then basically you'll know that it takes, in expectation, 1/(1 - e^-1) protocol periods for the first heads to turn up. And that is why we get e/(e-1).
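This geometric-distribution argument is easy to verify by simulation. The sketch below flips the coin described above until the first heads; averaging over many trials should land near e/(e-1), about 1.58 protocol periods.

```python
import random

def periods_until_detection(n, rng):
    """Simulate the number of protocol periods until the first 'heads',
    where heads (some process randomly pings the failed one) occurs
    with probability p = 1 - (1 - 1/n)^(n-1) in each period."""
    p = 1 - (1 - 1 / n) ** (n - 1)
    periods = 1
    while rng.random() >= p:  # tails: nobody picked the failed process
        periods += 1
    return periods
```

The expected value of a geometric random variable with success probability p is 1/p, which for large n is 1/(1 - e^-1) = e/(e-1).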

6:54

Also, as I mentioned before, you have completeness eventually: once a process fails, at least one process will choose it as a ping target, and in fact, eventually, every other non-faulty process that has this failed process in its list will pick it as a ping target, because that's what you get with random picking.

7:25

The trick is as follows: whenever you pick a ping target, you pick the next membership element in your list, in a linear fashion. So, essentially, you traverse the membership list that you have, one entry per round. When you reach the end of the membership list, you simply reorder and permute the membership list that you have.
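This round-robin-with-reshuffle target selection can be sketched as a small class. The class name and interface here are my own; SWIM only prescribes the traversal-then-permute behavior.

```python
import random

class PingTargetSelector:
    """Pick ping targets by traversing the membership list in order,
    reshuffling the list each time a full pass completes (a sketch
    of the round-robin SWIM variant)."""

    def __init__(self, members, rng=random):
        self.members = list(members)
        self.rng = rng
        self.rng.shuffle(self.members)  # start from a random permutation
        self.pos = 0

    def next_target(self):
        target = self.members[self.pos]
        self.pos += 1
        if self.pos == len(self.members):
            # End of the list: permute it for the next pass.
            self.rng.shuffle(self.members)
            self.pos = 0
        return target
```

Because every member appears exactly once per pass, each process is guaranteed to be pinged within a bounded number of protocol periods, which is the point of the variant.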

7:51

If you think about it, you can see that this results in at most 2N-1 protocol periods, in the worst case, before a process is picked as a ping target after it has failed. The worst case happens if the process has just been passed in this round-robin traversal, and then, after the permutation, the process ends up at the very end of the list; that takes N-1+N protocol periods for the pinging process to get around and ping the failed process.

8:25

This change in the way ping targets are picked does not change the failure detection properties, such as the false positive rate, or the other scalability properties.

8:38

So far we've discussed failure detection protocols, but we need to return to the big picture of the group membership protocol and see how the dissemination component works in tandem with the failure detection component that we have seen so far.