0:00

So in the last video I left you with a cliffhanger. I introduced you to the

Â minimum cut problem. I introduced you to a very simple algorithm, randomized

Â algorithm, in the form of contraction algorithm. We observed that sometimes it

Â finds the main cut and sometimes it doesn't. And so the $64000 question is

Â just how frequently does it succeed and just how frequently does it fail. So now

Â that I hope you've brushed up the conditional probability and independence,

Â we are gonna give a very precise answer to that question in this lecture. >>So

Â recalling this problem we are given as input in undirected graph, possibly with

Â parallel edges, and that the goal is to compute among the exponential number of

Â possible different cuts, that's with the fewest number of crossing edges. So, for

Â example in this graph here, which you've seen in a previous video, the goal is to

Â compute the cut (A, B). Here, cuz there are only two crossing edges, and that's as

Â small as it gets. That's the minimum cut problem and Karger proposed the following

Â random contraction algorithm based on random sampling, so we have N-2

Â iterations, and the number of vertices gets decremented by 1 in each iteration.

Â So we start with N vertices, we get down to 2. And how do we decrease the number

Â of vertices? We do it by contracting or fusing two vertices together. How do we

Â pick which pair of edges, which pair of vertices to fuse? Well we pick one of the

Â remaining edges, uniformly at random. So there's [inaudible] many edges there are

Â remaining. We pick each one, equally likely. What, if the endpoints of that

Â edge are (u, v), then we collapse u and v together into a single super node. So

Â that's what we mean by contracting two nodes into a single vertex and then if

Â that causes any self-loops, and as we saw the examples, we will in general have

Â self-loops, then we delete them before proceeding. After the N-2

Â generations, only two vertices remain. You'll recall that two vertices

Â naturally correspond to a cut. The first group of the cut A corresponds to the

Â vertices that were fused into one of the super vertices remaining at the end. The

Â other super vertex corresponds to the set B the other original vertices of the

Â graph. >>So the goal of this lec, of this video is to give an answer to the

Â following question: What is the probability of success? Where by success,

Â we mean outputs a particular min cut (A, B). So let's set up the basic notation. We're

Â gonna fix any with input graph, undirected graph. As usual we use N to denote the

Â number of vertices and M to denote the number of edges. We're also going to fix a

Â minimum cuts (A, B). If a graph has only one minimum cut, then it's clear what

Â I'm talking about here. If a graph has multiple minimum cuts, I'm actually

Â selecting just one of them. Because I'm gonna focus on a distinguished minimum cut (A, B),

Â and we're only gonna define the algorithm as successful if it outputs this

Â particular minimum cut (A, B). If it outputs some other minimum cut, we don't count it.

Â We don't count it as successful. Okay. So, we really want this distinguished minimum

Â cut (A, B). In addition to N and M, we're gonna have a parameter K, which is

Â the size of the minimum cut. That is, it's the number of crossing edges of a minimum

Â cut. For example, that cross (A, B). The K edges that cross the minimum cut (A, B); we're

Â going to call capital F. So the picture you wanna have in mind is, there is, out

Â there in the world, this minimum cut (A, B). There's lots of edges with both end points

Â in A, lots of edges possibly with both endpoints in B. But, there's not a whole

Â lot of edges with one endpoint in A and one in endpoint in B. So the edges F,

Â would be precisely, these three crossing edges here. >>So our next step is to get a

Â very clear understanding of exactly when the execution of the contraction algorithm

Â can go wrong, and exactly when it's gonna work, exactly when we're going to succeed.

Â So let me redraw the same picture from the previous slide. So given they were hoping

Â that the contraction algorithm outputs this cut (A, B) at the end of the day, what

Â could possibly go wrong? Well, to see what could go wrong, suppose,, at some iteration,

Â one of the edges in capital F, remember F are the edges crossing the min cut (A, B), so

Â it's these three magenta edges in the picture. Suppose at some iteration one of

Â the edges of F gets chosen for contraction. Well because this edge of F

Â has one endpoint in A and one endpoint in B, when it gets chosen for contraction, it

Â causes this node from A and this node from B to be fused together. What does that

Â mean? That means, in the cut that the contraction algorithm finally outputs, this

Â node from A and this node from B will be on the same side of the output cut. Okay,

Â so the cut output by the contraction algorithm will have on one side both the

Â node from A and the node from B. Therefore, the output of the contraction

Â algorithm if this happens will be a different cut than (A, B), okay? It will not

Â output (A, B) if some edge of F is contracted. >>And if you think about it, the converse is

Â also true. So let's assume now, that in each of the N-2 iterations, the contraction

Â algorithm never contracts an edge from capital F. Remember capital F are exactly

Â the edges with one endpoint in A and one endpoint in B. So if it never contracts

Â any edge of F, then it only contracts edges where both endpoints lie in capital

Â A or both endpoints lie in capital B. Well, if this is the case then, vertices

Â from A always stick together in the fused nodes, and vertices from B always stick

Â together in the fused nodes. There is never A iteration where a node from A and

Â a node from B are fused together. What does that mean? That means that when the

Â algorithm outputs <i>cuts</i> all of the nodes in A have been grouped together, all

Â of the nodes in B have been grouped together, in each of the two super nodes,

Â which means that the output of the algorithm is indeed the desired

Â cut (A, B). Summarizing, the contraction algorithm will do what we

Â want. It will succeed and output the cut (A, B), if and only if it never chooses an

Â edge from capital F for contraction. Therefore, the probability of success,

Â that is, the probability that the output is the distinguished min cut (A, B), is

Â exactly the probability that never contracts an edge of capital F. >>So, this

Â is what we're gonna be interested in here. This really is the object of our mathematical

Â analysis, the probability that in all of the N-2 iterations we never

Â contact an edge of capital F. So, to think about that, let's think about each

Â iteration in isolation, and actually define some events describing that. So for an

Â iteration I, let Si denote the event, that we screw up an iteration I. With this

Â notation, we can succinctly say what our goal is, so, to compute the probability of

Â success. What we wanna do is we wanna compute the probability that <i>none</i> of the

Â events, S1, S2 up to N minus, S(N-2) never occur. So, I'm gonna use this NOT(Â¬)

Â symbol to say that S1 does not happen. So we don't screw up in iteration 1, we

Â don't screw up in iteration 2, we don't screw up in iteration 3, and so on. All

Â the way up to, we don't screw up, we don't contract anything from capital F, in the

Â final iteration, either. So summarizing, analyzing the success probability of the

Â contraction algorithm boils down to analyzing the probability of this event,

Â the intersection of the NOT Sis with I ranging from iteration 1 to

Â iteration N-2. >>So we're gonna take this in baby steps, and the next quiz will

Â lead you through the first one, which is, let's have a more modest goal. Let's just

Â think about iteration 1. Let's try and understand, what's the chance we

Â screw up, what's the chance we don't screw up, just in the first iteration? So the

Â answer to this quiz is the second option. The probability is K/M, where K is the

Â number edges crossing the cut (A, B), and M is the total number of edges. And

Â that's just because the probability of S1, the probability we screw up, is just the

Â number of crossing edges. That's the number of outcomes which are bad, which

Â cause which trigger S1, divided by the number of edges. That's the total number

Â of things that could happen. And since all edges are equally likely, it just boils

Â down to this. And by the definition of our notation, this is exactly K/M. So this

Â gives us an exact calculation of the failure probability in the first

Â iteration, as a function of the number of crossing edges, and the number of overall

Â edges. Now, it turns out it's gonna be more useful for us to have a bound not

Â quite as exact, an inequality. That's in terms of the number of vertices N, rather

Â than the number of edges, M. The reason for that is, it's a little hard to

Â understand how the number of edges is changing in each iteration. It's certainly

Â going down by 1 in each iteration, because we contract that in each iteration, but it

Â might go down by more than 1 when we delete self-loops. By contrast the number

Â of vertices is this very steady obvious process. One less vertex with each

Â successive iteration. >>So, let's rewrite this bound in terms of the number of

Â vertices N. To do that in a useful way, we make the following key observation. I

Â claim that, in the original graph G, we are given as input, every vertex has at

Â least K edges incident on it, that is in graph theoretic terminology, every edge

Â has degree at least K. Where, recall, K is the number of edges crossing our favorite

Â min cut (A, B). So why is that true? Why must every vertex have a decent number of

Â neighbors, a decent number of edges incident to it. Well, it's because, if you

Â think about it, each vertex defines a cut by itself. Remember, a cut is just any

Â grouping into other vertices into two groups, that are not empty, that don't

Â overlap. So one cut is to take a single vertex, and make that the first group, A,

Â and take the other N-1 vertices, and make that the second group, capital B.

Â So how many edges cross this cut? Well, it's exactly the edges that are incident

Â on the first note, on the note on the left side. So every single cut, fall

Â exponentially many cuts, have at least K crossing edges, then certainly the

Â N cuts defined by single vertices have at least K crossing edges, so therefore, the

Â degree of a vertex is at least K. So our assumption that every single cut in

Â the graph has at least K crossing edges because it's a lower bound on the number

Â edges incident on each possible vertex. >>So, why is that usual? Well let's recall

Â the following general facts about any graph; which is that if you sum up over the degrees of

Â the nodes, so if you go node by node, look at how many edges are insident on that

Â node, that's the degree of V, and then sum them up over all vertices. What will you

Â get? You'll get exactly twice the number of edges, okay? So this is true for any

Â undirected graph, that the sum of the degrees of the vertices is exactly double-

Â the number of edges. To see this, you might think about taking a graph, starting

Â with the empty set of edges, and then adding the edges of the graph one at a

Â time. Each time you add a new edge to a graph, obviously the number of edges goes

Â up by 1, and the degree of each of the endpoints of that edge also go up by 1,

Â and there are, of course, two endpoints. So every time you add an edge, the number

Â of edges goes up by 1, the sum of those degrees goes up by 2. Therefore, when

Â you've added all the edges, the sum of the degrees is double the number of edges that

Â you've added. That's why this is true. Now, in this graph, at that we have a hand here,

Â every degree is at least K, and there's N nodes. So this left hand side, of course,

Â is at least KN for us. So therefore if we just divide through by 2, and flip the

Â inequality around, we have the number of edges has to be at least the size of the

Â crossing cut, so the degrees of every vertex times the number of vertices

Â divided by 2. So this is just the primitive inequality rearranging. Putting

Â this together with your answer on the quiz, since the probability of S1 is exactly K/M,

Â and M is at least KN/2, if we substitute, we get that the probability

Â of S1 is at worst 2/N, 2 over the number of vertices, and the K cancels out.

Â So that's, sort of, our first milestone. We've figured out the chance

Â that we screw up in the first iteration, that we pick some edge from the

Â crosses the cut (A, B). And things look good. This is a, this is a small number, right?

Â So, in general, the number of vertices might be quite big. And this says that

Â the probability we screw up is only 2 over the number of vertices. So, so far,

Â so good. Of course, this was only the first iteration. Who knows what happens

Â later? >>So now that we understand the chances of screwing up in the first

Â iteration, let's take our next baby step, and understand the probability that we

Â don't screw up in either of the first two iterations. That is, we're gonna be

Â interested. And the following probability. The probability that we don't screw up in

Â the first iteration nor in the second iteration. Now, as you go back to the

Â definition of a conditional probability, to realize that we can rewrite an

Â intersection like this in terms of conditional probabilities. Namely, as the

Â probability that we don't screw up in the second iteration, given that we didn't do

Â it already, times the probability that we didn't screw up in the first iteration.

Â Okay? So the probability that we miss all of these K vulnerable edges and in the second

Â iteration given that we didn't contract any of them in the first iteration. Now

Â notice this, we already have a good understanding on the previous slide. We are

Â given a nice lower bound of this. We say there's a good chance that we don't screw

Â up, probably at least 1-2/N. And in some sense we also have a

Â very good understanding of this probability. We know this is 1 minus the

Â chance that we do screw up. And what's the chance that we do screw up? Well, these K

Â edges are still hanging out in the graph. Remember we didn't contract any, in the

Â first iteration that's what's given. So there are K ways to screw up, and we choose

Â an edge to contract uniformly at random, so the total number of choices is the number

Â of remaining edges. >>Now the problem is, what's nice is we have an exact

Â understanding of this probability. This is an equality. The problem is we don't have

Â a good understanding of this denominator. How many remaining edges are there? We

Â have an upper bound on this. We know this is at most N-1, assuming we got

Â rid of one edge in the previous iteration, but actually what, if you think about it,

Â what we need of this quantity is a lower bound and that's a little unclear because

Â in addition to contracting the one edge and getting that out of the graph, we

Â might have created a bunch of self loops and deleted all events. So it's hard to

Â understand exactly what this quantity is. So instead we're gonna rewrite this bound

Â in terms of the numbers of the remaining vertices, and of course we know it's

Â exactly N-1 vertices remaining. We took two of the last iterations and

Â contracted down to 1. So how do we relate the number of edges to the number of

Â vertices? Well we do it just in exactly the same way as in the first iteration. We'll

Â make some more general observation. In the first iteration, we observed that every

Â node in the original graph induces a cut. Okay, with that node was on one side, the

Â other N-1 edges were on the other side. But the fact that's a more general

Â statement, even after we've done a bunch of contractions, any single node in the

Â contracted graph, even if it represents a union of a bunch of nodes in the original

Â graph, we can still think of that as a cut in the original graph. Right? So if

Â there's some super node in the contracted graph, which is the result of fusing

Â twelve different things together, that corresponds to a cut where those twelve

Â nodes in the original graph are on the one side A, and the other N-12

Â vertices are on the other side of the cut, B. So, even after contractions, as long as

Â we have at least two nodes in our contracted graph, you can take any node

Â and think of it as half of a cut, one side of a cut in the original graph.

Â 16:31

>>Now remember, K is the number of edges crossing our minimum cut (A, B), so

Â any cuts in the original graph G has to have K crossing edges. So, since every

Â node in the contracted graph naturally maps over to a cut in the original graph

Â with at least K edges crossing it, that means, in the contracted graph, all of the

Â degrees have to be at least K. If you ever had a node in the contracted graph that

Â had only say K-1 incident edges, well then you'd have a cut in the original

Â graph with only K-1 edges contradiction. So just like in the first

Â iteration, now that we have a lower bound on the degree of every single vertex, we

Â can derive a lower bound on the number of edges that remain in the graph. The number

Â of remaining edges is at least 1/2, that's because when you sum over the

Â degrees of the vertices, you double count the edges, times the degree of each

Â vertex, that we just argued that that's at least K in this contracted graph, times

Â the number of vertices, that we know there's exactly N-1 vertices left in

Â the graph at this point. So now what we do is to plug this inequality, to plug this

Â lower bound of the number of remaining edges, on, as we'll substitute that for

Â this denominator, so in lower bounding the denominator, we upper bound this fraction,

Â which gives us a lower bound on 1 minus that fraction, and that's what we want. So

Â what we find is that the probability that we don't screw up in the second iteration

Â given that we didn't screw up in the first iteration. Where again, by screwing up

Â means picking one of these K edges crossing (A, B) to contract is at least

Â 1-(2/(N-1)). So, that's pretty cool. We took the first iteration, we analyzed it,

Â we showed the probability that we screw up is pretty low, we succeed with probability of

Â at least 1-(2/N). In the second iteration, our success probability

Â has dropped a little bit, but it's still looking pretty good for reasonable values

Â of N, 1-(2/(N-1)). >>Now, as I hope you've picked up, we can

Â generalize this pattern to any number of iterations, so that the degree of every node of

Â the contracted graph remains at least K. The only thing which is changing is the

Â number of vertices is dropping by 1. So, extending this pattern to its logical

Â conclusion, we get the following lower bound on the probability that the

Â contraction algorithm succeeds. The probability that the contraction algorithm

Â outputs the cut (A, B), you recall we argued, is exactly the same thing as the

Â probability that it doesn't contract anything, any of the K crossing edges, any

Â of the set F in the first iteration, nor in the second iteration, nor in the third

Â iteration, and then so on, all the way up to the final (N-2)th iteration. Using the

Â definition of conditional probability, this is just the probability that we

Â don't screw up in the first iteration, times the probability that we don't screw

Â up in the second iteration given that we didn't screw up in the first iteration,

Â and so on. In the previous two slides, we showed that, we don't screw up in the

Â first iteration, with probability of at least 1-(2/N). In the second

Â iteration, with probability at least 1-(2/(N-1)). And of course,

Â you can guess what that pattern looks like. And that results in the following

Â product. Now, because we stop when we get down to two nodes remaining, the last

Â iteration in which we actually make a contraction, there are three nodes. And

Â then, the second to last iteration in which we make a contraction, there are

Â four nodes. So that's where these last two terms come from. Rewriting, this is just

Â (N-2)/N times (N-3)/(N-1), and so on. And now something

Â very cool happens, which is massive cancellation, and to this day, this is

Â always just incredibly satisfying to be able to cross out so many terms. So you

Â get N-2, cross it out here and now here, there's going to be a pair of N-3s that

Â get crossed out, and N-4s, and so on. On the other side, there's going to be a pair of

Â 4s that get crossed out, and a pair of 3s that get crossed out. And we'll be left

Â with only the two largest terms on the denominator, and the two smallest terms in

Â the numerator, which is exactly 2/N(N-1). And to keep things

Â simple among friends, let's just be sloppy and lower bound this by 1/(N^2).

Â So that's it. That's our analysis of the success probability of Karger's

Â contraction algorithm. Pretty cool, pretty slick, huh? >>Okay, I'll concede, probably

Â you're thinking "Hey, wait a minute. We're analyzing the probability that the

Â algorithm succeeds, and we're thinking of the number of vertices N as being big, so

Â we'll see here as a success probability of only 1/(N^2), and that kinda sucks."

Â So that's a good point. Let me address that problem. This is a low

Â success probability. So that's disappointing. So why are we talking about

Â this algorithm, or this analysis? Well, here's something I want to point out.

Â Maybe this is not so good, 1/(N^2) you're going to succeed, but

Â this is still actually shockingly high for an brute-forth algorithm which honestly

Â seems to be doing almost nothing. This is a nontrivial lower bound and non trivial

Â success probability, because don't forget, there's an exponential number of cuts in

Â the graph. So if you try to just pick a random cut i.e you put every vertex 50:50

Â left or right, you'll be doing way worse than this. You'll have a success

Â probability of like 1/(2^N). So this is way, way better than that. And

Â the fact that its inverse polynomial means is that using repeated trials, we can

Â turn a success probability that's incredibly small into a failure

Â probability that's incredibly small. So lemme show you how to do that next. >>So,

Â we're gonna boost the success probability of the contraction algorithm in, if you

Â think about it a totally obvious way. We're gonna run it a bunch of times, each

Â one independently using a fresh batch of random coins. And we're just going to

Â remember the smallest cut that we ever see, and that's what we're gonna return, the

Â best of a bunch of repeated trials. Now the question is, how many trials are we

Â gonna need before we're pretty confident that we actually find the meant cut that we're

Â looking for? To answer this question vigorously, let's define some suitable

Â events. So by Ti, I mean the event at the Ith trail succeeds, that is the Ith

Â time we run the contraction algorithm which does output that desired meant

Â cut (A, B). For those of you that watched the part II of the probability review, I

Â said a rule of thumb for dealing with independents is that, you should maybe, as

Â a working hypothesis, assume granted variables are dependent, unless they're

Â explicitly constructed to be independent. So here's a case where we're just gonna

Â define the random variables to be independent. We're just gonna say that we

Â run [inaudible] the contraction algorithm over and over again with fresh randomness

Â so that they're gonna be independent trials. Now, we know that the, probability that a

Â single trial fails can be pretty big, could be as big as 1-1/(N^2). But,

Â here, now, with repeated trials, we're only in trouble if every single trial

Â fails. If even one succeeds, then we find the meant cut. So a different way of saying

Â that is we're interested in the intersection of T1 and T2 and so on,

Â that's the event that every single trial fails. And now we use the fact that the

Â trials are independent. So, the probability that all of these things

Â happen is just the product of the relevant probabilities. So, the product from I=1

Â to capital N of the probability of not TI. Recall that we argued that the

Â success probability of a single trial was bounded below by 1/(N^2). So

Â the failure probability is bounded above by 1-1/(N^2). So since

Â that's true for each of the capital N terms, you get an overall failure

Â probability for all capital N trials of 1 minus 1/(n^2) raised

Â to the capital of N. Alright, so that's a little calculation. Don't lose sight of

Â why we're doing the calculation. We want to answer this question, how many trials

Â do we need? How big does capital N need to be before are confident that we get the

Â answer that we want? >>Okay, so to answer that question I need a quick calculus fact,

Â which is both very simple and very useful. So for all real numbers X, we have the

Â following inequality, 1+x is bound above by e^x. So I'll just

Â give you a quick proof via picture. So first think about the line 1+x. What

Â does that cross through? Well, that crosses through the points when x is -1,

Â y is 0, and when x is 0, y is 1. And it's a line, so this looks like this

Â blue line here. What does e^x look like? Well, if you substitute x = 0,

Â it's gonna be 1. So in fact two curves kiss each other at x = 0. But

Â exponentials grow really quickly, so as you jack up x to higher positive numbers, it

Â becomes very, very steep. And for x negative numbers it stays non-negative the

Â whole way. So this sort of flattens out for the negative numbers. So, pictorially,

Â and I encourage you to, you know, type this into your own favorite graphing

Â program. You see the e^x bounds above everywhere, the line, the 1+x.

Â For those of you who want something more rigorous, there's a bunch

Â of ways to do it. For example, you can look at the [inaudible] expansion of

Â e^x at the point 0. >>What's the point? The point is this allows us to do

Â some very simple calculations on our previous upper bound on the failure

Â probability by working with exponentials instead of working with these ugly one

Â minus whatevers raised to the whatever term. So, let's combine our upper bound

Â from the previous slide with the upper bound provided by the calculus fact. And

Â to be concrete, let's substitute some particular number of capital N. So, let's

Â use little n^2 trials, where little n is the number of vertices of the graph.

Â In which case, the probability that every single trial fails to recover the cut (A, B)

Â is bounded above by e to the -1/(N^2). That's using the calculus

Â fact applied with X = -1/(N^2). And then we inherit the

Â capital N and the exponent which we just substantiated to little n^2. So of

Â course the N^2 are gonna cancel, this is gonna give us E^(-1),

Â also known as 1/E. So if we're willing to do little n^2 trials,

Â then our failure probability has gone from something very close to 1, to

Â something which is more like, say, 30 some more percent. Now, once you get to a

Â constant success probability, it's very easy to boost it further by just doing a

Â few more trials. So if we just add a natural log factor, so instead of a little

Â n^2 trials, we do little n^2 times the natural log of the little n.

Â