0:00

This video is a segue between bad news and good news. The bad news, which we have now discussed, is NP-completeness: the fact that there are computationally intractable problems out there in the world. In fact, they are fairly ubiquitous, and you're likely to encounter them in your own projects. The good news is that NP-completeness is hardly a death sentence. Indeed, our algorithmic toolbox is now rich enough to provide many different strategies for coping with NP-complete problems.

So suppose you have identified a computational problem on which the success of your new startup company rests. Maybe you've spent the last several weeks throwing the kitchen sink at it: all the algorithm design paradigms you know, all the data structures, all the primitives. Nothing works. Finally, you decide to try to prove the problem is NP-complete, and you succeed. Now you have an explanation for why your weeks of effort have come to naught, but that doesn't change the fact that this is the problem that governs the success of your project.

What should you do? Well, the good news is, NP-completeness is certainly not a death sentence. There are people solving, or at least approximately solving, NP-complete problems all the time. However, knowing that your problem is NP-complete does tell you where to set your expectations. You should not expect some general-purpose, super-fast algorithm, like we have for other computational problems, like, say, sorting or single-source shortest paths. Unless you are dealing with unusually small or well-structured inputs, you're going to have to work pretty hard to solve this problem, and also, possibly, make some compromises. The rest of this course is devoted to strategies for solving or approximately solving NP-complete problems.

In the rest of the video, I'll give you an orientation for what those strategies are, and what you can expect to come. So as usual, I'm going to focus here on general-purpose strategies that cut across multiple application domains. As usual, these general principles should just be a starting point. You should take them and run with them, augmenting them with whatever domain expertise you have in the specific problem that you need to solve.

The first strategy is to focus on computationally tractable special cases of an NP-complete problem. Relatedly, you want to think about what's special about your domain, or about the data sets that you're working with, and try to understand whether there's special structure which can be exploited in your algorithm. Let me point out, we've already done this in a couple of cases in this course. The first example we saw concerns weighted independent set. We studied this problem on path graphs, but the computational problem makes perfect sense in general graphs. The general problem is: I give you as input an undirected graph, every vertex has a weight, and I want the maximum-weight subset of vertices that is an independent set. And remember, in an independent set you are forbidden to take any two vertices that are neighbors. So in an independent set, none of the pairs of vertices that you've picked are joined by an edge. In general graphs, the weighted independent set problem is NP-complete, so we certainly don't expect it to have a polynomial-time algorithm. But in the special case where the graph is a path, as we saw, there's a linear-time dynamic programming algorithm that exactly solves the weighted independent set problem.
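That linear-time dynamic program can be sketched as follows. This is a minimal version, assuming the vertices are given in order along the path; the function name `max_weight_is_path` is just for illustration.

```python
# Dynamic programming for maximum-weight independent set on a path graph.
# weights[i] is the weight of the i-th vertex along the path.
def max_weight_is_path(weights):
    n = len(weights)
    # A[i] = max total weight of an independent set among the first i vertices
    A = [0] * (n + 1)
    if n >= 1:
        A[1] = weights[0]
    for i in range(2, n + 1):
        # either skip vertex i, or take it and skip vertex i-1
        A[i] = max(A[i - 1], A[i - 2] + weights[i - 1])
    return A[n]
```

For example, on a path with weights 1, 4, 5, 4, the best independent set takes the second and fourth vertices, for a total of 8.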

Â 3:19

So path graphs form a special case of the weighted independent set problem that's computationally tractable, solvable in polynomial, even linear, time. In fact, the frontier of tractability can be pushed well beyond path graphs. On the homework, I asked you to think through the case of graphs that are trees, and notice that you can still do dynamic programming efficiently to compute maximum-weight independent sets in trees. You can even get computationally efficient algorithms for a broader class of graphs, known as bounded-treewidth graphs. The definition of that is a little outside the scope of this course, but you can go even beyond trees. The second example follows from our dynamic programming algorithm for the knapsack problem. We discussed that running time, and we explained why it's exponential. The running time of our dynamic programming knapsack algorithm is n, the number of items, times capital W, the knapsack capacity. And because it only takes log W bits to specify the capacity W, we don't call that a polynomial-time algorithm. But imagine you only have to solve knapsack instances where the capacity is not too big, maybe even, say, where the capacity W is O(n); you definitely will see knapsack instances in practice which possess that property. Well, then our dynamic programming algorithm runs in time O(n^2), and that's a bona fide polynomial-time algorithm for this special case of a small knapsack capacity.
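As a reminder of why capacity drives the running time, here is a compact sketch of that dynamic program. This is a space-saving one-dimensional variant; `knapsack` is an illustrative name, and item sizes are assumed to be positive integers.

```python
# Dynamic programming for knapsack: values[i] and sizes[i] are the i-th
# item's value and (integer) size; W is the knapsack capacity.
# Running time is O(n * W): polynomial only when W is small, e.g. W = O(n).
def knapsack(values, sizes, W):
    # A[x] = best total value achievable with capacity x
    A = [0] * (W + 1)
    for value, size in zip(values, sizes):
        for x in range(W, size - 1, -1):  # backwards: each item used at most once
            A[x] = max(A[x], A[x - size] + value)
    return A[W]
```

With items of (value, size) pairs (3, 4), (2, 3), (4, 2), (4, 3) and capacity 6, the optimum takes the last two items for total value 8.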

So next, let me mention a couple of examples we're going to see in the forthcoming videos. The first one concerns the 2-SAT problem. 2-SAT is a type of constraint satisfaction problem. Remember, in a constraint satisfaction problem, you have a bunch of variables, and each one gets assigned a value. The simplest case is the Boolean case, where each variable can be 0 or 1, false or true, and then you have a bunch of clauses, which specify the permitted joint values of a collection of variables.

The 2 in 2-SAT refers to the fact that each constraint involves the joint values of only a pair of variables. So a canonical constraint in a 2-SAT instance is going to, for two variables, specify three joint assignments that are allowed and one that's forbidden. For example, maybe it will say, for the variables x3 and x7, it's okay to set them both to true, it's okay to set them both to false, it's okay to set x3 to true and x7 to false, but it's forbidden to set x3 to false and x7 to true. That's a canonical constraint in a 2-SAT instance. 3-SAT is the same thing, except the constraints involve the joint values of a triple of variables, and each one forbids one of the eight possibilities. Now, the 3-SAT problem is a canonical NP-complete problem; it was singled out by Cook and Levin as being sufficiently expressive to encode all problems in NP. But if each constraint has size only two, then, as we'll see, the problem becomes polynomial-time solvable. There's a couple of ways of proving that.
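As a preview, here is a sketch of a randomized local-search 2-SAT solver in the spirit of Papadimitriou's algorithm. The clause encoding (signed integers for literals) and the restart and flip budgets are illustrative assumptions, not the exact parameters from the analysis.

```python
import random

# Randomized local search for 2-SAT, in the spirit of Papadimitriou's
# algorithm. A clause is a pair of literals: literal +i means "variable i
# is true", -i means "variable i is false" (variables numbered from 1).
def two_sat_local_search(n_vars, clauses, seed=0):
    rng = random.Random(seed)

    def satisfied(assignment, clause):
        return any((lit > 0) == assignment[abs(lit)] for lit in clause)

    for _ in range(20):                            # restart budget (illustrative)
        assignment = [rng.random() < 0.5 for _ in range(n_vars + 1)]
        for _ in range(2 * n_vars * n_vars + 1):   # ~2n^2 flips per restart
            unsat = [c for c in clauses if not satisfied(assignment, c)]
            if not unsat:
                return assignment[1:]              # satisfying assignment found
            lit = rng.choice(rng.choice(unsat))    # pick a variable from some
            assignment[abs(lit)] = not assignment[abs(lit)]  # unsatisfied clause
    return None  # no assignment found; likely unsatisfiable
```

Note the asymmetry: a returned assignment is certifiably satisfying, while `None` is only strong probabilistic evidence of unsatisfiability.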

We're going to talk about a local search algorithm that checks whether or not there is indeed an assignment to the variables that simultaneously satisfies all of the given constraints. So the final example we'll discuss in

more detail later, but just very briefly: we're going to discuss the vertex cover problem. This is a graph problem, and a vertex cover is just the complement of an independent set. So while an independent set cannot take two vertices from the same edge, in the vertex cover problem you have to take at least one vertex from every single edge. And then what you want is the vertex cover that minimizes the sum of the vertex weights. Yet again, this is an NP-complete problem in general, but we're going to focus on the special case where the optimal solution is small. That is, we're going to focus on graphs where there's a small number of vertices such that every single edge has at least one endpoint in that small set, and we'll see that for that special case, using a smart kind of exhaustive search, we'll actually be able to solve the problem in polynomial time. So let me reiterate that these tractable special cases are meant primarily to be building blocks upon which you can draw in a possibly more sophisticated approach to your NP-complete problem.

So just to make this a little more concrete, let me dream up one scenario to let you know what I am thinking about. Imagine, for example, that you have a project where, unfortunately, it's not really 2-SAT that you're confronting; it's actually a 3-SAT instance. So you're feeling kind of bummed. 3-SAT is NP-complete, and maybe you have 1,000 variables, and certainly you can't do brute-force search over the 2 to the 1,000 possible ways of assigning values to your 1,000 variables. But maybe the good news is, because you have domain expertise, because you understand this problem instance, you know that, yeah, there's 1,000 variables, but there's really 20 that are crucial. You have a feeling that all of the action, basically, boils down to how these 20 core variables get assigned. Well, now maybe you can mix together some brute-force search with some of these tractable special cases.

For example, you can enumerate over all 2 to the 20 ways of assigning values to this core set of 20 variables. 2 to the 20 is roughly a million; that's not so bad. And now what you're going to do is, for each of these million scenarios, check whether there's a way to extend that tentative assignment of values to the 20 variables to the other 980 variables, so that all of the constraints get satisfied. The original problem is solvable if and only if there exists a way of assigning values to these 20 variables that successfully extends to the other 980. Now, because these are the crucial variables, and it's where all the action is, maybe as soon as you assign all of them 0's and 1's, the residual SAT instance is tractable. For example, maybe it just becomes a simple 2-SAT instance, and then you can solve it in polynomial time. So this gives you a hybrid approach: brute-force search at the top level, tractable special cases for each guess of an assignment of the 20 variables, and you're off to the races.
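One way to sketch that hybrid scheme in code. Both `hybrid_solve` and the `solve_residual` callback are hypothetical names; the callback stands in for whatever tractable solver (say, a polynomial-time 2-SAT routine) handles the residual instance.

```python
from itertools import product

# Hybrid scheme sketch: brute-force search over a small "core" set of
# variables, handing each guess to a tractable solver for the residual
# instance. `solve_residual` takes a dict fixing the core variables and
# returns a full satisfying assignment, or None if the guess doesn't extend.
def hybrid_solve(core_vars, solve_residual):
    for bits in product([False, True], repeat=len(core_vars)):
        guess = dict(zip(core_vars, bits))
        solution = solve_residual(guess)
        if solution is not None:
            return solution   # this core guess extends to a full solution
    return None               # no core assignment extends: unsatisfiable
```

With 20 core variables the outer loop runs about a million times, each iteration doing only polynomial work if the residual instance really is tractable.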

And I hope it's clear, I mean this as just one possible way that you might combine the various building blocks we're developing into a more elaborate approach to tackling an NP-complete problem. And that's generally what they take; they take a fairly elaborate approach, because, after all, they are NP-complete; you've gotta respect that. So with that digression complete, let me mention the other two strategies we're going to be exploring in the lectures to come. The second strategy, which is certainly very common in practice, is to resort to heuristics.

That is, to algorithms which are not guaranteed to be correct. We haven't really seen examples of heuristics in the course thus far. For those of you who are alumni of Part 1, perhaps we could classify Karger's randomized minimum cut algorithm as a heuristic, because it did have a small probability of failing to find the minimum cut. But rather, I'm going to focus on some examples in the upcoming lectures. I'm going to use the knapsack problem as a case study, and what we'll see is that our toolbox, which contains various algorithm design paradigms, is useful not just for designing correct algorithms, but also for designing heuristics. So in particular, we'll get a pretty good algorithm for the knapsack problem using the greedy algorithm design paradigm, and we'll get an excellent heuristic for knapsack using the dynamic programming algorithm design paradigm. The final strategy is for situations where you are unwilling to relax correctness.

You're unwilling to consider heuristics. Now, of course, for an NP-complete problem, if you're always going to be correct, you don't expect to run in polynomial time, but there are still opportunities to have algorithms that, while exponential-time in the worst case, are smarter than naive brute-force search. We have in fact already seen one example that can be interpreted as an implementation of this strategy: the knapsack problem. In the knapsack problem, naive brute-force search would just run over all possible subsets of the items. It would check whether a subset of items fits in the knapsack; if it does, it remembers the value, and then it just returns the feasible solution with maximum value.

That has time proportional to 2 to the n, where n is the number of items. Our dynamic programming algorithm has running time n times W. Now, of course, this is no better than 2 to the n if the knapsack capacity is huge, if W is itself 2 to the n. But, as we argued, if W is smaller, this algorithm is going to be faster. And also, as you learned on the third programming assignment, sometimes, even though W is big, dynamic programming is going to beat the crap out of brute-force search.

So I'll show you a couple of other examples. We'll talk about the traveling salesman problem, where naive brute-force search would take roughly n factorial time, where n is the number of vertices. We'll give an alternative dynamic-programming-based solution which runs in time only 2 to the n, which is much better than n factorial.
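That dynamic program (due to Bellman, and to Held and Karp) can be sketched as follows; it runs in O(n^2 * 2^n) time, which, up to the polynomial factor, is the "2 to the n" bound mentioned above.

```python
from itertools import combinations

# Bellman-Held-Karp dynamic program for TSP. dist[i][j] is the cost of
# going from vertex i to vertex j; the tour starts and ends at vertex 0.
# Time O(n^2 * 2^n), versus roughly n! for naive enumeration of tours.
def tsp(dist):
    n = len(dist)
    # C[(S, j)]: min cost of a path starting at 0, visiting exactly the
    # vertices in bitmask S (which always contains 0), and ending at j.
    C = {(1, 0): 0}
    for size in range(2, n + 1):
        for subset in combinations(range(1, n), size - 1):
            S = 1                     # bitmask: vertex 0 plus the subset
            for v in subset:
                S |= 1 << v
            for j in subset:
                prev = S ^ (1 << j)   # the path before its final hop to j
                C[(S, j)] = min(C[(prev, k)] + dist[k][j]
                                for k in range(n) if (prev, k) in C)
    full = (1 << n) - 1
    return min(C[(full, j)] + dist[j][0] for j in range(1, n))
```

On a triangle with side costs 1, 1, 2, every tour costs 4, and the recurrence reproduces exactly that.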

The third example, which we'll cover in a forthcoming video, we already alluded to briefly on the last slide: it's the vertex cover problem. This is where you're given a graph, every vertex has a weight, and you want the minimum-weight subset of vertices that includes at least one endpoint of every edge. We're going to consider the version of the problem where you want to check whether or not it's possible to have a vertex cover that uses only k vertices, whether or not there exist k vertices that include at least one endpoint from each edge. Naive brute-force search would run in time n to the k, which gets absurd even when k is quite small, but we're going to show that there's a smarter algorithm, which is still exponential in k, that runs in time only 2 to the k times the size of the graph.
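Here is a sketch of that smarter search. The key observation is that any vertex cover must include at least one endpoint of each edge, so branching on the two endpoints of some uncovered edge gives a recursion tree of depth k; the function name and the edge-list representation are illustrative.

```python
# Smarter exhaustive search for vertex cover: is there a set of at most k
# vertices touching every edge? Branching on one edge's two endpoints gives
# O(2^k * m) time, versus roughly n^k for trying all size-k vertex subsets.
def has_vertex_cover(edges, k):
    if not edges:
        return True            # nothing left to cover
    if k == 0:
        return False           # edges remain but no budget left
    u, v = edges[0]            # any cover contains u or v; try both
    edges_minus_u = [e for e in edges if u not in e]
    edges_minus_v = [e for e in edges if v not in e]
    return (has_vertex_cover(edges_minus_u, k - 1)
            or has_vertex_cover(edges_minus_v, k - 1))
```

For instance, a triangle has a vertex cover of size 2 but not of size 1, while a star with three leaves is covered by its center alone.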
