0:00

I've said pretty much everything I want to say about sorting at this point, but I do want to cover one more related topic, namely the selection problem. This is the problem of computing order statistics of an array, with computing the median of an array being a special case. Analogous to our coverage of quicksort, the goal is going to be the design and analysis of a super-practical randomized algorithm that solves the problem. And this time, we'll even achieve an expected running time that is linear in the length of the input array, that is, O(n) for input arrays of length n, as opposed to the O(n log n) time that we had for the expected running time of quicksort. Like quicksort, the mathematical analysis is also going to be quite elegant.

In addition, these two required videos on this very practical algorithm will motivate two optional videos on very cool topics of a similar, more theoretical nature. The first optional video is going to be on how you solve the selection problem in deterministic linear time, that is, without using randomization. And the second optional video will be on a sorting lower bound, that is, why no comparison-based sort can have a better running time than merge sort, better than O(n log n).

1:04

So, a few words about what you should have fresh in your mind before you watch this video. I am definitely assuming that you've watched the quicksort videos, and not just watched them, but that you have that material pretty fresh in your mind. In particular, the quicksort video about the partition subroutine: this is where you take an input array, choose a pivot, and do repeated swaps, rearranging the array so that everything less than the pivot is to the left of it and everything bigger than the pivot is to the right of it. You should remember that subroutine, and you should also remember the previous discussion about pivot choices, the idea that the quality of a pivot depends on how balanced a split into two different subproblems it gives you. Those are both going to be important. For the analysis of this randomized linear-time selection algorithm, I need you to remember the concepts from probability review part one, in particular random variables, their expectation, and linearity of expectation.

That said, let's move on and formally define what the selection problem is. The input is the same as for the sorting problem: you're given an array of n distinct entries. But in addition, you're told what order statistic you're looking for. That's going to be a number i, an integer between 1 and n. The goal is to output just a single number, namely the ith order statistic, that is, the ith smallest entry in this input array.

3:03

If the array has even length, there are two possible medians, so let's just take the smaller of them; that's the (n/2)th order statistic. You might wonder why you'd ever want to compute the median of an array rather than the mean, that is, the average. It's easy to see that you can compute the average with a simple linear scan, and the median you can't. One motivation is that the median is a more robust version of the mean: if you have a data entry problem that corrupts one element of an input array, it can totally screw up the average value of the array, but it generally has very little impact on the median.
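As a quick illustration of this robustness (a hypothetical example, not from the lecture), here's what one corrupted entry does to the mean versus the median:

```python
def mean(xs):
    return sum(xs) / len(xs)

def median(xs):
    # For even length, take the smaller of the two middle elements,
    # i.e. the (n/2)th order statistic, matching the lecture's convention.
    s = sorted(xs)
    return s[(len(s) - 1) // 2]

data = [10, 20, 30, 40, 50]
corrupted = [10, 20, 30, 40, 50000]  # one data-entry error

print(mean(data), mean(corrupted))      # 30.0 vs 10020.0: mean is wrecked
print(median(data), median(corrupted))  # 30 vs 30: median is unchanged
```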

A final comment about the problem is that I'm going to assume that the array entries are distinct, that is, there are no repeated elements. But just like in our discussions of sorting, this is not a big assumption. I encourage you to think about how to adapt these algorithms to work even if the arrays do have duplicates. You can, indeed, still get the same very practical, very fast algorithms with duplicate elements.

Now, if you think about it, we already have a pretty darn good algorithm that solves the selection problem. Here's the algorithm. It's two simple steps, and it runs in O(n log n) time: first sort the input array, then return the ith element of the sorted array.
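The two steps can be sketched as follows (a minimal sketch; i is 1-indexed, as in the lecture):

```python
def select_by_sorting(a, i):
    """Reduction: solve selection by sorting. O(n log n) time.

    Returns the ith order statistic (the ith smallest element,
    1-indexed) of the array a.
    """
    return sorted(a)[i - 1]  # step 1: sort; step 2: return the ith element
```

For example, `select_by_sorting([3, 8, 2, 5, 1, 4, 7, 6], 3)` returns 3.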

4:23

So that's pretty cool: we've just done what a computer scientist would call a reduction, and that's a super useful and super fundamental concept. It's when you realize that you can solve one problem by reducing it to another problem that you already know how to solve. What we just showed is that the selection problem reduces easily to the sorting problem. We already know how to solve the sorting problem in O(n log n) time, so that gives an O(n log n) time solution to the selection problem.

But again, remember the mantra of any algorithm designer worth their salt: can we do better? We should avoid complacency. Just because we got n log n doesn't mean we should stop there; maybe we can do even better. Now, certainly we're going to have to look at all the elements of the input array in the worst case, so you shouldn't expect to do better than linear. But hey, why not linear time?

5:11

Actually, if you think about it, we probably should have asked that question back when we were studying the sorting problem. Why were we so content with the O(n log n) time bound for merge sort, and the O(n log n) time-on-average bound for quicksort? Well, it turns out we have a really good reason to be happy with our O(n log n) upper bounds for the sorting problem.

5:41

In other words, if we insist on solving the selection problem via a reduction to the sorting problem, then we're stuck with this O(n log n) time bound. Strictly speaking, that's for something called comparison sorts; see the optional video on sorting lower bounds for more details, but the upshot is that it applies to any general-purpose sorting algorithm. So if we want to do better than O(n log n) for selection, we have to do it using ingenuity beyond this reduction: we have to prove that selection is a strictly easier problem than sorting. That's the only way we're going to get an algorithm that beats n log n; that's the only way we can conceivably get a linear-time algorithm.

And that is exactly what is up next on our plates. We're going to show that selection is indeed fundamentally easier than sorting: we can have a linear-time algorithm for it, even though we can't get a linear-time algorithm for sorting. You can think of the algorithm we're going to discuss as a modification of quicksort, and in the same spirit as quicksort, it will be a randomized algorithm. The running time will be an expected running time that holds for any input array.

6:40

Now, for the sorting problem, we know that quicksort runs in O(n log n) time on average, where the average is over the coin flips done by the code. But we also know that if we wanted to, we could get a sorting algorithm running in O(n log n) time that doesn't use randomization; the merge sort algorithm is one such solution. Here, we're giving a linear-time solution for selection, for finding order statistics, that uses randomization. So it would be natural to wonder: is there an analog of merge sort? Is there an algorithm which does not use randomization and gets this exact same linear-time bound?

In fact, there is. The algorithm is a little more complicated, and therefore not quite as practical as this randomized algorithm, but it's still very cool. It's a really fun algorithm to learn and to teach, so I will have an optional video about linear-time selection without randomization. For those of you who aren't going to watch that video but want to know the key idea: the idea is to choose the pivot deterministically, in a very careful way, using a trick called the median of medians. That's all I'm going to say about it for now; you should watch the optional video if you want more details.

7:54

So what I want to do next is develop the idea that we can modify the quicksort paradigm in order to directly solve the selection problem. To get an idea of how that works, let me review the partition subroutine. As in quicksort, this subroutine will be our workhorse for the selection algorithm. What the partition subroutine does is take as input some jumbled-up array, and solve a problem which is much more modest than sorting. In partitioning, we first choose a pivot element somehow; we'll have to discuss what a good strategy for choosing a pivot element is. But suppose in this particular input array it chooses the first element, this 3, as the pivot. The responsibility of the partition subroutine is then to rearrange the elements of the array so that the following properties are satisfied. Anything less than the pivot is to the left of it, and those elements can be in jumbled order; but if you're less than the pivot, you'd better be to the left, like the 2 and the 1, which are less than 3. If you're bigger than the pivot, then again you can be in jumbled order amongst those elements, but all of them have to be to the right of the pivot, and that's true for the numbers 4 through 8: they are all to the right of the pivot 3, in jumbled order. So this in particular puts the pivot in its rightful position, where it will belong in the final sorted array. And, at least for quicksort, it enabled us to recursively sort two smaller subproblems.
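As a refresher, here is one way the partition subroutine can be implemented (a sketch in the style of the quicksort lectures, using the first element of the subarray as the pivot):

```python
def partition(a, l, r):
    """Partition a[l..r] in place around the pivot a[l].

    Afterward, everything less than the pivot is to its left and
    everything bigger is to its right (each side in arbitrary order).
    Returns the pivot's final position, its rightful sorted position.
    """
    p = a[l]
    i = l + 1  # boundary between the "< p" region and the "> p" region
    for j in range(l + 1, r + 1):
        if a[j] < p:
            a[j], a[i] = a[i], a[j]
            i += 1
    a[l], a[i - 1] = a[i - 1], a[l]  # place the pivot in its rightful spot
    return i - 1
```

On the lecture's example, partitioning [3, 8, 2, 5, 1, 4, 7, 6] around the pivot 3 leaves 3 in the third position, with 1 and 2 somewhere to its left and 4 through 8 somewhere to its right.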

So this is where I want you to think a little bit about how we should adapt this paradigm. Suppose I told you that the first step of our selection algorithm is going to be to choose a pivot and partition the array. Now the question is, how are we going to recurse? We need to understand how to find the ith order statistic of the original input array. It suffices to recurse on just one subproblem of smaller size and find a suitable order statistic in it. So how should we do that? Let me ask you that with a very concrete example about what pivot we choose and what order statistic we're looking for, and see what you think. The correct answer to this quiz is the second one.

9:53

So we can get away with recursing just once, and in this particular example, we're going to recurse on the right side of the array. And instead of looking for the fifth order statistic like we would originally, we're going to recursively search for the second order statistic. So why is that?

Well, first, why do we recurse on the right side of the array? By assumption, we have this array of ten elements; we choose the pivot and do partitioning, and remember, the pivot winds up in its rightful position; that's what partitioning does. Since the pivot winds up in the third position, we know it's the third smallest element in the array. Now, that's not what we were looking for; we were looking for the fifth smallest element in the array. That, of course, is bigger than the third smallest element of the array. So after partitioning, where is the fifth smallest element going to be? It's got to be to the right of the third smallest element, to the right of the pivot. So we know for sure that the fifth order statistic of the original array lies to the right of the pivot; that is guaranteed. So we know to recurse on the right-hand side.

Now, what are we looking for? We are no longer looking for the fifth order statistic, the fifth smallest element. Why? Well, we've thrown out both the pivot and everything smaller than it; remember, we're only recursing on the right-hand side. So we've thrown out the pivot, the third smallest element, and everything less than it, the minimum and the second minimum. Having deleted the three smallest elements, and originally looking for the fifth smallest, of what remains, of what we're recursing on, we're looking for the second smallest element.
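The index arithmetic in this example can be checked concretely (hypothetical values matching the quiz: a 10-element array whose random pivot turns out to be the 3rd smallest):

```python
i, n = 5, 10  # we want the 5th smallest of 10 elements
j = 3         # partitioning reveals the pivot is the 3rd smallest

# The pivot's order statistic j < i, so the answer lies right of the
# pivot: we discard the pivot and the j - 1 smaller elements (j total)...
new_length = n - j  # 7 elements remain on the right-hand side
new_i = i - j       # ...and we now seek the 2nd smallest of those

print(new_length, new_i)  # 7 2
```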

The selection algorithm in general is just the generalization of this idea to arbitrary arrays and arbitrary situations, whether the pivot comes back equal to, less than, or bigger than the element you are looking for. So let me be more precise. I'm going to call this algorithm RSelect, for randomized selection. According to the problem definition, it takes as input, as usual, an array A of some length n, and also the order statistic that we're looking for, which we're going to call i; of course, we assume that i is some integer between 1 and n, inclusive. For the base case, if the array has size one, then the only element we could be looking for is the first order statistic, and we just return the sole element of the array.

12:10

Now we have to choose a pivot element and partition the array around it. Just like in quicksort, we're going to be very lazy about choosing the pivot: we're going to choose it uniformly at random from the n possibilities and hope things work out. That will be the crux of the analysis, proving that random pivots are good enough sufficiently often. Having chosen the pivot, we now just invoke the standard partitioning subroutine. As usual, that's going to give us the partitioned array: you'll have the pivot element, everything less than the pivot to the left, and everything bigger to the right. As usual, I'll call everything less than the pivot the first part of the partitioned array, and everything bigger the second part.

12:45

Now we have a couple of cases, depending on whether the pivot is bigger or smaller than the element we are looking for, so I need a little notation to talk about that. Let's let j be the order statistic that the pivot p is. So if p winds up being the third smallest element, like in the quiz, then j is going to be equal to 3. Equivalently, we can think of j as the position of the pivot in the partitioned version of the array.

Now, there's one case which is very unlikely to occur, but we should include it just for completeness. If we're really lucky, then, in fact, a random pivot just happens to be the order statistic we were looking for; that's when i equals j. We're looking for the ith smallest element, and if by dumb luck the pivot winds up being the ith smallest element, we're done. We can just return it; we don't have to recurse.

13:32

Now, in general, of course, we don't randomly choose the element we are looking for; we choose something that could be bigger or could be smaller than it. In the quiz, we chose a pivot that was smaller than what we were looking for; actually, that's the harder case. So let's first start with the case where the pivot winds up being bigger than the element we're looking for. That means that j is bigger than i: we're looking for the ith smallest, and we randomly chose the jth smallest, for j bigger than i. This is the opposite case of the quiz; this is where we know that what we're looking for has to be to the left of the pivot. The pivot is the jth smallest, everything less than it is to the left, we're looking for the ith smallest, and i is less than j, so it's got to be on the left. That's where we recurse. Moreover, it's clear we're looking for exactly the same order statistic: if we're looking for the third smallest element, we're only throwing out stuff that is bigger than the pivot, which itself is bigger than the third smallest element, so we're still looking for the third smallest of what remains. And naturally, the new array size is j minus 1, because that's what's to the left of the pivot.

Finally, the last case is when the random element we choose is less than what we're looking for, and then we're just like the quiz. Namely, what we're looking for is bigger than the pivot, so it's got to be on the right-hand side; we know we've got to recurse on the right-hand side. Now, the right-hand side has n minus j elements: we throw out everything up to and including the pivot, so we throw out j things, and there are n minus j left. All of those j things we threw out are less than what we're looking for, so if we used to be looking for the ith smallest element, now we're looking for the (i minus j)th smallest element.
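Putting the pieces together, here is a sketch of RSelect in code (a sketch, not the lecture's pseudocode verbatim; it assumes a partition subroutine as in the quicksort lectures, operates on the subarray a[l..r], and takes i as a 1-indexed order statistic relative to that subarray):

```python
import random

def partition(a, l, r):
    """Partition a[l..r] around the pivot a[l]; return the pivot's final index."""
    p = a[l]
    i = l + 1
    for j in range(l + 1, r + 1):
        if a[j] < p:
            a[j], a[i] = a[i], a[j]
            i += 1
    a[l], a[i - 1] = a[i - 1], a[l]
    return i - 1

def rselect(a, i, l=0, r=None):
    """Return the ith smallest element (1-indexed) of a[l..r]."""
    if r is None:
        r = len(a) - 1
    if l == r:                        # base case: array of size one
        return a[l]
    k = random.randint(l, r)          # pivot chosen uniformly at random
    a[l], a[k] = a[k], a[l]           # move the pivot to the front
    pos = partition(a, l, r)
    j = pos - l + 1                   # the pivot is the jth smallest of a[l..r]
    if j == i:                        # lucky case: the pivot is the answer
        return a[pos]
    elif j > i:                       # answer lies left of the pivot; same i
        return rselect(a, i, l, pos - 1)
    else:                             # answer lies right; j elements discarded
        return rselect(a, i - j, pos + 1, r)
```

For example, `rselect([3, 8, 2, 5, 1, 4, 7, 6], 3)` returns 3, the third smallest element.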

14:56

So that is the whole algorithm. That is how we adapt the approach we took to the sorting problem in quicksort to the problem of selection. So, is this algorithm any good? Let's start studying its properties and understand how well it works.

Let's begin with correctness. The claim is that no matter how the algorithm's coin flips come up, no matter what random pivots we choose, the algorithm is correct, in the sense that it's guaranteed to output the ith order statistic. The proof is by induction, and it proceeds very similarly to the one for quicksort, so I'm not going to give it here. If you're curious about how these proofs go, there's an optional video about the correctness of quicksort; if you watch and understand that, it should be clear how to adapt that inductive argument to apply to this selection algorithm as well.

15:39

As usual for divide-and-conquer algorithms, the interesting part is not so much understanding why the algorithm works, but rather understanding how fast it runs. So the big question is: what is the running time of this selection algorithm? To understand this, we have to understand the ramifications of pivot choices on the running time. If the quicksort videos are fresh in your mind, it should be clear that, just like in quicksort, how fast this algorithm runs is going to depend on how good the pivots are, and a good pivot means a pivot that guarantees a balanced split.

16:29

The correct answer to this question is exactly the same as the answer for quicksort: the worst-case running time, if the pivots are chosen in a really unlucky way, is actually quadratic in the array length. Remember, we're shooting for linear time, so this quadratic bound is a total disaster. How could this happen? Well, suppose you're looking for the median, and suppose you choose the minimum element as the pivot every single time. If this is what happens, if every time you choose the pivot to be the minimum, then, just like in quicksort, every time you recurse, all you succeed in doing is peeling off a single element from the input array. Now, you're not going to find the median element until you've made roughly n over 2 recursive calls, each on an array that has size at least a constant fraction of the original one. So it's a linear number of recursive calls, each on an array of size at least some constant times n, and that gives you a quadratic running time overall. Of course, this is an absurdly unlikely event; frankly, your computer is more likely to be struck by a meteor than it is for the pivot to be chosen as the minimum element in every recursive call. But if you really have an absolutely worst-case choice of pivots, it would give this quadratic running time bound.

17:54

So the key to a fast running time is going to be the usual property that we want to see in divide-and-conquer algorithms: namely, every time we recurse, the problem size had better not just be smaller, it had better be smaller by a significant factor. How would that happen in this selection approach based on the partition subroutine? Well, if both of the subproblems are not too big, then we're guaranteed that when we recurse, we make a lot of progress.

So let's think about what the best possible pivot would be, in the sense of giving a balanced split. Of course, in some sense the best pivot is the order statistic you're looking for: then you're done in constant time. But that's extremely unlikely, and it's not worth worrying about, so ignore the fact that we might get lucky and guess the pivot. What's the best pivot if we want to guarantee an aggressive decrease in the input size for the next iteration? Well, the best pivot is the one that gives as balanced a split as possible. And what's the pivot that gives us the most balanced split, a 50/50 split? If you think about it, it's exactly the median. Of course, this is not super helpful, because the median might well be what we're looking for in the first place, so this is sort of a circular idea. But for intuition, it's still worth exploring what kind of running time we would get in this best case. If we're not going to get linear time even in this magical best case, we certainly wouldn't expect to get it on average over random choices of the pivots.

So what would happen if we actually did luckily choose the median as the pivot every single time? Well, we get a recurrence for the running time the algorithm requires on an array of length n. There's only going to be one recursive call; this is the big difference from quicksort, where we had to recurse on both sides and had two recursive calls. Here, we're only going to have one recursive call. In the magical case where our pivots are always equal to the median, both subproblem sizes are only half as large as the original one, so when we recurse, it's on a problem of size guaranteed to be at most n over 2. Outside the recursive call, pretty much all we do is a partitioning invocation, and we know that that takes linear time. So the recurrence we get is that T(n) is at most T(n/2) plus O(n). This is totally ready to get plugged into the master method; it winds up in case two of the master method, and indeed we get exactly what we wanted, linear time.
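Written out, the best-case recurrence unrolls to a geometric sum (a sketch, with c denoting the constant hidden in the linear partitioning work):

```latex
T(n) \le T(n/2) + cn
     \le cn + c\frac{n}{2} + c\frac{n}{4} + \cdots
     \le cn \sum_{k=0}^{\infty} 2^{-k}
     = 2cn = O(n).
```

This matches what case two of the master method gives for one recursive call on half the input with linear work outside it.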

20:07

To reiterate, this is not interesting in its own right; this is just for intuition. This was a sanity check that at least for a best-case choice of pivots we'd get what we want, a linear-time algorithm, and we do. Now the question is, how well do we do with random pivots? The intuition, the hope, is exactly as it was for quicksort: that random pivots are a perfectly good surrogate for the median, for the perfect pivot.

20:31

Having the analysis of quicksort under our belt, where indeed random pivots do approximate very closely the performance you get with best-case pivots, maybe now we have reason to believe that this is true. That said, as a mathematical statement this is totally not obvious, and it's going to take a proof; that's the subject of the next video. Let me just be clear about exactly what we're claiming. Here is the running time guarantee that randomized selection provides.

20:57

For an arbitrary input array of length n, the average running time of this randomized selection algorithm is linear, big O of n. Let me reiterate a couple of points I made for the analogous guarantee for the quicksort algorithm. The first is that we're making no assumptions about the data whatsoever; in particular, we're not assuming that the data is random. This guarantee holds no matter what input array you feed into this randomized algorithm. In that sense, this is a totally general-purpose subroutine.