
This optional video will be, more or less, the last word we have on sorting for the purposes of this course. And it will answer the question: can we do better? Remember, that's the mantra of any good algorithm designer. I've shown you N log N algorithms for sorting: Merge Sort, in the worst case, and QuickSort, on average. Can we do better than N log N? Indeed, for the selection problem, we saw we could do better than N log N; we could do it in linear time. Maybe we can do linear time for sorting as well. The purpose of this video is to explain to you why we cannot sort in linear time. So this is a rare problem where we understand quite precisely how well it can be solved, at least for a particular class of algorithms called comparison-based sorts, which I'll explain in a moment.

So here's the form of the theorem I want to give you the gist of in this video. In addition to restricting to comparison-based sorts, which is necessary, as we'll see in a second, I'm going to make a second assumption which is not necessary but is convenient for the lecture: I'm only going to think about deterministic algorithms for the moment. I encourage you to think about why the same style of argument gives an N log N lower bound on the expected running time of any randomized algorithm.

Maybe I'll put that on the course site as an optional theory problem. So, in particular, QuickSort is optimal in the randomized sense: it has average running time N log N, and the theorem, extended to randomized algorithms, says that no comparison-based sort can do better than that, even on average.

So I need to tell you what I mean by a comparison-based sorting algorithm. What it means is a sorting algorithm that accesses the elements of the input array only via comparisons; it does not do any kind of direct manipulation of a single array element. All it does is pick pairs of elements and ask the question: is the left one bigger or is the right one bigger? I like to think of comparison-based sorts as general-purpose sorting routines. They make no assumptions about what the data is, other than that it's from some totally ordered set. I like to think of such a sort really as a function that takes as an argument a function pointer that allows it to do comparisons between abstract data types.
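As a concrete sketch of that interface (in Python rather than C, so `functools.cmp_to_key` stands in for the function pointer), here's a sort that touches the elements only through a comparator:

```python
from functools import cmp_to_key

# The comparator: the only "API" the sort gets. For a pair of elements,
# it answers whether the left one is smaller, equal, or bigger.
def compare(a, b):
    if a < b:
        return -1
    elif a > b:
        return 1
    return 0

# sorted() never inspects the elements' internal structure; it only calls
# compare() on pairs, much like qsort() does through a function pointer.
print(sorted([3, 1, 4, 1, 5], key=cmp_to_key(compare)))  # [1, 1, 3, 4, 5]
```

Any type the comparator understands can be sorted this way, which is exactly what makes such routines general-purpose.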

There's no way to access the guts of the elements; all you can do is go through this API, which allows you to make comparisons. And indeed, if you look at the sorting routine in, say, the Unix operating system, that's exactly how it's set up: you just pass in a function pointer to a comparison operator. I know this sounds super abstract, so I think it becomes clear once we talk about some examples. There are famous examples of comparison-based sorts, including everything we've discussed in the class so far. There are also famous examples of non-comparison-based sorts, which we're not going to cover, but perhaps some of you have heard of them, or at the very least they're very easy to look up on Wikipedia or wherever.

So examples include the two sorting algorithms we've discussed so far. The only way Merge Sort interacts with the elements of the input array is by comparing them and by copying them. Similarly, the only things QuickSort does with the input array elements are compare them and swap them in place. For those of you who know about the heap data structure, which we'll be reviewing later in the class, there's also Heap Sort, where you just heapify a bunch of elements and then extract the minimum N times. That also uses only comparisons.
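To make that concrete, here's a minimal sketch of Merge Sort's merge step; notice that the only operations performed on elements are the pairwise comparison `<=` and copying:

```python
def merge(left, right):
    """Merge two sorted lists using only comparisons and copies."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        # The single point where elements are examined: a pairwise comparison.
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # Copy whatever remains; again, no element values are inspected.
    result.extend(left[i:])
    result.extend(right[j:])
    return result

print(merge([1, 3, 5], [2, 4, 6]))  # [1, 2, 3, 4, 5, 6]
```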

So what are some famous non-examples? I think this will make it even clearer what we're talking about. Bucket sort is one very useful one. Bucket sort is used most frequently when you have some kind of distributional assumption on the data that you're sorting. Remember, that's exactly what I'm focusing on avoiding in this class; I'm focusing on general-purpose subroutines where you don't know anything about the data. But if you do know stuff about the data, bucket sort can sometimes be a really useful method. For example, suppose you can model your data as i.i.d. samples from the uniform distribution on [0, 1]. So they're all numbers bigger than zero and less than one, and you expect them to be evenly spread through that interval. Then what you can do in bucket sort is preallocate N buckets where you're going to collect these elements, each one with the same width, one over N. In a single linear pass over the input array, everything between zero and 1/N you stick in the first bucket, everything between 1/N and 2/N you stick in the second bucket, everything between 2/N and 3/N you stick in the third bucket, and so on. So with a single pass, you've classified the input elements according to which bucket they belong in. Now, because the data is assumed to be uniformly random, you expect each of the buckets to have a very small population, just a few elements in it. Remember, if an element is drawn uniformly from the interval [0, 1], then it's equally likely to land in each of the N available buckets; and since there are N elements, you only expect one element per bucket. So each bucket is going to have a very small population. Having bucketed the data, you can now just use, say, insertion sort on each bucket independently. You're going to be doing insertion sort on a tiny number of elements, so that will run in constant time per bucket, and there's a linear number of buckets, so it's linear time overall. So the upshot is: if you're willing to make really strong assumptions about your data, like that it's drawn uniformly at random from the interval [0, 1], then there is no N log N lower bound; in fact, you can elude the lower bound and sort in linear time.

So, just to be clear, in what sense is bucket sort not comparison-based? In what sense does it look at the guts of its elements and do something other than access them via pairwise comparisons? Well, it actually looks at an element of the input array and asks what its value is; it checks whether that value is .17 versus .27 versus .77, and according to the value it sees inside the element, it decides which bucket to allocate it to. So it actually stares at the guts of an element to decide what to do next.
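Here's a minimal sketch of that idea, assuming the inputs really do lie in [0, 1); note the line that inspects an element's value directly, which no comparison-based sort is allowed to do:

```python
def bucket_sort(xs):
    """Sort numbers assumed to be roughly uniform in [0, 1)."""
    n = len(xs)
    buckets = [[] for _ in range(n)]
    for x in xs:
        # The non-comparison step: stare at the value itself to pick a bucket.
        buckets[int(x * n)].append(x)
    result = []
    for b in buckets:
        # Each bucket holds O(1) elements in expectation, so any simple
        # sort (here, Python's built-in) costs constant time per bucket.
        result.extend(sorted(b))
    return result

print(bucket_sort([0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12]))
# [0.12, 0.17, 0.21, 0.26, 0.39, 0.72, 0.78, 0.94]
```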

Another non-example, which can be quite useful, is counting sort. This sorting algorithm is good when, again making an assumption about the data, your data are integers, and small integers at that: they're between zero and K, where K is ideally at most linear in N. Then what you do is a single pass through the input array. Again, you just bucket the elements according to their values; each value is somewhere between zero and K, and it's an integer by assumption, so you need K + 1 buckets. Then you do a second pass, where you depopulate the buckets in order and copy the elements into an output array. That gives you a sorting algorithm which runs in time O(N + K), where K is the size of the biggest integer. So the upshot with counting sort is that if you're willing to assume the data are integers bounded above by something linear in N, proportional to N, then you can sort them in linear time. Again, counting sort does not access the array elements merely through comparisons: it actually stares at an element, figures out what its value is, and uses that value to determine which bucket to put the element in. So in that sense it's not a comparison-based sort, and under its assumptions it can beat the N log N lower bound.
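A minimal sketch of counting sort, assuming the inputs are integers in the range 0..k:

```python
def counting_sort(xs, k):
    """Sort integers in the range 0..k in O(n + k) time."""
    counts = [0] * (k + 1)
    for x in xs:
        # The non-comparison step: use the value itself as a bucket index.
        counts[x] += 1
    result = []
    for value, c in enumerate(counts):
        # Depopulate the buckets in order, copying into the output array.
        result.extend([value] * c)
    return result

print(counting_sort([4, 1, 3, 4, 0, 2], k=4))  # [0, 1, 2, 3, 4, 4]
```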

A final non-example is radix sort. I think of this as an extension of counting sort, although you don't have to use counting sort as the inner loop; you can use other so-called stable sorts as well. It's the kind of thing you can read about in many programming books or on the web. The upshot of radix sort: again, you assume the data are integers. You think of them in digit representation, say binary representation, and you just sort one bit at a time, starting from the least significant bit and going all the way up to the most significant bit. And so the upshot of radix sort, as an extension of counting sort, is that if your data are integers that are not too big, polynomially bounded in N, then it lets you sort in linear time.
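Here's a minimal sketch of least-significant-digit radix sort on non-negative integers, using one stable bucketing pass per digit (base 10 here just for readability; binary works the same way):

```python
def radix_sort(xs, base=10):
    """Sort non-negative integers, one digit at a time, least significant first."""
    if not xs:
        return xs
    xs = list(xs)
    exp = 1
    while exp <= max(xs):
        # One stable pass, keyed on the current digit only. Stability is what
        # preserves the ordering established by the earlier, lower digits.
        buckets = [[] for _ in range(base)]
        for x in xs:
            buckets[(x // exp) % base].append(x)
        xs = [x for b in buckets for x in b]
        exp *= base
    return xs

print(radix_sort([170, 45, 75, 90, 2, 802, 24, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]
```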

So, summarizing: a comparison-based sort is one that can only access the input array through an API that lets you compare two elements. You cannot access the value of an element, so in particular you cannot do any kind of bucketing technique. Bucket sort, counting sort, and radix sort are all fundamentally doing some kind of bucketing, and that's why, when you're willing to make assumptions about what the data is and how you're permitted to access it, you can, in all of those cases, bypass this N log N lower bound. But if you're stuck with a comparison-based sort, if you want something general-purpose, you're going to be doing N log N comparisons in the worst case. Let's see why.

So we have to prove a lower bound for every single comparison-based sorting method; so fix an arbitrary one. And let's focus on a particular input length; call it N. Okay, so now let's simplify our lives. Now that we're focused on a comparison-based sorting method, one that doesn't look at the values of the array elements, just at their relative order, we may as well think of the array as just containing the elements one, two, three, all the way up to N, in some jumbled order. Some other algorithm could make use of the fact that everything is a small integer, but a comparison-based sorting method cannot. So there's no loss in just thinking about an unsorted array containing the integers one through N inclusive. Now, despite seemingly restricting the space of inputs we're thinking about, even here there are kind of a lot of different inputs we've got to worry about, right? These N elements can show up in N factorial different orderings: there are N choices for the first element, then N minus 1 choices for the second element, N minus 2 choices for the third element, and so on. So there are N factorial ways these elements can be arranged in the input array. Now, I don't want to prove this super formally, but I want to give you the gist and, I think, the right intuition.

We're interested in lower bounding the number of comparisons this method makes in the worst case. So let's introduce a parameter K, which is its worst-case number of comparisons. That is, on every input, each of these N factorial inputs, this method by assumption makes no more than K comparisons. The idea behind the proof is that, because we have N factorial fundamentally different inputs, the sorting method has to execute in a fundamentally different way on each of those inputs. But since the only thing that causes a branch in the execution of the sorting method is the resolution of a comparison, and we have only K comparisons, it can have only two to the K different execution paths. So that forces two to the K to be at least N factorial, and a calculation then shows that this forces K to be at least Omega of N log N. So let me just quickly fill in the details. Across all N factorial possible inputs, just as a thought experiment, we can imagine running this method N factorial times and looking at the pattern of how the comparisons get resolved.
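We can actually carry out this thought experiment in code for a small N. The sketch below records, for every permutation of 1..n, the bit string of comparison outcomes made by a comparison sort (here Python's built-in `sorted` stands in; any correct deterministic comparison sort would do), and confirms that every permutation produces a distinct execution path:

```python
from functools import cmp_to_key
from itertools import permutations

def comparison_trace(xs):
    """Sort xs and return the sequence of comparison outcomes as a bit string."""
    trace = []
    def compare(a, b):
        trace.append('1' if a < b else '0')
        return -1 if a < b else (1 if a > b else 0)
    sorted(xs, key=cmp_to_key(compare))
    return ''.join(trace)

n = 5
traces = {comparison_trace(list(p)) for p in permutations(range(1, n + 1))}
# All 5! = 120 inputs must yield 120 distinct comparison patterns,
# since identical patterns would force identical (and thus wrong) behavior.
print(len(traces))  # 120
```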

Right? For each of these N factorial inputs, we run it through the sorting method: it makes comparison number one, then comparison number two, then comparison number three, and so on, and, you know, it gets back a zero, then a one, then a one, then a zero. Give it some other input, and it gets back a one, then a one, then a zero, then a zero, and so on. The point is, for each of these N factorial inputs, it makes at most K comparisons, so we can associate each input with a K-bit string. And because there are only K bits, we can only see two to the K different K-bit strings, two to the K different ways that the sequence of comparisons can resolve. Now, to finish the proof, we're going to apply something which I don't get to use as much as I'd like in an algorithms class, but it's always fun when it comes up: the pigeonhole principle. The pigeonhole principle, you recall, is the essentially obvious fact that if you try to stuff K plus one pigeons into just K cubbyholes, one of those K cubbyholes has got to get two of the pigeons. Okay, at least one of the cubbyholes gets at least two pigeons.

So for us, what are the pigeons and what are the holes? Our pigeons are these N factorial different inputs, the different ways you can scramble the integers one through N. And what are our holes? Those are the two-to-the-K different executions that the sorting method can possibly take on. Now, suppose the number of comparisons K used is so small that two to the K, the number of distinct executions, the number of distinct ways the comparisons can resolve themselves, is less than the number of different inputs that have to be correctly sorted. Then, by the pigeonhole principle, one hole gets two pigeons. That is, two different inputs get treated in exactly the same way by the sorting method: they are asked exactly the same K comparisons, and the comparisons resolve identically. You feed it one jumbling of one through N and get, say, 01101; you feed it a totally different jumbling of one through N, and again you get 01101. And if this happens, the algorithm is toast, in the sense that it's definitely not correct, right? Because we've fed it two different inputs and it is unable to resolve which of the two it is. It may be one permutation of one through N, or this totally different permutation of one through N. The algorithm has tried to learn what the input is through these K comparisons, but it has exactly the same data about the input in the two cases. So if it outputs the correct sorted version in one case, it's going to get the other one wrong. You can't have a common execution of a sorting algorithm unscramble two totally different permutations; it can't be done. So what have we learned?

We've learned that, by correctness, two to the K is in fact at least N factorial. So how does that help us? Well, we want to lower bound K; K is the number of comparisons this arbitrary sorting method is using, and we want to show that's at least proportional to N log N. So, to lower bound K, we'd better lower bound N factorial. Now, you could use Stirling's approximation or something fancy, but we don't need anything fancy here; we'll just do something super crude. We'll just say, well, look, N factorial is the product of N things, right? N times N minus 1 times N minus 2, and so on. And the N over two largest of those N terms are all at least N over two; the rest we'll just ignore. Pretty sloppy, but it gives us a lower bound of N over two raised to the N over two. Now we just take log base two of both sides, and we get that K is at least N over two times log base two of N over two, also known as Omega of N log N. And that, my friends, is why every deterministic sorting algorithm that's comparison-based has got to use N log N comparisons in the worst case.
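As a quick numeric sanity check of that last calculation (not part of the proof, just reassurance), we can verify for small N that N! really is at least (N/2)^(N/2), so any correct comparison-based method needs at least log2(N!) >= (N/2) * log2(N/2) comparisons:

```python
import math

for n in [4, 8, 16, 32, 64]:
    crude = (n / 2) ** (n / 2)                   # the sloppy lower bound on n!
    exact_bits = math.log2(math.factorial(n))    # true information content, log2(n!)
    crude_bits = (n / 2) * math.log2(n / 2)      # the (n/2) * log2(n/2) bound
    assert math.factorial(n) >= crude
    assert exact_bits >= crude_bits
    print(n, math.ceil(exact_bits), math.ceil(crude_bits))
```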
