0:01

Now, we'll take a look at how the sorting algorithms that we talked about or

Â expressed in the systems that we use everyday. Now, the key point is that

Â sorting algorithms rhythms are essential in a very broad variety of applications

Â and, and all of us use sorting algorithms pretty much every day. Many obvious out

Â applications like or, organizing your music library or displaying your search

Â results or listening feeds in your in your web browsers. There's some other

Â applications that are not so obvious where we use sorting as a to make a problem easy

Â once you know that they're sorted. And so, for example, finding the median and if

Â it's already sorted, it's much easy to find the median. And now, the statistical

Â problems are like that or finding duplicates. Probably finding duplicates by

Â itself is not quite obvious what to do but the easy way to solve it is to just go

Â ahead and sort. And then there are plenty of applications that we'll see later in

Â this course like data compression or computer graphics like finding the convex

Â hull, applications in science such as computational biology or, or in systems

Â development. We're having a efficient sort as absolutely crucial. So, because there's

Â all these applications most programming systems have a fast sort as an important

Â part of their infrastructure and Java is no exemption. So, Java has a method called

Â arrays.sort and it's intended to be a general purpose sorting method for use by

Â Java programmers. And now, that you have some understanding of the classic methods

Â you can have a better idea of why arrays.sort is the way that it is. So it

Â has the infrastructure that allows us to be used for all types of data types and

Â all types of ordering so it's got a method that implements comparable then its got

Â methods easy compare order. So that you can use the natural order or you can

Â provide a compare order and provide your own order for any type of data. It uses

Â actually both quicksort and mergesort. It uses two quick sort for primitive types of

Â data and a two mergesort for objects. Why two different well it's just the

Â designer's assessment of the idea that if a programmer is using object maybe spaces,

Â not a, a critically important consideration. And so, the extra space

Â used by merge sort maybe is not a problem. And if the program is using primitive

Â types, maybe performance is the most important thing. And so, then we'll use

Â the quicksort. To invoke arrays that sort, you have to import the name space from

Â java.util.Arrays and then all you need to do is invoke Arrays.sort. So I just

Â answered this question, why do we use different algorithms for the two types?

Â And this is, is maybe arguable. Now are referred to this idea of a good system

Â sort, there was a good system sort that a lot of people used for many years. And in

Â 1991, there were some scientists that, that Bell Labs that were using qsort for a

Â scientific problem and they were used to taking just a few minutes and then they

Â realized that it was taking hours of CPU time. And the fact was that all the qsort

Â implementations at that time in Unix had this flaw well, there are two flaws and

Â one of them is a little complicated about the way they are raised order and the

Â other one was for a raise that had lots of equal keys and this is Wilks and Becker

Â problem and have lot of equal keys, it was quadratic time. So, the system designer,

Â Jon Bentley was one of the designers to take a look at these problems and that

Â lead ultimately to the development of the 3-way quick sort that were used today. He

Â worked with Doug McIlroy and they wrote a, a, a paper that outline this problem and

Â talk about some of these things and they had a three-way partitioning method that

Â was somewhat like the Dijkstra method that we showed but a bit more complicated.

Â Anoth er thing they did was rather than shuffling the array. They [cough] used

Â what's called a method for choosing partitioning element called Tukey's

Â ninther. Tukey is a statistician and he had this particular method for order

Â statistics that has some interesting properties and use that for the

Â partitioning element. This paper was very influential and, and that basic method is

Â widely used. And Tukey's ninther is just pick nine items out of the array and take

Â the median of the mediums and that's the ninther. So very inexpensive and they had

Â macros to do this so and use not too much cost to find a partitioning element that's

Â much closer to the middle than, and if you use a, a random one. So the, the reason

Â they used that is they thought they got them closer to the middle and they also

Â don't like the, some system designers don't like the idea of using random

Â choices in a system method because of way that it changes the state of the system.

Â So they felt that they got better partitioning than a random shuffling and

Â it was also less costly and then generating random numbers including this

Â change of state problem. But there's a problem so you would think that the system

Â sort would be completely solid with all this resource with all these research and

Â all of the development that's going into it. In fact there's a file out there in

Â your book site and get it that will actually break the Java system sort. There

Â was a killer input that will make the thing run in quadratic time. Actually it

Â usually crashes because it's recursive and it crashes the system stack. And it can

Â cause all sorts of problems. There's a killer input for the system sort and, and

Â it can be disastrous in various systems and the reason is, they didn't do the

Â random shuffling. Mcilroy, himself, actually found this problem that you could

Â while the sort is running figuring out an inp ut that would make it crash. And so,

Â you can just run that program and if the sort doesn't use randomness then it's

Â vulnerable to this attack. So which algorithm should we use to sort that's,

Â that's really a key question. We've looked at lot of sorting algorithms and actually,

Â there's hundreds of sorting algorithms out there and we have chosen the most

Â important and the most interesting for you but you could literally spend a year

Â reading all the papers on sorting and then you still continue to be invented new

Â algorithms are developed and that are found to have good characteristics all the

Â time. And really, the key idea is really important to think about cuz it applies to

Â all sorts of algorithmic problems. On our way, we've been talking about rules of the

Â game. What is it that we care about in a sort? It's a little bit more complicated

Â than just put stuff in order. There's a lot of attributes, the different

Â applications have. Like stability, that's a fairly sophisticated attribute that you

Â really have to think about, you maybe not be aware of. Maybe your computer is

Â parallel and the sort has to be parallel and we found that equal keys make a huge

Â difference. And I didn't really think about that at the beginning but it can

Â make a huge difference. Maybe the way your computer's memory is organized make a

Â difference. Maybe the keys are small and the items are large or maybe the keys are

Â really huge. Do we need guaranteed performance? Are we happy with random

Â performance? Do we know, is the array randomly ordered? You can think of a

Â matrix shown in the right here where we list out all the possible attributes and

Â then there's algorithms that worked well for different combinations of attributes.

Â But the thing is, there is way more possible combinations of attributes than

Â there are algorithms. So there is going to be situations that are going to require an

Â understanding of what it takes to engineer a, a sort method that's appropriate for

Â your application. There can't be a system sort out there that's going to cover all

Â possible combinations of attributes. Now, usually it's going to be good enough but

Â it's definitely worth while to understand what's going on with different sorting

Â algorithms in order to even find improved performance over the system sort. So,

Â here's the summary of some of the things that we've talked about. We've talked

Â about six different sorting algorithms. Then, we've talked about a bunch of

Â attributes. For example, the top line, selection sort is in place it always takes

Â about N^2 / two comparisons. And one of the things to remark about it is that it

Â only uses N exchanges and so forth. Insertion sort best case linear,

Â quadratic, and place unstable. And it'll work well if the file is small or

Â partially ordered. Shellsort, we don't know it's a running time, it's not stable

Â but it's a fast method for intermediate size files and not much code. Mergesort

Â and log N guarantee unstable but not in place, need that auxiliary array.

Â Quicksort not stable. Quadratic time worst case but that's unlikely to occur if you

Â do the random shuffling. And its the fastest and most useful in practice

Â particularly if you make improvements to deal with duplicate keys. And it's

Â interesting to note we've looked at important and classic algorithms that are

Â widely deployed but we don't have a, a useful, practical algorithms that are

Â widely used that's got all of these characteristics that's in place and stable

Â worst case N log N. There's versions of merge sort that come close but they are

Â too complex for practitioners to have adopted them. Those are some of the

Â situations that we encounter when developing a system sort. And, quicksort

Â certainly plays a role in most system sorts.

Â