[MUSIC] All right, welcome back. In the last video, we started to see real system times. In this next video, we're going to go into more depth. We're gonna look at those real system times and we're gonna use them to make decisions about the performance of our algorithms, the performance of our programs. So by the end of this video, you should be able to use runtimes for a real system to reason about performance. So in the last video, we had just seen the idea that we're gonna try to measure the sort times. Both for selection sort and for quick sort. Again, on a real system. So the idea behind this is to go with increasing sizes of n, print the value of n. Create a randomized array of size n and then time selection sort and print how long it took. We'll so the same thing again for quick sorts. We'll create a randomized array, time it and then print the outcome. What we got from that was essentially this set of data and we can see really quickly from the data that quick sort did better, but we couldn't really reason much about the performance yet, unless we do a more in-depth analysis. So what we're going to do next is throw this into Google Sheets and essentially, do some analysis of the graphs of this. So this is the initial graph of selection sort against quick sort. The y-axis here is how much time it took for them to run and the x-axis is essentially size n. And again, this confirms our original view of this that quick sort is just much, much faster than selection sort, even for fairly small values of n, but let's look at each of these algorithms in some depth. So starting with selection sort. What we see is essentially these are the runtimes as n goes forward. And if you look at this, it kinda looks like an n squared growth, but how do we know for sure? What we can do is essentially do a best fit on the data against say, an n squared. And the way to do that is essentially figure out a reasonable value, the constant that goes in front of that n squared term. So by best fit here, all I'm saying is I found a reasonably good constant value for k and then I plotted that. And you can see here that the actual runtimes for section sort, match an n squared growth quite well for this constant that I found. Now what you might be saying is well, wait a second, you said this is the best fit and you found a really good constant for it. Wouldn't it look really good whether it's n squared or n cubed or n? Well, let's do the same thing. I mean, try to find a good constant value for n cubed and a good concept value for n and see how it works. So here's the three different fits, we having blue is still the actual. In yellow, we still have the best fit for an n squared. And then in green, I have this best fit for n cubed. And immediately, you see that that n cubed grows way too quickly. So, either I pick a constant that has the values way too low early on or I have it essentially go off this chart fairly quickly. So n cubed just won't fit, no matter what constant value I use. Likewise, if we look at red, that's the best fit of n growth. And essentially here, we see that it's not fitting either. No matter what constant value I do for linear growth, it won't fit. It's gonna be too high at the beginning and too low later on. So really, this n squared is the right way to look at this in terms of the growth. And that's exactly what we would expect for selection sort, so we're happy to see that runtimes on our real system actually match our high-level analysis beforehand. So let's then focus in on quick sort. So in our initial view, we saw quick sort was essentially just this line on the bottom, because the values were so low, essentially just at the axis. So if I zoom in on it, what we're gonna get is a little bit more refined data or better view and the first thing I want you to notice is that this is real data. So we have two of these points that are essentially outliers and there is a number of reasons why these could be outliers, but I'll focus on two of those reasons. The first is that you know that the quick sort is n log(n), essentially an average case. But depending on the input set, it could run longer than that. So these could just be essentially to randomized inputs that didn't perform that well for a quick sort. What I think is actually is more likely is that this is on a real system. There could be other things scheduling, other programs running at the same time. And if you look at these values in the hundredths of seconds. So if something else got scheduled at the same time, so are conflicting for resources on the same system, you can see aberrations in data like this very easily. The second thing I wanna point out about this graph is that this kind of looks linear. Looks like linear growth, not the n log n that we talked about, the high-level analysis and how can you really tell the difference between linear versus n log(n)? Well, to do that, first thing is that we kinda need more data. These are really small sizes of n, especially for quick sort when quick sort is so quick. Now we have a great deal more data for quick sort and we're gonna all the way out to 10 million elements for size of n. And you can essentially see that we've still got the same kind of graph and we still have these some of these points are going up and down the chart, but this is really a clean, almost linear growth. But again, how do I tell the difference between when you're an log(n), especially in this granularity? What I can do is essentially that same best fit analysis that we just talked about. So what I'll do is I'll do a best fit, both for linear growth, as well as n log(n) growth. And if you zoom in on this, I know this is a little hard to see. If you zoom in on it, in yellow we have that n log(n) growth and it's matching the actual blue really well. Whereas in red, we've got our linear growth and you see that it essentially falls off. As you get to larger sizes of n, you started getting a bigger gap in terms of the actual versus the predicted data. So this really is n logging growth, which is exactly what we expect for quick sort. So that's great to see, but why were we seeing it look so much like linear? Well, the question is how big is log(n)? So let's say, my n is 10 million, how big is log(n)? Take a second to think about it. If you're answering a really small number in the 20s to 30s, you're spot on. In fact, 2 to the 23 is about 8 million. So this is roughly 23. Now 23 is really small relative to 10 million and that's the problem when you are trying to do a linear analysis versus n log(n) is that login is just so small relative to n. The n tends to dominate. So in summary, now that we've looked at all of these different data points, I want to bring up the fact that we can really just use real runtimes to reason about performance. But whenever you do so, be careful, because real system data can be noisy. But ultimately, if you want to know how well does your program behave on a real system, you have to do this kind of analysis.