0:00

In this video, we will study the so-called merge sort algorithm.

It is based on the divide-and-conquer technique, whose main idea is the following. To solve a given computational problem, you first split it into two or more disjoint subproblems. Then you solve each of these subproblems recursively. Finally, you combine the results of the recursive calls to get the result for your initial problem.

And this is exactly what we're going to do in the merge sort algorithm.

Let's start with a toy example. We're given an array of size eight, and we are going to sort it. First, we split this array into two halves of size four: the left half and the right half. Then we make two recursive calls to sort both of these parts. Each recursive call returns a sorted array of size four. What remains to be done is to merge these two sorted arrays of size four into one array of size eight.

Well, let's think about how this can be done. First of all, I claim that it is easy to find the minimum value of the resulting array. Indeed, we know that the minimum value of the first array is two, and the minimum value of the second array is one. This means that the minimum value of the resulting merged array must be one. So let's take one from the right array, put it into the resulting array, and forget about it: it is already in its right place.

Â 1:42

Now we repeat: we find the minimum among the remaining elements, which will be the second element of the result of merging these two arrays. In this case, it is two, because the minimum value in the array of size four is two, and the minimum value in the array of size three is six. Two is smaller than six, so we take two out of the left array, put it into the resulting array after one, and proceed in the same way. In the end, we get the following sorted array.

Again, the pseudocode of the merge sort algorithm directly implements this idea. It takes an array A of size n as input. If n is equal to 1, then nothing needs to be done, and we just return the array A itself. If n is greater than 1, on the other hand, we split the array A into two roughly equal parts and sort them recursively; we call the results B and C here. Then the only thing that remains is to merge these two sorted arrays. This is done in the procedure Merge, which we will present on the next slide. Finally, we return the result of this merging procedure.
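As a rough Python sketch of this pseudocode (not the lecture's exact code), the merging step is delegated here to the standard-library `heapq.merge`, which performs exactly the two-pointer merge of two sorted sequences:

```python
import heapq

def merge_sort(a):
    """Sort the list a with merge sort; returns a new sorted list."""
    n = len(a)
    if n <= 1:
        return a[:]              # an array of size at most 1 is already sorted
    mid = n // 2
    b = merge_sort(a[:mid])      # sort the left half recursively
    c = merge_sort(a[mid:])      # sort the right half recursively
    return list(heapq.merge(b, c))  # merge the two sorted halves

print(merge_sort([5, 2, 7, 3, 1, 8, 6, 4]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```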

Â 2:55

The pseudocode of the merging procedure is also straightforward. It assumes that we are given two sorted arrays, B and C, of sizes p and q respectively, and we would like to merge them into a sorted array of size p + q. The first thing we do is create an array D of size p + q; it is initially empty. Then we keep doing the following. What is the minimum value among all the values stored in the arrays B and C? Well, it is easy to find. We know that the first element of the array B is its smallest element, and the first element of the array C is its smallest element. So the smaller of these two first elements is the smallest element in the union of the two arrays. We find the minimum of the two first elements, move it from its array to the resulting array D, and forget about this element completely.

Now what is left is essentially the same problem: we are left with two sorted arrays, and we still need to merge them. So we do exactly the same thing. We take the first elements of the two arrays, compare them, and move the smaller one to the resulting array. We keep doing this while both of the arrays are non-empty; we need this to be able to take their first elements. When one of them becomes empty, we just copy the rest of the other array to the resulting array D.

It is not difficult to see that this procedure is correct, and that its running time is O(p + q), namely, the size of the array B plus the size of the array C. This is because we just scan both of these arrays from left to right during the run of the merging procedure.
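This merging procedure can be written out explicitly as a two-pointer scan (a sketch, not the lecture's exact pseudocode):

```python
def merge(b, c):
    """Merge two sorted lists b (size p) and c (size q) into one sorted
    list d of size p + q, in O(p + q) time."""
    d = []
    i = j = 0
    # while both arrays are non-empty, move the smaller first element to d
    while i < len(b) and j < len(c):
        if b[i] <= c[j]:
            d.append(b[i])
            i += 1
        else:
            d.append(c[j])
            j += 1
    # one of the arrays is exhausted; copy the rest of the other one
    d.extend(b[i:])
    d.extend(c[j:])
    return d

# the two sorted halves from the toy example, with minima 2 and 1
print(merge([2, 3, 5, 7], [1, 6, 8, 9]))  # [1, 2, 3, 5, 6, 7, 8, 9]
```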

This is how sorting our initial array of size eight with the merge sort algorithm looks. The algorithm first splits the initial array of size eight into two arrays of size four. Each of these arrays of size four is in turn split into two arrays of size two, and each of them is split into two arrays of size one. Then the merge procedure starts merging: arrays of size one into arrays of size two, then arrays of size two into arrays of size four, and finally the two arrays of size four into the resulting array of size eight.

We are now going to prove that the running time of the merge sort algorithm on a sequence containing n elements is O(n log n). Note that this is significantly faster than the quadratic selection sort algorithm. For example, it is perfectly okay to sort a sequence of size one million, that is, 10^6, on your laptop using the merge sort algorithm, while for the quadratic-time selection sort algorithm, sorting a sequence of size 10^6 would take roughly 10^12 operations, which is too much for modern computers.
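To make the gap concrete, a quick back-of-the-envelope calculation (counting roughly one operation per comparison):

```python
import math

n = 10**6
merge_sort_ops = n * math.log2(n)  # roughly n log n, about 2 * 10^7
selection_sort_ops = n * n         # roughly n^2, exactly 10^12 here

print(f"merge sort:     about {merge_sort_ops:.1e} operations")
print(f"selection sort: about {selection_sort_ops:.1e} operations")
```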

Â 6:11

Okay, so to prove this lemma, to prove the upper bound on the running time of the merge sort algorithm, first note that merging the two parts of size n/2 of our initial array takes linear time, namely O(n): the left part has size n/2, the right part has size n/2, and for merging we basically just scan these parts from left to right. So it takes just a linear amount of work.

This in turn means that if we denote by T(n) the running time of our merge sort algorithm on an input of size n, then it satisfies the following recurrence: T(n) is at most 2T(n/2) + O(n). Here 2T(n/2) corresponds to the two recursive calls: when we sort two sequences of size n/2, we spend time twice T(n/2). The O(n) term corresponds to what we do before and after the recursive calls. What we do before is split the input array into two halves; what we do after is merge the results of the two recursive calls into one array of size n. It is not difficult to see that all of this can be done in linear time. So we get this recurrence, and on the next slide we're going to show that it implies that the running time of our algorithm is bounded from above by O(n log n).

To estimate the running time of this algorithm, let's consider its recursion tree. At the top of this tree, we have one array of size n. For this array of size n, we make two recursive calls for arrays of size n/2. Each of these arrays of size n/2 is in turn split into two arrays of size n/4, so we get four arrays of size n/4, and so on.

So in this tree, we have log n levels. Now let's estimate the work done at each level of this tree separately. Once again, to sort an array of size n, we first prepare to make the recursive calls: in this case, we just split the array into two halves of size n/2. Then we make the recursive calls, and then we need to combine the results. All the work done inside the recursive calls will be accounted for on the lower levels of this tree, so at each level we account only for the work done before and after the recursive calls. And we know already that this takes linear time.

I mean, if we have an array of size n, it takes linear time to split it into two halves, and then it takes linear time to combine the results of the recursive calls into one array. So let's denote this time by cn; that is, let's denote the hidden constant inside the big O by c.

Then what we can say is that at the top level we spend time cn. At the next level, for each subarray we spend time c times n/2, because the size of each array is n/2. However, we have 2 such arrays, so the total work done at this level is 2 multiplied by c multiplied by n/2, which is again just cn. At the next level, we spend time 4 multiplied by c multiplied by n/4, because we now have 4 arrays, each of size n/4. This is cn again, and so on. So we have log n levels, and at each level we do roughly cn operations. Hence the total number of operations in our algorithm is cn log n, which proves our lemma.
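The level-by-level argument above can be written as a single sum over the levels of the recursion tree (level i has 2^i subproblems of size n/2^i):

```latex
T(n) \le \sum_{i=0}^{\log_2 n} 2^i \cdot c\,\frac{n}{2^i}
      = \sum_{i=0}^{\log_2 n} cn
      = cn\,(\log_2 n + 1)
      = O(n \log n).
```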

So again, what we've just proved is that the running time of the merge sort algorithm is O(n log n). In the next video, we will show that actually no algorithm, no comparison-based algorithm to be completely formal, can sort a given sequence of n elements asymptotically faster than in n log n time. This means that the merge sort algorithm is asymptotically optimal.