0:02

Welcome back. Today, we're going to look at Priority Queues which is a variant of

Â sorting that generalizes the idea to provide more flexible data structure that

Â we can use for all sorts of applications. To get started, we'll look at the API and

Â some elementary implementations. So at a week or so ago, we looked at collections

Â in Java and the idea of elementary data structures where we insert and delete

Â items. And then, the data structures differ on the basis of which item we choose

Â to delete. We looked to the push down stack where we removed the item that was

Â most recently added, And the queue where we remove the item that was least recently

Â added. And then, we talked about randomized queue or bag where we might

Â remove a random or an arbitrary item. Today, what we're going to talk about is

Â called the priority queue and for that, we insert items, but when it's time to

Â remove, we consider the items to have a total order and we want to remove the

Â largest or the smallest item. So this little example if we insert P, Q, and E

Â then when we do remove max, we want to get the Q out and for later, we insert X, A,

Â and M and then we removed max. The largest one in there is X. We'll insert P,

Â L, and E and then, after a while, we remove max P. So, that's our basic setup.

Â That's our definition of what a priority queue is. So, the API will look very

Â similar to our stack or queue API with a difference that we want to have generic

Â items that are comparable. So the Java language for that is in the class header.

Â We say that our generic type Key extends Comparable of Key. And that just means

Â that our keys must be Comparable and we must have a compareTo() method that will

Â compare it to another key. Otherwise we have a constructor and actually for some

Â applications, it's convenient to have a constructor that takes an array of keys as

Â argument. Then, an insert() that puts something in like push in a stack or

Â enqueue in a queue. Then delete the maximum. I referred to delete the minimum

Â just to avoid confusion, we have the implementation separate implementation

Â usually MinPQ, where we delete the minimum. Then test isEmpty() and we also

Â sometimes have extra method that just gives us the value of the largest key and

Â also size which is useful sometimes in collections. You might also want it to be

Â iterable but we'll skip that for now. There are lots and lots of applications of

Â priority queues. Even though it emerged as a data structure relatively late in the

Â game now that we see that there are many algorithms that are much easier to

Â implement when we think about the priority key abstraction. We have data that we are

Â processing we can't process it all at once. We have to save it some of way. And then,

Â what the priority queue tells us is - let's organize it in some ways that we are

Â always taking the best one to process next. And it turns out to be very close to

Â a generic algorithmic design technique that we will be looking at in many, many

Â different applications. Today, we're going to talk about event-driven simulation

Â which is an interesting idea that is based on priority queues but it's also used in

Â numerical computation and we'll see in algorithms for data compression and graph

Â searching that it's useful. And in many other applications in Computer Science and

Â in scientific computing. It generalizes the stack and the queue and gives us a data

Â structure that we can use to effectively design algorithm of all sorts. So here's

Â just a particular client that will help explain the idea. So our, our challenge is

Â let's say this is on the web we have billions of transactions, you know, and they

Â are streaming through our data warehouse or processor in some way. And just a very,

Â very huge number of transactions. In fact, we couldn't possibly hope to even store

Â them all. There's trillions of them there coming through as fast as possible. But

Â we're interested in the biggest ones and so maybe it's the biggest amount of

Â dollars, or the biggest cost, or whatever it might happen to be. And so we can pick

Â some numbers that we can store. I would like to, to store the, the thousand

Â biggest ones. So, you can imagine a credit card company looking for fraud - it's going

Â to care about keeping track of the largest transactions. So, maybe we can store

Â millions or thousands of them. So that's our parameter M - that's the number we can

Â afford to store but the total number of items we couldn't possibly afford to store

Â them. So and this is just some test data where we've got all, all these

Â transactions and so we are going to be able to take in data like this and again

Â an unbounded stream of data. But let's say, we want to keep track of the top five

Â [cough] values using the third column as the value. So we're going to look at a

Â client program called TopM that takes the command-line argument, how many and

Â this case, it's going to say five and then it's going to take from standard input the

Â transactions and it will print out the top five. So, that's a canonical example of a,

Â a priority queue client that we need to design a program that can do that. So,

Â with the priority queue abstraction that's not too difficult to do. So we are going to

Â use a min-oriented priority queue so that's going to keep, it'll [cough] it'll

Â be one where we can delete the minimum and, and it'll be generic so we'll have a

Â transaction type that holds this information including natural ordering

Â where it's ordered by dollars that last column. So, we'll build a new priority

Â queue, min priority queue or we'll have the capability to delete the minimum. And

Â then we read from standard input. We read the line, build the transaction from the

Â information on that line. And that will fill in the fields and then, we put that

Â transaction on the priority queue. Now, if the priority queue has more than M items

Â because we inserted that one, then we want to delete the smallest one there and that

Â way, we're keeping track of the largest M. Whenever we get a new one, then we

Â throw away the smallest one that's there. So, even with this huge stream of items

Â coming through, we're only keeping track of the M largest items and that's a fine

Â canonical client for priority queue. Now how we are going implement or solve this

Â problem or you can think of lots of ways to go ahead and solve this problem of

Â finding the largest M items in the stream of N items. So, for example, you could

Â sort them. And then just look at the M at the end but by sending up the problem, I

Â already kind of ruled that one out because we don't have the space to sort them all,

Â to store them all. So sorting is out of the question. We'll look at couple of

Â elementary priority queue implementations that are straightforward. You know, keep

Â the items like we would in the stack and then when we need to find the smallest or

Â the largest look at, look at them all. But that's going to be too slow. M is large

Â and N is huge, and M<i>N is going to be too slow. What would what we we'll look at is</i>

Â very simple and practical implementation using a data structure called the binary

Â heap that gets the job done in time proportional to N log M and only M space.

Â And that's pretty close to the best that we could do in theory and is very

Â important and useful, practical implementation and data structure.

Â Alright. So here's just an overview of two elementary implementations for priority

Â queues using the example operations that I gave before. So you can imagine keeping

Â the item, say, in a linked list or in a doubling array and just keeping just an

Â order just as we would in the, in the stack just keeping in the way that they

Â come in. And we'll put a new item at the end of the array and remove it from the

Â end of the array. Or you could do it in a linked list, and then when it's time to find the,

Â remove the maximum, you have to scan through everything to find the maximum.

Â So, so, that's a one way you could implement this with, with a linked list or

Â with the resizing array. Or you might to say well let's try to keep things in

Â order. And then that might involve some work with the, it's like insertion sort,

Â you find a place to put the new item and then put it in the right place. And again,

Â you could do this with a linked list or with the resizing array but then, with array,

Â you'd have to move all the larger ones over one position to fit the new item in. When

Â we insert E and that's supposed to keep them in order, we have to move over L, M,

Â P, and P to get the E and then so forth. But the advantage of that might be that

Â removing the maximum is easy. We just take away the one at the end. To remove the

Â Q - we know it's at the end to remove the max. At this point, that's X - it's

Â right at the end, and P is right at the end. So you can imagine the implementations of

Â priority queues using these two basic strategies. Not much code involved. This

Â is a an ordered array implementation of priority queues and it's quite straight

Â forward. We didn't put in the this is the cheat version where we require the client

Â to provide a capacity but we could easily change this to a resizing array. So

Â insert() just puts it at the end, and since its unordered delete maximum has to go

Â through the entire array to try to find the maximum when it refines it and the

Â changes that we're the one at the end and then removes it the same way that we do

Â within the stack. It'll use less and exchange just like we would in sorting

Â methods. So that's a fine implementation if the priority queue was going to be tiny

Â all the time. So if you, without even implementing it, you can understand this

Â table that if we use an unordered array implementation we can get insertion done

Â in constant time but we have to look at everything to delete the

Â maximum or even find the maximum. If we keep it in order, we can find the maximum

Â or delete it at constant time but it takes us linear time to insert. And since as

Â with stack and queue operations, these insert and deletes might be intermixed in

Â arbitrary ways and there might be huge numbers of them either one of these is

Â very attractive because they're going to take N times the number of operations.

Â Whereas what we can hope for and what we actually will achieve is to get log N time

Â for all operations, time proportion to log N for all operations. With the clever data

Â structure and interesting implementation we can actually achieve that goal. That's

Â the basic API and some elementary implementations for priority queues.

Â