So I'm starting with another copy of the last slide from the end of the previous

lecture, the hill climbing lecture. And what we said in that slide was that

the big idea, the two big ideas, of this new method for placement with random hill

climbing were that there was this temperature control parameter that we

started at a very hot value. And we cooled very slowly over the course

of many, many, many swaps, random swaps, that we're going to try to drive the

placement process forward. And that this temperature parameter was a

hill climbing control parameter and when the temperature was hot, we were more

tolerant of changes in the gate locations that

made the placement wire length worse. And when the temperature control

parameter was cold, we were not so tolerant.

And what tolerant meant was that we actually used some random numbers in some

interesting ways to calculate some probabilities to decide whether we should

or should not keep a gate swap that made things worse.

This is an instance of a very famous general optimization method that has a

very famous name. This is an algorithm called Simulated

Annealing, and so in this lecture, we are going to talk about

Simulated Annealing in the context of placement, but we're also going to give

you a little background about why this thing has this name.

So let's just talk about a little context here.

So here's a physical analogy for you. Suppose you want to make a perfect

crystal. What does perfect mean for a crystal?

It means that all the atoms are lined up in their crystal lattice sites.

And that's actually the lowest energy state.

That's what the atoms really want to do. So I've got a little example over here of

a crystal that has lousy order. It is a disordered crystal.

And I've got basically a little 8 by 8 grid of pink circles that

are trying to look like atoms. And then I've got four little red atoms

that are in sort of strange locations. They're clearly not where they want to

be. This is disordered as a higher energy.

How do you take this physical system and actually make it a really good high

quality crystal, a low energy crystal, a crystal where all the atoms are on their

lattice sites? The answer is you get the material really

really hot. So the atoms have the energy to move

around even to bad places, which is to say, places not on their crystal lattice

sites. And then you cool it very, very slowly so

the atoms settle into good locations. And the idea is that when there's a lot

of energy, the atoms can move around even to bad places and when it's not hot, the

atoms have to sort of stay where they are.

And by getting it really hot and cooling it really slowly, you actually encourage

it to go into this, from this highly disordered state to an ordered state.

This process has a name, it's called annealing.

So, you anneal it, you get it really hot and you cool it really slowly and

hopefully you get a crystal with, well it's probably not going to be perfect and

it's probably not going to be the minimum, but it's going to be really good.

It's going to have very few dislocations in the crystal lattice.

So that's physically how you do that. What if you actually wanted to write a

computer program to simulate that thing? What kind of physics do you need to know

to make that actually work? Well, you would have to have some kind of

a data structure that would represent the crystal lattice.

And you know, that could just be a bunch of atoms with x,y or x,y,z

coordinates. I'm just going to draw it in two

dimensions here. And you would have, just like our placer

had, an inner loop where you would randomly

grab two gates and swap their locations and evaluate the change in a number that

mattered to you, and that number was the wire length.

For the physical system, you grab an atom and you move it someplace, that's what

I'm showing here. You grab the blue atom and you move it

someplace, and the thing you calculate is the energy, then in particular you

calculate the change in energy, because that's what the crystal cares about.

And you also need to know how hot it is, because whether or not this is a good

thing to do, whether or not this is a likely thing to do, depends on how hot it

is. And so let's ask a question.

To model a real physical crystal, where we grab an atom, and we move it to a new

location and we calculate the change in energy, delta e.

What is the probability that that actually should happen in my computer

program? And the answer is 1.

If delta e goes down, if delta e is smaller, if this change in the atomic

location makes the energy better. You should keep it.

That's a good thing to do. That's a physically reasonable

thing to do. What if, to model a real physical

crystal, what if the energy goes up? What if you grab the atom and you move it

to a worse place in terms of the overall crystal lattice organization?

What if delta e is bigger than zero? Now, in your computer program, what's the

probability that you ought to, you know, make that happen?

And interestingly, the answer is e to the minus delta e over kT.

That's the physics of the real universe of taking that atom and putting it

someplace where the energy got locally, microscopically at the atomic level, a

little worse. Delta e is the change in the energy.

T is the temperature, and k, amazingly enough, is the Boltzmann constant.

In real physics the units all have to line up.

If you took a physics class in your life you probably got yelled at by

some professor, lecturer, instructor or teaching assistant, because your units

didn't add up. You know, you can't take newtons and add

meters. It's just not okay, and similarly here

the thing that's in the exponent has to be without units, and so the thing on the

top and the thing on the bottom, the units all have to cancel.

And so if the thing on the top is energy and the thing on the bottom is

temperature, there has to be something that links the way we measure temperature

to the way we measure energy, for this to give the real probability

that the real atom makes this real physical move. And amazingly enough,

that's a famous number. That's a famous physical constant.

That's the Boltzmann constant and that's Ludwig Boltzmann over there on the right,

a picture from Wikipedia. The Boltzmann constant, in real physics,

converts the units of temperature to the units of energy.

So what, you know, Ludwig was doing back there in the 19th century, was pioneering

some of the initial mathematics of statistical thermodynamics to be able to

sort of write these equations about what happens with what probabilities with what

energies. This is a long time ago.

What's amazing is this e to the minus energy over temperature thing is sort

of the foundational model of how the probabilities work.
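As a concrete sketch of that probability rule (my own illustration, not code from the lecture), the accept-or-reject decision might look like this in Python, with k kept as a parameter for the physics case:

```python
import math
import random

def accept_move(delta_e, temperature, k=1.0):
    """Metropolis-style acceptance: always keep a move that lowers the
    energy; keep an uphill move with probability exp(-delta_e / (k*T)).
    k plays the role of the Boltzmann constant when delta_e and T carry
    real physical units; for an artificial cost like wire length, k = 1."""
    if delta_e <= 0:
        return True                        # energy went down: always accept
    p = math.exp(-delta_e / (k * temperature))
    return random.random() < p             # uphill: accept with probability p
```

Note what the exponential does: at a hot temperature the exponent is near zero, so p is near 1 and almost any uphill move is tolerated; as the temperature falls, p collapses toward 0 and uphill moves are almost never kept.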

Now, I caution you that you don't need a k when you're doing a placement,

because for a placement, nobody cares what the units of wire

length and temperature are. So as far as you're concerned, pretend

the temperature is measured in meters. Right?

It doesn't matter. It's all artificial.

It just has to be a probability. It has to be a number between 0 and 1.

We tune the parameters so that everything works.

But if you're doing real physics, with real atoms and real crystals and you want

real probabilities with real energies, the units all have to work and that's

where Ludwig comes in. This new method is called simulated

annealing. It's a very famous general optimization

method. You can optimize lots of different things

with it. It is, however, widely used in VLSI CAD.

It was invented at IBM in the early 1980s by Kirkpatrick, Gelatt, and Vecchi, in

this very famous paper in Science magazine.

Which is a very interesting paper to read, I recommend it, if you can get

yourself a copy. And in fact some of the earliest examples

that our friends at IBM, and in fact I know Scott pretty well, used were actually

electronic design examples, chip design examples.

They were actually designing parts of IBM computers back there in the early 1980s,

and getting fabulously good results. And it's this idea of annealing something,

getting it really hot, allowing both downhill changes that make

things better, and uphill changes that make things

worse, with a probability controlled by the temperature,

but applying this in the form of software, of a computer program, to

things that are not physical systems. Things like gates moving around on the

surface of chips. So, here's some pseudo-code for a

simulated annealing placer and it looks kind of complicated.

And what I want to say is that we're going to read this sort of from the

inside out to convince you that it's actually something very close to what you

already know. So, ignore the stuff with color behind

it. Ignore the yellow stuff and ignore the

kind of pink brown stuff, and let's just start with the simple white parts here,

and so up at the top it says to start with a random initial placement.

Okay, that's just like the greedy random iterative improvement stuff that you

know. And then it's got this part here that

says for s equals one to m times the number of gates, m is just how many swaps

we're going to do per gate, so pick a number.

1000. I'm going to do 1000 times the number of

gates swaps and that's going to be my placement.

And then look at the code inside the for loop, swap 2 random gates gi and gj.

Calculate delta l, the change in the wire length. If delta l is less than 0, then

accept this swap, else. And then ignore the stuff with the brown

background, else undo this swap, okay. Well, great that's just the greedy

placer. What happens now when we add the,

simulated annealing stuff, is that, the first thing we do, is we add this inner

loop stuff that says, if the wire length went up, if delta l is greater than zero

then we generate a uniform random number r.

That's this thing here, and we calculate that probability p which is e to the

minus delta l over t. And we ask if r is less than p.

And if that's true then we accept this uphill swap.

It's a new, worse placement, but our hill climbing control parameter T says it's

okay to do. Alright, so that's the probabilistic swap

acceptance, and then the yellow stuff, is this new outer loop, this annealing

temperature control loop. And that says that you start with the

temperature being hot. And then you have this, this basically

this Boolean right, just a flag, that says frozen is false, and it says while

not frozen, do, we do a whole bunch of gate swaps at this temperature.

And then at the end it says, if the overall wire length is still decreasing

over the last few temperatures, then okay we're still annealing, what do you do?

You make the temperature a little cooler because as the annealing runs, you have

to make the temperature control go down so you don't keep taking all the big

uphill moves. So this is an arbitrary number.

T is 0.9T. We cool it.

And if instead it is the case that the total sum of the wire

length has stopped changing, so let's say it just, you know, hasn't

changed more than, you know, one hundredth of a percent over the last five temperatures.

Okay, we're done, right? I mean, we're just not going to find

any more improvement in the wire length. Then we set frozen to true, we drop out

of the loop and we return the final placement as the best solution.
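To make that pseudo-code concrete, here is a minimal Python sketch of the whole loop. The data structures are my own assumptions for illustration (a list of gate names, a dict mapping each gate to a grid location, and a caller-supplied wire-length function); a real placer would use incremental wire-length updates rather than re-evaluating from scratch on every swap.

```python
import math
import random

def anneal_placement(gates, positions, wirelength,
                     t=800.0, m=100, cooling=0.9, tol=1e-4):
    """Simulated-annealing placer following the lecture's pseudo-code:
    M swaps per gate at each temperature, probabilistic uphill
    acceptance, T = 0.9 T cooling, and freezing when the wire length
    stops changing over the last few temperatures."""
    cur = wirelength(positions)
    history = [cur]            # wire length at the end of each temperature
    frozen = False
    while not frozen:
        for _ in range(m * len(gates)):
            gi, gj = random.sample(gates, 2)           # swap two random gates
            positions[gi], positions[gj] = positions[gj], positions[gi]
            delta_l = wirelength(positions) - cur      # change in wire length
            if delta_l < 0 or random.random() < math.exp(-delta_l / t):
                cur += delta_l                         # accept the swap
            else:
                positions[gi], positions[gj] = positions[gj], positions[gi]  # undo
        history.append(cur)
        if len(history) > 5 and abs(history[-6] - cur) <= tol * max(cur, 1.0):
            frozen = True      # wire length stopped improving: we're done
        else:
            t *= cooling       # cool a little and do another round of swaps
    return positions, cur
```

At a hot t, exp(-delta_l / t) is close to 1 and almost every uphill swap is kept; as t cools, the placer degenerates into the greedy downhill-only placer described above.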

So, it's not that much different, it's just the temperature outer loop, the

yellow stuff in this diagram, and the brown-pink stuff.

The, you know, e to the minus delta l over t,

let-me-go-uphill-with-a-certain-probability stuff that's really new here.

This probabilistic acceptance of swap stuff is really famous.

So this little snippet of code: if uniform random less than exp of minus delta l

over t, then accept the swap, else undo the swap.

This little piece of code is sort of a version of a very famous idea.

So this is the idea that randomly accepting a perturbation of a system with

a specific probability will correctly simulate the physics of that real system.

This has a name. This is called the Metropolis Criterion.

And it's named after this guy, Nicholas Metropolis, Nick Metropolis, a very

famous guy. What was Nick doing when he was sort of

doing this? Nick was working with some very famous

people like John von Neumann and Stanislaw Ulam, who's a very famous mathematical

physicist. Way back at the dawn of atomic physics in

the 1950s, when people had computers the size of football fields built out of

thousands of vacuum tubes. And they were trying to write the first

computer, some of the first computer programs, to simulate how the physics of

things, you know, with atoms doing interesting things, worked.

And these are the guys who invented the first Monte Carlo simulations, if you've

ever heard that word. And this sort of inner loop thing that is

generating particular probability distributions, checking that the

perturbation you make either does or doesn't fit within some probabilistic

acceptance criterion. That's really famous and so when you see

lots of computer algorithms that are based on random sorts

of acceptance methods for how things make progress.

You'll often see metropolis the name show up.

Metropolis criterion, metropolis algorithms.

Lots of interesting places we use this idea.

This is really famous. How does this metropolis criterion make

our overall algorithm behave? And the answer is, in some very unusual

ways. In some ways that you probably haven't

experienced before. So it really is random.

You might accept the swap. You might not accept the swap.

It depends on the random number you generate.

And the way to maybe explain that is suppose the temperature's 10 and delta l

is 20, plus 20. So e to the minus delta l over t

is e to the minus 2. It's about 0.14.

So this is about a 14% probability you accept the swap.

You know, what does that mean? So here's a, let me give you a concrete

example of what that means. Suppose at this temperature you're trying a

really large number of swaps in the placement.

So, I mean, suppose you're trying a million swaps at this temperature.

And suppose it turns out that 5,000 of those different swaps all happen to

evaluate that they change the wire length by delta l is plus 20.

Then roughly 0.14 of those swaps that you try that all have a wire length

degradation of 20, approximately 14% of them, or 0.14 x 5000,

which equals 700 of them, will be accepted. A random 700 of those 5000, which all

change the placement in exactly the same way.

They all make the placement 20 units of wire length longer.

About 700 of them will get accepted and about 4,300 of them won't.
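You can re-derive that arithmetic directly (the seed below is arbitrary, chosen just for reproducibility): exp(-20/10) = exp(-2) is about 0.135, so of 5,000 proposed swaps that each worsen the wire length by 20, you expect somewhere in the neighborhood of 14% to be accepted.

```python
import math
import random

random.seed(7)                      # arbitrary seed, for reproducibility
t, delta_l, proposals = 10.0, 20.0, 5000
p = math.exp(-delta_l / t)          # acceptance probability, exp(-2) ~ 0.135
accepted = sum(random.random() < p for _ in range(proposals))
print(round(p, 3), accepted)        # ~0.135, and roughly 700 swaps accepted
```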

And if you change the random number seed so that you get a different stream of

random numbers and you run your placer again.

You will get a different set of placement swaps proposed, and let's assume that you

again try about 5,000 of them that have delta l as 20.

You will still expect about 700 of them to be accepted, but it'll be a completely

different set of swaps with a completely different set of results.

It's a very different universe when you have probabilistic algorithms like this,

and the thing that's amazing is how well this stuff works.

So, how well does it work? And the answer is, it works great.

There is nothing else to say other than it works great.

So this is the same 2,500 gate benchmark, this is an annealer that looks very much

like the annealer pseudo-code that I showed you a couple slides ago.

The hot temperature is 800, the m number for how many moves per gate, how many

eval, swaps per gate, is 100. So we try 2500 times 100,

which is, what, 250,000 swaps per temperature.

And then we lower the temperature. This is the curve of the progress.

So the first thing I'll just say is we got about 30% less half-perimeter

wirelength total than the random iterative improvement algorithm, and

that's a gigantic number. That's a very big deal.

So let me tell you how you read an annealing curve, because this is a pretty

standard looking annealing curve. The x axis is temperature.

Okay? And the thing that's interesting is that,

it's a log scale. So what you see is the temperature is

0.1, 1, 10, 100, 1000, going from left to right.

And the other thing to remember, right? Is that you read the curve from right to

left. Because the temperature starts hot.


Right? And so I'm just sort of coloring some

squares in. Like, okay, that's a box in this grid.

That's a bounding box for a net.

What we did is we have a very simple mathematical function that says, for

every bounding box. For every net whose length we're

evaluating. We calculate a number that's sort of a

model of the likelihood that the physical wire we route that's associated with that

box wants to use every square in the grid underneath it.

And so we calculate the simple function. It's related to how big the bounding box

is and related to how many gates there are on the net that defines that bounding

box. We calculate this simple function for

every bounding box, and we take that number and we add it to the cell in the

grid. So if there are a lot of bounding boxes

representing a lot of nets covering a particular square of the grid, there's a

lot of demand for that region of space.
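Here is a sketch of how such a congestion map could be computed. This is my own illustration: the lecture only says the per-box function depends on the bounding box size and the number of gates on the net, so the pins-per-covered-cell weight below is just one plausible choice.

```python
def congestion_map(nets, positions, width, height):
    """Build a congestion heat map: for each net, take the bounding box
    of its gates and spread a simple demand estimate over every grid
    cell under the box.  The weight (pins / box area) is an illustrative
    guess at the "simple function" described in the lecture."""
    grid = [[0.0] * width for _ in range(height)]
    for net in nets:
        xs = [positions[g][0] for g in net]
        ys = [positions[g][1] for g in net]
        x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
        area = (x1 - x0 + 1) * (y1 - y0 + 1)
        demand = len(net) / area              # demand added to each covered cell
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                grid[y][x] += demand
    return grid
```

Cells covered by many overlapping bounding boxes accumulate a large total, which is exactly the red and orange in the heat map described next.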

There's potentially a lot of congestion. Right?

And congestion is bad. There's only so many wires you can put in

every space on the grid. And what we're plotting is that function

evaluated at the end of every temperature of the annealer.

And so what you're seeing at the right is the first frame of a movie of this heat

map associated with net congestion. And so what you see is, there's a whole

bunch of red, and some orange, and a lot of yellow.

That's bad. There are a lot of bounding boxes

representing a lot of nets all on top of each other, all wanting to use that

space. And what we're going to show is, in this

movie, is, what happens as the annealer actually runs.

And so I have to go back here, and turn the pen back into an arrow.

And so here's the movie of the congestion of the annealer, as the annealer sort of

reduces the wire length. Because remember, the annealer doesn't

really understand this stuff. You can put these kinds of things into

the formulation of the annealer. You don't have to anneal just the wire

length. Right?

But this particular version is just annealing the wire length and one of the

things you're seeing is that just by making the wire length better, right?

By making the gates closer together, by making the wires and their estimated

length shorter, we're getting almost all the red out.

There's a little bit of orange you can see a little bit way up on the top on the

left. We're getting most of the red out almost

all the orange out. There's a few little yellowy kind hot

spots. This is dramatically better.

What you're seeing is most of the wires are now short.

Most of the bounding boxes are now small, and there's not that many places where

there are lots and lots of bounding boxes on top of each other.

And in particular, there's not that giant hot spot on top of the chip.

This is a really pretty cool result of showing how simulated annealing can do a

really good job of sort of improving the quality of something like a

placement. I want to also do something about some

frequently asked questions, because this just happens all the time.

People see simulated annealing, they often get very excited, because it's a very

cool algorithm. And there's a bunch of questions they ask

and it's often the case that people have misconceptions.

So let me just go through these. First, does simulated annealing always

find the perfect best global optimum? No.

Gosh, we would all be out of jobs if that was true.

If you could take any combinatorial optimization problem.

Apply a simulated annealing algorithm and get a perfect answer in, you know, like

an hour or overnight? None of us in computer science would have

jobs. No.

It doesn't find the best global answer, it's just good at avoiding a lot of bad

answers. Really, very good at avoiding a lot of

bad local minimums. Second question, does annealing work on

every type of optimization problem? No.

It does work on very many, but it's not always the most efficient option.

It is often the one that people try first because it is often not very hard to

build a sort of crude version of the annealer, to just go try and see if you

can make some progress on your problem. But it's not always the right, most

efficient answer. Third one, is annealing always slow,

doing all those many swaps over many temperatures?

If you remember the previous slide, a couple of

slides ago, where I talked about the actual annealing curve?

That annealer was doing a quarter of a million swaps at every temperature, and

it was doing a lot of temperatures. No, annealing is not always slow.

There's a lot of engineering tricks to make it go fast, but yeah, for some big

problems, it can be slow. Do I have to set all those parameters, how

hot the initial temperature is, and M, how many swaps per gate at every temperature?

And there was a line in that pseudo-code that says the new temperature is 90% of

the old temperature. Do I have to set all those things by

trial and error? No.

There are fancy adaptive techniques to determine these things automatically.

There are actually lots of interesting papers on how you automatically control a

simulated annealer based on the dynamics of the problem you measure as the problem

runs itself. And finally, what happens if I run my

annealer several different times?

I mean, what if I change the random seed that starts the random number generator

every time? And the answer is, you get a different

random answer every time. And you know, probably those answers

cluster in terms of the quality of their solution.

But if you run it enough times you might find a really bad answer.

It just happens. And, you know, if you run it enough times you

might find a really good answer, one a little better than the others.

And so I'm going to let you in on a dirty little secret of all real simulated

annealing algorithms in sort of commercial use.

You never ever run it once. You always, always, always run it at

least twice. And you take the best answer.
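That practice is trivial to automate. A sketch (run_annealer here is a hypothetical stand-in for whatever annealing placer you have, assumed to return a (placement, wirelength) pair):

```python
import random

def best_of(run_annealer, seeds):
    """Run the annealer once per random seed and keep the best (lowest
    wire length) result -- the always-run-it-at-least-twice rule."""
    results = []
    for seed in seeds:
        random.seed(seed)                 # different seed, different stream
        results.append(run_annealer(seed))
    return min(results, key=lambda r: r[1])
```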

So here's my high-level summary for simulated annealing for placement.

Simulated annealing is a very famous and successful method.

It is much better than dumb random improvement, and really, it was the

dominant method for placers through most of the late 1980s and some of the

1990s. And still today there are lots of other

CAD tasks where it's very important. And it's just important to understand

what a simulated annealing placer does. But I hate to tell you this, it's not

really the way we do placers today. It's a very important technique to know

for a lot of reasons. Annealing works well: up to, oh, 100,000

gates it works pretty well; at a quarter of a million gates it's working okay.

400,000, 500,000 gates, really? Just too slow, not very efficient, and

honestly, there're some other techniques that just give better answers.

Especially if you want to do a million gates, 2 million gates, 5 million gates,

8 million gates, 10 million gates. There're just other ways of doing this.

And that's what we're going to talk about next.

So, annealing is too inefficient for things where there are a million or more

gates. We need another set of interesting ideas.

And so, that's what we're going to do next.