So, today we're going to continue our adventure in computer architecture and
talk more about parallel computer architecture.
Last time we talked about coherence: memory coherence, and cache coherence
systems. And we differentiated that from memory consistency models, which are
models of how memory is supposed to work, versus the underlying algorithms that
try to keep memory consistent and implement those consistency models.
We left off last time talking about MESI, also known as the
Illinois protocol, and we walked through all of the
different arcs through here. If you recall, what we did was
split the shared state from the MSI protocol into two states,
shared and exclusive. And the insight here is that it's very common
for programs to read a memory address, which will pull it into your cache,
and then go modify that same memory address. So for instance, if you want to
increment a number, you're going to do a load.
That's going to bring the value into your register set,
but also into your cache. You're going to increment the
number, and then you do a write back to the exact same location.
That's pretty common in imperative programming languages.
Declarative programming languages like Scheme and such may at times copy
everything, but in imperative programming languages it's pretty common
to actually change state in place.
So, because of that, you can bring the line right into this exclusive state, and
then when you go to modify it, you don't have to go and broadcast on the
bus. You don't have to talk to
anybody: you effectively lose this intent-to-write message
that you would otherwise have to send across the bus, waiting for that
address to be snooped on the bus, or be seen by all the other entities on the
bus. Note that I say entities on the bus.
We've been talking primarily about processors the last day,
but there can be other entities on the bus that want to snoop the bus.
So examples sometimes include coherent IO devices.
This isn't very popular right now,
but I think it will become much more popular as soon as we start to have GPUs,
or graphics processing units (general-purpose GPUs),
which will be sitting, effectively, very close to our processor on the same bus,
and will want to take part in the coherence traffic of the processor.
The GPU is going to want to basically read and write the same memory addresses
that the processor is reading and writing,
and take part in the cache coherence protocol.
[COUGH] At a minimum, your IO devices usually need to effectively tell the
processor when they're doing a memory transaction that the processor should
know about. So typically, when you are moving data
from an IO device to main memory,
that transaction is going to have to go across the bus,
and all the caches will have to snoop
that memory traffic from the IO device and invalidate their copies if needed.
So we had talked about MESI as an
enhancement to MSI. Well, we left off last time, and we were
going to talk about two more enhancements that are pretty common.
One has been used widely in AMD Opterons;
my understanding is that AMD still uses something similar.
And the idea is you add an extra state here,
which is called ownership, or the owned state.
And effectively, it looks just like our MESI protocol from
before. But now, say you have data in the
modified state, and another processor needs to go access that
data. Normally, you would have to send all that data back to main memory,
invalidate that line out to main memory, and go fetch it back from main memory.
Instead, you can do direct cache to cache transfer.
This is basically an optimization here.
You don't have to write the data back to main memory,
and in fact you can allow main memory to be stale.
You can just transfer the data across the bus from the one cache to the cache
which needs it. So in this example here, we're going to
look at this edge here. Another processor wants to read the
data. So we see an intent to read for a
particular cache line, and our processor currently has it in the modified state.
We see this other processor's intent to read,
[COUGH] and we're actually going to provide the data out of our cache,
and not write it back to main memory, and transition the line in our cache to
this owned state. The other processors can now take it in the shared state,
so they will have a read-only copy.
Now, note this is only for read-only access;
we'll talk about another processor wanting to write in
a second. So we have it in the owned state, and what
we're trying to do here is have this processor track that the data needs to be
written back to main memory at some point.
That's the whole purpose of this state here:
we've basically designated a processor which owns the data and owns the modified
state. The processors which take a read-only copy get it in the shared state,
and if they need to invalidate the line, they don't need to contact anybody.
Because they have it in the shared state, a read-only copy,
they don't need to make any bus transactions.
[COUGH] So think about what happens if one
core reads this dirty state from another core,
and then at some point the line just gets invalidated in that second
core. If the data is not up to date in main memory, you lose the changes.
So, by keeping the line in the owned state, one processor keeps track that at
some point, when the line is finally evicted from that processor's cache, it needs
to write that data out to main memory, to keep it up to date.
Now, there are a couple of other arcs here.
You can transition from the owned state
back to the modified state if the processor which has the line in the owned
state wants to do a write. [COUGH]
It can't do that while it's in the owned state,
because while it's in the owned state,
other processors may have shared copies of the line.
So when P1 wants to do a write here, it needs to invalidate everyone
else's copies across the bus. It's going to send an intent
to write for that line, and everyone else will snoop that
traffic and transition to the invalid state.
And then this processor will be able to transition to the modified state,
and now it's able to actually modify the data.
Okay. So we've got this arc here, which we sort
of already talked about: if you're in the owned state, anyone else
can get a read-only shared copy. [COUGH]
They can't go get an exclusive copy, because that would basically violate this
notion; they would then be able to upgrade to modified without telling
anybody, and we don't want that. But they can get shared, read-only copies
of the data. And then there's this arc here from owned to invalid, for when some
other processor wants to write the data.
Processor P1 here will see the intent to write from
another processor. It will snoop that traffic, effectively,
and at that point it will transition to the invalid state.
Note here that on this intent to write, we may need to provide the data across
the bus while we're in the owned state.
Because if we're the only owner, the only cache that has
that data, and the other processor is
basically going straight into the modified state via a write miss, we're going
to need to provide the data.
Okay, so, questions about MOESI?
So far it's a basic extra optimization, because we
don't have to go out to main memory;
we can basically transfer data around between caches.
One cache can have a cache line in the owned state,
and later some other cache can have the exact same cache line in the owned
state, and the line can basically bounce around
without ever having to go out to main memory.
And this decreases our bandwidth out to the main memory system.