0:01

The second viewpoint on DPC is the game. Just like "optimization," "game" can be used as a common English word, but in this case we have an unambiguous, formal definition of a game. Now, power control clearly is a competition among the different transceiver pairs, and we can model competition, as well as cooperation, with games.

So the formal language of games that we will be looking at extends our intuitive understanding: we have a set of players, the transceiver pairs in this particular case, and each player has a certain strategy set, a set of possible strategies it can play. Then there is a certain payoff function that says: if you play a certain strategy and the other players play other strategies, these will lead to a particular reward, or a cost, coming to you.

So we define a game formally as a 3-tuple, that is, three things. First, very simple, is the set of players, indexed by i = 1, 2, ..., N. Second, for each player i, we have a certain strategy set A_i. There are many possible elements living in this set, and each element is a possible strategy. For example, in soccer, if I have a goal here and a goalkeeper thinking, should I go left, should I go right to catch the ball, that would be a strategy set with two strategies. Third, there is a payoff function, which usually maps an element of the set to a particular real number, say 5 or -3 and so on. If these numbers model something desirable, it is called a payoff function, or in general a utility function, as we will see in future chapters.
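As a minimal sketch of this 3-tuple in Python: the player names and the function name `goalkeeper_payoff` are just illustrative, and the payoff numbers -5 and 10 are the ones used for the goalkeeper later in this lecture.

```python
# A finite game as a 3-tuple: players, strategy sets, payoff functions.
# Illustrative sketch using the soccer example from the lecture.

players = ["kicker", "goalkeeper"]

# Each player's strategy set A_i.
strategies = {
    "kicker": ["left", "right"],
    "goalkeeper": ["left", "right"],
}

# Payoff function for the goalkeeper: maps a joint strategy choice
# (one element from each player's set) to a real number.
def goalkeeper_payoff(kick, dive):
    return 10 if kick == dive else -5  # catch: reward 10; miss: cost 5

print(goalkeeper_payoff("left", "left"))   # goalkeeper catches the ball
print(goalkeeper_payoff("left", "right"))  # goalkeeper misses
```

Note that the goalkeeper's payoff depends on the kicker's choice as well as his own; that coupling is exactly what makes this a game.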

If these numbers represent something bad, an undesirable outcome, then it is a cost function. So, of course, each player would like to maximize its own payoff function, or minimize its own cost function, varying only the strategy that is in its own control, that is, picking a particular point in its own strategy set. For example, the goalkeeper goes left

Â 2:49

and the ball ends up on the other side, then you lose, and that may be, say, a cost of -5. Whereas if he catches the ball, he gets a reward of, say, 10. So that is what the payoff function does: it maps the strategy that you choose to a number. That number is determined in part by your own decision, but also in part by whoever is kicking the ball. There comes the coupling of everybody's strategies: your strategy and her strategy together determine the outcome. Later, we will come back to our

power control problem and view it as a game. But first, let's take a look at two simple but very famous examples. Perhaps by far the most famous example in simple game theory is the so-called Prisoner's Dilemma. It is a two-user game modeling competition, and each user has two possible strategies. So we can plot this in a two-dimensional table, on a piece of paper with two rows and two columns. Let's say the two rows represent the two strategies for player A, and the two columns represent the two strategies for player B. Players A and B are suspects in a crime, and the police are going to investigate the crime and question each of them individually. They will come to know the entries of this table later on, but they cannot communicate, because they are being questioned separately. Let's say each of them has two choices: do not confess, or confess. So there are altogether four possible outcomes.

For each of these outcomes, jointly determined by A's strategy and B's strategy, there is a pair of numbers: the first number denotes the payoff to A, and the second the payoff to B. If they both decide not to confess, the police can only charge them with a lesser crime, and they will each get, say, one year in prison. If they both confess, they will be sent to prison for longer terms, say three years each. Whereas if one confesses and the other does not, the one who confesses will be rewarded for cooperating with the investigation and walk away free, and the other will be left with the most severe penalty, say five years in prison. This is a symmetric game, so the remaining cell is filled out in the same way: if B confesses and A does not, then B walks away with zero years and A gets five years. So this defines the Prisoner's Dilemma game: there are two players, each with a strategy set consisting of two possible strategies, and the table quantifies the payoff to each player, jointly determined by their respective strategies. So let's take a look at this famous Prisoner's Dilemma. Why is it called a dilemma?

Imagine you are suspect A, and you say: I don't know what B will do. But suppose she chooses not to confess; then what should I do? Well, if I confess, my payoff is 0 rather than -1, so I should pick confess. We call this the best response strategy of player A in response to player B picking no confession. But B may also choose to confess, I don't know, and in that case what should I do? Well, I am basically comparing -5 and -3 here, and -3 is the better of the two, so confess is still the better choice. Therefore, confess is also the best response strategy if B chooses to confess. Now, B can only choose between two choices, so A would say: hey, I have the same best response strategy no matter what. That is called a dominant strategy: a strategy that is the best response to all possible strategies of the other players; in this case, it is confess. By symmetry, B is thinking in exactly the same way, and B will also pick confess as the dominant strategy. Therefore, the outcome of this game is that both A and B confess and get three years each. What's wrong with that? Why is it called a dilemma? Because they could have gotten one year each. If you think about the summation of the payoffs across the players, then -2 would be much better than -6; except A and B cannot collaborate or communicate, so in their individual strategizing they end up picking a socially suboptimal solution. And that's why it's called a dilemma.
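The best-response reasoning above can be checked mechanically. A small sketch in Python (the dictionary layout and names are mine; the payoffs are the ones from the table just described, written in negated years of prison so that larger is better):

```python
# Prisoner's Dilemma payoffs as (payoff_A, payoff_B), in -years of prison.
payoff = {
    ("no confess", "no confess"): (-1, -1),
    ("no confess", "confess"):    (-5,  0),
    ("confess",    "no confess"): ( 0, -5),
    ("confess",    "confess"):    (-3, -3),
}
strategies = ["no confess", "confess"]

def best_response_A(b):
    """A's best response when B plays strategy b."""
    return max(strategies, key=lambda a: payoff[(a, b)][0])

# A's best response is 'confess' no matter what B does: a dominant strategy.
print({b: best_response_A(b) for b in strategies})
# Both players therefore confess, for a payoff sum of -6, even though
# (no confess, no confess) would have summed to -2.
```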

Â 8:53

So, in what we just talked about, we have defined the best response strategy in a game and the dominant strategy in a game. Now, there may not be a dominant strategy, as we'll see right away in another famous example, called the coordination game. Suppose there are two users again: you and your friend. You might decide that maybe you should go watch a movie, and let's say there are two kinds of movies, action and romance. So again, each player has two choices in the strategy set, action movie versus romance movie: for you, player A, and also for player B, two strategies, action or romance. Now, suppose you two decide on different kinds of movies. Then there is no agreement, you decide not to watch a movie, and you both get zero payoff. But suppose you both pick the same kind; then you each get some strictly positive return, except the return values are different for you and for your friend. Say you prefer action, so if you two both agree on the action movie, then you get two units of reward and B only gets one unit. Conversely, let's say your friend B likes romance movies more than action movies, so if the outcome is to go see the romance movie, then B gets two and you get one. So obviously, you both have the incentive to try to reach some kind of consensus. But you still would like to compete, because depending on what kind of consensus it is, you will be getting different payoffs. So again, now I have defined a game, the

coordination game. The set of players is A and B; for each user there is a strategy set with two possible strategies; and the payoff functions map (action, action) to two for A and one for B, map (romance, romance) to one for A and two for B, and map (romance, action) or (action, romance) to zero for both players. That's the payoff function, so now we have completely specified this game. Let's understand the behavior here. Now, clearly there are best response strategies for A. If B chooses action, then A's best response is action, because two is bigger than zero. But if B chooses romance, then A's best response strategy is romance, because one is bigger than zero. So now you see that there is no dominant strategy, because the best responses to B's different choices are different. Similarly, B does not have a dominant strategy either. However, we see that their best response strategies actually match. What do I mean by match? Well, if A chooses action, then the best response by B with respect to that is action. Similarly, if B chooses action, then the best response with respect to that, for A, is also action. So the (action, action) pair of strategy choices by A and B is one that matches. Same story for (romance, romance).

So you see that these two best responses by A are actually matching the two corresponding best responses by B. In general, we can define what we call the Nash equilibrium. Basically, it is an equilibrium that describes strategic players' unilateral lack of incentive to move away from a choice. Notice that this word, equilibrium, is not the same as the equilibrium we saw earlier, which was used to describe the convergence of an iterative algorithm to a particular point. In fact, we'll later see yet another notion of equilibrium. So, more precisely, let's say that there are two users: user one has utility function u1 and strategy set script A, and user two has utility function u2 and strategy set script B.

Â 13:49

Now, we say that a particular pair of points, a* in set A and b* in set B, is called a Nash equilibrium if the following conditions are true. First, user one's utility at the point (a*, b*) is no smaller than her utility when user two sticks to b* and user one is free to pick any a in her set of possible strategies, that is, u1(a*, b*) >= u1(a, b*) for all a in A. This simply says: hey, if the other player doesn't move away from b*, I have no incentive to move away from a*. And, that's the key word, and, at the same time, the other user feels the same way: u2 at (a*, b*) is no smaller than user two's utility if a* stays put and b is free to be chosen anywhere in the set, that is, u2(a*, b*) >= u2(a*, b) for all b in B. This b can be any point in the set; it doesn't matter. What it says is that user two, again, has no unilateral incentive to move away from b*: if the other player sticks to a*, I'm going to stick to b*. And the key thing is that both statements must be true for the single point (a*, b*). That's what we mean by matching best responses. If so, we call this point (a*, b*) a Nash equilibrium: there is no unilateral incentive to change my strategy away from (a*, b*).

So, in this coordination game, there are two Nash equilibria. (Action, action) is a Nash equilibrium, because neither A nor B has a unilateral incentive to move away from this choice, and there is another Nash equilibrium, (romance, romance). And back in the Prisoner's Dilemma, (confess, confess) is the unique Nash equilibrium. Even though both players realize it would be better to be at (no confess, no confess), there is no unilateral, that's the key word, incentive to move away from (confess, confess). Basically, A says: if you don't move, I don't move. B says: well, if you don't move, I don't move. And they are both stuck in this socially suboptimal, unique Nash equilibrium. So we see that Nash equilibria may not exist, although in both examples we just saw they do exist. They may not be unique if they do exist; we saw an example with two Nash equilibria. And none of the Nash equilibria may be socially optimal, that is, maximize the summation of payoffs, or even be Pareto optimal. However, in very fundamental research in his PhD dissertation at Princeton, John Nash showed that as long as you allow each player to flip a coin and probabilistically decide which strategy to choose, then in that so-called mixed strategy scenario, there will always exist at least one Nash equilibrium.
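As an aside not derived in the lecture: for the coordination game above, a mixed Nash equilibrium can be found by the standard indifference argument, where each player randomizes so that the other is exactly indifferent between her two pure strategies. A small check:

```python
from fractions import Fraction

# Mixed-strategy sketch for the coordination game (indifference argument,
# not derived in the lecture). Payoffs: (action, action) -> (2, 1),
# (romance, romance) -> (1, 2), mismatches -> (0, 0).
p = Fraction(2, 3)  # prob. that A picks action: solves 1*p = 2*(1 - p)
q = Fraction(1, 3)  # prob. that B picks action: solves 2*q = 1*(1 - q)

# A's expected payoff is the same for both pure strategies, so A has no
# incentive to deviate; similarly for B. That makes (p, q) a mixed Nash
# equilibrium, illustrating Nash's existence result.
uA_action  = 2 * q        # A plays action against B's mix
uA_romance = 1 * (1 - q)  # A plays romance against B's mix
uB_action  = 1 * p
uB_romance = 2 * (1 - p)
print(uA_action == uA_romance, uB_action == uB_romance)  # True True
```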

Â 17:46

Now, back to our power control problem. Distributed power control is a competition, and competition can be modeled as a game, so what kind of game are we talking about here? Well, let's identify the set of players first: those are the logical links, or the transceiver pairs at the ends of those links. Next, let's identify the strategy space. Here it becomes a little more complicated. In the two examples we just saw, the strategy space is pretty simple, just two choices. But here it is a continuum: the set of power levels for each user i such that the target SIR gamma_i is achieved. Now, as you can see, the strategy space so defined, a set of powers P_i, is determined in part by what the other users' powers might be, because those determine how easy or hard it is for you to achieve the target SIR. So it is through the coupling of the strategy spaces, instead of the coupling of the payoff functions, that we see the interference phenomenon reflected in the power control game, because the payoff function here is very simple: you basically look at a cost function, namely the transmit power P_i itself, and you want to make it small. So, that is the definition of the game. And here are two simple but important facts. Fact number one, which you can easily verify for yourself, concerns the DPC algorithm. What is the algorithm again? It says that the transmit power of each transmitter i at time t + 1 is the current transmit power times the ratio between the target SIR and the current SIR at time t, that is, p_i(t + 1) = (gamma_i / SIR_i(t)) * p_i(t), and this goes on for all the users. This is actually the best response strategy in response to all the other players' strategies. Whatever they might be, they are reflected through the SIR value.
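The best-response update can be simulated in a few lines, assuming the standard SIR expression (direct gain times own power, divided by interference plus noise) used earlier in the course; the gain, noise, and target numbers below are made up for illustration:

```python
# Best-response sketch of the DPC update from the lecture:
#     p_i(t+1) = (gamma_i / SIR_i(t)) * p_i(t)
# on a small, made-up 2-link system.
G = [[1.0, 0.1],
     [0.1, 1.0]]       # G[i][j]: channel gain from transmitter j to receiver i
noise = [0.1, 0.1]
gamma = [2.0, 2.5]     # target SIRs for the two links
p = [1.0, 1.0]         # initial transmit powers

def sir(p, i):
    interference = sum(G[i][j] * p[j] for j in range(len(p)) if j != i)
    return G[i][i] * p[i] / (interference + noise[i])

# Each link repeatedly best-responds to the others' current powers.
for _ in range(100):
    p = [gamma[i] / sir(p, i) * p[i] for i in range(len(p))]

print([round(sir(p, i), 3) for i in range(len(p))])  # → [2.0, 2.5]
```

On this feasible example, every link ends up exactly at its target SIR, which is the fixed point of the best-response iteration.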
