0:00

In this video, we're going to look at how to define utility in infinitely repeated games. So, remember that the way an infinitely repeated game works is that we have some stage game, which is a normal-form game, and the players repeatedly play that same game over and over again. What that means is that each player gets a sequence of payoffs: let's say player i gets the payoff r1 in the first repetition, r2 in the second repetition, r3 in the third, and so on, infinitely. So we have this infinite sequence of real values, which are the payoffs that this player has received. But if we want to reason about this game, we can't really reason with utility theory about an infinite sequence. Instead, we have to take this sequence and turn it into a single number that describes the utility the player has for having played this sequence. So how do we do that? What's the right way of thinking about it?

So, the first thing to notice is that the things we've learned about game theory so far aren't going to be sufficient to answer this question. You might wonder whether we can take this infinitely repeated game and just write it in extensive form. Of course, we can't, and the reason is that the extensive form would be infinitely deep: we would never reach a leaf node where we could write down a payoff. So that won't help us. You might also wonder whether we can sum up the sequence of payoffs and just say that my utility is the sum of those values. The problem is that the sum can be unbounded. If, for example, every payoff I get is positive, then I'm going to have an unbounded amount of utility at the end. So that's not going to work for me; I want to have finite utilities.

So, instead there are two canonical ways that this gets defined, and I'll tell you about both of them in this video. Here's the first one. The first definition says: let me look, intuitively, at my average payoff over the sequence. Now, the average payoff of an infinite sequence is also not well defined, because, of course, the way I take an average is to sum everything up and then divide by the number of things. We've already seen that the sum could be unbounded, and the number of things is unbounded as well, so I would have infinity divided by infinity, which wouldn't help me out here. But what I can do instead is look at the limit of finite averages as the averages get longer and longer. That is, let me look at an average over the first k things in my sequence, and then take the limit of that average as k goes to infinity. It turns out that, technically, this isn't always well defined. It's almost always well defined, and there's an easy fix we can make to the definition that I've left out here to keep things from getting too technical. But in cases where this is well defined, it's the right thing to do, and everything we'll talk about in this course will be well defined. So this is defined as the average reward that the player gets over the infinite sequence, and it gives us one number.
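
Written out (as a sketch, using r_t for the payoff in repetition t, matching the payoff sequence r1, r2, r3, ... above rather than any particular slide notation), the average-reward definition being described is:

\[
\bar{u} \;=\; \lim_{k \to \infty} \frac{1}{k} \sum_{t=1}^{k} r_t
\]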

The reason we have a second definition is that there's something kind of counterintuitive about the average reward. Let me put it back up. The reason it's counterintuitive is that if I get some bad payoff for a finite amount of time, let's say a payoff of negative one million for each of the first 100,000 iterations, and then for the rest of time after that I get some good payoff, say one unit of utility, then the limit of the means would be one. The negative payoff that I got at the beginning lasts only a finite amount of time, so it washes out in the average if I go far enough out into the future.
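
As a rough check of that claim with the numbers in this example, for k larger than 100,000 the running average is

\[
\frac{1}{k}\Bigl(100{,}000 \cdot (-1{,}000{,}000) + (k - 100{,}000) \cdot 1\Bigr)
\;=\; 1 - \frac{100{,}000 \cdot 1{,}000{,}001}{k}
\;\longrightarrow\; 1 \quad \text{as } k \to \infty .
\]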

Well, that's what the math says. But that doesn't always model what we really want to reason about, because we have an intuition that payoffs you get early on are more important than payoffs you get really far into the future. So, if we want to have a model of utility that has that property, we need to say that different payoffs matter differently: it's more important to me to get a good payoff in the first iteration than to get one in the millionth iteration. The way I can model that is by saying that my payoffs are multiplied by some discount factor, which describes my value for payoffs at different times. My discount factor, beta, is some value strictly between 0 and 1, and you can think of it like an interest rate. With money, if I told somebody that I'm going to pay them $100 in a year, they would value that at less than $100 today, and the amount by which they would value it less today corresponds to the interest rate. That's essentially what's going on in the math here.

So what I'm saying here is that my utility for this stream of payoffs, this stream of r's, is weighted by the discount factor raised to the power of that payoff's position in the sequence. I'm going to discount each payoff successively: the first one has the discount factor applied once, the second one has the discount factor applied twice, so it gets the discount factor squared, and so on, all the way through the sequence. Each payoff is diminished, but each of them still matters. And there are two ways we can think about what the discount factor means. The first is the interpretation I've been giving you so far: the agent just cares more about the near term than the long term.
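
Written out under the indexing just described (where even the first payoff gets the discount factor applied once), the discounted utility for the payoff sequence r1, r2, r3, ... would look something like:

\[
u \;=\; \sum_{t=1}^{\infty} \beta^{t} r_t \;=\; \beta r_1 + \beta^{2} r_2 + \beta^{3} r_3 + \cdots
\]

Other presentations index the sum so that the first payoff gets weight \(\beta^{0}=1\) instead; either way, because 0 < beta < 1 and the stage-game payoffs are bounded, the sum is finite, which is exactly the property we wanted.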

There's another definition which is different but mathematically the same, so it's interesting to think about. And that is that the agent really is the agent we just talked about in the average-reward case, who cares just as much about every payoff, but with some probability, actually 1 minus beta, the game will end in any given round. So our game is not necessarily infinitely repeated; it's potentially infinitely repeated. Every time we play the game, we're going to flip a coin: with probability 1 minus beta the game just ends, and with probability beta the game continues. What that means is that here we'd be talking about my expected reward in the game, because there's a beta chance that I'll go to the next round, a beta squared chance that I'll go two rounds forward, a beta cubed chance that I'll go three rounds forward, and so on. So that means my expected utility in this game would just be the same formula.
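
As a rough sketch of why the two views coincide: if, as just described, the chance of surviving long enough to receive the t-th payoff is beta to the power t, then by linearity of expectation

\[
\mathbb{E}[\text{total payoff}] \;=\; \sum_{t=1}^{\infty} \Pr(\text{round } t \text{ is reached}) \cdot r_t \;=\; \sum_{t=1}^{\infty} \beta^{t} r_t ,
\]

which is exactly the discounted utility written above.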

And that's it for defining utility in these games. Thanks very much.
