0:03

[SOUND] The algorithms we study this week have one common property.

This property is that they treat the decision process, be it a decision process or something else, as a black box, or, well, almost always a black box.

You do account for the fact that you have to take states and produce probabilities of actions, and so on, so there is this iterative structure to the process.

But otherwise, the assumptions about the process are not used as heavily as we will use them later in this course.

Basically, you can again think of this as a black-box family of algorithms.

So, you have this decision process here, or any process. And you have all those things: actions, and rewards from the whole trajectory in this process.

Now, the way you think of it is as some kind of box to which you feed the parameters of your policy.

Maybe the weights of a neural network which constitutes your agent's action probability distribution, or a table of probabilities for every possible state, if there is a finite amount of them.

Anything you can think of. Then this box spits out the expected reward, or just the reward averaged over one or several trajectories.

Now, since we don't actually require that much from this process, you can make the next step and assume it is a black box.

So you have a black box which takes a vector of weights. You can just draw a few inputs here, one for every respective weight or probability. It spits out one number, and you want to tune these inputs to make the output number as large as possible in expectation.

And again, the method basically does this very thing. Maybe not exactly a black box, but it is almost so.
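
The interface described above can be sketched in a few lines; this is a minimal illustration, not code from the course, and `run_episode` is a hypothetical toy stand-in for actually rolling out the policy in an environment.

```python
# A hypothetical black-box objective: it takes a flat vector of policy
# weights and returns one number, the reward averaged over a few rollouts.
def run_episode(weights):
    # Toy stand-in environment: the reward is higher the closer the
    # weights are to some unknown optimum (here, all ones).
    return -sum((w - 1.0) ** 2 for w in weights)

def black_box(weights, n_rollouts=5):
    # Average the (possibly stochastic) episode reward over several rollouts.
    return sum(run_episode(weights) for _ in range(n_rollouts)) / n_rollouts

reward = black_box([0.5, 0.9, 1.2])
```

Everything the optimizer sees is this mapping from a weight vector to a single scalar; the episode structure stays hidden inside the box.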

And the method we're going to start with right now, or, to be more accurate, the family of methods, is the so-called evolution strategies.

Now, counterintuitively, they have only a little to do with actual biological evolution, despite the name.

Now, the idea behind them is this. The first thing you have to do is define a distribution, a probability distribution, over the inputs to your black box, that is, over the parameters that produce the reward.

So if you use a table of probabilities, remember, you have to feed it one number per particular action in a particular state. So it's the number of states times the number of actions, minus one per state if you are being purely mathematical, since the probabilities in each state sum to one.

And in case you are using a neural network, say 100 neurons followed by yet another 100 neurons, then you have to store, in this case at least, 100 squared numbers, which are the weights of this neural network.

So what you do is define them via some kind of distribution, for example, a fully factorized normal distribution. Say you have 10,000 weights. Then you have 10,000 means for those respective weights, and 10,000 weight-wise variances, the sigma squareds.
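
The factorized distribution just described can be sketched as follows; this is an illustrative sketch, and the initial values for the means and sigmas are arbitrary placeholders.

```python
import random

N_WEIGHTS = 10_000

# Fully factorized normal distribution over the weights: one mean and
# one standard deviation (sqrt of the sigma squared) per weight.
means = [0.0] * N_WEIGHTS
sigmas = [0.1] * N_WEIGHTS

def sample_weights(means, sigmas):
    # Because the distribution factorizes, sampling a full weight vector
    # is just N_WEIGHTS independent one-dimensional Gaussian draws.
    return [random.gauss(m, s) for m, s in zip(means, sigmas)]

candidate = sample_weights(means, sigmas)
```

So the distribution is specified by 2 × 10,000 numbers in total, and drawing one candidate weight vector is one independent Gaussian sample per weight.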

Â 3:27

You could, of course, use any other distribution: not the fully factorized normal distribution with weight-wise variances, but, say, an actual multivariate normal.

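
As a preview of how such a distribution gets used, here is a minimal hill-climbing sketch in the spirit of evolution strategies: sample candidate weight vectors around the current means, evaluate each through the black box, and keep the best. This is a generic toy version, not the exact algorithm from the course, and the toy objective is a made-up stand-in.

```python
import random

def evolution_strategy(black_box, n_weights, n_candidates=50,
                       n_iters=100, sigma=0.1):
    # Minimal sketch: keep a mean weight vector, sample a population of
    # candidates around it, and move the mean to the best candidate found.
    # Real evolution strategies use smarter, gradient-like updates.
    means = [0.0] * n_weights
    best_reward = black_box(means)
    for _ in range(n_iters):
        for _ in range(n_candidates):
            candidate = [m + random.gauss(0.0, sigma) for m in means]
            reward = black_box(candidate)
            if reward > best_reward:
                best_reward, means = reward, candidate
    return means, best_reward

# Toy black box: maximized when all weights equal one.
toy = lambda ws: -sum((w - 1.0) ** 2 for w in ws)
found, reward = evolution_strategy(toy, n_weights=3)
```

Note that nothing here looks inside the process: the optimizer only ever sees weight vectors going in and scalar rewards coming out, which is exactly the black-box view described above.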
