0:00

This video will be really short, even though I will talk in it

about such interesting topics that

each one of them would probably deserve a full lecture,

if not a course on its own.

But we have to draw a line somewhere in this course,

so I thought that if we can't spend more time on these interesting topics

then at least we can indicate them for your future reference.

So, as we said a few times in this specialization,

the general class of problems that can be addressed

using methods of reinforcement learning is extremely wide.

In particular, many tasks of

quantitative trading can be reduced to reinforcement learning.

In the course on reinforcement learning,

we talked about using RL for option pricing and stock portfolio optimization.

In this week, we talked more about

limit order book and the dealer problem of optimal execution.

In addition to the market investor and dealer problems,

there is also a very interesting class of problems related to modeling market makers.

Their optimization problem is different from those of the investors and

the dealers, but they are also amenable to methods of reinforcement learning.

Those of you who are interested in this topic can

start with a paper by M. Dixon on using

reinforcement learning for this task, and you can

also play with his code, which relies on trading-gym,

which in turn is built on top of the OpenAI Gym library for reinforcement learning.
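
To make the Gym-style setup concrete, here is a minimal sketch of what a market-making environment could look like. This is an illustrative toy, not Dixon's actual code: the state, action space, and fill model are all simplified assumptions, but the reset/step interface follows the classic Gym convention that trading-gym builds on.

```python
import random

class MarketMakerEnv:
    """Toy market-making environment with a Gym-style reset()/step() API.

    Illustrative sketch only: the state (mid price, inventory), the action
    (half-spread in ticks), and the fill model are simplified assumptions.
    """

    def __init__(self, horizon=100, tick=0.01, seed=0):
        self.horizon = horizon
        self.tick = tick
        self.rng = random.Random(seed)

    def reset(self):
        self.t = 0
        self.mid = 100.0       # mid price
        self.inventory = 0     # signed inventory of the market maker
        self.cash = 0.0
        return (self.mid, self.inventory)

    def step(self, action):
        # action = half-spread, in ticks, at which we quote around the mid
        half_spread = max(1, int(action)) * self.tick
        bid, ask = self.mid - half_spread, self.mid + half_spread

        # Simplistic fill model: tighter quotes get filled more often
        fill_prob = max(0.0, 0.5 - 100.0 * half_spread / self.mid)
        if self.rng.random() < fill_prob:   # our bid is hit -> we buy
            self.inventory += 1
            self.cash -= bid
        if self.rng.random() < fill_prob:   # our ask is lifted -> we sell
            self.inventory -= 1
            self.cash += ask

        # Mid price follows a simple random walk
        self.mid += self.rng.choice([-1, 1]) * self.tick
        self.t += 1

        # Reward: mark-to-market P&L, penalized for holding inventory risk
        pnl = self.cash + self.inventory * self.mid
        reward = pnl - 0.01 * self.inventory ** 2
        done = self.t >= self.horizon
        return (self.mid, self.inventory), reward, done, {}
```

Any RL agent written against the Gym API (observe, act, receive reward) can then be trained on such an environment; the market maker's inventory penalty is what distinguishes its objective from the investor's and the dealer's.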

Another interesting potential application of RL is

credit management of loan portfolios in peer-to-peer, or P2P, lending.

Platforms that match online lenders and borrowers such as Lending Club,

Prosper, or OnDeck have become very popular in recent years.

2:34

It's no longer a mom-and-pop business, as these activities are now

funded by very serious players such as hedge funds and banks.

Questions of optimal portfolio management of

loan portfolios are therefore quite important for these players.

The main topic is always more or less the same,

which is optimization of risk-return profile

in the context of sequential investment decision-making.

Reinforcement learning can be very useful for these tasks as

well, as we discussed at length in our courses.

For P2P portfolios, portfolio optimization can be formulated as

convex optimization with constraints, in a similar way to

how we did it with stocks, but there are also some differences.

For example, you can only go long in these markets, and returns are computed

differently, but the mathematical and modeling approach

is still largely the same as for stocks.
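
As a small sketch of that "same approach with extra constraints" idea, here is a long-only mean-variance optimization solved by projected gradient ascent. The expected returns and covariance for three hypothetical loan grades are made-up numbers for illustration; in practice they would come from credit models.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1},
    which encodes the long-only, fully-invested constraints."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = idx[u + (1.0 - css) / idx > 0][-1]
    theta = (css[rho - 1] - 1.0) / rho
    return np.maximum(v - theta, 0.0)

def long_only_mean_variance(mu, sigma, risk_aversion=1.0, lr=0.01, n_iter=2000):
    """Maximize mu'w - lambda * w' Sigma w over the long-only simplex
    by projected gradient ascent. Illustrative sketch, not production code."""
    w = np.ones(len(mu)) / len(mu)
    for _ in range(n_iter):
        grad = mu - 2.0 * risk_aversion * sigma @ w
        w = project_to_simplex(w + lr * grad)
    return w

# Hypothetical expected returns and covariance for three loan grades
mu = np.array([0.05, 0.07, 0.10])
sigma = np.array([[0.02, 0.00, 0.00],
                  [0.00, 0.05, 0.01],
                  [0.00, 0.01, 0.10]])
w = long_only_mean_variance(mu, sigma, risk_aversion=2.0)
```

The only change versus the unconstrained stock case is the projection step, which replaces weights that could go negative (short positions) with the nearest feasible long-only allocation.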

Applications of RL for P2P lending are currently an area of active research.

Another direction I wanted to mention here would be

potential applications to cryptocurrency trading.

I believe that if you ended up taking this course, chances are that

you have already heard of or are familiar with cryptocurrencies.

Bitcoin and Ethereum are the most famous of them, but there are many others:

more than 100 competing cryptocurrencies

are available for both investment and analysis.

In 2017, the total volume of cryptocurrency markets in the US was about $120 billion,

out of which about $40 billion was in Bitcoin.

So, cryptocurrency markets are similar in many ways to conventional

financial markets, but there are also differences, mostly due to very high volatility,

absence of regulation, and vulnerability of investments.

As one example of using RL for optimal management of cryptocurrencies portfolios,

you can take a look at the paper referenced on this slide.

This paper combines reinforcement learning with recurrent neural networks and LSTMs

for learning a state representation and reducing the dimensionality of the state space.
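
To illustrate the idea of an LSTM-based state representation, here is a minimal single LSTM cell written in NumPy. This is not the referenced paper's architecture: it is a bare-bones sketch showing how a long window of multi-asset returns can be compressed into a small hidden vector that an RL agent could use as its state. The weights here are random; in practice they would be trained jointly with the RL objective.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """Minimal single-layer LSTM cell (forward pass only).

    Illustrative sketch: compresses a sequence of market observations
    into a low-dimensional hidden state h_t, reducing the dimensionality
    of the RL state space. Weights are random, not trained.
    """

    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        # One stacked weight matrix for the input, forget, cell, output gates
        self.W = rng.normal(0.0, 0.1, (4 * n_hidden, n_in + n_hidden))
        self.b = np.zeros(4 * n_hidden)
        self.n_hidden = n_hidden

    def forward(self, sequence):
        h = np.zeros(self.n_hidden)  # hidden state
        c = np.zeros(self.n_hidden)  # cell state
        for x in sequence:
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f, g, o = np.split(z, 4)
            i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
            c = f * c + i * np.tanh(g)   # gated cell update
            h = o * np.tanh(c)           # gated output
        return h  # compressed state representation for the RL agent

# Encode a 50-day window of returns on 10 coins into a 4-dim state
returns = np.random.default_rng(1).normal(0.0, 0.02, (50, 10))
cell = LSTMCell(n_in=10, n_hidden=4)
state = cell.forward(returns)
```

The point is the shape change: a 50-by-10 observation window becomes a 4-dimensional state vector, which is far easier for an RL portfolio policy to work with.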

There are probably other papers out there that do similar research.

Finally, more on the theoretical side,

I would like to mention a few interesting topics of perception-action cycles.

One of the very first diagrams in this course was this one.

We talked about differences between perception tasks and action tasks and how

supervised and unsupervised learning solve

perception tasks while reinforcement learning solves action tasks.

There exists a very interesting body of work on

information-theory-based approaches, where perception and actions

are integrated together into what are called perception-action cycles.

There are some references for you there if you want to follow up

and explore this interesting research, and I believe that

it has lots of potential, as it allows us to bring

the feature selection problem directly into an action optimization task.

For example, in our toy model presented in the third week of this course,

signals C_t are supposed to be found from an independent alpha research analysis.