About this Course
74,234 recent views

100% online

Start instantly and learn on your own schedule.

Flexible deadlines

Reset deadlines in accordance with your schedule.

Advanced Level

Approx. 40 hours to complete

Suggested: 6 weeks of study, 3-6 hours/week for the base track, 6-9 with all the horrors of the honors section...


Subtitles: English, Korean

Syllabus - What you will learn from this course

5 hours to complete

Intro: why should I care?

In this module we are going to define and "taste" what reinforcement learning is about. We'll also learn one simple algorithm that can solve reinforcement learning problems with embarrassing efficiency.

13 videos (Total 84 min), 8 readings, 3 quizzes
13 videos
Reinforcement learning vs all (3m)
Multi-armed bandit (4m)
Decision process & applications (6m)
Markov Decision Process (5m)
Crossentropy method (9m)
Approximate crossentropy method (5m)
More on approximate crossentropy method (6m)
Evolution strategies: core idea (6m)
Evolution strategies: math problems (5m)
Evolution strategies: log-derivative trick (8m)
Evolution strategies: duct tape (6m)
Blackbox optimization: drawbacks (4m)
8 readings
What you're getting into (1m)
Setting up course environment (10m)
Note: this course vs github course (1m)
Lecture slides (10m)
Course teaser placeholder (10m)
About honors track (1m)
3 hours to complete

At the heart of RL: Dynamic Programming

This week we'll consider the reinforcement learning formalisms in a more rigorous, mathematical way. You'll learn how to effectively compute the return your agent gets for a particular action, and how to pick the best actions based on that return.

5 videos (Total 54 min), 2 readings, 4 quizzes
5 videos
State and Action Value Functions (13m)
Measuring Policy Optimality (6m)
Policy: evaluation & improvement (10m)
Policy and value iteration (8m)
2 readings
Advanced Reward Design (10m)
Discrete Stochastic Dynamic Programming (10m)
3 practice exercises
Reward design (8m)
Optimality in RL (10m)
Policy Iteration (14m)
5 hours to complete

Model-free methods

This week we'll find out how to apply last week's ideas to real-world problems: ones where you don't have a perfect model of your environment.

6 videos (Total 47 min), 1 reading, 4 quizzes
6 videos
Monte-Carlo & Temporal Difference; Q-learning (8m)
Exploration vs Exploitation (8m)
Footnote: Monte-Carlo vs Temporal Difference (2m)
Accounting for exploration. Expected Value SARSA. (11m)
On-policy vs off-policy; Experience replay (7m)
1 reading
1 practice exercise
Model-free reinforcement learning (10m)
5 hours to complete

Approximate Value Based Methods

This week we'll learn to scale things up even further by training agents based on neural networks.

9 videos (Total 104 min), 3 readings, 5 quizzes
9 videos
Loss functions in value based RL (11m)
Difficulties with Approximate Methods (15m)
DQN – bird's eye view (9m)
DQN – the internals (9m)
DQN: statistical issues (6m)
Double Q-learning (6m)
More DQN tricks (10m)
Partial observability (17m)
3 readings
TD vs MC (10m)
DQN follow-ups (10m)
3 practice exercises
MC & TD (8m)
SARSA and Q-learning (8m)
5 hours to complete

Policy-based methods

We spent the 3 previous modules working on value-based methods: learning state values, action values and whatnot. Now's the time to see an alternative approach that doesn't require you to predict all future rewards to learn something.

11 videos (Total 68 min), 1 reading, 3 quizzes
11 videos
All Kinds of Policies (4m)
Policy gradient formalism (8m)
The log-derivative trick (3m)
Advantage actor-critic (6m)
Duct tape zone (4m)
Policy-based vs Value-based (4m)
Case study: A3C (6m)
A3C case study (2/2) (3m)
Combining supervised & reinforcement learning (6m)
1 reading
1 practice exercise
A policy-based quiz (14m)
5 hours to complete


In this final week you'll learn how to build better exploration strategies, with a focus on the contextual bandit setup. In the honors track, you'll also learn how to apply reinforcement learning to train structured deep learning models.

10 videos (Total 85 min), 4 readings, 4 quizzes
10 videos
Regret: measuring the quality of exploration (6m)
The message just repeats. 'Regret, Regret, Regret.' (5m)
Intuitive explanation (7m)
Thompson Sampling (5m)
Optimism in face of uncertainty (5m)
Bayesian UCB (11m)
Introduction to planning (17m)
Monte Carlo Tree Search (10m)
4 readings
Extras: exploration (10m)
Extras: planning (10m)
2 practice exercises
56 Reviews


Top reviews from Practical Reinforcement Learning

By AK, May 28th 2019

This is one of the Best Course available on Reinforcement Learning. I have gone through various study material but the depth and practical knowledge given in the course is awesome.

By FZ, Feb 14th 2019

A great course with very practical assignments to help you learn how to implement RL algorithms. But it also has some stupid quiz questions which makes you feel confusing.



Pavel Shvechikov

Researcher at HSE and Sberbank AI Lab
HSE Faculty of Computer Science

Alexander Panin

HSE Faculty of Computer Science

About National Research University Higher School of Economics

National Research University - Higher School of Economics (HSE) is one of the top research universities in Russia. Established in 1992 to promote new research and teaching in economics and related disciplines, it now offers programs at all levels of university education across an extraordinary range of fields of study including business, sociology, cultural studies, philosophy, political science, international relations, law, Asian studies, media and communications, mathematics, engineering, and more. Learn more on www.hse.ru...

About the Advanced Machine Learning Specialization

This specialization gives an introduction to deep learning, reinforcement learning, natural language understanding, computer vision and Bayesian methods. Top Kaggle machine learning practitioners and CERN scientists will share their experience of solving real-world problems and help you to fill the gaps between theory and practice. Upon completion of 7 courses you will be able to apply modern machine learning methods in enterprise and understand the caveats of real-world data and settings....
Advanced Machine Learning

Frequently Asked Questions

  • Once you enroll for a Certificate, you’ll have access to all videos, quizzes, and programming assignments (if applicable). Peer review assignments can only be submitted and reviewed once your session has begun. If you choose to explore the course without purchasing, you may not be able to access certain assignments.

  • When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile. If you only want to read and view the course content, you can audit the course for free.

More questions? Visit the Learner Help Center.