About this Course
4.7 (372 ratings, 83 reviews)
If you want to break into competitive data science, then this course is for you! Participating in predictive modelling competitions can help you gain practical experience and improve and harness your data modelling skills in various domains such as credit, insurance, marketing, natural language processing, sales forecasting and computer vision, to name a few. At the same time you get to do it in a competitive context against thousands of participants, each trying to build the most predictive algorithm. Pushing each other to the limit can result in better performance and smaller prediction errors. Being able to achieve high ranks consistently can help you accelerate your career in data science.

In this course, you will learn how to analyse and solve such predictive modelling tasks competitively. When you finish this class, you will:
- Understand how to solve predictive modelling competitions efficiently and learn which of the skills obtained are applicable to real-world tasks.
- Learn how to preprocess the data and generate new features from various sources such as text and images.
- Be taught advanced feature engineering techniques like generating mean encodings, using aggregated statistical measures or finding nearest neighbors as a means to improve your predictions.
- Be able to form reliable cross-validation methodologies that help you benchmark your solutions and avoid overfitting or underfitting when tested on unobserved (test) data.
- Gain experience in analysing and interpreting the data. You will become aware of inconsistencies, high noise levels, errors and other data-related issues such as leakages, and you will learn how to overcome them.
- Acquire knowledge of different algorithms and learn how to efficiently tune their hyperparameters to achieve top performance.
- Master the art of combining different machine learning models and learn how to ensemble.
- Get exposed to past (winning) solutions and code, and learn how to read them.

Disclaimer: This is not a machine learning course in the general sense. This course will teach you how to produce high-rank solutions against thousands of competitors, with a focus on the practical use of machine learning methods rather than the theoretical underpinnings behind them.

Prerequisites:
- Python: work with DataFrames in pandas, plot figures in matplotlib, import and train models from scikit-learn, XGBoost, LightGBM.
- Machine Learning: basic understanding of linear models, K-NN, random forest, gradient boosting and neural networks.
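
As a rough illustration of the prerequisite level described above (not part of the course materials), loading a table with pandas and cross-validating a scikit-learn model might look like the sketch below; the file name and the "target" column are hypothetical.

    # Hypothetical sketch of the prerequisite skill level; train.csv and the
    # "target" column are placeholders, not course files.
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    df = pd.read_csv("train.csv")                 # work with DataFrames in pandas
    X = df.drop(columns=["target"])
    y = df["target"]

    model = RandomForestClassifier(n_estimators=100, random_state=0)
    scores = cross_val_score(model, X, y, cv=5)   # train and evaluate a scikit-learn model
    print(scores.mean())
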
100% online courses
Start instantly and learn at your own schedule.

Flexible deadlines
Reset deadlines in accordance with your schedule.

Advanced Level

Approx. 45 hours to complete
Suggested: 6-10 hours/week...

English
Subtitles: English...

Skills you will gain

Data Analysis, Feature Extraction, Feature Engineering, XGBoost

Syllabus - What you will learn from this course

Week 1

6 hours to complete

Introduction & Recap

This week we will introduce you to competitive data science. You will learn about competition mechanics, the difference between competitions and real-life data science, and the hardware and software that people usually use in competitions. We will also briefly recap the major ML models frequently used in competitions.
8 videos (Total 46 min), 7 readings, 6 quizzes

8 videos:
Meet your lecturers 2m
Course overview 7m
Competition Mechanics 6m
Kaggle Overview [screencast] 7m
Real World Application vs Competitions 5m
Recap of main ML algorithms 9m
Software/Hardware Requirements 5m

7 readings:
Welcome! 10m
Week 1 overview 10m
Disclaimer 10m
Explanation for quiz questions 10m
Additional Materials and Links 10m
Explanation for quiz questions 10m
Additional Material and Links 10m

5 practice exercises:
Practice Quiz 8m
Recap 8m
Recap 12m
Software/Hardware 6m
Graded Soft/Hard Quiz 8m
2 hours to complete

Feature Preprocessing and Generation with Respect to Models

In this module we will summarize approaches to working with features: preprocessing, generation and extraction. We will see that the choice of machine learning model impacts both the preprocessing we apply to the features and our approach to generating new ones. We will also discuss feature extraction from text with Bag of Words and Word2vec, and feature extraction from images with Convolutional Neural Networks.
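
As an illustrative sketch (not taken from the course materials), the same categorical feature is often encoded differently depending on the model family; the toy data below is made up.

    # Illustrative only: one-hot encoding for linear models / neural networks,
    # label encoding for tree-based models such as random forest or gradient boosting.
    import pandas as pd
    from sklearn.preprocessing import LabelEncoder

    df = pd.DataFrame({"city": ["Moscow", "London", "Moscow", "Paris"]})

    onehot = pd.get_dummies(df["city"], prefix="city")        # for linear models
    df["city_le"] = LabelEncoder().fit_transform(df["city"])  # for tree-based models
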
7 videos (Total 73 min), 4 readings, 4 quizzes

7 videos:
Numeric features 13m
Categorical and ordinal features 10m
Datetime and coordinates 8m
Handling missing values 10m
Bag of words 10m
Word2vec, CNN 13m

4 readings:
Explanation for quiz questions 10m
Additional Material and Links 10m
Explanation for quiz questions 10m
Additional Material and Links 10m

4 practice exercises:
Feature preprocessing and generation with respect to models 8m
Feature preprocessing and generation with respect to models 8m
Feature extraction from text and images 8m
Feature extraction from text and images 8m
29 minutes to complete

Final Project Description

This is just a reminder that it is better to start the final project in this course early! The final project is in fact a competition; in this module you can find information about it.
1 video (Total 4 min), 2 readings

2 readings:
Final project 10m
Final project advice #1 10m
Week 2

2 hours to complete

Exploratory Data Analysis

We will start this week with Exploratory Data Analysis (EDA). It is a very broad and exciting topic and an essential component of the solution process. Besides the regular videos you will find a walkthrough of the EDA process for the Springleaf competition data and an example of prolific EDA for the NumerAI competition with extraordinary findings.
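
A small, purely illustrative sketch of typical first EDA steps in pandas (train.csv is a placeholder, not a course dataset):

    import pandas as pd

    df = pd.read_csv("train.csv")
    print(df.shape)                                                # dataset size
    print(df.dtypes.value_counts())                                # feature types
    print(df.isnull().sum().sort_values(ascending=False).head())   # missing values
    print(df.nunique().sort_values().head())                       # constant or near-constant columns
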
8 videos (Total 80 min), 2 readings, 1 quiz

8 videos:
Building intuition about the data 6m
Exploring anonymized data 15m
Visualizations 11m
Dataset cleaning and other things to check 7m
Springleaf competition EDA I 8m
Springleaf competition EDA II 16m
Numerai competition EDA 6m

2 readings:
Week 2 overview 10m
Additional material and links 10m

1 practice exercise:
Exploratory data analysis 12m
2 hours to complete

Validation

In this module we will discuss various validation strategies. We will see that the strategy we choose depends on the competition setup and that a correct validation scheme is one of the building blocks of any winning solution.
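
For illustration only, two common splitting schemes available in scikit-learn are sketched below; which one is appropriate depends on how the competition's train/test split was made (random rows versus a split by time).

    import numpy as np
    from sklearn.model_selection import KFold, TimeSeriesSplit

    X = np.arange(20).reshape(10, 2)   # placeholder feature matrix

    # Random row-wise split
    for tr_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        pass

    # Time-based split: each fold validates only on "future" rows
    for tr_idx, val_idx in TimeSeriesSplit(n_splits=5).split(X):
        pass
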
4 videos (Total 51 min), 3 readings, 2 quizzes

4 videos:
Validation strategies 7m
Data splitting strategies 14m
Problems occurring during validation 20m

3 readings:
Validation strategies 10m
Comments on quiz 10m
Additional material and links 10m

2 practice exercises:
Validation 8m
Validation 8m
5 hours to complete

Data Leakages

Finally, in this module we will cover something very specific to data science competitions: we will see examples of how it is sometimes possible to get a top position in a competition with very little machine learning, just by exploiting a data leakage.
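
As a hypothetical illustration (not a course example), one very simple leakage check is to look for test rows that appear verbatim in the train set, so their target is effectively known; the file and column names below are placeholders.

    import pandas as pd

    train = pd.read_csv("train.csv")
    test = pd.read_csv("test.csv")

    # Columns shared by both files, excluding a hypothetical "id" column
    feature_cols = [c for c in test.columns if c in train.columns and c != "id"]
    duplicates = test.merge(train, on=feature_cols, how="inner")
    print(f"{len(duplicates)} test rows have an exact duplicate in train")
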
3 videos (Total 26 min), 3 readings, 3 quizzes

3 videos:
Leaderboard probing and examples of rare data leaks 9m
Expedia challenge 9m

3 readings:
Comments on quiz 10m
Additional material and links 10m
Final project advice #2 10m

1 practice exercise:
Data leakages 8m
Week 3

3 hours to complete

Metrics Optimization

This week we will first study another component of competitions: the evaluation metrics. We will recap the most prominent ones and then see how we can efficiently optimize the metric given in a competition.
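
One common trick in this spirit, sketched here with made-up data, is to optimize a convenient proxy of the competition metric: for example, when the metric is RMSLE, a regressor can be trained on log1p of the target with an ordinary squared-error loss and the predictions back-transformed.

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    X = np.random.rand(100, 5)
    y = np.random.rand(100) * 100                 # hypothetical positive target

    model = GradientBoostingRegressor().fit(X, np.log1p(y))
    preds = np.expm1(model.predict(X))            # back to the original scale
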
8 videos (Total 83 min), 3 readings, 2 quizzes

8 videos:
Regression metrics review I 14m
Regression metrics review II 8m
Classification metrics review 20m
General approaches for metrics optimization 6m
Regression metrics optimization 10m
Classification metrics optimization I 7m
Classification metrics optimization II 6m

3 readings:
Week 3 overview 10m
Comments on quiz 10m
Additional material and links 10m

2 practice exercises:
Metrics 12m
Metrics 12m
4 hours to complete

Advanced Feature Engineering I

In this module we will study a very powerful technique for feature generation. It goes by many names, but here we call it "mean encodings". We will see the intuition behind them and how to construct, regularize and extend them.
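
A minimal sketch of a mean encoding with simple KFold regularization (to limit target leakage) is shown below; the toy data and column names are illustrative only, not the course's exact recipe.

    import pandas as pd
    from sklearn.model_selection import KFold

    df = pd.DataFrame({"cat": list("ababcbcaab"),
                       "target": [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]})
    df["cat_mean_enc"] = 0.0
    global_mean = df["target"].mean()

    # Encode each validation fold using target means computed on the other folds
    for tr_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(df):
        means = df.iloc[tr_idx].groupby("cat")["target"].mean()
        df.loc[df.index[val_idx], "cat_mean_enc"] = (
            df.iloc[val_idx]["cat"].map(means).fillna(global_mean).values
        )
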
3 videos (Total 27 min), 2 readings, 2 quizzes

3 videos:
Regularization 7m
Extensions and generalizations 10m

2 readings:
Comments on quiz 10m
Final project advice #3 10m

1 practice exercise:
Mean encodings 8m
Week 4

3 hours to complete

Hyperparameter Optimization

In this module we will talk about the hyperparameter optimization process. We will also have a special video with practical tips and tricks, recorded by four instructors.
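
As a generic illustration (not the instructors' pipeline), random search over a few gradient boosting hyperparameters with scikit-learn might look like the sketch below; the parameter ranges and data are placeholders.

    import numpy as np
    from scipy.stats import randint, uniform
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import RandomizedSearchCV

    X, y = np.random.rand(200, 10), np.random.randint(0, 2, 200)   # toy data

    search = RandomizedSearchCV(
        GradientBoostingClassifier(),
        param_distributions={
            "n_estimators": randint(100, 500),
            "max_depth": randint(2, 8),
            "learning_rate": uniform(0.01, 0.2),
        },
        n_iter=10,
        cv=5,
        random_state=0,
    )
    search.fit(X, y)
    print(search.best_params_)
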
6 videos (Total 86 min), 4 readings, 2 quizzes

6 videos:
Hyperparameter tuning II 12m
Hyperparameter tuning III 13m
Practical guide 16m
KazAnova's competition pipeline, part 1 18m
KazAnova's competition pipeline, part 2 17m

4 readings:
Week 4 overview 10m
Comments on quiz 10m
Additional material and links 10m
Additional materials and links 10m

2 practice exercises:
Practice quiz 6m
Graded quiz 8m
4 hours to complete

Advanced feature engineering II

In this module we will learn about a few more advanced feature engineering techniques....
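
The videos below cover matrix factorizations, feature interactions and t-SNE. For example, a matrix factorization such as TruncatedSVD can serve as a feature transform, and t-SNE can be used for 2-D visualization; the sketch here uses random placeholder data and is purely illustrative.

    import numpy as np
    from sklearn.decomposition import TruncatedSVD
    from sklearn.manifold import TSNE

    X = np.random.rand(200, 50)                                        # toy feature matrix

    svd_features = TruncatedSVD(n_components=10, random_state=0).fit_transform(X)
    embedding = TSNE(n_components=2, random_state=0).fit_transform(X)  # for plotting only
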
4 videos (Total 22 min), 2 readings, 2 quizzes

4 videos:
Matrix factorizations 6m
Feature Interactions 5m
t-SNE 5m

2 readings:
Comments on quiz 10m
Additional Materials and Links 10m

1 practice exercise:
Graded Advanced Features II Quiz 12m
10 hours to complete

Ensembling

Nowadays it is hard to find a competition won by a single model! Every winning solution incorporates ensembles of models. In this module we will talk about the main ensembling techniques in general and, of course, how best to ensemble models in practice.
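
A bare-bones sketch of two-level stacking with out-of-fold predictions is shown below; the base models, meta-model and data are placeholders rather than the course's recommended setup.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict

    X, y = np.random.rand(300, 10), np.random.randint(0, 2, 300)   # toy data

    base_models = [RandomForestClassifier(random_state=0),
                   LogisticRegression(max_iter=1000)]

    # Level 1: out-of-fold predictions of each base model become new features
    meta_features = np.column_stack([
        cross_val_predict(m, X, y, cv=5, method="predict_proba")[:, 1]
        for m in base_models
    ])

    # Level 2: a simple meta-model is trained on those predictions
    meta_model = LogisticRegression().fit(meta_features, y)
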
8 videos (Total 92 min), 4 readings, 4 quizzes

8 videos:
Bagging 9m
Boosting 16m
Stacking 16m
StackNet 14m
Ensembling Tips and Tricks 14m
CatBoost 1 7m
CatBoost 2 7m

4 readings:
Validation schemes for 2nd-level models 10m
Comments on quiz 10m
Additional materials and links 10m
Final project advice #4 10m

2 practice exercises:
Ensembling 8m
Ensembling 12m

33% started a new career after completing these courses

83% got a tangible career benefit from this course

Top Reviews

By MS, Mar 29th 2018

Top Kagglers gently introduce one to Data Science Competitions. One will have a great chance to learn various tips and tricks and apply them in practice throughout the course. Highly recommended!

By MM, Nov 10th 2017

This course is fantastic. It's chock full of practical information that is presented clearly and concisely. I would like to thank the team for sharing their knowledge so generously.

Instructors

Dmitry Ulyanov

Visiting lecturer
HSE Faculty of Computer Science

Alexander Guschin

Visiting lecturer at HSE, Lecturer at MIPT
HSE Faculty of Computer Science

Mikhail Trofimov

Visiting lecturer
HSE Faculty of Computer Science

Dmitry Altukhov

Visiting lecturer
HSE Faculty of Computer Science

Marios Michailidis

Research Data Scientist
H2O.ai

About National Research University Higher School of Economics

National Research University - Higher School of Economics (HSE) is one of the top research universities in Russia. Established in 1992 to promote new research and teaching in economics and related disciplines, it now offers programs at all levels of university education across an extraordinary range of fields of study including business, sociology, cultural studies, philosophy, political science, international relations, law, Asian studies, media and communications, IT, mathematics, engineering, and more. Learn more on www.hse.ru...

About the Advanced Machine Learning Specialization

This specialization gives an introduction to deep learning, reinforcement learning, natural language understanding, computer vision and Bayesian methods. Top Kaggle machine learning practitioners and CERN scientists will share their experience of solving real-world problems and help you to fill the gaps between theory and practice. Upon completion of the 7 courses, you will be able to apply modern machine learning methods in the enterprise and understand the caveats of real-world data and settings.
Advanced Machine Learning

Frequently Asked Questions

  • Once you enroll for a Certificate, you’ll have access to all videos, quizzes, and programming assignments (if applicable). Peer review assignments can only be submitted and reviewed once your session has begun. If you choose to explore the course without purchasing, you may not be able to access certain assignments.

  • When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile. If you only want to read and view the course content, you can audit the course for free.

More questions? Visit the Learner Help Center.