About this Course
4.0
194 ratings
57 reviews
Have you heard of technologies such as HDFS, MapReduce and Spark? Have you always wanted to learn these tools but lacked concise starting material? Then don’t miss this course!

In this 6-week course you will:
  • learn some basic technologies of the modern Big Data landscape, namely HDFS, MapReduce and Spark;
  • be guided through both the internals of these systems and their applications;
  • learn about distributed file systems, why they exist and what function they serve;
  • grasp the MapReduce framework, a workhorse for many modern Big Data applications;
  • apply the framework to process texts and solve sample business cases;
  • learn about Spark, the next-generation computational framework;
  • build a strong understanding of basic Spark concepts;
  • develop the skills to apply these tools to creating solutions in finance, social networks, telecommunications and many other fields.

Your learning experience will be as close to real life as possible, with the chance to evaluate your practical assignments on a real cluster. No mocking, just a friendly, considerate atmosphere that makes learning smooth and enjoyable. Get ready to work with real datasets alongside real masters!

Special thanks to:
  • Prof. Mikhail Roytberg, APT dept., MIPT, who was the initial reviewer of the project and the supervisor and mentor of half of the BigData team. He was the one who helped to get this show on the road.
  • Oleg Sukhoroslov (PhD, Senior Researcher at IITP RAS), who has been teaching MapReduce, Hadoop and friends since 2008. He now leads the infrastructure team.
  • Oleg Ivchenko (PhD student at APT dept., MIPT), Pavel Akhtyamov (MSc student at APT dept., MIPT) and Vladimir Kuznetsov (Assistant at P.G. Demidov Yaroslavl State University), the superbrains who developed and now maintain the infrastructure used for practical assignments in this course.
  • Asya Roitberg, Eugene Baulin and Marina Sudarikova. These people never sleep: they babysit this course day and night to make your learning experience productive, smooth and exciting.
100% online courses

Start instantly and learn on your own schedule.

Flexible deadlines

Reset deadlines in accordance with your schedule.

Intermediate Level

Approx. 41 hours to complete

Suggested: 6 weeks of study, 6-8 hours/week...

English

Subtitles: English...

Skills you will gain

Python Programming, Apache Hadoop, MapReduce, Apache Spark

Syllabus - What you will learn from this course

Week 1

Welcome

14 minutes to complete
8 videos (Total 14 min)

8 videos
  • Issues BigData can solve (1 min)
  • BigData Applications (1 min)
  • What is BigData Essentials? (2 min)
  • Course Structure (2 min)
  • Meet Emeli (1 min)
  • Meet Alexey (2 min)
  • Meet Ivan (1 min)

What are BigData and distributed file systems (e.g. HDFS)?

8 hours to complete
18 videos (Total 136 min), 10 readings, 5 quizzes

18 videos
  • File system managing (6 min)
  • File content exploration 1 (5 min)
  • File content exploration 2 (13 min)
  • Processes (4 min)
  • Scaling Distributed File System (9 min)
  • Block and Replica States, Recovery Process 1 (6 min)
  • Block and Replica States, Recovery Process 2 (7 min)
  • HDFS Client (9 min)
  • Web UI, REST API (4 min)
  • Namenode Architecture (8 min)
  • Introduction (10 min)
  • Text formats (9 min)
  • Binary formats 1 (8 min)
  • Binary formats 2 (8 min)
  • Compression (7 min)
  • How to submit your first assignment (3 min)
  • How to Install Docker on Windows 7, 8, 10 (4 min)

10 readings
  • Basic Bash Commands (10 min)
  • Slack Channel is the quickest way to get answers to your questions (10 min)
  • HDFS Lesson Introduction (10 min)
  • Gentle Introduction into "curl" (10 min)
  • File formats extra (optional) (10 min)
  • Grading System: Instructions and Common Problems (10 min)
  • Docker Installation Guide (10 min)
  • Programming Assignment: Instructions and Common Problems (10 min)
  • FAQ How to show your code to teaching staff (10 min)
  • Slack channel "Bigdata-coursera" - the quickest to solve technical problems. (10 min)

2 practice exercises
  • Distributed File Systems (16 min)
  • Big Data and Distributed File Systems (25 min)

Week 2

Solving Problems with MapReduce

3 hours to complete
17 videos (Total 94 min), 1 reading, 3 quizzes

17 videos
  • Unreliable Components 2 (8 min)
  • MapReduce (4 min)
  • Distributed Shell (8 min)
  • Fault Tolerance (7 min)
  • Fault Tolerance. Live Demo (3 min)
  • Streaming (7 min)
  • Streaming in Python (3 min)
  • WordCount in Python (5 min)
  • Distributed Cache (4 min)
  • Environment, Counters (4 min)
  • Testing (5 min)
  • Combiner (5 min)
  • Partitioner (7 min)
  • Comparator (1 min)
  • Speculative Execution / Backup Tasks (3 min)
  • Compression (4 min)

1 reading
  • Hadoop Streaming Assignments: Intro and Code Samples (10 min)

3 practice exercises
  • Hadoop MapReduce Intro (26 min)
  • MapReduce Streaming (26 min)
  • Hadoop Streaming Final (30 min)

Week 3

Solving Problems with MapReduce (practice week)

4 hours to complete
1 video (Total 3 min), 5 readings, 5 quizzes

5 readings
  • Hadoop Streaming Assignments: Intro and Code Samples (10 min)
  • Hints to Debug Hadoop Streaming Applications (10 min)
  • Grading System and Grading System Sandbox User Guide (10 min)
  • Hadoop Streaming Assignments: Instructions (10 min)
  • Hint to the "Stop words" programming assignment (10 min)

Week 4

Introduction to Apache Spark

3 hours to complete
16 videos (Total 95 min), 2 readings, 2 quizzes

16 videos
  • RDDs (8 min)
  • Transformations 1 (6 min)
  • Transformations 2 (7 min)
  • Actions (5 min)
  • Resiliency (6 min)
  • Execution & Scheduling (6 min)
  • Caching & Persistence (5 min)
  • Broadcast variables (5 min)
  • Accumulator variables (5 min)
  • Getting started with Spark & Python (6 min)
  • Working with text files (6 min)
  • Joins (4 min)
  • Broadcast & Accumulator variables (5 min)
  • Spark UI (4 min)
  • Cluster mode (3 min)

2 readings
  • Spark Assignments Intro (10 min)
  • Instructions for Spark programming assignment (10 min)

2 practice exercises
  • Lesson 1 Quiz (20 min)
  • Lesson 2 Quiz (24 min)

4.0
57 Reviews

Top Reviews

By SD, Jun 28th 2018

Absolutely essential for everyone who wants a proper introduction to HDFS, MapReduce and Spark. Brought to you by a great team of geniuses of their time ;)

By NP, Apr 27th 2018

The course gave me more technical skill with Hadoop and Spark, which helps me feel confident in my career. Thanks so much to Coursera and Yandex.

Instructors

Ivan Puzyrevskiy

Technical Team Lead

Alexey A. Dral

Founder and Chief Executive Officer
BigData Team

About Yandex

Yandex is a technology company that builds intelligent products and services powered by machine learning. Our goal is to help consumers and businesses better navigate the online and offline world....

About the Big Data for Data Engineers Specialization

This specialization is made for people working with data (either small or big). If you are a Data Analyst, Data Scientist, Data Engineer or Data Architect (or you want to become one), don’t miss the opportunity to expand your knowledge and skills in data engineering and large-scale data analysis.

In four concise courses you will learn the basics of Hadoop, MapReduce and Spark, methods of offline data processing for warehousing, real-time data processing and large-scale machine learning. A Capstone project then lets you build and deploy your own Big Data Service and make your portfolio even more competitive. Over the course of the specialization, you will complete progressively harder programming assignments (mostly in Python), so make sure you have some experience with it.

This specialization will sharpen your skills in designing solutions for common Big Data tasks:
  • creating batch and real-time data processing pipelines,
  • doing machine learning at scale,
  • deploying machine learning models into a production environment,
and much more!

Join some of the best hands-on big data professionals, who know their job inside out, and learn the basics, as well as some tricks of the trade, from them.

Special thanks to Prof. Mikhail Roytberg (APT dept., MIPT), Oleg Sukhoroslov (PhD, Senior Researcher, IITP RAS), Oleg Ivchenko (APT dept., MIPT), Pavel Akhtyamov (APT dept., MIPT), Vladimir Kuznetsov, Asya Roitberg, Eugene Baulin and Marina Sudarikova.
Big Data for Data Engineers

Frequently Asked Questions

  • Once you enroll for a Certificate, you’ll have access to all videos, quizzes, and programming assignments (if applicable). Peer review assignments can only be submitted and reviewed once your session has begun. If you choose to explore the course without purchasing, you may not be able to access certain assignments.

  • When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile. If you only want to read and view the course content, you can audit the course for free.

More questions? Visit the Learner Help Center.