Big Data Essentials: HDFS, MapReduce and Spark RDD, Yandex

302 ratings
81 reviews

About this Course

Have you ever heard about such technologies as HDFS, MapReduce, Spark? Always wanted to learn these new tools but missed concise starting material? Don’t miss this course either! In this 6-week course you will:
- learn some basic technologies of the modern Big Data landscape, namely: HDFS, MapReduce and Spark;
- be guided both through systems internals and their applications;
- learn about distributed file systems, why they exist and what function they serve;
- grasp the MapReduce framework, a workhorse for many modern Big Data applications;
- apply the framework to process texts and solve sample business cases;
- learn about Spark, the next-generation computational framework;
- build a strong understanding of Spark basic concepts;
- develop skills to apply these tools to creating solutions in finance, social networks, telecommunications and many other fields.

Your learning experience will be as close to real life as possible, with the chance to evaluate your practical assignments on a real cluster. No mocking, a friendly, considerate atmosphere to make the process of your learning smooth and enjoyable. Get ready to work with real datasets alongside real masters!

Special thanks to:
- Prof. Mikhail Roytberg, APT dept., MIPT, who was the initial reviewer of the project, the supervisor and mentor of half of the BigData team. He was the one who helped to get this show on the road.
- Oleg Sukhoroslov (PhD, Senior Researcher at IITP RAS), who has been teaching MapReduce, Hadoop and friends since 2008. Now he is leading the infrastructure team.
- Oleg Ivchenko (PhD student at APT dept., MIPT), Pavel Akhtyamov (MSc. student at APT dept., MIPT) and Vladimir Kuznetsov (Assistant at P.G. Demidov Yaroslavl State University), superbrains who have developed and now maintain the infrastructure used for practical assignments in this course.
- Asya Roitberg, Eugene Baulin, Marina Sudarikova. These people never sleep, babysitting this course day and night to make your learning experience productive, smooth and exciting....

Top reviews


Nov 22, 2018

Everything in this course is new to me, but it gives me plenty of practice, so I can gradually get familiar with all this new material. I find it a bit challenging, but overall it's quite good.


Jun 28, 2018

Absolutely essential for everyone who wants a proper introduction to HDFS, MapReduce and Spark. Brought to you by a great team of geniuses of their time ;)

78 Reviews

By Scott Small

Apr 05, 2019

I audited this course because I was interested in completing the specialization. I finished the course and all of the assignments. After finishing this course, I will not continue the specialization.

For me, the biggest problem was the lectures regarding MapReduce. In my mind, there was a disconnect between the lecture materials and the assignments. The assignments also tended to be poorly worded; it was rarely clear what needed to be done to finish an assignment. I needed to use a lot of external resources here. I still do not understand map-side and reduce-side joins, and I do not feel comfortable writing a MapReduce job without a lot of time.

The lectures on Hadoop were OK, but strange. A lot of details are presented about how Hadoop works internally, and the speed at which the lectures move makes the discussion very dense and difficult to follow. However, the material is not used in the assignments or required further in the course, and the instructors are quite clear that this is the case. To me, this seems like a missed opportunity. There could be an entire week dedicated to the internals of Hadoop (or maybe even an entire course). After this course, I do feel comfortable getting around in HDFS, and I feel I have a basic understanding of how it works.

The best part of the course was the lectures about Spark. The material was clearly presented, and the assignments were all relevant. The course gives a good introduction to Spark. I feel comfortable using basic Spark operations to manipulate data.

If you wish to take this course, I recommend being comfortable with Linux Bash commands beforehand. There is a review section, but if you are seeing these ideas for the first time, I suspect you will suffer a lot.

The instructors provide Docker images so the assignments can be completed on a local computer. If you are not knowledgeable about Docker, I recommend learning it through this course. It's not necessary, but it's quite simple.

I do agree with others that the accents of some instructors can be difficult to understand. There are options for English subtitles which help a lot here.

Because I only audited the course, I could not submit any assignments for review. Thus I cannot comment on the automated grader. However, many people in the forums complained about the grader.

I am interested in continuing with Big Data topics, but this course was an inefficient way to learn. I fear the remaining courses in the specialization will be similar. I have completed several courses on Coursera, and this was by far the worst. I recommend the MapReduce section be improved and clarified.

By Ehsan Fathi

Apr 02, 2019

This is the most awful course I have ever taken on Coursera.

By Sock, Hanbin

Mar 25, 2019

I appreciate that practical assignments exist; they were definitely helpful for really understanding how to use MapReduce and Spark.

My complaints come from various issues that shouldn't be issues. A link to a Jupyter notebook file for the statistics part in week 6 wouldn't download when clicked on, and instead opened in a new page (and the notebook file did not work unless you copied and pasted the page AFTER VIEWING THE PAGE'S SOURCE).

The URLs for the New York taxi data are completely broken, and the auto marker gives unhelpful error messages. For example, for the week 6 tf*idf assignment, the issue was that I had redirected my first MapReduce job's stderr to a file, but the error message from the marker was to "use only 0 or >1 reducers". I was using 0 reducers already, so this error message confused me for hours until I found a random post on Slack that said stderr must be output to the terminal for the first MapReduce job.

The course does teach quite a bit; however, the lack of support from instructors, poor error messages on auto testers, and other issues that you will naturally encounter taking the course make it difficult for me to recommend this course to others.

By Kassymzhomart Kunanbayev

Mar 22, 2019

Difficult for newbies, but good for intermediate

By kebize mustapha

Mar 21, 2019

Good learning

By Mayur Thakkar

Mar 08, 2019

The tools provided in the course to submit assignments don't work, and there is no response from the team on how to resolve this issue. All the users in this course are facing the same issue.

By Ferran Garcia Pagans

Mar 08, 2019

The tools provided to complete the course don't work.

By Álvaro Lemos

Mar 06, 2019

Awesome course, goes very deep into the details.

By Terry Anderson

Mar 01, 2019

Good general overview and start to the subject. Frustrated at consistent issues with the development environment and/or ability to debug. Responses to questions and mentor assistance are seriously lacking.

By Kristin Abkemeier

Feb 25, 2019

I learned some useful information and got some experience in working with Hadoop MapReduce and Apache Spark, even though there was plenty about the course that made it a real headache. I do feel like I spent much more time trying to figure out how to make my answers pass the autograder than learning how to structure my code to solve big data problems. Even though I won't bother to finish this course, because I don't need it to get a new job right now, I figure that what I have learned will give me a head start if and when I really do need to learn this material.