About this Course
3.8
60 ratings
12 reviews
No doubt working with huge data volumes is hard, but to move a mountain, you have to deal with a lot of small stones. But why strain yourself? Using MapReduce and Spark, you tackle the issue only partially, which leaves room for high-level tools. Stop struggling to make your big data workflow productive and efficient, and make use of the tools we are offering you.

This course will teach you how to:
- Warehouse your data efficiently using Hive, Spark SQL and Spark DataFrames.
- Work with large graphs, such as social graphs or networks.
- Optimize your Spark applications for maximum performance.

Specifically, you will build your skills in:
- Writing and executing Hive & Spark SQL queries;
- Reasoning about how the queries are translated into actual execution primitives (be it MapReduce jobs or Spark transformations);
- Organizing your data in Hive to optimize disk space usage and execution times;
- Constructing Spark DataFrames and using them to write ad-hoc analytical jobs easily;
- Processing large graphs with Spark GraphFrames;
- Debugging, profiling and optimizing Spark application performance.

Still in doubt? Check this out. Become a data ninja by taking this course!

Special thanks to:
- Prof. Mikhail Roytberg, APT dept., MIPT, who was the initial reviewer of the project and the supervisor and mentor of half of the BigData team. He was the one who helped to get this show on the road.
- Oleg Sukhoroslov (PhD, Senior Researcher at IITP RAS), who has been teaching MapReduce, Hadoop and friends since 2008. He now leads the infrastructure team.
- Oleg Ivchenko (PhD student, APT dept., MIPT), Pavel Akhtyamov (MSc student, APT dept., MIPT) and Vladimir Kuznetsov (Assistant at P.G. Demidov Yaroslavl State University), the superbrains who developed and now maintain the infrastructure used for practical assignments in this course.
- Asya Roitberg, Eugene Baulin, Marina Sudarikova. These people never sleep: they babysit this course day and night to make your learning experience productive, smooth and exciting.
100% online courses
Start instantly and learn on your own schedule.

Flexible deadlines
Reset deadlines in accordance with your schedule.

Advanced Level

Approx. 38 hours to complete
Suggested: 6 weeks of study, 6-8 hours/week

English
Subtitles: English

Skills you will gain

Graphs, Hive, Apache Hive, Apache Spark

Syllabus - What you will learn from this course

Week 1

Welcome to the Second Course: Big Data Analysis
12 minutes to complete
8 videos (Total 12 min)

Videos:
- What is BigData Analysis? (1m)
- Tools For BigData Analysis (1m)
- Graph Data Analysis (2m)
- Meet Alexey Dral (2m)
- Meet Pavel Mezentsev
- Meet Natalia Pritykovskaya
- Meet Pavel Klemenkov

Big Data SQL: Hive
3 hours to complete
15 videos (Total 105 min), 1 reading, 3 quizzes

Videos:
- HTTP Web Service: Access Log Format (4m)
- Business Use Cases: Solution with Hive (6m)
- (optional) SQL: likbez (10m)
- Hive Data Definition Language (DDL) (11m)
- Hive Data Manipulation Language (DML) (6m)
- Hive Analytics: RegexSerDe, Views (7m)
- (optional) Regular Expressions, Likbez (9m)
- Hive Analytics: UDF, UDAF, UDTF (7m)
- Hive Streaming (4m)
- Hive PTF (Window Functions) (5m)
- Hive Optimization: Partitioning, Bucketing and Sampling (8m)
- Hive Map-Side Joins: Plain, Bucket, Sort-Merge (5m)
- Hive Optimization: Data Skew (4m)
- Hive Optimization: Row-Columnar File Formats, Compression (8m)

Readings:
- Slack Channel is the quickest way to get answers to your questions (10m)

Practice exercises:
- Hive: SQL over Hadoop MapReduce (20m)
- Hive Analytics with UDF and Streaming (20m)
- Hive final (20m)

Week 2

Big Data SQL: Hive (practice week)
7 hours to complete
3 videos (Total 11 min), 6 readings, 5 quizzes

Videos:
- How to Install Docker on Windows 7, 8, 10 (4m)
- How to submit your first Hadoop assignment (3m)

Readings:
- Assignments. General requirements (10m)
- Hive assignment. Intro and instructions (10m)
- Grading System: Instructions and Common Problems (10m)
- Docker Installation Guide (10m)
- Copy of Assignments. General requirements (10m)
- Copy of Assignments. General requirements (10m)

Week 3

Spark SQL and Spark Dataframe
2 hours to complete
14 videos (Total 82 min), 2 quizzes

Videos:
- What is Pandas DataFrame and how to create it (4m)
- How to process a DataFrame as SQL (4m)
- Working with Hive (4m)
- Reading and Writing Files (7m)
- RDD vs. DF vs. SQL (3m)
- Projection and Filtering (5m)
- Functions (5m)
- Aggregates (6m)
- Join (8m)
- User Defined Functions (8m)
- Time Processing (4m)
- Window Functions (7m)
- Two-Dimensional Distributions (4m)

Practice exercises:
- Introducing DataFrame and SQL (16m)
- Spark SQL and Spark Dataframe (18m)

Week 4

Graph Analysis from Big Data Perspective
4 hours to complete
13 videos (Total 83 min), 5 quizzes

Videos:
- Graph representation (7m)
- Counting common friends. Part I (2m)
- Counting common friends. Part II (10m)
- Counting common friends. Part III (5m)
- GraphFrames: Introduction (6m)
- Motif Finding: DSL (6m)
- Motif Finding: Counting Mutual Friends (6m)
- Motif Finding: Under The Hood. Part 1 (14m)
- Motif Finding: Under The Hood. Part 2 (4m)
- Triangles Count: Introduction (3m)
- Triangles Count: Edge Lists (6m)
- Triangles Count: GraphFrame (6m)

Practice exercises:
- Graph Representations (10m)
- Motif Finding (18m)
- Triangles Count (8m)
- Graph Analysis from Big Data Perspective (20m)

50% started a new career after completing these courses
83% got a tangible career benefit from this course

Top Reviews

By SS, Feb 3rd 2018

I wish I could give more rating than 5 :). Excellent course. Thanks so much for such an excellent course. All the instructors are great.

By AK, Aug 7th 2018

This course is so detailed and focused on the basics of the Hive and Spark SQL frameworks. Amazing professors.

Instructors

Pavel Klemenkov

Chief Data Scientist
NVIDIA

Pavel Mezentsev

Senior Data Scientist
PulsePoint inc

Alexey A. Dral

Founder and Chief Executive Officer
BigData Team

About Yandex

Yandex is a technology company that builds intelligent products and services powered by machine learning. Our goal is to help consumers and businesses better navigate the online and offline world.

About the Big Data for Data Engineers Specialization

This specialization is made for people working with data (either small or big). If you are a Data Analyst, Data Scientist, Data Engineer or Data Architect (or you want to become one), don’t miss the opportunity to expand your knowledge and skills in data engineering and large-scale data analysis. In four concise courses you will learn the basics of Hadoop, MapReduce, Spark, methods of offline data processing for warehousing, real-time data processing and large-scale machine learning. A Capstone project then lets you build and deploy your own Big Data Service (and make your portfolio even more competitive).

Over the course of the specialization, you will complete progressively harder programming assignments (mostly in Python), so make sure you have some experience with it. This specialization will sharpen your skills in designing solutions for common Big Data tasks:
- creating batch and real-time data processing pipelines,
- doing machine learning at scale,
- deploying machine learning models into a production environment,
and much more!

Join some of the best hands-on big data professionals, who know their job inside out, and learn the basics as well as some tricks of the trade from them.

Special thanks to Prof. Mikhail Roytberg (APT dept., MIPT), Oleg Sukhoroslov (PhD, Senior Researcher, IITP RAS), Oleg Ivchenko (APT dept., MIPT), Pavel Akhtyamov (APT dept., MIPT), Vladimir Kuznetsov, Asya Roitberg, Eugene Baulin, Marina Sudarikova.
Big Data for Data Engineers

Frequently Asked Questions

  • Once you enroll for a Certificate, you’ll have access to all videos, quizzes, and programming assignments (if applicable). Peer review assignments can only be submitted and reviewed once your session has begun. If you choose to explore the course without purchasing, you may not be able to access certain assignments.

  • When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile. If you only want to read and view the course content, you can audit the course for free.

More questions? Visit the Learner Help Center.