BIG DATA ENGINEER
MASTER’S PROGRAM
In collaboration with IBM
www.simplilearn.com
Contents
About the Course
Key Features of Big Data Engineer Master’s Program
About IBM and Simplilearn in collaboration
Learning Path Visualization
Program Outcomes
Who Should Enroll
Courses
Step 1: Big Data for Data Engineering
Step 2: Data Engineering with Hadoop
Step 3: Data Engineering with Scala
Step 4: Big Data Hadoop and Spark Developer
Step 5: Python for Data Science
Step 6: PySpark Training
Step 7: Big Data and Hadoop Administrator
Step 8: MongoDB Developer and Administrator
Step 9: Apache Cassandra
Step 10: Apache Spark and Scala
Electives
Certificates
Advisory Board Members
About the Course
This Big Data Engineer Master’s Program, in collaboration with IBM,
provides training on the competitive skills required for a rewarding career
in data engineering. You’ll learn to master the Hadoop Big Data framework,
leverage the functionality of Apache Spark with Python, simplify data
pipelines with Apache Kafka, and use the open-source database management
tool MongoDB to store data in Big Data environments.
Key Features
• Industry-recognized certifications from IBM and Simplilearn
• Real-life projects providing hands-on industry training
• 30+ in-demand skills
• Lifetime access to self-paced learning and class recordings
• $1,200 worth of IBM cloud credits
About IBM and Simplilearn collaboration
A joint partnership between Simplilearn and IBM introduces students to an
integrated blended learning approach, making them experts in Data
Engineering. The program, in collaboration with IBM, will make students
industry-ready for Data Engineer job roles.
IBM is a leading cognitive solutions and cloud platform company,
headquartered in Armonk, New York, offering a plethora of technology and
consulting services. Each year, IBM invests $6 billion in research and
development and has achieved five Nobel Prizes, nine US National Medals of
Technology, five US National Medals of Science, six Turing Awards, and 10
inductions into the US Inventors Hall of Fame.
About Simplilearn
Simplilearn is a leader in digital skills training, focused on the emerging
technologies that are transforming our world. Our blended learning approach
drives learner engagement and is backed by the industry’s highest completion
rates. Partnering with professionals and companies, we identify their unique
needs and provide outcome-centric solutions to help them achieve their
professional goals.
Learning Path - Data Engineer
Step 1: Big Data for Data Engineering
Step 2: Data Engineering with Hadoop
Step 3: Data Engineering with Scala
Step 4: Big Data Hadoop and Spark Developer
Step 5: Python for Data Science
Step 6: PySpark Training
Step 7: Big Data and Hadoop Administrator
Step 8: MongoDB Developer and Administrator
Step 9: Apache Cassandra
Step 10: Apache Spark and Scala
Electives
• Scala for Data Science
• Spark for Scala Analytics
• Simplifying Data Pipelines with Apache Kafka
Big Data Engineer Master’s Program Outcomes
• Gain an in-depth understanding of the flexible and versatile frameworks on the Hadoop ecosystem, such as Pig, Hive, Impala, HBase, Sqoop, Flume, and YARN.
• Master tools and skills including Data Model Creation, Database Interfaces, Advanced Architecture, Spark, Scala, RDD, SparkSQL, Spark Streaming, Spark ML, GraphX, Sqoop, Flume, Pig, Hive, Impala, and Kafka Architecture.
• Understand how to model data, perform ingestion, replicate data, and shard data using the NoSQL database management system MongoDB.
• Gain expertise in creating and maintaining analytics infrastructure and own the development, deployment, maintenance, and monitoring of architecture components.
• Achieve insights on how to improve business productivity by processing Big Data on platforms that can handle its volume, velocity, variety, and veracity.
• Learn how Kafka is used in the real world, including its architecture and components, get hands-on experience connecting Kafka to Spark, and work with Kafka Connect.
• Become proficient with the fundamentals of the Scala language, its tooling, and the development process.
Who Should Enroll in this Program?
A Big Data Engineer builds and maintains data structures and architectures
for data ingestion, processing, and deployment for large-scale
data-intensive applications. It’s a promising career for both new and
experienced professionals with a passion for data, including:
• IT professionals
• Banking and finance professionals
• Database administrators
• Beginners in the data engineering domain
• Students in UG/PG programs
Step 1: Big Data for Data Engineering
This introductory course from IBM will teach you the basic concepts and
terminology of Big Data and its real-life applications across multiple
industries. You will gain insights into how to improve business productivity
by processing large volumes of data and extracting valuable information
from them.
Key Learning Objectives
• Understand what Big Data is, sources of Big Data, and real-life examples
• Learn about the key differences between Big Data and Data Science
• Master how to use Big Data for operational analysis and better customer service
• Know the Big Data ecosystem and the Hadoop framework
Course curriculum
Lesson 1 - What is Big Data?
Lesson 2 - Big Data: Beyond the Hype
Lesson 3 - Big Data and Data Science
Lesson 4 - Use Cases
Lesson 5 - Processing Big Data
Step 2: Data Engineering with Hadoop
Apache Hadoop is one of the most in-demand technologies for analyzing
Big Data. This introductory Hadoop course by IBM will give you an
overview of what Hadoop is and its components, such as MapReduce and
HDFS. Additionally, this course will teach you to explore large data
sets and use Hadoop’s method of distributed processing.
Key Learning Objectives
• Understand Hadoop’s architecture and primary components, such as MapReduce and Hadoop Distributed File System (HDFS)
• Add and remove nodes from Hadoop clusters, check the available disk space on each node, and modify configuration parameters
• Learn about Apache projects that are part of the Hadoop ecosystem, including Pig, Hive, HBase, ZooKeeper, Oozie, Sqoop, Flume, and more
Course curriculum
Lesson 1 - Introduction to Hadoop
Lesson 2 - Hadoop Architecture
Lesson 3 - Hadoop Administration
Lesson 4 - Hadoop Components
Step 3: Data Engineering with Scala
Kickstart your learning of Scala with this introductory course and
familiarize yourself with Scala programming. Carefully crafted by IBM,
this course will enable you to write your own Scala code, perform Big Data
analysis using Scala, and create your own Scala projects.
Key Learning Objectives
• Create your own Scala project
• Understand basic object-oriented programming methodologies in Scala
• Work with data in Scala, including pattern matching, applying synthetic methods, and handling options, failures, and futures
Course curriculum
Lesson 1 - Introduction
Lesson 2 - Basic Object Oriented Programming
Lesson 3 - Case Objects and Classes
Lesson 4 - Collections
Lesson 5 - Idiomatic Scala
Step 4: Big Data Hadoop and Spark Developer
Simplilearn’s Big Data Hadoop Training Course helps you master Big Data
and Hadoop Ecosystem tools, such as HDFS, YARN, MapReduce, Hive, Impala,
Pig, HBase, Spark, Flume, Sqoop, and Hadoop Frameworks, along with other
concepts of the Big Data processing life cycle. Throughout this online
instructor-led Hadoop Training, you will work on real-time projects in
retail, tourism, finance, and other domains. This Big Data Course also
prepares you for Cloudera’s CCA175 Big Data certification.
Key Learning Objectives
• Learn how to navigate the Hadoop Ecosystem and understand how to optimize its use
• Ingest data using Sqoop, Flume, and Kafka
• Implement partitioning, bucketing, and indexing in Hive
• Work with RDD in Apache Spark
• Process real-time streaming data
• Perform DataFrame operations in Spark using SQL queries
• Implement User-Defined Functions (UDF) and User-Defined Aggregate Functions (UDAF) in Spark
Course curriculum
Lesson 1 - Introduction to Big Data and Hadoop
Lesson 2 - Hadoop Architecture Distributed Storage (HDFS) and YARN
Lesson 3 - Data Ingestion into Big Data Systems and ETL
Lesson 4 - Distributed Processing: MapReduce Framework and Pig
Lesson 5 - Apache Hive
Lesson 6 - NoSQL Databases: HBase
Lesson 7 - Basics of Functional Programming and Scala
Lesson 8 - Apache Spark: Next-Generation Big Data Framework
Lesson 9 - Spark Core: Processing RDD
Lesson 10 - Spark SQL: Processing DataFrames
Lesson 11 - Spark MLlib: Modelling Big Data with Spark
Lesson 12 - Stream Processing Frameworks and Spark Streaming
Lesson 13 - Spark GraphX
Step 5: Python for Data Science
Kickstart your learning of Python for Data Science with this introductory
course and familiarize yourself with programming. Carefully crafted by
IBM, this course will enable you to write your own Python scripts, perform
fundamental hands-on data analysis using the Jupyter-based lab
environment, and create your own Data Science projects using IBM Watson.
Key Learning Objectives
• Write your first Python program by implementing concepts of variables, strings, functions, loops, and conditions
• Understand the nuances of lists, sets, dictionaries, conditions and branching, and objects and classes
• Work with data in Python, including reading and writing files, and loading, working with, and saving data with Pandas
Course curriculum
Lesson 1 - Python Basics
Lesson 2 - Python Data Structures
Lesson 3 - Python Programming Fundamentals
Lesson 4 - Working with Data in Python
Lesson 5 - Working with NumPy Arrays
Step 6: PySpark Training
PySpark Training will provide an in-depth overview of Apache Spark, the
open-source query engine for processing large datasets, and how to
integrate it with Python using the PySpark interface. The course will show
you how to build and implement data-intensive applications as you dive
into the world of high-performance machine learning leveraging Spark
RDD, Spark SQL, Spark MLlib, Spark Streaming, HDFS, Sqoop, Flume,
Spark GraphX, and Kafka.
Key Learning Objectives
• Understand how to leverage the functionality of Python as you deploy it in the Spark ecosystem
• Master Apache Spark architecture and how to set up a Python environment for Spark
• Learn about various techniques for collecting data, understand RDDs and how they contrast with DataFrames, and learn how to read data from files and HDFS and how to work with schemas
• Obtain comprehensive knowledge of various tools that fall under the Spark ecosystem, such as Spark SQL, Spark MLlib, Sqoop, Kafka, Flume, and Spark Streaming
• Create and explore various APIs to work with Spark DataFrames, and learn how to aggregate, transform, filter, and sort data with DataFrames
Course curriculum
Lesson 1 - A Brief Primer on PySpark
Lesson 2 - Resilient Distributed Datasets
Lesson 3 - Resilient Distributed Datasets and Actions
Lesson 4 - DataFrames and Transformations
Lesson 5 - Data Processing with Spark DataFrames
Step 7: Big Data and Hadoop Administrator
This Big Data and Hadoop Administrator training course will equip you with
the skills and methodologies necessary to excel in the Big Data Analytics
industry. With this Hadoop Admin training, you’ll learn to work with the
adaptable, versatile frameworks based on the Apache Hadoop ecosystem,
including Hadoop installation and configuration, and cluster management
with Sqoop, Flume, Pig, Hive, Impala, and Cloudera. You’ll learn Big Data
implementations that have security, speed, and scale.
Key Learning Objectives
• Understand the fundamentals and characteristics of Big Data and various scalability options available to help manage huge quantities of data
• Master the concepts of the Hadoop framework, including architecture, Hadoop distributed file system, and deployment of Hadoop clusters using core or vendor-specific distributions
• Use Cloudera Manager for setup, deployment, maintenance, and monitoring of Hadoop clusters
• Use Hadoop clients, client nodes, and web interfaces like HUE to work with a Hadoop cluster
• Use cluster planning and tools for data ingestion into Hadoop clusters, and cluster monitoring activities
• Understand security implementation to secure data and clusters
Course curriculum
Lesson 1 - Big Data and Hadoop Introduction
Lesson 2 - Hadoop Distributed File System (HDFS)
Lesson 3 - Hadoop Cluster Setup and Working
Lesson 4 - Hadoop Configurations and Daemon Logs
Lesson 5 - Hadoop Cluster Maintenance and Administration
Lesson 6 - Hadoop Computational Frameworks
Lesson 7 - Scheduling: Managing Resources
Lesson 8 - Hadoop Cluster Planning
Lesson 9 - Hadoop Clients and Hue Interface
Lesson 10 - Data Ingestion in Hadoop Cluster
Lesson 11 - Hadoop Ecosystem Components and Services
Lesson 12 - Hadoop Security
Lesson 13 - Hadoop Cluster Monitoring
Step 8: MongoDB Developer and Administrator
Become an expert MongoDB developer and administrator by gaining
an in-depth knowledge of NoSQL and mastering skills of data modeling,
ingestion, query, sharding, and data replication. The course includes
industry-based projects in e-learning and telecom domains. It is best
suited for database administrators, software developers, system
administrators, and analytics professionals.
Key Learning Objectives
• Develop expertise in writing Java and Node.js applications using MongoDB
• Master the skills of replication and sharding of data in MongoDB to optimize read/write performance
• Perform installation, configuration, and maintenance of a MongoDB environment
• Get hands-on experience in creating and managing different types of indexes in MongoDB for query execution
• Proficiently store unstructured data in MongoDB
• Develop skill sets in processing huge amounts of data using MongoDB tools
• Gain proficiency in MongoDB configuration, backup methods, and monitoring and operational strategies
• Acquire an in-depth understanding of managing DB nodes, replica sets, and master-slave concepts
Course curriculum
Lesson 1 - Introduction to NoSQL databases
Lesson 2 - MongoDB: A Database for the Modern Web
Lesson 3 - CRUD Operations in MongoDB
Lesson 4 - Indexing and Aggregation
Lesson 5 - Replication and Sharding
Lesson 6 - Developing Java and Node.js Applications with MongoDB
Lesson 7 - Administration of MongoDB Cluster Operations
Step 9: Apache Cassandra
This Apache Cassandra certification training will develop your expertise
in working with the high-volume Cassandra database management system
as part of the Big Data Hadoop framework. With this Cassandra training,
you will learn Cassandra concepts, features, architecture, and data
model, and how to install, configure, and monitor the open-source database.
The Cassandra course is ideal for software developers and analytics
professionals who wish to further their careers in the Big Data field.
Key Learning Objectives
• Describe the need for Big Data and NoSQL
• Explain the fundamental concepts of Cassandra
• Describe the architecture of Cassandra
• Demonstrate data model creation in Cassandra
• Use Cassandra database interfaces
• Demonstrate Cassandra database configuration
Course curriculum
Lesson 1 - Introduction to Big Data and NoSQL Databases
Lesson 2 - Introduction to Cassandra
Lesson 3 - Architecture of Cassandra
Lesson 4 - Installation and Configuration of Cassandra
Lesson 5 - Cassandra Data Model
Lesson 6 - Cassandra Interfaces
Lesson 7 - Advanced Architecture and Cluster Management
Lesson 8 - Hadoop Ecosystem around Cassandra
Step 10: Apache Spark and Scala
This Apache Spark and Scala certification training is designed to advance
your expertise working with the Big Data Hadoop Ecosystem. You will master
essential skills of the Apache Spark open-source framework and the Scala
programming language, including Spark Streaming, Spark SQL, machine
learning programming, GraphX programming, and Spark shell scripting. This
Scala and Spark certification course will give you vital skill sets and a
competitive advantage for an exciting career as a Hadoop Developer.
Key Learning Objectives
• Understand the limitations of MapReduce and the role of Spark in overcoming these limitations
• Understand the fundamentals of the Scala programming language and its features
• Explain and master the process of installing Spark as a standalone cluster
• Develop expertise in using Resilient Distributed Datasets (RDD) for creating applications in Spark
• Master Structured Query Language (SQL) using SparkSQL
• Gain a thorough understanding of Spark Streaming features
• Master and describe the features of Spark ML programming and GraphX programming
Course curriculum
Lesson 1 - Introduction to Spark
Lesson 2 - Introduction to Programming in Scala
Lesson 3 - Using RDD for Creating Applications in Spark
Lesson 4 - Running SQL Queries Using Spark SQL
Lesson 5 - Spark Streaming
Lesson 6 - Spark ML Programming
Lesson 7 - Spark GraphX Programming
Elective Course
Scala for Data Science
This course will let you flex your Scala skills for data
preparation, feature engineering, creating data pipelines,
and solving Big Data analytics problems. You will learn to
leverage the integration of Apache Spark and Scala and
how to use Spark’s machine learning pipelines to fit models
and search for optimal hyperparameters using Scala in a
Spark cluster.
Spark for Scala Analytics
Through this course you will get an overview of the
history of Apache Spark, how it evolved, how to build
applications with Spark, RDDs and DataFrames, and Spark
and its associated ecosystems. It will teach you to
leverage the core RDD and DataFrame APIs to perform
analytics on datasets with Scala.
Simplifying Data Pipelines with Apache Kafka
Apache Kafka is an open-source stream processing
platform and a high-performance real-time messaging
system that can process millions of messages per
second. This Kafka training course curated by IBM
will guide you through Kafka architecture, installation,
interfaces, and configuration on your way to learning the
advanced concepts of Big Data. It will give you hands-on
experience connecting Kafka to Spark and working with
Kafka Connect.
Certificates
Upon completion of this Master’s Program, you will receive certificates
from IBM and Simplilearn for the Big Data Engineer courses in the learning path.
These certificates will testify to your skills as an expert in Data Engineering.
Upon program completion, you will also receive an industry-recognized Master’s
Certificate from Simplilearn.
Advisory board member
Ronald Van Loon
Top 10 Big Data & Data Science Influencer,
Director - Adversitement
Named by Onalytica as one of the three most
influential people in Big Data, Ronald is also an
author for a number of leading Big Data and
Data Science websites, including Datafloq, Data
Science Central, and The Guardian. He also
regularly speaks at renowned events.
USA
Simplilearn Americas, Inc.
201 Spear Street, Suite 1100, San Francisco, CA 94105
United States
Phone No: +1-844-532-7688
INDIA
Simplilearn Solutions Pvt Ltd.
# 53/1 C, Manoj Arcade, 24th Main, Harlkunte
2nd Sector, HSR Layout
Bangalore - 560102
Call us at: 1800-212-7688
www.simplilearn.com