Data Engineering - Curriculum

The document outlines the performance and learning outcomes for a data engineering course, covering essential skills such as SQL, ETL design patterns, NoSQL, Pyspark, and cloud fundamentals. It details the course structure, including training segments, delivery methods, and estimated durations for each module. Additionally, it specifies the assessment methods and knowledge levels required for various topics within the curriculum.

Uploaded by 211b059
Copyright © All Rights Reserved

Performance and Learning Outcomes
(Each entry lists S No, sub track, performance outcome, learning outcome, and level: Awareness, Skill, or Knowledge.)

1-3. SQL
Performance outcome: Should be able to write basic queries following best practices.
Learning outcome: Should be able to understand database operations and the various database manipulations (DML, DDL, DQL, DCL, TCL) using SQL, and should be able to query using joins and subqueries.
Level: Skill

4. Data Warehouse
Performance outcome: Should be able to understand the need for a data warehouse (DW).
Learning outcome: Should be able to understand the basics of data warehousing and its architecture, and the basics of data marts and the Operational Data Store.
Level: Knowledge

5. ETL - Basics
Performance outcome: Should understand different ETL design patterns.
Learning outcome: Should understand different ETL patterns in Spark, such as the Lambda architecture.
Level: Knowledge

6. Unix shell scripting
Performance outcome: Should be able to navigate the system and perform the basic tasks needed to execute scripts.
Learning outcome: Should be able to use basic commands of the UNIX/Windows operating system.
Level: Skill

7. Unix shell scripting
Performance outcome: Should be able to analyze and understand existing shell scripts and to develop simple shell scripts.
Learning outcome: Should be able to understand, develop, and execute basic UNIX shell scripts.
Level: Skill

8. NoSQL
Performance outcome: Should be able to understand the concept of NoSQL.
Learning outcome: Should be able to describe NoSQL concepts.
Level: Knowledge

9. PySpark
Performance outcome: Should have very good knowledge and understanding of PySpark.
Learning outcome: Should have a clear understanding of handling data with Apache Spark SQL and Streaming, and of transformations.
Level: Skill

10. Spark
Performance outcome: Should have very good knowledge and understanding of Spark.
Learning outcome: Should have a clear understanding of the Spark execution model, the Structured API, and DataFrames.
Level: Skill

11. Data Preprocessing in Python
Performance outcome: Should have very good knowledge and understanding of data preprocessing in Python.
Learning outcome: Should have a clear understanding of data cleaning and transformations of numerical features.
Level: Skill

12. Cloud fundamentals
Performance outcome: Should be able to understand the concept of cloud fundamentals.
Learning outcome: Should be able to describe the concepts in cloud fundamentals.
Level: Knowledge

13. Azure Data Factory
Performance outcome: Should be able to understand the concept of Azure Data Factory.
Learning outcome: Should be able to describe the concepts in Azure SQL, Azure Blob Storage, Azure Data Factory, and Azure Synapse Analytics.
Level: Knowledge

14. AWS
Performance outcome: Should be able to understand the concept of AWS.
Learning outcome: Should be able to describe the concepts in AWS S3 and AWS Glue.
Level: Knowledge

15. Data Analytics & Data Science techniques and tools
Performance outcome: Should be able to understand the concepts of data analytics and data science techniques and tools.
Learning outcome: Should be able to describe the concepts in data analytics and data science techniques and tools.
Level: Knowledge

16. Machine learning algorithms & Predictive models
Performance outcome: Should be able to understand the concepts of machine learning algorithms and predictive models.
Learning outcome: Should be able to describe the concepts in machine learning algorithms and predictive models.
Level: Knowledge
Course Structure (durations in hours)

Course segment | Training segment | Stage | Course/module | Delivery method | Duration | Enablement assurance level | Assessment assurance level
Fundamentals | Enablement | Stage 1 | ANSI SQL | Self-Learning, Doubt Clarification & | 40 | Skill | Skill
Fundamentals | Enablement | Stage 1 | DBMS & Data Modeling Techniques | eLearning | 7 | Skill | Knowledge
Fundamentals | Assessment | Stage 1 | SQL Assessment | Practice Assessment | 4 | Skill | Skill
Fundamentals | Enablement | Stage 1 | NoSQL | Self-Learning & Doubt Clarification | 12 | Knowledge | Knowledge
Fundamentals | Enablement | Stage 1 | DW Basics | Self-Learning & Doubt Clarification | 6 | Knowledge | Knowledge
Fundamentals | Enablement | Stage 1 | ETL Concepts | Self-Learning & Doubt Clarification | 9 | Knowledge | Knowledge
Fundamentals | Enablement | Stage 1 | Unix file commands & scripting | Self-Learning, Doubt Clarification & | 20 | Skill | Knowledge
Assessment | Assessment | Stage 1 | Qualifier Assessment | - | 16 | Skill | Skill
Behavioural | Enablement | Stage 1 & 2 | Behavioural | ILT | 39 | Skill | Skill
Gen AI | Enablement | Stage 2 | Fundamentals of Gen AI | ILT & eLearning | 4 | Knowledge | Knowledge
Data Engineering | Enablement | Stage 2 | PySpark, Spark for beginners | Self-Learning, Doubt Clarification & | 60 | Skill | Skill
Data Engineering | Assessment | Stage 2 | PySpark, Spark | Practice Assessment | 1 | Knowledge | Knowledge
Data Engineering | Enablement | Stage 2 | Data Preprocessing in Python | Self-Learning, Doubt Clarification & | 26 | Skill | Skill
Data Engineering | Assessment | Stage 2 | Python coding basics | Practice Assessment | 1 | Knowledge | Knowledge
Evaluation | Interim Evaluation | Stage 2 | Interim Evaluation | Evaluation | 18 | Skill | Skill
Data Engineering | Enablement | Stage 2 | Cloud fundamentals, Azure Data Factory, AWS S3, AWS Glue | Self-Learning & Doubt Clarification | 40 | Knowledge | Knowledge
Data Engineering | Assessment | Stage 2 | AWS | Practice Assessment | 2 | Knowledge | Knowledge
Data Engineering | Enablement | Stage 2 | Data Analytics & Data Science techniques and tools | Self-Learning & Doubt Clarification | 5 | Knowledge | Knowledge
Data Engineering | Enablement | Stage 2 | Machine learning algorithms & Predictive models | Self-Learning & Doubt Clarification | 16 | Knowledge | Knowledge
Data Engineering | Project | Stage 2 | Project Case study | ILT | 80 | Skill | Skill
Evaluation | Final Evaluation | Stage 2 | Final Evaluation | Evaluation | 24 | Skill | Skill

Total: 430 hours
Total duration (in days): 53.75
Total duration (in weeks): 10.75
Understanding ANSI SQL: Table of Contents
Module Name: Understanding ANSI SQL
Coverage of Each Module (estimated durations in minutes: eLearning / hands-on / total)

1. Understanding SQL (30 / 0 / 30)
   1. Understanding ANSI SQL (15)
   2. ANSI SQL Data Types (15)
2. DDL, DML, DQL, DCL, TCL (60 / 120 / 180)
   1. Data Definition Language: CREATE, ALTER, RENAME, DROP, TRUNCATE
   2. Data Manipulation Language: INSERT, UPDATE, DELETE
   3. Data Query Language: SELECT, FETCH FIRST
   4. Data Control Language: GRANT, REVOKE
   5. Transaction Control Language: COMMIT, SAVEPOINT, ROLLBACK
   6. Case study
3. Understanding Constraints and their Types (60 / 180 / 240)
   1. Data Integrity
   2. Integrity Constraints
   3. Entity integrity
   4. PRIMARY KEY constraint
   5. Sequence generators
   6. Referential integrity
   7. FOREIGN KEY constraint
   8. Domain integrity
   9. NOT NULL constraint
   10. UNIQUE KEY constraint
   11. CHECK constraint
   12. User-defined integrity
   13. Enabling and disabling constraints
   14. Case study
4. SQL Operators (60 / 240 / 300)
   1. SQL operators and their types
   2. Arithmetic operators
   3. Comparison operators
   4. Logical operators
   5. Set operators
   6. Case study
5. SQL Functions (60 / 240 / 300)
   1. ANSI (SQL:1999) SQL functions classification
   2. Deterministic and nondeterministic functions
   3. Aggregate functions and scalar functions
   4. String functions, mathematical functions
   5. Miscellaneous functions (COALESCE and NULLIF)
   6. Nesting of functions and SQL expressions
   7. Case study
6. Clauses in SQL (30 / 240 / 270)
   1. GROUP BY clause
   2. HAVING clause
   3. ORDER BY clause
   4. Order of execution of clauses in a SELECT statement
   5. Case study
7. Joins and their Types (120 / 480 / 600)
   1. JOIN and JOIN style
   2. Theta style
   3. ANSI style: JOIN ... ON and JOIN ... USING
   4. CROSS JOIN
   5. INNER JOIN
   6. EQUI-JOIN
   7. NATURAL JOIN
   8. OUTER JOIN
   9. LEFT OUTER JOIN
   10. RIGHT OUTER JOIN
   11. FULL OUTER JOIN
   12. SELF JOIN
   13. Case study
8. Sub-queries (60 / 320 / 380)
   1. Understanding subqueries
   2. Advantages of subqueries
   3. Rules of subqueries
   4. Using subqueries with SELECT, INSERT, UPDATE, DELETE
   5. Subquery types
   6. Scalar subquery
   7. Single-row subquery
   8. Multiple-row subquery
   9. Usage of IN, NOT IN, ALL, ANY, and SOME
   10. Correlated subqueries
   11. Usage of EXISTS, NOT EXISTS
   12. Difference between correlated and non-correlated subqueries
   13. Case study
9. Views and Indexes (40 / 60 / 100)
   1. Database objects
   2. What is a view?
   3. Advantages of views
   4. Inline view
   5. What is an index?
   6. Index architecture: non-clustered and clustered
   7. Unique index
   8. Case study

Total duration in minutes: 520 / 1880 / 2400
Total duration in hours: 9 / 31 / 40
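Several topics in this module can be practised without a database server. The following is a minimal sketch that assumes Python 3 and its built-in `sqlite3` module. The `dept` and `emp` tables and their rows are hypothetical sample data. The sketch covers:

- constraints: PRIMARY KEY, NOT NULL, UNIQUE, CHECK, and FOREIGN KEY;
- GROUP BY with HAVING;
- a LEFT OUTER JOIN;
- a correlated subquery with EXISTS.

SQLite does not implement every ANSI feature listed above. For example, it uses LIMIT rather than FETCH FIRST. Treat it as a practice engine, not a reference implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when this is on

# Hypothetical sample schema exercising several constraint types.
conn.executescript("""
CREATE TABLE dept (
    dept_id   INTEGER PRIMARY KEY,
    dept_name TEXT NOT NULL UNIQUE
);
CREATE TABLE emp (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    salary  INTEGER CHECK (salary > 0),
    dept_id INTEGER REFERENCES dept(dept_id)
);
INSERT INTO dept VALUES (1, 'Sales'), (2, 'Engineering'), (3, 'HR');
INSERT INTO emp VALUES
    (1, 'Asha', 50000, 1),
    (2, 'Ben',  65000, 2),
    (3, 'Chen', 70000, 2),
    (4, 'Dana', 45000, 1);
""")

# GROUP BY + HAVING: departments whose average salary exceeds 50000.
# Logical processing order: FROM/JOIN -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY.
high_avg = conn.execute("""
    SELECT d.dept_name, AVG(e.salary) AS avg_salary
    FROM dept d
    JOIN emp e ON e.dept_id = d.dept_id
    GROUP BY d.dept_name
    HAVING AVG(e.salary) > 50000
    ORDER BY d.dept_name
""").fetchall()

# LEFT OUTER JOIN: every department with its headcount, including empty ones.
headcount = conn.execute("""
    SELECT d.dept_name, COUNT(e.emp_id) AS n
    FROM dept d
    LEFT OUTER JOIN emp e ON e.dept_id = d.dept_id
    GROUP BY d.dept_name
    ORDER BY d.dept_name
""").fetchall()

# Correlated subquery: the inner query refers to the outer row's d.dept_id.
staffed = conn.execute("""
    SELECT d.dept_name
    FROM dept d
    WHERE EXISTS (SELECT 1 FROM emp e WHERE e.dept_id = d.dept_id)
    ORDER BY d.dept_name
""").fetchall()

print(high_avg)   # departments passing the HAVING filter
print(headcount)  # HR appears with a count of 0
print(staffed)    # departments that have at least one employee
```

Conceptually, the correlated subquery is evaluated once per `dept` row. That per-row dependence is what separates it from the non-correlated subqueries in topic 8.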
DBMS & Data Modeling: Table of Contents
Module Name: DBMS & Data Modeling
Coverage of Each Module (estimated durations in minutes: theory / eLearning / total; no hands-on time is allotted)

1. Introduction to Database Systems (60 / 0 / 60)
   1. Need for a database (10)
   2. File-based systems (10)
   3. Define database and DBMS (15)
   4. Features of the DBMS (15)
   5. Usage of databases (10)
2. DBMS Architecture (30 / 0 / 30)
   1. Three-level architecture of DBMS (10)
   2. Functions of database systems (5)
   3. Overall system architecture (15)
3. Types of Databases (30 / 0 / 30)
   1. Structure of data (10)
   2. Process of data access in the various data models (20)
5. Overview of Data Model (0 / 20 / 20)
   1. Role of the data model in application development (10)
   2. Benefits of a data model (10)
6. Categories of Data Model (0 / 20 / 20)
   1. OLTP (10)
   2. Dimensional modeling (10)
7. Stages of Data Model (0 / 20 / 20)
   1. Conceptual modeling
   2. Logical modeling
   3. Physical modeling
8. Logical Data Model Contents (0 / 20 / 20)
   1. Entity
   2. Attribute
   3. Relationship
   4. Notation
   5. Keys: PK, FK, AK, etc.
9. Demo on Erwin Tool (50 / 0 / 50)
   1. Creating entities and attributes (25)
   2. Creating different types of relationships (25)
10. Converting Logical to Physical Model (50 / 0 / 50)
   1. Steps for logical to physical data model conversion (25)
   2. Physical model: primary keys and constraints (25)
11. Requirement Analysis (0 / 10 / 10)
   1. The goals of requirement analysis
   2. Points to keep in mind for requirement analysis
   3. Structured data modeling process
12. Normalization and De-Normalization (0 / 60 / 60)
   1. Why normalization? (10)
   2. Normal forms: First Normal Form (1NF) (10)
   3. Second Normal Form (2NF) (10)
   4. Third Normal Form (3NF) (10)
   5. Boyce-Codd Normal Form (BCNF) (10)
   6. Why do we need to de-normalize? (10)
   7. Pros and cons of de-normalization
13. Specialization & Generalization (50 / 0 / 50)
   1. What is specialization and generalization? (20)
   2. Why do we need specialization and generalization? (10)
   3. Rollup and rolldown concepts (20)

Total duration in minutes: 270 / 150 / 420
Total duration in hours: 4.5 / 2.5 / 7
NoSQL: Table of Contents
Coverage of Each Module (estimated durations in minutes: theory / hands-on / total)

1. NoSQL (540 / 0 / 540)
   1. Introduction to NoSQL and MongoDB (30)
   2. Importing and exporting data (30)
   3. MongoDB Query Language (120)
   4. Updating documents (30)
   5. The Aggregation Framework (180)
   6. Variables in aggregation expressions (30)
   7. Schema validation and data modelling (30)
   8. Indexes and performance (60)
   9. MongoDB drivers (Python) (30)

Total duration in hours: 9
DW Basics: Table of Contents
Module Name: DW Basics
Coverage of Each Module (estimated durations in minutes: theory + demo / hands-on / total)

1. Introduction and Architecture (60 / 0 / 60)
   1. What is an operational system? (20)
   2. Characteristics of operational systems (10)
   3. Need for a separate informational system (20)
   4. Information center (10)
2. Basics of Data Warehouse (60 / 0 / 60)
   1. Data warehouse: definition (10)
   2. Data warehouse: features, data, business benefits, application areas (15)
   3. Basic data warehouse architecture and implementation (25)
   4. Data warehouse: differences from operational systems (10)
3. Data Marts (60 / 0 / 60)
   1. Data marts: overview (10)
   2. Data marts: needs (5)
   3. Data marts: features (5)
   4. Data marts: types (20)
   5. Advantages of data marts (5)
   6. Disadvantages of data marts (5)
   7. Data warehouse vs data mart (10)
4. Operational Data Store (60 / 0 / 60)
   1. Operational Data Store definition (10)
   2. ODS: needs (5)
   3. ODS: data (5)
   4. ODS: benefits (5)
   5. ODS: update schedule (5)
   6. ODS vs data warehouse (15)
   7. What is OLAP? (10)
   8. OLAP terminology (5)
5. Enterprise Data Warehouse (60 / 0 / 60)
   1. Enterprise Data Warehouse (EDW) (20)
   2. EDW: “top-down” approach (20)
   3. EDW: “bottom-up” approach (20)
6. Data Warehouse Case Study (0 / 60 / 60)
   1. Case study: Store Data Warehouse

Total duration in minutes: 300 / 60 / 360
Total duration in hours: 5 / 1 / 6
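A tiny star schema shows the split between fact and dimension tables and how OLAP-style summarization works on it. This is a minimal sketch that uses Python's built-in `sqlite3`. The `dim_store`, `dim_date`, and `fact_sales` tables are hypothetical and are only loosely modeled on the store case study in topic 6. The query rolls the facts up from a (store, day) grain to a (region, month) grain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables hold descriptive attributes.
CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, store_name TEXT, region TEXT);
CREATE TABLE dim_date  (date_key  INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
-- The fact table holds numeric measures plus a foreign key to each dimension.
CREATE TABLE fact_sales (
    store_key INTEGER REFERENCES dim_store(store_key),
    date_key  INTEGER REFERENCES dim_date(date_key),
    amount    REAL
);
INSERT INTO dim_store VALUES (1, 'Store A', 'North'), (2, 'Store B', 'North'), (3, 'Store C', 'South');
INSERT INTO dim_date  VALUES (20240105, '2024-01-05', '2024-01'), (20240210, '2024-02-10', '2024-02');
INSERT INTO fact_sales VALUES
    (1, 20240105, 100.0), (2, 20240105, 150.0),
    (3, 20240105, 80.0),  (1, 20240210, 120.0);
""")

# Roll up from the (store, day) grain of the fact table to (region, month).
rows = conn.execute("""
    SELECT s.region, d.month, SUM(f.amount) AS total
    FROM fact_sales f
    JOIN dim_store s ON s.store_key = f.store_key
    JOIN dim_date  d ON d.date_key  = f.date_key
    GROUP BY s.region, d.month
    ORDER BY s.region, d.month
""").fetchall()
print(rows)
```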
ETL Concepts: Table of Contents
Module Name: ETL Concepts
Coverage of Each Module (estimated durations in minutes: theory / hands-on / total)

1. Introduction to ETL Concepts (150 / 0 / 150)
   1. What is ETL? (20)
   2. ETL architecture (20)
   3. Transformation options (20)
   4. ETL standards (20)
   5. ETL and metadata (20)
   6. Fact and dimension tables (20)
   7. SCD Types I/II/III (30)
2. ETL for the Data Warehouse (110 / 0 / 110)
   1. Data sourcing / changed data capture (20)
   2. Data transport (20)
   3. Data staging (20)
   4. Changed data determination (20)
   5. Loading normalized warehouse structures (30)
3. ETL for the Data Mart (130 / 0 / 130)
   1. Surrogate key lookup and assignment (20)
   2. Slowly Changing Dimensions: Types 1, 2, 3, and 6 (30)
   3. Denormalization and its impact on ETL (20)
   4. Populating “junk” dimensions using a Cartesian product (30)
   5. Aggregation (30)
4. ETL for ODS (60 / 0 / 60)
   1. Real-time and near-real-time approaches (30)
   2. Data modeling differences (30)
5. Overview of Advanced ETL (150 / 0 / 150)
   1. Indexing (B-tree, bitmap, join indexes, etc.) (30)
   2. Forms of parallelism (30)
   3. RDBMS tuning and ETL (30)
   4. Caching/partitioning (30)
   5. ETL tools in the market and their comparison (30)

Total duration in minutes: 540 / 0 / 540 as listed (the topic totals above sum to 600)
Total duration in hours: 9 / 0 / 9 as listed (10 by the topic totals)
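Topics 1 and 3 cover Slowly Changing Dimensions and surrogate key assignment. The following is a minimal in-memory sketch of a Type 2 load. It assumes a hypothetical customer dimension that tracks a single attribute, `city`. When a record changes, the load expires the current row and inserts a new version under a fresh surrogate key, so history is preserved. By contrast, Type 1 would overwrite the row in place, and Type 3 would keep the previous value in an extra column. A production load would normally do this set-based in SQL or Spark, with a sequence or identity column generating the keys.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DimRow:
    surrogate_key: int        # warehouse-assigned key
    customer_id: str          # natural (business) key from the source system
    city: str                 # the tracked attribute
    valid_from: date
    valid_to: Optional[date]  # None means "still current"
    is_current: bool


def apply_scd2(dim: list[DimRow], customer_id: str, city: str, as_of: date) -> list[DimRow]:
    """Apply one source record to a Type 2 dimension, modifying `dim` in place."""
    # Surrogate key lookup: find the current version for this natural key.
    current = next((r for r in dim if r.customer_id == customer_id and r.is_current), None)
    if current is not None and current.city == city:
        return dim  # attribute unchanged: nothing to do
    if current is not None:
        # Expire the old version instead of overwriting it.
        current.valid_to = as_of
        current.is_current = False
    # Surrogate key assignment: next integer after the largest key so far.
    next_key = max((r.surrogate_key for r in dim), default=0) + 1
    dim.append(DimRow(next_key, customer_id, city, as_of, None, True))
    return dim


dim: list[DimRow] = []
apply_scd2(dim, "C001", "Pune", date(2024, 1, 1))
apply_scd2(dim, "C001", "Chennai", date(2024, 6, 1))   # change: new version
apply_scd2(dim, "C001", "Chennai", date(2024, 7, 1))   # no change: ignored
for row in dim:
    print(row)
```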
UNIX & Shell Scripting: Table of Contents
Module Name: UNIX & Shell Scripting
Coverage of Each Module (estimated durations in minutes: theory / hands-on / total)

1. Introduction to Unix and Basic Concepts (240 / 60 / 300)
   1. Overview of the Unix operating system, kernel, and history (60 / 0 / 60)
   2. File system basics (60 / 0 / 60)
   3. Editors (60 / 30 / 90)
   4. Unix commands (60 / 30 / 90)
2. Unix Commands and Shell Scripting (300 / 60 / 360)
   1. More Unix commands (60 / 0 / 60)
   2. Introduction to shell scripting (30 / 0 / 30)
   3. Shell variables and operators (120 / 30 / 150)
   4. Program flow control, functions, and sample shell scripts (90 / 30 / 120)
3. Advanced Shell Scripting (240 / 150 / 390)
   1. Command redirection, job control, and embedded scripts (90 / 0 / 90)
   2. Regular expressions, signals, traps, and other useful commands (90 / 90 / 180)
   3. Best practices (60 / 60 / 120)

Total duration in minutes: 780 / 270 / 1050
Total duration in hours: 13 / 4.5 / 17.5
PySpark: Table of Contents
Module Name: PySpark
Coverage of Each Module (estimated durations in minutes: theory / hands-on / total)

1. Spark (240 / 360 / 600)
   1. Introduction to Spark
   2. Transformations, actions, RDDs, Datasets
   3. Key-value methods and caching data
   4. Distribution and parallelism
   5. Spark Streaming
   6. Optimization
   7. Data exploration and analysis
   8. Transforming and cleaning unstructured data
   9. Summarizing data along dimensions
   10. Broadcast variables and accumulators
2. Handling Data with Apache Spark SQL and Streaming (240 / 360 / 600)
   1. Introduction
   2. Querying data with DataFrames
   3. Improving type safety with Datasets
   4. Processing data with the Streaming API
   5. Optimizing, Structured Streaming, and Spark 2.x

Total time duration in minutes: 480 / 720 / 1200
Total time duration in hours: 8 / 12 / 20
Spark: Table of Contents
Module Name: Spark
Coverage of Each Module (estimated durations in minutes: theory / hands-on / total)

1. Spark Programming (2400 / 0 / 2400)
   1. Introduction to Spark (60)
   2. Why do we need Spark? (60)
   3. Installing and using Apache Spark (240)
   4. Spark execution model and architecture (240)
   5. Spark programming model (240)
   6. Structured API foundation (300)
   7. Data sources and sinks (300)
   8. DataFrame and Dataset transformations (300)
   9. Aggregations in Spark (300)
   10. DataFrame joins (300)
   11. Alternatives to Spark (60)

Total estimated time duration in minutes: 2400 / 0 / 2400
Total estimated time duration in hours: 40 / 0 / 40
Data Preprocessing in Python: Table of Contents
Module Name: Data Preprocessing in Python
Coverage of Each Module (estimated durations in minutes: theory / hands-on / total)

1. Data Preprocessing in Python (360 / 900 / 1260)
   1. Data cleaning (60)
   2. Encoding of categorical features (45)
   3. Transformations of numerical features (45)
   4. Pipelines (30)
   5. Scaling (30)
   6. Principal Component Analysis (30)
   7. Filter-based feature selection (60)
   8. A complete pipeline (30)
   9. Oversampling (30)

Total estimated time duration in minutes: 360 / 900 / 1260
Total estimated time duration in hours: 6 / 15 / 21
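Two of the listed steps, encoding categorical features and scaling, reduce to simple arithmetic. The sketch below shows that arithmetic using only the Python standard library. In practice these steps are usually done with scikit-learn's `OneHotEncoder` and `StandardScaler` inside a `Pipeline`. The function names and sample values here are hypothetical.

```python
from __future__ import annotations

from statistics import fmean, pstdev


def one_hot(values: list[str]) -> tuple[list[str], list[list[int]]]:
    """Encode a categorical column as 0/1 indicator columns, one per category."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]


def standard_scale(values: list[float]) -> list[float]:
    """Rescale a numeric column to mean 0 and population standard deviation 1."""
    mu, sigma = fmean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]  # assumes sigma > 0 (not a constant column)


cats, encoded = one_hot(["red", "blue", "red", "green"])
scaled = standard_scale([10.0, 20.0, 30.0])
print(cats, encoded)
print(scaled)
```

Like `StandardScaler`, the sketch divides by the population standard deviation rather than the sample standard deviation.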
Cloud Fundamentals: Table of Contents
Coverage of Each Module (estimated durations in minutes: theory / hands-on / total)

1. Cloud Fundamentals (420 / 0 / 420)
   1. Create a Free Tier account on AWS (20)
   2. IT fundamentals (40)
   3. Cloud computing concepts (30)
   4. AWS access control and networking (60)
   5. Amazon EC2, Auto Scaling, and Load Balancing (60)
   6. AWS storage services (60)
   7. AWS database services (30)
   8. Automation and DevOps on AWS (30)
   9. DNS, caching, and performance optimization (30)
   10. Containers and serverless computing (60)

AWS Glue: Table of Contents

1. AWS Glue (210 / 0 / 210)
   1. Introduction to Glue (30)
   2. Create an ETL workflow in Glue (60)
   3. Writing custom scripts in Glue (60)
   4. Glue as metadata for Hive (60)
2. S3 (4 / 0 / 4)
   1. Core concepts of object storage
   2. S3 storage classes, lifecycle, replication

Total duration in minutes: 634 / 0 / 634
Total duration in hours: 10.57 / 0 / 10.57
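The S3 topic covers storage classes and lifecycle rules. The sketch below builds a lifecycle configuration as a plain Python dictionary, in the shape that boto3's `put_bucket_lifecycle_configuration` accepts as `LifecycleConfiguration`. It assumes a hypothetical bucket whose raw files sit under the `raw/` prefix. Those files move to Standard-IA after 30 days and to Glacier after 90 days, and they expire after 365 days. The sketch only prints the JSON. The actual API call is left as a comment because it needs boto3, credentials, and a real bucket; `my-example-bucket` is a placeholder.

```python
import json


def lifecycle_rule(prefix: str, ia_days: int, glacier_days: int, expire_days: int) -> dict:
    """Build one S3 lifecycle rule: Standard -> Standard-IA -> Glacier -> delete."""
    # S3 requires objects to stay at least 30 days in Standard before moving to Standard-IA;
    # the increasing order is this sketch's own sanity check.
    if not (30 <= ia_days < glacier_days < expire_days):
        raise ValueError("expected 30 <= ia_days < glacier_days < expire_days")
    return {
        "ID": f"tier-{prefix.rstrip('/') or 'all'}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_days, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": expire_days},
    }


config = {"Rules": [lifecycle_rule("raw/", 30, 90, 365)]}
print(json.dumps(config, indent=2))

# With boto3 installed and credentials configured, the dict could be applied like this:
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-example-bucket", LifecycleConfiguration=config)
```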
ADF, ADLS: Table of Contents
Module Name: ADF, ADLS
Coverage of Each Module (estimated durations in minutes: theory / total)

1. Introduction - Understanding Core Data Concepts (120 / 120)
   1. Introduction
   2. Data: a simple definition
   3. Introduction to structured data
   4. Introduction to non-relational data
   5. Introduction to data ingestion
   6. Introduction to data processing
   7. Batch processing vs stream processing
   8. Introduction to data analytics
2. Azure SQL - Introduction (150 / 150)
   1. Create a single instance database
   2. Create a virtual machine
   3. Authentication and authorization
   4. Understanding tables and views
   5. How to create a database diagram in SSMS
   6. Azure Cost Management: how to create a budget in Azure
3. Azure Blob Storage - Introduction (120 / 120)
   1. Introduction to Azure Storage
   2. Create an Azure Storage account
4. Azure Data Factory - Core Concepts (150 / 150)
   1. Section intro
   2. Create datasets
   3. Create pipeline and activities
   4. Create a mapping data flow and add sources
   5. Mapping Data Flow: joining sources
   6. Mapping Data Flow: aggregate data
   7. Mapping Data Flow execution
   8. Mapping Data Flow and Apache Spark execution
5. Practice Section: Build an ETL Pipeline with Azure Data Factory (240 / 240)
   1. Introduction
   2. Cost warning: data pipeline pricing
   3. Azure SQL: contained users
   4. Azure Key Vault: store SQL Server secrets
   5. Azure Key Vault: linked service
   6. Create an Azure Storage account
   7. Azure Managed Identity: create a linked service to Azure Blob Storage
   8. Azure role-based access control: grant access to the managed identity
   9. Create a dataset for the Lookup activity
   10. Azure Data Factory: Lookup activity
   11. Azure Data Factory: ForEach activity and pipeline expressions
   12. Azure Data Factory: ForEach activity, Part II
   13. Parameterize a dataset, Part I: container name
   14. Parameterize a dataset, Part II: directory name
   15. Parameterize a dataset, Part III: file name
   16. Mapping Data Flow: JSON source
   17. Mapping Data Flow: Parquet source
   18. Mapping Data Flow: JOIN and Derived Column transformations
   19. Mapping Data Flow: Aggregate transformation
   20. Mapping Data Flow: parameterized CSV file sink
   21. Azure Data Factory: store SAS in Azure Key Vault
   22. Azure Data Factory: Copy activity merge behaviour
   23. Azure Data Factory: end-to-end pipeline execution
   24. Azure Data Factory: storage event triggers
6. Azure Synapse Analytics - Serverless SQL Pool (180 / 180)
   1. Data processing: OLAP vs OLTP
   2. Azure Synapse Analytics: create a Synapse workspace
   3. Azure Synapse Analytics: Serverless SQL pool introduction
   4. Serverless SQL pool: connect with an Azure AD user and Azure Data Studio
   5. Serverless SQL pool: server-level credential
   6. OPENROWSET: read Parquet files
   7. OPENROWSET: read CSV files
   8. OPENROWSET: read JSON (line-delimited JSON)
   9. OPENROWSET: read JSON (array of objects)
   10. Serverless SQL pool: introduction to external tables
   11. Serverless SQL pool: create external table, Part I
   12. Serverless SQL pool: create external table, Part II
   13. Serverless SQL pool: create external table, Part III (how to handle dirty records)
   14. Serverless SQL pool: CETAS (CREATE EXTERNAL TABLE AS SELECT)
7. Azure Synapse Analytics - Serverless Apache Spark Pool (120 / 120)
   1. Create a serverless Apache Spark pool
   2. Scaling a serverless Apache Spark pool
   3. Azure Synapse Analytics: workspace quotas
   4. Working with Azure Data Lake Storage
   5. Working with Azure Blob Storage
   6. Working with Azure SQL
   7. Practice: configure your favorite IDE tool
8. Azure Synapse Analytics - Dedicated SQL Pool (120 / 120)
   1. Synapse: create a dedicated SQL pool
   2. Load data: COPY statement
   3. Load data: CREATE TABLE AS SELECT (CTAS)
   4. Star schema: architecture of a data warehouse
   5. Hash-distributed table
   6. Hash-distributed table: choose the distribution column
   7. Round-robin distributed table
   8. Practice: create and load data into a dedicated SQL pool
   9. Workload management: how to manage query performance
9. ADF, ADLS, Synapse Project (0 / 0)
   1. Capstone project

Total time duration in minutes: 1200
Total time duration in hours: 20
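Topic 8 covers hash-distributed tables and how to choose the distribution column. Every row with the same column value hashes to the same distribution, so the usual guidance is to pick a column with many distinct, evenly spread values. The simulation below illustrates the skew that a low-cardinality column causes. It spreads rows over the 60 distributions a dedicated SQL pool uses. Two things are assumptions: it uses MD5 as a stand-in for the pool's internal hash function, and the column values are hypothetical. It therefore shows the effect only, not the actual row placement.

```python
from __future__ import annotations

import hashlib
from collections import Counter

N_DISTRIBUTIONS = 60  # a dedicated SQL pool spreads each distributed table over 60 distributions


def distribution_of(value: str) -> int:
    """Map a distribution-column value to a distribution id (MD5 stand-in hash)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % N_DISTRIBUTIONS


def skew(values: list[str]) -> float:
    """Rows in the fullest distribution divided by the average (1.0 = perfectly even)."""
    counts = Counter(distribution_of(v) for v in values)
    average = len(values) / N_DISTRIBUTIONS
    return max(counts.values()) / average


order_ids = [f"order-{i}" for i in range(60_000)]  # high cardinality: spreads well
countries = ["IN", "US", "UK"] * 20_000            # 3 values: at most 3 distributions used

order_skew = skew(order_ids)
country_skew = skew(countries)
print(f"skew by order_id: {order_skew:.2f}")
print(f"skew by country:  {country_skew:.2f}")
```

With only three distinct values, at least 57 of the 60 distributions stay empty. Queries on such a table are then limited by the few overloaded distributions.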
Data Analytics & Data Science - Introduction: Table of Contents
Coverage of Each Module (estimated durations in minutes: theory / hands-on / total)

1. Business Analytics, Data Analytics & Data Science - Introduction (180 / 0 / 180)
   1. The basics of data-driven decision making (30)
   2. Visualizing data (10)
   3. Describing data (10)
   4. Estimation and confidence intervals (10)
   5. Hypothesis testing (10)
   6. One- and two-sample hypothesis testing (15)
   7. Correlation and regression (10)
   8. Analysis of variance (10)
   9. Guided practical demonstrations following two companies (75)

Total duration in hours: 3
Machine learning algorithms & Predictive models: Table of Contents
Coverage of Each Module (estimated durations in minutes: theory / hands-on / total)

1. Machine learning algorithms & Predictive models (540 / 0 / 540)
   1. Introduction (30)
   2. Software used in this course: RStudio and introduction to R (30)
   3. R crash course: get started with R programming in RStudio (60)
   4. Fundamentals of predictive modelling with machine learning: theory (90)
   5. Unsupervised machine learning and cluster analysis in R (90)
   6. Supervised machine learning in R: classification (60)
   7. Supervised machine learning in R: linear regression analysis (60)
   8. More types of regression models in R (60)
   9. Working with non-parametric and non-linear data (supervised machine learning) (60)

Total duration in hours: 9
