[go: up one dir, main page]

0% found this document useful (0 votes)
166 views11 pages

ADF Course Content

This document provides an overview of Azure data engineering concepts including Extract, Transform, Load (ETL) processes, data warehouses, Azure data services, roles of Azure Data Factory and data engineers. It describes common data types, formats, and triggers. It also outlines scenarios for using Azure Data Factory to move and transform data between various Azure data storage services like Azure SQL, Azure Synapse, Azure Data Lake, and Azure Blob storage. Key activities in ADF pipelines including copy, lookup, filter, and data flows are explained.

Uploaded by

anil
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
166 views11 pages

ADF Course Content

This document provides an overview of Azure data engineering concepts including Extract, Transform, Load (ETL) processes, data warehouses, Azure data services, roles of Azure Data Factory and data engineers. It describes common data types, formats, and triggers. It also outlines scenarios for using Azure Data Factory to move and transform data between various Azure data storage services like Azure SQL, Azure Synapse, Azure Data Lake, and Azure Blob storage. Key activities in ADF pipelines including copy, lookup, filter, and data flows are explained.

Uploaded by

anil
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 11

Introduction to Azure Data Engineer

o Importance of ETL
o What is OLTP
o Limitation of OLTP Systems
o What is Data warehouse
o Limitations of traditional data warehouse
o What is ETL(Extract Transformation and Loading)
o What is ELT(Extract Loading and Transformation)
o ETL Tools in the Market
o Importance of Transformation
o What is ADF
o Why ADF
o Advantages compared to traditional ETL tools
o Different Modern Data warehouse systems in Market
Introduction to the Azure Data Engineering Products
o Azure SQL
o Azure Synapse
o Azure Data Lake
o Azure Data Factory
o Azure Databricks
o Azure Keyvault
o Azure Cosmos DB
Roles and Responsibilities of Azure Data Factory and Data Engineer
o Different types of data

▪ Structured

▪ Unstructured

▪ Semi-Structured

o Different Formats of Data


▪ CSV

▪ JSON

▪ Parquet

▪ AVRO

▪ ORC

o Different types of Triggers

▪ Scheduled Triggers

▪ Tumbling Window Triggers

▪ Event based Triggers

o Connectors overview
o Data Transformations using Mapping Data flows
Introduction to Azure Data Factory
o Difference between ADF version1 and version2
o Building Blocks of ADF

▪ Pipelines

▪ Activities

▪ Datasets

▪ Linked Services

▪ Integration Runtimes

● Azure Auto Integration Runtime


● Self Hosted Integration Runtime
● SSIS Integration Runtime
Introduction to Azure Portal
o How to create Azure Account
o Azure subscriptions
o What are resources and Resource Groups
o Why multiple resource groups are needed
o What are regions in azure
o Region Advantages
Azure Storage Introduction
o Blob Storage
o Data Lake Storage
o Differences between Data Lake Gen1 and Gen2
o Storage Redundancy options

▪ LRS(Locally Redundant Storage)

▪ ZRS(Zone Redundant Storage)

▪ GRS(Geo Redundant Storage)

▪ GZRS(Geo Zone Redundant Storage)

o Difference between blob storage and data lake gen2


o File Shares
o Queues
o Table Storage
o Load the data from Azure blob storage to Power BI
o Load the data from Azure Data Lake to Power BI
o Different Tiers in Azure Storage

▪ Hot

▪ Cool

▪ Achieve

o Storage Life Cycle Management


o RBAC (Role Based Access Control)
o ACL(Access Control Lists)
o Shared Access Signatures
o Access Keys
Azure Data Factory Practical's
o Create Azure Data Factory
o Create Azure SQL Resource
o Connect Azure SQL to SSMS
o Create first ADF pipeline
o Adding Client IP to Firewall settings
o Allow Azure Services in Firewall settings
 
Connectors overview
o Different data source connectors
o How to check connectors options
Copy Activity
o Genera Options
o Source Dataset in copy activity
o Sink Dataset in copy activity
o Mapping
o Settings
o User Properties
Scenario1
o Copy the file from one container to another container from same storage
account
o Copy the file from one storage account container to another storage account
container
Scenario2
o How to copy the data from Azure SQL to Azure Data Lake Gen2
o How to load the data into Azure SQL Table from Azure Data Lake Gen2
o How to delete the files on successful copy of entire folder contents

▪ Delete Activity Overview

● General
● Source
● Logging Settings
● User Properties
Scenario3
o How copy the files with some pattern to another container

▪ Get Metadata Activity Configure to the container

● Field List Options


● Child Items
● Exists
● Item Name
● Item Type
● Last Modified
o How to Get the File Size and delete the files with size greater than 4mb

▪ Configure Get Meta Data to the file

● Field List Options

▪ AND OR Functions in Expression Language

Scenario4
o Conditional Split with if condition

▪ Used activities

● Get Metadata
● Filter
● Foreach
● If Condition
 
Scenario5
o Send an email on Pipeline Failure

▪ Web Activity

● Settings

▪ Logic Apps Integration

Scenario6
o Migrate all tables of database into datalake with single pipeline

▪ Lookup Activity --> Foreach Activity --> Copy Activity --> with

Parameters
Scenario7
o How to apply more than one conditional split in ADF Activities

▪ Switch Activity

Scenario8
o How to run the pipeline until file arrives in Datalake Container

▪ Until Activity

Scenario9
o How to load the filtered data to the azure sql table/Synapse table

▪ Stored Procedure activity


● Pass output of previous activity to the stored procedure
parameters
Scenario10
o Perform Transformation using script activity

▪ Script Activity

Scenario11
o Use cases of Execute pipeline activity
o Use cases of Validation Activity
o Use cases of Wait Activity
o Use cases of Set and Append Variable Activity
Recap of all the activities
o Copy Data, Append Variable, Delete, Execute Pipeline, Get Metadata,
Lookup, Stored Procedure, Script, Set Variable, Validation, Web Activity, Wait,
Filter Activity, For Each Activity, If Condition, Switch, Until
Parameterization
o Parameterize datasets, Linked Services
o Bulk load of data from Azure SQL to Data Lake Storage
Data Flows Introduction
o ELT(Extract Load and Transform)
o Two DataFlows in ADF
o DataFlow Activity in Pipeline
o Mapping Data Flows
o Configuring Mapping Dataflows as Dataflow Activity in PipeLine
o DataFlow Debug Explanation
o Introduction to Transformations

▪ Source

▪ Sink

▪ Union

▪ Filter

▪ Select

▪ Derived Column
▪ Join, etc.

o Difference between source , sink of “Copy Data” activity And source , sink of
“Mapping Data Flows”.
Dataflow Scenario1
o Source as (datalake) Sink as (SQL table) --> by using dataflow source and
sink transformation
Extract data from RDBMS to Datalake --> Apply Filter
o Filter Transformation (Source --> Filter --> Sink) Execute with Dataflow
Activity
Data Flow Scenario on Select Transformation
o Select Transformation

▪ Renaming the columns

▪ Droping the columns

▪ Reordering the columns

▪ Flow

● Source --> Select --> Sink (Execute with Mapping Dataflow)


Derived Column use case
o Derived column Transformation

▪ Clean existing column data

▪ Generate new column by using expression language

▪ Generate Hierarchal Columns

▪ Type casting using derived column transformation

▪ Columns condition

▪ Column pattern

Union Transformation use cases


o Combining one or more datasets with same schema and write output to single
file
o Combining one or more datasets with different schema and write output to
single file
o Union by

▪ Name

▪ Position

o Select with union


Branching Transformation and its use cases
Exists Transformation
o Exists transformation and its use cases

▪ Exists with equality condition

▪ Exists with non-equality condition

● Inconsistent results handling with non-equality conditions


● Fixed broadcast settings
o Exists with Custom Expression
Lookup transformation and its use cases
o Overview of lookup transformation
o Lookup using

▪ Any row

▪ First row

▪ Last row

▪ Match Multiple rows

o Lookup with non-equality conditions


Conditional split transformation and its use cases
o Distribute data into to multiple datasets based on given condition
o Split datasets on

▪ First Matching Condition

▪ All Matching Condition

Join Transformation and its use cases


o Full outer
o Inner
o Left Outer
o Right Outer
o Custom Cross
Aggregate Transformation and its use cases
o Removing duplicates using aggregate
o Single grouping with single aggregation
o Single grouping with multiple aggregation
o Multi groping with multiple aggregation
Pivot Transformation
o What is pivot transformation
o Difference between Aggregate transformation and Pivot Transformation 
o Implement pivot transformation in dataflow
o How to clean pivot output
o How to call multiple dataflows  in a single pipeline 
o Why we used multiple dataflows in one single pipeline
Unpivot Transformation
o Unpivot transformation
o What should be the input for Unpivot (pivoted output file).
o 3 configurations :

▪ Ungrouping column

▪ Unpivoted column (column names to as column values)

▪ aggregated column expression (which row aggregated values to be

turned as column values)


o Difference between output of “aggregate” and “unpivot” transformation
o What additionally unpivot produces
o Use case of Unpivot
Surrogate Key Transformation
o Surrogate Key transformation
o Why we should use Surrogate key
o Configuring Starting Integer Number for Surrogate Key
o Scenario of Surrogate Key
Rank Transformation
o Rank Transformation in dataflows
o Why sorting data is required for Rank transformation
o Sorting options
▪ Ascending

▪ Descending”

o When to use “Ascending” option for Ranking


o When to use “Descending” option for Ranking
o Dense Rank and How it Works
o Non Dense Rank (Normal Rank) and How it Works
o Why we should not use “surrogate key” for Ranking
o What is difference between Dense Rank and Non Dense Rank(Normal Rank)
Window Transformation
o Over
o Sort
o Range by
o Window columns
o Cumulative Average
o Cumulative Sum
o Cumulative Max
o Dense Rank for each partition.
o Making all rows as Single Window and apply Cumulative Aggregation

Stringify Transformation
o Reading Json data
o Reading a single Json record
o Reading array of documents 
o Converting complex types into String using Stringify and derived column
transformations
Parse Transformation
o Parse transformation in Mapping DataFlows
o How to handle and parse string collection( delimited string values )
o How to handle and parse xml data
o How to handle and parse json data
Flatten Transformation
o Converting Complex Json nested structures into CSV/Text file.
o Complex data processing (transformations).
o How parse nested Json records
o How to flatten array of values into multiple rows.
o If data has complex structures, what are supportive data store formats .
o How to write into json format.
o How to write into flatten file format( csv).
o Transformations used to process data.

▪ Parse

▪ Flatten

▪ Derived Column

▪ Select

▪ Sink

Implementation of Incremental data loading using ADF Activities


Implementation of SCD (Slowly Changing Dimensions)
o SCD Type 1
o SCD Type 2
o SCD Type 3
o SCD Type 4
o SCD Type 6
Data Loading to Synapse from Azure Data Lake
o Copy Command
o Polybase
o Bulk Insert
o Upsert
ADD ON:
● Real-time Project Explanation
● Resume Preparation
● Interview Questions Discussions

You might also like