Azure Data Engineering Course Content:
Azure Data Factory (ADF)
1. Overview of Azure Data Factory
o What is ADF, its use cases, and real-world applications
o Key components: Linked Services, Datasets, Pipelines, Triggers
2. Building Pipelines
o Creating ETL pipelines from scratch
o Data movement and transformation activities: Copy Data, Mapping Data Flows
o Integration with Blob Storage, SQL Database
3. Parameterization and Dynamic Content
o Using parameters, variables, and expressions
o Dynamic Linked Services and Datasets
4. Pipeline Control Flow
o Conditional execution, ForEach loops, and failure handling
5. Monitoring and Debugging
o Debugging pipelines in real time
o Monitoring pipeline runs and understanding metrics
6. Integration with Other Azure Services
o Using Databricks notebooks in ADF pipelines
o Moving data to/from Azure Synapse Analytics
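As a taste of the parameterization and control-flow topics above, here is a minimal sketch of a pipeline definition built as a plain Python dict and serialized to JSON. The field names follow the ADF pipeline JSON format as commonly documented, but treat them as something to verify against the official reference. The pipeline, activity, and dataset names (CopyBlobToSql, BlobCsvDataset, SqlTableDataset) are hypothetical.

```python
import json


def build_copy_pipeline(pipeline_name, source_dataset, sink_dataset):
    """Return an ADF-style pipeline definition as a plain dict (illustrative only)."""
    copy_activity = {
        "name": "CopyOneFile",  # hypothetical activity name
        "type": "Copy",
        "inputs": [{
            "referenceName": source_dataset,
            "type": "DatasetReference",
            # @item() resolves to the current element of the enclosing ForEach.
            "parameters": {"fileName": "@item()"},
        }],
        "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
        "typeProperties": {
            "source": {"type": "DelimitedTextSource"},
            "sink": {"type": "AzureSqlSink"},
        },
    }
    for_each = {
        "name": "ForEachFile",  # hypothetical activity name
        "type": "ForEach",
        "typeProperties": {
            # Dynamic content: the list to loop over comes from a pipeline parameter.
            "items": {"value": "@pipeline().parameters.fileNames", "type": "Expression"},
            "activities": [copy_activity],
        },
    }
    return {
        "name": pipeline_name,
        "properties": {
            "parameters": {"fileNames": {"type": "Array"}},
            "activities": [for_each],
        },
    }


pipeline = build_copy_pipeline("CopyBlobToSql", "BlobCsvDataset", "SqlTableDataset")
pipeline_json = json.dumps(pipeline, indent=2)
```

Printing pipeline_json gives a document of the same shape as the one shown in the code view of the ADF authoring UI, which makes it easier to see how parameters, expressions, and the ForEach activity fit together.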
Azure Databricks
1. Overview of Azure Databricks
o Databricks architecture and ecosystem
o Setting up workspace: Clusters, Notebooks, and Libraries
2. Introduction to Apache Spark
o Core Spark concepts: RDDs, DataFrames, and Datasets
o Introduction to Delta Lake and its benefits
3. Optimization Techniques
o Understanding shuffling, partitioning, and caching
o Optimizing queries with broadcast joins
4. Structured Streaming
o Processing real-time data streams
o Handling streaming data with Delta Lake
5. Integration with ADF and Synapse
o Connecting Databricks to other Azure services
o Writing processed data to Synapse and Blob Storage
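To make the broadcast-join topic from the optimization module concrete, the following is a rough sketch in plain Python (not the Spark API). It assumes the large table is already split into partitions and the small table fits in memory. The function name and sample data are hypothetical.

```python
def broadcast_hash_join(large_partitions, small_rows, key):
    """Inner-join each partition of a large table against a small in-memory table."""
    # Build the lookup table once from the small side. In Spark, this is the
    # copy that would be shipped ("broadcast") to every executor.
    lookup = {}
    for row in small_rows:
        lookup.setdefault(row[key], []).append(row)

    joined_partitions = []
    for partition in large_partitions:
        # Each partition is joined locally; rows of the large table never move,
        # so no shuffle of the large side is needed.
        joined = []
        for row in partition:
            for match in lookup.get(row[key], []):
                joined.append({**match, **row})
        joined_partitions.append(joined)
    return joined_partitions


# Hypothetical sample data: two partitions of orders and a small dimension table.
orders = [
    [{"order_id": 1, "country": "IN"}, {"order_id": 2, "country": "US"}],
    [{"order_id": 3, "country": "IN"}, {"order_id": 4, "country": "FR"}],
]
countries = [
    {"country": "IN", "name": "India"},
    {"country": "US", "name": "United States"},
]
result = broadcast_hash_join(orders, countries, "country")
```

In PySpark, the equivalent hint is broadcast() from pyspark.sql.functions, applied to the smaller DataFrame inside a join, for example large_df.join(broadcast(small_df), "country").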
Azure Synapse Analytics
1. Overview of Azure Synapse Analytics
o Synapse architecture: Serverless SQL pool (formerly SQL on-demand) and Dedicated SQL pool
o Key concepts: Partitioning, Indexing, and Data Distribution
2. Data Integration
o Loading data from ADF and Databricks to Synapse
o Working with external tables and PolyBase
3. Performance Tuning
o Query optimization techniques in Synapse
o Managing table partitions and statistics
4. Security and Monitoring
o Implementing Row-Level Security (RLS)
o Monitoring Synapse performance and job activity
5. End-to-End Project
o Design and implement a complete ETL pipeline connecting ADF, Databricks, and Synapse
o Perform data transformations, aggregations, and reporting using Synapse
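The data distribution concept listed in the Synapse overview can be previewed with a small simulation. A dedicated SQL pool spreads each table across 60 distributions, and a hash-distributed table assigns rows by hashing the distribution column. The sketch below uses MD5 as a stand-in, since the real hash function is internal to the service. The column names and sample rows are hypothetical.

```python
import hashlib
from collections import Counter

NUM_DISTRIBUTIONS = 60  # a dedicated SQL pool splits each table into 60 distributions


def distribution_for(value, num_distributions=NUM_DISTRIBUTIONS):
    """Map a column value to a distribution number (MD5 is an assumption, not Synapse's hash)."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_distributions


def rows_per_distribution(rows, column):
    """Count how many rows land in each distribution for a given distribution column."""
    return Counter(distribution_for(row[column]) for row in rows)


# Hypothetical table: 10,000 rows, 1,000 distinct customers, only two regions.
rows = [
    {"customer_id": i % 1000, "region": "EU" if i % 10 else "US"}
    for i in range(10_000)
]

by_customer = rows_per_distribution(rows, "customer_id")  # high-cardinality column
by_region = rows_per_distribution(rows, "region")         # low-cardinality column
```

Comparing the two counters shows why the choice of distribution column matters: the high-cardinality customer_id spreads rows over many distributions, while region can occupy at most two of them, leaving most of the pool idle and the rest skewed.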