PART-3
LEARN
AZURE DATA
FACTORY (ADF)
C. R. Anil Kumar Reddy
www.linkedin.com/in/chenchuanil
Before we dive into Control Flow and Transformational Activities, let
us discuss an important activity called the Copy Activity.
Copy Activity
The Copy Activity in Azure Data Factory (ADF) is one of the core
activities used for data movement. Its primary function is to copy data
from a source to a destination, supporting a variety of on-premises,
cloud-based, and SaaS data sources.
Purpose of Copy Activity
Data Movement: It moves data from a source to a destination
without making significant changes. It can handle structured, semi-
structured, and unstructured data.
ETL/ELT Processes: It serves as the Extract and Load phases in
ETL (Extract, Transform, Load) or ELT (Extract, Load, Transform)
pipelines, where it moves raw data to a data lake or a data
warehouse for further transformation or processing.
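As a rough illustration of the points above, a Copy Activity is described in pipeline JSON by a source, a sink, and input/output dataset references. The sketch below builds such a definition as a Python dict and prints it as JSON. The activity and dataset names are hypothetical, and the source/sink pairing is just one assumed combination.

```python
import json

# Hypothetical names; the two datasets are assumed to be defined in the factory.
copy_activity = {
    "name": "CopyRawOrders",
    "type": "Copy",
    "inputs": [{"referenceName": "OnPremOrdersTable", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "LakeOrdersParquet", "type": "DatasetReference"}],
    "typeProperties": {
        # Extract: read rows from a SQL Server table.
        "source": {"type": "SqlServerSource"},
        # Load: write them as Parquet files, with no transformation in between.
        "sink": {"type": "ParquetSink"},
    },
}

print(json.dumps(copy_activity, indent=2))
```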
Use Cases of Copy Activity
ETL/ELT Pipelines: Copy Activity moves raw data from operational databases
to a data warehouse or lake for further processing.
Cloud Migration: It helps move large volumes of data from on-premises
systems to cloud-based storage or databases.
Data Synchronization: Copy Activity can synchronize data between multiple
systems (e.g., databases and data lakes).
Summary
The Copy Activity in Azure Data Factory is a versatile tool for moving data from
various sources to destinations. It handles data transfers efficiently, supports
multiple data formats, and offers extensive control over performance, fault
tolerance, and monitoring. It plays a crucial role in the data ingestion and
preparation phases of data processing pipelines in ADF.
Here is a detailed explanation of each control flow and
transformation activity in Azure Data Factory:
Control Flow Activities
1. Execute Pipeline Activity
Purpose: This activity allows you to invoke another pipeline within your
primary pipeline. It helps with modularizing and reusing pipelines, improving
manageability.
Use Case: When you need to break down complex data processes into
smaller, reusable pipelines, or when managing dependencies across
pipelines.
Configuration: Specify the pipeline to execute, pass any necessary
parameters, and configure settings like wait-for-completion or timeout
values.
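A minimal sketch of what this configuration might look like in pipeline JSON, written as a Python dict. The pipeline name `LoadCustomers` and the `runDate` parameter are assumptions made for illustration.

```python
import json

# Hypothetical child pipeline and parameter names.
execute_pipeline = {
    "name": "RunLoadCustomers",
    "type": "ExecutePipeline",
    "typeProperties": {
        "pipeline": {"referenceName": "LoadCustomers", "type": "PipelineReference"},
        # Forward the parent's parameter to the child pipeline.
        "parameters": {
            "runDate": {"value": "@pipeline().parameters.runDate", "type": "Expression"}
        },
        # True: the parent waits for the child to finish before continuing.
        "waitOnCompletion": True,
    },
}

print(json.dumps(execute_pipeline, indent=2))
```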
2. Lookup Activity
Purpose: Used to retrieve data from a specified dataset. The
result can be a single row or a list of rows, and it’s commonly
used to get configuration values or check if data exists.
Use Case: When you need to fetch metadata, configuration
details, or conditionally trigger activities based on the retrieved
data.
Configuration: Specify the dataset and the source query or table.
Enable the "First row only" option to return a single row, or
disable it to return an array of rows for downstream activities.
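The sketch below shows one way a Lookup definition could look, plus a simplified Python imitation of the two output shapes that the `firstRowOnly` setting selects between. The dataset, query, and column names are hypothetical, and `lookup_output` is an illustrative helper, not part of ADF.

```python
import json

# Hypothetical dataset and query.
lookup_activity = {
    "name": "LookupConfig",
    "type": "Lookup",
    "typeProperties": {
        "source": {
            "type": "AzureSqlSource",
            "sqlReaderQuery": "SELECT tableName FROM dbo.LoadConfig",
        },
        "dataset": {"referenceName": "ConfigTable", "type": "DatasetReference"},
        "firstRowOnly": True,
    },
}

def lookup_output(rows, first_row_only=True):
    """Imitate the shape of the activity output (simplified)."""
    if first_row_only:
        return {"firstRow": rows[0]}
    return {"count": len(rows), "value": rows}

rows = [{"tableName": "orders"}, {"tableName": "customers"}]
single = lookup_output(rows)                      # read via output.firstRow
many = lookup_output(rows, first_row_only=False)  # read via output.value

print(json.dumps(lookup_activity, indent=2))
print(single)
print(many)
```

A later activity would then read the result with an expression such as `@activity('LookupConfig').output.firstRow.tableName`.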
3. Filter Activity
Purpose: Filters a list of items based on a specified condition.
Use Case: When you want to limit the data processed to only
those items that meet a certain condition.
Configuration: Provide an input array (the items to filter) and
define the filter condition using expressions.
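As an illustration, the Filter definition below keeps only `.csv` entries from a file listing, followed by the equivalent logic in plain Python. The upstream activity name `ListFiles` and the sample file names are assumptions.

```python
# Hypothetical: 'ListFiles' is assumed to be an upstream Get Metadata activity.
filter_activity = {
    "name": "KeepCsvFiles",
    "type": "Filter",
    "typeProperties": {
        "items": {
            "value": "@activity('ListFiles').output.childItems",
            "type": "Expression",
        },
        # item() refers to the element currently being tested.
        "condition": {"value": "@endsWith(item().name, '.csv')", "type": "Expression"},
    },
}

# The same idea in plain Python, on made-up sample data.
child_items = [
    {"name": "a.csv", "type": "File"},
    {"name": "b.json", "type": "File"},
    {"name": "c.csv", "type": "File"},
]
kept = [item for item in child_items if item["name"].endswith(".csv")]
print([item["name"] for item in kept])
```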
4. Iteration Activities (ForEach)
Purpose: To iterate over a collection of items and execute
activities for each item.
Use Case: Processing a list of files, rows, or other items,
either one by one or in parallel batches.
Configuration: Provide the list or array of items to iterate,
choose sequential or parallel execution, and define the activities
to execute for each iteration.
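A hedged sketch of a ForEach definition that runs a child pipeline once per file name. The parameter `fileNames`, the child pipeline `LoadOneFile`, and the batch size are all hypothetical.

```python
import json

# Hypothetical parameter and child pipeline names.
for_each = {
    "name": "ForEachFile",
    "type": "ForEach",
    "typeProperties": {
        # False: iterations may run in parallel, up to batchCount at a time.
        "isSequential": False,
        "batchCount": 4,
        "items": {"value": "@pipeline().parameters.fileNames", "type": "Expression"},
        "activities": [
            {
                "name": "LoadOneFile",
                "type": "ExecutePipeline",
                "typeProperties": {
                    "pipeline": {"referenceName": "LoadOneFile", "type": "PipelineReference"},
                    # item() is the current element of the collection.
                    "parameters": {"fileName": {"value": "@item()", "type": "Expression"}},
                },
            }
        ],
    },
}

print(json.dumps(for_each, indent=2))
```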
5. Iteration Activities (Until)
Purpose: Executes activities in a loop until a specific
condition is met.
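Use Case: Scenarios where processing needs to repeat
until a status or value reaches a target, such as polling
a job until it finishes.
Configuration: Define the exit condition, a timeout, and the
activities that will be executed repeatedly. The condition
is checked after each pass, so the activities run at least
once.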
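The sketch below pairs an assumed Until definition with a small Python imitation of its loop behaviour (run the body, then test the exit condition). The variable `jobStatus` and the helper `run_until` are hypothetical; a real loop would also need an inner activity that updates the variable.

```python
# Hypothetical variable name; the inner Wait only pauses between checks.
until_activity = {
    "name": "WaitUntilJobDone",
    "type": "Until",
    "typeProperties": {
        "expression": {
            "value": "@equals(variables('jobStatus'), 'Done')",
            "type": "Expression",
        },
        "timeout": "0.01:00:00",  # give up after one hour
        "activities": [
            {"name": "PauseBetweenChecks", "type": "Wait",
             "typeProperties": {"waitTimeInSeconds": 30}},
        ],
    },
}

def run_until(body, is_done, max_iterations=100):
    """Run body, then test is_done; stop when it is true or the cap is hit."""
    iterations = 0
    while True:
        body()
        iterations += 1
        if is_done() or iterations >= max_iterations:
            return iterations

state = {"checks": 0}

def poll():
    # Stand-in for the inner activities: count how often we were called.
    state["checks"] += 1

iterations = run_until(poll, lambda: state["checks"] >= 3)
print(iterations)
```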
6. Get Metadata Activity
Purpose: Retrieves metadata (like file size, last modified
date, or column count) from a dataset.
Use Case: Checking the properties of files, tables, or other
data sources before further processing.
Configuration: Specify the dataset and the list of
metadata fields you want to retrieve (like file name, size,
or schema).
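For illustration, a Get Metadata definition might look like the dict below. The dataset name is hypothetical, and the field list is one assumed selection from the fields the activity supports.

```python
import json

# Hypothetical dataset pointing at a single file.
get_metadata = {
    "name": "GetFileInfo",
    "type": "GetMetadata",
    "typeProperties": {
        "dataset": {"referenceName": "InputFile", "type": "DatasetReference"},
        # Metadata fields to return in the activity output.
        "fieldList": ["itemName", "size", "lastModified", "exists"],
    },
}

print(json.dumps(get_metadata, indent=2))
```

A later activity could then read a value with an expression such as `@activity('GetFileInfo').output.size`.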
7. Validation Activity
Purpose: Validates whether the dataset exists or meets
certain conditions (e.g., non-empty files).
Use Case: To ensure a dataset is available and valid before
proceeding with further pipeline activities.
Configuration: Select the dataset and configure the
check, such as how long to wait for the file to exist and
a minimum file size.
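A minimal sketch of a Validation definition, assuming a file-based dataset. The dataset name and the timing values are hypothetical; the size check here uses a minimum size in bytes.

```python
import json

# Hypothetical dataset; the values below are illustrative only.
validation = {
    "name": "WaitForInputFile",
    "type": "Validation",
    "typeProperties": {
        "dataset": {"referenceName": "InputFile", "type": "DatasetReference"},
        "timeout": "0.00:10:00",  # stop waiting after ten minutes
        "sleep": 30,              # seconds between checks
        "minimumSize": 1,         # bytes; rejects an empty file
    },
}

print(json.dumps(validation, indent=2))
```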
8. Conditional Activities (If Condition)
Purpose: Executes activities based on a true/false
expression.
Use Case: When you want to branch logic depending on
whether a condition is met.
Configuration: Define an expression to evaluate. If true,
certain activities are executed; if false, alternative
activities are triggered.
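The branching described above could be sketched as follows. The upstream activity `LookupRowCount`, its `rowCount` column, and the placeholder Wait activities are assumptions standing in for real work.

```python
import json

def placeholder(name):
    """Hypothetical stand-in for a real activity in a branch."""
    return {"name": name, "type": "Wait", "typeProperties": {"waitTimeInSeconds": 1}}

if_condition = {
    "name": "CheckRowsFound",
    "type": "IfCondition",
    "typeProperties": {
        # Must evaluate to true or false.
        "expression": {
            "value": "@greater(activity('LookupRowCount').output.firstRow.rowCount, 0)",
            "type": "Expression",
        },
        "ifTrueActivities": [placeholder("ProcessRows")],
        "ifFalseActivities": [placeholder("LogNothingToDo")],
    },
}

print(json.dumps(if_condition, indent=2))
```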
9. Conditional Activities (Switch)
Purpose: Executes different sets of activities based on
the value of an expression.
Use Case: When branching into more than two paths
based on values like status codes.
Configuration: Define the expression and configure cases
for each possible value.
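A sketch of a Switch definition with two cases and a default branch, plus a small Python helper that imitates how a branch is chosen. The parameter `status`, the case values, and `pick_branch` are hypothetical.

```python
def placeholder(name):
    """Hypothetical stand-in for a real activity in a branch."""
    return {"name": name, "type": "Wait", "typeProperties": {"waitTimeInSeconds": 1}}

switch_activity = {
    "name": "RouteByStatus",
    "type": "Switch",
    "typeProperties": {
        # The expression is expected to produce a string to match against the cases.
        "on": {"value": "@pipeline().parameters.status", "type": "Expression"},
        "cases": [
            {"value": "200", "activities": [placeholder("HandleSuccess")]},
            {"value": "404", "activities": [placeholder("HandleMissing")]},
        ],
        "defaultActivities": [placeholder("HandleOther")],
    },
}

def pick_branch(activity, value):
    """Return the activities of the matching case, else the default ones."""
    props = activity["typeProperties"]
    for case in props["cases"]:
        if case["value"] == value:
            return case["activities"]
    return props.get("defaultActivities", [])

print(pick_branch(switch_activity, "404")[0]["name"])
print(pick_branch(switch_activity, "500")[0]["name"])
```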
10. Web Activity
Purpose: Invokes an external web service.
Use Case: When you need to interact with REST APIs to
send or retrieve data.
Configuration: Specify the endpoint, headers, body, and
authentication (if needed).
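As an example of these settings, the dict below sketches a Web activity that posts a JSON message. The URL and payload are placeholders, and authentication is left out for brevity.

```python
import json

# Placeholder endpoint and payload; no authentication is configured here.
web_activity = {
    "name": "NotifyService",
    "type": "WebActivity",
    "typeProperties": {
        "url": "https://example.com/api/notify",
        "method": "POST",
        "headers": {"Content-Type": "application/json"},
        "body": {"message": "pipeline finished"},
    },
}

print(json.dumps(web_activity, indent=2))
```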
11. WebHook Activity
Purpose: Similar to the Web activity but supports long-
running tasks by calling an endpoint and waiting for that
endpoint to call back.
Use Case: Ideal for asynchronous web service calls.
Configuration: Provide the endpoint URL, method, headers,
body, and a timeout. ADF passes a callback URL to the
endpoint, which must call it when the work completes.
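A hedged sketch of a WebHook definition, with a Python line imitating the request body the endpoint would receive once ADF adds its callback address. The URL, payload, and timeout are placeholders.

```python
import json

# Placeholder endpoint and payload.
webhook_activity = {
    "name": "StartLongJob",
    "type": "WebHook",
    "typeProperties": {
        "url": "https://example.com/api/start-job",
        "method": "POST",
        "headers": {"Content-Type": "application/json"},
        "body": {"jobName": "nightly-refresh"},
        "timeout": "00:30:00",  # how long to wait for the callback
        "reportStatusOnCallBack": True,
    },
}

# At run time ADF adds a callBackUri property to the body it sends;
# the endpoint is expected to call that URI when the job finishes.
body_sent = dict(
    webhook_activity["typeProperties"]["body"],
    callBackUri="<supplied by ADF at run time>",
)

print(json.dumps(body_sent, indent=2))
```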
Transformational Activities
Purpose: These are transformations defined within mapping data
flows, which transform data as it moves through the pipeline.
Common Transformations: Includes joins, aggregations, filters,
and lookups within a data flow.
Use Case: When you need to cleanse, aggregate, or reshape data
before loading it into the final destination.
Configuration: Set up transformations using the data flow UI to
perform complex data operations.
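The transformations themselves are built in the data flow UI, but the pipeline still needs an activity that runs the data flow. A minimal sketch of that activity follows. The data flow name and compute sizing are hypothetical.

```python
import json

# Hypothetical data flow name; compute sizing is illustrative only.
data_flow_activity = {
    "name": "CleanAndAggregateSales",
    "type": "ExecuteDataFlow",
    "typeProperties": {
        # The joins, aggregations, and filters live inside this data flow.
        "dataflow": {"referenceName": "SalesCleansingFlow", "type": "DataFlowReference"},
        "compute": {"coreCount": 8, "computeType": "General"},
    },
}

print(json.dumps(data_flow_activity, indent=2))
```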
Script and Stored Procedure Activities
Script Activity:
Purpose: Executes SQL scripts against a database.
Use Case: When you need to run raw SQL commands,
such as schema updates or bulk data manipulations.
Configuration: Define the script text and specify the
linked service for the target SQL database.
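For example, a Script activity that applies a schema update might be sketched as below. The linked service name, table, and column are hypothetical.

```python
import json

# Hypothetical linked service, table, and column names.
script_activity = {
    "name": "AddLoadDateColumn",
    "type": "Script",
    "linkedServiceName": {"referenceName": "AzureSqlDb", "type": "LinkedServiceReference"},
    "typeProperties": {
        "scripts": [
            {
                # NonQuery: a statement that returns no result set (DDL/DML).
                "type": "NonQuery",
                "text": "ALTER TABLE dbo.Orders ADD LoadDate date NULL;",
            }
        ]
    },
}

print(json.dumps(script_activity, indent=2))
```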
Stored Procedure Activity:
Purpose: Executes a stored procedure in a relational
database.
Use Case: When database logic is encapsulated in stored
procedures.
Configuration: Specify the stored procedure name and
any necessary parameters.
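A minimal sketch of this configuration, assuming a procedure that takes one string parameter. The linked service, procedure name, and parameter value are hypothetical.

```python
import json

# Hypothetical linked service, procedure, and parameter.
stored_procedure_activity = {
    "name": "RunMergeOrders",
    "type": "SqlServerStoredProcedure",
    "linkedServiceName": {"referenceName": "AzureSqlDb", "type": "LinkedServiceReference"},
    "typeProperties": {
        "storedProcedureName": "dbo.usp_MergeOrders",
        # Each parameter carries a value and a type.
        "storedProcedureParameters": {
            "LoadDate": {"value": "2024-01-01", "type": "String"}
        },
    },
}

print(json.dumps(stored_procedure_activity, indent=2))
```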