ADF Project - 1

The document outlines steps to move raw data from on-premise to Azure using Azure Data Factory (ADF). It involves: 1. Using an ADF pipeline to move raw data to an Azure data lake with raw, processed, and refined layers. 2. Using data flows to deduplicate data and move it from raw to processed layers, then join data and move it to the refined layer. 3. Moving some refined data to Azure Synapse Analytics for analysis.


Steps:

From the on-premise system we will move the raw data to the cloud using an ADF pipeline.

Here we have 3 layers in Data Lake Storage Gen2: Raw, Processed, and Refined.
Data flows will read the data from the raw layer and check for duplicates and other data-quality issues. If the data is good, it is moved to the processed layer.

From the processed layer we will join some of the data and move it to the refined layer.

From the refined layer, some of the data will move to Azure Synapse Analytics.

Create the tables, including a metadata (control) table that lists the source tables and their load status, as sketched below.
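
The notes do not show the table definition, but the later lookup query (status <> 'succeeded') and the success/failure stored procedures imply a control table that tracks each source table and its load status. A minimal sketch, with assumed names (dbo.TableMetadata, SourceSchema, SourceTable, Status):

-- Hypothetical control table; table and column names are assumptions.
CREATE TABLE dbo.TableMetadata
(
    Id            INT IDENTITY(1,1) PRIMARY KEY,
    SourceSchema  NVARCHAR(128) NOT NULL,   -- passed to the parameterized source dataset
    SourceTable   NVARCHAR(128) NOT NULL,
    Status        NVARCHAR(20)  NOT NULL DEFAULT 'Ready',  -- Ready / Succeeded / Failed
    LastRunTime   DATETIME2     NULL
);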

Create the stored procedures that update the load status (success and failure) in the metadata table.
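
A minimal sketch of the status-update logic, assuming the control table above. The notes describe two procedures (success and failure); here they are folded into one parameterized procedure only to keep the sketch short, and all names are assumptions:

-- Hypothetical procedure called by the pipeline after each copy.
-- The notes use separate success/failure procedures; a @Status parameter
-- is used here purely for brevity.
CREATE PROCEDURE dbo.usp_UpdateLoadStatus
    @SourceSchema NVARCHAR(128),
    @SourceTable  NVARCHAR(128),
    @Status       NVARCHAR(20)      -- 'Succeeded' or 'Failed'
AS
BEGIN
    UPDATE dbo.TableMetadata
    SET Status      = @Status,
        LastRunTime = SYSUTCDATETIME()
    WHERE SourceSchema = @SourceSchema
      AND SourceTable  = @SourceTable;
END;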

Install the self-hosted integration runtime (IR) so that ADF can reach the on-premise SQL Server.

Create the connection strings (2 linked services) for SQL and Azure Blob Storage. Here we need to use the Key Vault option.

Create data sets

Create the pipeline – add a Lookup activity – write a query: select * from table where status <> 'succeeded' (a version against the assumed control table is sketched below).
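
Against the assumed control table above, the Lookup query would look roughly like this:

-- Pick up every table that has not yet loaded successfully.
-- dbo.TableMetadata is the assumed control table sketched earlier.
SELECT SourceSchema, SourceTable
FROM dbo.TableMetadata
WHERE Status <> 'Succeeded';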

Pass this output to a ForEach activity and, inside it, use a Copy activity – create a new source dataset and add parameters (source schema and source table).
Then, for the sink, create a dataset referring to Data Lake Storage Gen2.

Then pass these values to the success and failure stored procedures. Along with the failure stored procedure we need to add a Web activity.

Create a Logic App (Logic App Designer – HTTP request trigger – Gmail – send email).

In this scenario an alert is raised every time a single iteration fails, so to overcome this one option is to place the Web activity after the ForEach activity.

Another option is to add both the stored procedure and the Web activity after the ForEach.

Explanation

1. The Lookup activity reads all the file/table names from the metadata table.


2. The ForEach activity copies each table to the destination; depending on whether the copy succeeded or failed, the corresponding stored procedure (success stored procedure or failure stored procedure) updates the status against the metadata table.
3. If the entire ForEach succeeds, another stored procedure resets the status to ready (a sketch follows below). If anything fails, the Web activity executes.
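
A minimal sketch of the reset procedure from step 3, again assuming the control table above (the procedure name is an assumption):

-- Hypothetical procedure run after the whole ForEach succeeds,
-- resetting every row so the next run picks all tables up again.
CREATE PROCEDURE dbo.usp_ResetLoadStatus
AS
BEGIN
    UPDATE dbo.TableMetadata
    SET Status = 'Ready';
END;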

So the data is now in the Raw folder.

Open a data flow – select the source and sink – add a Window transformation – turn on Data Flow Debug and, once it is ready, import the schema.

In the data flow, the Window transformation is used to check for duplicates: select the key columns and add a row number over them. Then add a Filter with the condition row number equals 1, followed by a Select transformation to remove the row number column (the SQL sketch below shows the same logic).
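
The data flow does this visually with a Window, Filter, and Select transformation; the same dedup logic expressed in SQL (table, key, and ordering columns are assumptions) makes the intent clearer:

-- SQL equivalent of the Window + Filter + Select dedup pattern:
-- number the rows within each business key and keep only the first one.
WITH Numbered AS
(
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY CustomerId          -- assumed business key
               ORDER BY LastModifiedDate DESC   -- assumed ordering column
           ) AS rn
    FROM raw.Customers                          -- assumed raw-layer table
)
SELECT CustomerId, CustomerName, LastModifiedDate   -- row number column removed, like the Select step
FROM Numbered
WHERE rn = 1;                                       -- Filter: row number equals 1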

Then add the sink – create a dataset pointing to the processed layer.

Create a pipeline and add the data flow that was just created.

After doing this for all the tables, create a processed-to-refined pipeline – create a new data flow, add two tables, and join them (an illustrative SQL version of the join follows).
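
The notes do not name the two tables being joined, so the following is only an illustrative SQL equivalent of the processed-to-refined join performed by the data flow's Join transformation (all table and column names are assumptions):

-- Illustrative processed-to-refined join; names are assumptions.
SELECT  o.OrderId,
        o.OrderDate,
        o.Amount,
        c.CustomerName,
        c.Region
FROM processed.Orders    AS o
JOIN processed.Customers AS c
  ON o.CustomerId = c.CustomerId;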

Publish all the changes

Then create a pipeline for loading the refined data into Azure Synapse Analytics.

Add a Copy activity.
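
The Copy activity performs the load itself; purely as a hedged illustration of the end result, an equivalent manual load into a hypothetical Synapse table using the COPY statement might look like this (table name, storage account, path, and file format are all assumptions):

-- Hypothetical manual equivalent of the ADF Copy activity into Synapse.
COPY INTO dbo.RefinedSales
FROM 'https://<storageaccount>.dfs.core.windows.net/refined/sales/*.parquet'
WITH (
    FILE_TYPE  = 'PARQUET',
    CREDENTIAL = (IDENTITY = 'Managed Identity')
);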

Then create a new pipeline as the master pipeline, add all the pipelines to it, and then add a trigger.
