[go: up one dir, main page]

0% found this document useful (0 votes)
46 views4 pages

Data Engineer Interview Question

Uploaded by

Harish Pininti
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
46 views4 pages

Data Engineer Interview Question

Uploaded by

Harish Pininti
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 4

To walk you through an end-to-end solution for migrating data from an on-premises SQL Server DB to

an Azure SQL DB using Azure Data Factory (ADF), I'll break it down into clear steps. This includes
understanding why we need a self-hosted integration runtime (IR) instead of the auto-resolve IR, and
how to configure it all correctly.

Overview:

The goal is to create a Data Pipeline in ADF that moves data from an on-prem SQL Server to an Azure
SQL Database. The main challenge here is that the auto-resolve integration runtime won’t work for on-
premises data sources because it doesn't have access to on-prem networks. Instead, you will use a self-
hosted integration runtime, which you install on a machine within your on-premises environment.

Here’s how you can approach this process:

Step 1: Set Up the Azure SQL Database

First, you need to have your Azure SQL Database set up in your Azure environment. If it's not created
yet, follow these steps:

1. In the Azure portal, navigate to SQL Databases and click + Add to create a new database.

2. Follow the prompts to create your database and server. Make sure to note down the server
name, username, and password.

Step 2: Create an Azure Data Factory (ADF) Instance

1. In the Azure portal, go to Create a resource and search for Data Factory.

2. Follow the wizard to create your ADF instance, select the region where your resources are
located, and click Create.

3. Once created, navigate to the Data Factory UI by going to the Author & Monitor section.

Step 3: Install and Configure the Self-Hosted Integration Runtime (IR)

Since you’re dealing with an on-premises SQL Server DB, you need the Self-Hosted IR to facilitate the
connection between ADF (in the cloud) and your on-prem server (on your network).

Install the Self-Hosted IR:

1. In Azure Data Factory, go to Manage (gear icon on the left) > Integration Runtimes.

2. Click + New and select Self-hosted.

3. Download the Self-Hosted IR installer from the prompt.

4. Install the software on a machine that has access to your on-prem SQL Server. This machine
should be able to connect to your on-prem SQL Server database and have outbound access to
Azure.
Configure the Self-Hosted IR:

1. During the installation, the wizard will ask for a key (provided in the ADF portal) to authenticate
and link the IR to your ADF instance.

2. After installation, verify the IR is running by checking the status in the ADF portal (under
Integration Runtimes). The status should be "Running."

The Self-hosted IR allows secure communication between your on-prem systems and the cloud. It
ensures that your data can be transferred securely through your network.

Step 4: Set Up Linked Services in ADF

Linked Services define the connections to your data sources and destinations.

1. Create a Linked Service for On-Prem SQL Server:

o Go to Manage > Linked Services > + New.

o Choose SQL Server as the connector.

o For Connection type, select On-premises and use the Self-hosted IR you installed.

o Provide the necessary connection information for your on-prem SQL Server: server
name, database name, and authentication details.

2. Create a Linked Service for Azure SQL Database:

o Similarly, create another Linked Service for your Azure SQL Database.

o Choose Azure SQL Database and enter the server, database, username, and password.

Step 5: Create the Data Pipeline in ADF

Once the linked services are set up, you can create a data pipeline to move data.

1. In ADF, go to the Author tab (pencil icon).

2. Click + New pipeline.

3. Add a Copy Data activity:

o Under Source, choose the on-prem SQL Server linked service you created.

o Configure the source dataset to point to the table(s) or data you want to move.

o Under Sink, select the Azure SQL Database linked service and configure the target
dataset (your Azure SQL database tables).

4. Mapping: You can map columns between the source (on-prem SQL Server) and the sink (Azure
SQL Database) if they have different column names or data types.
5. Set any additional options you need (like fault tolerance, data transformation, etc.).

Step 6: Test the Pipeline and Monitor the Data Transfer

1. Debug the pipeline: Before running the pipeline, you can test it by clicking Debug to ensure
everything is working properly.

2. Run the pipeline: Once you’re ready, you can trigger the pipeline manually, or you can schedule
it to run at specific intervals.

3. Monitor the pipeline: After the pipeline runs, you can check the Monitor tab in ADF to view logs
and monitor the success or failure of your data transfer.

Step 7: Automate and Schedule the Pipeline (Optional)

If you want the pipeline to run on a regular schedule (for example, daily, weekly, or at specific times), you
can:

1. Go to the Trigger section in the pipeline.

2. Click New/Edit to create a schedule or event-based trigger.

Why Use a Self-Hosted Integration Runtime?

To clarify your follow-up question: Why isn’t the auto-resolve IR sufficient for on-prem to cloud
migrations?

1. Auto-Resolve IR: This IR is used for cloud-to-cloud data movement. It automatically configures
itself and doesn’t have the capability to securely access on-premises systems.

2. Self-Hosted IR: This is required for on-premises data sources because it enables ADF to securely
interact with on-prem SQL Server or other on-prem data stores. It acts as a bridge between the
cloud and on-premises environments. The Self-hosted IR runs on a machine within your on-prem
network, which allows it to access local data sources and securely transfer the data to Azure.

Without the Self-hosted IR, Azure Data Factory wouldn’t be able to access your on-prem resources due
to networking restrictions and security concerns (firewalls, VPNs, etc.).

Summary of Key Steps:

1. Set up Azure SQL Database.

2. Create an ADF instance.

3. Install and configure the Self-hosted Integration Runtime on an on-prem server.


4. Set up linked services for both on-prem SQL Server and Azure SQL Database.

5. Create and configure a Copy Data pipeline in ADF.

6. Run and monitor the pipeline to ensure data is transferred.

7. Optionally, schedule the pipeline for automated, recurring data transfers.

This approach will help you securely move data from an on-prem SQL Server to Azure SQL Database
using Azure Data Factory.

You might also like