Azure Data Factory - Interview Concepts
Main Concepts
1. Data Factory: Cloud-based data integration (ETL/ELT) service used to create and manage data pipelines that move and transform data across sources and destinations.
2. Pipeline: A container for a group of activities that together define a workflow of data movement and transformation.
3. Activity: A single step in a pipeline, such as a Copy Activity or a Data Flow Activity.
4. Datasets: Define the structure and location of the input/output data used by activities in a pipeline; each dataset points at data through a linked service.
5. Linked Services: Connection definitions, much like connection strings, that tell ADF how to connect to external data stores and compute services (a linked service + dataset example is sketched after this list).
6. Triggers: Start pipeline runs automatically; the main types are schedule, tumbling window, and event-based triggers, and pipelines can also be run manually on demand (a schedule trigger is sketched after this list).
7. Integration Runtime (IR): The compute infrastructure used for moving and transforming data. Types: Azure IR, Self-hosted IR, Azure-SSIS IR.
8. Data Flow: A visual, no-code way to design data transformations that run at scale on managed Spark clusters.
9. Copy Activity: Moves data from a source to a sink (e.g., from Azure SQL Database to Blob Storage); see the pipeline sketch after this list.
10. Monitoring: Check pipeline and activity run status, debug failures, and view logs in the Monitor tab or programmatically (see the monitoring sketch after this list).
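
The sketches below are illustrative only and use the azure-mgmt-datafactory Python SDK. Every name (subscription ID, resource group, factory, connection string, dataset and pipeline names) is a made-up placeholder, and exact model/method names can vary slightly between SDK versions. First, a linked service (the connection) and a dataset (the data's shape and location) for a Blob Storage account:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    LinkedServiceResource, AzureStorageLinkedService, SecureString,
    DatasetResource, AzureBlobDataset, LinkedServiceReference,
)

SUBSCRIPTION_ID = "<subscription-id>"            # placeholder
RG, FACTORY = "adf-demo-rg", "adf-demo-factory"  # placeholders

client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Linked service: "how to connect" -- here, an Azure Storage account.
blob_ls = LinkedServiceResource(
    properties=AzureStorageLinkedService(
        connection_string=SecureString(value="<storage-connection-string>")  # placeholder secret
    )
)
client.linked_services.create_or_update(RG, FACTORY, "BlobStorageLS", blob_ls)

# Dataset: "what/where the data is" -- a blob reachable via that linked service.
blob_ds = DatasetResource(
    properties=AzureBlobDataset(
        linked_service_name=LinkedServiceReference(
            type="LinkedServiceReference", reference_name="BlobStorageLS"
        ),
        folder_path="raw/sales",   # placeholder container/folder
        file_name="sales.csv",     # placeholder file
    )
)
client.datasets.create_or_update(RG, FACTORY, "SalesBlob", blob_ds)
```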
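Next, a minimal pipeline containing a single Copy Activity, reusing client, RG, and FACTORY from the previous sketch. It assumes two blob datasets named SalesBlob (source) and SalesBlobCurated (sink) already exist; the same pattern applies to SQL-to-Blob copies with the corresponding source/sink types.

```python
from azure.mgmt.datafactory.models import (
    PipelineResource, CopyActivity, DatasetReference, BlobSource, BlobSink,
)

# One activity: copy whatever the source dataset points at into the sink dataset.
copy_act = CopyActivity(
    name="CopySalesBlobToCurated",
    inputs=[DatasetReference(type="DatasetReference", reference_name="SalesBlob")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="SalesBlobCurated")],
    source=BlobSource(),
    sink=BlobSink(),
)

# The pipeline is just a named container for that activity (plus optional parameters).
pipeline = PipelineResource(activities=[copy_act], parameters={})
client.pipelines.create_or_update(RG, FACTORY, "CopySalesPipeline", pipeline)
```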
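A schedule (time-based) trigger that runs the pipeline once a day; tumbling-window and event triggers use their own model classes. Again a rough sketch reusing client, RG, and FACTORY from above.

```python
from datetime import datetime, timedelta
from azure.mgmt.datafactory.models import (
    TriggerResource, ScheduleTrigger, ScheduleTriggerRecurrence,
    TriggerPipelineReference, PipelineReference,
)

# Recurrence: daily, starting shortly from now.
recurrence = ScheduleTriggerRecurrence(
    frequency="Day",
    interval=1,
    start_time=datetime.utcnow() + timedelta(minutes=15),
    time_zone="UTC",
)
trigger = TriggerResource(
    properties=ScheduleTrigger(
        recurrence=recurrence,
        pipelines=[
            TriggerPipelineReference(
                pipeline_reference=PipelineReference(
                    type="PipelineReference", reference_name="CopySalesPipeline"
                ),
                parameters={},
            )
        ],
    )
)
client.triggers.create_or_update(RG, FACTORY, "DailySalesTrigger", trigger)

# Triggers are created in a stopped state and must be started explicitly.
client.triggers.begin_start(RG, FACTORY, "DailySalesTrigger").result()  # .start() on older SDKs
```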
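Finally, monitoring: kick off an on-demand run, poll the pipeline run, and query its activity runs for status and errors (the same information the Monitor tab shows in the portal). Reuses client, RG, and FACTORY from above.

```python
import time
from datetime import datetime, timedelta
from azure.mgmt.datafactory.models import RunFilterParameters

# Manual (on-demand) execution returns a run id we can poll.
run = client.pipelines.create_run(RG, FACTORY, "CopySalesPipeline", parameters={})

# Poll until the run leaves the Queued/InProgress states.
while True:
    pipeline_run = client.pipeline_runs.get(RG, FACTORY, run.run_id)
    if pipeline_run.status not in ("Queued", "InProgress"):
        break
    time.sleep(15)
print("Pipeline run finished:", pipeline_run.status)

# Drill into activity-level runs to troubleshoot failures.
filters = RunFilterParameters(
    last_updated_after=datetime.utcnow() - timedelta(hours=1),
    last_updated_before=datetime.utcnow() + timedelta(hours=1),
)
activity_runs = client.activity_runs.query_by_pipeline_run(
    RG, FACTORY, run.run_id, filters
)
for act in activity_runs.value:
    print(act.activity_name, act.status, act.error)
```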
Sample Interview Questions
1. What is Azure Data Factory?
2. What is the difference between a pipeline and an activity?
3. How do you schedule a pipeline in ADF?
4. What are linked services and datasets in ADF?
5. What is Integration Runtime and why is it important?
6. What is the difference between Azure IR and Self-hosted IR?
7. How do you monitor or troubleshoot a failed pipeline?
8. Explain how Copy Activity works.
9. What types of triggers are available in ADF?
10. Can ADF be used for both batch and real-time data integration?