Complete Data Engineering Interview QA

The document provides a comprehensive list of interview questions and answers across various topics including Python, PySpark, SQL, and AWS. It covers both intermediate and advanced levels, detailing key concepts, code examples, and best practices. This resource is designed to help candidates prepare for data engineering interviews.

Uploaded by

tejaswini6299

Top Data Engineering Interview Questions with Answers

---

## Python Interview Questions

### Intermediate

1. **What are Python’s key features?**

**Answer:** Python is:
- Interpreted and dynamically typed: there is no separate compile step and no variable type declarations.
- Object-oriented, while also supporting procedural and functional styles.
- Readable and concise.
- Shipped with a large standard library.
- Portable across platforms and able to integrate with other languages such as C and C++.

2. **Explain list comprehension with an example.**

**Answer:** A list comprehension builds a list from an iterable in a single expression, optionally filtering with an `if` clause.
```python
nums = [x for x in range(10) if x % 2 == 0] # [0, 2, 4, 6, 8]
```

3. **What is the difference between `is` and `==`?**

**Answer:**
- `==` compares values (it calls `__eq__`).
- `is` compares object identity, that is, whether both names refer to the same object in memory.
```python
a = [1, 2]; b = [1, 2]
a == b # True
a is b # False
```

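4. **What are `*args` and `**kwargs`?**

**Answer:** They let a function accept a variable number of arguments: `*args` collects extra positional arguments into a tuple, and `**kwargs` collects extra keyword arguments into a dict.
```python
def example(*args, **kwargs):
    print(args, kwargs)

example(1, 2, a=3, b=4)  # (1, 2) {'a': 3, 'b': 4}
```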

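5. **How is memory managed in Python?**

**Answer:**
- CPython frees most objects through reference counting and uses a cyclic garbage collector (the `gc` module) to reclaim reference cycles.
- All objects live in a private heap managed by the interpreter's memory manager.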
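
The two mechanisms can be observed from the standard library. The sketch below assumes CPython; the `Node` class and variable names are made up for illustration, and the automatic collector is disabled only so the demo is deterministic.

```python
import gc
import sys
import weakref


class Node:
    """Tiny object that can point at another Node."""
    def __init__(self):
        self.other = None


gc.disable()  # keep the automatic collector out of the way for the demo

# Reference counting: a second name adds one reference, deleting it removes one.
data = []
alias = data
refs_with_alias = sys.getrefcount(data)
del alias
refs_without_alias = sys.getrefcount(data)
print(refs_with_alias - refs_without_alias)  # 1

# A reference cycle keeps both objects alive even after the names are deleted.
a, b = Node(), Node()
a.other, b.other = b, a
probe = weakref.ref(a)      # lets us check liveness without keeping `a` alive
del a, b
print(probe() is not None)  # True: the reference counts never reached zero

unreachable = gc.collect()  # the cyclic collector finds and frees the cycle
print(probe() is None)      # True
gc.enable()
```

Reference counting alone cannot free the cycle, which is why the separate collector exists.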

### Advanced

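1. **Explain Python's GIL.**

**Answer:** In CPython, the Global Interpreter Lock (GIL) allows only one thread to execute Python bytecode at a time. It simplifies memory management, since reference counts need no per-object locking, but it limits CPU-bound multithreading. I/O-bound threads still overlap because the GIL is released during blocking I/O; CPU-bound work is usually moved to `multiprocessing`.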
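
A minimal sketch of the I/O-bound case, where threads help despite the GIL. `fake_io` is a hypothetical stand-in for a network or disk call; `time.sleep` releases the GIL while it blocks, so the four calls overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(task_id):
    """Stand-in for a blocking I/O call; time.sleep releases the GIL."""
    time.sleep(0.2)
    return task_id


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fake_io, range(4)))
elapsed = time.perf_counter() - start

print(results)        # [0, 1, 2, 3]
print(elapsed < 0.7)  # the four 0.2 s waits overlapped instead of adding up
```

If `fake_io` were a pure-Python computation instead, the threads would take turns holding the GIL and the total time would be close to the sequential time.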

2. **What are decorators?**

**Answer:** A decorator is a callable that takes a function and returns a replacement, typically a wrapper that adds behavior before or after the original call.
```python
def decorator(func):
    def wrapper():
        print("Before")
        func()
        print("After")
    return wrapper

@decorator
def greet():
    print("Hello")

greet()  # prints Before, Hello, After (one per line)
```
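
The wrapper above only works for functions without arguments. A common extension, sketched here with a hypothetical `log_calls` decorator, forwards `*args`/`**kwargs` and uses `functools.wraps` so the decorated function keeps its name and docstring.

```python
import functools


def log_calls(func):
    """Record every call to func, forwarding any arguments unchanged."""
    @functools.wraps(func)  # keep func.__name__ and docstring on the wrapper
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        wrapper.calls.append((args, kwargs, result))
        return result
    wrapper.calls = []
    return wrapper


@log_calls
def add(x, y=0):
    return x + y


print(add(2, y=3))   # 5
print(add.__name__)  # add
print(add.calls)     # [((2,), {'y': 3}, 5)]
```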

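3. **Difference between deep copy and shallow copy.**

**Answer:**
- A shallow copy creates a new outer object but shares references to the nested objects.
- A deep copy recursively copies the nested objects as well, so the result is fully independent.
```python
import copy
a = [[1, 2]]
shallow = copy.copy(a)   # new outer list, same inner list
deep = copy.deepcopy(a)  # inner list is copied too
a[0].append(3)
print(shallow)  # [[1, 2, 3]]
print(deep)     # [[1, 2]]
```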

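4. **Python OOP concepts:**

- Inheritance: A child class acquires the attributes and methods of its parent class.
- Polymorphism: The same interface produces different behavior depending on the object's type.
- Encapsulation: Data and the methods that operate on it are bundled together, with internal state hidden by convention (a leading underscore).
- Abstraction: Callers see a simple interface while the implementation stays hidden.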
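
The four concepts can be shown in one small sketch. The `Shape`, `Rectangle`, and `Square` classes are hypothetical and exist only for illustration.

```python
from abc import ABC, abstractmethod


class Shape(ABC):                    # abstraction: callers rely on area() only
    def __init__(self, name):
        self._name = name            # encapsulation: internal by convention

    @property
    def name(self):
        return self._name

    @abstractmethod
    def area(self):
        ...


class Rectangle(Shape):              # inheritance: reuses Shape.__init__ and name
    def __init__(self, width, height):
        super().__init__("rectangle")
        self._width, self._height = width, height

    def area(self):
        return self._width * self._height


class Square(Shape):
    def __init__(self, side):
        super().__init__("square")
        self._side = side

    def area(self):
        return self._side ** 2


# Polymorphism: the same area() call works on every Shape subclass.
shapes = [Rectangle(2, 3), Square(4)]
print([(s.name, s.area()) for s in shapes])  # [('rectangle', 6), ('square', 16)]
```

Instantiating `Shape` directly raises `TypeError`, because its abstract method has no implementation.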

5. **Generators vs Iterators.**

**Answer:**
- Iterators: Objects implementing `__iter__()` and `__next__()`.
- Generators: Functions that use `yield`; calling one returns a generator object, which is itself an iterator that produces values lazily.
```python
def gen():
    yield 1
    yield 2

for i in gen():
    print(i)  # 1, then 2
```
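
To make the relationship concrete, the sketch below writes the same sequence twice: once as a hand-written iterator class and once as a generator function. `CountUp` and `count_up` are illustrative names.

```python
class CountUp:
    """Iterator yielding 1..limit, with the protocol written out by hand."""
    def __init__(self, limit):
        self.limit = limit
        self.current = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.limit:
            raise StopIteration
        self.current += 1
        return self.current


def count_up(limit):
    """Generator version: Python supplies __iter__ and __next__."""
    for value in range(1, limit + 1):
        yield value


print(list(CountUp(3)))       # [1, 2, 3]
print(list(count_up(3)))      # [1, 2, 3]
g = count_up(2)
print(iter(g) is g, next(g))  # True 1
```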

---

## PySpark Interview Questions

### Intermediate

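1. **Transformations vs Actions?**
- Transformations (e.g., `filter`, `map`) are lazy: they only record a step in the execution plan and return a new RDD or DataFrame.
- Actions (e.g., `show`, `collect`) trigger the actual computation and return or display results.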
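
Lazy evaluation can be illustrated without a cluster. The following is a pure-Python analogy built on generators, not the Spark API: `lazy_filter`, `lazy_map`, and `collect` are hypothetical helpers that mimic how transformations defer work until an action runs.

```python
executed = []  # records which rows were actually processed


def lazy_filter(rows, predicate):
    """Transformation-like step: returns a generator, does no work yet."""
    return (r for r in rows if predicate(r))


def lazy_map(rows, fn):
    """Transformation-like step that logs each row when it is finally processed."""
    def run():
        for r in rows:
            executed.append(r)
            yield fn(r)
    return run()


def collect(rows):
    """Action-like step: pulls every row through the pipeline."""
    return list(rows)


pipeline = lazy_map(lazy_filter(range(6), lambda x: x % 2 == 0), lambda x: x * 10)
print(executed)           # []  (nothing has run yet)
print(collect(pipeline))  # [0, 20, 40]
print(executed)           # [0, 2, 4]
```

Spark goes further than this analogy: because it sees the whole plan before running it, it can reorder and combine steps.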

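2. **Wide vs Narrow Transformations:**

- Narrow: Each output partition depends on a single input partition, so no shuffle is needed, e.g., `map`, `filter`.
- Wide: Output partitions depend on many input partitions, so data is shuffled across the cluster, e.g., `reduceByKey`, `join`.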
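
The difference can be simulated with plain lists standing in for partitions. This is a toy model, not Spark code: the partition contents, the `shuffle_by_key` helper, and the byte-sum partitioner are assumptions made for illustration.

```python
from collections import defaultdict

# Two hypothetical input partitions of (key, value) pairs.
partitions = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]]

# Narrow (map-like): each partition is transformed on its own.
doubled = [[(k, v * 2) for k, v in part] for part in partitions]


def shuffle_by_key(parts, num_out):
    """Wide step: route every record to the output partition that owns its key."""
    out = [defaultdict(list) for _ in range(num_out)]
    for part in parts:
        for k, v in part:
            out[sum(k.encode()) % num_out][k].append(v)  # deterministic toy partitioner
    return out


# reduceByKey-like: after the shuffle, all values for a key sit in one partition.
reduced = [{k: sum(vs) for k, vs in part.items()} for part in shuffle_by_key(doubled, 2)]
print(doubled)  # [[('a', 2), ('b', 4)], [('a', 6), ('c', 8)]]
print(reduced)  # [{'b': 4}, {'a': 8, 'c': 8}]
```

Key `a` starts in both input partitions, so summing it requires moving records between partitions; that movement is the shuffle.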

3. **Joins in PySpark:**
```python
df1.join(df2, df1.id == df2.id, 'inner')
```
Join types include `inner`, `left`, `right`, `outer` (full), `left_semi`, `left_anti`, and `cross`.

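4. **Using UDFs:**
```python
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# Guard against NULLs: calling .upper() on None would fail inside the UDF.
upper_udf = udf(lambda x: x.upper() if x is not None else None, StringType())

df.withColumn("upper_name", upper_udf(df.name)).show()
```
Python UDFs are slower than built-in functions because rows are serialized between the JVM and Python; prefer a built-in such as `pyspark.sql.functions.upper` when one exists.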

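5. **Broadcast variables:**
```python
# `sc` is the active SparkContext (e.g., spark.sparkContext).
bc_var = sc.broadcast([1, 2, 3])
print(bc_var.value)
```
A broadcast variable ships a read-only copy of a small dataset to every executor once. In joins, broadcasting the small side lets Spark avoid shuffling the large side.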

### Advanced

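1. **Catalyst Optimizer:** Spark SQL's query optimizer. It rewrites the logical plan (for example, predicate pushdown and column pruning) and selects a physical plan.

2. **Tungsten Engine:** Improves memory and CPU efficiency through off-heap binary storage and whole-stage code generation.

3. **Coalesce vs Repartition:**
- `coalesce(n)`: Merges existing partitions to reduce their number, avoiding a full shuffle.
- `repartition(n)`: Performs a full shuffle; it can increase or decrease the partition count and balances partition sizes.

4. **Skew handling:** Add a salt to hot keys so their rows spread across partitions, or broadcast the smaller side of the join.

5. **Delta Lake & Streaming:** Delta Lake adds ACID transactions to data lake storage, and its tables can serve as streaming sources and sinks for real-time ingestion.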
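
Key salting, mentioned under skew handling, is easiest to see on a toy dataset. The sketch below is pure Python rather than Spark code; `NUM_SALTS`, the sample rows, and the round-robin salt (real jobs usually pick a random salt) are assumptions for illustration.

```python
from collections import Counter

NUM_SALTS = 4  # hypothetical number of salt buckets

# Skewed fact side: one hot key dominates.
facts = [("hot", i) for i in range(8)] + [("cold", 100)]
# Small dimension side.
dims = {"hot": "H", "cold": "C"}

# 1) Salt the large side: the key becomes (key, salt), spreading the hot key.
salted_facts = [((k, i % NUM_SALTS), v) for i, (k, v) in enumerate(facts)]

# 2) Replicate the small side once per salt so every salted key finds a match.
salted_dims = {(k, s): d for k, d in dims.items() for s in range(NUM_SALTS)}

# 3) Join on the salted key, then drop the salt.
joined = [(k, v, salted_dims[(k, s)]) for (k, s), v in salted_facts]

bucket_sizes = Counter(key for key, _ in salted_facts)
print(bucket_sizes)                # each ('hot', s) bucket holds 2 rows instead of 8
print(len(joined) == len(facts))   # True: no rows lost or duplicated
```

The cost is that the small side grows by a factor of `NUM_SALTS`, so salting suits joins where one side is much smaller than the other.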

---

## SQL Interview Questions

### Intermediate

1. **CTEs:** A common table expression names a subquery so the main query can reference it like a table.
```sql
WITH emp_cte AS (SELECT * FROM employees)
SELECT * FROM emp_cte;
```

2. **Recursive Queries:** Useful for hierarchies such as org charts or category trees. The example below generates the numbers 1 to 5.

```sql
WITH RECURSIVE nums AS (
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1 FROM nums WHERE n < 5
)
SELECT * FROM nums;
```
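
For the hierarchy use case, here is a minimal sketch run through Python's built-in `sqlite3` module. The `employees` table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES
        (1, 'Ava', NULL), (2, 'Ben', 1), (3, 'Cy', 2), (4, 'Dee', 1);
""")

# Anchor member: the root (no manager). Recursive member: direct reports of
# anyone already in the result, one level deeper each round.
rows = conn.execute("""
    WITH RECURSIVE chain AS (
        SELECT id, name, 0 AS depth FROM employees WHERE manager_id IS NULL
        UNION ALL
        SELECT e.id, e.name, c.depth + 1
        FROM employees e JOIN chain c ON e.manager_id = c.id
    )
    SELECT name, depth FROM chain ORDER BY depth, name
""").fetchall()

print(rows)  # [('Ava', 0), ('Ben', 1), ('Dee', 1), ('Cy', 2)]
```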

3. **Pivot/Unpivot:** Pivot turns row values into columns, usually with an aggregate; unpivot turns columns back into rows.
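
Some engines provide a dedicated `PIVOT` operator, but the portable form uses conditional aggregation, with `UNION ALL` for the reverse direction. The sketch below runs that portable form through `sqlite3`; the `sales` and `wide` tables and their data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, quarter TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('east', 'Q1', 10), ('east', 'Q2', 20), ('west', 'Q1', 5), ('west', 'Q2', 7);
""")

# Pivot: one row per region, one column per quarter.
pivoted = conn.execute("""
    SELECT region,
           SUM(CASE WHEN quarter = 'Q1' THEN amount ELSE 0 END) AS q1,
           SUM(CASE WHEN quarter = 'Q2' THEN amount ELSE 0 END) AS q2
    FROM sales GROUP BY region ORDER BY region
""").fetchall()
print(pivoted)  # [('east', 10, 20), ('west', 5, 7)]

# Unpivot: back to one row per (region, quarter) using UNION ALL.
conn.execute("CREATE TABLE wide (region TEXT, q1 INTEGER, q2 INTEGER)")
conn.executemany("INSERT INTO wide VALUES (?, ?, ?)", pivoted)
unpivoted = conn.execute("""
    SELECT region, 'Q1' AS quarter, q1 AS amount FROM wide
    UNION ALL
    SELECT region, 'Q2', q2 FROM wide
    ORDER BY region, quarter
""").fetchall()
print(unpivoted)
```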

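4. **Indexes:** Speed up lookups, but slow down inserts, updates, and deletes because each index must be maintained.

5. **Explain/Analyze:** `EXPLAIN` shows the query execution plan; in engines such as PostgreSQL, `EXPLAIN ANALYZE` also runs the query and reports actual timings.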
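
Both points can be checked together by reading the plan before and after creating an index. This sketch uses SQLite's `EXPLAIN QUERY PLAN` through `sqlite3`; the table, the index name `idx_emp_dept`, and the `plan` helper are hypothetical, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees (dept, salary) VALUES (?, ?)",
    [("eng", 100), ("ops", 80), ("eng", 120)],
)

query = "SELECT * FROM employees WHERE dept = 'eng'"


def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


before = plan(query)  # a full table scan: no index covers `dept` yet
conn.execute("CREATE INDEX idx_emp_dept ON employees (dept)")
after = plan(query)   # a search that goes through idx_emp_dept

print(before)
print(after)
```

Other engines expose the same idea with different syntax and output formats.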

### Advanced

1. **Second Highest Salary:**

```sql
SELECT MAX(salary)
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);
```
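
The subquery form handles ties at the top but does not generalize to the Nth salary. A common alternative, sketched here with `sqlite3`, uses `DENSE_RANK`. The sample rows and the `nth_highest` helper are hypothetical, and window functions assume SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("a", 300), ("b", 300), ("c", 200), ("d", 100)],
)

# Subquery form from the answer above.
second = conn.execute(
    "SELECT MAX(salary) FROM employees "
    "WHERE salary < (SELECT MAX(salary) FROM employees)"
).fetchone()[0]


def nth_highest(n):
    """Generalize with DENSE_RANK so ties share a rank; None if there is no nth value."""
    row = conn.execute(
        """
        SELECT DISTINCT salary FROM (
            SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
            FROM employees
        ) WHERE rnk = ?
        """,
        (n,),
    ).fetchone()
    return row[0] if row else None


print(second, nth_highest(2), nth_highest(3), nth_highest(9))  # 200 200 100 None
```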

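2. **OLAP vs OLTP:** OLAP systems serve analytical queries that scan and aggregate large volumes of historical data; OLTP systems serve many short transactional reads and writes.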

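3. **Temporal Queries:** `LAG` returns a value from the previous row within each partition, which makes period-over-period comparisons possible.
```sql
SELECT id, date, salary,
       LAG(salary) OVER (PARTITION BY id ORDER BY date) AS prev_salary
FROM salaries;
```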
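
A small runnable sketch of the same idea, computing the change from each employee's previous salary via `sqlite3`. The sample rows and the `delta` alias are hypothetical, `id` is assumed to identify the employee, and window functions assume SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salaries (id INTEGER, date TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO salaries VALUES (?, ?, ?)",
    [(1, "2023-01-01", 100), (1, "2024-01-01", 120), (2, "2023-06-01", 90)],
)

# LAG yields NULL for the first row of each partition, so delta is NULL there too.
rows = conn.execute("""
    SELECT id, date, salary,
           salary - LAG(salary) OVER (PARTITION BY id ORDER BY date) AS delta
    FROM salaries
    ORDER BY id, date
""").fetchall()

print(rows)
# [(1, '2023-01-01', 100, None), (1, '2024-01-01', 120, 20), (2, '2023-06-01', 90, None)]
```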

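4. **Star vs Snowflake Schema:**

- Star: A central fact table joined to denormalized dimension tables; fewer joins, so queries are simpler and usually faster.
- Snowflake: Dimension tables are normalized into sub-dimensions; less redundancy, but more joins.

5. **Materialized Views:** Store precomputed query results for fast reads; they must be refreshed when the underlying data changes.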

---

## AWS Interview Questions

### Intermediate

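1. **S3 vs EBS:**
- S3: Object storage accessed over an API; suited to data lakes, backups, and static files.
- EBS: Block storage volumes attached to EC2 instances; suited to file systems and databases.

2. **IAM Concepts:**
- Users, Groups, Roles, Policies: users and groups are long-lived identities, roles are assumed temporarily, and policies are JSON documents that grant or deny permissions.

3. **Glue vs EMR:**
- Glue: Serverless ETL service with a managed data catalog.
- EMR: Managed Hadoop/Spark clusters that you size and configure yourself.

4. **Athena SQL on S3:** Athena is a serverless service that queries data in S3 directly with SQL, without loading it into a database first.

5. **Security Best Practices:**
- Least privilege: grant only the permissions a workload needs.
- Encryption at rest and in transit.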
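
As an illustration of least privilege, the sketch below builds a read-only IAM policy document for a single S3 prefix as a Python dict. The bucket name and prefix are hypothetical; the document structure (`Version`, `Statement`, `Effect`, `Action`, `Resource`, `Condition`) follows the standard IAM policy grammar.

```python
import json

BUCKET = "example-data-lake"  # hypothetical bucket name
PREFIX = "curated/"           # hypothetical key prefix

read_only_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # allow listing only the curated prefix
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": f"arn:aws:s3:::{BUCKET}",
            "Condition": {"StringLike": {"s3:prefix": [f"{PREFIX}*"]}},
        },
        {   # allow reading objects under that prefix, nothing else
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": f"arn:aws:s3:::{BUCKET}/{PREFIX}*",
        },
    ],
}

print(json.dumps(read_only_policy, indent=2))
```

The policy grants no write or delete actions and names no other bucket, so a compromised reader cannot modify data.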

### Advanced

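1. **Data Pipeline using S3, Kinesis, Glue, Athena:**

Real-time ingestion with Kinesis → Glue transform → S3 → Athena query

2. **Redshift Tuning:** Choose sort keys and distribution keys to match query patterns, and run `VACUUM` and `ANALYZE` to reclaim space and refresh statistics.

3. **Lake Formation:** Builds and secures a data lake on top of S3 with centralized, fine-grained access control.

4. **CloudWatch vs CloudTrail:**
- CloudWatch: Monitoring of metrics, logs, and alarms.
- CloudTrail: Audit logging of API activity.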

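5. **Kinesis Real-Time Example:**

```python
import boto3
# Assumes AWS credentials and a default region are configured.
kinesis = boto3.client('kinesis')
# Records with the same PartitionKey are routed to the same shard.
kinesis.put_record(StreamName="my_stream", Data=b"my_data", PartitionKey="key")
```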
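
The choice of `PartitionKey` decides how records spread across shards: Kinesis hashes the key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains it. The sketch below models that locally without calling AWS. The shard count, the `shard_for` helper, and the assumption that the shards split the hash space evenly are all illustrative.

```python
import hashlib

NUM_SHARDS = 4          # hypothetical stream size
HASH_SPACE = 2 ** 128   # MD5 produces 128-bit values


def shard_for(partition_key, num_shards=NUM_SHARDS):
    """Map a partition key to a shard index, assuming evenly split hash key ranges."""
    hashed = int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")
    return hashed * num_shards // HASH_SPACE


keys = [f"user-{i}" for i in range(1000)]
counts = [0] * NUM_SHARDS
for key in keys:
    counts[shard_for(key)] += 1

print(counts)                                # varied keys spread across all shards
print(shard_for("key") == shard_for("key"))  # True: the same key always maps to the same shard
```

A constant key such as `"key"` sends every record to one shard, so high-cardinality keys are preferable when throughput matters.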
