[go: up one dir, main page]

0% found this document useful (0 votes)
36 views4 pages

Data Engineering Interview QA

The document provides a comprehensive list of interview questions and answers for data engineering roles, covering topics in Python, PySpark, SQL, and AWS. It includes both intermediate and advanced questions, addressing key concepts such as memory management in Python, transformations in PySpark, and AWS services. The format includes code examples to illustrate answers effectively.

Uploaded by

tejaswini6299
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
36 views4 pages

Data Engineering Interview QA

The document provides a comprehensive list of interview questions and answers for data engineering roles, covering topics in Python, PySpark, SQL, and AWS. It includes both intermediate and advanced questions, addressing key concepts such as memory management in Python, transformations in PySpark, and AWS services. The format includes code examples to illustrate answers effectively.

Uploaded by

tejaswini6299
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 4

Top Data Engineering Interview

Questions with Answers


## Python Interview Questions

### Intermediate

1. **What are Python’s key features?**


**Answer:** Interpreted, dynamically typed, object-oriented, portable, and has extensive
libraries.

2. **Explain list comprehension with an example.**


```python
nums = [x for x in range(10) if x % 2 == 0]
```

3. **What is the difference between `is` and `==`?**


- `==` checks value equality.
- `is` checks object identity.

4. **What are *args and **kwargs?**


```python
def example(*args, **kwargs):
print(args, kwargs)
```

5. **How is memory managed in Python?**


- Managed using reference counting and garbage collection.

### Advanced

1. **Explain Python's GIL.**


**Answer:** Global Interpreter Lock allows only one thread to execute Python bytecode at
a time.

2. **What are decorators?**


```python
def decorator(func):
def wrapper():
print("Before")
func()
print("After")
return wrapper

@decorator
def greet():
print("Hello")
```

3. **Difference between deep copy and shallow copy.**


- `copy()` creates a shallow copy.
- `deepcopy()` creates a full independent copy.

4. **Python OOP concepts:** Inheritance, polymorphism, encapsulation, abstraction.

5. **Generators vs Iterators.**
```python
def gen():
yield 1
yield 2
```

## PySpark Interview Questions

### Intermediate

1. **Transformations vs Actions?**
```python
df.filter(df.age > 30) # Transformation
df.show() # Action
```

2. **Wide vs Narrow Transformations:**


- Narrow: `filter`, `map` (no shuffling)
- Wide: `groupBy`, `join` (requires shuffle)

3. **Joins in PySpark:**
```python
df1.join(df2, df1.id == df2.id, 'inner')
```

4. **Using UDFs:**
```python
udf_func = udf(lambda x: x.upper(), StringType())
df.withColumn("upper", udf_func(df.name))
```

5. **Broadcast variables:**
```python
broadcast_var = sc.broadcast([1, 2])
acc = sc.accumulator(0)
```

### Advanced

1. **Catalyst Optimizer & Tungsten Engine**


2. **Coalesce vs Repartition**
```python
df.coalesce(1), df.repartition(10)
```
3. **Skew handling:** Salt keys, broadcast small tables
4. **Performance tuning:** Cache, partitioning, pruning
5. **Delta Lake + Structured Streaming**

## SQL Interview Questions

### Intermediate

1. **CTEs:**
```sql
WITH temp AS (SELECT * FROM employees)
SELECT * FROM temp;
```

2. **Recursive Queries**
3. **Pivot/Unpivot**
4. **Indexes:** Speed up search queries
5. **Explain/Analyze** for query plans

### Advanced

1. **Second Highest Salary:**


```sql
SELECT MAX(salary) FROM emp WHERE salary < (SELECT MAX(salary) FROM emp);
```
2. **OLAP vs OLTP**
3. **Temporal Queries**
```sql
SELECT id, LAG(salary) OVER (...) FROM table;
```
4. **Star vs Snowflake Schema**
5. **Materialized Views**

## AWS Interview Questions

### Intermediate

1. **S3 vs EBS**
2. **IAM concepts**
3. **Glue vs EMR**
4. **Athena SQL on S3**
5. **Security Best Practices**

### Advanced

1. **Data Pipeline using S3, Kinesis, Glue, Athena**


2. **Redshift Tuning**
3. **AWS Lake Formation**
4. **CloudWatch vs CloudTrail**
5. **Kinesis Real-Time Example:**
```python
boto3.client('kinesis').put_record(...)
```

You might also like