JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY KAKINADA
KAKINADA – 533 003, Andhra Pradesh, India
CSE (DS) (R23-IInd YEAR COURSE STRUCTURE & SYLLABUS)
II Year II Semester
L  T  P  C
0  0  3  1.5
DATA ENGINEERING LAB
Course Objective:
The main objective of this course is to teach how to build data engineering infrastructure
and data pipelines.
Course Outcomes:
At the end of the course, the student will be able to:
1. Build a Data Engineering Infrastructure
2. Demonstrate Reading and Writing files
3. Build Data Pipelines and integrate with Dashboard
4. Deploy the Data Pipeline in production
Experiments:
1. Installing and configuring Apache NiFi, Apache Airflow
2. Installing and configuring Elasticsearch, Kibana, PostgreSQL, pgAdmin 4
3. Reading and Writing files
a. Reading and writing files in Python
b. Processing files in Airflow
c. NiFi processors for handling files
d. Reading and writing data to databases in Python
e. Databases in Airflow
f. Database processors in NiFi
4. Working with Databases
a. Inserting and extracting relational data in Python
b. Inserting and extracting NoSQL database data in Python
c. Building database pipelines in Airflow
d. Building database pipelines in NiFi
5. Cleaning, Transforming and Enriching Data
a. Performing exploratory data analysis in Python
b. Handling common data issues using pandas
c. Cleaning data using Airflow
6. Building the Data Pipeline
7. Building a Kibana Dashboard
8. Perform the following operations
a. Staging and validating data
b. Building idempotent data pipelines
c. Building atomic data pipelines
9. Version Control with the NiFi Registry
a. Installing and configuring the NiFi Registry
b. Using the Registry in NiFi
c. Versioning your data pipelines
d. Using git-persistence with the NiFi Registry
10. Monitoring Data Pipelines
a. Monitoring NiFi in the GUI
b. Monitoring NiFi using processors
c. Monitoring NiFi with Python and the REST API
11. Deploying Data Pipelines
a. Finalizing your data pipelines for production
b. Using the NiFi variable registry
c. Deploying your data pipelines
12. Building a Production Data Pipeline
a. Creating a test and production environment
b. Building a production data pipeline
c. Deploying a data pipeline in production
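Illustrative sketch for Experiment 8 (staging and validating data, idempotent and atomic pipelines). This is a minimal example under stated assumptions, not a prescribed solution: it uses Python's built-in sqlite3 module in place of PostgreSQL so that it runs without any installation, and the sample records, the `readings` table, and the `validate` and `load` helpers are hypothetical names chosen for illustration.

```python
import sqlite3

# Hypothetical staged records; a real lab run would read these from a file or API.
RAW_RECORDS = [
    {"id": 1, "city": "Kakinada", "temp_c": 31.5},
    {"id": 2, "city": "Vijayawada", "temp_c": None},  # fails validation
    {"id": 3, "city": "Guntur", "temp_c": 29.0},
]


def validate(record):
    """Return True when the record has an integer id, a city name, and a numeric temperature."""
    return (
        isinstance(record.get("id"), int)
        and bool(record.get("city"))
        and isinstance(record.get("temp_c"), (int, float))
    )


def load(conn, records):
    """Load the valid records and return how many were written."""
    valid = [r for r in records if validate(r)]
    # Atomic: 'with conn' commits if the block succeeds and rolls back on an exception.
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS readings ("
            "id INTEGER PRIMARY KEY, city TEXT NOT NULL, temp_c REAL NOT NULL)"
        )
        # Idempotent: a row with an existing id is replaced, never duplicated.
        conn.executemany(
            "INSERT OR REPLACE INTO readings (id, city, temp_c) "
            "VALUES (:id, :city, :temp_c)",
            valid,
        )
    return len(valid)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, RAW_RECORDS)
    load(conn, RAW_RECORDS)  # running the load again must not add rows
    print(conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0])
```

Running the load twice leaves the row count unchanged, which is the property Experiment 8b asks for. The same pattern carries over to PostgreSQL with an upsert statement in place of `INSERT OR REPLACE`.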
Reference Books:
1. Paul Crickard, Data Engineering with Python, Packt Publishing, October 2020.