Questionnaire - Case Study
Dataset:
You can get the dataset from the URL below:
Alzheimers-Disease-and-Healthy-Aging-Data
Tasks:
I. Data Ingestion:
o Create an S3 bucket to store the dataset.
o Use an AWS mechanism (for example, AWS Glue) to extract the data from the S3 bucket
and load it into an Amazon Redshift data warehouse (a SQL sketch follows below).
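For the load step, a hedged alternative sketch (not the Glue job named above, just a compact illustration) is Redshift's COPY command once the file sits in S3. The table definition, bucket path, and IAM role below are assumptions, not values supplied by the assignment.

-- Hypothetical target table in Redshift; column names are assumed from the
-- public Alzheimer's Disease and Healthy Aging dataset and should be adjusted.
CREATE TABLE IF NOT EXISTS aging_raw (
    year_start            INT,
    year_end              INT,
    locationdesc          VARCHAR(100),
    class                 VARCHAR(100),
    topic                 VARCHAR(200),
    question              VARCHAR(500),
    data_value            DECIMAL(10,2),
    low_confidence_limit  DECIMAL(10,2),
    high_confidence_limit DECIMAL(10,2)
);

-- Load the CSV straight from S3; the bucket name and IAM role are placeholders.
COPY aging_raw
FROM 's3://my-aging-data-bucket/alzheimers_healthy_aging.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftCopyRole'
CSV
IGNOREHEADER 1;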
II. Data Cleaning and Transformation:
o Use a data processing framework on AWS EMR (for example, Apache Spark) to clean and
transform the data. This may involve tasks such as (a SQL sketch follows this list):
▪ Handling missing values (if needed)
▪ Removing outliers (if needed)
▪ Normalizing data (if needed)
▪ Creating derived features
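A minimal, hedged sketch of these cleaning steps as one SQL statement, runnable as Spark SQL on an EMR cluster (or directly in Redshift). The aging_raw/aging_clean table names and the data_value, low_confidence_limit, and high_confidence_limit columns are assumptions about the dataset's schema.

-- Drop rows with missing measurements, discard outliers beyond 3 standard
-- deviations, min-max normalize the value, and derive a confidence_range feature.
CREATE TABLE aging_clean AS
SELECT
    year_start,
    locationdesc,
    class,
    topic,
    question,
    data_value,
    (data_value - mn) / NULLIF(mx - mn, 0)       AS data_value_norm,   -- normalization
    high_confidence_limit - low_confidence_limit AS confidence_range   -- derived feature
FROM (
    SELECT
        *,
        MIN(data_value)    OVER () AS mn,
        MAX(data_value)    OVER () AS mx,
        AVG(data_value)    OVER () AS avg_v,
        STDDEV(data_value) OVER () AS sd
    FROM aging_raw
    WHERE data_value IS NOT NULL                 -- handle missing values
) t
WHERE ABS(data_value - avg_v) <= 3 * sd;         -- remove outliers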
III. Data Analysis:
o Use Power BI or a machine learning framework like Amazon SageMaker to perform
exploratory data analysis and extract insights. This may include (an example query follows this list):
▪ Calculating summary statistics
▪ Creating visualizations
▪ Building predictive models
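For the summary-statistics item, a hedged example query over the cleaned table from the previous step (same assumed column names):

-- Per-topic summary statistics.
SELECT
    topic,
    COUNT(*)           AS n_rows,
    AVG(data_value)    AS mean_value,
    MIN(data_value)    AS min_value,
    MAX(data_value)    AS max_value,
    STDDEV(data_value) AS std_dev
FROM aging_clean
GROUP BY topic
ORDER BY mean_value DESC;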
IV. Data Visualization:
o Use a tool like Amazon QuickSight to create interactive dashboards and
visualizations to communicate the insights to stakeholders.
Deliverables:
A detailed design document outlining the data pipeline architecture, data ingestion and
transformation steps, and analysis techniques.
The AWS project and code used to implement the pipeline.
A presentation summarizing the key findings and insights from the data analysis.
Note: Remember to monitor costs when using AWS services. For example, running an EMR
cluster 24/7 can be expensive.
Objective:
Design and implement a data pipeline using a relational database management system (RDBMS)
to ingest, transform, and analyze a health data dataset to derive key insights.
Dataset:
Alzheimers-Disease-and-Healthy-Aging-Data
Tasks:
1. Data Ingestion:
2. Data Cleaning and Transformation:
Use SQL queries to clean and transform the data, including tasks like:
o Handling missing values (if needed)
o Removing outliers (if needed)
o Normalizing the data (if needed)
o Creating new columns based on existing data (derived features)
3. Data Analysis:
4. Data Visualization:
Deliverables:
A design document describing the pipeline, steps for data ingestion, transformation, and
analysis techniques.
The SQL scripts used for data cleaning, transformation, and analysis.
A summary presentation highlighting key findings and visualizations.
WORKING ON ASSIGNMENT 2
Objective:
Design and implement a data pipeline using a relational database management system (RDBMS)
to ingest, transform, and analyze a health data dataset to derive key insights.
Dataset:
Alzheimers-Disease-and-Healthy-Aging-Data
NOTE: This file contains 284,143 records, so I am taking only the first 100 rows to apply
the functions.
Tasks:
1. Data Ingestion:
Fig: 1.1 Load the dataset into the database using MySQL Workbench.
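Fig 1.1 shows the load being done through MySQL Workbench; an equivalent, hedged SQL sketch is below. The file path is a placeholder, and the column list is an assumption about the CSV's schema (only a subset of columns is shown).

-- Staging table for the raw CSV (assumed columns only).
CREATE TABLE alzheimer_raw (
    year_start            INT,
    locationdesc          VARCHAR(100),
    class                 VARCHAR(100),
    topic                 VARCHAR(200),
    question              VARCHAR(500),
    data_value            DECIMAL(10,2),
    data_value_unit       VARCHAR(50),
    low_confidence_limit  DECIMAL(10,2),
    high_confidence_limit DECIMAL(10,2)
);

-- Bulk-load the CSV; requires local_infile to be enabled and the path adjusted.
LOAD DATA LOCAL INFILE '/path/to/Alzheimers_Disease_and_Healthy_Aging_Data.csv'
INTO TABLE alzheimer_raw
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;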
2. Data Cleaning and Transformation:
Use SQL queries to clean and transform the data, including tasks like:
o Handling missing values (if needed)
o Removing outliers (if needed)
o Normalizing the data (if needed)
o Creating new columns based on existing data (derived features).
Copy all the raw data into a new table named “Alzheimer”. The copy is made to preserve the
original data: if any change goes wrong, we still have the original to fall back on.
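A minimal sketch of that copy step, assuming the raw import landed in a table named alzheimer_raw (as in the ingestion sketch above):

-- Work on a copy so the original import stays untouched.
CREATE TABLE Alzheimer AS
SELECT *
FROM alzheimer_raw
LIMIT 100;   -- only the first 100 rows are used in this exercise (see the note above)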
A “Row_Num” of 1 marks the first occurrence of a row; a value of 2 or higher means the row is a duplicate.
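A hedged sketch of that duplicate check using ROW_NUMBER() (MySQL 8+), partitioned by the Class, Topic, and Question columns referenced in Fig 2.2.1.1; the column names are assumed to exist in the table.

-- Row_Num = 1 marks the first occurrence; 2 or more marks a duplicate.
SELECT *
FROM (
    SELECT
        a.*,
        ROW_NUMBER() OVER (
            PARTITION BY class, topic, question
            ORDER BY locationdesc
        ) AS Row_Num
    FROM Alzheimer a
) ranked
WHERE Row_Num >= 2;   -- returns no rows when there are no duplicates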
Fig 2.2.1.1: There are no duplicate values in the table for the combination of Class, Topic,
and Question.
So, before making any modification, first check how important the affected values are.
Here every row holds data that relates to its other column values. The rows could be deleted,
which is not a good option, or the values could be updated without modifying the original table.
So, instead of deleting the rows, I will update them with new values (a hypothetical scenario).
First, check how many values are affected.
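A hedged sketch of counting those blank (empty-string) values and then updating them; data_value_unit is used here purely as a hypothetical example column.

-- Count rows whose text column is blank rather than NULL.
SELECT COUNT(*) AS blank_rows
FROM Alzheimer
WHERE TRIM(data_value_unit) = '';

-- Replace the blanks with a placeholder value instead of deleting the rows.
UPDATE Alzheimer
SET data_value_unit = 'Unknown'
WHERE TRIM(data_value_unit) = '';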
Fig 2.3.4: There are 150 rows with blank values matching this join.
Now, check the values for NULL to decide whether to delete the data or change it; to delete it,
I need to be sure that I really want to. Honestly, I am not 100% sure how useful this NULL data
is, so I am deleting the rows that contain NULL values.
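A minimal sketch of the NULL check and deletion, again with data_value as an assumed column name.

-- How many rows have a NULL measurement?
SELECT COUNT(*) AS null_rows
FROM Alzheimer
WHERE data_value IS NULL;

-- Remove them, since their usefulness for this analysis is unclear.
DELETE FROM Alzheimer
WHERE data_value IS NULL;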
Break down complex datasets into smaller, digestible segments to prevent information
overload. Use multiple layers to present intricate relationships.
Include only the most relevant data points that support the insights. Avoid overcrowding
visualizations with unnecessary data.
1. Enhanced Understanding.
2. Insight Generation.
3. Effective Communication.
4. Storytelling.
5. Improved Decision-making.