Assignment Report DMLAB

The assignment involved analyzing the PSLM-2020 dataset to estimate poverty based on the World Bank's definition, which classifies individuals living on less than $2.15 per day as impoverished. Key steps included data loading, exploration, cleaning, transformation, and feature engineering to prepare for poverty prediction. The findings highlighted household income disparities and poverty levels, providing insights for policymakers to address poverty-related issues.

Uploaded by

khokharfaraz54

ASSIGNMENT NO.

GROUP MEMBERS

MUHAMMAD FARAZ – 221980007

(LEAD)

HASSAN RASHEE – 221980038

COURSE

DATA MINING LAB (A)

INSTRUCTOR
MAHAB KHADDIM
ASSIGNMENT-REPORT
Introduction:
The primary purpose of this task was to analyze the PSLM-2020 dataset using the
World Bank’s definition of poverty as the foundation for exploration. According to
the World Bank, a person is considered to be in poverty if their income is below $2.15
per day in terms of purchasing power parity (PPP). This analysis aimed to process the
PSLM-2020 dataset, review its instruction manual, and prepare the data for poverty
prediction and estimation.

Steps to Complete the Task:

1. Understanding the World Bank Definition of Poverty:
The World Bank defines poverty as a condition where individuals live on less than
$2.15 per day (adjusted for PPP). This threshold informed the development of features
and the criteria used for poverty classification in this task.

2. Reviewing the PSLM-2020 Dataset

The PSLM-2020 dataset and its instruction manual were reviewed to:

• Understand the meaning and context of variables.
• Identify income-related fields and other relevant data for poverty analysis, such as
household size, remittances, and value in kind.
• Gain clarity on the dataset structure and missing data policies.

3. Data Loading
The datasets (SecE.sav and roster.sav) were loaded into an analytical environment
for inspection. Initial examination included understanding data types, column names,
and the extent of missing values. The instruction manual was used to interpret
variable meanings and ensure accurate data handling.

4. Data Exploration

Exploratory data analysis (EDA) was conducted to:

• Examine distributions of income-related variables.
• Identify relationships between household size, total income, and poverty.
• Highlight anomalies or irregularities in data entries.
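
The checks above can be sketched in plain Python. This is a minimal illustration, not the code used in the assignment: the income values are made up, `summarize_income` is a hypothetical helper, and the 1.5 x IQR rule for flagging anomalies is an assumption (the report does not say which rule was applied).

```python
import statistics

def summarize_income(values):
    """Return basic distribution statistics and IQR-based outlier candidates."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": median,
        "q1": q1,
        "q3": q3,
        "outliers": outliers,
    }

# Made-up monthly household incomes (illustrative only, not PSLM values).
sample_incomes = [12000, 15000, 18000, 20000, 22000, 25000, 30000, 400000]
summary = summarize_income(sample_incomes)
print(summary)
```

A flagged value is only a candidate for review; a very high income can be genuine, so it should be checked against the instruction manual before it is changed or dropped.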

5. Data Cleaning

To prepare the data for analysis:

• Missing values in income-related columns were replaced with zero, assuming
the absence of income data indicated no income from that source.
• Irrelevant variables were removed based on the instruction manual.
• Descriptive column names were assigned for clarity and consistency.
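
A minimal sketch of these three cleaning steps follows. The raw column codes in `RENAME` (`inc_m`, `remit`, `kind`) and the sample rows are hypothetical; the real variable codes come from the PSLM-2020 instruction manual.

```python
# Hypothetical mapping from raw codes to descriptive names; columns that are
# not listed here are treated as irrelevant and dropped.
RENAME = {
    "hhcode": "household_id",
    "inc_m": "monthly_income",
    "remit": "remittances",
    "kind": "value_in_kind",
}
INCOME_COLUMNS = ["monthly_income", "remittances", "value_in_kind"]

def clean_record(raw):
    """Keep only mapped columns, rename them, and zero-fill missing income."""
    cleaned = {new: raw.get(old) for old, new in RENAME.items()}
    for col in INCOME_COLUMNS:
        if cleaned[col] is None:
            cleaned[col] = 0.0  # missing is assumed to mean no income from this source
    return cleaned

# Made-up rows for illustration only.
raw_rows = [
    {"hhcode": 101, "inc_m": 25000.0, "remit": None, "kind": 1500.0, "unused": "x"},
    {"hhcode": 102, "inc_m": None, "remit": 8000.0, "kind": None, "unused": "y"},
]
cleaned_rows = [clean_record(r) for r in raw_rows]
print(cleaned_rows)
```

Zero-filling is only as sound as the assumption behind it: if a value is missing because the respondent declined to answer, the household's income is understated and it may be wrongly classified as poor.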

6. Data Transformation

Key transformations were applied to prepare the data for poverty prediction:

• Household size was calculated by grouping individuals by their household ID
(hhcode).
• Datasets were merged using a unique household identifier to consolidate
income data with demographic details.
• Income components, such as monthly income, annual income, remittances,
and value in kind, were normalized for uniform analysis.
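
The transformations can be sketched as below. The rows are made up, `person_no` is a hypothetical field name, and "normalized" is read here as converting every component to a monthly amount (annual income divided by 12); the report does not spell out the exact rule, so that is an assumption.

```python
from collections import Counter, defaultdict

# Made-up stand-in for roster.sav: one row per individual.
roster = [
    {"hhcode": 101, "person_no": 1},
    {"hhcode": 101, "person_no": 2},
    {"hhcode": 101, "person_no": 3},
    {"hhcode": 102, "person_no": 1},
    {"hhcode": 102, "person_no": 2},
]

# Made-up stand-in for SecE.sav: one row per income record.
income_rows = [
    {"hhcode": 101, "monthly_income": 20000.0, "annual_income": 60000.0},
    {"hhcode": 101, "monthly_income": 10000.0, "annual_income": 0.0},
    {"hhcode": 102, "monthly_income": 0.0, "annual_income": 120000.0},
]

# Household size: count roster rows per household ID.
household_size = Counter(row["hhcode"] for row in roster)

# Bring income to a common monthly basis and aggregate per household.
monthly_income = defaultdict(float)
for row in income_rows:
    monthly_income[row["hhcode"]] += row["monthly_income"] + row["annual_income"] / 12

# Merge on hhcode; households without income records get 0.0.
households = [
    {"hhcode": hh, "household_size": size, "monthly_income": monthly_income.get(hh, 0.0)}
    for hh, size in sorted(household_size.items())
]
print(households)
```

Starting the merge from the roster keeps every household in the result, including those with no income rows, which matters because such households are the most likely to fall below the poverty line.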

7. Feature Engineering

Derived features were created for poverty estimation:

• Total income was calculated as the sum of all income components.
• Each household’s daily income per person was calculated by dividing total
income by household size and normalizing for days in a month.
• A binary poverty indicator was created based on whether the daily income per
person was below $2.15.
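
A minimal sketch of the derived features, under stated assumptions: incomes are monthly and in local currency, a month has 30 days, and `ppp_rate` is a hypothetical conversion factor (local currency units per PPP dollar). The value 50.0 is a placeholder chosen to keep the arithmetic easy to follow, not a real conversion factor, and the report does not state which factor was used.

```python
POVERTY_LINE_USD = 2.15   # World Bank threshold, per person per day (PPP)
DAYS_PER_MONTH = 30       # assumption: 30-day month
PLACEHOLDER_PPP_RATE = 50.0  # HYPOTHETICAL placeholder, not a real PPP factor

INCOME_COMPONENTS = ("monthly_income", "remittances", "value_in_kind")

def add_poverty_features(household, ppp_rate=PLACEHOLDER_PPP_RATE, days=DAYS_PER_MONTH):
    """Return a copy of the household with total income, daily income per person, and a poverty flag."""
    total = sum(household[c] for c in INCOME_COMPONENTS)
    daily_per_person_local = total / household["household_size"] / days
    daily_per_person_usd = daily_per_person_local / ppp_rate
    return {
        **household,
        "total_income": total,
        "daily_income_pp_usd": daily_per_person_usd,
        "is_poor": daily_per_person_usd < POVERTY_LINE_USD,
    }

# Made-up households for illustration only.
sample = [
    {"hhcode": 101, "household_size": 5, "monthly_income": 12000.0,
     "remittances": 2000.0, "value_in_kind": 1000.0},
    {"hhcode": 102, "household_size": 2, "monthly_income": 12000.0,
     "remittances": 0.0, "value_in_kind": 0.0},
]
features = [add_poverty_features(h) for h in sample]
for row in features:
    print(row["hhcode"], row["daily_income_pp_usd"], row["is_poor"])
```

For household 101 the steps are 15000 / 5 = 3000 per person per month, 3000 / 30 = 100 per day, and 100 / 50 = 2.0 PPP dollars, which is below 2.15, so the household is flagged as poor. The classification is sensitive to the conversion factor, so the real factor should be sourced and documented before the indicator is used.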
8. Validation

The pre-processed data was validated by:

• Verifying calculations for total income and daily income per person.
• Sampling data entries to confirm consistency with the original dataset.
• Ensuring compliance with the World Bank’s poverty threshold criteria.
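
Checks of this kind can be automated. The sketch below is one possible set of consistency rules, not the validation actually run for the assignment; `validate_record` and the field names are hypothetical, and the two records are made up (the second contains deliberate errors).

```python
import math

POVERTY_LINE_USD = 2.15
INCOME_COMPONENTS = ("monthly_income", "remittances", "value_in_kind")

def validate_record(rec):
    """Return a list of problems found in one pre-processed household record."""
    problems = []
    expected_total = sum(rec[c] for c in INCOME_COMPONENTS)
    if not math.isclose(rec["total_income"], expected_total):
        problems.append("total_income does not match the sum of its components")
    if rec["household_size"] < 1:
        problems.append("household_size must be at least 1")
    if rec["daily_income_pp_usd"] < 0:
        problems.append("daily income per person is negative")
    if rec["is_poor"] != (rec["daily_income_pp_usd"] < POVERTY_LINE_USD):
        problems.append("is_poor disagrees with the 2.15 threshold")
    return problems

# Made-up records: the first is consistent, the second has two deliberate errors.
good = {"monthly_income": 12000.0, "remittances": 2000.0, "value_in_kind": 1000.0,
        "total_income": 15000.0, "household_size": 5,
        "daily_income_pp_usd": 2.0, "is_poor": True}
bad = {"monthly_income": 12000.0, "remittances": 0.0, "value_in_kind": 0.0,
       "total_income": 14000.0, "household_size": 2,
       "daily_income_pp_usd": 4.0, "is_poor": True}

good_problems = validate_record(good)
bad_problems = validate_record(bad)
print(good_problems)
print(bad_problems)
```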

9. Analysis and Visualization

Key insights were drawn, focusing on:

• The proportion of households living below the poverty line.
• Variations in income distribution across regions.
• The relationship between household size and poverty status.

Visualizations were generated to highlight these findings, such as bar charts for
poverty proportions and histograms for income distributions.
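
The summary figures behind such charts can be sketched as follows. The households and the `region` labels are made up, so the printed rates say nothing about PSLM-2020; the text bars stand in for the bar chart, which a plotting library would normally draw.

```python
import statistics
from collections import defaultdict

# Made-up households for illustration only.
households = [
    {"region": "Urban", "household_size": 4, "is_poor": False},
    {"region": "Urban", "household_size": 6, "is_poor": True},
    {"region": "Rural", "household_size": 7, "is_poor": True},
    {"region": "Rural", "household_size": 8, "is_poor": True},
    {"region": "Rural", "household_size": 3, "is_poor": False},
    {"region": "Urban", "household_size": 5, "is_poor": False},
]

def poverty_rate(rows):
    """Share of households flagged as poor."""
    return sum(r["is_poor"] for r in rows) / len(rows)

# Poverty proportion overall and by region.
by_region = defaultdict(list)
for row in households:
    by_region[row["region"]].append(row)
overall_rate = poverty_rate(households)
regional_rates = {region: poverty_rate(rows) for region, rows in by_region.items()}

# Household size by poverty status.
mean_size_poor = statistics.mean(r["household_size"] for r in households if r["is_poor"])
mean_size_non_poor = statistics.mean(r["household_size"] for r in households if not r["is_poor"])

# Text bar chart of regional poverty rates (20 characters = 100%).
for region, rate in sorted(regional_rates.items()):
    print(f"{region:<6} {'#' * round(rate * 20):<20} {rate:.0%}")
print("overall:", overall_rate, "| mean size poor:", mean_size_poor,
      "| mean size non-poor:", mean_size_non_poor)
```

These are unweighted household counts. If the survey supplies sampling weights, population-level estimates would need to apply them.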

Pre-Processing Approach
1. Guided by the Instruction Manual:
Variable selection, handling, and transformations were informed by the
PSLM-2020 dataset instruction manual.
2. World Bank Poverty Definition as Benchmark:
All calculations and features, such as per-person daily income, were
benchmarked against the $2.15/day PPP threshold.
3. Data Integrity:
Steps were taken to ensure no critical data was lost during cleaning. Columns
were renamed and structured for clarity.
4. Feature Engineering:
Income was aggregated across various sources and normalized to a consistent
scale for effective analysis.
5. Validation:
Results were cross-verified to ensure alignment with the World Bank's poverty
criteria.
Conclusion
This task aimed to explore poverty using the PSLM-2020 dataset and the World
Bank’s definition of poverty. By combining robust pre-processing methods with
insights from the dataset’s instruction manual, the data was effectively prepared for
poverty prediction and estimation. The results provide valuable insights into
household income disparities and poverty levels, aiding policymakers and
stakeholders in addressing poverty-related challenges.
