Code Refactoring for Data Scientists

The document describes code that analyzes a wine quality dataset. The code first renames columns to replace spaces with underscores. It then calculates statistics to see how different features relate to wine quality ratings by grouping data into above and below median values and finding the mean quality for each group. The document notes the code could be refactored to make it cleaner, more modular, and more efficient.

Uploaded by

Amal Abdallah

{

"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Refactor: Wine Quality Analysis\n",
"In this exercise, you'll refactor code that analyzes a wine quality dataset taken from the UCI Machine Learning Repository [here](https://archive.ics.uci.edu/ml/datasets/wine+quality). Each row contains data on a wine sample, including several physicochemical properties gathered from tests, as well as a quality rating evaluated by wine experts.\n",
"\n",
"The code in this notebook first renames the columns of the dataset and then calculates some statistics on how some features may be related to quality ratings. Can you refactor this code to make it cleaner and more modular?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"df = pd.read_csv('winequality-red.csv', sep=';')\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Renaming Columns\n",
"You want to replace the spaces in the column labels with underscores to be able to reference columns with dot notation. Here's one way you could've done it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"new_df = df.rename(columns={'fixed acidity': 'fixed_acidity',\n",
"                            'volatile acidity': 'volatile_acidity',\n",
"                            'citric acid': 'citric_acid',\n",
"                            'residual sugar': 'residual_sugar',\n",
"                            'free sulfur dioxide': 'free_sulfur_dioxide',\n",
"                            'total sulfur dioxide': 'total_sulfur_dioxide'\n",
"                            })\n",
"new_df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And here's a slightly better way you could do it. It avoids naming errors caused by typos from manual typing. However, this looks a little repetitive. Can you make it better?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"labels = list(df.columns)\n",
"labels[0] = labels[0].replace(' ', '_')\n",
"labels[1] = labels[1].replace(' ', '_')\n",
"labels[2] = labels[2].replace(' ', '_')\n",
"labels[3] = labels[3].replace(' ', '_')\n",
"labels[5] = labels[5].replace(' ', '_')\n",
"labels[6] = labels[6].replace(' ', '_')\n",
"df.columns = labels\n",
"\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Analyzing Features\n",
"Now that your columns are ready, you want to see how different features of this dataset relate to the quality rating of the wine. A very simple way you could do this is by observing the mean quality rating for the top and bottom half of each feature. The code below does this for four features. It looks pretty repetitive right now. Can you make this more concise?\n",
"\n",
"You might challenge yourself to figure out how to make this code more efficient! But you don't need to worry too much about efficiency right now - we will cover that more in the next section."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"median_alcohol = df.alcohol.median()\n",
"for i, alcohol in enumerate(df.alcohol):\n",
"    if alcohol >= median_alcohol:\n",
"        df.loc[i, 'alcohol'] = 'high'\n",
"    else:\n",
"        df.loc[i, 'alcohol'] = 'low'\n",
"df.groupby('alcohol').quality.mean()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"median_pH = df.pH.median()\n",
"for i, pH in enumerate(df.pH):\n",
"    if pH >= median_pH:\n",
"        df.loc[i, 'pH'] = 'high'\n",
"    else:\n",
"        df.loc[i, 'pH'] = 'low'\n",
"df.groupby('pH').quality.mean()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"median_sugar = df.residual_sugar.median()\n",
"for i, sugar in enumerate(df.residual_sugar):\n",
"    if sugar >= median_sugar:\n",
"        df.loc[i, 'residual_sugar'] = 'high'\n",
"    else:\n",
"        df.loc[i, 'residual_sugar'] = 'low'\n",
"df.groupby('residual_sugar').quality.mean()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"median_citric_acid = df.citric_acid.median()\n",
"for i, citric_acid in enumerate(df.citric_acid):\n",
"    if citric_acid >= median_citric_acid:\n",
"        df.loc[i, 'citric_acid'] = 'high'\n",
"    else:\n",
"        df.loc[i, 'citric_acid'] = 'low'\n",
"df.groupby('citric_acid').quality.mean()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
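
One possible answer to the notebook's "Renaming Columns" question, offered as a minimal sketch and not as the exercise's official solution. The helper name `underscore_labels` and the small `sample` frame are hypothetical; the frame only stands in for `winequality-red.csv` so the example runs without the file.

```python
import pandas as pd

# Hypothetical stand-in for winequality-red.csv, using a few of its column labels.
sample = pd.DataFrame({
    'fixed acidity': [7.4, 7.8],
    'residual sugar': [1.9, 2.6],
    'pH': [3.51, 3.20],
    'quality': [5, 5],
})


def underscore_labels(frame):
    """Return a copy of frame whose column labels use underscores instead of spaces."""
    # rename() accepts a function and applies it to every column label,
    # so no label is typed by hand and no index is hard-coded.
    return frame.rename(columns=lambda label: label.replace(' ', '_'))


renamed = underscore_labels(sample)
print(list(renamed.columns))
```

Because the replacement is applied to every label, labels without spaces (such as `pH`) pass through unchanged, and the original frame is left untouched.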

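The four near-identical cells under "Analyzing Features" could be folded into one function, sketched below under stated assumptions. The function name `mean_quality_by_half` and the four-row `sample` frame are hypothetical. Unlike the notebook's loops, this version does not overwrite the feature column with 'high'/'low' labels; it builds the labels as a separate Series and groups by that, which is a deliberate change in behavior and not the notebook's own method.

```python
import pandas as pd


def mean_quality_by_half(frame, feature, target='quality'):
    """Mean of target for rows at or above vs. below the median of feature."""
    median = frame[feature].median()
    # Vectorized comparison replaces the row-by-row loop; True maps to 'high'.
    halves = frame[feature].ge(median).map({True: 'high', False: 'low'})
    return frame.groupby(halves)[target].mean()


# Hypothetical stand-in rows; the real data would come from winequality-red.csv.
sample = pd.DataFrame({
    'alcohol': [9.4, 9.8, 10.5, 11.2],
    'pH': [3.51, 3.20, 3.26, 3.16],
    'quality': [5, 5, 6, 7],
})

# One loop over feature names replaces one copied cell per feature.
summary = {name: mean_quality_by_half(sample, name) for name in ['alcohol', 'pH']}
for name, means in summary.items():
    print(name, means.to_dict())
```

Keeping the numeric columns intact means the function can be called again on the same frame, whereas the original loops can only run once per column before the values become strings.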