
Copyright by Pierian Data Inc. For more information, visit us at www.pieriandata.com

Logistic Regression
Imports
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

Data
An experiment was conducted on 5000 participants to study the effects of age and physical
health on hearing loss, specifically the ability to hear high-pitched tones. This data displays the
results of the study: participants were evaluated and scored for physical ability and then took
an audio test (pass/no pass) that evaluated their ability to hear high frequencies. The
participant's age was also recorded. Is it possible to build a model that predicts someone's
likelihood of hearing the high-frequency sound based solely on their features (age and physical
score)?

• Features
– age - Age of participant in years
– physical_score - Score achieved during physical exam
• Label/Target
– test_result - 0 if no pass, 1 if test passed
df = pd.read_csv('../DATA/hearing_test.csv')

df.head()

    age  physical_score  test_result
0  33.0            40.7            1
1  50.0            37.2            1
2  52.0            24.7            0
3  56.0            31.0            0
4  35.0            42.9            1

Exploratory Data Analysis and Visualization


Feel free to explore the data further on your own.

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 age 5000 non-null float64
1 physical_score 5000 non-null float64
2 test_result 5000 non-null int64
dtypes: float64(2), int64(1)
memory usage: 117.3 KB

df.describe()

               age  physical_score  test_result
count  5000.000000     5000.000000  5000.000000
mean     51.609000       32.760260     0.600000
std      11.287001        8.169802     0.489947
min      18.000000       -0.000000     0.000000
25%      43.000000       26.700000     0.000000
50%      51.000000       35.300000     1.000000
75%      60.000000       38.900000     1.000000
max      90.000000       50.000000     1.000000

df['test_result'].value_counts()

1 3000
0 2000
Name: test_result, dtype: int64

sns.countplot(data=df,x='test_result')

<AxesSubplot:xlabel='test_result', ylabel='count'>
sns.boxplot(x='test_result',y='age',data=df)

<AxesSubplot:xlabel='test_result', ylabel='age'>

sns.boxplot(x='test_result',y='physical_score',data=df)

<AxesSubplot:xlabel='test_result', ylabel='physical_score'>
sns.scatterplot(x='age',y='physical_score',data=df,hue='test_result')

<AxesSubplot:xlabel='age', ylabel='physical_score'>

sns.pairplot(df,hue='test_result')

<seaborn.axisgrid.PairGrid at 0x19ceae2fd08>
sns.heatmap(df.corr(),annot=True)

<AxesSubplot:>
sns.scatterplot(x='physical_score',y='test_result',data=df)

<AxesSubplot:xlabel='physical_score', ylabel='test_result'>

sns.scatterplot(x='age',y='test_result',data=df)

<AxesSubplot:xlabel='age', ylabel='test_result'>
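These scatter plots hint at the S-shaped relationship a logistic model captures. As an optional extra (not part of the original notebook), seaborn's regplot can overlay a univariate logistic fit directly on the raw data; a minimal sketch (requires the statsmodels package):

# Optional extra: overlay a logistic fit of test_result on physical_score
# (ci=None skips the slow bootstrapped confidence band)
sns.regplot(x='physical_score', y='test_result', data=df,
            logistic=True, ci=None, scatter_kws={'alpha': 0.2})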
Easily discover new plot types with a Google search! Searching for "3d matplotlib scatter plot"
quickly takes you to: https://matplotlib.org/3.1.1/gallery/mplot3d/scatter3d.html

from mpl_toolkits.mplot3d import Axes3D


fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(df['age'], df['physical_score'], df['test_result'], c=df['test_result'])

<mpl_toolkits.mplot3d.art3d.Path3DCollection at 0x19ceaf878c8>
Train | Test Split and Scaling
X = df.drop('test_result',axis=1)
y = df['test_result']

from sklearn.model_selection import train_test_split


from sklearn.preprocessing import StandardScaler

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=101)

scaler = StandardScaler()

# Fit the scaler on the training data only, then apply that same
# transformation to the test data, so no test information leaks into training
scaled_X_train = scaler.fit_transform(X_train)
scaled_X_test = scaler.transform(X_test)
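As an aside (not in the original notebook), scikit-learn's Pipeline can bundle the scaler and the model into a single estimator so the scaling step is always applied consistently; a minimal sketch:

# Alternative workflow: chain scaling and the model in one estimator
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)   # the scaler is fit on the training data only
pipe.predict(X_test)         # raw features in; scaling happens inside the pipe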

Logistic Regression Model


from sklearn.linear_model import LogisticRegression

# help(LogisticRegression)

# help(LogisticRegressionCV)

log_model = LogisticRegression()

log_model.fit(scaled_X_train,y_train)

LogisticRegression()
Coefficient Interpretation
Things to remember:

• These coefficients relate to the odds and cannot be directly interpreted as in linear
regression.
• We trained on a scaled version of the data.
• It is much easier to compare the coefficients with each other than it is to interpret each
coefficient's relationship with the probability of the target/label class.

Make sure to watch the video explanation, also check out the links below:

• https://stats.idre.ucla.edu/stata/faq/how-do-i-interpret-odds-ratios-in-logistic-regression/
• https://stats.idre.ucla.edu/other/mult-pkg/faq/general/faq-how-do-i-interpret-odds-ratios-in-logistic-regression/

The odds ratio

For a continuous independent variable, the odds ratio can be defined as:

$$\mathrm{OR} = \frac{\mathrm{odds}(x+1)}{\mathrm{odds}(x)} = \frac{e^{\beta_0 + \beta_1 (x+1)}}{e^{\beta_0 + \beta_1 x}} = e^{\beta_1}$$

This exponential relationship provides an interpretation for $\beta_1$: the odds multiply by $e^{\beta_1}$ for every 1-unit increase in $x$.

log_model.coef_

array([[-0.94953524, 3.45991194]])

This means:

• We can expect the odds of passing the test to decrease (the coefficient is negative)
per one-unit increase in age.
• We can expect the odds of passing the test to increase (the coefficient is positive)
per one-unit increase in physical score.
• Comparing the coefficients' magnitudes with each other, physical_score is a stronger
predictor than age (see the odds-ratio sketch below).
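To make this concrete, here is a minimal sketch (not in the original notebook) that exponentiates the coefficients to get odds ratios; since the model was trained on scaled data, each ratio applies per one-standard-deviation increase in the feature:

# Convert log-odds coefficients to odds ratios
odds_ratios = np.exp(log_model.coef_[0])
print(dict(zip(X.columns, odds_ratios)))
# A ratio below 1 (age) shrinks the odds of passing per 1-SD increase;
# a ratio above 1 (physical_score) multiplies them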
Model Performance on Classification Tasks
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, plot_confusion_matrix

y_pred = log_model.predict(scaled_X_test)

accuracy_score(y_test,y_pred)

0.93

confusion_matrix(y_test,y_pred)

array([[172, 21],
[ 14, 293]], dtype=int64)
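As a reminder, scikit-learn lays the matrix out with true classes as rows and predicted classes as columns, so for 0/1 labels it reads [[TN, FP], [FN, TP]]. A minimal sketch (not in the original notebook) unpacking the counts and recomputing accuracy by hand:

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print(tn, fp, fn, tp)                     # 172 21 14 293
print((tn + tp) / (tn + fp + fn + tp))    # 0.93, matching accuracy_score above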

plot_confusion_matrix(log_model,scaled_X_test,y_test)

<sklearn.metrics._plot.confusion_matrix.ConfusionMatrixDisplay at
0x19ceb65e588>

# normalize='true' scales each row (each true class) to sum to 1,
# showing rates instead of raw counts

plot_confusion_matrix(log_model,scaled_X_test,y_test,normalize='true')

<sklearn.metrics._plot.confusion_matrix.ConfusionMatrixDisplay at
0x19ceb691b88>
print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

           0       0.92      0.89      0.91       193
           1       0.93      0.95      0.94       307

    accuracy                           0.93       500
   macro avg       0.93      0.92      0.93       500
weighted avg       0.93      0.93      0.93       500

X_train.iloc[0]

age 32.0
physical_score 43.0
Name: 141, dtype: float64

y_train.iloc[0]

1

# 0% probability of the 0 class
# 100% probability of the 1 class
# Caution: this passes raw features to a model trained on scaled data;
# strictly the point should go through scaler.transform() first. The
# predicted class is the same either way, but the raw features push the
# probabilities to saturate at exactly 0 and 1.
log_model.predict_proba(X_train.iloc[0].values.reshape(1, -1))

array([[0., 1.]])

log_model.predict(X_train.iloc[0].values.reshape(1, -1))
array([1], dtype=int64)
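Note that predict simply applies a 0.5 threshold to the class-1 probability. If a different operating point is ever needed (a hypothetical variation, not in the original notebook), a minimal sketch:

# Hypothetical: only predict class 1 when P(class 1) >= 0.7
threshold = 0.7
probs = log_model.predict_proba(scaled_X_test)[:, 1]
custom_preds = (probs >= threshold).astype(int)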

Evaluating Curves and AUC


Make sure to watch the video on this!

from sklearn.metrics import precision_recall_curve, plot_precision_recall_curve, plot_roc_curve

plot_precision_recall_curve(log_model,scaled_X_test,y_test)

<sklearn.metrics._plot.precision_recall_curve.PrecisionRecallDisplay
at 0x19cec76dac8>

plot_roc_curve(log_model,scaled_X_test,y_test)

<sklearn.metrics._plot.roc_curve.RocCurveDisplay at 0x19ceb5c4288>
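To report the area under the ROC curve as a single number (a small addition, not in the original notebook), a minimal sketch using roc_auc_score on the class-1 probabilities:

from sklearn.metrics import roc_auc_score

# AUC is computed from predicted probabilities, not hard class labels
probs = log_model.predict_proba(scaled_X_test)[:, 1]
roc_auc_score(y_test, probs)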