In [ ]: import numpy as np
import pandas as pd
from sklearn import preprocessing
import matplotlib.pyplot as plt
plt.rc("font", size=14)
import seaborn as sns
sns.set(style="white") #white background style for seaborn plots
sns.set(style="whitegrid", color_codes=True)
import warnings
warnings.simplefilter(action='ignore')
In [ ]: # Read CSV train data file into DataFrame
train_df = pd.read_csv("/content/titanic.csv")
# Read CSV test data file into DataFrame (note: the same CSV is used for both here)
test_df = pd.read_csv("/content/titanic.csv")
# preview train data
train_df.head()
Out[ ]: passengerid pclass survived name sex age sibsp parch ticket fare cabin embarked
0 1 1 1 Allen, Miss. Elisabeth Walton female 29.0000 0 0 24160 211.3375 B5 S
1 2 1 1 Allison, Master. Hudson Trevor male 0.9167 1 2 113781 151.5500 C22 C26 S
2 3 1 0 Allison, Miss. Helen Loraine female 2.0000 1 2 113781 151.5500 C22 C26 S
3 4 1 0 Allison, Mr. Hudson Joshua Creighton male 30.0000 1 2 113781 151.5500 C22 C26 S
4 5 1 0 Allison, Mrs. Hudson J C (Bessie Waldo Daniels) female 25.0000 1 2 113781 151.5500 C22 C26 S
In [ ]: print('The number of samples in the train data is {}.'.format(train_df.shape[0]))
The number of samples in the train data is 1309.
In [ ]: test_df.head()
Out[ ]: passengerid pclass survived name sex age sibsp parch ticket fare cabin embarked
0 1 1 1 Allen, Miss. Elisabeth Walton female 29.0000 0 0 24160 211.3375 B5 S
1 2 1 1 Allison, Master. Hudson Trevor male 0.9167 1 2 113781 151.5500 C22 C26 S
2 3 1 0 Allison, Miss. Helen Loraine female 2.0000 1 2 113781 151.5500 C22 C26 S
3 4 1 0 Allison, Mr. Hudson Joshua Creighton male 30.0000 1 2 113781 151.5500 C22 C26 S
4 5 1 0 Allison, Mrs. Hudson J C (Bessie Waldo Daniels) female 25.0000 1 2 113781 151.5500 C22 C26 S
In [ ]: print('The number of samples in the test data is {}.'.format(test_df.shape[0]))
The number of samples in the test data is 1309.
In [ ]: train_df.isnull().sum()
Out[ ]: passengerid 0
pclass 0
survived 0
name 0
sex 0
age 263
sibsp 0
parch 0
ticket 0
fare 1
cabin 1014
embarked 2
dtype: int64
In [ ]: print('Percent of missing "Age" records is %.2f%%' %((train_df['age'].isnull().sum()/train_df.shape[0])*100))
Percent of missing "Age" records is 20.09%
In [ ]: ax = train_df["age"].hist(bins=15, density=True, stacked=True, color='teal', alpha=0.6)
train_df["age"].plot(kind='density', color='teal')
ax.set(xlabel='age')
plt.xlim(-10,85)
plt.show()
In [ ]: print('The mean of "Age" is %.2f' %(train_df["age"].mean(skipna=True)))
# median age
print('The median of "Age" is %.2f' %(train_df["age"].median(skipna=True)))
The mean of "Age" is 29.88
The median of "Age" is 28.00
In [ ]: print('Percent of missing "Cabin" records is %.2f%%' %((train_df['cabin'].isnull().sum()/train_df.shape[0])*100))
Percent of missing "Cabin" records is 77.46%
In [ ]: print('Percent of missing "Embarked" records is %.2f%%' %((train_df['embarked'].isnull().sum()/train_df.shape[0])*100))
Percent of missing "Embarked" records is 0.15%
In [ ]: print('Boarded passengers grouped by port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton):')
print(train_df['embarked'].value_counts())
sns.countplot(x='embarked', data=train_df, palette='Set2')
plt.show()
Boarded passengers grouped by port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton):
S 914
C 270
Q 123
Name: embarked, dtype: int64
In [ ]: print('The most common boarding port of embarkation is %s.' %train_df['embarked'].value_counts().idxmax())
The most common boarding port of embarkation is S.
In [ ]: train_data = train_df.copy()
# fill missing ages with the median and missing embarkation ports with the most frequent value
train_data["age"] = train_data["age"].fillna(train_df["age"].median(skipna=True))
train_data["embarked"] = train_data["embarked"].fillna(train_df['embarked'].value_counts().idxmax())
train_data.drop('cabin', axis=1, inplace=True)
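If test_df were a genuinely separate file, the same cleaning would have to be applied to it, reusing the fill values computed from the training data to avoid leakage. A minimal sketch under that assumption (test_data is an illustrative name):
In [ ]: # sketch: clean the test set with fill values computed on the training data
test_data = test_df.copy()
test_data["age"] = test_data["age"].fillna(train_df["age"].median(skipna=True))
test_data["embarked"] = test_data["embarked"].fillna(train_df["embarked"].value_counts().idxmax())
test_data.drop('cabin', axis=1, inplace=True)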
In [ ]: train_data.isnull().sum()
Out[ ]: passengerid 0
pclass 0
survived 0
name 0
sex 0
age 0
sibsp 0
parch 0
ticket 0
fare 1
embarked 0
dtype: int64
In [ ]: train_data.head()
Out[ ]: passengerid pclass survived name sex age sibsp parch ticket fare embarked
0 1 1 1 Allen, Miss. Elisabeth Walton female 29.0000 0 0 24160 211.3375 S
1 2 1 1 Allison, Master. Hudson Trevor male 0.9167 1 2 113781 151.5500 S
2 3 1 0 Allison, Miss. Helen Loraine female 2.0000 1 2 113781 151.5500 S
3 4 1 0 Allison, Mr. Hudson Joshua Creighton male 30.0000 1 2 113781 151.5500 S
4 5 1 0 Allison, Mrs. Hudson J C (Bessie Waldo Daniels) female 25.0000 1 2 113781 151.5500 S
In [ ]: sns.countplot(x='survived', hue='pclass', data=train_df)
Out[ ]: <matplotlib.axes._subplots.AxesSubplot at 0x7ffac6809a10>
In [ ]: sns.countplot(x='survived', hue='sex', data=train_df)
Out[ ]: <matplotlib.axes._subplots.AxesSubplot at 0x7ffac67084d0>
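To complement the count plots with numbers, the survival rate can be grouped by class and by sex (a sketch, not part of the original run):
In [ ]: # sketch: mean of the 0/1 survived column gives the survival rate per group
print(train_df.groupby('pclass')['survived'].mean())
print(train_df.groupby('sex')['survived'].mean())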
In [ ]: def add_age(cols):
    """Fill a missing age with the mean age of the passenger's class."""
    Age = cols["age"]
    Pclass = cols["pclass"]
    if pd.isnull(Age):
        return int(train_df[train_df["pclass"] == Pclass]["age"].mean())
    else:
        return Age
In [ ]: train_df["age"] = train_df[["age", "pclass"]].apply(add_age,axis=1)
In [ ]: train_df.drop("cabin",inplace=True,axis=1)
In [ ]: train_df.dropna(inplace=True)
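A quick sanity check (a sketch) that no missing values remain in train_df after imputing age, dropping cabin, and dropping the few remaining rows with nulls:
In [ ]: # sketch: should print 0 if all missing values have been handled
print(train_df.isnull().sum().sum())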
In [ ]: pd.get_dummies(train_df["sex"])
Out[ ]: female male
0 1 0
1 0 1
2 1 0
3 0 1
4 1 0
... ... ...
1304 1 0
1305 1 0
1306 0 1
1307 0 1
1308 0 1
1306 rows × 2 columns
In [ ]: sex = pd.get_dummies(train_df["sex"],drop_first=True)
In [ ]: embarked = pd.get_dummies(train_df["embarked"],drop_first=True)
pclass = pd.get_dummies(train_df["pclass"],drop_first=True)
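The sklearn preprocessing module imported at the top is never used; the same dummy encoding could also be produced with its OneHotEncoder. A sketch assuming scikit-learn >= 1.2 (older versions take sparse=False instead of sparse_output=False):
In [ ]: # sketch: drop='first' mirrors pandas' drop_first=True
encoder = preprocessing.OneHotEncoder(drop='first', sparse_output=False)
encoded = encoder.fit_transform(train_df[["sex", "embarked", "pclass"]])
print(encoder.get_feature_names_out())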
In [ ]: train = pd.concat([train_df,pclass,sex,embarked],axis=1)
In [ ]: train.drop(["passengerid","pclass","name","sex","ticket","embarked"],axis=1,inplace=True)
In [ ]: X = train.drop("survived",axis=1)
y = train["survived"]
In [ ]: from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 101)
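Because non-survivors outnumber survivors, a stratified split keeps the class ratio the same in both subsets. A sketch of the equivalent call (the suffixed names are illustrative; this is not what was run above):
In [ ]: # sketch: stratify on the target so both splits have the same survived/not-survived ratio
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.3, random_state=101, stratify=y)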
In [ ]: from sklearn.linear_model import LogisticRegression
logmodel = LogisticRegression()
logmodel.fit(X_train,y_train)
Out[ ]: LogisticRegression()
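A single 70/30 split gives one estimate of performance; k-fold cross-validation averages over several splits. A sketch (not part of the original run; max_iter is raised to avoid convergence warnings with the default solver):
In [ ]: # sketch: 5-fold cross-validated accuracy for the same model family
from sklearn.model_selection import cross_val_score
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean(), scores.std())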
In [ ]: predictions = logmodel.predict(X_test)
from sklearn.metrics import classification_report
print(classification_report(y_test, predictions))
              precision    recall  f1-score   support
           0       0.84      0.86      0.85       247
           1       0.75      0.73      0.74       145
    accuracy                           0.81       392
   macro avg       0.80      0.79      0.80       392
weighted avg       0.81      0.81      0.81       392
In [ ]: from sklearn.metrics import confusion_matrix
confusion_matrix(y_test, predictions)
Out[ ]: array([[212,  35],
               [ 39, 106]])
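The headline numbers in the classification report follow directly from these four counts: precision for class 1 is 106 / (106 + 35) ≈ 0.75, recall is 106 / (106 + 39) ≈ 0.73, and accuracy is (212 + 106) / 392 ≈ 0.81. A short sketch verifying this:
In [ ]: # sketch: recover precision, recall, and accuracy for class 1 from the confusion matrix
tn, fp, fn, tp = confusion_matrix(y_test, predictions).ravel()
print(tp / (tp + fp))                    # precision for class 1: 106 / 141
print(tp / (tp + fn))                    # recall for class 1: 106 / 145
print((tp + tn) / (tn + fp + fn + tp))   # overall accuracy: 318 / 392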
In [ ]: %matplotlib inline
from sklearn.metrics import confusion_matrix
import itertools
import matplotlib.pyplot as plt
In [ ]: def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')

    print(cm)

    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, cm[i, j],
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
In [ ]: cm = confusion_matrix(y_true=y_test, y_pred=predictions)
# class 0 (did not survive) comes first in the matrix, so label the axes in that order
cm_plot_labels = ["Did not survive", "Survived"]
plot_confusion_matrix(cm=cm, classes=cm_plot_labels, title='Confusion Matrix')
Confusion matrix, without normalization
[[212 35]
[ 39 106]]
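The helper also supports normalization; a sketch of calling it again with normalize=True to show per-class rates instead of raw counts:
In [ ]: # sketch: normalized version of the same plot (each row sums to 1)
plot_confusion_matrix(cm=cm, classes=cm_plot_labels, normalize=True,
                      title='Normalized Confusion Matrix')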