10/15/24, 8:57 PM Decision Tree - Jupyter Notebook
Decision Tree Classifier
Using a Loan Approval Dataset
Step 1: Import Libraries
In [9]: # Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report
Step 2: Create a Simple Dataset
In [13]: # Create the dataset
data = {
    'Age': [25, 45, 35, 50, 23, 40, 30, 28, 55, 33],
    'Income': ['Low', 'High', 'Medium', 'High', 'Low', 'Medium', 'High', 'Low', 'High', 'Low'],
    'Credit_Score': [600, 700, 650, 720, 580, 660, 680, 590, 740, 620],
    'Owns_House': ['No', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'No', 'Yes', 'No'],
    'Approved': ['No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes', 'No', 'Yes', 'No'],
}
# Convert the dictionary to a pandas DataFrame
df = pd.DataFrame(data)
# Display the dataset
df
Out[13]:
   Age  Income  Credit_Score  Owns_House  Approved
0   25     Low           600          No        No
1   45    High           700         Yes       Yes
2   35  Medium           650          No       Yes
3   50    High           720         Yes       Yes
4   23     Low           580          No        No
5   40  Medium           660         Yes       Yes
6   30    High           680         Yes       Yes
7   28     Low           590          No        No
8   55    High           740         Yes       Yes
9   33     Low           620          No        No
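Before encoding, a quick look at the column types and at the balance of the target can flag problems early. A minimal sketch using standard pandas calls on the same ten rows (values copied from the table above): with only four rejected applicants, a small random test set can easily contain no rejections at all.

```python
import pandas as pd

# Same ten applicants as in the dataset cell above
data = {
    'Age': [25, 45, 35, 50, 23, 40, 30, 28, 55, 33],
    'Income': ['Low', 'High', 'Medium', 'High', 'Low', 'Medium', 'High', 'Low', 'High', 'Low'],
    'Credit_Score': [600, 700, 650, 720, 580, 660, 680, 590, 740, 620],
    'Owns_House': ['No', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'No', 'Yes', 'No'],
    'Approved': ['No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes', 'No', 'Yes', 'No'],
}
df = pd.DataFrame(data)

# Income, Owns_House and Approved hold text and still need to be encoded
print(df.dtypes)

# How many applicants were approved vs. rejected
counts = df['Approved'].value_counts()
print(counts)
```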
localhost:8888/notebooks/Downloads/Decision Tree.ipynb 1/4
Step 3: Encode Categorical Features
Convert the categorical features "Income" and "Owns_House", as well as the target "Approved", into numerical values, since scikit-learn's DecisionTreeClassifier expects numeric input. We'll use LabelEncoder for this.
In [17]: # Initialize LabelEncoder
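# A single encoder is reused: each fit_transform call refits it, so afterwards
# it only remembers the mapping of the last column it was fitted on (Approved)
le = LabelEncoder()
# Convert categorical columns to numerical ones (codes follow alphabetical order)
df['Income'] = le.fit_transform(df['Income'])  # High=0, Low=1, Medium=2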
df['Owns_House'] = le.fit_transform(df['Owns_House']) # No=0, Yes=1
df['Approved'] = le.fit_transform(df['Approved']) # No=0, Yes=1
# Display the dataset after encoding
df
Out[17]:
   Age  Income  Credit_Score  Owns_House  Approved
0   25       1           600           0         0
1   45       0           700           1         1
2   35       2           650           0         1
3   50       0           720           1         1
4   23       1           580           0         0
5   40       2           660           1         1
6   30       0           680           1         1
7   28       1           590           0         0
8   55       0           740           1         1
9   33       1           620           0         0
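LabelEncoder assigns codes in alphabetical order, so Income becomes High=0, Low=1, Medium=2 and "Low" sits between the other two values. A tree can still isolate it, but needs two thresholds to do so. One alternative, swapped in here and not what the notebook uses, is scikit-learn's OrdinalEncoder with an explicit category order, which maps Low=0, Medium=1, High=2 so that a single split such as Income <= 0.5 separates "Low". A sketch, applied to the Income column only:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

income = pd.DataFrame({'Income': ['Low', 'High', 'Medium', 'High', 'Low',
                                  'Medium', 'High', 'Low', 'High', 'Low']})

# LabelEncoder sorts the labels, so the codes follow alphabetical order
le = LabelEncoder().fit(income['Income'])
print(le.classes_.tolist())  # index in this list = assigned code

# OrdinalEncoder with an explicit category list keeps Low < Medium < High
oe = OrdinalEncoder(categories=[['Low', 'Medium', 'High']])
income['Income_ordinal'] = oe.fit_transform(income[['Income']]).ravel().astype(int)
print(income)
```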
Step 4: Split the Dataset into Features and Target
In [19]: # Define features (X) and target (y)
X = df.drop('Approved', axis=1)
y = df['Approved']
Step 5: Split the Data into Training and Test Sets
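In [21]: # Split the dataset into training and testing sets (80% train, 20% test)
# test_size=0.2 matches the 8/2 split printed below; random_state=42 is an assumption
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)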
# Print the shapes of training and testing sets
print(f"Training set size: {X_train.shape}")
print(f"Testing set size: {X_test.shape}")
Training set size: (8, 4)
Testing set size: (2, 4)
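With ten rows and test_size=0.2, the test set holds only two applicants, and in this run both turn out to be approvals. Passing stratify=y to train_test_split asks for both parts to keep roughly the original 6:4 Yes/No ratio. A sketch on the encoded data from Step 3 (random_state=42 here is an arbitrary choice):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Encoded features from Step 3 (Income: High=0, Low=1, Medium=2; Owns_House: No=0, Yes=1)
X = pd.DataFrame({
    'Age': [25, 45, 35, 50, 23, 40, 30, 28, 55, 33],
    'Income': [1, 0, 2, 0, 1, 2, 0, 1, 0, 1],
    'Credit_Score': [600, 700, 650, 720, 580, 660, 680, 590, 740, 620],
    'Owns_House': [0, 1, 0, 1, 0, 1, 1, 0, 1, 0],
})
y = pd.Series([0, 1, 1, 1, 0, 1, 1, 0, 1, 0], name='Approved')  # No=0, Yes=1

# stratify=y keeps the class ratio as close as possible in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

print("Test labels:", y_test.tolist())
print("Train label counts:", y_train.value_counts().to_dict())
```

With only two test rows, the closest possible match to 6:4 is one approval and one rejection, so the test set now contains both classes.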
Step 6: Train the Decision Tree Classifier
In [23]: # Initialize the Decision Tree Classifier
clf = DecisionTreeClassifier(random_state=42)
# Train the model using the training data
clf.fit(X_train, y_train)
Out[23]: DecisionTreeClassifier(random_state=42)
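DecisionTreeClassifier chooses splits by Gini impurity by default (criterion='gini'): a node whose class proportions are p_k has impurity 1 - sum(p_k^2). The sketch below prints the learned rules with sklearn.tree.export_text and checks the root node's impurity against a hand computation. For brevity it fits on all ten encoded rows rather than X_train, so its tree need not match clf exactly; gini is a helper written only for this illustration.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Encoded data from Step 3
X = pd.DataFrame({
    'Age': [25, 45, 35, 50, 23, 40, 30, 28, 55, 33],
    'Income': [1, 0, 2, 0, 1, 2, 0, 1, 0, 1],
    'Credit_Score': [600, 700, 650, 720, 580, 660, 680, 590, 740, 620],
    'Owns_House': [0, 1, 0, 1, 0, 1, 1, 0, 1, 0],
})
y = pd.Series([0, 1, 1, 1, 0, 1, 1, 0, 1, 0], name='Approved')


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    proportions = pd.Series(labels).value_counts(normalize=True)
    return 1.0 - float((proportions ** 2).sum())


# Fitted on all ten rows for illustration (the notebook fits on X_train only)
tree = DecisionTreeClassifier(random_state=42).fit(X, y)

# Human-readable if/else rules of the fitted tree
print(export_text(tree, feature_names=list(X.columns)))

# 6 Yes and 4 No at the root: 1 - (0.6**2 + 0.4**2) = 0.48
print("Root Gini (manual): ", gini(y))
print("Root Gini (sklearn):", tree.tree_.impurity[0])
```

On this data a single threshold on Credit_Score already separates every approval (650 and above) from every rejection (620 and below), which is one reason a tiny test set can score perfectly.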
Step 7: Make Predictions
In [31]: # Make predictions on the testing set
y_pred = clf.predict(X_test)
# Print the actual and predicted values side-by-side for comparison
print("Actual vs Predicted values:")
for actual, predicted in zip(y_test, y_pred):
    print(f"Actual: {actual}, Predicted: {predicted}")
Actual vs Predicted values:
Actual: 1, Predicted: 1
Actual: 1, Predicted: 1
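To score a new applicant, the raw values must be encoded exactly as the training data was. Because the notebook reuses one LabelEncoder, only the Approved mapping survives; the sketch below keeps one fitted encoder per column instead. The applicant's values are hypothetical, and the model is refit on all ten rows so the example is self-contained.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

data = {
    'Age': [25, 45, 35, 50, 23, 40, 30, 28, 55, 33],
    'Income': ['Low', 'High', 'Medium', 'High', 'Low', 'Medium', 'High', 'Low', 'High', 'Low'],
    'Credit_Score': [600, 700, 650, 720, 580, 660, 680, 590, 740, 620],
    'Owns_House': ['No', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'No', 'Yes', 'No'],
    'Approved': ['No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes', 'No', 'Yes', 'No'],
}
df = pd.DataFrame(data)

# One fitted encoder per column, kept so new rows get the same codes
encoders = {}
for col in ['Income', 'Owns_House', 'Approved']:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])

X = df.drop('Approved', axis=1)
y = df['Approved']
clf = DecisionTreeClassifier(random_state=42).fit(X, y)

# Hypothetical new applicant (values invented for illustration)
new_applicant = pd.DataFrame({
    'Age': [38], 'Income': ['Medium'], 'Credit_Score': [690], 'Owns_House': ['No'],
})
for col in ['Income', 'Owns_House']:
    new_applicant[col] = encoders[col].transform(new_applicant[col])

pred = clf.predict(new_applicant[X.columns])
label = encoders['Approved'].inverse_transform(pred)[0]  # back to 'No'/'Yes'
proba = clf.predict_proba(new_applicant[X.columns])[0]
print("Predicted:", label, "| class probabilities:", proba)
```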
Step 8: Evaluate the Model
In [27]: # Evaluate the model's performance
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
# Detailed classification report
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
Accuracy: 100.00%

Classification Report:
              precision    recall  f1-score   support

           1       1.00      1.00      1.00         2

    accuracy                           1.00         2
   macro avg       1.00      1.00      1.00         2
weighted avg       1.00      1.00      1.00         2
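A 100% accuracy on two test rows carries little weight: both rows are approvals, so the report has no row for class 0, and a single mistake would drop accuracy to 50%. Cross-validation evaluates the model on every row once. A sketch, assuming stratified 4-fold cross-validation (4 folds so that each fold holds exactly one of the four rejected applicants):

```python
import pandas as pd
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Encoded data from Step 3
X = pd.DataFrame({
    'Age': [25, 45, 35, 50, 23, 40, 30, 28, 55, 33],
    'Income': [1, 0, 2, 0, 1, 2, 0, 1, 0, 1],
    'Credit_Score': [600, 700, 650, 720, 580, 660, 680, 590, 740, 620],
    'Owns_House': [0, 1, 0, 1, 0, 1, 1, 0, 1, 0],
})
y = pd.Series([0, 1, 1, 1, 0, 1, 1, 0, 1, 0], name='Approved')

# Stratified folds keep both classes in every test fold
cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=42)
scores = cross_val_score(DecisionTreeClassifier(random_state=42), X, y, cv=cv)

print("Fold accuracies:", scores)
print(f"Mean accuracy: {scores.mean():.2f} (+/- {scores.std():.2f})")
```

Even so, ten rows are far too few for a reliable estimate; the fold scores mainly show how much the result can vary.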