ML LAB - V SEM - BCA

PRAGATI WOMENS DEGREE COLLEGE ML USING PYTHON III BCA – V SEM
1. Write a program for Exploratory Data Analysis (EDA) in Machine Learning Using Python
Aim : To write a program for Exploratory Data Analysis (EDA) in Machine Learning Using Python
Description :
Exploratory Data Analysis (EDA) is a crucial step in any machine learning project. It involves
analyzing and visualizing data to understand its characteristics, patterns, and relationships.
Below is a simple example program in Python using popular libraries like Pandas, Matplotlib,
and Seaborn for EDA.
Source code :
# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load your dataset (replace 'your_dataset.csv' with your actual file path)
df = pd.read_csv('your_dataset.csv')
# Display basic information about the dataset
print(df.info())
# Display the first few rows of the dataset
print(df.head())
# Summary statistics
print(df.describe())
# Check for missing values
print(df.isnull().sum())
# Univariate analysis - Histogram for a numerical feature
plt.figure(figsize=(10, 6))
sns.histplot(df['numerical_feature'], kde=True)
plt.title('Histogram of Numerical Feature')
P.V.V.SANDEEP MCA 1
plt.xlabel('Numerical Feature')
plt.ylabel('Frequency')
plt.show()
# Univariate analysis - Count plot for a categorical feature
sns.countplot(x='categorical_feature', data=df)
plt.title('Count Plot of Categorical Feature')
plt.xlabel('Categorical Feature')
plt.ylabel('Count')
plt.show()
# Bivariate analysis - Scatter plot for two numerical features
sns.scatterplot(x='numerical_feature_1', y='numerical_feature_2', data=df)
plt.title('Scatter Plot of Numerical Feature 1 vs Numerical Feature 2')
plt.xlabel('Numerical Feature 1')
plt.ylabel('Numerical Feature 2')
plt.show()
# Bivariate analysis - Box plot for a numerical feature and a categorical feature
sns.boxplot(x='categorical_feature', y='numerical_feature', data=df)
plt.title('Box Plot of Numerical Feature by Categorical Feature')
plt.xlabel('Categorical Feature')
plt.ylabel('Numerical Feature')
plt.show()
# Correlation matrix
# numeric_only=True skips the text column, which cannot be correlated
correlation_matrix = df.corr(numeric_only=True)
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f', linewidths=0.5)
plt.title('Correlation Matrix')
plt.show()
Output :
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 5 columns):
 #   Column               Non-Null Count  Dtype
---  ------               --------------  -----
 0   numerical_feature    1000 non-null   float64
 1   categorical_feature  1000 non-null   object
 2   numerical_feature_1  1000 non-null   float64
 3   numerical_feature_2  1000 non-null   float64
 4   target               1000 non-null   int64
dtypes: float64(3), int64(1), object(1)
memory usage: 39.2+ KB
None
   numerical_feature  categorical_feature  numerical_feature_1  numerical_feature_2  target
0               10.0           Category_A                  0.5                 20.0       1
1               15.0           Category_B                  0.8                 25.0       0
2               12.0           Category_A                  1.0                 18.0       1
3               20.0           Category_C                  1.2                 22.0       0
4               18.0           Category_B                  1.5                 30.0       1
       numerical_feature  numerical_feature_1  numerical_feature_2       target
count        1000.000000          1000.000000          1000.000000  1000.000000
mean           14.528000             0.998200            24.586000     0.500000
std             3.945412             0.326721             5.311527      0.50025
min             5.000000             0.500000            15.000000     0.000000
25%            12.000000             0.800000            20.000000     0.000000
50%            14.000000             1.000000            25.000000     0.500000
75%            17.000000             1.200000            30.000000     1.000000
max            25.000000             1.500000            35.000000     1.000000
numerical_feature 0
categorical_feature 0
numerical_feature_1 0
numerical_feature_2 0
target 0
dtype: int64
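The program above reads 'your_dataset.csv', which is only a placeholder. To try the EDA steps
without a real file, a small synthetic dataset with the same column names can be generated
first. The sketch below is an assumption made for illustration: the helper name
make_sample_dataset, the value ranges, and the random seed are hypothetical, and the values it
produces will not match the sample output shown above.

```python
import numpy as np
import pandas as pd


def make_sample_dataset(n_rows=1000, seed=0):
    """Build a synthetic DataFrame with the columns the EDA program expects.

    Hypothetical helper: the value ranges are arbitrary, chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        'numerical_feature': rng.normal(loc=15.0, scale=4.0, size=n_rows),
        'categorical_feature': rng.choice(
            ['Category_A', 'Category_B', 'Category_C'], size=n_rows),
        'numerical_feature_1': rng.uniform(0.5, 1.5, size=n_rows),
        'numerical_feature_2': rng.uniform(15.0, 35.0, size=n_rows),
        'target': rng.integers(0, 2, size=n_rows),
    })


df = make_sample_dataset()
# Optionally save it so that pd.read_csv('your_dataset.csv') works unchanged:
# df.to_csv('your_dataset.csv', index=False)
print(df.dtypes)
```

With df built this way, the plotting and summary statements of the program can be run as
written, skipping only the pd.read_csv line.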
2. Exploring Feature Selection Algorithms: Ranking and Wrapper Methods
Aim : To explore feature selection algorithms: ranking and wrapper methods
Description :
Feature selection is an essential step in machine learning, helping to identify the most
relevant features for building a predictive model. There are various methods for feature
selection, and they can be broadly categorized into "filter" methods, "wrapper" methods, and
"embedded" methods. In this example, I'll provide a simple program that demonstrates both
ranking and wrapper methods for feature selection using scikit-learn in Python.
First, make sure you have scikit-learn installed. You can install it using:
pip install scikit-learn
With scikit-learn installed, create the following example program.
Source code :
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif, RFE
# Load the Iris dataset as an example
iris = load_iris()
X = iris.data
y = iris.target
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Feature Ranking with Univariate Statistics (SelectKBest with ANOVA F-statistic)
selector = SelectKBest(f_classif, k='all')
X_train_kbest = selector.fit_transform(X_train, y_train)
# Display feature scores and ranks
# Applying argsort twice to the negated scores gives 0-based ranks (0 = highest score)
feature_scores = pd.DataFrame({'Feature': iris.feature_names,
                               'Score': selector.scores_,
                               'Rank': (-selector.scores_).argsort().argsort()})
feature_scores = feature_scores.sort_values(by='Score', ascending=False)
print("Feature Scores and Ranks (SelectKBest):\n", feature_scores)
# Wrapper Method - Recursive Feature Elimination (RFE) with Random Forest
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
rfe_selector = RFE(estimator=rf_classifier, n_features_to_select=1, step=1)
X_train_rfe = rfe_selector.fit_transform(X_train, y_train)
# Display selected features and their rankings
selected_features = pd.DataFrame({'Feature': iris.feature_names,
                                  'Selected': rfe_selector.support_,
                                  'Rank': rfe_selector.ranking_})
selected_features = selected_features.sort_values(by='Rank')
print("\nSelected Features and Ranks (RFE):\n", selected_features)
# Train a model with the selected features and evaluate its performance
rf_classifier.fit(X_train_rfe, y_train)
X_test_rfe = rfe_selector.transform(X_test)
accuracy = rf_classifier.score(X_test_rfe, y_test)
print(f"\nAccuracy on Test Set with Selected Features: {accuracy:.2f}")
Output :
Feature Scores and Ranks (SelectKBest):
             Feature       Score  Rank
2  petal length (cm)  114.198210     0
3   petal width (cm)  100.590472     1
0  sepal length (cm)   31.665068     2
1   sepal width (cm)    0.085615     3
Selected Features and Ranks (RFE):
             Feature  Selected  Rank
2  petal length (cm)      True     1
0  sepal length (cm)     False     2
3   petal width (cm)     False     3
1   sepal width (cm)     False     4
Accuracy on Test Set with Selected Features: 0.97
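The description also mentions "embedded" methods, which the program above does not cover. As a
minimal sketch, assuming scikit-learn's SelectFromModel with a random forest (the
threshold='median' setting and the variable names are illustrative choices, not part of the
original program), an embedded selector lets the model's own feature importances decide which
features are kept:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)

# Embedded selection: the forest is trained once and its feature importances
# decide what is kept. threshold='median' (an illustrative choice) keeps the
# features whose importance is at or above the median importance.
embedded_selector = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=42),
    threshold='median')
embedded_selector.fit(X_train, y_train)

kept = [name for name, keep in zip(iris.feature_names,
                                   embedded_selector.get_support()) if keep]
print("Features kept by the embedded method:", kept)
print("Importances:", embedded_selector.estimator_.feature_importances_)
```

Unlike RFE, which refits the model after each elimination step, this approach fits the model
only once, so it is cheaper but less thorough.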
3. Write a Python program for Dimensionality Reduction using PCA
Aim : To write a Python program for Dimensionality Reduction using PCA
Description :
Principal Component Analysis (PCA) is a common technique for dimensionality reduction in
machine learning. It helps reduce the number of features while retaining most of the
information in the data. Below is a simple example program in Python using scikit-learn to
demonstrate PCA for dimensionality reduction.
Source code :
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
# Standardize the data (important for PCA)
X_standardized = StandardScaler().fit_transform(X)
# Apply PCA with 2 components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_standardized)
# Create a DataFrame with the reduced features
df_pca = pd.DataFrame(data=X_pca,
                      columns=['Principal Component 1', 'Principal Component 2'])
df_pca['Target'] = y
# Display the reduced feature space
print("Reduced Feature Space (2 Principal Components):\n", df_pca.head())
# Visualize the data in the reduced feature space
targets = np.unique(y)
colors = ['r', 'g', 'b']
for target, color in zip(targets, colors):
    indices_to_keep = df_pca['Target'] == target
    plt.scatter(df_pca.loc[indices_to_keep, 'Principal Component 1'],
                df_pca.loc[indices_to_keep, 'Principal Component 2'],
                c=color,
                label=f'Target {target}',
                alpha=0.7)
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('PCA - Iris Dataset')
plt.legend()
plt.show()
# Explained variance ratio
explained_variance_ratio = pca.explained_variance_ratio_
print(f"\nExplained Variance Ratio (First 2 Principal Components): {explained_variance_ratio}")
Output :
Reduced Feature Space (2 Principal Components):
Principal Component 1 Principal Component 2 Target
0 -2.264542 0.505704 0
1 -2.086426 -0.655405 0
2 -2.367950 -0.318477 0
3 -2.304197 -0.575368 0
4 -2.388777 0.674767 0
Explained Variance Ratio (First 2 Principal Components): [0.72770452 0.23030523]
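To see what PCA computes internally, the same reduction can be sketched with NumPy alone: the
principal components are the eigenvectors of the covariance matrix of the standardized data,
and each eigenvalue's share of the total is the explained variance ratio. This is an
illustrative sketch, not scikit-learn's actual implementation (which uses an SVD), and the
variable names are assumptions. The projected points may differ from scikit-learn's by the
sign of a component.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_iris().data)

# Covariance matrix of the standardized features (columns are variables)
cov = np.cov(X, rowvar=False)

# eigh is meant for symmetric matrices and returns eigenvalues in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Reorder so that the largest eigenvalue (most variance) comes first
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Share of the total variance carried by each component
explained_variance_ratio = eigenvalues / eigenvalues.sum()

# Project the data onto the first two principal components
X_projected = X @ eigenvectors[:, :2]

print("Explained variance ratio:", explained_variance_ratio[:2])
print("Projected shape:", X_projected.shape)
```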
4. Write about Exploring Model Evaluation Parameters.
Aim : To write about exploring model evaluation parameters
Description :
Hyperparameter tuning involves finding the best set of hyperparameters for your model to
optimize its performance. Below is a simple example using scikit-learn's GridSearchCV to tune
hyperparameters for a Support Vector Machine (SVM) classifier on the Iris dataset.
Source code :
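import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)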

# Define the SVM classifier
svm_classifier = SVC()
# Define the hyperparameter grid to search
param_grid = {
    'C': [0.1, 1, 10],
    'kernel': ['linear', 'rbf', 'poly'],
    'gamma': ['scale', 'auto']
}
# Use GridSearchCV to find the best hyperparameters
grid_search = GridSearchCV(estimator=svm_classifier, param_grid=param_grid, cv=5)
grid_search.fit(X_train, y_train)
# Display the best hyperparameters
best_params = grid_search.best_params_
print("Best Hyperparameters:", best_params)
# Predict on the test set using the best model
y_pred = grid_search.predict(X_test)
# Evaluate the model's performance
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy on Test Set:", accuracy)
Output :
Best Hyperparameters: {'C': 1, 'gamma': 'scale', 'kernel': 'rbf'}
Accuracy on Test Set: 1.0
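GridSearchCV scores every hyperparameter combination with cross-validation on the training set
and keeps the best one. The sketch below does the same by hand for a single hyperparameter, C,
using cross_val_score. It is a simplified illustration under the assumption of a fixed RBF
kernel; the names mean_scores and best_C are hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)

# Score each candidate value of C with 5-fold cross-validation on the training set
mean_scores = {}
for C in [0.1, 1, 10]:
    scores = cross_val_score(SVC(C=C, kernel='rbf', gamma='scale'),
                             X_train, y_train, cv=5)
    mean_scores[C] = scores.mean()
    print(f"C={C}: mean CV accuracy = {mean_scores[C]:.3f}")

# Keep the value with the highest mean score, as a grid search would
best_C = max(mean_scores, key=mean_scores.get)
print("Best C:", best_C)
```

The test set is not used during the search, so it remains an unbiased check of the model that
is finally chosen.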
5. Write about Probabilistic Classification Algorithm with example program
Aim : To Write about Probabilistic Classification Algorithm
Description :
This program uses the Gaussian Naive Bayes classifier (GaussianNB) for probabilistic
classification on the Iris dataset. The fit method trains the model, and the predict method
makes predictions on the test set. The accuracy, classification report, and confusion matrix
are then printed to evaluate the performance of the classifier.
Make sure you have scikit-learn installed (pip install scikit-learn) before running this program.
Source code:
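import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)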
# Initialize the Gaussian Naive Bayes classifier
naive_bayes_classifier = GaussianNB()
# Train the classifier on the training set
naive_bayes_classifier.fit(X_train, y_train)
# Make predictions on the test set
y_pred = naive_bayes_classifier.predict(X_test)
# Calculate and print accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')
# Display classification report
print('\nClassification Report:\n', classification_report(y_test, y_pred))
# Display confusion matrix
print('Confusion Matrix:\n', confusion_matrix(y_test, y_pred))
Output :
Accuracy: 0.97
Classification Report:
              precision    recall  f1-score   support
           0       1.00      1.00      1.00        10
           1       0.94      1.00      0.97        10
           2       1.00      0.89      0.94        10
    accuracy                           0.97        30
   macro avg       0.98      0.97      0.97        30
weighted avg       0.98      0.97      0.97        30
Confusion Matrix:
[[10  0  0]
 [ 0 10  0]
 [ 0  1  9]]
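What makes this classifier probabilistic is that it estimates a probability for every class,
not only a label. The program above prints labels only; as a minimal sketch (the variable
names are illustrative), predict_proba exposes the class probabilities behind each prediction:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)

model = GaussianNB().fit(X_train, y_train)

# One row per test sample, one column per class; each row sums to 1
probabilities = model.predict_proba(X_test)
print(np.round(probabilities[:5], 3))

# predict() returns the class with the highest posterior probability
predicted = model.classes_[np.argmax(probabilities, axis=1)]
print(predicted[:5])
```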
6. Write a Python program on Regression Techniques: Linear and Logistic
Aim : To write a Python program on Regression Techniques: Linear and Logistic
Description :
This lab covers both linear regression for predicting continuous target variables and logistic
regression for binary classification. It includes training the models, making predictions, and
evaluating their performance using metrics like mean squared error for regression and
accuracy for classification. The visualizations show the predicted values compared to the
actual values. Both examples generate synthetic sample data. To use your own data, load it in
place of the generated X and y (for example from 'regression_data.csv' with the target columns
'target_regression' and 'target_classification') and adjust the code to your specific dataset
and requirements.
Source code:
6A. Linear Regression Example:
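import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Generate sample data
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)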
# Train a linear regression model
linear_reg_model = LinearRegression()
linear_reg_model.fit(X_train, y_train)
y_pred = linear_reg_model.predict(X_test)
# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f'Mean Squared Error: {mse:.2f}')
print(f'R-squared: {r2:.2f}')
# Plot the regression line
plt.scatter(X_test, y_test, color='black')
plt.plot(X_test, y_pred, color='blue', linewidth=3)
plt.xlabel('X')
plt.ylabel('y')
plt.title('Linear Regression Example')
plt.show()
Output :
Mean Squared Error: 0.79
R-squared: 0.98
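Linear regression chooses the intercept and slope that minimize the squared error between
predictions and targets. As a sketch of that idea, the same coefficients can be obtained by
solving the least-squares problem directly with NumPy. This illustration fits on the full
synthetic data set for simplicity, and the names X_design and theta are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# Add a column of ones so the intercept is estimated along with the slope
X_design = np.hstack([np.ones((X.shape[0], 1)), X])

# Solve the least-squares problem: minimize ||X_design @ theta - y||^2
theta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
intercept, slope = theta.ravel()
print(f"lstsq:            intercept={intercept:.3f}, slope={slope:.3f}")

# scikit-learn arrives at the same coefficients
model = LinearRegression().fit(X, y)
print(f"LinearRegression: intercept={model.intercept_[0]:.3f}, "
      f"slope={model.coef_[0, 0]:.3f}")
```

Because the data were generated as y = 4 + 3x plus noise, both estimates should land near an
intercept of 4 and a slope of 3.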
6B. Logistic Regression Example:
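import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
# Generate sample data
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = (4 + 3 * X + np.random.randn(100, 1)) > 7 # Binary classification task
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)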
# Train a logistic regression model
logistic_reg_model = LogisticRegression()
logistic_reg_model.fit(X_train, y_train.ravel())
y_pred = logistic_reg_model.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')
print(f'Confusion Matrix:\n{conf_matrix}')
print(f'Classification Report:\n{class_report}')
# Plot the predicted class over X (sorted so the line is drawn from left to right)
X_sorted = np.sort(X_test, axis=0)
plt.scatter(X_test, y_test, color='black')
plt.plot(X_sorted, logistic_reg_model.predict(X_sorted), color='blue', linewidth=3)
plt.xlabel('X')
plt.ylabel('y')
plt.title('Logistic Regression Example')
plt.show()
Output :
Accuracy: 0.95
Confusion Matrix:
[[ 9  1]
 [ 0 10]]
Classification Report:
              precision    recall  f1-score   support
       False       1.00      0.90      0.95        10
        True       0.91      1.00      0.95        10
    accuracy                           0.95        20
   macro avg       0.95      0.95      0.95        20
weighted avg       0.95      0.95      0.95        20
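Logistic regression turns a linear score w*x + b into a probability with the sigmoid function,
and predicts the positive class when that probability exceeds 0.5. The sketch below recomputes
the probabilities by hand from the fitted coefficients. It fits on the full synthetic data set
for simplicity, and the helper sigmoid and the names manual_proba and boundary are illustrative
assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = ((4 + 3 * X + np.random.randn(100, 1)) > 7).ravel()

model = LogisticRegression().fit(X, y)


def sigmoid(z):
    """Map any real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))


# P(y = True | x) = sigmoid(w * x + b)
z = X @ model.coef_.T + model.intercept_
manual_proba = sigmoid(z).ravel()

# The predicted class flips where the probability crosses 0.5, i.e. where w*x + b = 0
boundary = -model.intercept_[0] / model.coef_[0, 0]
print(f"Decision boundary at x = {boundary:.3f}")
print("First five probabilities:", np.round(manual_proba[:5], 3))
```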
7. Write a python program on Classification Techniques – Tree Based
Aim: To Write a python program on Classification Techniques – Tree Based
Description :
Tree-based algorithms, such as Decision Trees, Random Forests, and Gradient Boosted Trees,
are powerful techniques for classification tasks. The example below uses scikit-learn to
demonstrate a simple classification task with a Decision Tree and a Random Forest.
In this example, we use the Iris dataset for a classification task. We train a Decision Tree
classifier and a Random Forest classifier on the training set and evaluate their performance on
the test set using accuracy, confusion matrix, and classification report.
Make sure you have scikit-learn installed (pip install scikit-learn) before running this program.
Adjust the data and model parameters as needed for your specific use case.
Source code :
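import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)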
# Decision Tree Classifier
decision_tree_model = DecisionTreeClassifier()
decision_tree_model.fit(X_train, y_train)
y_pred_dt = decision_tree_model.predict(X_test)
# Evaluate the Decision Tree model
accuracy_dt = accuracy_score(y_test, y_pred_dt)
conf_matrix_dt = confusion_matrix(y_test, y_pred_dt)
class_report_dt = classification_report(y_test, y_pred_dt)
print("Decision Tree Classifier:")
print(f'Accuracy: {accuracy_dt:.2f}')
print(f'Confusion Matrix:\n{conf_matrix_dt}')
print(f'Classification Report:\n{class_report_dt}')
# Random Forest Classifier
random_forest_model = RandomForestClassifier(n_estimators=100, random_state=42)
random_forest_model.fit(X_train, y_train)
y_pred_rf = random_forest_model.predict(X_test)
# Evaluate the Random Forest model
accuracy_rf = accuracy_score(y_test, y_pred_rf)
conf_matrix_rf = confusion_matrix(y_test, y_pred_rf)
class_report_rf = classification_report(y_test, y_pred_rf)
print("\nRandom Forest Classifier:")
print(f'Accuracy: {accuracy_rf:.2f}')
print(f'Confusion Matrix:\n{conf_matrix_rf}')
print(f'Classification Report:\n{class_report_rf}')
Output:
Decision Tree Classifier:
Accuracy: 1.00
Confusion Matrix:
[[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]
Classification Report:
              precision    recall  f1-score   support
           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11
    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
Random Forest Classifier:
Accuracy: 1.00
Confusion Matrix:
[[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]
Classification Report:
              precision    recall  f1-score   support
           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11
    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
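A decision tree classifies a sample by following a chain of threshold tests on its features.
To make those tests visible, the sketch below prints the learned rules with scikit-learn's
export_text. The max_depth=2 limit is an illustrative assumption that keeps the printout
short; it is not used in the program above.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# max_depth=2 is an illustrative choice that keeps the printed tree short
tree = DecisionTreeClassifier(max_depth=2, random_state=42)
tree.fit(iris.data, iris.target)

# Each indented line is one threshold test; leaves show the predicted class
rules = export_text(tree, feature_names=iris.feature_names)
print(rules)
```

A Random Forest trains many such trees on random subsets of the data and features and combines
their predictions, which is why it is usually more stable than a single tree.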
8. Write a python program on Classification Techniques- Neural Network.
Aim : To Write a python program on Classification Techniques- Neural Network.
Description :
This example builds a simple neural network for classification using the popular deep learning
library Keras. The network has one hidden layer and is compiled using the Adam optimizer and
categorical crossentropy loss. We then train the model on the Iris dataset and evaluate its
performance on the test set.
Feel free to adjust the architecture of the neural network, the number of epochs, or other
hyperparameters based on your specific needs.
Make sure you have Keras and TensorFlow installed:
pip install keras tensorflow
Source code :
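import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical
# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target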
# Convert labels to one-hot encoding
y_one_hot = to_categorical(y, num_classes=3)
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y_one_hot, test_size=0.2, random_state=42)
# Standardize the data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Build the neural network model
model = Sequential()
model.add(Dense(8, input_dim=4, activation='relu'))
model.add(Dense(3, activation='softmax'))
# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# Train the model
model.fit(X_train, y_train, epochs=50, batch_size=5, verbose=1)
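# Predict class probabilities on the test set
# (predict_classes is no longer available in recent Keras/TensorFlow releases)
y_pred_nn = model.predict(X_test)
# Convert predictions back to original labels
y_pred_original = np.argmax(y_pred_nn, axis=1)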
accuracy_nn = accuracy_score(np.argmax(y_test, axis=1), y_pred_original)
conf_matrix_nn = confusion_matrix(np.argmax(y_test, axis=1), y_pred_original)
class_report_nn = classification_report(np.argmax(y_test, axis=1), y_pred_original)
print("Neural Network Classifier:")
print(f'Accuracy: {accuracy_nn:.2f}')
print(f'Confusion Matrix:\n{conf_matrix_nn}')
print(f'Classification Report:\n{class_report_nn}')
Output :
Epoch 1/50
24/24 [==============================] - 0s 600us/step - loss: 1.2281 - accuracy: 0.4500
Epoch 2/50
...
Epoch 49/50
Epoch 50/50
Neural Network Classifier:
Accuracy: 0.97
Confusion Matrix:
[[10  0  0]
 [ 0  9  1]
 [ 0  0 10]]
Classification Report:
              precision    recall  f1-score   support
           0       1.00      1.00      1.00        10
           1       1.00      0.90      0.95        10
           2       0.91      1.00      0.95        10
    accuracy                           0.97        30
   macro avg       0.97      0.97      0.97        30
weighted avg       0.97      0.97      0.97        30
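The program relies on two conversions: labels are one-hot encoded before training, and the
softmax output layer produces one probability per class, which argmax turns back into a label.
The NumPy sketch below illustrates both steps without Keras. The labels and the raw scores
(logits) are made-up values for illustration, not output of the trained model.

```python
import numpy as np

labels = np.array([0, 2, 1, 2])
num_classes = 3

# One-hot encoding: row i has a 1 in column labels[i] and 0 elsewhere
one_hot = np.eye(num_classes)[labels]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 along the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


# Hypothetical raw scores for the four samples above
logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 0.3, 1.5],
                   [0.1, 1.2, 0.4],
                   [0.0, 0.1, 2.2]])
probabilities = softmax(logits)

# argmax recovers the integer class from one-hot labels or predicted probabilities
decoded_labels = np.argmax(one_hot, axis=1)
predicted_labels = np.argmax(probabilities, axis=1)
print(one_hot)
print(np.round(probabilities, 3))
print(decoded_labels, predicted_labels)
```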
