
UEC2604

MACHINE LEARNING

Mini Project Report

VI Semester
UG-Electronics and Communication Engineering
(2024 – 2025)

Name JAYAPRAKASH K
Register Number 3122223002049
Sri Sivasubramaniya Nadar College of Engineering
(An Autonomous Institution, Affiliated to Anna University, Chennai)
Rajiv Gandhi Salai (OMR), Kalavakkam – 603 110

BONAFIDE CERTIFICATE

Date: .....................

Certified that this is the bonafide record of work done by,


Name: .............................................................................................
Register No: ............................................ of Sixth Semester B.E.
Electronics and Communication Engineering during the academic year
2024 - 2025 for the subject UEC2604 – Machine Learning.

Submitted for the Continuous Assessment Test 2 held on .......................


Faculty In-charge

Exp. No TITLE

1 Data Visualization Techniques

2 Linear and Logistic Regression

3 K-Nearest Neighbour

4 Support Vector Machine

5 Principal Component Analysis

6 K-Means Clustering
Exp. No: 1 Data Visualization Techniques Date:

AIM:
To perform data visualization techniques on retail sales data to analyze trends,
distribution, top-selling products, sales by country, and correlations using Python libraries.

Algorithm:
1. Start
2. Read the Excel file containing the retail sales data.
3. Convert the InvoiceDate column to datetime format.
4. To analyse the sales trend, resample the data monthly and visualize the total quantity
sold over time.
5. Plot a histogram to analyze the distribution of unit prices.
6. Aggregate sales data by product descriptions and visualize the top-selling products
using a horizontal bar chart.
7. Group data by country and visualize total sales for each country using a bar chart.
8. Compute and visualize the correlation between Quantity and UnitPrice using a
heatmap.
9. Identify the top 5 countries by total sales and visualize their contribution using a pie
chart.
10. End.
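
Before the full program, a brief sketch of steps 4, 6 and 9 above, assuming the workbook contains the usual Online Retail columns (InvoiceDate, Description, Quantity, UnitPrice, Country); the file name is taken from the program and may need adjusting:

import pandas as pd
import matplotlib.pyplot as plt

retail = pd.read_excel("Online Retail_dta_visualisation.xlsx")  # assumed path/filename
retail['InvoiceDate'] = pd.to_datetime(retail['InvoiceDate'])

# Step 4: monthly sales trend (total quantity sold per month)
retail.resample('M', on='InvoiceDate')['Quantity'].sum().plot(title='Monthly Quantity Sold')
plt.show()

# Step 6: top 10 selling products by quantity
retail.groupby('Description')['Quantity'].sum().nlargest(10).plot(kind='barh', title='Top-Selling Products')
plt.show()

# Step 9: top 5 countries by total sales value
retail['Sales'] = retail['Quantity'] * retail['UnitPrice']
retail.groupby('Country')['Sales'].sum().nlargest(5).plot(kind='pie', autopct='%1.1f%%', title='Top 5 Countries by Sales')
plt.show()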

Program:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the data


file_path = "D:\College\SEMESTER 6\Machine Learning\Lab\lab\Online
Retail_dta_visualisation.xlsx" # Replace with your file path if needed
df = pd.read_csv(file_path)

# 1. Distribution of Above-Ground Living Area (sqft_above)


plt.figure(figsize=(10, 6))
sns.histplot(df['sqft_above'], bins=50, kde=True, color='skyblue')
plt.title('Distribution of Above-Ground Living Area')
plt.xlabel('Square Footage (sqft_above)')
plt.ylabel('Frequency')
plt.show()

# 2. Relationship between Square Footage and Price


plt.figure(figsize=(10, 6))
sns.scatterplot(x='sqft_living', y='price', data=df, alpha=0.5)
plt.title('Price vs Square Footage')
plt.xlabel('Square Footage')
plt.ylabel('Price')
plt.show()

# 3. Boxplot: Price by Number of Bedrooms


plt.figure(figsize=(10, 6))
sns.boxplot(x='bedrooms', y='price', data=df)
plt.title('Price Distribution by Number of Bedrooms')
plt.xlabel('Bedrooms')
plt.ylabel('Price')
plt.show()

# 4. Heatmap: Feature Correlations


plt.figure(figsize=(12, 8))
numeric_df = df.select_dtypes(include=['float64', 'int64'])  # Exclude non-numeric columns
correlation_matrix = numeric_df.corr()
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', linewidths=0.5)
plt.title('Feature Correlation Heatmap')
plt.show()

# 5. Line Plot: Average Price Over Time


df['date'] = pd.to_datetime(df['date'].str[:8], format='%Y%m%d')
price_trend = df.groupby('date')['price'].mean()

plt.figure(figsize=(12, 6))
price_trend.plot(color='green')
plt.title('Average House Price Over Time')
plt.xlabel('Date')
plt.ylabel('Average Price')
plt.show()
Output:
Figure 1.1: Plot

Figure 1.2: Histogram
Figure 1.3: Horizontal bar chart

Figure 1.4: Bar chart


Figure 1.5: Heat map

Figure 1.6: Pie chart


Results:
Thus, the Python program to perform data visualization techniques on retail sales data to
analyze trends, distribution, top-selling products, sales by country, and correlations using
Python libraries has been executed and the outputs have been verified.

Exp. No: 2 Linear and Logistic Regression Date:

AIM:
To implement and evaluate Linear Regression for salary prediction and Logistic
Regression for diabetes classification using Python libraries.

Algorithm:
Start
Linear Regression (Salary Prediction)
1. Load the dataset (Salary_Data_linear_regression.csv).
2. Extract Years of Experience as the independent variable (X) and Salary as the
dependent variable (y).
3. Split the data into training and testing sets (80% train, 20% test).
4. Train a Linear Regression model using the training data.
5. Predict salary values on the test set.
6. Evaluate the model using Mean Squared Error (MSE).
7. Plot the actual vs. predicted salaries.
Logistic Regression (Diabetes Classification)
1. Load the dataset (diabetes_logistic_regression.csv).
2. Extract the input features (X) and the target outcome (y).
3. Split the data into training and testing sets (80% train, 20% test).
4. Train a Logistic Regression model using the training data.
5. Predict diabetes outcomes on the test set.
6. Evaluate the model using Accuracy Score and Classification Report (precision,
recall, F1-score).

End.

Program:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Load the data
data = pd.read_csv(r'D:\College\SEMESTER 6\Machine Learning\Lab\lab\Salary_Data_linear_regression.csv')
# Prepare the features (X) and target variable (y)
X = data['YearsExperience'].values.reshape(-1, 1)
y = data['Salary'].values
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and train the linear regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
# Print model details
print("Linear Regression Analysis Results:")
print(f"Slope (Coefficient): {model.coef_[0]:.2f}")
print(f"Intercept: {model.intercept_:.2f}")
print(f"Mean Squared Error: {mse:.2f}")
print(f"R-squared Score: {r2:.2f}")

# Visualize the results


plt.figure(figsize=(10, 6))
plt.scatter(X_train, y_train, color='blue', label='Training Data')
plt.scatter(X_test, y_test, color='red', label='Testing Data')
plt.plot(X, model.predict(X), color='green', label='Regression Line')
plt.title('Salary vs Years of Experience')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.legend()
plt.show()
# Prediction example
years_of_experience = 5
predicted_salary = model.predict([[years_of_experience]])
print(f"\nPredicted Salary for {years_of_experience} years of experience: $
{predicted_salary[0]:.2f}")
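
# Sanity check (sketch, reusing the model fitted above): the prediction should equal slope * years + intercept
manual_salary = model.coef_[0] * years_of_experience + model.intercept_
manual_mse = np.mean((y_test - y_pred) ** 2)  # same quantity as mean_squared_error above
print(f"Manual prediction: ${manual_salary:.2f}, manual MSE: {manual_mse:.2f}")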

Output:
Figure 2.1: Linear Regression

Program:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, roc_curve, auc

# Load dataset
df = pd.read_csv("D:/College/SEMESTER 6/Machine
Learning/Lab/lab/diabetes_logistic_regression.csv")

# Display basic info


print(df.head())
print(df.info())
# Assuming the last column is the target variable (adjust if needed)
y = df.iloc[:, -1]
X = df.iloc[:, :-1]

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

# Standardize features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train logistic regression model


model = LogisticRegression()
model.fit(X_train, y_train)

# Predict on test data


y_pred = model.predict(X_test)

# Evaluate model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
print("Accuracy:", accuracy)
print("Confusion Matrix:\n", conf_matrix)
print("Classification Report:\n", classification_report(y_test, y_pred))
# Plot confusion matrix
plt.figure(figsize=(6,4))
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues',
xticklabels=['Negative', 'Positive'], yticklabels=['Negative', 'Positive'])
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()

# ROC Curve
probs = model.predict_proba(X_test)[:, 1]
fpr, tpr, _ = roc_curve(y_test, probs)
roc_auc = auc(fpr, tpr)

plt.figure(figsize=(6,4))
plt.plot(fpr, tpr, color='blue', label=f'ROC Curve (AUC = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], linestyle='--', color='gray')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curve')
plt.legend()
plt.show()
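
# Sketch: the probabilities in probs can also be thresholded directly instead of using the default 0.5 cut-off
threshold = 0.3  # arbitrary example value
y_pred_custom = (probs >= threshold).astype(int)
print(f"Accuracy at threshold {threshold}:", accuracy_score(y_test, y_pred_custom))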
Output:
Figure 2.2: Logistic regression

Figure 2.3: ROC curve


Figure 2.4: Terminal output
Results:
Thus, the Python program to implement and evaluate Linear Regression for salary
prediction and Logistic Regression for diabetes classification using Python libraries has been
executed and the outputs have been verified.
Exp. No: 3 K-Nearest Neighbour Date:

AIM:
To implement and compare Linear Regression & K-Nearest Neighbours (KNN)
Regression for salary prediction and Logistic Regression & K-Nearest Neighbours (KNN)
Classification for diabetes prediction

Algorithm:
Start
Regression (Salary Prediction using Linear Regression & KNN Regression)
1. Load the Salary Dataset (Salary_Data_linear_regression.csv).
2. Select Years of Experience as the independent variable (X) and Salary as the
dependent variable (y).
3. Split the dataset into training (80%) and testing (20%) sets.
4. Train a Linear Regression model on the training data.
5. Predict salaries using the trained model and calculate Mean Squared Error (MSE).
6. Train a KNN Regressor (k=5) on the same data.
7. Predict salaries using KNN Regressor and calculate MSE.
8. Plot Actual vs. Predicted salaries for both models.
Classification (Diabetes Prediction using Logistic Regression & KNN Classification)
1. Load the Diabetes Dataset (diabetes_logistic_regression.csv).
2. Extract all input features (X) and target labels (y).
3. Split the dataset into training (80%) and testing (20%) sets.
4. Train a Logistic Regression model on the training data.
5. Predict diabetes outcomes on the test set and calculate accuracy.
6. Train a KNN Classifier (k=5) on the same data.
7. Predict outcomes using KNN and calculate accuracy.
8. Generate Classification Reports for both models.
End.
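
A minimal sketch of the comparison described above, assuming the same salary and diabetes files and column layout used in Experiment 2:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.neighbors import KNeighborsRegressor, KNeighborsClassifier
from sklearn.metrics import mean_squared_error, accuracy_score

# Regression half: salary prediction
salary = pd.read_csv("Salary_Data_linear_regression.csv")  # assumed local path
X = salary[['YearsExperience']]
y = salary['Salary']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
lin_reg = LinearRegression().fit(X_train, y_train)
knn_reg = KNeighborsRegressor(n_neighbors=5).fit(X_train, y_train)
print("Linear Regression MSE:", mean_squared_error(y_test, lin_reg.predict(X_test)))
print("KNN Regression MSE:", mean_squared_error(y_test, knn_reg.predict(X_test)))

# Classification half: diabetes prediction (last column assumed to be the target)
diabetes = pd.read_csv("diabetes_logistic_regression.csv")  # assumed local path
Xc, yc = diabetes.iloc[:, :-1], diabetes.iloc[:, -1]
Xc_train, Xc_test, yc_train, yc_test = train_test_split(Xc, yc, test_size=0.2, random_state=42)
log_reg = LogisticRegression(max_iter=1000).fit(Xc_train, yc_train)
knn_clf = KNeighborsClassifier(n_neighbors=5).fit(Xc_train, yc_train)
print("Logistic Regression accuracy:", accuracy_score(yc_test, log_reg.predict(Xc_test)))
print("KNN Classification accuracy:", accuracy_score(yc_test, knn_clf.predict(Xc_test)))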

Program:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Load dataset
file_path = "D:\College\SEMESTER 6\Machine Learning\Lab\lab\kc_house_data.csv"
if not os.path.exists(file_path):
raise FileNotFoundError(f"Dataset not found: {file_path}")

df = pd.read_csv(file_path)

# Handle missing values


df.dropna(inplace=True)

# Display basic info


print(df.head())
print(df.info())

# Drop non-useful columns if necessary (e.g., ID, address, date, etc.)


if 'id' in df.columns:
    df = df.drop(columns=['id'])
if 'date' in df.columns:
    df = df.drop(columns=['date'])
# Selecting features (X) and target variable (y)
# Assuming 'price' is the target variable (modify if needed)
X = df.drop(columns=['price'])
y = df['price'] > df['price'].median() # Binary classification (High/Low price)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train KNN model (with k=5)


knn_model = KNeighborsClassifier(n_neighbors=5)
knn_model.fit(X_train, y_train)

# Predict on test data


y_pred = knn_model.predict(X_test)

# Evaluate model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
print("Accuracy:", accuracy)
print("Confusion Matrix:\n", conf_matrix)
print("Classification Report:\n", classification_report(y_test, y_pred))

# --- Confusion Matrix Plot ---


plt.figure(figsize=(6, 4))
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues', xticklabels=['Low Price', 'High Price'], yticklabels=['Low Price', 'High Price'])
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()

# --- Scatter Plot (Visualizing sqft_living vs. price category) ---


if 'sqft_living' in df.columns:
    plt.figure(figsize=(8, 5))
    sns.scatterplot(x=df['sqft_living'], y=df['price'], hue=y, palette={True: 'red', False: 'blue'}, alpha=0.5)
    plt.xlabel('Living Area (sqft)')
    plt.ylabel('Price')
    plt.title('House Prices: High vs Low Price Categories')
    plt.legend(title='Price Category', labels=['Low Price', 'High Price'])
    plt.show()

Output:

Figure 3.1: KNN vs Linear Regression


Figure 3.2: Terminal output of K-Nearest Neighbour

Results:
Thus, the Python program to implement and compare Linear Regression and K-Nearest
Neighbours (KNN) Regression for salary prediction, and Logistic Regression and K-Nearest
Neighbours (KNN) Classification for diabetes prediction, has been executed and the outputs
have been verified.
Exp. No: 4 Support Vector Machine Date:

AIM:
To implement Support Vector Machine (SVM) classification on the Social Network
Ads dataset and evaluate its performance

Algorithm:
1. Start
2. Read the dataset Social_Network_Ads_SVM.csv using pandas.
3. Select the feature columns (Age & Estimated Salary) and the target column
(Purchased).
4. Split the dataset into training (80%) and testing (20%) sets.
5. Standardize the feature values using StandardScaler.
6. Use SVC(kernel='linear') to train a linear Support Vector Machine on the training
set.
7. Predict the target values on the test set.
8. Calculate accuracy using accuracy_score.
9. Generate a classification report showing precision, recall, and F1-score.
10. Compute and visualize a confusion matrix using seaborn.heatmap().
11. End.
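
A minimal sketch of steps 3–8 with the linear kernel named in step 6, assuming the dataset contains Age, EstimatedSalary and Purchased columns:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

ads = pd.read_csv("Social_Network_Ads_SVM.csv")  # assumed local path
X = ads[['Age', 'EstimatedSalary']]
y = ads['Purchased']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

linear_svm = SVC(kernel='linear').fit(X_train, y_train)
print("Linear-kernel SVM accuracy:", accuracy_score(y_test, linear_svm.predict(X_test)))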

Program:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Load dataset
df = pd.read_csv("D:\College\SEMESTER 6\Machine Learning\Lab\lab\
Social_Network_Ads_SVM.csv")

# Display basic info


print(df.head())
print(df.info())

# Drop User ID column as it's not useful


if 'User ID' in df.columns:
    df = df.drop(columns=['User ID'])

# Encode categorical variable (Gender)


if 'Gender' in df.columns:
    label_encoder = LabelEncoder()
    df['Gender'] = label_encoder.fit_transform(df['Gender'])  # Female -> 0, Male -> 1

# Assuming the last column is the target variable (adjust if needed)


y = df.iloc[:, -1]
X = df.iloc[:, :-1]

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train SVM model


svm_model = SVC(kernel='rbf', probability=True)
svm_model.fit(X_train, y_train)

# Predict on test data


y_pred = svm_model.predict(X_test)
# Evaluate model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
print("Accuracy:", accuracy)
print("Confusion Matrix:\n", conf_matrix)
print("Classification Report:\n", classification_report(y_test, y_pred))

# --- Confusion Matrix Plot ---


plt.figure(figsize=(6, 4))
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues', xticklabels=['Negative',
'Positive'], yticklabels=['Negative', 'Positive'])
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show() # Show the first plot before proceeding

# --- Scatter Plot (First Two Features) ---


plt.figure(figsize=(6, 4))
plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap='coolwarm', edgecolors='k',
alpha=0.7)
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Scatter Plot of Training Data')
plt.show()

# --- Decision Boundary Visualization (Only for 2 Features) ---


def plot_decision_boundary(X, y, model):
    X1, X2 = np.meshgrid(np.arange(start=X[:, 0].min() - 1, stop=X[:, 0].max() + 1, step=0.01),
                         np.arange(start=X[:, 1].min() - 1, stop=X[:, 1].max() + 1, step=0.01))
    plt.figure(figsize=(6, 4))
    plt.contourf(X1, X2, model.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape), alpha=0.3, cmap='coolwarm')
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap='coolwarm', edgecolors='k')
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.title('SVM Decision Boundary')
    plt.show()

# Visualize decision boundary (only works if dataset has 2 features)


if X_train.shape[1] == 2:
    plot_decision_boundary(X_train, y_train, svm_model)

Output:

Figure 4.1: Confusion matrix of Support Vector Machine


Figure 4.2: Terminal output of Support Vector Machine

Results:
Thus, the Python program to implement Support Vector Machine (SVM) classification on
the Social Network Ads dataset and evaluate its performance has been executed and the
outputs have been verified.
Exp. No: 5 Principal Component Analysis Date:

AIM:
To perform Principal Component Analysis (PCA) on the Social Network Ads dataset
to reduce dimensionality and visualize the data in a lower-dimensional space

Algorithm:
1. Start
2. Read the dataset Social_Network_Ads_SVM.csv using pandas.
3. Convert categorical columns (if any) into numerical values using LabelEncoder().
4. Extract features (X) and target variable (y).
5. Standardize the dataset using StandardScaler() to normalize feature values.
6. Use PCA(n_components=2) to reduce the dataset to two principal components for
visualization.
7. Transform the standardized feature matrix into the new PCA space.
8. Create a scatter plot of the PCA-transformed data, coloured by the target variable.
9. Display the explained variance ratio to understand how much variance each
principal component captures.
10. End.
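
Step 9 can also be read cumulatively: the running sum of the explained variance ratios shows how many components are needed to retain a given share of the variance. A short sketch, assuming X_scaled is a standardized feature matrix as produced in step 5:

import numpy as np
from sklearn.decomposition import PCA

pca_full = PCA().fit(X_scaled)  # X_scaled: standardized features (assumed from step 5)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)
print("Explained variance ratio:", pca_full.explained_variance_ratio_)
print("Components for 95% of variance:", int(np.argmax(cumulative >= 0.95)) + 1)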

Program:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.decomposition import PCA

# Check if file exists


file_path = "D:\College\SEMESTER 6\Machine Learning\Lab\lab\
Social_Network_Ads_SVM.csv"
if not os.path.exists(file_path):
raise FileNotFoundError(f"Dataset not found: {file_path}")

# Load dataset
df = pd.read_csv(file_path)

# Display basic info


print(df.head())
print(df.info())
# Drop User ID column as it's not useful
if 'User ID' in df.columns:
    df = df.drop(columns=['User ID'])

# Encode categorical variable (Gender)


if 'Gender' in df.columns:
    label_encoder = LabelEncoder()
    df['Gender'] = label_encoder.fit_transform(df['Gender'])  # Female -> 0, Male -> 1

# Ensure all features are numerical


df = df.apply(pd.to_numeric, errors='coerce')
df = df.dropna()

# Assuming the last column is the target variable (adjust if needed)


y = df.iloc[:, -1]
X = df.iloc[:, :-1]

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Apply PCA
pca = PCA(n_components=2) # Reduce to 2 principal components
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)

# Explained variance ratio


print("Explained variance ratio:", pca.explained_variance_ratio_)

# Train SVM model


svm_model = SVC(kernel='rbf', probability=True)
svm_model.fit(X_train_pca, y_train)

# Predict on test data


y_pred = svm_model.predict(X_test_pca)

# Evaluate model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
print("Accuracy:", accuracy)
print("Confusion Matrix:\n", conf_matrix)
print("Classification Report:\n", classification_report(y_test, y_pred))

# Plot confusion matrix


plt.figure(figsize=(6,4))
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues', xticklabels=['Negative',
'Positive'], yticklabels=['Negative', 'Positive'])
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()

# Visualize PCA components


def plot_pca(X_pca, y):
    plt.figure(figsize=(8, 6))
    plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap='coolwarm', edgecolors='k')
    plt.xlabel('Principal Component 1')
    plt.ylabel('Principal Component 2')
    plt.title('PCA - Data Projection')
    plt.show()

plot_pca(X_train_pca, y_train)

# Visualize decision boundary


def plot_decision_boundary(X, y, model):
    X1, X2 = np.meshgrid(np.arange(start=X[:, 0].min() - 1, stop=X[:, 0].max() + 1, step=0.01),
                         np.arange(start=X[:, 1].min() - 1, stop=X[:, 1].max() + 1, step=0.01))
    plt.contourf(X1, X2, model.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape), alpha=0.3, cmap='coolwarm')
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap='coolwarm', edgecolors='k')
    plt.xlabel('Principal Component 1')
    plt.ylabel('Principal Component 2')
    plt.title('SVM Decision Boundary with PCA')
    plt.show()

plot_decision_boundary(X_train_pca, y_train, svm_model)


Output:
Figure 5.1: Principal Component Analysis visualization of the dataset

Figure 5.2: Terminal output of Principal Component Analysis

Results:
Thus, the Python program to perform Principal Component Analysis (PCA) on the Social
Network Ads dataset to reduce dimensionality and visualize the data in a lower-dimensional
space has been executed and the outputs have been verified.

Exp. No: 6 K-Means Clustering Date:

AIM:
To perform K-Means Clustering on the King County House Sales dataset to group
similar houses based on numerical features and visualize the clusters.
Algorithm:
1. Start
2. Read the dataset kc_house_data.csv using pandas.
3. Select only numerical columns for clustering.
4. Remove missing values (NaN) from the dataset.
5. Standardize the features using StandardScaler() to normalize the data.
6. Choose K = 3 clusters.
7. Fit the K-Means algorithm on the standardized dataset.
8. Predict the cluster for each data point and store it in a new column.
9. Plot a scatter plot of the first two principal features with color-coded clusters.
10. Display the cluster centers of the fitted model.
11. End.

Program:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Load dataset
file_path = "D:\College\SEMESTER 6\Machine Learning\Lab\lab\kc_house_data.csv"
df = pd.read_csv(file_path)

# Display basic info


print(df.head())
print(df.info())

# Selecting relevant numerical features for clustering


selected_features = ['sqft_living', 'price']
data = df[selected_features]

# Standardizing the data


scaler = StandardScaler()
data_scaled = scaler.fit_transform(data)

# Determine optimal number of clusters using the Elbow Method


wcss = [] # Within-cluster sum of squares
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++', random_state=42)
    kmeans.fit(data_scaled)
    wcss.append(kmeans.inertia_)

# Plot the Elbow Method graph


plt.figure(figsize=(8, 5))
plt.plot(range(1, 11), wcss, marker='o', linestyle='--', color='b')
plt.xlabel('Number of Clusters')
plt.ylabel('WCSS')
plt.title('Elbow Method for Optimal Clusters')
plt.show()

# Applying K-Means Clustering with optimal clusters (e.g., k=3)


kmeans = KMeans(n_clusters=3, init='k-means++', random_state=42)
df['Cluster'] = kmeans.fit_predict(data_scaled)

# Visualizing Clusters
plt.figure(figsize=(8, 6))
sns.scatterplot(x=data_scaled[:, 0], y=data_scaled[:, 1], hue=df['Cluster'], palette='viridis')
plt.xlabel('Scaled sqft_living')
plt.ylabel('Scaled price')
plt.title('K-Means Clustering')
plt.legend(title='Cluster')
plt.show()

# Display cluster centers


print("Cluster Centers (scaled values):", kmeans.cluster_centers_)

Output:
Figure 6.1: Scatter plot of K-Means clustering

Figure 6.2: Terminal output of K-Means clustering

Results:
Thus, the Python program to perform K-Means Clustering on the King County House
Sales dataset to group similar houses based on numerical features and visualize the clusters
has been executed and the outputs have been verified.
