ML Lab Manual
HASSAN-573201
Prepared by:
Dr. VASANTHA KUMARA M
Associate Professor
Programme Educational Objectives (PEOs)
PEO1: Graduates of the program will have the ability to understand, analyse, and design
Artificial Intelligence and Machine Learning solutions to real-world challenges.
PEO2: Graduates of the program will have the ability to gain employment, excel in their
professional careers, and pursue research to achieve higher goals.
PEO3: Graduates of the program will excel as socially committed engineers with high ethical and
moral values.
PSO1: An ability to apply concepts of Artificial Intelligence and Machine Learning to design, develop,
and implement solutions to technical problems.
PSO2: An ability to use Artificial Intelligence and Machine Learning knowledge to build a successful
career as an employee and an engineering professional.
Course Outcomes
CO1 Illustrate the principles of multivariate data and apply dimensionality reduction techniques.
CO2 Demonstrate similarity-based learning methods and perform regression analysis.
CO3 Develop decision trees for classification and regression problems, and Bayesian models for
probabilistic learning.
CO4 Implement clustering algorithms and visualize the clustering results.
Syllabus
Subject: Machine Learning Subject Code: BCSL607
Programming Experiments
1. Develop a program to create histograms for all numerical features and analyze the distribution of
each feature. Generate box plots for all numerical features and identify any outliers. Use the
California Housing dataset.
2. Develop a program to compute the correlation matrix to understand the relationships between
pairs of features. Visualize the correlation matrix using a heatmap to identify which variables have
strong positive/negative correlations. Create a pair plot to visualize pairwise relationships between
features. Use the California Housing dataset.
3. Develop a program to implement Principal Component Analysis (PCA) for reducing the
dimensionality of the Iris dataset from 4 features to 2.
4. For a given set of training data examples stored in a .CSV file, implement and demonstrate the
Find-S algorithm to output a description of the set of all hypotheses consistent with the training
examples.
5. Develop a program to implement k-Nearest Neighbour algorithm to classify the randomly
generated 100 values of x in the range of [0,1]. Perform the following based on dataset generated.
a. Label the first 50 points {x1, …, x50} as follows: if (xi ≤ 0.5), then xi ∊ Class1, else xi ∊
Class2. b. Classify the remaining points, x51, …, x100, using KNN. Perform this for
k = 1, 2, 3, 4, 5, 20, 30.
6. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points.
Select an appropriate data set for your experiment and draw graphs.
7. Develop a program to demonstrate the working of Linear Regression and Polynomial Regression.
Use Boston Housing Dataset for Linear Regression and Auto MPG Dataset (for vehicle fuel
efficiency prediction) for Polynomial Regression
8. Develop a program to demonstrate the working of the decision tree algorithm. Use Breast Cancer
Data set for building the decision tree and apply this knowledge to classify a new sample.
9. Develop a program to implement the Naive Bayesian classifier considering Olivetti Face Data set
for training. Compute the accuracy of the classifier, considering a few test data sets.
10. Develop a program to implement k-means clustering using Wisconsin Breast Cancer data set
and visualize the clustering result.
Program 1
AIM:
To visualize univariate data distribution using histograms and identify outliers using box plots on the
California Housing dataset.
Objectives:
1. Generate histograms for all numerical features.
2. Analyze the distribution of each feature.
3. Create box plots for all numerical features.
4. Identify outliers using the Interquartile Range (IQR) method.
Algorithm:
1. Load the California Housing dataset.
2. Extract numerical features from the dataset.
3. Plot histograms for each numerical feature.
4. Plot box plots to detect outliers.
5. Use IQR (Interquartile Range) method to find outliers:
o Compute Q1 (25th percentile) and Q3 (75th percentile).
o Compute IQR = Q3 - Q1.
o Identify outliers as values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR.
6. Display the summary statistics of the dataset.
1. Develop a program to create histograms for all numerical features and analyze the distribution
of each feature. Generate box plots for all numerical features and identify any outliers. Use
California Housing dataset.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
data = fetch_california_housing(as_frame=True)
housing_df = data.frame
# Select the numerical feature columns for plotting
numerical_features = housing_df.select_dtypes(include=[np.number]).columns

# Plot histograms
plt.figure(figsize=(15, 10))
for i, feature in enumerate(numerical_features):
    plt.subplot(3, 3, i + 1)
    sns.histplot(housing_df[feature], kde=True, bins=30, color='blue')
    plt.title(f'Distribution of {feature}')
plt.tight_layout()
plt.show()
print("Outliers Detection:")
outliers_summary = {}
for feature in numerical_features:
    Q1 = housing_df[feature].quantile(0.25)
    Q3 = housing_df[feature].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    outliers = housing_df[(housing_df[feature] < lower_bound) |
                          (housing_df[feature] > upper_bound)]
    outliers_summary[feature] = len(outliers)
    print(f"{feature}: {len(outliers)} outliers")
Explanation
Histograms show the distribution of numerical features.
Box plots help detect outliers using the IQR method.
Used for data preprocessing and feature engineering.
Helps in understanding skewness and spread of data.
Essential for identifying anomalies before model training.
Program 2
AIM:
To analyze feature relationships using a correlation matrix and visualize feature relationships using a
heatmap and pair plot on the California Housing dataset.
Objectives:
1. Compute the correlation matrix for all numerical features.
2. Visualize the correlation matrix using a heatmap.
3. Create a pair plot to examine pairwise feature relationships.
Algorithm:
1. Load the California Housing dataset.
2. Compute the correlation matrix of the features.
3. Plot the correlation matrix as a heatmap.
4. Generate a pair plot for pairwise relationships.
5. Interpret the strong positive/negative correlations.
2. Develop a program to Compute the correlation matrix to understand the relationships between pairs of
features. Visualize the correlation matrix using a heatmap to know which variables have strong
positive/negative correlations. Create a pair plot to visualize pairwise relationships between features. Use
California Housing dataset.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing

# Step 1: Load the California Housing dataset
california_data = fetch_california_housing(as_frame=True)
data = california_data.frame
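The listing stops after loading the data; a minimal sketch of the remaining steps (correlation matrix, heatmap, and pair plot), reusing the DataFrame data from above. Sampling 500 rows for the pair plot is an assumption made here for speed, not part of the original task:

# Step 2: Compute the correlation matrix
correlation_matrix = data.corr()

# Step 3: Visualize the correlation matrix as a heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f')
plt.title('Correlation Matrix of California Housing Features')
plt.show()

# Step 4: Pair plot of pairwise relationships
# (a random sample keeps plotting responsive on 20,640 rows)
sns.pairplot(data.sample(500, random_state=42), diag_kind='kde')
plt.show()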
Program 3
Explanation:
PCA reduces dimensions while preserving maximum variance.
Helps avoid overfitting and speeds up computation.
Projects data onto a new set of axes (principal components).
Used for visualizing high-dimensional data in 2D/3D.
Commonly used in image compression and facial recognition.
3. Develop a program to implement Principal Component Analysis (PCA) for reducing the
dimensionality of the Iris dataset from 4 features to 2.
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
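Only the imports survive in the listing above; a minimal sketch of the remaining steps, assuming nothing beyond those imports:

# Load the Iris dataset (150 samples, 4 features)
iris = load_iris()
X = iris.data
y = iris.target

# Reduce the 4 features to 2 principal components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# Scatter plot of the two principal components, colored by species
plt.figure(figsize=(8, 6))
for label, name in enumerate(iris.target_names):
    plt.scatter(X_pca[y == label, 0], X_pca[y == label, 1], label=name)
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('PCA of Iris Dataset (4 features to 2)')
plt.legend()
plt.show()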
Output:
Program 4
AIM:
To implement and demonstrate the Find-S algorithm to find the most specific hypothesis consistent with the
training examples stored in a CSV file.
Objectives:
1. Read training examples from a .CSV file.
2. Apply the Find-S algorithm to obtain the most specific hypothesis consistent with the data.
Algorithm:
1. Load the training data from the CSV file.
2. Initialize the hypothesis from the first positive example.
3. For each subsequent positive example, replace every attribute that disagrees with the hypothesis by '?'.
4. Ignore negative examples.
5. Output the final hypothesis.
4. For a given set of training data examples stored in a .CSV file, implement and demonstrate the Find-
S algorithm to output a description of the set of all hypotheses consistent with the training
examples.
import pandas as pd

def find_s_algorithm(file_path):
    data = pd.read_csv(file_path)
    print("Training data:")
    print(data)
    # Find-S: generalize over positive examples only (assumes the last
    # column is the target and "Yes" marks a positive example)
    hypothesis = None
    for _, row in data.iterrows():
        if row.iloc[-1] == "Yes":
            if hypothesis is None:
                hypothesis = list(row.iloc[:-1])
            else:
                hypothesis = [h if h == v else '?'
                              for h, v in zip(hypothesis, row.iloc[:-1])]
    return hypothesis
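A usage sketch; training_data.csv is a hypothetical file name, standing in for whichever CSV holds the training examples:

# 'training_data.csv' is a placeholder; the file should contain attribute
# columns followed by a Yes/No target column
final_hypothesis = find_s_algorithm("training_data.csv")
print("Most specific hypothesis:", final_hypothesis)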
Output:
Explanation:
Find-S scans only the positive examples and generalizes attribute-by-attribute, replacing any
attribute that varies across positive examples with '?'. The result is the most specific
hypothesis consistent with the training data.
Program 5
AIM:
To implement the k-Nearest Neighbour (k-NN) algorithm to classify 100 randomly generated values of x
in the range [0, 1], labelling the first 50 points by the rule x ≤ 0.5 → Class1 (else Class2) and
classifying the remaining 50 points for k = 1, 2, 3, 4, 5, 20, 30.
Algorithm:
1. Generate 100 random values in [0, 1].
2. Label the first 50 points: Class1 if x ≤ 0.5, else Class2.
3. For each test point, compute the distance to every training point.
4. Take the k nearest neighbours and assign the majority class.
5. Repeat for k = 1, 2, 3, 4, 5, 20, 30 and plot the results.
import numpy as np
import matplotlib.pyplot as plt
from collections import Counter

def knn_classifier(train_data, train_labels, test_point, k):
    # Distance from the test point to every training point (1-D data)
    distances = [(abs(test_point - x), label)
                 for x, label in zip(train_data, train_labels)]
    distances.sort(key=lambda d: d[0])
    # Majority vote among the k nearest neighbours
    k_nearest = [label for _, label in distances[:k]]
    return Counter(k_nearest).most_common(1)[0][0]

data = np.random.rand(100)
train_data = data[:50]
# Label the first 50 points: Class1 if x <= 0.5, else Class2
labels = ["Class1" if x <= 0.5 else "Class2" for x in train_data]
train_labels = labels
test_data = data[50:]
k_values = [1, 2, 3, 4, 5, 20, 30]

results = {}
for k in k_values:
    print(f"Results for k = {k}:")
    classified_labels = [knn_classifier(train_data, train_labels, test_point, k)
                         for test_point in test_data]
    results[k] = classified_labels
    for i, label in enumerate(classified_labels):
        print(f"Point x{i + 51} (value: {test_data[i]:.4f}) is classified as {label}")
    print("Classification complete.\n")

for k in k_values:
    classified_labels = results[k]
    class1_points = [test_data[i] for i in range(len(test_data))
                     if classified_labels[i] == "Class1"]
    class2_points = [test_data[i] for i in range(len(test_data))
                     if classified_labels[i] == "Class2"]
    plt.figure(figsize=(10, 6))
    plt.scatter(train_data, [0] * len(train_data),
                c=["blue" if label == "Class1" else "red" for label in train_labels],
                label="Training Data", marker="o")
    plt.scatter(class1_points, [1] * len(class1_points), c="blue",
                label="Class1 (Test)", marker="x")
    plt.scatter(class2_points, [1] * len(class2_points), c="red",
                label="Class2 (Test)", marker="x")
    plt.title(f"k-NN Classification (k = {k})")
    plt.legend()
    plt.show()
Output:
Training dataset: First 50 points labeled based on the rule (x <= 0.5 -> Class1, x > 0.5 -> Class2)
Testing dataset: Remaining 50 points to be classified
Results for k = 1:
Point x51 (value: 0.5702) is classified as Class2
Point x52 (value: 0.4654) is classified as Class1
Point x53 (value: 0.7016) is classified as Class2
Point x54 (value: 0.5964) is classified as Class2
Point x55 (value: 0.0643) is classified as Class1
Point x56 (value: 0.2698) is classified as Class1
Point x57 (value: 0.7124) is classified as Class2
Point x58 (value: 0.3219) is classified as Class1
Point x59 (value: 0.2637) is classified as Class1
Point x60 (value: 0.5483) is classified as Class2
Point x61 (value: 0.1561) is classified as Class1
Point x62 (value: 0.1592) is classified as Class1
Point x63 (value: 0.3752) is classified as Class1
Point x64 (value: 0.1299) is classified as Class1
Point x65 (value: 0.6934) is classified as Class2
Point x66 (value: 0.5240) is classified as Class2
Point x67 (value: 0.0203) is classified as Class1
Point x68 (value: 0.3789) is classified as Class1
Point x69 (value: 0.6866) is classified as Class2
Point x70 (value: 0.1834) is classified as Class1
Point x71 (value: 0.4197) is classified as Class1
Point x72 (value: 0.3608) is classified as Class1
Point x73 (value: 0.7579) is classified as Class2
Point x74 (value: 0.1624) is classified as Class1
Point x75 (value: 0.5943) is classified as Class2
Point x76 (value: 0.4097) is classified as Class1
Point x77 (value: 0.6124) is classified as Class2
Point x78 (value: 0.2794) is classified as Class1
Point x79 (value: 0.3193) is classified as Class1
Point x80 (value: 0.0503) is classified as Class1
Point x81 (value: 0.8038) is classified as Class2
Point x82 (value: 0.0792) is classified as Class1
Point x83 (value: 0.4230) is classified as Class1
Point x84 (value: 0.7250) is classified as Class2
Point x85 (value: 0.7162) is classified as Class2
Point x86 (value: 0.0725) is classified as Class1
Point x87 (value: 0.0752) is classified as Class1
Point x88 (value: 0.4676) is classified as Class1
Point x89 (value: 0.2256) is classified as Class1
Point x90 (value: 0.4552) is classified as Class1
Point x91 (value: 0.4787) is classified as Class1
Point x92 (value: 0.7390) is classified as Class2
Point x93 (value: 0.0649) is classified as Class1
Point x94 (value: 0.3373) is classified as Class1
Point x95 (value: 0.7719) is classified as Class2
Point x96 (value: 0.0512) is classified as Class1
Point x97 (value: 0.3012) is classified as Class1
Point x98 (value: 0.5966) is classified as Class2
Point x99 (value: 0.2897) is classified as Class1
Point x100 (value: 0.2176) is classified as Class1
Results for k = 2, 3, 4, and 20 are identical to those for k = 1 on this data. For k = 5, only the
boundary point x91 (value: 0.4787) flips to Class2; every other classification is unchanged.
Classification complete.
Explanation:
k-NN is a lazy, instance-based learner: no model is trained in advance.
Each test point is assigned the majority class of its k nearest training points.
Small k follows the local structure closely; large k smooths the decision boundary.
Program 6
AIM:
To implement the non-parametric Locally Weighted Regression (LWR) algorithm to fit data points and
visualize the fit.
Objectives:
1. Generate a noisy non-linear dataset.
2. Fit the data with Locally Weighted Regression.
3. Plot the training data and the fitted curve.
Algorithm:
1. Generate training data, e.g. y = sin(x) with Gaussian noise.
2. For each query point, weight every training point with a Gaussian kernel of bandwidth tau.
3. Solve the weighted least-squares problem for the local parameters.
4. Predict at the query point and repeat over a grid of query points.
5. Plot the training data and the LWR fit.
6. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points.
Select an appropriate data set for your experiment and draw graphs.
import numpy as np
import matplotlib.pyplot as plt

def locally_weighted_regression(x_query, X, y, tau):
    # Gaussian kernel weights centered on the query point
    weights = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2 * tau ** 2))
    W = np.diag(weights)
    # Weighted least squares: theta = (X^T W X)^-1 X^T W y
    theta = np.linalg.pinv(X.T @ W @ X) @ X.T @ W @ y
    return x_query @ theta

np.random.seed(42)
X = np.linspace(0, 2 * np.pi, 100)
y = np.sin(X) + 0.1 * np.random.randn(100)
X_bias = np.c_[np.ones(X.shape), X]

x_test = np.linspace(0, 2 * np.pi, 200)
x_test_bias = np.c_[np.ones(x_test.shape), x_test]

tau = 0.5
y_pred = np.array([locally_weighted_regression(xi, X_bias, y, tau)
                   for xi in x_test_bias])
plt.figure(figsize=(10, 6))
plt.scatter(X, y, color='red', label='Training Data', alpha=0.7)
plt.plot(x_test, y_pred, color='blue', label=f'LWR Fit (tau={tau})',
linewidth=2)
plt.xlabel('X',fontsize=12)
plt.ylabel('y', fontsize=12)
plt.title('Locally Weighted Regression', fontsize=14)
plt.legend(fontsize=10)
plt.grid(alpha=0.3)
plt.show()
Output:
Explanation:
LWR is non-parametric: a separate weighted linear fit is computed for every query point.
The bandwidth tau controls locality; small tau follows the data closely, while large tau
approaches ordinary linear regression.
No global model is stored, so prediction requires the full training set.
Program 7
AIM:
To demonstrate Linear Regression on a housing dataset (the California Housing dataset is used in the
code below, since the Boston Housing dataset has been removed from scikit-learn) and Polynomial
Regression on the Auto MPG dataset.
Objectives:
1. Fit a linear regression model to predict house prices.
2. Fit a polynomial regression model to predict vehicle fuel efficiency.
3. Evaluate both models with Mean Squared Error (MSE) and the R² score.
Algorithm:
1. Load the housing dataset and split it into training and test sets.
2. Fit a LinearRegression model and predict on the test set.
3. Load the Auto MPG dataset, drop missing values, and split it.
4. Build a pipeline of PolynomialFeatures, scaling, and LinearRegression; fit and predict.
5. Report MSE and R² for both models.
Explanation:
Linear regression fits a straight line to data.
Polynomial regression fits a curved line for non-linear patterns.
Evaluated using Mean Squared Error (MSE) & R² score.
Feature scaling improves polynomial regression performance.
Used in housing price prediction and sales forecasting.
7. Develop a program to demonstrate the working of Linear Regression and Polynomial Regression. Use
Boston Housing Dataset for Linear Regression and Auto MPG Dataset (for vehicle fuel efficiency
prediction) for Polynomial Regression.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error, r2_score

def linear_regression_california():
    housing = fetch_california_housing(as_frame=True)
    X = housing.data[["AveRooms"]]
    y = housing.target
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                        random_state=42)
    model = LinearRegression()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print("Linear Regression - MSE:", mean_squared_error(y_test, y_pred))
    print("Linear Regression - R2 score:", r2_score(y_test, y_pred))

def polynomial_regression_auto_mpg():
    url = "https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data"
    column_names = ["mpg", "cylinders", "displacement", "horsepower", "weight",
                    "acceleration", "model_year", "origin"]
    # comment='\t' drops the tab-separated car-name column at line end
    data = pd.read_csv(url, sep='\s+', names=column_names, na_values="?",
                       comment='\t')
    data = data.dropna()
    X = data["displacement"].values.reshape(-1, 1)
    y = data["mpg"].values
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                        random_state=42)
    # Degree-2 polynomial features, scaled, feeding a linear model
    poly_model = make_pipeline(PolynomialFeatures(degree=2), StandardScaler(),
                               LinearRegression())
    poly_model.fit(X_train, y_train)
    y_pred = poly_model.predict(X_test)
    print("Polynomial Regression - MSE:", mean_squared_error(y_test, y_pred))
    print("Polynomial Regression - R2 score:", r2_score(y_test, y_pred))

linear_regression_california()
polynomial_regression_auto_mpg()
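For a visual check of the polynomial fit, a sketch that could be appended at the end of polynomial_regression_auto_mpg (it reuses that function's local variables X_test, y_test, and y_pred):

    # Sort by displacement so the prediction curve plots smoothly
    order = np.argsort(X_test.ravel())
    plt.scatter(X_test, y_test, color='red', alpha=0.5, label='Actual MPG')
    plt.plot(X_test.ravel()[order], y_pred[order], color='blue', linewidth=2,
             label='Polynomial fit (degree 2)')
    plt.xlabel('Displacement')
    plt.ylabel('MPG')
    plt.title('Polynomial Regression - Auto MPG')
    plt.legend()
    plt.show()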
Program 8
AIM:
To build a Decision Tree classifier using the Breast Cancer dataset and apply it to classify a new sample.
Algorithm:
1. Load the Breast Cancer dataset.
2. Split the data into training and test sets.
3. Train a DecisionTreeClassifier on the training set.
4. Predict on the test set and compute the accuracy.
5. Visualize the tree and classify a new sample.
8. Develop a program to demonstrate the working of the decision tree algorithm. Use Breast
Cancer Data set for building the decision tree and apply this knowledge to classify a new sample.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn import tree
data = load_breast_cancer()
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
plt.figure(figsize=(12,8))
tree.plot_tree(clf, filled=True, feature_names=data.feature_names,
class_names=data.target_names)
plt.title("Decision Tree - Breast Cancer Dataset")
plt.show()
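The AIM also asks to classify a new sample; a minimal sketch in which the first test instance stands in for the new sample (any 30-value feature vector would work):

# Classify a new sample (the first test instance stands in for a new one)
new_sample = X_test[0].reshape(1, -1)
prediction = clf.predict(new_sample)
print("New sample classified as:", data.target_names[prediction[0]])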
Output:
Explanation:
The decision tree recursively splits the feature space to maximize class purity at each node.
The tree diagram shows the learned decision rules and can be read top-down to classify a sample.
Program 9
AIM:
To implement the Naive Bayesian classifier on the Olivetti Faces dataset and compute its accuracy on
test data.
Objectives:
1. Load the Olivetti Faces dataset.
2. Train a Gaussian Naive Bayes classifier.
3. Evaluate accuracy on held-out test data and with cross-validation.
Algorithm:
1. Fetch the Olivetti Faces dataset and split it into training and test sets.
2. Fit a GaussianNB model on the training data.
3. Predict on the test set; compute the accuracy and the confusion matrix.
4. Run cross-validation for a more robust accuracy estimate.
9. Develop a program to implement the Naive Bayesian classifier considering Olivetti Face Data set
for training. Compute the accuracy of the classifier, considering a few test data sets.
import numpy as np
from sklearn.datasets import fetch_olivetti_faces
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import matplotlib.pyplot as plt
data = fetch_olivetti_faces(shuffle=True, random_state=42)
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)
gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_pred = gnb.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred) * 100:.2f}%")
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))
# Cross-validation for a more robust accuracy estimate
cv_scores = cross_val_score(gnb, X, y, cv=5)
print(f"Cross-validation accuracy: {cv_scores.mean() * 100:.2f}%")
Output:
Confusion Matrix:
[[2 0 0 ... 0 0 0]
[0 2 0 ... 0 0 0]
[0 0 2 ... 0 0 1]
...
[0 0 0 ... 1 0 0]
[0 0 0 ... 0 3 0]
[0 0 0 ... 0 0 5]]
Cross-validation accuracy: 87.25%
Explanation:
Naive Bayes assumes features are conditionally independent given the class.
GaussianNB models each pixel intensity with a per-class Gaussian distribution.
Despite the strong independence assumption, it performs reasonably well on the Olivetti faces.
Program 10: k-Means Clustering
AIM:
To implement k-Means clustering using the Wisconsin Breast Cancer dataset and
visualize clustering results.
Objectives:
1. Standardize the Breast Cancer features and cluster them with k-means (k = 2).
2. Compare cluster assignments against the true labels.
3. Visualize the clusters in two dimensions using PCA.
Algorithm:
1. Load the Wisconsin Breast Cancer dataset.
2. Standardize the features with StandardScaler.
3. Run k-means with k = 2 and obtain cluster labels.
4. Evaluate against the true labels using a confusion matrix and classification report.
5. Project the data to 2D with PCA and plot the clusters, true labels, and centroids.
10. Develop a program to implement k-means clustering using Wisconsin Breast Cancer data
set and visualize the clustering result.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_breast_cancer
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.metrics import confusion_matrix, classification_report
data = load_breast_cancer()
X = data.data
y = data.target
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Cluster into two groups (malignant vs. benign)
kmeans = KMeans(n_clusters=2, random_state=42, n_init=10)
y_kmeans = kmeans.fit_predict(X_scaled)
print("Confusion Matrix:")
print(confusion_matrix(y, y_kmeans))
print("\nClassification Report:")
print(classification_report(y, y_kmeans))
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
df = pd.DataFrame(X_pca, columns=['PC1', 'PC2'])
df['Cluster'] = y_kmeans
df['True Label'] = y
plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x='PC1', y='PC2', hue='Cluster', palette='Set1',
s=100, edgecolor='black', alpha=0.7)
plt.title('K-Means Clustering of Breast Cancer Dataset')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend(title="Cluster")
plt.show()
plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x='PC1', y='PC2', hue='True Label',
palette='coolwarm', s=100, edgecolor='black', alpha=0.7)
plt.title('True Labels of Breast Cancer Dataset')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend(title="True Label")
plt.show()
plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x='PC1', y='PC2', hue='Cluster', palette='Set1',
s=100, edgecolor='black', alpha=0.7)
centers = pca.transform(kmeans.cluster_centers_)
plt.scatter(centers[:, 0], centers[:, 1], s=200, c='red', marker='X',
            label='Centroids')
plt.title('K-Means Clustering with Centroids')
plt.legend(title="Cluster")
plt.show()
Descriptive statistics summarize and organize data to describe its main features, using
measures such as mean, median, mode, and standard deviation.