Machine Learning Labnem

The document outlines a machine learning lab course with various experiments focused on data analysis and modeling techniques using different datasets. Key tasks include creating histograms and box plots, computing correlation matrices, implementing algorithms like k-NN and decision trees, and performing PCA. The experiments utilize datasets such as California Housing, Iris, and Breast Cancer to demonstrate practical applications of machine learning concepts.

Uploaded by

navyanagraj0903

Machine learning lab

Course Code: BCSL606


SLno EXPERIMENT
1 Develop a program to create histograms for all numerical features and analyze the
distribution of each feature. Generate box plots for all numerical features and identify any
outliers. Use California Housing dataset.

2 Develop a program to compute the correlation matrix to understand the relationships
between pairs of features. Visualize the correlation matrix using a heat map to know which
variables have strong positive/negative correlations. Create a pair plot to visualize pairwise
relationships between features. Use California Housing dataset.
3 Develop a program to implement Principal Component Analysis (PCA) for reducing the
dimensionality of the Iris dataset from 4 features to 2
4 For a given set of training data examples stored in a .CSV file, implement and demonstrate the
Find-S algorithm to output a description of the set of all hypotheses consistent with the
training examples
5 Develop a program to implement k-Nearest Neighbour algorithm to classify the randomly
generated 100 values of x in the range of [0,1]. Perform the following based on dataset
generated.
a. Label the first 50 points {x1,……,x50} as follows: if (xi ≤ 0.5), then xi ∊ Class1,
else xi ∊ Class2.
b. Classify the remaining points, x51,……,x100 using KNN. Perform this for
k=1,2,3,4,5,20,30.
6 Implement the non-parametric Locally Weighted Regression algorithm in order to fit data
points. Select appropriate data set for your experiment and draw graphs.
7 Develop a program to demonstrate the working of Linear Regression and Polynomial
Regression. Use Boston Housing Dataset for Linear Regression and Auto MPG Dataset (for
vehicle fuel efficiency prediction) for Polynomial Regression.
8 Develop a program to demonstrate the working of the decision tree algorithm. Use Breast
Cancer Data set for building the decision tree and apply this knowledge to classify a new
sample.
9 Develop a program to implement the Naive Bayesian classifier considering Olivetti Face Data
set for training. Compute the accuracy of the classifier, considering a few test data sets.
10 Develop a program to implement k-means clustering using Wisconsin Breast Cancer data set
and visualize the clustering result.

1. Develop a program to create histograms for all numerical features and analyze the distribution of
each feature. Generate box plots for all numerical features and identify any outliers. Use California
Housing dataset.

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import os

# Step 1: Load the California Housing dataset from directory
file_path = r"C:/Users/Admin/Desktop/housing.csv"  # Update this path to your local file
if os.path.exists(file_path):
    housing_df = pd.read_csv(file_path)
    print("\nDataset loaded successfully!")
else:
    print(f"\nError: File not found at {file_path}")
    exit()

# Step 2: Create histograms for numerical features
numerical_features = housing_df.select_dtypes(include=[np.number]).columns

# Plot histograms (the 3 x 3 grid assumes at most 9 numerical columns)
plt.figure(figsize=(15, 10))
for i, feature in enumerate(numerical_features):
    plt.subplot(3, 3, i + 1)
    sns.histplot(housing_df[feature], kde=True, bins=30, color='blue')
    plt.title(f'Distribution of {feature}')
plt.tight_layout()
plt.show()

# Step 3: Generate box plots for numerical features
plt.figure(figsize=(15, 10))
for i, feature in enumerate(numerical_features):
    plt.subplot(3, 3, i + 1)
    sns.boxplot(x=housing_df[feature], color='orange')
    plt.title(f'Box Plot of {feature}')
plt.tight_layout()
plt.show()

# Step 4: Identify outliers using the IQR method
print("\nOutliers Detection:")
outliers_summary = {}
for feature in numerical_features:
    Q1 = housing_df[feature].quantile(0.25)
    Q3 = housing_df[feature].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    # Rows whose value lies outside [lower_bound, upper_bound] count as outliers
    outliers = housing_df[(housing_df[feature] < lower_bound) |
                          (housing_df[feature] > upper_bound)]
    outliers_summary[feature] = len(outliers)
    print(f"{feature}: {len(outliers)} outliers")

# Step 5: Print a summary of the dataset
print("\nDataset Summary:")
print(housing_df.describe())
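
The IQR rule from Step 4 can be checked by hand on a small series before running it on the full dataset. The sketch below is only an illustration: the helper name `iqr_outlier_mask`, the parameter `k`, and the toy values are assumptions made for this example and are not part of the lab program or the California Housing data.

```python
import pandas as pd


def iqr_outlier_mask(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask that is True for values outside
    [Q1 - k * IQR, Q3 + k * IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Hypothetical toy data: nine ordinary values and one extreme value.
toy = pd.Series([10, 12, 11, 13, 12, 11, 10, 12, 13, 100], name="toy_feature")
mask = iqr_outlier_mask(toy)

# Values flagged by the rule, and how many there are.
print(toy[mask].tolist())
print(int(mask.sum()))
```

Here Q1 = 11 and Q3 = 12.75, so the bounds are 8.375 and 15.375 and only the value 100 is flagged. The same mask could be applied column by column to `housing_df` in place of the inline filter in Step 4.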

2. Develop a program to compute the correlation matrix to understand the relationships
between pairs of features. Visualize the correlation matrix using a heat-map to know
which variables have strong positive/negative correlations. Create a pair plot to
visualize pair-wise relationships between features. Use California Housing dataset.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import os

# Step 1: Load the California Housing Dataset from local directory
file_path = r"C:/Users/Admin/Desktop/housing.csv"  # Replace with your local file path

# Check if the file exists
if os.path.exists(file_path):
    data = pd.read_csv(file_path)
else:
    print(f"File not found: {file_path}")
    exit()

# Step 2: Compute the correlation matrix (numerical columns only)
correlation_matrix = data.corr(numeric_only=True)

# Step 3: Visualize the correlation matrix using a heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f', linewidths=0.5)
plt.title('Correlation Matrix of California Housing Features')
plt.show()

# Step 4: Create a pair plot to visualize pairwise relationships
sns.pairplot(data, diag_kind='kde', plot_kws={'alpha': 0.5})
plt.suptitle('Pair Plot of California Housing Features', y=1.02)
plt.show()
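
To see what the entries of the correlation matrix mean, it can help to compute one Pearson coefficient by hand and compare it with `DataFrame.corr`. The sketch below uses synthetic data; the column names `x`, `pos` and `neg`, the noise level, and the helper `pearson` are assumptions made for illustration only and do not come from the housing dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic data: "pos" rises with x, "neg" falls with x.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
toy = pd.DataFrame({
    "x": x,
    "pos": 2.0 * x + rng.normal(scale=0.1, size=200),
    "neg": -x + rng.normal(scale=0.1, size=200),
})


def pearson(a: pd.Series, b: pd.Series) -> float:
    """Pearson correlation: covariance divided by the product of the spreads."""
    a_c = a - a.mean()
    b_c = b - b.mean()
    return float((a_c * b_c).sum() / np.sqrt((a_c ** 2).sum() * (b_c ** 2).sum()))


corr = toy.corr()                       # pandas uses Pearson by default
manual = pearson(toy["x"], toy["pos"])  # same quantity, computed by hand

print(corr.round(2))
print(abs(manual - corr.loc["x", "pos"]) < 1e-9)
```

Values close to +1 (such as `x` against `pos`) show up at one end of the `coolwarm` colour scale in the heat map, and values close to -1 (such as `x` against `neg`) at the other.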
OUTPUT:
