ML Lab Manual
Experiment-1
Write a python program to compute Central Tendency Measures: Mean, Median, Mode
Measure of Dispersion: Variance, Standard Deviation
statistics Library
Overview:
Purpose: Simplifies descriptive statistical calculations such as measures of central tendency (mean,
median, mode) and measures of dispersion (variance, standard deviation).
Key Functions:
o mean(data): Computes the average of the data.
o median(data): Finds the middle value in a sorted dataset.
o mode(data): Determines the most frequent value in the dataset.
o variance(data): Computes the sample variance (dividing by n - 1), which measures the spread of the data around the mean.
o stdev(data): Computes the sample standard deviation, the square root of the sample variance.
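The functions listed above can be tried on a small list. The following is a minimal sketch, assuming a hypothetical dataset that does not come from the lab. It also calls pvariance() and pstdev(), the population counterparts in the same library, to show the difference between dividing by n and by n - 1.

```python
import statistics

# Hypothetical sample data, chosen only for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean_value = statistics.mean(data)      # arithmetic average
median_value = statistics.median(data)  # middle value of the sorted data
mode_value = statistics.mode(data)      # most frequent value

# variance() and stdev() divide by n - 1 (sample statistics);
# pvariance() and pstdev() divide by n (population statistics).
sample_var = statistics.variance(data)
sample_std = statistics.stdev(data)
population_var = statistics.pvariance(data)
population_std = statistics.pstdev(data)

print("Mean:", mean_value)
print("Median:", median_value)
print("Mode:", mode_value)
print("Sample variance:", sample_var)
print("Population variance:", population_var)
```

A manual calculation that divides by len(data) should be compared with pvariance(), while one that divides by len(data) - 1 should be compared with variance().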
Use Case:
Example Insights:
In the program, we used both manual calculations and statistics functions to showcase the ease and
accuracy the library provides. The library eliminates the need to write complex loops for basic statistical
tasks.
Program:
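import statistics
# Sample data (assumed for illustration; the original listing omits the definition of data)
data = [2, 4, 4, 4, 5, 5, 7, 9]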
# Mean
mean = sum(data) / len(data)
print("\nMean (manual):", mean)
print("Mean (library):", statistics.mean(data))
# Median
sorted_data = sorted(data)
n = len(data)
median = (sorted_data[n // 2] + sorted_data[(n - 1) // 2]) / 2 if n % 2 == 0 else sorted_data[n // 2]
print("\nMedian (manual):", median)
print("Median (library):", statistics.median(data))
# Mode
mode = statistics.mode(data)
print("\nMode (library):", mode)
# Variance (sample variance: divide by n - 1 so the result matches statistics.variance)
mean_square_diff = sum((x - mean) ** 2 for x in data) / (len(data) - 1)
print("\nVariance (manual):", mean_square_diff)
print("Variance (library):", statistics.variance(data))
# Standard Deviation
std_dev = mean_square_diff ** 0.5
print("\nStandard Deviation (manual):", std_dev)
print("Standard Deviation (library):", statistics.stdev(data))
Experiment-2
Math Library
Overview:
Use Case:
Ideal for mathematical modeling, geometry, physics simulations, and simple calculations.
Example Insights:
The program demonstrates basic operations like square root and factorial to show the versatility of the
library in numerical computations. It bridges the gap between basic arithmetic and advanced
calculations.
Program:
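import math
# Read the number used below (assumed; the original listing omits the definition of num)
num = float(input("Enter a number: "))
# Arithmetic Operations
print("\nBasic Operations:")
print(f"Square root of {num}: {math.sqrt(num)}")
# Factorial is defined only for non-negative integers
if num.is_integer() and num >= 0:
    print(f"Factorial of {int(num)}: {math.factorial(int(num))}")
else:
    print("Factorial not defined for this input.")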
print(f"{num} raised to the power 3: {math.pow(num, 3)}")
# Trigonometric Functions
print("\nTrigonometric Functions:")
angle = float(input("Enter an angle (in degrees): "))
radians = math.radians(angle)
print(f"Sine of {angle} degrees: {math.sin(radians)}")
print(f"Cosine of {angle} degrees: {math.cos(radians)}")
print(f"Tangent of {angle} degrees: {math.tan(radians)}")
# Rounding Functions
print("\nRounding Functions:")
print(f"Ceiling of {num}: {math.ceil(num)}")
print(f"Floor of {num}: {math.floor(num)}")
Numpy Library
Overview:
Purpose: Provides support for large multi-dimensional arrays, matrices, and a collection of mathematical
functions to operate on them efficiently.
Key Features:
o Arrays and Broadcasting: Efficient operations on large datasets without explicit loops.
o Mathematical Operations: Supports functions like mean, sum, min, max, and element-wise
operations.
o Linear Algebra: Offers matrix operations, eigenvalues, and vector manipulations.
o Random Sampling: Generates random numbers or samples datasets.
Use Case:
Foundational library for numerical computations in scientific computing and machine learning.
Example Insights:
In the program, np.array() is used to create an array. Functions such as np.sum() and np.mean(), together with
element-wise operators, show how vectorized operations replace manual iteration and run faster on large arrays.
It's the backbone of Python's data science stack and integrates seamlessly with libraries like pandas,
scipy, and matplotlib.
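To make the point about vectorized operations and broadcasting concrete, here is a minimal sketch. The array values and variable names are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical values for illustration
data = np.array([1.0, 2.0, 3.0, 4.0])

total = np.sum(data)     # vectorized sum, no explicit loop
average = np.mean(data)  # vectorized mean

# Broadcasting: the scalar average is applied to every element
centered = data - average

# Broadcasting a (2, 1) column against a (3,) row gives a (2, 3) result
column = np.array([[10], [20]])
row = np.array([1, 2, 3])
grid = column + row

# Equivalent explicit loop for the sum, shown only for comparison
loop_total = 0.0
for value in data:
    loop_total += value

print("Sum:", total, "Mean:", average)
print("Centered:", centered)
print("Broadcast grid:\n", grid)
```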
Program:
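import numpy as np
# Sample data (assumed for illustration; the original listing omits the definition of data)
data = np.array([1, 2, 3, 4])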
# Element-wise Operations
print("\nElement-wise Operations:")
print("Array squared:", data ** 2)
print("Array multiplied by 2:", data * 2)
# Linear Algebra
print("\nLinear Algebra Operations:")
# Pad with a zero when the length is odd so the data fills two columns
matrix = data.reshape(-1, 2) if len(data) % 2 == 0 else np.append(data, 0).reshape(-1, 2)
print("Matrix (reshaped data):\n", matrix)
if matrix.shape[0] == matrix.shape[1]:  # Square matrix required
    print("Matrix Determinant:", np.linalg.det(matrix))
# Random Sampling
print("\nRandom Sampling:")
rand_arr = np.random.randint(0, 100, 5)
print("Random array of 5 integers:", rand_arr)
scipy Library
Overview:
Purpose: Built on top of numpy, it extends functionality to include advanced algorithms for scientific and
technical computing.
Modules:
o stats: For statistical functions like Z-scores, t-tests, and distributions.
o integrate: For numerical integration.
o optimize: For solving optimization problems.
o signal: For signal processing tasks.
o interpolate: For interpolation methods.
Use Case:
Used in scenarios requiring precise scientific computations like engineering simulations, physics models,
or statistical testing.
Advanced Applications:
Multivariate statistical analysis, curve fitting, and optimization problems in research domains.
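Curve fitting and optimization, mentioned above, can be illustrated with the optimize module. This is a minimal sketch under stated assumptions: the straight-line model, the noise-free data, and all variable names are hypothetical and are not part of the lab program.

```python
import numpy as np
from scipy import optimize


def line(x, slope, intercept):
    """Model function: a straight line."""
    return slope * x + intercept


# Hypothetical noise-free data generated from y = 2x + 1
x_values = np.linspace(0.0, 5.0, 20)
y_values = 2.0 * x_values + 1.0

# curve_fit returns the best-fit parameters and their covariance matrix
params, covariance = optimize.curve_fit(line, x_values, y_values)
slope_fit, intercept_fit = params

# minimize_scalar searches for the minimum of a one-variable function
result = optimize.minimize_scalar(lambda x: (x - 3.0) ** 2)

print("Fitted slope and intercept:", slope_fit, intercept_fit)
print("Minimum located at x =", result.x)
```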
Program:
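import numpy as np
from scipy import stats, integrate
# Sample data (assumed for illustration; the original listing omits the definition of data)
data = np.array([2, 4, 4, 4, 5, 5, 7, 9])
# Statistical Functions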
print("\nStatistical Analysis:")
print("Mean:", stats.tmean(data))
print("Variance:", stats.tvar(data))
print("Standard Deviation:", stats.tstd(data))
print("Mode:", stats.mode(data, keepdims=False))
print("Z-Scores:", stats.zscore(data))
# Integration
print("\nIntegration Example:")
result, _ = integrate.quad(lambda x: x**2, 0, 1) # ∫(x^2) dx from 0 to 1
print("Definite integral of x^2 from 0 to 1:", result)
# Probability Distribution
print("\nProbability Distribution:")
mean, std = np.mean(data), np.std(data)
print(f"PDF of normal distribution at mean ({mean}):", stats.norm.pdf(mean, loc=mean, scale=std))
print(f"CDF of normal distribution at mean ({mean}):", stats.norm.cdf(mean, loc=mean, scale=std))
Key Functionalities:
Library | Feature | Description
math | Trigonometric Functions | sin, cos, tan with angle conversion from degrees to radians
math | Logarithms and Exponentials | Base n logarithm, natural logarithm, and exponentials
numpy | Array Manipulation | Array reshaping, basic operations (sum, mean, max, min)
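The logarithm and exponential functions summarized above do not appear in the Experiment-2 program, so a minimal sketch follows. The input value is hypothetical and chosen only for illustration.

```python
import math

x = 8.0  # hypothetical input value

log_base2 = math.log(x, 2)  # logarithm of x in base 2 (second argument is the base)
natural_log = math.log(x)   # natural logarithm (base e)
log_base10 = math.log10(x)  # base-10 logarithm
exp_value = math.exp(2)     # e raised to the power 2

# Degrees must be converted to radians before calling sin, cos, or tan
sine_30 = math.sin(math.radians(30))

print("log base 2:", log_base2)
print("natural log:", natural_log)
print("log base 10:", log_base10)
print("exp(2):", exp_value)
print("sin(30 degrees):", sine_30)
```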
PANDAS LIBRARY
o The name "Pandas" refers to both "Panel Data" and "Python Data Analysis". The library was
created by Wes McKinney in 2008.
o Pandas can clean messy data sets, and make them readable and relevant.
1. Series:
o One-dimensional labeled array.
o Acts like a NumPy array but with labels.
import pandas as pd
s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
print(s['a']) # Access by label
2. DataFrame:
o Two-dimensional, tabular data structure.
o Each column is a Series.
3. Index:
o Immutable labels for rows and columns.
o Can be customized for hierarchical indexing.
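The DataFrame and Index items above can be sketched in the same way as the Series example. All names and values below are hypothetical and serve only as an illustration.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame(
    {"Name": ["Asha", "Ravi", "Meera"], "Age": [21, 25, 23]},
    index=["r1", "r2", "r3"],
)

age_column = df["Age"]  # each column is a Series
row_r2 = df.loc["r2"]   # access a row by its index label

# Index objects are immutable: assigning to an element raises TypeError
try:
    df.index[0] = "x"
    index_is_immutable = False
except TypeError:
    index_is_immutable = True

# Hierarchical (multi-level) index built from two label lists
multi = pd.MultiIndex.from_arrays(
    [["2023", "2023", "2024"], ["Q1", "Q2", "Q1"]],
    names=["year", "quarter"],
)
sales = pd.Series([100, 120, 90], index=multi)
sales_2023 = sales.loc["2023"]  # every quarter of 2023

print(df)
print(sales_2023)
```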
Data Transformation:
df['Age'] = df['Age'].apply(lambda x: x + 5)
df['NewCol'] = df['Age'] * 2
Combine datasets:
o Concatenation:
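No concatenation example survives in this section, so the following is a minimal sketch using pd.concat. The two frames and their column names are hypothetical.

```python
import pandas as pd

# Two hypothetical frames with the same columns
df_a = pd.DataFrame({"id": [1, 2], "score": [80, 90]})
df_b = pd.DataFrame({"id": [3, 4], "score": [70, 85]})

# Stack the rows; ignore_index=True renumbers the rows from 0
stacked = pd.concat([df_a, df_b], ignore_index=True)

# Place the frames side by side (axis=1), aligned on the row index
side_by_side = pd.concat([df_a, df_b], axis=1)

print(stacked)
print(side_by_side)
```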
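# Time-series handling: parse the dates, index by date, then resample
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)
df.resample('M').mean() # Monthly aggregation (pandas 2.2 and later prefer the alias 'ME')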
Data Loading:
o Supports various formats like CSV, Excel, SQL, JSON, and more.
import pandas as pd
data = pd.read_csv('data.csv')
Data Cleaning:
o Removing duplicates:
data.drop_duplicates(inplace=True)
Data Transformation:
o Renaming columns:
o Filtering rows:
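The two transformation steps named above have no accompanying code in this section. A minimal sketch follows, assuming a small hypothetical DataFrame. The column names are illustrative only.

```python
import pandas as pd

# Hypothetical data with a poorly named column
data = pd.DataFrame({"nm": ["Asha", "Ravi", "Meera"], "Age": [21, 25, 23]})

# Renaming columns: map old names to new names
data = data.rename(columns={"nm": "Name"})

# Filtering rows: keep only the rows where the condition is True
older_than_22 = data[data["Age"] > 22]

print(data.columns.tolist())
print(older_than_22)
```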
Data Aggregation:
o Grouping data:
grouped = data.groupby('column').mean()
Applications in ML:
Feature Engineering: Handle categorical and numerical transformations.
Data Preprocessing: Clean and prepare datasets before feeding them into ML models.
Integration: Works seamlessly with other ML libraries like NumPy and Scikit-learn.
MATPLOTLIB LIBRARY
Matplotlib is a versatile library for creating static, animated, and interactive visualizations in Python. It helps
visualize patterns, trends, and distributions in data.
Key Features:
Line Plots:
Bar Charts:
plt.bar(categories, values)
Scatter Plots:
plt.scatter(x, y)
Histograms:
plt.hist(data['column'], bins=10)
Customization:
plt.grid(True)
plt.legend(['Line 1', 'Line 2'])
plt.show()
Modify ticks:
ax.set_xticks([0, 5, 10])
ax.set_xticklabels(['Low', 'Medium', 'High'])
Subplots:
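The fragments above assume that plt, ax, and the plotted data already exist. The following is a minimal self-contained sketch of a line plot and a bar chart drawn as subplots. The data values are hypothetical, and the Agg backend is selected only so that the sketch also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt

# Hypothetical data for illustration
x = [0, 1, 2, 3, 4]
y_linear = [0, 1, 2, 3, 4]
y_square = [0, 1, 4, 9, 16]

# One row and two columns of axes in a single figure
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))

# Line plots with a legend and a grid
ax_left.plot(x, y_linear, label="y = x")
ax_left.plot(x, y_square, label="y = x^2")
ax_left.set_title("Line Plots")
ax_left.legend()
ax_left.grid(True)

# Bar chart in the second subplot
ax_right.bar(["A", "B", "C"], [3, 7, 5])
ax_right.set_title("Bar Chart")

fig.tight_layout()
```

With an interactive backend, calling plt.show() after fig.tight_layout() displays the figure.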
Applications in ML:
EDA: Gain insights into the dataset through visual exploration.
Model Evaluation: Visualize performance metrics like accuracy, loss curves, or confusion matrices.
Feature Relationships: Understand how features interact.
LAB-4
Write a Python program to implement Simple Linear Regression
print(df_data.head())
#Data Cleaning
print(df_data.isnull().sum())
#df_data.fillna(df_data[1].mean,inplace=True)
print(df_data.head())
#Data Visualization
pt.scatter(x,y)
pt.xlabel("Hours Studied")
pt.ylabel("Performance Index")
pt.show()
# Training Phase
model = LinearRegression()
model.fit(x_train, y_train)
# Testing Phase
prediction = model.predict(x_test)
print(prediction)
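The listing above omits the imports, the data loading, and the train/test split. The following is a minimal end-to-end sketch under stated assumptions: the data is synthetic and stands in for the lab's dataset, which is not shown here, and the column names are taken from the axis labels used above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data standing in for the lab's dataset
rng = np.random.default_rng(0)
hours = rng.uniform(1, 10, size=50)
performance = 5.0 * hours + 20.0 + rng.normal(0, 1.0, size=50)
df_data = pd.DataFrame({"Hours Studied": hours, "Performance Index": performance})

x = df_data[["Hours Studied"]]    # 2-D feature matrix (one column)
y = df_data["Performance Index"]  # 1-D target

# Hold out 20% of the rows for testing
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42
)

# Training phase
model = LinearRegression()
model.fit(x_train, y_train)

# Testing phase
prediction = model.predict(x_test)
r2 = model.score(x_test, y_test)  # coefficient of determination on the test set

print("Slope:", model.coef_[0], "Intercept:", model.intercept_)
print("R^2 on test data:", r2)
```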
print(df_data.head())
print(df_data.head())
param_grid = {
'criterion': ['gini', 'entropy'],
'max_depth': [3],
'min_samples_split': [2, 5, 10],
'min_samples_leaf': [1, 2, 4]
}
#Training the dataset with best model obtained from grid search
best_clf.fit(X_train, y_train)
#Testing
y_pred_optimized = best_clf.predict(X_test)
print(y_pred_optimized)
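The fragment above uses param_grid and best_clf without showing how the grid search is run. A minimal sketch follows, assuming a DecisionTreeClassifier (suggested by the 'criterion' values) and the Iris dataset as a hypothetical stand-in for the lab's data.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Iris is used here only as a stand-in dataset
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

param_grid = {
    'criterion': ['gini', 'entropy'],
    'max_depth': [3],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}

# Try every parameter combination with 5-fold cross-validation
grid_search = GridSearchCV(
    DecisionTreeClassifier(random_state=42), param_grid, cv=5, scoring='accuracy'
)
grid_search.fit(X_train, y_train)

# best_estimator_ is already refitted on the full training set (refit=True by default)
best_clf = grid_search.best_estimator_

# Testing
y_pred_optimized = best_clf.predict(X_test)
test_accuracy = best_clf.score(X_test, y_test)

print("Best parameters:", grid_search.best_params_)
print("Test accuracy:", test_accuracy)
```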
print(df_dataset.head())
#Testing Phrase
predictions=model.predict(xtest)
#Testing Phrase
predictions=model.predict(xtest)
print(df_dataset)
# Training Phase
lr = LogisticRegression(max_iter=400)
lr.fit(xtrain, ytrain)
# Testing Phase
predictions = lr.predict(xtest)
print(predictions)
print(inertias)
centers = pca.transform(kmeans.cluster_centers_)
plt.scatter(centers[:, 0], centers[:, 1], c='red', s=200, marker='X', label='Cluster Centers')
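The lines above refer to inertias, kmeans, and pca without showing how they are built. A minimal sketch follows, assuming synthetic blob data as a hypothetical stand-in for the lab's dataset and a range of k values chosen only for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Hypothetical 4-dimensional data with three natural groups
X, _ = make_blobs(n_samples=150, centers=3, n_features=4, random_state=42)

# Elbow method: record the inertia (within-cluster sum of squares) for each k
inertias = []
for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    km.fit(X)
    inertias.append(km.inertia_)
print(inertias)

# Final model with the chosen number of clusters
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

# Project the data and the cluster centers onto two principal components
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
centers = pca.transform(kmeans.cluster_centers_)
print(centers)
```

The arrays X_2d and centers can then be passed to plt.scatter to draw the clusters and their centers in two dimensions.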
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
results = {}
# Training and evaluating each model
for model_name, model in models.items():
    # Training the model
    model.fit(X_train_scaled, y_train)
    # Storing results
    results[model_name] = {
        "accuracy": accuracy,
        "classification_report": classification_rep,
        "confusion_matrix": confusion_mat
    }
plt.figure(figsize=(8, 6))
sns.barplot(x=model_names, y=accuracies, palette="viridis")
plt.title("Model Comparison (Accuracy)")
plt.ylabel("Accuracy")
plt.show()
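The comparison code above leaves out the model dictionary and the evaluation step. The following is a minimal sketch of the whole loop. The three classifiers, the Iris dataset, and the use of accuracy_score, classification_report, and confusion_matrix from sklearn.metrics are assumptions made for illustration, not the lab's confirmed setup.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Iris is used here only as a stand-in dataset
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit the scaler on the training data only, then apply it to both splits
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Hypothetical set of models to compare
models = {
    "Logistic Regression": LogisticRegression(max_iter=400),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

results = {}
for model_name, model in models.items():
    model.fit(X_train_scaled, y_train)        # training
    y_pred = model.predict(X_test_scaled)     # prediction on unseen data
    results[model_name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "classification_report": classification_report(y_test, y_pred),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
    }

# Lists in the form expected by the bar plot
model_names = list(results.keys())
accuracies = [results[name]["accuracy"] for name in model_names]

for name, acc in zip(model_names, accuracies):
    print(f"{name}: {acc:.3f}")
```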