Ds Lab-1
AIM:
To work with numpy arrays
a.REAL AND IMAGINARY PARTS OF ARRAY OF COMPLEX
NUMBERS
ALGORITHM:
STEP1: Input: An array of complex numbers.
STEP2: For each complex number z in the array, calculate the magnitude r of z:
r = sqrt(real(z)^2 + imag(z)^2)
STEP3: Calculate the angle theta of z: theta = arctan2(imag(z), real(z))
STEP4: Calculate the square root of the magnitude: sqrt_r = sqrt(r)
STEP5: Calculate the half-angle: half_theta = theta / 2
STEP6: Calculate the real and imaginary parts of the square root:
real_part = sqrt_r * cos(half_theta), imaginary_part = sqrt_r * sin(half_theta)
STEP7: Output: An array of real parts and an array of imaginary parts.
PROGRAM:
import numpy as np
x = np.sqrt([1+0j])
y = np.sqrt([0+1j])
print("Original array: x", x)
print("Original array: y", y)
print("Real part of the array:")
print(x.real)
print(y.real)
print("Imaginary part of the array:")
print(x.imag)
print(y.imag)
OUTPUT:
Original array: x [1.+0.j]
Original array: y [0.70710678+0.70710678j]
Real part of the array:
[1.]
[0.70710678]
Imaginary part of the array:
[0.]
[0.70710678]
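The algorithm steps above can be traced by hand with NumPy's real-valued functions. The sketch below is illustrative only: the input array z is a hypothetical example, and the last line checks the manual result against np.sqrt, which the program calls directly.

```python
import numpy as np

# Hypothetical input; any array of complex numbers can be used here.
z = np.array([1 + 0j, 0 + 1j, -4 + 0j, 3 + 4j])

r = np.sqrt(z.real ** 2 + z.imag ** 2)    # STEP2: magnitude
theta = np.arctan2(z.imag, z.real)        # STEP3: angle
sqrt_r = np.sqrt(r)                       # STEP4: square root of the magnitude
half_theta = theta / 2                    # STEP5: half-angle
real_part = sqrt_r * np.cos(half_theta)   # STEP6: real part of the root
imag_part = sqrt_r * np.sin(half_theta)   # STEP6: imaginary part of the root

manual = real_part + 1j * imag_part
print("Real parts:", real_part)
print("Imaginary parts:", imag_part)
print("Matches np.sqrt:", np.allclose(manual, np.sqrt(z)))
```

Squaring the result should give back the original array, which is a quick way to confirm that the polar-form steps were followed correctly.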
OUTPUT:
One dimensional array:[0 1 2 3 4]
Two dimensional array:[[0 1 2 3 4]
 [5 6 7 8 9]]
0:0
1:1
2:2
3:3
4:4
0:5
1:6
2:7
3:8
4:9
e.ELEMENT WISE OPERATION
ALGORITHM:
STEP1: Import the numpy library using the import numpy as np statement. This
library provides tools for numerical operations and arrays.
STEP2: Create two NumPy arrays x and y containing the desired elements.
These arrays represent the numbers to be compared.
STEP3: Print the x and y arrays to display the original numbers that will be
compared.
STEP4: Use the NumPy functions np.greater, np.greater_equal, np.less and
np.less_equal to perform element-wise comparisons between x and y.
STEP5: Print the results of each comparison, which will be boolean arrays
indicating the element-wise relationships between x and y.
PROGRAM
import numpy as np
x = np.array([3,5])
y = np.array([2,5])
print("Original numbers:")
print(x)
print(y)
print("Comparison-greater")
print(np.greater(x,y))
print("Comparison-greater_equal")
print(np.greater_equal(x,y))
print("Comparison-less")
print(np.less(x,y))
print("Comparison-less_equal")
print(np.less_equal(x,y))
OUTPUT:
Original numbers:
[3 5]
[2 5]
Comparison-greater
[ True False]
Comparison-greater_equal
[ True  True]
Comparison-less
[False False]
Comparison-less_equal
[False  True]
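The comparison functions used in this program have operator equivalents. The short sketch below, using the same x and y, shows that the operators give the same boolean arrays and that such an array can serve as a mask; the variable names are chosen for illustration.

```python
import numpy as np

x = np.array([3, 5])
y = np.array([2, 5])

# The comparison operators return the same boolean arrays as the np.* functions.
greater = x > y
greater_equal = x >= y
less = x < y
less_equal = x <= y

print(np.array_equal(greater, np.greater(x, y)))
print(np.array_equal(less_equal, np.less_equal(x, y)))

# A boolean array can be used as a mask to pick the elements where the test holds.
print(x[greater])
```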
RESULT:
Thus, the programs for working with numpy arrays were successfully executed and
verified.
EX NO:3
WORKING WITH PANDAS DATAFRAME
DATE:
AIM:
To work with pandas dataframe
a.MERGING TWO DATAFRAMES
ALGORITHM:
STEP1: Import the pandas library using the import pandas as pd statement. This
library provides tools for data manipulation and analysis
STEP2: Create dictionaries data1 and data2 containing the following key-value
pairs:
• 'ID': A list of IDs.
• 'Name', 'Age': Lists of names and ages (in data1).
• 'City', 'Salary': Lists of cities and salaries (in data2)
STEP3: Create DataFrames df1 and df2 from the respective dictionaries using
the pd.DataFrame() function. This converts the dictionaries into tabular data
structures.
STEP4: Use the pd.merge() function to merge df1 and df2 based on the 'ID'
column. Specify the how='inner' parameter to perform an inner join, which
keeps only the rows that have matching values in both DataFrames.
STEP5: Print the merged_df DataFrame to display the results. This will output a
table containing the combined data from df1 and df2, with the rows joined
based on the matching 'ID' values.
PROGRAM:
import pandas as pd
data1 = {'ID': [1, 2, 3],'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df1 = pd.DataFrame(data1)
data2 = {'ID': [1, 2, 4], 'City': ['New York', 'Los Angeles', 'Chicago'],
'Salary': [70000, 80000, 60000]}
df2 = pd.DataFrame(data2)
merged_df = pd.merge(df1, df2, on='ID', how='inner')
print(merged_df)
OUTPUT:
   ID   Name  Age         City  Salary
0   1  Alice   25     New York   70000
1   2    Bob   30  Los Angeles   80000
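To see which rows an inner join drops, the same two DataFrames can also be merged with how='outer'. The sketch below is an illustration only; it uses the indicator=True option of pd.merge, which adds a _merge column showing where each row came from.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3],
                    'Name': ['Alice', 'Bob', 'Charlie'],
                    'Age': [25, 30, 35]})
df2 = pd.DataFrame({'ID': [1, 2, 4],
                    'City': ['New York', 'Los Angeles', 'Chicago'],
                    'Salary': [70000, 80000, 60000]})

# Inner join: only IDs present in both DataFrames survive.
inner = pd.merge(df1, df2, on='ID', how='inner')

# Outer join: every ID is kept, and _merge records the source of each row.
outer = pd.merge(df1, df2, on='ID', how='outer', indicator=True)

print(inner['ID'].tolist())
print(outer[['ID', '_merge']])
```

ID 3 appears only in df1 and ID 4 only in df2, so neither is part of the inner join result.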
OUTPUT:
Department Finance HR IT
Name
Alice 0 51000.0 0.0
Bob 0 0.0 60500.0
Charlie 55000 0.0 0.0
OUTPUT:
Duplicate rows:
Name Age City
2 Alice 25 New York
4 Bob 30 Los Angeles
DataFrame without duplicates:
Name Age City
0 Alice 25 New York
1 Bob 30 Los Angeles
3 Charlie 35 Chicago
d.ACCESS ROW BY INDEX
ALGORITHM:
STEP1: Import the pandas library for data manipulation and analysis.
STEP2: Create a Python dictionary data with keys representing column names
('Name', 'Age', 'City') and values as lists containing corresponding data.
STEP3: Use pd.DataFrame(data) to create a pandas DataFrame from the
dictionary.This DataFrame will have columns 'Name', 'Age', and 'City'.
STEP4: Use df.loc[1] to access the row whose index label is 1 and store it in
second_row. With the default integer index, which starts from 0, this is the
second row of the DataFrame.
STEP5: Use print(second_row) to display the selected row.
PROGRAM
import pandas as pd
data = {'Name': ['Gen', 'Mary', 'Alice'], 'Age': [22, 28, 35],
        'City': ['New York', 'London', 'Coimbatore']}
df = pd.DataFrame(data)
second_row = df.loc[1]
print(second_row)
OUTPUT:
Name      Mary
Age         28
City    London
Name: 1, dtype: object
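df.loc selects by index label, while df.iloc selects by position. The two coincide in the program above only because the index is the default 0, 1, 2. The sketch below uses hypothetical index labels (10, 20, 30) to make the difference visible.

```python
import pandas as pd

# The index labels 10, 20, 30 are hypothetical and chosen only for illustration.
df = pd.DataFrame(
    {'Name': ['Gen', 'Mary', 'Alice'], 'Age': [22, 28, 35]},
    index=[10, 20, 30],
)

by_label = df.loc[20]      # row whose index label is 20
by_position = df.iloc[1]   # second row, regardless of its label

print(by_label)
print(by_position)
```

With these labels, df.loc[1] would raise a KeyError because no row carries the label 1.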
e. CALCULATE AVERAGE AND FILTER ROWS
ALGORITHM:
STEP1: Import the pandas library using the import pandas as pd statement. This
library provides tools for data manipulation and analysis.
STEP2: Create a dictionary named data containing the following key-value
pairs:
• 'Name': A list of employee names.
• 'Age': A list of employee ages.
• 'Salary': A list of employee salaries.
STEP3: Create a pandas DataFrame named df from the data dictionary using
the pd.DataFrame(data) function. This converts the dictionary into a tabular data
structure.
STEP4: Calculate the average salary using the df['Salary'].mean() expression.
This computes the mean value of the salaries in the 'Salary' column.
STEP5:Filter the DataFrame to only include rows where the 'Salary' is greater
than the avg_salary. This creates a new DataFrame df_filtered containing only
the employees whose salaries exceed the average.
STEP6: Print the df_filtered DataFrame to display the results. This will output a
table containing the names, ages, and salaries of the employees who earn more
than the average salary.
PROGRAM
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
'Age': [25, 30, 35, 40, 22],
'Salary': [50000, 60000, 55000, 65000, 48000]}
df = pd.DataFrame(data)
avg_salary = df['Salary'].mean()
df_filtered = df[df['Salary'] > avg_salary]
print(df_filtered)
OUTPUT:
    Name  Age  Salary
1    Bob   30   60000
3  David   40   65000
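The average salary for this data is 278000 / 5 = 55600, so only salaries above 55600 pass the filter. The sketch below repeats the filter with DataFrame.query as an alternative to boolean indexing; the name above_avg is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
                   'Age': [25, 30, 35, 40, 22],
                   'Salary': [50000, 60000, 55000, 65000, 48000]})

avg_salary = df['Salary'].mean()

# The @ prefix lets the query string refer to a local Python variable.
above_avg = df.query('Salary > @avg_salary')

print("Average salary:", avg_salary)
print(above_avg['Name'].tolist())
```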
RESULT:
Thus the programs for working with pandas dataframe were successfully executed
and verified.
EX NO:4
BASIC PLOT USING MATPLOTLIB
DATE:
AIM:
To write a python program that draws a basic plot using matplotlib
a.PLOTTING SINE AND COSINE FUNCTIONS WITH LEGENDS IN
MATPLOTLIB
ALGORITHM
STEP1: Import numpy for numerical operations and matplotlib.pyplot for
plotting
STEP2: Use np.linspace() to generate 1000 evenly spaced points between 0 and
10, and assign them to the x variable.
STEP3: Create a figure and an axis object using plt.subplots().
STEP4: Use ax.plot() to plot the sine and cosine functions. Plot the sine
function with a dashed blue line ('--b') and label it 'Sine'. Plot the cosine
function with a red line (c='r') and label it 'Cosine'.
STEP5: Use ax.axis('equal') to set the aspect ratio of the axes to be equal,
ensuring that the scales of the x and y axes are the same.
STEP6: Use ax.legend(loc="lower left") to add a legend to the plot. The
loc="lower left" argument specifies the location of the legend.
STEP7: Display the plot using plt.show(). In a Jupyter notebook the figure is
rendered automatically at the end of the cell, but a standalone script needs
the explicit call.
PROGRAM
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 1000)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), '--b', label='Sine')
ax.plot(x, np.cos(x), c='r', label='Cosine')
ax.axis('equal')
leg = ax.legend(loc="lower left")
plt.show()
OUTPUT
RESULT:
Thus the program for basic plotting using matplotlib was successfully
executed and verified.
EX NO :5
STATISTICAL AND PROBABILITY MEASURES
DATE :
AIM:
To write a python program for statistical and probability measures.
a.FREQUENCY DISTRIBUTIONS
ALGORITHM:
STEP1: Create an empty dictionary to store the frequency of each number. Each
key in the dictionary will represent a unique number, and its corresponding
value will be the count of that number.
STEP2: For each number in the list, check if the number is already a key in the
dictionary:
• If it is, increment its corresponding value by 1.
• If it's not, add the number as a new key to the dictionary with an
initial value of 1.
STEP3: Iterate over the key-value pairs in the dictionary. For each pair, print the
number and its corresponding frequency in a readable format.
PROGRAM
data = [5, 1, 2, 2, 3, 1, 5, 5, 4, 3, 4, 4, 1, 5]
frequency = {}
for item in data:
    frequency[item] = frequency.get(item, 0) + 1
print("Frequency Distribution:")
for item, count in frequency.items():
    print(f"{item}: {count}")
OUTPUT:
Frequency Distribution:
5: 4
1: 3
2: 2
3: 2
4: 3
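The dictionary-based counting above can also be written with collections.Counter from the standard library. The sketch below is an alternative to the program, not a replacement for it; it prints the values in sorted order and picks out the most frequent value.

```python
from collections import Counter

data = [5, 1, 2, 2, 3, 1, 5, 5, 4, 3, 4, 4, 1, 5]

# Counter builds the value -> count mapping in one step.
frequency = Counter(data)

print("Frequency Distribution (sorted by value):")
for item in sorted(frequency):
    print(f"{item}: {frequency[item]}")

# most_common(1) returns a list holding the (value, count) pair with the highest count.
most_common_value, most_common_count = frequency.most_common(1)[0]
print("Most frequent value:", most_common_value, "occurring", most_common_count, "times")
```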
b.MEAN,MODE,STANDARD DEVIATION
ALGORITHM
STEP1: Sum all the numbers in the list. Divide the sum by the total number of
elements in the list.
STEP2: Find the number that occurs most frequently in the list. If multiple
numbers occur with the same highest frequency, the data has more than one mode;
statistics.mode returns the first one encountered (Python 3.8 and later).
STEP3: Calculate the mean of the list. For each number in the list:
• Subtract the mean from the number.
• Square the difference.
Sum the squared differences and divide by n - 1, where n is the number of
elements (statistics.stdev computes the sample standard deviation). Take the
square root of the result.
PROGRAM
import statistics
def calculate_statistics(data):
    mean = statistics.mean(data)
    mode = statistics.mode(data)
    std_dev = statistics.stdev(data)
    return mean, mode, std_dev
data = [1, 2, 2, 3, 4, 4, 4, 5, 6, 7]
mean, mode, std_dev = calculate_statistics(data)
print("Mean:", mean)
print("Mode:", mode)
print("Standard Deviation:", std_dev)
OUTPUT
Mean: 3.8
Mode: 4
Standard Deviation: 1.8737959096740262
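The standard deviation printed above is the sample standard deviation (divisor n - 1). The sketch below works through STEP3 by hand and compares both divisors with the statistics module. The tied list passed to statistics.multimode is a hypothetical example, and multimode requires Python 3.8 or later.

```python
import math
import statistics

data = [1, 2, 2, 3, 4, 4, 4, 5, 6, 7]
n = len(data)

mean = sum(data) / n
squared_diffs = [(value - mean) ** 2 for value in data]

# Sample standard deviation divides by n - 1 (what statistics.stdev does).
sample_sd = math.sqrt(sum(squared_diffs) / (n - 1))
# Population standard deviation divides by n (what statistics.pstdev does).
population_sd = math.sqrt(sum(squared_diffs) / n)

print("Sample SD:", sample_sd)
print("Population SD:", population_sd)

# Hypothetical tied data: multimode returns every value sharing the top count.
modes = statistics.multimode([1, 1, 2, 2, 3])
print("Modes of the tied example:", modes)
```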
c.VARIABILITY
ALGORITHM
STEP1: Import the statistics library to utilize its functions for variance and
standard deviation calculations.
STEP2: Create a function calculate_variability that takes a list of numbers
(data) as input.
STEP3: Find the maximum and minimum values in the data list. Subtract the
minimum value from the maximum value to obtain the range.
STEP4: Use the statistics.variance and statistics.stdev functions to calculate the
variance and standard deviation of the data list, respectively.
STEP5: Return the calculated range, variance, and standard deviation.Print the
calculated values to the console.
PROGRAM
import statistics
def calculate_variability(data):
    data_range = max(data) - min(data)
    variance = statistics.variance(data)
    std_dev = statistics.stdev(data)
    return data_range, variance, std_dev
data = [4, 8, 6, 5, 3, 8, 9, 7, 5]
data_range, variance, std_dev = calculate_variability(data)
print("Range:", data_range)
print("Variance:", variance)
print("Standard Deviation:", std_dev)
OUTPUT
Range: 6
Variance: 4.111111111111111
Standard Deviation: 2.0275875100994063
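If the same measures are computed with NumPy instead of the statistics module, the divisor needs attention: np.var and np.std divide by n unless ddof=1 is passed. The sketch below is an illustrative comparison on the same data, not part of the original program.

```python
import statistics
import numpy as np

data = [4, 8, 6, 5, 3, 8, 9, 7, 5]

data_range = np.max(data) - np.min(data)

# ddof=1 gives the sample variance, matching statistics.variance.
sample_var = np.var(data, ddof=1)
# The NumPy default (ddof=0) gives the population variance instead.
population_var = np.var(data)

print("Range:", data_range)
print("Sample variance (ddof=1):", sample_var)
print("Population variance (ddof=0):", population_var)
print("Sample SD (ddof=1):", np.std(data, ddof=1))
```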
d.NORMAL CURVES
ALGORITHM
STEP1: Import numpy as np for numerical computations. Import
matplotlib.pyplot as plt for plotting functionalities.
STEP2: Set the desired mean (mean) of the normal distribution. Set the desired
standard deviation (std_dev) of the normal distribution.
STEP3: Create a NumPy array x of evenly spaced values using np.linspace. This
array will represent the X-axis of the plot.
STEP4: Define the mathematical formula for the normal distribution probability
density function (PDF). It involves the mean, the standard deviation, the
constant np.pi and the function np.exp. Use vectorized operations with NumPy
for efficient calculation. Store the calculated PDF values in a NumPy array y.
STEP5: Use plt.plot(x, y) to create a line plot of the data points (x-values vs.
probability densities). Set the plot title using plt.title("Normal Distribution
Curve"). Label the x-axis and y-axis using plt.xlabel("X-axis") and
plt.ylabel("Probability Density").
STEP6: Use plt.show() to display the generated plot.
PROGRAM
import numpy as np
import matplotlib.pyplot as plt
mean = 0
std_dev = 1
x = np.linspace(-4, 4, 1000)
y = (1 / (std_dev * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((x - mean) / std_dev) ** 2)
plt.plot(x, y)
plt.title("Normal Distribution Curve")
plt.xlabel("X-axis")
plt.ylabel("Probability Density")
plt.show()
OUTPUT
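The curve drawn by the program can be sanity-checked numerically without plotting. The sketch below is an illustrative check under the same mean and std_dev: the area under the PDF over [-4, 4] should be close to 1, and the peak should be close to 1 / sqrt(2 * pi). The trapezoidal rule is written out by hand here.

```python
import numpy as np

mean = 0
std_dev = 1
x = np.linspace(-4, 4, 1000)
y = (1 / (std_dev * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((x - mean) / std_dev) ** 2)

# Trapezoidal rule: average of adjacent heights multiplied by the step width.
area = np.sum((y[:-1] + y[1:]) / 2 * np.diff(x))
peak = y.max()

print("Area under the curve on [-4, 4]:", area)
print("Peak height:", peak)
print("1 / sqrt(2 * pi):", 1 / np.sqrt(2 * np.pi))
```

The area is slightly below 1 because the tails beyond four standard deviations are cut off.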
f.CORRELATION COEFFICIENT
ALGORITHM
STEP1: Import the numpy library as np to perform numerical operations.
STEP2: Create a function named correlation_coefficient that takes two NumPy
arrays x and y as input.
STEP3: Use np.mean(x) and np.mean(y) to compute the mean values of x and y,
respectively. Store these values in mean_x and mean_y.
STEP4: Subtract mean_x from each element of x and mean_y from each
element of y. Multiply the corresponding differences and sum the results. Store
the sum in numerator.
STEP5: Subtract mean_x from each element of x, square the differences and sum
them. Do the same for the y values. Multiply the two sums and take the square
root. Store the result in denominator.
STEP6: Divide the numerator by the denominator and return the result.
STEP7: Create two lists x and y to represent the data. Convert the lists to
NumPy arrays using np.array. Call the correlation_coefficient function with the
NumPy arrays as arguments. Print the calculated correlation coefficient.
PROGRAM
import numpy as np
def correlation_coefficient(x, y):
    mean_x = np.mean(x)
    mean_y = np.mean(y)
    numerator = np.sum((x - mean_x) * (y - mean_y))
    denominator = np.sqrt(np.sum((x - mean_x) ** 2) * np.sum((y - mean_y) ** 2))
    return numerator / denominator
x = [1, 2, 3, 4, 5]
y = [2, 3, 4, 5, 6]
result = correlation_coefficient(np.array(x), np.array(y))
print("Correlation Coefficient:", result)
OUTPUT
Correlation Coefficient: 1.0
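The result of 1.0 arises because y is an exact linear function of x. The sketch below applies the same formula to a hypothetical, less regular y and cross-checks it against np.corrcoef, NumPy's built-in Pearson correlation.

```python
import numpy as np

def correlation_coefficient(x, y):
    # Pearson correlation: covariance term over the product of the spreads.
    mean_x = np.mean(x)
    mean_y = np.mean(y)
    numerator = np.sum((x - mean_x) * (y - mean_y))
    denominator = np.sqrt(np.sum((x - mean_x) ** 2) * np.sum((y - mean_y) ** 2))
    return numerator / denominator

x = np.array([1, 2, 3, 4, 5])
y_noisy = np.array([2, 3, 5, 4, 6])   # hypothetical data, not from the exercise

manual = correlation_coefficient(x, y_noisy)
builtin = np.corrcoef(x, y_noisy)[0, 1]   # off-diagonal entry of the 2x2 matrix

print("Manual:", manual)
print("np.corrcoef:", builtin)
```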
g.REGRESSION
ALGORITHM
STEP1: Import numpy as np for numerical operations. Import LinearRegression
from sklearn.linear_model for linear regression functionality.
STEP2: Create NumPy arrays x and y to store the independent and dependent
variables, respectively. In this example, x represents a single feature
(reshape(-1, 1) ensures it's a 2D column vector).
STEP3: Instantiate a LinearRegression object from scikit-learn to create the
linear regression model.
STEP4: Use the model.fit(x, y) method to fit the model with the training data (x
and y). During this process, the model estimates the optimal slope and intercept
for the best-fit line.
STEP5: Access the slope (coefficient) using model.coef_[0] and the intercept
using model.intercept_. The slope represents the change in y for a unit change
in x.
STEP6: Print the calculated slope and intercept values, then use
model.predict(x) to compute and print the predicted values.
PROGRAM
import numpy as np
from sklearn.linear_model import LinearRegression
x = np.array([1, 2, 3, 4, 5]).reshape(-1, 1) # Reshape to make it 2D
y = np.array([2, 3, 4, 5, 6])
model = LinearRegression()
model.fit(x, y)
slope = model.coef_[0]
intercept = model.intercept_
print("Slope (Coefficient):", slope)
print("Intercept:", intercept)
predicted_y = model.predict(x)
print("Predicted Values:", predicted_y)
OUTPUT
Slope (Coefficient): 1.0
Intercept: 1.0
Predicted Values: [2. 3. 4. 5. 6.]
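For a single feature, the slope and intercept can also be obtained from the closed-form least-squares formulas. The sketch below is a hand calculation with NumPy on the same data, shown for illustration; it does not describe how scikit-learn computes the fit internally.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 3, 4, 5, 6], dtype=float)

# slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# The fitted line passes through the point (mean_x, mean_y).
intercept = y.mean() - slope * x.mean()

predicted_y = intercept + slope * x

print("Slope:", slope)
print("Intercept:", intercept)
print("Predicted Values:", predicted_y)
```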
RESULT:
Thus the program for statistical and probability measures was
successfully executed and verified.
EX NO:6
UNIVARIATE AND BIVARIATE ANALYSIS
DATE:
AIM:
To write a python program for univariate and bivariate analysis
a.UNIVARIATE ANALYSIS:
FREQUENCY,MEAN,MEDIAN,MODE,VARIANCE,STANDARD
DEVIATION,SKEWNESS AND KURTOSIS
ALGORITHM
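STEP1: Frequency: count how often each value occurs and store the counts in a
dictionary (collections.Counter).
STEP2: Mean: sum all data points and divide the sum by the number of data points.
STEP3: Median: sort the data in ascending order. If the number of data points is
odd, the median is the middle value; if it is even, the median is the average of
the two middle values.
STEP4: Mode: find the most frequent value in the data. If multiple values have the
same highest frequency, there is no unique mode (Python versions before 3.8
raise StatisticsError in that case; later versions return the first mode
encountered).
STEP5: Variance: sum the squared differences from the mean and divide by n - 1
(sample variance).
STEP6: Standard deviation: take the square root of the variance.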
PROGRAM
import statistics as stats
import collections
data = [12, 15, 12, 18, 16, 14, 15, 14, 16, 18, 19, 21, 15, 17, 19]
frequency = collections.Counter(data)
print("Frequency Distribution:")
for value, count in frequency.items():
    print(f"{value}: {count}")
mean_value = stats.mean(data)
print("\nMean:", mean_value)
median_value = stats.median(data)
print("Median:", median_value)
try:
    mode_value = stats.mode(data)
    print("Mode:", mode_value)
except stats.StatisticsError:
    print("Mode: No unique mode found")
variance_value = stats.variance(data) # Sample variance
print("Variance:", variance_value)
std_deviation_value = stats.stdev(data)
print("Standard Deviation:", std_deviation_value)
OUTPUT
Frequency Distribution:
12: 2
15: 3
18: 2
16: 2
14: 2
19: 2
21: 1
17: 1
Mean: 16.066666666666666
Median: 16
Mode: 15
Variance: 6.780952380952381
Standard Deviation: 2.6040261866871424
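The section heading also lists skewness and kurtosis, which the program above does not compute. The sketch below is one possible way to add them, assuming the population (biased) moment definitions: skewness = m3 / m2^1.5 and excess kurtosis = m4 / m2^2 - 3, where mk is the k-th central moment. The helper names are hypothetical, and other definitions with bias corrections give slightly different values.

```python
def central_moment(values, k):
    # k-th central moment: mean of (value - mean) ** k, using divisor n.
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)

def skewness(values):
    # Zero for symmetric data; positive when the right tail is longer.
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    # Subtracting 3 makes the normal distribution score zero.
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

data = [12, 15, 12, 18, 16, 14, 15, 14, 16, 18, 19, 21, 15, 17, 19]
print("Skewness:", skewness(data))
print("Excess kurtosis:", excess_kurtosis(data))
```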
b.BIVARIATE ANALYSIS:
LINEAR REGRESSION MODELLING
ALGORITHM
STEP1: Use numpy.random.rand to create random values for the independent
variable (X). Use train_test_split from sklearn.model_selection to split the data
into training and testing sets. This helps evaluate the model's performance on
unseen data.
STEP2: Import LinearRegression from sklearn.linear_model. This class
implements the linear regression algorithm.
STEP3: Use model.fit(X_train, y_train) to train the model. This involves fitting
the equation to the training data (X_train and y_train).
STEP4: Use model.predict(X_test) to predict the dependent variable (y) values
for the unseen test data (X_test).
STEP5: Access model.intercept_ and model.coef_ to get the estimated intercept
(b0) and slope (b1) in the linear equation y = b0 + b1 * X.
STEP6: Calculate the mean squared error and the R^2 score using
mean_squared_error and r2_score from sklearn.metrics. The R^2 score indicates
how well the regression line fits the data (closer to 1 is better).
PROGRAM
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
np.random.seed(0)
X = 2 * np.random.rand(100, 1) # Independent variable (predictor)
y = 4 + 3 * X + np.random.randn(100, 1) # Dependent variable with some noise
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=0)
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("Intercept:", model.intercept_)
print("Coefficient:", model.coef_)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Mean Squared Error:", mse)
print("R^2 Score:", r2)
OUTPUT
Intercept: [4.32235853]
Coefficient: [[2.93647151]]
Mean Squared Error: 1.0434333815695171
R^2 Score: 0.7424452332071367
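The two metrics reported above follow directly from their definitions: MSE is the mean of the squared errors, and R^2 = 1 - SS_res / SS_tot. The sketch below computes both by hand on small hypothetical arrays (y_true and y_hat are made-up values, not the data of this exercise).

```python
import numpy as np

# Hypothetical true and predicted values, chosen for easy arithmetic.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 7.0, 9.5])

residuals = y_true - y_hat
mse = np.mean(residuals ** 2)

ss_res = np.sum(residuals ** 2)                    # unexplained variation
ss_tot = np.sum((y_true - y_true.mean()) ** 2)     # total variation around the mean
r2 = 1 - ss_res / ss_tot

print("Mean Squared Error:", mse)
print("R^2 Score:", r2)
```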
LOGISTIC REGRESSION MODELLING
ALGORITHM
STEP1: Generate synthetic dataset.
STEP2: Prepare the data for logistic regression.
STEP3: Perform Logistic Regression with statsmodels.
STEP4: Alternatively, fit the model with sklearn's LogisticRegression.
STEP5: Make predictions and evaluate the model.
PROGRAM
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             classification_report)
np.random.seed(0)
data_size = 100
target = np.random.choice([0, 1], size=data_size)
predictor = target + np.random.normal(0, 1, data_size)
df = pd.DataFrame({'Target': target, 'Predictor': predictor})
X = df[['Predictor']]
y = df['Target']
X_sm = sm.add_constant(X)
logit_model = sm.Logit(y, X_sm).fit()
print("Logistic Regression with statsmodels:")
print(logit_model.summary())
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
random_state=0)
sklearn_logit = LogisticRegression()
sklearn_logit.fit(X_train, y_train)
y_pred = sklearn_logit.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)
print("\nLogistic Regression with sklearn:")
print("Accuracy:", accuracy)
print("Confusion Matrix:\n", conf_matrix)
print("Classification Report:\n", class_report)
OUTPUT
Logistic Regression with sklearn:
Accuracy: 0.7333333333333333
Confusion Matrix:
[[ 8 4]
[ 4 14]]
Classification Report:
precision recall f1-score support
0 0.67 0.67 0.67 12
1 0.78 0.78 0.78 18
accuracy 0.73 30
macro avg 0.72 0.72 0.72 30
weighted avg 0.73 0.73 0.73 30
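The accuracy and the class 1 row of the classification report can be reproduced from the confusion matrix printed above. The sketch below assumes scikit-learn's layout, in which rows are true classes and columns are predicted classes; the variable names are illustrative.

```python
# Confusion matrix from the output above: rows = true class, columns = predicted class.
conf_matrix = [[8, 4],
               [4, 14]]

tn, fp = conf_matrix[0]   # true class 0: predicted 0, predicted 1
fn, tp = conf_matrix[1]   # true class 1: predicted 0, predicted 1
total = tn + fp + fn + tp

accuracy = (tp + tn) / total
precision_1 = tp / (tp + fp)   # of all predicted 1s, how many were correct
recall_1 = tp / (tp + fn)      # of all true 1s, how many were found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

print("Accuracy:", round(accuracy, 2))
print("Class 1 precision:", round(precision_1, 2))
print("Class 1 recall:", round(recall_1, 2))
print("Class 1 f1-score:", round(f1_1, 2))
```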
RESULT
Thus the python programs for univariate and bivariate analysis were
successfully executed and verified.
EX NO:7
SUPERVISED AND UNSUPERVISED LEARNING ALGORITHM ON ANY DATA SET
DATE:
AIM
To write python program for supervised and unsupervised learning algorithm
on any data set
a.SUPERVISED LEARNING
ALGORITHM
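STEP1: Import datasets from sklearn and load the wine data set with
datasets.load_wine(). Use the class labels (y_class) as the classification
target and the first feature column (y_regression) as the regression target.
STEP2: Standardize the features with StandardScaler to obtain X_scaled.
STEP3: Split the scaled data (X_scaled) and class labels (y_class) or regression
target (y_regression) into training and testing sets using train_test_split(). This
ensures the model is evaluated on unseen data.
STEP4: Import KNeighborsClassifier from sklearn.neighbors and use
knn.fit(X_train_class, y_train_class) to train the KNN model on the labeled
classification training data (X_train_class and y_train_class).
STEP5: Use knn.predict(X_test_class) to predict the class labels of the test data
and report the accuracy.
STEP6: Train a DecisionTreeRegressor with tree.fit(X_train_reg, y_train_reg) and
use tree.predict(X_test_reg) to predict the first feature's value for the unseen
test data (X_test_reg). The model predicts a continuous value for each data
point in the test set. Because this target is also one of the (scaled) input
features, the MSE and R2 scores are close to perfect.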
PROGRAM
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import accuracy_score, mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler
wine = datasets.load_wine()
X = wine.data
y_class = wine.target
y_regression = wine.data[:, 0]
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
X_train_class, X_test_class, y_train_class, y_test_class = train_test_split(
    X_scaled, y_class, test_size=0.3, random_state=42)
X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(
    X_scaled, y_regression, test_size=0.3, random_state=42)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train_class, y_train_class)
y_pred_class = knn.predict(X_test_class)
print("KNN Classification Accuracy:", accuracy_score(y_test_class,
y_pred_class))
tree = DecisionTreeRegressor(random_state=42)
tree.fit(X_train_reg, y_train_reg)
y_pred_reg = tree.predict(X_test_reg)
print("Decision Tree MSE:", mean_squared_error(y_test_reg, y_pred_reg))
print("Decision Tree R2 Score:", r2_score(y_test_reg, y_pred_reg))
OUTPUT
KNN Classification Accuracy: 0.9629629629629629
Decision Tree MSE: 0.0016999999999999878
Decision Tree R2 Score: 0.996833244367082
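The idea behind the KNN classifier used above can be shown in a few lines: find the k training points nearest to a query point and take a majority vote over their labels. The sketch below is a brute-force illustration on hypothetical 2D points; knn_predict is a made-up helper name and this is not how scikit-learn implements KNeighborsClassifier.

```python
from collections import Counter

import numpy as np

def knn_predict(X_train, y_train, query, k=3):
    # Euclidean distance from the query to every training point.
    distances = np.linalg.norm(X_train - query, axis=1)
    # Indices of the k closest training points.
    nearest = np.argsort(distances)[:k]
    # Majority vote over the labels of those neighbours.
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]

# Hypothetical training data: two well-separated groups labelled 0 and 1.
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
                    [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([1.2, 1.4])))
print(knn_predict(X_train, y_train, np.array([8.7, 8.2])))
```

Distances depend on the scale of each feature, which is why the program standardizes the wine data before fitting the classifier.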
b.UNSUPERVISED LEARNING
ALGORITHM
STEP1: Initialization: Randomly select k data points as initial centroids.
STEP2: For each data point, calculate the distance to each centroid and assign
the point to the cluster of the nearest centroid.
STEP3: For each cluster, calculate the mean of all data points assigned to it.
Set the centroid of the cluster to this mean.
STEP4: Repeat steps 2 and 3 until convergence (i.e., no data points change
clusters).
PROGRAM
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
data = np.array([
[1, 2], [1, 4], [1, 0],
[10, 2], [10, 4], [10, 0],
[5, 5], [6, 6], [5, 6]
])
num_clusters = 3
kmeans = KMeans(n_clusters=num_clusters, random_state=0)
kmeans.fit(data)
centers = kmeans.cluster_centers_
labels = kmeans.labels_
print("Cluster centers:\n", centers)
print("Labels for each data point:", labels)
OUTPUT
Cluster centers:
[[10. 2. ]
[ 1. 2. ]
[ 5.33333333 5.66666667]]
Labels for each data point: [1 1 1 0 0 0 2 2 2]
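The algorithm steps above can be written out directly with NumPy. The sketch below is a minimal illustration, not scikit-learn's KMeans: to keep the run reproducible it starts from three fixed data points (an assumption made here instead of the random choice in STEP1), and it does not handle clusters that become empty.

```python
import numpy as np

data = np.array([
    [1, 2], [1, 4], [1, 0],
    [10, 2], [10, 4], [10, 0],
    [5, 5], [6, 6], [5, 6]
], dtype=float)

# STEP1 (simplified): fixed initial centroids instead of a random selection.
centroids = data[[0, 3, 6]].copy()

for _ in range(100):
    # STEP2: distance of every point to every centroid, then nearest centroid.
    distances = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    labels = distances.argmin(axis=1)
    # STEP3: move each centroid to the mean of its assigned points.
    new_centroids = np.array([data[labels == k].mean(axis=0)
                              for k in range(len(centroids))])
    # STEP4: stop once the centroids no longer move.
    if np.allclose(new_centroids, centroids):
        break
    centroids = new_centroids

print("Cluster centers:\n", centroids)
print("Labels for each data point:", labels)
```

The numbering of the clusters depends on the starting centroids, so the labels may be permuted relative to another run or library while describing the same grouping.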
RESULT
Thus the python program for supervised and unsupervised learning
algorithm on any dataset was successfully executed and verified.
EX NO:8
VARIOUS PLOTTING FUNCTIONS ON ANY DATA SET
DATE:
AIM
To write a python program for various plotting functions on any dataset
a.SIMPLE LINE PLOT
ALGORITHM
STEP1: Import the matplotlib module.
STEP2: plt.plot(x, y, color="blue", label="y = sin(x)") is the key line for
creating the plot.
STEP3: plt.title("Simple Line Plot of y = sin(x)") adds a title to the plot.
plt.xlabel("x") and plt.ylabel("y") label the X and Y axes, respectively.
STEP4: plt.grid(True) adds a grid to the plot for better visual reference.
STEP5: plt.show() displays the generated line plot.
PROGRAM
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100) # 100 points between 0 and 10
y = np.sin(x) # sine function
plt.figure(figsize=(8, 6))
plt.plot(x, y, color="blue", label="y = sin(x)")
plt.title("Simple Line Plot of y = sin(x)")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.grid(True)
plt.show()
OUTPUT
b.BAR PLOT
ALGORITHM
STEP1: The provided data is assumed to be in separate lists (categories and
values). Seaborn can handle various data structures like DataFrames or
dictionaries.
STEP2: For each category, a rectangular bar is created whose height is the
category's value.
STEP3: The X-axis labels are set to the category labels.The Y-axis label is set to
the label for the values (usually the units of measurement).
STEP4: A title is added to the plot (optional).The plot is displayed using
plt.show().
PROGRAM
import matplotlib.pyplot as plt
import seaborn as sns
categories = ['Category A', 'Category B', 'Category C', 'Category D']
values = [23, 45, 56, 78]
plt.figure(figsize=(8, 6))
sns.barplot(x=categories, y=values, palette="viridis")
plt.title("Bar Plot of Categories")
plt.xlabel("Categories")
plt.ylabel("Values")
plt.show()
OUTPUT
c.PIE CHART
ALGORITHM
STEP1: Define the data. labels: names of the categories for each slice. sizes:
numerical values representing the size of each slice (usually proportional to a
quantity). colors: the fill color of each slice.
STEP2: plt.pie(sizes, labels=labels, colors=colors, autopct='%1.1f%%',
startangle=140) is the key line for generating the pie chart.
STEP3: plt.figure(figsize=(8, 6)) sets the figure size (width and height in
inches).plt.title("Pie Chart of Categories") adds a title to the plot.
STEP4: plt.show() displays the generated pie chart.
PROGRAM
import matplotlib.pyplot as plt
labels = ['Category A', 'Category B', 'Category C', 'Category D']
sizes = [15, 30, 45, 10]
colors = ['gold', 'lightblue', 'lightgreen', 'salmon']
plt.figure(figsize=(8, 6))
plt.pie(sizes, labels=labels, colors=colors, autopct='%1.1f%%',
startangle=140)
plt.title("Pie Chart of Categories")
plt.show()
OUTPUT
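The percentage printed on each slice comes from the autopct format string '%1.1f%%', applied to the slice's share of the total. The sketch below reproduces those labels without drawing anything; percent_labels is an illustrative name.

```python
labels = ['Category A', 'Category B', 'Category C', 'Category D']
sizes = [15, 30, 45, 10]

total = sum(sizes)

# Same format string as autopct: one decimal place followed by a literal % sign.
percent_labels = ['%1.1f%%' % (100 * size / total) for size in sizes]

for label, text in zip(labels, percent_labels):
    print(label, text)
```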
RESULT
Thus the program for various plotting functions on any dataset was
successfully executed and verified.