
EX NO:2

WORKING WITH NUMPY ARRAYS


DATE:

AIM:
To work with numpy arrays
a.REAL AND IMAGINARY PARTS OF AN ARRAY OF COMPLEX
NUMBERS
ALGORITHM:
STEP1: Input: An array of complex numbers.
STEP2: For each complex number z in the array, calculate the magnitude r of z:
o r = sqrt(real(z)^2 + imag(z)^2)
STEP3: Calculate the angle theta of z: theta = arctan2(imag(z), real(z))
STEP4: Calculate the square root of the magnitude: sqrt_r = sqrt(r)
STEP5: Calculate the half-angle: half_theta = theta / 2
STEP6: Calculate the real and imaginary parts of the square root:
real_part = sqrt_r * cos(half_theta), imaginary_part = sqrt_r * sin(half_theta)
STEP7: Output: An array of real parts and an array of imaginary parts.
PROGRAM:
import numpy as np
x = np.sqrt([1+0j])
y = np.sqrt([0+1j])
print("Original array: x", x)
print("Original array: y", y)
print("Real part of the array:")
print(x.real)
print(y.real)
print("Imaginary part of the array:")
print(x.imag)
print(y.imag)
OUTPUT:
Original array: x [1.+0.j]
Original array: y [0.70710678+0.70710678j]
Real part of the array:
[1.]
[0.70710678]
Imaginary part of the array:
[0.]
[0.70710678]
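The program relies on np.sqrt; the polar-form steps from the algorithm can be reproduced directly to check the result. A minimal sketch (variable names are illustrative, not part of the recorded program):
import numpy as np
z = np.array([1+0j, 0+1j])
r = np.abs(z)                        # magnitude: sqrt(real^2 + imag^2)
theta = np.arctan2(z.imag, z.real)   # angle of each complex number
real_part = np.sqrt(r) * np.cos(theta / 2)
imag_part = np.sqrt(r) * np.sin(theta / 2)
print(real_part, imag_part)          # matches np.sqrt(z).real and np.sqrt(z).imag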

b.CHANGE AN ARRAY DATATYPE


ALGORITHM:
STEP1: Import the NumPy library for numerical operations on arrays.
STEP2: Use np.array() to create a 2D array with the given data. Specify the
desired data type (np.int32) to ensure integer values.
STEP3: Use print() to display the array. Use print(x.dtype) to print the data type
of the array.
STEP4: Use the astype() method to convert the data type of the array to float.
Assign the result to a new variable y.
STEP5: Use print(y.dtype) to print the new data type of the array. Use print(y)
to display the array with the new data type.
PROGRAM
import numpy as np
x = np.array([[2,4,6], [6,7,8]], np.int32)
print(x)
print("Data type of the array is:", x.dtype)
y = x.astype(float)
print("New Type:",y.dtype)
print(y)
OUTPUT:
[[2 4 6]
[6 7 8]]
Data type of the array is: int32
New Type: float64
[[2. 4. 6.]
[6. 7. 8.]]
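Note that astype() returns a new array and leaves the original unchanged; a short illustrative check, assuming x from the program above:
z = x.astype('float64')   # the target type can also be given by name
print(x.dtype, z.dtype)   # int32 float64 - x keeps its original type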
c.ELEMENT WISE REMAINDER OF AN ARRAY OF DIVISION
ALGORITHM:
STEP1 Import the numpy library using the import numpy as np statement. This
library provides tools for numerical operations and arrays.
STEP2: Create a NumPy array x containing the numbers from 0 to 6 (inclusive)
using np.arange(7). This array will represent the numbers to be divided.
STEP3: Print the x array to display the original numbers that will be divided.
STEP4: Use the np.remainder(x, 5) function to calculate the element-wise
remainder of dividing each element in x by 5. This will return a new array
where each element is the remainder of the corresponding division operation.
STEP5: Print the resulting array to display the element-wise remainders.
PROGRAM:
import numpy as np
x = np.arange(7)
print("Original array:")
print(x)
print("Element-wise remainder of division:")
print(np.remainder(x,5))
OUTPUT:
Original Array : [0 1 2 3 4 5 6]
Element wise remainder of division : [0 1 2 3 4 0 1]
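For non-negative operands, np.remainder matches Python's % operator; a quick illustrative check:
import numpy as np
x = np.arange(7)
print(np.remainder(x, 5))   # [0 1 2 3 4 0 1]
print(x % 5)                # identical result for these values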
d.COMBINING ONE AND 2D NUMPY ARRAY
ALGORITHM:
STEP 1: Import the NumPy library using import numpy as np.
STEP 2: Create a one-dimensional array num_1d containing numbers from 0
to 4 using np.arange(5). Create a two-dimensional array num_2d containing
numbers from 0 to 9, reshaped into a 2x5 matrix using np.arange(10).reshape(2,
5).
STEP 3: Print num_1d and num_2d to display their contents
STEP 4: Use a for loop with np.nditer([num_1d, num_2d]) to iterate over the
elements of both arrays simultaneously.
o a will represent the element from num_1d.
o b will represent the corresponding element from num_2d.
STEP 5 Inside the loop, print the values of a and b using the print("%d : %d"
% (a, b)) statement. This will display the elements of both arrays in pairs.
PROGRAM:
import numpy as np
num_1d = np.arange(5)
print("One dimensional array:")
print(num_1d)
num_2d = np.arange(10).reshape(2,5)
print("\nTwo dimensional array:")
print(num_2d)
for a, b in np.nditer([num_1d, num_2d]):
    print("%d :%d" % (a, b))

OUTPUT:
One dimensional array:[0 1 2 3 4]
Two dimensional array:[[0 1 2 3 4]
[5 6 7 8 9]]

0:0
1:1
2:2
3:3
4:4
0:5
1:6
2:7
3:8
4:9
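np.nditer pairs the two arrays by broadcasting the one-dimensional array across each row of the two-dimensional one. A sketch of the same pairing done explicitly with np.broadcast_arrays (illustrative, not part of the recorded program):
import numpy as np
a = np.arange(5)
b = np.arange(10).reshape(2, 5)
# Broadcasting repeats 'a' (shape (5,)) for each of the two rows of 'b' (shape (2, 5)).
a_b, b_b = np.broadcast_arrays(a, b)
for i, j in zip(a_b.ravel(), b_b.ravel()):
    print("%d : %d" % (i, j))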
e.ELEMENT WISE OPERATION
ALGORITHM:
STEP1: Import the numpy library using the import numpy as np statement. This
library provides tools for numerical operations and arrays.
STEP2: Create two NumPy arrays x and y containing the desired elements.
These arrays represent the numbers to be compared.
STEP3: Print the x and y arrays to display the original numbers that will be
compared.
STEP4: Use the following NumPy functions to perform element-wise
comparisons between x and y
STEP5: Print the results of each comparison, which will be boolean arrays
indicating the element-wise relationships between x and y.
PROGRAM
import numpy as np
x = np.array([3,5])
y = np.array([2,5])
print("Original numbers:")
print(x)
print(y)
print("Comparison-greater")
print(np.greater(x,y))
print("Comparison-greater_equal")
print(np.greater_equal(x,y))
print("Comparison-less")
print(np.less(x,y))
print("Comparison-less_equal")
print(np.less_equal(x,y))
OUTPUT:
Original numbers: [3 5]
[2 5]
Comparison-greater:
[True False]
Comparison-greater_equal:
[True True]
Comparison-less:
[False False]
Comparison-less_equal:
[False True]

RESULT:
Thus, the programs for working with NumPy arrays were successfully executed
and verified.
EX NO:3
WORKING WITH PANDAS DATAFRAME
DATE:

AIM:
To work with pandas dataframe
a.MERGING TWO DATAFRAMES
ALGORITHM:
STEP1: Import the pandas library using the import pandas as pd statement. This
library provides tools for data manipulation and analysis
STEP2: Create dictionaries data1 and data2 containing the following key-value
pairs:
• 'ID': A list of IDs.
• 'Name', 'Age': Lists of names and ages (in data1).
• 'City', 'Salary': Lists of cities and salaries (in data2)

STEP3: Create DataFrames df1 and df2 from the respective dictionaries using
the pd.DataFrame() function. This converts the dictionaries into tabular data
structures.
STEP4: Use the pd.merge() function to merge df1 and df2 based on the 'ID'
column. Specify the how='inner' parameter to perform an inner join, which
keeps only the rows that have matching values in both DataFrames.
STEP5: Print the merged_df DataFrame to display the results. This will output a
table containing the combined data from df1 and df2, with the rows joined
based on the matching 'ID' values.
PROGRAM:
import pandas as pd
data1 = {'ID': [1, 2, 3],'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df1 = pd.DataFrame(data1)
data2 = {'ID': [1, 2, 4], 'City': ['New York', 'Los Angeles', 'Chicago'],
'Salary': [70000, 80000, 60000]}
df2 = pd.DataFrame(data2)
merged_df = pd.merge(df1, df2, on='ID', how='inner')
print(merged_df)

OUTPUT:
ID Name Age City Salary
0 1 Alice 25 New York 70000
1 2 Bob 30 Los Angeles 80000
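The how parameter decides which ID values survive the merge; a brief sketch using the same df1 and df2 (illustrative, not part of the recorded exercise):
outer_df = pd.merge(df1, df2, on='ID', how='outer')   # keeps IDs 1-4, missing cells become NaN
left_df = pd.merge(df1, df2, on='ID', how='left')     # keeps every row of df1
print(outer_df)
print(left_df)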

b.PIVOT TABLE CREATION


ALGORITHM:
STEP1: Import the pandas library using the import pandas as pd statement. This
library provides tools for data manipulation and analysis.
STEP2: Create a dictionary named data containing the following key-value
pairs:
• 'Name': A list of names.
• 'Department': A list of departments.
• 'Salary': A list of salaries.
STEP3: Create a pandas DataFrame named df from the data dictionary using the
pd.DataFrame(data) function. This converts the dictionary into a tabular data
structure.
STEP4: Use the df.pivot_table() method to create a pivot table
STEP5: Fill any missing values in the pivot table with zeros using the
pivot_table.fillna(0, inplace=True) method. This ensures that all cells have a
value.
STEP6: Print the pivot_table DataFrame to display the results. This will output a
table where the rows represent names, the columns represent departments, and
the values represent the average salaries for each name-department
combination.
PROGRAM:
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'Alice', 'Bob'],
'Department': ['HR', 'IT', 'Finance', 'HR', 'IT'],
'Salary': [50000, 60000, 55000, 52000, 61000]}
df = pd.DataFrame(data)
pivot_table = df.pivot_table(values='Salary', index='Name',
columns='Department', aggfunc='mean')
pivot_table.fillna(0, inplace=True)
print(pivot_table)

OUTPUT:
Department Finance HR IT
Name
Alice 0.0 51000.0 0.0
Bob 0.0 0.0 60500.0
Charlie 55000.0 0.0 0.0
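pivot_table can also append overall totals; a small variation on the program above (shown for illustration only):
pivot_with_totals = df.pivot_table(values='Salary', index='Name',
                                   columns='Department', aggfunc='mean',
                                   margins=True)   # adds an 'All' row and column
print(pivot_with_totals)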

c.HANDLING DUPLICATE ROWS


ALGORITHM:
STEP1: Import the pandas library using the import pandas as pd statement. This
library provides tools for data manipulation and analysis
STEP2: Create a dictionary named data
STEP3: Create a pandas DataFrame named df from the data dictionary using the
pd.DataFrame(data) function. This converts the dictionary into a tabular data
structure.
STEP4: Use the df.duplicated() method to identify duplicate rows in the
DataFrame. This returns a boolean Series where True indicates a duplicate row
and False indicates a unique row.
STEP5: Print the duplicate rows by indexing the DataFrame with the duplicates
Series. This will display the rows that are identical to previous rows.
STEP6: Use the df.drop_duplicates() method to remove duplicate rows from the
DataFrame. This returns a new DataFrame with only unique rows.
STEP7: Print the df_no_duplicates DataFrame to display the results. This will
output a table containing the data without any duplicate rows.
PROGRAM:
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Alice', 'Charlie', 'Bob'],
'Age': [25, 30, 25, 35, 30],
'City': ['New York', 'Los Angeles', 'New York', 'Chicago', 'Los Angeles']}
df = pd.DataFrame(data)
duplicates = df.duplicated()
print("Duplicate rows:\n", df[duplicates])
df_no_duplicates = df.drop_duplicates()
print("\nDataFrame without duplicates:\n", df_no_duplicates)

OUTPUT:
Duplicate rows:
Name Age City
2 Alice 25 New York
4 Bob 30 Los Angeles
DataFrame without duplicates:
Name Age City
0 Alice 25 New York
1 Bob 30 Los Angeles
3 Charlie 35 Chicago
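drop_duplicates can also restrict the comparison to selected columns and choose which occurrence to keep; a short sketch on the same df (illustrative):
# Treat rows with the same 'Name' as duplicates and keep the last occurrence.
df_by_name = df.drop_duplicates(subset=['Name'], keep='last')
print(df_by_name)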
d.ACCESS ROW BY INDEX
ALGORITHM:
STEP1: Import the pandas library for data manipulation and analysis.
STEP2: Create a Python dictionary data with keys representing column names
('Name', 'Age', 'City') and values as lists containing corresponding data.
STEP3: Use pd.DataFrame(data) to create a pandas DataFrame from the
dictionary.This DataFrame will have columns 'Name', 'Age', and 'City'.
STEP4: Use df.loc[1] to access the second row of the DataFrame. Remember
that indexing in Python starts from 0, so the second row has index 1.
STEP5: Use print(second_row) to display the selected row.
PROGRAM
import pandas as pd
data = {'Name': ['Gen', 'Mary', 'Alice'], 'Age': [22, 28, 35],
'City': ['New York', 'London', 'Coimbatore']}
df = pd.DataFrame(data)
second_row = df.loc[1]
print(second_row)
OUTPUT:
Name Mary
Age 28
City London
Name: 1, dtype: object
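loc selects rows by index label, while iloc selects by integer position; with the default RangeIndex used here the two coincide. A short illustrative check:
print(df.iloc[1])           # same row as df.loc[1] for this DataFrame
print(df.loc[1, 'City'])    # a single cell: row label 1, column 'City'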
e. CALCULATE AVERAGE AND FILTER ROWS
ALGORITHM:
STEP1: Import the pandas library using the import pandas as pd statement. This
library provides tools for data manipulation and analysis.
STEP2: Create a dictionary named data containing the following key-value
pairs:
o 'Name': A list of employee names.
o 'Age': A list of employee ages.
o 'Salary': A list of employee salaries.
STEP3: Create a pandas DataFrame named df from the data dictionary using
the pd.DataFrame(data) function. This converts the dictionary into a tabular data
structure.
STEP4: Calculate the average salary using the df['Salary'].mean() expression.
This computes the mean value of the salaries in the 'Salary' column.
STEP5:Filter the DataFrame to only include rows where the 'Salary' is greater
than the avg_salary. This creates a new DataFrame df_filtered containing only
the employees whose salaries exceed the average.
STEP6: Print the df_filtered DataFrame to display the results. This will output a
table containing the names, ages, and salaries of the employees who earn more
than the average salary.

PROGRAM
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
'Age': [25, 30, 35, 40, 22],
'Salary': [50000, 60000, 55000, 65000, 48000]}
df = pd.DataFrame(data)
avg_salary = df['Salary'].mean()
df_filtered = df[df['Salary'] > avg_salary]
print(df_filtered)

OUTPUT:
Name Age Salary
1 Bob 30 60000
3 David 40 65000
RESULT:
Thus the programs for working with pandas DataFrames were successfully executed
and verified.
EX NO:4
BASIC PLOT USING MATPLOTLIB
DATE:

AIM:
To write a python program for basic plots using matplotlib
a.PLOTTING SINE AND COSINE FUNCTIONS WITH LEGENDS IN
MATPLOTLIB
ALGORITHM
STEP1: Import numpy for numerical operations and matplotlib.pyplot for
plotting
STEP2: Use np.linspace() to generate 1000 evenly spaced points between 0 and
10, and assign them to the x variable.
STEP3: Create a figure and an axis object using plt.subplots().
STEP4: Use ax.plot() to plot the sine and cosine functions:Plot the sine
function with a dashed blue line ('--b') and label it 'Sine'.Plot the cosine function
with a red line (c='r') and label it 'Cosine'.
STEP5: Use ax.axis('equal') to set the aspect ratio of the axes to be equal,
ensuring that the scales of the x and y axes are the same.
STEP6: Use ax.legend(loc="lower left") to add a legend to the plot. The
loc="lower left" argument specifies the location of the legend.
STEP7: Display the plot. In a Jupyter notebook the figure is rendered
automatically; in a plain script, call plt.show() explicitly.
PROGRAM
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 1000)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), '--b', label='Sine')
ax.plot(x, np.cos(x), c='r', label='Cosine')
ax.axis('equal')
leg = ax.legend(loc="lower left")
OUTPUT
(Figure: dashed blue sine curve and red cosine curve over 0 to 10, with equal axis scaling and a legend in the lower-left corner.)

b.PLACE THE LEGEND OUTSIDE THE PLOT IN MATPLOTLIB


ALGORITHM:
STEP1: Import numpy (not used in this specific example but commonly used
for numerical computations). Import matplotlib.pyplot for plotting
functionalities.
STEP2: Create a list x containing values for the x-axis (0 to 8).Create two lists,
y1 and y2, containing values for the y-axis of the two functions you want to
plot.
STEP3:Use plt.plot(y1, label="y = x") to plot the first function.
• y1 specifies the data points for the y-axis.
• label="y = x" sets the label for the legend, indicating the function plotted.
Use plt.plot(y2, label="y = 3x") to plot the second function following the same
format as the first.
STEP4: Use plt.legend(bbox_to_anchor=(0.75, 1.15), ncol=2) to add a legend to
the plot with some customizations.
• bbox_to_anchor=(0.75, 1.15): This positions the legend outside the plot
area at coordinates (0.75, 1.15) relative to the figure. You can adjust these
values to change the legend's position.
• ncol=2: This specifies that the legend should display labels in two
columns.
STEP5: Use plt.show() to display the generated plot with the two functions and
the legend.
PROGRAM
import numpy as np
import matplotlib.pyplot as plt
x = [0, 1, 2, 3, 4, 5, 6, 7, 8]
y1 = [0, 1, 2, 3, 4, 5, 6, 7, 8]
y2 = [0, 3, 6, 9, 12, 15, 18, 21, 24]
plt.plot(y1, label="y = x")
plt.plot(y2, label="y = 3x")
plt.legend(bbox_to_anchor=(0.75, 1.15), ncol=2)
plt.show()
OUTPUT
(Figure: two straight lines labelled y = x and y = 3x, with the legend placed above the axes in two columns.)

c.REMOVE THE LEGEND IN MATPLOTLIB


ALGORITHM
STEP1: Import numpy for numerical operations (used to create sequences of
data points). Import matplotlib.pyplot for plotting functionalities.
STEP2: Use np.linspace(-3, 3, 100) to generate 100 evenly spaced points
between -3 and 3 for the x-axis and store them in x.Use np.power(x, 2) to
calculate the square of each element in x and store the results in y1. Use
np.power(x, 3) to calculate the cube of each element in x and store the results in
y2.
STEP3: Use plt.subplots(2, 1) to create a figure with two subplots arranged
vertically (2 rows, 1 column). This assigns the figure object to fig and an array
of axes objects to axs. Each element in axs represents an individual subplot.
STEP4: Use axs[0].plot(x, y1, c = 'r',label = 'x^2') to plot the squared function
(y1) on the first subplot (axs[0]). Use axs[1].plot(x, y2, c = 'g',label = 'x^3') to
plot the cubed function (y2) on the second subplot (axs[1]), following the same
format as the first plot.
STEP5: Use axs[0].legend(loc = 'upper left') to add a legend to the first subplot
positioned in the upper left corner (loc = 'upper left'). Use axs[1].legend(loc =
'upper left') to add a legend to the second subplot with the same positioning.
STEP6: Use axs[1].get_legend().remove() to remove the legend that was just
added to the second subplot (axs[1]).
STEP7: Use plt.show() to display the generated plot with the two subplots. The
first subplot will have a legend, while the second subplot will not.
PROGRAM
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-3, 3, 100)
y1 = np.power(x, 2)
y2 = np.power(x, 3)
fig, axs = plt.subplots(2, 1)
axs[0].plot(x, y1, c = 'r',label = 'x^2')
axs[1].plot(x, y2, c = 'g',label = 'x^3')
axs[0].legend(loc = 'upper left')
axs[1].legend(loc = 'upper left')
axs[1].get_legend().remove()
plt.show()
OUTPUT
(Figure: two stacked subplots: x^2 in red with a legend in the upper left, and x^3 in green with its legend removed.)

RESULT:
Thus the programs for basic plotting using Matplotlib were successfully
executed and verified.
EX NO :5
STATISTICAL AND PROBABILITY MEASURES
DATE :

AIM:
To write a python program for statistical and probability measures.
a.FREQUENCY DISTRIBUTIONS
ALGORITHM:
STEP1: Create an empty dictionary to store the frequency of each number.Each
key in the dictionary will represent a unique number, and its corresponding
value will be the count of that number.
STEP2: For each number in the list: Check if the number is already a key in the
dictionary:
o If it is, increment its corresponding value by 1.
o If it's not, add the number as a new key to the dictionary with an
initial value of 1.

STEP3: Iterate over the key-value pairs in the dictionary. For each pair, print the
number and its corresponding frequency in a readable format.
PROGRAM
data = [5, 1, 2, 2, 3, 1, 5, 5, 4, 3, 4, 4, 1, 5]
frequency = {}
for item in data:
    frequency[item] = frequency.get(item, 0) + 1
print("Frequency Distribution:")
for item, count in frequency.items():
    print(f"{item}: {count}")
OUTPUT:
Frequency Distribution:
5: 4
1: 3
2: 2
3: 2
4: 3
b.MEAN,MODE,STANDARD DEVIATION
ALGORITHM
STEP1: Sum all the numbers in the list. Divide the sum by the total number of
elements in the list.
STEP2: Find the number that occurs most frequently in the list. If multiple
numbers occur with the same highest frequency, the mode can be multiple
values.
STEP3: Calculate the mean of the list. For each number in the list:
• Subtract the mean from the number.
• Square the difference. Calculate the average of the squared differences.
Take the square root of the average.
PROGRAM
import statistics
def calculate_statistics(data):
    mean = statistics.mean(data)
    mode = statistics.mode(data)
    std_dev = statistics.stdev(data)
    return mean, mode, std_dev
data = [1, 2, 2, 3, 4, 4, 4, 5, 6, 7]
mean, mode, std_dev = calculate_statistics(data)
print("Mean:", mean)
print("Mode:", mode)
print("Standard Deviation:", std_dev)
OUTPUT
Mean: 3.8
Mode: 4
Standard Deviation: 1.8737959096740262
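The steps above can be verified by hand; note that statistics.stdev divides by n-1 (the sample standard deviation). A minimal illustrative check:
data = [1, 2, 2, 3, 4, 4, 4, 5, 6, 7]
mean = sum(data) / len(data)
sample_var = sum((x - mean) ** 2 for x in data) / (len(data) - 1)
print(mean, sample_var ** 0.5)   # 3.8 and about 1.8738, matching the output above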
c.VARIABILITY
ALGORITHM
STEP1: Import the statistics library to utilize its functions for variance and
standard deviation calculations.
STEP2: Create a function calculate_variability that takes a list of numbers
(data) as input.
STEP3: Find the maximum and minimum values in the data list. Subtract the
minimum value from the maximum value to obtain the range.
STEP4: Use the statistics.variance and statistics.stdev functions to calculate the
variance and standard deviation of the data list, respectively.
STEP5: Return the calculated range, variance, and standard deviation.Print the
calculated values to the console.
PROGRAM
import statistics
def calculate_variability(data):
    data_range = max(data) - min(data)
    variance = statistics.variance(data)
    std_dev = statistics.stdev(data)
    return data_range, variance, std_dev
data = [4, 8, 6, 5, 3, 8, 9, 7, 5]
data_range, variance, std_dev = calculate_variability(data)
print("Range:", data_range)
print("Variance:", variance)
print("Standard Deviation:", std_dev)
OUTPUT
Range: 6
Variance: 4.111111111111111
Standard Deviation: 2.0275875100994063
d.NORMAL CURVES
ALGORITHM
STEP1: Import numpy as np for numerical computations. Import
matplotlib.pyplot as plt for plotting functionalities.
STEP2: Set the desired mean (mean) of the normal distribution.Set the desired
standard deviation (std_dev) of the normal distribution.
STEP3: Create a NumPy array x of evenly spaced values using np.linspace.This
array will represent the X-axis of the plot.
STEP4: Define the mathematical formula for the normal distribution probability
density function (PDF). It involves the mean, the standard deviation, the constant
np.pi, and the exponential function np.exp. Use vectorized operations with NumPy
for efficient calculation. Store the calculated PDF values in a NumPy array y.
STEP5: Use plt.plot(x, y) to create a line plot of the data points (x-values vs.
probability densities). Set the plot title using plt.title("Normal Distribution
Curve"). Label the x-axis and y-axis using plt.xlabel("X-axis") and
plt.ylabel("Probability Density").
STEP6: Use plt.show() to display the generated plot.
PROGRAM
import numpy as np
import matplotlib.pyplot as plt
mean = 0
std_dev = 1
x = np.linspace(-4, 4, 1000)
y = (1 / (std_dev * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((x - mean) / std_dev)
** 2)
plt.plot(x, y)
plt.title("Normal Distribution Curve")
plt.xlabel("X-axis")
plt.ylabel("Probability Density")
plt.show()
OUTPUT
(Figure: bell-shaped normal distribution curve with mean 0 and standard deviation 1, plotted over the range -4 to 4.)

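A quick sanity check of the curve (illustrative): the area under the PDF over [-4, 4] should be very close to 1.
import numpy as np
x = np.linspace(-4, 4, 1000)
y = (1 / np.sqrt(2 * np.pi)) * np.exp(-0.5 * x ** 2)
print(np.sum(y) * (x[1] - x[0]))   # approximately 1, as expected for a PDF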
e.CORRELATION AND SCATTER PLOTS


ALGORITHM
STEP1: Import numpy as np for numerical computations.Import
matplotlib.pyplot as plt for plotting functionalities.
STEP2: Create NumPy arrays x and y to store the data points for the two
variables.Ensure both arrays have the same length to represent corresponding
values.
STEP3: Use np.corrcoef(x, y) to calculate the correlation coefficient
matrix.Access the element at index [0, 1] to get the correlation between x and
y.Store this value in a variable correlation.
STEP4: Use plt.scatter(x, y, color='blue') to create a scatter plot with blue
colored markers.Each data point from x and y will be plotted as a marker on the
scatter plot.
STEP5: Set the plot title using plt.title(f"Scatter Plot (Correlation:
{correlation:.2f})").The title includes the calculated correlation coefficient
formatted to two decimal places. Label the x-axis and y-axis using
plt.xlabel("X-axis") and plt.ylabel("Y-axis").
STEP6: Use plt.show() to display the generated scatter plot.
PROGRAM
import numpy as np
import matplotlib.pyplot as plt
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([2, 4, 5, 4, 5, 7, 8, 8, 10, 12])
correlation = np.corrcoef(x, y)[0, 1]
plt.scatter(x, y, color='blue')
plt.title(f"Scatter Plot (Correlation: {correlation:.2f})")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()
OUTPUT
(Figure: scatter plot of the ten data points in blue; the title shows the computed correlation coefficient.)

f.CORRELATION COEFFICIENT
ALGORITHM
STEP1: Import the numpy library as np to perform numerical operations.
STEP2: Create a function named correlation_coefficient that takes two NumPy
arrays x and y as input.
STEP3: Use np.mean(x) and np.mean(y) to compute the mean values of x and y,
respectively. Store these values in mean_x and mean_y.
STEP4: Subtract mean_x from each element of x and mean_y from each
element of y.Multiply the corresponding differences and sum the results.Store
the sum in numerator.
STEP5: Subtract mean_x from each element of x and square the
differences.Sum the squared differences and store the result in a temporary
variable.Do the same for the y values.Multiply the two sums and take the square
root.Store the result in denominator.
STEP6: Divide the numerator by the denominator and return the result.
STEP7: Create two lists x and y to represent the data.Convert the lists to
NumPy arrays using np.array.Call the correlation_coefficient function with the
NumPy arrays as arguments. Print the calculated correlation coefficient.
PROGRAM
import numpy as np
def correlation_coefficient(x, y):
    mean_x = np.mean(x)
    mean_y = np.mean(y)
    numerator = np.sum((x - mean_x) * (y - mean_y))
    denominator = np.sqrt(np.sum((x - mean_x) ** 2) * np.sum((y - mean_y) ** 2))
    return numerator / denominator
x = [1, 2, 3, 4, 5]
y = [2, 3, 4, 5, 6]
result = correlation_coefficient(np.array(x), np.array(y))
print("Correlation Coefficient:", result)
OUTPUT
Correlation Coefficient: 1.0
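The hand-computed value can be cross-checked against NumPy's built-in routine (illustrative):
print(np.corrcoef(np.array(x), np.array(y))[0, 1])   # also 1.0 for this data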
g.REGRESSION
ALGORITHM
STEP1: Import numpy as np for numerical operations. Import LinearRegression
from sklearn.linear_model for linear regression functionality.
STEP2: Create NumPy arrays x and y to store the independent and dependent
variables, respectively.In this example, x represents a single feature (reshape(-1,
1) ensures it's a 2D column vector).
STEP3: Instantiate a LinearRegression object from scikit-learn to create the
linear regression model.
STEP4: Use the model.fit(x, y) method to fit the model with the training data (x
and y).During this process, the model estimates the optimal slope and intercept
for the best-fit line.
STEP5: Access the slope (coefficient) using model.coef_[0]. This represents the
change in y for a unit change in x.
STEP6: Print the calculated slope and intercept values.
PROGRAM
import numpy as np
from sklearn.linear_model import LinearRegression
x = np.array([1, 2, 3, 4, 5]).reshape(-1, 1) # Reshape to make it 2D
y = np.array([2, 3, 4, 5, 6])
model = LinearRegression()
model.fit(x, y)
slope = model.coef_[0]
intercept = model.intercept_
print("Slope (Coefficient):", slope)
print("Intercept:", intercept)
predicted_y = model.predict(x)
print("Predicted Values:", predicted_y)
OUTPUT
Slope (Coefficient): 1.0
Intercept: 1.0
Predicted Values: [2. 3. 4. 5. 6.]
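With slope 1 and intercept 1, the fitted line is y = x + 1, so a prediction for an unseen x is easy to check (illustrative):
print(model.predict(np.array([[6]])))   # expected roughly [7.]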
RESULT:
Thus the program for statistical and probability measures was
successfully executed and verified.
EX NO:6
UNIVARIATE AND BIVARIATE ANALYSIS
DATE:

AIM:
To write a python program for univariate and bivariate analysis
a.UNIVARIATE ANALYSIS:
FREQUENCY,MEAN,MEDIAN,MODE,VARIANCE,STANDARD
DEVIATION,SKEWNESS AND KURTOSIS
ALGORITHM
STEP1: Initialize an empty dictionary to store frequencies.
STEP2: Sum all data points.Divide the sum by the number of data points.
STEP3: Sort the data in ascending order.If the number of data points is odd, the
median is the middle value.
STEP4: Find the most frequent value in the data.If multiple values have the
same highest frequency, there's no unique mode.
STEP5: Calculate the mean.
STEP6: Calculate the variance.

PROGRAM
import statistics as stats
import collections
data = [12, 15, 12, 18, 16, 14, 15, 14, 16, 18, 19, 21, 15, 17, 19]
frequency = collections.Counter(data)
print("Frequency Distribution:")
for value, count in frequency.items():
    print(f"{value}: {count}")
mean_value = stats.mean(data)
print("\nMean:", mean_value)
median_value = stats.median(data)
print("Median:", median_value)
try:
    mode_value = stats.mode(data)
    print("Mode:", mode_value)
except stats.StatisticsError:
    print("Mode: No unique mode found")
variance_value = stats.variance(data) # Sample variance
print("Variance:", variance_value)
std_deviation_value = stats.stdev(data)
print("Standard Deviation:", std_deviation_value)

OUTPUT
Frequency Distribution:
12: 2
15: 3
18: 2
16: 2
14: 2
19: 2
21: 1
17: 1
Mean: 16.066666666666666
Median: 16
Mode: 15
Variance: 6.780952380952381
Standard Deviation: 2.6040261866871424
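Skewness and kurtosis are named in the section title but not computed in the program above; a minimal sketch using scipy.stats (an additional dependency assumed here):
from scipy.stats import skew, kurtosis
data = [12, 15, 12, 18, 16, 14, 15, 14, 16, 18, 19, 21, 15, 17, 19]
print("Skewness:", skew(data))
print("Kurtosis:", kurtosis(data))   # Fisher definition: a normal distribution gives 0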
b.BIVARIATE ANALYSIS:
LINEAR REGRESSION MODELLING
ALGORITHM
STEP1: Use numpy.random.rand to create random values for the independent
variable (X). Use train_test_split from sklearn.model_selection to split the data
into training and testing sets. This helps evaluate the model's performance on
unseen data.
STEP2: Import LinearRegression from sklearn.linear_model. This class
implements the linear regression algorithm.
STEP3: Use model.fit(X_train, y_train) to train the model. This involves fitting
the equation to the training data (X_train and y_train).
STEP4: Use model.predict(X_test) to predict the dependent variable (y) values
for the unseen test data (X_test).
STEP5: Access model.intercept_ to get the estimated intercept (b0) in the linear
equation.
STEP6: Calculate the R^2 score using r2_score from sklearn.metrics. This
indicates how well the regression line fits the data (closer to 1 is better).
PROGRAM
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
np.random.seed(0)
X = 2 * np.random.rand(100, 1) # Independent variable (predictor)
y = 4 + 3 * X + np.random.randn(100, 1)  # Dependent variable with some noise
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=0)
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("Intercept:", model.intercept_)
print("Coefficient:", model.coef_)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Mean Squared Error:", mse)
print("R^2 Score:", r2)
OUTPUT
Intercept: [4.32235853]
Coefficient: [[2.93647151]]
Mean Squared Error: 1.0434333815695171
R^2 Score: 0.7424452332071367
LOGISTIC REGRESSION MODELLING
ALGORITHM
STEP1: Generate synthetic dataset.
STEP2: Prepare the data for logistic regression.
STEP3: Perform Logistic Regression with statsmodels.
STEP4: Alternatively, fit the model with sklearn's LogisticRegression.
STEP5: Make predictions and evaluate the model.
PROGRAM
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
np.random.seed(0)
data_size = 100
target = np.random.choice([0, 1], size=data_size)
predictor = target + np.random.normal(0, 1, data_size)
df = pd.DataFrame({'Target': target, 'Predictor': predictor})
X = df[['Predictor']]
y = df['Target']
X_sm = sm.add_constant(X)
logit_model = sm.Logit(y, X_sm).fit()
print("Logistic Regression with statsmodels:")
print(logit_model.summary())
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
random_state=0)
sklearn_logit = LogisticRegression()
sklearn_logit.fit(X_train, y_train)
y_pred = sklearn_logit.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)
print("\nLogistic Regression with sklearn:")
print("Accuracy:", accuracy)
print("Confusion Matrix:\n", conf_matrix)
print("Classification Report:\n", class_report)
OUTPUT
Logistic Regression with statsmodels:
(statsmodels optimization log and summary table omitted)
Logistic Regression with sklearn:
Accuracy: 0.7333333333333333
Confusion Matrix:
[[ 8 4]
[ 4 14]]
Classification Report:
precision recall f1-score support
0 0.67 0.67 0.67 12
1 0.78 0.78 0.78 18

accuracy 0.73 30
macro avg 0.72 0.72 0.72 30
weighted avg 0.73 0.73 0.73 30

RESULT
Thus the python programs for univariate and bivariate analysis were
successfully executed and verified.
EX NO:7
SUPERVISED AND UNSUPERVISED LEARNING ALGORITHM ON ANY DATA SET
DATE:

AIM
To write a python program for supervised and unsupervised learning algorithms
on any data set
a.SUPERVISED LEARNING
ALGORITHM
STEP1: Import datasets.load_wine from sklearn.
STEP2: Split the scaled data (X_scaled) and class labels (y_class) or regression
target (y_regression) into training and testing sets using train_test_split(). This
ensures the model is evaluated on unseen data.
STEP3: Import KNeighborsClassifier from sklearn.neighbors.
STEP4: Use knn.fit(X_train_class, y_train_class) to train the KNN model on the
labeled classification training data (X_train_class and y_train_class).
STEP5: Use tree.predict(X_test_reg) to predict the first feature's value for the
unseen test data (X_test_reg). The model predicts a continuous value for each
data point in the test set.
PROGRAM
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import accuracy_score, mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler
wine = datasets.load_wine()
X = wine.data
y_class = wine.target
y_regression = wine.data[:, 0]
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
X_train_class, X_test_class, y_train_class, y_test_class = train_test_split(
    X_scaled, y_class, test_size=0.3, random_state=42)
X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(
    X_scaled, y_regression, test_size=0.3, random_state=42)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train_class, y_train_class)
y_pred_class = knn.predict(X_test_class)
print("KNN Classification Accuracy:", accuracy_score(y_test_class,
y_pred_class))
tree = DecisionTreeRegressor(random_state=42)
tree.fit(X_train_reg, y_train_reg)
y_pred_reg = tree.predict(X_test_reg)
print("Decision Tree MSE:", mean_squared_error(y_test_reg, y_pred_reg))
print("Decision Tree R2 Score:", r2_score(y_test_reg, y_pred_reg))
OUTPUT
KNN Classification Accuracy: 0.9629629629629629
Decision Tree MSE: 0.0016999999999999878
Decision Tree R2 Score: 0.996833244367082
b.UNSUPERVISED LEARNING
ALGORITHM
STEP1: Initialization:Randomly select k data points as initial centroids.
STEP2: Calculate the distance to each centroid.
STEP3: Calculate the mean of all data points assigned to the cluster. Set the
centroid of the cluster to this mean.
STEP4: Repeat steps 2 and 3 until convergence (i.e., no data points change
clusters).
PROGRAM
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
data = np.array([
[1, 2], [1, 4], [1, 0],
[10, 2], [10, 4], [10, 0],
[5, 5], [6, 6], [5, 6]
])
num_clusters = 3
kmeans = KMeans(n_clusters=num_clusters, random_state=0)
kmeans.fit(data)
centers = kmeans.cluster_centers_
labels = kmeans.labels_
print("Cluster centers:\n", centers)
print("Labels for each data point:", labels)
OUTPUT
Cluster centers:
[[10. 2. ]
[ 1. 2. ]
[ 5.33333333 5.66666667]]
Labels for each data point: [1 1 1 0 0 0 2 2 2]
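The fitted model can also assign new points to the learned clusters (illustrative, not part of the recorded exercise):
new_points = np.array([[0, 0], [9, 3]])
print(kmeans.predict(new_points))   # nearest-centroid label for each new point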

RESULT
Thus the python program for supervised and unsupervised learning
algorithm on any dataset was successfully executed and verified.
EX NO:8
VARIOUS PLOTTING FUNCTIONS ON ANY DATA SET
DATE:

AIM
To write a python program for various plotting functions on any dataset
a.SIMPLE LINE PLOT
ALGORITHM
STEP1: Import the matplotlib module.
STEP2: plt.plot(x, y, color="blue", label="y = sin(x)") is the key line for
creating the plot.
STEP3: plt.title("Simple Line Plot of y = sin(x)") adds a title to the
plot.plt.xlabel("x") and plt.ylabel("y") label the X and Y axes, respectively.
STEP4: plt.grid(True) adds a grid to the plot for better visual reference.
STEP5: plt.show() displays the generated line plot.
PROGRAM
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100) # 100 points between 0 and 10
y = np.sin(x) # sine function
plt.figure(figsize=(8, 6))
plt.plot(x, y, color="blue", label="y = sin(x)")
plt.title("Simple Line Plot of y = sin(x)")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.grid(True)
plt.show()
OUTPUT
(Figure: sine curve y = sin(x) over 0 to 10 drawn as a blue line with grid, axis labels, and a legend.)

b.BAR PLOT
ALGORITHM
STEP1: The provided data is assumed to be in separate lists (categories and
values). Seaborn can handle various data structures like DataFrames or
dictionaries.
STEP2: For each category: A rectangular bar is created .
STEP3: The X-axis labels are set to the category labels.The Y-axis label is set to
the label for the values (usually the units of measurement).
STEP4: A title is added to the plot (optional).The plot is displayed using
plt.show().

PROGRAM
import matplotlib.pyplot as plt
import seaborn as sns
categories = ['Category A', 'Category B', 'Category C', 'Category D']
values = [23, 45, 56, 78]
plt.figure(figsize=(8, 6))
sns.barplot(x=categories, y=values, palette="viridis")
plt.title("Bar Plot of Categories")
plt.xlabel("Categories")
plt.ylabel("Values")
plt.show()
OUTPUT
(Figure: bar chart of Categories A-D with heights 23, 45, 56, and 78, drawn with the viridis palette.)

c.PIE CHART
ALGORITHM
STEP1: labels: Names of the categories for each slice.sizes: Numerical values
representing the size of each slice (usually proportional to a quantity).
STEP2: plt.pie(sizes, labels=labels, colors=colors, autopct='%1.1f%%',
startangle=140) is the key line for generating the pie chart.
STEP3: plt.figure(figsize=(8, 6)) sets the figure size (width and height in
inches).plt.title("Pie Chart of Categories") adds a title to the plot.
STEP4: plt.show() displays the generated pie chart.
PROGRAM
import matplotlib.pyplot as plt
labels = ['Category A', 'Category B', 'Category C', 'Category D']
sizes = [15, 30, 45, 10]
colors = ['gold', 'lightblue', 'lightgreen', 'salmon']
plt.figure(figsize=(8, 6))
plt.pie(sizes, labels=labels, colors=colors, autopct='%1.1f%%',
startangle=140)
plt.title("Pie Chart of Categories")
plt.show()
OUTPUT
(Figure: pie chart of Categories A-D with percentage labels on each slice, starting at an angle of 140 degrees.)

RESULT
Thus the program for various plotting functions on any dataset was
successfully executed and verified.
