
EX NO:2

WORKING WITH NUMPY ARRAYS


DATE:

AIM:
To work with numpy arrays
a.REAL AND IMAGINARY PARTS OF AN ARRAY OF COMPLEX
NUMBERS
ALGORITHM:
STEP1: Input: An array of complex numbers.
STEP2: For each complex number z in the array, calculate the magnitude r of z:
o r = sqrt(real(z)^2 + imag(z)^2)
STEP3: Calculate the angle theta of z: theta = arctan2(imag(z), real(z))
STEP4: Calculate the square root of the magnitude: sqrt_r = sqrt(r)
STEP5: Calculate the half-angle: half_theta = theta / 2
STEP6: Calculate the real and imaginary parts of the square root:
real_part = sqrt_r * cos(half_theta), imaginary_part = sqrt_r * sin(half_theta)
STEP7: Output: An array of real parts and an array of imaginary parts.
PROGRAM:
import numpy as np
x = np.sqrt([1+0j])
y = np.sqrt([0+1j])
print("Original array: x", x)
print("Original array: y", y)
print("Real part of the array:")
print(x.real)
print(y.real)
print("Imaginary part of the array:")
print(x.imag)
print(y.imag)
OUTPUT:
Original array: x [1.+0.j]
Original array: y [0.70710678+0.70710678j]
Real part of the array:
[1.]
[0.70710678]
Imaginary part of the array:
[0.]
[0.70710678]
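The program relies on np.sqrt; the polar-form steps from the algorithm can be reproduced directly to check the result. A minimal sketch (variable names are illustrative, not part of the recorded program):
import numpy as np
z = np.array([1+0j, 0+1j])
r = np.abs(z)                        # magnitude: sqrt(real^2 + imag^2)
theta = np.arctan2(z.imag, z.real)   # angle of each complex number
real_part = np.sqrt(r) * np.cos(theta / 2)
imag_part = np.sqrt(r) * np.sin(theta / 2)
print(real_part, imag_part)          # matches np.sqrt(z).real and np.sqrt(z).imag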

b.CHANGE AN ARRAY DATATYPE


ALGORITHM:
STEP1: Import the NumPy library for numerical operations on arrays.
STEP2: Use np.array() to create a 2D array with the given data. Specify the
desired data type (np.int32) to ensure integer values.
STEP3: Use print() to display the array. Use print(x.dtype) to print the data type
of the array.
STEP4: Use the astype() method to convert the data type of the array to float.
Assign the result to a new variable y.
STEP5: Use print(y.dtype) to print the new data type of the array. Use print(y)
to display the array with the new data type.
PROGRAM
import numpy as np
x = np.array([[2,4,6], [6,7,8]], np.int32)
print(x)
print("Data type of the array is:", x.dtype)
y = x.astype(float)
print("New Type:",y.dtype)
print(y)
OUTPUT:
[[2 4 6]
[6 7 8]]
Data type of the array is: int32
New Type: float64
[[2. 4. 6.]
[6. 7. 8.]]
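Note that astype() returns a new array and leaves the original unchanged; a short illustrative check, assuming x from the program above:
z = x.astype('float64')   # the target type can also be given by name
print(x.dtype, z.dtype)   # int32 float64 - x keeps its original type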
c.ELEMENT WISE REMAINDER OF AN ARRAY OF DIVISION
ALGORITHM:
STEP1 Import the numpy library using the import numpy as np statement. This
library provides tools for numerical operations and arrays.
STEP2: Create a NumPy array x containing the numbers from 0 to 6 (inclusive)
using np.arange(7). This array will represent the numbers to be divided.
STEP3: Print the x array to display the original numbers that will be divided.
STEP4: Use the np.remainder(x, 5) function to calculate the element-wise
remainder of dividing each element in x by 5. This will return a new array
where each element is the remainder of the corresponding division operation.
STEP5: Print the resulting array to display the element-wise remainders.
PROGRAM:
import numpy as np
x = np.arange(7)
print("Original array:")
print(x)
print("Element-wise remainder of division:")
print(np.remainder(x,5))
OUTPUT:
Original Array : [0 1 2 3 4 5 6]
Element wise remainder of division : [0 1 2 3 4 0 1]
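For non-negative operands, np.remainder matches Python's % operator; a quick illustrative check:
import numpy as np
x = np.arange(7)
print(np.remainder(x, 5))   # [0 1 2 3 4 0 1]
print(x % 5)                # identical result for these values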
d.COMBINING ONE AND 2D NUMPY ARRAY
ALGORITHM:
STEP 1: Import the NumPy library using import numpy as np.
STEP 2: Create a one-dimensional array num_1d containing numbers from 0
to 4 using np.arange(5). Create a two-dimensional array num_2d containing
numbers from 0 to 9, reshaped into a 2x5 matrix using np.arange(10).reshape(2,
5).
STEP 3: Print num_1d and num_2d to display their contents
STEP 4: Use a for loop with np.nditer([num_1d, num_2d]) to iterate over the
elements of both arrays simultaneously.
o a will represent the element from num_1d.
o b will represent the corresponding element from num_2d.
STEP 5 Inside the loop, print the values of a and b using the print("%d : %d"
% (a, b)) statement. This will display the elements of both arrays in pairs.
PROGRAM:
import numpy as np
num_1d = np.arange(5)
print("One dimensional array:")
print(num_1d)
num_2d = np.arange(10).reshape(2,5)
print("\nTwo dimensional array:")
print(num_2d)
for a, b in np.nditer([num_1d, num_2d]):
    print("%d :%d" % (a, b))

OUTPUT:
One dimensional array:[0 1 2 3 4]
Two dimensional array:[[0 1 2 3 4]
[5 6 7 8 9]]

0:0
1:1
2:2
3:3
4:4
0:5
1:6
2:7
3:8
4:9
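np.nditer pairs the two arrays by broadcasting the one-dimensional array across each row of the two-dimensional one. A sketch of the same pairing done explicitly with np.broadcast_arrays (illustrative, not part of the recorded program):
import numpy as np
a = np.arange(5)
b = np.arange(10).reshape(2, 5)
# Broadcasting repeats 'a' (shape (5,)) for each of the two rows of 'b' (shape (2, 5)).
a_b, b_b = np.broadcast_arrays(a, b)
for i, j in zip(a_b.ravel(), b_b.ravel()):
    print("%d : %d" % (i, j))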
e.ELEMENT WISE OPERATION
ALGORITHM:
STEP1: Import the numpy library using the import numpy as np statement. This
library provides tools for numerical operations and arrays.
STEP2: Create two NumPy arrays x and y containing the desired elements.
These arrays represent the numbers to be compared.
STEP3: Print the x and y arrays to display the original numbers that will be
compared.
STEP4: Use the following NumPy functions to perform element-wise
comparisons between x and y
STEP5: Print the results of each comparison, which will be boolean arrays
indicating the element-wise relationships between x and y.
PROGRAM
import numpy as np
x = np.array([3,5])
y = np.array([2,5])
print("Original numbers:")
print(x)
print(y)
print("Comparison-greater")
print(np.greater(x,y))
print("Comparison-greater_equal")
print(np.greater_equal(x,y))
print("Comparison-less")
print(np.less(x,y))
print("Comparison-less_equal")
print(np.less_equal(x,y))
OUTPUT:
Original numbers: [3 5]
[2 5]
Comparison-greater:
[True False]
Comparison-greater_equal:
[True True]
Comparison-less:
[False False]
Comparison-less_equal:
[False True]

RESULT:
Thus, the programs for working with NumPy arrays were successfully executed
and verified.
EX NO:3
WORKING WITH PANDAS DATAFRAME
DATE:

AIM:
To work with pandas dataframe
a.MERGING TWO DATAFRAMES
ALGORITHM:
STEP1: Import the pandas library using the import pandas as pd statement. This
library provides tools for data manipulation and analysis
STEP2: Create dictionaries data1 and data2 containing the following key-value
pairs:
• 'ID': A list of IDs.
• 'Name', 'Age': Lists of names and ages (in data1).
• 'City', 'Salary': Lists of cities and salaries (in data2)

STEP3: Create DataFrames df1 and df2 from the respective dictionaries using
the pd.DataFrame() function. This converts the dictionaries into tabular data
structures.
STEP4: Use the pd.merge() function to merge df1 and df2 based on the 'ID'
column. Specify the how='inner' parameter to perform an inner join, which
keeps only the rows that have matching values in both DataFrames.
STEP5: Print the merged_df DataFrame to display the results. This will output a
table containing the combined data from df1 and df2, with the rows joined
based on the matching 'ID' values.
PROGRAM:
import pandas as pd
data1 = {'ID': [1, 2, 3],'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df1 = pd.DataFrame(data1)
data2 = {'ID': [1, 2, 4], 'City': ['New York', 'Los Angeles', 'Chicago'],
'Salary': [70000, 80000, 60000]}
df2 = pd.DataFrame(data2)
merged_df = pd.merge(df1, df2, on='ID', how='inner')
print(merged_df)

OUTPUT:
ID Name Age City Salary
0 1 Alice 25 New York 70000
1 2 Bob 30 Los Angeles 80000
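The how parameter decides which ID values survive the merge; a brief sketch using the same df1 and df2 (illustrative, not part of the recorded exercise):
outer_df = pd.merge(df1, df2, on='ID', how='outer')   # keeps IDs 1-4, missing cells become NaN
left_df = pd.merge(df1, df2, on='ID', how='left')     # keeps every row of df1
print(outer_df)
print(left_df)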

b.PIVOT TABLE CREATION


ALGORITHM:
STEP1: Import the pandas library using the import pandas as pd statement. This
library provides tools for data manipulation and analysis.
STEP2: Create a dictionary named data containing the following key-value
pairs:
• 'Name': A list of names.
• 'Department': A list of departments.
• 'Salary': A list of salaries.
STEP3: Create a pandas DataFrame named df from the data dictionary using the
pd.DataFrame(data) function. This converts the dictionary into a tabular data
structure.
STEP4: Use the df.pivot_table() method to create a pivot table
STEP5: Fill any missing values in the pivot table with zeros using the
pivot_table.fillna(0, inplace=True) method. This ensures that all cells have a
value.
STEP6: Print the pivot_table DataFrame to display the results. This will output a
table where the rows represent names, the columns represent departments, and
the values represent the average salaries for each name-department
combination.
PROGRAM:
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'Alice', 'Bob'],
'Department': ['HR', 'IT', 'Finance', 'HR', 'IT'],
'Salary': [50000, 60000, 55000, 52000, 61000]}
df = pd.DataFrame(data)
pivot_table = df.pivot_table(values='Salary', index='Name',
columns='Department', aggfunc='mean')
pivot_table.fillna(0, inplace=True)
print(pivot_table)

OUTPUT:
Department Finance HR IT
Name
Alice 0.0 51000.0 0.0
Bob 0.0 0.0 60500.0
Charlie 55000.0 0.0 0.0
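pivot_table can also append overall totals; a small variation on the program above (shown for illustration only):
pivot_with_totals = df.pivot_table(values='Salary', index='Name',
                                   columns='Department', aggfunc='mean',
                                   margins=True)   # adds an 'All' row and column
print(pivot_with_totals)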

c.HANDLING DUPLICATE ROWS


ALGORITHM:
STEP1: Import the pandas library using the import pandas as pd statement. This
library provides tools for data manipulation and analysis
STEP2: Create a dictionary named data
STEP3: Create a pandas DataFrame named df from the data dictionary using the
pd.DataFrame(data) function. This converts the dictionary into a tabular data
structure.
STEP4: Use the df.duplicated() method to identify duplicate rows in the
DataFrame. This returns a boolean Series where True indicates a duplicate row
and False indicates a unique row.
STEP5: Print the duplicate rows by indexing the DataFrame with the duplicates
Series. This will display the rows that are identical to previous rows.
STEP6: Use the df.drop_duplicates() method to remove duplicate rows from the
DataFrame. This returns a new DataFrame with only unique rows.
STEP7: Print the df_no_duplicates DataFrame to display the results. This will
output a table containing the data without any duplicate rows.
PROGRAM:
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Alice', 'Charlie', 'Bob'],
'Age': [25, 30, 25, 35, 30],
'City': ['New York', 'Los Angeles', 'New York', 'Chicago', 'Los Angeles']}
df = pd.DataFrame(data)
duplicates = df.duplicated()
print("Duplicate rows:\n", df[duplicates])
df_no_duplicates = df.drop_duplicates()
print("\nDataFrame without duplicates:\n", df_no_duplicates)

OUTPUT:
Duplicate rows:
Name Age City
2 Alice 25 New York
4 Bob 30 Los Angeles
DataFrame without duplicates:
Name Age City
0 Alice 25 New York
1 Bob 30 Los Angeles
3 Charlie 35 Chicago
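drop_duplicates can also restrict the comparison to selected columns and choose which occurrence to keep; a short sketch on the same df (illustrative):
# Treat rows with the same 'Name' as duplicates and keep the last occurrence.
df_by_name = df.drop_duplicates(subset=['Name'], keep='last')
print(df_by_name)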
d.ACCESS ROW BY INDEX
ALGORITHM:
STEP1: Import the pandas library for data manipulation and analysis.
STEP2: Create a Python dictionary data with keys representing column names
('Name', 'Age', 'City') and values as lists containing corresponding data.
STEP3: Use pd.DataFrame(data) to create a pandas DataFrame from the
dictionary.This DataFrame will have columns 'Name', 'Age', and 'City'.
STEP4: Use df.loc[1] to access the second row of the DataFrame. Remember
that indexing in Python starts from 0, so the second row has index 1.
STEP5: Use print(second_row) to display the selected row.
PROGRAM
import pandas as pd
data = {'Name': ['Gen', 'Mary', 'Alice'], 'Age': [22, 28, 35],
'City': ['New York', 'London', 'Coimbatore']}
df = pd.DataFrame(data)
second_row = df.loc[1]
print(second_row)
OUTPUT:
Name Mary
Age 28
City London
Name: 1, dtype: object
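loc selects rows by index label, while iloc selects by integer position; with the default RangeIndex used here the two coincide. A short illustrative check:
print(df.iloc[1])           # same row as df.loc[1] for this DataFrame
print(df.loc[1, 'City'])    # a single cell: row label 1, column 'City'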
e. CALCULATE AVERAGE AND FILTER ROWS
ALGORITHM:
STEP1: Import the pandas library using the import pandas as pd statement. This
library provides tools for data manipulation and analysis.
STEP2: Create a dictionary named data containing the following key-value
pairs:
o 'Name': A list of employee names.
o 'Age': A list of employee ages.
o 'Salary': A list of employee salaries.
STEP3: Create a pandas DataFrame named df from the data dictionary using
the pd.DataFrame(data) function. This converts the dictionary into a tabular data
structure.
STEP4: Calculate the average salary using the df['Salary'].mean() expression.
This computes the mean value of the salaries in the 'Salary' column.
STEP5:Filter the DataFrame to only include rows where the 'Salary' is greater
than the avg_salary. This creates a new DataFrame df_filtered containing only
the employees whose salaries exceed the average.
STEP6: Print the df_filtered DataFrame to display the results. This will output a
table containing the names, ages, and salaries of the employees who earn more
than the average salary.

PROGRAM
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
'Age': [25, 30, 35, 40, 22],
'Salary': [50000, 60000, 55000, 65000, 48000]}
df = pd.DataFrame(data)
avg_salary = df['Salary'].mean()
df_filtered = df[df['Salary'] > avg_salary]
print(df_filtered)

OUTPUT:
Name Age Salary
1 Bob 30 60000
3 David 40 65000
RESULT:
Thus the programs for working with pandas DataFrames were successfully executed
and verified.
EX NO:4
BASIC PLOT USING MATPLOTLIB
DATE:

AIM:
To write a python program for basic plots using matplotlib
a.PLOTTING SINE AND COSINE FUNCTIONS WITH LEGENDS IN
MATPLOTLIB
ALGORITHM
STEP1: Import numpy for numerical operations and matplotlib.pyplot for
plotting
STEP2: Use np.linspace() to generate 1000 evenly spaced points between 0 and
10, and assign them to the x variable.
STEP3: Create a figure and an axis object using plt.subplots().
STEP4: Use ax.plot() to plot the sine and cosine functions:Plot the sine
function with a dashed blue line ('--b') and label it 'Sine'.Plot the cosine function
with a red line (c='r') and label it 'Cosine'.
STEP5: Use ax.axis('equal') to set the aspect ratio of the axes to be equal,
ensuring that the scales of the x and y axes are the same.
STEP6: Use ax.legend(loc="lower left") to add a legend to the plot. The
loc="lower left" argument specifies the location of the legend.
STEP7: Display the plot. In a Jupyter notebook the figure is rendered
automatically; in a plain script, call plt.show() explicitly.
PROGRAM
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 1000)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), '--b', label='Sine')
ax.plot(x, np.cos(x), c='r', label='Cosine')
ax.axis('equal')
leg = ax.legend(loc="lower left")
OUTPUT
(Figure: dashed blue sine curve and red cosine curve over 0 to 10, with equal axis scaling and a legend in the lower-left corner.)

b.PLACE THE LEGEND OUTSIDE THE PLOT IN MATPLOTLIB


ALGORITHM:
STEP1: Import numpy (not used in this specific example but commonly used
for numerical computations). Import matplotlib.pyplot for plotting
functionalities.
STEP2: Create a list x containing values for the x-axis (0 to 8).Create two lists,
y1 and y2, containing values for the y-axis of the two functions you want to
plot.
STEP3:Use plt.plot(y1, label="y = x") to plot the first function.
• y1 specifies the data points for the y-axis.
• label="y = x" sets the label for the legend, indicating the function plotted.
Use plt.plot(y2, label="y = 3x") to plot the second function following the same
format as the first.
STEP4: Use plt.legend(bbox_to_anchor=(0.75, 1.15), ncol=2) to add a legend to
the plot with some customizations.
• bbox_to_anchor=(0.75, 1.15): This positions the legend outside the plot
area at coordinates (0.75, 1.15) relative to the figure. You can adjust these
values to change the legend's position.
• ncol=2: This specifies that the legend should display labels in two
columns.
STEP5: Use plt.show() to display the generated plot with the two functions and
the legend.
PROGRAM
import numpy as np
import matplotlib.pyplot as plt
x = [0, 1, 2, 3, 4, 5, 6, 7, 8]
y1 = [0, 1, 2, 3, 4, 5, 6, 7, 8]
y2 = [0, 3, 6, 9, 12, 15, 18, 21, 24]
plt.plot(y1, label="y = x")
plt.plot(y2, label="y = 3x")
plt.legend(bbox_to_anchor=(0.75, 1.15), ncol=2)
plt.show()
OUTPUT
(Figure: two straight lines labelled y = x and y = 3x, with the legend placed above the axes in two columns.)

c.REMOVE THE LEGEND IN MATPLOTLIB


ALGORITHM
STEP1: Import numpy for numerical operations (used to create sequences of
data points). Import matplotlib.pyplot for plotting functionalities.
STEP2: Use np.linspace(-3, 3, 100) to generate 100 evenly spaced points
between -3 and 3 for the x-axis and store them in x.Use np.power(x, 2) to
calculate the square of each element in x and store the results in y1. Use
np.power(x, 3) to calculate the cube of each element in x and store the results in
y2.
STEP3: Use plt.subplots(2, 1) to create a figure with two subplots arranged
vertically (2 rows, 1 column). This assigns the figure object to fig and an array
of axes objects to axs. Each element in axs represents an individual subplot.
STEP4: Use axs[0].plot(x, y1, c = 'r',label = 'x^2') to plot the squared function
(y1) on the first subplot (axs[0]). Use axs[1].plot(x, y2, c = 'g',label = 'x^3') to
plot the cubed function (y2) on the second subplot (axs[1]), following the same
format as the first plot.
STEP5: Use axs[0].legend(loc = 'upper left') to add a legend to the first subplot
positioned in the upper left corner (loc = 'upper left'). Use axs[1].legend(loc =
'upper left') to add a legend to the second subplot with the same positioning.
STEP6: Use axs[1].get_legend().remove() to remove the legend that was just
added to the second subplot (axs[1]).
STEP7: Use plt.show() to display the generated plot with the two subplots. The
first subplot will have a legend, while the second subplot will not.
PROGRAM
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-3, 3, 100)
y1 = np.power(x, 2)
y2 = np.power(x, 3)
fig, axs = plt.subplots(2, 1)
axs[0].plot(x, y1, c = 'r',label = 'x^2')
axs[1].plot(x, y2, c = 'g',label = 'x^3')
axs[0].legend(loc = 'upper left')
axs[1].legend(loc = 'upper left')
axs[1].get_legend().remove()
plt.show()
OUTPUT
(Figure: two stacked subplots: x^2 in red with a legend in the upper left, and x^3 in green with its legend removed.)

RESULT:
Thus the programs for basic plotting using Matplotlib were successfully
executed and verified.
EX NO :5
STATISTICAL AND PROBABILITY MEASURES
DATE :

AIM:
To write a python program for statistical and probability measures.
a.FREQUENCY DISTRIBUTIONS
ALGORITHM:
STEP1: Create an empty dictionary to store the frequency of each number.Each
key in the dictionary will represent a unique number, and its corresponding
value will be the count of that number.
STEP2: For each number in the list: Check if the number is already a key in the
dictionary:
o If it is, increment its corresponding value by 1.
o If it's not, add the number as a new key to the dictionary with an
initial value of 1.

STEP3: Iterate over the key-value pairs in the dictionary. For each pair, print the
number and its corresponding frequency in a readable format.
PROGRAM
data = [5, 1, 2, 2, 3, 1, 5, 5, 4, 3, 4, 4, 1, 5]
frequency = {}
for item in data:
    frequency[item] = frequency.get(item, 0) + 1
print("Frequency Distribution:")
for item, count in frequency.items():
    print(f"{item}: {count}")
OUTPUT:
Frequency Distribution:
5: 4
1: 3
2: 2
3: 2
4: 3
b.MEAN,MODE,STANDARD DEVIATION
ALGORITHM
STEP1: Sum all the numbers in the list. Divide the sum by the total number of
elements in the list.
STEP2: Find the number that occurs most frequently in the list. If multiple
numbers occur with the same highest frequency, the mode can be multiple
values.
STEP3: Calculate the mean of the list. For each number in the list:
• Subtract the mean from the number.
• Square the difference. Calculate the average of the squared differences.
Take the square root of the average.
PROGRAM
import statistics
def calculate_statistics(data):
    mean = statistics.mean(data)
    mode = statistics.mode(data)
    std_dev = statistics.stdev(data)
    return mean, mode, std_dev
data = [1, 2, 2, 3, 4, 4, 4, 5, 6, 7]
mean, mode, std_dev = calculate_statistics(data)
print("Mean:", mean)
print("Mode:", mode)
print("Standard Deviation:", std_dev)
OUTPUT
Mean: 3.8
Mode: 4
Standard Deviation: 1.8737959096740262
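The steps above can be verified by hand; note that statistics.stdev divides by n-1 (the sample standard deviation). A minimal illustrative check:
data = [1, 2, 2, 3, 4, 4, 4, 5, 6, 7]
mean = sum(data) / len(data)
sample_var = sum((x - mean) ** 2 for x in data) / (len(data) - 1)
print(mean, sample_var ** 0.5)   # 3.8 and about 1.8738, matching the output above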
c.VARIABILITY
ALGORITHM
STEP1: Import the statistics library to utilize its functions for variance and
standard deviation calculations.
STEP2: Create a function calculate_variability that takes a list of numbers
(data) as input.
STEP3: Find the maximum and minimum values in the data list. Subtract the
minimum value from the maximum value to obtain the range.
STEP4: Use the statistics.variance and statistics.stdev functions to calculate the
variance and standard deviation of the data list, respectively.
STEP5: Return the calculated range, variance, and standard deviation.Print the
calculated values to the console.
PROGRAM
import statistics
def calculate_variability(data):
    data_range = max(data) - min(data)
    variance = statistics.variance(data)
    std_dev = statistics.stdev(data)
    return data_range, variance, std_dev
data = [4, 8, 6, 5, 3, 8, 9, 7, 5]
data_range, variance, std_dev = calculate_variability(data)
print("Range:", data_range)
print("Variance:", variance)
print("Standard Deviation:", std_dev)
OUTPUT
Range: 6
Variance: 4.111111111111111
Standard Deviation: 2.0275875100994063
d.NORMAL CURVES
ALGORITHM
STEP1: Import numpy as np for numerical computations. Import
matplotlib.pyplot as plt for plotting functionalities.
STEP2: Set the desired mean (mean) of the normal distribution.Set the desired
standard deviation (std_dev) of the normal distribution.
STEP3: Create a NumPy array x of evenly spaced values using np.linspace.This
array will represent the X-axis of the plot.
STEP4: Define the mathematical formula for the normal distribution probability
density function (PDF). It involves the mean, the standard deviation, the constant
np.pi, and the exponential function np.exp. Use vectorized operations with NumPy
for efficient calculation. Store the calculated PDF values in a NumPy array y.
STEP5: Use plt.plot(x, y) to create a line plot of the data points (x-values vs.
probability densities). Set the plot title using plt.title("Normal Distribution
Curve"). Label the x-axis and y-axis using plt.xlabel("X-axis") and
plt.ylabel("Probability Density").
STEP6: Use plt.show() to display the generated plot.
PROGRAM
import numpy as np
import matplotlib.pyplot as plt
mean = 0
std_dev = 1
x = np.linspace(-4, 4, 1000)
y = (1 / (std_dev * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((x - mean) / std_dev)
** 2)
plt.plot(x, y)
plt.title("Normal Distribution Curve")
plt.xlabel("X-axis")
plt.ylabel("Probability Density")
plt.show()
OUTPUT
(Figure: bell-shaped normal distribution curve with mean 0 and standard deviation 1, plotted over the range -4 to 4.)

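A quick sanity check of the curve (illustrative): the area under the PDF over [-4, 4] should be very close to 1.
import numpy as np
x = np.linspace(-4, 4, 1000)
y = (1 / np.sqrt(2 * np.pi)) * np.exp(-0.5 * x ** 2)
print(np.sum(y) * (x[1] - x[0]))   # approximately 1, as expected for a PDF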
e.CORRELATION AND SCATTER PLOTS


ALGORITHM
STEP1: Import numpy as np for numerical computations.Import
matplotlib.pyplot as plt for plotting functionalities.
STEP2: Create NumPy arrays x and y to store the data points for the two
variables.Ensure both arrays have the same length to represent corresponding
values.
STEP3: Use np.corrcoef(x, y) to calculate the correlation coefficient
matrix.Access the element at index [0, 1] to get the correlation between x and
y.Store this value in a variable correlation.
STEP4: Use plt.scatter(x, y, color='blue') to create a scatter plot with blue
colored markers.Each data point from x and y will be plotted as a marker on the
scatter plot.
STEP5: Set the plot title using plt.title(f"Scatter Plot (Correlation:
{correlation:.2f})").The title includes the calculated correlation coefficient
formatted to two decimal places. Label the x-axis and y-axis using
plt.xlabel("X-axis") and plt.ylabel("Y-axis").
STEP6: Use plt.show() to display the generated scatter plot.
PROGRAM
import numpy as np
import matplotlib.pyplot as plt
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([2, 4, 5, 4, 5, 7, 8, 8, 10, 12])
correlation = np.corrcoef(x, y)[0, 1]
plt.scatter(x, y, color='blue')
plt.title(f"Scatter Plot (Correlation: {correlation:.2f})")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()
OUTPUT
(Figure: scatter plot of the ten data points in blue; the title shows the computed correlation coefficient.)

f.CORRELATION COEFFICIENT
ALGORITHM
STEP1: Import the numpy library as np to perform numerical operations.
STEP2: Create a function named correlation_coefficient that takes two NumPy
arrays x and y as input.
STEP3: Use np.mean(x) and np.mean(y) to compute the mean values of x and y,
respectively. Store these values in mean_x and mean_y.
STEP4: Subtract mean_x from each element of x and mean_y from each
element of y.Multiply the corresponding differences and sum the results.Store
the sum in numerator.
STEP5: Subtract mean_x from each element of x and square the
differences.Sum the squared differences and store the result in a temporary
variable.Do the same for the y values.Multiply the two sums and take the square
root.Store the result in denominator.
STEP6: Divide the numerator by the denominator and return the result.
STEP7: Create two lists x and y to represent the data.Convert the lists to
NumPy arrays using np.array.Call the correlation_coefficient function with the
NumPy arrays as arguments. Print the calculated correlation coefficient.
PROGRAM
import numpy as np
def correlation_coefficient(x, y):
    mean_x = np.mean(x)
    mean_y = np.mean(y)
    numerator = np.sum((x - mean_x) * (y - mean_y))
    denominator = np.sqrt(np.sum((x - mean_x) ** 2) * np.sum((y - mean_y) ** 2))
    return numerator / denominator
x = [1, 2, 3, 4, 5]
y = [2, 3, 4, 5, 6]
result = correlation_coefficient(np.array(x), np.array(y))
print("Correlation Coefficient:", result)
OUTPUT
Correlation Coefficient: 1.0
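The hand-computed value can be cross-checked against NumPy's built-in routine (illustrative):
print(np.corrcoef(np.array(x), np.array(y))[0, 1])   # also 1.0 for this data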
g.REGRESSION
ALGORITHM
STEP1: Import numpy as np for numerical operations. Import LinearRegression
from sklearn.linear_model for linear regression functionality.
STEP2: Create NumPy arrays x and y to store the independent and dependent
variables, respectively.In this example, x represents a single feature (reshape(-1,
1) ensures it's a 2D column vector).
STEP3: Instantiate a LinearRegression object from scikit-learn to create the
linear regression model.
STEP4: Use the model.fit(x, y) method to fit the model with the training data (x
and y).During this process, the model estimates the optimal slope and intercept
for the best-fit line.
STEP5: Access the slope (coefficient) using model.coef_[0]. This represents the
change in y for a unit change in x.
STEP6: Print the calculated slope and intercept values.
PROGRAM
import numpy as np
from sklearn.linear_model import LinearRegression
x = np.array([1, 2, 3, 4, 5]).reshape(-1, 1) # Reshape to make it 2D
y = np.array([2, 3, 4, 5, 6])
model = LinearRegression()
model.fit(x, y)
slope = model.coef_[0]
intercept = model.intercept_
print("Slope (Coefficient):", slope)
print("Intercept:", intercept)
predicted_y = model.predict(x)
print("Predicted Values:", predicted_y)
OUTPUT
Slope (Coefficient): 1.0
Intercept: 1.0
Predicted Values: [2. 3. 4. 5. 6.]
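With slope 1 and intercept 1, the fitted line is y = x + 1, so a prediction for an unseen x is easy to check (illustrative):
print(model.predict(np.array([[6]])))   # expected roughly [7.]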
RESULT:
Thus the program for statistical and probability measures was
successfully executed and verified.
EX NO:6
UNIVARIATE AND BIVARIATE ANALYSIS
DATE:

AIM:
To write a python program for univariate and bivariate analysis
a.UNIVARIATE ANALYSIS:
FREQUENCY,MEAN,MEDIAN,MODE,VARIANCE,STANDARD
DEVIATION,SKEWNESS AND KURTOSIS
ALGORITHM
STEP1: Initialize an empty dictionary to store frequencies.
STEP2: Sum all data points.Divide the sum by the number of data points.
STEP3: Sort the data in ascending order.If the number of data points is odd, the
median is the middle value.
STEP4: Find the most frequent value in the data.If multiple values have the
same highest frequency, there's no unique mode.
STEP5: Calculate the mean.
STEP6: Calculate the variance.

PROGRAM
import statistics as stats
import collections
data = [12, 15, 12, 18, 16, 14, 15, 14, 16, 18, 19, 21, 15, 17, 19]
frequency = collections.Counter(data)
print("Frequency Distribution:")
for value, count in frequency.items():
    print(f"{value}: {count}")
mean_value = stats.mean(data)
print("\nMean:", mean_value)
median_value = stats.median(data)
print("Median:", median_value)
try:
    mode_value = stats.mode(data)
    print("Mode:", mode_value)
except stats.StatisticsError:
    print("Mode: No unique mode found")
variance_value = stats.variance(data) # Sample variance
print("Variance:", variance_value)
std_deviation_value = stats.stdev(data)
print("Standard Deviation:", std_deviation_value)

OUTPUT
Frequency Distribution:
12: 2
15: 3
18: 2
16: 2
14: 2
19: 2
21: 1
17: 1
Mean: 16.066666666666666
Median: 16
Mode: 15
Variance: 6.780952380952381
Standard Deviation: 2.6040261866871424
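Skewness and kurtosis are named in the section title but not computed in the program above; a minimal sketch using scipy.stats (an additional dependency assumed here):
from scipy.stats import skew, kurtosis
data = [12, 15, 12, 18, 16, 14, 15, 14, 16, 18, 19, 21, 15, 17, 19]
print("Skewness:", skew(data))
print("Kurtosis:", kurtosis(data))   # Fisher definition: a normal distribution gives 0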
b.BIVARIATE ANALYSIS:
LINEAR REGRESSION MODELLING
ALGORITHM
STEP1: Use numpy.random.rand to create random values for the independent
variable (X). Use train_test_split from sklearn.model_selection to split the data
into training and testing sets. This helps evaluate the model's performance on
unseen data.
STEP2: Import LinearRegression from sklearn.linear_model. This class
implements the linear regression algorithm.
STEP3: Use model.fit(X_train, y_train) to train the model. This involves fitting
the equation to the training data (X_train and y_train).
STEP4: Use model.predict(X_test) to predict the dependent variable (y) values
for the unseen test data (X_test).
STEP5: Access model.intercept_ to get the estimated intercept (b0) in the linear
equation.
STEP6: Calculate the R^2 score using r2_score from sklearn.metrics. This
indicates how well the regression line fits the data (closer to 1 is better).
PROGRAM
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
np.random.seed(0)
X = 2 * np.random.rand(100, 1) # Independent variable (predictor)
y = 4 + 3 * X + np.random.randn(100, 1)  # Dependent variable with some noise
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=0)
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("Intercept:", model.intercept_)
print("Coefficient:", model.coef_)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Mean Squared Error:", mse)
print("R^2 Score:", r2)
OUTPUT
Intercept: [4.32235853]
Coefficient: [[2.93647151]]
Mean Squared Error: 1.0434333815695171
R^2 Score: 0.7424452332071367
LOGISTIC REGRESSION MODELLING
ALGORITHM
STEP1: Generate synthetic dataset.
STEP2: Prepare the data for logistic regression.
STEP3: Perform Logistic Regression with statsmodels.
STEP4: Alternatively, fit the model with sklearn's LogisticRegression.
STEP5: Make predictions and evaluate the model.
PROGRAM
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
np.random.seed(0)
data_size = 100
target = np.random.choice([0, 1], size=data_size)
predictor = target + np.random.normal(0, 1, data_size)
df = pd.DataFrame({'Target': target, 'Predictor': predictor})
X = df[['Predictor']]
y = df['Target']
X_sm = sm.add_constant(X)
logit_model = sm.Logit(y, X_sm).fit()
print("Logistic Regression with statsmodels:")
print(logit_model.summary())
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
random_state=0)
sklearn_logit = LogisticRegression()
sklearn_logit.fit(X_train, y_train)
y_pred = sklearn_logit.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)
print("\nLogistic Regression with sklearn:")
print("Accuracy:", accuracy)
print("Confusion Matrix:\n", conf_matrix)
print("Classification Report:\n", class_report)
OUTPUT
Logistic Regression with statsmodels:
(statsmodels optimization log and summary table omitted)
Logistic Regression with sklearn:
Accuracy: 0.7333333333333333
Confusion Matrix:
[[ 8 4]
[ 4 14]]
Classification Report:
precision recall f1-score support
0 0.67 0.67 0.67 12
1 0.78 0.78 0.78 18

accuracy 0.73 30
macro avg 0.72 0.72 0.72 30
weighted avg 0.73 0.73 0.73 30

RESULT
Thus the python programs for univariate and bivariate analysis were
successfully executed and verified.
EX NO:7
SUPERVISED AND UNSUPERVISED LEARNING ALGORITHM ON ANY DATA SET
DATE:

AIM
To write a python program for supervised and unsupervised learning algorithms
on any data set
a.SUPERVISED LEARNING
ALGORITHM
STEP1: Import datasets.load_wine from sklearn.
STEP2: Split the scaled data (X_scaled) and class labels (y_class) or regression
target (y_regression) into training and testing sets using train_test_split(). This
ensures the model is evaluated on unseen data.
STEP3: Import KNeighborsClassifier from sklearn.neighbors.
STEP4: Use knn.fit(X_train_class, y_train_class) to train the KNN model on the
labeled classification training data (X_train_class and y_train_class).
STEP5: Use tree.predict(X_test_reg) to predict the first feature's value for the
unseen test data (X_test_reg). The model predicts a continuous value for each
data point in the test set.
PROGRAM
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import accuracy_score, mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler
wine = datasets.load_wine()
X = wine.data
y_class = wine.target
y_regression = wine.data[:, 0]
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
X_train_class, X_test_class, y_train_class, y_test_class = train_test_split(
    X_scaled, y_class, test_size=0.3, random_state=42)
X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(
    X_scaled, y_regression, test_size=0.3, random_state=42)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train_class, y_train_class)
y_pred_class = knn.predict(X_test_class)
print("KNN Classification Accuracy:", accuracy_score(y_test_class,
y_pred_class))
tree = DecisionTreeRegressor(random_state=42)
tree.fit(X_train_reg, y_train_reg)
y_pred_reg = tree.predict(X_test_reg)
print("Decision Tree MSE:", mean_squared_error(y_test_reg, y_pred_reg))
print("Decision Tree R2 Score:", r2_score(y_test_reg, y_pred_reg))
OUTPUT
KNN Classification Accuracy: 0.9629629629629629
Decision Tree MSE: 0.0016999999999999878
Decision Tree R2 Score: 0.996833244367082
b.UNSUPERVISED LEARNING
ALGORITHM
STEP1: Initialization:Randomly select k data points as initial centroids.
STEP2: Calculate the distance to each centroid.
STEP3: Calculate the mean of all data points assigned to the cluster. Set the
centroid of the cluster to this mean.
STEP4: Repeat steps 2 and 3 until convergence (i.e., no data points change
clusters).
PROGRAM
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
data = np.array([
[1, 2], [1, 4], [1, 0],
[10, 2], [10, 4], [10, 0],
[5, 5], [6, 6], [5, 6]
])
num_clusters = 3
kmeans = KMeans(n_clusters=num_clusters, random_state=0)
kmeans.fit(data)
centers = kmeans.cluster_centers_
labels = kmeans.labels_
print("Cluster centers:\n", centers)
print("Labels for each data point:", labels)
OUTPUT
Cluster centers:
[[10. 2. ]
[ 1. 2. ]
[ 5.33333333 5.66666667]]
Labels for each data point: [1 1 1 0 0 0 2 2 2]
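The fitted model can also assign new points to the learned clusters (illustrative, not part of the recorded exercise):
new_points = np.array([[0, 0], [9, 3]])
print(kmeans.predict(new_points))   # nearest-centroid label for each new point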

RESULT
Thus the python program for supervised and unsupervised learning
algorithm on any dataset was successfully executed and verified.
EX NO:8
VARIOUS PLOTTING FUNCTIONS ON ANY DATA SET
DATE:

AIM
To write a python program for various plotting functions on any dataset
a.SIMPLE LINE PLOT
ALGORITHM
STEP1: Import the matplotlib module.
STEP2: plt.plot(x, y, color="blue", label="y = sin(x)") is the key line for
creating the plot.
STEP3: plt.title("Simple Line Plot of y = sin(x)") adds a title to the
plot.plt.xlabel("x") and plt.ylabel("y") label the X and Y axes, respectively.
STEP4: plt.grid(True) adds a grid to the plot for better visual reference.
STEP5: plt.show() displays the generated line plot.
PROGRAM
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100) # 100 points between 0 and 10
y = np.sin(x) # sine function
plt.figure(figsize=(8, 6))
plt.plot(x, y, color="blue", label="y = sin(x)")
plt.title("Simple Line Plot of y = sin(x)")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.grid(True)
plt.show()
OUTPUT
(Figure: sine curve y = sin(x) over 0 to 10 drawn as a blue line with grid, axis labels, and a legend.)

b.BAR PLOT
ALGORITHM
STEP1: The provided data is assumed to be in separate lists (categories and
values). Seaborn can handle various data structures like DataFrames or
dictionaries.
STEP2: For each category: A rectangular bar is created .
STEP3: The X-axis labels are set to the category labels.The Y-axis label is set to
the label for the values (usually the units of measurement).
STEP4: A title is added to the plot (optional).The plot is displayed using
plt.show().

PROGRAM
import matplotlib.pyplot as plt
import seaborn as sns
categories = ['Category A', 'Category B', 'Category C', 'Category D']
values = [23, 45, 56, 78]
plt.figure(figsize=(8, 6))
sns.barplot(x=categories, y=values, palette="viridis")
plt.title("Bar Plot of Categories")
plt.xlabel("Categories")
plt.ylabel("Values")
plt.show()
OUTPUT
(Figure: bar chart of Categories A-D with heights 23, 45, 56, and 78, drawn with the viridis palette.)

c.PIE CHART
ALGORITHM
STEP1: labels: Names of the categories for each slice.sizes: Numerical values
representing the size of each slice (usually proportional to a quantity).
STEP2: plt.pie(sizes, labels=labels, colors=colors, autopct='%1.1f%%',
startangle=140) is the key line for generating the pie chart.
STEP3: plt.figure(figsize=(8, 6)) sets the figure size (width and height in
inches).plt.title("Pie Chart of Categories") adds a title to the plot.
STEP4: plt.show() displays the generated pie chart.
PROGRAM
import matplotlib.pyplot as plt
labels = ['Category A', 'Category B', 'Category C', 'Category D']
sizes = [15, 30, 45, 10]
colors = ['gold', 'lightblue', 'lightgreen', 'salmon']
plt.figure(figsize=(8, 6))
plt.pie(sizes, labels=labels, colors=colors, autopct='%1.1f%%',
startangle=140)
plt.title("Pie Chart of Categories")
plt.show()
OUTPUT
(Figure: pie chart of Categories A-D with percentage labels on each slice, starting at an angle of 140 degrees.)

RESULT
Thus the program for various plotting functions on any dataset was
successfully executed and verified.
