Practical 1 and 2-1

The document contains code snippets and output related to data analysis tasks using NumPy and Pandas. The tasks involve computing statistics of arrays, manipulating multi-dimensional arrays, handling missing values in data frames, sorting and filtering data frames. Correlation and covariance are also calculated between columns of a data frame.

Uploaded by

SURAJ BISWAS

31/01/2024

Practical → 1st (a)


Q1 Write programs in Python using NumPy library to do the following:
A. Compute the mean, standard deviation, and variance of a two-dimensional
random integer array along the second axis.
CODE:
import numpy as np

# Two-dimensional array of random integers (3 rows x 4 columns)
x = np.random.randint(0, 100, size=(3, 4))
print("\nOriginal array:")
print(x)

# axis=1 is the second axis, so each statistic is computed per row
print("\nMean:", np.mean(x, axis=1))
print("\nStd:", np.std(x, axis=1))
print("\nVariance:", np.var(x, axis=1))
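
As a cross-check on the task statement, here is a minimal sketch that takes the same statistics along the second axis of a 2-D random integer array and compares them with the defining formulas. The array shape, value range and seed are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed (assumed) so runs are repeatable
x = rng.integers(0, 10, size=(3, 4))     # 3 x 4 array of random integers

row_mean = x.mean(axis=1)                # axis=1 collapses the columns: one value per row
row_var = x.var(axis=1)
row_std = x.std(axis=1)

# The same quantities from their definitions (population form, ddof=0)
manual_var = ((x - row_mean[:, None]) ** 2).mean(axis=1)
manual_std = np.sqrt(manual_var)

print(row_mean, row_std, row_var)
```

Both NumPy functions default to the population form (ddof=0), which is why the manual formulas divide by the number of elements in each row.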

OUTPUT:
01/02/2024

Practical → 1st (b)


Q1 Write programs in Python using NumPy library to do the following:
B. Create a 2-dimensional array of size m x n integer elements, also print the shape,
type and data type of the array and then reshape it into an n x m array, where n
and m are user inputs given at the run time
CODE:
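import numpy as np

# m and n are read from the user at run time
m = int(input("Enter the number of rows (m): "))
n = int(input("Enter the number of columns (n): "))

# m x n array of random integers
x = np.random.randint(0, 100, size=(m, n))
print("First Array : ")
print(x)
print("Type of Array")
print(type(x))
print("Shape of Array")
print(x.shape)
print("Data type of Array")
print(x.dtype)

# Reshape the m x n array into an n x m array
reshaped2 = np.reshape(x, (n, m))
print("Second Reshaped Array : ")
print(reshaped2)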
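
Reshaping an m x n array to n x m is easy to confuse with transposing it. The short sketch below, using a small fixed array chosen for illustration, shows that reshape keeps the elements in their original row-major order while the transpose swaps rows and columns.

```python
import numpy as np

x = np.array([[2, 4, 6], [6, 8, 10]])   # shape (2, 3)

reshaped = x.reshape(3, 2)   # same element order, read back in rows of length 2
transposed = x.T             # rows and columns swapped

print(reshaped)
print(transposed)
```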

OUTPUT:
21/02/2024

Practical → 1st (c)


Q1 Write programs in Python using NumPy library to do the following:
C. Test whether the elements of a given 1D array are zero, non-zero, and NaN.
Record the indices of these elements in three separate arrays.
CODE:
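import numpy as np

arr = np.array([0, 1, 2, 0, np.nan, 5, 0])
zero_indices = np.where(arr == 0)[0]
# NaN != 0 evaluates to True, so NaN is excluded from the non-zero test explicitly
non_zero_indices = np.where((arr != 0) & ~np.isnan(arr))[0]
# np.where returns a tuple; [0] extracts the index array
nan_indices = np.where(np.isnan(arr))[0]
print("Zero indices:", zero_indices)
print("Non-zero indices:", non_zero_indices)
print("NaN indices:", nan_indices)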

OUTPUT:
21/02/2024

Practical → 1st (d)


Q1 Write programs in Python using NumPy library to do the following:
D. Create three random arrays of the same size: Array1, Array2, and
Array3. Subtract Array 2 from Array3 and store in Array4. Create
another array Array5 having two times the values in Array1. Find
Covariance and Correlation of Array1 with Array4 and Array5
respectively
CODE:
import numpy as np

# Create three random arrays of the same size
size = 100
Array1 = np.random.rand(size)
Array2 = np.random.rand(size)
Array3 = np.random.rand(size)

# Subtract Array2 from Array3 and store in Array4
Array4 = Array3 - Array2

# Create Array5 having two times the values in Array1
Array5 = 2 * Array1

# Find Covariance of Array1 with Array4
covariance_1_4 = np.cov(Array1, Array4)[0][1]

# Find Correlation of Array1 with Array5
correlation_1_5 = np.corrcoef(Array1, Array5)[0][1]

print("Covariance of Array1 with Array4:", covariance_1_4)
print("Correlation of Array1 with Array5:", correlation_1_5)
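
The task statement can also be read as asking for both measures against both arrays. A minimal sketch under that reading follows; the seed and the lowercase variable names are assumptions for illustration. Because Array5 is an exact multiple of Array1, their correlation should come out as 1 up to rounding.

```python
import numpy as np

rng = np.random.default_rng(1)               # fixed seed (assumed) for repeatability
array1 = rng.random(100)
array4 = rng.random(100) - rng.random(100)   # stands in for Array3 - Array2
array5 = 2 * array1

# Covariance and correlation of array1 with each of the other two arrays
for name, other in (("Array4", array4), ("Array5", array5)):
    cov = np.cov(array1, other)[0, 1]
    corr = np.corrcoef(array1, other)[0, 1]
    print(name, "covariance:", cov, "correlation:", corr)

cov_1_5 = np.cov(array1, array5)[0, 1]
corr_1_5 = np.corrcoef(array1, array5)[0, 1]
```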

OUTPUT:
21/02/2024

Practical → 1st (e)


Q1 Write programs in Python using NumPy library to do the following:
E. Create two random arrays of the same size 10: Array1, and Array2. Find the sum
of the first half of both the arrays and the product of the second half of both
arrays.
CODE:
import numpy as np

# Create two random arrays of the same size 10
size = 10
Array1 = np.random.rand(size)
Array2 = np.random.rand(size)

# Find the sum of the first half of both arrays
sum_first_half_Array1 = np.sum(Array1[:size//2])
sum_first_half_Array2 = np.sum(Array2[:size//2])

# Find the product of the second half of both arrays
product_second_half_Array1 = np.prod(Array1[size//2:])
product_second_half_Array2 = np.prod(Array2[size//2:])

print("Sum of the first half of Array1:", sum_first_half_Array1)
print("Sum of the first half of Array2:", sum_first_half_Array2)
print("Product of the second half of Array1:", product_second_half_Array1)
print("Product of the second half of Array2:", product_second_half_Array2)

OUTPUT:
28/02/2024

Practical → 2nd (a)


Q2 Do the following using PANDAS Series:
A. Create a series with 5 elements. Display the series sorted on index and also sorted
on values separately.
CODE:
import pandas as pd

# Create a series with 5 elements
series = pd.Series([10, 5, 8, 2, 7], index=['e', 'a', 'd', 'c', 'b'])

# Display the series sorted on index
sorted_by_index = series.sort_index()

# Display the series sorted on values
sorted_by_values = series.sort_values()

print("Series sorted on index:")
print(sorted_by_index)

print("\nSeries sorted on values:")
print(sorted_by_values)

OUTPUT:
28/02/2024

Practical → 2nd (b)


Q2 Do the following using PANDAS Series:
B. Create a series with N elements with some duplicate values. Find the minimum
and maximum ranks assigned to the values using the ‘first’ and ‘max’ methods.
CODE:
import pandas as pd
series = pd.Series([2, 4, 6, 2, 8, 6, 3, 7, 4, 5])
min_ranks_first = series.rank(method='first')
max_ranks_max = series.rank(method='max')

print("Series:")
print(series)
print("\nMinimum ranks (using 'first' method):")
print(min_ranks_first)
print("\nMaximum ranks (using 'max' method):")
print(max_ranks_max)
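
To make the effect of the tie-breaking methods visible, here is a small sketch on a three-element series chosen for illustration. The 'min' method is included alongside 'first' and 'max' only for comparison; it is not part of the task.

```python
import pandas as pd

s = pd.Series([2, 4, 2])           # the value 2 appears twice

first = s.rank(method='first')     # ties are ranked in order of appearance
lowest = s.rank(method='min')      # every tied value gets the lowest rank of its group
highest = s.rank(method='max')     # every tied value gets the highest rank of its group

print(pd.DataFrame({'value': s, 'first': first, 'min': lowest, 'max': highest}))
```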

OUTPUT:
28/02/2024

Practical → 2nd (c)


Q2 Do the following using PANDAS Series:
C. Display the index value of the minimum and maximum element of a
Series.
CODE:
import pandas as pd

# Create a sample series
series = pd.Series([10, 5, 8, 2, 7])

# Find index value of the minimum element
min_index = series.idxmin()

# Find index value of the maximum element
max_index = series.idxmax()

print("Index value of the minimum element:", min_index)
print("Index value of the maximum element:", max_index)

OUTPUT:
06/03/2024

Practical → 3rd (a)


Q3 Create a data frame having at least 3 columns and 50 rows to store numeric data
generated using a random function. Replace 10% of the values by null values whose
index positions are generated using random function. Do the following:
a. Identify and count missing values in a data frame.
CODE:
import pandas as pd
import numpy as np

data = np.random.randn(50, 3)
# Set 10% of all values to NaN at randomly chosen positions
null_positions = np.random.choice(data.size, size=int(0.1 * data.size), replace=False)
data.flat[null_positions] = np.nan
df = pd.DataFrame(data, columns=['Column1', 'Column2', 'Column3'])

# Count the missing values in each column
missing_values_count = df.isnull().sum()
print("Missing values count:")
print(missing_values_count)

OUTPUT:
06/03/2024

Practical → 3rd (b)


b. Drop the column having more than 5 null values.
CODE:
import pandas as pd
import numpy as np

data = np.random.randn(50, 3)
# Set 10% of all values to NaN at randomly chosen positions
null_positions = np.random.choice(data.size, size=int(0.1 * data.size), replace=False)
data.flat[null_positions] = np.nan
df = pd.DataFrame(data, columns=['Column1', 'Column2', 'Column3'])

# Keep only columns with at least 45 non-null values out of 50,
# i.e. drop every column that has more than 5 nulls
df = df.dropna(axis=1, thresh=45)
print(df)
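
The thresh argument counts non-null values, which makes the "more than 5 nulls" rule easy to get off by one. The sketch below uses a small hand-built frame (hypothetical data, 10 rows) to compare it with an explicit null count per column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1.0] * 10,                  # no nulls
    'B': [np.nan] * 6 + [1.0] * 4,    # 6 nulls: more than 5, should be dropped
    'C': [np.nan] * 5 + [1.0] * 5,    # exactly 5 nulls: should be kept
})

# Explicit rule: keep columns whose null count is at most 5
null_counts = df.isnull().sum()
kept = df.loc[:, null_counts <= 5]

# Equivalent rule with thresh: require at least len(df) - 5 non-null values
same = df.dropna(axis=1, thresh=len(df) - 5)

print(list(kept.columns), list(same.columns))
```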

OUTPUT:
06/03/2024

Practical → 3rd (c)


c. Identify the row label having maximum of the sum of all values in a row and drop that
row.
CODE:
import pandas as pd
import numpy as np

data = np.random.randn(50, 3)
# Set 10% of all values to NaN at randomly chosen positions
null_positions = np.random.choice(data.size, size=int(0.1 * data.size), replace=False)
data.flat[null_positions] = np.nan
df = pd.DataFrame(data, columns=['Column1', 'Column2', 'Column3'])

# Row sums skip NaN values by default
row_sums = df.sum(axis=1)
max_row_label = row_sums.idxmax()
print("Row label with the maximum sum:", max_row_label)
df = df.drop(index=max_row_label)
print(df)

OUTPUT:
06/03/2024

Practical → 3rd (d)


d. Sort the data frame on the basis of the first column.
CODE:
import pandas as pd
import numpy as np

data = np.random.randn(50, 3)
# Set 10% of all values to NaN at randomly chosen positions
null_positions = np.random.choice(data.size, size=int(0.1 * data.size), replace=False)
data.flat[null_positions] = np.nan
df = pd.DataFrame(data, columns=['Column1', 'Column2', 'Column3'])

# Sort on the first column; rows with NaN in Column1 are placed last by default
df_sorted = df.sort_values(by='Column1')
print(df_sorted)

OUTPUT:
06/03/2024

Practical → 3rd (e)


e. Remove all duplicates from the first column.
CODE:
import pandas as pd
import numpy as np

data = np.random.randn(50, 3)
# Set 10% of all values to NaN at randomly chosen positions
null_positions = np.random.choice(data.size, size=int(0.1 * data.size), replace=False)
data.flat[null_positions] = np.nan
df = pd.DataFrame(data, columns=['Column1', 'Column2', 'Column3'])

# Keep the first row for each distinct Column1 value; repeated NaNs count as duplicates
df_unique = df.drop_duplicates(subset=['Column1'])
print(df_unique)

OUTPUT:
06/03/2024

Practical → 3rd (f)


f. Find the correlation between first and second column and covariance between second
and third column.
CODE:
import pandas as pd
import numpy as np

data = np.random.randn(50, 3)
# Set 10% of all values to NaN at randomly chosen positions
null_positions = np.random.choice(data.size, size=int(0.1 * data.size), replace=False)
data.flat[null_positions] = np.nan
df = pd.DataFrame(data, columns=['Column1', 'Column2', 'Column3'])

# Calculate correlation between first and second column
correlation_first_second = df['Column1'].corr(df['Column2'])

# Calculate covariance between second and third column
covariance_second_third = df['Column2'].cov(df['Column3'])

print("Correlation between first and second column:", correlation_first_second)
print("Covariance between second and third column:", covariance_second_third)

OUTPUT:
06/03/2024

Practical → 3rd (g)


g. Discretize the second column and create 5 bins.
CODE:
import pandas as pd
import numpy as np

data = np.random.randn(50, 3)
# Set 10% of all values to NaN at randomly chosen positions
null_positions = np.random.choice(data.size, size=int(0.1 * data.size), replace=False)
data.flat[null_positions] = np.nan
df = pd.DataFrame(data, columns=['Column1', 'Column2', 'Column3'])

# Split Column2 into 5 equal-width bins; NaN values stay NaN
df['Column2_bins'] = pd.cut(df['Column2'], bins=5)
print(df)
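
To see what pd.cut does with bins=5, the following sketch bins a small fixed series (hypothetical values 0 to 10) and counts how many values fall in each equal-width interval.

```python
import pandas as pd

values = pd.Series([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0])

# Five equal-width, right-closed intervals spanning the range of the data
binned = pd.cut(values, bins=5)

# Count per interval, kept in interval order rather than sorted by frequency
counts = binned.value_counts(sort=False)
print(counts)
```

The lowest edge is extended slightly below the minimum so that the smallest value is included in the first interval.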

OUTPUT:
13/03/2024

Practical → 6th
Q Consider the following data frame containing a family name, gender of the family
member and her/his monthly income in each record.

CODE:
import pandas as pd

# Creating the DataFrame
data = {
    'Name': ['Shah', 'Vats', 'Vats', 'Kumar', 'Vats',
             'Kumar', 'Shah', 'Shah', 'Kumar', 'Vats'],
    'Gender': ['Male', 'Male', 'Female', 'Female', 'Female',
               'Male', 'Male', 'Female', 'Female', 'Male'],
    'MonthlyIncome': [114000.00, 65000.00, 43150.00, 69500.00, 155000.00,
                      103000.00, 55000.00, 112400.00, 81030.00, 71900.00]
}

df = pd.DataFrame(data)
print(df)
OUTPUT:
13/03/2024

Practical → 6th (a)


Q Write a program in Python using Pandas to perform the following:
a. Calculate and display familywise gross monthly income.
CODE:
import pandas as pd
data = {
    'Name': ['Shah', 'Vats', 'Vats', 'Kumar', 'Vats',
             'Kumar', 'Shah', 'Shah', 'Kumar', 'Vats'],
    'Gender': ['Male', 'Male', 'Female', 'Female', 'Female',
               'Male', 'Male', 'Female', 'Female', 'Male'],
    'MonthlyIncome': [114000.00, 65000.00, 43150.00, 69500.00, 155000.00,
                      103000.00, 55000.00, 112400.00, 81030.00, 71900.00]
}
df = pd.DataFrame(data)
family_income = df.groupby('Name')['MonthlyIncome'].sum()
print("Familywise gross monthly income:")
print(family_income)
print()
OUTPUT:
13/03/2024

Practical → 6th (b)


Q Write a program in Python using Pandas to perform the following:
b. Calculate and display the member with the highest monthly income.
CODE:
import pandas as pd
data = {
    'Name': ['Shah', 'Vats', 'Vats', 'Kumar', 'Vats',
             'Kumar', 'Shah', 'Shah', 'Kumar', 'Vats'],
    'Gender': ['Male', 'Male', 'Female', 'Female', 'Female',
               'Male', 'Male', 'Female', 'Female', 'Male'],
    'MonthlyIncome': [114000.00, 65000.00, 43150.00, 69500.00, 155000.00,
                      103000.00, 55000.00, 112400.00, 81030.00, 71900.00]
}
df = pd.DataFrame(data)
highest_income_member = df.loc[df['MonthlyIncome'].idxmax()]
print("Member with the highest monthly income:")
print(highest_income_member)
print()
OUTPUT:
13/03/2024

Practical → 6th (c)


Q Write a program in Python using Pandas to perform the following:
c. Calculate and display monthly income of all members with income greater than Rs.
60000.00.
CODE:
import pandas as pd
data = {
    'Name': ['Shah', 'Vats', 'Vats', 'Kumar', 'Vats',
             'Kumar', 'Shah', 'Shah', 'Kumar', 'Vats'],
    'Gender': ['Male', 'Male', 'Female', 'Female', 'Female',
               'Male', 'Male', 'Female', 'Female', 'Male'],
    'MonthlyIncome': [114000.00, 65000.00, 43150.00, 69500.00, 155000.00,
                      103000.00, 55000.00, 112400.00, 81030.00, 71900.00]
}
df = pd.DataFrame(data)
high_income_members = df[df['MonthlyIncome'] > 60000.00]
print("Monthly income of members with income greater than Rs. 60000.00:")
print(high_income_members[['Name', 'MonthlyIncome']])
print()
OUTPUT:
13/03/2024

Practical → 6th (d)


Q Write a program in Python using Pandas to perform the following:
d. Calculate and display the average monthly income of the female members
CODE:
import pandas as pd
data = {
    'Name': ['Shah', 'Vats', 'Vats', 'Kumar', 'Vats',
             'Kumar', 'Shah', 'Shah', 'Kumar', 'Vats'],
    'Gender': ['Male', 'Male', 'Female', 'Female', 'Female',
               'Male', 'Male', 'Female', 'Female', 'Male'],
    'MonthlyIncome': [114000.00, 65000.00, 43150.00, 69500.00, 155000.00,
                      103000.00, 55000.00, 112400.00, 81030.00, 71900.00]
}
df = pd.DataFrame(data)
female_avg_income = df[df['Gender'] == 'Female']['MonthlyIncome'].mean()
print("Average monthly income of female members:", female_avg_income)
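
The same average can be obtained for every gender at once with a groupby, which also gives a quick check on the filtered result. This is a sketch of an alternative, not a replacement for the filter-based answer; the name avg_by_gender is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Gender': ['Male', 'Male', 'Female', 'Female', 'Female',
               'Male', 'Male', 'Female', 'Female', 'Male'],
    'MonthlyIncome': [114000.00, 65000.00, 43150.00, 69500.00, 155000.00,
                      103000.00, 55000.00, 112400.00, 81030.00, 71900.00],
})

# Mean income per gender, indexed by the gender label
avg_by_gender = df.groupby('Gender')['MonthlyIncome'].mean()
print(avg_by_gender)
```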

OUTPUT:
21/03/2024

Practical → 7th (a)


Q7. Using the Titanic dataset, do the following:
A. Find the total number of passengers with age less than 30.

CODE:
import pandas as pd
titanic_df = pd.read_csv("C:/Users/DELL/Downloads/titanic.csv")
# a. Total number of passengers with age less than 30
passengers_under_30 = titanic_df[titanic_df['Age'] < 30]
total_passengers_under_30 = passengers_under_30.shape[0]
print("Total number of passengers with age less than 30:",
total_passengers_under_30)
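
If the Age column contains missing values, as is common in copies of the Titanic data, those rows are silently left out of the count because a comparison with NaN is False. The sketch below shows this on a small hypothetical frame rather than on the real file.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for the Age column of the dataset
sample = pd.DataFrame({'Age': [22.0, 38.0, np.nan, 29.0, 30.0]})

under_30 = int((sample['Age'] < 30).sum())      # NaN < 30 evaluates to False
missing_age = int(sample['Age'].isna().sum())   # rows that could not be compared

print("Under 30:", under_30, "| age missing:", missing_age)
```

Reporting the number of missing ages next to the count makes it clear how many passengers could not be classified.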

OUTPUT:
21/03/2024

Practical → 7th (b)


Q7. Using the Titanic dataset, do the following:
B. Find total fare paid by passengers of first class.

CODE:
import pandas as pd
titanic_df = pd.read_csv("C:/Users/DELL/Downloads/titanic.csv")
# b. Total fare paid by passengers of first class
first_class_fare = titanic_df[titanic_df['Pclass'] == 1]['Fare'].sum()
print("Total fare paid by passengers of first class:", first_class_fare)

OUTPUT:
21/03/2024

Practical → 7th (c)


Q7. Using the Titanic dataset, do the following:
C. Compare number of survivors of each passenger class

CODE:
import pandas as pd
titanic_df = pd.read_csv("C:/Users/DELL/Downloads/titanic.csv")
# c. Number of survivors of each passenger class
survivors_by_class = titanic_df.groupby('Pclass')['Survived'].sum()
print("Number of survivors of each passenger class:")
print(survivors_by_class)

OUTPUT:
21/03/2024

Practical → 7th (d)


Q7. Using the Titanic dataset, do the following:
D. Compute descriptive statistics for any numeric attribute genderwise

CODE:
import pandas as pd
titanic_df = pd.read_csv("C:/Users/DELL/Downloads/titanic.csv")
# d. Descriptive statistics for age attribute genderwise
descriptive_stats_genderwise = titanic_df.groupby('Sex')['Age'].describe()
print("Descriptive statistics for age attribute genderwise:")
print(descriptive_stats_genderwise)

OUTPUT:
17/04/2024

Practical → 4th
Q4. Consider two Excel files containing the attendance of two workshops. Each file has three
fields, ‘Name’, ‘Date’ and ‘Duration’ (in minutes), where names are unique within a file. Note
that duration may take one of three values (30, 40, 50) only. Import the data into two
data frames.
CODE:
import pandas as pd

# Create dummy data for workshop1
workshop1_data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Date': ['2024-04-01', '2024-04-02', '2024-04-03', '2024-04-04'],
    'Duration': [30, 40, 50, 30]
}

# Create dummy data for workshop2
workshop2_data = {
    'Name': ['Alice', 'Eve', 'Charlie', 'Frank'],
    'Date': ['2024-04-03', '2024-04-04', '2024-04-05', '2024-04-06'],
    'Duration': [30, 40, 50, 30]
}

# Create data frames from the dummy data
df1 = pd.DataFrame(workshop1_data)
df2 = pd.DataFrame(workshop2_data)

# Display each data frame to verify the data
print("Data Frame 1:")
print(df1)

print("\nData Frame 2:")
print(df2)

OUTPUT:
Q. Import the data into two data frames and do the following:
a. Perform a merging of the two data frames to find the names of students
who had attended both workshops.
b. Find the names of all students who have attended a single workshop only.
c. Merge two data frames row-wise and find the total number of records in
the data frame.
d. Merge two data frames row-wise and use two columns viz. names and
dates as multi-row indexes. Generate descriptive statistics for this
hierarchical data frame.
CODE:
# a. Merge the two data frames to find the names of students who attended
#    both workshops.
attended_both = pd.merge(df1, df2, how='inner', on='Name')
print("\nNames of students who attended both workshops:")
print(attended_both['Name'].unique())

# b. Find the names of all students who attended a single workshop only.
attended_either = pd.merge(df1, df2, how='outer', on='Name', indicator=True)
attended_single = attended_either[attended_either['_merge'].isin(['left_only', 'right_only'])]
print("\nNames of students who attended a single workshop only:")
print(attended_single['Name'].unique())

# c. Merge the two data frames row-wise and find the total number of records.
merged_df = pd.concat([df1, df2], ignore_index=True)
print("\nTotal number of records in the merged data frame:", len(merged_df))

# d. Merge the two data frames row-wise and use the two columns Name and Date
#    as a multi-level row index. Generate descriptive statistics for this
#    hierarchical data frame.
merged_df_multi_index = pd.concat(
    [df1.set_index(['Name', 'Date']), df2.set_index(['Name', 'Date'])], axis=0)
print("\nDescriptive statistics for the hierarchical data frame:")
print(merged_df_multi_index.describe())
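
Because names are unique within each file, parts (a) and (b) can be cross-checked with plain set operations on the Name columns. The sketch below rebuilds only the Name columns of the dummy data above; it is a sanity check under that assumption, not a substitute for the merge-based answers.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David']})
df2 = pd.DataFrame({'Name': ['Alice', 'Eve', 'Charlie', 'Frank']})

names1, names2 = set(df1['Name']), set(df2['Name'])

both = sorted(names1 & names2)     # intersection: attended both workshops
single = sorted(names1 ^ names2)   # symmetric difference: attended exactly one

print("Both:", both)
print("Single:", single)
```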
OUTPUT:
24/04/2024

Practical → 5th (a)


Q5. Using Iris data, plot the following with proper legend and axis labels: (Download
IRIS data from: https://archive.ics.uci.edu/ml/datasets/iris or import it from sklearn
datasets)
a. Plot bar chart to show the frequency of each class label in the data.
CODE:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Load Iris dataset
iris_df = sns.load_dataset('iris')
# a. Plot bar chart to show the frequency of each class label in the data.
plt.figure(figsize=(8, 6))
sns.countplot(x='species', data=iris_df)
plt.title('Frequency of each class label')
plt.xlabel('Species')
plt.ylabel('Frequency')
plt.show()

OUTPUT:
24/04/2024

Practical → 5th (b)


Q5. Using Iris data, plot the following with proper legend and axis labels: (Download
IRIS data from: https://archive.ics.uci.edu/ml/datasets/iris or import it from sklearn
datasets)
b. Draw a scatter plot for petal width vs sepal width and fit a regression line.
CODE:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Load Iris dataset
iris_df = sns.load_dataset('iris')
# b. Draw a scatter plot for Petal width vs sepal width and fit a regression line
plt.figure(figsize=(8, 6))
sns.regplot(x='petal_width', y='sepal_width', data=iris_df)
plt.title('Petal width vs Sepal width')
plt.xlabel('Petal width (cm)')
plt.ylabel('Sepal width (cm)')
plt.show()

OUTPUT:
24/04/2024

Practical → 5th (c)


Q5. Using Iris data, plot the following with proper legend and axis labels: (Download
IRIS data from: https://archive.ics.uci.edu/ml/datasets/iris or import it from sklearn
datasets)
c. Plot density distribution for feature petal length.
CODE:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Load Iris dataset
iris_df = sns.load_dataset('iris')
# c. Plot density distribution for feature petal length.
plt.figure(figsize=(8, 6))
sns.kdeplot(iris_df['petal_length'], fill=True)  # fill replaces the deprecated shade argument
plt.title('Density distribution of Petal length')
plt.xlabel('Petal length (cm)')
plt.ylabel('Density')
plt.show()

OUTPUT:
24/04/2024

Practical → 5th (d)


Q5. Using Iris data, plot the following with proper legend and axis labels: (Download
IRIS data from: https://archive.ics.uci.edu/ml/datasets/iris or import it from sklearn
datasets)
d. Use a pair plot to show pairwise bivariate distribution in the Iris Dataset.
CODE:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Load Iris dataset
iris_df = sns.load_dataset('iris')
# d. Use a pair plot to show pairwise bivariate distribution in the Iris Dataset.
# pairplot creates its own figure, so no plt.figure() call is needed here
sns.pairplot(iris_df, hue='species')
plt.show()

OUTPUT:
24/04/2024

Practical → 5th (e)


Q5. Using Iris data, plot the following with proper legend and axis labels: (Download
IRIS data from: https://archive.ics.uci.edu/ml/datasets/iris or import it from sklearn
datasets)
e. Draw heatmap for the four numeric attributes.
CODE:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Load Iris dataset
iris_df = sns.load_dataset('iris')
# Drop the 'species' column
numeric_df = iris_df.drop(columns='species')
# Draw heatmap for the four numeric attributes
plt.figure(figsize=(8, 6))
sns.heatmap(numeric_df.corr(), annot=True, cmap='coolwarm', linewidths=0.5)
plt.title('Heatmap for numeric attributes')
plt.show()

OUTPUT:
24/04/2024

Practical → 5th (f)


Q5. Using Iris data, plot the following with proper legend and axis labels: (Download
IRIS data from: https://archive.ics.uci.edu/ml/datasets/iris or import it from sklearn
datasets)
f. Compute mean, mode, median, standard deviation, confidence interval and
standard error for each feature
CODE:
import pandas as pd
import seaborn as sns

# Load Iris dataset
iris_df = sns.load_dataset('iris')

# Keep only the four numeric features
features = iris_df.select_dtypes(include='number')

# Standard error of the mean: sample standard deviation / sqrt(n)
standard_error = features.sem()

# Half-width of the 95% confidence interval (normal approximation, z = 1.96)
margin = 1.96 * standard_error

# Combine the statistics into one table with one row per feature
feature_stats = pd.DataFrame({
    'mean': features.mean(),
    'mode': features.mode().iloc[0],  # first (smallest) mode if there are several
    'median': features.median(),
    'std': features.std(),
    'standard error': standard_error,
    '95% CI (low)': features.mean() - margin,
    '95% CI (high)': features.mean() + margin,
})
print("\nFeature statistics:")
print(feature_stats)

OUTPUT:
24/04/2024

Practical → 5th (g)


Q5. Using Iris data, plot the following with proper legend and axis labels: (Download
IRIS data from: https://archive.ics.uci.edu/ml/datasets/iris or import it from sklearn
datasets)
g. Compute correlation coefficients between each pair of features and plot heatmap.
CODE:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Load Iris dataset
iris_df = sns.load_dataset('iris')
# Exclude non-numeric column 'species'
numeric_columns = iris_df.select_dtypes(include=[float, int]).columns
iris_numeric_df = iris_df[numeric_columns]
# Compute correlation coefficients between each pair of features and plot heatmap
correlation_matrix = iris_numeric_df.corr()
plt.figure(figsize=(8, 6))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', linewidths=0.5)
plt.title('Correlation heatmap')
plt.show()
OUTPUT:
