Assignment No - 10

This document analyzes an iris dataset using Python. It loads the dataset from a CSV file and summarizes the features, which include 4 numerical columns (sepal length, sepal width, petal length, petal width) and 1 object column for species. It then generates histograms of each numerical feature and a boxplot to visualize the distributions.

Uploaded by

Sid Chabukswar

In [3]: # Assignment - A10 | Name : Chabukswar Siddharth S. | Roll No : 76

In [4]: # import libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

In [5]: df = pd.read_csv('iris.csv')
df.head()

Out[5]:
   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa

1. List down the features and their types


In [6]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   species       150 non-null    object
dtypes: float64(4), object(1)
memory usage: 6.0+ KB

In [7]: # Hence the dataset contains 4 numerical columns and 1 object column
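
The split between numerical and object columns can also be read off programmatically rather than from the `df.info()` printout. The following is a minimal sketch: `sample` is a hypothetical stand-in for `iris.csv`, built from the five rows shown by `df.head()`, and the names `numeric_cols` and `other_cols` are illustrative.

```python
import pandas as pd

# Hypothetical stand-in for iris.csv: the five rows printed by df.head().
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.6],
    "petal_length": [1.4, 1.4, 1.3, 1.5, 1.4],
    "petal_width": [0.2, 0.2, 0.2, 0.2, 0.2],
    "species": ["setosa"] * 5,
})

# select_dtypes splits the columns by dtype family.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
other_cols = sample.select_dtypes(exclude="number").columns.tolist()

print("numerical:", numeric_cols)
print("non-numerical:", other_cols)
```

Running the same two `select_dtypes` calls on the full `df` should give the same split, since the column types do not depend on the number of rows.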

In [8]: np.unique(df["species"])

Out[8]: array(['setosa', 'versicolor', 'virginica'], dtype=object)

In [9]: df.describe()

Out[9]:
       sepal_length  sepal_width  petal_length  petal_width
count    150.000000   150.000000    150.000000   150.000000
mean       5.843333     3.054000      3.758667     1.198667
std        0.828066     0.433594      1.764420     0.763161
min        4.300000     2.000000      1.000000     0.100000
25%        5.100000     2.800000      1.600000     0.300000
50%        5.800000     3.000000      4.350000     1.300000
75%        6.400000     3.300000      5.100000     1.800000
max        7.900000     4.400000      6.900000     2.500000

2. Create a histogram for each feature in the dataset.

In [12]: import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2, figsize=(12, 6), constrained_layout=True)

# The four numerical features are the first four columns; species is the last one.
for i in range(4):
    x, y = i // 2, i % 2  # row and column of the subplot in the 2x2 grid
    axes[x, y].hist(df[df.columns[i]])
    axes[x, y].set_title(f"Distribution of {df.columns[i]}")
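
The 2x2 grid can also be filled by pairing each axis with a column name, which avoids the index arithmetic. This is only a sketch under stated assumptions: `sample` is a hypothetical stand-in built from the five rows shown by `df.head()`, the numerical features are taken to be every column except the last (`species`), and the `Agg` backend is selected so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for iris.csv: the five rows printed by df.head().
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.6],
    "petal_length": [1.4, 1.4, 1.3, 1.5, 1.4],
    "petal_width": [0.2, 0.2, 0.2, 0.2, 0.2],
    "species": ["setosa"] * 5,
})

# Assumption: every column except the last one is a numerical feature.
feature_cols = list(sample.columns[:-1])

fig, axes = plt.subplots(2, 2, figsize=(12, 6), constrained_layout=True)

# axes.flat walks the 2x2 grid row by row, so zip pairs one axis per feature.
for ax, col in zip(axes.flat, feature_cols):
    ax.hist(sample[col])
    ax.set_title(f"Distribution of {col}")
    ax.set_xlabel(col)
    ax.set_ylabel("count")

titles = [ax.get_title() for ax in axes.flat]
```

With the full dataset, replacing `sample` by `df` would produce one histogram per feature over all 150 rows.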

3. Create a boxplot for each feature in the dataset.


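In [13]: # Plot every column except the last one (species)
data_to_plot = [df[x] for x in df.columns[:-1]]
fig, axes = plt.subplots(1, figsize=(12,8))
bp = axes.boxplot(data_to_plot)
axes.set_xticklabels(df.columns[:-1])  # label each box with its feature name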
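
To help read the boxplot, the sketch below works through the whisker arithmetic using the quartiles from the `df.describe()` output above. It assumes matplotlib's default whisker setting (`whis=1.5`), under which points more than 1.5 times the interquartile range beyond the box are drawn as individual outliers. The helper `whisker_fences` and the dictionary `stats` are illustrative names, not part of the original notebook.

```python
# Quartiles and extremes copied from the df.describe() output above.
stats = {
    "sepal_length": {"min": 4.3, "q1": 5.1, "q3": 6.4, "max": 7.9},
    "sepal_width": {"min": 2.0, "q1": 2.8, "q3": 3.3, "max": 4.4},
    "petal_length": {"min": 1.0, "q1": 1.6, "q3": 5.1, "max": 6.9},
    "petal_width": {"min": 0.1, "q1": 0.3, "q3": 1.8, "max": 2.5},
}


def whisker_fences(q1, q3, whis=1.5):
    """Return the (lower, upper) limits beyond which points count as outliers."""
    iqr = q3 - q1  # interquartile range: the height of the box
    return q1 - whis * iqr, q3 + whis * iqr


has_outliers = {}
for name, s in stats.items():
    low, high = whisker_fences(s["q1"], s["q3"])
    # A feature shows outlier markers if its extreme values fall outside the fences.
    has_outliers[name] = s["min"] < low or s["max"] > high
    print(f"{name}: fences = ({low:.2f}, {high:.2f}), outliers = {has_outliers[name]}")
```

For `sepal_width` the interquartile range is 3.3 - 2.8 = 0.5, so the fences sit at about 2.05 and 4.05; the minimum (2.0) and maximum (4.4) both fall outside, which suggests that this is the only feature whose box would show outlier points.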
