Dsbda 10

The document contains a Python script that analyzes the Iris dataset using pandas, numpy, matplotlib, and seaborn. It includes data loading, descriptive statistics, data visualization through box plots and histograms, and correlation analysis. The dataset consists of 150 samples with measurements of sepal and petal dimensions across three species of Iris flowers.

Uploaded by

naitikpawar22

In [1]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [2]: Iris = pd.read_csv("C:/Users/prajw/Desktop/Indexs/DSBDA print/Assignment 10 (Data Visualization III)/Iris.csv")
Iris

Out[2]:
      Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm         Species
0      1            5.1           3.5            1.4           0.2     Iris-setosa
1      2            4.9           3.0            1.4           0.2     Iris-setosa
2      3            4.7           3.2            1.3           0.2     Iris-setosa
3      4            4.6           3.1            1.5           0.2     Iris-setosa
4      5            5.0           3.6            1.4           0.2     Iris-setosa
...  ...            ...           ...            ...           ...             ...
145  146            6.7           3.0            5.2           2.3  Iris-virginica
146  147            6.3           2.5            5.0           1.9  Iris-virginica
147  148            6.5           3.0            5.2           2.0  Iris-virginica
148  149            6.2           3.4            5.4           2.3  Iris-virginica
149  150            5.9           3.0            5.1           1.8  Iris-virginica

150 rows × 6 columns

In [3]: Iris.shape

Out[3]: (150, 6)

In [4]: Iris.describe()

Out[4]:
               Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm
count  150.000000     150.000000    150.000000     150.000000    150.000000
mean    75.500000       5.843333      3.054000       3.758667      1.198667
std     43.445368       0.828066      0.433594       1.764420      0.763161
min      1.000000       4.300000      2.000000       1.000000      0.100000
25%     38.250000       5.100000      2.800000       1.600000      0.300000
50%     75.500000       5.800000      3.000000       4.350000      1.300000
75%    112.750000       6.400000      3.300000       5.100000      1.800000
max    150.000000       7.900000      4.400000       6.900000      2.500000

In [5]: Iris.dtypes

Out[5]: Id int64
SepalLengthCm float64
SepalWidthCm float64
PetalLengthCm float64
PetalWidthCm float64
Species object
dtype: object

In [6]: Iris.isnull().sum()

Out[6]: Id 0
SepalLengthCm 0
SepalWidthCm 0
PetalLengthCm 0
PetalWidthCm 0
Species 0
dtype: int64

In [7]: print(Iris.groupby('Species').size())

Species
Iris-setosa 50
Iris-versicolor 50
Iris-virginica 50
dtype: int64

In [8]: Iris.plot(kind='box', subplots=True, layout=(3,2), figsize=(8,12));


In [9]: Iris.hist(figsize=(12,12))
plt.show()

In [10]: Iris.corr(numeric_only=True)

Out[10]:
                      Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm
Id              1.000000       0.716676     -0.397729       0.882747      0.899759
SepalLengthCm   0.716676       1.000000     -0.109369       0.871754      0.817954
SepalWidthCm   -0.397729      -0.109369      1.000000      -0.420516     -0.356544
PetalLengthCm   0.882747       0.871754     -0.420516       1.000000      0.962757
PetalWidthCm    0.899759       0.817954     -0.356544       0.962757      1.000000

In [11]: sns.heatmap(Iris.corr(numeric_only=True), annot=True, cmap='Wistia')

Out[11]: <Axes: >
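
The `Id` column is a row counter, and the preview in Out[2] shows the file ordered by species, so its high correlation with the petal measurements (0.88 and 0.90) most likely reflects row order rather than anything about the flowers. A minimal sketch of the same correlation with `Id` dropped is shown here; the ten-row `sample` frame is an assumption for illustration, built from the first and last five rows printed in Out[2] rather than the full CSV.

```python
import pandas as pd

# Ten rows copied from the notebook's preview of Iris.csv (first and last five).
sample = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5, 146, 147, 148, 149, 150],
    "SepalLengthCm": [5.1, 4.9, 4.7, 4.6, 5.0, 6.7, 6.3, 6.5, 6.2, 5.9],
    "SepalWidthCm": [3.5, 3.0, 3.2, 3.1, 3.6, 3.0, 2.5, 3.0, 3.4, 3.0],
    "PetalLengthCm": [1.4, 1.4, 1.3, 1.5, 1.4, 5.2, 5.0, 5.2, 5.4, 5.1],
    "PetalWidthCm": [0.2, 0.2, 0.2, 0.2, 0.2, 2.3, 1.9, 2.0, 2.3, 1.8],
    "Species": ["Iris-setosa"] * 5 + ["Iris-virginica"] * 5,
})

# Drop the row counter, then correlate only the numeric measurement columns.
features = sample.drop(columns="Id")
corr = features.corr(numeric_only=True)
print(corr.round(2))
```

On the full data the equivalent call would be `Iris.drop(columns="Id").corr(numeric_only=True)`, which can be passed to `sns.heatmap` in the same way as above.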


In [12]: sns.pairplot(Iris)

Out[12]: <seaborn.axisgrid.PairGrid at 0x2bf02191970>


In [13]: from sklearn.datasets import load_iris
import pandas as pd

iris = load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
iris_df['target'] = iris.target

In [14]: import matplotlib.pyplot as plt

iris_df.hist()
plt.show()

In [15]: iris_df.boxplot()
plt.show()

In [16]: iris_df.describe()

Out[16]:
       sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)      target
count         150.000000        150.000000         150.000000        150.000000  150.000000
mean            5.843333          3.057333           3.758000          1.199333    1.000000
std             0.828066          0.435866           1.765298          0.762238    0.819232
min             4.300000          2.000000           1.000000          0.100000    0.000000
25%             5.100000          2.800000           1.600000          0.300000    0.000000
50%             5.800000          3.000000           4.350000          1.300000    1.000000
75%             6.400000          3.300000           5.100000          1.800000    2.000000
max             7.900000          4.400000           6.900000          2.500000    2.000000

In [17]: sns.boxplot(data=Iris, orient="h")
plt.show()

In [18]: sns.boxplot(x='sepal width (cm)', data=iris_df)

Out[18]: <Axes: xlabel='sepal width (cm)'>

In [19]: Q1 = Iris.SepalWidthCm.quantile(0.25)
Q3 = Iris.SepalWidthCm.quantile(0.75)
IQR = Q3-Q1
print(IQR)

0.5
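
With Q1 = 2.8 and Q3 = 3.3 (the 25% and 75% rows of Out[4]) and IQR = 0.5, the usual 1.5 × IQR rule puts the outlier fences at 2.8 − 0.75 = 2.05 and 3.3 + 0.75 = 4.05. The sketch below wraps that arithmetic in a helper; `iqr_fences` and the five-value `widths` series are hypothetical names and data, with the values chosen so the quartiles match the notebook's.

```python
import pandas as pd

def iqr_fences(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR for a Series."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical five-value sample, chosen so that its quartiles equal the
# notebook's SepalWidthCm quartiles (Q1 = 2.8, Q3 = 3.3).
widths = pd.Series([2.0, 2.8, 3.0, 3.3, 4.4], name="SepalWidthCm")

lower, upper = iqr_fences(widths)
# Values strictly outside the fences are flagged as outliers.
outliers = widths[(widths < lower) | (widths > upper)]
print(round(lower, 2), round(upper, 2))
print(outliers.tolist())
```

In this sample the two extreme values, 2.0 and 4.4, fall outside the fences; they are also the minimum and maximum sepal widths reported in Out[4].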

In [20]: # Keep only rows whose sepal width lies within [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR].
# Each comparison is parenthesized, the two are combined with |, and ~ inverts the outlier mask.
data = Iris[~((Iris.SepalWidthCm < (Q1 - 1.5 * IQR)) | (Iris.SepalWidthCm > (Q3 + 1.5 * IQR)))]

In [21]: data

Out[21]:
      Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm         Species
0      1            5.1           3.5            1.4           0.2     Iris-setosa
1      2            4.9           3.0            1.4           0.2     Iris-setosa
2      3            4.7           3.2            1.3           0.2     Iris-setosa
3      4            4.6           3.1            1.5           0.2     Iris-setosa
4      5            5.0           3.6            1.4           0.2     Iris-setosa
...  ...            ...           ...            ...           ...             ...
145  146            6.7           3.0            5.2           2.3  Iris-virginica
146  147            6.3           2.5            5.0           1.9  Iris-virginica
147  148            6.5           3.0            5.2           2.0  Iris-virginica
148  149            6.2           3.4            5.4           2.3  Iris-virginica
149  150            5.9           3.0            5.1           1.8  Iris-virginica

146 rows × 6 columns
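
Boolean masks in pandas are combined with the bitwise operators `|`, `&` and `~`, which bind more tightly than `<` and `>`, so each comparison needs its own parentheses. As a hedged alternative, the same "keep what lies inside the fences" filter can be written with `Series.between`, which is inclusive on both ends by default. The `frame` below is a hypothetical five-row example, and the fences are computed from the quartiles reported earlier (Q1 = 2.8, Q3 = 3.3, IQR = 0.5).

```python
import pandas as pd

# Hypothetical frame for illustration; not the full Iris data.
frame = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5],
    "SepalWidthCm": [2.0, 2.8, 3.0, 3.3, 4.4],
})
lower, upper = 2.8 - 1.5 * 0.5, 3.3 + 1.5 * 0.5

s = frame["SepalWidthCm"]
# Explicit form: parenthesize each comparison, combine with |, negate with ~.
kept_explicit = frame[~((s < lower) | (s > upper))]
# Equivalent form: between() keeps lower <= value <= upper.
kept_between = frame[s.between(lower, upper)]

print(len(frame) - len(kept_between), "rows removed")
```

Both forms select the same rows here; `between` simply leaves less room for a misplaced parenthesis or operator.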
