Exno 4

Uploaded by

kaviya260703
Copyright
© All Rights Reserved

EX. NO.: 4    DESCRIPTIVE ANALYTICS WITH PANDAS ON IRIS DATA

DATE:

The Iris data set is available from the UC Irvine Machine Learning
Repository at http://archive.ics.uci.edu/ml/datasets/Iris in comma-separated
values (CSV) format. Python's pandas library is used to import the CSV file. It
is a well-known data set with five columns: Sepal Length, Sepal Width,
Petal Length, Petal Width, and Species.
Iris is a flowering plant; researchers measured these features on different
iris flowers and recorded them digitally.

Using pandas, tabular data can be imported as a DataFrame object. A
pandas DataFrame represents a rectangular table of data containing an ordered
collection of columns, and each column can have a different value type. The Iris
data set contains four numerical columns for the petal and sepal measurements
and one categorical column for the class or type of iris.

The pandas read_csv function loads delimited data from a file, URL, or
file-like object using the comma as the default delimiter and creates
a DataFrame object. When a pandas DataFrame object is created, it has many
attributes and methods available that can be used on that object.
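
As a minimal sketch of the file-like-object case, the snippet below passes a few rows in the iris.data layout to read_csv through io.StringIO. The three sample rows and the names sample_text and sample are illustrative assumptions; the column names follow the attribute list used in this exercise.

```python
import io
import pandas as pd

# Three rows in the same layout as iris.data (no header row); illustrative only
sample_text = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)
col_names = ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width', 'Class']

# names= supplies the column labels because the data has no header line
sample = pd.read_csv(io.StringIO(sample_text), names=col_names)

print(sample.shape)    # rows and columns
print(sample.dtypes)   # one dtype per column
```

The same call accepts a file path or a URL in place of the StringIO object.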

A DataFrame has both a row index and a column index and can be thought of as a
dictionary of Series that all share the same index. A column of a DataFrame can
be retrieved as a Series, and each column can have its own dtype.
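
The dictionary-of-Series view can be sketched as follows. The values and the names sepal_length, species, frame, and column are illustrative assumptions, not part of the exercise data.

```python
import pandas as pd

# Two Series sharing the same default index (values are illustrative)
sepal_length = pd.Series([5.1, 4.9, 4.7])
species = pd.Series(['setosa', 'setosa', 'setosa'])

# Building a DataFrame from a dictionary of Series
frame = pd.DataFrame({'sepal_length': sepal_length, 'species': species})

# Retrieving one column gives back a Series with the frame's row index
column = frame['sepal_length']
print(type(column).__name__)
print(frame.index.equals(column.index))
```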

AIM:

To perform descriptive analytics on the iris dataset by reading iris data
from a CSV file, the web, and the sklearn datasets module.

ALGORITHM:

Step 1: Open the Anaconda prompt and type “jupyter notebook”.

Step 2: Create a new notebook and save it.

Step 3: Import the pandas package.

Step 4: Read the Iris dataset directly from the URL
https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data.
Alternatively, save it locally and read it in by specifying the file path. The
dataset can also be loaded from the scikit-learn datasets module.

Step 5: Type the commands.

Step 6: Display the output.

Step 7: Stop the program.

(i) Descriptive analytics on the Iris dataset by reading data from a specific
location on the computer or from the web
Code: Import pandas under the alias pd
import pandas as pd

Code for reading data from CSV file


iris = pd.read_csv('iris.csv', delimiter = ',')

Code for reading data from URL

a. Create csv_url and assign to it the URL where the data set is available.
csv_url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'

b. Create a list of column names, col_names, using the iris attribute
information.
# using the attribute information as the column names
col_names = ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width',
             'Class']

c. Create a pandas DataFrame object called iris.

iris = pd.read_csv(csv_url, names = col_names)


Code: Display the top rows of the dataset with their columns
# head() shows the first 5 rows when no argument is given
iris.head()
Output:

Code: Display a specified number of randomly chosen rows
iris.sample(10)
Output:

Code: Display the names of the columns
iris.columns
Output:
Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width',
       'species'],
      dtype='object')

Code: Display the shape of the dataset


# Displays number of rows and columns.
iris.shape
Output:
(150, 5)

Code: Display the whole dataset


iris
Output:

Code: Slicing the rows


# Prints the rows from 10 to 20
iris[10:21]
Output:

Code: Display the number of instances and attributes in the dataset


# Demonstrates a complete dataset - no null values
iris.info()
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   species       150 non-null    object
dtypes: float64(4), object(1)
memory usage: 6.0+ KB

Code: Display the number of instances of each species
# Shows a balanced dataset - each type is equally represented
# (group by 'Class' instead if the columns were named with col_names above)
iris.groupby('species').size()
Output:
species
setosa 50
versicolor 50
virginica 50
dtype: int64
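
The same grouping can feed other per-species summaries. The sketch below uses a small stand-in frame (the name demo and its six rows are illustrative assumptions, not iris data) to compute group sizes and the mean petal length within each group.

```python
import pandas as pd

# Small stand-in for the iris DataFrame (measurements are illustrative)
demo = pd.DataFrame({
    'petal_length': [1.4, 1.6, 4.5, 4.7, 6.0, 5.0],
    'species': ['setosa', 'setosa', 'versicolor',
                'versicolor', 'virginica', 'virginica'],
})

# Number of rows in each group
sizes = demo.groupby('species').size()

# Mean petal length within each group
means = demo.groupby('species')['petal_length'].mean()

print(sizes)
print(means)
```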

Code: Display the datatypes of each of the attributes
# The columns of the resulting DataFrame have different dtypes.
iris.dtypes
Output:
Sepal_Length float64
Sepal_Width float64
Petal_Length float64
Petal_Width float64
Class object
dtype: object

Code: Display basic statistical features of the dataset


iris.describe()
Output:

Code: Count the number of non-null values in each column
# count() skips missing values, so 150 in every column means no gaps
iris.count()
Output:
sepal_length 150
sepal_width 150
petal_length 150
petal_width 150
species 150
dtype: int64
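
To illustrate how count() differs from the number of rows, here is a minimal sketch on a stand-in frame with one missing measurement. The frame demo and its values are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Stand-in frame with one missing measurement (values are illustrative)
demo = pd.DataFrame({
    'sepal_length': [5.1, np.nan, 4.7],
    'species': ['setosa', 'setosa', 'setosa'],
})

n_rows = len(demo)               # number of rows, missing values included
non_null = demo.count()          # non-null values per column
missing = demo.isnull().sum()    # missing values per column

print(n_rows)
print(non_null)
print(missing)
```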

Code: Number of counts of unique values using “value_counts()”


iris["species"].value_counts()
Output:
setosa 50
versicolor 50
virginica 50
Name: species, dtype: int64
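
value_counts() can also report proportions instead of absolute counts. The sketch below assumes a small stand-in species column (labels are illustrative) and uses the normalize parameter.

```python
import pandas as pd

# Stand-in species column (labels are illustrative)
species = pd.Series(['setosa', 'setosa', 'versicolor', 'virginica'],
                    name='species')

counts = species.value_counts()                 # absolute frequencies
shares = species.value_counts(normalize=True)   # relative frequencies

print(counts)
print(shares)
```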

# Sample mean for every numeric column
# numeric_only=True skips the non-numeric species column
iris.mean(numeric_only=True)
Output:
sepal_length 5.843333
sepal_width 3.054000
petal_length 3.758667
petal_width 1.198667
dtype: float64

# Sample median for every numeric column
# numeric_only=True skips the non-numeric species column
iris.median(numeric_only=True)
Output:
sepal_length 5.80
sepal_width 3.00
petal_length 4.35
petal_width 1.30
dtype: float64

# Sample variance for every numeric column
# numeric_only=True skips the non-numeric species column
iris.var(numeric_only=True)
Output:
sepal_length 0.685694
sepal_width 0.188004
petal_length 3.113179
petal_width 0.582414
dtype: float64

# Sample standard deviation for every numeric column
# numeric_only=True skips the non-numeric species column
iris.std(numeric_only=True)
Output:
sepal_length 0.828066
sepal_width 0.433594
petal_length 1.764420
petal_width 0.763161
dtype: float64
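
The word "sample" matters here: by default pandas divides by n - 1 (ddof=1) when computing the variance and standard deviation. The sketch below checks this against Python's statistics module on five illustrative values that are not taken from the iris data.

```python
import statistics
import pandas as pd

# Illustrative measurements, not taken from the iris data
values = [5.1, 4.9, 4.7, 4.6, 5.0]
series = pd.Series(values)

sample_var = series.var()             # default ddof=1: divides by n - 1
population_var = series.var(ddof=0)   # divides by n
sample_std = series.std()             # square root of the sample variance

# Cross-check against the standard library
print(abs(sample_var - statistics.variance(values)) < 1e-12)
print(abs(population_var - statistics.pvariance(values)) < 1e-12)
print(abs(sample_std - sample_var ** 0.5) < 1e-12)
```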

(ii) Descriptive analytics on the Iris dataset by reading data from the
scikit-learn datasets module
Code: Load the iris dataset from the scikit-learn datasets module
from sklearn.datasets import load_iris

iris = load_iris()

Code: Store the feature matrix in X
X = iris.data

Code: Store the target vector in y
y = iris.target

Code: Names of features/columns in iris dataset


iris.feature_names
Output:
['sepal length (cm)',
'sepal width (cm)',
'petal length (cm)',
'petal width (cm)']

Code: Display names of target/output in iris dataset


print(iris.target_names)
Output:
['setosa' 'versicolor' 'virginica']

Code: Examine the size of the feature matrix
print(iris.data.shape)
Output:
(150, 4)

Code: Display the size of target vector


print(iris.target.shape)
Output:
(150,)

Code: Display the contents of the data


print(iris.data)
Output:
[[5.1 3.5 1.4 0.2]
[4.9 3. 1.4 0.2]
[4.7 3.2 1.3 0.2]
[4.6 3.1 1.5 0.2]
[5. 3.6 1.4 0.2]
[5.4 3.9 1.7 0.4]
[4.6 3.4 1.4 0.3]
[5. 3.4 1.5 0.2]
[4.4 2.9 1.4 0.2]
[4.9 3.1 1.5 0.1]
[5.4 3.7 1.5 0.2]
[4.8 3.4 1.6 0.2]
[4.8 3. 1.4 0.1]
[4.3 3. 1.1 0.1]
[5.8 4. 1.2 0.2]
[5.7 4.4 1.5 0.4]
[5.4 3.9 1.3 0.4]
[5.1 3.5 1.4 0.3]
[5.7 3.8 1.7 0.3]
[5.1 3.8 1.5 0.3]
[5.4 3.4 1.7 0.2]
[5.1 3.7 1.5 0.4]
[4.6 3.6 1. 0.2]
[5.1 3.3 1.7 0.5]
[4.8 3.4 1.9 0.2]
[5. 3. 1.6 0.2]
[5. 3.4 1.6 0.4]
[5.2 3.5 1.5 0.2]
[5.2 3.4 1.4 0.2]
[4.7 3.2 1.6 0.2]
[4.8 3.1 1.6 0.2]
[5.4 3.4 1.5 0.4]
[5.2 4.1 1.5 0.1]
[5.5 4.2 1.4 0.2]
[4.9 3.1 1.5 0.2]
[5. 3.2 1.2 0.2]
[5.5 3.5 1.3 0.2]
[4.9 3.6 1.4 0.1]
[4.4 3. 1.3 0.2]
[5.1 3.4 1.5 0.2]
[5. 3.5 1.3 0.3]
[4.5 2.3 1.3 0.3]
[4.4 3.2 1.3 0.2]
[5. 3.5 1.6 0.6]
[5.1 3.8 1.9 0.4]
[4.8 3. 1.4 0.3]
[5.1 3.8 1.6 0.2]
[4.6 3.2 1.4 0.2]
[5.3 3.7 1.5 0.2]
[5. 3.3 1.4 0.2]
[7. 3.2 4.7 1.4]
[6.4 3.2 4.5 1.5]
[6.9 3.1 4.9 1.5]
[5.5 2.3 4. 1.3]
[6.5 2.8 4.6 1.5]
[5.7 2.8 4.5 1.3]
[6.3 3.3 4.7 1.6]
[4.9 2.4 3.3 1. ]
[6.6 2.9 4.6 1.3]
[5.2 2.7 3.9 1.4]
[5. 2. 3.5 1. ]
[5.9 3. 4.2 1.5]
[6. 2.2 4. 1. ]
[6.1 2.9 4.7 1.4]
[5.6 2.9 3.6 1.3]
[6.7 3.1 4.4 1.4]
[5.6 3. 4.5 1.5]
[5.8 2.7 4.1 1. ]
[6.2 2.2 4.5 1.5]
[5.6 2.5 3.9 1.1]
[5.9 3.2 4.8 1.8]
[6.1 2.8 4. 1.3]
[6.3 2.5 4.9 1.5]
[6.1 2.8 4.7 1.2]
[6.4 2.9 4.3 1.3]
[6.6 3. 4.4 1.4]
[6.8 2.8 4.8 1.4]
[6.7 3. 5. 1.7]
[6. 2.9 4.5 1.5]
[5.7 2.6 3.5 1. ]
[5.5 2.4 3.8 1.1]
[5.5 2.4 3.7 1. ]
[5.8 2.7 3.9 1.2]
[6. 2.7 5.1 1.6]
[5.4 3. 4.5 1.5]
[6. 3.4 4.5 1.6]
[6.7 3.1 4.7 1.5]
[6.3 2.3 4.4 1.3]
[5.6 3. 4.1 1.3]
[5.5 2.5 4. 1.3]
[5.5 2.6 4.4 1.2]
[6.1 3. 4.6 1.4]
[5.8 2.6 4. 1.2]
[5. 2.3 3.3 1. ]
[5.6 2.7 4.2 1.3]
[5.7 3. 4.2 1.2]
[5.7 2.9 4.2 1.3]
[6.2 2.9 4.3 1.3]
[5.1 2.5 3. 1.1]
[5.7 2.8 4.1 1.3]
[6.3 3.3 6. 2.5]
[5.8 2.7 5.1 1.9]
[7.1 3. 5.9 2.1]
[6.3 2.9 5.6 1.8]
[6.5 3. 5.8 2.2]
[7.6 3. 6.6 2.1]
[4.9 2.5 4.5 1.7]
[7.3 2.9 6.3 1.8]
[6.7 2.5 5.8 1.8]
[7.2 3.6 6.1 2.5]
[6.5 3.2 5.1 2. ]
[6.4 2.7 5.3 1.9]
[6.8 3. 5.5 2.1]
[5.7 2.5 5. 2. ]
[5.8 2.8 5.1 2.4]
[6.4 3.2 5.3 2.3]
[6.5 3. 5.5 1.8]
[7.7 3.8 6.7 2.2]
[7.7 2.6 6.9 2.3]
[6. 2.2 5. 1.5]
[6.9 3.2 5.7 2.3]
[5.6 2.8 4.9 2. ]
[7.7 2.8 6.7 2. ]
[6.3 2.7 4.9 1.8]
[6.7 3.3 5.7 2.1]
[7.2 3.2 6. 1.8]
[6.2 2.8 4.8 1.8]
[6.1 3. 4.9 1.8]
[6.4 2.8 5.6 2.1]
[7.2 3. 5.8 1.6]
[7.4 2.8 6.1 1.9]
[7.9 3.8 6.4 2. ]
[6.4 2.8 5.6 2.2]
[6.3 2.8 5.1 1.5]
[6.1 2.6 5.6 1.4]
[7.7 3. 6.1 2.3]
[6.3 3.4 5.6 2.4]
[6.4 3.1 5.5 1.8]
[6. 3. 4.8 1.8]
[6.9 3.1 5.4 2.1]
[6.7 3.1 5.6 2.4]
[6.9 3.1 5.1 2.3]
[5.8 2.7 5.1 1.9]
[6.8 3.2 5.9 2.3]
[6.7 3.3 5.7 2.5]
[6.7 3. 5.2 2.3]
[6.3 2.5 5. 1.9]
[6.5 3. 5.2 2. ]
[6.2 3.4 5.4 2.3]
[5.9 3. 5.1 1.8]]

Code: Display target vector iris species: 0 = setosa, 1 = versicolor, 2 = virginica


print(iris.target)
Output:
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2
2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2
2 2]

Code: Convert into a DataFrame
import pandas as pd
import numpy as np
# np.c_ joins the feature matrix and the target vector column-wise
df = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                  columns=iris['feature_names'] + ['Species'])

# Distribution of each Iris species


df['Species'].value_counts()
Output:
0.0 50
1.0 50
2.0 50
Name: Species, dtype: int64
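
The species codes appear as 0.0, 1.0, and 2.0 because joining them with the float feature matrix casts them to float. One possible alternative, sketched here under stated assumptions, builds the frame from the features alone and adds readable labels by indexing the name array with the integer codes. The arrays data, target, and target_names are small stand-ins for iris.data, iris.target, and iris.target_names so that the snippet runs without scikit-learn; df_named is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Stand-ins for iris.data, iris.target and iris.target_names (illustrative)
data = np.array([[5.1, 3.5, 1.4, 0.2],
                 [7.0, 3.2, 4.7, 1.4],
                 [6.3, 3.3, 6.0, 2.5]])
target = np.array([0, 1, 2])
target_names = np.array(['setosa', 'versicolor', 'virginica'])

feature_names = ['sepal length (cm)', 'sepal width (cm)',
                 'petal length (cm)', 'petal width (cm)']

# Build the frame from the features only, so the codes are not cast to float
df_named = pd.DataFrame(data, columns=feature_names)

# Index the name array with the integer codes to get readable labels
df_named['Species'] = target_names[target]

print(df_named['Species'].value_counts())
```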

Code: Display basic statistical features of the dataset


df.describe()
Output:

RESULT:
Thus, the code for reading data from a CSV file, the web, and the sklearn
package was executed, the commands for descriptive analytics on the Iris data
set were run, and the output was verified.
