Data Preprocessing Report
1: N. Vamsi Nadh-230
2: F G Firasath-233
3: B Dhanush-291
Bangalore
Problem statement
This dataset consists of the physical measurements of three species of Iris flower: Versicolor, Setosa and Virginica. The numeric parameters in the dataset are sepal length, sepal width, petal length and petal width. Using these parameters, we will predict the species (class) of each flower. The data consists of continuous numeric values, in centimetres, which describe the dimensions of the respective features. We will train the model on these features.
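Since the model is trained on these four features, a test set is usually held out first. The report does not say how the data is split, so the following is a minimal sketch under that assumption: `stratified_split` and the `demo` rows are hypothetical and exist only for illustration. Sampling the same fraction from each species (stratification) keeps the class proportions equal in the training and test parts. It uses `DataFrame.groupby(...).sample`, available in pandas 1.1 and later.

```python
import pandas as pd

# The four measurement columns, named as in Iris.csv.
FEATURES = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]


def stratified_split(df: pd.DataFrame, train_frac: float = 0.8, seed: int = 0):
    """Split df into train and test parts, sampling train_frac of each species.

    Sampling per species keeps the class proportions the same in both parts.
    """
    train = df.groupby("Species").sample(frac=train_frac, random_state=seed)
    test = df.drop(train.index)  # every row not sampled for training
    return train[FEATURES], train["Species"], test[FEATURES], test["Species"]


# Hypothetical illustrative rows in the Iris.csv layout (Id omitted), four per species.
rows = [
    (5.1, 3.5, 1.4, 0.2, "Iris-setosa"),
    (4.9, 3.0, 1.4, 0.2, "Iris-setosa"),
    (4.7, 3.2, 1.3, 0.2, "Iris-setosa"),
    (4.6, 3.1, 1.5, 0.2, "Iris-setosa"),
    (7.0, 3.2, 4.7, 1.4, "Iris-versicolor"),
    (6.4, 3.2, 4.5, 1.5, "Iris-versicolor"),
    (6.9, 3.1, 4.9, 1.5, "Iris-versicolor"),
    (5.5, 2.3, 4.0, 1.3, "Iris-versicolor"),
    (6.3, 3.3, 6.0, 2.5, "Iris-virginica"),
    (5.8, 2.7, 5.1, 1.9, "Iris-virginica"),
    (7.1, 3.0, 5.9, 2.1, "Iris-virginica"),
    (6.3, 2.9, 5.6, 1.8, "Iris-virginica"),
]
demo = pd.DataFrame(rows, columns=FEATURES + ["Species"])
X_train, y_train, X_test, y_test = stratified_split(demo, train_frac=0.75)
print(y_train.value_counts().to_dict(), y_test.value_counts().to_dict())
```

On the full file (50 flowers per species), the default `train_frac=0.8` would place 40 flowers of each species in the training part and 10 in the test part.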
Dataset Used:
1.Name of the Dataset:
This is the Iris flower dataset (Iris.csv), widely used in research on distinguishing between the different Iris species. For each flower it records the sepal length, sepal width, petal length and petal width.
2.Features:
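The dataset contains measurements of 150 individual Iris flowers, 50 from each of the three species.
The columns are listed below. Id is only a row identifier and Species is the class label; the remaining four columns are the measurement features, in centimetres: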
Id
SepalLengthCm
SepalWidthCm
PetalLengthCm
PetalWidthCm
Species
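Because Id carries no information about the flower, only the four measurement columns act as model inputs, with Species as the target. A minimal sketch of that separation, assuming the column names listed above; `split_features_target` and the two `demo` rows are hypothetical, for illustration only:

```python
import pandas as pd

# Column names as listed in the dataset description.
FEATURE_COLS = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]
TARGET_COL = "Species"


def split_features_target(df: pd.DataFrame):
    """Return (X, y): the four measurements and the species label.

    Id is only a row counter, so it is deliberately left out of X.
    """
    X = df[FEATURE_COLS]
    y = df[TARGET_COL]
    return X, y


# Hypothetical illustrative rows in the Iris.csv layout.
demo = pd.DataFrame(
    [
        (1, 5.1, 3.5, 1.4, 0.2, "Iris-setosa"),
        (51, 7.0, 3.2, 4.7, 1.4, "Iris-versicolor"),
    ],
    columns=["Id"] + FEATURE_COLS + [TARGET_COL],
)
X, y = split_features_target(demo)
print(X.shape, list(y))  # (2, 4) ['Iris-setosa', 'Iris-versicolor']
```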
3.Observation:
The dataset contains 150 observations (rows), one per flower.
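The observation count and the class balance can both be checked directly. A minimal sketch, assuming a `Species` column as in Iris.csv; `class_balance` and the `demo` frame are hypothetical. On the full file, `len(iris)` should be 150 and each species should count 50.

```python
import pandas as pd


def class_balance(df: pd.DataFrame, label: str = "Species") -> pd.Series:
    """Count observations per class, sorted by class name."""
    return df[label].value_counts().sort_index()


# Hypothetical, deliberately unbalanced labels so the counts differ.
demo = pd.DataFrame(
    {"Species": ["Iris-setosa"] * 3 + ["Iris-versicolor"] * 2 + ["Iris-virginica"]}
)
counts = class_balance(demo)
print(len(demo), counts.to_dict())
# 6 {'Iris-setosa': 3, 'Iris-versicolor': 2, 'Iris-virginica': 1}
```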
4.Type of Dataset:
This is a classification dataset: the sepal length, sepal width, petal length and petal width of each flower are used to classify it as one of the three Iris species.
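Many classifiers expect the class label as an integer rather than a string. The report does not specify an encoding, so this is a minimal sketch assuming `pd.factorize` is used; `encode_labels` and `labels` are hypothetical names. With `sort=True`, codes are assigned in alphabetical order of the species names, so the mapping is stable regardless of row order.

```python
import pandas as pd


def encode_labels(species: pd.Series):
    """Map each species name to an integer code for use as a classification target.

    sort=True assigns codes in alphabetical order of the names.
    """
    codes, classes = pd.factorize(species, sort=True)
    return codes, list(classes)


labels = pd.Series(["Iris-virginica", "Iris-setosa", "Iris-versicolor", "Iris-setosa"])
codes, classes = encode_labels(labels)
print(codes.tolist(), classes)
# [2, 0, 1, 0] ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
```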
Data Preprocessing Techniques:
Libraries used:
1.Pandas - Provides data structures such as DataFrame and Series, which are efficient for handling structured (tabular) data.
2.NumPy - Provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions that operate on these arrays efficiently.
3.Matplotlib - Provides a wide range of plotting functions for generating line plots, scatter plots, bar plots, histograms, heatmaps and more.
4.Seaborn - Built on top of Matplotlib, Seaborn simplifies creating complex statistical visualizations by offering functions that automatically handle data aggregation and summarization, as well as styling and colour palettes.
5.Plotly Express (plotly.express) - A high-level interface to Plotly for building interactive figures in a single function call, where colour can encode a column such as the species.
Data Preprocessing:
1.Importing Libraries:
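# Core libraries for data handling and visualisation
import os

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Use a plain white background and short colour codes for Seaborn plots
sns.set(style="white", color_codes=True)

# List the files in the input directory to confirm Iris.csv is present
print(os.listdir("../input/"))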
2.Importing the Dataset:
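# Load the CSV into a DataFrame and preview the first five rows
iris = pd.read_csv("../input/Iris.csv")
iris.head()
# (rows, columns) of the loaded data
iris.shape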
The shape is (150, 6): 150 rows, one per flower, and 6 columns, namely Id, the four measurement features and the Species label.
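Before analysis, it is common to confirm the shape and look for missing values and duplicate rows. A minimal sketch of such checks; `quality_report` and the `demo` frame are hypothetical. Because Id is unique per row, duplicates should be checked on the data without it, e.g. `quality_report(iris.drop(columns="Id"))`.

```python
import pandas as pd


def quality_report(df: pd.DataFrame) -> dict:
    """Basic pre-modelling checks: shape, missing values per column, duplicate rows."""
    return {
        "shape": df.shape,
        "missing": df.isnull().sum().to_dict(),
        "duplicates": int(df.duplicated().sum()),  # rows identical to an earlier row
    }


# Hypothetical frame with one repeated row and one missing value.
demo = pd.DataFrame(
    {
        "SepalLengthCm": [5.1, 5.1, None],
        "Species": ["Iris-setosa", "Iris-setosa", "Iris-setosa"],
    }
)
report = quality_report(demo)
print(report)
```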
3.Data Analysis:
Scatter plot of Iris features:
# Sepal length against sepal width for every flower
iris.plot(kind="scatter", x="SepalLengthCm", y="SepalWidthCm")
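The single-colour scatter plot does not show which point belongs to which species. A minimal sketch that draws one colour per species with Matplotlib, assuming the Iris.csv column names; `species_scatter` and the `demo` rows are hypothetical, and the non-interactive `Agg` backend is selected only so the sketch runs without a display. Seaborn's `sns.scatterplot(data=iris, x="SepalLengthCm", y="SepalWidthCm", hue="Species")` produces a similar plot in one call.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def species_scatter(df: pd.DataFrame, x: str = "SepalLengthCm", y: str = "SepalWidthCm"):
    """Scatter x against y with one colour and legend entry per species."""
    fig, ax = plt.subplots()
    for name, group in df.groupby("Species"):  # groups come out in sorted order
        ax.scatter(group[x], group[y], label=name)
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.legend()
    return fig, ax


# Hypothetical illustrative rows in the Iris.csv layout.
demo = pd.DataFrame(
    [
        (5.1, 3.5, "Iris-setosa"),
        (4.9, 3.0, "Iris-setosa"),
        (7.0, 3.2, "Iris-versicolor"),
        (6.4, 3.2, "Iris-versicolor"),
        (6.3, 3.3, "Iris-virginica"),
        (5.8, 2.7, "Iris-virginica"),
    ],
    columns=["SepalLengthCm", "SepalWidthCm", "Species"],
)
fig, ax = species_scatter(demo)
```

In a notebook, calling `species_scatter(iris)` followed by `plt.show()` (with an interactive backend) would display the full plot.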