Data Handling Using Pandas-I-ORG

The document discusses Pandas, a Python library used for data analysis and manipulation. It provides two main data structures - Series for one-dimensional data and DataFrame for two-dimensional data. Series contains a homogeneous data array and associated index. DataFrame contains heterogeneous data arranged in rows and columns like a spreadsheet. It allows easy operations on data like selection, deletion, merging etc.

Uploaded by

Ayush Kumar

CHAPTER-1 Data Handling using Pandas –I

Pandas:
• It is a package useful for data analysis and manipulation.
• Pandas provides an easy way to create, manipulate and wrangle the data.
• Pandas provides powerful and easy-to-use data structures, as well as the means to quickly perform operations on these structures.

Data scientists use Pandas for its following advantages:

• Easily handles missing data.

• It uses Series for one-dimensional data and Data Frame for two-dimensional data.
• It provides an efficient way to slice the data.
• It provides a flexible way to merge, concatenate or reshape the data.

DATA STRUCTURE IN PANDAS
A data structure is a way to arrange data so that it can be accessed quickly and so that we can perform various operations on it, such as retrieval, deletion and modification.

Pandas has dealt with 3 data structures-

1. Series
2. Data Frame
3. Panel (deprecated in pandas 0.20 and removed in pandas 0.25)

Only Series and Data Frame are in our syllabus.

RANJANA BAJAJ PGT(CS/IP)

Series
Series- Series is a one-dimensional array-like structure with homogeneous data, which can be used to handle and manipulate data. What makes it special is its index, which labels every value and can be replaced with labels of our own choice.

It has two parts-

1. Data part (An array of actual data)
2. Associated index with data (associated array of indexes or data
labels)

e.g.-

Index Data

0 10

1 15

2 18

3 22


How to create Series with ndarray

Program-

import pandas as pd
import numpy as np
arr=np.array([10,15,18,22])
s = pd.Series(arr)
print(s)

Output-

Default Index   Data
0               10
1               15
2               18
3               22

Here we create an array of 4 values.

How to create Series with user-defined index

Program-

import pandas as pd
import numpy as np
arr=np.array(['a','b','c','d'])
s=pd.Series(arr, index=['first','second','third','fourth'])
print(s)

Output-

first     a
second    b
third     c
fourth    d


Creating a series from Scalar value

To create a series from a scalar value, an index must be provided. The scalar value will be repeated as per the length of the index.
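A minimal sketch of the scalar case, assuming an illustrative value 5 and hypothetical labels 'a' to 'd' (neither comes from the original notes):

```python
import pandas as pd

# The scalar 5 is repeated once for every label in the index.
# The value and the labels are illustrative assumptions.
s = pd.Series(5, index=['a', 'b', 'c', 'd'])
print(s)
```

The length of the resulting series equals the length of the index that was passed.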

Creating a series from a Dictionary
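The original example for this heading is not available, so the following is only a sketch with a hypothetical dictionary of subject marks. The dictionary keys become the index labels and the dictionary values become the data.

```python
import pandas as pd

# Hypothetical data: keys become index labels, values become the data.
marks = {'Maths': 90, 'Science': 85, 'English': 78}
s = pd.Series(marks)
print(s)
```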

Head and Tail Functions in Series

head(): It is used to access the first 5 rows of a series.

Note : To access the first 3 rows we can call series_name.head(3)

Result of s.head()

Result of s.head(3)

tail(): It is used to access the last 5 rows of a series.

Note : To access the last 4 rows we can call series_name.tail(4)
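The calls described above can be tried on a small series. This is a sketch with assumed data (seven values, 10 to 70), not the series used in the original slides.

```python
import pandas as pd

# Assumed sample data: 10, 20, ..., 70 with the default index 0..6.
s = pd.Series(range(10, 80, 10))

first_five = s.head()     # first 5 rows (default)
first_three = s.head(3)   # first 3 rows
last_five = s.tail()      # last 5 rows (default)
last_four = s.tail(4)     # last 4 rows

print(first_three)
print(last_four)
```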

Selection in Series

Series provides the loc and iloc indexers and the [] operator to access its values.

1. Selection using loc (label based) :-

Syntax:- series_name.loc[StartLabel : StopLabel]

With loc, both the start label and the stop label are included in the result.

Example-

To Print Values from Index 0 to 2

To Print Values from Index 3 to 4

2. Selection using iloc (position based) :-

Syntax:- series_name.iloc[StartRange : StopRange]

With iloc, the stop position is excluded from the result.

Example-

To Print Values from Index 0 to 1.

3. Selection Using [] :

Syntax:- series_name[StartRange : StopRange] or series_name[index]

Example-

To Print Values at Index 3.
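The three ways of selecting can be sketched on one series. The data below is assumed for illustration, and the series uses the default index 0 to 4.

```python
import pandas as pd

# Assumed sample data with the default index 0..4.
s = pd.Series([10, 15, 18, 22, 30])

by_label = s.loc[0:2]        # labels 0, 1 and 2 (stop label is included)
by_label_tail = s.loc[3:4]   # labels 3 and 4
by_position = s.iloc[0:2]    # positions 0 and 1 (stop position is excluded)
single = s[3]                # value stored under index label 3

print(by_label)
print(by_position)
print(single)
```

Note the difference in the stop value: loc[0:2] returns three values, while iloc[0:2] returns two.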

Indexing in Series

Pandas provides the index attribute to get or set the index labels of the values in a series.

Example-
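A short sketch of getting and setting the index, assuming illustrative data and hypothetical labels 'a' to 'd':

```python
import pandas as pd

s = pd.Series([10, 15, 18, 22])
original_labels = list(s.index)    # get: the default index 0, 1, 2, 3

# set: assign a new list of labels (must have the same length as the series)
s.index = ['a', 'b', 'c', 'd']

print(s)
print(s['c'])
```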
Slicing in Series

Slicing is a way to retrieve subsets of data from a pandas object. The slice syntax is –

SERIES_NAME [start: end: step]

Here start is the position of the first item, end is the position at which the slice stops (the item at end itself is not included), and step is the increment between successive items.

Example :-
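A sketch of positional slicing on assumed data; the values are illustrative only.

```python
import pandas as pd

# Assumed sample data with the default index 0..5.
s = pd.Series([10, 15, 18, 22, 30, 35])

first_four = s[0:4]      # positions 0, 1, 2, 3
every_second = s[::2]    # positions 0, 2, 4
stepped = s[1:5:2]       # positions 1 and 3
reversed_s = s[::-1]     # all items in reverse order

print(first_four)
print(stepped)
```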
DATAFRAME

DATAFRAME- It is a two-dimensional object that is useful in representing data in the form of rows and columns. It is similar to a spreadsheet or an SQL table. This is the most commonly used pandas object. Once we store the data in the Data frame, we can perform various operations that are useful in analyzing and understanding the data.

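DATAFRAME STRUCTURE

INDEX   PLAYERNAME   IPLTEAM   BASEPRICEINC
0       ROHIT        MI        13
1       VIRAT        RCB       17
2       HARDIK       MI        14

The header row holds the COLUMNS, the first column (0, 1, 2) is the INDEX, and the remaining cells are the DATA.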

PROPERTIES OF DATAFRAME

1. A Dataframe has two axes (indices)-

➢ Row index (axis=0)
➢ Column index (axis=1)
2. It is similar to a spreadsheet, whose row index is called index and column index is called column name.
3. A Dataframe contains heterogeneous data.
4. A Dataframe's size is mutable.
5. A Dataframe's data is mutable.

A data frame can be created using any of the following-

1. Series
2. Lists
3. Dictionary
4. A numpy 2D array

How to create Dataframe From Series

Program-

import pandas as pd
s = pd.Series(['a','b','c','d'])
df=pd.DataFrame(s)
print(df)

Output-

   0        Default Column Name As 0
0  a
1  b
2  c
3  d

Data Frame from Dictionary of Series

Example-
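The original example is not available here, so this is a sketch with hypothetical column names and values. Each dictionary key becomes a column, and the row index is the union of the indexes of all the series; missing entries are filled with NaN.

```python
import pandas as pd

# Hypothetical data: two series with partly different index labels.
d = {
    'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
    'two': pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd']),
}
df = pd.DataFrame(d)
print(df)   # row 'd' has NaN in column 'one'
```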

Data Frame from List of Dictionaries

Example-
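A sketch with hypothetical keys and values, since the original example is not available. Each dictionary becomes one row, the keys become the column names, and a key missing from a dictionary gives NaN in that row.

```python
import pandas as pd

# Hypothetical data: the first dictionary has no key 'c'.
data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)
print(df)
```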
Iteration on Rows and Columns

If we want to access records from a data frame row wise or column wise, then iteration is used. Pandas provides 2 functions to perform iterations-

1. iterrows()
2. items() (named iteritems() before pandas 2.0)

iterrows()

It is used to access the data row wise. Example-

items()

It is used to access the data column wise. In older versions of pandas this method was called iteritems(); that name was removed in pandas 2.0.

Example-
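Both kinds of iteration can be sketched on the player table shown earlier in these notes. The code uses items() for column-wise iteration, which works in older and newer pandas versions; the helper lists row_names and column_labels are only there for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'PLAYERNAME': ['ROHIT', 'VIRAT', 'HARDIK'],
    'IPLTEAM': ['MI', 'RCB', 'MI'],
    'BASEPRICEINC': [13, 17, 14],
})

# Row wise: each step gives the index label and the row as a Series.
row_names = []
for index, row in df.iterrows():
    print(index, row['PLAYERNAME'], row['IPLTEAM'])
    row_names.append(row['PLAYERNAME'])

# Column wise: each step gives the column name and the column as a Series.
column_labels = []
for label, column in df.items():
    print(label, list(column))
    column_labels.append(label)
```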
Select operation in data frame

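To access the data of a column, we can mention the column name as a subscript, e.g. df['empid']. This can also be done by using df.empid.
To access multiple columns we can pass a list of column names: df[['col1', 'col2', ...]]

Example -
import pandas as pd
empdata={ 'empid':[101,102,103,104,105,106],
          'ename':['Sachin','Vinod','Lakhbir','Anil','Devinder','UmaSelvi'],
          'Doj':['12-01-2012','15-01-2012','05-09-2007','17-01-2012','05-09-2007','16-01-2012'] }
df=pd.DataFrame(empdata)

>>df[['empid','ename']]
   empid     ename
0    101    Sachin
1    102     Vinod
2    103   Lakhbir
3    104      Anil
4    105  Devinder
5    106  UmaSelvi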
To Add & Rename a column in data frame

import pandas as pd
s = pd.Series([10,15,18,22])
df=pd.DataFrame(s)
df.columns=['List1']                 # To rename the default column of the Data Frame as List1
df['List2']=20                       # To create a new column List2 with all values as 20
df['List3']=df['List1']+df['List2']  # Add List1 and List2 and store in new column List3
print(df)

Output-

   List1  List2  List3
0     10     20     30
1     15     20     35
2     18     20     38
3     22     20     42
To Delete a Column in data frame

We can delete a column from a data frame by using any of the following –
1. del
2. pop()
3. drop()

>> del df['List3']        We can simply delete a column by passing the column name in subscript with df
>>df
Output-

   List1  List2
0     10     20
1     15     20
2     18     20
3     22     20

>>df.pop('List2')         We can simply delete a column by passing the column name in the pop method.
>>df

   List1
0     10
1     15
2     18
3     22
To Delete a Column using drop()

import pandas as pd
s= pd.Series([10,20,30,40])
df=pd.DataFrame(s)
df.columns=['List1']
df['List2']=50
df1=df.drop('List2',axis=1)     # axis=1 means delete data column wise
df2=df1.drop([2,3],axis=0)      # axis=0 means delete data row wise, for the given index labels
print(df)
print(" After deletion::")
print(df1)
print(" After row deletion::")
print(df2)

Output-
   List1  List2
0     10     50
1     20     50
2     30     50
3     40     50
 After deletion::
   List1
0     10
1     20
2     30
3     40
 After row deletion::
   List1
0     10
1     20
Accessing the data frame through loc and iloc, or indexing using Labels

Pandas provides the loc and iloc indexers to access a subset of a data frame by row/column. Both are used with square brackets, not with parentheses.

Accessing the data frame through loc

It is used to access a group of rows and columns by their labels.

Syntax-

df.loc[StartRow : EndRow, StartColumn : EndColumn]

Note - If we pass : in the row or column part, then pandas returns all the rows or all the columns respectively. With loc, both the start label and the end label are included.

To access a single row

To access multiple Rows Qtr1 to Qtr3

Example 2:-

To access single column

To access Multiple Column namely TCS and WIPRO

Example-3

To access first row

To access first 3 Rows
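The original screenshots for these examples are not available. The sketch below uses the row labels Qtr1 to Qtr4 and the column names TCS and WIPRO mentioned above; the INFOSYS column and all the numbers are hypothetical.

```python
import pandas as pd

# Row/column names follow the text; INFOSYS and all values are assumptions.
sales = pd.DataFrame(
    {'TCS': [2500, 2600, 2700, 2800],
     'WIPRO': [2800, 2900, 3000, 3100],
     'INFOSYS': [2100, 2200, 2300, 2400]},
    index=['Qtr1', 'Qtr2', 'Qtr3', 'Qtr4'])

single_row = sales.loc['Qtr1']            # one row, returned as a Series
rows = sales.loc['Qtr1':'Qtr3']           # Qtr1, Qtr2 and Qtr3 (end label included)
single_col = sales.loc[:, 'TCS']          # all rows, one column
two_cols = sales.loc[:, 'TCS':'WIPRO']    # all rows, columns TCS and WIPRO

print(rows)
print(two_cols)
```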

Accessing the data frame through iloc

It is used to access a group of rows and columns based on numeric index value (position).

Syntax-

df.iloc[StartRowIndex : EndRowIndex, StartColumnIndex : EndColumnIndex]

Note - If we pass : in the row or column part, then pandas returns all the rows or all the columns respectively. With iloc, the end position is excluded.

To access First two Rows and Second column

To access all Rows and First Two columns Record
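The two selections named above can be sketched as follows. The data frame is an assumption made for illustration (quarterly figures with hypothetical values); only the iloc calls matter.

```python
import pandas as pd

# Hypothetical data frame used only to demonstrate positional selection.
sales = pd.DataFrame(
    {'TCS': [2500, 2600, 2700, 2800],
     'WIPRO': [2800, 2900, 3000, 3100],
     'INFOSYS': [2100, 2200, 2300, 2400]},
    index=['Qtr1', 'Qtr2', 'Qtr3', 'Qtr4'])

# First two rows (positions 0 and 1) and the second column (position 1).
first_two_rows_second_col = sales.iloc[0:2, 1]

# All rows and the first two columns (positions 0 and 1).
all_rows_first_two_cols = sales.iloc[:, 0:2]

print(first_two_rows_second_col)
print(all_rows_first_two_cols)
```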
head() and tail() Method

The method head() returns the first 5 rows and the method tail() returns the last 5 rows.

import pandas as pd
empdata={ 'empid':[101,102,103,104,105,106],
          'ename':['Sachin','Vinod','Lakhbir','Anil','Devinder','UmaSelvi'],
          'Doj':['12-01-2012','15-01-2012','05-09-2007','17-01-2012','05-09-2007','16-01-2012'] }
df=pd.DataFrame(empdata)
print(df)
print(df.head())
print(df.tail())
Output-
   empid     ename         Doj
0    101    Sachin  12-01-2012        Data Frame
1    102     Vinod  15-01-2012
2    103   Lakhbir  05-09-2007
3    104      Anil  17-01-2012
4    105  Devinder  05-09-2007
5    106  UmaSelvi  16-01-2012
   empid     ename         Doj
0    101    Sachin  12-01-2012        head() displays first 5 rows
1    102     Vinod  15-01-2012
2    103   Lakhbir  05-09-2007
3    104      Anil  17-01-2012
4    105  Devinder  05-09-2007
   empid     ename         Doj
1    102     Vinod  15-01-2012        tail() displays last 5 rows
2    103   Lakhbir  05-09-2007
3    104      Anil  17-01-2012
4    105  Devinder  05-09-2007
5    106  UmaSelvi  16-01-2012

To display the first 2 rows we can use head(2), to return the last 2 rows we can use tail(2), and to return the 3rd to 5th rows we can write df[2:5].

Using the same data frame df:

print(df.head(2))
print(df.tail(2))
print(df[2:5])

   empid   ename         Doj
0    101  Sachin  12-01-2012          head(2) displays first 2 rows
1    102   Vinod  15-01-2012

   empid     ename         Doj
4    105  Devinder  05-09-2007        tail(2) displays last 2 rows
5    106  UmaSelvi  16-01-2012

   empid     ename         Doj
2    103   Lakhbir  05-09-2007        df[2:5] displays the rows at index 2 to 4
3    104      Anil  17-01-2012
4    105  Devinder  05-09-2007
Boolean Indexing in Data Frame

Boolean indexing helps us to select data from a DataFrame using a boolean vector. To use boolean values as row labels, we create a DataFrame whose index consists of boolean values.

To Return Data frame where index is True

We can pass only integer value in iloc
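The original screenshots are not available. As a related sketch, the code below builds the boolean vector from a comparison, which is the most common form of boolean indexing; it does not use a boolean row index. The names and marks are hypothetical.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({
    'name': ['Amit', 'Bina', 'Chetan', 'Divya'],
    'marks': [45, 82, 67, 91],
})

mask = df['marks'] > 60     # boolean vector: one True/False per row
passed = df[mask]           # keeps only the rows where mask is True
same_rows = df.loc[mask]    # loc accepts the same boolean vector

print(mask)
print(passed)
```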
Concat operation in data frame

Pandas provides various facilities for easily combining Series and DataFrame objects.

pd.concat(objs, axis=0, join='outer', ignore_index=False)

• objs − This is a sequence or mapping of Series or DataFrame objects.
• axis − {0, 1}, default 0. This is the axis to concatenate along.
• join − {'inner', 'outer'}, default 'outer'. How to handle indexes on the other axis. Outer for union and inner for intersection.
• ignore_index − boolean, default False. If True, do not use the index values on the concatenation axis. The resulting axis will be labeled 0, ..., n - 1.

Older versions of pandas also accepted a join_axes argument (a list of Index objects to use for the other axes). It was removed in pandas 1.0; reindex the result instead.

The concat() function performs concatenation operations along an axis.
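A minimal sketch of the axis, join and ignore_index parameters described above. The three small data frames and their values are hypothetical.

```python
import pandas as pd

# Hypothetical data frames.
df1 = pd.DataFrame({'id': [1, 2], 'name': ['Asha', 'Ravi']})
df2 = pd.DataFrame({'id': [3, 4], 'name': ['Meena', 'John']})
df3 = pd.DataFrame({'id': [5], 'city': ['Delhi']})

stacked = pd.concat([df1, df2])                        # rows stacked, old index kept
renumbered = pd.concat([df1, df2], ignore_index=True)  # index relabelled 0..3
side_by_side = pd.concat([df1, df2], axis=1)           # columns placed side by side
common = pd.concat([df1, df3], join='inner',
                   ignore_index=True)                  # only the shared column 'id'

print(renumbered)
print(common)
```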

Merge operation in data frame

Two Data Frames might hold different kinds of information about the same entity and be linked by some common feature/column. To join these Data Frames, pandas provides multiple functions like merge(), join() etc.

Example-1

This will give the common rows between the two data frames for the corresponding column values ('id').
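The original screenshot is not available; the following sketch uses two hypothetical data frames that share the column 'id'. By default merge() keeps only the rows whose 'id' value appears in both frames.

```python
import pandas as pd

# Hypothetical data frames linked by the common column 'id'.
students = pd.DataFrame({'id': [1, 2, 3], 'name': ['Asha', 'Ravi', 'Meena']})
marks = pd.DataFrame({'id': [2, 3, 4], 'marks': [78, 91, 66]})

common = pd.merge(students, marks, on='id')
print(common)   # only id 2 and id 3 occur in both frames
```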
Example-2

It might happen that the columns on which you want to merge the Data Frames have different names (unlike in this case). For such merges, you will have to specify the argument left_on as the column name in the left DataFrame and right_on as the column name in the right DataFrame.
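A sketch of a merge where the key columns have different names. The frames and the column names emp_id and id are hypothetical.

```python
import pandas as pd

# Hypothetical data: the key is 'emp_id' on the left and 'id' on the right.
employees = pd.DataFrame({'emp_id': [1, 2, 3], 'name': ['Asha', 'Ravi', 'Meena']})
salaries = pd.DataFrame({'id': [2, 3, 4], 'salary': [30000, 45000, 52000]})

merged = pd.merge(employees, salaries, left_on='emp_id', right_on='id')
print(merged)   # both key columns are kept in the result
```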
Join operation in data frame

It is used to merge data frames based on some common column/key.

1. Full Outer Join:- The full outer join combines the results of both the left and the right outer joins. The joined data frame will contain all records from both the data frames and fill in NaNs for missing matches on either side. You can perform a full outer join by specifying the how argument as 'outer' in the merge() function.

Example-

The resulting DataFrame has all the entries from both the tables, with NaN values for missing matches on either side. One more thing to notice is the suffix appended to column names that occur in both DataFrames, to show which column came from which DataFrame. The default suffixes are _x and _y; you can modify them by specifying the suffixes argument in the merge() function.
Example-2
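A sketch of a full outer join and of custom suffixes, using two hypothetical frames that share the key 'id' and both have a column called 'value'.

```python
import pandas as pd

# Hypothetical data: 'value' exists in both frames, so it receives suffixes.
left = pd.DataFrame({'id': [1, 2, 3], 'value': [10, 20, 30]})
right = pd.DataFrame({'id': [2, 3, 4], 'value': [200, 300, 400]})

outer = pd.merge(left, right, on='id', how='outer')
print(outer)    # columns: id, value_x, value_y

custom = pd.merge(left, right, on='id', how='outer',
                  suffixes=('_left', '_right'))
print(custom)   # columns: id, value_left, value_right
```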
2. Inner Join :- The inner join produces only those records that match in both the data frames. You have to pass 'inner' in the how argument of the merge() function.

Example-
3. Right Join :- The right join produces a complete set of records from data frame B (right side data frame) with the matching records (where available) from data frame A (left side data frame). If there is no match, the columns coming from the left side will contain NaN. You have to pass 'right' in the how argument of the merge() function.

Example-
4. Left Join :- The left join produces a complete set of records from data frame A (left side data frame) with the matching records (where available) from data frame B (right side data frame). If there is no match, the columns coming from the right side will contain NaN. You have to pass 'left' in the how argument of the merge() function.

Example-
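The inner, right and left joins can be compared on one pair of frames. This is a sketch with hypothetical data; A is the left side data frame and B is the right side data frame.

```python
import pandas as pd

# Hypothetical data: A is the left frame, B is the right frame.
A = pd.DataFrame({'id': [1, 2, 3], 'name': ['Asha', 'Ravi', 'Meena']})
B = pd.DataFrame({'id': [2, 3, 4], 'marks': [78, 91, 66]})

inner_join = pd.merge(A, B, on='id', how='inner')  # ids present in both frames
right_join = pd.merge(A, B, on='id', how='right')  # every id of B; name is NaN for id 4
left_join = pd.merge(A, B, on='id', how='left')    # every id of A; marks is NaN for id 1

print(inner_join)
print(right_join)
print(left_join)
```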
5. Joining on Index :- Sometimes you have to perform the join on the indexes or the row labels. For that you have to specify right_index (for the index of the right data frame) and left_index (for the index of the left data frame) as True.

Example-
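A sketch of joining on the index, using two hypothetical frames whose row labels partly overlap. Since how is not given, merge() performs an inner join on the index labels.

```python
import pandas as pd

# Hypothetical data: the row labels (index) act as the join key.
A = pd.DataFrame({'name': ['Asha', 'Ravi', 'Meena']}, index=[1, 2, 3])
B = pd.DataFrame({'marks': [78, 91, 66]}, index=[2, 3, 4])

joined = pd.merge(A, B, left_index=True, right_index=True)
print(joined)   # only the labels 2 and 3 occur in both indexes
```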
CSV File

A CSV is a comma separated values file, which allows data to be saved in a tabular format. It is a plain text file that holds the kind of data found in a spreadsheet or a database table. Files in the csv format can be imported into and exported from programs that store data in tables, such as Microsoft Excel or OpenOffice.

The data fields in a CSV file are most often separated, or delimited, by a comma. The data in each row are delimited by commas and individual rows are separated by newlines.

To create a csv file, first choose your favorite text editor, such as Notepad, and open a new file. Then enter the text data you want the file to contain, separating each value with a comma and each row with a new line. Save the file with the extension .csv. You can then open the file using MS Excel or another spreadsheet program, which will display the data as a table.

The pd.read_csv() function is used to read a csv file.
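A self-contained sketch of pd.read_csv(). To avoid depending on a file on disk, the CSV text is held in a string and wrapped in io.StringIO; with a real file you would pass its path instead. The rows reuse employee data from these notes.

```python
import io
import pandas as pd

# CSV content: values separated by commas, rows separated by newlines.
csv_text = "empid,ename\n101,Sachin\n102,Vinod\n103,Lakhbir\n"

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

The first line of the CSV is used as the column names by default.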
Exporting data from data frame to CSV File

To export a data frame into a csv file, we first create a data frame, say df1, and then call df1.to_csv(r'E:\Dataframe1.csv') to export the data frame df1 into the csv file Dataframe1.csv.

The content of df1 is now exported to the csv file Dataframe1.csv.
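A sketch of the export step. Instead of the drive path used above, it writes into a temporary folder so that it runs on any system; index=False is an optional choice made here to leave the row index out of the file.

```python
import os
import tempfile
import pandas as pd

df1 = pd.DataFrame({'empid': [101, 102], 'ename': ['Sachin', 'Vinod']})

# Assumption: a temporary folder is used in place of E:\ for portability.
path = os.path.join(tempfile.mkdtemp(), 'Dataframe1.csv')
df1.to_csv(path, index=False)

# Read the file back as plain text to see what was written.
with open(path) as f:
    content = f.read()
print(content)
```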
