Data Handling Using Pandas-I-ORG
Pandas:
• It is a package useful for data analysis and manipulation.
• Pandas provides an easy way to create, manipulate and wrangle data.
• Pandas provides powerful and easy-to-use data structures, as well as the means to quickly perform operations on these structures.
DATA STRUCTURE IN PANDAS
A data structure is a way of arranging data so that it can be accessed quickly and various operations, such as retrieval, deletion and modification, can be performed on it. Pandas provides the following data structures:
1. Series
2. Data Frame
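3. Panel (Panel has been removed from recent versions of pandas; Series and DataFrame are the structures in current use.)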
e.g.-
Index Data
0 10
1 15
2 18
3 22
Program-
import pandas as pd
import numpy as np
arr=np.array([10,15,18,22])
s = pd.Series(arr)
print(s)
Output-
0    10
1    15
2    18
3    22
Here the series is created from a NumPy array of 4 values. Since no index is given, pandas assigns the default index 0 to 3.
head(): It is used to access the first 5 rows of a series. Passing a number n, as in s.head(3), returns the first n rows instead.
tail(): It is used to access the last 5 rows of a series.
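A minimal sketch of head() and tail() on a series. The seven values are hypothetical, chosen so that the default of 5 rows and an explicit count give visibly different results.

```python
import pandas as pd

# Hypothetical series with 7 values (default index 0 to 6)
s = pd.Series([10, 15, 18, 22, 30, 35, 40])

print(s.head())    # first 5 rows: index 0 to 4
print(s.head(3))   # first 3 rows: index 0 to 2
print(s.tail())    # last 5 rows: index 2 to 6
print(s.tail(2))   # last 2 rows: index 5 and 6
```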
A Series provides [], loc[] and iloc[] to access its elements: loc[] selects by index label, iloc[] selects by integer position, and [] accepts an index label.
Syntax-
series_name[index]
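A small sketch contrasting the three ways of accessing a series. The string labels 'a' to 'd' are hypothetical; they are used so that index labels and integer positions are clearly different.

```python
import pandas as pd

# Hypothetical series with string labels instead of the default 0..3
s = pd.Series([10, 15, 18, 22], index=['a', 'b', 'c', 'd'])

print(s['b'])              # [] with an index label
print(s.loc['c'])          # loc: label-based access
print(s.iloc[0])           # iloc: position-based access (first element)
print(s.loc[['a', 'd']])   # several labels at once return a smaller series
```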
Slicing in Series
A slice is written as series_name[start:end:step], where start is the position of the first item, end is the position at which slicing stops (the item at end is not included), and step is the increment between successive items.
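A short sketch of start, end and step in practice, using hypothetical values. It uses iloc[] so that the slice is unambiguously position-based.

```python
import pandas as pd

s = pd.Series([10, 15, 18, 22, 30, 35])

print(s.iloc[1:4])    # positions 1, 2, 3 (end position 4 is excluded)
print(s.iloc[::2])    # step 2: positions 0, 2, 4
print(s.iloc[::-1])   # step -1: all items in reverse order
```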
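DATAFRAME
A DataFrame is a two-dimensional labelled data structure that stores data in rows and columns, like a table.
DATAFRAME STRUCTURE
INDEX   DATA
0       ROHIT    MI     13
1       VIRAT    RCB    17
2       HARDIK   MI     14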
PROPERTIES OF DATAFRAME
A DataFrame can be created from any of the following:
1. Series
2. Lists
3. Dictionary
4. A numpy 2D array
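A minimal sketch of creating a DataFrame from a list of lists, a dictionary and a 2D NumPy array, reusing the rows from the structure shown earlier. The column names Name, Team and Number, and the array values, are hypothetical because the original table does not name its columns.

```python
import numpy as np
import pandas as pd

# Hypothetical column names for the DATAFRAME STRUCTURE rows
cols = ['Name', 'Team', 'Number']

# From a list of lists: each inner list becomes one row
df_lists = pd.DataFrame([['ROHIT', 'MI', 13],
                         ['VIRAT', 'RCB', 17],
                         ['HARDIK', 'MI', 14]], columns=cols)

# From a dictionary: each key becomes a column name, each list a column
df_dict = pd.DataFrame({'Name': ['ROHIT', 'VIRAT', 'HARDIK'],
                        'Team': ['MI', 'RCB', 'MI'],
                        'Number': [13, 17, 14]})

# From a 2D NumPy array: array rows become DataFrame rows
df_array = pd.DataFrame(np.array([[1, 2], [3, 4], [5, 6]]), columns=['A', 'B'])

print(df_lists)
print(df_dict)
print(df_array)
```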
import pandas as pd
s = pd.Series(['a','b','c','d'])
df=pd.DataFrame(s)
print(df)
Output-
   0
0  a
1  b
2  c
3  d
The default column name is 0.
Data Frame from Dictionary of Series
Example-
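A minimal sketch of a DataFrame built from a dictionary of Series: each key becomes a column name and the rows are aligned on the index labels. The data is hypothetical; the shorter Series shows that a missing label becomes NaN.

```python
import pandas as pd

names = pd.Series(['ROHIT', 'VIRAT', 'HARDIK'])
teams = pd.Series(['MI', 'RCB'])   # no value for index label 2

# Each Series becomes one column; label 2 has no team, so it is filled with NaN
df = pd.DataFrame({'Name': names, 'Team': teams})
print(df)
```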
Iteration on Rows and Columns
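1. iterrows(): iterates over the rows as (index, row) pairs.
2. iteritems(): iterates over the columns as (column name, column) pairs. In pandas 2.0 and later, use items() instead, because iteritems() has been removed.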
iterrows()
Example-
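A short sketch of iterating over rows with iterrows() and over columns with items() (the current name of iteritems()). The two-row DataFrame is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['ROHIT', 'VIRAT'], 'Team': ['MI', 'RCB']})

# iterrows(): each step gives the row's index label and the row as a Series
for idx, row in df.iterrows():
    print(idx, row['Name'], row['Team'])

# items(): each step gives a column name and the column as a Series
for col, values in df.items():
    print(col, list(values))
```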
Select operation in a data frame
Example -
import pandas as pd
empdata={ 'empid':[101,102,103,104,105,106],
'ename':['Sachin','Vinod','Lakhbir','Anil','Devinder','UmaSelvi'],
'Doj':['12-01-2012','15-01-2012','05-09-2007','17-01-2012','05-09-2007','16-01-2012'] }
df=pd.DataFrame(empdata)
print(df)
Output-
   empid     ename         Doj
0    101    Sachin  12-01-2012
1    102     Vinod  15-01-2012
2    103   Lakhbir  05-09-2007
3    104      Anil  17-01-2012
4    105  Devinder  05-09-2007
5    106  UmaSelvi  16-01-2012
>>> df.empid  or  >>> df['empid']
0 101
1 102
2 103
3 104
4 105
5 106
Name: empid, dtype: int64
>>> df[['empid','ename']]
empid ename
0 101 Sachin
1 102 Vinod
2 103 Lakhbir
3 104 Anil
4 105 Devinder
5 106 UmaSelvi
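To Add & Rename a column in a data frame
import pandas as pd
s = pd.Series([10,15,18,22])
df=pd.DataFrame(s)
df.columns=['List1']    # rename the default column 0 to List1
df['List2']=20          # add a new column with the value 20 in every row
print(df)
Output-
   List1  List2
0     10     20
1     15     20
2     18     20
3     22     20
A new column can also be computed from existing columns, for example df['List3']=df['List1']+df['List2'].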
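The program above renames a column by assigning a new list to df.columns. As a sketch of an alternative, DataFrame.rename() maps old names to new ones and returns a new DataFrame; the computed List3 column is added here as well.

```python
import pandas as pd

s = pd.Series([10, 15, 18, 22])
df = pd.DataFrame(s)                        # default column name is 0
df = df.rename(columns={0: 'List1'})        # rename() returns a new DataFrame
df['List2'] = 20                            # constant column
df['List3'] = df['List1'] + df['List2']     # column computed from existing ones
print(df)
```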
>>> df.pop('List2')
We can delete a column by passing its name to the pop() method; pop() also returns the deleted column.
>>> df
   List1
0     10
1     15
2     18
3     22
To Delete a Column using drop()
import pandas as pd
s= pd.Series([10,20,30,40])
df=pd.DataFrame(s)
df.columns=['List1']
df['List2']=50
df1=df.drop('List2',axis=1)    # axis=1 means delete data column-wise
df2=df.drop([2,3],axis=0)      # axis=0 means delete data row-wise, using the given index labels
print(df)
print("After deletion::")
print(df1)
print("After row deletion::")
print(df2)
Output-
   List1  List2
0     10     50
1     20     50
2     30     50
3     40     50
After deletion::
   List1
0     10
1     20
2     30
3     40
After row deletion::
   List1  List2
0     10     50
1     20     50
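Accessing the data frame through loc[] and iloc[] or indexing using labels
Pandas provides the loc[] and iloc[] indexers to access a subset of a data frame by rows and columns. loc[] selects by index labels (a label slice includes its end label), while iloc[] selects by integer positions (a position slice excludes its end position).
Syntax-
df.loc[row_labels, column_labels]
df.iloc[row_positions, column_positions]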
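A minimal sketch of loc[] and iloc[] on the employee data used in this section, showing that the end label is included with loc[] while the end position is excluded with iloc[].

```python
import pandas as pd

empdata = {'empid': [101, 102, 103, 104, 105, 106],
           'ename': ['Sachin', 'Vinod', 'Lakhbir', 'Anil', 'Devinder', 'UmaSelvi'],
           'Doj': ['12-01-2012', '15-01-2012', '05-09-2007',
                   '17-01-2012', '05-09-2007', '16-01-2012']}
df = pd.DataFrame(empdata)

# loc: row labels 1 to 3 (end label included), columns chosen by name
print(df.loc[1:3, ['empid', 'ename']])

# iloc: row positions 1 and 2 (end position 3 excluded), column positions 0 and 1
print(df.iloc[1:3, 0:2])

# A single cell, by label and by position
print(df.loc[0, 'ename'])
print(df.iloc[0, 1])
```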
The method head() gives the first 5 rows and the method tail() returns the last 5
rows.
import pandas as pd
empdata={ 'empid':[101,102,103,104,105,106],
'ename':['Sachin','Vinod','Lakhbir','Anil','Devinder','UmaSelvi'],
'Doj':['12-01-2012','15-01-2012','05-09-2007','17-01-2012','05-09-2007','16-01-2012'] }
df=pd.DataFrame(empdata)
print(df)
print(df.head())
print(df.tail())
Output-
Data Frame:
   empid     ename         Doj
0    101    Sachin  12-01-2012
1    102     Vinod  15-01-2012
2    103   Lakhbir  05-09-2007
3    104      Anil  17-01-2012
4    105  Devinder  05-09-2007
5    106  UmaSelvi  16-01-2012
head() displays the first 5 rows:
   empid     ename         Doj
0    101    Sachin  12-01-2012
1    102     Vinod  15-01-2012
2    103   Lakhbir  05-09-2007
3    104      Anil  17-01-2012
4    105  Devinder  05-09-2007
tail() displays the last 5 rows:
   empid     ename         Doj
1    102     Vinod  15-01-2012
2    103   Lakhbir  05-09-2007
3    104      Anil  17-01-2012
4    105  Devinder  05-09-2007
5    106  UmaSelvi  16-01-2012
To display the first 2 rows we can use head(2), to display the last 2 rows we can use tail(2), and to display the 3rd to 5th rows we can write df[2:5] (rows at positions 2, 3 and 4; the end position is excluded).
import pandas as pd
empdata={ 'empid':[101,102,103,104,105,106],
'ename':['Sachin','Vinod','Lakhbir','Anil','Devinder','UmaSelvi'],
'Doj':['12-01-2012','15-01-2012','05-09-2007','17-01-2012','05-09-2007','16-01-2012'] }
df=pd.DataFrame(empdata)
print(df)
print(df.head(2))
print(df.tail(2))
print(df[2:5])
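Output-
   empid     ename         Doj
0    101    Sachin  12-01-2012
1    102     Vinod  15-01-2012
2    103   Lakhbir  05-09-2007
3    104      Anil  17-01-2012
4    105  Devinder  05-09-2007
5    106  UmaSelvi  16-01-2012
head(2):
   empid   ename         Doj
0    101  Sachin  12-01-2012
1    102   Vinod  15-01-2012
tail(2):
   empid     ename         Doj
4    105  Devinder  05-09-2007
5    106  UmaSelvi  16-01-2012
df[2:5]:
   empid     ename         Doj
2    103   Lakhbir  05-09-2007
3    104      Anil  17-01-2012
4    105  Devinder  05-09-2007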
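Boolean indexing helps us select data from a DataFrame using a boolean vector, that is, a sequence of True/False values with one value per row; only the rows marked True are returned. To use boolean indexing, we can also create a DataFrame with a boolean index.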
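A minimal sketch of boolean indexing on the employee data from this section. The conditions chosen (empid greater than 103, a given joining date) are only illustrations.

```python
import pandas as pd

empdata = {'empid': [101, 102, 103, 104, 105, 106],
           'ename': ['Sachin', 'Vinod', 'Lakhbir', 'Anil', 'Devinder', 'UmaSelvi'],
           'Doj': ['12-01-2012', '15-01-2012', '05-09-2007',
                   '17-01-2012', '05-09-2007', '16-01-2012']}
df = pd.DataFrame(empdata)

# A condition on a column produces a boolean Series, one value per row
mask = df['empid'] > 103
print(df[mask])                          # rows where mask is True

# The condition can also be written inline
print(df[df['Doj'] == '05-09-2007'])

# A plain list of booleans of the same length works as well
print(df[[True, False, True, False, False, False]])
```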
Two data frames might hold different kinds of information about the same entity and be linked by some common feature/column. To join these data frames, pandas provides functions such as merge() and join().
Example-1
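A minimal sketch of merge() linking two data frames on a common column. Both frames, including the dept values, are hypothetical. With no how argument, merge() performs an inner join, so only empid values present in both frames are kept.

```python
import pandas as pd

# Hypothetical frames that share the column 'empid'
left = pd.DataFrame({'empid': [101, 102, 103],
                     'ename': ['Sachin', 'Vinod', 'Lakhbir']})
right = pd.DataFrame({'empid': [102, 103, 107],
                      'dept': ['HR', 'IT', 'Sales']})

# on='empid' names the common column; default how='inner' keeps matches only
merged = pd.merge(left, right, on='empid')
print(merged)
```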
1. Full Outer Join:- The full outer join combines the results of both the left and the right outer joins. The joined data frame contains all records from both data frames and fills in NaN for missing matches on either side. You can perform a full outer join by passing how='outer' to the merge() function.
Example-
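A sketch of a full outer join with how='outer', using two hypothetical frames. Employee 101 exists only on the left and 107 only on the right, so each gets NaN in the columns from the other side.

```python
import pandas as pd

# Hypothetical frames that share the column 'empid'
left = pd.DataFrame({'empid': [101, 102, 103],
                     'ename': ['Sachin', 'Vinod', 'Lakhbir']})
right = pd.DataFrame({'empid': [102, 103, 107],
                      'dept': ['HR', 'IT', 'Sales']})

# All keys from both frames are kept; unmatched cells become NaN
outer = pd.merge(left, right, on='empid', how='outer')
print(outer)
```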
3. Right Join :- The right join produces a complete set of records from data frame B (the right-side data frame), with the matching records (where available) from data frame A (the left-side data frame). If there is no match, the columns coming from the left side contain NaN. You have to pass how='right' to the merge() function.
Example-
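A sketch of a right join with how='right' on two hypothetical frames. Every empid from the right frame is kept; 107 has no match on the left, so its ename is NaN.

```python
import pandas as pd

# Hypothetical frames that share the column 'empid'
left = pd.DataFrame({'empid': [101, 102, 103],
                     'ename': ['Sachin', 'Vinod', 'Lakhbir']})
right = pd.DataFrame({'empid': [102, 103, 107],
                      'dept': ['HR', 'IT', 'Sales']})

# Keys come from the right frame, in the right frame's order
right_join = pd.merge(left, right, on='empid', how='right')
print(right_join)
```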
4. Left Join :- The left join produces a complete set of records from data frame A (the left-side data frame), with the matching records (where available) from data frame B (the right-side data frame). If there is no match, the columns coming from the right side contain NaN. You have to pass how='left' to the merge() function.
Example-
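A sketch of a left join with how='left' on two hypothetical frames. Every empid from the left frame is kept; 101 has no match on the right, so its dept is NaN.

```python
import pandas as pd

# Hypothetical frames that share the column 'empid'
left = pd.DataFrame({'empid': [101, 102, 103],
                     'ename': ['Sachin', 'Vinod', 'Lakhbir']})
right = pd.DataFrame({'empid': [102, 103, 107],
                      'dept': ['HR', 'IT', 'Sales']})

# Keys come from the left frame, in the left frame's order
left_join = pd.merge(left, right, on='empid', how='left')
print(left_join)
```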
5. Joining on Index :- Sometimes you have to perform the join on the indexes (row labels). For that, you have to set right_index (for the index of the right data frame) and left_index (for the index of the left data frame) to True.
Example-
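A sketch of joining on the index, assuming two hypothetical frames whose row labels are employee ids. merge() with left_index=True and right_index=True matches rows by index (an inner join by default); join() also matches on the index and performs a left join by default.

```python
import pandas as pd

# Hypothetical frames whose index holds the employee id
left = pd.DataFrame({'ename': ['Sachin', 'Vinod', 'Lakhbir']}, index=[101, 102, 103])
right = pd.DataFrame({'dept': ['HR', 'IT', 'Sales']}, index=[102, 103, 107])

# Match rows on the indexes of both frames (default how='inner')
by_index = pd.merge(left, right, left_index=True, right_index=True)
print(by_index)

# join() uses the index by default and keeps all rows of the left frame
joined = left.join(right)
print(joined)
```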
CSV File