Introduction to Pandas for Data Analysis
Demo Links:
https://github.com/Data-Science-East-AFrica/Data-Science-Boot-Camp-Notes/tree/main/pandas
- IPython notebook file with demo: https://github.com/Data-Science-East-AFrica/Data-Science-Boot-Camp-Notes/tree/main/pandas
• What Is Pandas
• Pandas Operations
- Slicing the data frame
- Merging & Joining
- Concatenation
- Changing the index
- Change Column headers
- Data munging
- Use-Case: Analyze youth unemployment data
What Is Pandas:
Pandas is used for data manipulation, analysis and cleaning. Python pandas is
well suited for different kinds of data, such as:
• Tabular data with heterogeneously-typed columns
• Ordered and unordered time series data
• Arbitrary matrix data with row & column labels
• Unlabelled data
• Any other form of observational or statistical data sets
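As a rough illustration of the first two kinds of data in the list above, the sketch below builds a small table whose columns have different types and a short date-indexed series. The column names, dates and values are made up for the example and are not taken from the demo notebook.

```python
import pandas as pd

# Tabular data: each column can hold its own type.
table = pd.DataFrame({
    "page": ["home", "about", "contact"],   # text
    "visitors": [1000, 700, 6000],          # integers
    "bounce_rate": [20.5, 20.0, 23.1],      # floats
})
print(table.dtypes)

# Time series data: values labelled by dates (illustrative values).
dates = pd.date_range("2024-01-01", periods=3, freq="D")
series = pd.Series([10, 12, 9], index=dates)
print(series)
```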
Installing Pandas:
To install Python Pandas, go to your command line/terminal and type “pip
install pandas” or else, if you have Anaconda installed on your system, just type
in “conda install pandas”.
Once the installation is completed, go to your IDE (Jupyter, PyCharm etc.) and
simply import it by typing: “import pandas as pd”
If you are using Jupyter/Google Colab, run “!pip install pandas”.
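A quick way to confirm that the installation worked is to import the library and print its version. This is only a sanity check; the version you see depends on your environment.

```python
import pandas as pd

# If the import succeeds, pandas is installed; the version string
# tells you which release your environment is using.
print(pd.__version__)
```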
Python Pandas Operations
Using Python pandas, you can perform a lot of operations with series, data frames,
missing data, group by etc. Some of the common operations for data manipulation
are listed below:
1). Slicing
2). Merging & Joining
3). Concatenation
4). Changing the index
5). Data Munging
Slicing the Data Frame
In order to perform slicing on data, you need a data frame. A data frame is a
2-dimensional data structure and the most common pandas object.
import pandas as pd
XYZ_web = {'Day':[1,2,3,4,5,6], "Visitors":[1000, 700,6000,1000,400,350],
"Bounce_Rate":[20,20, 23,15,10,34]}
df= pd.DataFrame(XYZ_web)
print(df)
The code above converts a dictionary into a pandas DataFrame, with the index
shown on the left. Run the code in your environment to see the output, or refer
to the GitHub notebook example linked at the top of these notes.
print(df.head(2))
This prints the first two rows. If you want the last two rows, use the
command below:
print(df.tail(2))
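head() and tail() only take rows from the two ends of the frame. As a minimal sketch of more general slicing, the example below reuses the XYZ_web data and selects rows by position, columns by name, and rows by a condition. The variable names middle_rows, visitors_only and busy_days are introduced here for illustration.

```python
import pandas as pd

XYZ_web = {"Day": [1, 2, 3, 4, 5, 6],
           "Visitors": [1000, 700, 6000, 1000, 400, 350],
           "Bounce_Rate": [20, 20, 23, 15, 10, 34]}
df = pd.DataFrame(XYZ_web)

# Rows at positions 1 and 2 (the end position 3 is excluded).
middle_rows = df.iloc[1:3]
print(middle_rows)

# A subset of columns, selected by name.
visitors_only = df[["Day", "Visitors"]]
print(visitors_only)

# Rows that satisfy a condition (boolean mask).
busy_days = df[df["Visitors"] >= 1000]
print(busy_days)
```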
Merging & Joining
In merging, you can merge two data frames to form a single data frame. You
can also decide which columns you want to make common. Let us implement that
practically: first create two data frames, which have some key-value pairs, and
then merge the data frames together.
import pandas as pd
df1= pd.DataFrame({ "HPI":[80,90,70,60],"Int_Rate":[2,1,2,3],"IND_GDP":
[50,45,45,67]}, index=[2001, 2002,2003,2004])
df2=pd.DataFrame({ "HPI":[80,90,70,60],"Int_Rate":[2,1,2,3],"IND_GDP":
[50,45,45,67]}, index=[2005, 2006,2007,2008])
merged= pd.merge(df1,df2)
print(merged)
As you can see above, the two data frames have merged into a single data
frame. You can also specify the column which you want to make
common. Run the above code in your environment and see the output, or refer to
the GitHub link provided for the notebook example.
Task 1: Make the “HPI” column the common column and keep all the other columns
separate.
Solution:
df1 = pd.DataFrame({"HPI":[80,90,70,60],"Int_Rate":[2,1,2,3], "IND_GDP":
[50,45,45,67]}, index=[2001, 2002,2003,2004])
df2 = pd.DataFrame({"HPI":[80,90,70,60],"Int_Rate":[2,1,2,3],"IND_GDP":
[50,45,45,67]}, index=[2005, 2006,2007,2008])
merged= pd.merge(df1,df2, on ="HPI")
print(merged)
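When two frames are merged on one column and both also carry other columns with the same names, pandas keeps both copies and distinguishes them with suffixes. The sketch below shows the resulting column names for the Task 1 solution and how the suffixes parameter of pd.merge can make them more readable. The suffix strings "_left" and "_right" are chosen for the example.

```python
import pandas as pd

df1 = pd.DataFrame({"HPI": [80, 90, 70, 60], "Int_Rate": [2, 1, 2, 3],
                    "IND_GDP": [50, 45, 45, 67]}, index=[2001, 2002, 2003, 2004])
df2 = pd.DataFrame({"HPI": [80, 90, 70, 60], "Int_Rate": [2, 1, 2, 3],
                    "IND_GDP": [50, 45, 45, 67]}, index=[2005, 2006, 2007, 2008])

# "HPI" is shared; the other overlapping columns get _x / _y suffixes.
merged = pd.merge(df1, df2, on="HPI")
print(list(merged.columns))

# The year index is not kept: merging on a column yields a fresh 0..n-1 index.
print(list(merged.index))

# Optional: choose your own suffixes instead of the defaults.
merged_named = pd.merge(df1, df2, on="HPI", suffixes=("_left", "_right"))
print(list(merged_named.columns))
```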
Join is a convenient method to combine two differently indexed
dataframes into a single result dataframe. This is quite similar to the
“merge” operation, except the joining operation will be on the “index”
instead of the “columns”.
df1 = pd.DataFrame({"Int_Rate":[2,1,2,3], "IND_GDP":[50,45,45,67]},
index=[2001, 2002,2003,2004])
df2 = pd.DataFrame({"Low_Tier_HPI":[50,45,67,34],"Unemployment":[1,3,5,6]},
index=[2001, 2003,2004,2004])
joined= df1.join(df2)
print(joined)
Run the above code and study the output. As you may notice in your
output, for the year 2002 (index) there is no value attached to the columns
“Low_Tier_HPI” and “Unemployment”, therefore pandas has printed NaN (Not a
Number). For 2004, both values are available, therefore it has
printed the respective values.
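By default, join keeps every row of the left frame, which is why 2002 shows up with NaN. The sketch below, using the same two frames, compares the default with how="inner", which keeps only index values present in both frames. Because df2 lists 2004 twice in its index, the 2004 row appears twice in both results.

```python
import pandas as pd

df1 = pd.DataFrame({"Int_Rate": [2, 1, 2, 3], "IND_GDP": [50, 45, 45, 67]},
                   index=[2001, 2002, 2003, 2004])
df2 = pd.DataFrame({"Low_Tier_HPI": [50, 45, 67, 34], "Unemployment": [1, 3, 5, 6]},
                   index=[2001, 2003, 2004, 2004])

# Default (how="left"): all rows of df1 are kept, unmatched ones get NaN.
left_joined = df1.join(df2)
print(left_joined)

# how="inner": only index values found in both frames are kept.
inner_joined = df1.join(df2, how="inner")
print(inner_joined)
```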
Task 2: Make sure you can clearly differentiate merge and join in
pandas.
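One way to see the relationship between the two operations: merge matches rows on column values by default, while join matches on the index. The sketch below suggests that a default join gives the same result as a merge that is explicitly told to use both indexes. Treat it as a starting point for the task rather than a complete answer.

```python
import pandas as pd

df1 = pd.DataFrame({"Int_Rate": [2, 1, 2, 3], "IND_GDP": [50, 45, 45, 67]},
                   index=[2001, 2002, 2003, 2004])
df2 = pd.DataFrame({"Low_Tier_HPI": [50, 45, 67, 34], "Unemployment": [1, 3, 5, 6]},
                   index=[2001, 2003, 2004, 2004])

# join: matches on the index, keeps all rows of df1 by default.
joined = df1.join(df2)

# merge: matches on columns by default, but can be pointed at the indexes.
merged_on_index = pd.merge(df1, df2, left_index=True, right_index=True, how="left")

# True when both frames hold the same values, labels and dtypes.
print(joined.equals(merged_on_index))
```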
Concatenation
Concatenation basically glues the dataframes together. You can select the
dimension on which you want to concatenate. For that, just use “pd.concat” and
pass in the list of dataframes to concatenate together.
df1 = pd.DataFrame({"HPI":[80,90,70,60],"Int_Rate":[2,1,2,3], "IND_GDP":
[50,45,45,67]}, index=[2001, 2002,2003,2004])
df2 = pd.DataFrame({"HPI":[80,90,70,60],"Int_Rate":[2,1,2,3],"IND_GDP":
[50,45,45,67]}, index=[2005, 2006,2007,2008])
concat= pd.concat([df1,df2])
print(concat)
Run the above code in your local environment and study your output. As you might
realize, the two dataframes are glued together in a single dataframe, where
the index starts from 2001 and goes all the way up to 2008.
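If the original index labels do not matter, pd.concat can also renumber the rows. A minimal sketch with the same two frames, using the ignore_index parameter (the name stacked is introduced here for illustration):

```python
import pandas as pd

df1 = pd.DataFrame({"HPI": [80, 90, 70, 60], "Int_Rate": [2, 1, 2, 3],
                    "IND_GDP": [50, 45, 45, 67]}, index=[2001, 2002, 2003, 2004])
df2 = pd.DataFrame({"HPI": [80, 90, 70, 60], "Int_Rate": [2, 1, 2, 3],
                    "IND_GDP": [50, 45, 45, 67]}, index=[2005, 2006, 2007, 2008])

# ignore_index=True discards the year labels and numbers the rows 0..7.
stacked = pd.concat([df1, df2], ignore_index=True)
print(stacked)
```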
You can also pass axis=1 to “pd.concat” in order to concatenate along
the columns.
df1 = pd.DataFrame({"HPI":[80,90,70,60],"Int_Rate":[2,1,2,3], "IND_GDP":
[50,45,45,67]}, index=[2001, 2002,2003,2004])
df2 = pd.DataFrame({"HPI":[80,90,70,60],"Int_Rate":[2,1,2,3],"IND_GDP":
[50,45,45,67]}, index=[2005, 2006,2007,2008])
concat= pd.concat([df1,df2],axis=1)
print(concat)
Run the above code in your local environment and study the output. As you
might realize, there are a bunch of missing values. This happens because the
dataframes didn’t have values for all the indexes you want to concatenate
on. Therefore, you should make sure that you have all the information
lining up correctly when you join or concatenate on the axis.
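To make the alignment point concrete, the sketch below contrasts the misaligned case above with a case where both frames share the same index. The frames rates and hpi are illustrative splits of the same numbers used in this section, not frames from the demo notebook.

```python
import pandas as pd

years = [2001, 2002, 2003, 2004]

# Same index on both frames: columns line up row by row, no gaps.
rates = pd.DataFrame({"Int_Rate": [2, 1, 2, 3], "IND_GDP": [50, 45, 45, 67]},
                     index=years)
hpi = pd.DataFrame({"HPI": [80, 90, 70, 60]}, index=years)
aligned = pd.concat([rates, hpi], axis=1)
print(aligned)

# Different indexes (2001-2004 vs 2005-2008): every row is missing
# the columns that came from the other frame.
later = pd.DataFrame({"HPI": [80, 90, 70, 60]}, index=[2005, 2006, 2007, 2008])
misaligned = pd.concat([rates, later], axis=1)
print(misaligned)
```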
Change the index
Now let us understand how to change the index values in a dataframe. For
example, let us create a dataframe with some key-value pairs in a dictionary and
change the index values.
import pandas as pd
df= pd.DataFrame({"Day":[1,2,3,4], "Visitors":[200, 100,230,300], "Bounce_Rate":
[20,45,60,10]})
df.set_index("Day", inplace= True)
print(df)
Run the above code and study the output. You will see that the
“Day” column is now used as the index.
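set_index can also be used without inplace=True, in which case it returns a new frame and leaves the original untouched. The sketch below shows that variant, a label-based lookup on the new index, and reset_index to turn the index back into a column. The variable names are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Day": [1, 2, 3, 4], "Visitors": [200, 100, 230, 300],
                   "Bounce_Rate": [20, 45, 60, 10]})

# Without inplace=True, a new frame is returned; df keeps its "Day" column.
indexed = df.set_index("Day")

# Rows can now be looked up by their "Day" label.
visitors_day_3 = indexed.loc[3, "Visitors"]
print(visitors_day_3)

# reset_index moves "Day" back into an ordinary column.
restored = indexed.reset_index()
print(restored)
```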
Change the Column Headers
Let us take the above example and change the column header from
“Visitors” to “Users”.
import pandas as pd
df = pd.DataFrame({"Day":[1,2,3,4], "Visitors":[200, 100,230,300],
"Bounce_Rate":[20,45,60,10]})
df = df.rename(columns={"Visitors":"Users"})
print(df)
Run the above code in your local environment and you will notice that the column
header “Visitors” has been changed to “Users”.
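The same rename call accepts several old-to-new pairs at once, and it returns a new frame rather than changing the original. A small sketch follows; the new header "Bounce_Pct" is a made-up name used only for this example.

```python
import pandas as pd

df = pd.DataFrame({"Day": [1, 2, 3, 4], "Visitors": [200, 100, 230, 300],
                   "Bounce_Rate": [20, 45, 60, 10]})

# Map each old header to its new name; unlisted columns are left alone.
renamed = df.rename(columns={"Visitors": "Users", "Bounce_Rate": "Bounce_Pct"})
print(list(renamed.columns))

# The original frame still has its old headers.
print(list(df.columns))
```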
Data Munging
In data munging, you can convert data from one format into a different format.
For example, if you have a .csv file, you can convert it into .html or any
other data format as well.
import pandas as pd
country= pd.read_csv("train.csv",index_col=0)
country.to_html('index.html')
Once you run this code in your local environment, an HTML file named “index.html”
will be created. You can copy the path of the file and paste it
into your browser, which displays the data as an HTML table.
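The example above needs a “train.csv” file, whose contents are not shown in these notes. If you do not have it, the sketch below does the same CSV-to-HTML round trip with a tiny placeholder table written to a temporary folder. The file name sample.csv and the data are assumptions made for the example.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Placeholder data standing in for train.csv (illustrative values only).
sample = pd.DataFrame({"name": ["A", "B", "C"], "value": [1, 2, 3]})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "sample.csv"
    html_path = Path(tmp) / "index.html"

    # Write the CSV; the row index is saved as the first column.
    sample.to_csv(csv_path)

    # Read it back, using that first column as the index again.
    loaded = pd.read_csv(csv_path, index_col=0)

    # Convert to an HTML table and read the file back as text.
    loaded.to_html(html_path)
    html_text = html_path.read_text()

# Show the start of the generated markup.
print(html_text[:80])
```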