
Unit 3 Python B.SC IT

Uploaded by

devilnithi79

Prepared by

Senthil Kumar S
Assistant Professor,
Department of Information Technology,
Sri Ramakrishna Mission Vidyalaya
College of Arts and Science
(Autonomous), Coimbatore
Pandas:
- Introduction to pandas
- Data manipulation with pandas
- Operating on null values
- Hierarchical indexing
- Combining datasets
- Aggregation and grouping
- Manipulation of data with combined datasets using pandas
- Pandas is a Python library for working with data sets.
- It is used to clean, manipulate, and analyze data.
- A DataFrame is a structured (tabular) representation of data.
- The example below creates a DataFrame with 5 rows and 3 columns.

import pandas as pd
d = {'col1': [1, 2, 3, 4, 7],
     'col2': [4, 5, 6, 9, 5],
     'col3': [7, 8, 12, 1, 11]}
df = pd.DataFrame(data=d)
print(df)
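To confirm the dimensions of a DataFrame like the one above, a minimal sketch (rebuilding the same dictionary `d`) can inspect its `shape`, its `columns`, and the first few rows with `head()`.

```python
import pandas as pd

# Same data as the slide: three columns with five values each
d = {'col1': [1, 2, 3, 4, 7],
     'col2': [4, 5, 6, 9, 5],
     'col3': [7, 8, 12, 1, 11]}
df = pd.DataFrame(data=d)

print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # column labels taken from the dictionary keys
print(df.head(2))        # only the first two rows
```

Each dictionary key becomes a column and each list becomes that column's values, which is why the lists must all have the same length.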
import pandas as pd
df = pd.DataFrame({'name': ['Akshay', 'Mukesh', 'Deepak'],
                   'age': [22, 23, 21],
                   'country': ['india', 'india', 'us']})
print(df)

Output:
     name  age  country
0  Akshay   22    india
1  Mukesh   23    india
2  Deepak   21       us
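Basic data manipulation on the same table can be sketched as follows: selecting a column, filtering rows with a boolean condition, adding a derived column, and sorting. The `age_next_year` column is a hypothetical example added only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Akshay', 'Mukesh', 'Deepak'],
                   'age': [22, 23, 21],
                   'country': ['india', 'india', 'us']})

# Select a single column (returns a Series)
ages = df['age']

# Keep only the rows where the condition is True
indians = df[df['country'] == 'india']

# Add a derived column (hypothetical: age next year)
df['age_next_year'] = df['age'] + 1

# Sort the rows by one column (returns a new DataFrame)
by_age = df.sort_values('age')

print(indians)
print(by_age)
```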
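import pandas as pd
# a raw string (r"...") keeps the Windows backslashes in the path literal
df = pd.read_csv(r"D:\SRMV CAS\ssk\csv\sdata.csv")
print(df)
# loc[row_label, column_labels]: the row labelled 2, all columns
print(df.loc[2, :])

Output:
   regno     name  mark
0      1  senthil    76
1      2   jamuna    67
2      3    rakki    88
3      4     kavi    90
4      5   karthi    89
5      6    mahes    66
6      7   seetha    54
regno        3
name     rakki
mark        88
Name: 2, dtype: object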
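Because the CSV file above lives on a local drive, the sketch below stands in for it with an in-memory string holding the first four rows from the printed output (an assumption made only so the example runs anywhere). It contrasts label-based selection with `loc`, where the end label of a slice is included, and position-based selection with `iloc`, where the end position is excluded.

```python
import io
import pandas as pd

# Stand-in for sdata.csv: the first rows of the file as printed above
csv_text = """regno,name,mark
1,senthil,76
2,jamuna,67
3,rakki,88
4,kavi,90
"""
df = pd.read_csv(io.StringIO(csv_text))

row = df.loc[2, :]                      # the row whose index label is 2
marks = df.loc[:, 'mark']               # every row, one column
subset = df.loc[1:2, ['name', 'mark']]  # label slice: label 2 is included
first_two = df.iloc[0:2]                # position slice: position 2 is excluded

print(row)
print(subset)
```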
# importing pandas as pd
import pandas as pd
# importing numpy as np
import numpy as np
# dictionary of lists (named data to avoid shadowing the built-in dict)
data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score': [np.nan, 40, 80, 98]}
# creating a dataframe from the dictionary
df = pd.DataFrame(data)
print(df)
# remove every row that contains at least one missing (NaN) value
df.dropna(axis=0, inplace=True)
print(df)

Output:
   First Score  Second Score  Third Score
0        100.0          30.0          NaN
1         90.0          45.0         40.0
2          NaN          56.0         80.0
3         95.0           NaN         98.0
   First Score  Second Score  Third Score
1         90.0          45.0         40.0
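Dropping every row with any NaN can discard a lot of data. The following sketch, on the same scores, shows how to count missing values first and how the `how`, `subset`, and `axis` parameters of `dropna()` narrow what gets removed.

```python
import numpy as np
import pandas as pd

data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score': [np.nan, 40, 80, 98]}
df = pd.DataFrame(data)

# Number of missing values in each column
missing_per_column = df.isnull().sum()

# Drop a row only if *all* of its values are missing (no such row here)
all_missing_dropped = df.dropna(how='all')

# Drop rows that have a missing value in one specific column only
first_present = df.dropna(subset=['First Score'])

# Drop columns (axis=1) containing any NaN; every column here has one
cols_dropped = df.dropna(axis=1)

print(missing_per_column)
print(first_present)
```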
# importing pandas as pd
import pandas as pd
# importing numpy as np
import numpy as np
# dictionary of lists (named data to avoid shadowing the built-in dict)
data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score': [np.nan, 40, 80, 98]}
# creating a dataframe from the dictionary
df = pd.DataFrame(data)
print(df)
# fill every missing value with zero (returns a new DataFrame; df is unchanged)
print(df.fillna(0))

Output:
   First Score  Second Score  Third Score
0        100.0          30.0          NaN
1         90.0          45.0         40.0
2          NaN          56.0         80.0
3         95.0           NaN         98.0
   First Score  Second Score  Third Score
0        100.0          30.0          0.0
1         90.0          45.0         40.0
2          0.0          56.0         80.0
3         95.0           0.0         98.0
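Filling with zero can distort averages. A sketch of three alternatives on the same data: filling each column with its own mean, forward-filling from the previous row with `ffill()`, and passing a dictionary to give each column its own fill value (the values 0 and -1 are arbitrary, chosen for illustration).

```python
import numpy as np
import pandas as pd

data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score': [np.nan, 40, 80, 98]}
df = pd.DataFrame(data)

# Replace each NaN with the mean of its own column (NaN ignored in the mean)
mean_filled = df.fillna(df.mean())

# Copy the previous row's value downwards; a NaN in the first row stays NaN
forward_filled = df.ffill()

# A different fill value per column; columns not listed are left as they are
per_column = df.fillna({'First Score': 0, 'Second Score': -1})

print(mean_filled)
```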
Use the describe() method to summarize data. For each numeric column it reports the number of non-missing values (count), the mean, the standard deviation (std), the minimum, the 25%, 50% and 75% percentiles, and the maximum.

# importing pandas as pd
import pandas as pd
# importing numpy as np
import numpy as np
# dictionary of lists (named data to avoid shadowing the built-in dict)
data = {'First Score': [100, 90, 87, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score': [np.nan, 40, 80, 98]}
# creating a dataframe from the dictionary
df = pd.DataFrame(data)
print(df)
print(df.describe())

Output:
   First Score  Second Score  Third Score
0          100          30.0          NaN
1           90          45.0         40.0
2           87          56.0         80.0
3           95           NaN         98.0
       First Score  Second Score  Third Score
count     4.000000      3.000000     3.000000
mean     93.000000     43.666667    72.666667
std       5.715476     13.051181    29.687259
min      87.000000     30.000000    40.000000
25%      89.250000     37.500000    60.000000
50%      92.500000     45.000000    80.000000
75%      96.250000     50.500000    89.000000
max     100.000000     56.000000    98.000000
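describe() bundles several aggregations. The sketch below computes some of them individually with `mean()`, `count()`, and `agg()`, which is useful when only a few statistics are needed.

```python
import numpy as np
import pandas as pd

data = {'First Score': [100, 90, 87, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score': [np.nan, 40, 80, 98]}
df = pd.DataFrame(data)

# Column-wise aggregates; NaN values are skipped by default
means = df.mean()
counts = df.count()  # non-missing values per column

# Several aggregates at once: one result row per function name
summary = df.agg(['min', 'max'])

# describe() on a single column returns a Series of statistics
first_stats = df['First Score'].describe()

print(summary)
```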
- Primarily, we work with one-dimensional data (Series) and two-dimensional data (DataFrame).
- It is often useful to go beyond this and store higher-dimensional data, that is, data indexed by more than one or two keys.
- To handle three-dimensional and four-dimensional data, a common pattern in practice is to use hierarchical indexing (also known as multi-indexing).
- Hierarchical indexing (multi-indexing) means setting more than one column as the index, so that each row is identified by a combination of keys.
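A minimal sketch of hierarchical indexing, assuming a small hypothetical table of student counts per department and year of study: `set_index()` with two column names builds the multi-level index, `loc` selects by the outer level or by a full (outer, inner) tuple, and `unstack()` turns the inner level back into columns.

```python
import pandas as pd

# Hypothetical data: student counts per department and year of study
df = pd.DataFrame({'dept': ['IT', 'IT', 'CS', 'CS'],
                   'year': [1, 2, 1, 2],
                   'students': [60, 55, 70, 65]})

# Two columns together form a hierarchical index; sorting it keeps lookups fast
indexed = df.set_index(['dept', 'year']).sort_index()
print(indexed)

# All rows under the outer level 'IT' (the result is indexed by year)
it_rows = indexed.loc['IT']

# One value selected with a full (dept, year) key
cs_year2 = indexed.loc[('CS', 2), 'students']

# Move the inner level ('year') into columns to get a 2-D table
table = indexed['students'].unstack()
print(table)
```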
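# importing pandas as pd
import pandas as pd
# display() is built into Jupyter; this import also makes it work in a plain script
from IPython.display import display
df1 = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa', 'Sue'],
                    'group': ['Accounting', 'Engineering', 'Engineering', 'HR']})
df2 = pd.DataFrame({'employee': ['Lisa', 'Bob', 'Jake', 'Sue'],
                    'hire_date': [2004, 2008, 2012, 2014]})
display(df1, df2)
# merge() joins the two tables on their common column, 'employee'
df3 = pd.merge(df1, df2)
display(df3)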
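pd.merge() joins on all shared columns unless told otherwise. The sketch below names the key explicitly with `on=` and compares an inner join with a left join (`how='left'`). The `df4` salary table is hypothetical, made up so that one employee (Sue) has no match.

```python
import pandas as pd

df1 = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa', 'Sue'],
                    'group': ['Accounting', 'Engineering', 'Engineering', 'HR']})
df2 = pd.DataFrame({'employee': ['Lisa', 'Bob', 'Jake', 'Sue'],
                    'hire_date': [2004, 2008, 2012, 2014]})

# Name the key column explicitly instead of relying on shared column names
df3 = pd.merge(df1, df2, on='employee')
print(df3)

# Hypothetical extra table: Sue has no salary entry
df4 = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa'],
                    'salary': [70000, 80000, 120000]})

inner = pd.merge(df3, df4, on='employee')             # only keys found in both
left = pd.merge(df3, df4, on='employee', how='left')  # every row of df3; NaN where unmatched
print(left)
```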
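- The groupby() method splits the rows into groups that share the same value(s) in one or more columns, so that a function such as sum(), mean(), or count() can be executed on each group.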
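A short sketch of grouping and aggregation, reusing the employee data from the merge example (group plus hire date): rows are split by `group`, and then a count and a min/max are computed per group.

```python
import pandas as pd

df = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa', 'Sue'],
                   'group': ['Accounting', 'Engineering', 'Engineering', 'HR'],
                   'hire_date': [2008, 2012, 2004, 2014]})

# Split rows by 'group', then count the employees in each group
sizes = df.groupby('group')['employee'].count()

# Apply several aggregate functions to one column per group
hire_stats = df.groupby('group')['hire_date'].agg(['min', 'max'])

print(sizes)
print(hire_stats)
```

The result is indexed by the group labels, so a value for one group can be looked up directly, e.g. `sizes['Engineering']`.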
