NAND VIDYA NIKETAN
JAMNAGAR
GRADE – XII
INFORMATICS PRACTICES
PREPARED BY : VIPITHAV
CONTACT NO : 8547196016
Creating a DataFrame Object from a 2D Dictionary with Values as Series Objects
import pandas as pd
Staff = pd.Series([10, 25, 40])
Salaries = pd.Series([16000, 24600, 56300])
School = {'People': Staff, 'Amount': Salaries}   # keys become column names
Dfobj = pd.DataFrame(School)
print(Dfobj)
Output:-
People Amount
0 10 16000
1 25 24600
2 40 56300
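Because the values are Series objects, pandas lines them up by their index labels. The sketch below uses hypothetical labels "a" to "d" and an extra salary value (not from the notes) to show that the row index becomes the union of both Series indexes, with NaN wherever a Series has no value for a label.

```python
import pandas as pd

# Hypothetical labels: Staff has a, b, c; Salaries has a, c, d.
staff = pd.Series([10, 25, 40], index=["a", "b", "c"])
salaries = pd.Series([16000, 56300, 30000], index=["a", "c", "d"])

# Each dictionary key becomes a column label.
school = {"People": staff, "Amount": salaries}
dfobj = pd.DataFrame(school)

# Rows are the union of both indexes; missing positions are filled with NaN.
print(dfobj)
```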
RETRIEVING VARIOUS PROPERTIES OF A DATAFRAME OBJECT
import pandas as pd
Staff = pd.Series ([10, 25, 40] )
Salaries = pd.Series ([16000, 24600, 56300])
School = {'People': Staff, 'Amount': Salaries}
dfobj=pd.DataFrame (School)
print (dfobj)
print (dfobj.index)
print (dfobj.columns)
print (dfobj.axes)
print (dfobj.dtypes)
OUTPUT
People Amount
0 10 16000
1 25 24600
2 40 56300
RangeIndex(start=0, stop=3, step=1) ---------------- index
Index(['People', 'Amount'], dtype='object') -------- columns
[RangeIndex(start=0, stop=3, step=1), Index(['People', 'Amount'],
dtype='object')] ------------- axes
People int64 ------------- dtypes
Amount int64
dtype: object
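The properties above return pandas objects rather than plain lists. A small sketch of how they can be used in code; shape, ndim and size are included as further read-only properties that the notes do not list above.

```python
import pandas as pd

staff = pd.Series([10, 25, 40])
salaries = pd.Series([16000, 24600, 56300])
dfobj = pd.DataFrame({"People": staff, "Amount": salaries})

# index and columns are Index objects; list() turns them into plain lists.
row_labels = list(dfobj.index)
col_labels = list(dfobj.columns)

# axes is simply [index, columns], in that order.
rows_axis, cols_axis = dfobj.axes

# dtypes is a Series mapping each column name to its data type.
people_dtype = dfobj.dtypes["People"]

print(dfobj.shape)   # (3, 2)
print(dfobj.ndim)    # 2
print(dfobj.size)    # 6
```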
GETTING NUMBER OF ROWS IN A DATAFRAME
len(<DF object>) returns the number of rows in a DataFrame.
import pandas as pd
Staff = pd.Series ([10, 25, 40] )
Salaries = pd.Series ([16000, 24600, 56300])
School = {'People':Staff, 'Amount':Salaries}
Dfobj=pd.DataFrame (School)
print (Dfobj)
print ("Total Number of Rows are:- ", len(Dfobj))
Output:-
People Amount
0 10 16000
1 25 24600
2 40 56300
Total Number of Rows are:- 3
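The same idea gives the number of columns: take len() of the columns. A short sketch that also checks both counts against the shape tuple.

```python
import pandas as pd

dfobj = pd.DataFrame({"People": [10, 25, 40], "Amount": [16000, 24600, 56300]})

n_rows = len(dfobj)            # number of rows
n_cols = len(dfobj.columns)    # number of columns

# shape holds both counts as a (rows, columns) tuple.
assert (n_rows, n_cols) == dfobj.shape
print("Rows:", n_rows, "Columns:", n_cols)   # Rows: 3 Columns: 2
```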
GETTING COUNT OF NON-NAN / NAN VALUES
Like Series, you can use count() with a DataFrame too, to get the count of non-NaN (non-NA)
values.
a) If you pass no argument, or pass 0 (the default is 0), it returns the count of
non-NaN values for each column. For Eg:-
print (Dfobj.count())
People 3
Amount 3
dtype: int64
b) If you pass 1 as the argument, it returns the count of non-NaN values for each row. For
Eg:-
print (Dfobj.count(1))
0 2
1 2
2 2
dtype: int64
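Dfobj has no missing values, so every count above equals the full length. The sketch below uses hypothetical data with one NaN in each column so the counts actually differ; isna().sum() is shown as one way to count the NaN values themselves.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing People value and one missing Amount value.
dfobj = pd.DataFrame({
    "People": [10, np.nan, 40],
    "Amount": [16000, 24600, np.nan],
})

per_column = dfobj.count()            # axis 0 (default): non-NaN values per column
per_row = dfobj.count(axis=1)         # axis 1: non-NaN values per row
nan_per_column = dfobj.isna().sum()   # NaN values per column

print(per_column)
print(per_row)
print(nan_per_column)
```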
TRANSPOSING A DATAFRAME
We can transpose a DataFrame, i.e. swap its index and columns, by using the
attribute T as shown below:-
print (Dfobj)
print ("Transposed DataFrame")
print (Dfobj.T)
Output:-
People Amount
0 10 16000
1 25 24600
2 40 56300
Transposed DataFrame
0 1 2
People 10 25 40
Amount 16000 24600 56300
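Since rows become columns and columns become rows, the shape is swapped, and transposing twice returns the original. A minimal sketch:

```python
import pandas as pd

dfobj = pd.DataFrame({"People": [10, 25, 40], "Amount": [16000, 24600, 56300]})
transposed = dfobj.T

# (rows, columns) is swapped.
print(dfobj.shape, transposed.shape)   # (3, 2) (2, 3)

# Old column names are now row labels: row "Amount", column 0.
print(transposed.loc["Amount", 0])     # 16000

# Transposing twice gives back the original DataFrame.
print(transposed.T.equals(dfobj))      # True
```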
SELECTING / ACCESSING A COLUMN
We can select a column by using either of the following syntaxes:-
<DataFrame Object>[<Column Name>] OR
<DataFrame Object>.<Column Name>
For Eg:-
print (Dfobj.Amount)
Output:-
0 16000
1 24600
2 56300
Name: Amount, dtype: int64
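Both forms return the same Series, but the dot form only works when the column name is a valid Python identifier. In this sketch the column "Total Profit" is a hypothetical example of a name with a space, which needs the bracket form.

```python
import pandas as pd

# Hypothetical DataFrame whose second column name contains a space.
df = pd.DataFrame({"People": [10, 25, 40], "Total Profit": [500, 750, 900]})

by_bracket = df["People"]    # bracket form works for any column name
by_attribute = df.People     # dot form works only for identifier-like names

# A name with a space must use brackets; "df.Total Profit" is a syntax error.
profit = df["Total Profit"]
print(profit)
```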
SELECTING / ACCESSING MULTIPLE COLUMNS
We can select more than one column by using the following
Syntax:-
<DataFrame Object>[[<Column Name1>, <Column Name2>, ...]]
For Eg:-
print(Dfobj[['Amount', 'People']])
Output:-
Amount People
0 16000 10
1 24600 25
2 56300 40
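The inner brackets are a Python list of column names, which is why the result is a DataFrame even when the list has just one name, and why the columns come back in the listed order. A quick sketch:

```python
import pandas as pd

dfobj = pd.DataFrame({"People": [10, 25, 40], "Amount": [16000, 24600, 56300]})

one_col = dfobj["Amount"]                  # single brackets: a Series
one_col_frame = dfobj[["Amount"]]          # list with one name: a one-column DataFrame
reordered = dfobj[["Amount", "People"]]    # columns follow the order in the list

print(type(one_col).__name__, type(one_col_frame).__name__)   # Series DataFrame
```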
SELECTING / ACCESSING A SUBSET FROM A DATAFRAME USING ROW/COLUMN NAMES
To access a row, a set of rows, or a set of columns, you can use the following
syntax to select / access a subset of a DataFrame object:-
<DataFrame Object>.loc[<startrow>:<endrow>, <startcolumn>:<endcolumn>]
❖ To access a row, just give the row name / label like this:
<DataFrame Object>.loc[<rowlabel>, :]. Make sure not to miss
the colon after the comma.
print(Dfobj.loc[0,:])
Output:-
People 10
Amount 16000
Name: 0, dtype: int64
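A single row label gives the row back as a Series (hence "Name: 0" in the output). A sketch of two related forms: a list with one label gives a one-row DataFrame, and a row label plus a column label gives a single value.

```python
import pandas as pd

dfobj = pd.DataFrame({"People": [10, 25, 40], "Amount": [16000, 24600, 56300]})

row_series = dfobj.loc[0, :]         # one label: the row as a Series
row_frame = dfobj.loc[[0], :]        # list of labels: a one-row DataFrame
amount_0 = dfobj.loc[0, "Amount"]    # row label and column label: one value

print(amount_0)   # 16000
```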
❖ To Access Multiple Rows:- You can use
<DataFrame Object>.loc[<startrow>:<endrow>, :]. Make sure not to miss the COLON AFTER THE COMMA.
For Eg:-
print(Dfobj.loc[0 : 1, :])
Output:-
People Amount
0 10 16000
1 25 24600
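Note that the label slice 0:1 includes both 0 and 1. When the rows you want are not next to each other, a list of labels can be given instead of a slice, as this sketch shows.

```python
import pandas as pd

dfobj = pd.DataFrame({"People": [10, 25, 40], "Amount": [16000, 24600, 56300]})

first_two = dfobj.loc[0:1, :]   # label slice: both 0 and 1 are included
picked = dfobj.loc[[0, 2], :]   # list of labels: rows 0 and 2 only

print(picked)
```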
❖ TO ACCESS SELECTIVE COLUMNS
You can use
<DataFrame Object>.loc[:, <start column>:<end column>].
Make sure not to miss the COLON BEFORE THE COMMA. As with rows, all
columns falling between the start and end columns (both included) will be listed. To pick just one column, give its name:-
For Eg:-
print(Dfobj.loc[:, 'People'])
Output:-
0 10
1 25
2 40
Name: People, dtype: int64
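The example above picks one column; a column range needs a wider DataFrame to be interesting. This sketch adds a hypothetical "Bonus" column and shows that both end labels of the range are included, and that leaving out the end label runs to the last column.

```python
import pandas as pd

# Hypothetical third column so that a column range is meaningful.
dfobj = pd.DataFrame({
    "People": [10, 25, 40],
    "Amount": [16000, 24600, 56300],
    "Bonus": [500, 700, 900],
})

upto_amount = dfobj.loc[:, "People":"Amount"]   # People and Amount (end included)
from_amount = dfobj.loc[:, "Amount":]           # Amount through the last column

print(upto_amount)
print(from_amount)
```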
❖ SELECTING ROWS/COLUMNS FROM A DATAFRAME USING POSITIONS
Sometimes your DataFrame object does not have meaningful row or column labels, or
you may not remember them. In such cases, you can extract a subset of the DataFrame
using the numeric row and column positions. For this, use iloc
instead of loc; iloc stands for integer location.
<DataFrame Object>.iloc[<start row index>:<end row index>, <start
column index>:<end column index>]
With iloc, the <start index>:<end index> ranges given for rows and columns
work like ordinary Python slices: the end index is excluded (unlike loc).
Consider the following example:
print(Dfobj . iloc [0:2, 1:2])
Output:-
Amount
0 16000
1 24600
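iloc is most useful when the row labels are not numbers. In this sketch the labels "Primary", "Secondary" and "Senior" are hypothetical; the same cells are selected once by position with iloc and once by name with loc.

```python
import pandas as pd

# Hypothetical non-numeric row labels.
dfobj = pd.DataFrame(
    {"People": [10, 25, 40], "Amount": [16000, 24600, 56300]},
    index=["Primary", "Secondary", "Senior"],
)

# Positions 0 and 1 for rows, position 1 for columns (ends excluded).
by_position = dfobj.iloc[0:2, 1:2]
# The same cells by label (ends included).
by_label = dfobj.loc["Primary":"Secondary", "Amount":"Amount"]

single = dfobj.iloc[2, 0]   # third row, first column
print(single)               # 40
```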
WRITE STATEMENT TO DO THE FOLLOWING:
• Display rows 1 to 4 (both inclusive).
print(df[1:5])
• From rows 2 to 4 (both inclusive), display columns 'Item Types' and 'Total Profit'.
print(df.loc[2:4, ['Item Types', 'Total Profit']])
• From rows 3 to 6 (both inclusive), display the first four columns.
print(df.iloc[3:7, 0:4])
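The exercise does not give df, so answers like these are easy to check on a small made-up DataFrame. In this sketch only the column names "Item Types" and "Total Profit" come from the exercise; the other columns and all values are hypothetical, and the default 0, 1, 2, ... row index is assumed.

```python
import pandas as pd

# Hypothetical sales data with the default RangeIndex (labels 0..7).
df = pd.DataFrame({
    "Region": ["Asia", "Europe", "Africa", "Asia", "Europe", "Africa", "Asia", "Europe"],
    "Item Types": ["Fruits", "Snacks", "Cereal", "Meat", "Fruits", "Snacks", "Cereal", "Meat"],
    "Units Sold": [100, 250, 80, 40, 300, 120, 60, 90],
    "Total Revenue": [900, 2100, 640, 1500, 2700, 1000, 480, 3300],
    "Total Profit": [300, 700, 200, 450, 900, 350, 160, 1100],
})

rows_1_to_4 = df[1:5]                                        # slice: end 5 excluded
rows_2_to_4 = df.loc[2:4, ["Item Types", "Total Profit"]]    # loc: end label included
rows_3_to_6 = df.iloc[3:7, 0:4]                              # iloc: end position excluded

# Print the selected row labels to confirm each answer.
print(list(rows_1_to_4.index), list(rows_2_to_4.index), list(rows_3_to_6.index))
```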
loc & iloc
With loc, both the start label and the end label are included when given as start:end,
but with iloc, as with slices, the end index/position is excluded when given as start:end.
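The difference is easy to see by using the same numbers with both on a DataFrame that has the default 0, 1, 2 index:

```python
import pandas as pd

dfobj = pd.DataFrame({"People": [10, 25, 40], "Amount": [16000, 24600, 56300]})

with_loc = dfobj.loc[0:1]     # labels 0 and 1: end label included
with_iloc = dfobj.iloc[0:1]   # position 0 only: end position excluded

print(len(with_loc), len(with_iloc))   # 2 1
```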