# Before using any pandas methods, you have to import the library
import pandas as pd
Series: It is like a single column of a table, or a one-dimensional array.
a = [1, 7, 2]
myvar = pd.Series(a)
print(myvar)
# In the output, the first column is the index and the second column holds the values of a
print(myvar[0])
# Use the index argument to give the index proper names.
myvar = pd.Series(a,index = ["row1","row2","row3"])
print(myvar)
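Once the index has proper labels, a value can also be accessed by its label (a small illustration using the labels defined above):
print(myvar["row2"])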
A DataFrame is similar to either
a. a table in SQL, or
b. a 2-dimensional array
healthdata = {
"weight": [25, 60, 84,103,76,40],
"height": [140,182,161,182,192,123]
}
myhealth = pd.DataFrame(healthdata)
print(myhealth)
# By using the index argument you can give the index proper names, called labels, if needed.
# Row1, Row2 ... Row6 are labels.
healthdata = {
"weight": [25, 60, 84,103,76,40],
"height": [140,182,161,182,192,123]
}
mydf = pd.DataFrame(healthdata, index=["Row1","Row2","Row3","Row4","Row5","Row6"])
print(mydf)
# To subset the data we can apply Boolean indexing.
# This indexing is commonly known as a filter.
# For example, if we want to subset the rows in which the weight is greater than 73:
# Filter the rows with weight above 73
mydf_sub = mydf[ mydf['weight'] > 73 ]
print(mydf_sub)
Like >, you can also use ==, <, <=, >=. As an exercise, filter the rows with height less than 125; one possible solution is shown below.
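A possible solution for the height exercise (mydf_sub2 is just an example name):
# Filter the rows with height below 125
mydf_sub2 = mydf[ mydf['height'] < 125 ]
print(mydf_sub2)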
#There are a number of ways to subset the Data Frame:
# one or more columns
# one or more rows
# a subset of rows and columns
# Rows and columns can be selected by their position or label
#
# When selecting one column, it is possible to use a single set of brackets,
# but the resulting object will be a Series (not a DataFrame):
#Select column height, this will return a series
mydf['height']
# When we need to select more than one column and/or want the output to be a DataFrame,
# we should use double brackets:
#Select columns 'weight' and 'height':
mydf[['weight','height']]
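To confirm this difference, you can check the type of each result (a quick illustrative check):
print(type(mydf['height']))    # <class 'pandas.core.series.Series'>
print(type(mydf[['height']]))  # <class 'pandas.core.frame.DataFrame'>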
# If we need to select a range of rows, we can specify the range using ":"
#Select rows by their position:
mydf[1:3]
# Notice that the first row has position 0, and the last value in the range is omitted:
# So for the 0:10 range, the first 10 rows are returned, with positions starting at 0 and ending at 9
#If we need to select a range of rows using their labels, we can use the method loc:
#Select rows by their labels:
mydf.loc["Row1":"Row3",['weight']]
# If we need to select a range of rows and/or columns using their positions, we can use the method iloc:
# Column position 1 is 'height'
# Remember the position starts from 0.
mydf.iloc[2:4,[1]]
Below is a set of iloc and loc code snippets for self-practice. They are different variations of how we can get a subset of a DataFrame based on position or label.
mydf.iloc[0] # First row of a data frame
i = 2
mydf.iloc[i] #(i+1)th row
mydf.iloc[-1] # Last row
mydf.iloc[:, 0] # First column
mydf.iloc[:, -1] # Last column
mydf.iloc[0:7] #First 7 rows
mydf.iloc[:, 0:2] #First 2 columns
mydf.iloc[1:3, 0:2] #Second through third rows and first 2 columns
mydf.iloc[[0,5], [0,1]] #1st and 6th rows and 1st and 2nd columns
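For practice with loc as well, here are a few label-based equivalents (a sketch assuming the row labels Row1 ... Row6 defined earlier):
mydf.loc["Row1"] # Row with label "Row1"
mydf.loc[:, "weight"] # The 'weight' column as a Series
mydf.loc["Row2":"Row4", ["weight","height"]] # Rows Row2 through Row4 (label slices are inclusive) and both columns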
Let's understand what data cleaning means. It is all about fixing bad data. Bad data could be: empty cells, data in the wrong format, wrong data, and duplicates.
Orders = {
    "Orderno": [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,17,0,21,17],
    "CustomerName": ["Rk","Mike","Ben","Veronica","Maria","Lata","Judiath","Blake","George","Duke","Prish",
                     "Ivan",None,"Nancy","Sarvesh","Jay","Jayant","Margaret","Jay"],
    "OrderAmount": [2400,1432,173,258,3402,7143,143422,1734,2143,12,-23473,17343,593,432,943,999,1843,0,999],
    "OrderQty": [24,32,17,2,4,143,None,172,432,21,2,17,13,12,8,12,6,3,12],
    "OrderStatus": ["Open","Closed","InProgress","Cancelled",None,"Open","Closed","InProgress","Cancelled",
                    "Open","Closed","InProgress","Cancelled",None,"Open","Closed","InProgress","Cancelled","Closed"],
}
ordersdf = pd.DataFrame(Orders)
print(ordersdf)
Let's look at empty cells. There are empty cells at row indexes 4, 6, 12 and 13 (you can verify this with the check below). We can do the following:
1. Remove all rows which have empty values
2. Replace them with some value
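A quick illustrative check to see where the empty cells are (isnull() flags missing values):
print(ordersdf.isnull().sum()) # Count of missing values per column
print(ordersdf[ordersdf.isnull().any(axis=1)]) # Rows containing at least one missing value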
# Remove all rows with empty values; all None values are treated as empty values
newordersdf = ordersdf.dropna()
print(newordersdf)
# To replace empty values with a specific value; all None values are treated as empty values
newordersdf = ordersdf.fillna(19)
print(newordersdf)
# Here this is wrong, as CustomerName and OrderStatus cannot have the value 19.
# Let's change only the OrderStatus values that are NaN to "Open"
ordersdf.fillna({"OrderStatus":"Open"}, inplace = True)
# inplace=True changes the original ordersdf as well.
print(ordersdf)
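If you prefer not to modify in place, the equivalent is to assign the result back (a sketch; running it again here is a harmless no-op):
ordersdf = ordersdf.fillna({"OrderStatus":"Open"})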
# OrderQty has a None value; there we can use the mean or median as the replacement value
OrdMedian = ordersdf["OrderQty"].median()
ordersdf.fillna({"OrderQty":OrdMedian},inplace=True)
print(ordersdf)
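The mean works the same way if preferred over the median (an illustrative alternative; the NaN in OrderQty has already been filled above, so the fillna line is shown commented out):
OrdMean = ordersdf["OrderQty"].mean()
# ordersdf.fillna({"OrderQty":OrdMean},inplace=True)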
Let's now handle wrong data values. Refer to OrderAmount: it has a negative value. This could be a pure mistake, and we can replace the specific value in the DataFrame.
ordersdf.loc[10,"OrderAmount"] = 23473
print(ordersdf)
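If the row position were not known in advance, a boolean filter could locate negative amounts first (illustrative; after the fix above it returns no rows):
print(ordersdf[ordersdf["OrderAmount"] < 0])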
Wrong formats can also be changed. Notice that OrderQty seems to be a float while it should be an integer. This can be changed using type casting.
ordersdf['OrderQty'] = ordersdf['OrderQty'].astype(int)
print(ordersdf)
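A quick way to confirm the cast worked is to inspect the column types (illustrative check):
print(ordersdf.dtypes) # OrderQty should now show an integer dtype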
Let's now check for duplicate entries. Here we will use duplicated() and drop_duplicates().
print(ordersdf.duplicated())
ordersdf.drop_duplicates(inplace = True)
print(ordersdf)
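duplicated() flags rows that are identical across all columns. To flag duplicates based on selected columns only, for example CustomerName and OrderAmount, you can pass the subset argument (an illustrative variation):
print(ordersdf.duplicated(subset=["CustomerName","OrderAmount"]))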