Pandas Shan Ver2

The document provides an overview of the Pandas library, highlighting its capabilities for data manipulation with DataFrames and Series, which are built on top of NumPy. It covers installation, basic operations, data indexing, and selection methods, as well as the creation of pivot tables for data summarization. Additionally, it illustrates how to construct DataFrames from various data structures and the use of the Index object in Pandas.

Uploaded by

amalamargret.cse


Pandas

➢ A newer package built on top of NumPy
that provides an efficient implementation of a
DataFrame.
➢ Pandas implements a number of powerful
data operations familiar to users of both
database frameworks and spreadsheet
programs.
Installing and Starting
import pandas
pandas.__version__
Output: '0.18.1'

import numpy as np
import pandas as pd
data = pd.Series([0.25, 0.5, 0.75, 1.0])
print(data)
Output: 0 0.25
1 0.50
2 0.75
3 1.00
dtype: float64
print(data.values)
Output: [0.25 0.5  0.75 1.  ]
print(data.index)
Output: RangeIndex(start=0, stop=4, step=1)
print(data[1])
Output: 0.5
print(data[1:3])
Output: 1 0.50
2 0.75
dtype: float64
Series as generalized NumPy array:
The essential difference is the presence of the
index: while the NumPy array has an implicitly
defined integer index used to access the values, the
Pandas Series has an explicitly defined index
associated with the values.
data = pd.Series([0.25, 0.5, 0.75, 1.0],
index=['a', 'b', 'c', 'd'])
print(data)
Output: a 0.25
b 0.50
c 0.75
d 1.00
dtype: float64
print(data['b'])
Output: 0.5
We can even use noncontiguous or nonsequential
indices:
data = pd.Series([0.25, 0.5, 0.75, 1.0],
index=[2, 5, 3, 7])
print(data)
Output: 2 0.25
5 0.50
3 0.75
7 1.00
dtype: float64
print(data[5])
Output: 0.5
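The difference between the implicit NumPy index and the explicit Series index can be sketched as follows. This is a minimal illustration reusing the values above; the variable names are chosen here for the example, and loc and iloc are used so that label lookup and position lookup are spelled out explicitly.

```python
import numpy as np
import pandas as pd

values = np.array([0.25, 0.5, 0.75, 1.0])
series = pd.Series(values, index=[2, 5, 3, 7])

# NumPy array: the integer 3 always means "fourth position".
array_item = values[3]

# Series: .loc treats 3 as a label, .iloc treats 3 as a position.
label_item = series.loc[3]
position_item = series.iloc[3]

print(array_item, label_item, position_item)
```

With a nonsequential index, the label 3 and the position 3 point at different values, which is why it helps to state which one is meant.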
Series as specialized dictionary:
A dictionary is a structure that maps arbitrary keys to a set of
arbitrary values, and a Series is a structure that maps typed keys to
a set of typed values.

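population_dict = {'California': 38332521,
'Texas': 26448193,
'New York': 19651127,
'Florida': 19552860,
'Illinois': 12882135}
population = pd.Series(population_dict)
print(population)
Output: California 38332521
Florida 19552860
Illinois 12882135
New York 19651127
Texas 26448193
dtype: int64
Note: this output comes from pandas 0.18.1, which sorts the dictionary keys. Pandas 0.23 and later keep the dictionary's insertion order on Python 3.6+.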
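Unlike a plain dictionary, a Series also supports array-style operations such as slicing by key. The sketch below assumes the same population figures; the call to sort_index() is an addition made here so that the key order is the same on every pandas version.

```python
import pandas as pd

population_dict = {'California': 38332521,
                   'Texas': 26448193,
                   'New York': 19651127,
                   'Florida': 19552860,
                   'Illinois': 12882135}

# Sort the keys explicitly so the slice below is well defined.
population = pd.Series(population_dict).sort_index()

# Dictionary-style lookup by key.
california = population['California']

# Array-style slice by key; both end labels are included.
subset = population['California':'Illinois']
print(subset)
```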
The Pandas DataFrame Object
• The next fundamental structure in Pandas is
the DataFrame.
• Like the Series object discussed in the
previous section, the DataFrame can be
thought of either as a generalization of a
NumPy array or as a specialization of a
Python dictionary.
• DataFrame as a generalized NumPy array:
A DataFrame is an analog of a two-dimensional
array with both flexible row indices and flexible column
names.
area_dict = {'California': 423967, 'Texas': 695662,
'New York': 141297, 'Florida': 170312, 'Illinois': 149995}
area = pd.Series(area_dict)
print(area)
Output: California 423967
Florida 170312
Illinois 149995
New York 141297
Texas 695662
dtype: int64
states = pd.DataFrame({'population': population, 'area':
area})
print(states)
Output: area population
California 423967 38332521
Florida 170312 19552860
Illinois 149995 12882135
New York 141297 19651127
Texas 695662 26448193

print(states.index)
Output:
Index(['California', 'Florida', 'Illinois', 'New York', 'Texas'],
dtype='object')

print(states.columns)
Output: Index(['area', 'population'], dtype='object')
DataFrame as specialized dictionary:
We can also think of a DataFrame as a
specialization of a dictionary. Where a dictionary
maps a key to a value, a DataFrame maps a column
name to a Series of column data.
print(states['area'])
Output: California 423967
Florida 170312
Illinois 149995
New York 141297
Texas 695662
Name: area, dtype: int64
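Because a DataFrame maps column names to Series, a new column can be added with dictionary-style assignment. The following is a small sketch using two of the states above; the 'density' column is a hypothetical name introduced only for this illustration.

```python
import pandas as pd

population = pd.Series({'California': 38332521, 'Texas': 26448193})
area = pd.Series({'California': 423967, 'Texas': 695662})
states = pd.DataFrame({'population': population, 'area': area})

# Reading a column returns a Series, like a dictionary lookup.
area_column = states['area']

# Assigning to a new column name adds a column, like adding a dict key.
states['density'] = states['population'] / states['area']
print(states)
```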
Constructing DataFrame objects:
From a single Series object:
print(pd.DataFrame(population, columns=['population']))
Output: population
California 38332521
Florida 19552860
Illinois 12882135
New York 19651127
Texas 26448193
From a list of dicts:
data = [{'a': i, 'b': 2 * i}
for i in range(3)]
pd.DataFrame(data)
Output: a b
0 0 0
1 1 2
2 2 4
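When the dictionaries in the list do not all share the same keys, the missing entries are filled with NaN. A minimal sketch with made-up keys and values:

```python
import pandas as pd

# The first dict has no 'c' and the second has no 'a'.
rows = [{'a': 1, 'b': 2}, {'b': 3, 'c': 4}]
frame = pd.DataFrame(rows)

# Missing entries show up as NaN, so the affected columns become float.
print(frame)
```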
From a dictionary of Series objects:
print(pd.DataFrame({'population': population,
'area': area}))
Output: area population
California 423967 38332521
Florida 170312 19552860
Illinois 149995 12882135
New York 141297 19651127
Texas 695662 26448193
From a two-dimensional NumPy array:
print(pd.DataFrame(np.random.rand(3, 2), columns=['foo', 'bar'],
index=['a', 'b', 'c']))
Output: foo bar
a 0.865257 0.213169
b 0.442759 0.108267
c 0.047110 0.905718
From a NumPy structured array:
A = np.zeros(3, dtype=[('A', 'i8'), ('B', 'f8')])
print(A)
Output: array([(0, 0.0), (0, 0.0), (0, 0.0)],
dtype=[('A', '<i8'), ('B', '<f8')])
print(pd.DataFrame(A))
Output: A B
0 0 0.0
1 0 0.0
2 0 0.0
The Pandas Index Object
• This Index object is an interesting structure in
itself, and it can be thought of either as an
immutable array or as an ordered set
(technically a multiset, as Index objects may
contain repeated values).
An Index from a list of integers:
ind = pd.Index([2, 3, 5, 7, 11])
print(ind)
Output: Int64Index([2, 3, 5, 7, 11], dtype='int64')
Index as immutable array:
print(ind[1])
Output: 3
print(ind[::2])
Output: Int64Index([2, 5, 11], dtype='int64')
print(ind.size, ind.shape, ind.ndim, ind.dtype)
Output: 5 (5,) 1 int64
ind[1] = 0
Output: TypeError (an Index is immutable, so item assignment is not supported)
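The immutability of an Index can be checked without stopping the program by catching the exception. This is a sketch only; building a changed copy with delete() and insert() is one possible approach chosen here, not something the slides prescribe.

```python
import pandas as pd

ind = pd.Index([2, 3, 5, 7, 11])

try:
    ind[1] = 0          # item assignment is rejected
    mutated = True
except TypeError:
    mutated = False

# To "change" an entry, build a new Index instead of editing in place.
new_ind = ind.delete(1).insert(1, 0)

print(mutated, list(ind), list(new_ind))
```

The original Index is left untouched, which makes it safe to share one Index between several Series and DataFrames.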
Index as ordered set:
indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])
print(indA.intersection(indB)) # intersection
Output: Int64Index([3, 5, 7], dtype='int64')
print(indA.union(indB)) # union
Output: Int64Index([1, 2, 3, 5, 7, 9, 11], dtype='int64')
print(indA.symmetric_difference(indB)) # symmetric difference
Output: Int64Index([1, 2, 9, 11], dtype='int64')
Data Indexing and Selection
➢ We looked in detail at methods and tools to
access, set, and modify values in NumPy arrays.
➢ These included indexing, slicing, masking,
fancy indexing and combinations.
➢ Here we’ll look at similar means of accessing
and modifying values in Pandas Series and
DataFrame objects.
Data Selection in Series:
A Series object acts in many ways like a one-dimensional NumPy
array, and in many ways like a standard Python dictionary.
If we keep these two overlapping analogies in mind, it will help us
to understand the patterns of data indexing and selection in Series objects.
a) Series as dictionary:
import pandas as pd
data = pd.Series([0.25, 0.5, 0.75, 1.0],
index=['a', 'b', 'c', 'd'])
print(data)
Output: a 0.25
b 0.50
c 0.75
d 1.00
dtype: float64
print(data['b'])
Output: 0.5
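The dictionary analogy extends beyond key lookup. The sketch below shows membership tests, keys, items, and adding a new key on the same Series; the variable names are chosen here for illustration.

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

has_a = 'a' in data            # membership tests the index, like dict keys
keys = list(data.keys())       # the index labels
items = list(data.items())     # (label, value) pairs

# Assigning to a new label extends the Series, like adding a dict key.
data['e'] = 1.25
print(has_a, keys, items[0], len(data))
```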
b) Series as one-dimensional array:
# slicing by explicit index
data['a':'c']
Output: a 0.25
b 0.50
c 0.75
dtype: float64
# slicing by implicit integer index
data[0:2]
Output: a 0.25
b 0.50
dtype: float64
# masking
data[(data > 0.3) & (data < 0.8)]
Output: b 0.50
c 0.75
dtype: float64
# fancy indexing
data['e'] = 1.25  # add the key 'e' so that it can be selected below
data[['a', 'e']]
Output: a 0.25
e 1.25
dtype: float64
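One detail in the slicing examples is easy to miss: a slice by explicit index includes the end label, while a slice by implicit integer position excludes the end position. A minimal sketch, written with loc and iloc so that the intent is unambiguous:

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

# Label-based slice: the end label 'c' is included.
by_label = data.loc['a':'c']

# Position-based slice: the end position 2 is excluded.
by_position = data.iloc[0:2]

print(list(by_label.index))
print(list(by_position.index))
```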
c) Indexers: loc, iloc, and ix
data = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])
print(data)
Output: 1 a
3 b
5 c
dtype: object
The loc attribute allows indexing and slicing that always references
the explicit index:
print(data.loc[1])
Output: a
print(data.loc[1:3])
Output: 1 a
3 b
dtype: object
The iloc attribute allows indexing and slicing that always references
the implicit Python-style index:
print(data.iloc[1])
Output: b
print(data.iloc[1:3])
Output: 3 b
5 c
dtype: object
A third indexing attribute, ix, is a hybrid of the two, and for
Series objects is equivalent to standard []-based indexing.
The purpose of the ix indexer will become more apparent in
the context of DataFrame objects, which we will discuss in
a moment. Note that ix was deprecated in pandas 0.20 and removed
in pandas 1.0, so newer code should use loc and iloc instead.
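On versions of pandas where ix is no longer available, a selection that mixes positions and labels can be written with loc or iloc alone. The sketch below is one possible way to do this, assuming a small DataFrame built from the state figures used earlier; it is not taken from the slides.

```python
import pandas as pd

states = pd.DataFrame(
    {'area': [423967, 695662, 141297],
     'population': [38332521, 26448193, 19651127]},
    index=['California', 'Texas', 'New York'])

# Goal: the first two rows (by position) of the 'area' column (by label).

# Option 1: turn the positions into labels, then use loc.
via_loc = states.loc[states.index[:2], 'area']

# Option 2: turn the column label into a position, then use iloc.
via_iloc = states.iloc[:2, states.columns.get_loc('area')]

print(via_loc)
```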
Pivot Tables
• A pivot table is an operation similar to GroupBy that is
commonly seen in spreadsheets and other programs that
operate on tabular data.
• The pivot table takes simple columnwise data as
input, and groups the entries into a two-dimensional
table that provides a multidimensional
summarization of the data.
• The difference between pivot tables and GroupBy
can sometimes cause confusion; it helps me to think
of pivot tables as essentially a multidimensional
version of GroupBy aggregation.
• That is, we split-apply-combine, but both the split and
the combine happen across not a one-dimensional
index, but across a two-dimensional grid.
Motivating Pivot Tables:
import numpy as np
import pandas as pd
import seaborn as sns
titanic = sns.load_dataset('titanic')
titanic.head()
Pivot Tables by Hand
Using the vocabulary of GroupBy, we might proceed using
something like this: we group by class and gender, select
survival, apply a mean aggregate, combine the resulting
groups, and then unstack the hierarchical index to reveal
the hidden multidimensionality.
Example:
titanic.groupby(['sex',
'class'])['survived'].aggregate('mean').unstack()
Pivot Table Syntax
titanic.pivot_table('survived', index='sex',
columns='class')
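The equivalence between the GroupBy version and the pivot_table version can be checked on a small table without downloading the Titanic data. The six rows below are made up for this sketch; only the column names follow the Titanic example.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the Titanic set.
df = pd.DataFrame({
    'sex': ['female', 'female', 'male', 'male', 'male', 'female'],
    'class': ['First', 'Third', 'First', 'Third', 'Third', 'First'],
    'survived': [1, 0, 0, 0, 1, 1],
})

# By hand: split on two keys, apply mean, unstack the second key.
by_hand = df.groupby(['sex', 'class'])['survived'].mean().unstack()

# Pivot table: the default aggregation is the mean.
pivot = df.pivot_table('survived', index='sex', columns='class')

print(pivot)
```

Both results are a two-dimensional grid with 'sex' on the rows and 'class' on the columns.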

Multilevel pivot
age = pd.cut(titanic['age'], [0, 18, 80])
titanic.pivot_table('survived', ['sex', age], 'class')
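The multilevel pivot relies on pd.cut to turn ages into bins, which then act as a second row key. The sketch below uses made-up rows; passing observed=True is a choice made here so that only bins present in the data appear, and it assumes a pandas version where pivot_table accepts that parameter.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the Titanic set.
df = pd.DataFrame({
    'sex': ['female', 'female', 'male', 'male', 'male', 'female'],
    'age': [10, 30, 12, 40, 40, 10],
    'class': ['First', 'First', 'Third', 'Third', 'First', 'Third'],
    'survived': [1, 1, 0, 0, 1, 0],
})

# Bins are open on the left and closed on the right: (0, 18] and (18, 80].
age = pd.cut(df['age'], [0, 18, 80])
bin_codes = age.cat.codes.tolist()

# Rows are keyed by (sex, age bin); columns are keyed by class.
result = df.pivot_table('survived', ['sex', age], 'class', observed=True)
print(result)
```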
Birthrate Data:
#This data can be found at
#https://raw.githubusercontent.com
births = pd.read_csv('births.csv')
births.head()
births['decade'] = 10 * (births['year'] // 10)
births.pivot_table('births', index='decade',
columns='gender', aggfunc='sum')
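The decade column is computed with integer division: year // 10 drops the last digit, and multiplying by 10 restores the scale. A small sketch with made-up counts (the real births.csv is not reproduced here):

```python
import pandas as pd

# Hypothetical rows; only the column names follow the births data.
births = pd.DataFrame({
    'year': [1969, 1969, 1970, 1971],
    'gender': ['F', 'M', 'F', 'M'],
    'births': [100, 110, 120, 130],
})

# 1969 // 10 is 196, so 10 * 196 gives the decade 1960.
births['decade'] = 10 * (births['year'] // 10)

# Total births per decade and gender.
table = births.pivot_table('births', index='decade',
                           columns='gender', aggfunc='sum')
print(table)
```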
