2.1 Pandas Objects

The document introduces Pandas, a Python data analysis library built on NumPy. It discusses the Pandas Series object which is a one-dimensional array of indexed data. It then covers the Pandas DataFrame object which is a two-dimensional data structure that can be thought of as a generalization of a NumPy array or a specialized dictionary.

Uploaded by

csbs249052

Introduction to Pandas

• Pandas is a newer package built on top of NumPy that provides an efficient implementation of a DataFrame.
• DataFrames are essentially multidimensional arrays with attached row and column labels, often with heterogeneous types and/or missing data.
• As well as offering a convenient storage interface for labelled data, Pandas implements a number of powerful data operations familiar to users of both database frameworks and spreadsheet programs.

Pandas Objects
This section introduces the three fundamental Pandas data structures: the Series, DataFrame, and Index.
We will start our code sessions with the standard NumPy and Pandas imports:
In [1]: import numpy as np
        import pandas as pd

The Pandas Series Object

• A Pandas Series is a one-dimensional array of indexed data. It can be created from a list or array as follows:
In [2]: data = pd.Series([0.25, 0.5, 0.75, 1.0])
        data
Out[2]: 0    0.25
        1    0.50
        2    0.75
        3    1.00
        dtype: float64
• The Series combines a sequence of values with an explicit sequence of indices, which we can access with the values and index attributes. The values are simply a familiar NumPy array:
In [3]: data.values
Out[3]: array([0.25, 0.5 , 0.75, 1.  ])
The index is an array-like object of type pd.Index, which we'll discuss in more detail momentarily:
In [4]: data.index
Out[4]: RangeIndex(start=0, stop=4, step=1)

• As with a NumPy array, data can be accessed by the associated index via the familiar Python square-bracket notation:
In [5]: data[1]
Out[5]: 0.5
In [6]: data[1:3]
Out[6]: 1    0.50
        2    0.75
        dtype: float64
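
As a small illustration of the values and index attributes described above, the sketch below builds the same Series from a NumPy array rather than a list. The variable names (values, labels, middle) are chosen here for illustration and do not come from the original notes.

```python
import numpy as np
import pandas as pd

# Build the same Series from a NumPy array instead of a list.
data = pd.Series(np.array([0.25, 0.5, 0.75, 1.0]))

# .values exposes the underlying NumPy array; .index holds the labels.
values = data.values
labels = list(data.index)

# A slice is itself a Series and carries its index labels along.
middle = data[1:3]

print(type(values).__name__)   # ndarray
print(labels)                  # [0, 1, 2, 3]
print(list(middle.index))      # [1, 2]
```

The point to notice is that the slice keeps the labels 1 and 2 rather than being renumbered from zero.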
Series as Generalized NumPy Array

• The Series object may appear to be basically interchangeable with a one-dimensional NumPy array.
• The essential difference is that while the NumPy array has an implicitly defined integer index used to access the values, the Pandas Series has an explicitly defined index associated with the values.
• This explicit index definition gives the Series object additional capabilities.
• For example, the index need not be an integer, but can consist of values of any desired type. So, if we wish, we can use strings as an index:
In [7]: data = pd.Series([0.25, 0.5, 0.75, 1.0],
                         index=['a', 'b', 'c', 'd'])
        data
Out[7]: a    0.25
        b    0.50
        c    0.75
        d    1.00
        dtype: float64
And the item access works as expected:
In [8]: data['b']
Out[8]: 0.5
We can even use noncontiguous or nonsequential indices:
In [9]: data = pd.Series([0.25, 0.5, 0.75, 1.0],
                         index=[2, 5, 3, 7])
        data
Out[9]: 2    0.25
        5    0.50
        3    0.75
        7    1.00
        dtype: float64

In [10]: data[5]
Out[10]: 0.5
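
With an integer index that is not 0, 1, 2, ..., a label and a position can mean different things. The sketch below uses the .loc (label-based) and .iloc (position-based) accessors, which are part of Pandas but are not introduced in this section, to make the difference explicit. The variable names are illustrative.

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0], index=[2, 5, 3, 7])

# Label-based access: look up the entry whose index label is 2.
by_label = data.loc[2]        # 0.25

# Position-based access: take the third element, whatever its label.
by_position = data.iloc[2]    # 0.75 (its label is 3)

print(by_label, by_position)
```

Spelling out which kind of lookup is intended avoids ambiguity whenever the explicit index itself consists of integers.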

Series as Specialized Dictionary

• A dictionary is a structure that maps arbitrary keys to a set of arbitrary values, and a Series is a structure that maps typed keys to a set of typed values.
• This typing is important: just as the type-specific compiled code behind a NumPy array makes it more efficient than a Python list for certain operations, the type information of a Pandas Series makes it more efficient than Python dictionaries for certain operations.
• The Series-as-dictionary analogy can be made even clearer by constructing a Series object directly from a Python dictionary, here the five most populous US states according to the 2020 census:
In [11]:
population_dict = {'California': 39538223, 'Texas': 29145505,
                   'Florida': 21538187, 'New York': 20201249,
                   'Pennsylvania': 13002700}
population = pd.Series(population_dict)
population
Out[11]:
California      39538223
Texas           29145505
Florida         21538187
New York        20201249
Pennsylvania    13002700
dtype: int64
From here, typical dictionary-style item access can be performed:
In [12]:
population['California']
Out[12]:
39538223
Unlike a dictionary, though, the Series also supports array-style operations such as slicing:
In [13]:
population['California':'Florida']
Out[13]:
California 39538223
Texas 29145505
Florida 21538187
dtype: int64
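
The dictionary analogy extends to a few other familiar operations. The following sketch, using the same population data, shows membership tests, keys, and items on a Series, and contrasts a label-based slice (which includes its end label) with a positional one taken through .iloc. The variable names are illustrative only.

```python
import pandas as pd

population = pd.Series({'California': 39538223, 'Texas': 29145505,
                        'Florida': 21538187, 'New York': 20201249,
                        'Pennsylvania': 13002700})

has_texas = 'Texas' in population    # membership tests the index (the keys)
keys = list(population.keys())       # the same labels as population.index
pairs = list(population.items())     # (key, value) tuples, in index order

# A label-based slice includes the end label ...
by_label = population['California':'Florida']   # 3 entries
# ... whereas a positional slice excludes the end position.
by_position = population.iloc[0:2]               # 2 entries

print(has_texas, len(by_label), len(by_position))   # True 3 2
```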

Constructing Series Objects

We've already seen a few ways of constructing a Pandas Series from scratch. All of them are some
version of the following:

pd.Series(data, index=index)
where index is an optional argument, and data can be one of many entities.

For example, data can be a list or NumPy array, in which case index defaults to an integer
sequence:
In [14]:
pd.Series([2, 4, 6])
Out[14]:
0 2
1 4
2 6
dtype: int64
Or data can be a scalar, which is repeated to fill the specified index:
In [15]:
pd.Series(5, index=[100, 200, 300])
Out[15]:
100 5
200 5
300 5
dtype: int64

Or it can be a dictionary, in which case index defaults to the dictionary keys:

In [16]:
pd.Series({2:'a', 1:'b', 3:'c'})
Out[16]:
2 a
1 b
3 c
dtype: object
In each case, the index can be explicitly set to control the order or the subset of keys used:
In [17]:
pd.Series({2:'a', 1:'b', 3:'c'}, index=[1, 2])
Out[17]:
1 b
2 a
dtype: object
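
The explicit index can also name a key that the dictionary does not contain. In that case the corresponding entry is filled with a missing value, as the sketch below shows; the key 4 is an arbitrary choice made for this illustration.

```python
import pandas as pd

letters = {2: 'a', 1: 'b', 3: 'c'}

# Keys 1 and 2 exist in the dictionary; key 4 does not.
s = pd.Series(letters, index=[1, 2, 4])

present = s.notna().tolist()   # [True, True, False]
print(list(s.index), present)
```

Key 3 is dropped because it is not in the requested index, and label 4 is kept with a NaN placeholder.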

The Pandas DataFrame Object

• The next fundamental structure in Pandas is the DataFrame.
• Like the Series object discussed in the previous section, the DataFrame can be thought of either as a generalization of a NumPy array, or as a specialization of a Python dictionary.

DataFrame as Generalized NumPy Array

• If a Series is an analog of a one-dimensional array with explicit indices, a DataFrame is an analog of a two-dimensional array with explicit row and column indices.
• Just as you might think of a two-dimensional array as an ordered sequence of aligned one-dimensional columns, you can think of a DataFrame as a sequence of aligned Series objects. Here, by "aligned" we mean that they share the same index.

To demonstrate this, let's first construct a new Series listing the area of each of the five states
discussed in the previous section (in square kilometers):
In [18]:
area_dict = {'California': 423967, 'Texas': 695662, 'Florida': 170312,
             'New York': 141297, 'Pennsylvania': 119280}
area = pd.Series(area_dict)
area
Out[18]:
California 423967
Texas 695662
Florida 170312
New York 141297
Pennsylvania 119280
dtype: int64
Now that we have this along with the population Series from before, we can use a dictionary to
construct a single two-dimensional object containing this information:
In [19]:
states = pd.DataFrame({'population': population,
'area': area})
states

Out[19]:
              population    area
California      39538223  423967
Texas           29145505  695662
Florida         21538187  170312
New York        20201249  141297
Pennsylvania    13002700  119280
Like the Series object, the DataFrame has an index attribute that gives access to the index labels:
In [20]:
states.index
Out[20]:
Index(['California', 'Texas', 'Florida', 'New York',
'Pennsylvania'], dtype='object')
Additionally, the DataFrame has a columns attribute, which is an Index object holding the column
labels:
In [21]:
states.columns
Out[21]:
Index(['population', 'area'], dtype='object')
Thus the DataFrame can be thought of as a generalization of a two-dimensional NumPy array,
where both the rows and columns have a generalized index for accessing the data.
DataFrame as Specialized Dictionary
Similarly, we can also think of a DataFrame as a specialization of a dictionary. Where a dictionary
maps a key to a value, a DataFrame maps a column name to a Series of column data. For example,
asking for the 'area' column returns the Series object containing the areas we saw earlier:
In [22]:
states['area']
Out[22]:
California 423967
Texas 695662
Florida 170312
New York 141297
Pennsylvania 119280
Name: area, dtype: int64
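
One consequence of the dictionary view is that the keys of a DataFrame are its column labels, not its row labels. The sketch below illustrates this with three of the five states (a shortened data set used here only to keep the example brief); row access goes through the .loc accessor, which is not introduced in this section.

```python
import pandas as pd

population = pd.Series({'California': 39538223, 'Texas': 29145505,
                        'Florida': 21538187})
area = pd.Series({'California': 423967, 'Texas': 695662,
                  'Florida': 170312})
states = pd.DataFrame({'population': population, 'area': area})

columns = list(states.keys())       # ['population', 'area']
has_area = 'area' in states         # True: membership tests column labels
has_texas = 'Texas' in states       # False: row labels are not keys

# Rows are reached by label through .loc rather than plain brackets.
texas_row = states.loc['Texas']

print(columns, has_area, has_texas)
```

This differs from a two-dimensional NumPy array, where arr[0] returns the first row.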

Constructing DataFrame Objects

A Pandas DataFrame can be constructed in a variety of ways. Here we'll explore several examples.
From a single Series object
A DataFrame is a collection of Series objects, and a single-column DataFrame can be constructed
from a single Series:
In [23]:
pd.DataFrame(population, columns=['population'])
Out[23]:
              population
California      39538223
Texas           29145505
Florida         21538187
New York        20201249
Pennsylvania    13002700
From a list of dicts
Any list of dictionaries can be made into a DataFrame. We'll use a simple list comprehension to
create some data:
In [24]:
data = [{'a': i, 'b': 2 * i}
for i in range(3)]
pd.DataFrame(data)
Out[24]:
   a  b
0  0  0
1  1  2
2  2  4
Even if some keys in the dictionary are missing, Pandas will fill them in with NaN values (i.e., "Not
a Number"; see Handling Missing Data):
In [25]:
pd.DataFrame([{'a': 1, 'b': 2}, {'b': 3, 'c': 4}])
Out[25]:
     a  b    c
0  1.0  2  NaN
1  NaN  3  4.0
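
The output above shows 1.0 and 4.0 rather than 1 and 4 because NaN is a floating-point value, so a column that contains it is stored as float64. A minimal sketch that checks this, with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame([{'a': 1, 'b': 2}, {'b': 3, 'c': 4}])

# Columns with a missing entry are upcast to float64; 'b' stays integer.
dtypes = df.dtypes.astype(str).to_dict()

# Count the missing entries in each column.
missing = df.isna().sum().to_dict()

print(dtypes)    # {'a': 'float64', 'b': 'int64', 'c': 'float64'}
print(missing)   # {'a': 1, 'b': 0, 'c': 1}
```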

From a dictionary of Series objects

As we saw before, a DataFrame can be constructed from a dictionary of Series objects as well:
In [26]:
pd.DataFrame({'population': population,
'area': area})
Out[26]:
              population    area
California      39538223  423967
Texas           29145505  695662
Florida         21538187  170312
New York        20201249  141297
Pennsylvania    13002700  119280

From a two-dimensional NumPy array

Given a two-dimensional array of data, we can create a DataFrame with any specified column
and index names. If omitted, an integer index will be used for each:
In [27]:
pd.DataFrame(np.random.rand(3, 2),
             columns=['foo', 'bar'],
             index=['a', 'b', 'c'])
Out[27]:
        foo       bar
a  0.471098  0.317396
b  0.614766  0.305971
c  0.533596  0.512377
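
To see the default integer labels mentioned above, the sketch below builds one DataFrame with the labels omitted and one with them supplied. It uses a seeded generator from np.random.default_rng so that reruns give the same values; the seed 0 is an arbitrary choice and the resulting numbers will differ from those printed above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)   # seeded generator; the seed is arbitrary
values = rng.random((3, 2))

# Without labels, both rows and columns get integer indices.
default = pd.DataFrame(values)

# With labels, the same data is addressed by name.
labelled = pd.DataFrame(values, columns=['foo', 'bar'],
                        index=['a', 'b', 'c'])

print(list(default.index), list(default.columns))     # [0, 1, 2] [0, 1]
print(list(labelled.index), list(labelled.columns))   # ['a', 'b', 'c'] ['foo', 'bar']
```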

From a NumPy structured array

We covered structured arrays in Structured Data: NumPy's Structured Arrays. A
Pandas DataFrame operates much like a structured array, and can be created directly from one:
In [28]:
A = np.zeros(3, dtype=[('A', 'i8'), ('B', 'f8')])
A
Out[28]:
array([(0, 0.), (0, 0.), (0, 0.)], dtype=[('A', '<i8'), ('B', '<f8')])
In [29]:
pd.DataFrame(A)
Out[29]:
   A    B
0  0  0.0
1  0  0.0
2  0  0.0
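
The field names and types of the structured array carry over to the DataFrame columns. The following sketch uses a small array with non-zero values (invented for this illustration) and converts the result back with DataFrame.to_records to show that the field names survive the round trip.

```python
import numpy as np
import pandas as pd

# A structured array with an integer field 'A' and a float field 'B'.
rec = np.array([(1, 2.5), (2, 3.5)], dtype=[('A', 'i8'), ('B', 'f8')])

df = pd.DataFrame(rec)
dtypes = df.dtypes.astype(str).to_dict()   # per-column types

# Convert back to a record array, leaving the row index out.
back = df.to_records(index=False)

print(dtypes)             # {'A': 'int64', 'B': 'float64'}
print(back.dtype.names)   # ('A', 'B')
```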

The Pandas Index Object

• The Series and DataFrame objects both contain an explicit index that lets you reference and modify data. This Index object is an interesting structure in itself, and it can be thought of either as an immutable array or as an ordered set (technically a multiset, as Index objects may contain repeated values).
• Those views have some interesting consequences in terms of the operations available on Index objects. As a simple example, let's construct an Index from a list of integers:
In [30]:
ind = pd.Index([2, 3, 5, 7, 11])
ind
Out[30]:
Int64Index([2, 3, 5, 7, 11], dtype='int64')
Index as Immutable Array
The Index in many ways operates like an array. For example, we can use standard Python indexing
notation to retrieve values or slices:
In [31]: ind[1]
Out[31]: 3
In [32]: ind[::2]
Out[32]: Int64Index([2, 5, 11], dtype='int64')
Index objects also have many of the attributes familiar from NumPy arrays:
In [33]:
print(ind.size, ind.shape, ind.ndim, ind.dtype)
5 (5,) 1 int64
One difference between Index objects and NumPy arrays is that the indices are immutable; that is, they cannot be modified via the normal means such as item assignment (for example, ind[1] = 0 raises a TypeError).

This immutability makes it safer to share indices between multiple DataFrames and arrays, without
the potential for side effects from inadvertent index modification.
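
The immutability described above can be checked directly. In this sketch, item assignment on an Index is attempted inside a try block and the resulting TypeError is caught; a modified copy is then produced with Index.insert, which returns a new Index and leaves the original unchanged. The variable names are illustrative.

```python
import pandas as pd

ind = pd.Index([2, 3, 5, 7, 11])

try:
    ind[1] = 0                 # item assignment is not supported
    mutated = True
except TypeError:
    mutated = False

# To "change" an index, build a new one; the original stays as it was.
new_ind = ind.insert(1, 0)

print(mutated)          # False
print(list(ind))        # [2, 3, 5, 7, 11]
print(list(new_ind))    # [2, 0, 3, 5, 7, 11]
```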
Index as Ordered Set
Pandas objects are designed to facilitate operations such as joins across datasets, which depend on
many aspects of set arithmetic. The Index object follows many of the conventions used by Python's
built-in set data structure, so that unions, intersections, differences, and other combinations can be
computed in a familiar way:
In [35]: indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])
In [36]: indA.intersection(indB)
Out[36]: Int64Index([3, 5, 7], dtype='int64')
In [37]: indA.union(indB)
Out[37]: Int64Index([1, 2, 3, 5, 7, 9, 11], dtype='int64')
In [38]: indA.symmetric_difference(indB)
Out[38]: Int64Index([1, 2, 9, 11], dtype='int64')
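
The one-sided difference mentioned in the text is available through the Index.difference method. As a sketch, the example below computes both one-sided differences and checks that together they make up the symmetric difference; the names only_a, only_b, and sym are illustrative.

```python
import pandas as pd

indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])

only_a = indA.difference(indB)            # labels in indA but not in indB
only_b = indB.difference(indA)            # labels in indB but not in indA
sym = indA.symmetric_difference(indB)     # labels in exactly one of the two

print(list(only_a))   # [1, 9]
print(list(only_b))   # [2, 11]
print(list(sym))      # [1, 2, 9, 11]
```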
