Pandas - Ipynb - Colab

The document provides an overview of Pandas objects, specifically the Series and DataFrame structures, highlighting their creation, indexing, and manipulation. It explains how to construct these objects from various data types, including lists, dictionaries, and NumPy arrays, and discusses data indexing techniques using loc and iloc. Additionally, it covers operations on data, handling missing values, and basic statistical functions available in Pandas.

Uploaded by Asra


Introducing Pandas Objects

import numpy as np
import pandas as pd

The Pandas Series Object

A Pandas Series is a one-dimensional array of indexed data. It can be created from a list or array as follows:

data = pd.Series([5,14,99,888])
data

0      5
1     14
2     99
3    888
dtype: int64

data[3]

888

data.values

array([ 5, 14, 99, 888])

data.index

RangeIndex(start=0, stop=4, step=1)

Like with a NumPy array, data can be accessed by the associated index via the familiar Python square-bracket notation:

data[1]

14

data[1:3]

1    14
2    99
dtype: int64

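Series as generalized NumPy array

The essential difference between a NumPy array and a Pandas Series is the index: a NumPy array has an implicitly defined integer index, while a Series has an explicitly defined index associated with its values. That index need not be an integer; it can hold labels of any desired type, such as strings:

data = pd.Series([0.25, 0.5, 0.75, 1.0],
                 index=['a', 'b', 'c', 'd'])
data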

a 0.25
b 0.50
c 0.75
d 1.00
dtype: float64

And the item access works as expected:

data['b']

0.5
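As a minimal sketch of what an explicit index allows, the example below uses integer labels that are neither sequential nor sorted. The variable name `s` and the label values are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# A Series whose explicit index is neither sequential nor sorted
# (the labels 2, 5, 3, 7 are arbitrary illustrative values).
s = pd.Series([0.25, 0.5, 0.75, 1.0], index=[2, 5, 3, 7])

# Square-bracket access with an integer key looks up the label, not the position.
print(s[5])            # 0.5

# The values are still stored as an ordinary NumPy array.
print(type(s.values))
print(list(s.index))   # [2, 5, 3, 7]
```

Here `s[5]` returns the value stored under label 5 even though the Series has only four elements, which is the behaviour that distinguishes it from a plain NumPy array.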

Series as specialized dictionary

The Series-as-dictionary analogy becomes even clearer when a Series object is constructed directly from a Python dictionary:

student_dict = {'Ram': 123,
                'Shyam': 124,
                'Arun': 125}
students = pd.Series(student_dict)
students

Ram      123
Shyam    124
Arun     125
dtype: int64

By default, a Series will be created where the index is drawn from the dictionary keys, in the order in which they were inserted (they are not sorted). From here, typical dictionary-style item access can be performed:

# Access Ram's roll number
students['Ram']

123
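The dictionary analogy extends beyond item lookup. The following sketch, which rebuilds the same `students` Series, shows membership tests, key listing, assignment to a new label, and label-based slicing. The extra entry for 'Gopal' reuses a roll number that appears later in this notebook.

```python
import pandas as pd

students = pd.Series({'Ram': 123, 'Shyam': 124, 'Arun': 125})

# Membership tests look at the index labels, as with dictionary keys.
print('Ram' in students)        # True

# keys() returns the index.
print(list(students.keys()))    # ['Ram', 'Shyam', 'Arun']

# Assigning to a new label extends the Series, like adding a dictionary key.
students['Gopal'] = 235

# Unlike a dictionary, a Series supports slicing by label;
# the end label is included in the result.
subset = students['Ram':'Shyam']
print(list(subset.index))       # ['Ram', 'Shyam']
```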

The Pandas DataFrame Object

The next fundamental structure in Pandas is the DataFrame. Like the Series object discussed in the previous section, the DataFrame can be thought of either as a generalization of a NumPy array, or as a specialization of a Python dictionary. We'll now take a look at each of these perspectives.

# Initialize a list of lists
data = [['tom', 10], ['nick', 15], ['juli', 14]]

# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Name', 'Age'])

# Display the DataFrame
df

   Name  Age
0   tom   10
1  nick   15
2  juli   14

Like the Series object, the DataFrame has an index attribute that gives access to the row labels:

df.index

RangeIndex(start=0, stop=3, step=1)

Additionally, the DataFrame has a columns attribute, which is an Index object holding the column labels:

df.columns

Index(['Name', 'Age'], dtype='object')

Thus the DataFrame can be thought of as a generalization of a two-dimensional NumPy array, where both the rows and columns have a
generalized index for accessing the data.

df['Name']

0 tom
1 nick
2 juli
Name: Name, dtype: object
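Both perspectives on the DataFrame can be checked directly. The sketch below rebuilds the same `df` and inspects it first as a two-dimensional array and then as a dictionary of columns; the variable name `ages` is an illustrative assumption.

```python
import pandas as pd

df = pd.DataFrame([['tom', 10], ['nick', 15], ['juli', 14]],
                  columns=['Name', 'Age'])

# Array view: a shape and an underlying NumPy array of values.
print(df.shape)        # (3, 2)
print(df.to_numpy())

# Dictionary view: membership tests look at column labels, not row values.
print('Age' in df)     # True
print('tom' in df)     # False

# Each column is a Series that shares the DataFrame's row index.
ages = df['Age']
print(ages.index.equals(df.index))   # True
```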

Constructing DataFrame objects

A Pandas DataFrame can be constructed in a variety of ways. Here we'll give several examples.

From a single Series object

A DataFrame is a collection of Series objects, and a single-column DataFrame can be constructed from a single Series:

pd.DataFrame(students, columns=['rollno'])

       rollno
Ram       123
Shyam     124
Arun      125

From a list of dicts

Any list of dictionaries can be made into a DataFrame.

Even if some keys in the dictionary are missing, Pandas will fill them in with NaN (i.e., "not a number") values:

pd.DataFrame([{'a': 1, 'b': 2}, {'b': 3, 'c': 4}])

     a  b    c
0  1.0  2  NaN
1  NaN  3  4.0
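The output above also shows a side effect worth noting: columns that received a NaN are displayed as floats. A short sketch, using the same list of dictionaries (the names `rows` and `df` are illustrative), makes this visible:

```python
import pandas as pd

rows = [{'a': 1, 'b': 2}, {'b': 3, 'c': 4}]
df = pd.DataFrame(rows)

# Boolean mask marking the cells that were filled with NaN.
print(df.isnull())

# NaN is a floating-point value, so columns 'a' and 'c' are upcast to
# float64, while the complete column 'b' keeps its integer dtype.
print(df.dtypes)
```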

From a two-dimensional NumPy array

Given a two-dimensional array of data, we can create a DataFrame with any specified column and index names. If omitted, an integer index will
be used for each:

np.random.rand(3, 2)

array([[0.48925761, 0.81202557],
[0.37526746, 0.9834642 ],
[0.10226165, 0.37402615]])

pd.DataFrame(np.random.rand(3, 2),
columns=['foo', 'bar'],
index=['a', 'b', 'c'])

        foo       bar
a  0.965321  0.512423
b  0.969355  0.437354
c  0.196705  0.719428
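To illustrate the "if omitted" case, the sketch below builds one DataFrame without labels and one with labels from the same array. It uses NumPy's `default_rng` with an arbitrary seed so that the values are reproducible; the seed and the variable names are assumptions made for this example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)   # seed chosen arbitrarily for reproducibility
values = rng.random((3, 2))

# No labels supplied: both axes fall back to integer ranges.
default_df = pd.DataFrame(values)
print(default_df.index)     # RangeIndex(start=0, stop=3, step=1)
print(default_df.columns)   # RangeIndex(start=0, stop=2, step=1)

# Explicit labels on both axes.
labeled_df = pd.DataFrame(values, columns=['foo', 'bar'], index=['a', 'b', 'c'])

# Row 'b', column 'bar' is the same cell as position [1, 1] in the array.
print(labeled_df.loc['b', 'bar'] == values[1, 1])   # True
```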

From a NumPy structured array

We covered structured arrays in Structured Data: NumPy's Structured Arrays. A Pandas DataFrame operates much like a structured array, and can be created directly from one:

A = np.zeros(3, dtype=[('A', 'i8'), ('B', 'f8')])
A

array([(0, 0.), (0, 0.), (0, 0.)], dtype=[('A', '<i8'), ('B', '<f8')])

pd.DataFrame(A)

   A    B
0  0  0.0
1  0  0.0
2  0  0.0

Index as ordered set

Pandas objects are designed to facilitate operations such as joins across datasets, which depend on many aspects of set arithmetic. The Index object provides set-style methods so that unions, intersections, differences, and other combinations can be computed in a familiar way. In recent Pandas versions the operators &, | and ^ act element-wise (bitwise) on an Index rather than as set operations, so the explicit methods should be used:

indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])

indA.intersection(indB)  # intersection: common elements

Index([3, 5, 7], dtype='int64')

indA.union(indB)  # union: all elements

Index([1, 2, 3, 5, 7, 9, 11], dtype='int64')

indA.symmetric_difference(indB)  # symmetric difference: elements in exactly one index

Index([1, 2, 9, 11], dtype='int64')

DATA INDEXING AND SELECTION

import pandas as pd
data = pd.Series([0.25, 0.5, 0.75, 1.0],
                 index=['a', 'b', 'c', 'd'])
data

a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64

# masking
data[data == 0.5]

b    0.5
dtype: float64

# fancy indexing
data[['a', 'd']]

a 0.25
d 1.00
dtype: float64

Indexers: loc and iloc

data = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])
data

1    a
3    b
5    c
dtype: object

# explicit index when indexing (the user-defined labels)
data[1]

'a'

# implicit index when slicing (integer positions)
data[1:3]

3    b
5    c
dtype: object

Because of this potential confusion in the case of integer indexes, Pandas provides some special indexer attributes that explicitly expose certain indexing schemes.

First, the loc attribute allows indexing and slicing that always references the explicit index:

data.loc[1]  # loc is label-based: it uses the explicit index

'a'

data.loc[1:3]

1 a
3 b
dtype: object

The iloc attribute allows indexing and slicing that always references the implicit Python-style index:

data.iloc[1:3]  # iloc is position-based: it uses the implicit index

3 b
5 c
dtype: object
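The two indexers can be compared side by side on the same Series. This is a small sketch rather than an exhaustive treatment; it only restates the behaviour shown in the cells above.

```python
import pandas as pd

data = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])

# Single-element access: the same key means different things.
print(data.loc[1])            # label 1    -> 'a'
print(data.iloc[1])           # position 1 -> 'b'

# Slices: loc includes the end label, iloc excludes the end position.
print(list(data.loc[1:3]))    # ['a', 'b']
print(list(data.iloc[1:3]))   # ['b', 'c']
```

Using `loc` or `iloc` explicitly, instead of bare square brackets, keeps code readable when the index consists of integers.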

student = [['Ram', 123, 80, 85], ['Shyam', 124, 70, 75],
           ['Arun', 125, 35, 60], ['Gopal', 235, 95, 70]]
data = pd.DataFrame(student, columns=['Name', 'Rollno', 'FDS_Mark', 'DS_Mark'])
data

    Name  Rollno  FDS_Mark  DS_Mark
0    Ram     123        80       85
1  Shyam     124        70       75
2   Arun     125        35       60
3  Gopal     235        95       70

#Select all roll numbers

data['Rollno']

0 123
1 124
2 125
3 235
Name: Rollno, dtype: int64

data['name_dept'] = data['Name'] + "_CSE A"

data

    Name  Rollno  FDS_Mark  DS_Mark    name_dept
0    Ram     123        80       85    Ram_CSE A
1  Shyam     124        70       75  Shyam_CSE A
2   Arun     125        35       60   Arun_CSE A
3  Gopal     235        95       70  Gopal_CSE A

#Select first two rows

data[:2]

    Name  Rollno  FDS_Mark  DS_Mark    name_dept
0    Ram     123        80       85    Ram_CSE A
1  Shyam     124        70       75  Shyam_CSE A
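The `loc` and `iloc` indexers work on a DataFrame as well, taking a row selector and a column selector. The sketch below rebuilds the student table and selects rows by position, columns by label, and rows by a boolean mask. The pass mark of 50 is a hypothetical threshold chosen only for illustration.

```python
import pandas as pd

student = [['Ram', 123, 80, 85], ['Shyam', 124, 70, 75],
           ['Arun', 125, 35, 60], ['Gopal', 235, 95, 70]]
data = pd.DataFrame(student, columns=['Name', 'Rollno', 'FDS_Mark', 'DS_Mark'])

# Rows by position: equivalent to data[:2].
first_two = data.iloc[:2]

# All rows, selected columns by label.
marks = data.loc[:, ['FDS_Mark', 'DS_Mark']]

# Boolean mask on the rows combined with a column label.
# The threshold 50 is a hypothetical pass mark.
passed = data.loc[data['FDS_Mark'] >= 50, 'Name']
print(list(passed))   # ['Ram', 'Shyam', 'Gopal']
```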

Operating on Pandas Data

# Divide the mark column by 100
data['FDS_Mark'] / 100

0    0.80
1    0.70
2    0.35
3    0.95
Name: FDS_Mark, dtype: float64

data['DS_Mark'] - 15

0    70
1    60
2    45
3    55
Name: DS_Mark, dtype: int64

data['Total_mark']=data['FDS_Mark']+data['DS_Mark']

data

    Name  Rollno  FDS_Mark  DS_Mark    name_dept  Total_mark
0    Ram     123        80       85    Ram_CSE A         165
1  Shyam     124        70       75  Shyam_CSE A         145
2   Arun     125        35       60   Arun_CSE A          95
3  Gopal     235        95       70  Gopal_CSE A         165

data['Total_mark'].mean()

142.5

data['Total_mark'].median()

155.0

data['Total_mark'].mode()

0 165
dtype: int64
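The three statistics above can be reproduced on a standalone Series holding the same totals. The sketch also calls `describe()`, which is not used in the original notebook but summarises several statistics in one call.

```python
import pandas as pd

# The Total_mark values computed in the table above.
total = pd.Series([165, 145, 95, 165], name='Total_mark')

print(total.mean())               # 142.5
print(total.median())             # 155.0
print(total.mode().tolist())      # [165]

# Range of the totals.
print(total.max() - total.min())  # 70

# Count, mean, standard deviation, min, quartiles, and max in one call.
print(total.describe())
```

Note that `mode()` returns a Series rather than a single number, because a dataset can have more than one most frequent value.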

Handling Missing Data

Pandas provides the following methods for detecting, removing, and replacing null values:

isnull(): Generate a boolean mask indicating missing values
notnull(): Opposite of isnull()
dropna(): Return a filtered version of the data
fillna(): Return a copy of the data with missing values filled or imputed
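As a brief sketch of the detection methods listed above, the example below builds a Series containing both NaN and None and uses `notnull()` as a mask. The variable names `s` and `mask` are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, np.nan, 'hello', None])

# True where a value is present, False where it is missing.
mask = s.notnull()
print(mask.tolist())           # [True, False, True, False]

# Indexing with the mask keeps only the present values.
print(s[mask].tolist())        # [1, 'hello']

# Summing the isnull() mask counts the missing entries.
print(int(s.isnull().sum()))   # 2
```

Both `np.nan` and `None` are treated as missing, which is why two entries are reported.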

import numpy as np
data = pd.Series([1, np.nan, 'hello', None])
data

0        1
1      NaN
2    hello
3     None
dtype: object

data.isnull()

0 False
1 True
2 False
3 True
dtype: bool

data.dropna()  # returns a new Series; the original is unchanged unless inplace=True is passed

0 1
2 hello
dtype: object

data

0 1
1 NaN
2 hello
3 None
dtype: object

#Filling Null Values

data.fillna(0)

0 1
1 0
2 hello
3 0
dtype: object

# forward-fill: propagate the previous valid value
# (ffill() replaces the deprecated fillna(method='ffill'))
data.ffill()

0 1
1 1
2 hello
3 hello
dtype: object
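Forward-fill has a mirror image, back-fill, and numeric data can also be filled with a computed statistic. The sketch below uses a small hypothetical Series of marks with two missing entries; the values are illustrative and do not come from the notebook's table.

```python
import numpy as np
import pandas as pd

# Hypothetical marks with two missing entries.
marks = pd.Series([80.0, np.nan, 35.0, np.nan],
                  index=['Ram', 'Shyam', 'Arun', 'Gopal'])

# Forward-fill: each gap takes the previous valid value.
print(marks.ffill().tolist())               # [80.0, 80.0, 35.0, 35.0]

# Back-fill: each gap takes the next valid value; the last entry
# has nothing after it, so it stays NaN.
print(marks.bfill().tolist())               # [80.0, 35.0, 35.0, nan]

# Fill with the mean of the values that are present (mean() skips NaN).
print(marks.fillna(marks.mean()).tolist())  # [80.0, 57.5, 35.0, 57.5]
```

Which strategy is appropriate depends on the data: forward-fill and back-fill suit ordered records, while filling with a statistic such as the mean treats the rows as unordered.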


