Pandas Introduction: What Is Python Pandas Used For?

Pandas is an open-source Python library designed for data manipulation and analysis, built on top of NumPy. It provides powerful tools for data cleaning, merging, handling missing data, and supports operations like grouping and visualization. The document also covers installation, indexing, selecting data, handling missing values, hierarchical indexing, vectorized string operations, and working with time series data.

Pandas Introduction

Pandas is a powerful, open-source Python library used for data manipulation and analysis. It provides data structures and functions for performing efficient operations on data.

What is Python Pandas used for?


The Pandas library is widely used for data science because it works well in conjunction with the other libraries commonly used for data science.

It is built on top of the NumPy library which means that a lot of the
structures of NumPy are used or replicated in Pandas.

The data produced by Pandas is often used as input for plotting functions
in Matplotlib, statistical analysis in SciPy, and machine learning algorithms
in Scikit-learn.

Why should you use the Pandas library? Python's Pandas library is an excellent tool for analyzing, cleaning, and manipulating data.

Here is a list of things that we can do using Pandas.

Cleaning, merging, and joining data sets.

Easy handling of missing data (represented as NaN) in both floating point and non-floating point data.

Inserting and deleting columns in DataFrames and higher-dimensional objects.

Powerful group-by functionality for performing split-apply-combine operations on data sets.

Data visualization.
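As a quick taste of several of these capabilities, here is a minimal sketch (the column names and data are made up for illustration) that fills a missing value and performs a split-apply-combine aggregation:

```python
import pandas as pd
import numpy as np

# Hypothetical sales data with one missing value
sales = pd.DataFrame({
    'region': ['East', 'West', 'East', 'West'],
    'units': [10, np.nan, 30, 40],
})

# Missing data is represented as NaN and is easy to fill
sales['units'] = sales['units'].fillna(0)

# Split-apply-combine: total units per region
totals = sales.groupby('region')['units'].sum()
print(totals)
```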

Getting Started with Pandas


Let's see how to start working with the Python Pandas library:

Installing Pandas
The first step in working with Pandas is to check whether it is already installed on the system. If not, we need to install it using the pip command.

Follow these steps to install Pandas:

Step 1: Type 'cmd' in the search box and open it.

Step 2: Use the cd command to navigate to the folder where pip is installed.

Step 3: After locating it, type the command:

pip install pandas

Importing Pandas
After Pandas has been installed on the system, you need to import the library. It is conventionally imported as follows:

import pandas as pd
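Once imported, you can confirm the installation works by printing the installed version:

```python
import pandas as pd

# Print the installed pandas version to confirm the import succeeded
print(pd.__version__)
```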

Indexing and Selecting Data with Pandas
Indexing in Pandas:
Indexing in pandas means simply selecting particular rows and
columns of data from a DataFrame. Indexing could mean selecting
all the rows and some of the columns, some of the rows and all of
the columns, or some of each of the rows and columns. Indexing
can also be known as Subset Selection.

Indexing Methods in Pandas:

Label-based indexing using .loc[]:

Uses labels or names to select data. You can specify row labels
and column names.

Example:

df.loc['row_label', 'column_name']

Integer-based indexing using .iloc[]:

Uses integer positions to select data. It's similar to Python's typical 0-based indexing.

Example:

df.iloc[integer_row_position, integer_column_position]

Boolean indexing:

Uses boolean vectors to filter data. It selects rows or columns where the condition is True.

Example:

df[df['column_name'] > 0]

CODE:-

import pandas as pd

# Create a sample DataFrame

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Salary': [50000, 60000, 75000, 80000, 70000]
}

df = pd.DataFrame(data)

# Selecting a single column

print(df['Name'])

# Selecting multiple columns

print(df[['Name', 'Age']])

# Selecting rows by index label

print(df.loc[1]) # Select row with index label 1

# Selecting specific rows and columns

print(df.loc[[0, 2, 4], ['Name', 'City']])  # Rows 0, 2, 4 and columns 'Name' and 'City'

# Selecting rows by index position

print(df.iloc[3])  # Select row with index position 3 (zero-indexed)

# Selecting specific rows and columns by index position

print(df.iloc[[1, 3], [0, 2]])  # Rows 1 and 3, columns 0 and 2 (Name and City)

# Conditional selection
print(df[df['Age'] > 30])  # Select rows where Age is greater than 30

OUTPUT:-

0      Alice
1        Bob
2    Charlie
3      David
4        Eve
Name: Name, dtype: object

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
3    David   40
4      Eve   45

Name              Bob
Age                30
City      Los Angeles
Salary          60000
Name: 1, dtype: object

      Name      City
0    Alice  New York
2  Charlie   Chicago
4      Eve   Phoenix

Name       David
Age           40
City     Houston
Salary     80000
Name: 3, dtype: object

    Name         City
1    Bob  Los Angeles
3  David      Houston

      Name  Age     City  Salary
2  Charlie   35  Chicago   75000
3    David   40  Houston   80000
4      Eve   45  Phoenix   70000
1. Operating on Data in Pandas
#### Syntax and Explanation of Common Operations:

- *Creating a DataFrame:*

import pandas as pd

# Example data

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 40, 45],
    'Salary': [50000, 60000, 75000, 80000, 55000]
}

# Creating a DataFrame

df = pd.DataFrame(data)

- *Selecting Data:*

# Selecting specific columns

selected_columns = df[['Name', 'Age']]


- *Filtering Data:*

# Filtering based on conditions (Age > 30)

filtered_data = df[df['Age'] > 30]

- *Adding a New Column:*

# Adding a new column (e.g., Bonus)

df['Bonus'] = df['Salary'] * 0.1

- *Aggregating Data:*

# Calculating mean salary

mean_salary = df['Salary'].mean()

OUTPUT
Name Age Salary

0 Alice 25 50000

1 Bob 30 60000

2 Charlie 35 75000

3 David 40 80000

4 Emily 45 55000

Name Age

0 Alice 25
1 Bob 30

2 Charlie 35

3 David 40

4 Emily 45

Name Age Salary

2 Charlie 35 75000

3 David 40 80000

4 Emily 45 55000

Name Age Salary Bonus

0 Alice 25 50000 5000.0

1 Bob 30 60000 6000.0

2 Charlie 35 75000 7500.0

3 David 40 80000 8000.0

4 Emily 45 55000 5500.0

64000.0

### 2. Handling Missing Data
### Explanation of Functions:

- *.isnull()*:

- *Syntax:* DataFrame.isnull()
- *Explanation:* Returns a boolean DataFrame indicating where
values are NaN (missing).

- *.dropna()*:

- *Syntax:* DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

- *Explanation:* Drops rows or columns with missing values (NaN).

- *.fillna()*:

- *Syntax:* DataFrame.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None)

- *Explanation:* Fills missing values (NaN) using specified methods or values.

- *.MultiIndex.from_tuples()*:

- *Syntax:* pd.MultiIndex.from_tuples(tuples, names=None)

- *Explanation:* Creates a multi-level index object from tuples.

- *.loc[]*:

- *Syntax:* DataFrame.loc[label] or DataFrame.loc[row_label, column_label]

- *Explanation:* Accesses a group of rows and columns by label(s) or a boolean array.

- *Creating a DataFrame with Missing Values:*


data_missing = {
    'A': [1, 2, None, 4, 5],
    'B': [10, None, 30, 40, 50],
    'C': ['a', 'b', None, 'd', 'e']
}

df_missing = pd.DataFrame(data_missing)

- *Detecting Missing Values:*

# Detecting missing values

is_null = df_missing.isnull()

- *Dropping Rows with Any Missing Values:*

# Dropping rows with any missing values

df_cleaned = df_missing.dropna()

- *Filling Missing Values with a Specified Value:*

# Filling missing values with 0

df_filled = df_missing.fillna(value=0)

OUTPUT
A B C

0 1.0 10.0 a

1 2.0 NaN b

2 NaN 30.0 None

3 4.0 40.0 d

4 5.0 50.0 e

A B C

0 False False False

1 False True False

2 True False True

3 False False False

4 False False False

A B C

0 1.0 10.0 a

3 4.0 40.0 d

4 5.0 50.0 e

A B C

0 1.0 10.0 a

1 2.0 0.0 b

2 0.0 30.0 0

3 4.0 40.0 d

4 5.0 50.0 e
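Besides filling with a constant, missing values can also be filled from neighbouring rows. A minimal sketch using forward fill on the same kind of data:

```python
import pandas as pd

df_missing = pd.DataFrame({
    'A': [1, 2, None, 4, 5],
    'B': [10, None, 30, 40, 50],
})

# Forward fill: propagate the last valid observation downward
df_ffilled = df_missing.ffill()
print(df_ffilled)
```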

### 3. Hierarchical Indexing
#### Syntax and Explanation of Hierarchical Indexing:

- *Creating a DataFrame with Hierarchical Indexing:*

# Creating a DataFrame with hierarchical indexing

index = pd.MultiIndex.from_tuples([('A', 1), ('A', 2), ('B', 1), ('B', 2)], names=['Letter', 'Number'])

df_hierarchical = pd.DataFrame({'Values': [10, 20, 30, 40]}, index=index)

- *Accessing Data with Hierarchical Indexing:*

# Accessing data with hierarchical indexing

value_A1 = df_hierarchical.loc[('A', 1), 'Values']

- *Aggregating Data with Hierarchical Indexing:*

# Aggregating data with hierarchical indexing

sum_A_values = df_hierarchical.loc['A', 'Values'].sum()

OUTPUT
Values

Letter Number
A 1 10

2 20

B 1 30

2 40

10

30

These explanations and syntaxes should give you a clear understanding of how to use these Pandas functions and methods effectively for data manipulation, handling missing data, and working with hierarchical indexing.
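Another useful way to slice a hierarchical index is the cross-section helper .xs(), which selects by a value on a named level. A minimal sketch on the same MultiIndex DataFrame:

```python
import pandas as pd

# Rebuild the hierarchical DataFrame from the section above
index = pd.MultiIndex.from_tuples(
    [('A', 1), ('A', 2), ('B', 1), ('B', 2)],
    names=['Letter', 'Number'])
df_hierarchical = pd.DataFrame({'Values': [10, 20, 30, 40]}, index=index)

# Select every row where the 'Number' level equals 1
number_one = df_hierarchical.xs(1, level='Number')
print(number_one)
```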

## Vectorized String Operations
Vectorized string operations in pandas allow you to efficiently
apply string methods to entire columns or Series of data. This is
particularly useful when you need to clean or transform text data
in bulk. Here are some key points:

1. *Series.str Methods*: Pandas provides a str accessor that exposes a set of string methods, similar to Python's built-in string methods. These methods can be accessed through Series.str.

import pandas as pd
data = {'name': ['Alice', 'Bob', 'Charlie'],

'city': ['New York', 'Los Angeles', 'Chicago']}

df = pd.DataFrame(data)

# Example: Convert names to uppercase

df['name_upper'] = df['name'].str.upper()

2. *Handling Missing Values*: Vectorized string operations gracefully handle missing values (NaN) without raising errors, which simplifies data cleaning workflows. Note that a missing value is NaN, not the string 'NaN', so it is filled with fillna() rather than str.replace().

# Example: Replace missing cities with a default value

df['city'] = df['city'].fillna('Unknown')

3. *Regular Expressions*: String methods in pandas support regular expressions (regex=True), enabling complex pattern matching and extraction tasks.

# Example: Extract the first name using regular expressions

df['first_name'] = df['name'].str.extract(r'^(\w+)')

OUTPUT

      name         city name_upper first_name
0    Alice     New York      ALICE      Alice
1      Bob  Los Angeles        BOB        Bob
2  Charlie      Chicago    CHARLIE    Charlie
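The same str accessor also supports boolean filtering with methods such as str.contains. A small sketch assuming the DataFrame above:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie'],
                   'city': ['New York', 'Los Angeles', 'Chicago']})

# Keep only the rows whose city contains the word 'New'
new_cities = df[df['city'].str.contains('New')]
print(new_cities)
```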

### Working with Time Series
Pandas provides robust support for working with time series
data, including powerful tools for manipulating, analyzing, and
visualizing time-indexed data. Here are some key features:

1. *DateTime Index*: Pandas has a specialized DatetimeIndex to efficiently handle time-series data.

import pandas as pd

import numpy as np

# Create a time series with a datetime index

dates = pd.date_range('2024-01-01', periods=10)

ts = pd.Series(np.random.randn(10), index=dates)

2. *Resampling and Frequency Conversion*: Easily change the frequency of your time series data using resample().

# Resample to monthly frequency

ts_monthly = ts.resample('M').mean()
3. *Time Zone Handling*: Pandas supports time zone localization
and conversion operations.

# Convert to a different time zone

ts_utc = ts.tz_localize('UTC')

ts_ny = ts_utc.tz_convert('America/New_York')

4. *Time Series Plotting*: Pandas integrates with Matplotlib to provide convenient plotting methods for time series data.

import matplotlib.pyplot as plt

# Plot the time series

ts.plot()

plt.show()

OUTPUT
2024-01-01 -0.432560

2024-01-02 -0.173636

2024-01-03 0.293211

2024-01-04 0.047759

2024-01-05 0.991461

2024-01-06 0.914069

2024-01-07 0.281746

2024-01-08 0.647789

2024-01-09 0.151357

2024-01-10 0.443611

Freq: D, dtype: float64


ts_monthly:

2024-01-31    0.316481
Freq: M, dtype: float64

ts_utc:

2024-01-01 00:00:00+00:00 -0.432560

2024-01-02 00:00:00+00:00 -0.173636

2024-01-03 00:00:00+00:00 0.293211

2024-01-04 00:00:00+00:00 0.047759

2024-01-05 00:00:00+00:00 0.991461

2024-01-06 00:00:00+00:00 0.914069

2024-01-07 00:00:00+00:00 0.281746

2024-01-08 00:00:00+00:00 0.647789

2024-01-09 00:00:00+00:00 0.151357

2024-01-10 00:00:00+00:00 0.443611

Freq: D, dtype: float64

ts_ny:

2023-12-31 19:00:00-05:00   -0.432560
2024-01-01 19:00:00-05:00   -0.173636
2024-01-02 19:00:00-05:00    0.293211
2024-01-03 19:00:00-05:00    0.047759
2024-01-04 19:00:00-05:00    0.991461
2024-01-05 19:00:00-05:00    0.914069
2024-01-06 19:00:00-05:00    0.281746
2024-01-07 19:00:00-05:00    0.647789
2024-01-08 19:00:00-05:00    0.151357
2024-01-09 19:00:00-05:00    0.443611
Freq: D, dtype: float64

[Line plot of the ten daily time series values from 2024-01-01 to 2024-01-10]
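Rolling-window calculations are another common time-series tool. A minimal sketch (using a simple increasing series rather than random data, so the result is predictable) computing a 3-day moving average:

```python
import pandas as pd
import numpy as np

# A simple daily series: 0.0, 1.0, ..., 9.0
dates = pd.date_range('2024-01-01', periods=10)
ts = pd.Series(np.arange(10, dtype=float), index=dates)

# 3-day rolling mean; the first two entries are NaN because the
# window is not yet full
rolling_mean = ts.rolling(window=3).mean()
print(rolling_mean)
```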
### High-Performance Pandas: eval() and query()
Pandas provides the eval() and query() functions for high-performance operations, especially useful for large datasets. These functions leverage the numexpr library under the hood for efficient computation.

1. *eval()*: Performs expression evaluation on DataFrame columns.

# Example: Compute a new column using eval()

df.eval('total = column1 + column2', inplace=True)

- Supports arithmetic operations (+, -, *, /), comparison operators, and function calls.

- Useful for complex calculations involving large datasets where performance is critical.

2. *query()*: Filters rows based on a boolean expression.

# Example: Filter rows using query()

filtered_df = df.query('column1 > 0 and column2 < 100')


- Provides a more readable and expressive syntax for filtering
compared to traditional boolean indexing.

- Can significantly improve performance for large DataFrames.

Example:

import pandas as pd

# Create a sample DataFrame

data = {'column1': [10, 20, 30, 40, 50],

'column2': [50, 40, 30, 20, 10]}

df = pd.DataFrame(data)

print("Original DataFrame:")

print(df)

# Compute a new column using eval()

df.eval('total = column1 + column2', inplace=True)

print("\nDataFrame after adding a new column using eval():")

print(df)

# Filter rows using query()

filtered_df = df.query('column1 > 20 and column2 < 40')

print("\nFiltered DataFrame using query():")

print(filtered_df)

OUTPUT

Original DataFrame:

column1 column2

0 10 50
1 20 40

2 30 30

3 40 20

4 50 10

DataFrame after adding a new column using eval():

column1 column2 total

0 10 50 60

1 20 40 60

2 30 30 60

3 40 20 60

4 50 10 60

Filtered DataFrame using query():

   column1  column2  total
2       30       30     60
3       40       20     60
4       50       10     60
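query() can also reference local Python variables with the @ prefix. A small sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({'column1': [10, 20, 30, 40, 50],
                   'column2': [50, 40, 30, 20, 10]})

threshold = 25
# The @ prefix injects the local variable into the query expression
above = df.query('column1 > @threshold')
print(above)
```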

## 1. Concat and Append
*Concatenation (pd.concat)*:

Concatenation is used to combine DataFrames along a particular axis (either rows or columns).

import pandas as pd
# Example DataFrames

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'],

'B': ['B0', 'B1', 'B2']})

df2 = pd.DataFrame({'A': ['A3', 'A4', 'A5'],

'B': ['B3', 'B4', 'B5']})

# Concatenate along rows (axis=0)

result = pd.concat([df1, df2])

print(result)

Output:

A B

0 A0 B0

1 A1 B1

2 A2 B2

0 A3 B3

1 A4 B4

2 A5 B5
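Concatenation along columns works the same way with axis=1; a minimal sketch (df3 is a hypothetical second frame with different columns):

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
df3 = pd.DataFrame({'C': ['C0', 'C1'], 'D': ['D0', 'D1']})

# Concatenate side by side, aligning on the row index
wide = pd.concat([df1, df3], axis=1)
print(wide)
```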

*Appending (df.append)*:

Appending adds the rows of one DataFrame to another. Note that DataFrame.append was deprecated and then removed in pandas 2.0, so on current versions use pd.concat([df1, df2]) instead.

import pandas as pd
# Example DataFrames

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                    'B': ['B0', 'B1', 'B2']})

df2 = pd.DataFrame({'A': ['A3', 'A4', 'A5'],
                    'B': ['B3', 'B4', 'B5']})

# Append df2 to df1 (works only on pandas < 2.0)
appended = df1.append(df2)

print(appended)

Output:

A B

0 A0 B0

1 A1 B1

2 A2 B2

0 A3 B3

1 A4 B4

2 A5 B5

### 2. Merge and Join
*Merge (pd.merge)*:

Merging is used to combine DataFrames using one or more keys.

import pandas as pd

# Example DataFrames

df1 = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],

'A': ['A0', 'A1', 'A2', 'A3']})

df2 = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],

'B': ['B0', 'B1', 'B2', 'B3']})

# Merge on 'key' column

merged = pd.merge(df1, df2, on='key')

print(merged)

Output:

key A B

0 K0 A0 B0

1 K1 A1 B1

2 K2 A2 B2

3 K3 A3 B3
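By default pd.merge performs an inner join; the how parameter selects other join types. A small sketch with partially overlapping keys:

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['K0', 'K1'], 'A': ['A0', 'A1']})
df2 = pd.DataFrame({'key': ['K1', 'K2'], 'B': ['B1', 'B2']})

# Outer join keeps keys from both frames; missing cells become NaN
outer = pd.merge(df1, df2, on='key', how='outer')
print(outer)
```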

*Join (df.join)*:
Joining is used to combine columns of two DataFrames based on
index.

import pandas as pd

# Example DataFrames

left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],

'B': ['B0', 'B1', 'B2']},

index=['K0', 'K1', 'K2'])

right = pd.DataFrame({'C': ['C0', 'C1', 'C2'],

'D': ['D0', 'D1', 'D2']},

index=['K0', 'K1', 'K2'])

# Joining on index

joined = left.join(right)

print(joined)

Output:

A B C D

K0 A0 B0 C0 D0

K1 A1 B1 C1 D1

K2 A2 B2 C2 D2
### 3. Aggregation and Grouping
*Aggregation (df.agg or groupby)*:

Aggregation allows you to perform calculations on groups of data.

import pandas as pd

# Example DataFrame

data = {'Category': ['A', 'B', 'A', 'B', 'A'],

'Value': [10, 20, 30, 40, 50]}

df = pd.DataFrame(data)

# Group by 'Category' and calculate mean of 'Value'

grouped = df.groupby('Category').agg({'Value': 'mean'})

print(grouped)

Output:

          Value
Category
A          30.0
B          30.0
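agg() also accepts several functions at once, computing multiple statistics per group in one pass. A minimal sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'B', 'A', 'B', 'A'],
                   'Value': [10, 20, 30, 40, 50]})

# Mean, sum, and count of 'Value' for each category
stats = df.groupby('Category')['Value'].agg(['mean', 'sum', 'count'])
print(stats)
```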
### 4. Pivot Tables
*Pivot (df.pivot_table)*:

Pivot tables allow you to summarize and aggregate data inside a DataFrame.

import pandas as pd

# Example DataFrame

data = {'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],
        'Category': ['A', 'B', 'A', 'B'],
        'Value': [10, 20, 30, 40]}

df = pd.DataFrame(data)

# Create a pivot table

pivot_table = df.pivot_table(index='Date', columns='Category',
                             values='Value', aggfunc='sum')

print(pivot_table)

Output:

Category A B

Date

2023-01-01 10.0 20.0

2023-01-02 30.0 40.0
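pivot_table can also append row and column totals via margins=True; a small sketch on the same data:

```python
import pandas as pd

data = {'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],
        'Category': ['A', 'B', 'A', 'B'],
        'Value': [10, 20, 30, 40]}
df = pd.DataFrame(data)

# margins=True appends an 'All' row and column holding the totals
pivot = df.pivot_table(index='Date', columns='Category',
                       values='Value', aggfunc='sum', margins=True)
print(pivot)
```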


These examples cover the basic usage of each concept in Pandas.
They should help you get started with manipulating and analyzing
data using these powerful tools.
