UNIT II Notes (1)

The document provides an introduction to the Pandas library in Python, covering data structures such as Series and DataFrames, data importing and exporting, data cleaning, and manipulation techniques. It includes installation instructions, code examples for creating and accessing data structures, and methods for handling missing data and duplicates. Additionally, it outlines how to read from and write to various file formats like CSV, Excel, and JSON.

Uploaded by Sneha Rawat

UNIT II

Introduction to Pandas

Introduction to Pandas: Understanding data structures in Pandas (Series, DataFrames).
Data Importing and Exporting: Reading data from CSV, Excel, and other file formats; writing data to files.
Data Cleaning: Handling missing data, removing duplicates, and data filtering.
Data Manipulation: Sorting, indexing, and grouping data; merging and concatenating DataFrames.
Working with Dates and Times: Handling date and time data; extracting and modifying date and time components.

To begin using pandas in Python on your system, follow these steps:

✅ 1. Install Python (if not already installed)

Check if Python is installed:

python --version

If not installed, download and install it from python.org.

✅ 2. Install pandas

Open a terminal (Command Prompt, PowerShell, or a terminal on macOS/Linux) and run:

pip install pandas

If you're using Jupyter or Anaconda, pandas is often pre-installed.

✅ 3. Verify Installation

Open a Python shell or script and try importing pandas:

import pandas as pd
print(pd.__version__)

✅ 4. Write Your First pandas Code


Create a file like test_pandas.py and add:

import pandas as pd

# Create a simple DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
}
df = pd.DataFrame(data)

print(df)

Run it:

python test_pandas.py

✅ 5. Next Steps to Learn

 Load a CSV: pd.read_csv('yourfile.csv')
 Explore data: df.head(), df.describe(), df.info()
 Manipulate columns: df['Age'] + 1, df['Name'].str.upper()


Introduction to Pandas: Understanding Data Structures in Pandas (Series, DataFrames)

Pandas is one of the most popular libraries in Python for data manipulation and analysis. It
provides efficient and easy-to-use data structures for handling and analyzing structured data. The
primary data structures in Pandas are Series and DataFrame, each serving a unique purpose in
working with data.

1. Pandas Series

A Series is a one-dimensional labeled array that can hold any data type (integers, strings, floats,
etc.). It's similar to a list or an array but with additional functionality provided by Pandas, such as
labels (indices) that allow easy access to the data.

Key Features of Series:

 Homogeneous data: all elements in a Series share a single dtype (mixed inputs are upcast to a common type, often object).
 Indexing: a Series carries an implicit (default integer) or explicit (labeled) index.
 Flexible contents: it can store integers, floats, strings, datetimes, and other data types.
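The dtype behavior described above can be sketched in a couple of lines (a minimal example; the variable names are illustrative):

```python
import pandas as pd

# pandas infers one dtype for the whole Series
s_int = pd.Series([1, 2, 3])        # all integers -> int64
s_mixed = pd.Series([1, 'a', 3.0])  # mixed types fall back to the generic 'object' dtype

print(s_int.dtype)    # int64
print(s_mixed.dtype)  # object
```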

Creating a Series:

import pandas as pd

# Creating a Series with a list of integers
data = [1, 2, 3, 4]
series = pd.Series(data)
print(series)

Output:

0 1
1 2
2 3
3 4
dtype: int64

Creating a Series with Custom Index:

data = [10, 20, 30, 40]
index = ['a', 'b', 'c', 'd']
series = pd.Series(data, index=index)
print(series)

Output:

a 10
b 20
c 30
d 40
dtype: int64

2. Pandas DataFrame

A DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns). It is essentially a collection of Series: each column is a Series, and different columns can hold different types of data.

Key Features of DataFrame:

 Two-dimensional: it has both rows and columns.
 Heterogeneous data: different columns can hold different data types (e.g., one column can hold integers while another holds strings).
 Indexing: like a Series, a DataFrame has labels on both axes (a row index and column names).

Creating a DataFrame:

import pandas as pd

# Creating a DataFrame from a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
}

df = pd.DataFrame(data)
print(df)

Output:

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
3    David   40      Houston

Creating a DataFrame from a List of Lists (with Column Names):

data = [
    ['Alice', 25, 'New York'],
    ['Bob', 30, 'Los Angeles'],
    ['Charlie', 35, 'Chicago'],
    ['David', 40, 'Houston']
]

df = pd.DataFrame(data, columns=['Name', 'Age', 'City'])
print(df)

Output:

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
3    David   40      Houston

Accessing Data in a DataFrame:

 Accessing a column:

print(df['Name'])

Output:

0 Alice
1 Bob
2 Charlie
3 David
Name: Name, dtype: object
 Accessing rows (using .iloc[] or .loc[]):

print(df.iloc[1])  # Accessing the second row (position 1)

Output:

Name Bob
Age 30
City Los Angeles
Name: 1, dtype: object
print(df.loc[1])  # Accessing the second row (index label 1)

Output:

Name Bob
Age 30
City Los Angeles
Name: 1, dtype: object

Key Differences between Series and DataFrame:

Feature    Series                                    DataFrame
Structure  1D array with an index                    2D table (rows and columns)
Data       Homogeneous (all values share one type)   Heterogeneous (columns can have different types)
Indexing   Single axis (index)                       Two axes (rows and columns)
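The relationship between the two structures can be demonstrated directly: selecting one column of a DataFrame returns a Series that shares the DataFrame's row index (a small sketch with illustrative data):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Selecting a single column yields a Series, not a one-column DataFrame
col = df['Age']

print(type(col))                   # a pandas Series
print(col.index.equals(df.index))  # True: the Series keeps the DataFrame's row labels
```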

Conclusion:

 Series is a simple, one-dimensional labeled array, useful for handling a single column of data.
 DataFrame is a powerful, two-dimensional structure that can handle multiple rows and columns, and is better suited to tabular data.

These data structures, Series and DataFrame, are the foundation of data manipulation and
analysis with Pandas, allowing you to efficiently work with real-world datasets.

Data Importing and Exporting in Pandas

Pandas makes it very easy to read data from different file formats and save it back to various
formats. Whether your data is stored in a CSV, Excel, JSON, or other formats, Pandas provides
efficient methods to import, manipulate, and export data.

Here's an overview of how to read and write data using Pandas.


1. Reading Data from Files

1.1 Reading from CSV Files

CSV (Comma Separated Values) is one of the most common formats for storing tabular data.
Pandas provides the read_csv() function to load data from CSV files.

Example:

import pandas as pd

# Reading from a CSV file
df = pd.read_csv('file.csv')

# Displaying the first 5 rows
print(df.head())

Common Parameters:

 filepath_or_buffer: the path to the CSV file.
 sep: the delimiter (default is ',', but it can be changed for other formats).
 header: row number(s) to use as column names (default is 0).
 index_col: column to set as the index.
 usecols: a subset of columns to read.
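To see these parameters in action without a file on disk, a CSV can be read from an in-memory string (the delimiter, column names, and values below are made up for illustration):

```python
import pandas as pd
from io import StringIO

# Hypothetical semicolon-delimited CSV text standing in for 'file.csv'
csv_text = "id;name;age\n1;Alice;25\n2;Bob;30\n"

# sep selects the delimiter, index_col promotes 'id' to the index,
# and usecols restricts parsing to the listed columns
df = pd.read_csv(StringIO(csv_text), sep=';', index_col='id', usecols=['id', 'name'])
print(df)
```

Note that the 'age' column is never parsed at all, which can save memory on wide files.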

1.2 Reading from Excel Files

Excel files are another common format for data storage. Pandas provides the read_excel()
function for reading Excel files. You will need the openpyxl or xlrd library installed to handle
Excel files.

Example:

import pandas as pd

# Reading from an Excel file (single sheet)
df = pd.read_excel('file.xlsx')

# Displaying the first 5 rows
print(df.head())

Common Parameters:

 sheet_name: name of the sheet to read (defaults to the first sheet).
 usecols: columns to read.
 header: row number(s) to use as column names.

1.3 Reading from JSON Files

JSON (JavaScript Object Notation) is a lightweight data interchange format. You can use the
read_json() function to load data from a JSON file.

Example:

import pandas as pd

# Reading from a JSON file
df = pd.read_json('file.json')

# Displaying the first 5 rows
print(df.head())

Common Parameters:

 orient: string indicating the expected format of the JSON data.
 typ: type of object to return ('frame' for DataFrame, 'series' for Series).

1.4 Reading from SQL Databases

Pandas can read data directly from SQL databases using the read_sql() function. You'll need a
connection to your database, and Pandas will execute a query and return the result as a
DataFrame.

Example:

import pandas as pd
import sqlite3

# Create a database connection
conn = sqlite3.connect('database.db')

# Reading data from an SQL query
df = pd.read_sql('SELECT * FROM table_name', conn)

# Displaying the first 5 rows
print(df.head())

Common Parameters:

 sql: the SQL query or table name to read from.
 con: the database connection object.

2. Writing Data to Files

Pandas also makes it easy to save your DataFrame to different file formats, such as CSV, Excel,
JSON, and more.

2.1 Writing to CSV Files

To export a DataFrame to a CSV file, you can use the to_csv() method.

Example:

import pandas as pd

# Writing data to a CSV file
df.to_csv('output.csv', index=False)

Common Parameters:

 path_or_buf: the file path to write to.
 sep: the delimiter (default is ',').
 index: whether to write the row index (default is True).
 header: whether to write column names (default is True).

2.2 Writing to Excel Files

To export a DataFrame to an Excel file, you can use the to_excel() method. You need the
openpyxl library installed for .xlsx files.

Example:

import pandas as pd

# Writing to an Excel file (single sheet)
df.to_excel('output.xlsx', index=False)

# Writing to an Excel file with multiple sheets
with pd.ExcelWriter('output_multiple_sheets.xlsx') as writer:
    df.to_excel(writer, sheet_name='Sheet1')
    df.to_excel(writer, sheet_name='Sheet2')

Common Parameters:

 excel_writer: the path or an ExcelWriter object to save the file.
 index: whether to write the row index (default is True).
 sheet_name: name of the sheet to write to.

2.3 Writing to JSON Files

To save a DataFrame to a JSON file, you can use the to_json() method.

Example:

import pandas as pd

# Writing to a JSON file
df.to_json('output.json', orient='records', lines=True)

Common Parameters:

 orient: determines the format of the JSON data (options include 'split', 'records', 'index', 'columns').
 lines: whether to write each record on a separate line (default is False; requires orient='records').
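The effect of orient can be seen by serializing a small DataFrame to a string in memory (a sketch; no file is written and the data is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# 'records' yields a list of row objects; 'split' separates columns, index, and data
json_records = df.to_json(orient='records')
json_split = df.to_json(orient='split')

print(json_records)
print(json_split)
```

The 'records' form is usually the easiest to consume from other tools, since each element is a self-contained row.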

2.4 Writing to SQL Databases

You can write data from a DataFrame to an SQL database using the to_sql() method. It
requires a connection object to the database.

Example:

import pandas as pd
import sqlite3

# Create a database connection
conn = sqlite3.connect('database.db')

# Writing data to an SQL table
df.to_sql('table_name', conn, if_exists='replace', index=False)

Common Parameters:

 name: the name of the table to write to.
 con: the database connection object.
 if_exists: what to do if the table already exists ('fail', 'replace', 'append').
 index: whether to write the row index (default is True).
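A complete write-then-read round trip can be sketched with an in-memory SQLite database, so nothing touches disk (the table name 'people' and the data are made up for this example):

```python
import pandas as pd
import sqlite3

# In-memory SQLite database: nothing is written to disk
conn = sqlite3.connect(':memory:')

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Write the DataFrame to a table, replacing it if it already exists
df.to_sql('people', conn, if_exists='replace', index=False)

# Read it back with a query
df_back = pd.read_sql('SELECT * FROM people', conn)
print(df_back)

conn.close()
```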

Summary of Common File Formats and Functions:

File Format Function to Read Function to Write

CSV pd.read_csv() df.to_csv()

Excel pd.read_excel() df.to_excel()

JSON pd.read_json() df.to_json()

SQL pd.read_sql() df.to_sql()

Conclusion:

Pandas provides a versatile set of functions to read data from a variety of file formats (CSV,
Excel, JSON, SQL) and export data back to these formats. This ability to handle different data
sources seamlessly is one of the key strengths of Pandas in data analysis and manipulation tasks.

Data Cleaning in Pandas: Handling Missing Data, Removing Duplicates, and Data Filtering

Data cleaning is an essential step in the data analysis process. Raw data often contains missing or
duplicate values, as well as other inconsistencies that can skew analysis. Pandas provides a
variety of functions to handle these issues, enabling effective data cleaning and preparation.

In this section, we'll cover:

1. Handling Missing Data
2. Removing Duplicates
3. Data Filtering

1. Handling Missing Data

Missing data can occur due to various reasons (e.g., not recorded, data entry errors). Pandas
provides several methods to handle missing values (NaN), allowing you to either fill them with
certain values or drop them entirely.

1.1 Detecting Missing Data

You can detect missing data in a DataFrame using isnull() or notnull() methods.

import pandas as pd

# Sample DataFrame with missing data
data = {'Name': ['Alice', 'Bob', 'Charlie', None],
        'Age': [25, None, 35, 40],
        'City': ['New York', 'Los Angeles', None, 'Houston']}

df = pd.DataFrame(data)

# Check for missing values
print(df.isnull())  # True where data is missing

Output:

    Name    Age   City
0  False  False  False
1  False   True  False
2  False  False   True
3   True  False  False

1.2 Dropping Missing Data

You can drop rows or columns with missing values using the dropna() method.

 Drop rows with any missing data:

df_cleaned = df.dropna()  # Drops any row with NaN values
print(df_cleaned)

 Drop rows where specific columns have missing data:

df_cleaned = df.dropna(subset=['Age'])  # Drops rows where 'Age' is missing
print(df_cleaned)

 Drop columns with any missing data:

df_cleaned = df.dropna(axis=1)  # Drops columns with NaN values
print(df_cleaned)

1.3 Filling Missing Data

You can fill missing data with a specific value using fillna().

 Fill missing values with a constant value:

df_filled = df.fillna(value={'Age': 30, 'City': 'Unknown'})
print(df_filled)

 Fill missing values using forward-fill or backward-fill:

df_filled = df.ffill()  # Forward fill (propagates the previous value; replaces the deprecated fillna(method='ffill'))
df_filled = df.bfill()  # Backward fill (propagates the next value)
print(df_filled)

 Fill missing values with the mean, median, or mode:

df['Age'] = df['Age'].fillna(df['Age'].mean())  # Fill with the mean of 'Age'
print(df)

2. Removing Duplicates

Duplicate data can arise from data entry errors or merging data from multiple sources. Pandas
offers a simple way to identify and remove duplicates from your DataFrame.

2.1 Identifying Duplicates

You can detect duplicate rows using the duplicated() method, which returns a boolean Series
indicating whether each row is a duplicate.

# Sample DataFrame with duplicate rows
data = {'Name': ['Alice', 'Bob', 'Alice', 'Charlie'],
        'Age': [25, 30, 25, 35],
        'City': ['New York', 'Los Angeles', 'New York', 'Chicago']}

df = pd.DataFrame(data)

# Check for duplicates
print(df.duplicated())  # True if the row duplicates an earlier one

Output:

0 False
1 False
2 True
3 False
dtype: bool

2.2 Removing Duplicates

You can remove duplicates from your DataFrame using the drop_duplicates() method.

 Remove all duplicates:

df_unique = df.drop_duplicates()  # Removes all duplicate rows
print(df_unique)

 Remove duplicates based on specific columns:

df_unique = df.drop_duplicates(subset=['Name'])  # Remove rows with a duplicate 'Name'
print(df_unique)

 Keep the last occurrence of the duplicates:

df_unique = df.drop_duplicates(keep='last')  # Keeps the last occurrence of each duplicate
print(df_unique)

3. Data Filtering

Data filtering allows you to select rows from a DataFrame based on certain conditions.

3.1 Filtering by Condition

You can filter rows based on conditions using boolean indexing.

 Filter rows where 'Age' is greater than 30:

df_filtered = df[df['Age'] > 30]
print(df_filtered)

 Filter rows where 'City' is 'New York':

df_filtered = df[df['City'] == 'New York']
print(df_filtered)

 Combine conditions with & (AND) or | (OR):

df_filtered = df[(df['Age'] > 25) & (df['City'] == 'New York')]
print(df_filtered)

 Use isin() to filter for specific values:

df_filtered = df[df['City'].isin(['New York', 'Chicago'])]
print(df_filtered)

3.2 Filtering by String Methods

Pandas provides string methods, available through the .str accessor, to filter rows based on string patterns (such as contains, startswith, and endswith).

 Filter rows where 'City' contains 'New':

df_filtered = df[df['City'].str.contains('New')]
print(df_filtered)

 Filter rows where 'Name' starts with 'A':

df_filtered = df[df['Name'].str.startswith('A')]
print(df_filtered)

Summary of Key Functions:

Task                 Function
Detect Missing Data  isnull(), notnull()
Drop Missing Data    dropna()
Fill Missing Data    fillna()
Detect Duplicates    duplicated()
Remove Duplicates    drop_duplicates()
Filter by Condition  Boolean indexing (df[condition])
Filter by String     str.contains(), str.startswith()

Conclusion:

Data cleaning in Pandas is a crucial step in preparing data for analysis. By handling missing data,
removing duplicates, and applying filters, you can ensure that your dataset is accurate,
consistent, and ready for further analysis. Pandas provides efficient and flexible tools for each of
these tasks, making data cleaning fast and straightforward.

Data Manipulation in Pandas: Sorting, Indexing, Grouping, Merging, and Concatenating DataFrames

Pandas is powerful when it comes to manipulating data. Whether you need to sort your data,
group it based on certain columns, merge data from different sources, or concatenate multiple
datasets, Pandas offers various methods that make these tasks easy and efficient.

In this guide, we’ll go over the following topics:

1. Sorting Data
2. Indexing Data
3. Grouping Data
4. Merging DataFrames
5. Concatenating DataFrames

1. Sorting Data

Sorting data is essential to analyze patterns or prepare data for visualizations or reporting.

1.1 Sorting by Index

You can sort a DataFrame by its index using the sort_index() method.

 Sort by index (ascending):

import pandas as pd

# Sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40]}

df = pd.DataFrame(data, index=['a', 'd', 'c', 'b'])

# Sorting by index
df_sorted = df.sort_index(ascending=True)
print(df_sorted)

Output:

Name Age
a Alice 25
b David 40
c Charlie 35
d Bob 30

1.2 Sorting by Column(s)

You can sort by one or more columns using the sort_values() method.

 Sort by a single column (ascending):

df_sorted = df.sort_values(by='Age', ascending=True)
print(df_sorted)

Output:

Name Age
a Alice 25
d Bob 30
c Charlie 35
b David 40

 Sort by multiple columns:

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 25, 40],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']}

df = pd.DataFrame(data)

# Sort by 'Age' (ascending), then by 'Name' (descending)
df_sorted = df.sort_values(by=['Age', 'Name'], ascending=[True, False])
print(df_sorted)

Output:

      Name  Age         City
2  Charlie   25      Chicago
0    Alice   25     New York
1      Bob   30  Los Angeles
3    David   40      Houston

2. Indexing Data

Indexing refers to the ability to access rows or columns based on their labels or positions.

2.1 Accessing Columns

You can access columns directly as attributes or by using the column name.

 Access a column by name:

print(df['Name'])  # Access the 'Name' column

 Access a column as an attribute (works when the column name is a valid Python identifier):

print(df.Name)

2.2 Accessing Rows by Label (loc)

Use .loc[] to access rows by their index label.

 Access a row by index label:

print(df.loc[0])  # Access the first row (label 0)

2.3 Accessing Rows by Position (iloc)

Use .iloc[] to access rows by their position (integer-based indexing).

 Access a row by integer position:

print(df.iloc[1])  # Access the second row (position 1)

 Access multiple rows (range):

print(df.iloc[1:3])  # Access rows at positions 1 and 2

2.4 Setting a New Index

You can set a column as the index of the DataFrame using the set_index() method.

 Set the 'Name' column as the index:

df_indexed = df.set_index('Name')
print(df_indexed)

Output:

Age City
Name
Alice 25 New York
Bob 30 Los Angeles
Charlie 25 Chicago
David 40 Houston

2.5 Resetting the Index

To reset the index, you can use the reset_index() method.

df_reset = df_indexed.reset_index()
print(df_reset)

3. Grouping Data

Grouping is useful for performing aggregation operations like sum, mean, count, etc., based on
some criteria.

3.1 Basic Grouping

You can use the groupby() method to group data based on one or more columns.

 Group by 'Age' and calculate the mean of each group (select a numeric column to aggregate, since recent pandas versions raise an error when averaging string columns like 'Name' and 'City'):

df_grouped = df.groupby('Age')['Age'].mean()
print(df_grouped)

Output:

Age
25    25.0
30    30.0
35    35.0
40    40.0
Name: Age, dtype: float64

3.2 Multiple Aggregations

You can apply multiple aggregation functions on the grouped data.

 Group by 'Age' and apply multiple aggregation functions:

df_grouped = df.groupby('Age').agg({'City': 'first', 'Name': 'count'})
print(df_grouped)

Output:

            City  Name
Age
25      New York     2
30   Los Angeles     1
40       Houston     1

3.3 Using groupby() with Multiple Columns

You can group by multiple columns to create subgroups.

 Group by 'Age' and 'City':

df_grouped = df.groupby(['Age', 'City']).size()
print(df_grouped)

Output:

Age  City
25   Chicago        1
     New York       1
30   Los Angeles    1
40   Houston        1
dtype: int64

4. Merging DataFrames

Merging DataFrames is a common operation when you want to combine data from multiple
sources. Pandas provides the merge() function for this purpose, similar to SQL joins.

4.1 Merge on a Single Key

You can merge DataFrames using the merge() function by specifying a common column (key).

 Merge two DataFrames on a common column (e.g., 'Name'):

df1 = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                    'Age': [25, 30, 35]})

df2 = pd.DataFrame({'Name': ['Alice', 'Bob', 'David'],
                    'City': ['New York', 'Los Angeles', 'Houston']})

df_merged = pd.merge(df1, df2, on='Name')
print(df_merged)

Output:
    Name  Age         City
0  Alice   25     New York
1    Bob   30  Los Angeles

4.2 Merge with Different Keys

When the key columns have different names in the two DataFrames, spell them out with left_on and right_on (here both columns happen to be called 'Name', so this is equivalent to on='Name').

df_merged = pd.merge(df1, df2, left_on='Name', right_on='Name', how='inner')

4.3 Types of Joins

You can specify the type of join using the how parameter:

 'left': left join (keep every key from the left DataFrame)
 'right': right join (keep every key from the right DataFrame)
 'outer': outer join (union of the keys)
 'inner': inner join (intersection of the keys; the default)
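The difference between the join types can be seen with the two DataFrames from the earlier merge example (reproduced here so the snippet is self-contained):

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                    'Age': [25, 30, 35]})
df2 = pd.DataFrame({'Name': ['Alice', 'Bob', 'David'],
                    'City': ['New York', 'Los Angeles', 'Houston']})

# Inner join: only names present in both frames survive
inner = pd.merge(df1, df2, on='Name', how='inner')

# Outer join: every name is kept; missing values become NaN
outer = pd.merge(df1, df2, on='Name', how='outer')

print(len(inner))  # 2 rows (Alice, Bob)
print(len(outer))  # 4 rows (Alice, Bob, Charlie, David)
```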

5. Concatenating DataFrames

You can concatenate multiple DataFrames along a particular axis (row-wise or column-wise) using the concat() function.

5.1 Concatenate Row-wise

To stack DataFrames vertically (add more rows):

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Age': [35, 40]})

df_concat = pd.concat([df1, df2], axis=0, ignore_index=True)
print(df_concat)

Output:

Name Age
0 Alice 25
1 Bob 30
2 Charlie 35
3 David 40

5.2 Concatenate Column-wise

To concatenate DataFrames horizontally (add more columns):

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df2 = pd.DataFrame({'City': ['New York', 'Los Angeles']})

df_concat = pd.concat([df1, df2], axis=1)
print(df_concat)

Output:

    Name  Age         City
0  Alice   25     New York
1    Bob   30  Los Angeles

Summary of Key Functions:

Operation Function
Sorting sort_index(), sort_values()
Indexing loc[], iloc[], set_index()
Grouping groupby(), agg(), size()
Merging DataFrames merge()
Concatenating DataFrames concat()

Conclusion:

Pandas provides a variety of powerful tools to manipulate and transform data. Sorting, indexing,
grouping, merging, and concatenating DataFrames are common tasks that allow you to clean,
organize, and analyze data more effectively. With these tools, you can perform complex data
operations in just a few lines of code, making Pandas an essential library for data manipulation in
Python.

Working with Dates and Times in Pandas

In Pandas, working with dates and times is made easy with the built-in datetime functionality, which includes converting strings to datetime objects, extracting components such as year, month, and day, and performing arithmetic on date and time data.

1. Converting Strings to Dates/Times

To convert strings to datetime objects, you can use the pd.to_datetime() function. This will
automatically recognize the date and time format.

import pandas as pd

# Example string date
date_str = '2025-03-11 14:25:30'

# Convert to a datetime object
date_time = pd.to_datetime(date_str)
print(date_time)

2. Extracting Date and Time Components

Once you have a datetime object, you can easily extract individual components like the year,
month, day, etc.

# Assuming date_time is a pandas datetime object
print(date_time.year)    # Year
print(date_time.month)   # Month
print(date_time.day)     # Day
print(date_time.hour)    # Hour
print(date_time.minute)  # Minute
print(date_time.second)  # Second

3. Handling Series of Dates and Times

When working with a DataFrame or Series that contains date and time data, you can apply the
same functions across the column. For example:

data = {'date': ['2025-03-01 10:00:00', '2025-03-02 12:30:00', '2025-03-03 14:45:00']}
df = pd.DataFrame(data)

# Convert to datetime
df['date'] = pd.to_datetime(df['date'])

# Extract year, month, day, and hour from the datetime column via the .dt accessor
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['day'] = df['date'].dt.day
df['hour'] = df['date'].dt.hour

print(df)

4. Adding or Subtracting Time

You can use the timedelta class to add or subtract time from a datetime object. For example,
adding one week to a date:

from datetime import timedelta

# Add 7 days to the date
new_date = date_time + timedelta(days=7)
print(new_date)

5. Date and Time Arithmetic

You can perform date and time arithmetic to find the difference between two datetime objects.
The result will be a Timedelta object.

date1 = pd.to_datetime('2025-03-01')
date2 = pd.to_datetime('2025-03-11')
# Find difference
difference = date2 - date1
print(difference) # Output: 10 days

6. Handling Time Zones

You can also handle time zones by using the tz_localize() and tz_convert() functions.

# Localize to a specific timezone
date_time = pd.to_datetime('2025-03-11 14:25:30')
date_time = date_time.tz_localize('UTC')

# Convert to another time zone
date_time_ny = date_time.tz_convert('US/Eastern')
print(date_time_ny)

7. Formatting Dates/Times

To format a datetime object as a string, use the strftime() method. This allows you to
customize the output format.

# Format datetime as a string
formatted_date = date_time.strftime('%Y-%m-%d %H:%M:%S')
print(formatted_date)

8. Handling Missing Dates

If a date or time value is missing, Pandas uses NaT (Not a Time) to represent it, similar to how
NaN works for numerical values.

# Handle missing dates
df['date'] = pd.to_datetime(df['date'], errors='coerce')  # Invalid dates become NaT

9. Working with Date Ranges

Pandas also provides functionality to create a range of dates with date_range().

# Generate a range of dates
date_range = pd.date_range(start='2025-03-01', end='2025-03-10', freq='D')
print(date_range)

This covers the basic and advanced operations you will commonly need when working with dates and times in Pandas.