UNIT 5:
REAL TIME DATA ANALYSIS
VISHNU PRIYA P M | V BCA 1
TIME SERIES
Time series analysis is a crucial part of data analysis, and Python provides several tools and
libraries for working with time series data.
Time series analysis involves analyzing data points that are collected or recorded
over time at regular intervals, such as daily stock prices, hourly weather data, or
yearly sales figures. The goal is to understand patterns and trends, or to make
predictions based on the historical data.
What is Time Series Data?
•Time series data is data that is recorded over time, often at regular intervals (e.g., daily,
weekly, monthly).
•Examples include:
• Stock prices recorded every day.
• Temperature readings every hour.
• Sales numbers each month.
Why Analyze Time Series Data?
•To identify trends (e.g., a steady rise in sales over years).
•To detect seasonal patterns (e.g., sales peaking every holiday season).
Date and Time Data Types and Tools:
Datetime Module:
Python's standard library includes the datetime module, which provides classes for working
with dates and times. You can use the datetime class to represent both date and time
information.
from datetime import datetime
now = datetime.now() # Current date and time
print(now)
Date and Time Objects:
The datetime module includes date and time classes that allow you to work with just the date
or time portion.
from datetime import date, time
my_date = date(2023, 10, 24)
my_time = time(15, 30)
String to Datetime Conversion:
You can convert a string representing a date and time to a datetime object using the strptime
method.
date_str = "2023-10-24"
datetime_obj = datetime.strptime(date_str, "%Y-%m-%d")
Datetime to String Conversion:
To convert a datetime object back to a string, you can use the strftime method.
formatted_date = datetime_obj.strftime("%Y-%m-%d")
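As a small illustration of the round trip between strings and datetime objects, the sketch below parses a day-first string and writes it back in year-first order. The input string and the variable names are made up for this example.

```python
from datetime import datetime

# Parse a day-first string; the format codes must match the text exactly
stamp = datetime.strptime("24/10/2023 15:30", "%d/%m/%Y %H:%M")

# Write the same moment back out in year-first order
iso_text = stamp.strftime("%Y-%m-%d %H:%M")
print(iso_text)
```

If the string does not match the format codes, strptime raises a ValueError, so the format has to describe the input exactly.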
TIME SERIES BASICS:
Time series data consists of data points collected or recorded at successive, equally
spaced time intervals. Common examples of time series data include stock prices,
temperature measurements, and website traffic. Here are some fundamental concepts
and tools for working with time series data in Python:
Pandas: The Pandas library is a powerful tool for working with time series data. It
provides data structures like DataFrame and Series that are ideal for organizing and
analyzing time series data.
import pandas as pd
time_series_data = pd.Series([10, 20, 30, 40], index=pd.date_range(start='2023-10-01', periods=4, freq='D'))
Time Resampling:
You can resample time series data to change the frequency of data points (e.g., from
daily to monthly) using Pandas' resample method.
monthly_data = time_series_data.resample('M').mean() # 'M' = month end ('ME' in pandas 2.2+)
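To make the effect of resampling concrete, here is a minimal sketch that averages two months of daily values into one value per month. The data is illustrative, and the 'MS' (month start) alias is a choice made for this example so that it runs the same way across pandas versions.

```python
import pandas as pd

# 61 daily values covering October and November 2023 (illustrative data)
daily = pd.Series(
    range(61),
    index=pd.date_range(start="2023-10-01", periods=61, freq="D"),
)

# 'MS' groups by calendar month and labels each group by its first day
monthly_mean = daily.resample("MS").mean()
print(monthly_mean)
```

Each output row summarises every daily value that falls in that month, so 61 rows collapse to 2.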
Plotting Time Series Data:
Libraries like Matplotlib and Seaborn can be used to visualize time series data.
import matplotlib.pyplot as plt
time_series_data.plot()
plt.xlabel("Date")
plt.ylabel("Value")
plt.show()
Time Series Analysis:
You can perform various time series analysis tasks, including trend analysis, seasonality
detection, and forecasting, using tools like Statsmodels and Scikit-learn.
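from statsmodels.tsa.seasonal import seasonal_decompose
# Needs at least two full seasonal cycles (e.g. 14 or more daily observations for a
# weekly cycle); the 4-point series above is too short and would raise a ValueError
decomposition = seasonal_decompose(time_series_data)
trend = decomposition.trend
seasonal = decomposition.seasonal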
The code above uses the seasonal_decompose function from the statsmodels library to decompose a
time series into its trend and seasonal components. This is a common technique in time series
analysis for separating the underlying trend and seasonal patterns from the original data.
seasonal_decompose(time_series_data): This function takes a time series as input and
decomposes it into three components: trend, seasonal, and residual (the remainder). It uses a
classical seasonal decomposition technique based on moving averages to achieve this.
decomposition = seasonal_decompose(time_series_data): The result of the decomposition is
stored in the decomposition object, which contains the trend, seasonal, and residual components.
trend = decomposition.trend: This line extracts the trend component from the decomposition and
stores it in the variable trend. The trend represents the long-term behavior or the underlying
pattern in the time series data.
seasonal = decomposition.seasonal: This line extracts the seasonal component from the
decomposition and stores it in the variable seasonal. The seasonal component represents the
periodic fluctuations, or seasonality, in the time series data.
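To show what the three components mean, the following sketch performs a simplified additive decomposition by hand using only pandas and NumPy. It is not the statsmodels implementation. The synthetic data, the 7-day period, and all variable names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic series: a linear trend plus a pattern that repeats every 7 days
idx = pd.date_range(start="2023-10-01", periods=28, freq="D")
weekly_pattern = np.tile([0, 1, 2, 3, 2, 1, 0], 4)
series = pd.Series(np.arange(28) * 0.5 + weekly_pattern, index=idx)

period = 7

# Trend: centered moving average over one full cycle (NaN at both edges)
trend = series.rolling(window=period, center=True).mean()

# Seasonal: average detrended value at each position in the cycle,
# re-centered so the seasonal effects average to zero
detrended = series - trend
position = np.arange(len(series)) % period
seasonal_means = detrended.groupby(position).mean()
seasonal_means = seasonal_means - seasonal_means.mean()
seasonal = pd.Series(seasonal_means.loc[position].to_numpy(), index=idx)

# Residual: whatever the trend and seasonal parts do not explain
residual = series - trend - seasonal
print(residual.dropna().abs().max())
```

Because the synthetic series is built from an exact trend and an exact weekly pattern, the residual is essentially zero wherever the trend is defined. Real data would leave noise in the residual.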
INDEXING AND SELECTION:
Selecting by Index:
You can access specific elements in a time series using index labels.
import pandas as pd
time_series_data = pd.Series([10, 20, 30, 40], index=pd.date_range(start='2023-10-01',
periods=4, freq='D'))
selected_data = time_series_data['2023-10-01']
Selecting by Slicing: Use slicing to select a range of data.
selected_range = time_series_data['2023-10-01':'2023-10-03']
Subsetting by Conditions:
You can subset time series data based on conditions using boolean indexing.
subset = time_series_data[time_series_data > 20]
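The three selection styles above can be compared side by side on the same four-point series. This is a minimal sketch; the variable names are illustrative, and .loc is used to make the label-based lookups explicit.

```python
import pandas as pd

ts = pd.Series(
    [10, 20, 30, 40],
    index=pd.date_range(start="2023-10-01", periods=4, freq="D"),
)

single = ts.loc["2023-10-01"]                # one value, looked up by label
window = ts.loc["2023-10-01":"2023-10-03"]   # label slices include both endpoints
large = ts[ts > 20]                          # boolean mask keeps matching rows

print(single, list(window), list(large))
```

Note that a date slice includes its end label, unlike ordinary Python list slicing.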
Date Ranges and Frequencies:
Date Ranges: You can create date ranges using Pandas' date_range function. This is useful for
generating date index for time series data.
date_range = pd.date_range(start='2023-10-01', end='2023-10-10', freq='D')
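Frequencies: You can specify various frequencies when creating date ranges. Common
frequencies include 'D' (day), 'H' (hour), 'M' (month end), and more. In pandas 2.2 and later,
the hourly and month-end aliases are spelled 'h' and 'ME', and the older spellings are deprecated.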
hourly_range = pd.date_range(start='2023-10-01', periods=24, freq='H')
monthly_range = pd.date_range(start='2023-01-01', end='2023-12-31', freq='M')
Shifting Data:
Shifting is a common operation when working with time series data, often used for calculating
differences or creating lag features.
Shift Data: You can shift the data forward or backward using the shift method.
shifted_data = time_series_data.shift(periods=1) # Shift data one period forward
Calculating Differences:
To compute the difference between consecutive values, you can subtract the shifted series.
SHIFTING DATA:
Shifting data involves moving time series data forward or backward in time. This is useful for
various time series analysis tasks.
Shift Data: You can shift data using Pandas' shift method.
import pandas as pd
# Shifting data one period forward
shifted_data = time_series_data.shift(periods=1)
# Shifting data two periods backward
shifted_data = time_series_data.shift(periods=-2)
Calculating Differences: Shifting is often used to calculate the differences between consecutive
values.
# Calculate the difference between consecutive values
diff = time_series_data - time_series_data.shift(periods=1)
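A short sketch of what shifting does to the values may help here. It uses the same four-point series as earlier, with illustrative variable names, and compares the manual subtraction with pandas' built-in diff method.

```python
import pandas as pd

ts = pd.Series(
    [10, 20, 30, 40],
    index=pd.date_range(start="2023-10-01", periods=4, freq="D"),
)

lagged = ts.shift(periods=1)   # values move one step later; the first slot becomes NaN
manual_diff = ts - lagged      # change from the previous day
builtin_diff = ts.diff()       # pandas shortcut for the same subtraction

print(manual_diff)
```

The first difference is NaN because there is no earlier value to subtract from the first observation.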
Generating Date Ranges and Frequencies:
Pandas provides powerful tools for generating date ranges with different frequencies.
Date Ranges: Use the date_range function to create date ranges.
date_range = pd.date_range(start='2023-01-01', end='2023-12-31', freq='D') # Daily frequency
Frequencies: You can specify various frequencies such as 'D' (day), 'H' (hour), 'M' (month end),
'Y' (year end), and more when creating date ranges.
hourly_range = pd.date_range(start='2023-01-01', periods=24, freq='H')
monthly_range = pd.date_range(start='2023-01-01', end='2023-12-31', freq='M')
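The length of a date range can be fixed either by an end date or by a number of periods. The sketch below shows both. The 'MS' (month start) alias is an assumption chosen for this example because its spelling is the same across pandas versions.

```python
import pandas as pd

# Length determined by the end date: every day of 2023
by_end = pd.date_range(start="2023-01-01", end="2023-12-31", freq="D")

# Length determined by a count: the first day of 12 consecutive months
by_count = pd.date_range(start="2023-01-01", periods=12, freq="MS")

print(len(by_end), by_count[-1])
```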
Time Zone Handling:
Pandas can handle time zones and convert between them.
Setting Time Zone:
time_series_data = time_series_data.tz_localize('UTC') # Set time zone to UTC
Converting Time Zones:
time_series_data = time_series_data.tz_convert('US/Eastern') # Convert to US Eastern Time
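The difference between localizing and converting is easiest to see on a small example. In this sketch (illustrative data and names), tz_localize attaches a zone to timestamps that had none, while tz_convert re-expresses the same instants in another zone.

```python
import pandas as pd

ts = pd.Series(
    [10, 20],
    index=pd.date_range(start="2023-10-01", periods=2, freq="D"),
)

utc_ts = ts.tz_localize("UTC")                # attach a zone to naive timestamps
eastern_ts = utc_ts.tz_convert("US/Eastern")  # same instants, different wall-clock time

print(eastern_ts.index[0])
```

Midnight UTC on 1 October falls on the evening of 30 September in US Eastern Time, yet both index entries still refer to the same moment.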
Quarterly Period Frequencies:
Quarterly periods can be generated with the "Q" frequency code.
quarterly_range = pd.period_range(start='2023Q1', end='2023Q4', freq='Q')
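Each element of a period range stands for a whole span of time rather than a single instant. As a brief sketch with illustrative variable names, the first quarter can be asked for its start and end dates:

```python
import pandas as pd

quarters = pd.period_range(start="2023Q1", end="2023Q4", freq="Q")
first = quarters[0]

# start_time and end_time give the boundaries of the span the period covers
print(first, first.start_time.date(), first.end_time.date())
```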
TIME SERIES ANALYSIS
Time series analysis often involves various data manipulation tasks, including plotting, data
munging, splicing data from multiple sources, decile and quartile analysis, and more. Let's
explore these concepts and some sample applications in the context of time series analysis:
Time Series Plotting:
Plotting is crucial for visualizing time series data to identify patterns and trends.
import matplotlib.pyplot as plt
# Plot time series data
time_series_data.plot()
plt.xlabel("Date")
plt.ylabel("Value")
plt.title("Time Series Plot")
plt.show()
Data Munging:
Data munging involves cleaning, transforming, and preparing data for analysis. In time series
analysis, this might include handling missing values, resampling, or dealing with outliers.
# Handling missing values
time_series_data = time_series_data.ffill() # Forward fill (fillna(method='ffill') is deprecated)
# Resampling to a different frequency
resampled_data = time_series_data.resample('W').mean()
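The two munging steps above can be traced on a small series with gaps. This is a sketch on made-up data: two readings are deliberately set to missing, forward-filled, and then averaged by week.

```python
import numpy as np
import pandas as pd

# Two weeks of daily readings starting on a Monday (illustrative data)
idx = pd.date_range(start="2023-10-02", periods=14, freq="D")
raw = pd.Series(np.arange(14, dtype=float), index=idx)
raw.iloc[[3, 4]] = np.nan   # simulate two missing readings

filled = raw.ffill()        # carry the last known value forward
weekly = filled.resample("W").mean()   # 'W' bins end on Sunday by default

print(weekly)
```

Forward filling assumes the last observed value still holds, which suits slowly changing measurements but can hide real movement in volatile data.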
Splicing Together Data Sources:
In some cases, you may need to combine time series data from multiple sources.
import pandas as pd
# Concatenating data from multiple sources
combined_data = pd.concat([data_source1, data_source2])
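When two sources overlap in time, a plain concatenation leaves duplicate dates in the index. The sketch below shows one possible way to handle this; the two sources and the rule of preferring the first source are assumptions made for illustration.

```python
import pandas as pd

source_a = pd.Series(
    [10, 20, 30],
    index=pd.date_range(start="2023-10-01", periods=3, freq="D"),
)
source_b = pd.Series(
    [31, 40, 50],
    index=pd.date_range(start="2023-10-03", periods=3, freq="D"),
)

combined = pd.concat([source_a, source_b])

# 2023-10-03 appears in both sources; keep the value from the first one
combined = combined[~combined.index.duplicated(keep="first")].sort_index()

print(combined)
```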
Decile and Quartile Analysis:
Decile and quartile analysis helps you understand the distribution of data.
# Calculate quartiles
quartiles = time_series_data.quantile([0.25, 0.5, 0.75])
# Calculate deciles
deciles = time_series_data.quantile([i/10 for i in range(1, 10)])
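Beyond computing the cut points, it is often useful to label each observation with the quartile it falls into. The following sketch uses pd.qcut for that; qcut is not used in the text above and is brought in here as an illustration, on made-up values.

```python
import pandas as pd

values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8])

# Cut points at the 25th, 50th and 75th percentiles
quartiles = values.quantile([0.25, 0.5, 0.75])

# qcut assigns each observation to a quartile bucket numbered 0 to 3
bucket = pd.qcut(values, 4, labels=False)

print(quartiles)
print(list(bucket))
```

Passing 10 instead of 4 to qcut would give decile buckets in the same way.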
Sample Applications:
Stock Market Analysis: Analyzing stock price time series data for trends and predicting future
stock prices.
Temperature Forecasting: Analyzing historical temperature data to forecast future weather
conditions.
Demand Forecasting: Analyzing sales data to forecast future product demand.
Futures Contract Rolling:
In financial time series analysis, rolling futures contracts is crucial to avoid jumps in time series
data when contracts expire.
# Rolling futures contracts in a DataFrame
# contract_rolling_function is a placeholder for a user-defined helper, not a pandas function
rolled_data = contract_rolling_function(time_series_data, window=10)
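Since pandas has no built-in function for rolling contracts, the sketch below shows one simple way such a helper might look. Everything in it is hypothetical: the roll_contracts function, the prices, and the rule of switching contracts on a fixed roll date while shifting earlier prices by the gap between the two contracts on that date.

```python
import pandas as pd

# Made-up prices for an expiring contract and the next one, on business days
idx = pd.date_range(start="2023-10-02", periods=6, freq="B")
near = pd.Series([100.0, 101.0, 102.0, 103.0, 104.0, 105.0], index=idx)
far = pd.Series([110.0, 111.0, 112.0, 113.0, 114.0, 115.0], index=idx)
roll_date = idx[3]

def roll_contracts(near, far, roll_date):
    """Hypothetical helper: use `near` before roll_date and `far` from then on,
    shifting the earlier prices by the price gap observed on the roll date."""
    gap = far.loc[roll_date] - near.loc[roll_date]
    before = near.loc[near.index < roll_date] + gap
    after = far.loc[far.index >= roll_date]
    return pd.concat([before, after])

rolled = roll_contracts(near, far, roll_date)
print(rolled)
```

Without the gap adjustment, the spliced series would jump by 10 on the roll date even though neither contract's price moved by that much.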
Rolling Correlation and Linear Regression:
Rolling correlation and regression are used to understand how the relationship between two time
series changes over time.
# Calculate rolling correlation between two time series
rolling_corr = time_series_data1.rolling(window=30).corr(time_series_data2)
# Calculate rolling linear regression between two time series
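One way to compute a rolling linear regression with pandas alone is to use the fact that the least-squares slope of y on x equals cov(x, y) / var(x). The sketch below applies that within each window. The data, the window size, and the variable names are assumptions for illustration, and this is only one of several possible approaches.

```python
import pandas as pd

x = pd.Series([1.0, 3.0, 2.0, 5.0, 4.0, 7.0, 6.0, 9.0, 8.0, 10.0])
y = 2.0 * x + 1.0   # exact linear relationship, so the answer is known

window = 5

# Least-squares slope and intercept of y on x within each window
slope = x.rolling(window).cov(y) / x.rolling(window).var()
intercept = y.rolling(window).mean() - slope * x.rolling(window).mean()

print(slope.dropna().round(6).tolist())
```

The first window - 1 results are NaN because a full window of observations is not yet available.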
Data Munging:
Data munging is a more general term that encompasses various data preparation
tasks, including cleaning, structuring, and organizing data.
It often involves dealing with missing data, handling outliers, and addressing issues
like data format inconsistencies.
Data munging can also include tasks such as data loading, data extraction, and basic
data exploration.
It is a broader term that doesn't specify a particular methodology or approach.
Data Wrangling:
Data wrangling is a subset of data munging that specifically refers to the process of
cleaning, transforming, and structuring data for analysis.
Data wrangling typically involves tasks like filtering, aggregating, joining, and
reshaping data to create a dataset that is ready for analysis or machine learning.
It is often associated with data preparation in the context of data analysis and is more
focused on making data suitable for specific analytical tasks.