DMV Lab Manual
LABORATORY MANUAL
Subject: DMV
Exam Seat No.:
(2019 Course) [417522]
POs are statements that describe what students are expected to know and be able to do upon
graduating from the program. These relate to the skills, knowledge, analytical ability, attitude,
and behavior that students acquire through the program.
PO8: Ethics:
Graduates will be able to understand professional and ethical responsibility.
PO10: Communication:
Graduates will be able to communicate effectively and prepare effective documentation and
presentations.
PO11: Project Management and Finance:
Graduates will be able to apply and demonstrate engineering and management principles in
project management as a member or leader.
Course Objectives:
Course Outcomes:
CERTIFICATE
has completed all the practical work in the Computer Laboratory - I [417522]
Place:
Date:
INDEX
Department of Artificial Intelligence and Data Science Class: B.E.
Experiment No: 1
Analyzing Sales Data from Multiple File Formats
Class: BE Batch:
EXPERIMENT NO.1
Practical Title: Study Analyzing Sales Data from Multiple File Formats
Aim: To load, clean, transform, and analyze sales data from various file formats (CSV,
Excel, JSON) and to derive meaningful insights through data visualization.
Objective:
To Import sales data from CSV, Excel, and JSON file formats into appropriate data structures.
To Identify and address data quality issues such as missing values or inconsistencies.
To Perform data cleaning operations to standardize the dataset.
To Convert the data into a unified format and perform necessary transformations.
To Conduct descriptive analysis to understand sales metrics and trends.
To Create visualizations to represent and interpret the sales data.
Software Requirements:
Hardware Requirement:
PIV, 2GB RAM, 500 GB HDD.
Learning Objectives:
To learn to analyze sales data from multiple file formats.
Outcome:
1. Cleaned and Unified Dataset: A single, cleaned data frame containing the merged data from CSV, Excel, and JSON files.
2. Descriptive Analysis Results: Metrics such as total sales, average order value, and distribution by product category.
3. Visualizations: Graphical representations of sales trends, customer behavior, and product performance.
Theory:
Data Loading
CSV Files: CSV (Comma-Separated Values) files are text files where each line
represents a data record, and each record consists of fields separated by commas.
The pandas.read_csv() function in Python efficiently loads this data into a DataFrame.
Excel Files: Excel files are spreadsheet files that can store data in multiple
sheets. The pandas.read_excel() function allows loading data from specific
sheets or all sheets into DataFrames.
JSON Files: JSON (JavaScript Object Notation) files store data in a hierarchical format
that can represent complex data structures. The pandas.read_json() function loads this
data into a DataFrame by converting the nested JSON objects into tabular format.
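A minimal sketch of loading the three formats with pandas is shown below; the file names (sales.csv, sales.xlsx, sales.json) are placeholders, and reading .xlsx files additionally requires the openpyxl package.

import pandas as pd

csv_df = pd.read_csv('sales.csv')                      # comma-separated records
excel_df = pd.read_excel('sales.xlsx', sheet_name=0)   # first sheet of the workbook
json_df = pd.read_json('sales.json')                   # flat JSON records to a table
print(csv_df.head())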
Data Cleaning
Data Transformation
Merging Datasets: Combining data from different sources (e.g., CSV, Excel,
JSON) into a unified DataFrame is crucial for a comprehensive analysis. Functions
like merge() or concat() in pandas facilitate this.
Feature Engineering: Creating new features from existing data can enhance the
analysis. For instance, creating a "total revenue" column by multiplying quantity
and price.
Normalization: Standardizing the range of data values to improve the
performance of some algorithms. Techniques such as min-max scaling or z-score
normalization are used.
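The sketch below illustrates these three transformation steps on two small, made-up DataFrames; the column names (quantity, price) are assumptions.

import pandas as pd

df_a = pd.DataFrame({'order_id': [1, 2], 'quantity': [3, 1], 'price': [10.0, 25.0]})
df_b = pd.DataFrame({'order_id': [3], 'quantity': [2], 'price': [8.0]})

# Merge the sources into a unified DataFrame
sales = pd.concat([df_a, df_b], ignore_index=True)

# Feature engineering: total revenue per order
sales['total_revenue'] = sales['quantity'] * sales['price']

# Min-max normalization of the new column
rev = sales['total_revenue']
sales['revenue_scaled'] = (rev - rev.min()) / (rev.max() - rev.min())
print(sales)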
Data Analysis
Descriptive Statistics: Basic metrics like mean, median, mode, standard deviation,
and percentiles provide insights into the central tendency and dispersion of the data.
Aggregation: Summarizing data at different levels, such as total sales per region or
average order value per customer, helps in understanding broader patterns.
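To make both ideas concrete, a small pandas example on toy data (column names are made up):

import pandas as pd

sales = pd.DataFrame({'region': ['North', 'South', 'North', 'East'],
                      'revenue': [200.0, 150.0, 300.0, 250.0]})

# Descriptive statistics: mean, std, quartiles, min/max
print(sales['revenue'].describe())

# Aggregation: total and average revenue per region
print(sales.groupby('region')['revenue'].agg(['sum', 'mean']))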
Data Visualization
Bar Plots: Useful for comparing quantities across different categories. For example,
comparing total sales by product category.
Line Charts: Effective for showing trends over time, such as monthly sales growth.
Pie Charts: Useful for depicting the percentage distribution of categories, such as
sales distribution by region.
Box Plots: Useful for showing data distribution and identifying outliers.
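As a small illustration (toy data, assumed column names), a bar plot of total sales by category with Matplotlib might look like this:

import matplotlib.pyplot as plt
import pandas as pd

sales = pd.DataFrame({'category': ['Books', 'Toys', 'Games'],
                      'total_sales': [1200, 800, 950]})

plt.bar(sales['category'], sales['total_sales'])   # compare quantities across categories
plt.xlabel('Product category')
plt.ylabel('Total sales')
plt.title('Total Sales by Category')
plt.show()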
Algorithms
1. Outlier Detection:
o Z-Score: Measures how far a data point is from the mean, in terms of standard
deviations. Points with a z-score beyond a certain threshold are considered outliers.
o IQR (Interquartile Range): Outliers are identified by checking if data points
fall below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR, where Q1 and Q3 are the first
and third quartiles, respectively. (A short sketch of both methods follows this list.)
2. Clustering (for deeper insights):
o K-Means Clustering: Groups data points into clusters based on their
similarity. Useful for customer segmentation by purchasing behavior.
o Hierarchical Clustering: Builds a hierarchy of clusters, useful for
understanding the data's structure at different levels of granularity.
3. Regression Analysis (for trend analysis):
o Linear Regression: Analyzes the relationship between two variables, such
as sales and advertising spend, to predict future sales.
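A minimal sketch of the two outlier-detection rules from point 1, on a toy revenue column (the threshold of 2 standard deviations and the sample values are illustrative):

import pandas as pd

sales = pd.DataFrame({'revenue': [100, 110, 95, 105, 98, 500]})   # 500 is the planted outlier

# Z-score rule: flag points far from the mean in units of standard deviation
z = (sales['revenue'] - sales['revenue'].mean()) / sales['revenue'].std()
z_outliers = sales[z.abs() > 2]

# IQR rule: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = sales['revenue'].quantile(0.25), sales['revenue'].quantile(0.75)
iqr = q3 - q1
iqr_outliers = sales[(sales['revenue'] < q1 - 1.5 * iqr) | (sales['revenue'] > q3 + 1.5 * iqr)]
print(z_outliers)
print(iqr_outliers)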
Conclusion:
The practical successfully demonstrated the process of loading, cleaning, transforming, and
analyzing sales data from various file formats. By consolidating the data into a unified format,
performing descriptive analysis, and visualizing the results, we gained valuable insights into sales
trends and product performance. This process ensures that data-driven decisions can be made with
greater confidence.
Viva Questions:
1. What is sales data, and why is it important for businesses?
2. What were the objectives of your sales data analysis project?
3. What types of insights did you hope to derive from the analysis of the
sales data?
Program:
Experiment No: 2
Analyzing Weather Data from Open Weather Map API
Class: BE Batch:
EXPERIMENT NO.2
Practical Title: Analyzing Weather Data from Open Weather Map API
Aim:
To interact with the Open Weather Map API to retrieve and analyze weather data for a specific
location, and to model and visualize weather patterns over time.
Objectives:
• API Registration: Obtain an API key from OpenWeatherMap for accessing weather data.
• Data Retrieval: Use the API key to fetch weather data for a specific location.
• Data Extraction: Extract relevant weather attributes such as temperature, humidity, wind speed, and precipitation from the API response.
• Data Cleaning: Preprocess and clean the data, addressing missing values and inconsistencies.
• Data Modeling: Analyze weather patterns by calculating average temperatures, extreme values, and trends.
• Data Visualization: Create visualizations to represent temperature changes, precipitation levels, and wind speed variations.
• Data Aggregation: Summarize weather statistics by specific time periods such as daily, monthly, or seasonal.
• Geospatial Visualization: Incorporate geographical information to visualize weather patterns across different locations.
• Correlation Analysis: Explore relationships between weather attributes using correlation plots or heat maps.
Software:
o Python 3.x
o Libraries: requests, pandas, matplotlib, seaborn, geopandas (install using pip)
o Jupyter Notebook (optional, for interactive analysis)
Hardware:
o A computer with at least 4 GB of RAM
o Internet connection for API requests
o Enough storage to handle data files and visualizations
Expected Outcome:
Theory:
API Interaction
Overview: The OpenWeatherMap API provides weather data in JSON format. To access
this data, you need to send HTTP requests with your API key.
Steps:
1. Obtain API Key: Register on the OpenWeatherMap platform to get an API key.
2. Send HTTP Requests: Use libraries such as requests in Python to send GET
requests to the API endpoint.
3. Receive JSON Response: The API responds with weather data in JSON format.
4. Parse JSON Data: Use libraries like json in Python to parse the JSON response
and extract relevant information.
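A minimal sketch of steps 2-4, assuming a key has already been obtained (replace YOUR_API_KEY; the city name is an example). The field names follow the current-weather JSON response of the API.

import requests

url = "https://api.openweathermap.org/data/2.5/weather"
params = {"q": "Pune", "appid": "YOUR_API_KEY", "units": "metric"}

response = requests.get(url, params=params, timeout=10)   # step 2: send the GET request
data = response.json()                                     # steps 3-4: receive and parse JSON

temperature = data["main"]["temp"]
humidity = data["main"]["humidity"]
wind_speed = data["wind"]["speed"]
print(temperature, humidity, wind_speed)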
Data Cleaning
Data Modeling
Data Visualization
Geospatial Visualization
Correlation Analysis
Algorithms
1. Time-Series Analysis:
o Moving Average: Smooths out short-term fluctuations and highlights longer-term
trends.
df['Moving_Avg'] = df['Temperature'].rolling(window=7).mean()
2. Correlation Analysis:
o Pearson Correlation Coefficient: Measures linear correlation between two
variables.
correlation = df['Temperature'].corr(df['Humidity'])
3. Geospatial Analysis:
o K-Means Clustering: Groups data points into clusters based on geographical
location and weather attributes.
from sklearn.cluster import KMeans
import folium

# Column names below (lat, lon, temp) are placeholders for the fetched weather data
kmeans = KMeans(n_clusters=5).fit(df[['lat', 'lon', 'temp']])
weather_map = folium.Map(location=[latitude, longitude], zoom_start=12)  # base map centred on the location
This extended theory provides a comprehensive approach to analyzing weather data, incorporating
API interaction, data cleaning, modeling, visualization, and correlation analysis, along with the
relevant algorithms.
Conclusion:
The practical demonstrated how to interact with the Open Weather Map API to retrieve and
analyze weather data. By extracting and cleaning weather attributes, we were able to visualize
and interpret the current weather conditions. For a comprehensive analysis, historical data and
geospatial information would be needed to explore trends, aggregate statistics, and visualize
weather patterns across different locations. This approach provides valuable insights into
weather behavior and facilitates better decision-making based on weather data.
Viva Questions:
1. What is the Open Weather Map API, and how does it work?
2. What kind of data can you retrieve from the Open Weather Map API?
3. What insights did you gain from visualizing the weather data?
Program:
Experiment No: 3
Data Cleaning and Preparation
Class: BE Batch:
EXPERIMENT NO.3
Practical Title: Data Wrangling on Real Estate Market
Aim:
To perform data wrangling on a real estate market dataset to clean, preprocess, and prepare the data for analysis
or modeling, with a focus on understanding factors influencing housing prices.
Objectives:
• To Import Dataset: Load the "RealEstate_Prices.csv" dataset and clean column names.
• To Handle Missing Values: Address missing values using appropriate strategies.
• To Merge Data: Merge with additional datasets if available to enrich the dataset.
• To Filter and Subset Data: Extract relevant data based on specific criteria such as time period,
property type, or location.
• To Encode Categorical Variables: Transform categorical variables for analysis.
• To Aggregate Data: Compute summary statistics or derived metrics, such as average sale prices by
neighborhood or property type.
• To Identify and Handle Outliers: Detect and manage outliers or extreme values that may affect the
analysis.
Software:
o Python 3.x
o Libraries: pandas, numpy, matplotlib, seaborn (install using pip)
o Jupyter Notebook (optional, for an interactive environment)
Hardware:
o A computer with at least 4 GB of RAM
o Enough storage to handle data files and intermediate results
Expected Outcome:
Cleaned Dataset: A well-prepared dataset with clean column names, handled missing values, and
appropriate data types.
Merged Data: Enhanced dataset with additional information from other relevant datasets.
Filtered Data: Subset of data based on criteria like time period, property type, or location.
Encoded Variables: Categorical variables encoded for analysis.
Aggregated Metrics: Summary statistics, such as average sale prices by neighborhood or property type.
Outlier Management: Identified and managed outliers to improve data quality.
Theory:
Data Wrangling is the process of cleaning, structuring, and enriching raw data into a suitable format for
analysis. It involves several steps to ensure data quality and usability:
1. Importing Data
Overview: Loading data from various sources into a suitable format for analysis.
Steps:
o Read from File: Use functions like pandas.read_csv(), pandas.read_excel(), or
pandas.read_json() to load data.
o Address Formatting Issues: Ensure correct data types, date formats, and encoding.
3. Merging Datasets
4. Filtering Data
6. Aggregating Data
o Group By: Use the pandas DataFrame.groupby() method to aggregate data by one or more columns.
o Aggregation Functions: Apply functions like sum(), mean(), count() to summarize data.
7. Handling Outliers
Algorithms:-
1. Imputation Techniques
o Mean Imputation: Fill missing values with the mean of the column.
df['column_name'] = df['column_name'].fillna(df['column_name'].mean())
2. Outlier Detection
o IQR Method: Identify outliers using the interquartile range (a z-score sketch follows below).
Q1 = df['column_name'].quantile(0.25)
Q3 = df['column_name'].quantile(0.75)
IQR = Q3 - Q1
outliers = df[(df['column_name'] < (Q1 - 1.5 * IQR)) | (df['column_name'] > (Q3 + 1.5 * IQR))]
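For completeness, a hedged sketch of the z-score variant mentioned above ('column_name' remains a placeholder, and the threshold of 3 is a common convention):

col = df['column_name']
z_scores = (col - col.mean()) / col.std()
z_outliers = df[z_scores.abs() > 3]   # points more than 3 standard deviations from the mean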
4. Aggregating Data
o Group By and Aggregation:
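A short sketch of group-by aggregation for this dataset; the column names (neighborhood, property_type, sale_price) are assumptions about RealEstate_Prices.csv:

avg_price = df.groupby('neighborhood')['sale_price'].mean()   # average sale price per neighborhood
summary = df.groupby(['neighborhood', 'property_type']).agg(
    avg_price=('sale_price', 'mean'),
    listings=('sale_price', 'count'))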
This extended theory provides a detailed approach to data wrangling, including algorithms for each step,
ensuring a thorough and systematic process for preparing data for analysis.
Conclusion:
The practical demonstrated the process of data wrangling on a real estate market dataset. By cleaning and
preprocessing the data, merging additional datasets, and handling missing values, we prepared the dataset for
further analysis. Filtering, encoding, and aggregating the data allowed us to derive insights into factors
influencing housing prices. Managing outliers helped ensure the accuracy of subsequent analyses. This
comprehensive data wrangling approach provides a solid foundation for detailed analysis and modeling in the
real estate domain.
Viva Questions:
1. What is data wrangling, and why is it crucial for analyzing the real estate
market?
2. What key attributes are typically included in real estate datasets?
3. What are the common data sources used in real estate market analysis?
Program:
Experiment No: 4
Data Wrangling
Class: BE Batch:
EXPERIMENT NO.4
Aim:
The aim of this project is to extract, clean, and process real estate market data to provide meaningful insights
regarding property values, trends, and other market dynamics. This will involve handling messy, incomplete, or
inaccurate data and transforming it into a structured format for analysis and decision-making.
Objectives:
1. Data Collection: Gather real estate market data from various sources, including property listings,
government data, and third-party datasets (e.g., Zillow, Realtor).
2. Data Cleaning: Handle missing values, duplicate entries, and outliers in the dataset.
3. Data Transformation: Convert data into a consistent format, such as standardizing property sizes, prices,
and locations.
4. Data Enrichment: Add external data (e.g., socio-economic or neighborhood data) to improve analysis.
5. Exploratory Data Analysis (EDA): Explore and visualize data to discover trends, patterns, and
correlations.
6. Feature Engineering: Create new features, such as price per square foot or neighborhood popularity, for
deeper insights.
7. Reporting: Provide a summary of findings and insights, including trends in property prices, demand in
different neighborhoods, and predictions for future market trends.
Hardware:
Expected Outcome:
1. Clean and Structured Data: A cleaned and well-organized dataset ready for analysis.
2. Trends and Insights: Identified key trends, such as price fluctuations in different neighborhoods, popular
property types, and factors affecting real estate prices.
3. Visualizations: Graphs, charts, and maps showing trends, relationships, and distributions in the real estate
market.
4. Report: A comprehensive report that summarizes findings, offers insights into the real estate market, and
provides recommendations or predictions.
Theory:
Data Wrangling involves transforming raw data into a useful format for analysis. In the context of the real estate
market, raw data may come from multiple sources (websites, public records, APIs) and can be messy or
incomplete. The process includes:
1. Data Collection:
o Import datasets from various sources (e.g., CSV, JSON, SQL).
o Example: pd.read_csv('real_estate_data.csv')
2. Data Cleaning:
o Remove duplicates: df.drop_duplicates()
o Handle missing values: df.fillna(method='ffill') or df.dropna()
o Convert data types (e.g., string to datetime or numeric).
3. Data Transformation:
o Normalize data (e.g., prices in different currencies): df['price'] = df['price'].apply(lambda x:
convert_currency(x))
o Standardize property sizes: df['size'] = df['size'].apply(lambda x: standardize_size(x))
4. Data Enrichment:
o Merge additional data (e.g., crime rates, schools): df = df.merge(additional_data, on='location')
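As a small, hedged illustration of the feature-engineering objective (price per square foot), with made-up values and assumed column names:

import pandas as pd

df = pd.DataFrame({'price': [250000, 400000, 180000],
                   'size_sqft': [1000, 1600, 900],
                   'neighborhood': ['A', 'B', 'A']})

# Derived feature: price per square foot, useful for comparing properties of different sizes
df['price_per_sqft'] = df['price'] / df['size_sqft']

# Quick exploratory summary by neighborhood
print(df.groupby('neighborhood')['price_per_sqft'].mean())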
Conclusion:
Data wrangling is essential for transforming raw, unstructured real estate market data into a clean, usable format
for analysis. By performing steps like data cleaning, transformation, and enrichment, we gain insights into market
trends and improve decision-making. This project helps understand property price dynamics and make predictions
for future developments in the real estate market.
Viva Questions:
1. What are the key steps involved in the data wrangling process?
2. What types of raw data are typically used in the real estate market for analysis?
3. What challenges did you face during the data wrangling process?
Program:
Experiment No: 5
Data Visualization using matplotlib
Class: BE Batch:
EXPERIMENT NO.5
Practical Title: Data Visualization of Air Quality Index (AQI) Trends Using Matplotlib
Aim:
To analyze and visualize the trends and patterns of air quality measurements in a city over time using the
Matplotlib library, focusing on AQI and pollutant levels.
Objectives:
Software:
o Python 3.x
o Libraries: matplotlib, pandas, seaborn (install using pip)
o Jupyter Notebook (optional, for an interactive environment)
Hardware:
o A computer with at least 4 GB of RAM
o Enough storage to handle data files and generate visualizations
Expected Outcome:
1. Visualized AQI Trends: Line plots showing overall AQI trends over time.
2. Pollutant Trends: Individual plots for PM2.5, PM10, and CO levels.
3. AQI Comparisons: Bar plots comparing AQI values across different dates or periods.
4. Distribution Analysis: Box plots or violin plots for AQI distribution.
5. Relationship Insights: Scatter or bubble charts showing relationships between AQI values and
pollutant levels.
6. Custom Visualizations: Enhanced visualizations with appropriate labels, titles, and color schemes.
Theory:
Data Visualization involves creating graphical representations of data to help analyze and interpret information.
Effective visualization allows for the identification of patterns, trends, and insights that might not be apparent
from raw data alone.
1. Line Plots
Overview: Line plots are used to show trends over time or continuous data. They are useful for
visualizing how a variable changes across a sequence of intervals.
Use Cases: Tracking stock prices, monitoring temperature changes over days, analyzing sales trends
over months.
Example:
import matplotlib.pyplot as plt

plt.plot(df['Date'], df['Value'])
plt.xlabel('Date')
plt.ylabel('Value')
plt.title('Trend over Time')
plt.grid(True)
plt.show()
2. Bar Plots
Overview: Bar plots are used to compare quantities across different categories or time periods. They can
be vertical or horizontal.
Use Cases: Comparing sales figures by product category, displaying monthly revenue, analyzing survey
responses.
Example:
plt.bar(df['Category'], df['Sales'])
plt.xlabel('Category')
plt.ylabel('Sales')
plt.title('Sales by Category')
plt.xticks(rotation=45)
plt.show()
3. Box Plots and Violin Plots
Overview:
o Box Plots: Show the distribution of data through quartiles and identify outliers. They are effective
for understanding data spread and central tendency.
o Violin Plots: Combine box plots with kernel density plots to show the distribution of the data
across different categories.
Use Cases: Visualizing the spread of test scores, comparing income distribution across different regions.
Example:
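A minimal sketch with seaborn on toy data; the Region and Income columns are assumptions.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.DataFrame({'Region': ['North', 'North', 'North', 'South', 'South', 'South'],
                   'Income': [42, 55, 120, 38, 61, 47]})   # 120 stands out as an outlier

sns.boxplot(data=df, x='Region', y='Income')     # quartiles and outliers per region
plt.title('Income Distribution by Region')
plt.show()

sns.violinplot(data=df, x='Region', y='Income')  # box plot combined with a density shape
plt.show()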
4. Scatter Plots and Bubble Charts
Overview:
o Scatter Plots: Display relationships between two variables. They are used to identify correlations
or patterns.
o Bubble Charts: Extend scatter plots by adding a third dimension with bubble size, representing
an additional variable.
Use Cases: Analyzing the relationship between advertising spend and sales, visualizing customer
demographics.
Example:
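A minimal sketch on toy data; the columns (ad_spend, sales, customers) are illustrative.

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'ad_spend': [10, 20, 30, 40, 50],
                   'sales': [120, 180, 260, 300, 390],
                   'customers': [15, 25, 40, 55, 70]})   # third variable shown as bubble size

plt.scatter(df['ad_spend'], df['sales'], s=df['customers'] * 5, alpha=0.6)
plt.xlabel('Advertising spend')
plt.ylabel('Sales')
plt.title('Sales vs. Advertising Spend (bubble size = customers)')
plt.show()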
5. Customization
Algorithms
df['Moving_Avg'] = df['Value'].rolling(window=7).mean()
plt.plot(df['Date'], df['Moving_Avg'], label='7-Day Moving Average')
plt.legend()
This extended theory on data visualization provides a detailed look at different types of plots, customization
techniques, and associated algorithms to guide the visualization process.
Conclusion:
The practical successfully demonstrated the use of Matplotlib to visualize air quality trends and patterns. By
creating various types of plots, we were able to analyze AQI trends, observe pollutant levels over time, and
explore relationships between different air quality metrics. Customizing the visualizations with appropriate
labels, titles, and color schemes enhanced the clarity and interpretability of the results. This approach provides
valuable insights into air quality dynamics and supports data-driven decision-making for environmental health.
Viva Questions:
1. What is the Air Quality Index (AQI)?
2. Why is it important to visualize AQI trends?
3. What is the impact of high AQI levels on public health?
Program:
Experiment No: 6
Data Aggregation and Visualization of Sales Performance by
Region
Name of the Student:
Class: BE Batch:
EXPERIMENT NO.6
Aim:
To perform data aggregation on sales transactions to analyze and visualize sales performance by
region, and to identify top-performing regions and product categories.
Objectives:
Software:
o Python 3.x
o Libraries: pandas, matplotlib, seaborn (install using pip)
o Jupyter Notebook (optional, for interactive analysis)
Hardware:
o A computer with at least 4 GB of RAM
o Enough storage to handle data files and generate visualizations
Expected Outcome:
4. Sales Analysis by Product Category: Aggregated sales data by region and product
category.
5. Comparison Visualizations: Stacked bar plots or grouped bar plots comparing sales
amounts across regions and product categories.
Theory:
Data Aggregation: Data Aggregation is the process of summarizing and grouping data to derive meaningful
metrics and insights. It involves organizing data into groups, calculating key metrics, and visualizing the results
to interpret aggregated information effectively.
1. Grouping Data
Overview: Organizing data into groups based on one or more variables. This allows for aggregation and
comparison within and across these groups.
Use Cases: Grouping sales data by region, analyzing customer behavior by age group, or summarizing
performance metrics by department.
Steps:
o Define Grouping Variables: Determine which columns or variables to use for grouping (e.g.,
region, product category).
o Apply Grouping: Use functions to group data accordingly.
import pandas as pd
# Example dataframe
df = pd.DataFrame({
'Region': ['North', 'South', 'North', 'East', 'South'],
'Sales': [200, 150, 300, 250, 100]
})
# Group by 'Region'
grouped_df = df.groupby('Region').sum()
2. Calculating Metrics
Overview: Summarizing data within each group to derive meaningful metrics such as totals, averages,
and percentages.
Common Metrics:
o Total: Sum of values within each group (e.g., total sales by region).
o Average: Mean of values within each group (e.g., average sales per transaction).
o Count: Number of observations within each group (e.g., number of transactions).
Steps:
o Apply Aggregation Functions: Use functions like sum(), mean(), count(), median(), etc.
3. Visualization
Overview: Creating graphical representations to analyze and interpret aggregated data. Visualizations
make it easier to spot patterns, trends, and comparisons.
Common Visualizations:
o Bar Plots: Compare aggregated values across categories.
o Pie Charts: Show proportions of a whole (e.g., market share by region).
o Line Charts: Display trends over time if data is aggregated over time periods.
Steps:
o Select Visualization Type: Choose based on the nature of the data and the insights you want to
derive.
o Create Visualization: Use libraries like matplotlib, seaborn, or plotly to generate plots.
Algorithms:-
1. Grouping Data
o Group By: Aggregate data based on specified criteria.
2. Calculating Metrics
o Total Calculation:
total_sales = df.groupby('Region')['Sales'].sum()
o Average Calculation:
average_sales = df.groupby('Region')['Sales'].mean()
o Count Calculation:
count_sales = df.groupby('Region')['Sales'].count()
3. Visualization
o Bar Plot:
import matplotlib.pyplot as plt

plt.bar(grouped_df.index, grouped_df['Sales'])
plt.xlabel('Region')
plt.ylabel('Total Sales')
plt.title('Total Sales by Region')
plt.show()
o Pie Chart:
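A minimal sketch, reusing the total_sales series from the Total Calculation step and plt as imported for the bar plot above:

plt.pie(total_sales, labels=total_sales.index, autopct='%1.1f%%')   # share of total sales per region
plt.title('Share of Total Sales by Region')
plt.show()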
This extended theory on data aggregation provides a comprehensive look at the process, including grouping
data, calculating metrics, visualizing results, and the corresponding algorithms to guide the aggregation and
analysis.
Conclusion:
The practical effectively demonstrated data aggregation and visualization techniques to analyze sales
performance by region. By grouping sales data and calculating total sales amounts, we identified top-
performing regions and compared sales across different product categories. The visualizations, including bar
plots, pie charts, and stacked bar plots, provided a clear representation of sales distribution and performance,
facilitating better insights into the retail company's sales dynamics. This approach helps in understanding
regional sales performance and making data-driven decisions for strategic planning.
Viva Questions:
1. What is data aggregation?
2. Why is data aggregation important in sales performance analysis?
Program:
Experiment No: 7
Time Series Data Analysis
Class: BE Batch:
EXPERIMENT NO.7
Practical Title: Time Series Analysis and Forecasting of Stock Market Data
Aim:
To analyze and visualize historical stock price data to identify trends, patterns, and potential predictors, and to
build models to forecast future stock prices.
Objectives:
Software:
o Python 3.x
o Libraries: pandas, matplotlib, seaborn, statsmodels (install using pip)
o Jupyter Notebook (optional, for interactive analysis)
Hardware:
o A computer with at least 4 GB of RAM
o Sufficient storage for data files and generated results
Expected Outcome:
1. Visualized Historical Trends: Line charts displaying historical stock price trends.
2. Smoothed Trends: Plots of moving averages to highlight underlying trends.
3. Seasonal Patterns: Visualizations identifying periodic patterns in stock prices.
4. Correlation Insights: Analysis and plots showing relationships between stock prices and other
variables.
5. Forecasting Results: Forecasts of future stock prices using time series models.
Theory:
Time Series Analysis: Time Series Analysis involves examining data points collected or recorded at specific
time intervals to uncover underlying patterns, trends, and seasonal effects. This analysis is crucial for forecasting
future values and understanding historical data.
1. Trends
Overview: Trends represent the long-term movement in time series data, reflecting the overall direction
in which data is moving over an extended period.
Identification: Trends can be upward, downward, or flat. They are often identified through visual
inspection of the data or by applying trend-line analysis.
Example: A company’s annual revenue growth trend over the past decade.
Diagram: a time series whose values rise steadily from Year 1 to Year 6, illustrating an upward trend.
2. Moving Averages
Overview: Moving averages smooth out short-term fluctuations in the data to highlight longer-term
trends. They are particularly useful for identifying trends and cyclic patterns.
Types:
o Simple Moving Average (SMA): The average of data points over a fixed period.
o Weighted Moving Average (WMA): Assigns different weights to data points, giving more
importance to recent observations.
o Exponential Moving Average (EMA): Applies exponentially decreasing weights to past
observations, making it more responsive to recent changes.
Example: A 7-day moving average of daily temperatures to identify long-term climate trends.
Diagram: the original series overlaid with its smoothed moving-average (SMA) line, which removes short-term fluctuations.
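A brief, hedged sketch of the three moving-average types in pandas on a toy price series (the window and span of 3 are illustrative):

import numpy as np
import pandas as pd

prices = pd.Series([100, 102, 101, 105, 107, 110, 108, 112, 115, 117])

sma = prices.rolling(window=3).mean()                                                # Simple Moving Average
wma = prices.rolling(window=3).apply(lambda x: np.dot(x, [1, 2, 3]) / 6, raw=True)   # Weighted Moving Average
ema = prices.ewm(span=3, adjust=False).mean()                                        # Exponential Moving Average

print(pd.DataFrame({'price': prices, 'SMA': sma, 'WMA': wma, 'EMA': ema}))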
3. Seasonality
Overview: Seasonality refers to periodic fluctuations in time series data that occur at regular intervals
due to seasonal factors like time of year, month, or day of the week.
Identification: Seasonality is identified by observing repeating patterns at consistent intervals.
4
0
Diagram: values alternating up and down at regular monthly intervals, illustrating a repeating seasonal pattern.
4. ARIMA (AutoRegressive Integrated Moving Average)
Overview: ARIMA combines autoregressive (AR), differencing (I), and moving average (MA) components to model and forecast time series data.
Diagram: ARIMA model components (AR, I, and MA terms combined).
Example: Forecasting monthly sales by modeling past sales data using ARIMA.
Algorithm:
from statsmodels.tsa.arima.model import ARIMA

# Fit model (the order (1, 1, 1) is an illustrative choice of p, d, q)
model = ARIMA(df['Sales'], order=(1, 1, 1))
model_fit = model.fit()
# Make forecast
forecast = model_fit.forecast(steps=12)
5. Exponential Smoothing
Overview: Exponential Smoothing applies decreasing weights to past observations, making recent data
more influential for forecasting.
Types:
o Simple Exponential Smoothing: Forecasts based on weighted averages of past observations.
o Holt’s Linear Trend Model: Extends simple exponential smoothing to capture linear trends.
o Holt-Winters Seasonal Model: Includes components for trend and seasonality.
Algorithm:
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Fit model
model = ExponentialSmoothing(df['Sales'], seasonal='add', seasonal_periods=12)
model_fit = model.fit()
# Make forecast
forecast = model_fit.forecast(steps=12)
Diagram: exponentially decreasing weights (e.g., 0.7, 0.2, 0.05, 0.025, 0.005) applied from the latest observation back to progressively older ones.
Summary
This extended theory provides a comprehensive understanding of time series analysis, complete with diagrams
and algorithms for practical application.
Conclusion:
The practical demonstrated the application of time series analysis techniques on stock market data. By
visualizing historical trends, calculating moving averages, and analyzing seasonal patterns, we gained insights
into the stock price behavior. The correlation analysis revealed relationships between stock prices and trading
volume, while ARIMA and Exponential Smoothing models provided forecasts for future stock prices. These
techniques and visualizations offer valuable tools for understanding stock price dynamics and making informed
investment decisions.
Viva Questions:
1. What is a time series?
2. What distinguishes a time series from cross-sectional data?
3. What is a white noise process in time series analysis?
Program: