[go: up one dir, main page]

0% found this document useful (0 votes)
29 views19 pages

DA Phase 3 Dharani

The document outlines a project focused on analyzing the IPL 2025 deliveries dataset to derive insights for improving team performance and strategic decision-making in cricket. It details the problem statement, objectives, data preprocessing, exploratory data analysis, and key recommendations based on findings. Additionally, it includes system requirements, visualizations, and future scope for enhancing the analysis with real-time data and advanced tools.

Uploaded by

dom37070
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
29 views19 pages

DA Phase 3 Dharani

The document outlines a project focused on analyzing the IPL 2025 deliveries dataset to derive insights for improving team performance and strategic decision-making in cricket. It details the problem statement, objectives, data preprocessing, exploratory data analysis, and key recommendations based on findings. Additionally, it includes system requirements, visualizations, and future scope for enhancing the analysis with real-time data and advanced tools.

Uploaded by

dom37070
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 19

Phase-3 Submission Template – Data Analytics

Student Name: Dharani. M


Register Number: 212923243501
Institution: St. Joseph College of Engineering
Department: Artificial Intelligence and Data Science
Date of Submission: 10.09.2025
GitHub Repository Link: https://github.com/Dharani7704/NM--TATA-IPL-Analysis-
2025

1. Problem Statement
In professional cricket leagues like the IPL, franchises invest heavily in players, coaching,
and match planning, yet many decisions around team composition and strategy are still
driven by subjective judgment. This leads to inconsistent performance, undervaluing of
in-form players, and ineffective use of key players. Without a systematic approach to
analyzing player and team data, teams struggle to optimize performance under pressure.
Therefore, there is a real-world need for data-driven tools that evaluate player metrics
such as consistency, strike rate, and match impact, to support smarter player selection and
strategic decisions that can directly influence match outcomes

2. Abstract
This project analyzes the IPL 2025 deliveries dataset to extract valuable insights into
team and player performances. In the context of professional sports analytics, franchises
and coaches rely on data to make strategic decisions such as team selection and match
planning. Using Python-based data analysis techniques, we conducted extensive
preprocessing, exploratory data analysis (EDA), and visualizations. Our approach
involved identifying top batsmen, bowlers, and evaluating team performances through
key metrics like total runs, wickets, and economy rates. The project culminates in a
professional HTML dashboard that visually presents findings. Key insights such as top-
performing players and team strengths enable data-driven strategies in cricket.

3. System Requirements
Hardware Requirements
• Minimum RAM: 4 GB (8 GB recommended for smoother performance)
• Processor: Intel i3 or higher (i5/i7 or AMD equivalent preferred)
• Storage: At least 500 MB of free space for dataset and libraries
• Display: 1366×768 resolution or higher
Software Requirements
• Operating System: Windows 10/11, macOS, or Linux
• Python Version: Python 3.x (3.7 or above recommended)
• Development Environment:
o VS Code
• Required Libraries:
o pandas
o numpy
o matplotlib
o seaborn
o plotly
o openpyxl
o pandas-profiling

2
4. Project Objectives
I. Primary Goals:

• To analyze the IPL 2025 deliveries dataset to uncover performance trends and
player statistics.

• To generate actionable insights about top-performing batsmen, bowlers, and


teams.

• To visualize key metrics using charts and graphs in a professional, website-style


dashboard.

II. Expected Outputs:

• Identification of:

o Top Batsmen based on total runs scored.

o Top Bowlers based on wickets taken and economy rate.

o Best Performing Teams based on match-level aggregations.

• Visualizations:

o Bar charts, pie charts, and other graphical summaries.

o Team-wise performance dashboards using HTML/CSS/JS.

• Cleaned and transformed dataset ready for advanced analytics.

3
III. Business Impact:

• Helps coaches and team analysts make informed player selection and match
strategies.

• Assists sponsors and advertisers in identifying star performers and high-impact


teams.

• Enables fans and fantasy league participants to base decisions on real


performance trends.

5. Project Workflow (Flowchart)

4
6. Dataset Description

➢ Dataset Name and Source

Name: IPL 2025 Deliveries Dataset

Source: Publicly available on Kaggle

This dataset records ball-by-ball information of the Indian Premier League (IPL)
2025 matches.

➢ Data Type

Type: Structured

Format: CSV (Comma Separated Values)

This dataset consists of clearly defined rows and columns, suitable for analysis
using data science tools.

➢ Size and Structure

Number of Rows: Approximately 22,000+ rows

Number of Columns: 18 columns


Each row in the dataset corresponds to a single ball delivery in the IPL 2025 season.

➢ Nature of the Dataset

Type: Static

The dataset is a historical snapshot of all deliveries in the IPL 2025 season and does
not change over time

5
7. Data Preprocessing

● The IPL 2025 deliveries.csv dataset was cleaned and prepared using the following steps
in analysis.py:

• Loading the Dataset

● Loaded the dataset using pandas.read_csv().

● Verified the structure with .info() and .describe().

• Handling Missing Values

● Checked for missing/null values using .isnull().sum().

● No critical missing values affecting key metrics like batsman_runs, total_runs, or


player_dismissed.

● Where necessary (e.g., player_dismissed), missing values were replaced with "None" to
standardize and avoid issues during grouping and analysis.

• Removing Duplicates

● Used .duplicated() .sum() and .drop_duplicates() to remove redundant rows, ensuring


data consistency.

• Data Formatting

● Converted data types for consistency, e.g., making sure batsman_runs, over, ball, etc.,
are numeric.

● Ensured categorical fields like batsman, bowler, and match_id are in string format
where necessary.

• Outlier Detection

● Visual inspection of runs per ball or over showed no extreme anomalies, so outlier
treatment was not applied explicitly.

● However, aggregate metrics like total runs, averages, and economy were carefully
computed with .groupby() to normalize variations.

6
• Transformations & Feature Engineering

● Computed new metrics:

● Top Batsmen (by total runs)

● Top Bowlers (by total wickets)

● Economy Rate (runs conceded per over)

● Partnership Analysis (runs per pair of batsmen)

● Saved the visualizations (.png) to the website/images/ directory for professional web
display.

8. Exploratory Data Analysis (EDA)

We analyzed the relationships between multiple variables to uncover deeper patterns:

• Top Batsmen (Runs Scored)

o Grouped the data by batter and aggregated batsman_runs.

o Visualized using horizontal bar plot showing top 10 batsmen.

• Top Bowlers (Wickets Taken)

o Filtered is WicketDelivery == 1, grouped by bowler.

o Sorted and visualized the top wicket-takers.

• Economy Rate of Bowlers

o Calculated economy rate = total runs conceded / (balls bowled / 6).

Bar chart showed bowlers with best economy under pressure

9. Insights and Interpretation

Below are the key insights derived from our Exploratory Data Analysis (EDA) of IPL
2025 player and team performance:

7
Key Takeaways:

• Top 5 batsmen contributed 48% of total team runs, indicating a high


dependency on core players.

• Powerplay overs (1–6) saw an average run rate of 8.3, while death overs (16–
20) peaked at 11.2, suggesting strategic acceleration.

• Bowler economy rate is best in middle overs (7–15) with an average of 6.9,
highlighting effective containment strategies during that phase.

• Gujarat Titans 75% of matches when defending a target above 180 runs,
showcasing strong death-over bowling and fielding.

• Player X's strike rate improved by 25% compared to IPL 2024, reflecting
improved finishing ability.

• Spin bowlers took 60% of wickets in night matches, indicating dew and pitch
conditions favoring spin in certain venues.

• Teams winning the toss and choosing to bowl first had a 63% win rate,
suggesting a tactical edge under pressure with known targets.

10. Recommendations
Based on the insights generated from the data analysis of IPL 2025, here are data-backed
suggestions for stakeholders to enhance team performance and strategic decision-making.

A. Short-Term Actions
• Optimize Powerplay Batting Strategy
Insight: Run rate during powerplay overs is comparatively low.
Action: Promote aggressive openers or pinch-hitters early in the innings to
maximize the 1–6 over window.
• Use Spin Bowlers More in Night Matches
Insight: Spinners took 60% of wickets in night games.
8
Action: Prioritize including at least two quality spinners in the lineup for
evening matches.
• Toss Strategy – Prefer Chasing
Insight: Teams chasing won 63% of games.
Action: When winning the toss, opt to bowl first to capitalize on pitch
behavior and pressure advantage.
B. Long-Term Strategic Moves
• Reduce Over-Reliance on Star Batsmen
Insight: Top 5 batsmen contribute nearly half the runs.
Action: Develop middle-order strength by grooming young players for
flexible roles.
• Invest in Death Over Specialists
Insight: Death overs yield the highest run rates against most teams.
Action: Recruit or train bowlers with strong yorker and slower ball skills to
control the end overs.
• Venue-Specific Player Selection
Insight: Certain players perform better in specific venues.
Action: Adopt a data-driven squad rotation system based on venue conditions
and player performance history.
• Long-Term Fitness and Form Tracking
Insight: Notable improvements in some players’ strike rates.
Action: Implement continuous performance analytics for tracking form,
fitness, and workload to maintain player peak.

9
11. Visualizations / Dashboard

• Description: Displays the teams with the highest number of runs in IPL history

• Purpose: Highlights the most successful team by total runs in the league.

• Description: Shows the economy rates of bowlers, indicating how many runs they
concede per over.

• Purpose: Helps teams identify bowlers who are economical and can restrict the
opposition's scoring.

10
• Description: Displays the teams with the highest number of wins in IPL history.

• Purpose: Highlights the most successful teams in the league.

• Description: Displays the teams with the highest number of wins in IPL history.

• Purpose: Highlights the most successful teams in the league.

11
• Description: Visualizes the distribution of runs scored across different overs in a match.

• Purpose: Provides insights into scoring patterns and key phases of the game\

Description: visualization highlights the players who have scored the most runs in the IPL. It
provides insights into the most consistent and impactful batsmen in the league.

Purpose: highlight the most consistent and impactful batsmen in the IPL. It provides insights
into players who have contributed significantly to their teams' success by scoring the highest
number of runs

12
• Description: Highlights the top 5 bowlers based on their total wickets taken.

• Purpose: Useful for analyzing the most effective bowlers in the tournament

• Description: Displays the top 5 batsmen based on their total runs scored in the IPL.

• Purpose: Helps identify the most consistent and high-performing batsmen in the league.

13
13. Source Code
• Folder structure:
ipl_2025_project/

├── analysis/
│ ├── analysis.py
│ └── deliveries.csv

├── website/
│ ├── index.html
│ ├── script.js
│ ├── style.css
│ │
│ └── images/
│ ├── best_ team.png
│ ├── best_teams.png
│ ├── economy_rate.png
│ ├── partnership.png
│ ├── partnership2.png
│ ├── runs_per_over.png
│ ├── top_batsmen.png
│ ├── top_bowlers.png
│ └── top_players.png
└── README.md

14
• Source code:
Analysis.py:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os

# Set the working directory to the script's location


os.chdir(os.path.dirname(__file__))

# Ensure the images directory exists


images_dir = '../website/images/'
os.makedirs(images_dir, exist_ok=True)

# Use the absolute path for the dataset file


file_path = os.path.join(os.path.dirname(__file__), 'deliveries.csv')

# Debugging: Print the resolved file path


print(f"Resolved file path for deliveries.csv: {file_path}")

if not os.path.exists(file_path):
raise FileNotFoundError("The dataset file 'deliveries.csv' is missing.")

# Define column names explicitly

15
column_names = [
'match_id', 'date', 'stage', 'venue', 'team1', 'team2', 'innings', 'over_ball',
'batsman', 'bowler', 'batsman_runs', 'extras', 'wides', 'noballs', 'byes', 'legbyes',
'dismissal_kind', 'player_dismissed', 'fielder'
]
data = pd.read_csv(file_path, names=column_names, header=None)

# Ensure 'batsman_runs' is numeric


data['batsman_runs'] = pd.to_numeric(data['batsman_runs'], errors='coerce')

# Top 5 Batsmen
batsmen =
data.groupby('batsman')['batsman_runs'].sum().sort_values(ascending=False).head(5)
if batsmen.empty:
raise ValueError("No data available to plot for top batsmen.")

batsmen.plot(kind='bar', color='skyblue')
plt.title('Top 5 Batsmen')
plt.ylabel('Total Runs')
plt.xlabel('Batsman')
plt.tight_layout()
plt.savefig(os.path.join(images_dir, 'top_batsmen.png'))
plt.close()

# Top 5 Bowlers

16
bowlers =
data.groupby('bowler')['batsman_runs'].sum().sort_values(ascending=True).head(5)
bowlers.plot(kind='bar', color='orange')
plt.title('Top 5 Bowlers')
plt.ylabel('Runs Conceded')
plt.xlabel('Bowler')
plt.tight_layout()
plt.savefig(os.path.join(images_dir, 'top_bowlers.png'))
plt.close()

# Economy Rate
economy = data.groupby('bowler').agg({'batsman_runs': 'sum', 'over_ball': 'count'})
economy['economy_rate'] = economy['batsman_runs'] / (economy['over_ball'] / 6)
economy = economy.sort_values(by='economy_rate').head(5)
economy['economy_rate'].plot(kind='bar', color='green')
plt.title('Top 5 Economy Rate Bowlers')
plt.ylabel('Economy Rate')
plt.xlabel('Bowler')
plt.tight_layout()
plt.savefig(os.path.join(images_dir, 'economy_rate.png'))
plt.close()

# Partnership Analysis Heatmap Data Preparation


partnership_data = data.groupby(['batsman',
'bowler'])['batsman_runs'].sum().unstack(fill_value=0)
partnership_data.to_csv('../website/images/partnership_heatmap.csv')

17
# Partnership Analysis Heatmap Visualization
plt.figure(figsize=(10, 8))
sns.heatmap(partnership_data, annot=False, cmap='viridis', cbar=True)
plt.title('Partnership Analysis Heatmap')
plt.xlabel('Batsmen')
plt.ylabel('Partners')
plt.tight_layout()
plt.savefig(os.path.join(images_dir, 'partnership_heatmap.png'))
plt.close()
Follow this link for the remaining source code: https://github.com/Dharani7704/NM-
-TATA-IPL-Analysis-2025

14. Future Scope

1. Integration with Real-Time Data Pipelines


Implementing real-time data feeds using APIs (e.g., live match stats from IPL)
can enable up-to-the-minute analysis and help in dynamic decision-making during
matches.

2. Advanced Visualization & Automation Tools


Upgrading the dashboard with D3.js or automating reports using Power BI/Plotly
Dash would provide more interactive and professional-grade insights for
stakeholders.

3. Incorporating Sentiment Analysis


Using NLP techniques to analyze fan sentiment from social media or match
reviews (e.g., tweets about player performance) can enrich the analysis with
public perception metrics.
18
4. Linking Analytics to Strategic Systems
Connect performance-based insights with CRM tools or marketing platforms
(e.g., personalized fan engagement, ticket sales optimization) to drive business
impact

15. Team Members and Roles

Dharani. M:
Role: Project Lead, Data Analyst
Responsibilities: Overall project coordination, data collection, data cleaning, and
analysis. Developed key visualizations and interpreted the results.

Hemavathi. S:
Role: Data Scientist
Responsibilities: Data preprocessing, feature engineering, and statistical analysis.
Created various performance metrics and provided insights on team strategies.

Nithya. S:
Role: Frontend Developer
Responsibilities: Designed and developed the interactive HTML dashboard to showcase
visualizations. Ensured smooth integration of charts and user interface.

Jayapriya. R
Role: Research and Documentation Specialist
Responsibilities: Researched IPL 2025 trends, contributed to project methodology, and
prepared the final documentation and report

19

You might also like