Phase-3 Submission Template – Data Analytics
Student Name: Dharani. M
Register Number: 212923243501
Institution: St. Joseph College of Engineering
Department: Artificial Intelligence and Data Science
Date of Submission: 10.09.2025
GitHub Repository Link: https://github.com/Dharani7704/NM--TATA-IPL-Analysis-
2025
1. Problem Statement
In professional cricket leagues like the IPL, franchises invest heavily in players, coaching,
and match planning, yet many decisions around team composition and strategy are still
driven by subjective judgment. This leads to inconsistent performance, undervaluing of
in-form players, and ineffective use of key players. Without a systematic approach to
analyzing player and team data, teams struggle to optimize performance under pressure.
Therefore, there is a real-world need for data-driven tools that evaluate player metrics
such as consistency, strike rate, and match impact, to support smarter player selection and
strategic decisions that can directly influence match outcomes
2. Abstract
This project analyzes the IPL 2025 deliveries dataset to extract valuable insights into
team and player performances. In the context of professional sports analytics, franchises
and coaches rely on data to make strategic decisions such as team selection and match
planning. Using Python-based data analysis techniques, we conducted extensive
preprocessing, exploratory data analysis (EDA), and visualizations. Our approach
involved identifying top batsmen, bowlers, and evaluating team performances through
key metrics like total runs, wickets, and economy rates. The project culminates in a
professional HTML dashboard that visually presents findings. Key insights such as top-
performing players and team strengths enable data-driven strategies in cricket.
3. System Requirements
Hardware Requirements
• Minimum RAM: 4 GB (8 GB recommended for smoother performance)
• Processor: Intel i3 or higher (i5/i7 or AMD equivalent preferred)
• Storage: At least 500 MB of free space for dataset and libraries
• Display: 1366×768 resolution or higher
Software Requirements
• Operating System: Windows 10/11, macOS, or Linux
• Python Version: Python 3.x (3.7 or above recommended)
• Development Environment:
o VS Code
• Required Libraries:
o pandas
o numpy
o matplotlib
o seaborn
o plotly
o openpyxl
o pandas-profiling
2
4. Project Objectives
I. Primary Goals:
• To analyze the IPL 2025 deliveries dataset to uncover performance trends and
player statistics.
• To generate actionable insights about top-performing batsmen, bowlers, and
teams.
• To visualize key metrics using charts and graphs in a professional, website-style
dashboard.
II. Expected Outputs:
• Identification of:
o Top Batsmen based on total runs scored.
o Top Bowlers based on wickets taken and economy rate.
o Best Performing Teams based on match-level aggregations.
• Visualizations:
o Bar charts, pie charts, and other graphical summaries.
o Team-wise performance dashboards using HTML/CSS/JS.
• Cleaned and transformed dataset ready for advanced analytics.
3
III. Business Impact:
• Helps coaches and team analysts make informed player selection and match
strategies.
• Assists sponsors and advertisers in identifying star performers and high-impact
teams.
• Enables fans and fantasy league participants to base decisions on real
performance trends.
5. Project Workflow (Flowchart)
4
6. Dataset Description
➢ Dataset Name and Source
Name: IPL 2025 Deliveries Dataset
Source: Publicly available on Kaggle
This dataset records ball-by-ball information of the Indian Premier League (IPL)
2025 matches.
➢ Data Type
Type: Structured
Format: CSV (Comma Separated Values)
This dataset consists of clearly defined rows and columns, suitable for analysis
using data science tools.
➢ Size and Structure
Number of Rows: Approximately 22,000+ rows
Number of Columns: 18 columns
Each row in the dataset corresponds to a single ball delivery in the IPL 2025 season.
➢ Nature of the Dataset
Type: Static
The dataset is a historical snapshot of all deliveries in the IPL 2025 season and does
not change over time
5
7. Data Preprocessing
● The IPL 2025 deliveries.csv dataset was cleaned and prepared using the following steps
in analysis.py:
• Loading the Dataset
● Loaded the dataset using pandas.read_csv().
● Verified the structure with .info() and .describe().
• Handling Missing Values
● Checked for missing/null values using .isnull().sum().
● No critical missing values affecting key metrics like batsman_runs, total_runs, or
player_dismissed.
● Where necessary (e.g., player_dismissed), missing values were replaced with "None" to
standardize and avoid issues during grouping and analysis.
• Removing Duplicates
● Used .duplicated() .sum() and .drop_duplicates() to remove redundant rows, ensuring
data consistency.
• Data Formatting
● Converted data types for consistency, e.g., making sure batsman_runs, over, ball, etc.,
are numeric.
● Ensured categorical fields like batsman, bowler, and match_id are in string format
where necessary.
• Outlier Detection
● Visual inspection of runs per ball or over showed no extreme anomalies, so outlier
treatment was not applied explicitly.
● However, aggregate metrics like total runs, averages, and economy were carefully
computed with .groupby() to normalize variations.
6
• Transformations & Feature Engineering
● Computed new metrics:
● Top Batsmen (by total runs)
● Top Bowlers (by total wickets)
● Economy Rate (runs conceded per over)
● Partnership Analysis (runs per pair of batsmen)
● Saved the visualizations (.png) to the website/images/ directory for professional web
display.
8. Exploratory Data Analysis (EDA)
We analyzed the relationships between multiple variables to uncover deeper patterns:
• Top Batsmen (Runs Scored)
o Grouped the data by batter and aggregated batsman_runs.
o Visualized using horizontal bar plot showing top 10 batsmen.
• Top Bowlers (Wickets Taken)
o Filtered is WicketDelivery == 1, grouped by bowler.
o Sorted and visualized the top wicket-takers.
• Economy Rate of Bowlers
o Calculated economy rate = total runs conceded / (balls bowled / 6).
Bar chart showed bowlers with best economy under pressure
9. Insights and Interpretation
Below are the key insights derived from our Exploratory Data Analysis (EDA) of IPL
2025 player and team performance:
7
Key Takeaways:
• Top 5 batsmen contributed 48% of total team runs, indicating a high
dependency on core players.
• Powerplay overs (1–6) saw an average run rate of 8.3, while death overs (16–
20) peaked at 11.2, suggesting strategic acceleration.
• Bowler economy rate is best in middle overs (7–15) with an average of 6.9,
highlighting effective containment strategies during that phase.
• Gujarat Titans 75% of matches when defending a target above 180 runs,
showcasing strong death-over bowling and fielding.
• Player X's strike rate improved by 25% compared to IPL 2024, reflecting
improved finishing ability.
• Spin bowlers took 60% of wickets in night matches, indicating dew and pitch
conditions favoring spin in certain venues.
• Teams winning the toss and choosing to bowl first had a 63% win rate,
suggesting a tactical edge under pressure with known targets.
10. Recommendations
Based on the insights generated from the data analysis of IPL 2025, here are data-backed
suggestions for stakeholders to enhance team performance and strategic decision-making.
A. Short-Term Actions
• Optimize Powerplay Batting Strategy
Insight: Run rate during powerplay overs is comparatively low.
Action: Promote aggressive openers or pinch-hitters early in the innings to
maximize the 1–6 over window.
• Use Spin Bowlers More in Night Matches
Insight: Spinners took 60% of wickets in night games.
8
Action: Prioritize including at least two quality spinners in the lineup for
evening matches.
• Toss Strategy – Prefer Chasing
Insight: Teams chasing won 63% of games.
Action: When winning the toss, opt to bowl first to capitalize on pitch
behavior and pressure advantage.
B. Long-Term Strategic Moves
• Reduce Over-Reliance on Star Batsmen
Insight: Top 5 batsmen contribute nearly half the runs.
Action: Develop middle-order strength by grooming young players for
flexible roles.
• Invest in Death Over Specialists
Insight: Death overs yield the highest run rates against most teams.
Action: Recruit or train bowlers with strong yorker and slower ball skills to
control the end overs.
• Venue-Specific Player Selection
Insight: Certain players perform better in specific venues.
Action: Adopt a data-driven squad rotation system based on venue conditions
and player performance history.
• Long-Term Fitness and Form Tracking
Insight: Notable improvements in some players’ strike rates.
Action: Implement continuous performance analytics for tracking form,
fitness, and workload to maintain player peak.
9
11. Visualizations / Dashboard
• Description: Displays the teams with the highest number of runs in IPL history
• Purpose: Highlights the most successful team by total runs in the league.
• Description: Shows the economy rates of bowlers, indicating how many runs they
concede per over.
• Purpose: Helps teams identify bowlers who are economical and can restrict the
opposition's scoring.
10
• Description: Displays the teams with the highest number of wins in IPL history.
• Purpose: Highlights the most successful teams in the league.
• Description: Displays the teams with the highest number of wins in IPL history.
• Purpose: Highlights the most successful teams in the league.
11
• Description: Visualizes the distribution of runs scored across different overs in a match.
• Purpose: Provides insights into scoring patterns and key phases of the game\
Description: visualization highlights the players who have scored the most runs in the IPL. It
provides insights into the most consistent and impactful batsmen in the league.
Purpose: highlight the most consistent and impactful batsmen in the IPL. It provides insights
into players who have contributed significantly to their teams' success by scoring the highest
number of runs
12
• Description: Highlights the top 5 bowlers based on their total wickets taken.
• Purpose: Useful for analyzing the most effective bowlers in the tournament
• Description: Displays the top 5 batsmen based on their total runs scored in the IPL.
• Purpose: Helps identify the most consistent and high-performing batsmen in the league.
13
13. Source Code
• Folder structure:
ipl_2025_project/
│
├── analysis/
│ ├── analysis.py
│ └── deliveries.csv
│
├── website/
│ ├── index.html
│ ├── script.js
│ ├── style.css
│ │
│ └── images/
│ ├── best_ team.png
│ ├── best_teams.png
│ ├── economy_rate.png
│ ├── partnership.png
│ ├── partnership2.png
│ ├── runs_per_over.png
│ ├── top_batsmen.png
│ ├── top_bowlers.png
│ └── top_players.png
└── README.md
14
• Source code:
Analysis.py:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
# Set the working directory to the script's location
os.chdir(os.path.dirname(__file__))
# Ensure the images directory exists
images_dir = '../website/images/'
os.makedirs(images_dir, exist_ok=True)
# Use the absolute path for the dataset file
file_path = os.path.join(os.path.dirname(__file__), 'deliveries.csv')
# Debugging: Print the resolved file path
print(f"Resolved file path for deliveries.csv: {file_path}")
if not os.path.exists(file_path):
raise FileNotFoundError("The dataset file 'deliveries.csv' is missing.")
# Define column names explicitly
15
column_names = [
'match_id', 'date', 'stage', 'venue', 'team1', 'team2', 'innings', 'over_ball',
'batsman', 'bowler', 'batsman_runs', 'extras', 'wides', 'noballs', 'byes', 'legbyes',
'dismissal_kind', 'player_dismissed', 'fielder'
]
data = pd.read_csv(file_path, names=column_names, header=None)
# Ensure 'batsman_runs' is numeric
data['batsman_runs'] = pd.to_numeric(data['batsman_runs'], errors='coerce')
# Top 5 Batsmen
batsmen =
data.groupby('batsman')['batsman_runs'].sum().sort_values(ascending=False).head(5)
if batsmen.empty:
raise ValueError("No data available to plot for top batsmen.")
batsmen.plot(kind='bar', color='skyblue')
plt.title('Top 5 Batsmen')
plt.ylabel('Total Runs')
plt.xlabel('Batsman')
plt.tight_layout()
plt.savefig(os.path.join(images_dir, 'top_batsmen.png'))
plt.close()
# Top 5 Bowlers
16
bowlers =
data.groupby('bowler')['batsman_runs'].sum().sort_values(ascending=True).head(5)
bowlers.plot(kind='bar', color='orange')
plt.title('Top 5 Bowlers')
plt.ylabel('Runs Conceded')
plt.xlabel('Bowler')
plt.tight_layout()
plt.savefig(os.path.join(images_dir, 'top_bowlers.png'))
plt.close()
# Economy Rate
economy = data.groupby('bowler').agg({'batsman_runs': 'sum', 'over_ball': 'count'})
economy['economy_rate'] = economy['batsman_runs'] / (economy['over_ball'] / 6)
economy = economy.sort_values(by='economy_rate').head(5)
economy['economy_rate'].plot(kind='bar', color='green')
plt.title('Top 5 Economy Rate Bowlers')
plt.ylabel('Economy Rate')
plt.xlabel('Bowler')
plt.tight_layout()
plt.savefig(os.path.join(images_dir, 'economy_rate.png'))
plt.close()
# Partnership Analysis Heatmap Data Preparation
partnership_data = data.groupby(['batsman',
'bowler'])['batsman_runs'].sum().unstack(fill_value=0)
partnership_data.to_csv('../website/images/partnership_heatmap.csv')
17
# Partnership Analysis Heatmap Visualization
plt.figure(figsize=(10, 8))
sns.heatmap(partnership_data, annot=False, cmap='viridis', cbar=True)
plt.title('Partnership Analysis Heatmap')
plt.xlabel('Batsmen')
plt.ylabel('Partners')
plt.tight_layout()
plt.savefig(os.path.join(images_dir, 'partnership_heatmap.png'))
plt.close()
Follow this link for the remaining source code: https://github.com/Dharani7704/NM-
-TATA-IPL-Analysis-2025
14. Future Scope
1. Integration with Real-Time Data Pipelines
Implementing real-time data feeds using APIs (e.g., live match stats from IPL)
can enable up-to-the-minute analysis and help in dynamic decision-making during
matches.
2. Advanced Visualization & Automation Tools
Upgrading the dashboard with D3.js or automating reports using Power BI/Plotly
Dash would provide more interactive and professional-grade insights for
stakeholders.
3. Incorporating Sentiment Analysis
Using NLP techniques to analyze fan sentiment from social media or match
reviews (e.g., tweets about player performance) can enrich the analysis with
public perception metrics.
18
4. Linking Analytics to Strategic Systems
Connect performance-based insights with CRM tools or marketing platforms
(e.g., personalized fan engagement, ticket sales optimization) to drive business
impact
15. Team Members and Roles
Dharani. M:
Role: Project Lead, Data Analyst
Responsibilities: Overall project coordination, data collection, data cleaning, and
analysis. Developed key visualizations and interpreted the results.
Hemavathi. S:
Role: Data Scientist
Responsibilities: Data preprocessing, feature engineering, and statistical analysis.
Created various performance metrics and provided insights on team strategies.
Nithya. S:
Role: Frontend Developer
Responsibilities: Designed and developed the interactive HTML dashboard to showcase
visualizations. Ensured smooth integration of charts and user interface.
Jayapriya. R
Role: Research and Documentation Specialist
Responsibilities: Researched IPL 2025 trends, contributed to project methodology, and
prepared the final documentation and report
19