0% found this document useful (0 votes)

29 views19 pages

DA Phase 3 Dharani

The document outlines a project focused on analyzing the IPL 2025 deliveries dataset to derive insights for improving team performance and strategic decision-making in cricket. It details the problem statement, objectives, data preprocessing, exploratory data analysis, and key recommendations based on findings. Additionally, it includes system requirements, visualizations, and future scope for enhancing the analysis with real-time data and advanced tools.

Uploaded by

dom37070

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

29 views19 pages

DA Phase 3 Dharani

Uploaded by

dom37070

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 19

Phase-3 Submission Template – Data Analytics

Student Name: Dharani. M

Register Number: 212923243501
Institution: St. Joseph College of Engineering
Department: Artificial Intelligence and Data Science
Date of Submission: 10.09.2025
GitHub Repository Link: https://github.com/Dharani7704/NM--TATA-IPL-Analysis-
2025

1. Problem Statement
In professional cricket leagues like the IPL, franchises invest heavily in players, coaching,
and match planning, yet many decisions around team composition and strategy are still
driven by subjective judgment. This leads to inconsistent performance, undervaluing of
in-form players, and ineffective use of key players. Without a systematic approach to
analyzing player and team data, teams struggle to optimize performance under pressure.
Therefore, there is a real-world need for data-driven tools that evaluate player metrics
such as consistency, strike rate, and match impact, to support smarter player selection and
strategic decisions that can directly influence match outcomes

2. Abstract
This project analyzes the IPL 2025 deliveries dataset to extract valuable insights into
team and player performances. In the context of professional sports analytics, franchises
and coaches rely on data to make strategic decisions such as team selection and match
planning. Using Python-based data analysis techniques, we conducted extensive
preprocessing, exploratory data analysis (EDA), and visualizations. Our approach
involved identifying top batsmen, bowlers, and evaluating team performances through
key metrics like total runs, wickets, and economy rates. The project culminates in a
professional HTML dashboard that visually presents findings. Key insights such as top-
performing players and team strengths enable data-driven strategies in cricket.

3. System Requirements
Hardware Requirements
• Minimum RAM: 4 GB (8 GB recommended for smoother performance)
• Processor: Intel i3 or higher (i5/i7 or AMD equivalent preferred)
• Storage: At least 500 MB of free space for dataset and libraries
• Display: 1366×768 resolution or higher
Software Requirements
• Operating System: Windows 10/11, macOS, or Linux
• Python Version: Python 3.x (3.7 or above recommended)
• Development Environment:
o VS Code
• Required Libraries:
o pandas
o numpy
o matplotlib
o seaborn
o plotly
o openpyxl
o pandas-profiling

2
4. Project Objectives
I. Primary Goals:

• To analyze the IPL 2025 deliveries dataset to uncover performance trends and
player statistics.

• To generate actionable insights about top-performing batsmen, bowlers, and

teams.

• To visualize key metrics using charts and graphs in a professional, website-style

dashboard.

II. Expected Outputs:

• Identification of:

o Top Batsmen based on total runs scored.

o Top Bowlers based on wickets taken and economy rate.

o Best Performing Teams based on match-level aggregations.

• Visualizations:

o Bar charts, pie charts, and other graphical summaries.

o Team-wise performance dashboards using HTML/CSS/JS.

• Cleaned and transformed dataset ready for advanced analytics.

3
III. Business Impact:

• Helps coaches and team analysts make informed player selection and match
strategies.

• Assists sponsors and advertisers in identifying star performers and high-impact

teams.

• Enables fans and fantasy league participants to base decisions on real

performance trends.

5. Project Workflow (Flowchart)

4
6. Dataset Description

➢ Dataset Name and Source

Name: IPL 2025 Deliveries Dataset

Source: Publicly available on Kaggle

This dataset records ball-by-ball information of the Indian Premier League (IPL)
2025 matches.

➢ Data Type

Type: Structured

Format: CSV (Comma Separated Values)

This dataset consists of clearly defined rows and columns, suitable for analysis
using data science tools.

➢ Size and Structure

Number of Rows: Approximately 22,000+ rows

Number of Columns: 18 columns

Each row in the dataset corresponds to a single ball delivery in the IPL 2025 season.

➢ Nature of the Dataset

Type: Static

The dataset is a historical snapshot of all deliveries in the IPL 2025 season and does
not change over time

5
7. Data Preprocessing

● The IPL 2025 deliveries.csv dataset was cleaned and prepared using the following steps
in analysis.py:

• Loading the Dataset

● Loaded the dataset using pandas.read_csv().

● Verified the structure with .info() and .describe().

• Handling Missing Values

● Checked for missing/null values using .isnull().sum().

● No critical missing values affecting key metrics like batsman_runs, total_runs, or

player_dismissed.

● Where necessary (e.g., player_dismissed), missing values were replaced with "None" to
standardize and avoid issues during grouping and analysis.

• Removing Duplicates

● Used .duplicated() .sum() and .drop_duplicates() to remove redundant rows, ensuring

data consistency.

• Data Formatting

● Converted data types for consistency, e.g., making sure batsman_runs, over, ball, etc.,
are numeric.

● Ensured categorical fields like batsman, bowler, and match_id are in string format
where necessary.

• Outlier Detection

● Visual inspection of runs per ball or over showed no extreme anomalies, so outlier
treatment was not applied explicitly.

● However, aggregate metrics like total runs, averages, and economy were carefully
computed with .groupby() to normalize variations.

6
• Transformations & Feature Engineering

● Computed new metrics:

● Top Batsmen (by total runs)

● Top Bowlers (by total wickets)

● Economy Rate (runs conceded per over)

● Partnership Analysis (runs per pair of batsmen)

● Saved the visualizations (.png) to the website/images/ directory for professional web
display.

8. Exploratory Data Analysis (EDA)

We analyzed the relationships between multiple variables to uncover deeper patterns:

• Top Batsmen (Runs Scored)

o Grouped the data by batter and aggregated batsman_runs.

o Visualized using horizontal bar plot showing top 10 batsmen.

• Top Bowlers (Wickets Taken)

o Filtered is WicketDelivery == 1, grouped by bowler.

o Sorted and visualized the top wicket-takers.

• Economy Rate of Bowlers

o Calculated economy rate = total runs conceded / (balls bowled / 6).

Bar chart showed bowlers with best economy under pressure

9. Insights and Interpretation

Below are the key insights derived from our Exploratory Data Analysis (EDA) of IPL
2025 player and team performance:

7
Key Takeaways:

• Top 5 batsmen contributed 48% of total team runs, indicating a high

dependency on core players.

• Powerplay overs (1–6) saw an average run rate of 8.3, while death overs (16–
20) peaked at 11.2, suggesting strategic acceleration.

• Bowler economy rate is best in middle overs (7–15) with an average of 6.9,
highlighting effective containment strategies during that phase.

• Gujarat Titans 75% of matches when defending a target above 180 runs,
showcasing strong death-over bowling and fielding.

• Player X's strike rate improved by 25% compared to IPL 2024, reflecting
improved finishing ability.

• Spin bowlers took 60% of wickets in night matches, indicating dew and pitch
conditions favoring spin in certain venues.

• Teams winning the toss and choosing to bowl first had a 63% win rate,
suggesting a tactical edge under pressure with known targets.

10. Recommendations
Based on the insights generated from the data analysis of IPL 2025, here are data-backed
suggestions for stakeholders to enhance team performance and strategic decision-making.

A. Short-Term Actions
• Optimize Powerplay Batting Strategy
Insight: Run rate during powerplay overs is comparatively low.
Action: Promote aggressive openers or pinch-hitters early in the innings to
maximize the 1–6 over window.
• Use Spin Bowlers More in Night Matches
Insight: Spinners took 60% of wickets in night games.
8
Action: Prioritize including at least two quality spinners in the lineup for
evening matches.
• Toss Strategy – Prefer Chasing
Insight: Teams chasing won 63% of games.
Action: When winning the toss, opt to bowl first to capitalize on pitch
behavior and pressure advantage.
B. Long-Term Strategic Moves
• Reduce Over-Reliance on Star Batsmen
Insight: Top 5 batsmen contribute nearly half the runs.
Action: Develop middle-order strength by grooming young players for
flexible roles.
• Invest in Death Over Specialists
Insight: Death overs yield the highest run rates against most teams.
Action: Recruit or train bowlers with strong yorker and slower ball skills to
control the end overs.
• Venue-Specific Player Selection
Insight: Certain players perform better in specific venues.
Action: Adopt a data-driven squad rotation system based on venue conditions
and player performance history.
• Long-Term Fitness and Form Tracking
Insight: Notable improvements in some players’ strike rates.
Action: Implement continuous performance analytics for tracking form,
fitness, and workload to maintain player peak.

9
11. Visualizations / Dashboard

• Description: Displays the teams with the highest number of runs in IPL history

• Purpose: Highlights the most successful team by total runs in the league.

• Description: Shows the economy rates of bowlers, indicating how many runs they
concede per over.

• Purpose: Helps teams identify bowlers who are economical and can restrict the
opposition's scoring.

10
• Description: Displays the teams with the highest number of wins in IPL history.

• Purpose: Highlights the most successful teams in the league.

• Description: Displays the teams with the highest number of wins in IPL history.

• Purpose: Highlights the most successful teams in the league.

11
• Description: Visualizes the distribution of runs scored across different overs in a match.

• Purpose: Provides insights into scoring patterns and key phases of the game\

Description: visualization highlights the players who have scored the most runs in the IPL. It
provides insights into the most consistent and impactful batsmen in the league.

Purpose: highlight the most consistent and impactful batsmen in the IPL. It provides insights
into players who have contributed significantly to their teams' success by scoring the highest
number of runs

12
• Description: Highlights the top 5 bowlers based on their total wickets taken.

• Purpose: Useful for analyzing the most effective bowlers in the tournament

• Description: Displays the top 5 batsmen based on their total runs scored in the IPL.

• Purpose: Helps identify the most consistent and high-performing batsmen in the league.

13
13. Source Code
• Folder structure:
ipl_2025_project/
│
├── analysis/
│ ├── analysis.py
│ └── deliveries.csv
│
├── website/
│ ├── index.html
│ ├── script.js
│ ├── style.css
│ │
│ └── images/
│ ├── best_ team.png
│ ├── best_teams.png
│ ├── economy_rate.png
│ ├── partnership.png
│ ├── partnership2.png
│ ├── runs_per_over.png
│ ├── top_batsmen.png
│ ├── top_bowlers.png
│ └── top_players.png
└── README.md

14
• Source code:
Analysis.py:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os

# Set the working directory to the script's location

os.chdir(os.path.dirname(__file__))

# Ensure the images directory exists

images_dir = '../website/images/'
os.makedirs(images_dir, exist_ok=True)

# Use the absolute path for the dataset file

file_path = os.path.join(os.path.dirname(__file__), 'deliveries.csv')

# Debugging: Print the resolved file path

print(f"Resolved file path for deliveries.csv: {file_path}")

if not os.path.exists(file_path):
raise FileNotFoundError("The dataset file 'deliveries.csv' is missing.")

# Define column names explicitly

15
column_names = [
'match_id', 'date', 'stage', 'venue', 'team1', 'team2', 'innings', 'over_ball',
'batsman', 'bowler', 'batsman_runs', 'extras', 'wides', 'noballs', 'byes', 'legbyes',
'dismissal_kind', 'player_dismissed', 'fielder'
]
data = pd.read_csv(file_path, names=column_names, header=None)

# Ensure 'batsman_runs' is numeric

data['batsman_runs'] = pd.to_numeric(data['batsman_runs'], errors='coerce')

# Top 5 Batsmen
batsmen =
data.groupby('batsman')['batsman_runs'].sum().sort_values(ascending=False).head(5)
if batsmen.empty:
raise ValueError("No data available to plot for top batsmen.")

batsmen.plot(kind='bar', color='skyblue')
plt.title('Top 5 Batsmen')
plt.ylabel('Total Runs')
plt.xlabel('Batsman')
plt.tight_layout()
plt.savefig(os.path.join(images_dir, 'top_batsmen.png'))
plt.close()

# Top 5 Bowlers

16
bowlers =
data.groupby('bowler')['batsman_runs'].sum().sort_values(ascending=True).head(5)
bowlers.plot(kind='bar', color='orange')
plt.title('Top 5 Bowlers')
plt.ylabel('Runs Conceded')
plt.xlabel('Bowler')
plt.tight_layout()
plt.savefig(os.path.join(images_dir, 'top_bowlers.png'))
plt.close()

# Economy Rate
economy = data.groupby('bowler').agg({'batsman_runs': 'sum', 'over_ball': 'count'})
economy['economy_rate'] = economy['batsman_runs'] / (economy['over_ball'] / 6)
economy = economy.sort_values(by='economy_rate').head(5)
economy['economy_rate'].plot(kind='bar', color='green')
plt.title('Top 5 Economy Rate Bowlers')
plt.ylabel('Economy Rate')
plt.xlabel('Bowler')
plt.tight_layout()
plt.savefig(os.path.join(images_dir, 'economy_rate.png'))
plt.close()

# Partnership Analysis Heatmap Data Preparation

partnership_data = data.groupby(['batsman',
'bowler'])['batsman_runs'].sum().unstack(fill_value=0)
partnership_data.to_csv('../website/images/partnership_heatmap.csv')

17
# Partnership Analysis Heatmap Visualization
plt.figure(figsize=(10, 8))
sns.heatmap(partnership_data, annot=False, cmap='viridis', cbar=True)
plt.title('Partnership Analysis Heatmap')
plt.xlabel('Batsmen')
plt.ylabel('Partners')
plt.tight_layout()
plt.savefig(os.path.join(images_dir, 'partnership_heatmap.png'))
plt.close()
Follow this link for the remaining source code: https://github.com/Dharani7704/NM-
-TATA-IPL-Analysis-2025

14. Future Scope

1. Integration with Real-Time Data Pipelines

Implementing real-time data feeds using APIs (e.g., live match stats from IPL)
can enable up-to-the-minute analysis and help in dynamic decision-making during
matches.

2. Advanced Visualization & Automation Tools

Upgrading the dashboard with D3.js or automating reports using Power BI/Plotly
Dash would provide more interactive and professional-grade insights for
stakeholders.

3. Incorporating Sentiment Analysis

Using NLP techniques to analyze fan sentiment from social media or match
reviews (e.g., tweets about player performance) can enrich the analysis with
public perception metrics.
18
4. Linking Analytics to Strategic Systems
Connect performance-based insights with CRM tools or marketing platforms
(e.g., personalized fan engagement, ticket sales optimization) to drive business
impact

15. Team Members and Roles

Dharani. M:
Role: Project Lead, Data Analyst
Responsibilities: Overall project coordination, data collection, data cleaning, and
analysis. Developed key visualizations and interpreted the results.

Hemavathi. S:
Role: Data Scientist
Responsibilities: Data preprocessing, feature engineering, and statistical analysis.
Created various performance metrics and provided insights on team strategies.

Nithya. S:
Role: Frontend Developer
Responsibilities: Designed and developed the interactive HTML dashboard to showcase
visualizations. Ensured smooth integration of charts and user interface.

Jayapriya. R
Role: Research and Documentation Specialist
Responsibilities: Researched IPL 2025 trends, contributed to project methodology, and
prepared the final documentation and report

IPL Cricket Data Analysis Guide
No ratings yet
IPL Cricket Data Analysis Guide
13 pages
Group Number: 9: Teams and Players Analysis From 12 IPL Seasons
No ratings yet
Group Number: 9: Teams and Players Analysis From 12 IPL Seasons
2 pages
IPL Performance Dashboard Project
No ratings yet
IPL Performance Dashboard Project
3 pages
Ip Project
No ratings yet
Ip Project
20 pages
Complete IPL Data Analysis Guide - Master Your Inte
No ratings yet
Complete IPL Data Analysis Guide - Master Your Inte
7 pages
Advanced IPL Match Analysis Using Python (Advanced)
No ratings yet
Advanced IPL Match Analysis Using Python (Advanced)
4 pages
IPL Data Analysis for Team Strategy
No ratings yet
IPL Data Analysis for Team Strategy
7 pages
ProjectReport
No ratings yet
ProjectReport
15 pages
CapstoneSynopsis A
No ratings yet
CapstoneSynopsis A
6 pages
Chapter 1 To 3 (1) CDGHH
No ratings yet
Chapter 1 To 3 (1) CDGHH
7 pages
IPL T20 Cricket Analysis Shallshkagksgsohssgsigsgslhsagsjsgsjgsjsh
No ratings yet
IPL T20 Cricket Analysis Shallshkagksgsohssgsigsgslhsagsjsgsjgsjsh
37 pages
Capstone Notes-1
No ratings yet
Capstone Notes-1
18 pages
Performance Analysis of A Cricketer by Data Visualization
No ratings yet
Performance Analysis of A Cricketer by Data Visualization
10 pages
MaTHS INVESTIGTORY Project
No ratings yet
MaTHS INVESTIGTORY Project
15 pages
Py Report
No ratings yet
Py Report
13 pages
IPL Auction Analysis For Player Selection Based
No ratings yet
IPL Auction Analysis For Player Selection Based
9 pages
45-Day IPL 2025 Data Analysis Project Roadmap
No ratings yet
45-Day IPL 2025 Data Analysis Project Roadmap
17 pages
IPL Data Analysis for CS Students
No ratings yet
IPL Data Analysis for CS Students
27 pages
This Is The File Give To Us, and Here Are The Task
No ratings yet
This Is The File Give To Us, and Here Are The Task
6 pages
Davp Ipl Jyoti RN
No ratings yet
Davp Ipl Jyoti RN
27 pages
Eda On Ipl Spark
No ratings yet
Eda On Ipl Spark
2 pages
IPL Auction Player Analysis 2025
No ratings yet
IPL Auction Player Analysis 2025
25 pages
Cricket 1 Prediction
No ratings yet
Cricket 1 Prediction
11 pages
Sample Proposal
No ratings yet
Sample Proposal
3 pages
IPL Data Analysis for Students
No ratings yet
IPL Data Analysis for Students
3 pages
Beyong Boundaries Group 1 DV Project IIM Kashipur-1
No ratings yet
Beyong Boundaries Group 1 DV Project IIM Kashipur-1
22 pages
Exp9 Ros Final
No ratings yet
Exp9 Ros Final
5 pages
Bhakti Rao BIThesis
No ratings yet
Bhakti Rao BIThesis
20 pages
Python Final El
No ratings yet
Python Final El
11 pages
Data Analysis PPT New
No ratings yet
Data Analysis PPT New
20 pages
IPL Data Analysis
100% (1)
IPL Data Analysis
26 pages
Intracollege Datathon 2.0 - Case
No ratings yet
Intracollege Datathon 2.0 - Case
5 pages
MB645 Task 2
No ratings yet
MB645 Task 2
2 pages
IPL Visualization Assignment - A
No ratings yet
IPL Visualization Assignment - A
10 pages
IPL Match Prediction Using ML
No ratings yet
IPL Match Prediction Using ML
7 pages
IPL Team Selection Case Study
No ratings yet
IPL Team Selection Case Study
4 pages
Back2Back Brain Dead 2k25
No ratings yet
Back2Back Brain Dead 2k25
37 pages
Cricket Team Analysing
No ratings yet
Cricket Team Analysing
44 pages
IPL Data Anlysis
No ratings yet
IPL Data Anlysis
10 pages
PES UNIVERSITY, Bangalore UE18CS203 B.Tech, Sem III Session: Aug-Dec, 2019 Ue18Cs203 - Introduction To Data Science
No ratings yet
PES UNIVERSITY, Bangalore UE18CS203 B.Tech, Sem III Session: Aug-Dec, 2019 Ue18Cs203 - Introduction To Data Science
4 pages
Cricket Prediction ML
No ratings yet
Cricket Prediction ML
15 pages
Comprehensive Data Analysis and Prediction On IPL Using Machine Learning Algorithms Valarmathi B 2113j1
No ratings yet
Comprehensive Data Analysis and Prediction On IPL Using Machine Learning Algorithms Valarmathi B 2113j1
11 pages
Project New
No ratings yet
Project New
13 pages
Python Final El
No ratings yet
Python Final El
11 pages
ICC Cricket World Cup 2023 EDA
No ratings yet
ICC Cricket World Cup 2023 EDA
16 pages
IPL Analytics: Winning Strategies & Metrics
No ratings yet
IPL Analytics: Winning Strategies & Metrics
7 pages
KUNJ1
No ratings yet
KUNJ1
17 pages
K Sunil
No ratings yet
K Sunil
21 pages
IPL Datasets Visualization
No ratings yet
IPL Datasets Visualization
8 pages
Ads Exp1 Kaushal-Patel 25
No ratings yet
Ads Exp1 Kaushal-Patel 25
6 pages
Tech Saksham: Data Analytics With Power BI
No ratings yet
Tech Saksham: Data Analytics With Power BI
18 pages
Final Ipl Project 1
100% (1)
Final Ipl Project 1
37 pages
5sem - MP - Synopsis Miniproject
No ratings yet
5sem - MP - Synopsis Miniproject
4 pages
Excel Project
No ratings yet
Excel Project
1 page
Is MS Dhoni Ready for IPL 2022?
No ratings yet
Is MS Dhoni Ready for IPL 2022?
24 pages
Report Mini Project
No ratings yet
Report Mini Project
25 pages
Project Report
No ratings yet
Project Report
16 pages
Project ReportBDA
No ratings yet
Project ReportBDA
14 pages
Half Million Secrets
No ratings yet
Half Million Secrets
2 pages
Social Science Club Game: History or Tsismis
No ratings yet
Social Science Club Game: History or Tsismis
4 pages
Music of Southeast Asian: Lesson
No ratings yet
Music of Southeast Asian: Lesson
22 pages
Introduction To Ms-Excel: Spreadsheet Data Pivot Tables Visual Basic For Applications
No ratings yet
Introduction To Ms-Excel: Spreadsheet Data Pivot Tables Visual Basic For Applications
11 pages
G1 Customs of The Tagalogs Bsa I C
100% (1)
G1 Customs of The Tagalogs Bsa I C
16 pages
Internship Report Anguraj
No ratings yet
Internship Report Anguraj
35 pages
2024-03-06
No ratings yet
2024-03-06
17 pages
Algebra Unit Test
No ratings yet
Algebra Unit Test
2 pages
Frequency Dividers: Device Modelling Mini-Project
No ratings yet
Frequency Dividers: Device Modelling Mini-Project
24 pages
Searching & Sorting Introduction To Sorting
No ratings yet
Searching & Sorting Introduction To Sorting
8 pages
Divine Liturgy: Living the Eucharistic Life
No ratings yet
Divine Liturgy: Living the Eucharistic Life
32 pages
DSPC Memo 2024
No ratings yet
DSPC Memo 2024
52 pages
Keihlasan Dan Arti Pentingnya Dalam Mengelola Pendidikan
No ratings yet
Keihlasan Dan Arti Pentingnya Dalam Mengelola Pendidikan
18 pages
Imc 2022 Day 2 Solutions
No ratings yet
Imc 2022 Day 2 Solutions
5 pages
SAP C_TS462_2022 Exam Q&A Demo
No ratings yet
SAP C_TS462_2022 Exam Q&A Demo
5 pages
Arduino Buzzer Programming Guide
No ratings yet
Arduino Buzzer Programming Guide
6 pages
Inverse Trigonometric Function: Multiple Choice Questions
100% (1)
Inverse Trigonometric Function: Multiple Choice Questions
6 pages
Sending Salat Upon The Prophet
No ratings yet
Sending Salat Upon The Prophet
20 pages
Java OOP Lab: Animals & Fibonacci
No ratings yet
Java OOP Lab: Animals & Fibonacci
6 pages
Coal-Assignment 3
No ratings yet
Coal-Assignment 3
11 pages
GR 11 Geo Research Task Loadshedding 2025 (1) - Edited
100% (3)
GR 11 Geo Research Task Loadshedding 2025 (1) - Edited
16 pages
Chinese Homework Answers
100% (1)
Chinese Homework Answers
5 pages
Technical Note: Operating A Movidrive B Using Two DIO11B Option Cards
No ratings yet
Technical Note: Operating A Movidrive B Using Two DIO11B Option Cards
7 pages
XML External Entity XXE Attack 1704716540
No ratings yet
XML External Entity XXE Attack 1704716540
19 pages
Summative Third-Quarter-Exam-in-Reading-and-Writing
100% (9)
Summative Third-Quarter-Exam-in-Reading-and-Writing
4 pages
Romila Thapar - The Past Before Us - Historical Traditions of Early North India-Harvard University Press (2013) - 2 PDF
86% (7)
Romila Thapar - The Past Before Us - Historical Traditions of Early North India-Harvard University Press (2013) - 2 PDF
778 pages
Toeic Speaking: Part 1: Questions 1-2
No ratings yet
Toeic Speaking: Part 1: Questions 1-2
21 pages
Deacons Mass
No ratings yet
Deacons Mass
8 pages
The Victorian Actress in The Novel and On The Stage Renata Kobetts Miller PDF Download
100% (1)
The Victorian Actress in The Novel and On The Stage Renata Kobetts Miller PDF Download
65 pages
English
No ratings yet
English
21 pages

DA Phase 3 Dharani

Uploaded by

DA Phase 3 Dharani

Uploaded by

Phase-3 Submission Template – Data Analytics

Student Name: Dharani. M

• To generate actionable insights about top-performing batsmen, bowlers, and

• To visualize key metrics using charts and graphs in a professional, website-style

II. Expected Outputs:

o Top Batsmen based on total runs scored.

o Top Bowlers based on wickets taken and economy rate.

o Best Performing Teams based on match-level aggregations.

o Bar charts, pie charts, and other graphical summaries.

o Team-wise performance dashboards using HTML/CSS/JS.

• Cleaned and transformed dataset ready for advanced analytics.

• Assists sponsors and advertisers in identifying star performers and high-impact

• Enables fans and fantasy league participants to base decisions on real

5. Project Workflow (Flowchart)

➢ Dataset Name and Source

Name: IPL 2025 Deliveries Dataset

Source: Publicly available on Kaggle

Format: CSV (Comma Separated Values)

➢ Size and Structure

Number of Rows: Approximately 22,000+ rows

Number of Columns: 18 columns

➢ Nature of the Dataset

• Loading the Dataset

● Loaded the dataset using pandas.read_csv().

● Verified the structure with .info() and .describe().

• Handling Missing Values

● Checked for missing/null values using .isnull().sum().

● No critical missing values affecting key metrics like batsman_runs, total_runs, or

● Used .duplicated() .sum() and .drop_duplicates() to remove redundant rows, ensuring

● Computed new metrics:

● Top Batsmen (by total runs)

● Top Bowlers (by total wickets)

● Economy Rate (runs conceded per over)

● Partnership Analysis (runs per pair of batsmen)

8. Exploratory Data Analysis (EDA)

We analyzed the relationships between multiple variables to uncover deeper patterns:

• Top Batsmen (Runs Scored)

o Grouped the data by batter and aggregated batsman_runs.

o Visualized using horizontal bar plot showing top 10 batsmen.

• Top Bowlers (Wickets Taken)

o Filtered is WicketDelivery == 1, grouped by bowler.

o Sorted and visualized the top wicket-takers.

• Economy Rate of Bowlers

o Calculated economy rate = total runs conceded / (balls bowled / 6).

Bar chart showed bowlers with best economy under pressure

9. Insights and Interpretation

• Top 5 batsmen contributed 48% of total team runs, indicating a high

• Purpose: Highlights the most successful teams in the league.

• Purpose: Highlights the most successful teams in the league.

# Set the working directory to the script's location

# Ensure the images directory exists

# Use the absolute path for the dataset file

# Debugging: Print the resolved file path

# Define column names explicitly

# Ensure 'batsman_runs' is numeric

# Partnership Analysis Heatmap Data Preparation

14. Future Scope

1. Integration with Real-Time Data Pipelines

2. Advanced Visualization & Automation Tools

3. Incorporating Sentiment Analysis

15. Team Members and Roles

You might also like