IPL Data Analysis and Visualization for Team Selection and Profit Strategy


Saranya G
Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Chennai campus, Chennai, India
g_saranya@ch.amrita.edu

Aravind Swaminathan
MSc student Data Science, Middlesex University, Dubai
aravindsswaami@gmail.com

Joel Benjamin J
MSc student Data Science, Middlesex University, Dubai
joelbenjamin1990@gmail.com

Surendran R
Department of Computer Science and Engineering, Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences, Chennai, India
surendran.phd.it@gmail.com

Leema Nelson
Department of Computer Science and Engineering, Chitkara University Institute of Engineering & Technology, Chitkara University, Punjab, India
leema.nelson@chitkara.edu.in

Abstract- Day by day, the role of data science and machine learning in cricket is increasing because of the large amount of data generated by even a single player. Data science is the intensive study of data to extract insights and knowledge and to apply them as actionable decisions. We use the available data and statistics to predict quantities such as a team's first-innings score and the probability of the second team winning. In this paper, we work on Indian Premier League (IPL) Data Analysis 2022 and data visualization for the period 2008-2020 using Python. The application consists of preprocessing, data analysis, and visualization modules, followed by a model that predicts the team's overall score and the probability of winning. When building the models, we use Python libraries such as NumPy for scientific computing, Pandas for data analysis, and Matplotlib and Seaborn for data visualization. Finally, this paper helps the trainers and owners of a team select emerging players at the auction who can win matches across the season and win the trophy, and it supports the owners' profit through the profit strategy derived in this analysis report.

Keywords: Data Analysis, Data Cleaning, Indian Premier League, Prediction, Pandas, Numpy, Matplotlib, Seaborn.

I. INTRODUCTION
Python is widely used in artificial intelligence work, where real-world problem statements can be solved. The method in this paper is implemented purely in Python, using libraries such as NumPy, Pandas, Matplotlib, and Seaborn, and it depends on learning from previous data and predicting the result accordingly. Python methods benefit from the use of knowledge acquisition and mathematical models.

Today, interest in cricket is growing rapidly, with many people focusing on data analysis and data prediction through various digital technologies. Predicting and analyzing Indian Premier League (IPL) data through Python plays a key role in the selection of players. The players are chosen based on various factors.

The cricket board and the mentor decide the team selection for the Indian Premier League, and the captain also plays a major role in choosing the squad, based on the average scores of the team's players against the opposing team's players in previously played matches. This paper therefore assesses the success of individual players using average figures from the matches played before. The mentor decides on the best batting and bowling performances and on the analysis of all-rounder performances. Finally, based on these analyses, 15 players are selected for the Indian Premier League.

The prediction and analysis in this research project are carried out through Python algorithms. These algorithms forecast a player's average score efficiently. The results of the study show that the prediction for the entire team is convincing and realistic.

II. EXISTING SYSTEM
In the existing technique, there is an algorithm to compute the projected score and a winning predictor based on the win percentage of a squad and ballots. These strategies do not give precise outcomes because they rest on projections and perceptions drawn from a single instance. Consider computing the projected score using the traditional algorithm [1]: Projected score = Current Run Rate * Overs remaining in an Innings [2].
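As a minimal sketch of the traditional projection quoted above, the snippet below applies the formula literally. The function name and the match figures are hypothetical. The formula as stated yields the runs expected from the remaining overs only; adding the runs already scored to obtain an innings total is an assumption made here and is not stated in the source.

```python
def projected_remaining_runs(current_run_rate: float, overs_remaining: float) -> float:
    """Traditional projection as quoted: current run rate * overs remaining."""
    return current_run_rate * overs_remaining


# Hypothetical match state: 80 runs after 10 of 20 overs.
runs_scored = 80
overs_bowled = 10
overs_remaining = 20 - overs_bowled

current_run_rate = runs_scored / overs_bowled  # runs per over so far
remaining = projected_remaining_runs(current_run_rate, overs_remaining)

# Assumption (not in the source): an innings total adds the runs already scored.
projected_total = runs_scored + remaining

print(remaining, projected_total)
```

Because the projection extrapolates a single run rate, it ignores wickets in hand and the opposition, which is consistent with the low accuracy reported for this approach.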
The accuracy obtained by following the above algorithm is quite low. Trainers can include good all-rounders to improve squad outcomes [3]. Several pictorial illustrations and visual methods were employed to analyze the batting and bowling performances of the cricketers [4]. For the computation and interpretation of the charts, the work examined the batsmen's and bowlers' history from the 2008 season [5]. Twelve bowlers and twelve batsmen were nominated: bowlers who bowled at least 15 overs and took at least 5 wickets, and batsmen who faced at least 20 overs and had at least 5 completed innings [6].

The analysis shows that Indian bowlers performed well, with Indians ranking among the top ten bowlers in all four seasons (2009, 2011, 2015, and 2019) [7]. An algorithm-based approach is used to group the players by position and grade the players' performance [8]. Based on performance, players from the 2008 IPL season were selected for ranking [9]. Based on their bowling and batting performances, players were divided into groups [10]. The performance of individual players from each team has been assessed [11]. Python algorithms were used to simulate batsmen's and bowlers' performance based on past and current career data [12]. One experiment uses bowling average, strike rate, and economy, which together are referred to as the Combined Bowling Rate [13].

A statistical model has been developed to estimate a player's value by taking into account various statistics of batsmen, bowlers, and all-rounders [14]. Other work evaluates bowlers' performance [15]. A systematic analytical regression model was attempted to select better auction participants [16]. In another work, a multiple-criteria decision-making evolutionary method was used to optimise a squad's batting and bowling stability and to find team-mates, with an algorithm used to evaluate each player's performance [17].

III. PROPOSED SYSTEM
In our proposed system, we use Python libraries such as Pandas, NumPy, Matplotlib, and Seaborn for data analysis and data visualization. First, we clean the raw, inconsistent match data into a usable form and eliminate redundancies from the dataset, such as incorrect team names and teams recorded with different spellings, through data cleaning. After that, we use the above-mentioned libraries to explore various aspects of the matches and derive statistics from which useful and interesting outcomes can be analyzed.

A. Tools and Methodology
The Indian Premier League is one of the most popular T20 leagues in the world, with millions of fans worldwide. From 2008 to 2020, approximately 816 matches were played. There is a massive amount of data that includes ball-by-ball detail for each team played, match innings, the date the match occurred, venue, toss, overs, wickets, boundaries, extras, winner, umpires with match location, and all other necessary details. Jupyter, an open-source scientific Python environment that is integrated with Visual Studio Code (VSC) version 1.64.0, was used to conduct data analysis and plotting for visualisation. Jupyter gives access to a large collection of scientific packages for in-depth analysis, as well as the ability to perform data exploration. Jupyter is used to perform Exploratory Data Analysis and Visualization using NumPy, Pandas, Matplotlib, and Seaborn. These Python library packages aid in basic and advanced visualisation. This paper proposes a system development approach that employs NumPy and Pandas for data analysis, as well as Seaborn and Matplotlib for visualising player performance.

B. Data Collection
The exploratory data has been gathered from website sources such as www.kaggle.com and www.data.world/datasets/ipl. The dataset has information pertaining to all matches played from 2008 to 2020. It has 17 attributes and 816 entries. Each row corresponds to the information of a unique match. The main attributes are the city, date, venue, best performer of the match, teams played, toss winner, decision made on winning the toss, the team which won the match, final result, super, and the umpires' details of the match.

C. Importing IPL Dataset
The dataset is held in spreadsheet form and contains the matches played from 2008 to 2020. It is shown in Fig. 1. Pandas is a Python software library for data manipulation and data analysis; it is free software released under the three-clause BSD license. We used the Pandas library to import the details of the matches played from 2008 to 2020 into the Jupyter platform under Visual Studio Code (VSC) for analysis, and to print the loaded data using the “head” function. Pandas uses the “read_csv” command together with the path of the dataset, inside the Jupyter platform that is connected to a Jupyter server and used from the VSC platform.

Fig. 1. Importing IPL Data from Raw Data Set
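The import step described in Section C can be sketched as follows. This is only an illustration under stated assumptions: the inline CSV text, its column names, and the file name `matches.csv` mentioned in the comment are hypothetical stand-ins for the real dataset, which is not reproduced here.

```python
import io

import pandas as pd

# Hypothetical three-row sample standing in for the real matches file.
# With the real data one would pass its path, e.g. pd.read_csv("matches.csv").
sample_csv = io.StringIO(
    "id,city,date,venue,toss_winner,toss_decision,winner\n"
    "1,City A,2008-04-18,Ground A,Team A,field,Team B\n"
    "2,City B,2008-04-19,Ground B,Team C,bat,Team C\n"
    "3,City C,2008-04-19,Ground C,Team D,bat,Team E\n"
)

match_data = pd.read_csv(sample_csv)

# head() returns the first rows (five by default) for a quick look at the data.
print(match_data.head())

# shape reports (number of rows, number of columns).
print(match_data.shape)
```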
D. Checking Dataset for NULL Values
Before starting the IPL data analysis, we check the columns, rows, and NULL values present in the dataset so that the NULL values can be removed before the analysis proceeds. For this check, we use the Pandas expression “match_data.isnull().sum()”, where “isnull” flags each NULL value in a column and “sum” adds up the flags to give the total number of NULL values in each column. It is shown in Fig. 2.
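A minimal sketch of the NULL check, assuming a small hypothetical table in place of the real match data. The column names and values are placeholders, and the final `dropna` call is one possible follow-up rather than the paper's stated choice.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the match table with a few missing entries.
match_data = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "city": ["City A", None, "City B", "City C"],
    "winner": ["Team A", "Team B", None, None],
    "method": [np.nan, np.nan, np.nan, "D/L"],
})

# isnull() marks each missing cell as True; sum() counts the True values per column.
null_counts = match_data.isnull().sum()
print(null_counts)

# One possible follow-up (an assumption): drop rows that have no recorded winner.
cleaned = match_data.dropna(subset=["winner"])
print(len(cleaned))
```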

Fig. 2. Checking NULL Values from the Data

IV. ANALYSIS AND VISUALIZATION
With a basic understanding of the attributes present in the dataset, we start analyzing the IPL dataset and visualizing it in forms such as bar charts, pie charts, and line graphs. We can also extract particular data from the dataset and create a new dataset with only the columns and rows we require.

1. List of Extracted Columns
In the given data set, we use the Pandas “match_data.columns” attribute to list the column names of the entire data set; they are displayed with the object data type [18]. The data set is collected from https://www.kaggle.com/rishpande/indian-premier-league-ipl-data-visualization.

2. Extraction of Season Based on Date Column
For this, we create a new column named “Season” in the data set. We use “DatetimeIndex”, which is part of Pandas, to separate the year from the date column. We extract the year from the date column and name it Season because we use it later for visualization. It is shown in Fig. 3.

Fig. 3. Extract Seasons based on Date

3. Analyzing the Total Matches Played in Each Particular Season
Here we create a new variable named “match_season_data”, where we use the Pandas “groupby” function to group the rows by the Season column and select the unique “id” given to each row of the data set. We then use the “count” function to count the total number of matches played in every season from 2008 to 2020. After this, we rename the “id” column to “total count”, which is displayed next to the Season column. It is shown in Fig. 4.

Fig. 4. Analyzing Total Matches Played
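The season extraction and the per-season match count can be sketched together, since the second step consumes the column created by the first. The five rows below are hypothetical; only the `id` and `date` columns matter here.

```python
import pandas as pd

# Hypothetical match rows; only "id" and "date" matter for this step.
match_data = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "date": ["2008-04-18", "2008-04-19", "2009-04-18", "2009-04-19", "2009-04-20"],
})

# Pull the year out of the date column and store it as "Season".
match_data["Season"] = pd.DatetimeIndex(match_data["date"]).year

# Count the match ids in each season, then rename the count column.
match_season_data = (
    match_data.groupby("Season")["id"]
    .count()
    .reset_index()
    .rename(columns={"id": "total count"})
)
print(match_season_data)
```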
4. Merging Match Data Set with the Ball Data Set using Left Join
In this process, we first create a new variable named “season_data” and merge the match data set with the ball data set. The “id” and “Season” columns from the match data are merged with the ball data by assigning the match data's “id” as the left key and the ball data's “id” as the right key, and the merge is done using a “Left Join”. Then the Pandas “season_data.head” function is used to print the outcome. It is shown in Fig. 5.

Fig. 5. Merge Ball Data with Season Data

5. Line Graph Visualization for the Total Number of Runs Scored in the Entire IPL Season
Here we use the season data, which is the ball data updated by the merge above. Initially, we calculate the total number of runs scored per season using the “sum” function. Then we set a variable “p” that holds this data indexed by “Season”. The “ax” variable is assigned the axes on which the line graph is plotted using the Matplotlib library. We apply some styling, such as setting the face color of the graph, using the “ax.set” command. We then use the Seaborn “sns.lineplot” function to draw a line graph of the data held in “p”, with the palette set to “magma”. Then we use the Matplotlib “plt.title” function to set the title of the line graph, along with its font size and font weight. At last, we use the Matplotlib “plt.show()” function to display the line graph. It is shown in Fig. 6.

Fig. 6. Total Runs Scored in Entire Season

6. Bar Graph Visualization for the Total Matches Played in IPL Seasons
In this, we use the Seaborn “sns.countplot” function to visualize the “Season” column from the match data set. Then we use the Matplotlib “plt.xticks” and “plt.yticks” functions to set attributes such as the rotation and font size of the tick labels on the x-axis and y-axis of the bar graph. We also use the “plt.xlabel” and “plt.ylabel” Matplotlib functions to name the x-axis “Season” and the y-axis “Count” respectively. Finally, we use the “plt.title” function to assign the title “Total Matches played in a season” to the bar graph. It is shown in Fig. 7.

Fig. 7. Total Matches Played in Entire Season

7. Bar Plot Showing the Number of Toss Won by Each Team
We analyze the match data, extract the “toss win” column, and use the “value_counts()” function to get the total number of tosses won by each team. We follow the same procedure that we implemented for the bar graph visualization. Along with this, we use the “sns.barplot” function to draw the bars and to assign color settings such as the palette and saturation. It is shown in Fig. 8.
Fig. 8. Total No. of Toss won by Each Team

8. Extraction of a Particular Data from Data Sheet
In this, we use the Pandas library to pull a particular set of data from the complete data set. First, we create a variable named “player”, which holds a selection from the ball data set, for instance player = (ball_data['batsman'] == 'SK Raina'). In this command, a particular batsman is picked from the entire data set. We then assign a new variable named “df_raina” to hold that player's details for analysis and visualization. It is shown in Fig. 9.

Fig. 9. Extraction of Particular Data

9. Visualization of Dismissal Kind in the form of Pie Chart
In this step, we extract the specific column “dismissal_kind” from the variable “df_raina”. After this, we count the occurrences of each dismissal kind in that column across the entire IPL season data. Then we convert the counts into percentage values so that they can be used for the pie chart in Fig. 10.

Fig. 10. Dismissal Kind of Data

10. Data Extraction for the Runs Scored by a Specific Player
In this process, we first define a Python function “count(df_raina, runs)” using “def”. This function extracts a detailed breakdown of the runs scored by a specific player over the whole period from 2008 to 2020. We use the “print” statement to output the breakdown in detail, such as the number of singles, doubles, threes, fours, fives, and sixes scored by that player, using the “df_raina” data set. It is shown in Fig. 10.

Exploratory data evaluation to envisage the top 10 run scorers in the IPL season follows the same procedure that we used above to extract the data. The extracted data is then visualized in the form of a bar chart. Along with this, we use the “sns.barplot” function to draw the bars and to assign settings such as palette, saturation, rotation, font size, and font weight. It is shown in Fig. 11.
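The player-level steps in this part of the analysis can be sketched as below. The rows, the placeholder names `Player X` and `Player Y`, and the column name `batsman_runs` are hypothetical. The body of `count` is one plausible reading of the description (runs contributed by a given scoring shot) and is an assumption, not the authors' confirmed code.

```python
import pandas as pd

# Hypothetical ball-by-ball rows; names and values are placeholders.
ball_data = pd.DataFrame({
    "batsman": ["Player X", "Player X", "Player X", "Player Y", "Player X", "Player Y"],
    "batsman_runs": [1, 4, 6, 2, 4, 6],
    "dismissal_kind": [None, None, "caught", "bowled", "caught", None],
})

# Step 8: boolean mask selecting one batsman, then the matching rows.
player = ball_data["batsman"] == "Player X"
df_player = ball_data[player]

# Step 9: share of each dismissal kind, as percentages suitable for a pie chart.
dismissal_pct = df_player["dismissal_kind"].value_counts(normalize=True) * 100


# Step 10 (assumed body): runs that came from a given scoring shot, e.g. all fours.
def count(df, runs):
    return len(df[df["batsman_runs"] == runs]) * runs


print("Runs from 4s:", count(df_player, 4))
print("Runs from 6s:", count(df_player, 6))

# Top run scorers: total runs per batsman, highest first, at most 10 rows.
runs = ball_data.groupby("batsman")["batsman_runs"].sum().reset_index()
runs.columns = ["Batsman", "runs"]
top_scorers = runs.sort_values(by="runs", ascending=False).head(10)
print(top_scorers)
```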

Fig. 11. Runs Scored by a Specific Player

In this step, we create a variable named “runs” and import the data from the ball data set. Once the data is imported, we use the “groupby” function to group the deliveries by batsman and sum the total runs scored by each. We assign the column names through “runs.columns”. Finally, we use the “runs.sort_values” function to sort the batsmen by their total runs. We specify that the sort is by the runs column and in descending order, so that we get the top 10 batsmen who scored the most runs in the entire IPL season. It is shown in Fig. 12.

Fig. 12. Envisage of Top 10 Runs Scorer

Envisage the Top 10 Run Scorers in the IPL Season in the form of Bar Chart. We analyze the batsman data, extract the “runs” and “batsman” columns, and plot on the “ax” axes the total runs scored by each batsman. It is shown in Fig. 13.

Fig. 13. Top Run Scorer in the Season

Visualization of the Highest Number of MOM Award Winners in the form of Bar Chart. In this process, we analyse and extract the “playerofthematch” column and count the totals across the entire season. After this, we visualize it as a bar graph with the highest number of “player of the match” awards displayed first, continuing in decreasing order. We give the bar graph the title “Highest MOM award winner”, and the x and y axes are named “Players” and “Count” respectively. It is shown in Fig. 14.

Fig. 14. Highest MOM Award Winners

V. CONCLUSION
This report investigates the results of IPL matches from 2008 to 2020 using Python algorithms on both balanced and imbalanced datasets. The benchmark used to examine match results was constructed successfully, with a precision rate of 94% on the balanced dataset using the classifiers, obtained after resampling the imbalanced IPL dataset. This report emphasises player performance, particularly batsmen, and covers the analysis of the maximum number of Man of the Match awards, leading batsmen, and the top 10 performers by most runs. Statistics from approximately 816 matches were used in this investigation, as well as toss-related breakdowns such as the total number of toss wins, the decisions made by each squad after winning the toss, and the toss decisions made by each squad throughout the season.

REFERENCES
[1] Y. Kumar, H. Sharma and R. Pal, “Popularity Measuring
and Prediction Mining of IPL Team Using Machine
Learning,” 2021 9th International Conference on
Reliability, Infocom Technologies and Optimization
(Trends and Future Directions) (ICRITO), Noida, India, pp.
1-5, 2021.
[2] V. Kanungo, and T. Bomatpalli, “Data visualization and
toss related analysis of IPL teams and batsmen
performances,” International Journal of Electrical and
Computer Engineering, vol. 9, no. 5, 4423, 2019.
[3] S. Agrawal, S. P. Singh, and J. K. Sharma, “Predicting
Results of Indian Premier League T-20 matches using
machine learning,” 8th International Conference on
Communication Systems and Network Technologies
(CSNT), IEEE, 2018.
[4] A. C. Kaluarachchi, and A. S. Varde, “CricAI: A
classification based tool to predict the outcome in ODI
cricket,” Conference: IEEE Information and Automation for
Sustainability (ICIAFs), 2010.
[5] T. Tamilvizhi, B. P. Varthini and R. Surendran, “An
improved solution for resource management based on
elastic cloud balancing and job shop scheduling,” ARPN
Journal of Engineering and Applied Sciences, vol. 10, no.
18, 2015.
[6] T. Vignesh, K. K. Thyagharajan, R. B. Jeyavathana, and R.
P. Kumar “Land use and land cover classification using
landsat-8 multispectral remote sensing images and long
short-term memory-recurrent neural network”, AIP
Conference Proceedings, no. 2452, 070001, 2022.
[7] C. Deep Prakash, C. Patvardhan, and S. Singh, “A new
deep performance index based on machine learning for
ranking IPL T20 cricketers,” International Journal of
Computer Applications, vol. 137, no. 10, pp. 42-49, 2016.
[8] R. Surendran and T. Tamilvizhi, “How to improve the
resource utilization in cloud data center?,” 2018
International Conference on Innovation and Intelligence
for Informatics, Computing, and Technologies (3ICT),
2018, pp. 1-6.
[9] T. A. Assegie, P. K. Rangarajan, N. K. Kumar, and D.
Vigneswari, “An empirical study on machine learning
algorithms for heart disease prediction,” IAES
International Journal of Artificial Intelligence (IJAI), vol.
11, no. 3, pp. 1066-1073, 2022.
[10] S Priyanka, K Vysali, and K B Priya Iyer, “Indian Premier
League-IPL 2020 prediction using data mining
algorithms,” International Journal for Research in
Applied Science & Engineering Technology (IJRASET) ,
Vol. 8 Issue II, Feb 2020.
[11] R. Dutt, T. A. Kusupati, A. Srivastava and D. Hore, “IPL
Player Selection using Fuzzy Logic,” 2022 IEEE
Industrial Electronics and Applications Conference
(IEACon), Kuala Lumpur, Malaysia, pp. 180-184, 2022.
[12] M. Ramalingam, S. Gokul, L. S. Mythravarshini and K. S.
Harine, “Efficient Player Prediction and Suggestion using
Machine Learning for IPL Tournament,” 2022
International Mobile and Embedded Technology
Conference (MECON), Noida, India, pp. 162-167, 2022.
[13] K. Sharma, G. Singh, and P. Goyal, “IPDCN2: Improvised
patch-based deep cnn for facial retouching detection,”
Expert Systems with Applications, vol. 211, p.118612.
2023.
[14] P. J. Rani and A. Menon, “Selection of players and team
for an indian premier league cricket match using
ensembles of classifiers,” IEEE, 2020.
[15] R. Patel and M. Brahmbhatt, “Insights of IPL: 2008 to
2020 and why it is interesting,” International Journal of
Advance Research, Ideas and Innovations in Technology,
vol. 7, no. 4, 2021.
[16] B. Srinivasa Rao, G. Lakshman Teja, N. Anusree, and
M. Pavan Kumar, “IPL data analysis and prediction using
machine learning,” International Research Journal of
Engineering and Technology (IRJET), vol. 08, no. 11,
2021.
[17] S. Tiwari, S. Kumar, and K. Guleria “Outbreak Trends of
CoronaVirus (COVID-19) in India: a prediction,”
Disaster Medicine and Public Health Preparedness, pp. 1-
9, Cambridge University Press, 2020.
[18] S. Choudhari, N. Wagholikar, A. Swaminathan and S.
Kurhade, “Dream11 IPL Team Recommendation using
Machine Learning and Skill-Based Ranking of
Players,” 2022 International Conference for
Advancement in Technology (ICONAT), Goa, India, pp.
1-6, 2022.
