Sports Viewcontent
5-2021
Zhong Zhuang
May 2021
Contents
Abstract
Chapter 1: Introduction
Chapter 3: Methods
Chapter 4: Findings
Chapter 5: Discussion
Chapter 6: Conclusion
6.c Limitations of the Theory or Method of Research
Chapter 7: Bibliography
Abstract
With the development of data technology, the sports industry is paying more and
more attention to the application of data in sports business and athletics. In
athletics especially, managers of professional teams take advantage of data to
draft new players and analyze games. Several famous cases in recent sports history
involve sports data analysis, such as the MLB Oakland Athletics' "Moneyball".
The goal of this research is to examine how data is applied in sports athletics.
The target readers are people who love sports and want to
know how data is used in sports athletics. This research can help them
understand what sports data means. In this research, I try to use easily
understood words to explain the data and analytic methods used in sports. The
research methods include reading related books and searching related content by
applications in the real sports world (case study), and summarizing key points about
The results demonstrate the importance of sports data application and broad
time, the research results include describing and summarizing sports data,
statistical methods, and modeling relationships using linear regression. The research
also includes some classic stories about sports data application in the real world for
Chapter 1
1.a General Introduction of the Research Project
In 2011, the sports movie Moneyball was released. The movie tells the story of
how the manager of a small-market professional baseball team and his assistant took
competitive season with a minimal budget. This movie inspired me to ask how
sports data is used in real professional sports. Because this movie breaks
analysis. Data has become a hot topic in modern times. Many
industries need data to support better decisions and to succeed
in specialized fields. The sports industry has also begun to study the importance
of data applications. In athletics especially, whenever we watch a sports
game, we see data displayed about a player's or team's key
statistics. But if nobody interprets these data, they have little
value. Properly interpreted, these data help us better understand the game and player
performances. For the same reason, with the development of the sports industry and
the investment of money, managers hope to build a competitive team and
attract more fans with good performances. The team can then make more
money from fans and supporters. So managers want to use data to make good
decisions when choosing ideal players for their teams. After the year 2000, the practical
application of sports data became more and more widely known. Examples include
Oakland Athletics baseball team manager Billy Beane, Houston
Rockets basketball team manager Daryl Morey, and the NHL's Chicago Blackhawks. Daryl
Morey is a big fan of sports data analysis. He graduated from MIT and Northwestern
for the team's success through his analysis. The Houston Rockets then began to take many
more three-point shots than before. He built a team that has always been
considered unconventional, and he even traded a good big man, Clint Capela.
including many aspects of sports data such as stories, analytic methods, and types of
sports data. I want to show the feasibility of data applications in sports and how
data is used in sports analysis. I will introduce some very basic concepts and
theories of sports data analysis. Following from my research problems, I also raise a
related problem: what are the development prospects of sports data? From my
research, I summarize some personal findings about sports data analysis.
The main reference materials and literature for this paper are Thomas A.
Measurement and Analytics, Ben Taylor's Thinking Basketball, and Wayne Winston's
Analytics Stories. Some additional materials were found on the Internet. The main
Data applications are becoming popular in the sports industry. They break a lot of
traditional ideas in the industry. More and more professional teams use data to
gain an advantage in athletics and business. But at the same time, you will find
some teams still keeping former professional players as scouts. The first
hypothesis of this research is about whether data can replace coaches in guiding the
game. My original hypothesis is that data can replace coaches in guiding games. The
hypothesis is based on a question: if our data analysis is strong enough, do we still need
people who have extensive professional sports experience? At least you can
find that many professional team general managers have never participated in any
learning are growing stronger than before. Maybe one day we will see two
supercomputers arranging tactics to guide humans in playing games. The second hypothesis
is that data analysis is meaningful for predicting the outcome of a game. We are seeing
some sports betting agencies take advantage of data to predict games
problems will be explored in this research paper. I will try to summarize the reading
materials and online materials to find some answers and reasonable results.
The main result of the research is to demonstrate and prove the hypotheses and
paper, including literature reading and collation, problem analysis and document
editing.
1.b Rationale for Research Project
First of all, many famous sports management events have proved that the
application of data in the sports industry is feasible. For example, Oakland A's
sports. Whether in a live broadcast of a sports game or in sports news software
such as ESPN, you can always find all kinds of data records and predictions. So data also
in sports athletics area. But not everyone has the time or enough background knowledge
to explore the secrets of sports data. This research discusses the professional problem
of data analysis from the side. Many data analysts have good data analysis skills, but
they are not up to the job of sports data analysis. It is reasonable to explore the field
professional data analysis. It has the same rationality as financial data analysis and
Finally, in the modern world, sports analytics can do so much more. Teams can
use data to prevent player injuries. Coaches can choose the best players to
play games based on data analysis. But what data analysis methods lie behind
these practices? This is worth exploring. This research helps sports fans know
how the data is processed. For those who want to explore specific data analysis
further, the research helps them understand the difference between sports data
1.c Definition and Explanation of Key Terminology
counted.
Statistics: the science concerned with developing and studying methods for
occurrence of a random event. Its value is expressed as a number from zero to one. Probability
was introduced in mathematics to predict how likely events are to happen.
Linear Regression: a basic and commonly used type of predictive analysis. It can
the outcome variable, and in what way do they, as indicated by the magnitude and sign
building. It is a branch of artificial intelligence based on the idea that systems can
learn from data, identify patterns and make decisions with minimal human
intervention.
Chapter 2
2.a Brief Overview of Theoretical Foundations
Utilized in the Research Study
methodology is a vast topic. Fortunately, there are a few central concepts and basic
methods that can greatly improve our understanding of data and the processes that
generated them. Statistics is the science concerned with developing and studying
In the analysis of sports data, statistics plays at least two important roles. One
this research process. When we use a statistical model, we can find some
to describe uncertainty. Because sports results are random in nature, any
conclusions we draw from analyzing sports data will naturally carry some uncertainty
and express it. Recognizing this randomness is the main contribution of analytic methods
for sports data analysis. Some basic statistical methods include the mean, median,
research.
methods play two basic roles in this situation. The first role is that statistics provides
methods for extracting the maximum amount of information from a set of data. The
second role is that statistics provides a way to quantify the uncertainty that results
descriptions of how likely an event is to occur. We
use analytic methods because we want to use data to better understand the
factors that influence the results of sporting events. But we must note that all
sports facts have a random element that comes into play. So we need to
sports data. Probability theory is the branch of mathematics that deals with random
outcomes. In this research, we need to pay attention to the basic properties and rules of
probability that are used in sports data analytics. Some appropriate probability theory is
some data collections, we can use probability to predict whether some events would
probability of A by P(A). Probability can describe any possible specific outcome that
mathematical link between probability theory and data. The random variable is
values P(X=x) for all possible x is called the probability distribution of the random
variable.
This research involves statistics and probability, and some very basic problems need
problem requires drawing conclusions from our reading and literature materials. So we
induction, but the logic is the same. The logic of mathematical induction
is like playing dominoes. If you want to knock down all the dominoes, first you
must knock down the first domino, and then you must ensure that every domino knocks
down the next one; then all the dominoes will fall. For example, let P(n) be a
statement that depends on the natural number n, and let M be another natural number.
If P(M) is true, and if whenever P(k) is true for some k >= M, P(k+1) is also true, then
P(n) must be true for every natural number n >= M. The same logic can be used in
a literature summary. We first identify something that must be true, and then add
other conditions that are relevant and must also be true. Following
this logic, we draw conclusions about some core ideas and treat
them as true, and then, by adding other necessary ideas, we can arrive at a new idea.
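Stated compactly, the induction principle described in this paragraph can be written as a single formula. This is only a notational restatement; the symbols P, M, k and n are the ones used in the text.

```latex
\[
\Bigl( P(M) \;\land\; \forall k \ge M \,\bigl( P(k) \Rightarrow P(k+1) \bigr) \Bigr)
\;\Longrightarrow\; \forall n \ge M \;\, P(n)
\]
```

The first conjunct is the "first domino" and the second is the guarantee that each domino knocks down the next.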
theory. In linear regression, we determine the best-fitting line through the data
and use it to better understand the relationship between the variables under
consideration. The linear regression model can also be used in machine learning. We can
collect enough data, feed them into a linear regression model, and then
train the model. After we obtain a line from the data, we can use the model to predict
analysis.
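As a rough illustration of fitting a line and then using it to predict, the sketch below implements ordinary least squares in plain Python. The function names `fit_line` and `predict` and all of the numbers are hypothetical and are not taken from the literature; they only show the workflow described above.

```python
# Minimal least-squares line fit in pure Python (no external libraries).
# All data values below are hypothetical and only illustrate the workflow.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x, slope, intercept):
    """Predict y for a new x using the fitted line."""
    return intercept + slope * x

# Hypothetical team seasons: on-base plus slugging (X) and runs scored (Y).
ops = [0.690, 0.710, 0.730, 0.750, 0.770]
runs = [640, 680, 715, 760, 790]

slope, intercept = fit_line(ops, runs)
print(f"runs = {intercept:.1f} + {slope:.1f} * OPS")
print(f"predicted runs at OPS 0.740: {predict(0.740, slope, intercept):.0f}")
```

"Training" here simply means estimating the slope and intercept from the collected data; a machine learning library would do the same estimation at larger scale.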
2.b Brief Overview of Literature Reviewed,
Discussed and Applied
This research tries to answer the question of how data is used in sports
athletics analysis. The specific content related to this question includes data analysis
methods, models, and practical stories. Based on this core problem, there are
we can test whether these hypotheses are true or not. Because the purpose of this
research paper is to help people who are interested in sports data analysis
understand it, we will use this goal as the basis for some predictions
about the development of sports data analysis in the future. These topics can provide people
who are interested in sports data analysis with both a broad and a detailed reference.
The research results will demonstrate the value of sports data analysis. The value of
the research is that it can help people who are interested in sports data analysis make better
Thomas A. Severini's Analytic Methods in Sports (first edition), Ben Taylor's Thinking
Basketball, Dr. Lorena Martin's Sports Performance Measurement and Analytics, and
Wayne Winston's Analytics Stories. This literature supports me in completing this
research. The book Analytic Methods in Sports uses mathematics and statistics to
understand data from baseball, football, basketball and other sports. Its main
content centers on mathematical, statistical and probability methods and calculations in
the sports field, and all of its data are athletic data from different sports. The book's main
idea is how to use statistics and probability to analyze sports data. By reading
this book, I found great value in the application of mathematics to sports data
analysis. Ben Taylor's Thinking Basketball provides a way of thinking about
non-numerical sports analysis methods. From a basketball point of view, the book gives us a
lot of insight into basketball analysis in places where the data cannot be seen. We can get
some stories from this book data analysis. This book also tells us about many analysis and
strategy traps in basketball data analysis. The data needs to be interpreted correctly,
and the correct decision is made based on the interpretation. The book Sports
will reference basketball dunk analysis to know how to measure a basketball player's
different views of sports data analysis. They provide different views that help us
understand the world of sports measurement. But they also have some disadvantages for
sports data analysis. These works do not give a comprehensive interpretation
of the world of sports data analysis. For example, when we analyze a basketball game, we
analyze not only team data but also individual player game data,
measurements and analysis of players' physical condition, and team playing strategies. These all
belong to the world of sports data analysis. If we put all of these things in this research, the
The conclusions from my literature review include these points. 1) Sports data
is very important for sports data analysis. Especially, as sports data analysis with
important role in sports data analysis. Sports data analysis needs not only the result of
a data analysis but also the right explanation of that result. So researchers need to
know some basic sports knowledge, such as the rules of the game and the meaning of specific
data, because we need to transform analysis results into specific actions and plans. 3) Data
analysis includes not only the analysis of team and player game data but also the analysis of
individual players' physical measurements and training performance data. For
example, for basketball players, physical measurements include hand size, vertical jump
record.
Chapter 3
3.a Study Method and Study Design
The study design (research design) includes the following content: 1) Data types
required for the research, 2) Research resources, 3) Participants required for the research, 4)
Study method (data analysis methodologies), 5) The location and timescale for
The data types required for this research include quantitative and qualitative
data. Both the quantitative and the qualitative data include primary data and secondary
data from the literature and reading materials. Based on the research goal and problems, I
made some virtual primary data to support some of the analysis conclusions. The virtual
primary data can help readers better understand the process and conclusions of the
problems. The virtual primary data were constructed on the basis of real data reported in the
literature, so they share the characteristics of the real data. For example, where
one of our sources uses real NBA basketball player data for a
comparison, I create reasonable virtual data and fictional player names to replace
the author's real data. I also collected some primary data myself, such as my hand size
measurement. Some secondary data were collected from the Internet and the literature.
The secondary data are referenced data. The main data types include
As for research resources, besides the main literature and reading materials, I also
referred to some Internet resources to further explore the world of sports data
analysis. The main additional research resource is Google Scholar. I used
Google Scholar because the research requires some additional papers to support
detailed content and arguments. All additional resources are listed on the Reference page.
The only participant in this research is me. As the author, I completed all
the work on this research paper. My tasks included defining the
research goals and research problems, literature research and summary, and final
Through qualitative research, I drew summaries and conclusions from the literature and
research resources on sports data analysis stories and case studies. The quantitative
research method concerns how data is used and analyzed in sports athletics data
numerical data. In this research, we need to use and compare many numerical data
to reach conclusions and research results. Quantitative data analysis has
two levels: descriptive analysis and inferential analysis. I use descriptive
patterns. Examples include the mean, median, mode, percentage, frequency and
range. I use inferential analysis to show the relationships between multiple variables
two variables. Analysis of variance tests the extent to which two or more groups
differ from each other. Specific learning and research methods include
reading and summarizing the literature, searching for relevant online research papers focused on
understanding.
The location and timescale for collecting the data: most of the data come from
the literature and reading materials, and a small part is virtual data. The extraction
and application of all data were determined as the paper progressed.
The research required a period of 3 months or longer. I needed to work about
20 hours every week reading the literature and other necessary and helpful research
takes a lot of time to conceive the structure and content of the paper. I take about
3.b Explanation of Sample to Be Used in the Study
In this research, I mainly quote two data samples: Kevin Durant's
2011-2012 season basketball data and the 2005 shooting data of the NBA Suns'
Stoudemire and Hunter. These data are all individual player basketball performance
data. As samples, they can represent a player's specific performance at one moment or
over one season.
From Kevin Durant's 2011-2012 season basketball statistics, I used the data as
an example and real case to analyze the mean, standard deviation and margin of error in
a sports data sample. The sample includes 6 parts: statistics, rebounds,
assists, turnovers, fouls, and points. The data sample was cited from the literature. It is
The other data sample concerns two 2005 NBA Suns players, Stoudemire and Hunter.
The sample compares their shooting percentages from different areas of the court.
Stoudemire has the better shooting percentage both close to the basket and in the middle
range. Hunter has about the same percentage close to the basket, but his mid-range
percentage is lower than Stoudemire's. These samples were used to explain how to
3.c Explanation of Measurements, Definitions, Indexes, etc.
and Reliability and Validity of Study Method and Study Design
Basically, given how the data and samples were obtained, the research data,
measurements and indexes are reliable. The data have long-term
validity and are not time-sensitive. The data represent the solid characteristics of sports
data, which are reasonable and measurable. The data can represent the performance
characteristics of all athletes. Part of the definitions are reliable. Because as the
changed in the future. The virtual data and real data are reliable and reasonable.
The real data come from true sports cases and athletic performances. The virtual data
mimic real data, so the virtual data are reliable. The real data on such as sports players
Some definitions and related research conclusions are not reliable.
limited, some data analysis results could probably be explored more deeply.
Some definitions come from particular phenomena and theories, but we cannot
assume with one hundred percent certainty that all of these theories and phenomena are right. Perhaps one day
some hypothesis will lead to new findings, and the original theoretical structure would be changed by
The study measures and study design are reliable, for the same reasons as the data
reliability. The data come from sports case history, and history will not change. So the study
measures are reliable and valid. The study design probably has some minor or
major problems, so its reliability and validity will be judged by time and by readers. The
understanding of the literature and use induction to arrive at a new point of view.
From the analysis of the research results and research goals, this study has basically
Chapter 4
4.a Results of the Method of Study and Any
Unplanned or Unexpected Situations that Occurred
study, the research addresses all pre-set problems and fully demonstrates the
hypotheses. From the research results, we find that multiple statistical and probability
methods are used in sports data analysis. Sports data analysis needs to use
mathematics and statistics to understand data from different sports. Following the
methods of study, sports data analysis has detailed steps that include describing
and summarizing the sports data, applying probability and statistical methods to analyze
sports data, and using correlation to detect statistical relationships. Other methods
include modeling relationships using linear regression and building regression models
By analyzing basketball, we also identify some different ways of reasoning about sports analysis.
has five important physical variables: stamina, speed, strength, skill, and
spirit. Depending on the sport, these variables play different roles. Except
Through analytics stories in sports, we also reach some other conclusions about the
reliability and application of sports data analysis. These stories give us a professional view of
the importance of sports data. From these stories of real sports cases, we make a
is limited. I underestimated the time it takes to write the research paper. The research
paper's structure is complicated, and some content took time to learn how to write. If
time had been sufficient, the conclusions of the study might be more reliable and valid.
Another is that some other factors that influence player performance are difficult
to measure and analyze. Sports data are not just game statistics and athletes' physical
Normally, we get sports data from sports media, from public datasets of historical sports
records, or by watching games and collecting data ourselves through measurement and recording. But
some methods need hardware support, and funding is limited. I
compensated for this problem by reading the literature and trying to build a
comprehensive picture of sports data analysis. As for the literature, some content was very
useful for this research, but part of it was not necessary to read. As for the methods
of study, I underrated the importance of practice. For some questions, the Excel, R and
4.b Explanation of the Hypothesis
hypothesis concept and research goals, I propose two hypotheses about sports data
analysis: that data can replace coaches in guiding games, and that
sports data analysis is meaningful for predicting the outcome of a game.
researching. The first hypothesis, that data can replace coaches in guiding games,
takes a different view to ask what the role of data is in sports data
analysis, what the relationship is between technology and humans, and what the
biggest potential of data analysis is. All of these questions are discussed in this
research. The second hypothesis is that data is meaningful for predicting the outcome of
the game. We discuss the reliability of sports data analysis in the research. Because
sports data analysis includes probability, sports betting is a huge part of sports data
Chapter 5
5.a Full Discussion of Findings (Results) and
Research Analysis of Finding
Sports data analysis uses mathematical methods, together with the
vast amount of data now available, to analyze performances, recognize trends and
patterns, and predict results. To provide insights on athletes and teams, we need to
and probability. Statistical concepts are at the core of sports data analysis. The goal is to
topic which data do you need or which data are collected. In sports data analysis,
the subjects are usually athletes; sometimes the subjects are games, seasons,
teams, or even coaches. Defining the subjects is a key first step, because it tells us what we want to
learn by analyzing the data. After defining a subject, the next step is considering
20 regular season, we need to know how many games the player played, how
many minutes, with which teams, and the points, assists, rebounds, steals, and blocks
example, if our subjects are all NBA centers in the 2019-2020 regular season and the variable
is the number of blocks in this season, then the measurement scale is the set of
considered quantitative. We need to keep in mind that not all variables are numerical. For
example, if our subjects are MLB players in the 2019 season and the variable is the side from which the player
bats, then the variable's values are left (L), right (R), and switch hitter (S). Some
Analytic methods mean using data to draw conclusions and make decisions. But
the data are not always available and clean. So we need to filter out the noise and clean the
data to see deeper relationships in them. Sports data analysis is an
observational study: the study is not controlled by the data analysts, because we cannot
generate data in our ideal way. For example, we cannot control which two players
or teams take part in a game under a given situation. We have to accept all the
data that are generated randomly by the games or players. For different data
In general, the first step in analyzing a set of data is some type of summarization.
build a frequency distribution table to know a team's performance (wins and losses) in
W W W L W W L L W L
L L L W L W W L W L
W W L L L L W L W L
W L W W W L L L L W
RESULT   COUNT   PERCENTAGE
WIN      19      47.5%
LOSS     21      52.5%
TOTAL    40      100%
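The frequency table can be reproduced directly from the 40 win/loss results listed above. The sketch below is one possible way to do it in Python; the helper name `frequency_table` is hypothetical.

```python
from collections import Counter

# The 40 game results listed above, row by row (W = win, L = loss).
results = (
    "W W W L W W L L W L "
    "L L L W L W W L W L "
    "W W L L L L W L W L "
    "W L W W W L L L L W"
).split()

def frequency_table(values):
    """Map each distinct value to a (count, percentage of all values) pair."""
    counts = Counter(values)
    total = len(values)
    return {value: (count, 100 * count / total) for value, count in counts.items()}

table = frequency_table(results)
for outcome in ("W", "L"):
    count, pct = table[outcome]
    print(f"{outcome}: {count} ({pct:.1f}%)")
print(f"Total: {len(results)}")
```

Counting the listed results this way gives 19 wins and 21 losses out of 40 games, which matches the table.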
the same time, we need to give every class the same width. For example, see
Table 3.
season
CLASS COUNT PERCENTAGE
300 OR MORE 3 5%
TOTAL 61
a player's performance as a whole, the mean and median are in many cases
more useful, and they are the most common summaries for quantitative data. The median and mean are
analysis. The standard deviation (SD) is another way to understand and analyze variation in sports
analysis. It reflects the degree of dispersion of the data.
Normally, a smaller SD means a more stable performance, and vice versa. Another
way to analyze variation is the IQR (interquartile range). The IQR is the upper quartile minus
the lower quartile. In sports analysis, the IQR has two advantages. One advantage is that it
variable. The other advantage is that the IQR is less sensitive to extreme values than the mean or the SD.
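To make the comparison between these summaries concrete, the following sketch computes the mean, median, SD and IQR with Python's standard `statistics` module. The two lists of points per game are hypothetical; the second differs from the first only in one extreme value. The quartile convention (`method="inclusive"`) is an assumption, since different books and tools define quartiles slightly differently.

```python
import statistics

def iqr(values):
    """Interquartile range: upper quartile minus lower quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

def summarize(values):
    """Return the four summaries discussed in the text for one data set."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),   # sample standard deviation
        "iqr": iqr(values),
    }

# Hypothetical points per game for one player over eight games.
typical = [18, 20, 21, 22, 22, 23, 24, 26]
# Same games, except the last one is replaced by an extreme 60-point game.
with_outlier = [18, 20, 21, 22, 22, 23, 24, 60]

for label, data in (("typical", typical), ("with outlier", with_outlier)):
    print(label, summarize(data))
```

In this hypothetical data the single extreme game changes the mean and the SD noticeably, while the median and the IQR stay where they were.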
By analyzing the mean, median, SD and IQR of a dataset, we can have good
to draw sound conclusions from the sports data. In probability, an experiment means any process
that produces random results. For example, when a basketball player shoots a free throw,
nobody knows whether he will score. There are many examples like this
in sports. An experiment can include more or fewer events. For instance, if
mean event X has a 25% probability of happening. For example, if we have event A and
represent the player hitting a double in a baseball game, and we suppose P(A) =
0.3 and P(B) = 0.04, then, because a single and a double cannot both occur in the same
at-bat, the probability of hitting either a single or a double is P(A or
B) = P(A) + P(B) = 0.3 + 0.04 = 0.34. This is the most common way to consider two
represent a baseball player hitting a single with probability 0.20, denoted P(C) = 0.2; then the probability that
he does not hit a single is P(not single) = 1 - P(C) = 1 - 0.2 = 0.8. However, in sports
data analysis, we care not only about events but also about numbers. When
we have a huge amount of numerical data, probability can be interpreted as long
history records in win-loss results between football team AAA and team BBB, team
BBB has a 30% probability of winning the next game. But sometimes we consider
other events together to predict something. For example, let B represent the event that team BBB
wins a football game, with P(B) = 30%. The quarterback (QB) is a critical
player in a football game; when the QB throws 5 interceptions, team BBB wins only about 3% of
the time. So P(team BBB wins | QB throws 5 interceptions) = 0.03. This is conditional
analysis, probability can help us better understand the factors in sports events and
predict the probability that some event happens. This is a very useful way of reasoning in
sports analysis. But how to use it depends on the situation and the
sport. For this reason, good familiarity with a sport helps
you draw more meaningful conclusions from data analysis. For example, in soccer,
P(home team wins) and P(home team scores first) are often considered together to
analyze and predict game results, as in P(home team wins | home team scores
first).
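The three probability rules used in this discussion (the addition rule for events that cannot happen together, the complement rule, and conditional probability) can be checked with a few lines of Python. The probabilities 0.3, 0.04 and 0.2 are the ones given in the text; the ten soccer game records and the function name `win_given_scored_first` are hypothetical and only illustrate how a conditional probability is estimated as a relative frequency.

```python
from fractions import Fraction

# Addition rule for mutually exclusive events (values from the text).
p_single = Fraction(30, 100)
p_double = Fraction(4, 100)
p_single_or_double = p_single + p_double      # P(A or B) = P(A) + P(B)

# Complement rule (value from the text).
p_c = Fraction(20, 100)
p_not_c = 1 - p_c                             # P(not C) = 1 - P(C)

# Conditional probability estimated from game records.
# Hypothetical records: (home team scored first, home team won).
games = [
    (True, True), (True, True), (True, False), (True, True),
    (True, True), (True, False), (False, False), (False, True),
    (False, False), (False, False),
]

def win_given_scored_first(records):
    """Estimate P(home wins | home scores first) as a relative frequency."""
    outcomes = [won for scored_first, won in records if scored_first]
    return Fraction(sum(outcomes), len(outcomes))

p_win = Fraction(sum(won for _, won in games), len(games))

print("P(single or double) =", float(p_single_or_double))
print("P(not single)       =", float(p_not_c))
print("P(home wins)        =", float(p_win))
print("P(home wins | home scores first) =", float(win_given_scored_first(games)))
```

`Fraction` is used instead of floating-point numbers so that sums such as 0.3 + 0.04 are exact. In the hypothetical records, conditioning on scoring first raises the estimated chance of a home win, which is the kind of comparison the soccer example describes.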
performance in every carry. Let us suppose the running back's name
is R. Denote R's yards on each carry by R1, R2, R3, and so on. The sample mean of these data is
5.59 yards per carry, so we estimate a future carry at about 5.59 yards. Similarly, the
sample median of the R values is 3 yards per carry, which is our estimate of the median of
R. Normally, we use the margin of error to quantify the variation in sports
statistics. For example, suppose running back R's average (mean) per carry is 5.38 yards and the
margin of error is 1.16. The interval then gives a range of values from 5.38 - 1.16 to
5.38 + 1.16, that is, from 4.22 to 6.54. This range tells us about the player's true performance and summarizes the player's
performance. If we have lots of data and the game will repeat, we can use this value
to predict a player's performance in the future. The uncertainty of this prediction
is also described by the margin of error. About margin of error calculation, there
n denotes the number of data values used in the sample average. For example, we
use NBA superstar Kevin Durant's 2011-2012 season data to see the margin of error
Turnovers 3.76 1.70 0.42
Suppose we know the standard deviation of each statistic; for
error values. For the Points section, Durant's points range between 26.34 and 29.72.
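The margin-of-error formula itself did not survive in this copy of the text. The sketch below assumes the common approximation 2 * S / sqrt(n), where S is the sample standard deviation and n the number of games. It also assumes n = 66 games for the 2011-2012 season. Under those two assumptions the function reproduces the 0.42 shown in the Turnovers row (S = 1.70). The points mean of 28.03 and margin of 1.69 are not stated directly; they are inferred from the range 26.34 to 29.72 given above.

```python
import math

def margin_of_error(sd, n, multiplier=2.0):
    """Approximate margin of error of a sample mean: multiplier * sd / sqrt(n).

    The multiplier of 2 is an assumption (roughly a 95% level), not a value
    confirmed by the source text.
    """
    return multiplier * sd / math.sqrt(n)

def interval(mean, moe):
    """Range of plausible values: mean minus and plus the margin of error."""
    return mean - moe, mean + moe

# Turnovers row from the table: SD = 1.70; n = 66 games is an assumption.
turnovers_moe = margin_of_error(sd=1.70, n=66)
print(f"turnovers margin of error: {turnovers_moe:.2f}")

# Points: mean and margin inferred from the range 26.34 to 29.72 in the text.
low, high = interval(28.03, 1.69)
print(f"points interval: {low:.2f} to {high:.2f}")
```

The same `interval` helper also covers the running back example: a mean of 5.38 with a margin of error of 1.16 gives the range 4.22 to 6.54.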
In some sports, we may need to use proportions for some probability events.
Under these conditions, we can use this formula to calculate the margin of error. Let P(A) = p, where p is the
proportion of experiments in which event A occurs, and n has the same meaning as in the last
formula.
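The formula for proportions is also missing from this copy. A commonly used form, assumed here rather than confirmed by the source, is 2 * sqrt(p * (1 - p) / n). The free-throw numbers in the sketch are hypothetical.

```python
import math

def proportion_margin_of_error(p, n, multiplier=2.0):
    """Approximate margin of error of an observed proportion p over n trials.

    Uses multiplier * sqrt(p * (1 - p) / n); the form and the multiplier of 2
    are assumptions, not values taken from the source text.
    """
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return multiplier * math.sqrt(p * (1 - p) / n)

# Hypothetical example: a player makes 80 of 100 free throws.
made, attempts = 80, 100
p_hat = made / attempts
moe = proportion_margin_of_error(p_hat, attempts)
print(f"free-throw percentage: {p_hat:.2f} +/- {moe:.2f}")
```

With more attempts the margin shrinks, which is why a shooting percentage over a full season is more trustworthy than one over a single game.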
Another way to compute the margin of error is to use computer simulation to obtain
these hypothetical repetitions. This approach is easy to find on the Internet. This paper is not a
mathematics paper, so we do not discuss this approach in much detail
here.
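For readers who want to see what such a simulation might look like, here is a minimal sketch of one standard simulation technique, the bootstrap. It is offered as an illustration only and is not necessarily the procedure used in the literature. The yards-per-carry values are hypothetical, and the multiplier of 2 is the same assumption as in the formula-based approach.

```python
import math
import random
import statistics

def bootstrap_margin_of_error(values, repetitions=2000, seed=0):
    """Simulate repeated samples by resampling the data with replacement.

    Each repetition draws len(values) observations from the data and records
    their mean; the margin of error is taken as 2 times the standard deviation
    of those simulated means.
    """
    rng = random.Random(seed)          # fixed seed so the result is repeatable
    n = len(values)
    simulated_means = [
        statistics.fmean(rng.choices(values, k=n)) for _ in range(repetitions)
    ]
    return 2 * statistics.stdev(simulated_means)

# Hypothetical yards gained on 20 carries by one running back.
carries = [3, 1, 7, 2, 12, 4, 0, 5, 3, 9, 2, 6, 1, 4, 25, 3, 2, 8, 5, 4]

simulated = bootstrap_margin_of_error(carries)
formula = 2 * statistics.stdev(carries) / math.sqrt(len(carries))
print(f"simulated margin of error: {simulated:.2f}")
print(f"formula margin of error:   {formula:.2f}")
```

The two numbers should be close to each other; the simulation is mainly useful for statistics, such as the median, that have no simple margin-of-error formula.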
variables and presented. These approaches can help us reduce the properties of
such a relationship to a single number that is useful as a simple summary of the
also called linear relationships. For example, suppose we measure two variables, Y and X,
in which we are interested, and we want to know the relationship between these
variables. We let Y represent baseball runs scored and X represent on-base plus
Based on our current discussion, we see that sports data analysis is not a
and theories of statistics are the basis of sports data analysis. Therefore, if you want
to study sports data analysis in depth, a solid mathematical foundation and rich
statistics, when analyzing sports data, we go through the following steps. The first
step is to find data from reliable and available sources. The second step is to clean the data:
remove noise and choose the data necessary for the analysis. The third step is to think
about the subjects and choose variables that fit the goals for those subjects. The fourth step is using
depending on the subjects' goals and variables. When we analyze data with statistics
Sports data analysis involves not only quantitative analysis such as statistical methods but
also some qualitative analysis that summarizes specific sports events. In this
Wilt Chamberlain attempted more than 25 shots per game in 1966. He was the
best player and the most efficient scorer on the team. Normally, we would ask the
team's most efficient scorer to shoot constantly. The reasoning is simple: he is the
best and scores the most, so if he takes more shots, the team should win.
36
But in fact, this idea not work in team sports like basketball. The Braess’s paradox
phenomenon that has been observed in cities where a major highway closed. When
shots among teammates is like dispersing cars across different roadways. The
competitors’ defenses same with other cars, these cars influence the optimal route
home, basketball defenses influence the optimal scoring path. The opponents must
be not allowed your best player constantly get scores, they always thinking and
switch different defensive strategy to influence your best player. Meeting these
under this conditions, the best player try more shots, the positive affection is less for
team wining game. In team sports, we should pay attention to team’s overall
efficiency. Chamberlain should shoot less and increase teammates scores. This
impact is called as “Global Impact”. In other words, global impact is the effect of a
basketball because the winner is determined by the team total scores other than
individual’s scores.
replaced. In 2005, the Phoenix Suns had a famous player, Stoudemire, who scored 36
points per 100 possessions he played. But when he sat and his backup played, Phoenix
did not lose 36 points per 100 possessions. His backup, Steven Hunter, scored 17 points
From the data analysis, we see that Hunter had almost no mid-range game but nearly
the same scoring ability as Stoudemire at the rim. When Stoudemire was not on the
court, Phoenix did not find a way to replace his scoring; instead, they found a
different way to redistribute the scoring options. So when Hunter was on the court,
Phoenix found another way, such as positioning him closer to the basket and having
teammates pass to him. The new distribution of scoring attempts can probably make the
team more efficient. Because
When we analyze a player’s scoring ability, we can’t look only at the player’s data,
such as average points per game and points per scoring attempt. These data tell us
something, but not everything. At least in basketball, we
Player   Points per game   Points per scoring attempt
A        11                1.00
B        20                1.10
Normally, after analyzing this table, we would say player B is a better scorer than A.
But in fact, he is not.
jumper layup
A 11 39 10 5 67
B 18 39 5 1 65
From this table, we can see that player A is a creator who helps the team score more
than B does. Player B is indeed a good scorer, but under global impact theory, player A
can lead the team to more wins than B, because A creates more scoring opportunities
for teammates. We can’t interpret data only from the surface; working out how to use
data to reach a conclusion that helps the team or its players is the job of sports data
analysis.
Thus occupying defenders to create open shots, even for less skilled teammates, is
really valuable; an open shot for a role player is often more efficient than a covered
one for a star. From this theory, when we analyze sports data, we need to combine
Some conclusions in sports data analysis even violate common sense, like the cases I
just discussed. Player A takes fewer shots, but the team’s overall efficiency increases
by five points. This improvement was not accomplished with different players,
but with a different distribution of scoring attempts. This is also an application of
Braess’s paradox in basketball. But the same conclusion may not hold in data analysis
for other sports. Knowing only how to analyze data is not enough to be a sports
data analyst.
team’s offensive possession ends in one of four ways: (1) a made shot, (2) a turnover
basketball data analysis, we should consider causal factors. Ask: why does a player score
many points? Who gives the player assists? Who are the opponents? For example, if Nash
were not on the court, Stoudemire’s easy layups would disappear. From this point, we
need to pay attention to the cause of scoring on every possession. Finding the causes
will help us learn the truth about scoring. Individual scoring is one factor that
influences whether a team wins, but the purpose of basketball is for a team to score
more points than its opponent. So finding the causes of scoring is more important than
knowing who is the highest scorer on a team. How do we reach this conclusion? We
combined data and basketball
When we analyze sports data, we should avoid bias and consider Anchoring
fluences. The human brain requires a starting point. If we care too much about one
starting point, we will easily ignore other factors or events. For example, in sports data
analysis, we tend to think scoring capability is the most important metric for a
basketball player. For most basketball fans, this view is reasonable, because scoring
became the default measure of a player’s contribution to winning and fans find it
easy to understand. But for a sports data analyst, this view will lead us to generate bias
data and reject preconceived ideas and opinions. Anchoring is a phenomenon, the
attention to efficiency rather than volume of scoring. At the same time, we should
pay attention to global impact, that is, how an individual player influences other teammates,
When we do data analysis, we should pay attention to the details of the data. For example, in
large discrepancies. When we calculate and analyze per-100-possession data in
basketball, a small difference in percentage multiplied by one hundred will also
high-variance nature, it takes a larger sample size, that is, a large number of games,
to be confident that the numbers actually reflect the overall performance.
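The link between variance and sample size can be made concrete with a short sketch. It assumes the approximate 95% margin of error 2·s/√n for a per-game average and solves it for n. The function name and the standard deviations are hypothetical.

```python
import math


def games_needed(std_dev: float, target_margin: float) -> int:
    """Smallest n for which 2 * std_dev / sqrt(n) <= target_margin."""
    return math.ceil((2 * std_dev / target_margin) ** 2)


# Hypothetical per-game standard deviations for two statistics,
# both measured to within the same margin of 1.0.
low_variance = games_needed(std_dev=2.0, target_margin=1.0)
high_variance = games_needed(std_dev=8.0, target_margin=1.0)
print(low_variance, high_variance)
```

Because n grows with the square of the standard deviation, a statistic that is four times as variable needs sixteen times as many games for the same precision.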
In sports data analysis, data variance and sample size usually show up together.
The rule for sports data sample size is: the greater the variance, the larger the
good or bad, we cannot analyze only one game or one series; we need all the data. The
richness of data samples is conducive to the accuracy and reliability of the final
measurements can help improve athletic performance and also help improve
training efficiency. As a sports performance analyst, you should know some necessary
flexibility, balance, anaerobic power, aerobic power, reaction time, agility, and level
need to use some necessary formulas. For example, body composition includes
muscle, fat, bone, and other substances in the athlete’s body. Normally, we use BMI
to estimate an individual athlete’s fat content based on the relationship between
weight and height. BMI is calculated as the weight in kilograms (kg) divided by the
square of the height in meters (m).
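The BMI calculation can be sketched in a few lines. BMI is conventionally weight in kilograms divided by the square of height in meters. The function name and the athlete’s measurements below are hypothetical.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """Return BMI: weight in kilograms divided by height in meters squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


# Hypothetical athlete: 100 kg and 2.00 m tall.
print(round(body_mass_index(100.0, 2.0), 1))
```

BMI uses only weight and height, so for muscular athletes it is a rough indicator rather than a direct measure of body fat.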
you need to refer to relevant forms and tables to draw conclusions. For example,
this table is built to assess muscle strength, muscular power, and muscular endurance.
Table 8
This is part of a form for testing athletes’ muscle groups. The form gives us
instructions on how to test and assess athletes’ muscles. Some knowledge about
physical measurement and analysis depends on special tables, so in physical data
analysis, reference tables are important. For example, when you need to draw a
conclusion about single-joint motion from your data, you should rely on range of
fixed.
Table 9
Physical analysis belongs to sports data analysis, and it needs anatomy and physiology
example, the composition of the human body, distribution and function of muscles,
and types and functions of bones. Analysts need to collect and measure the data
themselves for physical analysis. During this process, analysts need to learn how to
collect data with equipment and machines. Sports data analysis needs hardware
support for data collection. In order to better serve athletes and teams,
physical analysts need to understand the requirements of specific sports for various
specific skills include precision, accuracy, and consistency in shooting free throws and
three-pointers and in passing the ball precisely to teammates; adaptations for other
player positions are recommended. But tennis is different from basketball. Tennis
measured for each of the major tennis strokes: serve, forehand, backhand, volley,
slice. Because of these characteristics of tennis, the field has some new
technology to measure movement and body motion data. For example, sensors installed
within the grip handle of the racquet quantify measures specific to tennis
players. The sensors record the frequency of strokes, type of spin used by a player,
For basketball, the athletic tests include a standing vertical jump, maximum
vertical jump, bench press, three-quarter-court sprint time, lane agility time, and
modified event time. The physical measures include height without shoes, wingspan,
weight, standing reach, body fat, hand length, and hand width. These measurements
analysis, you can use comparative analysis to compare data under
similar baseline conditions. For example, for players of the same height, comparing them
hand size.
data analysis, as data analysts, we should know our data, because data is
fundamental to understanding the factors that can play a role in the athlete’s or
analysts also need to measure and quantify these intangible variables with some
models. In sports, the key psychological factors that influence a player’s performance
models have been repeatedly used and proven. For example, consider measuring motivation.
Strong and positive motivation affects sports performance,
whether it comes from coaches or the athletes. Motivation has two main
motivation. Intrinsic motivation means “I want to do it” for one’s own satisfaction. The
developed in France in 1995. The original scale measures both intrinsic and extrinsic
motivation, but its effectiveness was questioned, and a new scale, SMS-6, was then
developed (Mallett et al. 2007). Now, in order to determine specific situation
Vallerand, and Blanchard 2000). From this example, we know that sports psychology
is measured with specific existing scales. We need to note that these
measurements are not static. With the progress and development of academic
research, sports data analysts must learn to adopt new measurement scales for
different situations.
5.b Full Discussion of Hypothesis and of Findings
The first hypothesis of this research is that data can replace coaches in guiding the
game. The final conclusion is that data cannot replace in-person coaching. This
conclusion is the opposite of the initial judgment. With the development of
technology, machine learning and artificial intelligence play
more and more important roles in different fields. The same is true of machine
learning and artificial intelligence in sports. But data can’t use machine learning
and artificial intelligence to coach humans to play a game. At least, data can’t guide
competitive team sports. Data can’t play the coaching role of telling players how to play
Competitive sports like basketball and soccer all have some specific features,
including the uncertainty of the game, accidental events during the game, and the
fact that the win is based on the team’s total score. Because these factors influence
games, the coach’s job is complex by nature. Uncertain events are an
important part of the game; no one wants to watch a game with a preset result.
Machine learning can learn many game strategies for one sport, but human players’
performance can’t be controlled by anyone, for example, injury and the arrangement of
learning. But team competition includes not only playing strategies but also
players’ mental awareness and psychological analysis. The coach should
know the players’ condition and status comprehensively. Data can’t do
that. For example, the coach’s job is not to put the five highest-scoring players on the
field, but to put the five most suitable players on the field. The coach is going to
motivate the players mentally. Yes, machine learning can also learn this motivational
language, but whether the players accept it is a big problem. The complexity of the game
and the complexity of the coaching work determine that the data cannot fully replace the
From the view of machine learning technology, this is the same question as whether
autopilot can achieve Level 5. Level 5 means full automation: the vehicle
performs all driving tasks under all conditions. Zero human attention or
games. Machine learning and artificial intelligence can’t analyze and handle all of
these uncertain and accidental conditions. Autopilot deep learning needs a pre-learning
process, but it will meet some conditions it hasn’t experienced before, like a
snow- or ice-covered road. As human beings, we can use intuitive physics to handle
these conditions and make appropriate decisions. And as humans, we can rely on
our knowledge of how the world works to make rational decisions when we deal
with new situations. We can also reason about causality, determine which events cause
others, and understand the rational actors for our next move. For example, suppose you
are driving and see a rhinoceros on the street. Under these conditions, you will try to
find a new road, even though the rhinoceros walks slowly and won’t touch your car.
But under the same conditions, deep learning algorithms don’t have such capabilities;
therefore they need to be pre-trained for every possible situation they encounter.
Data supports machine learning, and machine learning can make robots or machines
more intelligent. But data requires a specific application environment to make the
appropriate decision through analysis. We can use data to analyze games and
players, but it is not feasible to use data to replace coaches and coaching. From the
sports view, using a computer that analyzes data to replace coaches does not meet
watching coaches’ body language, facial expressions, and funny words. If a
computer or robot that uses data knows about the game, it may be difficult to
The second hypothesis is that data analysis is meaningful for predicting the
outcome of games. The hypothesis is true. First of all, “meaningful” means that
it works, can achieve certain expectations, and has a certain effect. Sports data
analysis plays an important role in sports betting. As we all know, the outcome of
sports data analysis meaningful for predicting game results, we have to talk about
statistics. Sports data analysis needs statistical methods to draw conclusions. In other
factors that have a strong correlation with winning games. Sometimes, these factors
aren’t immediately apparent to the betting public. The analysis process is long
When we talk about statistics in sports betting, we should know that “significant”
does not mean “important”. We need significant data from our dataset. The first
step is to identify which factors significantly influence winning or losing. For example,
if we want to know whether an NFL team wins or loses, “completion percentage” is
significant data. Then we need to find a dataset that includes these data. “Completion
percentage” is the independent variable, and the game result (win or loss) is the
dependent variable. The more statistically significant a variable is, the more likely
you are to believe that it is related to winning. In sports data analysis, there are
always many variables at play at once, so the statistical method of multiple regression
is most commonly used for sports betting. For example, to predict a basketball game
result, we need multiple regression. The visiting team won its last two games at home
by two points, the home team wins 93% of games in which it scores 105 or more, and the
home team has won 92% of its games at home: all of this information belongs to
multiple regression analysis. In sports betting analysis, we always use historical data
to predict future outcomes of games.
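A minimal sketch of multiple regression on historical data follows. It uses ordinary least squares via NumPy’s `numpy.linalg.lstsq`. The helper names, the two predictors (home shooting percentage and home turnovers), and every number in the example are hypothetical; they are not taken from the sources discussed here.

```python
import numpy as np


def fit_multiple_regression(X, y):
    """Ordinary least squares; returns [intercept, b1, b2, ...]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef, features):
    """Predicted outcome for one game given its feature values."""
    return coef[0] + np.dot(coef[1:], features)


# Hypothetical past games: [home shooting %, home turnovers].
games = [[48.0, 12], [44.0, 15], [51.0, 10],
         [46.0, 14], [42.0, 17], [49.0, 11]]
# Hypothetical home scoring margins in those games.
margins = [6.0, -3.0, 12.0, 1.0, -9.0, 8.0]

coef = fit_multiple_regression(games, margins)
print(predict(coef, [47.0, 13]))
```

A positive predicted margin favors the home team under this toy model. With so few games, the coefficients are far too noisy to bet on, which is the sample-size point made earlier.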
Another statistical method is logistic regression. Logistic regression is a method for
obtaining a result from one or more independent variables. For example, a basketball
team’s three-point percentage, the total number of offensive rebounds, and the total number of
and variable B. You can say variable A and variable B are correlated, but they do
not necessarily have a causal relationship. Regression analysis can help us find variables that
correlation such as home field advantage and winning percentage, but we can’t say
Besides statistics, sports data analysis also needs probability knowledge for
sports betting. The main methods include Bayesian networks, the Poisson
distribution, and the binomial distribution. All of these methods can help us predict
the outcomes of games. These methods also determine true in the data analysis
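As an illustration of one of these probability methods, the sketch below uses the Poisson distribution to estimate match outcome probabilities. It assumes each team’s goals follow independent Poisson distributions, which is a common simplification rather than a method stated in the text. The function names and the scoring rates are hypothetical.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k events when the average count is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def match_outcome_probabilities(home_mean, away_mean, max_goals=15):
    """Return (home win, draw, away win) probabilities.

    Assumes the two teams' goal counts are independent Poisson variables
    and ignores scores above max_goals, which are extremely unlikely.
    """
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_mean) * poisson_pmf(a, away_mean)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win


# Hypothetical average goals per game for the home and away teams.
print(match_outcome_probabilities(1.6, 1.1))
```

The averages would normally be estimated from historical data, so the quality of the prediction depends on how well past scoring rates describe the upcoming game.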
5.c Post Analysis and Implications of Hypothesis and of
Findings
using Excel and R. Statistics is the core of sports data analysis. Excel and R are highly
usable for data analysis. With the development of data analysis technology, I believe
more and more new technology will be used in sports data analysis. Based on a correct
understanding of sports data analysis, sports data analysts will in the future
be required to work with data more effectively. Comprehensive use of data technologies
is necessary. Data extraction, data mining, data processing and data
learning will play a more and more important role in sports data analysis. As the data
collection, the huge amount of data needs big data technology for processing. The
sports data analysis. The greater the amount of data, the greater the advantages of
machine learning. Sports data analysts need to adapt to the development of new
technologies. The theories of sports data analysis will also change as the rules of
We can’t ignore the power of hardware. As I said before, sports data analysis requires
high efficiency, in other words, getting more analysis results in less time.
Combining hardware with related software can help us achieve this goal. Recently,
this technology has been used in soccer. In some countries, soccer players wear
special clothing with a built-in sensor that detects the athlete’s heart rate and
sports performance. Assistant coaches need only an iPad to read athlete data
immediately. High-tech equipment will definitely have an impact on sports data
analysis. In the future, the ability to analyze and interpret sports
data will be essential. Teams need not only people who have mastered data analysis or
statistics, but also people who know the nature of the sport. Therefore, the
analysis. Processing data is not the ultimate goal of analysis. Only constructive
Chapter 6
6.a Summary of Academic Study
The academic study focuses on the application methods and processes of sports data
analysis. It explains what sports data analysis is from multiple angles, through
statistics, probability theory, and other summaries of sports data analysis. The study
contains details and many examples throughout. Through the collation of
sports data analysis concepts, some necessary conditions and skills for sports data
analysis are summarized. The research tries to find answers about how data is used in
sports athletics analysis and what the development prospects and predictions of
sports data are. The research also provides argument and discussion for the hypotheses.
The goal of this academic study is to explore what sports data analysis is. The study
also emphasizes the characteristics of sports data analysis and its differences from
data analysis in other industries in the research process. For someone who wants to
learn sports data analysis or plans to apply data analysis in sports athletics, this
academic research can help them understand sports data analysis and the necessary
skills, and so provides a reference for deciding whether to enter this industry. People
who are interested in sports data analysis, such as fans, can learn from this academic
study how the sports data
In this academic study, two questions were discussed and two related
hypotheses were answered. The two questions are how data is used in sports
athletics analysis and what the development prospects and predictions of sports
data are. The two hypotheses are that data analysis is meaningful for predicting the
outcome of games and that data can replace coaches in guiding the game. Through these
two questions and hypotheses, readers can fully understand sports data analysis from
macro and micro perspectives. The judgment of each hypothesis is a personal
summary based on literature and data analysis. Support the hypothetical results with
The findings of the academic research include the following. (1) Sports data analysis
is not a completely independent discipline. Sports data analysis needs strong and
different analysis goals, using different statistics methods. (2) Sports data analysis
requires knowing the characteristics and nature of the specific sport. Sports data
analysis needs to know the significance of the various variables. (3) The
results of sports data analysis need to be derived from multiple aspects, requiring a
influences. (4) Sports data analysis involves the analysis of players’ personal game
data, team performance data, and the players’ individual physical condition. The
physical condition includes physical measures and psychological measures. (5) The
sports data analysis process includes finding data from reliable and available
sources, cleaning data, thinking about subjects and variables, using statistics to get
analysis results, and proposing conclusions. (6) Psychological analysis and measurement need hardware
6.b Reference to Literature Review
The academic study focuses on the application methods and processes of sports data
analysis. It explains what sports data analysis is from multiple angles, through
statistics, probability theory, and other summaries of sports data analysis. The study
contains details and many examples throughout. Through the collation of
sports data analysis concepts, some necessary conditions and skills for sports data
analysis are summarized. The research tries to find answers about how data is used in
sports athletics analysis and what the development prospects and predictions of
sports data are. The research also provides argument and discussion for the hypotheses.
The goal of this academic study is to explore what sports data analysis is. The study
also emphasizes the characteristics of sports data analysis and its differences from
data analysis in other industries in the research process. For someone who wants to
learn sports data analysis or plans to apply data analysis in sports athletics, this
academic research can help them understand sports data analysis and the necessary
skills, and so provides a reference for deciding whether to enter this industry. People
who are interested in sports data analysis, such as fans, can learn from this academic
study how the sports data
and debates relevant to the research topic, and to present that knowledge in the
form of a written report. The literature strongly supports the research in sports data
analysis. The literature review can demonstrate the rationality of the research and provide
Thomas A. Severini’s Analytic Methods in Sports (first edition), Ben Taylor’s Thinking
Basketball, Dr. Lorena Martin’s Sports Performance Measurement and Analytics, and
Wayne Winston’s Analytics Stories. This literature supports this research. The
book Analytic Methods in Sports uses mathematics and statistics to understand data
from baseball, football, basketball, and other sports. Its main content centers on
mathematical, statistical, and probability methods and calculations in the sports field,
and all of its data relate to different sports. The book’s main idea is how
to use statistics and probability to analyze sports data. By reading this book, I found
great value in the application of mathematics to sports data analysis. Ben Taylor’s
Thinking Basketball provides some logical thinking about non-numerical sports analysis
basketball analysis from where the data can’t be seen. We can take some data analysis
stories from this book. This book also tells us about many analysis and strategy traps in
basketball data analysis. The data needs to be interpreted correctly, and the correct
decision is made based on the interpretation result. The book Sports Performance Measure
measurement focuses according to different sports. We need to develop different
analysis strategies according to the sport. These reading materials take different views
of sports data analysis, and those views help us understand the world of sports
measurement. But they also have some disadvantages for sports data analysis.
analysis world from the perspective of an amateur. For example, when we analyze a
basketball game, we analyze not only team data but also individual
player game data, measurements of players’ physical condition, and team playing
strategies. These all belong to the world of sports data analysis. If we put all of these
things in this research, readers will more easily understand the world of sports data
analysis.
analysis and data analysis, which are very suitable for research topic on sports data
6.c Limitations of the Theory or Method of Research
The limitation of the research method is that it relies too much on literature and lacks
practical application. All research conclusions are based on summaries of the
literature. During the research process, there was no further research with relevant
professionals such as coaches or players. The theoretical research of sports data may
One of the research goals is to help people better understand the process of
sports data analysis, but there is no guarantee that all fans have a theoretical basis in
mathematics and statistics, and the content may cause difficulties for some readers.
purpose of the article will be inconsistent with the content. Some sports data
analysis theories may change with further research in the future, because this
research draws its conclusions from other literature. If the
6.d Recommendations or Suggestions of Future Academic
Study
Interview more sports-related people to get advice and discussion on real sports
techniques such as data visualization and machine learning, and practice the sports data
analysis process.
Chapter 7
7.a Complete List of all Sources Used Regardless of
Citation or Inclusion
Severini, Thomas A. (2014). Analytic Methods in Sports: Using Mathematics and
Statistics to Understand Data from Baseball, Football, Basketball, and Other
Sports. CRC Press.
UCI Department of Statistics. (n.d.). What is Statistics.
https://www.stat.uci.edu/what-is-statistics/
BYJU’S. (n.d.). Probability. https://byjus.com/maths/probability/
Complete Dissertation By Statistics Solutions. (n.d.). What is Linear
Regression? https://www.statisticssolutions.com/free-resources/directory-of-
statistical-analyses/what-is-linear-regression/
https://www.sas.com/en_us/insights/analytics/machine-learning.html