Probability and Statistics
CEC217 Lecture 3
Dr. Tarık Adnan
Email: tarikalmohamad@karabuk.edu.tr
Office: 104
Cont.
• We have previously seen the sample variance and the
sample standard deviation.
• Both measures play major roles in statistical methods, and
both reflect the same concept, variability, BUT they express
it in different units:
Sample standard deviation: variability in linear units
Sample variance: variability in squared units
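The difference in units can be checked with a minimal Python sketch. The measurements below are hypothetical values chosen only for illustration, and the variable names are assumptions, not part of the lecture.

```python
import math
import statistics

# Hypothetical measurements in minutes (illustrative values only).
times_min = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]

# Sample variance uses the divisor n - 1; its units are minutes squared.
s2 = statistics.variance(times_min)
# Sample standard deviation is the positive square root; its units are minutes.
s = statistics.stdev(times_min)

print(f"sample variance           s^2 = {s2:.3f} min^2")
print(f"sample standard deviation s   = {s:.3f} min")
print("s equals sqrt(s^2):", math.isclose(s, math.sqrt(s2)))
```

Because the standard deviation is in the same units as the data, it is usually the easier of the two to interpret next to the sample mean.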
Population Characteristics (Parameters)
• We can use the aforementioned concepts to describe the
population parameters. In other words, we now have two
important parameters: the population mean and the
population variance.
• The sample variance plays an explicit role in the statistical
methods used to draw inferences about the population
variance.
• The sample standard deviation has an important role along
with the sample mean in inferences that are made about the
population mean.
Statistical Modeling,
Scientific Inspection, and
Graphical Plots
Statistical Modeling
• A model form is often the foundation of assumptions that are
made by the analyst.
• A statistical model is not deterministic but, rather, must entail
some probabilistic aspects. Often the end result of a
statistical analysis is the estimation of the parameters of a
postulated (assumed) model.
• Suppose one wants to draw some level of distinction between
the nitrogen and no-nitrogen populations through the sample
information. The analysis may require a certain model for the
data, for example that the two samples come from normal
(Gaussian) distributions.
Graphical Plots
• In this part, the role of sampling and the display of data for
enhancement of statistical inference is explored in detail.
• We present some simple but effective displays that
complement the study of statistical populations.
Graphical Illustrations
• Scatter plot
• Stem-and-leaf plot
• Histogram
• Box-and-whisker plot (box plot)
Graphical Plots: Scatter Plot
• A textile manufacturer designs an experiment in which
cloth specimens that contain various percentages of cotton
are produced.
(tensile strength = çekme direnci, çekme kuvveti, gerilme direnci)
• Five cloth specimens are manufactured for each of the four
cotton percentages. Some simple graphics can shed
important light on the clear distinction between the samples.
Scatter Plot (Cont.)
• Let’s have a look at the following figure; the sample means
and variability are shown nicely in the scatter plot.
Figure 1.5: Scatter plot of tensile strength and cotton percentages
• One possible aim of this study is simply to determine which cotton
percentages are truly distinct from the others.
MATLAB Scatter Plot
• Open MATLAB and (1) create 𝑥 as 200 equally spaced
values between 0 and 3𝜋, (2) create 𝑦 as cosine values with
random noise, and (3) create a scatter plot by using the
function scatter.
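The three steps can also be sketched in Python, used here instead of MATLAB so that the data part runs with the standard library alone. The uniform noise in [0, 1), the fixed seed, and the helper name save_scatter are assumptions; the slide does not specify the noise model. Plotting needs matplotlib, so it is kept in a separate function that is only called if the library is installed.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible (an assumption)

n = 200
# 1) 200 equally spaced values between 0 and 3*pi, endpoints included
x = [3 * math.pi * i / (n - 1) for i in range(n)]
# 2) cosine values plus uniform noise in [0, 1) (assumed noise model)
y = [math.cos(xi) + random.random() for xi in x]


def save_scatter(xs, ys, path="scatter.png"):
    """3) Draw a scatter plot and save it to a file (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # draw to a file, no display window needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(xs, ys)
    ax.set_xlabel("x")
    ax.set_ylabel("cos(x) + noise")
    fig.savefig(path)


# save_scatter(x, y)  # uncomment if matplotlib is installed
print(f"{len(x)} points, x from {x[0]:.2f} to {x[-1]:.2f}")
```

The cosine shape should remain visible through the noise, which is the kind of pattern a scatter plot is meant to reveal.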
Scatter Plot (Cont.)
• Plots can illustrate information that allows the results of the
formal statistical inference to be better communicated to the
scientist or engineer.
• At times, plots or exploratory data analysis can teach the
analyst something not retrieved from the formal analysis.
• Graphics can nicely highlight violations of assumptions that
would otherwise go unobserved or be ignored.
• Let’s look next at the other types of graphical plots.
Stem-and-Leaf Plot
Table 1.4: Car Battery Life
Table 1.5: Stem-and-Leaf Plot of Battery Life
• The stem-and-leaf plot is a combination of tabular and
graphic display that can be very handy for analysing the
behavior of the distribution of statistical data
generated in large masses.
• For the number 2.6, the digit 2
is designated the stem and the
digit 6 is the leaf.
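The stem and leaf split can be sketched in Python as follows. The values below are hypothetical, not the entries of Table 1.4, and the function name stem_and_leaf is an assumption; the sketch also assumes non-negative values recorded to one decimal place.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (integer part) to the sorted leaves (first decimal digit)."""
    plot = defaultdict(list)
    for v in values:
        tenths = round(v * 10)            # 2.6 -> 26
        stem, leaf = divmod(tenths, 10)   # 26 -> stem 2, leaf 6
        plot[stem].append(leaf)
    return {stem: sorted(leaves) for stem, leaves in sorted(plot.items())}


# Hypothetical battery lives in years (illustrative values only).
lives = [2.6, 1.9, 3.4, 3.1, 2.2, 4.7, 3.3, 1.6, 3.0, 4.1]

table = stem_and_leaf(lives)
for stem, leaves in table.items():
    print(stem, "|", "".join(str(d) for d in leaves))
```

Each printed row is one stem followed by its leaves, so the length of a row already shows how many observations fall in that stem.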
Stem-and-Leaf Plot (Cont.)
• The stem-and-leaf plot of Table 1.5 contains only four stems
and consequently does not provide an adequate picture of
the distribution. What can we do then?
• The solution is to increase the number of stems in our plot.
• One simple way to accomplish this is to write each stem
value twice and then record the leaves 0, 1, 2, 3, and 4
opposite the appropriate stem value where it appears for the
first time, and the leaves 5, 6, 7, 8, and 9 opposite this
same stem value where it appears for the second time.
Double-stem-and-leaf
• This modified double-stem-and-leaf plot is illustrated in Table
1.6, where the stems corresponding to leaves 0 through 4
have been coded by the symbol (*) and the stems
corresponding to leaves 5 through 9 by the symbol (·).
Table 1.6 Double-Stem-and-Leaf Plot of Battery Life
Table 1.4: Car Battery Life
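A minimal Python sketch of the double-stem idea is given below. The data are hypothetical rather than the entries of Table 1.4, and the function name double_stem_and_leaf is an assumption. Leaves 0 through 4 go to the row coded (*) and leaves 5 through 9 to the row coded (·), as described above.

```python
def double_stem_and_leaf(values):
    """Split every stem into a '*' row (leaves 0-4) and a '·' row (leaves 5-9)."""
    rows = {}
    for v in values:
        stem, leaf = divmod(round(v * 10), 10)
        mark = "*" if leaf <= 4 else "·"
        rows.setdefault((stem, mark), []).append(leaf)
    # Order by stem, and within a stem put the '*' row before the '·' row.
    order = sorted(rows, key=lambda key: (key[0], key[1] != "*"))
    return {f"{stem}{mark}": sorted(rows[(stem, mark)]) for stem, mark in order}


# Hypothetical battery lives in years (illustrative values only).
lives = [2.6, 1.9, 3.4, 3.1, 2.2, 4.7, 3.3, 1.6, 3.0, 4.1]

double_table = double_stem_and_leaf(lives)
for label, leaves in double_table.items():
    print(label, "|", "".join(str(d) for d in leaves))
```

Rows with no observations are simply absent in this sketch; a printed table would normally still list them as empty stems.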
Frequency distribution
• Another way is through the use of the frequency distribution,
where the data, grouped into different classes or intervals,
can be constructed by counting the leaves belonging to each
stem and noting that each stem defines a class interval.
• In Table 1.5, the stem 1 with 2 leaves defines the interval
1.0–1.9 containing 2 observations; the stem 2 with 5 leaves
defines the interval 2.0–2.9 containing 5 observations and so
forth…
• Notice that the total number of observations here is 40.
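Counting the leaves per stem can be sketched in Python as below. The data are hypothetical and much smaller than the 40 observations of Table 1.4; the class labels follow the 1.0–1.9 pattern used above, and all variable names are assumptions.

```python
import math
from collections import Counter

# Hypothetical observations (illustrative values only, not Table 1.4).
values = [2.6, 1.9, 3.4, 3.1, 2.2, 4.7, 3.3, 1.6, 3.0, 4.1]

# Each stem (integer part) defines one class interval, e.g. 1 -> 1.0-1.9.
stem_counts = Counter(math.floor(v) for v in values)
frequency = {f"{stem}.0-{stem}.9": stem_counts[stem] for stem in sorted(stem_counts)}

for interval, count in frequency.items():
    print(f"{interval}: {count}")
print("total observations:", sum(frequency.values()))
```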
Histogram
• We obtain the proportion of the set of observations in each of
the classes by dividing each class frequency by the total
number of observations.
• A table listing relative frequencies is called a relative
frequency distribution.
• The relative frequency distribution for the data of Table 1.4,
showing the midpoint of each class interval, is given in Table
1.7.
• The information provided by a relative frequency distribution
in tabular form is easier to grasp if presented graphically.
Histogram (Cont.)
• Using the midpoint of each interval and the corresponding relative frequency, a
relative frequency histogram (Figure 1.6) can be constructed.
Table 1.7 Relative Frequency Distribution of Battery Life
Figure 1.6 Relative frequency histogram
Relative frequency = Frequency / Total no. of observations = 2 / 40 = 0.05
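The same division can be sketched in Python. Only the two class counts stated earlier for Table 1.5 (2 and 5 observations) and the total of 40 are used; the other classes are left out because their counts are not given in the text. The variable names are assumptions.

```python
total_observations = 40

# Class frequencies stated in the text for Table 1.5; other classes omitted.
class_frequency = {"1.0-1.9": 2, "2.0-2.9": 5}

# Relative frequency = class frequency / total number of observations.
relative_frequency = {
    interval: count / total_observations
    for interval, count in class_frequency.items()
}

for interval, rel in relative_frequency.items():
    print(f"{interval}: {rel:.3f}")
```

Over a complete table, the relative frequencies of all classes add up to 1.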
MATLAB Histogram
• Let’s generate 10,000 random numbers and create a histogram.
The histogram function automatically chooses an appropriate
number of bins to cover the range of values in x and show the
shape of the underlying distribution.
• For more detail, see the histogram function in the MATLAB
documentation.
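A comparable experiment can be sketched in Python with the standard library. Several choices here are assumptions: the numbers are drawn from a standard normal distribution, the number of bins is fixed at 20 instead of being chosen automatically, and the helper name histogram_counts is made up for this sketch. The histogram is printed as text rather than drawn.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible (an assumption)

# 10,000 random numbers; a standard normal distribution is assumed here.
x = [random.gauss(0.0, 1.0) for _ in range(10_000)]


def histogram_counts(data, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in data:
        # The largest value would land one past the end, so clamp it.
        i = min(int((v - lo) / width), n_bins - 1)
        counts[i] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


edges, counts = histogram_counts(x, n_bins=20)
scale = max(counts) / 50  # longest bar is 50 characters wide
for left, c in zip(edges, counts):
    print(f"{left:6.2f} | " + "#" * round(c / scale))
```

The printed bars should rise toward the middle bins and fall off on both sides, which is the shape of the underlying distribution that a histogram is meant to show.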
Estimating frequency distribution
• We obtain the proportion of the set of observations in each of
the classes by dividing each class frequency by the total
number of observations. Many continuous frequency
distributions can be represented graphically by the
characteristic bell-shaped curve of Figure 1.7.
Figure 1.7 Estimating frequency distribution
Estimating frequency distribution (Cont.)
• A probability distribution is said to be symmetric if it can be
folded along a vertical axis so that the two sides coincide.
• A distribution that lacks symmetry with respect to a vertical
axis is said to be skewed.
• The distribution illustrated in
Figure 1.8(a) is said to be
skewed to the right since it
has a long right tail and a
much shorter left tail. In
Figure 1.8(b) we see that the
distribution is symmetric,
while in Figure 1.8(c) it is
skewed to the left.
Figure 1.8 Skewness of data
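The direction of skewness can also be checked numerically. The sketch below uses the moment coefficient of skewness, a measure that is not introduced in this lecture and is brought in here only as an illustration: it is positive for a long right tail, zero for a symmetric sample, and negative for a long left tail. The three data sets are hypothetical.

```python
import statistics


def sample_skewness(data):
    """Moment coefficient of skewness: m3 / m2**1.5, using divisor n."""
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((v - mean) ** 2 for v in data) / n
    m3 = sum((v - mean) ** 3 for v in data) / n
    return m3 / m2 ** 1.5


# Hypothetical samples (illustrative values only).
right_skewed = [1, 1, 2, 2, 2, 3, 3, 4, 6, 10]   # long right tail
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]          # mirror image about 3
left_skewed = [-v for v in right_skewed]         # long left tail

for name, data in [("right", right_skewed),
                   ("symmetric", symmetric),
                   ("left", left_skewed)]:
    print(f"{name:9s} skewness = {sample_skewness(data):+.3f}")
```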
====Exercise====
===============
• The lengths of power failures, in minutes, are recorded in the
following table.
(a) Find the sample mean and sample median of the power-failure
times.
(b) Find the sample standard deviation of the power failure times.
====Exercise====
===============
• The following scores represent the final examination grades
for an elementary statistics course:
(a) Construct a stem-and-leaf plot for the examination grades in which
the stems are 1, 2, 3, . . . , 9.
(b) Construct a relative frequency histogram, draw an estimate of the
graph of the distribution, and discuss the skewness of the distribution.
(c) Compute the sample mean, sample median, and sample standard
deviation.
Thank you
Feel free to ask questions