Engineering Statistics
CHAPTER ONE
1.1 Introduction
Engineering is about bridging the gaps between problems and solutions, and
that process requires an approach called the scientific method. Many aspects of
engineering practice involve collecting, working with, and using data in the solution
of a problem, so knowledge of statistics is just as important to the engineer
as knowledge of any of the other engineering sciences. Statistical methods are a
powerful aid in designing new products and systems, improving existing designs, and
designing, developing, and improving production operations. Statistical methods are
used to help us describe and understand variability.
All natural processes, as well as those devised by humans, are subject to
variability. Civil engineers are aware, for example, that the compressive strength of
concrete, soil pressures, traffic flow, floods, and pollution loads in streams have wide
variations. To cope with uncertainty, the engineer must first obtain and investigate a
sample of data, such as a set of flow data or soil test results. The sample is used in
applying statistics and probability at the descriptive stage. For inferential purposes,
however, one needs to make decisions regarding the population from which the
sample is drawn. A data set comprises a number of measurements of a phenomenon
such as the failure load of a structural component. The quantities measured are
termed variables, each of which may take any one of a specified set of values.
Engineering Statistics Second Year
1.3 Populations and samples
In everyday language, the word “population” refers to all the people or
organisms contained within a specific country, area, region, etc. When we talk about
the population of Iraq, we usually mean something like “the total number of
people who currently reside in Iraq.” In the field of statistics, however, the term
population is defined operationally by the question we ask: it is the entire collection
of measurements about which we want to make a statement.
In statistics, we are interested in obtaining information about a total collection
of elements, which we will refer to as the population. The population is often too
large for us to examine each of its members. In such cases, we try to learn about the
population by choosing and then examining a subgroup of its elements. This
subgroup of a population is called a sample.
A population consists of all possible observations available from a particular
probability distribution. A sample is a particular subset of the population that an
experimenter measures and uses to investigate the unknown probability distribution.
A random sample is one in which the elements of the sample are chosen at random
from the population, and this procedure is often used to ensure that the sample is
representative of the population.
CHAPTER TWO
2.1 Introduction
Once a data set has been collected, the experimenter’s next task is to find an
informative way of presenting it. In general, a table of numbers is not very
informative, whereas a picture or graphical representation of the data set can be quite
informative. If “a picture is worth a thousand words,” then it is worth at least a
million numbers.
$$\sum_{i=1}^{k} f_i = f_1 + f_2 + \cdots + f_k = n$$
Value Frequency
x1 f1
x2 f2
. .
. .
. .
xk fk
Total n
It is sometimes useful to present a frequency distribution in terms of the
proportional or relative frequencies pi, which are defined by
$$p_i = \frac{f_i}{n} \qquad (i = 1, 2, \ldots, k)$$
In addition to the frequency distribution, it is often useful to present the
cumulative frequency distribution of a given variable. The cumulative frequency of xi
is defined as the sum of frequencies of values less than or equal to xi. We denote it by
Fi , and the proportional cumulative frequencies or cumulative relative frequency by
$$P_i = \frac{F_i}{n} \qquad (i = 1, 2, \ldots, k)$$
A table of proportional cumulative frequency distribution could be represented
as follows:
Example 2.1 :- Given the following frequency table, complete the other
elements of the table.
Score Frequency
80-94 8
95-109 14
110-124 24
125-139 16
140-154 13
Solution :-
Largest value = 154
Smallest value = 80
Range = 154 – 80 = 74
Take No. of classes = 5
Class Interval = (74)/5 =14.8 → take C.I.= 15
Lower boundary of the first class = 80-0.5 = 79.5
Upper boundary of the first class = 79.5 + 15 = 94.5
Example 2.2 :-Group the following data into classes and show its frequency table.
111 120 127 129 130 145 145 150 153 155 160 161
165 167 170 171 174 175 177 179 180 180 185 185
190 195 195 201 210 220 224 225 230 245 248
Solution :-
Largest value = 248
Smallest value = 111
Range = 248 – 111 = 137
Take No. of classes = 6
Class Interval = (137)/6 =22.8333 → take C.I.= 23
Lower boundary of the first class = 111-0.5 = 110.5
Upper boundary of the first class = 110.5 + 23 = 133.5
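The grouping steps of Example 2.2 can be sketched in a few lines of Python. This is only an illustrative sketch: the variable names are our own, and the class interval is rounded up to the next whole number, as the solution does when it takes C.I. = 23.

```python
import math

# Raw data from Example 2.2
data = [111, 120, 127, 129, 130, 145, 145, 150, 153, 155, 160, 161,
        165, 167, 170, 171, 174, 175, 177, 179, 180, 180, 185, 185,
        190, 195, 195, 201, 210, 220, 224, 225, 230, 245, 248]

num_classes = 6
data_range = max(data) - min(data)                      # 248 - 111
class_interval = math.ceil(data_range / num_classes)    # rounded up

# The first lower boundary sits half a unit below the smallest value.
first_lower = min(data) - 0.5
boundaries = [first_lower + i * class_interval
              for i in range(num_classes + 1)]

# Count the observations lying between consecutive boundaries.
frequencies = [sum(1 for x in data if lo < x < hi)
               for lo, hi in zip(boundaries, boundaries[1:])]

for lo, hi, f in zip(boundaries, boundaries[1:], frequencies):
    print(f"{lo:6.1f} - {hi:6.1f} : {f}")
```

Because the boundaries end in .5 and the data are whole numbers, no observation can fall exactly on a boundary, so every value is counted in exactly one class.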
2.4 Relative frequency
The relative frequency of a class is obtained by dividing the frequency for a class
by the sum of all the frequencies. The relative frequencies for the five classes in
Example 2.1 are shown below. The sum of the relative frequencies will always equal
one.
No.      Frequency (F)    Relative frequency (RF)
1        8                8/75 = 0.11
2        14               14/75 = 0.19
3        24               24/75 = 0.32
4        16               16/75 = 0.21
5        13               13/75 = 0.17
Total    75               1
2.5 Percentage
The percentage for a class is obtained by multiplying the relative frequency for
that class by 100. The percentages for the six classes in Example 2.2 are shown
below. The sum of the percentages for all the categories will always equal 100
percent.
No.      Frequency (F)    Relative frequency (RF)    Percentage
1        5                5/35 = 0.14                0.14 x 100 = 14 %
2        5                5/35 = 0.14                0.14 x 100 = 14 %
3        10               10/35 = 0.29               0.29 x 100 = 29 %
4        8                8/35 = 0.23                0.23 x 100 = 23 %
5        4                4/35 = 0.11                0.11 x 100 = 11 %
6        3                3/35 = 0.09                0.09 x 100 = 9 %
Total    35               1                          100 %
There are different types of cumulative frequency, as shown below for the data of
Example 2.1 (F: frequency, RF: relative frequency, ACF: ascending cumulative
frequency, RACF: relative ascending cumulative frequency, DCF: descending
cumulative frequency, RDCF: relative descending cumulative frequency):
No.      F     RF              ACF    RACF            DCF    RDCF
1        8     8/75 = 0.11     8      8/75 = 0.11     75     75/75 = 1
2        14    14/75 = 0.19    22     22/75 = 0.29    67     67/75 = 0.89
3        24    24/75 = 0.32    46     46/75 = 0.61    53     53/75 = 0.71
4        16    16/75 = 0.21    62     62/75 = 0.83    29     29/75 = 0.39
5        13    13/75 = 0.17    75     75/75 = 1       13     13/75 = 0.17
Total    75
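As a cross-check, the relative and cumulative frequencies for the five classes of Example 2.1 can be computed directly. The sketch below uses only the standard library; the variable names are ours.

```python
from itertools import accumulate

frequencies = [8, 14, 24, 16, 13]          # Example 2.1
n = sum(frequencies)

relative = [f / n for f in frequencies]    # RF
percentage = [100 * p for p in relative]

# Ascending cumulative frequency: running total from the first class.
acf = list(accumulate(frequencies))
# Descending cumulative frequency: running total from the last class.
dcf = list(accumulate(reversed(frequencies)))[::-1]

racf = [c / n for c in acf]
rdcf = [c / n for c in dcf]

for i, row in enumerate(zip(frequencies, relative, acf, racf, dcf, rdcf), 1):
    f, rf, a, ra, d, rd = row
    print(f"{i}  {f:3d}  {rf:.2f}  {a:3d}  {ra:.2f}  {d:3d}  {rd:.2f}")
```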
2.7.1 Histogram
The data are divided into groups according to their magnitudes. The horizontal
axis of the graph gives the magnitudes. Blocks are drawn to represent the groups,
each of which has a different upper and lower limit. The area of a block is
proportional to the number of occurrences in the group. The variability of the data is
shown by the horizontal spread of the blocks, and the most common values are found
in blocks with the largest areas. Other features such as the symmetry of the data or
lack of it are also shown. The first step is to take into account the range r of the
observations, that is, the difference between the largest and smallest values.
2.7.2 Frequency polygon
A frequency polygon is a useful tool for characterizing the distribution
of a variable. It can be drawn by joining the midpoints of the tops of the rectangles of
a histogram after extending the diagram by one class on both sides. We assume that
equal class widths are used. If the ordinates of a histogram are divided by the total
number of observations, then a relative frequency histogram is obtained. Thus, the
ordinates for each class denote the probabilities bounded by 0 and 1, by which we
simply mean the chances of occurrence. The resulting diagram is called the relative
frequency polygon. The polygon ends on the horizontal axis at a distance equal to
one half class interval after the upper boundary.
Example 2.3:-
For the following classified data, draw the histogram, frequency polygon, and
the cumulative frequency curves.
Class Limits 11 – 27 28 – 44 45 – 61 62 – 78 79 – 95
Frequency 7 9 15 8 6
Solution :-
(C.L.: class limits, F: frequency, C.M.: class mark, C.B.: class boundaries)
No.    C.L.       F     C.M.    C.B.           ACF    DCF
1      11 – 27    7     19      10.5 – 27.5    7      45
2      28 – 44    9     36      27.5 – 44.5    16     38
3      45 – 61    15    53      44.5 – 61.5    31     29
4      62 – 78    8     70      61.5 – 78.5    39     14
5      79 – 95    6     87      78.5 – 95.5    45     6
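The columns of this table, together with the points needed to draw the frequency polygon, can be derived from the class limits alone. The following sketch assumes integer class limits (so boundaries lie half a unit outside the limits) and equal class widths; the names `polygon_x` and `polygon_y` are our own.

```python
from itertools import accumulate

class_limits = [(11, 27), (28, 44), (45, 61), (62, 78), (79, 95)]
frequencies = [7, 9, 15, 8, 6]

class_marks = [(lo + hi) / 2 for lo, hi in class_limits]
class_boundaries = [(lo - 0.5, hi + 0.5) for lo, hi in class_limits]
acf = list(accumulate(frequencies))
dcf = list(accumulate(reversed(frequencies)))[::-1]

# Frequency polygon: join the class marks, after extending the diagram
# by one empty class on each side so the polygon closes on the axis.
width = class_boundaries[0][1] - class_boundaries[0][0]
polygon_x = [class_marks[0] - width] + class_marks + [class_marks[-1] + width]
polygon_y = [0] + frequencies + [0]

print(list(zip(polygon_x, polygon_y)))
```

The polygon's last point lies half a class interval beyond the upper boundary of the last class, which matches the description in Section 2.7.2.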
[Figure: histogram and frequency polygon (vertical axis: frequency)]
[Figure: ascending and descending cumulative frequency curves (vertical axis: cumulative frequency)]
CHAPTER THREE
3.1 Introduction
When describing a numerical data set, it is common to report both a value
that describes where the data distribution is centered along the number line and a
value that describes how spread out the data distribution is.
Measures of center describe where the data distribution is located along the number
line. A measure of center provides information about what is “typical.”
There is more than one way to measure center and spread in a data distribution.
The mean is the most commonly used measure of central tendency. The mean of a
set of n numbers x1, x2, x3, ……, xn is denoted by $\bar{x}$ and is defined as :-
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Example 3.1:-
For the following raw data on a variable X: 1, 4, 10, 8, 10. What is the mean of
X, that is, the value of $\bar{x}$?
Solution :-
$$\bar{x} = \frac{1 + 4 + 10 + 8 + 10}{5} = \frac{33}{5} = 6.6$$
$$\bar{x} = \frac{\sum_{i=1}^{n} x'_i f_i}{\sum_{i=1}^{n} f_i}$$
where :-
$x'_i$ : is the class mark of class (i).
$f_i$ : is the frequency of class (i).
n : the number of classes.
Example 3.2 :-
For the following classified data, find the mean.
Solution :-
$\bar{x}$ = 4468/49 = 91.18
Example 3.3:-
For the following raw data on a variable X: 1, 8, 5, 10, 15, 2 and variable
Y: 8, 7, 12, 8, 6, 2, 4, 3, 5, 11, 10. What is the median of X and Y?
Solution :-
For variable X:-
Arranging the data in an increasing sequence gives, 1, 2, 5, 8, 10, 15
Median = (5+8)/2=6.5
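The same procedure (sort the data, then take the middle value, or the average of the two middle values when the count is even) is what Python's standard `statistics.median` does. A minimal sketch applied to both variables of Example 3.3:

```python
import statistics

x = [1, 8, 5, 10, 15, 2]
y = [8, 7, 12, 8, 6, 2, 4, 3, 5, 11, 10]

# Even count: average of the two middle values of the sorted data.
median_x = statistics.median(x)
# Odd count: the single middle value of the sorted data.
median_y = statistics.median(y)

print(sorted(x), median_x)
print(sorted(y), median_y)
```

For Y, the eleven sorted values are 2, 3, 4, 5, 6, 7, 8, 8, 10, 11, 12, so the median is the sixth value, 7.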
Median of Classified Data
The median of classified data can be found by first determining the first
class whose ascending cumulative frequency (ACF) is greater than half of the
sum of all frequencies. This class is called the median class. The median
can then be determined as follows :-
$$\text{Median} = B_L + \frac{\frac{n}{2} - \left(\sum f\right)_L}{f_{\text{median class}}} \times C$$
where :-
BL : lower boundary of the median class.
n : number of observations (data).
(∑f )L : sum of the frequencies of all classes below the median class.
fmedian class : frequency of median class.
C : the median class interval.
Example 3.4 :-
For the following classified data, find the median.
Class Limits 30 – 39 40 – 49 50 – 59 60 – 69 70 – 79
Frequency 6 12 15 13 8
Solution :-
No. Class Limits Frequency Class Boundaries ACF
1 30 – 39 6 29.5 – 39.5 6
2 40 – 49 12 39.5 – 49.5 18
3 50 – 59 15 49.5 – 59.5 33
4 60 – 69 13 59.5– 69.5 46
5 70 – 79 8 69.5 – 79.5 54
n/2 = 54/2=27
Then the median class is (49.5 – 59.5) which is the third class.
BL= 49.5
n = 54
(∑f)L= 18
fmedian class= 15
C = 10
$$\text{Median} = 49.5 + \frac{\frac{54}{2} - 18}{15} \times 10 = 55.5$$
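The median-class search and the interpolation formula can be written as a short function. This is a sketch under the assumptions of the text (classes given by their boundaries, in ascending order); the function name `grouped_median` is our own.

```python
def grouped_median(boundaries, frequencies):
    """Median of classified data by interpolation inside the median class.

    boundaries  -- list of (lower, upper) class boundaries, ascending
    frequencies -- class frequencies in the same order
    """
    n = sum(frequencies)
    below = 0  # (sum f)_L: total frequency of the classes already passed
    for (lower, upper), f in zip(boundaries, frequencies):
        # The median class is the first one whose ACF exceeds n/2.
        if below + f > n / 2:
            return lower + (n / 2 - below) / f * (upper - lower)
        below += f
    raise ValueError("frequencies must contain at least one positive value")


# Example 3.4
boundaries = [(29.5, 39.5), (39.5, 49.5), (49.5, 59.5),
              (59.5, 69.5), (69.5, 79.5)]
frequencies = [6, 12, 15, 13, 8]
median = grouped_median(boundaries, frequencies)
print(median)
```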
3.2.3 The mode
The mode is the value in a data set that occurs the most often. If no such value
exists, we say that the data set has no mode. If two or more such values exist, we say
the data set is multimodal. There is no symbol that is used to represent the mode.
Solution :-
For variable X:-
Mode = 4 (its frequency is two)
There is only one mode.
Modal Class: The class which has the maximum number of observations.
Example 3.6:-
Find the mode for the following classified data.
Class Limits 30 – 39 40 – 49 50 – 59 60 – 69 70 – 79
Frequency 6 12 15 13 8
Solution :-
Modal class is ( 50 – 59 ) which is the third class.
Lw = 49.5
Δ1 = 15-12 = 3
Δ2 = 15-13 = 2
C = 10
∴ Mode = 49.5 + [3 / (3 + 2)] × 10 = 55.5
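The calculation in Example 3.6 follows the pattern Mode = Lw + [Δ1 / (Δ1 + Δ2)] × C, where Δ1 and Δ2 are the differences between the modal-class frequency and the frequencies of the classes before and after it. A hedged sketch of that pattern follows; the function name is ours, and treating a missing neighbouring class as having frequency 0 is our assumption, not something stated in the notes.

```python
def grouped_mode(boundaries, frequencies):
    """Mode of classified data from the modal class and its neighbours."""
    i = frequencies.index(max(frequencies))      # first modal class
    lower, upper = boundaries[i]
    # Assumption: a missing neighbour counts as frequency 0.
    before = frequencies[i - 1] if i > 0 else 0
    after = frequencies[i + 1] if i < len(frequencies) - 1 else 0
    delta1 = frequencies[i] - before
    delta2 = frequencies[i] - after
    return lower + delta1 / (delta1 + delta2) * (upper - lower)


# Example 3.6
boundaries = [(29.5, 39.5), (39.5, 49.5), (49.5, 59.5),
              (59.5, 69.5), (69.5, 79.5)]
frequencies = [6, 12, 15, 13, 8]
mode = grouped_mode(boundaries, frequencies)
print(mode)
```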
$$\log G = \frac{\sum_{i=1}^{n} \log x_i}{n}, \qquad i = 1, 2, 3, \ldots, n$$
Example 3.7:- Find the geometric mean for the following data 1,2,3,4,5.
Solution :-
n=5, total number of values.
Then $G = [x_1 x_2 x_3 \cdots x_n]^{1/n} = [1 \times 2 \times 3 \times 4 \times 5]^{1/5} = 2.60517$
3.2.5 The harmonic mean
The harmonic mean of a set of n-numbers x1,x2,x3,……,xn , can be evaluated as
follows :-
$$H = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}} = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \frac{1}{x_3} + \cdots + \frac{1}{x_n}}$$
Example 3.8:- Find the harmonic mean for the following data 1,2,3,4,5.
Solution :-
n=5, total number of values
Then,
$$H = \frac{5}{\frac{1}{1} + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5}} = 2.189781$$
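Both means can be checked numerically. The sketch below computes the geometric mean through the logarithmic form given above and the harmonic mean directly from its definition, for the data of Examples 3.7 and 3.8.

```python
import math

data = [1, 2, 3, 4, 5]
n = len(data)

# Geometric mean via logarithms: log G = (sum of log x_i) / n
log_g = sum(math.log10(x) for x in data) / n
geometric_mean = 10 ** log_g

# Harmonic mean: n divided by the sum of reciprocals
harmonic_mean = n / sum(1 / x for x in data)

print(round(geometric_mean, 5), round(harmonic_mean, 6))
```

The logarithmic form avoids forming the product of all values, which helps when the data set is long.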
3.3 Measures of dispersion
In addition to locating the center of the data, another important aspect of a
descriptive study of data is numerically measuring the extent of variation around the
center. Two data sets may show similar positions of center, but may be remarkably
different with respect to variability. For example, the following figure shows dot
diagrams with similar center values but different variations.
There are several forms that are used as measures of variation such as range,
mean absolute deviation, variance, and standard deviation.
$$\text{M.A.D.} = \frac{\sum_{i=1}^{n} \left| x_i - \bar{x} \right|}{n}$$
For grouped (classified) data, the M.A.D. can be calculated by the following
formula:-
$$\text{M.A.D.} = \frac{\sum_{i=1}^{c} f_i \left| x'_i - \bar{x} \right|}{n}$$
where :-
c : number of classes.
n : number of observations.
$x'_i$ : the class mark of class i in grouped data.
$f_i$ : the frequency of class i in grouped data.
$\bar{x}$ : the mean of (grouped or ungrouped) data.
Example 3.9:- Find the mean absolute deviation for the following data 1,2,4,5.
Solution :- Mean = (1+2+4+5)/4=12/4=3
M.A.D. =[(2+1+1+2)/4]=6/4=1.5
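For raw data and for grouped (classified) data, respectively, the variance is
$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}, \qquad S^2 = \frac{\sum_{i=1}^{c} f_i \,(x'_i - \bar{x})^2}{n - 1}$$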
Example 3.11 :- Find the coefficient of variation for the data in Example 3.10 above.
Solution :- Mean ($\bar{x}$) = (3+4+6+7+10)/5 = 6
Variance ($S^2$) = $[(-3)^2 + (-2)^2 + 0^2 + 1^2 + 4^2]/4$ = 7.5
Standard deviation (S) = $\sqrt{7.5}$ = 2.738
Coefficient of variation (C.V.) = $\frac{S}{\bar{x}} \times 100 = \frac{2.738}{6} \times 100$ = 45.6333 %
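These quantities correspond to the sample statistics (divisor n − 1) in Python's standard `statistics` module. A minimal sketch for the data used in this solution (3, 4, 6, 7, 10, read off from the mean calculation above):

```python
import statistics

data = [3, 4, 6, 7, 10]

mean = statistics.mean(data)
variance = statistics.variance(data)   # sample variance, divisor n - 1
std_dev = statistics.stdev(data)       # square root of the sample variance
cv_percent = std_dev / mean * 100      # coefficient of variation in percent

print(mean, variance, round(std_dev, 4), round(cv_percent, 2))
```

Without intermediate rounding the coefficient comes out as about 45.64 %; the hand calculation gives 45.6333 % because S was first rounded to 2.738.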
Example 3.12 :- Find the mean absolute deviation (M.A.D.), standard deviation (S),
and coefficient of variation (C.V.) for the grouped (classified) data below.
Class Limits 60 – 62 63 – 65 66 – 68 69 – 71 72 – 74
Frequency 5 18 42 27 8
Solution :-
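The class marks are 61, 64, 67, 70, and 73, and n = 100.
$$\bar{x} = \frac{\sum_{i=1}^{c} x'_i f_i}{\sum_{i=1}^{c} f_i} = 6745/100 = 67.45$$
$$\text{M.A.D.} = \frac{\sum_{i=1}^{c} f_i \left| x'_i - \bar{x} \right|}{n} = 226.5/100 = 2.265$$
$$S = \sqrt{\frac{\sum_{i=1}^{c} f_i \,(x'_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{852.75}{99}} = 2.9349$$
$$\text{C.V.} = \frac{S}{\bar{x}} \times 100 = (2.9349/67.45) \times 100 = 4.3512\ \%$$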
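The grouped-data calculations of Example 3.12 can be checked with a short script. The sketch assumes the usual grouped-data convention in which every squared or absolute deviation is weighted by its class frequency, and uses the divisor n − 1 for the variance; variable names are ours.

```python
import math

class_limits = [(60, 62), (63, 65), (66, 68), (69, 71), (72, 74)]
frequencies = [5, 18, 42, 27, 8]

marks = [(lo + hi) / 2 for lo, hi in class_limits]   # class marks x'_i
n = sum(frequencies)

mean = sum(f * x for f, x in zip(frequencies, marks)) / n
mad = sum(f * abs(x - mean) for f, x in zip(frequencies, marks)) / n

# Frequency-weighted sum of squared deviations, divided by n - 1.
sum_sq = sum(f * (x - mean) ** 2 for f, x in zip(frequencies, marks))
std_dev = math.sqrt(sum_sq / (n - 1))
cv_percent = std_dev / mean * 100

print(mean, mad, round(sum_sq, 2), round(std_dev, 4), round(cv_percent, 4))
```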
Homework A: The following table gives statistical data on ages and their
frequencies. Find the following: the mean, median, mode, mean absolute
deviation, standard deviation, and coefficient of variation.
Age 5 – 14 15 – 24 25 – 34 35 – 44 45 – 54
Frequency 750 2005 1950 195 100
Homework B: The table below gives the age distribution of
individuals starting new companies. Find the mean, median, mean absolute
deviation, standard deviation, and coefficient of variation for this distribution.
Age 20 – 29 30 – 39 40 – 49 50 – 59 60 – 69
Frequency 11 25 14 7 3
CHAPTER FOUR
THE PROBABILITY
4.1 Introduction
Probability theory is a mathematical theory to describe and analyze situations
where randomness or uncertainty is present. An experiment that can result in
different outcomes, even though it is repeated in the same manner every time, is
called a random experiment. After the experiment is over, we call the result an
outcome. For any given experiment, there is a set of possible outcomes. The set of
all possible outcomes of a random experiment is called the sample
space, denoted S. A subset of the sample space of a random experiment is called an
event.
An empty collection has probability zero and the whole collection one. Also,
the probability of non-occurrence of event (A) is given by:
$$q(A) = P(\text{not } A) = \frac{n - k}{n} = 1 - \frac{k}{n} = 1 - P(A)$$
Example 4.1:- As shown in the figure below, a sample space S consists of eight
outcomes, each of which is labeled with a probability value. Find
P(A) and q(A).
Solution :-
P(A)= 0.10+0.15+0.30=0.55
q(A)=0.10+0.10+0.05+0.15+0.05=0.45
or q(A)=1-P(A)=1-0.55=0.45 because P(A)+q(A)=1, which is a general rule.
Example 4.2:- The table below gives the age distribution of
individuals starting new companies. Find the following
probabilities: 1-Ages (30-39). 2-Age=45. 3-Age>50. 4-Age<40.
Age 20 – 29 30 – 39 40 – 49 50 – 59 60 – 69
Frequency 11 25 14 7 3
Solution :-
1- The probability of ages between (30-39) = 25/60= 0.41667
2- The probability of age (45) = probability of the class/class interval
= (14/60)/10=0.02333
3- The probability of age > 50 = probability of the classes (50-59) and (60-69)
= (7/60) + (3/60) = (10/60) = 0.16667
4- The probability of age < 40 = probability of the classes (20-29) and (30-39)
= (11/60) + (25/60) = (36/60) = 0.6
Example 4.3:- Figure below shows a sample space S that consists of nine
outcomes. Event A consists of three outcomes, and event B
consists of five outcomes. Find P(A∩B), P(A∩A´), P(A´∩B),
and P(A∩B´).
Solution :-
P(A∩B) = 0.07 + 0.19 = 0.26 (there are two outcomes contained within both
events A and B), which is the probability that both events A and B occur
simultaneously.
Event A´, the complement of the event A, is the event consisting of the six outcomes
that are not in event A. Notice that there are obviously no outcomes in
A∩A´, and this is written as A∩A´ = ∅. Where ∅ is referred to as the “empty set,” a
set that does not contain anything.
P(A∩A´) = P(∅) = 0
And it is impossible for the event A to occur at the same time as its complement (A´).
A more interesting event is the event A´∩B. This event consists of the three
outcomes that are contained within event B but that are not contained within event A.
It has a probability of
P(A´∩B) = 0.04 + 0.14 + 0.12 = 0.30 (which is the probability that event B occurs
but event A does not occur.)
P(A∩B´) = 0.01 (This is the probability that event A occurs but event B does not.)
Notice that
P(A∩B) + P(A∩B´) = 0.26 + 0.01 = 0.27 = P(A)
and similarly that
P(A∩B) + P(A´∩B) = 0.26 + 0.30 = 0.56 = P(B)
The following two equalities hold in general for all events A and B:
P(A∩B) + P(A∩B´) = P(A)
P(A∩B) + P(A´∩B) = P(B)
Two events A and B that have no outcomes in common are said to be mutually
exclusive events. In this case A∩B=∅ and P(A∩B)=0.
Some other simple results concerning the intersections of events are as follows:
Notice that the outcomes in the event A∪B can be classified into three kinds.
They are
1- in event A, but not in event B
2- in event B, but not in event A
3- in both events A and B
Since the probability of A∪B is obtained as the sum of the probability values of the
outcomes within these three (mutually exclusive) events, the following result is obtained:
P(A∪B) = P(A∩B´) + P(A´∩B) + P(A∩B)
Substituting P(A∩B´) = P(A) − P(A∩B) and P(A´∩B) = P(B) − P(A∩B) gives the following result:
P(A∪B) = P(A) + P(B) − P(A∩B)
This equality has the intuitive interpretation that the probability of at least one of the
events A and B occurring can be obtained by adding the probabilities of the two
events A and B and then subtracting the probability that both the events occur
simultaneously.
Example 4.4:- The sample space of nine outcomes illustrated in Example 4.3 can
be used to demonstrate some general relationships between unions
and intersections of events.
Solution :-
The event (A∪B) consists of the six outcomes, and it has a probability of
P(A∪B) = 0.01 + 0.07 + 0.19 + 0.04 + 0.14 + 0.12 = 0.57
The event (A∪B)´, which is the complement of the union of the events A and B,
consists of the three outcomes that are neither in event A nor in event B. It has a
probability of
P((A∪B)´) = 0.03 + 0.22 + 0.18 = 0.43 = 1 − P(A∪B)
Notice that the event (A∪B)´ can also be written as A´∩B´ since it consists of
those outcomes that are simultaneously neither in event A nor in event B. This is a
general result:
(A∪B)´ = A´∩B´
Furthermore, the event A´∪B´ consists of the seven outcomes, and it has a
probability of
P(A´∪B´) = 0.01 + 0.03 + 0.22 + 0.18 + 0.12 + 0.14 + 0.04 = 0.74
However, this event can also be written as (A∩B)´ since it consists of the outcomes
that are in the complement of the intersection of sets A and B. Hence, its probability
could have been calculated by
P(A´∪B´) = P((A∩B)´) = 1 − P(A∩B) = 1 − 0.26 = 0.74
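These relationships between unions, intersections, and complements can be verified with Python sets. In the sketch below the nine probability values are those quoted in Examples 4.3 and 4.4; the outcome labels (`a1`, `ab1`, ...) are hypothetical names introduced only for illustration.

```python
# Probability of each outcome; the labels are hypothetical.
prob = {
    "a1": 0.01,                              # in A only
    "ab1": 0.07, "ab2": 0.19,                # in both A and B
    "b1": 0.04, "b2": 0.14, "b3": 0.12,      # in B only
    "n1": 0.03, "n2": 0.22, "n3": 0.18,      # in neither A nor B
}
S = set(prob)
A = {"a1", "ab1", "ab2"}
B = {"ab1", "ab2", "b1", "b2", "b3"}


def P(event):
    """Probability of an event = sum over the outcomes it contains."""
    return sum(prob[o] for o in event)


not_A, not_B = S - A, S - B

print(round(P(A & B), 2), round(P(A | B), 2))
print(round(P(S - (A | B)), 2), round(P(not_A | not_B), 2))

# De Morgan's laws hold as set identities.
assert S - (A | B) == not_A & not_B
assert S - (A & B) == not_A | not_B
```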
4.3.3 Combinations of Three or More Events
The probability of the union of three events A, B, and C is the sum of the
probability values of the simple outcomes that are contained within at least one of the
three events. It can also be calculated from the expression
P(A∪B∪C) =[P(A)+P(B)+P(C)]-[P(A∩B)+P(A∩C)+P(B∩C)]+P(A∩B∩C)
Example 4.5:- Three events decompose the sample space into eight regions as
shown in the figure below. Find P(A∪B∪C) ?
Solution :-
The event A is composed of the regions 2, 3, 5, and 6, and the event A∩B is
composed of the regions 3 and 6. The event A∩B∩C, the intersection of the events
A, B, and C, consists of the outcomes that are simultaneously contained within all
three events A, B, and C , it corresponds to region 6. The event A∪B∪C, the
union of the events A, B, and C, consists of the outcomes that are in at least one of
the three events A, B, and C , it corresponds to all of the regions except for region 1.
Hence region 1 can be referred to as (A∪B∪C)´ since it is the complement of the
event A∪B∪C.
The event B∩C is composed of regions 6 and 7, so A∪(B∩C) is composed of
regions 2, 3, 5, 6, and 7. In contrast, the event A∪B is composed of regions 2, 3, 4,
5, 6, and 7, so (A∪B)∩C is composed of just regions 5, 6, and 7.
The required probability, P(A∪B∪C), is the sum of the probability values of the
outcomes in regions 2, 3, 4, 5, 6, 7, and 8. However, the sum of the probabilities
P(A), P(B), and P(C) counts regions 3, 5, and 7 twice, and region 6 three times.
Subtracting the probabilities P(A∩B), P(A∩C), and P(B∩C) removes the double
counting of regions 3, 5, and 7 but also subtracts the probability of region 6 three
times. The expression is then completed by adding back on P(A∩B∩C), the
probability of region 6.
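The counting argument above can be checked by representing each event as a set of region numbers. The region memberships below follow the description in Example 4.5; the region probabilities are hypothetical (equal weights of 1/8), since the notes give them only in the figure.

```python
from fractions import Fraction

# Events as sets of regions, as described in Example 4.5.
A = {2, 3, 5, 6}
B = {3, 4, 6, 7}
C = {5, 6, 7, 8}

# Hypothetical region probabilities (the figure's values are not in the text).
region_prob = {r: Fraction(1, 8) for r in range(1, 9)}


def P(event):
    return sum(region_prob[r] for r in event)


direct = P(A | B | C)   # sum over regions 2 to 8
by_formula = (P(A) + P(B) + P(C)
              - (P(A & B) + P(A & C) + P(B & C))
              + P(A & B & C))

print(direct, by_formula)
```

Exact fractions are used so the two sides can be compared without rounding error.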
The figure below illustrates three events A, B, and C that are mutually exclusive
because no two events have any outcomes in common. In this case,
P(A∪B∪C) = P(A) + P(B) + P(C)
In general
The union and intersection operations satisfy the following laws: For any subsets A,
B, C of S, we have:
4.4 Combinations Rule
A second method of determining the number of sample points for an
experiment is to use combinatorial mathematics. This branch of mathematics is
concerned with developing counting rules for given situations. For example, there is a
simple rule for finding the number of different samples of 5 students selected from
1,000. This rule, called the combinations rule, is given below.
$$\binom{N}{n} = \frac{N!}{n!\,(N - n)!}$$
Where: -
n! = n (n-1) (n-2) …. (3)(2)(1)
and similarly for N! and (N – n)!
Example 4.6: - Consider the task of choosing 2 students from 4 students. Use the
combinations counting rule to determine how many different
selections can be made.
Solution: -
For this example, N = 4, n = 2, and
$$\binom{4}{2} = \frac{4!}{2!\,2!} = \frac{24}{4} = 6$$
Classwork: - Compute the number of ways you can select n elements from N
elements for each of the following:
a. n = 2, N = 5
b. n = 3, N = 6
c. n = 5, N = 20
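The combinations rule translates directly into code. The sketch below implements the factorial formula and compares it with Python's built-in `math.comb` for Example 4.6; the student labels are hypothetical. The same function can be used to check the classwork answers.

```python
import math
from itertools import combinations


def n_choose(N, n):
    """Number of ways to select n elements from N: N! / (n! (N - n)!)."""
    return math.factorial(N) // (math.factorial(n) * math.factorial(N - n))


# Example 4.6: choose 2 students from 4 (labels are hypothetical).
students = ["S1", "S2", "S3", "S4"]
selections = list(combinations(students, 2))

print(n_choose(4, 2), math.comb(4, 2), len(selections))
print(selections)
```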
Example 4.7: - There are 20 engineers available for three different positions: E1, E2, and E3.
In how many different ways can the positions be filled?
Solution: -
There are k = 3 sets of elements, corresponding to
Set 1: engineers available to fill position E1
Set 2: engineers remaining (after filling E1) that are available to fill E2
Set 3: engineers remaining (after filling E1 and E2) that are available to fill E3 .
The numbers of elements in the sets are n1= 20, n2= 19, and n3= 18.
Therefore, the number of different ways of filling the three positions is given by the
multiplicative rule as n1 n2 n3 = (20) (19) (18) = 6840.
1. Multiplicative rule. If you are drawing one element from each of k sets of
elements, where the sizes of the sets are n1, n2, …, nk, then the number of
different results is
$$n_1 \, n_2 \, n_3 \cdots n_k$$
2. Permutations rule. If you are drawing n elements from a set of N elements and
arranging the n elements in a distinct order, then the number of different results
is
$$P_n^N = \frac{N!}{(N - n)!}$$
3. Partitions rule. If you are partitioning the elements of a set of N elements into k
groups consisting of n1, n2, …, nk elements (n1 + … + nk = N), then the
number of different results is
$$\frac{N!}{n_1! \, n_2! \cdots n_k!}$$
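The three counting rules can be sketched with the standard `math` module. The first two calls reproduce Example 4.7 (20 engineers, three positions); the partition example (10 elements split into groups of 3, 3, and 4) is a hypothetical illustration, not taken from the notes.

```python
import math


def multiplicative(*set_sizes):
    """One element from each of k sets: n1 * n2 * ... * nk."""
    return math.prod(set_sizes)


def permutations(N, n):
    """Ordered arrangements of n elements drawn from N: N! / (N - n)!."""
    return math.factorial(N) // math.factorial(N - n)


def partitions(*group_sizes):
    """Ways to split N = n1 + ... + nk elements into groups of the given sizes."""
    N = sum(group_sizes)
    denominator = math.prod(math.factorial(g) for g in group_sizes)
    return math.factorial(N) // denominator


print(multiplicative(20, 19, 18))   # Example 4.7
print(permutations(20, 3))          # same count via the permutations rule
print(partitions(3, 3, 4))          # hypothetical: 10 elements into 3, 3, 4
```

Filling three distinct positions from 20 engineers is an ordered selection, which is why the multiplicative and permutations rules agree here.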
Example 4.10: - The table below gives the number of persons
in a small city. Find the conditional probability that a man is
chosen, given that the person chosen is educated.
Educated Uneducated Total
Male 460 40 500
Female 140 260 400
Total 600 300 900
Solution: -
For this example, M: a man is chosen, E: the one chosen is educated
P(M|E) = P(E∩M)/P(E) = (460/900)/(600/900) = 0.7667
Example 4.11: - The following table summarizes the analysis of samples of
galvanized steel for coating weight and surface roughness. The
results from 100 samples are shown.
                             Coating Weight
                             High     Low
Surface Roughness   High     70       9
                    Low      16       5
Let A denote the event that a sample has high coating weight,
and let B denote the event that a sample has high surface
roughness. Determine the following probabilities:
(a) P(A) (b) P(B) (c) P(A|B) (d) P(B|A)
Solution: -
(a) P(A)=(70+16)/100=86/100
(b) P(B)=(70+9)/100=79/100
(c) P(A|B)=P(B∩A)/P(B)= (70/100)/(79/100)=70/79
(d) P(B|A)=P(A∩B)/P(A)= (70/100)/(86/100)=70/86
P(A)=P(B1)P(A|B1)+P(B2)P(A|B2)+…….+P(Bk)P(A|Bk) , Or
$$P(A) = \sum_{i=1}^{k} P(B_i \cap A) = \sum_{i=1}^{k} P(B_i)\, P(A \mid B_i)$$
Example 4.12: - A plant has three machines, B1, B2, and B3, which make 30%, 45%, and
25%, respectively, of the products. It is known from past experience
that 2%, 3%, and 2% of the products made by each machine,
respectively, are BAD. Now, suppose that a finished product is
randomly selected. What is the probability that it is BAD?
Solution: -
Consider the following events:
A : the product is BAD,
B1 : the product is made by machine B1 ,
B2 : the product is made by machine B2 ,
B3 : the product is made by machine B3 .
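Applying the rule of elimination, with the help of a tree diagram,
P(A)= P(B1)P(A|B1) + P(B2)P(A|B2) + P(B3)P(A|B3)
Referring to the tree diagram, the three branches give the probabilities
P(B1)P(A|B1) = (0.30)(0.02) = 0.006
P(B2)P(A|B2) = (0.45)(0.03) = 0.0135
P(B3)P(A|B3) = (0.25)(0.02) = 0.005
and hence P(A) = 0.006 + 0.0135 + 0.005 = 0.0245.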
Bayes’s rule can be applied when an observed event A occurs with any one of
several mutually exclusive and exhaustive events, B1, B2, …, Bk. The formula
for finding the appropriate conditional probabilities is given below.
Bayes’s Rule
Given k mutually exclusive and exhaustive events, B1, B2, …, Bk such that
P(B1)+P(B2)+…..+P(Bk)=1, and given an observed event A, it follows that
P(Bi|A)=P(Bi∩A)/P(A)
Where: P(A)=P(B1)P(A|B1)+P(B2)P(A|B2)+…….+P(Bk)P(A|Bk)
Example 4.13: - For Example 4.12, if a product was chosen randomly and found to
be BAD, what is the probability that it was made by machine B3?
Solution: -
Using Bayes’s rule,
P(B3|A)=P(B3∩A)/P(A)
P(B3∩A)=P(B3)P(A|B3)= (0.25)(0.02) = 0.005,
P(A)=P(B1)P(A|B1)+P(B2)P(A|B2)+P(B3)P(A|B3)= 0.006+0.0135+0.005= 0.0245.
Then,
P(B3|A)= 0.005/0.0245=0.2041
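The rule of elimination and Bayes’s rule for Examples 4.12 and 4.13 fit in a few lines. This is an illustrative sketch; the dictionary names are our own.

```python
# Share of production made by each machine: P(Bi)
prior = {"B1": 0.30, "B2": 0.45, "B3": 0.25}
# Probability that a product is BAD given the machine: P(A | Bi)
bad_given_machine = {"B1": 0.02, "B2": 0.03, "B3": 0.02}

# Rule of elimination (total probability): P(A) = sum of P(Bi) P(A | Bi)
joint = {m: prior[m] * bad_given_machine[m] for m in prior}
p_bad = sum(joint.values())

# Bayes's rule: P(Bi | A) = P(Bi and A) / P(A)
posterior = {m: joint[m] / p_bad for m in prior}

print(round(p_bad, 4))
for machine, p in posterior.items():
    print(machine, round(p, 4))
```

The three posterior probabilities sum to 1, since a BAD product must have come from exactly one of the machines.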
Example 4.14: - The table shows the probability of product failure for each
level of contamination in manufacturing:
Probability of Failure Level of Contamination
0.10 High
0.01 Medium
0.001 Low
In a particular production run, 20% of the chips are subjected to high levels of
contamination, 30% to medium levels of contamination, and 50% to low levels of
contamination. What is the probability that a product using one of these chips
fails?
Solution: -
Let:
F denote the event that the product fails
H denote the event that a chip is exposed to high levels of contamination
M denote the event that a chip is exposed to medium levels of contamination
L denote the event that a chip is exposed to low levels of contamination
Then,
P(F)=P(F|H)P(H)+P(F|M)P(M)+P(F|L)P(L)=0.10(0.20)+0.01(0.30)+0.001(0.50)=0.0235
The calculations are conveniently organized with the tree diagram.
4.10 Geometric Probability
Probability is always expressed as a ratio between 0 and 1 that gives a value to
how likely an event is to happen. A probability of 0 means there is no chance of that
event happening. A probability of 1 means the particular event will always happen.
To calculate a geometric probability, you need to find the areas of the shapes
involved in the problem. You need to know the total area, which means the biggest
area in the diagram, and the desired area. The formula is simply:
$$P = \frac{\text{desired area}}{\text{total area}}$$
Example 4.15: A circle is inscribed in a square of side 5 cm. Find the probability
that a point chosen at random inside the square lies inside the circle.
Solution:-
1- Draw both the circle and the square.
2- When a circle is inscribed in a square, it touches each of the four sides of the
square at one point, so its diameter equals the side of the square (r = 2.5 cm).
Area of circle = πr²
= 3.14 × (2.5)² = 19.63 cm²
Area of square = side²
= (5)² = 25 cm²
P = desired / total = 19.63 / 25 = 0.785
Example 4.16: Two numbers are chosen at random, each with a value between 0 and 1.
What is the probability that the sum of their squares is
greater than 1?
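Solution:-
1. Represent each number by a coordinate axis and then determine the
sample space.
[Figure: the unit square 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, divided by the quarter circle x² + y² = 1 into the region x² + y² < 1 and the region x² + y² > 1]
Then,
P = Desired / Total
Total = 1 × 1 = 1
Desired = 1 − (π × 1² / 4) = 1 − π/4
P = (1 − π/4) / 1 ≈ 0.2146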
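The area-ratio result can be checked numerically. The sketch below computes the exact value 1 − π/4 and compares it with a simple simulation that draws pairs of random numbers in the unit square; the seed and the number of trials are arbitrary choices of ours.

```python
import math
import random

# Exact geometric probability: area outside the quarter circle / area of square
exact = 1 - math.pi / 4

# Simulation: draw (x, y) uniformly in the unit square and count
# how often x^2 + y^2 exceeds 1.
rng = random.Random(0)      # fixed seed so the run is repeatable
trials = 100_000
hits = 0
for _ in range(trials):
    x, y = rng.random(), rng.random()
    if x * x + y * y > 1:
        hits += 1
estimate = hits / trials

print(round(exact, 4), estimate)
```

The simulated proportion should agree with the exact value to about two decimal places for this number of trials.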
CHAPTER FIVE
PROBABILITY DISTRIBUTION
5.1 Introduction
In an experiment, a measurement is usually denoted by a variable such as X. In
a random experiment, a variable whose measured value can change (from one
replicate of the experiment to another) is referred to as a random variable. There are
two types of random variables. A discrete random variable is a random variable with
a finite (or countably infinite) set of real numbers for its range. A continuous random
variable is a random variable with an interval (either finite or infinite) of real numbers
for its range.
Notice that cost is a random variable because its values 50, 200, and 350 are
numbers. The breakdown cause, defined to be electrical, mechanical, or operator
misuse, is not considered to be a random variable because its values are not
numerical.
5.2 Discrete probability distribution
For a discrete random variable X, its distribution can be described by a
function that specifies the probability at each of the possible discrete values for X.
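$$(1)\ f(x_i) \ge 0$$
$$(2)\ \sum_{i=1}^{n} f(x_i) = 1$$
$$(3)\ f(x_i) = P(X = x_i)$$

$$(1)\ F(x) = P(X \le x) = \sum_{x_i \le x} f(x_i)$$
$$(2)\ 0 \le F(x) \le 1$$
$$(3)\ \text{if } x \le y, \text{ then } F(x) \le F(y)$$

$$p(x) = \begin{cases} 0.8 & \text{if } x = 0 \\ 0.2 & \text{if } x = 1 \\ 0 & \text{if } x \ne 0 \text{ or } 1 \end{cases}$$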
Example 5.2: - Determine the probability mass function (pmf) of X from the
following cumulative distribution function:
$$F(x) = \begin{cases} 0 & x < -2 \\ 0.2 & -2 \le x < 0 \\ 0.7 & 0 \le x < 2 \\ 1 & 2 \le x \end{cases}$$
Solution.
The Figure above displays a plot of F(x) . From the plot, the only points that receive
nonzero probability are –2, 0, and 2. The probability mass function (pmf) at each
point is the jump in the cumulative distribution function at the point.
Therefore,
f (−2) = 0.2 − 0 = 0.2
f (0) = 0.7 − 0.2 = 0.5
f (2) = 1.0 − 0.7 = 0.3
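The "jump" rule used in this solution can be sketched as follows: list the points where F(x) steps up together with the value of F at each point, and take successive differences. The list name `cdf_steps` is our own.

```python
# (x, F(x)) at each point where the cumulative distribution function jumps.
cdf_steps = [(-2, 0.2), (0, 0.7), (2, 1.0)]

pmf = {}
previous = 0.0                     # F(x) = 0 to the left of the first jump
for x, F in cdf_steps:
    # The pmf at x is the size of the jump in F at x.
    pmf[x] = round(F - previous, 10)
    previous = F

print(pmf)
```

Rounding is applied only to suppress floating-point noise in the subtraction.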
The mean of X is
$$\mu = E(X) = \frac{b + a}{2}$$
The variance of X is
$$\sigma^2 = \frac{(b - a + 1)^2 - 1}{12}$$
Example 5.3: - The first digit of a serial number is any one of the digits 0 through 9. If
one part is selected and X is the first digit, find f(x).
Solution.
X has a discrete uniform distribution with probability 0.1 for each value. That is,
f(x) = 0.1, x = 0, 1, 2, ......, 9
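For the digit example, the mean and variance formulas above can be compared with a direct calculation from the pmf. This sketch takes a = 0 and b = 9 as the smallest and largest values of X.

```python
a, b = 0, 9
values = range(a, b + 1)
f = 1 / len(values)                 # equal probability for every value

# Direct calculation from the definition of mean and variance.
mean = sum(x * f for x in values)
variance = sum((x - mean) ** 2 * f for x in values)

# Closed-form expressions for the discrete uniform distribution.
mean_formula = (b + a) / 2
variance_formula = ((b - a + 1) ** 2 - 1) / 12

print(round(mean, 4), mean_formula)
print(round(variance, 4), variance_formula)
```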
The random variable X that equals the number of trials that result in a success
is a binomial random variable with parameters 0 < p < 1 and n = 1, 2, …. The
probability mass function of X is
$$f(x) = \binom{n}{x} p^x (1 - p)^{n - x}, \qquad x = 0, 1, 2, \ldots, n$$
where $\binom{n}{x}$ represents the value $\frac{n!}{x!\,(n - x)!}$.
If X is a binomial random variable with parameters p and n,
The mean of X is
$$\mu = E(X) = n p$$
The variance of X is
$$\sigma^2 = n p (1 - p)$$
Example 5.5:- Each sample of water has a 10% chance of containing a particular
organic pollutant. Assume that the samples are independent with regard to the
presence of the pollutant. Find the following :-
(a) The probability that in the next 18 samples, exactly 2 contain the pollutant.
(b) The probability that at least four samples contain the pollutant in 18 samples.
(c) The probability that 3 ≤ X < 7, where X is the number of samples that contain the pollutant.
Solution
Let X = the number of samples that contain the pollutant in the next 18
samples analyzed. Then X is a binomial random variable with p = 0.1 and
n = 18.
(a) $P(X = 2) = \binom{18}{2} 0.1^{2} (1 - 0.1)^{18 - 2} = 0.284$
(b) $P(X \ge 4) = \sum_{x=4}^{18} \binom{18}{x} 0.1^{x} (1 - 0.1)^{18 - x} = 1 - \sum_{x=0}^{3} \binom{18}{x} 0.1^{x} (1 - 0.1)^{18 - x} = 0.098$
(c) $P(3 \le X < 7) = \sum_{x=3}^{6} \binom{18}{x} 0.1^{x} (1 - 0.1)^{18 - x} = 0.265$
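The three probabilities of Example 5.5 can be reproduced with a short sketch built on the binomial probability mass function. The helper name `binom_pmf` is chosen here for illustration; only the standard-library function `math.comb` is used.

```python
from math import comb

def binom_pmf(x, n, p):
    """Binomial pmf: C(n, x) * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 18, 0.1  # 18 water samples, 10% chance of pollutant in each

p_a = binom_pmf(2, n, p)                              # exactly 2
p_b = 1 - sum(binom_pmf(x, n, p) for x in range(4))   # at least 4
p_c = sum(binom_pmf(x, n, p) for x in range(3, 7))    # 3 <= X < 7

print(round(p_a, 3), round(p_b, 3), round(p_c, 3))  # 0.284 0.098 0.265
```

Part (b) uses the complement because summing four terms is shorter than summing fifteen.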
Example 5.6:- Let X be a binomial random variable with p = 0.1 and n = 10. Calculate
the following probabilities from the binomial probability mass function:
(a) $P(X \le 2)$ (b) $P(X = 4)$ (c) $P(5 \le X \le 7)$
Solution
(a) $P(X \le 2) = \sum_{x=0}^{2} \binom{10}{x} 0.1^{x} (1 - 0.1)^{10 - x} = 0.9298$
(b) $P(X = 4) = \binom{10}{4} 0.1^{4} (1 - 0.1)^{6} = 0.0112$
(c) $P(5 \le X \le 7) = \sum_{x=5}^{7} \binom{10}{x} 0.1^{x} (1 - 0.1)^{10 - x} = 0.0016$
For example, for N = 10, the entry in the column corresponding to p = 0.10 and the
row corresponding to x = 2 is 0.930, and its interpretation is
$$P(x \le 2) = \sum_{x=0}^{2} \binom{10}{x} 0.1^{x} (1 - 0.1)^{10 - x} = P(x = 0) + P(x = 1) + P(x = 2) = 0.930$$
Classwork: - For N = 20 and p = 0.6, use the binomial tables to calculate the following:
1- The probability that x ≤ 10
2- The probability that x > 12
3- The probability that x = 11
Ans.
1- 0.245 2- 0.416 3- 0.159
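A table entry is simply a cumulative binomial sum, so the classwork can also be checked without the tables. The sketch below assumes the three items are read as P(x ≤ 10), P(x > 12), and P(x = 11), which is the reading consistent with the listed answers; `binom_cdf` is a helper name introduced here.

```python
from math import comb

def binom_cdf(x, n, p):
    """Cumulative binomial probability P(X <= x), i.e. one table entry."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))

# The worked table entry: N = 10, p = 0.10, x = 2.
table_entry = binom_cdf(2, 10, 0.1)

# Classwork: N = 20, p = 0.6.
n, p = 20, 0.6
le_10 = binom_cdf(10, n, p)                       # P(x <= 10)
gt_12 = 1 - binom_cdf(12, n, p)                   # P(x > 12)
eq_11 = binom_cdf(11, n, p) - binom_cdf(10, n, p) # P(x = 11)

print(round(table_entry, 3), le_10, gt_12, eq_11)
```

Differences in the third decimal place against the table answers come from the tables being rounded before subtraction.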
[Tables: cumulative binomial probabilities for N = 5, 6, 7, 8, 9, 10, 15, 20, and 25]
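For a geometric random variable X with parameter p (the probability of success on any trial), the probability mass function is
$$f(x) = (1 - p)^{x - 1} p, \quad x = 1, 2, \ldots$$
The mean of X is
$$\mu = E(X) = \frac{1}{p}$$
The variance of X is
$$\sigma^2 = V(X) = \frac{1 - p}{p^2}$$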
Example 5.7:- Let X denote a random variable having a geometric distribution with
probability of success on any trial p = 0.4. Find (a) $P(X \le 2)$ (b) $P(4 < X < 7)$
Solution
(a) $P(X \le 2) = (1 - 0.4)^{1-1}(0.4) + (1 - 0.4)^{2-1}(0.4) = 0.64$
(b) $P(4 < X < 7) = (1 - 0.4)^{5-1}(0.4) + (1 - 0.4)^{6-1}(0.4) = 0.0829$
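The two sums of Example 5.7 can be checked with a small sketch of the geometric probability mass function. The helper `geom_pmf` is an illustrative name; part (b) is taken over x = 5 and 6, the values used in the solution.

```python
def geom_pmf(x, p):
    """Geometric pmf: first success occurs on trial x (x = 1, 2, ...)."""
    return (1 - p) ** (x - 1) * p

p = 0.4
p_a = sum(geom_pmf(x, p) for x in (1, 2))  # P(X <= 2)
p_b = sum(geom_pmf(x, p) for x in (5, 6))  # P(4 < X < 7)

print(round(p_a, 4), round(p_b, 4))  # 0.64 0.0829
```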
The mean of Y is
$$\mu = E(Y) = \frac{r}{p}$$
The variance of Y is
$$\sigma^2 = V(Y) = \frac{r(1 - p)}{p^2}$$
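Example 5.8:- Let Y denote a random variable having a negative binomial
distribution with p = 0.4. Find $P(Y \ge 4)$ if (a) r = 2 and (b) r = 4.
Solution
(a) For r = 2 the probability mass function is
$$f(y) = \binom{y - 1}{2 - 1} 0.4^{2} (1 - 0.4)^{y - 2}, \quad y = 2, 3, 4, 5, \ldots$$
so
$$P(Y \ge 4) = 1 - \left[ \binom{1}{1} 0.4^{2} (0.6)^{2-2} + \binom{2}{1} 0.4^{2} (0.6)^{3-2} \right] = 1 - (0.16 + 0.192) = 0.648$$
(b) For r = 4 the probability mass function is
$$f(y) = \binom{y - 1}{4 - 1} 0.4^{4} (1 - 0.4)^{y - 4}, \quad y = 4, 5, 6, 7, \ldots$$
Because Y cannot be smaller than 4, $P(Y \ge 4) = 1$.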
Example 5.9:- It was found that 30% of the applicants for a certain job have
advanced training in computer programming. Applicants are selected at random and
are interviewed sequentially.
a- Find the probability that the first applicant having advanced training is found
on the fifth interview.
b- Suppose three jobs requiring advanced programming training are open. Find
the probability that the third qualified applicant is found on the fifth interview,
if the applicants are interviewed sequentially and at random.
Solution
(a) $P(Y = 5) = (1 - 0.3)^{5-1}(0.3) = 0.072$
(b) Assume independent trials, with the probability of finding a qualified
applicant on any one trial being 0.3. Let Y denote the number of the trial on
which the third qualified applicant is found. Then Y can reasonably be assumed
to have a negative binomial distribution with r = 3 and p = 0.3, so
$$P(Y = 5) = \binom{5 - 1}{3 - 1} 0.3^{3} (1 - 0.3)^{5-3} = 0.0794$$
Example 5.10:- Camera flashes are tested for the time to recharge the flash. The
probability that a camera passes the test is 0.8, and the cameras perform
independently. What is the probability that the third failure is obtained in five or
fewer tests?
Solution
Let Y denote the number of cameras tested until three failures have been obtained.
The requested probability is P(Y ≤ 5).
Here Y has a negative binomial distribution with p = (1 − 0.8) = 0.2 and r = 3.
Therefore,
$$P(Y \le 5) = \sum_{y=3}^{5} \binom{y - 1}{3 - 1} 0.2^{3} (1 - 0.2)^{y-3} = 0.008 + 0.0192 + 0.03072 = 0.058$$
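The negative binomial calculations in Examples 5.8 to 5.10 all use the same probability mass function, so one small helper covers them. This is a sketch only; `nbinom_pmf` is a name introduced here, with y the trial on which the r-th success (or, in Example 5.10, the r-th failure) occurs.

```python
from math import comb

def nbinom_pmf(y, r, p):
    """P(the r-th success occurs on trial y), for y = r, r + 1, ..."""
    return comb(y - 1, r - 1) * p**r * (1 - p) ** (y - r)

# Example 5.8(a): r = 2, p = 0.4, P(Y >= 4) via the complement.
ex58a = 1 - sum(nbinom_pmf(y, 2, 0.4) for y in (2, 3))

# Example 5.9(b): third qualified applicant on the fifth interview.
ex59b = nbinom_pmf(5, 3, 0.3)

# Example 5.10: third failure within five tests, with p = 0.2.
ex510 = sum(nbinom_pmf(y, 3, 0.2) for y in range(3, 6))

print(round(ex58a, 4), round(ex59b, 4), round(ex510, 4))  # 0.648 0.0794 0.0579
```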
5.3 Continuous Probability Distributions
Density functions are commonly used in engineering to describe physical
systems. A probability density function f (x) can be used to describe the
probability distribution of a continuous random variable X. If an interval is likely to
contain a value for X, its probability is large and it corresponds to large values for f
(x) . The probability that X is between a and b is determined as the integral of f (x)
from a to b.
For a continuous random variable X, a probability density function (pdf) is a
function such that
(1) $f(x) \ge 0$
(2) $\int_{-\infty}^{\infty} f(x)\,dx = 1$
(3) $P(a \le X \le b) = \int_{a}^{b} f(x)\,dx$ = area under f(x) from a to b, for any a and b
Example 5.11:- Let X denote the continuous random variable of the current
measured in a thin copper wire in milliamperes. Assume that the range of X is [4.9,
5.1] mA, and assume that the probability density function of X is f (x) = 5 for
4.9 ≤ x≤5.1. What is the probability that a current measurement is less than 5
milliamperes?
Solution
The probability density function is shown in the figure.
It is assumed that f (x) =0 wherever it is not
specifically defined. The shaded area in the figure
indicates the probability.
$$P(X < 5) = \int_{4.9}^{5} f(x)\,dx = \int_{4.9}^{5} 5\,dx = 0.5$$
Example 5.12:- Let X denote the continuous random variable of the diameter of a
hole drilled in a metal sheet. The target diameter is 12.5 millimeters. Past data
show that the distribution of X can be modeled by the probability density function
$f(x) = 20 e^{-20(x - 12.5)}$ for $x \ge 12.5$. If a part with a diameter greater than 12.6 mm is
required, what proportion of parts meets this requirement?
Solution
From the question, a part is required if X > 12.6. The density function is shown in the
figure.
$$P(X > 12.6) = \int_{12.6}^{\infty} f(x)\,dx = \int_{12.6}^{\infty} 20 e^{-20(x - 12.5)}\,dx = \left[ -e^{-20(x - 12.5)} \right]_{12.6}^{\infty} = 0.135$$
The proportion of parts between 12.5 and 12.6 mm can now be calculated:
$$P(12.5 < X < 12.6) = \int_{12.5}^{12.6} f(x)\,dx = \left[ -e^{-20(x - 12.5)} \right]_{12.5}^{12.6} = 0.865$$
or, equivalently, $1 - P(X > 12.6) = 1 - 0.135 = 0.865$.
For Example 5.12, the cumulative distribution function consists
of two expressions: F(x) = 0 for x < 12.5, and, for $x \ge 12.5$,
$$F(x) = \int_{12.5}^{x} 20 e^{-20(u - 12.5)}\,du = 1 - e^{-20(x - 12.5)}$$
Therefore,
$$F(x) = \begin{cases} 0 & x < 12.5 \\ 1 - e^{-20(x - 12.5)} & 12.5 \le x \end{cases}$$
The figure represents the graph of F(x).
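Once the cumulative distribution function is known, interval probabilities follow from differences of F. The sketch below encodes the two-piece F(x) of Example 5.12 and recovers both proportions; the function name `F` mirrors the notation of the notes and the other names are illustrative.

```python
from math import exp

def F(x):
    """Cumulative distribution function of the hole diameter (mm)."""
    if x < 12.5:
        return 0.0
    return 1.0 - exp(-20.0 * (x - 12.5))

p_above = 1.0 - F(12.6)        # P(X > 12.6)
p_between = F(12.6) - F(12.5)  # P(12.5 < X < 12.6)

print(round(p_above, 3), round(p_between, 3))  # 0.135 0.865
```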
Then, given F(x), $f(x) = \dfrac{dF(x)}{dx}$
Example 5.13:- The time until a chemical reaction is complete (in milliseconds) is
approximated by the cumulative distribution function F(x), Find the probability
density function of X.
$$F(x) = \begin{cases} 0 & x < 0 \\ 1 - e^{-0.01x} & 0 \le x \end{cases}$$
Solution
Using the result that the probability density function is the derivative of F(x),
$$f(x) = \begin{cases} 0 & x < 0 \\ 0.01 e^{-0.01x} & 0 \le x \end{cases}$$
The probability that a reaction completes within 200 milliseconds is
$$P(X < 200) = F(200) = 1 - e^{-2} = 0.8647$$
The mean and variance can also be defined for a continuous random
variable. Suppose that X is a continuous random variable with probability density
function f(x). The mean or expected value of X, denoted as μ or E(X), is
$$\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$$
The variance of X, denoted as V(X) or σ², is
$$\sigma^2 = V(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2$$
The standard deviation of X is
$$\sigma = \sqrt{\sigma^2}$$
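To see the integral definitions at work, the sketch below approximates the mean and variance of the current density from Example 5.11, f(x) = 5 on [4.9, 5.1], with a simple midpoint rule. The integrator `integrate` is a hypothetical helper written for this illustration, not a library routine.

```python
def integrate(g, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def f(x):
    """Density of Example 5.11 on its support [4.9, 5.1]."""
    return 5.0

a, b = 4.9, 5.1
total = integrate(f, a, b)                                      # should be 1
mean = integrate(lambda x: x * f(x), a, b)                      # E(X)
variance = integrate(lambda x: (x - mean) ** 2 * f(x), a, b)    # V(X)

print(round(total, 6), round(mean, 6), round(variance, 6))
```

For this density the exact values are E(X) = 5 mA and V(X) = (0.2)²/12 ≈ 0.00333 mA², which the approximation reproduces closely.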
5.3.1 Continuous Uniform Distributions
A normal random variable with μ = 0 and σ² = 1 is called a standard normal
random variable and is denoted as Z. The cumulative distribution function of a
standard normal random variable is denoted as $\Phi(z) = P(Z \le z)$.
Example 5.16:- Assume that Z is a standard normal random variable. Find P(Z ≤
1.5).
Solution
Read down the z column to the row that equals 1.5.
The probability is read from the adjacent
column, labeled 0.00, to be 0.93319.
For another example, P(Z≤1.53) is found by reading down the z column to the row
1.5 and then selecting the probability from the column labeled 0.03 to be 0.93699.
5- P(Z ≤ −4.6) ≈ 0, since the last value in the table is P(Z ≤ −3.99) = 0.00003.
6- Find the value of z such that P(Z > z) = 0.05. This probability expression can
be written as P(Z ≤ z) = 1 − 0.05 = 0.95. We search the probabilities in the table to find the
value that equals 0.95. The nearest value is 0.95053,
corresponding to z = 1.65.
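The table lookups above can be cross-checked with Python's standard library, which provides the standard normal cumulative distribution function and its inverse through `statistics.NormalDist`. This is a sketch for verification only; the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1

p_15 = Z.cdf(1.5)        # P(Z <= 1.5)
p_153 = Z.cdf(1.53)      # P(Z <= 1.53)
p_tail = Z.cdf(-4.6)     # P(Z <= -4.6), effectively zero
z_95 = Z.inv_cdf(0.95)   # z such that P(Z <= z) = 0.95

print(round(p_15, 5), round(p_153, 5), round(z_95, 3))  # 0.93319 0.93699 1.645
```

The inverse function returns about 1.645, whereas the table, which lists z only to two decimals, gives the neighbouring entry z = 1.65.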
If X is a normal random variable with the E(X) =μ and V(X) =σ2, the
random variable Z=(X−μ)/σ is a normal random variable with E(Z)=0 and V(Z)=1.
That is, Z is a standard normal random variable. The random variable Z represents
the distance of X from its mean in terms of standard deviations. It is the key step for
calculating a probability for an arbitrary normal random variable.
Then $P(X \le x) = P\left( \dfrac{X - \mu}{\sigma} \le \dfrac{x - \mu}{\sigma} \right) = P(Z \le z)$, where $z = (x - \mu)/\sigma$.
Example 5.18:- Suppose that the current measurements in a strip of wire are assumed
to follow a normal distribution with a mean of 10 milliamperes and a variance of 4
(milliamperes)². What is the probability that a measurement exceeds 13 milliamperes?
And what is the probability that a current measurement is between 9 and 11
milliamperes?
Solution
Let X denote the current in milliamperes. The first requested probability can be
represented as P(X > 13). Let Z = (X − 10)/2. Note that X > 13 corresponds to Z > 1.5.
Therefore, from tables of $\Phi(z) = P(Z \le z)$,
$$P(X > 13) = P(Z > 1.5) = 1 - \Phi(1.5) = 1 - 0.93319 = 0.06681$$
Similarly, 9 < X < 11 corresponds to −0.5 < Z < 0.5, so
$$P(9 < X < 11) = \Phi(0.5) - \Phi(-0.5) = 0.69146 - 0.30854 = 0.38292$$
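The two probabilities requested in Example 5.18 can be checked numerically without standardizing by hand, since `statistics.NormalDist` accepts the mean and standard deviation directly. A minimal sketch, with variable names chosen for illustration:

```python
from statistics import NormalDist

# Current in mA: mean 10, variance 4, so standard deviation 2.
X = NormalDist(mu=10, sigma=2)

p_exceeds_13 = 1 - X.cdf(13)        # P(X > 13) = P(Z > 1.5)
p_between = X.cdf(11) - X.cdf(9)    # P(9 < X < 11) = P(-0.5 < Z < 0.5)

print(round(p_exceeds_13, 5), round(p_between, 5))  # 0.06681 0.38292
```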
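An exponential random variable X with parameter λ > 0 has the probability density function
$$f(x) = \lambda e^{-\lambda x} \quad \text{for } 0 \le x < \infty$$
The exponential distribution obtains its name from the exponential function in the
probability density function. If the random variable X has an exponential distribution with
parameter λ, then
$$\mu = E(X) = \frac{1}{\lambda} \quad \text{and} \quad \sigma^2 = V(X) = \frac{1}{\lambda^2}$$
It is important to use consistent units to express intervals, X, and λ.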
Class Work1:- Suppose that X has an exponential distribution with mean equal to 10.
Determine the following:
Class Work2:- Suppose that the counts recorded by a counter follow a Poisson process
with an average of two counts per minute.
(a) What is the probability that there are no counts in a 30-second interval? Ans.=0.3679
(b) What is the probability that the first count occurs in less than 10 seconds? Ans.=0.2835
(c) What is the probability that the first count occurs between one and two minutes after
start-up? Ans. =0.117
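The answers to Class Work 2 can be verified with the exponential cumulative distribution function F(x) = 1 − e^(−λx), keeping every time in minutes so that the units match λ = 2 counts per minute. The helper `expon_cdf` is a name introduced for this sketch.

```python
from math import exp

lam = 2.0  # average counts per minute

def expon_cdf(x, lam):
    """P(X <= x) for the waiting time X until the first count."""
    return 1.0 - exp(-lam * x) if x >= 0 else 0.0

p_a = exp(-lam * 0.5)                        # no counts in 30 s = 0.5 min
p_b = expon_cdf(10 / 60, lam)                # first count within 10 s
p_c = expon_cdf(2, lam) - expon_cdf(1, lam)  # first count between 1 and 2 min

print(round(p_a, 4), round(p_b, 4), round(p_c, 4))  # 0.3679 0.2835 0.117
```

Part (a) uses the fact that "no counts in the interval" is the same event as "the first count occurs after the interval ends".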
The probability density function of X can be obtained from the derivative of F (x).
This derivative is applied to the last term in the expression for F (x) . Because Φ(⋅) is the
integral of the standard normal density function, the fundamental theorem of calculus is
used to calculate the derivative.
Suppose that W is normally distributed with mean θ and variance ω²; then
X = exp(W) is a lognormal random variable with probability density function
$$f(x) = \frac{1}{x \omega \sqrt{2\pi}} \exp\left[ -\frac{(\ln(x) - \theta)^2}{2\omega^2} \right] \quad \text{for } 0 < x < \infty$$
The parameters of a lognormal distribution are θ and ω², but these are the mean and
variance of the normal random variable W. Hence, the mean and variance of X are
$$\mu = E(X) = e^{\theta + \omega^2/2} \quad \text{and} \quad \sigma^2 = V(X) = e^{2\theta + \omega^2}\left( e^{\omega^2} - 1 \right)$$
Example 5.19:- The lifetime (in hours) of a semiconductor laser has a lognormal
distribution with θ = 10 and ω = 1.5. What is the probability that the lifetime exceeds 10000
hours? What lifetime is exceeded by 99% of lasers? Find the mean and standard
deviation of the lifetime.
Solution
From the cumulative distribution function for X, the lifetime x that is exceeded by 99% of lasers satisfies
$$P(X > x) = P[\exp(W) > x] = P[W > \ln(x)] = 1 - \Phi\left( \frac{\ln(x) - 10}{1.5} \right) = 0.99$$
Therefore, from tables of $\Phi(z) = P(Z \le z)$, $1 - \Phi(z) = 0.99$ when z = −2.33. Then,
$$\frac{\ln(x) - 10}{1.5} = -2.33 \quad \text{and} \quad x = \exp(6.505) = 668.48 \text{ hours}$$
The mean of the lifetime is
$$\mu = E(X) = e^{\theta + \omega^2/2} = e^{10 + 1.125} = 67846.3 \text{ hours}$$
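All four quantities asked for in Example 5.19 follow from the normal distribution of W = ln(X) together with the lognormal mean and variance formulas. The sketch below evaluates them with the standard library; variable names are illustrative, and the exact quantile (about −2.326) is used in place of the rounded table value −2.33, so the 99% lifetime comes out slightly larger than the hand calculation.

```python
from math import exp, log, sqrt
from statistics import NormalDist

theta, omega = 10.0, 1.5   # mean and standard deviation of W = ln(X)
Z = NormalDist()

# P(X > 10000) = 1 - Phi((ln 10000 - theta) / omega)
p_exceeds = 1 - Z.cdf((log(10_000) - theta) / omega)

# Lifetime exceeded by 99% of lasers: solve 1 - Phi(z) = 0.99 for x.
x_99 = exp(theta + omega * Z.inv_cdf(0.01))

# Mean and standard deviation of the lognormal lifetime.
mean = exp(theta + omega**2 / 2)
sd = sqrt(exp(2 * theta + omega**2) * (exp(omega**2) - 1))

print(round(p_exceeds, 3), round(x_99, 1), round(mean, 1), round(sd, 1))
```

The standard deviation is several times the mean, which reflects the long right tail of the lognormal distribution.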
CHAPTER SIX
SAMPLING DISTRIBUTION
6.1 Introduction
In inferential statistics, we want to use characteristics of the sample (i.e. a
statistic) to estimate the characteristics of the population (i.e. a parameter).
A sampling distribution is a probability distribution of a statistic obtained through a
large number of samples drawn from a specific population. The sampling distribution
of a given population is the distribution of frequencies of a range of different
outcomes that could possibly occur for a statistic of a population.
Much of the data drawn and used by academicians, statisticians, researchers, etc.
are actually samples, not populations. A sample is a subset of a population. For
example, it is very difficult for a medical researcher to compare the average weight of
all babies born in North America from 1995 to 2005 with that of those born in South America
within a reasonable amount of time. He will instead use the weights of only, say, 100
babies from each continent to draw a conclusion. The weights of the 200 babies used are the
sample, and the average weight calculated is the sample mean.
Now suppose that instead of taking just one sample of 100 newborn weights
from each continent, the medical researcher takes repeated random samples from the
general population and computes the sample mean for each sample group. So, for
North America, he pulls up newborn weights recorded in the US,
Canada, and Mexico as follows: four samples of 100 from select hospitals in the US,
five samples of 70 from Canada, and three samples of 150 from Mexico, for a total of 1200
weights of newborn babies grouped in 12 sets. He also collects a sample of 100
birth weights from each of the 12 countries in South America. Each sample has its
own sample mean, and the distribution of the sample means is known as the
sampling distribution.
Other statistics, such as the standard deviation and variance, can be
calculated from sample data. The standard deviation and variance measure the
variability of the sampling distribution. The standard deviation of a sampling
distribution is called the standard error. Knowing how far apart the means of
the sample sets are from each other and from the population mean gives
an indication of how close the sample mean is to the population mean. The standard
error of the sampling distribution decreases as the sample size increases.
Solution:
Calculate the population mean
$\mu = (19 + 14 + 15 + 9 + 10) / 5 = 13.4$ pounds
Part a
Obtain the sampling distribution of the sample mean for a sample size of 2 blocks
Distribution of $\bar{x}$:
One can thus see that none of the 10 possible sample means equals the population
mean of 13.4 pounds, so in this example the sample mean can never be exactly the
population mean. (In other examples, a few samples may give a sample mean equal to
the population mean, but the chance is typically small.) When using the
sample mean to estimate the population mean, some error will therefore be involved,
since the sample mean is random.
The mean of the sample mean =
(16.5+17.0+14.0+14.5+14.5+11.5+12.0+12.0+12.5+9.5) / 10 = 13.4 pounds
Thus, even though each sample may give an answer involving some error, the
expected value is right at the target: exactly the population mean.
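The sampling distribution for samples of 2 blocks can be built by listing every possible sample, as sketched below. The population values are the five block weights used in the population-mean calculation; the variable names are illustrative, and sampling is taken to be without replacement, which gives the 10 samples counted in the text.

```python
from itertools import combinations

population = [19, 14, 15, 9, 10]  # block weights in pounds

# Every possible sample of 2 blocks, drawn without replacement.
samples = list(combinations(population, 2))
sample_means = [sum(s) / len(s) for s in samples]

population_mean = sum(population) / len(population)
mean_of_sample_means = sum(sample_means) / len(sample_means)

print(sample_means)
print(population_mean, mean_of_sample_means)  # 13.4 13.4
```

Individual sample means range from 9.5 to 17.0 pounds, yet their average coincides with the population mean, which is what is meant by the sample mean being on target.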
Part b
We can see that using sample mean to estimate population mean involves
sampling error. However, the error with a sample of size 4 is smaller than with a
sample of size 2.
The following dot plots show the distribution of the sample means
corresponding to sample sizes of 2 and 4 blocks.
The central limit theorem states that the sampling distribution of the mean of
independent random variables drawn from the same population (with finite variance)
will be normal or nearly normal if the sample size is large enough.
How large is "large enough"? The answer depends on two factors.
1. Requirements for accuracy. The more closely the sampling distribution needs to
resemble a normal distribution, the more sample points will be required.
2. The shape of the underlying population. The more closely the original population
resembles a normal distribution, the fewer sample points will be required.
In practice, some statisticians say that a sample size of 30 is large enough when
the population distribution is roughly bell-shaped. Others recommend a sample size
of at least 40. But if the original population is distinctly not normal (e.g., is badly
skewed, has multiple peaks, and/or has outliers), researchers like the sample size to
be even larger.
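The effect of sample size on the shape of the sampling distribution can be explored with a small simulation. This is a sketch under stated assumptions: the population is taken to be exponential with mean 1 (a badly skewed case chosen here for illustration, not taken from the notes), the seed and the number of repetitions are arbitrary, and skewness near zero is used as a rough indicator of how close the distribution is to normal.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)  # fixed seed so repeated runs give the same numbers

def skewness(values):
    """Sample skewness: third central moment divided by sd cubed."""
    m, s = mean(values), stdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * s**3)

def simulate_sample_means(n, repetitions=5000):
    """Means of many samples of size n from an exponential(1) population."""
    return [sum(random.expovariate(1.0) for _ in range(n)) / n
            for _ in range(repetitions)]

results = {}
for n in (2, 30):
    means = simulate_sample_means(n)
    # (centre, spread, skewness) of the simulated sampling distribution
    results[n] = (mean(means), stdev(means), skewness(means))
    print(n, [round(v, 3) for v in results[n]])
```

With n = 30 the simulated means should centre near 1, have a spread near 1/√30, and show much less skewness than with n = 2.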
6.2.3 Sample Size and Sampling Error
The possible sample means cluster more closely around the population mean as
the sample size increases. Thus, possible sampling error decreases as sample size
increases. The standard deviation of the sample mean is called the standard error:
$$\text{standard error} = \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
where σ is the standard deviation of the population and n is the sample size.
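Example 6.2:
For Example 6.1, find the standard error.
Solution:
$\sigma = 4.037$
For n = 2
$\sigma_{\bar{x}} = \sigma / \sqrt{n} = 4.037 / \sqrt{2} = 2.855$
For n = 4
$\sigma_{\bar{x}} = \sigma / \sqrt{n} = 4.037 / \sqrt{4} = 2.019$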
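The standard error σ/√n for Example 6.2 can be computed as in the sketch below. It assumes that the figure 4.037 is the standard deviation of the five block weights computed with the N − 1 divisor, since that choice reproduces the value; the variable names are illustrative.

```python
from math import sqrt
from statistics import stdev

population = [19, 14, 15, 9, 10]  # block weights in pounds (Example 6.1)

# Assumption: N - 1 divisor, which reproduces the 4.037 used in the solution.
sigma = stdev(population)

# Standard error of the sample mean for each sample size.
standard_errors = {n: sigma / sqrt(n) for n in (2, 4)}

print(round(sigma, 3), {n: round(se, 3) for n, se in standard_errors.items()})
```

Doubling the sample size from 2 to 4 reduces the standard error by a factor of √2, not by a factor of 2.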
When we know that the sample mean is normal or approximately normal, and
we know the population mean, μ, and population standard deviation, σ, then we can
calculate a z-score for the sample mean and determine probabilities for it, where:
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
Example 6.3
The engines made by Ford for speedboats had an average power of 220
horsepower (HP) and standard deviation of 15 HP.
1. A buyer intends to take a sample of 4 engines and will not place an order if the
sample mean is less than 215 HP. What is the probability that the buyer will not
place an order? (Note: Suppose that the distribution of sample mean is normal
distribution).
2. If the customer samples 100 engines, what is the probability that the sample mean
will be less than 215?
Solution:
Part 1
We want to find $P(\bar{x} < 215)$.
We need to know whether the distribution of the population is normal, since the
sample size is small: n = 4 (less than the 30 usually required for the central limit
theorem). If the population is confirmed to be normal, then we can proceed, since
the sampling distribution of the mean of a normal distribution is also normal for all
sample sizes.
If the population follows a normal distribution, we can conclude that $\bar{x}$ has a
normal distribution with mean 220 HP and a standard error of:
$\sigma_{\bar{x}} = \sigma / \sqrt{n} = 15 / \sqrt{4} = 7.5$ HP
Then
$P(\bar{x} < 215)$
= P(Z < (215 − 220) / 7.5)
= P(Z < −0.67)
= 0.2514 (by using normal distribution tables)
The probability that the customer will not place an order is 25.14%.
Part 2
$\sigma_{\bar{x}} = \sigma / \sqrt{n} = 15 / \sqrt{100} = 1.5$ HP
$P(\bar{x} < 215)$
= P(Z < (215 − 220) / 1.5)
= P(Z < −3.33) = 0.0004
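Both parts of Example 6.3 apply the same recipe: compute the standard error, then evaluate a normal probability for the sample mean. A minimal sketch follows; `prob_mean_below` is a helper name introduced here, and because it does not round z to two decimals, its results differ from the table values in the third or fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 220.0, 15.0  # population mean and standard deviation in HP

def prob_mean_below(limit, n):
    """P(sample mean < limit) for a sample of n engines."""
    standard_error = sigma / sqrt(n)
    return NormalDist(mu, standard_error).cdf(limit)

p_n4 = prob_mean_below(215, 4)      # standard error 7.5 HP
p_n100 = prob_mean_below(215, 100)  # standard error 1.5 HP

print(round(p_n4, 4), round(p_n100, 4))
```

With 100 engines the same 5 HP shortfall is more than three standard errors below the mean, which is why the probability collapses from about one in four to well under one in a thousand.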
Example 6.4
For example 6.1 find the sampling distribution of the variance by taking a random
sample of 4 blocks from the population.
Solution
CHAPTER SEVEN
THE EXPECTATION
7.1 Introduction
Statistical methods are used to make decisions and draw conclusions
about the populations. This aspect of statistics is generally called statistical
inference. These techniques utilize the information in a sample for drawing
conclusions. This chapter begins our study of the statistical methods used in decision
making. Statistical inference may be divided into two major areas: parameter
estimation and hypothesis testing.
As an example of a parameter estimation problem, suppose that an
engineer is analyzing the tensile strength of a component used in an air frame.
This is an important part of assessing the overall structural integrity of the airplane.
Variability is naturally present in the individual components because of
differences in the batches of raw material used to make the components,
manufacturing processes, and measurement procedures (for example), so the
engineer wants to estimate the mean strength of the population of components. In
practice, the engineer will use sample data to compute a number that is in some sense
a reasonable value (a good guess) of the true population mean. This number is called
a point estimate. We will see that procedures are available for developing point
estimates of parameters that have good statistical properties. We will also be able to
establish the precision of the point estimate.
7.2 Expectation properties
Estimation problems occur frequently in engineering. We often need to
estimate
7.3 Moments
The general idea behind the method of moments is to equate population
moments, which are defined in terms of expected values, to the corresponding
sample moments. The population moments will be functions of the unknown
parameters. Then these equations are solved to yield estimators of the unknown
parameters.
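As a minimal sketch of the idea, consider an exponential model, whose first population moment is E(X) = 1/λ. Equating this to the first sample moment (the sample mean) and solving gives the estimator 1/x̄ for λ. The data values below are hypothetical, invented purely for illustration, and the variable names are not from the notes.

```python
from statistics import mean

# Hypothetical observations (for example, times in hours); illustration only.
sample = [12.0, 7.5, 20.1, 3.4, 9.0]

# First sample moment: the sample mean.
first_sample_moment = mean(sample)

# Population moment for the exponential model: E(X) = 1 / lambda.
# Setting 1 / lambda equal to the sample mean and solving for lambda:
lambda_hat = 1.0 / first_sample_moment

print(round(first_sample_moment, 2), round(lambda_hat, 4))  # 10.4 0.0962
```

A model with two unknown parameters would need a second equation, obtained by matching the second moments in the same way.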
CHAPTER EIGHT
THE ESTIMATION
8.1 Introduction
Typically, not all the information about a population of interest is
available. In many situations, it is also not practical to obtain information from each
and every member of the population due to time, monetary, or experimental
constraints. For example, to estimate the breaking strength of manufactured bricks,
pressure must be applied until the brick breaks. In this case, if the manufacturer tries
to collect information from the population, there will be no bricks left to use. Then
how do we get information about the population? We take a random sample
from the population and obtain information from the sampled units. We then use this
information from the sample to estimate or to make inference about the
unknown population characteristics. In this example, to estimate µ, the mean strength
of the batch of bricks manufactured, the manufacturer would probably test a few
bricks from this batch for breaking strength, compute the sample mean from the
measurements, and use the computed sample mean to estimate the mean
breaking strength of the bricks from this whole batch.
• Car mechanics run a few diagnostic tests and estimate the extent of damage
and resulting repair costs.