Probabilistic Reliability Engineering
BORIS GNEDENKO
Moscow State University and SOTAS, Inc.
IGOR USHAKOV
SOTAS, Inc. and George Washington University
A Wiley-lnterscience Publication
JOHN WILEY & SONS, INC.
New York / Chichester / Brisbane / Toronto / Singapore
This text is printed on acid-free paper.
ISBN 0-471-30502-2
CONTENTS
Preface xv
Introduction xvii
1 Fundamentals 1
1.1 Discrete Distributions Related to Reliability / 1
1.1.1 Bernoulli Distribution / 1
1.1.2 Geometric Distribution / 2
1.1.3 Binomial Distribution / 5
1.1.4 Negative Binomial Distribution / 6
1.1.5 Poisson Distribution / 8
1.2 Continuous Distributions Related to Reliability / 10
1.2.1 Exponential Distribution / 10
1.2.2 Erlang Distribution / 14
1.2.3 Normal Distribution / 15
1.2.4 Truncated Normal Distribution / 19
1.2.5 Weibull-Gnedenko Distribution / 19
1.2.6 Lognormal Distribution / 22
1.2.7 Uniform Distribution / 24
1.3 Summation of Random Variables / 25
1.3.1 Sum of a Fixed Number of Random
Variables / 25
2 Reliability Indexes 86
2.1 Unrepairable Systems / 87
2.1.1 Mean Time to Failure / 87
2.1.2 Probability of a Failure-Free Operation / 88
2.1.3 Failure Rate / 90
2.2 Repairable System / 92
2.2.1 Description of the Process / 92
2.2.2 Availability Coefficient / 93
2.2.3 Coefficient of Interval Availability / 94
2.2.4 Mean Time Between Failures and Related
Indexes / 95
2.3 Special Indexes / 98
2.3.1 Extra Time Resource for Performance / 98
2.3.2 Collecting Total Failure-Free Time / 99
2.3.3 Acceptable Idle Intervals / 100
2.4 Choice of Indexes and Their Quantitative Norm / 100
2.4.1 Choice of Indexes / 100
2.4.2 Required Reliability Level / 102
Conclusion / 107
References / 107
Exercises / 107
Solutions / 108
Exercises / 191
Solutions / 191
Exercises / 254
Solutions / 254
7 Repairable Duplicated System 256
7.1 Markov Model / 256
7.1.1 Description of the Model / 257
7.1.2 Nonstationary Availability Coefficient / 258
7.1.3 Stationary Availability Coefficient / 260
7.1.4 Probability of Failure-Free Operation / 262
7.1.5 Stationary Coefficient of Interval Availability / 264
7.1.6 MTTF and MTBF / 265
7.2 Duplication with an Arbitrary Repair Time / 267
7.3 Standby Redundancy with Arbitrary Distributions / 275
7.4 Method of Introducing Fictitious States / 278
7.5 Duplication with Switch and Monitoring / 286
7.5.1 Periodic Partial Control of the Main Unit / 286
7.5.2 Periodic Partial Monitoring of Both Units / 288
7.5.3 Unreliable Switch / 290
7.5.4 Unreliable Switch and Monitoring of Main
Unit / 292
Conclusion / 293
References / 294
Exercises / 294
Solutions / 296
8 Analysis of Performance Effectiveness 298
8.1 Classification of Systems / 298
8.1.1 General Explanation of Effectiveness
Concepts / 298
8.1.2 Classes of Systems / 300
8.2 Instant Systems / 301
8.3 Enduring Systems / 308
8.4 Particular Cases / 312
8.4.1 Additive Type of a System Unit's Outcome / 312
8.4.2 Systems with a Symmetrical Branching
Structure / 316
8.4.3 System with Redundant Executive Units / 322
8.5 Systems with Intersecting Zones of Action / 323
8.5.1 General Description / 323
8.5.2 Additive Coefficient of Effectiveness / 325
9 Two-Pole Networks 341
9.1 Rigid Computational Methods / 341
9.1.1 Method of Direct Enumeration / 341
9.1.2 Method of Boolean Function Decomposition / 343
9.2 Method of Paths and Cuts / 345
9.2.1 Esary-Proschan Bounds / 345
9.2.2 Litvak-Ushakov Bounds / 348
9.2.3 Comparison of the Two Methods / 353
9.2.4 Method of Set Truncation / 354
9.2.5 Generalization of Cut-and-Path Bounds / 356
9.3 Methods of Decomposition / 359
9.3.1 Moore-Shannon Method / 359
9.3.2 Bodin Method / 362
9.3.3 Ushakov Method / 364
9.4 Monte Carlo Simulation / 370
9.4.1 Modeling with Fixed Failed States / 370
9.4.2 Simulation Until System Failure / 372
Conclusion / 373
References / 374
Exercises / 375
Solutions / 376
Abbreviation   Meaning
BDP            birth and death process
d.f.           distribution function
g.f.           generating function
i.i.d.         independent and identically distributed
LST            Laplace-Stieltjes transform
m.g.f.         moment generating function
MRT            mean repair time
MTBF           mean time between failures
MTTF           mean time to failure
PFFO           probability of failure-free operation
p.r.v.         pseudo-random variable
r.v.           random variable
TTF            time to failure
PREFACE
This book was initially undertaken in 1987 in Moscow. We have found that
the majority of books on mathematical models of reliability are very specialized:
essentially none of them covers a broad spectrum of reliability problems. At
the same time, many of them are overloaded with mathematics which may be
beautiful but is not always understandable to engineers. We felt that there
should be a book covering as broad a spectrum of reliability problems as
possible in a form understandable to engineers. We understood that this
task was not a simple one. Of course, we now see that this book has not
completely satisfied our initial plan, and we have decided to keep it open to
additions and extensions by everybody who is interested in it.
The reader must not be surprised that we have not touched on statistical
topics. We did this intentionally because we are now preparing a book on
statistical reliability engineering.
The publishing of this book became possible, in particular, because of the
opportunities given by B. Gnedenko to visit the United States twice: in 1991
by George Washington University (Washington, DC) and in 1993 by SOTAS,
Inc. (Rockville, Maryland). We both express our gratitude to Professor James
E. Falk (GWU), Dr. Peter L. Willson (SOTAS), and Dr. William C. Hardy
(MCI) for sponsoring these two visits of B. Gnedenko which permitted us to
discuss the manuscript and to make the final decisions.
We would also like to thank Tatyana Ushakov who took care of all of the
main problems in the final preparation of the manuscript, especially in
dealing with the large number of figures.
We are waiting for the readers' comments and corrections. We also repeat
our invitation to join us in improving the book for the future editions.
INTRODUCTION
To compensate for the deficiency in this book, we could recommend some
books which are dedicated to reliability in terms of equipment and compo-
nents. References can be found in the list of general publications at the end
of this book. We understand that the problem of engineering support of
reliability is very serious and extremely difficult. Most of this requires a
concrete physical analysis and sometimes relates very closely to each specific
type of equipment and component.
We are strongly convinced that the main problem in applied reliability
analysis is to invent and construct an adequate mathematical model. Model-
ing is always an art and an invention. The mathematical technique is not the
main issue. Mathematics is a tool for solution of the task.
Most modern mathematical models in reliability require a computer.
Usually, reports prepared with the help of a computer hypnotize: accurate
format, accurate calculations.... But the quality of the solution depends
only on the quality of the model and input data. The computer is only a tool,
not a panacea. A computer can never replace an analyst. The term "GIGO,"
which reminds one of FIFO and LIFO in queuing theory, was not conceived
in vain. It means: garbage in, garbage out.
A mathematical model, first of all, must reflect the main features of a real
object. But, at the same time, a model must be clear and understandable. It
must be solvable with the help of available mathematical tools (including
computer programs). It must be easily modified if a researcher can find some
new features of the real object or would like to change the form of
representation of the input data.
Sometimes mathematical models serve a simple purpose: to make a de-
signed system more understandable for a designer. This use of modeling is
very important (even if there are no practical recommendations and no
numerical results) because this is the first stage of a system's testing, namely,
a "mental testing." According to legend, Napoleon, upon being asked why he
could make fast and accurate decisions, answered that it is very simple: spend
the night before the battle analyzing all conceivable turns of the battle—and
you will gain a victory. The design of a mathematical model requires the
same type of analysis: you rethink the possible uses of a system, its
operational modes, its structure, and the specific roles of its different parts.
The reader will not find many references to American authors in this book.
We agree that this is not good. To compensate for this deficiency, we list the
main English language publications on the subject at the end of this book.
We also supply a restricted list of publications in Russian which are close to
the subject of this book.
As a matter of fact, we based our book on Russian publications. We also
used our own practical experience in design and consulting. The authors
represent a team of an engineer and a professional mathematician who have
worked together for over 30 years, one as a systems analyst at industrial
research and development institutes and the other as a consultant to the
same institutes. We were both consultants to the State Committee of Standards
of the former Soviet Union. For over 25 years we have been running
the Moscow Consulting Center of Reliability and Quality Control, which
serves industrial engineers all over the country.
We had a chance to obtain knowledge of new ideas and new methods from
a tide of contemporary papers. We have been in charge of the journal
Reliability and Quality Control for over 25 years, and for more than 20 years
we have been responsible for the section on reliability and queuing theory in
the journal Tehnicheskaya Kibernetika (published in the United States as
Engineering Cybernetics and later as the Soviet Journal of Computer and
Systems Sciences).
This activity in industry and publishing was fruitful for us. Together we
wrote several papers, including reviews of the state of reliability theory in
Russia.
We hope that the interested reader will meet with terra incognita: Russian
publications in the field, Russian names, and, possibly, new viewpoints, ideas,
problems, and solutions. For those who are interested in a more in-depth
penetration into the state of Russian results in reliability theory, we can
suggest several comprehensive reviews of Russian works in the field: Levin
and Ushakov (1965), Gnedenko, Kozlov, and Ushakov (1969), Belyaev,
Gnedenko, and Ushakov (1983), and Rukhin and Hsieh (1987).
We tried to cover almost the entire area of applied mathematical models in
the theory of reliability. Of course, we might only hope that the task is
fulfilled more or less completely. There are many special aspects of the
mathematical theory of reliability which appear outside the scope of this
book. We suggest that our readers and colleagues join us in the future: the
book is open to contributions from possible authors. We hope that the next
edition of the book will contain new contributors. Please send us your
suggestions and/or manuscripts of proposed new sections and chapters to
the address of John Wiley & Sons.
BORIS GNEDENKO
IGOR USHAKOV
1 FUNDAMENTALS
$$f_B(x \mid p) = p^x q^{1-x}, \qquad x = 0, 1 \tag{1.1}$$

$$E\{X\} = 1 \cdot p + 0 \cdot q = p \tag{1.2}$$

and

$$E\{X^2\} = 1^2 \cdot p + 0^2 \cdot q = p \tag{1.3}$$

The m.g.f. can also be used to obtain the moments of the distribution:

$$m^{(1)} = E\{X\} = \frac{d}{ds}\left(pe^s + q\right)\Big|_{s=0} = p$$

$$m^{(2)} = E\{X^2\} = \frac{d^2}{ds^2}\left(pe^s + q\right)\Big|_{s=0} = p$$

which coincide with (1.2) and (1.3).
A sequence of independent identically distributed (i.i.d.) Bernoulli r.v.'s is
called a sequence of Bernoulli trials with the parameter p. For example, one
may sequentially test n statistically identical items by setting X_i = 1 if the ith
item operates successfully during the time period t, and X_i = 0 otherwise
(i = 1, ..., n). Thus, one has a random sequence of 1's and 0's which reflects
the Bernoulli trial outcomes.
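Such a sequence is easy to simulate. The following Python sketch (an illustration added here, not part of the text) generates Bernoulli trials and recovers p from the observed frequency of 1's.

```python
import random

def bernoulli_trials(n, p, rng):
    """Simulate n Bernoulli trials with success probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

rng = random.Random(1)
trials = bernoulli_trials(100_000, 0.9, rng)
p_hat = sum(trials) / len(trials)   # frequency of successes estimates p
```

With 100,000 trials the estimate p_hat lies within about one percent of p = 0.9.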
$$\Pr\{X = x\} = f_g(x \mid p) = p^x q \tag{1.5}$$

where the subscript g denotes the geometric distribution. For (1.5) the
d.f. is

$$\Pr\{X \le x\} = q \sum_{0 \le k \le x} p^k \tag{1.6}$$

Since (1.6) includes the geometric series, it explains the origin of the
distribution's name.
Everybody knows how to calculate (1.6) in a standard way, but we would
like to show an interesting method which can be useful in other situations. Let

$$z = 1 + p + p^2 + \cdots \tag{1.7}$$

and

$$y = 1 + p + p^2 + \cdots + p^x$$

Then (1.7) can be rewritten as

$$z = y + p^{x+1}(1 + p + p^2 + \cdots) = y + p^{x+1}z, \qquad z = 1 + p(1 + p + p^2 + \cdots) = 1 + pz$$

so that z = (1 − p)^{−1} and y = (1 − p^{x+1})/(1 − p). Hence

$$\Pr\{X \le x\} = qy = 1 - p^{x+1} \tag{1.8}$$

Thus, with the probability defined in (1.8), a failure has occurred before the
xth cycle. The probability of a series of successes of length not less than x,
that is, Pr{X ≥ x}, is, obviously, p^x.

The same geometric-series trick gives the m.g.f. For a series with ratio a,

$$y = 1 + a + a^2 + a^3 + \cdots = 1 + a(1 + a + a^2 + \cdots) = 1 + ay$$

and then

$$y = (1 - a)^{-1}$$

Thus, with a = pe^s,

$$\varphi(s) = \sum_{x \ge 0} p^x q\,e^{sx} = \frac{q}{1 - pe^s} \tag{1.11}$$
The mean and variance of the geometric distribution can be found in a
direct way with the use of bulky transformations. We will derive them using
(1.11):

$$E\{X\} = \frac{d}{ds}\left(\frac{q}{1 - pe^s}\right)\Big|_{s=0} = \frac{p}{q} \tag{1.12}$$

$$E\{X^2\} = \frac{d^2}{ds^2}\left(\frac{q}{1 - pe^s}\right)\Big|_{s=0} = \frac{p(1 + p)}{q^2} \tag{1.13}$$

and

$$\operatorname{Var}\{X\} = \frac{p(1 + p)}{q^2} - \frac{p^2}{q^2} = \frac{p}{q^2} \tag{1.14}$$

Substituting z for e^s, we obtain the generating function (g.f.) of the
distribution, that is, a sum of the form

$$\varphi(z) = \sum_{k \ge 0} p_k z^k = \sum_{k \ge 0} p^k q z^k = \frac{q}{1 - pz} \tag{1.15}$$
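The closed-form moments (1.12) and (1.14) can be checked by summing the pmf p^x q directly; a small Python sketch (the truncation point 2000 is an arbitrary choice where the tail is negligible):

```python
p = 0.7
q = 1 - p
xs = range(2000)                      # truncate the infinite sum; p**2000 is negligible
pmf = [p**x * q for x in xs]
mean = sum(x * w for x, w in zip(xs, pmf))
second = sum(x * x * w for x, w in zip(xs, pmf))
var = second - mean**2
# agrees with E{X} = p/q and Var{X} = p/q**2 from (1.12) and (1.14)
```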
$$\Pr\{X = k + t \mid X \ge k\} = \Pr\{X = t\}$$

$$X = X_1 + \cdots + X_n = \sum_{1 \le i \le n} X_i$$

$$E\Big\{\sum_{1 \le i \le n} X_i\Big\} = \sum_{1 \le i \le n} E\{X_i\} \tag{1.16}$$

$$E\{X\} = np \tag{1.17}$$

$$\operatorname{Var}\{X\} = npq \tag{1.19}$$
The reader can see that (1.21) is a Newton binomial so the origin of the
distribution's name is clear.
If one writes (1.21) in expanded form, the coefficient of z^k is the
probability of k successes in n trials
$$\Pr\{X = k\} = \binom{n}{k} p^k q^{n-k} \tag{1.23}$$

$$X = X_1 + \cdots + X_n = \sum_{1 \le i \le n} X_i$$

$$E\{X\} = \sum_{1 \le i \le n} E\{X_i\} = \frac{np}{q} \tag{1.24}$$

and

$$\operatorname{Var}\{X\} = \sum_{1 \le i \le n} \operatorname{Var}\{X_i\} = \frac{np}{q^2} \tag{1.25}$$
The m.g.f. of the negative binomial distribution can be easily written with
the help of the m.g.f. of the geometric distribution:

$$\varphi(s) = \left(\frac{q}{1 - pe^s}\right)^n \tag{1.26}$$

The second term of the product in (1.27) equals q and the first term
(considered relating to failures) is defined to be

$$f_b(x - 1 \mid p, n - 1) = \binom{n - 1}{x - 1} p^{x-1} q^{n-x} \tag{1.28}$$
$$\Pr\{X = n\} = \binom{n - 1}{k - 1} p^k q^{n-k} \tag{1.30}$$

• In n Bernoulli trials, the kth success occurs at the n_1th trial, where
n_1 ≤ n, while the outcomes of the remaining trials are arbitrary.
• The negative binomially distributed r.v. is less than or equal to n.

The first and second events are described with the help of binomial and
negative binomial distributions, respectively. In other words,

$$\sum_{k \le m \le n} \binom{m - 1}{k - 1} p^k q^{m-k} = \sum_{k \le j \le n} \binom{n}{j} p^j q^{n-j}$$

Thus, in some sense, a binomial d.f. plays the role of a cumulative d.f. for an
r.v. with a negative binomial d.f.
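This duality is easy to confirm numerically. In the sketch below (illustrative; the names binomial_tail and negbin_cdf are introduced here), both sides of the identity are computed for an arbitrary choice of n, p, and k:

```python
from math import comb

def binomial_tail(n, p, k):
    """Pr{at least k successes in n Bernoulli(p) trials}."""
    q = 1 - p
    return sum(comb(n, j) * p**j * q**(n - j) for j in range(k, n + 1))

def negbin_cdf(n, p, k):
    """Pr{the kth success occurs not later than trial n}."""
    q = 1 - p
    # kth success at trial m: k-1 successes among the first m-1 trials,
    # followed by a success at trial m
    return sum(comb(m - 1, k - 1) * p**k * q**(m - k) for m in range(k, n + 1))

identity_gap = abs(binomial_tail(20, 0.6, 5) - negbin_cdf(20, 0.6, 5))
```

The two probabilities agree to machine precision.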
$$\Pr\{X = k\} = \frac{(\Lambda t)^k}{k!}\,e^{-\Lambda t} \tag{1.33}$$

$$E\{X\} = \Lambda t = a \tag{1.34}$$

The equation for the m.g.f. can be easily obtained with the use of (1.33)
because they can obtain a lot of elegant results with it. If, in fact, the
investigated problem has at least some relation to an exponential model, this
is an excellent reason!
Antagonists of the exponential distribution maintain that it is an unreason-
able idealization of reality. There are no actual conditions that could gener-
ate an exponential distribution. This is not a bad reason for criticism. But on
the other hand, it is in principle impossible to find a natural process that is
exactly described by a mathematical model.
The real question that must be addressed is: under which conditions is it
appropriate to use an exponential distribution? It is necessary to understand
the nature of this distribution and to decide if it can be applied in each
individual case. Therefore, sometimes an exponential distribution can be
used, and sometimes not. We should always solve practical problems with a
complete understanding of what we really want to do.
Consider a geometric distribution and take the expression for the probabil-
ity that there is no failure during n trials. If n is large and p is close to 1, one
can use the approximation
$$\Pr\{\text{failure during } \Delta\} \approx \lambda\Delta$$

Then, for the r.v. X, a continuous analogue of a geometric r.v., with n → ∞
and Δ → 0, we obtain

$$f(t + x) = f(t) + f(x) \tag{1.40}$$

where f(y) = ln P(y).
Figure 1.1. Exponential distribution F(t), its density f(t), and its hazard function λ(t).
But the only function for which (1.40) holds is the linear function. Let
f(y) = ay. Then P(y) = exp(ay). Now one uses the condition that F(∞) = 1
− P(∞) = 1 and finds that a must be negative; writing a = −λ, the probability
of having no failure during the period t equals

$$P(t) = 1 - F(t) = \exp(-\lambda t) \tag{1.41}$$

and the density is

$$f(t \mid \lambda) = \lambda\exp(-\lambda t) \tag{1.42}$$

The mean is found in a direct way:

$$E\{X\} = \int_0^\infty \lambda t\,e^{-\lambda t}\,dt = \frac{1}{\lambda} \tag{1.43}$$
The second initial moment of the distribution can also be found in a direct
way

$$E\{X^2\} = \int_0^\infty \lambda t^2 e^{-\lambda t}\,dt = \frac{2}{\lambda^2} \tag{1.44}$$

so that

$$\operatorname{Var}\{X\} = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}$$

that is, the standard deviation of an exponential distribution equals the mean

$$\sigma = \sqrt{\operatorname{Var}\{X\}} = \frac{1}{\lambda}$$

The m.g.f. for the density can also be found in a direct way

$$\varphi(s) = \int_0^\infty \lambda e^{-\lambda t} e^{st}\,dt = \frac{\lambda}{\lambda - s} \tag{1.45}$$

and the LST is

$$\varphi(s) = \int_0^\infty \lambda e^{-\lambda t} e^{-st}\,dt = \frac{\lambda}{\lambda + s} \tag{1.46}$$

As we considered above, the LST of the function P(t) = 1 − F(t) = e^{−λt},
taken at s = 0, equals the mean. In this case

$$\varphi_P(s) = \int_0^\infty e^{-\lambda t} e^{-st}\,dt = \frac{1}{\lambda + s} \tag{1.47}$$

and, consequently,

$$\varphi_P(s = 0) = \frac{1}{\lambda} = T$$
condition that the r.v. is not less than t. Thus, the intensity function is

$$\lambda(t) = \frac{f(t)}{1 - F(t)} = \frac{f(t)}{P(t)} = \lambda \tag{1.48}$$

that is, the failure rate for an exponential distribution is constant. This
follows as well from the memoryless property. In reliability terms it means, in
particular, that the current and future reliability properties of an operating piece
of equipment do not change with time and, consequently, do not depend on
the amount of operating time since the moment of switching the equipment
on. Of course, this assumption seems a little restrictive, even for "exponential
addicts." But this mathematical description is sometimes practically sufficient.
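Both the constant hazard (1.48) and the memoryless property behind it can be confirmed numerically; a brief Python sketch (the parameter values are arbitrary):

```python
import math

lam = 0.01                               # failure rate, 1/hour

def survival(t):                         # P(t) = exp(-lam t), as in (1.41)
    return math.exp(-lam * t)

def density(t):                          # f(t) = lam exp(-lam t)
    return lam * math.exp(-lam * t)

# the hazard f(t)/P(t) does not depend on the age t
hazards = [density(t) / survival(t) for t in (0.0, 10.0, 500.0)]

# memorylessness: Pr{X > t + x | X > t} = Pr{X > x}
t, x = 100.0, 50.0
cond = survival(t + x) / survival(t)
```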
$$E\{X\} = \frac{N}{\lambda} \tag{1.49}$$

$$\operatorname{Var}\{X\} = \frac{N}{\lambda^2} \tag{1.50}$$

Finally, the LST of the density of an Erlang distribution of the Nth order is

$$\varphi(s) = \left(\frac{\lambda}{\lambda + s}\right)^N \tag{1.51}$$
$$f_N(x \mid a, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x - a)^2/2\sigma^2} \tag{1.53}$$

where a and σ^2 are the mean and the variance of the distribution, respectively.
These two parameters completely characterize the normal distribution.
The parameter σ is called the standard deviation. Notice that σ is always
nonnegative. From (1.53) one sees that the density is a symmetrical unimodal
function; it has a bell-shaped graph (see Figure 1.2).

That a and σ^2 are, respectively, the mean and the variance of the normal
distribution can be shown in a direct way with the use of (1.53). We leave this
direct proof to Exercises 1.2 and 1.3. Here we will use the m.g.f.

$$\varphi_N(s) = \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x - a)^2/2\sigma^2} e^{sx}\,dx = \exp\left(as + \tfrac{1}{2}\sigma^2 s^2\right) \tag{1.54}$$
Figure 1.2. Normal distribution F(t), its density f(t), and its hazard function λ(t).
and

$$\operatorname{Var}\{X\} = \sigma^2 \tag{1.57}$$
The function (1.58) has been tabulated in different forms and over a very
wide range (see Fig. 1.3). Using the symmetry of the density function, one can
compile a table of the function

$$\Theta(x) = \int_0^x f_N(t \mid 0, 1)\,dt$$
Often one can find a standard table of the so-called Laplace function,
1 − 2F(−x). This kind of table is used, for instance, in artillery calculations to
find the probability of hitting a target.
The distribution function of a normal distribution decreases very rapidly
with increasing x. Most standard tables are, in fact, composed for |x| < 3 or
4, which is enough for most practical purposes. But sometimes one needs
values for larger x. In this case we suggest the following iterative computational
procedure.
$$I = \int_x^\infty e^{-t^2/2}\,dt$$

Figure 1.3. Three types of tabulated functions for the standard normal distribution.
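One workable procedure of this kind is the asymptotic (Mills-ratio) expansion of the tail, sketched below as an illustration rather than as the book's own algorithm:

```python
import math

def normal_tail(x, terms=6):
    """1 - Phi(x) for large x via the asymptotic series
    phi(x)/x * (1 - 1/x**2 + 3/x**4 - 15/x**6 + ...)."""
    phi = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    s, term = 1.0, 1.0
    for k in range(1, terms):
        term *= -(2 * k - 1) / (x * x)   # builds (-1)**k (2k-1)!! / x**(2k)
        s += term
    return phi / x * s

# relative gaps against the exact tail obtained from erfc
gaps = [abs(normal_tail(x) - 0.5 * math.erfc(x / math.sqrt(2)))
        / (0.5 * math.erfc(x / math.sqrt(2)))
        for x in (4.0, 6.0, 8.0)]
```

For x ≥ 4 a handful of terms already gives three to four significant digits, which is exactly the region where printed tables stop.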
It can be rewritten as

$$\lambda(t) = \lambda\beta t^{\beta - 1}$$

Figure 1.5. Density of the Weibull-Gnedenko distribution f(t) for the following
parameters: (a) β = 1, λ = 1; (b) β = 2, λ = 1; (c) β = 4, λ = 1; (d) β = 2, and
λ = 0.7.
Figure 1.6. Hazard rate for the Weibull-Gnedenko distribution with λ = 1 and
different values of the parameter β.
$$\operatorname{Var}\{\xi\} = \lambda^{-2/\beta}\left[\Gamma\!\left(1 + \frac{2}{\beta}\right) - \Gamma^2\!\left(1 + \frac{1}{\beta}\right)\right]$$

Figure 1.7. Density function for the lognormal distribution with different parameters:
(a) μ = 1, σ = 1; (b) μ = 3, σ = 1.7; (c) μ = 3, σ = 1.
$$E\{\xi\} = e^{\mu + \sigma^2/2}$$

and

$$\operatorname{Var}\{\xi\} = e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right)$$

For a small coefficient of variation, one can use a normal approximation for a
lognormal d.f.
$$f(x) = \begin{cases} \dfrac{1}{b - a}, & a \le x \le b \\[4pt] 0, & x < a \text{ or } x > b \end{cases} \tag{1.59}$$

$$E\{X\} = \int_a^b \frac{x}{b - a}\,dx = \frac{a + b}{2} \tag{1.60}$$

$$\operatorname{Var}\{X\} = \int_a^b \frac{x^2}{b - a}\,dx - \left(\frac{a + b}{2}\right)^2 = \frac{(b - a)^2}{12} \tag{1.61}$$

Figure 1.8. Uniform distribution F(t), its density f(t), and its hazard function λ(t).
$$\varphi_F(s) = \int_0^\infty F(t)\,e^{-st}\,dt$$

and

$$\varphi_G(s) = \int_0^\infty G(t)\,e^{-st}\,dt$$

$$\varphi_D(s) = \varphi_F(s)\,\varphi_G(s)$$

If one considers a sum of n i.i.d. r.v.'s, the convolution F^{*n}(t) is

$$F^{*n}(t) = \int_0^t F^{*(n-1)}(t - x)\,dF(x)$$

where all F^{*k}'s are determined recurrently. For a sum of i.i.d. r.v.'s each of
which has LST equal to φ(s),

$$\varphi_\Sigma(s) = [\varphi(s)]^n$$

For the sum of n r.v.'s with arbitrary distributions, one can write

$$\varphi_\Sigma(s) = \prod_{1 \le j \le n} \varphi_j(s) \tag{1.63}$$
Thus,
In other words, the sum of two binomially distributed r.v.'s with the same
parameter p will produce another binomial distribution. This can be easily
explained: arranging a joint sample from two separate samples of sizes n_1
and n_2 from the same population is equivalent to taking one sample of size
n_1 + n_2.
Obviously, an analogous result holds for an arbitrary finite number of
binomially distributed r.v.'s. Thus, we see that the sum of binomially dis-
tributed r.v.'s with the same parameter p produces a new binomially dis-
tributed r.v. with the same p and corresponding parameter
$$n = \sum_{1 \le j \le N} n_j$$
that is, the resulting m.g.f. is the m.g.f. of a new Poisson d.f. with parameter
λ_Σ = λ_1 + λ_2.
An analogous result can be obtained for an arbitrary finite number of
Poisson r.v.'s. In other words, the sum of N Poisson r.v.'s is again a Poisson
r.v. with parameter equal to the sum of the parameters:
$$\Lambda = \sum_{1 \le i \le N} \lambda_i \tag{1.68}$$
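The closure of the Poisson family under summation can be checked by convolving two Poisson pmf's directly (a sketch with arbitrary parameters):

```python
import math

def pois_pmf(k, a):
    return math.exp(-a) * a**k / math.factorial(k)

a1, a2 = 2.0, 3.5

def conv_pmf(k):
    """pmf of the sum of two independent Poisson r.v.'s at the point k."""
    return sum(pois_pmf(j, a1) * pois_pmf(k - j, a2) for j in range(k + 1))

# the convolution coincides with a single Poisson pmf with parameter a1 + a2
max_gap = max(abs(conv_pmf(k) - pois_pmf(k, a1 + a2)) for k in range(20))
```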
Therefore, the sum of two normal r.v.'s produces an r.v. with a normal
distribution. For n terms, the parameters of the resulting normal distribution
are

$$a_\Sigma = \sum_{1 \le i \le N} a_i \tag{1.70}$$

and

$$\sigma_\Sigma = \sqrt{\sum_{1 \le i \le N} \sigma_i^2} \tag{1.71}$$
$$\Pr\{X = k\} \approx \frac{1}{\sqrt{2\pi npq}}\,e^{-(k - np)^2/2npq}$$
The next step in generalizing the conditions under which the sum of a
sequence of arbitrary r.v.'s converges to a normal distribution is formulated
in the following theorem.
Liapounov Central Limit Theorem  Suppose that the r.v.'s X_i are independent
with known means a_i and variances σ_i^2, and, for all of them,
E{|X_i − a_i|^3} < ∞. Also, suppose that

$$\lim_{n \to \infty} \frac{\displaystyle\sum_{1 \le i \le n} E\{|X_i - a_i|^3\}}{\displaystyle\Big(\sum_{1 \le i \le n} \sigma_i^2\Big)^{3/2}} = 0$$

Then, for the normalized and centered (with zero mean) r.v. Y_n,

$$\lim_{n \to \infty} \Pr\{Y_n < x\} = \Phi(x)$$
Thus, this theorem allows for different r.v.'s in the sequence, and the only
restrictions are the existence of moments of an order higher than 2. As a
matter of fact, this statement is true even under weaker conditions (the
restriction on the variance is enough), but then all r.v.'s in the sum must be i.i.d.
For the sample mean, the related result is formulated in the following
theorem.
$$\lim_{n \to \infty} \Pr\left\{\frac{\sqrt{n}\,(\overline{X}_n - a)}{\sigma} < y\right\} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} e^{-z^2/2}\,dz$$

This theorem may be interpreted in the following way: the sum of i.i.d. r.v.'s
approximately has a normal distribution with mean equal to na and variance
equal to nσ^2.
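A quick simulation illustrates this interpretation (a sketch; uniform summands and the sample sizes are arbitrary choices):

```python
import random, math

def phi(x):
    """Standard normal d.f. via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

rng = random.Random(7)
n, reps = 30, 20_000
a, var = 0.5, 1 / 12                     # mean and variance of Uniform(0, 1)

hits = 0
for _ in range(reps):
    s = sum(rng.random() for _ in range(n))
    z = (s - n * a) / math.sqrt(n * var)  # centered and normalized sum
    if z < 1.0:
        hits += 1
freq = hits / reps                        # close to Phi(1) ~ 0.841
```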
A detailed historical review on the development of probability theory and
statistics can be found in Gnedenko (1988).
where a_n = np,
$$\Pr\{\zeta < t\} = \sum_{1 \le n < \infty} \Pr\{\nu = n\}\,F^{*n}(t)$$
In general, both of the latter expressions are practically useful only for
numerical calculation.
To find the mean of such a random sum, we may use the Wald equivalence:

$$E\Big\{\sum_{1 \le i \le \nu} \xi_i\Big\} = E\{\nu\}\,E\{\xi\} \tag{1.72}$$
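The Wald equivalence (1.72) can be illustrated by simulation; in the following sketch the summands are exponential and the number of terms is geometric (all parameter values are arbitrary):

```python
import random

rng = random.Random(3)
lam, p = 0.5, 0.8
q = 1 - p

def geometric_nu():
    """Number of terms: Pr{nu = k} = q * p**(k-1), k = 1, 2, ..."""
    k = 1
    while rng.random() < p:      # continue with probability p, stop with q
        k += 1
    return k

reps = 50_000
total = 0.0
for _ in range(reps):
    total += sum(rng.expovariate(lam) for _ in range(geometric_nu()))
mean_sum = total / reps
wald = (1 / q) * (1 / lam)       # E{nu} E{xi} = 5 * 2 = 10
```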
Below we consider two cases where the sum of finite r.v.'s will lead to
simple results.
$$\varphi_\zeta(s) = q\,\frac{\lambda}{\lambda + s} + pq\left(\frac{\lambda}{\lambda + s}\right)^2 + p^2 q\left(\frac{\lambda}{\lambda + s}\right)^3 + \cdots = q \sum_{1 \le k < \infty} p^{k-1}\left(\frac{\lambda}{\lambda + s}\right)^k = \frac{q\lambda}{s + q\lambda} \tag{1.73}$$
Theorem 1.1  Let {ξ_i} be a sequence of i.i.d. r.v.'s whose d.f. is F(t) with
mean a > 0. Let ν be a discrete r.v. with a geometric distribution with
parameter p: Pr{ν = k} = q p^{k−1} where q = 1 − p. Then, if p → 1, the d.f.
of the normalized sum

$$\zeta_q = q \sum_{1 \le i \le \nu} \xi_i$$

converges to the exponential d.f. 1 − e^{−t/a}.
By the Wald equivalence,

$$E\Big\{\sum_{1 \le k \le \nu} \xi_k\Big\} = E\{\nu\}\,E\{\xi\}$$

Without loss of generality, we can take E{ξ} = 1. Because ν has a geometric
distribution, E{ν} = 1/q. Hence,

$$\zeta_q = q \sum_{1 \le k \le \nu} \xi_k$$

The LST of ζ_q is

$$\varphi_\zeta(s) = E\{e^{-s\zeta_q}\} = E\left\{\exp\Big(-sq \sum_{1 \le k \le \nu} \xi_k\Big)\right\} = \sum_{1 \le k < \infty} p^{k-1} q\,E\left\{\exp\Big(-sq \sum_{1 \le i \le k} \xi_i\Big)\right\}$$

Note that

$$E\left\{\exp\Big(-sq \sum_{1 \le i \le k} \xi_i\Big)\right\} = [\varphi(sq)]^k$$

Then

$$\varphi_\zeta(s) = \sum_{1 \le k < \infty} p^{k-1} q\,[\varphi(sq)]^k = \frac{q\varphi(sq)}{1 - p\varphi(sq)}$$

Now, with some simple transformations, using

$$\lim_{q \to 0} \varphi(sq) = 1$$

and, consequently,

$$\lim_{q \to 0} \frac{1 - \varphi(sq)}{sq} = \lim_{q \to 0} \frac{\varphi(0) - \varphi(sq)}{sq} = -\varphi'(0) = E\{\xi\} = 1$$

we obtain

$$\lim_{q \to 0} \varphi_\zeta(s) = \frac{1}{1 + s}$$

which is the LST corresponding to the exponential d.f. 1 − e^{−t}.
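Theorem 1.1 can be watched in action by simulation (a sketch; Uniform(0, 2) summands with unit mean are an arbitrary choice):

```python
import random, math

rng = random.Random(11)
q = 0.01                              # p = 1 - q is close to 1
reps = 20_000

def zeta():
    """One normalized geometric sum q * (xi_1 + ... + xi_nu)."""
    n = 1
    while rng.random() > q:           # nu is geometric: continue w.p. p
        n += 1
    return q * sum(2 * rng.random() for _ in range(n))   # E{xi} = 1

t = 1.0
emp = sum(zeta() <= t for _ in range(reps)) / reps
exact = 1 - math.exp(-t)              # exponential d.f. with unit mean
```

The empirical d.f. of the normalized sums already tracks 1 − e^{−t} closely at q = 0.01.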
Example 1.1  A sample consists of n = 1000 items. The probability that an
item satisfies some specified requirement equals Pr{success} = p = 0.9. Find
Pr{880 < number of successes}.

Solution. For the normal d.f. which approximates this binomial distribution,
we determine that a = np = 900 and σ^2 = npq = 90, that is, σ ≈ 9.49. Thus,

$$\Pr\{880 < X\} = 1 - \Phi\left(\frac{880 - 900}{9.49}\right) = \Phi(2.11) \approx 0.98$$
Example 1.2  Under the conditions of the previous example, find the number
of good items which the producer can guarantee with probability 0.99
among a sample of size n = 1000.

Solution. Using a standard table of the normal distribution, from the
equation

$$\Pr\{m \ge x\} \approx 1 - \Phi\left(\frac{x - 900 - 0.5}{9.49}\right) = 0.99$$

we find

$$\frac{x - 900.5}{9.49} \approx -2.33$$

or x = 878.4. Thus, the producer can guarantee not less than 878 satisfactory
items with the specified level of 99%.
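Examples 1.1 and 1.2 can be reproduced with the error function; a sketch (the constant −2.326 is the 0.01 quantile of the standard normal, used here in place of the rounded −2.33):

```python
import math

def phi(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

n, p = 1000, 0.9
a = n * p                              # 900
sigma = math.sqrt(n * p * (1 - p))     # about 9.49

# Example 1.1: Pr{880 < number of successes}
p_over_880 = 1 - phi((880 - a) / sigma)

# Example 1.2: solve 1 - Phi((x - a - 0.5)/sigma) = 0.99 for x
x_guar = a + 0.5 - 2.326 * sigma       # about 878.4
```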
We must remember that such an approximation is accurate in the area
more or less close to the mean of the binomial distribution. This
becomes clear if one notices that the domain of a normal distribution is
(−∞, ∞), while the domain of a binomial distribution is restricted to [0, n].

In addition, there is an essential difference between discrete and continuous
distributions. Thus, we must use the so-called "correction of continuity":

$$\Pr\{A \le X \le B\} \approx \Phi\left(\frac{B - np + 0.5}{\sqrt{npq}}\right) - \Phi\left(\frac{A - np - 0.5}{\sqrt{npq}}\right)$$
Example 1.4  Consider a socket with 25 units which replace each other after
a failure. Each unit's TTF has an exponential distribution with parameter
λ = 0.01 [1/hour]. Find the probability of a failure-free operation of the
socket during 2600 hours. (Replacements do not interrupt the system
operation.)

Solution. The socket operates without failure while at most 24 unit failures
occur, that is,

$$P(t) = \sum_{0 \le i \le 24} \frac{(\lambda t)^i}{i!}\,e^{-\lambda t} = P_P(k - 1; \lambda t), \qquad k = 25$$
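The resulting Poisson d.f. is easy to evaluate directly (a sketch):

```python
import math

lam, t, units = 0.01, 2600, 25
a = lam * t                              # expected number of failures: 26
# the socket survives while at most units - 1 = 24 failures occur
prob = sum(math.exp(-a) * a**k / math.factorial(k) for k in range(units))
```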
Thus, events (a) and (b) are equivalent. It is important to remark that both
the Erlang r.v. and the Poisson process are formed with i.i.d. r.v.'s with
exponential d.f.'s.
Notice that the unconditional event ξ_k > t is equivalent to the set of the
following events in the Poisson process: {no events are observed} or {one
event is observed} or {two events are observed} or ... or {k − 1 events are
observed}. This leads to the following condition:

$$\Pr\{\xi_k > t\} = \sum_{0 \le i \le k-1} \frac{(\lambda t)^i}{i!}\,e^{-\lambda t} = P_P(k - 1; \lambda t)$$

or, for the probability of the event ξ_k ≤ t, that is, for the d.f. of the Erlang
r.v. of the kth order, we have
Therefore, in some sense, the Poisson d.f. is a cumulative function for an r.v.
with an Erlang distribution.
Notice that this approximation is accurate in an area close to the mean and
may be very bad for the "tails" of the Poisson distribution. This is explained
by the fact that these two distributions have different domains: the normal
distribution is defined on (−∞, ∞), while the Poisson d.f. is not defined for
m < 0.
Example 1.5  Assume that the number of failures of some particular unit of
equipment has a Poisson distribution. The expected number of failures
during a specified period of time equals 90. One has decided to supply the
equipment with 100 spare units of this type. With what probability will there
be no deficit of spare units?

Solution.

$$\Pr\{m \le 100\} \approx \Phi\left(\frac{100 - 90}{\sqrt{90}}\right) = \Phi(1.05) \approx 0.85$$
Example 1.6  Under the conditions of the previous example, find how many
spare units should be supplied so that the probability exceeds 0.995.

Solution.

$$\Pr\{m \le x\} \approx \Phi\left(\frac{x - 90 - 0.5}{\sqrt{90}}\right) = 0.995$$

From a standard table of the normal distribution, we find that

$$\frac{x - 90.5}{\sqrt{90}} = 2.576$$

or x = 114.9. This means that one should have 115 spare units.
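Both spare-unit examples can be recomputed directly (a sketch; 2.576 is the 0.995 quantile of the standard normal):

```python
import math

def phi(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

a = 90.0                                 # expected number of failures
p_100 = phi((100 - a) / math.sqrt(a))    # Example 1.5: Phi(1.05)
x = a + 0.5 + 2.576 * math.sqrt(a)       # Example 1.6: about 114.9
spares = math.ceil(x)                    # 115 spare units
```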
first n trials where n > k. The event {ν_k > n} means that, in the first n trials,
there are 0, or 1, or 2, ..., or k − 1 failures, that is,

$$\Pr\{\nu_k > n\} = \sum_{0 \le j \le k-1} \binom{n}{j} q^j p^{n-j}$$
$$t = \frac{x - a}{\sigma} \tag{1.74}$$

The density function f(t) can be represented with the help of the
Gram-Charlier series, where φ(t), φ'(t), φ''(t), ... are the density of the normal
distribution and its subsequent derivatives. The standard normal density is
expressed as

$$\varphi(t) = \frac{1}{\sqrt{2\pi}}\,e^{-t^2/2}$$

and the Hermite polynomials are defined by

$$H_n(t) = (-1)^n\,\frac{\varphi^{(n)}(t)}{\varphi(t)} \tag{1.75}$$

where φ^{(n)}(t) is the nth derivative of the normal density. By direct calculations
we find
$$H_0(t) = 1$$
$$H_1(t) = t$$
$$H_2(t) = t^2 - 1 \tag{1.76}$$
$$H_3(t) = t^3 - 3t$$
$$H_4(t) = t^4 - 6t^2 + 3$$

Usually, for practical problems, we do not need more than four terms of the
Gram-Charlier series.
From (1.75) it follows that

$$\varphi'(-t) = -\varphi'(t), \qquad \varphi''(-t) = \varphi''(t), \qquad \varphi^{(3)}(-t) = -\varphi^{(3)}(t), \qquad \varphi^{(4)}(-t) = \varphi^{(4)}(t) \tag{1.77}$$
$$f(t) = A_0 H_0(t)\varphi(t) - A_1 H_1(t)\varphi(t) + A_2 H_2(t)\varphi(t) - \cdots \tag{1.78}$$

$$A_n = \frac{(-1)^n}{n!} \int_{-\infty}^{\infty} f(t) H_n(t)\,dt \tag{1.79}$$

By direct calculation,

$$A_0 = 1, \quad A_1 = -m_1, \quad A_2 = \tfrac{1}{2}[m_2 - 1], \quad A_3 = -\tfrac{1}{6}[m_3 - 3m_1], \quad A_4 = \tfrac{1}{24}[m_4 - 6m_2 + 3] \tag{1.80}$$

For a centered and normalized r.v., m_1 = 0 and m_2 = 1, so that

$$A_1 = 0, \qquad A_2 = 0, \qquad A_3 = -\tfrac{1}{6} m_3 = -\tfrac{1}{6} k_3, \qquad A_4 = \tfrac{1}{24}[m_4 - 3] = \tfrac{1}{24} k_4$$

where k_3 = m_3(x) and k_4 = m_4(x) − 3 are known as the coefficient of asymmetry
and the coefficient of excess, respectively.
In terms of the original variable x, the density is expressed as

$$f(x) = \frac{1}{\sigma}\,f\!\left(\frac{x - a}{\sigma}\right)$$
Example 1.7  With the help of the Gram-Charlier series, the Poisson distribution
can be approximately expressed as a normal density with correction
terms, where

$$t = \frac{x - a - 0.5}{\sqrt{a}}$$
REMARK. The Gram-Charlier series can be successfully applied to the evaluation of d.f.'s.
This takes place, for instance, in analyses of the distribution of a parameter of a piece of
electronic equipment when the distributions of its components are known.
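A sketch of such an evaluation for the Poisson case, keeping only the skewness term of the series (the Poisson skewness coefficient 1/√a is supplied here as outside knowledge, not taken from the text):

```python
import math

def poisson_pmf(k, a):
    return math.exp(-a) * a**k / math.factorial(k)

def gram_charlier_poisson(k, a):
    """Normal density plus the H3 skewness correction, cf. (1.76)."""
    sigma = math.sqrt(a)
    t = (k - a) / sigma
    phi = math.exp(-t * t / 2) / math.sqrt(2 * math.pi)
    k3 = 1 / sigma                     # Poisson coefficient of asymmetry
    h3 = t**3 - 3 * t                  # Hermite polynomial H3
    return (phi / sigma) * (1 + (k3 / 6) * h3)

a = 90
worst = max(abs(gram_charlier_poisson(k, a) - poisson_pmf(k, a)) / poisson_pmf(k, a)
            for k in (80, 90, 100))
```

Near the mean the corrected approximation is accurate to a fraction of a percent.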
Stochastic processes are used for the description of a system's operation over
time. There are two main types of stochastic processes: discrete and continu-
ous. Among discrete processes, point processes in reliability theory are widely
used to describe the appearance of events in time (e.g., failures, terminations
of repair, demand arrivals, etc.).
A well-known type of point process is the so-called renewal process. This
process is described as a sequence of events, the intervals between which are
• Stationarity
• Memorylessness (Markov property)
• Ordinarity
The first property means that the d.f. of the number of observed events in
a time interval depends only on the length of the interval and not on its
position on the time axis.
The second property means that the d.f. of the number of observed events
does not depend on the previous history of the process.
The third property means that the probability of an appearance of more
than one event in an infinitesimally small interval h goes to 0:
$$\lim_{h \to 0} \frac{1}{h}\Pr\{k \text{ events appear during } h,\ k > 1\} = 0$$

$$P_1(h) = \lambda h + o(h) \tag{1.83}$$
where λ is some constant and o(h) was introduced in (1.82). As a matter of
fact, (1.83) follows from the three properties characterizing a Poisson pro-
cess.
Consider the probability of the appearance of k events in a time interval
t + h. The formula for the probability can be easily written as
Let

$$R(h) = \sum_{0 \le j \le k-2} P_j(t)\,P_{k-j}(h) = \Pr\{\text{two or more events appear during interval } h\} \tag{1.87}$$

At the same time, by assumption, this probability equals o(h).
As a result, we have the equality

$$P_k(t + h) = P_k(t)P_0(h) + P_{k-1}(t)P_1(h) + o(h) \tag{1.88}$$

$$P_k(t + h) = P_k(t)(1 - \lambda h) + P_{k-1}(t)\lambda h + o(h) \tag{1.89}$$

and, after h → 0,

$$\frac{dP_k(t)}{dt} = -\lambda P_k(t) + \lambda P_{k-1}(t) \tag{1.90}$$
Thus, a system of equalities for P_k(t), k = 0, 1, ..., has been obtained. We
need to add one more equation to determine P_0(t). Using the memoryless
property, we can write

$$P_0(t + h) = P_0(t)P_0(h) = P_0(t)[1 - \lambda h + o(h)]$$

or, finally,

$$\frac{dP_0(t)}{dt} = -\lambda P_0(t) \tag{1.91}$$

To solve the system, we must determine the initial condition. Of course, at
t = 0, the probability of no events equals 1; that is, the initial condition is
P_0(0) = 1.
The system of differential equations (1.90) and (1.91) with the above initial condition can be solved by several different methods. We solve this system of equations with the use of the LST. Let φ_0(s) be the LST of the function P_0(t):

$$\varphi_0(s) = \frac{1}{\lambda + s} \qquad (1.94)$$

For arbitrary k > 0, from (1.90) the system of recurrent equations follows:

$$\varphi_k(s) = \frac{\lambda^k}{(\lambda + s)^{k+1}} \qquad (1.98)$$

From a table of LSTs, the latter transformation corresponds to a Poisson distribution

$$P_k(t) = \frac{(\lambda t)^k}{k!}\, e^{-\lambda t} \qquad (1.99)$$
For the Poisson distribution the mean number of events in a fixed interval of time is proportional to its length. The parameter λ is the mean number of events per time unit or, equivalently, the inverse of the mean time between successive events.
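The Poisson formula (1.99) can be checked numerically by simulating the process through its i.i.d. exponential inter-event times (the memoryless property) and comparing the empirical counts with the theoretical probabilities. The sketch below is only an illustration; the parameter values are assumed, not taken from the text.

```python
import math
import random

def poisson_pmf(k, lam, t):
    """P_k(t) = (lam*t)**k * exp(-lam*t) / k!  -- equation (1.99)."""
    return (lam * t) ** k * math.exp(-lam * t) / math.factorial(k)

def count_events(lam, t, rng):
    """Count events of a Poisson process on [0, t] by accumulating
    exponential inter-event times."""
    elapsed, n = 0.0, 0
    while True:
        elapsed += rng.expovariate(lam)
        if elapsed > t:
            return n
        n += 1

rng = random.Random(1)
lam, t, trials = 2.0, 3.0, 20000     # illustrative values
counts = [count_events(lam, t, rng) for _ in range(trials)]
for k in range(5):
    empirical = counts.count(k) / trials
    print(k, round(empirical, 3), round(poisson_pmf(k, lam, t), 3))
```

The empirical frequencies agree with (1.99) up to Monte Carlo noise, and the sample mean of the counts approaches λt, in line with the proportionality stated above.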
The intensity of a point process is defined as the mean number of events per unit of time:

$$\lambda = \lim_{h \to 0} \frac{1}{h} \sum_{1 \le j < \infty} j\, P_j(h)$$

For an arbitrary stationary point process with a single arrival at a time and without so-called "points of condensation" (infinitesimally small intervals in which more than one event occurs with positive probability),

$$\lambda^* \le \lambda$$

For a stationary and memoryless point process, the parameter coincides with the intensity. We can give an explanation of the parameter of a point process based on a more physical consideration.
Thus, the probability that a failure will occur for any of these reasons is given by the recurrent convolution

$$F^{*k}(t) = \int_0^t F^{*(k-1)}(t - x)\, dF(x)$$

The renewal density function is

$$h(t)\,dt = f(t)\,dt + \sum_{2 \le k < \infty} f^{*k}(t)\,dt \qquad (1.101)$$

The renewal function H(t), the mean number of renewals in [0, t], can be written as

$$H(t) = \mathrm{E}\{N(t)\} = \sum_{1 \le k < \infty} k \Pr\{N(t) = k\}
     = \sum_{1 \le k < \infty} k\,\bigl[F^{*k}(t) - F^{*(k+1)}(t)\bigr]$$
$$= F(t) + \sum_{2 \le k < \infty} k\, F^{*k}(t) - \sum_{2 \le k < \infty} (k-1)\, F^{*k}(t)
     = \sum_{1 \le k < \infty} F^{*k}(t)$$
Naturally, the renewal density function at time t is the sum of the densities of the occurrence of all possible events of the renewal process: the first, or the second, or ..., or the kth, and so on. From (1.104), by integration, we obtain bounds on H(t). It is important to note that F^{*n}(t) ≤ [F(t)]^n. Indeed,

$$F^{*n}(t) = \Pr\Bigl\{\sum_{1 \le k \le n} \xi_k \le t\Bigr\} \le \Pr\Bigl\{\max_{1 \le k \le n} \xi_k \le t\Bigr\} = [F(t)]^n$$

This states the simple fact that the sum of n nonnegative values is not less than the maximal one. (Equality occurs only if at least n − 1 values are equal to 0.) Consequently,

$$H(t) = \sum_{1 \le k < \infty} F^{*k}(t) \le \sum_{1 \le k < \infty} [F(t)]^k = \frac{F(t)}{1 - F(t)}$$
Using (1.102) and observing that the integral on the right side of the
equation is positive, we obtain two-sided bounds
The next interesting bounds for a renewal process built with "aging" r.v.'s ξ_i can be obtained if we consider the following natural condition. Let N(t) events be observed up to a moment t. Then, by Wald's identity,

$$t \le \mathrm{E}\Bigl\{\sum_{1 \le j \le N(t)+1} \xi_j\Bigr\} = \mathrm{E}\{\xi\}\,[H(t) + 1]$$

which produces

$$\frac{t}{\mathrm{E}\{\xi\}} - 1 \le H(t)$$

For an aging r.v., the mean residual time is decreasing, which allows us to write

$$H(t) \le \frac{t}{\mathrm{E}\{\xi\}}$$

Thus, for a renewal process with aging r.v.'s we can write the two-sided bounds

$$\frac{t}{\mathrm{E}\{\xi\}} - 1 \le H(t) \le \frac{t}{\mathrm{E}\{\xi\}} \qquad (1.107)$$
Here t 0 is the time needed for a successful operation. The proof of this is left
to Exercise 1.10.
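The two-sided bounds (1.107) can be observed numerically by estimating the renewal function H(t) = E{N(t)} by Monte Carlo simulation for an aging interarrival distribution. The sketch below uses an Erlang(2) distribution, a standard example of an IFR ("aging") r.v.; the parameter values are illustrative assumptions.

```python
import random

def simulate_H(t, draw, trials, rng):
    """Monte Carlo estimate of the renewal function H(t) = E{N(t)}."""
    total = 0
    for _ in range(trials):
        elapsed, n = 0.0, 0
        while True:
            elapsed += draw(rng)
            if elapsed > t:
                break
            n += 1
        total += n
    return total / trials

# Erlang(2, lam) interarrival times: an "aging" (IFR) distribution
lam = 1.0
mean = 2.0 / lam
erlang2 = lambda rng: rng.expovariate(lam) + rng.expovariate(lam)

rng = random.Random(7)
results = {t: simulate_H(t, erlang2, 5000, rng) for t in (2.0, 5.0, 10.0)}
for t, H in results.items():
    # lower bound (1.107), estimate of H(t), upper bound (1.107)
    print(t, round(t / mean - 1, 3), round(H, 3), round(t / mean, 3))
```

For each t the estimated H(t) falls between t/E{ξ} − 1 and t/E{ξ}, as (1.107) asserts.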
General Case Consider a stationary recurrent point process for which the intervals between events have a distribution F(t). Sometimes this distribution is called a "forming distribution." Apply the thinning procedure to this process. According to this procedure, each event remains in the process with probability q or is deleted with probability p = 1 − q. Thus, after such a procedure, the average number of points which remain in the newly formed process is 1/q times less than in the initial process. In other words, the time interval between points in the new process is 1/q times larger. The explanation of the procedure is depicted in Figure 1.9.

Each interval between events of the new process represents the sum of a random number of r.v.'s. Thus, the problem of renewal process thinning is equivalent to the problem of summing a geometrically distributed number of random variables.
Theorem 1.6 If transformations T_{q_1}, T_{q_2}, ... are such that

$$Q_n = q_1 q_2 \cdots q_n \to 0 \quad \text{as } n \to \infty$$

then their application to some renewal point process with an initial finite intensity λ leads the resulting limit process to a Poisson process with the same intensity λ.

We omit the proof because, in general, it coincides with the corresponding proof of Section 1.3.5.
Later this result was generalized for the superposition of n different
renewal processes with different thinning procedures.
We remark that a Poisson process is sometimes called "a process of rare events." From the formulations of the above results, one can see that the T_Q transformation generates a flow of "rare" events from a dense initial process.
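The effect of the T_Q transformation can be watched numerically: start from a renewal process that is far from Poisson (here, uniformly distributed intervals), delete each point with probability 1 − q, and compress time by the factor q. The intervals of the resulting process should look exponential; a simple diagnostic is the coefficient of variation, which equals 1 for the exponential distribution. This is only an illustrative sketch with assumed parameters, not part of the text's formal development.

```python
import random

def thinned_intervals(base_draw, q, n_points, rng):
    """Keep each renewal point with probability q; return the
    intervals of the thinned process, rescaled by q (time
    compression) so the mean interval stays comparable."""
    intervals, gap = [], 0.0
    while len(intervals) < n_points:
        gap += base_draw(rng)
        if rng.random() < q:           # point survives thinning
            intervals.append(gap * q)
            gap = 0.0
    return intervals

rng = random.Random(3)
uniform_draw = lambda rng: rng.uniform(0.5, 1.5)   # CV ~ 0.29, far from exponential
xs = thinned_intervals(uniform_draw, 0.05, 10000, rng)
m = sum(xs) / len(xs)
var = sum((x - m) ** 2 for x in xs) / len(xs)
cv = var ** 0.5 / m
print(round(m, 3), round(cv, 3))
```

The coefficient of variation of the thinned, rescaled intervals comes out close to 1, consistent with convergence toward a Poisson ("rare events") process.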
Figure 1.10. Example of the superposition of two point processes.
After a failure, the unit is replaced and the process continues. Unfortunately, even if we consider a small number of renewal processes, their superposition cannot be analyzed in terms of renewal processes! (The only exception is the superposition of Poisson processes.)
At the same time, fortunately, if the number of superimposed point
processes is very large, the superposition of these processes produces a point
process that is very close to being Poisson. In the theory of stochastic
processes, the Poisson process plays a role which is analogous to that of the
normal distribution in probability theory.
$$\xi = \min_{1 \le k \le n} \xi_k$$

with d.f.

$$F(t) = 1 - \prod_{1 \le k \le n} \bigl[1 - F_k(t)\bigr]$$
General Case The proofs and even the formulation of the strict conditions
of the related theorems are complex and lie outside the scope of this book.
We only formulate the main results, sometimes even on a verbal level. The
first strict result was formulated in the following theorem.
exists, a necessary and sufficient condition that the superposed process converges to a Poisson process with parameter Λ is that, for any fixed t and n → ∞,

$$n F_n(t) \to \Lambda t$$
The failure state with the smallest index, say m + 1, may be considered
absorbing. The other system failure states are of no interest because there
are no transitions from them to the set of up states. In this case the process
behavior in the subset of up states can be used to find the probability of a
failure-free operation, the MTTF, and the mean time between failures (MTBF). Notice that if we are interested in finding repair (idle time) indexes,
the state Hm must be chosen to be absorbing. In this case we consider the
behavior of the process in the subset of system failure states.
If there are no absorbing states, the process is considered for finding the
availability coefficients, both stationary and nonstationary. For a finite set of
states, the state with the largest index is reflecting.
In reliability problems the state H_0 is always considered as reflecting (if the process is not of a special type with states H_{-1}, H_{-2}, ...).
For reliability problems it suffices to consider only BDPs with a finite
number of states.
1.6.2 Stationary Probabilities

Consider a finite BDP with N + 1 states (see Figure 1.11a). For each state k and for two infinitesimally close time moments t and t + Δt, we can write
the expression
$$p_k(t + \Delta t) = p_{k-1}(t)\,[\lambda_{k-1}\,\Delta t + o(\Delta t)] + p_k(t)\,[1 - (\lambda_k + \mu_k)\,\Delta t + o(\Delta t)] + p_{k+1}(t)\,[\mu_{k+1}\,\Delta t + o(\Delta t)]$$

which, after passing to the limit, gives

$$\frac{p_k(t + \Delta t) - p_k(t)}{\Delta t} \to \frac{dp_k(t)}{dt} = \lambda_{k-1}\, p_{k-1}(t) - (\lambda_k + \mu_k)\, p_k(t) + \mu_{k+1}\, p_{k+1}(t)$$

and

$$p_0'(t) = -\lambda_0\, p_0(t) + \mu_1\, p_1(t)$$

In the stationary regime the derivatives vanish, and the stationary probabilities are

$$p_k = \frac{\displaystyle\prod_{0 \le j \le k-1} \lambda_j \Big/ \prod_{1 \le j \le k} \mu_j}{\displaystyle\sum_{0 \le i \le N}\; \prod_{0 \le j \le i-1} \lambda_j \Big/ \prod_{1 \le j \le i} \mu_j} \qquad (1.119)$$

where the empty product (for i = 0) equals 1.
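Formula (1.119) is easy to evaluate recursively, since each weight is the previous one multiplied by λ_{k−1}/μ_k. The sketch below does this and, as a sanity check, compares the result against the known closed form for the M/M/1/N queue; the numerical parameter values are assumptions chosen for illustration.

```python
from math import isclose

def bdp_stationary(lambdas, mus):
    """Stationary probabilities of a finite birth and death process,
    equation (1.119): weight_k = prod(lambda_0..lambda_{k-1}) /
    prod(mu_1..mu_k), normalized to sum to 1."""
    weights = [1.0]                    # empty product for state 0
    for lam_k, mu_k in zip(lambdas, mus):
        weights.append(weights[-1] * lam_k / mu_k)
    total = sum(weights)
    return [w / total for w in weights]

# Check against M/M/1/N, where p_k = rho**k * (1-rho)/(1-rho**(N+1))
lam, mu, N = 1.0, 2.0, 4
p = bdp_stationary([lam] * N, [mu] * N)
rho = lam / mu
expected = [(rho ** k) * (1 - rho) / (1 - rho ** (N + 1)) for k in range(N + 1)]
for a, b in zip(p, expected):
    assert isclose(a, b, rel_tol=1e-12)
print([round(x, 4) for x in p])
```

The recursive computation avoids forming large products explicitly and agrees with the classical queueing formula.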
The mean sojourn times in the set of up states and in the set of down states can be expressed through the stationary probabilities:

$$T_+ = \frac{\displaystyle\sum_{0 \le j \le m} p_j}{\lambda_m\, p_m} \qquad (1.121)$$

and, finally,

$$T_- = \frac{\displaystyle\sum_{m+1 \le j \le N} p_j}{\mu_{m+1}\, p_{m+1}} \qquad (1.122)$$
Applying the LST to the nonstationary equations yields the system

$$\lambda_{j-1}\,\varphi_{j-1}(s) - (\lambda_j + \mu_j + s)\,\varphi_j(s) + \mu_{j+1}\,\varphi_{j+1}(s) = -p_j(0), \qquad 0 \le j \le n+1 \qquad (1.126)$$

where
the determinant of the coefficient matrix of system (1.126) is

$$\Delta_{n+1}(s) = \begin{vmatrix}
-(\lambda_0 + s) & \mu_1 & 0 & \cdots & 0\\
\lambda_0 & -(\lambda_1 + \mu_1 + s) & \mu_2 & \cdots & 0\\
0 & \lambda_1 & -(\lambda_2 + \mu_2 + s) & \cdots & 0\\
\vdots & & & \ddots & \vdots\\
0 & 0 & \cdots & -(\lambda_{n-1} + \mu_{n-1} + s) & \mu_n\\
0 & 0 & \cdots & \lambda_{n-1} & -(\lambda_n + \mu_n + s)
\end{vmatrix}$$

which can be written in terms of its roots x_k as

$$\Delta_{n+1}(s) = (-1)^{n+1} \prod_{1 \le k \le n+1} (s - x_k) \qquad (1.127)$$

The determinant D_{n+1}(s) is obtained by adjoining the column of initial conditions:

$$D_{n+1}(s) = \begin{vmatrix}
-(\lambda_0 + s) & \mu_1 & \cdots & 0 & -p_0(0)\\
\lambda_0 & -(\lambda_1 + \mu_1 + s) & \cdots & 0 & -p_1(0)\\
0 & \lambda_1 & \cdots & 0 & -p_2(0)\\
\vdots & & \ddots & & \vdots\\
0 & 0 & \cdots & -(\lambda_n + \mu_n + s) & -p_n(0)\\
0 & 0 & \cdots & \lambda_n & -p_{n+1}(0)
\end{vmatrix} \qquad (1.128)$$
Expanding the determinant D_{n+1}(s) along the last row yields a recurrent equation. The probability p_{n+1}(t) is found with the help of the inverse LST:

$$p_{n+1}(t) = \frac{1}{2\pi i} \int_{\sigma - i\infty}^{\sigma + i\infty} \frac{D_{n+1}(s)\, e^{st}}{\Delta_{n+1}(s)}\, ds \qquad (1.130)$$

where i = √−1.

We are now faced with the problem of calculating the roots (eigenvalues) of (1.130). It can be shown that Δ_{n+1}(s) is a polynomial of power n + 1, and all of its roots x_k, 1 ≤ k ≤ n + 1, are distinct and negative. Also, all roots of the polynomials of the neighboring orders n and n + 1 are interlaced. This fact facilitates the computation of the recurrent equation (1.130).
We omit the cumbersome intermediate transformations and write the final result, taking into account that the probability of interest is P(t) = 1 − p_{n+1}(t). If the system begins its operation in the state with all units completely operational,

$$p_0(0) = 1, \qquad p_i(0) = 0, \quad i = 1, \ldots, n+1$$

then

$$p_{n+1}(t) = 1 + \lambda_0 \lambda_1 \cdots \lambda_n \sum_{1 \le k \le n+1} \frac{e^{x_k t}}{x_k \displaystyle\prod_{\substack{1 \le i \le n+1 \\ i \ne k}} (x_k - x_i)} \qquad (1.132)$$

If the system begins its operation just after coming out of a failure state,

$$p_n(0) = 1, \qquad p_i(0) = 0, \quad i = 0, 1, \ldots, n-1,\, n+1$$

then the corresponding expression follows from (1.130) in the same manner.
The stationary intensity of transitions from H_n to H_{n+1} equals λ_n p_n; summing over the set of up states gives the stationary intensity of system failures

$$\Lambda = \sum_{0 \le k \le m} \lambda_k\, p_k$$
$$K = \lim_{t \to \infty} \Pr\{H_j(t) \in E_+\} = \sum_{H_j \in E_+} p_j$$
The second way uses the means T_+ and T_− determined in (1.121) and (1.122), respectively. Let 1 − K = k̄. For the stationary process, the probability of being in a given state is proportional to the portion of time occupied by this state over the entire interval of observation on the time axis. This leads to the condition K/k̄ = T_+/T_−. With the condition K + k̄ = 1 we find

$$K = \frac{T_+}{T_+ + T_-}$$
For a pure birth process the LST system (1.135) begins with

$$-1 + s\,\varphi_0(s) = -\lambda_0\, \varphi_0(s)$$

Solving (1.135) beginning with the first equation and sequentially substituting the obtained results into the next equation, we obtain

$$\varphi_N(s) = \frac{\lambda_0 \lambda_1 \cdots \lambda_{N-1}}{s(s + \lambda_0)(s + \lambda_1) \cdots (s + \lambda_{N-1})} \qquad (1.137)$$

For different λ_j's the solution for p_N(t), which is the probability that by moment t the process has entered state H_N, can be found by using the inverse Laplace-Stieltjes transform:

$$p_N(t) = 1 - \sum_{0 \le k \le N-1} \Biggl(\,\prod_{\substack{0 \le i \le N-1 \\ i \ne k}} \frac{\lambda_i}{\lambda_i - \lambda_k}\Biggr) e^{-\lambda_k t} \qquad (1.138)$$
If all the λ's are equal, then

$$\varphi_N(s) = \frac{\lambda^N}{s(s + \lambda)^N} \qquad (1.140)$$

In this case we find (with the use of a table of the LSTs) that

$$p_N(t) = 1 - \sum_{0 \le k \le N-1} \frac{(\lambda t)^k}{k!}\, e^{-\lambda t}$$
The mean time for the process to enter state H_N in the general case can easily be calculated as the sum of the mean times during which the process remains in each state:

$$T_N = \sum_{1 \le k \le N} \frac{1}{\lambda_{k-1}}$$
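For equal rates the time to enter H_N is the sum of N independent exponential sojourn times, so (1.140) is simply the Erlang d.f. This can be confirmed by simulation; the parameter values below are illustrative assumptions.

```python
import math
import random

def time_to_state_N(lambdas, rng):
    """Pure birth process: the sojourn time in state k is exponential
    with rate lambda_k; the total time to reach state N is their sum."""
    return sum(rng.expovariate(lam) for lam in lambdas)

lam, N, trials = 2.0, 5, 20000
rng = random.Random(11)
samples = [time_to_state_N([lam] * N, rng) for _ in range(trials)]

t = 3.0
# p_N(t) = 1 - sum_{0<=k<=N-1} (lam*t)^k e^{-lam*t} / k!  (Erlang d.f.)
p_formula = 1 - sum((lam * t) ** k * math.exp(-lam * t) / math.factorial(k)
                    for k in range(N))
p_mc = sum(s <= t for s in samples) / trials
mean_mc = sum(samples) / trials
print(round(p_mc, 3), round(p_formula, 3), round(mean_mc, 3), N / lam)
```

The simulated entrance probability matches the Erlang formula, and the simulated mean matches the sum of the mean sojourn times, N/λ.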
CONCLUSION
Two distributions which are often used in engineering practice are the
normal and the exponential. Each has its advantages and disadvantages. First
of all, these distributions are very convenient for varied mathematical manip-
ulations. But this argument is weak for practical applications. The question of
their reasonable use, as with any modeling of real objects with the help of
mathematical abstraction, always requires special "physical" verification
based on experience and engineering intuition.
A Weibull-Gnedenko distribution is very convenient as a model for various physical phenomena because it is two-parametric. Besides, it has a clear physical sense as a distribution of extreme values. This distribution, as it relates to applied mechanical problems, was first mentioned in Weibull (1939). Shortly after this, Gnedenko (1943) found the classes of limit distributions of extreme values. One particular type of limit distribution has the form of the distribution discovered by Weibull:

$$F(x) = 1 - \exp\Bigl\{-\exp\Bigl(\frac{x - a}{b}\Bigr)\Bigr\}$$

where the new parameters are expressed as b = 1/α and a = log λ.
The reader interested in a deeper understanding of the probabilistic
fundamentals of reliability theory should pay attention to special mono-
graphs. It is practically impossible to enumerate the books dedicated to this
subject. An older, but nevertheless highly recommended, book is that by Feller (1966). This book, along with the book by Gnedenko (1967, 1988), served as the main textbook for several generations of statisticians and applied mathematicians.
For everyday use the books by DeGroot (1987) and Devore (1991) are
recommended.
Concerning the limit theorems in the theory of stochastic processes, we
must especially mention several works. Khinchine (1956a, 1956b, 1960) and
Ososkov (1956) considered superposition of point processes, and later Grigelionis (1963) and Pogozhev (1964) generalized their result. Rényi (1962) studied the limit behavior of thinned (rarefied) renewal processes.
REFERENCES
Belyaev, Yu. K. (1962). Line Markov processes and their application to problems in
reliability theory. Trans. Sixth Conf. Probab. Statist., Vilnius.
DeGroot, M. H. (1987). Probability and Statistics, 2nd ed. Reading, MA: Addison-
Wesley.
Devore, J. L. (1991). Probability and Statistics for Engineering and the Sciences, 3rd ed.
Pacific Grove, CA: Brooks/Cole.
Feller, W. (1966). An Introduction to Probability Theory and Its Applications. New
York: Wiley.
Gnedenko, B. V. (1943). Sur la distribution limite du terme maximum d'une série aléatoire. Ann. Math., no. 44.
Gnedenko, B. V. (1967). The Theory of Probability. New York: Chelsea.
Gnedenko, B. V., and Kovalenko, I. N. (1987). Introduction to Queueing Theory, 2nd ed. (in Russian). Moscow: Nauka.
Gnedenko, B. V. (1988). Course of Probability Theory, 6th ed., revised (in Russian).
Moscow: Nauka.
Grigelionis, B. (1963). On the convergence of sums of step stochastic processes to a
Poisson process. Theory Probab. Appl., vol. 8, no. 2.
Grigelionis, B. (1964). Limit theorems on sum of renewal processes. In Cybernetics in
the Service of Communism, vol. 2, A. Berg, N. Bruevich, and B. Gnedenko, eds.
Moscow: Energia.
Khinchine, A. Ya. (1956a). Streams of random events without aftereffect. Theory
Probab. Appl., No. 1.
Khinchine, A. Ya. (1956b). On Poisson streams of random events. Theory Probab.
Appl., no. 1.
Khinchine, A. Ya. (1960). Mathematical Methods in the Theory of Queueing. London:
Charles Griffin.
Ososkov, G. A. (1956). A limit theorem for flows of similar events. Theory Probab.
Appl., vol. 1, no. 2.
Pogozhev, I. B. (1964). Evaluation of deviation of the equipment failure flow from a Poisson process. In Cybernetics in the Service of Communism, vol. 2 (in Russian), A. Berg, N. Bruevich, and B. Gnedenko, eds. Moscow: Energia.
Rényi, A. (1956). Poisson-folyamat egy jellemzése (in Hungarian). Ann. Math. Statist., vol. 1, no. 4.
Renyi (1962).
Ushakov, I. A. (1986). A universal generating function. Soviet J. Comput. Systems Sci.,
vol. 24, no. 5.
Consider a discrete r.v. ν with distribution

$$\Pr\{\nu = k\} = p_k, \quad k = 0, 1, 2, \ldots, \qquad \sum_{\forall k} p_k = 1$$

The generating function (g.f.) of ν, denoted by φ(z), is defined as

$$\varphi(z) = \sum_{\forall k} p_k z^k$$

Thus, the coefficient of z^k equals the probability that ν equals k. The g.f. is very convenient when one deals with the summation of discrete r.v.'s. Generating functions are especially effective when algorithms for computer calculations are involved.
For example, suppose we have two discrete r.v.'s α and β, with distributions a_k and b_k, respectively. We are interested in the distribution g_k of the new r.v. γ = α + β. Of course, we could find the desired distribution directly:

$$g_k = \sum_{0 \le j \le k} a_j\, b_{k-j}$$

The g.f. also allows one to find the d.f. of ν:

$$\Pr\{\nu \le k\} = \mathfrak{M}_{z \to 0}\bigl[z^{-k}\varphi(z)\bigr]$$

where 𝔐_{z→0} is an operator that turns any negative power t of the term z^t into 1. Thus, after the substitution z = 0, all terms with positive powers of z vanish, and the remaining terms sum to Pr{ν ≤ k}. The mean of ν is obtained by differentiation:

$$\mathrm{E}\{\nu\} = \frac{d\varphi(z)}{dz}\bigg|_{z=1}$$
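The direct convolution above is exactly polynomial multiplication of the two g.f.'s coefficient lists. A small sketch (the dice example is an assumed illustration):

```python
def gf_multiply(a, b):
    """Multiply two generating functions given by their coefficient
    lists: c_k = sum_j a_j * b_{k-j} -- the distribution of the sum
    of two independent discrete r.v.'s."""
    c = [0.0] * (len(a) + len(b) - 1)
    for i, pa in enumerate(a):
        for j, pb in enumerate(b):
            c[i + j] += pa * pb
    return c

# Two independent dice modelled through their g.f. coefficients
die = [0.0] + [1.0 / 6] * 6           # p_k = 1/6 for k = 1..6
two_dice = gf_multiply(die, die)
print(round(two_dice[7], 4))           # Pr{sum = 7} = 6/36
print(round(sum(two_dice[:8]), 4))     # Pr{sum <= 7} = 21/36
```

Summing the leading coefficients reproduces the d.f. values that the truncation operator extracts analytically.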
To obtain higher moments, it is more convenient to use the so-called moment generating function (m.g.f.) of the r.v. ν. This function can be written formally by simply substituting z = e^s into the generating function. Then

$$\frac{d}{ds}\varphi(e^s)\bigg|_{s=0} = \sum_{\forall k} k\, p_k\, e^{sk}\bigg|_{s=0} = \sum_{\forall k} k\, p_k = \mathrm{E}\{\nu\}$$

$$\frac{d^2}{ds^2}\varphi(e^s)\bigg|_{s=0} = \sum_{\forall k} k^2\, p_k\, e^{sk}\bigg|_{s=0} = \sum_{\forall k} k^2\, p_k = \mathrm{E}\{\nu^2\}$$
1.A.2 Laplace-Stieltjes Transformation
With continuous r.v.'s the Laplace-Stieltjes transformation (LST) is often
used. This transformation allows one to solve integral-differential equations
with the "reduced" mathematical technique. The essence of the LST is
depicted in Figure 1.14.
In this book we usually consider distributions of nonnegative r.v.'s. The transform of such an r.v. is defined for the distribution function (d.f.) F(t) as

$$\varphi(s) = \int_0^\infty e^{-st}\, dF(t)$$

If we consider the LST corresponding to the density function, the LST can be rewritten in the form

$$\varphi(s) = \int_0^\infty e^{-st} f(t)\, dt$$

The correspondence between the original function f(t) and its LST φ(s) is usually denoted as

$$f(t) \leftrightarrow \varphi_f(s)$$
The LST of a sum of functions equals the sum of the transforms:

$$f_1(t) + f_2(t) \leftrightarrow \varphi_1(s) + \varphi_2(s) \qquad (1.142)$$

This follows directly from the linearity of integration. Obviously, (1.142) is true for any number of functions in the sum.
The convolution of two density functions is

$$f_3(t) = \int_0^t f_1(t - x)\, f_2(x)\, dx = \int_0^t f_2(t - x)\, f_1(x)\, dx$$

This operation over the functions f_1(t) and f_2(t) is also denoted by

$$f_3(t) = f_1 * f_2(t)$$

The transform of the convolution of a pair of functions is the product of the transforms:

$$f_1 * f_2(t) \leftrightarrow \varphi_1(s)\,\varphi_2(s)$$
The LST of the integral of a function is

$$\int_0^t f(x)\, dx \leftrightarrow \frac{\varphi(s)}{s} \qquad (1.145)$$

The proof of this is left as Exercise 1.16.
Property of the LST of the Density Function If the function f(t) is the density of the distribution of the r.v. ξ, that is, f(t) = dF(t)/dt, then

$$\varphi(s)\Big|_{s=0} = \int_0^\infty e^{-st} f(t)\, dt\, \bigg|_{s=0} = \int_0^\infty f(t)\, dt = 1 \qquad (1.146)$$

and

$$\frac{d\varphi(s)}{ds}\bigg|_{s=0} = -\int_0^\infty t\, e^{-st} f(t)\, dt\, \bigg|_{s=0} = -\int_0^\infty t\, f(t)\, dt = -\mathrm{E}\{\xi\}$$
In a similar manner, for the survival function P(t),

$$\int_0^\infty P(t)\, e^{-st}\, dt\, \bigg|_{s=0} = \int_0^\infty P(t)\, dt = T \qquad (1.147)$$

Higher moments are found through higher derivatives:

$$\mathrm{E}\{\xi^k\} = (-1)^k \frac{d^k \varphi(s)}{ds^k}\bigg|_{s=0} \qquad (1.148)$$

These moments are obtained more conveniently with the help of the continuous m.g.f., which coincides with the LST except in the sign of the power in the exponential:

$$m(s) = \int_0^\infty f(t)\, e^{st}\, dt, \qquad \mathrm{E}\{\xi^k\} = \frac{d^k m(s)}{ds^k}\bigg|_{s=0}$$
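Properties (1.146)-(1.148) can be verified numerically for a concrete density. The sketch below approximates the LST of an exponential density by a midpoint-rule integral and checks that φ(0) = 1 and that −φ'(0) gives the mean; the parameter value and the numerical discretization are assumptions of the illustration.

```python
import math

def lst(f, s, upper=40.0, steps=100000):
    """Numerical LST of a density: phi(s) = integral_0^inf e^{-s t} f(t) dt
    (midpoint rule on [0, upper], assuming a negligible tail)."""
    h = upper / steps
    return sum(math.exp(-s * (i + 0.5) * h) * f((i + 0.5) * h) * h
               for i in range(steps))

lam = 0.5
f = lambda t: lam * math.exp(-lam * t)        # exponential density, mean 1/lam

phi0 = lst(f, 0.0)                            # should equal 1, as in (1.146)
eps = 1e-4
deriv = (lst(f, eps) - lst(f, -eps)) / (2 * eps)   # central difference for phi'(0)
print(round(phi0, 4), round(-deriv, 4))       # -phi'(0) should equal E{xi} = 1/lam
```

The numeric values reproduce φ(0) = 1 and E{ξ} = 1/λ, matching the closed form φ(s) = λ/(λ + s) of the exponential LST.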
$$\sum_{1 \le j \le M_i} p_{ij} = 1$$

where M_i is the number of discrete values of the ith resistor. For each unit we can construct the generating function of the distribution of the resistance values:

$$G_i(z) = \sum_{1 \le j \le M_i} p_{ij}\, z^{r_{ij}}$$

For the series connection the system g.f. is the product of the units' g.f.'s,

$$G(z) = \prod_{1 \le i \le n} G_i(z) \qquad (1.149)$$

which, after expansion, takes the form

$$G(z) = \sum_s P_s\, z^{R_s} \qquad (1.150)$$

where the coefficient P_s of the term z^{R_s} equals the probability that the series system's resistance is R_s.
We remark that, in a computational sense, the introduction of the auxiliary
variable z permits us to separate the variables of interest: p and r . (We omit
other useful properties of the g.f. for this discussion because they are
irrelevant here.) To compute P s and R s , one needs only to multiply the p ' s
and to add the r's.
This example is very clear and contains no new information for those who
know how to work with generating functions. Of course, if the problem is to
calculate the resistance of a parallel connection of resistors, it is impossible
to use (1.149) and (1.150) in any direct way. To use the g.f., one has to
consider r.v.'s which measure conductivity (instead of resistance) and then
find the desired result in terms of conductivity. Finally, the result can be
transformed from units of conductivity to units of resistance.
Now suppose it is necessary to analyze the pipeline capacity of a set of
pipes connected in series. In this example the collective capacity is the
minimum of the capacities of the individual units. The usual generating
function does not work here at all! We suggest a new approach which we call
the generalized generating sequence (GGS).
To explain how the GGS works, we use the above example with resistors in series. First, we analyze the computations involved in moving G(z) as expressed by (1.149) to G(z) as expressed by (1.150). For the moment, consider a series system of two resistors labeled A and B. In terms of calculations, we perform the following operations.

1. Represent each resistor as a sequence of pairs, A = {(p_{11}, r_{11}), (p_{12}, r_{12}), ...} and B = {(p_{21}, r_{21}), (p_{22}, r_{22}), ...}, where, for example, the pair (p_{11}, r_{11}) exhibits the probability that the resistance of resistor A will have the value r_{11}.

2. Now introduce a new operator Ω which operates on the pair of sequences A and B and produces a new sequence C of ordered pairs (p_{3k}, r_{3k}). The sequence C represents the probability distribution of the resistance of the series connection of A and B. Thus,

   Ω(A, B) = C

   or, since each term of the sequence C is a pair of numbers, the operation can be described as follows.

   (a) For each pair of terms, one taken from A and one from B, form a new pair whose first component is the product p_{1i} p_{2j} and whose second component is the sum r_{1i} + r_{2j}.

   (b) Order the obtained pairs according to increasing values of their second components.

   (c) When two or more pairs in the newly obtained sequence are tied in their second components, combine all such pairs into a single pair. The first component of the new pair is the sum of all first components of the tied pairs, and the second component of the new pair is the common value of the tied second components.
Note that the operators used in steps (a)-(c) have a very specific meaning in this example. But this meaning can be substituted by others in different situations. For example, for the pipeline consisting of a series connection of units with different capacities, one can write Ω_c(c_1, c_2) = min(c_1, c_2), where c_i is
APPENDIX: AUXILIARY TOOLS 87
the capacity of the ith pipe. All of the remaining formal operations and the
order of their performance are similar. Therefore, the above-described
computational algorithm, in general, can be used with no restrictions on the
polynomial form. The new approach can be used for enumeration problems
involving different physical parameters. We will show the effectiveness of this
operator for computational problems of complex system reliability analysis
and discrete optimization problems.
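The steps (a)-(c) above translate directly into a short program. The sketch below is an illustrative implementation of the Ω operator on sequences of (probability, value) pairs, with addition for series resistors and min for series pipes; all numeric values are assumed examples.

```python
def interact(seq_a, seq_b, combine):
    """Omega operator on two sequences of (probability, value) pairs:
    (a) form all pairwise combinations, multiplying probabilities and
    combining values with `combine`; (b) order by value; (c) merge
    pairs tied in their value, summing probabilities."""
    merged = {}
    for pa, va in seq_a:
        for pb, vb in seq_b:
            v = combine(va, vb)
            merged[v] = merged.get(v, 0.0) + pa * pb
    return [(merged[v], v) for v in sorted(merged)]

# Series resistors: values add, probabilities multiply
A = [(0.7, 10.0), (0.3, 20.0)]
B = [(0.6, 10.0), (0.4, 20.0)]
series = interact(A, B, lambda r1, r2: r1 + r2)
print(series)

# Series pipes: the collective capacity is the minimum of the units'
C = [(0.5, 3.0), (0.5, 5.0)]
D = [(0.8, 4.0), (0.2, 6.0)]
pipes = interact(C, D, min)
print(pipes)
```

Only the `combine` argument changes between the two physical problems, which is exactly the point of the GGS: the bookkeeping over probabilities is identical, while the meaning of the second component is problem specific.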
Now let us describe the procedure in more general terms. Keeping in mind
the use of a computer, we introduce a more formal description.
For a more vivid presentation we will use a special terminology to distin-
guish the GGS from the g.f. This will relieve us of having to use traditional
terms in a new sense, which often leads to confusion. Moreover, we hope that
this new terminology can help us, in a mnemonical sense, to remember and
even to explain the procedure.
In an ancient Roman army, a cohort was the main combat unit. Each
cohort consisted of maniples which were independent and sometimes special-
ized simple combat units. Several cohorts composed a legion. The use of this
essentially military terminology appears to be convenient in this essentially
peaceful applied mathematical field. We set up a one-to-one correspondence
between the above-mentioned military units and the GGS with its attributes.
Consider a system consisting of n units. Each unit j is characterized by its GGS. Let the GGS of a unit be called a legion. Each legion j includes v_j cohorts, and each cohort consists of maniples:

$$C_{jk} = (M_{jk1}, M_{jk2}, \ldots, M_{jks})$$

The legions interact through an operator Ω_L,

$$L = \Omega_L(L_1, L_2, \ldots, L_n)$$

whose concrete form reflects the specific nature of the problem [see, e.g., item (c) of the series resistors example].

As a result of this interaction of the legions, one obtains

$$N = \prod_{1 \le j \le n} v_j$$

new cohorts. For each new cohort the following notation is used, where Ω_C denotes the cohorts' interaction and k is the subscript of this cohort in the set obtained as a result of the procedure:

$$C_k = \Omega_C(C_{1 i_1}, C_{2 i_2}, \ldots, C_{n i_n})$$

$$C_k = (M_{k1}, M_{k2}, \ldots, M_{ks})$$

where each maniple M_{kl} is obtained by the interaction of the corresponding maniples of the cohorts involved.
EXERCISES
1.1 Prove the equivalency of expressions (1.29) and (1.30), that is, prove
that
1.2 Prove that a is the mean of a normal distribution with density function

$$f_N(x \mid a, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-a)^2/2\sigma^2}$$
1.3 Prove that σ² is the variance of a normal distribution with density function

$$f_N(x \mid a, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-a)^2/2\sigma^2}$$
1.4 Using the m.g.f. for the normal distribution, find the expression for the
first moment (the mean).
1.5 Using the m.g.f. for the normal distribution, find the expression for the
variance.
1.6 One observes two Bernoulli sequences with n_1 and n_2 trials, respectively. A successful trial appears in the first sequence with probability p_1 and in the second sequence with probability p_2.

(a) Find the probability, R_k(n_1, n_2), that there will be k successes in the entire n = n_1 + n_2 trials.

(b) Show that for p_1 = p_2 = p the probability of interest equals the binomial probability with parameters n and p.
1.10 There are two variants of equipment: one performs its operation during time t and another performs the same operation during time 2t. Both units have an exponentially distributed time to failure. The systems under consideration have different reliability: the first one has a failure rate equal to 2λ, and the second one has a failure rate equal to λ. Which variant of equipment will perform its task with the larger probability?
1.11 A production line manufactures good quality items with probability
0.9. Find the probability that in a sample of size n = 500 the number
of failed items does not exceed 80.
1.12 The average portion of deficient items equals 0.01. Find the probability that in a sample of size n = 100 the number of failed items does not exceed 2.
1.13 A flow of equipment failures is formed by superposition of the failure flows of different types of units. Each type of unit produces a failure flow which can be described as a Poisson process. During a given period of time, the average number of failures of some specified type of unit equals 36. How many spare units should be supplied for this period of time to support failure-free operation of this type of unit with probability 0.95?
1.14 Prove that the LST of the convolution of a pair of functions is the
product of the LSTs of the transforms of the initial functions in
convolution.
1.15 Prove that the LST of the derivative of a function can be expressed as

$$f'(t) \leftrightarrow s\varphi(s) - f(0)$$

1.16 Prove that the LST of the integral of a function can be expressed as

$$\int_0^t f(x)\, dx \leftrightarrow \frac{\varphi(s)}{s}$$
SOLUTIONS
1.1 For a binomial coefficient one can write the well-known expression

$$\binom{n}{x} = \frac{1 \cdot 2 \cdots n}{(1 \cdot 2 \cdots x)\,[1 \cdot 2 \cdots (n-x)]}$$

so that

$$\binom{n}{m} = \frac{(n-m+1)(n-m+2) \cdots (n-1)\, n}{1 \cdot 2 \cdots m}$$

and

$$\binom{-n}{m}(-1)^m = (-1)^m\, \frac{(-n-m+1) \cdots (-n-1)(-n)}{1 \cdot 2 \cdots m} = \frac{(n+m-1)(n+m-2) \cdots (n+1)\, n}{1 \cdot 2 \cdots m} = \binom{n+m-1}{m}$$
1.2 Because

$$y = \frac{x - a}{\sigma\sqrt{2}}$$

we have

$$\frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} (\sigma\sqrt{2}\, y + a)\, e^{-y^2}\, dy = \sigma\frac{\sqrt{2}}{\sqrt{\pi}} \int_{-\infty}^{\infty} y\, e^{-y^2}\, dy + \frac{a}{\sqrt{\pi}} \int_{-\infty}^{\infty} e^{-y^2}\, dy$$

The first term of the latter sum equals 0 because the function under the integral is symmetrical with respect to y = 0. The second term is evaluated with the well-known Euler-Poisson integral

$$\int_{-\infty}^{\infty} e^{-y^2}\, dy = 2\int_0^{\infty} e^{-y^2}\, dy = \sqrt{\pi}$$

so the mean equals a.

1.3 With the same substitution, the variance reduces to

$$\frac{2\sigma^2}{\sqrt{\pi}} \int_{-\infty}^{\infty} y^2\, e^{-y^2}\, dy = \sigma^2$$
1.4 Consider (1.54). The first derivative of the m.g.f. with the substitution s = 0 gives the mean:

$$\frac{d}{ds} \exp\Bigl(as + \frac{\sigma^2 s^2}{2}\Bigr)\bigg|_{s=0} = (a + \sigma^2 s)\, \exp\Bigl(as + \frac{\sigma^2 s^2}{2}\Bigr)\bigg|_{s=0} = a$$

1.5 Differentiating once more, one obtains

$$\frac{d^2}{ds^2} \exp\Bigl(as + \frac{\sigma^2 s^2}{2}\Bigr) = \bigl[\sigma^2 + (a + \sigma^2 s)^2\bigr] \exp\Bigl(as + \frac{\sigma^2 s^2}{2}\Bigr)$$

This gives the second initial moment, which is equal to the sum of the variance σ² and the mean squared a². Substituting s = 0 gives the desired result.
1.6 Denote by b(k, n) the probability that there will be k successes in n trials. Then

(a)

$$R_k(n_1, n_2) = \sum_{0 \le i \le k} b(i, n_1)\, b(k - i, n_2)$$

(b) For p_1 = p_2 = p each term contains the common factor p^k q^{n-k}, and, finally,

$$R_k(n_1, n_2) = \binom{n}{k} p^k q^{n-k}$$

by virtue of the identity

$$\sum_{0 \le i \le k} \binom{n_1}{i}\binom{n_2}{k - i} = \binom{n_1 + n_2}{k}$$
1.14 For the convolution

$$f_3(t) = f_1 * f_2(t) = \int_0^t f_1(t - x)\, f_2(x)\, dx$$

the LST equals

$$\varphi_3(s) = \int_0^\infty e^{-st}\, dt \int_0^t e^{-sx} f_1(x)\, e^{-s(t-x)} f_2(t - x)\, dx$$

Substituting y = t − x, we obtain

$$\varphi_3(s) = \int_0^\infty e^{-sx} f_1(x)\, dx \int_0^\infty e^{-sy} f_2(y)\, dy = \varphi_1(s)\,\varphi_2(s)$$

1.16 Integrating by parts,

$$\int_0^\infty \Bigl[\int_0^t f(x)\, dx\Bigr] e^{-st}\, dt = -\frac{1}{s}\Bigl[\int_0^t f(x)\, dx\Bigr] e^{-st}\bigg|_0^\infty + \frac{1}{s}\int_0^\infty f(t)\, e^{-st}\, dt = \frac{\varphi(s)}{s}$$
RELIABILITY INDEXES

The simplest statistical estimate of the mean time to failure is

$$T = \frac{1}{N}(t_1 + t_2 + \cdots + t_N) \qquad (2.1)$$

This reliability index means that the system, on average, works T time units before a failure.
Consider these values in increasing order, that is, present the observations as

$$t_{(1)} \le t_{(2)} \le \cdots \le t_{(N)}$$

In terms of the underlying distribution, the MTTF can be written in two equivalent forms:

$$T = \int_0^\infty t\, dF(t) \qquad (2.2)$$

$$T = \int_0^\infty P(t)\, dt \qquad (2.3)$$

where P(t) = 1 − F(t). The equivalency of (2.2) and (2.3) follows from the fact that we are only using different means of integrating the same function (see Figure 2.1). On a heuristic level this result may be explained by comparing this case with the analogous discrete case depicted by Figure 2.1.
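The equivalency of (2.2) and (2.3) is easy to confirm numerically for a concrete distribution. The sketch below evaluates both integrals for a Weibull-Gnedenko example (the parameter values are assumed) and compares them with the known closed-form mean.

```python
import math

# Weibull-Gnedenko example: F(t) = 1 - exp(-(t/b)**a)
a, b = 2.0, 3.0
P = lambda t: math.exp(-(t / b) ** a)                       # survival function
f = lambda t: (a / b) * (t / b) ** (a - 1) * math.exp(-(t / b) ** a)  # density

h, upper = 0.001, 60.0
n = int(upper / h)
# (2.3): integrate the survival function P(t)
mttf_surv = sum(P((i + 0.5) * h) * h for i in range(n))
# (2.2): integrate t * f(t)
mttf_dens = sum((i + 0.5) * h * f((i + 0.5) * h) * h for i in range(n))
exact = b * math.gamma(1 + 1 / a)                           # known Weibull mean
print(round(mttf_surv, 4), round(mttf_dens, 4), round(exact, 4))
```

Both quadratures agree with each other and with the closed-form mean, which is the "different means of integrating the same function" argument in numbers.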
The reliability index MTTF is very convenient if the system's outcome
linearly depends on the time of its successful performance. For example,
consider a form for producing some plastic item. After a failure the form is
replaced by a new one. Thus, this form can be considered as a socket in the
above-mentioned sense. In this case the effectiveness of using the form
depends only on the average time to failure. But in other cases this reliability
index may prove to be inconvenient.
In the simplest case, introduce an indicator d: d = 1 if the system is in an operating state, and d = 0 otherwise. In other words, we define a system failure in a new form: the system fails when d = 0.
$$P(t_0) = 1 - F(t_0)$$
$$P = \int_0^\infty p(t)\, dH(t) \qquad (2.4)$$

$$\lambda(t) = \frac{f(t)}{P(t)} \qquad (2.5)$$

$$\Pr\{\text{failure in } dt \mid \text{no failure before } t\} = \lambda(t)\, dt$$
We will analyze this equation "on a verbal level," using only simple explanations. For t = 0 we have

$$\lambda(0) = p\, a_1 + (1 - p)\, a_2$$

that is, λ(0) is a weighted hazard rate at this moment. Then note that the function λ(t) is a monotonically decreasing function. If a_1 > a_2, then lim_{t→∞} λ(t) = a_2.
From (2.5) it follows that
dF{t) dP(
dt t) d[InP(0]
A(r) =
P(t) dt dt
no
- f'\(x)
P ( t ) = exp dx (2.7)
Ji i
From the condition 0 < P(t) < 1 for any finite t, it follows that

$$\lim_{t \to \infty} \int_0^t \lambda(x)\, dx = \infty$$
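Relation (2.7) can be checked numerically: integrating an assumed hazard function and exponentiating should reproduce the survival function obtained directly. The sketch below uses a Weibull hazard as an illustrative example.

```python
import math

# Weibull hazard: lambda(t) = (a/b) * (t/b)**(a-1); illustrative values
a, b = 1.7, 2.0
haz = lambda t: (a / b) * (t / b) ** (a - 1)

def survival_from_hazard(t, steps=20000):
    """P(t) = exp(-integral_0^t lambda(x) dx), equation (2.7),
    with the integral computed by the midpoint rule."""
    h = t / steps
    integral = sum(haz((i + 0.5) * h) * h for i in range(steps))
    return math.exp(-integral)

for t in (0.5, 1.0, 3.0):
    direct = math.exp(-(t / b) ** a)          # closed-form Weibull survival
    print(t, round(survival_from_hazard(t), 5), round(direct, 5))
```

The hazard-based reconstruction matches the direct survival function, confirming that the hazard uniquely determines P(t).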
time. Such a process is illustrated in Figure 2.3. Let ξ_k denote the random time from the completion of the (k − 1)th repair to the kth failure, and let η_k denote the duration of the kth repair (renewal).

In the simplest case with a socket (a one-unit system), we suppose that a repair is equivalent to the replacement of a failed unit. This corresponds to a complete renewal. In this case we consider an alternating stochastic process with i.i.d. r.v.'s ξ and η having distributions F(t) and G(t), respectively. We denote this alternating stochastic process by {ξ_k, η_k}.
Of course, the corresponding process for a system consisting of several
renewal units may be much more complicated. Almost all the following
explanations will be—for simplicity—presented for a renewal unit, or a
socket.
All indexes used for an unrepairable system can also be used in this case for the appropriate purpose. But for repairable units and systems we have to consider several special indexes. They are more complicated and need more explanation.
Figure 2.4. Example of the oscillating behavior of K(t) in time.
d.f. but with the standard deviation larger by √2 times. Thus, the zone between 2T and 2(T + η), where K(t) = 0, will be smaller. Finally, for t ≫ T + η, K(t) will be almost constant.

If both d.f.'s F(t) and G(t) are exponential, the function K(t) is strictly decreasing with exponential speed. We will consider this case later.
For large t the initial state has practically no influence on the behavior of K(t). In this case the probability that the system is in an operating state equals the proportion of the total up time in the total observation time. Later we will show that for this case

$$K = \frac{\mathrm{E}\{\xi\}}{\mathrm{E}\{\xi\} + \mathrm{E}\{\eta\}} \qquad (2.10)$$

One also uses the mean availability over an interval,

$$K^*(t) = \frac{1}{t}\int_0^t K(x)\, dx$$

If t → ∞, both K(t) and K*(t) have the same limit, namely, K defined in (2.10).
$$R(t, t_0) = K(t)\, P(t_0 \mid t)$$

We will consider this index later in more detail, but now we note that P(t_0 | t) = P(t_0) only when F(t) is exponential.
If the system is not a socket with renewal unit, the situation is more
complicated. In Chapter 7 we will illustrate this statement on a duplicated
system consisting of two identical renewal units.
Figure 2.5. Examples of two time diagrams for a two-unit system: (a) a system of independent units; (b) a system working until both units have failed.
$$T_{[3]} = \min\bigl(\xi_1^{(1)} + \xi_2^{(1)},\; \xi_1^{(2)} + \xi_2^{(2)}\bigr) \qquad (2.11)$$

and

$$T_{[4]} = \max\bigl(\xi_1^{(1)} + \xi_2^{(1)},\; \xi_1^{(2)} + \xi_2^{(2)}\bigr) \qquad (2.12)$$

[Notice that, strictly speaking, (2.11) and (2.12) should be expressed in a more complicated way. We must take into account the mixture of the failure flows of the two sockets. This is explained by the appearance of extremely small and extremely large r.v.'s.]

Thus, if T_{[1]} > T_{[2]}, then T_{[2]} < T_{[3]} and T_{[4]} < T_{[3]}. At the same time, T_{[1]} > T_{[3]}, and T_{[2]} < T_{[4]}. The process continues in the same manner for larger numbers of interfailure intervals.
The empirical mean time between failures based on n observed intervals is

$$\bar{T} = \frac{1}{n} \sum_{1 \le i \le n} T^{(i)}$$

Figure 2.7. Example of the change in the mean time between failures depending on the current number of the failure for which this value is evaluated.
processes, this value is called the mean residual time. In general, this index
differs from the MTTF and any version of the MTBF.
If t -» oo, for a recurrent process this index differs from both of those
mentioned above. The exception is a Poisson process for which all three
indexes coincide.
Let the computer's task be segmented into phases and assume that the
computer works in a restarting regime. After the completion of each phase,
all intermediate results are put into the computer's memory. Each short
failure destroys only the very last phase of the overall task. After the error
has been found, the calculations for this particular phase are repeated. We
do not give the formal definition of the corresponding reliability index, but
it is understandable that this index may be defined with the help
of (2.13). We will consider this case in the special section dedicated to time
redundancy.
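The gain from such a restarting regime is easy to see in a small simulation. The following sketch uses hypothetical parameters (phase length, failure rate) and the standard result that, for exponential failures, the expected time to complete one phase of length τ with restarts is (e^{λτ} - 1)/λ; neither the parameters nor that cross-check formula come from the book's text.

```python
import math
import random

def time_to_finish(n_phases, phase_len, failure_rate, rng):
    """Simulate a checkpointed task: a failure destroys only the work
    of the current phase, which is then repeated from its checkpoint."""
    total = 0.0
    for _ in range(n_phases):
        while True:
            t_fail = rng.expovariate(failure_rate)
            if t_fail >= phase_len:   # the phase completes before a failure
                total += phase_len
                break
            total += t_fail           # this work is lost; repeat the phase
    return total

rng = random.Random(1)
runs = [time_to_finish(10, 1.0, 0.2, rng) for _ in range(20000)]
mean_sim = sum(runs) / len(runs)
# For exponential failures, E[time per phase] = (exp(lam*tau) - 1) / lam
mean_theory = 10 * (math.exp(0.2 * 1.0) - 1) / 0.2
```

The simulated mean should agree closely with the formula; without checkpoints the whole task would have to restart after every failure, which grows exponentially with the total task length rather than with a single phase length.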
The same computer used for supporting a long and noninterrupted techno-
logical process can naturally be characterized by the probability of a failure-
free operation (PFFO).
If the computer is used for an automated landing system in an airport, and
the duration of each operation is negligibly small in comparison with the
computer's mean time to failure (MTTF), the reliability index should reflect
the number of successfully served airplanes. In this case the availability
coefficient is also the most appropriate reliability index.
If the same computer is unrepairable (for instance, its task consists of
collecting and processing information in a spy satellite), the best characterization
of it is the MTTF.
In all of these cases the reliability index corresponds to the system's
purpose and to the nature of its use.
Now consider an example when a reliability index is used to characterize
"inner" technical abilities. Consider two identical computers connected in
parallel. The natural reliability index for this duplicated system is the PFFO.
In this case each computer is only a subsystem taking part in the performance
of a system's operation. What should we know about each separate computer
to characterize this system? To compute the complex PFFO, one needs to
know the probability distributions of both the time to failure and the repair
time of the computer, as well as the parameters of these distributions. The
distribution itself is not an index; it is a function. But parameters of the
distribution can be considered as technical reliability indexes of the com-
puter. These parameters have no relation to a system's operation, they only
reflect an ability to work in general.
Note that the type of reliability index chosen does not depend on the
responsibility of the performed operation. The responsibility of the system's
operation defines the level of requirements but is not part of the nomencla-
ture of reliability indexes.
When choosing reliability indexes we should take into account the follow-
ing simple recommendations based on common sense:
2.4.2 Required Reliability Level
1 00 RELIABILITY INDEXES
The problem of choosing the level of reliability is a very complcx one. In
practice, this problem is usually solved on the basis of engineering experi-
ence. For the purposes of determining reliability requirements, equipment
may be divided into three groups by its "level": systems, subsystems, and
units (components).
A system is considered as an object with its own goals of performance. A
system performance is evaluated with the help of operational reliability
indexes which are a measure of its success.
A subsystem is a more or less independent part of the system. It is
considered to be an assembly of objects within a system. Each subsystem
performs functions that are necessary for the operation of the system as a
whole. The system's subsystems can be characterized by operational indexes
if their functions can be measured with independent indexes or by technical
indexes if these indices are used to express the system's performance effec-
tiveness index.
A unit, or a component, is the smallest indivisible part of an object. The
term unit is sometimes also used as a generic term for one physically
separate item. In general, the term component is usually used for the
smallest technological part of an object: electronic components, mechanical
details, and so on.
The only problem which can be formulated as a mathematical problem is
the assignment of reliability requirements among subsystems (parts of the
system) when the requested level of reliability is known for the system as a
whole. In this case the problem is reduced to the problem of the optimal
allocation of some resources used for the improvement of reliability. The
technical aspects of the problem will be considered in Chapter 10. Here we
explain the nature of the problem.
Consider a system consisting of N independent subsystems. Assume that
the probability of successful operation of the system as a whole must not be
less than R₀. Each subsystem can be designed with different levels of
reliability. Such levels depend on the expenditure of some kind of resource,
for example, money. Suppose that we know all functions P_k(c) which reflect
how the reliability index of the kth subsystem increases as a function of the
expenditure of the resource c.
If the system's reliability index can be represented as

P(C₀) = ∏_{1≤k≤N} P_k(C_k)

and the value of C₀ is specified, then the problem is to find the optimal
allocation of the total resource C₀ in such a way that the resulting system
index is maximal, that is, to find C* = {C₁, C₂, ..., C_N} with Σ C_k = C₀
that maximizes P(C*).
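A simple way to approach this allocation numerically is greedy marginal allocation: repeatedly give one small budget increment to the subsystem whose log-reliability improves the most. The sketch below is an illustration only; the saturating curves P_k(c) = 1 - exp(-a_k(c + 0.1)) and the increment rule are hypothetical, not the book's method (the real techniques are treated in Chapter 10).

```python
import math

def allocate(budget, steps, p_funcs):
    """Greedy marginal allocation: at each step, spend one increment on
    the subsystem with the largest gain in log P_k (optimal when the
    log-reliability curves are concave in the spend)."""
    n = len(p_funcs)
    alloc = [0.0] * n
    inc = budget / steps
    for _ in range(steps):
        gains = [math.log(p_funcs[k](alloc[k] + inc))
                 - math.log(p_funcs[k](alloc[k])) for k in range(n)]
        k_best = max(range(n), key=lambda k: gains[k])
        alloc[k_best] += inc
    return alloc

# Hypothetical cost-reliability curves, saturating with spend
a = [1.0, 0.5, 2.0]
funcs = [lambda c, ak=ak: 1 - math.exp(-ak * (c + 0.1)) for ak in a]
alloc = allocate(3.0, 300, funcs)
p_sys = math.prod(f(c) for f, c in zip(funcs, alloc))
```

Because the product of subsystem reliabilities becomes a sum of concave terms after taking logarithms, this incremental greedy search lands essentially on the optimal split and does at least as well as dividing the budget equally.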
• The chosen indexes must allow one to analyze them with the help of
analytic or computer models at the stage of system design.
• The total number of reliability indexes chosen for a system's characteri-
zation should be as small as possible.
• The chosen indexes should be easily interpreted.
CHOICE OF INDEXES AND THEIR QUANTITATIVE NORM 103
System Level Consider two principal cases. One of them consists in the
use of practical experience and engineering intuition. Mostly it is based on an
analysis of prototypes of the new technical system to be investigated. This
method needs no special comments.
Practically, the only time a system's reliability requirement appears is if:

• The system's outcome can be measured in cost units, that is, in the same
units as the costs of the system's design, production, and maintenance.
• The system's structure and operational modes are well known in
advance.
• Necessary statistical data for all components are determined with a
satisfactory degree of confidence.
or, equivalently,

dE_k(R)/dR = dC_k(R)/dR

Each optimum can be evaluated and then the variant k with the highest
value of E_k(R_k^opt) is selected. Unfortunately, such an ideal situation appears
extremely rarely in engineering practice.
w_i = [ Σ_{1≤j≤M} λ_j n_ij ] / [ Σ_{1≤i≤n} Σ_{1≤j≤M} λ_j n_ij ]

where M is the number of types of units and n_ij is the number of units of the
jth type in subsystem i.
CONCLUSION
This chapter does not need any special comments. In one form or another,
reliability parameters are discussed in any book on reliability engineering or
theory. As examples, we refer the reader to the extensive list of general
references at the end of this book.
The nomenclature of reliability indexes in a systematic and structured form
can be found in Kozlov and Ushakov (1970) and Ushakov (1985, 1994). The
methodological problems of choosing indexes and quantitative requirements
in reliability engineering are discussed in Gnedenko, Kozlov, and Ushakov
(1969).
REFERENCES
Gnedenko, B. V., B. A. Kozlov, and I. A. Ushakov (1969). The role and place of
reliability theory in engineering. In Theory of Reliability and Queuing Systems (in
Russian). Moscow: Nauka.
Kozlov, B. A., and I. A. Ushakov (1970). Reliability Handbook. New York: Holt,
Rinehart, and Winston.
Kozlov, B. A., and I. A. Ushakov (1975). Handbook on Reliability of Radio and
Automation Systems (in Russian). Moscow: Soviet Radio.
Ushakov, I. A., ed. (1985). Reliability of Technical Systems: Handbook (in Russian).
Moscow: Radio i Sviaz.
Ushakov, I. A., ed. (1994). Handbook of Reliability Engineering. New York: Wiley.
EXERCISES
2.1 Prove that the mean value of a nonnegative r.v. ξ with distribution F(t)
can be expressed in the form of (2.4), which is equivalent to (2.3).
2.2 A system has an exponentially distributed TTF with parameter λ. The
operation to be performed also has a random duration with the normal
density

f_N(x | a, σ) = (1/(σ√(2π))) exp[-(x - a)²/(2σ²)]

with mean equal to a and variance equal to σ². We also assume
that λσ ≪ 1. Find the probability that the system successfully performs
the operation.
2.3 Build the graph of the failure rate for the mixture of two exponential
distributions (2.6) with the following parameters, respectively:
(a) λ₁ = 1 [1/hour], λ₂ = 1 [1/hour];
(b) λ₁ = 0.5 [1/hour], λ₂ = 1 [1/hour];
(c) λ₁ = 2 [1/hour], λ₂ = 1 [1/hour].
SOLUTIONS
2.1 Using P(t) = 1 - F(t), one writes

T = -∫₀^∞ t dP(t) = [-tP(t)]₀^∞ + ∫₀^∞ P(t) dt = ∫₀^∞ [1 - F(t)] dt
2.2
(a) Using (2.4), one writes
(b) First of all, consider the given conditions. Almost all "probabilistic
mass" is concentrated in a relatively very compact area related to
the MTTF of the system. This means that in this area the exponential
function can be successfully approximated by a series with at
most two terms:

e^{-λx} ≈ 1 - λx + (λx)²/2

Thus, one has
2.3 In the first case there is no mixture at all; the second and third cases
differ only by the scale. In general, one can write
f(t) = p₁λ₁e^{-λ₁t} + p₂λ₂e^{-λ₂t}

and

λ(t) = [p₁λ₁e^{-λ₁t} + p₂λ₂e^{-λ₂t}] / [p₁e^{-λ₁t} + p₂e^{-λ₂t}]
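These formulas are straightforward to evaluate; the short check below uses case (b) of Exercise 2.3 with equal weights p₁ = p₂ = 0.5 (the equal weights are an assumption for illustration). It shows the typical behavior of such mixtures: the failure rate starts at the mean rate p₁λ₁ + p₂λ₂ and decreases toward min(λ₁, λ₂).

```python
import math

def mixture_rate(t, p1, l1, p2, l2):
    """Failure rate lambda(t) = f(t)/P(t) of a two-exponential mixture."""
    f = p1 * l1 * math.exp(-l1 * t) + p2 * l2 * math.exp(-l2 * t)
    P = p1 * math.exp(-l1 * t) + p2 * math.exp(-l2 * t)
    return f / P

# Case (b): lambda1 = 0.5, lambda2 = 1 [1/hour], p1 = p2 = 0.5
rates = [mixture_rate(t, 0.5, 0.5, 0.5, 1.0) for t in (0.0, 1.0, 5.0, 20.0)]
# rates[0] = 0.75 (the mean rate); the values then decrease toward 0.5
```

This decreasing failure rate is exactly what the graph of Exercise 2.3 displays: a mixture of exponentials is always DFR, because the surviving population is progressively dominated by the more reliable component.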
UNREPAIRABLE SYSTEMS
In this chapter we will consider the main types of unrepairable systems. The
only type that we will not address is a general network structure, which will
be considered in a later chapter.
We restrict ourselves to the case where a system has only two possible states: up
and down.
From the definition of (3.2) it is clear that the x_i's are Boolean variables
and f(x₁, x₂, ..., x_n) is a Boolean function. We also denote this function by
f(X), where X = (x₁, x₂, ..., x_n). System reliability structures are often
displayed as a two-pole network. One of the simplest examples of such a
network is presented in Figure 3.1. The connectedness of this two-pole
network is equivalent to the equality of the Boolean function (3.2) to 1.
Each unit i may be in state x_i = 1 or x_i = 0 at random. If each Boolean
variable x_i is considered as a Bernoulli r.v., then E{x_i} is interpreted as the
probability that unit i is in an up state, and E{f(X)} is defined as the
probability of the system's successful operation:

p_i = E{x_i} and P_syst = E{f(X)}
We consider only monotone functions f(X) for which f(X) ≥ f(X′) if
X > X′. Here the inequality X > X′ means that x_i ≥ x_i′ for all i and there is
a strict inequality for at least one i. This assumption is very natural. Indeed, a
unit failure generally will not improve a system's operational state. There-
fore, if a system is down in state X, it cannot be up in state X' with some
additionally failed units. (Of course, it is correct under the assumption that
the system was correctly designed.) We emphasize that it relates only to
systems whose operation can be described in terms of Boolean functions.
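For a small system, the probability E{f(X)} can be computed directly by enumerating all Bernoulli states, as in the following sketch (the example structure, one unit in series with a parallel pair, and the unit probabilities are hypothetical):

```python
from itertools import product

def system_reliability(struct, p):
    """E{f(X)} by total enumeration: sum f(X) * Pr{X} over all 2^n
    Boolean state vectors X of n independent Bernoulli units."""
    rel = 0.0
    for x in product((0, 1), repeat=len(p)):
        pr = 1.0
        for xi, pi in zip(x, p):
            pr *= pi if xi else (1 - pi)
        rel += struct(x) * pr
    return rel

# Structure: unit 1 in series with the parallel pair (2, 3)
f = lambda x: x[0] * (1 - (1 - x[1]) * (1 - x[2]))
p = (0.9, 0.8, 0.7)
P_syst = system_reliability(f, p)
# Closed form for comparison: p1 * (1 - q2 * q3) = 0.9 * 0.94
```

Enumeration costs 2ⁿ evaluations, so it is only a didactic tool; the structural formulas derived in this chapter exist precisely to avoid it.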
where the symbol ∏ denotes the Boolean product (conjunction). The same
expression can be written in an equivalent form

α(X) = min_{1≤i≤n} x_i

For independent units,

E{ ∏_{1≤i≤n} x_i } = ∏_{1≤i≤n} Pr{x_i = 1} = ∏_{1≤i≤n} p_i   (3.6)

and, in terms of the time-dependent probabilities,

P(t) = ∏_{1≤i≤n} p_i(t)   (3.8)
Obviously, (3.5) and (3.8) can be written in a direct way from the verbal
definition of a series system's successful operation:
In a more general case, direct calculations must be used for obtaining the
function P(t) for the system. But in one important particular case, when
each p_i(t) is an exponential function, we can write the very simple expression
P(t) = e^{-λt} with

λ = Σ_{1≤i≤n} λ_i
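In code the equivalence between the product of exponential unit PFFOs and a single exponential with the summed rate is immediate (the rates and mission time below are hypothetical):

```python
import math

lams = [1e-4, 5e-4, 2.5e-4]   # hypothetical unit failure rates [1/hour]
t = 100.0                     # mission time [hours]

# Product of the units' exponential PFFOs ...
P_product = math.prod(math.exp(-l * t) for l in lams)
# ... equals a single exponential with the summed rate
P_sum = math.exp(-sum(lams) * t)
```

This is why a series system of exponential units is itself "exponential": failure rates simply add.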
Assume that each unit's unreliability is small:

max_{1≤i≤n} q_i(t) ≤ ε ≪ 1/n

[It is clear that the required smallness of ε for each particular case depends on the
number of units.] Then, for a system with units having arbitrary distributions
of time to failure,

P(t) ≈ 1 - Σ_{1≤i≤n} q_i(t)

The error of the calculation in this case will not exceed the value
(1/2)[Σ_{1≤i≤n} q_i(t)]².
Note that, at t = 0,

λ(0) = f(0)/[1 - F(0)] = f(0)
If the units are different but some of them have distribution functions F_i(t),
i ∈ a, with nonzero first derivatives at t = 0 equal to λ_i(0), then for small t₀
the system behaves approximately like an exponential one.
Of course, we assumed that |a| ≫ 1; that is, the number of distributions with
a nonzero derivative is large. Therefore, we see one more example of an
exponential distribution in reliability theory.
SERIES SYSTEMS 115
If the distribution of a unit's TTF is such that dⁱF(0)/dtⁱ = 0 for i < k and
dᵏF(0)/dtᵏ = a, then for small t

F(t) ≈ a tᵏ / k!   (3.14)

If each unit's TTF is nonrandom, that is,

p_i(t) = 1 if t ≤ T_i, and 0 otherwise   (3.15)

then P(t) coincides with the p_i(t) of the worst unit, that is,

P_syst(t) = 1 if t ≤ min_{1≤i≤n} T_i, and 0 otherwise   (3.16)
Of course, such a distribution does not exist in real life. (Mathematics
always deals with ideal objects!) But normally distributed r.v.'s with very
small coefficients of variation can be considered "almost constant" or
"almost nonrandom."
Now consider the MTTF of a series system. For any series system the
random TTF, say Y, can be expressed through the random TTFs of its units
(y_i) in the following way:

Y = min_{1≤i≤n} y_i   (3.17)

For the degenerate distributions above,

T_syst = min_{1≤i≤n} T_i   (3.20)

that is, the MTTF of the system equals the MTTF of the worst unit.
f(X) = f(x₁, x₂, ..., x_m) = ∪_{1≤i≤m} x_i   (3.21)

¬x ∨ ¬y = ¬(x ∧ y)   (3.22a)
¬x ∧ ¬y = ¬(x ∨ y)   (3.22b)
¬(x ∨ y) = ¬x ∧ ¬y   (3.22c)
¬(x ∧ y) = ¬x ∨ ¬y   (3.22d)
All of these equivalences express the same property but in slightly different
forms. The most important one for us is (3.22a). If one considers a series
system of two units x and y, and "1" means an up state, then x ∨ y = 1
means unit x and/or unit y is in an up state. At the same time, ¬x ∧ ¬y = 1
means unit x and unit y are both in a down state; that is, the system is in a
down state. It is clear that these two events
are complementary. To prove (3.22), one may use a Venn diagram. This
diagram graphically depicts random events, their sum and intersection,
complementary events, and so on. A simple case with two events A and B is
presented in Figure 3.4. The proof of (3.22c) can be found in Figure 3.5.
Figure 3.4. Samples of the main Venn diagrams.
¬( ∪_{1≤i≤n} x_i ) = ∩_{1≤i≤n} ¬x_i   (3.23a)

¬( ∩_{1≤i≤n} x_i ) = ∪_{1≤i≤n} ¬x_i   (3.23b)

∪_{1≤i≤n} ¬x_i = ¬( ∩_{1≤i≤n} x_i )   (3.23c)

∩_{1≤i≤n} ¬x_i = ¬( ∪_{1≤i≤n} x_i )   (3.23d)
These latter statements can be proved by induction and we leave their proofs
to the exercises.
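Although the inductive proofs are left to the exercises, the n-ary identities are easy to confirm exhaustively for a small n, as in this brute-force sketch:

```python
from itertools import product

def union(xs):          # Boolean sum: 1 if at least one x_i = 1
    return 1 if any(xs) else 0

def intersection(xs):   # Boolean product: 1 if all x_i = 1
    return 1 if all(xs) else 0

# Check both n-ary De Morgan identities over every state vector, n = 4
ok = all(
    1 - union(x) == intersection([1 - xi for xi in x]) and
    1 - intersection(x) == union([1 - xi for xi in x])
    for x in product((0, 1), repeat=4)
)
```

Such an exhaustive check is not a proof for general n, but it is a convenient safeguard when manipulating structure functions in code.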
Another (almost purely verbal) explanation of (3.23) follows from the
definition of a parallel system's failure which was given at the very beginning
of this section:
Pr{z = 1} + Pr{¬z = 1} = 1

Consequently,

P_syst = p₁ + q₁p₂ + q₁q₂p₃ + ⋯ + q₁q₂⋯q_{m-1}p_m
       = p₁ + q₁(p₂ + q₂(p₃ + ⋯))
P_syst(t) = 1 - ∏_{1≤i≤m} q_i(t)   (3.25)

P_syst(t) = 1 for t ≤ max_{1≤i≤m} T_i, and 0 otherwise   (3.26)
Now consider a parallel system's MTTF. For this system the random TTF
(ξ_syst) is expressed through the random TTFs of its units (ξ_i) as

ξ_syst = max_{1≤i≤m} ξ_i

For exponentially distributed units,

P_syst(t) = 1 - ∏_{1≤i≤m} (1 - e^{-λ_i t})
          = Σ_{1≤i≤m} e^{-λ_i t} - Σ_{1≤i<j≤m} e^{-(λ_i+λ_j)t}
            + Σ_{1≤i<j<k≤m} e^{-(λ_i+λ_j+λ_k)t} - ⋯

and, consequently,

T_syst = Σ_{1≤i≤m} 1/λ_i - Σ_{1≤i<j≤m} 1/(λ_i + λ_j)
         + Σ_{1≤i<j<k≤m} 1/(λ_i + λ_j + λ_k) - ⋯
         + (-1)^{m+1} 1/(λ₁ + λ₂ + ⋯ + λ_m)   (3.29)
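Equation (3.29) is easy to implement by summing over the nonempty subsets of units, and it can be verified against a direct numerical integration of P_syst(t); the rates used below are hypothetical:

```python
import math
from itertools import combinations

def parallel_mttf(lams):
    """MTTF of a parallel system of independent exponential units via
    inclusion-exclusion (eq. 3.29): alternating sum over subsets."""
    total = 0.0
    for r in range(1, len(lams) + 1):
        sign = (-1) ** (r + 1)
        for subset in combinations(lams, r):
            total += sign / sum(subset)
    return total

lams = (1.0, 2.0, 4.0)
T = parallel_mttf(lams)

# Cross-check: T = integral over t of P(t) = 1 - prod(1 - exp(-l*t))
dt, T_num = 1e-3, 0.0
for i in range(50000):            # midpoint rule on [0, 50]
    t = (i + 0.5) * dt
    T_num += (1.0 - math.prod(1.0 - math.exp(-l * t) for l in lams)) * dt
```

The subset sum and the numeric integral agree to within the quadrature error, which is a useful sanity check before using (3.29) inside a larger model.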
operates successfully if at least k out of its total n units are operating. The
structure function of such a system is

φ(X) = 1 if Σ_{1≤i≤n} x_i ≥ k, and 0 otherwise
The task of finding the MTTF of such a system for arbitrary unit failure
distributions is not simple. One may use any numerical method. The only
case where it is possible to obtain an analytical result is the case of the
exponential distribution p(t) = exp(-A/).
We will not integrate (3.32), but we will use the method described above.
The system starts with n operating units and operates, on the average, 1/(nλ)
units of time until the first failure. After the first failure, there are n - 1
operating units in the system. They work until the next failure, on the
average, 1/((n - 1)λ) units of time, and so on until the (n - k + 1)th failure has
occurred. Thus, the system's MTTF equals

T_syst = 1/(nλ) + 1/((n - 1)λ) + ⋯ + 1/(kλ) = (1/λ) Σ_{k≤i≤n} (1/i)   (3.33)
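The stage-by-stage argument above reduces to a one-line sum, and the "total work divided by k" quantity n/(kλ) gives a quick upper-bound sanity check (the parameters below are hypothetical):

```python
def k_out_of_n_mttf(k, n, lam):
    """MTTF of a 'k out of n' system of identical exponential units in
    active redundancy: stages j = n, n-1, ..., k with mean 1/(j*lam)."""
    return sum(1.0 / (j * lam) for j in range(k, n + 1))

k, n, lam = 2, 4, 0.01        # hypothetical: 2-out-of-4, lam = 0.01 [1/hour]
T = k_out_of_n_mttf(k, n, lam)
T_bound = n / (k * lam)       # total operating capacity spread over k slots
```

The exact value is always below the bound because, in active redundancy, units above the k needed are "burning" lifetime without being strictly necessary at that instant.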
P_PS = 1 - (1 - ∏_{1≤i≤N} p_i)^M   (3.34)

Figure 3.6. Parallel-series structure: (a) in an aggregate form; (b) in a detailed form.

P_SP = ∏_{1≤j≤N} (1 - ∏_{1≤i≤M} q_ij)   (3.35)

where M is the number of units in a parallel subsystem and q = 1 - p.
In conclusion, we make the following remark. If we would like to improve
the reliability of a series system of N units using redundancy, there are two
ways to do so. The first way is to use M redundant copies of the system as a
whole. The second way is to use M redundant units for each of the main
units (see Figure 3.8).
Comparing (3.34) and (3.35), one finds that it is more effective to use a
series-parallel structure than a parallel-series structure.
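For identical independent units the two options reduce to two closed forms, and a quick sweep confirms the series-parallel advantage (the sizes and the test values of p are arbitrary):

```python
def p_parallel_series(p, N, M):
    """M parallel chains, each a series of N units: redundancy applied
    to the system as a whole."""
    return 1 - (1 - p ** N) ** M

def p_series_parallel(p, N, M):
    """N series groups, each of M parallel units: redundancy applied
    to each unit separately."""
    return (1 - (1 - p) ** M) ** N

vals = [(p, p_series_parallel(p, 5, 2), p_parallel_series(p, 5, 2))
        for p in (0.5, 0.9, 0.99)]
# For every p, the series-parallel value is at least the parallel-series one
```

Intuitively, unit-level redundancy lets a spare in one position cover a failure there even when another position has also lost a unit, whereas whole-system redundancy wastes a complete chain on any single failure within it.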
MIXED STRUCTURES 127
Figure 3.7. Series-parallel structure: (a) in an aggregate form; (b) in a detailed form.

Figure 3.8. Parallel-series (a) and series-parallel (b) structures of size N × M.
and, consequently,

ξ_PS = max_{1≤i≤M} ξ_i   (3.39)

is the random TTF of the parallel-series system as a whole.
Consider the same set divided in such a way that ξ_ij is the random
TTF of the jth unit in the ith subsystem of parallel units. Then the random
TTF of this subsystem is

ξ_i = max_{1≤j≤m_i} ξ_ij   (3.40)

and, consequently,

ξ_SP = min_{1≤i≤N} ξ_i   (3.41)

is the random TTF of the series-parallel system as a whole.
A substitution of (3.38) to (3.41) in (3.37) gives, for any sample of r.v.'s ξ_ij,
that ξ_SP ≥ ξ_PS. From this it automatically follows that

T_PS = E{ξ_PS} ≤ E{ξ_SP} = T_SP

and

P_PS(t) = Pr{ξ_PS > t} ≤ Pr{ξ_SP > t} = P_SP(t)
For a "long" series-parallel system (when N ≫ 1), the Weibull-Gnedenko
distribution might be applied if the system's reliability is relatively high.
Consider a system of independent identical units. The distribution of the
TTF of each parallel subsystem is such that M is the first order of the
derivative which differs from 0. As we considered in Section 3.1, in this case
MIXED STRUCTURES 129
Figure 3.9. Examples of the reduction of a complex structure with parallel and series
inner substructures to the simplest kinds of structures.
STANDBY REDUNDANCY 131
In an inverse way, one can construct various reducible structures from a single
equivalent unit by a detailed disaggregation at each step. Examples of
irreducible structures (different arbitrary networks) are presented in a later
chapter.
A number of systems have standby redundant units. These units are not
included in an "active" system's structure. Moreover, these redundant units
cannot fail before they occupy an active position. A standby unit instantly
replaces its corresponding main unit after the latter's failure. Generally, such
a replacement does not occur instantly, but most mathematical models
assume that the time of the standby unit's switching into a main position is 0.
The spare parts supply problem is usually analyzed in terms of standby
redundancy. For this problem the sufficiency of the stock of spare parts for
replacement (or repair performance) is considered rather than the system's
successful operation. (Usually, in this case, one considers the inventory
system itself. But for simplicity of terminology we will use the terms system
failure and instant replacement in further discussion.)
For standby redundancy it is convenient to speak about "a socket" and a
set of units which are put into it one by one after each failure. All units for
the socket are considered to be identical. In reality, a main unit of some
type can be replaced only by a standby unit of the same type.
θ = Σ_{1≤i≤m} ξ_i

T_syst = E{θ} = E{ Σ_{1≤i≤m} ξ_i } = Σ_{1≤i≤m} T_i   (3.42)
The MTTF does not depend on the switching order of the standby units. If all
units are identical, then

T_syst = mT
P_syst(t) = 1 - F^{(m)}(t),  F^{(m)}(t) = ∫₀ᵗ F^{(m-1)}(t - x) dF(x)

where F^{(m)} is the m-fold convolution of F. For exponentially distributed units,

P_syst(t) = Σ_{0≤j≤m-1} [(λt)^j / j!] e^{-λt}   (3.44)

For λt ≪ 1, this yields the approximation

P_syst(t) ≈ 1 - (λt)^m / m!   (3.45)
(λt)/(n + 1) + (λt)²/[(n + 1)(n + 2)] + ⋯ ≤ (λt)/(n + 1) + (λt)²/(n + 1)² + ⋯   (3.46)

This can be used for approximate computations. The value obtained by
substituting (3.46) is smaller than the exact value; that is, it delivers a
"guaranteed result."
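The exact Poisson-type sum (3.44) and the approximation (3.45) are both one-liners, and a quick comparison (with hypothetical parameters satisfying λt ≪ 1) confirms that the approximation sits on the "guaranteed," lower side:

```python
import math

def p_standby(t, lam, m):
    """PFFO of a socket with m exponential units used one after another
    (standby redundancy): Erlang survival function, eq. (3.44)."""
    x = lam * t
    return math.exp(-x) * sum(x ** j / math.factorial(j) for j in range(m))

lam, t, m = 0.01, 10.0, 3            # hypothetical; lam*t = 0.1
exact = p_standby(t, lam, m)
approx = 1 - (lam * t) ** m / math.factorial(m)   # eq. (3.45)
```

For λt = 0.1 the two values differ only in the fifth decimal place, which is why (3.45) is so convenient for hand estimates of highly reliable standby groups.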
In conclusion, note that standby redundancy is more effective than active
redundancy (at least in theory!). This follows from the simple fact that

ξ₁ + ξ₂ + ⋯ + ξ_m > max(ξ₁, ξ₂, ..., ξ_m)

The equality is never attained because of the strictly positive values of the
ξ's. (Of course, we are considering the case where m > 1.) Of course, the
reader should never forget that standby redundancy, in practice, requires
some time to switch a unit into an active regime.
Finally, we would like to point out the relationship between the MTTFs for
series and parallel systems of two units. (The result can easily be expanded by
induction to an arbitrary number of units.) Suppose that one unit has a
random TTF ξ₁ and another has ξ₂. It is clear that

ξ₁ + ξ₂ = min(ξ₁, ξ₂) + max(ξ₁, ξ₂)

because one of these r.v.'s is obviously the larger and the other the smaller. Taking
the mean of both sides of the equality, one gets

T₁ + T₂ = T_series + T_parallel
In particular, when both ξ's are exponentially distributed, one can obtain a
convenient expression for the MTTF of a parallel system of two different
units:

T_parallel = 1/λ₁ + 1/λ₂ - 1/(λ₁ + λ₂)

Of course, the latter expression has such a simple form only because both
distributions are exponential.
(One period to the first system failure and then n - k replacements of spare
units.)
Therefore, the system's MTTF is
T_syst = E{ Σ_{1≤i≤n-k+1} min_{1≤j≤k} ξ_ij } = (n - k + 1)/(kλ)   (3.47)

P_syst(t) = Pr{ Σ_{1≤i≤n-k+1} min_{1≤j≤k} ξ_ij > t }
          = Σ_{0≤j≤n-k} [(kλt)^j / j!] e^{-kλt}   (3.48)
In general, the problem is very complicated. The most reasonable way to
calculate accurate values of the reliability indexes P_syst(t) and T_syst is via
Monte Carlo simulation.
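The Monte Carlo route just mentioned can be sketched as follows. The illustration below uses exponential units with instant switching, for which the simulated mean can be checked against the closed form (n - k + 1)/(kλ); the parameters are hypothetical, and for arbitrary lifetime distributions one would replace the exponential sampler:

```python
import random

def sim_mttf(k, n, lam, runs, seed=2):
    """Monte Carlo MTTF of a 'k out of n' system with n - k standby
    spares and instant replacement: the k loaded units fail with total
    rate k*lam, and the system survives n - k + 1 such failures."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        total += sum(rng.expovariate(k * lam) for _ in range(n - k + 1))
    return total / runs

k, n, lam = 2, 5, 0.1                 # hypothetical parameters
est = sim_mttf(k, n, lam, 50000)
exact = (n - k + 1) / (k * lam)       # closed form for this special case
```

With 50,000 replications the estimate agrees with the exact value to within a fraction of a percent; the same loop structure accepts Weibull or lognormal samplers, for which no closed form exists.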
Below we give a simple method for obtaining lower and upper bounds for
these reliability indexes. It is clear that the best use of the standby units
would be in a so-called "time-sharing" regime. Here the MTTF of the "k out
of n" structure could be calculated as the total operation time of all units
divided by k. The upper bound for T_syst follows:

T_syst ≤ nT/k = n/(kλ)   (3.49)
Comparison of (3.47) and (3.49) shows the difference between the accurate
value of T_syst for the exponential distribution and its upper bound. An upper
bound for P_syst(t) can be obtained via similar reasoning:
where F(t) is the d.f. of the random TTF of a single unit. A lower bound of
the MTTF can be found by integrating (3.51). If the coefficient of variation of
F(t) is small, (3.51) can be reduced to
p′_N(t) = -p_N(t)[kλ + mαλ]

p′_{N-l}(t) = -p_{N-l}(t)[kλ + (m - l)αλ] + p_{N-l+1}(t)[kλ + (m - l + 1)αλ],
1 ≤ l ≤ m

The initial state for this process is the system state with all operating units, so
p_N(0) = 1. The system MTTF can be found immediately from the transition
graph in Figure 3.10:

T_syst = Σ_{0≤j≤m} 1/[λ(k + jα)]
For the graph of Figure 3.11, we can write the result immediately without
solving the system of differential equations. Indeed, we have the sum of
m + 1 i.i.d. exponential r.v.'s with means equal to 1/[λ(k + αl)]. Therefore,

P_syst(t) = Σ_{0≤j≤m} { [λ(k + αl)t]^j / j! } e^{-λ(k+αl)t}

For the second case we have the sum of m exponential r.v.'s with means
1/[λ(k + αl)] and of one exponential r.v. with mean 1/(λk):
where

F₁(t) = Σ_{0≤j≤m} { [λ(k + αl)t]^j / j! } e^{-λ(k+αl)t}

and

F₂(t) = Σ_{0≤j≤m} { (λkt)^j / j! } e^{-λkt}

give the lower and upper bounds, respectively: F₁(t) ≤ P_syst(t) ≤ F₂(t).
These boundary estimates are given not as essential results but rather as
examples of the possible thinking involved in finding simple solutions to
complex problems.
We repeat that on-duty redundancy, in general, is a real problem in
practice, not because of the solution difficulties but because of the lack of
information. Usually, nobody has the slightest idea of the kind of distribution
parameters (or even the distributions of the TTFs themselves!) which a unit
in an on-duty regime has.
In the first case, the SD can be, for example, an interface between the
redundant group and the remaining equipment. It can be of various
physical natures (electrical, mechanical, hydraulic, etc.). The successful operation
of the redundant group depends directly on the failure-free operation of
the SD.
SWITCHING AND MONITORING 141
Figure 3.13. Approximate representation of the duplicated system with the switch as
a series connection of the two redundant units and a non-ideal switch.
Switching Device Used Only for Switching The second case should be
considered in more detail. There are two possibilities for an SD failure:
1. The SD may fail during the time of the system's operation.
2. The SD may fail with probability Q only at the moment of switching,
independent of the time at which it occurs.
Switching Device Failure Depending on Time Consider a redundant
group of identical and independent units with an exponential distribution of
their random TTFs. The probability of a successful operation of the redundant
group is denoted by P_RG(t). The distribution of the switching device
TTF, F_SD(t), is arbitrary. There are two possibilities for the system's operation:
2. The SD fails at some moment x < t, the redundant group has not
failed until x, and, after this moment, the current main unit does not
fail during the remaining time t - x. Summing the probabilities of the two
possibilities gives

P_syst(t) = e^{-λ_SD t} P_RG(t)
           + ∫₀ᵗ [1 - (1 - e^{-λx})²] e^{-λ(t-x)} λ_SD e^{-λ_SD x} dx   (3.54)
P_syst(t) = [1 - F_SD(t)] P_standby RG(t)
           + ∫₀ᵗ [ ∫₀ˣ p(t - y) dH(y) ] dF_SD(x)   (3.55)
Switching Devices That Fail with Time For active redundancy a system
operates successfully in the following situations:
SWITCHING AND MONITORING 147
+ ∫₀ᵗ { Σ_{0≤k≤m-1} (m - 1 choose k) [p_SD(x)]^k [q_SD(x)]^{m-k-1} } dF(x)   (3.60)
Of course, (3.59) and (3.60) can be practically utilized only with the aid of a
computer.
P_syst(t) = P(t) + R ∫₀ᵗ Σ_{0≤j≤m-1} (m - 1 choose j) [P(x)]^j [Q(x)]^{m-j-1} dF(x)

where Q = 1 - R.
When we consider standby redundancy, the system can successfully oper-
ate if:
The MTTF for both systems can only be found numerically. Note that for
the exponential distribution and large m, (3.62) can be approximated by

P_syst(t) ≈ e^{-Qλt}   (3.63)

and

T_syst = 1/(Qλ)   (3.64)
Both (3.63) and (3.64) are obtained under the assumption of the correctness
of the application of the result of the random summation to exponentially
distributed r.v.'s. (Note that in our case we consider a fixed number of
Bernoulli trials.)
This brief review does not, of course, cover the entire theme. There are
various cases in practice which lie outside the scope of this material. Our
main purpose was to display some inferences in this area and not to give the
reader a "cook book."
1. Both parallel units have failed inside a period between two neighboring
check points, even if there is a standby unit.
2. There are no units operating at some moment.
where, after a successfully operating cycle of length Δ, the system starts its
failure-free operation from the beginning. A cycle with two failures contains
a portion of useful time which is denoted by Δ*. Setting Δ* = Δ, we can write
the approximation

P_syst ≈ ∏_k { 1 - [1 - p(Δ_k)]² }   (3.68)
E{ξ*} = [ ∫₀^Δ p(x) dx ] / p(Δ)

and only the group of units at the last stage operates until complete
exhaustion of all redundant units. Finally, for the system we have

T_syst = Σ_l Q(Δ_l) [ ∏_{1≤i≤l-1} p(Δ_i) ] [ Σ_{1≤i≤l-1} Δ_i + E{ξ_l} ]
         + [ ∏_i p(Δ_i) ] E{ max_{1≤i≤n_l} ξ_i }   (3.69)
One can use an approximation by replacing ξ_k with Δ_k, or one can obtain
lower and upper bounds by substituting ξ_k = 0 and ξ_k = Δ_k, respectively.
p₁(Δ₁) = 1 - [1 - exp(-λΔ₁)]^{n₁}
After the first stage of a successful operation, the system has a random
number, say j, 1 ≤ j ≤ n₁, of operating units. These units can be used at the
second stage, starting at the moment Δ₁, with n₂ new units switched in by the
prior rule. Thus, the total number of units acting at this stage equals n₂ + j.
The probability of exactly j units {j > 0) being transferred to the second
stage is
Pr{j} = (n₁ choose j) e^{-jλΔ₁} (1 - e^{-λΔ₁})^{n₁-j}

and the PFFO of the second stage is

p(Δ₂) = Σ_{0≤j≤n₁} Pr{j} [1 - (1 - e^{-λΔ₂})^{n₂+j}]   (3.70)
SYSTEMS WITH DEPENDENT UNITS 153
Similarly, the expression for a system with three stages can be written as
p(Δ₃) = Σ_{0≤j≤n₁} Pr{j} Σ_{0≤i≤n₂+j} (n₂ + j choose i) e^{-iλΔ₂}
        × (1 - e^{-λΔ₂})^{n₂+j-i} [1 - (1 - e^{-λΔ₃})^{n₃+i}]   (3.71)
Notice that in such systems the most important thing is to define the
optimal intervals Δ_k and the numbers of units n_j that should be switched each
time. A simple heuristic solution of this optimization problem is presented in
Chapter 13.
In the real world different random events and random variables are often
"associated" and not independent. For instance, units of a system can be
dependent through the system's structure, the operating environment,
internal state changes, and so on. In all of these situations, reliability usually
tends to change in the same way for all units: all of a unit's parameters
increase or all decrease.
Two r.v.'s X and Y are associated if their covariance is positive: Cov(X, Y) > 0.
A stronger requirement for the association of two r.v.'s demands that the
inequality

Cov[f₁(X, Y), f₂(X, Y)] ≥ 0

holds, where both f₁ and f₂ are increasing or both are decreasing functions.
The vector X = (X₁, X₂, ..., X_n) consists of associated components if an
analogous condition holds for every pair of increasing (or decreasing) functions
of its components. The normalized covariance (correlation coefficient) ρ
satisfies the condition -1 ≤ ρ ≤ 1. For n associated
r.v.'s it is possible to consider only the case ρ > 0. From this condition and
(3.73), for a series system of two associated units, it follows that
T_syst ≥ ∫₀^∞ ∏_{1≤i≤n} p_i(t) dt   (3.76)
Consider the example when each unit of a series system depends on the
same factor, for instance, the temperature. The system is designed for use
at two different temperatures. The designer decides to check the
probability of a failure-free operation of the series system of n units. For this
purpose, the designer arranges for unit testing under these two conditions.
The probabilities of the unit's failure-free operation under these two
conditions are p₁ and p₂, respectively. The average unit failure-free operation
probability equals p̄ = (1/2)(p₁ + p₂). At first glance, it is very
attractive to try to compute the system reliability index as P̂_syst = p̄ⁿ if we
know nothing about the real conditions of the system's use.
But let us assume that the first condition appears in practice with frequency
R, and the second condition appears with frequency Q = 1 - R. (Of
course, the frequency R can be considered to be a probability.) Then a
realistic value of the index is P_syst = Rp₁ⁿ + Qp₂ⁿ. It is easy to check that
P_syst ≥ P̂_syst. (To convince yourself in a particular case, do it with n = 2 and
R = 1/2.) Of course, the same phenomenon will be observed if one considers
more than two different environmental conditions.
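The suggested particular case is a two-line numeric check (the values p₁ = 0.9 and p₂ = 0.7 are hypothetical):

```python
p1, p2, n = 0.9, 0.7, 2      # unit PFFO under conditions 1 and 2
R = 0.5                      # frequency of condition 1
Q = 1 - R
p_bar = 0.5 * (p1 + p2)      # "average" unit

naive = p_bar ** n                      # pretends units are independent
realistic = R * p1 ** n + Q * p2 ** n   # all units share the environment
```

Here the naive value is 0.64 while the realistic one is 0.65; averaging the environment before raising to the nth power always loses, because xⁿ is convex.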
Another example of a system with associated units is a system operating in
a changing regime. Assume that a system operates with probability pk at the
k t h regime. Under this regime the system's units have a failure rate equal to
λ_k. It may happen if the system switches from regime to regime periodically
(or, perhaps, randomly). In this case

P_syst(t) = Σ_k p_k e^{-λ_k t}

This is larger than

exp(-t Σ_k p_k λ_k)

by the convexity of the exponential function.
So, for a series system we can use the hypothesis of the unit's indepen-
dence to obtain a conservative bound on the reliability index of types P ( t )
or T.
Pr{x₁ ∨ x₂ = 1} = 1 - Pr{¬x₁ ∧ ¬x₂ = 1}
              = 1 - [Pr{¬x₁ = 1} Pr{¬x₂ = 1} + Cov(¬x₁, ¬x₂)]   (3.77)
              ≤ 1 - Pr{¬x₁ = 1} Pr{¬x₂ = 1} = 1 - q₁(t) q₂(t)   (3.78)

But Cov(¬x₁, ¬x₂) = Cov(x₁, x₂) and Var{¬x_i} = Var{x_i}, i = 1, 2, so the
covariance term is nonnegative by the association of x₁ and x₂.
Equation (3.78) can immediately be generalized for a parallel system of m
associated units:

P_syst(t) ≤ 1 - ∏_{1≤i≤m} q_i(t)
Figure 3.14. Transition graph for obtaining the upper and lower bounds for a parallel
system with dependent units.
the same time, remember that the system is a parallel system!) We begin the
analysis of the transition graph for this case with state 1.
A single unit operates with λ₁ = λ. When two units are operating, the
temperature in the room is higher and, consequently, λ₂ = 2(λ + λ*). When
all three units are operating, λ₃ = 3(λ + λ**). Under the conditions of the
example, λ* < λ**. The transition graph for this case is shown in Figure
3.14c.
A comparison of these two cases with the initial system of independent
units shows that both of them possess a less favorable reliability index than
the initial system: in each case the transition intensities are higher than in the
initial case. Thus, on average, systems with associated units reach a failure
state more quickly. We finish this comparison by comparing the MTTFs:

T_syst = 1/(3λ) + 1/(2λ) + 1/λ ≥ 1/(3λ) + 1/(2(λ + λ₂)) + 1/(λ + λ₁)

and

T_syst = 1/(3λ) + 1/(2λ) + 1/λ ≥ 1/(3(λ + λ₂*)) + 1/(2(λ + λ*)) + 1/λ
P_c = p²_PS [1 − (1 − p_A p_B)²] + 2 p_PS q_PS p_A p_B

and

P_d = p²_PS [1 − (1 − p_A p_B)²]
Again, we can deduce that P c > P d without calculation, based only on our
previous knowledge about the reliability of a series system with associated
units.
A consideration of these examples shows us that the reliability of some
auxiliary units may have an influence on other system units in such a way that
the reliability of a parallel-series structure might be worse than the reliability
is equivalent to
R² > [1 − (1 − p²)²]/(2p²)

The right part of the inequality is restricted by 1 for any p. Thus, if Q > R², this inequality holds for any p. The solution of the corresponding equality gives

Q = (3 − √5)/2 ≈ 0.382
In other words, for an unreliable common unit (in our case the power
supply), with a reliability index lower than approximately 0.6, one should
choose a parallel-series structure rather than a series-parallel one.
For some additional examples of the analysis of systems consisting of
dependent units, see Gnedenko, Belyaev, and Solovyev (1969).
Some units have two types of failures. For instance, a resistor may be
disconnected (leaving an open circuit) in an electric circuit, and in this case
no flow goes through the unit. Or it may burn out, and so will not provide any
resistance at all (a short circuit). One observes an analogous effect with
capacitors: no capacity at all (a short circuit) or infinite capacity (disconnec-
tion). In pipelines, holes allow the leaking of pumped liquid, which decreases
the user's consumption and, simultaneously, decreases the hydraulic resis-
tance. Rubbish in the pipe results in the same decrease in user consumption,
but, at the same time, increases the hydraulic resistance.
In a most essential way, this phenomenon appears in relay circuits. These circuits are designed for connection and disconnection, and the nature of their failure can be one of two kinds: they may fail to connect or they may fail to
disconnect. Each unit (relay) itself is subjected to two similar failures.

TWO TYPES OF FAILURES 161

If the system must provide a connection, then for a chain of n relays in series and m such chains in parallel

P_con = (p + c)ⁿ   (3.80)

R_con = 1 − (1 − P_con)^m = 1 − [1 − (p + c)ⁿ]^m   (3.81)
If the system must provide a disconnection, the corresponding probabilities are

P_discon = 1 − cⁿ   (3.82)

for a single chain and

P_discon = (1 − cⁿ)^m   (3.83)
It is clear that a single relay with the same parameters can perform
successfully in both cases only with probability p. (Any kind of failure makes
one of the operations totally impossible.)
116 UNREPAIRABLE SYSTEMS
and
R_con = 1 − (1 − p − c)^m = 1 − d^m   (3.85)
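The formulas (3.80)–(3.85) are straightforward to evaluate. A minimal sketch, where the function names and the numerical values of p and c are our own illustrative choices (p being the probability of correct operation, c the probability of a short-type failure):

```python
def connect_prob(p, c, n, m):
    # A chain of n series relays conducts if every relay either works (p) or
    # has failed "short" (c); the circuit connects if at least one of the
    # m parallel chains conducts -- cf. (3.80)-(3.81).
    chain = (p + c) ** n
    return 1.0 - (1.0 - chain) ** m

def disconnect_prob(c, n, m):
    # A chain fails to open only if all of its n relays are shorted; all m
    # chains must open for the circuit to disconnect -- cf. (3.82)-(3.83).
    return (1.0 - c ** n) ** m

# Illustrative values: p = 0.95, shorting probability c = 0.02.
assert abs(connect_prob(0.95, 0.02, 1, 1) - 0.97) < 1e-12
assert abs(disconnect_prob(0.02, 1, 1) - 0.98) < 1e-12
```

Note how the two requirements pull in opposite directions: adding series relays helps disconnection but hurts connection, and vice versa for parallel chains.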
L_i = {(c_{i1}, p_{i1}), ..., (c_{i n_i}, p_{i n_i})}
The capacity and the probability are determined by the rules of the cohort
interactions: the "cells" with the values of capacities and the "cells" with the
values of probabilities are considered separately. The capacity is determined
by
= ft" cu = min cu
* 1 ''
ls/srt ?Ji
jjSk »K
Pk « ^ pUi, = n P,J,
i >Si< , n
The operator Ω in this particular case possesses the following property. If for two terms of the final GGS there are C_k and C_{k+1} with c_k = c_{k+1}, then these two terms form a new term with parameters c* and p* determined by

c* = c_k = c_{k+1}  and  p* = p_k + p_{k+1}
One may stop the procedure as soon as the value c* < c° appears at
some
intermediate step of the calculation.
Now let the operator possess the above-mentioned absorption property and, additionally, the preference property. In our case the latter means that if two
cohorts have different sets of maniples, then under some specified conditions
the one which possesses the "better" maniple is kept for further considera-
tion and the one with the "worse" maniple is excluded.
In the case of cohort interaction we, at first, use the absorption property
and obtain the final legion in an intermediate form
L = ((c_accept, P), (0, p₀))

where P is the sum of all p's of the cohorts with c_k ≥ c_accept. The resulting legion, after applying the preference operation, will have the form

L = ((c_accept, P))

It is clear that R_syst = P.
A Parallel System The formal technique used for parallel systems com-
pletely coincides with the above-described method. But, for convenience, we
will use the corresponding operators Ω_L, ∪_C, and Ω_M. We use these new
symbols to distinguish the operations over the maniples. Indeed, for instance,
for resistance
rΣ = Σ_{1≤i≤n} r_i

and

Ω^M R = min_{1≤i≤n} R_i
and so on.
In Table 3.2 we present the principal kinds of maniple interaction opera-
tors for systems with series and parallel structures (the considered parame-
ters are denoted in the table by w).
Of course, the functions listed in Table 3.2 do not exhaust all possibilities.
After this brief review of the possible interactions between maniples, we
might begin with a consideration of the system with a reducible structure.
Mixed Structures Here we illustrate how to use the GGS for the analysis
of a mixed structure with a simple example. We will not produce detailed
transformations and calculations because for us and for the reader it is more
reasonable to leave it to a computer. We ignore the physical nature of the
system and its units for the moment.
MIXED STRUCTURES WITH PHYSICAL PARAMETERS 167
L = ∪_L^{1≤i≤m} Ω_L^{1≤j≤n_i} L_{ij}

and

L = Ω_L^{1≤i≤n} ∪_L^{1≤j≤m_i} L_{ij}
Example 3.7 The system with a mixed structure is presented in Figure 3.20.
For this system the following chain of operations for obtaining the resulting
legion can be written: For the system as a whole, Figure 3.21a, we obtain
L = Ω_L(L₁, L^(A))

L^(A) = ∪_L(L^(B), L^(C))

We will write expressions for L^(B) and L^(C), again with no explanations (use the ancient rule: "see the sketch"):

L^(B) = ∪_L(L₄, Ω_L(L₂, L₃))

and

L^(C) = Ω_L(L₇, ∪_L(L₅, L₆))
Therefore, the final macroalgorithm for the system GGS computation can
now be written in the final form
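The macroalgorithm amounts to repeatedly composing lists of (capacity, probability) pairs. Below is a minimal sketch of the two composition rules under the assumptions discussed above — minimum capacity with product probability for a series interaction, additive capacity for a parallel one — with absorption of equal capacities; the function names are ours:

```python
from collections import defaultdict

def series_compose(legion_a, legion_b):
    """Series interaction of two legions, each a list of (capacity,
    probability) cohorts: the resulting capacity is the minimum of the
    two, the probability is the product; cohorts of equal capacity are
    merged (the absorption property)."""
    acc = defaultdict(float)
    for c1, p1 in legion_a:
        for c2, p2 in legion_b:
            acc[min(c1, c2)] += p1 * p2
    return sorted(acc.items(), reverse=True)

def parallel_compose(legion_a, legion_b):
    """Parallel interaction for an additive parameter such as flow
    capacity: capacities add, probabilities multiply."""
    acc = defaultdict(float)
    for c1, p1 in legion_a:
        for c2, p2 in legion_b:
            acc[c1 + c2] += p1 * p2
    return sorted(acc.items(), reverse=True)

# A unit delivering capacity 10 with probability 0.9 and 0 with probability 0.1.
unit = [(10, 0.9), (0, 0.1)]
two_series = series_compose(unit, unit)      # capacity 10 only if both deliver
two_parallel = parallel_compose(unit, unit)  # capacities 20, 10, or 0
assert two_series[0][0] == 10 and abs(two_series[0][1] - 0.81) < 1e-9
assert two_parallel[0][0] == 20 and len(two_parallel) == 3
```

Nesting these calls in the order dictated by the structure (as in Example 3.7) reproduces the chain of operations for the whole system; truncating cohorts with capacity below the required level c° implements the early-stopping rule mentioned above.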
Figure 3.21. Sequence of the system structure transformation for Example 3.7.
CONCLUSION
REFERENCES
Barlow, R. E., and F. Proschan (1975). Statistical Theory of Reliability and Life Testing. New York: Holt, Rinehart and Winston.
Gnedenko, B. V., Yu. K. Belyaev, and A. D. Solovyev (1969). Mathematical Methods in Reliability Theory. San Diego: Academic Press.
Kozlov, B. A., and I. A. Ushakov (1970). Reliability Handbook. New York: Holt, Rinehart, and Winston.
Ushakov, I. A., ed. (1985). Reliability of Technical Systems: Handbook (in Russian). Moscow: Radio i Sviaz.
Ushakov, I. A., ed. (1994). Handbook of Reliability Engineering. New York: Wiley.
EXERCISES
Figure E3.2.
3.5 Write the Boolean function φ(X_i) for the scheme depicted in Figure E3.3.
SOLUTIONS 171
Figure E3.3.
SOLUTIONS
3.1 For given sets X and Y (see the shadowed areas in Figures E3.1a and b), the union is a set of elements belonging to at least one of them (see the shadowed area in Figure E3.1c). Then X̄ and Ȳ are depicted in Figures E3.1d and e (see the shadowed areas). In Figure E3.1f one finds the area X̄ ∧ Ȳ shadowed. Consequently, the complementary area is ¬(X̄ ∧ Ȳ). Obviously, the latter area coincides with the shadowed area in Figure E3.1c. Thus, the desired result is obtained.
3.2 For example, let us prove identity (3.22b):

¬(X ∧ Y) = X̄ ∨ Ȳ

Take the rejection operation on both sides of the identity, which does not violate it:

X ∧ Y = ¬(X̄ ∨ Ȳ)

Now apply the rejection operation to all arguments, which also does not violate the identity:

X̄ ∧ Ȳ = ¬(X ∨ Y)

Thus, this identity is reduced to the first one, (3.22a), which was proven in the previous exercise.
3.3 Using (3.22a), one has for three arguments

X₁ ∨ X₂ ∨ X₃ = (X₁ ∨ X₂) ∨ X₃ = ¬(¬(X₁ ∨ X₂) ∧ X̄₃) = ¬(X̄₁ ∧ X̄₂ ∧ X̄₃)

For n arguments, by induction,

∪_{1≤i≤n} X_i = (∪_{1≤i≤n−1} X_i) ∨ X_n = ¬(¬(∪_{1≤i≤n−1} X_i) ∧ X̄_n) = ¬((∩_{1≤i≤n−1} X̄_i) ∧ X̄_n) = ¬(∩_{1≤i≤n} X̄_i)

This completes the proof.
3.4 Denote Y₁ = X₁ ∧ X₂ and Y₂ = X₄ ∨ X₅. In this notation

φ(X_i) = (X₁ ∧ X₂) ∨ (X₄ ∨ X₅) = Y₁ ∨ Y₂
3.5 Denote X₄ ∨ X₅ = Y₁, X₂ ∧ Y₁ = Y₂, and Y₂ ∨ X₃ = Y₄. In this notation φ(X_i) = X₁ ∧ Y₄, or in open form

φ(X_i) = X₁ ∧ {X₃ ∨ [X₂ ∧ (X₄ ∨ X₅)]}

The final expression using only logic AND and rejection operators has the form

φ(X_i) = X₁ ∧ ¬(X̄₃ ∧ ¬(X₂ ∧ ¬(X̄₄ ∧ X̄₅)))
CHAPTER 4
LOAD - STRENGTH
RELIABILITY MODELS
R = Pr{X > Y}   (4.1)

R = ∫_{−∞}^{+∞} [1 − F(x)] dG(x) = ∫_{−∞}^{+∞} G(x) dF(x)   (4.2)

R = ∫_{−∞}^{+∞} f(x) [∫_{−∞}^{x} g(y) dy] dx   (4.3)

f(x) = (1/(σ_f √(2π))) exp[−(x − S)²/(2σ_f²)]   (4.4)

where S and σ_f are, respectively, the mean and the standard deviation of the strength's distribution F(t), and

g(x) = (1/(σ_g √(2π))) exp[−(x − L)²/(2σ_g²)]   (4.5)

where L and σ_g are, respectively, the mean and the standard deviation of the load's distribution G(t).
Notice that we consider the domain of both distributions to range from −∞ to +∞. Of course, one should consider truncated distributions such that their r.v.'s cannot be negative. But, in practice, S > 3σ_f and L > 3σ_g, so that such a truncation does not lead to any crucial numerical errors.
STATIC RELIABILITY PROBLEMS OF "LOAD - STRENGTH" TYPE 169
Now introduce the new r.v. Z = X − Y. The mean of this new r.v. equals E{Z} = S − L and

σ_Z = √(σ_f² + σ_g²)

Thus,

R = Pr{Z > 0} = ∫₀^∞ (1/(σ_Z √(2π))) exp[−(x − E{Z})²/(2σ_Z²)] dx = Φ((S − L)/√(σ_f² + σ_g²))   (4.6)
Numerical results can be found from a standard table of the normal distribution.
From (4.6) one can see that the reliability of the construction decreases if the variances of X and/or Y increase. Roughly speaking, the more uncertain the conditions of use and the more unstable the quality of the construction, the lower is the reliability of the construction.
Example 4.1 The span of a bridge has a safety coefficient c_s equal to 5. The safety coefficient is defined as c_s = S/L. The coefficient of variation of the strength K_s equals 0.05 and that of the load K_l equals 0.2. (a) What is the probability of a successful operation of the construction? (b) What is the probability of a successful operation of the construction if the coefficient of variation of the strength is twice as large?

(a) In this case

R = Φ((S − L)/√(σ_f² + σ_g²)) = Φ(4/√(1.25 + 0.2)) = Φ(4/1.2) = Φ(3.33) = 0.999517

(b) In this case the value of the variance is 2.50 and the probability of a successful operation equals
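Formula (4.6) is easy to evaluate and to cross-check by simulation. In the sketch below the parameters — strength N(10, 1) and load N(5, 2) — are illustrative values of ours, not those of Example 4.1:

```python
import math
import random

def phi(z):
    # Standard normal d.f. via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def reliability_normal_normal(S, sigma_f, L, sigma_g):
    # Formula (4.6): R = Phi((S - L) / sqrt(sigma_f^2 + sigma_g^2)).
    return phi((S - L) / math.hypot(sigma_f, sigma_g))

# Illustrative parameters: strength N(10, 1), load N(5, 2).
R = reliability_normal_normal(10.0, 1.0, 5.0, 2.0)

# Monte Carlo cross-check of Pr{strength > load}.
random.seed(1)
n = 200_000
hits = sum(random.gauss(10.0, 1.0) > random.gauss(5.0, 2.0) for _ in range(n))
assert abs(R - hits / n) < 0.01
```

The simulation agrees with the closed form to within sampling error, confirming that only the margin S − L and the combined standard deviation matter.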
170 LOAD - STRENGTH RELIABILITY MODELS
Assume now that the load is distributed exponentially:

g(x) = λe^(−λx)   (4.7)

Then

R = ∫₀^∞ f(x) [∫₀^x g(y) dy] dx

Notice that

∫₀^x g(y) dy = ∫₀^x λe^(−λy) dy = 1 − e^(−λx)

so that

R = (1/(σ_f √(2π))) ∫₀^∞ exp[−(x − S)²/(2σ_f²)] dx − (1/(σ_f √(2π))) ∫₀^∞ exp[−(x − S)²/(2σ_f²)] e^(−λx) dx   (4.8)
Then, completing the square in the exponent,

exp[−(x − S)²/(2σ_f²)] e^(−λx) = exp(−λS + λ²σ_f²/2) exp[−(x − S + λσ_f²)²/(2σ_f²)]

so that

(1/(σ_f √(2π))) ∫₀^∞ exp[−(x − S)²/(2σ_f²)] e^(−λx) dx = exp(−λS + λ²σ_f²/2) Φ((S − λσ_f²)/σ_f)

Changing the variables, one finally obtains

R = Φ(S/σ_f) − exp(−λS + λ²σ_f²/2) Φ((S − λσ_f²)/σ_f)   (4.10)

Notice that for most practical problems such as this, the strength S should be located "far" from the point 0. This means that S/σ_f ≫ 1. Incidentally, this corresponds well to the assumption that we do not take into account the truncation of the normal distribution at l = 0. In this case, of course, Φ(S/σ_f) ≈ 1 and

R ≈ 1 − exp(−λS + λ²σ_f²/2) Φ((S − λσ_f²)/σ_f)   (4.11)
If, in addition, λσ_f is small, say of order 1, then it is possible to write the next approximation

R ≈ 1 − exp(−λS + λ²σ_f²/2)   (4.12)

Therefore, if one takes into account that the mean load equals L = 1/λ, (4.12) can be rewritten as

R ≈ 1 − exp(−S/L + σ_f²/(2L²))   (4.13)
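The closed-form expression for a normal strength and an exponential load can be checked against direct numerical integration. A sketch under the parameter values of our own choosing (S = 10, σ_f = 1, λ = 0.2):

```python
import math

def phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def R_closed(S, sigma, lam):
    # Closed form for normal strength N(S, sigma^2) and exponential load
    # with rate lam, cf. (4.10) as reconstructed in the text.
    return phi(S / sigma) - math.exp(-lam * S + 0.5 * (lam * sigma) ** 2) * phi(
        (S - lam * sigma ** 2) / sigma)

def R_numeric(S, sigma, lam, steps=100_000):
    # Trapezoidal integration of f(x) * (1 - exp(-lam * x)) over x >= 0.
    a, b = 0.0, S + 8.0 * sigma
    h = (b - a) / steps
    total = 0.0
    for i in range(steps + 1):
        x = a + i * h
        f = math.exp(-((x - S) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))
        w = 0.5 if i in (0, steps) else 1.0
        total += w * f * (1.0 - math.exp(-lam * x))
    return total * h

assert abs(R_closed(10.0, 1.0, 0.2) - R_numeric(10.0, 1.0, 0.2)) < 1e-6
```

For these values the correction factor Φ((S − λσ_f²)/σ_f) is indistinguishable from 1, illustrating why the simpler approximation is adequate whenever S/σ_f ≫ 1.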
Assume now that the load cannot be smaller than some threshold l*, that is,

G*(x) = 0 for x ≤ l*,  G*(x) = 1 − e^(−λ(x−l*)) for x > l*   (4.14)

In this case

R = ∫₀^∞ (1/(σ_f √(2π))) exp[−(x − S)²/(2σ_f²)] G*(x) dx = (1/(σ_f √(2π))) ∫_{l*}^∞ exp[−(x − S)²/(2σ_f²)] [1 − e^(−λ(x−l*))] dx   (4.15)

Repeating the previous arguments with S replaced by S − l*, one obtains

R = Φ((S − l*)/σ_f) − exp(−λ(S − l*) + λ²σ_f²/2) Φ(((S − l*) − λσ_f²)/σ_f)   (4.16)

and, for (S − l*)/σ_f ≫ 1,

R ≈ 1 − exp(−λ(S − l*) + λ²σ_f²/2)   (4.17)
R = ∫₀^∞ g(x) [∫_x^∞ f(y) dy] dx = ∫₀^{s*} g(x) dx + ∫_{s*}^∞ g(x) [∫_x^∞ f(y) dy] dx   (4.19)

Notice that we again use the lower limit of 0 in the integral. As we pointed out above, for numerical calculations the truncation of the normal distribution at t = 0 to the left can be neglected.
A simple transformation leads to the approximation

R ≈ Φ((s* − L)/σ_g)
Additional results for some other important particular cases can be found in Kapur and Lamberson (1977). We would like to mention that this reference contains useful formulas for the Weibull-Gnedenko distribution, which is important for describing the strength of mechanical constructions.
possible observations of the r.v. X, the strength will be smaller than the specified load Y. It means that with conditional probability k*/m the investigated system will fail under the load Y; that is, the complete probability of success is

R = 1 − (1/m) Σ_i k_i
If tables of the distributions F(x) and G(x) are available, the numerical calculation of the index R can be performed using the following formula:

R = Σ_{1≤m≤M} G((m + 1/2)Δ) [F((m + 1)Δ) − F(mΔ)]   (4.24)

where Δ is the chosen increment and M is the number of increments.
It is clear that the summation can only be performed in the area of the distribution's domain where the corresponding values of the product terms are significant. For practical purposes, the increments may be chosen to have a value ranging from 0.5 to 0.05 of the smaller standard deviation of the distributions F(x) and G(x). Obviously, the more accurate the result that is needed, the smaller the increments must be. For practical calculations, the left bound of the summation must begin with the value m₀ = min{m: F(mΔ) > ε}, where ε is chosen in correspondence with the needed accuracy.
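A sketch of this discretized calculation, here with both distributions normal so the result can be compared with the exact formula (4.6); the helper names and the parameter values are ours:

```python
import math

def phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def R_discretized(F, G, lo, hi, delta):
    # Cell-by-cell version of (4.24): R ~ sum over cells of
    # G(midpoint) * [F(right edge) - F(left edge)],
    # walking over the domain of the strength distribution F.
    R, x = 0.0, lo
    while x < hi - 1e-12:
        R += G(x + delta / 2.0) * (F(x + delta) - F(x))
        x += delta
    return R

# Illustrative data: strength N(10, 1), load N(5, 2).
F = lambda x: phi((x - 10.0) / 1.0)  # strength d.f.
G = lambda x: phi((x - 5.0) / 2.0)   # load d.f.

approx = R_discretized(F, G, 4.0, 16.0, 0.25)     # increment = 0.25 * smaller sigma
exact = phi((10.0 - 5.0) / math.hypot(1.0, 2.0))  # formula (4.6)
assert abs(approx - exact) < 1e-3
```

With the increment at one quarter of the smaller standard deviation — inside the 0.05 to 0.5 range recommended above — the discretized sum already matches the exact value to three decimal places.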
Figure 4.2. The densities of the load (L = 5) and the strength (S = 10).
Solution. We present Figure 4.2 to illustrate the solution. This figure helps us to see that, for example, the point 7 corresponds to L + σ_g and, at the same time, corresponds to S − 3σ_f, and the point 9 corresponds to L + 2σ_g and to S − σ_f, and so on. Use a standard table of the normal distribution and arrange (only for illustrative purposes) the new Table 4.1 with the input data for numerical calculation. Thus, the probability of failure equals 0.01615, and the probability of a failure-free operation equals 0.98385. A calculation with the use of the exact formula gives
TABLE 4.1
Interval [k, k+1]   G(k+1) − G(k)   k + 1/2   F(k + 1/2)   Product
[7, 8]              0.0920          7.5       0.0062       0.00057
[8, 9]              0.0441          8.5       0.0668       0.00295
[9, 10]             0.0164          9.5       0.692        0.01114
[10, 11]            0.0049          10.5      0.308        0.00149
Sum                                                        0.01615
The relatively large error is explained by the use of excessively large incre-
ments.
REMARK. The unreasonably high level of accuracy of the calculations is presented only to
compare the obtained solution with the exact solution. Once more we would like to emphasize
that for practical purposes the use of "too accurate" a solution can be considered almost
incorrect because of the very rough statistical data which we usually have in practice.
Solution. Find upper and lower bounds on the probability R. For the purpose of numerical calculation, construct a special table (see Table 4.2) based on standard tables of the normal and exponential distributions. Table 4.2 contains the values of the corresponding distribution G(x) and the
TABLE 4.2
m   x      z_m    F(x)     λ(x − 10)   G(x)    ΔG(m)
1   10.0   2.00   0.9773   0.00        0.000   0.221
2   10.5   2.25   0.9878   0.25        0.221   0.173
3   11.0   2.50   0.9938   0.50        0.394   0.134
4   11.5   2.75   0.9970   0.75        0.528   0.104
5   12.0   3.00   0.9987   1.00        0.632   —
R = (0.9878)(0.221) + (0.9938)(0.173) + (0.9970)(0.134) + (0.9987)(0.104) + r = 0.9956
Pr{n ≥ K | s⁰} = e^(−qK)

and

Pr{n ≥ K} = ∫₀^∞ [G(x)]^K dF(x)

where σ_s is the standard deviation of the d.f. F(x).

DYNAMIC MODELS OF "STRENGTH - LOAD" TYPE 187

We can conclude that the right "tail" of the distribution G(x) is concave in the essential area of the domain of the distribution F(x). Then the following simple bound is true:

E{n} = ∫₀^∞ [1 − G(x)]^(−1) dF(x)

The corresponding approximation for small q is

E{n} ≤ [1 − G(S)]^(−1)
1. Assume that the material strength decreases from cycle to cycle in such a way that p_{k+1} = a·p_k, where p_k is the probability of success at the kth cycle and a is constant, 0 < a < 1. Then

Pr{n ≥ K} = p^K ∏_{1≤k≤K} a^(k−1) = p^K a^(K(K−1)/2)

2. Again consider the case where the level of the strength is known and the deterioration is described by an exponential decrease of this level: at the kth cycle, the level of the strength is x_k = s⁰a^k, where 0 < a < 1. Note that G(s⁰a^k) > G(s⁰a^(k+1)) and, for the most commonly used distributions, this discrete function is concave. The probability of success over at least K cycles equals

Pr{n ≥ K | F(x), a} = ∫₀^∞ ∏_{1≤k≤K} G(xa^k) dF(x)

We do not have a simple approximation for this case.
The internal integral can be computed instantly because of its special limits:

∫_{a−v dt}^{a} f(x, v | t) dx = v dt f(a, v | t)
Adding and subtracting (4.35) and (4.36), one can easily obtain the two
following equations:
Using (4.35) for any time interval T, one can obtain the mean time of x(t) being over the specified level a. To obtain the result, we use the following simple arguments. Let us divide the total period T into n small nonoverlapping intervals located around the points t_j, 1 ≤ j ≤ n. For these t_j we can write

T_a ≈ Σ_{1≤j≤n} Δ_j   (4.43)
and the mean time when the function x(t) exceeds the level a equals

E{T_a} = Σ_{1≤j≤n} E{Δ_j}   (4.44)

At the same time,

E{Δ_j} = δ_j ∫_a^∞ f(x | t_j) dx   (4.45)
Using (4.45) and taking the limit in (4.44), we obtain the expression for the total time for which the process x(t) exceeds the level a:

E{T_a} = ∫₀^T ∫_a^∞ f(x | t) dx dt   (4.46)

In an analogous way, the number of intersections of the level a during the period T can be written as

K = Σ_{1≤j≤n} N_j

where

E{N_j} ≈ p(a | t_j) δ_j   (4.48)
In addition to (4.46), the last expression permits us to write the expression for the mean time t̄_a for which the process x(t) exceeds the level a during a single intersection. Indeed,

t̄_a = E{T_a}/E{N_a}   (4.50)

that is,

t̄_a = [∫₀^T ∫_a^∞ f(x | t) dx dt] / [∫₀^T ∫₀^∞ v f(a, v | t) dv dt]   (4.51)
All of these results are essentially useful for stationary processes because in this case all of the functions do not depend on the current time, that is, f(x | t) = f(x) and f(x, v | t) = f(x, v). Then all of the previous results can be rewritten as

E{T_a} = T ∫_a^∞ f(x) dx   (4.53)

and

E{N_a} = T ∫₀^∞ v f(a, v) dv   (4.54)
Naturally, for the stationary process the values of E{T_a} and E{N_a} depend only on the length of the period T. More precisely, they are proportional to T. The mean time t̄_a for which the process exceeds the level a does not depend on T. For a stationary process one can also introduce the mean number of intersections per unit of time λ_a:

λ_a = E{N_a}/T = ∫₀^∞ v f(a, v) dv   (4.55)
It is known from the theory of stochastic processes that the ordinate of the GSP and its derivative for the same moment of time are noncorrelated. Thus, the joint density function can be presented as the product of the two separated densities:

f(x, v) = (1/(2πσσ_v)) exp[−(x − E{X})²/(2σ²)] exp[−v²/(2σ_v²)]   (4.56)
Note that the variance σ_v² can be expressed through the correlation function of the process as

σ_v² = −K″(τ)|_{τ=0}   (4.58)

Substitution into (4.55) gives

λ_a = (σ_v/(2πσ)) exp[−(a − E{X})²/(2σ²)]   (4.59)
The expression for E{t̄_a} can be obtained in an analogous way:

t̄_a = (2πσ/σ_v) exp[(a − E{X})²/(2σ²)] [1 − Φ((a − E{X})/σ)]   (4.60)
P₀(T) = e^(−λ_a T)   (4.61)

or, in the general case,

P₀(T) = exp[−T ∫₀^∞ v f(a, v) dv]   (4.63)

For Gaussian processes, the probability P₀(T) can easily be written with the use of λ_a from (4.59).
It is difficult to estimate the error obtained via the use of such an approximation for the Gaussian process. The only simple physical explanation lies in the fact that there is practically no correlation between neighboring moments of intersections of the specified "high level." (To check this fact, one should take into consideration the mean time between two neighboring intersections: "too much water has passed under the bridge" after the previous intersection!)
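Rice's crossing-rate formula and the Poisson approximation for rare crossings can be combined in a few lines; the level and process parameters below are illustrative choices of ours:

```python
import math

def crossing_rate(a, mean, sigma, sigma_v):
    """Rice's formula, cf. (4.59): mean number of up-crossings of level a
    per unit time for a stationary Gaussian process with ordinate std
    sigma and derivative std sigma_v."""
    return (sigma_v / (2.0 * math.pi * sigma)) * math.exp(
        -(a - mean) ** 2 / (2.0 * sigma ** 2))

def prob_no_crossing(a, mean, sigma, sigma_v, T):
    # Poisson approximation, cf. (4.63), valid for rare crossings
    # of a "high" level a.
    return math.exp(-T * crossing_rate(a, mean, sigma, sigma_v))

# Illustrative: level four sigma above the mean of a unit-variance process.
lam = crossing_rate(4.0, 0.0, 1.0, 1.0)
assert 0.0 < prob_no_crossing(4.0, 0.0, 1.0, 1.0, 100.0) < 1.0
```

Because the rate decays as exp(−(a − E{X})²/2σ²), raising the tolerance level by one standard deviation reduces the expected number of excursions by orders of magnitude — the quantitative content of the "rare events" assumption.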
Solution. Let

R(t) = √(X²(t) + Y²(t))   (4.65)

and

v_r = dR(t)/dt   (4.66)

Then

t̄ = [∫_a^∞ f(r) dr] / [∫₀^∞ v_r f(a, v_r) dv_r]   (4.67)

where f(a, v_r) is the joint density of the distribution of the two r.v.'s R and v_r for R = a.
Consider an arbitrary period of time T. There are, on average, λ_a T intersections, and the vector parameter represented by the point (X, Y) is outside the specified tolerance zone during the mean time E{t̄_a}λ_a T. Consequently, the system parameter will be inside the tolerance zone, on average, during the time T[1 − E{t̄_a}λ_a]. The mean time that the parameter spends inside the tolerance zone is

τ = T[1 − E{t̄_a}λ_a]/(λ_a T) = 1/λ_a − E{t̄_a}   (4.68)

that is,

τ = [∫₀^a f(r) dr] / [∫₀^∞ v_r f(a, v_r) dv_r]
Now we find the corresponding densities and compute the final result in a compact and constructive form. At first, notice that the r.v.'s X and Y are independent and have normal distributions, so

f(x, y) = (1/(2πσ²)) exp[−(x² + y²)/(2σ²)]   (4.69)
Further,

f(r) = (r/σ²) exp[−r²/(2σ²)]   (4.70)

that is, f(r) is a Rayleigh density.
To determine the density f(r, v_r), we need to consider a system of four normally distributed r.v.'s: X, Y, v_x = dX(t)/dt, and v_y = dY(t)/dt.
For a Gaussian process, all of these r.v.'s are independent, and the variances of v_x and v_y determine the joint density f(a, v_r). After substitution of (4.70) and (4.72) into (4.68), we obtain the final result.
This example shows that the use of stochastic process theory to find
reliability indexes is not a simple task. But difficult practical problems always
need the use of more or less complicated mathematical tools. Note also that
besides the technical complexity of the solution there are also some special
needs concerning the input data. Such data are not always available.
CONCLUSION
REFERENCES
Barlow, R. E., and F. Proschan (1975). Statistical Theory of Reliability and Life Testing. New York: Holt, Rinehart, and Winston.
Becker, P. W., and F. Jensen (1977). Design of Systems and Circuits for Maximum Reliability and Maximum Production Yield. New York: McGraw-Hill.
Bolotin, V. V. (1975). Application of Methods of Probability Theory and Reliability Theory for Construction Design (in Russian). Moscow: Stroiizdat.
Gertsbakh, I. B., and Kh. B. Kordonsky (1969). Models of Failure. Berlin: Springer.
Kapur, K. C., and L. R. Lamberson (1977). Reliability in Engineering Design. New York: Wiley.
Konyonkov, Yu. K., and I. A. Ushakov (1975). Aspects of Electronic Equipment Reliability Under Mechanical Stress (in Russian). Moscow: Sovetskoe Radio.
Rice, S. O. (1944, 1945). Mathematical analysis of random noise. Bell Syst. Tech. J., vol. 23, no. 3, and vol. 24, no. 1.
Sveshnikov, A. A. (1968). Applied Methods of the Stochastic Function Theory. Moscow: Nauka.
Ushakov, I. A., ed. (1985). Reliability of Technical Systems: Handbook (in Russian). Moscow: Radio i Sviaz.
Ushakov, I. A., ed. (1994). Handbook of Reliability Engineering. New York: Wiley.
Vinogradov, O. G. (1991). Introduction to Mechanical Reliability: A Designer's Approach. New York: Hemisphere.
EXERCISES
4.1 The distributions of both a strength and a load are normal. The mean of the load is known: L = 10 conditional units and the standard deviation σ_g = 2. Find the parameters of the distribution of the strength, S and σ_f, which deliver a probability of failure-free operation equal to R = 0.995.
4.2 The distributions of both the strength and the load are exponential with parameters 1/S and 1/L, respectively. L = 1 conditional unit. Find S which delivers a probability of failure-free operation equal to R = 0.995.
4.3 The strength's distribution is normal with unknown parameters S and σ_f and a known coefficient of variation k = 0.04. The distribution of the load is exponential with L = 1 conditional unit. Find the parameter S which delivers R = 0.999.
SOLUTIONS
4.1 First of all, notice that the problem as formulated here is incorrect: one should know in advance the mean of the strength S or its standard deviation σ_f or the coefficient of variation k = σ_f/S. Without this correction the problem has no unique answer.
Let us assume that one knows k = 0.04. The problem can be solved by sequential iterations. For choosing a first value of S, notice that because of the requirement R = 0.995, S must be larger than L + 2.5σ_g. Choose S⁽¹⁾ = L + 3σ_g = 16. Then

S⁽²⁾ = L + 3σ_g + 3σ_f⁽¹⁾ = 26

R = Φ((26 − 10)/7.2) = Φ(2.22) = 0.987 < 0.995

Thus, the value S is still smaller than one needs to deliver R = 0.995. The procedure continues. (We leave it to the reader to obtain the final numerical result.)
4.2 From (4.9) one can write

S = LR/(1 − R) = 0.995/0.005 = 199 conditional units

This coefficient of safety is too large. The assumption that both strength and load distributions are exponential is unrealistic in practice. At the least, this is quite unreasonable as a distribution of strength.
4.3 For a highly reliable construction, one can use (4.12) or (4.13). This gives

R = 1 − exp[−S/L + (kS)²/(2L²)] = 0.999

or, with L = 1 and k = 0.04,

exp(−S + 0.0008S²) = 0.001
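The sequential-iteration idea of Solution 4.1 can be automated. The sketch below fixes the coefficient of variation k = σ_f/S (the missing datum noted in the solution) and iterates until S stabilizes; the function names and the bisection inverse of Φ are our own simplifications:

```python
import math

def phi(z):
    # Standard normal d.f.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def quantile(p):
    # Crude bisection inverse of phi; adequate for a sketch.
    lo, hi = -10.0, 10.0
    while hi - lo > 1e-10:
        mid = (lo + hi) / 2.0
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def solve_strength(L, sigma_g, k, R, tol=1e-8):
    """Iterate S -> L + z * sqrt((k*S)^2 + sigma_g^2) until it stabilizes,
    where z = Phi^{-1}(R) and k = sigma_f / S is the (assumed known)
    coefficient of variation of the strength."""
    z = quantile(R)
    S = L + z * sigma_g  # first approximation: neglect sigma_f
    while True:
        S_new = L + z * math.hypot(k * S, sigma_g)
        if abs(S_new - S) < tol:
            return S_new
        S = S_new

S = solve_strength(L=10.0, sigma_g=2.0, k=0.04, R=0.995)
# The result satisfies Phi((S - L)/sqrt(sigma_f^2 + sigma_g^2)) = R.
assert abs(phi((S - 10.0) / math.hypot(0.04 * S, 2.0)) - 0.995) < 1e-4
```

The iteration converges quickly because, for small k, the update is a strong contraction; for the data of Exercise 4.1 it settles after a handful of steps.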
and the probability of a failure-free operation, say during 200 hours, equals
Both values for the used item are larger than for the new item on the
average. Of course, there is no change in the item itself. We have only new
information which allows us to make a posteriori a new conclusion about the
item's reliability. An analogous example was considered in Chapter 1 when
the mixture of exponential distributions was analyzed.
Notice that we observe a similar effect in "burn-in." It is normal practice to use some stress tests (temperature shocks, accelerated vibration, etc.) for selecting technologically weak items. The same effect is observed when weak units and manufacturing defects are eliminated during a high-failure "infant mortality" period under normal conditions.
It is the appropriate time to recall the ancient Greek myth about the
Spartans who killed their weak and ill infants by throwing them from a high
rock into a canyon. They did this to ensure that their remaining children
would be healthy and strong. (We must state that this is only a myth: it was
not a custom in the ancient democracy. As new sources claim, rich, free
citizens of Greece replaced their weak and ill infants with the healthy babies
of poor families.)
DESCRIPTION OF THE MONOTONICITY PROPERTY OF THE FAILURE RATE 207

Of course, there are no "immortal" items. First of all, if a failure rate decreases in the initial phase, an increase in the failure rate at some point is inevitable. Many items have a failure rate function of a "U-shaped" form (see Figure 2.2). Second, even a probability distribution with a decreasing failure rate has P(∞) = 1 − F(∞) = 0. (Of course, this puts a special condition on the decreasing failure rate function.) The exponential distribution is the boundary distribution between distributions with increasing and decreasing rates.
One of the basic characteristics in further analysis is the "conditional instantaneous density" of the time-to-failure distribution. For this conditional density, we usually use the terms failure rate or failure intensity. The strict mathematical definition of this, as we mentioned above, is

λ(t) = f(t)/P(t)   (5.1)

For a duplicated system of two different units, each with an exponentially distributed TTF,

P(t) = 1 − (1 − e^(−λ₁t))(1 − e^(−λ₂t)) = e^(−λ₁t) + e^(−λ₂t) − e^(−(λ₁+λ₂)t)

and

λ(t) = [λ₁e^(−λ₁t) + λ₂e^(−λ₂t) − (λ₁ + λ₂)e^(−(λ₁+λ₂)t)] / [e^(−λ₁t) + e^(−λ₂t) − e^(−(λ₁+λ₂)t)]   (5.4)
UNIT WITH AN IFR DISTRIBUTION OF TTF 209
The function λ(t) crosses the level λ₁ at the moment

t₀ = (1/λ₁) ln[λ₂/(λ₂ − λ₁)]   (5.6)

Thus, the function λ(t) for the system starts from 0, then intersects the level λ₁ from below, reaches its maximum, and afterwards approaches the limit value λ₁ from above. From (5.5) one can see that λ(t) is monotonically increasing if λ₁ = λ₂ = λ. Figure 5.1 presents the λ(t) behavior over time for three characterizing proportions between λ₁ and λ₂.
Below we consider the distributions with an increasing failure rate (IFR
d.f.'s) though this is only one (and the most narrow class) of the "aging"
distributions. The reader can find other subclasses of the "aging" distribu-
tions, as well as the "younging" distributions, in the original interpretation,
in the excellent book by Barlow and Proschan (1975).
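The nonmonotone behavior of the failure rate (5.4) is easy to observe numerically; the rates λ₁ = 1 and λ₂ = 3 used below are illustrative:

```python
import math

def failure_rate(t, l1, l2):
    """Failure rate (5.4) of a duplicated system of two different
    exponential units: lambda(t) = f(t) / P(t)."""
    e1, e2, e12 = math.exp(-l1 * t), math.exp(-l2 * t), math.exp(-(l1 + l2) * t)
    P = e1 + e2 - e12                         # survival function of the pair
    f = l1 * e1 + l2 * e2 - (l1 + l2) * e12   # density of the TTF
    return f / P

# Illustrative rates: lambda_1 = 1, lambda_2 = 3.
assert failure_rate(0.01, 1.0, 3.0) < 0.1              # starts near zero
assert failure_rate(0.5, 1.0, 3.0) > 1.0               # overshoots lambda_1
assert abs(failure_rate(20.0, 1.0, 3.0) - 1.0) < 1e-3  # settles at lambda_1
```

The three assertions trace exactly the behavior described above: λ(t) starts from 0, rises above the level λ₁, and then approaches λ₁ from above, so the function is not monotone when λ₁ ≠ λ₂.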
210 DISTRIBUTIONS WITH MONOTONE INTENSITY FUNCTIONS
Figure 5.1. Example of a nonmonotone failure rate function for a duplicate system of two different units both with an exponential distribution of time to failure.

The evaluation of a unit's indexes is equivalent to finding the parameters of the corresponding distribution of the unit's TTF. If we do not know any additional information about the distribution, the general evaluation is a Chebyshev inequality of the type

Pr{|ξ − T| ≥ c} ≤ Var{ξ}/c² = σ²/c²
Proof. The intensity function λ(t) for the IFR distribution might increase infinitely or be bounded above by some number λ*. In the first case, there exists a moment t₀ when

Λ(t₀) = ∫₀^{t₀} λ(x) dx ≥ λ*t₀

Since λ(t) is increasing, Λ(t) ≥ Λ(y) + λ(y)(t − y) for any y < t, and consequently

P(t) ≤ P(y) e^(−λ(y)(t−y))   (5.9)

From (5.8) and (5.9) it follows that the "right tail" of the IFR distribution decreases faster than the corresponding tail of the exponential function.
If an IFR d.f. P(t) coincides with the exponential function e^(−at) at some point t = t₀, say, then for all t > t₀, P(t) lies everywhere below e^(−at).

Proof. The proof follows from the fact that Λ(t) > at for all t > t₀. Some hint of a graphical explanation can be found in Figure 5.2.
Corollary 5.2 An IFR d.f. P(t) necessarily crosses e^(−t/T) from above once if both distributions have the same MTTF equal to T.

Proof. By Theorem 5.1 both d.f.'s have to intersect once or not intersect at all. Suppose that both d.f.'s do not intersect, say P(t) > e^(−t/T) everywhere. This means that

∫₀^∞ P(t) dt > ∫₀^∞ e^(−t/T) dt = T

which contradicts the statement concerning the equality of the MTTFs. For a graphical explanation, refer to Figure 5.2.
[log P(0) − log P(t)]/(t − 0) = −(1/t) log P(t)
Corollary 5.4 For an IFR d.f. the initial moments of all orders are finite.
The last corollary shows that arguments about the properties of "aging"
units, which seem to be just qualitative statements, have led to very strong
restrictions on the moments of an IFR d.f. Incidentally, note that the
coefficient of variation of the IFR d.f. is always less than 1.
Now, using all of the above results, the following important characteristics
of IFR d.f.'s can be obtained.
a = −ln(1 − p)/ξ_p

Proof. An exponential function can be found which goes through the point (ξ_p, 1 − p). The parameter of the exponent can be found from the equation

e^(−aξ_p) = 1 − p

Consider the exponential function truncated from the right:

Π(t) = e^(−at) for t < ξ_p,  Π(t) = 0 for t ≥ ξ_p

Recall that for any distribution the MTTF is

T = ∫₀^∞ P(t) dt
Proof. We first present a rigorous proof. For an IFR distribution the function Λ(t) is an increasing convex function, so by Jensen's inequality

P(T) ≥ e^(−1)   (5.12)
Figure 5.4. Explanation of the proof of Theorem 5.5: finding t₀ by constructing the exponents truncated from the right.
Figure 5.5. Area of possible values of IFR reliability functions with the same MTTF
and samples of different IFR reliability functions.
As a result, we have lower and upper bounds for the IFR function P(t) which are represented in Figure 5.5. In this figure P₁(t) is the function with a coefficient of variation close to 0, and P₂(t) is the function with a coefficient of variation close to 1.
Theorem 5.6 An upper bound for the quantile ξ_p of the IFR distribution is expressed by its MTTF, T, and corresponding probability p, p = 1 − Pr{t > ξ_p}, as

ξ_p ≤ −T ln(1 − p)/p

Now the chain of obvious inequalities based on the previous results can be written as

T ≥ ∫₀^{ξ_p} P(t) dt ≥ ∫₀^{ξ_p} e^(−at) dt = p/a = pξ_p/(−ln(1 − p))
Theorem 5.7 A lower bound for the quantile ξ_p of the IFR distribution is expressed by T and p as

ξ_p ≥ −T ln(1 − p)

For the second case, with the use of Theorem 5.2, we immediately write

1 − p = P(ξ_p) ≥ e^(−ξ_p/T)

which gives the desired bound.
Corollary 5.5 For the median M of an IFR d.f., the following bounds are valid:

    M/(2 ln 2) ≤ T ≤ M/ln 2

Proof. The proof follows automatically from Theorems 5.6 and 5.7 after the substitution p = 1/2.
If, instead of the MTTF, we know the variance of the IFR distribution, the bounds can be improved. We do not consider these more complex cases and advise the reader to refer to Barlow and Proschan (1975) for an excellent discussion of the subject.
    ∫₀^∞ g(x) dx = 0

where g(x) ≥ 0 for x < a and g(x) ≤ 0 for x > a. Then, for an increasing function f(x),

    ∫₀^∞ f(x)g(x) dx ≤ 0

(and, for a decreasing f(x), the integral is ≥ 0).

Proof.

    ∫₀^∞ f(x)g(x) dx ≤ f(a)∫₀^a g(x) dx + f(a)∫_a^∞ g(x) dx = f(a)∫₀^∞ g(x) dx = 0
The sense of the lemma is clear from Figure 5.6, where an increasing function f(x) is shown. Obviously, the area S₁ is taken in the resulting expression with less "weight" than the area S₂, and thus the sum turns out to be negative.
    P(t) ≥ { exp(−t Σ_{1≤i≤n} 1/T_i)  for t < t*
           { 0                         for t ≥ t*

where t* = min_{1≤i≤n} T_i.

Proof. The proof immediately follows from Corollary 5.2 by a simple substitution.
This lower bound is very important in practice because it gives a guaran-
teed estimate of the real but unknown value of the reliability index.
SYSTEM OF IFR UNITS 223
    P(t) ≤ { 1                          for t < T₁
           { exp(−ω₁t)                  for T₁ ≤ t < T₂
           { exp[−(ω₁ + ω₂)t]           for T₂ ≤ t < T₃
           { ⋯
           { exp(−t Σ_{1≤i≤n} ω_i)      for t ≥ T_n

where the T_i, 1 ≤ i ≤ n, are ordered MTTFs and each ω_i = ω_i(t) is found from the equation

    1 − ω_i T_i = exp(−ω_i t)
Theorem 5.11 The MTTF of a series system of independent IFR units has the following bounds:

    1/Σ_{1≤i≤n}(1/T_i) ≤ T_syst ≤ min_{1≤i≤n} T_i    (5.17)
Proof. An upper bound follows trivially from the obvious statement that, for any t, the system is less reliable than any of its units:

    P_syst(t) = ∏_{1≤i≤n} P_i(t) ≤ P_j(t)

Therefore,

    T_j = ∫₀^∞ P_j(t) dt ≥ ∫₀^∞ ∏_{1≤i≤n} P_i(t) dt = T_syst
We may use Lemma 5.1 to obtain a lower bound. We show that the replacement of an arbitrary unit having an IFR distribution of TTF with a unit having an exponentially distributed TTF with the same MTTF leads to a decrease of the series system's MTTF. Suppose that such a replacement is done for the nth unit of the system. We need to prove that

    ∫₀^∞ ∏_{1≤i≤n} P_i(t) dt ≥ ∫₀^∞ e^{−t/T_n} ∏_{1≤i≤n−1} P_i(t) dt

or, equivalently,

    Δ = ∫₀^∞ [P_n(t) − e^{−t/T_n}] ∏_{1≤i≤n−1} P_i(t) dt ≥ 0
Note that, by Theorem 5.1, P_n(t) crosses exp(−t/T_n) once and from above and, by assumption, both of these functions have the same MTTF. Thus,

    P_n(t) − e^{−t/T_n}

corresponds to the function g(x) of Lemma 5.1. At the same time, the function

    ∏_{1≤i≤n−1} P_i(t)

corresponds to the decreasing function f(x) in Lemma 5.1. Thus, by Lemma 5.1, Δ ≥ 0, and the desired intermediate statement is proved.
The systematic replacement of all system units having an IFR distribution with units having a corresponding exponential distribution produces

    T_syst ≥ ∫₀^∞ ∏_{1≤i≤n} e^{−t/T_i} dt = ∫₀^∞ exp(−t Σ_{1≤i≤n} 1/T_i) dt = 1/Σ_{1≤i≤n}(1/T_i)

Thus, the theorem is proved.
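Theorem 5.11 is easy to exercise numerically. In the sketch below (our own illustration; the function names are ours), a series system of independent Weibull IFR units is simulated and the estimated MTTF is checked against the two bounds:

```python
import math
import random

def series_mttf_bounds(mttfs):
    # Theorem 5.11: 1 / sum(1/T_i) <= T_syst <= min(T_i)
    lower = 1.0 / sum(1.0 / t for t in mttfs)
    return lower, min(mttfs)

def simulate_series_mttf(shapes, scales, n=20000, seed=1):
    # A series system fails when its first unit fails, so the system TTF
    # is the minimum of the unit TTFs.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        total += min(rng.weibullvariate(sc, sh)
                     for sc, sh in zip(scales, shapes))
    return total / n

shapes = [2.0, 2.0, 1.5]          # shape >= 1 => IFR units
scales = [1.0, 2.0, 3.0]
mttfs = [sc * math.gamma(1.0 + 1.0 / sh) for sc, sh in zip(scales, shapes)]
lower, upper = series_mttf_bounds(mttfs)
estimate = simulate_series_mttf(shapes, scales)
```

The Monte Carlo estimate falls strictly between the two bounds for these parameters, as the theorem guarantees.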
The upper bound can be improved if we possess additional information about the distributions P_i(t), for example, if we know their first derivatives at t = 0.
Theorem 5.12 The upper bound of the series system MTTF can be written as

    T_syst ≤ [1 − exp(−t* Σ_{1≤i≤n} λ_i(0))] / Σ_{1≤i≤n} λ_i(0)

where λ_i(0) is the value of the ith unit's intensity function at t = 0 and t* = min_{1≤i≤n} T_i. In the proof, the IFR distribution of the nth unit is replaced by the exponential function with parameter λ_n(0) truncated from the right, E*(t). This distribution E*(t) has the same MTTF as the initial distribution P_n(t). Hence, E*(t) crosses P_n(t) from above (see Figure 5.7).
Figure 5.7. IFR reliability function and the exponential function truncated from the right whose derivatives at t = 0 are equal.
In the expression for the system MTTF, replace the unit with distribution P_n(t) by E*(t). Again we can use Lemma 5.1, noticing that E*(t) − P_n(t) corresponds to the function g(x) from Lemma 5.1 and ∏_{1≤i≤n−1} P_i(t) corresponds to the decreasing function from Lemma 5.1. Thus, by Lemma 5.1, Δ ≥ 0; that is, the replacement of any IFR unit, say the nth, with a unit with distribution E*(t) might only increase the system's MTTF.
Thus, the systematic replacement of units in the above-described manner leads to the final result

    T_syst ≤ ∫₀^{t*} exp(−t Σ_{1≤i≤n} λ_i(0)) dt = [1 − exp(−t* Σ_{1≤i≤n} λ_i(0))] / Σ_{1≤i≤n} λ_i(0)

where t* = min_{1≤i≤n} T_i, and this completes the proof.
For a parallel system of m independent IFR units, analogous bounds hold for the probability of failure Q(t):

    Q(t) ≥ { ∏_{1≤i≤m} [1 − exp(−ω_i t)]   for t ≥ t*
           { 0                              for t < t*    (5.21)

where t* = max_{1≤i≤m} T_i, and consequently

    P(t) ≤ { 1                              for t < t*
           { 1 − ∏_{1≤i≤m} [1 − exp(−ω_i t)]  for t ≥ t*    (5.22)
Theorem 5.15 The MTTF of a parallel system of independent IFR units has the bounds

    max_{1≤i≤m} T_i ≤ T_syst ≤ Σ_{1≤i≤m} T_i − Σ_{1≤i<j≤m} 1/(1/T_i + 1/T_j) + ⋯ + (−1)^{m−1} · 1/Σ_{1≤i≤m}(1/T_i)    (5.23)
Proof. This proof is analogous to the proof of Theorem 5.11 and so we omit
it.
Again note that the lower bound is trivial and can be instantly found for a degenerate distribution, that is, for the case when all the T_i's are constant.
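The upper bound in (5.23) is the inclusion–exclusion expression for the exact MTTF of a parallel system of exponential units with the same means, so it can be computed mechanically over all nonempty subsets. A minimal sketch (function names are our own):

```python
from itertools import combinations

def parallel_mttf_upper(mttfs):
    # Upper bound (5.23): inclusion-exclusion over all nonempty subsets S,
    # each contributing (-1)^(|S|+1) / sum_{i in S} (1/T_i).  This equals the
    # exact MTTF of a parallel system of exponential units with means T_i.
    total = 0.0
    for r in range(1, len(mttfs) + 1):
        for subset in combinations(mttfs, r):
            total += (-1.0) ** (r + 1) / sum(1.0 / t for t in subset)
    return total

def parallel_mttf_lower(mttfs):
    # Trivial lower bound: the MTTF of the best single unit.
    return max(mttfs)

upper = parallel_mttf_upper([1.0, 2.0])   # 1 + 2 - 1/(1/1 + 1/2)
lower = parallel_mttf_lower([1.0, 2.0])
```

For two units with T₁ = 1 and T₂ = 2 this gives the interval [2, 7/3] for T_syst.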
We now mention that, for a parallel system, the MTTF is larger if the unit failure distributions have larger variances. (In qualitative terms, this result is close to that obtained for dependent units.) It seems paradoxical that an unstable production of some units for a parallel structure is better than a stable production: we are used to thinking that stability is almost always better than instability. But there is no enigma at all if one notices that the random time to failure of a parallel system is the maximum of the units' random times to failure.
Theorem 5.16 A lower bound for the MTTF of a system consisting of units with an IFR distribution, for which we know the first derivatives at t = 0, is

    T_syst ≥ Σ_{1≤i≤m} T_i
             − Σ_{1≤i<j≤m} [1/(λ_i(0) + λ_j(0))] [1 − exp(−min{T_i, T_j}[λ_i(0) + λ_j(0)])]
             + ⋯ + (−1)^{m−1} [1/Σ_{1≤i≤m} λ_i(0)] [1 − exp(−min_{1≤i≤m} T_i · Σ_{1≤i≤m} λ_i(0))]
Proof. The proof here is analogous to that of Theorem 5.12. We only need to notice that the probability of a failure-free operation in this case, after all substitutions of P_i(t) by E_i*(t), has the form

    P(t) = 1 − ∏_{1≤i≤m} [1 − E_i*(t)]

where E_i*(t) is defined as in Theorem 5.12.
CONCLUSION
This relatively new branch of reliability theory was initiated by Barlow and Proschan and became widely known after their book [Barlow and Proschan (1975)] was published. The first papers on the properties of distributions with a monotone intensity function appeared in the early 1960s [Barlow, Marshall, and Proschan (1963); Barlow and Marshall (1964, 1965)].

    R(t₀) = K P*(t₀)

where P*(t₀) is the distribution of a stationary residual time:

    P*(t) = (1/T) ∫_t^∞ P(x) dx
REFERENCES
Barlow, R. E., and A. W. Marshall (1964). Bounds for distributions with monotone hazard rate. I and II. Ann. Math. Statist., vol. 35.
Barlow, R. E., and A. W. Marshall (1965). Tables of bounds for distributions with
monotone hazard rate. J. Amer. Statist. Assoc., vol. 60.
Barlow, R. E., and F. Proschan (1975). Statistical Theory of Reliability and Life Testing.
New York: Holt, Rinehart, and Winston.
Barlow, R. E., A. W. Marshall, and F. Proschan (1963) Properties of probability
distributions with monotone hazard rate. Ann. Math. Statist., vol. 34.
Gnedenko, B. V., ed. (1983). Mathematical Aspects of Reliability Theory (in Russian).
Moscow: Radio i Sviaz.
Gnedenko, B. V., Yu. K. Belyaev, and A. D. Solovyev (1969). Mathematical Methods of Reliability Theory. New York: Academic.
Solovyev, A. D., and I. A. Ushakov (1967). Some bounds on a system with aging elements (in Russian). Automat. Comput. Sci., no. 6.
Ushakov, I. A. (1966). An estimate of reliability of a system with renewal for a stationary process (in Russian). Radiotekhnika, no. 5.
Ushakov, I. A., ed. (1985). Reliability of Technical Systems: Handbook (in Russian). Moscow: Radio i Sviaz.
Ushakov, I. A., ed. (1994). Handbook of Reliability Engineering. New York: Wiley.
EXERCISES
5.1 Two units have the same mean T. One unit has a uniform d.f. and the other has an exponential d.f. Which one will deliver the larger probability of failure-free operation at moment t = T? At moment t = 2T?
5.2 Consider an Erlang d.f. of high order (e.g., n = 10). Explain (without exact proof) how λ(t) behaves.
5.3 One has to choose a system for the continuous performance of an operation during 100 hours. There are two possibilities: to choose for this purpose a system with an MTTF of 200 hours or to choose another system with an MTTF of 300 hours. Which system should be chosen?
5.4 What kind of λ(t) does the system depicted in Figure E5.3 have?

Figure E5.3.
SOLUTIONS
of one up unit is increasing in t. But a unit with an exponentially distributed TTF has a constant failure rate. So the time diagram for λ(t) has the form shown in Figure E5.2.

Figure E5.2.
* Unit 1 might be the cause of the system failure during all periods of time.
* For a large time period, the parallel connection of units 2 and 3 will, with probability close to 1, consist of only one unit; thus the entire system also will almost surely consist of two series units: unit 1 and one of the units 2 or 3.
Figure E5.4.
CHAPTER 6
REPAIRABLE SYSTEMS
it has failed. The transition graph is presented in Figure 6.1. Here state 0 denotes an operating state, and state 1 corresponds to a failed state. This graph has a simple interpretation. When in state 0, the unit might go to state 1 or stay in the current state. Leaving state 0 occurs with an intensity λ, and leaving state 1 occurs with an intensity μ.
The unit transition process can be described as an alternative renewal process. It is represented by a sequence of mutually independent r.v.'s ξ (a unit up time) and η (a unit repair time). Both ξ and η have exponential distributions, with parameters λ and μ, respectively. A sample time diagram is presented in Figure 6.2.
Using the graph of Figure 6.1, we can easily construct the following formula:

    P₀(t + Δt) = P₀(t)(1 − λΔt) + P₁(t)μΔt

This expression means that the transition process may appear in state 0 at moment t + Δt under the following conditions:

* It was there at moment t and did not leave during the interval Δt.
* At moment t it was in state 1 and moved to state 0 during the interval Δt.

The conditional probability of leaving state 0 equals λΔt, and the conditional probability of leaving state 1 equals μΔt.
    (d/dt) P₀(t) = −λP₀(t) + μP₁(t)    (6.3)
This represents the simplest example of Kolmogorov's equation. This equa-
tion expresses a condition of dynamic equilibrium. To solve it with respect to
any P k ( t ) , we need to have one more equation. It is clear that another
equation cannot be obtained in the same manner: it would be linearly
dependent on the first one and, consequently, would not be useful in
obtaining the solution. The second equation which should be chosen is the
so-called normalization equation:
    P₀(t) + P₁(t) = 1    (6.4)
which means that at any moment the unit must be in one of two possible
states.
We also need to determine the initial condition for the solution of the system of differential equations. In this simple case the problem can easily be solved in general when P₀(t₀) = p and, consequently, P₁(t₀) = q, p + q = 1. This problem can be solved with the help of different methods. We will use the Laplace–Stieltjes transform (LST) to make the presentation in the book uniform.
Recall that the LST φ(s) of the function f(t) is defined as

    φ(s) = ∫₀^∞ f(t) e^{−st} dt    (6.5)

(In this context we consider functions f(t) defined over the positive axis.)
Nonstationary Availability Coefficient The system (6.2) and (6.4) for this case has the LST

    (λ + s)φ₀(s) − μφ₁(s) = p
    φ₀(s) + φ₁(s) = 1/s    (6.7)
SINGLE UNIT 239
Thanks to the LST, the system of linear differential equations turns into a system of algebraic equations. To solve (6.7), we can use Cramer's rule:

    φ₀(s) = (ps + μ)/(s² + (λ + μ)s)    (6.8)
To invert this LST, we have to present it in the form of a sum of terms of the type a/s or b/(s + α). The inverse functions for these terms are a constant and an exponential function, respectively.
To present the solution (6.8) in the desired form, we should find the roots of the denominator of (6.8). They are s₁ = 0 and s₂ = −(λ + μ). Now we can write

    φ₀(s) = A/(s − s₁) + B/(s − s₂) = A/s + B/(s + λ + μ)    (6.9)
It is easy to find

    A = μ/(λ + μ),  B = p − μ/(λ + μ) = [λp − μ(1 − p)]/(λ + μ)    (6.12)
Finally, the nonstationary availability coefficient, that is, the inverse LST of (6.10), is

    K(t) = P₀(t) = μ/(λ + μ) + {[λp − μ(1 − p)]/(λ + μ)} e^{−(λ+μ)t}    (6.14)

If the original system state is operational, that is, if P₀(0) = 1, the solution is

    K(t) = μ/(λ + μ) + [λ/(λ + μ)] e^{−(λ+μ)t}    (6.15)

and the stationary availability coefficient is

    K = lim_{t→∞} K(t) = μ/(λ + μ) = T/(T + τ)    (6.16)

where T = 1/λ is the unit's MTTF and τ = 1/μ is the unit's mean time to repair (MTR).
We should note that, in general, such a method of obtaining the stationary availability coefficient is not justified in a computational sense. For this purpose, one should write a system of linear algebraic equations directly, without the use of the system of differential equations. It is important to realize that the stationary regime represents static equilibrium. This means that all derivatives dP_k(t)/dt are equal to 0 because no states are changing in
Figure 6.3. Time dependence of the nonstationary availability coefficient K(t) for exponential distributions of TTF and repair time.
time, "on the average." Consequently, all P_k's must be constant. It is also clear that the initial conditions (the original state of the unit at moment t = 0) also will not make any sense. This assumption leads directly to the following system of algebraic equations:

    −λP₀ + μP₁ = 0
    P₀ + P₁ = 1    (6.17)
where

    P_k = lim_{t→∞} P_k(t)    (6.18)

are the stationary probabilities of interest.
Again, the solution can be obtained with Cramer's rule:

    P₀ = K = μ/(λ + μ) = T/(T + τ)    (6.19)
Figure 6.4. Transition graph with an absorbing state for computation of the MTTF of a renewable unit with an exponentially distributed TTF.
equation

    (d/dt) P₀(t) = −λP₀(t)

This differential equation can again be solved with the help of the LST. First, we write the following algebraic equation with the natural initial condition P₀(0) = 1:

    −1 + sφ₀(s) = −λφ₀(s)    (6.21)

whence

    φ₀(s) = 1/(s + λ)    (6.22)
Mean Time to Failure To find the unit's MTTF, we should recall that the mean of a nonnegative r.v. can be found as

    E{X} = ∫₀^∞ P(x) dx    (6.23)

On the other hand,

    E{X} = ∫₀^∞ P(x) e^{−sx} dx |_{s=0}

so in our case

    T = 1/(λ + s)|_{s=0} = 1/λ
It follows that, to find the MTTF, we can use the solution for P₀(t) in terms of the LST and substitute s = 0. In fact, it is sometimes even simpler to solve the corresponding system of equations directly with the substitution s = 0.
In other words, (6.27) means that either no failures have occurred or—if failures have occurred—the last cycle δ is completed by moment x, 0 < x < t, and a new r.v. ξ is larger than the remaining time, ξ > t − x. The function H(t) in this case is

    H(t) = Σ_{k≥1} F^{(k)}(t)

where F^{(k)}(t) denotes the k-fold convolution of the cycle d.f.
Figure 6.5. Time diagram for an alternative renewal process describing a unit operation.
    K_n = Σ_{1≤i≤n} ξ_i / (Σ_{1≤i≤n} ξ_i + Σ_{1≤i≤n} η_i)

and, if n is large, one may divide the numerator and denominator by n and replace each normalized sum with the mean of the respective r.v.:

    K = E{ξ}/(E{ξ} + E{η})    (6.29)
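Formula (6.29) can be checked by simulating the alternative renewal process directly. In the sketch below (our own illustration; exponential up and repair times are used purely for convenience), the fraction of time spent in the up state converges to E{ξ}/(E{ξ} + E{η}):

```python
import random

def simulate_availability(mean_up, mean_repair, horizon, seed=7):
    # Simulate an alternative renewal process (up time, repair time, up
    # time, ...) and return the fraction of [0, horizon] spent up.
    rng = random.Random(seed)
    t = 0.0
    up_total = 0.0
    while t < horizon:
        up = rng.expovariate(1.0 / mean_up)
        up_total += min(up, horizon - t)
        t += up
        if t >= horizon:
            break
        t += rng.expovariate(1.0 / mean_repair)
    return up_total / horizon

K_hat = simulate_availability(mean_up=10.0, mean_repair=1.0, horizon=2e5)
K_theory = 10.0 / (10.0 + 1.0)   # (6.29)
```

With a horizon covering roughly 18,000 cycles, the simulated value agrees with (6.29) to within a couple of percent.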
Nonstationary Interval Availability Coefficient Again, we can write the corresponding integral equation, where ξ̃ is the residual time of the renewal process formed with the r.v.'s {ξ}.
From (6.32) it becomes clear that R(t₀) differs from R_wrong(t₀) = K·P(t₀). In engineering practice, nevertheless, R_wrong(t₀) is often erroneously used. We should emphasize that ξ and its residual time ξ̃ are statistically equivalent only for an exponentially distributed r.v. Consequently, in this case (and only in this case!),

    R(t₀) = K P(t₀)
For a highly reliable unit, (6.32) can be written in the convenient form of the two-sided bounds (6.33) if F(t) is "aging." For this purpose we use a result from Chapter 5 concerning ξ̃, the residual value of the renewal process formed with {ξ}.

For a highly reliable unit, we can write a very simple and very convenient approximation

    P̃(t₀) ≈ 1 − t₀/T    (6.34)
REPAIRABLE SERIES SYSTEM 251

    F(t) = 1 − e^{−λt}

with G(t) denoting the repair time d.f.
1. After a system failure, a failed unit is shipped to a repair shop and all
of the remaining system units are switched off. In other words, the
system failure rate equals 0 during repair. In this case only one repair
facility is required and there is no queue for repair.
2. After a system failure, a failed unit is shipped to a repair shop but all
the remaining system units are kept in an operational state. Each of
them can fail during the current repair of the previously failed unit (or
units). In this case several repair facilities might be required. If the
number of repair facilities is smaller than the number of system units, a
queue might form at the repair shop.
System with a Switch-Off During Repair The transition diagram for this system is presented in Figure 6.6. We will not write the equations for this case; as much as possible, we will try to use simple verbal explanations.
where

    Λ = Σ_{1≤i≤n} λ_i
2. MTTF
Let us consider a general case where all units differ by their repair time 1/μ_i. The current repair time of the system depends on which unit has failed. The distribution of the system's repair time can be represented in the form

    Pr{η ≤ t} = Σ_{1≤k≤n} p_k (1 − e^{−μ_k t})    (6.36)

where p_k is the probability that the kth unit is under repair. The probability p_k can be easily found as

    p_k = λ_k / Σ_{1≤i≤n} λ_i    (6.37)
Notice that the distribution of the system's repair time has a decreasing
intensity function; that is, with the growth of the current repair time, the
residual repair time becomes larger and larger.
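Formulas (6.36)–(6.38) are easy to exercise with concrete numbers. In the sketch below (the rates are made-up illustrative values, not from the text), the weights p_k, the mean system repair time, and the stationary availability are computed two ways and shown to agree:

```python
lams = [0.001, 0.002, 0.004]   # illustrative unit failure rates
mus = [1.0, 0.5, 0.25]         # illustrative unit repair rates
Lam = sum(lams)                # system failure rate

p = [l / Lam for l in lams]    # (6.37): probability the k-th unit is in repair
tau_syst = sum(pk / mk for pk, mk in zip(p, mus))   # mean system repair time

# Stationary availability via (6.38) and via T_syst / (T_syst + tau_syst):
K_638 = 1.0 / (1.0 + sum(l / m for l, m in zip(lams, mus)))
T_syst = 1.0 / Lam
K_ratio = T_syst / (T_syst + tau_syst)
```

The agreement of `K_638` and `K_ratio` reflects the identity Σλ_i/μ_i = Λ·τ_syst.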
The mean repair time of the system is

    τ_syst = Σ_{1≤k≤n} p_k/μ_k = E{1/μ}

Now, as t → ∞,

    lim K(t) = T_syst/(T_syst + τ_syst)

where

    T_syst = E{ξ_syst} = 1/Σ_{1≤i≤n} λ_i
We are able to find these reliability indexes only with the help of general
methods of renewal process theory, in spite of the exponentiality of a TTF
distribution. One can also use standard Markov methods applied to the
and the initial condition P₀(0) = 1. We will not solve these equations here. But if K(t) is found, then—because of the exponentiality of the TTF distribution—R(t, t₀) = K(t) e^{−Λt₀}.
With known T_syst and τ_syst, this index can be found in a standard way as K = T_syst/(T_syst + τ_syst). Note that in this particular case it is convenient to write

    K = 1/(1 + Σ_{1≤i≤n} λ_i τ_i)    (6.38)
Because of the exponential distribution of the system TTF, we can use the expression R(t₀) = K·P(t₀), where P(t₀) is defined in (6.35).

Notice that if all μ_i are constant (equal to μ), the above-described system is transformed into a single repairable unit with an intensity of failure equal to

    Λ = Σ_{1≤i≤n} λ_i
and an intensity of repair μ. In most practical cases it is enough, at the first stages of design, to put μ = E{μ} in this model and to use this approximation instead of the exact model. We remark that in most practical cases, when the equipment has a modular construction, the mean time of repair might be considered almost equal for different system units. But an even more important argument for such a suggestion is that a mathematical model must not be too accurate at the design stage, when one does not have accurate input data.
    P_syst(t₀) = ∏_{1≤i≤n} P_i(t₀) = e^{−Λt₀}

    T_syst = 1/Λ    (6.39)

    K_syst(t) = ∏_{1≤i≤n} K_i(t)

    K = ∏_{1≤i≤n} K_i = ∏_{1≤i≤n} T_i/(T_i + τ_i) = ∏_{1≤i≤n} 1/(1 + τ_i/T_i)
In this case it is not a simple task to find τ_syst in a direct way. But if we use the direct definition of K_syst,

    K_syst = T_syst/(T_syst + τ_syst)

then

    τ_syst = [(1 − K)/K] T_syst    (6.40)
System With Switch-Off During Repair First of all, P_syst(t₀) and T_syst remain the same as in the previous case. The mean repair time is defined with the help of (6.36). Consequently, the stationary availability and interval availability coefficients can be expressed in standard form. At the same time, nonstationary indexes can be found with the help of the general methods of renewal process theory. The model of the investigated operation process forms an alternative process {ξ*, η*}. Each ξ* is an exponential r.v. with parameter Λ, and η* is an r.v. with a complex "weighted" d.f.

    G*(t) = Pr{η* ≤ t} = (1/Λ) Σ_{1≤i≤n} λ_i G_i(t)
System Without Switch-Off During Repair In this case P_syst(t₀) and T_syst remain the same as in the previous cases. Even such a stationary index as K can be found only for the case when the number of repair facilities equals the number of system units, that is, when all system units are totally independent. In this case K is defined as

    K = ∏_{1≤i≤n} 1/(1 + λ_i τ_i)
If the system units are dependent through the lack of repair facilities, we recommend the use of Monte Carlo simulation for the computation of nonstationary indexes.

But if K or K(t) is known, finding R(t₀) or R(t, t₀) is a simple task because of the exponentiality of the system TTF: R(t₀) = K·P(t₀) and R(t, t₀) = K(t)·P(t₀).
Now it is possible to write R(t₀) as

    R(t₀) = ∏_{1≤i≤n} [1/(1 + τ_i/T_i)] exp(−t₀ Σ_{1≤i≤n} 1/T_i)

Also, we can again use the two-sided bounds (6.33) if the unit TTF distributions are "aging":

    ∏_{1≤i≤n} [1/(1 + τ_i/T_i)] exp(−t₀ Σ_{1≤i≤n} 1/T_i) ≤ R(t₀) ≤ ∏_{1≤i≤n} 1/(1 + τ_i/T_i)    (6.41)
    R_syst(t₀) ≈ 1 − Σ_{1≤i≤n} (τ_i + t₀)/T_i    (6.42)
Considering the transition graph without an absorbing state, for a state H_j, 0 ≤ j ≤ n + k, n = n₁ + n₂ + n₃, one may write Λ_j and M_j:

    Λ₀ = kλ + n₁λ + n₂νλ
    Λ₁ = kλ + n₁λ + n₂νλ = Λ₀
    ⋯ = Λ₀
    Λ_{n₃+1} = kλ + n₁λ + (n₂ − 1)νλ
    Λ_{n₃+2} = kλ + n₁λ + (n₂ − 2)νλ
    ⋯
    Λ_{n₃+n₂} = kλ + n₁λ
    Λ_{n₃+n₂+1} = kλ + (n₁ − 1)λ
    ⋯
    Λ_{n₃+n₂+n₁} = kλ
    Λ_{n₁+n₂+n₃+1} = (k − 1)λ
    Λ_{n₁+n₂+n₃+2} = (k − 2)λ
    ⋯
    Λ_{n₁+n₂+n₃+k−1} = λ

    M₁ = μ, M₂ = 2μ, …, M_l = lμ, M_{l+1} = lμ, …, M_{n+k} = lμ
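A minimal sketch of how these intensity lists can be generated programmatically (function names and parameter ordering are our own; it assumes, as in the lists above, that unloaded spares are consumed first, then partially loaded units with factor ν, then loaded redundant units, and finally the k operating units, with l repair facilities):

```python
def failure_intensities(k, n1, n2, n3, lam, nu):
    # Lambda_j for j = 0 .. n1+n2+n3+k-1, where j counts failed units.
    # Assumed depletion order: n3 unloaded spares (intensity unchanged),
    # then n2 partially loaded (factor nu), then n1 loaded, then the
    # k operating units.
    base = k * lam + n1 * lam + n2 * nu * lam
    out = []
    for j in range(n1 + n2 + n3 + k):
        if j <= n3:
            out.append(base)
        elif j <= n3 + n2:
            out.append(k * lam + n1 * lam + (n2 - (j - n3)) * nu * lam)
        elif j <= n3 + n2 + n1:
            out.append(k * lam + (n1 - (j - n3 - n2)) * lam)
        else:
            out.append((k - (j - n3 - n2 - n1)) * lam)
    return out

def repair_intensities(k, n1, n2, n3, mu, l):
    # M_j = j*mu while j <= l, and l*mu afterwards (l repair facilities).
    return [min(j, l) * mu for j in range(1, n1 + n2 + n3 + k + 1)]

lams = failure_intensities(k=2, n1=1, n2=1, n3=1, lam=1.0, nu=0.5)
mus = repair_intensities(k=2, n1=1, n2=1, n3=1, mu=1.0, l=2)
```

These two lists are exactly the inputs needed for the birth and death equations (6.43) and their reflecting-state counterpart.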
The system with the absorbing H_{n+1} state is the system which operates until the first failure. This system can be analyzed with the following system of differential equations:

    (d/dt) p_j(t) = Λ_{j−1} p_{j−1}(t) − (Λ_j + M_j) p_j(t) + M_{j+1} p_{j+1}(t),  0 ≤ j ≤ n + 1    (6.43)

    Λ_{−1} = Λ_{n+1} = M₀ = M_{n+1} = 0    (6.44)

where p_j(t) is the probability that the system is in state H_j at moment t. The normalization condition is

    Σ_{0≤j≤n+1} p_j(t) = 1
GENERAL MARKOV MODEL OF REPAIRABLE SYSTEMS 263
The system with the reflecting H_{n+k} state can be described by the following system of differential equations:

    (d/dt) p_j(t) = Λ_{j−1} p_{j−1}(t) − (Λ_j + M_j) p_j(t) + M_{j+1} p_{j+1}(t),  0 ≤ j ≤ n + k

    Λ_{−1} = Λ_{n+k} = M₀ = M_{n+k+1} = 0

    Σ_{0≤j≤n+k} p_j(t) = 1
Because our goal is not to write down formulas for very general models but to show the methodology and methods, we hope that the reader can use the corresponding equations from Section 1.6 dedicated to the birth and death process.
Precise formulas for such a general case are almost always long and
complicated. If one deals with highly reliable systems, we recommend the
reader refer to Chapter 12. (If one deals with an unreliable system, we
recommend a redesign of the system, not a useless calculation!)
The next section is devoted to general methods of analysis of repairable
systems.
where s_i is the state of the ith unit. We set s_i = 1 if the unit is operational and s_i = 0 otherwise. The transition from (s₁, …, s_i = 1, …, s_n) to (s₁, …, s_i = 0, …, s_n) means that the ith unit changes its state from up to down. The transition rate (or the transition intensity) for this case equals the ith unit's failure rate.

A transition from system state (s₁, …, s_i = 0, …, s_n) to state (s₁, …, s_i = 1, …, s_n) means that the ith unit was in a failed state and was renewed. The transition rate for this case equals the ith unit's repair rate. These kinds
of transitions are most common. For Markovian models we assume that only
one unit may fail (or be renewed) at a time. (If several units may change
states simultaneously, for example, under a group repair, we will consider
this separately.) Of course, there are other possible interpretations of states
and transitions. For instance, s_i = 1 may be a state before monitoring or switching, and s_i = 0 is the same state after the procedure. We denote these transitions from state to state on transition graphs with arrows. The rates
(intensities) are denoted as weights on the arrows. The graph structure is
determined by the operational and maintenance regime of the system's units
and the system itself. After the transition graph has been constructed, it can
be used as a visual aid to determine different reliability indexes. An example
of such a transition graph for a system consisting of three different units is
presented in Figure 6.9.
    (d/dt) p_k(t) = −p_k(t) Σ_{i∈E(k)} Λ_{ki} + Σ_{i∈E(k)} Λ_{ik} p_i(t)    (6.45)

where Λ_{ik} is the intensity of the transition from state i to state k, and p_i(t) is the probability that the system is in state H_i at moment t.
The transition graph and the system of differential equations can be interpreted as those which describe a dynamic equilibrium. Indeed, imagine that each node i is a "basin" with "liquid" which flows to each other node j (if there is an arrow in the corresponding direction). The intensity of the flow is proportional to Λ_{ij} (specified in each direction) and to the current amount of "liquid" in the source, p_i(t).
If there are n states, we can construct n differential equations. To find the
nonstationary coefficient of availability, we take any n — 1 equations, add the
Figure 6.9. Transition graph for a system consisting of three different renewable
units.
normalization condition
    Σ_i p_i(t) = 1    (6.46)

and add initial conditions of the type p_i(0) = p_i, where p_i(0) is the probability that the system is in state i at t = 0. In turn, the p_i's are probabilities that conform to a normalization condition similar to (6.46). If p_i = 1 for some i, then p_j = 0 for all j, j ≠ i. In most problems the initial system state is the state when all units are up.
To find the nonstationary availability coefficient, we can use the
Laplace-Stieltjes transform (LST). Then the system of n linear differential
equations turns into a system of linear algebraic equations with the normalization condition

    s Σ_i φ_i(s) = 1    (6.47)

where

    φ_i(s) = ∫₀^∞ p_i(t) e^{−st} dt    (6.48)

In general form the system is

    b₁₁φ₁(s) + b₁₂φ₂(s) + ⋯ + b₁ₙφₙ(s) = c₁
    b₂₁φ₁(s) + b₂₂φ₂(s) + ⋯ + b₂ₙφₙ(s) = c₂
    ⋯    (6.50)

where b_ij is the coefficient of the jth term of the ith row and c_i is the corresponding constant.
To solve this system of linear equations, we can apply Cramer's rule:

    φ_i(s) = D_i(s)/D(s)    (6.51)

Each φ_i(s) is a ratio of two polynomials in s:

    φ(s) = (A₀ + A₁s + A₂s² + ⋯ + Aₙsⁿ)/(B₀ + B₁s + B₂s² + ⋯ + B_{n+1}s^{n+1})    (6.53)

where the A_i and B_j are known coefficients.
2. Find the roots of the polynomial

    B₀ + B₁s + B₂s² + ⋯ + B_{n+1}s^{n+1} = 0

3. Present φ(s) as a sum of simple fractions:

    φ(s) = β₁/(s − b₁) + β₂/(s − b₂) + ⋯ + β_{n+1}/(s − b_{n+1})    (6.54)

where the β_i's are coefficients to be found and the b_i's are the roots obtained at the previous step.

4. Apply the inverse LST to each simple fraction to obtain

    p(t) = Σ_{1≤j≤n+1} β_j e^{b_j t}

In particular, the availability coefficient K(t) is obtained as the sum of such expressions over the up states.
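For simple roots, steps 2–4 reduce to the residue formula β_j = N(b_j)/D′(b_j). The following sketch (our own illustration; function names are ours) applies it to the single-unit transform (6.8) with P₀(0) = 1 and compares the result with the closed form (6.15):

```python
import math

def invert_simple_lst(num_coeffs, den_roots, t):
    # Invert N(s) / D(s), with D(s) = prod_j (s - b_j) having simple roots:
    # p(t) = sum_j beta_j * exp(b_j * t),  beta_j = N(b_j) / D'(b_j),
    # where D'(b_j) = prod_{k != j} (b_j - b_k).
    def N(s):
        return sum(c * s ** i for i, c in enumerate(num_coeffs))
    total = 0.0
    for j, b in enumerate(den_roots):
        dprime = 1.0
        for k, other in enumerate(den_roots):
            if k != j:
                dprime *= b - other
        total += (N(b) / dprime) * math.exp(b * t)
    return total

lam, mu = 0.5, 2.0
# phi0(s) = (mu + s) / (s * (s + lam + mu)), i.e. (6.8) with p = 1.
K_inverted = lambda t: invert_simple_lst([mu, 1.0], [0.0, -(lam + mu)], t)
K_closed = lambda t: mu / (lam + mu) + (lam / (lam + mu)) * math.exp(-(lam + mu) * t)
```

The residues come out as β₁ = μ/(λ + μ) and β₂ = λ/(λ + μ), exactly the constants A and B found in (6.12).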
REMARK. If φ(s) has multiple roots of the denominator, that is, if several b_i's are equal, then (6.54) may be rewritten as

    φ(s) = Σ_{1≤i≤n'} β_i/(s − b_i)^{k_i}

where k_i is the multiplicity of the root b_i and n' is the number of different roots. To each term of the form

    β_i/(s − b_i)^k

corresponds the inverse transform

    [β_i t^{k−1}/(k − 1)!] e^{b_i t}
    (d/dt) p_k(t) = −p_k(t) Σ_{i∈E(k)} Λ_{ki} + Σ_{i∈E(k)} Λ_{ik} p_i(t)

We again use the LST to obtain the following system of linear algebraic equations:

    sφ_k(s) − p_k(0) = −φ_k(s) Σ_{i∈E(k)} Λ_{ki} + Σ_{i∈E(k)} Λ_{ik} φ_i(s)    (6.55)

for all k ∈ E. The solution of this system of equations can be found with the help of the same methodology as before.
    T = ∫₀^∞ P(t) dt

If φ(s) is the LST of the probability of failure-free operation of the system, then

    T = ∫₀^∞ e^{−st} P(t) dt |_{s=0} = φ(0)

Thus, we can find the MTTF (or MTBF) by solving the following system:
for all k ∈ E. Note once more that this system is derived from (6.55) by the substitution s = 0. To find the MTTF, one sets the initial conditions as p_i(0) = 1, where i is the subscript of a state in which the system is totally operable. Obviously, p_j(0) = 0 for all the other states. To find the MTBF, we set the initial conditions in the form p_i*(0) = p_i*, where the p_i*'s in this case are the conditional stationary probabilities of the states i that belong to E*. The latter is the subset of the up states which the process visits first after a system renewal.

The conditional stationary probabilities p_i* can be obtained from the unconditional ones as

    p_i* = P_i / Σ_{j∈E*} P_j
Example 6.1 Consider a repairable system of two different units in parallel (Figure 6.12). The parameters of the units are λ₁, λ₂, μ₁, and μ₂. Both units can be repaired independently. The transition graph is presented in Figure 6.13. Here H₀ is the state with both units operational; H₁ (H₂) is the state where the first (second) unit has failed; H₁₂ is the state where both units have failed.

Let p_k(t) equal the probability of the kth state at moment t. There are two systems of equations to calculate the reliability indexes. If the system's failure state H₁₂ is reflecting, the system of equations is

    (d/dt) p₀(t) = −(λ₁ + λ₂)p₀(t) + μ₁p₁(t) + μ₂p₂(t)
    (d/dt) p₁(t) = λ₁p₀(t) − (λ₂ + μ₁)p₁(t) + μ₂p₁₂(t)
    (d/dt) p₂(t) = λ₂p₀(t) − (λ₁ + μ₂)p₂(t) + μ₁p₁₂(t)
    p₀(t) + p₁(t) + p₂(t) + p₁₂(t) = 1
If the system's failure state H₁₂ is absorbing, the system of equations is

    (d/dt) p₀(t) = −(λ₁ + λ₂)p₀(t) + μ₁p₁(t) + μ₂p₂(t)
    (d/dt) p₁(t) = λ₁p₀(t) − (λ₂ + μ₁)p₁(t)
    (d/dt) p₂(t) = λ₂p₀(t) − (λ₁ + μ₂)p₂(t)
    (d/dt) p₁₂(t) = λ₂p₁(t) + λ₁p₂(t)
With the help of the LST these systems are reduced to algebraic ones, and the solutions are obtained by Cramer's rule as ratios of determinants. For the absorbing state H₁₂, with the initial conditions p₀(0) = 1, p₁(0) = p₂(0) = 0, setting s = 0 gives the MTTF T as a ratio of determinants built on the coefficient matrix

    | −(λ₁ + λ₂)      μ₁            μ₂       |
    |     λ₁      −(λ₂ + μ₁)        0        |
    |     λ₂          0        −(λ₁ + μ₂)    |

with the column of initial conditions (p₀(0), p₁(0), p₂(0)) substituted for the corresponding column. For the reflecting state H₁₂ the stationary probabilities are ratios of 4 × 4 determinants of the form

    | 1              1              1           1    |
    | −(λ₁ + λ₂)     μ₁             μ₂          0    |
    |     λ₁     −(λ₂ + μ₁)         0           μ₂   |
    |     λ₂         0         −(λ₁ + μ₂)       μ₁   |

where the first row corresponds to the normalization condition.
TIME REDUNDANCY 275
The solutions are not presented in closed form because of their length and
complexity.
starts from the beginning but for a smaller time interval. This verbal explanation leads us to the recurrent expression

    R₀(θ|τ) = P(θ) + ∫₀^θ R₀(θ|τ − x) dF(x)    (6.56)

If the remaining interval is smaller than θ, the operation cannot be performed successfully. This leads to the condition

    R₀(θ|τ < θ) = 0

    R₀(θ|τ) = P(θ) + ∫₀^θ R₀(θ|τ − x) dF(x)    (6.57)

where the function R₀ under the integral must be taken from (6.56) with the corresponding condition.
Of course, in this case we must again write the condition

    R(t₀|τ < t₀) = 0

which means that a system cannot successfully perform its operation if the time resource is smaller than the required time of operation.
    R(t₀|τ) = P(t₀) + ∫₀^{t₀} [∫₀^{τ−x} R(t₀|τ − x − y) dG(y)] dF(x)    (6.58)

This expression is correct for the case where a system starts to perform at t = 0.
    Q(ν|τ) = P(τ) + ∫₀^τ [∫₀^ν Q(ν|τ − x − y) dG(y)] dF(x)

with the condition

    Q(ν|x ≤ ν) = 1
CONCLUSION
REFERENCES
Ascher, H., and H. Feingold (1984). Repairable Systems Reliability: Modeling, Inference, Misconceptions and Their Causes. New York: Marcel Dekker.
Cherkesov, G. N. (1974). Reliability of Technical Systems with Time Redundancy. Moscow: Sovietskoe Radio.
Gertsbakh, I. B. (1984). Asymptotic methods in reliability theory: a review. Adv. in
Appl. Probab., vol 16.
Gnedenko, B. V., Yu. K. Belyaev, and A. D. Solovyev (1969). Mathematical Methods
of Reliability Theory. New York: Academic.
Gnedenko, D. B., and A. D. Solovyev (1974). A general model for standby with
renewal. Engrg. Cybernet. (USA), vol. 12, no. 6.
Gnedenko, D. B., and A. D. Solovyev (1975). Estimation of the reliability of complex
renewable systems. Engrg. Cybernet. (USA), vol. 13, no. 3.
Kredentser, B. P. (1978). Prediction of Reliability of Systems with Time Redundancy (in
Russian). Kiev: Naukova Dumka.
Rudenko, Yu. N., and I. A. Ushakov (1989). Reliability of Energy Systems (in Russian).
Novosibirsk: Nauka.
Solovyev, A. D. (1972). Asymptotic distribution of the moment of first crossing of a
high level by birth and death process. Proc. Sixth Berkeley Symp. Math. Statist.
Probab., Issue 3.
Ushakov, I. A., ed. (1985). Reliability of Technical Systems: Handbook (in Russian). Moscow: Radio i Sviaz.
Ushakov, I. A., ed. (1994). Handbook of Reliability Engineering. New York: Wiley.
EXERCISES
SOLUTIONS
6.1 The stationary availability coefficient depends only on the mean and not
on the type of distribution of the TTF and repair time. Thus, K =
100/(100 + 0.5) ≈ 0.995. If the system is found within a failure-free
interval, which is exponentially distributed, then the probability of
successful operation of length t₀ beginning at an arbitrary moment of
time can be written as

lim_{t→∞} P(t, t + t₀) = e^{−t₀/T} = exp(−0.5/100) ≈ 0.995

Figure E6.1. Transition graph for the system described in Exercise 6.2.
The transition graphs describing these models are presented in Figure 7.1
(there are two of them: with and without an absorbing state). Corresponding
particular cases for different regimes of redundant units and different
attributes of the repair shop are reflected in Figure 7.2.
Figure 7.2. Transition graphs for four main models of a renewable duplicated system:
(a) a loaded redundant unit, two repair facilities; (b) a loaded redundant unit, one
repair facility; (c) an unloaded redundant unit (spare unit), two repair facilities;
(d) an unloaded redundant unit, one repair facility.
Solving this system of equations by Cramer's rule yields

φ₂(s) = λ₀λ₁ / (s[s² + s(λ₀ + λ₁ + μ₁ + μ₂) + λ₀λ₁ + λ₀μ₂ + μ₁μ₂])
2. Find the roots of the denominator. The first two roots of the denominator
are conjugate, that is,

s₁,₂ = ½(−a ± √(a² − 4β))

where, in turn,

a = λ₀ + λ₁ + μ₁ + μ₂
β = λ₀λ₁ + λ₀μ₂ + μ₁μ₂

and s₃ = 0.
3. Find the unknown values A, B, and C by equating the polynomial
coefficients of the numerators.

4. Apply the inverse LST to the simple fractions with the numerators
A, B, and C found above to obtain the final result.
This procedure gives

K(t) = 1 − (λ₀λ₁/(s₁s₂))[1 + (s₂e^{s₁t} − s₁e^{s₂t})/(s₁ − s₂)]   (7.3)

In particular, for two independent units (λ₀ = 2λ, λ₁ = λ, μ₁ = μ, μ₂ = 2μ)
this reduces to

K(t) = 1 − (1 − K*(t))² = 1 − [(λ/(λ + μ))(1 − e^{−(λ+μ)t})]²

where K*(t) is the nonstationary availability coefficient of a single unit.
The stationary availability coefficient can be found directly by solving the
stationary state equations. The result is

K = 1 − P₂ = 1 − λ₀λ₁/(λ₀λ₁ + λ₀μ₂ + μ₁μ₂)   (7.5)
To obtain values of this reliability index for different cases, the specific λ's
and μ's should be substituted. The results for the four most important cases
depicted in Figure 7.2 are presented in Table 7.1. In this table we used the
notation γ = λ/μ. For highly reliable systems with γ ≪ 1, all of the expressions
in Table 7.1 can be easily transformed to obtain the approximations
given in Table 7.2. The expressions in Table 7.2 allow one to give an
understandable explanation of all of the effects.

[Tables 7.1 and 7.2: the stationary availability coefficient and its approximations
for the four cases. (A) Loaded redundant unit; (B) unloaded redundant unit;
(a) two repair facilities; (b) one repair facility.]

Naturally, the worst value of K gives the case of active redundancy and
restricted repair (the failure rate λ₀ is the largest and the repair rate μ₂ is the
smallest). The case of unrestricted repair yields a mean repair time of less than
one-half of the restricted repair time (1/(2μ) and 1/μ, respectively). Below in
this section we will show that the MTTF of highly reliable systems of active
redundancy is one-half of the MTBF for standby redundancy.
Of course, for two independent units, that is, when the redundancy is
active and the repair is unrestricted, we can write

K = lim_{t→∞} K(t) = 1 − (1 − K*)² = 1 − [λ/(λ + μ)]²

where K* = μ/(λ + μ) is the stationary availability coefficient of a single
unit. The same result follows from (7.5) after substituting λ₀ = 2λ, λ₁ = λ,
μ₁ = μ, and μ₂ = 2μ.
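As a quick numerical illustration of (7.5), the stationary availability can be evaluated for the four cases of Figure 7.2. The rate substitutions used below (λ₀ = 2λ or λ, μ₂ = 2μ or μ) are our reading of the four standard cases, and the values of λ and μ are purely illustrative:

```python
# Stationary availability of a repairable duplicated system, formula (7.5):
#   K = 1 - lam0*lam1 / (lam0*lam1 + lam0*mu2 + mu1*mu2)
# The four rate substitutions below are illustrative readings of the cases
# in Figure 7.2 (loaded/unloaded redundant unit, two/one repair facilities).

def stationary_availability(lam0, lam1, mu1, mu2):
    denom = lam0 * lam1 + lam0 * mu2 + mu1 * mu2
    return 1.0 - lam0 * lam1 / denom

lam, mu = 0.001, 0.5  # unit failure and repair rates (illustrative)

cases = {
    "loaded, two facilities":   (2 * lam, lam, mu, 2 * mu),
    "loaded, one facility":     (2 * lam, lam, mu, mu),
    "unloaded, two facilities": (lam, lam, mu, 2 * mu),
    "unloaded, one facility":   (lam, lam, mu, mu),
}
for name, rates in cases.items():
    print(name, stationary_availability(*rates))

# For two independent units (loaded redundancy, unrestricted repair) the
# result must coincide with K = 1 - (lam/(lam + mu))**2:
K_indep = stationary_availability(2 * lam, lam, mu, 2 * mu)
assert abs(K_indep - (1 - (lam / (lam + mu)) ** 2)) < 1e-12
```

The worst of the four values is indeed obtained for the loaded redundant unit with one repair facility, in line with the discussion above.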
d/dt P₀(t) = −λ₀P₀(t) + μ₁P₁(t)
d/dt P₁(t) = λ₀P₀(t) − (λ₁ + μ₁)P₁(t)   (7.6)
P₀(0) = 1

The LST of (7.6) is

(λ₀ + s)φ₀(s) − μ₁φ₁(s) = 1
−λ₀φ₀(s) + (λ₁ + μ₁ + s)φ₁(s) = 0   (7.7)
Solving (7.7) by Cramer's rule, we obtain

φ₀(s) = (s + λ₁ + μ₁)/(s² + s(λ₀ + λ₁ + μ₁) + λ₀λ₁)

φ₁(s) = λ₀/(s² + s(λ₀ + λ₁ + μ₁) + λ₀λ₁)

so that

φ₀(s) + φ₁(s) = (s + λ₀ + λ₁ + μ₁)/(s² + s(λ₀ + λ₁ + μ₁) + λ₀λ₁)   (7.8)
The inverse LST of (7.8) gives the probability of a failure-free operation

P⁽⁰⁾(t) = (1/(s₁* − s₂*))(s₁*e^{s₂*t} − s₂*e^{s₁*t})   (7.9)

where the superscript (0) stands for the initial condition P₀(0) = 1, the
roots are

s₁,₂* = ½[−a* ± √((a*)² − 4β*)]

and

a* = λ₀ + λ₁ + μ₁
β* = λ₀λ₁
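Equation (7.9) can be checked numerically: P⁽⁰⁾(0) must equal 1, and the integral of P⁽⁰⁾(t) over (0, ∞) must equal the MTTF (λ₀ + λ₁ + μ₁)/(λ₀λ₁). A minimal sketch with illustrative rates and an ad hoc integration grid:

```python
import math

# Probability of failure-free operation of the duplicated system, eq. (7.9):
#   P(t) = (s1*exp(s2*t) - s2*exp(s1*t)) / (s1 - s2),
# s1, s2 being the roots of s^2 + a*s + b, a = lam0+lam1+mu1, b = lam0*lam1.

lam0, lam1, mu1 = 0.002, 0.001, 0.5  # illustrative rates

a, b = lam0 + lam1 + mu1, lam0 * lam1
d = math.sqrt(a * a - 4 * b)
s1, s2 = (-a + d) / 2, (-a - d) / 2

def P(t):
    return (s1 * math.exp(s2 * t) - s2 * math.exp(s1 * t)) / (s1 - s2)

assert abs(P(0.0) - 1.0) < 1e-12            # system starts operational

# Trapezoidal integral of P over a long horizon approximates the MTTF
mttf_exact = (lam0 + lam1 + mu1) / (lam0 * lam1)
h, t_end = mttf_exact / 2000.0, mttf_exact * 40.0
grid = [i * h for i in range(int(t_end / h) + 1)]
mttf_num = sum((P(t) + P(t + h)) / 2 * h for t in grid)
assert abs(mttf_num - mttf_exact) / mttf_exact < 1e-2
```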
If at t = 0 the system starts from state 1, that is, P₁(0) = 1, the LST
equations become

(λ₀ + s)φ₀(s) − μ₁φ₁(s) = 0
−λ₀φ₀(s) + (λ₁ + μ₁ + s)φ₁(s) = 1

whence

φ₀(s) + φ₁(s) = (s + λ₀ + μ₁)/(s² + s(λ₀ + λ₁ + μ₁) + λ₀λ₁)   (7.10)

Notice that the denominator in (7.10) is the same as in (7.8), so we can use
the roots (eigenvalues) obtained above. Omitting routine transformations, we
may write the final result for this case

P⁽¹⁾(t) = (1/(s₁* − s₂*))[(s₁* + λ₀ + μ₁)e^{s₁*t} − (s₂* + λ₀ + μ₁)e^{s₂*t}]   (7.11)
Analogous results can be written for stationary initial conditions. The
stationary probabilities of states 0 and 1 are

P₀ = μ₁μ₂/(λ₀λ₁ + λ₀μ₂ + μ₁μ₂)

and

P₁ = λ₀μ₂/(λ₀λ₁ + λ₀μ₂ + μ₁μ₂)

and, for these initial conditions, the LST equations are

(λ₀ + s)φ₀(s) − μ₁φ₁(s) = P₀
−λ₀φ₀(s) + (λ₁ + μ₁ + s)φ₁(s) = P₁
The MTTF can be obtained from the LST equations by the substitution
of s = 0:

λ₀θ₀ − μ₁θ₁ = 1
−λ₀θ₀ + (λ₁ + μ₁)θ₁ = 0

where θ₀ and θ₁ are values such that the MTTF = θ₀ + θ₁. The solution of
this equation system yields

θ₀ = (λ₁ + μ₁)/(λ₀λ₁)  and  θ₁ = 1/λ₁

This index can also be found in a different way, using the Markov property
of the process. We can write

T₀₂ = 1/λ₀ + T₁₂   (7.13)
because the process inevitably moves from state 0 to state 1. After this, based
on the Markov property, the process can be considered to be starting from
state 1.
The process stays in state 1 for an average time 1/(λ₁ + μ₁) and then
moves either to state 2 or to state 0. It moves to state 0 with probability
μ₁/(λ₁ + μ₁) and then starts traveling again from state 0. Hence, we can write

T₁₂ = 1/(λ₁ + μ₁) + [μ₁/(λ₁ + μ₁)]T₀₂   (7.14)
MTTF = T₀₂ = (λ₀ + λ₁ + μ₁)/(λ₀λ₁) = 1/λ₀ + 1/λ₁ + μ₁/(λ₀λ₁)

and

MTBF = T₁₂ = 1/λ₁ + μ₁/(λ₀λ₁)
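The θ-equations above are a 2×2 linear system and can be solved mechanically; the following sketch (with illustrative rates) verifies the closed forms for the MTTF, the MTBF, and relation (7.13):

```python
# Mean times via the theta-equations obtained by setting s = 0:
#   lam0*th0 - mu1*th1 = 1
#  -lam0*th0 + (lam1 + mu1)*th1 = 0
# solved with Cramer's rule for a 2x2 system (rates are illustrative).

lam0, lam1, mu1 = 0.002, 0.001, 0.5

det = lam0 * (lam1 + mu1) - mu1 * lam0       # determinant = lam0*lam1
th0 = (1 * (lam1 + mu1) - (-mu1) * 0) / det  # = (lam1 + mu1)/(lam0*lam1)
th1 = (lam0 * 0 - (-lam0) * 1) / det         # = 1/lam1

mttf = th0 + th1
mtbf = th1 + mu1 / (lam0 * lam1)             # T12 = 1/lam1 + mu1/(lam0*lam1)

assert abs(th0 - (lam1 + mu1) / (lam0 * lam1)) < 1e-6
assert abs(mttf - (lam0 + lam1 + mu1) / (lam0 * lam1)) < 1e-6
assert abs(mttf - (1 / lam0 + mtbf)) < 1e-6  # T02 = 1/lam0 + T12, eq. (7.13)
```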
Now, on a very understandable and almost verbal level, we can explain the
difference between the MTBF (or the MTTF) for repaired duplicated systems
of identical units which have a different regime for the redundant unit.
For active redundancy λ₀ = 2λ and for standby redundancy λ₀ = λ. In other
words, in the first case the system stays in state 0, on average, one-half the
time that it stays in the second case. This fact can be seen more clearly from
the approximation for highly reliable systems

MTTF ≈ μ₁/(λ₀λ₁) = 1/(λ₀γ)   (7.16)

where γ = λ₁/μ₁.
Incidentally, (7.15) and (7.16) could be explained on the basis of the Renyi
theorem. Consider an alternative renewal process describing the operation of
a repaired duplicated system.
For a highly reliable system, this process can be approximately represented
as a simple renewal process if one neglects the small intervals of being at state 1.
The system's successful operation period consists of the sum of a random
number of intervals of the length 1/λ₀ until the process has jumped to state
2. This random number has a geometric distribution with parameter p =
μ₁/(λ₁ + μ₁) ≈ 1 − γ. Thus, the sum also has an approximately exponential
distribution with parameter λ₀γ. This means that approximately
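The claim that the TTF is approximately exponential with parameter λ₀γ implies MTTF ≈ 1/(λ₀γ), and the quality of this approximation can be checked against the exact value; the rates below are illustrative:

```python
# Accuracy of the approximation MTTF ≈ 1/(lam0*gamma), gamma = lam1/mu1,
# against the exact value (lam0 + lam1 + mu1)/(lam0*lam1).
# The approximation should improve as repair gets faster (gamma -> 0).

def mttf_exact(lam0, lam1, mu1):
    return (lam0 + lam1 + mu1) / (lam0 * lam1)

def mttf_approx(lam0, lam1, mu1):
    gamma = lam1 / mu1
    return 1.0 / (lam0 * gamma)

errs = []
for mu1 in (1.0, 10.0, 100.0):        # faster repair -> smaller gamma
    lam0 = lam1 = 0.01
    e = abs(mttf_exact(lam0, lam1, mu1) - mttf_approx(lam0, lam1, mu1))
    errs.append(e / mttf_exact(lam0, lam1, mu1))

assert errs[0] > errs[1] > errs[2]    # relative error shrinks with gamma
assert errs[2] < 0.01
```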
It is clear that after an operational unit has failed, the redundant unit
replaces it and becomes operational. A system failure occurs if and only if
the operating unit fails during the repair of the other unit, that is, when both
of the system's units have failed.
Let us find the distribution of the system's time to failure, that is, the
probability of a failure-free operation R(t). A failure-free operation of the
duplicated system during a time period t can be represented as the union of
the following mutually disjoint events:
1. The first failure in the system occurs after moment t; what happens
with the redundant unit does not play any role. The probability of this
event is exp[−(λ + λ₁)t].
2. The first failure of either of the two units occurs at some moment
z < t, the failed unit is not repaired during the interval (t − z), but the
unit in the operating position has not failed up to t. The probability of
this event is

(λ + λ₁)e^{−λt} ∫₀ᵗ e^{−λ₁z}[1 − G(t − z)] dz
3. The last event is the most complicated. In this case, at some moment
x < t, the duplicated system comes to the initial state, that is, state 0,
where both of the system's units are operational. This occurs if one of the
units has failed during the interval [z, z + dz], the repair has taken
time x − z, and the operating unit has not failed during the repair. After
the completion of the repair, the system operates successfully during
the remaining period of time (t − x) with probability R(t − x) = 1 −
F_s(t − x). The probability of this event is

(λ + λ₁) ∫₀ᵗ R(t − x)e^{−λx} dx ∫₀ˣ e^{−λ₁z} g(x − z) dz
Now it is easy to write the final equation for the probability of a system's
failure-free operation:
R(t) = e^{−(λ+λ₁)t} + (λ + λ₁)e^{−λt} ∫₀ᵗ e^{−λ₁z}[1 − G(t − z)] dz
     + (λ + λ₁) ∫₀ᵗ R(t − x)e^{−λx} dx ∫₀ˣ e^{−λ₁z} g(x − z) dz   (7.18)
Thus, we have an integral equation of the renewal type with kernel A(t)
(the coefficient of R under the integral) and free term B(t) (the first two
summands in (7.18)). If we denote

a(s) = ∫₀^∞ e^{−st} A(t) dt

b(s) = ∫₀^∞ e^{−st} B(t) dt

ψ(s) = ∫₀^∞ e^{−st} dG(t)

φ(s) = ∫₀^∞ e^{−st} R(t) dt

then the solution can be represented in the form

φ(s) = b(s) + φ(s)a(s),  that is,  φ(s) = b(s)/(1 − a(s))   (7.21)
For the case under consideration

a(s) = (λ + λ₁)ψ(s + λ)/(s + λ + λ₁)

and

b(s) = [s + λ + (λ + λ₁)(1 − ψ(s + λ))]/[(s + λ + λ₁)(s + λ)]   (7.23)

Setting s = 0 in φ(s) gives the system's MTTF:

T = φ(s)|_{s=0} = [λ + (λ + λ₁)(1 − ψ(λ))]/(λ(λ + λ₁)[1 − ψ(λ)])
  = 1/λ + 1/((λ + λ₁)[1 − ψ(λ)])   (7.25)

or

T = 1/((λ + λ₁)α) + 1/λ

where

α = ∫₀^∞ (1 − e^{−λt}) dG(t)
Such a representation is very useful if the system is highly reliable, that is,
when λτ ≪ 1, where τ is the mean repair time. Then the following
approximation is true:

α ≈ λE{η} − (λ²/2)E{η²} = λτ − (λ²/2)[τ² + Var{η}]

For instance, for a constant repair time equal to τ,

α = Pr{ξ ≤ τ} = 1 − e^{−λτ}

and for an exponentially distributed repair time with mean τ = 1/μ,

α = Pr{ξ < η} = λ/(λ + μ) = 1 − 1/(1 + λτ)

or, approximately,

α ≈ λτ − (λτ)²
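Formula (7.25) can be exercised numerically. In the sketch below G(t) is taken exponential with rate μ, an assumption made only for this check, since then α = λ/(λ + μ) is known in closed form and the result must agree with the Markov-model MTTF:

```python
import math

# MTTF of loaded duplication with exponential TTF and arbitrary repair G,
# eq. (7.25): T = 1/lam + 1/((lam + lam1)*alpha),
# alpha = integral_0^inf (1 - exp(-lam*t)) dG(t).
# Here G is exponential with rate mu, so alpha = lam/(lam + mu) exactly.

lam, lam1, mu = 0.01, 0.01, 2.0

# crude midpoint integration of alpha with density g(t) = mu*exp(-mu*t)
h, t_end = 0.005, 20.0   # step and horizon (~40 mean repair times)
n = int(t_end / h)
alpha_num = sum((1 - math.exp(-lam * (i + 0.5) * h))
                * mu * math.exp(-mu * (i + 0.5) * h) * h
                for i in range(n))
alpha_exact = lam / (lam + mu)
assert abs(alpha_num - alpha_exact) / alpha_exact < 1e-3

T = 1.0 / lam + 1.0 / ((lam + lam1) * alpha_exact)
# Markov check with lam0 = lam + lam1 and repair rate mu:
T_markov = ((lam + lam1) + lam + mu) / ((lam + lam1) * lam)
assert abs(T - T_markov) / T_markov < 1e-9
```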
Example 7.3 Find α when the repair time distribution is normal with mean
equal to τ and variance equal to σ². Find the LST for this distribution, using
essentially the same technique that we applied in Section 1.3.3 for obtaining
the m.g.f.:

ψ(s) = exp(−sτ + s²σ²/2)

We remind the reader that the LST and the m.g.f. differ only by the sign of
the argument s. Thus,

α = 1 − ψ(λ) = 1 − exp(−λτ + ½λ²σ²) ≈ λτ − (λ²/2)(τ² + σ²)
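Assuming, as in Example 7.3, the LST ψ(s) = exp(−sτ + s²σ²/2) of an (approximately) normal repair time, α and its second-order expansion can be compared numerically; the parameter values are illustrative:

```python
import math

# Example 7.3 check: for an (approximately) normal repair time with mean tau
# and variance sigma**2 the LST is psi(s) = exp(-s*tau + s**2*sigma**2/2), so
#   alpha = 1 - psi(lam) = 1 - exp(-lam*tau + lam**2*sigma**2/2)
# and, for lam*tau << 1, alpha ≈ lam*tau - (lam**2/2)*(tau**2 + sigma**2).

lam, tau, sigma = 0.001, 2.0, 0.4

alpha = 1 - math.exp(-lam * tau + (lam * sigma) ** 2 / 2)
alpha_approx = lam * tau - (lam ** 2 / 2) * (tau ** 2 + sigma ** 2)

assert abs(alpha - alpha_approx) / alpha < 1e-3  # second-order expansion agrees
assert abs(alpha - lam * tau) / alpha < 2e-3     # leading term alpha ≈ lam*tau
```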
Assume now that

α → 0   (7.27)

This means that the probability that the operational unit fails during repair
goes to 0.
Under this condition the appearance of some limit distribution of a
system's failure-free operation is expected. Of course, if α → 0 the system's
MTTF goes to ∞. To avoid this, we must consider a normalized random TTF,
namely, αξ. It is clear that this new r.v. has a constant mean equal to 1
independent of the value of α. The distribution of this r.v. is determined by
the LST αφ(αs). Notice that

ψ(λ) − ψ(αs + λ) = ∫₀^∞ e^{−λt}(1 − e^{−αst}) dG(t) → 0  as α → 0

that is, substituting αs for s in the expression for φ(s) and passing to the
limit, one obtains

lim_{α→0} αφ(αs) = 1/(1 + s)   (7.29)

which is the LST of the exponential distribution with unit mean; in other
words, Pr{αξ > t} → e^{−t}.
For practical problems, this means that for a small value of α the following
approximate expression can be used:

R(t) ≈ e^{−(λ+λ₁)αt}   (7.30)

or

R(t) ≈ e^{−t/T}   (7.31)

where T has been defined in (7.25). Incidentally, (7.31) is more accurate than
(7.30): its error is of order α², compared with an error of order α for (7.30).
This can be explained in the following "half-verbal" terms. The random
time to failure of a duplicated system consists of a random number of cycles
of the type "totally operational system — successful repair" and a final cycle
of the type "totally operational system - unsuccessful repair." Stochastically,
all cycles of the first type are identical and, by the assumption of the
exponentiality of the distribution, are mutually independent (the latter as-
sumption is based on the memoryless property). The only cycle differing from
these is the last one. But if the number of cycles of the first type is, on
average, large, the distribution of the system time to failure can be
approximated by the exponential distribution.
The use of the approximations (7.30) and (7.31) requires the value of α.
This value can be obtained easily in this case. Moreover, if we know that
G(t) has a restricted variance, then in the limit α ≈ λτ. (Of course, the
condition λτ → 0 is necessary.) In turn, this means that the following can be
obtained:

R(t) ≈ e^{−(λ+λ₁)λτt}   (7.32)
If the conditions on the variance of G(t) do not hold, the latter expression,
of course, is wrong. The reader can verify this with an example of a
sequence of two-mass discrete distributions G₁(t), G₂(t), …, Gₙ(t), each with
two nonzero values of probability: at 0 and at some positive point. Let all of
the distributions have the same mean but, with increasing n, let the probability
at the positive point become smaller while the point moves to the right
along the axis. The variance in this case increases without bound with
increasing n.
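The counterexample can be made concrete. Take Gₙ with mass 1 − 1/n at 0 and 1/n at nτ (our choice of a two-mass family with fixed mean τ and unbounded variance):

```python
import math

# Two-mass repair distributions G_n: mass (1 - 1/n) at 0 and 1/n at n*tau.
# The mean is tau for every n, the variance n*tau**2 - tau**2 grows without
# bound, and alpha = 1 - psi(lam) stops tracking lam*tau as n grows.

lam, tau = 0.01, 1.0

def alpha_n(n):
    # integral of (1 - exp(-lam*t)) dG_n(t): only the mass at n*tau contributes
    return (1.0 / n) * (1 - math.exp(-lam * n * tau))

assert abs(alpha_n(1) - lam * tau) / (lam * tau) < 0.01  # n = 1: alpha ≈ lam*tau
assert alpha_n(10_000) < 0.05 * lam * tau                # large n: far below lam*tau
```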
For standby redundancy the results can be obtained for the most general
case, namely, when both distributions—of a random TTF and of a random
repair time—are arbitrary. Let us use the same notation for the distributions:
F i t ) and G i t ) . The duplicated system's operation can be graphically repre-
sented in Figure 7.3.
The system's operation consists of the following random intervals. The first
one endures until the main unit fails; its random length is ξ₁. The second
interval and all of the remaining intervals, k = 2, 3, …, are successful if
and only if each time a random failure-free time of the operating unit ξₖ is
longer than the corresponding random repair time of the failed unit ηₖ. The
last interval, when a system failure has occurred, has a random duration
different from all of the previous ones: this is the distribution of the random
TTF under the condition that ξ < η. All of these explanations become
transparent if one considers a constant repair time: the first failure-free
interval has the unconditional distribution of the r.v. ξ; all of the remaining
intervals (except the last one) have a conditional distribution under the
condition that ξ > η; and the last one has a conditional distribution under
the condition that ξ < η. In other words, the first of these distributions is
positively biased and the other is truncated from the right.
Figure 7.3. Time diagram for duplicated system operation with a standby redundant
unit.
The first term of the sum reflects the fact that during time t no failure
occurs. The expression under the integral means that the first failure occurs
in the interval [x, x + dx], but the repair of the failed unit has been
completed by this moment, and from this moment on the system is in the
same state as at the initial moment. Thus, this is the regeneration moment
for the renewal process under consideration.
The final goal is to find the distribution of the random value ξ₀ + ξ* and
to express the probability of the system's successful operation P(t):
P(t) = Pr{ξ₀ + ξ* > t}   (7.34)

Let

φ(s) = ∫₀^∞ e^{−st} dF*(t)

be the LST of the distribution F*(t) of one regeneration cycle and Φ(s) the
LST of the distribution of the final (unsuccessful) interval. Then the LST of
the distribution of ξ* is

φ_{ξ*}(s) = Φ(s)/(1 − φ(s))   (7.35)

and, finally, with Θ(s) denoting the LST of the distribution of the first
interval ξ₀,

φ_syst(s) = Θ(s)Φ(s)/(1 − φ(s))   (7.36)

From this LST, the system's MTTF can be found by setting s = 0. But in this
case we prefer a more direct way:

MTTF = E{ξ₀} + E{ξ*} ≈ T + T*/α

where T = E{ξ₀}, T* is the mean length of one regeneration cycle, and α is
the probability that a cycle ends in a system failure.
Consider now a sequence of systems with repair distributions Gₙ(t) such that

αₙ = ∫₀^∞ [1 − Gₙ(t)] dF(t) → 0

and introduce

χₙ(αₙs) = ∫₀^∞ (1 − e^{−αₙst})[1 − Gₙ(t)] dF(t)

Splitting the latter integral at a suitably chosen point cₙ, one shows that
both terms in the resulting set of square brackets go to 0. This leads to the
statement that

lim_{n→∞} φₙ(αₙs) = lim_{n→∞} φ(αₙs)/[1 − φ(αₙs) + χₙ(αₙs)] = 1/(1 + sT)

that is,

lim_{n→∞} Pr{αₙξ_syst > t} = e^{−t/T}

is true.
The generalized Erlang density can be written as

f(t) = Σ_{1≤i≤N} pᵢ λ(λt)^{i−1} e^{−λt}/(i − 1)!,  pᵢ ≥ 0,  Σ_{1≤i≤N} pᵢ = 1   (7.39)
Both the Erlang and the generalized Erlang distributions belong to the IFR
class. Notice that the generalized Erlang d.f. can naturally approximate a
wider subclass of distributions belonging to the IFR class.
It is reasonable to remember that the Erlang distribution represents an
appropriate mathematical model for standby redundancy. Indeed, the pro-
cess of a standby redundant group's operation can be described as a se-
quence of a constant number of periods of a unit's successful operation.
Thus, (7.38) can be used as a possible approximation of the IFR distribu-
tions, and (7.39) with (7.40) can be used for the DFR distributions. Of course,
such an approximation leads to an increase in the number of states in the
stochastic process under consideration. (Nothing can be obtained free, even
IFR Repair Time and Exponential Time to Failure For some applied
problems it is natural to use the exponential distribution for a random TTF.
At the same time, to assume an exponential distribution for the repair time
might seem strange: why should the residual time of repair not depend on
the time already spent? If a repair involves a routine procedure, a more
realistic assumption involves the IFR distribution of this r.v. To make this
statement clearer, we consider a repair process as a sequence of several
steps: if one step is over, the residual time of repair is smaller because now it
consists of a smaller number of remaining steps.
In this case two failure states might be introduced for a unit: state 1 and
state 1*, both with an exponentially distributed time remaining in each of
them. These sequential states represent the series sequence of two stages of
repair (see Figure 7.4a). The total random time of staying in a failed state
subset is the sum of two exponentially distributed random variables and,
consequently, will have an IFR distribution. Incidentally, in this case (7.40)
has the following expression:
g(t) = (λ₁λ₂/(λ₁ − λ₂))(e^{−λ₂t} − e^{−λ₁t})   (7.41)
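As a numerical sanity check, the two-stage density (7.41) can be verified to integrate to 1 and to have mean 1/λ₁ + 1/λ₂; the rates are illustrative:

```python
import math

# Density of the sum of two exponential stages with distinct rates
# lam1 and lam2, as in the two-stage ("fictitious state") repair model:
#   g(t) = lam1*lam2/(lam1 - lam2) * (exp(-lam2*t) - exp(-lam1*t))

lam1, lam2 = 3.0, 1.0

def g(t):
    return lam1 * lam2 / (lam1 - lam2) * (math.exp(-lam2 * t) - math.exp(-lam1 * t))

# midpoint-rule checks: total mass 1, mean 1/lam1 + 1/lam2
h, t_end = 0.001, 40.0
n = int(t_end / h)
total = sum(g((i + 0.5) * h) * h for i in range(n))
mean = sum((i + 0.5) * h * g((i + 0.5) * h) * h for i in range(n))
assert abs(total - 1.0) < 1e-3
assert abs(mean - (1 / lam1 + 1 / lam2)) < 1e-2
```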
Figure 7.4. Transition graphs for a multistate model of renewable units: (a) with an
IFR distributed repair time and an exponentially distributed failure-free time;
(b) with an IFR distributed failure-free time and an exponentially distributed repair
time.
Figure 7.5. Transition graphs for a multistate model of renewable units: (a) with a
DFR distributed repair time and an exponentially distributed failure-free time;
(b) with a DFR distributed failure-free time and an exponentially distributed repair
time.
Figure 7.6. Transition graphs for a multistate model of renewable units: (a) with an
IFR distributed repair time and an IFR distributed failure-free time; (b) with a DFR
distributed repair time and an IFR distributed failure-free time; (c) with an IFR
distributed repair time and a DFR distributed failure-free time; (d) with a DFR
distributed repair time and a DFR distributed failure-free time.
K₀'(t) = −λ₀K₀(t) + μ₂K₂(t)
K₁'(t) = λ₀K₀(t) − μ₁K₁(t)   (7.42)
1 = K₀(t) + K₁(t) + K₂(t)
K₀(0) = 1
¹We put "long" and "short" in quotation marks because we consider r.v.'s with
correspondingly large and small means, but this does not mean that an r.v. with a larger mean
cannot be less than an r.v. with a small mean, and vice versa. For simplicity, we use these terms
for r.v.'s.
The LST of K₀(t) has the form

φ₀(s) = (s + μ₁)(s + μ₂)/(s(s − s₁)(s − s₂))

where s₁ and s₂ are the roots of s² + as + b = 0 with

a = λ₀ + μ₁ + μ₂
b = λ₀(μ₁ + μ₂) + μ₁μ₂

Note that the discriminant of the denominator is negative for any λ₀, μ₁, and
μ₂, which leads to the complex roots s₁ and s₂.

A representation of φ₀(s) is found in the form

φ₀(s) = A/s + B/(s − s₁) + C/(s − s₂)
      = {(A + B + C)s² − [A(s₁ + s₂) + Bs₂ + Cs₁]s + As₁s₂}/(s(s − s₁)(s − s₂))   (7.45)

Equating the coefficients of the numerators yields, in particular,

A = μ₁μ₂/b = μ₁μ₂/(λ₀(μ₁ + μ₂) + μ₁μ₂)
Since s₁ and s₂ are complex conjugate, say s₁,₂ = −a/2 ± iω with
ω = ½√(4b − a²), we may write

e^{s₁t} + e^{s₂t} = 2e^{−at/2} cos ωt

The final result is that K₀(t) approaches its stationary value

K₀(∞) = μ₁μ₂/b = μ₁μ₂/(λ₀(μ₁ + μ₂) + μ₁μ₂)

through damped oscillations decaying as e^{−at/2} with frequency ω.
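System (7.42) is also easy to integrate numerically; a minimal Euler sketch (with illustrative rates) confirms that K₀(t) settles at μ₁μ₂/b:

```python
# Availability of a unit with two-stage repair, system (7.42):
#   K0' = -lam0*K0 + mu2*K2,  K1' = lam0*K0 - mu1*K1,  K0 + K1 + K2 = 1.
# Simple Euler integration; the stationary value must approach
#   K0(inf) = mu1*mu2 / (lam0*(mu1 + mu2) + mu1*mu2)

lam0, mu1, mu2 = 0.05, 1.0, 2.0
K0, K1 = 1.0, 0.0
dt = 0.001
for _ in range(int(200.0 / dt)):   # integrate far past the transient
    K2 = 1.0 - K0 - K1
    dK0 = -lam0 * K0 + mu2 * K2
    dK1 = lam0 * K0 - mu1 * K1
    K0 += dt * dK0
    K1 += dt * dK1

K0_stat = mu1 * mu2 / (lam0 * (mu1 + mu2) + mu1 * mu2)
assert abs(K0 - K0_stat) < 1e-4
```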
part of the main unit can be monitored continuously. The state of the
remaining nonmonitored part of the main unit can be checked only periodi-
cally. In other words, if a failure has occurred in the nonmonitored part of
the main unit, no switching to replace this failed unit is performed. A
periodic test discovers that the main unit has failed and only then the
switching might be performed. Thus, before this test, the duplicated system
remains in a state of "hidden failure." The switching is assumed to be
instantaneous. The redundant unit is continuously monitored, so a repair of
the failed redundant unit begins instantly after a failure has occurred. Of
course, the same thing happens if a failure occurs in the monitored part of
the main unit.
During a repair of the main unit, all of its failures—both in the monitored
and nonmonitored parts—are deleted. In other words, the repaired unit
becomes as good as new. As soon as the failure of the main unit is detected
(by any means—continuous or periodical monitoring), the redundant unit is
switched into the main position. After repair, the unit becomes the redun-
dant one. If one finds both units have failed, the total system repair is
performed. After repair, the system, as a whole, becomes as good as new.
For the use of a Markov model, let us assume that monitoring is provided
at randomly chosen moments of time. Moreover, assume that the distribution
of the length of the periods between the tests is exponential. We mention
that such an assumption is sometimes close to reality: in a computer, tests
can be applied between runs of the main programs, and not by a previously
set strict schedule.
The transition graph for this case is presented in Figure 7.8. The following
notation is used in this example:
Figure 7.8. Transition graphs for a duplicated system with a partially monitored main
unit: (a) graph including instantaneous "jumps"; (b) equivalent graph excluding the
state in which the system spends no time.
redundant position. The unit's failure rate depends on the occupied position:
operating or waiting. Let us assume that only a part of each unit can be
monitored continuously. (The monitored parts are identical in both units.)
The state of the remaining nonmonitored parts of each of these units can be
checked only periodically. Tests of the main and redundant units have
different periods (intensities).
The switching system is analogous to that described in the previous
example. If one knows that both units have failed, the repair is performed
until complete renewal of the system. Let us consider two possible means of
repair: (a) there are independent repair facilities for each unit, and (b) there
is only one repair facility. After repair, the unit becomes as good as new.
The transition graph for this case is presented in Figure 7.9. The following
notation is used in this example:
These failure states are again denoted by bold frames in the figure.
The transition graph for this case is presented in Figure 7.10. The follow-
ing notation is used in this example:
These failure states are again denoted by bold frames in the figure.
Note that the switching device may be one of the two following main types:
(a) as we considered before, a switching failure does not interrupt the
system's operation, or (b) a switching failure interrupts the system operation.
In the latter case, the switch is a necessary part of the system. This may
occur, for example, if the switch plays the role of an interface between the
duplicated system's output and the input of another system or subsystem.
Of course, there are many other concrete examples of this type. We can
only repeat that our main goal is to explain the methodology of modeling and
not to give a list of the results or to exhaust the reader with boring
solutions of bulky equations.
The mathematical techniques used in this section are simple enough. But
the results obtained are not always very clear or "transparent" for further
analysis: What will happen if one changes some parameters? What will
happen if the switching or monitoring methods are changed? Of course, in
practical situations an engineer would like to have correct and simple
formulas to perform a quick and understandable analysis of the designed
system. Fortunately, for highly reliable systems (we emphasize again that this
is the most important practical case!), it is possible to develop such simple
and sufficiently accurate methods. The reader can find such methods in
Chapter 13 dedicated to heuristic methods in reliability.
CONCLUSION
It seems that the first paper on the analysis of a duplicated system with repair
(renewal) was published by Epstein and Hosford (1960). They solved the
problem for a purely Markov model when the distributions—both TTF and
repair time—were exponential. They solved the problem with the help of
birth and death processes. Their model is also described in Gnedenko and
Kovalenko (1987). Here the solution of the same problem for the duplicated
system was found for both active and underloaded redundancy.
A systematic investigation of renewal systems, in particular, the duplicated
system, may be found in Ushakov (1985, 1994). Belyaev (1962) developed an
elegant method of so-called "striping Markov processes" which has allowed
one to solve the problem with no assumption of exponentiality of the repair
time. Independently, Gaver (1963) obtained practically the same results with
the help of traditional methods.
Gnedenko (1964a, 1964b) obtained solutions for the general case when
both distributions are arbitrary. Theorems concerning the asymptotic behav-
ior of renewal duplicated systems have been obtained by D. Gnedenko and
Solovyev (1974, 1975) and, practically simultaneously, by Gnedenko, Belyaev,
and Solovyev (1969). The method of fictitious states (stages) for "Markoviza-
tion" of non-Markov models as applied to queuing systems takes its origin
from Erlang's work. A comprehensive exposition of existent mathematical
results related to the problem can be found in Ushakov (1985, 1994).
REFERENCES

Belyaev, Yu. K. (1962). Striping Markov processes and their applications to reliability problems (in Russian). Proc. Fourth All-Union Meeting on Probab. Theory Math. Statist., Vilnius (Lithuania), pp. 309-323.
Epstein, B., and T. Hosford (1960). Reliability of some two unit redundant systems. Proc. Sixth Nat. Symp. on RQC, pp. 466-476.
Gaver, D. P. (1963). Time to failure and availability of parallel systems with repair. IEEE Trans. RQC, vol. R-12, pp. 30-38.
Gertsbakh, I. B. (1984). Asymptotic methods in reliability theory: a review. Adv. in Appl. Probab., vol. 16.
Gnedenko, B. V. (1964a). On duplication with renewal. Engrg. Cybernet., no. 5, pp. 111-118.
Gnedenko, B. V. (1964b). On spare duplication. Engrg. Cybernet., no. 4, pp. 3-12.
Gnedenko, B. V., and I. N. Kovalenko (1987). Introduction to Queuing Theory, 2nd ed. (in Russian). Moscow: Nauka.
Gnedenko, B. V., Yu. K. Belyaev, and A. D. Solovyev (1969). Mathematical Methods of Reliability Theory. New York: Academic.
Gnedenko, D. B., and A. D. Solovyev (1974). A general model for standby with renewal. Engrg. Cybernet., vol. 12, no. 6.
Gnedenko, D. B., and A. D. Solovyev (1975). Estimation of the reliability of complex renewable systems. Engrg. Cybernet., vol. 13, no. 3.
Solovyev, A. D. (1970). Standby with rapid repair. Engrg. Cybernet., vol. 4, no. 1.
Solovyev, A. D. (1971). Asymptotic behavior of the time of first occurrence of a rare event. Engrg. Cybernet., vol. 9, no. 3.
Solovyev, A. D. (1972). Asymptotic distribution of the moment of first crossing of a high level by birth and death process. Proc. Sixth Berkeley Symp. Math. Statist. Probab., Issue 3.
Ushakov, I. A., ed. (1985). Reliability of Technical Systems: Handbook (in Russian). Moscow: Radio i Sviaz.
Ushakov, I. A., ed. (1994). Handbook of Reliability Engineering. New York: Wiley.
EXERCISES
7.1 There are two transition graphs (see Figures E7.1a and b). State 2 is a
failed state in both cases.
(a) Give a verbal description of the two systems to which these graphs
correspond.
(b) Which system has a larger MTTF?
Figure E7.1.
G(t) = p₁e^{−μ₁t} + p₂e^{−μ₂t} + p₃e^{−μ₃t}
SOLUTIONS
7.1 (a) The first system is a duplicated system with a loaded redundant unit
where, after the system has failed, it is renewed as a whole. The
second system is an ordinary duplicated system of two independent
identical units operating in a loaded regime.
(b) Both systems have the same MTTF.
(c) The first system has a larger MTBF than the second one because
after each failure this system starts from state 0. After a system
failure the second system starts from state 1 where there is a
possibility of entering a failed state immediately.
(d) There is no difference at all.
(e) The first system has twice as large a repair time.
7.2 The solution is depicted in Figure E7.2.
Figure E7.2.
Figure E7.3.
Figure E7.5.
CHAPTER 8
ANALYSIS OF PERFORMANCE
EFFECTIVENESS
W_syst(t) = Σ_{1≤k≤N} h_{X_k}(t) W_{X_k}   (8.2)
h₀ = Π_{1≤i≤n} pᵢ   (8.4)
Let hᵢ denote the probability that only the ith unit of the system is in a down
state at moment t (repairable systems can be considered as well as
unrepairable). Then

hᵢ = qᵢ Π_{k≠i} pₖ = (qᵢ/pᵢ)h₀ = δᵢh₀,  δᵢ = qᵢ/pᵢ

and, similarly,

h_{ij} = (qᵢqⱼ/(pᵢpⱼ))h₀ = δᵢδⱼh₀   (8.6)

and so on.
We can write the general form of this probability as

h_X = Π_{i∈G_X} pᵢ Π_{i∈Ḡ_X} qᵢ   (8.7)

where G_X is the set of subscripts of the units which are considered
operational in state X and Ḡ_X is the complementary set. Sometimes it is
reasonable to write (8.7) for any X as

h_X = h₀ Π_{i∈Ḡ_X} δᵢ   (8.8)
It is clear that (8.7) and (8.8) are equivalent. Using (8.4) to (8.8), we can
rewrite (8.3). If

max_{1≤i≤n} qᵢ ≪ 1/n   (8.10)

expression (8.9) can be approximated as

W_syst ≈ W₀[1 − Σ_{1≤i≤n} qᵢ(t)(1 − wᵢ)] ≈ W₀ exp[−t Σ_{1≤i≤n} λᵢ(1 − wᵢ)]   (8.12)
We can see that "the significance of unit" is reflected, in this case, in the
factual failure rate. (See the previous remark.)
If pᵢ is a stationary availability coefficient, that is, pᵢ = Tᵢ/(Tᵢ + rᵢ) where
Tᵢ is the MTBF of the ith unit and rᵢ is its idle time, and Tᵢ ≫ rᵢ, then it is
possible to write the approximation

W_syst ≈ W₀[1 − Σ_{1≤i≤n} (rᵢ/Tᵢ)(1 − wᵢ)]   (8.13)
Example 8.1 To demonstrate the main ideas, we first use a simple system
consisting of two redundant units. Let the system's units have corresponding
probabilities of successful operation equal to px and p2. The problem is to
find the probability of the system's successful operation.
Solution.

W_syst = (0.7)p₁p₂ + (1/2)(0.7)q₁p₂ + (1/2)(0.7)p₁q₂
       = (1/2)(0.7)p₁ + (1/2)(0.7)p₂ = (0.7)(0.9) = 0.63
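The computation above can be reproduced by direct enumeration of the unit states, in the spirit of (8.2); the 0.7 coefficient and half-zone weights follow the example as stated:

```python
# Effectiveness of the two-radar system of Example 8.1 by direct state
# enumeration: sum over states of (state probability) * (effectiveness
# coefficient), with 0.7 per fully covered zone and 0.5 weight per half-zone.

p1 = p2 = 0.9
q1, q2 = 1 - p1, 1 - p2

W = (0.7 * p1 * p2           # both radars up: whole zone served
     + 0.5 * 0.7 * q1 * p2   # only radar 2 up: half of the zone served
     + 0.5 * 0.7 * p1 * q2)  # only radar 1 up: half of the zone served

assert abs(W - 0.5 * 0.7 * (p1 + p2)) < 1e-12  # algebraic simplification
assert abs(W - 0.63) < 1e-12
```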
Example 8.3 Consider the same airport traffic control system as in Example
8.2. To increase the effectiveness of the system, the operating zones of the
radars overlap. In addition, we assume that within the overlapped zone, the
effectiveness of service is higher. Let us say that the coefficient of effective-
ness in an overlapping zone is practically equal to 1, while the same
coefficient of effectiveness in an ordinary zone is 0.7.
The system's effectiveness is determined as the average probability of
success weighted by the size of the zones with their corresponding effective-
ness coefficients. There are two possibilities to design a system with overlap-
ping zones. These two cases are depicted in Figure 8.2. The availability
coefficient of each radar again equals 0.9. The problem is to compare the
effectiveness of both variants and to choose the best one.
Solution. Consider the first variant, A, with two radars in the north zone
and two radars in the south zone (see Figure 8.2a). It is clear that we can
consider two independent subsystems, each delivering its own outcome to the
control system as a whole. The outcome of one subsystem is equal to one-half
of the system's total outcome. Denote the effectiveness indexes of these two
subsystems and of the whole system by W₁, W₂, and W_syst, respectively.
Because of the identity W₁ = W₂, W_syst = 2W₁ = 2W₂.
Each subsystem can be in one of two useful states:
• Both radars are operating, and the probability of this is (0.9)(0.9) = 0.81; the coefficient of effectiveness in the zone is equal to 1.
• Only one radar is operating, and the probability of this is (0.9)(0.1) = 0.09; the coefficient of effectiveness in the zone is equal to 0.7.
(Recall that each subsystem covers only one-half of the zone of the operating
system.)
W_syst = 2[(0.81)(1)(0.5) + (0.09)(0.7)(0.5)] = 0.873
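This state-by-state computation is easy to mechanize. A small sketch (Python; the state list simply transcribes the two cases above for one subsystem, with the probabilities used in the text) reproduces the result for variant A:

```python
# Variant A: two independent subsystems, each covering half of the zone.
# Per subsystem (as in the text): both radars up -> effectiveness 1;
# one radar up -> effectiveness 0.7. Availability of each radar is 0.9.
states = [
    (0.9 * 0.9, 1.0),   # both radars operating
    (0.9 * 0.1, 0.7),   # only one radar operating
]
zone_share = 0.5        # each subsystem serves one-half of the total zone

w_subsystem = sum(p * w * zone_share for p, w in states)
w_syst = 2 * w_subsystem
print(w_syst)           # 0.873, as in the text
```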
Figure 8.2. Two possible ways of using redundant radars for an airport radar system.
348 ANALYSIS OF PERFORMANCE EFFECTIVENESS
Now let us consider the second variant, B (see Figure 8.2b). In this case we
have to analyze 2³ − 1 = 7 different states. The results of this analysis are
presented in Table 8.1. Here we denote the corresponding radars by N, S, E,
and W and use the symbols N', S', E', and W' to denote their idle states. The
final result can be found by summing all values in the last column of
Table 8.1:
W_syst = 0.99314
Thus, variant B is the preferable one.
Example 8.4 As Russian nonmilitary authors, we never had access to
information about former Soviet military systems, even the out-of-date sys-
tems. So for illustration we are forced to use an illustrative narrative from the
proceedings of one of the early IEEE Reliability Conferences.
Just after World War II there were antiaircraft missile systems of the
following simple type. There was a radar searching for a target with informa-
tion displayed on a screen. After locating the target, a conveying radar was
switched in and information about the target was processed by a computer
and displayed by the same monitor. If the searching radar failed, the
conveying radar was used for searching (with a lower efficiency). The last step
Figure 8.3. Simplified block diagram of an aircraft radar system. 1 = searching radar; 2 = optical device; 3 = conveying radar; 4 = display; 5 = computer; 6 = control equipment.
connected with the destruction of the target was fulfilled by means of control
equipment and a controlled missile. In case of a failure of the electronic
equipment (which was so unreliable at that time!), a pilot could use an
optical system for pursuing the target (see Figure 8.3).
Thus, the system could be in different states because of the failures of the
equipment. Different modes of the system under consideration are presented
in Table 8.2. The probabilities of a successful operation at some given
moment of time are
max_{1≤i≤n} q_i(t, t + t_0) ≪ 1/n    (8.15)
8.3 ENDURING SYSTEMS
If the period of time it takes to perform a system task is sufficiently long, that
is, during this period a number of different state changes can occur, then one
needs to investigate an enduring system. In this case a probabilistic measure
is distributed over a continuous space of trajectories of the changing system
states. Let Z(t, t + t_0) denote some fixed trajectory. In the continuous trajectory space we can determine a density f_Z for each such trajectory. At
the same time, if a system moves from one state to another in correspon-
dence to such a trajectory, one can characterize it by some predetermined
outcome (effectiveness), say Wz.
Now we can write an expression similar to (8.2):

W_syst = ∫_{G_Z} W_Z dF(Z)    (8.14)
where G_Z is the space of all possible system state trajectories in the interval (t, t + t_0). The simplicity of (8.14) is deceptive. In general, it is very difficult
to compute the densities of trajectories and to find analytical expressions for
the outcomes of a system for each particular case. (We will return to this
topic later.) To illustrate this statement, consider a simple duplicated system
consisting of two unrepairable units. Initially, both units are in an operating
state. The expression for this case is
W_syst = W_0 + (1/p_1) ∫_t^{t+t_0} W_1(t_1) dF(t_1) + (1/p_2) ∫_t^{t+t_0} W_2(t_2) dF(t_2) + (1/(p_1 p_2)) ∫_t^{t+t_0} ∫_t^{t+t_0} W_12(t_1, t_2) dF(t_1) dF(t_2)
Using the smallness of the q_i(t, t + t_0), it is possible to write an approximate formula for unrepairable systems:

W_syst ≈ W_0 [1 − Σ_{1≤i≤n} q_i(t, t + t_0)] + Σ_{1≤i≤n} ∫_t^{t+t_0} W_i(x_i) dF_i(x_i)
For repairable systems such an analysis becomes extremely difficult and
boring. For numerical calculations one can introduce a discrete lattice to
describe the system's trajectory in the space of system states. But in this case
one encounters a complex factorial problem. Of course, the largest practical
difficulties arise in the determination of the effectiveness coefficients for
different state trajectories in both cases: continuous and discrete.
For enduring systems W_syst is also a generalized index in comparison with the standard reliability indexes. As usual, a generalized (or more or less universal) method permits one to obtain various particular solutions, but with more effort. So, for simple reliability problems, one need not use this general approach. At the same time we should mention that, in general, a
system performance effectiveness analysis cannot be done via common relia-
bility methods.
We first consider two simple examples which can be solved by the use of
standard reliability methods.
W_syst = Z P(Z) + ∫_0^Z t dQ(t)
As one can see (and as would be expected), the result coincides with the
conditional MTTF inside the interval [0, Z].
Figure 8.4. Sample of a stochastic track: (a) an observed track without absorbing states; (b) a track absorbed at state k + 1; (c) a track absorbed at state k.
Solution. To solve the problem, one should write the BDP equations (see
Chapter 1). At moment t = 0, assume all system units are operating; that is,
the system is in state 0. Let a_k be the transition intensity from state k to state k + 1 and let μ_k be the transition intensity from state k to state k − 1.
If we consider a process without an absorbing state, then one can write the corresponding system of linear differential equations and find, for each k, the probability R_k(t) that the process reaches state k during [0, t]:

R_k(t) = Σ_{k≤j≤n} S_j(t)
where S_k(t) is the probability that the worst state that the initial process reached in [0, t] is k. Hence, S_k(t) = R_k(t) − R_{k+1}(t). The final result is the
following:
W_syst = Σ_{0≤k≤n} W_k S_k(t)
Example 8.7 Consider a system that involves the collection and transmission
of information. The system consists of two identical and independent com-
munication channels. If a channel fails, the system capacity decreases to 0.3
times a nominal value. For simplicity, assume that each channel is character-
ized by an exponentially distributed TTF with parameter λ. Let the duration of a given information collection equal 0.1/λ. The volume of the collected
W0 = t
Below we consider several particular cases for which one can obtain simple
results. Such kinds of structures are often encountered in practice.
W_syst = Σ_{1≤i≤n} W_i p_i    (8.17)
Expression (8.17) can also be written for dependent units. This follows from the fact that the expected value of a sum of random variables equals the sum of their expected values, regardless of their dependence.
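The linearity-of-expectation fact used here can be illustrated by direct enumeration. In the sketch below (Python; the joint distribution is invented purely for illustration) the two unit-state indicators are strongly dependent, yet the mean total outcome still equals Σ W_i p_i:

```python
# Joint distribution of two dependent unit states (x1, x2); probabilities are illustrative.
joint = {(1, 1): 0.7, (1, 0): 0.1, (0, 1): 0.1, (0, 0): 0.1}
W = [2.0, 3.0]                      # outcome of each executive unit when it operates

# Mean system outcome computed over the joint distribution:
mean_joint = sum(p * (W[0] * x1 + W[1] * x2) for (x1, x2), p in joint.items())

# Marginal availability coefficients:
p1 = sum(p for (x1, _), p in joint.items() if x1 == 1)
p2 = sum(p for (_, x2), p in joint.items() if x2 == 1)

# Expression (8.17), which ignores the dependence:
mean_marginal = W[0] * p1 + W[1] * p2
print(mean_joint, mean_marginal)    # both equal 4.0
```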
For concreteness, let us consider a system with two types of units. Let us
call a unit an executive unit if it produces a portion of the system's outcome.
All of the remaining units will be called administrative units. The system's
outcome again consists of the sum of the individual outcomes of its executive
units. The coefficient of effectiveness of the ith executive unit, i = 1, ..., N, depends on X, the state of both the structural and executive units of the system, that is, W_i(X), 1 ≤ i ≤ N, N < n.
In this case a unit's outcome depends on two factors: the operating state of
the unit itself and the state of the system. Finally, we can write
W_syst = Σ_{1≤i≤N} p_i E{W_i(X)}    (8.18)
W_syst = Σ_{1≤i≤N} E{W_i}

P̃_i = Π_{1≤j≤M−1} p_{i_j}    (8.20)

where the p_{i_j} are the availability coefficients of the units controlling executive unit i, and

W_syst = Σ_{1≤i≤N} w_i P̃_i    (8.21)
Again, we use the fact that the mean for the sum of dependent random
variables equals the sum of their means.
E PM + Pi L PW)
Example 8.9 Let us consider a spy satellite designed for the collection and
transmission of information. This system is unrepairable and can be consid-
ered as enduring. The system consists of three communication channels.
Their capacities and failure rates correspondingly are: V_1 = 100 Mbps, V_2 = 200 Mbps, V_3 = 250 Mbps, and λ_1 = 0.0001 1/hr, λ_2 = 0.0003 1/hr, and
λ_3 = 0.0004 1/hr. Find W_syst (in absolute value) in two forms: (a) the mean capacity of the system as a whole at moment t = 1000, and (b) the mean
volume of transmitted information during 5000 hours.
(a) The mean capacity of the system at moment t is

W_syst(t) = Σ_{1≤i≤3} V_i e^{−λ_i t}

(b) If, during the period [0, t_0] in this example, there was no failure, a channel has collected V_i t_0 bits of information. If a failure has occurred at the moment t < t_0, then the channel has collected V_i t bits of information. Taking this into account, one can write

W_syst(0, t_0) = Σ_{1≤i≤3} (V_i/λ_i)(1 − e^{−λ_i t_0})

Substituting the input data (in the same time dimension), one obtains

W_syst(0, 5000) = [(100/0.0001)(0.39) + (200/0.0003)(0.78) + (250/0.0004)(0.86)] · 3600 ≈ 5.2 · 10^9 Mbits
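The arithmetic of Example 8.9 can be reproduced in a few lines (Python; only the data given in the example are used, with the factor 3600 converting hour·Mbps into Mbits, and exponential TTFs as stated):

```python
import math

V = [100.0, 200.0, 250.0]        # channel capacities, Mbps
lam = [0.0001, 0.0003, 0.0004]   # failure rates, 1/hr

# (a) Mean system capacity at t = 1000 hours:
t = 1000.0
mean_capacity = sum(v * math.exp(-l * t) for v, l in zip(V, lam))

# (b) Mean volume of information transmitted during t0 = 5000 hours:
t0 = 5000.0
mean_volume = sum(v / l * (1.0 - math.exp(-l * t0)) for v, l in zip(V, lam)) * 3600.0

print(mean_capacity)   # about 406 Mbps
print(mean_volume)     # about 5.2e9 Mbits, as in the text
```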
operating states. It is clear that an executive unit will not operate successfully
if at least one "structural" unit which controls it has failed.
Note that the executive units of the system are dependent through their
common controlling units. Indeed, a failure of any controlling unit leads to
the failure of all controlled units at lower levels, including the corresponding
executive units. Therefore, a failure of some "structural" unit leads to the
stopping of successful operations of the corresponding branch as a whole;
that is, the corresponding set of executive units does not produce its out-
come. A failure of the highest-level controlling unit leads to an interruption
of successful operations at all executive units.
It is understandable that the problem of effectiveness evaluation for
dependent executive units is not trivial. We introduce the following notation:
REMARK. Here we use two expressions: "successfully operating" and "successfully performing." These expressions differ only slightly. In this text we will understand that "operating" means that a unit is in an up state itself, independent of the states of any other unit in the system, and "performing" means that the unit is operating itself and, at the same time, all of its controlling units are successfully operating. (See the explanations in Figure 8.7.)
N = Π_{1≤i≤n} a_i    (8.23)

W(x_n) = Σ_k B_k x_n^k    (8.24)

For practical purposes, one can use an approximation taking (8.24) with relatively small k. Using (8.24), we can easily write

W_syst = E{W(x_n)} = E{Σ_k B_k x_n^k} = Σ_k B_k E{x_n^k} = Σ_k B_k M_n^k    (8.25)

where M_n^k = E{x_n^k} is the kth initial moment of the number of successfully performing executive units.
P_{n−1}(0) = q_{n−1}^{m_{n−1}}

If only one unit at the (n − 1)th level is operating successfully, then not more than a_n executive units can perform successfully. The random number of successfully performing executive units will have a binomial distribution with moment generating function (8.26). This event occurs with probability

P_{n−1}(1) = m_{n−1} p_{n−1} q_{n−1}^{m_{n−1}−1}

If two units at the (n − 1)th level are operating successfully, then not more than 2a_n executive units can perform successfully. The probability of this event equals P_{n−1}(2), where
If we let

G_n(e^{x_n}) = Σ_{0≤k≤N} P_n(k) e^{k x_n}    (8.28)

then the moment generating function of x_n can be written as

G_n(e^{x_n}) = G_{n−1}(e^Y)    (8.30)

where
Y = a_n ln(p_n e^{x_n} + q_n)
We can continue similar arguments for the units at the remaining upper
levels of the system's hierarchy.
Thus, we have the recurrent expression

G_n(e^{x_n}) = G_{n−1}((p_n e^{x_n} + q_n)^{a_n})    (8.31)

Using the chain rule, we obtain recurrent expressions for the desired initial moments M_n^k. Indeed, the first moment can be found in the following way:

M_n^1 = d[G_n(e^{x_n})]/dx_n |_{x_n=0} = d[G_{n−1}(e^Y)]/dY · dY/dx_n |_{x_n=0} = M_{n−1}^1 · dY/dx_n |_{x_n=0}    (8.32)
Finally,

M_n^1 = M_{n−1}^1 a_n p_n    (8.33)
M_n^1 = p_0 Π_{1≤i≤n} p_i a_i
The second moment of the distribution of the random value x_n can be found in a similar way:

M_n^2 = d²[G_n(e^{x_n})]/dx_n² |_{x_n=0} = d²[G_{n−1}(e^Y)]/dY² · (dY/dx_n)² + d[G_{n−1}(e^Y)]/dY · d²Y/dx_n² |_{x_n=0}    (8.34)

which gives

M_n^2 = M_{n−1}^2 (a_n p_n)² + M_{n−1}^1 a_n p_n q_n    (8.35)
W_syst = B M_2^1 = B p_0 (p_1 a_1)(p_2 a_2) = 6B (p_0 p_1 p_2)

that is, according to the chosen effectiveness measure, all four variants are equivalent.
If W(x_n) = B x_n², the effectiveness of any state of the system is proportional to the square of the number of successfully performing executive units. Then

W_syst = E{W(x_n)} = B M_2^2 = 6B p_0 p_1 p_2 (6 p_1 p_2 + a_2 q_1 p_2 + q_2)
The system with the largest a_2 is the most effective. Thus, the higher the
level of centralized control from the center, the better is the result.
Case 1 All units are dependent through the system state X. Then

W_syst = Σ_{all X} P(X) [1 − Π_{1≤i≤N} (1 − W_i(X))]    (8.37)
In particular, for the branching system considered in the previous section, the
problem can be solved in the following elegant way. Let D be the probability
of success of a single executive unit (e.g., the kill probability of an enemy's
aircraft). Then, if x n executive units are acting simultaneously, the total
probability of success is
0^x„<N
= 1- L P ( x n ) ( l - D ) x " = 1-G„(i - D ) (8.39)
0<.x„<.N
The second term in (8.39) is a moment generating function with the substitu-
tion of 1 — D as a variable. Thus, (8.39) can be rewritten as
W_syst = 1 − (p_0(p_1(p_2(⋯(p_n(1 − D) + q_n)^{a_n} ⋯)^{a_3} + q_2)^{a_2} + q_1)^{a_1} + q_0)    (8.41)
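Formula (8.41) is just the composition of generating functions evaluated at 1 − D, which is straightforward to code. A sketch (Python; the function name and arguments are illustrative, not the book's notation):

```python
def total_success_prob(p0, levels, D):
    """W_syst = 1 - G_n(1 - D) for a branching system, obtained by composing the
    generating function from the bottom level upward; levels = [(a_1, p_1), ..., (a_n, p_n)]."""
    z = 1.0 - D
    for a, p in reversed(levels):
        z = (p * z + (1.0 - p)) ** a   # generating function of one unit's subtree, at z
    return 1.0 - (p0 * z + (1.0 - p0))

# Single level of 3 executive units under one main unit:
w = total_success_prob(p0=0.9, levels=[(3, 0.8)], D=0.4)
print(w)   # equals 0.9 * (1 - (1 - 0.8*0.4)**3) computed directly
```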
Case 2 Assume that the system's executive units are operating sequentially.
The system states are supposed to change between the uses of two consecutive
executive units. If the time interval between the two system performances is
large enough, the result of their operations might be independent. The same
result will also be valid if one considers the simultaneous operation of several
executive units controlled by identical and independent controlling systems.
For example, one can consider the destruction of an enemy aircraft in the
overlapping zone of action of several antiaircraft systems. In this case
W_syst = 1 − Π_{1≤i≤N} [1 − Σ_{all X} P(X) W_i(X)]    (8.42)
Z = ∪_{1≤i≤n} Z_i
For further purposes, let us introduce zones Z_{a_j} which are disjoint. Within each zone Z_{a_j} the same set of serving units is acting. The subscript a_j represents the set of subscripts of the executive units acting in zone Z_{a_j}. Thus, i ∈ a_j means that the ith unit serves in zone Z_{a_j}. Let the number of different zones Z_{a_j} be M, that is, 1 ≤ j ≤ M. It is clear that the zones Z_{a_j} are disjoint and we can write

Z = ∪_{1≤j≤M} Z_{a_j}

As we mentioned above, M ≪ 2^n in practice.

Because of failures the actual set of units operating in zone Z_{a_j} is random. In general, if a_j includes m_j subscripts, zone Z_{a_j} can be characterized by 2^{m_j} different possible levels of effectiveness. For each possible set of acting units,
W_syst = Σ_{1≤j≤M} Z_{a_j} W_{a_j}    (8.45)

W_{a_j} = Σ_γ P_{jγ} W_{jγ}    (8.46)

where the sum in (8.46) is taken over all possible sets γ of acting units in zone Z_{a_j}.
Example 8.11 Consider a system consisting of two units and three acting
zones (see Figure 8.10). Let us denote
The effectiveness coefficient of the first unit is W_1, and the effectiveness coefficient of the second unit is W_2. By assumption of the additive character of the joint effect of the units, W_3 = W_1 + W_2 for zone Z_3.
SYSTEMS WITH INTERSECTING ZONES OF ACTION 373
Because all zones Z_1, Z_2, and Z_3 are independent, we can write for the whole system

W_syst = p_1 Z_1 W_1 + p_2 Z_2 W_2 + Z_3[p_1 p_2 (W_1 + W_2) + q_1 p_2 W_2 + p_1 q_2 W_1]
= p_1 W_1 (Z_1 + Z_3) + p_2 W_2 (Z_2 + Z_3)
= p_1 W_1 Z'_1 + p_2 W_2 Z'_2

where Z'_1 = Z_1 + Z_3 and Z'_2 = Z_2 + Z_3.
Assume now that the joint effect of several units acting in one zone is not additive: success in a zone is achieved if at least one of the acting units succeeds, that is,

W_{a_j} = 1 − Π_{i∈a_j} (1 − W_i)    (8.47)
Taking into account the probability of success of the units, we can write

W_syst = Σ_{1≤j≤M} Z_{a_j} [1 − Π_{i∈a_j} (1 − p_i W_i)]    (8.48)
The final results (8.47) and (8.48) are illustrated by a simple example.
Example 8.12 The system under consideration is the same as in Example 8.11 (see Figure 8.10 for an explanation). We use the same notation. To facilitate understanding, keep in mind the case of a target's destruction in the zone of defense. Thus, q_i is the probability of failure of the ith executive unit, p_i = 1 − q_i, and W_i is the probability of a target's destruction by the ith executive unit. Assume that the probability of a target's appearance in a zone is proportional to its size. Then the probability of the target passing through
the defense zone equals

W_syst = Z_1(W̄_1 p_1 + q_1) + Z_2(W̄_2 p_2 + q_2) + Z_3(p_1 p_2 W̄_1 W̄_2 + p_1 q_2 W̄_1 + p_2 q_1 W̄_2 + q_1 q_2)
= Z_1(W̄_1 p_1 + q_1) + Z_2(W̄_2 p_2 + q_2) + Z_3(p_1 W̄_1 + q_1)(p_2 W̄_2 + q_2)

or, equivalently,

W_syst = Z_1(1 − W_1 p_1) + Z_2(1 − W_2 p_2) + Z_3(1 − p_1 W_1)(1 − p_2 W_2)
W_{a_j} = 1 − Π_{i∈a_j} W̄_i    (8.49)

where W̄_i = 1 − W_i and W_i has the previous meaning, that is, the probability of success. For this system we obtain

W_syst = Σ_{1≤j≤M} Z_{a_j} [1 − Π_{i∈a_j} (1 − p_i W_i)]    (8.51)
Again, let us illustrate the final result by a simple example.
ASPECTS OF COMPLEX SYSTEMS DECOMPOSITION 375
Example 8.13 Consider again the system represented in Figure 8.10. Using the previous arguments, we have

W_syst = Z_1 p_1 W_1 + Z_2 p_2 W_2
W_{a_j} = max_{i∈a_j} W_i    (8.53)
W_syst = Σ_{1≤j≤M} Z_{a_j} [Σ_k W_k P_k]    (8.54)
This follows directly from the formulation of the problem. We again think
that there is no special need to explain it in more detail.
m = Σ_{1≤j≤M} 2^{m_j} ≤ 2^n
But such a new system representation may still help to generate new ideas.
Assume that for any system state a*, which is expressed as a composition of subsystem states a*_j, that is, a* = (a*_1, ..., a*_M), the condition

W(a*) = Σ_{1≤j≤M} W_j(a*_j)    (8.58)

is true. Then, for such an additive system, (8.56) can be written as

W_syst = Σ_{1≤j≤M} W_j    (8.57)
The statement is clear if one remembers that the mean of a linear function equals the same function of the mean values of its variables. Thus, if it is possible
to choose subsystems in such a way that (8.58) holds, we can use the simple
expression (8.57).
The next results can be formulated for multiplicative systems. Note that for multiplicative systems (8.56) can be written as

W_syst = Π_{1≤j≤M} W_j    (8.59)

if and only if for any system state a*, which is expressed as a composition of subsystem states a*_j, that is, a* = (a*_1, ..., a*_M), the following condition is valid:

W(a*) = Π_{1≤j≤M} W_j(a*_j)    (8.60)
Expression (8.60) means that the W of any subsystem does not depend on the
states of other subsystems.
Statement (8.59) becomes clear if one remembers that the mean of the
product of independent random variables equals the product of the means of
its variables. Thus, if subsystems are chosen in such a way that (8.60) holds,
we can use the simple expression (8.59).
Unfortunately, the number of practical examples where we may obtain
such a fantastic bargain in the evaluation of performance effectiveness is
exhausted by these two trivial cases. Also, unfortunately, such systems are
quite rare in engineering practice.
Fortunately, however, a similar approach can be used to obtain bounds of
a system's effectiveness index in the case of regional systems.
W_syst ≤ Σ_{1≤i≤M} W_i    (8.62)
W_syst ≤ Σ_{1≤j≤M} W_j    (8.63)
To confirm (8.63), we show that this is correct for a zone with two acting units belonging to different subsystems. Keeping in mind that 0 ≤ W_i ≤ 1, i = 1, 2, we can easily write

1 − (p_1 W̄_1 + q_1)(p_2 W̄_2 + q_2) ≤ p_1 W_1 + p_2 W_2    (8.64)
W_syst ≤ Σ_{1≤j≤M} W_j    (8.65)
4. For systems in which one chooses for operation the unit with the
maximal effectiveness coefficient in a zone, we have
m_j = 2^{n_j}

different states, where n_j is the number of units in the jth subsystem. Thus, we must analyze m_j different states for each subsystem.
If it is possible to express a system's effectiveness index W_syst as a function of the W_j's of the subsystems in the form

W_syst = f(W_1, ..., W_M)    (8.67)
W_j = Σ_{1≤i≤m_j} P{X_{ji}} W_{ji}    (8.68)
REMARK. We mention that very complex systems are usually considered in engineering
practice in a hierarchical way with more than two levels. This permits one to independently
analyze first the system as a whole; then each subsystem as a part of the system, but performing
its own functions; then some more-or-less autonomous parts of these subsystems; and so on.
Such a mode is very effective in the evaluation of a system's effectiveness.
• The K_jth cell includes those states whose effectiveness coefficients W_{ji} satisfy the condition B_{K_j−1} < W_{ji} ≤ B_{K_j}, where B_{K_j} is the last threshold of the lattice; the corresponding computed probability is R_{K_j}.
We may now analyze the system as a whole. In each cell of the lattice, we
choose a "middle" state which corresponds to the average value of Wj. For
future analysis, this state now becomes a "representative" of all of the
remaining states related to this cell. Thus, we choose K_j representatives for each subsystem. We should choose an appropriate set of representative states, say X*_j; each of them appears with probability R_{ji}. The number of representatives is determined with respect to the required accuracy of the analysis and the available computer capacity.
After these preliminary steps we consider
K = Π_{1≤j≤N} K_j    (8.70)
different system states and, for each of them, evaluate the effectiveness
coefficient. We then consider all K system states
and write the expression for Wsyst with the use of (8.62)
This expression is not too pleasant in visual form because of the notation
used. Neither is it easy to compute. But its nature is simple and completely
coincides with (8.61).
Of course, if we decide to distinguish several levels of a system's hierarchy,
the methodology would be the same but the corresponding description, in a
general form, would be even longer than (8.71). We would like to emphasize
that a hierarchical model needs less computation.
This method of representative selection can be successfully used for
obtaining lower and upper bounds.

1. Let us choose from among the states of the lattice cell a state with a minimal effectiveness coefficient and consider this state as a representative of this cell. Denote this aggregate state by X_j^min. If we substitute X_j^min instead of X*_j in (8.71), we will obtain a lower bound for the system index.

2. If a state X_j^max with a maximal effectiveness coefficient is chosen as the representative of the cells, then the same procedure gives us an upper bound for W_syst.
search problem, the task is concrete and its solution is more of an art than a
science.
It seems that the first paper devoted to the problem discussed in this chapter
was the paper by Kolmogorov (1945). This work focused on an effectiveness
measure of antiaircraft fire. The total kill probability of an enemy's aircraft
was investigated. The random nature of the destruction of different parts of
an aircraft and the importance of these parts was assumed. It is clear that
from a methodological viewpoint the problem of system effectiveness analysis
is quite similar: one has only to change slightly the terminology.
The first papers concerning a system's effectiveness evaluation appeared in
the early 1960s [see, e.g., Ushakov (1960, 1966, 1967)]. Some special cases of
system effectiveness evaluation were considered in Ushakov (1985, 1994) and
Netes (1980, 1984).
One can find an analysis of the effectiveness of symmetrical branching
systems in Ushakov (1985, 1994) and Ushakov and Konyonkov (1964). Terri-
torial (regional) systems with intersecting zones of action were studied in
Ushakov (1985, 1994). Here one can also find an analysis of decomposition
methods. The general methodology and methods of system effectiveness
analysis are described in Ushakov (1985, 1994).
REFERENCES
EXERCISES
8.1 A conveyor system consists of two lines, each producing N items per hour. Each of the lines has an availability coefficient K = 0.8. When one of the lines has failed, the other decreases its productivity to 0.7N because of some technological demands. There is a suggestion to replace this system with a new one consisting of one line with a productivity of 1.1N items per hour and an availability coefficient K_1 = 0.9. Is this replacement reasonable from an effective productivity viewpoint or not?
8.2 A branching system has one main unit and three executive ones. There are two possibilities: (1) to use a main unit with PFFO p_0 = 0.9 and executive units with PFFO p_1 = 0.8 or (2) to use a main unit with PFFO p_0 = 0.8 and executive units with PFFO p_1 = 0.9. Is there a difference between these two variants if the system's effectiveness depends on (a) the average number of successfully operating executive units, (b) a successful operation of at least one executive unit, and (c) a successful operation of all executive units?
SOLUTIONS
8.1 The old system of two lines has the following states:
taking into account expenses for installation of the new system, on the
one hand, and the potential decrease in the cost of repair, on the
other hand (the new system will fail less often).
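The effective productivities behind this comparison can be computed directly (Python; productivity is expressed in units of N, using only the data of Exercise 8.1):

```python
# Old system: two lines with availability K = 0.8 each; a lone working line
# drops to 0.7N. Effective productivity, in units of N:
K = 0.8
old = K * K * 2.0 + 2.0 * K * (1.0 - K) * 0.7   # both lines up; exactly one up

# New system: one line of 1.1N with availability K1 = 0.9:
K1 = 0.9
new = K1 * 1.1

print(old, new)   # 1.504 vs 0.99: the old system is more productive
```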
8.2 (a) The average number of successfully operating executive units depends only on the product p_0 p_1, so both systems are equivalent.
(b) For (1) one has W_syst = (0.9)[1 − (1 − 0.8)³] ≈ 0.893 and for (2) W_syst = (0.8)[1 − (1 − 0.9)³] ≈ 0.799. Thus, the first variant is more effective.
(c) For (1) one has W_syst = (0.9)(0.8)³ ≈ 0.461 and for (2) W_syst = (0.8)(0.9)³ ≈ 0.583. In this case the second variant is more effective.
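The three measures can be recomputed in a few lines (Python; the helper function is illustrative):

```python
def measures(p0, p1, n=3):
    """Mean number of performing units, P(at least one performs), and P(all perform),
    for a main unit with PFFO p0 controlling n identical executive units with PFFO p1."""
    mean = p0 * n * p1
    at_least_one = p0 * (1.0 - (1.0 - p1) ** n)
    all_units = p0 * p1 ** n
    return mean, at_least_one, all_units

v1 = measures(0.9, 0.8)   # variant (1)
v2 = measures(0.8, 0.9)   # variant (2)
print(v1)
print(v2)
```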
CHAPTER 9
TWO-POLE NETWORKS
Each vector X_k can be expressed through its components x_i and x̄_i. For example (see Table 9.1),

X_8 = (x̄_1, x_2, x̄_3, x_4, x_5)

From Table 9.1 it follows that the vector X_8 belongs to G, and so it will be taken into account in (9.2). Then

φ(X_8) = x̄_1 x_2 x̄_3 x_4 x_5

and

E{φ(X_8)} = E{x̄_1 x_2 x̄_3 x_4 x_5} = E{x̄_1} E{x_2} E{x̄_3} E{x_4} E{x_5} = q_1 p_2 q_3 p_4 p_5
We do not write the detailed expression for φ(X) here. This can be easily
obtained from Table 9.1 by taking into account that the corresponding term
TABLE 9.1 Description of the Structure Function of the Bridge Structure

States of units            Vector    Value
x_1 x_2 x_3 x_4 x_5        X_k       φ(X_k)
 1   1   1   1   1         X_1       1
 0   1   1   1   1         X_2       1
 1   0   1   1   1         X_3       1
 1   1   0   1   1         X_4       1
 1   1   1   0   1         X_5       1
 1   1   1   1   0         X_6       1
 0   0   1   1   1         X_7       0
 0   1   0   1   1         X_8       1
 0   1   1   0   1         X_9       1
 0   1   1   1   0         X_10      1
 1   0   0   1   1         X_11      1
 1   0   1   0   1         X_12      1
 1   0   1   1   0         X_13      1
 1   1   0   0   1         X_14      1
 1   1   0   1   0         X_15      1
 1   1   1   0   0         X_16      0
 0   0   0   1   1         X_17      0
 0   0   1   0   1         X_18      0
 0   0   1   1   0         X_19      0
 0   1   0   0   1         X_20      1
 0   1   0   1   0         X_21      0
 0   1   1   0   0         X_22      0
 1   0   0   0   1         X_23      0
 1   0   0   1   0         X_24      1
 1   0   1   0   0         X_25      0
 1   1   0   0   0         X_26      0
 0   0   0   0   1         X_27      0
 0   0   0   1   0         X_28      0
 0   0   1   0   0         X_29      0
 0   1   0   0   0         X_30      0
 1   0   0   0   0         X_31      0
 0   0   0   0   0         X_32      0
1. Both the redundant group and the SD have not failed during a specified interval of time t.
2. The first unit chosen at random fails at some moment x < t, the SD performs a successful switch to one of the operating units of the remaining redundant group of m − 1 units, and the new system performs successfully up to time t.
1. The first unit operates successfully.
2. After its failure there is a group of randomly chosen redundant units
with operating SDs; this new system operates successfully during the
remaining time.
2. After its failure at some moment x, there is a group of redundant units. The size of this group is random because some of them might have failed before the moment x. Let the number of operating redundant units at the moment x equal j. In some order we try to switch each of these j operating units to the main position until a first successful switching occurs. The number of attempts before a success is distributed geometrically with parameter R. After k SDs have failed during switching, a successful attempt occurs (k is random). This means
λ_0 μ_2 + μ_1 μ_2 + λ_0 λ_1
• M is the operational state of the main unit.
• M̄ is the failure state of the main unit.
• R is the operational state of the redundant unit.
• R̄ is the failure state of the redundant unit.
• S is the operational state of the switch.
• S̄ is the failure state of the switch.
• λ_1 is the failure rate of the main unit.
• λ_s is the failure rate of the switch.
• λ is the failure rate of the redundant unit.
• μ is the intensity of repair of a single unit.
• μ_s is the intensity of repair of the switch.
• μ* is the intensity of repair of the system as a whole.
• M is the operational state of the main unit.
• M* is the "hidden failure" state of the main unit.
• M̄ is the failure state of the main unit.
• R is the operational state of the redundant unit.
• R̄ is the failure state of the redundant unit.
• λ_1 is the failure rate of the nonmonitored part of the main unit.
• λ_2 is the failure rate of the monitored part of the main unit.
• λ is the failure rate of the redundant unit.
• λ_s is the failure rate of the switch.
• μ is the intensity of repair of a single unit.
• μ_s is the intensity of repair of the switch.
• μ* is the intensity of repair of the system as a whole.
• ν is the intensity of periodical tests of the main unit.
of type E{φ(X_k)} has p_i for x_i = 1 and q_i for x_i = 0. Based on Table 9.1, the following equation can be written:

E{φ(X)} = E{φ(X_1)} + E{φ(X_2)} + ⋯ + E{φ(X_32)}
Omitting intermediate results, we give the final formula for the connectivity
probability (in the case of identical units) in two equivalent forms
E{φ(X)} = 2p² + 2p³ − 5p⁴ + 2p⁵    (9.3)

E{φ(X)} = 1 − 2q² − 2q³ + 5q⁴ − 2q⁵    (9.4)
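Formulas (9.3) and (9.4) can be verified by enumerating all 2⁵ states of Table 9.1 (Python; the four minimal paths listed below are those of the bridge structure, consistent with the values in the table):

```python
from itertools import product

# Minimal paths of the bridge structure (units 1..5), consistent with Table 9.1:
paths = [{1, 4}, {2, 5}, {1, 3, 5}, {2, 3, 4}]

def phi(x):
    """Structure function: 1 if some minimal path has all its units up."""
    return int(any(all(x[i - 1] for i in path) for path in paths))

def connectivity(p):
    """E{phi(X)} by direct enumeration of all 32 unit-state vectors."""
    total = 0.0
    for bits in product([0, 1], repeat=5):
        prob = 1.0
        for b in bits:
            prob *= p if b else 1.0 - p
        total += prob * phi(bits)
    return total

p, q = 0.9, 0.1
poly_p = 2*p**2 + 2*p**3 - 5*p**4 + 2*p**5        # (9.3)
poly_q = 1 - 2*q**2 - 2*q**3 + 5*q**4 - 2*q**5    # (9.4)
print(connectivity(p), poly_p, poly_q)            # all three agree (0.97848)
```

Note that exactly 16 of the 32 state vectors are operational, matching the count of ones in the last column of Table 9.1.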