Software Reliability Engineering
Florin Popențiu VLĂDICESCU
Professor Florin Popențiu Vlădicescu graduated in Electronics and Telecommunications from University POLITEHNICA of Bucharest in 1974 and received his PhD in Reliability in 1981. He has been Chairholder of the UNESCO Chair in Information and Communication Engineering at City University London since 1998, and he has also been appointed Director of the "UNESCO Chair" Department at the University of Oradea. Professor Florin Popențiu Vlădicescu has published over 100 papers in international journals and conference proceedings and is co-author of three books. He has worked for many years on problems associated with software reliability.
SOFTWARE RELIABILITY
ENGINEERING
First Edition
Course book of the Advanced Mechatronics Systems Series
CONTENTS
1 INTRODUCTION ............................................................................................. 13
2.1 Reliability measures ................................................................................. 33
2.1.1 Reliability, Maintainability and Availability ....................................... 33
2.1.2 Reliability Function for Common Distributions ............................... 41
2.2 Stochastic processes in reliability ......................................................... 55
2.2.1. Preliminaries ................................................................................................. 55
2.2.2 Discrete Time Markov Chains .................................................................. 57
2.2.3 Continuous Time Markov Chains ........................................................... 58
2.2.4 Semi-Markov Processes ............................................................................. 59
2.2.5 Birth-Death (BD) Processes ..................................................................... 60
2.2.6 Poisson Processes ........................................................................................ 61
4.4 Applying Failure Data to Guide Decisions ........................................ 137
6.2 Criteria for model evaluation ............................................................... 191
1 INTRODUCTION
improvements on software reliability in particular, and on system reliability from a systemic view.
In order to understand the importance of software reliability as a discipline, Table 1 presents, in chronological order, some projects¹ that failed for software-related reasons. A list of software bugs is maintained online by Wikipedia².
Table 1. Famous software failures (number, year, name, short description)
1. 1962, Mariner I: A bug³ in the flight software for the Mariner I causes the rocket to divert from its intended path on launch. Mission control destroys the rocket over the Atlantic Ocean. The investigation into the accident discovers that a formula written on paper in pencil was improperly transcribed into computer code, causing the computer to miscalculate the rocket's trajectory.
2. 1978, Hartford Coliseum collapse: The programmer of the CAD software used to design the coliseum incorrectly assumed the steel roof supports would face only pure compression. When one of the supports unexpectedly buckled under the snow, it set off a chain reaction that brought down the other roof sections like dominoes.

¹ http://www.devtopics.com/20-famous-software-disasters/
² http://en.wikipedia.org/wiki/List_of_software_bugs
³ "Bug" is an English word meaning "insect"; the term comes from a malfunction caused by an animal trapped in a component and preventing its operation. A software bug is a programming error introduced by software developers.
3. 1982, Trans-Siberian gas pipeline: CIA operatives allegedly planted a bug in a Canadian computer system purchased by the Soviets to control their gas pipelines. The purchase was part of a strategic Soviet plan to steal or covertly obtain sensitive U.S. technology. When the CIA discovered the purchase, they sabotaged the software so that it would pass Soviet inspection but fail in operation.
4. 1983, World War III false alarm: A bug in the Soviet software failed to filter out false missile detections caused by sunlight reflecting off cloud tops. The Soviet early-warning system falsely indicated that the United States had launched five ballistic missiles. Fortunately, the Soviet duty officer had a gut feeling that something was wrong and reasoned that if the U.S. were really attacking it would launch more than five missiles, so he reported the apparent attack as a false alarm.
5. 1985–1987, Therac-25: Because of a subtle bug called a race condition, a technician could accidentally configure the Therac-25 so that the electron beam would fire in high-power mode without the proper patient shielding. Three people died and three were critically injured.
6. 1990, AT&T network: A single line of buggy code in a complex software upgrade, implemented to speed up calling, caused a ripple effect that shut down the network.
7. 1991, Patriot failure: A software rounding error incorrectly calculated the time, causing the Patriot system to ignore an incoming Scud missile.
8. 1993, Pentium: The divider in the Pentium floating-point unit had a flawed division table, missing about five of a thousand entries, which resulted in rounding errors.
9. 1996, Ariane 5: The shutdown of Ariane 5 occurred when the guidance computer tried to convert the sideways rocket velocity from a 64-bit to a 16-bit format. The number was too big and an overflow error resulted. When the guidance system shut down, control passed to an identical redundant unit, which also failed because it was running the same algorithm. First, Flight 501's backup computer crashed, followed 0.05 seconds later by a crash of the primary computer. As a result of these crashed computers, the rocket's primary processor overpowered the rocket's engines and caused the rocket to disintegrate 40 seconds after launch.
According to the ESA report (http://esamultimedia.esa.int/docs/esa-x-1819eng.pdf): "The internal SRI software exception was caused during execution of a data conversion from 64-bit floating point to 16-bit signed integer value. The floating point number which was converted had a value greater than what could be represented by a 16-bit signed integer. This resulted in an Operand Error. The data conversion instructions (in Ada code) were not protected from causing an Operand Error, although other conversions of comparable variables in the same place in the code were protected."
10. 2005, FBI's Virtual Case File project: Mismanagement, and an attempt to build a long-term project on technology that was outdated before the project was completed, resulted in a complex and unusable system.
11. Simple cases. Toyota: In October 2005, the Toyota Motor Company voluntarily recalled 75,000 of its hybrid vehicles because of a software glitch that may have shut down the engine. US Airways: In April 2005, the ticketing system for US Airways issued incorrect fares for several hours; some tickets were offered for under $2.
1.2 Basic Definitions and Terminology
performs some of the requirements of that system is called embedded software (robots, satellites, missiles, automotive systems, etc.). Software whose failure could have an impact on safety, or could cause large financial or social loss, is called critical software.
7. Bug – an error or defect in software or hardware that causes a program to malfunction. According to Pham, a bug is "a design flaw that will result in symptoms exhibited when an object is subjected to an appropriate test".
10. Fault – a defect in a hardware device or component, or an incorrect step, process or data definition in a computer program. Two or more faults that result in the same failure mode (the same manifestation or behavior) are called equivalent.
11. Failure – according to IEEE Std. 610.12-1990, a failure is "the inability of a system or component to perform its required functions within specified performance requirements". Software failures are generated by software faults during run time. The chain is the following: the designer or programmer makes a mistake (an error) which generates a software fault (the software has a defect). This fault may or may not generate a failure, depending on how the software is used. Failure is a user-oriented concept, while fault is a developer-oriented concept, as Musa (1999) observes. If a failure is experienced during software testing or at run time, the process of fault identification and error elimination is called debugging.
12. Severity levels of failures – According to (Pham 2000), the severity of
a software failure effect is classified into four categories: Catastrophic,
Critical, Major, and Minor.
Catastrophic failures are associated with disastrous effects: loss of human life, permanent loss of assets, or injuries resulting from erroneous medical prescription or traffic management.
Critical failures also cover disastrous effects, but with restorable damage (equipment, data) or curable illness or injury.
The major failures cover those serious failures without
physical injury to people or other systems.
The minor failures produce small inconveniences to users or
other components of the information system.
13. Operational mode – a pattern of system use and/or set of environmental conditions needing separate testing because it is likely to stimulate different failures (Musa, 1999).
14. Operational profile – the complete set of operations for the system, with probabilities of occurrence based on all operational modes (Musa, 1999).
15. Testing – the process of analyzing a system to identify bugs. Software testing is an important phase of the software life cycle and will be detailed in a dedicated section. Testability describes the ease with which a system, or component, can be tested.
16. Unit – is a separately testable element (hardware or software
component).
17. Program correctness – establishing, in finite time, the fulfillment of the output effect under given valid input data (the specified requirements). The quality of the input/output data is modeled by logical assertions. Program correctness is proved by mathematical techniques. In general, correctness is "the degree to which a system or component is free from faults in its specification, design, and implementation" (IEEE Std. 610.12-1990).
18. Software verification – establishing the fulfillment of the software
specification.
19. Software reliability – the probability that software will not fail for a specified period of time (called the mission time) under specified conditions, as Pham considered. As an attribute of quality, reliability describes the ability of a software system to carry on executing with little, or no, interruption to its functioning.
20. Reliability growth – the improvement in reliability obtained by successful correction of faults; as a consequence, the successive observed times between failures tend to increase. Software reliability growth is obtained by software maintenance. Hardware reliability growth is not normally considered.
21. Software availability – the probability that a system has not failed due to a software fault, as Pham defined it. According to IEEE Std. 610.12-1990, in general, availability is "the degree to which a system or a component is operational and accessible when required for use".
22. Software maintenance – "the process of modifying a software system or component after delivery to correct faults, improve performance or other attributes, or adapt to a changed environment", as IEEE Std. 610.12-1990 establishes. A current approach is to provide online service packs or software updates.
23. Maintainability – the probability that a system will be restored to working condition within a given period of time when subject to modifications (corrections or enhancements). Not only preventive
maintenance (to avoid those problems which can generate faults), but
also perfective maintenance (to improve performance) is promoted to
increase software reliability.
24. Mean time to repair (MTTR) – is the expected time to restore a
system to operation after experiencing a failure.
25. Mean time between failures (MTBF) – the expected time between two successive failures.
26. Fault tolerance – the ability of a system or component "to continue normal operation despite the presence of hardware or software faults", according to IEEE Std. 610.12-1990.
27. Risk assessment – is a study of the probability that certain
undesirable events will occur (software failures, cost overruns etc.)
according to (Musa 1999).
28. Software reusability – the ability to transfer modules constructed for one software system into another software system, according to (Ince, 1994).
29. Survivability – refers to the reliability of the system with respect to a specified set of critical functions when the system has been degraded in some way (Musa, 1999).
occurring during the interval. Various shapes of ROCOF appear when modeling software reliability.
35. Indirect measurements – are based on direct measurements, other indirect measurements, and/or combinations of both types of measurement. Some indirect measures used in software engineering are: module defect density (number of defects per module size), defect detection efficiency (number of defects detected per total number of defects), programmer productivity (KLOC produced per person-month of effort), and system spoilage (the effort spent fixing faults per total project effort).
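As an illustration, the short Python sketch below computes these four indirect measures for hypothetical project data; all input values are invented examples, not taken from any real project.

# Illustrative computation of the indirect measures listed above.
# All input numbers are hypothetical example values.

defects_found = 12          # defects detected in a module during testing
total_defects = 15          # total defects eventually known for that module
module_size_kloc = 2.5      # module size in thousands of lines of code (KLOC)
kloc_produced = 30.0        # KLOC produced by the team
person_months = 12.0        # effort spent, in person-months
fixing_effort = 45.0        # person-days spent fixing faults
project_effort = 600.0      # total project effort, in person-days

module_defect_density = defects_found / module_size_kloc        # defects per KLOC
defect_detection_efficiency = defects_found / total_defects     # fraction of defects detected
programmer_productivity = kloc_produced / person_months         # KLOC per person-month
system_spoilage = fixing_effort / project_effort                # share of effort spent on fixes

print(f"Module defect density:       {module_defect_density:.2f} defects/KLOC")
print(f"Defect detection efficiency: {defect_detection_efficiency:.0%}")
print(f"Programmer productivity:     {programmer_productivity:.2f} KLOC/person-month")
print(f"System spoilage:             {system_spoilage:.0%}")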
36. Prediction system – consists of "a mathematical model together with a set of prediction procedures for determining unknown parameters and interpreting results", according to Littlewood, and to (Fenton & Pfleeger, 1996).
37. Prediction system’s validation – is the process of establishing the
accuracy of the prediction system by comparing model performance
with known data in the given environment, according to (Fenton &
Pfleeger, 1996).
38. Top-down design – is defined by (Brooks, 1975) as follows: "A good top-down design avoids bugs in several ways. First, the clarity of structure and representation makes the precise statement of requirements and functions of the modules easier. Second, the partitioning and independence of modules avoids system bugs. Third, the suppression of detail makes flaws in the structure more apparent. Fourth, the design can be tested at each of its refinement steps, so testing can start earlier and focus on the proper level of detail at each step."
39. The Goal-Question-Metric paradigm – involves three steps: list the
major goals of the development or maintenance project, derive from
each goal the questions to be answered for meeting the goal, and decide
what must be measured in order to provide an adequate answer to
every question.
40. Capability and Maturity in Software Management – considered by the Software Engineering Institute (Carnegie Mellon University) to establish the software process maturity level.
Five levels were identified:
ad hoc (the least predictable and controllable),
repeatable (but process depends on individuals),
defined (the software process is defined and
institutionalized),
managed (the software process is measured), and
optimized (the most predictable and controllable).
should be tracked as a function of time and compared against reliability allocations and customer reliability requirements;
3) The assessment strategy can include reliability values achieved during development phases;
4) The impact of changes to the design should be monitored and evaluated;
5) The implementation of any corrective action is verified and its effectiveness is measured;
6) A formal growth methodology should be used, if possible, to plan and track the reliability improvement;
7) Predicted failure modes and mechanisms should be compared against those observed in test and in the field.
"When we had a few computers, we had a mild programming problem. Confronted with machines a million times as powerful, we are faced with a gigantic programming problem."
The four principles of investigation, formulated by (Fenton & Pfleeger, 1996): 1) choosing an investigative technique; 2) stating the hypothesis; 3) maintaining control over variables; 4) making the investigation meaningful.
Uncertainty in assessment (Delic, Mazzanti & Strigini, 1997): the assessor "… is confronted with a wealth of evidence about the design methods used, the quality assurance organisation, the results of testing, etc., none of which is usually sufficient to prove the desired conclusion, e.g., that a system has a certain small probability of dangerous failure."
Hardware versus software (according to IEEE 1633-2008), from a reliability point of view:
1) Hardware reliability is expressed in wall-clock time. Software reliability (SR) may be expressed in execution, elapsed, or calendar time.
2) Repair generally restores hardware to its previous
state. Correction of a software fault always changes the
software to a new state.
3) Hardware is constrained by physical law, while a
minor change in software can cause failure.
4) Failures attributable to software faults come without
advance warning and often provide no indication they
have occurred. Hardware, on the other hand, often
provides a period of graceful degradation.
5) Hardware and Software Reliability should be managed
as an integrated system attribute.
There are many ways to develop software reliability models: a) describing them as stochastic processes; b) relating them to Markov models; c) defining the probability density or distribution function; or d) specifying the hazard function.
The Continue Reading Principle: to find out more about software reliability engineering, it is necessary to go through the next chapters.
1.3 The Software Reliability Engineering Objectives
Musa (1999) considered that "understanding software reliability engineering has become a vital skill for both the software manager and software engineer". This remains true today, when new software development methodologies have been proposed and new customer requirements are formulated in connection with new communication interfaces, new interaction techniques, and new user behavior.
As the group of authors of Garcia et al. consider, non-software professionals need to understand enough about SRE to understand how it affects their work, and even to apply it to a limited extent. When Musa (1999) considered the objectives of software reliability engineering, he wrote the following: "A primary objective of software reliability engineering is to help the engineer, manager, or user of software learn to make more precise decisions. A strong secondary objective is to make everyone more concretely aware of software reliability by focusing attention on it."
The objective of this course is to provide information useful to both software professionals and non-software professionals, related to the basic activities of software reliability engineering: defining the product, implementing the operational profiles, engineering the "just right" reliability, preparing for test, executing test, and guiding test. It also covers various aspects concerning reliability models and optimization techniques used to achieve objectives such as the optimum reliability under budget constraints, or the minimum cost for a given reliability target.
2 RELIABILITY ENGINEERING FUNDAMENTALS
2.1 Reliability measures
Reliability is defined as the probability that the system will perform its intended function, under specified working conditions, for a specified period of time, called the software mission. Mathematically, the reliability function R(t) (also called the survival function) is the probability that a system will be successful without failure in the interval from time 0 to time t, i.e.
$$R(t) = P(T > t) = \int_{t}^{\infty} f(u)\,du, \qquad t \ge 0,$$
where T denotes the time to failure and f the failure density function.
Fig. 2.1. Failure intensity / Reliability function over time
Hence, the three functions R(t), F(t), and f(t) are closely related to one another. If any of them is known, all the others can be obtained by elementary calculus. A graphical representation is given in Fig. 2.1.
As (Musa, 1999) remarked, each alternative form of expressing reliability, failure intensity or reliability proper, has its advantages. The reliability statement is better suited for complex systems, where the system reliability can be expressed using the reliability of components (subsystems). Failure intensity can be used when the failure rate at each point in time is required. Also, the number of resident faults, for example, is a developer-oriented measure, while reliability is a user-oriented measure.
Fig. 2.2. Failure rate of Common Operating Systems
(http://www.ece.cmu.edu/~koopman/des_s99/sw_reliability/)
As faults are removed, the failure intensity will decrease and the reliability will increase. During design changes or when adding new features it is possible to introduce new faults, so for the new build the reliability can be lower; it will increase again after the faults are removed. If a system is stable (no changes are accepted), both failure intensity and reliability tend to be constant.
The failure rate is the rate at which failures occur in a certain time interval, say [t1, t2]; it is defined as the probability that a failure per unit time occurs in the interval [t1, t2], given that a failure has not occurred prior to t1, and is computed by
$$\frac{R(t_1) - R(t_2)}{(t_2 - t_1)\,R(t_1)}.$$
The hazard function is defined as the limit of the failure rate as the length of the interval approaches zero. Hence, the hazard function, denoted by h(t), is the instantaneous failure rate, and is computed according to the following relations:
$$h(t) = \lim_{\Delta t \to 0} \frac{R(t) - R(t+\Delta t)}{\Delta t\,R(t)} = -\frac{1}{R(t)}\frac{d}{dt}R(t) = \frac{f(t)}{R(t)}.$$
If h(.) is an increasing (resp. decreasing) function then F(.) is
classified as Increasing (resp. Decreasing) Failure Rate (i.e. IFR (resp.
DFR)).
The hazard rate is related to the reliability by
$$R(t) = \exp\left(-\int_{0}^{t} h(x)\,dx\right),$$
and is equal to the failure intensity function.
In software reliability engineering the hazard rate in the field is
piecewise constant, discontinuities occurring only at new releases.
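As a small numerical illustration, the Python sketch below checks the relations h(t) = f(t)/R(t) and R(t) = exp(-∫₀ᵗ h(x) dx) for an assumed Weibull-type hazard; the shape parameter and the evaluation time are arbitrary example values.

import math

# Numerical check (for a hypothetical Weibull-type hazard) that
# R(t) = exp(-integral_0^t h(x) dx) and h(t) = f(t)/R(t).

gamma = 1.5                                  # example shape parameter
h = lambda t: gamma * t ** (gamma - 1)       # hazard rate
R = lambda t: math.exp(-t ** gamma)          # closed-form reliability
f = lambda t: h(t) * R(t)                    # failure density f = h * R

def cumulative_hazard(t, steps=100_000):
    """Trapezoidal approximation of integral_0^t h(x) dx."""
    dx = t / steps
    xs = [i * dx for i in range(steps + 1)]
    return dx * (sum(h(x) for x in xs) - 0.5 * (h(xs[0]) + h(xs[-1])))

t = 2.0
print("R(t) closed form:        ", R(t))
print("exp(-cumulative hazard): ", math.exp(-cumulative_hazard(t)))
print("f(t)/R(t) =", f(t) / R(t), " h(t) =", h(t))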
The mean time to failure (MTTF) is defined as the expected value of the lifetime before the occurrence of a failure. If the reliability function of a system is given by R(t), and f(t) is the corresponding failure density function, then MTTF can be computed as
$$MTTF = \int_{0}^{\infty} t\,f(t)\,dt.$$
But \(f(t) = -\frac{d}{dt}R(t)\), and using integration by parts one obtains successively
$$MTTF = -\int_{0}^{\infty} t\,dR(t) = \left[-t\,R(t)\right]_{0}^{\infty} + \int_{0}^{\infty} R(t)\,dt.$$
However, \(\lim_{t \to \infty} t\,R(t) = 0\) (the system must fail after a finite amount of time), so that
$$MTTF = \int_{0}^{\infty} R(t)\,dt.$$
If S denotes the repair (restoration) time, with density g(s), then
$$V(t) = P(S > t) = \int_{t}^{\infty} g(s)\,ds, \qquad MTTR = \int_{0}^{\infty} s\,g(s)\,ds.$$
The mean time between failures (MTBF) is an important measure for repairable systems, being the expected value of the random variable "time between failures"; it is evaluated as MTBF = MTTF + MTTR.
Availability is a measure that allows for the repair of a system when failures occur, while reliability requires the system to be fully operational for the entire mission time. The availability is defined as the probability that the system is operational at time t, i.e.
$$\text{Availability} = \frac{\text{System up time}}{\text{System up time} + \text{System down time}} = \frac{MTTF}{MTTF + MTTR}.$$
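For example, with hypothetical values MTTF = 480 hours and MTTR = 4 hours, the steady-state availability works out as follows.

# Hypothetical field data: mean time to failure and mean time to repair (hours).
mttf = 480.0   # expected operating time between failures
mttr = 4.0     # expected restoration time after a failure

mtbf = mttf + mttr                    # mean time between failures
availability = mttf / (mttf + mttr)   # steady-state availability

print(f"MTBF:         {mtbf:.1f} h")
print(f"Availability: {availability:.4%}")   # about 99.17%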
A proactive fault management technique aiming at cleaning up the internal state of the system, to prevent the occurrence of more severe crash failures in the future and to increase the availability of software, is called rejuvenation design. It involves "occasionally" stopping the running software, cleaning its internal state (by garbage collection, flushing operating system kernel tables, flushing internal data structures) and restarting it.
When compared to hardware systems, the term "occasionally" is replaced by:
clock-based (according to a plan), age-based (according to the
age of the component),
usage-based (according to the usage time of the product),
condition-based (some events appear and are identified by monitoring),
opportunity-based (in general for multi-component systems),
and
design-out (when a component redesign is necessary to
improve the reliability).
Rejuvenation design depends on the application availability requirement, the application failure rate, and the rejuvenation performance impact. Software rejuvenation is a cost-effective approach for proactive management of software faults that includes protection not only against hard failures, but also against performance degradation.
2.1.2 Reliability Function for Common Distributions
2.1.2.1. Exponential distribution
Practical motivation: Having a constant failure rate, this distribution is a good choice in software reliability engineering, due to the fact that a used software component that has not failed is as good as a new component. Because of the memoryless ("used-as-good-as-new") property of this distribution, it is appropriate for modeling the constant hazard rate portion of the bathtub curve used in reliability theory.
Other useful facts: 1) Reliability: the amount of time a component has been in service (to be repaired) has no effect on the amount of time until it fails; 2) Service systems: the amount of remaining service time is independent of the amount of service time elapsed so far; 3) Inter-event times: the amount of time since the last event contains no information about the amount of time until the next event.
Due to the mathematical property of the exponential function (exp(x) * exp(y) = exp(x + y)), it is a very good choice for computing the reliability of serial architectures.
In software reliability engineering, the exponential distribution can be used for modeling both the failure and the repair processes.
42
Copyright: 2012 by Florin POPENTIU-VLADICESCU
Analytic expressions of statistic measures: The probability density function (pdf) of the exponential distribution, with parameter λ > 0, is
$$f(x;\lambda) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0,\\ 0, & x < 0,\end{cases}$$
with
$$MTTF = \frac{1}{\lambda}, \qquad h(x;\lambda) = \frac{f(x;\lambda)}{R(x;\lambda)} = \lambda.$$
For the exponential distribution, the hazard function, or instantaneous failure rate, is constant over time.
The memoryless property is like an enabling technology for the construction of continuous-time Markov chains (see the next section).
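The following Python sketch evaluates these quantities for an assumed failure rate and checks the memoryless property numerically; the rate and time values are hypothetical.

import math

# Exponential model with an assumed failure rate (failures per hour).
lam = 0.002                       # hypothetical constant failure rate

R = lambda t: math.exp(-lam * t)  # reliability
f = lambda t: lam * R(t)          # failure density
mttf = 1.0 / lam                  # mean time to failure

# Memoryless ("used as good as new") property:
# P(T > s + t | T > s) = R(s + t) / R(s) = R(t)
s, t = 100.0, 50.0
print("R(t)             =", R(t))
print("R(s + t) / R(s)  =", R(s + t) / R(s))
print("MTTF             =", mttf, "hours")
print("Hazard f(t)/R(t) =", f(t) / R(t), "(constant, equal to lambda)")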
2.1.2.2. Weibull distribution
Practical motivation: The Weibull distribution is flexible and good for modeling component lifetimes with fluctuating hazard rate functions.
Analytic expressions of statistic measures: The formula for the probability density function of the general Weibull distribution, with γ being the shape parameter, μ the location parameter, and α the scale parameter, is
$$f(x;\gamma,\mu,\alpha) = \begin{cases}\dfrac{\gamma}{\alpha}\left(\dfrac{x-\mu}{\alpha}\right)^{\gamma-1}\exp\!\left(-\left(\dfrac{x-\mu}{\alpha}\right)^{\gamma}\right), & x \ge \mu,\\ 0, & x < \mu.\end{cases}$$
Comments: For the standard case (μ = 0, α = 1) the reliability function is
$$R(x;\gamma) = \begin{cases}\exp(-x^{\gamma}), & x \ge 0,\\ 1, & x < 0.\end{cases}$$
The hazard function is
$$h(x;\gamma) = \frac{f(x;\gamma)}{R(x;\gamma)} = \gamma\,x^{\gamma-1}.$$
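As an illustration, the sketch below implements these Weibull formulas in Python and prints the hazard rate for a few assumed shape parameters, showing the decreasing, constant, and increasing hazard regimes; all parameter values are examples only.

import math

def weibull_pdf(x, gamma, mu=0.0, alpha=1.0):
    """General Weibull density with shape gamma, location mu, scale alpha."""
    if x < mu:
        return 0.0
    z = (x - mu) / alpha
    return (gamma / alpha) * z ** (gamma - 1) * math.exp(-z ** gamma)

def weibull_reliability(x, gamma, mu=0.0, alpha=1.0):
    return 1.0 if x < mu else math.exp(-(((x - mu) / alpha) ** gamma))

def weibull_hazard(x, gamma, mu=0.0, alpha=1.0):
    return weibull_pdf(x, gamma, mu, alpha) / weibull_reliability(x, gamma, mu, alpha)

# gamma < 1 gives a decreasing hazard (early-life failures),
# gamma = 1 reduces to the exponential (constant hazard),
# gamma > 1 gives an increasing hazard (wear-out).
for g in (0.5, 1.0, 2.0):
    print(g, [round(weibull_hazard(t, g), 3) for t in (0.5, 1.0, 2.0)])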
The probability density function of the normal distribution is
$$f(x;\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}\right),$$
which for the standard normal case (μ = 0, σ = 1) becomes
$$f(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^{2}/2},$$
with the cumulative distribution function …
almost all (99.7%) of the data will fall within 3
standard deviations of the mean.
The lognormal density can be written as
$$f(x) = \frac{1}{x\sigma\sqrt{2\pi}}\,\exp\!\left(-\frac{(\ln x-\mu)^{2}}{2\sigma^{2}}\right).$$
distribution function of the normal distribution.
2.1.2.5. Gamma Distribution
Practical The gamma failure density function has shapes
motivation that are very similar to the Weibull distribution.
The sum of independent exponentially distributed random variables is gamma distributed; more precisely, the distribution is Erlang.
Analytic expressions of statistic measures: The general formula for the probability density function of the gamma distribution is
$$f(x;\beta,\mu,\gamma) = \frac{\left(\dfrac{x-\mu}{\beta}\right)^{\gamma-1}\exp\!\left(-\dfrac{x-\mu}{\beta}\right)}{\beta\,\Gamma(\gamma)}, \qquad x \ge \mu,\ \beta > 0,\ \gamma > 0,$$
where:
β is the scale parameter, μ is the location parameter,
γ is the shape parameter, and
Γ(.) is the gamma function given by the formula
$$\Gamma(x) = \int_{0}^{\infty} t^{x-1} e^{-t}\,dt.$$
The cumulative distribution function (for the standard case μ = 0, β = 1) is
$$F(x;\gamma) = \frac{\Gamma_{x}(\gamma)}{\Gamma(\gamma)}, \qquad x \ge 0,\ \gamma > 0,$$
where Γ(.) is the function defined above and Γx(a) is the incomplete gamma function⁴, defined by the formula
$$\Gamma_{x}(a) = \int_{0}^{x} t^{a-1} e^{-t}\,dt.$$
For an integer shape parameter γ (the Erlang case, in standard form), the reliability function becomes
$$R(t) = e^{-t} \sum_{k=0}^{\gamma-1} \frac{t^{k}}{k!}.$$
⁴ The gamma function was first introduced by the Swiss mathematician Leonhard Euler (1707-1783) in his goal to generalize the factorial to non-integer values.
2.1.2.6. Lomax Distribution
Practical motivation: It is also known as the Pareto distribution of the second kind. Let X be a random variable that has an exponential distribution whose parameter is itself random, with the gamma density
$$g(\lambda;\alpha,\theta) = \frac{\lambda^{\alpha-1} e^{-\lambda/\theta}}{\theta^{\alpha}\,\Gamma(\alpha)}, \qquad \lambda > 0.$$
Then X follows the Lomax distribution, with
$$E[X] = \frac{1}{\theta(\alpha-1)}, \qquad \alpha > 1,$$
and
$$\mathrm{Var}[X] = \frac{\alpha}{\theta^{2}(\alpha-1)^{2}(\alpha-2)}, \qquad \alpha > 2.$$
Comments: The hazard (failure rate) function is given by
$$h(t) = \frac{\alpha\theta}{1+\theta t}.$$
2.1.2.7. Binomial Distribution
Practical motivation: The most widely used distribution in reliability and quality inspection; situations like success/failure, accept/reject, etc. can be easily analyzed using the discrete binomial distribution. The binomial distribution is well suited to obtaining the probability of observing x successes in N trials, with the probability of success on a single trial denoted by p. The binomial distribution assumes that p is fixed for all trials.
Analytic expressions of statistic measures: The pdf of the distribution is given by
$$P(X=x) = \binom{N}{x} p^{x} (1-p)^{N-x}, \qquad x = 0, 1, 2, \ldots, N,$$
where
$$\binom{N}{x} = \frac{N!}{x!\,(N-x)!}$$
is the binomial coefficient (k! denotes the factorial).
Comments: The reliability function, R(k), describes the situation "at least k out of N items are good" and is given by
$$R(k) = \sum_{x=k}^{N} \binom{N}{x} p^{x} (1-p)^{N-x}.$$
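A minimal Python sketch of this reliability function, using an assumed example of 5 redundant units with individual reliability 0.9 and a requirement of at least 3 working units:

from math import comb

def binomial_reliability(k, n, p):
    """Probability that at least k out of n identical items are good,
    each item being good independently with probability p."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(k, n + 1))

# Hypothetical example: 5 redundant units, each working with probability 0.9;
# the system needs at least 3 working units.
print(binomial_reliability(3, 5, 0.9))   # about 0.99144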
2.1.2.8. Geometric Distribution
Practical motivation: The geometric distribution models the number of successes before one failure in an independent succession of tests, where each test results in success or failure. Each trial has two possible outcomes: success with probability p, failure with probability q, with p + q = 1.
Analytic expressions of statistic measures: P(Y = x) = p^x q, x = 0, 1, 2, 3, …
The cdf of the geometric distribution is obtained as
P(Y ≤ x) = P(Y = 0) + P(Y = 1) + … + P(Y = x) = q + pq + … + p^x q = q(1 + p + … + p^x) = 1 − p^(x+1).
Comments: Another variant describes the probability distribution of the number X of Bernoulli trials needed to get one success, x = 1, 2, 3, …
A recent proposal, namely the Weibull-Geometric distribution, is suitable for modeling unimodal failure rates.
There are some software reliability growth models based on a geometric distribution interpretation.
2.1.2.9. Poisson Distribution
Practical motivation: The Poisson distribution is used to model the number of events occurring within a given time interval. The sample size is unknown.
Analytic expressions of statistic measures: If λ is a constant failure rate and x is the number of events, the pdf of the Poisson distribution is given by
$$P(X=x) = \frac{(\lambda t)^{x} e^{-\lambda t}}{x!}, \qquad x = 0, 1, 2, \ldots,$$
where t is the mission time.
Comments: The reliability function R(k), the probability of k or fewer failures, is given by
$$R(k) = \sum_{x=0}^{k} \frac{(\lambda t)^{x} e^{-\lambda t}}{x!}.$$
The Poisson distribution is frequently used as a first approximation to describe the failures expected over time. In some special cases, the Poisson distribution is an excellent approximation to the binomial distribution. This distribution is particularly important in stochastic processes, where it counts the number of events in a given period of time (see the next section on Poisson Processes).
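The sketch below evaluates R(k) for an assumed failure rate and mission time; both values are hypothetical.

import math

def poisson_reliability(k, lam, t):
    """Probability of k or fewer failures in mission time t,
    for a constant failure rate lam (failures per unit time)."""
    m = lam * t
    return sum(m ** x * math.exp(-m) / math.factorial(x) for x in range(k + 1))

# Hypothetical example: failure rate 0.1 failures/hour, 24-hour mission,
# probability of observing at most 2 failures.
print(poisson_reliability(2, 0.1, 24.0))   # about 0.5697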
2.2 Stochastic processes in reliability
2.2.1. Preliminaries
Discrete state processes are called chains. The methods of
process analysis depend on the nature of the spaces (parameter space
and state space) of the stochastic process under study.
An important example of stochastic process refers to the
counting process. A stochastic process is said to be a counting process if
X(t) represents the total number of events that have occurred up to time
t. Hence a counting process X(t) has to satisfy:
(i) X(t) ≥ 0,
(ii) X(t) is integer-valued,
(iii) if s < t, then X(s) ≤ X(t), for s, t ∈ ℝ, and
(iv) for s < t, X(t) − X(s) equals the number of events that have occurred in the interval (s, t).
A special case of counting process is the renewal process. An event is
called a renewal if upon its occurrence everything starts over and over
again stochastically.
Let X1 be the time to the first renewal and let Xn (n = 2, 3, …) be the time between the (n−1)st and the n-th renewal. Assume that the Xn (n = 1, 2, …) are independent and identically distributed random variables with distribution function F, with F(0) = P{Xn = 0} < 1, and positive expectation
$$\mu = \int_{0}^{\infty} x\,dF(x) > 0.$$
The time of the n-th renewal is then \(S_n = \sum_{i=1}^{n} X_i\).
A Markov process is one having the following property, called
also the Markov property: the probability of any particular future
behavior of the process, when its current state is known exactly, is not
changed by additional information concerning its past behavior.
A Poisson process is a stochastic process which counts the
number of events and the time when these events occur during the time
interval under study.
Moreover, the most used DTMCs are of the first-order type, simply called Markov chains. In this case only the present (at time k) has any influence on the future (at time k+1); it is a memoryless process. In other words, for all k ≥ 0,
$$P\{X_{k+1}=j \mid X_{0}=i_{0}, X_{1}=i_{1}, \ldots, X_{k}=i\} = P\{X_{k+1}=j \mid X_{k}=i\}.$$
The state space is considered to be either finite or countably infinite. We can define the (one-step) transition probability from state i to state j at time k by
$$p_{ij}(k, k+1) = P(X_{k+1}=j \mid X_{k}=i), \qquad k = 0, 1, \ldots$$
The multi-step transition probabilities satisfy the Chapman–Kolmogorov relation
$$p_{ij}(k, n) = \sum_{l} p_{il}(k, s)\,p_{lj}(s, n),$$
where s is some integer between k and n, and l covers the state space.
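As a small illustration, the Python sketch below defines a hypothetical three-state DTMC for a software service (up, degraded, down) and computes the k-step state distribution by matrix powers; the transition probabilities are invented example values, and NumPy is assumed to be available.

import numpy as np

# Hypothetical DTMC for a software service: 0 = up, 1 = degraded, 2 = down.
P = np.array([
    [0.95, 0.04, 0.01],   # from "up"
    [0.60, 0.30, 0.10],   # from "degraded"
    [0.80, 0.00, 0.20],   # from "down" (after restart)
])

pi0 = np.array([1.0, 0.0, 0.0])          # start in the "up" state

# k-step state distribution: pi_k = pi_0 * P^k
for k in (1, 5, 50):
    pik = pi0 @ np.linalg.matrix_power(P, k)
    print(k, np.round(pik, 4))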
i.e. the conditional distribution of the future state, given the present state and all history, depends only on the present state, being independent of the past. More specifically, for each s, t ≥ 0, for all states i, j, and for the history X(u), 0 ≤ u < s, the above condition becomes
$$P\{X(t+s) = j \mid X(s) = i,\ X(u),\ 0 \le u < s\} = P\{X(t+s) = j \mid X(s) = i\}.$$
the state j is chosen according to a joint distribution Fij(t); that means the holding time distribution for the state j is Fij(t)/Fij(∞).
For a semi-Markov process both the holding time distributions Fi(t), Fj(t) and a probability of transition from state i to state j, denoted by P = {pij}, are defined. Using the above notation the following holds: …, where
$$L(F(t)) = \int_{0}^{\infty} e^{-st} F(t)\,dt = F^{*}(s)$$
is the Laplace transform.
μ_i = p_{ij} for j = i − 1 (the probability of a death in the interval Δt is μ_i Δt), and p_{ij} = 0 for j ∉ {i−1, i+1}.
If the intensity λ is a constant function with respect to time, then the process is called a Homogeneous Poisson Process (HPP). However, if λ = λ(t), then the process is called a Non-Homogeneous Poisson Process (NHPP).
For a Poisson process, the probabilities Pn(t) = P({N(t) = n}), n nonnegative, satisfy the following system of ordinary differential equations:
$$P_{0}'(t) = -\lambda(t)\,P_{0}(t),$$
$$P_{n}'(t) = -\lambda(t)\,P_{n}(t) + \lambda(t)\,P_{n-1}(t), \qquad n \ge 1,$$
which, for the initial conditions {there exists i such that Pi(0) = 1, Pn(0) = 0, n ≠ i}, has the solution
$$P_{n}(t) = \frac{(\Theta(t))^{n}}{n!}\,e^{-\Theta(t)},$$
where \(\Theta(t) = \int_{0}^{t} \lambda(s)\,ds\).
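The sketch below evaluates this solution numerically for an assumed time-dependent intensity λ(t), approximating Θ(t) by the trapezoidal rule; the intensity function and the mission time are hypothetical example values.

import math

# Numerical sketch of P_n(t) for an NHPP with an assumed intensity lambda(t).
rate = lambda t: 0.5 + 0.1 * t        # hypothetical time-dependent intensity

def mean_value(t, steps=10_000):
    """Theta(t) = integral_0^t lambda(s) ds, by the trapezoidal rule."""
    ds = t / steps
    pts = [rate(i * ds) for i in range(steps + 1)]
    return ds * (sum(pts) - 0.5 * (pts[0] + pts[-1]))

def p_n(n, t):
    m = mean_value(t)
    return m ** n * math.exp(-m) / math.factorial(n)

t = 4.0
probs = [p_n(n, t) for n in range(10)]
print("Theta(t) =", round(mean_value(t), 4))          # 0.5*4 + 0.05*16 = 2.8
print("P_n(t), n = 0..9:", [round(p, 4) for p in probs])
print("sum of the first terms:", round(sum(probs), 4))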
2.3 Basic Techniques in Reliability
If the system S is composed of n subsystems S1, S2, …, Sn connected in series, each subsystem Si having reliability Ri (i = 1, 2, …, n), then the reliability of S, denoted by RS, is given by
$$R_{S} = \prod_{i=1}^{n} R_{i}.$$
[Figure: reliability block diagrams of the series structure S1–S2–…–Sn and of the parallel structure with S1, S2, …, Sn in parallel branches.]
If the system S is composed of n subsystems S1, S2, …, Sn connected in parallel, each subsystem Si having reliability Ri (i = 1, 2, …, n), then the reliability of S = PAR(S1, S2, …, Sn), denoted by RS, is given by
$$R_{S} = 1 - \prod_{i=1}^{n} (1 - R_{i}).$$
[Figure: reliability block diagram of a mixed series–parallel structure with subsystems S1–S7.]
For a k-out-of-n structure with identical subsystem reliability R, the system reliability is given by
$$R_{S} = \sum_{i=k}^{n} \binom{n}{i} R^{i} (1-R)^{n-i},$$
where
$$\binom{n}{i} = \frac{n!}{i!\,(n-i)!}.$$
The system reliability can also be expressed as follows using the path sets (D) of the system:
$$R(p_{1}, p_{2}, \ldots, p_{n}) = \sum_{D} \prod_{i \in D} p_{i} \prod_{j \notin D} q_{j}.$$
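As an illustration of composing these formulas, the Python sketch below computes the reliability of a hypothetical mixed series–parallel structure with seven subsystems; the arrangement and the component reliabilities are assumed example values, not taken from the figure above.

from math import prod

def series(reliabilities):
    """Reliability of subsystems connected in series."""
    return prod(reliabilities)

def parallel(reliabilities):
    """Reliability of subsystems connected in parallel."""
    return 1.0 - prod(1.0 - r for r in reliabilities)

# Hypothetical component reliabilities R1..R7 for an assumed arrangement:
# (S1 || S2 || S3) in series with S4, in series with (S5 || S6 || S7).
R = {i: r for i, r in zip(range(1, 8), (0.90, 0.85, 0.80, 0.99, 0.95, 0.92, 0.88))}

stage1 = parallel([R[1], R[2], R[3]])       # S1, S2, S3 in parallel
stage2 = R[4]                               # S4 in series
stage3 = parallel([R[5], R[6], R[7]])       # S5, S6, S7 in parallel
print("System reliability:", round(series([stage1, stage2, stage3]), 6))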
2.3.2 Network reliability
In an edge fault model it is assumed that elements (nodes, connections) fail randomly and independently, according to certain known probabilities. In the following we consider the network G = (N, E), and assume that it is a directed network having two distinguished nodes: s, the source, and t, the destination. The nodes are assumed to be perfect, but the connections are assumed to fail in a statistically independent fashion with known probabilities qk (k is the index of a connection); pk = 1 − qk is the probability of being operational. A fundamental measure is Rst(G), the two-terminal reliability measure. Let m be the number of edges (m = |E|), every edge having one of two states, working or failed; the state of the network can be represented by a 0-1 array s = (s1, s2, …, sm), where the kth component of the array s is 1 if connection k is working, and otherwise sk = 0. Based on the assumption of independent edge failures, the probability associated with a given state s is given by
$$P(s) = \prod_{k=1}^{m} p_{k}^{s_{k}} (1-p_{k})^{1-s_{k}}.$$
The two-terminal reliability is obtained by considering all network states (2^m of them), filtered by the Boolean indicator Ist(s), which equals 1 if state s contains a working s-t path and 0 otherwise:
$$R_{st}(G) = \sum_{\text{all network states } s} I_{st}(s)\,P(s).$$
Since only states s with a non-null Boolean indicator contribute to the above sum, it is not necessary to consider all states (2^m is a huge number, making the approach impractical), but only those related to s-t paths. The focus will be on the simple paths {P1, P2, …, Pk} of G. Let Ei be the event that all edges in path Pi operate. Therefore
$$R_{st}(G) = P(E_{1} \cup E_{2} \cup \cdots \cup E_{k}).$$
Also, the two-terminal reliability can be formulated considering the cutsets of G (minimal s-t edge disconnecting sets): an s-t edge disconnecting set is minimal if it does not contain any other edge disconnecting set separating s and t. Let {C1, C2, …, Cr} be the s-t cutsets of G. If Fj is the event that all edges in the cutset Cj fail, then the unreliability is given by:
$$U_{st}(G) = P(F_{1} \cup F_{2} \cup \cdots \cup F_{r}).$$
Let e be a particular edge of G. This edge can be made perfect (pe = 1), and the resulting system is denoted by G/e, or it can be failed (pe = 0), and the system is denoted by G−e. According to the pivotal decomposition formula (Moskowitz (1958), cited by Shier (1991)):
$$R_{st}(G) = p_{e}\,R_{st}(G/e) + (1-p_{e})\,R_{st}(G-e).$$
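As a brute-force illustration of the state-enumeration formula above, the Python sketch below computes Rst(G) for a small hypothetical directed network; the topology and edge reliabilities are invented example values, and the approach is practical only for very small m.

from itertools import product

# Brute-force two-terminal reliability R_st(G) by enumerating all 2^m edge states.
# The example network and edge reliabilities are hypothetical.

edges = [("s", "a"), ("s", "b"), ("a", "b"), ("a", "t"), ("b", "t")]   # small directed network
p = [0.9, 0.8, 0.7, 0.85, 0.9]                                          # p_k for each edge

def connected(working_edges, src="s", dst="t"):
    """Boolean indicator I_st(state): is there a working directed s-t path?"""
    reached, frontier = {src}, [src]
    while frontier:
        node = frontier.pop()
        for u, v in working_edges:
            if u == node and v not in reached:
                reached.add(v)
                frontier.append(v)
    return dst in reached

r_st = 0.0
for state in product((0, 1), repeat=len(edges)):                 # all network states
    prob = 1.0
    for k, s_k in enumerate(state):                               # P(state)
        prob *= p[k] if s_k else 1.0 - p[k]
    if connected([e for e, s_k in zip(edges, state) if s_k]):     # I_st(state)
        r_st += prob

print("R_st(G) =", round(r_st, 6))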
2.3.3 Statistical estimation methods
$$\ln L(X, \theta) = \sum_{i=1}^{n} \ln f(X_{i}, \theta),$$
As an example, we estimate the parameter λ of the exponential distribution by the maximum likelihood method. Let X1, X2, …, Xn be a random sample from the exponential distribution, with density function f(x, λ) = λe^{−λx}, λ > 0, x > 0. The likelihood function of the sample is given by
$$L(X, \lambda) = \lambda^{n} e^{-\lambda \sum_{i=1}^{n} x_{i}},$$
and
$$\ln L(X, \lambda) = n \ln \lambda - \lambda \sum_{i=1}^{n} x_{i}.$$
Setting the derivative with respect to λ to zero gives the maximum likelihood estimator
$$\lambda^{*} = \frac{n}{\sum_{i=1}^{n} x_{i}}.$$
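A minimal numerical check of this estimator on simulated data (the true rate below is an arbitrary example value):

import random

# Maximum likelihood estimation of lambda for exponentially distributed data.
# The sample is simulated, so the true parameter is known.
random.seed(42)
true_lambda = 0.5
sample = [random.expovariate(true_lambda) for _ in range(1000)]

lambda_hat = len(sample) / sum(sample)    # lambda* = n / sum(x_i)
print("true lambda:", true_lambda, " MLE estimate:", round(lambda_hat, 4))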
The chi-square goodness-of-fit statistic is
$$S = \sum_{i=1}^{k} \frac{(f_{i} - F_{i})^{2}}{F_{i}},$$
where fi is the frequency of sample observations in each class, and Fi is the theoretical frequency (the area under the density function between the class boundaries).
From the χ² tables, a value of χ² with the desired significance level α and with ν degrees of freedom (ν = k − 1 − p, where p is the number of estimated parameters) leads to rejecting the hypothesis that the sample distribution is the same as the theoretical distribution if
$$S > \chi^{2}_{1-\alpha,\,k-1-p},$$
sample size, and dn,α is the critical value from the table of the Kolmogorov-Smirnov test, then the hypothesis F = F* is rejected only if dn > dn,α. In the table given below, F* is denoted by F0 (Miller, 1956).
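The sketch below applies both tests discussed here to simulated exponential data, so the hypothesized distribution is the true one; SciPy is assumed to be available, and the sample size, rate, and number of classes are arbitrary example values.

import numpy as np
from scipy import stats

# Goodness-of-fit sketch on simulated exponential data (null hypothesis true).
rng = np.random.default_rng(0)
lam = 0.5
data = rng.exponential(scale=1.0 / lam, size=500)

# --- Chi-square goodness of fit: S = sum (f_i - F_i)^2 / F_i -------------
k = 8                                              # number of classes
edges = np.quantile(data, np.linspace(0, 1, k + 1))
edges[0], edges[-1] = 0.0, np.inf
f_obs, _ = np.histogram(data, bins=edges)          # observed class frequencies f_i
cdf = stats.expon(scale=1.0 / lam).cdf
F_exp = len(data) * np.diff(cdf(edges))            # theoretical frequencies F_i
S = np.sum((f_obs - F_exp) ** 2 / F_exp)
dof = k - 1                                        # subtract p if parameters were estimated
print("chi-square S =", round(S, 3),
      " critical value =", round(stats.chi2.ppf(0.95, dof), 3))

# --- Kolmogorov-Smirnov test ----------------------------------------------
d_n, p_value = stats.kstest(data, "expon", args=(0, 1.0 / lam))
print("KS statistic d_n =", round(d_n, 4), " p-value =", round(p_value, 4))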
3 RELIABLE SOFTWARE DEVELOPMENT
least six for complex software projects. One phase can be composed of multiple activities.
Frestimate considers the following nine phases:
Concept,
Software Requirements,
Top Level Design,
Detailed Design,
Code/Implementation,
Unit/development Test,
System/verification/validation Test,
Operational Test, and
Operation.
Analysis
The analysis phase is the first and the most important in the whole process of building high-quality software. During this phase the problem is analyzed, the requirements are defined, and the specifications are established for the subsequent phases.
The problem analysis will establish the scope of the project and a clear understanding of what is required. Through interaction with the customer, the problem will be reformulated in such a way as to be well defined, and the answers to the following questions will be known: What is the context of the problem? What are the parts or elements of the problem? Why does the user need a software product to solve the problem? Who or what is affected by this problem?
Once a well-defined problem is given, the next activity consists of requirements collection and analysis, to establish the product capabilities and constraints. Both qualitative and quantitative requirements have to be identified.
Firstly, the requirements address software quality attributes:
correctness through verification & validation,
reliability by design,
usability based on needs and preferences,
integrity (safety critical software, by cryptographic
techniques),
efficiency (short response times using limited resources),
portability (by implementation),
reusability (by design and implementation),
interoperability (by usage of standards),
maintainability,
flexibility,
testability/verifiability,
survivability,
expandability (by design),
robustness,
stability,
security (by design and implementation),
safety, and
availability (by appropriate service architectures).
Software requirements also establish constraints on the schedule and resources, such as:
computer and input/output equipment,
staff availability for operation,
maintenance actions,
cost for development and operation.
The requirements are not only identified but also analyzed, in order to estimate costs and benefits and to perform risk analysis. Finally, all results of the
requirements collection and analyses are documented in the project
plan, a reference document during all phases of project lifecycle.
According to (Keys, 2005), the costs associated with a computer project are composed of:
systems analysis and design;
purchase of hardware;
software costs;
training costs;
installation costs;
conversion and changeover costs;
redundancy costs, and
operating costs, including people costs.
Some examples of risks are:
customer will change or modify requirements;
technology will not meet expectations (technology advances too fast relative to the length of the software lifecycle);
inexperienced project team;
delivery deadline will be tightened;
users will not attend training;
system will be hacked, etc.
The next activity consists of transforming the user-oriented
requirements into software specifications useful to software engineers
to design and develop the software product. According to ANSI/IEEE standards, the requirements specification shall "clearly and precisely describe the essential functions, performances, design constraints, attributes, and external interfaces. Each requirement shall be defined such that its achievement is capable of being objectively verified…", according to Jones (1978), cited by Pham (2000).
The functionality of the future software (details in input, output,
process captured by the information system, and interfaces), technical
feasibility (characteristics of people and equipment required by the
project development), and software quality specifications should be
detailed.
Not only specifications on reliability and security, but also those
related to software performance and the human factor quality should be
addressed. A technical report will be written, and specifications related to the validation process will be documented.
The overall document on specifications should include at least
the following sections:
Functional requirements,
Performance requirements,
Interface requirements,
Operational requirements,
Verification requirements,
Acceptance testing requirements,
Portability requirements,
Quality requirements,
Maintainability requirements, and
Other requirements.
The benefits of a good software requirements specification are: a complete customer-supplier understanding of the future product's characteristics; a reduced project cost; a realistic estimate of cost, schedule, and risk; a clear understanding of the validation and verification steps; and improved project performability.
Design
The phase of building the software according to the specification
is called design. According to Taylor, design is "the process of applying various techniques and principles for the purpose of defining a device, a process, or a system in sufficient detail to permit its physical realization".
Both architectural (structural) design and detailed (including
algorithms) design activities will contribute to a good design.
The structural design is responsible for system decomposition, in order to identify, in a top-down approach, subsystems and their interfaces. This approach agrees with the current component-based software development methodology, aimed at decreasing the development time.
According to (Gustafson, 2002), the phases of the design process
of software products are:
data design (to produce the data structures),
architectural design (produces the structural units or classes),
interface design (specifies the interfaces between the units),
and
procedural design (specifies the algorithms of each method).
Alternative (and nowadays most used) methodologies for the architectural design are based on the object-oriented approach. Object-oriented systems development follows the same pattern as structured systems development. Firstly, the system is analyzed (object-oriented analysis, OOA) and then designed using object-oriented design (OOD). Finally, coding uses object-oriented programming (OOP) techniques and languages (e.g., C++, Java).
The detailed design should provide:
the software structure (data structures and algorithms, or classes and their diagrams, and structure quality measurements),
the software tools (languages, compilers, etc.),
the verification/validation procedures,
a test plan (items to be tested and test specifications), and
the design documentation.
The documented design specifications should cover, at least, the
following aspects (Curtis et al, 2000; Keys 2005):
an executive summary outlining the principal aspects in the
specification;
a description of the proposed system and its objectives;
a full description of all components (module specifications
and structure charts, together with test data);
descriptions of inputs (including validation tests), outputs
(screen, listings, sound, etc.), and specific interfaces;
a description of all data storage to include specification of file
and database structure;
a detailed specification of controls operating over procedures
within the system,
a specification of all hardware requirements and performance
characteristics to be satisfied,
a detailed schedule for the implementation of the system, cost
estimates and constraints, and
standards to be considered for documentation, coding, testing
and validation, and quality assurance.
Implementation
Transforming algorithms and their data structures into computer programs is called coding or implementation. The design of the algorithms (in the previous phase) is important for producing efficient code in a suitable programming language.
The implementation phase starts with reusable
module/class/package identification (the component-based approach,
the object-oriented design). It continues with code editing (writing new
code, and modifying reusable code) according to a good coding style.
The next step is dedicated to code inspection which includes: code
reviews (mainly the aspects concerning the program logic and code
readability), code quality (performance related concerns on speed, and
memory used, reliability and security) and the software maintainability
(how easy the code will be to maintain). The final step of the implementation phase is related to code testing (modules/classes/packages to be tested, testing strategies and methods, testing schedules, and the resources required for efficient testing).
A good practice is based on the test-driven development methodology: developers are asked to create automated unit tests, containing assertions, before writing the code, as in the minimal sketch below.
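This is a hypothetical example using Python's built-in unittest framework; the function under test (a simple availability calculation) and its test cases are invented for illustration.

import unittest

def availability(mttf, mttr):
    """Steady-state availability = MTTF / (MTTF + MTTR)."""
    if mttf < 0 or mttr < 0:
        raise ValueError("times must be non-negative")
    return mttf / (mttf + mttr)

class TestAvailability(unittest.TestCase):
    def test_typical_values(self):
        self.assertAlmostEqual(availability(480.0, 4.0), 480.0 / 484.0)

    def test_no_downtime_means_full_availability(self):
        self.assertEqual(availability(100.0, 0.0), 1.0)

    def test_negative_input_is_rejected(self):
        with self.assertRaises(ValueError):
            availability(-1.0, 5.0)

if __name__ == "__main__":
    unittest.main()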
Testing
Software testing and debugging represent some of the most important components of the software engineering process (Galen, 2005), with major implications for software reliability. Even when the newest and most powerful automatic testing tools are used and a quality assurance strategy for defect removal is applied, some uncertainties in software testing can be identified in the following activities: test planning, test selection, test execution, test result checking, error tracing, and quality estimation, as (Madsen et al., 2006) state.
Software testing methods and techniques "vary greatly in variety, effectiveness, cost, need for automated tool support, and ease of use", according to Miller. Even though "program testing can be used to show the presence of bugs, but never to show their absence", as Dahl et al. (1972) say, every test performed increases intrinsic software quality, either by revealing no fault or by discovering a latent fault.
The existence of specifications is essential to software testing.
Correctness in software has a meaning only if the program mapping is
the same as the specification mapping. According to (Gustafson, 2002),
"a program without a specification is always correct". Therefore, software without a specification does what it does and cannot be tested against any specification, because it violates nothing.
Following Hailpern & Santhanam (2002), given a specification and a computer program, "any activity that exposes the program behavior violating a specification can be called testing."
The efforts will be directed to test against the following classes of
faults, according to (Madsen et al, 2006):
physical/human-made,
accidental/intentional non-malicious/intentional malicious,
development/operational,
internal/external, and
permanent/temporary.
In this view, not only program running, but also activities like design reviews, code inspections, and static analysis of source code are labeled as testing activities, even though they form the so-called "static testing".
According to the mentioned source, "each occurrence of the program
design or the program code that fails to meet a specification is a defect
(bug)."
Following (Madsen et al., 2006), the design of test cases could be based on knowledge of the code: the white-box method. Also, the black-box testing approach, addressing the overall functionality of the software, is used to discover faults like incorrect or missing functions, errors in any of the communication interfaces, errors in data structure management (including databases), and defects related to response time, software start-up or software termination.
According to the mentioned reference, the white-box testing
method consists of the following tasks:
a) the investigation of all independent paths within every module, based on flow graph analysis: "structured testing" using the Cyclomatic complexity metric;
b) the examination of all logical predicates: "branch/domain testing";
c) the analysis of all loops to check their limits, and
d) the internal/external data structures validity checking.
entity). They have to use various approaches depending on the programming language used during the coding phase. In a test-driven development approach the developers often use testing frameworks to create and automatically run sets of test cases. Unit testing frameworks are available for many programming languages. A short list is presented in the next table:
Bottom-Up Testing (where the lowest-level components are tested first, then used to facilitate the testing of higher-level components), Top-Down Testing (testing is conducted from the main module to the sub-modules; if a sub-module is not yet developed, a temporary program called a STUB is used to simulate the component functionality), and Sandwich Testing (combining top-down testing with bottom-up testing).
The acceptance test is a validation of the testing phase and makes use of internal tests (in-house) and field tests (user environment). The acceptance test is defined by Pham as "the formal testing conducted to determine whether a software system satisfies its acceptance criteria and to enable the customer to determine whether the system is acceptable".
uninstall (before retirement), and
retire (at the end of the life cycle of a software product).
***
B. Applications development — JAD follows JRP, involves users
in workshops;
C. Systems construction — design specifications, used to
develop and generate code, and
D. Cutover — users trained and the system tested.
3. Extreme programming (XP) is a newer programming methodology, adopted a few years ago. According to (Keys, 2005), XP is the application of a group of practices to software development projects, such as:
1) The planning game: collaboration of business and
programming professionals to estimate the time for short
tasks;
2) Small releases: a ship cycle measured in weeks rather than
years;
3) Metaphor: a single overarching metaphor to guide
development substitutes for a formal architectural
specification;
4) Simple design: no provision in the code for future changes or
flexibility;
5) Testing: every piece of code exercised by unit tests when
written, and the full suite of tests when integrated;
6) Refactoring: any piece of code subject to rewriting to make it
simpler;
7) Pair programming: all production code jointly written by two
developers;
8) Collective ownership: the right of any developer to change any
piece of code on the project;
9) Continuous integration: code integrated into the main line every few hours;
10) On-site customer: a business person dedicated to the project, and
11) Coding standards: one standard per project.
CASE tools
The term computer-aided software engineering (CASE) is used to
describe a set of tools which are able to automate all or some of various
tasks of the software life cycle, like: requirements capture, tracing, and
tracking; configuration management; model verification; facilitation of
model validation; maintenance of all project-related information;
collection and reporting of project management data, document
production and CASE data import-export.
Mainly, the CASE functions include analysis, design, and
programming. Therefore, a CASE framework provides design editors,
data dictionaries, compilers, debuggers, system building tools, etc.
There is a large variety of CASE products. A CASE product index is maintained at (http://www.unl.csi.cuny.edu/faqs/software-enginering/tools.html), each product belonging to one of the following categories:
CASE tools (provide support for some specific tasks in the
software process),
CASE workbenches (supporting only one or few activities), and
CASE Environments (supporting a large part of the software
process).
The CASE Environments can be simple toolkits (example: the
Unix Programmer's Work Bench), oriented towards a programming
language (IBM Rational ClearCase), integrated (IBM Application
Development Cycle), fourth generation language (4GL) based (Informix
4GL, FourGen CASE Tools: http://www.gillani.com/CASETools.htm),
process-centered environments (Enterprise Architect: Enterprise II),
and CASE applications (covering all aspects of the software development
life cycle).
SourceForge.net provides a web-based platform for creating and
publishing open software. The user can find a lot of tools useful to
software project management. However, software reliability is not
covered, in general, by CASE products. There are specific software tools,
called Computer Aided Software Reliability Engineering, many of them
being dedicated to estimation of software reliability, such as ROBUST
(Reliability of Basic and Ultrareliable Software sysTem), FRestimate,
TERSE, SREPT (Software Reliability Estimation and Prediction Tool),
SRETOOLS (AT&T Software Reliability Engineering Toolkit), SRMP
(Statistical Modeling and Reliability Program), SoRel (Software
Reliability Program), CASRE (Computer-Aided Software Reliability
Estimation Tool), ESTM (Economic Stop Testing Model Tool), SMERFS
(Statistical Modeling and Estimation of Software Reliability Functions),
RGA (Software for Repairable System and Reliability Growth Analysis), and DACS's GOEL (an automated version of the Goel-Okumoto NHPP Software Reliability Growth Model), according to the study by (Chen et al., TENCON06), which presents a powerful computer-aided reliability assessment tool for software based on object-oriented analysis and design, called CARATS. Details of the functionalities of some software reliability tools are given in a later chapter.
From the software reliability engineer's perspective, the following flowchart illustrates a basic strategy for deploying reliable software.
Fig. 3.1. The algorithm of deploying reliable software
3.2 Developing operational profiles
The first four profiles (customer, user, system-mode, functional) are on the design level of a system, while the last profile (operational) is on the implementation level and deals with the actually coded operations of a system (Koziolek 2005).
The profiles are defined by creating hierarchical lists of
customers, users, modes, functions and operations that the software has
to provide under each set of conditions. As an example, an operational
profile can have a tabular representation, specifying the name for each
operation and its associated probability, or a graphical representation
(with attributes of operations as nodes and the values of the attributes
as branches).
Customer profile
A complete set of customer groups with corresponding
occurrence probabilities makes up the customer profile. Customers are
persons, groups, or institutions that purchase the system. They can, but need not, be the users of the system at the same time. Customers in a
customer group use the system in the same way. For example,
companies with an equal number of employees may use a telephone
switching system in the same way because they have the same number
of users even though their businesses are different.
User profile
Users are persons, groups, or institutions that use a system. All
members of a user group use the system in the same way. Examples for
user groups are system administrators, maintenance users, regular
users etc. The user profile can be derived by taking the customer profile
and determining the user groups for each customer group. Hence, the
overall occurrence probabilities for user groups can be obtained by
multiplying the probabilities for each user group of a customer group
with the occurrence probability of that customer group.
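For instance, if a customer group has occurrence probability 0.6 and, within it, regular users account for 0.9 and administrators for 0.1 of the use, the overall user-group probabilities are 0.6 × 0.9 = 0.54 and 0.6 × 0.1 = 0.06 (an illustrative calculation).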
on server 1 versus functions executed on server 2) are some examples
of system modes.
The system modes are the consequence of factors like: day of week or time of day, time of year, traffic levels, system maturity, and system capability.
Functional profile
For (Musa 1993, 1999) a function is a task or part of work of a
system as defined during the design. Functional profiles are usually
designed during the requirement phases or during early design phases.
Before designing a functional profile it is often helpful to construct a
work-flow model capturing the overall processes and the context of the
system (i.e. software, hardware, and people). To create a functional
profile the system modes have to be broken down into single functions.
Therefore, the functional profile is a complete set of functions with
corresponding occurrence probabilities. The functional profile is
independent of the design methodology.
Operational profile
Operations, as opposed to functions, are actually implemented
tasks of a system, while functions are tasks of a system on the design
level. The functions of the functional profile evolve onto operations of
the system. The complete set of operations with corresponding
occurrence probabilities makes up the operational profile, as (Koziolek
2005) defined.
With the occurrence probabilities of the operational profile, test
cases can be selected efficiently because the most used operations will
be tested the most.
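As a minimal illustration of this idea (the operation names and probabilities below are invented, not taken from any real profile), a test driver written in C can draw the next operation to exercise in proportion to its occurrence probability:

/* Sketch: selecting the next test operation according to an assumed
   operational profile.  Names and probabilities are illustrative only. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

struct op { const char *name; double prob; };

static struct op profile[] = {
    { "process_call",   0.74 },
    { "route_call",     0.15 },
    { "generate_bill",  0.10 },
    { "audit_database", 0.01 }
};

/* Draw one operation: u is uniform in [0,1); walk the cumulative profile. */
static const char *next_operation(void)
{
    double u = (double)rand() / ((double)RAND_MAX + 1.0);
    double cum = 0.0;
    for (size_t i = 0; i < sizeof profile / sizeof profile[0]; i++) {
        cum += profile[i].prob;
        if (u < cum)
            return profile[i].name;
    }
    return profile[sizeof profile / sizeof profile[0] - 1].name;
}

int main(void)
{
    srand((unsigned)time(NULL));
    for (int i = 0; i < 10; i++)
        printf("test %d -> %s\n", i + 1, next_operation());
    return 0;
}

Over many draws, the most frequently used operations are exercised most often, which is exactly the property that makes operational-profile-driven testing efficient.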
To capture various particular situations and difficulties in the
decomposition process to the lowest level the concept of extended
operational profile, consisting of a process profile, a structural profile
and a data profile, was introduced.
The process profile is similar to Musa s operational profile. The
structural profile characterizes the data structures of the application
and its configuration, and includes a description of the software and
hardware environment of the software. The data profile is concerned with the values assigned to the variables.
Also a Markov Chain Usage Model is available for modeling
sequences of inputs to a software system. The model has two phases: the
structural phase (the state creation and connection of the consecutive
states by arcs), and the statistical phase (assigning probabilities to arcs).
Let E denote the set of all operations and let Yt be the operation performed by the system at time t. Then {Yt; t ≥ 0} is a continuous-time process, called the operational process, whose state space E is the operational space.
Let π(i) = lim_{t→∞} P[Yt = i] (assumed to exist) be the proportion of time that operation i is performed in the long run. Therefore, the pair (E, π) can be considered the operational profile. The analysis of the software failure process depends on the stochastic structure of the operational process.
Let Xn denote the nth operation that the system performs and Sn the time at which the nth operation starts. We assume that the duration of the nth operation is exponentially distributed with rate μ(i) if this operation is i. Connecting with the process Y, we have Yt = Xn for Sn ≤ t < Sn+1.
Let A be the generator matrix of Y: Aij = μ(i)[P(i, j) − I(i, j)], where P(i, j) = P[Xn+1 = j | Xn = i] and I is the identity matrix. Let ν(i) = lim_{n→∞} P[Xn = i] (assumed to exist). These limiting distributions satisfy πA = 0 and νP = ν, with Σ_{i∈E} π(i) = 1 and Σ_{i∈E} ν(i) = 1, respectively.
Also, for any operation i,
π(i) = [ν(i)/μ(i)] / Σ_{j∈E} [ν(j)/μ(j)].
The ν(i) can be obtained from mean recurrence times: the mean first passage times r(i, j) satisfy the system of linear equations
r(i, j) = 1 + Σ_{k≠j} P(i, k) r(k, j),
and ν(i) = 1/m(i, i), where m(i, i) is the mean recurrence time of operation i.
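A small sketch of the proportion-of-time formula above, with invented values for the embedded-chain proportions ν(i) (nu) and the completion rates μ(i) (mu), is:

/* Sketch: compute the long-run time proportions pi(i) from nu(i) and mu(i).
   The three operations and all numbers are illustrative. */
#include <stdio.h>

#define NOPS 3

int main(void)
{
    double nu[NOPS] = { 0.5, 0.3, 0.2 };   /* nu(i): fraction of operations of type i */
    double mu[NOPS] = { 10.0, 2.0, 1.0 };  /* mu(i): completion rate of operation i   */
    double pi[NOPS], denom = 0.0;

    for (int i = 0; i < NOPS; i++)
        denom += nu[i] / mu[i];
    for (int i = 0; i < NOPS; i++) {
        pi[i] = (nu[i] / mu[i]) / denom;
        printf("pi(%d) = %.4f\n", i, pi[i]);
    }
    return 0;
}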
From a practical point of view the operational profiles should be
updated on a regular basis since they can change over time, and the
number of operations that are being tested should be limited, as (Vouk
2000) observed.
Finally, the development of operational profiles as described
above helps to focus resources on the most used and/or most critical
functions and to optimize the testing phase by making the tests highly
representative of the operational field.
3.3 Software failures and failure processes
3.3.1 Faults
Many faults are introduced during the implementation (coding) phase, depending on the programmer's experience. Static analyzers can be used, but they have important limitations.
If software is modified during maintenance phase, faults can be
introduced by re-engineering (new requirements, design, etc.).
Depending on the behavior, the faults can be classified as
transient (appear for a short period of time and then disappear),
permanent (software design defects), and intermittent (appear and
disappear from time to time).
To help software engineers, in 1990 IBM invented the ODC (Orthogonal Defect Classification) approach, a technology for software process measurement and analysis that provides definitions for defect classification as well as analysis metrics and procedures for process evaluation. A defect will belong to those classes that collectively point
to the part of the process which needs attention.
The defect type is a key concept in ODC. Other attributes used by
ODC are: defect trigger (to provide measurement instruments of the
causal relationship of software defects), and defect impact (to measure
the impact of the defect in terms of the effect of failure on the customer).
Following (Chillarege et al. 1992), two assumptions are required: there exists a semantic classification of defects, from a product, such that the defect classes can be related to the process and can explain the progress of the product through this process (the necessary condition); and the set of all values of defect attributes must form a spanning set over the process sub-space (the sufficient condition).
(Churamani) reports a list of 31 defect types along the software
development process:
requirements analysis (3) = {requirements not captured,
requirements captured incorrectly, violation of requirement
analysis guidelines or templates},
design (20) = {incomplete screen design, incorrect screen
design, incomplete database design, incomplete report layout,
incorrect report layout, incomplete interface definition,
incorrect interface definition, incomplete security design,
incorrect security design, incorrect functionality, violation of
design guidelines or templates, incorrect/not optimized SQL,
incorrect logic in design, incomplete Unit Test Cases, incorrect
Unit Test Cases, incomplete Integration Test Cases, incorrect
Integration Test Cases, incomplete Functional Test Cases,
incorrect Functional Test Cases},
implementation (5) = {code missing, incorrect logic in code,
exception handling error, navigation errors, violation of
coding guidelines or templates}, and
documentation (3) = {incomplete documentation, incorrect
documentation, violation of documentation guidelines or
templates}.
The defect types should be chosen so as to be general enough to
apply to any software development, independent of a specific product,
according to (Chillarege, Lyu). The classification should be applied to a
defect found in any phase, and the defect types should span the space of
phases to satisfy the sufficient condition. A defect type is important also
for the programmer making changes for a defect. Some types of defects
were also described by (Chillarege, Lyu) in order to help programmers:
1) A function defect is one that affects significant capability, end-user features, the product application programming interface, the interface with hardware architecture, or global structure;
2) An assignment defect indicates a few lines of code, such as the initialization of control blocks or data structures;
3) An operand defect is one that affects the value of an operand after a type cast (conversion) (see the case of the Ariane 5 project);
4) An interface defect corresponds to errors in interacting with other components, modules, or device drivers via macros, call statements, control blocks, or parameter lists;
5) A checking defect addresses program logic: improper validation conditions, loop conditions, etc.;
6) Timing/serialization defects are the result of a timing error
between systems/subsystems, modules, software/hardware
or are the result of the omission or an incorrect use of
serialization for controlling access to a shared resource;
7) An algorithm defect is the result of efficiency or correctness
problems that affect the task and can be fixed by
(re)implementing an algorithm or local data structure;
8) A Build/Package/Merge defect is one encountered during the system build process, resulting from the library systems or from the management of change or version control;
9) A documentation defect is the result of an error in the written description contained in user guides, installation manuals, prologues, and code comments.
ODC uses three sets of defect triggers: Review and Inspection
Triggers, Unit and Function Test Triggers, and System and Field Test
Triggers. Some triggers belong to both the set of Field Test Triggers and
the set of Inspection Triggers or Function Test Triggers.
The set of Review and Inspection Triggers contains:
DC-Design Conformance (the fault was detected while
comparing the work product being inspected with the
corresponding specification from the prior development
phase(s)),
OS-Operational Semantics (the fault was detected while
considering the logic flow needed to implement the function
under review),
C-Concurrency (The error was detected while considering
the synchronization needed between tasks to control a shared
resource),
BC-Backward Compatibility (the fault was detected while
considering the compatibility between the function described
by the work product under review and that of prior versions
of the same product),
LC-Lateral Compatibility (the fault was detected while
considering the compatibility between the function described
by the work product under review and other
systems/subsystems or functions that it must interface with),
RS-Rare Situation (the fault was detected while considering
some abnormal system behavior that is not specified by the
requirements for the function under review),
SE-Side Effects (the fault was detected while considering
some system, product, function, or behavior that may occur
that is beyond the scope of the work product under review),
DC-Document Consistency/Completeness (Internal Document; the error was detected while reviewing the work product for consistency/completeness and conformance to documentation standards),
LD-Language Dependencies (the fault was detected while
reviewing the work product for implementation language
specific details).
Test Sequencing (the fault was detected by using Black Box
testing to execute multiple functions in a very specific
sequence), and
Test Interaction (the fault was detected by using Black Box testing to execute multiple functions in combination. The interaction involves more than simple sequencing. This trigger applies only if the functions operate correctly when tested independently but fail when executed together).
Startup/Restart (the fault was detected while the
system/subsystem was being initialized or restarted
following an earlier shutdown or complete system or
subsystem failure),
Hardware Configuration (the fault was detected as a result
of testing a particular hardware configuration), and
Software Configuration (the fault was detected as a result of
testing a particular software configuration).
corrective fixes without impacting the customer's data and
operations),
Documentation (the degree to which the product/system
documentation and user's manuals are correct and aid in the
customer's understanding and use of the product/system),
Migration (the ease and degree to which the product/system
can be upgraded to the newer release without impacting the
customer's data and/or operations),
Standards (the degree to which the product/system
conforms to established pertinent standards), and
Integrity/Security (the degree to which the product/system
is protected from inadvertent or malicious destruction,
modification, or disclosure).
stack locations, instead of keeping them in registers. The second reason
is that debuggers commonly provide watches or other user interfaces
that cause code to be executed, which, in general, change the state of the
program. A bohrbug (named after the Bohr atom model) is a bug that, in
contrast with heisenbugs, does not disappear or alter its characteristics
when it is researched. Bohrbugs are easily identified using a test-and-debug approach during the coding process, or by unit testing done by the programmer, as Rainsberger & Stirling (2005) propose. A special case of heisenbug is the mandelbug (named after fractal innovator Benoit Mandelbrot), a computer bug whose causes are so complex that its behavior appears chaotic. A schroedinbug is a bug that manifests itself apparently only after the software is used in an unusual way.
In C/C++, some heisenbugs arise from uninitialized auto variables, "fandango on core" phenomena (corruption of the heap memory under programmer management), or errors in stack management.
The bugs that depend on the environment are called agingbugs and are caused by deterioration in the availability of operating system resources, data corruption, and numerical error accumulation. Gradual service degradation and crash failures are some effects of these sorts of bugs. Reactive approaches (restart, rollback, roll-forward, progressive retry, occasional reboot) and proactive approaches such as software rejuvenation, introduced by Huang et al. (1995), are popular when dealing with agingbugs. Software rejuvenation is
primarily indicated for servers where applications are intended to run
indefinitely without failure as shown by Shankland (2000). Other
approaches in software fault tolerance can be found in EventHelix
(2000).
3.3.2 Software failures
In general, classes are separated by factors of 10, as (Musa, 1999)
proposed. Another measure is based on the system capability impact:
The class 4 can cover those failures generating minor deficiencies in one
or more operations, the class 3 can cover failures which make
unavailable to users one or more minor operations, the class 2 can
include those failures generating the unavailability of one or more
important operations, and the class 1 will cover those who make
unavailable to users one or more key operations. The number of classes
is only an example. The scale can be improved taking into consideration
more details on system capability impact.
Another important term for software reliability engineers is the
failure intensity (failures per unit time). The unit time can be the
operating time (software age) or the amount of processing done.
In order to define the necessary reliability it is necessary to define failures with severity classes, to choose a common measure, and to set a failure intensity objective, according to the following steps (Musa, 1999):
1) Establish known component values;
2) Pick a trial allocation of component failure intensity
objectives to minimize system development time,
development risk, or development cost;
3) Compute a trial system failure intensity from trial component
failure intensities and compare this to the requirement;
4) Modify the allocation and recompute the trial system failure intensity until it matches the requirement.
Not only failures and their impact are relevant for software
reliability engineering, but also the failure mode in order to evaluate
effects and eliminate the generative causes.
In general, a failure mode and effects analysis (FMEA) can be
defined as a systematic way of identifying failure modes of a system,
item or function, and evaluating the effects of the failure modes on the
higher level , according to (Pentti & Atte, 2002). The objective of FMEA
is to determine the causes for the failure modes and decisions to be
taken to eliminate or reduce the chance of failure.
For all kinds of systems, a failure mode is the manner by which a
failure is observed, and describes the way the failure occurs and its
impact on equipment operation. The failure effect is represented by the
consequence(s) a failure mode has on the operation, function, or status
of an item. Failure effects are classified as local effect, higher level, and
end effect.
In software engineering, the failure modes of software are generally unknown. The software modules do not fail; they only display incorrect results. The definition of failure modes is one of the
hardest parts of the FMEA of a software-based system. To discover this
incorrect behavior the software engineer has to apply his own
knowledge about the software to set up an appropriate FMEA approach.
The failure mode and effects analysis for hardware or software has certain distinguishing characteristics, as Ristord et al. (2001) describe. The hardware FMEA may be performed at functional level or part level; it applies to a system considered as free from failed components; it postulates failures of hardware components according to failure modes due to ageing, wearing or stress; it analyses the consequences of these failures at system level, and states the criticality
and the measures taken to prevent or mitigate the consequences. When considering software, the FMEA (let us denote it SWFMEA) is only used at the functional level. Moreover, SWFMEA applies to a system considered as containing software faults which may lead to failure under triggering conditions. Also, it postulates failures of software components according to functional failure modes due to potential software faults, and analyses the consequences of these failures at system level.
SWFMEA states the criticality and describes the measures taken to
prevent or mitigate the consequences.
There are many reports of successful application of SWFMEA, as identified by (Pentti & Atte, 2002). For (Goddard 2000), SWFMEA
validates that the software has been constructed to achieve the specified
safety requirements, while (Maier 1997) described the use of FMEA
during the development of robot control system software for a fusion
reactor, to mention only a few of them.
(Reifer 1979) gives the following general list of failure modes based on the analysis of three large software projects: computational, logic, data I/O, data handling, interface, data definition, database, and other.
(Ristord et al. 2001) give a list of five general purpose failure
modes at processing unit level: the operating system stops, the program
stops with a clear message, the program stops without clear message,
the program runs, producing obviously wrong results, and the program
runs, producing apparently correct but in fact wrong results.
For each input and each output of the software component, (Lutz
et. al. 1999) consider the following four failure modes:
missing data (e.g., lost message, data loss due to hardware
failure),
incorrect data (e.g., inaccurate data),
timing of data (e.g., obsolete data, data arrives too soon for
processing), and
extra data (e.g., data redundancy, data overflow).
For each event (step in processing), on the other hand, they
consider the following four failure modes:
halt/abnormal termination (e.g., hung or deadlocked at this
point),
omitted event (e.g., event does not take place, but execution
continues),
incorrect logic (e.g., preconditions are inaccurate; event does
not implement intent), and
timing/order (e.g., event occurs in wrong order; event occurs
too early/late).
To close this section, we have to say that reliability is a necessary,
but not sufficient condition for software acceptance by the customer.
However, establishing a reliability objective under various constraints is good practice and is requested by the Design by Contract development paradigm.
Also, FMEA is well understood at the systems and hardware
levels, where the potential failure modes usually are known and the task
is to analyze their effects on system behavior. A common opinion is that
FMEA is applicable to software-based systems but only to a limited
extent.
3.3.3 Failure processes
Assume: 1) A program initially contains m faults when the test phase starts; 2) A test is defined as a number of test instances (couples of input data and output data) denoted by ti, i = 1, 2, …, n; 3) Detected faults are not removed between test instances.
The model assumes two categories of faults: newly discovered faults, and rediscovered faults.
Notations: Wi – the number of active faults during test instance ti;
Ci-1 is the cumulative number of errors already detected so far by t1, t2, …,
ti-1; Ni – the number of newly detected errors by time ti; ni – an observed
instance of Ni.
Assuming a hypergeometric distribution for the newly detected
faults Ni, the probability of obtaining exactly ni newly detected faults
among Wi faults is:
P(Ni = ni) = C(m − Ci−1, ni) · C(Ci−1, Wi − ni) / C(m, Wi),
where C(·,·) denotes the binomial coefficient,
Ci−1 = Σ_{k=1}^{i−1} nk,  C0 = 0, n0 = 0,
and
max{0, Wi − Ci−1} ≤ ni ≤ min{Wi, m − Ci−1}, for all i.
Therefore, the expected number of newly detected faults during the interval [ti−1, ti] is
E(Ni) = (m − Ci−1) Wi / m,
and the expected value of Ci is
E(Ci) = m [1 − Π_{j=1}^{i} (1 − Wj/m)].
3.4 Factors that affect Software Reliability
programming experience, PS = (Σ_{i=1}^{n} Yi) / n, where n is the total number of
programmers involved in the project, and Yi is the number of years of
experience of programmer i, i = 1, 2, …, n.
4. Percentage of Reused Code (PORC) is defined taking into
account KLOC for the existing modules (denoted by S0), and KLOC for
new modules (denoted by SN):
PORC = S0 / (S0 + SN).
Other factors are measured by the ratios ICON = nh / n and TCVG = Sc / St, and by LPT = (Σ_{i=1}^{n} Ri) / n,
where n is the number of categories, and Ri is the rating score of the
category i.
8. Percentage of Reused and Modified Modules (PRMM) can be
computed by:
PRMM = LOCm / LOCn, where LOCm is the number of lines of code in modified modules and LOCn is the number of lines of code in new modules.
9. Testing Effort (TE) is defined as the number of testing cases
generated, testing expenditure, or the number of human years that the
testing takes.
10. Testing Environment (TENV) is defined as the degree of the
compatibility of testing and operational environments.
design to requirements; work standards; development
management),
Coding (programmer skill; programmer organization;
development team size; program workload; domain
knowledge; human nature like mistake and work omission),
Testing (testing environment; testing effort; testing resource
allocation; testing methodologies; testing coverage; testing
tools; documentation), and
Hardware systems (processors; storage devices,
input/output devices, telecommunication device, system
software).
4 SOFTWARE TESTING FOR CERTIFICATION AND
RELIABILITY GROWTH
Alpha testing is done at the end of development on a virtual user
environment created according to the user requirements.
Beta testing is done by end-users or others (third parties), and
represents the final testing before releasing the application.
Acceptance testing is done to verify whether the system meets the customer-specified requirements.
Load testing is a performance testing to check system behavior
under load.
Another software testing concept used by software reliability engineers is regression testing, which asks for testing the application as a whole after the modification of any module or functionality.
The acceptance testing is based both on simple and complex
tests. A simple test consists of single execution of operations, with
minimized interactions between the operations. Complex tests are used
for system testing and integration testing.
Preparing for test assumes recording the user and other interface interactions and the environmental variables necessary to initiate the runs. Run categories can be established for careful test selection and improved testing.
According to Musa(1999), the process for preparing test cases
needs:
1) to estimate the number of new test cases required for the
current release;
2) to allocate the number of test cases among the (sub)systems
to be tested;
3) to allocate the number of new test cases for each system
among its new operations;
4) to specify the new test cases;
5) to add the new test cases to the pool of test cases from
previous releases.
Estimating n1, the number of new test cases, under constraints of time and cost is a challenging task. Let t1 be the number of test cases that can be prepared in the specified time, and t2 the number of test cases that can be prepared under the specified cost. Then n1 = min(t1, t2).
The number t1 depends proportionally on the available time and the
available staff. The available time can be considered as the interval
between requirements establishing (the end of), and the start of test.
The number t2 depends on the budget allocated for test case
preparation. In general, the number of new test cases is greater than the
number of new operations (at least one test case is necessary for each
operation). Sanity testing (checking whether a new software version performs well enough to be accepted for a major testing effort) is necessary before releasing the new software version.
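For instance, if the available time and staff allow t1 = 400 test cases to be prepared while the allocated budget allows only t2 = 350, then n1 = min(400, 350) = 350 (an illustrative calculation).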
To establish n2(k) – the number of new test cases allocated to the
(sub) system k, it is necessary to give particular weight to the different
operations that have high occurrence probabilities, as Musa(1999)
recommended. At least two subsystems can be considered: operating
system – n2(1), and application n2(2). Therefore, n1 = n2(1)+n2(2).
Having n2(k), and with o(k) the number of new operations of subsystem k, the allocation can be realized in four steps (Musa, 1999), sketched in code after the list:
1) starting with the operational profile obtain the occurrence
probability for each operation;
2) identify the rarely occurring critical (with great impact in
case of failure) new operations and determine the number of
test cases to be assigned;
3) determine the allocation probabilities for the other new
operations and pre-assign one test case to each infrequent
other new operation;
4) assign the remaining test cases to the remaining other new
operations in accordance with the allocation probabilities.
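The following C sketch illustrates steps 3) and 4) for one subsystem, using invented occurrence probabilities and an invented total of 100 new test cases; rarely occurring critical operations would be handled separately, as in step 2):

/* Sketch: proportional allocation of new test cases to new operations,
   with at least one test case pre-assigned to each infrequent operation.
   All numbers are illustrative. */
#include <stdio.h>

#define NOPS 4

int main(void)
{
    double p[NOPS] = { 0.55, 0.30, 0.10, 0.05 }; /* occurrence probabilities */
    int total = 100;                             /* new test cases for this subsystem */
    int alloc[NOPS], used = 0;

    for (int i = 0; i < NOPS; i++) {
        alloc[i] = (int)(p[i] * total);
        if (alloc[i] < 1)            /* pre-assign at least one test case */
            alloc[i] = 1;
        used += alloc[i];
    }
    alloc[0] += total - used;        /* give any remainder to the most frequent operation */

    for (int i = 0; i < NOPS; i++)
        printf("operation %d: %d test cases\n", i, alloc[i]);
    return 0;
}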
Test case specification assumes equal probability of test case selection among all the choices defined by the possible combinations of subsets (called levels by Musa (1999)) of the input domains of the input variables defining the operation.
A test procedure is required for each operational mode. Test procedures are executable. A test procedure is essentially a group of small test cases, at most 10.
A test case can be recorded (in a data base) taking into account
the following fields:
test case id,
unit to test,
assumptions,
test data,
steps to be executed,
expected result,
actual result,
pass/fail,
comments.
A basic format of a test case statement is:
Verify [action to do]
using [tool name, dialog, … ]
with [conditions] to [demonstrate, show, …].
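For instance, such a statement could read: Verify that a new customer account is created, using the registration dialog, with valid input data, to demonstrate successful enrolment (an invented example).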
4.2 Executing test
Any testing approach, including software reliability engineering
testing, makes use of test cases, test scripts, test scenarios and test runs.
A test case is a document that defines a test item and specifies a
set of test inputs or data, execution conditions, and expected results. A
test case is generally executed manually but many test cases can be
combining for automated execution (by a test execution engine).
A test script is a step-by-step procedure for using a test case to
test a specific unit of code, function, or capability.
A test scenario is a chronological record of the details of the
execution of a test script.
A test run represents a series of logically related groups of test
cases or conditions. The homogeneity of a set of runs is the proportion
that shares the same failure behavior. The sets of runs with
homogeneity close to 1 are called run categories. If one run in the run
category executes successfully, there is a high probability that all runs
will be successful. Hence, establishing run categories can improve the
testing efficiency, because testing must be as short as possible in order
to keep costs under control.
According to Musa (1999), executing the test requires the following activities: testing time allocation, test invocation, and failure identification.
The test time is measured, in general, in hours and is allocated
taking into account:
1) the systems (components) to be tested;
2) the type of test (from feature, regression, and load test);
3) the operational modes.
Musa (1999) recommends the following rules.
R1. In the case of a reliability growth test:
- The allocation among systems is based on the
estimated risk;
- Same proportion of test time will be allocated for
every new test case;
- Allow enough time for regression test;
- Allocate the remaining time for load test.
R2. When only certification test is planned, the total time will
be allocated to load test. The load test should be divided
among the operational modes in the same proportions of the
exploitation in the field.
R3. There are no guidelines for determining the total amount
of feature and load test time. Only the principal variables are
known, but not the model parameters.
t_CT = T_N / λF,
where t_CT is the amount of testing time in natural or clock time units, λF is the failure intensity objective (in the same units), and T_N is a normalized measure given by
T_N = ln(β/(1 − α)) / (1 − γ),
with α – the producer risk, β – the consumer risk, and γ – the discrimination ratio.
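As a sketch of this computation, with illustrative values α = β = 0.1, γ = 2 and a failure intensity objective of 0.001 failures per hour:

/* Sketch of the certification-test time formula above, with illustrative
   risk levels (alpha = beta = 0.1) and discrimination ratio gamma = 2. */
#include <stdio.h>
#include <math.h>

int main(void)
{
    double alpha = 0.1;      /* supplier (producer) risk */
    double beta  = 0.1;      /* consumer risk            */
    double gamma = 2.0;      /* discrimination ratio     */
    double lambdaF = 0.001;  /* failure intensity objective, failures/hour */

    double TN  = log(beta / (1.0 - alpha)) / (1.0 - gamma); /* normalized measure */
    double tCT = TN / lambdaF;                              /* required test time */

    printf("TN = %.3f, tCT = %.1f hours\n", TN, tCT);
    return 0;
}

With these risk levels T_N is about 2.2, which is why roughly 2.2/λF failure-free hours are needed for certification.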
The SRE test starts only after the units of the system have been tested and their integration has been verified. Test invocation follows the operational mode invocation.
The number of test cases invoked depends on the time the
operational mode runs and its occurrence rate. From a reliability point of
view, the testing driven by an operational profile is efficient. It identifies
failures, on average, according to their occurrence pattern.
Also, in load test scripts, dynamic data can be used to reproduce
more complicated and time-sensitive bugs.
To identify failures, the following tasks are necessary:
1) analyze the test output for deviations from expected behavior
(by manual inspection, built-in audit code usage, or using
external checkers);
2) determine those deviations which can be considered as
failures (violate the user requirements);
3) establish when the failures occurred (using the same measure
that used in setting the failure intensity objective).
executing test (by testing tools);
store the test results (in a database, for future use);
analyze test results (statistics, data mining),
authenticate the test operators.
As a consequence of the considerations presented above, let us outline the following:
- Ensure that the test environment is correctly configured for
metrics collection.
- Use the system manually during test execution so that you can
compare your observations with the results data at a later time.
- Using dynamic test data in a load test allows the use of erroneous values if they suit the test plan.
- Run a test that loops through all of the selected data to check for unexpected errors.
- Run tests in a variety of combinations and sequences to ensure that one test does not corrupt data needed by another test in order to run properly.
- When using agile methods in software engineering, modify the tests, test designs, and subsequent strategies as results analysis leads to new priorities. If repeating the test is necessary, consider establishing a test data restore point before beginning the testing.
- Analyze results immediately and modify the test plan
accordingly.
4.3 Software Reliability Data Collection
The clock time counts only the time during which the software runs (down time is excluded); both the run ID and the time are recorded.
The execution time counts only that part of the clock time during
which the processor is executing instructions of the software being
monitored (at other times the software may be suspended, or the
processor will be executing other programs, in a time-sharing task
scheduling approach).
Musa (1999) proposed that all reliability theory should be based on execution time, which is a better measure of the failure-inducing stress. Musa recommends the usage of natural units or time units when defining the reliability quantities. A natural unit is a unit that is related to the output of a software-based product and hence to the amount of processing done; it also represents the amount of failure-inducing stress, so the number of failures experienced is related to the number of natural units that have occurred. Some examples follow: one failure per 1000 printed pages, one failure per 1000 transactions, etc. According to his experience, Musa appreciates that measurement of natural units is often easier to implement than that of execution time.
The computation of execution time data by the operating system
is necessary but not sufficient for execution time data collection and
recording because the measurements are based on sampling and the
accuracy is affected, or the measurement is not repeatable (multitasking
time-sharing systems and the Heisenberg effect). Even if imperfect, the execution time recorded by the operating system is a good approximation of the true execution time. For instance, when executing
a command from the Unix shell, the user can prefix the command with
the word time to measure the execution time of the command.
From software engineering point of view, each programming
language provides some mechanism to retrieve the current time from
the system. If we read and store the system time at various moments, we
can compute time intervals by subtracting the values of system time
taken at different moments. A simple example, in C, follows:
#include <stdio.h>
#include <time.h>   /* for clock() and CLOCKS_PER_SEC */
clock_t st, fin;
st = clock(); the_algorithm_code(); fin = clock();
printf("time=%f\n", (double)(fin - st) / CLOCKS_PER_SEC);   /* elapsed seconds */
However, the resolution of clock() is coarse (typically milliseconds once converted to seconds), which is far too large for measuring operations that happen at clock rates in the gigahertz range.
Companies and research groups concerned with getting highly
accurate performance measurements often set up specially configured
machines that minimize any sources of timing irregularity, such as by
limiting access and by disabling many OS and networking services. To
provide greater precision for timing measurements, many processors
also contain a timer that operates at the clock cycle level. This timer is a
special register that gets incremented every single clock cycle. Special
machine instructions can be used to read the value of the counter. Not
all processors have such counters, and those that do vary in the
implementation details.
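On x86 processors compiled with GCC or Clang, for example, such a counter can be read with the __rdtsc() intrinsic; the following sketch (the loop being timed is arbitrary) is not portable to other architectures:

/* Sketch: reading the x86 time-stamp counter via the __rdtsc() intrinsic
   (GCC/Clang, <x86intrin.h>); the timed loop is purely illustrative. */
#include <stdio.h>
#include <x86intrin.h>

int main(void)
{
    volatile double x = 0.0;
    unsigned long long start = __rdtsc();
    for (int i = 0; i < 1000000; i++)   /* the code being timed */
        x += 1.0;
    unsigned long long end = __rdtsc();
    printf("elapsed: %llu clock cycles (x = %f)\n", end - start, x);
    return 0;
}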
Let consider that during the monitoring period there are two
active processes: A and B. The processor alternately executes part of
process A, then part of B, and so on. As it executes these processes, it
operates either in user mode, executing the instructions of the
application program; or in kernel mode, performing operating system
functions on behalf of the program, such as, handling page faults, input,
or output. When the scheduler switches from process A to process B, it
must enter kernel mode to save the state of process A (still considered
part of process A) and to restore the state of process B (considered part
of process B). From the perspective of an application program, the flow
of time can be viewed as alternating between periods when the program
is active (executing its instructions), and inactive (waiting to be
scheduled by the operating system).
Manual (non-automated) and automated failure data collection
approaches are used. Where data is manually collected (by users,
testers, etc.) the following situations are possible:
1) data collectors could be not fully familiar with the
requirements of failure data collection;
2) collection standards, in general, are not used, or the usage is
not complete;
3) the paper forms (even if well designed) are not filled in correctly: critical fields are left blank, information is recorded informally outside the form fields, and data are entered incorrectly (due to transcription errors).
Automated data collection is not fully possible even when formal specifications and classification rules (applied to failures) are used.
In the following the organization of the failure data files used by
CASRE (Computer Aided Software Reliability Estimation) and SMERFS
(Statistical Modeling and Estimation of Reliability Functions for
Software) is presented. CASRE uses time units (interpretable as natural
units). The type of the CASRE files is ASCII (American Standard Code for Information Interchange), and they can be created with basic software such as text editors (Edit, Notepad, etc.). Two formats are supported by CASRE, corresponding to the NTUD (Natural or Time Units Data) and FID (Failure per Interval Data) collection methods. In both formats the first line indicates the time units to be used ("Seconds", "Minutes", or "Hours"); the following lines differ in format.
lines with fields separated by blanks. The fields of NTUD files are: the
failure number, the number of natural or time units since previous failure,
and the severity class.
A short collection follows:
Minutes
1 35 1
2 82 1
3 115 1
4 128 1
5 157 1
6 173 1
7 214 1
8 243 1
9 257 1
10 301 1
11 376 1
12 425 1
13 514 1
14 724 1
15 968 1
The fields of the FID lines are: the interval number, the number of
failures in the interval (greater than or equal to zero), the duration of
interval in natural or time units, three fields with the value zero (placeholders for unused program features), and the severity class, as shown in
the following data set:
Hours
1 7 2 0 0 0 1
2 4 3 0 0 0 1
3 4 4 0 0 0 1
4 3 2 0 0 0 1
5 2 3 0 0 0 1
6 2 4 0 0 0 1
7 2 5 0 0 0 1
8 1 8 0 0 0 1
9 1 10 0 0 0 1
10 1 23 0 0 0 1
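A minimal C sketch of reading an NTUD-style file follows (the file name failures.dat is an assumption, and error handling is reduced to the essentials):

/* Sketch: read a CASRE-style NTUD file; first line gives the time unit,
   each following line holds failure number, units since previous failure,
   and severity class. */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("failures.dat", "r");
    char unit[32];
    int number, severity;
    double interval, total = 0.0;

    if (!f) { perror("failures.dat"); return 1; }
    if (fscanf(f, "%31s", unit) != 1) { fclose(f); return 1; }

    while (fscanf(f, "%d %lf %d", &number, &interval, &severity) == 3) {
        total += interval;
        printf("failure %d at %.0f %s (severity %d)\n",
               number, total, unit, severity);
    }
    fclose(f);
    return 0;
}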
As Pham remarked, the time-domain approach always provides higher accuracy in the parameter estimates. The software engineer responsible for reliability management must trade off the cost of data collection against the accuracy of the reliability level required.
4.4 Applying Failure Data to Guide Decisions
Before choosing a model, tests on the data need to be applied (for instance, a test to check the Poisson process assumption).
Some important topics in Reliability Data Analysis are:
Calendar-Time Analysis (CTA),
Usage Time Analysis (UTA),
Calendar Time Reliability Analysis (CTRA),
Usage-Based Reliability Analysis (UBRA), and
Special Events Analysis (SEA).
CTRA is applied when failures are reported only in the calendar-time domain. This is possible, and accepted to some extent, because in practice the managers and users who report the failures of software according to their operational profile are more attached to the calendar-time domain, which is closer to their way of thinking when deciding about the future. However, Musa described the case of a Weibull-type failure intensity model applied to calendar-time system behavior leading to an inferior fit compared with a similar model obtained using an execution-time-based intensity.
The UBRA can be used if sufficient usage information is available, as Jones & Vouk have shown. SEA considers failure modes and those events
that lead to a class of failures having extremely damaging consequences.
Rare event models should be used. However, the computation of the
probability of rare events is difficult.
If a (software) system is composed of smaller systems interconnected and integrated in some way, and there is available a
collection of failure data for every component and another collection of
failure data regarding the integration step, the decisions can be made in
the following order, according to Musa (1999):
1) accepting or rejecting an existing component;
2) guiding the software development process for the system
(both changes in the process and the resolution of failures –
according to some priority list, are important);
3) accepting or rejecting a super-system;
4) releasing the system.
All charts have similar shape, but the boundaries between the
reject and continue and continue and accept regions could be different.
Let Tn be the normalized measure of when failures occur (the horizontal coordinate), n be the failure number, A = ln((1 − α)/β), and B = ln(α/(1 − β)). The value of A changes rapidly with the consumer risk but only very slightly with the supplier risk, while the value of B changes rapidly with the supplier risk and only very slightly with the consumer risk. The boundary between the accept and continue regions is given by
Tn = (A + n ln γ) / (γ − 1),
and the boundary between the continue and reject regions is computed by
Tn = (B + n ln γ) / (γ − 1).
The FI/FIO ratio demonstrable after a test of duration t with n failures is
D = T_N(n) / (t λF).
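A short C sketch of these boundaries, using the illustrative values α = β = 0.1 and γ = 2, prints the accept and reject limits for the first few failures (a negative reject limit simply means rejection is not yet possible at that failure count):

/* Sketch of the accept/continue and continue/reject boundaries above,
   with illustrative risks alpha = beta = 0.1 and gamma = 2. */
#include <stdio.h>
#include <math.h>

int main(void)
{
    double alpha = 0.1, beta = 0.1, gamma = 2.0;
    double A = log((1.0 - alpha) / beta);   /* accept-boundary intercept */
    double B = log(alpha / (1.0 - beta));   /* reject-boundary intercept */

    printf(" n   accept if Tn >=   reject if Tn <=\n");
    for (int n = 0; n <= 5; n++) {
        double acc = (A + n * log(gamma)) / (gamma - 1.0);
        double rej = (B + n * log(gamma)) / (gamma - 1.0);
        printf("%2d   %15.3f   %15.3f\n", n, acc, rej);
    }
    return 0;
}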
3. The software can be accepted or rejected as soon as the plot crosses into the accept or reject region. It is not necessary to wait for another failure to occur.
4. The demonstrable FI/FIO ratio can be used when the certification period ends without leaving the continue region.
5. For the case of components under certification testing it is necessary
to continue the test only if no failures appear and if the demonstrable
FI/FIO ratio is large. If the demonstrable FI/FIO ratio is large and
failures have occurred then reject the component. Accept the
component if the demonstrable FI/FIO ratio is not large.
6. For software products (the whole system), if failures appear and the
demonstrable FI/FIO ratio is large it is necessary to return to the
reliability growth test and, after removing faults, run the certification
test, once again.
7. For super-systems the certification test ends if the product has been accepted and the demonstrable FI/FIO ratio is acceptable (less than 2.0).
The FI/FIO ratio can also be used during reliability growth tests, being estimated from failure data. The estimation of the FI/FIO ratio can be
obtained using SMERFS and CASRE. These packages provide estimation
methods for the parameters of the supported software reliability
models. Also, statistical inference on accuracy is available. The most
used software reliability models and computational and statistical
aspects will be described in the next chapter.
5 SOFTWARE RELIABILITY MODELS
The most desirable characteristics of software reliability methods are
(according to Chu et al. (2010)):
1. The description of the method and its application is
comprehensive and understandable.
2. The assumptions of the method have reasonable bases.
3. The method allows for consideration of the specific
operating conditions of the software.
4. The method takes into consideration the quality of
lifecycle activities.
5. The method makes use of available test results and
operational experience.
6. The method addresses uncertainty.
7. The method has been verified and validated.
8. The method is capable of demonstrating high reliability.
9. The method should be able to estimate parameters that
can be used to account for software common cause
failures.
A large number of software reliability growth models (SRGMs)
have been developed over the years, and each model has its strengths
and limitations. There are different assumptions in different SRGMs on
how the failure rate decreases with time; that is, the models specify
different empirical formulas, and use test data to estimate their
parameters.
According to (IEEE, 2008), Software Reliability (SR) models can both assess and predict reliability. Assessment deals with measuring past and current reliabilities, while prediction provides forecasts of future reliability.
There are many ways to develop an SR model: a) describe it as a
stochastic process; b) relate it to a Markov model; c) define the
probability density or distribution function; or d) specify the hazard
function. In the following the most important models are described.
greater than or equal to zero (a > 1 means the error rate increases with time; a < 1 a decreasing error rate; a = 1 a constant failure rate over time). The corresponding pdf for the time to failure is given by:
f(t) = (a/b)(t/b)^(a−1) exp(−(t/b)^a),  t ≥ 0,
with cdf F(t) = 1 − exp(−(t/b)^a), reliability function R(t) = exp(−(t/b)^a), and hence MTBF = (b/a)Γ(1/a), where Γ(·) is the gamma function.
Data requirements: K – the total number of testing intervals; ni – the total number of errors in each time interval of testing, i = 1, 2, ..., K; di – the length of the ith testing interval, i = 1, 2, ..., K.
Estimates: Let M = Σ_{j=1}^{K} nj, F(i) = (Σ_{j=1}^{i} nj)/M, Xi = ln(Σ_{j=1}^{i} dj), and Yi = ln ln[1/(1 − F(i))]. With
X̄ = (Σ_{i=1}^{K} Xi)/K  and  Ȳ = (Σ_{i=1}^{K} Yi)/K,
the regression estimates are
m̂ = Σ_{i=1}^{K} (Yi − Ȳ)(Xi − X̄) / Σ_{i=1}^{K} (Xi − X̄)²
and b̂0 = Ȳ − m̂ X̄.
Comments di should be measured in CPU time rather than wall
clock time.
The estimate of the MTBF is (b̂/â) Γ(1/â), where Γ(·) is the gamma function.
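A C sketch of this fitting procedure follows, using invented interval data; it assumes â = m̂ and b̂ = exp(−b̂0/m̂), which follows from the linear relation Y = a ln t − a ln b, and it skips the last interval, where F(K) = 1 would make Y undefined:

/* Sketch of the least-squares estimation above for the Weibull parameters
   a and b from interval error counts; the data values are illustrative. */
#include <stdio.h>
#include <math.h>

#define K 5

int main(void)
{
    double n[K] = { 12, 9, 6, 4, 2 };        /* errors found per interval    */
    double d[K] = { 10, 10, 10, 10, 10 };    /* interval lengths (CPU hours) */
    double M = 0.0, cumN = 0.0, cumD = 0.0;
    double X[K], Y[K], sx = 0, sy = 0, sxx = 0, sxy = 0;
    int used = 0;

    for (int i = 0; i < K; i++) M += n[i];

    for (int i = 0; i < K - 1; i++) {        /* skip last point (F = 1)      */
        cumN += n[i];  cumD += d[i];
        double F = cumN / M;
        X[used] = log(cumD);
        Y[used] = log(log(1.0 / (1.0 - F)));
        sx += X[used]; sy += Y[used];
        used++;
    }
    for (int i = 0; i < used; i++) {
        sxx += (X[i] - sx / used) * (X[i] - sx / used);
        sxy += (Y[i] - sy / used) * (X[i] - sx / used);
    }
    double m_hat  = sxy / sxx;               /* slope: estimate of a         */
    double b0_hat = sy / used - m_hat * sx / used;
    double a_hat  = m_hat;
    double b_hat  = exp(-b0_hat / a_hat);
    double mtbf   = (b_hat / a_hat) * tgamma(1.0 / a_hat);

    printf("a = %.3f  b = %.3f  MTBF = %.3f\n", a_hat, b_hat, mtbf);
    return 0;
}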
R(t) = exp{−K[ET/IT − εc(τ)] t},
and
MTBF = 1 / (K[ET/IT − εc(τ)]).
The total testing time in period i is
Hi = Σ_{j=1}^{mi} Ti,j + Σ_{j=1}^{ri} ti,j.
Estimates: The constant failure rate for the ith functional testing period is estimated as
Ẑi = number of failures per hour = ri / Hi,
and two testing periods τ1 and τ2 give
r1/H1 = K[ET/IT − εc(τ1)]  and  r2/H2 = K[ET/IT − εc(τ2)],
from which the two unknowns K and ET can be obtained.
The rate is reduced by the same amount each time an error is detected.
Data requirements: If t0 = 0 and t1, t2, ... is the sequence of time moments when errors are discovered, and Xi = ti − ti−1, i = 1, 2, ..., N, then the Xi are assumed to be distributed exponentially with rate Z(ti):
f(Xi) = φ[N − (i − 1)] exp{−φ[N − (i − 1)] Xi}.
The maximum likelihood estimates satisfy
φ̂ = n / [N̂ Σ_{i=1}^{n} Xi − Σ_{i=1}^{n} (i − 1) Xi]
and
Σ_{i=1}^{n} 1/(N̂ − i + 1) = n / [N̂ − Σ_{i=1}^{n} (i − 1) Xi / Σ_{i=1}^{n} Xi].
Alternatively, least squares estimates minimize
S(N, φ) = Σ_{i=1}^{n} (Xi − MTBFi)²,  with MTBFi = 1/(φ[N − (i − 1)]),
which leads to
φ̂ = Σ_{i=1}^{n} 1/[N − (i − 1)]² / Σ_{i=1}^{n} Xi/[N − (i − 1)]
and
Σ_{i=1}^{n} Xi/(N − i + 1)² · Σ_{i=1}^{n} 1/(N − i + 1)² = Σ_{i=1}^{n} Xi/(N − i + 1) · Σ_{i=1}^{n} 1/(N − i + 1)³.
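The maximum likelihood equation for N̂ has no closed form, but it can be solved by scanning candidate values, as in the following C sketch with invented interfailure times:

/* Sketch: solving the Jelinski-Moranda maximum-likelihood equation above
   for N by a simple scan over candidate integer values. */
#include <stdio.h>

#define NFAIL 10

int main(void)
{
    double X[NFAIL] = { 7, 11, 8, 10, 15, 22, 20, 25, 28, 35 }; /* interfailure times */
    double T = 0.0, S = 0.0;
    for (int i = 0; i < NFAIL; i++) {
        T += X[i];
        S += i * X[i];                 /* sum of (i-1)*X_i, with i starting at 1 */
    }

    for (int N = NFAIL; N <= 1000; N++) {
        double lhs = 0.0;
        for (int i = 1; i <= NFAIL; i++)
            lhs += 1.0 / (N - i + 1);
        double rhs = NFAIL / (N - S / T);
        if (lhs <= rhs) {              /* first N where the equation balances */
            double phi = NFAIL / (N * T - S);
            printf("N = %d, phi = %.5f\n", N, phi);
            break;
        }
    }
    return 0;
}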
The estimated time to remove the next m errors (TRNE), after observing n failures, is given by
TRNE = (1/φ̂) Σ_{j=n+1}^{n+m} 1/(N̂ − j + 1).
where N is the initial number of errors in the program, φ is the proportionality constant, and
F_{i−1} = Σ_{j=1}^{i−1} fj.
The likelihood function is
L(f1, …, fM) = Π_{i=1}^{M} [φ(N − F_{i−1}) xi]^{fi} exp{−φ(N − F_{i−1}) xi} / fi!.
The MLEs of φ and N can be obtained by taking the partial derivatives of lnL with respect to φ and N:
φ̂ = FM / Σ_{i=1}^{M} (N̂ − F_{i−1}) xi,
Σ_{i=1}^{M} fi/(N̂ − F_{i−1}) = φ̂ Σ_{i=1}^{M} xi,
where FM = Σ_{i=1}^{M} fi.
5.1.5 Schick-Wolverton model
respectively,
MTBF = ∫_0^∞ R(Xi) dXi = sqrt(π / (2φ[N − (i − 1)])).
The maximum likelihood estimates satisfy
φ̂ = 2n / Σ_{i=1}^{n} (N̂ − i + 1) Xi²
and
Σ_{i=1}^{n} 1/(N̂ − i + 1) = n Σ_{i=1}^{n} Xi² / Σ_{i=1}^{n} (N̂ − i + 1) Xi².
Comments: The MLE of the MTBF is given as
MTBF = sqrt(π / (2φ̂(N̂ − n))).
during the current time interval of testing. Suppose fi errors are discovered during the ith testing interval; the hazard rate is
Z(xi) = φ(N − F_{i−1})(Y_{i−1} + xi/2),
where xi is the amount of testing time spent between the end of testing period i−1 and the end of the ith testing interval, and Y_{i−1} = Σ_{j=1}^{i−1} xj, with
MTBF = sqrt(2π/(φ(N − F_{i−1}))) · exp(φ(N − F_{i−1}) Y_{i−1}²/2) · [1 − Φ(Y_{i−1} sqrt(φ(N − F_{i−1})))],
where Φ(·) is the standard normal distribution function. The maximum likelihood estimates satisfy
φ̂ = FM / Σ_{i=1}^{M} (N̂ − F_{i−1}) xi (Y_{i−1} + xi/2)
and
Σ_{i=1}^{M} fi/(N̂ − F_{i−1}) = φ̂ Σ_{i=1}^{M} xi (Y_{i−1} + xi/2),
with M – the number of testing intervals and FM = Σ_{i=1}^{M} fi.
occurrence of the (i−1)st error and the ith (the hazard rate decreases in a geometric progression as error detection occurs, because 0 < k < 1). If Xi = ti − ti−1 is the time between the discovery of the ith and the (i−1)st errors, the Xi are assumed to be independent exponentials with rate Z(ti):
f(Xi) = D k^{i−1} exp(−D k^{i−1} Xi),
so the likelihood function is
L(X1, X2, …, Xn) = Π_{i=1}^{n} D k^{i−1} exp(−D k^{i−1} Xi).
Setting the partial derivatives of lnL to zero gives
n/D − Σ_{i=1}^{n} k^{i−1} Xi = 0,
(1/k) Σ_{i=1}^{n} (i − 1) − D Σ_{i=1}^{n} (i − 1) k^{i−2} Xi = 0.
Therefore,
D̂ = n / Σ_{i=1}^{n} k̂^{i−1} Xi,  and  Σ_{i=1}^{n} (i − 1) k̂^{i−1} Xi / Σ_{i=1}^{n} k̂^{i−1} Xi = (n − 1)/2.
4. The error detection rate for the ith interval respects the following rule:
Z(t) = D k^{n_{i−1}}  for t_{i−1} ≤ t ≤ t_i,
requirements: x1, x2, ..., xn, and fi errors are discovered in the ith interval. Let Mi be the number of corrected errors; then E{fi} = φ(N − M_{i−1}) gi(x1, x2, ..., xi), where φ is the proportionality constant, N is the initial number of errors, and gi is a function (gi(·) = xi gives the Jelinski-Moranda model; gi(·) = xi²/2 gives the Schick-Wolverton model). The maximum likelihood estimates satisfy
φ̂ = Σ_{i=1}^{n} fi / [N̂ Σ_{i=1}^{n} gi − Σ_{i=1}^{n} M_{i−1} gi]
and
Σ_{i=1}^{n} fi / (N̂ − M_{i−1}) = φ̂ Σ_{i=1}^{n} gi.
5.2.2 The Geometric Poisson Model
The number of failures fi in interval i is Poisson with mean D k^{i−1}, so the likelihood function is given by:
L(f1, f2, …, fm) = Π_{i=1}^{m} (D k^{i−1})^{fi} exp(−D k^{i−1}) / fi!.
After some computation on the partial derivatives of lnL with respect to D and k, the MLEs satisfy
D̂ = (1 − k̂) Σ_{i=1}^{m} fi / (1 − k̂^m),
and k̂ is the solution of
Σ_{i=1}^{m} (i − 1) fi / Σ_{i=1}^{m} fi = [k̂ − m k̂^m + (m − 1) k̂^{m+1}] / [(1 − k̂^m)(1 − k̂)].
5. The intervals are all of the same length.
6. The rate of error detection is proportional to the number of errors within the program at the time of test. The error detection process is assumed to be a non-homogeneous Poisson process with an exponentially decreasing error detection rate,
di = (α/β)[exp(−β(i − 1)) − exp(−βi)]  for interval i.
The likelihood function is
L(f1, f2, …, fm) = [M_{s−1}^{F_{s−1}} exp(−M_{s−1}) / F_{s−1}!] · Π_{i=s}^{m} [mi^{fi} exp(−mi) / fi!],
where M_{s−1} is the expected number of errors in the first s−1 intervals and mi the expected number in interval i. The MLEs are
β̂ = ln(y),  α̂ = β̂ Σ_{i=1}^{m} fi / (1 − exp(−mβ̂)),
where y is the solution of the polynomial equation
(s − 1)F_{s−1}/(y^{s−1} − 1) + F_{s,m}/(y − 1) − m Fm/(y^m − 1) = A,
with
A = Σ_{i=0}^{m−s} (s + i − 1) f_{s+i},  F_{s,m} = Σ_{i=s}^{m} fi,  and Fm = F_{s−1} + F_{s,m}.
dm(t)/dt = b(t)[a(t) − m(t)],
B(t) = ∫_{t0}^{t} b(s) ds,
and the log-likelihood is
LLF = Σ_{i=1}^{n} log(λ(si)) − m(sn),
where λ(t) = dm(t)/dt.
A summary of some representative NHPP software reliability
models is presented in the following table.
If a(t) = a and b(t) = b²t/(bt + 1), then
λ(t) = ab²t exp(−bt),
and
m(t) = a{1 − (1 + bt) exp(−bt)}.
Inflection S-shaped model (S-shaped). Assumptions:
1. Some of the faults are not detectable before some other faults are removed.
2. The probability of failure detection is proportional to the current number of detectable faults.
3. The failure rate of detectable faults is constant and identical.
4. The isolated faults can be entirely removed.
If β is an inflection factor and a(t) = a, then
m(t) = a(1 − exp(−bt)) / (1 + β exp(−bt)),
and
λ(t) = ab(1 + β) exp(−bt) / (1 + β exp(−bt))².
TTF of failure i−1; 5) Each time a failure
occurs, the fault which caused it is
immediately removed, and no other faults
are introduced.
An attempt to account for testing effort:
a(t) = a,
b(t) = r exp(−αt),
dm_i(t)/dt = b_i (n_i(t) − m_i(t)),
dn_i(t)/dt = β_i dm_i(t)/dt,
m(t) = Σ_{i=1}^{3} m_i(t),
n_i(0) = a p_i, m_i(0) = 0,
with the following components; the conditional reliability is
R(x/t) = exp{−Σ_{i=1}^{3} [a p_i / (1 − β_i)] e^{−(1−β_i) b_i t} (1 − e^{−(1−β_i) b_i x})}.
m(t) = a[(1 − exp(−bt))(1 − α/b) + αt] / (1 + β exp(−bt)).
b(t) = b / (1 + β e^{−bt}),
m(t) = [(c + a)(1 − e^{−bt}) − (a b/(b − α))(e^{−αt} − e^{−bt})] / (1 + β exp(−bt)).
dm(t)/dt = (c/(nT)) (a − m(t)),
with the solution m(t) = a(1 − exp(−ct/(nT))), which is called the Musa exponential model.
Another model based on NHPP was obtained by (Duane, 1964), who observed that the cumulative failure rate versus cumulative
testing time on logarithmic paper tended to follow a straight line. The
model assumptions are:
1. The software is operated according to operational profile.
2. Every error has the same chance of being detected and is of the
same severity as any other error.
3. The error occurrences are independent.
4. The cumulative number of errors detected at any time t, denoted
by N(t), follows a Poisson distribution with mean m(t). The mean
function is taken to be of the form m(t) = λ t^β.
The MTBF is t^{1−β}/(λβ). The MLEs of β and λ are
β = n / Σ_{i=1}^{n} ln(t_n / t_i), λ = n / t_n^β,
where the t_i's are the observed failure times and n is the number of observed failures.
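A small numerical sketch (hypothetical failure times, not from the text) of these estimators:

# Sketch: MLEs for the Duane (power-law) model m(t) = lambda * t**beta,
# computed from hypothetical cumulative failure times t_1 < ... < t_n.
import math

t = [5.0, 12.0, 30.0, 49.0, 80.0, 130.0, 200.0, 290.0]      # hypothetical failure times
n = len(t)

beta_hat = n / sum(math.log(t[-1] / ti) for ti in t[:-1])    # last term would be ln(1) = 0
lam_hat = n / t[-1] ** beta_hat
mtbf_now = t[-1] ** (1.0 - beta_hat) / (lam_hat * beta_hat)  # MTBF evaluated at t_n
print("beta =", round(beta_hat, 3), "lambda =", round(lam_hat, 4), "MTBF(t_n) =", round(mtbf_now, 1))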
environmental factors is
and the reliability function has the expression:
R(x/t) = R0(x/t)^{Ψ(z)},
where R0(x/t) is the reliability function without environmental factors.
(b) For a given error rate λ_i, the time between error occurrences X_i = t_i − t_{i−1} is exponential with mean 1/λ_i:
f(X_i | λ_i) = λ_i e^{−λ_i X_i}, X_i ≥ 0.
f(X_{i+1}) = ∫_0^∞ f(X_{i+1} | λ_{i+1}) g(λ_{i+1}) dλ_{i+1}
 = (N − i)α (β + t)^{(N−i)α} / (β + t + X_{i+1})^{(N−i)α + 1}, X_{i+1} ≥ 0,
i.e. a Pareto distribution. The following quantities can be derived:
1. The reliability function after i errors are discovered is given by:
R(x) = P{X_{i+1} > x} = 1 − P{X_{i+1} ≤ x} = 1 − ∫_0^x f(X_{i+1}) dX_{i+1}.
Hence,
R(x) = [(β + t) / (β + t + x)]^{(N−i)α}.
The mean time to the next failure exists when (N−i)α > 1.
4. The likelihood function is
L(N, α, β) = Π_{i=1}^{n} (N − i + 1)α (β + t_{i−1})^{(N−i+1)α} / (β + t_{i−1} + X_i)^{(N−i+1)α + 1},
where
t_{i−1} = Σ_{j=1}^{i−1} x_j.
The MLEs of N, α and β are obtained by setting the partial derivatives of ln L(N, α, β) with respect to the three parameters equal to zero and solving the resulting system of equations numerically.
R(x) = [(t_i + β) / (t_i + β + x)]^{α(N−i)}, i = 1, 2, …
and the randomness of the debugging process, it is obtained:
f[X_i | α, ψ(i)] = ∫_0^∞ f(X_i | λ_i) g(λ_i) dλ_i = α [ψ(i)]^α / [X_i + ψ(i)]^{α+1},
f[X_1, …, X_n | α, ψ(i)] = Π_{i=1}^{n} α [ψ(i)]^α / [X_i + ψ(i)]^{α+1},
X_i > 0, i = 1, 2, …, n.
Using one of the forms for ψ(i), the likelihood function is now a function of three unknowns (α, β_0, β_1):
L(α, β_0, β_1) = f(X_1, X_2, …, X_n | α, β_0, β_1).
The MLEs are the solutions to the following system of equations:
∂L/∂α: n/α + Σ_{i=1}^{n} ln ψ(i) − Σ_{i=1}^{n} ln[X_i + ψ(i)] = 0,
∂L/∂β_j: α Σ_{i=1}^{n} i^j / ψ(i) − (α + 1) Σ_{i=1}^{n} i^j / [X_i + ψ(i)] = 0, j = 0, 1.
Since
f(X_i | α, ψ(i)) = α [ψ(i)]^α / [X_i + ψ(i)]^{α+1},
the expected time between failures is
E{X_i} = ∫_0^∞ X_i α [ψ(i)]^α / [X_i + ψ(i)]^{α+1} dX_i = ψ(i) / (α − 1),
if α > 1. The least squares estimates minimize
S(α, β_0, β_1) = Σ_{i=1}^{n} [X_i − E{X_i}]^2.
Setting the partial derivatives of S to zero gives, for ψ(i) = β_0 + β_1 i:
∂S/∂α: Σ_{i=1}^{n} ψ(i) [X_i − ψ(i)/(α − 1)] = 0,
∂S/∂β_0: Σ_{i=1}^{n} X_i − [n β_0 + β_1 n(n+1)/2] / (α − 1) = 0,
∂S/∂β_1: Σ_{i=1}^{n} i X_i − [β_0 n(n+1)/2 + β_1 Σ_{i=1}^{n} i^2] / (α − 1) = 0.
Comments The model tries to account for error generation in the
corrective process. With each error correction, a
sequence of programs is generated, each being
obtained from its predecessor by attempting to
correct a bug. A good programmer should have a more rapidly increasing ψ(i) function than a poorer programmer.
5.5.3 Thompson-Chelson's Bayesian reliability model
(b) The length of testing time for each period (T_i, i = 1, 2, …).
(c) For the total number of software packages that
have been released, the number found to contain
errors. These numbers are used in determining the
prior distribution for p at any stage.
Estimates Case 1: If p=1 (i.e. it is known before testing begins
that the software contains bugs), and fi errors are
observed in testing time Ti, then the posterior
distributions for λ (the failure rate) and the software
reliability R are:
h(λ | f_i) = (T_i + T_0)^{f_i + f_0 + 1} λ^{f_i + f_0} e^{−λ(T_i + T_0)} / Γ(f_i + f_0 + 1), λ ≥ 0,
and
f(R | f_i) = [(T_i + T_0)/t]^{f_0 + f_i + 1} [ln(1/R)]^{f_0 + f_i} R^{(T_i + T_0)/t − 1} / Γ(f_0 + f_i + 1), 0 < R < 1.
Case 2: If f_i = 0 (no errors are observed during the testing time T_i), the posterior distributions are mixtures weighted by the prior probability that the software still contains errors:
H(λ | f_i = 0) = (b + 1)/(a + b + 2) + [(a + 1)/(a + b + 2)] [(T_i + T_0)^{f_0 + 1} / Γ(f_0 + 1)] ∫_0^λ x^{f_0} e^{−x(T_i + T_0)} dx,
and
F(R | f_i = 0, p) = p [(T_i + T_0)/t]^{f_0 + 1} / Γ(f_0 + 1) ∫_0^R [ln(1/x)]^{f_0} x^{(T_i + T_0)/t − 1} dx, 0 ≤ R < 1,
F(R | f_i = 0, p) = 1, R = 1,
and the point estimate of the reliability is
R = (b + 1)/(a + b + 2) + [(a + 1)/(a + b + 2)] [(T_i + T_0)/(T_i + T_0 + t)]^{f_0 + 1}.
If f_i > 0, then
λ = (f_i + f_0 + 1) / (T_i + T_0),
and
R = [(T_i + T_0)/(T_i + T_0 + t)]^{f_0 + f_i}.
Comments 1. The model objective is to obtain the total system
reliability. Not only software failures are considered,
but also hardware failures and unknown or
ambiguous source-related malfunctions.
2. The parameter a is thought of as the number of
previously delivered software packages with errors
among a total of a+b delivered.
T_0 = Σ_{j=1}^{i−1} T_j,
and
f_0 = Σ_{j=1}^{i−1} f_j.
6 Software Reliability Model Selection
1) Establishment of reliability goals for software elements
through modeling and allocation
2) Estimation of software design reliability through the software
reliability prediction process
3) Software reliability growth through operational profile
testing, and
4) Evaluation of the achieved software reliability through
Formal Qualification Testing.
Software reliability prediction is performed at each phase of the
software development process up to software system test.
Product and process metrics applicable to the goals of the project
are collected and used to predict the failure rate that the software will
exhibit at the beginning of system test and at deployment. The failure
rate prediction is then used to estimate the duration of, and assurance
resources required for, the reliability growth testing, which will be
needed to achieve the allocated software reliability.
Both non-quantitative and quantitative criteria are used to select
a software reliability model to make a decision.
The set of non-quantitative criteria contains: the model validity,
the ease of measurements, quality of assumptions, applicability,
simplicity, and insensitivity to noise. As a general definition, a model is
valid if it is measuring/predicting what it is intended to
measure/predict, and nothing else.
The set of quantitative criteria contains, but is not limited to, the
self-consistency, the goodness of fit, the relative accuracy, the bias, and
the bias trend.
Analysis of a model's predictive quality can help the user decide which model(s) to use. The simplest question a software reliability engineer/manager can ask is "How reliable is the software at this moment?" The time to the next failure, Ti, is usually predicted using the observed times of failure t1, t2, …, ti-1.
In general, predictions of Ti can be made using observed times to
failure t1, t2, …, ti-k, k>0.
The results of predictions made for different values of k can then be compared. If a model produces self-consistent results for differing values of k, this indicates that its use is appropriate for the data on which the particular predictions were made.
following some statistical tools useful to judge the increasing/
decreasing reliability of software systems are presented.
tk is the inter-failure time between failures k-1 and k, and si is the time of
occurrence of failure i.
One can extract two types of information from such a graph: local
and global changes.
When the values are positive (resp. negative), the reliability is
globally increasing (resp. decreasing). On the other hand, when the
values are increasing (resp. decreasing), we have local variations of the
reliability.
If U is approximately equal to zero, it indicates a lack of trend,
If U is greater than zero, the TBFs are decreasing,
If U is less than zero, the TBFs are increasing.
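The formula for U is not reproduced at this point in the text; the sketch below uses the standard Laplace factor for failure-time data (an assumption on my part) to show how such a trend indication can be computed:

# Sketch: Laplace factor U from interfailure times t_k (standard form, assumed here):
# U < 0 suggests increasing TBFs (reliability growth), U > 0 suggests decreasing TBFs.
import math

tbf = [20, 25, 18, 40, 55, 48, 70, 90, 110, 150]           # hypothetical interfailure times
s = [sum(tbf[:k + 1]) for k in range(len(tbf))]             # failure occurrence times s_i
n, T = len(s), s[-1]
U = (sum(s[:-1]) / (n - 1) - T / 2.0) / (T * math.sqrt(1.0 / (12.0 * (n - 1))))
print("Laplace factor U =", round(U, 2))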
Fig. 6.1. Local and global changes (Karama Kanoun:
http://resist.isti.cnr.it/files/corsi/courseware_slides/software_reli
ability_eng.pdf)
6.1.2 TTT – Total Time on Test
Let T_(1) ≤ T_(2) ≤ … ≤ T_(n) denote the ordered failure times. At time t, with T_(i) ≤ t < T_(i+1), the i failed components have contributed Σ_{j=1}^{i} T_(j) to the total time on test. Since the remaining n−i components have survived the interval, the total time on test is
τ(t) = Σ_{j=1}^{i} T_(j) + (n − i) t,
τ(T_(i)) = Σ_{j=1}^{i} T_(j) + (n − i) T_(i),
for all i = 1, 2, …, n.
The TTT plot is defined as a plot of the ordered pairs
(i/n, τ(T_(i)) / τ(T_(n))),
for all i = 1, 2, …, n. The second item is called the scaled TTT statistic.
It was proved that, for any 0 ≤ u ≤ 1, as n → ∞, the following relation is valid:
τ(T_([nu])) / τ(T_(n)) → φ(u),
where
φ(u) = (1/μ) ∫_0^{F^{-1}(u)} (1 − F(t)) dt, with μ = ∫_0^∞ (1 − F(t)) dt.
[Figure: scaled Total Time on Test plot]
If the distribution F(t) of the lifetimes is exponential then φ(u) = u, hence in the case of exponential failure times the TTT plot converges to the diagonal of the unit square.
A convex TTT plot indicates reliability growth and a concave
one indicates reliability decrease.
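A short sketch (hypothetical lifetimes, not from the text) of how the scaled TTT statistics behind such a plot are computed:

# Sketch: scaled Total Time on Test statistics from ordered lifetimes T_(1) <= ... <= T_(n).
def scaled_ttt(times):
    t = sorted(times)
    n = len(t)
    tau = [sum(t[:i + 1]) + (n - i - 1) * t[i] for i in range(n)]   # tau(T_(i))
    return [(float(i + 1) / n, tau[i] / tau[-1]) for i in range(n)]

for u, phi in scaled_ttt([20, 25, 18, 40, 55, 48, 70, 90, 110, 150]):  # hypothetical data
    print("i/n =", round(u, 2), " scaled TTT =", round(phi, 3))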
λ(t) = (b/a) (t/a)^{b−1},
where a and b are the model parameters which are positive. It is easy to
observe that if b<1 then (t) decreases, meaning that the failures tend to
occur less frequently (and the system shows reliability growth). If b>1,
then the system shows reliability decrease. When b=1 the homogeneous
Poisson process case is obtained.
Considering the event {Tn = tn}, the MLE of b of a failure truncated
power-law process is given by
b = n / Σ_{i=1}^{n−1} log(t_n / T_i).
Hence, under the null hypothesis of b=1 it follows that 2n/b ~ χ^2(2(n−1)). If the alternative hypothesis is two-sided, then the null hypothesis is rejected if
2n/b < χ^2_{α/2}(2(n−1))
or
2n/b > χ^2_{1−α/2}(2(n−1)),
where χ^2_p(·) denotes the p-quantile of the chi-square distribution,
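A sketch of the complete trend test (hypothetical failure times; the chi-square quantiles are taken from scipy):

# Sketch: two-sided test of H0: b = 1 (homogeneous Poisson process) for a
# failure-truncated power-law process, using 2n/b_hat ~ chi2(2(n-1)) under H0.
import math
from scipy.stats import chi2

t = [5.0, 12.0, 30.0, 49.0, 80.0, 130.0, 200.0, 290.0]      # hypothetical failure times
n = len(t)
b_hat = n / sum(math.log(t[-1] / ti) for ti in t[:-1])
stat = 2.0 * n / b_hat
alpha = 0.05
lo, hi = chi2.ppf(alpha / 2, 2 * (n - 1)), chi2.ppf(1 - alpha / 2, 2 * (n - 1))
print("b_hat =", round(b_hat, 3), " 2n/b_hat =", round(stat, 2), " accept region:", round(lo, 2), "-", round(hi, 2))
print("reject H0 (trend present)" if stat < lo or stat > hi else "no significant trend")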
Insensitivity to noise: The ability of the model to produce
accurate results in spite of errors in input data and parameter
estimates.
For assessing model fit, the following three measures may be
used to compare model predictions with a set of failure data: Accuracy,
Bias, and Trend. One way to measure prediction accuracy is by the
prequential likelihood (PL) function.
A model is considered biased if it predicts values that are
consistently greater than the observed failure data, or consistently less
than the observed data. To measure the amount of a model's bias, we
can compute the maximum vertical distance (i.e., the K-S distance)
between cumulative distribution functions of the observed and
predicted data. Trend is measured by whether a prediction quantity is
monotonically increasing or decreasing over a series of time intervals.
There is a great variation in the accuracy of software reliability
prediction models. Two main types of inaccuracy have been detected: predictions can be biased and/or noisy. Bias and noise should be detected
before making a decision. In the following, basic mechanisms for quality
evaluation are given.
time to the next failure is less than t, let us consider the sequence
u_i = F̂_i(t_i),
where t_i is the subsequently observed time to the i-th failure.
If each F̂_i were identical to the true F_i, then the u_i would be realizations of independent random variables with a uniform distribution on [0, 1]. The simplest question to answer is whether their distribution is close to U(0, 1), by plotting their sample cdf: we call this the u-Plot.
One problem with the u-Plot is that it measures only one type of
departure of predictors from reality, the bias. For example if a prediction
system applied for a set of data shows optimism in the early predictions,
and pessimism in the later predictions, then a small distance will be
observed. It seems necessary then to examine the u_i's for trend (i.e. the trend bias). To do that, the y-Plot method first makes the following transformation: x_i = −ln(1 − u_i).
If the {u_i's} are a true realization of i.i.d. random variables with U(0,1) distribution, then the {x_i's} will be a realization of i.i.d. unit exponential random variables. Thus, if the {x_i's} are plotted on the horizontal axis, they should look like a realization of a homogeneous Poisson process. The alternative hypothesis (that there is trend in the u_i's) will show itself as a non-constant rate for this process.
If the values of the {x_i's} are normalized onto (0, 1) and the obtained values {y_i} are plotted, with
y_i = Σ_{j=1}^{i} x_j / Σ_{j=1}^{m} x_j,
then a plot that shows the trend of the predictions of the model will be
obtained. The departure from the line of unit slope will indicate that the
prediction system is not capturing the "trend" in the failure data (i.e. reliability growth).
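The following sketch (hypothetical u_i values, not from the text) computes the u-plot Kolmogorov-Smirnov distance and the y-plot points described above:

# Sketch: u-plot distance and y-plot points from a sequence u_i = F_i(t_i).
import math

u = [0.12, 0.35, 0.28, 0.61, 0.45, 0.77, 0.52, 0.90, 0.66, 0.81]   # hypothetical u_i

# u-plot: maximum distance between the sample cdf of the u_i and the U(0,1) cdf.
su = sorted(u)
n = len(su)
ks = max(max(abs((i + 1) / n - v), abs(i / n - v)) for i, v in enumerate(su))

# y-plot: x_i = -ln(1 - u_i), normalized cumulative sums.
x = [-math.log(1.0 - ui) for ui in u]
total = sum(x)
y = [sum(x[:i + 1]) / total for i in range(len(x))]

print("u-plot KS distance =", round(ks, 3))
print("y-plot points:", [round(v, 3) for v in y])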
Fig. 6.4. The y-Plot for the LV and JM models (Littlewood, 1981)
Both the u-Plot and the y-Plot can be used to measure the bias. In order to measure the noise, some statistics can be used: the Braun statistic (B), the Median Variability (MV), and the Rate Variability (RV).
Let Ê(T_i) be the estimated mean of T_i, and n the number of data points. The Braun statistic, denoted by B, is given by
B = [Σ_{i=1}^{n} (t_i − Ê(T_i))^2 / (n − 1)] / [Σ_{i=1}^{n} (t_i − t̄)^2 / (n − 2)].
The median variability is
MV = Σ_i |m_i − m_{i−1}| / m_{i−1},
with mi being the predicted median of Ti. This measure indicates which
model has higher variability in its predictions (how noisy are the
obtained predictions). A small value of MV indicates better smoothing of
the predictions.
The rate variability (RV) measure is defined by
RV = Σ_{i=1}^{n} |r_i − r_{i−1}| / r_{i−1},
where ri , the rate of occurrence of failures (ROCOF), is calculated
immediately after fixing a fault.
For all measures, the model comparison proceeds only for the
same data set.
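A small sketch (hypothetical predicted medians and ROCOF values) of the two noise measures:

# Sketch: Median Variability and Rate Variability noise measures.
def variability(values):
    # sum of relative jumps |v_i - v_{i-1}| / v_{i-1}
    return sum(abs(b - a) / a for a, b in zip(values, values[1:]))

medians = [30.0, 34.0, 33.0, 41.0, 45.0, 52.0]   # hypothetical predicted medians of T_i
rocof   = [0.033, 0.029, 0.030, 0.024, 0.022]    # hypothetical rates after each fix
print("MV =", round(variability(medians), 3), " RV =", round(variability(rocof), 3))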
Let f̂_i(t) = dF̂_i(t)/dt denote the predictive density corresponding to F̂_i(t).
For one-step ahead predictions of Tj+1, Tj+2, …, Tj+n , the
prequential likelihood is:
PL_n = Π_{i=j+1}^{j+n} f̂_i(t_i).
Fig. 6.5. How prequential likelihood works (Fenton & Pfleeger, 1996)
PLR_n = Π_{i=j+1}^{j+n} f̂_i^A(t_i) / Π_{i=j+1}^{j+n} f̂_i^B(t_i).
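A minimal sketch (hypothetical predictive densities) showing how PL and PLR are accumulated; logarithms are used to avoid numerical underflow:

# Sketch: prequential likelihood PL_n = prod f_i(t_i) and the ratio PLR_n for two
# prediction systems A and B, accumulated in log form over one-step-ahead predictions.
import math

# Hypothetical one-step-ahead predictive densities evaluated at the observed t_i.
dens_A = [0.012, 0.009, 0.015, 0.011, 0.008]
dens_B = [0.010, 0.007, 0.016, 0.009, 0.005]

log_pl_A = sum(math.log(f) for f in dens_A)
log_pl_B = sum(math.log(f) for f in dens_B)
print("log PL(A) =", round(log_pl_A, 3), " log PL(B) =", round(log_pl_B, 3))
print("PLR (A vs B) =", round(math.exp(log_pl_A - log_pl_B), 2))   # > 1 favours model A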
Till now, the recommendation was: "pick the model which had performed best in the past, and use it for the next prediction". This seems unduly rejecting of models which are only slightly inferior to the "best". Why not combine predictions from different models in some optimal way, according to the "pooling of expert opinion"?
The idea behind this method is that, rather than predicting
software reliability by using only one model, a meta predictor could be
formed to take a linear combination of two (or more) predictions, with
weights chosen in some optimal way.
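As a sketch of the idea only (hypothetical densities; not the specific scheme of any particular tool), the weight of a two-model linear combination can be chosen to maximise the prequential likelihood over past predictions:

# Sketch: a linearly combined meta-predictor f = w*f_A + (1-w)*f_B, with the weight
# chosen to maximize the prequential likelihood over past one-step-ahead predictions.
import numpy as np

dens_A = np.array([0.012, 0.009, 0.015, 0.011, 0.008])   # hypothetical past densities
dens_B = np.array([0.010, 0.007, 0.016, 0.009, 0.005])

weights = np.linspace(0.0, 1.0, 101)
log_pl = [np.sum(np.log(w * dens_A + (1 - w) * dens_B)) for w in weights]
best_w = weights[int(np.argmax(log_pl))]
print("best weight on model A:", round(best_w, 2))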
For two candidate models, A and B, one could take
model is 0P+1M+0O for the MLC case, while 1/6P+4/6M+1/6O is an example of a ULC model. For DLC models, the weighting is determined by dynamically calculating the posterior "prequential likelihood", or another measure, of each model as a meta-predictor. Changes can be observed over windows of varying size and type. Fixed or sliding windows can be used.
For the strategy described in Fig. 6.6, the weights can vary over time. For a prediction at stage i, i.e. of t_i, let {w_k} take the values that maximise the prequential likelihood of the combined predictor over the previous predictions made with the two components.
Other schemes related to project criteria and user judgment, or hybrid combinations, can be used. Good practice recommends models whose predictive validity has been observed in previous investigations and which represent different categories of models. It is important to select the models after checking the limit conditions for each model to be used, in order to avoid a meaningless consumption of time.
6.5 Recalibrating Software Reliability Models
F_i*(t) = G_i*(F̂_i(t)).
This technique recalibrates the raw model output F̂_i(t) according to the accuracy of past predictions.
The mentioned authors use the u-Plot and two models for the transformation G_i: a polygon and, respectively, a spline.
The polygonal approach is described in the following:
1. Use the y-Plot to make sure that the error in previous
predictions is approximately stationary.
2. Construct the polygon G* by joining up the steps of the u-Plot
formed by predictions before Ti (Fig. 6.7).
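A rough sketch of the polygonal recalibration step, under my own simplified reading of the procedure and with hypothetical inputs:

# Sketch: recalibration F*(t) = G*(F_raw(t)), where G* is the piecewise-linear
# (polygon) cdf built from the u-plot of previous predictions.
import numpy as np

past_u = np.array([0.15, 0.30, 0.42, 0.55, 0.70, 0.88])   # hypothetical past u_i values
n = len(past_u)

# Polygon through the u-plot steps: at u-value past_u[i] the empirical cdf is (i+1)/(n+1).
knots_x = np.concatenate(([0.0], np.sort(past_u), [1.0]))
knots_y = np.concatenate(([0.0], np.arange(1, n + 1) / (n + 1.0), [1.0]))

def G_star(p):
    return float(np.interp(p, knots_x, knots_y))

raw_prediction = 0.62        # hypothetical raw F_i(t) from the model
print("recalibrated probability:", round(G_star(raw_prediction), 3))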
Fig.6.7. Drawing the polygon Gi* (Brocklehurst et al., 1990)
7 SOFTWARE RELIABILITY TOOLS
SREPT (Software Reliability Estimation and Prediction Tool), supporting discrete-event simulation, the architecture-based approach, and composite models. Follow the link to download the SREPT software: http://shannon.ee.duke.edu/tools/srept/.
SRETOOLS (AT&T Software Reliability Engineering Toolkit:
http://www.refis.nl/producten/tool.php?uk=1).
SRMP (Statistical Modeling and Reliability Program: The SRMP
was developed by the Reliability and Statistical Consultants,
Limited of UK in 1988).
SoRel (Software Reliability Program) is a Macintosh-based
reliability measurement tool that was developed by LAAS, a lab of
the National Center for Scientific Research in France, in 1991.
SoRel is composed of two parts: The first part allows several
reliability trend tests. These tests allow us to identify whether the
reliability function is increasing or decreasing so that an
appropriate model can be applied. The second part allows
reliability growth model application and contains 4 models. Only
one model is executed at a time.
CASRE (Computer-Aided Software Reliability Estimation Tool) is
a PC-based tool that was developed in 1993 by the Jet Propulsion
Laboratories to address the ease-of-use issues of other tools.
CASRE has a pull-down, menu-driven user interface and uses the
same model library as the SMERFS tool with the additional
feature of allowing linear combinations of models to create new
ones at the user s discretion. Follow the link to download the
software: http://johnmusa.com/CASRE.htm.
ESTM (Economic Stop Testing Model Tool) is a Bellcore system
that can be used (in command-line mode) to help decide when to
stop testing a large software system which changes over time.
The package determines the optimal stopping time using a birth-
death model for the introduction of faults, and an economic
model to measure the trade-off between shipping and further
testing.
SMERFS (Statistical Modeling and Estimation of Software
Reliability Functions) is a program for estimating and predicting
the reliability of software during the testing phase. It uses failure
count information to make these predictions. The following
Interval Data Models are covered: Brooks and Motley s Binomial
Model, Brooks and Motley s Poisson Model, Generalized Poisson
Model (2 versions), Non-homogeneous Poisson Model, Schneidewind's Model (three treatments), and Yamada's S-Shaped Model. Also, SMERFS supports the following Time Between Failure Models: Geometric Model, Jelinski/Moranda Model, Littlewood and Verrall Linear Model, Littlewood and Verrall Quadratic Model, Musa's Basic Model, Musa's Logarithmic Model, and the Non-homogeneous Poisson Model.
RGA (Software for Repairable System and Reliability Growth
Analysis: http://rga.reliasoft.com/) is a software tool that allows
the engineers to apply reliability growth models to analyze data
from both developmental testing and fielded repairable systems.
DACS's GOEL (an automated version of the Goel-Okumoto NHPP Software Reliability Growth Model) runs on an IBM-PC or compatible under MS-DOS 2.11 or higher, and is distributed by DACS (Data & Analysis Center for Software), Rome, NY. This tool also provides the Kolmogorov-Smirnov statistic and various estimates concerning faults and failures.
CARATS - computer-aided reliability assessment tool for software
based on object-oriented analysis and design (the main
characteristics can be found in the paper available at UCLA:
http://www.seas.ucla.edu/~chienchi/TENCON06_CCChen.pdf).
7.1. CASRE
predictions), Setup (Add application, Remove application, Remember
most recent files, GOF significance, Laplace test significance), Plot
(create plot window), and Help (Help index, keys help, help for help,
about)] appears when CASRE starts after user invocation, and contains a
text representation of the failure data used as input to the models. The
second window is called the GRAPHIC DISPLAY WINDOW [Plot, Results,
Display, Settings, Copy, and Help], in which graphical representations of
the failure data as well as modeling results are displayed. The third
window is dedicated to the modeling results presented in tabular
format, and is called MODEL RESULTS TABLE [File, Results, and Help].
The items shown in this table include model parameters, model
estimates, model applicability statistics, and the residuals.
Most options are self-explanatory. Let us detail only some entries, as described in the CASRE user's guide.
The option Change data type permits the changing of failure
data from time between failures (TBF) to failure counts (FC) or vice
versa. Some of the models built into CASRE accept only TBF data, while
the others accept only FC data. This menu item is included to allow use
of both types of models on the same set of failure data.
The shaping and scaling transformations allow the shape of a
curve drawn through the failure data to be changed. The filters are
implemented such that the output remains in the first quadrant of the
plot shown in the graphic display window (to prevent filters from
producing physically meaningless results). The following
transformations are possible: linear (multiply by a scaling factor and add
an offset), power (multiply by a scale factor and then raise to a user-
specified power), logarithmic (apply a linear transformation and then
take the natural logarithm), and exponentiation (initially, applies a linear
transformation and then raises the base of natural logarithms to the
result).
failure count data, this test is available only if the test intervals are of
equal length. For time between failures, if the running average increases
with failure number, this indicates reliability growth. For failure count
data, if the running average decreases with time (fewer failures per test
interval), reliability growth is indicated. The results of the test are
shown in the main window and the graphic display window, as Nikora describes in the CASRE user's guide. The null hypothesis for the Laplace
test is that occurrences of failures can be described as a homogeneous
Poisson process. The test is applied according to the following rules:
1) If the test statistic decreases with increasing failure number
(test interval number), then the null hypothesis can be
rejected in favor of reliability growth at an appropriate
significance level.
2) If the test statistic increases with failure number (test interval
number), then the null hypothesis can be rejected in favor of
decreasing reliability.
1. Create a set of failure data (recommended, although it can also be created later, from within CASRE, by using an external application).
2. Start CASRE.
3. Open a set of failure data, if already created.
4. Use an external application to change the failure data, or to create a
new set of failure data.
5. Apply filters and smoothing operations to the data.
6. Apply trend tests to the failure data to determine whether or not
software reliability models should be applied.
7. Apply models, including combinations of models, to the failure data.
8. View the model outputs.
9. Print failure data and model results.
10. Save failure data and model results to disk, and EXIT.
However, before starting, the SRE engineer may have to setup some
options of CASRE, like:
A. Add external applications and combination model definitions to the
CASRE configuration, or remove selected items from the CASRE
configuration.
B. Set significance levels for goodness-of-fit tests and trend tests.
C. Set the length of the list of most recently used files that is kept in the
File menu.
For time between failures data, the data file format is as follows (a small hypothetical example is given after the description):
1. The first line represents the time units for the data file. A choice of
seven keywords is possible: Seconds, Minutes, Hours, Days, Weeks,
Months, and Years. This specifies the time units for the times between
successive failures given in the file.
2. The second line in the file (the first line of actual failure data) represents the first failure observed; the third line represents the second failure observed, and so forth.
3. For every line starting with the second (the first failure occurrence)
the following fields are required:
3.1. The first column is the current failure number.
3.2. The second column represents the time that has passed since the
last failure was observed. The values in the second column are
measured in the time units given in the first line of the file.
3.3. The third column indicates the severity of the failure on a scale of 1
to 9. CASRE assigns no meaning to these severity categories; 1 could
represent the least serious type of failure just as easily as it could
represent the most serious type.
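A hypothetical example of a small file in this format (the keyword and the values are illustrative only; the sample files distributed with CASRE may differ):

Seconds
1 480 3
2 360 2
3 720 1
4 1050 4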
2.3. The third column gives the length of the test interval. Test interval
lengths do not have to be equal. They are measured in the time units
given in the first line.
2.4. The fourth column specifies the severity of the failures that were
observed during a given test interval, on a scale of 1 to 9. If no failures
were seen in an interval, a failure count of 0 and a severity of 0 should
be used. CASRE will show the severity as "N/A". Also, CASRE assigns no
meaning to severities.
2.5. For failure count data, one test interval can occupy more than one
line in the data file, depending on the maximum level of severity of
failures described.
After loading the input file containing failure data, the following options are available, depending on the file structure (Fig. 7.2, Table 7.1):
Table 7.1. Available plot options
Times between Failures Data: Failure Intensity; Times between failures; Cumulative number of failures; Cumulative number of failures from modeling start point; Reliability.
Failure Counts Data: Failure counts; Test interval lengths; Failure Intensity; Times between failures; Cumulative number of failures; Cumulative number of failures from modeling start point; Reliability.
Once a failure data file has been opened, one or more reliability models can be applied to it. Before running the models, there are three model controls that need to be selected:
Table 7.2. Available parameter estimation methods (method actually used when Maximum Likelihood is selected, method actually used when Least Squares is selected)
Geometric: ML, LS
Jelinski-Moranda: ML, LS
Littlewood-Verrall Linear: ML, LS
Littlewood-Verrall Quadratic: ML, LS
Musa Basic: ML, ML
Musa-Okumoto: ML, ML
Nonhomogeneous Poisson Process (for time between failures): ML, ML
Generalized Poisson: ML, LS
Generalized Poisson – user selected interval weights: ML, LS
Schick-Wolverton: ML, LS
Nonhomogeneous Poisson process (for failure counts): ML, LS
Schneidewind: ML, ML
Schneidewind – Ignore first s intervals: ML, ML
Schneidewind – Total failures in first s intervals: ML, ML
Yamada S-Shaped: ML, ML
7.1.3 CASRE- Project
Let us use the data set S1.dat from the folder CASREV3/DATA,
and do some tasks. The reader is invited to apply the approaches
presented above for more data sets provided by CASRE, and for other
data sets available online.
Firstly, use the plot options to display the time-between-failures plot for some software reliability models. Fig. 7.3 shows the results for the Geometric, Jelinski-Moranda, and Littlewood-Verrall Linear models. The u-plot based analysis is presented in Fig. 7.4. The Geometric model proves to be close to the diagonal of the unit square (unbiased).
Fig. 7.4. CASRE: u-plot (s1.data)
When analysing the bias trend (y-plot) we obtain the results shown in Fig. 7.5.
The next step consists of building combinations of models and comparing them against the initial (component) models. Fig. 7.5 shows the setup procedure for a combination, and Fig. 7.6 gives the results after running the models.
Finally, the SRE engineer can use the four permanent combination models built into CASRE – these combinations are for time-between-failures data only.
These combinations are:
7.2 SREPT
The SREPT architecture is shown in Fig. 7.7, according to Ramani
et al (2000).
7.2.2 SRE by SREPT
coverage function. The available models are: Exponential (GO)
model, Weibull, S-shaped, and Log-logistic.
Release criteria could be of the following types, according to
Ramani et al (2000):
(1) Number of remaining faults - In this case the release time is
when a fraction f of all detectable faults has been removed;
(2) Failure intensity requirements – The software should be
released when the failure intensity measured at the end of
the development test phase reaches a specified value.
(3) Reliability requirements – This criterion could be used to specify that the required conditional reliability in the operational phase is, say, R* at time t* after product release.
(4) Cost requirements – From a knowledge of the expected cost of removing a fault during testing, the expected cost of removing a fault during operation, and the expected cost of software testing per unit time, the total cost can be computed. The release time is obtained by determining the time that minimizes this total cost (a sketch of such a computation is given after this list).
(5) Availability requirements – Assuming that the meantime to
repair is the time needed to reboot and that there is no
reliability growth the release time can be estimated based
on an operational availability requirement.
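As an illustration of the cost-based criterion (4) above, here is a generic sketch with assumed cost parameters and an assumed Goel-Okumoto mean value function; it is not SREPT's actual implementation:

# Sketch: choose the release time T that minimizes
# cost(T) = c1*m(T) + c2*(a - m(T)) + c3*T, with m(T) = a(1 - exp(-b*T)).
import numpy as np

a, b = 120.0, 0.004              # assumed expected total faults and detection rate
c1, c2, c3 = 50.0, 2000.0, 5.0   # assumed costs: fix in test, fix in field, test time

T = np.linspace(1.0, 3000.0, 3000)
m = a * (1.0 - np.exp(-b * T))
cost = c1 * m + c2 * (a - m) + c3 * T
T_star = T[int(np.argmin(cost))]
print("release time that minimizes total cost:", round(T_star), "time units")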
Architecture-based approaches use the internal control structure
of the software to predict the reliability of software, according to
Ramani et al (2000). This assumes added significance in the evaluation
of the reliability of modern software systems which are not monolithic
entities but are likely to be made up of several modules distributed
globally. SREPT allows prediction of the reliability of software based on:
architecture of the software, and failure behaviour.
Running the fault density engine on the data file SHARPE.LOC produces the results described in Fig. 7.8.
on the observed failure intensity and the number of faults remaining. The user has to specify a "Truncation level", at which the infinite state space of the underlying NHPP (non-homogeneous Poisson process) is truncated. This is necessary for the numerical solution.
The user also needs to supply a failure intensity function. This again can come from the ENHPP module. SREPT reports the fitted coverage function c(t). If a is the expected total number of faults (given infinite testing time), which is one of the parameters estimated for all the fitted coverage functions, then the failure intensity is a·c'(t), where c'(t) is the derivative of c(t) with respect to time t. The interface ensures that the failure intensity is of the form a·c'(t) by having separate entry fields for a and c'(t).
Fig. 7.10. SREPT - ENHPP engine (results for DRAPER data)
7.2.3 SREPT-Project
Firstly, let us use the COBRA.DAT file from the SREPT\DATA folder. Running the ENHPP engine, we obtain the u-plot analysis (Fig. 7.12) and the numerical results produced during the computation.
Generating the u-plots. Please wait...
Done!
The best coverage model for the input data based on sum of
square errors is -
S-shaped coverage model
Determining the best coverage model for the input data...
The best coverage model for the input data based on sum of
square errors is -
S-shaped coverage model
7.3 SREtool
SRE Tool can import existing input data and identify the format by the service "Try to Autodetect Data Type". Data can be filtered and transformed (Fig. 7.14), used in various analyses, and exported if necessary. The Data menu provides this kind of functionality. The Graphics menu is specialized in plotting results, exporting plots, and copying the current plot to the clipboard.
7.3.2 SRE by SREtool
significance level to perform hypothesis testing (by default set to 1%).
Last, choose between the maximum-likelihood and least squares methods for parameter estimation (see Table 7.2). To perform the analysis, click on the Analyse Model button. Subsequently, the analysis output shows the
data type, the model choices, the parameter estimates and the result of
the Kolmogorov goodness-of-fit test. The data window presents the
fitted values for the selected models.
For the data set s1.dat, used with CASRE, the results obtained by
Military Handbook 189 trend test are given in Fig. 7.16., and the Running
Average plot is presented in Fig. 7.17. Model analysis is shown in Fig.
7.18, and the Reliability Plot is generated in Fig. 7.19.
Fig. 7.16. Military Handbook 189 trend test (test statistic vs. observation number, with lower and upper confidence limits).
Fig. 7.17. Running Average Trend Plot – casre_S1_tbf.xls (running average vs. failure number).
Fig. 7.18. Fitted Models – casre_S1_tbf.xls (cumulative number of failures).
Fig. 7.19. Reliability Plot – estimated reliability for an interval of length 656.9 (Jelinski-Moranda, Goel-Okumoto, and Duane power-law models).
The graphical output shows the observed values (circles) and the estimated model (curve).
Next, create an .xls file containing the time-between-failures data. We use s3.dat from CASREV3\DATA, which has 207 observations. The file was imported and used as input to the data analysis. The results of the Model Analysis task are given in the following figure.
[Figure: Fitted Models – S3.xls (cumulative number of failures)]
Model: Jelinski-Moranda
Model estimates:
N = 254.000, Upper 95% confidence limit: 303.000
Model: Goel-Okumoto
Model estimates:
a = 256.625, Upper 95% confidence limit: 295.827
P-Value: 0.000
The test statistic is large, which indicates that there is a
significant growth in reliability. The test indicates that there is a
significant growth in reliability since observation 1. The graphical
output is given in Fig. 7.21.
Fig. 7.21. Trend test for S3.xls (test statistic vs. observation number, with lower and upper confidence limits).
7.4 Learned Lessons and conclusions
A new discipline, nowadays called software reliability engineering, was necessary in order to increase the dependability of computer-based systems.
Also, software engineers developed important methodologies to
increase the software reliability by formal verification of software
requirements, reliable programming techniques, and adequate testing
strategies.
In the following we review the most important ideas presented in
this course in order to outline best practice useful to software engineers,
software reliability engineers, software managers, and ICT managers.
Understanding the triad error-fault (bug) – failure is important
both to software engineers and software reliability engineers. For
complex distributed software the software engineer can encounter some unusual software bugs that are more difficult to understand and repair than ordinary software bugs, according to (Madsen et al 2006): heisenbugs, mandelbugs, schroedinbugs, and aging-related bugs.
This is why terminology is essential to all actors involved in
software development, reliability modeling and analysis, software risk
management, and ICT project management.
The seven general reliability assessment requirements play an
important role in deploying highly reliable software. Software reliability
can be achieved by using many sources of information, and will be
effective by using various software reliability engineering approaches.
There are some conflicting views on software quality concerning
the correctness (a program is either correct or incorrect) and the
reliability under uncertainty factors (measured as the relative frequency
of the times that the program performs without failure).
During software operation, time of failure, time interval between
failures, cumulative failures experienced up to a given time, and failures
experienced in a time interval are all random variables (that means
uncertainty is experienced) generated by the introduction of faults by
programmers which is an unpredictable process (the locations of faults
are unknown), and the working environment (including conditions of
running) of the software that can behave unpredictable, etc. Hence, the
usage of probability distribution functions, stochastic processes, and
statistical methods for data analysis is required in software reliability
modeling, analyzing, and software reliability assessment.
The rejuvenation as a proactive fault management technique
increases only the availability which is a different concept from
reliability.
Reliability is improved when software faults which occur in the
most frequently used parts of the software are removed. Removing x%
of software faults will not necessarily lead to an x% reliability
improvement. In a study, removing 60% of software defects actually led
to a 3% reliability improvement. Hence, removing faults with serious
consequences is the most important objective.
Component reliability, connection reliability, network reliability, and complex system reliability can be modelled in specific ways and used in reliability optimization. In order to define the
necessary reliability it is necessary to define failure with severity
classes, to choose a common measure and set a failure intensity
objective (Musa, 1999).
Software testing and debugging represent some of the most
important components of the software engineering process with major
concerns on software reliability (Galen (2005)). Even if "program testing can be used to show the presence of bugs, but never to show their absence", as Dahl et al. say, every test performed increases intrinsic software quality either by revealing no fault or by discovering a latent fault. Unit testing, integration testing, and acceptance testing are
core tasks in order to obtain highly reliable software accepted by
customers.
Computer-aided software engineering (CASE) refers to a set of
tools which are able to automate all or some of various tasks of the
software life cycle, like: requirements capture, tracing, and tracking;
configuration management; model verification; facilitation of model
validation; maintenance of all project-related information; collection
and reporting of project management data, document production and
CASE data import-export. However, software reliability is not covered,
in general, by CASE products. There are specific software tools, called
Computer Aided Software Reliability Engineering, many of them being
dedicated to estimation of software reliability. Some of them were
detailed above (CASRE, SREPT, SRE Tool). These tools, useful in reliability assessment, are based on the three time domains which are most commonly used: calendar time, clock time, and execution time. Musa (1999) proposed that all reliability theory should be based on execution time, which is a better measure of the "failure-inducing stress". Musa recommends the usage of natural units or time units when defining the reliability quantities.
Manual (non-automated) and automated failure data collection approaches are used. Automated data collection is not fully possible even when formal specifications and classification rules (applied to failures) are used.
Collected data (gathered manually or automatically) are sent to analysis modules in order to find patterns and models, and to predict the system behavior. Before choosing a model, tests on the data need to be applied (for instance, a test to check the Poisson process assumption, a trend test, etc.).
A large number of software reliability growth models (SRGMs)
have been developed over the years, and each model has its strengths
and limitations. Also, there is a major difference between software
reliability prediction and software reliability estimation, i.e., the
prediction is performed based on historical data while estimations are
based on collected data. Predictions, by their nature, will almost
certainly be less accurate than estimations. However, they are useful for
improving the software reliability during the development process. If
the organization waits until collected data is available (normally during
testing), it will generally be too late to make substantial improvements
in software reliability. The predictions should be performed iteratively
during each phase of the life cycle and as collected data becomes
available the predictions should be refined to represent the software
product at hand.
Analysis of collected data in a software reliability engineering approach sometimes provides unsatisfactory results. Sometimes there is
source, there is no universally accurate model, and there is no way of
selecting a model a priori and being confident that it will be accurate on
a particular data source.
Due to the prediction triad, we cannot examine the models and the inference procedure separately because it is too difficult: we are forced to examine directly the accuracy of the different available models on each data source and somehow decide which, if any, is giving accurate results.
When looking at the software reliability models, the parameter estimation, and the results obtained by CASRE, SREPT or SRE Tool, we are
perform almost as well as the best parametric models for most data sets
(Fig. 7.22). They seem robust: whilst the performance of the parametric
models varies considerably from one data set to another, these seem
fairly consistent. In most cases the best performing model is usually a
parametric one. Since we can select the best model via an analysis of predictive accuracy, it is still best to be eclectic in model choice and "let the data decide". Of course, some of these models are very computationally intensive, but this is not a concern nowadays.
Finally, the following ideas should be pointed out:
1) The increase in software-based systems for safety functions
requires systematic evaluation of software reliability;
2) Direct observation of operational behaviour of the system (e.g.
in test or simulation) is not going to give assurance of ultra-high
reliability;
3) The presented techniques only work for modest reliability
levels;
4) There is no perfect model, and all models sometimes behave inaccurately;
5) Recalibration often improves accuracy.
6) Software reliability estimation is still an unresolved issue and
existing approaches have limitations and assumptions that are not
acceptable for safety applications.
Fig. 7.22. CASRE – case study (s3.dat from CASREV3\DATA\)
8 REFERENCES
Corro Ramos, I., Statistical Procedures for Certification of Software
Systems, PhD Thesis, Eindhoven University of Technology, 2009,
Crowe D. Feinberg A. (eds.), Design for Reliability, CRC Press, 2001.
Dennison T.E., Fitting and Prediction Uncertainty for a Software
Reliability Model, Master's Thesis, Naval Postgraduate School, 1992.
Dotson K.J., Development of Confidence Limits by Pivotal Functions for
Estimating Software Reliability, NASA Technical Paper 2709, 1987.
Duane, J.T., Learning curve approach to reliability monitoring, IEEE
Transactions on Aerospace, vol. 2, no. 2, pp. 563–566, April 1964
EIFFEL – An Introduction to Design by Contract™, http://www.eiffel.com/developers/design_by_contract_in_detail.html
Einarsson Bo (ed.), Accuracy and Reliability in Scientific Computing,
SIAM, 2005.
Everett G.D., McLeod Jr. R., Software Testing. Testing Across the Entire
Software Development Life Cycle, Wiley, 2007.
Everett W., Keene S., Nikora A., Applying Software Reliability
Engineering in the 90's, 1998.
Farr, W.H., Smith O.D., Statistical Modeling and Estimation of Reliability
Functions for Software (SMERFS), NSWCDD TR 84-371, 1993.
Farr, W.H., A survey of Software Reliability Modeling and Estimation,
NSWC TR 82-171, 1983.
Goel, A.L. & Okumoto, K., Time-Dependent Error-Detection Rate Model
for Software Reliability and Other Performance Measures, IEEE
Transactions on Reliability, Vol. R-28, No. 3, August 1979.
Gokhale, S.S., Marinos, P.N., Trivedi, K.S., Important Milestones in
Software Reliability Modeling, Communications in Reliability,
Maintainability and Serviceability: An International Journal published by
SAE International, 1996.
GSAM4: Guidelines for Successful Acquisition and Management of
Software-Intensive Systems: Weapon Systems, Command and Control
Systems, Management Information Systems - Condensed Version 4.0
February 2003.
Gustafson D.A., Theory and Problems of Software Engineering, McGraw-
Hill, 2002.
Howe D., The Free On-line Dictionary of Computing, 2010.
Huber C., Limnios N., Mesbah M., Nikulin M., Mathematical Methods in
Survival Analysis, Reliability and Quality of Life, ISTE and Wiley, 2008.
IEEE. IEEE Std. 610.12-1990.
IEEE. IEEE Recommended Practice on Software Reliability, 1633/2008.
Ince D., An introduction to quality assurance and its implementation,
McGraw-Hill, 1994.
Jelinski, A. & Moranda, P., Software Reliability Research, in W.
Freiberfer, ed., Statistical Computer Performance Evaluation, Academic
Press, New York, NY, pp. 465-484, 1972.
Kan, S.H., Metrics and Models in Software Quality Engineering, Second
Edition, Addison Wesley, 2002.
Karanta I., Methods and problems of software reliability estimation, VTT,
2006.
Khatatneh K., Mustafa T., Software Reliability Modeling Using Soft
Computing Technique, European Journal of Scientific Research, 2009.
Kirkwood K., Bazzana G., A Software Reliability Toolkit, Measuring
program execution time, http://csapp.cs.cmu.edu/public/1e/public/
ch9-preview.pdf
Koziolek H., Operational Profiles for Software Reliability, Seminar "Dependability Engineering", 2005.
Lu, M., Brocklehurst, S., and Littlewood, B., Combination of predictions
obtained from different software reliability growth models, Proceedings
of the Tenth Annual Software Reliability Symposium, Denver, CO, pp.
24–33, June 1992.
Lyu, M.R., Nikora, A.P., CASRE - A Computer-Aided Software Reliability
Estimation Tool, Proceedings of 1992 Computer-Aided Software
Engineering Workshop (CASE92), Montreal, Quebec, Canada, July 6-10,
1992
Lyu M.R. (ed.), Handbook of software reliability engineering, McGraw-
Hill, 1996.
Lyu, M.R., Nikora, A.P., Farr W.H., A Systematic and Comprehensive Tool
for Software Reliability Modeling and Measurement, In Proceedings of
FTCS. 1993, 648-653.
Madsen, H., Thyregod, P., Burtschy, B., Popentiu, F. & Albeanu, G., On
using soft computing techniques in software reliability engineering, ESREL
2005, Vol. 2, pp. 1317-1323, A.A. Balkema Publisher, 27-30 June 2005
Tri City, Gdynia, Gdansk, Sopot, Poland.
Meeker W.Q., Escobar L.A., Statistical Methods for Reliability Data, Wiley, 1998.
Meier J.D., Carlos Farre, Prashant Bansode, Scott Barber, and
Dennis Rea, Performance Testing Guidance for Web Applications,
Microsoft Corporation, 2007.
Military Handbook. Reliability Growth Management, MIL-HDBK-189. US
Department of Defense.
Musa, J.D., Iannino, A., and Okumoto, K., Software Reliability:
Measurement, Prediction, Application, New York: McGraw-Hill, 1987.
Musa, J. D., Operational Profiles in Software-Reliability Engineering. In:
IEEE Software, 10 (1993), 2, pp. 14–32.
Musa, J.D., Software Reliability Engineering, McGraw-Hill, 1999.
Musa, J.D., Software Reliability Engineering: More Reliable Software
Faster and Cheaper, 2nd Edition, Author House, 2004.
Neufelder, A.M., The Facts about Predicting Software Defects and
Reliability, The Journal of the Reliability Analysis Center (RAC), Second
Quarter – 2002.
Naik K., Tripathy P., Software Testing and Quality Assurance. Theory and
Practice, Wiley, 2008.
Nikora A.P., Lyu M.R., An Experiment in Determining Software Reliability
Model Applicability. Proceedings of the Sixth International Symposium
on Software Reliability Engineering, Toulouse, France, October 24-27,
1995.
Ozekici S., Soyer R., Reliability of software with an operational profile,
European Journal of Operational Research 149 (2003) 459–474.
Pandian, R. C., Software Metrics—A Guide to Planning, Analysis, and
Application, CRC Press, 2004.
Park M.-G., Jung E.-Y., Software Reliability Growth Modeling and
Performance Evaluation in the Testing Phase with an Outlier Stage.
Pentti H., Helminen A., Failure Mode and Effects Analysis of Software-Based Automation Systems, STUK-YTO-TR 190, August 2002.
Pham H., Pham Michelle, Software Reliability Models for Critical
Applications, Report EGG-2663, EG&G Idaho Inc., 1991.
Pham H., Software Reliability, Springer, 2000.
Pham H. (ed.), Handbook of Reliability Engineering, Springer, 2003.
Pullum Laura L., Software Fault Tolerance Techniques and
Implementation, Artech, 2001.
Sens, P, Popentiu, Fl. & Mantoiu, A., Software Reliability Forecasting for
Adapted Fault Tolerance Algorithms, ESREL'98 Conference, Springer
Verlag, 20-24 June 1998 Trondheim, Norway.
Shooman, M.L., Software Engineering: Design, Reliability, and
Management, New York McGraw-Hill Book Co., 1983.
Singpurwalla, N.D. and Wilson, S.P., Statistical Methods in Software
Engineering – Reliability and Risk, Springer-Verlag, New York, 1999.
Smith D.J., Reliability, Maintainability and Risk. Practical Methods for
Engineers (5th ed.), Reed Elsevier, 1997 (reprinted 2000).
SoftRel, LLC, A tour through Frestimate, http://www.softrel.com
/downloads/walkthru.pdf.
Taylor, An interim Report of Engineering Design, MIT, 1959.
Tian J., Software Quality Engineering: Testing, Quality Assurance, and Quantifiable Improvement, IEEE, 2005.
Valido-Cabrera E., Software reliability methods, Technical University of
Madrid, 2006.
Vasanthi T., Arulmozhi G., A recursive method for reliability
computation of Moranda's Geometric software reliability model.
Wohlin C. and Körner U., Software Faults: Spreading, Detection and
Costs, Software Engineering Journal, Vol. 5, No. 1, pp. 33-42, 1990.
Xie M., Day Y.-S., Poh K.-L., Computing Systems Reliability. Models and
Analysis, Kluwer, 2004.
Yamada, S., Ohba, M., and Osaki, S., S-shaped reliability growth
modeling for software error detection, IEEE Transactions on Reliability,
vol. R-32, no. 5, pp. 475–478, Dec. 1983.