
Software Reliability Engineering Florin Popențiu VLĂDICESCU

Florin Popențiu VLĂDICESCU
Professor Florin Popențiu Vlădicescu graduated in Electronics and
Telecommunications from University POLITEHNICA of Bucharest in
1974 and received his PhD in Reliability in 1981. He has been Chairholder
of the UNESCO Chair in Information and Communication Engineering
at City University London since 1998, and has also been appointed
Director of the “UNESCO Chair” Department at the University of Oradea.
Professor Florin Popențiu Vlădicescu has published over 100 papers in
international journals and conference proceedings and is
co-author of three books.
He has worked for many years on problems associated with software
reliability and has been Co-Director of two NATO Research Projects,
involving collaboration with partner institutions throughout Europe.
He also serves on the advisory boards of several international journals,
including “Reliability and Risk Analysis: Theory & Applications” and
“Microelectronics Reliability”, and is a reviewer for “ACM Computing Reviews”.
He is an expert for the Seventh Framework Programme-FP7.
His ResearcherID is E-5787-2010. In 2009 he was nominated
UNESCO Expert in the field of Higher Education, Research and Knowledge.
Professor Popențiu Vlădicescu is currently Visiting Professor
at the Paris Institute of Technology - "ParisTech" - which brings
together 24 of France's best engineering and business schools.
He also lectures at the Technical University of Denmark.
He was elected Fellow of the Academy of Romanian Scientists in
2008 and was appointed Director of the Doctoral School “Engineering
Sciences” in November 2011.

SOFTWARE RELIABILITY ENGINEERING
First Edition
Course book of the Series of Advanced Mechatronics Systems

edited by Florin POPENTIU-VLADICESCU

CONTENTS

1 INTRODUCTION ............................................................................................. 13

1.1 The need for software reliability ........................................................... 13

1.2 Basic Definitions and Terminology ...................................................... 19

1.3 The Software Reliability Engineering Objectives ............................. 31

2 RELIABILITY ENGINEERING FUNDAMENTALS .................................... 33

2.1 Reliability measures................................................................................. 33
2.1.1 Reliability, Maintainability and Availability ....................................... 33
2.1.2 Reliability Function for Common Distributions ............................... 41

Copyright: 2012 by Florin POPENTIU-VLADICESCU
2.2 Stochastic processes in reliability ......................................................... 55
2.2.1. Preliminaries ................................................................................................. 55
2.2.2 Discrete Time Markov Chains .................................................................. 57
2.2.3 Continuous Time Markov Chains ........................................................... 58
2.2.4 Semi-Markov Processes ............................................................................. 59
2.2.5 Birth-Death (BD) Processes ..................................................................... 60
2.2.6 Poisson Processes ........................................................................................ 61

2.3 Basic Techniques in Reliability .............................................................. 63


2.3.1 Component-based systems ....................................................................... 63
2.3.2 Network reliability ....................................................................................... 66
2.3.3 Statistical estimation methods ................................................................ 69

3 RELIABLE SOFTWARE DEVELOPMENT ................................................. 73

3.1 Software lifecycle ....................................................................................... 73

3.2 Developing operational profiles ............................................................ 91

3.3 Software failures and failure processes.............................................. 98


3.3.1 Faults ................................................................................................................. 98
3.3.2 Software failures .........................................................................................108
3.3.3 Failure processes ........................................................................................113

3.4 Factors that affect Software Reliability ............................................ 116

4 SOFTWARE TESTING FOR CERTIFICATION AND RELIABILITY GROWTH ........................................................................................... 121

4.1 Preparing for test..................................................................................... 121

4.2 Executing test ............................................................................................ 125

4.3 Software Reliability Data Collection .................................................. 130

4.4 Applying Failure Data to Guide Decisions ........................................ 137

5 SOFTWARE RELIABILITY MODELS ...................................................... 143

5.1 Classical software models ..................................................................... 145


5.1.1 Weibull model ..............................................................................................145
5.1.2 Shooman model...........................................................................................147
5.1.3 Jelinski-Moranda model ...........................................................................149
5.1.4 Lipow model .................................................................................................152
5.1.5 Schick-Wolverton model .........................................................................154
5.1.6 Jelinski-Moranda Geometric model.....................................................157

5.2 Poisson models.......................................................................................... 159


5.2.1 The Generalized Poisson Model (GPM) .............................................159
5.2.2 The Geometric Poisson Model ...............................................................161
5.2.3 The Schneidewind's Model .....................................................................162

5.3 NHPP Software Reliability Models ...................................................... 164

5.4 Generalized Models with Environmental Factors .......................... 171

5.5 Bayesian Models ....................................................................................... 172


5.5.1 Littlewood's Bayesian Debugging Model ..........................................172
5.5.2 Littlewood and Verrall's Bayesian reliability model ....................176
5.5.3 Thomson-Chelson's Bayesian reliability model .............................179

6 SOFTWARE RELIABILITY MODEL SELECTION ................................. 183

6.1 Software Reliability Growth ................................................................. 185


6.1.1 The Laplace Test .........................................................................................186
6.1.2 TTT – Total Time on Test ........................................................................188
6.1.3 The MIL-HDBK-189 Test .........................................................................190

6.2 Criteria for model evaluation ............................................................... 191

6.3 Analysis of predictive quality ............................................................... 192


6.3.1 The u-plot ......................................................................................................193
6.3.2 The y-plot ......................................................................................................194
6.3.3 Measures of noise .......................................................................................195
6.3.4 Prequential Likelihood .............................................................................196

6.4 Model selection by combination of models....................................... 198

6.5 Recalibrating Software Reliability Models ...................................... 201

7 SOFTWARE RELIABILITY TOOLS .......................................................... 203

7.1 CASRE ........................................................................................................... 206


7.1.1 Main characteristics ..................................................................................206
7.1.2 SRE by CASRE ..............................................................................................209
7.1.3 CASRE- Project ............................................................................................215

7.2 SREPT ........................................................................................................... 219


7.2.1 Main characteristics ..................................................................................219
7.2.2 SRE by SREPT...............................................................................................221
7.2.3 SREPT-Project..............................................................................................226

7.3 SREtool......................................................................................................... 229


7.3.1 Main characteristics ..................................................................................229
7.3.2 SRE by SREtool ............................................................................................230
7.3.3 SRE tool - Project ........................................................................................234

7.4 Lessons learned and conclusions .......................................................... 238

8 REFERENCES ................................................................................................ 245

1 INTRODUCTION

1.1 The need for software reliability

When first designed, computers were used mainly for arithmetic
operations. Later, newly designed computers were used in industry
(real-time systems), business (mainframes and minicomputers), and
science (supercomputers). The appearance of microcomputers started a
new era, and nowadays computers are involved in every aspect of our
lives; almost every machine we use is based on a microprocessor.
Modern computer-based systems integrate hardware, firmware and
software in order to fulfill various functions. The ability of a system or
component to perform its required functions under stated conditions for
a specified period of time is called reliability. For computer-based
systems, both hardware and software reliability are analyzed.
While engineers are able to design highly reliable hardware,
software developers work hard to increase the reliability of the code
they produce. Several catastrophic events caused by poor software
reliability are well known. Ever since 1947, when the word “debugging”
entered computer engineering through Grace Hopper during the testing
of the Harvard Mark II system, not only catastrophic cases such as
Mariner I, Therac-25 and Ariane 5, but also projects related to computer
security have motivated engineers to study ways to improve software
reliability and to manage projects so as to maximize system reliability at
a suitable cost level. As a result, the software reliability engineering
field was born and has proved real
improvements on software reliability, in particular, and on system
reliability, in a systemic view.
To understand the importance of software reliability as a
discipline, Table 1 presents, in chronological order, some projects¹ that
failed for software-related reasons. A list of software bugs is
maintained online by Wikipedia².
Table 1
1. 1962 – Mariner I: A bug³ in the flight software for the Mariner I caused the rocket to divert from its intended path on launch. Mission control destroyed the rocket over the Atlantic Ocean. The investigation into the accident discovered that a formula written on paper in pencil was improperly transcribed into computer code, causing the computer to miscalculate the rocket's trajectory.
2. 1978 – Hartford Coliseum collapse: The programmer of the CAD software used to design the coliseum incorrectly assumed the steel roof supports would only face pure compression. When one of the supports unexpectedly buckled from the snow, it set off a chain reaction that brought down the other roof sections like dominoes.

¹ http://www.devtopics.com/20-famous-software-disasters/
² http://en.wikipedia.org/wiki/List_of_software_bugs
³ “Bug” is an English word meaning “insect”; the usage goes back to a problem caused by an animal trapped in a component and preventing its operation. A software bug is a programming error caused by software developers.
3. 1982 – Trans-Siberian gas pipeline: CIA operatives allegedly planted a bug in a Canadian computer system purchased by the Soviets to control their gas pipelines. The purchase was part of a strategic Soviet plan to steal or covertly obtain sensitive U.S. technology. When the CIA discovered the purchase, they sabotaged the software so that it would pass Soviet inspection but fail in operation.
4. 1983 – World War III… false alarm: A bug in the Soviet software failed to filter out false missile detections caused by sunlight reflecting off cloud tops. The Soviet early warning system falsely indicated that the United States had launched five ballistic missiles. Fortunately, the Soviet duty officer “had a funny feeling in my gut” and reasoned that if the U.S. were really attacking they would launch more than five missiles, so he reported the apparent attack as a false alarm.
5. 1985–1987 – Therac-25: Because of a subtle bug called a race condition, a technician could accidentally configure the Therac-25 so the electron beam would fire in high-power mode without the proper patient shielding. Three people died; three more were critically injured.
6. 1990 – AT&T network: A single line of buggy code in a complex software upgrade implemented to speed up calling caused a ripple effect that shut down the network.
7. 1991 – Patriot fails: A software rounding error incorrectly calculated the time, causing the Patriot system to ignore the incoming Scud missile.
8. 1993 – Pentium: The divider in the Pentium floating-point unit had a flawed division table, missing about five of a thousand entries and resulting in rounding errors.
9. 1996 – Ariane 5: The shutdown of Ariane 5 occurred when the guidance computer tried to convert the sideways rocket velocity from a 64-bit to a 16-bit format. The number was too big, and an overflow error resulted. When the guidance system shut down, control passed to an identical redundant unit, which also failed because it was running the same algorithm. First, Flight 501's backup computer crashed, followed 0.05 seconds later by a crash of the primary computer. As a result of these crashes, the rocket's primary processor overpowered the rocket's engines and caused the rocket to disintegrate 40 seconds after launch. According to the ESA report (http://esamultimedia.esa.int/docs/esa-x-1819eng.pdf): “The internal SRI software exception was caused during execution of a data conversion from 64-bit floating point to 16-bit signed integer value. The floating point number which was converted had a value greater than what could be represented by a 16-bit signed integer. This resulted in an Operand Error. The data conversion instructions (in Ada code) were not protected from causing an Operand Error, although other conversions of comparable variables in the same place in the code were protected.”
10. 2005 – FBI's Virtual Case File project: Mismanagement, and an attempt to build a long-term project on technology that was outdated before the project completed, resulted in a complex and unusable system.
11. Simple cases – Toyota: in October 2005, the Toyota Motor Company voluntarily recalled 75,000 of its hybrid vehicles because of a software glitch that may have shut down the engine. US Airways: in April 2005, the ticketing system for US Airways issued incorrect fares for several hours; some tickets were offered for under $2.
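The Ariane 5 entry above turns on an unguarded narrowing conversion. Below is a minimal sketch in Python (the flight code was in Ada, and the value 64000.0 is an invented stand-in for the out-of-range horizontal-bias variable) contrasting an unguarded conversion with a protected, saturating one:

```python
INT16_MIN, INT16_MAX = -32768, 32767

def unguarded_convert(value: float) -> int:
    # Mimics a raw 64-bit float -> 16-bit integer cast: once the value
    # no longer fits in 16 bits, it raises -- the analogue of the
    # Operand Error in the ESA report.
    result = int(value)
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a signed 16-bit integer")
    return result

def guarded_convert(value: float) -> int:
    # A protected conversion clamps to the representable range instead.
    return max(INT16_MIN, min(INT16_MAX, int(value)))

horizontal_bias = 64000.0  # hypothetical out-of-range sensor value

try:
    unguarded_convert(horizontal_bias)
except OverflowError as exc:
    print("unguarded:", exc)

print("guarded:", guarded_convert(horizontal_bias))  # guarded: 32767
```

The guarded version trades precision for survival: a clamped value may still be wrong, but it keeps the computation running instead of shutting the channel down.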

The reader can browse online resources and read printed
materials to find more cases of bad practice which gave rise to data loss,
accidents, large costs, and disasters. In many cases, the main causes of
failure were undisciplined management of requirements, imprecise and
ambiguous communication, unstable architectures, high complexities,
inconsistency of requirements with design and implementation, low
automation, and insufficient verification and validation.
Software reliability engineering, according to (Musa 1999), is “the
only standard, proven best practice that empowers testers and
developers to simultaneously ensure that product reliability meets
user needs, speed the product to market faster, reduce product cost,
improve customer satisfaction and reduce the risk of angry users,
and increase the customer productivity”. Being a quantitative
approach to the operational behavior of software-based systems with
respect to user requirements, software reliability engineering includes:
software reliability measurement and estimation; the effects of product
and development-process metrics and factors on operational software
behavior; and the use of this specific knowledge in specifying and guiding
software development, acquisition, use, and maintenance.

1.2 Basic Definitions and Terminology

For a good understanding of the software reliability objectives,
software reliability models, and software reliability management tasks,
let us first find clear explanations of the various terms and expressions
used in the next chapters.
***
1. System – is defined as “a collection of components organized to
accomplish a specific function or set of functions”, according to IEEE Std.
610.12-1990. For our purpose, a system contains hardware, human, and
software components.
2. System life cycle – begins when a system is conceived and ends when
the system is no longer available for use (IEEE Std. 610.12-1990).
3. System specification – is a document that specifies, “in a complete,
precise, verifiable manner, the requirements, design, behavior, or other
characteristics of a system or component, and, often, the procedures for
determining whether these provisions have been satisfied” (IEEE Std.
610.12-1990).
4. Hardware – is a general term for equipment, the physical part of any
computer.
5. Firmware – is a combination of a hardware device and computer
instructions and data that reside as read-only software on that device
(IEEE Std. 610.12-1990).
6. Software – is a collection of computer instructions and data (which
form a computer program or a set of computer programs) given to the
computer. Software that is part of a larger system and performs some of
the requirements of that system is called “embedded” (robots, satellites,
missiles, automotive systems etc.). Software whose failure could have an
impact on safety, or could cause large financial or social loss, is called
“critical software”.
7. Bug – is an error or defect in software or hardware that causes a
program to malfunction. According to (Pham 2000), a bug is a design
flaw that will result in symptoms exhibited when an object is subjected
to an appropriate test.

Fig. 1.1. The Error-Fault-Failure Chain

8. Defect – is a generic term for any kind of shortcoming or imperfection,
whether literal or figurative.
9. Error – According to IEEE Std. 610.12-1990, there are four definitions
in use by software engineers:
 “The difference between a computed, observed, or measured
value or condition and the true, specified, or theoretically
correct value or condition”;
 “An incorrect step, process, or data definition”;
 “An incorrect result”; and
 “A human action that produces an incorrect result”.
An error that results in the complete inability of a system or component
to function is called a “fatal error”.

10. Fault – is a defect in a hardware device or component, or an
incorrect step, process, or data definition in a computer program. Two or
more faults that result in the same failure mode (the same manifestation
or behavior) are called “equivalent”.
11. Failure – According to IEEE Std. 610.12-1990, failure is “the inability
of a system or component to perform its required functions within
specified performance requirements”. Software failures are generated
by software faults during run-time. The chain is the following: the
designer or programmer makes a mistake (an error) which generates a
software fault (the software has a defect). This fault may or may not
generate a failure, depending on how the software is exploited. Failure
is a user-oriented concept, while fault is a developer-oriented concept,
as (Musa 1999) notes. If a failure is experienced during software testing
or at run-time, the process of fault identification and error elimination
is called “debugging”.
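The error-fault-failure chain can be illustrated with a toy example (hypothetical code, not from the book): the programmer's error leaves a fault in the program, but a failure is observed only when execution reaches the fault with particular inputs.

```python
def average(values):
    # Error -> fault: the programmer mistakenly hard-codes the divisor
    # instead of using len(values). The defect now sits latent in the code.
    return sum(values) / 2

# No failure yet: two-element inputs happen to mask the fault.
print(average([4, 6]))    # 5.0, which is the correct mean

# Failure: a different input exposes the same latent fault.
print(average([1, 2, 3])) # 3.0, but the correct mean is 2.0
```

Whether a latent fault ever turns into a failure thus depends on which inputs the software actually meets in operation, which is why operational profiles (defined below) matter for testing.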
12. Severity levels of failures – According to (Pham 2000), the severity of
a software failure effect is classified into four categories: Catastrophic,
Critical, Major, and Minor.
 Catastrophic failures are associated with disastrous effects: loss
of human life, permanent loss of assets, or injuries as a result of
erroneous medical prescription or traffic management.
 Critical failures cover disastrous effects but restorable
damages (equipment, data) or curable illness or injury.
 The major failures cover those serious failures without
physical injury to people or other systems.
 The minor failures produce small inconveniences to users or
other components of the information system.

13. Operational mode – is a pattern of system use and/or set of
environmental conditions needing separate testing because it is likely to
stimulate different failures (Musa 1999).
14. Operational profile – consists of the complete set of operations for
the system, with probabilities of occurrence based on all operational
modes (Musa 1999).
15. Testing – is the process of analyzing a system to identify bugs.
Software testing is an important phase of the software life cycle and
will be detailed in a dedicated section. Testability describes the ease with
which a system, or component, can be tested.
16. Unit – is a separately testable element (hardware or software
component).
17. Program correctness – establishing, in finite time, that the output
satisfies the specified requirements for any given valid input data. The
quality of the input/output data is modeled by logical assertions, and
program correctness is proved by mathematical techniques. In general,
correctness is “the degree to which a system or component is free from
faults in its specification, design, and implementation” (IEEE Std.
610.12-1990).
18. Software verification – establishing the fulfillment of the software
specification.
19. Software reliability – is the probability that software “will not fail for
a specified period of time”, called the mission time, “under specified
conditions”, as (Pham 2000) considered. As an attribute of quality,
reliability describes the ability of a software system to carry on
executing with little, or no, interruption to its functioning.
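For the special case of a constant failure rate λ, the definition above takes the familiar exponential form R(t) = e^(−λt); reliability functions for common distributions are treated in Chapter 2. A minimal sketch, where the rate of 0.001 failures/hour and the 100-hour mission time are assumed example values:

```python
import math

def reliability(t: float, lam: float) -> float:
    # R(t): probability that the software survives the mission time t
    # without failure, assuming a constant failure rate lam.
    return math.exp(-lam * t)

# Probability of completing a 100-hour mission at 0.001 failures/hour:
print(round(reliability(100.0, 0.001), 4))  # 0.9048
```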
20. Reliability growth – is the improvement in reliability through the
successful correction of faults; as a consequence, the successive observed
failure times tend to increase. Software reliability growth is obtained by
software maintenance. Hardware reliability growth is not normally
considered.
21. Software availability – is “the probability that a system has not failed
due to a software fault”, as (Pham 2000) defined it. According to IEEE Std.
610.12-1990, in general, availability is “the degree to which a system or
a component is operational and accessible when required for use”.
22. Software maintenance – is “the process of modifying a software
system or component after delivery to correct faults, improve
performance or other attributes, or adapt to a changed environment”, as
IEEE Std. 610.12-1990 established. A current approach is to provide
online service packs or software updates.
23. Maintainability – is the probability that a system will be restored to
working condition within a given period of time when it is subject to
modification (corrections or enhancements). Both preventive
maintenance (avoiding problems that can generate faults) and
perfective maintenance (improving performance) are promoted to
increase software reliability.
24. Mean time to repair (MTTR) – is the expected time needed to restore
a system to operation after it experiences a failure.
25. Mean time between failures (MTBF) – is the expected time between
two successive failures.
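Both means can be estimated from a simple failure/repair log, and they combine into the steady-state availability of item 21 through the standard relation A = MTBF / (MTBF + MTTR). A sketch with invented log data:

```python
# Hypothetical log: (hours of operation before failure, hours of repair).
log = [(120.0, 2.0), (200.0, 1.0), (150.0, 3.0), (170.0, 2.0)]

uptimes = [up for up, _ in log]
repairs = [rep for _, rep in log]

mtbf = sum(uptimes) / len(uptimes)  # mean time between failures: 160.0 h
mttr = sum(repairs) / len(repairs)  # mean time to repair: 2.0 h

# Steady-state availability: fraction of time the system is operational.
availability = mtbf / (mtbf + mttr)
print(mtbf, mttr, round(availability, 4))  # 160.0 2.0 0.9877
```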
26. Fault tolerance – is mainly “the ability of a system or component to
continue normal operation despite the presence of hardware or
software faults”, according to IEEE Std. 610.12-1990.
27. Risk assessment – is a study of the probability that certain
undesirable events will occur (software failures, cost overruns etc.)
according to (Musa 1999).

28. Software reusability – is “the ability to transfer modules constructed
for one software system into another software system”, according to
(Ince 1994).
29. Survivability – refers to the reliability of the system with respect to a
specified set of critical functions when the system has been degraded in
some way (Musa 1999).

Fig. 1.2. The Bathtub Curve


30. Bathtub curve – is a name for the graph of the failure rate of a
system or component as a function of time. This model is common for
hardware systems, but it is not suitable for software systems.
31. Classes of software faults – the following classes were identified:
transient (manifested only for some input values), permanent
(manifested for all input values), recoverable (with or without user
intervention), unrecoverable (the system must be restarted), and
cosmetic (a minor irritation with no adverse effect on processing).
32. The rate of occurrence of failures (ROCOF) – is defined so that ROCOF
times the interval length is the probability of a failure, not necessarily
the first, in that interval. The ROCOF of a computer program can be
obtained by observing the behavior of the software in operation over a
specified time interval and recording the total number of failures
occurring during the interval. Various ROCOF shapes appear when
modeling software reliability, as shown below:

Fig. 1.3. Some decreasing ROCOF shapes

Fig. 1.4. Increasing ROCOF by adding new faults
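A simple interval estimate of the ROCOF makes the figures concrete: count the failures observed in a window and divide by its length. The failure times below are invented for illustration:

```python
# Hypothetical failure times (hours since the start of observation).
failure_times = [5.0, 12.0, 14.0, 30.0, 33.0, 58.0, 94.0, 160.0]

def rocof(times, t_start, t_end):
    # Estimated ROCOF over [t_start, t_end): failures observed in the
    # window divided by the window length (failures per hour).
    count = sum(1 for t in times if t_start <= t < t_end)
    return count / (t_end - t_start)

# A decreasing estimate across successive windows, as in Fig. 1.3,
# indicates reliability growth:
print(rocof(failure_times, 0.0, 50.0))    # 0.1  (5 failures in 50 h)
print(rocof(failure_times, 50.0, 200.0))  # 0.02 (3 failures in 150 h)
```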

33. Measurement – is defined by (Fenton & Pfleeger, 1996) as “the
process by which numbers or symbols are assigned to attributes of
entities in the real world in such a way as to describe them according to
clearly defined rules”.
34. Direct measurement – a measurement that involves no other
attributes or entities. Some direct measures used in software
engineering are: length of source code (KLOC – kilo lines of code),
duration of the testing process, number of defects discovered, and
software project development time.

35. Indirect measurement – a measurement based on direct
measurements, other indirect measurements, and/or combinations of
both. Some indirect measures used in software engineering are: module
defect density (number of defects per module size), defect detection
efficiency (number of defects detected per total number of defects),
programmer productivity (KLOC produced per person-month of effort),
and system spoilage (effort spent fixing faults per total project effort).
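The indirect measures above are simple ratios of direct measurements. A sketch with invented project numbers:

```python
# Hypothetical direct measurements for one project.
defects_in_test = 45      # defects discovered during testing
defects_total = 50        # testing defects plus field-reported defects
module_size_kloc = 12.0   # module size, in KLOC
kloc_produced = 36.0      # total code produced
person_months = 9.0       # total effort

defect_density = defects_in_test / module_size_kloc   # defects per KLOC
detection_efficiency = defects_in_test / defects_total
productivity = kloc_produced / person_months          # KLOC per person-month

print(defect_density, detection_efficiency, productivity)  # 3.75 0.9 4.0
```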
36. Prediction system – consists of “a mathematical model together with
a set of prediction procedures for determining unknown parameters and
interpreting results”, according to Littlewood and (Fenton &
Pfleeger, 1996).
37. Prediction system’s validation – is the process of establishing the
accuracy of the prediction system by comparing model performance
with known data in the given environment, according to (Fenton &
Pfleeger, 1996).
38. Top-down design – (Brooks, 1975) describes its benefits as follows: “A
good top-down design avoids bugs in several ways. First, the clarity of
structure and representation makes the precise statement of
requirements and functions of the modules easier. Second, the
partitioning and independence of modules avoids system bugs. Third,
the suppression of detail makes flaws in the structure more apparent.
Fourth, the design can be tested at each of its refinement steps, so
testing can start earlier and focus on the proper level of detail at each
step.”
39. The Goal-Question-Metric paradigm – involves three steps: list the
major goals of the development or maintenance project; derive from
each goal the questions that must be answered to meet that goal; and
decide what must be measured in order to provide an adequate answer
to every question.
40. Capability and Maturity in Software Management – are considered by
the Software Engineering Institute (Carnegie Mellon) in order to
establish the software process maturity level.
Five levels were identified:
 ad hoc (the least predictable and controllable),
 repeatable (but process depends on individuals),
 defined (the software process is defined and
institutionalized),
 managed (the software process is measured), and
 optimized (the most predictable and controllable).

From a mathematical point of view, the following terminology is
important for a software reliability engineer:
1) A random variable that records the life time, or time to
failure, of a product is called a lifetime, failure time, or
survival time.
2) The reliability function is also called the survival function
(mainly in life sciences).
Obtaining adequate data from which to assess reliability and
availability is critical to any measurement based methodology. The
seven general reliability assessment requirements:
1) The developer assesses the reliability of the system
periodically throughout the lifecycle;
2) The reliability assessments should be based on data from
analysis, modeling and simulation, test and the field, and

27
Copyright: 2012 by Florin POPENTIU-VLADICESCU
should be tracked as a function of time and compared against
reliability allocations and customer reliability requirements;
3) The assessment strategy can include reliability values
achieved during development phases;
4) The impact of changes to the design should be monitored and
evaluated;
5) The implementation of any corrective action is verified and its
effectiveness is measured;
6) A formal growth methodology should be used if possible to
plan and track the reliability improvement;
7) Predicted failure modes and mechanisms should be compared
against those from test and the field.

In software engineering and software reliability engineering,
people involved in software development, testing, reliability assessment,
and software management should consider the following principles:
• Gilb’s Principle of Fuzzy Targets: Projects without clear goals
will not achieve their goals clearly.
• DeMarco’s Principle: You cannot control what you cannot
measure.
• Dijkstra’s first principle on the presence of bugs: Testing shows
the presence, not the absence of bugs.
• Dijkstra’s first principle, reformulated: Program testing
can be used to show the presence of bugs, but never to show
their absence!
• Dijkstra’s classes of programming problems: When we
had no computers, we had no programming problem either.

When we had a few computers, we had a mild programming
problem. Confronted with machines a million times as
powerful, we are faced with a gigantic programming
problem.
• The four principles of investigation, formulated by (Fenton &
Pfleeger, 1996): 1) choosing an investigative technique; 2)
stating the hypothesis; 3) maintaining control over variables;
4) making the investigation meaningful.
• Uncertainty in assessment (Delic, Mazzanti & Strigini, 1997):
the assessor … is confronted with a wealth of evidence
about the design methods used, the quality assurance
organisation, the results of testing, etc., none of which is
usually sufficient to prove the desired conclusion, e.g., that a
system has a certain small probability of dangerous failure.
• Hardware versus software (according to IEEE 1633-2008)
from the reliability point of view:
1) Hardware reliability is expressed in wall-clock time.
Software reliability may be expressed in execution,
elapsed, or calendar time.
2) Repair generally restores hardware to its previous
state. Correction of a software fault always changes the
software to a new state.
3) Hardware is constrained by physical law, while a
minor change in software can cause failure.
4) Failures attributable to software faults come without
advance warning and often provide no indication they
have occurred. Hardware, on the other hand, often
provides a period of graceful degradation.
5) Hardware and software reliability should be managed
as an integrated system attribute.
• There are many ways to develop software reliability models:
a) describing them as stochastic processes; b) relating them to
Markov models; c) defining the probability density or
distribution function; or d) specifying the hazard function.
• The Continue Reading Principle: to find out more about
software reliability engineering, it is necessary to go through
the next chapters.

1.3 The Software Reliability Engineering Objectives

In the short term, the objectives of software reliability engineering
are closely connected to software reliability management and
improvement, software reliability estimation, and the estimation of the
maintenance effort when faults appear. In the long term, the
software reliability of successive generations will be improved.
Software reliability can be achieved by using many sources of
information (Fig. 1.4) and can be made effective by using various software
reliability engineering approaches (Fig. 1.5).
According to M.R. Lyu, software reliability engineering is
"focused on engineering techniques for developing and maintaining
software systems whose reliability can be quantitatively evaluated".
Also, there are four categories of people interested in this subject: SRE
practitioners, software practitioners, non-software professionals, and
researchers.

Fig. 1.4. Sources of information.
Fig. 1.5. Possible approaches in obtaining reliable software.

Musa (1999) considered that understanding software reliability
engineering "has become a vital skill for both the software manager and
software engineer". This remains true nowadays, when new software
development methodologies have been proposed and new customer
requirements are formulated in connection with new communication
interfaces, new interaction techniques, and new user behavior.
As Garcia et al. consider, non-software professionals need to
understand enough about SRE to see how it affects their work, and even
to apply it to a limited extent. When Musa (1999) considered the objectives of software
reliability engineering, he wrote the following: A primary objective of
software reliability engineering is to help the engineer, manager, or user
of software learn to make more precise decisions. A strong secondary
objective is to make everyone more concretely aware of software
reliability by focusing attention on it.
The objective of this course is to present information useful
to both software professionals and non-software professionals about
the basic activities of software reliability engineering: defining the
product, implementing the operational profiles, engineering the "just
right" reliability, preparing for test, executing test, and guiding test.
Various aspects of reliability models and of the optimization
techniques used to achieve objectives such as optimum reliability
under budget constraints, or minimum cost for a given reliability
target, are also covered.

2 RELIABILITY ENGINEERING FUNDAMENTALS

2.1 Reliability measures

2.1.1 Reliability, Maintainability and Availability

According to Gokhale et al., a number of conflicting views exist
as to what software reliability is and how it should be quantified.
The first approach claims that a program is either correct or
incorrect: hence an imperfect program has zero reliability and a perfect
program has a reliability value of one. The second approach is based on
program testing, and software reliability is defined as the relative
frequency of the times that the program performs without failure.
The failure process is random. Time of failure, time interval
between failures, cumulative failures experienced up to a given time,
and failures experienced in a time interval are all random variables (that
means uncertainty is experienced). There are some reasons explaining
this randomness:
1) the introduction of faults by programmers is an unpredictable
process (the locations of faults are unknown);
2) the working environment (including conditions of running) of
the software can behave unpredictable, etc.
The reader should note that a fault is the defect causing a failure;
therefore it is not possible for different faults to determine the same
failure. Concerning the second reason, if the environment has few
input states, it is possible that the software may run without failure
forever (and no faults will be identified).

Reliability is defined as the probability that the system will
perform its intended function under specified working conditions for a
specified period of time, called the software mission. Mathematically, the
reliability function R(t) (also called the survival function) is the
probability that a system will be successful without failure in the
interval from time 0 to time t, i.e.

R(t) = P(T > t), t ≥ 0,

where T is a random variable representing the failure time or time-to-failure.
The failure probability, or unreliability – defined as the
probability that the system will fail during the mission interval – is given
by

F(t) = P(T ≤ t) = 1 − R(t), t ≥ 0,

which is known as the distribution function of T, also called the failure
distribution function.
If the time-to-failure random variable T has a density function f(t), then

R(t) = ∫_t^∞ f(u) du, t ≥ 0.

The density function, also named the failure intensity function,
can be interpreted through the probability that the failure time T
occurs between time t and t + Δt, mathematically described by

f(t) = lim_{Δt→0} P(t ≤ T ≤ t + Δt) / Δt.

Fig. 2.1. Failure intensity / Reliability function over time
Hence, the three functions R(t), F(t), and f(t) are closely related
to one another: if any of them is known, the others can be obtained by
elementary calculus. A graphical representation is given in Fig. 2.1.
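As a minimal numerical illustration of these relationships (assuming a hypothetical exponential lifetime with rate λ = 0.5, a value not taken from the text), the identities R(t) = 1 − F(t) and F(t) = ∫_0^t f(u) du can be checked directly:

```python
import math

# Sketch (assumed example): exponential lifetime with rate lam = 0.5 failures/hour.
lam = 0.5

def f(t):  # failure density (failure intensity function)
    return lam * math.exp(-lam * t)

def F(t):  # failure (distribution) function
    return 1.0 - math.exp(-lam * t)

def R(t):  # reliability (survival) function
    return math.exp(-lam * t)

# The three functions determine each other: R = 1 - F, and F is the
# integral of f (checked here with a crude Riemann sum over [0, t]).
t = 2.0
riemann_F = sum(f(k * 1e-4) * 1e-4 for k in range(int(t / 1e-4)))
print(abs(R(t) + F(t) - 1.0) < 1e-12)   # R(t) = 1 - F(t)
print(abs(riemann_F - F(t)) < 1e-3)     # F(t) ≈ integral of f over [0, t]
```

Any one of the three functions recovers the other two, which is what "related by elementary calculus" means in practice.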
As (Musa, 1999) remarked, each alternative form of expressing
reliability – failure intensity or reliability proper – has its advantages.
A reliability statement is better suited to complex systems,
where the system reliability can be expressed using the reliability of
the components (subsystems). Failure intensity can be used when the
failure rate at each point of time is required. Also, the number of
resident faults, for example, is a developer-oriented measure, while
reliability is a user-oriented measure.

Fig. 2.2. Failure rate of Common Operating Systems
(http://www.ece.cmu.edu/~koopman/des_s99/sw_reliability/)

As faults are removed, the failure intensity decreases and the
reliability increases. During design changes or when new features are
added, new faults may be introduced, so the reliability of the new build
can be low; it will increase again after fault removal. If a system
is stable (no changes are accepted), both failure intensity and reliability
tend to be constant.
The failure rate is the rate at which failures occur in a certain
time interval, say [t1, t2]: it is defined as the probability that a failure
occurs per unit time in [t1, t2], given that no failure has occurred prior
to t1, and is computed by

[R(t1) − R(t2)] / [(t2 − t1) · R(t1)].

The hazard function is defined as the limit of the failure rate as the
length of the interval approaches zero. Hence the hazard function,
denoted by h(t), is the instantaneous failure rate, and is computed
according to the following relations:

h(t) = lim_{Δt→0} [R(t) − R(t + Δt)] / (Δt · R(t))
     = (1/R(t)) · (−dR(t)/dt)
     = f(t)/R(t).
If h(.) is an increasing (resp. decreasing) function then F(.) is
classified as Increasing (resp. Decreasing) Failure Rate (i.e. IFR (resp.
DFR)).
The hazard rate is related to the reliability by

R(t) = exp(−∫_0^t h(x) dx),

and is equal to the failure intensity function.
In software reliability engineering the hazard rate in the field is
piecewise constant, discontinuities occurring only at new releases.
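The relation R(t) = exp(−∫_0^t h(x) dx) can be verified numerically; the sketch below assumes a standard Weibull hazard with shape parameter 2, purely for illustration:

```python
import math

# Sketch: verify numerically that R(t) = exp(-integral of h over [0, t]) for
# an assumed standard Weibull hazard h(t) = beta * t**(beta - 1), beta = 2.
beta = 2.0

def h(t):               # hazard (instantaneous failure rate)
    return beta * t ** (beta - 1)

def R(t):               # closed-form standard Weibull reliability
    return math.exp(-t ** beta)

t, n = 1.5, 100_000
dx = t / n
cum_hazard = sum(h((k + 0.5) * dx) * dx for k in range(n))  # midpoint rule
print(abs(math.exp(-cum_hazard) - R(t)) < 1e-6)
```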
The mean time to failure (MTTF) is defined as the expected value
of the lifetime before the occurrence of a failure. If the reliability function
of a system is given by R(t), and f(t) is the corresponding failure density
function, then the MTTF can be computed as

MTTF = ∫_0^∞ t f(t) dt.

But f(t) = −dR(t)/dt, and using integration by parts one obtains
successively

MTTF = −∫_0^∞ t dR(t) = [−t R(t)]_0^∞ + ∫_0^∞ R(t) dt.

However, lim_{t→∞} t R(t) = 0 (the system must fail after a finite amount
of operating time), and it results that

MTTF = ∫_0^∞ R(t) dt,

the integration being performed with respect to the system operating
time.
As Musa claims, MTTF is used in the hardware reliability
field and to a decreasing extent in software reliability. There are many
cases in software reliability in which the MTTF is undefined, while the
failure intensity always exists. If the reliability is perfect, the MTTF is
undefined (because there are no failures).
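The identity MTTF = ∫_0^∞ R(t) dt can be checked numerically; the sketch below assumes an exponential model with an illustrative rate λ = 0.25, for which the closed form is MTTF = 1/λ:

```python
import math

# Sketch: MTTF = integral of R(t) over [0, infinity), checked for an assumed
# exponential model with lam = 0.25 (closed form: MTTF = 1/lam = 4 hours).
lam = 0.25

def R(t):
    return math.exp(-lam * t)

# Truncate the improper integral where R(t) is negligible (R(100) ~ e**-25).
upper, n = 100.0, 1_000_000
dt = upper / n
mttf_numeric = sum(R((k + 0.5) * dt) * dt for k in range(n))  # midpoint rule
print(abs(mttf_numeric - 1.0 / lam) < 1e-3)
```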
Let S denote the random variable of the time to repair, or the total
downtime. If the repair time S has a repair time density function g(.),
then the maintainability V(t), defined as the probability that the system
will be back in an operational state by time t, can be written as

V(t) = P(S ≤ t) = ∫_0^t g(s) ds.

For software, maintainability can also be described by the speed
and ease with which a program can be corrected, as (Musa, 1999)
explained.
The mean downtime, or mean time to repair (MTTR), is the
expected value of the random variable repair time S, and is given by

MTTR = ∫_0^∞ s g(s) ds.

The mean time between failures (MTBF) is an important measure
for repairable systems, being the expected value of the random variable
time between failures; it is evaluated as MTBF = MTTF + MTTR.
Availability is a measure that allows for the repair of a system when
failures occur, while reliability requires the system to be fully
operational for the entire mission time. The availability is defined as
the probability that the system is operational at time t, i.e.

Availability = (system up time) / (system up time + system down time)
             = MTTF / (MTTF + MTTR).
It is common practice to correct software off-line, which is why
software maintainability does not considerably affect software
availability (only a restart is necessary).
The reader should be aware of the following assumptions for
software availability modeling, according to (Tokuno & Yamada, 2003,
pp. 236):
1) A software system is unavailable and starts to be restored as
soon as a software failure occurs, and the system cannot operate
until the restoration action is complete;
2) A restoration action implies a debugging activity, which is
performed perfectly with probability a – the perfect debugging
rate – and imperfectly with probability b = 1 − a, where a, b ≥ 0;
3) The software failure-occurrence time (up time) and the
restoration time (down time) follow random models, usually
exponential models, and
4) The probability that two or more software failures occur
simultaneously is negligible.

A proactive fault management technique that aims at cleaning up
the internal state of the system, to prevent the occurrence of more
severe crash failures in the future and to increase the availability of
software, is called rejuvenation. Rejuvenation design involves
occasionally stopping the running software, cleaning its internal state
(by garbage collection, flushing operating system kernel tables, flushing
internal data structures), and restarting it.
For hardware systems, by comparison, the term "occasionally" is
replaced by:
• clock-based (according to a plan),
• age-based (according to the age of the component),
• usage-based (according to the usage time of the product),
• condition-based (some events appeared and were identified by
monitoring),
• opportunity-based (in general, for multi-component systems),
and
• design-out (when a component redesign is necessary to
improve the reliability).
Rejuvenation design depends on the application availability
requirement, the application failure rate, and the rejuvenation
performance impact. Software rejuvenation is a cost-effective approach
for the proactive management of software faults, providing protection
not only against hard failures but also against performance degradation.

2.1.2 Reliability Function for Common Distributions

In this section some common distributions are considered.
The probability distribution is a fundamental concept in statistics,
useful on both a theoretical and a practical level. For reliability
models, probability distributions are used to calculate confidence
intervals for parameters and to obtain critical regions during hypothesis
testing. Collected data should be analyzed, and one task will be to
determine an adequate distributional model for the data. Also,
simulating the behavior of the models requires random numbers
drawn from a specific probability distribution.
Below, the aspects relevant for applications in reliability are detailed.
For a compact presentation, the following aspects are described for
each distribution: practical motivation, the analytic expressions of
statistic measures, the reliability measures, and some comments. The
abbreviation pdf stands for probability density function, and cdf
stands for cumulative distribution function (or simply distribution
function).

2.1.2.1. Exponential distribution

Practical motivation. Having a constant failure rate, this distribution is
a good choice in software reliability engineering, because a used
software component that has not failed is as good as a new component.
Because of its memoryless ("used-as-good-as-new") property, it is
appropriate for modeling the constant hazard rate portion of the
bathtub curve used in reliability theory.
Other useful facts: 1) Reliability: the amount of time a component has
been in service (to be repaired) has no effect on the amount of time
until it fails; 2) Service systems: the amount of remaining service time
is independent of the amount of service time elapsed so far; 3)
Inter-event times: the amount of time since the last event contains no
information about the amount of time until the next event.
Due to the mathematical property of the exp function
(exp(x) · exp(y) = exp(x + y)), it is a very good choice for computing the
reliability of serial architectures.
In software reliability engineering, the exponential distribution can be
used for modeling both the failure and the repair processes.

Analytic expressions of statistic measures. The probability density
function (pdf) of the exponential distribution, with parameter λ > 0, is

f(x; λ) = λ exp(−λx) for x ≥ 0, and 0 for x < 0.

The cumulative distribution function is

F(x; λ) = 1 − exp(−λx) for x ≥ 0, and 0 for x < 0.

The expected value of an exponentially distributed random variable X
with parameter λ is E[X] = 1/λ, and the variance is Var[X] = 1/λ².
An alternative parameterization is

f(x; μ, β) = (1/β) exp(−(x − μ)/β) for x ≥ μ, and 0 for x < μ,

where μ is a location parameter and β is the scale parameter.

Comments. The reliability function is

R(x; λ) = exp(−λx) for x ≥ 0, and 1 for x < 0,

MTTF = 1/λ, and

h(x; λ) = f(x; λ)/R(x; λ) = λ.

For the exponential distribution, the hazard function, or instantaneous
failure rate, is constant over time.
The memoryless property is the enabling property for the construction
of continuous-time Markov chains (see the next section).
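The memoryless property, P(T > s + t | T > s) = P(T > t), can be illustrated by simulation; the rate and the times s and t below are arbitrary illustrative choices:

```python
import math
import random

# Sketch: the memoryless ("used-as-good-as-new") property of the
# exponential distribution, checked by simulation (assumed parameters).
random.seed(1)
lam, s, t = 0.5, 2.0, 3.0

samples = [random.expovariate(lam) for _ in range(200_000)]
survived_s = [x for x in samples if x > s]                    # items that lasted past s
cond = sum(x > s + t for x in survived_s) / len(survived_s)   # P(T > s+t | T > s)
uncond = math.exp(-lam * t)                                   # P(T > t), exact

print(abs(cond - uncond) < 0.02)
```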

2.1.2.2. Weibull distribution

Practical motivation. The Weibull distribution is flexible and good for
modeling component lifetimes with fluctuating hazard rate functions.

Analytic expressions of statistic measures. The formula for the
probability density function of the general Weibull distribution, with γ
being the shape parameter, μ the location parameter, and α the scale
parameter, is

f(x; γ, μ, α) = (γ/α) · ((x − μ)/α)^(γ−1) · exp(−((x − μ)/α)^γ) for x ≥ μ,
and 0 for x < μ.

The case where μ = 0 and α = 1 is called the standard Weibull
distribution. The case where μ = 0 is called the 2-parameter Weibull
distribution. The case μ = 0 and γ = 2 is called the Rayleigh
distribution.
The statistic and reliability measures are presented for the standard
Weibull distribution. The cumulative distribution function is

F(x; γ) = 1 − exp(−x^γ) for x ≥ 0, and 0 for x < 0.

Comments. The reliability function is

R(x; γ) = exp(−x^γ) for x ≥ 0, and 1 for x < 0,

and the hazard function is

h(x; γ) = f(x; γ)/R(x; γ) = γ x^(γ−1).

2.1.2.3. Normal Distribution

Practical motivation. Many classical statistical tests are based on the
assumption that the data follow a normal distribution. In modeling
applications, such as linear and non-linear regression, the error term is
often assumed to follow a normal distribution N(0, 1), and the normal
distribution is used to find significance levels in many hypothesis tests
and confidence intervals. The theoretical basis for the wide
applicability of the normal distribution is the central limit theorem,
which states that as the sample size (denoted by N) becomes large
(N → ∞), the following occur:
• The sampling distribution of the mean becomes approximately
normal regardless of the distribution of the original variable.
• The sampling distribution of the mean is centered at the population
mean (μ) of the original variable, and the standard deviation of the
sampling distribution of the mean approaches σ/√N.

Analytic expressions of statistic measures. The general formula for the
probability density function of the normal distribution N(μ, σ), for
x ∈ (−∞, ∞), is

f(x; μ, σ) = (1/(σ√(2π))) · exp(−(1/2)((x − μ)/σ)²),

where μ is the location parameter and σ is the scale parameter.
The case where μ = 0 and σ = 1 is called the standard normal
distribution, written as

f(x) = (1/√(2π)) · e^(−x²/2),

with the cumulative distribution function Φ(t) = ∫_(−∞)^t f(x) dx, which
does not exist in a simple closed formula, being computed only
numerically. Tables for the standard normal density function are
readily available for use.
If X is a random variable distributed N(μ, σ), then E[X] = μ and
Var[X] = σ².

Comments. The reliability function is R(t) = 1 − Φ(t) and the hazard
function becomes

h(t) = f(t)/R(t) = f(t)/(1 − Φ(t)).

It is easy to prove that the hazard function of a normal distribution is a
monotonically increasing function of t.
The shape of the normal distribution resembles that of a bell (the
Gaussian bell).
The empirical rule states that for a normal distribution: 68% of the data
will fall within 1 standard deviation of the mean; 95% of the data will
fall within 2 standard deviations of the mean; almost all (99.7%) of the
data will fall within 3 standard deviations of the mean.
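The empirical rule can be reproduced from the standard normal cdf, which the Python standard library exposes through math.erf (a generic sketch, not tied to any particular dataset):

```python
import math

# Sketch: the empirical (68-95-99.7) rule, computed from the standard
# normal cdf Phi(x) = (1 + erf(x / sqrt(2))) / 2.
def phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Probability mass within k standard deviations of the mean.
within = {k: phi(k) - phi(-k) for k in (1, 2, 3)}
print(round(within[1], 4))  # 0.6827 -> "68%"
print(round(within[2], 4))  # 0.9545 -> "95%"
print(round(within[3], 4))  # 0.9973 -> "99.7%"
```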

2.1.2.4. Log Normal Distribution

Practical motivation. The lognormal distribution is applied in
maintainability engineering and is able to model failure rate
distributions. A variable X is lognormally distributed if Y = ln(X) is
normally distributed, with "ln" denoting the natural logarithm. Of
course, the popularity of the lognormal distribution is also due to the
fact that the probabilities and percentiles (the inverse) are easily
evaluated by reference to the normal distribution.

Analytic expressions of statistic measures. The general formula for the
probability density function of the lognormal distribution LnN(μ, σ),
for x > 0, is

f(x; μ, σ) = (1/(xσ√(2π))) · exp(−(ln x − μ)²/(2σ²)),

where μ is the location parameter and σ is the scale parameter. The
case where μ = 0 and σ = 1 is called the standard lognormal
distribution, written as

f(x) = (1/(x√(2π))) · e^(−(ln x)²/2).

The formula for the cumulative distribution function of the standard
lognormal distribution is F(x) = Φ(ln x), where Φ is the cumulative
distribution function of the standard normal distribution.
If X is a random variable distributed LnN(μ, σ), then

E[X] = exp(μ + σ²/2)

and

Var[X] = exp(2μ + σ²)(exp(σ²) − 1).

Comments. If T is the random variable of failure time, then

F(t) = P(T ≤ t) = P(ln T ≤ ln t) = P(Z ≤ (ln t − μ)/σ),

where Z = (ln T − μ)/σ is distributed as N(0, 1).
R(t) = 1 − F(t) and h(t) = f(t)/R(t).
The lognormal pdf is used extensively in reliability applications to
model failure times. The lognormal and Weibull pdfs are the most
extensively used in reliability applications, proving very good modeling
capabilities.
Other properties: (1) the lognormal distribution is a distribution
skewed to the right; (2) the pdf starts at zero, increases to its mode, and
decreases thereafter; (3) it is most used to determine failure due to
crack propagation, to model material fatigue failures, and material
strengths.
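The moment formula E[X] = exp(μ + σ²/2) can be checked by simulation (the parameter values below are assumed purely for illustration):

```python
import math
import random

# Sketch: check E[X] = exp(mu + sigma**2 / 2) for a lognormal LnN(mu, sigma)
# by simulation; random.lognormvariate is in the standard library.
random.seed(7)
mu, sigma = 0.5, 0.4

samples = [random.lognormvariate(mu, sigma) for _ in range(200_000)]
sample_mean = sum(samples) / len(samples)
exact_mean = math.exp(mu + sigma ** 2 / 2)

print(abs(sample_mean - exact_mean) / exact_mean < 0.01)
```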

2.1.2.5. Gamma Distribution

Practical motivation. The gamma failure density function has shapes
that are very similar to the Weibull distribution. The sum of
exponentially distributed random variables is a gamma-distributed
random variable; more precisely, the distribution is Erlang.

Analytic expressions of statistic measures. The general formula for the
probability density function of the gamma distribution is

f(x; γ, μ, β) = ((x − μ)/β)^(γ−1) · exp(−(x − μ)/β) / (β · Γ(γ)),
x ≥ μ, γ > 0, β > 0,

where:
• β is the scale parameter, μ is the location parameter,
• γ is the shape parameter, and
• Γ(.) is the gamma function, given by the formula

Γ(x) = ∫_0^∞ t^(x−1) e^(−t) dt.

The case μ = 0 and β = 1 is named the standard gamma distribution,
and its pdf is reduced to

f(x; γ) = x^(γ−1) e^(−x) / Γ(γ), x ≥ 0, γ > 0.

The formula for the cdf of the standard gamma distribution is

F(x; γ) = Γ_x(γ) / Γ(γ), x ≥ 0, γ > 0,

where Γ(.) is the function defined above, and Γ_x(a) is the incomplete
gamma function⁴, defined by the formula

Γ_x(a) = ∫_0^x t^(a−1) e^(−t) dt.

In order to compute various measures (statistics, reliability), the
reader should consider the following basic properties of the gamma
function:
• When x = k (a positive integer), Γ(k) = (k − 1)!, the factorial
function; if x is not an integer, the reader should use a table of the
gamma function.
• When x = (2k + 1)/2, with k an integer, then

Γ((2k + 1)/2) = ((2k − 1)/2) · ((2k − 3)/2) · … · (1/2) · √π.

Comments. R(t; γ) = 1 − F(t; γ) and h(t; γ) = f(t; γ)/R(t; γ).
If μ = 0, β = 1, and γ is an integer, then Γ(γ) = (γ − 1)!, the failure
density reduces to

f(t) = t^(γ−1) e^(−t) / (γ − 1)!,

and the reliability function becomes

R(t) = e^(−t) · Σ_(k=0)^(γ−1) t^k / k!.

⁴ The gamma function was first introduced by the Swiss mathematician
Leonhard Euler (1707-1783) in his goal to generalize the factorial to
non-integer values.
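For integer γ (the Erlang case), the closed-form reliability above can be cross-checked against a numerical integration of the failure density; γ = 3 and t = 2.5 are arbitrary illustrative choices:

```python
import math

# Sketch: for integer gamma_shape (the Erlang case), check the closed form
# R(t) = exp(-t) * sum_{k=0}^{gamma-1} t**k / k! against a numerical
# integration of f(t) = t**(gamma-1) * exp(-t) / (gamma-1)!.
gamma_shape, t = 3, 2.5

def f(x):  # Erlang failure density, mu = 0, beta = 1
    return x ** (gamma_shape - 1) * math.exp(-x) / math.factorial(gamma_shape - 1)

R_closed = math.exp(-t) * sum(t ** k / math.factorial(k) for k in range(gamma_shape))

n = 100_000
dx = t / n
F_numeric = sum(f((i + 0.5) * dx) * dx for i in range(n))  # midpoint rule

print(abs(R_closed - (1.0 - F_numeric)) < 1e-6)
```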

2.1.2.6. Lomax Distribution

Practical motivation. It is also known as the Pareto distribution of the
second kind. Let X be a random variable that has an exponential
distribution whose parameter λ is itself gamma distributed, with
density function

g(λ; ν, α) = λ^(α−1) e^(−λ/ν) / (ν^α Γ(α)), λ > 0;

then the resulting unconditional reliability function is
R(t) = (1 + νt)^(−α). It is said that X has a Lomax distribution.

Analytic expressions of statistic measures. The pdf of the Lomax
distribution is

f(x; ν, α) = αν / (1 + νx)^(α+1), x ≥ 0.

Therefore

E[X] = 1/(ν(α − 1)), α > 1,

and

Var[X] = α / (ν²(α − 1)²(α − 2)), α > 2.

Comments. The hazard (failure rate) function is given by

h(t) = αν / (1 + νt).

The Lomax distribution can be used to model the reliability of software
operated in a perturbed environment.

2.1.2.7. Binomial Distribution

Practical motivation. This is the most widely used distribution in
reliability and quality inspection; situations like success/failure or
accept/reject can be easily analyzed using the discrete binomial
distribution. The binomial distribution is well suited to obtain the
probability of observing x successes in N trials, with the probability of
success on a single trial denoted by p. The binomial distribution
assumes that p is fixed for all trials.

Analytic expressions of statistic measures. The pdf of the distribution
is given by

P(X = x) = C(N, x) · p^x · (1 − p)^(N−x), x = 0, 1, 2, …, N,

where

C(N, x) = N! / (x!(N − x)!)

is the binomial coefficient (k! stands for factorial).

Comments. The reliability function R(k) describes the situation "at
least k out of N items are good" and is given by

R(k) = Σ_(x=k)^N C(N, x) · p^x · (1 − p)^(N−x).

2.1.2.8. Geometric Distribution

Practical motivation. The geometric distribution models the number of
successes before one failure in an independent succession of tests
where each test results in success or failure. Each trial has two possible
outcomes: success with probability p and failure with probability q,
with p + q = 1 and p, q ≥ 0.

Analytic expressions of statistic measures. P(Y = x) = p^x q,
x = 0, 1, 2, ….
The cdf of the geometric distribution is obtained as

P(Y ≤ x) = P(Y = 0) + P(Y = 1) + … + P(Y = x)
         = q + pq + … + p^x q = q(1 + p + … + p^x) = 1 − p^(x+1).

Comments. Another variant describes the probability distribution of
the number X of Bernoulli trials needed to get one success,
x = 1, 2, 3, ….
A recent proposal, namely the Weibull-geometric distribution, is
suitable for modeling a unimodal failure rate.
There are some software reliability growth models based on a
geometric distribution interpretation.
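The telescoping cdf above can be verified term by term (p = 0.8 is an illustrative value):

```python
# Sketch: for the variant in the text, P(Y = x) = p**x * q counts x successes
# (each with probability p) before the first failure, and the cdf telescopes
# to P(Y <= x) = 1 - p**(x + 1).
p = 0.8
q = 1.0 - p

def pmf(x):
    return p ** x * q

x = 5
cdf_by_sum = sum(pmf(i) for i in range(x + 1))  # q + pq + ... + p**x q
cdf_closed = 1.0 - p ** (x + 1)
print(abs(cdf_by_sum - cdf_closed) < 1e-12)
```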

2.1.2.9. Poisson Distribution

Practical motivation. The Poisson distribution is used to model the
number of events occurring within a given time interval. The sample
size is unknown.

Analytic expressions of statistic measures. If λ is a constant failure
rate and x is the number of events, the pdf of the Poisson distribution
is given by

P(X = x) = (λt)^x e^(−λt) / x!, x = 0, 1, 2, …,

where t is the mission time.

Comments. The reliability function R(k) – the probability of k or fewer
failures – is given by

R(k) = Σ_(x=0)^k (λt)^x e^(−λt) / x!.

The Poisson distribution is frequently used as a first approximation to
describe the failures expected over time.
In some special cases, the Poisson distribution is an excellent
approximation to the binomial distribution.
This distribution is particularly important in stochastic processes,
where it counts the number of events in a given period of time (see the
next section on Poisson processes).
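A short sketch of R(k) for an assumed failure rate and mission time (the values are illustrative only):

```python
import math

# Sketch: R(k) = probability of k or fewer failures in a mission of length t
# at a constant failure rate lam (assumed values for illustration).
def poisson_reliability(k, lam, t):
    return sum((lam * t) ** x * math.exp(-lam * t) / math.factorial(x)
               for x in range(k + 1))

# Example: lam = 0.1 failures/hour, a 10-hour mission, at most 2 failures
# tolerated; here lam*t = 1, so R(2) = e**-1 * (1 + 1 + 1/2).
r = poisson_reliability(2, 0.1, 10.0)
print(round(r, 4))  # 0.9197
```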

2.2 Stochastic processes in reliability

2.2.1. Preliminaries

The previous section used the concepts of probability and
distribution function, and their characteristics relevant to reliability
engineering in general and to software reliability engineering in
particular. This section introduces sequences of random variables
defined on the same probability space, useful to describe the evolution
of a process in time.
When examining the probabilistic evolution of a process, the
researcher observes a stochastic process {X_t, t ∈ T} (the set T is called
the parameter space). If the distribution of the random variables of the
process is the same for all t, the process is called stationary; if the
process evolves with time, it is called nonstationary.
If the parameter space T is discrete (countable or finite), the
sequence is called a discrete-time process and is denoted by {X_n},
where n = 1, 2, ….
If the parameter space T is continuous or uncountable, the
sequence is called a continuous-time process and is denoted by
{X(t); t ≥ 0}.
The set of all possible and distinct outcomes of all experiments in
a stochastic process is called its state space; its elements are called the
states.
If the state space is a discrete set, then the process is called a
discrete-state process; otherwise, it is called a continuous-state process.

Discrete state processes are called chains. The methods of
process analysis depend on the nature of the spaces (parameter space
and state space) of the stochastic process under study.
An important example of a stochastic process is the counting
process. A stochastic process is said to be a counting process if X(t)
represents the total number of events that have occurred up to time t.
Hence a counting process X(t) has to satisfy:
(i) X(t) ≥ 0,
(ii) X(t) is integer-valued,
(iii) if s < t, then X(s) ≤ X(t) for s, t ∈ R, and
(iv) for s < t, X(t) − X(s) equals the number of events that have
occurred in the interval (s, t).
A special case of counting process is the renewal process. An event is
called a renewal if upon its occurrence everything starts over and over
again stochastically.
Let X1 be the time to the first renewal and let Xn (n = 2, 3, …) be
the time between the (n−1)st and the n-th renewal. Assume that Xn (n =
1, 2, …) are independent and identically distributed random variables
with distribution function F, with F(0) = P{Xn = 0} < 1, and positive
expectation

μ = ∫0∞ x dF(x) > 0.

Define the time of the n-th renewal by

Sn = X1 + X2 + … + Xn,

and let N(t) = max{n | Sn ≤ t} be the number of renewals by time t. Built
in this way, the counting process {N(t); t ≥ 0} is a renewal process.
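The renewal quantities above translate directly into a small simulation; the following Python sketch (the exponential inter-renewal distribution and its rate are illustrative assumptions, not taken from the text) draws X1, X2, … and counts N(t) = max{n | Sn ≤ t}:

```python
import random

def renewals_by_time(t, sample_gap, seed=0):
    """Count renewals N(t): draw inter-renewal times X1, X2, ... and count
    how many partial sums S_n = X1 + ... + X_n fall in [0, t]."""
    rng = random.Random(seed)
    s, n = 0.0, 0
    while True:
        s += sample_gap(rng)      # S_{n+1} = S_n + X_{n+1}
        if s > t:
            return n              # N(t) = max{n | S_n <= t}
        n += 1

# Exponential inter-renewal times with rate 2.0; by the elementary renewal
# theorem E[N(t)]/t tends to 1/mu = 2, so the average below is close to 20.
n_runs = 2000
avg = sum(renewals_by_time(10.0, lambda r: r.expovariate(2.0), seed=i)
          for i in range(n_runs)) / n_runs
print(round(avg, 1))
```

With exponential gaps this renewal process is in fact a homogeneous Poisson process, which links the construction to Section 2.2.6.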
Two types of processes are important in reliability: Markov
processes and Poisson Processes.

A Markov process is one having the following property, called
also the Markov property: the probability of any particular future
behavior of the process, when its current state is known exactly, is not
changed by additional information concerning its past behavior.
A Poisson process is a stochastic process which counts the
number of events and the time when these events occur during the time
interval under study.

2.2.2 Discrete Time Markov Chains

A general discrete-time Markov chain (DTMC) is a sequence of
discrete random variables {Xn; n = 0, 1, 2, …} in which Xk+1 is dependent
on all of X0, X1, …, Xk, k ≥ 0.
In practice, and for ease of mathematical study, it is assumed
that Xk+1 depends only on the i previous random variables, where i is a
fixed, finite number. These types of DTMCs are called Markov
chains of order i and satisfy the following condition:

P{Xk+1 = j | X0 = i0, X1 = i1, …, Xk = ik}
= P{Xk+1 = j | Xk−i+1 = ik−i+1, Xk−i+2 = ik−i+2, …, Xk = ik}.

Moreover, the most used DTMCs are of the first-order type,
called simply Markov chains. In this case only the present (at time k)
has any influence on the future (at time k+1); it is a memoryless process.
In other words, for all k ≥ 0,

P{Xk+1 = j | X0 = i0, X1 = i1, …, Xk = i} = P{Xk+1 = j | Xk = i}.

The state space is considered to be either finite or countably
infinite. We can define the (one-step) transition probability from state i
to state j at time k by

pij(k, k+1) = P(Xk+1 = j | Xk = i), k = 0, 1, … .

The multistep transition probabilities at time k are defined by

pij(k, k+m) = P(Xk+m = j | Xk = i), k = 0, 1, … .

These transitions can be modeled by matrices. The Chapman-
Kolmogorov relations establish the rules for computing the transition
matrix (a product of matrices):

pij(k, n) = ∑l pil(k, s) plj(s, n),

where s is some integer between k and n, and l covers the state space.
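For a time-homogeneous chain, pij(k, k + m) does not depend on k and equals the (i, j) entry of the m-th power of the one-step matrix P, so the Chapman-Kolmogorov relations become matrix multiplication. A minimal Python sketch (the two-state "up/down" chain and its probabilities are hypothetical):

```python
def mat_mul(a, b):
    """Chapman-Kolmogorov composition: entry (i, j) sums over the
    intermediate state l, exactly as in p_ij(k, n) = sum_l p_il p_lj."""
    n = len(a)
    return [[sum(a[i][l] * b[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(p, m):
    """m-step transition matrix P^m of a homogeneous DTMC."""
    n = len(p)
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    for _ in range(m):
        result = mat_mul(result, p)
    return result

P = [[0.9, 0.1],   # state 0 ("up"):   stays up with probability 0.9
     [0.5, 0.5]]   # state 1 ("down"): is repaired with probability 0.5
P2 = mat_pow(P, 2)
print(round(P2[0][0], 10))  # p_00 over two steps = 0.9*0.9 + 0.1*0.5 = 0.86
```

Each row of P^m still sums to 1, which is a cheap sanity check after the multiplication.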

2.2.3 Continuous Time Markov Chains

A continuous-time Markov chain (CTMC) has a discrete state
space, but the time of process evolution is continuous. Let T be the time
space, T = [0, ∞), and {X(t); t ∈ T} a stochastic process taking values in
the discrete set of states Ω. The process {X(t); t ∈ T} is a continuous-time
Markov chain if for each s ≥ 0, t > 0, and each set A ⊆ Ω, the following
condition is satisfied:

P{X(t+s) ∈ A | X(u), u ≤ s} = P{X(t+s) ∈ A | X(s)},

i.e. the conditional distribution of the future state, given the present
state and all history, depends only on the present state, being
independent of the past. More specifically, for each s ≥ 0, t > 0, for each
pair of states i, j ∈ Ω, and the history X(u), u ≤ s, the above condition becomes

P{X(t+s) = j | X(s) = i, X(u), u ≤ s} = P{X(t+s) = j | X(s) = i}.

Let us denote by pij(s, t) = P{X(t) = j | X(s) = i}, s < t, the transition
probability function. The Chapman-Kolmogorov equations can be
written as

pij(s, t) = ∑k pik(s, u) pkj(u, t), s < u < t,

where k covers the space of states.


A special case of continuous-time stochastic process is
represented by the process having independent increments: if
t0 ≤ t1 ≤ … ≤ tn, then the random variables {X(ti) − X(ti−1) | i = 1, 2, …, n}
are stochastically independent.

2.2.4 Semi-Markov Processes

One kind of continuous-time chain that allows a probabilistic
distribution for the holding time in a state is called a semi-Markov
process; it is defined as follows:
 X(0) = i, and
 the state j is chosen according to a joint distribution Fij(t); that
means the holding time distribution for the state j is
Fij(t)/Fij(∞).
For a semi-Markov process there are defined both the holding
time distributions Fi(t), Fj(t), and the probability of transition from state i to
state j, denoted by P = {pij}. Using the above notation, the following
equation holds:

Fij(t) = ∑k pik pkj (Fik ∗ Fkj)(t),

where

(F ∗ G)(u) = ∫0u F(s) G(u − s) ds

is the convolution operation.

Applying the Laplace-Stieltjes transform

L(F(t)) = ∫ e^(−st) dF(t) = F̃(s)

to the above equation, a simplified equation is obtained:

F̃ij(s) = ∑k pik pkj F̃ik(s) F̃kj(s).

2.2.5 Birth-Death (BD) Processes

A birth-death (BD) process refers to a Markov process with a
discrete state space, the states being indexed by i = 0, 1, 2, …, such that
state transitions can occur only between neighboring states: i → i + 1 or
i → i − 1. The following notations are used for BD processes:

λi = pij, j = i + 1 (the probability of a birth in an interval Δt is λiΔt),
μi = pij, j = i − 1 (the probability of a death in an interval Δt is μiΔt), and
pij = 0 for j ∉ {i − 1, i + 1}.

An alternative way to see the Markov process {X(t); t ≥ 0} as
a BD process consists of fulfilling the following properties:
(i) it is a counting process having independent increments;
(ii) there exist rates {λn, n ≥ 0} (the birth intensities) such
that P({N(t + Δt) = n + 1} | {N(t) = n}) = λnΔt + o(Δt);
(iii) there exist rates {μn, n ≥ 0} (the death intensities)
such that P({N(t + Δt) = n − 1} | {N(t) = n}) = μnΔt + o(Δt);
(iv) P({N(t + Δt) = n ± k} | {N(t) = n}) = o(Δt), for any k > 1. The
notation o(·) (little o) describes the limiting
behavior of a function when the argument tends towards a
particular value; in our case

lim Δt→0 o(Δt)/Δt = 0.

As defined above, the BD process is homogeneous. For the case
when λn = λn(t), n ≥ 0, or μn = μn(t), n ≥ 0, the process is a non-
homogeneous one.

2.2.6 Poisson Processes

A Poisson process is a BD process with λn = λ, constant with
respect to the parameter n, and μn = 0 (a pure birth process). If λ
is a constant function with respect to time, then the process is called a
Homogeneous Poisson Process (HPP). However, if λ = λ(t), then the
process is called Non-Homogeneous (Non-Homogeneous Poisson Process
– NHPP).
For a Poisson process, the probabilities Pn(t) = P({N(t) = n}), n
nonnegative, satisfy the following system of ordinary differential
equations:

P0′(t) = −λ(t) P0(t),
Pn′(t) = −λ(t) Pn(t) + λ(t) Pn−1(t), n ≥ 1,

which, for the initial conditions P0(0) = 1 and Pn(0) = 0, n ≥ 1, has the
solution

Pn(t) = Λ(t)^n e^(−Λ(t)) / n!,

where

Λ(t) = ∫0t λ(s) ds, Λ(0) = 0.

For a homogeneous Poisson process, Λ(t) = λt.
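The solution above can be evaluated directly once Λ(t) is known; the sketch below (the rate functions are illustrative choices, not from the text) approximates Λ(t) numerically so that the same code covers both the HPP and the NHPP cases:

```python
import math

def pn(n, t, rate_fn, steps=100_000):
    """P{N(t) = n} = Lambda(t)^n * exp(-Lambda(t)) / n!, where
    Lambda(t) is the integral of lambda(s) over [0, t], here
    approximated with the midpoint rule."""
    h = t / steps
    big_lambda = sum(rate_fn((i + 0.5) * h) for i in range(steps)) * h
    return big_lambda ** n * math.exp(-big_lambda) / math.factorial(n)

# HPP with lambda = 2: Lambda(3) = 6, so P{N(3) = 6} = 6^6 e^-6 / 6!
print(pn(6, 3.0, lambda s: 2.0))
# NHPP with lambda(t) = 2t: Lambda(3) = 9, so P{N(3) = 9} = 9^9 e^-9 / 9!
print(pn(9, 3.0, lambda s: 2.0 * s))
```

The midpoint rule is exact for constant and linear rate functions, so both printed values agree with the closed-form Poisson probabilities.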

2.3 Basic Techniques in Reliability

2.3.1 Component-based systems

Complex systems are composed of components; this is why
the reliability of the entire system depends on the reliability of the
components. Depending on the architecture of the system, the
components can be connected serially (SEQ notation), in parallel (PAR
notation), or in a mixed configuration (see below). If the system S is composed of n
subsystems S1, S2, …, Sn connected serially, the subsystem Si having the
reliability Ri (i = 1, 2, …, n), then the reliability of S = SEQ(S1, S2, …, Sn),
denoted by RS, is given by

RS = ∏i=1..n Ri.
Fig. 2.1. A serial system

Fig. 2.2. A parallel system

If the system S is composed of n subsystems S1, S2, …, Sn
connected in parallel, the subsystem Si having the reliability Ri (i = 1, 2, …,
n), then the reliability of S = PAR(S1, S2, …, Sn), denoted by RS, is given by

RS = 1 − ∏i=1..n (1 − Ri).

In order to exemplify the reliability computation of a hybrid
system, let us consider the parallel-series system S = SEQ(PAR(S1, S2, S3),
S4, PAR(S5, S6), S7), every basic component Si having the reliability Ri, i =
1, 2, …, 7. The reliability of S can be computed as:

RS = (1 − (1 − R1)(1 − R2)(1 − R3)) R4 (1 − (1 − R5)(1 − R6)) R7.

Fig. 2.3. A hybrid system

An important design is used in the framework of fault-tolerant
systems, where a block is composed of n identical basic components,
each having the reliability R, and the block is working when at least
k components out of the total of n are operational. In this case, we
denote the system by S = PARk,n(S1, S2, …, Sn), and the reliability of S is
given by

RS = ∑i=k..n C(n, i) R^i (1 − R)^(n−i),

where

C(n, i) = n! / (i!(n − i)!)

is the binomial coefficient.
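A short sketch of the k-out-of-n formula, checked against its series (k = n) and parallel (k = 1) special cases:

```python
from math import comb

def k_out_of_n(k, n, r):
    """Reliability of a block of n identical components that works when at
    least k of them are operational: sum_{i=k}^{n} C(n,i) r^i (1-r)^(n-i)."""
    return sum(comb(n, i) * r**i * (1 - r)**(n - i) for i in range(k, n + 1))

r = 0.9
print(abs(k_out_of_n(3, 3, r) - r**3) < 1e-12)            # k = n: series
print(abs(k_out_of_n(1, 3, r) - (1 - (1-r)**3)) < 1e-12)  # k = 1: parallel
print(round(k_out_of_n(2, 3, r), 3))  # 2-out-of-3 at r = 0.9: 0.972
```

The 2-out-of-3 arrangement is the classical triple-modular-redundancy vote: 3·r²(1−r) + r³.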


A particular design, used in telecommunications, oil pipelines,
computer ring networks, and other kinds of systems, is an ordered
sequence of n components which fail if and only if at least k consecutive
components in the system fails, denoted by Con/n/k. The particular
cases are k=1 (serial system), and k = n (the parallel system). The
components can be arranged on a straight line or on a circle. Let pi be
the reliability of the component i, and qi = 1 – pi i = , , …, n . The
reliability R(p1, p2, …, pn) of any coherent system of order n may be

, pn )   pi  q j .
expressed as follows using the path (D) set of the system:
R( p1 , p2 ,
pD i p j p

The computation can be rewritten based on the pivoting


technique:
R(p1, p2, …,pn) = pi R(p1, …, pi-1, 1, pi+1, …, pn) +
+ qi R(p1, …, pi-1, 0, pi+1, …, pn).
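For small n, the path-set formula can be evaluated by brute force: enumerate all 2^n component states, keep those for which the Con/n/k structure function equals 1, and sum the corresponding products of pi and qj. A sketch (the component reliabilities are hypothetical) that also verifies the k = 1 and k = n special cases:

```python
from itertools import product

def con_works(state, k):
    """Structure function of Con/n/k: the system works unless some run of
    at least k consecutive components has failed (state[i] == 0)."""
    run = 0
    for up in state:
        run = 0 if up else run + 1
        if run >= k:
            return False
    return True

def reliability(p, k):
    """Sum prod(p_i over working, q_j over failed) over all working states."""
    total = 0.0
    for state in product((0, 1), repeat=len(p)):
        if con_works(state, k):
            w = 1.0
            for pi, up in zip(p, state):
                w *= pi if up else (1.0 - pi)
            total += w
    return total

p = [0.9, 0.8, 0.95, 0.85]                                # hypothetical p1..p4
series = p[0] * p[1] * p[2] * p[3]                        # k = 1 special case
parallel = 1 - (1-p[0]) * (1-p[1]) * (1-p[2]) * (1-p[3])  # k = n special case
print(abs(reliability(p, 1) - series) < 1e-12,
      abs(reliability(p, 4) - parallel) < 1e-12)
```

The same enumeration works for any coherent structure function; only `con_works` needs replacing.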

2.3.2 Network reliability

Today's information systems are network-based: Internet-based
applications, applications working in local area computer networks, grid
applications, and cloud computing approaches. Computing the reliability
of such a complex system requires knowledge of network models and
associated basic techniques.
A network consisting of interconnected nodes can be specified by
the set of nodes, denoted by N, and the set of connections, denoted by E.
A connection is described by a pair of nodes. If the pairs are ordered
then a directed network is obtained; the connection of the node i
(starting point) and the node j (ending point) is denoted by (i, j). If the
pairs are considered as unordered then an undirected network is
obtained and the edge joining i and j is represented by [i, j].
Generally, there are three kinds of fault models in a probabilistic
network:
1) node fault model: the edges of the network are perfectly
reliable, but the nodes fail independently;
2) edge fault model: the nodes of the network are perfectly
reliable, but the connections fail independently;
3) node-and-edge fault model: edges and nodes fail
independently of each other.
In this material we consider only the edge fault model to
illustrate the aspects required by the brute-force computational model.

In an edge fault model it is assumed that elements (nodes,
connections) fail randomly and independently, according to certain
known probabilities. In the following we consider the network G = (N,
E), and assume that G is a directed network having two distinguished
nodes: s – the source, and t – the destination. The nodes are assumed to
be perfect, but the connections are assumed to fail in a statistically
independent fashion with known probabilities qk (k is the index of a
connection); pk = 1-qk is the probability of being operational. A
fundamental measure is Rst(G) – the two terminal reliability measure.
Let us consider m be the number of edges (m = |E|), every edge having
one of two states, working or failed; the state of the network can be
represented by a 0-1 array s = (s1, s2, …, sm) – the kth component of the
array s is 1 if the connection k is working, otherwise sk = 0. Based on the
assumption of independent edge failures, the probability associated with a
given state s is given by

P(s) = ∏k=1..m pk^sk (1 − pk)^(1−sk).

Define the Boolean indicator Ist(s) as having the value 1 when s
contains an s-t path (every connection k on such a path has sk = 1), and
the value 0 otherwise. The two-terminal reliability can be obtained by
considering all 2^m network states, filtered by the Boolean indicator, as:

Rst(G) = ∑s Ist(s) P(s),

where the sum is taken over all network states s.

Since only states s with a non-null Boolean indicator contribute
to the above sum, it is not necessary to consider all states (2^m is a huge
number, making the approach impractical), but only those related to
s-t paths. The focus will be on the simple paths {P1, P2, …, Pk} of G. Let Ei
be the event that all edges in path Pi operate. Therefore

Rst(G) = P(E1 ∪ E2 ∪ … ∪ Ek).
Also, the two-terminal reliability can be formulated considering the
cutsets of G (minimal s-t edge disconnecting sets): an s-t edge
disconnecting set is minimal if it does not contain any other edge
disconnecting set separating s and t. Let {C1, C2, …, Cr} be the s-t cutsets
of G. If Fj is the event that all edges in the cutset Cj fail, then the
unreliability is given by:

Ust(G) = P(F1 ∪ F2 ∪ … ∪ Fr).

Let e be a particular edge of G. This edge can be considered perfect (pe = 1),
the resulting system being denoted by G/e, or failed (pe = 0), the resulting
system being denoted by G−e. According to the pivotal decomposition formula
(Moskowitz (1958), cited by Shier (1991)):

Rst(G) = pe Rst(G/e) + (1 − pe) Rst(G−e).

Example (The bridge network).


Let us consider the network with nodes {s, t, a, b} and the
connections with their numerical labels {(s, a)-1, (a, t)-5, (s, b)-2, (b, t)-6,
(a, b)-3, (b, a)-4}. The set of s-t paths of G are: P1: 1, 5; P2: 2, 6; P3: 1, 3, 6;
P4: 2, 4, 5.
Using the inclusion-exclusion principle, Rst(G) = P(E1) + P(E2) +
P(E3) + P(E4) − P(E1E2) − P(E1E3) − P(E1E4) − P(E2E3) − P(E2E4) − P(E3E4)
+ P(E1E2E3) + P(E1E2E4) + P(E1E3E4) + P(E2E3E4) − P(E1E2E3E4), where: P(E1)
= p1p5; P(E2) = p2p6; P(E3) = p1p3p6; P(E4) = p2p4p5; P(E1E2) = p1p2p5p6;
P(E1E3) = p1p3p5p6; P(E1E4) = p1p2p4p5; P(E2E3) = p1p2p3p6; P(E2E4) =
p2p4p5p6; P(E3E4) = p1p2p3p4p5p6; P(E1E2E3) = p1p2p3p5p6; P(E1E2E4) =
p1p2p4p5p6; P(E1E3E4) = p1p2p3p4p5p6; P(E2E3E4) = p1p2p3p4p5p6;
P(E1E2E3E4) = p1p2p3p4p5p6.
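The bridge example can be cross-checked by the brute-force state enumeration described earlier: with m = 6 edges there are only 2^6 = 64 states, so summing Ist(s)P(s) directly is cheap. The sketch below (the numerical edge reliabilities are hypothetical) compares that sum with the inclusion-exclusion expansion above:

```python
from itertools import product

PATHS = [{1, 5}, {2, 6}, {1, 3, 6}, {2, 4, 5}]   # s-t paths P1..P4 of the bridge

def rst(p):
    """Brute force: Rst(G) = sum over all 2^m states of I_st(s) * P(s)."""
    total = 0.0
    for state in product((0, 1), repeat=6):
        up = {k + 1 for k, sk in enumerate(state) if sk}
        if any(path <= up for path in PATHS):    # I_st(s) = 1: some path fully up
            w = 1.0
            for k, sk in enumerate(state):
                w *= p[k] if sk else (1.0 - p[k])
            total += w
    return total

p1, p2, p3, p4, p5, p6 = p = [0.9, 0.9, 0.8, 0.8, 0.9, 0.9]
all6 = p1 * p2 * p3 * p4 * p5 * p6
ie = (p1*p5 + p2*p6 + p1*p3*p6 + p2*p4*p5
      - p1*p2*p5*p6 - p1*p3*p5*p6 - p1*p2*p4*p5 - p1*p2*p3*p6
      - p2*p4*p5*p6 - all6
      + p1*p2*p3*p5*p6 + p1*p2*p4*p5*p6 + all6 + all6
      - all6)
print(abs(rst(p) - ie) < 1e-9)  # True: both methods agree
```

The `ie` expression mirrors the expansion term by term; the repeated `all6` factors are the joint probabilities that involve all six edges.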

2.3.3 Statistical estimation methods

The reader may skip this subsection if his or her background in
reliability data analysis is adequate, and return to these pages later if
necessary.
The maximum likelihood estimation method is considered firstly,
and then goodness of fit techniques like Chi-Squared test, and
Kolmogorov-Smirnov test are described. Finally, some interval
estimation methods are given. However, we do not consider bootstrap
methods and other computer intensive approaches for confidence bands
estimation.
If X1, X2, …, Xn is a random sample of independent random variables
from a distribution having the density f(x, θ), where θ = (θ1, θ2, …, θm)
is a vector of unknown parameters to be estimated, then the likelihood
function L(X, θ) is the product of the probability densities evaluated at
each sample point, where X = (X1, X2, …, Xn). The maximum likelihood
estimator θ* is found by maximizing L(X, θ) with respect to θ.
Computationally, the maximum likelihood estimator is found by
considering the log-likelihood function

ln L(X, θ) = ∑i=1..n ln f(Xi, θ),

taking the partial derivatives of ln L with respect to each component θi of
θ, equating each one to zero, and solving the resulting system of (in
general nonlinear) equations.

As an example, we estimate the parameter of the exponential
distribution by the maximum likelihood method. Let X1, X2, …, Xn be a
random sample from the exponential distribution, where the density
function is f(x, λ) = λe^(−λx), λ > 0, x > 0. The joint probability density
function of the sample is given by

L(X, λ) = λ^n e^(−λ ∑i=1..n Xi),

and

ln L(X, λ) = n ln λ − λ ∑i=1..n Xi.

By setting the first derivative of ln L with respect to λ equal to
zero and solving the equation, we find

λ* = n / ∑i=1..n Xi,

which is the maximum likelihood estimate of λ.
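The closed-form estimate λ* = n / ∑ Xi is easy to check on simulated data; a sketch (the true rate 0.5 and the sample size are arbitrary illustrative choices):

```python
import math
import random

def exp_mle(sample):
    """Maximum likelihood estimate of the exponential rate: n / sum(x_i)."""
    return len(sample) / sum(sample)

rng = random.Random(7)
true_rate = 0.5
sample = [rng.expovariate(true_rate) for _ in range(10_000)]
est = exp_mle(sample)
print(round(est, 2))          # close to the true rate 0.5

def log_lik(lam):
    """ln L(lambda) = n ln(lambda) - lambda * sum(x_i)."""
    return len(sample) * math.log(lam) - lam * sum(sample)

# The estimate indeed maximizes the log-likelihood locally:
print(log_lik(est) > log_lik(0.9 * est) and
      log_lik(est) > log_lik(1.1 * est))  # True
```

The local-maximum check reflects the derivation above: the log-likelihood is concave in λ, so the zero of its derivative is the unique maximizer.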


Two common techniques, namely the χ² goodness-of-fit test and
the Kolmogorov-Smirnov D test, are used to compare an observed
sample distribution with a theoretical distribution.
A χ² test can be used to test the hypothesis that observed data
follow a particular distribution. The test procedure consists of arranging
the n observations in the sample into a frequency table with k classes.
The χ² statistic is:

S = ∑i=1..k (fi − Fi)² / Fi,

where fi is the frequency of sample observations in each class, and Fi is
the theoretical frequency (area under density function between the class
boundaries).
From the χ² tables, one takes the value of χ² with the desired significance
level α and with ν degrees of freedom (ν = k − 1 − p, where p is the number of
estimated parameters); the hypothesis that the sample
distribution is the same as the theoretical distribution is rejected if

S ≥ χ²(1 − α; k − 1 − p),

where α is called the significance level.


The χ² goodness-of-fit test is not valid if the expected frequencies
are too small. There is no general agreement on the minimum expected
frequency allowed (greater than 1), but values of 3, 4, or 5 are often
used. If an expected frequency is too small, two or more classes have to
be combined.
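A sketch of the statistic computation (the observed and expected class counts are hypothetical), with the rejection rule noted in the comments:

```python
def chi_square_stat(observed, expected):
    """S = sum over the k classes of (f_i - F_i)^2 / F_i."""
    return sum((f - F) ** 2 / F for f, F in zip(observed, expected))

# Hypothetical example: 100 observations grouped into k = 4 classes, with
# expected counts from a fitted model having p = 1 estimated parameter.
observed = [52, 26, 12, 10]
expected = [50.0, 25.0, 12.5, 12.5]
s = chi_square_stat(observed, expected)
print(round(s, 2))  # 0.64
# Degrees of freedom: k - 1 - p = 2.  The hypothesis is rejected only if S
# exceeds the tabulated critical value (5.991 for alpha = 0.05 and 2 d.f.),
# so here the fitted distribution is not rejected.
```

Note that the smallest expected count here is 12.5, comfortably above the minimum discussed in the text.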
The Kolmogorov-Smirnov test is based on the empirical
distribution function. Given n ordered data points X1 ≤ X2 ≤ … ≤ Xn, the
observed distribution function Fn(x) is computed as follows (it is a step
function that increases by 1/n at the value of each data point):

Fn(X) = 0 for X < X1; Fn(X) = i/n for Xi ≤ X < Xi+1; Fn(X) = 1 for X ≥ Xn.

If we plot the empirical distribution function Fn and the cumulative
distribution function F* of a given distribution, then the Kolmogorov-
Smirnov test is based on the maximum distance between these two
curves:

Dn = sup x∈ℝ |Fn(x) − F*(x)|.

If α is the significance level, n is the sample size, and d(n, α) is the
critical value from the table of the Kolmogorov-Smirnov test, then the
hypothesis F = F* is rejected only if Dn > d(n, α). In the table given by
(Miller, 1956), F* is denoted by F0.

3 RELIABLE SOFTWARE DEVELOPMENT

3.1 Software lifecycle

A software lifecycle covers "the period of time in which the
software is conceived, developed and used", according to IEEE. Other
definitions also exist, for instance: "The phases a software product goes
through between when it is conceived and when it is no longer available
for use. The software life-cycle typically includes the following:
requirements analysis, design, construction, testing (validation),
installation, operation, maintenance, and retirement" (Denis Howe – The
Free On-line Dictionary of Computing, 2010).
According to "GSAM4: Guidelines for Successful Acquisition and
Management of Software-Intensive Systems: Weapon Systems,
Command and Control Systems, Management Information Systems –
Condensed Version" (February), "The overall framework in which
software is conceived, developed, and maintained is known as the
Software Development Life Cycle (SDLC). […] A life cycle model defines
the phases, milestones, deliverables, and evaluation criteria of the
software development process. These form the basis of the work
breakdown structure (WBS), used for project planning and
management." This is a valuable remark for the software reliability
engineer, who is not interested in the technical details of coding, but in
the other phases. The number of phases ranges from three (Design,
Development, and Maintenance) in the case of very simple projects to at
least six for complex software projects. One phase can be composed of
multiple activities.
Frestimate considers the following nine phases:
 Concept,
 Software Requirements,
 Top Level Design,
 Detailed Design,
 Code/Implementation,
 Unit/development Test,
 System/verification/validation Test,
 Operational Test, and
 Operation.

Let us follow a simplified SDLC model consisting of five phases:


Analysis, Design, Implementation, Testing, and Deployment and
maintenance.

Analysis
The analysis phase is the first and the most important one in the
whole process of building high-quality software. During this phase the
problem is analyzed, the requirements are defined, and the
specifications are established for the subsequent phases.
The problem analysis will establish the scope of the project and a
clear understanding of what is required. By interacting with the customer,
the problem will be reformulated in such a way as to be well-defined, and
the answers to the following questions will be known: What is the context
of the problem? What are the parts or elements of the problem? Why does the
user need a software product to solve the problem? Who or what is
affected by this problem?
Once a well-defined problem is given, the next activity consists of
requirements collection and analysis, to establish the product
capabilities and constraints. Both qualitative and quantitative
requirements have to be identified.
Firstly, the requirements address software quality attributes:
 correctness through verification & validation,
 reliability by design,
 usability based on needs and preferences,
 integrity (safety critical software, by cryptographic
techniques),
 efficiency (short time answering using small resources),
 portability (by implementation),
 reusability (by design and implementation),
 interoperability (by usage of standards),
 maintainability,
 flexibility,
 testability/verifiability,
 survivability,
 expandability (by design),
 robustness,
 stability,
 security (by design and implementation),
 safety, and
 availability (by appropriate service architectures).

Software requirements also establish constraints about the schedule and
resources, like:
 computer and input/output equipments,
 staff availability for operation,
 maintenance actions,
 cost for development and operation.
The requirements are not only identified but also analyzed, in order to
estimate costs and benefits and to perform risk analysis. Finally, all results of the
requirements collection and analysis are documented in the project
plan, a reference document during all phases of the project lifecycle.
According to (Keys, 2005), the costs associated with a computer project
include:
 systems analysis and design;
 purchase of hardware;
 software costs;
 training costs;
 installation costs;
 conversion and changeover costs;
 redundancy costs, and
 operating costs, including people costs.
Some examples of risks are:
 customer will change or modify requirements;
 technology will not meet expectations (the advancement in
technology is too fast relative to the length of the software
lifecycle);
 inexperienced project team;
 delivery deadline will be tightened;

 users will not attend training;
 system will be hacked, etc.
The next activity consists of transforming the user-oriented
requirements into software specifications useful to software engineers
to design and develop the software product. According to ANSI/IEEE
standards, "the requirements specification shall clearly and precisely
describe the essential functions, performances, design constraints,
attributes, and external interfaces. Each requirement shall be defined
such that its achievement is capable of being objectively verified…"
(Jones (1978), cited by Pham (2000)).
The functionality of the future software (details in input, output,
process captured by the information system, and interfaces), technical
feasibility (characteristics of people and equipment required by the
project development), and software quality specifications should be
detailed.
Not only specifications on reliability and security, but also those
related to software performance and human factor quality should be
addressed. A technical report will be written, and the specifications related to
the validation process will be documented.
The overall document on specifications should include at least
the following sections:
 "Functional requirements",
 "Performance requirements",
 "Interface requirements",
 "Operational requirements",
 "Verification requirements",
 "Acceptance testing requirements",
 "Portability requirements",
 "Quality requirements",
 "Maintainability requirements", and
 "Other requirements".
The benefits of a good software requirements specification are: a
complete customer-supplier understanding of the future product
characteristics; a reduced project cost; a realistic estimate of cost, schedule,
and risk; a clear understanding of the validation and verification steps;
improved project performability.

Design
The phase of building the software according to the specification
is called design. According to Taylor, design is "the process of
applying various techniques and principles for the purpose of defining a
device, a process, or a system in sufficient detail to permit its physical
realization".
Both architectural (structural) design and detailed (including
algorithms) design activities will contribute to a good design.
The structural design is responsible for system decomposition,
in order to identify, in a top-down approach, the subsystems and their
interfaces. This approach agrees with the current component-based
software development methodology, which decreases the development time.
According to (Gustafson, 2002), the phases of the design process
of software products are:
 data design (to produce the data structures),
 architectural design (produces the structural units or classes),

 interface design (specifies the interfaces between the units),
and
 procedural design (specifies the algorithms of each method).
Alternative (and nowadays most used) methodologies for the
architectural design are based on the object-oriented approach. Object-
oriented systems development follows the same pattern as structured
systems development. Firstly, the system is analyzed (object-oriented
analysis, OOA), then the system is designed using object-oriented
design (OOD). Finally, object-oriented programming (OOP) techniques
and languages (e.g., C++, Java) are used for coding.
The detailed design should provide:
 the software structure (data structures and algorithms or
classes and the diagrams, and structure quality
measurement),
 the software tools (languages, compilers, etc.),
 the verification/validation procedures,
 a test plan (items to be tested and test specifications), and
 the design documentation.
The documented design specifications should cover, at least, the
following aspects (Curtis et al, 2000; Keys 2005):
 an executive summary outlining the principal aspects in the
specification;
 a description of the proposed system and its objectives;
 a full description of all components (module specifications
and structure charts, together with test data);
 descriptions of inputs (including validation tests), outputs
(screen, listings, sound, etc.), and specific interfaces;

 a description of all data storage to include specification of file
and database structure;
 a detailed specification of controls operating over procedures
within the system,
 a specification of all hardware requirements and performance
characteristics to be satisfied,
 a detailed schedule for the implementation of the system, cost
estimates and constraints, and
 standards to be considered for documentation, coding, testing
and validation, and quality assurance.

Implementation
Transforming algorithms and their data structures into computer
programs is called coding or implementation. The design of algorithms
(in the previous phase) is important for producing efficient code in a
suitable programming language.
The implementation phase starts with reusable
module/class/package identification (the component-based approach,
the object-oriented design). It continues with code editing (writing new
code, and modifying reusable code) according to a good coding style.
The next step is dedicated to code inspection which includes: code
reviews (mainly the aspects concerning the program logic and code
readability), code quality (performance related concerns on speed, and
memory used, reliability and security) and the software maintainability
(how easy the code will be maintained). The final step of the
implementation phase is related to code testing
(modules/classes/packages to be tested, testing strategies and methods,
testing schedules, and the resources required for efficient testing).
A good practice is based on the test-driven development
methodology: test-driven development asks developers to create automated
unit tests, containing assertions, before writing the code.
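The test-first practice can be sketched with Python's unittest module (the function under test, failure_intensity, and its behavior are hypothetical): the assertions are specified first, and the implementation is written to make them pass:

```python
import unittest

# Hypothetical function under test; in test-driven development the test
# class below is written first, then this implementation is added until
# all of its assertions pass.
def failure_intensity(failures, hours):
    """Average number of failures per hour over an observation window."""
    if hours <= 0:
        raise ValueError("observation time must be positive")
    return failures / hours

class TestFailureIntensity(unittest.TestCase):
    def test_nominal(self):
        self.assertAlmostEqual(failure_intensity(6, 3.0), 2.0)

    def test_no_failures(self):
        self.assertEqual(failure_intensity(0, 10.0), 0.0)

    def test_invalid_window(self):
        with self.assertRaises(ValueError):
            failure_intensity(1, 0.0)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestFailureIntensity)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())  # True
```

Note how the error-path test (`test_invalid_window`) documents the contract of the function as explicitly as the nominal cases do.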

Testing
Software testing and debugging represent some of the most
important components of the software engineering process (Galen
(2005)), with major concerns on software reliability. Even when the
newest and most powerful automatic testing tools are used and a quality
assurance strategy for defect removal is applied, some uncertainties in
software testing can be identified in the following activities: test
planning, test selection, test execution, test result checking, error
tracing, and quality estimation, as (Madsen et al, 2006) states.
"Software testing methods and techniques vary greatly in variety,
effectiveness, cost, need for automated tool support, and ease of use",
according to Miller. Even though "program testing can be used to show
the presence of bugs, but never to show their absence", as Dahl et al.
(1972) say, every test performed increases intrinsic software quality,
either by revealing no fault or by discovering a latent fault.
The existence of specifications is essential to software testing.
Correctness in software has a meaning only if the program mapping is
the same as the specification mapping. According to (Gustafson, 2002),
"a program without a specification is always correct". Therefore,
software without a specification does what it does and cannot be tested
against any specification, because it violates nothing.

Following Hailpern & Santhanam (2002), given a specification
and a computer program, "any activity that exposes the program
behavior violating a specification can be called testing."
The efforts will be directed to test against the following classes of
faults, according to (Madsen et al, 2006):
 physical/human-made,
 accidental/intentional non-malicious/intentional malicious,
 development/operational,
 internal/external, and
 permanent/temporary.
In this view, not only program running, but also activities like
design reviews, code inspections, and static analysis of source code are
labeled as testing activities, even though they form the so-called "static
testing". According to the mentioned source, "each occurrence of the
program design or the program code that fails to meet a specification is
a defect (bug)."
Following (Madsen et al., 2006), the design of test cases can be
based on knowledge of the code - the white-box method. Also, the
black-box testing approach, addressing the overall functionality of the
software, is used to discover faults like incorrect or missing functions,
errors in any of the communication interfaces, errors in data structure
management (including databases), and defects related to response time,
software start-up or software termination.
According to the mentioned reference, the white-box testing
method consists of the following tasks:

a) the investigation of all independent paths within every
module, based on flow graph analysis (structured
testing using the cyclomatic complexity metric);
b) the examination of all logical predicates (branch/domain
testing);
c) the analysis of all loops to check their limits; and
d) the checking of internal/external data structure validity.
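Task (a) relies on McCabe's cyclomatic complexity, V(G) = E - N + 2P (edges, nodes, connected components of the flow graph), which gives the number of independent paths that structured testing must cover. A minimal sketch, with a hypothetical flow graph of a module containing one if/else and one loop:

```python
def cyclomatic_complexity(edges, nodes, components=1):
    """McCabe's metric V(G) = E - N + 2P for a control-flow graph."""
    return len(edges) - len(nodes) + 2 * components

# Hypothetical flow graph: one decision and one loop.
nodes = ['entry', 'decision', 'then', 'else', 'loop', 'exit']
edges = [('entry', 'decision'), ('decision', 'then'), ('decision', 'else'),
         ('then', 'loop'), ('else', 'loop'),
         ('loop', 'loop'), ('loop', 'exit')]
v = cyclomatic_complexity(edges, nodes)
print(v)  # 7 - 6 + 2 = 3 independent paths to cover
```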

On the other side, the black-box method, based on the
relationships between all modules in the system model, uses the
equivalence partitioning approach, the boundary-value analysis (BVA)
method, and the back-to-back testing procedure. Input conditions are
divided into equivalence classes (logical, ordinal values, range (interval)
or set of values). The BVA approach consists of designing the test cases
to examine the upper and lower limits of each equivalence class
and the corresponding outputs. Back-to-back testing is used only for
many-version software: the versions are tested in parallel to see if the
outputs are identical. Pressman & Ince (2000) provide more insights into
these testing methods.
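The equivalence partitioning and BVA ideas can be sketched as follows; the unit under test and its valid range are hypothetical:

```python
def bva_cases(low, high):
    """Boundary-value analysis for an integer input range [low, high]:
    exercise just below, at, and just above each boundary."""
    return sorted({low - 1, low, low + 1, high - 1, high, high + 1})

# Hypothetical unit under test: an order quantity is valid in 1..100.
def is_valid_quantity(q):
    return 1 <= q <= 100

# Equivalence classes: below the range, inside it, above it.
# The six BVA cases cover all three classes plus both boundaries.
for q in bva_cases(1, 100):
    print(q, is_valid_quantity(q))
```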
Mainly, as (Pham, 2000) says, the testing phase has the following
goals: to affirm the quality of the product by finding and eliminating
faults in the program, to demonstrate the presence of all specified
functionality in the product, and to estimate the operational reliability
of the software.
Unit testing, integration testing, and acceptance testing
are core tasks for obtaining highly reliable software accepted by
customers. The unit test, which belongs to the white-box approach, is the
responsibility of programmers (the software house or an outsourcing

entity). They have to use various approaches depending on the
programming language used during the coding phase. In a test-driven
development approach the developers often use testing frameworks to
create and automatically run sets of test cases. Unit testing frameworks
are available for many programming languages. A short list is
presented in the next table:

Programming language /    Unit testing frameworks (selection only)
Environment
------------------------------------------------------------------
C, C++                    API Sanity Checker (ASC);
                          TPT (for the automated software test and
                          verification of embedded control systems);
                          STRIDE (http://www.stridewiki.com/)
Internet                  YUI Test (Yahoo);
                          EnhancePHP (http://www.enhance-php.com/);
                          SoapUI (web service testing tool for
                          service-oriented architectures);
                          XUnit (XML, http://reflex.gforge.inria.fr/xunit.html)
Java                      JUnit, JUnitEE, Cactus
.NET                      Pex and Moles (Microsoft Research)
Python                    PyUnit
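As an illustration, a minimal test case written for PyUnit (Python's unittest module), one of the frameworks listed in the table; the conversion function is a hypothetical unit under test:

```python
import unittest

# Hypothetical unit under test.
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

# Minimal PyUnit (unittest) test case; in test-driven development such
# cases are written first and run automatically by the framework.
class TestConversion(unittest.TestCase):
    def test_freezing_point(self):
        self.assertEqual(celsius_to_fahrenheit(0), 32)

    def test_boiling_point(self):
        self.assertEqual(celsius_to_fahrenheit(100), 212)

if __name__ == '__main__':
    unittest.main(exit=False)  # discover and run the test suite
```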

The integration test includes subsystem and system test. The
system test exercises all the subsystems interconnected as a whole to verify
functional, performance, and reliability requirements placed on major
design items.
The integration testing step can use the following approaches:
Big Bang (all or most of the developed modules are coupled together),

Bottom Up Testing (where the lowest-level components are tested first,
then used to facilitate the testing of higher-level components), Top
Down Testing (testing is conducted from the main module to the sub-modules;
if a sub-module is not developed, a temporary program called a STUB is
used to simulate the component functionality), and Sandwich Testing
(combining top down testing with bottom up testing).
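The top-down STUB idea can be sketched as follows; the module names and the fixed value the stub returns are hypothetical:

```python
# Top-down integration sketch: the main module is exercised before its
# sub-module exists, with a temporary STUB simulating the sub-module.

def tax_stub(amount):
    """STUB for the not-yet-developed tax sub-module: returns a fixed,
    plausible answer instead of real tax logic."""
    return round(amount * 0.20, 2)

def invoice_total(amount, tax_module=tax_stub):
    """Main (higher-level) module under test; the sub-module is injected
    so the stub can later be replaced by the real implementation."""
    return round(amount + tax_module(amount), 2)

print(invoice_total(100.0))  # 120.0 while the stub stands in
```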
The acceptance test is a validation of the testing phase and makes
use of internal tests (in-house) and field tests (user environment). The
acceptance test is defined by Pham as "the formal testing
conducted to determine whether a software system satisfies its
acceptance criteria and to enable the customer to determine whether
the system is acceptable".

Deployment and maintenance


Putting into operation is the final phase of the software
development process. The operation will continue with or without
maintenance till the end of the software lifecycle (retirement). The main
activities during this phase are:
 release (includes all the operations to prepare a system for
assembly and transfer to the customer site),
 install and activate (starting up the executable component of
software),
 adapt (modify a software system that has been previously
installed) or
 update (replaces an earlier version of all or part of a software
system with a newer release),
 deactivate (before adapting or updating),

 uninstall (before retirement), and
 retire (at the end of the life cycle of a software product).

During maintenance the following approaches can be added: a
version tracking system (to help the user find and install updates to
installed software systems), and a built-in mechanism for installing
updates (corrective maintenance: the modification of a software product
after delivery to correct faults, to improve performance or other
attributes). Other maintenance activities are of the following nature:
adaptive (dealing with changes and adapting to the software
environment), perfective (accommodating user requirements), and
preventive (oriented to increasing the mission time period - in general by
keeping stable data structures, files and databases).

***

Many software development methods complementary to the systems
development life cycle (SDLC) exist. However, the most important
are:
1. Iterative prototyping (compressing the development cycle to shorten
the time to market and providing interim results to the end user), which
consists, mainly, of three steps: 1) listen to the customer; 2) build and revise
a prototype; 3) have the customer test-drive the prototype and then return
to step 1.
2. Rapid application development (RAD) in four phases:
A. Requirements planning — joint requirements planning (JRP),
establishes high-level objectives;

B. Applications development — JAD follows JRP, involves users
in workshops;
C. Systems construction — design specifications, used to
develop and generate code, and
D. Cutover — users trained and the system tested.
3. Extreme programming (XP) is a new programming methodology
adopted a few years ago. According to (Keys, 2005), XP is the application
of a group of practices to software development projects, like:
1) The planning game: collaboration of business and
programming professionals to estimate the time for short
tasks;
2) Small releases: a ship cycle measured in weeks rather than
years;
3) Metaphor: a single overarching metaphor to guide
development substitutes for a formal architectural
specification;
4) Simple design: no provision in the code for future changes or
flexibility;
5) Testing: every piece of code exercised by unit tests when
written, and the full suite of tests when integrated;
6) Refactoring: any piece of code subject to rewriting to make it
simpler;
7) Pair programming: all production code jointly written by two
developers;
8) Collective ownership: the right of any developer to change any
piece of code on the project;
9) Continuous integration: code integrated to the main line every
few hours;

10) On-site customer: a business person dedicated to the project,
and
11) Coding standards: one standard per project.

CASE tools
The term computer-aided software engineering (CASE) is used to
describe a set of tools which are able to automate all or some of various
tasks of the software life cycle, like: requirements capture, tracing, and
tracking; configuration management; model verification; facilitation of
model validation; maintenance of all project-related information;
collection and reporting of project management data, document
production and CASE data import-export.
Mainly, the CASE functions include analysis, design, and
programming. Therefore, a CASE framework provides design editors,
data dictionaries, compilers, debuggers, system building tools, etc.
There is a large variety of CASE products. A CASE product index
is maintained at http://www.unl.csi.cuny.edu/faqs/software-
enginering/tools.html, each product belonging to a category:
 CASE tools (provide support for some specific tasks in the
software process),
 CASE workbenches (supporting only one or few activities), and
 CASE Environments (supporting a large part of the software
process).
The CASE Environments can be simple toolkits (example: the
Unix Programmer's Work Bench), oriented towards a programming
language (IBM Rational ClearCase), integrated (IBM Application
Development Cycle), fourth generation language (4GL) based (Informix

4GL, FourGen CASE Tools: http://www.gillani.com/CASETools.htm),
process-centered environments (Enterprise Architect: Enterprise II),
and CASE applications (covering all aspects of the software development
life cycle).
SourceForge.net provides a web-based platform for creating and
publishing open software. The user can find many tools useful for
software project management. However, software reliability is not
covered, in general, by CASE products. There are specific software tools,
called Computer Aided Software Reliability Engineering, many of them
being dedicated to estimation of software reliability, such as ROBUST
(Reliability of Basic and Ultrareliable Software sysTem), FRestimate,
TERSE, SREPT (Software Reliability Estimation and Prediction Tool),
SRETOOLS (AT&T Software Reliability Engineering Toolkit), SRMP
(Statistical Modeling and Reliability Program), SoRel (Software
Reliability Program), CASRE (Computer-Aided Software Reliability
Estimation Tool), ESTM (Economic Stop Testing Model Tool), SMERFS
(Statistical Modeling and Estimation of Software Reliability Functions),
RGA (Software for Repairable System and Reliability Growth Analysis),
and DACS' GOEL (an automated version of the Goel-Okumoto NHPP
Software Reliability Growth Model), according to the study by
(Chen et al., TENCON06), which presents a powerful computer-aided
reliability assessment tool for software based on object-oriented
analysis and design, called CARATS. Details of the functionalities of some
software reliability tools are given in a later chapter.
From the software reliability engineer's point of view, the following
flowchart illustrates a basic strategy for deploying reliable software.

Fig. 3.1. The algorithm of deploying reliable software

Details on the implementation of the above strategy are given in the
following section and the next chapters.

3.2 Developing operational profiles

An operation is a major system logical task of short duration
which returns control to the system when complete and is unique from
an algorithmic point of view when compared against other operations, as
(Musa 1999) considers. It is important to consider major tasks (those
related to functional requirements), having short duration (a large
number of calls per mission time) and being unique, in order to have a
clear separation into fault classes if the software contains bugs.
According to Vouk, an operational profile is "a set of relative
frequencies (or probabilities) of occurrence of disjoint software
operations during its operational use". Modern software-based
systems may have more than one operational profile. Operational profiles are
important for the selection of the test cases and software development,
and for directing the testing and maintenance efforts to the
most frequently used or risky components.
The most important advantage of defining an operational profile can
be described as follows: "with an operational profile, a system can be tested
more efficiently because testing can focus on the operations most used
in the field. It is a practical approach to ensure that a system is delivered
with a maximized reliability, because the operations most used also
have been tested the most", according to Koziolek.
Following (Musa 1993), the development process of the
operational profile can be described as successively breaking down
system use into five different profiles, such that the development of
an operational profile is preceded by the definition of a customer profile,
a user profile, a system-mode profile, and a functional profile. The first

four profiles (customer, user, system-mode, functional) are on the
design level of a system while the last profile (operational) is on the
implementation level and deals with the actually coded operations of a
system (Koziolek 2005).
The profiles are defined by creating hierarchical lists of
customers, users, modes, functions and operations that the software has
to provide under each set of conditions. As an example, an operational
profile can have a tabular representation, specifying the name for each
operation and its associated probability, or a graphical representation
(with attributes of operations as nodes and the values of the attributes
as branches).
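A tabular operational profile, and the profile-driven selection of test cases it enables, might be sketched as follows (operation names and probabilities are hypothetical):

```python
import random

# Hypothetical tabular operational profile: operation name ->
# occurrence probability (the disjoint operations' probabilities sum to 1).
profile = {
    'place_call':   0.60,
    'receive_call': 0.30,
    'configure':    0.08,
    'recover':      0.02,
}
assert abs(sum(profile.values()) - 1.0) < 1e-9

# Profile-driven test selection: sample operations so that the most
# used operations are tested the most.
random.seed(7)  # reproducible illustration
operations = list(profile)
cases = random.choices(operations, weights=profile.values(), k=1000)
print({op: cases.count(op) for op in operations})
```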

Fig. 3.2. Developing an operational profile

Customer profile
A complete set of customer groups with corresponding
occurrence probabilities makes up the customer profile. Customers are

persons, groups, or institutions that purchase the system. They can, but
need not, be the users of the system at the same time. Customers in a
customer group use the system in the same way. For example,
companies with an equal number of employees may use a telephone
switching system in the same way because they have the same number
of users even though their businesses are different.

User profile
Users are persons, groups, or institutions that use a system. All
members of a user group use the system in the same way. Examples of
user groups are system administrators, maintenance users, regular
users, etc. The user profile can be derived by taking the customer profile
and determining the user groups for each customer group. Hence, the
overall occurrence probabilities for user groups can be obtained by
multiplying the probabilities for each user group of a customer group
with the occurrence probability of that customer group.
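The multiplication rule above can be sketched numerically; the customer groups, user groups, and all probabilities below are hypothetical:

```python
# Hypothetical customer profile and, per customer group, the
# breakdown of its users into user groups.
customer_profile = {'corporate': 0.7, 'residential': 0.3}
users_within = {
    'corporate':   {'administrator': 0.10, 'regular': 0.90},
    'residential': {'administrator': 0.02, 'regular': 0.98},
}

# Overall user-group probability: multiply each within-customer-group
# probability by the occurrence probability of that customer group,
# then sum over customer groups.
user_profile = {}
for cust, p_cust in customer_profile.items():
    for group, p_group in users_within[cust].items():
        user_profile[group] = user_profile.get(group, 0.0) + p_group * p_cust

print(user_profile)  # administrator: 0.7*0.10 + 0.3*0.02 = 0.076
```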

System mode profile


System-modes are sets of functions (design level) or operations
(implementation level) that are grouped for a more convenient analysis
of the execution behavior. More precisely, the system mode profile is a
complete set of system-modes with corresponding occurrence
probabilities.
Following (Koziolek 2005), operation mode (administration
mode versus regular mode), exploitation conditions (overload traffic
versus normal traffic, initialization versus normal operation), criticality
(nuclear power plant controls versus logging functions), user experience
(newbie versus expert), or hardware components (functions executed

on server 1 versus functions executed on server 2) are some examples
of system modes.
The system modes are the consequence of factors like: day
of week or time of day, time of year, traffic levels, system maturity, and
the system's capability.

Functional profile
For (Musa 1993, 1999) a function is a task or part of work of a
system as defined during the design. Functional profiles are usually
designed during the requirement phases or during early design phases.
Before designing a functional profile it is often helpful to construct a
work-flow model capturing the overall processes and the context of the
system (i.e. software, hardware, and people). To create a functional
profile the system modes have to be broken down into single functions.
Therefore, the functional profile is a complete set of functions with
corresponding occurrence probabilities. The functional profile is
independent of the design methodology.

Operational profile
Operations, as opposed to functions, are actually implemented
tasks of a system, while functions are tasks of a system on the design
level. The functions of the functional profile evolve onto operations of
the system. The complete set of operations with corresponding
occurrence probabilities makes up the operational profile, as (Koziolek
2005) defined.
With the occurrence probabilities of the operational profile, test
cases can be selected efficiently because the most used operations will
be tested the most.

To capture various particular situations and difficulties in the
decomposition process to the lowest level the concept of extended
operational profile, consisting of a process profile, a structural profile
and a data profile, was introduced.
The process profile is similar to Musa's operational profile. The
structural profile characterizes the data structures of the application
and its configuration, and includes a description of the software and
hardware environment of the software. The data profile is concerned
with the value assigned to the variables.
Also a Markov Chain Usage Model is available for modeling
sequences of inputs to a software system. The model has two phases: the
structural phase (the state creation and connection of the consecutive
states by arcs), and the statistical phase (assigning probabilities to arcs).
Let E denote the set of all operations and let Yt be the
operation performed by the system at time t; {Yt; t ≥ 0} is a continuous-
time process, called the operational process, and its state space E is the
operational space.

Let π(i) = lim(t→∞) P[Yt = i] (assumed to exist) be the proportion of time
that operation i is performed in the long run. Therefore, the pair (E, π)
can be considered the operational profile. The analysis of the software
failure process depends on the stochastic structure of the operational
process.

Let Xn denote the nth operation that the system performs and Sn
the time at which the nth operation starts. We assume that the
duration of the nth operation is exponentially distributed with rate λ(i)
if this operation is i. Connecting with the process Y, we have Yt = Xn on
Sn ≤ t < Sn+1.

Let A be the generator matrix of Y: A(i,j) = λ(i)(P(i,j) − I(i,j)), where
P(i,j) = P[Xn+1 = j | Xn = i], and I is the identity matrix. Let
σ(i) = lim(n→∞) P[Xn = i] (assumed to exist).

Then, the profile π (resp. the limiting distribution σ of X) can be
obtained as the unique solution of the system of linear equations

    π A = 0, Σ(i∈E) π(i) = 1    (resp. σ P = σ, Σ(i∈E) σ(i) = 1).

Also, for any operation i,

    π(i) = (σ(i)/λ(i)) / Σ(j∈E) (σ(j)/λ(j)).

If the initial operation is i, the expected number of operations
performed until operation j is performed for the first time satisfies the
system of linear equations

    r(i,j) = 1 + Σ(k≠j) P(i,k) r(k,j),

the expected number of operations performed in between two
performances of operation i is r(i,i) = 1/σ(i), and the ratio σ(j)/σ(i) is the
expected number of performances of operation j in between two
performances of operation i.

Moreover, the expected first passage time m(i,j) from operation i
to j satisfies

    m(i,j) = 1/λ(i) + Σ(k≠j) P(i,k) m(k,j),

and the mean recurrence time of operation i is

    m(i,i) = 1/(σ(i) λ(i)).
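As a small numerical sketch of these relations (the transition probabilities and rates below are hypothetical), σ can be found from σP = σ by power iteration, and π then follows from the ratio formula:

```python
# Hypothetical 3-operation usage model: transition matrix P of the
# embedded chain X, and exponential rates lambda(i) = 1/mean duration.
ops = ['a', 'b', 'c']
P = {('a', 'a'): 0.2, ('a', 'b'): 0.5, ('a', 'c'): 0.3,
     ('b', 'a'): 0.6, ('b', 'b'): 0.1, ('b', 'c'): 0.3,
     ('c', 'a'): 0.5, ('c', 'b'): 0.5, ('c', 'c'): 0.0}
rate = {'a': 2.0, 'b': 1.0, 'c': 4.0}

# Statistical phase: sigma solves sigma P = sigma, sum sigma(i) = 1;
# power iteration converges for this irreducible, aperiodic chain.
sigma = {i: 1.0 / len(ops) for i in ops}
for _ in range(1000):
    sigma = {j: sum(sigma[i] * P[(i, j)] for i in ops) for j in ops}

# Time-based profile: pi(i) = (sigma(i)/lambda(i)) / sum_j sigma(j)/lambda(j).
w = {i: sigma[i] / rate[i] for i in ops}
pi = {i: w[i] / sum(w.values()) for i in ops}

# r(i,i) = 1/sigma(i): mean number of operations performed in between
# two performances of operation i.
r_aa = 1.0 / sigma['a']
print(sigma, pi, r_aa)
```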

From a practical point of view, the operational profiles should be
updated on a regular basis, since they can change over time, and the
number of operations that are being tested should be limited, as (Vouk
2000) observed.
Finally, the development of operational profiles as described
above helps to focus resources on the most used and/or most critical
functions and to optimize the testing phase by making the tests highly
representative of the operational field.

3.3 Software failures and failure processes

3.3.1 Faults

Faults can be introduced in every phase of the software process,
and they are propagated from one phase to another.
If introduced during the requirements specification, most of them
will result in failures, being improperly interpreted by designers,
implementers, and verification and validation teams.
Some bad practices in requirements specification are: making
bad assumptions, missing requirements, using incorrect terms, using
incorrect sentence structure or bad grammar (or supporting the ambiguity
of natural language), describing operations instead of identifying
requirements, describing implementations instead of requirements, as
well as over-specifying.
Requirement reviewing is a good practice to identify faults
introduced during requirements specification. However, this activity is
very expensive, due to the large number of people that read and check the
requirements document. Two approaches are possible: the usage of a
formal language to specify requirements, and the identification of
potential problems by analysis of requirements. A tool useful for analyzing
requirements is QuARS (Lami 2005: http://www.sei.cmu.edu/reports/
05tr014.pdf).
Faults can also be introduced during the design. A fault introduced
during the design may spread, i.e. result in a number of defects during the
coding phase. The Pentium's incorrect floating-point division design and
bugs in software causing infinite loops are some examples.

Many faults are introduced during the implementation (coding)
phase, depending on the programmer's experience. Static analyzers can
be used, but important limitations exist.
If software is modified during the maintenance phase, faults can be
introduced by re-engineering (new requirements, design, etc.).
Depending on the behavior, the faults can be classified as
transient (appear for a short period of time and then disappear),
permanent (software design defects), and intermittent (appear and
disappear from time to time).
To help software engineers, in 1990 IBM invented the ODC
(Orthogonal Defect Classification) approach - a technology for software
process measurement and analysis, providing definitions for defect
classification as well as analysis metrics and procedures for process
evaluation. A defect will belong to those classes that collectively point
to the part of the process which needs attention.
The defect type is a key concept in ODC. Other attributes used by
ODC are: defect trigger (to provide measurement instruments of the
causal relationship of software defects), and defect impact (to measure
the impact of the defect in terms of the effect of failure on the customer).
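The three ODC attributes can be pictured as fields of a defect record; a sketch, with hypothetical field values:

```python
from dataclasses import dataclass

# Sketch of an ODC-style defect record; the values below are
# hypothetical, the attributes follow the text (type, trigger, impact).
@dataclass
class Defect:
    defect_type: str  # e.g. 'assignment', 'interface', 'algorithm'
    trigger: str      # what exposed the defect, e.g. 'Test Variation'
    impact: str       # customer-perceived effect, e.g. 'Reliability'

found = [
    Defect('assignment', 'Simple Path Coverage', 'Reliability'),
    Defect('interface', 'Test Interaction', 'Capability'),
    Defect('assignment', 'Test Variation', 'Reliability'),
]

# ODC analysis works on distributions of these attributes, e.g. counting
# defects per type to see which part of the process needs attention.
by_type = {}
for d in found:
    by_type[d.defect_type] = by_type.get(d.defect_type, 0) + 1
print(by_type)  # {'assignment': 2, 'interface': 1}
```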
Following (Chillarege et al. 1992), two assumptions are required:
"there exists a semantic classification of defects, from a product, such
that the defect classes can be related to the process which can explain
the progress of the product through this process" (the necessary
condition), and "the set of all values of defect attributes must form a
spanning set over the process sub-space" (the sufficient condition).
(Churamani) reports a list of 31 defect types along the software
development process:

 requirements analysis (3) = {requirements not captured,
requirements captured incorrectly, violation of requirement
analysis guidelines or templates},
 design (20) = {incomplete screen design, incorrect screen
design, incomplete database design, incomplete report layout,
incorrect report layout, incomplete interface definition,
incorrect interface definition, incomplete security design,
incorrect security design, incorrect functionality, violation of
design guidelines or templates, incorrect/not optimized SQL,
incorrect logic in design, incomplete Unit Test Cases, incorrect
Unit Test Cases, incomplete Integration Test Cases, incorrect
Integration Test Cases, incomplete Functional Test Cases,
incorrect Functional Test Cases},
 implementation (5) = {code missing, incorrect logic in code,
exception handling error, navigation errors, violation of
coding guidelines or templates}, and
 documentation (3) = {incomplete documentation, incorrect
documentation, violation of documentation guidelines or
templates}.
The defect types should be chosen so as to be general enough to
apply to any software development, independent of a specific product,
according to (Chillarege, Lyu). The classification should be applied to a
defect found in any phase, and the defect types should span the space of
phases to satisfy the sufficient condition. A defect type is important also
for the programmer making changes for a defect. Some types of defects
were also described by (Chillarege, Lyu) in order to help programmers:

1) A function defect is one that affects "significant capability,
end-user features, product application programming
interface, interface with hardware architecture, or global
structure";
2) An assignment defect "indicates a few lines of code, such as
the initialization of control blocks or data structure";
3) An operand defect is one that affects the value of the operand
after a type cast (conversion) (see the case of the Ariane 5
project);
4) An interface defect "corresponds to errors in interacting with
other components, modules, device drivers via macros, call
statements, control blocks, or parameter lists";
5) A checking defect addresses program logic with improper
validation of conditions, loop conditions, etc.;
6) Timing/serialization defects are the result of a timing error
between systems/subsystems, modules, software/hardware,
or are the result of the omission or incorrect use of
serialization for controlling access to a shared resource;
7) An algorithm defect is the result of efficiency or correctness
problems that affect the task and can be fixed by
(re)implementing an algorithm or local data structure;
8) A Build/Package/Merge defect is one encountered during
the system build process, resulting from problems with the
library systems, with management of change, or with version
control;
9) A Documentation defect is the result of an error in the
written description contained in user guides, installation
manuals, prologues, and code comments.

ODC uses three sets of defect triggers: Review and Inspection
Triggers, Unit and Function Test Triggers, and System and Field Test
Triggers. Some triggers belong to both the set of Field Test Triggers and
the set of Inspection Triggers or Function Test Triggers.
The set of Review and Inspection Triggers contains:
 DC-Design Conformance (the fault was detected while
comparing the work product being inspected with the
corresponding specification from the prior development
phase(s)),
 OS-Operational Semantics (the fault was detected while
considering the logic flow needed to implement the function
under review),
 C-Concurrency (The error was detected while considering
the synchronization needed between tasks to control a shared
resource),
 BC-Backward Compatibility (the fault was detected while
considering the compatibility between the function described
by the work product under review and that of prior versions
of the same product),
 LC-Lateral Compatibility (the fault was detected while
considering the compatibility between the function described
by the work product under review and other
systems/subsystems or functions that it must interface with),
 RS-Rare Situation (the fault was detected while considering
some abnormal system behavior that is not specified by the
requirements for the function under review),

 SE-Side Effects (the fault was detected while considering
some system, product, function, or behavior that may occur
that is beyond the scope of the work product under review),
 DC-Document Consistency/Completeness (Internal Document;
the fault was detected while reviewing the work
product for consistency/completeness and conformance to
documentation standards),
 LD-Language Dependencies (the fault was detected while
reviewing the work product for implementation language
specific details).

The set of triggers during Unit and Function Test activities is
composed of:
 Simple Path Coverage (the fault was detected by using
White/Gray Box testing to execute simple code paths related
to a single function),
 Combinatorial Path Coverage (the fault was detected by
using White/Gray Box testing to execute combinations of
code paths related to multiple functions; the test case that
found the defect was executing combinations of code paths),
 Test Coverage (the fault was detected by using Black Box
testing to exercise a single function with either no or a single
set of input parameters),
 Test Variation (the fault was detected by using Black Box
testing to exercise a single function using multiple sets of
input parameters, such as illegal values, boundary conditions,
and various combinations of parameters),

 Test Sequencing (the fault was detected by using Black Box
testing to execute multiple functions in a very specific
sequence), and
 Test Interaction (the fault was detected by using Black Box
testing to execute multiple functions in combination; the
interaction involves more than simple sequencing. This
trigger applies only if the functions operate correctly when
tested independently but fail when executed together).

The set of triggers during System and Field Test activities
considers the following triggers:
 Workload Volume/Stress (the fault was detected while
operating the system at or near some resource limit, either
upper or lower),
 Normal Mode (the fault was encountered under normal
operating conditions, without exercising any particular test
suite and with the system operating well within resource
constraints; this trigger should only be used when system test
scenarios cannot be executed because of basic problems that
block their execution),
 Recovery/Exception (the fault was detected as a result of
executing an exception handler or recovery code. The error
would not have been discovered if some earlier event had not
caused error handling or recovery processing to be invoked.
Note, this trigger is selected only when the error is in the
system's ability to recover from a failure, not the failure itself),

 Startup/Restart (the fault was detected while the
system/subsystem was being initialized or restarted
following an earlier shutdown or complete system or
subsystem failure),
 Hardware Configuration (the fault was detected as a result
of testing a particular hardware configuration), and
 Software Configuration (the fault was detected as a result of
testing a particular software configuration).

IBM defines the following defect impact types:

 Capability (the ability of the product/system to perform its
intended functions and satisfy the customer's functional
requirements),
 Usability (the ease with which the product/system can be
easily understood and utilized by the customer to perform its
intended function),
 Performance (the speed and responsiveness of the
product/system as perceived by the customer),
 Reliability (the ability of the product/system to consistently
perform its intended functions without unplanned
interruption),
 Install-ability (the ease with which the product/system can
be easily prepared and placed into service),
 Maintainability (the ease with which a failure can be
diagnosed and the product/system can be upgraded to apply

corrective fixes without impacting the customer's data and
operations),
 Documentation (the degree to which the product/system
documentation and user's manuals are correct and aid in the
customer's understanding and use of the product/system),
 Migration (the ease and degree to which the product/system
can be upgraded to the newer release without impacting the
customer's data and/or operations),
 Standards (the degree to which the product/system
conforms to established pertinent standards), and
 Integrity/Security (the degree to which the product/system
is protected from inadvertent or malicious destruction,
modification, or disclosure).

For complex distributed software, the software engineer can encounter some unusual software bugs that are more difficult to understand and repair than ordinary software bugs, according to (Madsen et al 2006).
There are several kinds, mostly named after scientists who
discovered counterintuitive things: heisenbugs, mandelbugs,
schroedinbugs, and agingbugs. Easily repaired bugs are classified as bohrbugs. A computer bug that disappears or alters its characteristics
when it is researched is called a heisenbug. The name comes from the
term "Heisenberg Uncertainty principle", which is popularly believed
to refer to the way observers affect the observed systems in quantum
mechanics. The first reason for this behavior is that in debug mode some
memory is cleared before starting and some variables are forced onto

stack locations, instead of keeping them in registers. The second reason
is that debuggers commonly provide watches or other user interfaces
that cause code to be executed, which, in general, change the state of the
program. A bohrbug (named after the Bohr atom model) is a bug that, in
contrast with heisenbugs, does not disappear or alter its characteristics
when it is researched. The bohrbugs are easily identified using a test and
debug approach during coding process, or a unit testing done by a
programmer testing, as Rainsberger & Stirling (2005) propose. A special
case of heisenbug is the mandelbug (named after fractal innovator Benoit
Mandelbrot), which is a computer bug whose causes are so complex that
its behavior appears chaotic. A schroedinbug is a bug that manifests itself
apparently only after the software is used in an unusual way.
In C/C++, some heisenbugs appear from uninitialized auto
variables, fandango on core phenomena (corruption of the heap memory
under programmer management) or errors in stack management.
Bugs that depend on the environment are called agingbugs and are caused by: deterioration in the availability of operating system resources, data corruption, and numerical error accumulation. Gradual service degradation and crash failures are some
effects of these sorts of bugs. Reactive approaches (restart, rollback, roll-forward, progressive retry, occasional reboot) and proactive approaches such as software rejuvenation, introduced by Huang et al. (1995), are popular when dealing with agingbugs. Software rejuvenation is
primarily indicated for servers where applications are intended to run
indefinitely without failure as shown by Shankland (2000). Other
approaches in software fault tolerance can be found in EventHelix
(2000).

3.3.2 Software failures

As presented in section 1.2, software failures are generated by software faults during run-time. Faults represent problems that developers can see, while failures represent problems that the users (human or computer) experience. It is important to understand that:
1) Not every fault corresponds to a failure since the conditions
under which fault(s) conduct to a failure may never be
satisfied.
2) Failures appear only during run-time (software operation).

The chain is the following: the designer or programmer makes a mistake (an error), which generates a software fault (the software has a
defect). This fault could generate, or not, a failure depending on the
software exploitation. Failure is a user-oriented concept, but fault is a
developer-oriented concept, as (Musa 1999) appreciates. If a failure is
experienced during the software testing or run-time, the process of fault
identification and error elimination is called debugging.
The following attributes of quality can be incorporated in and
measured by software reliability (Musa, 1999): functionality, quality of
failure messages, user friendliness in operation, hardware fault
tolerance or recoverability, performance, software fault tolerance or
recoverability, and tolerance to user faults.
A failure severity class, for software reliability engineers, is a set of failures that have the same per-failure impact on users. How is the impact measured? The most important measure considers the cost impact (extra operational cost, repair and recovery cost, and loss of business).

In general, classes are separated by factors of 10, as (Musa, 1999)
proposed. Another measure is based on the system capability impact:
The class 4 can cover those failures generating minor deficiencies in one
or more operations, the class 3 can cover failures which make
unavailable to users one or more minor operations, the class 2 can
include those failures generating the unavailability of one or more important operations, and class 1 will cover those that make one or more key operations unavailable to users. The number of classes
is only an example. The scale can be improved taking into consideration
more details on system capability impact.
Another important term for software reliability engineers is the
failure intensity (failures per unit time). The unit time can be the
operating time (software age) or the amount of processing done.
In order to define the necessary reliability it is necessary to
define failure with severity classes, to choose a common measure and
set a failure intensity objective according to the following steps (Musa,
1999):
1) Establish known component values;
2) Pick a trial allocation of component failure intensity
objectives to minimize system development time,
development risk, or development cost;
3) Compute a trial system failure intensity from trial component
failure intensities and compare this to the requirement;
4) Modify the allocation and recompute the trial system failure intensity until it matches the requirement.

Not only failures and their impact are relevant for software
reliability engineering, but also the failure mode in order to evaluate
effects and eliminate the generative causes.
In general, a failure mode and effects analysis (FMEA) can be
defined as a systematic way of identifying failure modes of a system,
item or function, and evaluating the effects of the failure modes on the
higher level, according to (Pentti & Atte, 2002). The objective of FMEA
is to determine the causes for the failure modes and decisions to be
taken to eliminate or reduce the chance of failure.
For all kinds of systems, a failure mode is the manner by which a
failure is observed, and describes the way the failure occurs and its
impact on equipment operation. The failure effect is represented by the
consequence(s) a failure mode has on the operation, function, or status
of an item. Failure effects are classified as local effect, higher level, and
end effect.
In software engineering, the failure modes of software are
generally unknown. Software modules do not fail; they only display incorrect results. The definition of failure modes is one of the
hardest parts of the FMEA of a software-based system. To discover this
incorrect behavior the software engineer has to apply his own
knowledge about the software to set up an appropriate FMEA approach.
The failure mode and effects analysis for hardware or software
has certain distinguishing characteristics, as Ristord et al. (2001)
describes. The hardware FMEA may be performed at functional level or
part level; it applies to a system considered as free from failed
components; it postulates failures of hardware components according to
failure modes due to ageing, wearing or stress; it analyses the consequences of these failures at system level, and states the criticality

and the measures taken to prevent or mitigate the consequences. When considering software, the FMEA (let us denote it by SWFMEA) is used only at functional level. Moreover, SWFMEA applies to a system considered as containing software faults which may lead to failure under triggering conditions. Also, it postulates failures of software components according to functional failure modes due to potential software faults, and analyses the consequences of these failures at system level.
SWFMEA states the criticality and describes the measures taken to
prevent or mitigate the consequences.
There are many reports of successful application of SWFMEA, as identified by (Pentti & Atte, 2002). For (Goddard 2000), SWFMEA
validates that the software has been constructed to achieve the specified
safety requirements, while (Maier 1997) described the use of FMEA
during the development of robot control system software for a fusion
reactor, to mention only a few of them.
(Reifer 1979) gives the following general list of failure modes
based on the analysis of three large software projects: computational,
logic, data I/O, data handling, interface, data definition, database, and
other.
(Ristord et al. 2001) give a list of five general-purpose failure
modes at processing unit level: the operating system stops, the program
stops with a clear message, the program stops without clear message,
the program runs, producing obviously wrong results, and the program
runs, producing apparently correct but in fact wrong results.
For each input and each output of the software component, (Lutz et al. 1999) consider the following four failure modes:
• missing data (e.g., lost message, data loss due to hardware failure),
• incorrect data (e.g., inaccurate data),
• timing of data (e.g., obsolete data, data arrives too soon for processing), and
• extra data (e.g., data redundancy, data overflow).
For each event (step in processing), on the other hand, they consider the following four failure modes:
• halt/abnormal termination (e.g., hung or deadlocked at this point),
• omitted event (e.g., event does not take place, but execution continues),
• incorrect logic (e.g., preconditions are inaccurate; event does not implement intent), and
• timing/order (e.g., event occurs in wrong order; event occurs too early/late).
To close this section, we note that reliability is a necessary, but not sufficient, condition for software acceptance by the customer. However, establishing a reliability objective under various constraints is good practice and is requested by the Design by Contract development paradigm.
Also, FMEA is well understood at the systems and hardware
levels, where the potential failure modes usually are known and the task
is to analyze their effects on system behavior. A common opinion is that
FMEA is applicable to software-based systems but only to a limited
extent.

3.3.3 Failure processes

There are unlimited possibilities to describe a failure process mathematically. The most used forms of failure processes are stochastic in nature, being modeled as random processes. The time used by models is a cumulative time, the origin being the starting time of the system test. The model specification asks for a function of time such as the failure intensity or the expected number of failures. Failure models will be described starting with Chapter 6 in the framework of software reliability models. Some simple examples are presented in the following.
Example 1. Consider a software which is periodically scheduled
(a real-time thread), and p (a constant) is the probability of software
failure per input.
The number Sn of failures after n inputs is given by the binomial
distribution:
P(Sn = k) = (n choose k) p^k (1 - p)^(n-k).

This is a binomial process. System failures occur for all Sn > 0. Therefore Psys(n) = P(Sn > 0) = 1 - P(Sn = 0) = 1 - (1 - p)^n. If n = Kt, where K is the number of inputs per unit time, the system failure probability at time t is P(t) = 1 - (1 - p)^(Kt), which can be approximated by an exponential: P(t) = 1 - e^(-Ktp).
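The binomial process of Example 1 can be checked numerically; in this sketch the values of p and K are hypothetical, chosen only to show how close the exponential approximation is:

```python
import math

def failure_prob_exact(p, K, t):
    """P(t) = 1 - (1 - p)^(K t): probability of at least one failure by time t."""
    return 1 - (1 - p) ** (K * t)

def failure_prob_approx(p, K, t):
    """Exponential approximation P(t) = 1 - e^(-K t p), good for small p."""
    return 1 - math.exp(-K * t * p)

# Hypothetical values: p = 1e-4 failures per input, K = 100 inputs per hour,
# observed over t = 10 hours.
exact = failure_prob_exact(1e-4, 100, 10)
approx = failure_prob_approx(1e-4, 100, 10)
# For small p the two estimates agree to several decimal places.
```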
Example 2. The following model, proposed by (Tohma et al.
1991) and described by (Pham 2000) is based on the hypergeometric
distribution function.

Assume: 1) A program initially contains m faults when the test phase starts; 2) A test is defined as a number of test instances (couples of input data and output data) denoted by ti, i = 1, 2, …, n; 3) Detected faults are not removed between test instances.
The model assumes two categories of faults: newly discovered faults, and rediscovered faults.
Notations: Wi – the number of active faults during test instance ti;
Ci-1 is the cumulative number of errors already detected so far by t1, t2, …,
ti-1; Ni – the number of newly detected errors by time ti; ni – an observed
instance of Ni.
Assuming a hypergeometric distribution for the newly detected
faults Ni, the probability of obtaining exactly ni newly detected faults
among Wi faults is:

P(Ni = ni) = (m - Ci-1 choose ni) (Ci-1 choose Wi - ni) / (m choose Wi),

where (a choose b) denotes the binomial coefficient,

Ci-1 = n1 + n2 + … + ni-1, with C0 = 0, n0 = 0,

and

max{0, Wi - Ci-1} ≤ ni ≤ min{Wi, m - Ci-1}, for all i.
Therefore, the expected number of newly detected faults during the interval [ti-1, ti] is

E(Ni) = (m - Ci-1) Wi / m,

and the expected value of Ci is

E(Ci) = m [1 - (1 - W1/m)(1 - W2/m) … (1 - Wi/m)].

Fig. 3.3. Faults/Failures Monitoring

3.4 Factors that affect Software Reliability

It is clear that software reliability is affected by various factors. However, the discovery of such factors is not an easy task. (Zhang & Pham, 1999) conducted an empirical study to identify the factors affecting software reliability. The mentioned authors used a survey
instrument that focuses on the perspective of the primary participants,
managers, system engineers, programmers, testers and other people
involved in software research or development teams, from 13 software
companies. The survey form and results are presented also in (Pham
2000), where these factors are called environmental factors. The subject
is very important not only for software reliability engineers, but also for
developers, managers etc. Definitions for the most important
environmental factors are given below:
1. If k is the amount of programming effort (man-years), and t is the amount of programming time (years), then PDIF (the Difficulty of Programming) is defined as follows: PDIF = k/t².
2. Program complexity (PC) is most often defined by program size (kilolines of code – KLOC), which performs significantly better when compared against McCabe's cyclomatic complexity metric (V(G) = e - n + 2, where e is the number of edges in the control graph, n is the number of vertices, and the digraph G is based on the control-flow representation of the program) or Halstead's volume.
3. Programmer's skill (PS) is the average number of years of programming experience, PS = (Y1 + Y2 + … + Yn)/n, where n is the total number of programmers involved in the project, and Yi is the number of years of experience of programmer i, i = 1, 2, …, n.
4. Percentage of Reused Code (PORC) is defined taking into
account KLOC for the existing modules (denoted by S0), and KLOC for
new modules (denoted by SN):

PORC = S0 / (S0 + SN).

5. Programmer Organization (PO) is defined as the percentage of high-quality programmers and is computed as follows:

PO = nh / n,

where nh is the number of programmers whose programming experience is more than six years, and n is the total number of programmers.
6. Testing Coverage (TCVG) is defined as the percentage of the source code which is covered by the test cases:

TCVG = Sc / St,

where Sc = KLOC covered by testing, and St = total KLOC.

7. Levels of Programming Technologies (LPT) is defined as the sum of rating scores of every category:

LPT = R1 + R2 + … + Rn,

where n is the number of categories, and Ri is the rating score of category i.
8. Percentage of Reused and Modified Modules (PRMM) can be computed by:

PRMM = LOCm / LOCn,

where LOCm is the number of lines of code for modified modules, and LOCn is the number of lines of code for new modules.
9. Testing Effort (TE) is defined as the number of testing cases
generated, testing expenditure, or the number of human years that the
testing takes.
10. Testing Environment (TENV) is defined as the degree of the
compatibility of testing and operational environments.
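Several of these factor definitions are directly computable. A minimal sketch (the function and variable names are ours, not from the survey):

```python
def pdif(k, t):
    """Difficulty of programming: k man-years of effort over t years, k/t^2."""
    return k / t ** 2

def ps(years_of_experience):
    """Programmer's skill: mean years of experience over the team."""
    return sum(years_of_experience) / len(years_of_experience)

def porc(s0, sn):
    """Percentage of reused code: existing KLOC over total KLOC."""
    return s0 / (s0 + sn)

def tcvg(sc, st):
    """Testing coverage: KLOC covered by testing over total KLOC."""
    return sc / st
```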

Many other factors were considered during the investigation. The following five categories of factors were considered by (Pham 2000) – the factors are mentioned in parentheses:
• General (program complexity; program categories like database, operating system, web application etc.; difficulty of programming; amount of programming effort; level of programming technologies; percentage of reused modules; programming language like: assembler, high level, Object Oriented etc.),
• Analysis and Design (frequency of change of program specification; volume of program design documents; design methodology; requirements analysis; relationship of detailed design to requirements; work standards; development management),
• Coding (programmer skill; programmer organization; development team size; program workload; domain knowledge; human nature like mistake and work omission),
• Testing (testing environment; testing effort; testing resource allocation; testing methodologies; testing coverage; testing tools; documentation), and
• Hardware systems (processors; storage devices; input/output devices; telecommunication devices; system software).

4 SOFTWARE TESTING FOR CERTIFICATION AND
RELIABILITY GROWTH

4.1 Preparing for test

Software testing is the process of executing a program or system with the intent of finding errors, as Meyers (1979) appreciates. However, the efficiency of testing is correlated with the quality of test cases and test procedures.
A basic concept during software testing is the run. A run is an
instance of an operation being characterized by the operation itself, the
input state (or the set of input variables and their values) and the
expected behavior (the output state). An input variable can be direct
(arguments, menu selection, etc) or indirect (affecting the software
behavior, but not directly observable). Any run is based on a test case,
but a test case can be used in different operation modes and may generate different behavior; that means a test case can generate multiple runs.
A test procedure is "a controller that sets up environmental conditions and that randomly invokes test cases at random times", according to Musa (1999). Before describing the procedure for
preparing test cases and preparing test procedures, let us present the
characteristics of some types of practices and tests useful from a
software reliability engineering perspective.

Alpha testing is done at the end of development on a virtual user
environment created according to the user requirements.
Beta testing is done by end-users or others (third parties), and
represents the final testing before releasing the application.
Acceptance testing is done to verify if system meets the customer
specified requirements.
Load testing is a performance testing to check system behavior
under load.
Another software testing concept used by software reliability engineers is regression testing, which asks for testing the application as a whole after the modification of any module or functionality.
The acceptance testing is based both on simple and complex
tests. A simple test consists of single execution of operations, with
minimized interactions between the operations. Complex tests are used
for system testing and integration testing.
Preparing for test assumes recording the user and other interface interactions and environmental variables necessary to initiate the runs. Run categories can be established for careful test selection and improved testing.
According to Musa (1999), the process for preparing test cases
needs:
1) to estimate the number of new test cases required for the
current release;
2) to allocate the number of test cases among the (sub)systems
to be tested;
3) to allocate the number of new test cases for each system
among its new operations;
4) to specify the new test cases;

5) to add the new test cases to the pool of test cases from
previous releases.
To estimate n1 - the number of new test cases under constraints
of time and cost is a challenging task. Let t1 be the number of test cases
that can be prepared in the specified time, and t2 be the number of test
cases that can be prepared under a specified cost. Then n1 = min (t1, t2).
The number t1 depends proportionally on the available time and the
available staff. The available time can be considered as the interval
between requirements establishing (the end of), and the start of test.
The number t2 depends on the budget allocated for test case
preparation. In general, the number of new test cases is greater than the
number of new operations (at least one test case is necessary for each
operation). Sanity testing (if a new software version is performing well
enough to accept it for a major testing effort) is necessary before
releasing the new software version.
To establish n2(k) – the number of new test cases allocated to the (sub)system k – it is necessary to give particular weight to the different operations that have high occurrence probabilities, as Musa (1999) recommended. At least two subsystems can be considered: operating system – n2(1), and application – n2(2). Therefore, n1 = n2(1) + n2(2).
Having n2(k) and o(k) the number of new operations of the
subsystem k, the allocation can be realized in four steps (Musa, 1999):
1) starting with the operational profile obtain the occurrence
probability for each operation;
2) identify the rarely occurring critical (with great impact in
case of failure) new operations and determine the number of
test cases to be assigned;

3) determine the allocation probabilities for the other new
operations and pre-assign one test case to each infrequent
other new operation;
4) assign the remaining test cases to the remaining other new
operations in accordance with the allocation probabilities.
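The four allocation steps can be sketched as follows; the rounding policy and the one-case minimum are our simplifications of steps 3 and 4:

```python
def allocate_test_cases(total, occurrence_probs, critical=0):
    """Pre-assign `critical` cases to rarely occurring critical operations
    (step 2), then split the remainder over the other new operations in
    proportion to their occurrence probabilities (steps 3-4), giving each
    operation at least one test case."""
    remaining = total - critical
    scale = sum(occurrence_probs)
    return [max(1, round(remaining * p / scale)) for p in occurrence_probs]

# Hypothetical: 100 test cases, 4 pre-assigned to critical operations,
# three other new operations with the given occurrence probabilities.
alloc = allocate_test_cases(100, [0.5, 0.3, 0.2], critical=4)
```

Because of rounding, a real allocation would reconcile the final counts so they sum exactly to the remaining budget.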
Test case specification assumes equal probability of test case selection among all the choices defined by the possible combinations of subsets (called levels by Musa (1999)) of the input domains of the input variables defining the operation.
A test procedure is required for each operational mode. Test procedures are executable. A test procedure is essentially a group of small test cases, at most 10.
A test case can be recorded (in a database) taking into account the following fields:
• test case id,
• unit to test,
• assumptions,
• test data,
• steps to be executed,
• expected result,
• actual result,
• pass/fail,
• comments.
A basic format of a test case statement is:
Verify [action to do]
using [tool name, dialog, … ]
with [conditions] to [demonstrate, show, …].
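The record fields and the statement format above map naturally onto a small data structure; a sketch (field names paraphrase the list rather than follow any standard schema):

```python
from dataclasses import dataclass

@dataclass
class TestCaseRecord:
    test_case_id: str
    unit_to_test: str
    assumptions: str
    test_data: dict
    steps: list
    expected_result: str
    actual_result: str = ""
    passed: bool = False
    comments: str = ""

    def statement(self):
        """Render the basic 'Verify ... with ... to ...' statement form."""
        return (f"Verify {self.unit_to_test} with {self.assumptions} "
                f"to {self.expected_result}")

# Hypothetical record for a login operation.
tc = TestCaseRecord("TC-001", "login dialog", "user database reachable",
                    {"user": "alice"}, ["open dialog", "submit credentials"],
                    "demonstrate successful authentication")
```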

4.2 Executing test

As shown before, a complete testing strategy consists of testing at various points in the production process according to a test plan, documented according to ANSI-IEEE 829/1983 – for instance – and covering the following items, structured by component, description, and purpose:
1. (Responsibilities; Specific people who are and their assignments;
Assigns responsibilities and keeps everyone on track and
focused)
2. (Assumptions; Code and systems status and availability; Avoids
misunderstandings about schedules)
3. (Test; Testing scope, schedule, duration, and prioritization;
Outlines the entire process and maps specific tests)
4. (Communication; Communications plan – who, what, when, how;
Everyone knows what they need to know when they need to know
it)
5. (Risk Analysis; Critical items that will be tested; Provides focus by
identifying areas that are critical for success)
6. (Defect Reporting; How defects will be logged and documented;
Tells how to document a defect so that it can be reproduced,
fixed, and retested)
7. (Environment; The technical environment, data, work area, and
interfaces used in testing; Reduces or eliminates
misunderstandings and sources of potential delay).

Any testing approach, including software reliability engineering
testing, makes use of test cases, test scripts, test scenarios and test runs.
A test case is a document that defines a test item and specifies a
set of test inputs or data, execution conditions, and expected results. A
test case is generally executed manually but many test cases can be
combining for automated execution (by a test execution engine).
A test script is a step-by-step procedure for using a test case to
test a specific unit of code, function, or capability.
A test scenario is a chronological record of the details of the
execution of a test script.
A test run represents a series of logically related groups of test
cases or conditions. The homogeneity of a set of runs is the proportion
that shares the same failure behavior. The sets of runs with
homogeneity close to 1 are called run categories. If one run in the run
category executes successfully, there is a high probability that all runs
will be successful. Hence, establishing run categories can improve the
testing efficiency, because testing must be as short as possible in order
to keep costs under control.
According to Musa (1999), executing test requires the following activities: testing time allocation, test invocation, and failure identification.
The test time is measured, in general, in hours and is allocated
taking into account:
1) the systems (components) to be tested;
2) the type of test (from feature, regression, and load test);
3) the operational modes.

Musa (1999) recommends the following rules.
R1. In the case of reliability growth test, then
- The allocation among systems is based on the
estimated risk;
- Same proportion of test time will be allocated for
every new test case;
- Allow enough time for regression test;
- Allocate the remaining time for load test.
R2. When only certification test is planned, the total time will
be allocated to load test. The load test should be divided
among the operational modes in the same proportions of the
exploitation in the field.
R3. There are no guidelines for determining the total amount
of feature and load test time. Only the principal variables are
known, but not the model parameters.

The minimum amount of certification test required to achieve the desired level of confidence in a level of failure intensity can be estimated by

tCT = TN / λF,

where tCT is the amount of testing time in natural or clock time units, λF is the failure intensity objective (in the same units), and TN is a normalized measure given by

TN = ln(β / (1 - α)) / (1 - γ),

with α – the producer risk, β – the consumer risk, and γ – the discrimination ratio.
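A sketch of this estimate, reading the ratio as tCT = TN/λF (TN is dimensionless, so dividing by the failure intensity objective yields a time); the example risks, discrimination ratio, and objective are hypothetical:

```python
import math

def normalized_measure(alpha, beta, gamma):
    """T_N = ln(beta / (1 - alpha)) / (1 - gamma)."""
    return math.log(beta / (1 - alpha)) / (1 - gamma)

def certification_test_time(lambda_f, alpha, beta, gamma):
    """Minimum certification test time t_CT = T_N / lambda_F,
    in the same time units as the failure intensity objective lambda_F."""
    return normalized_measure(alpha, beta, gamma) / lambda_f

# Hypothetical: 10% producer and consumer risk, discrimination ratio 2,
# failure intensity objective 0.001 failures per hour.
t_ct = certification_test_time(0.001, alpha=0.1, beta=0.1, gamma=2)
# About 2200 hours of failure-free load test are needed in this setting.
```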
The SRE test starts only after the units of the systems have been tested and their integration verified. Test invocation follows the operational mode invocation.
The number of test cases invoked depends on the time the
operational mode runs and its occurrence rate. From reliability point of
view, the testing driven by an operational profile is efficient. It identifies
failures, on average, according to their occurrence pattern.
Also, in load test scripts, dynamic data can be used to reproduce
more complicated and time-sensitive bugs.
To identify failures, the following tasks are necessary:
1) analyze the test output for deviations from expected behavior
(by manual inspection, built-in audit code usage, or using
external checkers);
2) determine those deviations which can be considered as
failures (violate the user requirements);
3) establish when the failures occurred (using the same measure
that used in setting the failure intensity objective).

When using a test execution engine, like Rational Quality Manager from IBM, JTStand (http://jtstand.codehaus.org/) or bumbleTest (http://bbltest.sourceforge.net/), the software reliability engineer (or test operator) will benefit from the following functionalities:
• interactive selection of test cases;
• loading the test specification from files (or a database);
• executing tests (by testing tools);
• storing the test results (in a database, for future use);
• analyzing test results (statistics, data mining);
• authenticating the test operators.
As consequences of the considerations presented above, let us outline the following:
- Ensure that the test environment is correctly configured for
metrics collection.
- Use the system manually during test execution so that you can
compare your observations with the results data at a later time.
- Using dynamic test data in a load test allows the usage of error
values if they suit the test plan.
- Run a test that loops through all of the selected data to check for unexpected errors.
- Run tests in a variety of combinations and sequences to ensure
that one test does not corrupt data needed by another test in
order to run properly.
- When using agile methods in software engineering, modify the tests, test designs, and subsequent strategies as results analysis leads to new priorities. If repeating the test is necessary, consider establishing a test data restore point before beginning the testing.
- Analyze results immediately and modify the test plan
accordingly.

4.3 Software Reliability Data Collection

According to (Kirkwood & Bazzana), software reliability assessment is based on "the application of statistical techniques to failure data in order to quantitatively assess current reliability and make predictions regarding future reliability". Moreover, predictions depend on the assumption that the future use of the software is similar to that during collection of the previous failures (i.e. the operational profile does not change). Data collection is an important task because it interconnects the test execution and failure analysis activities.
According to Musa (1999), there are four general ways of
describing the failure occurrence in time:
• time of failure,
• time interval between failures,
• cumulative failures up to a given time, and
• failures appearing in a time interval.
Usually, the data required to guide decisions is a list of inter-failure times collected from monitoring the software (module, build, etc.) in operation. Data can also be grouped, when the number of failures in a
given interval is recorded (only in cases when the information on the
exact points of failure is lost).
In reliability assessment three time domains are most commonly
used: calendar time, clock time, and execution time.
The calendar time (according to every-day life) describes the time
as shown on a watch (date and time of each failure is logged).

The clock time counts only the time during which the software runs (the
down time is excluded); both the run ID and the time are
recorded.
The execution time counts only that part of the clock time during
which the processor is executing instructions of the software being
monitored (at other times the software may be suspended, or the
processor will be executing other programs, in a time-sharing task
scheduling approach).
Musa (1999) proposed that all reliability theory should be based
on execution time, which is a better measure of the failure-inducing
stress. Musa recommends the use of natural units or time units when
defining the reliability quantities. A natural unit is a unit related
to the output of a software-based product and hence to the amount of
processing done; it also represents the amount of failure-inducing
stress, so the number of failures experienced is related to the number of
natural units that have occurred. Some examples follow: one failure per
1000 printed pages, one failure per 1000 transactions, etc. According to
his experience, Musa considers that measurement of natural units is
often easier to implement than that of execution time.
The computation of execution time data by the operating system
is necessary but not sufficient for execution time data collection and
recording, because the measurements are based on sampling, so the
accuracy is affected, or the measurement is not repeatable (in
multitasking, time-sharing systems, and because of the Heisenberg
effect). Even if imperfect, the execution time recorded by the operating
system is a good approximation to the true execution time. For instance,
when executing a command from the Unix shell, the user can prefix the
command with the word time to measure the execution time of the command.

From a software engineering point of view, each programming
language provides some mechanism to retrieve the current time from
the system. If we read and store the system time at various moments, we
can compute time intervals by subtracting the values of system time
taken at different moments. A simple example, in C, follows (clock()
and CLOCKS_PER_SEC are declared in <time.h>; note that the tick
difference must be cast before the division):
clock_t st, fin;
st = clock(); { the_algorithm_code(); } fin = clock();
printf("time=%f\n", (double)(fin - st) / CLOCKS_PER_SEC);
However, the resulting measurement is expressed in seconds, with a
granularity that is way too coarse for measuring operations that happen
at clock rates in the gigahertz range.
Companies and research groups concerned with getting highly
accurate performance measurements often set up specially configured
machines that minimize any sources of timing irregularity, such as by
limiting access and by disabling many OS and networking services. To
provide greater precision for timing measurements, many processors
also contain a timer that operates at the clock cycle level. This timer is a
special register that gets incremented every single clock cycle. Special
machine instructions can be used to read the value of the counter. Not
all processors have such counters, and those that do vary in the
implementation details.
Let us consider that during the monitoring period there are two
active processes: A and B. The processor alternately executes part of
process A, then part of B, and so on. As it executes these processes, it
operates either in user mode, executing the instructions of the
application program; or in kernel mode, performing operating system
functions on behalf of the program, such as, handling page faults, input,
or output. When the scheduler switches from process A to process B, it

must enter kernel mode to save the state of process A (still considered
part of process A) and to restore the state of process B (considered part
of process B). From the perspective of an application program, the flow
of time can be viewed as alternating between periods when the program
is active (executing its instructions), and inactive (waiting to be
scheduled by the operating system).
Manual (non-automated) and automated failure data collection
approaches are used. Where data is manually collected (by users,
testers, etc.) the following situations are possible:
1) data collectors may not be fully familiar with the
requirements of failure data collection;
2) collection standards, in general, are not used, or are used
only partially;
3) the paper forms (even well designed) are not filled in
correctly: critical fields are left blank, information is recorded
informally outside the form fields, and data is entered incorrectly
(due to transcription errors).
Automated data collection is not fully possible even when formal
specifications and classification rules (applied to failures) are used.
In the following, the organization of the failure data files used by
CASRE (Computer Aided Software Reliability Estimation) and SMERFS
(Statistical Modeling and Estimation of Reliability Functions for
Software) is presented. CASRE uses time units (interpretable as natural
units). CASRE files are ASCII (American Standard Code for
Information Interchange) files and can be created with basic software
such as text editors (Edit, Notepad, etc.). Two formats are supported by
CASRE, corresponding to the NTUD (Natural or Time Units Data) and FID
(Failure per Interval Data) collection methods. For both collection
methods, the first line indicates the time units that will be used
("Seconds", "Minutes", or "Hours"). The following lines differ in format.
All files contain lines with fields separated by blanks. The fields of
NTUD files are: the failure number, the number of natural or time units
since the previous failure, and the severity class.
A short collection follows:
Minutes
1 35 1
2 82 1
3 115 1
4 128 1
5 157 1
6 173 1
7 214 1
8 243 1
9 257 1
10 301 1
11 376 1
12 425 1
13 514 1
14 724 1
15 968 1

The fields of the FID lines are: the interval number, the number of
failures in the interval (greater than or equal to zero), the duration of
the interval in natural or time units, three fields having the value zero
(placeholders for unused program features), and the severity class, as
shown in the following data set:

Hours
1 7 2 0 0 0 1
2 4 3 0 0 0 1
3 4 4 0 0 0 1
4 3 2 0 0 0 1
5 2 3 0 0 0 1
6 2 4 0 0 0 1
7 2 5 0 0 0 1
8 1 8 0 0 0 1
9 1 10 0 0 0 1
10 1 23 0 0 0 1

SMERFS has an input module which accepts input either from an
ASCII file or from the keyboard. The following data types are supported by
SMERFS:
 Wall Clock Time Between Failures (WC TBF),
 Central Processing Unit Time Between Failure (CPU TBF),
 both WC TBF and CPU TBF, and
 Interval Fault Counts and Testing Lengths (IFC TL).
The available units for TBF data processing are: Seconds, Minutes,
Hours, Days, Weeks, Months, and Years.
The file data for execution time analysis contains lines having as
fields the failure number and the time between failures. The file data for
interval data analysis contains lines having the following fields: interval
number, number of faults found, testing lengths (usually one), and
variable lengths (to model the intensity of testing).

As Pham remarked, the time-domain approach always
provides higher accuracy in the parameter estimates. The software
engineer responsible for reliability management must trade off the
cost of data collection against the accuracy of the reliability estimates required.

4.4 Applying Failure Data to Guide Decisions

Collected data (manual or automated) are sent to analysis
modules in order to find patterns and models, and to predict the system
behavior. Many data analysis methods are statistical in nature, being of
exploratory or confirmatory type.
The most used exploratory data analysis techniques relevant to
the analysis of software failure and fault data are the following: plots
and graphs, data modeling, data transformation, and data resistance.
Confirmatory tests are based on hypothesis tests. Trend analysis of
software failures can benefit from confirmatory tests.
By using scatterplots, line plots, stem-and-leaf plots, dot plots,
and schematic or box plots, an analyst can determine some relationships
and associations in the data under analysis. A valuable plot describes the
relationship between cumulative software failures and the usage time.
Another useful measure, under various representations, is the software
failure intensity (approximated from empirical data), with the following
benefits (according to Jones & Vouk (Chapter 11, 439-490)): the
empirical failure intensity can be measured, a model of failure
intensity can be discovered, and its parameters can be estimated from
data using a fitting criterion such as ordinary least squares (OLS).
Modeling data and comparisons between models are important
to guide a decision. Also, for a better understanding of data, some
transformations are necessary (the log transformation, for instance).
Data recalibration and outlier detection can also be used. Most useful
models are predictive, but reliability growth models can also be applied.

Before choosing a model, tests on the data are necessary (for
instance, a test to check the Poisson process assumption).
Some important topics in Reliability Data Analysis are:
 Calendar-Time Analysis (CTA),
 Usage Time Analysis (UTA),
 Calendar Time Reliability Analysis (CTRA),
 Usage-Based Reliability Analysis (UBRA), and
 Special Events Analysis (SEA).
CTRA is applied when failures are reported only in the calendar-
time domain. This is possible, and accepted to some extent, because in
practice the managers and users who report the failures of software
according to their operational profile are more attached to the calendar-time
domain, which is closer to their way of thinking when deciding about the
future. However, Musa described the case of a Weibull-type failure intensity
model applied to calendar-time system behavior that produced an inferior
fit compared to a similar model obtained using an execution-time-based
intensity.
The UBRA can be used if sufficient usage information is available,
as Jones & Vouk have shown. SEA considers failure modes and those events
that lead to a class of failures having extremely damaging consequences.
Rare-event models should be used; however, the computation of the
probability of rare events is difficult.
If a (software) system is composed of smaller systems
interconnected and integrated in some way, and a collection of failure
data is available for every component together with another collection of
failure data regarding the integration step, the decisions can be made in
the following order, according to Musa (1999):

1) accepting or rejecting an existing component;
2) guiding the software development process for the system
(both changes in the process and the resolution of failures,
according to some priority list, are important);
3) accepting or rejecting a super-system;
4) releasing the system.

Decisions 1 and 3 are made by the use of a certification test that
uses a reliability demonstration chart, suitable when small samples are
used. The certification test can be used only for time units measured at
the point at which each failure occurred (see Musa (1999)).
Let FIO be the failure intensity objective. In order to build the
reliability demonstration chart, the failure data are multiplied by the FIO
and each failure is plotted on the chart as it occurs.
The decision based on the reliability demonstration chart
depends on:
 the discrimination ratio (the factor of error in estimating
failure intensity the analyst will accept, or the ratio of the
maximum acceptable failure intensity to the failure intensity
objective; denoted by γ),
 the consumer risk level (the probability of falsely saying the
FIO is met when it is not; denoted by β), and
 the supplier risk level (the probability of falsely saying the
FIO is not met when it is; denoted by α).

All charts have a similar shape, but the boundaries between the
reject and continue regions and between the continue and accept regions
could differ. Let Tn be the normalized measure of when failures occur
(the horizontal coordinate), n be the failure number, A = ln[β / (1 − α)],
and B = ln[(1 − β) / α]. The value of A changes rapidly with the consumer
risk but only very slightly with the supplier risk, while the value of B
changes fast with the supplier risk and very slightly with the consumer
risk. The boundary between the accept and continue regions is given by
Tn = (A − n ln γ) / (1 − γ),
and the boundary between the continue and reject regions is computed by
Tn = (B − n ln γ) / (1 − γ).
In the case of a constant failure intensity equal to the failure intensity
objective λF, the reliability demonstration chart shows the normalized
failure times as 1, 2, ... because the failures occur at 1/λF, 2/λF, ....
A useful measure is the demonstrable FI/FIO ratio λD, given by
λD = TN(n) / (tt λF),
where TN(n) is the normalized measure for the continue-accept
boundary at the number of failures that have been experienced, tt is the
number of natural or time units at the end of test, and λF is the failure
intensity objective.
The following rules (having practical consequences) can be
applied in the case of the certification test (Musa, 1999):
1. The certification test requires the use of a single failure-data time
series (interval data are not useful).
2. When the risk levels α and β and/or the discrimination ratio γ decrease,
the continue region becomes larger (more testing is necessary).

3. The software can be accepted or rejected as soon as the plot crosses
into the accept or reject region. It is not necessary to wait for another
failure to occur.
4. The demonstrable FI/FIO ratio can be used when the certification
period ends without the plot leaving the continue region.
5. For components under certification testing, continue the test only
if no failures have appeared and the demonstrable FI/FIO ratio is large.
If the demonstrable FI/FIO ratio is large and failures have occurred,
then reject the component. Accept the component if the demonstrable
FI/FIO ratio is not large.
6. For software products (the whole system), if failures appear and the
demonstrable FI/FIO ratio is large, it is necessary to return to the
reliability growth test and, after removing faults, run the certification
test once again.
7. For super-systems, the certification test ends if the product has been
accepted and the demonstrable FI/FIO ratio is acceptable (less than 2.0).

The FI/FIO ratio can also be used during reliability growth tests,
being estimated from failure data. The estimation of the FI/FIO ratio can
be obtained using SMERFS and CASRE. These packages provide estimation
methods for the parameters of the supported software reliability
models. Statistical inference on accuracy is also available. The most
used software reliability models, together with computational and
statistical aspects, will be described in the next chapter.

5 SOFTWARE RELIABILITY MODELS

Due to the increasing interest in modeling and predicting
software behaviour during operation, researchers have identified a
plethora of models, both static (deterministic) and dynamic
(probabilistic). Fig. 5.1 illustrates a taxonomy proposed by Rahmani
(2008).

Fig. 5.1. Classes of Software Reliability Models (Rahmani, 2008)

The most desirable characteristics of software reliability methods are
(according to Chu et al. (2010)):
1. The description of the method and its application is
comprehensive and understandable.
2. The assumptions of the method have reasonable bases.
3. The method allows for consideration of the specific
operating conditions of the software.
4. The method takes into consideration the quality of
lifecycle activities.
5. The method makes use of available test results and
operational experience.
6. The method addresses uncertainty.
7. The method has been verified and validated.
8. The method is capable of demonstrating high
reliability.
9. The method should be able to estimate parameters that
can be used to account for software common cause
failures.
A large number of software reliability growth models (SRGMs)
have been developed over the years, and each model has its strengths
and limitations. There are different assumptions in different SRGMs on
how the failure rate decreases with time; that is, the models specify
different empirical formulas, and use test data to estimate their
parameters.
According to (IEEE, 2008), Software Reliability (SR) models can
both assess and predict reliability. Assessment deals with measuring
past and current reliability, while prediction provides forecasts of future
reliability.

There are many ways to develop an SR model: a) describe it as a
stochastic process; b) relate it to a Markov model; c) define the
probability density or distribution function; or d) specify the hazard
function. In the following, the most important models are described.

5.1 Classical software models

5.1.1 Weibull model

Assumptions
The testing-effort can be described by a Weibull-type distribution.
Three particular cases are:
 Exponential curve: the cumulative testing-effort consumed in
time (0, t] is N(1 − exp(−αt));
 Rayleigh curve: the cumulative testing-effort consumed in
time (0, t] is N(1 − exp(−αt^2/2));
 Weibull curve: the cumulative testing-effort consumed in the
time interval (0, t] is N(1 − exp(−αt^c)),
where N is the total amount of testing-effort expenditures required by
software testing, α is the scale parameter, and c is the shape parameter.
The hazard rate for the software reliability model is given by the
Weibull model
Z(t) = (a/b)(t/b)^(a−1), t ≥ 0,
where a and b are positive constants to be estimated from data (a > 1
gives an error rate that increases with time, a < 1 a decreasing error
rate, and a = 1 a constant failure rate over time). The corresponding pdf
for the time to failure is
f(t) = (a/b)(t/b)^(a−1) exp(−(t/b)^a), t ≥ 0,
with cdf F(t) = 1 − exp(−(t/b)^a) and reliability function
R(t) = exp(−(t/b)^a); hence the MTBF is (b/a)Γ(1/a), where Γ(.) is the
gamma function.
Data requirements
K – the total number of testing intervals.
ni – the total number of errors in the ith testing interval, i = 1, 2, ..., K.
di – the length of the ith testing interval, i = 1, 2, ..., K.
M = Σ_{i=1}^{K} ni – the cumulative number of errors.

Estimates
Let
F(i) = (Σ_{j=1}^{i} nj) / M, Xi = ln(Σ_{j=1}^{i} dj), Yi = ln[ln(1 / (1 − F(i)))],
Xbar = (Σ_{i=1}^{K} Xi) / K, Ybar = (Σ_{i=1}^{K} Yi) / K,
m = Σ_{i=1}^{K} (Yi − Ybar)(Xi − Xbar) / Σ_{i=1}^{K} (Xi − Xbar)^2,
and b0 = Ybar − m Xbar.
The estimate of the reliability function is given by
R(t) = exp(−(t/b)^a), with the estimates a = m and b = exp(−b0/m).

Comments
di should be measured in CPU time rather than wall clock time.
The estimate of the MTBF is (b/a)Γ(1/a), where Γ(.) is the gamma
function (see 2.1.2.5).

5.1.2 Shooman model

Assumptions
1. The number of errors in the code is a fixed number (ET) and no new
errors are introduced during the debugging process.
2. The error detection rate is proportional to the number of errors
remaining in the code.
3. The software is operated according to the anticipated operational
usage.
4. The number of machine instructions (or LOC), denoted by IT, is
essentially constant.
5. The detection of errors is independent.
Let τ be the quantity of debugging time (in months) from the starting
time of testing, and t be the operating time (based on CPU time). The
hazard rate is Z(t) = K εr(τ), with proportionality constant K and
remaining error rate
εr(τ) = ET/IT − εc(τ),
where εc(τ) is the cumulative number of errors fixed in the interval of
length τ, normalized by IT.
Hence, the reliability function is
R(t) = exp[−K (ET/IT − εc(τ)) t].
Therefore, MTBF = 1 / [K (ET/IT − εc(τ))]. The only
unknowns in this model are ET and K.


Data requirements
For each testing interval, record the number of test runs (of functional
tests), and for each run record the amount of CPU time that the program
executed. Assume that, for functional test i, the sequence
Ti,1, Ti,2, ..., Ti,mi describes the execution times of the mi successful
runs. For the ri − mi unsuccessful runs, let the sequence
ti,1, ti,2, ..., ti,ri−mi describe the execution times of these runs before
the errors were discovered. Hence, the total amount of execution time in
the ith functional testing period is
Hi = Σ_{j=1}^{mi} Ti,j + Σ_{j=1}^{ri−mi} ti,j.

Estimates
The constant failure rate for the ith functional testing period is
estimated (in failures per hour) as
Zi = (ri − mi) / Hi,
with its reciprocal as the estimate of the MTBF.
Let τ1 < τ2 be two debugging times chosen so that εc(τ1) < εc(τ2), and
let r1, r2 be the corresponding numbers of test runs (usually r1 = r2).
The estimates of K and ET can be obtained from
H1 / (r1 − m1) = 1 / [K (ET/IT − εc(τ1))],
H2 / (r2 − m2) = 1 / [K (ET/IT − εc(τ2))].

Comments
The estimates vary depending on the chosen debugging times. An
averaging procedure, based on the median of a sequence of estimates, is
necessary.

5.1.3 Jelinski-Moranda model

Assumptions
1. The rate of error detection is proportional to the current error
content of a program.
2. All errors are equally likely to occur and are independent of each
other.
3. All errors have the same order of severity.
4. The error rate remains constant over the interval between error
occurrences.
5. The software is operated according to the operational profile.
6. The correction of errors is instantaneous and no new errors are
introduced into the program.
Hence, the hazard rate is
Z(t) = φ (N − i + 1),
where N is the total number of errors initially in the software, (i − 1)
errors have been discovered by time t, and φ is the proportionality
constant.

The hazard rate is reduced by the same amount φ at the time each error
is detected.
Data requirements
If t0 = 0 and t1, t2, ... is the sequence of time moments when errors are
discovered, and Xi = ti − ti−1, i = 1, 2, ..., N, then the Xi are assumed
to be exponentially distributed with rate Z(ti):
f(Xi) = φ [N − (i − 1)] exp{−φ [N − (i − 1)] Xi}.

Estimates
Two kinds of estimates can be obtained: Maximum Likelihood Estimates
(MLE) and Least Squares Estimates (LSE).
a) MLE: Considering the joint density for all the Xi,
L(X1, ..., Xn) = Π_{i=1}^{n} φ [N − (i − 1)] exp{−φ [N − (i − 1)] Xi},
and taking the partial derivatives of ln L with respect to N and φ, the
following equations (solvable by numerical methods) are obtained:
φ = n / [N Σ_{i=1}^{n} Xi − Σ_{i=1}^{n} (i − 1) Xi]
and
Σ_{i=1}^{n} 1 / [N − (i − 1)] = n / [N − (Σ_{i=1}^{n} (i − 1) Xi) / (Σ_{i=1}^{n} Xi)].

b) LSE: Minimizing the sum of the squared differences between the
observed times between failures and their mean values (the MTBFs),
S = Σ_{i=1}^{n} (Xi − MTBFi)^2 = Σ_{i=1}^{n} (Xi − 1 / (φ [N − (i − 1)]))^2,
by taking the partial derivatives of the expression with respect to φ and
N and setting the resulting expressions equal to zero, the LSE can be
obtained by solving the system of equations:
φ = [Σ_{i=1}^{n} 1 / (N − i + 1)^2] / [Σ_{i=1}^{n} Xi / (N − i + 1)],
[Σ_{i=1}^{n} Xi / (N − i + 1)] [Σ_{i=1}^{n} 1 / (N − i + 1)^3]
= [Σ_{i=1}^{n} Xi / (N − i + 1)^2] [Σ_{i=1}^{n} 1 / (N − i + 1)^2].

Comments
The resulting estimate of the MTBF for the (j+1)st error occurrence is
1/Z(tj) = 1 / (φ (N − j)), where φ and N are the MLE or LSE values.
The estimated time to remove the next m errors (TRNE), after observing
n failures, is given by
TRNE = Σ_{j=n+1}^{n+m} 1 / (φ (N − j + 1)).

5.1.4 Lipow model

Assumptions
The Lipow model is an extension of the Jelinski-Moranda model: it
allows more than one error occurrence during an interval of testing, and
it also allows that not all errors found in a given testing interval are
corrected by the start of the next testing period:
1. The rate of error detection is proportional to the current error
content of a program.
2. All errors are equally likely to occur and are independent of each
other.
3. Each error is of the same order of severity as any other error.
4. The error rate remains constant over the testing interval.
5. The software operates in a similar manner as the anticipated
operational usage.
6. During a testing interval i, fi errors are discovered but only ni are
corrected in the time window.
Data requirements
Suppose there are M periods of testing and the length of the ith testing
interval is xi. The hazard rate during the ith testing interval is
Z(t) = φ [N − F_{i−1}], t_{i−1} ≤ t ≤ t_i,
where N is the initial number of errors in the program, φ is the
proportionality constant, and
F_{i−1} = Σ_{j=1}^{i−1} nj
is the total number of errors corrected up through the (i−1)st testing
interval; ti is the time, measured in either CPU or wall clock time, of
the end of the ith testing interval. The ti are fixed, not random as in
the Jelinski-Moranda model.
Estimates
Taking the number of failures, fi, in the ith interval to be a Poisson
random variable with mean Z(ti) xi, the likelihood function is
L(f1, ..., fM) = Π_{i=1}^{M} [φ (N − F_{i−1}) xi]^{fi} exp{−φ (N − F_{i−1}) xi} / fi!.
The MLEs of φ and N can be obtained by taking the partial derivatives of
ln L with respect to φ and N:
φ = (FM / A) / (N + 1 − B/A),
FM / (N + 1 − B/A) = Σ_{i=1}^{M} fi / (N − F_{i−1}),
where A = Σ_{i=1}^{M} xi is the total length of the testing periods,
B = Σ_{i=1}^{M} (F_{i−1} + 1) xi, and FM = Σ_{i=1}^{M} fi is the total
number of errors found in the M periods of testing.


Comments
The maximum likelihood estimate of the mean time until the next
failure is
MTBF = 1 / [φ (N − FM)].

5.1.5 Schick-Wolverton model

Assumptions
In this model the hazard rate function is proportional not only to the
number of errors in the program, but also to the amount of testing time.
The following assumptions apply to this model:
1. The rate of error detection is proportional to the current error
content of a program and to the amount of time spent in testing.
2. All errors are equally likely to occur and are independent of each
other.
3. Each error is of the same order of severity as any other error.
4. The software is operated in a similar manner as the anticipated
operational usage.
5. The errors are corrected instantaneously without introduction of
new errors into the program.
Data requirements
Let N be the total number of errors initially in the software, φ a
proportionality constant, and Xi the amount of testing time spent
between the discovery of the (i−1)st error at time t_{i−1} and the ith
error at time ti. The hazard rate function is
Z(Xi) = φ [N − (i − 1)] Xi.
Therefore, the reliability function is
R(Xi) = exp{−φ [N − (i − 1)] Xi^2 / 2},
and the MTBF is
MTBF = ∫_0^∞ R(Xi) dXi = sqrt(π / (2 φ [N − (i − 1)])).

Estimates
Suppose that errors are discovered at times t1, t2, ..., tn, and
Xi = ti − t_{i−1}. Using
f(Xi) = φ [N − (i − 1)] Xi exp{−φ [N − (i − 1)] Xi^2 / 2},
writing the likelihood function L(X1, ..., Xn) based upon the Xi, taking
the partial derivatives of the log-likelihood with respect to N and φ,
and setting them to zero, the MLE values of the parameters are the
solutions of the following system of equations:
φ = 2n / Σ_{i=1}^{n} (N − i + 1) Xi^2
and
Σ_{i=1}^{n} 1 / (N − i + 1) = (φ/2) Σ_{i=1}^{n} Xi^2.
Comments
The MLE of the MTBF is given as
MTBF = sqrt(π / (2 φ (N − n))).
Lipow's version of the Schick-Wolverton error counting model assumes
that the rate of error detection is proportional to the current error
content of the program and to the total time previously spent in testing,
including an "averaged" error search time during the current time
interval of testing. Suppose fi errors are discovered during the ith
testing interval and Fi = Σ_{j=1}^{i} fj. The hazard rate function is
given by
Z(xi) = φ (N − F_{i−1}) (Y_{i−1} + xi/2),
where
 xi is the amount of testing time spent between the end of
testing period i−1 and the end of the ith testing interval, and
 Y_{i−1} = Σ_{j=1}^{i−1} xj (the cumulative amount of testing
time spent through i−1 intervals).
Hence, the reliability function is
R(xi) = exp{−φ (N − F_{i−1}) (Y_{i−1} xi + xi^2/4)},
with
MTBF = 2 sqrt(π / (φ (N − F_{i−1}))) exp{φ (N − F_{i−1}) Y_{i−1}^2}
[1 − Φ(Y_{i−1} sqrt(2 φ (N − F_{i−1})))],
where Φ(.) denotes the standard normal distribution function.
According to Sukert (cited by Farr in NSWC TR 82-171), the MLEs for φ
and N can be obtained as solutions to the following system of equations:
φ = FM / Σ_{i=1}^{M} (N − F_{i−1}) xi (Y_{i−1} + xi/2),
Σ_{i=1}^{M} fi / (N − F_{i−1}) = φ Σ_{i=1}^{M} xi (Y_{i−1} + xi/2),
with
 M – the number of testing intervals, and
 FM = Σ_{i=1}^{M} fi – the total number of errors found in all the
intervals of testing.

5.1.6 Jelinski-Moranda Geometric model

Assumptions
1. The software is never error-free; hence the total number of errors is
infinite.
2. The errors do not have the same chance of
detection.
3. The detections of errors are independent.
4. The software is operated according to the
operational profile.
5. The error detection rate forms a geometric
progression and is constant between error
occurrences.
Data requirements
To make a decision, either the times of error occurrence (denoted by ti)
or the times between error occurrences (denoted by Xi) must be
collected.
Estimates
From the above assumptions, the hazard rate is modeled by
Z(t) = D φ^{i−1} for the time t between the occurrence of the (i−1)st
error and the ith; the hazard rate decreases in a geometric progression
as error detection occurs, because 0 < φ < 1. If Xi = ti − t_{i−1} (the
time between the discovery of the ith and the (i−1)st error), and the Xi
are assumed to be independent exponentials with rate Z(ti):
f(Xi) = D φ^{i−1} exp(−D φ^{i−1} Xi),
then the likelihood function is given by
L(X1, X2, ..., Xn) = D^n [Π_{i=1}^{n} φ^{i−1}] exp(−D Σ_{i=1}^{n} φ^{i−1} Xi),
and the MLEs of D and φ can be obtained as the solution of the following
system of equations:
n/D − Σ_{i=1}^{n} φ^{i−1} Xi = 0,
Σ_{i=1}^{n} (i − 1)/φ − D Σ_{i=1}^{n} (i − 1) φ^{i−2} Xi = 0.
Therefore,
D = n / Σ_{i=1}^{n} φ^{i−1} Xi, and
Σ_{i=1}^{n} i φ^{i−1} Xi / Σ_{i=1}^{n} φ^{i−1} Xi = (n + 1) / 2.
The MTBF after n observed errors is MTBF = 1 / (D φ^n).


Comments
Lipow also extended the geometric model, considering only the
following assumptions:
1. All errors do not have the same chance of detection.
2. The detections of errors are independent.
3. The operational profile is respected.
4. The error detection rate, for the ith interval, respects the following
rule:
Z(t) = D φ^{n_{i−1}} for t_{i−1} ≤ t ≤ t_i,
where n_{i−1} is the cumulative number of errors found up to the start
of the ith interval of testing.

5.2 Poisson models

5.2.1 The Generalized Poisson Model (GPM)

Assumptions
1. The expected number of errors occurring in any time interval is
proportional to the error content at the time of testing and to some
function of the amount of time spent in error testing.
2. All errors are equally likely to occur and are independent of each
other.
3. Each error is of the same order of severity as any other error.
4. The software is operated in a similar manner as the anticipated
operational usage (the user operational profile).
5. The errors are corrected at the ends of the testing periods without
introduction of new errors into the software.
Data requirements Assume that the testing intervals are of lengths $x_1, x_2, \ldots, x_n$ and that $f_i$ errors are discovered in the $i$th interval. Let $M_i$ be the number of corrected errors by the end of the $i$th interval; then $E\{f_i\} = \phi(N-M_{i-1})\,g_i(x_1, x_2, \ldots, x_i)$, where $\phi$ is the proportionality constant, $N$ is the initial number of errors, and $g_i$ is a function of the amount of testing time spent previously and currently: $g_i(\cdot) = x_i^{\alpha}$ (with $\alpha=1$: the Jelinski-Moranda model); $g_i(\cdot) = x_i^2/2$ (the Schick-Wolverton model); $g_i(\cdot) = x_i\left(\sum_{j=1}^{i-1}x_j + \frac{x_i}{2}\right)$ (Lipow's model).
Estimates The joint density of the $f_i$'s is their product, and the MLEs can be obtained by solving the following pair of equations:

$$\hat{\phi} = \frac{\sum_{i=1}^{n} f_i}{\hat{N}\sum_{i=1}^{n} g_i - \sum_{i=1}^{n} M_{i-1}g_i},$$

$$\sum_{i=1}^{n}\frac{f_i}{\hat{N}-M_{i-1}} = \hat{\phi}\sum_{i=1}^{n} g_i.$$
Comments The GPM estimation problem is a three-parameter estimation problem ($N$, $\phi$, and $\alpha$), or a two-parameter problem if $\alpha$ is specified. Good initial starting values are required to achieve convergence.

5.2.2 The Geometric Poisson Model

Assumptions This model is best applied to situations in which the


length of the reporting period is small in relationship
to the overall length of the testing time. The model
was also proposed by Moranda with the following
assumptions:
1. There is an infinite number of errors.
2. The detections of errors are independent.
3. The errors do not have the same chance of
detection.
4. The software is operated according to the
operational profile.
5. During the $i$th time period, the number of errors detected, $f_i$, follows a Poisson distribution with parameter $D\phi^{i-1}$, where $D$ is the initial detection rate and $\phi$ is the constant of proportionality ($0<\phi<1$).
6. Each error discovered is either corrected or not
counted again.
Data fi – the number of errors detected during the ith
requirements period, i = 1, 2, ....
Estimates The likelihood function for the $m$ reporting intervals is given by:

$$L(f_1, f_2, \ldots, f_m) = \prod_{i=1}^{m}\frac{(D\phi^{i-1})^{f_i}\exp(-D\phi^{i-1})}{f_i!}.$$

After some computation on $\sum_{i=1}^{m}\phi^{i-1}$ and $\sum_{i=1}^{m}(i-1)\phi^{i-1}$, the MLEs of $D$ and $\phi$ can be obtained as the solution of the following system of equations:

$$\hat{D} = \frac{\sum_{i=1}^{m} f_i}{\sum_{i=1}^{m}\hat{\phi}^{i-1}} = \frac{(1-\hat{\phi})\sum_{i=1}^{m} f_i}{1-\hat{\phi}^{m}},$$

$$\sum_{i=1}^{m}(i-1)f_i = \left(\sum_{i=1}^{m} f_i\right)\frac{\hat{\phi} - m\hat{\phi}^{m} + (m-1)\hat{\phi}^{m+1}}{(1-\hat{\phi}^{m})(1-\hat{\phi})}.$$

Comments The testing intervals are all assumed to be of the same length (the testing period is a day, a week, etc.).

5.2.3 The Schneidewind Model

Assumptions The Schneidewind model is recommended by (IEEE, 2008) as a good choice for practical applications. The model assumptions are:
1. The number of errors detected in one interval is
independent of the error count in another.
2. The error correction rate is proportional to the
number of errors to be corrected.
3. The software is operated according to the
operational profile.
4. The mean number of detected errors decreases
from one interval to the next.

5. The intervals are all of the same length.
6. The rate of error detection is proportional to the number of errors within the program at the time of test. The error detection process is assumed to be a non-homogeneous Poisson process with an exponentially decreasing error detection rate: $d_i = \alpha\exp(-i\beta)$ for the $i$th interval, with $\alpha>0$ and $\beta>0$.

The cumulative mean number of errors is
$$D_i = \frac{\alpha}{\beta}\left[1-\exp(-i\beta)\right],$$
and the mean number of errors for the $i$th interval is
$$m_i = D_i - D_{i-1}.$$

Data requirements There are $m$ intervals of testing, and in the $i$th interval $f_i$ errors were detected. Let us denote by $F_{s-1} = \sum_{i=1}^{s-1}f_i$ the cumulative error count from intervals 1 through $s-1$.

Estimates Let $M_{s-1}$ be the mean number of errors in intervals 1 through $s-1$, with $s$ chosen as an integer value in the range $2 \le s \le m$. Using the likelihood function given by

$$L(f_1, f_2, \ldots, f_m) = \frac{M_{s-1}^{F_{s-1}}\exp(-M_{s-1})}{F_{s-1}!}\prod_{i=s}^{m}\frac{m_i^{f_i}\exp(-m_i)}{f_i!},$$

the MLEs are given by

$$\hat{\beta} = \ln(y), \qquad \hat{\alpha} = \frac{\hat{\beta}\sum_{i=1}^{m}f_i}{1-\exp(-m\hat{\beta})},$$

where $y$ is the solution of the polynomial equation

$$\frac{(s-1)F_{s-1}}{y^{s-1}-1} + \frac{F_{s,m}}{y-1} - \frac{mF_m}{y^{m}-1} = A,$$

with

$$A = \sum_{i=0}^{m-s}(s+i-1)f_{s+i}, \qquad F_{s,m} = \sum_{i=s}^{m}f_i, \qquad F_m = F_{s-1}+F_{s,m}.$$

Comments There are three approaches:
1. All errors (for the $m$ intervals) are used during analysis (applicable when all information can be used in predicting future counts).
2. The error counts from the first $s-1$ time intervals ($1<s<m+1$) are ignored and only data from intervals $s$ through $m$ are used.
3. The cumulative error count from intervals 1 through $s-1$ and the individual error counts from intervals $s$ through $m$ are used.
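As an illustration of approach 1 ($s=1$, so $F_{s-1}=0$ and the equation for $y$ reduces to two terms), the sketch below fits the model to hypothetical interval counts; the counts must show reliability growth for a root $y>1$ to exist.

```python
import math

def schneidewind_mle(f):
    """Schneidewind-model MLEs, approach 1 (illustrative sketch).

    f : error counts f_1..f_m for m equal-length intervals (hypothetical).
    With s = 1 the MLE equation for y = exp(beta) reduces to
        F/(y-1) - m*F/(y**m - 1) = A,
    where F = sum(f_i) and A = sum((i-1)*f_i).  A root y > 1 exists when
    the counts show reliability growth, i.e. A < F*(m-1)/2.
    """
    m = len(f)
    F = float(sum(f))
    A = float(sum((i - 1) * fi for i, fi in enumerate(f, start=1)))

    def h(y):
        return F / (y - 1.0) - m * F / (y ** m - 1.0) - A

    lo, hi = 1.0 + 1e-6, 2.0
    while h(hi) > 0.0:               # expand the bracket until the sign changes
        hi = 1.0 + 2.0 * (hi - 1.0)
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if h(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    y = (lo + hi) / 2.0
    beta = math.log(y)
    alpha = beta * F / (1.0 - math.exp(-beta * m))
    return alpha, beta
```

For decreasing counts such as 10, 8, 6, 4, 2 the procedure returns positive $\hat{\alpha}$ and $\hat{\beta}$, matching the exponentially decaying detection rate of the model.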

5.3 NHPP Software Reliability Models

Let $a(t)$ be the total number of faults in the software, including the initial and introduced faults; $b(t)$ the fault detection rate; $\lambda(t)$ the failure intensity function; $m(t)$ the expected number of errors detected by time $t$; and $N(t)$ the cumulative number of errors detected by time $t$.
A general class of NHPP (non-homogeneous Poisson process) software reliability growth models can be obtained by solving the following differential equation (according to Pham, 2003):

$$\frac{dm(t)}{dt} = b(t)\left[a(t)-m(t)\right],$$

with the solution

$$m(t) = \exp(-B(t))\left[m_0 + \int_{t_0}^{t} a(s)b(s)\exp(B(s))\,ds\right],$$

where $t_0$ is the starting time of the debugging process, $m_0 = m(t_0)$, and $B(t) = \int_{t_0}^{t} b(s)\,ds$.

The reliability function can be computed as

$$R(x/t) = \exp\left[-(m(t+x)-m(t))\right].$$

Parameter estimation is achieved by applying the maximum likelihood method. The form of the log-likelihood function depends on the test data format (when collected during testing).
For interval domain data, when $y_i$ is the cumulative number of detected errors in the interval $(0, t_i)$, then

$$LLF = \sum_{i=1}^{n}(y_i-y_{i-1})\log(m(t_i)-m(t_{i-1})) - m(t_n), \quad 0<t_1<t_2<\cdots<t_n.$$

For time domain data, with $n$ successive times of observed failures $s_j$, $0\le s_1\le\cdots\le s_n$, then

$$LLF = \sum_{i=1}^{n}\log(\lambda(s_i)) - m(s_n),$$

where $\lambda(t) = \frac{dm(t)}{dt}$.

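For a concrete instance of the time-domain likelihood equations, the sketch below fits the Goel-Okumoto model $m(t)=a(1-e^{-bt})$ by reducing the two MLE equations to a one-dimensional bisection in $b$; the failure times are hypothetical.

```python
import math

def goel_okumoto_mle(s):
    """Time-domain MLEs for the Goel-Okumoto model (illustrative sketch).

    s : ordered failure times s_1 <= ... <= s_n (hypothetical data).
    From the log-likelihood, a = n/(1 - exp(-b*s_n)) and b solves
        n/b - sum(s_i) - n*s_n*exp(-b*s_n)/(1 - exp(-b*s_n)) = 0.
    Assumes failures cluster early (sum(s_i) < n*s_n/2), so a root exists.
    """
    n, sn, total = len(s), s[-1], sum(s)

    def g(b):
        e = math.exp(-b * sn)
        return n / b - total - n * sn * e / (1.0 - e)

    lo, hi = 1e-6, 1.0
    while g(hi) > 0.0:          # expand the bracket if needed
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    b = (lo + hi) / 2.0
    a = n / (1.0 - math.exp(-b * sn))
    return a, b
```

The estimated $a$ is always at least $n$, since the expected total fault content cannot fall below the number of faults already observed.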
A summary of some representative NHPP software reliability
models is presented in the following table.

Model name / Model type / Characteristics (Pham, 2000)

Goel-Okumoto / Concave / $m(t) = a(1-\exp(-bt))$, $a(t)=a$, $b(t)=b$.

Delayed S-shaped / S-shaped / Assumptions:
1. All faults are mutually independent and detectable.
2. The probability of failure detection at any time is proportional to the current number of faults in the software.
3. The proportionality of failure detection is constant.
4. The initial error content of the software is a random variable.
5. The software is subject to failures at random times.
6. The time between the $(i-1)$st and the $i$th failures depends on the time to the $(i-1)$st failure.
7. When a failure occurs, the bug is immediately removed and no other errors are introduced.
If $a(t)=a$ and $b(t) = \dfrac{b^2t}{bt+1}$, then
$$\lambda(t) = ab^2t\exp(-bt), \qquad m(t) = a\{1-(1+bt)\exp(-bt)\}.$$
Inflection S-shaped / S-shaped / Assumptions:
1. Some of the faults are not detectable before some other faults are removed.
2. The probability of failure detection is proportional to the current number of detectable faults.
3. The failure rate of detectable faults is constant and identical.
4. The isolated faults can be entirely removed.
If $\beta$ is an inflection factor, $a(t)=a$, and $b(t)=b/(1+\beta\exp(-bt))$, then
$$m(t) = \frac{a(1-\exp(-bt))}{1+\beta\exp(-bt)}, \qquad \lambda(t) = \frac{ab(1+\beta)\exp(-bt)}{(1+\beta\exp(-bt))^2}.$$

Yamada exponential / S-shaped / The model assumptions are as follows: 1) The software is operated during test in a manner similar to anticipated usage; 2) The failure occurrences are independent and random; 3) The initial fault content is a random variable; 4) The time between failure $i-1$ and failure $i$ depends on the TTF of failure $i-1$; 5) Each time a failure occurs, the fault which caused it is immediately removed, and no other faults are introduced.
An attempt to account for testing effort:
$$a(t)=a, \qquad b(t)=r\alpha\beta\exp(-\beta t), \qquad m(t) = a\{1-\exp[-r\alpha(1-\exp(-\beta t))]\}.$$

NHPP Imperfect Debugging Model / Multiple failure type and imperfect debugging / The model allows three different error types: critical errors (very difficult to detect and remove), major errors (difficult to detect and remove), and minor errors (easy to detect and remove). Let $a$ be the expected number of software errors to be eventually detected, $b_i$ the error detection rate per type $i$ error, $p_i$ the content proportion of type $i$ errors, $\beta_i$ the type $i$ error introduction rate, and $m_i(t)$ the expected number of type $i$ errors detected by time $t$. The function $m(t)$ is obtained as the solution of the following system of differential equations:
$$\frac{dm_i(t)}{dt} = b_i\left(n_i(t)-m_i(t)\right), \qquad \frac{dn_i(t)}{dt} = \beta_i\frac{dm_i(t)}{dt},$$
$$m(t) = \sum_{i=1}^{3}m_i(t), \qquad n_i(0)=ap_i, \qquad m_i(0)=0,$$
with the following components:
$$m_i(t) = \frac{ap_i}{1-\beta_i}\left(1-\exp(-(1-\beta_i)b_it)\right),$$
$$\lambda_i(t) = ap_ib_i\exp(-(1-\beta_i)b_it),$$
$$n_i(t) = \frac{ap_i}{1-\beta_i}\left(1-\beta_i\exp(-(1-\beta_i)b_it)\right).$$
The software reliability function is
$$R(x/t) = \exp\left\{-\sum_{i=1}^{3}\frac{ap_i}{1-\beta_i}\left(e^{-(1-\beta_i)b_it}-e^{-(1-\beta_i)b_i(t+x)}\right)\right\}.$$

Yamada exponential imperfect debugging / Concave / Let $\alpha$ be the error introduction rate. Then $a(t)=a\exp(\alpha t)$, $b(t)=b$, and
$$m(t) = \frac{ab\left(\exp(\alpha t)-\exp(-bt)\right)}{\alpha+b}.$$
Pham-Nordmann model / S-shaped and concave / Assume the error introduction rate is a linear function of testing time, and the error detection rate function is non-decreasing with an inflection S-shaped model:
$$a(t)=a(1+\alpha t), \qquad b(t)=\frac{b}{1+\beta\exp(-bt)},$$
$$m(t) = \frac{a(1-\exp(-bt))\left(1-\frac{\alpha}{b}\right)+\alpha at}{1+\beta\exp(-bt)}.$$

Pham-Zhang model / S-shaped and concave / Assume the error introduction rate is an exponential function, and the error detection rate function is non-decreasing with an inflection S-shaped model:
$$a(t) = c + a(1-e^{-\alpha t}), \qquad b(t) = \frac{b}{1+\beta e^{-bt}},$$
$$m(t) = \frac{(c+a)(1-e^{-bt}) - \frac{ab}{b-\alpha}\left(e^{-\alpha t}-e^{-bt}\right)}{1+\beta\exp(-bt)}.$$

If $a$ is the number of failures in the software, $c$ is the testing compression factor, $T$ is the mean time to failure at the beginning of the test, $n$ is the total number of failures during the maintenance period, and $t$ is the execution time, Musa (according to Pham (2000)) obtained the following differential equation:
$$\frac{dm(t)}{dt} = \frac{c}{nT}\left(a-m(t)\right),$$
with the solution $m(t)=a(1-\exp(-ct/(nT)))$, which is called the Musa exponential model.
Another model based on an NHPP was obtained by (Duane, 1964), who observed that the cumulative failure rate plotted against cumulative testing time on logarithmic paper tended to follow a straight line. The model assumptions are:
1. The software is operated according to the operational profile.
2. Every error has the same chance of being detected and is of the same severity as any other error.
3. The error occurrences are independent.
4. The cumulative number of errors detected at any time $t$, denoted by $N(t)$, follows a Poisson distribution with mean $m(t)$. The mean function is taken to be of the form $m(t)=\lambda t^{\beta}$.

The MLEs of $\beta$ and $\lambda$ are:
$$\hat{\beta} = \frac{n}{\sum_{i=1}^{n-1}\ln(t_n/t_i)}, \qquad \hat{\lambda} = \frac{n}{t_n^{\hat{\beta}}},$$
where the $t_i$'s are the observed failure times and $n$ is the number of software errors detected. The MTBF after time $t$ is $1/(\lambda\beta t^{\beta-1})$.
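The Duane MLEs are closed-form, so they are easy to compute directly. A small sketch with hypothetical failure times; note that the instantaneous MTBF at $t_n$ reduces to $t_n/(n\hat{\beta})$ because $\hat{\lambda}t_n^{\hat{\beta}}=n$.

```python
import math

def duane_mle(t):
    """Closed-form Duane-model MLEs from ordered failure times t_1..t_n."""
    n = len(t)
    beta = n / sum(math.log(t[-1] / ti) for ti in t[:-1])
    lam = n / t[-1] ** beta
    # Instantaneous MTBF at t_n: 1/(lam*beta*t_n**(beta-1)) = t_n/(n*beta).
    mtbf = t[-1] / (n * beta)
    return lam, beta, mtbf
```

A fitted $\hat{\beta}<1$ corresponds to a decreasing failure intensity, i.e. reliability growth.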

5.4 Generalized Models with Environmental Factors

Let $z$ be the vector of environmental factors (covariates), $\beta$ the coefficient vector of environmental factors, and $\Psi(z)$ the function of environmental factors (it usually takes an exponential form: $\Psi(z) = \exp(\beta_0+\beta_1z_1+\beta_2z_2+\cdots+\beta_mz_m)$). If $\lambda_0(t)$ is the failure intensity function without environmental factors, and $\lambda(t,z)$ is the failure intensity function with environmental factors, having the form $\lambda(t,z)=\lambda_0(t)\Psi(z)$, then the mean value function with environmental factors is
$$m(t,z) = \int_{0}^{t}\lambda_0(s)\Psi(z)\,ds = m_0(t)\Psi(z),$$
and the reliability function has the expression
$$R(x/t,z) = \left[R_0(x/t)\right]^{\Psi(z)},$$
where $R_0(x/t)$ is the reliability function without environmental factors.
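A minimal sketch of the covariate scaling, with hypothetical coefficient values; the multiplier $\Psi(z)$ raises the baseline reliability to a power, so $\Psi(z)>1$ (a stressful environment) lowers the reliability.

```python
import math

def psi(beta, z):
    """Psi(z) = exp(beta_0 + beta_1*z_1 + ... + beta_m*z_m)."""
    return math.exp(beta[0] + sum(b * zj for b, zj in zip(beta[1:], z)))

def reliability_with_env(r0, beta, z):
    """R(x/t, z) = [R0(x/t)] ** Psi(z)."""
    return r0 ** psi(beta, z)
```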

5.5 Bayesian Models

Bayesian models employ a subjective way of thinking about the meaning of software reliability. According to the Bayesian approach, the reliability should be a reflection of both the number of errors discovered and the length of the error-free testing periods. Moreover, the fact that some code regions see heavy traffic at run-time while other regions are much less stressed also calls for a Bayesian approach.

5.5.1 Littlewood's Bayesian Debugging Model

Assumptions Errors do not contribute equally to the unreliability; errors corrected earlier have a greater impact than errors corrected later.
Littlewood postulates that the error rate should be treated as random.
(a) The individual failure rates of the errors in the program are assumed to be independent random variables, each with a prior distribution that is gamma with parameters $\alpha$ and $\beta$:
$$g(\lambda_i) = g(\lambda_j) = \frac{\beta^{\alpha}\lambda^{\alpha-1}e^{-\beta\lambda}}{\Gamma(\alpha)}, \quad \lambda\ge 0, \text{ for all } i \text{ and } j.$$

(b) For a given error rate $\Lambda_i$, the time between error occurrences $X_i = t_i - t_{i-1}$ is exponential with mean $1/\Lambda_i$:
$$f(X_i\,|\,\Lambda_i) = \Lambda_ie^{-\Lambda_iX_i}, \quad X_i\ge 0.$$
(c) $\Lambda_i = \lambda_1+\lambda_2+\cdots+\lambda_{N-i+1}$ after $i-1$ errors have been detected and corrected.
(d) When a software error is detected, it is immediately corrected without the introduction of additional errors.
(e) The software is operated in a similar manner as the anticipated operational usage.
If all of the $\lambda_i$'s are assumed to have a degenerate distribution at the point $\phi$ with probability 1, then $\Lambda_i=(N-i+1)\phi$.
Data - The sequence ti, measured in CPU time rather than
requirements wall clock time, and Xi=ti-ti-1.

Estimates Let $X_1, X_2, \ldots, X_n$ be the times between failures: $X_i = t_i - t_{i-1}$. If the density function for $\lambda_k$ is gamma with parameters $\alpha$ and $\beta+t$, then $\Lambda_{i+1}$ (as a sum of independent, identically distributed random variables) is also gamma, with parameters $(N-i)\alpha$ and $\beta+t$. The unconditional distribution of the time to the next failure, $X_{i+1}$, is

$$f(X_{i+1}) = \int_{0}^{\infty} f(X_{i+1}\,|\,\Lambda_{i+1})\,g(\Lambda_{i+1})\,d\Lambda_{i+1} = \frac{(N-i)\alpha(\beta+t)^{(N-i)\alpha}}{(\beta+t+X_{i+1})^{(N-i)\alpha+1}}, \quad X_{i+1}\ge 0,$$

i.e., a Pareto distribution. The following quantities can be derived:
1. The reliability function after $i$ errors are discovered is given by:
$$R(x) = P\{X_{i+1}>x\} = 1-P\{X_{i+1}\le x\} = 1-\int_{0}^{x}f(X_{i+1})\,dX_{i+1}.$$
Hence,
$$R(x) = \left(\frac{\beta+t}{\beta+t+x}\right)^{(N-i)\alpha}.$$
2. The failure rate function becomes
$$Z(x) = -\frac{R'(x)}{R(x)} = \frac{(N-i)\alpha}{\beta+t+x}.$$
As a consequence, the failure rate immediately after discovering $i$ faults during $t$ amount of testing time is
$$Z(0) = \frac{(N-i)\alpha}{\beta+t}.$$
3. The MTBF, from the Pareto distribution, is found as:
$$MTBF = E\{X_{i+1}\} = \frac{\beta+t}{(N-i)\alpha-1},$$
when $(N-i)\alpha>1$.
4. The likelihood function is
$$L(N,\alpha,\beta) = \prod_{i=1}^{n}\frac{(N-i+1)\alpha(\beta+t_{i-1})^{(N-i+1)\alpha}}{(\beta+t_{i-1}+X_i)^{(N-i+1)\alpha+1}},$$
where
$$t_{i-1} = \sum_{j=1}^{i-1}x_j$$
is the time of occurrence of the $(i-1)$st fault.
5. The LSEs are those $N$, $\alpha$, $\beta$ chosen to minimize
$$S(N,\alpha,\beta) = \sum_{i=1}^{n}\left[X_i-\frac{\beta+t_{i-1}}{(N-i+1)\alpha-1}\right]^2.$$
The system of equations to be solved in order to find the LS estimates is:
$$\sum_{i=1}^{n}\frac{X_i}{(N-i+1)\alpha-1} = \sum_{i=1}^{n}\frac{\beta+t_{i-1}}{\left[(N-i+1)\alpha-1\right]^2},$$
$$\sum_{i=1}^{n}\frac{X_i(N-i+1)(\beta+t_{i-1})}{\left[(N-i+1)\alpha-1\right]^2} = \sum_{i=1}^{n}\frac{(N-i+1)(\beta+t_{i-1})^2}{\left[(N-i+1)\alpha-1\right]^3}.$$

Comments - The model reformulates the Jelinski-Moranda model into a Bayesian framework.
- The reliability of the program can be estimated after some specified execution time has elapsed or after some specified number of errors has been removed. If $t_i$ is the amount of testing already performed, $i$ is the number of discovered and corrected faults, and $dt$ is the additional time for testing, then the reliability of the program at time $t_i+dt$ is:
$$R(x) = \left(\frac{\beta+t_i+dt}{\beta+t_i+dt+x}\right)^{(N-i)\alpha}.$$
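The posterior quantities above (reliability, current failure rate, MTBF) are closed-form; a small sketch with hypothetical parameter values:

```python
def littlewood_posterior(N, alpha, beta, i, t, x):
    """R(x), Z(0) and MTBF after i faults removed by testing time t
    (Littlewood's model, gamma(alpha, beta) prior on each error rate)."""
    shape = (N - i) * alpha                       # gamma shape of Lambda_{i+1}
    r = ((beta + t) / (beta + t + x)) ** shape    # Pareto survival function
    z0 = shape / (beta + t)                       # hazard just after the i-th fix
    mtbf = (beta + t) / (shape - 1.0) if shape > 1.0 else float("inf")
    return r, z0, mtbf
```

For instance, with the hypothetical values $N=10$, $\alpha=1$, $\beta=10$, $i=4$, $t=90$: the hazard is $6/100$ and the MTBF is $100/5=20$.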

5.5.2 Littlewood and Verrall's Bayesian Reliability Model

Assumptions (a) Successive execution times between failures, $X_i$, $i=1,2,\ldots,n$, are independent random variables with probability density functions
$$f(X_i\,|\,\lambda_i) = \lambda_ie^{-\lambda_iX_i}, \quad X_i\ge 0.$$
(b) The $\lambda_i$'s are a sequence of independent random variables, each with a gamma distribution of parameters $\alpha$ and $\psi(i)$:
$$g(\lambda_i) = \frac{[\psi(i)]^{\alpha}\lambda_i^{\alpha-1}e^{-\psi(i)\lambda_i}}{\Gamma(\alpha)}, \quad \lambda_i\ge 0.$$
The function $\psi(i)$ should be an increasing function of $i$ that describes the quality of the programmer and the difficulty of the programming task. Moreover:
$$P\{\lambda_j\le k\} \ge P\{\lambda_{j-1}\le k\}, \quad j=2,3,\ldots,n.$$
Littlewood and Verrall suggested the following expressions for the $\psi$ function:
 Linear: $\psi(i) = \beta_0+\beta_1i$,
 Quadratic: $\psi(i) = \beta_0+\beta_1i^2$.
(c) The software is operated in a similar manner as the anticipated operational usage.
Data requirements - The times between failures, $X_i$, $i=1,2,\ldots,n$.
Estimates Putting together the randomness of the failing process and the randomness of the debugging process, one obtains:
$$f[X_i\,|\,\alpha,\psi(i)] = \int_{0}^{\infty}f(X_i\,|\,\lambda_i)g(\lambda_i)\,d\lambda_i = \frac{\alpha[\psi(i)]^{\alpha}}{[X_i+\psi(i)]^{\alpha+1}},$$
where $X_i>0$, the $X_i$'s having a Pareto distribution.
The joint density for the $X_i$'s is:
$$f[X_1,\ldots,X_n\,|\,\alpha,\psi(i)] = \frac{\alpha^{n}\prod_{i=1}^{n}[\psi(i)]^{\alpha}}{\prod_{i=1}^{n}[X_i+\psi(i)]^{\alpha+1}}, \quad X_i>0,\ i=1,2,\ldots,n.$$
Using one of the forms for $\psi(i)$, the likelihood function is now a function of three unknowns ($\alpha$, $\beta_0$, $\beta_1$):
$$L(\alpha,\beta_0,\beta_1) = f(X_1,X_2,\ldots,X_n\,|\,\alpha,\beta_0,\beta_1).$$
The MLEs are the solutions to the following system of equations:
$$\frac{\partial L}{\partial\alpha}: \quad \frac{n}{\alpha}+\sum_{i=1}^{n}\ln\psi(i)-\sum_{i=1}^{n}\ln[X_i+\psi(i)] = 0,$$
$$\frac{\partial L}{\partial\beta_0}: \quad \alpha\sum_{i=1}^{n}\frac{1}{\psi(i)}-(\alpha+1)\sum_{i=1}^{n}\frac{1}{X_i+\psi(i)} = 0,$$
$$\frac{\partial L}{\partial\beta_1}: \quad \alpha\sum_{i=1}^{n}\frac{i^{j}}{\psi(i)}-(\alpha+1)\sum_{i=1}^{n}\frac{i^{j}}{X_i+\psi(i)} = 0,$$
where $j=1$ or $j=2$ depending on the form of $\psi(i)$.


In order to derive the least squares approach for the estimation of parameters, it is necessary to compute the MTBF. Since
$$f(X_i\,|\,\alpha,\psi(i)) = \frac{\alpha[\psi(i)]^{\alpha}}{[X_i+\psi(i)]^{\alpha+1}},$$
the MTBF becomes:
$$E\{X_i\} = \int_{0}^{\infty}\frac{X_i\,\alpha[\psi(i)]^{\alpha}}{[X_i+\psi(i)]^{\alpha+1}}\,dX_i = \frac{\psi(i)}{\alpha-1},$$
if $\alpha>1$.
The LSEs are obtained by the minimization of:
$$S(\alpha,\beta_0,\beta_1) = \sum_{i=1}^{n}\left[X_i-E\{X_i\}\right]^2.$$

In the case of the linear form for $\psi(i)$, the following system of equations must be solved:
$$\frac{\partial S}{\partial\alpha}: \quad \sum_{i=1}^{n}X_i\psi(i)-\frac{1}{\alpha-1}\sum_{i=1}^{n}\psi^{2}(i) = 0,$$
$$\frac{\partial S}{\partial\beta_0}: \quad \sum_{i=1}^{n}X_i-\frac{n\beta_0}{\alpha-1}-\frac{\beta_1n(n+1)}{2(\alpha-1)} = 0,$$
$$\frac{\partial S}{\partial\beta_1}: \quad \sum_{i=1}^{n}iX_i-\frac{n(n+1)\beta_0}{2(\alpha-1)}-\frac{\beta_1\sum_{i=1}^{n}i^2}{\alpha-1} = 0.$$

Comments The model tries to account for error generation in the
corrective process. With each error correction, a
sequence of programs is generated, each being
obtained from its predecessor by attempting to
correct a bug. A good programmer should have a
more rapidly increasing function  than a poorer
programmer.
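Once $\alpha$, $\beta_0$, $\beta_1$ are estimated, the predicted MTBF is a one-line computation; a small sketch (hypothetical parameter values, both linear and quadratic $\psi$):

```python
def lv_expected_tbf(alpha, beta0, beta1, i, quadratic=False):
    """E{X_i} = psi(i)/(alpha - 1) for the Littlewood-Verrall model (alpha > 1),
    with linear psi(i) = beta0 + beta1*i or quadratic psi(i) = beta0 + beta1*i**2."""
    psi_i = beta0 + beta1 * (i * i if quadratic else i)
    return psi_i / (alpha - 1.0)
```

Because $\psi$ is increasing in $i$, the predicted times between failures grow with each correction, which is the model's expression of reliability growth.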

5.5.3 Thompson and Chelson's Bayesian Reliability Model

Assumptions (a) The software is not corrected during a testing


cycle - only at the completion of a cycle and before
the start of a new one.
(b) The software is operated in a similar manner as
the anticipated operational profile.
(c) The faults are assumed to occur at some unknown constant rate $\lambda$. The total number of faults observed in a testing cycle of length $T$ follows a Poisson distribution with parameter $\lambda T$:
$$f(f_i\,|\,\lambda) = \frac{e^{-\lambda T}(\lambda T)^{f_i}}{f_i!}, \quad f_i=0,1,\ldots$$
(d) If $p$ denotes the probability that the software contains one or more errors, it can be assumed that $p$ has a prior distribution that is beta with parameters $a+1$ and $b+1$, i.e.,
$$g(p) = \frac{\Gamma(a+b+2)}{\Gamma(a+1)\Gamma(b+1)}p^{a}(1-p)^{b}, \quad 0\le p\le 1,\ a,b>-1.$$
(e) The uncertainty about the parameter $\lambda$ is expressed as a prior distribution for $\lambda$: gamma with parameters $T_0$ and $f_0+1$:
$$h(\lambda) = \frac{T_0(\lambda T_0)^{f_0}}{\Gamma(f_0+1)}e^{-\lambda T_0}.$$

Data requirements (a) The number of software errors discovered in each period of testing ($f_i$, $i=1,2,\ldots$).
(b) The length of testing time for each period ($T_i$, $i=1,2,\ldots$).
(c) For the total number of software packages that have been released, the number found to contain errors. These numbers are used in determining the prior distribution for $p$ at any stage.
Estimates Case 1: If $p=1$ (i.e., it is known before testing begins that the software contains bugs) and $f_i$ errors are observed in testing time $T_i$, then the posterior distributions for $\lambda$ (the failure rate) and the software reliability $R$ are:
$$h(\lambda\,|\,f_i) = \frac{(T_i+T_0)^{f_i+f_0+1}\lambda^{f_i+f_0}e^{-\lambda(T_i+T_0)}}{\Gamma(f_i+f_0+1)}, \quad \lambda\ge 0,$$
and
$$f(R\,|\,f_i) = \left(\frac{T_i+T_0}{t}\right)^{f_0+f_i+1}\left(\ln\frac{1}{R}\right)^{f_0+f_i}\frac{R^{\frac{T_i+T_0}{t}-1}}{\Gamma(f_0+f_i+1)},$$
where $0\le R\le 1$ and $t$ is the postulated mission time for the software.
Case 2: If no errors are observed in testing cycle $i$ ($f_i=0$), then the posterior cumulative distribution functions for $\lambda$ (the failure rate) and the software reliability $R$ are:
$$H(\lambda\,|\,f_i=0) = \frac{b+1}{a+b+2} + \frac{a+1}{a+b+2}\cdot\frac{(T_i+T_0)^{f_0+1}}{\Gamma(f_0+1)}\int_{0}^{\lambda}x^{f_0}e^{-x(T_i+T_0)}\,dx,$$
and
$$F(R\,|\,f_i=0,p) = \begin{cases}\displaystyle 1-p\,\frac{\left(\frac{T_i+T_0}{t}\right)^{f_0+1}}{\Gamma(f_0+1)}\int_{R}^{1}\left(\ln\frac{1}{x}\right)^{f_0}x^{\frac{T_i+T_0}{t}-1}dx, & 0\le R<1,\\ 1, & R=1.\end{cases}$$
If we assume that a squared error loss function is used to estimate the parameters $\lambda$ (the failure rate) and the software reliability $R$, the Bayes estimates are the means of the respective posterior distributions.
If $f_i=0$, then
$$\hat{\lambda} = \frac{a+1}{a+b+2}\cdot\frac{f_0+1}{T_i+T_0},$$
and
$$\hat{R} = \frac{b+1}{a+b+2} + \frac{a+1}{a+b+2}\left(\frac{T_i+T_0}{T_i+T_0+t}\right)^{f_0+1}.$$
If $f_i>0$, then
$$\hat{\lambda} = \frac{f_i+f_0+1}{T_i+T_0},$$
and
$$\hat{R} = \left(\frac{T_i+T_0}{T_i+T_0+t}\right)^{f_i+f_0+1}.$$
Comments 1. The model objective is to obtain the total system reliability. Not only software failures are considered, but also hardware failures and unknown or ambiguous source-related malfunctions.
2. The parameter $a$ is thought of as the number of previously delivered software packages with errors among a total of $a+b$ delivered.
3. The parameter $f_0$ can be thought of as the number of software-related system malfunctions in previous testing of total duration $T_0$.
4. If $i-1$ test periods have elapsed, then
$$T_0 = \sum_{j=1}^{i-1}T_j, \qquad f_0 = \sum_{j=1}^{i-1}f_j.$$
6 Software Reliability Model Selection

The problem of proper model selection can be summarized briefly as follows. The set of data available to the user will be a sequence of times between successive failures $t_1, t_2, \ldots, t_{i-1}$. These observed times are the realization of random variables $T_1, T_2, \ldots, T_{i-1}$. A software reliability model is then applied to use the observed set of data $t_1, t_2, \ldots, t_{i-1}$ to predict the future unobserved $T_i, T_{i+1}, \ldots$. The problem to be solved by a software reliability model is therefore a prediction problem: it involves the future via the unobserved random variable $T_i$. Using goodness of fit alone will not solve the problem; the reason is that models are usually too complicated for a traditional "goodness-of-fit" approach to be used. In the following, some specific approaches will be presented.
The software reliability prediction problem can be solved,
according to (Fenton & Pfleeger, 1996), taking into account three
elements:
(1) a prediction model that gives a complete probability
specification of the stochastic process,
(2) an inference procedure for the unknown parameters of
the model based on collected data, and
(3) a prediction procedure that combines the model and
inference procedure to predict the future behaviour
related to the failures appearance.
The software reliability prediction and estimation tasks comprise
four fundamental conceptual tasks:

1) Establishment of reliability goals for software elements
through modeling and allocation
2) Estimation of software design reliability through the software
reliability prediction process
3) Software reliability growth through operational profile
testing, and
4) Evaluation of the achieved software reliability through
Formal Qualification Testing.
Software reliability prediction is performed at each phase of the
software development process up to software system test.
Product and process metrics applicable to the goals of the project
are collected and used to predict the failure rate that the software will
exhibit at the beginning of system test and at deployment. The failure
rate prediction is then used to estimate the duration of, and assurance
resources required for, the reliability growth testing, which will be
needed to achieve the allocated software reliability.
Both non-quantitative and quantitative criteria are used to select
a software reliability model to make a decision.
The set of non-quantitative criteria contains: the model validity,
the ease of measurements, quality of assumptions, applicability,
simplicity, and insensitivity to noise. As a general definition, a model is
valid if it is measuring/predicting what it is intended to
measure/predict, and nothing else.
The set of quantitative criteria contains, but is not limited to, the
self-consistency, the goodness of fit, the relative accuracy, the bias, and
the bias trend.
Analysis of a model's predictive quality can help the user decide which model(s) to use. The simplest question a software reliability engineer/manager can ask is: How reliable is the software at this moment? The time to the next failure, $T_i$, is usually predicted using the observed times of failure $t_1, t_2, \ldots, t_{i-1}$.
In general, predictions of $T_i$ can be made using the observed times to failure $t_1, t_2, \ldots, t_{i-k}$, $k>0$.
The results of predictions made for different values of $k$ can then be compared. If a model produces self-consistent results for differing values of $k$, this indicates that its use is appropriate for the data on which the particular predictions were made.

6.1 Software Reliability Growth

System reliability growth consists of a combination of hardware


and software reliability growth. The hardware and software growth
estimates are combined, using the system reliability model, and used to
estimate system reliability growth over time.
Rome Laboratory TR-92- , "Software Reliability Measurement and Test Integration Techniques", contains a method for predicting software reliability. This method requires that a fault density parameter, in terms of faults per KSLOC, is predicted. The fault density parameter can then be translated into a failure rate by using a default table, by collecting data, or by using historical data.
The total estimated number of inherent defects at delivery can
also be predicted by multiplying the fault density by the total number of
KSLOC.
However, the software engineers use appropriate methodologies
to decrease the possibility of making errors, or introducing bugs. In the

following some statistical tools useful to judge the increasing/
decreasing reliability of software systems are presented.

6.1.1 The Laplace Test

To determine which reliability growth model is better to use, trend tests have proved very useful. Two trend tests will be described: the arithmetic mean test and the Laplace test.
Let $\tau(i)$ be the arithmetic mean of the observed inter-failure times $t_j$, $j=1,2,\ldots,i$. An increasing sequence of $\tau(i)$ indicates reliability growth, and a decreasing sequence indicates reliability decay.
The Laplace test analyses the trend of the failures. The expression for the Laplace test factor $u(k)$ is:
$$u(k) = \sqrt{12(k-1)}\,\frac{c-s_k/2}{s_k}, \quad k=2,\ldots,n,$$
where
$$c = \frac{1}{k-1}\sum_{i=1}^{k-1}s_i, \qquad s_i = \sum_{j=1}^{i}t_j,$$
$n$ is the total number of failures, $t_k$ is the inter-failure time between failures $k-1$ and $k$, and $s_i$ is the time of occurrence of failure $i$.
One can extract two types of information from such a graph: local and global changes.
When the values are negative (resp. positive), the reliability is globally increasing (resp. decreasing). On the other side, when the values are increasing (resp. decreasing), we have local variations of the reliability.
 If $u(k)$ is approximately equal to zero, it indicates a lack of trend;
 If $u(k)$ is greater than zero, the TBFs are decreasing (reliability decay);
 If $u(k)$ is less than zero, the TBFs are increasing (reliability growth).
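The Laplace factor is easy to compute from a list of inter-failure times; a sketch with hypothetical data, where steadily increasing TBFs drive $u(k)$ negative:

```python
import math

def laplace_factor(t, k):
    """Laplace test factor u(k) from inter-failure times t_1..t_n (k in 2..n)."""
    s, acc = [], 0.0
    for tj in t:                 # failure occurrence times s_i = t_1 + ... + t_i
        acc += tj
        s.append(acc)
    c = sum(s[:k - 1]) / (k - 1)
    return math.sqrt(12.0 * (k - 1)) * (c - s[k - 1] / 2.0) / s[k - 1]
```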

Fig. 6.1. Local and global changes (Karama Kanoun:

http://resist.isti.cnr.it/files/corsi/courseware_slides/software_reli
ability_eng.pdf)

6.1.2 TTT – Total Time on Test

Another well-known trend plot is the total time on test (TTT)


plot, initially developed by (Barlow and Campo, 1975) for non-
repairable systems.
Suppose that $n$ identical components are tested simultaneously up to a certain time $t>0$. The random variables $T_1, T_2, \ldots, T_n$, with common cumulative distribution function $F(t)$, denote the lifetimes of the $n$ components. Therefore, the order statistics $T_{(1)}\le T_{(2)}\le\cdots\le T_{(n)}$ represent the failure times of the components in the order in which they occurred. Suppose now that $i$ of the components have failed in the interval $(0, t]$. Thus, the total lifetime of these $i$ components is given by $\sum_{j=1}^{i}T_{(j)}$. Since the remaining $n-i$ components have survived the interval $(0, t]$, their total lifetime is given by $(n-i)t$. Therefore, the total time on test at time $t$, denoted by $\tau(t)$, is computed as the total observed lifetime of the $n$ components at time $t$:
$$\tau(t) = \sum_{j=1}^{i}T_{(j)} + (n-i)t,$$
where $i$ is such that $T_{(i)}\le t\le T_{(i+1)}$, $T_{(0)}=0$, and $T_{(n+1)}=\infty$.
The total time on test at the $i$th failure time, $T_{(i)}$, is given by
$$\tau(T_{(i)}) = \sum_{j=1}^{i}T_{(j)} + (n-i)T_{(i)},$$
for all $i=1,2,\ldots,n$.
The TTT plot is defined as a plot of the ordered pairs
$$\left(\frac{i}{n}, \frac{\tau(T_{(i)})}{\tau(T_{(n)})}\right),$$
for all $i=1,2,\ldots,n$. The second coordinate, $\tau(T_{(i)})/\tau(T_{(n)})$, is called the scaled TTT statistic.
It was proved that, for any $0\le u\le 1$, as $n\to\infty$, the following relation holds:
$$\frac{\tau(T_{(\lceil nu\rceil)})}{\tau(T_{(n)})}\to\varphi(u)$$
almost surely, where $\lceil\cdot\rceil$ denotes the ceiling function, and
$$\varphi(u) = \frac{1}{\mu}\int_{0}^{F^{-1}(u)}(1-F(t))\,dt,$$
with $\mu$ being the mean value of the distribution $F(t)$.

Fig. 6.2. TTT-plot (s1.dat)

The following statements are valid and useful to a software reliability


engineer:

189
Copyright: 2012 by Florin POPENTIU-VLADICESCU
 If the distribution F(t) of the lifetimes is exponential, then
\varphi(u) = u; hence, in the case of exponential failure times, the
TTT plot converges to the diagonal of the unit square.
 A convex TTT plot indicates reliability growth, and a concave
one indicates reliability decrease.
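As a quick illustration of the formulas above, the scaled TTT statistics can be computed directly from a sample of failure times. The following Python sketch uses invented lifetimes and is not tied to any particular data set:

```python
def scaled_ttt(times):
    """Scaled TTT statistics tau(T_(i)) / tau(T_(n)) for a lifetime sample.

    times: an (unordered) sample of n failure times.
    Returns one value per order statistic, the last being 1.0."""
    t = sorted(times)
    n = len(t)
    # tau(T_(i)) = sum_{j <= i} T_(j) + (n - i) * T_(i)
    taus = [sum(t[:i]) + (n - i) * t[i - 1] for i in range(1, n + 1)]
    total = taus[-1]          # tau(T_(n)) is the total time on test
    return [x / total for x in taus]

# Invented lifetimes, for illustration only:
pts = scaled_ttt([3.0, 1.0, 2.0])
```

The plot pairs are then (i/n, pts[i-1]); a visual check for convexity or concavity gives the trend indication described above.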

6.1.3 The MIL-HDBK-189 Test

The MIL-HDBK-189 trend test is a conditional statistical test
based on the power-law process:

\lambda(t) = \frac{b}{a} \left( \frac{t}{a} \right)^{b-1},
where a and b are the model parameters, both positive. It is easy to
observe that if b < 1 then \lambda(t) decreases, meaning that the failures
tend to occur less frequently (and the system shows reliability growth).
If b > 1, then the system shows reliability decrease. When b = 1, the
homogeneous Poisson process case is obtained.
Considering the event {T_n = t_n}, the MLE of b for a failure-truncated
power-law process is given by

\hat{b} = \frac{n}{\sum_{i=1}^{n-1} \log(t_n / T_i)}.
Hence, under the null hypothesis b = 1, it follows that 2n/\hat{b} \sim \chi^2(2(n-1)).
If the alternative hypothesis is two-sided, then the null hypothesis is
rejected if

\hat{b} \le \frac{2n}{\chi^2_{1-\alpha/2}(2(n-1))}

or

\hat{b} \ge \frac{2n}{\chi^2_{\alpha/2}(2(n-1))},

where \chi^2_{\alpha}(\nu) denotes the \alpha-percentile of the chi-squared
distribution with \nu degrees of freedom.

For small values of \hat{b} (equivalently, large values of the statistic
2n/\hat{b}) the null hypothesis is rejected in favor of reliability growth.
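A minimal sketch of the test in Python (the failure times are invented, and the chi-squared percentiles must still be taken from a table or a statistics library):

```python
import math

def mil_hdbk_189(failure_times):
    """MLE of b for a failure-truncated power-law process, together with
    the statistic 2n / b_hat, which is chi-squared with 2(n - 1) degrees
    of freedom under the null hypothesis b = 1."""
    t = sorted(failure_times)
    n = len(t)
    tn = t[-1]
    s = sum(math.log(tn / ti) for ti in t[:-1])  # i = 1, ..., n - 1
    b_hat = n / s
    return b_hat, 2.0 * n / b_hat

# Failure times that stretch out over time suggest reliability growth:
b_hat, stat = mil_hdbk_189([10.0, 30.0, 70.0, 150.0, 300.0, 600.0])
```

Here b_hat comes out below 1; the decision itself requires comparing stat against the chi-squared percentiles with 2(n - 1) = 10 degrees of freedom.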

6.2 Criteria for model evaluation

An evaluation of SR models in support of a given project should


take into consideration the following aspects:
 Future predictive accuracy: Accuracy of the model in making
predictions beyond the time span of the collected data (i.e.,
comparison of future predictions with future observed data).
A decision is made as to whether the accuracy is satisfactory
for the application using a goodness-of-fit test.
 Historical predictive validity (i.e., comparison of retrospective
predictions against past observed data). This means that the
model with the least difference between the retrospective
prediction and the actual data is considered the best fit,
measured by the criterion of minimum relative error. One
approach is to determine whether the fitted model retrospectively
predicts the observed data within a reasonable relative error,
computed as ((actual − predicted)/predicted) × 100%.
 Generality: Ability of a model to make accurate predictions in
a variety of operational settings (e.g., real-time, Web
applications).

 Insensitivity to noise: The ability of the model to produce
accurate results in spite of errors in input data and parameter
estimates.
For assessing model fit, the following three measures may be
used to compare model predictions with a set of failure data: Accuracy,
Bias, and Trend. One way to measure prediction accuracy is by the
prequential likelihood (PL) function.
A model is considered biased if it predicts values that are
consistently greater than the observed failure data, or consistently less
than the observed data. To measure the amount of a model's bias, we
can compute the maximum vertical distance (i.e., the K-S distance)
between cumulative distribution functions of the observed and
predicted data. Trend is measured by whether a prediction quantity is
monotonically increasing or decreasing over a series of time intervals.

6.3 Analysis of predictive quality

System reliability predictions are used to assess overall design


progress toward achieving the specified system reliability. Reliability
predictions and estimates for the various system hardware elements
and software elements are combined using a system model. The
resultant reliability calculation is then compared against the specified
system requirement to determine whether the current system design
achieves the specified reliability, as (Fenton & Pfleeger, 1996)
recommended.

There is great variation in the accuracy of software reliability
prediction models. Two main types of inaccuracy have been detected:
predictions can be biased and/or noisy. Bias and noise should be detected
before making a decision. In the following, basic mechanisms for quality
evaluation are given.

6.3.1 The u-plot

Given a predictor \hat{F}_i(t) that estimates the probability that the
time to the next failure is less than t, let us consider the sequence

u_i = \hat{F}_i(t_i),

where each u_i is a probability integral transform of the observed t_i using
the previously calculated predictor \hat{F}_i based upon t_1, t_2, …, t_{i-1}.

Fig. 6.3. The u-plot

If each \hat{F}_i were identical to the true F_i, then the u_i would be
realizations of independent random variables with a uniform
distribution in [0, 1]. The simplest question to answer is whether their
distribution is close to U(0, 1), by plotting their sample cdf: we call this
the u-plot.
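As a sketch, the u's and the maximum vertical distance of their sample cdf from the uniform can be computed as follows. The predictor used here is a deliberately naive one (exponential, with mean fitted on the past observations), chosen only to keep the example self-contained:

```python
import math

def u_plot_ks(times):
    """Compute u_i = F_i(t_i) for a naive one-step-ahead predictor
    (exponential, mean fitted on t_1 .. t_{i-1}) and return the u's
    together with the Kolmogorov distance of their sample cdf from U(0, 1)."""
    us = []
    for i in range(1, len(times)):
        mean = sum(times[:i]) / i          # uses the past only
        us.append(1.0 - math.exp(-times[i] / mean))
    us.sort()
    n = len(us)
    # Max vertical distance between the empirical cdf of the u's and the diagonal
    ks = max(max(abs((k + 1) / n - u), abs(u - k / n))
             for k, u in enumerate(us))
    return us, ks

us, ks = u_plot_ks([5.0, 9.0, 7.0, 12.0, 8.0, 15.0])
```

A small distance suggests unbiased predictions; this is the K-S distance mentioned in Section 6.2.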

6.3.2 The y-plot

One problem with the u-plot is that it measures only one type of
departure of predictors from reality, the bias. For example, if a prediction
system applied to a set of data shows optimism in the early predictions
and pessimism in the later predictions, then a small distance will be
observed. It seems necessary, then, to examine the u_i's for trend (i.e., the
trend bias). To do that, the y-plot method first makes the following
transformation: x_i = -\ln(1 - u_i).
If the {u_i} are a true realization of i.i.d. random variables with
U(0, 1) distribution, then the {x_i} will be a realization of i.i.d. unit
exponential random variables. Thus, if the {x_i} are plotted on the
horizontal axis, then they should look like a realization of a
homogeneous Poisson process. The alternative hypothesis (that there is
trend in the u_i's) will show itself as a non-constant rate for this process.
If the values {x_i} are normalized onto (0, 1) and the obtained
values {y_i} are plotted, with

y_i = \frac{\sum_{j=1}^{i} x_j}{\sum_{j=1}^{m} x_j},

then a plot that shows the trend of the predictions of the model will be
obtained. A departure from the line of unit slope indicates that the
prediction system is not capturing the "trend" in the failure data (i.e.,
reliability growth).
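A sketch of the y-plot transformation in Python (the u values are invented):

```python
import math

def y_plot(us):
    """Map u's to y-plot ordinates: x_i = -ln(1 - u_i), then normalize
    the cumulative sums of the x's onto (0, 1)."""
    xs = [-math.log(1.0 - u) for u in us]
    total = sum(xs)
    ys, acc = [], 0.0
    for x in xs:
        acc += x
        ys.append(acc / total)
    return ys

ys = y_plot([0.2, 0.5, 0.4, 0.7])
```

Plotting i/m against ys and comparing with the line of unit slope gives the trend check described above.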

Fig. 6.4. The y-Plot for the LV and JM models (Littlewood, 1981)

6.3.3 Measures of noise

Both the u-plot and the y-plot can be used to measure the bias. In order
to measure the noise, some statistics can be used: the Braun statistic (B),
the median variability (MV), and the rate variability (RV).

Let \hat{E}(T_i) be the estimated mean of T_i, and n the number of data
points. The Braun statistic, denoted by B, is given by

B = \frac{\sum_{i=1}^{n} \left( t_i - \hat{E}(T_i) \right)^2 / (n-1)}
         {\sum_{i=1}^{n} \left( t_i - \bar{t} \right)^2 / (n-2)}.
A small value of B indicates better smoothing of software


reliability.
The median variability, MV, is given by

MV = \sum_{i} \frac{|m_i - m_{i-1}|}{m_{i-1}},
with mi being the predicted median of Ti. This measure indicates which
model has higher variability in its predictions (how noisy are the
obtained predictions). A small value of MV indicates better smoothing of
the predictions.
The rate variability (RV) measure is defined by

RV = \sum_{i=1}^{n} \frac{|r_i - r_{i-1}|}{r_{i-1}},
where ri , the rate of occurrence of failures (ROCOF), is calculated
immediately after fixing a fault.
For all measures, the model comparison proceeds only for the
same data set.
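The three measures are easy to compute once the predictions are available. A Python sketch follows (the normalization constants of B follow the formula above; all data are invented, and MV and RV share one helper since they have the same form):

```python
def braun(actual, predicted_means):
    """Braun statistic B: squared prediction error normalized by the
    sample variability of the observations. Smaller is better."""
    n = len(actual)
    t_bar = sum(actual) / n
    num = sum((t - e) ** 2 for t, e in zip(actual, predicted_means)) / (n - 1)
    den = sum((t - t_bar) ** 2 for t in actual) / (n - 2)
    return num / den

def variability(values):
    """Common form of MV and RV: sum of |v_i - v_{i-1}| / v_{i-1} over
    successive predicted medians (MV) or ROCOF values (RV)."""
    return sum(abs(v - p) / p for p, v in zip(values, values[1:]))

B = braun([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 3.0, 5.0])
MV = variability([2.0, 1.0, 1.5])
```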

6.3.4 Prequential Likelihood

The pdf \hat{f}_i(t) for T_i, based on the observations t_1, t_2, …, t_{i-1},
is

\hat{f}_i(t) = d\hat{F}_i(t)/dt.

For one-step-ahead predictions of T_{j+1}, T_{j+2}, …, T_{j+n}, the
prequential likelihood is

PL_n = \prod_{i=j+1}^{j+n} \hat{f}_i(t_i).

The PL value can be used to determine the accuracy of a software
reliability model, as shown in (Littlewood, 1981). This value is usually very
small; therefore, its logarithmic value (which is always negative) is used
for comparison. The more negative it is, the more inaccurate the
software reliability model's predictions. This measure can be considered
a general procedure for choosing the best prediction system for a
given set of data, so in case of a tie between two models with
respect to all other quality measures, the one with higher accuracy will
be preferable.
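For predictive distributions with an exponential form, the log prequential likelihood reduces to a one-line sum. A sketch (the rates are hypothetical one-step-ahead estimates, each assumed to be computed before the corresponding observation):

```python
import math

def log_pl(times, rates):
    """Log prequential likelihood for one-step-ahead exponential
    predictions f_i(t) = r_i * exp(-r_i * t); rates[i] must be estimated
    from the data observed before times[i]."""
    return sum(math.log(r) - r * t for t, r in zip(times, rates))

# Less negative log-PL means more accurate predictions on the same data:
log_pl_a = log_pl([2.0, 3.0], [0.5, 0.4])   # model A's predictive rates
log_pl_b = log_pl([2.0, 3.0], [0.1, 0.1])   # model B's predictive rates
```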

Fig. 6.5. How prequential likelihood works (Fenton & Pfleeger, 1996)

Two prediction systems, A and B, can be evaluated by computing
the prequential likelihood ratio (PL_A/PL_B):

PLR_n = \frac{\prod_{i=j+1}^{j+n} \hat{f}_i^A(t_i)}
             {\prod_{i=j+1}^{j+n} \hat{f}_i^B(t_i)}.
If PLR_n approaches infinity as n approaches infinity, B is discarded in
favor of A. In another interpretation, if the ratio PL_A/PL_B increases
consistently as n increases, then A is producing more accurate
predictions than B.

To conclude, if method A has been producing better predictions


than method B in the recent past, then it seems sensible to prefer A for
current predictions, according to (Fenton & Pfleeger, 1996).
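Since the raw PL values underflow quickly, the ratio is best computed from the log-likelihoods. A minimal sketch (the log-PL values are hypothetical):

```python
import math

def plr(log_pl_a, log_pl_b):
    """Prequential likelihood ratio PL_A / PL_B, computed from the logs
    to avoid underflow of the tiny raw PL values."""
    return math.exp(log_pl_a - log_pl_b)

ratio = plr(-120.0, -125.0)   # hypothetical log-PL values; ratio > 1 favors A
```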

6.4 Model selection by combination of models

Till now, the recommendation was: "pick the model which has
performed best in the past, and use it for the next prediction". This
seems unduly rejecting of models which are only slightly inferior to the
"best". Why not combine predictions from different models in some
optimal way, according to the "pooling of expert opinion"?
The idea behind this method is that, rather than predicting
software reliability by using only one model, a meta predictor could be
formed to take a linear combination of two (or more) predictions, with
weights chosen in some optimal way.

For two candidate models, A and B, one could take

\hat{F}_j(t) = w_1 \hat{F}_j^A(t) + w_2 \hat{F}_j^B(t), \quad w_1 + w_2 = 1,

as a meta-predictor; similarly for more than two.

A heuristic algorithm for this method consists of the following
activities (steps):
1. Identify the candidate set of software reliability models to be
used.
2. Select the models that tend to cancel out in their biased
predictions (if any).
3. Selected models are equally weighted to form a linear
combination model.
4. The arithmetic average (mean value) – ELC – of the
predictions resulting from the selected models, or their
middle prediction value (median value) – MLC, is used as the
prediction of the linear combination (LC) of models. Other
ideas can be used, as described below.
The following ideas can be used in the fourth step:
 ELC - Equally-Weighted LC Model (Statically-Weighted model)
 MLC - Median-Oriented LC Model (Statically-
Determined/Dynamically-Assigned)
 ULC - Unequally-Weighted LC Model (Statically-
Determined/Dynamically-Assigned)
 DLC - Dynamically-Weighted LC Model (Dynamically
Determined and Assigned)
Let us consider the case of three models identified by P, M, and O,
i.e., the pessimistic, median, and optimistic prediction models,
respectively. For the ELC case, the weights can be equal to 1/3. The
model is 0P+1M+0O for the MLC case, and 1/6P+4/6M+1/6O is an example
of a ULC model. For DLC models, weighting is determined by dynamically
calculating the "posterior prequential likelihood", or another measure, of
each model as a meta-predictor. Changes can be observed over windows
of varying size and type. Fixed or sliding windows can be used.
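A sketch of the weighting schemes for one prediction step (the component predictions are invented; a real DLC scheme would recompute the weights from the prequential likelihood at every step):

```python
def combine(preds, weights=None):
    """Linear-combination meta-prediction for one step.

    preds:   component model predictions (e.g. predicted medians of T_i).
    weights: default is equal weighting (ELC); (0, 1, 0) over the ordered
             predictions gives MLC, and (1/6, 4/6, 1/6) the ULC example."""
    if weights is None:
        weights = [1.0 / len(preds)] * len(preds)
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(w * p for w, p in zip(weights, preds))

ordered = sorted([120.0, 95.0, 150.0])   # pessimistic, median, optimistic
elc = combine(ordered)
mlc = combine(ordered, [0.0, 1.0, 0.0])
ulc = combine(ordered, [1/6, 4/6, 1/6])
```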

Fig. 6.6. Windows-based strategy for weighting determination – DLC case

For the strategy described in Fig. 6.6, the weights can vary over
time. For a prediction at stage i, i.e., of t_i, let {w_k} take the values
that maximise the PL of the combined predictor over the previous
predictions, i.e.,

\max \prod_{j=1}^{i-1} \hat{F}_j(t_j)
  = \max \prod_{j=1}^{i-1} \left( w_1 \hat{F}_j^A(t_j) + w_2 \hat{F}_j^B(t_j) \right),
  \quad w_1 + w_2 = 1,

in the case of two components.
Other schemes related to project criteria and user judgment, or
hybrid combinations, can be used. Good practice recommends those
models whose predictive validity has been observed in previous
investigations, and which represent different categories of models. It is
important to select the models after checking the limit conditions
for the model to be used, in order to avoid a meaningless consumption
of time.

6.5 Recalibrating Software Reliability Models

This method was introduced by Brocklehurst, Chan, Littlewood &


Snell (NASA-CR-166407), and can be summarized as follows.
The relation between the true distribution F_i(t) of the random
variable T_i and the predicted one, \hat{F}_i(t), can be represented through
a relation function G_i as F_i(t) = G_i(\hat{F}_i(t)), where G_i is only a
slowly changing function of i. Since G_i is not known, it will be
approximated with an estimate G_i^*, which leads to a new prediction

F_i^*(t) = G_i^*(\hat{F}_i(t)).

This technique recalibrates the raw model output \hat{F}_i(t) according to
the accuracy of past predictions.
The mentioned authors use the u-plot and two models for the
transformation G_i: a polygon and a spline, respectively.
The polygonal approach is described in the following:
1. Use the y-Plot to make sure that the error in previous
predictions is approximately stationary.
2. Construct the polygon G* by joining up the steps of the u-Plot
formed by predictions before Ti (Fig. 6.7).

3. Get the raw prediction \hat{F}_i(t) by using the basic prediction
system.
4. Recalibrate the raw predictions in order to obtain Fi*(t).

The procedure described above produces a genuine prediction


system: at each stage only past observations are used to make
predictions about the unobserved future failure behaviour.
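A sketch of the polygonal recalibration in Python. The exact vertex heights of the polygon are an assumption here (equally spaced, as in an empirical cdf); Brocklehurst et al. give the precise construction:

```python
def recalibrate(raw_p, past_us):
    """Recalibrated probability F*(t) = G*(F_hat(t)), with G* the polygon
    joining the steps of the u-plot built from past u values.

    raw_p:   raw model prediction F_hat(t), a value in [0, 1].
    past_us: u's obtained from previous one-step-ahead predictions."""
    us = sorted(past_us)
    n = len(us)
    xs = [0.0] + us + [1.0]                   # polygon x-coordinates
    ys = [k / (n + 1) for k in range(n + 2)]  # equally spaced heights (assumed)
    # Linear interpolation along the polygon segment containing raw_p
    for x0, y0, x1, y1 in zip(xs, ys, xs[1:], ys[1:]):
        if x0 <= raw_p <= x1:
            if x1 == x0:
                return y1
            return y0 + (raw_p - x0) * (y1 - y0) / (x1 - x0)
    return 1.0

p_star = recalibrate(0.5, [0.1, 0.2, 0.3, 0.4])
```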

Fig. 6.7. Drawing the polygon G_i^* (Brocklehurst et al., 1990)

Since G_i^* is a polygon, its derivative g_i^* is discontinuous. This
means that the recalibrated pdf f_i^* is also discontinuous.
According to Brocklehurst et al. (1990), this discontinuity
"generally causes PL to report badly on the predictive accuracy of a
recalibrated model in competition with the raw version […] It is
therefore perfectly possible for PL to reject a recalibrated prediction
system in favour of the raw version, even when the recalibrated
(probability) predictions are the most accurate".
To eliminate the discontinuity problem, the usage of polynomial
splines is proposed.

7 SOFTWARE RELIABILITY TOOLS

According to the study by (Chen et al., TENCON06), there
are specific software tools, called Computer-Aided Software Reliability
Engineering tools, many of them dedicated to the estimation of software
reliability, such as:
 ROBUST(Reliability of Basic and Ultrareliable Software sysTem).
According to the authors (official website for download:
http://www.cs.colostate.edu/testing/robust/), "ROBUST is an
integrated tool intended to help managers make intelligent
decisions about the reliability of software being developed, and
to assist researchers in further investigating software reliability."
ROBUST allows users to model data using traditional SRGMs or
Malaiya et al's coverage based model, and supports numerous
enhancement techniques to make these models more accurate.
Because reliability estimates and models often have to be made
before testing starts or has accumulated enough failure data for
accurate results, ROBUST also supports the use of static methods
for early defect density estimation.
 Frestimate (http://www.softrel.com/prod01.htm)
 TERSE (The TERSE tool [Dept. of Comput. Sci., Purdue Univ., West
Lafayette] consists of 6 components, namely, a user interface, a
flow graph generator, a parameter generator, a testing simulator,
a reliability estimator, and a library with a store of existing
models).

 SREPT (Software Reliability Estimation and Prediction Tool
supporting: the discrete-event simulation, the architecture-based
approach, composite models. Follow the link to download the
SREPT software http://shannon.ee.duke.edu/tools/srept/).
 SRETOOLS (AT&T Software Reliability Engineering Toolkit:
http://www.refis.nl/producten/tool.php?uk=1).
 SRMP (Statistical Modeling and Reliability Program: The SRMP
was developed by the Reliability and Statistical Consultants,
Limited of UK in 1988).
 SoRel (Software Reliability Program) is a Macintosh-based
reliability measurement tool that was developed by LAAS, a lab of
the National Center for Scientific Research in France, in 1991.
SoRel is composed of two parts: The first part allows several
reliability trend tests. These tests allow us to identify whether the
reliability function is increasing or decreasing so that an
appropriate model can be applied. The second part allows
reliability growth model application and contains 4 models. Only
one model is executed at a time.
 CASRE (Computer-Aided Software Reliability Estimation Tool) is
a PC-based tool that was developed in 1993 by the Jet Propulsion
Laboratories to address the ease-of-use issues of other tools.
CASRE has a pull-down, menu-driven user interface and uses the
same model library as the SMERFS tool with the additional
feature of allowing linear combinations of models to create new
ones at the user's discretion. Follow the link to download the
software: http://johnmusa.com/CASRE.htm.

 ESTM (Economic Stop Testing Model Tool) is a Bellcore system
that can be used (in command-line mode) to help decide when to
stop testing a large software system which changes over time.
The package determines the optimal stopping time using a birth-
death model for the introduction of faults, and an economic
model to measure the trade-off between shipping and further
testing.
 SMERFS (Statistical Modeling and Estimation of Software
Reliability Functions) is a program for estimating and predicting
the reliability of software during the testing phase. It uses failure
count information to make these predictions. The following
Interval Data Models are covered: Brooks and Motley's Binomial
Model, Brooks and Motley's Poisson Model, Generalized Poisson
Model (2 versions), Non-homogeneous Poisson Model,
Schneidewind's Model (treatments), and Yamada's S-Shaped
Model. Also, SMERFS supports the following Time Between Failure
Models: Geometric Model, Jelinski/Moranda Model, Littlewood
and Verrall Linear Model, Littlewood and Verrall Quadratic
Model, Musa's Basic Model, Musa's Logarithmic Model, and Non-
homogeneous Poisson Model.
 RGA (Software for Repairable System and Reliability Growth
Analysis: http://rga.reliasoft.com/) is a software tool that allows
the engineers to apply reliability growth models to analyze data
from both developmental testing and fielded repairable systems.
 DACS's GOEL (an automated version of the Goel-Okumoto NHPP
Software Reliability Growth Model) runs on an IBM-PC
or compatible under MS-DOS 2.11 or higher, and is distributed by
DACS (Data & Analysis Center for Software), Rome, NY. This tool
also provides the Kolmogorov-Smirnov statistic and different
estimations about faults and failures.
 CARATS - computer-aided reliability assessment tool for software
based on object-oriented analysis and design (the main
characteristics can be found in the paper available at UCLA:
http://www.seas.ucla.edu/~chienchi/TENCON06_CCChen.pdf).

In the following we describe the characteristics, functionalities


and various ways of usage provided by some popular software: CASRE,
SREPT and SRE tool.

7.1. CASRE

7.1.1 Main characteristics

As mentioned by A. P. Nikora in the CASRE user's guide, the modeling
and model applicability analysis capabilities of CASRE are provided by
the public-domain software reliability package SMERFS (Statistical
Modeling and Estimation of Reliability Functions for Software).
However, the graphical user interface and the combination of models
are specific to CASRE.
CASRE has three windows. The MAIN WINDOW [File (Open, Save,
Save as, Setup printer, Print, and Exit), Edit (Change data type, External
application: Notepad, and Escape to DOS), Filters (Shaping and scaling:
scaling and offset, power, logarithmic, and exponentiation; change time
units: smoothing, subset data), Trend (running average, Laplace test,
undo trend test), Model (Select models, define combinations,
edit/remove models, parameter estimation, select data range, and

predictions), Setup (Add application, Remove application, Remember
most recent files, GOF significance, Laplace test significance), Plot
(create plot window), and Help (Help index, keys help, help for help,
about)] appears when CASRE starts after user invocation, and contains a
text representation of the failure data used as input to the models. The
second window is called the GRAPHIC DISPLAY WINDOW [Plot, Results,
Display, Settings, Copy, and Help], in which graphical representations of
the failure data as well as modeling results are displayed. The third
window is dedicated to the modeling results presented in tabular
format, and is called MODEL RESULTS TABLE [File, Results, and Help].
The items shown in this table include model parameters, model
estimates, model applicability statistics, and the residuals.
Most options are self-explanatory. Let us detail only some
entries, as described in the CASRE user's guide.
The option "Change data type" permits the changing of failure
data from time between failures (TBF) to failure counts (FC) or vice
versa. Some of the models built into CASRE accept only TBF data, while
the others accept only FC data. This menu item is included to allow use
of both types of models on the same set of failure data.
The shaping and scaling transformations allow the shape of a
curve drawn through the failure data to be changed. The filters are
implemented such that the output remains in the first quadrant of the
plot shown in the graphic display window (to prevent filters from
producing physically meaningless results). The following
transformations are possible: linear (multiply by a scaling factor and add
an offset), power (multiply by a scale factor and then raise to a user-
specified power), logarithmic (apply a linear transformation and then
take the natural logarithm), and exponentiation (initially, applies a linear

transformation and then raises the base of natural logarithms to the
result).

Fig. 7.1 CASRE – Filters menu


Filtering operations allow the removing of failure data noise
(smoothing by triangular moving average), the changing of the time
units (seconds, minutes, hours, days, weeks, months, or years) of the
failure data, forming subsets of the failure data, or removing the
previously applied operations.
The Trend menu permits the running of two different trend
tests against the failure history data to check if it exhibits reliability
growth. If the data exhibits reliability growth, software reliability
models can be applied to the data, otherwise software reliability models
should not be applied. Running average approach and Laplace test are
provided by CASRE. The running average of the time between successive
failures for time between failures data, or the running average of
number of failures per interval for failure count data are computed. For

failure count data, this test is available only if the test intervals are of
equal length. For time between failures, if the running average increases
with failure number, this indicates reliability growth. For failure count
data, if the running average decreases with time (fewer failures per test
interval), reliability growth is indicated. The results of the test are
shown in the main window and the graphic display window, as Nikora
describes in the CASRE user's guide. The null hypothesis for the Laplace
test is that occurrences of failures can be described as a homogeneous
Poisson process. The test is applied according to the following rules:
1) If the test statistic decreases with increasing failure number
(test interval number), then the null hypothesis can be
rejected in favor of reliability growth at an appropriate
significance level.
2) If the test statistic increases with failure number (test interval
number), then the null hypothesis can be rejected in favor of
decreasing reliability.

The "Model" menu lets the software reliability engineer select


one or more software reliability models to run, or to define combination
models and store them as part of CASRE's configuration. More options
can be chosen: modeling ranges, how far into the future predictions
should be made, and parameter estimation methods.

7.1.2 SRE by CASRE

The usage of CASRE for a real project is based on the following


procedure:

1. Create a set of failure data (indicated, but it can be created from
CASRE, later by using an external application).
2. Start CASRE.
3. Open a set of failure data, if already created.
4. Use an external application to change the failure data, or to create a
new set of failure data.
5. Apply filters and smoothing operations to the data.
6. Apply trend tests to the failure data to determine whether or not
software reliability models should be applied.
7. Apply models, including combinations of models, to the failure data.
8. View the model outputs.
9. Print failure data and model results.
10. Save failure data and model results to disk, and EXIT.

However, before starting, the SRE engineer may have to set up some
options of CASRE, like:
A. Add external applications and combination model definitions to the
CASRE configuration, or remove selected items from the CASRE
configuration.
B. Set significance levels for goodness-of-fit tests and trend tests.
C. Set the length of the list of most recently used files that is kept in the
File menu.

Only the most important steps from the above procedure are
described in the following: creating files of failure data, and applying
models. The other steps are easy to follow.

For time between failures data, the data file format is as follows:
1. The first line represents the time units for the data file. A choice of
seven keywords is possible: Seconds, Minutes, Hours, Days, Weeks,
Months, and Years. This specifies the time units for the times between
successive failures given in the file.
2. The second line in the file (the first line of actual failure data)
represents the first failure observed; the third line represents the second
failure observed, and so forth.
3. For every line starting with the second (the first failure occurrence)
the following fields are required:
3.1. The first column is the current failure number.
3.2. The second column represents the time that has passed since the
last failure was observed. The values in the second column are
measured in the time units given in the first line of the file.
3.3. The third column indicates the severity of the failure on a scale of 1
to 9. CASRE assigns no meaning to these severity categories; 1 could
represent the least serious type of failure just as easily as it could
represent the most serious type.

For failure count data, the format is as follows:


1. The first line of the file specifies the time units for that file, as for
times between failures data.
2. The following applies to the second and subsequent lines of the data
file:
2.1. The first column gives a sequential test interval number.
2.2. The second column specifies the number of failures that were
observed during a given test interval.

2.3. The third column gives the length of the test interval. Test interval
lengths do not have to be equal. They are measured in the time units
given in the first line.
2.4. The fourth column specifies the severity of the failures that were
observed during a given test interval, on a scale of 1 to 9. If no failures
were seen in an interval, a failure count of 0 and a severity of 0 should
be used. CASRE will show the severity as "N/A". Also, CASRE assigns no
meaning to severities.
2.5. For failure count data, one test interval can occupy more than one
line in the data file, depending on the maximum level of severity of
failures described.
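To make the two formats concrete, here are two hypothetical miniature data files following the description above (the values are invented, and the files have not been verified against CASRE itself). The first is a times-between-failures file (failure number, time since last failure, severity):

```text
Seconds
1 480 3
2 320 2
3 1150 1
```

The second is a failure-counts file (interval number, failure count, interval length, severity):

```text
Hours
1 4 24 2
2 1 24 1
3 0 24 0
```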

Fig. 7.2. Cumulative time between failures

After loading the input file containing failure data, the following
options are available, depending on the file structure (Fig. 7.2, Table
7.1):
Table 7.1. Available plot options

Times between Failures Data        Failure Counts Data
—                                  Failure counts
—                                  Test interval lengths
Failure intensity                  Failure intensity
Times between failures             Times between failures
Cumulative number of failures      Cumulative number of failures
Cumulative number of failures      Cumulative number of failures
  from modeling start point          from modeling start point
Reliability                        Reliability

Once a failure data file has been opened, one or more reliability
models can be applied to it. Before running models, there are three
model controls that must be selected:

1. The type of parameter estimation method to use (according to Table


7.2). To change the parameter estimation method, select the "Model"
menu's "Parameter estimation" item, which will bring up a dialog box
having a radio button beside the name of the most-recently chosen
estimation method.
2. The range of failure data over which the models will be applied.
3. The number of steps into the future for which the models should
make predictions.

Table 7.2. Available parameter estimation methods
Model Name                         Selected Estimation Method
                                   (Maximum Likelihood | Least Squares)
Geometric ML LS
Jelinski-Moranda ML LS
Littlewood-Verrall Linear ML LS
Littlewood-Verrall Quadratic ML LS
Musa Basic ML ML
Musa-Okumoto ML ML
Nonhomogeneous Poisson Process (for ML ML
time between failures)
Generalized Poisson ML LS
Generalized Poisson – user selected ML LS
interval weights
Shick-Wolverton ML LS
Nonhomogeneous Poisson process (for ML LS
failure counts)
Schneidewind ML ML
Schneidewind – Ignore first s intervals ML ML
Schneidewind – Total failures in first s intervals ML ML
Yamada S-Shaped ML ML

7.1.3 CASRE- Project

Let us use the data set S1.dat from the folder CASREV3/DATA,
and do some tasks. The reader is invited to apply the approaches
presented above for more data sets provided by CASRE, and for other
data sets available online.
Firstly, use the plot options to display the time between failures plot for
some software reliability models. Fig. 7.3 shows the results for the
Geometric, Jelinski-Moranda, and Littlewood-Verrall Linear models.

Fig. 7.3. CASRE Plot- Time between failures (s1.dat)

The u-plot-based analysis is presented in Fig. 7.4. The Geometric model
proves to be close to the diagonal of the unit square (unbiased).

Fig. 7.4. CASRE: u-plot (s1.dat)
When analysing the bias trend (y-plot), we found the results shown in
Fig. 7.5.

Fig. 7.5. CASRE: y-plot

The next step consists of defining combinations and comparing them
against the initial models (components). Fig. 7.6 shows the setup
procedure for a combination, and Fig. 7.7 gives the results after running
the models.

Fig. 7.6. CASRE – defining a combination (labeled SLC_20)

Fig. 7.7. CASRE – comparing SLC_20 against its components

Finally the SRE engineer can use four permanent combination
models built into CASRE – these combinations are for time between
failures data only.
These combinations are:

• Dynamically-Weighted Linear Combination (DLC/S/4)


• Equally-Weighted Linear Combination (ELC)
• Median-Weighted Linear Combination (MLC)
• Unequally-Weighted Linear Combination (ULC)

Each of these combinations is made up of the following component


models:

• Nonhomogeneous Poisson Process (NHPP)


• Musa-Okumoto (MO)
• Littlewood-Verrall Quadratic (LVQ)

The weightings for the combination models are as follows:

• ELC = 1/3 NHPP + 1/3 MO + 1/3 LVQ
• MLC = whichever model's prediction is between the other two.
• ULC = 1/6 Pessimistic + 4/6 Median + 1/6 Optimistic

where Pessimistic denotes the model whose predictions of time to next


failure are the lowest, Optimistic denotes the model whose predictions of
time to next failure are the largest, and Median denotes whose
predictions are between the Optimistic and Pessimistic models.
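These weightings can be sketched in Python. The ELC weights follow directly from the name (equal thirds); the 1/6-4/6-1/6 split for ULC is the convention usually reported for CASRE's combination models and should be treated here as an assumption, as are the sample predictions.

```python
def combine_predictions(nhpp, mo, lvq):
    """Combine next-failure-time predictions from the three component
    models (NHPP, MO, LVQ) into the ELC, MLC, and ULC combinations."""
    preds = [nhpp, mo, lvq]
    elc = sum(preds) / 3.0                      # equal 1/3 weights
    pessimistic, median, optimistic = sorted(preds)
    mlc = median                                # the in-between prediction
    ulc = pessimistic / 6.0 + 4.0 * median / 6.0 + optimistic / 6.0
    return elc, mlc, ulc

# Illustrative predictions of time to next failure from the components
elc, mlc, ulc = combine_predictions(412.0, 377.0, 498.0)
```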

7.2 SREPT

7.2.1 Main characteristics

SREPT permits the early prediction of software fault content (by the
Product/process metrics module), analyses using interfailure times and
coverage information from testing (by the ENHPP module, Enhanced
Non-Homogeneous Poisson Process), and the analysis of the effect of
debugging policies (by a dedicated module).
The ENHPP module also allows the input data to consist of the
interfailure times and coverage information from testing. Coverage
information from actual measurements will yield better estimates. The
coverage information may be given either in the form of a function (a
distribution function, implying that it is a function of time t, equal
to 0 at t = 0 and approaching 1 as t approaches infinity), or in the
form of coverage values.
SREPT reports the Kolmogorov distances for each model based
on the u-plot (bias test) and also the sum of square errors (goodness-of-fit
test).
Plots of the mean value function, failure intensity, faults remaining,
conditional reliability, and estimated coverage can be obtained by
choosing Plots from the main menu. Additional information can be found
in the SREPT guide.
The Debugging policies module can be used to study the effect
of non-zero fault removal times as well as different debugging policies
on the observed failure intensity and the number of faults remaining.

The SREPT architecture is shown in Fig. 7.7, according to Ramani
et al (2000).

Fig. 7.7. SREPT Architecture

SREPT supports the following two approaches to software reliability
prediction: the black-box-based and the architecture-based approaches.
More details on the SREPT tools are described in the next subsection
from the SRE point of view.

7.2.2 SRE by SREPT

Black-box-based approaches treat the software as a whole without
considering its internal structure. The following measures can be
obtained to aid in black-box predictions by SREPT:
• Software product/process metrics - these include the number of
lines of code, number of decisions, loops, mean length of
variable names, and other static attributes of the code or
characteristics of the process (for example, the skill level of the
development team, or the criteria to stop testing);
• Test coverage – this is defined as the ratio of the number of
potential fault-sites exercised, or executed, by test cases to the
total number of potential fault-sites under consideration. A
potential fault-site is defined as any structurally or functionally
described program element whose integrity may require verification
and validation via an appropriately designed test. Thus a potential
fault-site could be a statement, a branch, etc.;
• Interfailure times data - this refers to the observed times between
failures when the software is being tested. Interfailure times data
obtained from the testing phase can be used to parameterize the
ENHPP (Enhanced Non-Homogeneous Poisson Process) model to
obtain the following estimates for the software: failure intensity,
number of faults remaining, conditional reliability and test
coverage. According to this model the expected number of faults
detected by time t, called the mean value function, m(t) is of
the form m(t)=a*c(t) where a is the expected number of faults in
the software (before testing/debugging begins) and c(t) is the

coverage function. The available models are: Exponential (GO)
model, Weibull, S-shaped, and Log-logistic.
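The estimates listed above all derive from the relation m(t) = a*c(t). A sketch with the exponential (GO) coverage function follows; the parameter values a = 120 and b = 0.004 are illustrative assumptions, not values fitted by SREPT.

```python
import math

a, b = 120.0, 0.004   # assumed: expected total faults, coverage rate

def c(t):
    """Exponential (GO) coverage function: c(0) = 0, c(t) -> 1 as t -> inf."""
    return 1 - math.exp(-b * t)

def m(t):
    """ENHPP mean value function: expected faults detected by time t."""
    return a * c(t)

def intensity(t):
    """Failure intensity a*c'(t); here c'(t) = b*exp(-b*t)."""
    return a * b * math.exp(-b * t)

def conditional_reliability(x, t):
    """Probability of no failure in (t, t+x] given the history up to t."""
    return math.exp(-(m(t + x) - m(t)))

faults_remaining = a - m(500.0)
```

With these assumed parameters, roughly 16 of the 120 expected faults remain after 500 time units of testing, and the conditional reliability over the next 100 units is still small.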
Release criteria could be of the following types, according to
Ramani et al (2000):
(1) Number of remaining faults - In this case the release time is
when a fraction f of all detectable faults has been removed;
(2) Failure intensity requirements – The software should be
released when the failure intensity measured at the end of
the development test phase reaches a specified value.
(3) Reliability requirements - This criterion could be used to
specify that the required conditional reliability in the
operational phase is, say, R* at time t* after product release.
(4) Cost requirements – From knowledge of the expected cost of
removing a fault during testing, the expected cost of
removing a fault during operation, and the expected cost of
software testing per unit time, the total cost can be
computed. The release time is obtained by determining the
time that minimizes this total cost.
(5) Availability requirements – Assuming that the mean time to
repair is the time needed to reboot and that there is no
reliability growth, the release time can be estimated based
on an operational availability requirement.
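Criterion (4) can be sketched numerically. With an assumed ENHPP mean value function and illustrative unit costs (all names and numbers below are assumptions for the sketch, not SREPT defaults), the release time is simply the minimizer of the total cost:

```python
import math

a, b = 120.0, 0.004                                # assumed ENHPP parameters
cost_test, cost_op, cost_time = 50.0, 900.0, 1.5   # illustrative unit costs

def mu(t):
    """Expected number of faults removed by testing time t."""
    return a * (1 - math.exp(-b * t))

def total_cost(t):
    """Faults fixed during test + faults left for the field + testing time."""
    return cost_test * mu(t) + cost_op * (a - mu(t)) + cost_time * t

# Coarse grid search for the cost-minimizing release time
release_time = min(range(0, 5001, 10), key=total_cost)
```

Because field fixes are far more expensive than test-time fixes in this sketch, the optimum sits where the marginal cost of further testing balances the expected field savings.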
Architecture-based approaches use the internal control structure
of the software to predict the reliability of software, according to
Ramani et al (2000). This assumes added significance in the evaluation
of the reliability of modern software systems which are not monolithic
entities but are likely to be made up of several modules distributed

globally. SREPT allows prediction of the reliability of software based on:
architecture of the software, and failure behaviour.
Running the fault density engine on the data file SHARPE.LOC gives the
results described in Fig. 7.8.

Fig. 7.8. SREPT - Fault density engine


When the type of input data chosen is "Use inter failure times data only",
the SRE engineer should specify the file containing the interfailure
times. This file should contain one interfailure time per line. Thus the
format is:
Inter failure time
The models covered by SREPT are shown in Fig. 7.9, together
with the u-plot for all models applied to the data from the file
DRAPER.DAT. The ENHPP engine is illustrated in Fig. 7.10, and finally,
Fig. 7.11 shows the debugging policies engine.
As noted above, the Debugging policies module requires the user to
specify a "Truncation level", at which the infinite state space of the
underlying NHPP (non-homogeneous Poisson process) is truncated. This is
necessary for the numerical solution.
The user also needs to supply a failure intensity function. This
again can come from the ENHPP module. SREPT reports the fitted coverage
function c(t). If a is the expected total number of faults (given
infinite testing time), which is one of the parameters estimated for all
the coverage functions fitted, then the failure intensity is a*c'(t),
where c'(t) is the derivative of c(t) with respect to time t. The
interface ensures that the failure intensity is of the form a*c'(t) by
having separate entry fields for a and c'(t).

Fig. 7.9. SREPT – u-plot analyses on DRAPER.DAT data

Fig. 7.10. SREPT - ENHPP engine (results for DRAPER data)

Fig. 7.11 SREPT – The debugging policies

7.2.3 SREPT-Project

Firstly, let us use the COBRA.DAT file from the SREPT\DATA folder. Running
the ENHPP engine, we obtain the u-plot analysis (Fig. 7.12) and
numerical results during the computational process.

Fig. 7.12. SREPT – u-plot analysis (COBRA DATA)


The following information was generated during SRE analysis on COBRA
data:
Determining the best coverage model for the input data...
Kolmogorov distances from u-plot -
Exponential (GO) coverage model = 0.33494709374581316
Weibull coverage model = 0.3412535154195569
S-shaped coverage model = 0.33128819562824624
Log-logistic coverage model = 0.37866011725047044

The best coverage model for the input data based on
Kolmogorov distance from u-plot is -
S-shaped coverage model

Generating the u-plots. Please wait...
Done!

Sum of square errors -


Exponential (GO) coverage model = 79.5045468923137
Weibull coverage model = 79.37391836515881
S-shaped coverage model = 75.38712880251684
Log-logistic coverage model = 275.01371325171095

The best coverage model for the input data based on sum of
square errors is -
S-shaped coverage model
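The selection SREPT performs here is just an argmin over the reported distances. A sketch using the COBRA figures from the log above:

```python
# Distances reported by SREPT for COBRA.DAT (copied from the log above)
kolmogorov = {
    "Exponential (GO)": 0.33494709374581316,
    "Weibull": 0.3412535154195569,
    "S-shaped": 0.33128819562824624,
    "Log-logistic": 0.37866011725047044,
}
sum_of_square_errors = {
    "Exponential (GO)": 79.5045468923137,
    "Weibull": 79.37391836515881,
    "S-shaped": 75.38712880251684,
    "Log-logistic": 275.01371325171095,
}

best_by_bias = min(kolmogorov, key=kolmogorov.get)        # u-plot test
best_by_fit = min(sum_of_square_errors, key=sum_of_square_errors.get)
```

Both criteria agree on the S-shaped coverage model, matching SREPT's report; when they disagree, the engineer has to weigh bias against goodness of fit.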

Let us try the same analysis on the SREPT\SEAWOLF data file. The
results are given in Fig. 7.14 (u-plot) and in text format (the run-time
dialog).

Fig. 7.14. SREPT – u-plot analysis on SEAWOLF data

Determining the best coverage model for the input data...

Kolmogorov distances from u-plot -


Exponential (GO) coverage model = 0.1342199846015093
Weibull coverage model = 0.13109317201794313
S-shaped coverage model = 0.1025106023083946
Log-logistic coverage model = 0.338038113063996

The best coverage model for the input data based on
Kolmogorov distance from u-plot is -
S-shaped coverage model
S-shaped coverage model

Generating the u-plots. Please wait...
Done!

Sum of square errors -


Exponential (GO) coverage model = 132.95721071467761
Weibull coverage model = 91.1604053573391
S-shaped coverage model = 67.99762926784372
Log-logistic coverage model = 396.48970881784

The best coverage model for the input data based on sum of
square errors is -
S-shaped coverage model

Based on both the Kolmogorov distance and the sum of square errors,
the S-shaped coverage model has the best behaviour for the SEAWOLF
data set.

7.3 SREtool

7.3.1 Main characteristics

SRE Tool can import existing input data and identify the format
through the "Try to Autodetect Data Type" service. Data can be filtered
and transformed (Fig. 7.14), used in various analyses, and exported if
necessary. The Data menu provides this kind of functionality. The
Graphics menu is specialized in plotting results, exporting plots, and
copying the current plot to the clipboard.

Fig. 7.14. SRE Tool - Transforming data

Data analysis is possible by accessing the Analysis menu, which has the
following entries: Model Selection Wizard; Data Type (Grouped, Ungrouped,
Time Between Failures, Cumulative Times); Trend Tests (Laplace, Military
Handbook 189) and Trend Plots (Running Average, TTT); Analyse Models;
Validate Models; Model Predictions; and Plot Fitted Models. SRE Tool also
provides help to the SRE engineer, both about SRE and about the tool
itself.
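The Laplace trend test mentioned above can be sketched as follows. This is the time-truncated form of the statistic (failure-truncated data uses a slight variant), and the failure times are illustrative:

```python
import math

def laplace_test(failure_times, T):
    """Laplace trend test on cumulative failure times observed over (0, T].
    Values well below zero suggest reliability growth (interfailure times
    getting longer); values near zero suggest no trend."""
    n = len(failure_times)
    mean_time = sum(failure_times) / n
    return (mean_time - T / 2) / (T * math.sqrt(1 / (12 * n)))

# Illustrative data whose interfailure times grow over the interval (0, 500]
times = [10, 25, 45, 80, 130, 200, 290, 400]
u = laplace_test(times, T=500.0)
```

Here u falls below the 5% normal critical value of -1.645, so the illustrative data show significant reliability growth.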

7.3.2 SRE by SREtool

One of the most interesting functionalities of SREtool is the Model
Selection Wizard (Fig. 7.15). This component helps the SRE engineer
select the models with the largest scores with respect to the validated
assumptions. The dialog window lists common software reliability
assumptions, known from the literature and used in different software
reliability models. The most relevant models are listed on the
right-hand side of the dialog, followed by a column called Score. Every
assumption has a score associated with every model. After the
assumptions are selected, the models receive points and are sorted by
relevance. The higher a model's score, the better the model fits the
assumptions.
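The wizard's scoring logic can be sketched like this. The assumption names and point values below are hypothetical stand-ins, not SREtool's actual score table:

```python
# Hypothetical score table: points each assumption grants each model
SCORES = {
    "finite number of faults": {"Jelinski-Moranda": 2, "Goel-Okumoto": 2, "Duane": 0},
    "immediate, perfect repair": {"Jelinski-Moranda": 2, "Goel-Okumoto": 1, "Duane": 1},
    "exponential times between failures": {"Jelinski-Moranda": 2, "Goel-Okumoto": 1, "Duane": 0},
}

def rank_models(selected_assumptions):
    """Sum each model's points over the selected assumptions and sort
    by decreasing total: the top model fits the assumptions best."""
    totals = {}
    for assumption in selected_assumptions:
        for model, points in SCORES[assumption].items():
            totals[model] = totals.get(model, 0) + points
    return sorted(totals.items(), key=lambda item: -item[1])

ranking = rank_models(["finite number of faults", "immediate, perfect repair"])
```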

Fig. 7.15. SREtool – Model Selection Wizard


The procedure to prepare the analysis has four steps. First, select
the model(s) to fit the data. Then, set the confidence level used to
calculate confidence intervals (by default 95%) and the significance
level used in hypothesis testing (by default 1%). Last, choose between
the maximum-likelihood and least-squares methods for parameter
estimation (see Table 7.2). To perform the analysis, click the Analyse
Model button. The analysis output then shows the data type, the model
choices, the parameter estimates, and the result of the Kolmogorov
goodness-of-fit test. The data window presents the fitted values for the
selected models.
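The least-squares option can be illustrated with the Duane power law m(t) = a*t**b, one of the models SRE Tool fits (Fig. 7.18). Taking logarithms makes the model linear, so ordinary least squares applies; the data below are synthetic, generated from known parameters so the fit can be checked:

```python
import math

def fit_duane(cumulative_times):
    """Least-squares fit of the Duane power law m(t) = a * t**b.
    ln m = ln a + b ln t, so regress ln(i) on ln(t_i), where t_i is
    the cumulative time of the i-th failure."""
    xs = [math.log(t) for t in cumulative_times]
    ys = [math.log(i) for i in range(1, len(cumulative_times) + 1)]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), slope      # (a, b)

# Synthetic failure times generated from m(t) = 0.3 * t**0.7
times = [(i / 0.3) ** (1 / 0.7) for i in range(1, 51)]
a_hat, b_hat = fit_duane(times)
```

Since the synthetic data lie exactly on the power law, the fit recovers a = 0.3 and b = 0.7 up to floating-point error.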
For the data set s1.dat, used previously with CASRE, the results of
the Military Handbook 189 trend test are given in Fig. 7.16, and the
Running Average plot is presented in Fig. 7.17. The model analysis is
shown in Fig. 7.18, and the reliability plot is given in Fig. 7.19.

[Plot: Military Handbook 189 Trend Test for casre_S1_tbf.xls; test statistic vs. observation number, with lower and upper confidence limits]

Fig. 7.16. SRE tool - Military Handbook 189 Trend Test

[Plot: Running Average trend plot for casre_S1_tbf.xls; running average of the times between failures vs. failure number]
Fig. 7.17. Running Average Plot – Trend test

[Plot: Fitted Models for casre_S1_tbf.xls; cumulative number of failures vs. cumulative failure times, with the observed data and the Jelinski-Moranda, Goel-Okumoto, and Duane (power law) fits]
Fig. 7.18. SRE Tool – Model Analysis

[Plot: Reliability Plot; reliability for an interval of length 656.9 vs. cumulative failure times, with the estimated reliability and the Jelinski-Moranda, Goel-Okumoto, and Duane (power law) fits]

Fig. 7.19. Reliability Plot (s1.dat)

The graphical output shows the observed values (circles) and the
estimated model (curve).

7.3.3 SRE tool - Project

Firstly, create an .xls file containing the time-between-failures data.
We use s3.dat from CASREV3\DATA, which has 207 observations. The file
was imported and used as input for the data analysis. The result of the
Model Analysis task is given in Fig. 7.20.

[Plot: Fitted Models for S3.xls; cumulative number of failures vs. cumulative failure times, with the observed data and the Jelinski-Moranda, Goel-Okumoto, and Duane (power law) fits]

Fig. 7.20. SRE Tool - Model Analysis (S3.xls)

The numerical output is given below:

Data Set: S3.xls

Nr. of Observations: 207

Variable: Cumulative Times

Significance level used in hypothesis testing: 1%

Confidence level used in intervals: 95%

Model: Jelinski-Moranda
Model estimates:
N = 254.000, Upper 95% confidence limit: 303.000

λ = 0.000, Upper 95% confidence limit: 0.000

Model: Goel-Okumoto
Model estimates:
a = 256.625, Upper 95% confidence limit: 295.827

b = 0.000, Upper 95% confidence limit: 0.000

Model: Duane (power law)
Model estimates:
a = 0.288, Upper 95% confidence limit: 0.617

b = 0.677, Upper 95% confidence limit: 0.752

Applying the Military Handbook 189 Trend Test, the numerical
results are:

Significance level: 1%

Test statistic: 611.941

Lower Critical Value: 341.818

Upper Critical Value: 489.691

P-Value: 0.000

The test statistic is large, which indicates that there has been
significant reliability growth since observation 1. The graphical
output is given in Fig. 7.21.

[Plot: Military Handbook 189 Trend Test for S3.xls; test statistic vs. observation number, with lower and upper confidence limits]

Fig. 7.21. Graphical output for MH 189 - Trend Test (S3.xls)
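The test statistic above can be sketched for time-truncated data. Under the no-trend hypothesis, 2*sum(ln(T/t_i)) over the cumulative failure times t_i is chi-square distributed with 2n degrees of freedom, and values above the upper critical limit indicate reliability growth, the same pattern as in Fig. 7.21. The failure times below are illustrative:

```python
import math

def milhdbk189_statistic(failure_times, T):
    """MIL-HDBK-189 trend statistic for failures at cumulative times t_i
    observed over (0, T]: 2 * sum(ln(T / t_i)). Compare it against
    chi-square critical values with 2n degrees of freedom."""
    return 2.0 * sum(math.log(T / t) for t in failure_times)

# Illustrative data: failures concentrated early in (0, 500] imply growth
times = [10, 25, 45, 80, 130, 200, 290, 400]
stat = milhdbk189_statistic(times, T=500.0)
dof = 2 * len(times)
```

With 16 degrees of freedom, the 95th chi-square percentile is about 26.3, so this statistic would again signal significant reliability growth.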

7.4 Learned Lessons and conclusions

Almost every machine we use today is built around a microprocessor, and
installed software is what puts the machine to work. Software
reliability is rarely of concern to most people until something goes
wrong. Physical system components deteriorate over time, while software
does not. However, unlike a human operator, software does not adapt well
to situations which were not anticipated by its designers, and such
failures can prove enormously costly. Since microcomputers appeared, a
new era started, and nowadays computers are involved in every aspect of
our lives. Both early on and in recent times, engineers have experienced
data loss, accidents, large costs, and disasters due to bad practices in
software engineering. The main causes of failure were undisciplined management
of requirements, imprecise and ambiguous communication, unstable
architectures, high complexities, inconsistency of requirements with
design and implementation, low automation, and insufficient
verification and validation. Developers, users and military organizations
are often concerned about the reliability of systems that include
software.
Over the years, reliability engineers have developed detailed and
elaborate methods of estimating the reliability of hardware systems
based on an estimate of the reliability of their components. Software can
be viewed as one of those components, and an estimate of the reliability
of software is considered essential to estimating the reliability of the
overall system.

A new discipline, nowadays called software reliability engineering, was
necessary to increase the dependability of computer-based systems.
Also, software engineers developed important methodologies to
increase the software reliability by formal verification of software
requirements, reliable programming techniques, and adequate testing
strategies.
In the following we review the most important ideas presented in
this course in order to outline best practice useful to software engineers,
software reliability engineers, software managers, and ICT managers.
Understanding the error-fault (bug)-failure triad is important
both to software engineers and software reliability engineers. In
complex distributed software, the software engineer can encounter some
unusual software bugs that are more difficult to understand and repair
than ordinary software bugs, according to (Madsen et al 2006):
heisenbugs, mandelbugs, schroedinbugs, and aging bugs.
This is why terminology is essential to all actors involved in
software development, reliability modeling and analysis, software risk
management, and ICT project management.
The seven general reliability assessment requirements play an
important role in deploying highly reliable software. Software reliability
can be achieved by using many sources of information, and will be
effective by using various software reliability engineering approaches.
There are some conflicting views on software quality concerning
the correctness (a program is either correct or incorrect) and the
reliability under uncertainty factors (measured as the relative frequency
of the times that the program performs without failure).

During software operation, the time of failure, the time interval
between failures, the cumulative failures experienced up to a given
time, and the failures experienced in a time interval are all random
variables (that is, uncertainty is present). This randomness is
generated by the introduction of faults by programmers, which is an
unpredictable process (the locations of faults are unknown), and by the
working environment (including running conditions) of the software,
which can behave unpredictably. Hence, probability distribution
functions, stochastic processes, and statistical methods for data
analysis are required in software reliability modeling, analysis, and
assessment.
Rejuvenation, as a proactive fault management technique, increases only
availability, which is a concept distinct from reliability.
Reliability is improved when software faults occurring in the most
frequently used parts of the software are removed. Removing x% of
software faults will not necessarily lead to an x% reliability
improvement. In one study, removing 60% of software defects led to only
a 3% reliability improvement. Hence, removing the faults with serious
consequences is the most important objective.
The reliability of components, connections, networks, and complex
systems can be modelled in specific ways and used in reliability
optimization. In order to define the necessary reliability, it is
necessary to define failures with severity classes, to choose a common
measure, and to set a failure intensity objective (Musa, 1999).
Software testing and debugging represent some of the most important
components of the software engineering process, with major consequences
for software reliability (Galen (2005)). Even if "program testing can be
used to show the presence of bugs, but never to show their absence", as
Dahl et al. say, every test performed increases intrinsic software
quality, either by revealing no fault or by discovering a latent fault.
Unit testing, integration testing, and acceptance testing are core tasks
in obtaining highly reliable software accepted by customers.
Computer-aided software engineering (CASE) refers to a set of
tools which are able to automate all or some of various tasks of the
software life cycle, like: requirements capture, tracing, and tracking;
configuration management; model verification; facilitation of model
validation; maintenance of all project-related information; collection
and reporting of project management data, document production and
CASE data import-export. However, software reliability is not covered,
in general, by CASE products. There are specific software tools, called
Computer Aided Software Reliability Engineering, many of them being
dedicated to estimation of software reliability. Some of them were
detailed above (CASRE, SREPT, SRE Tool). These tools, useful in
reliability assessment, are based on the three most commonly used time
domains: calendar time, clock time, and execution time. Musa (1999)
proposed that all reliability theory should be based on execution time,
which is "a better measure of the failure-inducing stress", and
recommends the usage of natural units or time units when defining the
reliability quantities.
Manual (non-automated) and automated failure data collection
approaches are used. Fully automated data collection is not possible,
even when formal specifications and classification rules (applied to
failures) are used.

Collected data (manually or automatically gathered) are sent to
analysis modules in order to find patterns and models and to predict the
system behavior. Before choosing a model, tests need to be applied to
the data (for instance, a test to check the Poisson process assumption,
a trend test, etc.).
A large number of software reliability growth models (SRGMs)
have been developed over the years, and each model has its strengths
and limitations. Also, there is a major difference between software
reliability prediction and software reliability estimation, i.e., the
prediction is performed based on historical data while estimations are
based on collected data. Predictions, by their nature, will almost
certainly be less accurate than estimations. However, they are useful for
improving the software reliability during the development process. If
the organization waits until collected data is available (normally during
testing), it will generally be too late to make substantial improvements
in software reliability. The predictions should be performed iteratively
during each phase of the life cycle and as collected data becomes
available the predictions should be refined to represent the software
product at hand.
Analysing the collected data in a software reliability engineering
approach sometimes provides unsatisfactory results: there can be
dramatic disagreement between model predictions on the same data source,
there is no universally accurate model, and there is no way of selecting
a model a priori and being confident that it will be accurate on a
particular data source.
Due to the prediction triad, we cannot examine models and inference
procedures separately, because that is too difficult: we are forced to
examine directly the accuracy of the different available models on each
data source and somehow decide which, if any, gives accurate results.
Looking at the software reliability models, the parameter estimation,
and the results obtained by CASRE, SREPT, or SRE Tool, we can note the
following. The NHPP and Bayesian models seem to perform almost as well
as the best parametric models for most data sets (Fig. 7.22). They seem
robust: whilst the performance of the parametric models varies
considerably from one data set to another, these remain fairly
consistent. In most cases the best-performing model is a parametric one.
Since we can select the best model via analysis of predictive accuracy,
it is still best to "be eclectic in model choice and let the data
decide". Of course, some of these models are very computationally
intensive, but this is not a concern nowadays.
Finally, the following ideas should be pointed out:
1) The increase in software-based systems for safety functions
requires systematic evaluation of software reliability;
2) Direct observation of the operational behaviour of the system (e.g.
in test or simulation) is not going to give assurance of ultra-high
reliability;
3) The presented techniques only work for modest reliability
levels;
4) There is no perfect model, and all models sometimes behave
inaccurately;
5) Recalibration often improves accuracy;
6) Software reliability estimation is still an unresolved issue, and
existing approaches have limitations and assumptions that are not
acceptable for safety applications.

Fig. 7.22. CASRE – case study (s3.dat from CASREV3\DATA\)

8 REFERENCES

Albeanu, G. & Popentiu-Vladicescu, Fl., On Designing Learning Objects


for a Software Reliability Engineering Course, The 7th International
Scientific Conference eLSE "eLearning and Software for Education", Vol
I, pp. 105-110, Bucharest, 2011, University Publishing House.
Baber R.L., Error-free Software. Know-how and Know-why of Program
Correctness, Wiley, 1991.
Barlow R.E., Proschan F., Asymptotic Theory of Total Time on Test
Processes, with Applications to Life Testing, Rep M355, U Florida, 1975.
Boon M.A.A., Brandt, E., Ramos, I.C., Di Bucchianico, A. & Henzen, R., A
New Statistical Software Reliability Tool, REFIS.NL
Brocklehurst et al., Recalibrating Software Reliability Models, IEEE
Transactions on Software Engineering, 16(4):458-470 (April 1990).
Catelani M., Ciani L., Scarano L., Bacioccola A., An automatic test for
the software reliability: the evaluation of the overflow due to memory
leaks as failure cause.
Chen C-C., Chu-Ti Lin, Hen-Hsen Huang, Shih-Wei Huang, and Chin-
Yu Huang, CARATS: A Computer-Aided Reliability Assessment Tool for
Software Based on Object-Oriented Design, TENCON 2006, IEEE Region
10 Conference, 2006,
Chillarege R., Orthogonal Defect Classification, Chapter 9,
http://www.cse.cuhk.edu.hk/~lyu/book/reliability/pdf/Chap_9.pdf
Churamani N., Using Orthogonal Defect Classification for Defect Analysis,
http://www.niit-tech.com/images/files/whitepaper/ODC_a_white_paper.pdf

Corro Ramos, I., Statistical Procedures for Certification of Software
Systems, PhD Thesis, Eindhoven University of Technology, 2009,
Crowe D. Feinberg A. (eds.), Design for Reliability, CRC Press, 2001.
Dennison T.E., Fitting and Prediction Uncertainty for a Software
Reliability Model, Master's Thesis, Naval Postgraduate School, 1992.
Dotson K.J., Development of Confidence Limits by Pivotal Functions for
Estimating Software Reliability, NASA Technical Paper 2709, 1987.
Duane, J.T., Learning curve approach to reliability monitoring, IEEE
Transactions on Aerospace, vol. 2, no. 2, pp. 563–566, April 1964
EIFFEL - An Introduction to Design by Contract™, http://www.eiffel.com/
developers/design_by_contract_in_detail.html
Einarsson Bo (ed.), Accuracy and Reliability in Scientific Computing,
SIAM, 2005.
Everett G.D., McLeod Jr. R., Software Testing. Testing Across the Entire
Software Development Life Cycle, Wiley, 2007.
Everett W., Keene S., Nikora A., Applying Software Reliability
Engineering in the 90's, 1998.
Farr, W.H., Smith O.D., Statistical Modeling and Estimation of Reliability
Functions for Software (SMERFS), NSWCDD TR 84-371, 1993.
Farr, W.H., A survey of Software Reliability Modeling and Estimation,
NSWC TR 82-171, 1983.
Goel, A.L. & Okumoto, K., Time-Dependent Error-Detection Rate Model
for Software Reliability and Other Performance Measures, IEEE
Transactions on Reliability, Vol. R-28, No. 3, August 1979.
Gokhale, S.S., Marinos, P.N., Trivedi, K.S., Important Milestones in
Software Reliability Modeling, Communications in Reliability,
Maintainability and Serviceability: An International Journal published by
SAE International, 1996.

GSAM4: Guidelines for Successful Acquisition and Management of
Software-Intensive Systems: Weapon Systems, Command and Control
Systems, Management Information Systems - Condensed Version 4.0
February 2003.
Gustafson D.A., Theory and Problems of Software Engineering, McGraw-
Hill, 2002.
Howe D., The Free On-line Dictionary of Computing, 2010.
Huber C., Limnios N., Mesbah M., Nikulin M., Mathematical Methods in
Survival Analysis, Reliability and Quality of Life, ISTE and Wiley, 2008.
IEEE. IEEE Std. 610.12-1990.
IEEE. IEEE Recommended Practice on Software Reliability, 1633/2008.
Ince D., An introduction to quality assurance and its implementation,
McGraw-Hill, 1994.
Jelinski, A. & Moranda, P., Software Reliability Research, in W.
Freiberfer, ed., Statistical Computer Performance Evaluation, Academic
Press, New York, NY, pp. 465-484, 1972.
Kan, S.H., Metrics and Models in Software Quality Engineering, Second
Edition, Addison Wesley, 2002.
Karanta I., Methods and problems of software reliability estimation, VTT,
2006.
Khatatneh K., Mustafa T., Software Reliability Modeling Using Soft
Computing Technique, European Journal of Scientific Research, 2009.
Kirkwood K., Bazzana G., A Software Reliability Toolkit.
Measuring program execution time, http://csapp.cs.cmu.edu/public/1e/public/
ch9-preview.pdf
Koziolek H., Operational Profiles for Software Reliability, Seminar
"Dependability Engineering", 2005.

Lu, M., Brocklehurst, S., and Littlewood, B., Combination of predictions
obtained from different software reliability growth models, Proceedings
of the Tenth Annual Software Reliability Symposium, Denver, CO, pp.
24–33, June 1992.
Lyu, M.R., Nikora, A.P., CASRE - A Computer-Aided Software Reliability
Estimation Tool, Proceedings of 1992 Computer-Aided Software
Engineering Workshop (CASE92), Montreal, Quebec, Canada, July 6-10,
1992
Lyu M.R. (ed.), Handbook of software reliability engineering, McGraw-
Hill, 1996.
Lyu, M.R., Nikora, A.P., Farr W.H., A Systematic and Comprehensive Tool
for Software Reliability Modeling and Measurement, In Proceedings of
FTCS. 1993, 648-653.
Madsen, H., Thyregod, P., Burtschy, B., Popentiu, F. & Albeanu, G., On
using soft computing techniques in software reliability engineering, ESREL
2005, Vol. 2, pp. 1317-1323, A.A. Balkema Publisher, 27-30 June 2005
Tri City, Gdynia, Gdansk, Sopot, Poland.
Meeker W.Q., Escobar L.A., Statistical Methods for Reliability Data, Wiley,
1998.
Meier J.D., Carlos Farre, Prashant Bansode, Scott Barber, and
Dennis Rea, Performance Testing Guidance for Web Applications,
Microsoft Corporation, 2007.
Military Handbook. Reliability Growth Management, MIL-HDBK-189. US
Department of Defense.
Musa, J.D., Iannino, A., and Okumoto, K., Software Reliability:
Measurement, Prediction, Application, New York: McGraw-Hill, 1987.
Musa, J. D., Operational Profiles in Software-Reliability Engineering. In:
IEEE Software, 10 (1993), 2, pp. 14–32.

Musa, J.D., Software Reliability Engineering, McGraw-Hill, 1999.
Musa, J.D., Software Reliability Engineering: More Reliable Software
Faster and Cheaper, 2nd Edition, Author House, 2004.
Neufelder, A.M., The Facts about Predicting Software Defects and
Reliability, The Journal of the Reliability Analysis Center (RAC), Second
Quarter – 2002.
Naik K., Tripathy P., Software Testing and Quality Assurance. Theory and
Practice, Wiley, 2008.
Nikora A.P., Lyu M.R., An Experiment in Determining Software Reliability
Model Applicability. Proceedings of the Sixth International Symposium
on Software Reliability Engineering, Toulouse, France, October 24-27,
1995.
Ozekici S., Soyer R., Reliability of software with an operational profile,
European Journal of Operational Research 149 (2003) 459–474.
Pandian, R. C., Software Metrics—A Guide to Planning, Analysis, and
Application, CRC Press, 2004.
Park M.-G., Jung E.-Y., Software Reliability Growth Modeling and
Performance Evaluation in the Testing Phase with an Outlier Stage.
Pentti H., Helminen A., Failure Mode and Effects Analysis of Software-
Based Automation Systems, STUK-YTO-TR 190, August 2002.
Pham H., Pham Michelle, Software Reliability Models for Critical
Applications, Report EGG-2663, EG&G Idaho Inc., 1991.
Pham H., Software Reliability, Springer, 2000.
Pham H. (ed.), Handbook of Reliability Engineering, Springer, 2003.
Pullum Laura L., Software Fault Tolerance Techniques and
Implementation, Artech, 2001.

Sens, P, Popentiu, Fl. & Mantoiu, A., Software Reliability Forecasting for
Adapted Fault Tolerance Algorithms, ESREL'98 Conference, Springer
Verlag, 20-24 June 1998 Trondheim, Norway.
Shooman, M.L., Software Engineering: Design, Reliability, and
Management, New York McGraw-Hill Book Co., 1983.
Singpurwalla, N.D. and Wilson, S.P., Statistical Methods in Software
Engineering – Reliability and Risk, Springer-Verlag, New York, 1999.
Smith D.J., Reliability, Maintainability and Risk. Practical Methods for
Engineers (5th ed.), Reed Elsevier, 1997 (reprinted 2000).
SoftRel, LLC, A Tour through Frestimate,
http://www.softrel.com/downloads/walkthru.pdf.
Taylor, An Interim Report of Engineering Design, MIT, 1959.
Tian J., Software Quality Engineering: Testing, Quality Assurance, and
Quantifiable Improvement, Wiley-IEEE, 2005.
Valido-Cabrera E., Software reliability methods, Technical University of
Madrid, 2006.
Vasanthi T., Arulmozhi G., A recursive method for reliability
computation of Moranda's Geometric software reliability model.
Wohlin C. and Körner U., Software Faults: Spreading, Detection and
Costs, Software Engineering Journal, Vol. 5, No. 1, pp. 33-42, 1990.
Xie M., Dai Y.-S., Poh K.-L., Computing Systems Reliability: Models and
Analysis, Kluwer, 2004.
Yamada, S., Ohba, M., and Osaki, S., S-shaped reliability growth
modeling for software error detection, IEEE Transactions on Reliability,
vol. R-32, no. 5, pp. 475–478, Dec. 1983.

