Reliability Testing Strategy - Reliability in Software Engineering
All content following this page was uploaded by Kevin Taylor-Sakyi on 13 June 2016.
Kevin Taylor-Sakyi
Aston University
Engineering & Applied Science
Birmingham, UK
Kevin.sakyi@gmail.com
www.kevintaylorsakyi.me
Abstract – This paper presents the core principles of reliability in software engineering, outlining why
reliability testing is critical and specifying the process of measuring reliability. The paper provides
insight for both novices and experts in the software engineering field for assessing failure intensity as
well as predicting failure of software systems. Measurements are conducted by utilizing information from
an operational profile to further enhance a test plan and test cases, all of which this paper demonstrates
how to implement.

Keywords – Software Reliability Engineering; Testing Strategy; Measuring Reliability; Test Plan; Test Case

I. INTRODUCTION
Amazon, aerospace, and healthcare systems: what is the underlying factor of these systems? The importance
of their reliability! Software systems in this Information Age are becoming ever more vital to operations
within organizations; though these systems offer competitive advantages and the like, every rose has its
thorns. According to [1], it was estimated that less than 5% of testers were competent in utilizing models
to predict software reliability in the late 1990s; though this measurement is outdated, it is a good
indication that a gray area in dealing with reliability in testing strategies is still evident. This report
presents reliability theory in relation to software engineering, how reliability is measured and
calculated, and how to develop a test plan to assess the reliability of a software system.

II. SOFTWARE RELIABILITY

A. What is it?
Software reliability is “the probability of failure-free operation of a computer program for a specific
time in a specific environment” [2]. In other words, a test of a reliable system produces identical
reliability measurements when it is repeated; a failure means that “the program in its functioning has not
met user requirements in some way” [3]. For example, pacemaker monitoring software that alerts a doctor of
a patient’s heart condition is considered to have failed if accurate information is not delivered to the
doctor within a specified time constraint, which in turn expresses a relatively low reliability of the
system. In 2014, the Heart Rhythm Society mentioned that approximately 600,000 individuals are implanted
with pacemakers each year globally [4]. If the reliability of these systems were deemed to be 45% for
those 600,000 individuals, the chance of survival would be a mere 45% in cases where survival is
contingent upon doctors being notified within a certain timeframe.

Reliability testing efforts can be said to predict failures that are likely to happen in specified system
operations, identifying the areas whose faults need the most effort to fix. These efforts are typically
categorized into measurements, models, and methods of quality improvement.

B. Why measure reliability?
Reliability has always been centered on computer hardware, i.e. how durable a component of a printer,
keyboard, etc. is; yet the same cannot be said about software systems. Standards for measuring the
reliability of hardware cannot be reused, as they focus on “wearing out processes” [6], unlike software,
which does not erode or wear out.

Testing of software systems has had its fair share of interest in industry; however, there is a gray area
of concern. As the complexity of software systems increases, so does the complexity of an acceptable
definition of reliability, which is presently not commercially agreed upon with regard to measurement
techniques. Testing of software, as currently practiced in many development environments, merely validates
whether a system or product meets business requirements, but not how reliable the product is.
Software that is considered reliable based on the software reliability engineering (SRE) process can not
only save lives, but can also bring profits to organizations such as Amazon, or increase the credibility of
a critical system such as an aerospace traffic monitoring system. This demonstrates the necessity of SRE
being a phase within development processes, separate from standard software testing. This supplementary
process does not replace current processes for testing software, but allows precise decision-making during
software development and makes “everyone more concretely aware of SR by focusing attention on it” [3],
promoting means of reducing software development and maintenance costs.

C. Engineering Process
To ensure the reliability of a system, a systematic approach must be followed that yields safe, correct,
and functional software that meets the operational aspects of usability and user-friendliness [3]. The
engineering process is independent of development methods (e.g. waterfall, agile, etc.); however, the
process may involve changing designs, frameworks and the like to produce a system with greater
reliability. The engineering process is as follows:

1) Defining the product: Involves depicting the actors (users, suppliers, customers, etc.) of a system and
determining the base product, its accompanying systems, and its different variations to establish which
tests are suitable for each component.
a) Referring to pacemaker system x (‘x’ denoting a generic pacemaker system that informs doctors of
patients’ heart rates, etc.) mentioned earlier, the different variations entail measurement of heart
rhythm, transferring of measurements, etc. Recognizing these variations allows different types of tests to
be implemented, so that test types can be specialized per variation.

2) Implementing operational profiles: Identifying the major tasks to be accomplished by the system for its
initiators (customers, users, etc.) and their rates of occurrence; each task retains control until it has
been completed [3]. These profiles allow testing to be conducted efficiently, allowing traceability of
“the reliability being achieved” [5]. Development of an operational profile consists of the following:
a) Identify initiators of operations: Indicating the different users or user types of the system, typically
found by analyzing the pre-described customer types during the inception phase of the requirements
analysis. (For example, in the case of pacemaker system x, a doctor viewing a weekly generated report via
email would not be considered an initiator of that variation; manually searching for inconsistencies within
a set time frame, however, makes the doctor an initiator of this variation.) Figure 2 expresses possible
initiators within this pacemaker system.
b) Creation of operations list: Operations are jobs conducted within the software system. These are derived
from system requirements (functional and non-functional), diagrams (e.g. activity diagrams), and
discussions with the various user types. Involving expected users occasionally highlights areas neglected
during requirements gathering. Refer to Figure 3.
c) Review operations list: Consists of amending the list to ensure that operations with a high probability
of occurring are covered, taking into account the viewpoints of experts among the initiators. This may
result in merging operations to allow a system test, or partitioning operations to permit selective use of
testing resources.
d) Determining occurrence rates: “The number of operations divided by the time the total set of operations
is running” [3]. Rates can potentially be obtained by examining existing data, using business data obtained
from marketers, estimating with the Delphi method with various experts involved, and lastly, due to its
cost, manually calculating estimates. Refer to Figure 4.
e) Determining occurrence probabilities: Dividing each operation’s occurrence rate by the sum of all
operation occurrence rates. Refer to Figure 4.

3) Engineering the reliability: Setting reliability objectives that are just right demands defining the
following for the specific system [3]:
a) Define the meaning of failure
b) Indicate a common unit for all failure intensities
c) Establish a failure intensity objective for each associated system (operations, variations, etc.)
d) Find the failure intensity objective for the whole system and select strategies (models, etc.) that
“optimally meet the developed software failure intensity objective”

4) Preparing for test: Incorporates test cases and procedures which pilot “feature, load, and regression
tests” [3]. Feature testing consists of independent tests on operations to determine whether the operations
perform accurately. Load testing runs large numbers of tests to provoke failures which may occur due to
interactions between different operations. Regression testing is done periodically [3] and involves
repeating feature tests after each build to detect failures caused by amendments to the software system.
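
The occurrence-probability calculation in step 2(e) can be sketched in a few lines of Python. This is a minimal illustration, not the paper's own tooling: the operation names and rates below are invented for pacemaker system x, and the proportional allocation of a test-case budget (with largest-remainder rounding) is an assumed way of focusing test effort on high-probability operations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    name: str
    occurrences_per_hour: float  # occurrence rate, as in step 2(d)


def occurrence_probabilities(operations):
    """Divide each operation's rate by the sum of all rates (step 2(e))."""
    total = sum(op.occurrences_per_hour for op in operations)
    if total <= 0:
        raise ValueError("total occurrence rate must be positive")
    return {op.name: op.occurrences_per_hour / total for op in operations}


def allocate_test_cases(probabilities, budget):
    """Split a test-case budget in proportion to occurrence probability.

    Largest-remainder rounding keeps the allocation summing to `budget`.
    Proportional allocation is an assumption made for this sketch.
    """
    exact = {name: p * budget for name, p in probabilities.items()}
    allocation = {name: int(share) for name, share in exact.items()}
    leftover = budget - sum(allocation.values())
    by_remainder = sorted(
        exact, key=lambda name: exact[name] - allocation[name], reverse=True
    )
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation


# Hypothetical operations and rates; none of these figures come from the paper.
profile = [
    Operation("measure heart rhythm", 600.0),
    Operation("transfer measurements", 300.0),
    Operation("generate weekly report", 60.0),
    Operation("alert doctor", 40.0),
]
probabilities = occurrence_probabilities(profile)
allocation = allocate_test_cases(probabilities, budget=50)
```

With rates expressed in any common unit, the probabilities always sum to one, so the same profile can drive both test selection and the weighting of observed failures.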
5) Executing test: Failures, when they occur, and how severely they impact the system are identified in
this step [5]. SRE may be used to estimate and track failure intensity in this process to help remove
failures.

6) Guiding test: Gathers all data relating to failed tests that occurred in testing to assist in the
following decisions: tracing reliability growth, preparing acceptance testing, acceptance/rejection of a
“supersystem”, and releasing a product including all of its variations [3].

7) Post Delivery & Maintenance life-cycle phase: The phase in which reliability is attained and the
operational profile is experienced.

D. Measuring & Calculating Reliability
Establishing the SRE process is a good foundation; however, quantitatively measuring software systems
requires greater insight. Firstly, the various types of failures must be classified, as reliability is
concerned with the occurrence of different failures. This classification provides an orderly means of
counting failures. Microsoft states that failures should be segmented into three groups: unplanned events,
planned events, and configuration failures [12]. Figure 5 displays a breakdown of these categories.

Secondly, obtain failure data. As mentioned in the SRE process, there are various means of obtaining the
rates of failures; however, failures must be documented from a user’s perspective, i.e. by allowing a user
to report a failure. The concern with this approach is that users often avoid reporting faults or fix them
themselves [12]. An alternative approach recommended by Jalote et al. is to poll users of a system
periodically, asking them to report any errors within specific operations.

Standards to be accounted for when measuring and calculating reliability should identify the following [8]:
• Fault introduction – defect in software caused by altering or inserting new code
• Fault removal – debugging actions to remove identified faults
• Execution time (τ) – duration for which the system is running
• Mean Time To Repair (MTTR) – measures the average time taken to repair a failure
• Mean Time Between Failures (MTBF) – measurement of how reliable a component of a system is; for a
constant failure intensity, MTBF ≈ 1 / λ, or “the sum of mean time to failure (MTTF) and mean time to
repair (MTTR)” [3]
• Converting λ to reliability (R) – R ≈ exp(-λτ); if λτ is less than 0.05 then R ≈ 1 - λτ
• Identifying failure intensity (λ) – the occurrence of failures within a specified time unit is recognized
as a significant method of expressing software reliability [2].

Figure 6 shows a sample reliability model with the above standards incorporated within it. Viewing the
model, it is evident that the failure rate decreases as testing time passes, displaying reliability growth.
The model also illustrates that the reliability of software systems “stays constant over time if no changes
are made to the code or to the environmental conditions including the user behavior”, unlike hardware [10].

Failure intensity is typically visualized using models that effectively illustrate the failures experienced
over time; this report demonstrates the use of the Basic Execution Time (BET) model and briefly mentions
details of the Logarithmic-Poisson Execution Time (LPET) model. Both operate on the assumption that
reliability testing utilizes operational profiles “and that every detected failure is immediately and
perfectly repaired” [11].

Within the BET model (a prediction model), failure measurement is determined using execution time (CPU or
processor time). The following can be calculated accordingly [13]:

Figure 1 – Mean failures experienced per time τ

The following must be noted in Figure 1:
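
For reference, the metrics listed in this section and the two execution-time models can be sketched numerically. The block below is a hedged illustration rather than the paper's own formulation: the BET and LPET expressions are the standard closed forms of Musa's basic execution time model and the Musa-Okumoto logarithmic Poisson model, and the parameter names `lambda0` (initial failure intensity), `nu0` (total expected failures) and `theta` (failure intensity decay parameter) are symbols introduced here, not taken from the text. All numeric values are invented.

```python
import math


def reliability(failure_intensity, tau):
    """R = exp(-lambda * tau) for a constant failure intensity lambda."""
    return math.exp(-failure_intensity * tau)


def reliability_linear(failure_intensity, tau):
    """First-order approximation R ~ 1 - lambda * tau (for lambda * tau < 0.05)."""
    return 1.0 - failure_intensity * tau


def mtbf(mttf, mttr):
    """MTBF as the sum of mean time to failure and mean time to repair."""
    return mttf + mttr


def bet_mean_failures(tau, lambda0, nu0):
    """BET model: expected failures experienced by execution time tau."""
    return nu0 * (1.0 - math.exp(-lambda0 * tau / nu0))


def bet_failure_intensity(tau, lambda0, nu0):
    """BET model: failure intensity decays exponentially with execution time."""
    return lambda0 * math.exp(-lambda0 * tau / nu0)


def lpet_mean_failures(tau, lambda0, theta):
    """LPET model: expected failures grow logarithmically, without bound."""
    return math.log(lambda0 * theta * tau + 1.0) / theta


def lpet_failure_intensity(tau, lambda0, theta):
    """LPET model: failure intensity decays inversely with execution time."""
    return lambda0 / (lambda0 * theta * tau + 1.0)


# Invented example: 0.004 failures per CPU hour over a 10 hour mission.
r_exact = reliability(0.004, 10.0)
r_approx = reliability_linear(0.004, 10.0)

# Invented model parameters: 10 failures per CPU hour initially.
bet_after_10h = bet_mean_failures(10.0, lambda0=10.0, nu0=100.0)
lpet_after_10h = lpet_mean_failures(10.0, lambda0=10.0, theta=0.02)
```

In the BET form the expected number of failures approaches the finite value `nu0`, whereas in the LPET form it keeps growing with execution time, which is the finite-failure versus infinite-failure distinction between the two models.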
Similar to the BET model, the LPET model uses execution time as a time unit, yet it also includes calendar
time as a measuring time unit (not requiring the conversion of time units that the BET model requires).
This model represents infinite-failure models, permitting description of an unlimited number of failures
[11]. In theory, both portray the process of removing faults in software systems that comprise a finite
number of faults.

An instance of using the LPET model is demonstrated in the Amazon system; there are over 304 million users,
and each user has the capacity to perform ‘x’ operations. The number of operations per user (x) cannot be
estimated as it varies, but in theory it is not an infinite value, showing that even the most complex and
heavily utilized systems in essence have a finite number of possible failures. This model could then allow
software reliability engineering testing (SRET) to focus on a specific

B. Test Case(s)
As implied earlier, a test plan is not thorough without one or more test cases, identified by IEEE Standard
610 as “a set of test inputs, execution conditions, and expected results developed for a particular
objective, such as to exercise a particular program path or to verify compliance with a specific
requirement" [16]. In effect, test cases must support discovery of information within software systems;
each run (a particular instance of an operation) should document and focus on the following:
• Detected bugs
• Resolved bugs

For example, after altering the code of pacemaker system x and measuring its performance, if over the span
of 2 hours the performance rate has decreased there should be documentation tracing the bug to the newly
inserted
code. Additionally, when code is altered to increase the performance level over the same time unit, this
should be documented.

SRE is an approach that takes “a global view of the product involved” [3]; Failure Modes and Effects
Analysis (FMEA) is another efficient approach “that looks at particular failures, how they can be caused,
and how to prevent them” [3]. It is not recommended to use FMEA as a primary reliability measuring method
because of its cost (it requires detailed analysis of each operation and its variations), but it should
supplement the SRE process after failure intensity has been established, to then focus on failure
prevention methods [3].

There are numerous testing types available for measuring reliability; the following were selected for the
purpose of this report [3]:
• Functional – Tests each feature of the system in isolation; it is reasonable to first focus on the
operations documented in the operational profile, then test the interaction of numerous functions
• Load – Tests the system by constantly simulating an abundance of users until the user threshold is met
(loads the server with dummy users)
• Performance – Tests the speed of a system
• Regression – Tests modifications when a system undergoes changes; aims to reveal failures in older
functions of the system
• Scenario – Tests hypothetical situations; helps analyze how a program deals with the simulated situation
• Stress – “Testing conducted to evaluate a system or component at or beyond the limits of its specified
requirements with the goal of causing the system to fail” [17]

Figures 9–11 outline a sample test case for pacemaker system x which may be used to perform a reliability
calculation. The specified test cases reflect operations with a high probability of occurring as specified
in the operational profile (Figure 4); this allows SRET efforts to focus on key functions within the system
from a user’s perspective.

1) The test plan and test cases for pacemaker system x were chosen carefully to examine the objectives,
reliability, and the overall goal of the system: providing a system that handles its threshold of users at
any given time under any condition. The different test cases are not specific to individual operations
within the testing operational profile, but rather represent the different variations each operation may
undergo. Each variation effectively has a different test case, revealing different dimensions of failures
and/or faults within each run.

IV. CONCLUSION
This report documented methods of quantitatively measuring the reliability of a software system through use
of the SRE process. According to Musa [5], software deployment and operational costs are believed to be the
most affected by unreliable software. Systematic SRET supplements these aspects of software development from
both a marketing and a development standpoint. As implied in Section III, hardware testing and feature
testing processes are insufficient for measuring the reliability of a software system, because software
reliability measurement lacks established techniques within industry.

In relation to the Agile Testing Quadrants (Figure 7), Q3 and Q4 represent methods of reliability testing,
as they focus on determining the robustness of a system, whereas testing within Q1 and Q2 primarily focuses
on feature testing. Testing within Q2 may be initiated to form a foundation of specification for the
system, followed by testing such as load testing within Q4 to assess the robustness (reliability) of the
system.

Through the use of the SRE process, development of an operational profile, segmenting of operations, and
the like, testers are likely to establish reliability growth if the process is followed appropriately.
Nevertheless, it is safe to say organizations and stakeholders are somewhat against the idea of promoting
individual testing focused on the reliability of some operations, as the costs may seem to outweigh the
benefits. In the case of pacemaker system x, meeting high standards of reliability is worth the cost, as
this system is critical to a patient’s life. Stakeholders, with the expertise of testers, should make this
decision alongside an appropriate selection of a prediction model.

There are increasing developments of models in industry to better measure reliability. However, as the
complexity of software systems increases, so will the complexity of defining and measuring reliability
within these systems. This suggests that the prediction of software failures (reliability) is promising yet
an uphill task as technology improves.
REFERENCES
[1] Pham, H (2006). System Software Reliability. London: Springer. 5-30.
[2] Musa, J. (1987). Software Quality and Reliability Basics. AT&T Bell Laboratories. 1 (1), 114-115.
[3] Musa, J (2004). Software Reliability Engineering: More Reliable Software Faster and Cheaper. 2nd ed. Bloomington, Indiana: AuthorHouse. 1-527.
[4] Wurster, C. (2014). Remote Monitoring Proven To Help Prolong Life in Patients With Pacemakers. Available: http://www.hrsonline.org/News/Press-Releases/2014/05/Remote-Monitoring-Pacemakers. Last accessed 3rd March 2016.
[5] Musa, J. (1996). Software-Reliability-Engineered-Testing. Software Engineered Reliability Testing and Testing Courses. 1 (1), 61-68.
[6] Elspas, B; Green, M; Levitt, K. (2006). Software Reliability. Computer Science Group. 1 (1), 1-7.
[7] Elspas, B; Green, M; Levitt, K. (2006). Software Reliability. Computer Science Group. 1 (1), 1-7.
[8] Musa, J. (1975). A theory of Software Reliability and its application. IEEE Transactions on Software Engineering. 3 (3), 312-324.
[9] Dalal, S; Lyu, M; Mallows, C. (2014). Software Reliability. Bellcore, Lucent Technologies, AT&T Research. 1 (1), 1-15.
[10] Grottke, M. (2001). Software Reliability Model Study. Software Reliability Model Study Deliverable A.2. 1 (1), 1-35.
[11] Vouk, M. (1997). Software Reliability Engineering. Software Reliability Engineering. 1 (1), 1-21.
[12] Jalote, P; Murphy, B; Garzia, M; Errez, B. (2004). Measuring Reliability of Software Products. Microsoft Corporation. 1 (1), 1-8.
[13] University of Victoria (2016) 'Chap 4. Software Reliability'. Available at: http://www.ece.uvic.ca/~itraore/seng426-06/notes/qual06-4-2.pdf (Accessed: 18 March 2016).
[14] Musa, J; Okumoto, K. (1984). A Logarithmic Poisson Execution Time Model for Software Reliability Measurement. Bell Laboratories. 1 (1), 230-237.
[15] Ops A La Carte. (2016). Reliability, On-Going Reliability Test, HASS. Available: http://www.opsalacarte.com/Pages/reliability/reliability_prot_test.htm. Last accessed 19th March, 2016.
[16] Kaner, C. (2003). What Is a Good Test Case?. What Is a Good Test Case?. 1 (1), 1-13.
[17] Rosenberg, L; Hammer, T; Shaw, J. (2001). Software Metrics and Reliability. SOFTWARE METRICS AND RELIABILITY. 1 (1), 1-7.
[18] Rouse, M. (2007). What is regression testing?. Available: http://searchsoftwarequality.techtarget.com/definition/regression-testing. Last accessed 6th March, 2016.
Appendix
After selecting the appropriate test types, gather the system’s operations from the operational profile (Figure 4) to be
tested and document the objectives each test should aim for. Figure 8 provides a high level view of the test objectives.
The following must be accounted for:
• Reference: Numeric or alphabetical reference to each test, allows easy traceability
• Operation: Refers to those documented in the operational profile
• Test Objective: Objective that the test is to demonstrate
• Evaluation Criteria: Conditions to be evaluated to validate a successful test
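
The four fields listed above map naturally onto a small record type. The sketch below is one possible representation, assuming each objective refers to exactly one operation; the class name, the helper function, and all sample values are hypothetical and do not reproduce Figure 8.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectiveRecord:
    reference: str            # numeric or alphabetical id, for traceability
    operation: str            # must name an operation in the operational profile
    test_objective: str       # what the test is to demonstrate
    evaluation_criteria: str  # conditions that validate a successful test


def operations_without_objective(profile_operations, records):
    """Return profile operations that no documented objective refers to."""
    covered = {record.operation for record in records}
    return [op for op in profile_operations if op not in covered]


# Hypothetical operations and objectives for pacemaker system x.
profile_operations = ["measure heart rhythm", "transfer measurements", "alert doctor"]
records = [
    ObjectiveRecord(
        reference="T1",
        operation="measure heart rhythm",
        test_objective="Heart rhythm is sampled at the configured interval",
        evaluation_criteria="No sample is missed during a two hour run",
    ),
    ObjectiveRecord(
        reference="T2",
        operation="alert doctor",
        test_objective="Alerts reach the doctor within the time constraint",
        evaluation_criteria="Every alert is delivered before its deadline",
    ),
]
uncovered = operations_without_objective(profile_operations, records)
```

Checking the records against the operational profile in this way makes gaps visible before test cases are written.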
Following the documentation of tests to be carried out, test cases should be produced to provide detailed parameters
within testing methods. Figures 9-11 illustrate sample cases taking into account the Test Types documented in Figure
8.1.
Figure 8.2 shows the tool(s) needed for testing per operation within the test case; tools should be documented as early
as possible to ensure they are available during the SRET process. For simplicity of this report only one test case is
considered.