Lutess: a Specification-driven Testing Environment for Synchronous
Software1
L. du Bousquet
F. Ouabdesselam
J.-L. Richier
N. Zuanon1
LSR-IMAG, BP 72, 38402 St-Martin-d’Hères, France
{ldubousq, ouabdess, richier, zuanon}@imag.fr
ABSTRACT
Several studies have shown that automated testing is a
promising approach to saving significant amounts of time and
money in the reactive software industry. But automated
testing requires a formal framework and adequate means to
generate test data.
In the context of synchronous reactive software, we have
built such a framework and its associated tool, Lutess, to
integrate various well-founded testing techniques. This tool
automatically constructs test harnesses for fully automated
test data generation and verdict return. The generation conforms to different formal descriptions: software environment
constraints, functional and safety-oriented properties to be
satisfied by the software, software operational profiles and
software behavior patterns. These descriptions are expressed
in an extended executable temporal logic. They correspond
to more and more complex test objectives raised by the first
pre-industrial applications of Lutess.
This paper concentrates on the latest development of the tool
and its use in the validation of standard feature specifications
in telephone systems. The four testing techniques which are
coordinated in Lutess' uniform framework are shown to be
well-suited to efficient software testing. The lessons learnt
from the use of Lutess in the context of industrial partnerships are discussed.
Keywords
Automated testing, synchronous reactive software, telecommunications systems, Lustre, operational profiles, behavioral
patterns.
1 INTRODUCTION
Testing receives increasing attention from research teams
working on formal techniques for software specification, development and verification, for several reasons. First, testing
is the only means left to perform the validation of a piece of
software when formal verification is impracticable because
of lack of memory and/or time. Second, in industrial contexts, formal specifications are often incomplete, even if
an important feature may be fully specified. Nevertheless, this specification effort should not be wasted: it can
be profitably exploited through testing. Third, testing brings
a practical solution to the assessment of the specifications
themselves. If the specifications can be made executable,
testing can help one gain confidence in their consistency and relevance.
It can also reveal discrepancies between the specifications and the specifier's intentions.
So, in this context, testing is used jointly with, and in complement to, formal verification [5]. Besides, to be a significant support to validation, the testing techniques must either
provide a basis for reliability analysis [14], or have a well-established error detection capability and a proven cost effectiveness.
Reactive systems provide an appropriate area to promote
testing techniques within a formal framework. Indeed, they
are mainly safety-critical applications which require reliable
testing. A reactive system continuously responds to signals
from its environment, and must satisfy temporal constraints
so that it can capture all the external events of concern.
An interesting subclass of reactive systems is synchronous
software. The synchrony hypothesis [3] states that every reaction of the software application to external events is theoretically instantaneous (actually, short enough to guarantee
that the environment remains invariant during the computation of the reaction). The main merit of the synchronous
approach is that its associated programming languages (e.g.
Lustre [6] or Esterel) have a formal semantics and are
equipped with model checkers for program verification. This
has enabled the successful specification, implementation and
verification of several industrial applications2.
We have investigated several testing techniques dedicated to
the synchronous approach in order to validate either formal
specifications written in Lustre3 or programs against properties also written in Lustre. We have built a tool, Lutess,
1 This work has been partially supported by a contract between CNET-
France Telecom and University Joseph Fourier, #957B043.
2 Flight controllers in Airbus aircraft, nuclear reactor monitors, subway
interlocking and monitoring systems...
3 For the sake of brevity, Lustre is not presented herein.
which embodies four specification-based testing methods
and a structural testing method [18] based on the coverage
of the operator net associated with any Lustre program (this
latter technique is not presented here). The former methods
correspond to a variety of validation goals: testing for functional correctness, for safety and for reliability. They are all
implemented as highly automated processes.
With Lutess, we defend two theses. The first thesis is that a
monoformalism approach can be rich enough to encompass
several levels of languages: the software specification language, the test specification language and the programming
language. We also argue that the same technology can be
used to support the implementation of the verification techniques and the testing techniques as well.
The work on Lutess represents significant advances over
other works on testing reactive software applications. Statistical testing techniques [7] and algebraic specification-based
testing methods [4], developed for sequential programs, have been
applied to the automata which serve as the abstract execution
models of Lustre programs. Further studies are being conducted
to adapt them better to the synchronous approach.
In [16], a mixed approach combining formal verification and
systematic, but not automatic, testing is presented. Lately, Jagadeesan et al. [15] have described a technique and a toolset
for testing reactive software for violation of safety properties. In many ways, this approach is similar to a previous
version of Lutess [20].
Whereas our approach shares some analogies with protocol conformance testing [22], it differs in several respects.
Mainly, the latter can be considered as a verification technique, equivalent to exhaustive testing under certain hypotheses: it checks equivalence of two complete specifications (the formal specification of the protocol and a model of
its implementation). By contrast, our approach has no
verification purposes. It does not require any formal model
of the implementation to be tested; it can even work with
only a partial specification of the system. Consequently, the
scope of industrial application of our technique is larger. But,
using incomplete specifications forbids static generation of
the test cases and forces us to build them at run-time, dynamically.
The purpose of this paper is twofold. First, we want to
demonstrate the theses which are defended with Lutess. Second, through two industrial case studies, we intend to show
the advantages of the testing approach based on Lutess, both
in terms of cost and efficiency. To this end, the paper is structured in five main sections. Section 2 is a short overview
of Lutess. Section 3 surveys all the testing methods implemented in Lutess, from the tester’s viewpoint. Section 4
presents the theoretical limitations of the testing techniques.
In section 5, we explain the test data selection process. Section 6 reports the experience gained in two case studies
undertaken in industrial partnership on telecommunication
problems. The conclusion includes a comparison between
[Figure 1: Lutess. The constrained random generator feeds the unit under test; the oracle observes their exchanges and delivers a verdict.]
Lutess and some related works.
2 TESTING REACTIVE SYSTEMS WITH LUTESS
A Specification-driven Process
An important feature of any reactive system is that it is developed under assumptions about the environment behavior, and, apart from the validation of its robustness, its test
consists of observing its reactions to valid environment behaviors. Clearly, it makes no sense to test a telephone system whose users can dial a number while being on the hook
(inv1), or lift the handset twice in a row without going on
the hook in between (inv2). Since the environment may be
a very complex system, all its valid behaviors must be specified. Therefore, the environment properties constrain the test
data which are to be generated.
Furthermore, a reactive system output often depends on the
history of the software's previous responses to the external
events. These time-dependent behaviors make it difficult to
express the interpretation of test results as sets of input-output
relations. To compute the test results, we conjecture
that the appropriate means is to use temporal properties.
A Highly Automated Process
To thoroughly test a reactive software application and get
confidence in its reliability, the number of input-output relations (test cases) to be managed is very large. This fact and
the complexity of the test data selection activity, as well as
the test result evaluation task, rule out a testing
process based on human involvement. In addition, to easily
take into account the nondeterministic behavior of the environment, the testing must rely on a dynamic generation of
test data sequences.
Lutess: Operational Principle
The operation of Lutess requires three elements: a random
generator, a unit under test (UUT) and an oracle (as shown
in figure 1). Lutess automatically constructs the test harness
which links these three components, coordinates their executions and records the sequences of input-output relations
and the associated oracle verdicts. The three components are
just connected to one another and not linked into a single
executable code.
The constrained random generator is automatically built by
Lutess from specifications written in Lustre and from operational profiles [17] stated partially in Lustre. The specifications correspond to constraints defining the valid environment behaviors and possibly to properties which serve as test
guides. These guiding properties define a subset of the environment behaviors which is considered to be of interest for
the test. This subset contains, for example, data which adequately test safety properties to detect safety violations or
data which lead the UUT into sequences of complex and rare
situations. The operational profiles are provided as conditional probabilities associated with the UUT input variables.
The specifications and the operational profiles are grouped
into a specific syntactic unit, called a testnode. The notion of
testnode has been introduced as a slight extension to Lustre.
The UUT and the oracle are both synchronous and reactive
programs, with boolean inputs and outputs. Optionally, they
can be supplied as Lustre programs. If the oracle is written in
Lustre, it is automatically compiled into an executable code.
As an illustration, let us consider a simple telephony system, providing the Plain Old Telephone Service (POTS).
The environment is composed of a set of telephone devices
(Phones) on which users can perform actions as inputs to the
system. Outputs are signals sent back to the phones. In the
following, RingPh and TalkPh (resp. OnPh and OffPh ) are
predicates on the system outputs (resp. inputs). The testnode
defines the valid environment behaviors as the conjunction of
logical invariants. We give here the Lustre-like expression of
the invariant inv2 presented above (with Ph ∈ Phones):

    once OnPh from pre4 OffPh to OffPh
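The invariant can be mimicked by a small stateful checker. The sketch below is ours (names and encoding assumed, not part of Lutess): it flags a violation when a second off-hook occurs with no on-hook in between.

```python
# Hypothetical checker for inv2: between two successive off-hook events
# (OffPh), an on-hook event (OnPh) must occur. Encoding is ours, not Lutess's.
class OnceFromTo:
    def __init__(self):
        self.waiting = False  # True once an OffPh has occurred with no OnPh yet

    def step(self, on_ph: bool, off_ph: bool) -> bool:
        """Return False at the instant the invariant is violated."""
        if off_ph and self.waiting:
            return False      # second OffPh without an OnPh in between
        if on_ph:
            self.waiting = False
        if off_ph:
            self.waiting = True
        return True
```

For the sequence off-hook, on-hook, off-hook every step returns True; two consecutive off-hooks make the second step return False.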
The oracle groups properties to be satisfied by the system.
Such a property, rather naive, could be: “Anytime a user
lifts the handset of a ringing phone, he/she gets involved
immediately in a communication”. The oracle would then
include the logic expression corresponding to that property,
with Ph ∈ Phones:

    (pre RingPh and OffPh) => TalkPh
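In an executable form, such an oracle is just a synchronous function with one memory cell for the pre operator. A minimal Python sketch, with one phone and predicate names of our own choosing:

```python
# Sketch of the oracle property: (pre RingPh and OffPh) => TalkPh.
# pre_ring memorizes RingPh from the previous cycle (Lustre's `pre`).
class RingAnswerOracle:
    def __init__(self):
        self.pre_ring = False

    def step(self, ring_ph: bool, off_ph: bool, talk_ph: bool) -> bool:
        """Return the verdict for the current cycle (True = property holds)."""
        verdict = (not (self.pre_ring and off_ph)) or talk_ph
        self.pre_ring = ring_ph
        return verdict
```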
Using Lutess, the test is operated on a single action-reaction
cycle, driven by the generator. The generator randomly selects an input vector for the UUT and sends it to the latter. The UUT reacts with an output vector and feeds it back
to the generator. The generator proceeds by producing a new input vector, and the cycle is repeated. The oracle observes the program inputs and outputs, and determines
whether the software specification is violated. The testing
process is stopped when the user-defined length of the test
sequence is reached.
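The cycle just described can be summarized by a few lines of driver code. This is an illustrative sketch with method names we have assumed, not the actual Lutess harness:

```python
# One Lutess-style test sequence: generator -> UUT -> oracle, with the UUT
# outputs fed back to the generator at each cycle.
def run_test_sequence(generator, uut, oracle, length):
    verdicts = []
    for _ in range(length):
        inputs = generator.next_inputs()       # constrained random input vector
        outputs = uut.react(inputs)            # synchronous reaction of the UUT
        verdicts.append(oracle.step(inputs, outputs))
        generator.observe(outputs)             # outputs drive the next choice
    return verdicts
```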
Lutess has a user-friendly interface, which offers the user an
integrated environment:
- to define the testnode, the oracle and the UUT,
- to command the construction of the test harness, to compile Lustre programs, and to build constrained random generators,
- to run the testing process, to set the number and the length of the data sequences, and to replay a given sequence with a different oracle,
- to visualize the progression of the testing process and to format the sequences of inputs, outputs and verdicts,
- to abstract global results from the sequences for both testing efficiency analysis and software reliability evaluation.

4 pre i is a Lustre expression which returns the previous value of i.
3 THE TESTING METHODS
The four techniques provided by Lutess are here described
from the user’s viewpoint. Each of them is a form of blackbox testing. As an example, we consider an extension of
the former system which now offers a Call Forwarding on
No Reply feature. This feature allows its subscriber to have
his/her incoming calls redirected when he/she does not answer within a given delay. The feature is dynamically activated/deactivated.
Random Testing by Environment Simulation
Test data are generated only with respect to the environment
constraints. Therefore, the test data selection criterion is the
weakest one can define for synchronous software. The test
data generation is performed in such a manner that the data
distribution is uniform.
Empirical observations
For complex systems, a uniform distribution is far from the
reality. Indeed, on some test runs, we have noted that some
users stayed off the hook for long periods of time after receiving a Busy Line indication, while in reality, they would
have quickly gone on the hook. Similarly, many observed
behaviors consisted of simply alternating between going off and on
the hook, without trying to perform any action in between,
which is not a typical behavior. We also noticed that, on the
whole, every user tried to call himself/herself as often as any
other user. In the real world, such a behavior occurs very
seldom.
Operational Profile-based Testing
This method gives a way to assess software reliability since
the generation process can take into account the specifications of operational profiles [17], i.e. probabilities assigned
to the input vectors. The practical problem with the operational profiles is that the tester should define them completely. Usually, achieving such a goal is a useless effort for
two reasons. In practice, the specifications are most often
incomplete. Furthermore, the user has only partial knowledge of the environment characteristics, and grasps
each input variable better individually.
To bypass this drawback, Lutess offers facilities to define in
the testnode a multiple probability distribution [23] in terms
of conditional probabilities associated with the UUT input
variables [9]. The variables which have no associated conditional probabilities are assumed to be uniformly distributed.
The conditions are Lustre expressions.
An algorithm is implemented in Lutess to automatically
translate a set of conditional probabilities into an operational
profile (and vice versa).
Empirical observations
Regarding the last unrealistic aspect mentioned just above,
one would like to set probabilities so that the number dialed
depends on the user who is dialing (x). For instance, if we
consider 4 users in the environment, x has a basic probability of 1/4 to dial his/her own number. Thanks to the operational profile-based method, we unbalanced this distribution
by formally specifying that “if x dials a number, he/she dials
his/her own number once out of 10 times, and three times out
of ten the number of every other user.”
Property-oriented Testing
While reliability is concerned with every possible fault,
safety is only concerned with those which can lead the software to a mishap. Thus, the property-oriented testing method
is aimed at selecting test data which facilitate the detection
of safety property violations. This test may be performed
regardless of any input distribution.
More generally, at each cycle, this method automatically
generates values which are the most liable to cause an instantaneous failure with respect to a conjunction of formulae.
Each formula is reducible to a safety property (functional
property as well as liveness property in bounded time). Let
us consider the simple property P: i => o, where i (resp. o)
is an input (resp. output) of the UUT. When i is false, the
UUT cannot falsify P. Hence, the only value for i which
adequately tests P is true.
Note that the use of the property-oriented method is not exclusive: it is usually applied with test data which satisfy the
environment constraints. Input values which are relevant to
the considered properties are favored over the values only
associated with the environment.
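A toy version of this selection policy, in Python with names of our own choosing: among the environment-valid input vectors, those able to falsify the property instantaneously are preferred.

```python
import random

# Favor inputs that adequately test a property; fall back to any valid input.
def choose_input(valid_inputs, adequately_tests):
    adequate = [v for v in valid_inputs if adequately_tests(v)]
    return random.choice(adequate if adequate else valid_inputs)

# For P: i => o, only i = true can expose a violation of P.
picked = choose_input([{"i": False}, {"i": True}], lambda v: v["i"])
```

Here picked is always {"i": True}, mirroring the discussion of P above.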
Empirical observations
One safety-like property of the telephony system is that the
user’s phone goes back to its idle state every time its user
goes on the hook. Driving the generation with such a property led to favoring the considered action, thus improving
the tester’s confidence in the system’s reaction to this input. However, this resulted in every user tending to go on
the hook as soon as possible; thus, many behaviors which
are more realistic were never produced nor tested.
Behavioral Pattern-based Testing
As complexity grows, reasonable behaviors for the environment may reduce to a small part of all possible ones with
respect to the constraints. Some interesting features of a system may not be tested efficiently since their observation may
require sequences of actions which are too long and complex
to be randomly frequent.
The behavioral pattern-based method aims at guiding further the input generation so that the most interesting sequences are produced. A behavioral pattern characterizes
those sequences by listing the actions to be produced (e.g.
CFon(A,B) in fig. 2), as well as the conditions that should
    CFon(A,B) [not CFoff(A)] CFon(B,C) [not CFoff(A/B)] CFon(C,D) [not CFoff(A/B/C)] Dial(E,A)
Plain conditions are actions to be produced, while bracketed conditions are interval conditions.
Figure 2: Example of a behavioral pattern
hold between two successive actions (not CFoff(A) in fig. 2).
Regarding input data generation, all sequences matching the
pattern are favored and get a higher chance to occur. To this
end, desirable actions appearing in the pattern are preferred,
while inputs that do not satisfy interval conditions get a lower
chance of being chosen. Unlike the constraints, these additional
guidelines are not to be strictly enforced. As a result, all
valid behaviors are still possible, while the more reasonable
ones are more frequent. The model of the environment is
thus more “realistic”. The generation method is usually invoked with environment constrained test data. Patterns are
stated using graphical notations; Lutess automatically translates them into Lustre expressions.
Empirical observations
To avoid loops in the forwarding, the specification of the
CFNR feature requires that no more than 2 redirections are
ever performed on a single call in a row. When checking
what could happen in the case of more than 2 redirections,
we noticed that this situation had little chance to occur. On
the contrary, using a pattern proved to increase the likelihood
of this situation within shorter test sequences. Figure 2
shows such a pattern. CFon(x,y) means that user x activates
the CFNR feature to forward calls towards y; CFoff(x) (resp.
CFoff(x/y)) stands for the deactivation of the CFNR feature
for x (resp. x or y).
4 THEORETICAL LIMITATIONS
This section provides a formal framework for the testing
methods, in order to show explicitly their applicability. The
various forms of constrained random generators are formally
expressed in terms of reactive I/O machines.
In the following, for any set X of boolean variables, V_X denotes the set of values of the variables in X. x ∈ V_X is an
assignment of values to all variables in X.
Definition 1 A reactive I/O machine M is a 6-tuple
(Q, q_init, A, B, t, out) where
- Q is a finite set of states,
- q_init ∈ Q is the initial state,
- A is a set of input variables,
- B is a set of output variables,
- t : Q × V_A × V_B → Q is the (total) transition function,
- out : Q → V_B is the output function,
- ∀q ∈ Q, ∀a ∈ V_A, ∃b ∈ V_B, ∃q' ∈ Q, t(q, a, b) = q'
(M is reactive).
A reactive machine is never blocked: in every state, whatever the input is, a new output can be computed to enable a
transition. In response to a sequence of inputs (a_1, ..., a_n),
a reactive I/O machine emits the sequence (b_1, ..., b_n) while
going through the sequence of states (q_init, q_1, ..., q_{n-1}) with
q_k = t(q_{k-1}, a_k, b_k).
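Assuming t and out are given as Python functions, the run just described can be sketched as:

```python
# Run a reactive I/O machine on an input sequence: at each step the machine
# emits b_k = out(q_{k-1}) and then moves to q_k = t(q_{k-1}, a_k, b_k).
def run(t, out, q_init, inputs):
    q, outputs = q_init, []
    for a in inputs:
        b = out(q)          # output computed in the current state
        outputs.append(b)
        q = t(q, a, b)      # transition on the (input, output) pair
    return outputs
```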
Formal Definition of an Environment Simulator
The abstraction of an environment simulator is an I/O machine which has to be reactive, and whose behavior is specific since input and output operations take place in reverse
order: it begins with an output operation instead of an input.
With the same notation as in definition 1, it is sketched as
in (B-1). It can be rewritten from the UUT point of view, in
terms of its inputs (i_k) and outputs (o_k), as in (B-2).

    b_1 := out(q_init)                      (B-1)
    For k = 1, 2, ...:
        read(a_k)
        q_k := t(q_{k-1}, a_k, b_k)
        b_{k+1} := out(q_k)

    i_1 := out(q_init)                      (B-2)
    For k = 1, 2, ...:
        read(o_k)
        q_k := t(q_{k-1}, o_k, i_k)
        i_{k+1} := out(q_k)
Definition 2 An environment simulator (or a generating machine) is a reactive I/O machine M_env =
(Q, q_init, O, I, t_env, out_env) where
- O (resp. I) is the set of the UUT output (resp. input) variables,
- Q is the set of all possible environment states. A state q is an assignment of values to all variables in L, I and O (L being the set of the testnode local variables),
- t : Q × V_O × V_I → Q is the total transition function,
- env ⊆ Q × V_I represents the environment specification,
- t_env : Q × V_O × V_I → Q is the (possibly partial) transition function constrained by env; t_env(q, o, i) is defined and is equal to t(q, o, i) iff (q, i) ∈ env,
- out_env : Q \ {q | ¬∃i, (q, i) ∈ env} → V_I is the output function. It is a nondeterministic function that computes S_env(q), the set of all possible UUT inputs, then selects an element from S_env(q) according to an equally probable distribution.
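The out_env function can be pictured as follows. This is a sketch under our own encoding of env as a predicate; the real implementation works on a BDD, as described in section 5:

```python
import random

# Uniform choice among the inputs allowed by the environment constraint.
def out_env(q, all_inputs, env):
    s_env = [i for i in all_inputs if env(q, i)]   # S_env(q)
    if not s_env:
        raise RuntimeError("blocking state: env is not generating here")
    return random.choice(s_env)
```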
Remark 1: The behavior (B-2) makes it clear that the UUT
inputs cannot depend on the UUT outputs at the same instant.
Since the purpose of the environment constraints env is to
act on the generation process (i.e. on the out function), env
can only depend on the previous values of the output variables. That
justifies the formal definition of env.
Definition 3 Let M_env = (Q, q_init, O, I, t_env, out_env) be a
generating machine and post : Q → 2^Q the function defined by:

    post(q) = {q' | ∃(o, i) ∈ V_O × V_I, t_env(q, o, i) = q'}.

Let post~(q_init) be the image of q_init under the transitive closure of post.
env is generating if

    ∀q ∈ post~(q_init), ∃(i, o, q'), q' = t(q, o, i) ∧ q' ∈ post~(q_init).
Consider a testnode handling two software inputs (i, j),
whose environment constraint is env = pre i and j.
When the associated I/O machine is in a state where
pre i = false, there is no input value for which the constraint holds. Thus, from a formal point of view, env is not
generating. However, if i is always set to true, the machine
would never be blocked. So, from a practical viewpoint, a
constrained random generator is always worth building.
Remark 2: This is why, in terms of implementation, we
don’t try to determine a priori if env is generating. We rather
try to detect blocking situations during the generation.
The means to determine whether env is generating is to compute the set of reachable states, or its complement (i.e., the
set of states leading inevitably to the violation of env). These
computations are based on a least fixed point calculation
which can be impracticable [20, 13, 21].
Formal Definition of a Property-guided Machine
Definition 4 Let M_env = (Q, q_init, O, I, t_env, out_env) be a
generating machine and f_P ⊆ Q × V_O × V_I be a predicate
representing a property P.
A UUT input value i ∈ V_I (adequately) tests P on state
q ∈ Q (adequate_P(i, q)) iff ∃o ∈ V_O, f_P(q, o, i) = false.
The previous definition characterizes input data values that
can facilitate the detection of property violations.
Remark 3: Note that the adequate test data is searched for in
the current state. Thus the technique is limited to an instantaneous guiding. When considering a safety property like
pre i => o, the generator does not discover that setting i to
true will test the property at the following step.
Definition 5 A property-guided machine is defined as
M_P = (Q, q_init, O, I, t_env, out_P) where
- (Q, q_init, O, I, t_env, out_env) is a generating machine,
- P is a conjunction of properties,
- out_P computes both S_env and S_env ∩ adequate_P = {i ∈ V_I | (q, i) ∈ env ∧ adequate_P(i, q)}. If the latter is not empty, a value is selected from that set, otherwise from S_env.
Whenever it is possible to produce an input value which adequately tests the properties, all input values which do not
adequately test the properties are ignored.
Remark 4: Not all the properties can be jointly tested. For
example, i being a software input, to adequately test the
property not i or o, the generator must set i to true. This
prevents it from selecting an adequate data value for the concurrently tested property pre i or o, since pre i will always
be true. Thus, definition 5 does not ensure that all the safety
properties will be adequately tested.
Formal Definition of an Operational Profile-guided Machine
Definition 6 An operational profile-guided machine is defined as G_prof = (Q, q_init, O, I, t_env, out_CPL) where
- (Q, q_init, O, I, t_env, out_env) is a generating machine,
- CPL = (cp_0, cp_1, ..., cp_k) is a list of conditional probabilities. Each cp is a 3-tuple (i, v, f_cp) where i is an input variable (i ∈ I), v is a probability value (v ∈ [0..1]), and f_cp is a condition (f_cp ⊆ Q × V_O × V_I). v denotes the probability that the variable i takes on the value true when the condition f_cp is true,
- out_CPL is such that the selection process is no longer equally probable and depends on the conditional probability list.
When the conditional probability list is empty, the machine
is equivalent to the basic one. The conditional probability list
overrides (possibly partially) the by-default equally probable
distribution of the basic generating machine.
Formal Definition of a Pattern-guided Machine
A behavioral pattern (BP) is made out of alternating and ordered instant conditions and interval conditions. The instant
conditions must be satisfied one after the other as time progresses. Each interval condition shall be continually satisfied
between the two successive instant conditions which border
it. A behavioral pattern characterizes the class of input sequences that match the sequence of conditions.
A behavioral pattern (BP) is built with the following syntax
rule, where a simple predicate (SP) is a Lustre boolean expression which does not include the current outputs:

    BP ::= [SP] SP | [SP] SP BP

The non-bracketed predicates represent the instant conditions,
while the bracketed predicates correspond to interval conditions. [true] CFon(A,B) [not CFoff(A)] CFon(B,C)
is an example of a BP. BPs give a means to partially describe
a sequence: whatever the inputs between two instant conditions, it is sufficient that the interval condition holds.
With a behavioral pattern is associated a progress variable
which indicates what prefix of the BP has been satisfied so
far. To any value this variable can take corresponds a pair of
predicates {inter, cond} which describes the next-to-appear
predicate and the predicate that should continually hold in
the meantime.
Definition 7 A pattern-guided machine is defined as G_pat =
(Q, q_init, O, I, t_env, out_BP, progress) where
- (Q, q_init, O, I, t_env, out_env) is a generating machine,
- BP = [true] cond_0 [inter_1] cond_1 ... cond_{n-1} [inter_n] cond_n,
- progress is an integer variable taking its value over V_progress = [-1, 0..n+1]. It is the progress index on BP.
Let S_H, S_L, S_N : Q × V_progress → 2^(V_I) be sets of input
values defined as, ∀q ∈ Q, ∀j ∈ V_progress:
- S_H(q, j) = {i ∈ V_I | (q, i) ∈ cond_j ∩ env}
- S_L(q, j) = {i ∈ V_I | (q, i) ∈ ¬inter_j ∩ ¬cond_j ∩ env}
- S_N(q, j) = {i ∈ V_I | (q, i) ∈ inter_j ∩ ¬cond_j ∩ env}
Given q and j, the current state and progress values, out_BP
first selects a non-empty set among the above, then performs
the standard value selection within this set. As a side effect, out_BP also computes the next value for progress:
- if S_H(q, progress) is chosen, progress is incremented,
- if S_L(q, progress) is chosen, progress is set to -1,
- if progress = -1 or n+1, progress is reset to 0.
Intuitively, the partition is motivated by the status of the
transitions regarding the progression of the guiding process:
S_H includes all inputs that make the process go forward, S_L
groups those that lead to the process stopping, while S_N
gathers all inputs that do not affect the process.
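The evolution of the progress variable can be sketched as a small step function. The encoding is ours ('H', 'L', 'N' name the chosen category), and we apply the reset of -1 and n+1 to 0 at the beginning of the next step:

```python
# Update the progress index of a pattern-guided machine (Definition 7).
def next_progress(progress, chosen_set, n):
    if progress in (-1, n + 1):   # broken or completed pattern: restart at 0
        progress = 0
    if chosen_set == 'H':
        return progress + 1       # an expected instant condition occurred
    if chosen_set == 'L':
        return -1                 # an interval condition was violated
    return progress               # neutral input: no effect
```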
Remark 5: Definition 7 does not guarantee that the guiding process will lead to the completion of the pattern, i.e. to
generating sequences that match it. Indeed, there may exist a
reachable state for which a progress value makes both S_L
and S_H empty. If the guiding process makes the machine
reach this state, the process can neither progress nor regress any
more and remains quiescent for the remainder of the test.
Many other similar situations may occur that prevent the
pattern from being completed. However, all of them are due to an
incorrect description of the pattern. This description should
therefore be performed with care.
5 TEST DATA SELECTION
The automaton obtained by compiling the environment constraints is coded using a symbolic notation in which the states
are represented by a set of variables, and the transitions by
boolean functions. These functions are implemented as a single Binary Decision Diagram (BDD) [2], extensively used
for verification of reactive systems [13]. Each node of the
diagram carries a variable and each of its outgoing branches
is labelled with the value taken by that variable. The upper variables are those defining the environment state, the
lower are the input variables. Locating the sub-diagram corresponding to a given state is therefore a trivial operation.
The generation algorithm is based on a specific BDD labelling which is carried out once, during the BDD construction. Each node corresponding to an input variable e is labelled with a pair of integers (v0, v1). v0 (resp. v1) indicates the number of distinct valid input vectors in which
e = false (resp. e = true).
Random Testing by Environment Simulation
The basic random generation algorithm produces equally
probable input values. To guarantee an equal probability to all
the valid input vectors, the value of e is set according
to the following probabilities:

    p(e = true) = v1 / (v0 + v1)   and   p(e = false) = v0 / (v0 + v1)
At each cycle, the generator performs four operations:
- locate, in the diagram describing the environment constraints, the sub-diagram corresponding to the current values of the state,
- generate a random value for the software inputs satisfying the boolean function associated with that diagram,
- read the new software outputs,
- compute the next state by computing the next value of each state variable.
In other words, the generator searches the diagram associated with the constraints for a path leading to a true leaf.
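The weighted walk can be illustrated on a toy data structure of our own; the actual Lutess diagrams are shared, reduced BDDs, not the naive tree below:

```python
import random

# Toy labelled BDD node: (v0, v1) count the valid vectors below the false
# and true branches; leaves are plain strings. Walking top-down and taking
# the `true` branch with probability v1/(v0+v1) draws a valid input vector
# uniformly at random.
class Node:
    def __init__(self, var, low, high, v0, v1):
        self.var, self.low, self.high = var, low, high
        self.v0, self.v1 = v0, v1

def sample(node):
    vector = {}
    while isinstance(node, Node):
        take_true = random.random() < node.v1 / (node.v0 + node.v1)
        vector[node.var] = take_true
        node = node.high if take_true else node.low
    return vector

# Constraint `a or b`: 3 valid vectors out of 4.
root = Node('a',
            Node('b', 'dead', 'T', 0, 1),   # a = false: b must be true
            Node('b', 'T', 'T', 1, 1),      # a = true: b is free
            1, 2)
```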
Property-oriented Testing
This technique is implemented by building a new BDD from
the environment constraints and the properties to be tested.
The resulting BDD allows one to check whether a given state
and a given value of the inputs both satisfy the environment
constraints and are liable to exhibit an error with respect to
the properties. The basic algorithm is modified as follows:
- locate, in this latter diagram, the sub-diagram corresponding to the current value of the state,
- check whether there exists at least one value for the inputs which can lead to a true leaf in this diagram,
- if positive, randomly select one of these values; otherwise, perform the basic algorithm.
Operational Profile-based Testing
The generation algorithm uses both the previous BDD labelling and the conditional probability list.
Let CP(e) = ((p1, ce1), (p2, ce2), ..., (pr, cer)) be the list of conditional probabilities associated with the input variable e. In CP(e), pj denotes the probability that the variable e takes the value true when the condition cej is true. The selection function assigns a value to e according to the following probabilities:
p(e = true) = if ce1 then p1 else if ce2 then p2 else ... if cer then pr else v1 / (v0 + v1)
(with v0 and v1 referring to the basic labelling)

p(e = false) = 1 − p(e = true)
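The selection function above can be sketched directly in Python. The representation of CP(e) as a list of (probability, condition) pairs and the condition names are our assumptions.

```python
import random

def profile_value(cond_probs, state, v0, v1, rng=random):
    """Assign a value to input e given its conditional probability list
    CP(e) = [(p1, ce1), ..., (pr, cer)]: the first condition true in the
    current state fixes p(e = true); if none holds, fall back on the
    equiprobable choice v1 / (v0 + v1) given by the basic labelling."""
    p_true = v1 / (v0 + v1)              # default: basic random generation
    for p, cond in cond_probs:
        if cond(state):                  # first matching condition wins
            p_true = p
            break
    return rng.random() < p_true
```

For instance, a profile stating that e is certain when the phone is off-hook and impossible otherwise would be the list [(1.0, offhook), (0.0, always)].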
Behavioral Pattern-based Testing
Given the pattern to be matched, the method drives the generator to consider at every cycle the pair of predicates {inter, cond} corresponding to the current value of the progress variable. At each step, the input space is first computed to get all the possible inputs meeting the environment specification. It is then divided into three categories, SH, SL and SN, as stated in definition 7.
A probability is assigned to each category so that an input in the first one is favored over an input in the third category, which, in turn, is preferred to an input from the second category. These probabilities are determined with respect to the cardinality of each partition and to given weights associated with them: wH, wL and wN. A partition is said to be of higher priority than another if its weight is greater.
The input selection is a two-step process. First, a category is selected according to the determined probabilities. Each category c in C = {SH, SL, SN} has a probability pc of being selected:

pc = wc · card(c) / Σj∈C wj · card(j)
Then, an input is chosen in an equally probable manner from the selected category. As a result, the probability for any input i in c to be chosen is:

pi,c = (1 / card(c)) · pc = wc / Σj∈C wj · card(j)
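The two-step selection can be sketched as follows; representing the categories as explicit lists of inputs is our simplification, since Lutess retrieves the cardinalities from the labelled BDDs instead.

```python
import random

def pattern_guided_choice(s_h, s_l, s_n, w_h, w_l, w_n, rng=random):
    """Two-step selection over the categories SH, SL, SN:
    first pick a category c with probability
        p_c = w_c * card(c) / sum_j w_j * card(j),
    then pick an input uniformly within c."""
    cats = [(s_h, w_h), (s_l, w_l), (s_n, w_n)]
    weights = [w * len(s) for s, w in cats]      # w_c * card(c)
    total = sum(weights)
    r = rng.random() * total                     # point in [0, total)
    for (s, _), wgt in zip(cats, weights):
        if r < wgt:
            return rng.choice(s)                 # uniform within category
        r -= wgt
    return rng.choice(next(s for s, _ in cats if s))  # rounding guard
```

A category with weight zero is never selected, and the weights directly express the priority order among the three partitions.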
The implementation of the algorithm is also based on the
environment BDD. Each predicate in the pattern is represented by a BDD. The predicate BDDs and the environment
BDD are combined to identify the input sets SH , SL and SN .
These BDDs are labelled in exactly the same manner as for the basic generation.
Every generation step therefore involves the traversal of the three diagrams corresponding to the current value of progress. The traversal leads to the sub-diagrams corresponding to the current environment state, where the cardinalities of SH, SL and SN can be retrieved, thanks to the labelling.
selection is then performed with respect to the given weights
and the calculated cardinalities.
6 INDUSTRIAL APPLICATIONS
Lutess's feasibility has been studied in [19] and [20]. Recently, Lutess has shown its practicality and its efficiency on industrial case studies conducted in partnership with France Telecom. This section briefly describes the contribution of the tool, especially regarding its latest improvements.
A first experiment (A) consisted in the validation of a synchronous model of a telephony system [10]. From informal but rigorous descriptions (ITU Recommendations), we developed a synchronous model which proved well adapted to the integration of new services into the system. This experiment concerned five services.
Another experiment (B) addressed twelve services in the framework of the “Feature Interaction Detection Tool Contest” [12] held in association with the 5th Feature Interaction Workshop ’98. The goal was to detect possible and undesired interactions between those services. Lutess won the contest's Best Tool Award.
Problem Complexity
The two aforementioned experiments were of industrial size. The initial specifications occupy between 18 and 26 pages of English text per feature for (A); for (B), each feature was specified with 1 to 5 pages of Chisel diagrams (a requirements definition language for communications services) [1].
The number of inputs/outputs varied from 17/198 (experiment (A)) to 52/216 (experiment (B)). The large number of inputs/outputs is mainly due to the fact that the interface has to be composed of booleans. The inputs are the users' actions and various parameters, while the output vector is made of signals for each user. In the first case, modeling the system required 1700 lines of Lustre code, plus up to 530 lines for each supplementary service. In the second one, these figures amount to 1100 and 550 lines. Environment descriptions included between 32 and 45 constraints, plus up to 8-step patterns or 16 conditional probabilities.
Usage of the Tool
For experiment (B), 78 configurations were to be tested, each including one or two features. In each case, 5 to 10 oracles were available (including both global consistency oracles, e.g. for message sequencing, and feature-specific oracles). The test process for each configuration involved 10 to 20 sequences of 1000 to 10000 steps each. On the whole, each configuration was tested with around one million test cases. The Lutess tool was run over 1500 times. Experiment (A) was smaller, but nevertheless involved a heavier usage of the tool, since the features under test were more complex.
Usability of the Tool
On such large-scale experiments, one benefits greatly from the tool's user-friendliness. Running a test process, modifying the guiding, and replaying the same test case with modified oracles are all one-click commands. It is also possible to modify the UUT and re-run the test process without having to regenerate the BDD associated with the environment. This option saves large amounts of time when debugging.
Time was a critical resource for experiment (B): the contest was conducted in two phases, a first one long enough to let the experimenters adjust the framework and tune the methods, and a second one short enough to assess the tool's efficiency. We successfully managed to complete this phase in the short period of time allocated. This leads us to believe that our tool was well adapted to the problem.
For testing purposes, the subscription lists, which associate the users with the features they can use, are inputs to the system. So, one can test various configurations without having to recompile the UUT. However, some parameters, like the number of users or the priority order set among the features, cannot be modified without recompiling.
From the tester's perspective, the tool brings significant relief by automating the test process. Building the oracle appeared to be the most difficult part of the testing process. One has first to master the temporal logic paradigm, then to find the best terms to express a given property. This requires adequate training and experience.
From the specifier's point of view, Lutess has proven very helpful in debugging specifications. First, Lutess has been used to validate the oracles: the oracle specifications are put in place of the UUT and a human observer is substituted for the Lutess oracle. Second, prior to the search for interactions, the service specifications to be tested were validated using oracle properties. For instance, in the specifications, some possible transitions were missing in a diagram, or an expected output message was never sent in a given situation. These problems were automatically exhibited as oracle violations.
Test Cost
Building the BDD structure corresponding to a given environment is the most time-consuming task of the mechanized part of the testing process. It was always possible to perform this computation and to run the test on a Sparc Ultra-1 station with 128 MB of memory. The maximum virtual memory required amounts to 100 MB. However, as the number of constraints describing the environment increases, the BDD complexity rises and its generation lasts longer. For the least-constrained environments that we produced, 6 seconds of CPU time were necessary, while the most-constrained environments required 33 minutes for the corresponding BDD to be generated. By comparison, a 1000-step test run lasts 120 seconds5, once the BDD has been generated.
Model Adequacy
The choice of a synchronous model turned out to be well adapted to the problems of feature validation and interaction detection. This is due to the fact that a telephony system can be viewed as a reactive system and can easily be designed to satisfy the synchrony hypothesis. To this end, one can for instance imagine a queueing mechanism that stores the inputs arriving during the system computation. Moreover, deriving the feature modeling from the specification was quite simple and almost systematic in the case of experiment (B), where Chisel diagrams were available; it induced no added cost to the test.
The synchronous approach has led to concise validations, thanks to the reduced number of states in the model. In other words, all transitions are observable. The executable model is more abstract and avoids the state space explosion problem. As a consequence, five execution steps are enough to initiate a call, while two steps suffice to terminate a communication. In addition, the UUT can be compiled into naive C code6, instead of a structured and optimized automaton. This prevents state space explosion even further.
Evaluation of the Benefits Brought by the Guiding Techniques
The use of operational profiles or patterns has proven highly profitable when prototyping the application: these techniques allow a quick return on the correctness of the implementation. Then, when it comes to validating the implementation (testing its conformance to the specification), these techniques drive the environment to follow a realistic evolution. Meanwhile, thanks to the probabilistic aspect introduced in both methods, the behaviors of the environment may vary and involve rare and unforeseen scenarios. Such cases, close to the expected behavior yet unexpected, are realistic and thus worth testing.

5 The duration of this phase of the testing process is proportional to the sequence length.
6 The UUT is implemented in C as a single-state automaton.

Figure 3: Behavioral pattern for the ECT feature (sequence of predicates: Talk(A,B) and Hold(A); On(A); Dial(A,C); InvokeEct(A); On(A))
Technique Application
As an example, we describe here a relevant application of the behavioral pattern-based technique. Consider the Explicit Call Transfer (ECT) supplementary service. This service allows the user to put his/her party on hold, to place another call and to transfer this new call to his/her first party (the party being the user to whom one is connected). The user is then no longer in the communication.
To test the feature efficiently, one should exhibit sequences of input data that correspond to its invocation. Such sequences should include the following actions: first, putting one's party on hold (Hold), then placing another call (Dial) and finally performing the feature's invocation (InvokeEct). The intervals between these actions are constrained as well. Such a sequence is obviously too long and complex to occur frequently at random. As a result, performing a test with nothing but the environment specification has led to disappointing results. Experiment shows that in an 8000-step simulation, the sequence appeared only twice, which is clearly poor and of little significance.
Applying the pattern-based guiding with moderate weights drives the desired situation to occur 30 times, which can be viewed as a reasonable use of the service. We also tried to stress-test the feature by unbalancing the weights even further: the number of occurrences rose to 500.
Figure 3 shows the graphical expression of the loosest pattern that achieves the guiding. To increase the guiding, one could have detailed the pattern even further, e.g. by setting conditions on C's state or on B's and C's behaviors in the intervals.
7 CONCLUSION
In this article, we described a work which shares with Dillon and Yu's approach [8] the idea of using a graphical notation to specify temporal properties. However, in [8] the problem of automatic test data generation is not addressed. Like ours, the work presented in [11] guides the testing process and focuses on the validation of some features of the application. Yet, it still requires a complete specification, while ours can cope with a very partial one. In [15], Jagadeesan et al. present a technique and a toolset that represent the work most similar to Lutess. Compared to Lutess, this approach appears to be limited in several respects. The testing process is solely directed towards safety violations and thus finds only errors related to this paradigm. Environment constraints are only taken into account to restrict the size of the input space. Inputs are only selected with uniform weights. The whole process is based on the compilation of the oracle, the application and the test harness into one single executable; recompiling is necessary after each modification, which, according to the authors, caused the greatest dissatisfaction.
We presented Lutess, a highly automated testing environment for synchronous software, and reported two experiments to assess its suitability. The supported approach is both formal-specification-driven and machine-intensive. Thus, human effort can be profitably transferred from the classical tester's chores (selecting the data, determining the result validity) to more defect-prevention tasks, namely developing specifications. Besides, the same language is used to specify the software, its environment, its usages and various testing situations.
The industrial case studies have confirmed that this approach is highly cost-effective. Globally, in an average 20 man-week effort, around 50 properties were stated, 7000 lines of Lustre programs to be tested were written, millions of test cases were automatically generated, the verdicts of thousands of runs were automatically delivered, and around 100 defects were detected. Furthermore, the case studies proved that the guiding techniques were excellent at finding problems involving rare scenarios. This positive experience was reinforced by the valuable application of Lutess in the software specification stage, which helped build confidence in these specifications. All this has certainly contributed to making Lutess the “best tool” of the FIW contest [12].
Experimentation has, however, highlighted the following drawbacks. It has clearly shown that the more detailed the properties, the higher the chances of detecting a problem. However, detailing properties is not always possible, and/or would lead us away from the user's view which, so far, we have tried to favor. One therefore has to find a good balance in the property precision level. In addition, specifying the software environment by means of invariant properties is a rather delicate task. Indeed, one should choose a set of properties which does not “overspecify” the environment. Overspecifying may prevent some realistic environment behaviors from being generated.
The generalization of these results is worth examining, since only the development context of this study makes its external validity questionable. For the time being, our approach is restricted to boolean programs. However, extending it to handle other data types seems feasible, since BDDs can handle numerical constraints. It would, though, require adjustments to the test data selection techniques. Although our work includes both structural testing and black-box testing, we have not yet tried to relate these methods. That would be interesting, in order to provide a measurement of the techniques' coverage. This relation is under study.
REFERENCES
[1] A. Aho, S. Gallagher, N. Griffeth, C. Schell, and D. Swayne. SCF3TM/Sculptor with Chisel: Requirements engineering for communications services. In Feature Interactions in Telecommunications Systems V, pages 45–63. IOS Press, 1998.
[2] S.B. Akers. Binary Decision Diagrams. IEEE Transactions on Computers, C-27:509–516, 1978.
[3] A. Benveniste and G. Berry. The Synchronous Approach to Reactive and Real-Time Systems. Proceedings of the IEEE, 79(9):1270–1282, 1991.
[4] G. Bernot, M.-C. Gaudel, and B. Marre. Software testing based on formal specifications: a theory and a tool. Software Engineering Journal, 6:387–405, 1991.
[5] J. Bicarregui, J. Dick, B. Matthews, and E. Woods. Making the most of formal specification through animation, testing and proof. Science of Computer Programming, 29(1-2), 1997.
[6] P. Caspi, N. Halbwachs, D. Pilaud, and J. Plaice. LUSTRE, a declarative language for programming synchronous systems. In 14th Symposium on Principles of Programming Languages (POPL 87), Munich, pages 178–188. ACM, 1987.
[7] C. Crouzet, Y. Mazuet, and P. Thevenod-Fosse. On statistical structural testing of synchronous data flow programs. In First European Dependable Computing Conference, Berlin, Germany, October 1994.
[8] L. Dillon and Q. Yu. Oracles for checking temporal properties of concurrent systems. Software Engineering Notes, 5(19):140–153, 1994. 2nd ACM SIGSOFT Symposium on Foundations of Software Engineering.
[9] L. du Bousquet, F. Ouabdesselam, and J.-L. Richier. Expressing and implementing operational profiles for reactive software validation. In 9th International Symposium on Software Reliability Engineering, Paderborn, Germany, 1998.
[10] L. du Bousquet, F. Ouabdesselam, J.-L. Richier, and N. Zuanon. Incremental feature validation: a synchronous point of view. In Feature Interactions in Telecommunications Systems V, pages 262–275. IOS Press, 1998.
[11] J.-C. Fernandez, C. Jard, T. Jéron, and C. Viho. An experiment in automatic generation of test suites for protocols with verification technology. Science of Computer Programming, 29:123–146, 1997.
[12] N. Griffeth, R. Blumenthal, J.-C. Gregoire, and T. Ohta. Feature interaction detection contest. In Feature Interactions in Telecommunications Systems V, pages 327–359. IOS Press, 1998.
[13] N. Halbwachs, F. Lagnier, and P. Raymond. Synchronous Observers and the Verification of Reactive Systems. In Third Int. Conf. on Algebraic Methodology and Software Technology, AMAST'93, Twente. Workshops in Computing, Springer-Verlag, 1993.
[14] D. Hamlet and R. Taylor. Partition Analysis Does Not Inspire Confidence. IEEE Transactions on Software Engineering, pages 1402–1411, December 1990.
[15] L.J. Jagadeesan, A. Porter, C. Puchol, J.C. Ramming, and L. Votta. Specification-based Testing of Reactive Software: Tools and Experiments. In 19th International Conference on Software Engineering, 1997.
[16] M. Müllerburg, L. Holenderski, O. Maffeis, A. Merceron, and M. Morley. Systematic Testing and Formal Verification to Validate Reactive Programs. Software Quality Journal, 4(4), 1995.
[17] J. Musa. Operational Profiles in Software-Reliability Engineering. IEEE Software, pages 14–32, March 1993.
[18] F. Ouabdesselam and I. Parissis. Testing Synchronous Critical Software. In 5th International Symposium on Software Reliability Engineering, Monterey, USA, 1994.
[19] F. Ouabdesselam and I. Parissis. Constructing operational profiles for synchronous critical software. In 6th International Symposium on Software Reliability Engineering, pages 286–293, Toulouse, France, 1995.
[20] I. Parissis and F. Ouabdesselam. Specification-based Testing of Synchronous Software. In 4th ACM SIGSOFT Symposium on the Foundations of Software Engineering, San Francisco, USA, 1996.
[21] P. Ramadge and W. Wonham. Supervisory Control of a Class of Discrete Event Processes. SIAM Journal on Control and Optimization, 25(1):206–230, January 1987.
[22] J. Tretmans. A Formal Approach to Conformance Testing. PhD thesis, University of Twente, Enschede, The Netherlands, 1992.
[23] J. Whittaker. Markov Chain Techniques for Software Testing and Reliability Analysis. PhD thesis, University of Tennessee, 1992.