Lutess: a Specification-driven Testing Environment for Synchronous
Software1
L. du Bousquet
F. Ouabdesselam
J.-L. Richier
N. Zuanon1
LSR-IMAG, BP 72, 38402 St-Martin-d’Hères, France
{ldubousq, ouabdess, richier, zuanon}@imag.fr
ABSTRACT
Several studies have shown that automated testing is a
promising approach to saving significant amounts of time and
money in the reactive software industry. But automated
testing requires a formal framework and adequate means to
generate test data.
In the context of synchronous reactive software, we have
built such a framework and its associated tool, Lutess, to
integrate various well-founded testing techniques. This tool
automatically constructs test harnesses for fully automated
test data generation and verdict return. The generation conforms to different formal descriptions: software environment
constraints, functional and safety-oriented properties to be
satisfied by the software, software operational profiles and
software behavior patterns. These descriptions are expressed
in an extended executable temporal logic. They correspond
to more and more complex test objectives raised by the first
pre-industrial applications of Lutess.
This paper concentrates on the latest development of the tool
and its use in the validation of standard feature specifications
in telephone systems. The four testing techniques which are
coordinated in Lutess' uniform framework are shown to be
well-suited to efficient software testing. The lessons learnt
from the use of Lutess in the context of industrial partnerships are discussed.
Keywords
Automated testing, synchronous reactive software, telecommunications systems, Lustre, operational profiles, behavioral
patterns.
1 INTRODUCTION
Testing receives increasing attention from research teams
working on formal techniques for software specification, development and verification, for several reasons. First, testing
is the only means left to perform the validation of a piece of
software when formal verification is impracticable because
of lack of memory and/or time. Second, in industrial contexts, formal specifications are often incomplete, even if
an important feature may be fully specified. Nevertheless, this specification effort should not be wasted: it can
be profitably exploited through testing. Third, testing brings
a practical solution to the assessment of the specifications
themselves. If the specifications can be made executable,
testing can help one gain confidence in their consistency and relevance.
It can also reveal discrepancies between the specifications and the specifier's intentions.
So, in this context, testing is used jointly with, and in complement to, formal verification [5]. Besides, to be a significant support to validation, the testing techniques must either
provide a basis for reliability analysis [14], or have a well-established error detection capability and a proven cost effectiveness.
Reactive systems provide an appropriate area to promote
testing techniques within a formal framework. Indeed, they
are mainly safety-critical applications which require reliable
testing. A reactive system continuously responds to signals
from its environment, and must satisfy temporal constraints
so that it can capture all the external events of concern.
An interesting subclass of reactive systems is synchronous
software. The synchrony hypothesis [3] states that every reaction of the software application to external events is theoretically instantaneous (actually, short enough to guarantee
that the environment remains invariant during the computation of the reaction). The main merit of the synchronous
approach is that its associated programming languages (e.g.
Lustre [6] or Esterel) have a formal semantics and are
equipped with model checkers for program verification. This
has enabled the successful specification, implementation and
verification of several industrial applications2.
We have investigated several testing techniques dedicated to
the synchronous approach in order to validate either formal
specifications written in Lustre3 or programs against properties also written in Lustre. We have built a tool, Lutess,
1 This work has been partially supported by a contract between CNET-
France Telecom and University Joseph Fourier, #957B043.
2 Flight controllers in Airbus aircraft, nuclear reactor monitors, subway
interlocking and monitoring systems...
3 For the sake of brevity, Lustre is not presented herein.
which embodies four specification-based testing methods
and a structural testing method [18] based on the coverage
of the operator net associated with any Lustre program (this
latter technique is not presented here). The former methods
correspond to a variety of validation goals: testing for functional correctness, for safety and for reliability. They are all
implemented as highly automated processes.
With Lutess, we defend two theses. The first thesis is that a
monoformalism approach can be rich enough to encompass
several levels of languages: the software specification language, the test specification language and the programming
language. We also argue that the same technology can be
used to support the implementation of the verification techniques and the testing techniques as well.
The work on Lutess represents significant advances over
other works on testing reactive software applications. Statistical testing techniques [7] and algebraic specification-based
testing methods [4], developed for sequential programs, have been
applied to the automata which serve as the abstract execution
models of Lustre programs. Further studies are being conducted
to adapt them better to the synchronous approach.
In [16], a mixed approach combining formal verification and
systematic, but not automatic, testing is presented. Lately, Jagadeesan et al. [15] have described a technique and a toolset
for testing reactive software for violation of safety properties. In many ways, this approach is similar to a previous
version of Lutess [20].
Whereas our approach shares some analogies with protocol conformance testing [22], it differs in several respects.
Mainly, the latter can be considered as a verification technique, equivalent to exhaustive testing under certain hypotheses: it checks equivalence of two complete specifications (the formal specification of the protocol and a model of
its implementation). By contrast, our approach has no
verification purposes. It does not require any formal model
of the implementation to be tested; it can even work with
only a partial specification of the system. Consequently, the
scope of industrial application of our technique is larger. But,
using incomplete specifications forbids static generation of
the test cases and forces us to build them at run-time, dynamically.
The purpose of this paper is twofold. First, we want to
demonstrate the theses which are defended with Lutess. Second, through two industrial case studies, we intend to show
the advantages of the testing approach based on Lutess, both
in terms of cost and efficiency. To this end, the paper is structured in five main sections. Section 2 is a short overview
of Lutess. Section 3 surveys all the testing methods implemented in Lutess, from the tester’s viewpoint. Section 4
presents the theoretical limitations of the testing techniques.
In section 5, we explain the test data selection process. Section 6 reports the experience gained in two case studies
undertaken in industrial partnership on telecommunication
problems. The conclusion includes a comparison between
[Figure 1: Lutess. The constrained random generator feeds the unit under test; the oracle observes their exchanges and delivers a verdict.]
Lutess and some related works.
2 TESTING REACTIVE SYSTEMS WITH LUTESS
A Specification-driven Process
An important feature of any reactive system is that it is developed under assumptions about the environment behavior, and, apart from the validation of its robustness, its test
consists of observing its reactions to valid environment behaviors. Clearly, it makes no sense to test a telephone system whose users can dial a number while being on the hook
(inv1), or lift the handset twice in a row without going on
the hook in between (inv2). Since the environment may be
a very complex system, all its valid behaviors must be specified. Therefore, the environment properties constrain the test
data which are to be generated.
Furthermore, a reactive system output often depends on the
history of the software's previous responses to the external
events. These time-dependent behaviors make it difficult to
express the interpretation of test results as sets of input-output
relations. To compute the test results, we conjecture
that the appropriate means is to use temporal properties.
A Highly Automated Process
To thoroughly test a reactive software application and get
confidence in its reliability, the number of input-output relations (test cases) to be managed is very large. This fact and
the complexity of the test data selection activity, as well as
the test result evaluation task, rule out a testing
process based on human involvement. In addition, to easily
take into account the nondeterministic behavior of the environment, the testing must rely on a dynamic generation of
test data sequences.
Lutess: Operational Principle
The operation of Lutess requires three elements: a random
generator, a unit under test (UUT) and an oracle (as shown
in figure 1). Lutess automatically constructs the test harness
which links these three components, coordinates their executions and records the sequences of input-output relations
and the associated oracle verdicts. The three components are
just connected to one another and not linked into a single
executable code.
The constrained random generator is automatically built by
Lutess from specifications written in Lustre and from operational profiles [17] stated partially in Lustre. The specifications correspond to constraints defining the valid environment behaviors and possibly to properties which serve as test
guides. These guiding properties define a subset of the environment behaviors which is considered to be of interest for
the test. This subset contains, for example, data which adequately test safety properties to detect safety violations or
data which lead the UUT into sequences of complex and rare
situations. The operational profiles are provided as conditional probabilities associated with the UUT input variables.
The specifications and the operational profiles are grouped
into a specific syntactic unit, called a testnode. The notion of
testnode has been introduced as a slight extension to Lustre.
The UUT and the oracle are both synchronous and reactive
programs, with boolean inputs and outputs. Optionally, they
can be supplied as Lustre programs. If the oracle is written in
Lustre, it is automatically compiled into an executable code.
As an illustration, let us consider a simple telephony system, providing the Plain Old Telephone Service (POTS).
The environment is composed of a set of telephone devices
(Phones) on which users can perform actions as inputs to the
system. Outputs are signals sent back to the phones. In the
following, RingPh and TalkPh (resp. OnPh and OffPh ) are
predicates on the system outputs (resp. inputs). The testnode
defines the valid environment behaviors as the conjunction of
logical invariants. We give here the Lustre-like expression of
the invariant inv2 presented above (with Ph ∈ Phones):

    once OnPh from pre4 OffPh to OffPh
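The invariant can be mimicked by a small stateful checker. The sketch below is ours (names and encoding assumed, not part of Lutess): it flags a violation when a second off-hook occurs with no on-hook in between.

```python
# Hypothetical checker for inv2: between two successive off-hook events
# (OffPh), an on-hook event (OnPh) must occur. Encoding is ours, not Lutess's.
class OnceFromTo:
    def __init__(self):
        self.waiting = False  # True once an OffPh has occurred with no OnPh yet

    def step(self, on_ph: bool, off_ph: bool) -> bool:
        """Return False at the instant the invariant is violated."""
        if off_ph and self.waiting:
            return False      # second OffPh without an OnPh in between
        if on_ph:
            self.waiting = False
        if off_ph:
            self.waiting = True
        return True
```

For the sequence off-hook, on-hook, off-hook every step returns True; two consecutive off-hooks make the second step return False.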
The oracle groups properties to be satisfied by the system.
Such a property, rather naive, could be: “Anytime a user
lifts the handset of a ringing phone, he/she gets involved
immediately in a communication”. The oracle would then
include the logic expression corresponding to that property,
with Ph ∈ Phones:

    (pre RingPh and OffPh) => TalkPh
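In an executable form, such an oracle is just a synchronous function with one memory cell for the pre operator. A minimal Python sketch, with one phone and predicate names of our own choosing:

```python
# Sketch of the oracle property: (pre RingPh and OffPh) => TalkPh.
# pre_ring memorizes RingPh from the previous cycle (Lustre's `pre`).
class RingAnswerOracle:
    def __init__(self):
        self.pre_ring = False

    def step(self, ring_ph: bool, off_ph: bool, talk_ph: bool) -> bool:
        """Return the verdict for the current cycle (True = property holds)."""
        verdict = (not (self.pre_ring and off_ph)) or talk_ph
        self.pre_ring = ring_ph
        return verdict
```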
Using Lutess, the test is operated on a single action-reaction
cycle, driven by the generator. The generator randomly selects an input vector for the UUT and sends it to the latter. The UUT reacts with an output vector and feeds it back
to the generator. The generator proceeds by producing a new input vector, and the cycle is repeated. The oracle observes the program inputs and outputs, and determines
whether the software specification is violated. The testing
process is stopped when the user-defined length of the test
sequence is reached.
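The cycle just described can be summarized by a few lines of driver code. This is an illustrative sketch with method names we have assumed, not the actual Lutess harness:

```python
# One Lutess-style test sequence: generator -> UUT -> oracle, with the UUT
# outputs fed back to the generator at each cycle.
def run_test_sequence(generator, uut, oracle, length):
    verdicts = []
    for _ in range(length):
        inputs = generator.next_inputs()       # constrained random input vector
        outputs = uut.react(inputs)            # synchronous reaction of the UUT
        verdicts.append(oracle.step(inputs, outputs))
        generator.observe(outputs)             # outputs drive the next choice
    return verdicts
```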
Lutess has a user-friendly interface, which offers the user an
integrated environment:
- to define the testnode, the oracle and the UUT,
- to command the construction of the test harness, to compile Lustre programs, and to build constrained random generators,
- to run the testing process, to set the number and the length of the data sequences, and to replay a given sequence with a different oracle,
- to visualize the progression of the testing process and to format the sequences of inputs, outputs and verdicts,
- to abstract global results from the sequences for both testing efficiency analysis and software reliability evaluation.

4 pre i is a Lustre expression which returns the previous value of i.
3 THE TESTING METHODS
The four techniques provided by Lutess are here described
from the user’s viewpoint. Each of them is a form of blackbox testing. As an example, we consider an extension of
the former system which now offers a Call Forwarding on
No Reply feature. This feature allows its subscriber to have
his/her incoming calls redirected when he/she does not answer within a given delay. The feature is dynamically activated/deactivated.
Random Testing by Environment Simulation
Test data are generated only with respect to the environment
constraints. Therefore, the test data selection criterion is the
weakest one can define for synchronous software. The test
data generation is performed in such a manner that the data
distribution is uniform.
Empirical observations
For complex systems, a uniform distribution is far from the
reality. Indeed, on some test runs, we have noted that some
users stayed off the hook for long periods of time after receiving a Busy Line indication, while in reality, they would
have quickly gone on the hook. Similarly, many observed
behaviors consisted of simply alternating between going off and on
the hook, without trying to perform any action in between,
which is not a typical behavior. We also noticed that, on the
whole, every user tried to call himself/herself as often as any
other user. In the real world, such a behavior occurs very
seldom.
Operational Profile-based Testing
This method gives a way to assess software reliability since
the generation process can take into account the specifications of operational profiles [17], i.e. probabilities assigned
to the input vectors. The practical problem with the operational profiles is that the tester should define them completely. Usually, achieving such a goal is a useless effort for
two reasons. In practice, the specifications are most often
incomplete. Furthermore, the user has only partial knowledge of the environment characteristics, and grasps
each input variable better individually.
To bypass this drawback, Lutess offers facilities to define in
the testnode a multiple probability distribution [23] in terms
of conditional probabilities associated with the UUT input
variables [9]. The variables which have no associated conditional probabilities are assumed to be uniformly distributed.
The conditions are Lustre expressions.
An algorithm is implemented in Lutess to automatically
translate a set of conditional probabilities into an operational
profile (and vice versa).
Empirical observations
Regarding the last unrealistic aspect mentioned just above,
one would like to set probabilities so that the number dialed
depends on the user who is dialing (x). For instance, if we
consider 4 users in the environment, x has a basic probability of 1/4 to dial his/her own number. Thanks to the operational profile-based method, we unbalanced this distribution
by formally specifying that “if x dials a number, he/she dials
his/her own number once out of 10 times, and three times out
of ten the number of every other user.”
Property-oriented Testing
While reliability is concerned with every possible fault,
safety is only concerned with those which can lead the software to a mishap. Thus, the property-oriented testing method
is aimed at selecting test data which facilitate the detection
of safety property violations. This test may be performed
regardless of any input distribution.
More generally, at each cycle, this method automatically
generates values which are the most liable to cause an instantaneous failure with respect to a conjunction of formulae.
Each formula is reducible to a safety property (functional
property as well as liveness property in bounded time). Let
us consider the simple property P: i => o, where i (resp. o)
is an input (resp. output) of the UUT. When i is false, the
UUT cannot falsify P. Hence, the only value for i which
adequately tests P is true.
Note that the use of the property-oriented method is not exclusive: it is usually applied with test data which satisfy the
environment constraints. Input values which are relevant to
the considered properties are favored over the values only
associated with the environment.
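A toy version of this selection policy, in Python with names of our own choosing: among the environment-valid input vectors, those able to falsify the property instantaneously are preferred.

```python
import random

# Favor inputs that adequately test a property; fall back to any valid input.
def choose_input(valid_inputs, adequately_tests):
    adequate = [v for v in valid_inputs if adequately_tests(v)]
    return random.choice(adequate if adequate else valid_inputs)

# For P: i => o, only i = true can expose a violation of P.
picked = choose_input([{"i": False}, {"i": True}], lambda v: v["i"])
```

Here picked is always {"i": True}, mirroring the discussion of P above.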
Empirical observations
One safety-like property of the telephony system is that the
user’s phone goes back to its idle state every time its user
goes on the hook. Driving the generation with such a property led to favoring the considered action, thus improving
the tester’s confidence in the system’s reaction to this input. However, this resulted in every user tending to go on
the hook as soon as possible; thus, many behaviors which
are more realistic were never produced nor tested.
Behavioral Pattern-based Testing
As complexity grows, reasonable behaviors for the environment may reduce to a small part of all possible ones with
respect to the constraints. Some interesting features of a system may not be tested efficiently since their observation may
require sequences of actions which are too long and complex
to be randomly frequent.
The behavioral pattern-based method aims at guiding further the input generation so that the most interesting sequences are produced. A behavioral pattern characterizes
those sequences by listing the actions to be produced (e.g.
CFon(A,B) in fig. 2), as well as the conditions that should
    CFon(A,B) [not CFoff(A)] CFon(B,C) [not CFoff(A/B)] CFon(C,D) [not CFoff(A/B/C)] Dial(E,A)
Plain conditions are actions to be produced, while bracketed conditions are interval conditions.
Figure 2: Example of a behavioral pattern
hold between two successive actions (not CFoff(A) in fig. 2).
Regarding input data generation, all sequences matching the
pattern are favored and get a higher chance to occur. To this
end, desirable actions appearing in the pattern are preferred,
while inputs that do not satisfy interval conditions get a lower
chance of being chosen. Unlike the constraints, these additional
guidelines are not to be strictly enforced. As a result, all
valid behaviors are still possible, while the more reasonable
ones are more frequent. The model of the environment is
thus more “realistic”. The generation method is usually invoked with environment constrained test data. Patterns are
stated using graphical notations; Lutess automatically translates them into Lustre expressions.
Empirical observations
To avoid loops in the forwarding, the specification of the
CFNR feature requires that no more than 2 redirections are
ever performed on a single call in a row. When checking
what could happen in the case of more than 2 redirections,
we noticed that this situation had little chance to occur. On
the contrary, using a pattern proved to increase the likelihood
of this situation within shorter test sequences. Figure 2
shows such a pattern. CFon(x,y) means that user x activates
the CFNR feature to forward calls towards y; CFoff(x) (resp.
CFoff(x/y)) stands for the deactivation of the CFNR feature
for x (resp. x or y).
4 THEORETICAL LIMITATIONS
This section provides a formal framework for the testing
methods, in order to show explicitly their applicability. The
various forms of constrained random generators are formally
expressed in terms of reactive I/O machines.
In the following, for any set X of boolean variables, V_X denotes the set of values of the variables in X. x ∈ V_X is an
assignment of values to all variables in X.
Definition 1 A reactive I/O machine M is a 6-tuple
(Q, q_init, A, B, t, out) where
- Q is a finite set of states,
- q_init ∈ Q is the initial state,
- A is a set of input variables,
- B is a set of output variables,
- t : Q × V_A × V_B → Q is the (total) transition function,
- out : Q → V_B is the output function,
- ∀q ∈ Q, ∀a ∈ V_A, ∃b ∈ V_B, ∃q' ∈ Q, t(q, a, b) = q'
(M is reactive).
A reactive machine is never blocked: in every state, whatever the input is, a new output can be computed to enable a
transition. In response to a sequence of inputs (a_1, ..., a_n),
a reactive I/O machine emits the sequence (b_1, ..., b_n) while
going through the sequence of states (q_init, q_1, ..., q_{n-1}) with
q_k = t(q_{k-1}, a_k, b_k).
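Assuming t and out are given as Python functions, the run just described can be sketched as:

```python
# Run a reactive I/O machine on an input sequence: at each step the machine
# emits b_k = out(q_{k-1}) and then moves to q_k = t(q_{k-1}, a_k, b_k).
def run(t, out, q_init, inputs):
    q, outputs = q_init, []
    for a in inputs:
        b = out(q)          # output computed in the current state
        outputs.append(b)
        q = t(q, a, b)      # transition on the (input, output) pair
    return outputs
```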
Formal Definition of an Environment Simulator
The abstraction of an environment simulator is an I/O machine which has to be reactive, and whose behavior is specific since input and output operations take place in reverse
order: it begins with an output operation instead of an input.
With the same notation as in definition 1, it is sketched as
in (B-1). It can be rewritten from the UUT point of view, in
terms of its inputs (i_k) and outputs (o_k), as in (B-2).

    b_1 := out(q_init)                      (B-1)
    For k = 1, 2, ...:
        read(a_k)
        q_k := t(q_{k-1}, a_k, b_k)
        b_{k+1} := out(q_k)

    i_1 := out(q_init)                      (B-2)
    For k = 1, 2, ...:
        read(o_k)
        q_k := t(q_{k-1}, o_k, i_k)
        i_{k+1} := out(q_k)
Definition 2 An environment simulator (or a generating machine) is a reactive I/O machine M_env =
(Q, q_init, O, I, t_env, out_env) where
- O (resp. I) is the set of the UUT output (resp. input) variables,
- Q is the set of all possible environment states. A state q is an assignment of values to all variables in L, I and O (L being the set of the testnode local variables),
- t : Q × V_O × V_I → Q is the total transition function,
- env ⊆ Q × V_I represents the environment specification,
- t_env : Q × V_O × V_I → Q is the (possibly partial) transition function constrained by env; t_env(q, o, i) is defined and is equal to t(q, o, i) iff (q, i) ∈ env,
- out_env : Q \ {q | ¬∃i, (q, i) ∈ env} → V_I is the output function. It is a nondeterministic function that computes S_env(q), the set of all possible UUT inputs, then selects an element from S_env(q) according to an equally probable distribution.
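The out_env function can be pictured as follows. This is a sketch under our own encoding of env as a predicate; the real implementation works on a BDD, as described in section 5:

```python
import random

# Uniform choice among the inputs allowed by the environment constraint.
def out_env(q, all_inputs, env):
    s_env = [i for i in all_inputs if env(q, i)]   # S_env(q)
    if not s_env:
        raise RuntimeError("blocking state: env is not generating here")
    return random.choice(s_env)
```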
Remark 1: The behavior (B-2) makes it clear that the UUT
inputs cannot depend on the UUT outputs at the same instant.
Since the purpose of the environment constraints env is to
act on the generation process (i.e. on the out function), env
can only depend on the previous values of the output variables. That
justifies the formal definition of env.
Definition 3 Let M_env = (Q, q_init, O, I, t_env, out_env) be a
generating machine and post : Q → 2^Q the function defined by:

    post(q) = {q' | ∃(o, i) ∈ V_O × V_I, t_env(q, o, i) = q'}.

Let post~(q_init) be the image of q_init under the transitive closure of post.
env is generating if

    ∀q ∈ post~(q_init), ∃(i, o, q'), q' = t(q, o, i) ∧ q' ∈ post~(q_init).
Consider a testnode handling two software inputs (i, j),
whose environment constraint is env = pre i and j.
When the associated I/O machine is in a state where
pre i = false, there is no input value for which the constraint holds. Thus, from a formal point of view, env is not
generating. However, if i is always set to true, the machine
would never be blocked. So, from a practical viewpoint, a
constrained random generator is always worth building.
Remark 2: This is why, in terms of implementation, we
don’t try to determine a priori if env is generating. We rather
try to detect blocking situations during the generation.
The means to determine whether env is generating is to compute the set of reachable states, or its complement (i.e., the
set of states leading inevitably to the violation of env). These
computations are based on a least fixed point calculation
which can be impracticable [20, 13, 21].
Formal Definition of a Property-guided Machine
Definition 4 Let M_env = (Q, q_init, O, I, t_env, out_env) be a
generating machine and f_P ⊆ Q × V_O × V_I be a predicate
representing a property P.
A UUT input value i ∈ V_I (adequately) tests P on state
q ∈ Q (adequate_P(i, q)) iff ∃o ∈ V_O, f_P(q, o, i) = false.
The previous definition characterizes input data values that
can facilitate the detection of property violations.
Remark 3: Note that the adequate test data is searched for in
the current state. Thus the technique is limited to an instantaneous guiding. When considering a safety property like
pre i => o, the generator does not discover that setting i to
true will test the property at the following step.
Definition 5 A property-guided machine is defined as
M_P = (Q, q_init, O, I, t_env, out_P) where
- (Q, q_init, O, I, t_env, out_env) is a generating machine,
- P is a conjunction of properties,
- out_P computes both S_env and S_env ∩ adequate_P = {i ∈ V_I | (q, i) ∈ env ∧ adequate_P(i, q)}. If the latter is not empty, a value is selected from that set, otherwise from S_env.
Whenever it is possible to produce an input value which adequately tests the properties, all input values which do not
adequately test the properties are ignored.
Remark 4: Not all the properties can be jointly tested. For
example, i being a software input, to adequately test the
property not i or o, the generator must set i to true. This
prevents it from selecting an adequate data value for the concurrently tested property pre i or o, since pre i will always
be true. Thus, definition 5 does not ensure that all the safety
properties will be adequately tested.
Formal Definition of an Operational Profile-guided Machine
Definition 6 An operational profile-guided machine is defined as G_prof = (Q, q_init, O, I, t_env, out_CPL) where
- (Q, q_init, O, I, t_env, out_env) is a generating machine,
- CPL = (cp_0, cp_1, ..., cp_k) is a list of conditional probabilities. Each cp is a 3-tuple (i, v, f_cp) where i is an input variable (i ∈ I), v is a probability value (v ∈ [0..1]), and f_cp is a condition (f_cp ⊆ Q × V_O × V_I). v denotes the probability that the variable i takes on the value true when the condition f_cp is true,
- out_CPL is such that the selection process is no longer equally probable and depends on the conditional probability list.
When the conditional probability list is empty, the machine
is equivalent to the basic one. The conditional probability list
overrides (possibly partially) the by-default equally probable
distribution of the basic generating machine.
Formal Definition of a Pattern-guided Machine
A behavioral pattern (BP) is made out of alternating and ordered instant conditions and interval conditions. The instant
conditions must be satisfied one after the other as time progresses. Each interval condition shall be continually satisfied
between the two successive instant conditions which border
it. A behavioral pattern characterizes the class of input sequences that match the sequence of conditions.
A behavioral pattern (BP) is built with the following syntax
rule, where a simple predicate (SP) is a Lustre boolean expression which does not include the current outputs:

    BP ::= [SP] SP | [SP] SP BP

The non-bracketed predicates represent the instant conditions,
while the bracketed predicates correspond to interval conditions. [true] CFon(A,B) [not CFoff(A)] CFon(B,C)
is an example of a BP. BPs give a means to partially describe
a sequence: whatever the inputs between two instant conditions, it is sufficient that the interval condition holds.
With a behavioral pattern is associated a progress variable
which indicates what prefix of the BP has been satisfied so
far. To any value this variable can take corresponds a pair of
predicates {inter, cond} which describes the next-to-appear
predicate and the predicate that should continually hold in
the meantime.
Definition 7 A pattern-guided machine is defined as G_pat =
(Q, q_init, O, I, t_env, out_BP, progress) where
- (Q, q_init, O, I, t_env, out_env) is a generating machine,
- BP = [true] cond_0 [inter_1] cond_1 ... cond_{n-1} [inter_n] cond_n,
- progress is an integer variable taking its value over V_progress = [-1, 0..n+1]. It is the progress index on BP.
Let S_H, S_L, S_N : Q × V_progress → 2^(V_I) be sets of input
values defined as, ∀q ∈ Q, ∀j ∈ V_progress:
- S_H(q, j) = {i ∈ V_I | (q, i) ∈ cond_j ∩ env}
- S_L(q, j) = {i ∈ V_I | (q, i) ∈ ¬inter_j ∩ ¬cond_j ∩ env}
- S_N(q, j) = {i ∈ V_I | (q, i) ∈ inter_j ∩ ¬cond_j ∩ env}
Given q and j, the current state and progress values, out_BP
first selects a non-empty set among the above, then performs
the standard value selection within this set. As a side effect, out_BP also computes the next value for progress:
- if S_H(q, progress) is chosen, progress is incremented,
- if S_L(q, progress) is chosen, progress is set to -1,
- if progress = -1 or n+1, progress is reset to 0.
Intuitively, the partition is motivated by the status of the
transitions regarding the progression of the guiding process:
S_H includes all inputs that make the process go forward, S_L
groups those that lead to the process stopping, while S_N
gathers all inputs that do not affect the process.
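The evolution of the progress variable can be sketched as a small step function. The encoding is ours ('H', 'L', 'N' name the chosen category), and we apply the reset of -1 and n+1 to 0 at the beginning of the next step:

```python
# Update the progress index of a pattern-guided machine (Definition 7).
def next_progress(progress, chosen_set, n):
    if progress in (-1, n + 1):   # broken or completed pattern: restart at 0
        progress = 0
    if chosen_set == 'H':
        return progress + 1       # an expected instant condition occurred
    if chosen_set == 'L':
        return -1                 # an interval condition was violated
    return progress               # neutral input: no effect
```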
Remark 5: Definition 7 does not guarantee that the guiding process will lead to the completion of the pattern, i.e. to
generating sequences that match it. Indeed, there may exist a
reachable state for which a progress value makes both S_L
and S_H empty. If the guiding process makes the machine
reach this state, the process can neither progress nor regress any
more and remains quiescent for the remainder of the test.
Many other similar situations may occur that prevent the
pattern from being completed. However, all of them are due to an
incorrect description of the pattern. This description should
therefore be performed with care.
5 TEST DATA SELECTION
The automaton obtained by compiling the environment constraints is coded using a symbolic notation in which the states
are represented by a set of variables, and the transitions by
boolean functions. These functions are implemented as a single Binary Decision Diagram (BDD) [2], extensively used
for verification of reactive systems [13]. Each node of the
diagram carries a variable and each of its outgoing branches
is labelled with the value taken by that variable. The upper variables are those defining the environment state, the
lower are the input variables. Locating the sub-diagram corresponding to a given state is therefore a trivial operation.
The generation algorithm is based on a specific BDD labelling which is carried out once, during the BDD construction. Each node corresponding to an input variable e is labelled with a pair of integers (v0, v1). v0 (resp. v1) indicates the number of distinct valid input vectors in which
e = false (resp. e = true).
Random Testing by Environment Simulation
The basic random generation algorithm produces equally
probable input values. To guarantee an equal probability to all
the valid input vectors, the value of e is set according
to the following probabilities:

    p(e = true) = v1 / (v0 + v1)   and   p(e = false) = v0 / (v0 + v1)
At each cycle, the generator performs four operations:
- locate, in the diagram describing the environment constraints, the sub-diagram corresponding to the current values of the state,
- generate a random value for the software inputs satisfying the boolean function associated with that diagram,
- read the new software outputs,
- compute the next state by computing the next value of each state variable.
In other words, the generator searches the diagram associated with the constraints for a path leading to a true leaf.
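The weighted walk can be illustrated on a toy data structure of our own; the actual Lutess diagrams are shared, reduced BDDs, not the naive tree below:

```python
import random

# Toy labelled BDD node: (v0, v1) count the valid vectors below the false
# and true branches; leaves are plain strings. Walking top-down and taking
# the `true` branch with probability v1/(v0+v1) draws a valid input vector
# uniformly at random.
class Node:
    def __init__(self, var, low, high, v0, v1):
        self.var, self.low, self.high = var, low, high
        self.v0, self.v1 = v0, v1

def sample(node):
    vector = {}
    while isinstance(node, Node):
        take_true = random.random() < node.v1 / (node.v0 + node.v1)
        vector[node.var] = take_true
        node = node.high if take_true else node.low
    return vector

# Constraint `a or b`: 3 valid vectors out of 4.
root = Node('a',
            Node('b', 'dead', 'T', 0, 1),   # a = false: b must be true
            Node('b', 'T', 'T', 1, 1),      # a = true: b is free
            1, 2)
```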
Property-oriented Testing
This technique is implemented by building a new BDD from
the environment constraints and the properties to be tested.
The resulting BDD allows one to check whether a given state
and a given value of the inputs both satisfy the environment
constraints and are liable to exhibit an error with respect to
the properties. The basic algorithm is modified as follows:
- locate, in this latter diagram, the sub-diagram corresponding to the current value of the state,
- check whether there exists at least one value for the inputs which can lead to a true leaf in this diagram,
- if positive, randomly select one of these values; otherwise, perform the basic algorithm.
Operational Profile-based Testing
The generation algorithm uses both the previous BDD labelling and the conditional probability list.
Let CP(e) = ((p1, ce1), (p2, ce2), ..., (pr, cer)) be the list of conditional probabilities associated with the input variable e. In CP(e), pj denotes the probability that the variable e takes the value true when the condition cej is true. The selection function assigns a value to e according to the following probabilities:
p(e = true) = if ce1 then p1 else if ce2 then p2 else ... if cer then pr else v1 / (v0 + v1)
(with v0 and v1 referring to the basic labelling)

p(e = false) = 1 − p(e = true)
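The selection function above can be sketched directly in Python. The representation of CP(e) as a list of (probability, condition) pairs and the condition names are our assumptions.

```python
import random

def profile_value(cond_probs, state, v0, v1, rng=random):
    """Assign a value to input e given its conditional probability list
    CP(e) = [(p1, ce1), ..., (pr, cer)]: the first condition true in the
    current state fixes p(e = true); if none holds, fall back on the
    equiprobable choice v1 / (v0 + v1) given by the basic labelling."""
    p_true = v1 / (v0 + v1)              # default: basic random generation
    for p, cond in cond_probs:
        if cond(state):                  # first matching condition wins
            p_true = p
            break
    return rng.random() < p_true
```

For instance, a profile stating that e is certain when the phone is off-hook and impossible otherwise would be the list [(1.0, offhook), (0.0, always)].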
Behavioral Pattern-based Testing
Given the pattern to be matched, the method drives the generator to consider at every cycle the pair of predicates {inter, cond} corresponding to the current value of the progress variable. At each step, the input space is first computed to get all the possible inputs meeting the environment specification. It is then divided into three categories, SH, SL and SN, as stated in definition 7.
A probability is assigned to each category so that an input in the first one is favored over an input in the third category, which, in turn, is preferred to an input from the second category. These probabilities are determined with respect to the cardinality of each partition and to given weights associated with them: wH, wL and wN. A partition is said to be of higher priority than another if its weight is greater.
The input selection is a two-step process. First, a category is selected according to the determined probabilities. Each category c in C = {SH, SL, SN} has a probability pc of being selected:

pc = wc · card(c) / Σj∈C wj · card(j)
Then, an input is chosen in an equally probable manner from the selected category. As a result, the probability for any input i in c to be chosen is:

pi,c = (1 / card(c)) · pc = wc / Σj∈C wj · card(j)
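The two-step selection can be sketched as follows; representing the categories as explicit lists of inputs is our simplification, since Lutess retrieves the cardinalities from the labelled BDDs instead.

```python
import random

def pattern_guided_choice(s_h, s_l, s_n, w_h, w_l, w_n, rng=random):
    """Two-step selection over the categories SH, SL, SN:
    first pick a category c with probability
        p_c = w_c * card(c) / sum_j w_j * card(j),
    then pick an input uniformly within c."""
    cats = [(s_h, w_h), (s_l, w_l), (s_n, w_n)]
    weights = [w * len(s) for s, w in cats]      # w_c * card(c)
    total = sum(weights)
    r = rng.random() * total                     # point in [0, total)
    for (s, _), wgt in zip(cats, weights):
        if r < wgt:
            return rng.choice(s)                 # uniform within category
        r -= wgt
    return rng.choice(next(s for s, _ in cats if s))  # rounding guard
```

A category with weight zero is never selected, and the weights directly express the priority order among the three partitions.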
The implementation of the algorithm is also based on the
environment BDD. Each predicate in the pattern is represented by a BDD. The predicate BDDs and the environment
BDD are combined to identify the input sets SH , SL and SN .
These BDDs are labelled in exactly the same manner as for the basic generation.
Every generation step therefore involves the traversal of the three diagrams corresponding to the current value of progress. The traversal leads to the sub-diagrams corresponding to the current environment state, where the cardinalities of SH, SL and SN can be retrieved, thanks to the labelling.
selection is then performed with respect to the given weights
and the calculated cardinalities.
6 INDUSTRIAL APPLICATIONS
Lutess's feasibility has been studied in [19] and [20]. Recently, Lutess has shown its practicality and its efficiency on industrial case studies conducted in partnership with France Telecom. This section briefly describes the contribution of the tool, especially regarding its latest improvements.
A first experiment (A) consisted in the validation of a synchronous model of a telephony system [10]. From informal but rigorous descriptions (ITU Recommendations), we developed a synchronous model which proved well adapted to the integration of new services into the system. This experiment concerned five services.
Another experiment (B) addressed twelve services in the framework of the “Feature Interaction Detection Tool Contest” [12] held in association with the 5th Feature Interaction Workshop ’98. The goal was to detect possible and undesired interactions between those services. Lutess won the contest's Best Tool Award.
Problem Complexity
The two aforementioned experiments were of industrial size. The initial specifications occupy between 18 and 26 pages of English text per feature for (A); for (B), each feature was specified with 1 to 5 pages of Chisel diagrams (a requirements definition language for communications services) [1].
The number of inputs/outputs varied from 17/198 (experiment (A)) to 52/216 (experiment (B)). The large number of inputs/outputs is mainly due to the fact that the interface has to be composed of booleans. The inputs are the users' actions and various parameters, while the output vector is made of signals for each user. In the first case, modeling the system required 1700 lines of Lustre code, plus up to 530 lines for each supplementary service. In the second one, these figures amount to 1100 and 550 lines. Environment descriptions included between 32 and 45 constraints, plus up to 8-step patterns or 16 conditional probabilities.
Usage of the Tool
For experiment (B), 78 configurations were to be tested, each including one or two features. In each case, 5 to 10 oracles were available (including both global consistency oracles, e.g. for message sequencing, and feature-specific oracles). The test process for each configuration involved 10 to 20 sequences of 1000 to 10000 steps each. On the whole, each configuration was tested with around one million test cases. The Lutess tool was run over 1500 times. Experiment (A) was smaller, but nevertheless involved a heavier usage of the tool, since the features under test were more complex.
Usability of the Tool
On such large-scale experiments, one benefits greatly from the tool's user-friendliness. Running a test process, modifying the guiding, and replaying the same test case with modified oracles are all one-click commands. It is also possible to modify the UUT and re-run the test process without having to regenerate the BDD associated with the environment. This option saves large amounts of time when debugging.
Time was a critical resource for experiment (B): the contest was conducted in two phases, a first one long enough to let the experimenters adjust the framework and tune the methods, and a second one short enough to assess the tool's efficiency. We successfully managed to complete this phase in the short period of time allocated. This leads us to believe that our tool was well adapted to the problem.
For testing purposes, the subscription lists, which associate the users with the features they can use, are inputs to the system. So, one can test various configurations without having to recompile the UUT. However, some parameters, like the number of users or the priority order set among the features, cannot be modified without recompiling.
From the tester's perspective, the tool brings significant relief by automating the test process. Building the oracle appeared to be the most difficult part of the testing process. One has first to master the temporal logic paradigm, then to find the best terms to express a given property. This requires adequate training and experience.
From the specifier's point of view, Lutess has proven very helpful in debugging specifications. First, Lutess has been used to validate the oracles: the oracle specifications are put in place of the UUT and a human observer is substituted for the Lutess oracle. Second, prior to the search for interactions, the service specifications to be tested were validated using oracle properties. For instance, in the specifications, some possible transitions were missing in a diagram, or an expected output message was never sent in a given situation. These problems were automatically exhibited as oracle violations.
Test Cost
Building the BDD structure corresponding to a given environment is the most time-consuming task of the mechanized part of the testing process. It was always possible to perform this computation and to run the test on a Sparc Ultra-1 station with 128 MB of memory. The maximum virtual memory required amounts to 100 MB. However, as the number of constraints describing the environment increases, the BDD complexity rises and its generation lasts longer. For the least-constrained environments that we produced, 6 seconds of CPU time were necessary, while the most-constrained environments required 33 minutes for the corresponding BDD to be generated. By comparison, a 1000-step test run lasts 120 seconds5, once the BDD has been generated.
Model Adequacy
The choice of a synchronous model turned out to be well adapted to the problems of feature validation and interaction detection. This is due to the fact that a telephony system can be viewed as a reactive system and can easily be designed to satisfy the synchrony hypothesis. To this end, one can for instance imagine a queueing mechanism that stores the inputs arriving during the system computation. Moreover, deriving the feature modeling from the specification was quite simple and almost systematic in the case of experiment (B), where Chisel diagrams were available; it induced no added cost to the test.
The synchronous approach has led to concise validations, thanks to the reduced number of states in the model. In other words, all transitions are observable. The executable model is more abstract and avoids the state space explosion problem. As a consequence, five execution steps are enough to initiate a call, while two steps suffice to terminate a communication. In addition, the UUT can be compiled into naive C code6, instead of a structured and optimized automaton. This prevents state space explosion even further.
Evaluation of the Benefits Brought by the Guiding Techniques
The use of operational profiles or patterns has proven highly profitable when prototyping the application: these techniques allow a quick return on the correctness of the implementation. Then, when it comes to validating the implementation (testing its conformance to the specification), these techniques drive the environment to follow a realistic evolution. Meanwhile, thanks to the probabilistic aspect introduced in both methods, the behaviors of the environment may vary and involve rare and unforeseen scenarios. Such cases, close to the expected behavior yet unexpected, are realistic and thus worth testing.

5 The duration of this phase of the testing process is proportional to the sequence length.
6 The UUT is implemented in C as a single-state automaton.

Figure 3: Behavioral pattern for the ECT feature (sequence of predicates: Talk(A,B) and Hold(A); On(A); Dial(A,C); InvokeEct(A); On(A))
Technique Application
As an example, we describe here a relevant application of the behavioral pattern-based technique. Consider the Explicit Call Transfer (ECT) supplementary service. This service allows the user to put his/her party on hold, to place another call and to transfer this new call to his/her first party (the party being the user to whom one is connected). The user is then no longer in the communication.
To test the feature efficiently, one should exhibit sequences of input data that correspond to its invocation. Such sequences should include the following actions: first, putting one's party on hold (Hold), then placing another call (Dial) and finally performing the feature's invocation (InvokeEct). The intervals between these actions are constrained as well. Such a sequence is obviously too long and complex to occur frequently at random. As a result, performing a test with nothing but the environment specification has led to disappointing results. Experiment shows that in an 8000-step simulation, the sequence appeared only twice, which is clearly poor and of little significance.
Applying the pattern-based guiding with moderate weights drives the desired situation to occur 30 times, which can be viewed as a reasonable use of the service. We also tried to stress-test the feature by unbalancing the weights even further: the number of occurrences rose to 500.
Figure 3 shows the graphical expression of the loosest pattern that achieves the guiding. To increase the guiding, one could have detailed the pattern even further, e.g. by setting conditions on C's state or on B's and C's behaviors in the intervals.
7 CONCLUSION
In this article, we described a work which shares with Dillon and Yu's approach [8] the idea of using a graphical notation to specify temporal properties. However, in [8] the problem of automatic test data generation is not addressed. Like ours, the work presented in [11] guides the testing process and focuses on the validation of some features of the application. Yet, it still requires a complete specification, while ours can cope with a very partial one. In [15], Jagadeesan et al. present a technique and a toolset that represent the work most similar to Lutess. Compared to Lutess, this approach appears to be limited in several respects. The testing process is solely directed towards safety violations and thus finds only errors related to this paradigm. Environment constraints are only taken into account to restrict the size of the input space. Inputs are only selected with uniform weights. The whole process is based on the compilation of the oracle, the application and the test harness into one single executable; recompiling is necessary after each modification, which, according to the authors, caused the greatest dissatisfaction.
We presented Lutess, a highly automated testing environment for synchronous software, and reported two experiments to assess its suitability. The supported approach is both formal-specification-driven and machine-intensive. Thus, human effort can be profitably transferred from the classical tester's chores (selecting the data, determining the result validity) to more defect-prevention tasks, namely developing specifications. Besides, the same language is used to specify the software, its environment, its usages and various testing situations.
The industrial case studies have confirmed that this approach is highly cost-effective. Globally, in an average 20 man-week effort, around 50 properties were stated, 7000 lines of Lustre programs to be tested were written, millions of test cases were automatically generated, the verdicts of thousands of runs were automatically delivered, and around 100 defects were detected. Furthermore, the case studies proved that the guiding techniques were excellent at finding problems involving rare scenarios. This positive experience was reinforced by the valuable application of Lutess in the software specification stage, which helped build confidence in these specifications. All this has certainly contributed to making Lutess the “best tool” of the FIW contest [12].
Experimentation has, however, highlighted the following drawbacks. It has clearly shown that the more detailed the properties, the higher the chances of detecting a problem. However, detailing properties is not always possible, and/or would lead us away from the user's view which, so far, we have tried to favor. One therefore has to find a good balance in the property precision level. In addition, specifying the software environment by means of invariant properties is a rather delicate task. Indeed, one should choose a set of properties which does not “overspecify” the environment. Overspecifying may prevent some realistic environment behaviors from being generated.
The generalization of these results is worth examining, since only the development context of this study makes its external validity questionable. For the time being, our approach is restricted to boolean programs. However, extending it to handle other data types seems feasible, since BDDs can handle numerical constraints. It would, though, require adjustments to the test data selection techniques. Although our work includes both structural testing and black-box testing, we have not yet tried to relate these methods. That would be interesting, in order to provide a measurement of the techniques' coverage. This relation is under study.
REFERENCES
[1] A. Aho, S. Gallagher, N. Griffeth, C. Schell, and D. Swayne. SCF3TM/Sculptor with Chisel: Requirements engineering for communications services. In Feature Interactions in Telecommunications Systems V, pages 45–63. IOS Press, 1998.
[2] S.B. Akers. Binary Decision Diagrams. IEEE Transactions on Computers, C-27:509–516, 1978.
[3] A. Benveniste and G. Berry. The Synchronous Approach to Reactive and Real-Time Systems. Proceedings of the IEEE, 79(9):1270–1282, 1991.
[4] G. Bernot, M.-C. Gaudel, and B. Marre. Software testing based on formal specifications: a theory and a tool. Software Engineering Journal, 6:387–405, 1991.
[5] J. Bicarregui, J. Dick, B. Matthews, and E. Woods. Making the most of formal specification through animation, testing and proof. Science of Computer Programming, 29(1-2), 1997.
[6] P. Caspi, N. Halbwachs, D. Pilaud, and J. Plaice. LUSTRE, a declarative language for programming synchronous systems. In 14th Symposium on Principles of Programming Languages (POPL 87), Munich, pages 178–188. ACM, 1987.
[7] C. Crouzet, Y. Mazuet, and P. Thevenod-Fosse. On statistical structural testing of synchronous data flow programs. In First European Dependable Computing Conference, Berlin, Germany, October 1994.
[8] L. Dillon and Q. Yu. Oracles for checking temporal properties of concurrent systems. Software Engineering Notes, 5(19):140–153, 1994. 2nd ACM SIGSOFT Symposium on Foundations of Software Engineering.
[9] L. du Bousquet, F. Ouabdesselam, and J.-L. Richier. Expressing and implementing operational profiles for reactive software validation. In 9th International Symposium on Software Reliability Engineering, Paderborn, Germany, 1998.
[10] L. du Bousquet, F. Ouabdesselam, J.-L. Richier, and N. Zuanon. Incremental feature validation: a synchronous point of view. In Feature Interactions in Telecommunications Systems V, pages 262–275. IOS Press, 1998.
[11] J.-C. Fernandez, C. Jard, T. Jéron, and C. Viho. An experiment in automatic generation of test suites for protocols with verification technology. Science of Computer Programming, 29:123–146, 1997.
[12] N. Griffeth, R. Blumenthal, J.-C. Gregoire, and T. Ohta. Feature interaction detection contest. In Feature Interactions in Telecommunications Systems V, pages 327–359. IOS Press, 1998.
[13] N. Halbwachs, F. Lagnier, and P. Raymond. Synchronous Observers and the Verification of Reactive Systems. In Third Int. Conf. on Algebraic Methodology and Software Technology, AMAST'93, Twente. Workshops in Computing, Springer-Verlag, 1993.
[14] D. Hamlet and R. Taylor. Partition Analysis Does Not Inspire Confidence. IEEE Transactions on Software Engineering, pages 1402–1411, December 1990.
[15] L.J. Jagadeesan, A. Porter, C. Puchol, J.C. Ramming, and L. Votta. Specification-based Testing of Reactive Software: Tools and Experiments. In 19th International Conference on Software Engineering, 1997.
[16] M. Müllerburg, L. Holenderski, O. Maffeis, A. Merceron, and M. Morley. Systematic Testing and Formal Verification to Validate Reactive Programs. Software Quality Journal, 4(4), 1995.
[17] J. Musa. Operational Profiles in Software-Reliability Engineering. IEEE Software, pages 14–32, March 1993.
[18] F. Ouabdesselam and I. Parissis. Testing Synchronous Critical Software. In 5th International Symposium on Software Reliability Engineering, Monterey, USA, 1994.
[19] F. Ouabdesselam and I. Parissis. Constructing operational profiles for synchronous critical software. In 6th International Symposium on Software Reliability Engineering, pages 286–293, Toulouse, France, 1995.
[20] I. Parissis and F. Ouabdesselam. Specification-based Testing of Synchronous Software. In 4th ACM SIGSOFT Symposium on the Foundations of Software Engineering, San Francisco, USA, 1996.
[21] P. Ramadge and W. Wonham. Supervisory Control of a Class of Discrete Event Processes. SIAM Journal on Control and Optimization, 25(1):206–230, January 1987.
[22] J. Tretmans. A Formal Approach to Conformance Testing. PhD thesis, University of Twente, Enschede, The Netherlands, 1992.
[23] J. Whittaker. Markov Chain Techniques for Software Testing and Reliability Analysis. PhD thesis, University of Tennessee, 1992.