

Static Analysis of IMC

2012, The Journal of Logic and Algebraic Programming

The Journal of Logic and Algebraic Programming 81 (2012) 522–540

Static Analysis of IMC

Nataliya Skrypnyuk a,∗, Flemming Nielson a, Henrik Pilegaard b

a Informatics and Mathematical Modelling, Richard Petersens Plads Bldg. 321, Technical University of Denmark, DK-2800 Kongens Lyngby, Denmark
b Kapow Software, Dr. Neergaards Vej 5A, DK-2970 Hørsholm, Denmark

Article history: Available online 21 March 2012

ABSTRACT

The process algebra formalism is highly suitable for producing succinct descriptions of reactive concurrent systems. Process algebras make it possible to represent such systems in a compositional way, as processes that run in parallel and interact, for example, through synchronisation or message passing. On the other hand, checking properties of process algebraic descriptions is often hard, while "unfolding" them into Labelled Transition Systems can lead to the infamous state space explosion problem. In this work we use a subtype of Data Flow Analysis on systems defined by finite-state process algebras with CSP-type synchronisation – in particular, on our variant of IMC with a more permissive syntax, i.e. with the possibility to start a bounded number of new processes. We prove that the defined Pathway Analysis captures all the properties of the systems, i.e. is precise. The results of the Pathway Analysis can therefore be used as an intermediate representation format, which is more concise than the Labelled Transition System with all the states explicitly represented, and more suitable for devising efficient verification algorithms for concurrent systems than their process algebraic descriptions – see, for example, the reachability algorithm in Skrypnyuk and Nielson (2011) [17].

© 2012 Elsevier Inc. All rights reserved.

Keywords: Process algebras; Data Flow Analysis; Pathway Analysis; IMC
1. Introduction

Many interesting concurrent systems can be modelled by means of process algebras (also called process calculi). The latter provide operators that make it possible to specify systems in a compositional way, by building more complex systems out of simple ones. The syntax of process algebras is particularly suitable for specifying subsystems and the interactions between them separately, leading to concise and compositional syntactic descriptions.

We are usually interested in some qualitative or quantitative information about the systems – for example, in questions concerning safety or liveness properties, performance, dependability guarantees, etc. The usual method of answering such questions is to "unfold" a process description in the syntax of the corresponding process algebra into a Labelled Transition System (LTS). The subsystems distinguishable in the process algebra are no longer recognisable in the LTS; in return, all the behaviour of the system is explicitly represented. Consequently, model-checking algorithms [2] or other kinds of analysis are run on the constructed LTS in order to verify whether the system has the required properties. One potential problem with this approach is the infamous state space explosion problem: a small description of a process in the corresponding algebra's syntax can give rise to a large or even infinite LTS.

Another possible approach is to use so-called Static Analysis techniques to analyse the system's syntax directly. Static Analysis was initially developed in the field of imperative programming languages. It was often used in order to check for errors in a program without actually executing it, purely by analysing the code (see [7] for a detailed description of different Static Analysis techniques). Several Static Analysis techniques have been adapted to a variety of process algebras with a similar purpose – to deduce properties of the system's behaviour directly from the syntax, thus avoiding building the full LTS. In particular, Control Flow and Data Flow Analysis have been applied to the calculi of CCS, BioAmbients, the π-calculus, etc. (see [6,9–11,13]).

In this work we will use the Data Flow Analysis method – see, for example, [5] for the theoretical background of Monotone Data Flow Analysis Frameworks. Data Flow Analysis can be computed in the size of the syntax of the program or process algebraic expression to which it is applied, and for process algebras it can be computed in a compositional way, i.e. after adding a new parallel process to the system the previously computed analysis results can be reused. We call our method Pathway Analysis in order to distinguish it from classical Data Flow Analysis. The name has been chosen in relation to the previous work of two of the authors of this paper [8,13], where similar methods were applied to the BioAmbients calculus in order to compute all possible pathways in a cell.

In the previous work (e.g. [10,11]) Data Flow Analysis was used on process algebraic expressions in order to construct finite systems that would simulate original systems that might have an infinite state space. In the current work we have a different purpose. We intend, first, to show that Data Flow Analysis is also applicable to a stochastic process algebra and to a process algebra with CSP-style synchronisation (see [4] for the definition of the CSP calculus), i.e. to the algebra of Interactive Markov Chains, or IMC (see [1]).

∗ Corresponding author. E-mail addresses: nsk@imm.dtu.dk (N. Skrypnyuk), nielson@imm.dtu.dk (F. Nielson), henrik@pilegaard.org (H. Pilegaard).
doi:10.1016/j.jlap.2012.03.011
We show how to employ a more permissive syntax, with a bounded number of new processes that can be started, and still guarantee the finiteness of the system's state space (through checking the well-formedness conditions). Moreover, we prove that the Pathway Analysis operators deliver results on IMC systems that are stable under any number of transitions. They thus represent an alternative and often more convenient format for describing concurrent systems with CSP-style synchronisation (recursion unfolding is not necessary). In particular, the algorithms for computing an overapproximation of reachable states (see [17]), as well as for computing bisimulation relations and average time reachability (see the Ph.D. thesis [16]), have been devised based on the precomputed Pathway Analysis.

We present the syntax and the Structural Operational Semantics of our variant of IMC (so-called "guarded" IMC, or IMCG) in Section 2. Our analysis is applied to "well-behaved" IMCG expressions, called IMCG programs. The conditions for IMCG expressions to be considered programs, in particular the well-formedness condition, are described in Section 3. In Section 4 we introduce a number of operators on the syntax of IMCG that either have been adopted for IMCG from the previous work (see, for example, [10]) or have been defined anew. The main theoretical results concerning the correctness of the proposed Pathway Analysis and its stability under any number of transitions are proved in Section 5. In order to illustrate that the Pathway Analysis results are still valid after any number of transitions, we present the so-called Worklist algorithm in Section 5, which, based on the Pathway Analysis results, builds a transition system strongly bisimilar to the transition system induced by the analysed IMCG program (the Worklist algorithm merges only those states that differ solely in the process definitions copied into the syntax).
The paper is concluded with a discussion of the perspectives of the presented method in Section 6. The proofs of all the lemmas are presented in Appendix A. This work is part of the Ph.D. thesis of the first author ([16]), with the notation and proofs simplified in the current work, and has been presented as work in progress at the NWPT workshop [18]. As already mentioned, some of the introduced operators on the syntax of IMC are based on the previous work. The well-formedness conditions for IMC are inspired by the well-formedness conditions for BioAmbients in Pilegaard et al. [13]. In general, however, our approach and proof strategies differ considerably from the previous work.

2. Introduction to the IMCG calculus

2.1. Syntax

We have chosen to perform the analysis on the algebra of Interactive Markov Chains, or IMC (see [1,3]), which is an orthogonal extension of both Markov Chains and traditional process algebras. This means that IMC processes can execute both actions and exponentially distributed delays, and there is no direct connection between them (i.e. actions are assumed to take no time). Actions can sometimes only be executable together with other actions (i.e. synchronised with other actions), while delays do not synchronise or in any other way influence each other. Note that even though we only have exponentially distributed delays in IMC, we can approximate any delay duration through them.

We shall write processes in a guarded variant of Interactive Markov Chains (see the syntax rules (1)–(2) in Table 1) for which we have coined the name IMCG. When defining this language we shall assume a countable set of actions, Act, and a distinguished internal action, τ, such that τ ∉ Act. An infinite set of constants, Rate ⊆ R+, shall be used to describe Markovian rates. We shall use α to range over Act ∪ {τ} ∪ Rate. Furthermore, we shall draw upon a countable set of labels, Lab, in order to annotate action and rate prefixes that occur in processes.
Finally, a countable set of process variables, also called process identifiers, Var, shall assist us in the definition of recursive processes. The syntax of the language, which is shown in Table 1, comprises the following syntactic classes: action and rate guarded process variables (1)–(2), action and rate prefixed processes (3)–(4), sums or choice constructs (5), scope restrictions or hide constructs (6), parallel compositions or synchronisation constructs (7), recursive process definitions (8), and terminal processes (9). Scope restrictions and synchronisation compositions are parametrised by sets of action names from Act: these are namely the actions that are "internalised" or hidden in the first case and synchronised in the second case. The reason for using guarded process variables in the rules (1)–(2) is to ensure that process variables only occur in guarded positions, which will, in particular, guarantee that processes are well-defined: we are excluding, for example, X := X.

Table 1
Syntax of IMCG: a ∈ Act ∪ {τ}, λ ∈ Rate, ℓ ∈ Lab, X ∈ Var, A ⊆ Act.

P ::= aℓ.X (1)
  | λℓ.X (2)
  | aℓ.P (3)
  | λℓ.P (4)
  | P + P (5)
  | hide A in P (6)
  | P ∥A P (7)
  | X := P (8)
  | 0 (9)

This restriction will also make some of the proofs of the lemmas in this paper easier. Apart from this highly practical restriction, there is one more difference with the syntax of IMC in Hermanns [3]: we allow the application of the synchronisation and internalisation constructs (rules (7) and (6)) not only on the highest syntactic level, i.e. they can also occur inside process definitions. This, as we will see from the definition of the semantics in Section 2.2, could potentially lead to an infinite state space. We will, however, exclude such cases with the help of the well-formedness conditions that will be introduced in Section 3. We also assign labels to all the actions and rates.
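The grammar of Table 1 can be transcribed into a small abstract syntax tree. The following Python sketch is our own illustrative encoding, not part of the paper: the constructor names, and the convention that actions are strings (with τ written 'tau') while rates are floats, are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union

# One constructor per production of Table 1. The guarded-variable cases
# (1)-(2) are Prefix nodes whose continuation is a bare Var.
@dataclass(frozen=True)
class Var:
    name: str                                 # X

@dataclass(frozen=True)
class Prefix:
    guard: object                             # a in Act + {tau}, or a rate
    label: str                                # l in Lab
    cont: 'Proc'                              # a^l.P / lambda^l.P

@dataclass(frozen=True)
class Choice:
    left: 'Proc'; right: 'Proc'               # P + P

@dataclass(frozen=True)
class Hide:
    actions: FrozenSet[str]; body: 'Proc'     # hide A in P

@dataclass(frozen=True)
class Par:
    left: 'Proc'; sync: FrozenSet[str]; right: 'Proc'   # P ||_A P

@dataclass(frozen=True)
class Definition:
    var: str; body: 'Proc'                    # X := P

@dataclass(frozen=True)
class Nil:
    pass                                      # 0

Proc = Union[Var, Prefix, Choice, Hide, Par, Definition, Nil]

# X := a^l1.X  -- a closed, uniquely labelled process
example = Definition('X', Prefix('a', 'l1', Var('X')))
```

Frozen dataclasses give structural equality for free, which is convenient when terms are compared during an analysis.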
As we will see in Table 3, describing the Structural Operational Semantics of IMCG, labels do not have a special semantic meaning, but we will make active use of them in the definition of our analysis and in proving its correctness. Synchronisation in IMCG is, as in IMC, defined in the CSP style (compare the definition of the CSP calculus in Hoare [4]), i.e. any number of actions can be synchronised. Process definitions in IMCG, as in the IMC variant from Brinksma and Hermanns [1], perform two functions: they define process variables and represent process expressions at the same time. Therefore, for example, X := P represents the process X where X = P. As usual, we consider only IMCG processes with finite syntactic definitions, with a finite number of action names, delay rates, variables and labels occurring in them.

An IMCG expression is called process identifier closed (or simply closed) if all process variables in it are bound, i.e. not free: they are contained in recursive process definitions complying with the syntactic rule (8) from Table 1. For example, X := aℓ.X is considered to be closed, while aℓ1.X  X := aℓ2.X is not, because the first X is not bound.

We will often conduct proofs by induction on the syntax of IMCG expressions. This means that we will be proving that some property holds for all subexpressions of some IMCG expression. Formally speaking, the property holds for all E′ such that E′ ⊑ E. The subexpression relation ⊑ is defined in the intuitive way: E′ ⊑ E iff E′ can be turned into E by applying to it any number of syntactic constructs from Table 1. In particular, E is a subexpression of itself. We also define the function Labs that returns the set of labels occurring inside its argument: for example, Labs(X := aℓ.X) = {ℓ}.

2.2. Semantics

In this section we will present the Structural Operational Semantics (introduced in Plotkin [14]), SOS for short, of IMCG and prove several results that will be useful in our further considerations.
We have adopted the SOS rules from Brinksma and Hermanns [1] with one exception: transitions are additionally decorated with so-called multisets of labels, i.e. elements from the domain M introduced in Table 2. Elements of M represent the labels that are involved in the transition derivation. They will also be used in defining and proving the correctness of our analysis. We define M as a set of functions that assign to each label from Lab a positive natural number or zero – the number of occurrences. The least element of M is denoted ⊥M and is defined in a natural way, as the function that assigns zero to all labels. The sum, upper and lower bound operators on the elements of M are defined in a straightforward way, while for subtraction we have to pay attention that labels cannot be assigned negative numbers in M: the smallest number of a label's occurrences is zero.

The domains N and V defined in Table 2 are sets of functions from, accordingly, labels and variables into M (they will be used in the analysis), while the domain L contains mappings between labels and their corresponding names (if the mapping is not known, then the symbol ? is returned). The mapping between labels and names for a particular expression can be computed by the function ln defined in Table 9. The function name returns an action name or delay rate for a multiset of labels if it is the same for all the labels mapped to a positive number in the multiset (the set of such labels is returned by the function dom). The internal action τ is returned instead of some external action a by the function nameh if the latter is parameterised by some set A not containing a (A will in general contain the action names not internalised by the hide construct). Two expressions E and E′ are connected by a transition relation if a transition E −α/C→ E′ (written linearly here, with the name α and the decorating multiset C on the arrow) can be derived from the rules in Table 3 for some α ∈ Act ∪ {τ} ∪ Rate and C ∈ M.
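The multiset domain M admits a direct sparse encoding. The sketch below is our own illustration (labels as strings, absent labels implicitly mapped to zero); it implements the operations of Table 2, including the subtraction clamped at zero.

```python
from collections import Counter

# Multisets over labels (domain M): total maps Lab -> N0, represented
# sparsely as dicts -- absent labels implicitly map to 0 (bottom, _|_M).

def m_add(m1, m2):
    """Pointwise sum [M1 + M2](l) = M1(l) + M2(l)."""
    return Counter(m1) + Counter(m2)

def m_sub(m1, m2):
    """Pointwise subtraction clamped at zero: a label cannot occur a
    negative number of times."""
    return {l: m1[l] - m2.get(l, 0) for l in m1 if m1[l] > m2.get(l, 0)}

def m_join(m1, m2):
    """Least upper bound: pointwise max."""
    return {l: max(m1.get(l, 0), m2.get(l, 0)) for l in set(m1) | set(m2)}

def m_meet(m1, m2):
    """Greatest lower bound: pointwise min (only shared labels survive)."""
    return {l: min(m1[l], m2[l]) for l in set(m1) & set(m2) if min(m1[l], m2[l]) > 0}

def dom(c):
    """dom(C): the labels with a positive number of occurrences."""
    return {l for l, n in c.items() if n > 0}
```

The clamped subtraction is the one point where M differs from ordinary integer-valued maps, and it is exactly what `m_sub` makes explicit.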
We call the transition −α/C→ "enabled" or "executable" for E, and we say that E′ is a derivative expression, or simply a derivative, of E obtained after the execution of the transition −α/C→. A Labelled Transition System (LTS) can be constructed for an IMCG expression E by registering all the possible transition relations for E and its derivative expressions. The relation −→∗ is defined as the reflexive and transitive closure of the relations −α/C→ for all α ∈ Act ∪ {τ} ∪ Rate and C ∈ M.

Table 2
Definition of the data structures M, N, V and L, with M1, M2, C ∈ M, N1, N2 ∈ N, V1, V2 ∈ V, L1, L2, L ∈ L.

M ::= Lab → N0
⊥M(ℓ) = 0 for all ℓ ∈ Lab
M1 ≤ M2 ⇔ M1(ℓ) ≤ M2(ℓ) for all ℓ ∈ Lab
[M1 + M2](ℓ) = M1(ℓ) + M2(ℓ) for all ℓ ∈ Lab
[M1 − M2](ℓ) = M1(ℓ) − M2(ℓ) if M1(ℓ) ≥ M2(ℓ)
[M1 − M2](ℓ) = 0 if M1(ℓ) < M2(ℓ)
[M1 ⊔ M2](ℓ) = max(M1(ℓ), M2(ℓ)) for all ℓ ∈ Lab
[M1 ⊓ M2](ℓ) = min(M1(ℓ), M2(ℓ)) for all ℓ ∈ Lab
dom(C) = {ℓ ∈ Lab | C(ℓ) > 0}

N ::= Lab → M
⊥N(ℓ) = ⊥M for all ℓ ∈ Lab
[N1 ⊔ N2](ℓ) = N1(ℓ) ⊔ N2(ℓ) for all ℓ ∈ Lab

V ::= Var → M
⊥V(X) = ⊥M for all X ∈ Var
[V1 ⊔ V2](X) = V1(X) ⊔ V2(X) for all X ∈ Var

L ::= Lab → Act ∪ {τ} ∪ Rate ∪ {?}
⊥L(ℓ) = ? for all ℓ ∈ Lab
[L1 ⊔ L2](ℓ) = α if L1(ℓ) = L2(ℓ) = α
[L1 ⊔ L2](ℓ) = ? if L1(ℓ) ≠ L2(ℓ)
L1 ≤ L2 ⇔ L1(ℓ) = L2(ℓ) for all ℓ s.t. L1(ℓ) ≠ ?
nameL(C) = α if L(ℓ) = α for all ℓ ∈ dom(C)
nameL(C) = ? if ∄α s.t. L(ℓ) = α for all ℓ ∈ dom(C)
nameh L,A(C) = nameL(C) if nameL(C) ∈ Rate ∪ A ∪ {?}
nameh L,A(C) = τ otherwise

The rules in Table 3 show how syntactic terms from IMCG can be put into correspondence with Labelled Transition Systems in a compositional manner. Most of the rules closely follow the usual SOS rules for process algebras: i.e.
the rules for prefixing – (1) and (10), choice – (2)–(3) and (11)–(12), synchronisation – (4)–(6) and (13)–(14), hiding – (7)–(8) and (15), and recursion unfolding – (9) and (16). The term E{X := E/X} denotes the expression E with every free occurrence of the variable X in it substituted by the expression X := E. In the following we may refer to an action, delay rate, or their corresponding label as being "executable" if they occur in the rules (1) and (10) that are used in the derivation of some transition.

Note that the rules for transitions decorated with action names and those decorated with rates ("action" transitions and "delay" or "Markovian" transitions) are very similar in Table 3. The only difference is that we have the conditions F ↛τ and E ↛τ (no internal transition enabled for F and E, respectively) for Markovian transitions of the choice and synchronisation constructs in the rules (11)–(14). This is due to the different interpretations of actions and rates: actions are executed instantaneously, while executing a rate means that the system performs a delay whose duration is exponentially distributed with the corresponding rate. The duration of the delay cannot be zero, therefore an internal action which can be executed immediately has precedence, see [1]. We need to take into account all enabled Markovian transitions in order to correctly compute average waiting times and transition probabilities. The SOS of IMCG is therefore defined as a multirelation for delay transitions (i.e. X := λℓ.X + λℓ.X has two different delay transitions), similarly to IMC in Brinksma and Hermanns [1].

We will now state two lemmas concerning the SOS rules of IMCG that will be used later on in the proofs. Lemma 1 asserts that closed IMCG expressions give rise only to closed IMCG expressions as a result of any number of semantic transitions, which shows that the semantics is well-defined.
Lemma 2 states that no internal transition is possible for an IMCG process with an enabled delay transition (which is quite obvious but useful to state explicitly).

Lemma 1 (Preservation of IMCG syntax). Given a closed IMCG expression E and E −α/C→ E′, then E′ is also a closed IMCG expression.

Lemma 2 (Delay transitions). Given an IMCG expression E such that E −α→ for some α ∈ Rate, then E ↛τ.

Table 3
Structural operational semantics of IMCG: a ∈ Act ∪ {τ}, C, C1, C2 ∈ M, λ ∈ Rate, ℓ ∈ Lab, X ∈ Var, A ⊆ Act.

(1) aℓ.E −a/⊥M[ℓ↦1]→ E
(2) if E −a/C→ E′ then E + F −a/C→ E′
(3) if F −a/C→ F′ then E + F −a/C→ F′
(4) if E −a/C→ E′ and a ∉ A then E ∥A F −a/C→ E′ ∥A F
(5) if F −a/C→ F′ and a ∉ A then E ∥A F −a/C→ E ∥A F′
(6) if E −a/C1→ E′ and F −a/C2→ F′ and a ∈ A then E ∥A F −a/(C1+C2)→ E′ ∥A F′
(7) if E −a/C→ E′ and a ∉ A then hide A in E −a/C→ hide A in E′
(8) if E −a/C→ E′ and a ∈ A then hide A in E −τ/C→ hide A in E′
(9) if E{X := E/X} −a/C→ E′ then X := E −a/C→ E′
(10) λℓ.E −λ/⊥M[ℓ↦1]→ E
(11) if E −λ/C→ E′ and F ↛τ then E + F −λ/C→ E′
(12) if F −λ/C→ F′ and E ↛τ then E + F −λ/C→ F′
(13) if E −λ/C→ E′ and F ↛τ then E ∥A F −λ/C→ E′ ∥A F
(14) if F −λ/C→ F′ and E ↛τ then E ∥A F −λ/C→ E ∥A F′
(15) if E −λ/C→ E′ then hide A in E −λ/C→ hide A in E′
(16) if E{X := E/X} −λ/C→ E′ then X := E −λ/C→ E′

3. Well-formed IMCG

The syntactic rules for IMCG in Table 1 are very permissive and allow for infinite state spaces. This is, however, not our intention. In this section we will introduce a number of rules such that if the syntactic description of an IMCG system fulfils these rules, then it is suitable for our analysis. In particular, its state space is guaranteed to be finite. Before presenting the conditions, we will introduce two auxiliary operators on the syntax of IMCG which will be used in the definitions below. The operator fn in Table 4 captures the notion of so-called free names.
These are the external action names that occur in IMCG expressions and are not internalised, i.e. are not contained in any set A of a hide construct applied on top of the action occurrence. For example, fn(hide {a} in aℓ1.0) = ∅ but fn(aℓ1.0) = {a}. Free action names decorate semantic transitions in an unchanged way and can synchronise with each other, while hidden names decorate the corresponding transitions with the τ-action and are not involved in any synchronisation.

Table 4
Operator fn : IMCG → 2^Act computing the free (i.e. non-internalised) names of IMCG expressions.

fn(aℓ.X) = {a} if a ∈ Act, ∅ otherwise (1)
fn(λℓ.X) = ∅ (2)
fn(aℓ.P) = {a} ∪ fn(P) (3)
fn(λℓ.P) = fn(P) (4)
fn(P1 + P2) = fn(P1) ∪ fn(P2) (5)
fn(hide A in P) = fn(P)\A (6)
fn(P1 ∥A P2) = fn(P1) ∪ fn(P2) (7)
fn(X := P) = fn(P) (8)
fn(0) = ∅ (9)

Table 5
Operator fpi : IMCG → 2^Var computing the free (i.e. not bound) process identifiers of IMCG expressions.

fpi(aℓ.X) = {X} (1)
fpi(λℓ.X) = {X} (2)
fpi(aℓ.P) = fpi(P) (3)
fpi(λℓ.P) = fpi(P) (4)
fpi(P1 + P2) = fpi(P1) ∪ fpi(P2) (5)
fpi(hide A in P) = fpi(P) (6)
fpi(P1 ∥A P2) = fpi(P1) ∪ fpi(P2) (7)
fpi(X := P) = fpi(P)\{X} (8)
fpi(0) = ∅ (9)

The free process identifiers operator fpi is defined in Table 5. It returns for a given IMCG expression the set of process variables that occur free in it, i.e. are not bound. For example, fpi(aℓ.X) = {X} but fpi(X := aℓ.X) = ∅. It is clear that an IMCG expression F is process identifier closed iff fpi(F) = ∅.

The well-formedness rules for IMCG are given in Table 6. An IMCG expression E is considered well-formed if, first, only process identifier closed processes are put in parallel (rule (7)), and second, no free name can be hidden (rule (6)). The first condition is checked inductively on the syntax of E, while for the second condition we pass a set of free names as a parameter: i.e.
we check whether ⊢fn(E) E holds, see Definition 1. Note that, according to the well-formedness rules, E is also well-formed if we prove that ⊢S E for some S ⊇ fn(E). Additionally, the set of free names is enlarged in the rule (7) by the free names of the synchronising processes in order to exclude cases with changed abilities to synchronise (see the example below).

Definition 1 (Well-formed IMCG). An IMCG expression E is called well-formed if ⊢fn(E) E holds.

Without the rule (6) in Table 6, there could be cases where initially external actions become internal after several transitions, which we would like to avoid. For example, after the transition

X := aℓ1.X + hide {a} in aℓ2.X −τ/⊥M[ℓ2↦1]→ hide {a} in (X := aℓ1.X + hide {a} in aℓ2.X)

the initially free external action aℓ1 becomes hidden and will be executed as τ. It appears, however, that in order to make the "behaviour" of IMCG expressions predictable it is not enough to only rule out expressions where "globally" free action names can become hidden after a number of transitions. Consider the following IMCG transitions:

hide {a} in (X := aℓ1.X + hide {a} in aℓ2.X ∥{a} Y := aℓ3.aℓ4.bℓ5.Y) −τ/⊥M[ℓ1↦1, ℓ3↦1]→
hide {a} in (X := aℓ1.X + hide {a} in aℓ2.X ∥{a} aℓ4.bℓ5.Y := aℓ3.aℓ4.bℓ5.Y)

and

hide {a} in (X := aℓ1.X + hide {a} in aℓ2.X ∥{a} Y := aℓ3.aℓ4.bℓ5.Y) −τ/⊥M[ℓ2↦1, ℓ3↦1]→
hide {a} in (hide {a} in (X := aℓ1.X + hide {a} in aℓ2.X) ∥{a} aℓ4.bℓ5.Y := aℓ3.aℓ4.bℓ5.Y).

Table 6
Well-formedness rules for IMCG expressions, with S ⊆ Act.

⊢S aℓ.X (1)
⊢S λℓ.X (2)
if ⊢S P then ⊢S aℓ.P (3)
if ⊢S P then ⊢S λℓ.P (4)
if ⊢S P1 and ⊢S P2 then ⊢S P1 + P2 (5)
if ⊢S P and A ∩ S = ∅ then ⊢S hide A in P (6)
if ⊢S∪fn(P1) P1 and ⊢S∪fn(P2) P2 and fpi(P1) = ∅ and fpi(P2) = ∅ then ⊢S P1 ∥A P2 (7)
if ⊢S P then ⊢S X := P (8)
⊢S X (9)
⊢S 0 (10)

The action a is globally hidden in all the expressions above.
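The operators fn and fpi and the judgement ⊢S of Tables 4–6 can be read off as recursive functions. The following is a minimal executable sketch under our own tuple encoding of IMCG terms (rates as floats, τ as the string 'tau'); it is an illustration, not the paper's formalisation.

```python
# Processes as nested tuples, e.g. ('def', 'X', ('prefix', 'a', 'l1', ('var', 'X'))).

def fn(p):                                     # Table 4: free names
    tag = p[0]
    if tag == 'prefix':
        a, cont = p[1], p[3]
        rest = set() if cont[0] == 'var' else fn(cont)
        return ({a} | rest) if isinstance(a, str) and a != 'tau' else rest
    if tag == 'choice': return fn(p[1]) | fn(p[2])
    if tag == 'hide':   return fn(p[2]) - p[1]
    if tag == 'par':    return fn(p[1]) | fn(p[3])
    if tag == 'def':    return fn(p[2])
    return set()                               # 'var', 'nil'

def fpi(p):                                    # Table 5: free process identifiers
    tag = p[0]
    if tag == 'var':    return {p[1]}
    if tag == 'prefix': return fpi(p[3])
    if tag == 'choice': return fpi(p[1]) | fpi(p[2])
    if tag == 'hide':   return fpi(p[2])
    if tag == 'par':    return fpi(p[1]) | fpi(p[3])
    if tag == 'def':    return fpi(p[2]) - {p[1]}
    return set()                               # 'nil'

def wf(s, p):                                  # Table 6: the judgement |-_S p
    tag = p[0]
    if tag == 'prefix': return wf(s, p[3])
    if tag == 'choice': return wf(s, p[1]) and wf(s, p[2])
    if tag == 'hide':   return not (p[1] & s) and wf(s, p[2])  # rule (6)
    if tag == 'par':    return (not fpi(p[1]) and not fpi(p[3])  # rule (7)
                                and wf(s | fn(p[1]), p[1])
                                and wf(s | fn(p[3]), p[3]))
    if tag == 'def':    return wf(s, p[2])
    return True                                # 'var', 'nil'

def well_formed(p):                            # Definition 1
    return wf(fn(p), p)
```

On the paper's first example, X := aℓ1.X + hide {a} in aℓ2.X, the check fails exactly at rule (6), because the globally free name a is hidden inside the expression.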
However, in the first case the actions aℓ1 and aℓ4 will synchronise with each other, while in the second case aℓ1 becomes internalised and can be executed on its own, and aℓ4 cannot be executed at all. This is the reason why the sets of free names are updated in the rule (7) in Table 6.

We also exclude synchronising IMCG processes with free process identifiers (rule (7)) because such expressions can potentially "grow" by means of creating several copies of the same process after recursion unfoldings. Consider the following example, where two copies of the process Y have been created:

X := aℓ1.X ∥∅ Y := bℓ2.Y −a→ X := aℓ1.X ∥∅ Y := bℓ2.Y ∥∅ Y := bℓ2.Y

We would like to exclude such cases from being considered well-formed because they are hard to analyse – in particular, they can lead to an infinite state space. Note the symmetry between the well-formedness conditions on the "global level" of the whole IMCG expression and on the "local level" of its synchronising processes: we require that in both cases we do not have any free process identifiers and do not hide free names. These two conditions are directly stated for the synchronising processes in the rule (7) from Table 6, while for the whole expression we have to check them explicitly, by passing the set of globally free names as a parameter and by considering only process identifier closed IMCG expressions.

"Well-behaved" IMCG expressions will be called IMCG programs, see Definition 2. Besides closedness and well-formedness, we require that IMCG programs are uniquely labelled and have unique process identifiers. Unique labelling helps to differentiate between two occurrences of the same action name or the same delay rate: for example, in X := aℓ1.aℓ2.bℓ3.X we can differentiate between aℓ1 and aℓ2. The unique labelling of an IMCG expression is in general not preserved under transitions.
For example, X := aℓ1.(aℓ2.X + bℓ3.X) is uniquely labelled, while the expression derived from it in one step, aℓ2.(X := aℓ1.(aℓ2.X + bℓ3.X)) + bℓ3.(X := aℓ1.(aℓ2.X + bℓ3.X)), is not. However, these different occurrences of the same label are acceptable because they are the result of copying the same process definition several times.

Definition 2 (IMCG program). An IMCG program is an IMCG expression which is closed, well-formed, uniquely labelled and in which all the process identifiers used in process definitions are different.

In Lemma 3 we state that well-formedness is preserved under transitions. This shows, first, that well-formedness is a well-defined concept, and second, that we can use the properties that follow from the well-formedness of some F in the whole LTS induced by the semantics of F – for example, the property that the labels of synchronising processes stay disjoint after any number of semantic transitions.

Lemma 3 (Well-formedness). Given an IMCG program F and F −→∗ E, then ⊢fn(E) E holds and Labs(E1) ∩ Labs(E2) = ∅ for all E1 ∥A E2 ⊑ E.

4. Pathway Analysis

In this section we will present a number of operators on the syntax of IMCG that together represent the so-called Pathway Analysis of IMCG. This is a subtype of Data Flow Analysis, which is a well-known method in the field of Static Analysis. The Data Flow Analysis method has been introduced in the field of program analysis as one of the methods for analysing programs' code. The idea is to define for each of the program's basic blocks (for example, for each command) a so-called transfer function which determines how the program's states (for example, the values of the variables) before and after the execution of the basic block are connected. Finally, an overapproximation of the program's states at the program points before and after each basic block is computed from the known initial/final program states and the transfer functions.
The overapproximation is in particular due to the fact that, as basic blocks can be executed many (even infinitely many) times during the program's execution, transfer functions in general do not describe the data transformations caused by them in a precise way. If all the transfer functions are monotone, then the whole construction is often called a Monotone Data Flow Analysis Framework, see [5] for details. If the transfer functions cause some pieces of data to become obsolete and others to be added to the current data, then the scheme is called a Bitvector Framework. The transfer function for a basic block b then takes the simple form fb(D) = (D − killb) + generateb, where D represents the valid data before the execution of the basic block b (e.g. live variables), killb represents the information that becomes obsolete/is killed after the execution of the basic block (e.g. variables redefined in b) and generateb represents the newly generated information (e.g. variables used in b), see [7] for details.

Data Flow Analysis in the field of process algebras is applied to the syntax of process algebraic expressions, see [8,10–12]. There are, however, some differences compared to Data Flow Analysis as a program analysis method. In traditional programming languages the syntax is static and execution only changes the memory state and the program pointer; hence it is natural to compute data flow information in terms of memory state approximations that are localised to program points. In process algebra, however, the syntax is the state and therefore changes dynamically during execution. Thus, when developing the Data Flow Analysis of process algebras, the goal was to approximate syntactic expressions rather than memory states. In order to do this, a notion of exposed labels or exposed prefixes has been coined in Nielson and Nielson [10].
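The bitvector-style transfer function fb(D) = (D − killb) + generateb can be sketched over plain sets. The basic blocks and their kill/generate contents below are hypothetical, chosen only to illustrate the shape of the scheme.

```python
# f_b(D) = (D - kill_b) + generate_b over sets of data flow facts.
def make_transfer(kill, generate):
    return lambda d: (d - kill) | generate

# Two hypothetical basic blocks of a straight-line program:
#   b1: x = 1       redefines x          -> kills facts about x
#   b2: y = x + 1   redefines y, uses x  -> kills y, generates a use of x
f_b1 = make_transfer(kill={'x'}, generate=set())
f_b2 = make_transfer(kill={'y'}, generate={'x'})

# Composing the transfer functions propagates facts along the path b1; b2.
facts = f_b2(f_b1({'x', 'y'}))
```

Because each transfer function first removes and then adds facts, composition along a path is just function composition, which is what fixed-point solvers for such frameworks iterate.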
Intuitively, the exposed labels of a process are those labels that might decorate a transition enabled for this process. For example, for the IMCG process X := aℓ1.X + aℓ2.bℓ3.X the exposed labels are ℓ1 and ℓ2, but not ℓ3. Thus, for the purposes of the analysis, we shall abstractly characterise system states by their multisets of exposed labels (there could be more than one exposed label of the same kind). We shall then compute the system's behaviour by tracking how these multisets evolve when transitions occur. A transition may cause some exposed labels to disappear, and others to emerge. Transfer functions are used to express how multisets of exposed labels change when transitions decorated by one or more exposed labels are "fired". Much akin to Bitvector Frameworks, the transfer functions take the form fℓ(D) = (D − killℓ) + generateℓ, where D denotes the exposed labels of a state, and for each label ℓ participating in the transition there exists a transfer function consisting of the killℓ and generateℓ functions [10]. The difference with the previous equality is that the transfer functions are defined for separate labels rather than for a whole program state/basic block.

Exposed labels and transfer functions have been computed by the use of a number of operators on the syntax of a process algebra in the previous work. In our work we have also defined the exposed, kill and generate operators on the syntax of IMCG. In contrast to the previous work, we analyse a process algebra with CSP-style, i.e. multiway, synchronisation [4]; in order to account for this, we have introduced a new, so-called chains operator, which determines which labels synchronise with each other during the execution. We have also introduced a number of auxiliary operators. In this section we will present all the mentioned operators and show how to model the semantic transitions enabled for an IMCG expression E from the results of these operators on E.
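The per-label transfer function fℓ(D) = (D − killℓ) + generateℓ over multisets of exposed labels can be sketched directly; the kill/generate values below are illustrative assumptions, not the output of the paper's kill and generate operators.

```python
from collections import Counter

# f_l(D) = (D - kill_l) + generate_l over multisets of exposed labels.
# Counter subtraction already clamps at zero, matching the multiset
# subtraction of the domain M.
def transfer(exposed, kill, generate):
    return exposed - kill + generate

# Firing the transition labelled l1 in X := a^l1.(a^l2.X + b^l3.X):
# l1 stops being exposed, l2 and l3 become exposed.
before = Counter({'l1': 1})
after = transfer(before, kill=Counter({'l1': 1}),
                 generate=Counter({'l2': 1, 'l3': 1}))
```

The multiset (rather than set) domain is what lets the analysis distinguish several simultaneously exposed copies of the same label in parallel compositions.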
We call our analysis the Pathway Analysis for IMCG. The term originated in computational biology and was used in Nielson et al. [8] to denote the application of Static Analysis to biological systems (in particular, to the calculus of BioAmbients, see [15]) with the purpose of extracting information about their possible dynamic evolution (about so-called pathways). The current work is not explicitly targeted at biological processes. However, the purpose of the analysis (analysing systems' dynamics) and the analysis methods are close to the ones in Nielson et al. [8], therefore we have decided to reuse the name Pathway Analysis.

4.1. Exposed labels

The exposed operator is denoted E and is defined inductively on the syntax of IMCG, see Table 7. The idea is that all the labels of the prefixes on the highest syntactic level are exposed, as ℓ1 in the process a^ℓ1.b^ℓ2.0 (see the rules (1–2) in Table 7). Exposed labels of expressions connected by the choice and parallelisation constructs are added together, see the rules (3) and (5). The operator E returns an element from the multiset domain M in order to account for the possibility that several labels of the same type are exposed. We will see later in Section 5 that for an IMCG program F its exposed labels, together with the results computed by the generate, kill and chains operators on F (defined below in Sections 4.2 and 4.3), fully characterise the LTS induced by the semantics of F.

The operator E makes use of an environment Ŵ in order to determine the exposed labels of process variables, see the rule (7) in Table 7. The parameter Ŵ is a mapping from process variables to the exposed labels of their process definitions. In the following we will usually assume that Ŵ has been computed beforehand for all the variables occurring in the argument of E by, for example, the operator vl defined in Table 8.
If an input expression of vl is closed and each process variable occurs in it only once, then vl will return the exact result for each process variable.

Table 7
Operator E : IMCG → M computing exposed labels, with α ∈ Act ∪ {τ} ∪ Rate, Ŵ ∈ V.
(1) EŴ⟦α^ℓ.X⟧ = ⊥M[ℓ ↦ 1]
(2) EŴ⟦α^ℓ.P⟧ = ⊥M[ℓ ↦ 1]
(3) EŴ⟦P1 + P2⟧ = EŴ⟦P1⟧ + EŴ⟦P2⟧
(4) EŴ⟦hide A in P⟧ = EŴ⟦P⟧
(5) EŴ⟦P1 |[A]| P2⟧ = EŴ⟦P1⟧ + EŴ⟦P2⟧
(6) EŴ⟦X := P⟧ = EŴ⟦P⟧
(7) EŴ⟦X⟧(ℓ) = Ŵ(X)(ℓ)
(8) EŴ⟦0⟧ = ⊥M

Table 8
Operator vl : IMCG → V computing exposed labels of process definitions, with α ∈ Act ∪ {τ} ∪ Rate.
(1) vl⟦α^ℓ.X⟧ = ⊥V
(2) vl⟦α^ℓ.P⟧ = vl⟦P⟧
(3) vl⟦P1 + P2⟧ = vl⟦P1⟧ ⊔ vl⟦P2⟧
(4) vl⟦hide A in P⟧ = vl⟦P⟧
(5) vl⟦P1 |[A]| P2⟧ = vl⟦P1⟧ ⊔ vl⟦P2⟧
(6) vl⟦X := P⟧ = ⊥V[X ↦ E⊥V⟦P⟧] ⊔ vl⟦P⟧
(7) vl⟦X⟧ = ⊥V
(8) vl⟦0⟧ = ⊥V

The main rule in the definition of vl is the rule (6). There we use the exposed operator with an empty environment on the body of a process definition. The empty environment can be justified with the help of Lemma 4, which states that the environment is not important in this case, so we can parametrise the exposed operator with an empty environment for simplicity. It is also stated that substitutions in IMCG expressions do not change their exposed labels. The lemma is essentially due to the guardedness of the syntax of IMCG and will be used in many of the proofs of the lemmas below.

Lemma 4 (Exposed labels). Given an IMCG expression E, then EŴ⟦E⟧ = E⊥V⟦E⟧ = E⊥V⟦E{X/E′}⟧ holds for all Ŵ ∈ V and E′ ∈ IMCG.

Lemma 5 states that the results returned by the operator vl on IMCG programs are in a sense invariant under semantic transitions. This will be used in the considerations concerning properties of IMCG programs that remain stable under any number of transitions. We will prove many similar results (i.e. that some property is stable under transitions) in the following.

Lemma 5 (Variable definitions under transitions). Given an IMCG program F and F →* E, then for all X := E′′ ⊑ E holds vl⟦X := E′′⟧(X) = vl⟦F⟧(X).

Note that, in case F →* E, it is not necessarily the case that vl⟦F⟧ = vl⟦E⟧, because some of the process definitions may appear in F but not in E. For example, X := a^ℓ1.X + Y := a^ℓ2.Y −a→_{⊥M[ℓ1 ↦ 1]} X := a^ℓ1.X, and the definition of Y has "disappeared" from the process on the right. However, all the process variables present in E will have the same exposed labels as in F.

We will now state in Lemma 6 that for IMCG programs all exposed labels stay unique after any number of semantic steps, i.e. the exposed labels constitute a set. This is a necessary condition in order to model the semantics by our methods in a precise way. Note that we can use the same environment Ŵ in Lemma 6 both for F and for E′′ because, according to Lemma 5, process definitions are stable under transitions. Also note that, according to Lemma 6, different transition derivation trees are guaranteed to correspond to different multisets decorating the transitions. This result means that for IMCG programs it is not necessary to have multisets of delay transitions as mentioned in Section 2.2, i.e. every such multiset would contain no more than one transition.

Lemma 6 (Exposed labels are a set). Given an IMCG program F, Ŵ = vl⟦F⟧, F →* E and E′′ ⊑ E, then EŴ⟦E′′⟧(ℓ) ∈ {0, 1} for all ℓ ∈ Lab. Moreover, for all transition pairs E −→_{C1} G1 and E −→_{C2} G2 such that their transition derivation trees are different (G1 = G2 is possible) holds C1 ≠ C2.

4.2. Chains operator

It is not enough for a label to be exposed in order to be "executable". It might lack synchronisation partners and therefore not be executable on its own. For example, ℓ1 is not executable in X := a^ℓ1.X |[{a}]| Y := b^ℓ2.a^ℓ3.Y, because it can only be executed together with ℓ3. We will say in the following that ℓ1 and ℓ3 constitute a chain.
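As an illustration, the exposed operator of Section 4.1 can be sketched in Python over a small hand-rolled process encoding (the tuple representation and the label names are our own, and the hide-construct is omitted since it does not affect exposed labels):

```python
from collections import Counter

# Processes encoded as tuples: ("prefix", name, label, cont),
# ("choice", p, q), ("par", p, A, q), ("rec", X, p), ("var", X), ("nil",).

def exposed(p, env):
    """Multiset of exposed labels of process p; env maps process
    variables to the exposed labels of their definitions (operator vl)."""
    kind = p[0]
    if kind == "prefix":                    # rules (1-2): top prefix is exposed
        return Counter({p[2]: 1})
    if kind in ("choice", "par"):           # rules (3) and (5): add both sides
        return exposed(p[1], env) + exposed(p[-1], env)
    if kind == "rec":                       # rule (6): look inside the body
        return exposed(p[2], env)
    if kind == "var":                       # rule (7): consult the environment
        return env.get(p[1], Counter())
    return Counter()                        # rule (8): the terminated process

# X := a^l1.X + a^l2.b^l3.X: only l1 and l2 are exposed, not l3.
x = ("rec", "X", ("choice",
                  ("prefix", "a", "l1", ("var", "X")),
                  ("prefix", "a", "l2", ("prefix", "b", "l3", ("var", "X")))))
print(exposed(x, {}))
```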
Before presenting the chains operator, which computes all the chains in its input expression, we will discuss the correspondence between labels and names. This is important during the computation of chains because only labels with the same action name, which has moreover not been internalised, synchronise with each other. For a uniquely labelled IMCG expression the correspondence between labels and action names or delay rates is uniquely defined. The mapping from labels to names can be computed by the operator ln in Table 9.

Table 9
Operator ln : IMCG → L computing a mapping between labels and their corresponding names, with α ∈ Act ∪ {τ} ∪ Rate.
(1) ln⟦α^ℓ.X⟧ = ⊥A[ℓ ↦ α]
(2) ln⟦α^ℓ.P⟧ = ⊥A[ℓ ↦ α] ⊔ ln⟦P⟧
(3) ln⟦P1 + P2⟧ = ln⟦P1⟧ ⊔ ln⟦P2⟧
(4) ln⟦hide A in P⟧ = ln⟦P⟧
(5) ln⟦P1 |[A]| P2⟧ = ln⟦P1⟧ ⊔ ln⟦P2⟧
(6) ln⟦X := P⟧ = ln⟦P⟧
(7) ln⟦X⟧ = ⊥A
(8) ln⟦0⟧ = ⊥A

Internalised action names cannot become external again after any number of transitions. On the other hand, external action names can in general be hidden by applying the hide-construct on top of them after a number of transitions (as a result of recursion unfoldings). However, Lemma 7 states that for well-formed IMCG expressions this does not occur: free action names (that are forwarded as a parameter to the well-formedness condition) stay free.

Lemma 7 (Hidden labels). Given a well-formed IMCG expression F, Λ = ln⟦F⟧ and F →* E, then for all ℓ ∈ Labs(E) holds: Λ(ℓ) ∈ fn(E) iff Λ(ℓ) ∈ fn(F).

The chains operator T is presented in Table 10. The rules are straightforward, and the main logic is in the rule (5). There we take all possible combinations of chains from two parallel processes that synchronise on the same external action which is free in both processes.
The operator T makes use of the environment Λ, which is a mapping from labels to their corresponding names (it can be computed beforehand by the operator ln), and of the function nameh defined in Table 2. The latter returns the name for a multiset, taking possible action hidings into account. It is clear from the construction that all the labels of a chain returned by T have the same action name or delay rate, and only chains that correspond to actions can have more than one label in their domain.

Knowing the chains and exposed labels of an IMCG expression E (for example, derived from some IMCG program, so that E is guaranteed to have nice properties) is enough to predict all the transitions from E. This is stated in Lemma 9 below. For example, the labels ℓ1 and ℓ3 are exposed in X := a^ℓ1.X |[{a}]| a^ℓ3.(Y := b^ℓ2.a^ℓ3.Y), and there is a chain containing both these labels. Lemma 9 predicts that there is an enabled transition decorated by these two labels, and this is indeed the case.

In order to prove Lemma 9 for transitions involving recursion unfolding, we need an additional fact about how chains evolve in case a substitution has been made. This is the subject of Lemma 8, which states that for well-formed expressions we just take the union of the chains of the expression in which the substitution has been made and of the expression which has been inserted.

Lemma 8 (Chains under substitution). Given well-formed IMCG expressions E′ ⊑ E, X ∈ fpi(E′), ln⟦E′⟧ ≤ Λ and ln⟦E⟧ ≤ Λ, then it holds that TΛ⟦E′{X/E}⟧ = TΛ⟦E′⟧ ∪ TΛ⟦E⟧ and TΛ⟦E′⟧ ⊆ TΛ⟦E⟧.

Table 10
Operator TΛ : IMCG → 2^M computing so-called "chains", i.e. labels that can be executed only together, with α ∈ Act ∪ {τ} ∪ Rate, Λ ∈ L.
(1) TΛ⟦α^ℓ.X⟧ = {⊥M[ℓ ↦ 1]}
(2) TΛ⟦α^ℓ.P⟧ = {⊥M[ℓ ↦ 1]} ∪ TΛ⟦P⟧
(3) TΛ⟦P1 + P2⟧ = TΛ⟦P1⟧ ∪ TΛ⟦P2⟧
(4) TΛ⟦hide A in P⟧ = TΛ⟦P⟧
(5) TΛ⟦P1 |[A]| P2⟧ = {C | C ∈ TΛ⟦P1⟧, nameh_{Λ,fn(P1)}(C) ∉ A} ∪ {C | C ∈ TΛ⟦P2⟧, nameh_{Λ,fn(P2)}(C) ∉ A} ∪ ⋃_{α∈A} {C1 + C2 | C1 ∈ TΛ⟦P1⟧, C2 ∈ TΛ⟦P2⟧, nameh_{Λ,fn(P1)}(C1) = α, nameh_{Λ,fn(P2)}(C2) = α}
(6) TΛ⟦X := P⟧ = TΛ⟦P⟧
(7) TΛ⟦X⟧ = ∅
(8) TΛ⟦0⟧ = ∅

Lemma 9 (Transition existence). Given an IMCG program F, F →* E, Λ = ln⟦E⟧ and Ŵ = vl⟦E⟧, then E −α→_C E′ for some E′ iff C ∈ TΛ⟦E⟧, C ≤ EŴ⟦E⟧, α = nameh_{Λ,fn(F)}(C), and, in case α ∈ Rate, there is no C′ such that C′ ∈ TΛ⟦E⟧, C′ ≤ EŴ⟦E⟧ and nameh_{Λ,fn(F)}(C′) = τ.

In Section 4.3 we will show how to compute the exposed labels of the expressions into which an IMCG expression E evolves as a result of the transitions predicted for E by Lemma 9.

4.3. Generate and kill operators

After a transition has occurred, some of the previously exposed labels cease to be exposed (are killed) and others become exposed (are generated). For example, the labels ℓ1 and ℓ2 are exposed in the expression a^ℓ1.0 + b^ℓ2.c^ℓ3.0. After the transition a^ℓ1.0 + b^ℓ2.c^ℓ3.0 −b→ c^ℓ3.0 the labels ℓ1 and ℓ2 are killed and the label ℓ3 is generated. We associate the kill and generate effects with separate labels instead of states.

The generate (G) and kill (K) operators are defined by induction on the syntax of IMCG in Tables 12 and 14, respectively. A label kills all the labels in the choice construct on its own syntactic level (the rules (1–3) in Table 14) and generates all the labels of the prefixes on the next lower syntactic level (the rules (1–2) in Table 12). In the above example the labels ℓ1 and ℓ2 each kill both choice alternatives (i.e. both ℓ1 and ℓ2), while only ℓ2 generates ℓ3. The generate operator makes use of the mapping Ŵ from process variables to the exposed labels of their process definitions.
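Rule (5), the only non-trivial rule of the chains operator, can be sketched as follows (our own encoding: chains are Counter multisets of labels, and `name` plays the role of the paper's nameh function):

```python
from collections import Counter

def par_chains(chains1, chains2, sync_set, name):
    """Chains of P1 |[A]| P2: keep the non-synchronising chains of both
    sides, and pair up chains of the two sides carrying the same name in A."""
    result = [c for c in chains1 if name(c) not in sync_set]
    result += [c for c in chains2 if name(c) not in sync_set]
    for a in sync_set:
        for c1 in chains1:
            for c2 in chains2:
                if name(c1) == a and name(c2) == a:
                    result.append(c1 + c2)  # multiset union of the two chains
    return result

# X := a^l1.X  |[{a}]|  Y := b^l2.a^l3.Y from the text above.
names = {"l1": "a", "l2": "b", "l3": "a"}
name = lambda c: names[next(iter(c))]  # all labels of a chain share a name
combined = par_chains([Counter({"l1": 1})],
                      [Counter({"l2": 1}), Counter({"l3": 1})],
                      {"a"}, name)
print(combined)  # l2 alone, plus the chain pairing l1 with l3
```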
We would like to make two remarks on the differences of our approach from the previous work. First, we assume that the input expression is uniquely labelled, therefore it is enough to take the least upper bounds while computing both the G and K operators – each label occurs in the input expression only once anyway. In the previous work the greatest lower bound was taken for K in order to assign to each label the least killed multiset. Second, in order to simplify the proofs of the properties of the G and K operators, we have defined two additional operators – G′ and K′, defined in Tables 11 and 13 – that compute the generate and kill effects only for the prefixes on the top syntactic level. It is clear by comparing the rules for G′ and G and for K′ and K that the only difference is in the rule (2).

Lemma 10 states that the G′ and K′ operators on well-formed IMCG expressions return the same results before and after a variable substitution (for G′ the well-formedness properties are used in order to prove the lemma). This lemma will be useful for reasoning on the effects of a transition in Lemma 11: a substitution is conducted in the SOS rules (9) and (16) in Table 3 during recursion unfolding.

Lemma 10 (Kill/generate under substitution). Given well-formed IMCG expressions E′ ⊑ E, X ∈ fpi(E′) and Ŵ(X) = E⊥V⟦E⟧, then G′Ŵ⟦E′{X/E}⟧ = G′Ŵ⟦E′⟧ and K′⟦E′{X/E}⟧ = K′⟦E′⟧ hold.

Table 11
Operator G′ : IMCG → N computing labels generated by the labels exposed on the top syntactic level, with α ∈ Act ∪ {τ} ∪ Rate, Ŵ ∈ V.
(1) G′Ŵ⟦α^ℓ.X⟧ = ⊥N[ℓ ↦ EŴ⟦X⟧]
(2) G′Ŵ⟦α^ℓ.P⟧ = ⊥N[ℓ ↦ EŴ⟦P⟧]
(3) G′Ŵ⟦P1 + P2⟧ = G′Ŵ⟦P1⟧ ⊔ G′Ŵ⟦P2⟧
(4) G′Ŵ⟦hide A in P⟧ = G′Ŵ⟦P⟧
(5) G′Ŵ⟦P1 |[A]| P2⟧ = G′Ŵ⟦P1⟧ ⊔ G′Ŵ⟦P2⟧
(6) G′Ŵ⟦X := P⟧ = G′Ŵ⟦P⟧
(7) G′Ŵ⟦X⟧ = ⊥N
(8) G′Ŵ⟦0⟧ = ⊥N

Table 12
Operator G : IMCG → N computing generated labels, with α ∈ Act ∪ {τ} ∪ Rate, Ŵ ∈ V.
(1) GŴ⟦α^ℓ.X⟧ = ⊥N[ℓ ↦ EŴ⟦X⟧]
(2) GŴ⟦α^ℓ.P⟧ = ⊥N[ℓ ↦ EŴ⟦P⟧] ⊔ GŴ⟦P⟧
(3) GŴ⟦P1 + P2⟧ = GŴ⟦P1⟧ ⊔ GŴ⟦P2⟧
(4) GŴ⟦hide A in P⟧ = GŴ⟦P⟧
(5) GŴ⟦P1 |[A]| P2⟧ = GŴ⟦P1⟧ ⊔ GŴ⟦P2⟧
(6) GŴ⟦X := P⟧ = GŴ⟦P⟧
(7) GŴ⟦X⟧ = ⊥N
(8) GŴ⟦0⟧ = ⊥N

Table 13
Operator K′ : IMCG → N computing labels killed by the labels exposed on the top syntactic level, with α ∈ Act ∪ {τ} ∪ Rate.
(1) K′⟦α^ℓ.X⟧ = ⊥N[ℓ ↦ ⊥M[ℓ ↦ 1]]
(2) K′⟦α^ℓ.P⟧ = ⊥N[ℓ ↦ ⊥M[ℓ ↦ 1]]
(3a) K′⟦P1 + P2⟧(ℓ) = K′⟦P1⟧(ℓ) + E⊥V⟦P2⟧ if ℓ ∈ E⊥V⟦P1⟧, and K′⟦P2⟧(ℓ) + E⊥V⟦P1⟧ if ℓ ∈ E⊥V⟦P2⟧
(3b) K′⟦P1 + P2⟧(ℓ) = ⊥M otherwise
(4) K′⟦hide A in P⟧ = K′⟦P⟧
(5) K′⟦P1 |[A]| P2⟧ = K′⟦P1⟧ ⊔ K′⟦P2⟧
(6) K′⟦X := P⟧ = K′⟦P⟧
(7) K′⟦X⟧ = ⊥N
(8) K′⟦0⟧ = ⊥N

Table 14
Operator K : IMCG → N computing killed labels, with α ∈ Act ∪ {τ} ∪ Rate.
(1) K⟦α^ℓ.X⟧ = ⊥N[ℓ ↦ ⊥M[ℓ ↦ 1]]
(2) K⟦α^ℓ.P⟧ = ⊥N[ℓ ↦ ⊥M[ℓ ↦ 1]] ⊔ K⟦P⟧
(3a) K⟦P1 + P2⟧(ℓ) = K⟦P1⟧(ℓ) + E⊥V⟦P2⟧ if ℓ ∈ E⊥V⟦P1⟧, and K⟦P2⟧(ℓ) + E⊥V⟦P1⟧ if ℓ ∈ E⊥V⟦P2⟧
(3b) K⟦P1 + P2⟧(ℓ) = K⟦P1⟧(ℓ) ⊔ K⟦P2⟧(ℓ) otherwise
(4) K⟦hide A in P⟧ = K⟦P⟧
(5) K⟦P1 |[A]| P2⟧ = K⟦P1⟧ ⊔ K⟦P2⟧
(6) K⟦X := P⟧ = K⟦P⟧
(7) K⟦X⟧ = ⊥N
(8) K⟦0⟧ = ⊥N

Lemma 11 captures the essence of the generate and kill operators and predicts the outcome (in the sense of exposed labels) of a transition enabled for some E if E, G′ and K′ have been previously computed on E. Note that there can only be one exposed label of each kind in E according to Lemma 6, therefore it is enough to account for the labels in the domain of the chain only once. In the previous work the labelling was in general non-unique, and instead of the equality relation there was a smaller-or-equal relation between the actual outcome of a transition and the predicted one, see, for example, [10].

Lemma 11 (Kill/generate). Given an IMCG program F, Ŵ = vl⟦F⟧ and F →* E −α→_C E′, then EŴ⟦E′⟧ = EŴ⟦E⟧ − Σ_{C(ℓ)>0} K′⟦E⟧(ℓ) + Σ_{C(ℓ)>0} G′Ŵ⟦E⟧(ℓ).

5. Pathway Analysis is precise

In this section we will prove several results about the Pathway Analysis of IMCG programs.
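The effect of firing a chain, as described by Lemma 11, can be sketched as follows (our own encoding; Python's Counter subtraction is truncated at zero, which matches the saturated multiset difference):

```python
from collections import Counter

def fire(exposed, chain, kill, gen):
    """New exposed multiset after firing chain C:
    E' = E - sum of kill(l) + sum of gen(l), over l in dom(C)."""
    for l in chain:
        exposed = exposed - kill.get(l, Counter())
    for l in chain:
        exposed = exposed + gen.get(l, Counter())
    return exposed

# a^l1.0 + b^l2.c^l3.0: firing l2 kills l1 and l2 and generates l3.
kill = {"l1": Counter({"l1": 1, "l2": 1}),
        "l2": Counter({"l1": 1, "l2": 1})}
gen = {"l2": Counter({"l3": 1})}
after = fire(Counter({"l1": 1, "l2": 1}), ["l2"], kill, gen)
print(after)  # Counter({'l3': 1})
```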
These results will be enough to prove that the Pathway Analysis captures all the semantic properties of IMCG programs, avoiding any imprecision stemming from the analysis technique. To illustrate this, we will show in Section 5.2 how to build LTSs strongly bisimilar to the LTSs induced by the semantics of the analysed IMCG programs.

5.1. Pathway Analysis results are stable under transitions

In this section we will prove that the results of the generate, kill and chains operators on an IMCG program F are mostly transferable to all the states reachable from F. The word "mostly" refers to the fact that some of the behaviour possible in the original state of F can cease to be possible after a number of transitions. However, if some label still occurs in an IMCG expression E reachable from F, then its "behaviour" (i.e. the chains in which it participates, the generate and kill effects) is essentially the same as in F. In particular, if for some chain C in TΛ⟦F⟧ all the constituting labels are in E, then the chain C is also a chain of E. On the other hand, all the chains in TΛ⟦E⟧ are also chains of F. This is stated in Lemma 13. In the proof of this lemma we use the auxiliary Lemma 12 which shows that chains of subexpressions are also chains of larger IMCG expressions if there is no synchronisation involved.

Note that some chains may be "lost" after a transition. For example, for X := a^ℓ1.X |[{a}]| a^ℓ2.0 −a→_{⊥M[ℓ1 ↦ 1, ℓ2 ↦ 1]} X := a^ℓ1.X |[{a}]| 0, we compute TΛ⟦X := a^ℓ1.X |[{a}]| a^ℓ2.0⟧ = {⊥M[ℓ1 ↦ 1, ℓ2 ↦ 1]} but TΛ⟦X := a^ℓ1.X |[{a}]| 0⟧ = ∅. This illustrates the statement of Lemma 12 that no new chains are created after transitions, while those lacking the labels are lost.

Lemma 12 (Chains of subexpressions). Given an IMCG program F, Λ = ln⟦F⟧, F →* E and E′′ ⊑ E′ ⊑ E such that no synchronisation construct is applied to E′′ in E′, then for all C such that dom(C) ⊆ Labs(E′′) holds: C ∈ TΛ⟦E′′⟧ iff C ∈ TΛ⟦E′⟧.

Lemma 13 (Chains under transitions). Given an IMCG program F, Λ = ln⟦F⟧, F →* E and dom(C) ⊆ Labs(E), then C ∈ TΛ⟦F⟧ iff C ∈ TΛ⟦E⟧ and nameh_{Λ,fn(F)}(C) = nameh_{Λ,fn(E)}(C).

In Lemma 14 we state that the generate functions of labels that occur in some E reachable from some IMCG program F are the same both in F and in E (in Lemma 14 we use G′ and K′ on E instead of G and K in order to simplify the reasoning in case a label occurs in E several times). This is obviously not the case for the labels occurring in F but not in E. For example, for X := a^ℓ1.X + Y := b^ℓ2.Y −a→_{⊥M[ℓ1 ↦ 1]} X := a^ℓ1.X we have GŴ⟦X := a^ℓ1.X + Y := b^ℓ2.Y⟧(ℓ1) = G′Ŵ⟦X := a^ℓ1.X⟧(ℓ1) = ⊥M[ℓ1 ↦ 1], but GŴ⟦X := a^ℓ1.X + Y := b^ℓ2.Y⟧(ℓ2) = ⊥M[ℓ2 ↦ 1] ≠ ⊥M = G′Ŵ⟦X := a^ℓ1.X⟧(ℓ2).

For the kill operator, the killed multisets can be strictly smaller after a number of transitions. In the above example, K⟦X := a^ℓ1.X + Y := b^ℓ2.Y⟧(ℓ1) = ⊥M[ℓ1 ↦ 1, ℓ2 ↦ 1] but K′⟦X := a^ℓ1.X⟧(ℓ1) = ⊥M[ℓ1 ↦ 1]. We show, however, in Lemma 14 that the kill operator results are equivalent when constrained to the exposed labels of any reachable state. In this way the results of the kill operator are also invariant under transitions, because EŴ⟦E′′⟧ − K′⟦E′′⟧(ℓ) = EŴ⟦E′′⟧ − K⟦F⟧(ℓ) for ℓ ∈ dom(EŴ⟦E′′⟧), with F and E′′ as in Lemma 14.

Lemma 14 (Kill/generate under transitions). Given an IMCG program F, Ŵ = vl⟦F⟧ and F →* E, then for all E′′ ⊑ E, ℓ ∈ dom(EŴ⟦E′′⟧) holds: G′Ŵ⟦E′′⟧(ℓ) = GŴ⟦F⟧(ℓ) and K′⟦E′′⟧(ℓ) = K⟦F⟧(ℓ) ⊓ EŴ⟦E′′⟧.

5.2. Worklist algorithm

In the previous work (see, for example, the Data Flow Analysis for CCS in Nielson and Nielson [10]) the analysis constructions have been completed by devising a so-called Worklist algorithm. The algorithm builds a finite LTS based on the analysis results that can simulate the LTS of the process algebraic expression for which the analysis has been conducted.
In Table 15 we present a similar algorithm, which we have named buildLTS, completing our Pathway Analysis. The difference with the previous work is, in particular, that no states are merged during the construction, therefore no so-called granularity function is made use of as in Nielson and Nielson [10].

The algorithm buildLTS accepts as input an IMCG program F. It computes a number of the Pathway Analysis operators on it and, starting from the state characterised by the exposed labels of F, computes all the states reachable according to the results of the operators G, K and T on F (returned in the set States) and all the transitions between the states (returned in the set Transitions). The resulting LTS is strongly bisimilar to the LTS induced by the semantics of F, and the algorithm is guaranteed to terminate – this is stated in Lemma 15.

The definition of strong bisimulation on IMCG is given in Definition 3. It is essentially the same as the definition in Hermanns [3] of strong bisimulation on IMC systems, just with transitions decorated by multisets of labels. Note that in condition 2 the second sum is taken over all derivable delay transitions, i.e. over all derivation trees. However, in case each delay transition has a unique multiset decorating the transition (as is the case for all the states reachable from some IMCG program according to Lemma 6), the sum can be taken over all the chains decorating the transitions. This consideration will be used in the proof of Lemma 15.

Definition 3 (Bisimulation). An equivalence relation R on IMCG expressions with unique exposed labels is a strong bisimulation relation if for all (E1, E2) ∈ R holds:

• if E1 −α→_{C1} E1′ for some α ∈ Act ∪ {τ} then there exist E2′ and C2 such that E2 −α→_{C2} E2′ and (E1′, E2′) ∈ R;
• if E1 has no τ-transition, then Σ_{E ∈ S} Σ_{{E1 −λ→_C E | λ ∈ Rate}} λ = Σ_{E ∈ S} Σ_{{E2 −λ→_C E | λ ∈ Rate}} λ for all equivalence classes S ∈ IMCG/R.

We denote E1 ∼ E2 if there exists a strong bisimulation relation R such that (E1, E2) ∈ R.

Table 15
The algorithm buildLTS taking as argument an IMCG program F, making use of the procedures enabledChains (argument in M) and move (arguments in M).

proc buildLTS(F) is
1: Ŵ := vl⟦F⟧; Λ := ln⟦F⟧; G := GŴ⟦F⟧; K := K⟦F⟧; T := TΛ⟦F⟧;
2: M := EŴ⟦F⟧; Worklist := {M}; States := {M}; Transitions := ∅;
3: while (Worklist ≠ ∅) do
4:   M := getElement(Worklist); Worklist := Worklist − {M};
5:   CS := enabledChains(M);
6:   for all (C ∈ CS) do
7:     M′ := move(M, C);
8:     if (M′ ∉ States) then
9:       States := States ∪ {M′};
10:      Worklist := Worklist ∪ {M′};
11:    Transitions := Transitions ∪ {(M, nameh_{Λ,A}(C), C, M′)};
12: return (States, Transitions)

proc enabledChains(M) is
13: CS := {C ∈ T | C ≤ M};
14: if (∃C ∈ CS such that nameh_{Λ,A}(C) = τ) then
15:   CS := {C ∈ CS | nameh_{Λ,A}(C) ∉ Rate};
16: return CS

proc move(M, C) is
17: return (M − Σ_{ℓ∈dom(C)} K(ℓ)) + Σ_{ℓ∈dom(C)} G(ℓ)

The LTS constructed by the algorithm buildLTS can in general be smaller than the LTS induced by the semantics of F. The "gain" in the system's size will, however, not be significant for most IMCG systems because it is only due to copies of process definitions which have been created during recursion unfolding (i.e. states with one or more copies of process definitions correspond to only one state in the LTS returned by buildLTS). For example, the two expressions Y := (X := a^ℓ1.X + b^ℓ2.Y) and X := a^ℓ1.X + b^ℓ2.(Y := (X := a^ℓ1.X + b^ℓ2.Y)) have the same exposed labels and the same behaviour (the second is actually derivable from the first after one transition) but they are syntactically different. The Worklist algorithm in Table 15 will map them to the same state in the constructed LTS.
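A runnable sketch of buildLTS, under our own encoding: states are multisets of exposed labels, delay rates would be represented as floats and τ as the string "tau", and the precomputed operator results are passed in as plain lists and dictionaries rather than recomputed from syntax:

```python
from collections import Counter

def build_lts(exposed0, chains, kill, gen, name):
    """Sketch of buildLTS (Table 15): explore all multisets of exposed
    labels reachable by firing enabled chains."""
    freeze = lambda m: frozenset(m.items())  # hashable state representation
    def enabled(m):
        cs = [c for c in chains if all(m[l] >= n for l, n in c.items())]
        if any(name(c) == "tau" for c in cs):  # maximal progress:
            cs = [c for c in cs if not isinstance(name(c), float)]  # drop delays
        return cs
    def move(m, c):
        for l in c:
            m = m - kill.get(l, Counter())
        for l in c:
            m = m + gen.get(l, Counter())
        return m
    worklist, states, transitions = [exposed0], {freeze(exposed0)}, []
    while worklist:
        m = worklist.pop()
        for c in enabled(m):
            m2 = move(m, c)
            if freeze(m2) not in states:
                states.add(freeze(m2))
                worklist.append(m2)
            transitions.append((freeze(m), name(c), freeze(m2)))
    return states, transitions

# a^l1.0 + b^l2.c^l3.0 from Section 4.3: three states, three transitions.
name = lambda c: {"l1": "a", "l2": "b", "l3": "c"}[next(iter(c))]
chains = [Counter({"l1": 1}), Counter({"l2": 1}), Counter({"l3": 1})]
kill = {"l1": Counter({"l1": 1, "l2": 1}),
        "l2": Counter({"l1": 1, "l2": 1}),
        "l3": Counter({"l3": 1})}
gen = {"l2": Counter({"l3": 1})}
states, transitions = build_lts(Counter({"l1": 1, "l2": 1}),
                                chains, kill, gen, name)
print(len(states), len(transitions))  # 3 3
```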
In general, the presented algorithm is not a new algorithm for computing bisimulations on IMC systems (see [3] for such an algorithm) but rather a proof that the Pathway Analysis captures all the properties of the analysed systems.

Theorem 15 (Worklist algorithm). Given an IMCG program F, the Worklist algorithm terminates on F and constructs an LTS whose initial state EŴ⟦F⟧, with Ŵ = vl⟦F⟧, is strongly bisimilar to the LTS induced by the semantics of F, i.e. F ∼ EŴ⟦F⟧.

6. Conclusions

In this paper we have applied Static Analysis methods to process algebraic expressions. In particular, we have adapted the Data Flow Analysis for process algebras (devised for the first time in Nielson and Nielson [10]) to a stochastic process algebra – the algebra of Interactive Markov Chains. We have introduced a guarded variant of IMC, i.e. IMCG, with a more permissive syntax than IMC, and have devised the so-called well-formedness conditions. These conditions are quite restrictive: we cannot, for example, model a process duplicating itself. Nevertheless, the subclass of well-formed IMCG still allows us to model many interesting concurrent systems – for example, those with a bounded number of newly created processes.

A number of Pathway Analysis operators, i.e. the exposed, generate and kill operators, have been redefined from Nielson and Nielson [10] so that they become applicable to IMCG. A new operator (the chains operator) has been introduced in order to match the synchronisation construct of IMCG. The main difference with the previous work is that the Pathway Analysis captures the semantics of well-formed IMCG systems fully, i.e. without any imprecision stemming from the analysis technique. This is possible because such systems are essentially finite-state systems. The proof of the correspondence between the semantic properties of IMCG systems and the Pathway Analysis operators is the main theoretical contribution of this work.
Apart from its theoretical value, the work opens possibilities for developing algorithms that check systems' properties based on the results of the Pathway Analysis operators, which, as we believe, are easier to handle than process algebraic expressions and avoid the state space explosion problem that arises if the corresponding Labelled Transition System is constructed. In particular, we could introduce optimisations into the Worklist algorithm in order to merge on-the-fly states with, in some sense, irrelevant differences in their behaviour. In order to do this, we could, for example, merge labels beforehand based on the results of the generate, kill and chains operators for them. We have also devised algorithms for computing states' reachability (see [17]) and bisimulation relations (see [16]) which require the Pathway Analysis to be conducted first.

We are confident that our method is applicable to a larger subclass of IMC than well-formed IMCG expressions if we adopt a more flexible approach in the definition of transfer functions. We have assumed until now that the generate, kill and chains operator results should be applicable after any number of semantic steps, i.e. they should not change. In future work we could allow their results to change in a predictable way and thereby correctly model a larger class of systems, i.e. systems with an infinite number of states. We are also considering other process algebras and, in general, formalisms defined in a compositional way for future applications of our methods.

Appendix A. Proofs

Lemma 1 (Preservation of IMCG syntax). Given a closed IMCG expression E and E −α→_C E′, then E′ is also a closed IMCG expression.

Proof. We can prove the statement by induction on the transition derivation using Table 3.
The rules (1) and (10) are the base cases and the statement is clear for them because the right side of the transition cannot be a variable due to the closeness of E. The rest of the rules follow from the induction hypothesis. For the rules (9) and (16) we have to show that E{X/X := E} is a closed IMCG expression if this holds for X := E. The closeness of E{X/X := E} follows from the fact that the only free process variable in E can be X, therefore no free process identifiers are present in E{X/X := E} after the substitution. Moreover, if X := E is an IMCG expression then E is also an IMCG expression. From the rules (1–4) in Table 1 it follows that we can use an IMCG expression instead of a variable – the result will still be a valid IMCG expression. □

Lemma 2 (Delay transitions). Given an IMCG expression E such that E −α→ for some α ∈ Rate, then E has no τ-transition.

Proof. We can prove the statement by induction on the structure of E. The statement is clear for the base cases – the syntax rules (1–4) and (9) in Table 1: for the rules (1–4) only one transition is possible according to the semantic rules (1) and (10) in Table 3, and for the rule (9) no transitions are possible. Most of the remaining syntactic rules follow by the induction hypothesis. For example, the rule (5) follows from the induction hypothesis and the semantic rules (11) and (12) in Table 3: the second summand does not have any derivable τ-transition. For the rule (8) we additionally have to prove that X := E −α→ iff E −α→ for all IMCG expressions E and α ∈ Act ∪ {τ} ∪ Rate. This can be proved by induction on the structure of E and is essentially due to the guardedness of the syntax of IMCG. □

Lemma 3 (Well-formedness). Given an IMCG program F and F →* E, then ⊢fn(E) E holds and Labs(E1) ∩ Labs(E2) = ∅ for all E1 |[A]| E2 ⊑ E.

Proof. We prove the lemma by induction on the number of transitions in F →* E. The statement is trivially true for F, therefore it is enough to prove the induction step, i.e. if F →* E −→ E′ and ⊢fn(E) E holds then also ⊢fn(E′) E′ holds. Note that for simplicity we can prove the well-formedness preservation of E′ with fn(E), i.e. that ⊢fn(E) E′ holds. This will also prove ⊢fn(E′) E′ because, as is easy to see, fn(E′) ⊆ fn(E). We will actually prove a more general statement: that from ⊢S E follows ⊢S E′ for any set S ⊆ Act.

The base cases are the rules (1) and (10) in Table 3; for them the statement follows from the well-formedness rules (1–4) in Table 6, and the statement about the disjoint labelling is obvious. The rules concerning the choice and hide operators follow by induction. The rules about the parallel operator are a bit more complicated. For example, the statement concerning the disjoint labelling of E′ and F′ in the rule (6) in Table 3 follows from the disjoint labelling of E and F and from the obvious fact that if E −α→_C E′ then Labs(E′) ⊆ Labs(E). For the well-formedness of E′ |[A]| F′ we can apply our induction hypothesis to the sets S ∪ fn(E) and S ∪ fn(F). We know that ⊢S∪fn(E) E and ⊢S∪fn(F) F hold. From the induction hypothesis it follows that ⊢S∪fn(E) E′ and ⊢S∪fn(F) F′ hold as well. From fn(E′) ⊆ fn(E) and fn(F′) ⊆ fn(F) follow ⊢S∪fn(E′) E′ and ⊢S∪fn(F′) F′, which means, due to the rule (7) in Table 6, that ⊢S E′ |[A]| F′ holds.

For proving the lemma for the rules (9) and (16) in Table 3 we need to show that the lemma's conditions hold for E{X/X := E} if they hold both for E and X := E. Labels of synchronising processes are disjoint in E{X/X := E} because this is the case for both E and X := E and no substitution is conducted in the synchronising processes. There are no substitutions in the synchronising processes because E is well-formed and X cannot be free in a synchronising process due to the closeness of the latter, see the rule (7) in Table 6. It is left to prove that E{X/X := E} is well-formed relative to an action set S if this is the case for both E and X := E.
In order to prove this fact by induction on the syntactic structure of E we will show that for any IMCG expression E ′′ the expression E ′′ {X /X := E } is well-formed relative to some action set S if this is the case for both E ′′ and X := E (no transition 537 N. Skrypnyuk et al. / Journal of Logic and Algebraic Programming 81 (2012) 522–540 is derivable for E ′′ ∈ Var, E ′′  = X, so we can omit this case). We prove this by induction on the structure of E ′′ . The base cases are the rules (1–2) in Table 1 and the statement follows from the well-formedness rules (3–4) in Table 6. The rules (3–6) and (8) follow from the induction hypothesis and the rules in Table 6. The rule (7) follows because E ′′ is well-formed, therefore both synchronising sides are closed and no substitutions can be made in them.  Lemma 4 (Exposed labels). Given an IMCG expression E, then EŴ E  E ′ ∈ IMCG . = E⊥V E  = E⊥V E {X /E ′ } holds for all Ŵ ∈ V and Proof. This lemma is due to the guardedness in the syntax of IMCG , stating that the initial behaviour is fixed for all IMCG expressions. We can prove the statement by induction on the structure of E following the rules from Table 1. Base cases are rules (1–4) and (9) from Table 1 and the statement easily follows for them from the rules (1–2) and (8) for the exposed operator in Table 7. The rest of the rules follows from the induction hypothesis and the rules for the exposed operator.  Lemma 5 (Variable definitions under transitions). Given an IMCG program F and F vlX ∗ −→ E, then for all X := E ′′  E holds := E ′′ (X) = vlF (X). Proof. The statement can be shown by induction on the number of steps in F ∗ −→ E. It is essentially due to the fact that “new” occurrences of the variable definitions in the derivative expression E are only those that have been “copied” by the rules for the recursion unfolding (9) and (16) in Table 3. Exposed labels of E ′′ do not change after the copying, therefore the statement of the lemma holds.  
Lemma 6 (Exposed labels are a set). Given an IMC_G program F, Γ = vl⟦F⟧, F −→* E and E′′ ⪯ E, then E_Γ⟦E′′⟧(ℓ) ∈ {0, 1} for all ℓ ∈ Lab. Moreover, for all transition pairs E −→_C1 G1 and E −→_C2 G2 such that their transition derivation trees are different (G1 = G2 is possible) holds C1 ≠ C2.

Proof. The statement of the lemma obviously holds for the uniquely labelled F and all its subexpressions. Otherwise we assume that the statement holds for E and show that it also holds for E′ such that E −→ E′, by induction on the transition derivation. Most of the rules in Table 3 are obvious because the right side is a subexpression of the left side. For the rules (5–7) and (13–14) we have to use the fact, proved in Lemma 3, that the labels of two synchronising processes are disjoint (this is in fact the most important reason why the statement of the lemma is true). For the rules (9) and (16) we have to prove that the statement holds for all subexpressions of E{X/X := E} if it holds for all subexpressions of X := E with X ≺ E. It is clear for all E′′ ∈ Var and for E′′ ⪯ X := E. Otherwise we have shown in Lemma 4 that E_Γ⟦E′′{X/X := E}⟧ = E_Γ⟦E′′⟧, and the induction hypothesis is applicable.

We can now prove the lemma's second statement by induction on the structure of E, using the result about the uniqueness of exposed labels. The base cases are the rules (1–4) and (9) in Table 1; for them there is only one (or zero, for the rule (9)) possible transition derivation and only one exposed label. For the rules (5–7) we can use the induction hypothesis and the uniqueness of exposed labels, due to which the transition derivation tree is always uniquely defined. For the rule (8) we can prove that the transition derivation trees for E and E{X/X := E} are essentially the same, cf. Lemma 4 on exposed labels before and after substitution. □

Lemma 7 (Hidden labels).
Given a well-formed IMC_G expression F, h = ln⟦F⟧ and F −→* E, then for all ℓ ∈ Labs(E) holds: h(ℓ) ∈ fn(E) iff h(ℓ) ∈ fn(F).

Proof. It is easy to show (by induction on the transition derivation) that from h(ℓ) ∈ fn(E) follows h(ℓ) ∈ fn(F), i.e. internalised actions in F cannot become external in E. The well-formedness of F is not needed for proving this direction. On the other hand, if h(ℓ) ∈ fn(F) then ⊢_{h(ℓ)} F holds. According to the well-formedness rule (6) in Table 6, for all hide A in P ⪯ F holds h(ℓ) ∉ A, i.e. free action names of F do not appear in the hide-constructs; therefore these action names stay free in E as well after any number of semantic steps. □

Lemma 8 (Chains under substitution). Given well-formed IMC_G expressions E′ ⪯ E, X ∈ fpi(E′), ln⟦E′⟧ ≤ h and ln⟦E⟧ ≤ h, then it holds that T⟦E′{X/E}⟧ = T⟦E′⟧ ∪ T⟦E⟧ and T⟦E′⟧ ⊆ T⟦E⟧.

Proof. The idea of the proof is that E will not appear in E′{X/E} in parallel with some other IMC_G expressions, because this would contradict the well-formedness rule (7) in Table 6 (synchronising processes should be closed) and E′ would not be well-formed. As a consequence, all the chains of E are directly included in the chains of the resulting expression (T⟦E′{X/E}⟧ = T⟦E′⟧ ∪ T⟦E⟧). On the other hand, E′ will not appear in parallel with some process inside E because fpi(E′) ≠ ∅, and E would not be well-formed in this case. Therefore all the chains of E′ are also chains of E (T⟦E′⟧ ⊆ T⟦E⟧). The lemma can be proved formally by induction on the syntactic structure of E′, i.e. for all E′′ such that E′′ ⪯ E′, using all the syntactic constructs in Table 1 besides the synchronisation rule (7), and the rules for the chains operator in Table 10. □

Lemma 9 (Transition existence). Given an IMC_G program F, F −→* E, h = ln⟦E⟧ and Γ = vl⟦E⟧, then E −α→_C E′ iff C ∈ T⟦E⟧, C ≤ E_Γ⟦E⟧ and α = name_h,fn(F)(C), and, in case α ∈ Rate, there is additionally no C′ such that C′ ∈ T⟦E⟧, C′ ≤ E_Γ⟦E⟧ and name_h,fn(F)(C′) = τ.

Proof. It is easy to see that it is enough to prove the first part of the statement because, if there existed an exposed chain C′ with name_h,fn(F)(C′) = τ, then there would exist a τ-transition from E, which is not possible according to Lemma 2. Note that it is also easy to see that in the case of transition existence α = name_h,fn(F)(C) holds: it is enough to check the SOS rules in Table 3, in particular the rules (1) and (10), and the rules for the operator ln in Table 9 on the one hand, and Lemma 7 (free action names in F stay free in E) on the other hand. It is therefore left to show that E −α→_C E′ iff C ∈ T⟦E⟧ and C ≤ E_Γ⟦E⟧. We prove this by induction on the syntactic structure of E, taking into account the rules for the exposed and the chains operators in Tables 7 and 10. The rules (1–4) and (9) in Table 1 are the base cases, and the statement of the lemma is clearly true for them (there is only one label that can be exposed, is in the chain and can be executed). For the rest of the rules the statement follows from the induction hypothesis and from the definitions of the exposed and chains operators, which are defined inductively on the syntax of E. In particular, compare the semantic rules (4–6) and (13–14) in Table 3 to the rule (5) of the chains operator in Table 10. For the rule (8), i.e. recursion unfolding, remember that E_Γ⟦X := E⟧ = E_Γ⟦E{X/X := E}⟧ and T⟦X := E⟧ = T⟦E{X/X := E}⟧ according to Lemmas 4 and 8. In case synchronisation is applied on top of recursion unfolding, considerations similar to the ones in Lemma 7 should be used. In this case there exists some P (to which synchronisation is applied) such that X := E ⪯ P ⪯ E and ⊢_fn(P) P (this can be proved by induction on the transition sequence F −→* E). If some action name a is free in X := E, then it will also be free in E{X/X := E} because no hide-construct with a is allowed in E according to the well-formedness rules. □

Lemma 10 (Kill/generate under substitution). Given well-formed IMC_G expressions E′ ⪯ E, X ∈ fpi(E′) and Γ(X) = E_⊥V⟦E⟧, then G′_Γ⟦E′{X/E}⟧ = G′_Γ⟦E′⟧ and K′⟦E′{X/E}⟧ = K′⟦E′⟧ hold.

Proof. The statement is easy to prove for the kill operator because from Lemma 4 follows E_Γ⟦E′{X/E}⟧ = E_Γ⟦E′⟧, and from the rule for the K′ operator in Table 13 it is clear that K′⟦E′⟧(ℓ) = E_Γ⟦E′⟧ if ℓ ∈ dom(E_Γ⟦E′⟧) and K′⟦E′⟧(ℓ) = ⊥_M otherwise. For the generate operator, in case E′ = α^ℓ.X for some α ∈ Act ∪ {τ} ∪ Rate and ℓ ∈ Lab, we can use the fact that Γ(X) = E_⊥V⟦E⟧ and the rules for G′ in Table 11. Otherwise the statement can be proved by induction on the syntax of E′ using the rules for G′ in Table 11. □

Lemma 11 (Kill/generate). Given an IMC_G program F, Γ = vl⟦F⟧ and F −→* E −α→_C E′, then E_Γ⟦E′⟧ = E_Γ⟦E⟧ − Σ_{C(ℓ)>0} K′⟦E⟧(ℓ) + Σ_{C(ℓ)>0} G′_Γ⟦E⟧(ℓ).

Proof. Note that all the exposed labels of E are unique (Lemma 6) and all the labels in C are exposed (Lemma 9), therefore all the labels in C are unique and it is enough to apply the generate and kill operators for each label ℓ in C (such that C(ℓ) > 0) only once. We prove the lemma by induction on the transition derivation E −α→_C E′, thus proving that the statement also holds for all E′′ ⪯ E. For the rules (1) and (10) in Table 3 the statement follows from the rules for the generate and kill operators on exposed labels in Tables 11 and 13 (the currently exposed labels are killed and the exposed labels of E′ are generated). For most of the remaining rules in Table 3 the statement follows from the induction hypothesis, taking into account that the labels of parallel processes are disjoint and we can therefore apply the induction hypothesis to them separately.
It is left to show that the statement also holds for the recursion unfolding rules (9) and (16) in Table 3, i.e. that the statement holds for X := E if it holds for E{X/X := E}. We can use the facts that K′⟦X := E⟧ = K′⟦E⟧ = K′⟦E{X/X := E}⟧ and G′_Γ⟦X := E⟧ = G′_Γ⟦E⟧ = G′_Γ⟦E{X/X := E}⟧, taking into account that Γ = vl⟦F⟧ contains an element characterising the exposed labels of X := E (see also Lemma 5) and using Lemma 10. Altogether we can derive that if the statement of the lemma holds for E{X/X := E} then it also holds for X := E. □

Lemma 12 (Chains of subexpressions). Given an IMC_G program F, h = ln⟦F⟧, F −→* E and E′′ ⪯ E′ ⪯ E such that no synchronisation construct is applied to E′′ in E′, then for all C such that dom(C) ⊆ Labs(E′′) holds: C ∈ T⟦E′′⟧ iff C ∈ T⟦E′⟧.

Proof. The statement holds for F = E due to the unique labelling of F, and also because in this case T⟦E′′⟧ ⊆ T⟦E′⟧, taking into account the rules for the chains operator in Table 10 (only the parallel operator does not include the chains of subexpressions directly). Otherwise assume that the statement holds for E and E −→ E′; we prove that the statement holds for E′ as well, by induction on the transition derivation. The rules (1) and (10) in Table 3 are clear because the right side is a subexpression of the left side without a synchronisation construct. For the rest of the rules the statement follows from the induction hypothesis, besides the rules (9) and (16), where we can use Lemma 8. The latter states that T⟦E′′{X/X := E}⟧ = T⟦X := E⟧ for all X ≺ E′′ ≺ X := E, and the statement is clear. □

Lemma 13 (Chains under transitions). Given an IMC_G program F, h = ln⟦F⟧, F −→* E and dom(C) ⊆ Labs(E), then C ∈ T⟦F⟧ iff C ∈ T⟦E⟧, and name_h,fn(F)(C) = name_h,fn(E)(C).

Proof. We prove the statement by induction on the number of steps in F −→* E. It is trivially true for F. Otherwise we prove that for E −→ E′ and dom(C) ⊆ Labs(E′) holds: C ∈ T⟦E⟧ iff C ∈ T⟦E′⟧, by induction on the transition derivation, using Lemma 12. Note that name_h,fn(F)(C′) = name_h,fn(E)(C′) is true according to Lemma 7. The statement is clear for the base cases – the rules (1) and (10) in Table 3 – because the right side of the rules is a subexpression of the left side (without applying the synchronisation construct), and Lemma 12 is applicable. For most of the other rules the induction hypothesis is directly applicable. For the rules with a synchronisation construct a chain C should eventually be "divided" into chains C1 and C2 between the two expressions put in parallel (this is possible because their labels are disjoint), in order to apply the statement of Lemma 12 to the two expressions separately. Additionally, we should apply Lemma 7 to both parallel processes, which are well-formed (see the well-formedness rules in Table 6); therefore all the labels that are initially not hidden stay not hidden after the transition. Altogether, two chains C1 and C2 in the parallel processes of E′ (which exist according to the induction hypothesis) are combined into the chain C = C1 + C2, C ∈ T⟦E′⟧. For the rules (9) and (16) we have T⟦E{X/X := E}⟧ = T⟦X := E⟧ according to Lemma 8; therefore if the statement is true for E{X/X := E} (which is the case due to the induction hypothesis), then it is also true for X := E. □

Lemma 14 (Kill/generate under transitions). Given an IMC_G program F, Γ = vl⟦F⟧ and F −→* E, then for all E′′ ⪯ E and ℓ ∈ dom(E_Γ⟦E′′⟧) holds: G′_Γ⟦E′′⟧(ℓ) = G_Γ⟦F⟧(ℓ) and K′⟦E′′⟧(ℓ) = K⟦F⟧(ℓ) ⊓ E_Γ⟦E′′⟧.

Proof. The statement clearly holds for E = F because F is uniquely labelled and each label occurs in it only once. Note that K′⟦E′′⟧(ℓ) < K⟦F⟧(ℓ) is in fact possible if we have E′′ + G ⪯ F for some G, because K⟦F⟧(ℓ) would "kill" all the exposed labels of G as well.
This is, however, taken into account by stating that only the exposed labels of E′′ can be killed in E′′ by executing ℓ (by taking the greatest lower bound). Otherwise we prove the statement by induction on the transition derivation E −→ E′, i.e. we prove that the statement holds for E′ assuming that it holds for E. It is clear for the base cases – the semantic rules (1) and (10) in Table 3 – because the right side is a subexpression of the left side. For the rest of the rules we can use the induction hypothesis concerning E (remember that parallel processes have disjoint labels), besides the rules (9) and (16). For them we have to prove that if the statement holds for X := E then it also holds for E′′{X/X := E} if E′′ ⪯ E and X ≺ E′′. This can be proved by induction on the syntactic structure of E′′, using Lemma 4 (concerning the exposed operator) and Lemma 10 (concerning the generate and kill operators). From the two lemmas we can derive that G′_Γ⟦E′′{X/X := E}⟧(ℓ) = G′_Γ⟦E′′⟧(ℓ) and K′⟦E′′{X/X := E}⟧(ℓ) ⊓ E_Γ⟦E′′{X/X := E}⟧ = K′⟦E′′⟧(ℓ) ⊓ E_Γ⟦E′′⟧ for all E′′ ⪯ E and ℓ ∈ dom(E_Γ⟦E′′⟧), and the induction hypothesis is applicable. □

Theorem 15 (Worklist algorithm). Given an IMC_G program F, the Worklist algorithm terminates on F and constructs an LTS whose initial state E_Γ⟦F⟧ with Γ = vl⟦F⟧ is strongly bisimilar to the LTS induced by the semantics of F, i.e. F ∼ E_Γ⟦F⟧.

Proof. The idea of the proof is to show that the relation R defined on the set {E, E_Γ⟦E⟧ | F −→* E} as R = {(E1, E2) | E_Γ⟦E1⟧ = E_Γ⟦E2⟧} ∪ {(E_Γ⟦E⟧, E_Γ⟦E⟧)} ∪ {(E, E_Γ⟦E⟧)} ∪ {(E_Γ⟦E⟧, E)} (i.e. the relation identifying states with the same exposed labels) is a strong bisimulation relation. First, it is clearly an equivalence relation: it is reflexive, symmetric and transitive, which can be verified by checking all the cases.
Second, it is enough to prove that each transition from E has a corresponding transition from E_Γ⟦E⟧ and vice versa, such that both transitions are decorated with the same action name or delay rate and lead to bisimilar states. More formally, we prove that E −α→_C E′ iff E_Γ⟦E⟧ −α→_C E_Γ⟦E′⟧ for all α ∈ Act ∪ {τ} ∪ Rate: the first condition in Definition 3 (concerning α ∈ Act ∪ {τ}) then holds directly, and the second follows by summing all the rates of the transitions into each equivalence class. The statement holds due to the properties of the Pathway Analysis operators. By combining Lemmas 9 and 13 (existence of the transition; the chains of F can be used instead of those of E) with Lemmas 11 and 14 (the generate and kill operators predict the exposed labels of E′; the generate and kill operators computed for F can be used instead of the ones computed for E) it follows that all the transitions from E are constructed by the Worklist algorithm for E_Γ⟦E⟧, and vice versa, for each E reachable from F. Formally, F −→* E −α→_C E′ iff there exists C ∈ T⟦F⟧ such that C ≤ E_Γ⟦E⟧, name_h,fn(F)(C) = α, E_Γ⟦E′⟧ = E_Γ⟦E⟧ − Σ_{C(ℓ)>0} K⟦F⟧(ℓ) + Σ_{C(ℓ)>0} G_Γ⟦F⟧(ℓ) and, moreover, if α ∈ Rate then there is no C′ such that C′ ∈ T⟦F⟧, C′ ≤ E_Γ⟦E⟧ and name_h,fn(F)(C′) = τ. For each state E reachable from F a corresponding state E_Γ⟦E⟧ will be created by the Worklist algorithm – this can be proved by induction on the number of steps in F −→* E. On the other hand, for each state M reachable from E_Γ⟦F⟧ there exists at least one state in the LTS induced by F which is bisimilar to M. This follows from the definition of the Worklist algorithm, which only creates states that are "predicted" by the Pathway Analysis results, i.e. which have corresponding states in the semantics of F. Altogether, E ∼ E_Γ⟦E⟧ for all E reachable from F, therefore F ∼ E_Γ⟦F⟧ as well.
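To make the characterisation above concrete, one step of the state-space construction can be sketched as follows. This is an informal sketch under assumed data structures, not the paper's actual Worklist algorithm: states are exposed-label multisets, `transitions` is a hypothetical list pairing each chain from T⟦F⟧ with its action name or delay rate, and `kill`/`gen` map each label to the precomputed K⟦F⟧(ℓ) and G_Γ⟦F⟧(ℓ) multisets.

```python
from collections import Counter

def leq(c, m):
    """Pointwise multiset order on label multisets."""
    return all(n <= m[l] for l, n in c.items())

def successors(M, transitions, kill, gen):
    """Enabled transitions from a state M (a multiset of exposed labels).

    A chain C is enabled iff C <= M; the successor state is
    M - sum of kill[l] + sum of gen[l] over the labels l in C.
    """
    enabled = [(C, a) for C, a in transitions if leq(C, M)]
    tau_enabled = any(a == "tau" for _, a in enabled)
    out = []
    for C, a in enabled:
        # maximal progress: delay rates are pre-empted by an enabled tau-chain
        if isinstance(a, float) and tau_enabled:
            continue
        Mp = M.copy()
        for l in C:  # by Lemma 6 every label occurs in C at most once
            Mp = Mp - kill[l] + gen[l]
        out.append((a, Mp))
    return out

# Hypothetical data: two exposed labels, one action chain and one rate chain.
M = Counter({"l1": 1, "l2": 1})
transitions = [(Counter({"l1": 1}), "a"), (Counter({"l2": 1}), 2.0)]
kill = {"l1": Counter({"l1": 1}), "l2": Counter({"l2": 1})}
gen = {"l1": Counter({"l3": 1}), "l2": Counter()}
print(successors(M, transitions, kill, gen))
```

Iterating `successors` from the initial state E_Γ⟦F⟧ until no new state multisets appear mirrors the worklist iteration whose termination is argued next.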
The Worklist algorithm terminates on F because the number of states that can be constructed is at most 2^|Labs(F)| (according to Lemma 6, all the states reachable from F have at most one exposed label of each kind) and, as we have already mentioned, for each state M created by the Worklist algorithm there exists a state E such that F −→* E and E_Γ⟦E⟧ = M. □

References

[1] Ed Brinksma, Holger Hermanns, Process algebra and Markov chains, in: Ed Brinksma, Holger Hermanns, Joost-Pieter Katoen (Eds.), Euro Summer School on Trends in Computer Science, Lecture Notes in Computer Science, vol. 2090, Springer, 2000, pp. 183–231. ISBN 3-540-42479-2.
[2] Edmund M. Clarke, E. Allen Emerson, A. Prasad Sistla, Automatic verification of finite-state concurrent systems using temporal logic specifications, ACM Trans. Program. Lang. Systems 8 (1986) 244–263.
[3] Holger Hermanns, Interactive Markov Chains: The Quest for Quantified Quality, Lecture Notes in Computer Science, vol. 2428, Springer, 2002. ISBN 3-540-44261-8. URL: http://dx.doi.org/10.1007/3-540-45804-2.
[4] Charles A.R. Hoare, Communicating Sequential Processes, Prentice-Hall, 1985. ISBN 0-13-153271-5.
[5] John B. Kam, Jeffrey D. Ullman, Monotone data flow analysis frameworks, Acta Inform. 7 (1977) 305–317.
[6] Sebastian Nanz, Flemming Nielson, Hanne Riis Nielson, Modal abstractions of concurrent behaviour, in: María Alpuente, Germán Vidal (Eds.), Static Analysis, 15th International Symposium, SAS 2008, Valencia, Spain, Lecture Notes in Computer Science, vol. 5079, Springer, 2008, pp. 159–173. ISBN 978-3-540-69163-1.
[7] Flemming Nielson, Hanne Riis Nielson, Chris L. Hankin, Principles of Program Analysis, Springer, 1999 (second printing, 2005).
[8] Flemming Nielson, Hanne Riis Nielson, Corrado Priami, Debora Schuch da Rosa, Static analysis for systems biology, in: Proceedings of WISICT '04, Trinity College Dublin, 2004.
[9] Flemming Nielson, Hanne Riis Nielson, Corrado Priami, Debora Schuch da Rosa, Control flow analysis for BioAmbients, Electron. Notes Theoret. Comput. Sci. 180 (3) (2007) 65–79.
[10] Hanne Riis Nielson, Flemming Nielson, Data flow analysis for CCS, in: Program Analysis and Compilation, Theory and Practice, Springer, Berlin, Heidelberg, 2007, pp. 311–327. ISBN 3-540-71315-8, 978-3-540-71315-9. Available from: <http://dl.acm.org/citation.cfm?id=1805793.1805810>.
[11] Hanne Riis Nielson, Flemming Nielson, A monotone framework for CCS, Comput. Lang. Syst. Struct. 35 (4) (2009) 365–394. ISSN 1477-8424.
[12] Henrik Pilegaard, Language Based Techniques for Systems Biology, Ph.D. thesis, Informatics and Mathematical Modelling, Technical University of Denmark, DTU, 2007. Available from: <http://www2.imm.dtu.dk/pubdb/p.php?5352>.
[13] Henrik Pilegaard, Flemming Nielson, Hanne Riis Nielson, Pathway analysis for BioAmbients, J. Logic Algebr. Program. 77 (1–2) (2008) 92–130.
[14] Gordon Plotkin, A Structural Approach to Operational Semantics, Technical Report DAIMI FN-19, University of Aarhus, 1981. Available from: <http://citeseer.ist.psu.edu/plotkin81structural.html>.
[15] Aviv Regev, Ekaterina M. Panina, William Silverman, Luca Cardelli, Ehud Y. Shapiro, BioAmbients: an abstraction for biological compartments, Theoret. Comput. Sci. 325 (1) (2004) 141–167.
[16] Nataliya Skrypnyuk, Verification of Stochastic Process Calculi, Ph.D. thesis, Technical University of Denmark, Department of Informatics and Mathematical Modeling, Language-Based Technology, Kgs. Lyngby, Denmark, 2011.
[17] Nataliya Skrypnyuk, Flemming Nielson, Reachability for finite-state process algebras using Static Analysis, in: Giorgio Delzanno, Igor Potapov (Eds.), Reachability Problems, 5th International Workshop, RP 2011, Genova, Italy, September 28–30, 2011, Proceedings, Lecture Notes in Computer Science, vol. 6945, Springer, 2011, pp. 231–244. ISBN 978-3-642-24287-8.
[18] Nataliya Skrypnyuk, Flemming Nielson, Henrik Pilegaard, Pathway Analysis for IMC, in: Proceedings of the 21st Nordic Workshop on Programming Theory (NWPT’09), Kgs. Lyngby, Denmark, October 2009, Technical University of Denmark, 2009.