Improved Lower Bounds for the Shortest Superstring and Related Problems

Marek Karpinski∗    Richard Schmied†

arXiv:1111.5442v3 [cs.CC] 27 Aug 2012

∗ Dept. of Computer Science and the Hausdorff Center for Mathematics, University of Bonn. Supported in part by DFG grants and the Hausdorff Center grant EXC59-1. Email: marek@cs.uni-bonn.de
† Dept. of Computer Science, University of Bonn. Work supported by Hausdorff Doctoral Fellowship. Email: schmied@cs.uni-bonn.de

Abstract

We study the approximation hardness of the Shortest Superstring, the Maximal Compression and the Maximum Asymmetric Traveling Salesperson (MAX-ATSP) problem. We introduce a new reduction method that produces strongly restricted instances of the Shortest Superstring problem, in which the maximal orbit size is eight (no character appears more than eight times) and all given strings have length at most four. Based on this reduction method, we are able to improve the best previously known approximation lower bounds for the Shortest Superstring problem and the Maximal Compression problem by an order of magnitude. The results also imply an improved approximation lower bound for the MAX-ATSP problem.

1 Introduction

In the Shortest Superstring problem, we are given a finite set S of strings and we would like to construct their shortest superstring, which is the shortest possible string such that every string in S is a substring of it. The task of computing a shortest common superstring appears in a wide variety of applications related to computational biology (see, e.g., [L88] and [L90]). Intuitively, short superstrings preserve important biological structure and are good models of the original DNA sequence. In the context of computational biology, DNA sequencing is the important task of determining the sequence of nucleotides in a molecule of DNA. The DNA can be seen as a double-stranded sequence of four types of nucleotides represented by the alphabet {a, c, g, t}. Identifying those strings for different molecules is an important step towards understanding the biological functions of the molecules. However, with current laboratory methods, it is impossible to extract a long molecule directly as a whole. Instead, biochemists split millions of identical molecules into pieces, each typically containing at most 500 nucleotides. Then, from sometimes millions of these fragments, one has to compute the superstring representing the whole molecule.

From the computational point of view, the Shortest Superstring problem is an optimization problem, which consists of finding a minimum length superstring for a given set S of strings over a finite alphabet Σ. The underlying decision version was proved to be NP-complete [MS77]. However, there are many applications that involve relatively simple classes of strings. Motivated by those applications, many authors have investigated whether the Shortest Superstring problem becomes polynomial time solvable under various restrictions to the set of instances. Gallant et al. [GMS80] proved that this problem in the exact setting is still NP-complete for strings of length three and polynomial time solvable for strings of length two. On the other hand, Timkovskii [T90] studied the Shortest Superstring problem under restrictions to the orbit size of the letters in Σ. The orbit size of a letter is the number of its occurrences in the strings of S. Timkovskii proved that this problem restricted to instances with maximal orbit size two is polynomial time solvable. He raised the question about the status of the problem with maximal orbit size k for any constant k ≥ 3. It is known that the Shortest Superstring problem remains NP-hard for strongly restricted instances such as the following: (i) all strings have length four and the maximal orbit size is six [M94], (ii) the size of the alphabet of the instance is exactly two [GMS80], and (iii) all strings are of the form $10^p10^q$, where p, q ∈ N [M98].
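The two notions used in these restrictions, superstring and orbit size, can be made concrete with a small sketch. The function names and the toy fragments below are illustrative assumptions, not part of the paper.

```python
from collections import Counter


def is_superstring(s, strings):
    """Return True if every string in `strings` occurs as a substring of `s`."""
    return all(t in s for t in strings)


def max_orbit_size(strings):
    """Largest number of occurrences of any single letter over all strings."""
    counts = Counter(ch for t in strings for ch in t)
    return max(counts.values(), default=0)


# Hypothetical toy instance over the DNA alphabet.
fragments = ["acg", "cgt", "gta"]
print(is_superstring("acgta", fragments))  # True
print(max_orbit_size(fragments))           # 3 (the letter "g" occurs three times)
```

In this toy instance every string has length three and the maximal orbit size is three, so it falls just outside the polynomial time solvable case of orbit size two.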
In order to cope with the intractability of exact computation, approximation algorithms were designed to deal with this problem. The first polynomial time approximation algorithm with a constant approximation ratio was given by Blum et al. [BJL+94]. It achieves an approximation ratio of 3. This factor was improved in a series of papers yielding approximation ratios of 2.88 by Teng and Yao [TT93]; 2.83 by Czumaj et al. [CGP+94]; 2.79 by Kosaraju, Park, and Stein [KPS94]; 2.75 by Armen and Stein [AS95]; 2.67 by Armen and Stein [AS98]; 2.596 by Breslauer, Jiang, and Jiang [BJJ97] and 2.5 by Sweedyk [S99]. The currently best known approximation algorithm is due to Mucha [M12] and yields an approximation ratio of 2.478.

On the lower bound side, Blum et al. [BJL+94] proved that approximating the Shortest Superstring problem is APX-hard. However, the constructed reduction produces instances with arbitrarily large alphabets. In [O99], Ott provided the first explicit approximation hardness result and proved that the problem is APX-hard even if the size of the alphabet is two. In fact, Ott proved that instances over a binary alphabet are NP-hard to approximate with an approximation ratio 17246/17245 (1.000057) − ε for every ε > 0. In 2005, Vassilevska [V05] gave an improved approximation lower bound of 1217/1216 (1.00082) by using a natural construction. The constructed instances of the Shortest Superstring problem have maximal orbit size 20 and the length of the strings is exactly 4. In this paper, we prove that even instances of the Shortest Superstring problem with maximal orbit size 8 and all strings having length 4 are NP-hard to approximate within any factor less than 333/332 (1.00301).

Maximal Compression problem. We are given a collection of strings S = {s1, . . . , sn}. The task is to find a superstring for S with maximum compression, which is the difference between the sum of the lengths of the given strings and the length of the superstring.
In the exact setting, an optimal solution to the Shortest Superstring problem is an optimal solution to this problem, but the approximate solutions can differ significantly in the sense of approximation ratio. The Maximal Compression problem arises in various data compression problems (cf. [SS82], [S88] and [MJ75]). The decision version of this problem is NP-complete [MS77]. Tarhio and Ukkonen [TU88] and Turner [T89] gave approximation algorithms with approximation ratio 2. The best known approximation upper bound is 1.5 [KLS+05], obtained by reducing the problem to the MAX-ATSP problem, which is defined below. On the approximation lower bound side, Blum et al. [BJL+94] proved the APX-hardness of the Maximal Compression problem. The first explicit approximation lower bounds were given by Ott [O99], who proved that it is NP-hard to approximate this problem with an approximation factor 11217/11216 (1.000089) − ε for every ε > 0. This hardness result was improved by Vassilevska [V05], implying a lower bound of 1072/1071 (1.00093) − ε for any ε > 0, unless P = NP. In this paper, we prove that approximating the Maximal Compression problem with an approximation ratio less than 204/203 (1.00492) is NP-hard.

Maximum Asymmetric Traveling Salesperson (MAX-ATSP) problem. We are given a complete directed graph G and a weight function w assigning each edge of G a nonnegative weight. The task is to find a closed tour of maximum weight visiting every vertex of G exactly once. This problem has various applications and, in fact, a good approximation algorithm for MAX-ATSP yields a good approximation algorithm for many other optimization problems such as the Shortest Superstring problem, the Maximal Compression problem and the Minimum Asymmetric (1, 2)-Traveling Salesperson (MIN-(1, 2)-ATSP) problem. The latter problem is the restricted version of the Minimum Asymmetric Traveling Salesperson problem, in which we restrict the weight function w to weights one and two.
The MAX-ATSP problem can be seen as a generalization of the MIN-(1, 2)-ATSP problem in the sense that any (1/α)-approximation algorithm for the former problem can be transformed into a (2 − α)-approximation algorithm for the latter problem. Due to this reduction, all negative results concerning the approximation of the MIN-(1, 2)-ATSP problem imply hardness results for the MAX-ATSP problem. Since MIN-(1, 2)-ATSP is APX-hard [PY93], there is little hope for polynomial time approximation algorithms with arbitrarily good precision for the MAX-ATSP problem. On the other hand, the first approximation algorithm for the MAX-ATSP problem with guaranteed approximation performance is due to Fisher, Nemhauser, and Wolsey [FNW79] and achieves an approximation factor of 2. After that, Kosaraju, Park, and Stein [KPS94] gave an approximation algorithm for that problem with performance ratio 1.66. This result was improved by Bläser [B02], who obtained an approximation upper bound of 1.63. Lewenstein and Sviridenko [LS03] were able to improve the approximation upper bound for that problem to 1.60. Then, Kaplan et al. [KLS+05] designed an algorithm for the MAX-ATSP problem yielding the best known approximation upper bound of 1.50.

On the approximation hardness side, Engebretsen [E99] proved that, for any ε > 0, there is no (2805/2804 − ε)-approximation algorithm for MIN-(1, 2)-ATSP, unless P = NP, which yields an approximation lower bound of 2804/2803 (1.00035) − ε for the MAX-ATSP problem. The negative result was improved by Engebretsen and Karpinski [EK06] to 321/320 (1.0031) − ε for the MIN-(1, 2)-ATSP problem. It implies the best known approximation lower bound of 320/319 (1.0031) − ε for the MAX-ATSP problem, unless P = NP. In this paper, we prove that approximating the MAX-ATSP problem with an approximation ratio less than 204/203 (1.00492) is NP-hard.

2 Preliminaries

In the following, we introduce some notation and abbreviations. Throughout, for i ∈ N, we use the abbreviation [i] for the set {1, . . . , i}.
Given a finite alphabet Σ, a string is an element of $\Sigma^*$. Given two strings $v = v_1 \cdots v_n$ and $w = w_1 \cdots w_m$ over Σ, we denote the length of v by $|v|$. Furthermore, v is a substring of w if $m \geq n$ and there exists a $j \in \{0, \ldots, m - n\}$ such that $v_i = w_{j+i}$ for all $i \in [n]$. w is said to be a superstring of v if v is a substring of w. Given a set of strings $S = \{s_1, \ldots, s_n\} \subset \Sigma^*$, a string $s \in \Sigma^*$ is a superstring for S if s is a superstring of every $s_i \in S$. Given a superstring s for S, the compression of s, denoted comp(S, s), is defined as

$\mathrm{comp}(S, s) = \sum_{s_i \in S} |s_i| - |s|.$

In addition, we introduce the notion of the maximal orbit size of S, which is the maximal number of occurrences of a character in S. We are ready to give the definition of the Shortest Superstring problem and the Maximal Compression problem.

Definition 1. Given an alphabet Σ and a set of strings $S = \{s_1, \ldots, s_n\} \subset \Sigma^*$ such that no string in S is a substring of another string in S, in the Shortest Superstring problem we have to find a superstring s for S of minimum length, whereas in the Maximal Compression problem, we have to find a superstring s for S with maximum compression.

In the following, we concentrate on the traveling salesperson problems. We begin with the definition of the MAX-ATSP problem. For this purpose, we introduce the notion of a Hamiltonian tour. Given a directed graph G = (V, A), a Hamiltonian tour is a cycle in G visiting each vertex of G exactly once.

Definition 2 (MAX-ATSP). Given a complete directed graph G = (V, A) and a weight function w assigning each edge of G a nonnegative weight, the MAX-ATSP problem consists of finding a Hamiltonian tour of maximum weight in G.

Next, we give the definition of the MIN-(1, 2)-ATSP problem, which is closely related to the MAX-ATSP problem.

Definition 3 (MIN-(1, 2)-ATSP). In the MIN-(1, 2)-ATSP problem, we are given a complete directed graph G = (V, A) and a weight function $w : A \to \{1, 2\}$.
The task is to find a Hamiltonian tour of minimum weight in G.

3 Related Work

In the following, we present some results related to the problems studied in this paper. In particular, we briefly describe some reductions, which we use later on. The following theorem is due to Vassilevska [V05] and deals with the best known approximation lower bounds for the Shortest Superstring problem as well as for the Maximal Compression problem.

Theorem 1 ([V05]). For any ε > 0, it is NP-hard to approximate the Shortest Superstring problem and the Maximal Compression problem restricted to instances with equal length strings in polynomial time within a factor of
• 1.00082 − ε and
• 1.00093 − ε, respectively.

In addition, the maximal orbit size of the constructed instances in [V05] is 20 and all strings have length four. In the same paper, it was proved that the Shortest Superstring problem is the hardest to approximate on instances over a binary alphabet.

Theorem 2 ([V05]). Suppose the Shortest Superstring problem can be approximated by a factor α on instances over a binary alphabet. Then, the Shortest Superstring problem can be approximated by a factor α on instances over any alphabet.

Given an instance S of the Shortest Superstring problem, consider the associated weighted complete graph, in which the vertices are represented by the strings in S and the weight of an edge is given by the maximum number of overlapping letters of the corresponding strings. Then, the optimal compression equals the weight of a maximum Hamiltonian path. By introducing a special vertex representing the start and the end of the Hamiltonian cycle, the Maximal Compression problem is equivalent to the MAX-ATSP problem on this graph. This fact was used in [KLS+05] in order to obtain an improved approximation algorithm for the Maximal Compression problem.

Fact 1. An α-approximation algorithm for the MAX-ATSP problem implies an α-approximation algorithm for the Maximal Compression problem.
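The correspondence between compression and Hamiltonian paths in the overlap graph can be illustrated with a brute-force sketch. It assumes, as in Definition 1, that no string is a substring of another. The function names and the example strings are illustrative assumptions, and the exhaustive search is exponential, so it is only meant for tiny instances.

```python
from itertools import permutations


def overlap(a, b):
    """Edge weight a -> b: longest proper suffix of `a` that is a prefix of `b`."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def max_compression(strings):
    """Weight of a maximum Hamiltonian path in the overlap graph (brute force)."""
    best = 0
    for order in permutations(strings):
        total = sum(overlap(a, b) for a, b in zip(order, order[1:]))
        best = max(best, total)
    return best


strings = ["abcd", "cdef", "efab"]
compression = max_compression(strings)
shortest_length = sum(len(s) for s in strings) - compression
print(compression, shortest_length)  # 4 8  (e.g. the superstring "abcdefab")
```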
Another interesting relation can be derived by replacing all edges with weight two of an instance of the MIN-(1, 2)-ATSP problem by edges of weight zero and then computing a Hamiltonian tour of maximum weight. Vishwanathan [V92] proved that this transformation relates the MIN-(1, 2)-ATSP problem to the MAX-ATSP problem in the following sense.

Theorem 3 ([V92]). A (1/α)-approximation algorithm for the MAX-ATSP problem implies a (2 − α)-approximation algorithm for the MIN-(1, 2)-ATSP problem.

Due to this reduction, every hardness result concerning the MIN-(1, 2)-ATSP problem can be transformed into a hardness result for the MAX-ATSP problem. The best known approximation lower bound for the MIN-(1, 2)-ATSP problem is proved in [EK06] and it yields the following hardness result.

Theorem 4 ([EK06]). For any constant ε > 0, it is NP-hard to approximate the MIN-(1, 2)-ATSP problem with an approximation ratio 1.0031 − ε.

According to Theorem 3, it implies the hardness result for the MAX-ATSP problem stated below.

Corollary 1. For any constant ε > 0, it is NP-hard to approximate the MAX-ATSP problem within 1.0031 − ε.

3.1 Hybrid Problem

In their paper on approximation hardness of bounded occurrence instances of several combinatorial optimization problems, Berman and Karpinski [BK99] introduced the Hybrid problem and proved that this problem is NP-hard to approximate within some constant.

Definition 4 (Hybrid problem). Given a system of linear equations mod 2 containing n variables, $m_2$ equations with exactly two variables, and $m_3$ equations with exactly three variables, find an assignment to the variables that satisfies as many equations as possible.

In the aforementioned paper, Berman and Karpinski proved the following hardness result.

Theorem 5 ([BK99]).
For any constant ε > 0, there exist instances of the Hybrid problem with 42ν variables, 60ν equations with exactly two variables, and 2ν equations with exactly three variables such that:
(i) Each variable occurs exactly three times.
(ii) Either there is an assignment to the variables that leaves at most εν equations unsatisfied, or else every assignment to the variables leaves at least (1 − ε)ν equations unsatisfied.
(iii) It is NP-hard to decide which of the two cases in item (ii) above holds.

Analyzing the details of their construction, it can be seen that every instance of the Hybrid problem produced by it has an even more special structure. The equations containing three variables are of the form $x \oplus y \oplus z = \{0, 1\}$. Those equations arise from the Theorem of Håstad [H01] concerning the hardness of approximating equations with exactly three variables, called the MAX-E3-LIN problem, which can be seen as a special instance of the Hybrid problem.

[Figure 1: An example of a Hybrid instance with circles $C^x$, $C^y$, $C^z$, and hyperedge $e = \{z_7, y_{21}, x_{14}\}$.]

For every variable x of the original instance $E_3$ of the MAX-E3-LIN problem, they introduced a corresponding set of variables $V_x$. If the variable x occurs $t_x$ times in $E_3$, then $V_x$ contains $7t_x$ variables $x_1, \ldots, x_{7t_x}$. Furthermore, the variables in $V_x$ are connected by equations of the form $x_i \oplus x_{i+1} = 0$ with $i \in [7t_x - 1]$ and $x_1 \oplus x_{7t_x} = 0$. This construction induces the circle $C^x$ on the variables $V_x$. In addition, every circle $C^x$ possesses an associated matching $M^x$. The variables contained in $\mathrm{Con}(V_x) = \{x_i \mid i \in \{7\nu \mid \nu \in [t_x]\}\}$ are called contact variables, whereas the variables in $V_x \setminus \mathrm{Con}(V_x)$ are called checker variables.
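A minimal sketch of the circle structure just described: for a variable occurring t times it lists the circle equations, the circle border equation, and the split into contact and checker variables. The function names are illustrative assumptions. The matching $M^x$ is not generated, since its exact form is fixed by the construction in [BK99] and is not reproduced here.

```python
def circle_equations(name, t):
    """Pairs (a, b), each standing for a XOR b = 0, on the circle of 7*t variables."""
    size = 7 * t

    def var(i):
        return f"{name}{i}"

    eqs = [(var(i), var(i + 1)) for i in range(1, size)]  # x_i + x_{i+1} = 0
    eqs.append((var(1), var(size)))                       # border: x_1 + x_{7t} = 0
    return eqs


def contact_indices(t):
    """Indices 7, 14, ..., 7t: variables that occur in three-variable equations."""
    return [7 * v for v in range(1, t + 1)]


def checker_indices(t):
    """All remaining indices; these are covered by the matching M^x."""
    contacts = set(contact_indices(t))
    return [i for i in range(1, 7 * t + 1) if i not in contacts]


# A variable occurring three times gives a circle on 21 variables, as in Figure 1.
print(len(circle_equations("x", 3)))  # 21
print(contact_indices(3))             # [7, 14, 21]
print(len(checker_indices(3)))        # 18
```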
Let $E_3$ be an instance of the MAX-E3-LIN problem and H be its corresponding instance of the Hybrid problem. We denote by $V(E_3)$ the set of variables which occur in the instance $E_3$. Then, H can be represented graphically by $|V(E_3)|$ circles $C^x$ with $x \in V(E_3)$ containing the variables $V(C^x) = \{x_1, \ldots, x_{n_x}\}$ as vertices. The edges are identified by the equations included in H. The equations with exactly three variables are represented by hyperedges e with cardinality $|e| = 3$. The equations $x_i \oplus x_{i+1} = 0$ induce a circle containing the vertices $\{x_1, \ldots, x_{n_x}\}$ and the matching equations $x_i \oplus x_j = 0$ with $\{i, j\} \in M^x$ induce a perfect matching on the set of checker variables. An example of an instance of the Hybrid problem is depicted in Figure 1.

In summary, we notice that there are four types of equations in the Hybrid problem: (i) the circle equations $x_i \oplus x_{i+1} = 0$ with $i \in [7t_x - 1]$, (ii) circle border equations $x_1 \oplus x_{7t_x} = 0$, (iii) matching equations $x_i \oplus x_j = 0$ with $\{i, j\} \in M^x$, and (iv) equations with three variables of the form $x \oplus y \oplus z = \{0, 1\}$. In the remainder, we assume that equations of the form $x \oplus y \oplus z = \{0, 1\}$ contain only unnegated variables due to the transformation $\bar{x} \oplus y \oplus z = 0 \equiv x \oplus y \oplus z = 1$.

4 Our Contribution

We now formulate our main result.

Theorem 6. Given an instance H of the Hybrid problem with n circles, $m_2$ equations with two variables and $m_3$ equations with exactly three variables with the properties described in Theorem 5, we construct in polynomial time an instance $S_H$ of the Shortest Superstring problem and Maximal Compression problem with the following properties:
(i) If there exists an assignment φ to the variables of H which leaves at most u equations unsatisfied, then there exists a superstring $s_\phi$ for $S_H$ with length $|s_\phi| \leq 5m_2 + 16m_3 + 7n + u$.
(ii) From every superstring s for $S_H$ with length $|s| = 5m_2 + 16m_3 + u + 7n$, we can construct in polynomial time an assignment $\psi_s$ to the variables of H that leaves at most u equations in H unsatisfied.
(iii) If there exists an assignment φ to the variables of H which leaves at most u equations unsatisfied, then there exists a superstring $s_\phi$ for $S_H$ with compression $\mathrm{comp}(S_H, s_\phi) \geq 3m_2 + 12m_3 - u + 5n$.
(iv) From every superstring s for $S_H$ with compression $\mathrm{comp}(S_H, s) = 3m_2 + 12m_3 - u + 5n$, we can construct in polynomial time an assignment $\psi_s$ to the variables of H that leaves at most u equations in H unsatisfied.
(v) The maximal orbit size of the instance $S_H$ is 8 and the length of each string in $S_H$ is 4.

The preceding theorem can be used to derive an explicit approximation lower bound for the Shortest Superstring problem by reducing instances of the Hybrid problem of the form described in Theorem 5 to the Shortest Superstring problem.

Corollary 2. For every ε > 0, it is NP-hard to approximate the Shortest Superstring problem with an approximation factor 333/332 (1.00301) − ε.

Proof. First of all, we choose constants $k \in \mathbb{N}$ and $\delta > 0$ such that

$\frac{333 - \delta}{332 + \delta + \frac{42}{k}} \geq \frac{333}{332} - \epsilon$

holds. Given an instance $E_3$ of the MAX-E3-LIN problem, we generate k copies of $E_3$ and produce an instance H of the Hybrid problem. Then, we construct the corresponding instance $S_H$ of the Shortest Superstring problem with the properties described in Theorem 6. We conclude according to Theorem 5 that there exists a superstring for $S_H$ with length at most

$5 \cdot 60\nu k + 16 \cdot 2\nu k + \delta\nu k + 7n \leq \left(332 + \delta + \frac{7n}{k\nu}\right)\nu k \leq \left(332 + \delta + \frac{7 \cdot 6}{k}\right)\nu k$

or the length of a superstring for $S_H$ is bounded from below by

$5 \cdot 60\nu k + 16 \cdot 2\nu k + (1 - \delta)\nu k + 7n \geq (332 + (1 - \delta))\nu k \geq (333 - \delta)\nu k.$

From Theorem 5, we know that the two cases above are NP-hard to distinguish. Hence, for every ε > 0, it is NP-hard to find a solution to the Shortest Superstring problem with an approximation ratio

$\frac{333 - \delta}{332 + \delta + \frac{42}{k}} \geq \frac{333}{332} - \epsilon.$

Analogously, Theorem 6 can be used to derive an approximation lower bound for the Maximal Compression problem.

Corollary 3. For every ε > 0, it is NP-hard to approximate the Maximal Compression problem with an approximation factor 204/203 (1.00492) − ε.

By applying Fact 1, we obtain the following hardness result for the MAX-ATSP problem.

Corollary 4. For every ε > 0, it is NP-hard to approximate the MAX-ATSP problem with an approximation factor 204/203 (1.00492) − ε.

5 Reduction from the Hybrid Problem

In this section, we present the proof of Theorem 6. Before we describe the approximation preserving reduction, we first prove a slightly weaker result. The first approach uses strings of length 6 to simulate equations with exactly three variables. In Section 5.6, we will introduce smaller gadgets for equations with exactly three variables, implying the claimed inapproximability results. Let us state the properties of our first approach.

Theorem 7. Given an instance H of the Hybrid problem with n circles, $m_2$ equations with two variables and $m_3$ equations with exactly three variables with the properties described in Theorem 5, we construct in polynomial time an instance $S_H$ of the Shortest Superstring problem and Maximal Compression problem with the following properties:
(i) If there exists an assignment φ to the variables of H which leaves at most u equations unsatisfied, then there exists a superstring $s_\phi$ for $S_H$ with length $|s_\phi| \leq 5m_2 + 22m_3 + 7n + u$.
(ii) From every superstring s for $S_H$ with length $|s| = 5m_2 + 22m_3 + u + 7n$, we can construct in polynomial time an assignment $\psi_s$ to the variables of H that leaves at most u equations in H unsatisfied.
(iii) If there exists an assignment φ to the variables of H which leaves at most u equations unsatisfied, then there exists a superstring $s_\phi$ for $S_H$ with compression $\mathrm{comp}(S_H, s_\phi) \geq 3m_2 + 14m_3 - u + 5n$.
(iv) From every superstring s for $S_H$ with compression $\mathrm{comp}(S_H, s) = 3m_2 + 14m_3 - u + 5n$, we can construct in polynomial time an assignment $\psi_s$ to the variables of H that leaves at most u equations in H unsatisfied.
(v) The maximal orbit size of the instance $S_H$ is eight and the length of a string in $S_H$ is bounded by six.

Combining Theorem 5 with Theorem 7, we obtain the following explicit lower bound for the Shortest Superstring problem.

Corollary 5. It is NP-hard to approximate the Shortest Superstring problem with an approximation factor less than 345/344 (1.0029).

Before we proceed to the proof of Theorem 7, we describe the reduction from a high-level view and try to build some intuition.

5.1 Main Ideas and Overview

Given an instance of the Hybrid problem H, we want to transform H into an instance of the Shortest Superstring problem. Fortunately, the special structure of the linear equations in the Hybrid problem is particularly well-suited for our reduction, since a part of the equations with two variables form a circle and every variable occurs exactly three times. For every equation $g_{i+1} \equiv x_i \oplus x_{i+1} = 0$ included in this circle, we introduce a set $S(g_{i+1})$ containing two strings, which can be aligned advantageously in two natural ways. If the fragments corresponding to two consecutive equations $x_{i-1} \oplus x_i = 0$ and $x_i \oplus x_{i+1} = 0$ use the same natural alignment, we are able to overlap those fragments by one letter. From a high-level view, we can construct an associated superstring for each circle in H, which contains the naturally aligned strings. In fact, we define for every equation $g \in H$ an associated set of strings S(g) and the corresponding natural alignments. The instance $S_H$ of the Shortest Superstring problem is given by the union of all sets S(g).
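As a sanity check on the constants in Corollaries 2, 3 and 5, the following sketch plugs the parameters of Theorem 5 ($m_2 = 60\nu$, $m_3 = 2\nu$) into the bounds of Theorems 6 and 7. It is a simplification under stated assumptions: the terms 7n and 5n and the slack ε are dropped, since the proof of Corollary 2 shows they become negligible for a large number k of copies. The helper `gap_ratio` is a hypothetical name introduced for this illustration.

```python
from fractions import Fraction


def gap_ratio(c2, c3, m2=60, m3=2, minimize=True):
    """Limiting ratio between the two cases of Theorem 5, measured per unit of nu.

    c2 and c3 are the coefficients of m2 and m3 in the length (or compression)
    bound. The number u of unsatisfied equations moves from about 0 to about nu.
    """
    base = c2 * m2 + c3 * m3
    if minimize:
        return Fraction(base + 1, base)  # length grows by u
    return Fraction(base, base - 1)      # compression shrinks by u


print(gap_ratio(5, 16))                  # 333/332  (Theorem 6, length)
print(gap_ratio(3, 12, minimize=False))  # 204/203  (Theorem 6, compression)
print(gap_ratio(5, 22))                  # 345/344  (Theorem 7, length)
```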
Due to the construction of the sets S(g), there is a particular way to interpret an alignment of the strings in S(g) included in the resulting superstring as an assignment to the variables in the Hybrid instance. The major challenge in the proof of correctness is to prove that every superstring for $S_H$ can be interpreted as an assignment to the variables in the Hybrid instance H with the property that the number of satisfied equations is connected to the length of the superstring.

5.2 Constructing $S_H$ from H

Given an instance of the Hybrid problem H, we are going to construct the corresponding instance $S_H$ of the Shortest Superstring problem. Furthermore, we introduce some notation and conventions. For every equation $g \in H$, we define a set S(g) of corresponding strings. The corresponding instance $S_H$ of the Shortest Superstring problem is given by

$S_H = \bigcup_{g \in H} S(g).$

The strings in the set S(g) differ by the type of the considered equation $g \in H$. Let us start with the description of $S_H$. To this end, we need to specify the instance of the Hybrid problem more precisely.

Let $E_3$ be an instance of the MAX-E3-LIN problem and H its corresponding instance of the Hybrid problem with n circles. For every variable $x \in V(E_3)$, there is an associated circle $C^x$. Each circle consists of $m_2^x - 1$ circle equations $g_{i+1}^x$ with $i \in [m_2^x - 1]$, a circle border equation $g_1^x \equiv x_1 \oplus x_{m_2^x} = 0$ and $|M^x|$ matching equations $g_e^x$ with $e \in M^x$. Furthermore, we have $m_3$ equations $g_j^3$ with exactly three variables. We are going to specify the sets S(g) differing by the type of equation g. In particular, we distinguish four types of equations contained in H:
(i) circle equations
(ii) matching equations
(iii) circle border equations
(iv) equations with exactly three variables

Let us begin with the description of the strings corresponding to circle border equations.
Strings Corresponding to Circle Border Equations

Given an instance of the Hybrid problem H, a circle $C^x$ in H and its circle border equation $g_1^x \equiv x_1 \oplus x_n = 0$, we introduce six associated strings, which are all included in the set $S(g_1^x)$. Due to the construction of the circle $C^x$, the variable $x_n$ is a contact variable. This means that $x_n$ appears in an equation $g_j^3$ with exactly three variables. The strings in the set $S(g_1^x)$ differ by the type of equation $g_j^3$. We begin with the case $g_j^3 \equiv x_n \oplus y \oplus z = 0$. The string $L_x C_x^l$ is used as the initial part of the superstring corresponding to this circle, whereas $C_x^r R_x$ is used as the end part. Furthermore, we introduce strings that represent an assignment that sets either the variable $x_1$ to 0 or the variable $x_n$ to 1. The corresponding two strings are

$C_x^l x_1^{m0} x_n^{l1} C_x^r$ and $x_n^{l1} C_x^r C_x^l x_1^{m0}$.

Finally, we define the last two strings of the set $S(g_1^x)$,

$C_x^l x_1^{r1} x_n^{m0} C_x^r$ and $x_n^{m0} C_x^r C_x^l x_1^{r1}$,

having a similar interpretation. Both pairs of strings can be overlapped by two letters. Those natural alignments have a crucial influence during the process of constructing a superstring. For this reason, we introduce a notation for these alignments. By the 0-alignment of the strings in $S(g_1^x)$, we refer to the following alignment of the four strings. In the following, (↓) will denote the overlapping of the strings.

$C_x^l x_1^{m0} x_n^{l1} C_x^r$, $x_n^{l1} C_x^r C_x^l x_1^{m0}$ and $C_x^l x_1^{r1} x_n^{m0} C_x^r$, $x_n^{m0} C_x^r C_x^l x_1^{r1}$
↓
$C_x^l x_1^{m0} x_n^{l1} C_x^r C_x^l x_1^{m0}$ and $x_n^{m0} C_x^r C_x^l x_1^{r1} x_n^{m0} C_x^r$

On the other hand, we define the 1-alignment of the strings in $S(g_1^x)$ as follows.

$C_x^l x_1^{m0} x_n^{l1} C_x^r$, $x_n^{l1} C_x^r C_x^l x_1^{m0}$ and $C_x^l x_1^{r1} x_n^{m0} C_x^r$, $x_n^{m0} C_x^r C_x^l x_1^{r1}$
↓
$x_n^{l1} C_x^r C_x^l x_1^{m0} x_n^{l1} C_x^r$ and $C_x^l x_1^{r1} x_n^{m0} C_x^r C_x^l x_1^{r1}$

Both ways to join the four strings are called simple alignments.
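The two simple alignments for the case $g_j^3 \equiv x_n \oplus y \oplus z = 0$ can be replayed mechanically. The sketch below models each string as a tuple of letters. The letter spellings (such as "x_n^l1") and the helper `merge` are illustrative assumptions about how to encode the strings of $S(g_1^x)$, not notation from the paper.

```python
def merge(a, b, k):
    """Overlap tuple `b` onto the end of tuple `a` by exactly `k` letters."""
    if a[-k:] != b[:k]:
        raise ValueError("the strings do not overlap by k letters")
    return a + b[k:]


CL, CR = "C_x^l", "C_x^r"
s1 = (CL, "x_1^m0", "x_n^l1", CR)
s2 = ("x_n^l1", CR, CL, "x_1^m0")
s3 = (CL, "x_1^r1", "x_n^m0", CR)
s4 = ("x_n^m0", CR, CL, "x_1^r1")

# Each alignment joins the two strings of a pair with an overlap of two letters.
zero_alignment = (merge(s1, s2, 2), merge(s4, s3, 2))
one_alignment = (merge(s2, s1, 2), merge(s3, s4, 2))

for joined in zero_alignment + one_alignment:
    print(" ".join(joined))  # four joined strings of six letters each
```

Either alignment covers the four strings of length four with two strings of length six, so each pair contributes a compression of two.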
After having described how the strings corresponding to $S(g_1^x)$ are defined in the case $g_j^3 \equiv x_n \oplus y \oplus z = 0$, we now deal with the case $g_j^3 \equiv x_n \oplus y \oplus z = 1$. As before, we use $L_x C_x^l$ as the initial part of the superstring corresponding to this circle, whereas $C_x^r R_x$ is used as the end part. Furthermore, we define the remaining four strings contained in $S(g_1^x)$ by the following.

$x_n^{m1} C_x^r C_x^l x_1^{m0}$, $C_x^l x_1^{m0} x_n^{m1} C_x^r$ and $x_n^{l0} C_x^r C_x^l x_1^{r1}$, $C_x^l x_1^{r1} x_n^{l0} C_x^r$

Both pairs of strings can be overlapped by two letters. We introduce a notation for these alignments.

$C_x^l x_1^{m0} x_n^{m1} C_x^r$, $x_n^{m1} C_x^r C_x^l x_1^{m0}$ and $C_x^l x_1^{r1} x_n^{l0} C_x^r$, $x_n^{l0} C_x^r C_x^l x_1^{r1}$
↓
$C_x^l x_1^{m0} x_n^{m1} C_x^r C_x^l x_1^{m0}$ and $x_n^{l0} C_x^r C_x^l x_1^{r1} x_n^{l0} C_x^r$

The alignment just introduced is called the 0-alignment of the strings in $S(g_1^x)$. On the other hand, we define the 1-alignment of the strings in $S(g_1^x)$ as follows.

$C_x^l x_1^{m0} x_n^{m1} C_x^r$, $x_n^{m1} C_x^r C_x^l x_1^{m0}$ and $C_x^l x_1^{r1} x_n^{l0} C_x^r$, $x_n^{l0} C_x^r C_x^l x_1^{r1}$
↓
$x_n^{m1} C_x^r C_x^l x_1^{m0} x_n^{m1} C_x^r$ and $C_x^l x_1^{r1} x_n^{l0} C_x^r C_x^l x_1^{r1}$

In the remainder, we refer to both ways to overlap the four strings as simple alignments. Next, we describe the strings corresponding to matching equations.

Strings Corresponding to Matching Equations

Let $C^x$ be a circle in H and $M^x$ its associated perfect matching. Let {i, j} be an edge in $M^x$ and $g_{\{i,j\}}^x \equiv x_i \oplus x_j = 0$ the associated matching equation. We now define the corresponding set $S(g_{\{i,j\}}^x)$ consisting of two strings, assuming i < j. We introduce two strings of the form

$x_j^{r0} x_j^{l0} x_i^{r1} x_i^{l1}$ and $x_i^{r1} x_i^{l1} x_j^{r0} x_j^{l0}$

corresponding to the matching equation. There are two ways to align those two strings to obtain an overlap of two letters. In the remainder, we refer to those alignments as simple.
$x_j^{r0} x_j^{l0} x_i^{r1} x_i^{l1}$ and $x_i^{r1} x_i^{l1} x_j^{r0} x_j^{l0}$
↙ ↘
$x_i^{r1} x_i^{l1} x_j^{r0} x_j^{l0} x_i^{r1} x_i^{l1}$ and $x_j^{r0} x_j^{l0} x_i^{r1} x_i^{l1} x_j^{r0} x_j^{l0}$

The first way to overlap the strings is called the 0-alignment, whereas the second one is called the 1-alignment. Next, we describe the strings corresponding to circle equations.

Strings Corresponding to Circle Equations

Let $C^x$ be a circle in H and $M^x$ its associated matching. Furthermore, let {i, j} and {i + 1, j′} be both contained in $M^x$. We assume that i < j. Then, we introduce the corresponding strings for $x_i \oplus x_{i+1} = 0$. If i + 1 < j′, we have

$x_i^{l1} x_{i+1}^{r1} x_i^{m0} x_{i+1}^{m0}$ and $x_i^{m0} x_{i+1}^{m0} x_i^{l1} x_{i+1}^{r1}$.

Otherwise (i + 1 > j′), we have

$x_i^{l1} x_{i+1}^{m1} x_i^{m0} x_{i+1}^{r0}$ and $x_i^{m0} x_{i+1}^{r0} x_i^{l1} x_{i+1}^{m1}$.

In case of i > j and i + 1 > j′, we use

$x_i^{m1} x_{i+1}^{m1} x_i^{l0} x_{i+1}^{r0}$ and $x_i^{l0} x_{i+1}^{r0} x_i^{m1} x_{i+1}^{m1}$.

Finally, if i > j and i + 1 < j′, we introduce

$x_i^{m1} x_{i+1}^{r1} x_i^{l0} x_{i+1}^{m0}$ and $x_i^{l0} x_{i+1}^{m0} x_i^{m1} x_{i+1}^{r1}$.

Let $x_i$ be a variable in H contained in an equation $g_j^3$ with three variables. We now define the corresponding strings for the equations $x_{i-1} \oplus x_i = 0$ and $x_i \oplus x_{i+1} = 0$. We assume that {i − 1, j} and {i + 1, j′} are both included in $M^x$. Furthermore, we assume i − 1 < j and i + 1 < j′. If the equation $g_j^3$ is of the form $x_i \oplus y \oplus z = 0$, we introduce

$x_{i-1}^{l1} x_i^{r1} x_{i-1}^{m0} x_i^{m0}$ and $x_{i-1}^{m0} x_i^{m0} x_{i-1}^{l1} x_i^{r1}$

for $x_{i-1} \oplus x_i = 0$. Furthermore, for $x_i \oplus x_{i+1} = 0$, we use the strings

$x_i^{l1} x_{i+1}^{r1} x_i^{m0} x_{i+1}^{m0}$ and $x_i^{m0} x_{i+1}^{m0} x_i^{l1} x_{i+1}^{r1}$.

On the other hand, if the equation $g_j^3$ is of the form $x_i \oplus y \oplus z = 1$, we introduce

$x_{i-1}^{l1} x_i^{m1} x_{i-1}^{m0} x_i^{r0}$ and $x_{i-1}^{m0} x_i^{r0} x_{i-1}^{l1} x_i^{m1}$
corresponding to the equation $x_{i-1} \oplus x_i = 0$. For $x_i \oplus x_{i+1} = 0$, we use the strings
\[ x_i^{m1} x_{i+1}^{r1} x_i^{l0} x_{i+1}^{m0} \quad\text{and}\quad x_i^{l0} x_{i+1}^{m0} x_i^{m1} x_{i+1}^{r1}. \]

Accordingly, we introduce the notation of simple alignments for the strings in $S(g_{i+1}^x)$. For the strings
\[ x_i^{m1} x_{i+1}^{m1} x_i^{l0} x_{i+1}^{r0} \quad\text{and}\quad x_i^{l0} x_{i+1}^{r0} x_i^{m1} x_{i+1}^{m1}, \]
we define the following alignments as simple.
\[ x_i^{m1} x_{i+1}^{m1} \underbrace{x_i^{l0} x_{i+1}^{r0}}_{\text{overlap}} x_i^{m1} x_{i+1}^{m1} \quad\text{and}\quad x_i^{l0} x_{i+1}^{r0} \underbrace{x_i^{m1} x_{i+1}^{m1}}_{\text{overlap}} x_i^{l0} x_{i+1}^{r0} \]
The former alignment is called the 1-alignment and the latter one is called the 0-alignment. Next, we describe the strings corresponding to equations with three variables.

Strings Corresponding to Equations with Three Variables

We now concentrate on equations with exactly three variables. Let $g_j^3$ be an equation with three variables in $H$. For every equation $g_j^3$, we define two corresponding sets $S^A(g_j^3)$ and $S^B(g_j^3)$, both containing exactly three strings. Finally, the set $S(g_j^3)$ is defined by the union $S^A(g_j^3) \cup S^B(g_j^3)$. We distinguish whether $g_j^3$ is of the form $x \oplus y \oplus z = 1$ or $x \oplus y \oplus z = 0$. The description starts with the latter case.

An equation of the form $x \oplus y \oplus z = 0$ is represented by $S^A(g_j^3)$ containing the strings
\[ x^{r1} A_j^1 x^{l1} y^{r1} A_j^2 y^{l1}, \qquad y^{r1} A_j^2 y^{l1} x^{m0} A_j^3 C_j, \qquad x^{m0} A_j^3 C_j x^{r1} A_j^1 x^{l1} \]
and by $S^B(g_j^3)$ containing the strings
\[ x^{r1} B_j^1 x^{l1} z^{r1} B_j^2 z^{l1}, \qquad z^{r1} B_j^2 z^{l1} C_j B_j^3 x^{m0}, \qquad C_j B_j^3 x^{m0} x^{r1} B_j^1 x^{l1}. \]
On the other hand, for equations of the form $g_j^3 \equiv x \oplus y \oplus z = 1$, we introduce $S^A(g_j^3)$ containing the following strings.
\[ x^{r0} A_j^1 x^{l0} y^{r0} A_j^2 y^{l0}, \qquad y^{r0} A_j^2 y^{l0} x^{m1} A_j^3 C_j, \qquad x^{m1} A_j^3 C_j x^{r0} A_j^1 x^{l0} \]
Furthermore, we give the definition of $S^B(g_j^3)$ including the following strings.
\[ x^{r0} B_j^1 x^{l0} z^{r0} B_j^2 z^{l0}, \qquad z^{r0} B_j^2 z^{l0} C_j B_j^3 x^{m1}, \qquad C_j B_j^3 x^{m1} x^{r0} B_j^1 x^{l0} \]

The strings in the set $S^A(g_j^3)$ can be aligned in a cyclic fashion in order to obtain different strings which we will use in our reduction. Every specific alignment possesses its own abbreviation given below. Starting from the three strings
\[ x^{r1} A_j^1 x^{l1} y^{r1} A_j^2 y^{l1}, \qquad y^{r1} A_j^2 y^{l1} x^{m0} A_j^3 C_j, \qquad x^{m0} A_j^3 C_j x^{r1} A_j^1 x^{l1}, \]
we obtain the following alignments.
\[ x^{r1} A_j^1 x^{l1}\, y^{r1} A_j^2 y^{l1}\, x^{m0} A_j^3 C_j\, x^{r1} A_j^1 x^{l1} \;\equiv\; x^{r1} A_j x^{l1} \qquad \text{called } x^1\text{-alignment} \]
\[ y^{r1} A_j^2 y^{l1}\, x^{m0} A_j^3 C_j\, x^{r1} A_j^1 x^{l1}\, y^{r1} A_j^2 y^{l1} \;\equiv\; y^{r1} A_j y^{l1} \qquad \text{called } y^1\text{-alignment} \]
\[ x^{m0} A_j^3 C_j\, x^{r1} A_j^1 x^{l1}\, y^{r1} A_j^2 y^{l1}\, x^{m0} A_j^3 C_j \;\equiv\; x^{m0} A_j C_j \qquad \text{called left-}x^0\text{-alignment} \]

Analogously, the strings in $S^B(g_j^3)$ can also be aligned in a cyclic fashion.
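The cyclic alignments of $S^A(g_j^3)$ for an equation $x \oplus y \oplus z = 0$ can be reproduced with a few lines of Python. This is a sketch under our own assumptions: every displayed symbol is treated as one token, and the helper names `merge` and `cyclic_alignments` are hypothetical.

```python
from typing import List


def merge(left: List[str], right: List[str]) -> List[str]:
    """Append `right` to `left`, sharing the longest suffix/prefix overlap."""
    for k in range(min(len(left), len(right)) - 1, 0, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    return left + right


def cyclic_alignments(strings: List[List[str]]) -> List[List[str]]:
    """For cyclically overlapping strings, return the merge starting at each position."""
    n = len(strings)
    result = []
    for start in range(n):
        merged = list(strings[start])
        for step in range(1, n):
            merged = merge(merged, strings[(start + step) % n])
        result.append(merged)
    return result


# S^A(g_j^3) for x + y + z = 0, written symbol by symbol.
S_A = [
    ["x^r1", "A_j^1", "x^l1", "y^r1", "A_j^2", "y^l1"],
    ["y^r1", "A_j^2", "y^l1", "x^m0", "A_j^3", "C_j"],
    ["x^m0", "A_j^3", "C_j", "x^r1", "A_j^1", "x^l1"],
]

x1_alignment, y1_alignment, left_x0_alignment = cyclic_alignments(S_A)
```

Each of the three results starts and ends with the same block of three symbols, which is why it can be abbreviated by its outer letters, for example $x^{r1} A_j x^{l1}$.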
We now define the abbreviations for the alignments of the strings in $S^B(g_j^3)$. Starting from the three strings
\[ x^{r1} B_j^1 x^{l1} z^{r1} B_j^2 z^{l1}, \qquad z^{r1} B_j^2 z^{l1} C_j B_j^3 x^{m0}, \qquad C_j B_j^3 x^{m0} x^{r1} B_j^1 x^{l1}, \]
we obtain the following alignments.
\[ C_j B_j^3 x^{m0}\, x^{r1} B_j^1 x^{l1}\, z^{r1} B_j^2 z^{l1}\, C_j B_j^3 x^{m0} \;\equiv\; C_j B_j x^{m0} \qquad \text{called right-}x^0\text{-alignment} \]
\[ x^{r1} B_j^1 x^{l1}\, z^{r1} B_j^2 z^{l1}\, C_j B_j^3 x^{m0}\, x^{r1} B_j^1 x^{l1} \;\equiv\; x^{r1} B_j x^{l1} \qquad \text{called } x^1\text{-alignment} \]
\[ z^{r1} B_j^2 z^{l1}\, C_j B_j^3 x^{m0}\, x^{r1} B_j^1 x^{l1}\, z^{r1} B_j^2 z^{l1} \;\equiv\; z^{r1} B_j z^{l1} \qquad \text{called } z^1\text{-alignment} \]

The strings in $S^B(g_j^3)$ and $S^A(g_j^3)$ can be overlapped in a special way that corresponds to assigning the value 0 to $x$.
Starting from the six strings
\[ x^{r1} A_j^1 x^{l1} y^{r1} A_j^2 y^{l1}, \qquad y^{r1} A_j^2 y^{l1} x^{m0} A_j^3 C_j, \qquad x^{m0} A_j^3 C_j x^{r1} A_j^1 x^{l1}, \]
\[ x^{r1} B_j^1 x^{l1} z^{r1} B_j^2 z^{l1}, \qquad z^{r1} B_j^2 z^{l1} C_j B_j^3 x^{m0}, \qquad C_j B_j^3 x^{m0} x^{r1} B_j^1 x^{l1}, \]
we overlap the left-$x^0$-alignment of $S^A(g_j^3)$ with the right-$x^0$-alignment of $S^B(g_j^3)$ in the letter $C_j$ and obtain
\[ x^{m0} A_j^3 C_j\, x^{r1} A_j^1 x^{l1}\, y^{r1} A_j^2 y^{l1}\, x^{m0} A_j^3\, C_j\, B_j^3 x^{m0}\, x^{r1} B_j^1 x^{l1}\, z^{r1} B_j^2 z^{l1}\, C_j B_j^3 x^{m0}. \]
In the remainder, we call this alignment the $x^0$-alignment of $S(g_j^3)$ and use the abbreviation $x^{m0} C_j x^{m0}$ for this string.

5.3 Constructing the Superstring $s_\varphi$ from $\varphi$

Given an assignment $\varphi$ to the variables of $H$, we are going to construct the associated superstring $s_\varphi$ for the instance $S_H$. For every $g \in H$, we formulate rules for aligning the corresponding strings in $S(g)$ according to the assignment $\varphi$. We start with sets corresponding to circle border equations and circle equations. Afterwards, we show how the resulting fragments can be overlapped with strings from the sets corresponding to matching equations and equations with three variables. Furthermore, we analyze the relation between the assignment $\varphi$ and the length of the obtained superstring $s_\varphi$. We begin with the description of the alignment of strings corresponding to circle border equations in $H$.

Aligning Strings Corresponding to Circle Border Equations

Let $C^x$ be a circle in $H$ and $x_1 \oplus x_n = 0$ its circle border equation.
Furthermore, we assume that $x_n$ is contained in an equation with three variables of the form $x_n \oplus y \oplus z = 0$. First, we set the string $L_x C_x^l$ as the initial part of our superstring corresponding to the circle $C^x$. Then, we use the $\varphi(x_1)$-alignment of the strings
\[ C_x^l x_1^{m0} x_n^{l1} C_x^r, \qquad x_n^{l1} C_x^r C_x^l x_1^{m0}, \qquad C_x^l x_1^{r1} x_n^{m0} C_x^r \qquad\text{and}\qquad x_n^{m0} C_x^r C_x^l x_1^{r1}. \]
With this alignment, one of the resulting strings, $s_l$, can be overlapped from the left side with $L_x C_x^l$ by one letter. The other string, $s_r$, is joined from the right side with $C_x^r R_x$ by one letter. This construction helps us to check whether $\varphi$ assigns the same value to the variable $x_n$ as to $x_1$. The string $s_r$ can be interpreted as the $\varphi(x_1)$-alignment of the strings corresponding to $x_n \oplus x_{n+1} = 0$, since the first letter of $s_r$ is either $x_n^{m0}$ or $x_n^{l1}$. The parts corresponding to a circle border equation with $x_n \oplus y \oplus z = 1$ can be constructed analogously. Next, we are going to align strings corresponding to circle equations.

Aligning Strings Corresponding to Circle Equations

Let $x_i \oplus x_{i+1} = 0$ be a circle equation contained in $H$. Furthermore, let the corresponding strings be given by
\[ x_i^{m0} x_{i+1}^{m0} x_i^{l1} x_{i+1}^{r1} \quad\text{and}\quad x_i^{l1} x_{i+1}^{r1} x_i^{m0} x_{i+1}^{m0}. \]
Depending on the given assignment $\varphi$, we use simple alignments to overlap the considered strings. More precisely, we make use of the $\varphi(x_{i+1})$-alignment. For every pair of associated strings, we derive an overlap of two letters. We are going to align those fragments with strings corresponding to matching equations and equations with three variables.

Aligning Strings Corresponding to Matching Equations

Let $x_i \oplus x_j = 0$ be a matching equation in $H$. Let us assume that $i < j$. We define the alignment of the strings in $S(g^x_{\{i,j\}})$ according to the value of $\varphi(x_{i+1})$. More precisely, we use the $\varphi(x_{i+1})$-alignment of the strings
\[ x_j^{r0} x_j^{l0} x_i^{r1} x_i^{l1} \quad\text{and}\quad x_i^{r1} x_i^{l1} x_j^{r0} x_j^{l0}. \]
Due to this alignment, we obtain an overlap of two letters.
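The choice of the $\varphi(x_{i+1})$-alignment can be illustrated by a short Python sketch. It rests on one reading of the displayed strings that we state as an assumption: the $b$-alignment of a pair is the merge that begins with the string whose first letter carries the value $b$. The function name `phi_alignment` and the token encoding are hypothetical.

```python
from typing import List


def phi_alignment(starts_with_0: List[str], starts_with_1: List[str], bit: int) -> List[str]:
    """Return the bit-alignment of a pair of strings that overlap in two letters."""
    if bit == 1:
        first, second = starts_with_1, starts_with_0
    else:
        first, second = starts_with_0, starts_with_1
    # A simple alignment shares exactly two letters between the two strings.
    assert first[-2:] == second[:2]
    return first + second[2:]


# Circle equation x_i + x_{i+1} = 0 (strings as displayed above).
circle_0 = ["x_i^m0", "x_{i+1}^m0", "x_i^l1", "x_{i+1}^r1"]
circle_1 = ["x_i^l1", "x_{i+1}^r1", "x_i^m0", "x_{i+1}^m0"]

# Matching equation x_i + x_j = 0 with i < j.
matching_0 = ["x_j^r0", "x_j^l0", "x_i^r1", "x_i^l1"]
matching_1 = ["x_i^r1", "x_i^l1", "x_j^r0", "x_j^l0"]

phi_x_i_plus_1 = 1  # example value of the assignment for x_{i+1}
circle = phi_alignment(circle_0, circle_1, phi_x_i_plus_1)
matching = phi_alignment(matching_0, matching_1, phi_x_i_plus_1)

# For the value 1, the matching fragment ends with the letter x_i^l1,
# which is also the first letter of the circle fragment.
shares_letter = matching[-1] == circle[0]
```

For $\varphi(x_{i+1}) = 1$ the two fragments share the letter $x_i^{l1}$, so they can be joined with an additional overlap of one letter.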
We are going to analyze the length of the resulting superstring depending on the assignment $\varphi$ to the variables $x_i$, $x_{i+1}$, $x_j$ and $x_{j+1}$. We start with the case $\varphi(x_{i+1}) = \varphi(x_{j+1}) = 1$.

Case $\varphi(x_{i+1}) = \varphi(x_{j+1}) = 1$:

We use the 1-alignment of the strings $x_i^{r1} x_i^{l1} x_j^{r0} x_j^{l0}$ and $x_j^{r0} x_j^{l0} x_i^{r1} x_i^{l1}$. The situation is depicted below. (The two triangle notations $\rhd\rhd$ and $\lhd\lhd$ will be explained hereafter.)
\[ b\;\rhd\rhd X_i \;\; x_j^{r0} x_j^{l0} x_i^{r1} x_i^{l1} \;\; x_i^{l1} \lhd\lhd\; m\; \rhd\rhd Y_j \;\; x_i^{r1} x_i^{l1} x_j^{r0} x_j^{l0} \;\; x_j^{m1} \lhd\lhd\; e \]
\[ \downarrow \]
\[ b\;\rhd\rhd X_i \;\; \underbrace{x_i^{r1} x_i^{l1} x_j^{r0} x_j^{l0} x_i^{r1} x_i^{l1}}_{\text{1-alignment}} \lhd\lhd\; m\; \rhd\rhd Y_j \;\; x_j^{m1} \lhd\lhd\; e \]
The actual superstring $s$ is denoted by the following sequence.
\[ s = b\;\rhd\rhd X_i \;\; x_j^{r0} x_j^{l0} x_i^{r1} x_i^{l1} \;\; x_i^{l1} \lhd\lhd\; m\; \rhd\rhd Y_j \;\; x_i^{r1} x_i^{l1} x_j^{r0} x_j^{l0} \;\; x_j^{m1} \lhd\lhd\; e \]
The part $\rhd\rhd X_i$ represents a simple alignment of the strings corresponding to $x_{i-1} \oplus x_i = 0$ ending with the letter $X_i \in \{x_i^{m0}, x_i^{r1}\}$, which means
\[ \rhd\rhd X_i \in \{\, x_{i-1}^{m0} x_i^{m0} x_{i-1}^{l1} x_i^{r1} x_{i-1}^{m0} \boxed{x_i^{m0}},\;\; x_{i-1}^{l1} x_i^{r1} x_{i-1}^{m0} x_i^{m0} x_{i-1}^{l1} \boxed{x_i^{r1}} \,\}. \]
The letter in the box emphasizes the letter which can be used to overlap from the right side with other strings. Furthermore, the string $x_i^{l1} \lhd\lhd$ denotes $x_i^{l1} x_{i+1}^{r1} x_i^{m0} x_{i+1}^{m0} x_i^{l1} x_{i+1}^{r1}$. Analogously, $\rhd\rhd Y_j$ is a simple alignment of the strings corresponding to $x_{j-1} \oplus x_j = 0$, where $Y_j \in \{x_j^{r0}, x_j^{m1}\}$. Furthermore, we use $x_j^{m1} \lhd\lhd$ to denote $x_j^{m1} x_{j+1}^{m1} x_j^{l0} x_{j+1}^{r0} x_j^{m1} x_{j+1}^{m1}$. Finally, $b$, $m$ and $e$ are sequences of letters, which we do not specify in detail. They define the remaining parts of the superstring $s$.

If $X_i = x_i^{r1}$ holds, we align $\rhd\rhd X_i$ with $x_i^{r1} x_i^{l1} x_j^{r0} x_j^{l0} x_i^{r1} x_i^{l1}$ to achieve an additional overlap of one letter. An analogous situation holds for $\rhd\rhd Y_j$ and $x_j^{m1} \lhd\lhd$. All in all, we obtain an overlap of three letters if $\varphi(x_i) = \varphi(x_{i+1}) = 1$ and $\varphi(x_{j+1}) = \varphi(x_j) = 1$ holds.
Otherwise, we lose an overlap of one letter per unsatisfied equation.

Case $\varphi(x_{i+1}) = \varphi(x_{j+1}) = 0$:

We use the 0-alignment of the strings $x_i^{r1} x_i^{l1} x_j^{r0} x_j^{l0}$ and $x_j^{r0} x_j^{l0} x_i^{r1} x_i^{l1}$.
\[ b\;\rhd\rhd X_i \;\; x_j^{r0} x_j^{l0} x_i^{r1} x_i^{l1} \;\; x_i^{m0} \lhd\lhd\; m\; \rhd\rhd Y_j \;\; x_i^{r1} x_i^{l1} x_j^{r0} x_j^{l0} \;\; x_j^{l0} \lhd\lhd\; e \]
\[ \downarrow \]
\[ b\;\rhd\rhd X_i \;\; x_i^{m0} \lhd\lhd\; m\; \rhd\rhd Y_j \;\; \underbrace{x_j^{r0} x_j^{l0} x_i^{r1} x_i^{l1} x_j^{r0} x_j^{l0}}_{\text{0-alignment}} \lhd\lhd\; e \]
In this case, we use $x_i^{m0} \lhd\lhd$ as an abbreviation for $x_i^{m0} x_{i+1}^{m0} x_i^{l1} x_{i+1}^{r1} x_i^{m0} x_{i+1}^{m0}$ and $x_j^{l0} \lhd\lhd$ for $x_j^{l0} x_{j+1}^{r0} x_j^{m1} x_{j+1}^{m1} x_j^{l0} x_{j+1}^{r0}$. If $X_i = x_i^{m0}$ holds, we align $\rhd\rhd X_i$ with $x_i^{m0} \lhd\lhd$ and gain an additional overlap of one letter. An analogous situation holds for $\rhd\rhd Y_j$ and $x_j^{l0} \lhd\lhd$. Hence, we obtain an overlap of three letters if $\varphi(x_{i+1}) = \varphi(x_i) = 0$ and $\varphi(x_{j+1}) = \varphi(x_j) = 0$ holds. If the corresponding equation with two variables is not satisfied, we lose an overlap of one letter.

Case $\varphi(x_{i+1}) \neq \varphi(x_{j+1}) = 1$:

In this case, we use the 0-alignment of the strings $x_i^{r1} x_i^{l1} x_j^{r0} x_j^{l0}$ and $x_j^{r0} x_j^{l0} x_i^{r1} x_i^{l1}$.
\[ b\;\rhd\rhd X_i \;\; x_j^{r0} x_j^{l0} x_i^{r1} x_i^{l1} \;\; x_i^{m0} \lhd\lhd\; m\; \rhd\rhd Y_j \;\; x_i^{r1} x_i^{l1} x_j^{r0} x_j^{l0} \;\; x_j^{m1} \lhd\lhd\; e \]
\[ \downarrow \]
\[ b\;\rhd\rhd X_i \;\; x_i^{m0} \lhd\lhd\; m\; \rhd\rhd Y_j \;\; x_j^{m1} \lhd\lhd\; e \qquad \underbrace{x_j^{r0} x_j^{l0} x_i^{r1} x_i^{l1} x_j^{r0} x_j^{l0}}_{\text{0-alignment}} \]
We attach $x_j^{r0} x_j^{l0} x_i^{r1} x_i^{l1} x_j^{r0} x_j^{l0}$ at the end of our actual solution $s_\varphi$ without having any overlap with the superstring obtained so far. Notice that we obtain in each case an additional overlap of one letter if the corresponding equation with two variables is satisfied, i.e. $X_i = x_i^{m0}$ and $Y_j = x_j^{m1}$.

Case $\varphi(x_{i+1}) \neq \varphi(x_{j+1}) = 0$:

According to $\varphi$, we use the 1-alignment of the strings $x_i^{r1} x_i^{l1} x_j^{r0} x_j^{l0}$ and $x_j^{r0} x_j^{l0} x_i^{r1} x_i^{l1}$.
\[ b\;\rhd\rhd X_i \;\; x_j^{r0} x_j^{l0} x_i^{r1} x_i^{l1} \;\; x_i^{l1} \lhd\lhd\; m\; \rhd\rhd Y_j \;\; x_i^{r1} x_i^{l1} x_j^{r0} x_j^{l0} \;\; x_j^{l0} \lhd\lhd\; e \]
\[ \downarrow \]
\[ b\;\rhd\rhd X_i \;\; \underbrace{x_i^{r1} x_i^{l1} x_j^{r0} x_j^{l0} x_i^{r1} x_i^{l1}}_{\text{1-alignment}} \lhd\lhd\; m\; \rhd\rhd Y_j \;\; x_j^{l0} \lhd\lhd\; e \]
We join $x_i^{r1} x_i^{l1} x_j^{r0} x_j^{l0} x_i^{r1} x_i^{l1}$ from the right side with $x_i^{l1} \lhd\lhd$ and obtain an overlap of one letter. This reduces the length of the superstring by one letter independent of the assignment $\varphi(x_j)$. In case of $X_i = x_i^{r1}$, we achieve another overlap of one letter, since we are able to align $\rhd\rhd X_i$ from the right side with $x_i^{r1} x_i^{l1} x_j^{r0} x_j^{l0} x_i^{r1} x_i^{l1}$. It corresponds to the satisfied equation $x_i \oplus x_{i+1} = 0$. Hence, we obtain at least the same number of overlapped letters as satisfied equations.

In summary, we note that we are able to achieve an overlap of at least one letter in each case if the corresponding equation is satisfied by $\varphi$. Hence, we obtain an overlap of at most three letters. The other cases concerning equations $x_i \oplus x_j = 0$ with $i > j$ can be analyzed analogously. Next, we are going to align strings corresponding to equations with three variables.

Aligning Strings Corresponding to Equations with Three Variables

Let $g_j^3 \in H$ be an equation with three variables $x$, $y$ and $z$. Furthermore, let $x_{i-1} \oplus x = 0$, $x \oplus x_{i+1} = 0$, $y_{j-1} \oplus y = 0$, $y \oplus y_{j+1} = 0$, $z_{k-1} \oplus z = 0$ and $z \oplus z_{k+1} = 0$ be the equations with two variables in which the variables $x$, $y$ and $z$ occur. Given the assignment $\varphi$ to $x$, $y$ and $z$, we are going to define the alignment of the corresponding strings. Let us start with equations of the form $g_j^3 \equiv x \oplus y \oplus z = 0$. We define the rule for aligning strings in $S^A(g_j^3)$ and $S^B(g_j^3)$ as follows, where we handle the cases $\varphi(x_{i+1}) + \varphi(y_{j+1}) + \varphi(z_{k+1}) \in \{3, 2, 1, 0\}$ separately, starting with $\varphi(x_{i+1}) + \varphi(y_{j+1}) + \varphi(z_{k+1}) = 3$.
Case $\varphi(x_{i+1}) + \varphi(y_{j+1}) + \varphi(z_{k+1}) = 3$:

In this case, we align the strings in $S(g_j^3)$ in such a way that we obtain the previously introduced strings $y^{r1} A_j y^{l1}$ and $z^{r1} B_j z^{l1}$. The situation which we want to analyze is depicted below.
\[ b\;\rhd\rhd X \;\; x^{l1} \lhd\lhd\; m_1\; \rhd\rhd Y \;\; y^{r1} A_j y^{l1} \;\; y^{l1} \lhd\lhd\; m_2\; \rhd\rhd Z \;\; z^{r1} B_j z^{l1} \;\; z^{l1} \lhd\lhd\; e \]
\[ \downarrow \]
\[ b\;\rhd\rhd X \;\; x^{l1} \lhd\lhd\; m_1\; \rhd\rhd Y \;\; \underbrace{y^{r1} A_j y^{l1}}_{y^1\text{-alignment}} \lhd\lhd\; m_2\; \rhd\rhd Z \;\; \underbrace{z^{r1} B_j z^{l1}}_{z^1\text{-alignment}} \lhd\lhd\; e \]
Similarly to the situations that we discussed concerning matching equations, we define the actual superstring $s$ in the way described below.
\[ s = b\;\rhd\rhd X \;\; x^{l1} \lhd\lhd\; m_1\; \rhd\rhd Y \;\; y^{r1} A_j y^{l1} \;\; y^{l1} \lhd\lhd\; m_2\; \rhd\rhd Z \;\; z^{r1} B_j z^{l1} \;\; z^{l1} \lhd\lhd\; e \]
Here, $b$, $m_1$, $m_2$ and $e$ denote parts of $s$, which we do not specify in detail in order to emphasize the parts corresponding to the equation with three variables. The string $x^{l1} \lhd\lhd$ denotes the $\varphi(x_{i+1})$-alignment of the corresponding strings in $S(g_{i+1}^x)$. The strings $z^{l1} \lhd\lhd$ and $y^{l1} \lhd\lhd$ are defined analogously. In this situation, we want to analyze the cases $X \in \{x^{r1}, x^{m0}\}$, $Y \in \{y^{r1}, y^{m0}\}$ and $Z \in \{z^{r1}, z^{m0}\}$. We infer that we obtain an overlap of four letters if all equations with two variables are satisfied. Otherwise, we lose an overlap of one letter per unsatisfied equation with two variables.

Case $\varphi(x_{i+1}) + \varphi(y_{j+1}) + \varphi(z_{k+1}) = 2$:

Let $\alpha, \gamma \in \{x_{i+1}, y_{j+1}, z_{k+1}\}$ be variables such that $\varphi(\gamma) = \varphi(\alpha) = 1$ holds. Then, we use the $\alpha^1$-alignment and the $\gamma^1$-alignment of the strings in $S^A(g_j^3)$ and $S^B(g_j^3)$, breaking ties arbitrarily. As an example, we display the situation for $\varphi(z_{k+1}) = \varphi(x_{i+1}) = 1$.
\[ b\;\rhd\rhd X \;\; x^{r1} A_j x^{l1} \;\; x^{l1} \lhd\lhd\; m_1\; \rhd\rhd Y \;\; y^{m0} \lhd\lhd\; m_2\; \rhd\rhd Z \;\; z^{r1} B_j z^{l1} \;\; z^{l1} \lhd\lhd\; e \]
\[ \downarrow \]
\[ b\;\rhd\rhd X \;\; \underbrace{x^{r1} A_j x^{l1}}_{x^1\text{-alignment}} \lhd\lhd\; m_1\; \rhd\rhd Y \;\; y^{m0} \lhd\lhd\; m_2\; \rhd\rhd Z \;\; \underbrace{z^{r1} B_j z^{l1}}_{z^1\text{-alignment}} \lhd\lhd\; e \]
In this case, we achieve an overlap of five letters if all equations with two variables are satisfied.
Otherwise, we lose an overlap of one letter per unsatisfied equation with two variables.

Case $\varphi(x_{i+1}) + \varphi(y_{j+1}) + \varphi(z_{k+1}) = 1$:

If $\varphi(z_{k+1}) + \varphi(x_{i+1}) = 1$ holds, we align the strings in $S^B(g_j^3)$ and $S^A(g_j^3)$ to obtain $x^{r1} A_j x^{l1}$ and $z^{r1} B_j z^{l1}$. Otherwise, we make use of the strings $x^{r1} B_j x^{l1}$ and $y^{r1} A_j y^{l1}$. We display the situation for $\varphi(y_{j+1}) = 1$.
\[ b\;\rhd\rhd X \;\; x^{r1} B_j x^{l1} \;\; x^{m0} \lhd\lhd\; m_1\; \rhd\rhd Y \;\; y^{r1} A_j y^{l1} \;\; y^{l1} \lhd\lhd\; m_2\; \rhd\rhd Z \;\; z^{m0} \lhd\lhd\; e \]
\[ \downarrow \]
\[ b\;\rhd\rhd X \;\; x^{m0} \lhd\lhd\; m_1\; \rhd\rhd Y \;\; \underbrace{y^{r1} A_j y^{l1}}_{y^1\text{-alignment}} \lhd\lhd\; m_2\; \rhd\rhd Z \;\; z^{m0} \lhd\lhd\; e \qquad x^{r1} B_j x^{l1} \]
Notice that we obtain an overlap of four letters if the equations with two variables are satisfied, i.e. $X = x^{m0}$, $Z = z^{m0}$ and $Y = y^{r1}$. Otherwise, we lose an overlap of one letter per unsatisfied equation with two variables.

Case $\varphi(x_{i+1}) + \varphi(y_{j+1}) + \varphi(z_{k+1}) = 0$:

In this case, we use the $x^0$-alignment of the strings in $S(g_j^3)$. The situation is displayed below.
\[ b\;\rhd\rhd X \;\; x^{m0} C_j x^{m0} \;\; x^{m0} \lhd\lhd\; m_1\; \rhd\rhd Y \;\; y^{m0} \lhd\lhd\; m_2\; \rhd\rhd Z \;\; z^{m0} \lhd\lhd\; e \]
\[ \downarrow \]
\[ b\;\rhd\rhd X \;\; \underbrace{x^{m0} C_j x^{m0}}_{x^0\text{-alignment}} \lhd\lhd\; m_1\; \rhd\rhd Y \;\; y^{m0} \lhd\lhd\; m_2\; \rhd\rhd Z \;\; z^{m0} \lhd\lhd\; e \]
Here, we are able to achieve an overlap of five letters if all equations with two variables are satisfied, i.e. $X = x^{m0}$, $Z = z^{m0}$ and $Y = y^{m0}$.

In summary, we state that we can achieve an overlap of at least one letter independent of the assignment $\varphi$. Additionally, we gain another overlap of one letter if the corresponding equation is satisfied by $\varphi$. The situation for equations of the form $x \oplus y \oplus z = 1$ can be analyzed analogously. We are going to define the assignment $\psi_s$, which is associated to a given superstring for $S_H$.

5.4 Defining the Assignment

Given a superstring $s$ for $S_H$, we are going to define the associated assignment $\psi_s$ to the variables of $H$. In order to deduce the values assigned to the variables in $H$ from $s$, we have to normalize the given superstring $s$.
For this reason, we define rules that transform a superstring for $S_H$ into a normed superstring for $S_H$ without increasing the length. First, we introduce the definition of a normed superstring for $S_H$.

Definition 5 (Normed Superstring $s$ for $S_H$). Let $H$ be an instance of the Hybrid problem, $S_H$ the corresponding instance of the Shortest Superstring problem and $s$ a superstring for $S_H$. We refer to $s$ as a normed superstring for $S_H$ if for every $g \in H$, the superstring $s$ contains $s_g$ as a proper substring, where $s_g$ results from a simple alignment of the strings included in $S(g)$.

Having defined a normed superstring, we now state rules which transform a superstring for $S_H$ into a normed superstring for $S_H$ without increasing the length of the underlying superstring. All transformations can be performed in polynomial time. Once we have generated a normed superstring, we are able to define the assignment $\psi_s$ and analyze the number of overlapped letters depending on the number of equations in $H$ satisfied by $\psi_s$. Let us start with transformations of strings corresponding to circle equations and circle border equations.

Normalizing Strings Corresponding to Circle and Circle Border Equations

Let $x_i \oplus x_{i+1} = 0$ be a circle equation in $H$. Furthermore, let $x_i^{m0} x_{i+1}^{r0} x_i^{l1} x_{i+1}^{m1}$ and $x_i^{l1} x_{i+1}^{m1} x_i^{m0} x_{i+1}^{r0}$ be its corresponding strings. We observe that these strings can have an overlap of at most one letter from the left side as well as from the right side with other strings in $S_H$. Given a superstring $s$ for $S_H$, we obtain at least the same number of overlapped letters if we use one of the simple alignments in $s$. In particular, we have to use the simple alignment that maximizes the number of overlapped letters.

Given a superstring $s$ for $S_H$, we separate the strings $x_i^{m0} x_{i+1}^{r0} x_i^{l1} x_{i+1}^{m1}$ and $x_i^{l1} x_{i+1}^{m1} x_i^{m0} x_{i+1}^{r0}$ from $s$.
Consequently, this results in at most three strings $b x_i^{m0}$, $x_{i+1}^{m1} m x_i^{l1}$ and $x_{i+1}^{r0} e$ such that
\[ s = b\, x_i^{m0} x_{i+1}^{r0} x_i^{l1} x_{i+1}^{m1}\, m\, x_i^{l1} x_{i+1}^{m1} x_i^{m0} x_{i+1}^{r0}\, e. \]
Then, we define the transformed superstring $s'$ with at least the same number of overlapped letters by
\[ s' = b\, x_i^{m0} x_{i+1}^{r0} x_i^{l1} x_{i+1}^{m1} x_i^{m0} x_{i+1}^{r0}\, e\;\; x_{i+1}^{m1}\, m\, x_i^{l1}. \]
In order to define the simple alignment which is used in $s$ by the strings in $S(g_{i+1}^x)$, we are going to state a criterion.

Let $s$ be a superstring for $S_H$ and $g_{i+1}^x$ a circle equation. Let the corresponding strings be given by $x_i^{m0} x_{i+1}^{r0} x_i^{l1} x_{i+1}^{m1}$ and $x_i^{l1} x_{i+1}^{m1} x_i^{m0} x_{i+1}^{r0}$. We say that the strings in $S(g_{i+1}^x)$ use a 1-alignment in $s$ if the number of strings $s^1$ in $S_H \setminus S(g_{i+1}^x)$ of the first kind exceeds the number of strings $s^0$ in $S_H \setminus S(g_{i+1}^x)$ of the second kind, where

- a string $s^1$ is either overlapped by one letter from the right side with $x_i^{l1} x_{i+1}^{m1} x_i^{m0} x_{i+1}^{r0}$ or overlapped by one letter from the left side with $x_i^{m0} x_{i+1}^{r0} x_i^{l1} x_{i+1}^{m1}$ in $s$, and
- a string $s^0$ is either overlapped by one letter from the left side with $x_i^{l1} x_{i+1}^{m1} x_i^{m0} x_{i+1}^{r0}$ or overlapped by one letter from the right side with $x_i^{m0} x_{i+1}^{r0} x_i^{l1} x_{i+1}^{m1}$ in $s$.

Otherwise, the strings in $S(g_{i+1}^x)$ use a 0-alignment in $s$.

Given a superstring $s$ for $S_H$, we informally define a part of the backbone of our transformed superstring by the strings $s_g$, where $s_g$ results from the simple alignment used in $s$ by the strings in $S(g)$ for every circle equation $g \in H$. Afterwards, we use this construction to align them with strings corresponding to matching equations, equations with three variables and circle border equations. Moreover, it will help us to define the assignment $\psi_s$ and relate the number of satisfied equations to the number of overlapped letters. But first, we concentrate on circle border equations. Let $x_1 \oplus x_n = 0$ be a circle border equation.
Furthermore, let the corresponding strings be given by
\[ L_x C_x^l, \quad C_x^l x_1^{m0} x_n^{l1} C_x^r, \quad x_n^{l1} C_x^r C_x^l x_1^{m0}, \quad C_x^l x_1^{r1} x_n^{m0} C_x^r, \quad x_n^{m0} C_x^r C_x^l x_1^{r1} \quad\text{and}\quad C_x^r R_x. \]
Since the simple alignments of the strings in $S(g_1^x)$ achieve an overlap of two letters for each of the pairs $\{C_x^l x_1^{m0} x_n^{l1} C_x^r,\; x_n^{l1} C_x^r C_x^l x_1^{m0}\}$ and $\{C_x^l x_1^{r1} x_n^{m0} C_x^r,\; x_n^{m0} C_x^r C_x^l x_1^{r1}\}$, we argue as before that these strings can be rearranged in a given superstring for $S_H$ such that the pairs use a simple alignment without increasing the length of the underlying superstring for $S_H$. In this situation, we are able to overlap one of the pairs using a simple alignment with $L_x C_x^l$ from the left side and the other one with $C_x^r R_x$ from the right side without increasing the length. This construction checks whether the variables $x_1$ and $x_n$ have the same assigned value, which is rewarded by another overlap of one letter of the corresponding strings using a simple alignment.

For any fixed order of the circles $C^x$ in $H$, we build the backbone of our superstring as the concatenation of the strings $s_x s_y \cdots s_z$, where the string $s_x$ is associated to its circle $C^x$. Furthermore, $s_x$ consists of the corresponding simple alignments of the strings in $S(g_i^x)$ used in $s$, and the order of the strings is given by the order of the variables in $C^x$. The string $s_x$ starts with the letter $L_x$ and ends with $R_x$.

Notice that similar transformations can be applied to strings corresponding to matching equations and to equations with three variables, but we are going to define the transformation for those strings in detail while analyzing the upper bound of overlapped letters for simple aligned strings corresponding to circle equations, which are contained in a given superstring $s$ for $S_H$. Before we start our analysis, we define the assignment $\psi_s$ based on the actual superstring $s$ for $S_H$, which is not necessarily a normed superstring for $S_H$.
While we apply the transformations defined below, the assignment $\psi_s$ changes along with the superstring currently considered.
\[ \psi_s(x_i) = \begin{cases} 1 & \text{if the strings in } S(g_i^x) \text{ use a 1-alignment in } s \\ 0 & \text{otherwise} \end{cases} \]
Due to the transformations for the strings corresponding to circle and circle border equations, the assignment $\psi_s$ is well-defined.

Defining the Assignment for Checker Variables

Let $x \in V(E_3)$, let $C^x$ be the corresponding circle in $H$ and $M^x$ its associated perfect matching. Furthermore, let $x_i \oplus x_{i+1} = 0$, $x_{i-1} \oplus x_i = 0$, $x_{j-1} \oplus x_j = 0$, $x_j \oplus x_{j+1} = 0$ and $x_i \oplus x_j = 0$ be equations in $H$, where $\{i,j\} \in M^x$ and $i < j$ holds. Let $s$ be a superstring for $S_H$ such that the strings corresponding to circle and circle border equations are using a simple alignment in $s$. Based on the simple alignments of the strings corresponding to $g_i^x$, $g_{i+1}^x$, $g_j^x$ and $g_{j+1}^x$ which are used in the superstring $s$, we are going to define the assignment to the variables $x_i$ and $x_j$. Furthermore, we analyze the number of overlapped letters that can be achieved given the simple aligned strings and relate them to the number of equations in $H$ satisfied by $\psi_s$. In the remainder, we will assume that the underlying superstring for $S_H$ contains simple aligned strings corresponding to circle and circle border equations.

Before we start our analysis, we introduce the notion of a constellation, which denotes which of the simple alignments are used by the strings corresponding to the equations $g_i^x$, $g_{i+1}^x$, $g_j^x$ and $g_{j+1}^x$ in $s$. Given a superstring $s$ for $S_H$ and $\{i,j\} \in M^x$, a constellation $c$ is defined by $(X_i X_{i+1}, X_j X_{j+1})^s_{\{i,j\}}$ with $X_i, X_{i+1}, X_j, X_{j+1} \in \{0,1\}$, where $X_k = 1$ if and only if the strings in $S(g_k^x)$ use the 1-alignment in $s$ for $k \in \{i, i+1, j, j+1\}$. We call a constellation $c$ inconsistent if there is an entry $A_1 A_2$ with $A_1 \neq A_2$. Otherwise, $c$ is called consistent. Based on the given constellations, we are going to define $\psi_s$.
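The notion of a constellation and its consistency can be written down directly. The following Python sketch is only an illustration; the class name `Constellation` and its field names are hypothetical and merely mirror the four bits $X_i, X_{i+1}, X_j, X_{j+1}$ introduced above.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Constellation:
    """The four alignment bits (X_i X_{i+1}, X_j X_{j+1}) of a matching edge {i, j}."""
    x_i: int
    x_i1: int
    x_j: int
    x_j1: int

    def is_consistent(self) -> bool:
        # A constellation is inconsistent if one entry A1 A2 has A1 != A2.
        return self.x_i == self.x_i1 and self.x_j == self.x_j1


# Enumerate all 16 constellations and keep the consistent ones.
all_constellations = [Constellation(*bits) for bits in product((0, 1), repeat=4)]
consistent = [c for c in all_constellations if c.is_consistent()]
```

Only four of the sixteen constellations are consistent, namely $(00,00)$, $(00,11)$, $(11,00)$ and $(11,11)$.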
Definition 6 (Assignment $\psi_s$ to Checker Variables). Let $H$ be an instance of the Hybrid problem, $S_H$ its corresponding instance of the superstring problem and $s$ a superstring for $S_H$. Given the constellation $c = (X_i X_{i+1}, X_j X_{j+1})^s_{\{i,j\}}$, we define $\psi_s$ in the following way.

(i) $\psi_s(x_i) = X_i$ and $\psi_s(x_j) = X_j$ if $X_i \oplus X_j = 1$ and $c$ is consistent
(ii) $\psi_s(x_i) = X_i$ and $\psi_s(x_j) = X_j$ if $X_i \oplus X_j = 0$
(iii) $\psi_s(x_i) = 1 - X_i$ and $\psi_s(x_j) = X_j$ if $X_i \oplus X_j = 1$ and $X_i \neq X_{i+1}$
(iv) $\psi_s(x_i) = X_i$ and $\psi_s(x_j) = 1 - X_j$ if $X_i \oplus X_j = 1$, $X_j \neq X_{j+1}$ and $X_i = X_{i+1}$

We are going to analyze the different constellations and discuss the cases (i)-(iv) of the definition of $\psi_s$. We start with case (i).

CASE (i) $X_i \oplus X_j = 1$ and $c$ is consistent:

There are two constellations which we have to analyze, namely $(11, 00)^s_{\{i,j\}}$ and $(00, 11)^s_{\{i,j\}}$. Starting with the former constellation, we obtain the scenario depicted below. The string $\rhd\rhd X_i$ with $X_i \in \{x_i^{m0}, x_i^{r1}\}$ represents a simple alignment of the strings in $S(g_i^x)$. Analogously, the string $X_{i+1} \lhd\lhd$ with $X_{i+1} \in \{x_i^{m0}, x_i^{l1}\}$ represents a simple alignment of the strings in $S(g_{i+1}^x)$. Since we know that using the most profitable simple alignment of the strings in $S(g^x_{\{i,j\}})$ does not increase the length of the superstring, we make use of the 1-alignment and transform the superstring $s$ into the superstring $s'$, which are both depicted below.
\[ s = b\;\rhd\rhd x_i^{r1} \;\; x_i^{r1} x_i^{l1} x_j^{r0} x_j^{l0} \;\; x_i^{l1} \lhd\lhd\; m\; \rhd\rhd x_j^{r0} \;\; x_j^{r0} x_j^{l0} x_i^{r1} x_i^{l1} \;\; x_j^{l0} \lhd\lhd\; e \]
\[ \downarrow \]
\[ s' = b\;\rhd\rhd \underbrace{x_i^{r1} x_i^{l1} x_j^{r0} x_j^{l0} x_i^{r1} x_i^{l1}}_{\text{1-alignment}} \lhd\lhd\; m\; \rhd\rhd x_j^{r0} \;\; x_j^{l0} \lhd\lhd\; e \]
Let us derive an upper bound on the number of overlapped letters. More precisely, we are interested in the number of overlapped letters in addition to the overlap of two letters due to the simple alignment.
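The case distinction (i)-(iv) of Definition 6 can be summarized in a small function. The following Python sketch is only an illustration of the definition; the function name `psi_checker` and its argument names are hypothetical.

```python
from itertools import product
from typing import Tuple


def psi_checker(X_i: int, X_i1: int, X_j: int, X_j1: int) -> Tuple[int, int]:
    """Return (psi_s(x_i), psi_s(x_j)) for the constellation (X_i X_i1, X_j X_j1)."""
    consistent = X_i == X_i1 and X_j == X_j1
    if X_i ^ X_j == 0:
        return X_i, X_j          # case (ii): matching equation already satisfied
    if consistent:
        return X_i, X_j          # case (i): keep both values
    if X_i != X_i1:
        return 1 - X_i, X_j      # case (iii): flip the value of x_i
    return X_i, 1 - X_j          # case (iv): X_j != X_j1 and X_i == X_i1, flip x_j


# Inconsistent constellations with X_i != X_j always end up with equal values.
flipped = [
    psi_checker(*bits)
    for bits in product((0, 1), repeat=4)
    if bits[0] != bits[2] and not (bits[0] == bits[1] and bits[2] == bits[3])
]
```

For example, `psi_checker(1, 0, 0, 0)` falls into case (iii) and yields the values $(0, 0)$, so the matching equation $x_i \oplus x_j = 0$ is satisfied after the switch.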
In both cases, either by using the 1-alignment or the 0-alignment of the strings in $S(g^x_{\{i,j\}})$, we cannot obtain more than an overlap of two letters. It corresponds to the number of satisfied equations, which are $x_i \oplus x_{i+1} = 0$ and $x_j \oplus x_{j+1} = 0$.

In case of the constellation $(00, 11)^s_{\{i,j\}}$, we separate the strings $x_i^{r1} x_i^{l1} x_j^{r0} x_j^{l0}$ and $x_j^{r0} x_j^{l0} x_i^{r1} x_i^{l1}$ from the superstring $s$. Then, we attach the aligned string $x_i^{r1} x_i^{l1} x_j^{r0} x_j^{l0} x_i^{r1} x_i^{l1}$ at the end of the actual solution. The considered situation is depicted below.
\[ b\;\rhd\rhd x_i^{m0} \;\; x_i^{r1} x_i^{l1} x_j^{r0} x_j^{l0} \;\; x_i^{m0} \lhd\lhd\; m\; \rhd\rhd x_j^{m1} \;\; x_j^{r0} x_j^{l0} x_i^{r1} x_i^{l1} \;\; x_j^{m1} \lhd\lhd\; e \]
\[ \downarrow \]
\[ b\;\rhd\rhd x_i^{m0} \lhd\lhd\; m\; \rhd\rhd x_j^{m1} \lhd\lhd\; e \qquad \underbrace{x_i^{r1} x_i^{l1} x_j^{r0} x_j^{l0} x_i^{r1} x_i^{l1}}_{\text{1-alignment}} \]
In this scenario, the best that we are able to obtain is an overlap of two letters. This corresponds to the number of satisfied equations, namely $x_i \oplus x_{i+1} = 0$ and $x_j \oplus x_{j+1} = 0$.

CASE (ii) $X_i \oplus X_j = 0$:

Let us start with the constellation $(0X_{i+1}, 0X_{j+1})^s_{\{i,j\}}$. In this case, we set $\psi_s(x_i) = 0$ and $\psi_s(x_j) = 0$. Given the strings $\rhd\rhd x_i^{m0}$, $X_{i+1} \lhd\lhd$, $\rhd\rhd x_j^{r0}$ and $X_{j+1} \lhd\lhd$ with $X_{i+1} \in \{x_i^{m0}, x_i^{l1}\}$ and $X_{j+1} \in \{x_j^{m1}, x_j^{l0}\}$, we obtain the following scenario:
\[ b\;\rhd\rhd x_i^{m0} \;\; x_i^{r1} x_i^{l1} x_j^{r0} x_j^{l0} \;\; X_{i+1} \lhd\lhd\; m\; \rhd\rhd x_j^{r0} x_j^{l0} x_i^{r1} x_i^{l1} \;\; X_{j+1} \lhd\lhd\; e \]
\[ \downarrow \]
\[ b\;\rhd\rhd x_i^{m0} \;\; X_{i+1} \lhd\lhd\; m\; \rhd\rhd \underbrace{x_j^{r0} x_j^{l0} x_i^{r1} x_i^{l1} x_j^{r0} x_j^{l0}}_{\text{0-alignment}} \;\; X_{j+1} \lhd\lhd\; e \]
The most advantageous simple alignment in this case is the 0-alignment of the strings in $S(g^x_{\{i,j\}})$.
If $\psi_s(x_i) = \psi_s(x_{i+1}) = 0$ holds, which means $X_{i+1} = x_i^{m0}$, we obtain another overlap of one letter by aligning $\rhd\rhd x_i^{m0}$ with $x_i^{m0} \lhd\lhd$. A similar argument holds for $\psi_s(x_j) = \psi_s(x_{j+1}) = 0$. Notice that the equation $x_i \oplus x_j = 0$ is satisfied by $\psi_s$. In summary, we state that we obtain an overlap of one additional letter per satisfied equation. Hence, we obtain an overlap of three letters according to the satisfied equations $x_i \oplus x_{i+1} = 0$, $x_i \oplus x_j = 0$ and $x_j \oplus x_{j+1} = 0$.

Consider the constellation $(1X_{i+1}, 1X_{j+1})^s_{\{i,j\}}$. Hence, we are given the strings $\rhd\rhd x_i^{r1}$, $X_{i+1} \lhd\lhd$, $\rhd\rhd x_j^{m1}$ and $X_{j+1} \lhd\lhd$ with $X_{i+1} \in \{x_i^{m0}, x_i^{l1}\}$ and $X_{j+1} \in \{x_j^{m1}, x_j^{l0}\}$. We obtain the scenario displayed below.
\[ b\;\rhd\rhd x_i^{r1} \;\; x_i^{r1} x_i^{l1} x_j^{r0} x_j^{l0} \;\; X_{i+1} \lhd\lhd\; m\; \rhd\rhd x_j^{m1} \;\; x_j^{r0} x_j^{l0} x_i^{r1} x_i^{l1} \;\; X_{j+1} \lhd\lhd\; e \]
\[ \downarrow \]
\[ b\;\rhd\rhd \underbrace{x_i^{r1} x_i^{l1} x_j^{r0} x_j^{l0} x_i^{r1} x_i^{l1}}_{\text{1-alignment}} \;\; X_{i+1} \lhd\lhd\; m\; \rhd\rhd x_j^{m1} \;\; X_{j+1} \lhd\lhd\; e \]
In this case, we use the 1-alignment of the strings in $S(g^x_{\{i,j\}})$. If $\psi_s(x_i) = \psi_s(x_{i+1}) = 1$ holds, which means $X_{i+1} = x_i^{l1}$, we obtain another overlap of one letter by aligning
\[ \rhd\rhd\, x_i^{r1} x_i^{l1} x_j^{r0} x_j^{l0} x_i^{r1} x_i^{l1} \quad\text{with}\quad x_i^{l1} \lhd\lhd. \]
In case of $\psi_s(x_j) = \psi_s(x_{j+1}) = 1$, we may apply a similar argument. Notice that the equation $x_i \oplus x_j = 0$ is satisfied by $\psi_s$. In summary, we state that we obtain an overlap of one additional letter per satisfied equation. Hence, we obtain an overlap of three letters according to the satisfied equations $x_i \oplus x_{i+1} = 0$, $x_i \oplus x_j = 0$ and $x_j \oplus x_{j+1} = 0$.

CASE (iii) $X_i \oplus X_j = 1$ and $X_i \neq X_{i+1}$:

Let us begin with the constellation $(10, 0X_{j+1})^s_{\{i,j\}}$.
We consider the scenario depicted below, in which we are given the strings $\triangleright\triangleright x_i^{r1}$, $x_i^{m0} \triangleleft\triangleleft$, $\triangleright\triangleright x_j^{r0}$ and $X_{j+1} \triangleleft\triangleleft$ with $X_{j+1} \in \{x_j^{l0}, x_j^{m1}\}$.

[Diagram: the superstring $s$ before and after the transformation for the constellation $(10, 0X_{j+1})^s_{\{i,j\}}$.]

Instead of using the 1-alignment of the strings in $S(g_i^x)$, we switch to the 0-alignment, i.e. we obtain the string $\triangleright\triangleright x_i^{m0}$ and define $\psi_s(x_i) = 0$. This directly gains two additional satisfied equations and an overlap of one additional letter. On the other hand, we might lose an overlap of one letter, because the string that is switched might have been aligned from the right side with another string. Furthermore, the equation $x_{i-1} \oplus x_i = 0$ might become unsatisfied. All in all, we obtain at least $2 - 1$ additional satisfied equations by switching the value without increasing the length of the superstring. Notice that we may achieve an additional overlap of one letter if $X_{j+1} = x_j^{l0}$ holds, which means that $\psi_s$ satisfies the equation $x_j \oplus x_{j+1} = 0$.

The next constellation that we analyze is $(01, 1X_{j+1})^s_{\{i,j\}}$. Hence, we are given the strings $\triangleright\triangleright x_i^{m0}$, $x_i^{l1} \triangleleft\triangleleft$, $\triangleright\triangleright x_j^{m1}$ and $X_{j+1} \triangleleft\triangleleft$ with $X_{j+1} \in \{x_j^{l0}, x_j^{m1}\}$. The situation is displayed below.

[Diagram: the superstring $s$ before and after the transformation for the constellation $(01, 1X_{j+1})^s_{\{i,j\}}$.]

We obtain a similar situation, in which we switch $\triangleright\triangleright x_i^{m0}$ to $\triangleright\triangleright x_i^{r1}$. Accordingly, we define $\psi_s(x_i) = 1$.
We obtain at least one additional satisfied equation by switching the value without increasing the length of the superstring. Notice that we may achieve an additional overlap of one letter if $X_{j+1} = x_j^{m1}$ holds. It corresponds to the satisfied equation $x_j \oplus x_{j+1} = 0$.

CASE (iv) $X_i \oplus X_j = 1$, $X_j \neq X_{j+1}$ and $X_i = X_{i+1}$: Starting our analysis with the constellation $(00, 10)^s_{\{i,j\}}$, we obtain the following scenario.

[Diagram: the superstring $s$ before and after the transformation for the constellation $(00, 10)^s_{\{i,j\}}$.]

In this case, we switch the string $\triangleright\triangleright x_j^{m1}$ to $\triangleright\triangleright x_j^{r0}$. This means that we set $\psi_s(x_j) = 0$. This transformation yields an overlap of at least the same number of letters: although we might lose an overlap of one letter from the left side, we align the string $\triangleright\triangleright x_j^{r0}$ with $x_j^{r0} x_j^{l0} x_i^{r1} x_i^{l1} x_j^{r0} x_j^{l0} \triangleleft\triangleleft$ from the right side by one letter. Notice that we gain at least one additional satisfied equation.

The last constellation that we analyze is $(11, 01)^s_{\{i,j\}}$. The corresponding situation is depicted below.

[Diagram: the superstring $s$ before and after the transformation for the constellation $(11, 01)^s_{\{i,j\}}$.]

In this case, we switch the string $\triangleright\triangleright x_j^{r0}$ to $\triangleright\triangleright x_j^{m1}$.
Similarly to the former case, this transformation does not increase the length of the superstring. By defining $\psi_s(x_j) = 1$, we achieve at least one more satisfied equation. In summary, we achieve at least the same number of satisfied equations as the number of overlapped letters. By applying the defined transformations, the actual superstring contains only strings corresponding to matching equations using a simple alignment. Matching equations $x_i \oplus x_j = 0$ with $i > j$ can be analyzed analogously. Next, we define the assignment for contact variables.

Defining the Assignment for Contact Variables

Let $g^3_j \equiv x \oplus y \oplus z = 0$ be an equation with exactly three variables in $H$. Given a simple alignment of the strings corresponding to the equations $x_{j_1-1} \oplus x = 0$, $x \oplus x_{j_1+1} = 0$, $y_{j_2-1} \oplus y = 0$, $y \oplus y_{j_2+1} = 0$, $z_{j_3-1} \oplus z = 0$ and $z \oplus z_{j_3+1} = 0$, we define an assignment based on the underlying simple alignments and analyze the number of satisfied equations depending on the number of overlapped letters in the superstring.

For a given superstring $s$ for $S_H$ and equation $g^3_j \equiv x \oplus y \oplus z = 0$, we define a constellation $c$ given by $(X_1X_2, Y_1Y_2, Z_1Z_2)^s_j$ with $X_1, X_2, Y_1, Y_2, Z_1, Z_2 \in \{0, 1\}$, where $C = 1$ with $C \in \{X_1, X_2, Y_1, Y_2, Z_1, Z_2\}$ if and only if the strings in the corresponding set are using a 1-alignment in $s$. A constellation denotes which of the simple alignments is used by the strings in $s$. We call a constellation inconsistent if there is an entry $A_1A_2$ such that $A_1 \neq A_2$. Otherwise, $c$ is called consistent. Based on a constellation for a given superstring and an equation $g^3_j$ with three variables, we define the assignment $\psi_s$ for the variables in $g^3_j$.

Definition 7 (Assignment $\psi_s$ to Contact Variables). Let $H$ be an instance of the Hybrid problem, $S_H$ its corresponding instance of the superstring problem, $s$ a superstring for $S_H$ and $g^3_j \equiv x \oplus y \oplus z = 0$ an equation with three variables in $H$.
For the associated constellation $c = (X_1X_2, Y_1Y_2, Z_1Z_2)^s_j$, we define $\psi_s$ in the following way.

(i) If $c$ is consistent, then we define $\psi_s(x) = X_1$, $\psi_s(y) = Y_1$ and $\psi_s(z) = Z_1$.
(ii) Otherwise, let $A_1A_2$ be an entry in $c$ with $A_1 \neq A_2$ and $\alpha$ its corresponding variable. Furthermore, let $\beta$ and $\gamma$ be the variables associated with the entries $B_1B_2$ and $C_1C_2$, respectively. If $A_1 \oplus B_1 \oplus C_1 = 0$ holds, we define $\psi_s(\alpha) = A_1$, $\psi_s(\beta) = B_1$ and $\psi_s(\gamma) = C_1$.
(iii) Otherwise, we have $A_1 \oplus B_1 \oplus C_1 = 1$. Then, we define $\psi_s(\alpha) = 1 - A_1$, $\psi_s(\beta) = B_1$ and $\psi_s(\gamma) = C_1$.

We are going to analyze the following three cases and define the transformations for the actual superstring for $S_H$.

(i) $X_1 \oplus Y_1 \oplus Z_1 = 1$ and $c$ is consistent
(ii) $X_1 \oplus Y_1 \oplus Z_1 = 0$ and $c$ is inconsistent
(iii) $X_1 \oplus Y_1 \oplus Z_1 = 1$ and $c$ is inconsistent

Let us begin with case (i).

CASE (i) $X_1 \oplus Y_1 \oplus Z_1 = 1$ and $c$ is consistent: In this case, we start with the constellation $(11, 11, 11)^s_j$. We depict the considered situation below.

[Diagram: the superstring $s$ before and after the transformation for the constellation $(11, 11, 11)^s_j$, with the strings $y^{r1} A_j y^{l1}$ and $z^{r1} B_j z^{l1}$.]

According to the definition of $\psi_s$, we have $\psi_s(x) = \psi_s(y) = \psi_s(z) = 1$. Notice that the equation $x \oplus y \oplus z = 0$ is unsatisfied. On the other hand, the assignment $\psi_s$ satisfies the equations $x \oplus x_{j_1+1} = 0$, $y \oplus y_{j_2+1} = 0$ and $z \oplus z_{j_3+1} = 0$. We note that a string corresponding to $S^A(g^3_j)$ or $S^B(g^3_j)$ using a simple alignment can have an overlap of at most one letter from the right side as well as from the left side. Therefore, the best we can hope for is to overlap the string $y^{r1} A_j y^{l1}$ with $\triangleright\triangleright y^{r1}$ and $y^{l1} \triangleleft\triangleleft$ by one letter in each case. The same holds for the string $z^{r1} B_j z^{l1}$. Consequently, we conclude that the number of overlapped letters is bounded from above by four.
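Definition 7 can be sketched in Python as follows. The dictionary encoding of a constellation and the function name `contact_assignment` are hypothetical and only serve this illustration. Moreover, when several entries are inconsistent, the definition allows any of them to be chosen; picking the first one in dictionary order is an assumption of this sketch.

```python
from typing import Dict, Tuple

Entry = Tuple[int, int]  # (A1, A2); a bit is 1 iff the corresponding strings use a 1-alignment


def contact_assignment(c: Dict[str, Entry]) -> Dict[str, int]:
    """Sketch of psi_s for the three variables of an equation x XOR y XOR z = 0."""
    # Start from the first bit of every entry (this is the whole of rule (i)).
    psi = {var: bits[0] for var, bits in c.items()}
    inconsistent = [var for var, (a1, a2) in c.items() if a1 != a2]
    if not inconsistent:
        return psi  # rule (i): c is consistent

    parity = 0
    for first_bit, _ in c.values():
        parity ^= first_bit
    if parity == 1:
        # rule (iii): flip the variable of an inconsistent entry
        alpha = inconsistent[0]  # assumption: take the first inconsistent entry
        psi[alpha] = 1 - psi[alpha]
    return psi  # rule (ii) applies unchanged when parity == 0


print(contact_assignment({"x": (1, 0), "y": (1, 1), "z": (1, 1)}))
```

For an inconsistent constellation the returned assignment always has even parity, so it satisfies $x \oplus y \oplus z = 0$; for a consistent constellation with odd parity, such as $(11, 11, 11)^s_j$, the equation stays unsatisfied.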
In case of $X_1 + Y_1 + Z_1 = 1$, we analyze, as an example, the constellation $(00, 00, 11)^s_j$. We set $\psi_s(z) = 1$, $\psi_s(x) = 0$ and $\psi_s(y) = 0$. This situation is displayed below.

[Diagram: the superstring $s$ before and after the transformation for the constellation $(00, 00, 11)^s_j$.]

Due to the $z^1$-alignment of the strings in $S^B(g^3_j)$, we obtain an overlap of two letters. Additionally, we align the string $\triangleright\triangleright x^{m0}$ from the left with $x^{m0} \triangleleft\triangleleft$. The same holds for $\triangleright\triangleright y^{m0}$ and $y^{m0} \triangleleft\triangleleft$. Notice that it is not more advantageous to align the string $x^{m0} B_j C_j$ with $\triangleright\triangleright x^{m0}$, since we lose the overlap of one letter with $x^{m0} \triangleleft\triangleleft$. Hence, we are able to get an overlap of at most four letters, which corresponds to the satisfied equations $x \oplus x_{j_1+1} = 0$, $y \oplus y_{j_2+1} = 0$ and $z \oplus z_{j_3+1} = 0$.

CASE (ii) $X_1 \oplus Y_1 \oplus Z_1 = 0$ and $c$ is inconsistent: First, we concentrate on the constellations with the property $X_1 + Y_1 + Z_1 = 2$. As an example, we analyze the constellation $(0X_2, 1Y_2, 1Z_2)^s_j$ depicted below.

[Diagram: the superstring $s$ before and after the transformation for the constellation $(0X_2, 1Y_2, 1Z_2)^s_j$.]

The strings $\triangleright\triangleright y^{r1}$ and $\triangleright\triangleright z^{r1}$ can be aligned from the right side with $y^{r1} A_j y^{l1}$ and $z^{r1} B_j z^{l1}$, respectively. This yields an overlap of two letters. If the corresponding equations with two variables are satisfied, which means $X_2 = x^{m0}$, $Y_2 = y^{l1}$ and $Z_2 = z^{l1}$, we gain an overlap of one letter per satisfied equation. Notice that using the $x^0$-alignment of $S(g^3_j)$ does not yield more overlapped letters. In summary, it is possible to attain an overlap of at most five letters, which corresponds to the constellation $(00, 11, 11)^s_j$.
An analogous argument holds for the constellations $(1X_2, 1Y_2, 0Z_2)^s_j$ and $(1X_2, 0Y_2, 1Z_2)^s_j$.

Next, we discuss constellations with the property $X_1 + Y_1 + Z_1 = 0$. For this reason, we consider the constellation $(0X_2, 0Y_2, 0Z_2)^s_j$.

[Diagram: the superstring $s$ before and after the transformation for the constellation $(0X_2, 0Y_2, 0Z_2)^s_j$.]

Recall that $x^{m0} C_j x^{m0}$ denotes the $x^0$-alignment of $S(g^3_j)$. This string can be aligned from the left with $\triangleright\triangleright x^{m0}$. If $X_2 = x^{m0}$ holds, we achieve another overlap of one letter. Furthermore, the string $\triangleright\triangleright y^{m0}$ can be aligned from the right with $Y_2 \triangleleft\triangleleft$ if and only if $Y_2 = y^{m0}$ holds. A similar argument can be applied to the strings $\triangleright\triangleright z^{m0}$ and $Z_2 \triangleleft\triangleleft$. Finally, we note that we cannot benefit by aligning the string $y^{l1} \triangleleft\triangleleft$ with $y^{r1} A_j y^{l1}$. Consequently, we see that using the string $x^{m0} C_j x^{m0}$ is generally more profitable. All in all, we gain an additional overlap of one letter for satisfying $x \oplus y \oplus z = 0$ and another overlap of one letter if the equation with two variables corresponding to the considered variable is satisfied.

CASE (iii) $X_1 \oplus Y_1 \oplus Z_1 = 1$ and $c$ is inconsistent: Let us start with constellations satisfying $X_1 + Y_1 + Z_1 = 3$. As an example, we analyze the constellation $(10, 1Y_2, 1Z_2)^s_j$. Due to the definition of $\psi_s$, we set $\psi_s(x) = 1 - X_1$, $\psi_s(y) = 1$ and $\psi_s(z) = 1$. Notice that $\psi_s$ satisfies the equation $x \oplus y \oplus z = 0$. By switching the value $\psi_s(x)$ from $X_1$ to $1 - X_1$, the equation $x_{j_1-1} \oplus x = 0$ might become unsatisfied. Furthermore, we might lose an overlap of one letter by flipping the 1-alignment of the strings corresponding to $x_{j_1-1} \oplus x = 0$ to the 0-alignment. On the other hand, we gain an overlap of one letter by aligning the string $\triangleright\triangleright x^{m0}$ from the right side with $x^{m0} \triangleleft\triangleleft$. This transformation yields at least one more satisfied equation.
In addition, the strings $y^{r1} A_j y^{l1}$ and $z^{r1} B_j z^{l1}$ can be aligned by one letter with $\triangleright\triangleright y^{r1}$ and $\triangleright\triangleright z^{r1}$, respectively. If $Z_2 = z^{l1}$ and $Y_2 = y^{l1}$ hold, we achieve another overlap of one letter in each case. The situation is depicted below.

[Diagram: the superstring $s$ before and after the transformation for the constellation $(10, 1Y_2, 1Z_2)^s_j$.]

The other constellations satisfying $X_1 + Y_1 + Z_1 = 3$ can be analyzed analogously.

The remaining constellations $(X_1X_2, Y_1Y_2, Z_1Z_2)^s_j$ to be discussed satisfy $X_1 + Y_1 + Z_1 = 1$ and are inconsistent. As an example, we analyze the constellation $(01, 0Y_2, 1Z_2)^s_j$, for which we set $\psi_s(x) = 1 - X_1$, $\psi_s(y) = Y_1$ and $\psi_s(z) = Z_1$. The scenario is depicted below.

[Diagram: the superstring $s$ before and after the transformation for the constellation $(01, 0Y_2, 1Z_2)^s_j$.]

By flipping the 0-alignment of the strings corresponding to $x_{j_1-1} \oplus x = 0$ to the 1-alignment, we can overlap $x^{r1} A_j x^{l1}$ from the left side with $\triangleright\triangleright x^{r1}$ and from the right side with $x^{l1} \triangleleft\triangleleft$. This transformation achieves an overlap of at most one more letter. Moreover, we obtain at least one more satisfied equation. If $Z_2 = z^{l1}$ and $Y_2 = y^{m0}$ hold, it yields an overlap of three additional letters, which corresponds to the constellation $(11, 00, 11)^s_j$.

In summary, it is possible to achieve an overlap of at least one letter in each case. In addition, the assignment $\psi_s$ yields at least the same number of satisfied equations as the number of overlapped letters.
This means that if $\psi_s$ satisfies the equations $g^3_j$, $x \oplus x_{j_1+1} = 0$, $y \oplus y_{j_2+1} = 0$ and $z \oplus z_{j_3+1} = 0$, the corresponding strings in $s$ can have an overlap of at most five letters.

5.5 Proof of Theorem 7

Given an instance $H$ of the Hybrid problem with $n$ circles, $m_2$ equations with two variables and $m_3$ equations with exactly three variables with the properties described in Theorem 5, we construct in polynomial time an instance $S_H$ of the Shortest Superstring problem with the properties described in Section 5.2.

Let $\phi$ be an assignment to the variables of $H$ which leaves at most $u$ equations unsatisfied. According to Section 5.3, the length of the superstring $s_\phi$ satisfies

$$|s_\phi| \le 7n + 5m_2 + 22m_3 + u,$$

since the length of the superstring increases by at most one letter for every unsatisfied equation of the assignment. Regarding the compression measure, we obtain the following.

$$\begin{aligned}
\mathrm{comp}(S_H, s_\phi) &\ge \sum_{s \in S_H} |s| - (7n + 5m_2 + 22m_3 + u) \\
&= (4 + 8)n + 8m_2 + 36m_3 - (7n + 5m_2 + 22m_3 + u) \\
&= 5n + 3m_2 + 14m_3 - u
\end{aligned}$$

On the other hand, given a superstring $s$ for $S_H$ with length $|s| = 5m_2 + 22m_3 + u + 7n$, we can construct in polynomial time a normed superstring $s'$ without increasing its length by applying the transformations defined in Section 5.4. This enables us to define an assignment $\psi_s$ to the variables of $H$ according to Section 5.4 that leaves at most $u$ equations in $H$ unsatisfied.

A similar argument leads to the conclusion that, given a superstring $s$ for $S_H$ with compression $\mathrm{comp}(S_H, s) = 5n + 3m_2 + 14m_3 - u$, we can construct in polynomial time an assignment to the variables in $H$ such that at most $u$ equations are unsatisfied.

Next, we describe smaller gadgets for equations with three variables, which imply an improved explicit lower bound, and give the proof of Theorem 6.

5.6 Proof of Theorem 6

Given an equation with three variables $g^3_j \equiv x \oplus y \oplus z = 0$, we introduce the sets $S^\alpha(g^3_j)$ and $S^\beta(g^3_j)$ including the following strings.
$$x^{r1\alpha} x^{l1} y^{r1} y^{l1}, \quad y^{r1} y^{l1} x^{m0} C_j, \quad x^{m0} C_j x^{r1\alpha} x^{l1} \;\in\; S^\alpha(g^3_j)$$

$$x^{r1\beta} x^{l1} z^{r1} z^{l1}, \quad z^{r1} z^{l1} C_j x^{m0}, \quad C_j x^{m0} x^{r1\beta} x^{l1} \;\in\; S^\beta(g^3_j)$$

In addition, we introduce new strings for the equation $x_{i-1} \oplus x = 0$. On the other hand, the strings corresponding to $x \oplus x_{i+1} = 0$, $y_{i-1} \oplus y = 0$, $y \oplus y_{i+1} = 0$, $z_{i-1} \oplus z = 0$ and $z \oplus z_{i+1} = 0$ remain the same. Let us define the strings for $x_{i-1} \oplus x = 0$:

$$x_{i-1}^{l1} x^{r1\alpha} x_{i-1}^{l1} x^{r1\beta}, \qquad x_{i-1}^{m0} x^{m0} x_{i-1}^{l1} x^{r1\alpha}, \qquad x_{i-1}^{l1} x^{r1\beta} x_{i-1}^{m0} x^{m0}$$

These three strings can be aligned each by two letters in a cyclic fashion. Accordingly, we obtain three combinations that can be used to overlap with other strings by one letter from the left side as well as from the right side. Note that we have only two combinations if we consider only the leftmost position of the combined strings. For example, the combination

$$x_{i-1}^{l1} x^{r1\beta} x_{i-1}^{m0} x^{m0} x_{i-1}^{l1} x^{r1\alpha} x_{i-1}^{l1} x^{r1\beta}$$

can be used to overlap from the right side with strings in $S^\beta(g^3_j)$, whereas

$$x_{i-1}^{l1} x^{r1\alpha} x_{i-1}^{l1} x^{r1\beta} x_{i-1}^{m0} x^{m0} x_{i-1}^{l1} x^{r1\alpha}$$

can be aligned with strings contained in $S^\alpha(g^3_j)$. Therefore, we may apply the same arguments as in the proof of Theorem 7. The strings corresponding to equations of the form $x \oplus y \oplus z = 1$ can be constructed analogously.

Given an instance $H$ of the Hybrid problem with $n$ circles, $m_2$ equations with two variables and $m_3$ equations with exactly three variables with the properties described in Theorem 5, we construct in polynomial time an instance $S_H$ of the Shortest Superstring problem.

Let $\phi$ be an assignment to the variables of $H$ which leaves at most $u$ equations unsatisfied. Then, it is possible to construct a superstring $s_\phi$ with length

$$|s_\phi| \le 7n + 5m_2 + 16m_3 + u,$$

since the length of the superstring increases by at most one letter for every unsatisfied equation of the assignment. Regarding the compression measure, we obtain the following.
$$\begin{aligned}
\mathrm{comp}(S_H, s_\phi) &\ge \sum_{s \in S_H} |s| - (7n + 5m_2 + 16m_3 + u) \\
&= (4 + 8)n + 8m_2 + 28m_3 - (7n + 5m_2 + 16m_3 + u) \\
&= 5n + 3m_2 + 12m_3 - u
\end{aligned}$$

On the other hand, given a superstring $s$ for $S_H$ with length $|s| = 5m_2 + 16m_3 + u + 7n$, we can construct in polynomial time a normed superstring $s'$ without increasing its length. The corresponding assignment $\psi_{s'}$ to the variables of $H$ leaves at most $u$ equations in $H$ unsatisfied.

A similar argument leads to the conclusion that, given a superstring $s$ for $S_H$ with compression $\mathrm{comp}(S_H, s) = 5n + 3m_2 + 12m_3 - u$, we can construct in polynomial time an assignment to the variables in $H$ such that at most $u$ equations are unsatisfied.

6 Concluding Remarks

It seems that a new method is now needed in order to obtain better approximation lower bounds. Perhaps direct PCP constructions are the natural next step for proving stronger approximation hardness results for the problems considered in this paper.

References

[AS95] C. Armen and C. Stein, Improved Length Bounds for the Shortest Superstring Problem, Proc. 5th WADS (1995), LNCS 955, pp. 494–505.
[AS98] C. Armen and C. Stein, A $2\frac{2}{3}$ Superstring Approximation Algorithm, Discrete Applied Mathematics 88 (1998), pp. 29–57.
[BK99] P. Berman and M. Karpinski, On Some Tighter Inapproximability Results, Proc. 26th ICALP (1999), LNCS 1644, pp. 200–209.
[B02] M. Bläser, An $\frac{8}{13}$ Approximation Algorithm for the Asymmetric Max-TSP, Proc. 13th SODA (2002), pp. 64–73.
[BJL+94] A. Blum, T. Jiang, M. Li, J. Tromp, and M. Yannakakis, Linear Approximation of Shortest Superstrings, J. ACM 41 (1994), pp. 630–647.
[BJJ97] D. Breslauer, T. Jiang, and Z. Jiang, Rotations of Periodic Strings and Short Superstrings, J. Algorithms 24 (1997), pp. 340–353.
[CGP+94] A. Czumaj, L. Gasieniec, M. Piotrow, and W. Rytter, Parallel and Sequential Approximations of Shortest Superstrings, Proc. 1st SWAT (1994), LNCS 824, pp. 95–106.
[E99] L. Engebretsen, An Explicit Lower Bound for TSP with Distances One and Two, Algorithmica 35 (2003), pp. 301–318.
[EK06] L. Engebretsen and M. Karpinski, TSP with Bounded Metrics, J. Comput. Syst. Sci. 72 (2006), pp. 509–546.
[FNW79] M. Fisher, G. Nemhauser, and L. Wolsey, An Analysis of Approximations for Finding a Maximum Weight Hamiltonian Circuit, Oper. Res. 27 (1979), pp. 799–809.
[GMS80] J. Gallant, D. Maier, and J. Storer, On Finding Minimal Length Superstrings, J. Comp. Sys. Sci. 20 (1980), pp. 50–58.
[H01] J. Håstad, Some Optimal Inapproximability Results, J. ACM 48 (2001), pp. 798–859.
[KLS+05] H. Kaplan, M. Lewenstein, N. Shafrir, and M. Sviridenko, Approximation Algorithms for Asymmetric TSP by Decomposing Directed Regular Multigraphs, J. ACM 52 (2005), pp. 602–626.
[KPS94] R. Kosaraju, J. Park, and C. Stein, Long Tours and Short Superstrings, Proc. 35th FOCS (1994), pp. 166–177.
[L88] A. Lesk, Computational Molecular Biology, Sources and Methods for Sequence Analysis, Oxford University Press, 1988.
[LS03] M. Lewenstein and M. Sviridenko, Approximating Asymmetric Maximum TSP, Proc. 14th SODA (2003), pp. 646–654.
[L90] M. Li, Towards a DNA Sequencing Theory, Proc. 31st FOCS (1990), pp. 125–134.
[MS77] D. Maier and J. Storer, A Note on the Complexity of the Superstring Problem, Report No. 223, Computer Science Laboratory, Princeton University, 1977.
[MJ75] A. Mayne and E. James, Information Compression by Factorising Common Superstrings, Comput. J. 18 (1975), pp. 157–160.
[M12] M. Mucha, Lyndon Words and Short Superstrings, CoRR abs/1205.6787, 2012.
[M94] M. Middendorf, More on the Complexity of Common Superstring and Supersequence Problems, Theor. Comput. Sci. 125 (1994), pp. 205–228.
[M98] M. Middendorf, Shortest Common Superstrings and Scheduling with Coordinated Starting Times, Theor. Comput. Sci. 191 (1998), pp. 205–214.
[O99] S. Ott, Lower Bounds for Approximating Shortest Superstrings over an Alphabet of Size 2, Proc. 25th WG (1999), LNCS 1665, pp. 55–64.
[PY93] C. Papadimitriou and M. Yannakakis, The Traveling Salesman Problem with Distances One and Two, Math. Oper. Res. 18 (1993), pp. 1–11.
[S88] J. Storer, Data Compression: Methods and Theory, Computer Science Press, 1988.
[SS82] J. Storer and T. Szymanski, Data Compression via Textual Substitution, J. ACM 29 (1982), pp. 928–951.
[S99] Z. Sweedyk, A $2\frac{1}{2}$-Approximation Algorithm for Shortest Superstring, SIAM J. Comput. 29 (1999), pp. 954–986.
[TU88] J. Tarhio and E. Ukkonen, A Greedy Approximation Algorithm for Constructing Shortest Common Superstrings, Theor. Comput. Sci. 57 (1988), pp. 131–145.
[TT93] S. Teng and F. Yao, Approximating Shortest Superstrings, Proc. 34th FOCS (1993), pp. 158–165.
[T90] V. Timkovskii, Complexity of Common Subsequence and Supersequence Problems and Related Problems, Cybernetics and Systems Analysis 25 (1990), pp. 565–580; translated from Kibernetika 25 (1989), pp. 1–13.
[T89] J. Turner, Approximation Algorithms for the Shortest Common Superstring Problem, Information and Computation 83 (1989), pp. 1–20.
[V05] V. Vassilevska, Explicit Inapproximability Bounds for the Shortest Superstring Problem, Proc. 30th MFCS (2005), LNCS 3618, pp. 793–800.
[V92] S. Vishwanathan, An Approximation Algorithm for the Asymmetric Travelling Salesman Problem with Distances One and Two, Inf. Process. Lett. 44 (1992), pp. 297–302.