Information Sciences 135 (2001) 43–56
www.elsevier.com/locate/ins

Parallelism and dictionary based data compression

Sergio De Agostino
Computer Science Department, Armstrong Atlantic State University, 11935 Abercorn Street, Savannah, GA 31419-1997, USA
E-mail address: agos@armstrong.edu (S. De Agostino)

Abstract

Because of the size of the information involved in the emerging applications in multimedia and the Human Genome Project, parallelism offers the only hope of meeting the challenges of storing such databases and searching through them quickly. In this paper, we address dictionary based lossless text compression and give the state of the art in the field of parallelism. Static dictionary compression and sliding window (LZ1) compression have been successfully parallelized by many authors. Dynamic dictionary compression (LZ2) seems hardly parallelizable, since some related heuristics are known to be P-complete. In spite of such negative results, the decoding process can be parallelized efficiently for LZ2 compression as well as for static and LZ1 compression. A main issue for implementation purposes in dictionary based compression is to bound the dictionary size [3,23]. From a theoretical point of view, the differences in parallel complexity between compression with bounded and unbounded windows are not relevant. Much more interesting are the results concerning bounded size dictionary compression with the LZ2 method. When the size of the dictionary is O(log^k n), a bounded size dictionary version (LRU deletion heuristic) of the LZ2 compression algorithm is hard for the class of problems solvable simultaneously in polynomial time and O(log^k n) space (that is, SC^k). A relaxed variation of this heuristic is the first natural SC^k-complete problem (the original heuristic belongs to SC^{k+1}). In virtue of these results, it can be argued that there are no practical parallel algorithms for LZ2 compression with the LRU deletion heuristic or any other heuristic deleting dictionary elements in a continuous way. For simpler heuristics (SWAP, RESTART, FREEZE), practical parallel algorithms exist. © 2001 Elsevier Science Inc. All rights reserved.

1. Introduction

Static dictionary compression [4,16,25] and sliding window (LZ1) compression [19] have been successfully parallelized by many authors. Efficient parallel algorithms (polylogarithmic time with a polynomial number of processors) have been designed for compression and decompression with static and sliding dictionaries [2,5,6,14,17,21,22]. The LZ2 compression method [18,27] (also called dynamic) seems hardly parallelizable, since some related heuristics are known to be P-complete [7,8]. In spite of such negative results, the decoding process can be parallelized efficiently [9,10,12,13]. Observe that even if compression is not parallelizable, fast parallel decompression may still be relevant for CD-ROM memories or any application where decompression is more frequent than compression. The problem of designing work-optimal parallel decoding algorithms was addressed in [14], where work-optimal parallel Las Vegas algorithms for compression and decompression with static and sliding dictionaries were provided. Similar results were obtained for decompression with LZ2 type dictionaries in [12].
A main issue for implementation purposes in dictionary based compression is to bound the dictionary size [3,23]. From a theoretical point of view, the differences in parallel complexity between compression with bounded and unbounded windows are not relevant. Much more interesting are the results concerning bounded size dictionary compression with the LZ2 method [11]. When the size of the dictionary is O(log^k n), a bounded size dictionary version (LRU deletion heuristic) of the LZ2 compression algorithm is hard for the class of problems solvable simultaneously in polynomial time and O(log^k n) space (that is, SC^k). A relaxed variation of this heuristic is the first natural SC^k-complete problem (the original heuristic belongs to SC^{k+1}). In virtue of these results, it can be argued that there are no practical parallel algorithms for LZ2 compression with the LRU deletion heuristic or any other heuristic deleting dictionary elements in a continuous way. For simpler heuristics (SWAP, RESTART, FREEZE), practical parallel algorithms exist.

In Section 2 we describe parallel algorithms for compression and decompression with static and sliding dictionaries. The LZ2 compression method is addressed in Section 3, where techniques for parallel decompression are described. Section 4 considers parallel complexity issues for LZ2 heuristics working with bounded size dictionaries. Conclusions and open problems are given in Section 5.

2. Static and sliding (LZ1) dictionaries

With static dictionary compression, a string is compressed by replacing substrings with pointers to copies, called targets, which are stored in a given dictionary. The string is factorized and the factors are dictionary elements. The factorization of the string is called a parsing. The compressed string is a sequence of pointers encoding the factors. As mentioned in Section 1, parallel algorithms exist for compression and decompression with static dictionaries. Decompression is trivial, since we can decode the pointers in parallel and reconstruct the original string by parallel prefix with O(n) processors, where n is the length of the output string. In Sections 2.1–2.3 we address how to approach in parallel the basic compression heuristics using a static dictionary. Section 2.4 concerns sliding dictionaries.

2.1. Optimal parsing

A practical assumption is that the pointers have equal size, so that the way to obtain the best compression is to factorize the string into the minimum number of factors; we call such a factorization an optimal parsing. The optimal parsing problem can be reduced to the problem of finding the shortest path between two vertices in a directed graph. If x_1 ... x_n is the input string X, V = {v_1, ..., v_n, v_{n+1}} a set of vertices, k the cardinality of the dictionary D and l(h) the length of the hth dictionary element, the reduction is

    in parallel for 1 <= i <= n do
      in parallel for 1 <= h <= k do
        if the hth dictionary element matches X in position i then
          add a directed edge from v_i to v_{i+l(h)} in G

A PRAM CREW is an allowed model for this reduction because no concurrent writing is involved in the computation, and O(nkM) processors realize it in logarithmic time, with M the maximum length of a target. The shortest path from v_1 to v_{n+1} provides the optimal parsing. In fact, each path between two nodes v_i and v_j represents a parsing of the substring x_i ... x_{j-1}, and the internal nodes of the path provide the position of the first character of each phrase.
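For illustration, the reduction can be carried out sequentially as follows (a minimal Python sketch; the function name optimal_parse and the use of a plain set as the dictionary are our own choices). Since all pointers have equal size, every edge has unit weight and a breadth-first search finds the shortest path, hence a minimum-factor parsing; the two parallel loops of the reduction become ordinary loops here.

    from collections import deque

    def optimal_parse(text, dictionary):
        """Optimal (minimum-factor) parsing via the shortest-path reduction; None if impossible."""
        n = len(text)
        # Edge i -> i + len(w) whenever dictionary element w matches text at position i.
        edges = [[] for _ in range(n + 1)]
        for i in range(n):
            for w in dictionary:
                if text.startswith(w, i):
                    edges[i].append((i + len(w), w))
        # Unit-weight shortest path from vertex 0 to vertex n by breadth-first search.
        parent = {0: None}
        queue = deque([0])
        while queue:
            i = queue.popleft()
            if i == n:
                break
            for j, w in edges[i]:
                if j not in parent:
                    parent[j] = (i, w)
                    queue.append(j)
        if n not in parent:
            return None
        factors = []
        j = n
        while parent[j] is not None:     # walk the shortest path back and read off the factors
            i, w = parent[j]
            factors.append(w)
            j = i
        return factors[::-1]

    # e.g. optimal_parse("ababa", {"a", "b", "ab", "aba"}) returns a 2-factor parsing such as ['ab', 'aba'].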
An algorithm for computing shortest paths requires O(n^3) processors and O(log^2 n) time, or O(n^4) processors and O(log n) time, on a PRAM CREW model of computation. It follows that the optimal parsing problem belongs to NC, the class of problems solvable in polylogarithmic time with a polynomial number of processors [6]. If the dictionary is stored in a trie, there is a parallel algorithm requiring O(M + log M log n) time and O(nM^2) processors [17]. These procedures also apply to the case where pointers have different sizes: it is enough to weight the edges with the size of the pointer to the corresponding target.

A dictionary D is prefix if all the prefixes of an element are in D. Optimal parsing with prefix dictionaries is computationally easier. The following procedure computes the optimal parsing with a prefix dictionary [16]:

    j := 0; i := 0;
    repeat forever
    begin
      for k = j + 1 to i + 1 do
      begin
        let h(k) be such that x_k ... x_{h(k)} is the longest match in the kth position
      end
      let k' be such that h(k') is maximum
      x_j ... x_{k'-1} is a factor of the parsing
      j := k'; i := h(k')
    end

In the procedure above, at each step we select a factor such that the longest match with a dictionary element in the next position ends farthest to the right. Since the dictionary is prefix, the parsing is optimal. The algorithm can even be implemented in real time with a modified suffix trie data structure.

In [14], a Las Vegas work-optimal parallel algorithm to compress optimally with a prefix dictionary in expected logarithmic time is given. In [21], a deterministic algorithm is shown. With a suffix trie data structure, the longest sequence of two factors in each position of the string can be computed in O(M) parallel time on the PRAM CREW. Position i is linked to the position of the second factor. If the second factor is empty, i is linked to n + 1, where n is the length of the string. We obtain a tree rooted in n + 1. The positions of the factors of an optimal parsing are given by the path from 1 to n + 1, which can be found with a linear number of processors and logarithmic time on the PRAM CREW by pointer jumping. A parallel approximation scheme for the problem is provided in [2], which works locally and can be implemented on a linear array of processors in sublinear time.

2.2. Greedy parsing

Greedy parsing is produced by a procedure that, reading the string from left to right, finds at each step the longest match starting from the first position of the suffix not parsed yet. Differently from optimal parsing, greedy parsing can always be computed on-line in real time given any dictionary stored in a trie. In order to compute greedy parsing in parallel, find the greedy factor in position i and link i to j + 1, where j is the last position of the factor. If the greedy factor ends the string, i is linked to n + 1, where n is the length of the string. Again, we form a tree rooted in n + 1 and the positions of the factors of the greedy parsing are given by the path from 1 to n + 1. It follows that the greedy parsing problem belongs to NC. A practical parallel algorithm exists with a linear number of processors and O(log n + M) time on the PRAM CREW [17].

2.3. Longest first fragment parsing

Longest first fragment parsing is produced by an off-line greedy procedure which selects at each step the leftmost occurrence of the longest match with the dictionary which does not overlap factors previously selected.
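A small sequential sketch of longest first fragment parsing (ours, in Python) may help fix the definition; it assumes the dictionary is a plain set containing every alphabet character, so that the gaps left by the longest fragments can always be covered.

    def lff_parse(text, dictionary):
        """Longest first fragment parsing: repeatedly select the leftmost occurrence
        of the longest dictionary match that does not overlap previous selections."""
        n = len(text)
        covered = [False] * n
        selected = []                            # (start, word) pairs
        max_len = max(len(w) for w in dictionary)
        for length in range(max_len, 0, -1):     # longest matches first
            i = 0
            while i + length <= n:
                if (not any(covered[i:i + length])
                        and text[i:i + length] in dictionary):
                    selected.append((i, text[i:i + length]))
                    for p in range(i, i + length):
                        covered[p] = True
                    i += length                  # leftmost non-overlapping occurrences
                else:
                    i += 1
        return [w for _, w in sorted(selected)]  # factors in left-to-right order

    # e.g. lff_parse("ababa", {"a", "b", "ab", "bab"}) returns ['a', 'bab', 'a']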
A parallel approach to this problem is to compute a maximum collection of longest nonoverlapping matches and iterate the computation on the uncovered part of the string. This approach takes O(M log n) time and O(n/log n) processors [17]. In [21], a different approach requiring O(M + log n) time and O(Mn) processors is shown, and an NC algorithm for arbitrary match lengths using O(M^2 n) processors can be derived.

2.4. Sliding dictionaries

If the dictionary is a sliding window (LZ1), then the encoded string is a sequence of pointers q_i = (d_i, l_i), where d_i is the displacement back into the window and l_i is the length of the target. First, we address parallel decoding. If s_1, ..., s_m are the partial sums of l_1, ..., l_m, the target of q_i encodes the substring over the positions s_{i-1}+1 ... s_i of the output string. Link the positions s_{i-1}+1 ... s_i to the positions s_{i-1}+1-d_i ... s_{i-1}+1-d_i+l_i-1, respectively. If d_i = 0, the target of q_i is an alphabet character and the corresponding position in the output string is a sink that can be seen as the root of a tree in a forest. All the nodes in a tree correspond to positions of the decoded string where the character is the root. A work-optimal parallel decoder is given by applying the Las Vegas algorithm in [15], which finds the connected components of a graph G = (V, E) in expected O(log |V|) time and O(|E|) work on a PRAM CRCW [14]. On the PRAM EREW, decoding takes a linear number of processors and deterministic logarithmic time by means of the Euler tour technique [13,22].

As far as parallel coding is concerned, computational differences exist between the restricted case (the window has a fixed length M) and the unrestricted one (the window is the whole input read so far). With O(Mn) processors, the longest match in each position with the corresponding window can be computed in parallel on the PRAM CREW. The running time is O(log M) by using the partial results obtained in the other positions during the computation. Then, the parsing is found by pointer jumping [17]. Obviously, this approach also provides an NC algorithm for the unrestricted case with a quadratic number of processors. Compression with a linear number of processors is possible in O(M + log n) time, where M is the maximum match length [5,17]. It exploits a suffix trie data structure storing all the substrings of the input, which can be constructed in parallel on the PRAM CRCW [1].
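The forest view of LZ1 decoding can be illustrated with a short sequential sketch (ours, in Python). A pointer is represented here as a pair (d, l) with d > 0, meaning "copy l characters starting d positions back", or (0, c) for a literal alphabet character c; this pair representation is a simplification of the actual coding. The left-to-right resolution below plays the role that connected components or the Euler tour technique play in the parallel setting.

    def lz1_decode(pointers):
        """Decode a sequence of LZ1 pointers via the position-linking (forest) view."""
        link = []       # link[p] = earlier position the character at p is copied from
        chars = {}      # roots of the forest: positions holding literal characters
        for d, x in pointers:
            start = len(link)
            if d == 0:
                chars[start] = x          # a literal character is the root of a tree
                link.append(start)
            else:
                for j in range(x):        # position start+j copies position start+j-d
                    link.append(start + j - d)
        out = []
        for p, q in enumerate(link):      # left-to-right resolution; every source
            out.append(out[q] if q != p else chars[p])   # position is strictly earlier
        return "".join(out)

    # e.g. lz1_decode([(0, 'a'), (0, 'b'), (2, 3), (1, 2)]) == 'ababaaa'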
3. Dynamic dictionaries (LZ2)

The LZ2 compression method seems hardly parallelizable, since some related heuristics are known to be P-complete [7,8]. Standard implementations of this method are the next character heuristic [26] and the identity heuristic [20]. The next character heuristic is P-complete. The identity heuristic has not been proved to be P-complete; however, the P-completeness of other LZ2 compression heuristics suggests that it is hardly parallelizable. On the other hand, parallel decoding is possible. In Sections 3.1–3.4 we describe the two heuristics and discuss the related issues concerning parallel decompression.

3.1. The next character heuristic

The next character heuristic is a standard technique that learns substrings by reading the input string from left to right with an incremental parsing procedure. Initially, the dictionary comprises only the alphabet characters. This procedure adds a new substring to the dictionary as soon as a prefix of the still unparsed part of the string does not match a dictionary element. Then, it restarts the reading from the last character of the new substring and replaces the prefix with a pointer to the copy in the dictionary. Therefore, the concatenation of a phrase with the next character is a dictionary element.

Example 1. abababaabaaa
parsing: a, b, ab, aba, abaa, a;
dictionary: a, b, ab, ba, aba, abaa, abaaa;
coding: 1, 2, 3, 5, 6, 1.

Let S = s_1 ... s_n be a text string of length n over an alphabet Σ = {σ_1, ..., σ_d} of cardinality d and q_1 ... q_m the sequence of pointers output by the next character heuristic. Observe that if the target of a pointer q_i is not an alphabet character, then it is the concatenation of the target of the pointer in position q_i - d with the first character of the target of the pointer in position q_i - d + 1. In fact, a new element is added to the dictionary at each step of the parsing and the dictionary initially contains the alphabet characters. This property is fundamental for the parallel decompressor.
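A sequential sketch of the next character heuristic (ours, in Python) makes the incremental parsing explicit; the alphabet is passed as an ordered sequence so that its characters receive the initial pointer values 1, ..., d. On the string of Example 1 it reproduces the coding 1, 2, 3, 5, 6, 1.

    def next_char_encode(text, alphabet):
        """Next character heuristic: greedy longest match, then learn match + next character.
        Assumes every character of text occurs in alphabet."""
        dictionary = {c: i + 1 for i, c in enumerate(alphabet)}
        pointers = []
        i = 0
        while i < len(text):
            # longest dictionary match starting at position i (at least one character)
            j = i + 1
            while j <= len(text) and text[i:j] in dictionary:
                j += 1
            j -= 1
            pointers.append(dictionary[text[i:j]])
            if j < len(text):                          # learn match + next character
                dictionary[text[i:j + 1]] = len(dictionary) + 1
            i = j                                      # reading restarts from the next character
        return pointers

    # next_char_encode("abababaabaaa", "ab") == [1, 2, 3, 5, 6, 1]   (Example 1)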
3.2. Parallel decoding for the next character heuristic

Let q_1 ... q_m be the sequence of pointers output by the next character heuristic, encoding an input string S of length n drawn over an alphabet Σ of cardinality d. Since the target of the pointer q_i is the concatenation of the target of the pointer in position q_i - d with the first character of the target of the pointer in position q_i - d + 1, in parallel for each pointer q_i we can compute the list of pointers to the prefixes of the target of q_i, which we rename q_{i,1}, ..., q_{i,l}. Observe that q_{i,l} is the pointer representing the first character of the target and l is the target length. The maximum target length is Θ(n^{1/2}) and is reached, for example, on the unary string, where the match length grows by one at each step. Furthermore, the jth character of the target is represented by the last element of the list associated with the pointer in position q_{i,l-j+1} - d + 1. It follows that decoding can be realized on the PRAM CREW in O(log n) time with O(n) processors [9].

As for the LZ1 method, we can also decode LZ2 compressed text with a linear number of processors and logarithmic time on the PRAM EREW by reducing the problem to finding the trees of a forest. We apply the Euler tour technique [24] to compute the length of each target. In parallel for each i, link pointer q_i to the pointer in position q_i - d, if q_i > d. Again, we obtain a forest where each tree is rooted in a pointer representing an alphabet character, and the length l_i of the target of a pointer q_i is equal to the level of the pointer in its tree plus 1. It is trivial that building such a forest takes constant time. By means of the Euler tour technique, we can compute the trees of such a forest and the level of each node in its own tree. Therefore, we can compute the length of each target with O(m) processors and logarithmic time on the PRAM EREW. If s_1, ..., s_m are the partial sums of l_1, ..., l_m, the target of q_i is a copy of the substring over the positions s_{i-1}+1 ... s_i of the output string. For each q_i which does not correspond to an alphabet character, define first(i) = s_{q_i-d-1} + 1 and last(i) = s_{q_i-d} + 1. Link the positions s_{i-1}+1 ... s_i to the positions first(i) ... last(i), respectively. As in the sliding dictionary case, if the target of q_i is an alphabet character, the corresponding position in the output string is the root of a tree in a forest and all the nodes in a tree correspond to positions of the decoded string where the character is the root. Then, the output string can be computed with O(n) processors and O(log n) time on the PRAM EREW [13]. On a PRAM CREW, every target length is computed in logarithmic time with m processors by pointer jumping. Since m is O(n/log n), a work-optimal parallel decoder on a PRAM CRCW can also be provided for the LZ2 method [12].

3.3. The identity heuristic

The identity heuristic also learns substrings by reading the input string from left to right, with a greedy parsing procedure. The dictionary still comprises only the alphabet characters, initially. The heuristic adds to the dictionary the concatenation of the last match with the current greedy match. The current match is replaced with a pointer to the dictionary and the reading restarts from the next character. Therefore, the concatenation of two adjacent phrases is a dictionary element (the term "identity" comes from the fact that an entire phrase is concatenated to the previous one). Differently from the next character heuristic, it is not guaranteed that the identity heuristic adds a new element to the dictionary at a given step. To enable parallel decompression for the identity heuristic, a slight variation in the coding was introduced in [10]. That is, we increase the pointer value by adding a dummy element to the dictionary when a new element is not learned at a given step (but the first one).

Example 2. ababababababbabbbabb
parsing: a, b, ab, ab, abab, ab, b, abb, babb;
dictionary: a, b, ab, bab, abab, ababab, "ababab", abb, babb, abbbabb;
coding: 1, 2, 3, 3, 5, 3, 2, 8, 9.

At the first step the heuristic never learns a new dictionary element. Observe that it is possible not to learn a new element at a later step either. At the sixth step the element ababab is already in the dictionary. Nevertheless, we increase the pointer value by adding a dummy element to the dictionary. We can affirm that this modification is negligible in practice, since dummy elements are not frequent in realistic files, especially on large alphabets.

Let S = s_1 ... s_n be a text string of length n over an alphabet Σ = {σ_1, ..., σ_d} of cardinality d and q_1 ... q_m the sequence of pointers output by the identity heuristic. Observe that if the target of the pointer q_i is not an alphabet character, then it is the concatenation of the target of the pointer in position q_i - d with the target of the pointer in position q_i - d + 1. In fact, an element (new or dummy) is added to the dictionary at each step of the parsing but the first, and the dictionary initially contains the alphabet characters. This property enables parallel decompression.
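This property immediately yields a sequential decoder, sketched below (ours, in Python): a pointer value p <= d denotes an alphabet character, while a value p > d denotes the element learned at parsing step p - d + 1, that is, the concatenation of the phrases encoded by the pointers in positions p - d and p - d + 1. On the coding of Example 2 it reconstructs the original string.

    def id_decode(pointers, alphabet):
        """Sequentially decode a pointer sequence produced by the identity heuristic."""
        d = len(alphabet)
        targets = []                      # targets[t-1] = phrase encoded by the t-th pointer
        for p in pointers:
            if p <= d:
                targets.append(alphabet[p - 1])
            else:
                # element p is the phrase of step p - d concatenated with the phrase of
                # step p - d + 1; both steps precede the current one, so both are known
                targets.append(targets[p - d - 1] + targets[p - d])
        return "".join(targets)

    # id_decode([1, 2, 3, 3, 5, 3, 2, 8, 9], "ab") == "ababababababbabbbabb"   (Example 2)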
3.4. Parallel decoding for the identity heuristic

Let q_1 ... q_m be the sequence of pointers output by the identity heuristic, encoding a string of length n drawn over an alphabet Σ of cardinality d. As explained in Section 3.3, the target of the pointer q_i is the concatenation of the target of the pointer in position q_i - d with the target of the pointer in position q_i - d + 1. Decompressing a pointer q_i of the encoded string q_1 ... q_m output by the identity heuristic recalls a tree walk. Consider q_i the root of a tree where the left subtree is rooted in the pointer in position q_i - d and the right subtree is rooted in the pointer in position q_i - d + 1. Then, recursively, we can define a tree where the leaves, read from left to right, provide the target of q_i. For instance, pointer 9 of Example 2 refers to pointer 2 in position 7 and pointer 8 in position 8. Then, the first character of the target of pointer 9 is b. Pointer 8 refers to pointer 3 in position 6 and pointer 2 in position 7. Then, the last character of the target of pointer 9 is also b. Pointer 3 refers to pointer 1 in position 1 and pointer 2 in position 2. Therefore, the target of pointer 9 is babb.

The maximum target length is Θ(n/log n) and is reached, for example, on the unary string, where the match length grows as the Fibonacci numbers. What makes it problematic to compute the target lengths in logarithmic time on a PRAM EREW or with optimal parallel work is the fact that we deal with a tree structure associated with the pointers, whereas for the next character heuristic the structure is linear. Thus, the underlying undirected graph obtained from the linking among the pointers is cyclic. In parallel for each pointer q_i we can compute the tree providing the target of q_i as follows: at each step of the iteration, for each leaf q of the tree rooted in q_i, make a copy of the trees rooted in the pointers in positions q - d and q - d + 1 and add these copies as left subtree and right subtree of q, respectively. This procedure takes O(log n) time and O(n) processors on the PRAM CREW [10].

4. Bounded size dynamic dictionaries

A main issue for implementation purposes is to bound the dictionary size. In this section, we describe results concerning parallel computing and bounded size dynamic dictionary compression [11]. Deletion heuristics that discard dictionary elements to make space for the new substrings have been designed for the LZ2 compression method. Simple choices for the deletion heuristic which are easily parallelizable are:
· FREEZE: once the dictionary is full, freeze it and do not allow any further entries to be added.
· RESTART: stop adding further entries when the dictionary is full; when the compression ratio starts deteriorating, clear the dictionary and learn new strings.
· SWAP: when the primary dictionary first becomes full, start an auxiliary dictionary, but continue compression based on the primary dictionary; when the auxiliary dictionary becomes full, clear the primary dictionary and reverse their roles.

With FREEZE, one processor fills up the dictionary. Then, a parallel algorithm for compression with a static dictionary may be applied to the remaining part of the string (similarly for decompression). With RESTART, starting in parallel from each position of the string, a processor fills up a dictionary. Then, for each dictionary, O(n) processors compute the parsing on the corresponding remaining suffix and check where the compression ratio starts deteriorating. Finally, we build a tree by linking position i to the one where the dictionary computed from i is cleared out. By pointer jumping we have the path giving the blocks providing the parsing of the string. For the RESTART heuristic the number of processors is quadratic. It is linear if the dictionary is cleared as soon as it becomes full. Parallel decompression is possible if the points where the dictionary is cleared are expressed in the encoded string with extra bits. With SWAP, compute the blocks as in the version of RESTART using a linear number of processors. Then, recompute the parsing on each block with the dictionary computed on the previous block. Decompression is an open problem.
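As an illustration of the simpler deletion heuristics, the following sketch (ours, in Python) layers RESTART, in the simpler form that clears the dictionary as soon as it becomes full, on top of the next character heuristic; the bound on the dictionary size and the returned list of restart positions (the extra information a parallel decoder would need) are our modelling choices.

    def next_char_restart_encode(text, alphabet, size):
        """Next character heuristic with the RESTART deletion heuristic (clear when full).
        Assumes every character of text occurs in alphabet and size > len(alphabet)."""
        def fresh():
            return {c: i + 1 for i, c in enumerate(alphabet)}
        dictionary = fresh()
        pointers, restarts = [], []        # restart positions let a decoder resynchronize
        i = 0
        while i < len(text):
            j = i + 1
            while j <= len(text) and text[i:j] in dictionary:
                j += 1
            j -= 1
            pointers.append(dictionary[text[i:j]])
            if j < len(text):
                dictionary[text[i:j + 1]] = len(dictionary) + 1
                if len(dictionary) == size:        # dictionary full: clear and relearn
                    dictionary = fresh()
                    restarts.append(len(pointers)) # next pointer uses a fresh dictionary
            i = j
        return pointers, restarts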
The LRU deletion heuristic defines a string as "used" if it is a match or a prefix of a match, and replaces the least recently used leaf of the trie representing the dictionary with the new element. The SWAP and RESTART heuristics can be viewed as discrete versions of LRU, since their dictionaries depend only on small segments of the input string, and this is what makes a practical parallel algorithm possible. The LZ2 compression algorithm with the LRU deletion heuristic on a dictionary of constant size j can be parallelized either in O(log n) time with 2^{O(j log j)} n processors or in 2^{O(j log j)} log n time with O(n) processors. In fact, the number of all the possible dictionaries, with pointers to their elements and the associated information needed for the LRU deletion heuristic, is 2^{O(j log j)}. For each of these dictionaries the greedy match is computed at each position of the input string. Link the pair (p, D), where p is a position of the string and D is one of the dictionaries, to the pair (p + l + 1, D'), where l is the length of the match relative to (p, D) and D' is the updating of D. If the match is at the end of the string, the pair links to a special node v. The parsing and the sequence of pointers are given by the path from (1, ∅) to v, which can be computed by pointer jumping. The algorithm described is totally impractical because of the huge multiplicative constant either in the running time or in the number of processors.

Complexity arguments give evidence that a better algorithm cannot be found. Such a claim exploits a hardness result given in [11]. When the size of the dictionary is O(log^k n), the LZ2 compression algorithm with the LRU deletion heuristic is shown to be log-space hard for the class of problems solvable simultaneously in polynomial time and O(log^k n) space (that is, SC^k). Since its sequential complexity is polynomial in time and O(log^k n log log n) in space, the problem belongs to SC^{k+1}. We want to point out that it is not by accident that j figures as an exponent in the parallel complexity of the problem. Since it is believed that SC is not included in NC, the SC^k-hardness would imply the exponentiation of j. Hence, it is unlikely that practical NC algorithms exist. Observe that the P-completeness of the problem, which holds when j is superpolylogarithmic, does not suffice to infer this exponentiation, since j can figure as a multiplicative factor of the time function.
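For concreteness, the following is a sequential sketch (ours, in Python) of the next character heuristic combined with the LRU deletion heuristic described above: every match and every prefix of a match is marked as used, and when the dictionary is full the least recently used leaf of the (implicit) trie is evicted. Never evicting single characters and not reusing freed pointer values are simplifications of ours.

    def lz2_lru_encode(text, alphabet, size):
        """Next character heuristic with the LRU deletion heuristic.
        Assumes every character of text occurs in alphabet and size > len(alphabet)."""
        dictionary = {c: i + 1 for i, c in enumerate(alphabet)}   # string -> pointer value
        used = {c: 0 for c in dictionary}                         # time of last use
        time, next_code, i = 0, len(alphabet) + 1, 0
        pointers = []
        while i < len(text):
            j = i + 1
            while j <= len(text) and text[i:j] in dictionary:
                j += 1
            j -= 1
            match = text[i:j]
            pointers.append(dictionary[match])
            time += 1
            for p in range(1, len(match) + 1):        # a string is "used" if it is a match
                used[match[:p]] = time                # or a prefix of a match
            if j < len(text):
                if len(dictionary) == size:           # full: evict the least recently used
                    leaves = [w for w in dictionary   # leaf of the trie, i.e. a string that
                              if w not in alphabet    # is not a prefix of another element
                              and not any(v != w and v.startswith(w) for v in dictionary)]
                    victim = min(leaves, key=lambda w: used[w])
                    del dictionary[victim], used[victim]
                new = text[i:j + 1]
                dictionary[new] = next_code           # real implementations reuse the freed
                used[new] = time                      # pointer value; we simply keep counting
                next_code += 1
            i = j
        return pointers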
In [11] a relaxed version of LRU is presented. It is more sophisticated than SWAP, since it removes elements in a continuous way as the original LRU does, but it relaxes the choice of the element to remove. This relaxation makes the problem complete for SC^k when the dictionary size is O(log^k n). The relaxed version (RLRU) of the LRU heuristic is:

RLRU: When the dictionary is not full, label the ith element added to the dictionary with the integer ⌈ig/k⌉, where k is the dictionary size and g < k is the maximum number of distinct labels allowed. At the generic ith step, when the dictionary is full, remove one of the leaves with the smallest label in the trie storing the dictionary and add the new leaf. Let h be the greatest label among all the dictionary elements. Label the new leaf with h if ⌈ig/k⌉ = ⌈(i-1)g/k⌉. If ⌈ig/k⌉ > ⌈(i-1)g/k⌉, label the leaf with h + 1 and, if h + 1 > g, decrease by 1 all the labels greater than or equal to 2.

Observe that the only case in which the greatest label h is smaller than g is when the dictionary is the set {a^i : 1 <= i <= k}, where a is a given character of the alphabet (in such a case h might be equal to g - 1). The SC^k-completeness result for RLRU suggests that deletion heuristics which discard elements from the dictionary in a continuous way do not have practical parallel algorithms. Decompression is an open problem, as for SWAP.

5. Conclusions

In this paper, we addressed dictionary based lossless text compression and gave the state of the art in the field of parallelism. We showed how compression with static and sliding dictionaries can be parallelized efficiently, while dynamic dictionary compression is hardly parallelizable. Although sliding dictionary compression is a special case of adaptive compression, from a computational point of view it resembles the static method rather than the dynamic one. The rationale is that sliding dictionaries are independent of how the string is parsed, similarly to the static ones.

Theoretically, we consider static and sliding dictionary compression parallelizable because it requires polylogarithmic time and a polynomial number of processors, that is, it belongs to NC. In practice, we wish to have parallel algorithms requiring a linear number of processors and logarithmic time. In dictionary based compression it is realistic to assume that the match lengths are logarithmic. With such an assumption, this goal is reached for greedy parsing with sliding and static dictionaries. Optimal parsing and longest first fragment parsing require more computational resources. It would be interesting to design a different heuristic in between greedy and optimal, rather than longest first fragment, which does not demand more resources.

With dynamic dictionaries, the parsing determines the dictionary and compression becomes inherently sequential. On the other hand, decompression happens to be parallelizable even in the dynamic case. The rationale is that decoding LZ2 compressed text consists of transforming the sequence of pointers into the parsed string. Such a transformation is parallelizable in the other direction as well. In other words, it is only the dynamic parsing procedure that makes compression not parallelizable.

It is of practical interest that the parallel decoders can be applied optimally to consecutive blocks of encoded data, since the number of processors is limited in practice. This is obvious for static and sliding dictionaries. As far as the next character heuristic is concerned, the procedure copies pointers on the column vector of q_i in the kth block until a pointer with the target in the dictionary built so far in the decoding process of the former blocks is reached and, consequently, l_i is increased by the length of this target. For the identity heuristic, pointers with the target in the dictionary built in the decoding process of the former blocks would be the leaves of a tree. A slightly different coding was introduced by adding dummy elements to the dictionary in order to make the parallelization possible for the identity heuristic. We can affirm that this modification is negligible in practice, since dummy elements are not frequent in realistic files, especially for large alphabets. Also, it is asymptotically irrelevant for the compression efficiency, since a new element is added at least at every other step and the pointer size increases at most by one bit.
An interesting open problem is whether computing arbitrary target lengths can be done in logarithmic time on the PRAM EREW, or in sublinear time with optimal parallel work, for the identity heuristic, in a randomized or deterministic way. Another question is whether other parallel decoders can be designed which do not make use of the target lengths.

Bounding the size of dynamic dictionaries decreases the parallel complexity of parsing, since work-space resources are strictly related to parallel time. However, we gave evidence with arguments from complexity theory that practical parallel algorithms would not exist when the deletion heuristic discards an element from the dictionary at each algorithmic step. Practical parallel algorithms exist for heuristics which discard the entire dictionary at the end of a segment. Since the dictionaries depend only on small segments of the input string, an efficient parallelization is possible by applying the sequential algorithm on different segments at the same time. It is worth observing that parallel decompression seems a hard task to pursue not only for LRU but even for SWAP. In fact, even transforming the parsed string into the sequence of pointers appears hardly parallelizable for these heuristics. In conclusion, FREEZE and RESTART seem to be the only suitable LZ2 heuristics for parallel compression and decompression.

References

[1] A. Apostolico, C. Iliopoulos, G.M. Landau, B. Schieber, U. Vishkin, Parallel construction of a suffix tree with applications, Algorithmica 3 (1988) 347–365.
[2] D. Belinskaya, S. De Agostino, J.A. Storer, Near optimal compression with respect to a static dictionary on a practical massively parallel architecture, in: Proceedings of the IEEE Data Compression Conference, 1995, pp. 172–181.
[3] T.C. Bell, J.G. Cleary, I.H. Witten, Text Compression, Prentice-Hall, Englewood Cliffs, NJ, 1990.
[4] M. Cohn, R. Khazan, Parsing with suffix and prefix dictionaries, in: Proceedings of the IEEE Data Compression Conference, 1996, pp. 180–189.
[5] M. Crochemore, W. Rytter, Efficient parallel algorithms to test square-freeness and factorize strings, Inf. Process. Lett. 38 (1991) 57–60.
[6] S. De Agostino, J.A. Storer, Parallel algorithms for optimal compression using dictionaries with the prefix property, in: Proceedings of the IEEE Data Compression Conference, 1992, pp. 52–61.
[7] S. De Agostino, P-complete problems in data compression, Theoret. Comput. Sci. 127 (1994) 181–186.
[8] S. De Agostino, Erratum to P-complete problems in data compression, Theoret. Comput. Sci. 234 (2000) 325–326.
[9] S. De Agostino, A parallel decoding algorithm for LZ2 data compression, Parallel Comput. 21 (1995) 1957–1961.
[10] S. De Agostino, A parallel decoder for LZ2 compression using the ID update heuristic, in: IEEE Proceedings Sequences'97, 1997, pp. 368–373.
[11] S. De Agostino, R. Silvestri, Bounded size dictionary compression: SC^k, in: Proceedings STACS'98, LNCS, vol. 1373, 1998, pp. 522–532.
[12] S. De Agostino, Work-optimal parallel decoders for LZ2 data compression, in: Proceedings of the IEEE Data Compression Conference, 2000, pp. 393–399.
[13] S. De Agostino, Speeding up parallel decoding of LZ compressed text on the PRAM EREW, in: Proceedings SPIRE'2000.
[14] M. Farach, S. Muthukrishnan, Optimal parallel dictionary matching and compression, in: Proceedings SPAA'95, 1995, pp. 244–253.
[15] H. Gazit, An optimal randomized parallel algorithm for finding connected components in a graph, in: Proceedings FOCS'86, 1986, pp. 492–501.
[16] A. Hartman, M. Rodeh, Optimal parsing of strings, in: A. Apostolico, Z. Galil (Eds.), Combinatorial Algorithms on Words, Springer, New York, 1985, pp. 155–167.
[17] D.S. Hirschberg, L.M. Stauffer, Dictionary compression on the PRAM, Parallel Process. Lett. 7 (1997) 297–308.
[18] A. Lempel, J. Ziv, On the complexity of finite sequences, IEEE Trans. Inf. Theory 22 (1976) 75–81.
[19] A. Lempel, J. Ziv, A universal algorithm for sequential data compression, IEEE Trans. Inf. Theory 23 (1977) 337–343.
[20] V.S. Miller, M.N. Wegman, Variations on a theme by Ziv–Lempel, in: A. Apostolico, Z. Galil (Eds.), Combinatorial Algorithms on Words, Springer, New York, 1985, pp. 131–140.
[21] H. Nagumo, M. Lu, K. Watson, Parallel algorithms for the static dictionary compression, in: Proceedings of the IEEE Data Compression Conference, 1995, pp. 162–171.
[22] M. Naor, String matching with preprocessing of text and pattern, in: Proceedings ICALP'91, LNCS, vol. 510, 1991, pp. 739–750.
[23] J.A. Storer, Data Compression: Methods and Theory, Computer Science Press, Rockville, MD, 1988.
[24] R.E. Tarjan, U. Vishkin, An efficient parallel biconnectivity algorithm, SIAM J. Comput. 14 (1985) 862–874.
[25] R.A. Wagner, Common phrases and minimum text storage, Commun. ACM 16 (1973) 148–152.
[26] T.A. Welch, A technique for high-performance data compression, IEEE Computer 17 (1984) 8–19.
[27] J. Ziv, A. Lempel, Compression of individual sequences via variable rate coding, IEEE Trans. Inf. Theory 24 (1978) 531–536.