Information Sciences 135 (2001) 43–56
www.elsevier.com/locate/ins
Parallelism and dictionary based data
compression
Sergio De Agostino
Computer Science Department, Armstrong Atlantic State University, 11935 Abercorn Street,
Savannah, GA 31419-1997, USA
Abstract
Because of the size of information involved with the emerging applications in multimedia and the Human Genome Project, parallelism offers the only hope of meeting the challenges of storing such databases and searching through them quickly. In this paper, we address dictionary based lossless text compression and give the state of the art in the field of parallelism. Static dictionary compression and sliding window (LZ1) compression have been successfully parallelized by many authors. Dynamic dictionary compression (LZ2) seems hardly parallelizable, since some related heuristics are known to be P-complete. In spite of such negative results, the decoding process can be parallelized efficiently for LZ2 compression as well as for static and LZ1 compression. A main issue for implementation purposes in dictionary based compression is to bound the dictionary size [3,23]. From a theoretical point of view, differences in parallel complexity between compression with bounded and unbounded windows are not relevant. Much more interesting are the results concerning bounded size dictionary compression with the LZ2 method. When the size of the dictionary is O(log^k n), a bounded size dictionary version (LRU deletion heuristic) of the LZ2 compression algorithm is hard for the class of problems solvable simultaneously in polynomial time and O(log^k n) space (that is, SC^k). A relaxed variation of this heuristic is the first natural SC^k-complete problem (the original heuristic belongs to SC^{k+1}). In virtue of these results, it can be argued that there are no practical parallel algorithms for LZ2 compression with the LRU deletion heuristic or any other heuristic deleting dictionary elements in a continuous way. For simpler heuristics (SWAP, RESTART, FREEZE), practical parallel algorithms exist. © 2001 Elsevier Science Inc. All rights reserved.
E-mail address: agos@armstrong.edu (S. De Agostino).
1. Introduction
Static dictionary compression [4,16,25] and sliding window (LZ1) compression [19] have been successfully parallelized by many authors. Efficient parallel algorithms (polylogarithmic time with a polynomial number of processors) have been designed for compression and decompression with static and sliding dictionaries [2,5,6,14,17,21,22]. The LZ2 compression method [18,27] (also called dynamic) seems hardly parallelizable, since some related heuristics are known to be P-complete [7,8]. In spite of such negative results, the decoding process can be parallelized efficiently [9,10,12,13]. Observe that even if compression is not parallelizable, fast parallel decompression is still relevant for CD-ROM memories and for all applications where decompression is more frequent than compression. The problem of designing work-optimal parallel decoding algorithms was addressed in [14], where work-optimal parallel Las Vegas algorithms were provided for compression and decompression with static and sliding dictionaries. Similar results were obtained for decompression with LZ2 type dictionaries in [12].
A main issue for implementation purposes in dictionary based compression is to bound the dictionary size [3,23]. From a theoretical point of view, differences in parallel complexity between compression with bounded and unbounded windows are not relevant. Much more interesting are the results concerning bounded size dictionary compression with the LZ2 method [11]. When the size of the dictionary is O(log^k n), a bounded size dictionary version (LRU deletion heuristic) of the LZ2 compression algorithm is hard for the class of problems solvable simultaneously in polynomial time and O(log^k n) space (that is, SC^k). A relaxed variation of this heuristic is the first natural SC^k-complete problem (the original heuristic belongs to SC^{k+1}). In virtue of these results, it can be argued that there are no practical parallel algorithms for LZ2 compression with the LRU deletion heuristic or any other heuristic deleting dictionary elements in a continuous way. For simpler heuristics (SWAP, RESTART, FREEZE), practical parallel algorithms exist.
In Section 2 we describe parallel algorithms for compression and decompression with static and sliding dictionaries. The LZ2 compression method is
addressed in Section 3 and techniques for parallel decompression are described.
Section 4 considers parallel complexity issues for LZ2 heuristics working with
bounded size dictionaries. Conclusions and open problems are given in Section
5.
2. Static and sliding (LZ1) dictionaries
With static dictionary compression, a string is compressed by replacing
substrings with pointers to copies, called targets, which are stored in a given
dictionary. The string is factorized and the factors are dictionary elements. The
factorization of the string is called parsing. The compressed string is a sequence
of pointers encoding the factors. As mentioned in Section 1, parallel algorithms
exist for compression and decompression with static dictionaries. Decompression is trivial since we can decode the pointers in parallel and reconstruct
the original string by parallel prefix with O(n) processors, where n is the length of the output string. In Sections 2.1–2.3 we address how to approach in parallel the basic compression heuristics using a static dictionary. Section 2.4
concerns sliding dictionaries.
2.1. Optimal parsing
A practical assumption is that the pointers have equal size, so that the way to obtain the best compression is to factorize the string into the minimum number of factors, which we call optimal parsing. The optimal parsing problem can be reduced to the problem of finding the shortest path between two vertices in a directed graph. If x_1 ... x_n is the input string X, V = {v_1, ..., v_n, v_{n+1}} a set of vertices, k the cardinality of the dictionary D and l(h) the length of the h-th dictionary element, the reduction is:

in parallel for 1 ≤ i ≤ n do
    in parallel for 1 ≤ h ≤ k do
        if the h-th dictionary element matches X in position i
        then add a directed edge from v_i to v_{i+l(h)} in G.
A PRAM CREW is an allowed model for this reduction, because no concurrent writing is involved in the computation, and O(nkM) processors realize it in logarithmic time, where M is the maximum length of a target. The shortest path from v_1 to v_{n+1} provides the optimal parsing. In fact, each path between two nodes v_i and v_j represents a parsing of the substring x_i ... x_{j-1}, and the internal nodes of the path provide the position of the first character of each phrase. An algorithm for computing shortest paths requires O(n^3) processors and O(log^2 n) time, or O(n^4) processors and O(log n) time, on a PRAM CREW model of computation. It follows that the optimal parsing problem belongs to NC, the class of problems solvable in polylogarithmic time with a polynomial number of processors [6]. If the dictionary is stored in a trie, there is a parallel algorithm requiring O(M log M log n) time and O(nM^2) processors [17]. These procedures also apply to the case where pointers have different sizes: it is enough to weight the edges with the size of the pointer to the corresponding target.
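To make the reduction concrete, the following Python sketch (an illustration under stated assumptions, not the parallel algorithm of the cited works) builds the edges and finds a fewest-factors parsing. With equal-size pointers every edge has unit cost, so a breadth-first search computes the shortest path; the dictionary is assumed to be a plain list of strings.

```python
from collections import deque

def optimal_parse(text, dictionary):
    """Fewest-factors parsing via the shortest-path reduction (sketch).

    Vertices are positions 1..n+1; an edge i -> i + l(h) exists whenever the
    h-th dictionary element matches the text at position i.  With equal-size
    pointers all edges have unit cost, so BFS yields a shortest v_1 -> v_{n+1}
    path; the PRAM algorithms use parallel shortest-path computations instead.
    """
    n = len(text)
    pred = [None] * (n + 2)                 # predecessor on a shortest path
    visited = {1}
    queue = deque([1])
    while queue:
        i = queue.popleft()
        if i == n + 1:
            break
        for w in dictionary:                # one processor per (i, h) in PRAM
            j = i + len(w)
            if j <= n + 1 and j not in visited and text[i - 1:j - 1] == w:
                visited.add(j)
                pred[j] = i
                queue.append(j)
    cuts, i = [], n + 1                     # walk back from v_{n+1} to v_1
    while i is not None:
        cuts.append(i)
        i = pred[i]
    cuts.reverse()
    return [text[cuts[t] - 1:cuts[t + 1] - 1] for t in range(len(cuts) - 1)]

# optimal_parse("ababa", ["a", "b", "ab", "ba", "aba"]) returns a parsing
# with the minimum number of factors, e.g., ['ab', 'aba'].
```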
A dictionary D is prefix if all the prefixes of an element are in D. Optimal parsing with prefix dictionaries is computationally easier.
The following procedure computes the optimal parsing with a prefix dictionary [16]:

j := 0; i := 0;
repeat forever begin
    for k := j + 1 to i + 1 do
        begin
            let h(k) be such that x_k ... x_{h(k)} is the longest match in the k-th position
        end
    let k' be such that h(k') is maximum
    x_j ... x_{k'-1} is a factor of the parsing
    j := k'; i := h(k')
end
In the procedure above, at each step we select a factor such that the longest match with a dictionary element in the next position ends farthest to the right. Since the dictionary is prefix, the parsing is optimal. The algorithm can even be implemented in real time with a modified suffix trie data structure. In [14], a Las Vegas work-optimal parallel algorithm to compress optimally with a prefix dictionary in expected logarithmic time is given. In [21], a deterministic algorithm is shown. With a suffix trie data structure, the longest sequence of two factors in each position of the string can be computed in O(M) parallel time on the PRAM CREW. Position i is linked to the position of the second factor. If the second factor is empty, i is linked to n + 1, where n is the length of the string. We obtain a tree rooted in n + 1. The positions of the factors of an optimal parsing are given by the path from 1 to n + 1, which can be found with a linear number of processors and logarithmic time on the PRAM CREW by pointer jumping. A parallel approximation scheme for the problem is provided in [2]; it works locally and can be implemented on a linear array of processors in sublinear time.
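As an illustration, a sequential Python rendering of the one-pass procedure above is given below (an explanatory sketch, not the real-time or parallel implementation); the dictionary is assumed to be a plain collection of strings containing all the alphabet characters.

```python
def prefix_optimal_parse(text, dictionary):
    """Optimal parsing with a prefix-closed dictionary (sequential sketch).

    h(k) is the end position of the longest dictionary match starting at
    the (1-indexed) position k.  The current factor is cut at k' - 1, where
    k' is the candidate position whose longest match ends farthest right;
    prefix-closedness guarantees the cut factor is a dictionary element.
    """
    n = len(text)

    def h(k):
        best = k - 1                        # empty match by default
        for w in dictionary:
            if text[k - 1:k - 1 + len(w)] == w:
                best = max(best, k - 1 + len(w))
        return best

    factors, j = [], 1
    i = h(1)
    while j <= n:
        if i >= n:                          # the match at j reaches the end
            factors.append(text[j - 1:])
            break
        k_prime = max(range(j + 1, i + 2), key=h)
        factors.append(text[j - 1:k_prime - 1])
        j, i = k_prime, h(k_prime)
    return factors

# With the prefix dictionary {a, b, ab, ba, aba}:
# prefix_optimal_parse("ababa", ["a", "b", "ab", "ba", "aba"]) -> ['ab', 'aba']
```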
2.2. Greedy parsing
Greedy parsing is produced by a procedure that, reading the string from left
to right, ®nds at each step the longest match starting from the ®rst position of
the sux not parsed yet. Dierently from optimal parsing, greedy parsing can
always be computed on-line in real time given any dictionary stored in a trie. In
order to compute greedy parsing in parallel ®nd the greedy factor in position i
and link i to j 1, where j is the last position of the factor. If the greedy factor
ends the string i is linked to n 1, where n is the length of the string. Again, we
formed a tree rooted in n 1 and the positions of the factors of the greedy
parsing are given by the path from 1 to n 1. It follows that the greedy parsing
problem belongs to NC. A practical parallel algorithm exists with a linear
number of processors and O log n M time on the PRAM CREW [17].
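The linking scheme can be made concrete with the following Python sketch (ours, for illustration only): the link computation per position is independent, hence parallelizable, while the path from 1 to n + 1 is simply followed here instead of being found by pointer jumping.

```python
def greedy_parse(text, dictionary):
    """Greedy parsing via position links (sequential sketch of the PRAM scheme).

    One processor per position computes link[i], the position right after
    the greedy match at i; the factor positions are then the path from 1
    to n + 1, found by pointer jumping in the parallel setting and walked
    directly here.  Alphabet characters are assumed to be in the dictionary.
    """
    n = len(text)
    link = [None] * (n + 2)
    for i in range(1, n + 1):               # independent, hence parallelizable
        best = 1                            # a single character always matches
        for w in dictionary:
            if len(w) > best and text[i - 1:i - 1 + len(w)] == w:
                best = len(w)
        link[i] = i + best
    factors, i = [], 1
    while i <= n:                           # walk the path from 1 to n + 1
        factors.append(text[i - 1:link[i] - 1])
        i = link[i]
    return factors

# greedy_parse("ababa", ["a", "b", "ab", "ba", "aba"]) -> ['aba', 'ba']
```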
2.3. Longest first fragment parsing
Longest first fragment parsing is produced by an off-line greedy procedure which selects at each step the leftmost occurrence of the longest match with the dictionary which does not overlap previously selected factors. A parallel approach to this problem is to compute a maximum collection of longest non-overlapping matches and to iterate the computation on the uncovered part of the string. This approach takes O(M log n) time and O(n/log n) processors [17]. In [21], a different approach requiring O(M log n) time and O(Mn) processors is shown, and an NC algorithm for arbitrary match lengths using O(M^2 n) processors can be derived.
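The selection rule can be illustrated with a small Python sketch (sequential and deliberately naive; the parallel algorithms select whole collections of non-overlapping matches per round rather than one fragment at a time):

```python
def lff_parse(text, dictionary):
    """Longest-first-fragment parsing (sequential sketch).

    Repeatedly selects the leftmost occurrence of the longest dictionary
    match fitting in the still-uncovered part of the string.  Alphabet
    characters are assumed to be in the dictionary, so every position is
    eventually covered.
    """
    n = len(text)
    free = [True] * n
    fragments = []                          # (0-indexed position, word)
    while any(free):
        best = None
        for i in range(n):                  # leftmost wins on equal length
            for w in dictionary:
                j = i + len(w)
                if j <= n and all(free[i:j]) and text[i:j] == w:
                    if best is None or len(w) > len(best[1]):
                        best = (i, w)
        i, w = best
        fragments.append((i, w))
        for p in range(i, i + len(w)):      # mark the fragment as covered
            free[p] = False
    return [w for _, w in sorted(fragments)]

# lff_parse("ababa", ["a", "b", "ab", "ba", "aba"]) -> ['aba', 'ba']
```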
2.4. Sliding dictionaries
If the dictionary is a sliding window (LZ1), then the encoded string is a sequence of pointers q_i = (d_i, l_i), where d_i is the displacement back into the window and l_i the length of the target. First, we address parallel decoding. If s_1, ..., s_m are the partial sums of l_1, ..., l_m, the target of q_i encodes the substring over the positions s_{i-1} + 1, ..., s_i of the output string. Link the positions s_{i-1} + 1, ..., s_i to the positions s_{i-1} + 1 - d_i, ..., s_{i-1} + 1 - d_i + l_i - 1, respectively. If d_i = 0, the target of q_i is an alphabet character and the corresponding position in the output string is a sink that can be seen as the root of a tree in a forest. All the nodes in a tree correspond to positions of the decoded string holding the character of the root. A work-optimal parallel decoder is obtained by applying the Las Vegas algorithm in [15], which finds the connected components of a graph G = (V, E) in expected O(log |V|) time and O(|E|) work on a PRAM CRCW [14]. On the PRAM EREW, decoding takes a linear number of processors and deterministic logarithmic time by means of the Euler tour technique [13,22].
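The following Python sketch (illustrative only; the pointer representation is an assumption) mimics this forest-based decoding sequentially: copied positions are linked d_i places back, literal positions are roots, and each output character is found by resolving its root.

```python
def lz1_decode(pointers):
    """Forest-based LZ1 decoding (sequential sketch of the parallel scheme).

    pointers is a list of pairs (d, x): d == 0 means x is a literal
    character, otherwise x is the target length and d the displacement
    back.  Every copied position is linked d places back; literal
    positions are the roots, and each position takes its root's character.
    The PRAM decoders find the trees by connected components or Euler tours.
    """
    spans, end = [], 0                       # output span of each pointer
    for d, x in pointers:
        length = 1 if d == 0 else x
        spans.append((end, end + length))    # 0-indexed, half-open
        end += length
    link = [None] * end                      # parent in the forest
    char = [None] * end                      # set only at the roots
    for (d, x), (s, e) in zip(pointers, spans):
        if d == 0:
            char[s] = x                      # root: literal character
        else:
            for p in range(s, e):            # independent, hence parallel
                link[p] = p - d
    out = []
    for p in range(end):
        q = p
        while char[q] is None:               # resolve up to the root
            q = link[q]
        out.append(char[q])
    return "".join(out)

# lz1_decode([(0, 'a'), (0, 'b'), (2, 3), (5, 4)]) -> 'ababaabab'
```

Note that overlapping copies need no special treatment: the links resolve transitively to a root, exactly as in the forest described above.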
As far as parallel coding is concerned, computational differences exist between the restricted case (the window has a fixed length M) and the unrestricted one (the window is the whole input read so far). With O(Mn) processors, the longest match in each position with the corresponding window can be computed in parallel on the PRAM CREW. The running time is O(log M), by using the partial results obtained in the other positions during the computation. Then, the parsing is found by pointer jumping [17]. Obviously, this approach also provides an NC algorithm for the unrestricted case with a quadratic number of processors. Compression with a linear number of processors is possible in O(M log n) time, where M is the maximum match length [5,17]. It exploits a suffix trie data structure storing all the substrings of the input, which can be constructed in parallel on the PRAM CRCW [1].
3. Dynamic dictionaries (LZ2)
The LZ2 compression method seems hardly parallelizable, since some related heuristics are known to be P-complete [7,8]. Standard implementations of this method are the next character heuristic [26] and the identity heuristic [20]. The next character heuristic is P-complete. The identity heuristic has not been proved to be P-complete; however, the P-completeness of other LZ2 compression heuristics suggests that it is hardly parallelizable. On the other hand, parallel decoding is possible. In Sections 3.1–3.4 we describe the two heuristics and discuss the related issues concerning parallel decompression.
3.1. The next character heuristic
The next character heuristic is a standard technique that learns substrings by reading the input string from left to right with an incremental parsing procedure. Initially, the dictionary comprises only the alphabet characters. The procedure adds a new substring to the dictionary as soon as a prefix of the still unparsed part of the string does not match a dictionary element. Then, it restarts the reading from the last character of the new substring and replaces the prefix with a pointer to the copy in the dictionary. Therefore, the concatenation of a phrase with the next character is a dictionary element.
Example 1.
abababaabaaa
parsing: a, b, ab, aba, abaa, a;
dictionary: a, b, ab, ba, aba, abaa, abaaa;
coding: 1, 2, 3, 5, 6, 1.
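As a concrete illustration, a minimal sequential sketch of the heuristic follows (ours, not the real-time implementation; the dictionary is kept as a hash map rather than a trie, and the alphabet is derived from the input, both simplifying assumptions). On the string of Example 1 it reproduces the coding 1, 2, 3, 5, 6, 1.

```python
def lz2_next_char_encode(s):
    """LZ2 compression with the next character heuristic (sequential sketch).

    The dictionary initially holds the alphabet characters; each step emits
    a pointer to the longest match and learns match + next character.
    Next-character dictionaries are prefix-closed, so the greedy match can
    be grown one character at a time.
    """
    alphabet = sorted(set(s))                # assumption: alphabet taken from s
    d = {c: i + 1 for i, c in enumerate(alphabet)}
    out, i, n = [], 0, len(s)
    while i < n:
        j = i + 1
        while j < n and s[i:j + 1] in d:     # extend the greedy match
            j += 1
        out.append(d[s[i:j]])
        if j < n:                            # learn match + next character
            d[s[i:j + 1]] = len(d) + 1
        i = j                                # restart at the next character
    return out

# Example 1: lz2_next_char_encode("abababaabaaa") -> [1, 2, 3, 5, 6, 1]
```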
Let S = s_1 ... s_n be a text string of length n over an alphabet Σ = {σ_1, ..., σ_d} of cardinality d, and let q_1 ... q_m be the sequence of pointers output by the next character heuristic. Observe that if the target of a pointer q_i is not an alphabet character, then it is the concatenation of the target of the pointer in position q_i - d with the first character of the target of the pointer in position q_i - d + 1. In fact, a new element is added to the dictionary at each step of the parsing, and the dictionary initially contains the alphabet characters. This property is fundamental for the parallel decompressor.
3.2. Parallel decoding for the next character heuristic
Let q_1 ... q_m be the sequence of pointers output by the next character heuristic, encoding an input string S of length n drawn over an alphabet Σ of cardinality d. Since the target of the pointer q_i is the concatenation of the target of the pointer in position q_i - d with the first character of the target of the pointer in position q_i - d + 1, in parallel for each pointer q_i we can compute the list of pointers to the prefixes of the target of q_i, which we rename q_{i,1}, ..., q_{i,l}. Observe that q_{i,l} is the pointer representing the first character of the target and l is the target length. The maximum target length is Θ(n^{1/2}) and is reached, for example, on the unary string, where the match length grows by one at each step. Furthermore, the j-th character of the target is represented by the last element of the list associated with the pointer in position q_{i,l-j+1} - d + 1. It follows that decoding can be realized on the PRAM CREW in O(log n) time with O(n) processors [9].
As for the LZ1 method, we can also decode LZ2 compressed text with a linear number of processors and logarithmic time on the PRAM EREW, by reducing the problem to finding the trees of a forest. We apply the Euler tour technique [24] to compute the length of each target. In parallel for each i, link pointer q_i to the pointer in position q_i - d, if q_i > d. Again, we obtain a forest where each tree is rooted in a pointer representing an alphabet character, and the length l_i of the target of a pointer q_i is equal to the level of the pointer in its tree plus 1. It is trivial that building such a forest takes constant time. By means of the Euler tour technique, we can compute the trees of such a forest and the level of each node in its own tree. Therefore, we can compute the length of each target with O(m) processors and logarithmic time on the PRAM EREW. If s_1, ..., s_m are the partial sums of l_1, ..., l_m, the target of q_i is a copy of the substring over the positions s_{i-1} + 1, ..., s_i of the output string. For each q_i which does not correspond to an alphabet character, define first(i) = s_{q_i - d - 1} + 1 and last(i) = s_{q_i - d} + 1. Link the positions s_{i-1} + 1, ..., s_i to the positions first(i), ..., last(i), respectively. As in the sliding dictionary case, if the target of q_i is an alphabet character, the corresponding position in the output string is the root of a tree in a forest, and all the nodes in a tree correspond to positions of the decoded string holding the character of the root. Then, the output string can be computed with O(n) processors and O(log n) time on the PRAM EREW [13].
On a PRAM CREW, every target length can be computed in logarithmic time with m processors by pointer jumping. Since m is O(n/log n), a work-optimal parallel decoder on a PRAM CRCW can also be provided for the LZ2 method [12].
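A sequential Python sketch of a decoder built on the structural property above follows (illustrative; the PRAM versions obtain the lengths and the first characters with pointer jumping or Euler tours instead of the explicit chain walks used here).

```python
def lz2_next_char_decode(pointers, alphabet):
    """Decoder for the next character heuristic (sequential sketch).

    Property used: for q > d, target(q) = target(q') + first_char(q''),
    where q' and q'' are the pointers emitted at steps q - d and q - d + 1
    (element q was learned at step q - d).  Steps are 1-indexed.
    """
    d = len(alphabet)

    def first_char(e):                       # follow the prefix chain to a root
        while e > d:
            e = pointers[e - d - 1]
        return alphabet[e - 1]

    def target(e):
        tail = []                            # characters appended along the chain
        while e > d:
            tail.append(first_char(pointers[e - d]))
            e = pointers[e - d - 1]
        tail.append(alphabet[e - 1])         # the alphabet character at the root
        return "".join(reversed(tail))

    return "".join(target(q) for q in pointers)

# lz2_next_char_decode([1, 2, 3, 5, 6, 1], "ab") -> "abababaabaaa"
```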
3.3. The identity heuristic
The identity heuristic also learns substrings by reading the input string from left to right, with a greedy parsing procedure. Again, the dictionary initially comprises only the alphabet characters. The heuristic adds to the dictionary the concatenation of the last match with the current greedy match. The current match is replaced with a pointer to the dictionary and the reading restarts from the next character. Therefore, the concatenation of two adjacent phrases is a dictionary element (the term ``identity'' comes from the fact that an entire phrase is concatenated to the previous one). Differently from the next character heuristic, it is not guaranteed that the identity heuristic adds a new element to the dictionary at a given step. To enable parallel decompression for the identity heuristic, a slight variation in the coding was introduced in [10]: the pointer value is increased by adding a dummy element to the dictionary whenever a new element is not learned at a given step (except the first one).
Example 2.
ababababababbabbbabb
parsing: a, b, ab, ab, abab, ab, b, abb, babb;
dictionary: a, b, ab, bab, abab, ababab, ``ababab'', abb, babb, abbbabb;
coding: 1, 2, 3, 3, 5, 3, 2, 8, 9.
At the first step the heuristic never learns a new dictionary element. Observe that it is possible not to learn a new element at a later step either: at the sixth step the element ababab is in the dictionary already. Nevertheless, we increase the pointer value by adding a dummy element to the dictionary. We can affirm that this modification is negligible in practice, since dummy elements are not frequent on realistic files, especially on large alphabets. Let S = s_1 ... s_n be a text string of length n over an alphabet Σ = {σ_1, ..., σ_d} of cardinality d, and let q_1 ... q_m be the sequence of pointers output by the ID heuristic. Observe that if the target of the pointer q_i is not an alphabet character, then it is the concatenation of the target of the pointer in position q_i - d with the target of the pointer in position q_i - d + 1. In fact, an element (new or dummy) is added to the dictionary at each step of the parsing but the first, and the dictionary initially contains the alphabet characters. This property enables parallel decompression.
3.4. Parallel decoding for the identity heuristic
Let q_1 ... q_m be the sequence of pointers output by the identity heuristic, encoding a string of length n drawn over an alphabet Σ of cardinality d. As explained in Section 3.3, the target of the pointer q_i is the concatenation of the target of the pointer in position q_i - d with the target of the pointer in position q_i - d + 1. Decompressing a pointer q_i of the encoded string q_1 ... q_m output by the identity heuristic recalls a tree walk. Consider q_i the root of a tree where the left subtree is rooted in the pointer in position q_i - d and the right subtree is rooted in the pointer in position q_i - d + 1. Then, recursively, we can define a tree where the leaves, read from left to right, provide the target of q_i. For instance, pointer 9 of Example 2 refers to pointer 2 in position 7 and pointer 8 in position 8. Then, the first character of the target of pointer 9 is b. Pointer 8 refers to pointer 3 in position 6 and pointer 2 in position 7. Then, the last character of the target of pointer 9 is also b. Pointer 3 refers to pointer 1 in position 1 and pointer 2 in position 2. Therefore, the target of pointer 9 is babb. The maximum target length is Θ(n/log n) and is reached, for example, on the unary string, where the match length grows as the Fibonacci numbers. What makes it problematic to compute the target lengths in logarithmic time on a PRAM EREW or with optimal parallel work is the fact that we deal with a tree structure associated with the pointers, whereas for the next character heuristic the structure is linear. Thus, the underlying undirected graph obtained from the linking among the pointers is cyclic. In parallel for each pointer q_i we can compute the tree providing the target of q_i as follows: at each step of the iteration, for each leaf q of the tree rooted in q_i, make a copy of the trees rooted in the pointers in positions q - d and q - d + 1 and add these copies as left subtree and right subtree of q, respectively. This procedure takes O(log n) time and O(n) processors on the PRAM CREW [10].
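The property of Section 3.3 also yields a compact sequential decoder (a sketch only; it expands the tree by memoized recursion, while the parallel algorithm of [10] copies subtrees iteratively as described above).

```python
def lz2_id_decode(pointers, alphabet):
    """Decoder for the identity heuristic with dummy elements (sketch).

    Property used: for q > d, target(q) = target(q') + target(q''), where
    q' and q'' are the pointers emitted at steps q - d and q - d + 1.
    Dummy elements are never referenced by a pointer, so they need no
    special handling here.
    """
    d = len(alphabet)
    memo = {}

    def target(q):
        if q <= d:
            return alphabet[q - 1]           # alphabet character
        if q not in memo:                    # steps are 1-indexed in pointers
            memo[q] = target(pointers[q - d - 1]) + target(pointers[q - d])
        return memo[q]

    return "".join(target(q) for q in pointers)

# Example 2: lz2_id_decode([1, 2, 3, 3, 5, 3, 2, 8, 9], "ab")
# returns "ababababababbabbbabb"
```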
4. Bounded size dynamic dictionaries
A main issue for implementation purposes is to bound the dictionary size. In
this section, we describe results concerning parallel computing and bounded
size dynamic dictionary compression [11].
Deletion heuristics that discard dictionary elements to make space for the
new substrings have been designed for the LZ2 compression method. Simple
choices for the deletion heuristic which are easily parallelizable are:
· FREEZE: once the dictionary is full, freeze it and do not allow any further
entries to be added.
· RESTART: stop adding further entries when the dictionary is full; when the
compression ratio starts deteriorating clear the dictionary and learn new
strings.
· SWAP: when the primary dictionary first becomes full, start an auxiliary dictionary, but continue compression based on the primary dictionary; when the auxiliary dictionary becomes full, clear the primary dictionary and reverse their roles.
With FREEZE, one processor fills up the dictionary. Then, a parallel algorithm for compression with a static dictionary may be applied to the remaining part of the string (similarly for decompression). With RESTART, starting in parallel from each position of the string, a processor fills up a dictionary. Then, for each dictionary, O(n) processors compute the parsing on the corresponding remaining suffix and check where the compression ratio starts deteriorating. Finally, we build a tree by linking position i to the one where the dictionary computed from i is cleared out. By pointer jumping we obtain the path giving the blocks providing the parsing of the string. For the RESTART heuristic the number of processors is quadratic. It is linear if the dictionary is cleared as soon as it becomes full. Parallel decompression is possible if the points where the dictionary is cleared are marked in the encoded string with extra bits. With SWAP, compute the blocks as in the version of RESTART using a linear number of processors. Then, recompute the parsing on each block with the dictionary computed on the previous block. Decompression is an open problem.
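As an illustration, the following sketch combines the next character heuristic with the simpler RESTART variant that clears the dictionary as soon as it becomes full (the variant admitting a linear-processor parallel algorithm); max_size is a hypothetical parameter bounding the dictionary size, and the alphabet is passed explicitly.

```python
def lz2_restart_encode(s, alphabet, max_size):
    """Next character heuristic with the RESTART deletion policy (sketch
    of the variant clearing the dictionary as soon as it becomes full).
    Each iteration of the outer loop is one block; in the parallel setting
    the blocks can be compressed by independent groups of processors.
    """
    out, i, n = [], 0, len(s)
    while i < n:
        d = {c: k + 1 for k, c in enumerate(alphabet)}   # fresh dictionary
        while i < n:
            j = i + 1
            while j < n and s[i:j + 1] in d:             # greedy match
                j += 1
            out.append(d[s[i:j]])
            if j < n and len(d) < max_size:
                d[s[i:j + 1]] = len(d) + 1               # learn a new element
            i = j
            if len(d) == max_size:                       # full: start new block
                break
    return out

# lz2_restart_encode("aaaaaaaa", "a", 3) -> [1, 2, 1, 2, 1, 1]
```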
The LRU deletion heuristic defines a string as ``used'' if it is a match or a prefix of a match, and replaces the least recently used leaf of the trie representing the dictionary with the new element. The SWAP and RESTART heuristics can be viewed as discrete versions of LRU, since their dictionaries depend only on small segments of the input string, and this is what makes a practical parallel algorithm possible.
The LZ2 compression algorithm with the LRU deletion heuristic on a dictionary of constant size j can be parallelized either in O(log n) time with 2^{O(j log j)} n processors or in 2^{O(j log j)} log n time with O(n) processors. In fact, the number of all the possible dictionaries, with pointers to their elements and the associated information needed for the LRU deletion heuristic, is 2^{O(j log j)}. For each of these dictionaries the greedy match is computed at each position of the input string. Link the pair (p, D), where p is a position of the string and D is one of the dictionaries, to the pair (p + l + 1, D'), where l is the length of the match relative to (p, D) and D' is the updating of D. If the match is at the end of the string, the pair links to a special node v. The parsing and the sequence of pointers are given by the path from (1, ∅) to v, which can be computed by pointer jumping.
The algorithm described is totally impractical because of the huge multiplicative constant either in the running time or in the number of processors. Complexity arguments give evidence that a better algorithm cannot be found. Such a claim exploits a hardness result given in [11]. When the size of the dictionary is O(log^k n), the LZ2 compression algorithm with LRU deletion heuristic is shown to be log-space hard for the class of problems solvable simultaneously in polynomial time and O(log^k n) space (that is, SC^k). Since its sequential complexity is polynomial in time and O(log^k n log log n) in space, the problem belongs to SC^{k+1}. We want to point out that it is not by accident that j figures as an exponent in the parallel complexity of the problem. Since it is believed that SC is not included in NC, the SC^k-hardness would imply the exponentiation of j. Hence, it is unlikely that practical NC algorithms exist. Observe that the P-completeness of the problem, which holds when j is superpolylogarithmic, does not suffice to infer this exponentiation, since j can figure as a multiplicative factor of the time function.
In [11] a relaxed version of LRU is presented. It is more sophisticated than SWAP, since it removes elements in a continuous way like the original LRU, but it relaxes the choice of the element to remove. This relaxation makes the problem complete for SC^k when the dictionary size is O(log^k n).
The relaxed version (RLRU) of the LRU heuristic is the following.
RLRU: When the dictionary is not full, label the i-th element added to the dictionary with the integer ⌈ig/k⌉, where k is the dictionary size and g < k is the maximum number of distinct labels allowed. At the generic i-th step, when the dictionary is full, remove one of the leaves with the smallest label in the trie storing the dictionary and add the new leaf. Let λ be the greatest label among all the dictionary elements. Label the new leaf with λ if ⌈ig/k⌉ = ⌈(i-1)g/k⌉. If ⌈ig/k⌉ > ⌈(i-1)g/k⌉, label the leaf with λ + 1, and if λ + 1 > g decrease by 1 all the labels greater than or equal to 2.
Observe that the only case in which λ is smaller than g is when the dictionary is the set {a^j : 1 ≤ j ≤ k}, where a is a given character of the alphabet (in such a case λ might be equal to g - 1).
The SC^k-completeness result for RLRU suggests that deletion heuristics discarding elements from the dictionary in a continuous way do not have practical parallel algorithms. As for SWAP, decompression is an open problem.
5. Conclusions
In this paper, we addressed dictionary based lossless text compression and gave the state of the art in the field of parallelism. We showed how compression with static and sliding dictionaries can be parallelized efficiently, while dynamic dictionary compression is hardly parallelizable. Although sliding dictionary compression is a special case of adaptive compression, from a computational point of view it resembles the static method rather than the dynamic one. The rationale is that sliding dictionaries are independent of how the string is parsed, similarly to the static ones.
Theoretically, we consider static and sliding dictionary compression parallelizable because it employs polylogarithmic time and a polynomial number of processors, that is, it belongs to NC. In practice, we wish to have parallel algorithms requiring a linear number of processors and logarithmic time. In dictionary based compression it is realistic to assume that the match lengths are logarithmic. Under such an assumption, this goal is reached for greedy parsing with sliding and static dictionaries. Optimal parsing and longest first fragment parsing require more computational resources. It would be interesting to design a different heuristic, in between greedy and optimal rather than longest first fragment, which does not demand more resources.
With dynamic dictionaries, the parsing determines the dictionary and compression becomes inherently sequential. On the other hand, decompression happens to be parallelizable even in the dynamic case. The rationale is that decoding LZ2 compressed text consists of transforming the sequence of pointers into the parsed string, and such a transformation is parallelizable the other way around as well. In other words, it is only the dynamic parsing procedure that makes compression not parallelizable. It is of practical interest that the parallel decoders can be applied optimally to consecutive blocks of encoded data, since the number of processors is limited in practice. This is obvious for static and sliding dictionaries. As far as the next character heuristic is concerned, the procedure copies pointers on the column vector of q_i in the k-th block until a pointer with the target in the dictionary built so far in the decoding process of the former blocks is reached and, consequently, l_i is increased by the length of this target. For the ID heuristic, pointers with the target in the dictionary built in the decoding process of the former blocks would be the leaves of a tree. A slightly different coding was introduced by adding dummy elements to the dictionary in order to make the parallelization possible for the identity heuristic. We can affirm that this modification is negligible in practice, since dummy elements are not frequent on realistic files, especially for large alphabets. Also, it is asymptotically irrelevant for the compression efficiency, since a new element is added at least at every other step and the pointer size increases at most by one bit. An interesting open problem is whether computing arbitrary target lengths for the identity heuristic can be done in logarithmic time on the PRAM EREW, or in sublinear time with optimal parallel work, in a randomized or deterministic way. Another question is whether other parallel decoders can be designed which do not make use of the target lengths.
Bounding the size of dynamic dictionaries decreases the parallel complexity of parsing, since work-space resources are strictly related to parallel time. However, we gave evidence with arguments from complexity theory that practical parallel algorithms do not exist when the deletion heuristic discards an element from the dictionary at each algorithmic step. Practical parallel algorithms exist for heuristics which discard the entire dictionary at the end of a segment. Since the dictionaries depend only on small segments of the input string, an efficient parallelization is possible by applying the sequential algorithm on different segments at the same time. It is worth observing that parallel decompression seems a hard task to pursue not only for LRU but even for SWAP. In fact, even transforming the parsed string into the sequence of pointers appears hardly parallelizable for these heuristics. In conclusion, FREEZE and RESTART seem to be the only suitable LZ2 heuristics for parallel compression and decompression.
References
[1] A. Apostolico, C. Iliopoulos, G.M. Landau, B. Schieber, U. Vishkin, Parallel construction of a suffix tree with applications, Algorithmica 3 (1988) 347–365.
[2] D. Belinskaya, S. De Agostino, J.A. Storer, Near optimal compression with respect to a static dictionary on a practical massively parallel architecture, in: Proceedings of the IEEE Data Compression Conference, 1995, pp. 172–181.
[3] T.C. Bell, J.G. Cleary, I.H. Witten, Text Compression, Prentice-Hall, Englewood Cliffs, NJ, 1990.
[4] M. Cohn, R. Khazan, Parsing with suffix and prefix dictionaries, in: Proceedings of the IEEE Data Compression Conference, 1996, pp. 180–189.
[5] M. Crochemore, W. Rytter, Efficient parallel algorithms to test square-freeness and factorize strings, Inf. Process. Lett. 38 (1991) 57–60.
[6] S. De Agostino, J.A. Storer, Parallel algorithms for optimal compression using dictionaries with the prefix property, in: Proceedings of the IEEE Data Compression Conference, 1992, pp. 52–61.
[7] S. De Agostino, P-complete problems in data compression, Theoret. Comput. Sci. 127 (1994) 181–186.
[8] S. De Agostino, Erratum to P-complete problems in data compression, Theoret. Comput. Sci. 234 (2000) 325–326.
[9] S. De Agostino, A parallel decoding algorithm for LZ2 data compression, Parallel Comput. 21 (1995) 1957–1961.
[10] S. De Agostino, A parallel decoder for LZ2 compression using the ID update heuristic, in: IEEE Proceedings Sequences'97, 1997, pp. 368–373.
[11] S. De Agostino, R. Silvestri, Bounded size dictionary compression: SC^k-completeness and NC algorithms, in: Proceedings STACS'98, LNCS, vol. 1373, 1998, pp. 522–532.
[12] S. De Agostino, Work-optimal parallel decoders for LZ2 data compression, in: Proceedings of the IEEE Data Compression Conference, 2000, pp. 393–399.
[13] S. De Agostino, Speeding up parallel decoding of LZ compressed text on the PRAM EREW, in: Proceedings SPIRE'2000.
[14] M. Farach, S. Muthukrishnan, Optimal parallel dictionary matching and compression, in: Proceedings SPAA'95, 1995, pp. 244–253.
[15] H. Gazit, An optimal randomized parallel algorithm for finding connected components in a graph, in: Proceedings FOCS'86, 1986, pp. 492–501.
[16] A. Hartman, M. Rodeh, Optimal parsing of strings, in: A. Apostolico, Z. Galil (Eds.), Combinatorial Algorithms on Words, Springer, New York, 1985, pp. 155–167.
[17] D.S. Hirschberg, L.M. Stauffer, Dictionary compression on the PRAM, Parallel Process. Lett. 7 (1997) 297–308.
[18] A. Lempel, J. Ziv, On the complexity of finite sequences, IEEE Trans. Inf. Theory 22 (1976) 75–81.
[19] A. Lempel, J. Ziv, A universal algorithm for sequential data compression, IEEE Trans. Inf. Theory 23 (1977) 337–343.
[20] V.S. Miller, M.N. Wegman, Variations on a theme by Ziv and Lempel, in: A. Apostolico, Z. Galil (Eds.), Combinatorial Algorithms on Words, Springer, New York, 1985, pp. 131–140.
[21] H. Nagumo, M. Lu, K. Watson, Parallel algorithms for the static dictionary compression, in: Proceedings of the IEEE Data Compression Conference, 1995, pp. 162–171.
[22] M. Naor, String matching with preprocessing of text and pattern, in: Proceedings ICALP, LNCS, vol. 510, 1991, pp. 739–750.
[23] J.A. Storer, Data Compression: Methods and Theory, Computer Science Press, Rockville, MD, 1988.
[24] R.E. Tarjan, U. Vishkin, An efficient parallel biconnectivity algorithm, SIAM J. Comput. 14 (1985) 862–874.
[25] R.A. Wagner, Common phrases and minimum text storage, Commun. ACM 16 (1973) 148–152.
[26] T.A. Welch, A technique for high-performance data compression, IEEE Computer 17 (1984) 8–19.
[27] J. Ziv, A. Lempel, Compression of individual sequences via variable-rate coding, IEEE Trans. Inf. Theory 24 (1978) 531–536.