Golomb coding
Golomb coding is a lossless data compression method using a family of data compression codes invented by Solomon W. Golomb in the 1960s. Alphabets following a
geometric distribution will have a Golomb code as an optimal prefix code,[1] making Golomb coding highly suitable for situations in which the occurrence of small values in the
input stream is significantly more likely than large values.
Rice coding
Rice coding (invented by Robert F. Rice) denotes using a subset of the family of Golomb codes to produce a simpler (but possibly suboptimal) prefix code. Rice used this set of
codes in an adaptive coding scheme; "Rice coding" can refer either to that adaptive scheme or to using that subset of Golomb codes. Whereas a Golomb code has a tunable
parameter that can be any positive integer value, Rice codes are those in which the tunable parameter is a power of two. This makes Rice codes convenient for use on a computer,
since multiplication and division by 2 can be implemented more efficiently in binary arithmetic.
Rice was motivated to propose this simpler subset because geometric distributions often vary with time, are not precisely known, or both, so selecting the seemingly
optimal code might not be very advantageous.
Rice coding is used as the entropy encoding stage in a number of lossless image compression and audio data compression methods.
Overview
Construction of codes
Golomb coding uses a tunable parameter M to divide an input value x into two parts: q, the result of a division by M, and r, the remainder. The quotient is sent in unary coding,
followed by the remainder in truncated binary encoding. When M = 1, Golomb coding is equivalent to unary coding.
Golomb–Rice codes can be thought of as codes that indicate a number by the position of the bin (q), and the offset within the bin (r). The example figure shows the position q
and offset r for the encoding of integer x using Golomb–Rice parameter M = 3, with source probabilities following a geometric distribution with p(0) = 0.2.
Formally, the two parts are given by the following expression, where x is the nonnegative integer being encoded:

$q = \left\lfloor \frac{x}{M} \right\rfloor$

and

$r = x - qM.$

Both q and r will be encoded using variable numbers of bits: q by a unary code, and r by b bits for Rice code, or a choice
between b and b + 1 bits for Golomb code (i.e. M is not a power of 2), with $b = \lfloor \log_2(M) \rfloor$. If $r < 2^{b+1} - M$, then use b bits
to encode r; otherwise, use b + 1 bits to encode r (as the plain binary representation of $r + 2^{b+1} - M$). Clearly, if M is a power of 2, $b = \log_2(M)$ and we can encode all values of r
with b bits.
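For example, with M = 3 (the parameter used in the example figure), $b = \lfloor \log_2 3 \rfloor = 1$ and $2^{b+1} - M = 1$, so the remainder r = 0 is coded with the single bit 0, while r = 1 and r = 2 are coded with the two bits 10 and 11 (the binary representations of r + 1). The value x = 4 then splits into q = 1 and r = 1 and is coded as 10 followed by 10, i.e. 1010.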
The integer x treated by Golomb was the run length of a Bernoulli process, which has a geometric distribution starting at 0.
The best choice of parameter M is a function of the corresponding Bernoulli process, which is parameterized by $p$,
the probability of success in a given Bernoulli trial. M is either the median of the distribution or the median ±1.
It can be determined by these inequalities:

$(1-p)^M + (1-p)^{M+1} \leq 1 < (1-p)^{M-1} + (1-p)^M,$

which are solved by

$M = \left\lceil \frac{-\log(2-p)}{\log(1-p)} \right\rceil.$

For the example with p(0) = 0.2:

$M = \left\lceil \frac{-\log(1.8)}{\log(0.8)} \right\rceil = \lceil 2.634 \rceil = 3.$

[Figure: Golomb coding example for a source x with geometric distribution, with parameter p(0) = 0.2, using Golomb code with M = 3. The 2-bit code 00 is used 20% of the time; the 3-bit codes 010, 011, and 100 are used over 38% of the time; 4 bits or more are needed in a minority of cases. For this source, entropy = 3.610 bits; for this code with this source, rate = 3.639 bits; therefore redundancy = 0.030 bits, or efficiency = 0.992 bits per bit.]
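A minimal Python sketch of this parameter choice (the helper name golomb_parameter is illustrative, not standard):

import math

def golomb_parameter(p0: float) -> int:
    """Return the Golomb parameter M for a geometric source with P(x = 0) = p0."""
    # M = ceil(-log(2 - p) / log(1 - p)), from the inequalities above
    return math.ceil(-math.log(2 - p0) / math.log(1 - p0))

print(golomb_parameter(0.2))  # 3, matching the example figure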
The Golomb code for this distribution is equivalent to the Huffman code for the same probabilities, if it were possible to compute the Huffman code for the infinite set of source
values.
Golomb's scheme was designed to encode sequences of non-negative numbers. However, it is easily extended to accept sequences containing negative numbers using an overlap
and interleave scheme, in which all values are reassigned to some positive number in a unique and reversible way. The sequence begins: 0, −1, 1, −2, 2, −3, 3, −4, 4, ... The n-th
negative value (i.e., $-n$) is mapped to the n-th odd number ($2n - 1$), and the m-th positive value is mapped to the m-th even number ($2m$). This may be expressed mathematically
as follows: a positive value x is mapped to $x' = 2|x| = 2x$ ($x \ge 0$), and a negative value y is mapped to $y' = 2|y| - 1 = -2y - 1$ ($y < 0$). Such a code may be used for simplicity, even if suboptimal. Truly optimal codes for two-
sided geometric distributions include multiple variants of the Golomb code, depending on the distribution parameters,
including this one.[2]
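A minimal Python sketch of this overlap-and-interleave mapping and its inverse (the function names zigzag and unzigzag are illustrative):

def zigzag(n: int) -> int:
    """Map a signed integer to a non-negative integer: 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ..."""
    return 2 * n if n >= 0 else -2 * n - 1

def unzigzag(z: int) -> int:
    """Inverse mapping back to the signed integer."""
    return z // 2 if z % 2 == 0 else -(z + 1) // 2

# The sequence from the text: 0, -1, 1, -2, 2 maps to 0, 1, 2, 3, 4
assert [zigzag(v) for v in (0, -1, 1, -2, 2)] == [0, 1, 2, 3, 4]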
Simple algorithm
Below is the Rice–Golomb encoding, where the remainder code uses simple truncated binary encoding, also named "Rice
coding" (other variable-length binary encodings, like arithmetic or Huffman encodings, are possible for the remainder codes,
if the statistical distribution of remainder codes is not flat, and notably when not all possible remainders after the division are
used). In this algorithm, if the M parameter is a power of 2, it becomes equivalent to the simpler Rice encoding:

Encoding:
1. Fix the parameter M to an integer value.
2. For N, the number to be encoded, find
   1. quotient $q = \lfloor N/M \rfloor$
   2. remainder $r = N \bmod M$
3. Generate the codeword in the format <quotient code><remainder code>, where
   1. the quotient code is q in unary coding: a run of q one-bits followed by a zero bit, and
   2. the remainder code is r in truncated binary encoding: with $b = \lfloor \log_2(M) \rfloor$, write r in plain binary using b bits if $r < 2^{b+1} - M$; otherwise write $r + 2^{b+1} - M$ in plain binary using b + 1 bits.
Decoding:
1. Decode the unary representation of q (count the number of 1s at the beginning of the code)
2. Skip the 0 delimiter
3. Let $b = \lfloor \log_2(M) \rfloor$
   1. Interpret the next b bits as a binary number r′. If $r' < 2^{b+1} - M$ holds, then the remainder $r = r'$.
   2. Otherwise interpret b + 1 bits as a binary number r′; the remainder is $r = r' - 2^{b+1} + M$.
4. Compute $N = q \times M + r$.
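The procedure above can be written compactly; the following is a minimal Python sketch (the function names golomb_encode and golomb_decode are illustrative, and codewords are plain strings of '0'/'1' characters rather than packed bits):

def golomb_encode(n: int, m: int) -> str:
    """Encode a non-negative integer n with Golomb parameter m as a bit string."""
    q, r = divmod(n, m)
    b = m.bit_length() - 1           # b = floor(log2(m))
    cutoff = (1 << (b + 1)) - m      # 2^(b+1) - m
    code = '1' * q + '0'             # quotient in unary
    if r < cutoff:
        code += format(r, f'0{b}b') if b > 0 else ''   # b bits
    else:
        code += format(r + cutoff, f'0{b + 1}b')       # b + 1 bits
    return code

def golomb_decode(bits: str, m: int) -> int:
    """Decode a single Golomb codeword from the front of a bit string."""
    q = 0
    while bits[q] == '1':            # unary part
        q += 1
    pos = q + 1                      # skip the 0 delimiter
    b = m.bit_length() - 1
    cutoff = (1 << (b + 1)) - m
    r = int(bits[pos:pos + b], 2) if b > 0 else 0
    if r >= cutoff:                  # remainder needs one more bit
        r = int(bits[pos:pos + b + 1], 2) - cutoff
    return q * m + r

When m is a power of 2, cutoff equals m, so the remainder is always written with exactly b bits and the code reduces to the simpler Rice code.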
Example
Set M = 10. Thus $b = \lfloor \log_2(10) \rfloor = 3$. The cutoff is $2^{b+1} - M = 16 - 10 = 6$.
Encoding of quotient part

  q    output bits
  0    0
  1    10
  2    110
  3    1110
  ...  ...
  N    <N ones>0

Encoding of remainder part

  r    offset    binary    output bits
  0    0         0000      000
  1    1         0001      001
  2    2         0010      010
  3    3         0011      011
  4    4         0100      100
  5    5         0101      101
  6    12        1100      1100
  7    13        1101      1101
  8    14        1110      1110
  9    15        1111      1111
For example, with a Rice–Golomb encoding using parameter M = 10, the decimal number 42 would first be split into q = 4 and r = 2, and would be encoded as
qcode(q),rcode(r) = qcode(4),rcode(2) = 11110,010 (you don't need to encode the separating comma in the output stream, because the 0 at the end of the q code is enough to say
when q ends and r begins; both the qcode and rcode are self-delimited).
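Using the sketch from the previous section, this worked example can be reproduced directly:

print(golomb_encode(42, 10))          # '11110010' (11110 for q = 4, 010 for r = 2)
print(golomb_decode('11110010', 10))  # 42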
Use with run-length encoding

Given an alphabet of two symbols, or a set of two events, P and Q, with probabilities p and (1 − p) respectively, where p ≥ 1/2, Golomb coding can be used to encode runs of zero or
more P′s separated by single Q′s. In this application, the best setting of the parameter M is the nearest integer to $\frac{-1}{\log_2 p}$. When p = 1/2, M = 1, and the Golomb code
corresponds to unary (n ≥ 0 P′s followed by a Q is encoded as n ones followed by a zero). If a simpler code is desired, one can assign Golomb–Rice parameter b (i.e., Golomb
parameter $M = 2^b$) to the integer nearest to $-\log_2(-\log_2 p)$; although not always the best parameter, it is usually the best Rice parameter and its compression performance is
quite close to the optimal Golomb code. (Rice himself proposed using various codes for the same data to figure out which was best. A later JPL researcher proposed various
methods of optimizing or estimating the code parameter.[3])
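A minimal Python sketch of these two parameter choices (the function names are illustrative):

import math

def best_golomb_m(p: float) -> int:
    """Nearest-integer Golomb parameter M = -1 / log2(p) for runs of P with probability p."""
    return round(-1.0 / math.log2(p))

def best_rice_b(p: float) -> int:
    """Nearest-integer Rice parameter b = -log2(-log2(p))."""
    return round(-math.log2(-math.log2(p)))

print(best_golomb_m(0.99), best_rice_b(0.99))  # 69 6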
Consider using a Rice code with a binary portion having b bits to run-length encode sequences where P has a probability p. A run of k P′s and one Q contains k + 1 input bits and is
coded in $\lfloor k/2^b \rfloor + b + 1$ output bits ($\lfloor k/2^b \rfloor + 1$ unary bits plus b remainder bits). If $(k+1)(1-p)^2 p^k$ is the probability that a bit
will be part of a k-bit run (k P′s and one Q) and $\frac{\lfloor k/2^b \rfloor + b + 1}{k+1}$ is the compression ratio of that run, then the expected compression ratio is

$\mathbb{E}[\text{compression ratio}] = \sum_{k=0}^{\infty} \frac{\lfloor k/2^b \rfloor + b + 1}{k+1}\,(k+1)(1-p)^2 p^k = (1-p)(b+1) + \frac{(1-p)\,p^{2^b}}{1-p^{2^b}}.$

Compression is often expressed in terms of $1 - \mathbb{E}[\text{compression ratio}]$, the proportion compressed. For p close to 1, the run-length coding approach results in compression ratios
close to entropy. For example, using Rice code b = 6 for p = 0.99 yields 91.89% compression, while the entropy limit is 91.92%.
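The closed form above is easy to evaluate numerically; a small Python sketch (the function name is illustrative):

def expected_compression_ratio(p: float, b: int) -> float:
    """Expected compressed size per input bit for Rice parameter b on runs of P with probability p."""
    m = 1 << b  # M = 2^b
    return (1 - p) * (b + 1) + (1 - p) * p**m / (1 - p**m)

r = expected_compression_ratio(0.99, 6)
print(f"{1 - r:.2%}")  # 91.89%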
An alternative approach to efficiently encode integer data whose PDF is not known, or is varying, is to use a backwards-adaptive encoder. The run-length Golomb–Rice (RLGR)
code (https://www.researchgate.net/publication/4230021_Adaptive_run-lengthGolomb-Rice_encoding_of_quantized_generalized_Gaussian_sources_with_unknown_statist
ics) achieves that using a very simple algorithm that adjusts the Golomb–Rice parameter up or down, depending on the last encoded symbol. A decoder can follow the same rule
to track the variation of the encoding parameters, so no side information needs to be transmitted, just the encoded data. Assuming a generalized Gaussian PDF, which covers a
wide range of statistics seen in data such as prediction errors or transform coefficients in multimedia codecs, the RLGR encoding algorithm can perform very well in such
applications.
Applications
Numerous signal codecs use a Rice code for prediction residues. In predictive algorithms, such residues tend to fall into a two-
sided geometric distribution, with small residues being more frequent than large residues, and the Rice code closely
approximates the Huffman code for such a distribution without the overhead of having to transmit the Huffman table. One
signal that does not match a geometric distribution is a sine wave, because the differential residues form a sinusoidal signal
whose values do not follow a geometric distribution (the highest and lowest residue values occur with similarly high
frequency; only the median positive and negative residues occur less often).
Several lossless audio codecs, such as Shorten,[4] FLAC,[5] Apple Lossless, and MPEG-4 ALS, use a Rice code after the linear
prediction step (called "adaptive FIR filter" in Apple Lossless). Rice coding is also used in the FELICS lossless image codec.
The Golomb–Rice coder is used in the entropy coding stage of Rice-algorithm-based lossless image codecs.
See also
Elias delta coding
Variable-length code
References
1. Gallager, R. G.; van Voorhis, D. C. (1975). "Optimal source codes for geometrically distributed integer alphabets". IEEE Transactions on Information Theory. 21 (2): 228–
230. doi:10.1109/tit.1975.1055357 (https://doi.org/10.1109%2Ftit.1975.1055357).
2. Merhav, N.; Seroussi, G.; Weinberger, M. J. (2000). "Coding of sources with two-sided geometric distributions and unknown parameters". IEEE Transactions on Information
Theory. 46 (1): 229–236. doi:10.1109/18.817520 (https://doi.org/10.1109%2F18.817520).
3. Kiely, A. (2004). Selecting the Golomb Parameter in Rice Coding (Technical report). Jet Propulsion Laboratory. 42-159.
4. "man shorten" (https://web.archive.org/web/20140130053525/http://www.etree.org/shnutils/shorten/support/doc/shorten.txt). Archived from the original (http://www.etree.org/
shnutils/shorten/support/doc/shorten.txt) on 2014-01-30. Retrieved 2008-12-07.
5. "FLAC - Format overview" (https://xiph.org/flac/documentation_format_overview.html). xiph.org.
Further reading
Golomb, Solomon W. (1966). Run-length encodings. IEEE Transactions on Information Theory, IT-12 (3): 399–401 (http://urchin.earth.li/~twic/Golombs_Original_Paper/)
Rice, Robert F.; Plaunt, R. (1971). "Adaptive Variable-Length Coding for Efficient Compression of Spacecraft Television Data". IEEE Transactions on Communications. 16
(9): 889–897. doi:10.1109/TCOM.1971.1090789 (https://doi.org/10.1109%2FTCOM.1971.1090789).
Robert F. Rice (1979), "Some Practical Universal Noiseless Coding Techniques (https://ntrs.nasa.gov/search.jsp?R=19790014634)", Jet Propulsion Laboratory, Pasadena,
California, JPL Publication 79—22, March 1979.
Witten, Ian; Moffat, Alistair; Bell, Timothy. Managing Gigabytes: Compressing and Indexing Documents and Images. Second Edition. Morgan Kaufmann Publishers, San
Francisco, CA, 1999. ISBN 1-55860-570-3
David Salomon. Data Compression. ISBN 0-387-95045-1.
H. S. Malvar, Adaptive run-length/Golomb–Rice encoding of quantized generalized Gaussian sources with unknown statistics (https://www.researchgate.net/publication/4230
021_Adaptive_run-lengthGolomb-Rice_encoding_of_quantized_generalized_Gaussian_sources_with_unknown_statistics,), Proc. Data Compression Conference, 2006.
RLGR Entropy Encoding (https://msdn.microsoft.com/en-us/library/ff635165.aspx), Microsoft MS-RDPRFX Open Specification, RemoteFX codec for Remote Desktop
Protocol.
S. Büttcher, C. L. A. Clarke, and G. V. Cormack. Information Retrieval: Implementing and Evaluating Search Engines (http://www.ir.uwaterloo.ca/book/) Archived (https://web.
archive.org/web/20201005195805/http://www.ir.uwaterloo.ca/book/) 2020-10-05 at the Wayback Machine. MIT Press, Cambridge MA, 2010.