In information theory, an entropy coding (or entropy encoding) is any lossless data compression method that attempts to approach the lower bound declared by Shannon's source coding theorem, which states that any lossless data compression method must have an expected code length greater than or equal to the entropy of the source.[1]
More precisely, the source coding theorem states that for any source distribution, the expected code length satisfies $\operatorname{E}_{x\sim P}[\ell(d(x))] \geq \operatorname{E}_{x\sim P}[-\log_b(P(x))]$, where $\ell$ is the number of symbols in a code word, $d$ is the coding function, $b$ is the number of symbols used to make output codes, and $P$ is the probability of the source symbol. An entropy coding attempts to approach this lower bound.
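For a concrete sense of the bound, it can be evaluated directly for a small source. The following sketch (plain Python, with an illustrative distribution chosen here rather than taken from the article) computes the right-hand side of the inequality for binary output codes ($b = 2$), i.e. the minimum achievable expected code length in bits per symbol:

```python
import math

def shannon_lower_bound(probabilities, b=2):
    """Expected value of -log_b P(x): the minimum achievable
    expected code length per symbol, in base-b output symbols."""
    return sum(-p * math.log(p, b) for p in probabilities if p > 0)

# Illustrative source distribution (assumed for this example).
source = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
print(shannon_lower_bound(source.values()))  # 1.75 bits per symbol
```

No lossless code for this source can achieve an expected length below 1.75 bits per symbol; an entropy coder tries to get as close to that figure as possible.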
Two of the most common entropy coding techniques are Huffman coding and arithmetic coding.[2] If the approximate entropy characteristics of a data stream are known in advance (especially for signal compression), a simpler static code may be useful. These static codes include universal codes (such as Elias gamma coding or Fibonacci coding) and Golomb codes (such as unary coding or Rice coding).
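As a minimal sketch of the first of these techniques, the fragment below builds Huffman code lengths with the standard greedy construction (repeatedly merging the two least frequent subtrees via a priority queue). The input frequencies are assumed for illustration and are not part of the article:

```python
import heapq
from collections import Counter

def huffman_code_lengths(frequencies):
    """Return a {symbol: code length} map built by repeatedly
    merging the two least frequent subtrees (Huffman's algorithm)."""
    # Each heap entry: (total frequency, tie-breaker, {symbol: depth}).
    heap = [(f, i, {sym: 0}) for i, (sym, f) in enumerate(frequencies.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

# Illustrative input (assumed, not from the article).
freqs = Counter("abracadabra")
print(huffman_code_lengths(freqs))  # e.g. {'a': 1, 'b': 3, 'r': 3, 'c': 3, 'd': 3}
```

The resulting code lengths approach, but in general do not reach, the entropy of the source, because each Huffman code word must contain a whole number of output symbols; arithmetic coding removes that restriction at a higher processing cost.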
Since 2014, data compressors have started using the asymmetric numeral systems family of entropy coding techniques, which allows combining the compression ratio of arithmetic coding with a processing cost similar to that of Huffman coding.