Entropy coding methods
Asymmetric numeral systems (ANS )[ 1] [ 2] is a family of entropy encoding methods introduced by Jarosław (Jarek) Duda [ 3] from Jagiellonian University , used in data compression since 2014[ 4] due to improved performance compared to previous methods.[ 5] ANS combines the compression ratio of arithmetic coding (which uses a nearly accurate probability distribution ), with a processing cost similar to that of Huffman coding . In the tabled ANS (tANS) variant, this is achieved by constructing a finite-state machine to operate on a large alphabet without using multiplication.
Among others, ANS is used in the Facebook Zstandard compressor[ 6] [ 7] (also used e.g. in Linux kernel,[ 8] Google Chrome browser,[ 9] Android [ 10] operating system, was published as RFC 8478 for MIME [ 11] and HTTP [ 12] ), Apple LZFSE compressor,[ 13] Google Draco 3D compressor[ 14] (used e.g. in Pixar Universal Scene Description format[ 15] ) and PIK image compressor,[ 16] CRAM DNA compressor[ 17] from SAMtools utilities,[ 18]
NVIDIA nvCOMP high speed compression library,[ 19]
Dropbox DivANS compressor,[ 20] Microsoft DirectStorage BCPack texture compressor,[ 21] and JPEG XL [ 22] image compressor.
The basic idea is to encode information into a single natural number
x
{\displaystyle x}
. In the standard binary number system, we can add a bit
s
∈
{
0
,
1
}
{\displaystyle s\in \{0,1\}}
of information to
x
{\displaystyle x}
by appending
s
{\displaystyle s}
at the end of
x
{\displaystyle x}
, which gives us
x
′
=
2
x
+
s
{\displaystyle x'=2x+s}
. For an entropy coder, this is optimal if
Pr
(
0
)
=
Pr
(
1
)
=
1
/
2
{\displaystyle \Pr(0)=\Pr(1)=1/2}
. ANS generalizes this process for arbitrary sets of symbols
s
∈
S
{\displaystyle s\in S}
with an accompanying probability distribution
(
p
s
)
s
∈
S
{\displaystyle (p_{s})_{s\in S}}
. In ANS, if the information from
s
{\displaystyle s}
is appended to
x
{\displaystyle x}
to result in
x
′
{\displaystyle x'}
, then
x
′
≈
x
⋅
p
s
−
1
{\displaystyle x'\approx x\cdot p_{s}^{-1}}
. Equivalently,
log
2
(
x
′
)
≈
log
2
(
x
)
+
log
2
(
1
/
p
s
)
{\displaystyle \log _{2}(x')\approx \log _{2}(x)+\log _{2}(1/p_{s})}
, where
log
2
(
x
)
{\displaystyle \log _{2}(x)}
is the number of bits of information stored in the number
x
{\displaystyle x}
, and
log
2
(
1
/
p
s
)
{\displaystyle \log _{2}(1/p_{s})}
is the number of bits contained in the symbol
s
{\displaystyle s}
.
For the encoding rule, the set of natural numbers is split into disjoint subsets corresponding to different symbols – like into even and odd numbers, but with densities corresponding to the probability distribution of the symbols to encode. Then to add information from symbol
s
{\displaystyle s}
into the information already stored in the current number
x
{\displaystyle x}
, we go to number
x
′
=
C
(
x
,
s
)
≈
x
/
p
{\displaystyle x'=C(x,s)\approx x/p}
being the position of the
x
{\displaystyle x}
-th appearance from the
s
{\displaystyle s}
-th subset.
There are alternative ways to apply it in practice – direct mathematical formulas for encoding and decoding steps (uABS and rANS variants), or one can put the entire behavior into a table (tANS variant). Renormalization is used to prevent
x
{\displaystyle x}
going to infinity – transferring accumulated bits to or from the bitstream.
^ J. Duda, K. Tahboub, N. J. Gadil, E. J. Delp, The use of asymmetric numeral systems as an accurate replacement for Huffman coding , Picture Coding Symposium, 2015.
^ J. Duda, Asymmetric numeral systems: entropy coding combining speed of Huffman coding with compression rate of arithmetic coding , arXiv:1311.2540, 2013.
^ "Dr Jarosław Duda (Jarek Duda)" . Institute of Theoretical Physics . Jagiellonian University in Krakow. Retrieved 2021-08-02 .
^ Duda, Jarek (October 6, 2019). "List of compressors using ANS, implementations and other materials" . Retrieved October 6, 2019 .
^ "Google Accused of Trying to Patent Public Domain Technology" . Bleeping Computer . September 11, 2017.
^ Smaller and faster data compression with Zstandard , Facebook, August 2016.
^ 5 ways Facebook improved compression at scale with Zstandard , Facebook, December 2018.
^ Zstd Compression For Btrfs & Squashfs Set For Linux 4.14, Already Used Within Facebook , Phoronix, September 2017.
^ New in Chrome 123 (Content-Encoding) , Google, March 2024.
^ "Zstd in Android P release" . Archived from the original on 2020-08-26. Retrieved 2019-05-29 .
^ Zstandard Compression and The application/zstd Media Type (email standard) .
^ Hypertext Transfer Protocol (HTTP) Parameters , IANA .
^ Apple Open-Sources its New Compression Algorithm LZFSE , InfoQ, July 2016.
^ Google Draco 3D compression library .
^ Google and Pixar add Draco Compression to Universal Scene Description (USD) Format .
^ Google PIK: new lossy image format for the internet .
^ CRAM format specification (version 3.0) .
^ Chen W, Elliott LT (2021). "Compression for population genetic data through finite-state entropy" . J Bioinform Comput Biol . 19 (5): 2150026. doi :10.1142/S0219720021500268 . PMID 34590992 .
^ High Speed Data Compression Using NVIDIA GPUs .
^ Building better compression together with DivANS .
^ Microsoft DirectStorage overview .
^ Rhatushnyak, Alexander; Wassenberg, Jan; Sneyers, Jon; Alakuijala, Jyrki; Vandevenne, Lode; Versari, Luca; Obryk, Robert; Szabadka, Zoltan; Kliuchnikov, Evgenii; Comsa, Iulia-Maria; Potempa, Krzysztof; Bruse, Martin; Firsching, Moritz; Khasanova, Renata; Ruud van Asseldonk; Boukortt, Sami; Gomez, Sebastian; Fischbacher, Thomas (2019). "Committee Draft of JPEG XL Image Coding System". arXiv :1908.03565 [eess.IV ].