Original author(s) | Julian Seward |
---|---|
Developer(s) | Mark Wielaard, Federico Mena, Micah Snyder |
Initial release | 18 July 1996[1] |
Stable release | 1.0.8 / 13 July 2019 |
Repository | https://gitlab.com/bzip2/bzip2/ |
Operating system | Cross-platform |
Type | Data compression |
License | Modified zlib license[2] |
Website | sourceware |
Filename extension | .bz2 |
---|---|
Internet media type | application/x-bzip2 |
Type code | Bzp2 |
Uniform Type Identifier (UTI) | public.bzip2-archive[3] |
Magic number | BZh |
Developed by | Julian Seward |
Type of format | Data compression |
Open format? | Yes |
bzip2 is a free and open-source file compression program that uses the Burrows–Wheeler algorithm. It compresses only single files and is not a file archiver; it relies on external utilities such as tar for tasks like handling multiple files, encryption, and archive splitting.
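In practice this means bzip2 is usually paired with an archiver. As a minimal illustration (using Python's standard-library tarfile module rather than the bzip2 command-line tool; the directory name is a placeholder), a bzip2-compressed tar archive can be created in one step:

```python
import tarfile

# "w:bz2" opens a tar archive for writing with bzip2 compression.
# "example_dir" is a hypothetical directory used only for illustration.
with tarfile.open("example_dir.tar.bz2", "w:bz2") as archive:
    archive.add("example_dir")
```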
bzip2 was initially released in 1996 by Julian Seward. It compresses most files more effectively than the older LZW and Deflate compression algorithms, but is slower; it is particularly efficient for text data. The compressor stacks several techniques: run-length encoding (RLE), the Burrows–Wheeler transform (BWT), the move-to-front transform (MTF), and Huffman coding. Data is compressed in independent blocks of between 100 and 900 kB; within each block, the Burrows–Wheeler transform converts frequently recurring character sequences into strings of identical letters, which the move-to-front transform and Huffman coding then encode compactly. Performance is asymmetric: decompression is notably faster than compression.
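The effect of the middle transforms can be seen on a toy input. The following is a naive illustrative sketch of the BWT and MTF stages in Python, not bzip2's actual implementation (which works on bytes, stores a rotation index rather than using a sentinel, and adds run-length coding around these stages):

```python
def bwt(s: str) -> tuple[str, int]:
    # Sort all rotations of s; the transform is the last column plus the
    # position of the original string among the sorted rotations.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations), rotations.index(s)

def mtf(s: str) -> list[int]:
    # Emit each symbol's current position, then move it to the front,
    # so runs of identical symbols become runs of zeros.
    alphabet = sorted(set(s))
    out = []
    for ch in s:
        i = alphabet.index(ch)
        out.append(i)
        alphabet.insert(0, alphabet.pop(i))
    return out

transformed, index = bwt("bananabanana")
print(transformed)       # "nnnnbbaaaaaa": recurring sequences become runs
print(mtf(transformed))  # [2, 0, 0, 0, 2, 0, 2, 0, 0, 0, 0, 0]: mostly zeros
```

The long runs of identical letters after the BWT, and the stream of small integers after the MTF, are exactly the kind of skewed symbol distribution that the final Huffman stage compresses well.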
The program has passed through multiple maintainers since its initial release, with Micah Snyder serving as maintainer since June 2021. Derivative implementations also exist, such as pbzip2, which uses multi-threading to improve compression speed on multi-CPU and multi-core computers.
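pbzip2 itself is a separate C program, but the underlying idea, splitting the input into blocks and compressing them on independent workers, can be sketched with Python's standard bz2 and concurrent.futures modules (the 900 kB block size mirrors bzip2's largest block; the function name is hypothetical):

```python
import bz2
from concurrent.futures import ProcessPoolExecutor

def parallel_bz2(data: bytes, block_size: int = 900_000) -> bytes:
    # Compress fixed-size chunks independently and concatenate the
    # resulting bzip2 streams; standard bunzip2 accepts such concatenations.
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    with ProcessPoolExecutor() as pool:
        return b"".join(pool.map(bz2.compress, blocks))

if __name__ == "__main__":
    payload = b"example data " * 500_000
    compressed = parallel_bz2(payload)
    assert bz2.decompress(compressed) == payload  # handles multiple streams
```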
bzip2 is well suited to big data applications that use cluster-computing frameworks such as Hadoop and Apache Spark, because each compressed block can be decompressed independently, without processing earlier blocks.
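Inside a single .bz2 file the blocks are bit-packed, so frameworks such as Hadoop locate block boundaries themselves; the independence property is easiest to demonstrate in Python with concatenated streams (the layout pbzip2 produces), which behave the same way:

```python
import bz2

first = bz2.compress(b"records 0-999")
second = bz2.compress(b"records 1000-1999")
combined = first + second

# The second stream decompresses on its own, without reading the first;
# this is what lets a cluster assign different splits to different workers.
assert bz2.decompress(combined[len(first):]) == b"records 1000-1999"
assert bz2.decompress(combined) == b"records 0-999records 1000-1999"
```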