SAMtools

Original author(s): Heng Li
Developer(s): John Marshall, Petr Danecek, et al.[1]
Initial release: 2009
Stable release: 1.20 / April 15, 2024[2]
Repository
Written in: C
Operating system: Unix-like
Type: Bioinformatics
License: BSD, MIT
Website: www.htslib.org

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM (Sequence Alignment/Map), BAM (Binary Alignment/Map) and CRAM formats, written by Heng Li. These files are generated as output by short read aligners such as BWA. Both simple and advanced tools are provided, supporting complex tasks such as variant calling and alignment viewing as well as sorting, indexing, data extraction and format conversion.[3]

SAM files are human-readable text files and can be very large (tens of gigabytes is common), so compression is used to save space. BAM files are the binary equivalent of SAM files, whilst CRAM files use a restructured, column-oriented binary container format. BAM files are typically compressed and are more efficient for software to work with than SAM files. SAMtools makes it possible to work directly with a compressed BAM file without having to uncompress the whole file. Additionally, because the SAM/BAM format is somewhat complex (it contains reads, references, alignments, quality information, and user-specified annotations), SAMtools reduces the effort needed to use SAM/BAM files by hiding low-level details.
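To illustrate the kind of low-level detail involved, the following minimal Python sketch parses one SAM alignment line by hand. It assumes the layout given in the SAM specification: eleven tab-separated mandatory columns followed by optional `TAG:TYPE:VALUE` annotations. The `SamRecord` class, the `parse_sam_line` function, and the example read are hypothetical names and data invented for this illustration; they are not part of SAMtools or HTSlib, and the sketch handles neither header lines nor every optional-field type.

```python
from dataclasses import dataclass, field

# The 11 mandatory SAM columns, in order (per the SAM specification).
MANDATORY = ("qname", "flag", "rname", "pos", "mapq", "cigar",
             "rnext", "pnext", "tlen", "seq", "qual")
INT_COLUMNS = {"flag", "pos", "mapq", "pnext", "tlen"}


@dataclass
class SamRecord:
    """Hypothetical container for one alignment line (illustration only)."""
    qname: str   # read name
    flag: int    # bitwise flags describing the alignment
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment description
    rnext: str   # reference name of the mate ("=" means same as rname)
    pnext: int   # position of the mate
    tlen: int    # observed template length
    seq: str     # read bases
    qual: str    # per-base qualities, ASCII-encoded
    tags: dict = field(default_factory=dict)  # optional annotations

    @property
    def is_reverse(self) -> bool:
        # Flag bit 0x10 marks a read aligned to the reverse strand.
        return bool(self.flag & 0x10)


def parse_sam_line(line: str) -> SamRecord:
    """Parse a single non-header SAM line into a SamRecord."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < len(MANDATORY):
        raise ValueError("a SAM alignment line needs 11 mandatory columns")
    values = {
        name: int(raw) if name in INT_COLUMNS else raw
        for name, raw in zip(MANDATORY, cols)
    }
    tags = {}
    for item in cols[len(MANDATORY):]:
        tag, typ, value = item.split(":", 2)
        # Only the integer type is converted here; other types stay as text.
        tags[tag] = int(value) if typ == "i" else value
    return SamRecord(tags=tags, **values)


# Made-up example read, not taken from any real dataset.
EXAMPLE_LINE = ("read1\t99\tchr1\t100\t60\t8M\t=\t150\t58\t"
                "ACGTACGT\tIIIIIIII\tNM:i:0\tRG:Z:sample1")

record = parse_sam_line(EXAMPLE_LINE)
print(record.qname, record.rname, record.pos, record.is_reverse, record.tags)
```

In practice such parsing is usually left to SAMtools or to a library built on HTSlib, which also cover the binary BAM and CRAM encodings that this text-only sketch does not.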

Third-party projects were trying to use code from SAMtools even though it was not designed to be embedded in that way. In August 2014 the decision was therefore taken to split the SAMtools package into three parts: a stand-alone software library with a well-defined API (HTSlib),[4] a project for variant calling and manipulation of variant data (BCFtools), and the stand-alone SAMtools package for working with sequence alignment data.[5]

  1. ^ "SAM tools". SourceForge.
  2. ^ "Releases · samtools/samtools". github.com. Retrieved 2024-07-29.
  3. ^ Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. (August 2009). "The Sequence Alignment/Map format and SAMtools" (PDF). Bioinformatics. 25 (16): 2078–9. doi:10.1093/bioinformatics/btp352. PMC 2723002. PMID 19505943.
  4. ^ Bonfield JK, Marshall J, Danecek P, Li H, Ohan V, Whitwham A, et al. (February 2021). "HTSlib: C library for reading/writing high-throughput sequencing data". GigaScience. 10 (2). doi:10.1093/gigascience/giab007. PMC 7931820. PMID 33594436.
  5. ^ Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, et al. (February 2021). "Twelve years of SAMtools and BCFtools". GigaScience. 10 (2). doi:10.1093/gigascience/giab008. PMC 7931819. PMID 33590861.