SAM (file format)

SAM file format
Filename extension: .sam
Developed by: Heng Li, Bob Handsaker, et al.
Type of format: Bioinformatics
Extended from: Tab-separated values
Website: samtools.github.io/hts-specs/

Sequence Alignment Map (SAM) is a text-based format, developed by Heng Li, Bob Handsaker, et al.,[1] that was originally designed for storing biological sequences aligned to a reference sequence. It was created when the 1000 Genomes Project decided to move away from the MAQ mapper format and design a new one. The overall TAB-delimited flavour of the format came from an earlier format inspired by BLAT’s PSL. The name SAM came from Gabor Marth of the University of Utah, who originally had a format of the same name but with a different syntax, closer to BLAST output.[2] SAM is widely used for storing data, such as nucleotide sequences, generated by next-generation sequencing technologies, and the standard has been broadened to include unmapped sequences. The format supports short and long reads (up to 128 Mbp[3]) produced by different sequencing platforms. It is used to hold mapped data within the Genome Analysis Toolkit (GATK) and is used across the Broad Institute, the Wellcome Sanger Institute, and the 1000 Genomes Project.
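Because SAM is TAB-delimited text, a single alignment line can be split into its fields with ordinary string handling. The following is a minimal sketch, not a reference implementation: the field names follow the eleven mandatory columns of the SAM specification, while the `SamRecord` container, the `parse_alignment_line` and `is_unmapped` helpers, and the sample read are hypothetical names and data introduced here for illustration. Header lines (those starting with `@`) are not handled.

```python
from typing import NamedTuple, Tuple


class SamRecord(NamedTuple):
    """Hypothetical container for one SAM alignment line."""
    qname: str   # query (read) name
    flag: int    # bitwise FLAG
    rname: str   # reference sequence name, "*" if unavailable
    pos: int     # 1-based leftmost mapping position, 0 if unavailable
    mapq: int    # mapping quality
    cigar: str   # CIGAR string, "*" if unavailable
    rnext: str   # reference name of the mate/next read
    pnext: int   # position of the mate/next read
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # ASCII-encoded base qualities
    tags: Tuple[str, ...]  # optional TAG:TYPE:VALUE fields, left unparsed


def parse_alignment_line(line: str) -> SamRecord:
    """Split one TAB-delimited alignment line into typed fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError(f"expected at least 11 fields, got {len(fields)}")
    return SamRecord(
        qname=fields[0],
        flag=int(fields[1]),
        rname=fields[2],
        pos=int(fields[3]),
        mapq=int(fields[4]),
        cigar=fields[5],
        rnext=fields[6],
        pnext=int(fields[7]),
        tlen=int(fields[8]),
        seq=fields[9],
        qual=fields[10],
        tags=tuple(fields[11:]),
    )


def is_unmapped(record: SamRecord) -> bool:
    """FLAG bit 0x4 marks a read that is stored without an alignment."""
    return bool(record.flag & 0x4)


# Made-up example line, not taken from any real data set.
example = "read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\tNM:i:0"
record = parse_alignment_line(example)
```

Keeping the optional tags as raw strings is a deliberate simplification; a fuller parser would also decode each tag's type and value. For real data, an established library is generally a safer choice than hand-written parsing.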

  1. Li, H.; Handsaker, B.; Wysoker, A.; Fennell, T.; Ruan, J.; Homer, N.; Marth, G.; Abecasis, G.; Durbin, R. (2009). "The Sequence Alignment/Map format and SAMtools". Bioinformatics. 25 (16): 2078–2079. doi:10.1093/bioinformatics/btp352. ISSN 1367-4803. PMC 2723002. PMID 19505943.
  2. Edmunds, Scott (2021-02-17). "Play it again, SAMtools. Q&A with the SAMtools team on 12 years of providing bioinformatics 'glue'". GigaScience. Retrieved 2021-03-20.
  3. Dörpinghaus, J.; Weil, V.; Schaaf, S.; Apke, A. (2023). Computational Life Sciences: Data Engineering and Data Mining for Life Sciences. Studies in Big Data. Springer International Publishing. p. 447. ISBN 978-3-031-08411-9. Retrieved 2023-07-19.