Spaced seed

An animated example showing the utility of a spaced seed. First, an attempt to identify a local candidate match without a spaced seed fails; the same task is then attempted with a simple spaced seed, which finds a hit. Green indicates a matching base position.

In bioinformatics, a spaced seed is a pattern of relevant and irrelevant positions in a biosequence, and a method of approximate string matching that allows for substitutions. Spaced seeds are a straightforward modification of the earliest heuristic-based alignment methods, allowing for minor differences between the sequences of interest. They have been used in homology search,[1] alignment,[2] assembly,[3] and metagenomics.[4] A spaced seed is usually represented as a sequence of zeroes and ones, where a one marks a relevant position (the bases must match) and a zero marks an irrelevant position (the bases may differ). Some visual representations use pound signs (#) for relevant positions and dashes or asterisks for irrelevant ones.
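The zero-one representation can be illustrated with a minimal Python sketch. It is not taken from any of the cited tools; the function name `seed_hits`, the example sequences, and the seed patterns are hypothetical and chosen only for illustration. The sketch indexes the target by the bases at the relevant positions of the seed and reports every pair of offsets at which the query and the target agree at all of those positions.

```python
def seed_hits(query, target, seed):
    """Return (i, j) offset pairs where query[i:] and target[j:] agree at
    every relevant ('1') position of the seed. Irrelevant ('0') positions
    are ignored, so substitutions there do not prevent a hit.

    Illustrative helper only; not the API of any published tool.
    """
    span = len(seed)
    relevant = [k for k, flag in enumerate(seed) if flag == "1"]

    # Index the target: key = bases at the relevant positions of each window.
    index = {}
    for j in range(len(target) - span + 1):
        key = "".join(target[j + k] for k in relevant)
        index.setdefault(key, []).append(j)

    # Look up each query window in the index.
    hits = []
    for i in range(len(query) - span + 1):
        key = "".join(query[i + k] for k in relevant)
        for j in index.get(key, []):
            hits.append((i, j))
    return hits


# Two sequences of length 8 that differ at positions 2 and 6.
query = "ACGTTGCA"
target = "ACTTTGAA"

# A contiguous seed of weight 4 finds no hit: no four consecutive bases match.
print(seed_hits(query, target, "1111"))   # []

# A spaced seed of the same weight skips the substitution at position 2.
print(seed_hits(query, target, "11011"))  # [(0, 0)]
```

Both seeds have the same weight (four relevant positions), so the comparison isolates the effect of the spacing: the hit at offset (0, 0) is found only because the zero in `11011` lines up with the substituted base.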

  1. Ma, Bin; Tromp, John; Li, Ming (March 2002). "PatternHunter: faster and more sensitive homology search". Bioinformatics. 18 (3): 440–445. doi:10.1093/bioinformatics/18.3.440. PMID 11934743.
  2. David, Matei; Dzamba, Misko; Lister, Dan; Ilie, Lucian; Brudno, Michael (April 2011). "SHRiMP2: Sensitive yet Practical Short Read Mapping". Bioinformatics. 27 (7): 1011–1012. doi:10.1093/bioinformatics/btr046. PMID 21278192.
  3. Birol, I; Chu, J; Mohamadi, H; Jackman, S. D.; Raghavan, K; Vandervalk, B. P.; Raymond, A; Warren, René L. (2015). "Spaced Seed Data Structures for De Novo Assembly". International Journal of Genomics. 2015: 196591. doi:10.1155/2015/196591. PMC 4619942. PMID 26539459.
  4. Břinda, Karel; Sykulski, Maciej; Kucherov, Gregory (November 2015). "Spaced seeds improve k-mer-based metagenomic classification". Bioinformatics. 31 (22): 3584–3592. arXiv:1502.06256. Bibcode:2015arXiv150206256B. doi:10.1093/bioinformatics/btv419. PMID 26209798.