Alignment-free sequence analysis

In bioinformatics, alignment-free approaches to the analysis of molecular sequence and structure data provide alternatives to alignment-based approaches.[1]
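To make the contrast concrete, here is a minimal sketch of one common family of alignment-free methods: comparing word (k-mer) frequency profiles. The function names, the default k = 2 and the choice of Euclidean distance are illustrative assumptions, not a method prescribed by the cited sources. Each sequence is reduced to a vector of overlapping k-mer frequencies in a single linear pass. Sequences are then compared through those vectors without ever being aligned.

```python
from collections import Counter
import math


def kmer_profile(seq: str, k: int) -> Counter:
    """Count every overlapping word of length k in seq in one linear pass."""
    seq = seq.upper()
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a: str, b: str, k: int = 2) -> float:
    """Euclidean distance between the relative k-mer frequency vectors of a and b.

    No alignment is computed: each sequence is summarised by its k-mer
    frequencies, and only those summaries are compared.
    """
    pa, pb = kmer_profile(a, k), kmer_profile(b, k)
    na, nb = sum(pa.values()), sum(pb.values())
    # Words absent from one profile count as zero (Counter returns 0 for missing keys).
    words = set(pa) | set(pb)
    return math.sqrt(sum((pa[w] / na - pb[w] / nb) ** 2 for w in words))


print(kmer_profile("ACGTACGT", 2))              # Counter({'AC': 2, 'CG': 2, 'GT': 2, 'TA': 1})
print(kmer_distance("ACGTACGT", "ACGTACGT"))    # 0.0
print(round(kmer_distance("AAAA", "TTTT"), 4))  # 1.4142
```

Because only word counts are compared, the cost grows linearly with sequence length. The result does not depend on the sequences being alignable, which is the property that motivates alignment-free methods for divergent or very large datasets.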

The emergence of diverse types of data from biological research, and the need to analyse them, gave rise to the field of bioinformatics.[2] Molecular sequence and structure data of DNA, RNA and proteins, gene expression profiles or microarray data, and metabolic pathway data are among the major types of data analysed in bioinformatics. Of these, sequence data is growing at an exponential rate owing to the advent of next-generation sequencing technologies. Since the origin of bioinformatics, sequence analysis has remained a major area of research, with a wide range of applications in database searching, genome annotation, comparative genomics, molecular phylogeny and gene prediction. The pioneering approaches to sequence analysis were based on sequence alignment, whether global or local, pairwise or multiple.[3][4] Alignment-based approaches generally give excellent results when the sequences under study are closely related and can be reliably aligned. When the sequences are divergent, however, a reliable alignment cannot be obtained, which limits the applicability of sequence alignment. Another limitation of alignment-based approaches is their computational complexity: they are time-consuming and therefore poorly suited to large-scale sequence data.[5] Next-generation sequencing technologies have produced voluminous sequencing data, and the sheer size of these data poses challenges for alignment-based algorithms in assembly, annotation and comparative studies.
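The computational cost of alignment can be illustrated with a minimal sketch of pairwise global alignment scoring in the style of the Needleman–Wunsch dynamic-programming algorithm. The scoring scheme (match +1, mismatch −1, linear gap penalty −1) and the function name are assumptions chosen for illustration. For sequences of lengths n and m, the table has (n + 1) × (m + 1) cells, so time and memory grow with the product of the lengths.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Optimal global alignment score (Needleman-Wunsch, linear gap penalty).

    Fills a (len(a) + 1) x (len(b) + 1) table, so running time and memory
    grow with the product of the sequence lengths.
    """
    n, m = len(a), len(b)
    # dp[i][j] holds the best score for aligning the prefixes a[:i] and b[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        dp[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Best of: align a[i-1] with b[j-1], or open a gap in either sequence.
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            dp[i][j] = max(diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[n][m]


print(global_alignment_score("ACGT", "ACGT"))  # 4
print(global_alignment_score("AC", "A"))       # 0
```

For two sequences of 1,000 bases each, the table already has 1,001 × 1,001 = 1,002,001 cells, and doubling both lengths roughly quadruples the work. Comparing many sequences, or very long ones, this way quickly becomes expensive.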

  1. ^ Vinga S, Almeida J (March 2003). "Alignment-free sequence comparison—a review". Bioinformatics. 19 (4): 513–523.
  2. ^ Rothberg J, Merriman B, Higgs G (September 2012). "Bioinformatics. Introduction". The Yale Journal of Biology and Medicine. 85 (3): 305–308. PMC 3447194. PMID 23189382.
  3. ^ Batzoglou S (March 2005). "The many faces of sequence alignment". Briefings in Bioinformatics. 6 (1): 6–22. doi:10.1093/bib/6.1.6. PMID 15826353.
  4. ^ Mullan L (March 2006). "Pairwise sequence alignment--it's all about us!". Briefings in Bioinformatics. 7 (1): 113–115. doi:10.1093/bib/bbk008. PMID 16761368.
  5. ^ Kemena C, Notredame C (October 2009). "Upcoming challenges for multiple sequence alignment methods in the high-throughput era". Bioinformatics. 25 (19): 2455–2465. doi:10.1093/bioinformatics/btp452. PMC 2752613. PMID 19648142.