Multiple sequence alignment

First 90 positions of a protein multiple sequence alignment of instances of the acidic ribosomal protein P0 (L10E) from several organisms. Generated with ClustalX.

Multiple sequence alignment (MSA) is the process or the result of aligning three or more biological sequences, generally protein, DNA, or RNA. These alignments are used to infer evolutionary relationships via phylogenetic analysis and can highlight homologous features between sequences. They reveal mutation events such as point mutations (single amino acid or nucleotide changes), insertions, and deletions. They are also used to assess sequence conservation and to infer the presence and activity of protein domains, tertiary and secondary structures, and individual amino acids or nucleotides.
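As an illustration of how an alignment exposes conservation and insertion or deletion events, the following minimal sketch summarizes each column of a small alignment. The sequences are made up for the example, the `-` character is assumed to mark a gap, and `column_summary` is a hypothetical helper, not part of any alignment program.

```python
from collections import Counter

# Toy alignment: hypothetical sequences, one row per sequence, '-' marks a gap.
alignment = [
    "MKV-LA",
    "MKVQLA",
    "MRV-LG",
]

def column_summary(rows):
    """Return (most common residue, its fraction of the column, gap count) per column."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    summary = []
    for column in zip(*rows):
        residues = [c for c in column if c != "-"]
        gaps = len(column) - len(residues)
        if not residues:
            # A column made only of gaps carries no residue information.
            summary.append(("-", 0.0, gaps))
            continue
        residue, count = Counter(residues).most_common(1)[0]
        summary.append((residue, count / len(column), gaps))
    return summary

summary = column_summary(alignment)
for position, (residue, fraction, gaps) in enumerate(summary, start=1):
    print(f"column {position}: {residue} conserved in {fraction:.0%} of rows, {gaps} gap(s)")
```

In this toy case, fully conserved columns have a fraction of 1.0, a column with a substitution (K versus R) has a lower fraction, and the column where only one row carries a residue points to an insertion in that sequence or a deletion in the others.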

Multiple sequence alignments require more sophisticated methodologies than pairwise alignments because they are more computationally complex. Most multiple sequence alignment programs use heuristic methods rather than global optimization, because identifying the optimal alignment among more than a few sequences of moderate length is prohibitively expensive to compute. However, heuristic methods generally cannot guarantee solution quality and have been shown to fail to yield near-optimal solutions on benchmark test cases.[1][2][3]
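To give a rough sense of why exact optimization becomes impractical, the sketch below counts the work involved if pairwise dynamic programming is generalized directly to N sequences. It assumes all sequences have the same length, that the lattice has one axis per sequence, and that each cell is reached from every combination of "advance or gap" across the sequences. The function name `exact_dp_cost` is hypothetical, and the numbers are order-of-magnitude estimates rather than measurements of any particular program.

```python
def exact_dp_cost(num_sequences, length):
    """Rough size of the full dynamic-programming lattice.

    cells: (length + 1) ** num_sequences lattice points
    moves: 2 ** num_sequences - 1 predecessor cells per lattice point
    """
    cells = (length + 1) ** num_sequences
    moves = 2 ** num_sequences - 1
    return cells, moves, cells * moves

# Assumed sequence length of 100 residues, chosen only for illustration.
for n in (2, 3, 5, 10):
    cells, moves, work = exact_dp_cost(n, 100)
    print(f"{n:>2} sequences: {cells:.3e} cells x {moves} moves = {work:.3e} updates")
```

Under these assumptions the work grows exponentially with the number of sequences, which is consistent with heuristics being preferred once more than a few sequences are involved.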

  1. ^ Thompson JD, Linard B, Lecompte O, Poch O (2011). "A comprehensive benchmark study of multiple sequence alignment methods: current challenges and future perspectives". PLOS ONE. 6 (3): e18093. Bibcode:2011PLoSO...618093T. doi:10.1371/journal.pone.0018093. PMC 3069049. PMID 21483869.
  2. ^ nuin2006 (reference details not defined in the source).
  3. ^ Hosseininasab A, van Hoeve WJ (2019). "Exact Multiple Sequence Alignment by Synchronized Decision Diagrams". INFORMS Journal on Computing. doi:10.1287/ijoc.2019.0937. S2CID 109937203.