Shortest common supersequence

In computer science, the shortest common supersequence of two sequences X and Y is the shortest sequence which has X and Y as subsequences. This is a problem closely related to the longest common subsequence problem. Given two sequences X = < x1,...,xm > and Y = < y1,...,yn >, a sequence U = < u1,...,uk > is a common supersequence of X and Y if items can be removed from U to produce X and Y.

A shortest common supersequence (SCS) is a common supersequence of minimal length. In the shortest common supersequence problem, two sequences X and Y are given, and the task is to find a shortest possible common supersequence of these sequences. In general, an SCS is not unique.

For two input sequences, an SCS can be formed from a longest common subsequence (LCS) easily. For example, the longest common subsequence of X and Y is Z. By inserting the non-LCS symbols into Z while preserving their original order, we obtain a shortest common supersequence U. In particular, the equation holds for any two input sequences.

There is no similar relationship between shortest common supersequences and longest common subsequences of three or more input sequences. (In particular, LCS and SCS are not dual problems.) However, both problems can be solved in time using dynamic programming, where is the number of sequences, and is their maximum length. For the general case of an arbitrary number of input sequences, the problem is NP-hard.[1]

  1. ^ David Maier (1978). "The Complexity of Some Problems on Subsequences and Supersequences". J. ACM. 25 (2). ACM Press: 322–336. doi:10.1145/322063.322075. S2CID 16120634.