LCS and LIS

Learn the ABC of Dynamic programming by LCS and LIS.

Longest common subsequence

The longest common subsequence (LCS) problem is the problem of finding the longest subsequence common to all sequences in a set of sequences (often just two sequences). It differs from problems of finding common substrings: unlike substrings, subsequences are not required to occupy consecutive positions within the original sequences. The longest common subsequence problem is a classic computer science problem, the basis of data comparison programs such as the diff utility, and has applications in bioinformatics. It is also widely used by revision control systems such as Git for reconciling multiple changes made to a revision-controlled collection of files.

The basic idea of the problem is brute force, we can implement it easily by searching:

#include <stdio.h>
#include <string.h>
 
char str_a[1000], str_b[1000];
int len_a, len_b;
 
int max(int a, int b) {
  return a>b? a:b;
}
 
// search function to find the length of the LCS of str_a(substring whose index from 0 to m-1) and str_b(substring whose index from 0 to n-1)
int search(int m, int n) {
  if (m == 1 || n == 1) return 1;
  if (str_a[m-1] == str_b[n-1])
    return search(m-1, n-1) + 1;
  else
    return max(search(m-1, n), search(m, n-1));
}
 
int main() {
  while(~scanf("%s%s", str_a, str_b)) {
    len_a = strlen(str_a);
    len_b = strlen(str_b);
    printf("%d\n", search(len_a, len_b));
  }
}

There are two key attributes that a problem must have in order for dynamic programming to be applicable: optimal substructure and overlapping subproblems.

Optimal substructure means that the solution to a given optimization problem can be obtained by the combination of optimal solutions to its subproblems. Consequently, the first step towards devising a dynamic programming solution is to check whether the problem exhibits such optimal substructure. Such optimal substructures are usually described by means of recursion.

Overlapping subproblems means that the space of subproblems must be small, that is, any recursive algorithm solving the problem should solve the same subproblems over and over, rather than generating new subproblems.

It is obvious that the LCS problem has the optimal substructure and the overlapping subproblems. The search function in our later code could be invoked again and again by the same arguments, which causes a great waste in calculation. To solve it, we need to store the results of the subproblems in our memory. The code POJ_1458_search.cpp improves our previous algorithm by store the results.

POJ 1458 is a naked problem of LCS:

We call the later method “Top-down approach”.

Top-down approach: This is the direct fall-out of the recursive formulation of any problem. If the solution to any problem can be formulated recursively using the solution to its subproblems, and if its subproblems are overlapping, then one can easily memoize or store the solutions to the subproblems in a table. Whenever we attempt to solve a new subproblem, we first check the table to see if it is already solved. If a solution has been recorded, we can use it directly, otherwise we solve the subproblem and add its solution to the table.

We can also solve the LCS in “Bottom-up approach”.

Bottom-up approach: Once we formulate the solution to a problem recursively as in terms of its subproblems, we can try reformulating the problem in a bottom-up fashion: try solving the subproblems first and use their solutions to build-on and arrive at solutions to bigger subproblems. This is also usually done in a tabular form by iteratively generating solutions to bigger and bigger subproblems by using the solutions to small subproblems.

POJ 1458(By bottom-up approach):

Longest increasing subsequence

In computer science, the longest increasing subsequence problem is to find a subsequence of a given sequence in which the subsequence’s elements are in sorted order, lowest to highest, and in which the subsequence is as long as possible. This subsequence is not necessarily contiguous, or unique.

A brute force way to solve the problem is as follow:

#include <stdio.h>

int n;
// The input sequence
int a[1000];

// Return the length of an array which stores "the length of the longest sequence which end with the 'index' element"
int search(int index) {
  if(index == 0) return 1;
  int max = 0;
  for (int i=index-1; i>=0; --i) {
    if(a[i] < a[index]) {
      int res = search(i);
      if(res > max) max = res;
    }
  }
  return max + 1;
}

int main() {
  scanf("%d", &n);
  for (int i=0; i<n; ++i) 
    scanf("%d", a+i);

  int max = 0;
  for (int i=n-1; i>=0; --i) {
    int res = search(i);
    if(res > max) {
      max = res;
    }
  }
  printf("%d\n", max);

  return 0;
}

Then we improve the search to the “memorized search”:

POJ 2533:

We can also implement it in bottom up approach:

POJ 2533:

Reference:

Wikipedia