LeetCode 1143: Longest Common Subsequence asks:

Given two strings, return the length of their longest common subsequence.

As we saw when solving the Longest Increasing Subsequence problem, a subsequence is formed by selecting elements from the input, with the only restriction being that they remain in their original order. For the LCS problem, if our two input strings are `text1`

and `text2`

, we can make a common subsequence using these steps: Iterate through `text1`

in order. For each character, either select it or don’t select it. When we get to the end of `text1`

, we’ll have a subsequence made up of characters from `text1`

. Then iterate through `text2`

and look for those same characters, in the same order. If we can find them all, we have found a common subsequence. We need to find the longest such subsequence.

In the process just described, we decide at each character position whether to include the character in the subsequence, or not include it. Two choices at each position gives us $O(2^n)$ subsequences, which is not practical unless the string is tiny. But the fact that we make a decision at each position is a hint that we could use dynamic programming to create a more efficient solution.

While the Longest Increasing Subsequence problem involved only one input array, the two input strings in this problem are an sign that a 2D table may be required to store our results.

You’ll see at the end that the dynamic programming implementation for this problem is fairly straightforward. But it can take some thinking to understand how it works. Here are the key ideas.

## The 2D Table

In the problems we have seen so far, we often start by creating a `dp`

array with the same dimensions as the input array. For LCS, we have two input strings, `text1`

and `text2`

. Let the length of `text1`

be `m`

and the length of `text2`

be `n`

. We can’t just use a `dp1`

array of size `m`

and a `dp2`

array of size `n`

because the character selections are related: If we select a character from `text1`

, we have to select the same character in the same order from `text2`

to make our *common* subsequence. So our `dp`

table needs to be two-dimensional, with `n+1`

rows and `m+1`

columns (the `+1`

s are an implementation detail, as explained below). Think of each row `i`

as representing the character `text1[i]`

and each column `j`

as representing the character `text2[j]`

. Then we can use `dp[i][j]`

to store the length of the longest common subsequence starting at `text1[i]`

and `text2[j]`

.

## The Loops

To fill out a 2D table using a bottom-up approach, we need two nested loops. We’ll use the outer loop to iterate through the rows, which represent the characters of `text1`

, and the inner loop to iterate through the columns, which represent the characters of `text2`

.

If `dp[i][j]`

is the length of the LCS starting at `text1[i]`

and `text2[j]`

, then our solution will be in `dp[0][0]`

when the loops complete. But that means we can’t start our loops at the first characters of `text1`

and `text2`

. How would we know what to store in `dp[0][0]`

?

But if we start at the last characters of the input strings, it’s easy to figure out the base case by hand: If the last characters match, then we can write `1`

to the `dp`

table, since the matching characters give us a LCS of size 1 starting at the last characters of both strings. If the last characters don’t match, we write `0`

. So we’ll run our loops in reverse, from the last characters to the first characters.

## The Body

Now we’re at the core of the solution. We’re iterating from right to left on each string, building our LCS one character at a time and building our `dp`

table row by row, starting at the bottom right and moving left and up. For each `text1[i]`

and `text2[j]`

, there are two possibilities:

If

`text1[i]`

and`text2[j]`

match, we can extend our LCS by one character. This means its length is one more than the best solution starting at the next position in both strings,`text1[i+1]`

and`text2[j+1]`

. In our table, that means`dp[i][j]`

=`dp[i+1][j+1] + 1`

.If

`text1[i]`

and`text2[j]`

don’t match, then those positions don’t contribute to the LCS. For this case, the length of the LCS is the longest of either the LCS starting at`text1[i+1]`

and`text2[j]`

or the LCS starting at`text1[i]`

and`text2[j+1]`

. In our table, that means`dp[i][j] = max(dp[i+1][j], dp[i][j+1])`

.

## Space optimization

You might have noticed that we look only one row and one column ahead in the `dp`

table. So, as often happens with dynamic programming, we don’t actually need the full table. To keep the implementation simple, I’m going to leave it as-is. But you can optimize memory usage by saving only the part of the table you’re currently working on.

## Pseudocode

```
let m = length of text1
let n = length of text2
let dp = int array of size (m+1) x (n+1)
for row from m-1 to 0
for col from n-1 to 0
if text1[row] == text2[col]
dp[row][col] = dp[row+1][col+1] + 1
else
dp[row][col] = max(dp[row+1][col], dp[row][col+1])
return dp[0][0]
```

For an implementation of the regular and memory-optimized versions, see votrubac’s solution on LeetCode.

(Image credit: DALLĀ·E 3)

*For an introduction to this year’s project, see A Project for 2024*.