LeetCode 1143: Longest Common Subsequence asks:
Given two strings, return the length of their longest common subsequence.
As we saw when solving the Longest Increasing Subsequence problem, a subsequence is formed by selecting elements from the input, with the only restriction being that they remain in their original order. For the LCS problem, if our two input strings are text1
and text2
, we can make a common subsequence using these steps: Iterate through text1
in order. For each character, either select it or don’t select it. When we get to the end of text1
, we’ll have a subsequence made up of characters from text1
. Then iterate through text2
and look for those same characters, in the same order. If we can find them all, we have found a common subsequence. We need to find the longest such subsequence.
In the process just described, we decide at each character position whether to include the character in the subsequence, or not include it. Two choices at each position gives us $O(2^n)$ subsequences, which is not practical unless the string is tiny. But the fact that we make a decision at each position is a hint that we could use dynamic programming to create a more efficient solution.
While the Longest Increasing Subsequence problem involved only one input array, the two input strings in this problem are an sign that a 2D table may be required to store our results.
You’ll see at the end that the dynamic programming implementation for this problem is fairly straightforward. But it can take some thinking to understand how it works. Here are the key ideas.
The 2D Table
In the problems we have seen so far, we often start by creating a dp
array with the same dimensions as the input array. For LCS, we have two input strings, text1
and text2
. Let the length of text1
be m
and the length of text2
be n
. We can’t just use a dp1
array of size m
and a dp2
array of size n
because the character selections are related: If we select a character from text1
, we have to select the same character in the same order from text2
to make our common subsequence. So our dp
table needs to be two-dimensional, with n+1
rows and m+1
columns (the +1
s are an implementation detail, as explained below). Think of each row i
as representing the character text1[i]
and each column j
as representing the character text2[j]
. Then we can use dp[i][j]
to store the length of the longest common subsequence starting at text1[i]
and text2[j]
.
The Loops
To fill out a 2D table using a bottom-up approach, we need two nested loops. We’ll use the outer loop to iterate through the rows, which represent the characters of text1
, and the inner loop to iterate through the columns, which represent the characters of text2
.
If dp[i][j]
is the length of the LCS starting at text1[i]
and text2[j]
, then our solution will be in dp[0][0]
when the loops complete. But that means we can’t start our loops at the first characters of text1
and text2
. How would we know what to store in dp[0][0]
?
But if we start at the last characters of the input strings, it’s easy to figure out the base case by hand: If the last characters match, then we can write 1
to the dp
table, since the matching characters give us a LCS of size 1 starting at the last characters of both strings. If the last characters don’t match, we write 0
. So we’ll run our loops in reverse, from the last characters to the first characters.
The Body
Now we’re at the core of the solution. We’re iterating from right to left on each string, building our LCS one character at a time and building our dp
table row by row, starting at the bottom right and moving left and up. For each text1[i]
and text2[j]
, there are two possibilities:
If
text1[i]
andtext2[j]
match, we can extend our LCS by one character. This means its length is one more than the best solution starting at the next position in both strings,text1[i+1]
andtext2[j+1]
. In our table, that meansdp[i][j]
=dp[i+1][j+1] + 1
.If
text1[i]
andtext2[j]
don’t match, then those positions don’t contribute to the LCS. For this case, the length of the LCS is the longest of either the LCS starting attext1[i+1]
andtext2[j]
or the LCS starting attext1[i]
andtext2[j+1]
. In our table, that meansdp[i][j] = max(dp[i+1][j], dp[i][j+1])
.
Space optimization
You might have noticed that we look only one row and one column ahead in the dp
table. So, as often happens with dynamic programming, we don’t actually need the full table. To keep the implementation simple, I’m going to leave it as-is. But you can optimize memory usage by saving only the part of the table you’re currently working on.
Pseudocode
let m = length of text1
let n = length of text2
let dp = int array of size (m+1) x (n+1)
for row from m-1 to 0
for col from n-1 to 0
if text1[row] == text2[col]
dp[row][col] = dp[row+1][col+1] + 1
else
dp[row][col] = max(dp[row+1][col], dp[row][col+1])
return dp[0][0]
For an implementation of the regular and memory-optimized versions, see votrubac’s solution on LeetCode.
(Image credit: DALLĀ·E 3)
For an introduction to this year’s project, see A Project for 2024.