Last week I wrote about solving UVa 11085 using the recursive backtracking search technique. This week, I’m going to outline a more general approach to recursive backtracking problems. Then I’ll show how it can be adapted to solve UVa 574.
UVa 574 asks us to solve this problem:
Given a multiset $S$ of integers and a target sum $t$, find all subsets of $S$ that sum to $t$.
Recursive Backtracking Characteristics
If a problem has these characteristics, it may be amenable to a recursive backtracking solution:
- It requires searching through a set of possible solutions. There’s no way to compute its solutions directly, or determine in advance what part of the complete solution space will contain the desired solutions.
- It involves building up complete solutions from partial solutions.
- As each element is added to a partial solution, it is possible to decide whether the partial solution is valid.
Here’s how those characteristics apply to UVa 574 and UVa 11085:
Complete search
- UVa 574: The solution space consists of subsets of $S$. Our goal is to find which of those subsets sum to $t$. To do that, we have to create subsets and calculate sums. We can use some shortcuts that allow us to avoid looking at every possible subset, but we still need to look through many of them.
- UVa 11085: The solution space consists of placements of eight queens on a standard chessboard. We have to check some fraction of all possible placements to determine which ones contain no pairs of queens that attack each other.
Complete solutions from partial solutions
- UVa 574: If we have a subset of $S$ whose sum is less than $t$, we can add another integer from $S$ to the subset, and get the new sum using just one operation. By keeping track of partial sums, we avoid having to go back and recalculate the sum of the whole subset.
- UVa 11085: We move towards a complete board by adding a new queen to a partial board.
Partial solution validity
- UVa 574: As we build up each subset, we can check if its sum is $t$, which means we’re done. We can also make sure we’re always using a unique element (not necessarily a unique integer) from $S$. (This is another rule from the problem statement).
- UVa 11085: If we have a valid board containing $x < 8$ queens, we can add another queen and check it for validity by comparing it with the $x$ previous queen positions. If we check each new queen as we add it to the board, we don’t have to check every pair of queens again at each step, since we know that the partial board is valid.
Recursive Backtracking Program Structure
Here’s a high-level structure for a recursive backtracking solution. First we need a method to parse the test case and kick off the recursion:
parseTestCase()
read and parse test case input
initialize case-level variables
call recursive function with initial arguments
parseTestCase
calls a recursive function that takes care of searching the solution space. For a two-dimensional solution space, this backtrack
function accepts an x
value as a parameter, and loops through a set of y
values inside the function. For example, x
and y
could be the row and column numbers on a chessboard:
backtrack(parameters, including value x for one dimension)
for each value y for another dimension
if (x, y) is valid
add (x, y) to partial solution
if solution is complete and valid
calculate/save/print result
else if more x values are available
recursively call backtrack with x+1
else do nothing (continue with next y value)
else do nothing (continue with next y value)
Based on the third characteristic of recursive backtracking problems, we know there’s a way to determine if (x, y)
is a valid element to add to the current solution. If determining this takes more than one step, it’s best to encapsulate the decision in a separate function, which we’ll call place
:
place(parameters)
using parameters and case-level data,
determine whether this is a valid element
if so, return true
otherwise, return false
Levels of Data
To solve a recursive backtracking problem, we’ll operate on data at two levels:
- Case-level: This data is valid at any point during test case processing. Examples of this type of data include the test case input and output. The input never changes once it is read from stdin, and the output never changes once it is printed to stdout. Therefore, we can rely on these not to be in an intermediate state. There may be other canonical data, like a list of prime numbers to compare against results (see UVa 524).
- Function-level: This data is specific to a particular recursive function call. Therefore, it must be local to the function, either as a function argument or as a local variable. An example of this type of data is a partial solution, like a chessboard with some open columns.
It’s important to store your data at the right level. The benefit of a recursive function is that the function’s local state is saved for you on each call. But if you put your partial solution data outside the function, the compiler won’t know that you need a separate copy for each call. This might cause one partial solution to clobber another one. I’ll cover this in more detail in the example that follows.
Keeping case-level data at the function level is not necessarily a problem, but it can consume more memory than necessary. Since case-level data is common to all recursive calls, you’ll just be storing a lot of duplicate data. If your case-level data is large, you may also run into time limit problems due to the overhead of passing it as function parameters.
UVa 574: Sum It Up
Now that we have a general framework for a recursive backtracking solution, let’s see how to apply it to UVa 574.
Parse test case
In this step, we have to parse the test case input, and set up case-level variables. I used four case-level variables for this problem:
int[] input
: The integers in the input multiset, in the order they are provided in the test case input.HashSet<String> output
: Each output row gets stored here. The algorithm can generate duplicate results, so this is how I avoid printing duplicates, which the output instructions prohibit.int[] set
: A solution set that eventually sums to the target total. Rather than storing the actual integers, I store indexes into thenumbers
array. This is more useful for tracking which numbers have been used in the current set: positions are unique, while input numbers can repeat. (That’s what multiset means).
It should be clear that input
and output
are case-level variables. You might interpret set
as a function-level variable, and that design would work as well. However, I decided to declare it at the case level using the maximum possible set size (the number of input integers). This allowed me to re-use it across function calls by keeping track of the current set position, which saves memory and execution time.
Backtrack
For UVa 574, each recursive call to backtrack
needs these parameters:
int sum
: The sum of the elements in the current subset. Passing this as a parameter means you don’t have to recalculate the sum on each call.int setPosition
: The position for the next number in the current subset. Passing this as a parameter means you don’t have to pass an entire subset array: allbacktrack
calls can share the same array.TreeSet<Integer> availablePositions
: Input number positions that have not yet been used. The test case input is sorted in nonincreasing order, and the output must also be shown in that order. ATreeSet
allows us to efficiently select only the input integers that are less than or equal to the most recently used element.
With these parameters, we can fill out the specific backtrack
function for this problem, following the general pattern shown above. At a high level, this function needs to:
- Iterate through the set of input numbers.
- Skip any that are too large.
- Add the others to the current set.
- Print a result, recursively call itself to try the next set position, or backtrack to the previous set position.
Here’s the pseudocode:
backtrack(sum, setPos, availPos)
previousPos = previous position in set
remainingPos = (elements >= previousPos in availPos)
for each numberPos in remainingPos
if place(sum, numberPos)
add numberPos to set
newSum = sum + input number at numberPos
if newSum == total
format output row
if it is not already in the output set
print it
add it to the output set
else if newSum < total and we still have input numbers
remove numberPos from availPos
backtrack(newSum, setPos+1, availPos)
A few more details about backtrack
:
- The
place
function for UVa 574 just checks if the current sum plus the candidate number is less than or equal the target sum. That’s simple enough that we don’t really need a separate function, but it doesn’t hurt to learn the standard pattern for future use. - The
(x, y)
pair in this case is(setPos, numberPos)
. Those are analogous to the column and row board positions in UVa 11085. - Building the
remainingPos
set rather than iterating through every element inavailPos
lowers the runtime significantly. SinceavailPos
is aTreeSet
, its keys are sorted. Using thetailSet
method in Java, we can slice off only the numbers that meet the nonincreasing rule specified in the problem statement. - There are a few other Java implementation details, but they’re all topics that I covered in Java Lessons from uHunt Chapter 2.
Why Use Recursion?
For appropriate problems, recursion is a convenient way to simplify your code by getting the compiler to keep track of intermediate results. Here’s how it works for UVa 574:
- You call
backtrack
with thesum
,setPos
, andavailPos
arguments. The values that you pass in are specific to the current function call. backtrack
creates other local variables,previousPos
andremainingPos
. These are also specific to the current call. You don’t have to remember which sum goes with which set position, list of available positions, etc.- The
numberPos
loop begins, checking each number inremainingPos
. - If the current sum is below the target sum,
backtrack
makes a recursive call to itself. The current values are saved on the call stack, and a new copy ofbacktrack
begins with the new values. - If the current sum is equal to the target sum, the result is printed. Since there is no recursive call, the loop continues with the next value of
numberPos
, and all of the local variables maintain their current values. This is what we want for recursive backtracking problems, because new solutions build on previous solutions. Even though we found a valid solution, there may be other valid solutions that we can build from the same set. For example, we might find a solution $5=3+2$, and then later find $5=3+1+1$. - If
place
returnsfalse
for the current sum and number position, then we have exceeded the target sum. In this case, there is also no recursive call, so we continue with the next loop iteration. Since available numbers are sorted in nonincreasing order, there may be a smaller number that will gives us the target sum. - Once the
numberPos
loop completes, the current call tobacktrack
returns. This is the actual backtracking that the technique is named for, since it causes the code to return to a previousbacktrack
call, and therefore a previous set position. ThenumberPos
loop that was in progress at the time of the previous call then resumes, checking the remaining numbers at the previous position. - This process continues until the very first
backtrack
call completes.
The process of looping through available input values and recursively checking set positions finds all of the required subsets. It also saves time compared to a complete search, by eliminating many subsets that we know won’t provide a valid result.
(Image credit: James Brooks)