uHunt Chapter 3 has six starred problems, and many more problems in total, on the topic of binary search. If you want to solve them, it helps to have a firm grasp of how that algorithm works.

The binary search algorithm is conceptually simple. In ancient times, like the early 1990’s, people used it instinctively when looking up words in a paper dictionary or finding phone numbers in the thick book that arrived on their doorstep. The process:

- Open the book to the middle.
- If your target entry is alphabetically lower than the current entry, split the left half of the book. Otherwise, split the right half.
- Repeat until done.

## Recursive Pseudocode

Getting a bit more formal, here’s the algorithm in pseudocode:

```
// return the position of k in a, or -1 if not found
binarySearch(sorted array a, target key k)
mid = the middle element of a
if k is less than mid, binary search the
left half of a
else if k is greater than mid, binary search
the right half of a
else if k == mid, return the position of mid
else return -1
```

Since binary search involves splitting a data structure into smaller pieces and repeatedly running the same algorithm on those pieces, recursion might seem like the right approach for implementing it. And that’s certainly one approach. But notice that the two recursive calls in the pseudocode are each the last statement executed in the function. This makes it easy to transform the recursive algorithm into an iterative algorithm without having to use an explicit stack. That’s the approach taken in Sedgewick’s Algorithms book. I’ll be starting with his binary search implementation to demonstrate how the algorithm works:

```
public static int indexOf(int[] a, int key) {
int lo = 0;
int hi = a.length - 1;
while (lo <= hi) {
// Key is in a[lo..hi] or not present.
int mid = lo + (hi - lo) / 2;
if (key < a[mid]) hi = mid - 1;
else if (key > a[mid]) lo = mid + 1;
else return mid;
}
return -1;
}
```

This `indexOf`

function contains only eight lines of executable Java code. But just reading the code may not be enough for you to visualize the behavior of the three indexes, `hi`

, `lo`

, and `mid`

, that control the search process. To make the inner working of the algorithm more intuitively understandable, I created a version of the program that prints its progress as it searches for the key.

## Example

My modified implementation uses the standard programming puzzle approach of a data set followed by a list of queries. The data set I’ll be using for this example is this set of ten random integers:

```
18 11 7 10 24 1 3 20 20 13
```

My test client reads the input into an array, sorts the array, and then repeatedly calls `indexOf`

with each query as the key. I’ll use two example keys to demonstrate how the algorithm works in different cases.

Let’s start by looking for the key `18`

. The visualization output looks like this after the first step:

```
mid = 0 + (9 - 0) / 2 = 4
0 1 2 3 4 5 6 7 8 9
1 3 7 10 11 13 18 20 20 24
...lo.............................................
......................mid.........................
................................................hi
................................key...............
key is higher than mid, so set lo to mid+1 = 5
```

Here’s the purpose of each line (not including the blank lines):

**Line 1: The mid calculation**

Each iteration of the binary search algorithm roughly splits the remaining search space into two halves (that’s the *binary* part of *binary search*). The `mid`

element doesn’t appear in either half. If `mid`

points to the key, then we’re done. Otherwise, we keep searching.

To find the midpoint, we use the following formula:

```
mid = lo + (half of the distance between lo and hi)
```

For an array with an odd number of elements, this formula should match your intuitive understanding of how to find a middle element. For example, if your array had 9 elements, the first step in the algorithm would have `lo = 0`

and `hi = 8`

. So `mid`

would be $0 + (8-0)/2 = 4$. With the midpoint at element 4, there are four elements to the left (`0..3`

) and four to the right (`5..8`

) of the midpoint.

In the example above, the array has an even number of elements, which means the distance between the first and last element is an odd number, so it’s not divisible by two. Java and other languages solve this problem by using *integer division* when dividing two integers. This has the effect of discarding any remainder. For the midpoint calculation above, the integer division $9/2$ results in a split with four elements to the left of the midpoint and five to the right. This is fine. The algorithm works with an unequal split, and since the split is off by at most one, the total number of iterations remains $O(\log_2 n)$

**Line 2: Array element positions**

`lo`

, `hi`

, and `mid`

contain positions in the array, not the array data. These position values, of course, start with zero. Line 2 shows the position of each element in the array.

**Line 3: Array data**

When deciding which half of the array to search next, we have to compare array data, not array positions. Since the array is sorted, we know that the elements in the left half are smaller than the element at the midpoint, and the elements in the right half are larger.

You may notice that this explanation doesn’t exactly work with the given test data, because there is a duplicate element — the `20`

at positions `7`

and `8`

. This algorithm doesn’t officially support duplicate elements, but it’s instructive to observe its behavior when `20`

is passed as the key. (It returns the position of the first `20`

).

**Lines 4-7: Control variables**

Lines 4-7 show the positions of the four control variables. `lo`

and `hi`

define the range of array elements we are currently searching. At each iteration, the size of this range is halved, which accounts for the $O(\log_2 n)$ runtime of the algorithm. `mid`

is the approximate midpoint of the range, as described above. `key`

is the position of the key, which of course we wouldn’t know in advance. When `mid`

and `key`

point to the same element, the search is over. If the `lo`

positions is higher than the `hi`

position, then the key doesn’t exist in the array.

**Line 8: What to do next**

Line 8 describes how and why to adjust the control variables for the next iteration. There are three possibilities, based on the relationship between `key`

and the element that `mid`

points to:

`key`

is smaller than the`mid`

element: Set`hi`

to`mid-1`

, the next smaller element. The next iteration will search from`lo`

to`mid-1`

.`key`

is larger than the`mid`

element: Set`lo`

to`mid+1`

, the next larger element. The next iteration will search from`mid+1`

to`hi`

.`key`

is equal to the`mid`

element: Return`mid`

, the position of the key.

After carrying out one of these three actions, it’s possible that the control variables will be in an invalid state. Our algorithm needs to detect this state and return an appropriate value. I’ll show an example of this below.

## Finding 18

Before we look for an element that doesn’t exist, here’s the rest of the process of finding `key = 18`

:

```
mid = 5 + (9 - 5) / 2 = 7
0 1 2 3 4 5 6 7 8 9
1 3 7 10 11 13 18 20 20 24
............................lo....................
.....................................mid..........
................................................hi
................................key...............
key is lower than mid, so set hi to mid-1 = 6
```

At the end of the previous iteration, we set `lo`

to `5`

, since the key was higher than the midpoint element. So we’re now searching the rightmost half of the array, a subset of five elements. The odd number of elements allows us to pick an exact midpoint, position `7`

. But we haven’t yet found the key, so we’ll need at least one more iteration. Since `key`

is now to the left of the midpoint, we adjust the `hi`

position according to the rules.

```
mid = 5 + (6 - 5) / 2 = 5
0 1 2 3 4 5 6 7 8 9
1 3 7 10 11 13 18 20 20 24
............................lo....................
...........................mid....................
.................................hi...............
................................key...............
key is higher than mid, so set lo to mid+1 = 6
```

With `lo`

now at position `5`

and `hi`

at `6`

, there’s no room in the middle for the midpoint element. But our integer division process handles this fine, and `mid`

ends up at position `5`

along with `lo`

, since $(6-5)/2$ truncates to $0$. We didn’t quite find the key, so we’ll need another iteration.

```
key is higher than mid, so set lo to mid+1 = 6
mid = 6 + (6 - 6) / 2 = 6
0 1 2 3 4 5 6 7 8 9
1 3 7 10 11 13 18 20 20 24
.................................lo...............
................................mid...............
.................................hi...............
................................key...............
key is equal to mid; found it
index: 6
value: 18
----------
```

There’s nowhere for the key to hide in this iteration, and we end up with all four control variables pointing to the key position. This doesn’t always happen in the last step of the algorithm, but it’s a common configuration. Only `mid`

and `key`

are required to be at the same position for the algorithm to terminate.

## (Not) Finding 25

Now that we know how the algorithm in a normal case, let’s find out what happens when we provide a key value that doesn’t exist in the array, like `25`

.

```
mid = 0 + (9 - 0) / 2 = 4
0 1 2 3 4 5 6 7 8 9
1 3 7 10 11 13 18 20 20 24
...lo.............................................
......................mid.........................
................................................hi
..................................................
key is higher than mid, so set lo to mid+1 = 5
```

As you can see, the `key`

indicator doesn’t show up in the diagram, since there is no `25`

in the array. But my visualization algorithm only knows this because it cheats (looks ahead). The binary search algorithm happily determines that `key = 25`

is higher than `a[mid] = 11`

, so it continues to search the rightmost half of the array.

```
mid = 5 + (9 - 5) / 2 = 7
0 1 2 3 4 5 6 7 8 9
1 3 7 10 11 13 18 20 20 24
............................lo....................
.....................................mid..........
................................................hi
..................................................
key is higher than mid, so set lo to mid+1 = 8
```

So far, so good. `key`

is still higher.

```
mid = 8 + (9 - 8) / 2 = 8
0 1 2 3 4 5 6 7 8 9
1 3 7 10 11 13 18 20 20 24
...........................................lo.....
..........................................mid.....
................................................hi
..................................................
key is higher than mid, so set lo to mid+1 = 9
```

We still haven’t found the key. Maybe it’s farther to the right?

```
mid = 9 + (9 - 9) / 2 = 9
0 1 2 3 4 5 6 7 8 9
1 3 7 10 11 13 18 20 20 24
................................................lo
...............................................mid
................................................hi
..................................................
key is higher than mid, so set lo to mid+1 = 10
```

`lo`

, `mid`

, and `hi`

are all at the rightmost edge of the array, but `key`

is still higher. Following the rules, we set `lo`

to `mid + 1 = 10`

. This is outside the bounds of the array, but the actual check that the algorithm uses is `lo <= hi`

. Since this condition is false, the loop exits, and we return the special value `-1`

to indicate that the key was not found.

```
lo = 10, hi = 9
lo and hi have crossed; key isn't in the list
index: -1
------------------------------
```

## Binary Search

It’s useful to take apart even a simple algorithm like iterative binary search. Though the visualization may make it easy to understand, coming up with this algorithm from scratch would require careful thinking about loop conditions and how to calculate array positions.

Once you understand the basic algorithm well, you can extend it to cover binary search scenarios that the standard implementation doesn’t handle. More on that in a future post.

(Image credit: Pleuntje)