Red-Green-Code

Deliberate practice techniques for software developers

  • Home
  • About
  • Contact
  • Project 462
  • CP FAQ
  • Newsletter

Visualizing Binary Search

By Duncan Smith May 11 0

Search

uHunt Chapter 3 has six starred problems, and many more problems in total, on the topic of binary search. If you want to solve them, it helps to have a firm grasp of how that algorithm works.

The binary search algorithm is conceptually simple. In ancient times, like the early 1990’s, people used it instinctively when looking up words in a paper dictionary or finding phone numbers in the thick book that arrived on their doorstep. The process:

  • Open the book to the middle.
  • If your target entry is alphabetically lower than the current entry, split the left half of the book. Otherwise, split the right half.
  • Repeat until done.

Recursive Pseudocode

Getting a bit more formal, here’s the algorithm in pseudocode:

// return the position of k in a, or -1 if not found
binarySearch(sorted array a, target key k)
    mid = the middle element of a
    if k is less than mid, binary search the
        left half of a
    else if k is greater than mid, binary search
        the right half of a
    else if k == mid, return the position of mid
    else return -1

Since binary search involves splitting a data structure into smaller pieces and repeatedly running the same algorithm on those pieces, recursion might seem like the right approach for implementing it. And that’s certainly one approach. But notice that the two recursive calls in the pseudocode are each the last statement executed in the function. This makes it easy to transform the recursive algorithm into an iterative algorithm without having to use an explicit stack. That’s the approach taken in Sedgewick’s Algorithms book. I’ll be starting with his binary search implementation to demonstrate how the algorithm works:

public static int indexOf(int[] a, int key) {
    int lo = 0;
    int hi = a.length - 1;
    while (lo <= hi) {
        // Key is in a[lo..hi] or not present.
        int mid = lo + (hi - lo) / 2;
        if      (key < a[mid]) hi = mid - 1;
        else if (key > a[mid]) lo = mid + 1;
        else return mid;
    }
    return -1;
}

This indexOf function contains only eight lines of executable Java code. But just reading the code may not be enough for you to visualize the behavior of the three indexes, hi, lo, and mid, that control the search process. To make the inner working of the algorithm more intuitively understandable, I created a version of the program that prints its progress as it searches for the key.

Example

My modified implementation uses the standard programming puzzle approach of a data set followed by a list of queries. The data set I’ll be using for this example is this set of ten random integers:

18  11  7   10  24  1   3   20  20  13

My test client reads the input into an array, sorts the array, and then repeatedly calls indexOf with each query as the key. I’ll use two example keys to demonstrate how the algorithm works in different cases.

Let’s start by looking for the key 18. The visualization output looks like this after the first step:

mid = 0 + (9 - 0) / 2 = 4

    0    1    2    3    4    5    6    7    8    9
    1    3    7   10   11   13   18   20   20   24
...lo.............................................
......................mid.........................
................................................hi
................................key...............

key is higher than mid, so set lo to mid+1 = 5

Here’s the purpose of each line (not including the blank lines):

Line 1: The mid calculation

Each iteration of the binary search algorithm roughly splits the remaining search space into two halves (that’s the binary part of binary search). The mid element doesn’t appear in either half. If mid points to the key, then we’re done. Otherwise, we keep searching.

To find the midpoint, we use the following formula:

mid = lo + (half of the distance between lo and hi)

For an array with an odd number of elements, this formula should match your intuitive understanding of how to find a middle element. For example, if your array had 9 elements, the first step in the algorithm would have lo = 0 and hi = 8. So mid would be $0 + (8-0)/2 = 4$. With the midpoint at element 4, there are four elements to the left (0..3) and four to the right (5..8) of the midpoint.

In the example above, the array has an even number of elements, which means the distance between the first and last element is an odd number, so it’s not divisible by two. Java and other languages solve this problem by using integer division when dividing two integers. This has the effect of discarding any remainder. For the midpoint calculation above, the integer division $9/2$ results in a split with four elements to the left of the midpoint and five to the right. This is fine. The algorithm works with an unequal split, and since the split is off by at most one, the total number of iterations remains $O(\log_2 n)$

Line 2: Array element positions

lo, hi, and mid contain positions in the array, not the array data. These position values, of course, start with zero. Line 2 shows the position of each element in the array.

Line 3: Array data

When deciding which half of the array to search next, we have to compare array data, not array positions. Since the array is sorted, we know that the elements in the left half are smaller than the element at the midpoint, and the elements in the right half are larger.

You may notice that this explanation doesn’t exactly work with the given test data, because there is a duplicate element — the 20 at positions 7 and 8. This algorithm doesn’t officially support duplicate elements, but it’s instructive to observe its behavior when 20 is passed as the key. (It returns the position of the first 20).

Lines 4-7: Control variables

Lines 4-7 show the positions of the four control variables. lo and hi define the range of array elements we are currently searching. At each iteration, the size of this range is halved, which accounts for the $O(\log_2 n)$ runtime of the algorithm. mid is the approximate midpoint of the range, as described above. key is the position of the key, which of course we wouldn’t know in advance. When mid and key point to the same element, the search is over. If the lo positions is higher than the hi position, then the key doesn’t exist in the array.

Line 8: What to do next

Line 8 describes how and why to adjust the control variables for the next iteration. There are three possibilities, based on the relationship between key and the element that mid points to:

  • key is smaller than the mid element: Set hi to mid-1, the next smaller element. The next iteration will search from lo to mid-1.
  • key is larger than the mid element: Set lo to mid+1, the next larger element. The next iteration will search from mid+1 to hi.
  • key is equal to the mid element: Return mid, the position of the key.

After carrying out one of these three actions, it’s possible that the control variables will be in an invalid state. Our algorithm needs to detect this state and return an appropriate value. I’ll show an example of this below.

Finding 18

Before we look for an element that doesn’t exist, here’s the rest of the process of finding key = 18:

mid = 5 + (9 - 5) / 2 = 7

    0    1    2    3    4    5    6    7    8    9
    1    3    7   10   11   13   18   20   20   24
............................lo....................
.....................................mid..........
................................................hi
................................key...............

key is lower than mid, so set hi to mid-1 = 6

At the end of the previous iteration, we set lo to 5, since the key was higher than the midpoint element. So we’re now searching the rightmost half of the array, a subset of five elements. The odd number of elements allows us to pick an exact midpoint, position 7. But we haven’t yet found the key, so we’ll need at least one more iteration. Since key is now to the left of the midpoint, we adjust the hi position according to the rules.

mid = 5 + (6 - 5) / 2 = 5

    0    1    2    3    4    5    6    7    8    9
    1    3    7   10   11   13   18   20   20   24
............................lo....................
...........................mid....................
.................................hi...............
................................key...............

key is higher than mid, so set lo to mid+1 = 6

With lo now at position 5 and hi at 6, there’s no room in the middle for the midpoint element. But our integer division process handles this fine, and mid ends up at position 5 along with lo, since $(6-5)/2$ truncates to $0$. We didn’t quite find the key, so we’ll need another iteration.

key is higher than mid, so set lo to mid+1 = 6

mid = 6 + (6 - 6) / 2 = 6

    0    1    2    3    4    5    6    7    8    9
    1    3    7   10   11   13   18   20   20   24
.................................lo...............
................................mid...............
.................................hi...............
................................key...............

key is equal to mid; found it
index: 6
value: 18
----------

There’s nowhere for the key to hide in this iteration, and we end up with all four control variables pointing to the key position. This doesn’t always happen in the last step of the algorithm, but it’s a common configuration. Only mid and key are required to be at the same position for the algorithm to terminate.

(Not) Finding 25

Now that we know how the algorithm in a normal case, let’s find out what happens when we provide a key value that doesn’t exist in the array, like 25.

mid = 0 + (9 - 0) / 2 = 4

    0    1    2    3    4    5    6    7    8    9
    1    3    7   10   11   13   18   20   20   24
...lo.............................................
......................mid.........................
................................................hi
..................................................

key is higher than mid, so set lo to mid+1 = 5

As you can see, the key indicator doesn’t show up in the diagram, since there is no 25 in the array. But my visualization algorithm only knows this because it cheats (looks ahead). The binary search algorithm happily determines that key = 25 is higher than a[mid] = 11, so it continues to search the rightmost half of the array.

mid = 5 + (9 - 5) / 2 = 7

    0    1    2    3    4    5    6    7    8    9
    1    3    7   10   11   13   18   20   20   24
............................lo....................
.....................................mid..........
................................................hi
..................................................

key is higher than mid, so set lo to mid+1 = 8

So far, so good. key is still higher.

mid = 8 + (9 - 8) / 2 = 8

    0    1    2    3    4    5    6    7    8    9
    1    3    7   10   11   13   18   20   20   24
...........................................lo.....
..........................................mid.....
................................................hi
..................................................

key is higher than mid, so set lo to mid+1 = 9

We still haven’t found the key. Maybe it’s farther to the right?

mid = 9 + (9 - 9) / 2 = 9

    0    1    2    3    4    5    6    7    8    9
    1    3    7   10   11   13   18   20   20   24
................................................lo
...............................................mid
................................................hi
..................................................

key is higher than mid, so set lo to mid+1 = 10

lo, mid, and hi are all at the rightmost edge of the array, but key is still higher. Following the rules, we set lo to mid + 1 = 10. This is outside the bounds of the array, but the actual check that the algorithm uses is lo <= hi. Since this condition is false, the loop exits, and we return the special value -1 to indicate that the key was not found.

lo = 10, hi = 9
lo and hi have crossed; key isn't in the list
index: -1
------------------------------

Binary Search

It’s useful to take apart even a simple algorithm like iterative binary search. Though the visualization may make it easy to understand, coming up with this algorithm from scratch would require careful thinking about loop conditions and how to calculate array positions.

Once you understand the basic algorithm well, you can extend it to cover binary search scenarios that the standard implementation doesn’t handle. More on that in a future post.

(Image credit: Pleuntje)

Categories: Competitive Programming

Prev
Next

Stay in the Know

I'm trying out the latest learning techniques on software development concepts, and writing about what works best. Sound interesting? Subscribe to my free newsletter to keep up to date. Learn More
Unsubscribing is easy, and I'll keep your email address private.

Getting Started

Are you new here? Check out my review posts for a tour of the archives:

  • 2023 in Review: 50 LeetCode Tips
  • 2022 in Review: Content Bots
  • 2021 in Review: Thoughts on Solving Programming Puzzles
  • Lessons from the 2020 LeetCode Monthly Challenges
  • 2019 in Review
  • Competitive Programming Frequently Asked Questions: 2018 In Review
  • What I Learned Working On Time Tortoise in 2017
  • 2016 in Review
  • 2015 in Review
  • 2015 Summer Review

Archives

Recent Posts

  • Do Coding Bots Mean the End of Coding Interviews? December 31, 2024
  • Another Project for 2024 May 8, 2024
  • Dynamic Programming Wrap-Up May 1, 2024
  • LeetCode 91: Decode Ways April 24, 2024
  • LeetCode 70: Climbing Stairs April 17, 2024
  • LeetCode 221: Maximal Square April 10, 2024
  • Using Dynamic Programming for Maximum Product Subarray April 3, 2024
  • LeetCode 62: Unique Paths March 27, 2024
  • LeetCode 416: Partition Equal Subset Sum March 20, 2024
  • LeetCode 1143: Longest Common Subsequence March 13, 2024
Red-Green-Code
  • Home
  • About
  • Contact
  • Project 462
  • CP FAQ
  • Newsletter
Copyright © 2025 Duncan Smith