Lessons from Competitive Programming 3, Chapter Two


This post is part of a series of commentaries covering each chapter of Competitive Programming 3 by Steven and Felix Halim. You can find the rest of the posts in the series by visiting my CP3 post category.

Competitive Programming 3 (CP3) Chapter 2 covers a selection of data structures that are useful for solving competitive programming problems. These data structures are used in the Chapter 2 uHunt problems, and in subsequent chapters in the book. In some cases, Chapter 2 just provides a brief introduction to topics that are covered extensively elsewhere. For example, Section 2.4.1 covers some basic information about representing a graph, while Chapter 4 contains the book’s full treatment of graph algorithms. And Section 2.2 devotes a paragraph each to common data structures like arrays, stacks and queues, which readers are expected to know about already. In other cases, Chapter 2 goes into more depth. For example, it’s harder to find information about the Segment Tree data structure in other sources, so it is covered in more detail in this chapter.

As in the rest of the book, each section in this chapter ends with a list of recommended UVa Online Judge problems. The same list can be found on uHunt, where you can also track your progress.

In addition to UVa OJ problems, each section also has written problems. For example, a question in the section on stacks and arrays asks the reader how to implement a stack using an array. Answers are supplied for some of these problems at the end of the chapter. I have just been skimming these written problems. The appeal for me of the CP3/uHunt approach is that problems are “graded” automatically by a computer. Solving problems and then looking up answers at the end of the chapter is the traditional textbook self-study approach. While that can be useful, especially if you have a peer or instructor to review your written answers, my goal in using this book is learning through coding. I found plenty of opportunities to learn about the Chapter 2 data structures while I was solving the corresponding uHunt problems.

Selecting a Data Structure

In the chapter overview, the authors argue that for competitive programming, it’s more important to know the interface and performance characteristics of a data structure than the details of how it is implemented. That matches my experience with the Chapter 2 problems. For the problems that I found difficult, the challenge was usually inherent in the problem, and not related to understanding a data structure. For example, it may be tricky to figure out that UVa 10107 – What is the Median? can be solved using priority queues, but once you have that idea, you can solve the problem without knowing how priority queues are implemented.
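To make the priority queue idea concrete, here is the classic two-heap approach to a running median, one common way to attack a problem like UVa 10107. This is my own sketch, not the book's code or my actual contest solution; the class and method names are mine.

```java
import java.util.Collections;
import java.util.PriorityQueue;

// Running median using two priority queues: a max-heap for the lower
// half of the values and a min-heap for the upper half.
public class RunningMedian {
    private final PriorityQueue<Long> lower =
            new PriorityQueue<>(Collections.reverseOrder());  // max-heap
    private final PriorityQueue<Long> upper = new PriorityQueue<>();  // min-heap

    public void add(long x) {
        if (lower.isEmpty() || x <= lower.peek()) lower.add(x);
        else upper.add(x);
        // Rebalance so the halves differ in size by at most one,
        // with any extra element living in `lower`.
        if (lower.size() > upper.size() + 1) upper.add(lower.poll());
        else if (upper.size() > lower.size()) lower.add(upper.poll());
    }

    public long median() {
        if (lower.size() > upper.size()) return lower.peek();
        return (lower.peek() + upper.peek()) / 2;  // integer average when the count is even
    }
}
```

The point of the example is the one made above: you can write this knowing only the priority queue interface (`add`, `peek`, `poll`) and that each operation is $O(\log n)$, without knowing how the underlying heap is implemented.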

The exception to this rule: For problems that use data structures without standard implementations, it is necessary to know the implementation details, since you usually have to customize the data structures. UVa 11402 – Ahoy, Pirates! is a good example from this chapter. Neither the segment tree implementation provided in the chapter, nor others that I found online (excluding those that specifically targeted this problem), was sufficient to solve the problem. I had to write my own implementation.

Even for problems that only use standard data structures, there is an indirect benefit to studying implementation details: All of these data structures were invented to solve problems. Learning how famous problems were solved using new data structures can provide problem-solving ideas for the less famous problems found in programming competitions. For example, the CP3 authors note that many of the Chapter 2 data structures use the Divide and Conquer strategy. Studying this strategy in the context of standard data structures like Binary Search Trees helps you learn it so that you can use it for other purposes.

Standard Data Structures

Sections 2.2 and 2.3 cover data structures that are available in the C++ Standard Library (don’t call it the STL) and the Java Class Library (also called the Java API). Not coincidentally, these data structures are also covered in introductory algorithms classes, like those found on Coursera or in the first year of college computer science programs. Here’s the familiar list: array, resizeable array, linked list, stack, queue/deque, binary search tree, heap, and hash table.

The two sections on standard data structures provide a brief review of each data structure, and some tips on using them for competitive programming. The review is just to set some context for readers who have already studied algorithms and data structures. The tips are part of what makes this book a useful supplement to a standard algorithms textbook. Here are some examples:

  • Problem statements often specify sizes and ranges, so when you can, use a fixed-size array rather than the somewhat slower resizeable array.
  • While you will rarely need to write your own sorting algorithm, the concepts used in these algorithms can be useful to know for solving other types of problems.
  • Competitive programming problems rarely use linked lists. However, you will encounter them in coding interviews.

One topic that gets more coverage in this chapter than in standard algorithm references is bit manipulation. After a brief mention of the bitset (C++) and BitSet (Java) classes, the authors spend a few pages on how to use bitwise operations on integers to efficiently store and manipulate strings of bits. While this is not something that most developers need to worry about for real-world code, it comes up often enough in contest problems that it’s worth learning.
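The core of the technique is treating an integer as a set of up to 32 (or, with a long, 64) Boolean values, with one bitwise operation per set operation. A minimal sketch of the standard idioms:

```java
// Using an int as a set of booleans: bit i of `set` records whether
// element i is present. This is the bitmask technique the chapter covers.
public class BitmaskDemo {
    public static void main(String[] args) {
        int set = 0;                               // empty set
        set |= (1 << 3);                           // add element 3
        set |= (1 << 5);                           // add element 5
        boolean hasThree = (set & (1 << 3)) != 0;  // test membership
        set &= ~(1 << 3);                          // remove element 3
        set ^= (1 << 5);                           // toggle element 5 (now removed)
        int size = Integer.bitCount(set);          // count elements
        System.out.println(hasThree + " " + size); // prints "true 0"
    }
}
```

Because each operation is a single machine instruction (or close to it), a bitmask is both faster and smaller than a `boolean[]`, which is why it shows up so often in contest solutions.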

Hash Tables and Binary Search Trees

The authors make an interesting observation about hash tables. They contend that hash tables should not be used in programming contests unless no other option is available. Their reasoning: the difficulty in designing an appropriate hash function outweighs the performance advantage that a hash table-based data structure has over the alternatives.

Here’s the argument: The performance of a hash table is closely tied to how well its associated hash function distributes keys into buckets. When the hash function does this well, there are few collisions, and a hash table implementation can carry out search, insert, and delete operations in $O(1)$ (constant) time. At the opposite extreme, frequent collisions cause these operations to take $O(n)$ (linear) time, since the code needs to scan through long lists of elements in each bucket.

An alternative to a hash table is a self-balancing binary tree — for example, a red-black tree. This data structure supports the same operations as a hash table (search, insert, delete), but the worst-case performance for these operations is $O(\log n)$ rather than the hash table’s $O(n)$. This makes it less likely that a bad design will result in a solution that exceeds the time limit.

What about the best-case performance of a self-balancing BST? It’s also $O(\log n)$. But the authors argue that since a programming competition problem typically has an upper limit of $n = 10^6$, the difference between $O(1)$ and $O(\log n)$ won’t be important. Self-balancing binary trees are designed to keep their height within a constant factor of the minimum tree height, which in this example is $\lfloor \log_2(10^6)\rfloor = 19$. This puts a reasonable upper bound on the time required to carry out the three operations in a programming problem context.

In my experience with the uHunt problems so far, I haven’t run into any performance difficulties related to hash functions. In Chapter 2, I used HashMap in problems 599, 10507, 11503, 11572, and 11991. I ran into performance challenges in problems 732, 11340, 11402, 11572, and 12356, but most of these didn’t use hashing. For the one that did (11572), performance problems weren’t related to the HashMap.

The CP3 hints even recommend hashing for two problems in this chapter, 10226 and 11849, but I solved those using TreeMap and TreeSet, respectively. My approach is to use search trees when they provide a feature that is useful for the problem — such as returning data in sorted order, or efficiently iterating through a list of values — and use hash tables otherwise. As I work through the remaining chapters, I’ll be keeping an eye out for situations where hash function performance might be a concern.
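Those search-tree features — sorted iteration and nearest-key lookups — are what tip the balance toward TreeMap for some problems. A generic sketch (this is illustrative Java, not my solution to 10226 or 11849):

```java
import java.util.TreeMap;

// TreeMap keeps its keys in sorted order and supports O(log n)
// nearest-key queries -- features a HashMap doesn't offer.
public class TreeMapDemo {
    public static void main(String[] args) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        counts.merge("banana", 1, Integer::sum);
        counts.merge("apple", 2, Integer::sum);
        counts.merge("cherry", 1, Integer::sum);

        // Iteration order is sorted key order, with no extra sorting step.
        System.out.println(counts.keySet());        // [apple, banana, cherry]
        System.out.println(counts.firstKey());      // apple

        // Nearest-key lookup: smallest key >= "b".
        System.out.println(counts.ceilingKey("b")); // banana
    }
}
```

Every operation here is $O(\log n)$, which, as argued above, is fast enough at contest input sizes that the hash table's $O(1)$ average case rarely matters.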

Custom Data Structures

The largest section in Chapter 2 — Section 2.4 — covers data structures that aren’t available in the C++ Standard Library or the Java Class Library. It includes descriptions and implementations for graphs, union-find disjoint sets, segment trees, and binary indexed (Fenwick) trees. The first two enjoy plenty of coverage in standard algorithms references, but the last two are nowhere to be found even in encyclopedic references like Sedgewick and CLRS.

The CP3 authors want their book to be a one-stop reference for beginning competitive programmers, so they discuss the full set of topics that they consider useful for that audience. But they include more details about topics that have less coverage in standard references. Is there enough detail for those topics? Here’s my experience: I found the discussion of both segment trees and Fenwick trees to be useful. The Fenwick tree section provided more than enough information for me to solve problem 12532. For problem 11402, I did a lot of research on segment trees beyond what was included in section 2.4.3 of this book.
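For readers who haven't met the Fenwick tree, here is a minimal point-update/prefix-sum version along the lines of the structure the book presents (the class and method names are mine, and this is a sketch rather than a contest-hardened implementation):

```java
// Fenwick (binary indexed) tree supporting point updates and prefix
// sums in O(log n). Positions are 1-based, as is conventional.
public class Fenwick {
    private final long[] tree;

    public Fenwick(int n) { tree = new long[n + 1]; }

    // Add v to position i.
    public void update(int i, long v) {
        for (; i < tree.length; i += i & (-i)) tree[i] += v;
    }

    // Sum of positions 1..i.
    public long query(int i) {
        long s = 0;
        for (; i > 0; i -= i & (-i)) s += tree[i];
        return s;
    }

    // Sum of positions a..b, as a difference of two prefix sums.
    public long query(int a, int b) { return query(b) - query(a - 1); }
}
```

The expression `i & (-i)` isolates the lowest set bit of `i`, which is what lets both loops finish in $O(\log n)$ steps — a nice example of the bit manipulation discussed earlier being used inside a data structure.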

While CP3 is a useful guide through the topics required for a beginner to get better at programming contests, it’s not intended to replace other books on algorithms, data structures, and math. It’s helpful to have those other books around, and CP3 provides a good bibliography, which the authors refer to throughout this chapter.

CP3 Chapter 2 in Review

The CP3 target audience consists of programmers who have either completed a formal first-year algorithms and data structures class, or who are self-motivated enough to learn the material on their own. With that background in mind, the purpose of Chapter 2 is to steer the reader towards the topics that will be most useful for solving competitive programming problems. There’s no way a 35-page chapter can replace a semester’s or quarter’s worth of algorithms education, especially with eight of those pages devoted to two rather specialized data structures. So the goal in reading the chapter should be to understand where the authors think you should spend your time.

Here’s how I would summarize their advice:

  • Learn the syntax and usage of the standard data structures in your preferred language. Focus on data structures based on search trees over those based on hash tables.
  • Learn how to use bitwise operations as a lightweight way to represent and manipulate a set of Boolean values.
  • Find or build good implementations of data structures that don’t come with your language: graph representations (adjacency matrix, adjacency list, and edge list); disjoint set; segment tree; and Fenwick tree.
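As an example of the "find or build" advice, here is a common sketch of the disjoint-set structure from that list, with path compression and union by size. The details are mine, not copied from the book:

```java
// Union-find (disjoint sets) with path halving and union by size.
// Both find and union run in near-constant amortized time.
public class DisjointSets {
    private final int[] parent, size;

    public DisjointSets(int n) {
        parent = new int[n];
        size = new int[n];
        for (int i = 0; i < n; i++) { parent[i] = i; size[i] = 1; }
    }

    // Return the representative of x's set, flattening the path as we go.
    public int find(int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];  // path halving
            x = parent[x];
        }
        return x;
    }

    // Merge the sets containing a and b; return false if already merged.
    public boolean union(int a, int b) {
        int ra = find(a), rb = find(b);
        if (ra == rb) return false;
        if (size[ra] < size[rb]) { int t = ra; ra = rb; rb = t; }
        parent[rb] = ra;
        size[ra] += size[rb];
        return true;
    }
}
```

Having a tested version of each of these structures in a personal code library, ready to paste and customize, is exactly the kind of preparation the chapter is steering readers toward.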

As always, solving practice problems is an essential part of learning this material. But before you can use a data structure in your solution, you have to know that it exists and how it works. CP3 Chapter 2 is a good survey of the options.