Why is Java I/O Slow?

Tortoises

You can read a lot on Quora about the best language to use for competitive programming. Here are some of the points covered by those questions:

  • C and C++ execute quickly, and their macro support can reduce the amount of code that you end up typing in your solution.
  • A language like Java can be useful for problems with some specific requirements (such as integers that don’t fit in 64 bits, or calendar problems).
  • The main contributor to execution speed is the algorithm that you use, not the language that you choose to implement it. But the language can provide a performance edge at the margins.
  • Choice of language is less important for more recent contests, since problem setters have made an effort to level the playing field (e.g., by testing Java solutions to ensure that the time limit is sufficient).

I decided to use Java for Project 462, mainly because it’s similar to my primary language (C#), and I’m more interested in learning about problem solving and algorithms than learning a new language that I’m unlikely to use much outside of competitive programming.

One thing about the uHunt problems that I’m working through is that they draw from a database of contest problems going back to the 1990s. Competitive programming and programming languages have changed a lot since then, so what the problem setters were going for in the original contest may not match up with how a contestant sees the problem today. Modern contests tend to be more forgiving of the slower execution time of languages that aren’t C or C++, and they know about the fancy libraries that programmers have access to.

Last week I wrote about Solving UVa 11340 in Java, and covered some performance tips related to reading files that are larger than those typically found in programming puzzles. It turns out that the very next starred problem, UVa 12356: Army Buddies, is an even stricter test of I/O performance. And for this one, there is no particular hint in the problem statement that I/O will be an issue.

Once I came up with an Accepted solution for UVa 12356 in Java, I decided to do some more benchmarking, and use the results to update my solution template. Figuring all of this out once is educational, but I’d rather not be fiddling with I/O issues for every problem that happens to have large input or output requirements.

Output Tests

As with the UVa 11340 benchmark, I created a test file containing random text. But in this case, the file creation process was part of the experiment. For this test, I avoided any character set issues by using only characters 32 through 126, inclusive. I ran each experiment 9 times, and reported the median result. (In practice, there isn’t much variance between results for this test). I generated 100 million bytes (including newlines) to ensure a reasonable runtime for both writing and reading: large enough to provide good timing data, but small enough that all of the tests ran in a reasonable amount of time. I also generated a simple hash that I calculated by summing the character values of each character in the file. I calculated execution time by comparing the starting and ending System.nanoTime values. See the end of this post for a link to the source.

System.out.println

The first output test just uses standard System.out.print/println calls. Its median runtime on my machine is about 262 seconds. Most programming puzzles don’t involve 100 MB of output, so they wouldn’t get anywhere near this execution time. But for printing one character at a time (not an uncommon scenario in programming puzzles), it is nevertheless not a very efficient option.

BufferedWriter

Java provides an option that eliminates most of the inefficiency inherent in calling System.out.print 100 million times: the BufferedWriter class. Replacing calls to System.out.print with calls to BufferedWriter.write reduces the median test runtime to about 4.0 seconds, a 98.5% decrease.

BufferedWriter + StringBuilder

While BufferedWriter provides a significant performance increase over System.out.print, I found one more way to squeeze some additional performance out of output code. A Quora answer related to UVa 12356 suggested combining BufferedWriter and StringBuilder to further reduce overhead. The basic idea is to allocate a large StringBuilder and write it using BufferedWriter when it is mostly full, rather than making BufferedWriter do all of the work. My test using this approach ran in about 2.4 seconds, a 40.3% improvement over using BufferedWriter alone, and a 99.1% improvement over the System.out.print approach.

Clearly, output can be a huge bottleneck for puzzles that require a lot of it. So rather than worry about it on a case-by-case basis, it’s best just to encapsulate one of the buffered options in a re-usable template class, and use it for every problem.

Input Tests

With output taken care of, and a 100 MB test file available, it’s time to evaluate reading performance. For the reading test, I read each line of the file produced by one of the output tests, sum each character in the line, and report the grand total at the end of the file. This can be compared with the total calculated by the output test, as a sanity check that both tests see the same data.

Scanner

Java has many options for receiving input from stdin, and Scanner is a popular one. Using the Scanner approach (calling hasNextLine and nextLine for each line), the median test runtime on my machine is about 3.8 seconds.

BufferedReader

The BufferedReader test is almost identical to the Scanner test, except that the methods are called ready and readLine instead of hasNextLine and nextLine. The median runtime for this test is about 0.73 seconds, an 81.0% improvement over Scanner.

For this basic I/O test, there isn’t a useful parallel to the third write test. But as with BufferedWriter, using BufferedReader is significantly faster than the unbuffered approach. There may be some further gains to be made using separate threads for I/O and processing. I’ll look into that if I find a programming problem that requires it.

Summary

Here’s a summary of the performance results, rounded to 3 significant figures. “X% improvement” means the runtime has been reduced by X% of the previous test (e.g., 10 seconds to 6 seconds is a 40% improvement). The percentages are calculated using the underlying data from System.nanoTime, so they won’t match exactly if you try to calculate them yourself (due to rounding).

Write Tests

  1. System.out.println: 262 seconds
  2. BufferedWriter: 3.99 seconds (98.5% improvement)
  3. BufferedWriter + StringBuilder: 2.38 seconds (40.3% improvement, 99.1% total improvement)

Read Tests

  1. Scanner: 3.83 seconds
  2. BufferedReader: 0.729 seconds (81.0% improvement)

Source Code

If you want to try these tests on your own machine, or use the code for other purposes, it is in GitHub.

(Image credit: Mikhail Esteves)