Profiling Java Programs with VisualVM

VisualVM

Last week, I wrote about ways to improve runtime performance for a Java solution to UVa 732. This week I’m going to cover a process for analyzing Java program performance using the profiling features of the VisualVM Java troubleshooting tool. But first, a note about profiling. As I mentioned last week, even a highly optimized program can fail to be accepted by an online judge if it uses the wrong algorithm. In other words, you generally can’t profile your way out of a slow design. Nevertheless, looking at a VisualVM snapshot of your implementation can provide some useful performance insights. Just don’t spin your wheels for too long making tiny adjustments to your implementation. You may need to go with a different design instead.

VisualVM

VisualVM has a number of capabilities, but I’m going to focus on its CPU profiling feature. The purpose of a CPU profiler is to determine how much time the CPU is spending when it executes each part of a program. Using that information, you can focus your optimization efforts where they will have the most impact. Profiling can also help just by showing you how actual Java code performs. This can improve your intuition about how to write faster code.

You can get VisualVM from its download page. As the download page explains, VisualVM is available in two distributions:

A stable version called Java VisualVM, which comes as a JDK tool.
A “bleeding-edge” version called VisualVM with “the latest features and bug fixes.” This version is available from the download page.

I decided to go with the bleeding-edge version, which at the time of this writing (July 2015) is at version 1.3.8. I’m using it on a Windows machine.

VisualVM is not the most full-featured Java profiler. There are commercial profilers that are fancier and potentially easier to use. But VisualVM is free and widely used, so support is easy to find.

Setup Steps

The VisualVM download page has a list of setup steps, which mainly consist of unzipping the archive and starting the program.

I also recommend installing one specific plugin. A quirk of standard VisualVM is that it wants to attach to a Java virtual machine that is already running. This is inconvenient for profiling a UVa solution. You have to run your program first and then attach the profiler, at which point your program has probably finished executing, or at best you have missed measuring the first part of the execution. A plugin called Startup Profiler, which you can find in the Tools | Plugins menu, resolves this problem by letting you start the profiler before you start your program. I’ll be using it in the examples below.

Profiling a UVa OJ Solution

If you have a UVa OJ solution that the online judge is rejecting because it doesn’t execute within the official time limit, it may be worthwhile to try using the VisualVM profiler to diagnose the problem. You may have a bottleneck in your program that is pushing you over the limit. However, if you are significantly over the time limit, then profiling and tuning probably won’t help. But there’s no way of knowing how far over you are, since an online judge will just kill your program once it reaches the limit. So you have to make an educated guess about whether profiling is worth your time.

For effective profiling of a programming puzzle solution, you need to create an input file to test with. A good input file is not too small and not too large. If your input is too small, the profiler output won’t be as accurate as it could be, since program startup will take a disproportionate amount of the total runtime. If it is too large, you’ll have to wait too long for results, and won’t be able to efficiently try performance tuning ideas. I would recommend choosing input that makes your program run for a second or so at most, without the profiler running. Making it run longer won’t give you much benefit, and when you run it under the profiler, it will take longer due to the extra overhead. Also, competitive programming solutions are intended to run about that long, so that will be the most representative input size.

When you have your program and input ready, here is the process for getting profiler results:

Start VisualVM
Click the Startup Profiler toolbar button at the far right of the toolbar. This button will show up once you install the Startup Profiler plugin. The button looks like this (except smaller and less fuzzy):

StartupProfilerButton

Step through the Startup Profiler settings in the dialog that pops up, using the following steps:
For a UVa OJ program, enter Main in the Start profiling from classes setting, since this is the class name that UVa OJ expects.
Clear out the Do not profile classes setting, since otherwise you won’t get information about the performance of library methods. You care about these methods not because you’ll be improving their code, but because you’re calling them, and they use up some of your time limit. For example, if you determine that the java.util.HashMap.get method is consuming 25% of your total runtime, that might prompt you to see if you can call it less frequently.
You can leave the remaining settings at their default values. Here’s what the dialog box looks like on my machine:

ProfileStartupDialog

Click the Copy to clipboard button to get a copy of the agentpath parameter.
Press the Profile button to start the profiler. VisualVM will wait for your program (“target VM”) to start.
On your command line, paste the agentpath parameter between java and Main. For example, your command line might look like this:

java -agentpath: C:/visualvm_138/profiler/lib/deployed/jdk16/windows/profilerinterface.dll= C:\visualvm_138\profiler\lib,5140 Main < input.txt > output.txt

Run the command line. VisualVM should detect that your program is running, and start profiling it. Don’t be alarmed if you see a lot of “Profiler engine warning” messages like “JVMTI classLoadHook: class name is null.” Apparently that’s normal.
When your program completes, VisualVM will pop up an Application Finished dialog asking you if you want to take a snapshot of the collected results. If the profiler indicates that it is still working — which is likely to happen if your program runtime is very short — wait for it to finish. Then click Yes.

ApplicationFinished

In the top-level list of methods shown in the snapshot, you should see your main method. Expand that to analyze your results.

Snapshot

Analyzing a Profiler Snapshot

In the VisualVM Snapshot tab, as shown in the image above, you’ll see a report in the form of a tree of method calls. For each method call, the report provides the method’s execution time in milliseconds, the percentage of the total runtime taken by that method, and the number of times the method is invoked.

At each tree node, you’ll see one of three types of identifiers:

One of your own methods: If you see a method name from your program, then this tree node collects runtime information about a method that you have written. This helps you identify which parts of your program are taking the most time, so that you can optimize them.
A Java library method: Unlike your own methods, you can’t directly optimize Java library methods. However, it’s useful to see how much time they are taking, since it affects your overall program runtime. To save time in this category, you have to call these methods less frequently, or use an alternative method. For example, if you find that List.contains is taking up a large percentage of your program runtime, you could switch to HashMap.contains if you don’t need to store duplicate values.
“Self time”: This indicates the time taken up by core language features, like incrementing variables. Core language features tend to be fast compared to library calls, but self time can still represent a high percentage of total program runtime. Self time is essentially a “miscellaneous” category that covers every line of code that isn’t a method call. For a long method that calls many other different methods, self time can add up. For this reason, it can be helpful to break up your code into smaller methods, which VisualVM will then report on separately. Avoiding long methods is standard coding advice, and it also helps evaluate your program in the profiler.

An Example From UVa 732

I ran the VisualVM profiler on the “simple and slow” UVa 732 solution that I provided pseudocode for last week. This is a portion of the result:

SimpleAndSlow

Here are a few things to note about this example:

parseNextTestCase is the method in my solution template where I put the code for solving a test case. Note that it runs 2002 times, which is related to the number of test cases in the input file I used for this example. And it consumes 99.9% of the program runtime, so there’s no need to look elsewhere for optimizations.
Self time takes 20.2% of the method runtime, a significant fraction. Since it can be difficult to tell what is contributing to this runtime, it would be helpful from a profiling perspective to split parseNextTestCase into multiple methods.
The next two methods by total time are versions of StringBuilder.append, a built-in Java method, which take a combined 34.2% of the method runtime. Although StringBuilder.append is efficient, the number of invocations is almost 8 million, which is a lot for only 2000 test cases. Clearly this implementation needs some work.

Here is the profiler trace for a UVa 732 solution that is optimized, though still not fast enough to be accepted by the online judge.

Optimized

Notice that the overall runtime has decreased considerably, at less than a second vs. over 14 seconds for the previous implementation. Not coincidentally, the number of invocations for the top methods is also considerably less, under one million.

Finally, here is a snapshot of the last implementation that I discussed last week, the recursive approach.

Recursive

The runtime for Main is now down to under 200 milliseconds. The profiler graphically illustrates how the recursive method, buildPath, is called. In this implementation, StringBuilder.append (which is used to construct the output) is one of the methods with the smallest total runtime. And it is called only about 14000 times, rather than hundreds of thousands. It’s not hard to see why this implementation is so fast.

Total Duration

Ultimately what you care about for resolving a Time Limit verdict is reducing your total program runtime. Adjusting individual methods based on results from the profiler is just a means to that end. Therefore, it can be handy to have a total time measurement The profiler provides this, but you can also calculate it in your code, which is more convenient at times when you don’t need the profiler output. At the top of your main method, just add this line:

long startTime = System.nanoTime();

And at the bottom of your main method, add this line:

System.err.println((System.nanoTime() - startTime)/1000000);

System.nanoTime reports time in nanoseconds (billionths of a second), so dividing by one million gives you milliseconds (thousandths of a second), which is easier to read and precise enough for optimization purposes. You can also use System.currentTimeMillis to get milliseconds directly.

By using System.err.println instead of System.out.println, you can see your duration result even if you redirect your program output to a file. This is helpful for a quick check after you make an optimization. Don’t forget to also diff your output against known good output, to make sure your optimization hasn’t introduced a bug.

A couple of other notes about calculating runtime:

Your computer is doing other things while you’re measuring your program runtime, so you won’t get the same duration result on every run, even if you haven’t made any changes to your program. For best results, run your test a few times and take the median. A spreadsheet is helpful for this purpose.
If you’re using a laptop, your operating system may lower the CPU speed when you’re on battery power. This will obviously skew your results, so it’s best to do your performance tests while you’re plugged in so that you get consistent results.

A Few Reminders About Performance

For a short program like a UVa OJ solution, you can potentially get most of your performance results before you even start coding. Halim and Halim recommend using algorithm analysis techniques to evaluate your design and ensure that 1) It will run fast enough to be accepted, and 2) Your solution is not excessively sophisticated for the puzzle that you’re trying to solve. In other words, balance speed of execution with speed of implementation.

This is good advice, and is essential if you’re in a timed programming competition where you won’t have time to experiment with a profiler. Nevertheless, a tool like VisualVM can give you practical experience with the performance characteristics of a language.