Tuesday, November 27, 2012

Performance Is Not Obvious

Recently there was a post in the Xtext forum about the runtime performance of a particular function in the Xtext code base:

Ed Merks suggested to rewrite the method to a loop iteration instead of a recursive function and to save one invocation of the method eContainer such as the new implementation shall become at "least twice as fast."

I really liked the new form since is much easier to debug and to read and from that point a the change is definitely worth it. However, as I recently did a deep dive into the runtime behavior of the JVM, I doubted that the change would have to much impact on the actual performance of that method. So I took the time and sketched a small caliper benchmark in order to double check my intuition.

As it turns out, the refactored variant is approximately 5 to 20% faster for the rare cases where a lot of objects have to be skipped before the result was found and takes the same time for the common case where the requested instance is found immediately. So it's not even close to the expected improvements. But what's the take-away?

Before applying optimizations it's worthy to measure the impact. It may be intuitive to assume that cutting the number of method invocation down to a fraction of the original implementation - after all it was a recursive implementation before - saves a lot of time but actually the JVM is quite powerful and mature at inlining and producing optimized native code. So I can only repeat the conclusion of my talk about Java Performance (which I proposed for EclipseCon Boston, too):
  • [..] Write Readable and Clear Code. [..]
(David Keenan)
  • [..] slavishly follow a principle of simple, clear coding that avoids clever optimizations [..] (Caliper FAQ)
  • Performance advice has a short shelf-life 
(B. Goetz)
From that point of view, the refactored implementation is definitely worth it, even though there is no big impact on the runtime behavior of the method.


jfr said...

If only Java had tail call optimization ;-)

Funny. I think the recursive definition is more easy to read. Maybe a matter of taste...

Sebastian Zarnekow said...

Actually it depends on taste but it's a more intuitive way to describe the intent. The refactored version is quite ok, too and simplifies debugging / profiling since those tools are better with loops as with deep stacks.

Knut Wannheden said...

I didn't quite expect the new implementation to be much faster either. But just like you I prefer it because it simplifies debugging. Yet it is again amazing to see how clever the HotSpot seems to be. Maybe others, like IBM's J9, are not quite as clever.

BTW, where's the data for the 'Iter' test case?

Sebastian Zarnekow said...

The result for the iterator-based implementation can be seen if you go to http://microbenchmarks.appspot.com/run/Sebastian.Zarnekow@gmail.com/ContainerByTypeBenchmark/2080011 and enable it in the Variables section.

Knut Wannheden said...

Actually I think the most obvious code for this problem would have been:

T getContainerOfType(EObject object, Class clazz) {
while (!clazz.isInstance(object) && object != null)
object = object.eContainer();
return (T) object;

Interestingly enough it is in many cases noticeably slower than the for loop, but in the NotFound test it seems to be consistently faster: http://microbenchmarks.appspot.com/run/knut.wannheden@gmail.com/ContainerByTypeBenchmark.