Saturday 31 August 2013

Caching Classes from the ClassLoader?

Code for the article is here. I also submitted an enhancement patch to Tomcat bugtracker.

I noticed previously that the Tomcat WebappClassLoader is heavily serialized. In fact, the entire loadClass entry point is marked synchronized, so for any poorly designed libraries, the impact of this on scalability is pretty remarkable. Of course, the ideal is not to hit the ClassLoader hundreds of times per second but sometimes that's out of your control.

I decided to play some more with JMH and run some trials to compare the impacts of various strategies to break the serialization.

I trialled four implementations:
1) GuavaCaching - a decorator on WebappCL which uses a Guava cache
2) ChmCaching - a decorator on WebappCL which uses a ConcurrentHashMap (no active eviction)
3) ChmWebappCL - a modified WebappCL using ConcurrentHashMap so that loadClass is only synchronized when it reaches up to parent loader, classes loaded through current loader are found in local map
4) Out of the box Tomcat 8.0.0-RC1 WebappClassLoader - synchronized fully around loadClass method

The results; in operations per microsecond, where an operation is a lookup of java.util.ArrayList and java.util.ArrayDeque.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
GuavaCaching 3687 978 1150 1129 1385 1497 1607 1679 1777 1733 1834
ChmCaching 27241 51062 81678 107376 134798 162125 188192 213007 208034 210812 200231 211744 214431 215283 209782 212297
ChmWebappCL 185 48 81 81 83 81 84 85 85 85 80 84 82 83 83 84
WebappCL 181 69 91 92 91 92 100 98 95 94 94 95 102 102 95 98

And the explanation -
  • GuavaCaching seems remarkably slow compared to CHM. Might be worth investigating further. I also noticed significant issues with Guava implementation; some tests were running for extremely long time, seems there is an issue in the dequeue (quick look, it appears stalled in a while loop).
  • ChmCaching seems very effective; although it is caching classes loaded from parent and system loader. This seems OK per the API but unusual, I will have to check the API in more detail. Scales linearly with cores (it is an 8 core machine).
  • ChmWebappCL seemed to have less of an effect. This is likely because I am testing loading classes against core java.util.* rather than from a JAR added to the classloader. I expect ChmWebappCL can approach ChmCaching speed if I attach JARs directly to the class loader rather than passing through to system loader. (Going to system loader means entering the synchronized block).
  • WebappCL - very slow performance.

And pretty pictures. You can see that CHM caching is far and away the best of this bunch.

.


Same picture, at log scale -