Protocol Buffer first usage high latency

Yanick Salzmann :

In one of our Java applications we have quite a few protocol buffer classes, and the jar essentially exposes one interface with one method that is used by another application. We have noticed that the first time this method is called, the invocation time is quite high (>500 ms), while subsequent calls are much faster (<10 ms). At first we assumed this had something to do with our code, but after profiling we could not confirm this. Through a process of elimination it became obvious that it has something to do with protocol buffers.

This was further confirmed when a different application, which works completely differently but also uses protocol buffers, showed the same behavior. Additionally, we tried creating a dummy instance (XY.newBuilder().build()) of all the protocol buffer classes at startup, and with each one we added we could see the overhead of the first invocation drop.
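The startup trick described above can be sketched generically. The proto builder calls are only shown in comments, since the generated classes are specific to each application; the `Supplier` list below is a hypothetical stand-in that triggers the same class-loading/initialization work at startup.

```java
import java.util.List;
import java.util.function.Supplier;

// Generic warm-up helper: builds one dummy instance per message type at
// startup so class loading and static initialization happen before the
// first real request. Returns the number of types warmed up.
public class ProtoWarmup {
    static int warmup(List<Supplier<?>> builders) {
        int warmed = 0;
        for (Supplier<?> builder : builders) {
            Object dummy = builder.get();   // forces class init of the type
            if (dummy == null) {
                throw new IllegalStateException("warm-up builder returned null");
            }
            warmed++;
        }
        return warmed;
    }

    public static void main(String[] args) {
        // In a real application these would be generated protobuf types, e.g.:
        //   () -> MyRequest.newBuilder().build(),
        //   () -> MyResponse.newBuilder().build()
        List<Supplier<?>> targets = List.of(
                () -> new StringBuilder().toString(),
                () -> Integer.valueOf(0));
        System.out.println("warmed up " + warmup(targets) + " types");
    }
}
```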

In .NET I found a question that shows a similar problem (Why is ProtoBuf so slow on the 1st call but very fast inside loops?), but the solution there seems to be specific to C#, relying on precompiled serializers. I couldn't find the same issue discussed for Java so far. Are there workarounds like the one shown in that question that apply to Java?

Karol Dowbecki :

The JVM ships with a just-in-time (JIT) compiler that does a lot of optimization to your code. You can dig into JVM internals if you want to understand it further. There will be class loading and unloading, performance profiling, code compilation and de-compilation, biased locking, etc.

To give you an example of how complex this can get, as per this article, OpenJDK has two compilers (C1 and C2) with five possible tiers of code compilation:

Tiered compilation has five tiers of optimization. It starts in tier-0, the interpreter tier, where instrumentation provides information on the performance critical methods. Soon enough the tier 1 level, the simple C1 (client) compiler, optimizes the code. At tier 1, there is no profiling information. Next comes tier 2, where only a few methods are compiled (again by the client compiler). At tier 2, for those few methods, profiling information is gathered for entry-counters and loop-back branches. Tier 3 would then see all the methods getting compiled by the client compiler with full profiling information, and finally tier 4 would avail itself of C2, the server compiler.
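You can observe this effect directly with a micro-benchmark: the first invocation of a method runs interpreted, while repeated invocations get compiled. The sketch below uses a synthetic workload as a stand-in for the protobuf call; the absolute numbers depend entirely on your machine and JVM flags.

```java
// Minimal sketch showing first-call vs. warmed-call latency, illustrating
// tiered compilation. The workload is synthetic; real numbers will vary.
public class WarmupTiming {
    // Stand-in for the "expensive" method (e.g. the first protobuf call).
    static long work(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) acc += (long) i * i;
        return acc;
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        work(1_000_000);                        // cold: interpreter / tier 0
        long first = System.nanoTime() - t0;

        for (int i = 0; i < 10_000; i++) {
            work(1_000);                        // let C1/C2 compile the method
        }

        long t1 = System.nanoTime();
        work(1_000_000);                        // warm: compiled code
        long warmed = System.nanoTime() - t1;

        System.out.printf("first=%dus warmed=%dus%n",
                first / 1_000, warmed / 1_000);
    }
}
```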

The takeaway here is that if you require predictable performance, you should always warm up your code by running some dummy requests after each deployment.

You did the right thing by creating dummy instances of all the protobuf objects you use, but you should take it a step further and warm up the actual method you are hitting.
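A hedged sketch of what that could look like: `Service` and `handle` below are hypothetical stand-ins for the jar's single exposed interface and method, and the dummy request would be whatever harmless input your real service accepts. Calling the real entry point exercises the full code path (serialization, parsing, your own logic), not just the message classes.

```java
// Warm up the actual entry point, not just the protobuf message classes.
// "Service" is a hypothetical placeholder for the jar's exposed interface.
public class EndpointWarmup {
    interface Service {
        String handle(String request);
    }

    // Invoke the real method repeatedly at startup so the JIT compiles the
    // whole call path. Returns the number of successful dummy calls.
    static int warmup(Service service, String dummyRequest, int iterations) {
        int ok = 0;
        for (int i = 0; i < iterations; i++) {
            if (service.handle(dummyRequest) != null) {
                ok++;
            }
        }
        return ok;
    }

    public static void main(String[] args) {
        Service svc = req -> "echo:" + req;   // placeholder implementation
        int ok = warmup(svc, "dummy", 1_000);
        System.out.println("warm-up done, " + ok + " calls succeeded");
    }
}
```

Running a few hundred dummy requests like this after each deployment, before real traffic arrives, is the usual way to make first-call latency predictable on the JVM.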
