Lock nesting and the "WhatsApp" bug.

In the developer preview of ART one bug that has stood out is the "WhatsApp" bug. This has a work-around in the new 4.4.1 Android release. Here are links to information about the bug:
http://www.whatsapp.com/faq/android/28000002
https://code.google.com/p/android/issues/detail?id=61916

In the bug comments it's said that the bug "is not the fault of the WhatsApp application", which is technically true, but the problem was caused by something the WhatsApp developers had done to their application with an obfuscation tool. ART will still produce warnings in the log because of WhatsApp's code. But why would ART care about this code when Dalvik does not? Let's look a bit more closely.

Java programs have synchronized blocks. These acquire and release a lock, so the program's instructions include ones to acquire a lock and ones to release it. Synchronized blocks can be nested, but the acquire and release operations should come in pairs that match the blocks in the program's source. ART's verifier improves on Dalvik's, in an effort to be aggressive with ahead-of-time optimization. As part of this, we verify that the lock acquire and release operations are paired. The validity of this check is shown by how stable ART has been in Android 4.4.0.
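
As a small illustration of that pairing, here is a sketch of two nested synchronized blocks. The class and field names are made up for this example. The comments mark where the compiler emits the acquire and release instructions (monitor-enter and monitor-exit in Dalvik bytecode), and Thread.holdsLock is used to observe which locks are held.

```java
class NestedLocks {
    // Hypothetical lock objects, only for illustration.
    private static final Object outer = new Object();
    private static final Object inner = new Object();

    // Reports which locks the current thread holds inside and after the blocks.
    static String run() {
        StringBuilder sb = new StringBuilder();
        synchronized (outer) {          // acquire outer (monitor-enter)
            synchronized (inner) {      // acquire inner (monitor-enter)
                sb.append(Thread.holdsLock(outer)).append(',')
                  .append(Thread.holdsLock(inner));
            }                           // release inner (monitor-exit)
        }                               // release outer (monitor-exit)
        sb.append(';')
          .append(Thread.holdsLock(outer)).append(',')
          .append(Thread.holdsLock(inner));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());      // prints "true,true;false,false"
    }
}
```

Each acquire has exactly one matching release, and the inner pair sits entirely inside the outer pair. That is the shape the verifier expects to find.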

The WhatsApp bug, and the current warning, is caused by an obfuscator. The obfuscator attempts to confuse people reverse engineering the application by introducing redundant code into it. The added instructions that cause the bug are ones that acquire and release locks. ART's verifier correctly identified that if this redundant code were executed, a lock could be acquired without a corresponding release.
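
To make the idea of an unpaired acquire concrete, here is a minimal sketch of a pairing check over a straight-line list of instructions. It is not ART's verifier, which has to reason about every path through a method. The instruction strings ("enter v0", "exit v0", "return") and the class name are assumptions invented for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class LockPairingCheck {
    // Returns null if every acquire is matched by a release in nested order,
    // otherwise a short description of the first problem found.
    // Hypothetical instruction format: "enter <reg>", "exit <reg>", "return".
    static String check(String... ops) {
        Deque<String> held = new ArrayDeque<>();   // most recent acquire on top
        for (String op : ops) {
            String[] parts = op.split(" ");
            switch (parts[0]) {
                case "enter":
                    held.push(parts[1]);
                    break;
                case "exit":
                    if (held.isEmpty() || !held.peek().equals(parts[1])) {
                        return "unpaired exit of " + parts[1];
                    }
                    held.pop();
                    break;
                case "return":
                    return held.isEmpty() ? null : "return while holding " + held.peek();
                default:
                    break;                         // other instructions do not touch locks
            }
        }
        return held.isEmpty() ? null : "fell off end while holding " + held.peek();
    }

    public static void main(String[] args) {
        System.out.println(check("enter v0", "enter v1", "exit v1", "exit v0", "return"));
        System.out.println(check("enter v0", "return"));
    }
}
```

The first sequence is properly nested and is accepted. The second has the shape described above, an acquire with no matching release before the method returns, and is rejected.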

So why care about synchronized blocks being properly nested? Well, we can produce stack traces at any point within a program. When an application in Android becomes unresponsive (an ANR), a signal can be sent to it to get a dump of its state. In this dump we show which locks are held, in case the application is suffering a deadlock. How do we know which locks are held? We can infer it from the state of the nesting at different offsets into a method.
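
Here is a rough sketch of that inference, assuming a verified, straight-line method and the same made-up instruction strings ("enter v0", "exit v0") as a stand-in for real bytecode. Given only an offset into the method, the set of held locks falls out of the nesting, and nothing has to be recorded at run time. The class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class HeldLocksAtOffset {
    // Locks held just before the instruction at index pc runs, outermost first.
    // Assumes the method already passed a pairing check, so every "exit"
    // matches the most recent "enter".
    static List<String> heldAt(String[] ops, int pc) {
        Deque<String> held = new ArrayDeque<>();
        for (int i = 0; i < pc && i < ops.length; i++) {
            String[] parts = ops[i].split(" ");
            if (parts[0].equals("enter")) {
                held.addLast(parts[1]);
            } else if (parts[0].equals("exit")) {
                held.removeLast();
            }
        }
        return new ArrayList<>(held);
    }

    public static void main(String[] args) {
        String[] method = {
            "enter v0", "enter v1", "work", "exit v1", "work", "exit v0", "return"
        };
        System.out.println(heldAt(method, 2));   // prints [v0, v1]
        System.out.println(heldAt(method, 4));   // prints [v0]
        System.out.println(heldAt(method, 6));   // prints []
    }
}
```

If the acquires and releases were not properly paired, the same offset could be reached with different sets of locks held, and this kind of lookup would no longer have a single answer.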

But this is a debug feature, so should ART really care? It turns out that knowing that locks are nested means the compiler can produce faster code, where the held locks are inferred rather than recorded in the object or in data structures in the heap. There's a write-up of the various optimizations this allows, and of a new kind of locking implemented in the Zing VM, here:
http://dl.acm.org/citation.cfm?id=2069179