This is the second part of a small series of articles about the newest Android version, KitKat, that I wrote at +inovex GmbH. In this part we will take a closer look at OAT, the new file format of the ART runtime, and have a brief look at garbage collection (You can find part one here:
#androidkitkat     #art   #Kitkat   #inovex   #oat   #android  

3. Digging deeper: OAT file analysis

So far we have found out that the system runs a kind of compilation on the device itself. It converts not only the apps but also huge parts of the Android framework to oat files.
In this post we'll try to figure out what these oat-thingies are and how they are executed.

As mentioned before, all installed apps run through the dex2oat compilation. Now let's have a closer look at the resulting files.
So let's adb pull one of the dex2oat results, for example the converted result of the SystemUI.apk:
The handy 'file' command returns:

system@priv-app@SystemUI.apk@classes.dex: ELF 32-bit LSB shared object, ARM, version 1 (GNU/Linux), dynamically linked, stripped

Wow.. that escalated quickly! With ART we go from java -> class -> dex -> oat, which is a shared object!
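
What 'file' reports here comes straight from the first few bytes of the ELF header. As a minimal sketch (the function name and the small lookup tables are my own and only cover a handful of values), the same summary can be derived like this:

```python
import struct

# Small subsets of the ELF constants, just enough for this illustration.
ELF_CLASS = {1: "32-bit", 2: "64-bit"}
ELF_DATA = {1: "LSB", 2: "MSB"}
ELF_TYPE = {1: "relocatable", 2: "executable", 3: "shared object", 4: "core"}
ELF_MACHINE = {3: "Intel 80386", 40: "ARM", 62: "x86-64"}


def describe_elf(header):
    """Summarize the first 20 bytes of an ELF file, similar to 'file'."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    elf_class = ELF_CLASS.get(header[4], "unknown class")
    data = ELF_DATA.get(header[5], "unknown byte order")
    byte_order = "<" if header[5] == 1 else ">"
    # e_type and e_machine are two 16-bit fields right after the 16-byte e_ident.
    e_type, e_machine = struct.unpack_from(byte_order + "HH", header, 16)
    return "ELF {} {} {}, {}".format(
        elf_class, data,
        ELF_TYPE.get(e_type, "unknown type"),
        ELF_MACHINE.get(e_machine, "unknown machine"))


if __name__ == "__main__":
    # Synthetic header for illustration; for a real file, read its first 20 bytes.
    sample = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9) + struct.pack("<HH", 3, 40)
    print(describe_elf(sample))
```

For a pulled oat file you would pass the first 20 bytes of the file instead of the synthetic sample.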

Further analysis with objdump shows the following: 
    file format elf32-little

00001000 g    DO .rodata        0007d000 oatdata
0007e000 g    DO .text        000a9f8f oatexec
00127f8b g    DO .text        00000004 oatlastword

Only three symbols are defined: the metadata (oatdata), the start of the executable code (oatexec) and its last word (oatlastword).
Obviously the new runtime handles apps as shared objects(!) which are dynamically loaded into the VM context (which is very likely to be the previously explained boot image).
A look into the sources reveals that they actually use dlopen() to load the "libraries" at runtime.
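
The three symbols from the objdump output fit together neatly, which can be checked with a bit of arithmetic. The following is only a sketch; the addresses and sizes are copied from the SystemUI symbol table above, and the helper names are made up for this example:

```python
# Symbol table as printed by objdump above: name -> (start address, size).
SYMBOLS = {
    "oatdata": (0x00001000, 0x0007D000),
    "oatexec": (0x0007E000, 0x000A9F8F),
    "oatlastword": (0x00127F8B, 0x00000004),
}


def end_of(name):
    """Return the first address after the named symbol."""
    start, size = SYMBOLS[name]
    return start + size


def describe_layout():
    """Return a few facts about how the three symbols relate to each other."""
    exec_start, _ = SYMBOLS["oatexec"]
    return {
        "data_ends_where_code_starts": end_of("oatdata") == exec_start,
        "lastword_is_last_4_bytes_of_code": end_of("oatlastword") == end_of("oatexec"),
        "code_size_bytes": end_of("oatexec") - exec_start,
    }


if __name__ == "__main__":
    for key, value in describe_layout().items():
        print(key, value)
```

So the data section ends exactly where the code begins, and oatlastword simply marks the last four bytes of the code.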

Now let's use the new oatdump to gain some more knowledge about the oat file format.

My first attempt was to use 'oatdump' on the boot-image file "/data/dalvik-cache/".
But it turned out that a whole dump of this file is about 1.6GB in size, which is quite unwieldy for the kind of analysis I'm trying to do.

Therefore, I wrote a small app with almost no functionality, which is simple enough to understand how this OAT thing works.

Here is the  source code: 

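package de.inovex.arttest;

import android.app.Activity;
import android.os.Bundle;
import android.view.Menu;

public class MainActivity extends Activity {
        protected void onCreate(Bundle savedInstanceState) {
                super.onCreate(savedInstanceState);
                int a = 100;
                a = foo(a);
        }

        // The method we will look for in the oatdump output.
        private int foo(int a) {
                return a+4711;
        }

        public boolean onCreateOptionsMenu(Menu menu) {
                // The menu resource ID is missing here in the original listing.
                getMenuInflater().inflate(, menu);
                return true;
        }
}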

After installation, we can pull its compiled version to our host and run objdump on it:

data@app@de.inovex.arttest.apk@classes.dex:  file format elf32-little

00001000 g    DO .rodata        00001000 oatdata
00002000 g    DO .text  00000238 oatexec
00002234 g    DO .text  00000004 oatlastword

No surprise so far.... it's obviously an App with very little functionality that fits into 0x238 bytes ;-)

So let's use oatdump on that file:
"oatdump --oat-file=data@app@de.inovex.arttest.apk@classes.dex"

The header shows us something about the content of that file: the architecture, some integrity-check values and some addresses, which are presumably used to relocate the library correctly.
But the interesting part is in the body of the dump output: method names, dex code and the disassembled ARM code of each method.
For example, the oatdump output of our "foo" method is the following:

  1: int (dex_method_idx=5)
      0x0000: add-int/lit16 v0, v2, #+4711
      0x0002: return v0
      frame_size_in_bytes: 32
      core_spill_mask: 0x00008060 (r5, r6, r15)
      fp_spill_mask: 0x00000000
      vmap_table: 0xf73b00da (offset=0x000010da)
      v0/r5, v2/r6, v65535/r15
      mapping_table: 0xf73b00d8 (offset=0x000010d8)
      gc_map: 0xf73b00e0 (offset=0x000010e0)
    CODE: 0xf73b00bd (offset=0x000010bd size=28)...
      0xf73b00bc: e92d4060  push    {r5, r6, lr}
      0xf73b00c0: b085      sub     sp, sp, #20
      0xf73b00c2: 9000      str     r0, [sp,   #0]  
      0xf73b00c4: 9109      str     r1, [sp,   #36]  
      0xf73b00c6: 1c16      mov     r6, r2
      0xf73b00c8: f2412267  movw    r2, #4711
      0xf73b00cc: eb160502  adds.w  r5, r6, r2
      0xf73b00d0: 1c28      mov     r0, r5
      0xf73b00d2: b005      add     sp, sp, #20
      0xf73b00d4: e8bd8060  pop     {r5, r6, pc}

The DEX CODE part is quite obvious: int a in our Java source code maps to the virtual register v2, we add the constant 4711, store the result in v0 and return it.
The OAT DATA is not yet fully understood, but the "core_spill_mask" apparently describes the registers that the method saves on the stack (compare the push at the start of the code),
and the "vmap_table" shows which virtual registers map to which real ones.
The CODE section shows what the processor is actually going to execute:
r2 obviously holds our "int a" at the beginning. After the creation of a new stack frame, the constant 4711 is added to our "int a", and the result is passed back in r0.
Also no surprise, but impressive to see!
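
To make the OAT DATA values a bit more tangible, here is a small sketch that decodes the spill mask and cross-checks the frame size. That the frame size is "pushed registers plus the sub sp amount" is my own assumption; it just happens to match the numbers in the dump. The function names are made up for this example:

```python
def spilled_registers(mask):
    """Return the register numbers whose bits are set in a spill mask."""
    return [bit for bit in range(32) if mask & (1 << bit)]


def frame_size(core_spill_mask, extra_stack_bytes, word_size=4):
    """Bytes pushed for the spilled registers plus the explicit 'sub sp' amount.

    Assumption: this is how frame_size_in_bytes is composed.
    """
    return len(spilled_registers(core_spill_mask)) * word_size + extra_stack_bytes


CORE_SPILL_MASK = 0x00008060   # from the oatdump output above
SUB_SP_BYTES = 20              # from 'sub sp, sp, #20'

if __name__ == "__main__":
    print(spilled_registers(CORE_SPILL_MASK))         # [5, 6, 15]
    print(frame_size(CORE_SPILL_MASK, SUB_SP_BYTES))  # 32
```

The decoded bits 5, 6 and 15 match the "(r5, r6, r15)" printed by oatdump, and 3 * 4 + 20 = 32 matches frame_size_in_bytes.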

It also reveals that there is almost no optimisation. It is more like a "gcc -O0". The whole "computation" could be done with two instructions plus the return (4711 is too large for the immediate field of a single Thumb-2 add):
movw     r0, #4711
add      r0, r0, r2
bx       lr
Not even a new stack frame is needed to do this.
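
One detail worth double-checking when hand-optimising this is whether the constant 4711 fits into the immediate field of a Thumb-2 add at all. The sketch below follows my reading of the Thumb-2 encoding rules (a 12-bit plain immediate for addw, and the "modified immediate" patterns for add/adds); the function names are my own:

```python
def fits_addw_immediate(value):
    """True if value fits the plain 12-bit immediate of Thumb-2 'addw'."""
    return 0 <= value <= 0xFFF


def fits_thumb2_modified_immediate(value):
    """True if value is encodable as a Thumb-2 'modified immediate' constant."""
    value &= 0xFFFFFFFF
    if value < 0x100:
        return True                       # 0x000000XY
    low = value & 0xFF
    high = (value >> 8) & 0xFF
    if value == low * 0x00010001:         # 0x00XY00XY
        return True
    if value == high * 0x01000100:        # 0xXY00XY00
        return True
    if value == low * 0x01010101:         # 0xXYXYXYXY
        return True
    # Otherwise it must be an 8-bit value shifted left within the 32-bit word.
    while value & 1 == 0:
        value >>= 1
    return value < 0x100


if __name__ == "__main__":
    print(fits_addw_immediate(4711), fits_thumb2_modified_immediate(4711))
```

For 4711 (0x1267) both checks come out negative, so the constant has to be loaded with a movw first, just as the compiler did. The frame setup and the register shuffling around it are still unnecessary.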

So let's summarize what OAT files are: precompiled versions of the code in the APK files, which are loaded into the running process like a shared-object library. They contain every method of all classes in the APK and, of course, the method names and descriptions and an offset table to locate the methods within the binary.

4. Take care of your heap: GC on ART
The ART garbage collection is quite similar to Dalvik's. Both use a mark-and-sweep approach to keep the heap clean. That is surprising at first glance, but actually quite comprehensible.
We never lose the traceability of our allocations on the way from Java to class to dex to machine code. Although the way the code is executed has changed, the data structures and the references between objects are still the same, and therefore the GC process can be performed in the same way as on Dalvik.

A quick look into the sources under art/runtime/gc reveals that they use three different types of GC runs (the enum additionally contains a placeholder for "no GC yet" and a counter). All of them may run in parallel, and they are listed with an increasing chance of freeing heap space:

// The type of collection to be performed. The ordering of the enum matters, it is used to
// determine which GCs are run first.
enum GcType {
  // Placeholder for when no GC has been performed.
  kGcTypeNone,
  // Sticky mark bits GC that attempts to only free objects allocated since the last GC.
  kGcTypeSticky,
  // Partial GC that marks the application heap but not the Zygote.
  kGcTypePartial,
  // Full GC that marks and frees in both the application and Zygote heap.
  kGcTypeFull,
  // Number of different GC types.
  kGcTypeMax,
};

The GC loops through these types until enough space is available to allocate the desired memory:

 // Loop through our different Gc types and try to Gc until we get enough free memory.
for (size_t i = static_cast<size_t>(last_gc) + 1;
      i < static_cast<size_t>(collector::kGcTypeMax); ++i) {...
If this procedure fails, the system tries harder, for example by enlarging the heap space. But overall this is a standard procedure and nothing new or exceptionally different from Dalvik's GC, or at least I didn't find anything.
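
The escalation idea behind that loop can be sketched in a few lines. This is not the ART implementation, just an illustration of the ordering described by the enum comments; the Python names and the run_gc callback are hypothetical:

```python
from enum import IntEnum


class GcType(IntEnum):
    """Mirrors the ordering described by the comments in the enum above."""
    NONE = 0      # placeholder: no GC has been performed
    STICKY = 1    # only objects allocated since the last GC
    PARTIAL = 2   # application heap, but not the Zygote
    FULL = 3      # application and Zygote heap
    MAX = 4       # number of GC types


def allocate_with_gc(needed, free_bytes, run_gc, last_gc=GcType.NONE):
    """Escalate through the GC types until `needed` bytes are free.

    `run_gc` is a hypothetical callback: it takes a GcType and returns the
    number of bytes that this collection freed.
    Returns (success, gc_types_tried).
    """
    tried = []
    if free_bytes >= needed:
        return True, tried
    for value in range(int(last_gc) + 1, int(GcType.MAX)):
        gc_type = GcType(value)
        tried.append(gc_type)
        free_bytes += run_gc(gc_type)
        if free_bytes >= needed:
            return True, tried
    return False, tried


if __name__ == "__main__":
    # Made-up numbers: each "stronger" GC frees more memory.
    freed = {GcType.STICKY: 10, GcType.PARTIAL: 50, GcType.FULL: 500}
    print(allocate_with_gc(100, 20, lambda gc_type: freed[gc_type]))
```

The cheap sticky GC is tried first, and the more expensive partial and full collections only run if the cheaper ones did not free enough memory.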