Peter Goodman
153 followers
Computer science student interested in programming languages and compilers.

Peter's posts

Recently I released Granary 2 (https://github.com/Granary/granary2) as an open-source project. Granary 2 is a dynamic binary translator for 64-bit Linux programs. What makes Granary 2 really interesting is its "dynamic inline assembly". To instrument a program, you specify assembly instructions that you want to have inlined into the program's code. Dynamic inline assembly is higher level than normal assembly: it allows you to use virtual registers, placing the burden of complicated register allocation and state saving/restoring on the binary translator. There are many other interesting features in Granary 2. Go check it out! 

This weekend I fixed one of Granary's [1] outstanding code cache corruption bugs. The symptom of the bug was that, every now and then, some translated code would cause a spurious fault. Disassembling the faulting basic block showed that at least 16 of its bytes had been overwritten (corrupted) with zeroes.

When interpreted by the hardware, zeroes are valid instructions, so control would pass straight through them, silently corrupting some registers and skipping the work that the immediately following, non-corrupted instructions depended on. Those following instructions would typically raise a fault, thus drawing attention to the issue.

I suspected that the problem was an errant memset somewhere, because the size and position of the corruption were so nicely aligned. But how does one find one offending memset in ~60k [2] lines of code? Granary provides the tools to solve this problem nicely.

To set up the solution, I started by zero-initializing the whole code cache, so that its initial "empty" state was distinguishable from a state where it contains instructions. Then, I used Granary's hot-patching interface to globally replace memset [3] with a different version that detects if memset is writing to the code cache. If so, then a GDB breakpoint is triggered any time the memset overwrites a non-zero (i.e. initialised) byte.
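
As a rough illustration of the check described above, here is a minimal user-space sketch in C++. It is not Granary's actual wrapper (that lives at [3]); the names `code_cache`, `checked_memset`, `trap_on_code_cache_overwrite`, and `num_traps` are all hypothetical, and a counter stands in for the GDB breakpoint.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the code cache: zero-initialised, so any
// non-zero byte means "instructions have been written here".
static unsigned char code_cache[4096] = {};

// Hypothetical counter standing in for the debugger breakpoint.
static int num_traps = 0;

// Hypothetical hook; in a debugging session a breakpoint would be set here.
static void trap_on_code_cache_overwrite(void *dest, std::size_t offset) {
  (void) dest;
  (void) offset;
  ++num_traps;
}

// Returns true if `addr` points at a byte inside `code_cache`.
static bool in_code_cache(const unsigned char *addr) {
  const auto a = reinterpret_cast<std::uintptr_t>(addr);
  const auto begin = reinterpret_cast<std::uintptr_t>(code_cache);
  return a >= begin && a < begin + sizeof code_cache;
}

// Replacement for memset: before writing, check whether any target byte
// lies in the code cache and is already non-zero (i.e. initialised).
// Traps at most once per call, then performs the normal memset.
void *checked_memset(void *dest, int value, std::size_t n) {
  assert(dest != nullptr || n == 0);
  auto *bytes = static_cast<unsigned char *>(dest);
  for (std::size_t i = 0; i < n; ++i) {
    if (in_code_cache(bytes + i) && bytes[i] != 0) {
      trap_on_code_cache_overwrite(dest, i);
      break;
    }
  }
  return std::memset(dest, value, n);
}
```

In this sketch, clearing untouched (all-zero) cache memory never traps, while a memset over a region that already holds a non-zero byte traps once before the write goes through. Writes outside the cache are passed through unchecked.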

This found the bug and offending stack trace [4] immediately. The problem was a simple mistake: I had put a memset too early in some code, before the check that decides whether the memset should be done at all! [5] Otherwise, this weekend I knocked down a bunch of other bugs too. Next up, dealing with "BUG: Bad rss-counter state" when mounting and instrumenting an ext3 file system in a dual-core VM instance.

[1] https://github.com/Granary/granary/
[2] http://www.ohloh.net/p/granary
[3] https://github.com/Granary/granary/blob/master/granary/kernel/linux/wrappers.cc#L57
[4] http://codepad.org/kux6JQUd
[5] https://github.com/Granary/granary/commit/03af555029703703bb4d5f566f01810a07d9ade7#diff-22c2d7712c172d779ae44bc6652069deR148

0xEA7DEADBEEFFEE1FULL, my new favourite 64-bit constant.

Granary's source code is now publicly available under the BSD license: https://github.com/Granary/granary/. The Granary blog can also be found at: http://www.granarydbt.org.

Granary is a Linux kernel dynamic binary translation framework. Its primary purpose is to instrument Linux kernel modules, without imposing overhead on non-module kernel code. Modules are the primary focus because they represent the majority of new code under development, and tend to contain the most bugs.

Granary's three key novelties are 1) mixed-mode execution; 2) policy-driven instrumentation; and 3) reifying instrumentation.

Mixed-mode execution lets Granary switch between native and instrumented code very quickly. This feature makes it possible for Granary to comprehensively instrument a module, while running the rest of the kernel at full speed.

Policy-driven instrumentation allows Granary tools to perform run-time code specialisation by instrumenting code differently depending on the context in which that code executes. For example, Granary can instrument the code running inside an RCU read-side critical section differently from the code running outside one. This is used in Granary's rcudbg tool, which is currently under development.

Finally, reifying instrumentation bridges the gap between static and dynamic code analysis by integrating static program information into the DBT system. This allows Granary tools to do things like interpose on accesses to specific fields of specific kernel data structures.

Together, these three features make Granary an ideal platform for developing flexible and efficient kernel module analysis and debugging tools.

+akshay kumar and I will be presenting some of our work on Granary at this year's HotDep conference (co-located with SOSP'13). Our paper, titled "Behave or Be Watched: Debugging with Behavioural Watchpoints", describes a framework built atop Granary that makes it easy to create tools that detect buffer overflows, use-after-free bugs, double-free bugs, memory leaks, and more.

Such an emotional song from such a figure on such a special day. 

Just released a sort-of GNU C99 type/function declaration parser on GitHub: https://github.com/pgoodman/cparser

What does Spiderman do in the winter? Can he still cling to walls? Does he get frostbite through his suit?

New blog post: Tracking Data with Function Pointers
In this post, I detail a fun/evil technique where I "attach" meta-information to data structures at runtime by changing function pointers in those data structures.
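
One way to picture the idea is the C++ sketch below. It is an assumption about how such a scheme could look, not the technique from the linked post: every name here (`Device`, `real_read`, `tagged_read`, `attach_tag`, `find_tag`, `last_tag_seen`) is hypothetical. Each tag value gets its own forwarding function, so the address stored in the structure's function pointer doubles as a small piece of meta-information.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical data structure that carries a function pointer.
struct Device {
  int (*read)(Device *self);
};

// The "real" operation that every tagged wrapper forwards to.
static int real_read(Device *) { return 42; }

// Records which tag was observed by the most recent wrapped call.
static std::size_t last_tag_seen = static_cast<std::size_t>(-1);

// One distinct forwarding function per tag; its address encodes the tag.
template <std::size_t Tag>
static int tagged_read(Device *self) {
  last_tag_seen = Tag;
  return real_read(self);
}

constexpr std::size_t kNumTags = 8;
using ReadFn = int (*)(Device *);

// Builds a table holding the address of tagged_read<0..N-1>.
template <std::size_t... Is>
static std::array<ReadFn, sizeof...(Is)> make_table(std::index_sequence<Is...>) {
  return {{&tagged_read<Is>...}};
}

static const std::array<ReadFn, kNumTags> kTaggedReads =
    make_table(std::make_index_sequence<kNumTags>{});

// "Attach" a tag to an object by swapping in the matching wrapper.
void attach_tag(Device &d, std::size_t tag) {
  assert(tag < kNumTags);
  d.read = kTaggedReads[tag];
}

// Recover the tag from the function pointer, or -1 if the object is untagged.
long find_tag(const Device &d) {
  for (std::size_t i = 0; i < kNumTags; ++i) {
    if (d.read == kTaggedReads[i]) return static_cast<long>(i);
  }
  return -1;
}
```

Callers that invoke `d.read(&d)` see the same behaviour as before, while the tag can be recovered either by inspecting the pointer (`find_tag`) or at call time (`last_tag_seen`). The number of distinct tags is fixed at compile time in this sketch.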

http://www.ioreader.com/2012/10/14/tracking-data-with-function-pointers

New blog post: Traditional Parsing Methods

The purpose of this post is to give context to a future blog post about top-down operator precedence parsers (TDOP). TDOP parsers behave in a similar way to left-corner parsers. This blog post introduces top-down and bottom-up parsing, and then explains left-corner parsing in terms of those approaches. Also, the top-down parsing language (TDPL) is briefly mentioned because TDOP and the TDPL will share some semantics.

http://www.ioreader.com/2012/05/09/traditional-parsing-methods