JIT vs Interpreter

Lately, I’ve been thinking about binary instrumentation. Binary instrumentation is awesome. Dynamic binary instrumentation (DBI) frameworks like Pin let you insert your own code in between a binary program’s instructions. Such a capability is obviously very powerful. In our research, we use Pin to record execution traces that we then analyze with BAP.

Pin is a great tool. It has a nice API that makes many things easy, and I’ve never found an instruction it can’t handle. (It’s made by Intel, so it figures that they can completely model their own architecture.) Pin works by reading binary code, adding the user’s instrumentation, and then Just-In-Time (JIT) compiling the whole thing. This got me thinking: BAP can already understand binary code and lets users modify it through a visitor interface, which covers the first two steps. But the BAP interpreter is really slow.
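Here is a toy model in C++ that makes "inserting your own code between instructions" concrete. It is not Pin's API: `Instr` and `instrument` are made-up names, and real instructions are machine code, not closures. The pass copies a program and places a call to a user-supplied hook in front of every instruction. That is enough to record an execution trace. A DBI framework would then JIT-compile the instrumented code rather than dispatch it one step at a time as this toy does.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical "instruction": a textual name plus the action it performs.
struct Instr {
  std::string text;
  std::function<void()> run;
};

// Returns a copy of `prog` with a call to `hook` inserted before every
// instruction. A toy model of instrumentation, not Pin's API.
std::vector<Instr> instrument(const std::vector<Instr>& prog,
                              const std::function<void(const std::string&)>& hook) {
  std::vector<Instr> out;
  out.reserve(2 * prog.size());
  for (const Instr& ins : prog) {
    const std::string text = ins.text;
    out.push_back({"call hook", [hook, text] { hook(text); }});  // inserted code
    out.push_back(ins);                                          // original instruction
  }
  return out;
}
```

Running the instrumented program calls the hook before each original instruction, so a hook that appends to a log produces an execution trace.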

How slow? Let’s create a simple program and find out:

BAP IL program

ctr:u32 := 100000:u32
accum:u32 := 0:u32
label begin
ctr:u32 := ctr:u32 - 1:u32
accum:u32 := accum:u32 + ctr:u32
cjmp ctr:u32 == 0:u32, "end", "begin"
label end
halt accum:u32

This program computes the sum of the first 100,000 non-negative integers (0 through 99,999), truncated to 32 bits. Shouldn’t take too long to execute, right?
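A direct C++ transliteration shows exactly what the loop computes. It is a sketch, and `sum_loop` is a made-up name. Because `ctr` is decremented before it is added, the loop adds 99,999 + 99,998 + ... + 0. Because `accum` is 32 bits wide, the true total of 4,999,950,000 wraps around modulo 2^32, just as `uint32_t` does here.

```cpp
#include <cstdint>

// C++ version of loop.il; uint32_t mirrors BAP's u32 (wraps modulo 2^32).
uint32_t sum_loop() {
  uint32_t ctr = 100000;   // ctr:u32 := 100000:u32
  uint32_t accum = 0;      // accum:u32 := 0:u32
  do {                     // label begin
    ctr = ctr - 1;         // ctr:u32 := ctr:u32 - 1:u32
    accum = accum + ctr;   // accum:u32 := accum:u32 + ctr:u32
  } while (ctr != 0);      // cjmp ctr:u32 == 0:u32, "end", "begin"
  return accum;            // halt accum:u32
}
```

Calling `sum_loop()` returns 704,982,704, which is 4,999,950,000 minus 2^32.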

Interpreter timing

ed@ed-ThinkPad-T510:/tmp$ time ileval -il loop.il -eval

real    0m26.773s
user    0m26.658s
sys     0m0.040s

Ouch! Almost 27 seconds, or only about 3,700 loop iterations per second. That made me wonder: how difficult is it to JIT-compile BAP IL instead? It’s surprisingly easy! There are plenty of JIT frameworks to choose from these days. I chose LLVM because I was already familiar with it and because there is an OCaml interface to LLVM. Because BAP and LLVM are both fairly well designed, it took me only about 48 hours to implement a BAP IL to LLVM IL converter. Let’s run the JIT version of the code and see how long it takes.
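Why is interpretation so costly? The following is not BAP's interpreter, which is written in OCaml and is far more general. It is a hypothetical, minimal C++ interpreter for the same loop, assuming that variables live in a string-keyed `std::map` and that each step switches on the statement kind. It illustrates the per-statement work (name lookups and dispatch) that an interpreter repeats on every iteration and that the JIT-compiled loop does not do at all.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy IL (not BAP's types): an operand is a variable name,
// or the constant `imm` when `var` is empty.
struct Operand {
  std::string var;
  uint32_t imm;
};

enum class Kind { Assign, CJmp, Halt };

struct Stmt {
  Kind kind;
  std::string dst;  // Assign: target variable
  char op;          // Assign: '=' (copy a), '+' or '-'
  Operand a, b;     // Assign operands; CJmp compares a == b; Halt returns a
  size_t if_true;   // CJmp: statement index to jump to when a == b
  size_t if_false;  // CJmp: statement index otherwise
};

using Env = std::map<std::string, uint32_t>;

// Every variable read is a lookup by name in the environment.
uint32_t value_of(const Env& env, const Operand& o) {
  return o.var.empty() ? o.imm : env.at(o.var);
}

// Fetch-decode-execute loop: each step fetches a statement and switches on
// its kind before doing any real work.
uint32_t interpret(const std::vector<Stmt>& prog) {
  Env env;
  size_t pc = 0;
  for (;;) {
    const Stmt& s = prog.at(pc);
    switch (s.kind) {
      case Kind::Assign: {
        uint32_t x = value_of(env, s.a);
        if (s.op == '+') x += value_of(env, s.b);
        if (s.op == '-') x -= value_of(env, s.b);
        env[s.dst] = x;  // u32 arithmetic wraps modulo 2^32
        ++pc;
        break;
      }
      case Kind::CJmp:
        pc = value_of(env, s.a) == value_of(env, s.b) ? s.if_true : s.if_false;
        break;
      case Kind::Halt:
        return value_of(env, s.a);
    }
  }
}

// loop.il with its labels resolved to statement indices: begin = 2, end = 5.
std::vector<Stmt> loop_program() {
  return {
      {Kind::Assign, "ctr", '=', {"", 100000}, {}},            // 0
      {Kind::Assign, "accum", '=', {"", 0}, {}},               // 1
      {Kind::Assign, "ctr", '-', {"ctr", 0}, {"", 1}},         // 2: label begin
      {Kind::Assign, "accum", '+', {"accum", 0}, {"ctr", 0}},  // 3
      {Kind::CJmp, "", '=', {"ctr", 0}, {"", 0}, 5, 2},        // 4: cjmp
      {Kind::Halt, "", '=', {"accum", 0}, {}},                 // 5: label end
  };
}
```

`interpret(loop_program())` produces the same 704,982,704 as the compiled loop. Every iteration, however, pays for three statement dispatches and several string-keyed map operations.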

JIT timing

ed@ed-ThinkPad-T510:~/f11/llvm$ time ./utils/codegen -il /tmp/loop.il -exec

real    0m0.015s
user    0m0.004s
sys     0m0.008s

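Holy smokes, that was fast! Even counting process startup and compilation, that is more than 1,000 times faster than the interpreter. Let’s see how the program was converted to LLVM IL: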

LLVM IL of BAP IL program

define i32 @0() {
allocs:
  store i32 100000, i32* @ctr, align 4
  store i32 0, i32* @accum, align 4
  br label %BB_1

BB_1:                                             ; preds = %BB_1, %allocs
  %load_var_ctr = load i32* @ctr, align 4
  %-_tmp = add i32 %load_var_ctr, -1
  store i32 %-_tmp, i32* @ctr, align 4
  %load_var_accum = load i32* @accum, align 4
  %"+_tmp" = add i32 %load_var_accum, %-_tmp
  store i32 %"+_tmp", i32* @accum, align 4
  %"==_tmp" = icmp eq i32 %-_tmp, 0
  br i1 %"==_tmp", label %BB_2, label %BB_1

BB_2:                                             ; preds = %BB_1
  %load_var_accum3 = load i32* @accum, align 4
  ret i32 %load_var_accum3
}
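The listing follows a simple lowering scheme:

- each BAP variable becomes an LLVM global (`@ctr`, `@accum`);
- each read becomes a `load` and each assignment a `store`;
- each label becomes a basic block;
- the `cjmp` becomes an `icmp` followed by a conditional `br`.

Below is a hypothetical sketch that emits IR text for assignments and `cjmp` under that scheme. It is not the converter described here, and the class name and temporary-naming convention are made up. It uses the same old typed-pointer syntax (`i32*`) as the listing.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical operand: a variable name, or the constant `imm` when `var` is empty.
struct Operand {
  std::string var;
  uint32_t imm;
};

// Emits textual LLVM IR for a small, hypothetical subset of BAP IL,
// treating every variable as an i32 global.
class IrEmitter {
 public:
  std::vector<std::string> lines;

  // Reading a variable loads its global; constants are printed inline
  // as signed values, the way LLVM prints i32 constants.
  std::string operand(const Operand& o) {
    if (o.var.empty()) return std::to_string(static_cast<int32_t>(o.imm));
    std::string tmp = fresh("load_" + o.var);
    lines.push_back("  " + tmp + " = load i32* @" + o.var + ", align 4");
    return tmp;
  }

  // dst := a op b, with op '+' or '-': load operands, compute, store.
  void assign(const std::string& dst, const Operand& a, char op, const Operand& b) {
    std::string lhs = operand(a);
    std::string rhs = operand(b);
    std::string result = fresh("tmp");
    lines.push_back("  " + result + " = " + (op == '+' ? "add" : "sub") +
                    " i32 " + lhs + ", " + rhs);
    lines.push_back("  store i32 " + result + ", i32* @" + dst + ", align 4");
  }

  // cjmp a == b, if_true, if_false: compare, then branch between blocks.
  void cjmp_eq(const Operand& a, const Operand& b,
               const std::string& if_true, const std::string& if_false) {
    std::string lhs = operand(a);
    std::string rhs = operand(b);
    std::string cond = fresh("cmp");
    lines.push_back("  " + cond + " = icmp eq i32 " + lhs + ", " + rhs);
    lines.push_back("  br i1 " + cond + ", label %" + if_true + ", label %" + if_false);
  }

 private:
  int counter_ = 0;

  // Unique SSA name for a new temporary.
  std::string fresh(const std::string& base) {
    return "%" + base + std::to_string(counter_++);
  }
};
```

Emitting `ctr := ctr - 1` followed by `cjmp ctr == 0, "BB_2", "BB_1"` yields a `load`/`sub`/`store` triple followed by a `load`/`icmp`/`br`. This naive version reloads `ctr` before the compare, whereas the listing reuses the value it just stored; LLVM's standard optimizations can typically remove redundant loads like this one.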

And here is the x86 assembly:

Compiled BAP IL program

# BB#0:                                 # %allocs
        movl    $100000, ctr            # imm = 0x186A0
        movl    $0, accum
        .align  16, 0x90
.LBB0_1:                                # %BB_1
                                        # =>This Inner Loop Header: Depth=1
        movl    ctr, %eax
        decl    %eax
        movl    %eax, ctr
        addl    %eax, accum
        testl   %eax, %eax
        jne     .LBB0_1
# BB#2:                                 # %BB_2
        movl    accum, %eax
        ret

One small issue is how to deal with memory: sometimes we don’t want a bad memory read or write to crash our whole evaluation. The BAP to LLVM conversion therefore has two modes. The first does no sandboxing: a memory write in the BAP IL is translated directly into an LLVM memory write. The second replaces every memory operation in the BAP IL with a call to a C++ function that reads from or writes to a std::map object.
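Here is a minimal sketch of what the second mode's runtime helpers might look like. It assumes byte-addressed guest memory and little-endian multi-byte accesses, as on x86. The post does not give the real functions' names or signatures, so `sandbox_load8`, `sandbox_store8`, and the 32-bit wrappers are hypothetical. Unwritten addresses read as zero instead of faulting.

```cpp
#include <cstdint>
#include <map>

// Hypothetical sandbox for the second mode: guest memory is a sparse byte
// map, so a wild address creates or reads an entry instead of touching
// host memory.
static std::map<uint64_t, uint8_t> g_memory;

// extern "C" gives the helpers unmangled names that generated code can call.
extern "C" void sandbox_store8(uint64_t addr, uint8_t value) {
  g_memory[addr] = value;
}

extern "C" uint8_t sandbox_load8(uint64_t addr) {
  auto it = g_memory.find(addr);
  return it == g_memory.end() ? 0 : it->second;  // unwritten bytes read as 0
}

// Little-endian 32-bit accesses built from byte accesses.
extern "C" void sandbox_store32(uint64_t addr, uint32_t value) {
  for (int i = 0; i < 4; ++i)
    sandbox_store8(addr + i, static_cast<uint8_t>(value >> (8 * i)));
}

extern "C" uint32_t sandbox_load32(uint64_t addr) {
  uint32_t value = 0;
  for (int i = 0; i < 4; ++i)
    value |= static_cast<uint32_t>(sandbox_load8(addr + i)) << (8 * i);
  return value;
}
```

The design trades speed for safety: every access becomes a function call plus a tree lookup per byte. This mode will therefore likely be much slower than direct memory access, but a stray pointer can no longer crash the evaluator.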

Look for the LLVM JIT code to appear in a new BAP release coming soon!

Ed's Blog

A PhD Student's Musings
