Success?

BAP now has a new framework for adding disassembly engines. For example, here is the implementation of recursive descent:

module RECURSIVE_DESCENT_SPEC = struct
  module State = struct
    type t = unit
    let init = ()
  end
  let get_succs _asmp g (v,l,e) () =
    let o = match List.rev (CA.get_stmts g v), l with
      | CJmp _::_, Some _ ->
        (match lab_of_exp e with
        | Some l -> Addrs [l]
        | None -> Indirect)
      | CJmp _::_, _ -> failwith "error"
      (* Fallthrough/Jmp *)
      | _::_, None ->
        (match lab_of_exp e with
        | Some l -> Addrs [l]
        | None -> Indirect)
      | _::_, Some _ -> failwith "error"
      | [], _ -> Addrs []
    in
    o, ()
end

Right now, the engine is asked what the successors are for an unresolved edge. All of the yucky details for maintaing the CFG and such are hidden from the resolving code. There is also a VSA-based engine, but it’s inefficient: VSA must be re-run from scratch every time an edge must be resolved. I’m going to fix this, of course.

VSA-based lifting is neat because it allows lifting us weird malware types of obfuscation, for instance:

pushl $cool
jmp *(%esp)
cool:
xor %eax, %eax

can be resolved:

Finally, here is an example of a switch being resolved:

int main(int argc, char *argv[]) {

  int x = 0;

  switch(argc) {
      case 0:
        x += 0;
      case 5:
        x += 2;
        break;
      case 1:
        x += 2;
      case 2:
        x += 2;
      case 3:
        x += 2;
      default:
        x = 10;
        break;
  }
  return x;
}

The interesting thing here is that VSA resolved the indirect jump to more than six destinations. Because the destination addresses are not aligned, the strided interval representation is forced to overapproximate. This is bad, but not a big deal in this case.

A more pressing issue is that (our implementation of) VSA’s abstract stores (memory) is extremely slow. Balakrishnan says he uses applicative dictionaries in his dissertation for this because it is space efficient. Conveniently, OCaml’s Map module provides applicative dictionaries. (Side note: I <3 OCaml.) However, abstract interpretation often fails to prove termination of a loop. When this occurs, any stack pointer being increased in the loop has its upper bound widened to about 2^31. As soon as data is written to this pointer, 2^31 maps are added to the abstract store. This takes a while (minutes on a very fast computer), but as Balakrishnan notes, it is space efficient. Unfortunately, even more slowdown occurs whenever the meet operation takes place on two abstract stores: every individual element (a value set) in both stores must be traversed. There is no easy way to get the differences between two maps in the OCaml Map module. (There is Set.difference, though.)

I see two possible solutions here.

Create an efficient abstract store data structure. This structure would be strided-interval aware, e.g., it would not store 2^31 entries when writing 0 to [0, 2^31]. It would also be able to identify differences in two structures very quickly. We might want something like an interval tree.
Whenever a pointer value set gets “too big”, widen it to top. This has the disadvantage of wiping out memory on that region. In the first option, we would only wipe out any memory locations above the lower bound of the pointer.

Ed's Blog

A PhD Student's Musings

Comments