Ed's Blog

A PhD Student's Musings

BAP for Everyone

The use of OCaml in BAP has both positives and negatives. OCaml’s pattern matching is a wonderful fit for binary analysis. However, very few people know OCaml, and thus few people can understand or modify BAP’s source code. Although you can do some nifty things without digging into BAP’s source code, doing so only scratches the surface of BAP’s capabilities.

Today I added a feature which I hope will bring BAP to the masses — well, at least to the masses of security people who want to use BAP but haven’t because they do not use OCaml. This feature should allow users to easily read and analyze the BAP IL in the comfort of their favorite programming language, whatever that might be.

BAP has always had a robust pretty printer and parser for its IL, and the printed text could, in theory, be parsed from other languages. But honestly, who wants to write a parser for the BAP IL? It’s annoying, and I doubt anyone has gone through the trouble of doing it. The new feature lets users serialize the BAP IL to a number of formats, including protobuf, XML, JSON, and Piqi. If your programming language doesn’t have libraries to parse one of these formats, it probably isn’t worth using.

Let’s take a look at some examples. Here’s some BAP IL:

addr 0x0 @asm "por    %xmm1,%xmm2"
label pc_0x0
R_XMM2:u128 = R_XMM2:u128 | R_XMM1:u128

Here’s the same IL represented in JSON:

[{"label_stmt":{"label":{"addr":0},"attributes":[{"asm":"por    %xmm1,%xmm2"}]}},{"label_stmt":{"label":{"name":"pc_0x0"},"attributes":[]}},{"move":{"var":{"name":"R_XMM2","id":31,"typ":{"reg":128}},"exp":{"binop":{"binop_type":"orbop","lexp":{"var":{"name":"R_XMM2","id":31,"typ":{"reg":128}}},"rexp":{"var":{"name":"R_XMM1","id":30,"typ":{"reg":128}}}}},"attributes":[]}}]
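To give a feel for how little code a consumer needs, here is a minimal Python sketch that loads the JSON above and prints it back in something close to the textual IL. It only handles the statement and expression kinds that appear in this one example; the helper names (show_var, show_exp, show_stmt) and the operator table are my own illustration, not part of BAP.

```python
import json

# The JSON serialization shown above, copied verbatim.
IL_JSON = '''[{"label_stmt":{"label":{"addr":0},"attributes":[{"asm":"por    %xmm1,%xmm2"}]}},{"label_stmt":{"label":{"name":"pc_0x0"},"attributes":[]}},{"move":{"var":{"name":"R_XMM2","id":31,"typ":{"reg":128}},"exp":{"binop":{"binop_type":"orbop","lexp":{"var":{"name":"R_XMM2","id":31,"typ":{"reg":128}}},"rexp":{"var":{"name":"R_XMM1","id":30,"typ":{"reg":128}}}}},"attributes":[]}}]'''

# Assumption: only the one operator seen in this example is mapped to a symbol.
BINOP_SYMBOLS = {"orbop": "|"}


def show_var(var):
    # Only register types ({"reg": N}) occur in this example.
    return f'{var["name"]}:u{var["typ"]["reg"]}'


def show_exp(exp):
    if "var" in exp:
        return show_var(exp["var"])
    if "binop" in exp:
        b = exp["binop"]
        op = BINOP_SYMBOLS.get(b["binop_type"], b["binop_type"])
        return f'{show_exp(b["lexp"])} {op} {show_exp(b["rexp"])}'
    raise ValueError(f"expression kind not handled in this sketch: {list(exp)}")


def show_attrs(attributes):
    # Each attribute is a one-key object such as {"asm": "..."}.
    return "".join(f' @{k} "{v}"' for a in attributes for k, v in a.items())


def show_stmt(stmt):
    if "label_stmt" in stmt:
        s = stmt["label_stmt"]
        label = s["label"]
        if "addr" in label:
            head = f'addr {label["addr"]:#x}'
        else:
            head = f'label {label["name"]}'
        return head + show_attrs(s["attributes"])
    if "move" in stmt:
        m = stmt["move"]
        text = f'{show_var(m["var"])} = {show_exp(m["exp"])}'
        return text + show_attrs(m["attributes"])
    raise ValueError(f"statement kind not handled in this sketch: {list(stmt)}")


lines = [show_stmt(s) for s in json.loads(IL_JSON)]
print("\n".join(lines))
```

Each node is a one-key object whose key names the constructor, so dispatching on that key plays the role that pattern matching plays in OCaml.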

and in XML:

<?xml version="1.0" encoding="UTF-8"?>
<value><item><label-stmt><label><addr>0</addr></label><attributes><item><asm>por    %xmm1,%xmm2</asm></item></attributes></label-stmt></item><item><label-stmt><label><name>pc_0x0</name></label><attributes/></label-stmt></item><item><move><var><name>R_XMM2</name><id>31</id><typ><reg>128</reg></typ></var><exp><binop><binop-type>orbop</binop-type><lexp><var><name>R_XMM2</name><id>31</id><typ><reg>128</reg></typ></var></lexp><rexp><var><name>R_XMM1</name><id>30</id><typ><reg>128</reg></typ></var></rexp></binop></exp><attributes/></move></item></value>
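The XML form is just as easy to query. As a rough sketch, the following Python uses the standard library's ElementTree to pull out the label names and, for each move, the variable written and the variables read. The element paths are taken from the output above; the variable names (labels, writes, reads) are only for illustration.

```python
import xml.etree.ElementTree as ET

# The XML serialization shown above, copied verbatim (as bytes, because it
# carries its own encoding declaration).
IL_XML = b'''<?xml version="1.0" encoding="UTF-8"?>
<value><item><label-stmt><label><addr>0</addr></label><attributes><item><asm>por    %xmm1,%xmm2</asm></item></attributes></label-stmt></item><item><label-stmt><label><name>pc_0x0</name></label><attributes/></label-stmt></item><item><move><var><name>R_XMM2</name><id>31</id><typ><reg>128</reg></typ></var><exp><binop><binop-type>orbop</binop-type><lexp><var><name>R_XMM2</name><id>31</id><typ><reg>128</reg></typ></var></lexp><rexp><var><name>R_XMM1</name><id>30</id><typ><reg>128</reg></typ></var></rexp></binop></exp><attributes/></move></item></value>'''

root = ET.fromstring(IL_XML)

# Named labels (address labels carry <addr> instead of <name>).
labels = [n.text for n in root.findall("./item/label-stmt/label/name")]

writes = []  # destination variable of each move
reads = []   # variables appearing anywhere in each move's expression
for move in root.findall("./item/move"):
    writes.append(move.findtext("var/name"))
    reads.append([v.findtext("name") for v in move.find("exp").iter("var")])

print(labels, writes, reads)
```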

and in protobuf (shown as a hex dump, since the format is binary):

00000000  0a 1e 22 1c 0a 02 10 00  12 16 0a 14 0a 12 70 6f  |.."...........po|
00000010  72 20 20 20 20 25 78 6d  6d 31 2c 25 78 6d 6d 32  |r    %xmm1,%xmm2|
00000020  0a 0e 22 0c 0a 08 0a 06  70 63 5f 30 78 30 12 00  |..".....pc_0x0..|
00000030  0a 41 0a 3f 0a 0f 0a 06  52 5f 58 4d 4d 32 10 3e  |.A.?....R_XMM2.>|
00000040  1a 03 08 80 02 12 2a 1a  28 08 0c 12 11 2a 0f 0a  |......*.(....*..|
00000050  06 52 5f 58 4d 4d 32 10  3e 1a 03 08 80 02 1a 11  |.R_XMM2.>.......|
00000060  2a 0f 0a 06 52 5f 58 4d  4d 31 10 3c 1a 03 08 80  |*...R_XMM1.<....|
00000070  02 1a 00                                          |...|
00000073
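In practice you would read the protobuf output with code generated from the schema, but the bytes are simple enough to decode by hand. The sketch below is a tiny Python decoder for the two protobuf wire types that occur here (varint and length-delimited), applied to the 15 bytes that encode the R_XMM2 variable starting at offset 0x36 in the dump. Two things are assumptions on my part, inferred by comparing the bytes with the JSON rather than taken from the schema: that fields 1, 2, and 3 of a variable are its name, id, and type, and that integers are zigzag-encoded (which is why the id 31 shows up as 0x3e).

```python
def read_varint(buf, pos=0):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def zigzag_decode(n):
    """Undo protobuf zigzag encoding (0, 1, 2, 3 -> 0, -1, 1, -2)."""
    return (n >> 1) ^ -(n & 1)


def parse_fields(buf):
    """Split a message into (field_number, raw_value) pairs.

    Only wire types 0 (varint) and 2 (length-delimited) are handled,
    which is all this example needs.
    """
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 7
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        fields.append((number, value))
    return fields


# Bytes 0x36..0x44 of the dump: the first R_XMM2 variable.
VAR_BYTES = bytes.fromhex("0a06525f584d4d32103e1a03088002")

var_fields = dict(parse_fields(VAR_BYTES))
name = var_fields[1].decode("ascii")          # assumed: field 1 is the name
var_id = zigzag_decode(var_fields[2])         # assumed: field 2 is the id
typ_fields = dict(parse_fields(var_fields[3]))
reg_bits = zigzag_decode(typ_fields[1])       # assumed: field 1 is the reg width

print(name, var_id, reg_bits)
```

The decoded values agree with the JSON and XML versions: name R_XMM2, id 31, and a 128-bit register type.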

Hopefully this will encourage some new people to use and contribute to BAP. Adding support for new instructions isn’t that hard, even for people who don’t know OCaml! This serialization feature will be in BAP 0.7, which will be released in a few days.
