Ed's Blog

A PhD Student's Musings

VSA Problem

Good news and bad news. I’ve made more progress on BAP’s VSA implementation. Recall this example from last time:

BAP’s VSA now outputs this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
VSA @BB_Entry
 R_ESP_1:u32 -> (R_ESP_1:u32 |=> 0[0,0])

VSA @BB_Exit
 R_ESP_1:u32 -> (R_ESP_1:u32 |=> 0[0,0])
 x_82:u32 -> ($ |=> 0[0,0])
 x_83:u32 -> VS.top
 x_84:u32 -> ($ |=> 1[4,2147483647])

VSA @BB_0
 R_ESP_1:u32 -> (R_ESP_1:u32 |=> 0[0,0])
 x_82:u32 -> ($ |=> 0[0,0])

VSA @BB_1
 R_ESP_1:u32 -> (R_ESP_1:u32 |=> 0[0,0])
 x_82:u32 -> ($ |=> 0[0,0])
 x_83:u32 -> ($ |=> 1[-2147483648,3])
 x_84:u32 -> ($ |=> 1[-2147483647,4])

VSA @BB_2
 R_ESP_1:u32 -> (R_ESP_1:u32 |=> 0[0,0])
 x_82:u32 -> ($ |=> 0[0,0])
 x_83:u32 -> VS.top
 x_84:u32 -> ($ |=> 1[4,2147483647])

There is something odd going on here. At BB_1, VSA determines that x_83 can be (almost) any number below 4. However, it’s obvious that x_83 cannot be negative.

What’s going on? Well, x_83 starts out as phi(x, x_84), which means that x_83 gets its value from either x or x_84, depending on control flow. Read up on SSA form if this makes no sense to you. VSA determines that x_83’s value set is the union of x and x_84’s value sets. Sounds good so far. x always has zero, so that one is easy. x_84 will initially be Top, since we have not seen an assignment for it yet. Since Top is any value, Top union anything is Top. Thus, x_83 becomes Top, and x_84 becomes Top + 1. Now, when the value set of x_84 flows through the edge condition x_84:u32 <= 3:u32, any values above 3 are removed, leaving us with (almost) any value less than 4.

VSA is an abstract interpretation algorithm, but not a dataflow algorithm. In dataflow problems, data values always move downward in the lattice. But in abstract interpretation, they can move up and down, which is what happens with VSA when we perform union (up) and intersection (down). I am used to working with dataflow problems, and so this seems very unnatural to me.

It seems like we would get the right answer if the initial lattice values of each node used the edge conditions somehow; right now the initial lattice values are simply Top. But if the initial lattice value for BB_1 had x_84 constrained to be less than 4, we’d get the right answer. It also seems like we would get the right answer if we were not using SSA. The original author of BAP VSA wrote it as an SSA analysis, although I do not know why. I do not think there is any benefit.

Comments