Ed's Blog

A PhD Student's Musings

Pcmpistri = ARGH

Good news: we just opened the BAP nightly reports to the public. Bad news: the pcmpistri instruction has been wrong for the past few days, even though I’ve been trying to fix it.

pcmpistri has the worst documentation of any instruction I’ve seen in the Intel manual. pcmpistri is one of the newer SSE string instructions. It can perform a variety of operations, including strcmp, substring searching, and character set and range matching. There’s an excellent blog post that covers the uses of pcmpistri and its related instructions.

The purpose of this blog post is to share some of the gotchas that I ran into when trying to fix our modeling of pcmpistri in BAP. To the best of my knowledge, these gotchas are undocumented. (To be fair, the documentation is so poorly written that it’s hard to tell whether these issues are addressed.) The blog post linked above avoids these gotchas by discussing things at an abstract string level.

Strings in registers are reversed

pcmpistri can take both of its input strings in registers. In memory, strings are usually stored with later (farther right) characters being stored at higher memory addresses. It is unclear how strings should be represented while they are in a register. I personally think that “abcd” should be stored as 0x61626364. More generally, the rightmost character of a string would be stored in the least significant byte of the register. There is no deep reason for this convention. However, numbers are stored in the correct byte order while in registers, and it seems weird for strings to be in reverse order.

For example, what do you think

pcmpistri $0xc, %xmm1, %xmm2

should return when %xmm1 == 0x6b6579206b6579206b6579206b657920 and %xmm2 == 0x6b657900000000000000000000000000? With the control byte 0xc, pcmpistri is supposed to return the least significant byte index in %xmm1 at which the substring held in %xmm2 begins. Note that %xmm2 is “key” and %xmm1 is “key key key key ”.

You might, like me, think that the result should be 3 (the least significant index is 0). But you’d be wrong. In fact, pcmpistri interprets the strings backwards: the least significant byte of the register is the first character of the string. So, %xmm2 == 0x6b657900000000000000000000000000 == "", because its first character (the least significant byte) is a null byte.
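To make the backwards reading concrete, here is a minimal Python sketch of how I understand the implicit-length rule: scan from the least significant byte upward and stop at the first null. The helper names `implicit_length` and `as_string` are made up for this illustration; they are not part of BAP or of any Intel tooling.

```python
def implicit_length(xmm: int) -> int:
    """Number of bytes before the first null, scanning from the least significant byte."""
    for i in range(16):
        if (xmm >> (8 * i)) & 0xFF == 0:
            return i
    return 16  # no null byte: all 16 bytes are valid


def as_string(xmm: int) -> bytes:
    """The string as this model reads it: byte 0 (least significant) comes first."""
    return xmm.to_bytes(16, "little")[:implicit_length(xmm)]


xmm1 = 0x6b6579206b6579206b6579206b657920
xmm2 = 0x6b657900000000000000000000000000

print(implicit_length(xmm2), as_string(xmm2))  # 0 b''
print(implicit_length(xmm1), as_string(xmm1))  # 16 b' yek yek yek yek'
```

Under this reading, %xmm2 is the empty string and %xmm1 is " yek yek yek yek", which is not what the hex values suggest at first glance.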

Why interpret strings like this? Consider the following snippet:

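    movdqu substr, %xmm1          # load 16 bytes starting at substr (the needle)
    movdqu str, %xmm2             # load 16 bytes starting at str (the haystack)
    pcmpistri $12, %xmm2, %xmm1   # search for %xmm1 in %xmm2; the index goes to %ecx
substr: .ascii "key\0"
str:    .ascii "key key key\0"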

When substr and str are copied to %xmm1 and %xmm2, they are reversed. movdqu does not know that it is copying a string. Since x86 is a little-endian machine, it places the byte at the lowest address (the first character) in the least significant byte of the register, which reverses the string relative to the way we write numbers. So, operating on reversed strings does make sense on a little-endian machine, but this may not be intuitive. It certainly wasn’t what I was expecting.
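The reversal is easy to see by modeling the load in Python. This is only a sketch: `load_xmm` is a hypothetical helper, and I pad the string with null bytes to fill 16 bytes, whereas a real movdqu would read whatever happens to follow the string in memory.

```python
def load_xmm(memory: bytes) -> int:
    """Model a 16-byte little-endian load: memory[0] becomes the least significant byte."""
    assert len(memory) == 16
    return int.from_bytes(memory, "little")


# Assumption: the 12 bytes after "key\0" are zero (padding chosen for this example).
substr = b"key\0".ljust(16, b"\0")
xmm1 = load_xmm(substr)

print(hex(xmm1))         # 0x79656b, i.e. 'y' 'e' 'k' when read as a number
print(hex(xmm1 & 0xFF))  # 0x6b, i.e. 'k' sits in the least significant byte
```

Read as a number, the register holds "yek", even though memory holds "key".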

Output selection is confusing

The immediate byte that is the first argument to pcmpistri is called the imm8 control byte. This is documented in section 4.1 in Volume 2 of the Intel Manual. For pcmpistri, bit 6 of this byte (counting from bit 0, so the bit with value 64) controls the Output Selection. The value 12 denotes, among other things, that the least significant index should be returned. In contrast, 76 (which is 12 + 64) denotes that the most significant index should be returned.
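For reference, here is a small Python sketch that splits the control byte into its fields, following my reading of the imm8 layout in the Intel manual (bits 1:0 data format, bits 3:2 aggregation operation, bits 5:4 polarity, bit 6 output selection). The function name `decode_imm8` and the dictionary keys are my own invention for this example.

```python
FORMATS = ["unsigned bytes", "unsigned words", "signed bytes", "signed words"]
AGGREGATIONS = ["equal any", "ranges", "equal each", "equal ordered"]
POLARITIES = ["positive", "negative", "masked positive", "masked negative"]


def decode_imm8(imm8: int) -> dict:
    """Split a pcmpistri control byte into human-readable fields."""
    return {
        "format": FORMATS[imm8 & 0x3],
        "aggregation": AGGREGATIONS[(imm8 >> 2) & 0x3],
        "polarity": POLARITIES[(imm8 >> 4) & 0x3],
        "output": "most significant index" if (imm8 >> 6) & 1
                  else "least significant index",
    }


for imm8 in (12, 76):
    print(imm8, decode_imm8(imm8))
```

Both 12 and 76 decode to unsigned bytes, equal ordered (substring search), and positive polarity. They differ only in the output selection bit.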

Least significant and most significant here apply to the little-endian (reversed) string representation. So, least significant means the leftmost character of the string, and most significant means the rightmost character of the string. This is exactly the opposite of my intuition for what the least and most significant bytes of a string are.

As an example, the above code for pcmpistri $12, %xmm2, %xmm1, which specifies the least significant output setting, returns 0 in %ecx, the index of the first (leftmost) “key”. On the other hand, the most significant output setting, pcmpistri $76, %xmm2, %xmm1, returns 8 in %ecx, the index at which the last (rightmost) “key” begins.
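The following Python sketch models just this one configuration (unsigned bytes, equal ordered, positive polarity) so that both results can be reproduced. It is a simplified model based on my reading of the manual, not BAP's actual implementation. The function name `pcmpistri_equal_ordered` is hypothetical, operands are 16-byte strings in memory order (index 0 is the least significant byte), and I assume the four bytes following str in memory are zero.

```python
def pcmpistri_equal_ordered(needle: bytes, haystack: bytes, imm8: int) -> int:
    """Model the %ecx result of pcmpistri for imm8 values 12 and 76 only."""
    assert len(needle) == 16 and len(haystack) == 16
    assert imm8 & 0x3F == 0x0C, "only unsigned bytes, equal ordered, positive polarity"

    def length(reg: bytes) -> int:
        idx = reg.find(0)  # first null byte ends the string
        return 16 if idx < 0 else idx

    nlen, hlen = length(needle), length(haystack)
    matches = []
    for j in range(16):              # candidate start index in the haystack
        ok = True
        for i in range(16 - j):      # compare only while inside the register
            if i >= nlen:
                continue             # past the end of the needle: counts as a match
            if j + i >= hlen or needle[i] != haystack[j + i]:
                ok = False           # haystack ended first, or the bytes differ
                break
        if ok:
            matches.append(j)
    if not matches:
        return 16                    # no match: the number of byte elements
    return matches[-1] if imm8 & 0x40 else matches[0]


# Memory image of the snippet: substr is immediately followed by str.
memory = b"key\0" + b"key key key\0" + b"\0" * 4  # trailing zeros are an assumption
xmm1 = memory[0:16]   # movdqu substr, %xmm1
xmm2 = memory[4:20]   # movdqu str, %xmm2

print(pcmpistri_equal_ordered(xmm1, xmm2, 12))  # 0
print(pcmpistri_equal_ordered(xmm1, xmm2, 76))  # 8
```

In this model the needle matches at indices 0, 4, and 8, so the two output settings pick the first and last of those.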