Good news: we just opened the BAP nightly reports to the public. Bad
news: our model of the pcmpistri instruction has been wrong for the
past few days, even though I’ve been trying to fix it. pcmpistri has
the worst documentation of any instruction I’ve seen in the Intel
manual.
pcmpistri is one of the newer SSE string instructions (it arrived
with SSE4.2). It can perform a variety of operations, including
strcmp, substring searching, and character set and range matching.
There’s an excellent blog post that covers the uses of pcmpistri and
its related instructions.
The purpose of this blog post is to share some of the gotchas that I
ran into when trying to fix our modeling of pcmpistri in BAP. To the
best of my knowledge, these gotchas are undocumented. (To be fair,
the documentation is so poorly written that it’s hard to tell whether
these issues are addressed.) The blog post linked above avoids these
gotchas by discussing things at an abstract string level.
Strings in registers are reversed
pcmpistri can take both of its input strings in registers. In
memory, strings are usually stored with later (farther right)
characters at higher memory addresses. It is unclear how strings
should be represented while they are in a register. I personally
think that “abcd” should be stored as 0x61626364. More generally, the
rightmost character of a string would be stored in the least
significant byte of the register. There is no deep reason for this
convention. However, numbers are stored in the correct byte order
while in registers, and it seems weird for strings to be in reverse
order.
For example, what do you think pcmpistri $0xc, %xmm1, %xmm2 should
return when %xmm1 == 0x6b6579206b6579206b6579206b657920 and
%xmm2 == 0x6b657900000000000000000000000000? pcmpistri $0xc, %xmm1,
%xmm2 is supposed to return the least significant byte index of %xmm1
at which the substring in %xmm2 occurs. Note that, reading the bytes
from most significant to least significant, %xmm2 is “key” and %xmm1
is “key key key key ”.
You might, like me, think that the result should be 3 (the least
significant index is 0). But you’d be wrong. In fact, pcmpistri
interprets the strings backwards: the least significant byte of the
register is the first character. So,
%xmm2 == 0x6b657900000000000000000000000000 == "", because its least
significant byte, which is where the string starts, is a null byte.
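
To make the reversed interpretation concrete, here is a minimal Python sketch (not BAP code) that reads a 128-bit register value the way the text above describes: byte 0, the least significant byte, is treated as the first character, and the string ends at the first null byte. The helper name register_to_string is hypothetical and exists only for this illustration.

```python
def register_to_string(value: int, width: int = 16) -> bytes:
    """Return the implicit-length string held in a register value.

    Assumption for illustration: the least significant byte is the
    first character, and the string stops at the first null byte.
    """
    raw = value.to_bytes(width, "little")  # byte 0 = least significant
    end = raw.find(b"\x00")
    return raw if end == -1 else raw[:end]


xmm1 = 0x6b6579206b6579206b6579206b657920
xmm2 = 0x6b657900000000000000000000000000

print(register_to_string(xmm1))  # b' yek yek yek yek'
print(register_to_string(xmm2))  # b'' (byte 0 is null)
```

Read this way, %xmm1 is " yek yek yek yek" and %xmm2 is the empty string, which is why the intuitive answer of 3 does not apply.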
Why interpret strings like this? Consider a snippet that uses movdqu
to copy two in-memory strings, substr and str, into %xmm1 and %xmm2.
When substr and str are copied to %xmm1 and %xmm2, they are reversed.
movdqu does not know that it is copying a string. Because x86 is a
little-endian machine, the byte at the lowest address (the first
character) lands in the least significant byte of the register, so
the string appears reversed when the register is written as a number.
So, operating on reversed strings does make sense on a little-endian
machine, but this may not be intuitive. It certainly wasn’t what I
was expecting.
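
The effect of a little-endian load can be sketched in a few lines of Python. This is only a model of the byte placement, not real SSE code, and the helper name load_le is an assumption made for this example.

```python
def load_le(memory: bytes, width: int = 16) -> int:
    """Model a little-endian load of `width` bytes into a register.

    The byte at the lowest address becomes the least significant byte.
    Shorter inputs are padded with null bytes (an assumption that
    stands in for the string's terminator and whatever follows it).
    """
    padded = memory.ljust(width, b"\x00")[:width]
    return int.from_bytes(padded, "little")


print(hex(load_le(b"abcd", 4)))  # 0x64636261
print(hex(load_le(b"key")))      # 0x79656b
```

Note that "abcd" loads as 0x64636261, the byte-reversed form of the 0x61626364 that seemed natural earlier.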
Output selection is confusing
The immediate byte that is the first argument to pcmpistri is called
the imm8 control byte. This is documented in section 4.1 in Volume 2
of the Intel Manual. For pcmpistri, bit 6 of this byte (counting from
zero, so the bit with value 64) controls the Output Selection. The
value 12 denotes, among other things, that the least significant
index should be returned. In contrast, 76 (that is, 12 + 64) denotes
that the most significant index should be returned.
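
As a rough illustration of how the control byte breaks down, the following Python sketch decodes its fields. The field layout follows section 4.1 of the Intel manual as best understood here; the function name decode_imm8 and the dictionary keys are hypothetical.

```python
FORMATS = ["unsigned bytes", "unsigned words", "signed bytes", "signed words"]
AGGREGATIONS = ["equal any", "ranges", "equal each", "equal ordered"]
POLARITIES = ["positive", "negative", "masked positive", "masked negative"]


def decode_imm8(imm8: int) -> dict:
    """Split a pcmpistri control byte into its named fields."""
    return {
        "format": FORMATS[imm8 & 0b11],                   # bits 1:0
        "aggregation": AGGREGATIONS[(imm8 >> 2) & 0b11],  # bits 3:2
        "polarity": POLARITIES[(imm8 >> 4) & 0b11],       # bits 5:4
        "output": ("most significant index"               # bit 6
                   if (imm8 >> 6) & 1
                   else "least significant index"),
    }


print(decode_imm8(12))
print(decode_imm8(76))
```

Both 12 and 76 select unsigned bytes, the equal ordered (substring) aggregation, and positive polarity; they differ only in bit 6.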
Least significant and most significant here apply to the little-endian (reversed) string representation. So, least significant means the leftmost character of the string, and most significant means the rightmost character of the string. This is exactly the opposite of my intuition for what the least and most significant bytes of a string are.
Continuing the example above, pcmpistri $12, %xmm2, %xmm1, which
specifies the least significant output setting, returns 0 in %ecx. On
the other hand, the most significant output setting, pcmpistri $76,
%xmm2, %xmm1, returns 8 in %ecx.
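
The difference between the two output settings can be sketched with a simplified Python model of the substring (equal ordered) mode on unsigned bytes. This is not BAP's model. It rests on one reading of the Intel manual's rules for bytes past the end of a string (past the end of the needle always matches, past the end of the haystack never does), so treat it as an assumption rather than a reference. The function name and the input strings are illustrative and not necessarily the ones used above.

```python
def pcmpistri_equal_ordered(needle: bytes, haystack: bytes,
                            most_significant: bool = False) -> int:
    """Simplified model of pcmpistri $12 (or $76 if most_significant).

    Both strings are given in memory order, so index 0 is the least
    significant byte of the register. Returns 16 if nothing matches.
    """
    width = 16
    nd = needle.ljust(width, b"\x00")[:width]
    hs = haystack.ljust(width, b"\x00")[:width]
    # Implicit lengths: a string ends at its first null byte.
    nlen = nd.find(b"\x00") if b"\x00" in nd else width
    hlen = hs.find(b"\x00") if b"\x00" in hs else width

    matches = []
    for i in range(width):
        ok = True
        for j in range(width - i):
            if j >= nlen:
                continue  # past the end of the needle: counts as a match
            if i + j >= hlen or nd[j] != hs[i + j]:
                ok = False  # haystack exhausted, or the bytes differ
                break
        if ok:
            matches.append(i)

    if not matches:
        return width
    return max(matches) if most_significant else min(matches)


print(pcmpistri_equal_ordered(b"key", b"key key key"))        # 0
print(pcmpistri_equal_ordered(b"key", b"key key key", True))  # 8
print(pcmpistri_equal_ordered(b"", b"key key key"))           # 0
```

The last call shows why an empty needle, such as a register whose least significant byte is null, matches at index 0 under this model.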