pcmpistri has the worst documentation of any instruction I’ve seen
in the Intel manual.
pcmpistri is one of the newer SSE string
instructions. It can perform a variety of operations, including
strcmp, substring searching, and character set and range matching.
There’s an excellent blog post that
covers the uses of
pcmpistri and its related instructions.
The purpose of this blog post is to share some of the gotchas that I
ran into when trying to fix our modeling of
pcmpistri in BAP. To the
best of my knowledge, these gotchas are undocumented. (To be fair,
the documentation is so poorly written it’s hard to tell if these
issues are addressed.) The above linked blog post avoids discussing
these gotchas by discussing things at an abstract string level.
Strings in registers are reversed
pcmpistri can take both of its input strings in registers. In
memory, strings are usually stored with later (farther right)
characters being stored at higher memory addresses. It is unclear how
strings should be represented while they are in a register. I
personally think that “abcd” should be stored as 0x61626364.
More generally, the rightmost character of a string would be stored in
the least significant byte of the register. There is no deep reason
for this convention. However, numbers are stored in the correct byte
order while in registers, and it seems weird for strings to be in
reversed order.
For example, what do you think
pcmpistri $0xc, %xmm1, %xmm2
should return when
%xmm1 == 0x6b6579206b6579206b6579206b657920 and
%xmm2 == 0x6b657900000000000000000000000000?
pcmpistri $0xc, %xmm1, %xmm2 is supposed to return the least significant
byte index of %xmm1 that contains the substring in %xmm2. Here,
%xmm2 is “key” and %xmm1 is “key key key key ”.
You might, like me, think that the result should be 3 (the least
significant index is 0). But you’d be wrong. In fact, pcmpistri
interprets the strings backwards. So,
0x6b657900000000000000000000000000 == "", because it starts with a
null byte: the least significant byte, 0x00, is read as the first character.
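To make the backwards reading concrete, here is a small Python sketch that decodes a 128-bit register value starting from its least significant byte and stopping at the first null byte. The helper name register_to_string is made up for illustration, and this only models the byte order and implicit length, not the instruction itself.

```python
def register_to_string(value: int) -> bytes:
    """Decode a 128-bit value starting at the least significant byte.

    Hypothetical helper: byte 0 is the least significant byte, and the
    string ends at the first null byte (implicit length).
    """
    raw = value.to_bytes(16, "little")  # raw[0] is the least significant byte
    end = raw.find(b"\x00")
    return raw if end == -1 else raw[:end]


xmm1 = 0x6b6579206b6579206b6579206b657920
xmm2 = 0x6b657900000000000000000000000000

print(register_to_string(xmm1))  # b' yek yek yek yek'
print(register_to_string(xmm2))  # b''
```

Read this way, the first register holds “key key key key ” backwards, and the second is the empty string because its first byte is already the terminator.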
Why interpret strings like this? Consider loading a string from memory
at str into an XMM register with movdqu. The bytes at str are copied to
the register as they are.
movdqu does not know that it is copying a string, and
will effectively reverse the byte order of the string, since x86 is a
little-endian machine: the character at the lowest address ends up in
the least significant byte of the register. So, operating on reversed
strings does make sense on a little-endian machine, but this may not be
intuitive. It certainly wasn’t what I was expecting.
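The effect of a little-endian load can be sketched in Python. The function movdqu_load below is a hypothetical stand-in for a 16-byte load from memory (padding short inputs with null bytes is my assumption, to mimic a null-terminated string in a zeroed buffer); it is not how the hardware is implemented, just a model of where each byte lands.

```python
def movdqu_load(memory: bytes) -> int:
    """Model a 16-byte little-endian load (hypothetical helper).

    The byte at the lowest address becomes the least significant byte
    of the resulting 128-bit value.
    """
    block = memory[:16].ljust(16, b"\x00")  # assume zero padding after the string
    return int.from_bytes(block, "little")


# The convention I expected: rightmost character in the least significant byte.
print(hex(int.from_bytes(b"abcd", "big")))      # 0x61626364

# What a load from memory actually produces.
print(hex(movdqu_load(b"abcd")))                # 0x64636261
print(hex(movdqu_load(b"key key key key ")))    # 0x2079656b2079656b2079656b2079656b
```

Written out as a number, the loaded value looks reversed, but reading it from the least significant byte upwards gives back the string in its original order.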
Output selection is confusing
The immediate byte that is the first argument to
pcmpistri is called
the imm8 control byte. This is documented in section 4.1 in Volume 2
of the Intel Manual. For
pcmpistri, bit 6 of this byte (counting from zero, value 0x40)
controls the Output Selection. The value 12 denotes, among other
things, that the least significant index should be returned. In
contrast, 76 (12 + 64) would denote that the most significant index should be
returned.
Least significant and most significant here apply to the little-endian (reversed) string representation. So, least significant means the leftmost character of the string, and most significant means the rightmost character of the string. This is exactly the opposite of my intuition for what the least and most significant bytes of a string are.
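As a reading aid, the fields of the control byte can be pulled apart with a few shifts and masks. The sketch below follows my understanding of the field layout in the Intel manual (bits 1:0 source format, bits 3:2 aggregation, bits 5:4 polarity, bit 6 output selection); the function name decode_imm8 and the label strings are my own.

```python
SOURCE_FORMATS = ("unsigned bytes", "unsigned words", "signed bytes", "signed words")
AGGREGATIONS = ("equal any", "ranges", "equal each", "equal ordered")
POLARITIES = ("positive", "negative", "masked positive", "masked negative")


def decode_imm8(imm8: int) -> dict:
    """Split a pcmpistri control byte into its fields (hypothetical helper)."""
    return {
        "source_format": SOURCE_FORMATS[imm8 & 0b11],
        "aggregation": AGGREGATIONS[(imm8 >> 2) & 0b11],
        "polarity": POLARITIES[(imm8 >> 4) & 0b11],
        # Bit 6 (value 0x40) is the output selection for pcmpistri.
        "output_selection": (
            "most significant index" if imm8 & 0x40 else "least significant index"
        ),
    }


for value in (12, 76):
    print(value, decode_imm8(value))
```

Both 12 and 76 decode to unsigned bytes, equal ordered (substring search), and positive polarity; they differ only in bit 6.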
As an example, the above code for
pcmpistri $12, %xmm2, %xmm1, which
specifies the least significant output setting, returns 0 in
%ecx. On the
other hand, the most significant output setting,
pcmpistri $76, %xmm2, %xmm1, returns 8 in %ecx.
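To see how the output selection bit changes the result, here is a rough Python model of the substring search for control bytes 12 and 76 only. It is a sketch based on my reading of the equal ordered semantics (bytes past the end of the needle always match, a needle byte past the end of the haystack never matches, and 16 is returned when nothing matches), not a verified emulation. The function names are hypothetical, and the inputs are chosen for illustration, so the numbers are not meant to reproduce the 0 and 8 quoted above.

```python
def implicit_length(reg: bytes) -> int:
    """Number of valid bytes: everything before the first null byte."""
    end = reg.find(b"\x00")
    return 16 if end == -1 else end


def pcmpistri_equal_ordered(imm8: int, haystack: bytes, needle: bytes) -> int:
    """Hypothetical model of the index pcmpistri leaves in %ecx.

    Both inputs are given in register order: index 0 is the least
    significant byte. Only imm8 = 12 and imm8 = 76 are modeled.
    """
    if (imm8 & 0xBF) != 0x0C:
        raise ValueError("this sketch only models imm8 = 12 and imm8 = 76")
    hay = haystack[:16].ljust(16, b"\x00")
    ndl = needle[:16].ljust(16, b"\x00")
    hay_len, ndl_len = implicit_length(hay), implicit_length(ndl)

    matches = []
    for j in range(16):                 # candidate start index in the haystack
        ok = True
        for i in range(16 - j):         # compare only what fits in the register
            if i >= ndl_len:            # past the needle's end: always matches
                break
            if i + j >= hay_len or hay[i + j] != ndl[i]:
                ok = False
                break
        if ok:
            matches.append(j)

    if not matches:
        return 16
    return matches[-1] if imm8 & 0x40 else matches[0]


haystack = b"key key key key "
print(pcmpistri_equal_ordered(12, haystack, b"key"))  # 0
print(pcmpistri_equal_ordered(76, haystack, b"key"))  # 12
```

Under this model, feeding in the two register values from the earlier example (as bytes starting from the least significant byte) gives 0 for the least significant setting, because the needle is empty and so matches at every index.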