Guys, do you know this trick: var a = 7, b = 9; a = b - a; b = b - a; a = a + b;...

kjeetgill · on March 8, 2022

I love tricks like these even though I can't figure out when it'd be an advantage. This trick works with Xor for similar reasons:

  A = A^B
  B = A^B
  A = A^B

I've seen it used to store prev^next in one pointer in doubly linked lists. The coolest part is traveling forward and backwards depends only on if you use head or the tail to "xor-unpack" the pointer, no different for prev or next!

orlp · on March 8, 2022

Actually, djbsort [1] uses a similar trick to sort two integers in constant time:

    int32_t ab = b ^ a;
    int32_t c = b - a;
    c ^= ab & (c ^ b);
    c >>= 31;
    c &= ab;
    a ^= c;
    b ^= c;

How it works is that if b < a we initially have that c = b - a is negative, and non-negative otherwise, if it were a 33-bit integer. We can then use an arithmetic shift right to spread the top sign bit, creating a mask that is all 1s when b < a and 0s otherwise. However it isn't 33-bit, so we have to detect overflow.

If both a and b are non-negative or both negative, there was no overflow. So only if the top bit of both was different (indicating different signs and thus potentially overflow), and the top bit of the result differs from b (since if the signs of a and b differ, the sign of b is our desired result), we flip the top bit of our result. That is the role of the mysterious line c ^= ab & (c ^ b);.

We then as mentioned before do an arithmetic right shift to spread the top bit into the entirety of c, creating a mask. We filter this mask using c &= ab to only contain ones on the positions where a and b differ in bits. Finally we toggle those bit positions by XORing both a and b with the mask, swapping their values if they're out of order, and leaving them unchanged otherwise.

[1]: https://sorting.cr.yp.to

globular-toast · on March 8, 2022

The advantage is not having to use a third register to swap two values around. On systems like x86, registers are in very short supply. It's not quite as relevant now as x86 has had XCHG[0] for a long time now. Compilers are quite good at detecting when it should be used so you don't often have to explicitly write the XOR trick. XCHG also doesn't corrupt your flags register.

[0] https://c9x.me/x86/html/file_module_x86_id_328.html

Someone · on March 8, 2022

The XOR trick also introduces data dependencies that makes it slower on some CPUs (can’t start the second XOR until the result of the first is available, or the third one until the result of the second is available)

When using a third register, CPUs with register renaming need not even move data at all. They could move the labels “register A”, “register B” around.

pickledcods · on March 8, 2022

Alternatively you can do:

  a ^= b;
  b ^= a;
  a ^= b;

Which uses less energy because the bitwise operators do not use the intermediate/interbit carry.

pickledcods · on March 8, 2022

saving (one) swap space by adding 3 reads and 3 calculations.

Wiseacre · on March 8, 2022

and an int overflow!

pickledcods · on March 8, 2022

Unless they are unsigned ints, and then it is defined behaviour and it becomes modulo arithmetic.

mjevans · on March 8, 2022

Thank you, I'd forgotten and remembered and forgotten that again.

In a sort of ironic comedy, maybe an msint.h file could include definitions such as: mint8, sint8, ... mint64, sint64, etc. As well as an explanation about the difference between modular and signed (2s compliment) arithmetic operations.

mark-r · on March 8, 2022

Yes, I've known that trick for a long time. There's hardly ever a reason to use it though, unless you're swapping registers in assembly and don't have a spare register.