How does XOR variable swapping work?

You can see how it works by doing the substitution:

x1 = x0 xor y0
y2 = x1 xor y0
x2 = x1 xor y2

Substituting,

x1 = x0 xor y0
y2 = (x0 xor y0) xor y0
x2 = (x0 xor y0) xor ((x0 xor y0) xor y0)

Because xor is fully associative and commutative:

y2 = x0 xor (y0 xor y0)
x2 = (x0 xor x0) xor (y0 xor y0) xor y0

Since x xor x == 0 for any x,

y2 = x0 xor 0
x2 = 0 xor 0 xor y0

And since x xor 0 == x for any x,

y2 = x0
x2 = y0

And the swap is done.


Other people have explained it, now I want to explain why it was a good idea, but now isn't.

Back in the day when we had simple single cycle or multi-cycle CPUs, it was cheaper to use this trick to avoid costly memory dereferences or spilling registers to the stack. However, we now have CPUs with massive pipelines instead. The P4's pipeline ranged from having 20 to 31 (or so) stages in their pipelines, where any dependence between reading and writing to a register could cause the whole thing to stall. The xor swap has some very heavy dependencies between A and B that don't actually matter at all but stall the pipeline in practice. A stalled pipeline causes a slow code path, and if this swap's in your inner loop, you're going to be moving very slowly.

In general practice, your compiler can figure out what you really want to do when you do a swap with a temp variable and can compile it to a single XCHG instruction. Using the xor swap makes it much harder for the compiler to guess your intent and therefore much less likely to optimize it correctly. Not to mention code maintenance, etc.