What is the best way to calculate number of padding bytes
pad = (-size) & 3;
This should be the fastest: one negation and one AND, with no division and no branch. For a 4-byte boundary it gives:
size 0: pad 0
size 1: pad 3
size 2: pad 2
size 3: pad 1
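A minimal C sketch of the expression above, assuming size is held in an unsigned type such as size_t so that the negation wraps in a well-defined way. The names pad4 and align_up4 are hypothetical, chosen only for illustration.

```c
#include <stddef.h>

/* Bytes needed to bring size up to the next multiple of 4.
   size is unsigned, so 0 - size wraps modulo 2^N and is well defined;
   written as 0 - size rather than -size only to avoid compiler warnings
   about unary minus on an unsigned operand. */
size_t pad4(size_t size)
{
    return (0 - size) & 3;
}

/* Hypothetical helper: round an offset up to a 4-byte boundary. */
size_t align_up4(size_t offset)
{
    return offset + pad4(offset);
}
```

With these definitions, align_up4(5) yields 8 and align_up4(8) stays at 8.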
As long as the optimizing compiler turns the % 4 into a bitmask instead of a division (which it can do directly when size is unsigned), I think your code is probably pretty good. This might be a slight improvement:
// only the last 2 bits (hence & 3) matter
pad = (4 - (size & 3)) & 3;
But again, the optimizing compiler is probably smart enough to reduce your code to this anyway. I can't think of anything better.
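If you want to convince yourself that the variants are interchangeable, a brute-force check is cheap. This is only a sketch: the question's original code is not quoted here, so pad_mod is an assumed shape for it, and all function names are hypothetical.

```c
#include <stddef.h>

/* Three ways to compute padding to a 4-byte boundary. */
size_t pad_mod(size_t size)  { return (4 - size % 4) % 4; }    /* assumed shape of the question's code */
size_t pad_mask(size_t size) { return (4 - (size & 3)) & 3; }  /* bitmask form */
size_t pad_neg(size_t size)  { return (0 - size) & 3; }        /* negate-and-mask form */

/* Returns 1 if all three forms agree for every size in [0, limit), else 0. */
int pad_forms_agree(size_t limit)
{
    for (size_t size = 0; size < limit; ++size) {
        size_t expected = pad_mod(size);
        if (pad_mask(size) != expected || pad_neg(size) != expected)
            return 0;
    }
    return 1;
}
```

Whether the compiler actually emits the same instructions for all three depends on the compiler and flags, so inspect the generated assembly if it matters.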
// align n bytes on size boundary
pad n size = (~n + 1) & (size - 1)
This is similar to TypeIA's solution, and it uses only operations that map directly to machine instructions.
(~n + 1) computes the two's-complement negation of n, i.e. the value that gives 0 when added to n.
& (size - 1) keeps only the low bits that matter, which requires size to be a power of two.
Examples:
pad 13 8 = 3
pad 11 4 = 1
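The notation above is pseudocode. In C it could look roughly like the sketch below, assuming unsigned operands and a power-of-two boundary. The name pad_to and the parameter name align are hypothetical.

```c
#include <stddef.h>

/* Padding needed to bring n up to the next multiple of align.
   Assumption: align is a nonzero power of two; otherwise align - 1
   is not a contiguous low-bit mask and the result is not meaningful. */
size_t pad_to(size_t n, size_t align)
{
    return (~n + 1) & (align - 1);
}
```

pad_to(13, 8) and pad_to(11, 4) reproduce the two examples, returning 3 and 1.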