Fastest way to get a positive modulo in C/C++
The standard way I learned is
inline int positive_modulo(int i, int n) {
    return (i % n + n) % n;
}
This function is essentially your first variant without the abs (which, in fact, makes it return the wrong result). I wouldn't be surprised if an optimizing compiler could recognize this pattern and compile it to machine code that computes an "unsigned modulo".
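To make the difference from the built-in operator concrete, here is a minimal sketch comparing the two. The helper name raw_remainder is made up for this illustration, and the code assumes n > 0 and n no larger than INT_MAX / 2, so that i % n + n cannot overflow.

```cpp
// Sketch only: assumes n > 0 and that (i % n + n) does not overflow int.

// Hypothetical helper showing what the built-in % does on its own.
constexpr int raw_remainder(int i, int n) {
    return i % n;             // truncates toward zero, so it is negative for i < 0
}

constexpr int positive_modulo(int i, int n) {
    return (i % n + n) % n;   // i % n is in (-n, n); adding n gives (0, 2n); % n gives [0, n)
}

static_assert(raw_remainder(-5, 7) == -5, "built-in % keeps the sign of i");
static_assert(positive_modulo(-5, 7) == 2, "wrapped into [0, n)");
static_assert(positive_modulo(-7, 7) == 0, "negative multiples of n map to 0");
```

The price of this form is a second % on every call, which is what the variants below try to avoid.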
Edit:
Moving on to your second variant: first of all, it contains a bug, too: the n < 0 should be i < 0.
This variant may not look as if it branches, but on a lot of architectures, the i < 0 will compile into a conditional jump. In any case, it will be at least as fast to replace (n * (i < 0)) with (i < 0 ? n : 0), which avoids the multiplication; in addition, it's "cleaner" because it avoids reinterpreting the bool as an int.
As to which of these two variants is faster, that probably depends on the compiler and processor architecture -- time the two variants and see. I don't think there's a faster way than either of these two variants, though.
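For reference, a single-division variant along these lines might look like the sketch below. The question's exact code is not reproduced here, so the function name positive_modulo_select is hypothetical. One deliberate deviation from the expression discussed above: the sketch tests the sign of the remainder rather than the sign of i, because with an i < 0 test a negative multiple of n (for example i = -7, n = 7) would yield n instead of 0.

```cpp
// Sketch only: assumes n > 0.
inline int positive_modulo_select(int i, int n) {
    int r = i % n;                // r is in (-n, n); it is negative only when i < 0
    return r + (r < 0 ? n : 0);   // add n only when the remainder itself is negative
}
```

Whether the ternary becomes a conditional move or a jump is up to the compiler and target, so this is still something to measure rather than assume.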
Most of the time, compilers are very good at optimizing your code, so it is usually best to keep it readable, so that both the compiler and other developers can tell what you are doing.
Since your array size is always positive, I suggest defining the divisor as unsigned. Compilers will typically turn a small if/else block like this one into conditional instructions, which have no branches:
unsigned modulo(int value, unsigned m) {
    int mod = value % (int)m;
    if (mod < 0) {
        mod += m;
    }
    return mod;
}
This creates a very small function without branches:
modulo(int, unsigned int):
mov eax, edi
cdq
idiv esi
add esi, edx
mov eax, edx
test edx, edx
cmovs eax, esi
ret
For example, modulo(-5, 7) returns 2.
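Since the use case is array indexing, here is a small sketch of how the function could be used to wrap indices. The helper circular_at is a hypothetical name for this illustration, and it assumes 0 < size <= INT_MAX so that the cast inside modulo is safe.

```cpp
// Same function as in the answer, repeated so the sketch compiles on its own.
inline unsigned modulo(int value, unsigned m) {
    int mod = value % (int)m;
    if (mod < 0) {
        mod += m;
    }
    return mod;
}

// Hypothetical helper: read from a circular array of length `size`.
// Negative indices count backwards from the end; assumes 0 < size <= INT_MAX.
inline int circular_at(const int* data, unsigned size, int index) {
    return data[modulo(index, size)];
}
```

With a 7-element array, circular_at(data, 7, -1) reads the last element and circular_at(data, 7, 7) wraps back to the first.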
Unfortunately, since the divisor is not known at compile time, the compiler must emit an integer division, which is slow compared to other integer operations. If you know that your array sizes are powers of two, I recommend keeping these function definitions in a header, so that the compiler can see the constant divisor and replace the division with cheaper operations. Here is the code generated for unsigned modulo256(int v) { return modulo(v,256); }:
modulo256(int): # @modulo256(int)
mov edx, edi
sar edx, 31
shr edx, 24
lea eax, [rdi+rdx]
movzx eax, al
sub eax, edx
lea edx, [rax+256]
test eax, eax
cmovs eax, edx
ret
See assembly: https://gcc.godbolt.org/z/DG7jMw
See comparison with most voted answer: http://quick-bench.com/oJbVwLr9G5HJb0oRaYpQOCec4E4
Edit: it turns out that Clang is able to generate a function without any conditional move instructions (which cost more than regular arithmetic operations). The difference is negligible in the general case, because the integer division takes around 70% of the total time.
Basically, Clang shifts the remainder mod right to extend its sign bit across the whole width (that is, 0xffffffff when negative and 0 otherwise), and the result is used to mask the second operand in mod + m.
#include <limits>

unsigned modulo(int value, unsigned m) {
    int mod = value % (int)m;
    // Arithmetic shift: all ones if mod is negative, zero otherwise.
    m &= mod >> std::numeric_limits<int>::digits;
    return mod + m;
}
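As a sanity check, the mask-based version can be compared against the branching one. This is only a sketch: the names modulo_branch, modulo_mask and variants_agree are made up here, and the mask trick relies on right-shifting a negative int being an arithmetic shift, which is guaranteed since C++20 and implementation-defined (though near-universal) before that.

```cpp
#include <limits>

// Branching version from the answer, renamed for the comparison.
inline unsigned modulo_branch(int value, unsigned m) {
    int mod = value % (int)m;
    if (mod < 0) {
        mod += m;
    }
    return mod;
}

// Mask-based version, renamed for the comparison.
inline unsigned modulo_mask(int value, unsigned m) {
    int mod = value % (int)m;
    m &= mod >> std::numeric_limits<int>::digits;  // m stays as-is if mod < 0, else becomes 0
    return mod + m;
}

// Hypothetical helper: true if both versions agree for every value in [lo, hi].
inline bool variants_agree(int lo, int hi, unsigned m) {
    for (int v = lo; v <= hi; ++v) {
        if (modulo_branch(v, m) != modulo_mask(v, m)) {
            return false;
        }
    }
    return true;
}
```

A call such as variants_agree(-1000, 1000, 7) covers negative values, zero, and exact multiples of m in one pass.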
When the modulus n is a power of two, the following works (assuming two's complement representation):
    return i & (n-1);
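A minimal sketch of that idea, with a guard for the power-of-two requirement. Both function names are hypothetical, and the result is only a valid modulo when n really is a power of two; two's complement is guaranteed from C++20 onward and is what practically every earlier implementation uses.

```cpp
// Hypothetical helper: true when n is a positive power of two (exactly one bit set).
constexpr bool is_power_of_two(int n) {
    return n > 0 && (n & (n - 1)) == 0;
}

// Positive modulo for power-of-two n only; assumes two's complement ints
// and that the caller has already checked n with is_power_of_two.
constexpr int modulo_pow2(int i, int n) {
    return i & (n - 1);   // keeps the low log2(n) bits of i
}

static_assert(is_power_of_two(256), "256 is a power of two");
static_assert(!is_power_of_two(6), "6 is not a power of two");
static_assert(modulo_pow2(-5, 8) == 3, "-5 wraps to 3 modulo 8");
```

No division is involved at all, which is why this is the cheapest option when the array size can be constrained to a power of two.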