How to avoid overflow in expr. A * B - C * D
This should work ( I think ):
signed long long int a = 0x7ffffffffffffffd;
signed long long int b = 0x7ffffffffffffffd;
signed long long int c = 0x7ffffffffffffffc;
signed long long int d = 0x7ffffffffffffffe;
signed long long int bd = b / d;
signed long long int bdmod = b % d;
signed long long int ca = c / a;
signed long long int camod = c % a;
signed long long int x = (bd - ca) * a * d - (camod * d - bdmod * a);
Here's my derivation:
x = a * b - c * d
x / (a * d) = (a * b - c * d) / (a * d)
x / (a * d) = b / d - c / a
now, the integer/mod stuff:
x / (a * d) = (b / d + ( b % d ) / d) - (c / a + ( c % a ) / a )
x / (a * d) = (b / d - c / a) - ( ( c % a ) / a - ( b % d ) / d)
x = (b / d - c / a) * a * d - ( ( c % a ) * d - ( b % d ) * a)
Note that this is not standard since it relies on wrap-around signed-overflow. (GCC has compiler flags which enable this.)
But if you just do all the calculations in long long
, the result of applying the formula directly:(A * B - C * D)
will be accurate as long as the correct result fits into a long long
.
Here's a work-around that only relies on implementation-defined behavior of casting unsigned integer to signed integer. But this can be expected to work on almost every system today.
(long long)((unsigned long long)A * B - (unsigned long long)C * D)
This casts the inputs to unsigned long long
where the overflow behavior is guaranteed to be wrap-around by the standard. Casting back to a signed integer at the end is the implementation-defined part, but will work on nearly all environments today.
If you need more pedantic solution, I think you have to use "long arithmetic"
The simplest and most general solution is to use a representation that can't overflow, either by using a long integer library (e.g. http://gmplib.org/) or representing using a struct or array and implementing a kind of long multiplication (i.e. separating each number to two 32bit halves and performing the multiplication as below:
(R1 + R2 * 2^32 + R3 * 2^64 + R4 * 2^96) = R = A*B = (A1 + A2 * 2^32) * (B1 + B2 * 2^32)
R1 = (A1*B1) % 2^32
R2 = ((A1*B1) / 2^32 + (A1*B2) % 2^32 + (A2*B1) % 2^32) % 2^32
R3 = (((A1*B1) / 2^32 + (A1*B2) % 2^32 + (A2*B1) % 2^32) / 2^32 + (A1*B2) / 2^32 + (A2*B1) / 2^32 + (A2*B2) % 2^32) %2^32
R4 = ((((A1*B1) / 2^32 + (A1*B2) % 2^32 + (A2*B1) % 2^32) / 2^32 + (A1*B2) / 2^32 + (A2*B1) / 2^32 + (A2*B2) % 2^32) / 2^32) + (A2*B2) / 2^32
Assuming the end result fits in 64 bits you actually don't really need most bits of R3 and none of R4
This seems too trivial I guess.
But A*B
is the one that could overflow.
You could do the following, without losing precision
A*B - C*D = A(D+E) - (A+F)D
= AD + AE - AD - DF
= AE - DF
^smaller quantities E & F
E = B - D (hence, far smaller than B)
F = C - A (hence, far smaller than C)
This decomposition can be done further.
As @Gian pointed out, care might need to be taken during the subtraction operation if the type is unsigned long long.
For example, with the case you have in the question, it takes just one iteration,
MAX * MAX - (MAX - 1) * (MAX + 1)
A B C D
E = B - D = -1
F = C - A = -1
AE - DF = {MAX * -1} - {(MAX + 1) * -1} = -MAX + MAX + 1 = 1