Assembly ADC (Add with carry) to C++
From here (broken) or here
However, Intel processor has a special instruction called adc. This command behaves similarly as the add command. The only extra thing is that it also add the value carry flag along. So, this may be very handy to add large integers. Suppose you'd like to add a 32-bit integers with 16-bit registers. How can we do that? Well, let's say that the first integer is held on the register pair DX:AX, and the second one is on BX:CX. This is how:
add ax, cx adc dx, bx
Ah, so first, the lower 16-bit is added by add ax, cx. Then the higher 16-bit is added using adc instead of add. It is because: if there are overflows, the carry bit is automatically added in the higher 16-bit. So, no cumbersome checking. This method can be extended to 64 bits and so on... Note that: If the 32-bit integer addition overflows too at the higher 16-bit, the result will not be correct and the carry flag is set, e.g. Adding 5 billion to 5 billion.
Everything from here on, remember that it falls pretty much into the zone of implementation defined behavior.
Here's a small sample that works for VS 2010 (32-bit, WinXp)
Caveat: $7.4/1- "The asm declaration is conditionally-supported; its meaning is implementation-defined. [ Note: Typically it is used to pass information through the implementation to an assembler. —end note ]"
int main(){
bool carry = false;
int x = 0xffffffff + 0xffffffff;
__asm {
jc setcarry
setcarry:
mov carry, 1
}
}
The ADC behaviour can be simulated in both C and C++. The following example adds two numbers (stored as arrays of unsigned as they are too large to fit into a single unsigned).
unsigned first[10];
unsigned second[10];
unsigned result[11];
.... /* first and second get defined */
unsigned carry = 0;
for (i = 0; i < 10; i++) {
result[i] = first[i] + second[i] + carry;
carry = (first[i] > result[i]);
}
result[10] = carry;
Hope this helps.
ADC is the same as ADD but adds an extra 1 if processor's carry flag is set.
The C++ language doesn't have any concept of a carry flag, so making an intrinsic function wrapper around the ADC
instruction is clunky. However, Intel did it anyway: unsigned char _addcarry_u32 (unsigned char c_in, unsigned a, unsigned b, unsigned * out);
. Last I checked, gcc did a poor job with this (saving the carry result into an integer register, instead of leaving it in CF), but hopefully Intel's own compiler does better.
See also the x86 tag wiki for assembly documentation.
The compiler will use ADC for you when adding integers wider than a single register, e.g. adding int64_t
in 32bit code, or __int128_t
in 64bit code.
#include <stdint.h>
#ifdef __x86_64__
__int128_t add128(__int128_t a, __int128_t b) { return a+b; }
#endif
# clang 3.8 -O3 for x86-64, SystemV ABI.
# __int128_t args passed in 2 regs each, and returned in rdx:rax
add rdi, rdx
adc rsi, rcx
mov rax, rdi
mov rdx, rsi
ret
asm output from the Godbolt compiler explorer. clang's -fverbose-asm
isn't very vebose, but gcc 5.3 / 6.1 wastes two mov
instructions so it's less readable.
You can sometimes hand-hold compilers into emitting an adc
or otherwise using the carry-out of add
using the idiom uint64_t sum = a+b;
/ carry = sum < a;
. But extending this to get a carry-out from an adc
instead of add
is not possible with current compilers; c+d+carry_in
can wrap all the way around, and compilers don't manage to optimize the multiple checks for carry out on each +
in c+d+carry
if you do it safely.
Clang _ExtInt
There is one way I'm aware of to get a chain of add/adc/.../adc: Clang's new _ExtInt(width)
feature that provides fixed-bit-width types of any size up to 16,777,215 bits (blog post). It was added to clang's development version on April 21, 2020, so it's not yet in any released version.
This will hopefully show up in ISO C and/or C++ at some point; The N2472 proposal is apparently being "being actively considered by the ISO WG14 C Language Committee"
typedef _ExtInt(256) wide_int;
wide_int add ( wide_int a, wide_int b) {
return a+b;
}
compiles as follows with clang trunk -O2
for x86-64 (Godbolt):
add(int _ExtInt<256>, int _ExtInt<256>):
add rsi, r9
adc rdx, qword ptr [rsp + 8]
adc rcx, qword ptr [rsp + 16]
mov rax, rdi # return the retval pointer
adc r8, qword ptr [rsp + 24] # chain of ADD / 3x ADC!
mov qword ptr [rdi + 8], rdx # store results to mem
mov qword ptr [rdi], rsi
mov qword ptr [rdi + 16], rcx
mov qword ptr [rdi + 24], r8
ret
Apparently _ExtInt
is passed by value in integer registers until the calling convention runs out of registers. (At least in this early version; Perhaps x86-64 SysV should class it as "memory" when it's wider than 2 or maybe 3 registers, like structs larger than 16 bytes. Although moreso than structs, having it in registers is likely to be useful. Just put other args first so they're not displaced.)
The first _ExtInt arg is in R8:RCX:RDX:RSI, and the second has its low qword in R9, with the rest in memory.
A pointer to the return-value object is passed as a hidden first arg in RDI; x86-64 System V only ever returns in up to 2 integer registers (RDX:RAX) and this doesn't change that.