64/32-bit division on a processor with 32/16-bit division
My copy of Knuth (The Art of Computer Programming) is at work, so I can't check it until Monday, but that would be my first source. It has a whole section on arithmetic.
edit: your post about "16/16 division and 32/16 division which both take 18 cycles." -- dsPICs have a conditional subtract operation in assembly. Consider using this as your computational primitive.
Also note that if X = XH * 232 + XL and D = DH * 216 + DL, then if you are looking for
(Q,R) = X/D where X = Q * D + R
where Q = QH * 216 + QL, R = RH * 216 + RL, then
XH * 232 + XL = DH * QH * 232 + (DL * QH + DH * QL) * 216 + (DL * QL) + RH * 216 + RL
This suggests (by looking at terms that are the high 32 bits) to use the following procedure, akin to long division:
- (QH, R0) = XH / (DH+1) -> XH = QH * (DH+1) + R0 [32/16 divide]
- R1 = X - (QH * 216) * D [requires a 16*32 multiply, a shift-left by 16, and a 64-bit subtract]
- calculate R1' = R1 - D * 216
- while R1' >= 0, adjust QH upwards by 1, set R1 = R1', and goto step 3
- (QL, R2) = (R1 >> 16) / (DH+1) -> R1 = QL * (DH+1) + R2 [32/16 divide]
- R3 = R1 - (QL * D) [requires a 16*32 multiply and a 48-bit subtract]
- calculate R3' = R3 - D
- while R3' >= 0, adjust QL upwards by 1, set R3 = R3', and goto step 7
Your 32-bit quotient is the pair (QH,QL), and 32-bit remainder is R3.
(This assumes that the quotient is not larger than 32-bit, which you need to know ahead of time, and can easily check ahead of time.)
See "Hacker's Delight", multiword division (pages 140-145).
The basic concept (going back to Knuth) is to think of your problem in base-65536 terms. Then you have a 4 digit by 2 digit division problem, with 2/1 digit division as a primitive.
The C code is here: https://github.com/hcs0/Hackers-Delight/blob/master/divmnu.c.txt