How do ASCII Adjust and Decimal Adjust instructions work?
Why, in the pseudo-code of AAA and AAS, do we have to add or subtract 6 from the low-order nibble in AL?
Because a hexadecimal digit has 16 distinct values while a BCD digit has only 10. When you do math in decimal, whenever a digit reaches 10 you keep the remainder modulo 10 and carry into the next column. Similarly, in BCD math, when the result of an addition in one digit is larger than 9 you add 6 to skip over the 6 remaining "invalid" values and carry into the next digit. Conversely, you subtract 6 when a borrow occurs in a subtraction.
For example: 27 + 36
27: 0010 0111
+ 36: 0011 0110
───────────────
5_13: 0101 1101 (13 >= 10)
+ 6: 0110
───────────────
63: 0110 0011 (13 + 6 = 19 = 0x13, where 0x3 is the units digit and 0x10 is the carry)
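If it helps to see that rule in code, here is a minimal C sketch of the packed case, roughly what DAA does after a plain binary ADD. The flag bookkeeping is omitted and `bcd_add_packed` is just a name made up for the example:

```c
#include <stdio.h>
#include <stdint.h>

/* Packed-BCD add: do a plain binary add, then apply the same "+6 per
 * out-of-range digit" fix that DAA performs (flag bookkeeping omitted). */
static uint8_t bcd_add_packed(uint8_t a, uint8_t b)
{
    unsigned sum = a + b;                              /* binary addition */

    /* units digit invalid, or it carried into the tens nibble: add 6 */
    if ((sum & 0x0F) > 9 || ((a & 0x0F) + (b & 0x0F)) > 0x0F)
        sum += 0x06;

    /* tens digit overflowed 9: add 0x60 (carry out of the byte is dropped) */
    if (sum > 0x99)
        sum += 0x60;

    return (uint8_t)sum;
}

int main(void)
{
    /* 0x27 + 0x36 = 0x5D in binary; 0xD > 9, so +6 gives 0x63 */
    printf("0x27 + 0x36 -> 0x%02X\n", bcd_add_packed(0x27, 0x36));
    return 0;
}
```

Running it prints `0x27 + 0x36 -> 0x63`, matching the worked example above.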
Doing unpacked addition is the same, except that you carry directly from the units digit into the tens digit, discarding the top nibble of each byte.
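The unpacked case looks like this in C, roughly the AAA rule from the manual, again without the real AF/CF side effects (`add_unpacked` is a made-up helper, with the tens and units digits held in separate bytes like AH and AL):

```c
#include <stdio.h>
#include <stdint.h>

/* Unpacked add: ah/al hold one decimal digit each (tens, units).
 * Roughly "ADD AL, other" followed by AAA, without the AF/CF flag updates. */
static void add_unpacked(uint8_t *ah, uint8_t *al, uint8_t other)
{
    unsigned nibble_carry = ((*al & 0x0F) + (other & 0x0F)) > 0x0F;
    *al = (uint8_t)(*al + other);              /* binary addition */

    if ((*al & 0x0F) > 9 || nibble_carry) {    /* units digit out of range */
        *al += 6;                              /* skip the 6 invalid codes */
        *ah += 1;                              /* carry straight into the tens */
    }
    *al &= 0x0F;                               /* keep only the units digit */
}

int main(void)
{
    uint8_t ah = 0, al = 7;
    add_unpacked(&ah, &al, 6);                 /* 7 + 6 = 13 */
    printf("tens=%u units=%u\n", ah, al);      /* prints tens=1 units=3 */
    return 0;
}
```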
For more information you can read:
- BCD Addition assembly program logic
- Why must six be added to a BCD addition if it is an invalid BCD code?
And can someone explain the pseudo-code for AAM, AAD, and the Decimal Adjust instructions in the Intel instruction set manuals too? Why are they like that, and what's the logic behind them?
AAM is just a conversion from binary to BCD. You do the multiplication normally in binary; calling AAM then divides the result by 10 and stores the quotient-remainder pair as two unpacked BCD digits (quotient in AH, remainder in AL).
For example:
13*6 = 78 = 0100 1110
78/10 = 7 remainder 8 => AH = 07, AL = 08 (AX = 0x0708)
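In C terms, what AAM does after the multiply is essentially a divide-by-10 that splits AL into two unpacked digits. A small sketch of that arithmetic (just the effect, not the actual instruction), reusing the 13*6 example:

```c
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* do the multiplication in binary, as described above: 13*6 = 78 = 0x4E */
    uint8_t al = 13 * 6;

    /* AAM is effectively: AH = AL / 10, AL = AL % 10 (two unpacked BCD digits) */
    uint8_t ah = al / 10;        /* tens digit: 7 */
    al = al % 10;                /* units digit: 8 */

    printf("AH=%u AL=%u -> AX=0x%02X%02X\n", ah, al, ah, al);  /* AX = 0x0708 */
    return 0;
}
```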
AAD is the reverse: before the division, you call AAD to convert the unpacked BCD pair in AH:AL back to binary (AL = AH*10 + AL, AH = 0), then do the division just like any other binary division.
For example: 87/5
0x8*10 + 0x7 = 0x57
0x57/5 = 0x11 remainder 0x02 (DIV puts the quotient 17 in AL and the remainder 2 in AH)
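And a sketch of the AAD-then-DIV sequence, again just the arithmetic rather than the actual instruction semantics (no flags; the register names are only variables here):

```c
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t ah = 8, al = 7;        /* unpacked BCD digits of decimal 87 */

    /* AAD is effectively: AL = AH*10 + AL, AH = 0 (back to plain binary) */
    al = (uint8_t)(ah * 10 + al);  /* 87 = 0x57 */
    ah = 0;

    /* then an ordinary binary division: DIV puts quotient in AL, remainder in AH */
    uint8_t quotient  = al / 5;    /* 17 = 0x11 */
    uint8_t remainder = al % 5;    /* 2 */
    printf("0x%02X / 5 = 0x%02X remainder 0x%02X\n", al, quotient, remainder);
    return 0;
}
```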
The reason for those instructions is that, in the past, memory was expensive and you had to reduce memory usage as much as possible. Hence CISC CPUs were very common in that era: they provided lots of complex instructions to minimize the number of instructions needed for a task. Nowadays memory is much cheaper and modern architectures are mostly RISCy, trading some code density for reduced CPU complexity.