Why does MIPS use R0 as "zero" when you could just XOR two registers to produce 0?
The zero-register on RISC CPUs is useful for two reasons:
It's a useful constant
Depending on the restrictions of the ISA, you can't use a literal in some instruction encodings, but you can always use r0 to get 0.
It can be used to synthesize other instructions
This is perhaps the most important point. As an ISA designer, you can trade a general-purpose register for a zero-register in order to synthesize other useful instructions. Synthesizing instructions is good because, with fewer actual instructions, you need fewer bits to encode the operation in the opcode, which frees up space in the instruction encoding. You can use that space for, e.g., bigger address offsets and/or literals.
The semantics of the zero-register are like those of /dev/zero on *nix systems: everything written to it is discarded, and you always read back 0.
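The /dev/zero-like behaviour described above can be sketched as a tiny register file model. This is only an illustration in Python; the class name `RegisterFile` and the choice of 32 registers are my own assumptions and do not describe any particular ISA.

```python
class RegisterFile:
    """Toy register file in which register 0 is hard-wired to zero."""

    def __init__(self, count=32):
        # `count` is an arbitrary choice for this sketch.
        self._regs = [0] * count

    def read(self, index):
        # Register 0 always reads back as 0.
        return 0 if index == 0 else self._regs[index]

    def write(self, index, value):
        # Writes to register 0 are silently discarded.
        if index != 0:
            self._regs[index] = value


regs = RegisterFile()
regs.write(0, 1234)  # discarded, like writing to /dev/zero
regs.write(5, 42)    # stored normally
```

Reading `regs.read(0)` still gives 0 after the write, while `regs.read(5)` gives back the stored value.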
Let's see a few examples of how we can make pseudo-instructions with the help of the r0 zero-register:
; ### Hypothetical CPU ###
; Assembler with syntax:
; op rd, rm, rn
; => rd: destination, rm: 1st operand, rn: 2nd operand
; literal as #lit
; On a CPU architecture with a status register (which contains arithmetic status
; flags), `sub` can be used with r0 as the destination to discard the result.
cmp rn, rm ; => sub r0, rn, rm
; `add` instruction can be used as a `mov` instruction:
mov rd, rm ; => add rd, rm, r0
mov rd, #lit ; => add rd, r0, #lit
; Negate:
neg rd, rm ; => sub rd, r0, rm
; On a CPU without status flags, adding r0 to itself has no visible effect:
nop ; => add r0, r0, r0
; RISC-V's `jal` instruction -- Jump and Link: Jump to PC-relative instruction,
; save return address into rd; we can synthesize a `jmp` instruction out of it.
jmp dest ; => jal r0, dest
; You can even load from an absolute (direct) address, for a usually small range
; of addresses by using a literal offset as an address.
ld rd, addr ; => ld rd, [r0, #addr]
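The rewrites listed above are mechanical, so an assembler can apply them with a simple lookup. Here is a minimal sketch in Python for the same hypothetical CPU; the function name `expand` and the tuple representation of instructions are assumptions made for illustration only.

```python
def expand(op, *args):
    """Rewrite one pseudo-instruction of the hypothetical CPU into a real one."""
    if op == "cmp":                      # cmp rn, rm   => sub r0, rn, rm
        rn, rm = args
        return ("sub", "r0", rn, rm)
    if op == "mov":
        rd, src = args
        if src.startswith("#"):          # mov rd, #lit => add rd, r0, #lit
            return ("add", rd, "r0", src)
        return ("add", rd, src, "r0")    # mov rd, rm   => add rd, rm, r0
    if op == "neg":                      # neg rd, rm   => sub rd, r0, rm
        rd, rm = args
        return ("sub", rd, "r0", rm)
    if op == "nop":                      # nop          => add r0, r0, r0
        return ("add", "r0", "r0", "r0")
    if op == "jmp":                      # jmp dest     => jal r0, dest
        (dest,) = args
        return ("jal", "r0", dest)
    # Anything else is assumed to be a real instruction already.
    return (op, *args)
```

For example, `expand("mov", "r1", "r2")` yields the `add` form with r0 as the second operand, and the hardware never needs a dedicated `mov` opcode.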
The case of MIPS
I looked more closely at the MIPS instruction set. There are a handful of pseudo-instructions that use $zero; they are mainly used for branches. Here are some examples of what I've found:
move $rt, $rs         => add $rt, $rs, $zero
not  $rt, $rs         => nor $rt, $rs, $zero
b    Label            => beq $zero, $zero, Label ; a small relative branch
bgt  $rs, $rt, Label  => slt $at, $rt, $rs
                         bne $at, $zero, Label
blt  $rs, $rt, Label  => slt $at, $rs, $rt
                         bne $at, $zero, Label
bge  $rs, $rt, Label  => slt $at, $rs, $rt
                         beq $at, $zero, Label
ble  $rs, $rt, Label  => slt $at, $rt, $rs
                         beq $at, $zero, Label
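To convince yourself that the two-instruction sequences above really implement the four comparisons, you can model them in a few lines of Python. This is just a sanity check, not MIPS code; the helper names (`slt`, `bgt_taken`, and so on) are made up for the sketch, and Python integers stand in for signed register values.

```python
ZERO = 0  # value read from $zero


def slt(a, b):
    """Set-on-less-than: 1 if a < b (signed comparison), else 0."""
    return 1 if a < b else 0


def bgt_taken(rs, rt):
    at = slt(rt, rs)     # slt $at, $rt, $rs
    return at != ZERO    # bne $at, $zero, Label


def blt_taken(rs, rt):
    at = slt(rs, rt)     # slt $at, $rs, $rt
    return at != ZERO    # bne $at, $zero, Label


def bge_taken(rs, rt):
    at = slt(rs, rt)     # slt $at, $rs, $rt
    return at == ZERO    # beq $at, $zero, Label


def ble_taken(rs, rt):
    at = slt(rt, rs)     # slt $at, $rt, $rs
    return at == ZERO    # beq $at, $zero, Label
```

Each helper returns whether the branch would be taken, so it can be compared directly against Python's `>`, `<`, `>=` and `<=`.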
As for why you have found only one instance of the $zero register in your disassembly, perhaps your disassembler is smart enough to transform known sequences of instructions into their equivalent pseudo-instructions.
Is the zero-register really useful?
Well, apparently ARM finds a zero-register useful enough that the (somewhat) new ARMv8-A architecture, which introduced AArch64, has one in 64-bit mode; there wasn't a zero-register before. (The register is a bit special, though: in some encoding contexts it is the zero-register, in others it designates the stack pointer instead.)
Most ARM/POWER/SPARC implementations have a hidden RAZ register
You might think that architectures such as ARM32 do not have a zero register, but in fact they do! At the micro-architecture level, most CPU design engineers add a zero register that may be invisible to software (ARM's zero register is invisible) and use it to streamline instruction decode.
Consider a typical modern ARM32 design that has a software invisible register, say R16 wired to 0. Consider the ARM32 load, many cases of ARM32 load instruction fall into one of these forms (Ignore pre-post indexing for a while to keep the discussion simple)...
// NOTE: The ! is optional and represents address writeback.
LDR ra, [rb]
LDR ra, [rb, rc](!)
LDR ra, [rb, #k](!)
Inside the processor, each of these decodes to a general
ldr.uop ra, rb, rx, rc, #k // Internal decoded instruction format.
before entering the issue stage, where registers are read. Note that rx is the register that receives the written-back (updated) address. Here are some decode examples:
LDR R0, [R1] ==> ldr.uop R0, R1, R16, R16, #0 // Writeback to NULL.
LDR R0, [R1, R2]! ==> ldr.uop R0, R1, R1, R2, #0 // Writeback to R1.
LDR R0, [R1, #2] ==> ldr.uop R0, R1, R16, R16, #2 // Writeback to NULL.
At the circuit level, all three loads are actually the same internal instruction and an easy way to get this kind of orthogonality is to create a ground register R16. Since R16 is always grounded, these instructions naturally decode correctly without any extra logic. Mapping a class of instructions to a single internal format greatly helps in superscalar implementations as it reduces logic complexity.
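The decode step described above can be sketched as follows. This is a software model in Python, not how a real decoder is built; the function `decode_ldr`, its keyword parameters and the tuple layout are hypothetical, and `R16` is the assumed hidden zero register from the text.

```python
RAZ = "R16"  # hidden register wired to 0 (assumed name, as in the text)


def decode_ldr(ra, rb, rc=None, imm=0, writeback=False):
    """Map any of the three LDR forms onto one internal micro-op.

    Returns (opcode, ra, rb, rx, rc, imm), where rx is the write-back target.
    """
    rx = rb if writeback else RAZ         # no write-back: send the update to R16
    rc = rc if rc is not None else RAZ    # no index register: read 0 from R16
    return ("ldr.uop", ra, rb, rx, rc, imm)


# LDR R0, [R1]
plain = decode_ldr("R0", "R1")
# LDR R0, [R1, R2]!
indexed = decode_ldr("R0", "R1", rc="R2", writeback=True)
# LDR R0, [R1, #2]
offset = decode_ldr("R0", "R1", imm=2)
```

All three results share the same six-field shape; only the register and immediate fields differ, which is the point of having a grounded register available to the decoder.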
Another benefit is a streamlined way to throw away writes. An instruction may be disabled simply by setting its destination register and flags to R16. There is no need to create any other control signal to disable the write-back.
Most processor implementations, irrespective of architecture, end up with a RAZ register model early on in the pipeline. The MIPS pipeline essentially starts at a point that would be a few stages in on other architectures.
MIPS made the right choice
Thus, a read-as-zero register is almost mandatory in any modern processor implementation, and MIPS making it visible to software is definitely a plus, given how it streamlines the internal decode logic. Designers of MIPS processors need not add an extra RAZ register, since $0 is already at ground. Since RAZ is available to the assembler, a lot of pseudo-instructions are available on MIPS. One can think of this as pushing part of the decode logic into the assembler itself, instead of creating dedicated formats for each instruction type to hide the RAZ register from software, as other architectures do. The RAZ register is a good idea, and that's why ARMv8 copied it.
If ARM32 had a $0 register, the decode logic would have become simpler and the architecture would have been much better in terms of speed, area and power. For example, of the three versions of LDR presented above, only 2 formats would be needed. Similarly, there is no need to reserve decode logic for the MOV and MVN instructions. Also, CMP/CMN/TST/TEQ would become redundant. There would also be no need to differentiate between short (MUL) and long multiplication (UMULL/SMULL) since short multiplication could be considered as long multiplication with the high register set to $0 etc.
Since MIPS was initially designed by a small team, the simplicity of design was important and thus $0 was explicitly chosen in the spirit of RISC. ARM32 retains lots of traditional CISC features at the architectural level.
Disclaimer: I don't really know MIPS assembler, but a 0-value register is not unique to this architecture, and I guess it is used in the same way as in other RISC architectures I know.
XORing a register to obtain 0 will cost you one instruction, while using a predefined 0-value register will not.
For example, the mov RX, RY instruction is often implemented as add RX, RY, R0. Without a 0-value register, you'd have to xor RZ, RZ every time you want to use mov.
Another example is the cmp instruction and its variants (like "compare and jump", "compare and move", etc.), where cmp RX, R0 is used to test for negative numbers.