8086: why can't we move immediate data into a segment register?
Remember that the syntax of assembly language (any assembly) is just a human-readable way to write machine code. The rules of what you can do in machine code depend on how the processor's electronics were designed, not on what the assembler syntax could easily support.
So even though it looks like you could write mov DS, 5000h and there seems to be no conceptual reason why you shouldn't be able to, the real question is: "is there a mechanism by which the processor can load a segment register directly from an immediate value?"
In the case of the 8086, I figure the reason is simply that the engineers didn't create an electrical path that could feed a signal from the memory I/O data lines to the lines that write to the segment registers.
Why? I have several theories, but no authoritative knowledge.
The most likely reason is simply to keep the design simple: it takes extra wiring and gates to do that, and it was an uncommon enough operation (this was the 1970s) that it wasn't worth the real estate on the chip. This is not surprising; the 8086 already went overboard in allowing any of the normal registers to be connected to the ALU (arithmetic logic unit), which allows any register to be used as an accumulator. I'm sure that wasn't cheap to do. Most processors at the time allowed only one register (the accumulator) to be used for that purpose.
As for the brackets, you are correct. Let's say memory location 5000h contains the number 4321h. Then mov ax, 5000h puts the value 5000h into ax, while mov ax, [5000h] loads 4321h from memory into ax. Essentially, the brackets act like the * pointer dereference operator in C.
Just to highlight the fact that assembly is an idealized abstraction of what machine code can do, you should note that the two variations are not the same instruction with different parameters, but completely different opcodes. They could have used, say, MOV for the first and MVD (MoVe Direct addressed memory) for the second opcode, but they must have decided that the bracket syntax was easier for programmers to remember.
x86 machine code has only one opcode for move-to-Sreg. That opcode is 8E /r (mov Sreg, r/m16), and it allows a register or memory source, but not an immediate.
Contrary to some claims in other answers, mov ds, [5000h] runs just fine, assuming the 2 bytes at address 5000h hold a useful segment value for the mode you're in. (Real mode, where they're used directly as numbers, vs. protected mode, where Sreg values are selectors that index the LDT / GDT.)
x86 always uses a different opcode for the immediate form of an instruction (with a constant encoded as part of the machine code) vs. the register/memory source version. For example, add eax, 123 assembles to a different opcode from add eax, ecx. But add eax, [esi] can use the same add r32, r/m32 opcode as add eax, ecx, with just a different ModR/M byte.
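To make the opcode-sharing point concrete, here is a small sketch in 32-bit mode. The byte values in the comments are encodings taken from the opcode map; an assembler is free to pick another valid encoding (for add eax, ecx it will typically choose the 01 /r direction), which is why that instruction is written as raw bytes here.

```nasm
bits 32

    add eax, 123        ; one valid encoding: 83 C0 7B  (add r/m32, imm8: opcode 83 /0)
    db 03h, 0C1h        ; add eax, ecx hand-encoded as add r32, r/m32 (opcode 03 /r),
                        ;   ModR/M C1 = mod 11, reg 000 (eax), r/m 001 (ecx)
    add eax, [esi]      ; 03 06  (same 03 /r opcode),
                        ;   ModR/M 06 = mod 00, reg 000 (eax), r/m 110 ([esi])
```

The last two instructions differ only in the ModR/M byte, while the immediate form needs its own opcode.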
NASM listing, from nasm sreg.asm -l/dev/stdout, assembling a flat binary in 16-bit mode and producing a listing. I edited it by hand to separate the bytes into opcode modrm extra. (The listing shows bytes in memory order, so 0050 is the little-endian byte pair 00 50, i.e. 5000h.) These are all one-byte opcodes (with no extra opcode bits borrowing space in the /r field of the ModRM byte), so just look at the first byte to see what opcode it is, and notice when two instructions share the same opcode.
address   machine code  source            ; comments
00000000  BE 0050       mov si, 5000h     ; mov si, imm16
00000003  A1 0050       mov ax, [5000h]   ; special encoding for AX, no modrm
00000006  8B 36 0050    mov si, [5000h]   ; mov r16, r/m16 disp16
0000000A  89 C6         mov si, ax        ; mov r/m16, r16

0000000C  8E 1E 0050    mov ds, [5000h]   ; mov Sreg, r/m16
00000010  8E D8         mov ds, ax        ; mov Sreg, r/m16

                        mov ds, 5000h
****************** error: invalid combination of opcode and operands
Supporting a mov Sreg, imm16 encoding would need a separate opcode. This would take extra transistors for the 8086 to decode, and it would use up more opcode coding space, leaving less room for future extensions. I'm not sure which of these was considered more important by the architect(s) of the 8086 ISA.
Notice that the 8086 has special mov AL/AX, moffs opcodes, which save 1 byte when loading the accumulator from an absolute address. But it couldn't spare an opcode for mov-immediate to Sreg? This design decision actually makes good sense. How often do you need to reload a segment register? Very infrequently, and in real, large programs it often wouldn't be with a constant (I think). But in code using static data, you might be loading / storing the accumulator to a fixed address inside a loop. (The 8086 had very weak code fetch, so smaller code usually meant faster code.)
Also keep in mind that you can use mov Sreg, r/m16 for assemble-time constants with just one extra instruction (like mov ax, 4321h). But if we'd had only mov Sreg, imm16, run-time variable segment values would have required self-modifying code. (So obviously you wouldn't leave out the r/m16 source version.) My point is that if you're only going to have one, it's definitely going to be the register/memory source version.
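As a minimal sketch of both cases, assuming a DOS .COM-style flat binary (the org 100h layout, the label saved_seg, and the value 4321h are illustrative assumptions, not something from the question):

```nasm
bits 16
org 100h                     ; assumption: DOS .COM layout, CS = DS = ES on entry

start:
    push ds                  ; keep the original DS so the program can exit cleanly

    ; Assemble-time constant: route it through a general-purpose register.
    mov ax, 4321h            ; B8 21 43   mov r16, imm16
    mov ds, ax               ; 8E D8      mov Sreg, r/m16 (register source)

    ; Run-time value: load the segment register directly from memory.
    ; CS override because DS no longer points at this program's data.
    mov es, [cs:saved_seg]   ; mov Sreg, r/m16 (memory source)

    pop ds
    ret                      ; a .COM program can return to DOS with a near ret

saved_seg: dw 4321h          ; hypothetical variable holding a segment value
```

It should assemble with nasm -f bin. The constant case costs one extra instruction; the run-time case needs no immediate form at all.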
About segment registers
At the hardware level, the segment registers are not the same as the general-purpose registers. Of course, as Mike W said in the comments, the exact reason why you can't move an immediate value directly into a segment register is known only to the Intel designers. But I suppose it is because the design is simpler this way. Note that this choice does not affect processor performance, because segment register operations are very rare. So one instruction more or less is not important at all.
About syntax
In all reasonable implementations of x86 assembler syntax, mov reg, something moves the immediate number something to the register reg. For example:
NamedConst = 1234h
SomeLabel:
mov edx, 1234h ; moves the number 1234h to the register edx
mov eax, SomeLabel ; moves the value (address) of SomeLabel to eax
mov ecx, NamedConst ; moves the value (1234h in this case) to ecx
Enclosing the number in square brackets means that the content of memory at this address is moved to the register:
SomeLabel dd 1234h, 5678h, 9abch
mov eax, [SomeLabel+4] ; moves 5678h to eax
mov ebx, dword [100h] ; moves double word memory content from the
; address 100h in the data segment (DS) to ebx.