Is CIL an assembly language and JIT an assembler?
The line is actually pretty blurry: the arguments I've seen against calling CIL an "assembly language" apply almost as well to x86/x86-64 in practice.
Intel and AMD haven't made processors that execute assembly instructions exactly as emitted in decades (if ever), so even so-called "native" code is not much different from running on a virtual machine whose bytecode is specified in x86/x86-64.
x86/x86-64 are the lowest-level thing typical developers have access to, so if we had to put our foot down and call something in our ecosystem an "assembly language", that would win. And since CIL bytecode ultimately requires x86/x86-64 instructions to be able to run on a processor in that family, there's a pretty strong case to be made that it doesn't "feel" like it should count.
So in a sense, maybe neither can be considered to be "assembly language". When referring to x86/x86-64 processors, we almost never refer to processors that execute x86/x86-64 without translating it into something else (i.e., whatever the microcode does).
To add in yet another wrinkle, the way in which an x86/x86-64 processor executes a given sequence of instructions can change simply by updating the microcode. A quick search shows that Linux can even make it easy to do this yourself in software!
So I guess, here are criteria that can justify putting them in two separate categories:
- Does it matter that all current machines that run CIL bytecode are implemented in software?
- Does it matter that the same hardware can interpret the same x86/x86-64 instructions in a different way after being instructed to do so in software?
- Does it matter that we don't currently have a way of bypassing the microcode and issuing commands directly to the physical units of x86/x86-64 processors?
So regarding the "is CIL an assembly language" question, the best answers I can give are "it depends" (for scientists) and "pretty much" (for engineers).
Assembly is made up of mnemonics for the machine code instructions of a particular processor. A direct representation of the 1s and 0s that make the core execute code, but written in text to make it easy on a human. Which is very unlike CIL:
- you can't buy a processor that executes CIL
- CIL doesn't target a specific processor, the jitter does
- CIL assumes a stack-based execution model, processors are primarily register based
- CIL code is optimized from its original form
- there is no one-to-one translation of a CIL instruction to a processor instruction
That last bullet is a key one. A design decision that makes CIL strongly different from bytecode is that CIL instructions are type-less. There is only one ADD instruction, but processors have many versions of it: specific ones that take byte, short, int, long, float and double operands. They are required because different parts of the processor core are used to execute the add. The jitter picks the right one, based on the type of the operands it infers from previous CIL instructions.
This is just like the + operator in the C# language, which also works with different operand types. That really makes the L in CIL significant: it is a Language. A simple one, but it is only simple to help make it easy to write a jitter for it.
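To make the "one ADD, many native adds" point concrete, here is a minimal sketch in Python of how a jitter could pick a typed instruction by tracking operand types on a simulated evaluation stack. Everything in it is an assumption for illustration: the function name `select_adds`, the two lookup tables, and the x86-64-style operand notation are made up, and a real jitter's type tracking and instruction selection are far more involved.

```python
# Toy model only: resolve a type-less CIL "add" to a typed native add
# by inferring operand types from the instructions that came before it.

# Hypothetical mapping from inferred operand type to an x86-64-style add.
NATIVE_ADD = {
    "int32": "add r32, r32",
    "int64": "add r64, r64",
    "float32": "addss xmm, xmm",
    "float64": "addsd xmm, xmm",
}

# CIL load-constant instructions and the type each one pushes (simplified).
PUSH_TYPES = {
    "ldc.i4": "int32",
    "ldc.i8": "int64",
    "ldc.r4": "float32",
    "ldc.r8": "float64",
}


def select_adds(instructions):
    """Return the native add chosen for each 'add' in a CIL-like sequence."""
    stack = []   # inferred types of the values currently on the stack
    chosen = []  # one native instruction per CIL 'add' encountered
    for ins in instructions:
        op = ins.split()[0]
        if op in PUSH_TYPES:
            stack.append(PUSH_TYPES[op])
        elif op == "add":
            right = stack.pop()
            left = stack.pop()
            if left != right:
                raise TypeError(f"mismatched operands: {left} + {right}")
            chosen.append(NATIVE_ADD[left])
            stack.append(left)  # the result has the operand type
        else:
            raise ValueError(f"unsupported instruction: {ins}")
    return chosen


print(select_adds(["ldc.i4 1", "ldc.i4 2", "add"]))
print(select_adds(["ldc.r8 1.5", "ldc.r8 2.5", "add"]))
```

The same `add` text yields an integer add in the first call and a scalar double add in the second, purely because of what was pushed earlier.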
This question is all about definitions, so let's define the terms properly. First, assembly language:
Assembly language is a low-level programming language for computers, microprocessors, microcontrollers, and other programmable devices in which each statement corresponds to a single machine language instruction. An assembly language is specific to a certain computer architecture, in contrast to most high-level programming languages, which generally are portable to multiple systems.
Now, CIL:
Common Intermediate Language is the lowest-level human-readable programming language defined by the Common Language Infrastructure (CLI) specification and is used by the .NET Framework and Mono. Languages which target a CLI-compatible runtime environment compile to CIL, which is assembled into an object code that has a bytecode-style format.
Okay, this part is technically not correct: for example, the C# compiler compiles directly to bytecode, it doesn't go through CIL (the human-readable language). But theoretically, we can imagine that's what's happening.
With these two definitions, CIL is an assembly language, because each statement in it is compiled down to a single bytecode instruction. The fact that there is no physical computer that can execute that bytecode directly doesn't matter.
The definition says that each assembly language is “specific to a certain computer architecture”. In this case, the architecture is the CLR virtual machine.
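The one-statement-to-one-instruction property can be sketched in a few lines of Python. This is a toy, not how ilasm works: the `assemble` function and the `OPCODES` table are hypothetical names, it only covers a handful of operand-less single-byte instructions (opcode values taken from the ECMA-335 instruction set), and a real assembler also has to emit method headers and metadata.

```python
# Toy 1:1 assembler: each CIL statement becomes exactly one bytecode byte.
# Only a few operand-less, single-byte instructions are covered.
OPCODES = {
    "nop": 0x00,
    "ldarg.0": 0x02,
    "ldarg.1": 0x03,
    "ret": 0x2A,
    "add": 0x58,
    "sub": 0x59,
    "mul": 0x5A,
}


def assemble(source):
    """Translate one mnemonic per line into one opcode byte per mnemonic."""
    out = bytearray()
    for line in source.splitlines():
        mnemonic = line.split("//")[0].strip()  # drop trailing comments
        if not mnemonic:
            continue  # skip blank lines
        out.append(OPCODES[mnemonic])
    return bytes(out)


# Body of a method that returns the sum of its first two arguments.
body = assemble("""
    ldarg.0   // push first argument
    ldarg.1   // push second argument
    add
    ret
""")
print(body.hex())
```

Four statements go in and four bytes come out, with no reordering and no optimization, which is exactly what makes this an assembler rather than a compiler.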
About JIT: the JIT compiler can't be considered an assembler, because it doesn't do the 1:1 translation from human-readable form to bytecode; ilasm does that.
The JIT compiler is an optimizing compiler: it translates bytecode to native machine code (for whatever ISA / CPU it's running on) and optimizes as it goes.
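For contrast with a 1:1 assembler, here is a rough Python sketch of the kind of work a JIT does: it lowers stack-based code to register-style pseudo-instructions and folds constant additions at compile time. The function name `jit_sketch`, the CIL-like input syntax, and the pseudo-instruction output format are all invented for illustration, and integer overflow is ignored. A real JIT does vastly more than this.

```python
def jit_sketch(instructions):
    """Lower CIL-like stack code to register-style pseudo-instructions.

    Constants are tracked at compile time, so an add of two constants is
    folded and emits no code at all.
    """
    stack = []     # compile-time stack of ("const", value) or ("reg", name)
    native = []    # emitted pseudo-instructions
    next_reg = 0   # counter for fresh virtual registers

    for ins in instructions:
        parts = ins.split()
        op = parts[0]
        if op == "ldc.i4":
            stack.append(("const", int(parts[1])))
        elif op == "ldarg":
            stack.append(("reg", f"arg{parts[1]}"))
        elif op == "add":
            right = stack.pop()
            left = stack.pop()
            if left[0] == "const" and right[0] == "const":
                # Constant folding: compute now, emit nothing.
                stack.append(("const", left[1] + right[1]))
            else:
                dest = f"r{next_reg}"
                next_reg += 1
                native.append(f"{dest} = {left[1]} + {right[1]}")
                stack.append(("reg", dest))
        elif op == "ret":
            native.append(f"return {stack.pop()[1]}")
        else:
            raise ValueError(f"unsupported instruction: {ins}")
    return native


print(jit_sketch(["ldc.i4 2", "ldc.i4 3", "add", "ldarg 0", "add", "ret"]))
```

Six input instructions produce two output instructions here, so there is no one-to-one correspondence between what goes in and what comes out.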