What does Compile[] do to make code run so much faster?
One of the biggest differences between main kernel evaluation and compiled evaluation in the "Wolfram Virtual Machine" (WVM) is that in the kernel, arbitrary expressions are allowed and are rewritten according to pattern-matching rules, whereas in the WVM things are much more restricted and predictable. For instance, the types of all variables are restricted and can all be predicted at compile time, and only certain functions, or restricted versions of them, are available. These restrictions, and the predictability they bring, allow for optimizations. The WVM execution cycle is also simpler and faster than the kernel's. The restrictions are significant, and many Mathematica programs cannot be compiled down to WVM code because they violate them. Pattern matching is one of the signature features of Mathematica, but it is not available in the WVM.
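As a rough sketch of what this looks like in practice, one can peek at the restricted, typed WVM instructions for a small compiled function using CompilePrint from the CompiledFunctionTools` package (the toy function below is only illustrative):

    Needs["CompiledFunctionTools`"]

    (* a toy numeric function; argument and result types are fixed at compile time *)
    cf = Compile[{{x, _Real}}, x^2 + 3. x];

    CompilePrint[cf]
    (* prints typed registers and a short, fixed instruction sequence,
       rather than general expression rewriting *)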
The new FunctionCompile generalizes things somewhat, but I don't really grok it yet.
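For what it's worth, a minimal sketch of FunctionCompile usage (12.0+) looks something like this; the squaring function is just an assumed example:

    (* FunctionCompile takes a Function with Typed arguments
       and returns a CompiledCodeFunction *)
    fc = FunctionCompile[Function[Typed[x, "MachineInteger"], x^2 + 1]];

    fc[7]   (* 50 *)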
Note that Table, Map, Do, etc. do try to compile their functions when the length is long enough. See SystemOptions["CompileOptions"].
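For example (the option names and default values are version-dependent, so treat this only as a sketch):

    SystemOptions["CompileOptions"]
    (* includes thresholds such as "TableCompileLength" and "MapCompileLength";
       when the length exceeds the threshold, the function is auto-compiled *)

    (* thresholds can be adjusted, e.g. *)
    SetSystemOptions["CompileOptions" -> {"TableCompileLength" -> 50}];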
Note also that code that operates on medium to large arrays but stays within the MKL/BLAS/LAPACK functions is generally faster uncompiled, although the timings will be close. (This remark is for 5yo prodigies.)
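A rough illustration of that point (timings vary by machine, and the particular functions here are an arbitrary choice):

    data = RandomReal[1, 10^6];

    (* vectorized, uncompiled: the heavy lifting is already in optimized numeric libraries *)
    AbsoluteTiming[Total[Sqrt[data] + Sin[data]]]

    (* compiled equivalent: usually comparable, not dramatically faster *)
    cf = Compile[{{v, _Real, 1}}, Total[Sqrt[v] + Sin[v]]];
    AbsoluteTiming[cf[data]]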
Extended comment...
I've never come across specifics of what Compile[] actually does.
That said, a definition re: programming languages in general provides some insight/understanding:
Compiling is the transformation from Source Code (human readable) into machine code (computer executable). ... A compiler takes the recipe (code) for a new program (written in a high level language) and transforms this Code into a new language (Machine Language) that can be understood by the computer itself.
Wolfram has developed the Wolfram Language as a very high-level, even meta-level, language.
This provides huge advantages to developers/programmers, including conciseness, clarity, ease of debugging, a functional organizational paradigm, and (largely) self-documenting code.
In past conversations with Wolfram staff, they have described a process of developing Mathematica functions in C-level languages. One would expect Wolfram to compile as much of these efforts as possible into as low-level a language as possible.
One always has a trade-off in doing this. Generally, the lower the level of the language, the faster the code runs, BUT the harder it becomes to modify, debug, and refactor.
Observing Wolfram's continuing refactoring of the language, especially in developing more abstract constructs (e.g., CompiledFunction objects, TemporalData objects), leads me to surmise that this kind of refactoring/abstraction works hand-in-hand with Wolfram compiling more and more of these constructs and related functions to lower and lower levels of the code stack, with assembler/machine language as the limit.
So Wolfram has many incentives to compile everything it can compile.
This becomes obvious when one runs certain speed tests of Mathematica code vs C, C#, or C++ code. As often as not, when I've done this, the Mathematica code just runs faster.
Of course it doesn't win every such speed comparison, but it does often enough to show that the architects at Wolfram do some very smart things.
Thinking through the above, it becomes clear why certain custom code can benefit from compiling and why Wolfram makes compiling possible.
A standalone Mathematica function may run compiled code in the background (and run blazingly fast), but such functions almost invariably have connector code to enable them to interact with other native Mathematica functions.
When you or I develop custom functions using multiple native Mathematica functions, we incur increased high-level-language overhead.
One can think of using Compile[] on a custom function as stripping out any extra/unnecessary high-level-language overhead. With a suitable custom function, doing this can speed up performance considerably.
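As a hedged sketch of that effect (the logistic-map iteration below is just an assumed example):

    (* uncompiled: each step goes through the general evaluator and pattern matcher *)
    step[x_] := 4. x (1. - x);
    iterate[x0_, n_] := Nest[step, x0, n];

    (* compiled: the same iteration with the high-level overhead stripped out *)
    citerate = Compile[{{x0, _Real}, {n, _Integer}},
       Module[{x = x0}, Do[x = 4. x (1. - x), {n}]; x]];

    AbsoluteTiming[iterate[0.2, 10^6]]
    AbsoluteTiming[citerate[0.2, 10^6]]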
I am looking for an ELI5 explanation ... I am trying to get more insight into this than what I already know, which is that Wolfram code becomes "machine code".
Given its power, why does Mathematica not try to compile the code in its own functions? And if compilation is used for internal functions, why does stringing a bunch of Mathematica functions one after another become so much slower without compiling?
Here is my ELI5 thought process. All code gets compiled (converted) into machine code; without this happening, the machine doesn't know what to do. The problem is that an interpreted language doesn't know whether you're going to use a particular sequence of functions again, so it throws away the machine code after using it. When the sequence is encountered again, it has to go through the same time-consuming process of converting to machine code. A compiler turns a whole section of high-level code into the most efficient possible machine code and remembers it. The chunk of machine code is stored under a name and doesn't have to be reinterpreted every time it is encountered.
P.S.
Another aspect of trying to foretell the future is that Mathematica does not know ahead of time how much flexibility is needed. For example, it may have to allow for delaying evaluation of parts of the expression depending on the user's input to the function. Mathematica allows for very general forms of input. Compiling can trade away this flexibility in return for speed: it can examine the whole of the compiled section and determine whether the flexibility is actually required.