Performance tuning in Mathematica?

Since Mathematica is a symbolic system, with a symbolic evaluator far more general than Matlab's, it is not surprising that performance tuning can be trickier here. There are many techniques, but they can all be understood from a single main principle:

Avoid the full Mathematica symbolic evaluation process as much as possible.

All of the techniques reflect some facet of this principle. The main idea is that, most of the time, a Mathematica program is slow because many Mathematica functions are very general. This generality is a great strength, since it enables the language to support better and more powerful abstractions, but in many places in a program such generality, used without care, can be (huge) overkill.

I won't be able to give many illustrative examples in the limited space, but they can be found in several places, including some WRI technical reports (Daniel Lichtblau's one on efficient data structures in Mathematica comes to mind), a very good book by David Wagner on Mathematica programming, and, most notably, many MathGroup posts. I also discuss a limited subset of them in my book. I will supply more references soon.

Here are a few of the most common ones (I only list those available within the Mathematica language itself, leaving aside CUDA / OpenCL or links to other languages, which are of course also possibilities):

  1. Push as much work into the kernel at once as possible: work with chunks of data as large as possible at a time, without breaking them into pieces

    1.1. Use built-in functions whenever possible. Since they are implemented in the kernel, in a lower-level language (C), they are typically (but not always!) much faster than user-defined ones solving the same problem. The more specialized version of a built-in function you are able to use, the more chances you have for a speed-up.

    1.2. Use functional programming (Map, Apply, and friends). Also, use pure functions in #-& notation when you can; they tend to be faster than Function-s with named arguments or those based on patterns (especially for computationally inexpensive functions mapped over large lists).

    1.3. Use structural and vectorized operations (Transpose, Flatten, Partition, Part, and friends); they are typically even faster than the functional ones.

    1.4. Avoid procedural programming (loops, etc.), because this style tends to break large structures into pieces (array indexing, etc.). This pushes a larger part of the computation outside of the kernel and makes it slower. A minimal comparison is sketched below this item.
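
    For instance, a minimal sketch contrasting 1.4 with 1.1-1.3 (exact timings are machine-dependent, but the vectorized version is typically orders of magnitude faster):

        data = N@Range[10^6];
        (* procedural: element-by-element indexing, the full evaluator runs at every step *)
        AbsoluteTiming[Module[{s = 0.}, Do[s += Sqrt[data[[i]]], {i, Length[data]}]; s]]
        (* vectorized: Sqrt is Listable, Total consumes the whole array at once *)
        AbsoluteTiming[Total[Sqrt[data]]]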

  2. Use machine-precision whenever possible

    2.1. Be aware of, and use, the Listability of built-in numerical functions, applying them to large lists of data rather than using Map or loops.

    2.2. Use Compile when you can. Use the newer capabilities of Compile, such as CompilationTarget->"C", and making your compiled functions parallel and Listable.

    2.3. Whenever possible, use vectorized operations (UnitStep, Clip, Sign, Abs, etc.) inside Compile to realize "vectorized control flow" in place of constructs such as If, so that you can avoid explicit loops (at least as innermost loops) inside Compile as well. In some cases this can take you from Mathematica byte-code speed to almost native C speed; a sketch is given below, after item 2.4.

    2.4. When using Compile, make sure that the compiled function doesn't bail out to non-compiled evaluation. See examples in this MathGroup thread.
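
    Here is a small sketch of 2.2-2.3: Clip re-implemented inside Compile with vectorized UnitStep in place of If (CompilationTarget -> "C" assumes a C compiler is installed; without one, Compile produces byte-code instead):

        clipC = Compile[{{x, _Real, 1}},
           x - x*UnitStep[-x] + (1. - x)*UnitStep[x - 1.],
           CompilationTarget -> "C"];
        clipC[RandomReal[{-2, 2}, 10^6]]  (* same result as Clip[..., {0., 1.}] *)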

  3. Be aware that Lists are implemented as arrays in Mathematica

    3.1. Pre-allocate large lists

    3.2. Avoid Append, Prepend, AppendTo, and PrependTo in loops, for building lists, etc. (because they copy the entire list to add a single element, which leads to quadratic rather than linear complexity for list-building)

    3.3. Use linked lists (structures like {1,{2,{3,{}}}}) instead of plain lists for list accumulation in a program. The typical idiom is a = {new element, a}. Because the previous list a is reused rather than copied, each such assignment is constant-time.

    3.4. Be aware that pattern-matching for sequence patterns (BlankSequence, BlankNullSequence) is also based on Sequences being arrays. Therefore, a rule {fst_,rest___}:>{f[fst],g[rest]} will copy the entire list when applied. In particular, don't use recursion in a way which may look natural in other languages. If you want to use recursion on lists, first convert your lists to linked lists.
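
    A minimal sketch of 3.2 vs 3.3 (here the idiom a = {a, new element} is used, so that a single Flatten at the end restores the original order):

        AbsoluteTiming[Module[{a = {}}, Do[AppendTo[a, i^2], {i, 3*10^4}]; a];]
        AbsoluteTiming[Module[{a = {}}, Do[a = {a, i^2}, {i, 3*10^4}]; Flatten[a]];]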

  4. Avoid inefficient patterns, construct efficient patterns

    4.1. Rule-based programming can be both very fast and very slow, depending on how you build your structures and rules, but in practice it is easier to inadvertently make it slow. It will be slow for rules which force the pattern-matcher to make many a priori doomed matching attempts, for example by under-utilizing each run of the pattern-matcher through a long list (expression). Sorting elements is a good example: list//.{left___,x_,middle___,y_,right___}/;x>y:>{left,y,middle,x,right} has cubic complexity in the size of the list (an explanation can be found e.g. here).

    4.2. Build efficient patterns, and corresponding structures to store your data, so that the pattern-matcher wastes as little time as possible on false matching attempts.

    4.3. Avoid using patterns with computationally intensive conditions or tests. The pattern-matcher will give you the most speed when patterns are mostly syntactic in nature (testing structure, heads, etc.). Every time a condition (/;) or a pattern test (?) is used, the pattern-matcher invokes the evaluator for every potential match, and this slows it down. A small comparison is sketched below this item.
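
    A small illustration of 4.3: for this particular task, the condition-based pattern can be replaced by a purely vectorized formulation (Pick is not the pattern-matcher, of course, but the comparison shows the cost of invoking the evaluator per element):

        data = RandomInteger[{0, 10}, 10^5];
        AbsoluteTiming[Cases[data, x_ /; x > 5];]           (* evaluator called for every element *)
        AbsoluteTiming[Pick[data, UnitStep[data - 6], 1];]  (* a single vectorized pass *)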

  5. Be aware of the immutable nature of most Mathematica built-in functions

    Most Mathematica built-in functions which process lists create a copy of the original list and operate on that copy. Therefore, they may have linear time (and space) complexity in the size of the original list, even if they modify the list in only a few places. One universal built-in function that does not create a copy but modifies the original expression in place, and therefore does not have this issue, is Part (used on the left-hand side of an assignment).

    5.1. Avoid using most list-modifying built-in functions for a large number of small independent list modifications which cannot be formulated as a single step (for example, NestWhile[Drop[#, 1] &, Range[1000], First[#] < 500 &], which copies the remaining list on every step)

    5.2. Use the extended functionality of Part to extract and modify a large number of list (or more general expression) elements at the same time. This is very fast, and not just for packed numerical arrays (assignment to Part modifies the original list in place).

    5.3. Use Extract to extract many elements at different levels at once, passing to it a possibly large list of element positions.
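
    A short sketch of 5.2 and 5.3:

        a = Range[10];
        a[[{2, 5, 7}]] = {-2, -5, -7};  (* many modifications in a single step, in place *)
        a
        Extract[{{1, 2}, {3, 4}}, {{1, 2}, {2, 1}}]  (* elements at several positions at once: {2, 3} *)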

  6. Use efficient built-in data structures

    The following internal data structures are very efficient and can be used in many more situations than may appear from their stated main purpose. Lots of such examples can be found by searching the MathGroup archive, particularly the contributions of Carl Woll.

    6.1. Packed arrays

    6.2. Sparse arrays
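
    A short sketch of both (Developer`PackedArrayQ tests for packing; Developer`ToPackedArray converts):

        packed = N@Range[10^6];
        Developer`PackedArrayQ[packed]               (* True *)
        Developer`PackedArrayQ[Append[packed, "x"]]  (* False: one non-numeric element unpacks the array *)
        sp = SparseArray[{i_, i_} -> 1., {10^4, 10^4}];  (* 10^4 x 10^4 identity matrix, stored sparsely *)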

  7. Use hash-tables.

    Starting with version 10, immutable associative arrays (Associations) are available in Mathematica.

    7.1. Associations

    The fact that they are immutable does not prevent them from having efficient insertion and deletion of key-value pairs (cheap copies differing from the original association by the presence, or absence, of a given key-value pair). They represent the idiomatic associative arrays in Mathematica, and have very good performance characteristics.

    For earlier versions, the following alternatives work pretty well, being based on Mathematica's internal hash-tables:

    7.2. Hash-tables based on DownValues or SubValues

    7.3. Dispatch
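
    A minimal sketch of 7.1 and 7.2:

        assoc = <|"a" -> 1, "b" -> 2|>;  (* version 10+ *)
        assoc["c"] = 3;                  (* efficient insertion *)
        KeyDropFrom[assoc, "a"];         (* efficient deletion *)
        assoc                            (* <|"b" -> 2, "c" -> 3|> *)

        ClearAll[hash];                  (* pre-version-10: DownValues-based hash-table *)
        hash["a"] = 1; hash["b"] = 2;
        hash["b"]                        (* constant-time lookup via internal hashing *)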

  8. Use element-position duality

    Often you can write faster functions to work with positions of elements rather than elements themselves, since positions are integers (for flat lists). This can give you up to an order of magnitude speed-up, even compared to generic built-in functions (Position comes to mind as an example).
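
    For instance, Ordering returns positions rather than elements, so a single call can both sort a list and consistently reorder any parallel data (a small sketch):

        data = RandomReal[1, 10];
        ord = Ordering[data];        (* the positions that would sort the list *)
        data[[ord]] === Sort[data]   (* True *)
        (* the same ord also reorders any parallel list of labels, weights, etc. *)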

  9. Use Reap and Sow

    Reap and Sow provide an efficient way of collecting intermediate results, and generally "tagging" parts you want to collect, during the computation. These commands also go well with functional programming.
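
    A minimal sketch, collecting tagged intermediate results without any AppendTo:

        Reap[Do[Sow[i^2, If[EvenQ[i], "even", "odd"]], {i, 10}], {"even", "odd"}]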

  10. Use caching, dynamic programming, lazy evaluation

    10.1. Memoization is very easily implemented in Mathematica, and can save a lot of execution time for certain problems.

    10.2. In Mathematica, you can implement more complex versions of memoization, where you can define functions (closures) at run-time, which will use some pre-computed parts in their definitions and therefore will be faster.

    10.3. Some problems can benefit from lazy evaluation. This seems more relevant to memory efficiency, but it can also affect run-time efficiency. Mathematica's symbolic constructs make lazy evaluation easy to implement.
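
    A classic sketch of 10.1: a memoized Fibonacci, where each value is computed once and then cached as an additional definition:

        ClearAll[fib];
        fib[0] = 0; fib[1] = 1;
        fib[n_Integer?Positive] := fib[n] = fib[n - 1] + fib[n - 2];
        fib[100]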

A successful performance-tuning process usually employs a combination of these techniques, and you will need some practice to identify the cases where each of them will be beneficial.


You may use the profiler included in the Wolfram Workbench.



Take a look at the presentation Principles for Efficient Mathematica Programs from the Wolfram Technology Conference 2007.

Another useful presentation is Tips for Memory Efficient Coding in the Wolfram Language.