What causes Module variables to leak?

Preamble

I will try to summarize the cases I have seen or encountered in a few rules, which I believe hold and explain most or all of the cases I am aware of.

The rules

Here are the rules (assuming that $HistoryLength is set to 0, that no UI elements are present on screen, and that no other code constructs, such as an Internal`Cache object, reference any of the symbols in question):

  1. Module does clear all *Values of local symbols, as long as all of the following conditions hold:

    • a. They are not returned from Module (by themselves or as parts of larger expressions)
    • b. They are not referenced by definitions of any symbols in the outer lexical scopes, by the time Module exits.
    • c. They don't have circular references to each other
  2. For local variables having only OwnValues defined:

    • a. If all of the conditions in rule # 1 hold, the symbols are garbage-collected right away. More precisely, their definitions are cleared when Module exits, but symbols themselves are collected as soon as they are no longer referenced by any expressions.
    • b. If 1.b and 1.c hold, but 1.a does not:
        1. If the symbols have OwnValues defined through Set rather than SetDelayed, then symbols and their definitions survive outside Module for as long as they are referenced in the computation that uses the return value of Module
        2. If the symbols have OwnValues defined through SetDelayed, then they will leak to the outer scope and survive there indefinitely, regardless of whether they are / were referenced externally or not.
    • c. If 1.a and 1.b hold, but 1.c does not, the symbols and definitions will leak to the outer scope and survive there indefinitely, regardless of whether they are / were referenced externally or not.
  3. Whenever local symbols are referenced by external symbols, the following happens when Module exits:

    • a. If a local symbol has OwnValues defined by immediate assignments (Set, including in Module initialization), has no other *Values defined, and contains no self-references, then the symbol and its OwnValues are retained only as long as the symbol is still externally referenced, and are GC-d after that.
    • b. If a local symbol has either OwnValues defined by delayed assignments (SetDelayed), or self-references, or other defined *Values (DownValues, SubValues, UpValues), those values are retained / leaked into the global scope, regardless of whether the symbol is returned from Module or not.
  4. Whenever symbols have circular references to each other, they retain their definitions (leaked, are not collected/ destroyed) after Module exits, in all cases, and whether or not they were referenced by external symbols inside Module.

  5. The garbage collector removes all Temporary symbols as soon as they satisfy both of these conditions:

    • Have no references (by other symbols or themselves)
    • Have no attached definitions (with an exception of symbols with existing OwnValues obtained via immediate assignments / Set while referenced by external symbols - in which case GC will keep both the symbol and the definition until the symbol is no longer referenced, at which point it is collected)
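Rule 5 relies on the local symbols carrying the Temporary attribute. This can be checked directly. The following is a minimal sketch, assuming a fresh kernel in which a has no global value:

```mathematica
(* Local symbols created by Module carry the Temporary attribute *)
Module[{a}, Attributes[a]]
(* {Temporary} *)

(* Unique can be asked to attach the same attribute explicitly;
   Evaluate is needed because Attributes holds its argument *)
Attributes[Evaluate[Unique[a, Temporary]]]
```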

Exceptions and puzzling behavior

There are cases where the above rules don't hold, but where the behavior of Module is puzzling enough that it probably makes more sense to categorize it as an exception rather than trying to modify the rules.

As illustrated below, particularly in the section on Module and Unique, unique Temporary symbols pretty much always leak when they have delayed definitions attached to them, and it is Module's responsibility to clean those up in cases when it can determine that the variable actually can and should be collected. The problem seems to be that Module isn't really doing a good job at that, in all cases.

Local variables with non-cyclic dependencies and delayed definitions

While the list of exceptions will probably grow with time, the first one was noted by Shadowray in his answer, where it is example #3.

DownValues

Basically, this leaks local variable a:

Module[{a, b}, 
  a[y_] := 2 y;
  b[y_] := 2 a[y];
  b[1]
]

(* 4 *)

This explicitly violates rule #1 above, since all three of its conditions hold. (As in the other examples, the leak can be seen using the function vals defined below; in this case, evaluate vals[DownValues]["a"].) By contrast, this does not leak:

Module[{b, a}, 
  a[y_] := 2 y;
  b[y_] := 2 a[y];
  b[1]
]

(* 4 *)

even though the only difference is the order of the variables in Module initialization list.

The former behavior looks like a Module bug to me.

OwnValues

A somewhat similar situation arises for OwnValues. The first case looks as follows:

Module[{a, b}, 
  a := 2 ;
  b := 2 a;
  b
]

(* 4 *)

In this case, a does leak (evaluate vals[]["a"] to see it, vals defined below), but its definition (OwnValues) gets cleared by Module (unlike the previously considered case of DownValues). For the other one:

Module[{b, a}, 
  a := 2 ;
  b := 2 a;
  b
]

(* 4 *)

things are fine as before.

Possible explanation

I can only guess that Module, before exiting, "processes" local variables (for the purposes of clearing up their definitions), in the same order they appear in the Module initialization list. Therefore, in the first case, a is "processed" first, and by that time, b has not been destroyed yet, so to Module it looks like a has an extra ref.count from b, and therefore it does not clear a and leaks it. In the second case, b is processed first and promptly destroyed, and then a is processed and also promptly destroyed, since it no longer has a reference from b.

Status of this exception

While I have categorized this behavior as an exception, there is a plausible explanation for it. So we may decide to promote this to a modification of rule #1 at some point, if further evidence of its correctness emerges.
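One way to gather such evidence is to record which a$… and b$… names appear after running each ordering of the variable list. The following is a minimal sketch. newLeaks is a hypothetical helper introduced here for illustration only, and it assumes that the Global` context is on $ContextPath and that $HistoryLength is 0:

```mathematica
(* newLeaks is a hypothetical helper, not part of the original answer.
   It evaluates expr and returns the names of the form pattern$... that
   exist afterwards but did not exist before. *)
ClearAll[newLeaks];
SetAttributes[newLeaks, HoldFirst];
newLeaks[expr_, pattern_ : "*"] :=
  With[{before = Names["Global`" ~~ pattern ~~ "$*"]},
    expr;
    Complement[Names["Global`" ~~ pattern ~~ "$*"], before]
  ]

(* Same body, two orderings of the variable list *)
newLeaks[Module[{a, b}, a[y_] := 2 y; b[y_] := 2 a[y]; b[1]], "a" | "b"]
newLeaks[Module[{b, a}, a[y_] := 2 y; b[y_] := 2 a[y]; b[1]], "a" | "b"]
```

Going by the observations reported in this section, the first call would be expected to list a leaked a$… name and the second an empty list, but the results may differ between versions. The same helper can be pointed at longer dependency chains to test the ordering hypothesis further.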

Some implications

The main implication of the above set of rules is that the garbage collector is, in most cases, not smart enough to collect the temporary local symbols, even when they are no longer referenced by any other symbols, if those local symbols have some global rules / definitions attached.

Module is responsible for cleaning up those definitions. So every time a symbol leaks outside of Module with definitions attached to it (except in the one specific case of OwnValues defined by Set with no self-references, detailed below), it will stay in the system indefinitely, even after it stops being referenced by any other symbol.
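Once such a leak has happened, the leftover symbols can still be swept by name. The following is a minimal sketch. removeLeaked is a hypothetical helper, not something the system provides, and it assumes that every Global` symbol whose name has the form name$… is a leftover local that is safe to delete:

```mathematica
(* Hypothetical helper: remove leftover Module-style symbols such as a$123.
   Returns the list of names that were removed. *)
ClearAll[removeLeaked];
removeLeaked[pattern_] :=
  With[{names = Names["Global`" ~~ pattern ~~ "$*"]},
    Scan[Remove, names];
    names
  ]

(* Usage: sweep everything that looks like a leaked local a or b *)
removeLeaked["a" | "b"]
```

Two caveats apply. The name pattern also matches any user-defined symbol that contains a $, and Remove on a symbol that is still referenced elsewhere leaves a Removed[…] placeholder in the referring expression. So this is a debugging aid rather than a substitute for avoiding the leak.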

Illustration

Preparation

We will assume for all examples below that they are executed on a fresh kernel with the following code executed first:

$HistoryLength = 0

vals[type_ : OwnValues][pattern_] := 
  Map[
    {#, ToExpression[#, StandardForm, type]} &,
    Names["Global`" ~~ pattern ~~ "$*"]
  ]

Rule #1

Rule #1 hardly requires any special examples, since it is something we have all experienced many times. Condition 1.c may need some illustration, which we will give together with the examples for rule #2.

The rule #2

2.a

Here is an example to illustrate this case, which I've made a little more interesting by making a symbol reference itself:

Replace[
  Module[{a}, a = Hold[a]; a],
  Hold[s_] :> {s, OwnValues[s]}
]
vals[]["a"]

(* {a$713392, {}} *)

(* {} *)

What this shows is that, while the symbol does get returned from Module as a part of its own value in Hold[a], it has no OwnValues outside Module, and it is promptly collected once Replace finishes, as shown by the call to vals.

2.b

Here are examples to illustrate the cases 2.b.1 and 2.b.2. The first covers 2.b.1:

Replace[
  Module[{a}, a = 1; Hold[a]], 
  Hold[sym_] :> OwnValues[sym]
]
vals[]["a"]

(* {HoldPattern[a$3063] :> 1} *)

(* {} *)

This shows that the symbol and its definition both survive in this case for as long as they are needed in enclosing computation, and are GC-d right after that.


If we now change the way we defined local symbols from immediate to delayed, we will get the case covered by 2.b.2:

Replace[
  Module[{a}, a := 1; Hold[a]], 
  Hold[sym_] :> OwnValues[sym]
]
vals[]["a"]

(* {HoldPattern[a$3060] :> 1} *)

(* {{"a$3060", {HoldPattern[a$3060] :> 1}}} *)

An example observed by @Michael E2 also falls into the same category:

ff[] := Module[{a}, a := 1; a /; True]
ff[]
Remove[ff]
vals[]["a"]

(* 1 *)

(* {{"a$3063", {HoldPattern[a$3063] :> 1}}} *)

It is not clear to me why delayed definitions (should) prevent the symbol from being garbage-collected in cases like this (see also below), and whether this is actually a bug or not.

2.c

The case 2.c definitely needs an illustration:

Module[{a, b}, a = Hold[b]; b = Hold[a]; Length[{a, b}]]

(* 2 *)

vals[]["a" | "b"]

(* 
  {
    {"a$3063", {HoldPattern[a$3063] :> Hold[b$3063]}}, 
    {"b$3063", {HoldPattern[b$3063] :> Hold[a$3063]}}
  }
*)

This may be quite surprising for many, since the symbols are not returned from the Module directly, not referenced externally, and have only OwnValues. However, they reference each other, and WL's GC / Module are not smart enough to recognize that they are unreachable.
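If the cycle is broken by hand before Module exits, rule 1 should apply again. The following is a minimal sketch of that idea, in which the result is computed first and the local definitions are cleared afterwards. Whether the symbols are then actually collected is an assumption that should be checked with vals[]["a" | "b"] on the version at hand:

```mathematica
Module[{a, b},
  a = Hold[b]; b = Hold[a];
  (* With evaluates the result before its body runs *)
  With[{result = Length[{a, b}]},
    Clear[a, b];  (* drops the OwnValues, and with them the mutual references *)
    result
  ]
]
(* 2 *)
```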

The rule #3

This is probably the most interesting one.

3.a

Here is a simple illustration for this one, where local symbol a is given an immediate definition and is referenced by external symbol s:

ClearAll[s];
Module[{a}, a = 1; s := a];
s

(* 1 *)

We can see that a gets GC-d right after we Remove s, as promised:

vals[]["a"]
Remove[s]
vals[]["a"]

(* {{"a$2628", {HoldPattern[a$2628] :> 1}}} *)

(* {} *)

3.b

This one will probably have the most examples. We start by modifying the previous example in a few ways.

First, let us make local symbol reference itself:

ClearAll[s];
Module[{a}, a = Hold[1, a]; s := a];
{s, Last[s]}

(* {Hold[1, a$3063], Hold[1, a$3063]} *)

In this case, removal of external reference (symbol s) does not help, since GC is not able to recognize the self-reference:

vals[]["a"]
Remove[s]
vals[]["a"]

(* {{"a$3063", {HoldPattern[a$3063] :> Hold[1, a$3063]}}} *)

(* {{"a$3063", {HoldPattern[a$3063] :> Hold[1, a$3063]}}} *)

Note, by the way, that self-references are recognized in cases with no external references:

Module[{a}, a = Hold[a]; a]
vals[]["a"]

(* Hold[a$3090] *)

(* {} *)

My guess is that Module is smart enough to recognize self-references (but not mutual references, as we've seen) as long as there are no external references to a symbol - and then decide to destroy symbol's definitions - which automatically decrements the ref. count and makes the symbol's total ref.count 1 just before leaving Module and 0 right after leaving Module, thus making it collectable by the GC.

When there are external references, Module keeps symbol's definitions as well - that is, does not destroy them when exiting. Then later, even when external reference gets removed, we have both symbol and its definition present, and the ref. count is still 1, since while the definition is present, the symbol references itself. Which makes it look to the GC as a non-collectable symbol.


To illustrate the next case, let us create OwnValues with SetDelayed:

ClearAll[s];
Module[{a}, a := 1; s := a];
s

(* 1 *)

vals[]["a"]
Remove[s]
vals[]["a"]

(* {{"a$3067", {HoldPattern[a$3067] :> 1}}} *)

(* {{"a$3067", {HoldPattern[a$3067] :> 1}}} *)

It is less clear to me why, in this case, the GC does not recognize the symbol as collectable even after the external references have been removed. This might be considered a bug, or there might be some deeper reason and rationale for this behavior which I am simply not seeing.


Finally, the case of existence of other *Values has been noted before, and I will steal a (slightly simplified) example from there:

Module[{g},
  Module[{f},
    g[x_] := f[x];
    f[1] = 1
  ];
  g[1]
]

(* 1 *)

vals[DownValues]["f" | "g"]

(* {{"f$", {}}, {"f$3071", {HoldPattern[f$3071[1]] :> 1}}} *)

This shows that even though the local variable g has itself been removed (since, while it had DownValues defined, it was not itself externally referenced), the inner local variable f has leaked, because, by the time inner Module was exiting, it was still referenced by g.

In this particular case, one (rather ugly) way to reclaim it is as follows:

Module[{g, inner},
  inner = Module[{f},
    g[x_] := f[x];
    f[1] = 1;
    f
  ];
  # &[g[1], Clear[Evaluate@inner]]
]

(* 1 *)

where we have returned the local variable f itself from inner Module, and put it into inner local variable of the outer Module - which made it possible to clear its definitions after g[1] was computed:

vals[DownValues]["f" | "g"]

(* {{"f$", {}}} *)

so that f had no definitions and therefore was GC-d (see rule 5). I've shown this workaround not to suggest using such constructs in practice, but rather to illustrate the mechanics.

The rules #4 and #5

These have been already illustrated by the examples above.

Observations and speculations

Module and Unique

Things can actually be simpler than they look. We know that the Module localization mechanism is based on Unique. We can use this knowledge to test how much of the observed behavior of Module actually comes from the interplay between Unique and the garbage collector. This may allow us to demystify the role of Module here.

Let us consider a few examples with Unique, which would parallel the cases we already looked at in the context of Module.

First, let us create a unique Temporary symbol and simply observe that it gets immediately collected:

Unique[a, Temporary]
vals[]["a"]

(* a$3085 *)

(* {} *)

Next, we save it into a variable, assign it some value, and then Remove that variable:

b = Unique[a, Temporary]
vals[]["a"]
Evaluate[b] = 1
vals[]["a"]
Remove[b]
vals[]["a"]

(* a$3089 *)
(* {{"a$3089", {}}} *)
(* 1 *)
(* {{"a$3089", {HoldPattern[a$3089] :> 1}}} *)
(* {} *)

Here, the variable b plays the role of the Module environment, which prevents the local variable from being immediately collected while inside Module. What we see is that as soon as we Remove b (think: exit Module), the variable is destroyed. Note that the definition we gave was using Set.

We now repeat the same but replace Set with SetDelayed. Again, variable b emulates the Module environment:

b = Unique[a, Temporary]
Evaluate[b] := 1
vals[]["a"]
Remove[b]
vals[]["a"]


(* a$714504 *)
(* {{"a$714504", {HoldPattern[a$714504] :> 1}}} *)
(* {{"a$714504", {HoldPattern[a$714504] :> 1}}} *)

What we have just reproduced is the puzzling behavior of Module w.r.t. local variables assigned with SetDelayed.

Let us move on and consider self-references made with Set:

b = Unique[a, Temporary]
Evaluate[b] = Hold[Evaluate[b]]
vals[]["a"]
Remove[b]
vals[]["a"]

(* a$3070 *)
(* Hold[a$3070] *)
(* {{"a$3070", {HoldPattern[a$3070] :> Hold[a$3070]}}} *)
(* {{"a$3070", {HoldPattern[a$3070] :> Hold[a$3070]}}} *)

We have again reproduced exactly the behavior we previously observed for Module.

Finally, consider the case of mutual references:

c = Unique[a, Temporary]
d = Unique[b, Temporary]
With[{a = c, b  = d},
  a = Hold[b];
  b = Hold[a];
]
vals[]["a" | "b"]
Remove[c, d]
vals[]["a" | "b"]


(* a$3070 *)
(* b$3071 *)

(* 
  {
    {"a$3070", {HoldPattern[a$3070] :> Hold[b$3071]}}, 
    {"b$3071", {HoldPattern[b$3071] :> Hold[a$3070]}}
  }
*)

(* 
  {
    {"a$3070", {HoldPattern[a$3070] :> Hold[b$3071]}}, 
    {"b$3071", {HoldPattern[b$3071] :> Hold[a$3070]}}
  }
*)

Again, we have reproduced the exact behavior we've seen before for Module.

What we can conclude from this is that a large part of the observed behavior is actually due to the underlying behavior of Unique, rather than Module.

Simple Module emulation

To push the previous arguments a little further still, consider the following crude emulation of Module based on Unique:

SetAttributes[myModule, HoldAll]
myModule[vars : {___Symbol}, body_] :=
  Block[vars,
    ReleaseHold[
      Hold[body] /. Thread[vars -> Map[Unique[#, Temporary]&, vars]]
    ]
  ]

This emulation disallows initialization in the variable list. It simply replaces all occurrences of any of the vars symbols in the body with generated Temporary unique symbols, and then lets the body evaluate.
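As a quick sanity check of the emulation, here is a minimal usage sketch. The definition is repeated so that the snippet runs on its own, and a fresh kernel is assumed in which a has no global value:

```mathematica
(* myModule as defined above, repeated so that the snippet is self-contained *)
ClearAll[myModule];
SetAttributes[myModule, HoldAll];
myModule[vars : {___Symbol}, body_] :=
  Block[vars,
    ReleaseHold[
      Hold[body] /. Thread[vars -> Map[Unique[#, Temporary] &, vars]]
    ]
  ]

(* Locals are assigned in the body, since initializers are not supported *)
myModule[{a}, a = 1; a + 1]
(* 2 *)

(* The generated symbols carry the Temporary attribute, like Module's locals *)
myModule[{a}, Attributes[a]]
(* {Temporary} *)
```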

If you rerun all the examples involving Module with myModule, you will observe exactly the same results in all cases but two: the example in 2.a and the last one in 3.b. But those behaviors of the original Module are the least puzzling, and the most puzzling ones are correctly reproduced with myModule.

So while Module obviously does more than myModule, it may not do that much more. This shifts the problem to the interplay between Unique and the garbage collector, which might be considered at least some reduction in complexity.

Conclusions

It seems that the behavior of Module in terms of symbol leaking can in general be described by a set of reasonably simple rules. Exceptions exist, but it seems that at least they may also have plausible explanations.

We can make several general conclusions to summarize the behavior described above.

  • For garbage collection / symbol leaking, it does make a difference whether the symbol had external references or not, by the time the execution leaves Module
  • The garbage collector isn't smart enough to recount self-references or mutual references forming closed loops, after the execution left Module, and realize that some such local variables became collectable.
  • In the absence of external and self-references at the time code execution leaves the Module, OwnValues are typically fine in terms of symbol collection / not leaking.
  • Symbols with OwnValues created by immediate assignment (Set) and without self-references keep their definitions only for as long as they are externally referenced (by other symbols or enclosing expressions, if returned from Module), and are promptly destroyed / garbage-collected afterwards.
  • Symbols with OwnValues keep their definitions and therefore are not collected, in cases when they are given delayed definitions (using SetDelayed) and they (still) were externally referenced at the time execution left Module. It is not clear why this is so, and whether or not this can be considered a bug.
  • Local symbols with DownValues and other *Values except OwnValues, will in general leak / not be collected if they have been externally referenced by the time the execution left their Module, regardless of whether or not they are still externally referenced
  • Once a Temporary symbol's definitions have been removed, the symbol will be collected as long as it is not referenced externally.

Most of the puzzling behavior from the above observations can be reproduced in a simpler setting with Module emulated in a very simple way using Unique variables. It looks like it has more to do with the dynamics of Unique variables and garbage collection, than Module per se. It may happen that Module is not doing all that much extra, in this regard.


I believe that the above description is accurate and covers all cases I am aware of. But I can easily imagine that there are cases I have not seen or accounted for, which would make the picture more complex (or maybe simpler). If you know of such cases, or of others not well described by this scheme, please comment.


Here are some examples of unexpected memory leaks in Mathematica and how to avoid them:

1. Parallel computation functions may prevent garbage collection

Module[{a}, Length[ParallelTable[a, {10}]]];
Names["a*"]

{"a", "a$1698"}

Also, when a temporary symbol is sent to a parallel kernel, the Temporary attribute is cleared:

Module[{a}, ParallelTable[Attributes[a], {10}] ]

{{}, {}, {}, {}, {}, {}, {}, {}, {}, {}}

How to avoid these leaks: Do not send temporary symbols to or from parallel kernels.
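One way to follow this advice is to evaluate the local symbol on the main kernel and inject only its value into the parallel expression. The following is a minimal sketch. The claim that this avoids the leak is an assumption based on the fact that no a$… symbol appears in the expression that is distributed, and it should be verified with Names["a*"] on the version at hand:

```mathematica
Module[{a},
  a = 5;
  (* With replaces v by the value 5 before ParallelTable is evaluated,
     so the subkernels never see the temporary symbol *)
  With[{v = a}, ParallelTable[v + i, {i, 3}]]
]
(* {6, 7, 8} *)
```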

2. Mathematica stack tracing feature (introduced in v11) prevents garbage collection if your code produces messages

Module[{a}, a; 0/0];
Names["a*"]

{"a", "a$1697"}

Note: there will be no leak if you set $HistoryLength = 0

How to avoid this leak: set $HistoryLength = 0, or disable the message menu via Internal`$MessageMenu = False. See also: How do I disable the stack tracing feature in Mathematica 11?

3. Local functions inside Module may cause a memory leak if one function depends on another

f[] := Module[{a, b},
  a[y_] := 2 y;
  b[y_] := 2 a[y];
  b[1]
  ];
f[];
Names["a*"]

{"a", "a$1698"}

Note that this leak requires neither self-references nor circular references, unlike the cases in Leonid's answer.

It is remarkable that this leak is gone if one interchanges the order of symbols in the first argument of Module i.e. the following code does not leak:

f[] := Module[{b, a},
  a[y_] := 2 y;
  b[y_] := 2 a[y];
  b[1]
  ];
f[];
Names["a*"]

{"a"}

How to avoid this leak: avoid local functions altogether or remove them explicitly before exiting module, e.g.:

f[] := Module[{a, b, result},
  a[y_] := 2 y;
  b[y_] := 2 a[y];
  result = b[1];
  Remove[a, b];
  result
  ];
f[];
Names["a*"]

{"a"}

4. Local functions inside Module cause a memory leak when there is a Condition inside Module

f[x_] := Module[{a}, (a[y_] := y; a[x]) /; (x > 0)];
f[1];
Names["a*"]

{"a", "a$", "a$1698"}

How to avoid this leak: Remove local functions explicitly, e.g.

f[x_] := Module[{a, result}, (a[y_] := y; result = a[x]; Remove[a]; result) /; (x > 0)];
f[1];
Names["a*"]

{"a", "a$"}

Finally, for those who want to go deeper into debugging the Mathematica garbage collector, there is a function that gives the number of references to a given symbol: System`Private`GetRefCount[f]
