What causes Module variables to leak?
Preamble
I will try to summarize some cases I've seen or encountered, in a few rules, which I believe do hold and explain most or all of the cases I am aware of.
The rules
Here are the rules (assuming that $HistoryLength is set to 0 and that there are no UI elements present on screen, or other code constructs, such as an Internal`Cache object, which reference any of the symbols in question):
1. Module does clear all *Values of local symbols, as long as all of the following conditions hold:
   - a. They are not returned from Module (by themselves or as parts of larger expressions)
   - b. They are not referenced by definitions of any symbols in the outer lexical scopes, by the time Module exits.
   - c. They don't have circular references to each other
2. For local variables having only OwnValues defined:
   - a. If all of the conditions in rule #1 hold, the symbols are garbage-collected right away. More precisely, their definitions are cleared when Module exits, but symbols themselves are collected as soon as they are no longer referenced by any expressions.
   - b. If 1.b and 1.c hold, but 1.a does not:
     - If the symbols have OwnValues defined through Set rather than SetDelayed, then symbols and their definitions survive outside Module for as long as they are referenced in the computation that uses the return value of Module
     - If the symbols have OwnValues defined through SetDelayed, then they will leak to the outer scope and survive there indefinitely, regardless of whether they are / were referenced externally or not.
   - c. If 1.a and 1.b hold, but 1.c does not, the symbols and definitions will leak to the outer scope and survive there indefinitely, regardless of whether they are / were referenced externally or not.
3. Whenever local symbols are referenced by external symbols, the following happens when Module exits:
   - a. If local symbol has OwnValues defined by immediate assignments (Set, including in Module initialization) and no other *Values defined and it contains no self-references, then those symbols and their OwnValues are retained only as long as the symbols are still externally referenced, and GC-d after that.
   - b. If local symbol has either OwnValues defined by delayed assignments (SetDelayed), or self-references, or other defined *Values (DownValues, SubValues, UpValues), those values are retained / leaked into the global scope, regardless of whether the symbol is returned from Module or not.
4. Whenever symbols have circular references to each other, they retain their definitions (leaked, are not collected / destroyed) after Module exits, in all cases, and whether or not they were referenced by external symbols inside Module.
5. The garbage collector removes all Temporary symbols as soon as they satisfy both of these conditions:
   - Have no references (by other symbols or themselves)
   - Have no attached definitions (with an exception of symbols with existing OwnValues obtained via immediate assignments / Set while referenced by external symbols - in which case GC will keep both the symbol and the definition until the symbol is no longer referenced, at which point it is collected)
Exceptions and puzzling behavior
There are cases where the above rules don't hold, but where the behavior of Module is puzzling enough that it probably makes more sense to categorize it as an exception rather than trying to modify the rules.
As illustrated below, particularly in the section on Module and Unique, unique Temporary symbols pretty much always leak when they have delayed definitions attached to them, and it is Module's responsibility to clean those up in cases when it can determine that the variable actually can and should be collected. The problem seems to be that Module isn't really doing a good job at that, in all cases.
Non-cyclically dependent local variables with delayed definitions
While the list of exceptions will probably grow with time, the first one was noted by Shadowray in his answer; it is example #3 there.
DownValues
Basically, this leaks local variable a:
Module[{a, b},
a[y_] := 2 y;
b[y_] := 2 a[y];
b[1]
]
(* 4 *)
(leaks can be seen using the function vals defined below, similarly to other examples below. In this case one would have to execute vals[DownValues]["a"]), explicitly violating the rule #1 above (since all 3 conditions hold), while this does not:
Module[{b, a},
a[y_] := 2 y;
b[y_] := 2 a[y];
b[1]
]
(* 4 *)
even though the only difference is the order of the variables in the Module initialization list.
The former behavior looks like a Module bug to me.
OwnValues
A somewhat similar situation happens for OwnValues. The first case here will look as follows:
Module[{a, b},
a := 2 ;
b := 2 a;
b
]
(* 4 *)
In this case, a does leak (evaluate vals[]["a"] to see it, vals defined below), but its definition (OwnValues) gets cleared by Module (unlike the previously considered case of DownValues). For the other one:
Module[{b, a},
a := 2 ;
b := 2 a;
b
]
(* 4 *)
things are fine as before.
Possible explanation
I can only guess that Module, before exiting, "processes" local variables (for the purposes of clearing up their definitions), in the same order they appear in the Module initialization list. Therefore, in the first case, a is "processed" first, and by that time, b has not been destroyed yet, so to Module it looks like a has an extra ref. count from b, and therefore it does not clear a and leaks it. In the second case, b is processed first and promptly destroyed, and then a is processed and also promptly destroyed, since it no longer has a reference from b.
Status of this exception
While I have categorized this behavior as an exception, there is a plausible explanation for it. So we may decide to promote this to a modification of rule #1 at some point, if further evidence of its correctness emerges.
Some implications
The main implication of the above set of rules is that the garbage collector is, in most cases, not smart enough to collect the temporary local symbols, even when they are no longer referenced by any other symbols, if those local symbols have some global rules / definitions attached.
Module is responsible for cleaning up those definitions. So every time a symbol leaks outside of Module with definitions attached to it (except in one specific case of OwnValues defined by Set with no self-references, detailed below), it will stay in the system for an indefinite time, even after it stops being referenced by any other symbol.
Illustration
Preparation
We will assume for all examples below that they are executed on a fresh kernel with the following code executed first:
$HistoryLength = 0
vals[type_ : OwnValues][pattern_] :=
Map[
{#, ToExpression[#, StandardForm, type]} &,
Names["Global`" ~~ pattern ~~ "$*"]
]
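As a complement to vals, here is a minimal sketch of a leak detector that compares the set of Global` names containing "$" before and after running a piece of code. The name leakedSymbols is hypothetical (it is not part of the original answer), and the sketch assumes the same fresh-kernel setup as above.

```wolfram
(* Hypothetical helper, not from the original answer: returns the names of
   Global` symbols containing "$" that exist after evaluating code but did
   not exist before it. Names returns strings, so the check itself does not
   add references to the symbols. *)
SetAttributes[leakedSymbols, HoldFirst]
leakedSymbols[code_] :=
  With[{before = Names["Global`*$*"]},
    code;
    Complement[Names["Global`*$*"], before]
  ]

(* Usage: the mutual-reference case covered by rule 2.c *)
leakedSymbols[
  Module[{a, b}, a = Hold[b]; b = Hold[a]; Length[{a, b}]]
]
```

If rule 2.c holds, the usage line should report the two leaked symbols (a$nnn and b$nnn). Unlike vals, this only lists names and does not show the attached definitions.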
Rule #1
Rule #1 requires almost no special examples, since it is something we have all experienced many times. The condition 1.c may need some illustration, which we will however give with the examples for rule #2.
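For completeness, here is a minimal sketch of the baseline case, assuming the vals helper and the fresh-kernel setup from the Preparation section. The local symbol is not returned, not referenced from outside, and has no circular references, so all three conditions of rule #1 hold.

```wolfram
(* Baseline for rule #1: a is assigned with Set, used only inside Module,
   and is neither returned nor referenced by any outer symbol. *)
Module[{a}, a = 1; a + 1]
(* 2 *)

(* Rule #1 predicts an empty list here, i.e. no a$nnn symbol is left over. *)
vals[]["a"]
```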
The rule #2
2.a
Here is an example to illustrate this case, which I've made a little more interesting by making a symbol reference itself:
Replace[
Module[{a}, a = Hold[a]; a],
Hold[s_] :> {s, OwnValues[s]}
]
vals[]["a"]
(* {a$713392, {}} *)
(* {} *)
What this shows is that while the symbol does get returned from Module as a part of its own value in Hold[a], it has no OwnValues outside Module - and is promptly collected once Replace finishes, as shown with a call to vals.
2.b
Here is an example to illustrate the cases 2.b.1 and 2.b.2
Replace[
Module[{a}, a = 1; Hold[a]],
Hold[sym_] :> OwnValues[sym]
]
vals[]["a"]
(* {HoldPattern[a$3063] :> 1} *)
(* {} *)
This shows that the symbol and its definition both survive in this case for as long as they are needed in enclosing computation, and are GC-d right after that.
If we now change the way we defined local symbols from immediate to delayed, we will get the case covered by 2.b.2:
Replace[
Module[{a}, a := 1; Hold[a]],
Hold[sym_] :> OwnValues[sym]
]
vals[]["a"]
(* {HoldPattern[a$3060] :> 1} *)
(* {{"a$3060", {HoldPattern[a$3060] :> 1}}} *)
An example observed by @Michael E2 also falls into the same category:
ff[] := Module[{a}, a := 1; a /; True]
ff[]
Remove[ff]
vals[]["a"]
(* 1 *)
(* {{"a$3063", {HoldPattern[a$3063] :> 1}}} *)
It is not clear to me why delayed definitions (should) prevent the symbol from being garbage-collected in cases like this (see also below), and whether this is actually a bug or not.
2.c
The case 2.c definitely needs an illustration:
Module[{a, b}, a = Hold[b]; b = Hold[a]; Length[{a, b}]]
(* 2 *)
vals[]["a" | "b"]
(*
{
{"a$3063", {HoldPattern[a$3063] :> Hold[b$3063]}},
{"b$3063", {HoldPattern[b$3063] :> Hold[a$3063]}}
}
*)
This may be quite surprising for many, since the symbols are not returned from the Module directly, not referenced externally, and have only OwnValues. However, they reference each other, and WL's GC / Module are not smart enough to recognize that they are unreachable.
The rule #3
This is probably the most interesting one.
3.a
Here is a simple illustration for this one, where local symbol a is given an immediate definition and is referenced by external symbol s:
ClearAll[s];
Module[{a}, a = 1; s := a];
s
(* 1 *)
We can see that a gets GC-d right after we Remove s, as promised:
vals[]["a"]
Remove[s]
vals[]["a"]
(* {{"a$2628", {HoldPattern[a$2628] :> 1}}} *)
(* {} *)
3.b
This one will probably have the most examples. We start by modifying the previous example in a few ways.
First, let us make local symbol reference itself:
ClearAll[s];
Module[{a}, a = Hold[1, a]; s := a];
{s, Last[s]}
(* {Hold[1, a$3063], Hold[1, a$3063]} *)
In this case, removal of the external reference (symbol s) does not help, since GC is not able to recognize the self-reference:
vals[]["a"]
Remove[s]
vals[]["a"]
(* {{"a$3063", {HoldPattern[a$3063] :> Hold[1, a$3063]}}} *)
(* {{"a$3063", {HoldPattern[a$3063] :> Hold[1, a$3063]}}} *)
Note b.t.w., that self-references are recognized in cases with no external references:
Module[{a}, a = Hold[a]; a]
vals[]["a"]
(* Hold[a$3090] *)
(* {} *)
My guess is that Module is smart enough to recognize self-references (but not mutual references, as we've seen) as long as there are no external references to a symbol - and then decide to destroy symbol's definitions - which automatically decrements the ref. count and makes the symbol's total ref. count 1 just before leaving Module and 0 right after leaving Module, thus making it collectable by the GC.
When there are external references, Module keeps symbol's definitions as well - that is, does not destroy them when exiting. Then later, even when the external reference gets removed, we have both the symbol and its definition present, and the ref. count is still 1, since while the definition is present, the symbol references itself. Which makes it look to the GC as a non-collectable symbol.
To illustrate the next case, let us create OwnValues with SetDelayed:
ClearAll[s];
Module[{a}, a := 1; s := a];
s
(* 1 *)
vals[]["a"]
Remove[s]
vals[]["a"]
(* {{"a$3067", {HoldPattern[a$3067] :> 1}}} *)
(* {{"a$3067", {HoldPattern[a$3067] :> 1}}} *)
It is less clear to me, why in this case the GC does not recognize the symbol as collectable even after external references have been removed. This might be considered a bug, or there might be some deeper reason and rationale for this behavior, which I simply am not seeing.
Finally, the case of existence of other *Values has been noted before, and I will steal a (slightly simplified) example from there:
Module[{g},
Module[{f},
g[x_] := f[x];
f[1] = 1
];
g[1]
]
(* 1 *)
vals[DownValues]["f" | "g"]
(* {{"f$", {}}, {"f$3071", {HoldPattern[f$3071[1]] :> 1}}} *)
This shows that even though the local variable g has itself been removed (since, while it had DownValues defined, it was not itself externally referenced), the inner local variable f has leaked, because, by the time the inner Module was exiting, it was still referenced by g.
In this particular case, one (rather ugly) way to reclaim it is as follows:
Module[{g, inner},
inner = Module[{f},
g[x_] := f[x];
f[1] = 1;
f
];
# &[g[1], Clear[Evaluate@inner]]
]
(* 1 *)
where we have returned the local variable f itself from the inner Module, and put it into the inner local variable of the outer Module - which made it possible to clear its definitions after g[1] was computed:
vals[DownValues]["f" | "g"]
(* {{"f$", {}}} *)
so that f had no definitions and therefore was GC-d (see rule 5). I've shown this workaround not to suggest using such constructs in practice, but rather to illustrate the mechanics.
The rules #4 and #5
These have been already illustrated by the examples above.
Observations and speculations
Module and Unique
Things can actually be simpler than they look. We know that the Module localization mechanism is based on Unique. We can use this knowledge to test how much of the observed behavior of Module actually comes from the interplay between Unique and the garbage collector. This may allow us to demystify the role of Module here.
Let us consider a few examples with Unique, which would parallel the cases we already looked at in the context of Module.
First, let us create a unique Temporary symbol and simply observe that it gets immediately collected:
Unique[a, Temporary]
vals[]["a"]
(* a$3085 *)
(* {} *)
Next, we save it into a variable, assign it some value, and then Remove that variable:
b = Unique[a, Temporary]
vals[]["a"]
Evaluate[b] = 1
vals[]["a"]
Remove[b]
vals[]["a"]
(* a$3089 *)
(* {{"a$3089", {}}} *)
(* 1 *)
(* {{"a$3089", {HoldPattern[a$3089] :> 1}}} *)
(* {} *)
Here, the variable b plays the role of the Module environment, which prevents the local variable from being immediately collected while inside Module. What we see is that as soon as we Remove b (think - exit Module), the variable is destroyed. Note that the definition we gave was using Set.
We now repeat the same but replace Set with SetDelayed. Again, variable b emulates the Module environment:
b = Unique[a, Temporary]
Evaluate[b] := 1
vals[]["a"]
Remove[b]
vals[]["a"]
(* a$714504 *)
(* {{"a$714504", {HoldPattern[a$714504] :> 1}}} *)
(* {{"a$714504", {HoldPattern[a$714504] :> 1}}} *)
What we have just reproduced is the puzzling behavior of Module w.r.t. local variables assigned with SetDelayed.
Let us move on and consider self-references made with Set:
b = Unique[a, Temporary]
Evaluate[b] = Hold[Evaluate[b]]
vals[]["a"]
Remove[b]
vals[]["a"]
(* a$3070 *)
(* Hold[a$3070] *)
(* {{"a$3070", {HoldPattern[a$3070] :> Hold[a$3070]}}} *)
(* {{"a$3070", {HoldPattern[a$3070] :> Hold[a$3070]}}} *)
We have again reproduced exactly the behavior we previously observed for Module.
Finally, consider the case of mutual references:
c = Unique[a, Temporary]
d = Unique[b, Temporary]
With[{a = c, b = d},
a = Hold[b];
b = Hold[a];
]
vals[]["a" | "b"]
Remove[c, d]
vals[]["a" | "b"]
(* a$3070 *)
(* b$3071 *)
(*
{
{"a$3070", {HoldPattern[a$3070] :> Hold[b$3071]}},
{"b$3071", {HoldPattern[b$3071] :> Hold[a$3070]}}
}
*)
(*
{
{"a$3070", {HoldPattern[a$3070] :> Hold[b$3071]}},
{"b$3071", {HoldPattern[b$3071] :> Hold[a$3070]}}
}
*)
Here again, we have reproduced the exact behavior we've seen before for Module.
What we can conclude from this is that a large part of the observed behaviors is actually due to the underlying behavior of Unique, rather than Module.
Simple Module emulation
To push the previous arguments a little further still, consider the following crude emulation of Module based on Unique:
SetAttributes[myModule, HoldAll]
myModule[vars : {___Symbol}, body_] :=
Block[vars,
ReleaseHold[
Hold[body] /. Thread[vars -> Map[Unique[#, Temporary]&, vars]]
]
]
This emulation disallows initialization in the variable list, and simply replaces all occurrences of any of the vars symbols in the body with generated Temporary unique symbols, and then lets the body evaluate.
If you rerun all the examples involving Module with myModule, you will observe exactly the same results in all cases but two: the example in 2.a and the last one in 3.b. But those behaviors of the original Module are the least puzzling, and the most puzzling ones are correctly reproduced with myModule.
So while obviously Module does more than myModule, it may not do that much more. This shifts the problem to one of the interplay between Unique and the garbage collector, which might be considered at least some complexity reduction.
Conclusions
It seems that the behavior of Module in terms of symbol leaking can in general be described by a set of reasonably simple rules. Exceptions exist, but it seems that at least they also may have plausible explanations.
We can make several general conclusions to summarize the behavior described above.
- For garbage collection / symbol leaking, it does make a difference whether the symbol had external references or not, by the time the execution leaves Module
- The garbage collector isn't smart enough to recount self-references or mutual references forming closed loops, after the execution left Module, and realize that some such local variables became collectable.
- In the absence of external and self-references at the time code execution leaves the Module, OwnValues are typically fine in terms of symbol collection / not leaking.
- Symbols with OwnValues created by immediate assignment (Set) and without self-references only keep their definitions for as long as they are externally referenced (by other symbols or enclosing expressions, if returned from Module), and are promptly destroyed / garbage-collected afterwards.
- Symbols with OwnValues keep their definitions and therefore are not collected, in cases when they are given delayed definitions (using SetDelayed) and they (still) were externally referenced at the time execution left Module. It is not clear why this is so, and whether or not this can be considered a bug.
- Local symbols with DownValues and other *Values except OwnValues will in general leak / not be collected if they have been externally referenced by the time the execution left their Module, regardless of whether or not they are still externally referenced
- Once a Temporary symbol's definitions have been removed, the symbol will be collected as long as it is not referenced externally.
Most of the puzzling behavior from the above observations can be reproduced in a simpler setting with Module emulated in a very simple way using Unique variables. It looks like it has more to do with the dynamics of Unique variables and garbage collection, than Module per se. It may happen that Module is not doing all that much extra, in this regard.
I believe that the above description is accurate and covers all cases I am aware of. But I can easily imagine that there are cases I have not seen or accounted for, which would make the picture more complex (or maybe simpler). If you know of such cases, or others not well described by this scheme, please comment.
Here are some examples of unexpected memory leaks in Mathematica and how to avoid them:
1. Parallel computation functions may prevent garbage collection
Module[{a}, Length[ParallelTable[a, {10}]]];
Names["a*"]
{"a", "a$1698"}
Also, when a temporary symbol is sent to a parallel kernel, the Temporary attribute is cleared:
Module[{a}, ParallelTable[Attributes[a], {10}] ]
{{}, {}, {}, {}, {}, {}, {}, {}, {}, {}}
How to avoid these leaks: Do not send temporary symbols to or from parallel kernels.
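One way to follow this advice, sketched here under the assumption that only the value of the local symbol is needed on the parallel kernels, is to inject that value with With before the parallel call. The variable name val is illustrative. Whether this fully avoids the leak on a given version is worth checking with Names["a*"] afterwards.

```wolfram
(* Sketch: With substitutes the value of a into the held body of
   ParallelTable, so the expression sent to the parallel kernels contains
   the number 5 rather than the temporary symbol a$nnn. *)
Module[{a},
  a = 5;
  With[{val = a},
    ParallelTable[val + i, {i, 10}]
  ]
]
(* {6, 7, 8, 9, 10, 11, 12, 13, 14, 15} *)
```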
2. Mathematica stack tracing feature (introduced in v11) prevents garbage collection if your code produces messages
Module[{a}, a; 0/0];
Names["a*"]
{"a", "a$1697"}
Note: there will be no leak if you set $HistoryLength = 0
How to avoid this leak: set $HistoryLength = 0 or disable the message menu via
Internal`$MessageMenu = False
See also:
How do I disable the stack tracing feature in Mathematica 11?
3. Local functions inside Module may cause a memory leak if one function depends on another
f[] := Module[{a, b},
a[y_] := 2 y;
b[y_] := 2 a[y];
b[1]
];
f[];
Names["a*"]
{"a", "a$1698"}
Note that this leak requires neither self-references nor circular references, unlike the cases in Leonid's answer.
It is remarkable that this leak is gone if one interchanges the order of symbols in the first argument of Module, i.e. the following code does not leak:
f[] := Module[{b, a},
a[y_] := 2 y;
b[y_] := 2 a[y];
b[1]
];
f[];
Names["a*"]
{"a"}
How to avoid this leak: avoid local functions altogether or remove them explicitly before exiting module, e.g.:
f[] := Module[{a, b, result},
a[y_] := 2 y;
b[y_] := 2 a[y];
result = b[1];
Remove[a, b];
result
];
f[];
Names["a*"]
{"a"}
4. Local functions inside Module cause a memory leak when there is a Condition inside Module
f[x_] := Module[{a}, (a[y_] := y; a[x]) /; (x > 0)];
f[1];
Names["a*"]
{"a", "a$", "a$1698"}
How to avoid this leak: Remove local functions explicitly, e.g.
f[x_] := Module[{a, result}, (a[y_] := y; result = a[x]; Remove[a]; result) /; (x > 0)];
f[1];
Names["a*"]
{"a", "a$"}
Finally, for those who want to go deeper into debugging the Mathematica garbage collector, there is a function which gives the number of references to a given symbol:
System`Private`GetRefCount[f]