Java "for" statement implementation prevents garbage collecting
Thanks for the bug report. We have fixed this bug; see JDK-8175883. As commented here, in the case of the enhanced for, javac was generating synthetic variables, so for code like:
void foo(String[] data) {
for (String s : data);
}
javac was approximately generating:
for (String[] arr$ = data, len$ = arr$.length, i$ = 0; i$ < len$; ++i$) {
String s = arr$[i$];
}
As mentioned above, this translation approach implies that the synthetic variable arr$ holds a reference to the array data, which prevents the GC from collecting the array once it is no longer referenced inside the method. This bug has been fixed by generating this code:
String[] arr$ = data;
String s;
for (int len$ = arr$.length, i$ = 0; i$ < len$; ++i$) {
s = arr$[i$];
}
arr$ = null;
s = null;
The idea is to set to null any synthetic variable of a reference type created by javac to translate the loop. If we were talking about an array of a primitive type, then the last assignment to null is not generated by the compiler. The bug has been fixed in the JDK repo.
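As a rough source-level sketch of the fixed translation, the loop could be written by hand as shown below. The class and method names are hypothetical, and the plain identifiers arr, len and i stand in for javac's synthetic arr$, len$ and i$; this is an illustration of the idea, not the compiler's actual output.

```java
public class EnhancedForSketch {
    // Hand-written approximation of the fixed translation of
    //     for (String s : data) { ... }
    // arr, len and i stand in for javac's synthetic variables.
    static int countNonNull(String[] data) {
        int count = 0;
        String[] arr = data;
        String s;
        for (int len = arr.length, i = 0; i < len; ++i) {
            s = arr[i];
            if (s != null) {
                count++;
            }
        }
        // Clear the reference-typed temporaries so their slots no longer
        // refer to the array or to its last element.
        arr = null;
        s = null;
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countNonNull(new String[] {"a", null, "b"})); // prints 2
    }
}
```

Note that in this sketch the parameter data still refers to the array for the duration of the method, so the nulling only matters once the caller's own reference is gone as well.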
So this is actually an interesting question that could have benefited from slightly different wording. More specifically, focusing on the generated bytecode instead would have cleared up a lot of the confusion. So let's do that.
Given this code:
List<Integer> foo = new ArrayList<>();
for (Integer i : foo) {
// nothing
}
This is the generated bytecode:
0: new #2 // class java/util/ArrayList
3: dup
4: invokespecial #3 // Method java/util/ArrayList."<init>":()V
7: astore_1
8: aload_1
9: invokeinterface #4, 1 // InterfaceMethod java/util/List.iterator:()Ljava/util/Iterator;
14: astore_2
15: aload_2
16: invokeinterface #5, 1 // InterfaceMethod java/util/Iterator.hasNext:()Z
21: ifeq 37
24: aload_2
25: invokeinterface #6, 1 // InterfaceMethod java/util/Iterator.next:()Ljava/lang/Object;
30: checkcast #7 // class java/lang/Integer
33: astore_3
34: goto 15
So, play by play:
- Store the new list in local variable 1 ("foo")
- Store the iterator in local variable 2
- For each element, store the element in local variable 3
Note that after the loop, there's no cleanup of anything that was used in the loop. That isn't restricted to the iterator: the last element is still stored in local variable 3 after the loop ends, even though there's no reference to it in the code.
So before you go "that's wrong, wrong, wrong", let's see what happens when I add this code after that code above:
byte[] bar = new byte[0];
You get this bytecode after the loop:
37: iconst_0
38: newarray byte
40: astore_2
Oh, look at that. The newly declared local variable is being stored in the same local variable slot (2) as the iterator. So now the reference to the iterator is gone.
Note that this is different from the Java code you assume is the equivalent. The actual Java equivalent, which generates the exact same bytecode, is this:
List<Integer> foo = new ArrayList<>();
for (Iterator<Integer> i = foo.iterator(); i.hasNext(); ) {
Integer val = i.next();
}
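One way to check this equivalence yourself is to put both forms into one class and compare the disassembly. The sketch below uses made-up class and method names and adds a running sum so the loops have something to do. Compile it with javac and run javap -c LoopForms to compare the two methods; the exact output may differ between javac versions.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class LoopForms {
    // Enhanced for: javac creates the hidden iterator variable.
    static int sumEnhanced(List<Integer> foo) {
        int sum = 0;
        for (Integer i : foo) {
            sum += i;
        }
        return sum;
    }

    // Explicit iterator loop, written the way the answer describes.
    static int sumExplicit(List<Integer> foo) {
        int sum = 0;
        for (Iterator<Integer> i = foo.iterator(); i.hasNext(); ) {
            Integer val = i.next();
            sum += val;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> foo = new ArrayList<>();
        foo.add(1);
        foo.add(2);
        foo.add(3);
        System.out.println(sumEnhanced(foo) + " " + sumExplicit(foo)); // prints 6 6
    }
}
```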
And still there's no cleanup. Why's that?
Well, here we are in guessing territory, unless it's actually specified in the JVM spec (haven't checked). Anyway, to do cleanup, the compiler would have to generate extra bytecode (2 instructions, aconst_null and astore_<n>) for each variable that's going out of scope. This would mean the code runs slower; and to avoid that, possibly complicated optimizations would have to be added to the JIT.
So, why does your code fail?
You end up in a similar situation to the one above. The iterator is allocated and stored in local variable 1. Then your code tries to allocate the new string array and, because local variable 1 is not in use anymore, the array would be stored in the same local variable (check the bytecode). But the allocation happens before the assignment, so the slot still holds a reference to the iterator while the array is being allocated, and there isn't enough free memory for the allocation.
If you add this line before the try block, things work, even if you remove the System.gc() call:
int i = 0;
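To see why this helps, here is a small hypothetical method (not the code from the question) in which a primitive local is the first variable declared after the loop. Assuming javac reuses the slot freed when the loop's scope ended, as in the bytecode shown earlier, javap -c SlotReuseSketch should show the int being stored into the slot that previously held the iterator, which overwrites the stale reference before the array allocation runs.

```java
import java.util.Arrays;
import java.util.List;

public class SlotReuseSketch {
    static int lengthAfterLoop(List<String> source, int size) {
        for (String s : source) {
            // nothing; the hidden iterator and s occupy local slots here
        }
        // First local declared after the loop's scope has ended.
        // Assumption: javac places it in the slot the iterator used.
        int i = 0;
        // The array is allocated only after that slot was overwritten.
        String[] big = new String[size];
        return big.length + i;
    }

    public static void main(String[] args) {
        System.out.println(lengthAfterLoop(Arrays.asList("a", "b"), 16)); // prints 16
    }
}
```

Whether a stale slot actually keeps an object alive at run time depends on the JVM and on whether the method is interpreted or compiled, so treat this as a way to inspect the slot layout rather than a guaranteed fix.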
So, it seems the JVM developers made a choice (generate smaller / more efficient bytecode instead of explicitly nulling variables that go out of scope), and you happen to have written code that doesn't behave well under the assumptions they made about how people write code. Given that I've never seen this problem in actual applications, seems like a minor thing to me.