Finding the cause of a memory leak in Ruby
It looks like you are entering The Lost World here. I don’t think the problem is with the C bindings in racc either.
Ruby memory management is both elegant and cumbersome. It stores objects (named RVALUEs) in so-called heaps of approximately 16KB each. At a low level, RVALUE is a C struct containing a union of the different standard Ruby object representations.
So, heaps store RVALUE objects, whose size is no more than 40 bytes. For objects such as String, Array, Hash, etc., this means that small objects fit in the heap, but as soon as they reach a threshold, extra memory outside of the Ruby heaps is allocated.
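The split between in-heap and out-of-heap storage can be observed with a minimal sketch like the one below. It assumes MRI and the standard objspace extension; approx_size is a hypothetical helper name introduced only for this illustration, and the exact byte counts vary by Ruby version and platform.

```ruby
require 'objspace'

# Hypothetical helper for illustration: the approximate number of bytes
# MRI reports for an object, including any externally allocated buffer.
def approx_size(obj)
  ObjectSpace.memsize_of(obj)
end

small = "abc"              # short enough to live inside its heap slot
large = " " * 10_000_000   # needs a buffer allocated outside the Ruby heaps

puts "small string: #{approx_size(small)} bytes"
puts "large string: #{approx_size(large)} bytes"
```

The small string should report roughly the size of one slot, while the large one reports its slot plus the buffer that was allocated outside of the Ruby heaps.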
This extra memory is flexible; it will be freed as soon as the object is GC’ed. That’s why your testcase with big_string shows the memory up-down behaviour:
def report
  puts 'Memory ' + `ps ax -o pid,rss | grep -E "^[[:space:]]*#{$$}"`
    .strip.split.map(&:to_i)[1].to_s + 'KB'
end
report
big_var = " " * 10000000
report
big_var = nil
report
ObjectSpace.garbage_collect
sleep 1
report
# ⇒ Memory 11788KB
# ⇒ Memory 65188KB
# ⇒ Memory 65188KB
# ⇒ Memory 11788KB
But the heaps (see GC.stat[:heap_length]) themselves are not released back to the OS once acquired. Look, I’ll make a humdrum change to your testcase:
- big_var = " " * 10000000
+ big_var = 1_000_000.times.map(&:to_s)
And, voilà:
# ⇒ Memory 11788KB
# ⇒ Memory 65188KB
# ⇒ Memory 65188KB
# ⇒ Memory 57448KB
The memory is no longer released back to the OS, because each element of the array I introduced fits within the RVALUE size and is stored in the Ruby heap.
If you examine the output of GC.stat after the GC has run, you’ll find that the GC.stat[:heap_used] value has decreased as expected. Ruby now has a lot of empty heaps, ready for reuse.
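A rough way to look at these counters is sketched below. The key names are an assumption: heap_used and heap_length are the names used in this answer, while newer MRI versions expose names such as heap_allocated_pages, heap_live_slots and heap_free_slots, so the hypothetical heap_snapshot helper only reports the keys that actually exist on the running interpreter.

```ruby
# Key names differ between Ruby versions, so only report the ones present.
CANDIDATE_KEYS = %i[heap_used heap_length heap_allocated_pages
                    heap_live_slots heap_free_slots].freeze

# Hypothetical helper for illustration: a filtered view of GC.stat.
def heap_snapshot
  GC.stat.select { |key, _value| CANDIDATE_KEYS.include?(key) }
end

before = heap_snapshot
many_small = 1_000_000.times.map(&:to_s)  # a million slot-sized strings
during = heap_snapshot
many_small = nil                          # drop the only reference
GC.start
after = heap_snapshot

puts "before: #{before}"
puts "during: #{during}"
puts "after:  #{after}"
```

Comparing the three snapshots should show the heap growing while the array is alive and the slots becoming free again after the collection, without that necessarily showing up as a lower rss for the process.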
To sum up: I don’t think the C code leaks. I think the problem lies in the base64 representation of the huge image in your CSS. I have no clue what’s happening inside the parser, but it looks like the huge string forces the Ruby heap count to increase.
Hope it helps.
Okay, I found the answer. I am leaving my other answer up because that information was very difficult to gather, it is related, and it could help someone else searching for a related issue.
Your problem, however, appears to be due to the fact that Ruby actually does not release memory back to the Operating System once it has acquired it.
Memory Allocation
While Ruby programmers do not often worry about memory allocation, sometimes the following question comes up:
Why did my Ruby process stay so big even after I’ve cleared all references to big objects? I’m /sure/ GC has run several times and freed my big objects and I’m not leaking memory.
A C programmer might ask the same question:
I free()-ed a lot of memory, why is my process still so big?
Memory allocation to user space from the kernel is cheaper in large chunks, thus user space avoids interaction with the kernel by doing more work itself.
User space libraries/runtimes implement a memory allocator (e.g.: malloc(3) in libc) which takes large chunks of kernel memory and divides them up into smaller pieces for user space applications to use.
Thus, several user space memory allocations may occur before user space needs to ask the kernel for more memory. Thus if you got a large chunk of memory from the kernel and are only using a small part of that, that large chunk of memory remains allocated.
Releasing memory back to the kernel also has a cost. User space memory allocators may hold onto that memory (privately) in the hope it can be reused within the same process and not give it back to the kernel for use in other processes. (Ruby Best Practices)
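The behaviour described in the quote can be illustrated with a toy model. This is only a sketch of the idea, not how malloc or MRI is actually implemented: ToyAllocator and all of its methods are hypothetical, and the 16KB chunk and 40-byte slot sizes are borrowed from the other answer purely for illustration.

```ruby
# Toy model of a user-space allocator. NOT the real malloc or MRI code.
class ToyAllocator
  CHUNK_SIZE = 16 * 1024  # bytes requested from the "kernel" at a time
  SLOT_SIZE  = 40         # bytes handed out per allocation

  attr_reader :kernel_requests

  def initialize
    @free_slots = []
    @kernel_requests = 0
    @next_id = 0
  end

  # Hand out a slot, asking the "kernel" for a new chunk only when needed.
  def allocate
    grow if @free_slots.empty?
    @free_slots.pop
  end

  # Returned slots go onto the free list; chunks are never given back.
  def release(slot)
    @free_slots.push(slot)
  end

  def reserved_bytes
    @kernel_requests * CHUNK_SIZE
  end

  def free_bytes
    @free_slots.size * SLOT_SIZE
  end

  private

  # Simulate one expensive request to the kernel for a whole chunk.
  def grow
    @kernel_requests += 1
    (CHUNK_SIZE / SLOT_SIZE).times do
      @free_slots.push(@next_id)
      @next_id += 1
    end
  end
end

allocator = ToyAllocator.new
slots = Array.new(1_000) { allocator.allocate }
puts "reserved after allocating: #{allocator.reserved_bytes} bytes"

slots.each { |slot| allocator.release(slot) }
puts "reserved after releasing:  #{allocator.reserved_bytes} bytes"
puts "of which free for reuse:   #{allocator.free_bytes} bytes"
```

In this model the reserved figure is the same before and after the slots are released: the freed slots are available for reuse inside the process, but nothing is handed back to the kernel.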
So, your objects may very well have been garbage collected and released back to Ruby's available memory, but because Ruby never gives back unused memory to the OS, the rss value for the process remains the same, even after garbage collection. This is actually by design. According to Mike Perham:
...And since MRI never gives back unused memory, our daemon can easily be taking 300-400MB when it’s only using 100-200.
It’s important to note that this is essentially by design. Ruby’s history is mostly as a command line tool for text processing and therefore it values quick startup and a small memory footprint. It was not designed for long-running daemon/server processes. Java makes a similar tradeoff in its client and server VMs.