Tips for keeping Perl memory usage low
My two dimes.
Do threads created in Perl prevent copying Perl module libraries into memory for each thread?
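- They do not. Everything runs in one process, but Perl's interpreter threads (ithreads) clone the interpreter's data when a thread is created, so variables set up by loaded modules are copied into each thread unless they are explicitly shared. Each thread must also have its own program stack.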
Is threads (use threads) the most efficient (or the only) way to create threads in Perl?
- IMO any method eventually calls the pthread library APIs, which actually do the work.
In threads, I can specify a stack_size parameter, what specifically should I consider when specifying this value, and how does it impact memory usage?
- Since threads run in the same process space, the stack cannot be shared. The stack size tells pthreads how far apart the stacks should be from each other. Every time a function is called, its local variables are allocated on the stack, so the stack size limits how deep you can recurse. You can allocate as little as possible, to the extent that your application still works.
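As a rough sketch of how stack_size gets passed in: the worker sub, the 256 KiB figure, and the run_with_stack helper are made up for illustration; only threads->create with a stack_size option and get_stack_size come from the threads module. The Config check is there so the script still runs on a perl built without ithreads.

```perl
use strict;
use warnings;
use Config;

# Hypothetical worker: sums 1..$n and returns the total.
sub worker {
    my ($n) = @_;
    my $sum = 0;
    $sum += $_ for 1 .. $n;
    return $sum;
}

# Run worker() in a thread with an explicit stack size (in bytes).
# Returns (result, stack size reported by threads), or an empty list
# when this perl was built without ithreads.
sub run_with_stack {
    my ( $stack_bytes, $n ) = @_;
    return unless $Config{useithreads};
    require threads;

    # The thread is created in scalar context, so join returns a scalar.
    my $thr  = threads->create( { stack_size => $stack_bytes }, \&worker, $n );
    my $size = $thr->get_stack_size;    # the platform may round this up
    my $sum  = $thr->join;
    return ( $sum, $size );
}

# 64 * 4096 bytes = 256 KiB, an arbitrary value picked for this sketch.
my ( $sum, $size ) = run_with_stack( 64 * 4096, 100 );
print defined $sum
    ? "sum=$sum stack_size=$size\n"
    : "no thread support in this perl\n";
```

The practical approach is to start from a value like this and lower it while your real workload still runs cleanly, since the right number depends on what the thread actually calls.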
With threads in Perl/Linux, what is the most reliable method to determine the actual memory usage on a per-thread basis?
- Each thread's stack size is fixed once the thread is spawned, while the heap and static storage belong to the process as a whole, so the operating system accounts for memory per process rather than per thread. The notion of per-thread memory usage doesn't really apply.
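Since the number you can actually get is per process, here is a minimal sketch that reads it on Linux from /proc/<pid>/status. The rss_kb helper name and the throwaway allocation are just for illustration, and it assumes a Linux-style /proc; elsewhere it returns undef.

```perl
use strict;
use warnings;

# Return the resident set size (VmRSS) of a process in kB, read from
# Linux's /proc. Returns undef where /proc/<pid>/status is unavailable.
sub rss_kb {
    my ($pid) = @_;
    $pid //= $$;    # default to the current process
    open my $fh, '<', "/proc/$pid/status" or return;
    while ( my $line = <$fh> ) {
        return $1 if $line =~ /^VmRSS:\s+(\d+)\s+kB/;
    }
    return;
}

my $before = rss_kb();
my @big    = (1) x 1_000_000;    # allocate something noticeable
my $after  = rss_kb();

if ( defined $before and defined $after ) {
    printf "RSS went from %d kB to %d kB\n", $before, $after;
}
```

Calling it before and after spawning threads gives a rough idea of what each extra thread costs the process as a whole.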
Comparing fork and threads:
* fork duplicates the process, and the child inherits the parent's file handles.
advantages: simpler application logic and better fault tolerance. A spawned process can misbehave and leak resources without bringing down the parent. It is a good solution if you do not fork a lot and each forked process eventually exits and is cleaned up by the system.
disadvantages: more overhead per fork, and the system limits how many processes you can fork. The processes cannot share variables.
* threads run in the same process, each with an additional program stack.
advantages: in general a lower memory footprint, and spawning a thread is faster and lighter than a fork (less so with Perl's ithreads, which copy the interpreter's data for every new thread). You can share variables.
disadvantages: more complex application logic, serialized access to shared resources, and so on. The code has to be very reliable, and you need to watch for resource leaks, which can bring down the entire application.
IMO it depends on what you do: fork can use far less memory over the lifetime of the application if whatever you spawn just does its work independently and exits, instead of risking memory leaks in threads.
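The fork-and-exit pattern described above could look something like this. It is only a sketch: run_in_child and the sample task are hypothetical names, the result is passed back as a plain string over a pipe, and error handling is minimal.

```perl
use strict;
use warnings;
use POSIX ();

# Run $work->() in a forked child and return its result as a string.
# Whatever memory the child used is released when it exits.
sub run_in_child {
    my ($work) = @_;

    pipe( my $reader, my $writer ) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ( $pid == 0 ) {    # child
        close $reader;
        print {$writer} $work->();
        close $writer;       # flushes the result into the pipe
        POSIX::_exit(0);     # leave without running the parent's END blocks
    }

    close $writer;        # parent
    my $answer = do { local $/; <$reader> };    # read everything the child wrote
    close $reader;
    waitpid( $pid, 0 );   # reap the child
    return $answer;
}

# Hypothetical memory-hungry task: build a big list, return only a summary.
my $count = run_in_child( sub {
    my @big = map { $_ * 2 } 1 .. 500_000;
    return scalar @big;
} );
print "child counted $count items\n";
```

Only the small summary crosses the pipe, so the parent never holds the big list. For structured results you would need to serialize them yourself before printing.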
What sort of problem are you running into, and what does "large" mean to you? I have friends who need to load 200 GB files into memory, so their idea of good tips is a lot different from that of the budget shopper suffering with 250 MB of RAM on a minimal VM slice (really? My phone has more than that).
In general, Perl holds on to any memory you use, even if it's not using it. Realize that optimizing in one direction, e.g. memory, might negatively impact another, such as speed.
This is not a comprehensive list (and there's more in Programming Perl):
☹ Use Perl memory profiling tools to help you find problem areas. See "Profiling heap memory usage on perl programs" and "How to find the amount of physical memory occupied by a hash in Perl?"
☹ Use lexical variables with the smallest scope possible to allow Perl to re-use that memory when you don't need it.
☹ Avoid creating big temporary structures. For instance, reading a file with a foreach reads all the input at once. If you only need it line-by-line, use while.
foreach ( <FILE> ) { ... } # list context, all at once
while( <FILE> ) { ... } # scalar context, line by line
☹ You might not even need to have the file in memory. Memory-map files instead of slurping them.
☹ If you need to create big data structures, consider something like DBM::Deep or other storage engines to keep most of it out of RAM and on disk until you need it. Outside of Perl, there are various key-value stores, such as Redis, that may help.
☹ Don't let people use your program. Whenever I've done that, I've reduced the memory footprint by about 100%. It also cuts down on support requests.
☹ (Update: Perl can now handle this for you in most cases because it uses a copy-on-write (COW) mechanism.) Pass large chunks of text and large aggregates by reference so you don't make a copy and store the same information twice. If you have to copy it because you want to change something, you might be stuck. This applies both to subroutine arguments and to subroutine return values:
call_some_sub( \$big_text, \@long_array );

sub call_some_sub {
    my( $text_ref, $array_ref ) = @_;   # references, not copies
    ...
    return \%hash;                      # return a reference, too
}
☹ Track down memory leaks in modules. I had big problems with an application until I realized that a module wasn't releasing memory. I found a patch in the module's RT queue, applied it, and solved the problem.
☹ If you need to handle a big chunk of data once but don't want the persistent memory footprint, offload the work to a child process. The child process only has the memory footprint while it's working. When you get the answer, the child process shuts down and releases its memory. Similarly, work distribution systems, such as Minion, can spread work out among machines.
☹ Turn recursive solutions into iterative ones. Perl doesn't have tail recursion optimization, so every new call adds to the call stack. You can optimize the tail problem yourself with tricks with goto or a module, but that's a lot of work to hang onto a technique that you probably don't need.
☹ Use external programs, forks, job queues, or other separate actors so you don't have to carry around short-term memory burdens. If you have a processing task that will use a big chunk of memory, let a different program (perhaps a fork of the current program) handle that and give you back the answer. When that other program is done, all of its memory returns to the operating system. This program doesn't even need to be on the same box.
☹ Did he use 6 Gb or only five? Well, to tell you the truth, in all this excitement I kind of lost track myself. But being as this is Perl, the most powerful language in the world, and would blow your memory clean off, you've got to ask yourself one question: Do I feel lucky? Well, do ya, punk?
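Going back to the tip about turning recursive solutions into iterative ones, here is a small sketch of the idea. The nested-array data and both sub names are invented for the example; the point is only that an explicit work list replaces Perl's call stack.

```perl
use strict;
use warnings;

# Recursive version: one Perl call frame per level of nesting.
sub sum_recursive {
    my ($node) = @_;
    return $node unless ref $node;
    my $total = 0;
    $total += sum_recursive($_) for @$node;
    return $total;
}

# Iterative version: an explicit work list stands in for the call stack.
sub sum_iterative {
    my ($root) = @_;
    my @todo  = ($root);
    my $total = 0;
    while (@todo) {
        my $node = pop @todo;
        if ( ref $node ) { push @todo, @$node }    # expand nested arrays
        else             { $total += $node }       # plain number
    }
    return $total;
}

my $nested = [ 1, [ 2, 3, [ 4, [5] ] ], 6 ];
print sum_recursive($nested), " ", sum_iterative($nested), "\n";    # prints "21 21"
```

The iterative version keeps only the pending items in @todo, so deep nesting grows one array rather than a chain of call frames.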
There are many more, but it's too early in the morning to figure out what those are. I cover some in Mastering Perl and Effective Perl Programming.