java.lang.OutOfMemoryError: unable to create new native thread for big data set
I've experienced this with MapReduce in general. In my experience it's not actually an out-of-memory error: the system is running out of file descriptors needed to start new threads, which is why the message says "unable to create new native thread" rather than reporting heap exhaustion.
The fix for us (on Linux) was to raise the ulimit, which was set to 1024, to 2048 via `ulimit -n 2048`. You will need permission to do this: either sudo/root access, or a hard limit of 2048 or higher so you can raise it yourself as an unprivileged user. You can put the command in your `.profile` or `.bashrc` file so it applies to every new shell.
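For example, a typical session might look like the sketch below (the value 2048 is from our setup; adjust it, and the startup-file path, for your environment — appending to `~/.bashrc` is just one way to persist it):

```
# Check the current soft limit on open files (often 1024 by default)
ulimit -n

# Raise the limit for the current shell session.
# Without root, this only works up to the hard limit.
ulimit -n 2048

# Persist it for your user by appending the same command
# to your shell startup file:
echo "ulimit -n 2048" >> ~/.bashrc
```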
You can check your current settings with `ulimit -a`. See this answer for more details: https://stackoverflow.com/a/34645/871012
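The `-S` and `-H` flags let you distinguish the soft limit (what you can raise yourself) from the hard ceiling:

```
# Show all limits for the current shell
ulimit -a

# Query just the open-files limit:
ulimit -Sn   # soft limit - raisable by any user, up to the hard limit
ulimit -Hn   # hard limit - only root can raise this
```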
I've also seen many others talk about changing the `/etc/security/limits.conf` file, but I haven't had to do that yet. Here is a link talking about it: https://stackoverflow.com/a/8285278/871012
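For reference, entries in that file look roughly like this (I haven't needed this myself; `youruser` and the values are placeholders, editing requires root, and changes typically take effect at the next login):

```
# /etc/security/limits.conf
# <domain>   <type>  <item>   <value>
youruser     soft    nofile   4096
youruser     hard    nofile   8192
```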