Why does a job fail with "No space left on device", but df says otherwise?
By default Spark uses the /tmp directory to store intermediate data. If you do actually have space left on some device, you can change this location by creating the file SPARK_HOME/conf/spark-defaults.conf (where SPARK_HOME is the root directory of your Spark install) and adding the line:

spark.local.dir SOME/DIR/WHERE/YOU/HAVE/SPACE
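If you'd rather set this per application than cluster-wide, here's a minimal sketch in Scala (the path and app name are illustrative, not real; also note that on standalone or YARN clusters the SPARK_LOCAL_DIRS / LOCAL_DIRS environment variables take precedence over this setting):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical path: point spark.local.dir at a volume with free blocks AND free inodes.
val conf = new SparkConf()
  .setAppName("scratch-dir-example")
  .set("spark.local.dir", "/mnt/large-disk/spark-tmp")

val sc = new SparkContext(conf)
```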
You also need to monitor df -i, which shows how many inodes are in use. A filesystem can run out of inodes while df -h still reports free space, and that too surfaces as "No space left on device".
On each machine, Spark creates M * R temporary files for the shuffle, where M is the number of map tasks and R is the number of reduce tasks (see https://spark-project.atlassian.net/browse/SPARK-751). For example, 1,000 map tasks and 1,000 reduce tasks mean 1,000,000 shuffle files per machine, which can exhaust inodes long before the disk fills up.
If you do indeed see that disks are running out of inodes, you can:

- Decrease the number of partitions (see coalesce with shuffle = false; a sketch follows this list).
- Drop the number of files to O(R) by "consolidating files". Since different file systems behave differently, it's recommended that you read up on spark.shuffle.consolidateFiles and see https://spark-project.atlassian.net/secure/attachment/10600/Consolidating%20Shuffle%20Files%20in%20Spark.pdf.
- Sometimes you may simply find that you need your DevOps team to increase the number of inodes the file system supports.
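A minimal sketch of the first option (the input path and partition count are illustrative):

```scala
// Coalescing to fewer partitions shrinks M, and with it the M * R shuffle file count.
// shuffle = false merges existing partitions without triggering an extra shuffle.
val lines = sc.textFile("hdfs:///some/input")    // hypothetical input path
val fewer = lines.coalesce(200, shuffle = false) // 200 is an example target, tune for your job

val totalChars = fewer.map(_.length.toLong).reduce(_ + _)
```

For the second option, on Spark versions before 1.6 the flag was enabled by adding spark.shuffle.consolidateFiles true to spark-defaults.conf (see the EDIT below for its removal).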
EDIT

Consolidating files has been removed from Spark since version 1.6 (https://issues.apache.org/jira/browse/SPARK-9808).