Why does a job fail with "No space left on device", but df says otherwise?
By default Spark uses the /tmp directory to store intermediate data. If you do actually have space left on some device, you can change this location by creating the file SPARK_HOME/conf/spark-defaults.conf (where SPARK_HOME is the root directory of your Spark install) and adding the line:

spark.local.dir SOME/DIR/WHERE/YOU/HAVE/SPACE
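If you'd rather set this per application than cluster-wide, here's a minimal sketch in Scala (the path and app name are illustrative, not real; also note that on standalone or YARN clusters the SPARK_LOCAL_DIRS / LOCAL_DIRS environment variables take precedence over this setting):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical path: point spark.local.dir at a volume with free blocks AND free inodes.
val conf = new SparkConf()
  .setAppName("scratch-dir-example")
  .set("spark.local.dir", "/mnt/large-disk/spark-tmp")

val sc = new SparkContext(conf)
```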
You also need to monitor df -i, which shows how many inodes are in use. A filesystem can run out of inodes while df -h still reports free space, and that too surfaces as "No space left on device".
On each machine, Spark creates M * R temporary files for the shuffle, where M is the number of map tasks and R is the number of reduce tasks (see https://spark-project.atlassian.net/browse/SPARK-751). For example, 1,000 map tasks and 1,000 reduce tasks mean 1,000,000 shuffle files per machine, which can exhaust inodes long before the disk fills up.
If you do indeed see that disks are running out of inodes, you can:

- Decrease the number of partitions (see coalesce with shuffle = false; a sketch follows this list).
- Drop the number of files to O(R) by "consolidating files". Since different file systems behave differently, it's recommended that you read up on spark.shuffle.consolidateFiles and see https://spark-project.atlassian.net/secure/attachment/10600/Consolidating%20Shuffle%20Files%20in%20Spark.pdf.
- Sometimes you may simply find that you need your DevOps team to increase the number of inodes the file system supports.
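A minimal sketch of the first option (the input path and partition count are illustrative):

```scala
// Coalescing to fewer partitions shrinks M, and with it the M * R shuffle file count.
// shuffle = false merges existing partitions without triggering an extra shuffle.
val lines = sc.textFile("hdfs:///some/input")    // hypothetical input path
val fewer = lines.coalesce(200, shuffle = false) // 200 is an example target, tune for your job

val totalChars = fewer.map(_.length.toLong).reduce(_ + _)
```

For the second option, on Spark versions before 1.6 the flag was enabled by adding spark.shuffle.consolidateFiles true to spark-defaults.conf (see the EDIT below for its removal).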
EDIT

Consolidating files has been removed from Spark since version 1.6 (https://issues.apache.org/jira/browse/SPARK-9808).