Spark: java.io.IOException: No space left on device
This happens because Spark creates temporary shuffle files under the /tmp directory of your local system. You can avoid this issue by setting the properties below in your Spark configuration files.
Set the following properties in spark-env.sh (change the directories to whatever directories in your infrastructure have write permission and enough free space):
SPARK_JAVA_OPTS+=" -Dspark.local.dir=/mnt/spark,/mnt2/spark -Dhadoop.tmp.dir=/mnt/ephemeral-hdfs"
export SPARK_JAVA_OPTS
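If you prefer to configure this programmatically, here is a minimal Scala sketch, assuming a local-mode application; the paths are placeholders. Note that on a managed cluster (Standalone or YARN), this property is overridden by the SPARK_LOCAL_DIRS / LOCAL_DIRS environment variables set by the cluster manager, so the conf-file approach is usually preferable there.

import org.apache.spark.{SparkConf, SparkContext}

// Placeholder paths: point these at writable volumes with enough free space.
// A comma-separated list spreads shuffle I/O across several disks.
val conf = new SparkConf()
  .setAppName("local-dir-example")
  .setMaster("local[*]")
  .set("spark.local.dir", "/mnt/spark,/mnt2/spark")

val sc = new SparkContext(conf)
// ... your job here ...
sc.stop()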
You can also set the spark.local.dir property in $SPARK_HOME/conf/spark-defaults.conf, as stated by @EUgene below.
As a complement, to specify the default folder for your shuffle temp files, you can add the line below to $SPARK_HOME/conf/spark-defaults.conf:
spark.local.dir /mnt/nvme/local-dir,/mnt/nvme/local-dir2
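If you submit jobs with spark-submit, the same property can also be passed per job, for example with --conf spark.local.dir=/mnt/nvme/local-dir.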
According to the error message you provided, your hard drive has no disk space left. However, this is not caused by RDD persistence, but by the shuffle that you implicitly required when calling reduce.
Therefore, you should clear your drive and make more space for your tmp folder.
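To make the cause concrete, here is a minimal Scala sketch (data, paths, and sizes are illustrative only) using reduceByKey, a typical shuffle-producing operation: the shuffle stage writes its intermediate files under spark.local.dir (default: /tmp), and that is what fills the disk.

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("shuffle-spill-demo")
  .setMaster("local[*]")
  .set("spark.local.dir", "/mnt/nvme/local-dir") // redirect shuffle spills off /tmp

val sc = new SparkContext(conf)

val counts = sc.parallelize(1 to 1000000)
  .map(i => (i % 100, 1L))   // key each element by i mod 100
  .reduceByKey(_ + _)        // shuffle stage: temp files land in spark.local.dir

println(counts.count())
sc.stop()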