PySpark: java.lang.OutOfMemoryError: Java heap space
I had the same problem with pyspark (installed with brew). In my case it was installed on the path /usr/local/Cellar/apache-spark.
The only configuration file I had was in apache-spark/2.4.0/libexec/python//test_coverage/conf/spark-defaults.conf.
As suggested here, I created the file spark-defaults.conf in the path /usr/local/Cellar/apache-spark/2.4.0/libexec/conf/spark-defaults.conf and appended to it the line spark.driver.memory 12g.
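If you would rather script that step than edit the file by hand, here is a minimal sketch using only the Python standard library. The helper name set_spark_default is my own (it is not part of Spark), and it assumes the simple "key value" layout shown above; pass whatever path your install actually uses.

```python
import re
from pathlib import Path


def set_spark_default(conf_path, key, value):
    """Create or update a `key value` entry in a spark-defaults.conf style file.

    Hypothetical helper for illustration; not part of Spark or PySpark.
    """
    path = Path(conf_path)
    lines = path.read_text().splitlines() if path.exists() else []

    kept = []
    for line in lines:
        stripped = line.strip()
        is_entry = bool(stripped) and not stripped.startswith("#")
        # Drop any older, uncommented entry for the same key so the new
        # value is the only one left; comments and other keys are kept.
        if is_entry and re.split(r"[\s=:]", stripped, maxsplit=1)[0] == key:
            continue
        kept.append(line)

    kept.append(f"{key} {value}")
    path.parent.mkdir(parents=True, exist_ok=True)  # create conf/ if missing
    path.write_text("\n".join(kept) + "\n")
    return path
```

For the install described above, the call would be set_spark_default("/usr/local/Cellar/apache-spark/2.4.0/libexec/conf/spark-defaults.conf", "spark.driver.memory", "12g"). You may need elevated permissions to write there.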
After trying out loads of configuration parameters, I found that only one needs to be changed to enable more heap space: spark.driver.memory.
sudo vim $SPARK_HOME/conf/spark-defaults.conf
# uncomment the spark.driver.memory line and set it to suit your workload; I changed it to:
spark.driver.memory 15g
# press Esc, then type :wq! to save and exit the vim editor
Close your existing Spark application and rerun it. You should not encounter this error again. :)
If you're looking for a way to set this from within the script or a Jupyter notebook, you can do:
from pyspark.sql import SparkSession
spark = SparkSession.builder \
    .master('local[*]') \
    .config("spark.driver.memory", "15g") \
    .appName('my-cool-app') \
    .getOrCreate()
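To check that the setting was actually picked up, you can read it back from the running session. A small sketch, where effective_driver_memory is a name I made up for illustration; it only relies on spark.sparkContext.getConf().get(...). One caveat, as far as I know: if a session is already running (for example in a notebook where an earlier cell created one), getOrCreate() returns that session, and since the driver JVM has already started, a new memory value will not take effect until you restart the kernel or application.

```python
def effective_driver_memory(spark):
    """Return the spark.driver.memory value the session reports, or None if unset.

    Hypothetical helper for illustration; `spark` is expected to be a
    SparkSession such as the one built above.
    """
    # SparkConf.get(key, default) returns the default when the key is absent.
    return spark.sparkContext.getConf().get("spark.driver.memory", None)
```

Calling effective_driver_memory(spark) right after getOrCreate() should give back the value you configured; None means the key was never set and Spark is using its default.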