Increase memory available to PySpark at runtime
As far as I know, it isn't possible to change spark.executor.memory at run time. The containers on the data nodes are created even before the Spark context initializes.
Citing this, after 2.0.0 you don't have to use SparkContext, but SparkSession with the conf method, as below:
spark.conf.set("spark.executor.memory", "2g")
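For a fuller picture, here is a minimal sketch of supplying the same property through the SparkSession builder before the session is created (the app name is just a placeholder):

from pyspark.sql import SparkSession

# Build (or reuse) a session with executor memory set up front;
# getOrCreate() returns an existing session if one is already running,
# in which case config values set here may not take effect.
spark = (
    SparkSession.builder
    .appName("App Name")
    .config("spark.executor.memory", "2g")
    .getOrCreate()
)

As the first answer notes, though, spark.executor.memory itself generally has to be set before the executors are launched, so configure it at session-creation time rather than afterwards.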
You could set spark.executor.memory when you start your pyspark shell:
pyspark --num-executors 5 --driver-memory 2g --executor-memory 2g
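If you prefer, pyspark also accepts generic --conf key=value pairs (they are forwarded to spark-submit), so an equivalent invocation might look like:

pyspark --num-executors 5 --conf spark.driver.memory=2g --conf spark.executor.memory=2g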
I'm not sure why you chose the answer above when it requires restarting your shell and opening it with a different command! Though that works and is useful, there is an in-line solution, which is what was actually being requested. This is essentially what @zero323 referenced in the comments above, but the link leads to a post describing the implementation in Scala. Below is a working implementation specifically for PySpark.
Note: The SparkContext you want to modify the settings for must not have been started yet; otherwise you will need to close it, modify the settings, and re-open it.
from pyspark import SparkContext
SparkContext.setSystemProperty('spark.executor.memory', '2g')
sc = SparkContext("local", "App Name")
source: https://spark.apache.org/docs/0.8.1/python-programming-guide.html
P.S. If you need to close the SparkContext, just use:
SparkContext.stop(sc)
and to double check the current settings that have been set you can use:
sc._conf.getAll()
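Putting the pieces of this answer together, a rough sketch of the stop / reconfigure / restart flow might look like this (the app name is only a placeholder):

from pyspark import SparkContext

# Stop the currently running context, if any, so the new setting
# can take effect when the context is recreated.
sc = SparkContext.getOrCreate()
SparkContext.stop(sc)

# Set the desired executor memory, then start a fresh context.
SparkContext.setSystemProperty('spark.executor.memory', '2g')
sc = SparkContext("local", "App Name")

# Double-check the settings that are now in effect.
print(sc._conf.getAll())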