In Apache Spark, how do you set worker/executor environment variables?
Just stumbled upon something in the Spark documentation:
spark.executorEnv.[EnvironmentVariableName]
Add the environment variable specified by EnvironmentVariableName to the Executor process. The user can specify multiple of these to set multiple environment variables.
So in your case, I'd set the Spark configuration option spark.executorEnv.com.amazonaws.sdk.disableCertChecking to true and see if that helps.
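For instance, in PySpark you could put it on the SparkConf before the context is created (a minimal sketch; the property name is the one suggested above, and you could equally pass it via spark-submit --conf):

import pyspark

# Sketch: executor environment variables have to be on the conf
# before the SparkContext is created, or they won't take effect
conf = pyspark.SparkConf()
conf.set('spark.executorEnv.com.amazonaws.sdk.disableCertChecking', 'true')
sc = pyspark.SparkContext.getOrCreate(conf=conf)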
Adding more to the existing answer.
import pyspark

def get_spark_context(app_name):
    # configure your application-specific settings, including the
    # environment variables for the executors, *before* the
    # SparkContext is created; values set afterwards are ignored
    conf = pyspark.SparkConf()
    conf.set('spark.app.name', app_name)
    # Set environment value for the executors
    conf.set('spark.executorEnv.SOME_ENVIRONMENT_VALUE', 'I_AM_PRESENT')

    # init & return
    sc = pyspark.SparkContext.getOrCreate(conf=conf)
    return pyspark.SQLContext(sparkContext=sc)
The SOME_ENVIRONMENT_VALUE environment variable will then be available on the executors/workers.
In your Spark application, you can access it like this:
import os
some_environment_value = os.environ.get('SOME_ENVIRONMENT_VALUE')
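Note that os.environ only reflects the executor-side value in code that actually runs on an executor (for example inside a map or a UDF); on the driver it may be unset. A quick way to verify, reusing the helper above (the app name and partition count are just placeholders):

import os
import pyspark

# create the context with the helper above (hypothetical app name),
# then grab the underlying SparkContext for a quick RDD-based check
sql_context = get_spark_context('env-var-check')
sc = pyspark.SparkContext.getOrCreate()

# the lambda executes inside executor tasks, so os.environ there
# includes anything set via spark.executorEnv.*
values = sc.parallelize(range(2), 2) \
    .map(lambda _: os.environ.get('SOME_ENVIRONMENT_VALUE')) \
    .collect()
print(values)  # expected: ['I_AM_PRESENT', 'I_AM_PRESENT']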