When submitting a job with PySpark, how can I access static files uploaded with the --files argument?
Files distributed using SparkContext.addFile (and --files) can be accessed via SparkFiles. It provides two methods:
getRootDirectory() - returns the root directory for distributed files
get(filename) - returns the absolute path to the file
I am not sure whether there are any Dataproc-specific limitations, but something like this should work just fine:
import logging
from pyspark import SparkFiles

# Resolve the local path of the distributed file, then read it as usual.
with open(SparkFiles.get('test.yml')) as test_file:
    logging.info(test_file.read())
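If you want to try the lookup outside a running Spark job, one option is to wrap it in a small helper. This is only a sketch: the function name read_distributed_file and its resolve parameter are hypothetical and not part of PySpark. The stand-in resolver below assumes that SparkFiles.get(filename) amounts to joining the file name onto the directory returned by getRootDirectory().

```python
import os
import tempfile


def read_distributed_file(filename, resolve=None):
    """Return the text of a file shipped with --files or SparkContext.addFile.

    `resolve` maps a bare file name to an absolute path. It defaults to
    SparkFiles.get; the parameter is a hypothetical hook so the helper can
    be exercised without a running Spark job.
    """
    if resolve is None:
        # Imported lazily so this module loads even where pyspark is absent.
        from pyspark import SparkFiles
        resolve = SparkFiles.get
    with open(resolve(filename)) as handle:
        return handle.read()


# Local check with a stand-in resolver that joins the file name onto a
# directory (here a temporary one) instead of calling SparkFiles.get.
root = tempfile.mkdtemp()
with open(os.path.join(root, "test.yml"), "w") as f:
    f.write("key: value\n")

content = read_distributed_file(
    "test.yml", resolve=lambda name: os.path.join(root, name)
)
print(content)
```

Inside an actual job you would call read_distributed_file('test.yml') with no resolve argument, so the path comes from SparkFiles.get.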