When submitting a job with PySpark, how do I access static files uploaded with the --files argument?

Files distributed using SparkContext.addFile (and --files) can be accessed via SparkFiles. It provides two methods:

  • getRootDirectory() - returns root directory for distributed files
  • get(filename) - returns absolute path to the file

I am not sure whether there are any Dataproc-specific limitations, but something like this should work:

import logging

from pyspark import SparkFiles

# SparkFiles.get returns the absolute path of the distributed file
with open(SparkFiles.get('test.yml')) as test_file:
    logging.info(test_file.read())
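If the same code also has to run outside Spark (for example in a local test), one option is to wrap the lookup in a small helper. This is only a minimal sketch: `read_distributed_file`, `local_resolver` and the `resolver` parameter are hypothetical names introduced here for illustration, and the only PySpark call used is `SparkFiles.get`.

```python
import os


def read_distributed_file(filename, resolver=None):
    """Return the text of a file shipped with --files or SparkContext.addFile.

    `resolver` is a hypothetical hook (not part of PySpark) that maps a file
    name to a path; when omitted, SparkFiles.get is used.
    """
    if resolver is None:
        # Imported lazily so the helper can be exercised without PySpark.
        from pyspark import SparkFiles
        resolver = SparkFiles.get
    with open(resolver(filename)) as handle:
        return handle.read()


def local_resolver(root):
    """Build a resolver that looks files up under `root` (for local runs)."""
    return lambda name: os.path.join(root, name)
```

Inside a submitted job, `read_distributed_file('test.yml')` should resolve the path through `SparkFiles.get`; locally, passing `resolver=local_resolver('/some/dir')` reads the file from that directory instead.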