How to run a script in PySpark
PySpark 2.0 and later executes the script file named in the PYTHONSTARTUP environment variable, so you can run:
PYTHONSTARTUP=code.py pyspark
Compared to the spark-submit answer, this is useful for running initialization code before using the interactive pyspark shell.
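As a rough sketch of what such a startup file might contain (the helper name describe_session is made up for illustration, and the code assumes the shell has already defined the global `spark` by the time the file runs, falling back gracefully if it has not):

```python
# code.py: hypothetical startup file, loaded via PYTHONSTARTUP=code.py pyspark

def describe_session(session=None):
    """Return a one-line description of the given Spark session.

    If no session is passed, look for a global named `spark`, which the
    interactive shell normally defines (an assumption about load order).
    """
    if session is None:
        session = globals().get("spark")
    if session is None:
        return "no Spark session found"
    return "Spark version " + str(session.version)

# Runs once at shell startup; the helper stays available at the prompt.
print(describe_session())
```

Anything defined in the file, such as describe_session here, should then be callable from the interactive prompt.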
You can do: ./bin/spark-submit mypythonfile.py
Running Python applications through pyspark is not supported as of Spark 2.0.