Cannot load main class from JAR file in Spark Submit
The --py-files flag is for additional Python file dependencies used by your program. You can see in SparkSubmit.scala that it uses the so-called "primary argument", meaning the first non-flag argument, to decide between a "submit jarfile" mode and a "submit python main" mode.
That's why you see it trying to load your "$entry_function" as a jarfile that doesn't exist, since it only assumes you're running Python if that primary argument ends with ".py", and otherwise defaults to assuming you have a .jar file.
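To make that rule concrete, here is a rough sketch of the decision in Python. It is a simplification for illustration, not the actual code from SparkSubmit.scala (which handles more cases); the function name submit_mode is made up.

```python
def submit_mode(primary_argument: str) -> str:
    """Simplified model of the rule described above (hypothetical helper).

    Only the ".py" suffix check is modelled; anything else is treated
    as a jar, which mirrors the fallback behaviour described in the answer.
    """
    if primary_argument.endswith(".py"):
        return "python"
    return "jar"


# The script path is recognised as Python...
print(submit_mode("/home/full/path/to/file/python/my_python_file.py"))
# ...but a bare function name falls through to the jar branch.
print(submit_mode("my_function_in_python"))
```

With the function name in the primary-argument position, the jar branch is taken, which matches the "cannot load main class from JAR" symptom.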
Instead of using --py-files, just make /home/full/path/to/file/python/my_python_file.py the primary argument. Then you can either do some fancy Python to take the "entry function" as a program argument, or simply call your entry function from the main function inside the Python file itself.
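As a sketch of the "entry function as a program argument" option, my_python_file.py could look something like the following. The names ENTRY_POINTS and run_entry, the placeholder function body, and the argument order (entry function first, then the config location) are all assumptions made for illustration.

```python
import sys


def my_function_in_python(conf_location):
    # Placeholder for the real job logic (assumed for this sketch).
    return f"ran my_function_in_python with {conf_location}"


# Explicit table of callable entry points (hypothetical name), so an
# arbitrary string from the command line cannot call arbitrary code.
ENTRY_POINTS = {"my_function_in_python": my_function_in_python}


def run_entry(entry_name, conf_location):
    """Look up the named entry function and call it."""
    if entry_name not in ENTRY_POINTS:
        raise ValueError(f"unknown entry function: {entry_name}")
    return ENTRY_POINTS[entry_name](conf_location)


def main(argv):
    # argv[0] is the script path; everything after the primary argument
    # on the spark-submit command line arrives here as argv[1:].
    if len(argv) != 3:
        print("usage: my_python_file.py <entry_function> <conf_location>",
              file=sys.stderr)
        return
    print(run_entry(argv[1], argv[2]))


if __name__ == "__main__":
    main(sys.argv)
```

The file would then be submitted as the primary argument, followed by the entry function name and the config path as ordinary program arguments.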
Alternatively, you can still use --py-files: create a new main .py file that calls your entry function, and pass that main .py file as the primary argument instead.
When adding elements to --py-files, separate them with commas and do not leave any spaces between them. Try this:
confLocation=../conf/my_config_file.conf &&
executors=8 &&
memory=2G &&
entry_function=my_function_in_python &&
dos2unix $confLocation &&
spark-submit \
--master yarn-client \
--num-executors $executors \
--executor-memory $memory \
--py-files /home/full/path/to/file/python/my_python_file.py,$entry_function,$confLocation
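If you want to combine the comma-separated --py-files list with the primary-argument rule, a sketch along these lines builds the argument list programmatically. The helper name build_spark_submit_args and the extra dependency helpers.py are hypothetical; the flags and default values are the ones from the command above.

```python
def build_spark_submit_args(main_py, app_args, py_files=(),
                            master="yarn-client", num_executors=8,
                            executor_memory="2G"):
    """Assemble a spark-submit argument list (hypothetical helper).

    Extra dependencies are joined with commas and no spaces, and the
    main .py file comes first among the non-flag arguments, so it is
    the primary argument.
    """
    args = [
        "spark-submit",
        "--master", master,
        "--num-executors", str(num_executors),
        "--executor-memory", executor_memory,
    ]
    if py_files:
        args += ["--py-files", ",".join(py_files)]
    args.append(main_py)    # primary argument
    args.extend(app_args)   # passed through to the Python program
    return args


example = build_spark_submit_args(
    "/home/full/path/to/file/python/my_python_file.py",
    ["my_function_in_python", "../conf/my_config_file.conf"],
    py_files=["helpers.py"],  # hypothetical dependency
)
print(" ".join(example))
```

The resulting list can be passed to subprocess.run as-is, which avoids shell quoting problems.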