Apache Spark: "failed to launch org.apache.spark.deploy.worker.Worker" or Master
I have the same problem when running spark/sbin/start-slave.sh on the master node.
hadoop@master:/opt/spark$ sudo ./sbin/start-slave.sh --master spark://master:7077
starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-master.out
failed to launch: nice -n 0 /opt/spark/bin/spark-class org.apache.spark.deploy.worker.Worker --webui-port 8081 --master spark://master:7077
Options:
-c CORES, --cores CORES Number of cores to use
-m MEM, --memory MEM Amount of memory to use (e.g. 1000M, 2G)
-d DIR, --work-dir DIR Directory to run apps in (default: SPARK_HOME/work)
-i HOST, --ip IP Hostname to listen on (deprecated, please use --host or -h)
-h HOST, --host HOST Hostname to listen on
-p PORT, --port PORT Port to listen on (default: random)
--webui-port PORT Port for web UI (default: 8081)
--properties-file FILE Path to a custom Spark properties file.
Default is conf/spark-defaults.conf.
full log in /opt/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-master.out
I found my mistake: I should not pass the --master flag, and should just run the command
hadoop@master:/opt/spark$ sudo ./sbin/start-slave.sh spark://master:7077
following the steps of this tutorial: https://phoenixnap.com/kb/install-spark-on-ubuntu
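To double-check that the worker actually registered (a sanity check I'd suggest; the exact log wording may differ between Spark versions), look at the worker log or at the master web UI on port 8080:
grep "Successfully registered with master" /opt/spark/logs/spark-*-org.apache.spark.deploy.worker.Worker-*.out
# or open http://master:8080 and confirm the worker appears under "Workers"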
Hint: make sure to install all the dependencies first:
sudo apt install scala git -y
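A JDK is needed on every node as well; if java isn't installed yet, something along these lines should cover it (default-jdk is my assumption for the package name on Ubuntu, not part of the hint above):
sudo apt install default-jdk -y
java -version   # verify the JVM is on the PATH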
The Spark configuration system is a mess of environment variables, argument flags, and Java properties files. I just spent a couple of hours tracking down the same warning and unraveling the Spark initialization procedure; here's what I found:
1. sbin/start-all.sh calls sbin/start-master.sh (and then sbin/start-slaves.sh)
2. sbin/start-master.sh calls sbin/spark-daemon.sh start org.apache.spark.deploy.master.Master ...
3. sbin/spark-daemon.sh start ... forks off a call to bin/spark-class org.apache.spark.deploy.master.Master ..., captures the resulting process id (pid), sleeps for 2 seconds, and then checks whether that pid's command's name is "java"
4. bin/spark-class is a bash script, so it starts out with the command name "bash", and proceeds to:
   4.1. (re-)load the Spark environment by sourcing bin/load-spark-env.sh
   4.2. find the java executable
   4.3. find the right Spark jar
   4.4. call java ... org.apache.spark.launcher.Main ... to get the full classpath needed for a Spark deployment
   4.5. finally hand over control, via exec, to java ... org.apache.spark.deploy.master.Master, at which point the command name becomes "java"
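To make that check concrete, here's a minimal stand-alone sketch of the fork / sleep 2 / compare-the-command-name pattern from step 3; it's my own illustration, not the actual spark-daemon.sh code:
# Fork something that, like bin/spark-class, spends a while in bash before exec-ing java
nohup bash -c 'sleep 3; exec java -version' > /tmp/launch-demo.log 2>&1 < /dev/null &
newpid=$!
sleep 2   # spark-daemon.sh only waits this long before checking
# The forked process is still named "bash" at this point, so the check reports a
# failure even though nothing has actually gone wrong yet.
if [[ ! "$(ps -p "$newpid" -o comm=)" =~ java ]]; then
  echo "failed to launch: pid $newpid is '$(ps -p "$newpid" -o comm=)', not java (yet)"
fi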
If steps 4.1 through 4.5 take longer than 2 seconds, which in my (and your) experience seems pretty much inevitable on a fresh OS where java has never been previously run, you'll get the "failed to launch" message, despite nothing actually having failed.
The slaves will complain for the same reason, and will thrash around until the master is actually available, but they should keep retrying until they successfully connect.
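In practice you can confirm the master really did come up despite the warning; this is just a suggestion of mine, assuming the default log location the message prints:
jps | grep Master   # the Master JVM should be listed
tail -n 20 /opt/spark/logs/spark-*-org.apache.spark.deploy.master.Master-*.out
# look for a line like "Starting Spark master at spark://<host>:7077"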
I've got a pretty standard Spark deployment running on EC2; I use:
- conf/spark-defaults.conf to set spark.executor.memory and add some custom jars via spark.{driver,executor}.extraClassPath
- conf/spark-env.sh to set SPARK_WORKER_CORES=$(($(nproc) * 2))
- conf/slaves to list my slaves
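For reference, a sketch of what those three files might contain; the values below are made-up examples, not my actual settings:
# conf/spark-defaults.conf
spark.executor.memory           4g
spark.driver.extraClassPath     /opt/extra-jars/*
spark.executor.extraClassPath   /opt/extra-jars/*

# conf/spark-env.sh
SPARK_WORKER_CORES=$(($(nproc) * 2))

# conf/slaves -- one worker hostname per line
worker1.example.internal
worker2.example.internal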
Here's how I start a Spark deployment, bypassing some of the {bin,sbin}/*.sh minefield/maze:
# on master, with SPARK_HOME and conf/slaves set appropriately
mapfile -t ARGS < <(java -cp $SPARK_HOME/lib/spark-assembly-1.6.1-hadoop2.6.0.jar org.apache.spark.launcher.Main org.apache.spark.deploy.master.Master | tr '\0' '\n')
# $ARGS now contains the full call to start the master, which I daemonize with nohup
SPARK_PUBLIC_DNS=0.0.0.0 nohup "${ARGS[@]}" >> $SPARK_HOME/master.log 2>&1 < /dev/null &
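One consequence of bypassing the sbin scripts (my own note): as far as I can tell, stop-master.sh won't know about this process, since no pid file was written for it, so shutting it down is manual too, e.g.:
# stop the nohup'd master started above
pkill -f org.apache.spark.deploy.master.Master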
I'm still using sbin/spark-daemon.sh to start the slaves, since that's easier than calling nohup within the ssh command:
MASTER=spark://$(hostname -i):7077
while read -r; do
ssh -o StrictHostKeyChecking=no $REPLY "$SPARK_HOME/sbin/spark-daemon.sh start org.apache.spark.deploy.worker.Worker 1 $MASTER" &
done <$SPARK_HOME/conf/slaves
# this forks the ssh calls, so wait for them to exit before you logout
wait
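For a symmetric shutdown (my own addition, using the same spark-daemon.sh interface), the stop subcommand works the same way:
while read -r; do
  ssh -o StrictHostKeyChecking=no $REPLY "$SPARK_HOME/sbin/spark-daemon.sh stop org.apache.spark.deploy.worker.Worker 1" &
done <$SPARK_HOME/conf/slaves
wait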
That's it! It assumes I'm using all the default ports, and that I'm not doing anything silly like putting whitespace in file names, but I think it's cleaner this way.