How to find the master URL for an existing spark cluster
I found that passing --master yarn-cluster
works best: it makes sure that Spark uses all the nodes of the Hadoop cluster. (On Spark 2.0 and later, yarn-cluster is deprecated; the same thing is written as --master yarn with --deploy-mode cluster.)
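As a sketch, a YARN submission looks like this; the class name and jar path are placeholders, not anything from the question:

```shell
# Submit to YARN in cluster mode. HADOOP_CONF_DIR must point at the
# cluster's Hadoop configuration so Spark can locate the ResourceManager.
# --class and the jar are placeholders for your own application.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \
  my-app.jar
```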
You are on the spot. .setMaster("local[*]")
runs Spark in local (self-contained) mode. In this mode Spark can use only the resources of the local machine.
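For example, local mode can be requested from the command line instead of hard-coding it (the jar name is a placeholder). One caveat worth knowing: a master set via SparkConf in the application code takes precedence over the spark-submit flag, so remove setMaster() from code you intend to run on a real cluster:

```shell
# Run locally using all available cores. A master hard-coded with
# SparkConf.setMaster() in the application overrides this flag.
spark-submit --master "local[*]" my-app.jar
```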
If you've already set up a Spark cluster on top of your physical cluster, the solution is an easy one: check http://master:8080
(where master is the Spark master machine). That is the master's web UI, and it shows the Spark master URI, which by default is spark://master:7077.
Quite a bit of other information lives there too, if you have a Spark standalone cluster.
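Once you have read the master URI off the web UI, you pass it straight to spark-submit; the hostname and jar below are placeholders:

```shell
# Submit against the standalone master URI shown in the web UI.
spark-submit \
  --master spark://master:7077 \
  my-app.jar
```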
However, I see a lot of questions on SO claiming this does not work, for many different reasons. Using the spark-submit
utility is simply less error prone; see its usage.
But if you haven't got a Spark cluster yet, I suggest setting up a Spark standalone cluster first.
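As a rough sketch, a minimal standalone cluster is started with the scripts shipped in Spark's sbin directory (hostnames are placeholders; on Spark versions before 3.1 the worker script is named start-slave.sh):

```shell
# On the master machine: starts the master and its web UI (port 8080).
./sbin/start-master.sh

# On each worker machine: register with the master started above.
./sbin/start-worker.sh spark://master:7077
```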