How does spark-submit in cluster deploy mode manage the application JARs?
The official documentation is correct (as we would expect).
TL;DR:
There is no need to copy application files or dependencies across the cluster to submit a Spark job with spark-submit.
spark-submit takes care of delivering the application jar to the executors. Moreover, the jar files specified with the --jars option are also served to all executors by the file server on the driver, so you don't need to copy any dependencies to the executors either. Spark takes care of that for you.
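As a rough illustration of the point above, the Python sketch below only assembles a spark-submit command line for YARN cluster mode. It does not run Spark. --master, --deploy-mode, --class and --jars are standard spark-submit options, while the main class and dependency paths are hypothetical placeholders. Note that every path is local to the machine you submit from; nothing is pre-copied to the worker nodes.

```python
import shlex
from typing import List, Sequence


def build_spark_submit(app_jar: str,
                       main_class: str,
                       dependency_jars: Sequence[str] = (),
                       app_args: Sequence[str] = ()) -> List[str]:
    """Build (but do not run) a spark-submit command for YARN cluster mode.

    All paths are local to the submitting machine: spark-submit itself
    uploads app_jar and every --jars entry, so nothing has to be copied
    to the worker nodes beforehand.
    """
    cmd = [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--class", main_class,
    ]
    if dependency_jars:
        # --jars expects a single comma-separated list of paths
        cmd += ["--jars", ",".join(dependency_jars)]
    # The application jar must follow all spark-submit options;
    # anything after it is passed to the application's main method.
    cmd.append(app_jar)
    cmd.extend(app_args)
    return cmd


if __name__ == "__main__":
    cmd = build_spark_submit(
        "/Automation/mapRedQA-1.0.0.jar",
        "com.example.Main",  # hypothetical main class
        dependency_jars=["/opt/libs/dep-a.jar", "/opt/libs/dep-b.jar"],  # hypothetical paths
    )
    print(shlex.join(cmd))
    # To submit for real: subprocess.run(cmd, check=True)
```

The position of the application jar matters: spark-submit treats everything after it as arguments for the application rather than as its own options.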
Further details are available in the Advanced Dependency Management section of the Spark documentation.
Because you are running your job in cluster deploy mode, the dependency JARs specified through --jars are uploaded from their local paths to the application's staging directory on HDFS, and YARN makes them available to the containers from there.
The console output below shows the application JAR (mapRedQA-1.0.0.jar), along with the required configuration (__spark_conf__5743283277173703345.zip), being uploaded to the staging directory on HDFS, where it is accessible to all executor nodes. That's why you don't need to put the application JAR on the worker nodes: Spark takes care of it.
17/08/10 11:42:55 INFO yarn.Client: Preparing resources for our AM container
17/08/10 11:42:57 INFO yarn.YarnSparkHadoopUtil: getting token for namenode: hdfs://master.localdomain:8020/user/user1/.sparkStaging/application_1502271179925_0001
17/08/10 11:43:19 INFO hdfs.DFSClient: Created token for user1: HDFS_DELEGATION_TOKEN [email protected], renewer=yarn, realUser=, issueDate=1502379778376, maxDate=1502984578376, sequenceNumber=6144, masterKeyId=243 on 2.10.1.70:8020
17/08/10 11:43:25 INFO yarn.Client: Uploading resource file:/Automation/mapRedQA-1.0.0.jar -> hdfs://master.localdomain:8020/user/user1/.sparkStaging/application_1502271179925_0001/mapRedQA-1.0.0.jar
17/08/10 11:43:51 INFO yarn.Client: Uploading resource file:/tmp/spark-f4e913eb-17d5-4d5b-bf99-c8212715ceaa/__spark_conf__5743283277173703345.zip -> hdfs://master.localdomain:8020/user/user1/.sparkStaging/application_1502271179925_0001/__spark_conf__5743283277173703345.zip
17/08/10 11:43:52 INFO spark.SecurityManager: Changing view acls to: user1
17/08/10 11:43:52 INFO spark.SecurityManager: Changing modify acls to: user1
17/08/10 11:43:52 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(user1); users with modify permissions: Set(user1)
17/08/10 11:43:53 INFO yarn.Client: Submitting application 1 to ResourceManager
17/08/10 11:43:58 INFO impl.YarnClientImpl: Application submission is not finished, submitted application application_1502271179925_0001 is still in NEW
t: Application report for application_1502271179925_0001 (state: ACCEPTED)
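To check which files spark-submit actually shipped, you can scan its console output for the "Uploading resource" lines shown above. The Python sketch below is a minimal example that assumes the yarn.Client log wording from this particular output (other Spark versions may phrase these lines differently); the helper names uploaded_resources and staging_dirs are hypothetical, and SAMPLE_LOG is copied from the log above.

```python
import posixpath
import re
from typing import List, Set, Tuple

# Assumes the yarn.Client wording seen in the log above:
#   <timestamp> INFO yarn.Client: Uploading resource <source> -> <destination>
UPLOAD_RE = re.compile(r"INFO yarn\.Client: Uploading resource (\S+) -> (\S+)")

SAMPLE_LOG = """\
17/08/10 11:43:25 INFO yarn.Client: Uploading resource file:/Automation/mapRedQA-1.0.0.jar -> hdfs://master.localdomain:8020/user/user1/.sparkStaging/application_1502271179925_0001/mapRedQA-1.0.0.jar
17/08/10 11:43:51 INFO yarn.Client: Uploading resource file:/tmp/spark-f4e913eb-17d5-4d5b-bf99-c8212715ceaa/__spark_conf__5743283277173703345.zip -> hdfs://master.localdomain:8020/user/user1/.sparkStaging/application_1502271179925_0001/__spark_conf__5743283277173703345.zip
17/08/10 11:43:53 INFO yarn.Client: Submitting application 1 to ResourceManager
"""


def uploaded_resources(log_text: str) -> List[Tuple[str, str]]:
    """Return (local source, HDFS destination) pairs for every uploaded resource."""
    return [(m.group(1), m.group(2)) for m in UPLOAD_RE.finditer(log_text)]


def staging_dirs(pairs: List[Tuple[str, str]]) -> Set[str]:
    """Return the distinct HDFS directories the resources were uploaded to."""
    # posixpath.dirname drops the final path segment (the file name)
    return {posixpath.dirname(dst) for _, dst in pairs}


if __name__ == "__main__":
    pairs = uploaded_resources(SAMPLE_LOG)
    for src, dst in pairs:
        print(f"{src}\n  -> {dst}")
    print("staging directory:", staging_dirs(pairs))
```

For this log, both files land in the same per-application directory under .sparkStaging, which is the shared location the containers read them from.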