How to log using log4j to local file system inside a Spark application that runs on YARN?
It looks like you'll need to append to the JVM arguments used when launching your tasks/jobs.
Try editing conf/spark-defaults.conf
as described here
spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/apps/spark-1.2.0/conf/log4j.properties
spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/apps/spark-1.2.0/conf/log4j.properties
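To actually write to the local file system (the original question), the log4j.properties referenced above needs a file appender. A minimal sketch — the path /var/log/spark/myapp.log, the rolling policy, and the pattern are assumptions to adapt to your setup:

```properties
# Root logger: INFO level, send everything to the "file" appender
log4j.rootLogger=INFO, file

# Appender writing to the local file system on each node
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/var/log/spark/myapp.log
log4j.appender.file.MaxFileSize=10MB
log4j.appender.file.MaxBackupIndex=5
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```

Note that the directory must exist and be writable by the Spark user on every node where an executor can run.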
Alternatively try editing conf/spark-env.sh
as described here to add the same JVM argument, although the entries in conf/spark-defaults.conf should work.
If you are still not getting any joy, you can explicitly pass the location of your log4j.properties file on the command line along with your spark-submit, like this if the file is contained within your JAR file and in the root directory of your classpath:
(note the trailing colon below: the command follows)
spark-submit --class sparky.MyApp --master spark://my.host.com:7077 --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j-executor.properties" myapp.jar
If the file is not on your classpath, use the file:
prefix and the full path, like this:
spark-submit ... --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/apps/spark-1.2.0/conf/log4j-executor.properties" ...
The above options of specifying log4j.properties via spark.executor.extraJavaOptions and spark.driver.extraJavaOptions will only log locally, and the log4j.properties file must also be present locally on each node.
As specified in the https://spark.apache.org/docs/1.2.1/running-on-yarn.html documentation, you could alternatively upload log4j.properties along with your application using the --files option. This enables YARN log aggregation to HDFS, and you can access the logs using the command
yarn logs -applicationId <application id>
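When going the --files route, you can point the file appender at the YARN container log directory so the output lands in the aggregated logs. A sketch based on the ${spark.yarn.app.container.log.dir} property mentioned in the running-on-yarn documentation (logger level and pattern are assumptions):

```properties
# Write into the YARN container's log directory so it is picked up by log aggregation
log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.File=${spark.yarn.app.container.log.dir}/spark.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```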
1) To debug how Spark on YARN is interpreting your log4j settings, use the log4j.debug
flag.
2) Spark will create two kinds of YARN containers: the driver and the workers. You need to share a file from the machine where you submit the application with all containers (you can't use a file inside the JAR, since that is not the JAR that actually runs), so you must use the --files
spark-submit directive (this shares the file with all workers).
Like this:
spark-submit \
--class com.X.datahub.djobi.Djobi \
--files "./log4j.properties" \
--driver-java-options "-Dlog4j.debug=true -Dlog4j.configuration=log4j.properties" \
--conf "spark.executor.extraJavaOptions=-Dlog4j.debug=true -Dlog4j.configuration=log4j.properties" \
./target/X-1.0.jar "$@"
where log4j.properties is a project file inside the src/main/resources/config
folder.
I can see in the console:
log4j: Trying to find [config/log4j.properties] using context
classloader org.apache.spark.util.MutableURLClassLoader@5bb21b69.
log4j: Using URL [jar:file:/home/hdfs/djobi/latest/lib/djobi-1.0.jar!/config/log4j.properties] for automatic log4j configuration.
log4j: Reading configuration from URL jar:file:/home/hdfs/djobi/latest/lib/djobi-1.0.jar!/config/log4j.properties
So the file is taken into account; you can also verify this in the Spark web UI.