How to reduce the verbosity of Spark's runtime output?
Spark 1.4.1
sc.setLogLevel("WARN")
From the comments in the source code:
Valid log levels include: ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN
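Spark 2.x - 2.3.1
sparkSession.sparkContext().setLogLevel("WARN")
Spark 2.3.2
sparkSession.sparkContext.setLogLevel("WARN")
(The two forms differ only in call syntax: the form with parentheses is how the accessor is called from Java, the form without is the usual Scala style.)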
Quoting from the book 'Learning Spark':
You may find the logging statements that get printed in the shell distracting. You can control the verbosity of the logging. To do this, you can create a file in the conf directory called log4j.properties. The Spark developers already include a template for this file called log4j.properties.template. To make the logging less verbose, make a copy of conf/log4j.properties.template called conf/log4j.properties and find the following line:
log4j.rootCategory=INFO, console
Then lower the log level so that we only show WARN messages and above by changing it to the following:
log4j.rootCategory=WARN, console
When you re-open the shell, you should see less output.
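The copy-and-edit step quoted above can be scripted. The following is a minimal sketch, assuming a Spark version that still reads log4j 1.x properties files and a template that contains a log4j.rootCategory line; the function name lower_spark_log_level is hypothetical and not part of Spark.

```shell
#!/bin/sh
# Sketch: create log4j.properties from the template and lower the root log level.
# lower_spark_log_level is a hypothetical helper name, not something Spark ships.
lower_spark_log_level() {
    conf_dir="$1"          # directory holding log4j.properties.template
    level="${2:-WARN}"     # target level, WARN when omitted

    # Copy the template; the template itself stays untouched.
    cp "$conf_dir/log4j.properties.template" "$conf_dir/log4j.properties" || return 1

    # Rewrite only the root category line and keep every other setting.
    # A temporary file is used because in-place sed flags differ between
    # GNU and BSD sed.
    sed "s/^log4j\.rootCategory=.*/log4j.rootCategory=$level, console/" \
        "$conf_dir/log4j.properties" > "$conf_dir/log4j.properties.tmp" &&
        mv "$conf_dir/log4j.properties.tmp" "$conf_dir/log4j.properties"
}
```

It could be called as lower_spark_log_level "$SPARK_HOME/conf" WARN, where SPARK_HOME is assumed to point at your Spark installation.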
Logging configuration at the Spark app level
This approach needs no code change to the Spark application running on the cluster.
- Create a new file log4j.properties from log4j.properties.template.
- Change the verbosity with the log4j.rootCategory property.
- For example, to see only the ERROR messages of a given jar, set:
log4j.rootCategory=ERROR, console
The spark-submit command would then be (prop/file/location and jar/location are placeholders for your own paths):
spark-submit \
... # other Spark options go here
--files prop/file/location \
--conf 'spark.executor.extraJavaOptions=-Dlog4j.configuration=prop/file/location' \
--conf 'spark.driver.extraJavaOptions=-Dlog4j.configuration=prop/file/location' \
jar/location \
[application arguments]
Now you will see only log messages at the ERROR level and above.
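If no template is at hand, a per-application properties file can also be written from scratch. This is a minimal sketch, assuming log4j 1.x and a console appender like the one in Spark's template; the function name write_error_log4j_properties and the output path are hypothetical.

```shell
#!/bin/sh
# Sketch: write a minimal log4j 1.x properties file that logs ERROR and above
# to the console. write_error_log4j_properties is a hypothetical helper name.
write_error_log4j_properties() {
    out_file="$1"   # where to write the file, e.g. ./log4j.properties

    # The quoted heredoc delimiter keeps the shell from expanding anything
    # inside the pattern string.
    cat > "$out_file" <<'EOF'
log4j.rootCategory=ERROR, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
EOF
}
```

The resulting file is what would be passed to --files and referenced by -Dlog4j.configuration in the command above.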
Plain Log4j way, without Spark (but needs a code change)
Restrict logging for the packages org and akka to ERROR (use Level.OFF instead to silence them entirely):
import org.apache.log4j.{Level, Logger}
Logger.getLogger("org").setLevel(Level.ERROR)
Logger.getLogger("akka").setLevel(Level.ERROR)