How to reduce the verbosity of Spark's runtime output?

Spark 1.4.1

sc.setLogLevel("WARN")

From comments in source code:

Valid log levels include: ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN
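
In a compiled application (rather than the shell, where sc already exists) the same call is made on whatever SparkContext you create. A minimal sketch, assuming a local run; the app name and master are placeholders:

import org.apache.spark.{SparkConf, SparkContext}

object QuietApp {
  def main(args: Array[String]): Unit = {
    // Placeholder app name and master, for illustration only
    val conf = new SparkConf().setAppName("quiet-app").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Suppress everything below WARN from here on
    sc.setLogLevel("WARN")

    // ... your job here ...

    sc.stop()
  }
}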

Spark 2.x (Java API)

sparkSession.sparkContext().setLogLevel("WARN")

Spark 2.x (Scala API, where sparkContext is a val, not a method)

sparkSession.sparkContext.setLogLevel("WARN")
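
Put together, a minimal Spark 2.x sketch; the builder settings are placeholders:

import org.apache.spark.sql.SparkSession

val sparkSession = SparkSession.builder()
  .appName("quiet-app")   // placeholder name
  .master("local[*]")     // placeholder master
  .getOrCreate()

// Affects only this application; no cluster-wide config is touched
sparkSession.sparkContext.setLogLevel("WARN")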


Quoting from the 'Learning Spark' book:

You may find the logging statements that get printed in the shell distracting. You can control the verbosity of the logging. To do this, you can create a file in the conf directory called log4j.properties. The Spark developers already include a template for this file called log4j.properties.template. To make the logging less verbose, make a copy of conf/log4j.properties.template called conf/log4j.properties and find the following line:

log4j.rootCategory=INFO, console

Then lower the log level so that we only show WARN messages and above by changing it to the following:

log4j.rootCategory=WARN, console

When you re-open the shell, you should see less output.
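
The same file also lets you quiet individual noisy loggers while leaving the root level alone. A short sketch; the two logger names below are common offenders in Spark 2.x templates, so adjust them to whatever is noisy in your own logs:

log4j.rootCategory=WARN, console

# Turn particularly chatty components down further
log4j.logger.org.apache.spark.storage.BlockManager=ERROR
log4j.logger.org.spark_project.jetty=ERROR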


Logging configuration at the Spark app level

With this approach, no code change is needed in the Spark application deployed to the cluster.

  • Let's create a new file log4j.properties from log4j.properties.template.
  • Then change the verbosity with the log4j.rootCategory property.
  • Say we only want the ERRORs of a given jar; then set log4j.rootCategory=ERROR, console (a full sample file follows this list).
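
For example, the edited file could be as small as the following. The console appender lines mirror the shipped template, so treat them as an assumption about your template version:

log4j.rootCategory=ERROR, console

log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n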

The spark-submit command would then be:

spark-submit \
    [other spark props go here] \
    --files prop/file/location \
    --conf 'spark.executor.extraJavaOptions=-Dlog4j.configuration=prop/file/location' \
    --conf 'spark.driver.extraJavaOptions=-Dlog4j.configuration=prop/file/location' \
    jar/location \
    [application arguments]

Now you will see only the log lines that are categorised as ERROR.


Plain Log4j way without Spark (but needs a code change)

Lower logging to ERROR for the org and akka package hierarchies:

import org.apache.log4j.{Level, Logger}

// Keep only ERROR and above from the "org" (Spark, Hadoop, Jetty)
// and "akka" logger hierarchies
Logger.getLogger("org").setLevel(Level.ERROR)
Logger.getLogger("akka").setLevel(Level.ERROR)