How to add third-party Java JAR files for use in PySpark

You can pass --jars xxx.jar when using spark-submit:

./bin/spark-submit --jars xxx.jar your_spark_script.py
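
Since --jars accepts a comma-separated list, several JARs can be supplied in one submission; the paths below are placeholders:

./bin/spark-submit --jars /path/to/file1.jar,/path/to/file2.jar your_spark_script.py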

or set the environment variable SPARK_CLASSPATH (note that SPARK_CLASSPATH is deprecated in newer Spark releases in favor of --driver-class-path and the spark.driver.extraClassPath / spark.executor.extraClassPath settings):

SPARK_CLASSPATH='/path/xxx.jar:/path/xx2.jar' your_spark_script.py

where your_spark_script.py is written using the PySpark API.


You can also add external JARs as arguments when launching the pyspark shell:

pyspark --jars file1.jar,file2.jar
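
If the JAR is published to a Maven repository, a related option is --packages, which resolves the artifact and its dependencies by Maven coordinate; the coordinate below is only an illustration:

pyspark --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.1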

You can also add the path to a JAR file through the Spark configuration at runtime.

Here is an example:

from pyspark import SparkConf, SparkContext

# Set spark.jars before the SparkContext is created;
# multiple paths can be given as a comma-separated list.
conf = SparkConf().set("spark.jars", "/path-to-jar/spark-streaming-kafka-0-8-assembly_2.11-2.2.1.jar")
sc = SparkContext(conf=conf)
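
Since the JAR in that example is the Kafka 0-8 streaming assembly, here is a minimal usage sketch; it assumes Spark 2.x with the spark-streaming-kafka-0-8 connector on the classpath, and the topic name and broker address are hypothetical:

from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

conf = SparkConf().set("spark.jars", "/path-to-jar/spark-streaming-kafka-0-8-assembly_2.11-2.2.1.jar")
sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, 10)  # 10-second micro-batches

# "my-topic" and the broker address are placeholders for this sketch
stream = KafkaUtils.createDirectStream(
    ssc, ["my-topic"], {"metadata.broker.list": "localhost:9092"})
stream.pprint()

ssc.start()
ssc.awaitTermination()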

Refer to the Spark configuration documentation for more information.