How to add any new library like spark-csv in Apache Spark prebuilt version

When I used spark-csv, I also had to download the commons-csv jar (I'm not sure this is still necessary). Both jars were in the Spark distribution folder.

  1. I downloaded the jars as follows:

    wget -O commons-csv-1.1.jar
    wget -O spark-csv_2.10-1.0.0.jar
  2. then started the PySpark shell with these arguments:

    ./bin/pyspark --jars "spark-csv_2.10-1.0.0.jar,commons-csv-1.1.jar"
  3. and read a Spark DataFrame from a CSV file:

    from pyspark.sql import SQLContext
    sqlContext = SQLContext(sc)
    df = sqlContext.load(source="com.databricks.spark.csv", path="/path/to/your/file.csv")
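
On Spark 1.4 and later, the same read can probably be written with the `DataFrameReader` API (`sqlContext.read`) instead of `sqlContext.load`. The sketch below is a minimal illustration under that assumption: `read_csv` is a hypothetical helper name, not part of Spark or spark-csv, and it assumes the spark-csv jar is already on the classpath (for example via `--jars` as in step 2).

```python
def read_csv(sql_context, path, header=True):
    """Load a CSV file through the spark-csv data source.

    Hypothetical helper: assumes Spark 1.4+ (where `sql_context.read`
    exists) and that spark-csv is already on the classpath.
    """
    reader = sql_context.read.format("com.databricks.spark.csv")
    # spark-csv takes its options as strings; "header" says whether the
    # first line of the file holds the column names.
    return reader.options(header="true" if header else "false").load(path)
```

In the PySpark shell this would be called as `df = read_csv(sqlContext, "/path/to/your/file.csv")`.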

Another option is to add the following to your spark-defaults.conf:

    spark.jars.packages com.databricks:spark-csv_2.11:1.2.0
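
The value of `spark.jars.packages` is a Maven coordinate of the form `groupId:artifactId:version`, and for Scala libraries the artifact name conventionally ends with the Scala binary version (`_2.10`, `_2.11`). That suffix generally needs to match the Scala version your Spark build uses. As a rough illustration, the hypothetical helper below (not part of Spark) splits a coordinate into its parts so the suffix is easy to check:

```python
def parse_coordinate(coordinate):
    """Split a Maven coordinate 'groupId:artifactId:version' into parts.

    Hypothetical helper for illustration only.
    """
    group_id, artifact_id, version = coordinate.split(":")
    # Scala artifacts are conventionally named '<name>_<scalaVersion>'.
    name, sep, scala_version = artifact_id.rpartition("_")
    if not sep:
        # No underscore suffix: treat the whole artifact id as the name.
        name, scala_version = artifact_id, None
    return {
        "group": group_id,
        "artifact": name,
        "scala": scala_version,
        "version": version,
    }
```

For the coordinate above, `parse_coordinate("com.databricks:spark-csv_2.11:1.2.0")["scala"]` gives `"2.11"`.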