Why does Spark report "java.net.URISyntaxException: Relative path in absolute URI" when working with DataFrames?
It's the SPARK-15565 issue in Spark 2.0 on Windows, and it has a simple workaround. A proper fix already appears to be in Spark's codebase and may ship as part of 2.0.2 or 2.1.0.
The solution in Spark 2.0.0 is to set spark.sql.warehouse.dir to some properly-referenced directory, say file:///c:/Spark/spark-2.0.0-bin-hadoop2.7/spark-warehouse, i.e. one that uses /// (triple slashes).
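If you'd rather derive that value than hard-code it, java.nio can build the URI from a Windows path; a small sketch (the path is just an example, not anything Spark requires):

import java.nio.file.Paths

// On Windows an absolute path converts to a file:///c:/... URI,
// i.e. with the triple slashes described above
val warehouseUri = Paths.get("c:/Spark/spark-2.0.0-bin-hadoop2.7/spark-warehouse").toUri.toString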
Start spark-shell with the --conf argument as follows:
spark-shell --conf spark.sql.warehouse.dir=file:///c:/tmp/spark-warehouse
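Once the shell is up, you can confirm the setting took effect using the runtime config API (spark.conf) that the Spark 2.0 shell exposes:

spark.conf.get("spark.sql.warehouse.dir")
// should return file:///c:/tmp/spark-warehouse, the value passed above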
Or create a SparkSession in your Spark application using the new fluent builder pattern as follows:
import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .config("spark.sql.warehouse.dir", "file:///c:/tmp/spark-warehouse")
  .getOrCreate()
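With the warehouse directory set this way, Dataset/DataFrame operations that used to trigger the URISyntaxException should work; a quick smoke test (the sample data is arbitrary):

import spark.implicits._

Seq(1, 2, 3).toDS().show()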
Or create conf/spark-defaults.conf with the following content:
spark.sql.warehouse.dir file:///c:/tmp/spark-warehouse
If you want to fix it in code without touching existing code, you can also pass it as a system property, so that the Spark initializations that come afterwards don't have to change.
// Set this before the first SparkSession is created; the builder picks up
// spark.* system properties, so no other code has to change.
System.setProperty(
  "spark.sql.warehouse.dir",
  // user.dir is the current working directory; Windows backslashes are
  // normalized to forward slashes to form a valid file: URI
  s"file:///${System.getProperty("user.dir")}/spark-warehouse"
    .replaceAll("\\\\", "/")
)
Note also that this uses the current working directory; it can be replaced with "c:/tmp/" or wherever else you'd like the spark-warehouse directory to be.
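What matters is the ordering: the property has to be set before the first SparkSession (or SparkContext) is created. A minimal end-to-end sketch, assuming a local run (the object and app names are just placeholders):

import org.apache.spark.sql.SparkSession

object WarehouseDirDemo {
  def main(args: Array[String]): Unit = {
    // 1. Set the warehouse dir first, before any Spark machinery starts
    System.setProperty(
      "spark.sql.warehouse.dir",
      s"file:///${System.getProperty("user.dir")}/spark-warehouse".replaceAll("\\\\", "/")
    )
    // 2. Only then create the session; it picks the value up from system properties
    val spark = SparkSession.builder()
      .appName("warehouse-dir-demo")
      .master("local[*]")
      .getOrCreate()
    spark.range(5).show()   // should no longer throw URISyntaxException
    spark.stop()
  }
}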