How to create SparkSession with Hive support (fails with "Hive classes are not found")?
Add the following dependency to your Maven project.
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.0.0</version>
</dependency>
tl;dr You have to make sure that Spark SQL's spark-hive
dependency and all of its transitive dependencies are available at runtime on the CLASSPATH of your Spark SQL application (not just at build time, where they are only needed for compilation).
In other words, you have to have the org.apache.spark.sql.hive.HiveSessionStateBuilder
and org.apache.hadoop.hive.conf.HiveConf
classes on the CLASSPATH of the Spark application (which has little to do with sbt or Maven).
The former, HiveSessionStateBuilder,
is part of the spark-hive
dependency (including all of its transitive dependencies).
The latter, HiveConf,
is part of the hive-exec
dependency (which is itself a transitive dependency of the above spark-hive
dependency).
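As a quick sanity check, here is a small sketch (the HiveClasspathCheck object name is purely illustrative) that tries to load both classes with Class.forName when run on the same classpath as your application; this is roughly the check SparkSession performs before enabling Hive support.

object HiveClasspathCheck {
  def main(args: Array[String]): Unit = {
    Seq(
      "org.apache.spark.sql.hive.HiveSessionStateBuilder", // from spark-hive
      "org.apache.hadoop.hive.conf.HiveConf"               // from hive-exec (transitive)
    ).foreach { className =>
      try {
        Class.forName(className)
        println(s"OK: $className is on the classpath")
      } catch {
        case _: ClassNotFoundException =>
          println(s"MISSING: $className is not on the classpath")
      }
    }
  }
}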
I've looked into the source code and found that, besides HiveSessionState (in spark-hive), another class, HiveConf, is also needed to initiate the SparkSession. HiveConf is not contained in the spark-hive*.jar; you may find it in the Hive-related jars and put it on your classpath.