Why there are many spark-warehouse folders got created?

What is exactly spark-warehouse and why are these created many times?

Unless configured otherwise, Spark will create an internal Derby database named metastore_db with a derby.log. Looks like you've not changed that.

This is the default behavior, as point out in the Documentation

When not configured by the hive-site.xml, the context automatically creates metastore_db in the current directory and creates a directory configured by spark.sql.warehouse.dir, which defaults to the directory spark-warehouse in the current directory that the Spark application is started

Sometimes my spark shell and beeline shows different databases and tables and sometimes it show same

You're starting those commands in those different folders, so what you see is only confined to the current working directory.

I used beeline and created tables... How the hive came on my machine?

It didn't. You're probably connecting to the either the Spark Thrift Server, which is fully compatible with HiveServer2 protocol, the Derby database, as mentioned, or, you actually do have a HiveServer2 instance sitting at 10.171.0.117

Anyways, the JDBC connection is not required here. You can use SparkSession.sql function directly.

Why there are many spark-warehouse folders got created?

Tags:

Hadoop

Jdbc

Hive

Apache Spark

Related

Recent Posts