Why are so many spark-warehouse folders being created?
What exactly is spark-warehouse, and why is it created so many times?
Unless configured otherwise, Spark will create an internal Derby database named metastore_db with a derby.log. Looks like you've not changed that.
This is the default behavior, as pointed out in the documentation:
When not configured by the hive-site.xml, the context automatically creates metastore_db in the current directory and creates a directory configured by spark.sql.warehouse.dir, which defaults to the directory spark-warehouse in the current directory that the Spark application is started
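As a minimal sketch of overriding that default, the snippet below sets spark.sql.warehouse.dir to an absolute path so the warehouse no longer depends on where the application is started. The helper names `warehouse_settings` and `build_session`, the app name, and the example path are hypothetical. This pins only the warehouse directory; where metastore_db ends up is still controlled by the metastore settings (for example in hive-site.xml).

```python
import os


def warehouse_settings(base_dir):
    """Return Spark settings that pin the warehouse to an absolute path."""
    # An absolute path does not change with the directory Spark is started from.
    warehouse = os.path.join(os.path.abspath(base_dir), "spark-warehouse")
    return {"spark.sql.warehouse.dir": warehouse}


def build_session(base_dir, app_name="pinned-warehouse"):
    """Create a SparkSession that uses the pinned warehouse directory."""
    # Imported lazily so warehouse_settings() can be used without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in warehouse_settings(base_dir).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

For example, `build_session("/data/spark")` (an assumed path) would keep managed tables under /data/spark/spark-warehouse regardless of the working directory. The setting has to be supplied before the first SparkSession is created, because it is read at startup.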
Sometimes my spark shell and beeline show different databases and tables, and sometimes they show the same ones
You're starting those commands in different folders, so what you see is confined to the working directory each one was started in.
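To illustrate why the start directory matters, here is a small sketch that computes where the default files would land for a given start directory. It only does path arithmetic and does not start Spark; the function name and the two example directories are hypothetical.

```python
import os


def default_spark_paths(start_dir):
    """Return the default locations Spark would use when started in start_dir."""
    base = os.path.abspath(start_dir)
    return {
        "metastore_db": os.path.join(base, "metastore_db"),
        "derby.log": os.path.join(base, "derby.log"),
        "spark-warehouse": os.path.join(base, "spark-warehouse"),
    }


# Two Spark processes started from different (assumed) directories.
first = default_spark_paths("/home/user/project_a")
second = default_spark_paths("/home/user/project_b")

# Different start directories mean different metastores, hence different tables.
print(first["metastore_db"] == second["metastore_db"])  # False
```

When both commands happen to be started from the same folder, the paths coincide, which would explain why the same databases and tables sometimes appear in both.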
I used beeline and created tables... How did Hive get onto my machine?
It didn't. You're probably connecting to one of the following: the Spark Thrift Server, which is fully compatible with the HiveServer2 protocol; the Derby database, as mentioned above; or you actually do have a HiveServer2 instance sitting at 10.171.0.117.
Anyway, the JDBC connection is not required here. You can use the SparkSession.sql function directly.
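A minimal sketch of that approach: the helper below runs SHOW TABLES through `SparkSession.sql` and returns the table names, with no JDBC or beeline involved. The function name `list_tables` and the default database argument are assumptions for illustration.

```python
def list_tables(spark, database="default"):
    """Return the table names in `database` using SparkSession.sql."""
    # SHOW TABLES returns one row per table, including a `tableName` column.
    rows = spark.sql(f"SHOW TABLES IN {database}").collect()
    return [row["tableName"] for row in rows]


# Usage, assuming PySpark is installed:
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.appName("sql-demo").getOrCreate()
#   print(list_tables(spark))
```

In spark-shell or pyspark, a session named `spark` already exists, so `list_tables(spark)` can be called without building one.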