Showing tables from a specific database with PySpark and Hive
sqlContext.sql("show tables in 3_db").show()
There are two possible ways to achieve this, but they differ significantly in efficiency.
Using SQL
This is the most efficient approach:
from pyspark.sql import SparkSession

spark_session = SparkSession.builder.getOrCreate()
spark_session.sql("show tables in db_name").show()
Using catalog.listTables()
The following is less efficient than the previous approach, because it also loads each table's metadata:
from pyspark.sql import SparkSession

spark_session = SparkSession.builder.getOrCreate()
spark_session.catalog.listTables("db_name")
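Each entry in the returned list is a Table object, so its attributes can be inspected directly. A minimal sketch, again assuming a database called db_name:

from pyspark.sql import SparkSession

spark_session = SparkSession.builder.getOrCreate()

# listTables returns a list of pyspark.sql.catalog.Table entries
for table in spark_session.catalog.listTables("db_name"):
    # each entry exposes fields such as name, tableType and isTemporary
    print(table.name, table.tableType, table.isTemporary)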
Another possibility is to use the Catalog methods:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.catalog.listTables("3_db")
Just be aware that in PySpark this method returns a list, whereas in Scala it returns a Dataset of Table objects.
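Since PySpark gives you an ordinary list, you can process it with plain Python, for example to check whether a particular table exists. A minimal sketch, assuming the 3_db database from the question and a hypothetical table name my_table:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# the listing can be consumed like any other Python collection
existing = {t.name for t in spark.catalog.listTables("3_db")}
if "my_table" in existing:
    print("my_table is present in 3_db")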