What is the difference between the Hive JDBC client and the Hive metastore Java API?
As far as I understand, there are two ways to connect to Hive:
- using the Hive metastore server, which in turn connects to a relational database such as MySQL to store schema metadata. It generally runs on port 9083.
- using the Hive JDBC server, called HiveServer2, which generally runs on port 10000 (10001 is the default when it is configured for HTTP transport).
In earlier versions of Hive, HiveServer2 was not very stable, and its multi-threading support was limited. Things have probably improved on that front, I'd imagine.
As for the JDBC API: yes, it lets you communicate with Hive using JDBC and SQL.
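As a rough illustration of the JDBC route, here is a minimal sketch that uses only the standard java.sql API. The class name HiveJdbcSketch, the helper names, the host, the port, the database, and the user are all hypothetical; substitute whatever your HiveServer2 actually uses. It assumes the Hive JDBC driver (hive-jdbc) is on the classpath and that a HiveServer2 instance is reachable, so the query part only runs when you pass --connect.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

class HiveJdbcSketch {

    // Builds a HiveServer2 JDBC URL of the form jdbc:hive2://host:port/database.
    static String buildUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    // Runs a query and returns the first column of every row as a string.
    // Needs the Hive JDBC driver on the classpath and a running HiveServer2.
    static List<String> firstColumn(String url, String user, String password, String sql)
            throws SQLException {
        List<String> values = new ArrayList<>();
        try (Connection conn = DriverManager.getConnection(url, user, password);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                values.add(rs.getString(1));
            }
        }
        return values;
    }

    public static void main(String[] args) throws SQLException {
        // Hypothetical host, port, and database; adjust for your cluster.
        String url = buildUrl("localhost", 10000, "default");
        System.out.println(url);
        // Only attempt a connection when explicitly requested, since it needs a live server.
        if (args.length > 0 && args[0].equals("--connect")) {
            // "hive" with an empty password is an assumed, unauthenticated setup.
            for (String value : firstColumn(url, "hive", "", "select * from employee")) {
                System.out.println(value);
            }
        }
    }
}
```

With this route the client only speaks JDBC to HiveServer2; it never talks to the metastore directly.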
For metastore connectivity, there appear to be two kinds of operations:
- running actual SQL queries (DML)
- performing DDL operations
DDL -
For DDL, the metastore APIs come in handy: the org.apache.hadoop.hive.metastore.HiveMetaStoreClient class can be used for that purpose.
DML -
What I have found useful here is the org.apache.hadoop.hive.ql.Driver class (https://hive.apache.org/javadocs/r0.13.1/api/ql/org/apache/hadoop/hive/ql/Driver.html). It has a method called run(), which lets you execute a SQL statement and get the result back.
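For example, you can do the following (hiveConf is a HiveConf instance; db and table are strings):
import org.apache.hadoop.hive.cli.CliSessionState;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.ql.Driver;
import org.apache.hadoop.hive.ql.session.SessionState;
Driver driver = new Driver(hiveConf);
HiveMetaStoreClient client = new HiveMetaStoreClient(hiveConf);
SessionState.start(new CliSessionState(hiveConf));
// DML example
driver.run("select * from employee");
// DDL example
client.dropTable(db, table);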