Access Hive Data Using Python
You can use hive library for access hive from python,for that you want to import hive Class from hive import ThriftHive
Below the Example
import sys
from hive import ThriftHive
from hive.ttypes import HiveServerException
from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
try:
transport = TSocket.TSocket('localhost', 10000)
transport = TTransport.TBufferedTransport(transport)
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = ThriftHive.Client(protocol)
transport.open()
client.execute("CREATE TABLE r(a STRING, b INT, c DOUBLE)")
client.execute("LOAD TABLE LOCAL INPATH '/path' INTO TABLE r")
client.execute("SELECT * FROM r")
while (1):
row = client.fetchOne()
if (row == None):
break
print row
client.execute("SELECT * FROM r")
print client.fetchAll()
transport.close()
except Thrift.TException, tx:
print '%s' % (tx.message)
To install you'll need these libraries:
pip install sasl
pip install thrift
pip install thrift-sasl
pip install PyHive
If you're on Linux, you may need to install SASL separately before running the above. Install the package libsasl2-dev
using apt-get
or yum
or whatever package manager. For Windows there are some options on GNU.org. On a Mac SASL should be available if you've installed xcode developer tools (xcode-select --install
)
After installation, you can execute a hive query like this:
from pyhive import hive
conn = hive.Connection(host="YOUR_HIVE_HOST", port=PORT, username="YOU")
Now that you have the hive connection, you have options how to use it. You can just straight-up query:
cursor = conn.cursor()
cursor.execute("SELECT cool_stuff FROM hive_table")
for result in cursor.fetchall():
use_result(result)
...or to use the connection to make a Pandas dataframe:
import pandas as pd
df = pd.read_sql("SELECT cool_stuff FROM hive_table", conn)