How to see contents of Hive orc files in linux

Updated answer (2020):

Per @Owen's answer, ORC has grown up and matured into its own Apache project. The complete list of ORC adopters shows how widely it is now supported across many Big Data technologies.

Credit goes to @Owen and the Apache ORC project team: the ORC project site has fully maintained, up-to-date documentation on using either the Java or the C++ standalone tools on ORC files stored on a local Linux file system. It carries on from the original Hive+ORC Apache wiki page.

Original answer dated: May 30 '14 at 16:27

The ORC file dump utility comes with Hive (0.11 or higher):
hive --orcfiledump <hdfs-location-of-orc-file>
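To dump several files in one go, a small Bash wrapper around the dump utility helps. This is a minimal sketch, assuming hive is on your PATH and that your Hive release supports the -d flag, which the Hive ORC manual documents (from Hive 1.1.0) for printing rows instead of metadata. The function name orc_dump_all is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run Hive's ORC dump utility on every file given.
# First argument selects the mode: "data" prints rows (needs -d support,
# documented for Hive 1.1.0+); anything else prints file metadata.
orc_dump_all() {
  local mode="$1"; shift
  local f
  for f in "$@"; do
    echo "=== ${f}"
    if [ "$mode" = "data" ]; then
      hive --orcfiledump -d "$f"
    else
      hive --orcfiledump "$f"
    fi
  done
}
```

Example with a hypothetical warehouse path, after sourcing the function:

% orc_dump_all data /user/hive/warehouse/mytable/000000_0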

It is also possible to view the contents of an ORC file with a desktop application running on Linux.

There is a desktop application for viewing Parquet as well as other binary data formats such as ORC and Avro. It is a pure Java application, so it runs on Linux, Mac and Windows. See Bigdata File Viewer for details.

It supports complex data types such as array, map and struct.


There is now also a native executable for Linux and macOS that prints the contents of an ORC file as JSON. See the ORC project (http://orc.apache.org/) and build the C++ tools.

% orc-contents examples/TestOrcFile.test1.orc

There is also a native metadata tool:

% orc-metadata ../examples/TestOrcFile.test1.orc
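Since orc-contents prints one JSON object per row on its own line (as in the ORC documentation's examples), its output pipes cleanly into standard tools such as head. A minimal sketch, assuming both C++ tools are built and on your PATH; orc_preview is a hypothetical helper that shows the metadata and then only the first few rows, which is handy for large files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an ORC file's metadata, then its first N rows.
# Assumes orc-metadata and orc-contents (ORC C++ tools) are on PATH.
orc_preview() {
  local file="$1"
  local rows="${2:-10}"   # default to 10 rows when no count is given
  orc-metadata "$file"
  # orc-contents emits one JSON row per line, so head limits the row count
  orc-contents "$file" | head -n "$rows"
}
```

For example, "orc_preview examples/TestOrcFile.test1.orc 2" would show the metadata followed by two rows.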

The ORC project also has a standalone uber jar that can do the same from Java.

% java -jar orc-tools-1.2.3-uber.jar data myfile.orc
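If you move between machines with different tools installed, a hypothetical orc_cat helper can pick whichever one is available. This is a sketch, not an official script: ORC_TOOLS_JAR is an assumed environment variable pointing at the downloaded uber jar, and the hive fallback assumes a release with the -d flag (Hive 1.1.0 or later). To see metadata instead of rows with the jar, its meta subcommand should be used in place of data.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the rows of one ORC file with whichever tool
# is installed. ORC_TOOLS_JAR is an assumed variable, e.g. the path to
# orc-tools-<version>-uber.jar.
orc_cat() {
  local file="$1"
  if command -v orc-contents >/dev/null 2>&1; then
    # Native C++ tool from the ORC project
    orc-contents "$file"
  elif [ -n "${ORC_TOOLS_JAR:-}" ] && command -v java >/dev/null 2>&1; then
    # Standalone Java uber jar, "data" subcommand
    java -jar "$ORC_TOOLS_JAR" data "$file"
  elif command -v hive >/dev/null 2>&1; then
    # Hive's dump utility; -d prints rows instead of metadata
    hive --orcfiledump -d "$file"
  else
    echo "orc_cat: no ORC tool found (orc-contents, orc-tools jar, hive)" >&2
    return 1
  fi
}
```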
