Difference between 'Stored as InputFormat, OutputFormat' and 'Stored as' in Hive
You can specify INPUTFORMAT and OUTPUTFORMAT in STORED AS (and the SERDE in ROW FORMAT SERDE) when creating a table. Hive allows you to separate your record format from your file format. You can provide custom classes for INPUTFORMAT, OUTPUTFORMAT and SERDE. See details: http://www.dummies.com/programming/big-data/hadoop/defining-table-record-formats-in-hive/
Alternatively, you can simply write STORED AS ORC or STORED AS TEXTFILE, for example.
The STORED AS ORC clause already takes care of INPUTFORMAT, OUTPUTFORMAT and SERDE, so you don't have to write out the long fully qualified Java class names for each of them. Just write STORED AS ORC instead.
STORED AS <format> (e.g. STORED AS ORC) implies 3 things:
- SERDE
- INPUTFORMAT
- OUTPUTFORMAT
With STORED AS INPUTFORMAT ... OUTPUTFORMAT ..., you have defined only the last 2, leaving the SERDE to be determined by hive.default.serde:
hive.default.serde
Default Value: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Added in: Hive 0.14 with HIVE-5976
The default SerDe Hive will use for storage formats that do not specify a SerDe.
Storage formats that currently do not specify a SerDe include 'TextFile, RcFile'.
Demo
hive.default.serde
set hive.default.serde;
hive.default.serde=org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
STORED AS ORC
create table mytable (i int)
stored as orc;
show create table mytable;
Note that the SERDE is 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
CREATE TABLE `mytable`(
`i` int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'file:/home/cloudera/local_db/mytable'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}',
'numFiles'='0',
'numRows'='0',
'rawDataSize'='0',
'totalSize'='0',
'transient_lastDdlTime'='1496982059')
STORED AS INPUTFORMAT ... OUTPUTFORMAT ...
create table mytable2 (i int)
STORED AS
INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
;
show create table mytable2;
Note that the SERDE is 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
CREATE TABLE `mytable2`(
`i` int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'file:/home/cloudera/local_db/mytable2'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}',
'numFiles'='0',
'numRows'='0',
'rawDataSize'='0',
'totalSize'='0',
'transient_lastDdlTime'='1496982426')
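If you want the long form to behave like STORED AS ORC, a minimal sketch is to name the SerDe explicitly with ROW FORMAT SERDE, using the class Hive reported for `mytable` above. The table name `mytable3` is hypothetical.

```sql
-- Spell out all three classes so nothing falls back to hive.default.serde
create table mytable3 (i int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS
  INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';

-- The SERDE shown here should now be OrcSerde, as it is for mytable
show create table mytable3;
```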