How to map column names in a Hive table and replace them with new values in the Hive table
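Assuming you can get column headers in your source CSV, you will need to map them from their source numbers to their column names. Restrict the substitution to the header line (address 1) and match whole numbers only, so that digits in the data rows and multi-digit headers such as 10 are left alone (GNU sed syntax):
sed -i '1{s/\b1\b/NBR/g; s/\b2\b/GMB/g; s/\b3\b/GSB/g; s/\b4\b/KTC/g; s/\b5\b/VRV/g; s/\b6\b/AMB/g;...;...;...;...}' input.csv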
Since you only get an unknown subset of the total columns in your Hive table, and not necessarily in table order, you will need to translate your CSV from
NBR,GMB,AMB,KTC
u,f,b,h
a,f,r,m
q,r,b,c
to
NBR,GMB,GSB,KTC,VRV,AMB,...,...,...,...
u,f,null,h,null,b,null,null,null,null
a,f,null,m,null,r,null,null,null,null
q,r,null,c,null,b,null,null,null,null
in order to properly insert them into your table.
From the Apache Wiki:
Values must be provided for every column in the table. The standard SQL syntax that allows the user to insert values into only some columns is not yet supported. To mimic the standard SQL, nulls can be provided for columns the user does not wish to assign a value to.
Standard Syntax:
INSERT INTO TABLE tablename [PARTITION (partcol1[=val1], partcol2[=val2] ...)] VALUES values_row [, values_row ...]
Where values_row is:
( value [, value ...] )
where a value is either null or any valid SQL literal
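Using LOAD DATA INPATH, even with the tblproperties("skip.header.line.count"="1") set, still requires a value in every column position of the table. That property only skips the header line; it does not use the header to match fields to columns, so values are assigned strictly by position. This is why you're missing columns.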
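One quick way to see whether a file will line up with the table is to count the fields in each row before loading it. The following is a minimal sketch, assuming a plain comma-delimited file with no quoted fields containing commas; the function name check_field_count is made up for illustration and is not something Hive provides.

```shell
# Hypothetical helper (not part of Hive): list rows whose field count
# differs from the number of columns in the target table.
# Usage: check_field_count EXPECTED_COUNT < file.csv
# Assumes simple CSV: no quoted fields that contain commas.
check_field_count() {
  awk -F',' -v want="$1" '
    # Report every row that does not have exactly the expected number of fields.
    NF != want { printf "line %d: %d fields (expected %d)\n", NR, NF, want; bad = 1 }
    # Exit non-zero if at least one row was reported.
    END { exit bad + 0 }
  '
}
```

For a table with 10 columns you would run something like check_field_count 10 < input.csv; any line it prints is a row that needs padding before the load.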
If you cannot get the producer of the CSV to create a file with columns 1,2,...9,10 in the same order as your table columns, using either consecutive commas or a null in the data for missing values, write a script that adds the missing column names, in the order you need them, and the required null values in the data.
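Such a script could look roughly like this. It is only a sketch under a few assumptions: the CSV is simple (no quoted commas), its first line already holds the mapped column names, and the literal text null is an acceptable fill value for your table (swap it for whatever NULL marker your table actually expects). The function name pad_csv is hypothetical.

```shell
# Hypothetical helper: reorder and pad a CSV so it matches the table column list.
# Usage: pad_csv 'COL1,COL2,...' < input.csv > padded.csv
# Assumes simple CSV (no quoted commas) and a header row naming the columns present.
pad_csv() {
  awk -F',' -v target="$1" '
    # Split the full, ordered table column list once.
    BEGIN { n = split(target, cols, ",") }
    NR == 1 {
      # Remember where each source column sits in the incoming file.
      for (i = 1; i <= NF; i++) pos[$i] = i
      print target
      next
    }
    {
      line = ""
      for (j = 1; j <= n; j++) {
        # Use the source value when the column exists, otherwise the literal null.
        val = (cols[j] in pos) ? $(pos[cols[j]]) : "null"
        line = line (j > 1 ? "," : "") val
      }
      print line
    }
  '
}
```

Called as pad_csv 'NBR,GMB,GSB,KTC,VRV,AMB' < input.csv (listing only the six named columns here; the elided ones would be appended in table order), a source row u,f,b,h under the header NBR,GMB,AMB,KTC comes out as u,f,null,h,null,b, because each value is placed under its own column name rather than by its original position.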