How to update partition metadata in Hive , when partition data is manualy deleted from HDFS
EDIT : Starting with Hive 3.0.0 MSCK
can now discover new partitions or remove missing partitions (or both) using the following syntax :
MSCK [REPAIR] TABLE table_name [ADD/DROP/SYNC PARTITIONS]
This was implemented in HIVE-17824
As correctly stated by HakkiBuyukcengiz, MSCK REPAIR
doesn't remove partitions if the corresponding folder on HDFS was manually deleted, it only adds partitions if new folders are created.
Extract from offical documentation :
In other words, it will add any partitions that exist on HDFS but not in metastore to the metastore.
This is what I usually do in the presence of external
tables if multiple partitions folders are manually deleted on HDFS and I want to quickly refresh the partitions :
- Drop the table (
DROP TABLE table_name
) (dropping an external table does not delete the underlying partition files) - Recreate the table (
CREATE EXTERNAL TABLE table_name ...
) - Repair it (
MSCK REPAIR TABLE table_name
)
Depending on the number of partitions this can take a long time. The other solution is to use ALTER TABLE DROP PARTITION (...)
for each deleted partition folder but this can be tedious if multiple partitions were deleted.
Try using
MSCK REPAIR TABLE <tablename>;
Ensure the table is set to external, drop all partitions then run the table repair:
alter table mytable_name set TBLPROPERTIES('EXTERNAL'='TRUE')
alter table mytable_name drop if exists partition (`mypart_name` <> 'null');
msck repair table mytable_name;
If msck repair throws an error, then run hive from the terminal as:
hive --hiveconf hive.msck.path.validation=ignore
or set hive.msck.path.validation=ignore;