HIVE - INSERT OVERWRITE vs DROP TABLE + CREATE TABLE + INSERT INTO
One edge consideration is that if your schema changes, INSERT OVERWRITE
will fail, while DROP
+CREATE
+INSERT
will not. While this is unlikely to apply in most scenarios, if you're prototyping workflow/table schemas then it might be worth considering.
For maximum speed I would suggest to 1) issue hadoop fs -rm -r -skipTrash table_dir/*
first to remove old data fast without putting files into trash because INSERT OVERWRITE will put all files into Trash and for very big table this will take a lot of time. Then 2) do INSERT OVERWRITE
command. This will be faster also because you do not need to drop/create table.
UPDATE:
As of Hive 2.3.0 (HIVE-15880), if the table has TBLPROPERTIES ("auto.purge"="true")
the previous data of the table is not moved to Trash when INSERT OVERWRITE
query is run against the table. This functionality is applicable only for managed tables. So, INSERT OVERWRITE with auto purge will work faster than rm -skipTrash
+ INSERT OVERWRITE
or DROP
+CREATE
+INSERT
because it will be a single Hive-only command.