Minus query in HIVE
It does not appear that HQL supports the MINUS
operator. See this relevant, albeit a bit old, resource:
http://www.quora.com/Apache-Hive/What-are-the-biggest-feature-gaps-between-HiveQL-and-SQL
What you want to do can be done with a LEFT JOIN
or NOT EXISTS
:
SELECT x
FROM abc
LEFT JOIN bcd
ON abc.x = bcd.x
WHERE bcd.x IS NULL
EDIT: Per comments below, NOT EXISTS
is not supported.
SELECT x
FROM abc
WHERE NOT EXISTS (SELECT x FROM bcd)
HQL DOES NOT support minus but you can always use Patrick Tucci solution which works fine when your select-list contains only a few fields. In my case I wanted to find the differences between an entire table (30+ fields) and a backup copy to find records that were different. Here is my solution:
select <all-my-fields>, count(*)
from (
select <all-my-fields> from mytable
union all
select <all-the-fields> from mybackuptable
) merged_data
group by <all-my-fields>
having count(*) = 1
Now this is not completly a "minus" since single records from mybackuptable would show up in the result which is what I wanted. To make it a complete "minus" equivalent I've added this:
select <all-my-fields>
from (
select max(source) source, <all-my-fields>, count(*)
from (
select 1 source, <all-my-fields> from mytable
union all
select 2 source, <all-the-fields> from mybackuptable
) merged_data
group by <all-my-fields>
having count(*) = 1
) minus_data
where source = 1