What is the difference between -hivevar and -hiveconf?
@Llama has explained it in detail; in addition, the two types of variables are accessed differently.
The --hivevar variables are accessed using ${var-name} (or ${hivevar:var-name}), while the --hiveconf variables are accessed using ${hiveconf:var-name} inside Hive.
For example, the commands below set a variable and print its value from inside Hive:
hivevar:
hive --hivevar a='this is a' -e '!echo ${a};'
output: this is a
hiveconf:
hive --hiveconf a='this is a' -e '!echo ${hiveconf:a};'
output: this is a
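To make the lookup rules concrete, here is a toy Python model of the two namespaces. It is a simplified sketch, not Hive's actual implementation, and the names parse_cli and substitute are hypothetical. It assumes command-line arguments come in flag/value pairs, treats --define as a synonym for --hivevar, performs a single substitution pass, and leaves unresolved references as-is.

```python
import re

# Toy model of a Hive session's two namespaces: configuration values
# (hiveconf) and user variables (hivevar). Not Hive's real code.
HIVECONF = "hiveconf:"
HIVEVAR = "hivevar:"
_REF = re.compile(r"\$\{([^}]+)\}")  # matches ${name} or ${prefix:name}


def parse_cli(args):
    """Split --hiveconf / --hivevar KEY=VALUE pairs into two dicts.

    Assumes args alternate flag, KEY=VALUE; other flags are ignored.
    """
    hiveconf, hivevar = {}, {}
    for flag, pair in zip(args[::2], args[1::2]):
        key, _, value = pair.partition("=")
        if flag == "--hiveconf":
            hiveconf[key] = value
        elif flag in ("--hivevar", "--define"):  # --define behaves like --hivevar
            hivevar[key] = value
    return hiveconf, hivevar


def substitute(text, hiveconf, hivevar):
    """Replace each ${...} reference once; unknown references stay as-is."""
    def lookup(match):
        name = match.group(1)
        if name.startswith(HIVECONF):
            value = hiveconf.get(name[len(HIVECONF):])
        elif name.startswith(HIVEVAR):
            value = hivevar.get(name[len(HIVEVAR):])
        else:
            # An unprefixed name is looked up among user variables only.
            value = hivevar.get(name)
        return match.group(0) if value is None else value

    return _REF.sub(lookup, text)


conf, user_vars = parse_cli(["--hivevar", "a=this is a"])
print(substitute("!echo ${a};", conf, user_vars))  # prints: !echo this is a;
```

In this model, a value set only with --hiveconf is reachable solely through ${hiveconf:a}; an unprefixed ${a} resolves against the hivevar namespace.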
They can also be set at the beginning of a script:
hiveconf:
SET this_dt = CURRENT_DATE;
select ${hiveconf:this_dt};
hivevar:
set hivevar:cur_dt=current_date;
select ${hivevar:cur_dt};
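Because substitution is textual, ${hiveconf:this_dt} is replaced by the literal text CURRENT_DATE before the query runs; the variable holds the expression, not an evaluated date. The toy Python model below (again a hypothetical sketch, not Hive's implementation; run_script is an illustrative name) shows a plain SET landing in the hiveconf namespace and a hivevar:-prefixed SET landing in the hivevar namespace. It naively splits statements on every semicolon and assumes each SET has the form key=value.

```python
import re

# Toy model (not Hive's real code) of a script that defines its own
# variables with SET before using them.
_REF = re.compile(r"\$\{([^}]+)\}")


def substitute(text, hiveconf, hivevar):
    """Replace each ${...} reference once; unknown references stay as-is."""
    def lookup(match):
        name = match.group(1)
        if name.startswith("hiveconf:"):
            value = hiveconf.get(name[len("hiveconf:"):])
        elif name.startswith("hivevar:"):
            value = hivevar.get(name[len("hivevar:"):])
        else:
            value = hivevar.get(name)  # unprefixed names are user variables
        return match.group(0) if value is None else value

    return _REF.sub(lookup, text)


def run_script(script):
    """Return (expanded non-SET statements, hiveconf dict, hivevar dict).

    Simplifications: splits on every ';' and expects SET key=value.
    """
    hiveconf, hivevar = {}, {}
    expanded = []
    for stmt in (s.strip() for s in script.split(";")):
        if not stmt:
            continue
        if stmt.lower().startswith("set "):
            key, _, value = stmt[4:].partition("=")
            key, value = key.strip(), value.strip()
            if key.startswith("hivevar:"):
                hivevar[key[len("hivevar:"):]] = value
            elif key.startswith("hiveconf:"):
                hiveconf[key[len("hiveconf:"):]] = value
            else:
                hiveconf[key] = value  # a plain SET writes a configuration value
            continue
        expanded.append(substitute(stmt, hiveconf, hivevar))
    return expanded, hiveconf, hivevar


script = """
SET this_dt = CURRENT_DATE;
select ${hiveconf:this_dt};
set hivevar:cur_dt=current_date;
select ${hivevar:cur_dt};
"""
statements, conf, user_vars = run_script(script)
print(statements)  # ['select CURRENT_DATE', 'select current_date']
```

Keeping the two dictionaries separate mirrors the point made in the next answer: configuration values and user variables never share a namespace unless you put them there.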
I didn't quite feel like the examples from the documentation were adequate, so here's my attempt at an answer.
In the beginning there was only --hiveconf, and variable substitution didn't exist.
The --hiveconf option allowed users to set Hive configuration values from the command line, and that was it. All Hive configuration values are stored under the hiveconf namespace, e.g. hiveconf:mapred.reduce.tasks. These values let you control things like the number of mappers and reducers, whether status messages should be displayed, and whether the script should continue on errors.
Later, variable substitution was added. This meant you could now use variables in queries with the ${...} syntax. However, the only variables you could set from the command line were under the hiveconf namespace using --hiveconf, so that's where users put their variables.
Putting your personal variables under the Hive configuration namespace probably won't break anything, but it's also not good form. Later, it was suggested that a hivevar namespace be added specifically for user variables, which could also be defined at the command line using --hivevar. This meant a cleaner separation between Hive configuration values and user-defined variables.
In summary:
The hiveconf namespace and --hiveconf should be used to set Hive configuration values.
The hivevar namespace and --hivevar should be used to define user variables.
Setting user variables under the hiveconf namespace probably won't break anything, but isn't recommended.