How to specify username when putting files on HDFS from a remote machine?
Shell/Command way:
Set the HADOOP_USER_NAME environment variable, and then execute the hdfs commands:
export HADOOP_USER_NAME=manjunath
hdfs dfs -put <source> <destination>
Pythonic way:
import os
os.environ["HADOOP_USER_NAME"] = "manjunath"
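Note that setting os.environ only affects programs launched afterwards from the same Python process. A minimal sketch of a full upload driven from Python, assuming the hdfs client is on the PATH and that the paths /tmp/data.csv and /user/manjunath/ are just placeholders:
import os
import subprocess

# Impersonate the desired HDFS user for any Hadoop client started from this script.
# This only works while the cluster has no security (e.g. Kerberos) enabled.
os.environ["HADOOP_USER_NAME"] = "manjunath"

# Run the put through the regular hdfs CLI; the child process inherits the variable.
subprocess.run(
    ["hdfs", "dfs", "-put", "/tmp/data.csv", "/user/manjunath/"],
    check=True,
)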
If you use the HADOOP_USER_NAME environment variable, you can tell HDFS which user name to operate with. Note that this only works if your cluster isn't using security features (e.g. Kerberos). For example:
HADOOP_USER_NAME=hdfs hadoop dfs -put ...
By default, authentication and authorization are turned off in Hadoop. According to Hadoop: The Definitive Guide (a nice book, by the way; I'd recommend buying it):
The user identity that Hadoop uses for permissions in HDFS is determined by running the whoami command on the client system. Similarly, the group names are derived from the output of running groups.
So you can create a new whoami command that returns the required username and put it on the PATH so that it is found before the actual whoami that comes with Linux. You can play the same game with the groups command.
This is a hack and won't work once authentication and authorization have been turned on.
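A minimal sketch of that trick, following the behaviour described in the quote above and assuming a non-secure cluster; the directory ~/fakebin, the username manjunath, and the file paths are just placeholders:
import os
import stat
import subprocess

# Directory that will hold the fake whoami; it must appear first on the PATH.
fake_bin = os.path.expanduser("~/fakebin")
os.makedirs(fake_bin, exist_ok=True)

# Write a tiny shell script named "whoami" that prints the desired user.
fake_whoami = os.path.join(fake_bin, "whoami")
with open(fake_whoami, "w") as f:
    f.write("#!/bin/sh\necho manjunath\n")
os.chmod(fake_whoami, os.stat(fake_whoami).st_mode | stat.S_IEXEC)

# Prepend the directory so the fake whoami shadows the system one,
# then run the hdfs command with that environment.
env = dict(os.environ, PATH=fake_bin + os.pathsep + os.environ["PATH"])
subprocess.run(["hdfs", "dfs", "-put", "/tmp/data.csv", "/user/manjunath/"],
               check=True, env=env)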
This may not matter to anybody, but I use a small hack for this.
I export HADOOP_USER_NAME in .bash_profile, so that every time I log in the user is already set.
Just add the following line of code to .bash_profile:
export HADOOP_USER_NAME=<your hdfs user>