Why is the hive_staging file missing in AWS EMR?
I resolved the issue. Let me explain in detail.
Exceptions that were thrown -
- LeaseExpiredException - from HDFS side.
- FileNotFoundException - from Hive side (when Tez execution engine executes DAG)
Problem scenario -
- We had just upgraded Hive from 0.13.0 to 2.1.0. Everything was working fine with the previous version, with zero runtime exceptions.
Approaches I tried to resolve the issue -
My first thought was that two threads were working on the same piece of data because of NN intelligence (speculative execution). But with the settings below,
set mapreduce.map.speculative=false;
set mapreduce.reduce.speculative=false;
that was not possible.
Then I increased the limit from 1000 to 100000 for the settings below -
SET hive.exec.max.dynamic.partitions=100000;
SET hive.exec.max.dynamic.partitions.pernode=100000;
That also didn't work.
My third thought was that, within the same process, something that mapper-1 had created was being deleted by another mapper/reducer. But we didn't find any evidence of that in the HiveServer2 or Tez logs.
Finally, the root cause turned out to lie in the application layer code itself. hive-exec 2.1.0 has a configuration property that did not exist in 0.13.0:
"hive.exec.stagingdir":".hive-staging"
Description of above property -
Directory name that will be created inside table locations in order to support HDFS encryption. This replaces ${hive.exec.scratchdir} for query results with the exception of read-only tables. In all cases ${hive.exec.scratchdir} is still used for other temporary files, such as job plans.
So if there are concurrent jobs in the application layer code (ETL) that perform operations (rename/delete/move) on the same table, this can lead to the problem, because the staging directory now lives inside the table location.
In our case, two concurrent jobs were doing "INSERT OVERWRITE" on the same table. One job deleted the metadata file of a mapper belonging to the other job, and that caused this issue.
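To illustrate the race described above, here is a toy simulation on the local filesystem. It is only a sketch, not Hive's actual code: the function names, the `.hive-staging_<job>` directory naming, and the assumption that an overwrite clears everything under the table location are all simplifications made for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def stage_file(table_dir: Path, job_id: str) -> Path:
    """Create a per-job staging directory INSIDE the table location
    and write one intermediate file into it (hypothetical naming)."""
    staging = table_dir / f".hive-staging_{job_id}"
    staging.mkdir(parents=True)
    staged = staging / "000000_0"
    staged.write_text(f"rows from {job_id}\n")
    return staged


def overwrite_table(table_dir: Path) -> None:
    """Toy stand-in for the cleanup step of an overwrite:
    remove everything found under the table location."""
    for child in table_dir.iterdir():
        if child.is_dir():
            shutil.rmtree(child)
        else:
            child.unlink()


def simulate_race() -> str:
    """Job A stages a file, job B overwrites the table, job A tries to commit."""
    with tempfile.TemporaryDirectory() as tmp:
        table_dir = Path(tmp) / "my_table"
        table_dir.mkdir()
        staged_a = stage_file(table_dir, "job_a")  # job A is mid-flight
        overwrite_table(table_dir)                 # job B clears the table location
        try:
            # Job A now tries to move its staged file into place.
            staged_a.rename(table_dir / "000000_0")
        except FileNotFoundError:
            return "FileNotFoundError"
        return "ok"


if __name__ == "__main__":
    print(simulate_race())
```

Because job A's staging directory sits under the same location that job B clears, job A's final move fails. If the staging directory were outside the table location, or the two jobs ran one after the other, the file would still be there.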
Resolution -
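- Move the metadata (staging) file location outside the table location, for example by pointing hive.exec.stagingdir at a directory outside the table (our table lies in S3).
- Disable HDFS encryption (as mentioned in the description of the stagingdir property).
- Change your application layer code to avoid the concurrency issue.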
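For the last option, one possible approach is to serialize jobs that write to the same table. The sketch below assumes the ETL jobs are launched as threads of a single Python process; `run_exclusive` and the table name are hypothetical, and jobs running in separate processes or on separate hosts would need an external lock instead.

```python
import threading
from collections import defaultdict
from typing import Callable

# One lock per table name; the guard protects the registry itself.
_table_locks = defaultdict(threading.Lock)
_registry_guard = threading.Lock()


def run_exclusive(table: str, job: Callable[[], None]) -> None:
    """Run `job` while holding the lock for `table`, so that two jobs
    targeting the same table never execute at the same time."""
    with _registry_guard:
        lock = _table_locks[table]
    with lock:
        job()


if __name__ == "__main__":
    def insert_overwrite() -> None:
        # Placeholder for submitting the real INSERT OVERWRITE query.
        print("running INSERT OVERWRITE")

    workers = [
        threading.Thread(target=run_exclusive, args=("db.sales", insert_overwrite))
        for _ in range(2)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```

Jobs on different tables still run in parallel, since each table name gets its own lock.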