Apache Hadoop YARN - Underutilization of cores
The problem lies not with yarn-site.xml or spark-defaults.conf, but with the resource calculator that assigns cores to the executors (or, in the case of MapReduce jobs, to the mappers/reducers).
The default resource calculator, org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, uses only memory information when allocating containers; CPU scheduling is not enabled by default. To take both memory and CPU into account, the resource calculator needs to be changed to org.apache.hadoop.yarn.util.resource.DominantResourceCalculator in the capacity-scheduler.xml file.
Here's what needs to change.
<property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
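
Once that change takes effect (restarting the ResourceManager is the safest way to apply it), container requests that ask for multiple vcores are actually honored rather than being reduced to the memory-only calculation. As a quick sanity check, a Spark job submitted roughly like the sketch below should show executors holding the requested number of vcores in the ResourceManager UI; the executor count, core count, memory size, and application jar here are placeholders, not values from the original question.

spark-submit \
  --master yarn \
  --num-executors 4 \
  --executor-cores 4 \
  --executor-memory 8g \
  your-application.jar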