Difference between Application Manager and Application Master in YARN?
Here Application refers to a single job assigned to the framework.
The Application manager is responsible for accepting or rejecting the application when it is submitted to the Resource manager by the client.
The Application master is responsible for the execution of a single application when it is assigned to the Node manager by the Resource manager.
Does this make sense?
The terms Application Master and Application Manager are often used interchangeably, but they are different things. The Application Master runs in the first container of an application and is responsible for requesting, launching and monitoring application-specific resources, whereas the Application Manager (ApplicationsManager) is a component inside the ResourceManager. More details about the Application Manager are given below.
The ApplicationsManager is responsible for maintaining a collection of submitted applications. After application submission, it first validates the application’s specifications and rejects any application that requests unsatisfiable resources for its ApplicationMaster (i.e., there is no node in the cluster that has enough resources to run the ApplicationMaster itself). It then ensures that no other application was already submitted with the same application ID—a scenario that can be caused by an erroneous or a malicious client. Finally, it forwards the admitted application to the scheduler. This component is also responsible for recording and managing finished applications for a while before they are completely evacuated from the ResourceManager’s memory. When an application finishes, it places an ApplicationSummary in the daemon’s log file. Finally, the ApplicationsManager keeps a cache of completed applications long after applications finish to support users’ requests for application data (via web UI or command line). The configuration property yarn.resourcemanager.max-completed-applications controls the maximum number of such finished applications that the ResourceManager remembers at any point of time. The cache is a first-in, first-out list, with the oldest applications being moved out to accommodate freshly finished applications.
Reference: Hadoop YARN Book
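The behaviour described in the quote can be sketched as a toy model in Python. This is only an illustration of the admission checks and the first-in, first-out cache, not the actual ResourceManager code: the class and method names (`ApplicationsManagerSketch`, `submit`, `finish`) are hypothetical, memory is the only resource considered, and checking duplicates against both running and cached applications is an assumption. The `max_completed` argument plays the role of yarn.resourcemanager.max-completed-applications.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class Submission:
    app_id: str
    am_memory_mb: int  # memory requested for the ApplicationMaster container


class ApplicationsManagerSketch:
    """Toy model of the admission checks and completed-application cache."""

    def __init__(self, node_capacities_mb, max_completed):
        self.node_capacities_mb = list(node_capacities_mb)
        self.max_completed = max_completed
        self.active = {}                # app_id -> Submission
        self.completed = OrderedDict()  # app_id -> final status, oldest first
        self.scheduler_queue = []       # stands in for the hand-off to the scheduler

    def submit(self, sub):
        # Reject if no single node is large enough to run the ApplicationMaster.
        if not any(cap >= sub.am_memory_mb for cap in self.node_capacities_mb):
            return "REJECTED: unsatisfiable AM resources"
        # Reject a second submission with an already-known application ID
        # (assumption: both running and cached applications count as known).
        if sub.app_id in self.active or sub.app_id in self.completed:
            return "REJECTED: duplicate application ID"
        self.active[sub.app_id] = sub
        self.scheduler_queue.append(sub.app_id)  # forward to the scheduler
        return "ACCEPTED"

    def finish(self, app_id, status):
        # Move the application from the active set into the completed cache.
        del self.active[app_id]
        self.completed[app_id] = status
        # First-in, first-out eviction once the cache exceeds its limit.
        while len(self.completed) > self.max_completed:
            self.completed.popitem(last=False)


mgr = ApplicationsManagerSketch(node_capacities_mb=[4096, 8192], max_completed=2)
print(mgr.submit(Submission("app_1", 2048)))   # admitted
print(mgr.submit(Submission("app_1", 2048)))   # same ID again, rejected
print(mgr.submit(Submission("app_2", 16384)))  # no node can host this AM, rejected
```

With `max_completed=2`, finishing a third application pushes the oldest finished one out of the cache, which mirrors the eviction order the quote describes.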
To understand the difference, we need to understand the complete flow of a Job/Application submitted via YARN in Hadoop.
Before we jump to the execution flow, we need to understand some key concepts:
KEY CONCEPTS:
- YARN consists of a Resource Manager and Node Managers
- There is one active Resource Manager per cluster, and it runs on a Master Node
- There are multiple Node Managers in a cluster, one running on each worker (Data) Node
- Resource Manager deals with resource management to execute any Job/Application
- Node Manager takes care of individual tasks/processes submitted to them
- Please note that YARN is a generic framework; it is not only meant to execute Map Reduce Jobs. It can be used to execute any application, say the main() of a Java Application.
Now, let's discuss the Job/Application flow via YARN
- Client submits a Job to YARN.
- The submitted Job can be a Map Reduce Job or any other application/process
- This Job/application is picked by Resource Manager
- Since multiple Jobs/applications can be submitted to Resource Manager, it checks the scheduling algorithm and the available capacity to see whether the submitted Job/Application can be launched
- When Resource Manager finds that it can launch the newly submitted Job/Application, it allocates a Container. A Container is a set of resources (CPU, memory, etc.) required to launch the Job/Application
- It checks which Node can take up this request. Once it finds a Node, it contacts the Node Manager on that Node
- Node Manager then actually allocates the resources required to execute the Job/application and launches the Application Master Process within the Container
- Application Master Process is the main process for Job/Application execution. Please note that Application Master is a framework-specific implementation. The Map Reduce Framework has its own implementation of Application Master.
- Application Master will check if additional resources or containers are required to execute the Job/Application. This is the case when we submit a Map Reduce Job, where multiple Mappers and Reducers are required to accomplish the Job.
- If additional resources are required, Application Master negotiates with Resource Manager to allocate resources/containers. It is the responsibility of Application Master to execute and monitor the individual tasks of an application/job.
- The request made by Application Master to Resource Manager is known as a Resource Request. It contains the resources required to execute an individual Task and a location constraint. The location constraint is needed because a Task should run as close to its data as possible to conserve network bandwidth.
- In response to a Resource Request, Resource Manager grants a container on a selected Node to Application Master. Resource Manager does not start a Node Manager; one is already running on every worker Node. Application Master asks the Node Manager on the selected Node to launch the container, Node Manager allocates the resources for it, and the task runs within that container. This task is known as App Process.
- If there are multiple Mappers, there will be multiple App Processes running (each in a container) on multiple Nodes. Each of them sends its heartbeat to its Application Master Process. This is how Application Master monitors the individual Tasks it launches.
- Application Master also sends its heartbeat to Resource Manager to indicate the status of Job/Application execution.
- Once execution of an Application completes, the Application Master for that application de-registers from Resource Manager.
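The Application Master's part of this flow can be sketched as a small Python simulation. This is a simplified model, not the real YARN API: every class and method name here (`ResourceRequest`, `Node`, `ResourceManagerSketch`, `AppMasterSketch`) is hypothetical, memory is the only resource, the location constraint is a single preferred node with fallback to any node that has room, and heartbeats are left out.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceRequest:
    memory_mb: int
    preferred_node: str  # location constraint: the node holding the task's data


@dataclass
class Node:
    name: str
    free_mb: int
    running: list = field(default_factory=list)

    def launch(self, label, memory_mb):
        # Node Manager role: set aside the resources and start the process.
        self.free_mb -= memory_mb
        self.running.append(label)


class ResourceManagerSketch:
    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}
        self.registered = set()  # application IDs with a live Application Master

    def allocate(self, req):
        """Return the name of a node that can host the request, or None."""
        preferred = self.nodes.get(req.preferred_node)
        if preferred is not None and preferred.free_mb >= req.memory_mb:
            return preferred.name  # data-local placement
        for node in self.nodes.values():
            if node.free_mb >= req.memory_mb:
                return node.name   # fall back to any node with room
        return None


class AppMasterSketch:
    def __init__(self, app_id, rm):
        self.app_id = app_id
        self.rm = rm
        self.placements = {}  # task name -> node name

    def run(self, tasks):
        self.rm.registered.add(self.app_id)          # register with the RM
        for task, req in tasks.items():
            node_name = self.rm.allocate(req)        # negotiate a container
            if node_name is None:
                continue  # a real Application Master would ask again later
            # Ask the Node Manager on that node to launch the App Process.
            self.rm.nodes[node_name].launch(f"{self.app_id}/{task}", req.memory_mb)
            self.placements[task] = node_name
        self.rm.registered.discard(self.app_id)      # de-register when done
        return self.placements


rm = ResourceManagerSketch([Node("node-a", 2048), Node("node-b", 4096)])
am = AppMasterSketch("app_1", rm)
tasks = {f"map-{i}": ResourceRequest(1024, "node-a") for i in range(3)}
print(am.run(tasks))
```

In this run all three mappers prefer node-a, which only has room for two of them, so the third falls back to node-b. The real scheduler is considerably more involved, but the division of labour is the same: the Resource Manager decides where a container may run, the Node Manager launches it, and the Application Master drives the whole thing.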
I hope this clarifies things.