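YARN Background
YARN stands for Yet Another Resource Negotiator. It is a resource manager responsible for managing and scheduling cluster resources, allocating resources such as CPU, memory, file system, and disk across the whole cluster.
YARN is the second generation of Hadoop MapReduce (MapReduce v2), introduced to address some of the problems of version 1.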
Terminology
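Application Master (AM): the per-application, framework-specific library that negotiates resources and monitors tasks.
Resource Manager (RM): the global authority that arbitrates resources among all applications.
Node Manager (NM): the per-machine agent that is responsible for containers and reports their resource usage.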
The idea is to have a global ResourceManager (RM) and per-application ApplicationMaster (AM). An application is either a single job or a DAG of jobs.
The ResourceManager and the NodeManager form the data-computation framework. The ResourceManager is the ultimate authority that arbitrates resources among all the applications in the system. The NodeManager is the per-machine framework agent who is responsible for containers, monitoring their resource usage (cpu, memory, disk, network) and reporting the same to the ResourceManager/Scheduler.
The per-application ApplicationMaster is, in effect, a framework specific library and is tasked with negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor the tasks.
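In other words, the RM and the NM together form the data-computation framework: the RM manages all resources in the system, while the NM is the per-machine agent that manages containers, monitors their resource usage (CPU, memory, disk, network), and reports it to the RM/Scheduler. The per-application AM is a framework-specific library that negotiates resources with the RM and works with the NMs to execute and monitor tasks.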
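The division of labor among the three roles can be illustrated with a small toy model. This is only a sketch and does not use any real YARN API: the class names mirror the roles, while the methods (register, allocate, launch_container, run_tasks), the memory-only resource model, and the first-fit placement are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Per-machine agent: tracks the memory used by its containers."""
    name: str
    total_mem_mb: int
    used_mem_mb: int = 0

    def free_mem_mb(self) -> int:
        return self.total_mem_mb - self.used_mem_mb

    def launch_container(self, mem_mb: int) -> None:
        self.used_mem_mb += mem_mb


@dataclass
class ResourceManager:
    """Global authority that arbitrates resources among applications."""
    nodes: list = field(default_factory=list)

    def register(self, node: NodeManager) -> None:
        self.nodes.append(node)

    def allocate(self, mem_mb: int):
        # First-fit placement, chosen only to keep the sketch short.
        for node in self.nodes:
            if node.free_mem_mb() >= mem_mb:
                node.launch_container(mem_mb)
                return node.name
        return None  # no node currently has enough free memory


class ApplicationMaster:
    """Per-application library: asks the RM for one container per task."""

    def __init__(self, rm: ResourceManager):
        self.rm = rm

    def run_tasks(self, task_mem_mb: list) -> list:
        return [self.rm.allocate(mem) for mem in task_mem_mb]


rm = ResourceManager()
rm.register(NodeManager("nm1", total_mem_mb=4096))
rm.register(NodeManager("nm2", total_mem_mb=2048))
placements = ApplicationMaster(rm).run_tasks([3072, 2048, 2048])
print(placements)  # ['nm1', 'nm2', None]
```

The third request gets no container because neither node has 2048 MB free at that point. A real cluster would keep the request pending until resources are released, which this sketch does not model.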
The ResourceManager has two main components: the Scheduler and the ApplicationsManager (ASM).
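The Apache YARN documentation describes the Scheduler as a pure scheduler that only allocates resources. It describes the ApplicationsManager as the part that accepts job submissions and negotiates the first container, in which the ApplicationMaster runs. A minimal sketch of that split follows. It is not the real implementation: the method names, the memory-only capacity, and the "RUNNING"/"WAITING" state labels are hypothetical.

```python
class Scheduler:
    """Pure allocator: hands out memory, does no monitoring or restarts."""

    def __init__(self, capacity_mb: int):
        self.free_mb = capacity_mb

    def allocate(self, mem_mb: int) -> bool:
        if mem_mb > self.free_mb:
            return False
        self.free_mb -= mem_mb
        return True


class ApplicationsManager:
    """Accepts submissions and obtains the first container, which hosts the AM."""

    def __init__(self, scheduler: Scheduler):
        self.scheduler = scheduler
        self.state = {}  # application id -> hypothetical state label

    def submit(self, app_id: str, am_mem_mb: int) -> str:
        # The ASM never allocates by itself; it always asks the Scheduler.
        granted = self.scheduler.allocate(am_mem_mb)
        self.state[app_id] = "RUNNING" if granted else "WAITING"
        return self.state[app_id]


scheduler = Scheduler(capacity_mb=2048)
asm = ApplicationsManager(scheduler)
first = asm.submit("app_1", am_mem_mb=1024)
second = asm.submit("app_2", am_mem_mb=2048)
print(first, second)  # RUNNING WAITING
```

Keeping allocation in one class and application bookkeeping in another reflects the separation of concerns between the two components: the Scheduler here knows nothing about application state.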
YARN cannot be installed on its own; it is installed as part of a Hadoop deployment.
References
Apache Hadoop YARN
Architecture of Next Generation Apache Hadoop MapReduce Framework
Hadoop notes: why is there MapReduce v2 (YARN)?
Deploying MapReduce v2 (YARN) on a Cluster