Many components in Spark communicate through RPC and an event-message mechanism: the former handles remote communication, while the latter is a more efficient way to communicate locally.
Defining ListenerBus
Spark defines a trait, ListenerBus, that receives events and delivers them to the corresponding event listeners:
import java.util.concurrent.CopyOnWriteArrayList
import scala.collection.JavaConverters._
import scala.reflect.ClassTag
import scala.util.control.NonFatal
import com.codahale.metrics.Timer
private[spark] trait ListenerBus[L <: AnyRef, E] extends Logging {
private[this] val listenersPlusTimers = new CopyOnWriteArrayList[(L, Option[Timer])]
// Marked `private[spark]` for access in tests.
private[spark] def listeners = listenersPlusTimers.asScala.map(_._1).asJava
/**
* Returns a CodaHale metrics Timer for measuring the listener's event processing time.
* This method is intended to be overridden by subclasses.
*/
protected def getTimer(listener: L): Option[Timer] = None
/**
* Add a listener to listen events. This method is thread-safe and can be called in any thread.
*/
final def addListener(listener: L): Unit = {
listenersPlusTimers.add((listener, getTimer(listener)))
}
/**
* Remove a listener and it won't receive any events. This method is thread-safe and can be called
* in any thread.
*/
final def removeListener(listener: L): Unit = {
listenersPlusTimers.asScala.find(_._1 eq listener).foreach { listenerAndTimer =>
listenersPlusTimers.remove(listenerAndTimer)
}
}
/**
* Post the event to all registered listeners. The `postToAll` caller should guarantee calling
* `postToAll` in the same thread for all events.
*/
def postToAll(event: E): Unit = {
// JavaConverters can create a JIterableWrapper if we use asScala.
// However, this method will be called frequently. To avoid the wrapper cost, here we use
// Java Iterator directly.
val iter = listenersPlusTimers.iterator
while (iter.hasNext) {
val listenerAndMaybeTimer = iter.next()
val listener = listenerAndMaybeTimer._1
val maybeTimer = listenerAndMaybeTimer._2
val maybeTimerContext = if (maybeTimer.isDefined) {
maybeTimer.get.time()
} else {
null
}
try {
doPostEvent(listener, event)
} catch {
case NonFatal(e) =>
logError(s"Listener ${Utils.getFormattedClassName(listener)} threw an exception", e)
} finally {
if (maybeTimerContext != null) {
maybeTimerContext.stop()
}
}
}
}
/**
* Post an event to the specified listener. `onPostEvent` is guaranteed to be called in the same
* thread for all listeners.
*/
protected def doPostEvent(listener: L, event: E): Unit
private[spark] def findListenersByClass[T <: L : ClassTag](): Seq[T] = {
val c = implicitly[ClassTag[T]].runtimeClass
listeners.asScala.filter(_.getClass == c).map(_.asInstanceOf[T]).toSeq
}
}
ListenerBus is a generic trait with type parameters [L <: AnyRef, E]: L is the listener type and E is the event type. Its job is to deliver events to the matching listeners
- listenersPlusTimers: holds every registered listener together with its Timer. The data structure is a thread-safe CopyOnWriteArrayList of tuples (L, Option[Timer]), where L is the listener reference and Timer is a CodaHale Metrics timer used to measure each listener's event-processing time
- getTimer: returns the Timer for a listener; protected, intended to be overridden by subclasses (the default is None)
- addListener: adds a listener to listenersPlusTimers; final and thread-safe
- removeListener: removes a listener from listenersPlusTimers so it receives no further events
- postToAll: posts an event to all listeners. It is not thread-safe on its own: the caller must invoke it from the same thread for all events. It iterates with a plain Java Iterator to avoid the wrapper cost of asScala, times each delivery, and catches NonFatal exceptions so one failing listener cannot affect the rest
- doPostEvent: posts an event to a single listener; overridden by concrete subclasses
- findListenersByClass: finds all registered listeners of exactly type T. implicitly[ClassTag[T]].runtimeClass obtains the runtime class of T; asInstanceOf casts each match to T
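The mechanics above can be sketched in a few lines of Java (the MiniListenerBus and Listener names are ours, not Spark's): a thread-safe listener list plus a postToAll that isolates listener failures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical single-method listener, standing in for the type parameter L.
interface Listener {
    void onEvent(String event);
}

class MiniListenerBus {
    // CopyOnWriteArrayList gives postToAll a consistent snapshot to iterate,
    // at the cost of copying the array on the (rare) add/remove.
    private final CopyOnWriteArrayList<Listener> listeners = new CopyOnWriteArrayList<>();

    void addListener(Listener l) { listeners.add(l); }

    void postToAll(String event) {
        for (Listener l : listeners) {
            try {
                l.onEvent(event);           // deliver to each listener in turn
            } catch (RuntimeException e) {  // a failing listener must not stop the rest
                System.err.println("Listener threw: " + e.getMessage());
            }
        }
    }
}
```

As in ListenerBus.postToAll, a listener that throws does not prevent later listeners from seeing the event.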
The ListenerBus class hierarchy
- SparkListenerBus: a trait. Its doPostEvent() delivers SparkListenerEvent events to SparkListenerInterface listeners; SparkListenerInterface and SparkListenerEvent are traits as well
- AsyncEventQueue: implements start, stop, dispatch, and post for events, which are buffered in a LinkedBlockingQueue
- ReplayListenerBus: replays events from a serialized event stream
- ExternalCatalog: an abstract class. It abstracts operations on databases, tables, and functions, and calls postToAll with the corresponding pre/after events around each operation
- HiveExternalCatalog: the concrete implementation that operates on the Hive metastore
- InMemoryCatalog: keeps and manipulates database and table metadata in HashMaps
- StreamingListenerBus: listens for events from the various phases of Spark Streaming
- StreamingQueryListenerBus: events for Structured Streaming queries
SparkListenerBus in detail
private[spark] trait SparkListenerBus
extends ListenerBus[SparkListenerInterface, SparkListenerEvent] {
protected override def doPostEvent(
listener: SparkListenerInterface,
event: SparkListenerEvent): Unit = {
event match {
case stageSubmitted: SparkListenerStageSubmitted =>
listener.onStageSubmitted(stageSubmitted)
case stageCompleted: SparkListenerStageCompleted =>
listener.onStageCompleted(stageCompleted)
...
case _ => listener.onOtherEvent(event)
}
}
}
SparkListenerBus extends ListenerBus and implements doPostEvent(). SparkListenerInterface is the trait abstraction for listeners, and SparkListenerEvent for events. SparkListener is an abstract class that serves as a default adapter for SparkListenerInterface, providing empty implementations of every method
SparkListenerBus is what binds each concrete event type to the corresponding listener callback
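The binding done by doPostEvent can be mimicked in Java with an instanceof chain in place of Scala's pattern match (the event and listener types below are hypothetical stand-ins, not Spark classes):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical concrete event types, standing in for SparkListenerStageSubmitted etc.
class StageSubmitted {}
class StageCompleted {}

// Stand-in for SparkListenerInterface: one callback per event type, plus a fallback.
interface SparkStyleListener {
    void onStageSubmitted(StageSubmitted e);
    void onStageCompleted(StageCompleted e);
    void onOtherEvent(Object e);
}

class MiniSparkListenerBus {
    // The bus, not the listener, decides which callback each event maps to.
    static void doPostEvent(SparkStyleListener listener, Object event) {
        if (event instanceof StageSubmitted) {
            listener.onStageSubmitted((StageSubmitted) event);
        } else if (event instanceof StageCompleted) {
            listener.onStageCompleted((StageCompleted) event);
        } else {
            listener.onOtherEvent(event); // fallback, like Spark's `case _`
        }
    }
}
```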
AsyncEventQueue in detail
AsyncEventQueue extends SparkListenerBus (and through it ListenerBus), so it owns the CopyOnWriteArrayList of listeners and adds a LinkedBlockingQueue of events. The listener type is SparkListenerInterface and the event type is SparkListenerEvent
private class AsyncEventQueue(val name: String, conf: SparkConf, metrics: LiveListenerBusMetrics)
extends SparkListenerBus
with Logging {
import AsyncEventQueue._
// Cap the capacity of the queue so we get an explicit error (rather than an OOM exception) if
// it's perpetually being added to more quickly than it's being drained.
// Bounded blocking queue (capacity defaults to 10000) that buffers the events
private val eventQueue = new LinkedBlockingQueue[SparkListenerEvent](
conf.get(LISTENER_BUS_EVENT_QUEUE_CAPACITY))
// Keep the event count separately, so that waitUntilEmpty() can be implemented properly;
// this allows that method to return only when the events in the queue have been fully
// processed (instead of just dequeued).
// Number of events not yet fully processed; eventQueue.size() would only count queued (not in-flight) events
private val eventCount = new AtomicLong()
/** A counter for dropped events. It will be reset every time we log it. */
// Number of dropped events; reset to zero each time it is logged: droppedEventsCounter.compareAndSet(droppedCount, 0)
private val droppedEventsCounter = new AtomicLong(0L)
/** When `droppedEventsCounter` was logged last time in milliseconds. */
@volatile private var lastReportTimestamp = 0L
private val logDroppedEvent = new AtomicBoolean(false)
private var sc: SparkContext = null
private val started = new AtomicBoolean(false)
private val stopped = new AtomicBoolean(false)
private val droppedEvents = metrics.metricRegistry.counter(s"queue.$name.numDroppedEvents")
private val processingTime = metrics.metricRegistry.timer(s"queue.$name.listenerProcessingTime")
// Remove the queue size gauge first, in case it was created by a previous incarnation of
// this queue that was removed from the listener bus.
metrics.metricRegistry.remove(s"queue.$name.size")
metrics.metricRegistry.register(s"queue.$name.size", new Gauge[Int] {
override def getValue: Int = eventQueue.size()
})
// Create a daemon dispatchThread that delivers the events
private val dispatchThread = new Thread(s"spark-listener-group-$name") {
setDaemon(true)
override def run(): Unit = Utils.tryOrStopSparkContext(sc) {
dispatch()
}
}
// Dispatch loop: take an event, check whether it is POISON_PILL, then deliver it to the listeners via postToAll
private def dispatch(): Unit = LiveListenerBus.withinListenerThread.withValue(true) {
try {
var next: SparkListenerEvent = eventQueue.take()
while (next != POISON_PILL) {
val ctx = processingTime.time()
try {
super.postToAll(next)
} finally {
ctx.stop()
}
eventCount.decrementAndGet()
// take() blocks until an event is available, so `next` always has a value and the loop keeps running
next = eventQueue.take()
}
eventCount.decrementAndGet()
} catch {
case ie: InterruptedException =>
logInfo(s"Stopping listener queue $name.", ie)
}
}
override protected def getTimer(listener: SparkListenerInterface): Option[Timer] = {
metrics.getTimerForListenerClass(listener.getClass.asSubclass(classOf[SparkListenerInterface]))
}
/**
* Start an asynchronous thread to dispatch events to the underlying listeners.
*
* @param sc Used to stop the SparkContext in case the async dispatcher fails.
*/
// Start the asynchronous dispatch thread
private[scheduler] def start(sc: SparkContext): Unit = {
if (started.compareAndSet(false, true)) {
this.sc = sc
dispatchThread.start()
} else {
throw new IllegalStateException(s"$name already started!")
}
}
/**
* Stop the listener bus. It will wait until the queued events have been processed, but new
* events will be dropped.
*/
// Stop dispatching: flip the stopped flag, put POISON_PILL into eventQueue, and wait for the thread via dispatchThread.join(). The poison pill makes for a graceful shutdown
private[scheduler] def stop(): Unit = {
if (!started.get()) {
throw new IllegalStateException(s"Attempted to stop $name that has not yet started!")
}
if (stopped.compareAndSet(false, true)) {
eventCount.incrementAndGet()
eventQueue.put(POISON_PILL)
}
dispatchThread.join()
}
// Receive an event
def post(event: SparkListenerEvent): Unit = {
// Return immediately once stopped
if (stopped.get()) {
return
}
eventCount.incrementAndGet()
// offer the event to eventQueue; on success return, otherwise drop the event
if (eventQueue.offer(event)) {
return
}
eventCount.decrementAndGet()
droppedEvents.inc()
// The queue is full: drop the event and increment droppedEventsCounter
droppedEventsCounter.incrementAndGet()
if (logDroppedEvent.compareAndSet(false, true)) {
// Only log the following message once to avoid duplicated annoying logs.
logError(s"Dropping event from queue $name. " +
"This likely means one of the listeners is too slow and cannot keep up with " +
"the rate at which tasks are being started by the scheduler.")
}
logTrace(s"Dropping event $event")
val droppedCount = droppedEventsCounter.get
if (droppedCount > 0) {
// Don't log too frequently
// Log at most once per minute
if (System.currentTimeMillis() - lastReportTimestamp >= 60 * 1000) {
// There may be multiple threads trying to decrease droppedEventsCounter.
// Use "compareAndSet" to make sure only one thread can win.
// And if another thread is increasing droppedEventsCounter, "compareAndSet" will fail and
// then that thread will update it.
// compareAndSet keeps this thread-safe. A double-checked lock on lastReportTimestamp would also work, but it would be heavier than CAS
if (droppedEventsCounter.compareAndSet(droppedCount, 0)) {
val prevLastReportTimestamp = lastReportTimestamp
lastReportTimestamp = System.currentTimeMillis()
val previous = new java.util.Date(prevLastReportTimestamp)
logWarning(s"Dropped $droppedCount events from $name since $previous.")
}
}
}
}
}
// A "poison pill" event marking the end of the queue's event stream
private object AsyncEventQueue {
val POISON_PILL = new SparkListenerEvent() { }
}
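The poison-pill shutdown can be isolated into a small Java sketch (names are ours): stop() enqueues a unique sentinel, the dispatch loop drains everything accepted before the sentinel, then exits, so no already-queued event is lost.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

class PoisonPillQueue {
    // A unique instance, compared by identity like Spark's POISON_PILL.
    private static final String POISON_PILL = new String("POISON");

    private final LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final AtomicInteger processed = new AtomicInteger();
    private final Thread dispatcher = new Thread(() -> {
        try {
            String next = queue.take();          // blocks until an event arrives
            while (next != POISON_PILL) {        // identity check: only the pill matches
                processed.incrementAndGet();     // "deliver" the event
                next = queue.take();
            }
        } catch (InterruptedException ignored) { }
    });

    void start() { dispatcher.start(); }
    void post(String event) { queue.offer(event); }

    int stop() {
        try {
            queue.put(POISON_PILL);  // everything queued before this is still processed
            dispatcher.join();       // wait for the drain to finish
        } catch (InterruptedException ignored) { }
        return processed.get();
    }
}
```

Because the queue is FIFO, every event posted before stop() is processed before the pill is seen.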
LiveListenerBus in detail
LiveListenerBus manages the AsyncEventQueue instances. It creates four of them, named shared, appStatus, executorManagement, and eventLog, and keeps them in a CopyOnWriteArrayList. Running several AsyncEventQueues improves event-handling responsiveness
post delivers an event to every queue, while each queue, identified by its name, has its own set of listeners. On start, every AsyncEventQueue in the list is started, i.e. AsyncEventQueue.start() is called in a loop, so multiple threads dispatch events concurrently
private val queues = new CopyOnWriteArrayList[AsyncEventQueue]()
/** Add a listener to queue shared by all non-internal listeners. */
def addToSharedQueue(listener: SparkListenerInterface): Unit = {
addToQueue(listener, SHARED_QUEUE)
}
/** Add a listener to the executor management queue. */
def addToManagementQueue(listener: SparkListenerInterface): Unit = {
addToQueue(listener, EXECUTOR_MANAGEMENT_QUEUE)
}
/** Add a listener to the application status queue. */
def addToStatusQueue(listener: SparkListenerInterface): Unit = {
addToQueue(listener, APP_STATUS_QUEUE)
}
/** Add a listener to the event log queue. */
def addToEventLogQueue(listener: SparkListenerInterface): Unit = {
addToQueue(listener, EVENT_LOG_QUEUE)
}
private[spark] def addToQueue(
listener: SparkListenerInterface,
queue: String): Unit = synchronized {
if (stopped.get()) {
throw new IllegalStateException("LiveListenerBus is stopped.")
}
queues.asScala.find(_.name == queue) match {
case Some(queue) =>
queue.addListener(listener)
case None =>
val newQueue = new AsyncEventQueue(queue, conf, metrics)
newQueue.addListener(listener)
if (started.get()) {
newQueue.start(sparkContext)
}
queues.add(newQueue)
}
}
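The find-or-create and fan-out behavior of LiveListenerBus can be sketched as follows (MiniLiveListenerBus and NamedQueue are our names; real queues also own listeners and a dispatch thread, elided here):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// A named queue that just records what it receives, standing in for AsyncEventQueue.
class NamedQueue {
    final String name;
    final List<String> received = new CopyOnWriteArrayList<>();
    NamedQueue(String name) { this.name = name; }
    void post(String event) { received.add(event); }
}

class MiniLiveListenerBus {
    private final CopyOnWriteArrayList<NamedQueue> queues = new CopyOnWriteArrayList<>();

    // Find-or-create by name, like LiveListenerBus.addToQueue.
    NamedQueue queue(String name) {
        for (NamedQueue q : queues) {
            if (q.name.equals(name)) return q;
        }
        NamedQueue q = new NamedQueue(name);
        queues.add(q);
        return q;
    }

    void post(String event) {
        for (NamedQueue q : queues) q.post(event); // every queue sees every event
    }
}
```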
Hadoop's AsyncDispatcher in detail
Hadoop's AsyncDispatcher serves the same purpose as Spark's AsyncEventQueue, though some implementation details differ
public interface Event<TYPE extends Enum<TYPE>> {
TYPE getType();
long getTimestamp();
String toString();
}
public interface EventHandler<T extends Event> {
void handle(T event);
}
public interface Dispatcher {
// Configuration to make sure dispatcher crashes but doesn't do system-exit in
// case of errors. By default, it should be false, so that tests are not
// affected. For all daemons it should be explicitly set to true so that
// daemons can crash instead of hanging around.
public static final String DISPATCHER_EXIT_ON_ERROR_KEY =
"yarn.dispatcher.exit-on-error";
public static final boolean DEFAULT_DISPATCHER_EXIT_ON_ERROR = false;
EventHandler getEventHandler();
void register(Class<? extends Enum> eventType, EventHandler handler);
}
EventHandler.handle(Event) is the complete method signature: an abstract handler interface EventHandler, its concrete handling method handle(), and an abstract event Event
How does a single EventHandler process several kinds of Event? With instanceof checks or a type field, one handle method can branch on the event kind; Hadoop introduces an enum-typed type for this: TYPE extends Enum<TYPE>
The Dispatcher interface binds an event type to an EventHandler. Note that eventType is a Class, not an individual enum constant: a JobEventHandler, for example, is registered once against the JobEventType class. If registration were per constant, it would have to be repeated for each one, e.g. JobEventType.JOB_INIT to JobEventHandler, JobEventType.JOB_START to JobEventHandler, and so on
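The trick that makes per-class registration work is Enum.getDeclaringClass(): at dispatch time any constant maps back to its declaring enum class, which is the HashMap key. A minimal sketch (MiniDispatcher and the two enums are hypothetical; YARN's real types are e.g. JobEventType and full Event objects):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

enum JobEventType { JOB_INIT, JOB_START }
enum TaskEventType { TASK_KILL }

class MiniDispatcher {
    // One registration per enum *class* covers all of its constants.
    private final Map<Class<? extends Enum>, Consumer<Enum<?>>> handlers = new HashMap<>();

    void register(Class<? extends Enum> type, Consumer<Enum<?>> handler) {
        handlers.put(type, handler);
    }

    void dispatch(Enum<?> eventType) {
        // getDeclaringClass() maps JOB_INIT and JOB_START back to JobEventType.class,
        // so one handler serves every constant of the enum.
        Consumer<Enum<?>> h = handlers.get(eventType.getDeclaringClass());
        if (h == null) throw new IllegalStateException("No handler for " + eventType);
        h.accept(eventType);
    }
}
```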
public abstract class AbstractEvent<TYPE extends Enum<TYPE>>
implements Event<TYPE> {
private final TYPE type;
private final long timestamp;
// use this if you DON'T care about the timestamp
public AbstractEvent(TYPE type) {
this.type = type;
// We're not generating a real timestamp here. It's too expensive.
timestamp = -1L;
}
// use this if you care about the timestamp
public AbstractEvent(TYPE type, long timestamp) {
this.type = type;
this.timestamp = timestamp;
}
@Override
public long getTimestamp() {
return timestamp;
}
@Override
public TYPE getType() {
return type;
}
@Override
public String toString() {
return "EventType: " + getType();
}
}
Since only concrete Event subclasses are ever instantiated, Hadoop defines the abstract class AbstractEvent, moving the shared type and timestamp fields into its constructors. Declared as a plain class it could be instantiated, but an AbstractEvent instance would be meaningless on its own, hence abstract
public class AsyncDispatcher extends AbstractService implements Dispatcher {
private static final Log LOG = LogFactory.getLog(AsyncDispatcher.class);
private final BlockingQueue<Event> eventQueue;
private volatile boolean stopped = false;
// Configuration flag for enabling/disabling draining dispatcher's events on
// stop functionality.
private volatile boolean drainEventsOnStop = false;
// Indicates all the remaining dispatcher's events on stop have been drained
// and processed.
private volatile boolean drained = true;
private Object waitForDrained = new Object();
// For drainEventsOnStop enabled only, block newly coming events into the
// queue while stopping.
private volatile boolean blockNewEvents = false;
private EventHandler handlerInstance = null;
private Thread eventHandlingThread;
protected final Map<Class<? extends Enum>, EventHandler> eventDispatchers;
private boolean exitOnDispatchException;
// A LinkedBlockingQueue with no capacity bound by default
public AsyncDispatcher() {
this(new LinkedBlockingQueue<Event>());
}
public AsyncDispatcher(BlockingQueue<Event> eventQueue) {
super("Dispatcher");
this.eventQueue = eventQueue;
this.eventDispatchers = new HashMap<Class<? extends Enum>, EventHandler>();
}
Runnable createThread() {
return new Runnable() {
@Override
public void run() {
while (!stopped && !Thread.currentThread().isInterrupted()) {
drained = eventQueue.isEmpty();
// blockNewEvents is only set when dispatcher is draining to stop,
// adding this check is to avoid the overhead of acquiring the lock
// and calling notify every time in the normal run of the loop.
if (blockNewEvents) {
synchronized (waitForDrained) {
if (drained) {
waitForDrained.notify();
}
}
}
Event event;
try {
event = eventQueue.take();
} catch(InterruptedException ie) {
if (!stopped) {
LOG.warn("AsyncDispatcher thread interrupted", ie);
}
return;
}
if (event != null) {
dispatch(event);
}
}
}
};
}
@Override
protected void serviceInit(Configuration conf) throws Exception {
this.exitOnDispatchException =
conf.getBoolean(Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY,
Dispatcher.DEFAULT_DISPATCHER_EXIT_ON_ERROR);
super.serviceInit(conf);
}
@Override
protected void serviceStart() throws Exception {
//start all the components
super.serviceStart();
eventHandlingThread = new Thread(createThread());
eventHandlingThread.setName("AsyncDispatcher event handler");
eventHandlingThread.start();
}
public void setDrainEventsOnStop() {
drainEventsOnStop = true;
}
@Override
protected void serviceStop() throws Exception {
if (drainEventsOnStop) {
blockNewEvents = true;
LOG.info("AsyncDispatcher is draining to stop, igonring any new events.");
synchronized (waitForDrained) {
while (!drained && eventHandlingThread != null
&& eventHandlingThread.isAlive()) {
waitForDrained.wait(1000);
LOG.info("Waiting for AsyncDispatcher to drain. Thread state is :" +
eventHandlingThread.getState());
}
}
}
stopped = true;
if (eventHandlingThread != null) {
eventHandlingThread.interrupt();
try {
eventHandlingThread.join();
} catch (InterruptedException ie) {
LOG.warn("Interrupted Exception while stopping", ie);
}
}
// stop all the components
super.serviceStop();
}
@SuppressWarnings("unchecked")
protected void dispatch(Event event) {
//all events go thru this loop
if (LOG.isDebugEnabled()) {
LOG.debug("Dispatching the event " + event.getClass().getName() + "."
+ event.toString());
}
Class<? extends Enum> type = event.getType().getDeclaringClass();
try{
EventHandler handler = eventDispatchers.get(type);
if(handler != null) {
handler.handle(event);
} else {
throw new Exception("No handler for registered for " + type);
}
} catch (Throwable t) {
//TODO Maybe log the state of the queue
LOG.fatal("Error in dispatcher thread", t);
// If serviceStop is called, we should exit this thread gracefully.
if (exitOnDispatchException
&& (ShutdownHookManager.get().isShutdownInProgress()) == false
&& stopped == false) {
Thread shutDownThread = new Thread(createShutDownThread());
shutDownThread.setName("AsyncDispatcher ShutDown handler");
shutDownThread.start();
}
}
}
@SuppressWarnings("unchecked")
@Override
public void register(Class<? extends Enum> eventType,
EventHandler handler) {
/* check to see if we have a listener registered */
EventHandler<Event> registeredHandler = (EventHandler<Event>)
eventDispatchers.get(eventType);
LOG.info("Registering " + eventType + " for " + handler.getClass());
if (registeredHandler == null) {
eventDispatchers.put(eventType, handler);
} else if (!(registeredHandler instanceof MultiListenerHandler)){
/* for multiple listeners of an event add the multiple listener handler */
MultiListenerHandler multiHandler = new MultiListenerHandler();
multiHandler.addHandler(registeredHandler);
multiHandler.addHandler(handler);
eventDispatchers.put(eventType, multiHandler);
} else {
/* already a multilistener, just add to it */
MultiListenerHandler multiHandler
= (MultiListenerHandler) registeredHandler;
multiHandler.addHandler(handler);
}
}
@Override
public EventHandler getEventHandler() {
if (handlerInstance == null) {
handlerInstance = new GenericEventHandler();
}
return handlerInstance;
}
class GenericEventHandler implements EventHandler<Event> {
public void handle(Event event) {
if (blockNewEvents) {
return;
}
drained = false;
/* all this method does is enqueue all the events onto the queue */
int qSize = eventQueue.size();
if (qSize !=0 && qSize %1000 == 0) {
LOG.info("Size of event-queue is " + qSize);
}
int remCapacity = eventQueue.remainingCapacity();
if (remCapacity < 1000) {
LOG.warn("Very low remaining capacity in the event-queue: "
+ remCapacity);
}
try {
eventQueue.put(event);
} catch (InterruptedException e) {
if (!stopped) {
LOG.warn("AsyncDispatcher thread interrupted", e);
}
// Need to reset drained flag to true if event queue is empty,
// otherwise dispatcher will hang on stop.
drained = eventQueue.isEmpty();
throw new YarnRuntimeException(e);
}
};
}
/**
* Multiplexing an event. Sending it to different handlers that
* are interested in the event.
* @param <T> the type of event these multiple handlers are interested in.
*/
// Fans a single event out to the multiple EventHandlers interested in it
static class MultiListenerHandler implements EventHandler<Event> {
List<EventHandler<Event>> listofHandlers;
public MultiListenerHandler() {
listofHandlers = new ArrayList<EventHandler<Event>>();
}
@Override
public void handle(Event event) {
for (EventHandler<Event> handler: listofHandlers) {
handler.handle(event);
}
}
void addHandler(EventHandler<Event> handler) {
listofHandlers.add(handler);
}
}
Runnable createShutDownThread() {
return new Runnable() {
@Override
public void run() {
LOG.info("Exiting, bbye..");
System.exit(-1);
}
};
}
@VisibleForTesting
protected boolean isEventThreadWaiting() {
return eventHandlingThread.getState() == Thread.State.WAITING;
}
@VisibleForTesting
protected boolean isDrained() {
return this.drained;
}
}
AsyncDispatcher vs. AsyncEventQueue
Similarities and differences
Similarities:
- Both keep events in a LinkedBlockingQueue
- Both are started and stopped via start/stop methods
- Both run a separate thread that loops over the queue and dispatches events
- Both expose a way to produce events: AsyncDispatcher through the inner GenericEventHandler.handle, AsyncEventQueue through post
Differences:
- AsyncDispatcher provides register to bind an Event type to an EventHandler, and handle branches on the event's type; AsyncEventQueue provides doPostEvent, which maps each Event to the corresponding method of the listener interface (SparkListenerBus.doPostEvent)
When AsyncDispatcher dispatches an event, it looks up the registered EventHandler in a HashMap keyed by the event's type and calls handle. When AsyncEventQueue dispatches, it has to iterate over all listeners and call doPostEvent on each, pattern-matching to invoke the right callback
Hadoop's abstractions are Event, EventHandler, and handle(). Spark abstracts one step further with ListenerBus[L, E]: SparkListenerInterface, StreamingListener, and ExternalCatalogEventListener instantiate the listener parameter L; SparkListenerEvent, StreamingListenerEvent, and ExternalCatalogEvent instantiate the event parameter E
Usage scenarios
AsyncDispatcher:
ResourceManager, NodeManager, MRAppMaster, and similar classes call createDispatcher() in their serviceInit() methods to create an AsyncDispatcher. Each of these classes handles a whole family of enum event classes, not just one
rmDispatcher.register(NodesListManagerEventType.class, nodesListManager);
rmDispatcher.register(SchedulerEventType.class, schedulerDispatcher);
rmDispatcher.register(RMAppManagerEventType.class, rmAppManager);
Rather than acting as one global event dispatcher, each major class owns its own AsyncDispatcher, which improves event-handling responsiveness
AsyncEventQueue:
LiveListenerBus is the unified manager of the AsyncEventQueues, which is architecturally cleaner than Hadoop's one-dispatcher-per-class approach. SparkContext creates the LiveListenerBus during initialization and passes references to JobScheduler, DAGScheduler, and others, so LiveListenerBus must be thread-safe