Azkaban Learning


title: Azkaban Learning
date: 2017-01-11 11:54:03
tags: [Azkaban, scheduling system, big data components]
categories: "scheduling system"


Azkaban

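Keywords: introduction to Azkaban, big data job scheduling system

This article is for readers who already have some familiarity with Azkaban. I suggest skimming these first:
Azkaban developer documentation: http://azkaban.github.io/azkaban/docs/latest/#overview
Qiangzi's source code analysis: https://my.oschina.net/qiangzigege/blog/653198
(Parts of what follows are excerpted from these two links.)

Azkaban source code: git clone https://github.com/azkaban/azkaban.git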

Introduction to Azkaban

Azkaban was implemented at LinkedIn to solve the problem of Hadoop job dependencies. We had jobs that needed to run in order, from ETL jobs to data analytics products.
Initially a single server solution, with the increased number of Hadoop users over the years, Azkaban has evolved to be a more robust solution.


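Azkaban's architecture consists of three main components:

  • WebServer: exposes a RESTful API and handles job dispatching and scheduling;
  • ExecServer: exposes an API to the WebServer and executes jobs;
  • MySQL: stores data and lets the Web and Exec servers share data and synchronize part of their state.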
azkaban.png

In multi-executor mode, the architecture looks like this in a bit more detail (the MySQL database is omitted from the diagram):

myAzkaban.png

The design is simple and intuitive.


WebServer

Exposing the RESTful API

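The azkaban-webserver project makes it easy to see which servlets are exposed externally. The most important ones are:

  • ExecutorServlet: executes a flow immediately, cancels or pauses a flow, and fetches flow or node logs
  • ScheduleServlet: sets schedules, sets SLA alert rules, and fetches schedule information
  • HistoryServlet: shows the execution history of flows
  • ProjectManagerServlet: uploads and downloads project zip packages, deletes projects, and fetches a flow's DAG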
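
As a rough illustration of what these servlets look like from a client's side, the sketch below builds the URL for an "execute flow" request. The /executor?ajax=executeFlow endpoint and the session.id, project, and flow parameters follow Azkaban's documented AJAX API as far as I know, so treat them as assumptions and check them against your version. The helper class ExecuteFlowUrl, the host name, and the sample values are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of Azkaban: builds the URL of an "execute flow" AJAX call.
final class ExecuteFlowUrl {
    private ExecuteFlowUrl() {}

    // sessionId is assumed to come from a previous login request.
    static String build(String baseUrl, String sessionId, String project, String flow) {
        return baseUrl + "/executor?ajax=executeFlow"
            + "&session.id=" + encode(sessionId)
            + "&project=" + encode(project)
            + "&flow=" + encode(flow);
    }

    // URL-encodes a single query parameter value.
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

The sketch only assembles the string; sending it with an HTTP client and handling the JSON response are left out.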

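Dispatching jobs

ExecutorManager is mainly responsible for this. Every execution, whether triggered immediately or by a schedule, is submitted through submitExecutableFlow(ExecutableFlow exflow, String userId).

Inside that method, the execution instance is put into a dispatch queue in multi-executor mode; in single-executor mode, dispatch is called right away.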

if (isMultiExecutorMode()) {
    //Take MultiExecutor route
    executorLoader.addActiveExecutableReference(reference);
    queuedFlows.enqueue(exflow, reference);
} else {
    // assign only local executor we have
    Executor choosenExecutor = activeExecutors.iterator().next();
    executorLoader.addActiveExecutableReference(reference);
    try {
        dispatch(reference, exflow, choosenExecutor);
    } catch (ExecutorManagerException e) {
        executorLoader.removeActiveExecutableReference(reference.getExecId());
        throw e;
    }
}

In multi-executor mode the execution instance sits in the dispatch queue, which the QueueProcessorThread thread processes periodically.

    /* Method responsible for processing the non-dispatched flows */
    private void processQueuedFlows(long activeExecutorsRefreshWindow,
      int maxContinuousFlowProcessed) throws InterruptedException,
      ExecutorManagerException {
      long lastExecutorRefreshTime = 0;
      Pair<ExecutionReference, ExecutableFlow> runningCandidate;
      int currentContinuousFlowProcessed = 0;

      while (isActive() && (runningCandidate = queuedFlows.fetchHead()) != null) {
        ExecutionReference reference = runningCandidate.getFirst();
        ExecutableFlow exflow = runningCandidate.getSecond();

        long currentTime = System.currentTimeMillis();

        // if we have dispatched more than maxContinuousFlowProcessed or
        // It has been more then activeExecutorsRefreshWindow millisec since we
        // refreshed
        if (currentTime - lastExecutorRefreshTime > activeExecutorsRefreshWindow
          || currentContinuousFlowProcessed >= maxContinuousFlowProcessed) {
          // Refresh executorInfo for all activeExecutors
          refreshExecutors();
          lastExecutorRefreshTime = currentTime;
          currentContinuousFlowProcessed = 0;
        }

        /**
         * <pre>
         *  TODO: Work around till we improve Filters to have a notion of GlobalSystemState.
         *        Currently we try each queued flow once to infer a global busy state
         * Possible improvements:-
         *   1. Move system level filters in refreshExecutors and sleep if we have all executors busy after refresh
         *   2. Implement GlobalSystemState in selector or in a third place to manage system filters. Basically
         *      taking out all the filters which do not depend on the flow but are still being part of Selector.
         * Assumptions:-
         *   1. no one else except QueueProcessor is updating ExecutableFlow update time
         *   2. re-attempting a flow (which has been tried before) is considered as all executors are busy
         * </pre>
         */
        if(exflow.getUpdateTime() > lastExecutorRefreshTime) {
          // put back in the queue
          queuedFlows.enqueue(exflow, reference);
          long sleepInterval =
            activeExecutorsRefreshWindow
              - (currentTime - lastExecutorRefreshTime);
          // wait till next executor refresh
          sleep(sleepInterval);
        } else {
          exflow.setUpdateTime(currentTime);
          // process flow with current snapshot of activeExecutors
          selectExecutorAndDispatchFlow(reference, exflow, new HashSet<Executor>(activeExecutors));
        }

        // do not count failed flow processsing (flows still in queue)
        if(queuedFlows.getFlow(exflow.getExecutionId()) == null) {
          currentContinuousFlowProcessed++;
        }
      }
    }
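
The two timing decisions in the loop above are easier to see in isolation. The sketch below is a simplified restatement, not Azkaban code: QueueTimingSketch and its method names are hypothetical, and clamping the sleep interval at zero is my own addition.

```java
// Hypothetical helper that isolates the two timing decisions made in processQueuedFlows.
final class QueueTimingSketch {
    private QueueTimingSketch() {}

    // True when executor info is stale or too many flows were dispatched in a row.
    static boolean shouldRefresh(long now, long lastRefresh, long refreshWindowMs,
                                 int processedInARow, int maxInARow) {
        return now - lastRefresh > refreshWindowMs || processedInARow >= maxInARow;
    }

    // Time left until the next refresh is due; clamped at zero (an assumption of this sketch).
    static long sleepInterval(long now, long lastRefresh, long refreshWindowMs) {
        return Math.max(0L, refreshWindowMs - (now - lastRefresh));
    }
}
```

For example, with a 5,000 ms refresh window and a refresh at time 0, a flow that has to be put back in the queue at time 3,000 makes the thread sleep for 2,000 ms.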

selectExecutorAndDispatchFlow first selects an executor (the selection logic is rather interesting) and then, once one is chosen, also ends up calling dispatch to hand off the job.

/* process flow with a snapshot of available Executors */
    private void selectExecutorAndDispatchFlow(ExecutionReference reference,
      ExecutableFlow exflow, Set<Executor> availableExecutors)
      throws ExecutorManagerException {
      synchronized (exflow) {
        Executor selectedExecutor = selectExecutor(exflow, availableExecutors);
        if (selectedExecutor != null) {
          try {
            dispatch(reference, exflow, selectedExecutor);
          } catch (ExecutorManagerException e) {
            logger.warn(String.format(
              "Executor %s responded with exception for exec: %d",
              selectedExecutor, exflow.getExecutionId()), e);
            handleDispatchExceptionCase(reference, exflow, selectedExecutor,
              availableExecutors);
          }
        } else {
          handleNoExecutorSelectedCase(reference, exflow);
        }
      }
    }
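
selectExecutor itself is not shown above. One common way to build such a selector is to filter out unsuitable executors and then rank the rest; the sketch below shows that idea only. ExecutorInfo, its fields, and the concrete rules (remaining capacity first, lower CPU usage as the tie-breaker) are assumptions for illustration, not Azkaban's actual filters or comparators.

```java
import java.util.Collection;

// Illustrative only: a filter-then-rank selector with hypothetical fields and rules.
final class SelectorSketch {
    static final class ExecutorInfo {
        final String host;
        final int remainingFlowCapacity;
        final double cpuUsage; // 0.0 to 1.0

        ExecutorInfo(String host, int remainingFlowCapacity, double cpuUsage) {
            this.host = host;
            this.remainingFlowCapacity = remainingFlowCapacity;
            this.cpuUsage = cpuUsage;
        }
    }

    // Returns the best executor, or null when every candidate is filtered out.
    static ExecutorInfo select(Collection<ExecutorInfo> candidates, double maxCpuUsage) {
        ExecutorInfo best = null;
        for (ExecutorInfo candidate : candidates) {
            boolean passesFilters =
                candidate.remainingFlowCapacity > 0 && candidate.cpuUsage <= maxCpuUsage;
            if (passesFilters && (best == null || isBetter(candidate, best))) {
                best = candidate;
            }
        }
        return best;
    }

    // More remaining capacity wins; on a tie, the lower CPU usage wins.
    private static boolean isBetter(ExecutorInfo a, ExecutorInfo b) {
        if (a.remainingFlowCapacity != b.remainingFlowCapacity) {
            return a.remainingFlowCapacity > b.remainingFlowCapacity;
        }
        return a.cpuUsage < b.cpuUsage;
    }
}
```

A null result corresponds to the handleNoExecutorSelectedCase branch in the method above.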

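Because the Web and Exec servers share data through MySQL, the dispatch logic is very simple: it sends an HTTP request carrying the execId and a few other fields, and everything else the executor needs is read from and written to the database.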
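
The "pass only the id, share the rest through the database" pattern can be modeled in a few lines. In this toy sketch a map stands in for MySQL and a plain method call stands in for the HTTP request; every class and method name is hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model: a map stands in for MySQL, a method call stands in for the HTTP request.
final class SharedStoreSketch {
    // Stand-in for the shared table of executable flows.
    static final class FlowStore {
        private final Map<Integer, String> flows = new ConcurrentHashMap<Integer, String>();

        void save(int execId, String flowDefinition) { flows.put(execId, flowDefinition); }

        String load(int execId) { return flows.get(execId); }
    }

    // Stand-in for the executor: it receives only an id and loads the rest itself.
    static final class ExecSide {
        private final FlowStore store;

        ExecSide(FlowStore store) { this.store = store; }

        String execute(int execId) {
            String flow = store.load(execId);
            if (flow == null) {
                throw new IllegalStateException("Error loading flow with exec " + execId);
            }
            return "running " + flow;
        }
    }

    // Stand-in for the web server: persist first, then send just the id.
    static String submit(FlowStore store, ExecSide exec, int execId, String flowDefinition) {
        store.save(execId, flowDefinition);
        return exec.execute(execId);
    }
}
```

The request stays tiny no matter how large the flow definition is, at the cost of making the database a hard dependency for every dispatch.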

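Scheduling jobs

Scheduling is one of the most important features of a scheduling system, and it is also one of the more complex modules in Azkaban. Externally, scheduling is exposed through ScheduleManager, whose data structure is Schedule; internally it is implemented by TriggerManager, whose data structure is Trigger.

All schedule information comes in through ScheduleManager.scheduleFlow. Its parameters include the project id, project name, flow name, first schedule timestamp, time zone, schedule period, next execution time, submit time, and submitting user. For a schedule, the key pieces of information are the first schedule time and the period.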

public Schedule scheduleFlow(final int scheduleId, final int projectId,
      final String projectName, final String flowName, final String status,
      final long firstSchedTime, final DateTimeZone timezone,
      final ReadablePeriod period, final long lastModifyTime,
      final long nextExecTime, final long submitTime, final String submitUser)
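
To see why the first schedule time and the period are enough, here is a minimal sketch that derives the next execution time from them. It assumes a fixed-length period in milliseconds and ignores time zones and calendar-based periods, which the real signature handles through DateTimeZone and ReadablePeriod; NextRunSketch is a hypothetical name.

```java
// Hypothetical helper: computes the first run time strictly after "now".
final class NextRunSketch {
    private NextRunSketch() {}

    static long nextExecTime(long firstSchedTimeMs, long periodMs, long nowMs) {
        if (periodMs <= 0) {
            throw new IllegalArgumentException("period must be positive");
        }
        if (firstSchedTimeMs > nowMs) {
            return firstSchedTimeMs; // the schedule has not started yet
        }
        // Number of whole periods that have already elapsed since the first run.
        long elapsedPeriods = (nowMs - firstSchedTimeMs) / periodMs;
        return firstSchedTimeMs + (elapsedPeriods + 1) * periodMs;
    }
}
```

With a first run at 1,000, a period of 100, and a current time of 1,250, the next run is at 1,300.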

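Following scheduleFlow downward leads to TriggerBasedScheduleLoader.insertSchedule. This method first converts the Schedule into a Trigger and then hands the Trigger to TriggerManager. The scheduleToTrigger method is concise and neatly written; it is left for the reader to study and is not analyzed in detail here.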

 @Override
  public void insertSchedule(Schedule s) throws ScheduleManagerException {
    Trigger t = scheduleToTrigger(s);
    try {
      triggerManager.insertTrigger(t, t.getSubmitUser());
      s.setScheduleId(t.getTriggerId());
    } catch (TriggerManagerException e) {
      throw new ScheduleManagerException("Failed to insert new schedule!", e);
    }
  }

Next, let's see what happens to the Trigger once it reaches TriggerManager. As shown below, triggerLoader first writes it to the database, and then it is added to a thread, runnerThread.

public void insertTrigger(Trigger t) throws TriggerManagerException {
    synchronized (syncObj) {
      try {
        triggerLoader.addTrigger(t);
      } catch (TriggerLoaderException e) {
        throw new TriggerManagerException(e);
      }
      runnerThread.addTrigger(t);
      triggerIdMap.put(t.getTriggerId(), t);
    }
  }

The rest follows naturally: runnerThread, a TriggerScannerThread, periodically checks whether each Trigger should fire (onTriggerTrigger) or expire (onTriggerExpire).

    private void checkAllTriggers() throws TriggerManagerException {
      long now = System.currentTimeMillis();

      // sweep through the rest of them
      for (Trigger t : triggers) {
        try {
          scannerStage = "Checking for trigger " + t.getTriggerId();

          boolean shouldSkip = true;
          if (shouldSkip && t.getInfo() != null && t.getInfo().containsKey("monitored.finished.execution")) {
            int execId = Integer.valueOf((String) t.getInfo().get("monitored.finished.execution"));
            if (justFinishedFlows.containsKey(execId)) {
              logger.info("Monitored execution has finished. Checking trigger earlier " + t.getTriggerId());
              shouldSkip = false;
            }
          }
          if (shouldSkip && t.getNextCheckTime() > now) {
            shouldSkip = false;
          }

          if (shouldSkip) {
            logger.info("Skipping trigger" + t.getTriggerId() + " until " + t.getNextCheckTime());
          }

          if (logger.isDebugEnabled()) {
            logger.info("Checking trigger " + t.getTriggerId());
          }
          if (t.getStatus().equals(TriggerStatus.READY)) {
            if (t.triggerConditionMet()) {
              onTriggerTrigger(t);
            } else if (t.expireConditionMet()) {
              onTriggerExpire(t);
            }
          }
          if (t.getStatus().equals(TriggerStatus.EXPIRED) && t.getSource().equals("azkaban")) {
            removeTrigger(t);
          } else {
            t.updateNextCheckTime();
          }
        } catch (Throwable th) {
          //skip this trigger, moving on to the next one
          logger.error("Failed to process trigger with id : " + t.getTriggerId(), th);
        }
      }
    }
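
Stripped of logging and the skip bookkeeping, one scanner pass boils down to "fire, expire, or keep waiting". The sketch below models only one-shot triggers and leaves out next-check times and resetting a trigger for its next period; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Simplified model of one scanner pass over one-shot triggers.
final class TriggerScanSketch {
    enum Status { READY, FIRED, EXPIRED }

    static final class SimpleTrigger {
        final String id;
        final BooleanSupplier triggerCondition;
        final BooleanSupplier expireCondition;
        Status status = Status.READY;

        SimpleTrigger(String id, BooleanSupplier triggerCondition, BooleanSupplier expireCondition) {
            this.id = id;
            this.triggerCondition = triggerCondition;
            this.expireCondition = expireCondition;
        }
    }

    // Fires or expires each READY trigger, drops the finished ones, returns the fired ids.
    static List<String> scanOnce(List<SimpleTrigger> triggers) {
        List<String> fired = new ArrayList<String>();
        for (SimpleTrigger t : triggers) {
            if (t.status != Status.READY) {
                continue;
            }
            if (t.triggerCondition.getAsBoolean()) {
                t.status = Status.FIRED;
                fired.add(t.id);
            } else if (t.expireCondition.getAsBoolean()) {
                t.status = Status.EXPIRED;
            }
        }
        // The list must be mutable; only triggers that are still waiting remain.
        triggers.removeIf(done -> done.status != Status.READY);
        return fired;
    }
}
```

As in the original loop, the trigger condition is checked before the expire condition, so a trigger whose conditions are both true fires rather than expires.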

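When a Trigger fires, it calls its own action.doAction(). For a scheduled job the action is usually an ExecuteFlowAction, whose doAction method is shown below. The method does two main things: it builds the execution instance ExecutableFlow, and, if the schedule has alert rules configured, it builds an SLA trigger.

Once the execution instance is built, executorManager.submitExecutableFlow(exflow, submitUser) is called to dispatch it, so from here on the path is the same as the job dispatching described earlier and is not analyzed again.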

  @Override
  public void doAction() throws Exception {
    if (projectManager == null || executorManager == null) {
      throw new Exception("ExecuteFlowAction not properly initialized!");
    }

    Project project = projectManager.getProject(projectId);
    if (project == null) {
      logger.error("Project to execute " + projectId + " does not exist!");
      throw new RuntimeException("Error finding the project to execute "
          + projectId);
    }

    Flow flow = project.getFlow(flowName);
    if (flow == null) {
      logger.error("Flow " + flowName + " cannot be found in project "
          + project.getName());
      throw new RuntimeException("Error finding the flow to execute "
          + flowName);
    }

    ExecutableFlow exflow = new ExecutableFlow(project, flow);
    exflow.setSubmitUser(submitUser);
    exflow.addAllProxyUsers(project.getProxyUsers());

    if (executionOptions == null) {
      executionOptions = new ExecutionOptions();
    }
    if (!executionOptions.isFailureEmailsOverridden()) {
      executionOptions.setFailureEmails(flow.getFailureEmails());
    }
    if (!executionOptions.isSuccessEmailsOverridden()) {
      executionOptions.setSuccessEmails(flow.getSuccessEmails());
    }
    exflow.setExecutionOptions(executionOptions);

    try {
      executorManager.submitExecutableFlow(exflow, submitUser);
      logger.info("Invoked flow " + project.getName() + "." + flowName);
    } catch (ExecutorManagerException e) {
      throw new RuntimeException(e);
    }

    // deal with sla
    if (slaOptions != null && slaOptions.size() > 0) {
      int execId = exflow.getExecutionId();
      for (SlaOption sla : slaOptions) {
        logger.info("Adding sla trigger " + sla.toString() + " to execution "
            + execId);
        SlaChecker slaFailChecker =
            new SlaChecker("slaFailChecker", sla, execId);
        Map<String, ConditionChecker> slaCheckers =
            new HashMap<String, ConditionChecker>();
        slaCheckers.put(slaFailChecker.getId(), slaFailChecker);
        Condition triggerCond =
            new Condition(slaCheckers, slaFailChecker.getId()
                + ".isSlaFailed()");
        // if whole flow finish before violate sla, just expire
        SlaChecker slaPassChecker =
            new SlaChecker("slaPassChecker", sla, execId);
        Map<String, ConditionChecker> expireCheckers =
            new HashMap<String, ConditionChecker>();
        expireCheckers.put(slaPassChecker.getId(), slaPassChecker);
        Condition expireCond =
            new Condition(expireCheckers, slaPassChecker.getId()
                + ".isSlaPassed()");
        List<TriggerAction> actions = new ArrayList<TriggerAction>();
        List<String> slaActions = sla.getActions();
        for (String act : slaActions) {
          if (act.equals(SlaOption.ACTION_ALERT)) {
            SlaAlertAction slaAlert =
                new SlaAlertAction("slaAlert", sla, execId);
            actions.add(slaAlert);
          } else if (act.equals(SlaOption.ACTION_CANCEL_FLOW)) {
            KillExecutionAction killAct =
                new KillExecutionAction("killExecution", execId);
            actions.add(killAct);
          }
        }
        Trigger slaTrigger =
            new Trigger("azkaban_sla", "azkaban", triggerCond, expireCond,
                actions);
        slaTrigger.getInfo().put("monitored.finished.execution",
            String.valueOf(execId));
        slaTrigger.setResetOnTrigger(false);
        slaTrigger.setResetOnExpire(false);
        logger.info("Ready to put in the sla trigger");
        triggerManager.insertTrigger(slaTrigger);
        logger.info("Sla inserted.");
      }
    }
  }
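
The SLA trigger built above pairs a trigger condition (isSlaFailed) with an expire condition (isSlaPassed). A minimal sketch of how the two might be evaluated, assuming the SLA is simply "finish within a fixed duration": the real SlaChecker supports more rule types, and SlaSketch and its parameters are hypothetical.

```java
// Simplified SLA evaluation for a "finish within slaMs" rule.
final class SlaSketch {
    enum Outcome { ALERT, EXPIRE, KEEP_WAITING }

    private SlaSketch() {}

    // finishedMs is the finish time, or -1 while the execution is still running.
    static Outcome check(long startMs, long finishedMs, long slaMs, long nowMs) {
        long deadline = startMs + slaMs;
        boolean finished = finishedMs >= 0;
        if (finished && finishedMs <= deadline) {
            return Outcome.EXPIRE; // SLA met: the trigger expires without acting
        }
        if (finished || nowMs > deadline) {
            return Outcome.ALERT; // finished late, or still running past the deadline
        }
        return Outcome.KEEP_WAITING;
    }
}
```

ALERT corresponds to running the configured actions (alerting or killing the execution), and EXPIRE to the case described by the comment "if whole flow finish before violate sla, just expire".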

WebServer summary

The diagram below gives a brief summary:

image.png

ExecServer

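Exposing the RESTful API

Azkaban has supported multi-executor deployment since 3.0. A single executor is fairly simple and exposes only a small API to the web server, mainly:

  • ExecutorServlet: provides endpoints for executing, cancelling, and pausing flows and for querying logs.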

Executing jobs

Here is a brief look at how an executor runs a job. In ExecutorServlet, every execution request goes through handleAjaxExecute, which simply passes the execution id to FlowRunnerManager:

private void handleAjaxExecute(HttpServletRequest req,
      Map<String, Object> respMap, int execId) throws ServletException {
    try {
      flowRunnerManager.submitFlow(execId);
    } catch (ExecutorManagerException e) {
      e.printStackTrace();
      logger.error(e);
      respMap.put(RESPONSE_ERROR, e.getMessage());
    }
  }

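FlowRunnerManager submits the flow for execution through submitFlow. It first loads the execution instance ExecutableFlow, then prepares the execution directory with setupFlow(flow), then creates a FlowRunner, and finally submits it to the thread pool with executorService.submit(runner).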

 public void submitFlow(int execId) throws ExecutorManagerException {
    // Load file and submit
    if (runningFlows.containsKey(execId)) {
      throw new ExecutorManagerException("Execution " + execId
          + " is already running.");
    }

    ExecutableFlow flow = null;
    flow = executorLoader.fetchExecutableFlow(execId);
    if (flow == null) {
      throw new ExecutorManagerException("Error loading flow with exec "
          + execId);
    }

    // Sets up the project files and execution directory.
    setupFlow(flow);

    // Setup flow runner
    FlowWatcher watcher = null;
    ExecutionOptions options = flow.getExecutionOptions();
    if (options.getPipelineExecutionId() != null) {
      Integer pipelineExecId = options.getPipelineExecutionId();
      FlowRunner runner = runningFlows.get(pipelineExecId);

      if (runner != null) {
        watcher = new LocalFlowWatcher(runner);
      } else {
        watcher = new RemoteFlowWatcher(pipelineExecId, executorLoader);
      }
    }

    int numJobThreads = numJobThreadPerFlow;
    if (options.getFlowParameters().containsKey(FLOW_NUM_JOB_THREADS)) {
      try {
        int numJobs =
            Integer.valueOf(options.getFlowParameters().get(
                FLOW_NUM_JOB_THREADS));
        if (numJobs > 0 && (numJobs <= numJobThreads || ProjectWhitelist
                .isProjectWhitelisted(flow.getProjectId(),
                    WhitelistType.NumJobPerFlow))) {
          numJobThreads = numJobs;
        }
      } catch (Exception e) {
        throw new ExecutorManagerException(
            "Failed to set the number of job threads "
                + options.getFlowParameters().get(FLOW_NUM_JOB_THREADS)
                + " for flow " + execId, e);
      }
    }

    FlowRunner runner =
        new FlowRunner(flow, executorLoader, projectLoader, jobtypeManager);
    runner.setFlowWatcher(watcher)
        .setJobLogSettings(jobLogChunkSize, jobLogNumFiles)
        .setValidateProxyUser(validateProxyUser)
        .setNumJobThreads(numJobThreads).addListener(this);

    configureFlowLevelMetrics(runner);

    // Check again.
    if (runningFlows.containsKey(execId)) {
      throw new ExecutorManagerException("Execution " + execId
          + " is already running.");
    }

    // Finally, queue the sucker.
    runningFlows.put(execId, runner);

    try {
      // The executorService already has a queue.
      // The submit method below actually returns an instance of FutureTask,
      // which implements interface RunnableFuture, which extends both
      // Runnable and Future interfaces
      Future<?> future = executorService.submit(runner);
      // keep track of this future
      submittedFlows.put(future, runner.getExecutionId());
      // update the last submitted time.
      this.lastFlowSubmittedDate = System.currentTimeMillis();
    } catch (RejectedExecutionException re) {
      throw new ExecutorManagerException(
          "Azkaban server can't execute any more flows. "
              + "The number of running flows has reached the system configured limit."
              + "Please notify Azkaban administrators");
    }
  }
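
One detail in submitFlow that is easy to miss is how a per-flow thread count override is accepted or ignored. The rule can be restated on its own as follows; JobThreadsSketch is a hypothetical helper, and the whitelist lookup is reduced to a boolean.

```java
// Restates the override rule from submitFlow as a pure function.
final class JobThreadsSketch {
    private JobThreadsSketch() {}

    // requested is the raw flow parameter value, or null when the parameter is absent.
    static int resolve(int defaultThreads, String requested, boolean projectWhitelisted) {
        if (requested == null) {
            return defaultThreads;
        }
        int n = Integer.parseInt(requested); // throws NumberFormatException on bad input
        if (n > 0 && (n <= defaultThreads || projectWhitelisted)) {
            return n;
        }
        return defaultThreads; // non-positive, or above the default without a whitelist entry
    }
}
```

So a flow can always lower its thread count, but it can raise it above the default only when its project is whitelisted.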

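FlowRunner itself implements Runnable, and its run method calls runFlow, shown below. The method walks the DAG level by level, as in a level-order tree traversal, and submits each job for execution in turn.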

private void runFlow() throws Exception {
    logger.info("Starting flows");
    runReadyJob(this.flow);
    updateFlow();

    while (!flowFinished) {
      synchronized (mainSyncObj) {
        if (flowPaused) {
          try {
            mainSyncObj.wait(CHECK_WAIT_MS);
          } catch (InterruptedException e) {
          }

          continue;
        } else {
          if (retryFailedJobs) {
            retryAllFailures();
          } else if (!progressGraph()) {
            try {
              mainSyncObj.wait(CHECK_WAIT_MS);
            } catch (InterruptedException e) {
            }
          }
        }
      }
    }

    logger.info("Finishing up flow. Awaiting Termination");
    executorService.shutdown();

    updateFlow();
    logger.info("Finished Flow");
  }
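
progressGraph is not shown above, but the idea of "run whatever has all of its dependencies finished" can be sketched as follows. This is a sequential simplification: the real FlowRunner runs jobs concurrently and reacts to completion events rather than fixed rounds. DagSketch and runInWaves are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Groups jobs by the round in which all of their dependencies are finished.
final class DagSketch {
    private DagSketch() {}

    // deps maps each job name to the names of the jobs it depends on.
    static List<List<String>> runInWaves(Map<String, Set<String>> deps) {
        Set<String> done = new HashSet<String>();
        List<List<String>> waves = new ArrayList<List<String>>();
        while (done.size() < deps.size()) {
            List<String> ready = new ArrayList<String>();
            for (Map.Entry<String, Set<String>> entry : deps.entrySet()) {
                boolean notRunYet = !done.contains(entry.getKey());
                if (notRunYet && done.containsAll(entry.getValue())) {
                    ready.add(entry.getKey());
                }
            }
            if (ready.isEmpty()) {
                throw new IllegalStateException("cycle or missing dependency");
            }
            Collections.sort(ready); // deterministic order within a wave
            waves.add(ready);
            done.addAll(ready);
        }
        return waves;
    }
}
```

For a diamond-shaped flow where b and c depend on a, and d depends on both b and c, the waves are [a], then [b, c], then [d].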

For each individual job, a JobRunner is finally constructed to execute it.

private void runExecutableNode(ExecutableNode node) throws IOException {
    // Collect output props from the job's dependencies.
    prepareJobProperties(node);

    node.setStatus(Status.QUEUED);
    JobRunner runner = createJobRunner(node);
    logger.info("Submitting job '" + node.getNestedId() + "' to run.");
    try {
      executorService.submit(runner);
      activeJobRunners.add(runner);
    } catch (RejectedExecutionException e) {
      logger.error(e);
    }
    ;
  }

 private JobRunner createJobRunner(ExecutableNode node) {
    // Load job file.
    File path = new File(execDir, node.getJobSource());

    JobRunner jobRunner =
        new JobRunner(node, path.getParentFile(), executorLoader,
            jobtypeManager);
    if (watcher != null) {
      jobRunner.setPipeline(watcher, pipelineLevel);
    }
    if (validateUserProxy) {
      jobRunner.setValidatedProxyUsers(proxyUsers);
    }

    jobRunner.setDelayStart(node.getDelayedExecution());
    jobRunner.setLogSettings(logger, jobLogFileSize, jobLogNumFiles);
    jobRunner.addListener(listener);

    if (JobCallbackManager.isInitialized()) {
      jobRunner.addListener(JobCallbackManager.getInstance());
    }

    configureJobLevelMetrics(jobRunner);

    return jobRunner;
  }

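When a jobRunner executes, it looks up the matching plugin in the plugin module to load the job type. Each job type has its own run method, and that run method is what finally executes the job. For the different job types, see Azkaban's default job types and the Hadoop-related job types implemented in the azkaban-plugin project.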

 try {
        job = jobtypeManager.buildJobExecutor(this.jobId, props, logger);
      } catch (JobTypeManagerException e) {
        logger.error("Failed to build job type", e);
        return false;
      }
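
The lookup behind buildJobExecutor can be pictured as a registry keyed by job type name. The sketch below uses plain factory functions instead of the class loading that a real plugin mechanism needs, so it only illustrates the dispatch-by-type idea; the Job interface, the registry class, and the two sample types are hypothetical stand-ins.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Toy registry: maps a job type name to a factory for that job.
final class JobTypeRegistrySketch {
    interface Job {
        String run();
    }

    static final class CommandJob implements Job {
        @Override
        public String run() { return "ran a shell command"; }
    }

    static final class NoopJob implements Job {
        @Override
        public String run() { return "did nothing"; }
    }

    private final Map<String, Supplier<Job>> factories = new HashMap<String, Supplier<Job>>();

    void register(String typeName, Supplier<Job> factory) {
        factories.put(typeName, factory);
    }

    // Builds a fresh job for the given type, or fails for an unknown type.
    Job build(String typeName) {
        Supplier<Job> factory = factories.get(typeName);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown job type: " + typeName);
        }
        return factory.get();
    }

    static JobTypeRegistrySketch withSampleTypes() {
        JobTypeRegistrySketch registry = new JobTypeRegistrySketch();
        registry.register("command", CommandJob::new);
        registry.register("noop", NoopJob::new);
        return registry;
    }
}
```

Adding a new job type then means registering one more factory, which is the property that makes a plugin mechanism convenient.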

Azkaban Plugin

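Azkaban's plugin mechanism makes it easy to add new plugin types and thereby support more kinds of jobs. Azkaban's Hadoop plugins can be found in the following repository: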

git clone https://github.com/azkaban/azkaban-plugins.git

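Plugin implementation

The class hierarchy of the plugins is shown below. Each plugin job starts its own process. ProcessJob is the class responsible for launching the process; JavaProcessJob extends it and specializes it to a Java process; the other Hadoop plugins in turn extend JavaProcessJob. To implement your own plugin type, extend JavaProcessJob and call the plugin's Wrapper class from the subclass. See the code for the details.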

image.png
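
The shape of that hierarchy can be mimicked with three small classes. These are stand-ins that only assemble a command line and never launch anything; the class names, the com.example.MyJobWrapper wrapper, and the arguments are all hypothetical and do not mirror the real ProcessJob or JavaProcessJob APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-ins for the ProcessJob -> JavaProcessJob -> custom plugin hierarchy.
final class PluginHierarchySketch {
    // Stand-in for ProcessJob: describes the process to launch.
    static abstract class ProcessJobSketch {
        abstract List<String> commandLine();
    }

    // Stand-in for JavaProcessJob: specializes the process to a JVM.
    static abstract class JavaProcessJobSketch extends ProcessJobSketch {
        abstract String mainClass();

        List<String> mainArgs() {
            return new ArrayList<String>();
        }

        @Override
        List<String> commandLine() {
            List<String> cmd = new ArrayList<String>(Arrays.asList("java", mainClass()));
            cmd.addAll(mainArgs());
            return cmd;
        }
    }

    // A hypothetical custom plugin: it only names its wrapper class and arguments.
    static final class MyHadoopJobSketch extends JavaProcessJobSketch {
        @Override
        String mainClass() { return "com.example.MyJobWrapper"; }

        @Override
        List<String> mainArgs() { return Arrays.asList("--input", "/data/in"); }
    }
}
```

The custom class contains no process-handling code at all, which is the point of inheriting from the Java process layer.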