Understanding Python's PoolExecutor

The demo code and the points discussed below are drawn from 《理解Python并發編程一篇就夠了|PoolExecutor篇》 by 董偉明 (also on his WeChat public account "Python之美"), from "Python Cookbook", and from "Python并發編程之線程池/進程池".

ThreadPoolExecutor and ProcessPoolExecutor are high-level abstractions over threading and multiprocessing respectively; they expose a simple, unified interface.

The examples below use ProcessPoolExecutor.

Each executor mainly provides two methods: map() and submit().

map() is mainly for the simple case of applying the same function to many inputs, as in this example:

# -*- coding: utf-8 -*-

from concurrent.futures import ProcessPoolExecutor
import random

def fib(n, test_arg):
    if n > 30:
        raise Exception('can not > 30, now %s' % n)
    if n <= 2:
        return 1
    return fib(n-1, test_arg) + fib(n-2, test_arg)

def use_map():
    nums = [random.randint(0, 33) for _ in range(0, 10)]
    with ProcessPoolExecutor() as executor:
        try:
            results = executor.map(fib, nums, nums)
            for num, result in zip(nums, results):
                print('fib(%s) result is %s.' % (num, result))
        except Exception as e:
            print(e)

Running the example prints output like the following. When a num over 30 is reached, the exception is raised and caught, and the program stops producing results:

...
fib(19) result is 4181.
fib(11) result is 89.
fib(2) result is 1.
fib(5) result is 5.
fib(24) result is 46368.
fib(2) result is 1.
can not > 30, now 33
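The stop-on-first-error behaviour of map() is easiest to see in a small standalone sketch. The helper may_fail below is hypothetical, and a ThreadPoolExecutor is used so the snippet runs anywhere; the exception semantics of map() are the same for ProcessPoolExecutor:

```python
from concurrent.futures import ThreadPoolExecutor

def may_fail(n):
    # Raises for inputs over 30, mirroring the fib() guard above.
    if n > 30:
        raise ValueError('can not > 30, now %s' % n)
    return n * 2

nums = [1, 2, 31, 4]
collected, error = [], None
with ThreadPoolExecutor() as executor:
    results = executor.map(may_fail, nums)
    try:
        for r in results:
            collected.append(r)
    except ValueError as e:
        # The exception surfaces here, when the failing task's slot in
        # the results iterator is reached -- not at submit time.
        error = str(e)

print(collected)  # [2, 4] -- only the results before the failing input
print(error)      # can not > 30, now 31
```

Note that all four tasks are submitted (and the fourth may well run); it is only the iteration over results that stops at the failure.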

The same task using submit():

# -*- coding: utf-8 -*-

from concurrent.futures import ProcessPoolExecutor, as_completed
import random

def fib(n, test_arg):
    if n > 30:
        raise Exception('can not > 30, now %s' % n)
    if n <= 2:
        return 1
    return fib(n-1, test_arg) + fib(n-2, test_arg)

def use_submit():
    nums = [random.randint(0, 33) for _ in range(0, 10)]
    with ProcessPoolExecutor() as executor:
        futures = {executor.submit(fib, n, n): n for n in nums}
        for f in as_completed(futures):
            try:
                print('fib(%s) result is %s.' % (futures[f], f.result()))
            except Exception as e:
                print(e)

Running this prints output like the following. When an exception is raised and caught, iteration continues with the later results rather than stopping as map() does. Besides as_completed(), the module also provides wait() and related helpers.

fib(3) result is 2.
fib(15) result is 610.
can not > 30, now 31
fib(23) result is 28657.
fib(1) result is 1.
can not > 30, now 32
fib(14) result is 377.
fib(12) result is 144.
fib(26) result is 121393.
fib(29) result is 514229.

If the try/except block is placed around the as_completed() loop itself, output stops at the first failure instead of continuing. The reason is that f.result() re-raises the task's exception inside the for statement; the exception propagates out of the loop into the except clause, and a for loop cannot be resumed once it has been exited:

def use_submit():
    nums = [random.randint(0, 33) for _ in range(0, 10)]
    with ProcessPoolExecutor() as executor:
        futures = {executor.submit(fib, n, n): n for n in nums}
        try:
            for f in as_completed(futures):
                print('fib(%s) result is %s.' % (futures[f], f.result()))
        except Exception as e:
            print(e)
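As a sketch of the wait() helper mentioned above (slow_square is a made-up function; ThreadPoolExecutor is used for portability): unlike as_completed(), wait() blocks until a condition is met and returns (done, not_done) sets of futures rather than an iterator:

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED
import time

def slow_square(n):
    time.sleep(n * 0.1)
    return n * n

with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(slow_square, n) for n in (3, 1, 2)]
    # Block until the quickest future finishes...
    done_first, not_done = wait(futures, return_when=FIRST_COMPLETED)
    # ...then until all of them do (default return_when=ALL_COMPLETED).
    done_all, _ = wait(futures)

first_results = sorted(f.result() for f in done_first)
all_results = sorted(f.result() for f in done_all)
print(first_results)
print(all_results)   # [1, 4, 9]
```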

Other notes:

  1. map() yields results in the order the arguments were passed in, while as_completed() yields futures in completion order. The example above doesn't show this clearly (see "Python并發編程之線程池/進程池" for a better one), but in both cases the timing depends on the max_workers argument and on how long each call takes.
import time
def test_sleep(n):
    time.sleep(n)
    return True
def use_submit():
    nums = [3, 2, 1, 3]
    with ProcessPoolExecutor(max_workers=3) as executor:
        futures = {executor.submit(test_sleep, n): n for n in nums}
        for f in as_completed(futures):
            try:
                print('%s result is %s.' % (futures[f], f.result()))
            except Exception as e:
                print(e)
def use_map():
    nums = [3, 2, 1, 3]
    with ProcessPoolExecutor(max_workers=3) as executor:
        try:
            results = executor.map(test_sleep, nums)
            for num, result in zip(nums, results):
                print('%s result is %s.' % (num, result))
        except Exception as e:
            print(e)

use_submit() prints the following. Total time is 3+1=4s, and each result is printed as soon as it completes: with max_workers=3, the fourth 3s task starts as soon as the first 1s task finishes.

1 result is True.
2 result is True.
3 result is True.
3 result is True.

use_map() prints the following. It also takes 3+1=4s, but results come out in argument order: because max_workers=3, the first three appear together after 3s, and the fourth after 4s.

3 result is True.
2 result is True.
1 result is True.
3 result is True.
  2. Reading part of the map() source:
    def map(self, fn, *iterables, timeout=None, chunksize=1):
        """Returns an iterator equivalent to map(fn, iter).

        Args:
            fn: A callable that will take as many arguments as there are
                passed iterables.
            timeout: The maximum number of seconds to wait. If None, then there
                is no limit on the wait time.
            chunksize: The size of the chunks the iterable will be broken into
                before being passed to a child process. This argument is only
                used by ProcessPoolExecutor; it is ignored by
                ThreadPoolExecutor.

        Returns:
            An iterator equivalent to: map(func, *iterables) but the calls may
            be evaluated out-of-order.

        Raises:
            TimeoutError: If the entire result iterator could not be generated
                before the given timeout.
            Exception: If fn(*args) raises for any values.
        """
        if timeout is not None:
            end_time = timeout + time.time()

        fs = [self.submit(fn, *args) for args in zip(*iterables)]

        # Yield must be hidden in closure so that the futures are submitted
        # before the first iterator value is required.
        def result_iterator():
            try:
                for future in fs:
                    if timeout is None:
                        yield future.result()
                    else:
                        yield future.result(end_time - time.time())
            finally:
                for future in fs:
                    future.cancel()
        return result_iterator()

fs存放了submit()后返回的future實例,是按傳入的參數順序排序的,返回了result_iterator()。至于為什么會按max_workers數一組返回輸出,暫時不清楚。

  3. The as_completed() source, which is somewhat harder to follow.
  4. The implementation of ProcessPoolExecutor, summarized by the data-flow diagram process.png:

Combining the source with the data-flow diagram above:
executor.map creates multiple _WorkItem objects (in fact it simply calls submit() repeatedly), each holding a newly created Future.
Each _WorkItem is placed in a dict called "Work Items", keyed by a distinct "Work Id".
A thread called the "Local worker thread" is created to manage the "Work Ids" queue; it does two things:
It takes a Work Id from the "Work Ids" queue and looks up the corresponding _WorkItem in "Work Items". If that item has been cancelled, it is deleted from "Work Items"; otherwise it is repackaged as a _CallItem and put on the "Call Q" queue, from which the executor's worker processes take _CallItems, execute them, and put the results, wrapped as _ResultItems, on the "Result Q" queue.
It takes _ResultItems from the "Result Q" queue, updates the corresponding Future in "Work Items", and deletes the entry.
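The dict-plus-queues bookkeeping described above can be sketched in miniature. This toy version runs the work inline in one manager thread instead of handing it to worker processes, so the names and structure are illustrative only:

```python
import queue
import threading

pending = {}              # work_id -> (fn, args): the "Work Items" dict
work_ids = queue.Queue()  # the "Work Ids" queue
results = {}              # stands in for resolving the Futures

def manager():
    # The "Local worker thread": drain work ids, look up the item,
    # run it, record the result, and delete the pending entry.
    while True:
        wid = work_ids.get()
        if wid is None:   # sentinel: shut down
            return
        fn, args = pending.pop(wid)
        results[wid] = fn(*args)

# "submit" three squaring tasks
for i, n in enumerate([2, 3, 4]):
    pending[i] = (pow, (n, 2))
    work_ids.put(i)
work_ids.put(None)

t = threading.Thread(target=manager)
t.start()
t.join()
print(results)  # {0: 4, 1: 9, 2: 16}
```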

  5. A brief analysis of submit():
    def submit(self, fn, *args, **kwargs):
        with self._shutdown_lock:
            if self._broken:
                raise BrokenProcessPool('A child process terminated '
                    'abruptly, the process pool is not usable anymore')
            if self._shutdown_thread:
                raise RuntimeError('cannot schedule new futures after shutdown')

            f = _base.Future()
            w = _WorkItem(f, fn, args, kwargs)

            self._pending_work_items[self._queue_count] = w
            self._work_ids.put(self._queue_count)
            self._queue_count += 1
            # Wake up queue management thread
            self._result_queue.put(None)

            self._start_queue_management_thread()
            return f
  • A Future() instance f and a _WorkItem() instance w are created.
  • _pending_work_items is the "Work Items" dict described above; its key is _queue_count (initialized to 0) and its value is w. _queue_count is also put on the _work_ids queue.
  • "Wake up queue management thread" wakes the Local Worker Thread from the diagram above.
    def _start_queue_management_thread(self):
        # When the executor gets lost, the weakref callback will wake up
        # the queue management thread.
        def weakref_cb(_, q=self._result_queue):
            q.put(None)
        if self._queue_management_thread is None:
            # Start the processes so that their sentinels are known.
            self._adjust_process_count()
            self._queue_management_thread = threading.Thread(
                    target=_queue_management_worker,
                    args=(weakref.ref(self, weakref_cb),
                          self._processes,
                          self._pending_work_items,
                          self._work_ids,
                          self._call_queue,
                          self._result_queue))
            self._queue_management_thread.daemon = True
            self._queue_management_thread.start()
            _threads_queues[self._queue_management_thread] = self._result_queue

    def _adjust_process_count(self):
        for _ in range(len(self._processes), self._max_workers):
            p = multiprocessing.Process(
                    target=_process_worker,
                    args=(self._call_queue,
                          self._result_queue))
            p.start()
            self._processes[p.pid] = p
  • _adjust_process_count() starts max_workers processes, each running _process_worker().
  • The _queue_management_thread thread is started; this is the Local Worker Thread.
  • Inside that thread, _add_call_item_to_queue() moves _CallItems onto call_queue and deletes cancelled entries; this method takes some effort to follow.
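A guess at the shape of _add_call_item_to_queue(), reduced to its core decision (dispatch, or drop on cancellation). FakeFuture and add_call_items are invented stand-ins, not the real classes:

```python
import queue

class FakeFuture:
    # Minimal stand-in for the Future inside a _WorkItem.
    def __init__(self, cancelled=False):
        self._cancelled = cancelled
    def set_running_or_notify_cancel(self):
        # Real Futures flip to RUNNING here; cancelled ones return False.
        return not self._cancelled

def add_call_items(pending_items, work_ids, call_queue):
    # Fill the call queue from the work-ids queue until one of them
    # runs out, skipping (and deleting) work that was cancelled.
    while not call_queue.full():
        try:
            wid = work_ids.get_nowait()
        except queue.Empty:
            return
        if pending_items[wid].set_running_or_notify_cancel():
            call_queue.put(wid)        # stand-in for a _CallItem
        else:
            del pending_items[wid]     # drop cancelled work

pending = {0: FakeFuture(), 1: FakeFuture(cancelled=True), 2: FakeFuture()}
ids = queue.Queue()
for i in (0, 1, 2):
    ids.put(i)
calls = queue.Queue(maxsize=10)
add_call_items(pending, ids, calls)

dispatched = []
while not calls.empty():
    dispatched.append(calls.get())
print(dispatched)       # [0, 2] -- the cancelled item 1 was dropped
print(sorted(pending))  # [0, 2] -- dispatched items stay pending until a result arrives
```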
def _process_worker(call_queue, result_queue):
    """Evaluates calls from call_queue and places the results in result_queue.

    This worker is run in a separate process.

    Args:
        call_queue: A multiprocessing.Queue of _CallItems that will be read and
            evaluated by the worker.
        result_queue: A multiprocessing.Queue of _ResultItems that will written
            to by the worker.
        shutdown: A multiprocessing.Event that will be set as a signal to the
            worker that it should exit when call_queue is empty.
    """
    while True:
        call_item = call_queue.get(block=True)
        if call_item is None:
            # Wake up queue management thread
            result_queue.put(os.getpid())
            return
        try:
            r = call_item.fn(*call_item.args, **call_item.kwargs)
        except BaseException as e:
            exc = _ExceptionWithTraceback(e, e.__traceback__)
            result_queue.put(_ResultItem(call_item.work_id, exception=exc))
        else:
            result_queue.put(_ResultItem(call_item.work_id,
                                         result=r))
  • The worker process loop: take a _CallItem from call_queue, call its fn, and put the result (or the exception) on result_queue.
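That loop can be mirrored with plain queue.Queue objects in a single process. Everything here (the tuple format standing in for _CallItem/_ResultItem, the None sentinel) is a simplified stand-in, run inline rather than in a subprocess:

```python
import queue

def process_worker(call_queue, result_queue):
    # Simplified mirror of _process_worker: pull (work_id, fn, args)
    # items, run fn, and push (work_id, result_or_exception) back.
    # A None item is the shutdown signal.
    while True:
        item = call_queue.get()
        if item is None:
            return
        work_id, fn, args = item
        try:
            result_queue.put((work_id, fn(*args)))
        except BaseException as e:
            result_queue.put((work_id, e))

calls, outcomes = queue.Queue(), queue.Queue()
for wid, n in enumerate([2, 0, 5]):
    calls.put((wid, lambda x: 10 // x, (n,)))
calls.put(None)

process_worker(calls, outcomes)  # run the loop inline for the sketch

got = {}
while not outcomes.empty():
    wid, val = outcomes.get()
    got[wid] = val
print(got)  # work id 1 maps to a ZeroDivisionError, the others to results
```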
最后編輯于
?著作權歸作者所有,轉載或內容合作請聯系作者
平臺聲明:文章內容(如有圖片或視頻亦包括在內)由作者上傳并發布,文章內容僅代表作者本人觀點,簡書系信息發布平臺,僅提供信息存儲服務。
