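In CPython, the global interpreter lock (GIL) allows only one thread to execute Python bytecode at a time, so a multithreaded Python program effectively uses a single core: its threads run concurrently, not in parallel. Multiple processes, by contrast, can use several CPU cores. A sensible number of processes depends on how many cores the machine has, and because the processes can run on different cores, they execute in parallel.
In Python, multiple processes are created with the multiprocessing library. The multiprocessing module starts child processes and runs the tasks we define (functions, for example) in them, and its programming interface is similar to that of the threading module. It also provides components such as Process, Queue, and Lock for creating processes and communicating between them.
When there are more processes than CPU cores, the operating system time-slices them: a process that is waiting to run gets a core when another process finishes or is switched out. On a single-core CPU, multiple processes can still be started but cannot run in parallel. The multiprocessing library can report the number of CPU cores: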
>>> from multiprocessing import cpu_count
>>> cpu_count()
8
So this machine has 8 CPU cores (cpu_count() counts logical cores).
Creating processes
The multiprocessing module provides a Process class for constructing a child process, and it can be combined with Queue for inter-process communication. With Process you have to create each process by hand as the need arises, which is not very convenient, so in practice a process pool is used more often.
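As an illustration of the Process and Queue combination described above, here is a minimal sketch. The names worker and run_workers are hypothetical and chosen only for this example. Each child process puts one result on a shared Queue and the parent collects them.

```python
from multiprocessing import Process, Queue
import os


def worker(number, queue):
    # Runs in the child process: send the input, its square, and the child's PID
    # back to the parent through the queue.
    queue.put((number, number * number, os.getpid()))


def run_workers(numbers):
    queue = Queue()
    processes = [Process(target=worker, args=(n, queue)) for n in numbers]
    for p in processes:
        p.start()
    # Read one result per child before joining, so no child is left
    # waiting to flush its data into the queue.
    results = [queue.get() for _ in processes]
    for p in processes:
        p.join()
    # Results arrive in no particular order, so sort them by input value.
    return sorted(results)


if __name__ == "__main__":
    for number, square, pid in run_workers([1, 2, 3]):
        print(f"{number} squared is {square} (computed in process {pid})")
```

Every process here is created, started, and joined by hand, which is the bookkeeping that a process pool takes over.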
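Starting a process is relatively expensive, and creating many processes consumes a lot of memory. A process pool avoids this by starting a fixed number of worker processes and reusing them. (Threads are much cheaper to start and use less memory, so pooling matters less for them, although too many threads still slow the system down through frequent context switches; thread pools do exist, for example concurrent.futures.ThreadPoolExecutor.)
A process pool maintains a set of worker processes internally. Each submitted task is handed to an idle process in the Pool; if no process is free, the task waits until one becomes available.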
>>> from multiprocessing import Pool
>>> dir(Pool())
['Process','__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__','__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__gt__','__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__','__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__','__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_cache', '_ctx','_get_tasks', '_guarded_task_generation', '_handle_results', '_handle_tasks','_handle_workers', '_help_stuff_finish', '_initargs', '_initializer','_inqueue', '_join_exited_workers', '_maintain_pool', '_map_async','_maxtasksperchild', '_outqueue', '_pool', '_processes', '_quick_get','_quick_put', '_repopulate_pool', '_result_handler', '_setup_queues', '_state','_task_handler', '_taskqueue', '_terminate', '_terminate_pool','_worker_handler', '_wrap_exception', 'apply', 'apply_async', 'close', 'imap','imap_unordered', 'join', 'map', 'map_async', 'starmap', 'starmap_async','terminate']
A process pool is created like this:
pool = Pool(processes=X)
Pool takes a processes parameter. It is optional: if it is omitted, the pool size is chosen from the machine's CPU count (cpu_count); you can also set it yourself.
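The two ways of sizing a pool can be sketched as follows. The helper names square and squares are hypothetical. The example also uses Pool as a context manager, which is an alternative to calling close() and join() by hand. Leaving the with block terminates the pool, which is safe here because map() has already returned every result.

```python
from multiprocessing import Pool


def square(x):
    return x * x


def squares(values, processes=None):
    # processes=None lets Pool choose its size from the machine's CPU count;
    # an integer fixes the number of worker processes.
    with Pool(processes=processes) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    print(squares([1, 2, 3]))               # pool sized from the CPU count
    print(squares([1, 2, 3], processes=2))  # pool of exactly 2 workers
```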
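The commonly used methods of the Pool class are described below.
map
map() behaves much like the built-in map function: the first argument is a function, the second is an iterable, and each element of the iterable is passed to the function in turn. Unlike the built-in, it takes a single iterable, blocks until every call has finished, and returns a list. For example: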
[root@localhost newtest]# cat pool.py
#encoding=utf-8
from multiprocessing import Pool
import time,os
import random

def myfunc(url):
    time.sleep(random.random()*3)
    print("now process is "+str(os.getpid())+"  get an ele "+str(url))

if __name__ =="__main__":
    urls=[var for var in range(5)]
    pool=Pool(processes=3)         # create a pool of 3 worker processes

    pool.map(myfunc,urls)

    print("main function pid is "+ str(os.getpid()))
    pool.close()                  # close the pool; no new tasks are accepted
    pool.join()                   # the main process waits for the child processes to finish
[root@localhost newtest]#
Output:
[root@localhost newtest]# python pool.py
now process is 24790  get an ele 0
now process is 24790  get an ele 3
now process is 24791  get an ele 1
now process is 24790  get an ele 4
now process is 24792  get an ele 2
main function pid is 24789
[root@localhost newtest]#
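On Windows, the code that creates processes must be placed under if __name__ == "__main__":.
close() stops the pool from accepting new tasks; the worker processes exit once the tasks already submitted have finished. join() makes the main (parent) process wait for all worker processes to exit, and it must be called after close() or terminate(). The output shows that map() blocks: the main process (pid=24789) continues only after the child processes have finished all the tasks.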
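The example above only prints from the workers. As a complementary sketch, with the hypothetical helpers slow_double and doubled, the following shows that map() also returns the results as a list in input order, even though the random sleeps make the tasks finish in a different order.

```python
from multiprocessing import Pool
import random
import time


def slow_double(x):
    # A random delay so that tasks complete out of order.
    time.sleep(random.random() * 0.1)
    return x * 2


def doubled(values):
    pool = Pool(processes=3)
    try:
        # Blocks until every task is done; results follow the input order.
        return pool.map(slow_double, values)
    finally:
        pool.close()  # no more tasks will be submitted
        pool.join()   # wait for the worker processes to exit


if __name__ == "__main__":
    print(doubled(range(5)))
```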
apply_async()
Submits a task without blocking the caller. Its signature is
apply_async(func[, args[, kwds[, callback[, error_callback]]]])
For example:
[root@localhost newtest]# vi poolnonblock.py
#encoding=utf-8
from multiprocessing import Pool
import time,os
import random

def myfunc(url):
    time.sleep(random.random()*3)
    print("now process is "+str(os.getpid())+"  get an ele "+str(url))

if __name__ == "__main__":
    pool=Pool(processes=3)
    for i in range(10):
        pool.apply_async(myfunc,(i,))         # pass the argument as a tuple

    print("main function pid is "+str(os.getpid()))
    pool.close()                  # close the pool; no new tasks are accepted
    pool.join()                   # the main process waits for the child processes to finish
Output:
[root@localhost newtest]# python poolnonblock.py
main function pid is 31299
now process is 31301  get an ele 1
now process is 31300  get an ele 0
now process is 31301  get an ele 3
now process is 31302  get an ele 2
now process is 31301  get an ele 5
now process is 31301  get an ele 7
now process is 31300  get an ele 4
now process is 31302  get an ele 6
now process is 31301  get an ele 8
now process is 31300  get an ele 9
Because apply_async() does not block, the main process carries on with its own work without waiting for the child processes: the for loop only submits the tasks, so "main function pid is 31299" is printed before any task has finished.
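The non-blocking behavior can also be seen through the return value of apply_async(), an AsyncResult object whose get() method waits for that one task. The sketch below uses the hypothetical helpers cube and collect_cubes, and also passes a callback, which runs in the main process each time a task succeeds.

```python
from multiprocessing import Pool


def cube(x):
    return x ** 3


def collect_cubes(values):
    collected = []  # filled by the callback, in completion order
    with Pool(processes=3) as pool:
        # Each call returns at once with an AsyncResult handle.
        handles = [
            pool.apply_async(cube, (v,), callback=collected.append)
            for v in values
        ]
        # get() blocks until the corresponding task has finished.
        results = [h.get(timeout=30) for h in handles]
    return results, sorted(collected)


if __name__ == "__main__":
    print(collect_cubes([1, 2, 3]))
```

Calling get() on each handle is another way to wait for the tasks, and it also re-raises in the main process any exception raised inside the worker.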
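Without the pool.join() line, the child processes are terminated as soon as the main process finishes, for example:
[root@localhost newtest]# python poolblock.py
main function pid is 13702
now process is 13703  get an ele 0
[root@localhost newtest]# python poolblock.py
main function pid is 13973
[root@localhost newtest]#
We can move pool.close() and pool.join() ahead of the last executable statement of the main process, which has the same effect as creating blocking child processes. For example: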
if __name__ == "__main__":
    pool=Pool(processes=3)
    for i in range(10):
        pool.apply_async(myfunc,(i,))         # pass the argument as a tuple
    pool.close()                  # close the pool; no new tasks are accepted
    pool.join()                   # the main process waits for the child processes to finish

    print("main function pid is "+str(os.getpid()))
Output:
[root@localhost ~]# python poolnonblock.py
now process is 12586  get an ele 1
now process is 12586  get an ele 3
now process is 12586  get an ele 4
now process is 12587  get an ele 2
now process is 12587  get an ele 6
now process is 12585  get an ele 0
now process is 12587  get an ele 7
now process is 12587  get an ele 9
now process is 12586  get an ele 5
now process is 12585  get an ele 8
main function pid is 12584
The pool holds only 3 worker processes (this server has 4 cores). When a worker finishes a task, no new child process is created; the worker that has just become idle picks up the next task, so the PIDs of the processes in the pool do not change.
If the pool is made larger than the number of cores on the server, for example:
….
    pool=Pool(processes=5)
    for i in range(10):
        pool.apply_async(myfunc,(i,))
….
Output:
[root@localhost newtest]# python poolblock.py
main function pid is 30563
now process is 30566  get an ele 2
now process is 30568  get an ele 4
now process is 30565  get an ele 1
now process is 30568  get an ele 6
now process is 30567  get an ele 3
now process is 30567  get an ele 9
now process is 30566  get an ele 5
now process is 30568  get an ele 8
now process is 30564  get an ele 0
now process is 30565  get an ele 7
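In both cases the number of worker processes stays at processes. The second output shows only five distinct PIDs (30564 to 30568) across the ten tasks, so even when the pool is larger than the number of cores, idle workers are reused and no new process is added when a task finishes.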
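One way to check how many worker processes a pool really uses is to have each task return its PID and count the distinct values. This is a minimal sketch with the hypothetical helpers report_pid and distinct_worker_pids. It assumes the default Pool settings, in which workers are not replaced (the maxtasksperchild parameter can change that).

```python
from multiprocessing import Pool
import os
import time


def report_pid(_):
    # A short pause gives the other workers a chance to pick up tasks.
    time.sleep(0.05)
    return os.getpid()


def distinct_worker_pids(pool_size, task_count):
    with Pool(processes=pool_size) as pool:
        pids = pool.map(report_pid, range(task_count))
    # The set keeps each worker PID once, however many tasks it ran.
    return set(pids)


if __name__ == "__main__":
    pids = distinct_worker_pids(pool_size=3, task_count=12)
    print(f"12 tasks were handled by {len(pids)} worker process(es)")
```

The count never exceeds the pool size, whatever the number of tasks or cores.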
apply()
Submits a task and blocks until it has finished. Its signature is
apply(func[, args[, kwds]])
For example, change the code to
…
    pool=Pool(processes=3)
    for i in range(4):
        pool.apply(myfunc,(i,))
…
Output:
[root@localhost newtest]# python poolblock.py
now process is 8200  get an ele 0
now process is 8201  get an ele 1
now process is 8202  get an ele 2
now process is 8200  get an ele 3
main function pid is 8199
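The main process is blocked until each task has finished before it continues, so the tasks run one after another. Because apply() blocks, pool.join() is not needed to wait for the results.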
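To make the relationship between the two methods concrete: apply() behaves like apply_async() followed at once by get() on the returned result. A minimal sketch, using the hypothetical helpers add and blocking_and_async_sum:

```python
from multiprocessing import Pool


def add(a, b):
    return a + b


def blocking_and_async_sum(a, b):
    with Pool(processes=2) as pool:
        # apply() returns the function's result only after the task is done.
        blocking = pool.apply(add, (a, b))
        # apply_async() returns a handle; get() then waits for the result.
        asynchronous = pool.apply_async(add, (a, b)).get()
    return blocking, asynchronous


if __name__ == "__main__":
    print(blocking_and_async_sum(2, 3))
```

Because each apply() call waits for its own task, a loop of apply() calls gains nothing from the other workers in the pool. Use map() or apply_async() when the tasks should run in parallel.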
We can also define a series of functions and loop over them so that different processes run different functions. For example:
pool=Pool(processes=3)
function_list=[func1,func2,func3]         # a list of the function objects
for func in function_list:
    pool.apply_async(func)
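A runnable version of this pattern might look like the sketch below. The bodies of func1, func2, and func3 are placeholders invented for the example, and run_all is a hypothetical helper. The functions are defined at module level so that the pool can pass them to the worker processes.

```python
from multiprocessing import Pool


def func1():
    return "result from func1"


def func2():
    return "result from func2"


def func3():
    return "result from func3"


def run_all(functions):
    with Pool(processes=3) as pool:
        # Submit every function first, then wait for the results.
        handles = [pool.apply_async(func) for func in functions]
        return [handle.get() for handle in handles]


if __name__ == "__main__":
    for line in run_all([func1, func2, func3]):
        print(line)
```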