Basic Concepts of Processes
A process is a single execution of a program. Every process has its own address space, memory, data stack, and other auxiliary data that records its execution state. Multiprocessing means carrying out multiple tasks within one program, which improves a script's ability to run work in parallel. It is typically used for CPU-bound workloads such as scientific computing.
Creating a Process with fork
fork() is called once but returns twice: the operating system automatically copies the current process (the parent) to create a child process, and then returns in both the parent and the child. The child always gets a return value of 0, while the parent gets the PID of the child.
import os

# This only works on Unix/Linux platforms (fork is not available on Windows)
print('Process {} is start'.format(os.getpid()))

subprocess = os.fork()
source_num = 9
if subprocess == 0:
    print('I am in child process, my pid is {0}, and my father pid is {1}'.format(os.getpid(), os.getppid()))
    source_num = source_num * 2
    print('The source_num in ***child*** process is {}'.format(source_num))
else:
    print('I am in father process, my child process is {}'.format(subprocess))
    source_num = source_num ** 2
    print('The source_num in ---father--- process is {}'.format(source_num))
print('The source_num is {}'.format(source_num))
Process 16600 is start
I am in father process, my child process is 19193
The source_num in ---father--- process is 81
The source_num is 81
Process 16600 is start
I am in child process, my pid is 19193, and my father pid is 16600
The source_num in ***child*** process is 18
The source_num is 18
Clearly, the data in the two processes does not affect each other: each process works on its own copy of source_num.
The multiprocessing Module
multiprocessing is a Python module that spawns processes using an API similar to the threading module. By using processes instead of threads, it offers both local and remote concurrency and effectively sidesteps the GIL. The multiprocessing module therefore allows the programmer to fully leverage multiple processors on a given machine.
Classes for creating and managing processes:
- Process (creates a process): spawn a process by creating a Process object and calling its start() method. Process follows the API of threading.Thread.
- Pool (process pool management): creates a pool of worker processes that execute the tasks submitted to the Pool; useful when many child processes need to be managed.
- Queue (inter-process communication, resource sharing): process-safe data exchange between processes.
- Value, Array (inter-process communication, resource sharing): shared ctypes objects stored in shared memory (see the sketch after this list).
- Pipe (pipe communication): a two-ended pipe between processes.
- Manager (resource sharing): creates data shared between processes, including network sharing between processes running on different machines.
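Value and Array are not demonstrated later in this article, so here is a minimal sketch of sharing state with them; the names worker, counter, and data are illustrative assumptions, not part of the library.

from multiprocessing import Process, Value, Array

def worker(counter, data):
    # 'i' and 'd' are ctypes type codes: signed int and double
    with counter.get_lock():            # Value carries an internal lock
        counter.value += 1
    with data.get_lock():               # so does Array; guard the read-modify-write
        for i in range(len(data)):
            data[i] = data[i] * 2

if __name__ == '__main__':
    counter = Value('i', 0)                  # shared integer
    data = Array('d', [1.0, 2.0, 3.0])       # shared array of doubles
    procs = [Process(target=worker, args=(counter, data)) for _ in range(3)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(counter.value)     # 3
    print(list(data))        # each element doubled three times: [8.0, 16.0, 24.0]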
Classes for synchronizing child processes:
- Condition: a condition variable for coordinating processes.
- Event: simple signalling used to synchronize communication between processes (a sketch follows this list).
- Lock: when multiple processes need to access a shared resource, Lock can be used to avoid conflicting accesses.
- RLock: a reentrant lock that can be acquired multiple times by the same process.
- Semaphore: limits the number of concurrent accesses to a shared resource, e.g. the maximum number of connections in a pool.
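Only Lock and Semaphore are demonstrated later in this article; as a quick illustration of Event, here is a minimal sketch (the names waiter and go are illustrative):

import time
from multiprocessing import Process, Event

def waiter(go):
    print('waiter: waiting for the event')
    go.wait()                      # blocks until the event is set
    print('waiter: event received, continuing')

if __name__ == '__main__':
    go = Event()
    p = Process(target=waiter, args=(go,))
    p.start()
    time.sleep(1)
    go.set()                       # wake up the waiting process
    p.join()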
1.Process
The class used to create a process: Process(group=None, target=None, name=None, args=(), kwargs={}, *, daemon=None):
group should always be None; it exists only for compatibility with threading.Thread
target is the callable object to be invoked by the run() method
name is the process name (an alias)
args is the tuple of positional arguments for the target
kwargs is the dictionary of keyword arguments for the target
daemon sets the daemon flag of the process
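kwargs and daemon are not exercised by the examples that follow, so here is a minimal hypothetical sketch of both; the function greet and its parameters are made up for illustration.

from multiprocessing import Process

def greet(name, greeting='Hello'):
    print('{}, {}!'.format(greeting, name))

if __name__ == '__main__':
    # Positional arguments go in args, keyword arguments in kwargs;
    # a daemon process is terminated automatically when the parent exits
    p = Process(target=greet, args=('world',), kwargs={'greeting': 'Hi'}, daemon=True)
    p.start()
    p.join()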
Methods and attributes:
run(): the method representing the process's activity
start(): starts the process
join(): blocks the calling process until the process whose join() was called has finished
name: the process's name
is_alive(): returns whether the process is still alive
daemon: the process's daemon flag
pid: returns the process ID
terminate(): terminates the process forcibly
Creating a Single Process
import os
from multiprocessing import Process

def hello_pro(name):
    print('I am in process {0}, It\'s PID is {1}'.format(name, os.getpid()))

if __name__ == '__main__':
    print('Parent Process PID is {}'.format(os.getpid()))
    p = Process(target=hello_pro, args=('test',), name='test_proc')
    # Start the process
    p.start()
    print('Process\'s ID is {}'.format(p.pid))
    print('The Process is alive? {}'.format(p.is_alive()))
    print('Process\' name is {}'.format(p.name))
    # join() blocks the current process until the process p has finished
    p.join()
Parent Process PID is 16600
I am in process test, It's PID is 19925
Process's ID is 19925
The Process is alive? True
Process' name is test_proc
Creating Multiple Processes
import os
from multiprocessing import Process, current_process

def doubler(number):
    """
    A doubling function that can be used by a process
    """
    result = number * 2
    proc_name = current_process().name
    print('{0} doubled to {1} by: {2}'.format(
        number, result, proc_name))

if __name__ == '__main__':
    numbers = [5, 10, 15, 20, 25]
    procs = []
    for index, number in enumerate(numbers):
        proc = Process(target=doubler, args=(number,))
        procs.append(proc)
        proc.start()
    # A process can also be given an explicit name
    proc = Process(target=doubler, name='Test', args=(2,))
    proc.start()
    procs.append(proc)
    for proc in procs:
        proc.join()
5 doubled to 10 by: Process-8
20 doubled to 40 by: Process-11
10 doubled to 20 by: Process-9
15 doubled to 30 by: Process-10
25 doubled to 50 by: Process-12
2 doubled to 4 by: Test
Creating a Process as a Class
from multiprocessing import Process, current_process

class DoublerProcess(Process):
    def __init__(self, numbers):
        Process.__init__(self)
        self.numbers = numbers

    # Override the run() method
    def run(self):
        for number in self.numbers:
            result = number * 2
            proc_name = current_process().name
            print('{0} doubled to {1} by: {2}'.format(number, result, proc_name))

if __name__ == '__main__':
    dp = DoublerProcess([5, 20, 10, 15, 25])
    dp.start()
    dp.join()
5 doubled to 10 by: DoublerProcess-16
20 doubled to 40 by: DoublerProcess-16
10 doubled to 20 by: DoublerProcess-16
15 doubled to 30 by: DoublerProcess-16
25 doubled to 50 by: DoublerProcess-16
2.Lock
The code below comes from the article "Python多進程編程".
import multiprocessing

def worker_with(lock, f):
    # Lock supports the context manager protocol, so it can be used in a with statement
    with lock:
        fs = open(f, 'a+')
        n = 10
        while n > 1:
            print('Lockd acquired via with')
            fs.write("Lockd acquired via with\n")
            n -= 1
        fs.close()

def worker_no_with(lock, f):
    # Acquire the lock
    lock.acquire()
    try:
        fs = open(f, 'a+')
        n = 10
        while n > 1:
            print('Lock acquired directly')
            fs.write("Lock acquired directly\n")
            n -= 1
        fs.close()
    finally:
        # Release the lock
        lock.release()

if __name__ == "__main__":
    lock = multiprocessing.Lock()
    f = "file.txt"
    w = multiprocessing.Process(target=worker_with, args=(lock, f))
    nw = multiprocessing.Process(target=worker_no_with, args=(lock, f))
    w.start()
    nw.start()
    w.join()
    nw.join()
    print('END!')
Lockd acquired via with
Lockd acquired via with
Lockd acquired via with
Lockd acquired via with
Lockd acquired via with
Lockd acquired via with
Lockd acquired via with
Lockd acquired via with
Lockd acquired via with
Lock acquired directly
Lock acquired directly
Lock acquired directly
Lock acquired directly
Lock acquired directly
Lock acquired directly
Lock acquired directly
Lock acquired directly
Lock acquired directly
END!
3.Pool
Pool maintains a specified number of worker processes for the user. When a new task is submitted to the pool, it is handed to an idle worker if one is available; if all the workers in the pool are busy, the task waits until one of them finishes and becomes free to run it.
import time
import os
from multiprocessing import Pool, cpu_count

def f(msg):
    print('Starting: {}, PID: {}, Time: {}'.format(msg, os.getpid(), time.ctime()))
    time.sleep(3)
    print('Ending: {}, PID: {}, Time: {}'.format(msg, os.getpid(), time.ctime()))

if __name__ == '__main__':
    print('Starting Main Function')
    print('This Computer has {} CPU'.format(cpu_count()))
    # Create a pool of 4 worker processes
    p = Pool(4)
    for i in range(5):
        msg = 'Process {}'.format(i)
        # Submit the function and its arguments to the pool (non-blocking)
        p.apply_async(f, (msg, ))
    # No more tasks may be submitted to the pool
    p.close()
    # Block the current process until all workers have finished
    p.join()
    print('All Done!!!')
Starting Main Function
This Computer has 4 CPU
Starting: Process 2, PID: 8332, Time: Fri Sep 1 08:53:12 2017
Starting: Process 1, PID: 8331, Time: Fri Sep 1 08:53:12 2017
Starting: Process 0, PID: 8330, Time: Fri Sep 1 08:53:12 2017
Starting: Process 3, PID: 8333, Time: Fri Sep 1 08:53:12 2017
Ending: Process 2, PID: 8332, Time: Fri Sep 1 08:53:15 2017
Ending: Process 3, PID: 8333, Time: Fri Sep 1 08:53:15 2017
Starting: Process 4, PID: 8332, Time: Fri Sep 1 08:53:15 2017
Ending: Process 1, PID: 8331, Time: Fri Sep 1 08:53:15 2017
Ending: Process 0, PID: 8330, Time: Fri Sep 1 08:53:15 2017
Ending: Process 4, PID: 8332, Time: Fri Sep 1 08:53:18 2017
All Done!!!
This machine has 4 CPUs, so Process 0 through Process 3 start at the same time while Process 4 waits; as soon as one of Processes 0-3 finishes, Process 4 begins executing. Once every task has finished, the main process resumes and prints "All Done!!!". The apply_async() method is non-blocking, whereas apply() is blocking.
Replacing apply_async() with apply()
import time
import os
from multiprocessing import Pool, cpu_count

def f(msg):
    print('Starting: {}, PID: {}, Time: {}'.format(msg, os.getpid(), time.ctime()))
    time.sleep(3)
    print('Ending: {}, PID: {}, Time: {}'.format(msg, os.getpid(), time.ctime()))

if __name__ == '__main__':
    print('Starting Main Function')
    print('This Computer has {} CPU'.format(cpu_count()))
    # Create a pool of 4 worker processes
    p = Pool(4)
    for i in range(5):
        msg = 'Process {}'.format(i)
        # apply() is used instead of apply_async(); it blocks until the task finishes
        p.apply(f, (msg, ))
    # No more tasks may be submitted to the pool
    p.close()
    # Block the current process until all workers have finished
    p.join()
    print('All Done!!!')
Starting Main Function
This Computer has 4 CPU
Starting: Process 0, PID: 8281, Time: Fri Sep 1 08:51:18 2017
Ending: Process 0, PID: 8281, Time: Fri Sep 1 08:51:21 2017
Starting: Process 1, PID: 8282, Time: Fri Sep 1 08:51:21 2017
Ending: Process 1, PID: 8282, Time: Fri Sep 1 08:51:24 2017
Starting: Process 2, PID: 8283, Time: Fri Sep 1 08:51:24 2017
Ending: Process 2, PID: 8283, Time: Fri Sep 1 08:51:27 2017
Starting: Process 3, PID: 8284, Time: Fri Sep 1 08:51:27 2017
Ending: Process 3, PID: 8284, Time: Fri Sep 1 08:51:30 2017
Starting: Process 4, PID: 8281, Time: Fri Sep 1 08:51:30 2017
Ending: Process 4, PID: 8281, Time: Fri Sep 1 08:51:33 2017
All Done!!!
As you can see, the blocking version runs the tasks one after another: the next task does not start until the previous one has finished.
Retrieving Results with the get Method
import time
import os
from multiprocessing import Pool, cpu_count

def f(msg):
    print('Starting: {}, PID: {}, Time: {}'.format(msg, os.getpid(), time.ctime()))
    time.sleep(3)
    print('Ending: {}, PID: {}, Time: {}'.format(msg, os.getpid(), time.ctime()))
    return 'Done {}'.format(msg)

if __name__ == '__main__':
    print('Starting Main Function')
    print('This Computer has {} CPU'.format(cpu_count()))
    # Create a pool of 4 worker processes
    p = Pool(4)
    results = []
    for i in range(5):
        msg = 'Process {}'.format(i)
        results.append(p.apply_async(f, (msg, )))
    # No more tasks may be submitted to the pool
    p.close()
    # Block the current process until all workers have finished
    p.join()
    for result in results:
        print(result.get())
    print('All Done!!!')
Starting Main Function
This Computer has 4 CPU
Starting: Process 0, PID: 8526, Time: Fri Sep 1 09:00:04 2017
Starting: Process 1, PID: 8527, Time: Fri Sep 1 09:00:04 2017
Starting: Process 2, PID: 8528, Time: Fri Sep 1 09:00:04 2017
Starting: Process 3, PID: 8529, Time: Fri Sep 1 09:00:04 2017
Ending: Process 1, PID: 8527, Time: Fri Sep 1 09:00:07 2017
Starting: Process 4, PID: 8527, Time: Fri Sep 1 09:00:07 2017
Ending: Process 3, PID: 8529, Time: Fri Sep 1 09:00:07 2017
Ending: Process 0, PID: 8526, Time: Fri Sep 1 09:00:07 2017
Ending: Process 2, PID: 8528, Time: Fri Sep 1 09:00:07 2017
Ending: Process 4, PID: 8527, Time: Fri Sep 1 09:00:10 2017
Done Process 0
Done Process 1
Done Process 2
Done Process 3
Done Process 4
All Done!!!
4.Queue
Queue is a process-safe queue; it can be used to pass data between multiple processes.
The put method inserts data into the queue. It has two optional parameters: block and timeout. If block is True (the default) and timeout is a positive value, the method blocks for at most the time specified by timeout until free space is available in the queue; if it times out, a queue.Full exception is raised. If block is False and the Queue is full, a queue.Full exception is raised immediately.
The get method reads and removes an element from the queue. It also has two optional parameters: block and timeout. If block is True (the default) and timeout is a positive value, and no item is obtained within the timeout, a queue.Empty exception is raised. If block is False there are two cases: if an item is available in the Queue, it is returned immediately; otherwise, if the queue is empty, a queue.Empty exception is raised immediately.
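The example below uses only plain put and get, so here is a minimal sketch of the non-blocking and timeout variants, assuming Python 3, where the exceptions live in the standard queue module:

import queue
from multiprocessing import Queue

if __name__ == '__main__':
    q = Queue(maxsize=1)
    q.put('a')                                 # succeeds; the queue is now full
    try:
        q.put('b', block=True, timeout=1)      # waits up to 1 second for free space
    except queue.Full:
        print('put timed out: the queue is full')
    print(q.get())                             # 'a'
    try:
        q.get(block=False)                     # no waiting; the queue is now empty
    except queue.Empty:
        print('get raised queue.Empty immediately')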
import os
import time
from multiprocessing import Queue, Process

def write_queue(q):
    for i in ['first', 'two', 'three', 'four', 'five']:
        print('Write "{}" to Queue'.format(i))
        q.put(i)
        time.sleep(3)
    print('Write Done!')

def read_queue(q):
    print('Start to read!')
    while True:
        data = q.get()
        print('Read "{}" from Queue!'.format(data))

if __name__ == '__main__':
    q = Queue()
    wq = Process(target=write_queue, args=(q,))
    rq = Process(target=read_queue, args=(q,))
    wq.start()
    rq.start()
    # To read data as it arrives, both processes must be started before blocking,
    # so wq.join() must not be called until after rq.start()
    wq.join()
    # read_queue runs an infinite loop, so it has to be stopped forcibly
    rq.terminate()
Write "first" to Queue
Start to read!
Read "first" from Queue!
Write "two" to Queue
Read "two" from Queue!
Write "three" to Queue
Read "three" from Queue!
Write "four" to Queue
Read "four" from Queue!
Write "five" to Queue
Read "five" from Queue!
Write Done!
5.Pipe
The Pipe function returns a pair (conn1, conn2) representing the two ends of a pipe.
Pipe has a duplex parameter. If duplex is True (the default), the pipe is full duplex: both conn1 and conn2 can send and receive. If duplex is False, conn1 can only receive messages and conn2 can only send them.
The send and recv methods send and receive messages respectively. For example, in full-duplex mode you can call conn1.send to send a message and conn1.recv to receive one. If there is no message to receive, recv blocks; if the pipe has been closed, recv raises EOFError.
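The example below uses the default full-duplex mode; as a minimal sketch of a one-way pipe created with duplex=False (the names producer, receiver_end, and sender_end are illustrative):

from multiprocessing import Pipe, Process

def producer(conn):
    conn.send('hello')          # only the sending end may call send()
    conn.close()

if __name__ == '__main__':
    receiver_end, sender_end = Pipe(duplex=False)   # conn1 receives, conn2 sends
    p = Process(target=producer, args=(sender_end,))
    p.start()
    print(receiver_end.recv())  # 'hello'
    p.join()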
See also the article 使用pipe管道使python fork多進程之間通信 (using a pipe to communicate between forked Python processes).
import time
from multiprocessing import Pipe, Process

def send_pipe(p):
    for i in ['first', 'two', 'three', 'four', 'five']:
        print('Send "{}" to Pipe'.format(i))
        p.send(i)
        time.sleep(3)
    print('Send Done!')

def receive_pipe(p):
    print('Start to receive!')
    while True:
        data = p.recv()
        print('Read "{}" from Pipe!'.format(data))

if __name__ == '__main__':
    sp_pipe, rp_pipe = Pipe()
    sp = Process(target=send_pipe, args=(sp_pipe,))
    rp = Process(target=receive_pipe, args=(rp_pipe,))
    sp.start()
    rp.start()
    # Wait for the sender to finish, then force-stop the receiver's infinite loop
    sp.join()
    rp.terminate()
Start to receive!
Send "first" to Pipe
Read "first" from Pipe!
Send "two" to Pipe
Read "two" from Pipe!
Send "three" to Pipe
Read "three" from Pipe!
Send "four" to Pipe
Read "four" from Pipe!
Send "five" to Pipe
Read "five" from Pipe!
Send Done!
6.Semaphore
Semaphore limits the number of concurrent accesses to a shared resource, for example the maximum number of connections in a pool.
import multiprocessing
import time

def worker(s, i):
    # At most 3 processes can hold the semaphore at the same time
    s.acquire()
    print(multiprocessing.current_process().name + "acquire")
    time.sleep(i)
    print(multiprocessing.current_process().name + "release\n")
    s.release()

if __name__ == "__main__":
    s = multiprocessing.Semaphore(3)
    for i in range(5):
        p = multiprocessing.Process(target=worker, args=(s, i*2))
        p.start()
Process-170acquire
Process-168acquire
Process-168release
Process-169acquire
Process-171acquire
Process-169release
Process-172acquire
Process-170release
Process-171release
Process-172release