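The business scenario: every LLM inference request runs in a newly created thread. If the user cancels the answer, or an exception occurs, that thread has to be stopped. This post is mainly about the first case. Streaming inference is essentially a queue that holds the generated results, plus another thread that keeps pulling results from that queue and returning them, which produces the so-called "typewriter" effect. The scenario is simulated below: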
from queue import Queue, Empty
from threading import Thread
from multiprocessing import Process
import time
class Streamer:
    """Iterator over a queue of generated text chunks."""

    def __init__(self, _text_queue):
        self.text_queue = _text_queue
        self.stop_signal = "stop"

    def put(self, value):
        self.text_queue.put(value, timeout=0.5)

    def __iter__(self):
        return self

    def __next__(self):
        try:
            value = self.text_queue.get(timeout=0.5)
        except Empty:
            # No chunk arrived within 0.5 s: treat it as the end of the stream.
            value = self.stop_signal
        if value == self.stop_signal:
            print("stop here!")
            raise StopIteration()
        return value
def test_func():
    def _inference():
        count = 0
        while True:
            if count < 10:
                streamer.put("shit")
                time.sleep(20)  # e.g. stuck inside a .so, such as the call into the model's inference
                print("put shit & getting gem")
                time.sleep(0.2 * count)  # later inference steps time out
                count += 1
            else:
                print("breaking!")
                break

    t_queue = Queue()
    streamer = Streamer(t_queue)
    # Mode 1: daemon Thread
    # inference_thread = Thread(target=_inference, daemon=True)
    # inference_thread.start()
    # for idx, i in enumerate(streamer):
    #     print(f"get {i}{idx}")

    # Mode 2: Thread + stop
    # https://www.cnblogs.com/conscience-remain/p/16930488.html
    # stop_thread(inference_thread)  # line 54 of the original script makes the stop fail

    # Mode 3: ThreadPoolExecutor's cancel
    # import concurrent.futures
    # with concurrent.futures.ThreadPoolExecutor(max_workers=1) as tpe:
    #     future = tpe.submit(_inference)
    #     for idx, i in enumerate(streamer):
    #         print(i, idx)
    #     # getting jammed or finished
    #     if future.running():
    #         print("canceling here!")
    #         future.cancel()  # cancel the thread? Has no effect if the task is already running
    #     try:
    #         future.result(timeout=1)
    #     except concurrent.futures.TimeoutError as ex:
    #         print(f"ex: {ex}")

    # Mode 4: trace. The main thread has already exited, yet the child thread is still running
    # thread = thread_with_trace(target=_inference, daemon=True)
    # thread.start()
    # for idx, i in enumerate(streamer):
    #     print(i, idx)
    # thread.kill()
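def worker():
    # Local variable renamed so it no longer shadows the function name.
    t = Thread(target=test_func)
    t.start()
    t.join()

if __name__ == "__main__":
    worker()
    # time.sleep(5)  # if the main thread does not exit, the daemon thread does not exit either
    print("out")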
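Mode 2 in the listing calls a `stop_thread` helper that the listing itself does not define. A minimal sketch of how such a helper is commonly written on CPython follows, assuming it relies on `ctypes.pythonapi.PyThreadState_SetAsyncExc`; whether the linked post does exactly this is not confirmed here, and `demo_async_stop` and `busy` are illustrative names. The exception is only raised once the target thread gets back to executing Python bytecode, so it can end a pure-Python loop like the one below but cannot interrupt a thread that is blocked inside a long native call.

```python
import ctypes
import threading
import time


def stop_thread(thread):
    """Ask CPython to raise SystemExit inside `thread` (assumed implementation)."""
    # Returns the number of thread states that were modified (1 on success).
    return ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_ulong(thread.ident), ctypes.py_object(SystemExit)
    )


def demo_async_stop():
    def busy():
        # Pure-Python loop: control returns to the interpreter every 10 ms,
        # which gives the pending exception a chance to be raised.
        while True:
            time.sleep(0.01)

    t = threading.Thread(target=busy, daemon=True)
    t.start()
    time.sleep(0.1)
    affected = stop_thread(t)
    t.join(timeout=2)
    return affected, t.is_alive()


if __name__ == "__main__":
    print(demo_async_stop())
```

If `busy` instead made a single 20-second blocking call, as `_inference` does, the request would stay pending until that call returned.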
None of the approaches above can cleanly stop the running inference thread, so none of them delivers the ability to cancel an in-flight inference.
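The behaviour seen in mode 3 can be reproduced in isolation. The sketch below, with illustrative names (`demo_cancel`, `blocking_task`), submits two tasks to a single-worker pool: `Future.cancel()` returns `False` for the task that is already running and `True` for the one still waiting in the queue.

```python
import concurrent.futures
import threading


def demo_cancel():
    """Return what cancel() reports for a running task and for a queued one."""
    started = threading.Event()
    release = threading.Event()

    def blocking_task():
        started.set()             # the worker has picked this task up
        release.wait(timeout=5)   # stand-in for a long blocking inference call
        return "finished"

    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        running = pool.submit(blocking_task)
        started.wait(timeout=5)
        queued = pool.submit(blocking_task)  # waits behind the first task
        cancel_running = running.cancel()    # already running: not cancelled
        cancel_queued = queued.cancel()      # still pending: cancelled
        release.set()                        # let the first task finish
        result = running.result(timeout=5)
    return cancel_running, cancel_queued, result


if __name__ == "__main__":
    print(demo_cancel())
```

In other words, `cancel()` only keeps work from starting; it does nothing to a worker thread that is already inside the call.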
This also touches on how Thread and ThreadPoolExecutor are used in Python, and it is worth reading up on what a daemon thread is.
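As a small illustration of the daemon point: `daemon=True` only tells the interpreter not to wait for the thread when the process exits. It does not stop the thread while the process is still alive, which matches the comment at the end of the listing. The sketch below uses illustrative names, and its `stop` event exists only so the demo cleans up after itself; a cooperative flag like this would not help a thread that is blocked inside a native call.

```python
import threading
import time


def daemon_outlives_its_caller():
    """Start a daemon thread and report its state once the caller's work is done."""
    stop = threading.Event()  # only used to shut the demo down at the end

    def loop():
        while not stop.is_set():
            time.sleep(0.01)  # stand-in for ongoing work

    t = threading.Thread(target=loop, daemon=True)
    t.start()
    time.sleep(0.1)           # the caller "finishes" its own work here
    flag, alive = t.daemon, t.is_alive()
    stop.set()                # cooperative shutdown so nothing is left running
    t.join(timeout=2)
    return flag, alive, t.is_alive()


if __name__ == "__main__":
    print(daemon_outlives_its_caller())
```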