Preface
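Requirement: save the URL of every request that did not come back with status 200 to a local file.
Approach: in the Scrapy project's middlewares.py, create a downloader middleware that collects the URL of every response whose response.status is not 200.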
Defining the middleware in middlewares.py
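import time


class GetFailedUrl(object):
    # Downloader middleware: record the URL of every response whose status is not 200

    # Parameters follow Scrapy's documented order: process_response(request, response, spider)
    def process_response(self, request, response, spider):
        if response.status != 200:
            # Name the file after the current minute; '-' replaces ':' so the name is also valid on Windows
            name = time.strftime('%Y-%m-%d %H-%M', time.localtime()) + '.txt'
            # Append ('a') instead of 'w+' so several failed URLs in the same minute don't overwrite each other
            with open(name, 'a', encoding='utf-8') as file:
                file.write(response.url + '\n')
        # Return the response unchanged so later middlewares and the spider still receive it
        return response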
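The class only takes effect once it is registered in the project's settings.py. Below is a minimal sketch, assuming the project package is called `myproject` (a placeholder; substitute your own package name). The priority 543 is the slot Scrapy's project template uses for custom downloader middlewares.

```python
# settings.py (sketch): register GetFailedUrl so Scrapy calls its process_response().
# 'myproject' is a hypothetical package name; replace it with your own project's.
DOWNLOADER_MIDDLEWARES = {
    # process_response() hooks run in decreasing priority order, so at 543 this middleware
    # sees a response only after RetryMiddleware (550) and RedirectMiddleware (600) have handled it.
    'myproject.middlewares.GetFailedUrl': 543,
}
```

With this ordering, 3xx responses that get followed as redirects, and 5xx responses that are still being retried, never reach GetFailedUrl. Only the final outcome is recorded. If every intermediate non-200 response should be logged as well, one option is to give the middleware a number above 600 so its process_response() runs before those built-in middlewares.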