Building a Small Crawler-Based Monitor

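The scenario and requirements are as follows:

1. Machine online status is shown on a web page, and checking whether a machine is online means visiting that page every time.
2. Because the page already shows the online data, I decided not to query the database directly.
3. The operations staff need a scheduled email reporting the online status. (For the scheduling, crontab keeps things simple.)
4. A machine counts as online if its information was updated within the last 15 minutes; otherwise it counts as offline.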
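Requirement 4 is the core rule of the whole monitor, so here is a minimal sketch of it in isolation. It is written for Python 3 (the scripts later in this post target Python 2.6), and the names `OFFLINE_AFTER` and `is_online` as well as the sample times are my own, chosen only for illustration.

```python
from datetime import datetime, timedelta

# Threshold taken from requirement 4: an update older than this means "offline".
OFFLINE_AFTER = timedelta(minutes=15)


def is_online(last_update, now):
    """Return True if last_update is no more than 15 minutes before now."""
    return now - last_update <= OFFLINE_AFTER


now = datetime(2015, 6, 26, 12, 0, 0)  # made-up reference time
print(is_online(datetime(2015, 6, 26, 11, 50, 0), now))  # updated 10 minutes ago
print(is_online(datetime(2015, 6, 26, 11, 30, 0), now))  # updated 30 minutes ago
```

Passing `now` in as a parameter, rather than calling `datetime.now()` inside the function, makes the rule easy to check with fixed times.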

Based on this scenario and these requirements, here is what I ended up with.

1. First, find the page that actually serves the data. Any web development tool will do; I used HttpFox.

It showed that device.php is queried via POST and returns the machines' online data.
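To make the shape of that response concrete, here is a small Python 3 sketch that parses a made-up response body. Only the keys `data`, `name` and `servertime` are taken from the script later in this post; the sample values are invented and the real response may well contain more fields.

```python
import json

# Made-up response body; only the key names come from the script in this post.
sample = '{"data": [{"name": "machine-01", "servertime": "2015-06-26 11:50:00"}]}'

# The machine records sit under the top-level "data" key.
records = json.loads(sample)['data']
for record in records:
    print(record['name'], record['servertime'])
```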

A quick primer on POST and GET:

With the GET method, the query string (name/value pairs) is sent in the URL of the GET request:
/test/demo_form.asp?name1=value1&name2=value2

Some other notes on GET requests:

    * GET requests can be cached
    * GET requests remain in the browser history
    * GET requests can be bookmarked
    * GET requests should not be used when handling sensitive data
    * GET requests have length restrictions
    * GET requests should only be used to retrieve data
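As a small illustration of the GET case, the following Python 3 sketch builds the example URL from its name/value pairs and then reads the pairs back out. It only manipulates strings with the standard library; nothing is sent over the network.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Name/value pairs from the GET example above.
query = urlencode({'name1': 'value1', 'name2': 'value2'})
url = '/test/demo_form.asp?' + query
print(url)

# Going the other way: recover the pairs from the URL's query string.
pairs = parse_qs(urlsplit(url).query)
print(pairs)
```

Note that `parse_qs` returns a list for each name, because a name may appear more than once in a query string.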

With the POST method, the query string (name/value pairs) is sent in the HTTP message body of the POST request:
POST /test/demo_form.asp HTTP/1.1
Host: w3schools.com

name1=value1&name2=value2

Some other notes on POST requests:

    * POST requests are normally not cached
    * POST requests do not remain in the browser history
    * POST requests cannot be bookmarked
    * POST requests have no restrictions on data length
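The difference can also be seen from the client side. In this Python 3 sketch, `urllib.request.Request` picks the method based on whether a body is supplied. The host `example.com` is a placeholder of mine, and the requests are only constructed and inspected, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Form-encode the name/value pairs; a request body must be bytes in Python 3.
body = urlencode({'name1': 'value1', 'name2': 'value2'}).encode('ascii')

# GET: the pairs travel in the URL and there is no body.
get_req = Request('http://example.com/test/demo_form.asp?name1=value1&name2=value2')
# POST: the same pairs travel in the message body instead.
post_req = Request('http://example.com/test/demo_form.asp', data=body)

print(get_req.get_method(), get_req.data)
print(post_req.get_method(), post_req.data)
```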

http://www.w3school.com.cn/tags/html_ref_httpmethods.asp


2. With the entry point found, the crawling can begin.

#!/usr/bin/python2.6
# -*- coding: utf-8 -*-

import re
import urllib
import urllib2
import json
import time
import datetime


def get_data():
  # urllib.urlencode turns the dict into a "key=value&key=value" string
  # that can be sent as the body of a POST request.
  params = urllib.urlencode({'type':'gettable','data':'{"cpage":1,"pagesize":50,"search":{"address":{"type":"","id":0}}}'})
  url = 'http://XXXX/device.php'  # URL of the data page
  req = urllib2.Request(url=url,data=params)  # build the request; passing data makes it a POST
  a = urllib2.urlopen(req)  # send the request and open the response
  b = a.read()  # raw response body
  # Kept simple: parse the response as JSON and take the value under the
  # "data" key, which is where the records I need live.
  c = json.loads(b)
  data = c['data']

  downs_result = []

  num = 0

  for i in data:
    # time.strptime and datetime work as a pair here: strptime parses the
    # timestamp string into a struct_time, and datetime.datetime turns that
    # into a datetime object. Only a real datetime can be used in arithmetic.
    down_time_start = time.strptime(str(i['servertime']), "%Y-%m-%d %H:%M:%S")
    down_time_start = datetime.datetime(*down_time_start[:6])

    # timedelta represents a time span; now minus 15 minutes is the cutoff.
    if down_time_start < datetime.datetime.now() - datetime.timedelta(minutes=15):
      timediff = datetime.datetime.now() - down_time_start
      # str(timediff) looks like "2 days, 3:04:05" (or "1 day, 3:04:05")
      # once a day or more has passed, and like "0:20:05" otherwise.
      if re.search(r'day',str(timediff)):
        timediff = re.search(r'(\d+)\sdays?,\s(\d+):(\d+):(\d+)',str(timediff))
        downs_fno = timediff.group(1) + " days " + timediff.group(2) + " hours " + timediff.group(3) + " minutes " + timediff.group(4) + " seconds"
        downs_result.append(i['name'].encode('utf-8') + '_____' + str(down_time_start) + '_____' + "(" + "offline for: " + downs_fno + ")")
        num += 1
      else:
        timediff = re.search(r'(\d+):(\d+):(\d+)',str(timediff))
        downs_fno = timediff.group(1) + " hours " + timediff.group(2) + " minutes " + timediff.group(3) + " seconds"
        downs_result.append(i['name'].encode('utf-8') + '_____' + str(down_time_start) + '_____' + "(" + "offline for: " + downs_fno + ")")
        num += 1

  return (downs_result,num,len(data))
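The script gets the day/hour/minute/second breakdown by running a regular expression over `str(timediff)`. As an alternative that does not depend on the text `str()` produces, the same numbers can be read from the `days` and `seconds` attributes of the timedelta. This is only a sketch in Python 3, and the helper name `format_downtime` is hypothetical, not something from the original script.

```python
from datetime import timedelta


def format_downtime(delta):
    """Render a non-negative timedelta as days/hours/minutes/seconds text."""
    # delta.seconds holds only the part of the span below one full day.
    hours, rest = divmod(delta.seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    parts = []
    if delta.days:
        parts.append('%d days' % delta.days)
    parts.append('%d hours %d minutes %d seconds' % (hours, minutes, seconds))
    return ' '.join(parts)


print(format_downtime(timedelta(minutes=20, seconds=5)))
print(format_downtime(timedelta(days=2, hours=3, minutes=4, seconds=5)))
```

The `days` and `seconds` attributes also exist on Python 2.6, so the same approach should carry over to the script above.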

References:

https://docs.python.org/2/library/json.html

https://docs.python.org/2/library/urllib2.html#module-urllib2

https://docs.python.org/2/library/urllib.html#urllib.urlencode

https://docs.python.org/2/library/time.html#time.strptime

https://docs.python.org/2/library/datetime.html


3. Next comes sending the email. I borrowed an example found online.

import smtplib
from email.mime.text import MIMEText


mailto_list=["xxx@163.com"]  # recipients
mail_host="smtp.163.com"  # SMTP server
mail_user="xxx@qq.com"    # user name
mail_pass="12345"   # password


def send_mail(to_list,sub,content):  # to_list: recipients; sub: subject; content: message body
    me="<"+mail_user+">"   # sender shown in the From header
    msg = MIMEText(content,_subtype='html',_charset='utf-8')    # create the message as an HTML email
    msg['Subject'] = sub    # set the subject
    msg['From'] = me
    msg['To'] = ", ".join(to_list)  # address lists in headers are comma-separated
    try:
        s = smtplib.SMTP()
        s.connect(mail_host)  # connect to the SMTP server
        s.login(mail_user,mail_pass)  # log in to the server
        s.sendmail(me, to_list, msg.as_string())  # send the message
        s.close()
        return True
    except Exception as e:
        print str(e)
        return False
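Before pointing this at a real SMTP server, it can help to look at the message that `MIMEText` builds. The Python 3 sketch below only constructs the message and prints a few parts of it; the addresses, subject and body text are placeholders of mine, and nothing is sent.

```python
from email.mime.text import MIMEText

recipients = ['ops1@example.com', 'ops2@example.com']  # hypothetical addresses

# Same construction as in send_mail: an HTML body encoded as UTF-8.
msg = MIMEText('2 machines offline<br>machine-01', _subtype='html', _charset='utf-8')
msg['Subject'] = 'Machine status'
msg['From'] = '<monitor@example.com>'
msg['To'] = ', '.join(recipients)  # address lists in headers are comma-separated

print(msg.get_content_type())
print(msg['To'])
# The payload is stored encoded; decode=True undoes the transfer encoding.
print(msg.get_payload(decode=True).decode('utf-8'))
```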

The complete version looks like this:

#!/usr/bin/python2.6
# -*- coding: utf-8 -*-

import re
import urllib
import urllib2
import json
import time
import datetime
import smtplib  
from email.mime.text import MIMEText

 
mailto_list=["xxx@163.com"]  # recipients
mail_host="smtp.163.com"  # SMTP server
mail_user="xxx@qq.com"    # user name
mail_pass="12345"   # password

def get_data():
  # The body is form-encoded, so urllib2's default Content-Type
  # (application/x-www-form-urlencoded) is the right one; no extra header is set.
  params = urllib.urlencode({'type':'gettable','data':'{"cpage":1,"pagesize":50,"search":{"address":{"type":"","id":0}}}'})
  url = 'http://XXXX/device.php'
  req = urllib2.Request(url=url,data=params)
  a = urllib2.urlopen(req)
  b = a.read()
  c = json.loads(b)
  data = c['data']

  downs_result = []

  num = 0

  for i in data:
    # Parse the timestamp string into a datetime so it can be compared.
    down_time_start = time.strptime(str(i['servertime']), "%Y-%m-%d %H:%M:%S")
    down_time_start = datetime.datetime(*down_time_start[:6])
    # No update within the last 15 minutes means the machine is offline.
    if down_time_start < datetime.datetime.now() - datetime.timedelta(minutes=15):
      timediff = datetime.datetime.now() - down_time_start
      # str(timediff) reads "N days, H:MM:SS" (or "1 day, ...") or just "H:MM:SS".
      if re.search(r'day',str(timediff)):
        timediff = re.search(r'(\d+)\sdays?,\s(\d+):(\d+):(\d+)',str(timediff))
        downs_fno = timediff.group(1) + " days " + timediff.group(2) + " hours " + timediff.group(3) + " minutes " + timediff.group(4) + " seconds"
        downs_result.append(i['name'].encode('utf-8') + '_____' + str(down_time_start) + '_____' + "(" + "offline for: " + downs_fno + ")")
        num += 1
      else:
        timediff = re.search(r'(\d+):(\d+):(\d+)',str(timediff))
        downs_fno = timediff.group(1) + " hours " + timediff.group(2) + " minutes " + timediff.group(3) + " seconds"
        downs_result.append(i['name'].encode('utf-8') + '_____' + str(down_time_start) + '_____' + "(" + "offline for: " + downs_fno + ")")
        num += 1

  return (downs_result,num,len(data))


def send_mail(to_list,sub,content):  # to_list: recipients; sub: subject; content: message body
    me="<"+mail_user+">"   # sender shown in the From header
    msg = MIMEText(content,_subtype='html',_charset='utf-8')    # create the message as an HTML email
    msg['Subject'] = sub    # set the subject
    msg['From'] = me
    msg['To'] = ", ".join(to_list)  # address lists in headers are comma-separated
    try:
        s = smtplib.SMTP()
        s.connect(mail_host)  # connect to the SMTP server
        s.login(mail_user,mail_pass)  # log in to the server
        s.sendmail(me, to_list, msg.as_string())  # send the message
        s.close()
        return True
    except Exception as e:
        print str(e)
        return False

if __name__ == '__main__':
    # "total" rather than "all", so the built-in all() is not shadowed.
    downs_result,num,total=get_data()
    total = "Total machines: " + str(total)
    num = "Offline machines: " + str(num)
    str_downs_result = "<br>".join(downs_result)
    # This is an HTML email, so line breaks are written as <br>.
    if send_mail(mailto_list,"hello",total + "   " + num + "<br>" + str_downs_result):
        print "Sent successfully"
    else:
        print "Failed to send"
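For the scheduling mentioned in requirement 3, a crontab entry along these lines would run the script periodically. This is only a sketch: the 30-minute interval and the script path `/path/to/monitor.py` are assumptions of mine, since the post does not state either; the interpreter path is taken from the script's shebang line.

```
# m    h  dom mon dow  command
*/30   *  *   *   *    /usr/bin/python2.6 /path/to/monitor.py
```

The entry can be added with `crontab -e` for the user that should run the monitor.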

Result of the complete version (screenshot)

Original article: http://www.godblessyuan.com/2015/06/26/webcrawler_monitor/
