Weibo Security Community Follow Ranking

First, a script walks through Weibo follow lists, starting from self._startid = 1652595727. Any well-known hacker's user ID will do as the starting point.

#coding:utf-8
import requests
import json
import re
from bs4 import BeautifulSoup
import pymongo
import Queue
import time


class sec_weibo:
    def __init__(self):
        self._startid = 1652595727
        self.id = 0
        self.client = pymongo.MongoClient('mongodb://172.17.0.2/')
        self.db = self.client['weibo']
        self.posts = self.db['users']
        self.tagets = Queue.Queue()

    def get_follow(self):
        url = "https://weibo.cn/%d/follow" % self.id

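        # Cookies: paste the Cookie request header copied from Chrome DevTools (F12).
        # Reading cookies from the console will not work, because some are HttpOnly.
        cookies = {}
        cookies_str = "PASTE_YOUR_COOKIE_HEADER_HERE"

        for cookie in cookies_str.split(";"):
            # Skip malformed pieces and split on the first "=" only,
            # since cookie values may themselves contain "=".
            if "=" not in cookie:
                continue
            key, value = cookie.strip().split("=", 1)
            cookies[key] = value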

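        # Get the total number of pages in the follow list
        try:
            res = requests.get(url,cookies=cookies)
            soup = BeautifulSoup(res.content,"lxml")
            # These two assignments only modify the first <input> tag;
            # the page count is read from the fifth <input> below.
            soup.input['type']="hidden"
            soup.input['name']="mp"
            total_page = soup.find_all("input")[4]['value']
        except:
            print "Startup failed, please restart the crawler. Weibo may have blocked this IP."
            time.sleep(60)
            exit(0)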

        n = 0

        follows = []

        # Collect the followed accounts on each page
        for page in range(1, int(total_page) + 1):
            time.sleep(0.5)
            try:
                url = "https://weibo.cn/%d/follow?page=%d" % (self.id, page)
                res = requests.get(url,cookies=cookies)
                soup = BeautifulSoup(res.content,"lxml")
                soup.td['valign']="top"
                for user in soup.find_all("td"):
                    if "style=\"width: 52px\"" not in str(user):
                        follow = {}
                        re_name = "\">(.+?)</a>"
                        re_uid = "uid=(.+?)&"
                        name = re.findall(re_name, str(user))[0].split(">")[1]#.decode("utf-8").encode("gbk")
                        uid = re.findall(re_uid, str(user))[0]
                        print name
                        print uid
                        print '-' * 50,
                        n += 1
                        print n
                        follow["name"] = name
                        follow["uid"] = uid
                        self.tagets.put(uid)
                        follow["follow"] = str(self.id)
                        #raw_input()
                        follows.append(follow)
            except:
                print "Failed to scrape this page, skipping."
                time.sleep(10)
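        # insert_many raises on an empty list, so only write when something was collected
        if follows:
            self.posts.insert_many(follows)
        print "Sleeping for 10 seconds!"
        time.sleep(10)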
        

    def run(self):
        self.id = self._startid
        self.get_follow()
        while self.tagets.qsize() > 0:
            print "qsize %d" % self.tagets.qsize()
            self.id = int(self.tagets.get())
            try:
                self.get_follow()
            except:
                print "Failed to scrape this user, skipping."
if __name__ == "__main__":
    work = sec_weibo()
    work.run()
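
The crawler above queues every uid it sees, so an account followed by many people may be fetched again and again, and each repeat inserts the same follow rows a second time. A minimal sketch of the same breadth-first walk with a visited set is shown here. It is written for Python 3, and `crawl_follow_graph`, `fetch_follows`, `max_users` and `FAKE_FOLLOWS` are hypothetical names introduced for illustration; `fetch_follows` stands in for the network request and is not part of the original script.

```python
from collections import deque


def crawl_follow_graph(start_uid, fetch_follows, max_users=1000):
    """Breadth-first walk over follow lists, expanding each uid at most once.

    fetch_follows(uid) is a hypothetical callable returning the uids that
    `uid` follows; max_users caps how many users are expanded in total.
    """
    seen = {start_uid}          # uids already queued, so cycles are not re-crawled
    queue = deque([start_uid])
    edges = []                  # (follower_uid, followed_uid) pairs
    expanded = 0

    while queue and expanded < max_users:
        uid = queue.popleft()
        expanded += 1
        for followed in fetch_follows(uid):
            edges.append((uid, followed))
            if followed not in seen:
                seen.add(followed)
                queue.append(followed)
    return edges


# Tiny in-memory stand-in for the network call, for illustration only.
FAKE_FOLLOWS = {1: [2, 3], 2: [1, 3], 3: [1]}
edges = crawl_follow_graph(1, lambda uid: FAKE_FOLLOWS.get(uid, []))
print(edges)
```

Capping the number of expanded users also gives the crawl a defined end, since the follow graph of a large network is effectively unbounded.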

Once the data has been crawled, count how often each account is followed and sort the result.

import Queue
import pymongo

class data_analy:
    def __init__(self):
        self.client = pymongo.MongoClient('mongodb://127.0.0.1/')
        self.db = self.client['weibo_vps']
        self.posts = self.db['users']
        self.tagets = Queue.Queue()

    def run(self):
        res = self.posts.aggregate([
            {"$group":{"_id":"$name","total":{"$sum":1}}}
                                ])
        result = []
        for i in res:
            if i["total"] > 100:
                # Keep the count as a number so the sort is numeric, not lexicographic
                result.append((i["total"], i["_id"]))

        result.sort()
        for total, name in result:
            print str(total) + "<====>" + name
if __name__ == "__main__":
    work = data_analy()
    work.run()
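
The same count can also be done in plain Python on documents already loaded from the collection. This is only a sketch under a few assumptions: it groups by `uid` instead of `name` (display names can change), it drops duplicate (follower, followed) rows before counting, and it sorts in descending order. The `name`, `uid` and `follow` fields are the ones the crawler stores; `rank_followed`, `min_count` and the sample documents are hypothetical. Written for Python 3.

```python
from collections import Counter


def rank_followed(docs, min_count=100):
    """Count distinct followers per followed uid, highest count first."""
    # Deduplicate (follower, followed) pairs in case a user was crawled twice.
    pairs = {(d["follow"], d["uid"]) for d in docs}
    names = {d["uid"]: d["name"] for d in docs}
    counts = Counter(uid for _, uid in pairs)
    ranked = [(n, names[uid]) for uid, n in counts.items() if n > min_count]
    # Sort by count descending, then by name for a stable order among ties.
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return ranked


# Made-up sample rows, for illustration only.
docs = [
    {"name": "alice", "uid": "10", "follow": "1"},
    {"name": "alice", "uid": "10", "follow": "2"},
    {"name": "alice", "uid": "10", "follow": "2"},  # duplicate row
    {"name": "bob", "uid": "11", "follow": "1"},
]
for count, name in rank_followed(docs, min_count=0):
    print("%d<====>%s" % (count, name))
```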

Results (accounts that appear more than 100 times in the crawled follow lists, printed as count<====>name):
101<====>Fooying
101<====>Seay_法师
101<====>安全北北
101<====>李劼杰
102<====>月亮山大王
102<====>粉丝服务平台
104<====>KeenTeam
106<====>alert7
106<====>瘦肉丁
107<====>唐门三少_tang3
107<====>栋栋的栋
107<====>爱吃猪肉的ztz
108<====>微博客服
108<====>赵彦_ayazero
110<====>xfkxfk
111<====>Seebug漏洞平台
111<====>互联网的那点事
111<====>吃瓜群众-Fr1day
112<====>秒拍
112<====>腾讯科恩实验室
112<====>阿里云安全
113<====>pynerd
113<====>左耳朵耗子
113<====>瘦古龙
113<====>阿里安全应急响应中心
113<====>黄源小童鞋
114<====>aullik5
116<====>微博数据助手
117<====>PanguTeam
117<====>xisigr
117<====>知乎
118<====>局座召忠
118<====>薛之谦
119<====>佳佳是个软妹纸
120<====>D3AdCa7
121<====>evi1m0
121<====>微博打赏
124<====>RicterZ
124<====>nearg1e
124<====>粉丝群
126<====>RevengeRangers
126<====>rock509
126<====>央视新闻
126<====>微博运动
127<====>拍客小助手
127<====>网易云音乐
128<====>人民日报
130<====>一直播
134<====>矮穷龊-陆羽
134<====>陈良-KeenLab
135<====>Val0z0
136<====>evil_xi4oyu
136<====>exp-sky
140<====>RAyH4c
140<====>国际版小助理
140<====>定时微博小助手
140<====>沈沉舟
142<====>宫一鸣cn
144<====>王思聪
146<====>微博故事
147<====>白帽汇赵武
148<====>yuange1975
148<====>古河120
148<====>长亭科技
150<====>real-肉肉
151<====>Orange_tw
151<====>riusksk
152<====>EvilMoon
156<====>杨卿-Ir0nSmith
159<====>博物杂志
159<====>碳基体
160<====>sunwear
163<====>redrain_QAQ
163<====>乌云知识库
165<====>GeekPwn
167<====>SecWiki
167<====>papi酱
167<====>乌云-漏洞报告平台
168<====>廖新喜已被注销
174<====>安全客官方微博
175<====>龚广OldFresher
177<====>我叫0day谁找我

183<====>phithon别跟路人甲BB
184<====>phunter_lau
188<====>FreeBuf
189<====>安全
云舒
192<====>微博红包
194<====>微博问答
200<====>微博安全中心
201<====>呆子不开口
203<====>粉丝红包
208<====>余弦
209<====>tombkeeper
212<====>微博抽奖平台
212<====>来去之间
216<====>超级话题
220<====>Flanker_017
229<====>蒸米spark
261<====>腾讯玄武实验室

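Hehe. These are the big names of the security community on Weibo.
You could also turn the data into a visualization of the follow graph, like the one posted on Xianzhi (先知).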
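
One possible way to get such a visualization is to export the crawled rows as a Graphviz DOT file and render it with a graph tool. The sketch below is an illustration only and is not how the Xianzhi graph was made. It assumes the `name`, `uid` and `follow` fields stored by the crawler; `to_dot`, `min_followers` and the sample rows are hypothetical. Written for Python 3.

```python
def to_dot(docs, min_followers=2):
    """Render follow documents as a Graphviz DOT digraph string."""
    # Map each followed uid to the set of uids that follow it.
    followers = {}
    for d in docs:
        followers.setdefault(d["uid"], set()).add(d["follow"])
    # Keep only accounts with enough followers so the graph stays readable.
    keep = {uid for uid, f in followers.items() if len(f) >= min_followers}
    names = {d["uid"]: d["name"] for d in docs}

    lines = ["digraph follows {"]
    for uid in sorted(keep):
        label = names[uid].replace('"', '\\"')  # escape quotes inside labels
        lines.append('  "%s" [label="%s"];' % (uid, label))
    for uid in sorted(keep):
        for follower in sorted(followers[uid]):
            lines.append('  "%s" -> "%s";' % (follower, uid))
    lines.append("}")
    return "\n".join(lines)


# Made-up sample rows, for illustration only.
sample = [
    {"name": "alice", "uid": "10", "follow": "1"},
    {"name": "alice", "uid": "10", "follow": "2"},
    {"name": "bob", "uid": "11", "follow": "1"},
]
print(to_dot(sample))
```

Saving the output to a file such as follows.dot lets Graphviz render it, for example with `dot -Tsvg follows.dot -o follows.svg`.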
