Python, Second Attempt

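Scrape the Jianshu home page for each article's title, author, publication time, read count, comment count, like count, reward count, and the collection it was submitted to.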

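I had read someone else's write-up on scraping Ganji.com, and with nothing else to do I decided to try imitating it. The process was painful: there was a lot of basic material I did not know, so I had to look things up and learn as I went. After a day of pushing through, I finally got a result.
Here is the result first, stored in MongoDB.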

[Figure: the scraped data]

Now, a record of how I did it.

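1. Inspect the source of the page to scrape

Inspecting the elements shows that each li under the ul tag corresponds to one article, and the title, author, and other fields are obtained the same way for every article. So it is enough to fetch the list of articles and run the same operations on each item to get the data.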
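The "one li per article, same extraction for each" idea can be tried offline before touching the live site. The following is only a minimal sketch: the HTML fragment and the helper name extract_articles are hypothetical, made up to imitate the structure described above, and the real Jianshu markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment that imitates the structure described above;
# the real Jianshu markup may differ.
SAMPLE_HTML = """
<ul class="note-list">
  <li><a class="title">First post</a><a class="blue-link">alice</a></li>
  <li><a class="title">Second post</a><a class="blue-link">bob</a></li>
</ul>
"""


def extract_articles(html):
    """Return one dict per <li> found under ul.note-list."""
    soup = BeautifulSoup(html, 'html.parser')
    results = []
    for li in soup.select('ul.note-list li'):
        # The same two lookups are applied to every list item.
        results.append({
            'title': li.select('a.title')[0].text,
            'author': li.select('a.blue-link')[0].text,
        })
    return results


articles = extract_articles(SAMPLE_HTML)
print(articles)
```

Working against a fixed fragment like this makes it easier to check the selectors one at a time without sending repeated requests to the server.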

2. Find the tags that contain the data you need

[Figure: author name and article publication time]

[Figure: title]

[Figure: read count, comment count, like count, and reward count]

3. The scraping code

# -*- coding: utf-8 -*-
import requests
import pymongo
from bs4 import BeautifulSoup


def get_info(url):
    r = requests.get(url)  # request the page from the server
    r.encoding = 'utf-8'  # set the encoding to UTF-8 to avoid decoding errors
    soup = BeautifulSoup(r.text, 'html.parser')  # parse the page with html.parser
    articlelist = soup.select('ul.note-list li')  # article list on the home page
    for article in articlelist:
        title = article.select('a.title')[0].text
        author = article.select('a.blue-link')[0].text
        date = article.select('span.time')[0].get('data-shared-at')
        # Some articles belong to no collection, so check first so the lookup does not fail.
        if article.find_all('a', attrs={'class': 'collection-tag'}):
            collection = article.select('div.meta a.collection-tag')[0].text
            first = 2  # the collection tag is the first <a>, so the read count is the second
        else:
            collection = 'No collection'
            first = 1  # without a collection tag, the read count is the first <a>
        # :nth-of-type(n) matches every element that is the n-th child of its type within its parent.
        readnum = article.select('div.meta a:nth-of-type(%d)' % first)[0].text
        # The comment count, when present, is the <a> right after the read count.
        if article.find_all('i', attrs={'class': 'iconfont ic-list-comments'}):
            commentnum = article.select('div.meta a:nth-of-type(%d)' % (first + 1))[0].text
        else:
            commentnum = 0
        likenum = article.select('div.meta span:nth-of-type(1)')[0].text
        # The reward count only exists when the money icon is present.
        if article.find_all('i', attrs={'class': 'iconfont ic-list-money'}):
            money = article.select('div.meta span:nth-of-type(2)')[0].text
        else:
            money = 0
        data = {
            'title': title,
            'author': author,
            'date': date,
            'readnum': readnum,
            'commentnum': commentnum,
            'likenum': likenum,
            'money': money,
            'collection': collection,
        }
        jianshu.insert_one(data)  # store the scraped record in the database


client = pymongo.MongoClient('localhost', 27017)  # connect to MongoDB
test = client['test']  # database named test (created on first write)
jianshu = test['jianshu']  # collection named jianshu
get_info('http://www.lxweimin.com/')
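One thing worth noting about the listing: commentnum and money are stored as the integer 0 when the icon is missing but as raw tag text otherwise, so the saved records mix types. A possible way to even this out is sketched below. The helper parse_count is hypothetical (it is not part of the original script), and it assumes each count appears in the tag text as a plain run of digits; abbreviated forms such as "1.2k" are not handled.

```python
import re


def parse_count(value):
    """Turn scraped count text into an int; return 0 when no digits are found."""
    if isinstance(value, int):
        # Already a number, e.g. the 0 used when the icon is missing.
        return value
    match = re.search(r'\d+', value)
    return int(match.group()) if match else 0


# Surrounding whitespace from the page markup is ignored.
print(parse_count(' 123\n'))
print(parse_count(0))
```

Each count field could be passed through such a helper before the dict is built, so that every record stores numbers rather than a mix of numbers and text.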
