From the project's root directory, run the following command to start the shell:

      scrapy shell "http://www.itcast.cn/channel/teacher.shtml"

Based on the downloaded page, Scrapy Shell automatically creates several convenience objects, such as a Response object and a Selector object (for both HTML and XML content).

Once the shell has loaded, a local variable named response holds the response data. Entering response.body prints the response body, and entering response.headers shows the response headers.
Entering response.selector returns a Selector object initialized from the response, which you can then query with response.selector.xpath() or response.selector.css().
Scrapy also provides shortcuts: response.xpath() and response.css() work the same way (as in the earlier examples).

Selectors

Scrapy Selectors have built-in support for XPath and CSS Selector expressions.

Selector has four basic methods, of which xpath() is the most commonly used:

xpath(): takes an XPath expression and returns a selector list of all nodes that match it
extract(): serializes the matched nodes to Unicode strings and returns them as a list
css(): takes a CSS expression and returns a selector list of all nodes that match it; the syntax is the same as in BeautifulSoup4
re(): extracts data using the given regular expression and returns a list of Unicode strings

Examples of XPath expressions and their meanings:

/html/head/title: selects the <title> element inside the <head> of an HTML document
/html/head/title/text(): selects the text of the <title> element mentioned above
//td: selects all <td> elements
//div[@class="mine"]: selects all div elements that have the attribute class="mine"

Trying out Selectors

We use the Tencent recruitment site http://hr.tencent.com/position.php?&start=0#a as an example:

    # Start the shell
    scrapy shell "http://hr.tencent.com/position.php?&start=0#a"

    # Returns a list of XPath selector objects
    response.xpath('//title')
    [<Selector xpath='//title' data=u'<title>\u804c\u4f4d\u641c\u7d22 | \u793e\u4f1a\u62db\u8058 | Tencent \u817e\u8baf\u62db\u8058</title'>]

    # extract() returns a list of Unicode strings
    response.xpath('//title').extract()
    [u'<title>\u804c\u4f4d\u641c\u7d22 | \u793e\u4f1a\u62db\u8058 | Tencent \u817e\u8baf\u62db\u8058</title>']

    # Print the first element of the list, displayed in the terminal's encoding
    print(response.xpath('//title').extract()[0])
    <title>职位搜索 | 社会招聘 | Tencent 腾讯招聘</title>

    # Returns a list of XPath selector objects
    response.xpath('//title/text()')
    [<Selector xpath='//title/text()' data=u'\u804c\u4f4d\u641c\u7d22 | \u793e\u4f1a\u62db\u8058 | Tencent \u817e\u8baf\u62db\u8058'>]

    # Returns the Unicode string of the first element in the list
    response.xpath('//title/text()')[0].extract()
    u'\u804c\u4f4d\u641c\u7d22 | \u793e\u4f1a\u62db\u8058 | Tencent \u817e\u8baf\u62db\u8058'

    # Displayed in the terminal's encoding
    print(response.xpath('//title/text()')[0].extract())
    职位搜索 | 社会招聘 | Tencent 腾讯招聘

    # Select all elements with class="even" and store them as site for the queries below
    site = response.xpath('//*[@class="even"]')

    # Job title
    print(site[0].xpath('./td[1]/a/text()').extract()[0])
    TEG15-运营开发工程师（深圳）

    # Link to the job's detail page
    print(site[0].xpath('./td[1]/a/@href').extract()[0])
    position_detail.php?id=20744&keywords=&tid=0&lid=0

    # Job category
    print(site[0].xpath('./td[2]/text()').extract()[0])
    技术类

When you need to extract data later on, you can test the expressions in Scrapy Shell first and apply them in your code once they work.
Official documentation: http://scrapy-chs.readthedocs.io/zh_CN/latest/topics/shell.html
