requests (Web Scraping Series, Part 1)
Recently at work I was collaborating with a colleague on simulating ad redirects in a mobile browser, which brought me back into contact with web scraping. I had previously used urllib + bs4 + selenium to crawl product information from NetEase's one-yuan lottery (一元奪寶) site into a database. Back then I was too green: I didn't understand scraping well, and I didn't even know about the robots protocol. So now I'm reorganizing my scraping knowledge and aiming to write a series, roughly in this order: requests, bs4, re, scrapy, selenium, and so on.
Before introducing the requests library, let's cover some HTTP basics. The following is organized from my notes on Professor Song Tian's course; my thanks to him.
The HTTP Protocol
HTTP (HyperText Transfer Protocol) is the most widely used network protocol on the Internet; all WWW documents must conform to this standard. HTTP was originally designed as a way to publish and receive HTML pages. It is a stateless, application-layer protocol based on a request/response model, and it uses URLs as identifiers for locating network resources.
http://host[:port][path]
host
: a valid Internet host name or IP address
port
: the port number, defaulting to 80
path
: the path of the requested resource
Understanding HTTP URLs:
A URL is an Internet path for accessing resources via the HTTP protocol; each URL corresponds to one data resource.
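As a quick illustration, the standard library's urllib.parse splits a URL into exactly these pieces (a minimal sketch; the URL is just an example):
from urllib.parse import urlsplit

# Break a URL into the scheme/host/port/path components described above.
parts = urlsplit('http://httpbin.org:80/get')
print(parts.scheme)    # 'http'
print(parts.hostname)  # 'httpbin.org'
print(parts.port)      # 80
print(parts.path)      # '/get'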
HTTP operations on resources
Method | Description |
---|---|
GET | Request the resource at the URL |
HEAD | Request the response headers for the resource at the URL, i.e. fetch only its header information |
POST | Append new data to the resource at the URL |
PUT | Store a resource at the URL, overwriting the resource previously there |
PATCH | Partially update the resource at the URL, i.e. change part of its content |
DELETE | Delete the resource stored at the URL |
Of these methods, GET and HEAD fetch information from the server to the local machine, while PUT, POST, PATCH and DELETE submit information from the local machine to the server. Resources are managed through URLs and these commands; each operation is independent and stateless, and the network channel and the server become black boxes.
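As a quick sanity check (a sketch against httpbin.org, a public echo service; not from the original course notes), each verb maps onto one resource operation:
import requests

base = 'http://httpbin.org'

# httpbin exposes one endpoint per HTTP verb and simply echoes the request back.
print(requests.get(base + '/get').status_code)        # 200: fetch the resource
print(requests.head(base + '/get').status_code)       # 200: headers only, empty body
print(requests.post(base + '/post').status_code)      # 200: append data to the resource
print(requests.put(base + '/put').status_code)        # 200: store/overwrite the resource
print(requests.patch(base + '/patch').status_code)    # 200: partially update it
print(requests.delete(base + '/delete').status_code)  # 200: delete it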
Documentation
Installation
pip install requests
A quick test of the requests installation
import requests
url = 'https://www.baidu.com'
r = requests.get(url)
r.encoding = r.apparent_encoding
print(r.text[-200:])
Out[13]: 'w.baidu.com/duty/>使用百度前必讀</a> <a href= >意見(jiàn)反饋</a> 京ICP證030173號(hào) <img src=//www.baidu.com/img/gs.gif> </p> </div> </div> </div> </body> </html>\r\n'
The seven main methods of the requests library
Method | Description |
---|---|
requests.request() | Construct a request; the base method that underpins all of the following |
requests.get() | Fetch a page, corresponding to HTTP GET |
requests.post() | Submit data to a page, corresponding to HTTP POST |
requests.head() | Fetch a page's headers, corresponding to HTTP HEAD |
requests.put() | Submit a PUT to a page, corresponding to HTTP PUT |
requests.patch() | Submit a partial-modification request to a page, corresponding to HTTP PATCH |
requests.delete() | Submit a delete request to a page, corresponding to HTTP DELETE |
requests.get()
r = requests.get(url)
r: a Response object, i.e. an object containing the resource returned by the server.
.get(url): constructs a Request object, i.e. a request to the server for the resource.
In [4]: type(requests.get(url))
Out[4]: requests.models.Response
Let's look at the source code:
def get(url, params=None, **kwargs):
    kwargs.setdefault('allow_redirects', True)
    return request('get', url, params=params, **kwargs)  # delegates to request(), which returns a Response

# request(); note that the method parameter is what selects the HTTP method
def request(method, url, **kwargs):
    with sessions.Session() as session:
        return session.request(method=method, url=url, **kwargs)

class Session(SessionRedirectMixin):
    ....
    # Session's request method
    def request(self, method, url,
            params=None,
            data=None,
            headers=None,
            cookies=None,
            files=None,
            auth=None,
            timeout=None,
            allow_redirects=True,
            proxies=None,
            hooks=None,
            stream=None,
            verify=None,
            cert=None,
            json=None):
        # Construct a Request object.
        req = Request(
            method=method.upper(),
            url=url,
            headers=headers,
            files=files,
            data=data or {},
            json=json,
            params=params or {},
            auth=auth,
            cookies=cookies,
            hooks=hooks,
        )
        prep = self.prepare_request(req)

        proxies = proxies or {}

        settings = self.merge_environment_settings(
            prep.url, proxies, stream, verify, cert
        )

        # Send the request.
        send_kwargs = {
            'timeout': timeout,
            'allow_redirects': allow_redirects,
        }
        send_kwargs.update(settings)
        resp = self.send(prep, **send_kwargs)

        return resp
Parameters of the get method
requests.get(url, params=None, **kwargs)
These are also visible in the source above. The parameters mean:
url
: the URL of the page to fetch
params
: extra parameters appended to the URL, as a dict or byte sequence; optional
**kwargs
: 12 optional keyword arguments controlling access
Two important objects in Requests
r = requests.get(url)
r: a Response object containing the server's resource; the Response object holds the content returned by the crawl.
.get(url): constructs a Request object, i.e. a request to the server for the resource.
Let's use an example to see what the returned object contains:
In [5]: type(r)  # print the type
Out[5]: requests.models.Response
In [6]: dir(r)  # list its attributes and methods
Out[6]:
['__attrs__',
'__bool__',
'__class__',
'__delattr__',
'__dict__',
'__dir__',
'__doc__',
'__eq__',
'__format__',
'__ge__',
'__getattribute__',
'__getstate__',
'__gt__',
'__hash__',
'__init__',
'__iter__',
'__le__',
'__lt__',
'__module__',
'__ne__',
'__new__',
'__nonzero__',
'__reduce__',
'__reduce_ex__',
'__repr__',
'__setattr__',
'__setstate__',
'__sizeof__',
'__str__',
'__subclasshook__',
'__weakref__',
'_content',
'_content_consumed',
'apparent_encoding',
'close',
'connection',
'content',
'cookies',
'elapsed',
'encoding',
'headers',
'history',
'is_permanent_redirect',
'is_redirect',
'iter_content',
'iter_lines',
'json',
'links',
'ok',
'raise_for_status',
'raw',
'reason',
'request',
'status_code',
'text',
'url']
A few of the important attributes:
Attribute | Description |
---|---|
r.status_code | The HTTP status code of the response; 200 means success |
r.text | The response body as a string, i.e. the page content at the URL |
r.encoding | The response encoding guessed from the HTTP headers |
r.apparent_encoding | The encoding inferred from the content itself (a fallback) |
r.content | The response body as bytes |
Understanding Response encoding
r.encoding: if the header contains no charset, the encoding is assumed to be ISO-8859-1; r.text renders the page according to r.encoding.
r.apparent_encoding: the encoding inferred by analyzing the page content; think of it as a fallback for r.encoding.
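A quick demonstration (a sketch; in my experience Baidu's homepage declares its charset in the body rather than in the response headers, so the two attributes disagree):
import requests

r = requests.get('https://www.baidu.com')
print(r.encoding)           # e.g. 'ISO-8859-1': no charset in the headers
print(r.apparent_encoding)  # e.g. 'utf-8': inferred from the content itself
r.encoding = r.apparent_encoding  # switch so that r.text decodes correctly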
response = requests.get('http://www.lxweimin.com/')
# the status code of the response
print(type(response.status_code), response.status_code)
# the response headers
print(type(response.headers), response.headers)
# the cookies set by the response
print(type(response.cookies), response.cookies)
# the URL that was actually accessed
print(type(response.url), response.url)
# the redirect history of the request
print(type(response.history), response.history)
Understanding requests exceptions
Network connections carry risk, so exception handling matters.
Exception | Description |
---|---|
requests.ConnectionError | Network connection error, e.g. DNS lookup failure or refused connection |
requests.HTTPError | HTTP error |
requests.URLRequired | A URL is missing |
requests.TooManyRedirects | The maximum number of redirects was exceeded |
requests.ConnectTimeout | Timed out connecting to the remote server |
requests.Timeout | The request timed out |
import requests
from requests.exceptions import ReadTimeout, ConnectionError, RequestException

try:
    response = requests.get("http://httpbin.org/get", timeout=0.5)
    print(response.status_code)
except ReadTimeout:
    # the request timed out
    print('Timeout')
except ConnectionError:
    # the connection failed
    print('Connection error')
except RequestException:
    # any other request error
    print('Error')
Understanding Response exceptions
r.raise_for_status()
raises requests.HTTPError if the status code indicates a client or server error (4xx/5xx) rather than success.
raise_for_status() checks r.status_code internally, so you don't need an extra if statement; this makes exception handling with try/except convenient.
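A minimal usage sketch (httpbin.org's /status endpoint returns whatever status code you ask for):
import requests

try:
    r = requests.get('http://httpbin.org/status/404')
    r.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    print(r.text)
except requests.HTTPError as e:
    print('HTTP error:', e)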
The raise_for_status source:
def raise_for_status(self):
    """Raises stored :class:`HTTPError`, if one occurred."""

    http_error_msg = ''
    if isinstance(self.reason, bytes):
        # We attempt to decode utf-8 first because some servers
        # choose to localize their reason strings. If the string
        # isn't utf-8, we fall back to iso-8859-1 for all other
        # encodings. (See PR #3538)
        try:
            reason = self.reason.decode('utf-8')
        except UnicodeDecodeError:
            reason = self.reason.decode('iso-8859-1')
    else:
        reason = self.reason

    if 400 <= self.status_code < 500:
        http_error_msg = u'%s Client Error: %s for url: %s' % (self.status_code, reason, self.url)

    elif 500 <= self.status_code < 600:
        http_error_msg = u'%s Server Error: %s for url: %s' % (self.status_code, reason, self.url)

    if http_error_msg:
        raise HTTPError(http_error_msg, response=self)
Status code names built into requests
# -*- coding: utf-8 -*-
from .structures import LookupDict
_codes = {
# Informational.
100: ('continue',),
101: ('switching_protocols',),
102: ('processing',),
103: ('checkpoint',),
122: ('uri_too_long', 'request_uri_too_long'),
200: ('ok', 'okay', 'all_ok', 'all_okay', 'all_good', '\\o/', '?'),
201: ('created',),
202: ('accepted',),
203: ('non_authoritative_info', 'non_authoritative_information'),
204: ('no_content',),
205: ('reset_content', 'reset'),
206: ('partial_content', 'partial'),
207: ('multi_status', 'multiple_status', 'multi_stati', 'multiple_stati'),
208: ('already_reported',),
226: ('im_used',),
# Redirection.
300: ('multiple_choices',),
301: ('moved_permanently', 'moved', '\\o-'),
302: ('found',),
303: ('see_other', 'other'),
304: ('not_modified',),
305: ('use_proxy',),
306: ('switch_proxy',),
307: ('temporary_redirect', 'temporary_moved', 'temporary'),
308: ('permanent_redirect',
'resume_incomplete', 'resume',), # These 2 to be removed in 3.0
# Client Error.
400: ('bad_request', 'bad'),
401: ('unauthorized',),
402: ('payment_required', 'payment'),
403: ('forbidden',),
404: ('not_found', '-o-'),
405: ('method_not_allowed', 'not_allowed'),
406: ('not_acceptable',),
407: ('proxy_authentication_required', 'proxy_auth', 'proxy_authentication'),
408: ('request_timeout', 'timeout'),
409: ('conflict',),
410: ('gone',),
411: ('length_required',),
412: ('precondition_failed', 'precondition'),
413: ('request_entity_too_large',),
414: ('request_uri_too_large',),
415: ('unsupported_media_type', 'unsupported_media', 'media_type'),
416: ('requested_range_not_satisfiable', 'requested_range', 'range_not_satisfiable'),
417: ('expectation_failed',),
418: ('im_a_teapot', 'teapot', 'i_am_a_teapot'),
421: ('misdirected_request',),
422: ('unprocessable_entity', 'unprocessable'),
423: ('locked',),
424: ('failed_dependency', 'dependency'),
425: ('unordered_collection', 'unordered'),
426: ('upgrade_required', 'upgrade'),
428: ('precondition_required', 'precondition'),
429: ('too_many_requests', 'too_many'),
431: ('header_fields_too_large', 'fields_too_large'),
444: ('no_response', 'none'),
449: ('retry_with', 'retry'),
450: ('blocked_by_windows_parental_controls', 'parental_controls'),
451: ('unavailable_for_legal_reasons', 'legal_reasons'),
499: ('client_closed_request',),
# Server Error.
500: ('internal_server_error', 'server_error', '/o\\', '?'),
501: ('not_implemented',),
502: ('bad_gateway',),
503: ('service_unavailable', 'unavailable'),
504: ('gateway_timeout',),
505: ('http_version_not_supported', 'http_version'),
506: ('variant_also_negotiates',),
507: ('insufficient_storage',),
509: ('bandwidth_limit_exceeded', 'bandwidth'),
510: ('not_extended',),
511: ('network_authentication_required', 'network_auth', 'network_authentication'),
}
codes = LookupDict(name='status_codes')

for code, titles in _codes.items():
    for title in titles:
        setattr(codes, title, code)
        if not title.startswith('\\'):
            setattr(codes, title.upper(), code)
This lookup pattern is quite handy and can be borrowed in your own projects for mapping data between names and values, as sketched below.
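A minimal sketch of the same idea (the class and names here are hypothetical, not part of requests):
class LookupTable(dict):
    """A dict whose keys can also be read as attributes."""
    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

# Map symbolic names onto values, then access them as attributes.
colors = LookupTable(red='#ff0000', green='#00ff00')
print(colors.red)  # '#ff0000'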
requests.codes can then be accessed attribute-style, for example:
print(requests.codes.ok)
200
print(requests.codes.unordered_collection)
425
type(requests.codes.not_extended)
Out[15]: int
print(requests.codes.not_extended)
510
A generic code framework for crawling web pages
# coding: utf8
import requests

def get_html(url, params=None):
    try:
        r = requests.get(url, params=params)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        return "raised an exception"

if __name__ == "__main__":
    url = "http://www.baidu.com"
    print(get_html(url))
The head() method of the Requests library
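head() asks the server for only the headers of the resource, so very little data travels back; a minimal sketch against httpbin.org:
import requests

r = requests.head('http://httpbin.org/get')
print(r.headers)  # the response headers arrive as usual
print(r.text)     # '' (a HEAD response carries no body)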
The post() method of the Requests library
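The httpbin.org echo below was presumably produced by a call along these lines (a sketch; the payload keys are taken from the output that follows):
import requests

# POSTing a dict sends it as form data (application/x-www-form-urlencoded).
payload = {'key1': 'youdi', 'king': 'youdi', 'value': 'the one'}
r = requests.post('http://httpbin.org/post', data=payload)
print(r.text)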
{
"args": {},
"data": "",
"files": {},
"form": { #post提交的data是一個(gè)字典的
"key1": "youdi", # 就會(huì)格式化成一個(gè)form
"king": "youdi",
"value": "the one"
},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Connection": "close",
"Content-Length": "35",
"Content-Type": "application/x-www-form-urlencoded",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.13.0"
},
"json": null,
"origin": "183.240.20.24",
"url": "http://httpbin.org/post"
}
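And when the data is a plain string (again a sketch matching the echo below), it is sent verbatim as the request body:
import requests

r = requests.post('http://httpbin.org/post', data='ABCDEFG')
print(r.text)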
{
"args": {},
"data": "ABCDEFG", # post提交的data是字符串 ,編碼為data
"files": {},
"form": {},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Connection": "close",
"Content-Length": "7",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.13.0"
},
"json": null,
"origin": "183.240.20.24",
"url": "http://httpbin.org/post"
}
The put() method of the Requests library
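put() works like post() but with PUT semantics, storing the submitted content at the URL and overwriting whatever was there; a minimal sketch:
import requests

# PUT a form-encoded dict; httpbin echoes it back under "form".
payload = {'key1': 'one', 'key2': 'two'}
r = requests.put('http://httpbin.org/put', data=payload)
print(r.status_code)  # 200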
A closer look at the main methods of the requests library
requests.request(method, url, **kwargs)
- method: the request method, corresponding to the 7 verbs such as GET/PUT/POST
- url: the URL of the page to fetch
- **kwargs: 13 optional parameters controlling access
method
: the request method
r = requests.request(method='GET', url=url, **kwargs)
r = requests.get(url, **kwargs)
r = requests.request(method='HEAD', url=url, **kwargs)
r = requests.head(url, **kwargs)
r = requests.request(method='POST', url=url, **kwargs)
r = requests.post(url, **kwargs)
r = requests.request(method='PUT', url=url, **kwargs)
r = requests.put(url, **kwargs)
r = requests.request(method='PATCH', url=url, **kwargs)
r = requests.patch(url, **kwargs)
r = requests.request(method='DELETE', url=url, **kwargs)
r = requests.delete(url, **kwargs)
r = requests.request(method='OPTIONS', url=url, **kwargs)
r = requests.options(url, **kwargs)
Note
: each pair above achieves exactly the same effect; the named methods are a thin wrapper layer that pulls the most commonly used calls out into their own functions. Many Python libraries do this; a typical example is matplotlib, which mimics MATLAB by exposing the simplest possible methods for plotting. I'll cover that in a later post.
**kwargs
: the parameters controlling access, all optional
- params: dict or byte sequence, appended to the URL as query parameters
- data: dict, byte sequence, or file object, sent as the body of the Request
- json: data in JSON format, sent as the body of the Request
- headers: dict, custom HTTP headers
- cookies: dict or CookieJar, the cookies to send with the Request
- auth: tuple, enabling HTTP authentication
- files: dict, for transferring files
- timeout: the timeout in seconds
- proxies: dict, sets proxy servers for the request; can include login credentials
- allow_redirects: True/False, default True; whether to follow redirects
- stream: True/False, default False; when True, downloading of the response body is deferred
- verify: True/False, default True; whether to verify the SSL certificate
- cert: path to a local SSL certificate
Let's go through them one by one.
params
A dict or byte sequence, appended to the URL as query parameters:
In [36]: payload
Out[36]: {'key1': 'one', 'key2': 'two'}
In [37]: r = requests.request('GET', 'http://python123.io/ws', params=payload)
In [38]: print(r.url)
http://python123.io/ws?key1=one&key2=two
data
A dict, byte sequence, or file object, sent as the body of the Request:
import requests

payload = {'key1': 'one', 'key2': 'two'}
url = 'http://httpbin.org/put'
r = requests.put(url=url, data=payload)
# or
r = requests.put(url=url, data='ABCDEFG')  # a plain string
json
Data in JSON format, sent as the body of the Request:
In [48]: kv = {'name': 'youdi', 'role': 'king', 'rank': 'the one'}
In [49]: url = 'http://httpbin.org/post'
In [50]: r = requests.request(method='POST', url=url, json=kv)
In [51]: print(r.text)
{
"args": {},
"data": "{\"role\": \"king\", \"rank\": \"the one\", \"name\": \"youdi\"}", #json格式,其實(shí)就是字符串
"files": {},
"form": {},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Connection": "close",
"Content-Length": "52",
"Content-Type": "application/json",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.13.0"
},
"json": {
"name": "youdi",
"rank": "the one",
"role": "king"
},
"origin": "183.60.175.16",
"url": "http://httpbin.org/post"
}
headers
A dict of custom HTTP headers, used to hide the crawler by imitating a browser's headers:
In [58]: url = 'http://httpbin.org/post'
In [59]: r = requests.request('POST', url)
# the request headers
In [69]: r.request.headers
# note the User-Agent
Out[69]: {'Accept': '*/*', 'User-Agent': 'python-requests/2.13.0', 'Connection': 'keep-alive', 'Accept-Encoding': 'gzip, deflate', 'Content-Length': '0'}
# after adding custom headers
In [62]: headers = { # a browser User-Agent
...: "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Ch
...: rome/57.0.2987.133 Safari/537.36"
...: }
In [63]: r = requests.request('POST', url, headers=headers)
In [71]: r.request.headers
Out[71]: {'Accept': '*/*', 'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36', 'Connection': 'keep-alive', 'Accept-Encoding': 'gzip, deflate', 'Content-Length': '0'}
cookies
A dict or CookieJar, the cookies to send with the Request:
# first fetch Baidu's cookies
In [74]: r = requests.request('GET', 'https://www.baidu.com')
In [75]: r
Out[75]: <Response [200]>
# keep them in a variable
In [76]: cookie = r.cookies
# the cookie type
In [86]: type(cookie)
Out[86]: requests.cookies.RequestsCookieJar
In [77]: r_baidu = requests.request('POST', 'https://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=0&rsv_idx=1&tn=baidu&wd=old&rsv_pq=981edbe6000308e9&rsv_t=76c1VG%2B1PcKzCGSEjcf3W2zDn5ZcBnhR1TAe%2FcJ32OW62aKsa5DWo7YYsms&rqlang=cn&rsv_enter=1&rsv_sug3=2', cookies=cookie)
# in https://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=0&rsv_idx=1&tn=baidu&wd=test&rsv_pq=981edbe6000308e9&rsv_t=76c1VG%2B1PcKzCGSEjcf3W2zDn5ZcBnhR1TAe%2FcJ32OW62aKsa5DWo7YYsms&rqlang=cn&rsv_enter=1&rsv_sug3=2, the wd parameter is the keyword submitted to Baidu's search
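Cookies can also be passed as a plain dict (a sketch; httpbin's /cookies endpoint echoes back whatever cookies it receives):
import requests

r = requests.get('http://httpbin.org/cookies', cookies={'session_id': 'abc123'})
print(r.text)  # {"cookies": {"session_id": "abc123"}}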
auth
A tuple, enabling HTTP authentication:
import requests
# the simplest form of HTTP Basic authentication
from requests.auth import HTTPBasicAuth

r = requests.get('http://httpbin.org/basic-auth/user/user', auth=HTTPBasicAuth('user', 'user'))
# r = requests.get('http://httpbin.org/basic-auth/user/user', auth=('user', 'user'))
print(r.status_code)
files
A dict, for transferring files:
# the key is the form field name; the value is an open file object
fs = {'file': open('data.xls', 'rb')}
# just pass it through the files parameter
r = requests.request('POST', 'http://httpbin.org/post', files=fs)
timeout
The timeout in seconds:
import requests
from requests.exceptions import ReadTimeout

try:
    # require a response within 500 ms, otherwise raise ReadTimeout
    response = requests.get("http://httpbin.org/get", timeout=0.5)
    print(response.status_code)
except ReadTimeout:
    print('Timeout')
proxies
A dict that sets proxy servers for the request; it can also carry login credentials:
import requests

# a plain proxy
proxies = {
    "http": "http://127.0.0.1:1080",
    "https": "https://127.0.0.1:1080",
}
# attach the proxies to the request
r = requests.get("https://www.taobao.com", proxies=proxies)
print(r.status_code)

# a proxy with a username and password
proxies = {
    "http": "http://user:password@127.0.0.1:9743/",
}
r = requests.get("https://www.taobao.com", proxies=proxies)
print(r.status_code)

# a SOCKS proxy, handy for getting past blocks (requires the requests[socks] extra)
proxies = {
    'http': 'socks5://127.0.0.1:1080',
    'https': 'socks5://127.0.0.1:1080'
}
r = requests.get("https://www.google.com", proxies=proxies)
print(r.status_code)
allow_redirects
True/False, default True; whether to follow redirects:
r = requests.request('GET','http://httpbin.org/get',allow_redirects=False)
stream
True/False, default False; when True, downloading of the response body is deferred until you read it:
r = requests.request('GET','http://httpbin.org/get/**.txt',stream=False)
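With stream=True the body can be consumed in chunks instead of being held in memory all at once (a sketch; the URL and filename are placeholders):
import requests

# Stream a response to disk chunk by chunk instead of loading it whole.
r = requests.get('http://httpbin.org/bytes/102400', stream=True)
with open('download.bin', 'wb') as f:
    for chunk in r.iter_content(chunk_size=8192):
        f.write(chunk)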
verify
True/False, default True; whether to verify the SSL certificate:
# access a site whose certificate fails verification
r = requests.get('https://www.12306.cn')
# for https requests, requests verifies the certificate and raises an exception on failure
print(r.status_code)

# turn verification off; a certificate warning is still printed
r = requests.get('https://www.12306.cn', verify=False)
print(r.status_code)

# silence the warning caused by disabling verification
from requests.packages import urllib3
# suppress the warnings
urllib3.disable_warnings()
r = requests.get('https://www.12306.cn', verify=False)
print(r.status_code)
cert
The path to a local SSL certificate:
# supply a local certificate (cert, key) pair
r = requests.get('https://www.12306.cn', cert=('/home/youdi/Download/**.crt', '/home/youdi/.ssh/**.key'))
print(r.status_code)
That wraps up requests. Apologies that the formatting isn't great. I'll keep updating this scraping series as time allows, and once it's finished I plan a series on data processing and plotting with Python.
Thanks for reading, and don't hold back on your likes or tips.