Overview
A brief introduction to the basic usage of two libraries: requests and BeautifulSoup.
Details
- requests
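requests is a library that sends HTTP requests from Python, much as a browser would.
- methods
Each HTTP request method has a matching function:
GET corresponds to requests.get()
POST corresponds to requests.post()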
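As a rough illustration of the two methods, the sketch below builds a GET and a POST request with requests.Request(...).prepare(), which constructs the request without sending it, so no network access is needed. The form field in the POST is a hypothetical placeholder, not something from the original post.

```python
import requests

url = "http://www.cnblogs.com/wupeiqi/p/9078770.html"

# Build the requests without sending them.
get_req = requests.Request("GET", url).prepare()
post_req = requests.Request(
    "POST",
    url,
    data={"key": "value"},  # hypothetical form field, for illustration only
).prepare()

# A GET carries no body; a POST carries the encoded form data.
print(get_req.method, get_req.body)
print(post_req.method, post_req.body)
```

In everyday use, requests.get(url) and requests.post(url, data=...) build and send the request in one step.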
- url
The address the HTTP request is sent to
import requests
url = 'http://www.cnblogs.com/wupeiqi/p/9078770.html'
requests.get(url=url)
- headers
The headers sent with the HTTP request. The keyword argument is headers, not header.
headers = {'Content-Type': 'image/jpeg'}
requests.get(url=url, headers=headers)
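One way to check which headers a request would carry is to prepare it without sending it. This is only a sketch: the User-Agent value is a made-up example, not a value from the original post.

```python
import requests

url = "http://www.cnblogs.com/wupeiqi/p/9078770.html"
headers = {"User-Agent": "my-crawler/0.1"}  # hypothetical value

# prepare() builds the request but does not send it.
prepared = requests.Request("GET", url, headers=headers).prepare()

# Header lookup on a prepared request is case-insensitive.
print(prepared.headers["user-agent"])
```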
- cookies
The cookies sent with the HTTP request. The keyword argument is cookies, not cookie.
cookies = {'_gid': 'GA1.2.1083957064.1531274683'}
requests.get(url=url, cookies=cookies)
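A cookies dict ends up as a single Cookie request header. The sketch below prepares the request without sending it and prints that header, reusing the cookie value from the example above.

```python
import requests

url = "http://www.cnblogs.com/wupeiqi/p/9078770.html"
cookies = {"_gid": "GA1.2.1083957064.1531274683"}

# prepare() builds the request but does not send it.
prepared = requests.Request("GET", url, cookies=cookies).prepare()

# The dict is serialized into the Cookie header as name=value pairs.
print(prepared.headers["Cookie"])
```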
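- Uploading files
Files are passed with the files argument and travel in the request body, so use POST rather than GET.
files = {'file': open('report.xls', 'rb')}
requests.post(url=url, files=files)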
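The sketch below shows what an upload looks like on the wire, under a few assumptions: the upload URL is a hypothetical placeholder, and an in-memory io.BytesIO object stands in for report.xls so the example runs without a real file. The request is prepared but never sent.

```python
import io

import requests

url = "http://example.com/upload"  # hypothetical upload endpoint

# An in-memory stand-in for open('report.xls', 'rb').
fake_report = io.BytesIO(b"dummy spreadsheet bytes")
files = {"file": ("report.xls", fake_report)}

# prepare() builds the multipart body without sending anything.
prepared = requests.Request("POST", url, files=files).prepare()

# requests sets a multipart/form-data Content-Type with a generated boundary.
print(prepared.headers["Content-Type"])
```

With a real file, opening it in a with block (with open('report.xls', 'rb') as f: ...) makes sure the handle is closed after the upload.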
- BeautifulSoup
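BeautifulSoup is a Python library for extracting data from HTML or XML files.
- Initialization
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_text, 'html.parser')  # html_text: the HTML text returned by the request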
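A minimal sketch of initialization, using a small inline HTML string in place of a real response so it runs offline. The markup here is invented for illustration; with a live request the text would typically come from the response's .text attribute.

```python
from bs4 import BeautifulSoup

# Stand-in for the HTML text returned by a request.
html_text = """
<html><head><title>Demo page</title></head>
<body><div id="post_next_prev"><a href="/p/1.html">Previous</a></div></body></html>
"""

# 'html.parser' is the parser bundled with Python; no extra install is needed.
soup = BeautifulSoup(html_text, "html.parser")

print(soup.title.string)
```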
- find
Find the div tag that holds the previous-post and next-post links
soup.find(name='div', id='post_next_prev')
- find_all
Find all a tags
soup.find_all('a')
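The difference between find and find_all can be sketched on a small invented snippet: find returns the first matching tag (or None when nothing matches), while find_all returns a list of every match. The HTML below is made up for illustration; only the post_next_prev id comes from the example above.

```python
from bs4 import BeautifulSoup

html_text = """
<div id="post_next_prev">
  <a href="/p/1.html">Previous post</a>
  <a href="/p/3.html">Next post</a>
</div>
<a href="/about.html">About</a>
"""
soup = BeautifulSoup(html_text, "html.parser")

# find: the first match only.
nav_div = soup.find(name="div", id="post_next_prev")

# find_all: every match in the whole document.
all_links = soup.find_all("a")

# Searching from a tag limits the search to that tag's descendants.
nav_links = nav_div.find_all("a")

# find returns None when nothing matches, so check before using the result.
missing = soup.find("table")

print(len(all_links), len(nav_links), missing)
```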
- get
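get reads an attribute from a single tag, so locate the tag first.
Get the link from an a tag inside the div
soup.find('a').get('href')
Get an image link
soup.find('img').get('src')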
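Putting find and get together, a sketch of pulling a link and an image address out of a div might look like the following. The HTML is invented for illustration. Note that get returns None for an attribute the tag does not have, rather than raising an error.

```python
from bs4 import BeautifulSoup

html_text = (
    '<div id="post_next_prev">'
    '<a href="/p/1.html">Previous</a>'
    '<img src="/img/logo.png">'
    "</div>"
)
soup = BeautifulSoup(html_text, "html.parser")

div = soup.find("div", id="post_next_prev")

# get is called on an individual tag and reads one attribute.
link = div.find("a").get("href")
image = div.find("img").get("src")

# A missing attribute yields None instead of an exception.
no_title = div.find("a").get("title")

print(link, image, no_title)
```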