After a long wait I finally got three days off, so I stayed home and spent them coding.

In this lesson I mainly learned two things: 1. fetching web pages with urllib; 2. using regular expressions to pull the content I need out of a page.

The first exercise went fairly smoothly. The code is as follows:
'''
import re
import urllib.request

page = urllib.request.urlopen('https://tieba.baidu.com/p/3205263090')

# Read the response only once: a second page.read() returns empty bytes.
html = page.read().decode('UTF-8')
print(html)

# Attribute layout being targeted in the page: src="....jpg" pic_ext="jpeg"
reg = r'src="(\S+?\.jpg)" pic_ext="jpeg"'
imgurls = re.findall(reg, html)  # match all image URLs

x = 1

# Iterate over the matched URLs and save each image
for imgurl in imgurls:
    print(imgurl)
    urllib.request.urlretrieve(imgurl, "E:/pycharmproject/pachongtest/%s.jpg" % x)
    print("downloading pic %d" % x)
    x += 1
'''
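The extraction step can be tried offline before touching the network. Below is a minimal sketch, assuming a hypothetical `SAMPLE_HTML` fragment shaped like the `src="....jpg" pic_ext="jpeg"` attributes mentioned above, and a lazy `\S+?\.jpg` pattern as one reasonable way to capture the URL. The helper names `extract_image_urls` and `numbered_targets` and the `pics` folder are made up for illustration.

```python
import re

# Hypothetical HTML fragment shaped like the image tags; not real page data.
SAMPLE_HTML = (
    '<img class="BDE_Image" src="https://example.com/a/001.jpg" pic_ext="jpeg">'
    '<img class="BDE_Image" src="https://example.com/a/002.jpg" pic_ext="jpeg">'
    '<img src="https://example.com/static/logo.png">'
)

# \S+? matches non-whitespace characters lazily, so the capture ends at the
# first ".jpg" that is followed by the closing quote and pic_ext="jpeg".
IMG_PATTERN = re.compile(r'src="(\S+?\.jpg)" pic_ext="jpeg"')


def extract_image_urls(html):
    """Return every captured image URL, in page order."""
    return IMG_PATTERN.findall(html)


def numbered_targets(urls, folder):
    """Pair each URL with a numbered .jpg path, counting from 1."""
    return [(url, "%s/%d.jpg" % (folder, i)) for i, url in enumerate(urls, start=1)]


urls = extract_image_urls(SAMPLE_HTML)
targets = numbered_targets(urls, "pics")
for url, path in targets:
    print(url, "->", path)
```

Using `enumerate(..., start=1)` replaces the hand-maintained counter `x`, and testing the pattern on a fixed string makes it easier to tell a regex problem apart from a download problem.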

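The second exercise gave me trouble. The run ended with a "URL error", and when I printed the URL it looked like this:
"//hbimg.b0.upaiyun.com/654953460733026a7ef6e101404055627ad51784a95c-B6OFs4_sq75sf". I tried changing the regular expression, but the URLs still came out the same way. In the end I had no better idea than to manually prepend http: to every URL I extracted, and after that all the images could be saved. The code is as follows: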
'''
import re
import urllib.request

page = urllib.request.urlopen('http://huaban.com/pins/1120072731/')
html = page.read().decode('UTF-8')
print(html)

# Capture the src attribute of every <img> tag
reg = r'<img src="(\S+?)"'
imgurls = re.findall(reg, html)  # match all image URLs

x = 1

# Iterate over the matched URLs and save each image
for imgurl in imgurls:
    # The extracted URLs have no scheme ("//host/path"), so prepend it by hand
    imgurl = 'http:' + imgurl
    print(imgurl)
    urllib.request.urlretrieve(imgurl, "E:/pycharmproject/pachongtest/%s.jpg" % x)
    print("downloading pic %d" % x)
    x += 1

'''
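A URL that starts with `//` is scheme-relative: it names the host but leaves the scheme to be taken from the page it appeared on, which is why urllib rejects it as-is. Instead of concatenating `http:` by hand, the standard library's `urllib.parse.urljoin` can resolve such links against the page URL. A minimal sketch, where the helper name `absolutize` is hypothetical and the page URL is the one used in the exercise:

```python
from urllib.parse import urljoin

# Page the links were found on (taken from the exercise above).
PAGE_URL = 'http://huaban.com/pins/1120072731/'


def absolutize(link, base=PAGE_URL):
    """Resolve a link found in the page against the page's own URL."""
    return urljoin(base, link)


# The scheme-relative URL printed during the exercise
raw = '//hbimg.b0.upaiyun.com/654953460733026a7ef6e101404055627ad51784a95c-B6OFs4_sq75sf'
print(absolutize(raw))

# Links that are already absolute pass through unchanged
print(absolutize('https://example.com/x.jpg'))
```

The advantage over string concatenation is that the same call also handles absolute links and path-relative links, so the loop does not need to know which kind of URL the page returned.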

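One more thing: I started writing this on OSX with Python 2.7, and over the last two days I moved to a desktop machine with Python 3.6 installed. There turn out to be quite a few differences in syntax and other details, so code written on one side will not necessarily run correctly on the other.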
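One difference that matters for this exercise is where the urllib functions live: in Python 3 `urlopen` and `urlretrieve` are in `urllib.request`, while in Python 2 `urlopen` is in `urllib2` and `urlretrieve` is in `urllib`. A minimal sketch of an import fallback that lets the same script start on both versions; the helper `decode_page` is hypothetical, and this covers only the imports and decoding, not every 2-versus-3 difference:

```python
try:
    # Python 3: both functions live in urllib.request
    from urllib.request import urlopen, urlretrieve
except ImportError:
    # Python 2: urlopen is in urllib2, urlretrieve is in urllib
    from urllib2 import urlopen
    from urllib import urlretrieve


def decode_page(raw_bytes, encoding='utf-8'):
    """Turn the bytes returned by read() into text on either version."""
    return raw_bytes.decode(encoding)


# print(...) with a single argument behaves the same on 2.7 and 3.x
print(decode_page(b'<img src="a.jpg">'))
```

Decoding explicitly matters more on Python 3, where `re.findall` with a text pattern refuses to search the raw bytes that `read()` returns.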
