Scraping photos of the female guests on 非誠勿擾 (If You Are the One) with a Python crawler

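I've been learning Python lately. To start, here are two sites I recommend:
http://www.runoob.com/python/python-tutorial.html (Python basics)
http://blog.csdn.net/pleasecallmewhy/article/details/8922826 (a blog series on crawlers)
If you already know another programming language, you can pick up crawling quickly. Once you have the basics down, find something you are interested in and scrape it, or look up crawler examples online and study whichever parts you don't understand. Now, on to the main topic: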

I scraped the female guests of episode 609 from this page:
http://tv.jstv.com/fcwr/episode/1489737583149.shtml

[Animation: 操作.gif]

First, click on a female guest, then click Inspect. You will see markup like this:

<img src="http://static.jstv.com/img/2017/3/17/20173171489738296512_18787.jpg" alt="1號女嘉賓—劉妍滟">

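From this you can write a regular expression that matches the image URL:

reg = r'src="(.+?\.jpg)"'

This matches the photos of both male and female guests. The regular expression that matches the names appears in the code further down.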
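To see what the image pattern and the name pattern from this post actually capture, here is a small check you can run offline. It is only a sketch: it assumes Python 3, and `sample_html` is assembled by hand from the snippets quoted in this post rather than fetched from the site.

```python
import re

# Hand-assembled sample built from the snippets quoted in this post (not fetched live)
sample_html = '''
<img src="http://static.jstv.com/img/2017/3/17/20173171489738296512_18787.jpg" alt="1號女嘉賓—劉妍滟">
<span>2號女嘉賓—許維君</span>
<span>第609期5號男嘉賓—翟旭龍</span>
'''

# Lazy match: capture everything inside src="..." up to the first .jpg
img_pattern = re.compile(r'src="(.+?\.jpg)"')
# Capture the text between "嘉賓—" and the closing </span>
name_pattern = re.compile(r'<span>.*?嘉賓—(.+?)</span>')

img_urls = img_pattern.findall(sample_html)
guest_names = name_pattern.findall(sample_html)

print(img_urls)
print(guest_names)
```

Because each pattern has a single capture group, `findall` returns plain lists of strings: one list of image URLs and one list of names.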
Here is the complete code:

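# coding=utf-8
# Python 3: urllib.request provides urlopen and urlretrieve
import re
import urllib.request


def getHtml(url):
    # Fetch the page and decode it to text so the regexes below can search it
    response = urllib.request.urlopen(url)
    page = response.read().decode('utf-8')
    return page


def getImg(html):
    # Every src="...jpg" on the page is treated as a guest photo
    reg = r'src="(.+?\.jpg)"'
    imgre = re.compile(reg)
    imglist = imgre.findall(html)
    names = getNames(html)
    # Pair photos with names by position; zip stops at the shorter list
    for imgurl, name in zip(imglist, names):
        print(imgurl)
        # Save each photo as <guest name>.jpg in the current directory
        urllib.request.urlretrieve(imgurl, '%s.jpg' % name)


# <span>2號女嘉賓—許維君</span>
# <span>第609期5號男嘉賓—翟旭龍</span>
def getNames(html):
    # Capture whatever follows "嘉賓—" inside each <span>
    reg = r'<span>.*?嘉賓—(.+?)</span>'
    namereg = re.compile(reg)
    names = namereg.findall(html)
    for name in names:
        print(name)
    return names


html = getHtml("http://tv.jstv.com/fcwr/episode/1489737583149.shtml")
getImg(html)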
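
The script above pairs photos with names by list position, which can go wrong if the page contains other .jpg images. One alternative (not what the script above does) is to read the URL and the name from the same <img> tag, since the inspected tag carries the name in its alt text. This is a minimal sketch, assuming Python 3 and that every guest photo has an alt attribute in the form shown earlier. `extract_guests` and the example.com decoy image are made up for illustration.

```python
import re

# Hypothetical pattern: URL from src, name from the alt text of the same tag.
# [^"] keeps each piece inside its own quoted attribute value.
GUEST_IMG = re.compile(r'<img src="([^"]+?\.jpg)" alt="[^"]*?嘉賓—([^"]+)"')


def extract_guests(html):
    """Return a list of (image_url, guest_name) tuples found in html."""
    return GUEST_IMG.findall(html)


# Hand-built sample: a made-up decoy image followed by the tag quoted in this post
sample_html = (
    '<img src="http://example.com/logo.jpg" alt="logo">'
    '<img src="http://static.jstv.com/img/2017/3/17/20173171489738296512_18787.jpg" '
    'alt="1號女嘉賓—劉妍滟">'
)

for url, name in extract_guests(sample_html):
    print(name, url)
```

The decoy image is skipped because its alt text has no "嘉賓—", so each URL that comes back is already attached to the right name.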

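If you have questions or ideas, feel free to get in touch. Let's keep talking and learn from each other. If you found this useful, a like is welcome too. Your approval is what keeps me working at this.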
