Python Hands-On Program study notes: scraping listings from a rental website

Learning to scrape a rental website with Python: each listing's rent, address, the landlord's nickname and gender, and the house photos.


My code:

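```python
import time

import bs4
import requests

# Browser-like User-Agent so the site serves its normal pages
heads = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"
}

# Search-result pages 1 to 11 of Shanghai short-term rentals
house_list_urls = ["http://sh.xiaozhu.com/search-duanzufang-p{}-0/".format(i) for i in range(1, 12)]


def get_house_info(url):
    """Fetch one listing's detail page and print its key fields."""
    response = requests.get(url, headers=heads, timeout=10)
    time.sleep(2)  # pause between detail pages so we don't hammer the server
    soup = bs4.BeautifulSoup(response.text, "lxml")

    title = soup.select("div.pho_info > h4 > em")[0].get_text()
    address = soup.select("div.pho_info > p")[0].get("title")
    price = soup.select("div.day_l > span")[0].get_text()
    avatar = soup.select("div.member_pic > a > img")[0].get("src")
    # The gender icon's class encodes the landlord's sex: "member_ico" means male
    sex = soup.select("div.member_pic > div")[0].get("class")[0]
    sex = "male" if sex == "member_ico" else "female"
    lord = soup.select("a.lorder_name")[0].get_text()

    print(title, address, price, avatar, sex, lord)


def get_houses(url):
    """Collect every listing link on one search-result page and visit each."""
    response = requests.get(url, headers=heads, timeout=10)
    soup = bs4.BeautifulSoup(response.text, "lxml")
    # Each thumbnail <img> sits inside the <a> that links to the listing
    house_list = [img.parent.get("href") for img in soup.select("img.lodgeunitpic")]
    for house_url in house_list:
        get_house_info(house_url)


if __name__ == "__main__":
    for list_url in house_list_urls:
        get_houses(list_url)
```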
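The site's markup may have changed since these notes were written. You can check the selectors without any network access by running them against a small hand-written HTML fragment. In the sketch below, `SAMPLE_LIST_HTML`, `SAMPLE_DETAIL_HTML`, `parse_house_links` and `parse_house_info` are hypothetical. The fragments only mimic the class names the scraper expects and are not copied from xiaozhu.com. The sketch uses the built-in `"html.parser"` so lxml is not required.

```python
import bs4

# Hypothetical HTML fragments: they only mimic the class names the scraper's
# selectors expect and are NOT copied from the real xiaozhu.com pages.
SAMPLE_LIST_HTML = """
<a href="http://www.example.com/listing/1.html"><img class="lodgeunitpic" src="1.jpg"></a>
<a href="http://www.example.com/listing/2.html"><img class="lodgeunitpic" src="2.jpg"></a>
"""

SAMPLE_DETAIL_HTML = """
<div class="pho_info">
  <h4><em>Sunny studio near People's Square</em></h4>
  <p title="Huangpu District, Shanghai"></p>
</div>
<div class="day_l"><span>268</span></div>
<div class="member_pic">
  <a href="#"><img src="http://www.example.com/avatar.jpg"></a>
  <div class="member_ico1"></div>
</div>
<a class="lorder_name">Alice</a>
"""


def parse_house_links(html):
    """Hypothetical helper: listing URLs found on a search-result page."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    # Each thumbnail <img> is wrapped in the <a> that links to the listing
    return [img.parent.get("href") for img in soup.select("img.lodgeunitpic")]


def parse_house_info(html):
    """Hypothetical helper: same fields as get_house_info, returned as a dict."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    sex_class = soup.select("div.member_pic > div")[0].get("class")[0]
    return {
        "title": soup.select("div.pho_info > h4 > em")[0].get_text(),
        "address": soup.select("div.pho_info > p")[0].get("title"),
        "price": soup.select("div.day_l > span")[0].get_text(),
        "avatar": soup.select("div.member_pic > a > img")[0].get("src"),
        "sex": "male" if sex_class == "member_ico" else "female",
        "lord": soup.select("a.lorder_name")[0].get_text(),
    }


links = parse_house_links(SAMPLE_LIST_HTML)
info = parse_house_info(SAMPLE_DETAIL_HTML)
print(links)
print(info)
```

Splitting parsing from fetching is a design choice. `get_houses` and `get_house_info` could pass `response.text` to helpers like these. That keeps the network code thin and lets you test the selectors offline.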

Summary:

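  • select() always returns a list, even when only one element matches
  • requests.get(url, headers=xxx): note the "s" at the end of headers
  • tag.get("class") also returns a list, because class is a multi-valued attribute
  • To get the listing links from the search-result page, first locate the img tags, then use their parent attribute to reach the enclosing a tag
  • In bs4.BeautifulSoup(response.text, 'lxml'), don't forget the .text attribute: BeautifulSoup needs the page markup (response.text, or the raw bytes in response.content), not the Response object itself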
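The list-versus-single-element points above are easy to confirm on a tiny document. The HTML below is made up for illustration. The sketch uses the built-in `"html.parser"` and also shows `select_one()`, which returns the first match (or `None`) instead of a list.

```python
import bs4

# Made-up markup for illustration only
html = """
<div class="member_pic">
  <div class="member_ico"></div>
</div>
<p class="price big">268</p>
"""
soup = bs4.BeautifulSoup(html, "html.parser")

# select() returns a list (newer bs4 versions use a list subclass) even for one match
matches = soup.select("div.member_pic > div")
print(isinstance(matches, list), len(matches))  # True 1

# select_one() returns the first matching tag directly, or None if nothing matches
icon = soup.select_one("div.member_pic > div")
missing = soup.select_one("div.does_not_exist")

# "class" is multi-valued, so get("class") returns a list of class names
print(icon.get("class"))                          # ['member_ico']
print(soup.select_one("p.price").get("class"))   # ['price', 'big']

# BeautifulSoup accepts markup as str (response.text) or bytes (response.content)
from_bytes = bs4.BeautifulSoup(b"<p>hi</p>", "html.parser")
print(from_bytes.p.get_text())  # hi
```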

Question:

  • Why can't the scraped image links be opened? They were clearly taken from the image URLs in the page source.
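One plausible explanation, unverified for this site, is hotlink protection. Many image hosts refuse requests whose `Referer` header does not point at their own pages. An image can therefore load inside the listing page but fail when its URL is opened on its own. Whether xiaozhu.com checks `Referer` is an assumption. The URLs and the `build_image_request` helper below are hypothetical. The sketch only prepares a request that carries the listing page as `Referer`, so it runs without network access.

```python
import requests

# Hypothetical URLs for illustration only
PAGE_URL = "http://www.example.com/listing/123.html"
IMG_URL = "http://image.example.com/avatar.jpg"


def build_image_request(img_url, page_url, user_agent):
    """Hypothetical helper: prepare (but do not send) a GET for an image,
    passing the page it appeared on as the Referer, like a browser would."""
    headers = {"User-Agent": user_agent, "Referer": page_url}
    return requests.Request("GET", img_url, headers=headers).prepare()


prepared = build_image_request(IMG_URL, PAGE_URL, "Mozilla/5.0")
print(prepared.method, prepared.url)
print(prepared.headers["Referer"])

# To actually download the image (network access required):
# with requests.Session() as session:
#     resp = session.send(prepared, timeout=10)
#     with open("avatar.jpg", "wb") as f:
#         f.write(resp.content)
```

If the download succeeds with `Referer` set but fails without it, hotlink protection is the likely cause.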
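© Copyright belongs to the author. For reposting or content collaboration, please contact the author.
Platform statement: The content of this article (including any images or videos) was uploaded and published by the author and represents only the author's own views. Jianshu is an information-publishing platform and provides only information storage services.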
