1. Basics
- Use the pymongo library to connect Python to a MongoDB database
```python
import pymongo

client = pymongo.MongoClient('localhost', 27017)
walden = client['walden']          # database; MongoDB creates it on the first write
sheet_lines = walden['sheet_tag']  # collection (the "sheet") inside that database
```
- Use the find() method to query and display the data stored in the database
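The comparison operators used in find() filters correspond, respectively, to:
- `$lt`: `<` (less than)
- `$lte`: `<=` (less than or equal)
- `$gt`: `>` (greater than)
- `$gte`: `>=` (greater than or equal)
- `$ne`: `!=` (not equal)

Mnemonic: l = less, g = greater, e = equal, n = not.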
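To make the mapping concrete, here is a toy pure-Python evaluator that applies a filter such as `{'price': {'$gte': 500}}` to plain dicts, roughly the way find() applies it to documents. The `OPS` table and the `matches` function are hypothetical illustrations, not part of pymongo. The sketch assumes every document has the queried field and holds a comparable value; MongoDB itself has its own rules for missing fields and mixed types.

```python
import operator

# Map each MongoDB comparison operator to the equivalent Python comparison
OPS = {
    '$lt': operator.lt,   # <
    '$lte': operator.le,  # <=
    '$gt': operator.gt,   # >
    '$gte': operator.ge,  # >=
    '$ne': operator.ne,   # !=
}


def matches(doc, query):
    """Toy check of a filter like {'price': {'$gte': 500}} (illustration only).

    Assumes every queried field exists in doc; all conditions are ANDed.
    """
    for field, conditions in query.items():
        value = doc[field]
        for op_name, target in conditions.items():
            if not OPS[op_name](value, target):
                return False
    return True


docs = [{'title': 'A', 'price': 398.0},
        {'title': 'B', 'price': 500.0},
        {'title': 'C', 'price': 1200.0}]

print([d['title'] for d in docs if matches(d, {'price': {'$gte': 500}})])  # ['B', 'C']
print([d['title'] for d in docs if matches(d, {'price': {'$gt': 500}})])   # ['C']
```

Note the boundary: `$gte` keeps a listing priced exactly 500, while `$gt` drops it. Several operators inside one field's dict are combined with AND, which is how a range such as `{'price': {'$gt': 300, '$lte': 500}}` is written.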
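2. Practice
Scrape the listings on the first three search-result pages of Xiaozhu (bj.xiaozhu.com), store them in MongoDB, and pick out the listings priced above 500 RMB.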
The Code:
```python
import time

import pymongo
import requests
from bs4 import BeautifulSoup

client = pymongo.MongoClient('localhost', 27017)
walden = client['2_1homework']       # database
sheet_lines = walden['2_1homework']  # collection

# Search-result pages 1-3 for Beijing short-term rentals
urls = ['http://bj.xiaozhu.com/search-duanzufang-p{}-0/'.format(i) for i in range(1, 4)]


def get_details(url):
    """Scrape titles and prices from one result page and insert them into MongoDB."""
    wb_data = requests.get(url)
    soup = BeautifulSoup(wb_data.text, 'lxml')
    titles = soup.select('#page_list > ul > li > div.result_btm_con.lodgeunitname > div > a > span')
    prices = soup.select('#page_list > ul > li > div > span.result_price > i')
    # zip() pairs each title with its price and stops at the shorter list
    for index, (title, price) in enumerate(zip(titles, prices)):
        data = {
            'index': index,  # position of the listing on its page
            'title': title.get_text(),
            'price': float(price.get_text()),  # store as a number so $gt compares numerically
        }
        sheet_lines.insert_one(data)


def find_price():
    """Print the titles of all stored listings priced above 500 RMB."""
    for item in sheet_lines.find({'price': {'$gt': 500}}):
        print(item['title'])


for url_single in urls:
    get_details(url_single)
    time.sleep(2)  # pause between requests so the site is not hammered

# Query once, after all three pages are stored, so no listing is printed twice
find_price()
```
3. Summary and reflections
Points to note:
- How to insert data into the database
- How to build a dictionary for each record
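The dictionary-building step can be isolated as a minimal sketch. The helper name `build_records` is hypothetical, and the titles and prices below are made-up strings standing in for the text of scraped tags. Each dict becomes one MongoDB document; converting the price to a number matters because MongoDB does not compare a string such as "520" against the number 500 in a `$gt` filter.

```python
def build_records(titles, prices):
    """Pair title and price strings into one dict per listing (hypothetical helper)."""
    records = []
    for index, (title, price) in enumerate(zip(titles, prices)):
        records.append({
            'index': index,
            'title': title.strip(),
            'price': float(price),  # numeric, so $gt / $gte filters work as expected
        })
    return records


# Made-up sample values standing in for scraped text
titles = ['Room A ', 'Room B', 'Room C']
prices = ['398', '520', '1200']
records = build_records(titles, prices)
print(records[1])  # {'index': 1, 'title': 'Room B', 'price': 520.0}
# With a live collection, the whole list could be stored in one call:
# sheet_lines.insert_many(records)
```

Calling pymongo's insert_many() once per page is an alternative to calling insert_one() inside the loop; both store the same documents.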
Practice makes perfect!