BeautifulSoup: A Collection of Common Usage

Official documentation

 htmlDoc = """
 <html><head><title>The Dormouse's story</title></head>
 <body>
 <p class="title"><b>The Dormouse's story</b></p>
 <p class="story">Once upon a time there were three little sisters; and their names were
 <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
 <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
 <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;and they lived at the bottom of a well.</p>
 <p class="story">...</p>"""

Importing the library

from bs4 import BeautifulSoup  # Python 3

# Name the parser explicitly; otherwise bs4 picks one itself and emits a warning.
soup = BeautifulSoup(htmlDoc, 'html.parser')

Finding elements

The two lines below are equivalent and return the same result.

 ps1 = soup('p')  # returns all <p> tags
 ps2 = soup.find_all('p')
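
As a minimal sketch of the equivalence described above, the snippet below parses a small made-up document and checks that calling the soup object directly gives the same result as `find_all`. The `html_doc` string and the variable names are assumptions chosen for illustration; the `class_` filter at the end is an extra that the original text does not cover.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, used only for illustration.
html_doc = """
<html><body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time...</p>
<p class="story">...</p>
</body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Calling the soup object is shorthand for find_all().
by_call = soup("p")
by_find_all = soup.find_all("p")

print(len(by_call))            # number of <p> tags found
print(by_call == by_find_all)  # both forms return the same tags

# find_all() also accepts attribute filters, e.g. class_ for the CSS class.
stories = soup.find_all("p", class_="story")
print([p.get_text() for p in stories])
```

Both forms return a list-like result, so it can be indexed, sliced, or looped over like an ordinary list.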

Structured output

1. Text only
 print(soup.get_text())
 # The Dormouse's story
 #
 # The Dormouse's story
 #
 # Once upon a time there were three little sisters; and their names were
 # Elsie,
 # Lacie and
 # Tillie;and they lived at the bottom of a well.
 #
 # ...
2. Link href values
for link in soup.find_all('a'): 
    print(link.get('href')) 
    # http://example.com/elsie 
    # http://example.com/lacie 
    # http://example.com/tillie
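
The two output styles above can be combined into a short sketch. It assumes a made-up `html_doc` in which one link deliberately has no `href`, to show that `Tag.get()` returns `None` for a missing attribute. The `separator` and `strip` arguments of `get_text()` are optional extras not used in the original examples.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document; the second link has no href on purpose.
html_doc = """
<p class="story">Names:
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# strip=True trims whitespace around each text fragment and drops empty ones;
# the separator is placed between the remaining fragments.
text = soup.get_text(separator=" ", strip=True)
print(text)

# Tag.get() returns None when the attribute is missing, so filter those out.
hrefs = [a.get("href") for a in soup.find_all("a") if a.get("href")]
print(hrefs)
```

Filtering on the result of `get()` keeps `None` values out of the list, which matters when the URLs are passed on to other code.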
