Parsing HTML with lxml

from lxml import etree
text='''
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
'''

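html = etree.HTML(text)
# Read from a file instead; etree.parse uses an XML parser by default, so pass an HTMLParser for HTML
#html = etree.parse('test.html', etree.HTMLParser())
result = etree.tostring(html, encoding='unicode')
print(result)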

Output (lxml fills in the missing closing </body> and </html> tags):

<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
</body></html>
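A minimal sketch of the two entry points used above, assuming an io.StringIO object stands in for test.html so the example runs on its own. etree.HTML parses a string, while etree.parse reads a filename or file-like object and, when given etree.HTMLParser(), is just as lenient with broken markup. The fragment is a made-up example that omits html, body and the closing p tag, to show the tag completion on a smaller input.

```python
import io
from lxml import etree

# Hypothetical fragment: no <html>/<body> wrapper and an unclosed <p>
fragment = "<p class='title'>Hello"

# etree.HTML parses a string with the lenient HTML parser and returns the root <html> element
root = etree.HTML(fragment)
print(etree.tostring(root, encoding="unicode"))

# etree.parse accepts a filename or file-like object; io.StringIO stands in for 'test.html'
doc = etree.parse(io.StringIO(fragment), etree.HTMLParser())
# parse returns an ElementTree, so take its root element before serializing
print(etree.tostring(doc.getroot(), encoding="unicode"))
```

Both calls should serialize to the same completed document, with the attribute quotes normalized to double quotes.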

Get the a tags and their href attributes

print(html.xpath('//a'))
#[<Element a at 0x10bdc0cb0>, <Element a at 0x10bdc0c68>, <Element a at 0x10bdc0b90>]
print(html.xpath('//a/@href'))
#['http://example.com/elsie', 'http://example.com/lacie', 'http://example.com/tillie']
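Other attributes such as class are selected the same way as href. The sketch below, which uses an abbreviated copy of the sample document, also shows text() for link text and an [@id=...] predicate for picking out a single element. Note that the first link contains only a comment, so it contributes no text node and its .text is None.

```python
from lxml import etree

# Abbreviated copy of the sample document above
text = '''
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
</p>
'''
html = etree.HTML(text)

# @class selects attribute values, exactly like @href
classes = html.xpath('//a/@class')
# text() returns only direct text children; the first link holds just a comment, so it yields nothing
names = html.xpath('//a/text()')
# A predicate narrows the match to the element whose id is link2
lacie = html.xpath('//a[@id="link2"]')[0]
# Pair each link's href with its text using Element.get and Element.text
links = [(a.get('href'), a.text) for a in html.xpath('//a[@class="sister"]')]

print(classes)
print(names)
print(lacie.text, lacie.get('href'))
print(links)
```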
