The lxml library

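  from lxml import etree
  # Parse an HTML string (text holds the markup) into an element tree
  html = etree.HTML(text)
  # A file can be parsed as well. etree.HTML() only accepts markup,
  # so a file path has to go through etree.parse() with an HTML parser
  html = etree.parse('test.html', etree.HTMLParser())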

After that, XPath queries can be run on the result with html.xpath(...).
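
As a minimal sketch of how parsing and querying fit together, the example below parses a small HTML string and runs two XPath queries against it. The markup in `text`, the class names, and the variable names `links` and `labels` are hypothetical and chosen only for illustration.

```python
from lxml import etree

# Hypothetical sample markup, used only for illustration
text = """
<div>
  <ul>
    <li class="item-0"><a href="link1.html">first item</a></li>
    <li class="item-1"><a href="link2.html">second item</a></li>
  </ul>
</div>
"""

# etree.HTML() parses a markup string and returns the root element
html = etree.HTML(text)

# Select the href attribute of every <a> inside an <li>
links = html.xpath('//li/a/@href')

# Select the link text of the <li> whose class is "item-0"
labels = html.xpath('//li[@class="item-0"]/a/text()')

print(links)
print(labels)
```

For node-selecting expressions like these, `xpath()` returns a list, so a query that matches nothing gives an empty list rather than raising an error.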