Installing the library
    pip install beautifulsoup4
Testing

    >>> import requests
    >>> r = requests.get("http://python123.io/ws/demo.html")
    >>> r.text
    >>> demo = r.text
    >>> from bs4 import BeautifulSoup
    >>> soup = BeautifulSoup(demo, "html.parser")
    >>> print(soup.prettify())
    <html>
     <head>
      <title>
       This is a python demo page
      </title>
     </head>
     <body>
      <p class="title">
       <b>
        The demo python introduces several python courses.
       </b>
      </p>
      <p class="course">
       Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
       <a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">
        Basic Python
       </a>
       and
       <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">
        Advanced Python
       </a>
       .
      </p>
     </body>
    </html>
General usage pattern
    from bs4 import BeautifulSoup
    soup = BeautifulSoup(demo, 'html.parser')
    print(soup.prettify())
Understanding the Beautiful Soup library

- Beautiful Soup is a library for parsing, traversing, and maintaining a "tag tree".
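
The three roles named above (parse, traverse, maintain) can be illustrated with a minimal sketch. The inline `html` string is a hypothetical stand-in for the demo page, so the example runs without network access.

```python
from bs4 import BeautifulSoup

# Hypothetical inline markup standing in for the downloaded demo page.
html = '<html><body><p class="title"><b>Hello</b></p><p class="course">Text</p></body></html>'

# Parse: turn the markup into a tag tree.
soup = BeautifulSoup(html, "html.parser")

# Traverse: visit every tag in document order.
tag_names = [tag.name for tag in soup.find_all(True)]
print(tag_names)

# Maintain: modify the tree in place, then serialize it again.
soup.b.string = "Hi"
soup.p["class"] = "heading"
print(str(soup))
```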
Importing the library
Understanding the BeautifulSoup class
Beautiful Soup parsers
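
The second argument to `BeautifulSoup` names the parser. `"html.parser"` ships with the Python standard library, while `"lxml"` and `"html5lib"` are third-party parsers that must be installed separately. The sketch below tries a preferred parser and falls back to the built-in one. `make_soup` is a hypothetical helper, not part of bs4.

```python
from bs4 import BeautifulSoup, FeatureNotFound

def make_soup(markup, preferred=("lxml", "html.parser")):
    """Hypothetical helper: build a soup with the first installed parser."""
    for parser in preferred:
        try:
            return BeautifulSoup(markup, parser)
        except FeatureNotFound:
            continue  # this parser is not installed; try the next one
    raise RuntimeError("no usable parser found")

soup = make_soup("<p>Hello <b>parser</b></p>")
print(soup.b.string)
```

Different parsers may repair incomplete markup differently, so it is usually safer to name one parser explicitly, as the notes do with `"html.parser"`.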

Basic elements of Beautiful Soup
Tag
    >>> from bs4 import BeautifulSoup
    >>> soup = BeautifulSoup(demo, 'html.parser')
    >>> soup.title
    <title>This is a python demo page</title>
    >>> tag = soup.a
    >>> tag  # returns the first <a> tag
    <a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a>
Tag name
    >>> soup.a.name
    'a'
    >>> soup.a.parent.name
    'p'
    >>> soup.a.parent.parent.name
    'body'
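
Chaining `.parent` by hand gets tedious for deep trees. As a small sketch, the `.parents` iterator walks every ancestor of a tag in one pass. The inline `html` string is a hypothetical reduced version of the demo page.

```python
from bs4 import BeautifulSoup

# Hypothetical reduced version of the demo page.
html = '<html><body><p class="course">See <a id="link1" href="#">Basic Python</a></p></body></html>'
soup = BeautifulSoup(html, "html.parser")

# .parents yields each ancestor in turn, from the nearest one outward.
ancestor_names = [parent.name for parent in soup.a.parents]
print(ancestor_names)  # ['p', 'body', 'html', '[document]']
```

The last entry, `[document]`, is the name of the `BeautifulSoup` object itself, which sits at the root of the tree.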
Tag attrs (attributes)
    >>> tag = soup.a
    >>> tag.attrs
    {'href': 'http://www.icourse163.org/course/BIT-268001', 'class': ['py1'], 'id': 'link1'}
    >>> tag.attrs['class']
    ['py1']
    >>> tag.attrs['href']
    'http://www.icourse163.org/course/BIT-268001'
    >>> type(tag)
    bs4.element.Tag
    >>> type(tag.attrs)
    dict
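
Besides going through `tag.attrs`, a tag can be indexed like a dictionary directly. The sketch below shows the shorthand, plus `get` and `has_attr` for attributes that may be missing. The second class value `external` is a hypothetical addition, included to show why `class` comes back as a list.

```python
from bs4 import BeautifulSoup

# The extra "external" class is hypothetical; the demo page only has "py1".
html = '<a class="py1 external" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a>'
soup = BeautifulSoup(html, "html.parser")
tag = soup.a

href = tag["href"]            # shorthand for tag.attrs["href"]
classes = tag["class"]        # multi-valued attribute, returned as a list
title = tag.get("title")      # missing attribute: returns None instead of raising
has_id = tag.has_attr("id")   # membership test without fetching the value

print(href, classes, title, has_id)
```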
Tag NavigableString
    >>> soup.p
    <p class="title"><b>The demo python introduces several python courses.</b></p>
    >>> soup.p.string
    'The demo python introduces several python courses.'
    >>> type(soup.p.string)
    bs4.element.NavigableString
    >>> newsoup = BeautifulSoup("<b><!--This is a comment--></b><p>This is a comment</p>", "html.parser")
    >>> newsoup.b.string
    'This is a comment'
    >>> type(newsoup.b.string)
    bs4.element.Comment
    >>> newsoup.p.string
    'This is a comment'
    >>> type(newsoup.p.string)
    bs4.element.NavigableString
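
Because `.string` drops the `<!-- -->` markers, a comment and ordinary text can look identical, and only the type tells them apart. A minimal sketch of filtering comments out follows. `visible_text` is a hypothetical helper, not a bs4 function.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment

newsoup = BeautifulSoup(
    "<b><!--This is a comment--></b><p>This is a comment</p>", "html.parser"
)

def visible_text(tag):
    """Hypothetical helper: return tag.string as str, or None for comments."""
    s = tag.string
    # Comment is a subclass of NavigableString, so test for Comment explicitly.
    if s is None or isinstance(s, Comment):
        return None
    return str(s)

print(visible_text(newsoup.b))  # None
print(visible_text(newsoup.p))  # This is a comment
```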