Basic format
Downward traversal of the tag tree
>>> soup = BeautifulSoup(demo, 'html.parser')
>>> soup.head
<head><title>This is a python demo page</title></head>
>>> soup.head.contents
[<title>This is a python demo page</title>]
>>> soup.body.contents
['\n',
<p class="title"><b>The demo python introduces several python courses.</b></p>,
'\n',
<p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a> and <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">Advanced Python</a>.</p>,
'\n']
>>> len(soup.body.contents)  # number of direct child nodes of body (text nodes included)
5
>>> soup.body.contents[1]
<p class="title"><b>The demo python introduces several python courses.</b></p>
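The count of 5 above includes the three '\n' text nodes, not only the two <p> tags. A minimal sketch of separating tags from whitespace strings, assuming a small hypothetical snippet (`html`) in place of `demo`:

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical stand-in for `demo`: two paragraphs separated by newlines.
html = "<body>\n<p>one</p>\n<p>two</p>\n</body>"
soup = BeautifulSoup(html, "html.parser")

# .contents is a plain list, so len() and indexing work on it directly.
all_nodes = soup.body.contents

# Keep only Tag objects, dropping the whitespace-only text nodes.
tag_nodes = [node for node in all_nodes if isinstance(node, Tag)]

print(len(all_nodes))   # tags and text nodes together
print(len(tag_nodes))   # tags only
```

Filtering with isinstance is one option among several; the right choice depends on whether the text nodes matter for the task.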
Iterate over child nodes (.children yields direct children only; use .descendants for all descendants)
for child in soup.body.children:
    print(child)
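The difference between .children and .descendants is easy to miss, so here is a small side-by-side sketch. The `html` string is a hypothetical miniature document, not the `demo` page used in these notes:

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical miniature document used only for illustration.
html = '<html><body><p class="title"><b>Bold</b></p><p>Plain</p></body></html>'
soup = BeautifulSoup(html, "html.parser")

# .children iterates over the direct children of <body> only.
child_names = [child.name for child in soup.body.children if isinstance(child, Tag)]

# .descendants walks the whole subtree depth-first, so the nested <b> shows up too.
descendant_names = [node.name for node in soup.body.descendants if isinstance(node, Tag)]

print(child_names)
print(descendant_names)
```

Both attributes are iterators rather than lists, so they are consumed in a loop or comprehension instead of being indexed.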
Upward traversal of the tag tree
>>> soup = BeautifulSoup(demo, 'html.parser')
>>> soup.title.parent
<head><title>This is a python demo page</title></head>
>>> soup.html.parent
<html><head><title>This is a python demo page</title></head>
<body>
<p class="title"><b>The demo python introduces several python courses.</b></p>
<p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a> and <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">Advanced Python</a>.</p>
</body></html>
>>> soup.parent
# no output (soup.parent is None)
Print all ancestors of a tag
>>> soup = BeautifulSoup(demo, 'html.parser')
>>> for parent in soup.a.parents:
...     if parent is None:
...         print(parent)
...     else:
...         print(parent.name)
...
p
body
html
[document]
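The last name printed, [document], is the name of the BeautifulSoup object itself, which is where .parents stops. A minimal sketch that collects the ancestor chain into a list; `ancestor_names` is a hypothetical helper and `html` is a made-up snippet rather than `demo`:

```python
from bs4 import BeautifulSoup

def ancestor_names(tag):
    """Return the names of all ancestors of `tag`, nearest first."""
    # .parents ends at the BeautifulSoup object, so no None check is needed here.
    return [parent.name for parent in tag.parents]

# Hypothetical snippet standing in for `demo`.
html = '<html><body><p><a href="#">link</a></p></body></html>'
soup = BeautifulSoup(html, "html.parser")

names = ancestor_names(soup.a)
print(names)
```

Returning a list instead of printing makes the chain easier to reuse, for example to check whether a tag sits inside a particular container.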
Sibling (parallel) traversal of the tag tree
Sibling traversal only applies to nodes that share the same parent.
>>> soup = BeautifulSoup(demo, 'html.parser')
>>> soup.a.next_sibling
' and '
>>> soup.a.next_sibling.next_sibling
<a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">Advanced Python</a>
>>> soup.a.previous_sibling.previous_sibling
# no output (the result is None)
>>> soup.a.parent  # the enclosing p tag
<p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a> and <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">Advanced Python</a>.</p>
Iterate over following siblings
for sibling in soup.a.next_siblings:
    print(sibling)
Iterate over preceding siblings
for sibling in soup.a.previous_siblings:
    print(sibling)
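As the ' and ' result earlier shows, siblings can be text nodes as well as tags. A short sketch of both cases, assuming a hypothetical one-paragraph `html` string in place of `demo`:

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical paragraph modelled loosely on the demo page's course links.
html = '<p>Intro <a id="link1">Basic</a> and <a id="link2">Advanced</a>.</p>'
soup = BeautifulSoup(html, "html.parser")
first_link = soup.a

# All following siblings, text nodes included.
following = [str(node) for node in first_link.next_siblings]

# Only the following siblings that are tags.
following_tags = [node for node in first_link.next_siblings if isinstance(node, Tag)]

# find_next_sibling skips text nodes and returns the first matching tag.
next_link = first_link.find_next_sibling("a")

# Preceding siblings are yielded nearest first.
preceding = [str(node) for node in first_link.previous_siblings]

print(following)
print(next_link["id"])
print(preceding)
```

When only tags are of interest, find_next_sibling and find_previous_sibling avoid the manual isinstance filter.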