Basic format


Downward traversal of the tag tree


  >>> from bs4 import BeautifulSoup
  >>> soup = BeautifulSoup(demo, 'html.parser')
  >>> soup.head
  <head><title>This is a python demo page</title></head>
  >>> soup.head.contents  # list of the direct children of <head>
  [<title>This is a python demo page</title>]
  >>> soup.body.contents
  ['\n',
  <p class="title"><b>The demo python introduces several python courses.</b></p>,
  '\n',
  <p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
  <a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a> and <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">Advanced Python</a>.</p>,
  '\n']
  >>> len(soup.body.contents)  # number of direct children of <body>
  5
  >>> soup.body.contents[1]
  <p class="title"><b>The demo python introduces several python courses.</b></p>

  • Iterate over the child nodes (use .descendants instead of .children to visit every descendant)

    for child in soup.body.children:
        print(child)
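
The difference between `.children` and `.descendants` can be sketched as follows. The `demo` string here is an assumed, shortened stand-in for the demo page used in these notes (no newlines between tags, so there are no `'\n'` text nodes), and `child_names` / `descendant_names` are illustrative names only.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Assumed stand-in for the `demo` page; shortened for illustration.
demo = (
    '<html><head><title>This is a python demo page</title></head>'
    '<body><p class="title"><b>The demo python introduces several python courses.</b></p>'
    '<p class="course">Courses: <a id="link1">Basic Python</a> and '
    '<a id="link2">Advanced Python</a>.</p></body></html>'
)
soup = BeautifulSoup(demo, 'html.parser')

# .children yields only the direct children of <body>.
child_names = [c.name for c in soup.body.children if isinstance(c, Tag)]

# .descendants walks the whole subtree below <body> in document order.
descendant_names = [d.name for d in soup.body.descendants if isinstance(d, Tag)]

print(child_names)       # ['p', 'p']
print(descendant_names)  # ['p', 'b', 'p', 'a', 'a']
```

Both iterators also yield text nodes (`NavigableString`), which is why the sketch filters on `Tag` before reading `.name`.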

Upward traversal of the tag tree


    >>> soup = BeautifulSoup(demo, 'html.parser')
    >>> soup.title.parent
    <head><title>This is a python demo page</title></head>
    >>> soup.html.parent  # the parent of <html> is the soup object itself
    <html><head><title>This is a python demo page</title></head>
    <body>
    <p class="title"><b>The demo python introduces several python courses.</b></p>
    <p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
    <a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a> and <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">Advanced Python</a>.</p>
    </body></html>
    >>> soup.parent
    # no output: soup.parent is None
  • Print the ancestors of a tag

    >>> soup = BeautifulSoup(demo, 'html.parser')
    >>> for parent in soup.a.parents:
    ...     if parent is None:
    ...         print(parent)
    ...     else:
    ...         print(parent.name)
    ...
    p
    body
    html
    [document]
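
As a compact variant of the loop above, the ancestor chain can be collected into a list. This is a minimal sketch: the `demo` string is an assumed, shortened stand-in for the demo page, and `ancestor_names` is an illustrative name.

```python
from bs4 import BeautifulSoup

# Assumed stand-in for the `demo` page; shortened for illustration.
demo = (
    '<html><head><title>This is a python demo page</title></head>'
    '<body><p class="course">Courses: <a id="link1">Basic Python</a> and '
    '<a id="link2">Advanced Python</a>.</p></body></html>'
)
soup = BeautifulSoup(demo, 'html.parser')

# .parents walks from the first <a> tag up to the BeautifulSoup object,
# whose .name is the string '[document]'.
ancestor_names = [parent.name for parent in soup.a.parents]
print(ancestor_names)  # ['p', 'body', 'html', '[document]']

# The BeautifulSoup object is the root of the tree, so it has no parent.
root_has_no_parent = soup.parent is None
print(root_has_no_parent)  # True
```

The chain ends at the `BeautifulSoup` object rather than at `None`, so the `parent is None` branch in the loop above is a defensive check.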

Sibling traversal of the tag tree


  • Sibling traversal only works between nodes that share the same parent


  >>> soup = BeautifulSoup(demo, 'html.parser')
  >>> soup.a.next_sibling  # a sibling can be a text node, not only a tag
  ' and '
  >>> soup.a.next_sibling.next_sibling
  <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">Advanced Python</a>
  >>> soup.a.previous_sibling.previous_sibling
  # no output: the result is None
  >>> soup.a.parent  # the <p> tag
  <p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
  <a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a> and <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">Advanced Python</a>.</p>
  • Iterate over the following siblings

    for sibling in soup.a.next_siblings:
        print(sibling)
  • Iterate over the preceding siblings

    for sibling in soup.a.previous_siblings:
        print(sibling)
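
Because sibling iteration returns text nodes as well as tags, it is often useful to skip the text. The sketch below shows one way to do that. The `demo` string is an assumed, shortened stand-in for the demo page, and `first_link`, `following`, `preceding` and `next_link` are illustrative names.

```python
from bs4 import BeautifulSoup

# Assumed stand-in for the `demo` page; shortened for illustration.
demo = (
    '<html><body><p class="course">Courses: <a id="link1">Basic Python</a> and '
    '<a id="link2">Advanced Python</a>.</p></body></html>'
)
soup = BeautifulSoup(demo, 'html.parser')
first_link = soup.a

# Everything after the first <a> inside the same <p>, text nodes included.
following = [str(s) for s in first_link.next_siblings]
print(following)  # [' and ', '<a id="link2">Advanced Python</a>', '.']

# Everything before it inside the same <p>.
preceding = [str(s) for s in first_link.previous_siblings]
print(preceding)  # ['Courses: ']

# find_next_sibling('a') skips text nodes and returns the next <a> tag.
next_link = first_link.find_next_sibling('a')
print(next_link['id'])  # link2
```

Both iterators stay inside the parent `<p>`; nodes under a different parent are never returned.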