Beautiful Soup
    Install bs4 and lxml
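    The packages are normally installed with pip (PyPI names assumed to be beautifulsoup4 and lxml; requests is also needed for the examples below):

    pip install requests beautifulsoup4 lxml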

    Usage of lxml
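    The screenshots for this part were not kept; as a rough sketch of standalone lxml usage, assuming the etree interface and a made-up HTML string:

    #!/usr/bin/python
    # -*- coding: UTF-8 -*-
    from lxml import etree

    # sample HTML, purely for illustration
    html = '<html><head><title>demo</title></head><body><a href="http://www.baidu.com">baidu</a></body></html>'
    tree = etree.HTML(html)              # parse the string into an element tree
    print(tree.xpath('//title/text()'))  # ['demo']
    print(tree.xpath('//a/@href'))       # ['http://www.baidu.com']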
    Code

    #!/usr/bin/python
    # -*- coding: UTF-8 -*-
    import requests, lxml
    from bs4 import BeautifulSoup

    url = 'http://www.baidu.com'
    res = requests.get(url)
    html = res.text
    soup = BeautifulSoup(html, 'lxml')  # available parsers: html.parser, lxml, xml, html5lib
    print(soup.prettify())

    Print the string inside the title tag

    #!/usr/bin/python
    # -*- coding: UTF-8 -*-
    import requests, lxml
    from bs4 import BeautifulSoup

    url = 'http://www.baidu.com'
    res = requests.get(url)
    html = res.text
    soup = BeautifulSoup(html, 'lxml')  # available parsers: html.parser, lxml, xml, html5lib
    print(soup.title.string)  # print the title text


    Nested tag node selection

    #!/usr/bin/python
    # -*- coding: UTF-8 -*-
    import requests, lxml
    from bs4 import BeautifulSoup

    url = 'http://www.baidu.com'
    res = requests.get(url)
    html = res.text
    soup = BeautifulSoup(html, 'lxml')  # available parsers: html.parser, lxml, xml, html5lib
    print(soup.head.title.string)  # nested tag selection: head, then title

    Selecting tag attributes

    #!/usr/bin/python
    # -*- coding: UTF-8 -*-
    import requests, lxml
    from bs4 import BeautifulSoup

    url = 'http://www.baidu.com'
    res = requests.get(url)
    html = res.text
    soup = BeautifulSoup(html, 'lxml')  # available parsers: html.parser, lxml, xml, html5lib
    print(soup.link['href'])  # print the value of the href attribute on the first <link> tag


    Generator usage
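    The screenshots for this part are missing; a minimal sketch of the generator-style traversal bs4 provides (children and descendants yield nodes lazily), reusing the Baidu page from above:

    #!/usr/bin/python
    # -*- coding: UTF-8 -*-
    import requests
    from bs4 import BeautifulSoup

    res = requests.get('http://www.baidu.com')
    soup = BeautifulSoup(res.text, 'lxml')

    # .children iterates over direct child nodes
    for i, child in enumerate(soup.head.children):
        print(i, child.name)

    # .descendants walks all nested nodes, also lazily
    for node in soup.head.descendants:
        print(node.name)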

    Sibling nodes
    next_sibling gets the next sibling node
    previous_sibling gets the previous sibling node
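    A minimal sketch, assuming a small hand-written HTML snippet just for illustration:

    #!/usr/bin/python
    # -*- coding: UTF-8 -*-
    from bs4 import BeautifulSoup

    html = '<p><a id="one">1</a><a id="two">2</a><a id="three">3</a></p>'
    soup = BeautifulSoup(html, 'lxml')

    second = soup.find(id='two')
    print(second.next_sibling)      # <a id="three">3</a>
    print(second.previous_sibling)  # <a id="one">1</a>
    # next_siblings / previous_siblings are the generator versions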
    Method selectors
    find() returns the first element that matches the conditions
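    A minimal sketch of find(), again against the Baidu homepage (the attribute value is only a plausible example, not guaranteed to exist on the page):

    #!/usr/bin/python
    # -*- coding: UTF-8 -*-
    import requests
    from bs4 import BeautifulSoup

    res = requests.get('http://www.baidu.com')
    soup = BeautifulSoup(res.text, 'lxml')

    # first matching tag only; returns None if nothing matches
    print(soup.find(name='a'))
    print(soup.find(attrs={'name': 'tj_trnews'}))  # match by attribute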

    find_all() finds all elements that match the conditions
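    A minimal sketch of find_all(), under the same assumptions as above (the class name is illustrative):

    #!/usr/bin/python
    # -*- coding: UTF-8 -*-
    import requests
    from bs4 import BeautifulSoup

    res = requests.get('http://www.baidu.com')
    soup = BeautifulSoup(res.text, 'lxml')

    # every <a> tag on the page
    for a in soup.find_all(name='a'):
        print(a.string, a.get('href'))

    # filter by attributes instead of (or as well as) tag name
    print(len(soup.find_all(attrs={'class': 'mnav'})))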
    Write to a file
    Code

    #!/usr/bin/python
    # -*- coding: UTF-8 -*-
    import requests, lxml
    from bs4 import BeautifulSoup

    url = 'https://dns.aizhan.com/ynsqx.com/'
    f = open('api_c.html', 'w')

    def aizhan_A():
        res = requests.get(url)
        html = res.text
        soup = BeautifulSoup(html, 'lxml')  # available parsers: html.parser, lxml, xml, html5lib
        for x in soup.find_all(attrs={'class': 'domain'}):  # attrs filters by attribute
            # use '\n' for a line break in a .txt file, '<br>' in HTML
            f.write(str(x.find_all(name='a')) + '<br>')

    aizhan_A()
    f.close()

    CSS selectors
    You can use the select() method directly to make selections
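    A minimal sketch of select() with CSS selectors, reusing the Baidu page (the id selector is only illustrative and may simply match nothing):

    #!/usr/bin/python
    # -*- coding: UTF-8 -*-
    import requests
    from bs4 import BeautifulSoup

    res = requests.get('http://www.baidu.com')
    soup = BeautifulSoup(res.text, 'lxml')

    print(soup.select('title'))       # by tag name
    print(soup.select('#wrapper a'))  # by id, then descendant tag
    for a in soup.select('a[href]'):  # attribute selector
        print(a['href'])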