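Assignment 1

  • Target site: http://www.ujxsw.com/
  • Scraping requirements:
    • 1. Create your own account and add a few novels to your favorites
    • 2. Fetch the HTML of your own bookshelf page (using the requests module)
    • 3. The fetched HTML must contain the titles of the novels you added to your favorites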

import requests

# Bookshelf page URL
url = 'http://www.ujxsw.com/modules/article/bookcase.php'
headers = {
    'referer': 'http://www.ujxsw.com/',
    # The cookie carries the logged-in session
    'Cookie': 'Hm_lvt_ffafa5ae2f1ca7e65cb521c271c680c5=1641993631; PHPSESSID=p712r5hn333n7bi29urmv7k0a5; username=User; _identity-frontend=4a4f9d9568f60b7f511302df0e6b6cc4974afd091c906a5b37c34038be4ac7aea:2:{i:0;s:18:"_identity-frontend";i:1;s:19:"[108455,"",2592000]";}; Hm_lpvt_ffafa5ae2f1ca7e65cb521c271c680c5=1641993862',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36'
}
# Send the request
html = requests.get(url, headers=headers)
# html.encoding = 'utf-8'
# print(html.text)
# Save to a local file
# with open('小说2.html', 'w', encoding='utf-8') as f:
#     f.write(html.content.decode('utf-8'))
# print('Download succeeded')
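Requirement 3 asks that the titles of the favorited novels can be found in the fetched HTML. A minimal sketch of that check follows. The helper name `find_titles` and the sample titles are hypothetical, and the sample HTML is made up for illustration; it is not the site's real markup.

```python
def find_titles(html_text, titles):
    """Return a dict mapping each title to whether it occurs in html_text."""
    return {title: title in html_text for title in titles}


# Made-up HTML fragment standing in for the bookshelf page (assumption)
sample_html = '<td><a href="/book/1/">Example Novel A</a></td>'

result = find_titles(sample_html, ["Example Novel A", "Example Novel B"])
print(result)  # {'Example Novel A': True, 'Example Novel B': False}
```

With the script above, the same check could be run as `find_titles(html.text, [...])` using your own favorited titles. If every title comes back `False`, the cookie may have expired or the page may have been decoded with the wrong encoding.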

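Assignment 2

  • Target site: https://www.dmzj.com/
  • Scraping requirements:
    • 1. On this site, pick a comic you like and find the URL of any one image from it
    • 2. Download the image and save it locally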

import requests

url = 'https://images.dmzj.com/img/webpic/9/1127948491614865642.jpg'
headers = {
    'referer': 'https://www.dmzj.com/',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36'
}
# Send a GET request
img = requests.get(url, headers=headers)
# Write the raw response bytes to a local file
with open('漫画图片.jpg', 'wb') as f:
    f.write(img.content)
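The script above writes whatever bytes the server returns, so an error page would also be saved under a `.jpg` name. As a hedged sketch, the bytes can be checked for the JPEG signature (`FF D8 FF`) before writing. The helper names `looks_like_jpeg` and `save_if_jpeg` are hypothetical, and the demo uses fake bytes rather than a real download.

```python
import os
import tempfile

# Every JPEG file begins with these three bytes
JPEG_MAGIC = b"\xff\xd8\xff"


def looks_like_jpeg(data: bytes) -> bool:
    """Return True if data starts with the JPEG signature."""
    return data.startswith(JPEG_MAGIC)


def save_if_jpeg(data: bytes, path: str) -> bool:
    """Write data to path only if it looks like a JPEG; report whether it was saved."""
    if not looks_like_jpeg(data):
        return False
    with open(path, "wb") as f:
        f.write(data)
    return True


# Demo with fake payloads (assumption: stand-ins for img.content)
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "demo.jpg")
    print(save_if_jpeg(b"\xff\xd8\xff\xe0 fake jpeg body", target))  # True
    print(save_if_jpeg(b"<html>403 Forbidden</html>", target))       # False
```

In the script above this would replace the final `with open(...)` block with `save_if_jpeg(img.content, '漫画图片.jpg')`. The check only applies to JPEG images; other formats have different signatures.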