1. Assignment 1

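  • Target site: https://www.1ppt.com/moban/
  • Requirements:
    • 1. Crawl the page source across multiple pages (pagination)
    • 2. Save each page locally, paying attention to the encoding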
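The "pay attention to the encoding" requirement can be illustrated without touching the network. The following is a minimal sketch, assuming two hypothetical helpers (`save_html`, `load_html`) and a made-up sample string; it only shows that text written with an explicit encoding reads back unchanged when the same encoding is used.

```python
import os
import tempfile


def save_html(text, path, encoding='gb2312'):
    """Write text to path with an explicit encoding (hypothetical helper)."""
    with open(path, 'w', encoding=encoding) as f:
        f.write(text)


def load_html(path, encoding='gb2312'):
    """Read the file back using the same explicit encoding (hypothetical helper)."""
    with open(path, 'r', encoding=encoding) as f:
        return f.read()


# Made-up sample content, not taken from the target site.
sample = '<title>PPT模板</title>'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'page_1.html')
    save_html(sample, path)
    round_trip = load_html(path)
    # Size on disk reflects the chosen encoding, not the character count.
    raw_size = os.path.getsize(path)
```

Without an explicit `encoding=` argument, `open()` uses the platform default, so the same script may write different bytes on different machines.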

Code:
import urllib.request
import urllib.parse

# https://www.1ppt.com/moban/ppt_moban_3.html
# https://www.1ppt.com/moban/ppt_moban_=1
# https://www.1ppt.com/moban/ppt_moban_1.html

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/95.0.4638.69 Safari/537.36'
}
begin = int(input('Enter the start page: '))
end = int(input('Enter the end page: '))

for i in range(begin, end + 1):
    # Build the URL of page i, e.g. .../ppt_moban_3.html
    url1 = 'https://www.1ppt.com/moban/' + 'ppt_moban_' + str(i) + '.html'
    print(url1)
    req = urllib.request.Request(url1, headers=headers)
    res = urllib.request.urlopen(req)
    # Decode with gb2312 and save with the same encoding
    html = res.read().decode('gb2312')
    with open(f'page_{i}.html', 'w', encoding='gb2312') as f:
        f.write(html)
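One possible refinement of the `res.read().decode('gb2312')` step: a strict `gb2312` decode raises `UnicodeDecodeError` if a page contains bytes outside that character set. The sketch below is an assumption, not part of the original solution; `decode_html` is a hypothetical helper that tries several candidate encodings in order (GBK is a superset of GB2312) and falls back to a lossy decode only as a last resort.

```python
def decode_html(raw, encodings=('gb2312', 'gbk', 'utf-8')):
    """Decode raw bytes by trying each candidate encoding in order.

    Hypothetical helper: the candidate list is an assumption and may need
    adjusting for other sites.
    """
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort: keep what can be decoded, mark the rest with U+FFFD.
    return raw.decode('gbk', errors='replace')
```

In the loop above it could be used as `html = decode_html(res.read())`. If the fallback is adopted, writing the file with `encoding='gbk'` or `'utf-8'` would avoid an encode error on characters that `gb2312` cannot represent.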