1. Assignment 1

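Target site: https://www.kugou.com/yy/html/rank.html
Requirements:
- 1. Fetch the source of the chart page
- 2. Parse the data with regular expressions to get the title (including the singer) and the page link of every song on the page
- 3. Save the data to a CSV file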
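
Requirement 2 can be tried offline before touching the live site. The sketch below runs two regular expressions against a small hand-written snippet. SAMPLE_HTML and parse_songs are hypothetical names made up for this illustration, and the snippet only assumes that each entry carries a title attribute followed by data-index and a link under /mixsong/; the real page markup may differ.

```python
import re

# Hypothetical markup for illustration only; the live page may be structured differently.
SAMPLE_HTML = '''
<li class="" title="Singer A - Song One" data-index="0">
  <a href="https://www.kugou.com/mixsong/aaa111.html">Singer A - Song One</a>
</li>
<li class="" title="Singer B - Song Two" data-index="1">
  <a href="https://www.kugou.com/mixsong/bbb222.html">Singer B - Song Two</a>
</li>
'''


def parse_songs(html):
    """Return a list of (title, link) pairs found in the HTML text."""
    # Non-greedy (.*?) stops at the first closing quote instead of the last one.
    titles = re.findall(r'title="(.*?)" data-index', html)
    # Dots in the host name are escaped so they match literal dots only.
    links = re.findall(r'href="(https://www\.kugou\.com/mixsong/.*?)"', html)
    return list(zip(titles, links))


songs = parse_songs(SAMPLE_HTML)
for title, link in songs:
    print(title, link)
```

Because zip pairs items by position, the two lists need to line up one to one; if one pattern matches extra elements on the real page, the pairs will drift out of step.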

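Code
import re

import requests

url = 'https://www.kugou.com/yy/html/rank.html'

# Send a browser-like User-Agent with the request.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/95.0.4638.69 Safari/537.36'
}
response = requests.get(url, headers=headers)
# Decode the raw bytes as UTF-8 and keep the result.
text = response.content.decode('utf-8')
# print(text)
# Each title has the form "singer - song". The first title match is dropped, as in the original code.
title = re.findall(r'title="(.*?)" data-index', text)[1:]
href = re.findall(r'href="(https://www\.kugou\.com/mixsong/.*?)"', text)
# data = []
for song, song_link in zip(title, href):
    # Split on the first hyphen only, so a hyphen inside the song name does not break unpacking.
    singer, _, name = str(song).partition('-')
    print(f'Singer: {singer.strip()}, song title: {name.strip()}, song link: {song_link}')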
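
The code above prints the results but does not yet cover requirement 3. One possible way to save the data is the standard csv module, sketched below under some assumptions: split_title, save_songs_to_csv, the column names, the output file name, and the sample rows are all made up for this illustration, and the input is assumed to be a list of (title, link) pairs like the ones produced by the zip loop.

```python
import csv
import os
import tempfile


def split_title(title):
    """Split 'singer - song' on the first hyphen and trim surrounding spaces."""
    singer, _, name = title.partition('-')
    return singer.strip(), name.strip()


def save_songs_to_csv(songs, path):
    """Write (title, link) pairs to a CSV file with a header row."""
    # newline='' lets the csv module control line endings itself.
    # utf-8-sig adds a BOM so spreadsheet programs detect UTF-8 text.
    with open(path, 'w', newline='', encoding='utf-8-sig') as f:
        writer = csv.writer(f)
        writer.writerow(['singer', 'name', 'link'])
        for title, link in songs:
            singer, name = split_title(title)
            writer.writerow([singer, name, link])


# Made-up sample rows; in the assignment these would come from the regex step.
sample_songs = [
    ('Singer A - Song One', 'https://www.kugou.com/mixsong/aaa111.html'),
    ('Singer B - Song Two', 'https://www.kugou.com/mixsong/bbb222.html'),
]

# The temp directory is used here only so the sketch runs anywhere.
csv_path = os.path.join(tempfile.gettempdir(), 'kugou_rank.csv')
save_songs_to_csv(sample_songs, csv_path)
```

In the assignment itself, list(zip(title, href)) could be passed in place of sample_songs, and any writable path could replace csv_path.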

