1. Assignment 1

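  • Target site: https://www.kugou.com/yy/html/rank.html
  • Requirements:
    • 1. Fetch the source of the chart page
    • 2. Parse the data with regular expressions to extract every song title on the page (including the singer) and its page link
    • 3. Save the data to a CSV file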

Code
import requests
import re

url = 'https://www.kugou.com/yy/html/rank.html'

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/95.0.4638.69 Safari/537.36'
}
response = requests.get(url, headers=headers)
# response.text decodes the body using response.encoding, so set it explicitly
response.encoding = 'utf-8'
text = response.text
# print(text)
# Non-greedy groups capture each "singer - song" title and each song page link
title = re.findall(r'title="(.*?)" data-index', text)[1:]
href = re.findall(r'href="(https://www\.kugou\.com/mixsong/.*?)"', text)
# data = []
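for song, song_link in zip(title, href):
    # Split on the first hyphen only, since a song name may itself contain hyphens
    singer, _, name = song.partition('-')
    print(f'Singer: {singer.strip()}, Song: {name.strip()}, Link: {song_link}')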
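Requirement 3 (saving to CSV) is not covered by the script above, which only prints each entry. A minimal sketch of that step follows, using the standard csv module. The function name `save_songs_to_csv`, the default file name `kugou_rank.csv`, the column names, and the sample rows are all assumptions made for illustration; they do not come from the original assignment.

```python
import csv


def save_songs_to_csv(rows, path="kugou_rank.csv"):
    """Write (singer, name, link) tuples to a CSV file with a header row.

    Hypothetical helper: the name, default path, and columns are illustrative.
    """
    # newline="" lets the csv module control line endings itself;
    # utf-8-sig writes a BOM so that Excel detects the encoding of Chinese text.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["singer", "name", "link"])
        writer.writerows(rows)
    return len(rows)


# Made-up sample data; real rows would be collected inside the for loop.
sample_rows = [
    ("Singer A", "Song One", "https://www.kugou.com/mixsong/example1.html"),
    ("Singer B", "Song Two", "https://www.kugou.com/mixsong/example2.html"),
]

if __name__ == "__main__":
    save_songs_to_csv(sample_rows)
```

To use it with the script, one option is to append `(singer.strip(), name.strip(), song_link)` to a list inside the loop and pass that list to the function once the loop finishes.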