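'''
Assignment 1 (2022-3-26)
Target site: https://www.kugou.com/yy/html/rank.html
Scraping requirements:
1) Fetch the source of the chart page
2) Parse the data with regular expressions to get the name (including the artist) and page link of every song on the page
3) Save the data to a CSV file
'''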
import csv
import requests
import re
url = 'https://www.kugou.com/yy/html/rank.html'
# headers = {
# 'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36',
# 'cookie': 'kg_mid=c0814cdf9ee4c9df77b0e0d2486cb007; kg_dfid=3Mo1rO4YVm0Q0MK8bX1DCht4; KuGooRandom=66451647516298467; kg_mid_temp=c0814cdf9ee4c9df77b0e0d2486cb007; kg_dfid_collect=d41d8cd98f00b204e9800998ecf8427e; ACK_SERVER_10015=%7B%22list%22%3A%5B%5B%22bjlogin-user.kugou.com%22%5D%5D%7D; ACK_SERVER_10016=%7B%22list%22%3A%5B%5B%22bjreg-user.kugou.com%22%5D%5D%7D; ACK_SERVER_10017=%7B%22list%22%3A%5B%5B%22bjverifycode.service.kugou.com%22%5D%5D%7D; Hm_lvt_aedee6983d4cfc62f509129360d6bb3d=1647516298,1648307473; Hm_lpvt_aedee6983d4cfc62f509129360d6bb3d=1648307548'
# }
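# Request headers turned out to be unnecessary for this page.
res = requests.get(url, timeout=10)
res.raise_for_status()
html = res.text
# re.search locates the song-list container directly; re.match needed '.*' padding on both sides.
result = re.search(r'<div class="pc_temp_songlist pc_rank_songlist_short">.*?</div>', html, re.S)
if result is None:
    raise SystemExit('song list container not found; the page layout may have changed')
ul = result.group(0)
lis = re.findall(r'<li.*?>.*?</li>', ul, re.S)
pattern = re.compile(r'<li.*?href="(.*?)".*?title="(.*?)".*?</li>', re.S)
data = []
for li in lis:
    m = pattern.match(li)
    if m is None:
        continue  # skip list items that carry no link and title
    link, title = m.group(1), m.group(2)
    # The title reads "artist-song"; split on the last hyphen so that
    # artist names containing a hyphen (e.g. A-Lin) stay intact.
    parts = title.rsplit('-', 1)
    author = parts[0]
    song = parts[-1]
    data.append((song, author, link))
with open("song_list.csv", "w", encoding='utf-8', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(["歌名", "歌手", "链接"])  # header row: song title, artist, link
    writer.writerows(data)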
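
The title-splitting step above can be isolated into a small helper. This is only a sketch: `split_title` is a hypothetical name, and the "Artist - Song" shape of the title attribute is an assumption inferred from the trailing spaces in the artist column of the output, since the page markup is not shown here.

```python
def split_title(title):
    """Split an 'Artist - Song' string into (song, artist), trimming spaces.

    Assumes the last hyphen separates artist from song; a song name that
    itself contains a hyphen would be split in the wrong place.
    """
    artist, sep, song = title.rpartition('-')
    if not sep:
        # No hyphen at all: treat the whole string as the song name.
        return title.strip(), ''
    return song.strip(), artist.strip()


print(split_title('A-Lin - 天若有情'))
```

Trimming inside the helper would also remove the stray space before each comma in the CSV, at the cost of changing the output shown in this document.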
Attached file contents:
歌名,歌手,链接
追光旅行,井迪儿 ,https://www.kugou.com/mixsong/6h8lsc44.html
晚风心里吹,阿梨粤 ,https://www.kugou.com/mixsong/6g3al1ce.html
Bet On Me,Walk Off the Earth、D Smoke ,https://www.kugou.com/mixsong/6d786s37.html
如果我是他,王不醒 ,https://www.kugou.com/mixsong/6g7rqw07.html
像极了,永彬Ryan.B ,https://www.kugou.com/mixsong/3vn8ode7.html
调查中,糯米Nomi ,https://www.kugou.com/mixsong/6bwjtz7e.html
Time to Pretend (伪装时刻),Lazer Boomerang ,https://www.kugou.com/mixsong/3j9z7le8.html
Lose Control (Explicit),Hedley ,https://www.kugou.com/mixsong/47zgnod6.html
最美的瞬间,真瑞 ,https://www.kugou.com/mixsong/4hj34t74.html
就忘了吧,1K ,https://www.kugou.com/mixsong/6dd12c39.html
带我去找夜生活,告五人 ,https://www.kugou.com/mixsong/41nw8x64.html
天若有情,A-Lin ,https://www.kugou.com/mixsong/nw721d3.html
起风了,买辣椒也用券 ,https://www.kugou.com/mixsong/4igk4d9f.html
Wake (Studio Version),Hillsong Young & Free ,https://www.kugou.com/mixsong/hecel90.html
一吻天荒 (热血版),阿禹ayy ,https://www.kugou.com/mixsong/6gch262d.html
Normal No More (Explicit),Tysm ,https://www.kugou.com/mixsong/46zyql8f.html
美人鱼 (女声版),夏奈 ,https://www.kugou.com/mixsong/4r50lu4b.html
玫瑰窃贼,柳爽 ,https://www.kugou.com/mixsong/5q84sc85.html
月光不答,Y-D、闭文思 ,https://www.kugou.com/mixsong/6c73hf60.html
剑魂 (鱼多余版),鱼多余呀 ,https://www.kugou.com/mixsong/6crp8pa8.html
阿拉斯加海湾,蓝心羽 ,https://www.kugou.com/mixsong/4p4uihc1.html
春泥 (女版),旺仔小乔 ,https://www.kugou.com/mixsong/6fchtaa8.html
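
To check the saved file, the rows can be read back with `csv.DictReader`. The sketch below is not part of the original assignment: `load_songs` is a hypothetical helper, and the sample uses two rows copied from the listing above (via `io.StringIO`) so that it runs without the real file.

```python
import csv
import io

# Two rows copied from the listing above, including the trailing
# space that follows each artist name.
sample = (
    "歌名,歌手,链接\n"
    "追光旅行,井迪儿 ,https://www.kugou.com/mixsong/6h8lsc44.html\n"
    "天若有情,A-Lin ,https://www.kugou.com/mixsong/nw721d3.html\n"
)


def load_songs(f):
    """Read rows from an open CSV file object, stripping whitespace from each field."""
    reader = csv.DictReader(f)
    return [{key: value.strip() for key, value in row.items()} for row in reader]


songs = load_songs(io.StringIO(sample))
for song in songs:
    print(song["歌名"], "|", song["歌手"], "|", song["链接"])
```

For the real file, the same helper could be called inside `with open("song_list.csv", encoding="utf-8", newline="") as f:`.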