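I. Background

Downloading videos follows essentially the same pattern as downloading images, and is arguably even simpler.
This article walks through code that downloads the TV series 《赘婿》: https://www.okzyw.net/?m=vod-detail-id-71448.html
Approach: search -> parse -> download.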

### 1. Search

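  • Press F12 to open the developer tools, switch to the Network tab, and search for 《赘婿》. The browser sends a POST request to the server that carries the keyword '赘婿'.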

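To make the captured request concrete, the sketch below builds the same POST with the standard library only and does not send it. The query parameter `m=vod-search` and the form fields `wd` and `submit` mirror the values used in this article's code; the helper name `build_search_request` is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

SEARCH_URL = 'https://www.okzyw.net/index.php'


def build_search_request(keyword):
    """Build (but do not send) the search POST for the given keyword."""
    query = urlencode({'m': 'vod-search'})
    # Form fields are percent-encoded as UTF-8 in the request body.
    body = urlencode({'wd': keyword, 'submit': 'search'}).encode('ascii')
    return Request(SEARCH_URL + '?' + query, data=body, method='POST')


req = build_search_request('赘婿')
print(req.get_method(), req.full_url)
print(req.data.decode('ascii'))
```

Printing the body shows how the Chinese keyword appears on the wire, which is useful when comparing against the payload shown in the Network tab.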

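  • Inspecting the response further, each search result sits in the returned HTML as a link inside a `span` with class `xing_vb4`, with link text such as 赘婿更新至14集 (updated to episode 14).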
  • Write the code for get_url(), which fetches the URL of the series:

```python
import requests
from bs4 import BeautifulSoup

search_key = '赘婿'
search_url = 'https://www.okzyw.net/index.php'
search_params = {
    'm': 'vod-search'
}
search_headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36 Edg/88.0.705.68',
    'referer': 'https://www.okzyw.net/',
    'origin': 'https://www.okzyw.net'
}
search_data = {
    'wd': search_key,
    'submit': 'search'
}

# POST the keyword to the site's search endpoint
req = requests.post(url=search_url, params=search_params, headers=search_headers, data=search_data)
req.encoding = 'utf-8'
server = 'https://www.okzyw.net'
search_html = BeautifulSoup(req.text, 'lxml')
# Each result is a link inside <span class="xing_vb4">
search_spans = search_html.find_all('span', class_='xing_vb4')

for span in search_spans:
    name = span.a.string
    url = server + span.a.get('href')
    print(name, ':', url)
```
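The code above builds each series URL by concatenating `server` and the link's `href`. As a hedged alternative, `urllib.parse.urljoin` from the standard library gives the same result for root-relative links and also leaves already-absolute links intact. The sample `href` values and the helper name `absolute_url` are assumptions for illustration.

```python
from urllib.parse import urljoin

server = 'https://www.okzyw.net/'

# Hypothetical href values of the kind a search result might contain.
hrefs = [
    '/?m=vod-detail-id-71448.html',                       # root-relative link
    'https://www.okzyw.net/?m=vod-detail-id-71448.html',  # already absolute
]


def absolute_url(href):
    """Resolve href against the site root without duplicating the host."""
    return urljoin(server, href)


for href in hrefs:
    print(absolute_url(href))
```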

<a name="4fAWG"></a>
### 2. Parsing

- Parse the search result page to get the download link for each episode.

url: [https://www.okzyw.net/?m=vod-detail-id-71448.html](https://www.okzyw.net/?m=vod-detail-id-71448.html)

- There are two download formats; we use m3u8 as the example.

![image.png](https://cdn.nlark.com/yuque/0/2021/png/2983153/1613982041909-92821a4a-8bdb-440d-8282-492fdb7b9984.png#align=left&display=inline&height=393&margin=%5Bobject%20Object%5D&name=image.png&originHeight=786&originWidth=1635&size=189186&status=done&style=none&width=817.5)

- The download links all sit in `input` tags inside `<div id="2">`. Next, extract all of them.

<a name="0af5c359"></a>
![image.png](https://cdn.nlark.com/yuque/0/2021/png/2983153/1613982158895-2e4ab57c-f3bf-4ff9-a23a-ef190b1a2fe5.png#align=left&display=inline&height=84&margin=%5Bobject%20Object%5D&name=image.png&originHeight=168&originWidth=767&size=14974&status=done&style=none&width=383.5)

- The code is as follows:

```python
import requests
from bs4 import BeautifulSoup

url = 'https://www.okzyw.net/?m=vod-detail-id-71448.html'
req = requests.get(url)
bs = BeautifulSoup(req.text, 'lxml')
# All download links live in <input> tags under <div id="2">
bs_div = bs.find('div', id="2")
bs_inputs = bs_div.find_all('input')
num = 1
for each in bs_inputs:
    value = each.get('value') or ''  # tolerate inputs without a value attribute
    if 'm3u8' in value:
        print(num, ':', value)
        num += 1
```
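To try the extraction logic without hitting the site, the sketch below runs the same idea against a made-up HTML fragment. It swaps BeautifulSoup for the standard library's `html.parser` so it runs with no third-party packages; the sample markup, the `example.com` URLs, and the class name `M3u8LinkParser` are assumptions, not the site's real page.

```python
from html.parser import HTMLParser


class M3u8LinkParser(HTMLParser):
    """Collect m3u8 values of <input> tags nested inside <div id="2">."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # > 0 while inside the target div (counts nested divs)
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'div':
            if self.depth or attrs.get('id') == '2':
                self.depth += 1
        elif tag == 'input' and self.depth:
            value = attrs.get('value') or ''
            # Keep only m3u8 links and drop duplicates, preserving order.
            if 'm3u8' in value and value not in self.links:
                self.links.append(value)

    def handle_endtag(self, tag):
        if tag == 'div' and self.depth:
            self.depth -= 1


# Hypothetical page fragment with two link groups.
SAMPLE_HTML = '''
<div id="1">
  <input type="checkbox" value="https://example.com/share/ep1" />
</div>
<div id="2">
  <ul>
    <li><input type="checkbox" value="https://example.com/ep1/index.m3u8" /></li>
    <li><input type="checkbox" value="https://example.com/ep2/index.m3u8" /></li>
    <li><input type="checkbox" value="https://example.com/ep2/index.m3u8" /></li>
    <li><input type="checkbox" value="https://example.com/ep1.mp4" /></li>
  </ul>
</div>
'''

parser = M3u8LinkParser()
parser.feed(SAMPLE_HTML)
for num, link in enumerate(parser.links, start=1):
    print(num, ':', link)
```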

### 3. Download

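Capturing the network traffic shows that the video is transferred as a sequence of ts segments. ffmpy3, a Python wrapper around FFmpeg, can handle these. FFmpeg's features include video capture, video format conversion, frame grabbing, and adding watermarks. Related FFmpeg tutorial: url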
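An m3u8 file is a plain-text playlist that lists the ts segments in playback order, which is why the capture shows many small ts requests. The sketch below parses a made-up playlist with the standard library only; the playlist text, the `example.com` base URL, and the helper name `list_segments` are assumptions for illustration. FFmpeg does this step itself when it is given an m3u8 URL, so the download script does not need it.

```python
from urllib.parse import urljoin

# Hypothetical playlist of the kind an m3u8 URL returns.
SAMPLE_PLAYLIST = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXTINF:10.0,
seg000.ts
#EXTINF:10.0,
seg001.ts
#EXTINF:4.5,
seg002.ts
#EXT-X-ENDLIST
"""


def list_segments(playlist_text, playlist_url):
    """Return absolute URLs of the media segments named in an m3u8 playlist."""
    segments = []
    for line in playlist_text.splitlines():
        line = line.strip()
        # Lines starting with '#' are tags or comments; the rest are segment URIs.
        if line and not line.startswith('#'):
            segments.append(urljoin(playlist_url, line))
    return segments


segments = list_segments(SAMPLE_PLAYLIST, 'https://example.com/ep1/index.m3u8')
print(len(segments), 'segments')
print(segments[0])
```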

```python
import os

import ffmpy3
import requests
from bs4 import BeautifulSoup
from multiprocessing.dummy import Pool as ThreadPool

search_key = '赘婿'
search_url = 'https://www.okzyw.net/index.php'
search_params = {
    'm': 'vod-search'
}
search_headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36 Edg/88.0.705.68',
    'referer': 'https://www.okzyw.net/',
    'origin': 'https://www.okzyw.net'
}
search_data = {
    'wd': search_key,
    'submit': 'search'
}

# Search for the series
req = requests.post(url=search_url, params=search_params, headers=search_headers, data=search_data)
req.encoding = 'utf-8'
server = 'https://www.okzyw.net'
search_html = BeautifulSoup(req.text, 'lxml')
search_spans = search_html.find_all('span', class_='xing_vb4')
for span in search_spans:
    name = span.a.string
    tv_url = server + span.a.get('href')
    print(name)
    print(tv_url)

# The steps below work on the last result printed above.
video_dir = name
# Create the output directory under the current working directory if needed
os.makedirs(video_dir, exist_ok=True)

# Get the download address of every episode
req = requests.get(tv_url)
bs = BeautifulSoup(req.text, 'lxml')
bs_div = bs.find('div', id="2")
bs_inputs = bs_div.find_all('input')
num = 1
search_res = {}  # maps episode URL -> episode number
for each in bs_inputs:
    ep_url = each.get('value') or ''
    if 'm3u8' in ep_url and ep_url not in search_res:
        search_res[ep_url] = num
        print(num, ':', ep_url)
        num += 1


# Start downloading
def download_video(ep_url):
    num = search_res[ep_url]
    name = os.path.join(video_dir, '第{0}集.mp4'.format(num))
    # FFmpeg reads the m3u8 playlist, fetches the ts segments, and writes one mp4
    ffmpy3.FFmpeg(inputs={ep_url: None}, outputs={name: None}).run()


# Use a pool of 10 threads
pool = ThreadPool(10)
results = pool.map(download_video, search_res.keys())
pool.close()
pool.join()
```
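With `pool.map`, an exception raised in one worker is re-raised in the caller, so a single failed episode can end the script without a clear record of which episodes succeeded. A hedged alternative, sketched below with `concurrent.futures` from the standard library, records the outcome per episode. The `fake_download` function stands in for the real FFmpeg call so the sketch runs without FFmpeg or network access; it, the helper `download_all`, and the sample URLs are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_download(ep_url):
    """Stand-in for the FFmpeg call; fails for one URL to show error handling."""
    if 'broken' in ep_url:
        raise RuntimeError('download failed')
    # Derive a pretend output name from the second-to-last path component.
    return ep_url.rsplit('/', 2)[-2] + '.mp4'


def download_all(episodes, download, workers=10):
    """Run download(url) for each episode; return ({num: output}, {num: error})."""
    done, failed = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(download, url): num for url, num in episodes.items()}
        for future in as_completed(futures):
            num = futures[future]
            try:
                done[num] = future.result()
            except Exception as exc:  # keep going and report the failure afterwards
                failed[num] = str(exc)
    return done, failed


# Hypothetical episode map in the same shape as search_res (URL -> number).
episodes = {
    'https://example.com/ep1/index.m3u8': 1,
    'https://example.com/broken/index.m3u8': 2,
    'https://example.com/ep3/index.m3u8': 3,
}
done, failed = download_all(episodes, fake_download)
print('ok:', sorted(done))
print('failed:', failed)
```

In the real script, the `download` argument would be a function like `download_video`, and the episodes listed in `failed` could simply be retried.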