一、背景
视频下载的套路基本同图片下载一样,甚至更简单。
本文通过代码实现下载电视剧《赘婿》:https://www.okzyw.net/?m=vod-detail-id-71448.html
思路:搜索——>解析——>下载。
1.搜索
- F12检查元素,打开网络选项卡,搜索《赘婿》,可以发现向服务器POST了关键词‘赘婿’
- 继续查看搜索结果,搜索结果保存在html的赘婿更新至14集中
- 编写get_url()获取电视剧的url ```python import requests from bs4 import BeautifulSoup
search_key=’赘婿’ search_url=’https://www.okzyw.net/index.php‘ search_params={ ‘m’:’vod-search’ } search_headers={ ‘user-agent’:’Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36 Edg/88.0.705.68’, ‘referer’:’https://www.okzyw.net/‘, ‘origin’:’https://www.okzyw.net‘ } search_data={ ‘wd’: search_key, ‘submit’: ‘search’ }
req=requests.post(url=searchurl,params=search_params,headers=search_headers,data=search_data) req.encoding=’utf-8’ server=’https://www.okzyw.net‘ search_html=BeautifulSoup(req.text,’lxml’) search_spans=search_html.find_all(‘span’,class=’xing_vb4’)
for span in search_spans:
name=span.a.string
url=server+span.a.get(‘href’)
print(name,’:’,url)
<a name="4fAWG"></a>
### 2.解析
- 解析搜索结果,获取每集的下载链接。
url:[https://www.okzyw.net/?m=vod-detail-id-71448.html](https://www.okzyw.net/?m=vod-detail-id-71448.html)
- 可以发现有两种下载格式,我们已下载m3u8为例。
![image.png](https://cdn.nlark.com/yuque/0/2021/png/2983153/1613982041909-92821a4a-8bdb-440d-8282-492fdb7b9984.png#align=left&display=inline&height=393&margin=%5Bobject%20Object%5D&name=image.png&originHeight=786&originWidth=1635&size=189186&status=done&style=none&width=817.5)
- 可以发现,下载链接都存放在<div id="2">的input标签中,下面提取所有下载链接
<a name="0af5c359"></a>
### ![image.png](https://cdn.nlark.com/yuque/0/2021/png/2983153/1613982158895-2e4ab57c-f3bf-4ff9-a23a-ef190b1a2fe5.png#align=left&display=inline&height=84&margin=%5Bobject%20Object%5D&name=image.png&originHeight=168&originWidth=767&size=14974&status=done&style=none&width=383.5)
- 代码如下:
```python
import requests
from bs4 import BeautifulSoup
url = 'https://www.okzyw.net/?m=vod-detail-id-71448.html'
req = requests.get(url)
bs = BeautifulSoup(req.text, 'lxml')
bs_div = bs.find('div', id="2")
bs_inputs = bs_div.find_all('input')
num = 1
for each in bs_inputs:
if 'm3u8' in each.get('value'):
url = each.get('value')
print(num, ':', url)
num += 1
3.下载
打开网络抓包,容易发现视频都是以ts分段视频传输的,python中的ffmpy3,及python中的FFmpeg可以处理,该模块的功能包括视频采集、视频格式转换、视频抓图、给视频加水印等。FFmpeg相关教程url
import requests
from bs4 import BeautifulSoup
import os
import ffmpy3
from multiprocessing.dummy import Pool as ThreadPool
search_key='赘婿'
search_url='https://www.okzyw.net/index.php'
search_params={
'm':'vod-search'
}
search_headers={
'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36 Edg/88.0.705.68',
'referer':'https://www.okzyw.net/',
'origin':'https://www.okzyw.net'
}
search_data={
'wd': search_key,
'submit': 'search'
}
req=requests.post(url=search_url,params=search_params,headers=search_headers,data=search_data)
req.encoding='utf-8'
server='https://www.okzyw.net'
search_html=BeautifulSoup(req.text,'lxml')
search_spans=search_html.find_all('span',class_='xing_vb4')
for span in search_spans:
name=span.a.string
tv_url=server+span.a.get('href')
print(name)
print(tv_url)
video_dir=name
if video_dir not in os.listdir(r'D:\ProgramData\Python'):
os.mkdir(name)
#获取每集电视的下载地址
req = requests.get(tv_url)
bs = BeautifulSoup(req.text, 'lxml')
bs_div = bs.find('div', id="2")
bs_inputs = bs_div.find_all('input')
num = 1
search_res={}
for each in bs_inputs:
if 'm3u8' in each.get('value'):
ep_url = each.get('value')
if ep_url not in search_res.keys():
search_res[ep_url]=num
print(num, ':', ep_url)
num += 1
#开始下载
def download_video(ep_url):
num=search_res[ep_url]
name=os.path.join(video_dir,'第{0}集.mp4'.format(num))
ffmpy3.FFmpeg(inputs={ep_url: None}, outputs={name:None}).run()
# 开10个线程池
pool = ThreadPool(10)
results = pool.map(download_video, search_res.keys())
pool.close()
pool.join()