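The previous section introduced request headers. In this section we learn how to define our own request headers when accessing a URL.
**To add information such as headers, the request has to be built with the Request class.**
Random User-Agent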
import urllib.request
import random  # used to pick a User-Agent at random
url = 'http://www.baidu.com/'
user_agent_list = [
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; AcooBrowser; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Acoo Browser; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506)",
"Mozilla/4.0 (compatible; MSIE 7.0; AOL 9.5; AOLBuild 4337.35; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
"Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)",
"Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; fr) Presto/2.9.168 Version/11.52",
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.11 TaoBrowser/2.0 Safari/536.11",
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.71 Safari/537.1 LBBROWSER",
"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; LBBROWSER)",
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E; LBBROWSE "
]
random_list = random.choice(user_agent_list)  # pick one User-Agent at random
request = urllib.request.Request(url)  # a Request object is needed before headers can be added
request.add_header('User-Agent', random_list)  # mind the format: add_header(name, value)
url_data = urllib.request.urlopen(request).read().decode('utf-8')
print(url_data)
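Before sending anything, it can help to check which User-Agent was actually picked. The following is a minimal sketch rather than part of the original tutorial: `build_request` is a hypothetical helper name, and the list holds just two entries copied from the list above. It relies on `Request.header_items()`, which returns the headers stored on the request object without contacting the server. Note that urllib stores header names in capitalized form, so the key shows up as `User-agent`.

```python
import random
import urllib.request

# Two entries copied from the list above, kept short for illustration.
user_agent_list = [
    "Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)",
    "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; fr) Presto/2.9.168 Version/11.52",
]


def build_request(url, user_agents):
    """Hypothetical helper: return a Request carrying a randomly chosen User-Agent."""
    request = urllib.request.Request(url)
    request.add_header('User-Agent', random.choice(user_agents))
    return request


request = build_request('http://www.baidu.com/', user_agent_list)

# header_items() lists the headers stored on the Request; nothing is sent here.
stored_headers = dict(request.header_items())
print(stored_headers)
```

Running this several times should print different User-Agent values, which is a quick way to confirm that the random choice is taking effect.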
Custom headers
import urllib.request
url = 'http://www.baidu.com/'
# headers is a dict that maps each header name to its value
headers = {
'User-Agent':"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; AcooBrowser; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
}
request = urllib.request.Request(url=url, headers=headers)  # pass the headers when building the Request
fin_data = urllib.request.urlopen(request).read().decode('utf-8')
print(fin_data)
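The two styles shown so far, calling `add_header()` and passing a dict through the `headers` parameter, appear to end up in the same place. As a small sketch for comparison (the variable names `request_from_dict`, `request_from_method` and `same_headers` are my own, not from the tutorial), the stored headers of both requests can be compared without sending anything:

```python
import urllib.request

url = 'http://www.baidu.com/'
user_agent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; AcooBrowser; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"

# Style 1: pass a dict through the headers parameter.
request_from_dict = urllib.request.Request(url=url, headers={'User-Agent': user_agent})

# Style 2: create the Request first, then call add_header().
request_from_method = urllib.request.Request(url)
request_from_method.add_header('User-Agent', user_agent)

# Compare the headers stored on each Request object; no network access is needed.
same_headers = request_from_dict.header_items() == request_from_method.header_items()
print(same_headers)
```

Because both requests store the same headers, the choice between the two styles is mostly a matter of taste: the dict is convenient when all headers are known up front, while `add_header()` is handy for adding one more header later.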
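Random headers
Look closely at the two examples above and you will notice a difference: the first picks its header value at random, while the second passes a fixed headers dict. At first it seemed to me that randomization could not be done with custom headers directly, and that the only way was to prepare the candidate values in advance, as in example 1, and then randomize through Request. The video did not cover this approach; it is just an idea that occurred to me, so I decided to try it without knowing whether it would work.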
import urllib.request
import random
url = 'http://www.baidu.com/'
user_agent_list = [
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; AcooBrowser; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Acoo Browser; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506)",
"Mozilla/4.0 (compatible; MSIE 7.0; AOL 9.5; AOLBuild 4337.35; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
"Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)",
"Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; fr) Presto/2.9.168 Version/11.52",
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.11 TaoBrowser/2.0 Safari/536.11",
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.71 Safari/537.1 LBBROWSER",
"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; LBBROWSER)",
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E; LBBROWSE "
]
accept_language_list = [
'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
'fr-CH, fr;q=0.9, en;q=0.8, de;q=0.7, *;q=0.5'
]
random_useragentlist = random.choice(user_agent_list)
random_acceptlanguagelist = random.choice(accept_language_list)
request = urllib.request.Request(url)
request.add_header('User-Agent',random_useragentlist)
request.add_header('Accept-Language',random_acceptlanguagelist)
fin_data = urllib.request.urlopen(request).read().decode('utf-8')
print(fin_data)
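The code runs without errors, but unfortunately that does not prove the approach works: the returned data contains no Accept-Language keyword, which is unsurprising, since a response body does not normally echo back the request headers.
So I cannot tell from this output whether the header was applied. I will come back and verify it once I have learned a suitable tool.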
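One way to check this without any extra tool is to send the request to a tiny local server that replies with the headers it received. The following is only a sketch under my own assumptions: `EchoHeadersHandler` is a hypothetical test handler, the server runs on 127.0.0.1 with a port chosen by the operating system, and the two lists are shortened copies of the ones above. It also builds the headers dict at run time and passes it through the `headers` parameter, which suggests that random headers work with that style as well.

```python
import json
import random
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Shortened copies of the lists above, for illustration only.
user_agent_list = [
    "Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)",
    "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; fr) Presto/2.9.168 Version/11.52",
]
accept_language_list = [
    'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
    'fr-CH, fr;q=0.9, en;q=0.8, de;q=0.7, *;q=0.5',
]


class EchoHeadersHandler(BaseHTTPRequestHandler):
    """Hypothetical test handler: replies with two of the received headers as JSON."""

    def do_GET(self):
        received = {
            'User-Agent': self.headers.get('User-Agent'),
            'Accept-Language': self.headers.get('Accept-Language'),
        }
        body = json.dumps(received).encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'application/json; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


# Port 0 lets the operating system pick a free port.
server = HTTPServer(('127.0.0.1', 0), EchoHeadersHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# The headers dict is built at run time, so random.choice works here too.
headers = {
    'User-Agent': random.choice(user_agent_list),
    'Accept-Language': random.choice(accept_language_list),
}
local_url = 'http://127.0.0.1:{}/'.format(server.server_address[1])
request = urllib.request.Request(url=local_url, headers=headers)

# An empty ProxyHandler bypasses any system proxy for this local request.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with opener.open(request, timeout=5) as response:
    echoed = json.loads(response.read().decode('utf-8'))

server.shutdown()
server.server_close()

print(echoed.get('User-Agent'))
print(echoed.get('Accept-Language'))
```

If the two printed values match the ones chosen at random, the headers reached the server. This only shows what urllib sends to a local test server; a real site may still treat the request differently.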