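The previous section introduced request headers. In this section we learn how to define our own request headers when accessing a URL.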

**To add information such as headers, the request has to be built with the Request class.**
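Before sending anything, it can help to see what a Request object actually stores. The following is a minimal sketch rather than part of the original lesson: the `my-crawler/0.1` User-Agent value is a made-up placeholder, and nothing is sent over the network. Note that `urllib` stores header names normalized with `str.capitalize()`, so `User-Agent` is kept as `User-agent`.

```python
import urllib.request

url = 'http://www.baidu.com/'

# Headers can be passed when the Request is built...
request = urllib.request.Request(url, headers={'User-Agent': 'my-crawler/0.1'})
# ...or attached afterwards with add_header(name, value).
request.add_header('Accept-Language', 'zh-CN,zh;q=0.8')

# Nothing has been sent yet; the Request object only stores the headers.
print(request.get_header('User-agent'))       # look-up uses the capitalized name
print(request.has_header('Accept-language'))  # True if the header is stored
print(request.header_items())                 # list of (name, value) pairs
```

Because of this normalization, `request.get_header('User-Agent')` returns `None` even though the header is present, which is easy to trip over when debugging.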

Random User-Agent

    import random  # used to pick a User-Agent at random
    import urllib.request

    url = 'http://www.baidu.com/'
    user_agent_list = [
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; AcooBrowser; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
        "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Acoo Browser; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506)",
        "Mozilla/4.0 (compatible; MSIE 7.0; AOL 9.5; AOLBuild 4337.35; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
        "Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)",
        "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; fr) Presto/2.9.168 Version/11.52",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.11 TaoBrowser/2.0 Safari/536.11",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.71 Safari/537.1 LBBROWSER",
        "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; LBBROWSER)",
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E; LBBROWSE ",
    ]
    random_user_agent = random.choice(user_agent_list)  # pick one User-Agent at random
    request = urllib.request.Request(url)  # build a Request object so headers can be attached
    request.add_header('User-Agent', random_user_agent)  # arguments: header name, then value
    url_data = urllib.request.urlopen(request).read().decode('utf-8')
    print(url_data)

Custom headers

    import urllib.request

    url = 'http://www.baidu.com/'
    # Headers are passed as a dict that maps header names to values.
    headers = {
        'User-Agent': "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; AcooBrowser; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
    }
    request = urllib.request.Request(url=url, headers=headers)
    fin_data = urllib.request.urlopen(request).read().decode('utf-8')
    print(fin_data)

Random headers

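Look closely at the two examples above and you will notice the difference: example 1 picks a value at random and attaches it with add_header, while example 2 passes a fixed headers dictionary. A dictionary written out literally like that cannot randomize anything on its own; the candidate values have to be prepared in advance, as in example 1, and Request is then used to apply the random choice. The video did not cover this approach. It was just an idea that occurred to me, so I decided to try it, without knowing whether it would work.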

    import random
    import urllib.request

    url = 'http://www.baidu.com/'
    user_agent_list = [
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; AcooBrowser; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
        "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Acoo Browser; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506)",
        "Mozilla/4.0 (compatible; MSIE 7.0; AOL 9.5; AOLBuild 4337.35; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
        "Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)",
        "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; fr) Presto/2.9.168 Version/11.52",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.11 TaoBrowser/2.0 Safari/536.11",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.71 Safari/537.1 LBBROWSER",
        "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; LBBROWSER)",
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E; LBBROWSE ",
    ]
    accept_language_list = [
        'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
        'fr-CH, fr;q=0.9, en;q=0.8, de;q=0.7, *;q=0.5',
    ]
    # Pick one value at random for each header.
    random_user_agent = random.choice(user_agent_list)
    random_accept_language = random.choice(accept_language_list)
    request = urllib.request.Request(url)
    request.add_header('User-Agent', random_user_agent)
    request.add_header('Accept-Language', random_accept_language)
    fin_data = urllib.request.urlopen(request).read().decode('utf-8')
    print(fin_data)
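As an alternative sketch of the same idea (not the approach used above), the random choices can also be placed directly into the `headers` dictionary, since it is an ordinary dict whose values are evaluated when it is built. The helper name `build_random_request` and the shortened candidate lists are assumptions made for illustration; the request is only constructed here, not sent.

```python
import random
import urllib.request

# Shortened candidate lists, for illustration only.
USER_AGENTS = [
    "Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)",
    "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; fr) Presto/2.9.168 Version/11.52",
]
ACCEPT_LANGUAGES = [
    "zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2",
    "fr-CH, fr;q=0.9, en;q=0.8, de;q=0.7, *;q=0.5",
]


def build_random_request(url):
    """Return a Request whose headers are drawn at random on every call."""
    headers = {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept-Language": random.choice(ACCEPT_LANGUAGES),
    }
    return urllib.request.Request(url=url, headers=headers)


request = build_random_request("http://www.baidu.com/")
print(request.header_items())  # shows which values were picked this time
```

Wrapping the choice in a function means each call draws fresh values, which is convenient when many requests are made in a loop.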

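Running this produced no error, but unfortunately that does not prove the approach works: the returned data contains only the server's response, so the keyword Accept-Language does not appear in it.

As a result I cannot tell whether the header actually took effect. I will come back and verify it once I have learned the relevant tools.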
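One possible way to check this without extra tools is to send the request to a throwaway local server that simply echoes back the headers it received. This is only a sketch under some assumptions: `EchoHeadersHandler` and `fetch_echoed_headers` are hypothetical names introduced here, the server runs on the loopback address with a port chosen by the operating system, and proxies are disabled for the test request so that it really goes to the local server.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHeadersHandler(BaseHTTPRequestHandler):
    """Replies with the request headers it received, encoded as JSON."""

    def do_GET(self):
        body = json.dumps(dict(self.headers.items())).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress the default per-request log line


def fetch_echoed_headers(headers):
    """Send `headers` to a temporary local server and return what it saw."""
    server = HTTPServer(("127.0.0.1", 0), EchoHeadersHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = "http://127.0.0.1:%d/" % server.server_address[1]
        request = urllib.request.Request(url, headers=headers)
        # An empty ProxyHandler bypasses any proxy configured in the environment.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(request, timeout=5) as response:
            return json.loads(response.read().decode("utf-8"))
    finally:
        server.shutdown()
        server.server_close()


seen = fetch_echoed_headers({
    "User-Agent": "Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)",
    "Accept-Language": "fr-CH, fr;q=0.9, en;q=0.8, de;q=0.7, *;q=0.5",
})
print(seen)
```

If the printed dictionary contains the Accept-Language value that was passed in, the header was really sent. A real site may still ignore it, so this only confirms what leaves the client, not how baidu.com reacts to it.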