Method 1: Set the proxy in meta

The proxy can be set directly in the spider itself: add a meta argument containing a proxy key when constructing the Request:

    # -*- coding: utf-8 -*-
    import scrapy


    class Ip138Spider(scrapy.Spider):
        name = 'ip138'
        allowed_domains = ['ip138.com']
        start_urls = ['http://2020.ip138.com']

        def start_requests(self):
            for url in self.start_urls:
                # The 'proxy' key in meta tells Scrapy which proxy to use for this request.
                yield scrapy.Request(url, meta={'proxy': 'http://163.125.69.29:8888'}, callback=self.parse)

        def parse(self, response):
            # Print the response and the request that produced it to verify the proxy.
            print("response text: %s" % response.text)
            print("response headers: %s" % response.headers)
            print("response meta: %s" % response.meta)
            print("request headers: %s" % response.request.headers)
            print("request cookies: %s" % response.request.cookies)
            print("request meta: %s" % response.request.meta)

Method 2: Set the proxy in a middleware

The middleware in middlewares.py is written as follows:

    # -*- coding: utf-8 -*-
    class ProxyMiddleware:
        def process_request(self, request, spider):
            # Route every outgoing request through this proxy.
            request.meta['proxy'] = "http://proxy.your_proxy:8888"

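There are two things to note here:

  • First, the proxy value must include the http:// prefix; otherwise you get the error "to_bytes must receive a unicode, str or bytes object, got NoneType".
  • Second, the official documentation says that process_request must return a Request object, a Response object, or None. In practice you can omit the return statement, because a Python function without one implicitly returns None.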
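Both points can be illustrated with a minimal sketch. The helper normalize_proxy is a hypothetical name introduced here (it is not part of Scrapy); it simply adds a scheme when one is missing, so an address written as host:port does not trigger the error above. The middleware also shows that leaving out the return statement yields None.

```python
def normalize_proxy(address, default_scheme="http"):
    """Return the proxy address with a URL scheme, adding one if it is missing."""
    address = address.strip()
    if "://" in address:
        return address
    return f"{default_scheme}://{address}"


class ProxyMiddleware:
    # Placeholder address, written without a scheme on purpose.
    proxy = "proxy.your_proxy:8888"

    def process_request(self, request, spider):
        request.meta["proxy"] = normalize_proxy(self.proxy)
        # No return statement: the method implicitly returns None,
        # which tells Scrapy to keep processing this request.
```

The middleware only touches request.meta, so it can be exercised with any object that has a meta dictionary, without starting a crawl.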

If the proxy requires a username and password, add the following lines after setting the proxy:

    import base64

    # Use the following lines if your proxy requires authentication
    proxy_user_pass = "USERNAME:PASSWORD"
    # Set up basic authentication for the proxy.
    # base64.encodestring was removed in Python 3.9; b64encode works on bytes.
    encoded_user_pass = base64.b64encode(proxy_user_pass.encode('utf-8')).decode('ascii')
    request.headers['Proxy-Authorization'] = 'Basic ' + encoded_user_pass
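As an alternative sketch, the credentials can be embedded in the proxy URL itself. Scrapy's documentation for HttpProxyMiddleware describes a proxy meta value of the form http://username:password@some_proxy_server:port, and recent Scrapy releases appear to prefer this over setting the header by hand. The helper with_credentials and the class name AuthProxyMiddleware are hypothetical names introduced for illustration, and the host and credentials are placeholders.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_credentials(proxy_url, username, password):
    """Embed URL-quoted credentials into a proxy URL."""
    parts = urlsplit(proxy_url)
    # Drop any credentials already present, keeping only host[:port].
    host = parts.netloc.rpartition("@")[2]
    netloc = f"{quote(username, safe='')}:{quote(password, safe='')}@{host}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


class AuthProxyMiddleware:
    def process_request(self, request, spider):
        # Placeholder proxy and credentials; replace them with your own.
        request.meta["proxy"] = with_credentials(
            "http://proxy.your_proxy:8888", "USERNAME", "PASSWORD"
        )
```

Quoting the username and password keeps characters such as @ or : from being mistaken for URL delimiters. This sketch assumes the middleware runs before Scrapy's built-in HttpProxyMiddleware, so that the built-in middleware can read the credentials from the URL.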

To configure multiple proxies, add a proxy list to the settings file:

    PROXIES = [
        '163.125.69.29:8888',
    ]

Then import the list in the middleware:

    # -*- coding: utf-8 -*-
    import random

    from ip138_proxy.settings import PROXIES


    class ProxyMiddleware:
        def process_request(self, request, spider):
            # Pick a random proxy for each request and add the required http:// prefix.
            request.meta['proxy'] = "http://%s" % random.choice(PROXIES)
            return None

Enable the middleware in DOWNLOADER_MIDDLEWARES in settings.py:

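    DOWNLOADER_MIDDLEWARES = {
        # The module path must match your project package
        # (ip138_proxy, as in the settings import used by the middleware).
        'ip138_proxy.middlewares.ProxyMiddleware': 1,
    }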
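A variant of the rotating middleware, sketched under the assumption that the list is still stored under the PROXIES setting, reads it through Scrapy's from_crawler hook and crawler.settings.getlist instead of importing the settings module directly. This keeps the middleware independent of the project package name. The class name RandomProxyMiddleware is hypothetical.

```python
import random


class RandomProxyMiddleware:
    """Pick a random proxy from the PROXIES setting for every request."""

    def __init__(self, proxies):
        self.proxies = list(proxies)

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook, when it is defined, to build the middleware;
        # crawler.settings exposes the values defined in settings.py.
        return cls(crawler.settings.getlist("PROXIES"))

    def process_request(self, request, spider):
        # Leave the request untouched if no proxies are configured.
        if self.proxies:
            request.meta["proxy"] = "http://%s" % random.choice(self.proxies)
```

If you use this variant, register it in DOWNLOADER_MIDDLEWARES under its own dotted path in place of ProxyMiddleware.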

Run the program, and you can see that the configured proxy has taken effect:

[Figure 1: Setting a proxy in the spider]

If you are looking for free proxies, you can find some on 快代理 (Kuaidaili).

References