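Gunicorn pre-fork flow

How is it implemented in Python?
This part covers the knowledge involved and the basic idea.
Below is a simple pre-fork program written after reading the Gunicorn source code.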
    # -*- coding: utf-8 -*-
    # master-slaves.py  (written for Python 2.7.x; the print_function import
    # keeps the print calls valid on Python 3 as well)
    # orangleliu@gmail.com
    '''
    A simple simulation of the pre-fork model: a master process controls
    several worker processes.

    Signals handled by the master:
        INT   ctrl+c, exit
        TTIN  add one worker
        TTOU  remove one worker
    '''
    from __future__ import print_function

    import errno
    import os
    import random
    import signal
    import sys
    import time


    class Worker(object):
        '''
        A real worker would handle specific signals to respond to the outside
        world and to the parent process; this one only sleeps.
        '''
        def run(self):
            while True:
                time.sleep(3)


    class Master(object):
        WORKERS = {}
        SIG_QUEUE = []
        SIGNALS = [getattr(signal, "SIG%s" % x)
                   for x in "INT TTIN TTOU".split()]
        SIG_NAMES = dict(
            (getattr(signal, name), name[3:].lower()) for name in dir(signal)
            if name[:3] == "SIG" and name[3] != "_"
        )

        def __init__(self, worker_nums=2):
            self.worker_nums = worker_nums
            self.master_name = "Master"
            self.reexec_pid = 0

        def start(self):
            print("start master")
            self.pid = os.getpid()
            self.init_signals()

        def init_signals(self):
            for s in self.SIGNALS:
                signal.signal(s, self.signal)
            signal.signal(signal.SIGCHLD, self.handle_chld)

        def signal(self, sig, frame):
            '''When an ordinary signal arrives, append it to the signal queue.'''
            if len(self.SIG_QUEUE) < 5:
                self.SIG_QUEUE.append(sig)

        def run(self):
            self.start()
            try:
                self.manage_workers()
                while True:
                    # Without this sleep the master spins at almost 100% CPU.
                    # Sleeping keeps the master cheap while it still reacts
                    # promptly to signals sent to it by the system.
                    time.sleep(1)
                    sig = self.SIG_QUEUE.pop(0) if len(self.SIG_QUEUE) else None
                    if sig is None:
                        self.manage_workers()
                        continue
                    if sig not in self.SIG_NAMES:
                        print("unknown signal: %s" % sig)
                        continue
                    signame = self.SIG_NAMES.get(sig)
                    handler = getattr(self, "handle_%s" % signame, None)
                    if not handler:
                        print("unhandled signal: %s" % signame)
                        continue
                    handler()
            except StopIteration:
                self.halt()
            except KeyboardInterrupt:
                self.halt()
            except SystemExit:
                # Re-raise so that a worker's exit status is not swallowed.
                raise
            except Exception as e:
                print(e)
                self.stop()
                sys.exit(-1)

        def handle_chld(self, sig, frame):
            '''Handle SIGCHLD when a child exits, so zombies do not pile up.'''
            self.reap_workers()

        def handle_int(self):
            '''ctrl+c: stop the workers first, then raise so the master exits.'''
            self.stop()
            raise StopIteration

        def handle_ttin(self):
            '''Add one worker.'''
            print("add a worker")
            self.worker_nums += 1
            self.manage_workers()

        def handle_ttou(self):
            '''Remove one worker (never go below one).'''
            print("remove a worker")
            if self.worker_nums <= 1:
                return
            self.worker_nums -= 1
            self.manage_workers()

        def stop(self):
            '''Stop the workers: SIGTERM first, then SIGKILL with no grace period.'''
            print("stop workers")
            self.kill_workers(signal.SIGTERM)
            self.kill_workers(signal.SIGKILL)

        def halt(self, exit_status=0):
            '''Shut down the master process itself.'''
            print("master exit")
            self.stop()
            sys.exit(exit_status)

        def reap_workers(self):
            '''
            Reap exited children to avoid zombies, which would otherwise hold
            on to resources.
            Reference: http://www.cnblogs.com/mickole/p/3187770.html
            '''
            try:
                while True:
                    # os.waitpid collects a zombie child's status and removes it.
                    # -1 means "any child"; os.WNOHANG returns immediately when
                    # no child has exited yet.
                    wpid, status = os.waitpid(-1, os.WNOHANG)
                    if not wpid:
                        break
                    # The exit code is available as `status >> 8` if needed.
                    self.WORKERS.pop(wpid, None)
            except OSError as e:
                # errno.ECHILD means there are no child processes left.
                if e.errno != errno.ECHILD:
                    raise

        def manage_workers(self):
            '''Health check: keep the number of workers aligned with worker_nums.'''
            if len(self.WORKERS) < self.worker_nums:
                self.spawn_workers()
            workers = list(self.WORKERS.items())
            while len(workers) > self.worker_nums:
                (pid, _) = workers.pop(0)
                self.kill_worker(pid, signal.SIGTERM)

        def spawn_worker(self):
            worker = Worker()
            pid = os.fork()
            # Master process: record the child and return.
            if pid != 0:
                self.WORKERS[pid] = worker
                return pid
            # Worker process: run until told to exit.
            try:
                worker.run()
                sys.exit(0)
            except SystemExit:
                raise
            except Exception as e:
                print("worker error %s" % str(e))
                sys.exit(-1)

        def spawn_workers(self):
            for i in range(self.worker_nums - len(self.WORKERS)):
                self.spawn_worker()
                # Open question from the author: why sleep for such a short time?
                time.sleep(0.1 * random.random())

        def kill_workers(self, sig):
            worker_pids = list(self.WORKERS.keys())
            for pid in worker_pids:
                self.kill_worker(pid, sig)

        def kill_worker(self, pid, sig):
            try:
                os.kill(pid, sig)
            except OSError as e:
                print("kill worker error: %s" % str(e))


    if __name__ == "__main__":
        Master().run()
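The fork-and-reap step at the heart of the program above can be tried in isolation. The following is a minimal sketch, not part of the original program: `fork_and_reap` is a hypothetical helper name, and it assumes a POSIX system where `os.fork` is available.

```python
import os


def fork_and_reap(exit_code):
    """Fork one child that exits with exit_code; return the code the parent reaps."""
    pid = os.fork()
    if pid == 0:
        # Child: exit immediately without running the parent's cleanup handlers.
        os._exit(exit_code)
    # Parent: block until this child exits; waiting also clears its zombie entry.
    _, status = os.waitpid(pid, 0)
    # For a normal exit the code sits in the high byte of the status word,
    # which is what `status >> 8` relies on in reap_workers.
    assert os.WIFEXITED(status)
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(fork_and_reap(3))
```

The master above does the same thing with `os.waitpid(-1, os.WNOHANG)`, so that it can reap any child without blocking its main loop.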
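Gunicorn worker types
- sync
- gthread
- eventlet
- gevent
- tornado
Based on how they work underneath, the workers fall into three categories:
- sync: each request is handled by a process
- gthread: each request is handled by a thread
- eventlet/gevent/tornado: asynchronous IO lets one process move on to the next request while it waits for an IO response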
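The worker type can also be chosen in a Gunicorn configuration file, which is plain Python. The sketch below is an illustration only: the file name `gunicorn.conf.py` and the values are assumptions mirroring the demo commands in this post, while `bind`, `workers`, `worker_class` and `threads` are regular Gunicorn settings.

```python
# gunicorn.conf.py (hypothetical file name), used as:
#   gunicorn -c gunicorn.conf.py gdemo.wsgi

bind = "192.168.37.145:80"   # address taken from the demo commands in this post
workers = 1                  # number of worker processes
worker_class = "gthread"     # or "sync", "eventlet", "gevent", "tornado"
threads = 2                  # threads per worker, relevant for gthread
```

Switching between the three categories is then a one-line change to `worker_class`.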
Handling requests with processes
How the sync worker performs on CPU-bound and IO-bound tasks:
    # views.py
    import time

    from django.shortcuts import render
    from django.http import HttpResponse

    # Create your views here.


    def ioTask(request):
        # IO-bound: the worker just waits for 2 seconds.
        time.sleep(2)
        return HttpResponse("IO bound task finish!\n")


    def cpuTask(request):
        # CPU-bound: a busy loop of pure computation.
        for i in range(10000000):
            n = i * i * i
        return HttpResponse("CPU bound task finish!\n")
Output

    08:31:40 (gunicorn_demo-bLt-GVNF) root@arch gdemo → siege -c 2 -r 1 http://192.168.37.145/worker/io/ -v
    ** SIEGE 4.0.4
    ** Preparing 2 concurrent users for battle.
    The server is now under siege...
    HTTP/1.1 200 2.00 secs: 22 bytes ==> GET /worker/io/
    # the following request is blocked
    HTTP/1.1 200 4.00 secs: 22 bytes ==> GET /worker/io/
    Transactions: 2 hits
    Availability: 100.00 %
    Elapsed time: 4.00 secs
    Data transferred: 0.00 MB
    Response time: 3.00 secs
    Transaction rate: 0.50 trans/sec
    Throughput: 0.00 MB/sec
    Concurrency: 1.50
    Successful transactions: 2
    Failed transactions: 0
    Longest transaction: 4.00
    Shortest transaction: 2.00

    08:40:51 root@arch ~ → siege -c 2 -r 1 http://192.168.37.145/worker/cpu/ -v
    ** SIEGE 4.0.4
    ** Preparing 2 concurrent users for battle.
    The server is now under siege...
    HTTP/1.1 200 0.97 secs: 23 bytes ==> GET /worker/cpu/
    # the following request is blocked
    HTTP/1.1 200 2.12 secs: 23 bytes ==> GET /worker/cpu/
    Transactions: 2 hits
    Availability: 100.00 %
    Elapsed time: 2.12 secs
    Data transferred: 0.00 MB
    Response time: 1.54 secs
    Transaction rate: 0.94 trans/sec
    Throughput: 0.00 MB/sec
    Concurrency: 1.46
    Successful transactions: 2
    Failed transactions: 0
    Longest transaction: 2.12
    Shortest transaction: 0.97
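The advantage of this type is strong fault isolation: if a process crashes, only the request that process is serving at that moment is affected, and other requests are not.
The downside is that processes are expensive. Running too many workers puts heavy pressure on memory and CPU, so the theoretical concurrency ceiling is very low.
Handling requests with threads
When the gunicorn worker type is gthread, the extra option --threads sets how many threads each process may start. The concurrency ceiling is then the number of workers multiplied by the number of threads per worker.
Below, gunicorn starts one worker process (pid 18467 in the log) with 2 threads, so in theory it can handle only two requests at a time. Although the command passes -k sync, the log reports "Using worker: threads", because gunicorn switches to the threaded worker when more than one thread is requested: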
    09:05:31 (gunicorn_demo-bLt-GVNF) root@arch gdemo → gunicorn -w 1 -k sync --thread=2 gdemo.wsgi -b 192.168.37.145:80
    [2018-06-24 09:05:41 +0800] [18464] [INFO] Starting gunicorn 19.8.1
    [2018-06-24 09:05:41 +0800] [18464] [INFO] Listening at: http://192.168.37.145:80 (18464)
    [2018-06-24 09:05:41 +0800] [18464] [INFO] Using worker: threads
    [2018-06-24 09:05:41 +0800] [18467] [INFO] Booting worker with pid: 18467
    [2018-06-24 09:19:59 +0800] [18464] [INFO] Handling signal: winch
    [2018-06-24 09:20:05 +0800] [18464] [INFO] Handling signal: winch
Sending 4 requests each to the IO-bound task and the CPU-bound task with siege clearly shows that requests are blocked only from the third one on:
    09:22:34 root@arch ~ → siege -c 4 -r 1 http://192.168.37.145/worker/io/ -v
    ** SIEGE 4.0.4
    ** Preparing 4 concurrent users for battle.
    The server is now under siege...
    HTTP/1.1 200 2.01 secs: 22 bytes ==> GET /worker/io/
    HTTP/1.1 200 2.01 secs: 22 bytes ==> GET /worker/io/
    # the following requests are blocked
    HTTP/1.1 200 4.02 secs: 22 bytes ==> GET /worker/io/
    HTTP/1.1 200 4.01 secs: 22 bytes ==> GET /worker/io/
    Transactions: 4 hits
    Availability: 100.00 %
    Elapsed time: 4.02 secs
    Data transferred: 0.00 MB
    Response time: 3.01 secs
    Transaction rate: 1.00 trans/sec
    Throughput: 0.00 MB/sec
    Concurrency: 3.00
    Successful transactions: 4
    Failed transactions: 0
    Longest transaction: 4.02
    Shortest transaction: 2.01

    09:23:39 root@arch ~ → siege -c 4 -r 1 http://192.168.37.145/worker/cpu/ -v
    ** SIEGE 4.0.4
    ** Preparing 4 concurrent users for battle.
    The server is now under siege...
    HTTP/1.1 200 2.00 secs: 23 bytes ==> GET /worker/cpu/
    HTTP/1.1 200 2.00 secs: 23 bytes ==> GET /worker/cpu/
    # the following requests are blocked
    HTTP/1.1 200 3.97 secs: 23 bytes ==> GET /worker/cpu/
    HTTP/1.1 200 3.97 secs: 23 bytes ==> GET /worker/cpu/
    Transactions: 4 hits
    Availability: 100.00 %
    Elapsed time: 3.97 secs
    Data transferred: 0.00 MB
    Response time: 2.99 secs
    Transaction rate: 1.01 trans/sec
    Throughput: 0.00 MB/sec
    Concurrency: 3.01
    Successful transactions: 4
    Failed transactions: 0
    Longest transaction: 3.97
    Shortest transaction: 2.00
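The advantage of this worker type is a higher theoretical concurrency ceiling than with processes. The downside is still the thread count: the OS can only support a limited number of threads, and too many threads still burden the system. Note also that in the CPU-bound run the first two requests took 2.00 s each, about twice the 0.97 s of a single request in the sync run, so the two threads shared the CPU rather than adding throughput.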
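The timings in the sync and gthread runs follow from a simple slot model, where the number of slots is workers multiplied by threads per worker. The sketch below is an idealized illustration, not a description of Gunicorn internals: `completion_times` is a hypothetical helper, and it assumes all requests arrive at time 0, each takes a fixed service time, and overhead is ignored.

```python
import heapq


def completion_times(n_requests, slots, service_time):
    """Finish time of each request when all arrive at t=0 and `slots` run at once."""
    free_at = [0.0] * slots              # time at which each slot becomes free
    heapq.heapify(free_at)
    finished = []
    for _ in range(n_requests):
        start = heapq.heappop(free_at)   # take the slot that frees up first
        end = start + service_time
        finished.append(end)
        heapq.heappush(free_at, end)     # the slot is busy until `end`
    return finished


if __name__ == "__main__":
    print(completion_times(2, 1, 2.0))   # one sync worker, 2 IO requests
    print(completion_times(4, 2, 2.0))   # one gthread worker with 2 threads, 4 IO requests
```

With one slot the second request finishes at 4 s, and with two slots the third and fourth finish at 4 s, which is the pattern in the siege output for the IO-bound task.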
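Handling requests with asynchronous IO
When the gunicorn worker type is eventlet, gevent, tornado or similar, every request is handled by the same process. When a request hits IO, the process does not wait for the IO response; it goes on handling the next request until that IO completes. In theory there is no concurrency ceiling.
Taking gevent as an example, gunicorn starts one worker process (pid 36304 in the log) to handle requests: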
    09:44:31 (gunicorn_demo-bLt-GVNF) root@arch gdemo → gunicorn -w 1 -k gevent gdemo.wsgi -b 192.168.37.145:80
    [2018-06-24 09:47:35 +0800] [36301] [INFO] Starting gunicorn 19.8.1
    [2018-06-24 09:47:35 +0800] [36301] [INFO] Listening at: http://192.168.37.145:80 (36301)
    [2018-06-24 09:47:35 +0800] [36301] [INFO] Using worker: gevent
    [2018-06-24 09:47:35 +0800] [36304] [INFO] Booting worker with pid: 36304
Sending 10 requests to the IO-bound task with siege clearly shows that no request is blocked:
    10:06:55 root@arch ~ → siege -c 10 -r 1 http://192.168.37.145/worker/io/ -v
    ** SIEGE 4.0.4
    ** Preparing 10 concurrent users for battle.
    The server is now under siege...
    # clearly, no request is blocked
    HTTP/1.1 200 2.01 secs: 22 bytes ==> GET /worker/io/
    HTTP/1.1 200 2.00 secs: 22 bytes ==> GET /worker/io/
    HTTP/1.1 200 2.00 secs: 22 bytes ==> GET /worker/io/
    HTTP/1.1 200 2.00 secs: 22 bytes ==> GET /worker/io/
    HTTP/1.1 200 2.01 secs: 22 bytes ==> GET /worker/io/
    HTTP/1.1 200 2.01 secs: 22 bytes ==> GET /worker/io/
    HTTP/1.1 200 2.01 secs: 22 bytes ==> GET /worker/io/
    HTTP/1.1 200 2.01 secs: 22 bytes ==> GET /worker/io/
    HTTP/1.1 200 2.01 secs: 22 bytes ==> GET /worker/io/
    HTTP/1.1 200 2.01 secs: 22 bytes ==> GET /worker/io/
    Transactions: 10 hits
    Availability: 100.00 %
    Elapsed time: 2.02 secs
    Data transferred: 0.00 MB
    Response time: 2.01 secs
    Transaction rate: 4.95 trans/sec
    Throughput: 0.00 MB/sec
    Concurrency: 9.94
    Successful transactions: 10
    Failed transactions: 0
    Longest transaction: 2.01
    Shortest transaction: 2.00
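But with CPU-bound requests it degrades to the same behavior as handling requests with processes: the concurrency ceiling is the number of workers. Below, siege sends 10 requests to the CPU-bound task, and every request after the first is blocked: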
    10:07:38 root@arch ~ → siege -c 10 -r 1 http://192.168.37.145/worker/cpu/ -v
    ** SIEGE 4.0.4
    ** Preparing 10 concurrent users for battle.
    The server is now under siege...
    HTTP/1.1 200 0.96 secs: 23 bytes ==> GET /worker/cpu/
    # the following requests are blocked
    HTTP/1.1 200 1.90 secs: 23 bytes ==> GET /worker/cpu/
    HTTP/1.1 200 2.89 secs: 23 bytes ==> GET /worker/cpu/
    HTTP/1.1 200 4.16 secs: 23 bytes ==> GET /worker/cpu/
    HTTP/1.1 200 5.40 secs: 23 bytes ==> GET /worker/cpu/
    HTTP/1.1 200 6.38 secs: 23 bytes ==> GET /worker/cpu/
    HTTP/1.1 200 7.34 secs: 23 bytes ==> GET /worker/cpu/
    HTTP/1.1 200 8.30 secs: 23 bytes ==> GET /worker/cpu/
    HTTP/1.1 200 9.51 secs: 23 bytes ==> GET /worker/cpu/
    HTTP/1.1 200 10.52 secs: 23 bytes ==> GET /worker/cpu/
    Transactions: 10 hits
    Availability: 100.00 %
    Elapsed time: 10.52 secs
    Data transferred: 0.00 MB
    Response time: 5.74 secs
    Transaction rate: 0.95 trans/sec
    Throughput: 0.00 MB/sec
    Concurrency: 5.45
    Successful transactions: 10
    Failed transactions: 0
    Longest transaction: 10.52
    Shortest transaction: 0.96
So the pros and cons of asynchronous workers are very clear: they are highly efficient for IO-bound tasks, but for CPU-bound tasks they do worse than threads.
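The same contrast can be reproduced without a web server. The sketch below uses the standard library's asyncio as a stand-in for gevent (an assumption made for portability; gevent itself works through greenlets and monkey patching). The names `io_task`, `cpu_task` and `timed` are hypothetical, and Python 3.7+ is assumed for `asyncio.run`.

```python
import asyncio
import time


async def io_task(delay):
    # Cooperative wait: the event loop runs other tasks in the meantime.
    await asyncio.sleep(delay)


async def cpu_task(n):
    # Pure computation never yields, so the loop is stuck until it returns.
    total = 0
    for i in range(n):
        total += i * i * i
    return total


async def _gather(factory, count):
    # Schedule `count` coroutines on the same event loop and wait for all of them.
    await asyncio.gather(*(factory() for _ in range(count)))


def timed(factory, count):
    """Run `count` coroutines on one event loop and return the elapsed seconds."""
    started = time.perf_counter()
    asyncio.run(_gather(factory, count))
    return time.perf_counter() - started


if __name__ == "__main__":
    print("10 IO tasks :", round(timed(lambda: io_task(0.2), 10), 2), "s")
    print("1 CPU task  :", round(timed(lambda: cpu_task(200000), 1), 2), "s")
    print("10 CPU tasks:", round(timed(lambda: cpu_task(200000), 10), 2), "s")
```

Ten IO tasks should finish in roughly the time of one, while ten CPU tasks should take roughly ten times as long as one, matching the shape of the two gevent runs.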
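Conclusion
Performance always depends on the usage scenario. The claim that gunicorn + asynchronous IO always performs better does not necessarily hold.
The data above shows that each of the three worker types has scenarios it suits best:
- When you need a stable system, handling requests with processes ensures that a crash caused by one request's exception does not affect other requests.
- When the web service is mostly CPU computation, threads can provide decent performance.
- When the web service is mostly IO, asynchronous IO can reach very high concurrency.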
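Appendix
Glossary
WebSocket
WebSocket is a new protocol that has little to do with HTTP itself; it reuses HTTP only for the handshake, for compatibility with existing browsers. In that sense it is a supplement to HTTP.
WebSocket is a persistent protocol, whereas HTTP is not.
A simple example, explained in terms of the widely used PHP life cycle:
- The HTTP life cycle is bounded by the request: one request, one response. In HTTP 1.0 the HTTP request ends there.
- HTTP 1.1 improved this with keep-alive, so that several requests can be sent and several responses received over one HTTP connection.
But remember that Request = Response always holds in HTTP: one request gets exactly one response. The response is also passive; the server cannot initiate it.
So what does this have to do with WebSocket?
WebSocket builds on HTTP, or rather borrows HTTP to carry out part of the handshake.
The handshake phase looks the same as HTTP.
First, a typical WebSocket handshake (borrowed from Wikipedia):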
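    GET /chat HTTP/1.1
    Host: server.example.com
    Upgrade: websocket
    Connection: Upgrade
    Sec-WebSocket-Key: x3JJHMbDL1EzLkh9GBhXDw==
    Sec-WebSocket-Protocol: chat, superchat
    Sec-WebSocket-Version: 13
    Origin: http://example.com

Readers familiar with HTTP may notice that this HTTP-like handshake request contains a few extra things.
Their roles are explained along the way.

    Upgrade: websocket
    Connection: Upgrade

This is the core of WebSocket. It tells servers such as Apache or Nginx:
Note that this is a WebSocket request, so hand it to the matching handler rather than treating it as plain old HTTP.

    Sec-WebSocket-Key: x3JJHMbDL1EzLkh9GBhXDw==
    Sec-WebSocket-Protocol: chat, superchat
    Sec-WebSocket-Version: 13

First, Sec-WebSocket-Key is a Base64-encoded value randomly generated by the browser. It lets the client verify that the server really speaks WebSocket.
Next, Sec-WebSocket-Protocol is a user-defined string that distinguishes the protocols required by different services under the same URL. Put simply: I want service A, don't mix it up.
Finally, Sec-WebSocket-Version tells the server which WebSocket draft (protocol version) is in use. Early on, while WebSocket was still a draft, there were all sorts of odd variants, with Firefox and Chrome not even using the same version, and the sheer number of versions was a real problem. That is settled now and everyone uses the same one. In short: the client is asking for version 13.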
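The request headers described above can be assembled in a few lines. This is a minimal sketch for illustration, not a full client: `build_handshake` is a hypothetical helper, the host, path and origin are the ones from the Wikipedia example, and the key is generated as 16 random bytes in Base64, as RFC 6455 specifies.

```python
import base64
import os


def build_handshake(host, path, origin, protocols=()):
    """Return (key, request_text) for a WebSocket opening handshake."""
    # Sec-WebSocket-Key: 16 random bytes, Base64-encoded.
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    lines = [
        "GET %s HTTP/1.1" % path,
        "Host: %s" % host,
        "Upgrade: websocket",
        "Connection: Upgrade",
        "Sec-WebSocket-Key: %s" % key,
    ]
    if protocols:
        # Optional list of application subprotocols the client can speak.
        lines.append("Sec-WebSocket-Protocol: %s" % ", ".join(protocols))
    lines.append("Sec-WebSocket-Version: 13")
    lines.append("Origin: %s" % origin)
    # HTTP headers end with an empty line.
    return key, "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    key, request = build_handshake(
        "server.example.com", "/chat", "http://example.com", ("chat", "superchat")
    )
    print(request)
```

The client keeps `key` so that it can later check the server's Sec-WebSocket-Accept header against it.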
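The server then returns the following, indicating that it has received the request and the WebSocket connection is established:

    HTTP/1.1 101 Switching Protocols
    Upgrade: websocket
    Connection: Upgrade
    Sec-WebSocket-Accept: HSmrc0sMlYUkAGmm5OPpG2HaGWk=
    Sec-WebSocket-Protocol: chat

This is the last part HTTP is responsible for. It tells the client that the protocol switch succeeded:

    Upgrade: websocket
    Connection: Upgrade

These are again fixed values, telling the client that the upgrade is to the WebSocket protocol and not mozillasocket, lurnarsocket, or some other made-up socket.
Next, Sec-WebSocket-Accept is the server's confirmation of Sec-WebSocket-Key. The key is not encrypted but hashed: the server appends a fixed GUID to the key, takes the SHA-1 digest, and Base64-encodes it. In effect the server is showing its ID card as proof.
Finally, Sec-WebSocket-Protocol states the protocol that was finally chosen.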
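The Sec-WebSocket-Accept computation is short enough to check by hand. The sketch below follows the rule in RFC 6455 (append the fixed GUID, SHA-1, Base64); the function name `accept_value` is a hypothetical choice for this example.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_value(sec_websocket_key):
    """Compute the Sec-WebSocket-Accept header value for a given client key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Key taken from the handshake request shown earlier.
    print(accept_value("x3JJHMbDL1EzLkh9GBhXDw=="))
```

A client performs the same computation on the key it sent and compares the result with the header it receives; a mismatch means the peer is not a WebSocket server.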
At this point HTTP has done all of its work, and everything from here on follows the WebSocket protocol.
The details of the protocol are not covered here.
