Python in Practice - gunicorn: a WSGI container for production environments - 《Python实战之路》 (The Road to Python in Practice)

Overview
WSGI Server
Introduction to gunicorn
gunicorn in Practice

Overview

If you have built a web application in Python, for example with the Flask framework, you have almost certainly seen the following message when starting the service:

 * Serving Flask app "flask_server" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off

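Notice the WARNING.
It says that this is a development server that should not be used for a production deployment, and that a production WSGI server should be used instead.
So what is a WSGI server?
Gunicorn, the subject of this article, is one of the standouts among them.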

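WSGI Server

The Web Server Gateway Interface (Python Web Server Gateway Interface, abbreviated WSGI) is a simple, universal interface defined for the Python language between web servers and web applications or frameworks. Since WSGI was introduced, similar interfaces have appeared in many other languages.
WSGI distinguishes two sides:

Server/gateway
Application/framework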

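When handling a WSGI request, the server provides the application with environment information and a callback function. In the WSGI specification (PEP 3333) these are the environ dictionary and the start_response callable. The application calls start_response with the status line and the response headers, then returns the response body to the server as an iterable of bytes.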
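
To make the exchange concrete, here is a minimal sketch of the server side of a WSGI call in plain Python. The names demo_app and call_wsgi_app are made up for this illustration; environ and start_response are the names used by the WSGI specification, and setup_testing_defaults comes from the standard library's wsgiref package. A real server such as Gunicorn does far more (sockets, HTTP parsing, error handling), so treat this only as a picture of the calling convention.

```python
from wsgiref.util import setup_testing_defaults


def demo_app(environ, start_response):
    """A tiny WSGI application (hypothetical, for illustration only)."""
    body = f"You requested {environ['PATH_INFO']}\n".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_wsgi_app(app, path="/"):
    """Play the server role: build environ, supply the callback, collect the body."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the keys a WSGI app expects
    environ["PATH_INFO"] = path

    captured = {}

    def start_response(status, headers, exc_info=None):
        # The application reports status and headers through this callback.
        captured["status"] = status
        captured["headers"] = headers

    # The response body is whatever the application's iterable yields.
    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body
```

Calling call_wsgi_app(demo_app, "/hello") returns the status string, the header list, and the body bytes that a real server would turn into an HTTP response.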

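So-called "WSGI middleware" implements both sides of the API, so it can mediate between a WSGI server and a WSGI application: from the server's point of view the middleware acts as an application, and from the application's point of view it acts as a server. A middleware component can perform functions such as:

Routing a request to different application objects based on the target URL, after rewriting the environment variables accordingly.
Allowing multiple applications or frameworks to run side by side in the same process.
Load balancing and remote processing, by forwarding requests and responses over a network.
Content post-processing, such as applying XSLT stylesheets.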
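
The first item in the list, routing by URL, can be sketched in a few lines. The class PathRouter and the helper make_text_app below are hypothetical names invented for this example, not part of Gunicorn or any framework. The router is itself a WSGI callable (the application side) and calls the wrapped applications the way a server would (the server side).

```python
class PathRouter:
    """Hypothetical WSGI middleware that dispatches on the URL prefix."""

    def __init__(self, default_app, mounts):
        self.default_app = default_app
        self.mounts = mounts  # maps a prefix such as "/api" to a WSGI app

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        for prefix, app in self.mounts.items():
            if path == prefix or path.startswith(prefix + "/"):
                # Rewrite the environment: move the prefix into SCRIPT_NAME
                # so the mounted app only sees the rest of the path.
                environ = dict(environ)
                environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + prefix
                environ["PATH_INFO"] = path[len(prefix):]
                return app(environ, start_response)
        return self.default_app(environ, start_response)


def make_text_app(text):
    """Build a trivial WSGI app that echoes a label and the path it received."""
    def app(environ, start_response):
        body = f"{text} {environ['PATH_INFO']}".encode("utf-8")
        start_response("200 OK", [
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(body))),
        ])
        return [body]
    return app


# Two applications sharing one process behind a single WSGI entry point.
application = PathRouter(make_text_app("site"), {"/api": make_text_app("api")})
```

Because application is an ordinary WSGI callable, a WSGI server can serve it like any other app.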

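Introduction to gunicorn

Gunicorn is a Python WSGI HTTP server for UNIX, built on a pre-fork worker model: a master process forks a set of worker processes up front, and the workers handle the requests.
Gunicorn is broadly compatible with various web frameworks, simply implemented, light on server resources, and fairly fast.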

gunicorn in Practice

Installation

The first step is to install gunicorn. As with most Python libraries, installation is simple and only requires pip:

pip install gunicorn

QuickStart

Next, a minimal example shows how gunicorn is used.
Write the simplest possible application in a file named myapp.py:

def app(environ, start_response):
    data = b"Hello, World!\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(data)))
    ])
    return iter([data])

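As you can see, the file defines a function named app.
We can now start the web service directly with gunicorn. Here myapp:app means the app callable in the myapp module, and -w 4 starts four worker processes, which matches the four "Booting worker" lines in the log: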

gunicorn -w 4 myapp:app
# [2021-09-15 11:51:01 +0800] [10495] [INFO] Starting gunicorn 20.1.0
# [2021-09-15 11:51:01 +0800] [10495] [INFO] Listening at: http://127.0.0.1:8000 (10495)
# [2021-09-15 11:51:01 +0800] [10495] [INFO] Using worker: sync
# [2021-09-15 11:51:01 +0800] [10497] [INFO] Booting worker with pid: 10497
# [2021-09-15 11:51:01 +0800] [10498] [INFO] Booting worker with pid: 10498
# [2021-09-15 11:51:01 +0800] [10499] [INFO] Booting worker with pid: 10499
# [2021-09-15 11:51:01 +0800] [10500] [INFO] Booting worker with pid: 10500

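Now open a browser and visit http://127.0.0.1:8000; the page should show the text Hello, World! returned by the application.

The web application server is now up and running.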

Integrating a Flask Application

First, prepare a simple Flask application in flask_demo.py:

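from flask import Flask

app = Flask(__name__)


@app.route("/")
def hello_world():
    return "<p>Hello, World!</p>"


if __name__ == '__main__':
    # Entry point for running the file directly (Flask development server).
    # gunicorn imports `app` from this module and never executes this block.
    app.run(host="0.0.0.0", port=8080)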

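We can now start the application directly with gunicorn. The -b option sets the address to bind to; the host and port passed to app.run() are not used here, because gunicorn imports the module rather than running it as a script: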

gunicorn -w 4 -b 127.0.0.1:4000 flask_demo:app
# [2021-09-15 12:41:53 +0800] [13030] [INFO] Starting gunicorn 20.1.0
# [2021-09-15 12:41:53 +0800] [13030] [INFO] Listening at: http://127.0.0.1:4000 (13030)
# [2021-09-15 12:41:53 +0800] [13030] [INFO] Using worker: sync
# [2021-09-15 12:41:53 +0800] [13032] [INFO] Booting worker with pid: 13032
# [2021-09-15 12:41:53 +0800] [13033] [INFO] Booting worker with pid: 13033
# [2021-09-15 12:41:53 +0800] [13034] [INFO] Booting worker with pid: 13034
# [2021-09-15 12:41:53 +0800] [13035] [INFO] Booting worker with pid: 13035

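At this point the service starts up normally.

Configuration Files

At startup, Gunicorn reads its configuration from the following sources, listed from highest to lowest priority: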

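Command-line arguments.
Command-line arguments set in the GUNICORN_CMD_ARGS environment variable.
An optional configuration file, gunicorn.conf.py, in the current working directory or at a path given on the command line.
Framework-specific configuration files.
Environment variables (read for some basic settings).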
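
The precedence rule can be pictured as a layered lookup in which the first layer that defines a setting wins. The sketch below uses collections.ChainMap from the standard library purely as an analogy. It is not how Gunicorn is implemented internally, and the values in each layer are invented for the example.

```python
from collections import ChainMap

# Invented example values; each dict stands for one configuration source.
command_line = {"bind": "127.0.0.1:4000"}
cmd_args_env = {"workers": 2}  # stands for GUNICORN_CMD_ARGS
config_file = {"bind": "0.0.0.0:8000", "workers": 9, "loglevel": "warning"}

# ChainMap searches left to right, so earlier layers override later ones.
effective = ChainMap(command_line, cmd_args_env, config_file)

for name in ("bind", "workers", "loglevel"):
    print(name, "=", effective[name])
```

In this made-up scenario the command line decides bind, the environment variable decides workers, and only loglevel falls through to the configuration file.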

Note: the following command prints the fully resolved configuration:

gunicorn --print-config APP_MODULE

To see which command-line arguments gunicorn supports, run:

gunicorn -h

A gunicorn configuration file is a Python source file with a .py extension (gunicorn.conf.py by default). It is executed every time Gunicorn starts. For example:

import multiprocessing
bind = "127.0.0.1:8000"
workers = multiprocessing.cpu_count() * 2 + 1

Below is an example of a commonly used Gunicorn configuration file:

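import multiprocessing

bind = "0.0.0.0:8000"
workers = multiprocessing.cpu_count() * 2 + 1
pidfile = "gunicorn.pid"
daemon = True  # run in the background; a real boolean rather than the string "true"
accesslog = 'gunicorn_access.log'
errorlog = 'gunicorn_error.log'
loglevel = 'warning'
# Fields: time, process ID, remote address, request line, status,
# request time in seconds, response length, referer, user agent
access_log_format = '%(t)s %(p)s %(h)s "%(r)s" %(s)s %(L)s %(b)s "%(f)s" "%(a)s"'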

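Nginx Proxy

In production, we strongly recommend running Gunicorn behind a reverse proxy.
For example, Nginx should be deployed in front of Gunicorn to receive the incoming traffic.
A sample Nginx configuration looks like this: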

worker_processes 1;
user nobody nogroup;
error_log  /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;
events {
  worker_connections 1024; # increase if you have lots of clients
  accept_mutex off; # set to 'on' if nginx worker_processes > 1
  # 'use epoll;' to enable for Linux 2.6+
}
http {
  include mime.types;
  default_type application/octet-stream;
  access_log /var/log/nginx/access.log combined;
  sendfile on;
  upstream app_server {
    # fail_timeout=0 means we always retry an upstream even if it failed to return a good HTTP response
    # for UNIX domain socket setups
    server unix:/tmp/gunicorn.sock fail_timeout=0;
    # for a TCP configuration
    # server 192.168.0.7:8000 fail_timeout=0;
  }
  server {
    # if no Host match, close the connection to prevent host spoofing
    listen 80 default_server;
    return 444;
  }
  server {
    # use 'listen 80 deferred;' for Linux
    # use 'listen 80 accept_filter=httpready;' for FreeBSD
    listen 80;
    client_max_body_size 4G;
    # set the correct host(s) for your site
    server_name example.com www.example.com;
    keepalive_timeout 5;
    # path for static files
    root /path/to/app/current/public;
    location / {
      # checks for static file, if not found proxy to app
      try_files $uri @proxy_to_app;
    }
    location @proxy_to_app {
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header X-Forwarded-Proto $scheme;
      proxy_set_header Host $http_host;
      # we don't want nginx trying to do something clever with
      # redirects, we set the Host: header above already.
      proxy_redirect off;
      proxy_pass http://app_server;
    }
    error_page 500 502 503 504 /500.html;
    location = /500.html {
      root /path/to/app/current/public;
    }
  }
}
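
The sample above forwards requests to a UNIX domain socket at /tmp/gunicorn.sock, whereas the earlier Gunicorn examples bind to a TCP address. A matching gunicorn.conf.py might look like the sketch below. bind and workers are the same settings used earlier, and the unix: prefix is Gunicorn's syntax for a socket path. The path is taken from the Nginx sample; adjust it if your upstream differs.

```python
# gunicorn.conf.py (sketch): listen on the socket that the Nginx upstream points to
import multiprocessing

bind = "unix:/tmp/gunicorn.sock"
workers = multiprocessing.cpu_count() * 2 + 1
```

If you prefer the commented-out TCP upstream in the Nginx sample, keep a host:port value in bind instead.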

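Signal Handling

Next, here are the signals commonly sent to a Gunicorn service, focusing on those handled by the master process:

QUIT/INT: quick shutdown.
TERM: graceful shutdown.
HUP: reload the configuration, start new worker processes with the new configuration, and gracefully shut down the old workers. If the application is not preloaded (using the preload_app option), Gunicorn also loads the new version of it.
USR1: reopen the log files.

For example, after updating the application code, we can reload the service with the following command: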

kill -HUP `cat gunicorn.pid`
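
The same reload can be triggered from Python, for instance inside a deployment script. This is a minimal sketch that assumes the pid file is named gunicorn.pid, as in the configuration example earlier; read_pid and reload_gunicorn are hypothetical helper names, while os.kill and signal.SIGHUP are standard library calls available on UNIX.

```python
import os
import signal
from pathlib import Path


def read_pid(pidfile="gunicorn.pid"):
    """Return the master process ID stored in the pid file."""
    return int(Path(pidfile).read_text().strip())


def reload_gunicorn(pidfile="gunicorn.pid"):
    """Send SIGHUP to the Gunicorn master, equivalent to `kill -HUP <pid>`."""
    os.kill(read_pid(pidfile), signal.SIGHUP)
```

Calling reload_gunicorn() from the directory that contains the pid file has the same effect as the shell command above.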