Flask study notes (2): application deployment

  This article describes how to deploy a Flask application.
  Deploying a Flask application mainly means using multi-threading and multi-processing to improve the concurrency of its interfaces. We will use the following Python code (server.py) as the running example:

# -*- coding: utf-8 -*-
import time
import datetime
from flask import Flask, jsonify

app = Flask(__name__)


@app.route('/')
def hello_world():
    time.sleep(15)
    return 'Hello World!'


@app.route('/index')
def beijing():
    return 'Shanghai'


@app.route('/tell_time')
def tell_time():
    start_time_desc = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    time.sleep(5)
    end_time_desc = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    return jsonify({
        "start_time_desc": start_time_desc,
        "end_time_desc": end_time_desc
    })


@app.route('/tell_time/<int:_id>', methods=['GET'])
def hello_index(_id):
    start_time_desc = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    time.sleep(5)
    end_time_desc = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    return jsonify({
        "id": _id,
        "start_time_desc": start_time_desc,
        "end_time_desc": end_time_desc
    })


if __name__ == '__main__':
    app.run(host="0.0.0.0", port=5000, threaded=False)

The above program exposes four API interfaces, described as follows:

  • Interface 1: /, sleeps for 15s and then returns Hello World!
  • Interface 2: /index, returns Shanghai
  • Interface 3: /tell_time, sleeps for 5s and returns a dictionary with the start and end times
  • Interface 4: /tell_time/<int:_id>, uses the _id suffix to distinguish URLs; otherwise the same as interface 3.

  If we start the application with python3 server.py, it is deployed single-threaded (note threaded=False above), so the interfaces block one another: while a time-consuming interface is being handled, calls to the other interfaces are blocked. In this example, if we access interface 2 while interface 1 is still running, the request blocks, as shown in the figure below:
interface blocking
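The blocking described above is easy to reproduce without Flask. The sketch below (standard library only; the 0.2 s delays are arbitrary stand-ins for the sleeps in server.py) simulates handlers that sleep, served first one at a time and then by a thread pool:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request(delay):
    # Stands in for a request handler that sleeps, like the / route above
    time.sleep(delay)
    return "Hello World!"

# Single-threaded serving: three requests are handled one after another
start = time.time()
for d in (0.2, 0.2, 0.2):
    handle_request(d)
sequential = time.time() - start

# Multi-threaded serving: the sleeping requests overlap
start = time.time()
with ThreadPoolExecutor(max_workers=3) as pool:
    list(pool.map(handle_request, (0.2, 0.2, 0.2)))
threaded = time.time() - start

print(f"sequential: {sequential:.2f}s  threaded: {threaded:.2f}s")
```

With the thread pool, the three 0.2 s "requests" finish in roughly 0.2 s instead of 0.6 s, which is the same reason multi-threaded deployment unblocks the interfaces.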
  Next, we will introduce three Flask deployment methods to avoid the above-mentioned interface blocking problem and improve the concurrency of the interface.
  The Flask version used in this article is 2.3.2.

Set multi-thread or multi-process

  When running a Flask application, app.run() accepts two parameters, threaded and processes, which enable thread support and process support respectively.

  1. threaded: multi-thread support; the default is True, i.e. multi-threading is enabled;
  2. processes: the number of processes; the default is 1. Note that threaded=True and processes > 1 cannot be combined: Werkzeug's development server refuses to be both multi-threaded and multi-process.

threaded defaults to True in Flask 2.3.2, but defaulted to False in earlier versions (before Flask 1.0).
  If we set threaded=True in app.run() (or simply omit it), the application is deployed multi-threaded; the code change is as follows:

	app.run(host="0.0.0.0", port=5000, threaded=True)

At this time, when we access interface 1 and interface 2 at the same time, there will be no blockage, as shown in the figure below:
Multithreaded deployment
  Note: do not use a browser for this experiment. When two requests target the same URL, the browser may optimize them to reuse the same socket connection, which makes the requests appear serialized.
  Let's take interface 3 as an example and open http://127.0.0.1:5000/tell_time in two Chrome tabs at the same time; the result is as follows:
Using the Chrome browser, the interface is still blocked
But when we switch to interface 4 (with different _id values), there is no blocking, as shown in the figure below:
Chrome browser, the interface is not blocked

Use the gevent module

  gevent is a coroutine-based Python networking library built on greenlet. It wraps an event loop (historically libevent, libev in current versions) in high-level synchronous APIs, letting us write asynchronous I/O code in a synchronous style without changing our programming habits. For I/O-bound workloads, gevent generally performs better than traditional threads. We may cover the gevent module separately later.
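gevent itself is a third-party dependency, but the idea of overlapping waits on a single thread can be illustrated with the standard library's asyncio. This is only an analogy for the cooperative-I/O concept, not how gevent is implemented:

```python
import asyncio
import time

async def handle(delay):
    # Cooperative sleep: yields control to the event loop instead of blocking
    await asyncio.sleep(delay)
    return "done"

async def main():
    # Three "requests" wait concurrently on a single thread
    return await asyncio.gather(handle(0.2), handle(0.2), handle(0.2))

start = time.time()
results = asyncio.run(main())
elapsed = time.time() - start
print(results, f"{elapsed:.2f}s")
```

The three 0.2 s waits complete in about 0.2 s total; gevent achieves the same overlap, but transparently, by patching the blocking standard-library calls.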
  When deploying Flask with gevent, apply the monkey patch gevent.monkey at the very beginning of the program, before other imports; it modifies Python's default blocking I/O behavior and turns the standard library into cooperative (coroutine-friendly) APIs. The sample code is as follows:

from gevent import pywsgi
from gevent import monkey
monkey.patch_all()  # apply the monkey patch before other imports

from flask import Flask
...

if __name__ == '__main__':
    app.debug = True
    server = pywsgi.WSGIServer(('127.0.0.1', 5000), app)
    server.serve_forever()

Using the gunicorn module

  Gunicorn is a Python WSGI HTTP server. It usually sits between a reverse proxy (such as Nginx) or a load balancer (such as AWS ELB) and a web application (such as Django or Flask). It is a pre-fork worker model ported from Ruby's Unicorn project, and it supports eventlet and gevent (greenlet-based) workers.
  Usually, when we use the gunicorn module to deploy Flask applications, we use it in combination with configuration files, such as the following gunicorn configuration file (gunicorn_config.py):

# -*- coding: utf-8 -*-
# Configuration file for gunicorn + gevent
import multiprocessing

timeout = 600
debug = False

# Preload the application before forking workers
preload_app = True
# Bind IP + port
bind = "0.0.0.0:5000"
# Number of worker processes = CPU count * 2 + 1
# workers = multiprocessing.cpu_count() * 2 + 1
workers = 2

# Number of threads per worker = CPU count * 2
# threads = multiprocessing.cpu_count() * 2
threads = 5

# Maximum length of the pending-connection queue; connections beyond it are refused
backlog = 2048

# Worker mode: coroutines
worker_class = "gevent"

# Maximum number of simultaneous clients; only affects threaded and coroutine workers.
# Around 1200 for small/medium projects; tens of thousands for large high-concurrency
# systems, depending on hardware (bandwidth, database, memory) and architecture
# (clustering, primary/replica).
worker_connections = 1200

# Process name
proc_name = 'flask_server'
# File to record the process PID
pidfile = 'gunicorn.pid'
# Log level
loglevel = 'debug'
# Error log file
errorlog = 'debug.log'
# Access log file
accesslog = 'access.log'
# Access log format
access_log_format = '%(h)s %(t)s %(U)s %(q)s'

The command to deploy is: gunicorn -c gunicorn_config.py server:app. This deployment method combines multiple processes with threads/coroutines, the configuration can be tuned to the machine, and it is suitable for high-concurrency scenarios.
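The commented-out sizing formulas in the config above (workers = CPU count * 2 + 1, threads = CPU count * 2) can be computed directly with the standard library; a minimal sketch:

```python
import multiprocessing

# Common sizing heuristics, as in the gunicorn config comments above
cpu_count = multiprocessing.cpu_count()
workers = cpu_count * 2 + 1   # worker processes
threads = cpu_count * 2       # threads per worker
print(f"workers={workers}, threads={threads}")
```

On a 4-core machine this yields workers=9 and threads=8, the values used in the stress test below.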

Stress test

  We stress-test the /tell_time interface under two of the deployments above: multi-threaded deployment (method 1) and gunicorn deployment (method 3). The tool used is jmeter.
  In jmeter, we configure 5000 user requests to the /tell_time interface within 1 second, for 1 round, as shown in the figure below:
Jmeter settings
  Using the first deployment method (threaded=True in app.run()), the test results are as follows:

  Using the third deployment method (gunicorn, with 4 CPU cores, workers=9, threads=8), the test results are as follows:

  It can be seen that with the third deployment method the number of successful HTTP requests and the throughput are both higher, so its high-concurrency performance is better than that of the first deployment method.

Summary

  This article mainly introduces three common ways of deploying Flask applications with high concurrency. I hope readers can practice more in actual work and improve their work skills~

Origin blog.csdn.net/jclian91/article/details/131544337