Flask Analysis (Part 2): Flask-SQLAlchemy with Multithreading and Multiprocessing

Original author: flowell. Please cite the source when reprinting: https://www.cnblogs.com/flowell/p/multiprocessing_flask_sqlalchemy.html

SQLAlchemy

  Flask-SQLAlchemy's session is thread-safe, but in a multi-process environment you must make sure that the parent process has no open database connection at the moment it spawns child processes. You can call db.get_engine(app=app).dispose() to manually dispose of the engine that has already been created, and only then spawn the child processes.

  Our project recently kept reporting database-connection errors in production, such as "Commands out of sync", "MySQL server has gone away", "Lost database connection" and "Packet sequence out of order". After tracking them down, I found they fall into two categories: errors related to lost connections, and errors related to one connection being used by multiple threads (or processes) at the same time.

  Our project is built on Flask and has both multi-threaded and multi-process scenarios. For the ORM we use Flask-SQLAlchemy, a Flask extension. Flask-SQLAlchemy is bound to a Flask app instance, which means it has to be used inside an app context, so in some offline (non-web) scenarios we also use native SQLAlchemy.

  Native SQLAlchemy is used like this:

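from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine(db_url)       # db_url: your database URL
Session = sessionmaker(bind=engine)  # a session factory bound to the engine
session = Session()                  # no connection is opened yet
session.query(xxx)                   # xxx: a mapped model class; the first query checks out a connection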

  First you create an engine. As the name suggests, the engine is what drives the connections to the database. No connection is created until the first actual query is issued. When creating the engine you can choose the connection pool implementation with the poolclass parameter: the default is QueuePool, and it can be set to NullPool to use no connection pool at all. For ease of understanding, you can think of the engine as an object that manages a connection pool.

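To make the "engine as a lazily connecting pool" idea concrete, here is a minimal standard-library sketch. ToyEngine is a hypothetical class, not SQLAlchemy's Engine, and in-memory sqlite3 connections stand in for real database connections; it only illustrates that nothing is opened until the first checkout and that returned connections are reused, roughly the way QueuePool behaves.

```python
import sqlite3
from queue import Empty, Queue


class ToyEngine:
    """Hypothetical engine-like object: owns a pool and connects lazily."""

    def __init__(self, database: str = ":memory:", pool_size: int = 5):
        self.database = database
        self.pool_size = pool_size
        self._idle: Queue = Queue()   # connections waiting to be reused
        self.opened = 0               # real connections created so far

    def connect(self) -> sqlite3.Connection:
        """Check out an idle connection, opening a new one only if none is idle."""
        try:
            return self._idle.get_nowait()
        except Empty:
            self.opened += 1
            return sqlite3.connect(self.database)

    def release(self, conn: sqlite3.Connection) -> None:
        """Keep the connection for reuse, or close it if the pool is full."""
        if self._idle.qsize() < self.pool_size:
            self._idle.put(conn)
        else:
            conn.close()


engine = ToyEngine()
print(engine.opened)                          # 0: creating the engine opened nothing
conn = engine.connect()
print(conn.execute("SELECT 1").fetchone())    # (1,)
engine.release(conn)
engine.release(engine.connect())              # reuses the idle connection
print(engine.opened)                          # 1
```

With pool_size=0 every returned connection is closed instead of kept, which is roughly what NullPool does.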

  The session in SQLAlchemy and the database session we usually talk about are two different concepts. In a database, a session's life cycle starts when the connection is established and ends when the connection is closed. SQLAlchemy's session, by contrast, is more of an object that manages connections: it takes a connection out of the connection pool, uses it, releases it back, and is eventually destroyed itself. The object that actually manages a database connection in SQLAlchemy is Connection, and the real underlying database connection is the DBAPI connection.

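The borrow-and-return relationship between a session and a pooled connection can be sketched as follows. ToySession is a hypothetical stand-in, not SQLAlchemy's Session, and a sqlite3 connection plays the role of the DBAPI connection; the point is that closing the session hands the connection back to the pool instead of closing it.

```python
import sqlite3
from queue import Queue


class ToySession:
    """Hypothetical session: borrows a pooled connection on first use, returns it on close."""

    def __init__(self, pool: Queue):
        self._pool = pool
        self._conn = None

    def execute(self, sql: str):
        if self._conn is None:               # lazily check out a connection
            self._conn = self._pool.get()
        return self._conn.execute(sql).fetchall()

    def close(self) -> None:
        if self._conn is not None:
            self._pool.put(self._conn)       # return it; the connection stays open
            self._conn = None


pool: Queue = Queue()
pool.put(sqlite3.connect(":memory:"))    # a pool holding one real (DBAPI) connection

session = ToySession(pool)
print(pool.qsize())                      # 1: nothing borrowed yet
print(session.execute("SELECT 42"))      # [(42,)]
print(pool.qsize())                      # 0: the session holds the connection
session.close()
print(pool.qsize())                      # 1: returned, still open for reuse
```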

  By default, if you do not pass poolclass, QueuePool is used (a pool that keeps up to a configured number of connections). If you do not specify the pool_recycle parameter, pooled connections are never refreshed: a connection that sits unused simply stays in the pool as it is. MySQL, however, has timeouts of its own. Run "show variables like '%timeout%';" and you will see wait_timeout and interactive_timeout, both 28800 seconds (8 hours) by default. They mean that if a connection has had no interaction with MySQL for eight hours, MySQL closes it. So after eight hours MySQL has cut the connection while the SQLAlchemy client side still holds on to it, and when that connection is later taken out of the pool and used, errors such as "MySQL server has gone away" and other lost-connection errors are raised.

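The failure can be simulated without a MySQL server. The sketch below assumes a hypothetical FakeMySQLConnection that mimics wait_timeout with a hand-advanced clock, and a hypothetical ServerGoneAway exception; real drivers raise their own exception types.

```python
class ServerGoneAway(Exception):
    """Hypothetical stand-in for the driver's "server has gone away" error."""


class FakeMySQLConnection:
    """Simulated connection that the server drops after wait_timeout idle seconds."""

    def __init__(self, clock, wait_timeout: int = 28800):
        self.clock = clock
        self.wait_timeout = wait_timeout
        self.last_active = clock()

    def query(self, sql: str) -> str:
        # The server has already closed a connection idle for too long;
        # the client only finds out when it tries to use it.
        if self.clock() - self.last_active > self.wait_timeout:
            raise ServerGoneAway("MySQL server has gone away")
        self.last_active = self.clock()
        return f"ok: {sql}"


now = [0]                               # fake clock in seconds, advanced by hand


def clock() -> int:
    return now[0]


conn = FakeMySQLConnection(clock)       # kept alive in a client-side pool
print(conn.query("SELECT 1"))           # ok: SELECT 1
now[0] += 9 * 3600                      # nine idle hours pass
caught = None
try:
    conn.query("SELECT 1")
except ServerGoneAway as exc:
    caught = exc
print("error:", caught)                 # error: MySQL server has gone away
```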

  The solution is as simple as passing the pool_recycle parameter. Notably, this problem does not occur with flask-sqlalchemy, because for MySQL URLs flask-sqlalchemy automatically fills in a pool_recycle parameter for us, defaulting to 7200 seconds:

    def apply_driver_hacks(self, app, sa_url, options):
        """This method is called before engine creation and used to inject
        driver specific hacks into the options.  The `options` parameter is
        a dictionary of keyword arguments that will then be used to call
        the :func:`sqlalchemy.create_engine` function.
        The default implementation provides some saner defaults for things
        like pool sizes for MySQL and sqlite.  Also it injects the setting of
        `SQLALCHEMY_NATIVE_UNICODE`.
        """
        if sa_url.drivername.startswith('mysql'):
            sa_url.query.setdefault('charset', 'utf8')
            if sa_url.drivername != 'mysql+gaerdbms':
                options.setdefault('pool_size', 10)
                options.setdefault('pool_recycle', 7200)  # refresh connections every 7200 seconds by default
        elif sa_url.drivername == 'sqlite':
            pool_size = options.get('pool_size')
            detected_in_memory = False
            if sa_url.database in (None, '', ':memory:'):
                detected_in_memory = True
                from sqlalchemy.pool import StaticPool
                options['poolclass'] = StaticPool
                if 'connect_args' not in options:
                    options['connect_args'] = {}
                options['connect_args']['check_same_thread'] = False

                # we go to memory and the pool size was explicitly set
                # to 0 which is fail.  Let the user know that
                if pool_size == 0:
                    raise RuntimeError('SQLite in memory database with an '
                                       'empty queue not possible due to data '
                                       'loss.')
            # if pool size is None or explicitly set to 0 we assume the
            # user did not want a queue for this sqlite connection and
            # hook in the null pool.
            elif not pool_size:
                from sqlalchemy.pool import NullPool
                options['poolclass'] = NullPool

            # if it's not an in memory database we make the path absolute.
            if not detected_in_memory:
                sa_url.database = os.path.join(app.root_path, sa_url.database)

        unu = app.config['SQLALCHEMY_NATIVE_UNICODE']
        if unu is None:
            unu = self.use_native_unicode
        if not unu:
            options['use_native_unicode'] = False

        if app.config['SQLALCHEMY_NATIVE_UNICODE'] is not None:
            warnings.warn(
                "The 'SQLALCHEMY_NATIVE_UNICODE' config option is deprecated and will be removed in"
                " v3.0.  Use 'SQLALCHEMY_ENGINE_OPTIONS' instead.",
                DeprecationWarning
            )
        if not self.use_native_unicode:
            warnings.warn(
                "'use_native_unicode' is deprecated and will be removed in v3.0."
                "  Use the 'engine_options' parameter instead.",
                DeprecationWarning
            )

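pool_recycle works on the age of a connection: when a connection older than the limit is checked out, the pool discards it and opens a new one. The hypothetical RecyclingPool below is a toy single-connection model of that rule, not SQLAlchemy's pool code; -1 disables recycling, matching SQLAlchemy's default. With recycle=7200, the flask-sqlalchemy default shown above, a connection is never reused once it is more than two hours old, well inside MySQL's default 8-hour wait_timeout.

```python
class RecyclingPool:
    """Hypothetical single-connection pool applying an age-based recycle rule."""

    def __init__(self, connect, clock, recycle: int = -1):
        self._connect = connect   # factory that opens a new connection
        self._clock = clock
        self.recycle = recycle    # seconds; -1 disables recycling
        self._conn = None
        self._born = 0

    def checkout(self):
        too_old = self.recycle > -1 and self._clock() - self._born > self.recycle
        if self._conn is None or too_old:
            # A real pool would also close the old connection here.
            self._conn = self._connect()
            self._born = self._clock()
        return self._conn


now = [0]                           # fake clock in seconds


def clock() -> int:
    return now[0]


created = []


def connect() -> object:
    created.append(clock())         # record when each connection was opened
    return object()                 # placeholder for a real connection


pool = RecyclingPool(connect, clock, recycle=7200)
first = pool.checkout()
now[0] += 3600                      # 1 hour later: young enough, reused
print(pool.checkout() is first)     # True
now[0] += 9 * 3600                  # 10 hours after opening: past the limit
second = pool.checkout()
print(second is first)              # False: a fresh connection replaced it
print(created)                      # [0, 36000]
```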

  sessionmaker is a way to customize Session: pass an engine to sessionmaker and you get a session factory, which produces the actual session objects. The sessions produced this way, however, are not thread-safe. SQLAlchemy provides scoped_session to produce thread-safe sessions. The principle is similar to Local: scoped_session is a proxy for the session, and it finds the session that actually belongs to the current thread by the thread's identity.

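The per-thread lookup described above can be sketched with a dictionary keyed by threading.get_ident(). ThreadScopedRegistry is hypothetical; SQLAlchemy's scoped_session, when no scopefunc is given, stores sessions in a threading.local instead, which also avoids stale entries when a finished thread's id is reused.

```python
import threading


class ThreadScopedRegistry:
    """Hypothetical registry handing out one object per thread, keyed by thread id."""

    def __init__(self, factory):
        self._factory = factory
        self._objects = {}
        self._lock = threading.Lock()

    def __call__(self):
        key = threading.get_ident()        # id of the calling thread
        with self._lock:
            if key not in self._objects:
                self._objects[key] = self._factory()
            return self._objects[key]


registry = ThreadScopedRegistry(object)    # object() stands in for Session()
main_session = registry()
print(registry() is main_session)          # True: same thread, same session

seen = []
worker = threading.Thread(target=lambda: seen.append(registry()))
worker.start()
worker.join()
print(seen[0] is main_session)             # False: the worker got its own session
```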

  Flask-SQLAlchemy uses scoped_session to ensure thread safety. In the SQLAlchemy class you can see that the session is constructed with scoped_session:

    def create_scoped_session(self, options=None):
        """Create a :class:`~sqlalchemy.orm.scoping.scoped_session`
        on the factory from :meth:`create_session`.
        An extra key ``'scopefunc'`` can be set on the ``options`` dict to
        specify a custom scope function.  If it's not provided, Flask's app
        context stack identity is used. This will ensure that sessions are
        created and removed with the request/response cycle, and should be fine
        in most cases.
        :param options: dict of keyword arguments passed to session class  in
            ``create_session``
        """

        if options is None:
            options = {}

        scopefunc = options.pop('scopefunc', _app_ctx_stack.__ident_func__)
        options.setdefault('query_cls', self.Query)
        return orm.scoped_session(
            self.create_session(options), scopefunc=scopefunc
        )

    def create_session(self, options):
        """Create the session factory used by :meth:`create_scoped_session`.
        The factory **must** return an object that SQLAlchemy recognizes as a session,
        or registering session events may raise an exception.
        Valid factories include a :class:`~sqlalchemy.orm.session.Session`
        class or a :class:`~sqlalchemy.orm.session.sessionmaker`.
        The default implementation creates a ``sessionmaker`` for :class:`SignallingSession`.
        :param options: dict of keyword arguments passed to session class
        """

        return orm.sessionmaker(class_=SignallingSession, db=self, **options)

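create_scoped_session passes a scopefunc, so sessions are keyed by whatever that function returns, and flask-sqlalchemy (2.x) calls session.remove() when the app context is torn down. The sketch below imitates that lifecycle with a hypothetical ScopedRegistry and a contextvars variable standing in for the current scope; the real scoped_session.remove() also closes the session before discarding it.

```python
import contextvars

# Hypothetical marker for "the current scope" (e.g. one app context).
current_scope = contextvars.ContextVar("current_scope", default="none")


class ScopedRegistry:
    """Hypothetical registry: one object per scopefunc() key, dropped by remove()."""

    def __init__(self, factory, scopefunc):
        self._factory = factory
        self._scopefunc = scopefunc
        self._objects = {}

    def __call__(self):
        key = self._scopefunc()
        if key not in self._objects:
            self._objects[key] = self._factory()
        return self._objects[key]

    def remove(self):
        # Forget the current scope's object, as done at app-context teardown.
        self._objects.pop(self._scopefunc(), None)


db_session = ScopedRegistry(factory=dict, scopefunc=current_scope.get)

token = current_scope.set("app-context-1")
s1 = db_session()
print(db_session() is s1)      # True: same scope, same session
db_session.remove()            # teardown
s2 = db_session()
print(s2 is s1)                # False: a fresh session after remove()
current_scope.reset(token)
```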

Multiprocessing and database connections

  In a multi-process environment, pay close attention to every operation that touches database connections.

  When it comes to multiple processes, the most commonly used Python module is multiprocessing. multiprocessing behaves differently on Windows and Linux; only Linux is discussed here. On Linux, multiprocessing spawns child processes with fork(), so to follow the rest of this section you need to understand what fork() does. Roughly speaking, each process has its own space, called its process (address) space, and the address spaces of different processes are independent, so processes do not interfere with each other. fork() makes a complete copy of a process's address space, and the copy becomes the child process; that is why we say the child has exactly the same address space as the parent. The address space of a running process includes its open file descriptors, which indirectly point to the files the process has opened. In other words, after fork() the parent and the child hold the same file descriptors, pointing to the same files. Why? Because the open file itself lives outside the process's memory (on disk or in the kernel): fork() copies the memory, not the files. As a result, parent and child both point to the same file, and either of them can operate on it.

  What does this have to do with databases? Follow the chain: a database connection is a TCP connection, a TCP connection is a socket, and on Linux a socket is a file. So if the parent process opens a database connection before fork(), the child inherits that same open connection.

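The file-descriptor sharing is easy to observe with the standard library on Linux. In this sketch a socket.socketpair() stands in for a database client socket and the server's end of it (an assumption that keeps the example self-contained): the child writes through the socket it inherited from the parent, and the data arrives at the same peer.

```python
import os
import socket

# A connected socket pair stands in for a DB client socket and the server end.
client, server = socket.socketpair()

pid = os.fork()                      # POSIX only
if pid == 0:
    # Child: the inherited `client` fd refers to the same socket as the parent's.
    client.sendall(b"hello from child")
    os._exit(0)

os.waitpid(pid, 0)                   # parent: wait for the child to finish
received = server.recv(1024)
print(received)                      # b'hello from child'
```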

  When two processes write to the same connection at the same time, the data they send gets interleaved and the protocol falls out of step, which produces the "Commands out of sync" error. When two processes read from the same connection at the same time, one process may consume the result while the other reads nothing, which shows up as "No result". When one process closes the connection without the other knowing, the other process's next attempt to use it raises a "Lost database connection" error.

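The "one reads, the other gets nothing" case can be forced deterministically. In the sketch below (again a socketpair standing in for a real database connection, with a hypothetical reply string), the parent sends a query, forks before reading the answer, and the child reads from the inherited socket first.

```python
import os
import socket

client, server = socket.socketpair()   # client side of a "DB connection" + fake server

client.sendall(b"SELECT 1")            # parent sends a query...
server.recv(1024)
server.sendall(b"result-for-parent")   # ...and the server's reply is now waiting

report_r, report_w = os.pipe()         # lets the child report what it read
pid = os.fork()
if pid == 0:
    stolen = client.recv(1024)         # child reads from the shared connection
    os.write(report_w, stolen)
    os._exit(0)

os.waitpid(pid, 0)
child_saw = os.read(report_r, 1024)
client.settimeout(0.5)
try:
    parent_saw = client.recv(1024)
except socket.timeout:
    parent_saw = b""                   # the parent's own reply is gone
print(child_saw)                       # b'result-for-parent'
print(parent_saw)                      # b''
```

In production the interleaving depends on timing, which is why these errors tend to appear intermittently.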

  The scenario discussed here is one where the parent process already has a database connection open before it spawns child processes; after the fork, the children inherit that connection. If the parent has not opened any database connection before fork(), there is nothing to worry about. For example, Celery's prefork pool is a multi-process model, but Celery does not open database connections before forking its child processes, so Celery tasks do not run into these tangled-connection problems.

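One way to avoid using connections opened before the fork, without controlling the fork itself, is to open connections lazily and remember which process opened them. The get_connection helper below is a hypothetical sketch of that pid-check pattern, with sqlite3 standing in for a networked database.

```python
import os
import sqlite3

_conn = None
_conn_pid = None


def get_connection() -> sqlite3.Connection:
    """Hypothetical helper: return this process's connection, reopening after a fork."""
    global _conn, _conn_pid
    if _conn is None or _conn_pid != os.getpid():   # new process: don't reuse
        _conn = sqlite3.connect(":memory:")
        _conn_pid = os.getpid()
    return _conn


parent_conn = get_connection()
r, w = os.pipe()
pid = os.fork()
if pid == 0:
    fresh = get_connection() is not parent_conn    # child opens its own connection
    os.write(w, b"fresh" if fresh else b"shared")
    os._exit(0)

os.waitpid(pid, 0)
child_result = os.read(r, 16)
print(child_result)                        # b'fresh'
print(get_connection() is parent_conn)     # True: the parent keeps its connection
```

The child simply leaves the inherited object alone. With a networked driver it should not close it cleanly either, since a protocol-level close would travel over the socket the parent is still using.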

  In one of my projects, the multi-process scenario is running a web application with Tornado, which spawns multiple web application instances. Before those instances are spawned, make sure any database connections created earlier have been destroyed:

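from flask import Flask
from flask_sqlalchemy import SQLAlchemy
from tornado.httpserver import HTTPServer
from tornado.ioloop import IOLoop
from tornado.wsgi import WSGIContainer

app = Flask(__name__)
db = SQLAlchemy()
db.init_app(app)
...
...
# Destroy the existing engine so the parent holds no database connections
# (db.get_engine(app=app) is the Flask-SQLAlchemy 2.x API).
db.get_engine(app=app).dispose()
...
...
# Then fork the child processes. For example, Tornado can launch
# multiple web instance processes:
server = HTTPServer(WSGIContainer(app))
server.bind(8000)          # port chosen for illustration
server.start(0)            # forks one process per CPU core
IOLoop.current().start()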

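The dispose-before-fork idea can be checked with a toy pool. ToyPool is hypothetical and sqlite3 stands in for the real database; its dispose() closes every idle connection, so the child starts with an empty pool and opens its own. As far as I know, SQLAlchemy's Engine.dispose() likewise closes only checked-in connections, so sessions should be closed before calling it.

```python
import os
import sqlite3


class ToyPool:
    """Hypothetical pool whose dispose() closes every idle connection."""

    def __init__(self):
        self.idle = []

    def connect(self) -> sqlite3.Connection:
        return self.idle.pop() if self.idle else sqlite3.connect(":memory:")

    def release(self, conn: sqlite3.Connection) -> None:
        self.idle.append(conn)

    def dispose(self) -> None:
        for conn in self.idle:
            conn.close()
        self.idle.clear()


pool = ToyPool()
pool.release(pool.connect())      # the parent used the database during setup
pool.dispose()                    # ...then drops its connections before forking

r, w = os.pipe()
pid = os.fork()
if pid == 0:
    inherited = len(pool.idle)    # the child starts with an empty pool
    conn = pool.connect()         # and opens its own connection
    ok = conn.execute("SELECT 1").fetchone() == (1,)
    os.write(w, f"{inherited},{ok}".encode())
    os._exit(0)

os.waitpid(pid, 0)
child_report = os.read(r, 64).decode()
print(child_report)               # 0,True
```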