Database-as-Code in Practice

Foreword

We recently ran into a thorny problem in private cloud delivery: a customer from two years ago needed to upgrade to the latest release. Because of the long gap, both the product code and the database schema had changed dramatically. The product code was under version control and so had a clear upgrade path, but the database had not been managed as code, so it had no upgrade path and the whole upgrade was very difficult. To make future upgrades go smoothly, we needed a unified Database-as-Code scheme.

What we needed to do was to change our mindset of how we treated our database. We had to stop treating it like some special artifact or some unique scenario, and we started looking at it through the same perspective that we were treating our web code.

State-based vs. migrations-based

State-based and migrations-based are the two common ways to manage a database as code. Each is described below.

State-based

In the state-based model, we only maintain the target state of the database. Each table, stored procedure, view, and trigger is saved as a separate SQL file that faithfully represents the state of that database object. The scripts needed to upgrade the database are generated automatically by a tool, which greatly reduces maintenance cost.

However, this model does not handle data migration well, for example splitting a name column into separate first name and last name columns. Because the meaning of table data usually depends on context, a tool cannot reliably infer how to move the data when it generates the upgrade script.
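To make this limitation concrete, here is a minimal Python sketch of how a state-based tool might derive an upgrade script by comparing two column sets. The function name diff_columns, the table name, and the dictionary layout are hypothetical and do not reflect the API of any real tool.

```python
def diff_columns(table, current, desired):
    """Generate ALTER statements that turn `current` columns into `desired`.

    Both arguments map column name -> column definition. This naive,
    hypothetical diff sees only the two states, never the intent behind them.
    """
    statements = []
    for name in desired:
        if name not in current:
            statements.append(f"ALTER TABLE {table} ADD COLUMN {name} {desired[name]};")
    for name in current:
        if name not in desired:
            statements.append(f"ALTER TABLE {table} DROP COLUMN {name};")
    return statements


# The name column is being split into first_name and last_name.
current = {"id": "INT", "name": "VARCHAR(64)"}
desired = {"id": "INT", "first_name": "VARCHAR(32)", "last_name": "VARCHAR(32)"}

for stmt in diff_columns("user", current, desired):
    print(stmt)
```

The generated script adds the two new columns and drops name, but nothing in the two states tells the tool that the old values should first be split and copied, so the data would be lost.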

Migrations-based

In the migrations-based model, we maintain the change scripts that take the database from one version to the next ourselves. Compared with the state-based model, this increases maintenance cost and complexity, but it gives us direct control over the migration process, so context-dependent scenarios such as data migration can be handled. And because changes are described imperatively, they can be reviewed ahead of time. Representative tools that implement the migrations-based model include Liquibase and Flyway.

Introduction to Flyway

This chapter introduces Flyway, a representative migrations-based tool. The version used in this article is 6.0.8.

What is Flyway

Flyway is an open source database migration tool that makes it easy to deploy a new database and to upgrade an existing one incrementally. It has the following features:

  1. It can be embedded in an application or run as a standalone tool.
  2. It tracks the migrations that have already been executed.
  3. It executes new migrations.
  4. It validates the state of the database.

How Flyway works

Flyway works as follows:

  1. On the first run, Flyway creates a metadata table named flyway_schema_history in the database. This table records the migrations that have been executed.
  2. It scans the user-defined migration script directories and sorts the scripts by version number from low to high.
  3. It applies the migration scripts to the database in order and updates the metadata table as it goes.
  4. On the next incremental upgrade, Flyway uses the records in the metadata table to find the new migration scripts and executes them in order.
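The ordering and selection in steps 2 and 4 can be sketched in Python as follows. This is a simplified illustration, not Flyway's implementation: the helper names parse_version and pending_migrations are hypothetical, only dot-separated versions are handled, and the applied versions are passed in as a plain list instead of being read from flyway_schema_history.

```python
import re

# Default Flyway naming for versioned migrations: V<version>__<description>.sql
MIGRATION_RE = re.compile(r"^V(?P<version>\d+(?:\.\d+)*)__(?P<desc>.+)\.sql$")


def parse_version(filename):
    """Return the version as a tuple of ints, or None if the name does not match."""
    match = MIGRATION_RE.match(filename)
    if match is None:
        return None
    return tuple(int(part) for part in match.group("version").split("."))


def pending_migrations(filenames, applied_versions):
    """Sort scripts by version and keep those newer than the latest applied one."""
    versioned = [(parse_version(f), f) for f in filenames]
    versioned = [(v, f) for v, f in versioned if v is not None]
    versioned.sort()
    latest = max(applied_versions, default=())  # () sorts before every real version
    return [f for v, f in versioned if v > latest]


scripts = [
    "V2019.11.13.000__mix.sql",
    "V2019.11.11.000__alter_TB_b_add_column.sql",
    "V2019.11.11.001__TB_c_insert.sql",
    "README.txt",  # ignored: does not follow the naming pattern
]
applied = [(2019, 11, 11, 0)]  # stand-in for the metadata table contents

print(pending_migrations(scripts, applied))
```

Comparing versions as integer tuples rather than strings is what makes 2019.11.11.001 sort before 2019.11.13.000 regardless of how many digits each part has.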

Database-as-Code scheme

Because we need to handle not only schema changes but also data migration, we chose the migrations-based model and use Flyway to implement Database-as-Code.

Organization of migration scripts

To cover the public cloud, the private cloud, and upgrades from old private cloud versions, we designed the following directory structure for managing SQL migration scripts.

|--{db1}
     |--flyway.conf
     |--base_sql
          |--V0.000__a.sql
          |--V0.001__b.sql
          |...
          `--V0.025__z.sql
     |--upgrade_legacy_private_cloud_sql
          |--V2.000__create_TB_t1.sql
          |--V2.001__alter_TB_a.sql
          `--V2.002__TB_b_update.sql
     |--upgrade_sql
          |--V2019.11.11.000__alter_TB_b_add_column.sql
          |--V2019.11.11.001__TB_c_insert.properties
          |--V2019.11.11.001__TB_c_insert.sql
          `--V2019.11.13.000__mix.sql
|--{db2}
     |--flyway.conf
     ...
|--common
     `--procedure.sql

The structure is described below:
  1. Each directory corresponds to a separate database and contains the migration scripts and configuration for that database.
  2. The file flyway.conf in each database directory contains the connection, authentication, and baseline settings for that database.
  3. The subdirectory base_sql stores the existing schema. Its version numbers have the format 0.xxx. Once finalized, the contents of this directory must not be changed.
  4. The subdirectory upgrade_legacy_private_cloud_sql stores the migration scripts that take an old private cloud version to the new version. Its version numbers have the format 2.xxx.
  5. The subdirectory upgrade_sql stores all subsequent migration scripts for both the public cloud and the private cloud. Because the public cloud has no fixed version numbering and the private cloud numbers its versions differently, a date is used as the version prefix of a migration script, in the format yyyy.mm.dd.index.
  6. DML may differ between environments. In that case, the differing part of the migration script is written as placeholders, and the actual values are read from a properties file with the same name as the script, for example V{yyyy.mm.dd.index}__xxx.properties. These properties files are stored in the baseline, where each environment can set its own values; at runtime the file for the target environment is copied into the upgrade_sql directory.
  7. The file common/procedure.sql contains stored procedures for changing indexes, columns, and keys. These stored procedures make such changes idempotent.
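The placeholder mechanism in item 6 can be illustrated with a small Python sketch. It assumes Flyway's default ${name} placeholder syntax and a plain key=value properties format. The helper names, the table c, and the endpoint value are hypothetical, and the sketch performs the substitution itself rather than showing how the values are actually handed to Flyway.

```python
import re


def load_properties(text):
    """Parse simple key=value lines; blank lines and # comments are skipped."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def render(sql, props):
    """Replace every ${name} in the script; fail loudly on a missing value."""
    def substitute(match):
        name = match.group(1)
        if name not in props:
            raise KeyError(f"no value for placeholder {name!r}")
        return props[name]
    return re.sub(r"\$\{([^}]+)\}", substitute, sql)


# Hypothetical contents of a migration script and its same-named properties file.
sql = "INSERT INTO c (k, v) VALUES ('endpoint', '${endpoint}');"
properties = "# values for one environment\nendpoint = https://example.invalid/api\n"

print(render(sql, load_properties(properties)))
```

Raising an error for a missing value, instead of leaving the placeholder in place, keeps a misconfigured environment from writing a literal ${...} string into the database.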

Granularity of migration scripts

In general, we recommend that each script contain only one type of operation against one target. For example, Vxxx__TB_car_add_column.sql adds columns to the table car, and Vyyy__TB_car_insert.sql inserts data into the table car. This follows the single responsibility principle and greatly reduces the number and complexity of merge conflicts.

Execution flow

Based on the migration script layout described above, the execution flow for the public cloud and private cloud scenarios is as follows:

Each entry below gives the environment and scenario, followed by the execution flow.

Public cloud, new deployment
  1. Execute the migration scripts in base_sql.
  2. Execute the migration scripts in upgrade_sql.

Public cloud, new upgrade
  1. Get the current version x.
  2. Execute, in order, the migration scripts in upgrade_sql whose version number is greater than x.

Public cloud, existing database upgrade
  1. Manually align the existing database with base_sql.
  2. Set the baseline to 2000.
  3. Execute, in order, the migration scripts in upgrade_sql whose version number is greater than 2000.

Private cloud, new deployment
  1. Execute the migration scripts in base_sql.
  2. Execute the migration scripts in upgrade_sql.

Private cloud, new upgrade
  1. Get the current version x.
  2. Execute, in order, the migration scripts in upgrade_sql whose version number is greater than x.

Private cloud, existing database upgrade
  1. Collect the migration scripts that take the old private cloud version to the new version and put them in the upgrade_legacy_private_cloud_sql directory.
  2. Set the baseline to 1.
  3. Execute, in order, the migration scripts in upgrade_legacy_private_cloud_sql and upgrade_sql whose version number is greater than 1.
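The two baseline values work because of how the version prefixes sort: 0.xxx and 2.xxx both fall below 2000, while only 0.xxx falls below 1. The following Python sketch shows the effect. The helper names are hypothetical, the file names come from the directory layout shown earlier, and all directories are pooled into one list purely for illustration.

```python
def version_of(filename):
    """Turn 'V2.001__alter_TB_a.sql' into the comparable tuple (2, 1)."""
    version_text = filename[1:].split("__", 1)[0]
    return tuple(int(part) for part in version_text.split("."))


def scripts_after_baseline(filenames, baseline):
    """Keep scripts whose version is greater than the baseline, in version order."""
    floor = version_of(f"V{baseline}__baseline.sql")
    return sorted((f for f in filenames if version_of(f) > floor), key=version_of)


all_scripts = [
    "V0.000__a.sql",                               # base_sql
    "V2.000__create_TB_t1.sql",                    # upgrade_legacy_private_cloud_sql
    "V2.001__alter_TB_a.sql",                      # upgrade_legacy_private_cloud_sql
    "V2019.11.11.000__alter_TB_b_add_column.sql",  # upgrade_sql
]

# Existing public cloud database: only the date-versioned scripts remain.
print(scripts_after_baseline(all_scripts, "2000"))
# Existing private cloud database: legacy upgrade scripts run first, then upgrade_sql.
print(scripts_after_baseline(all_scripts, "1"))
```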

Idempotence in practice

Ideally, each migration script runs only once per database. But if a particular migration fails, you may need to re-run steps that already succeeded in order to bring the database to the desired state. Writing migration scripts idempotently is then very helpful. Below we summarize best practices for making different kinds of DDL and DML idempotent.

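Each entry below gives the SQL type, object, and action, followed by the script naming convention, an example, and the best practice.

DDL, Table, Create Table
  Naming convention: V{yyyy.mm.dd.index}__create_TB_{table_name}.sql
  Example: V2019.11.08.000__create_TB_car.sql
  Best practice: CREATE TABLE IF NOT EXISTS {table_name};

DDL, Table, Drop Table
  Naming convention: V{yyyy.mm.dd.index}__drop_TB_{table_name}.sql
  Example: V2019.11.08.000__drop_TB_car.sql
  Best practice: DROP TABLE IF EXISTS {table_name};

DDL, Column, Add Column
  Naming convention: V{yyyy.mm.dd.index}__alter_TB_{table_name}_add_column.sql
  Example: V2019.11.08.000__alter_TB_car_add_column.sql
  Best practice: Perform the operation only if the target column does not exist (wrapped in a stored procedure).

DDL, Column, Drop Column
  Naming convention: V{yyyy.mm.dd.index}__alter_TB_{table_name}_drop_column.sql
  Example: V2019.11.08.000__alter_TB_car_drop_column.sql
  Best practice: Perform the operation only if the target column exists (wrapped in a stored procedure).

DDL, Column, Change Column
  Naming convention: V{yyyy.mm.dd.index}__alter_TB_{table_name}_change_column.sql
  Example: V2019.11.08.000__alter_TB_car_change_column.sql
  Best practice: Perform the operation only if the target column exists (wrapped in a stored procedure).

DDL, Keys and Indexes, Add
  Naming convention: V{yyyy.mm.dd.index}__alter_TB_{table_name}_add_index.sql
  Example: V2019.11.08.000__alter_TB_car_add_index.sql
  Best practice: Perform the operation only if the target key or index does not exist (wrapped in a stored procedure).

DDL, Keys and Indexes, Drop
  Naming convention: V{yyyy.mm.dd.index}__alter_TB_{table_name}_drop_index.sql
  Example: V2019.11.08.000__alter_TB_car_drop_index.sql
  Best practice: Perform the operation only if the target key or index exists (wrapped in a stored procedure).

DML, Row, Insert
  Naming convention: V{yyyy.mm.dd.index}__TB_{table_name}_insert.sql
  Example: V2019.11.08.001__TB_car_insert.sql
  Best practice:
  • Insert with a unique index: idempotent, because a repeated insert fails; recommended. If you do not want an error on duplicates, use statements such as INSERT IGNORE INTO, REPLACE, or ON DUPLICATE KEY UPDATE.
  • Insert without a unique index: not idempotent. When the data volume is small, consider rewriting it as the following SQL statement: INSERT INTO table(field1, field2, fieldn) SELECT 'field1', 'field2', 'fieldn' FROM DUAL WHERE NOT EXISTS(SELECT field FROM table WHERE field = ?)

DML, Row, Update
  Naming convention: V{yyyy.mm.dd.index}__TB_{table_name}_update.sql
  Example: V2019.11.08.001__TB_car_update.sql
  Best practice:
  • Computed update: not idempotent; avoid where possible, for example UPDATE table SET number=number-1 WHERE id=1
  • Non-computed update: naturally idempotent; recommended, for example UPDATE table SET number=3 WHERE id=1

DML, Row, Delete
  Naming convention: V{yyyy.mm.dd.index}__TB_{table_name}_delete.sql
  Example: V2019.11.08.001__TB_car_delete.sql
  Best practice: Naturally idempotent.

DML, Row, Multiple tables or multiple actions
  Naming convention: V{yyyy.mm.dd.index}__mix.sql
  Example: V2019.11.08.001__mix.sql
  Best practice: If any of the following cases must run transactionally, the SQL can be placed in a single mix script.
  • Multiple DML types on one table
  • One DML type on multiple tables
  • Multiple DML types on multiple tables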
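The difference between idempotent and non-idempotent DML can be checked by running each statement twice, as a re-run migration would. The Python sketch below uses an in-memory SQLite database as a stand-in for MySQL, which is an assumption made only so the example is self-contained. SQLite does not need FROM DUAL, and the table car with its columns is hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; the statements are otherwise the same idea.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car (id INTEGER PRIMARY KEY, brand TEXT, stock INTEGER)")
conn.execute("INSERT INTO car (id, brand, stock) VALUES (1, 'demo', 10)")

# Insert guarded by NOT EXISTS: safe to repeat even without a unique index.
guarded_insert = (
    "INSERT INTO car (brand, stock) "
    "SELECT 'new', 0 WHERE NOT EXISTS (SELECT 1 FROM car WHERE brand = 'new')"
)
computed_update = "UPDATE car SET stock = stock - 1 WHERE id = 1"  # not idempotent
absolute_update = "UPDATE car SET stock = 3 WHERE id = 1"          # idempotent


def run_twice(statement):
    """Execute the same statement twice, as a re-run migration would."""
    conn.execute(statement)
    conn.execute(statement)


run_twice(guarded_insert)
rows_named_new = conn.execute("SELECT COUNT(*) FROM car WHERE brand = 'new'").fetchone()[0]

run_twice(computed_update)
stock_after_computed = conn.execute("SELECT stock FROM car WHERE id = 1").fetchone()[0]

run_twice(absolute_update)
stock_after_absolute = conn.execute("SELECT stock FROM car WHERE id = 1").fetchone()[0]

print(rows_named_new, stock_after_computed, stock_after_absolute)  # prints: 1 8 3
```

The guarded insert leaves a single row and the absolute update ends at 3 however many times it runs, while the computed update moves the stock from 10 to 8 after two runs.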

Taking a column change as an example, the change is wrapped in the following stored procedure.

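DELIMITER $$
-- Changes a column only if it exists, so the migration can be re-run safely.
-- i_col_def is everything that follows the old column name in CHANGE COLUMN,
-- that is, the new column name followed by its definition.
CREATE PROCEDURE `SAFE_CHANGE_COLUMN`(IN i_table_name VARCHAR(128), IN i_col_name VARCHAR(128), IN i_col_def VARCHAR(256))
BEGIN
    SET @tableName = i_table_name;
    SET @colName = i_col_name;
    SET @colDef = i_col_def;
    SET @colExists = 0;
    -- Restrict the lookup to the current schema so that a same-named table
    -- in another schema cannot produce a false match.
    SELECT COUNT(*) INTO @colExists FROM INFORMATION_SCHEMA.COLUMNS
        WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = @tableName AND COLUMN_NAME = @colName;
    IF @colExists > 0 THEN
        SET @query = CONCAT('ALTER TABLE `', @tableName, '` CHANGE COLUMN `', @colName, '` ', @colDef);
        PREPARE stmt FROM @query;
        EXECUTE stmt;
        DEALLOCATE PREPARE stmt;
    END IF;
END$$
DELIMITER ;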

CI/CD pipeline

Here we store the database migration scripts and the application code in the same code repository, and they share the same CI/CD process. This approach fits the collaboration, testing, fast feedback, and continuous improvement that DevOps advocates, and lets the product be delivered faster, more frequently, and more reliably.

Summary

With the Database-as-Code scheme in place, maintaining and upgrading the database becomes easy.

  1. However long the gap between versions, an upgrade can be handled calmly, because there is a clear upgrade path between any two versions.
  2. Database schema changes and data migrations become auditable.
  3. Database migrations share the CI/CD pipeline with the application code, which speeds up product delivery.

Origin yq.aliyun.com/articles/739406