Interviewer: Why is it not recommended to deploy the database in a Docker container?

Docker has been very popular over the past two years, and developers are eager to deploy every application and piece of software in Docker containers. But are you sure you want to deploy the database in a container too?

This is not an idle question, because plenty of operating manuals and video tutorials for running databases in containers can be found on the Internet. The editor has compiled some reasons why the database is not well suited to containerization, for your reference, and hopes you will be more cautious when doing so.

So far, containerizing the database remains hard to justify. Still, every developer has tasted the benefits of containerization, and we hope that better solutions will appear as the technology develops.

7 reasons why Docker is not suitable for database deployment

1. Data security issues

Do not store data in containers; this is also one of Docker's official usage tips. A container can be stopped or deleted at any time, and when it is removed (docker rm), the data inside it is lost. To avoid data loss, users can mount a data volume to store the data. In addition, when the container and the host share a volume group, a fault on the container side can also affect the storage of the physical machine.
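As a minimal sketch of the volume mounting mentioned above, the following Python snippet builds (but does not run) a `docker run` command that bind-mounts a host directory over MySQL's data directory. The host path `/srv/mysql-data`, the container name, the image tag, and the placeholder password are assumptions chosen for illustration; `/var/lib/mysql` is the default data directory of the official MySQL image.

```python
import shlex


def mysql_run_command(host_data_dir, container_name="mysql-db", image="mysql:8.0"):
    """Build a `docker run` argv that keeps MySQL's data directory on the host."""
    return [
        "docker", "run", "-d",
        "--name", container_name,
        # Bind-mount a host directory over MySQL's data directory so that
        # removing the container does not remove the data files.
        "-v", f"{host_data_dir}:/var/lib/mysql",
        # Placeholder credential, for illustration only.
        "-e", "MYSQL_ROOT_PASSWORD=change-me",
        image,
    ]


if __name__ == "__main__":
    # Print the command instead of executing it.
    print(shlex.join(mysql_run_command("/srv/mysql-data")))
```

A mount like this keeps the data files when the container is removed, but it does not by itself protect against corruption caused by an unclean shutdown.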

Even if you store Docker data on the host, there is still no guarantee that data will not be lost. Docker volumes are designed to provide persistent storage around the Union FS image layers, but they still lack such guarantees.

With current storage drivers, Docker still carries a risk of unreliability. If the container crashes and the database is not shut down cleanly, data may be corrupted.

2. Performance issues

MySQL is a relational database with high IO demands. When a single physical machine runs multiple instances, their IO adds up on the same storage, creating an IO bottleneck that greatly reduces MySQL's read and write performance.
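The arithmetic behind this bottleneck can be sketched in a few lines of Python. The device capacity, the per-instance demand, and the proportional-sharing model are all assumptions for illustration; real IO schedulers behave differently in detail.

```python
def shared_disk_report(device_iops, demands):
    """Return (saturated, granted) for instances sharing one storage device.

    Assumption: when total demand exceeds the device's capacity, each
    instance receives a share proportional to what it asked for.
    """
    total = sum(demands)
    saturated = total > device_iops
    scale = device_iops / total if saturated else 1.0
    granted = [round(d * scale) for d in demands]
    return saturated, granted


if __name__ == "__main__":
    # Hypothetical device: 10,000 IOPS; each MySQL instance wants 4,000.
    for count in (2, 3, 5):
        saturated, granted = shared_disk_report(10_000, [4_000] * count)
        print(count, "instances:", "saturated" if saturated else "ok", granted)
```

Under these assumed numbers, two instances fit within the device's capacity, while every additional instance shrinks the share that each one receives.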

In a special session on the Ten Difficulties of Docker Application, an architect at a state-owned bank also pointed out: "The performance bottleneck of a database generally appears in IO. If you follow the Docker approach, the IO requests of multiple Docker containers all end up on the same storage. Internet databases today mostly use a share-nothing architecture, which may be another reason migration to Docker is not considered."

Some readers may already have the following solutions in mind for these performance problems:

(1) Separation of database program and data

  If you use Docker to run MySQL, separate the database program from its data: keep the data on shared storage and run the program in the container. If the container or the MySQL service fails, a brand-new container is started automatically. It is also recommended not to store the data on the host machine, because the host and the container share the volume group, so a fault has a greater impact on the host.

(2) Run a lightweight or distributed database

  When deploying lightweight or distributed databases in Docker, the approach Docker itself recommends is that when a service dies, a new container is started automatically instead of the failed container being restarted again and again.

(3) Lay out applications sensibly

  For applications or services with high IO requirements, it is more appropriate to deploy the database on a physical machine or in a KVM virtual machine. Currently, Tencent Cloud's TDSQL and Alibaba's OceanBase are deployed directly on physical machines rather than in Docker.

3. Network problems

To understand Docker networking, you must have a deep understanding of network virtualization. You must also be prepared to deal with unexpected situations. You may need to fix bugs without support or additional tools.

Databases need dedicated, sustained throughput to handle higher loads. A container is yet another isolation layer on top of the hypervisor and the host virtual machine. The network, however, is essential for database replication, which requires a stable 24/7 connection between the master and slave databases. Docker's outstanding networking problems remained unresolved as of version 1.9.

Taken together, these issues make a containerized database difficult to manage. You may be a top engineer who can solve any problem, but how much time will you spend solving Docker network problems? Wouldn't it be better to put the database in a dedicated environment and save that time for the business goals that really matter?

4. State

It is great to package a stateless service in Docker: containers can be orchestrated and single points of failure eliminated. But what about the database? Placed in the same environment, the database is stateful and widens the scope of a system failure. The next time an application instance crashes, the database may be affected as well.

Key point: horizontal scaling in Docker can only be used for stateless computing services, not for databases.

Statelessness is an important prerequisite for Docker's rapid scaling. Services that carry data state are not suited to being placed directly in Docker. If a database is installed in Docker, storage services need to be provided separately.

Currently, Tencent Cloud's TDSQL (Financial Distributed Database) and Alibaba Cloud's OceanBase (Distributed Database System) run directly on physical machines rather than on Docker, which makes them easier to manage.

5. Resource isolation

In terms of resource isolation, Docker is indeed inferior to KVM virtual machines. Docker uses cgroups to implement resource limits, which can only cap a container's maximum resource consumption; they cannot stop other programs from taking resources that the container needs. If other applications consume too much of the physical machine's resources, the read and write efficiency of MySQL in the container suffers.
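To make the point about limits concrete, here is a small sketch that builds (but does not run) a `docker run` command using Docker's cgroup-backed limit flags `--memory`, `--cpus`, and `--device-write-iops`. The limit values, the device path `/dev/sda`, the container name, and the placeholder password are assumptions for illustration.

```python
import shlex


def limited_mysql_command(memory="4g", cpus="2", device="/dev/sda", write_iops=1000):
    """Build a `docker run` argv with cgroup-backed resource caps."""
    return [
        "docker", "run", "-d",
        "--name", "mysql-limited",
        # Upper bound on the memory this container may use.
        "--memory", memory,
        # Upper bound on the CPU time this container may use.
        "--cpus", cpus,
        # Upper bound on write operations per second to one block device.
        "--device-write-iops", f"{device}:{write_iops}",
        # Placeholder credential, for illustration only.
        "-e", "MYSQL_ROOT_PASSWORD=change-me",
        "mysql:8.0",
    ]


if __name__ == "__main__":
    # Print the command instead of executing it.
    print(shlex.join(limited_mysql_command()))
```

Each flag is a ceiling on this one container. None of them reserves capacity for it, so an uncapped process elsewhere on the host can still saturate the disk that MySQL depends on.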

The more isolation levels you need, the more resource overhead you incur. Compared with a dedicated environment, easy horizontal scaling is a big advantage of Docker. However, horizontal scaling in Docker can only be used for stateless computing services and does not apply to databases.

We see no isolation benefit for the database, so why put it in a container?

6. The inapplicability of cloud platforms

Most people start their projects on a shared cloud. The cloud simplifies the operation and replacement of virtual machines, so nobody has to work nights or weekends testing new hardware environments. When we can start an instance quickly, why should we have to worry about the environment in which that instance runs?

This is why we pay cloud providers so much. Once we place a database container on the instance, the convenience described above disappears. Because the data does not match, a new instance will not be compatible with the existing one. If the instance is going to be limited to running a stand-alone service anyway, the DB should use a non-containerized environment. We only need to reserve elastic scaling for the computing service layer.

7. Environmental requirements for running the database

It is often seen that the DBMS container and other services are running on the same host. However, these services have very different hardware requirements.

Databases (especially relational databases) have higher IO requirements. Database engines generally use a dedicated environment to avoid competing for resources with concurrent workloads. If you put your database in a container, you waste your project's resources, because you need to provision a lot of additional resources for that instance. In the public cloud, when you need 34 GB of memory, the instance you start must have 64 GB of memory. In practice, these resources are not fully used.
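The 34 GB versus 64 GB example can be worked through with a short sketch. The list of available instance sizes is a hypothetical catalogue, not any particular provider's offering.

```python
def pick_instance(required_gb, sizes_gb=(8, 16, 32, 64, 128)):
    """Return (chosen_size_gb, unused_gb) for the smallest size that fits.

    Assumption: the provider only sells the fixed memory sizes in sizes_gb.
    """
    for size in sorted(sizes_gb):
        if size >= required_gb:
            return size, size - required_gb
    raise ValueError(f"no instance size fits {required_gb} GB")


if __name__ == "__main__":
    size, unused = pick_instance(34)
    print(f"need 34 GB, chosen {size} GB, unused {unused} GB")
```

With this assumed catalogue, a 34 GB requirement lands on the 64 GB size and leaves 30 GB unused, which is the waste the paragraph above describes.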

How can this be dealt with? You can design in tiers and use fixed resources to launch multiple instances at different tiers. Horizontal scaling is always better than vertical scaling.

Summary

Given all of the above, does it mean that the database must never be deployed in a container?

The answer is: no.

We can containerize services that are not sensitive to data loss (search, event tracking), and use database sharding to increase the number of instances, thereby increasing throughput.
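As a rough illustration of the sharding idea, the sketch below routes keys to database instances with a stable hash. The key format `user:<id>` and the shard count of 4 are assumptions; this is simple modulo sharding, one of several possible schemes, and changing the shard count remaps most keys.

```python
import zlib
from collections import Counter


def shard_for(key, shard_count):
    """Map a key to a shard index in the range [0, shard_count)."""
    # zlib.crc32 gives the same result in every process, unlike the
    # built-in hash() for strings, which is randomized per process.
    return zlib.crc32(key.encode("utf-8")) % shard_count


if __name__ == "__main__":
    # Show how 1,000 hypothetical user keys spread over 4 shards.
    counts = Counter(shard_for(f"user:{i}", 4) for i in range(1000))
    print(sorted(counts.items()))
```

Each shard then handles only its own portion of the keys, which is how adding instances raises total throughput.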

Docker is suitable for running lightweight or distributed databases. When a service dies, Docker automatically starts a new container instead of repeatedly restarting the failed one.

A database that relies on middleware and a container platform for automatic scaling, disaster tolerance, failover, and multi-node operation can also be containerized.


 


Origin blog.csdn.net/bjmsb/article/details/108623139