Distributed databases in practice [original]

Every year on Double Eleven, our system runs into performance problems. We have done a lot of work on application-level scalability, and the application tier can now essentially scale horizontally. The biggest pressure is still on the Oracle database, which is a single point. To let the database tier scale horizontally as well, we decided to adopt a distributed database engine. At the end of March, another architect and I formed a two-person team and started the project.
The open source distributed database engines available include cobar, mycat, Atlas, Kingshard, and sharding-jdbc. Given the resources of the architecture group, we felt that a relatively mature product with an active community would suit us best. In the end, we chose Dangdang's open source sharding-jdbc.
After the selection, it took us about a month to go from planning through the migration to launch. A single query that took about 10 s before the migration now takes only about 1 s, which gave us quite a sense of achievement. Below I share how we approached the migration.
1. First, our business systems all run on Oracle, while the more mature database and table sharding solutions run on MySQL. We therefore need to synchronize Oracle data to MySQL, which has the added benefit of giving us read-write separation. The overall flow is this: users write data to the Oracle primary database, and CDC extracts the changes so they can be synchronized to MySQL. To avoid affecting the performance of the primary, we use OGG to replicate the data to a standby database, run CDC on the standby to extract the changes into RocketMQ, and then consume from RocketMQ into MySQL. RocketMQ is there for decoupling; it also prevents CDC data from being dropped or applied repeatedly when the handler fails. Because the step that moves data from MQ into MySQL is multi-threaded (PushConsumer), the processing must be idempotent so that any message can safely be re-executed, and the order of data processing must be strictly guaranteed. Because of the project schedule, the version currently online is multi-threaded within a single process only. This basically meets the current performance requirements, but reliability problems remain. As a next step, we will use ZooKeeper for distributed thread synchronization so that idempotent processing also holds across processes.
2. We use sharding-jdbc both for inserting into MySQL and for the external-network queries. The difficult part is that sharding-jdbc does not support broadcasting small tables. On insert, a row can be written to a specified table in a specified database, but on query the small table is looked up across different databases. (I don't know whether this is a bug, a mistake in our configuration, or a deliberate choice to avoid cross-database joins.) Either way, we had to synchronize the small tables ourselves, and for that we adopted Ali's otter.
3. The official launch involved data migration. I originally planned to write a migration tool myself, but then found that Ali's yugong met our needs exactly, and its performance was quite good.
4. Throughout the process, to make debugging easier, we also built an MQ message query tool with extjs. In addition, early on, for performance, we fetched CDC data with multi-threaded scheduling, using Dangdang's elastic-job. For some reason, however, combining this distributed scheduling with CDC caused old data to be lost. We suspect that the jobs were not matched up correctly, which left the CDC offsets out of order. Switching to a single thread made the problem go away. Fortunately, the performance bottleneck is not here, and a single thread meets our needs.
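The idempotency requirement in step 1 can be illustrated with a minimal sketch. It is not the project's actual handler: the class IdempotentApplier, the ChangeEvent fields, and the idea of carrying a per-row change sequence in each message are assumptions made for illustration, and an in-memory map stands in for MySQL. The point is only that a message whose sequence is not newer than the last one applied to the same row can be skipped, so a redelivered message changes nothing.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of an idempotent apply step; an in-memory map stands in for MySQL. */
class IdempotentApplier {

    /** Hypothetical change event: row key, change sequence, new value (null means delete). */
    static final class ChangeEvent {
        final String rowKey;
        final long sequence;
        final String value;

        ChangeEvent(String rowKey, long sequence, String value) {
            this.rowKey = rowKey;
            this.sequence = sequence;
            this.value = value;
        }
    }

    private final Map<String, Long> lastApplied = new HashMap<>();
    private final Map<String, String> table = new HashMap<>();

    /** Applies the event unless the same or a newer change already reached this row. */
    synchronized boolean apply(ChangeEvent e) {
        Long seen = lastApplied.get(e.rowKey);
        if (seen != null && seen >= e.sequence) {
            return false; // redelivered or stale message: skipping keeps the result unchanged
        }
        if (e.value == null) {
            table.remove(e.rowKey);
        } else {
            table.put(e.rowKey, e.value);
        }
        lastApplied.put(e.rowKey, e.sequence);
        return true;
    }

    synchronized String read(String rowKey) {
        return table.get(rowKey);
    }

    /** Picks a worker so that all changes to one row are handled by the same thread. */
    static int workerFor(String rowKey, int workers) {
        return Math.floorMod(rowKey.hashCode(), workers);
    }

    public static void main(String[] args) {
        IdempotentApplier applier = new IdempotentApplier();
        applier.apply(new ChangeEvent("order-1", 1L, "NEW"));
        applier.apply(new ChangeEvent("order-1", 2L, "PAID"));
        applier.apply(new ChangeEvent("order-1", 1L, "NEW")); // redelivery is ignored
        System.out.println(applier.read("order-1")); // prints PAID
    }
}
```

Routing every change for a given row to the same worker, as workerFor does, is one simple way to keep per-row order when several threads consume in parallel. In a real deployment the last applied sequence would have to be stored together with the row, in the same transaction, for the check to survive restarts.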
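To make the small-table problem in step 2 concrete, here is a rough sketch of modulo-based routing in plain Java. It does not use the sharding-jdbc API; the class ShardRouter, the data source names ds_0 and ds_1, the table names, and the shard counts are all hypothetical. A sharded row lives in exactly one database, so a join against a small lookup table only stays local if every database holds its own copy of that table, which is what synchronizing the small tables achieves.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of modulo sharding; names and counts are illustrative assumptions. */
class ShardRouter {
    static final int DATABASES = 2;     // assumed number of MySQL databases
    static final int TABLES_PER_DB = 4; // assumed number of physical tables per database

    /** Maps a sharding key to the single physical table that holds the row. */
    static String route(String logicTable, long shardingKey) {
        int slot = (int) Math.floorMod(shardingKey, (long) DATABASES * TABLES_PER_DB);
        int db = slot / TABLES_PER_DB;
        int table = slot % TABLES_PER_DB;
        return "ds_" + db + "." + logicTable + "_" + table;
    }

    /** A small table must exist in every database so that joins never cross databases. */
    static List<String> broadcastTargets(String smallTable) {
        List<String> targets = new ArrayList<>();
        for (int db = 0; db < DATABASES; db++) {
            targets.add("ds_" + db + "." + smallTable);
        }
        return targets;
    }

    public static void main(String[] args) {
        System.out.println(route("t_order", 10L));      // prints ds_0.t_order_2
        System.out.println(route("t_order", 13L));      // prints ds_1.t_order_1
        System.out.println(broadcastTargets("t_dict")); // prints [ds_0.t_dict, ds_1.t_dict]
    }
}
```

With this layout, an order routed to ds_1 can be joined with ds_1.t_dict without leaving its database, provided something keeps the copies of t_dict in step.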
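The data loss in step 4 was never traced to a confirmed cause; the text only suspects that the CDC offsets got out of order under multi-threaded scheduling. Purely as an illustration of that suspected failure mode, and not as what the project did (it went back to a single thread), the sketch below shows one common way to keep a checkpoint safe when work finishes out of order: advance the committed offset only across a gap-free run of finished offsets. The class OffsetTracker is hypothetical and assumes offsets are consecutive integers.

```java
import java.util.TreeSet;

/** Sketch: commit only the highest offset below which everything has finished. */
class OffsetTracker {
    private long committed;                          // all offsets <= committed are done
    private final TreeSet<Long> done = new TreeSet<>(); // finished offsets above committed

    OffsetTracker(long start) {
        this.committed = start;
    }

    /** Records a finished offset and returns the offset that is now safe to checkpoint. */
    synchronized long markDone(long offset) {
        if (offset > committed) {
            done.add(offset);
        }
        // Advance only while the next offset has also finished, so no gap is skipped.
        while (done.remove(committed + 1)) {
            committed++;
        }
        return committed;
    }

    public static void main(String[] args) {
        OffsetTracker tracker = new OffsetTracker(0L);
        System.out.println(tracker.markDone(2L)); // prints 0, offset 1 is still in flight
        System.out.println(tracker.markDone(1L)); // prints 2, the gap is closed
        System.out.println(tracker.markDone(3L)); // prints 3
    }
}
```

If a worker crashes while offset 1 is in flight, the checkpoint is still 0, so a restart re-reads offsets 1 and 2 rather than skipping 1. Re-reading is safe only when the downstream apply step is idempotent.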
Finally, I would like to thank Ali and Dangdang for contributing so many good tools. Our successful rollout stands entirely on the shoulders of giants.
