Service security: How to ensure that the server does not lose data when it is powered on?

The most important aspect of service security is data security. Most disaster recovery work aims to ensure high service availability and data security.

Table of Contents

Live disaster preparedness in different places

Disaster recovery switch between two places

Live more in multiple cities and different places

Unit city level failure

Central city level failure

UPS

UPS composition

UPS service provider


Uninterruptible power plan:

  • Live disaster preparedness in different places
  • UPS uninterrupted power

Live disaster preparedness in different places

The Alibaba Cloud remote multi-active database solution uses the following core Alibaba Cloud products to provide multi-active capability at the data layer, in accordance with the principles of the architecture design.

DRDS

Following the business data splitting dimensions mentioned earlier, Alibaba Cloud DRDS provides two kinds of clusters, which support the buyer dimension and the seller dimension:

  • DRDS cluster in unit mode: Users in each location read and write data in their local domain, and the data in the local domain is synchronized with the central data in both directions.
  • DRDS cluster in copy mode: The data of this cluster is written to the central database and, once the write completes, is fully synchronized to each unit. Note that the DRDS layer needs an additional check on the data write route: a cross-unit write is treated as an illegal operation and an exception is thrown, which ensures that data is never written across units.
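The write-route check described above can be illustrated with a minimal sketch. This is not DRDS code: the unit names, the `UNIT_SHARDS` ownership map, and the `CrossUnitWriteError` exception are hypothetical names chosen only for illustration.

```python
class CrossUnitWriteError(Exception):
    """Raised when a write arrives at a unit that does not own the shard."""


# Hypothetical ownership map: unit name -> shard ids that unit may write.
UNIT_SHARDS = {
    "unit-a": range(0, 50),
    "unit-b": range(50, 100),
}


def check_write_route(local_unit: str, shard_id: int) -> None:
    """Reject the write unless local_unit owns shard_id."""
    if shard_id not in UNIT_SHARDS[local_unit]:
        raise CrossUnitWriteError(
            f"shard {shard_id} is not writable from {local_unit}"
        )


# A local write passes silently; a cross-unit write raises.
check_write_route("unit-a", 10)
```

Raising an exception instead of silently forwarding the write keeps the mistake visible to the caller, which matches the "illegal operation" behavior described in the text.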

For more about DRDS, refer to the article on the Distributed Relational Database (DRDS).

DTS

Data replication is a key part of a multi-active database design: correctness comes first, and efficiency is also essential. Alibaba Cloud DTS supports multiple checks, avoids circular replication (using transaction tables or thread_id schemes), and uses parallel replication (serial distribution, conflict detection, and parallel execution) and large-transaction splitting to ensure eventual consistency.
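As a rough illustration of the thread_id idea for avoiding circular replication, the sketch below filters a change log so that changes written by the replication session itself are not replicated back. It is an assumption-laden toy, not how DTS is implemented: `LogEntry`, `REPLICATOR_THREAD_ID`, and `outbound_changes` are hypothetical names.

```python
from dataclasses import dataclass

# Hypothetical id of the session that applies replicated changes locally.
REPLICATOR_THREAD_ID = 9001


@dataclass
class LogEntry:
    thread_id: int  # id of the session that wrote the change
    statement: str  # the change itself


def outbound_changes(log):
    """Return only the changes written by local business sessions.

    Entries written by the replication session came from the peer,
    so sending them back would create a replication loop.
    """
    return [entry for entry in log if entry.thread_id != REPLICATOR_THREAD_ID]


log = [
    LogEntry(thread_id=17, statement="UPDATE orders SET status = 'paid' WHERE id = 1"),
    LogEntry(thread_id=REPLICATOR_THREAD_ID, statement="INSERT INTO orders VALUES (2)"),
]
local_only = outbound_changes(log)  # keeps only the entry from thread 17
```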

Data verification is also a key part. Alibaba Cloud DTS uses a full verification tool (TCP) and an incremental verification tool (AMG) to check the data of the center and the units in real time or on a schedule, so that online data stays accurate.
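The idea of a full verification pass can be sketched as a row-by-row digest comparison between the center and a unit. This is a generic illustration and does not describe the TCP or AMG tools; the table representation (a dict from primary key to row tuple) and the function names are assumptions.

```python
import hashlib


def row_digest(row: tuple) -> str:
    """Digest of one row, so rows can be compared without shipping them."""
    return hashlib.sha256(repr(row).encode("utf-8")).hexdigest()


def diff_tables(center: dict, unit: dict) -> list:
    """Return the primary keys that are missing on one side or differ."""
    mismatched = []
    for key in sorted(center.keys() | unit.keys()):
        if key not in center or key not in unit:
            mismatched.append(key)
        elif row_digest(center[key]) != row_digest(unit[key]):
            mismatched.append(key)
    return mismatched


center_rows = {1: ("alice", 100), 2: ("bob", 200)}
unit_rows = {1: ("alice", 100), 2: ("bob", 250), 3: ("carol", 300)}
bad_keys = diff_tables(center_rows, unit_rows)  # [2, 3]
```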

For more about data transmission, refer to the Data Transmission Service article.

HDM

Alibaba Cloud HDM provides services such as DRDS cluster setup, synchronization link creation, multi-active database monitoring, data verification, cluster scale-out and scale-in, and automated disaster tolerance. All of these can be completed through HDM, which provides database management for remote multi-active scenarios.

For more about database management, refer to the article on Hybrid Cloud Database Management.

Disaster recovery switch between two places

Disaster recovery is the most important part of a remote multi-active deployment. Take the architecture of a two-city multi-active deployment as an example:

  • A complete set of business systems is deployed in each of the two cities (City 1 is in South China 1 and City 2 is in East China 1).
  • The order business is sharded by user_id % 100. Under normal circumstances:
    • All reads and writes of shards [00~49] go to the primary database of the database instance in city 1.
    • All reads and writes of shards [50~99] go to the primary database of the database instance in city 2.
  • DTS two-way replication is established between the primary database instance of city 1 and the primary database instance of city 2.
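The sharding rule above (user_id % 100, with shards 00 to 49 served by city 1 and 50 to 99 by city 2) can be written as a small routing function. The function names and the "city1"/"city2" labels are illustrative assumptions.

```python
def shard_of(user_id: int) -> int:
    """Shard number in the range 0..99, as in user_id % 100."""
    return user_id % 100


def city_for(user_id: int) -> str:
    """City whose primary database serves this user under normal conditions."""
    return "city1" if shard_of(user_id) <= 49 else "city2"


# user 149 falls in shard 49 (city 1); user 250 falls in shard 50 (city 2).
routes = [city_for(149), city_for(250)]
```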

When an exception occurs, a disaster tolerance switchover is required. There are 4 possible scenarios:

 
The abnormal situations and the corresponding operations are:

  1. Failure of the primary database in city 1
    1. The database engine completes the switchover between primary and standby.
    2. DTS automatically switches to the new primary database of city 1 to read new incremental updates, and then synchronizes them to the database instance of city 2.
  2. Failure of all APP Servers in city 1. There are two solutions:
    • Solution 1: Nothing changes at the database layer; the APP Server switches to city 2 and reads and writes the database of city 1 across cities.
    • Solution 2: Both the APP Server and the database are switched to city 2.
  3. Failure of all databases in city 1. There are two solutions:
    • Solution 1: The database layer switches to city 2, and the APP Server reads and writes the database of city 2 across cities.
    • Solution 2: Both the APP Server and the database are switched to city 2.
  4. Overall failure of city 1 (including all APP Servers, databases, and so on)
    1. All database traffic of city 1 is switched to city 2.
    2. The DTS data synchronization link from the city 1 database to the city 2 database is stopped.
    3. DTS is started in city 2 to record the changes of shards [00-49].
    4. After city 1 recovers from the failure, the incremental data of shards [00-49] is synchronized to the database instance of city 1.
    5. After the synchronization finishes, the database traffic of shards [00-49] is switched from city 2 back to city 1, and DTS synchronization of shards [00-49] from city 1 to city 2 is started.
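The five steps for an overall failure of city 1 can be traced with a toy state model. This is a sketch under stated assumptions, not a DTS API: the `TwoCityCluster` class, its fields, and its method names are all hypothetical, and the replication links are reduced to a set of (source, target) pairs.

```python
class TwoCityCluster:
    """Toy model of shard ownership and sync links for two cities."""

    def __init__(self):
        self.owner = {"00-49": "city1", "50-99": "city2"}
        self.links = {("city1", "city2"), ("city2", "city1")}
        self.recording = False
        self.pending = []  # changes to 00-49 captured while city1 is down

    def fail_city1(self):
        self.owner["00-49"] = "city2"           # step 1: switch traffic
        self.links.discard(("city1", "city2"))  # step 2: stop the link
        self.recording = True                   # step 3: record changes

    def write(self, shard_range, change):
        if self.recording and shard_range == "00-49":
            self.pending.append(change)

    def recover_city1(self):
        replayed = list(self.pending)           # step 4: replay to city1
        self.pending.clear()
        self.recording = False
        self.owner["00-49"] = "city1"           # step 5: switch back
        self.links.add(("city1", "city2"))      # and restart the link
        return replayed


cluster = TwoCityCluster()
cluster.fail_city1()
cluster.write("00-49", "order#1")
cluster.write("50-99", "order#2")
replayed = cluster.recover_city1()  # only the 00-49 change is replayed
```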

The second and third abnormal situations are both handled with solution 2. In other words, whether all APP Servers fail, all databases fail, or the entire city fails, the case is handled directly according to the city-level disaster recovery plan: both the APP Server and the database are switched to the other city.

Live more in multiple cities and different places

The multi-city remote multi-active mode refers to a multi-active deployment across three or more cities in different locations. This mode has central nodes and unit nodes:

  • Central node: The incremental data of every unit node is synchronized to the central node in real time, and the central node synchronizes the incremental data of all shards to the other unit nodes.
  • Unit node: The node that serves the reads and writes of a shard. It synchronizes the increments of its own shard to the central node and receives the incremental data of the other shards from the central node.

The figure below shows a three-city multi-active architecture, in which East China 1 is the central node, and South China 1 and North China 1 are unit nodes.
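The center and unit topology implies a hub-and-spoke set of synchronization links: every unit syncs with the center in both directions, and units do not sync with each other directly. A minimal sketch that lists those links, with the function name `sync_links` as an assumption:

```python
def sync_links(center: str, units: list) -> list:
    """List the (source, target) sync links of a hub-and-spoke topology."""
    links = []
    for unit in units:
        links.append((unit, center))  # unit pushes its own shard increments
        links.append((center, unit))  # center fans out the other shards
    return links


links = sync_links("East China 1", ["South China 1", "North China 1"])
```

With one center and two units this yields four links, and the number grows linearly with the number of units rather than quadratically, as it would with direct unit-to-unit links.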

Unit city level failure

When a unit city fails and the service needs to be switched, take a city-level failure of North China 1 as an example:

  1. Disaster tolerance
    1. All database traffic of North China 1 (unit) is switched to East China 1 (center).
    2. The DTS data synchronization link from the North China 1 (unit) database to the East China 1 (center) database is stopped, and the synchronization position is recorded.
    3. Reads and writes of shards [70-99] are switched to East China 1 (center).
  2. Recovery
    1. Rebuild North China 1 (unit).
    2. After data migration and synchronization to North China 1 (unit) are complete, stop reads and writes of shards [70-99] in East China 1 (center).
    3. Stop data synchronization of shards [70-99] from East China 1 (center) to North China 1 (unit).
    4. Create data synchronization from North China 1 (unit) to East China 1 (center).
    5. Switch reads and writes of shards [70-99] to North China 1 (unit).
    6. Open the primary database of North China 1 (unit) for writing.
    7. Run a check.

Central city level failure

When the central city fails and the business needs to be switched, take a city-level failure of East China 1 as an example:

  1. Disaster tolerance
    1. All database traffic of East China 1 (center) is switched to South China 1 (unit).
    2. The DTS data synchronization link from the East China 1 (center) database to the South China 1 (unit) database is stopped.
    3. The DTS data synchronization link from the East China 1 (center) database to the North China 1 (unit) database is stopped.
    4. The DTS data synchronization link from the South China 1 (unit) database to the East China 1 (center) database is stopped.
    5. The DTS data synchronization link from the North China 1 (unit) database to the East China 1 (center) database is stopped.
    6. A DTS data synchronization link is added from the South China 1 (unit) database to North China 1 (unit) for shards [30~99].
  2. Recovery
    1. Rebuild East China 1 (center).
    2. After data migration and synchronization to East China 1 (center) are complete, stop reads and writes of shards [30-69] in South China 1 (unit).
    3. Stop data synchronization of shards [00-29] from East China 1 (center) to South China 1 (unit).
    4. Create data synchronization from East China 1 (center) to South China 1 (unit).
    5. Create data synchronization from East China 1 (center) to North China 1 (unit).
    6. Switch reads and writes of shards [00-29] to South China 1 (unit).
    7. Open the primary database of South China 1 (unit) for writing.
    8. Run a check.

UPS

A UPS (Uninterruptible Power System, also called an uninterruptible power supply) is a constant-voltage, constant-frequency power supply whose main components are an energy storage device and an inverter. It is mainly used to provide uninterrupted power to a single computer, a computer network system, or other power electronic equipment. When the mains input is normal, the UPS stabilizes the mains voltage and supplies it to the load; in this state it acts as an AC mains voltage stabilizer and also charges its internal battery. When the mains is interrupted (an accidental power failure), the UPS immediately continues to supply 220V AC power to the load by inverting the energy stored in its internal battery, so that the load keeps operating normally and its software and hardware are protected from damage. UPS equipment usually provides protection against both over-voltage and under-voltage.

UPS composition

1. Master control station (backstage)

It consists of a monitoring station, an engineering maintenance station, system interfaces, and so on. It uses management and analysis software to process the received data and publish it through the Web. Engineering maintenance personnel can log in to the server to view the operating status of all online equipment in the entire plant, and to perform complete historical and real-time data analysis and statistics.

2. Field equipment control station (ES)

Depending on the needs of the field equipment, you can choose a monitoring instrument or an equipment operating status information acquisition instrument (EII). The EII communicates with smart devices such as electric energy meters, battery acquisition modules, DC screens, and UPS units through the RS-232/485 port, converts the monitoring data into data packets that conform to the communication protocol, connects to the LAN, and transmits the packets to the server in the main control room. A complete, independent ES includes the following parts.

2.1. System host. It consists of a downlink serial channel, a data processor, a display, and an uplink serial channel. The downlink serial channel connects to the battery voltage acquisition modules over the RS-485 bus, collects data, and manages the modules. The data processor handles data decompression, calculation, and storage management; it sends part of the processed data to the display, and the rest is sent through the uplink serial channel to the protocol processor or passed to the upper-level management system.

2.2. Data acquisition module group. The data collection requirements are determined by user needs, and the corresponding collection equipment is configured accordingly. The group generally consists of battery voltage acquisition modules plus current, temperature, and power modules, with good isolation between modules, strong insulation, and high reliability and safety. Data collection can be grouped: each module collects the voltage of a certain number of batteries and can be equipped with current and temperature sensors. The modules are generally connected to the system host over RS-485.

2.3. Protocol processor. An interface board with a protocol processing program that handles various communication protocols. It can: ① encode and package the battery voltage, current, temperature, and other information sent by the host, and send it to the remote server according to the agreed protocol; ② decode the remote control and remote adjustment commands sent by the remote server and pass them to the host for real-time control.

2.4. Discharge module. It can quickly measure the DC internal resistance of the battery and test battery performance instantaneously. The high-power discharge module can provide an instantaneous high-current impact load.

2.5. Remote server. It handles computer data communication within the local area network, remotely accesses the on-site battery monitoring system through the local post, receives and analyzes the data, and publishes the data through the Web server.
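The encode and decode roles of a protocol processor can be illustrated with a fixed binary frame. The frame layout below is purely hypothetical (the source does not specify the agreed protocol): a module id, voltage in millivolts, current in milliamps, and temperature in tenths of a degree Celsius, packed big-endian with Python's `struct` module.

```python
import struct

# Hypothetical frame: module id (uint16), voltage in mV (uint32),
# current in mA (int32), temperature in 0.1 degC (int16), big-endian.
FRAME = struct.Struct(">HIih")


def encode_reading(module_id, voltage_v, current_a, temp_c) -> bytes:
    """Pack one battery reading into a fixed-size frame."""
    return FRAME.pack(
        module_id,
        round(voltage_v * 1000),
        round(current_a * 1000),
        round(temp_c * 10),
    )


def decode_reading(frame: bytes):
    """Unpack a frame back into (module_id, volts, amps, degrees C)."""
    module_id, millivolts, milliamps, decidegrees = FRAME.unpack(frame)
    return module_id, millivolts / 1000, milliamps / 1000, decidegrees / 10


frame = encode_reading(3, 12.6, -1.5, 25.3)
reading = decode_reading(frame)
```

Scaling the values to integers before packing avoids sending floating-point numbers over the serial link; a real protocol would typically also add a header and a checksum, which this sketch omits.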

3. Communication network

Each substation (collection and monitoring station) of the networked field equipment uses optical fiber as the main data communication line, forming a plant-wide local area network for online monitoring of the UPS and DC power supplies.

UPS service provider

(Jingdong search results):

Among domestic products, Huawei UPS is relatively reliable and is generally the first choice.

 


Origin blog.csdn.net/boonya/article/details/109800395