MySQL High Availability with MGR

 

 

MGR overall structure and characteristics

  single-master

    Only one node accepts writes; every node can be read

  multi-master

    Every node accepts both writes and reads
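
To check which mode a running group is in, a minimal sketch, assuming MySQL 8.0 (where replication_group_members exposes a member_role column):

```sql
-- 1 (ON) means single-master (single-primary) mode, 0 (OFF) means multi-master.
SELECT @@group_replication_single_primary_mode AS single_primary_mode;

-- In single-master mode exactly one member reports PRIMARY and the rest SECONDARY;
-- in multi-master mode every ONLINE member reports PRIMARY.
SELECT member_host, member_port, member_state, member_role
FROM performance_schema.replication_group_members;
```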

  Concepts involved:

    group communication system (GCS)

    writeset

    membership

    certification info

    flow control stats

    paxos

  MGR enhanced read consistency

  group_replication_consistency (introduced in 8.0.14)

    EVENTUAL: Default

    BEFORE: the transaction waits until all transactions already in the queue have been applied

    BEFORE_ON_PRIMARY_FAILOVER: after a failover, new transactions on the new primary wait until it has applied the queued transactions

    AFTER: the transaction waits until its data changes have been applied on all other nodes

    BEFORE_AND_AFTER: combines BEFORE and AFTER
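
The level can be set per session or globally. A minimal sketch, assuming MySQL 8.0.14 or later; the shop.orders table is hypothetical and only there to show a read that benefits from the setting:

```sql
-- Session level: reads in this session wait until the transactions that
-- precede them in the queue have been applied on this node.
SET @@SESSION.group_replication_consistency = 'BEFORE';

SELECT * FROM shop.orders WHERE id = 42;  -- hypothetical table

-- Global level: default for sessions opened after this statement.
SET @@GLOBAL.group_replication_consistency = 'BEFORE_ON_PRIMARY_FAILOVER';

-- Verify what the current session uses.
SELECT @@SESSION.group_replication_consistency;
```

Setting a stricter level only for the sessions that need it keeps the waiting cost away from the rest of the workload.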

  MGR restrictions

    InnoDB only; every table must have a primary key

    Binlog format: ROW, with binlog checksum disabled (binlog_checksum=NONE)

    GTID must be enabled

    Transaction isolation level: READ COMMITTED (no gap locks)

    Large transaction limit: group_replication_transaction_size_limit

    Multi-master mode: avoid concurrent DDL / DML on the same table from different nodes

    Maximum number of cluster nodes: 9 (use an odd number)
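
The restrictions above can be checked before a node joins the group. A minimal sketch, assuming MySQL 8.0 variable names (older 5.7 releases use tx_isolation instead of transaction_isolation):

```sql
-- Expected: ROW, NONE, ON, ON, READ-COMMITTED
SELECT @@binlog_format,
       @@binlog_checksum,
       @@gtid_mode,
       @@enforce_gtid_consistency,
       @@transaction_isolation;

-- List user tables that are not InnoDB or have no explicit primary key.
SELECT t.table_schema, t.table_name, t.engine
FROM information_schema.tables AS t
LEFT JOIN information_schema.table_constraints AS c
       ON c.table_schema = t.table_schema
      AND c.table_name = t.table_name
      AND c.constraint_type = 'PRIMARY KEY'
WHERE t.table_type = 'BASE TABLE'
  AND t.table_schema NOT IN ('mysql', 'sys', 'information_schema', 'performance_schema')
  AND (t.engine <> 'InnoDB' OR c.constraint_name IS NULL);
```

An empty result from the second query means no table violates the engine or primary key rule.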

 

How MGR synchronizes data

  MGR data replication -> certification service

    Certification service

    Conflict detection

      certification_info key: xxhash64(index name + DB name + DB name length + table name + table name length + value and length of each column in the unique index); the value is the transaction's gtid_executed

      GTID allocation for transactions

      group_replication_gtid_assignment_block_size

    Group commit

 

MGR data replication conflict resolution

  Problem:

    As writes keep coming in, certification_info grows larger and larger; will performance keep getting worse?

  Approach:

    certification_info has a clean-up mechanism
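
The growth and clean-up of certification_info can be observed from performance_schema. A minimal sketch, assuming the standard replication_group_member_stats columns:

```sql
SELECT member_id,
       count_transactions_checked,          -- transactions certified so far
       count_conflicts_detected,            -- transactions rolled back by conflict detection
       count_transactions_rows_validating,  -- rows currently held in certification_info
       transactions_committed_all_members   -- GTID set already committed on every member
FROM performance_schema.replication_group_member_stats;
```

If count_transactions_rows_validating only ever grows, the clean-up is not keeping up, typically because one member lags behind in applying transactions.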

 

MGR data replication flow control

Flow Control

  Purpose of flow control

    Keep the cluster's replication delay under control (read-only transactions are not subject to flow control)

  Why flow control is needed

    Node performance is uneven

    Bucket effect: the slowest node limits the whole group

  Parameters

    group_replication_flow_control_mode: default QUOTA (flow control enabled)

    group_replication_flow_control_period: how often flow control statistics are collected, unit: seconds

    group_replication_flow_control_applier_threshold & group_replication_flow_control_certifier_threshold: flow control is triggered on the node once the number of transactions waiting in the applier / certifier queue exceeds the threshold
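
A minimal sketch of inspecting and adjusting these parameters at runtime. The value 50000 is purely illustrative, not a recommendation:

```sql
-- Current flow control settings on this node.
SELECT @@group_replication_flow_control_mode,
       @@group_replication_flow_control_period,
       @@group_replication_flow_control_applier_threshold,
       @@group_replication_flow_control_certifier_threshold;

-- Tolerate a longer backlog before writers are throttled (illustrative value).
SET GLOBAL group_replication_flow_control_applier_threshold = 50000;
SET GLOBAL group_replication_flow_control_certifier_threshold = 50000;

-- Turn flow control off entirely; the delay is then no longer bounded.
SET GLOBAL group_replication_flow_control_mode = 'DISABLED';
```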

 

 

MGR monitoring points

  Is the current node online?

    select member_state from performance_schema.replication_group_members;

  Is there a delay?

    Received: select received_transaction_set from performance_schema.replication_connection_status;

    Already executed: select @@gtid_executed;

  Is there a backlog in the queue?

    select count_transactions_in_queue from performance_schema.replication_group_member_stats where member_id=@@server_uuid;

  Is the current node writable?

    select * from performance_schema.global_variables where variable_name in ('read_only','super_read_only');
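
The two delay queries can be combined into one check. A minimal sketch, assuming the Group Replication applier channel has its usual name, group_replication_applier:

```sql
-- GTIDs that this node has received from the group but not yet executed.
-- An empty string means the node has caught up.
SELECT GTID_SUBTRACT(
         (SELECT received_transaction_set
            FROM performance_schema.replication_connection_status
           WHERE channel_name = 'group_replication_applier'),
         @@GLOBAL.gtid_executed
       ) AS received_but_not_applied;

-- State of the current node only, rather than of every member.
SELECT member_state
FROM performance_schema.replication_group_members
WHERE member_id = @@server_uuid;
```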

  

MGR optimization direction

  Operations

    MGR is still a replication architecture at heart: all data is replicated, and replicated logically, so the optimization points are the same as for replication.

    Change:

      slave_parallel_type -> LOGICAL_CLOCK

    Increase the number of SQL threads:

      slave_parallel_workers -> 2-8

    If CPU is the bottleneck and the network is not, compress less to save CPU:

      group_replication_compression_threshold = 1000000 -> 2000000

      Raising the threshold from 1 MB to 2 MB means only transactions larger than 2 MB are compressed (mainly an optimization for transferring large transactions)
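
These changes can be applied as follows. This is a sketch under assumptions: it uses the pre-8.0.26 variable names from the text (later releases rename them to replica_parallel_type and replica_parallel_workers), it stops Group Replication on the node first so that the new applier settings are picked up on restart, and it adds slave_preserve_commit_order, which is not mentioned above but which Group Replication expects when the parallel applier is used.

```sql
-- Run on one node at a time; the node leaves the group while stopped.
STOP GROUP_REPLICATION;

SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK';
SET GLOBAL slave_parallel_workers = 4;          -- illustrative value within the 2-8 range
SET GLOBAL slave_preserve_commit_order = ON;    -- assumption: keeps commit order with parallel workers

-- Compress only transactions larger than about 2 MB.
SET GLOBAL group_replication_compression_threshold = 2000000;

START GROUP_REPLICATION;
```

SET GLOBAL does not survive a server restart, so the same values also belong in the configuration file.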

  For write-heavy environments

    Use single-master mode

    Table design: reduce the number of indexes and prefer composite indexes

  Kernel

    Has been tried: static const int BROADCAST_GTID_EXECUTED_PERIOD = 60 -> 30; // seconds

  Important parameters:

    group_replication_member_expel_timeout (8.0.13+) 

      After (5 + X) seconds, the node is removed from the group membership

      Network failure -> 5 seconds -> suspected lost -> X seconds in UNREACHABLE -> expelled

      During those X seconds, the group cannot add nodes, remove nodes, or elect a primary

    group_replication_unreachable_majority_timeout

      After a network partition, minority members that fail to reconnect to the majority within X seconds enter the ERROR state

    group_replication_exit_state_action (8.0.12+, 5.7.24+)

      ABORT_SERVER / READ_ONLY

      Triggered by: applier execution error / loss of contact with the majority / expulsion from the group after network instability

    group_replication_recovery_complete_at

      TRANSACTIONS_CERTIFIED / TRANSACTIONS_APPLIED

    group_replication_member_weight

      Useful in single-primary mode when the nodes are not equal in role

      If group_replication_member_weight is the same, the election is decided by server_uuid

    group_replication_transaction_size_limit (5.7.19+)

      Maximum size in bytes of a single transaction; it limits network overhead, memory allocation, and the probability of conflict

    group_replication_compression_threshold

      Transactions larger than X bytes are sent with LZ4 compression; default 1 MB
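
A minimal sketch of setting these parameters at runtime, assuming MySQL 8.0.13 or later. All values are illustrative only:

```sql
-- X = 5: a node that loses contact is expelled roughly 5 + 5 = 10 seconds later.
SET GLOBAL group_replication_member_expel_timeout = 5;

-- A minority member gives up and enters ERROR after 30 seconds without the majority.
SET GLOBAL group_replication_unreachable_majority_timeout = 30;

-- When the node leaves the group involuntarily, stay up but refuse writes.
SET GLOBAL group_replication_exit_state_action = 'READ_ONLY';

-- A higher weight makes this node preferred in a primary election.
SET GLOBAL group_replication_member_weight = 80;

-- Reject transactions larger than about 100 MB.
SET GLOBAL group_replication_transaction_size_limit = 104857600;
```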

 

MGR deployment architecture recommendations

  MySQL Router + MGR

    router: two endpoints (read, write)

    MySQL Router requires familiarity with the X Protocol and JS-related operations

    Recommended alternative: ProxySQL

  For performance: single-master

  For ease of use: multi-master (write on a single node, read from multiple nodes)

 

 

 

Origin www.cnblogs.com/yujiaershao/p/11357932.html