Designing Data-Intensive Applications notes (Replication)

Purpose of replication:

  • Keep data geographically close to users to reduce access latency (multiple data centers)
  • Data remains available when a node goes down (high availability)
  • Multiple readable replicas increase read throughput

Master-slave replication:

Synchronous replication -> strong consistency, but weak availability: once any follower is down, writes fail.
Compromise: make one follower synchronous and the rest asynchronous; if the synchronous follower goes down, promote an asynchronous follower to synchronous.
A newer strongly consistent replication algorithm: chain replication.

Asynchronous replication -> high availability, weak consistency.

Adding a new follower:
Challenge: the master node keeps accepting writes, so copying its data directly produces an inconsistent copy.
Approaches:
Lock the master node against writes while copying the data — correct but far too crude.
A better approach in three steps:

  1. Take a consistent snapshot of the master node's data and record the corresponding position in the replication log
  2. Copy the snapshot to the new follower
  3. After applying the snapshot, the new follower replays the replication log from the recorded position until it catches up with the master node
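
A minimal sketch of this three-step bootstrap, assuming an in-memory leader with a list-backed replication log; `Leader`, `Follower`, and `take_snapshot` are illustrative names, not any particular database's API:

```python
# Illustrative only: an in-memory leader with an append-only replication log.
class Leader:
    def __init__(self):
        self.data = {}   # key -> value
        self.log = []    # replication log: list of (key, value) entries

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))

    def take_snapshot(self):
        # Step 1: consistent snapshot plus the log position it corresponds to.
        return dict(self.data), len(self.log)


class Follower:
    def __init__(self):
        self.data = {}

    def bootstrap(self, leader):
        snapshot, pos = leader.take_snapshot()  # step 1
        self.data = snapshot                    # step 2: copy the snapshot
        while pos < len(leader.log):            # step 3: replay the log to catch up
            key, value = leader.log[pos]
            self.data[key] = value
            pos += 1
```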

Follower failure and recovery:
Replay the replication log from the last position processed before the crash until caught up.

The master node going down -> failover, a complicated mechanism requiring three steps:

  1. Failure detection: typically via heartbeats
  2. Electing a new primary: majority-vote consensus algorithms and protocols
  3. Routing write requests to the new primary and notifying clients

Issues:

  • Data loss: the newly elected primary may be missing writes the old primary had accepted
  • Split brain: two nodes simultaneously consider themselves the primary
  • Choosing a reasonable heartbeat timeout (network jitter can cause false positives)

Is manual failover better than automatic? (A toy failure detector for step 1 is sketched below.)
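
A toy heartbeat-based detector, as a sketch only; the 5-second timeout and all names are assumptions, and a real system pairs this with a consensus protocol for step 2:

```python
import time

HEARTBEAT_TIMEOUT = 5.0  # seconds; too short and network jitter causes false positives

class FailureDetector:
    def __init__(self):
        self.last_heartbeat = {}  # node_id -> time the last heartbeat arrived

    def on_heartbeat(self, node_id):
        self.last_heartbeat[node_id] = time.monotonic()

    def suspected_down(self, node_id):
        # A node is suspected down if no heartbeat arrived within the timeout.
        last = self.last_heartbeat.get(node_id)
        return last is None or time.monotonic() - last > HEARTBEAT_TIMEOUT
```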

Replication log implementations:

  • Statement-based (earlier versions of MySQL):
    problems: nondeterministic functions such as NOW() and RAND(), auto-increment primary key conflicts, side effects from triggers, stored procedures, etc. (contrast with the logical entry in the example below)
  • WAL shipping (PostgreSQL, Oracle):
    issues: tightly coupled to the storage engine; rolling upgrades can be a problem
  • Logical (row-based) replication: a custom format describing the changed row values.
    Advantages: independent of the storage engine, good compatibility
  • Trigger-based (Databus for Oracle, Bucardo):
    Advantages: flexibility
    Problem: overhead
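
To illustrate the difference, a hypothetical logical log entry ships resolved row values rather than SQL text or storage-engine bytes; the field names here are invented, not any database's actual format:

```python
# Statement-based replication would ship "UPDATE users SET last_login = NOW() ...",
# and NOW() would evaluate differently on each replica. A logical entry instead
# records the already-resolved values:
logical_log_entry = {
    "table": "users",
    "op": "update",
    "primary_key": {"id": 42},
    "new_values": {"last_login": "2019-10-20T12:34:56Z"},  # NOW() pre-evaluated
}
```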

Replication lag problems:
Eventual consistency: trade weaker consistency for read scalability.
How weak the consistency gets depends on replication lag and the read policy.

  • read-your-own-writes consistency (read-after-write consistency): guarantee that a user's subsequent reads see that user's own earlier writes
    implementations (see the routing sketch below):
    if users modify only their own data and the owner ID is known, read one's own data from the master node and everything else from a random node
    if users can modify lots of shared data, the client records key -> last-modified time and, within some window after a write, reads that key only from the master node, otherwise from a random node; alternatively the client sends the key + last-modified time along with the read, and the queried node compares timestamps and either waits until it has caught up or returns the result
    multiple data centers: route a user's read requests to the same data center
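
A sketch of the key + last-modified-time approach; the window length, the `Router` name, and the node objects' `read`/`write` methods are all assumptions:

```python
import random
import time

RECENT_WRITE_WINDOW = 10.0  # assumed upper bound on replication lag, in seconds

class Router:
    def __init__(self, leader, replicas):
        self.leader = leader      # nodes exposing read(key) / write(key, value)
        self.replicas = replicas
        self.last_write_at = {}   # client-side record: key -> last modify time

    def write(self, key, value):
        self.leader.write(key, value)
        self.last_write_at[key] = time.monotonic()

    def read(self, key):
        # Keys this client wrote recently must be read from the leader;
        # anything else may be served by any replica.
        last = self.last_write_at.get(key, -float("inf"))
        if time.monotonic() - last < RECENT_WRITE_WINDOW:
            return self.leader.read(key)
        return random.choice(self.replicas).read(key)
```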

  • monotonic reads: guarantee that a second read never returns older results than the first
    implementation:
    have each user always read the same key from the same replica, e.g. chosen by hashing the user ID (data skew may occur); see the sketch below
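
A sketch of pinning each user to one replica by hashing the user ID; the hashing scheme is an assumption, and it breaks down if the chosen replica fails:

```python
import hashlib

def replica_for_user(user_id: str, replicas: list):
    # The same user always hits the same replica, so successive reads can
    # never go backwards in time; hot users may skew load onto one node.
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return replicas[int(digest, 16) % len(replicas)]
```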

  • consistent prefix reads: guarantee that causally related writes are read in order (e.g., a question before its answer)
    implementations:
    a globally consistent write order
    algorithms that track causal dependencies

Solving replication lag problems thoroughly requires distributed transactions.

Multi-master replication:

Supports multiple writers.
Each leader is at the same time a follower of the other leaders.
The main application scenario is multiple data centers:
each data center has a leader, which replicates data to the followers in its own data center and to the leaders of the other data centers.

Performance: write requests are handled by the local data center's leader and never wait on the WAN.
Data center fault tolerance: after one data center goes down, the others still operate independently; once it recovers, the missed data is replicated over.
Network fault tolerance: replication across the WAN is asynchronous, so temporary network instability does not affect writes.

Disadvantage: write conflicts must be resolved.
Auto-increment primary key conflicts, triggers, and data integrity constraints must all be considered in the design.

Write conflict detection and resolution:

  • Synchronous detection: replicate the data to every node, then detect conflicts and notify the user; the latency is large and completely sacrifices the advantages of multi-master replication.
  • Conflict avoidance: ensure the same key is always written to the same data center, e.g. by hashing the data or routing by user; problems arise if that data center goes down or the user's location changes.
  • Give each write a UUID; for the same data the last write wins (based on timestamps etc.); data may be lost (see the sketch below).
  • Give each replica a UUID; for the same data the write from the higher-numbered replica wins; data may also be lost.
  • Record the conflicting writes and resolve them later, automatically or by prompting the user.
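
A sketch of last write wins over (timestamp, UUID, value) tuples; the tuple layout and UUID tie-breaking are assumptions:

```python
def lww_merge(a, b):
    # Writes are (timestamp, uuid, value) tuples; Python compares them field
    # by field, so the later timestamp wins and the UUID breaks ties.
    # The losing value is silently discarded -- this is how LWW loses data.
    return max(a, b)

w1 = (1571500000.0, "uuid-a", {"cart": ["milk"]})
w2 = (1571500000.0, "uuid-b", {"cart": ["eggs"]})
assert lww_merge(w1, w2)[1] == "uuid-b"  # the concurrent write w1 is lost
```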

When conflicts are resolved:
on write: the conflict is resolved in the background as soon as it occurs; the user cannot intervene (Bucardo)
on read: the conflicting data is saved and resolved at the next read, automatically or by prompting the user (CouchDB)
transactions: each conflicting write is resolved separately, so transaction boundaries cannot be maintained

Automatic conflict resolution:
conflict-free replicated data types (CRDTs)
mergeable persistent data structures (Git)
operational transformation
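
A minimal sketch of one CRDT, a grow-only counter; representing replica IDs as list indices is an assumption:

```python
class GCounter:
    """Grow-only counter CRDT: each replica increments only its own slot,
    and merge takes the element-wise maximum, so merging is commutative,
    associative, and idempotent -- concurrent updates never conflict."""

    def __init__(self, replica_id: int, n_replicas: int):
        self.replica_id = replica_id
        self.counts = [0] * n_replicas

    def increment(self, amount: int = 1):
        self.counts[self.replica_id] += amount

    def value(self) -> int:
        return sum(self.counts)

    def merge(self, other: "GCounter"):
        self.counts = [max(a, b) for a, b in zip(self.counts, other.counts)]
```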

Multi-master replication topologies (more than two masters):
star
ring
all-to-all
Replication loops must be prevented (each write records the numbers of the nodes it has passed through).
In a star or ring topology, one node going down interrupts the flow of replication.
All-to-all topologies are susceptible to uneven delays delivering writes out of order, leading to data consistency problems.

Leaderless replication:

There is no concept of a primary; clients write directly to multiple nodes.
A write succeeds once enough nodes respond to the write request.
Read requests are also sent to multiple nodes, and the newest version among the responses is taken to ensure the data read is current.
Guarantee: nodes written successfully (w) + nodes read (r) > nodes in the cluster (n).
Reads and writes are sent to all n nodes in parallel; w and r only count the responses waited for.
If w + r <= n, consistency is sacrificed in exchange for low latency and high availability.
In general choose w > n/2 and r > n/2, tolerating at most n/2 nodes down (see the sketch below).
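
The quorum condition as a one-line check plus the usual parameter choice; a sketch, not any database's API:

```python
def quorums_overlap(n: int, w: int, r: int) -> bool:
    # If w + r > n, every read set intersects every write set, so at least
    # one of the r nodes queried holds the most recent successful write.
    return w + r > n

assert quorums_overlap(n=5, w=3, r=3)       # tolerates 2 nodes down
assert not quorums_overlap(n=5, w=2, r=2)   # w + r <= n: stale reads possible
```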
Even with w + r > n it is still possible to read stale data:

  • Sloppy quorums: the w nodes written and the r nodes read may not overlap
  • Two writes occur concurrently and it is unclear which happened first
  • A write concurrent with a read may be reflected on only some nodes, so the read may miss the latest write
  • A write succeeds on fewer than w nodes and is not rolled back on the nodes where it did succeed
  • A node holding a successful write goes down and is restored from a node with old data

Monitoring data staleness is very difficult; unlike master-slave replication, there is no replication lag to monitor, because writes are not applied in any fixed order.

Sloppy quorum: when some of the nodes normally responsible for a key cannot be written, the writes temporarily land on other nodes (reads and writes migrate).
Hinted handoff: after the usual nodes recover, the temporarily accepted writes are migrated back to them.
Sloppy quorums improve availability but decrease the probability of reading the latest data.
Leaderless replication also suits the multi-data-center scenario: a write request is sent to multiple data centers, but the client waits only for confirmation from its own data center.

Guaranteeing eventual consistency after node failures:
read repair: a client that detects conflicting versions while reading automatically updates the nodes holding older versions (see the sketch below)
anti-entropy process: a background process compares data versions between nodes and repairs them (rarely read data may stay inconsistent for a long time)
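
A sketch of read repair over nodes whose `read` returns a (version, value) pair; the node interface is an assumption:

```python
def read_with_repair(key, replicas):
    # Ask several replicas, pick the highest-versioned response, and write
    # that value back to any replica that answered with a stale version.
    responses = [(node, node.read(key)) for node in replicas]  # read -> (version, value)
    latest_version, latest_value = max((resp for _, resp in responses),
                                       key=lambda r: r[0])
    for node, (version, _) in responses:
        if version < latest_version:
            node.write(key, (latest_version, latest_value))  # repair the stale copy
    return latest_value
```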

Write conflicts may occur in the following circumstances:

  • A node misses a write request due to network failure
  • Different nodes receive writes in inconsistent orders

Write conflict management:

  • Last write wins: assign a version number to each write; on conflict keep only the data with the largest version; data may be lost. Suitable for scenarios where each key is written once and never modified afterwards (time-series data?).
  • Defining concurrent writes: if two writes have a strong timing relationship or dependency on each other (e.g., A inserts key1 and B increments key1), then A and B are causally dependent; for any writes A and B, either A is before B, A is after B, or A and B are concurrent.
  • Algorithm for detecting concurrent writes:
    the server maintains a version number for each key, incremented on every write
    a client must read the latest version of a key from the server before it starts writing that key
    a write carries the version read beforehand; it may overwrite data with a lower or equal version, but data with a later version must be retained (a concurrent write)
    shopping cart example:
    clients A and B add items to the same cart at the same time; on each submitted write, the server generates a new version of the cart and returns the latest cart contents with the corresponding version number to the submitting client
    on the next submission, the client merges the previously returned cart with its own change and submits it together with the previous version number
    from the server's view, a client's new version supersedes that client's previous versions, but the latest versions submitted by different clients are considered concurrent writes: they do not overwrite each other and need conflict resolution
  • Handling concurrent writes:
    the multiple concurrent versions are merged by the client
    deletes cannot simply be physical; they must be marked with a tombstone
    automatic conflict resolution: see above
  • Version vectors:
    apply to leaderless replication, where there is no master node, only multiple peer replicas
    one version number per replica per key
    before writing, read the latest version vector from a replica and merge it into the data written back
    this guarantees that reading from replica A and then writing to replica B is safe (see the sketch after this list)
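
A sketch of version-vector comparison and merging, with vectors represented as replica-id -> counter dicts (the representation is an assumption):

```python
def merge(a: dict, b: dict) -> dict:
    # Element-wise maximum: the vector to write back after reading both.
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in a.keys() | b.keys()}

def happened_before(a: dict, b: dict) -> bool:
    # a precedes b if b is at least as new in every component and differs somewhere.
    return a != b and all(v <= b.get(r, 0) for r, v in a.items())

def concurrent(a: dict, b: dict) -> bool:
    # Neither vector dominates: the writes are siblings and both must be kept.
    return a != b and not happened_before(a, b) and not happened_before(b, a)

v1 = {"replica_a": 2}                  # written after two reads via replica_a
v2 = {"replica_a": 1, "replica_b": 1}  # written elsewhere from an older read
assert concurrent(v1, v2)
assert merge(v1, v2) == {"replica_a": 2, "replica_b": 1}
```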

Reprinted from blog.51cto.com/shadowisper/2448260