"Architecture Evolution of cellular data path" reading notes

First, functional integration

1. How does each function work?

Offline sync: can be understood as querying data from the source storage (for example via SQL) and synchronizing it to other target storage;

Real-time subscription: parse the MySQL binlog in real time, wrap the change data into event messages stored in a message queue, and let users subscribe to and consume them;

Real-time synchronization: built on top of the real-time subscription client; it consumes the messages in real time and applies the data changes to the target storage.

2. How are the three functions integrated under one platform architecture?

The three requirements (offline synchronization, real-time subscription, and real-time synchronization) are abstracted into three kinds of jobs: BatchJob, StreamJob, and PieJob.

i. BatchJob follows the Sqoop model: the data to be synchronized is sharded according to specified rules, and the job is split into multiple tasks according to these shards. Each sub-task synchronizes only its own shard, and multiple tasks can run in parallel to speed up synchronization;

ii. Based on the same model, a StreamJob is also split into multiple tasks, one per MySQL instance to be collected. Each task collects and parses the binlog of its MySQL instance, wraps the parsed change events into messages, and stores them locally for subscribers to consume;

iii. PieJob is a wrapper around subscription clients; each subscribing client is treated as one task.

In the end, all three job types are split into multiple tasks via sharding and run under the same unified model (see the sketch below).
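
A minimal sketch of this unified model, assuming a Job interface that is sharded into Tasks (the class and method names below are illustrative, not the platform's actual code):

```java
// Illustrative sketch of the unified job/task model described above.
import java.util.List;

interface Task extends Runnable {}          // one shard of work, executed on a worker

interface Job {
    List<Task> split();                     // shard the job into independent tasks
}

// Offline sync: shard the source data by a rule (e.g. id ranges), one task per shard.
class BatchJob implements Job {
    public List<Task> split() { /* build one BatchTask per data shard */ return List.of(); }
}

// Real-time subscription: one task per MySQL instance whose binlog is collected.
class StreamJob implements Job {
    public List<Task> split() { /* build one StreamTask per MySQL instance */ return List.of(); }
}

// Real-time sync: one task per subscribing client.
class PieJob implements Job {
    public List<Task> split() { /* build one PieTask per client */ return List.of(); }
}
```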

 

Second, task internals

This section covers the internal implementation details of the tasks generated after each kind of job is sharded.

1. BatchTask

A Fetcher is responsible for reading data, a Sinker is responsible for writing data, and a Storage component sits between them as a buffer layer.
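
A rough sketch of how such a pipeline could look, with a bounded queue standing in for the Storage buffer (all names are illustrative assumptions):

```java
// Minimal sketch of the Fetcher -> Storage -> Sinker pipeline inside a BatchTask.
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BatchTaskSketch {
    private static final String EOF = "__EOF__";              // sentinel marking end of the shard
    private final BlockingQueue<String> storage = new ArrayBlockingQueue<>(1024);

    // Fetcher: reads the rows of this task's shard from the source and buffers them.
    Runnable fetcher(Iterable<String> sourceRows) {
        return () -> {
            try {
                for (String row : sourceRows) storage.put(row);
                storage.put(EOF);
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        };
    }

    // Sinker: drains the buffer and writes rows to the target storage.
    Runnable sinker() {
        return () -> {
            try {
                for (String row; !(row = storage.take()).equals(EOF); ) {
                    writeToTarget(row);
                }
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        };
    }

    void writeToTarget(String row) { System.out.println("sink: " + row); }
}
```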

2. StreamTask

A RelayLogTask is responsible for pulling the binlog; an HHLTask parses the binlog, wraps the parsed change events into an easy-to-consume message body, and finally stores them in the hhl.

The hhl implementation borrows from Kafka and can be viewed as a simplified message queue: messages are serialized with protobuf, compressed, and written sequentially to files, with an index provided over blocks of a fixed size.
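
A simplified sketch of such a sequential log with a sparse block index (protobuf serialization and compression are omitted; the on-disk layout and names are assumptions for illustration):

```java
// hhl-style append-only log: messages are length-prefixed and written sequentially,
// and a sparse index maps a message position (offset) to the file position of its block.
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.TreeMap;

public class HhlFileSketch implements AutoCloseable {
    private static final long BLOCK_SIZE = 64 * 1024;          // index granularity in bytes
    private final DataOutputStream out;
    private final TreeMap<Long, Long> index = new TreeMap<>(); // message offset -> file position
    private long nextOffset = 0;                               // logical position of the next message
    private long bytesWritten = 0;
    private long bytesAtLastIndexEntry = -BLOCK_SIZE;          // force an entry for the first message

    public HhlFileSketch(String path) throws IOException {
        this.out = new DataOutputStream(new FileOutputStream(path));
    }

    // Append one serialized message (e.g. a protobuf-encoded change event).
    public synchronized long append(byte[] message) throws IOException {
        if (bytesWritten - bytesAtLastIndexEntry >= BLOCK_SIZE) {   // start of a new indexed block
            index.put(nextOffset, bytesWritten);
            bytesAtLastIndexEntry = bytesWritten;
        }
        out.writeInt(message.length);                               // length prefix
        out.write(message);
        bytesWritten += 4 + message.length;
        return nextOffset++;
    }

    // Find the file position of the block containing the given message offset.
    public Long locateBlock(long offset) {
        var entry = index.floorEntry(offset);
        return entry == null ? null : entry.getValue();
    }

    @Override public void close() throws IOException { out.close(); }
}
```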

The StreamTask also contains a ClientServer that handles subscription requests from clients; the details are as follows:

Upon receiving a subscription request for a given position, the server uses the index to quickly locate the corresponding data block, then scans that block to find the message at that position. All messages after that position are run through the specified filter and finally pushed to the client.

The subscription server does not maintain client subscription state; that is, it does not store the client's position, which is kept by the client itself. The server is only responsible for continuously pushing the messages after the specified position to the client.
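
A sketch of how serving such a stateless subscription could work, assuming the block index maps the first offset of each block to its messages (the types and the push callback are illustrative):

```java
// Locate the block containing the requested position via the index, scan forward,
// apply the client's filter, and push every matching message at or after the position.
import java.util.List;
import java.util.NavigableMap;
import java.util.function.Consumer;
import java.util.function.Predicate;

record Message(long offset, String body) {}

class SubscriptionServerSketch {
    // offset of the first message in a block -> the messages in that block
    private final NavigableMap<Long, List<Message>> blocksByFirstOffset;

    SubscriptionServerSketch(NavigableMap<Long, List<Message>> blocks) {
        this.blocksByFirstOffset = blocks;
    }

    // The server keeps no per-client state: the client supplies its own position each time.
    void serve(long fromOffset, Predicate<Message> filter, Consumer<Message> pushToClient) {
        if (blocksByFirstOffset.isEmpty()) return;
        Long blockKey = blocksByFirstOffset.floorKey(fromOffset);   // locate the block via the index
        if (blockKey == null) blockKey = blocksByFirstOffset.firstKey();
        for (List<Message> block : blocksByFirstOffset.tailMap(blockKey).values()) {
            for (Message m : block) {                               // scan messages in the block
                if (m.offset() >= fromOffset && filter.test(m)) {
                    pushToClient.accept(m);                         // push matching messages
                }
            }
        }
    }
}
```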

3. PieTask

A PieTask is really just a wrapper around the client, so this part mainly introduces the client implementation.

The client processes messages concurrently: a Connector is responsible for receiving messages, and a Partitioner distributes them to different Processors (threads) for handling.

Because the client has to record the position it has processed on its own, in concurrent scenarios it must guarantee that all messages before the recorded position have already been processed correctly. To reduce blocking between threads, a ring (circular) array is used to commit (record) positions.
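
One way to implement such a ring-array commit, assuming each message is assigned an increasing sequence number and the committable position only advances over contiguous completed sequences (an illustrative design, not necessarily the article's exact structure):

```java
// Worker threads mark their sequence as done; the committed position only moves past a
// sequence once every earlier sequence is also done, so the recorded position is always safe.
// Assumes the number of in-flight sequences never exceeds the ring size.
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

public class RingCommitSketch {
    private final int size;
    private final AtomicLongArray done;                        // done.get(i) == seq means seq is processed
    private final AtomicLong committed = new AtomicLong(-1);   // highest contiguous processed seq

    public RingCommitSketch(int size) {
        this.size = size;
        this.done = new AtomicLongArray(size);
        for (int i = 0; i < size; i++) done.set(i, -1);
    }

    // Called by a processor thread after it has fully handled the message with this sequence.
    public void markDone(long seq) {
        done.set((int) (seq % size), seq);
        advance();
    }

    // Advance the committed pointer while the next sequence numbers are already done.
    private void advance() {
        long c = committed.get();
        while (done.get((int) ((c + 1) % size)) == c + 1) {
            if (committed.compareAndSet(c, c + 1)) c = c + 1; else c = committed.get();
        }
    }

    // The position corresponding to this sequence (and all earlier ones) is safe to record.
    public long committedSeq() { return committed.get(); }
}
```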

 

Third, the cluster

The cluster uses a master-slave architecture.

 

The master is called the Queen, and the slaves are called Bees.

The Queen is responsible for sharding jobs and scheduling; Bees are responsible for executing the concrete tasks (the tasks obtained by sharding a job).

1. High availability

i. MySQL: MySQL availability is maintained by the DBAs, but after a master-slave failover the corresponding binlog position is different on the new master. The system detects the failover by monitoring changes in the serverId, and then looks up the matching position on the new instance by timestamp;

ii. Queen: switching between the active and standby Queen is implemented with ZooKeeper (see the sketch after this list);

iii. Bee: when a Bee machine goes down, the Queen migrates all tasks that were running on it to other machines.
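
A minimal sketch of ZooKeeper-based active/standby election using an ephemeral node (the znode path and surrounding logic are assumptions; the article does not give implementation details):

```java
// Whichever process creates the ephemeral node becomes the active Queen; the standby watches
// the node and retries when the active's session expires and the node disappears.
// Assumes the parent path already exists.
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class QueenElectionSketch {
    private static final String LOCK_PATH = "/queen/active";     // hypothetical znode path
    private final ZooKeeper zk;

    public QueenElectionSketch(ZooKeeper zk) { this.zk = zk; }

    // Try to become the active Queen; block as standby until the current active disappears.
    public void runForActive() throws KeeperException, InterruptedException {
        while (true) {
            try {
                zk.create(LOCK_PATH, new byte[0],
                        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
                becomeActive();                                   // we own the ephemeral node
                return;
            } catch (KeeperException.NodeExistsException e) {
                // Another Queen is active: watch the node and wait for it to vanish.
                final Object signal = new Object();
                if (zk.exists(LOCK_PATH, (Watcher) event -> {
                        synchronized (signal) { signal.notifyAll(); }
                    }) != null) {
                    synchronized (signal) { signal.wait(); }
                }
            }
        }
    }

    private void becomeActive() { /* start sharding jobs and scheduling tasks */ }
}
```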

2. Data locality

Each Bee carries its own cabinet, rack, data center, and machine-group information. A job can specify its placement preference, and its tasks will preferentially be assigned to machines in the specified group.

3. Load balancing

Bees report their load in heartbeats at runtime. When a task needs to be scheduled, the Queen preferentially assigns it to a low-load host, subject to the data-locality constraint, as sketched below.
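
A sketch of such a placement decision, preferring Bees in the requested machine group and picking the least-loaded one (the Bee record and field names are illustrative):

```java
// Among alive Bees, prefer those in the job's requested machine group (data locality),
// then pick the one with the lowest load reported in its latest heartbeat.
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

record Bee(String host, String machineGroup, double load) {}    // load reported via heartbeat

class SchedulerSketch {
    // Returns the Bee that should run the task, or empty if no Bee is available.
    Optional<Bee> pickBee(List<Bee> aliveBees, String preferredGroup) {
        List<Bee> local = aliveBees.stream()
                .filter(b -> b.machineGroup().equals(preferredGroup))
                .toList();
        List<Bee> candidates = local.isEmpty() ? aliveBees : local;  // fall back if no local Bee
        return candidates.stream().min(Comparator.comparingDouble(Bee::load));
    }
}
```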

 

Read the original:

https://mp.weixin.qq.com/s?__biz=MzU1MzE2NzIzMg==&mid=2247485060&idx=1&sn=2d374061f2f85c453cc27d092a5354ad&chksm=fbf7b66bcc803f7dde316a4edbb40d9e6074640ef95ca7b1095b3d885433c991115ac00f1d9b&scene=21#wechat_redirect


Origin: www.cnblogs.com/iCheny/p/11056558.html