Kafka Study Notes

"Kafka Design Analysis (1): Kafka Background and Architecture" study notes

 

 Roles of an MQ:

 1. Decoupling: an MQ decouples data producers from data consumers. The Producer only needs to care about producing data; the Consumer only needs to care about consuming it. Neither side has to worry about the transfer process in between; guaranteeing that is the MQ's job.

    Just think: if the Producer sent the data it produced to the Consumer via an HTTP request and the transmission failed, is the retry the production side's responsibility or the consumption side's? If the production side guarantees delivery with a retry mechanism, then logic that has nothing to do with producing data has been added to the Producer.

 2. Peak shaving: when the Producer (which is not necessarily inside the system; it may be an external source, such as an open HTTP interface exposed to bursty outside traffic) suddenly produces a surge of data, the Consumer's throughput cannot keep up with the increased volume. Data then inevitably accumulates and slowly overwhelms the entire system. Introducing an MQ lets the Producer enqueue the data it produces; whatever the Consumer has not yet consumed accumulates in the MQ instead of directly crushing the Consumer.

 3. Asynchrony: the Producer and Consumer may have different throughputs, so the production and consumption of data cannot always stay in sync. Introducing an MQ makes the flow of data between production and consumption asynchronous.

 

 Additional points from the article:

  4. Recoverability: when some components of the system fail, the system as a whole is not affected. A message queue reduces coupling between processes, so even if the process handling a message dies, the messages still in the queue can be processed once the system recovers.

  5. Ordering: in most usage scenarios, the order of data processing matters. Most message queues are ordered by nature and can guarantee that data is processed in a particular order. Kafka guarantees message ordering within a Partition.

     With plain HTTP requests, network delay means the request sent first does not necessarily reach the target service first, nor receive its response first. If the requests are instead produced into an MQ, their ordering can be guaranteed.
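The buffering and decoupling described above can be sketched with a plain in-memory queue standing in for the MQ (queue.Queue here is only an illustration; Kafka's log is persistent and distributed):

```python
from queue import Queue

def produce_burst(mq: Queue, n: int) -> None:
    # The Producer only cares about producing: push n messages into the MQ.
    for i in range(n):
        mq.put(f"msg-{i}")

def consume_batch(mq: Queue, batch_size: int) -> list:
    # The Consumer pulls at its own pace; unconsumed messages stay queued.
    batch = []
    while not mq.empty() and len(batch) < batch_size:
        batch.append(mq.get())
    return batch

mq = Queue()
produce_burst(mq, 10)          # sudden spike of 10 messages
first = consume_batch(mq, 3)   # the Consumer can only handle 3 at a time
print(first)                   # ['msg-0', 'msg-1', 'msg-2']
print(mq.qsize())              # 7: the surplus is buffered, not crushing the Consumer
```

The remaining 7 messages sit in the queue until the Consumer catches up, which is exactly the peak-shaving effect.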

Kafka Glossary:

  1. Broker: a Kafka node.

  2. Controller: a Kafka cluster contains one or more brokers, and one of them is elected as the controller (Kafka Controller). It is responsible for managing the state of all partitions and replicas across the cluster. When the leader replica of a partition fails, the controller is responsible for electing a new leader replica for that partition. When it detects a change in a partition's ISR set, the controller is responsible for notifying all brokers to update their metadata. When the kafka-topics.sh script increases the number of partitions for a topic, the controller is likewise responsible for the partition reassignment.

     Controller election mechanism: all broker nodes try to register the ephemeral node /Controller in ZK. Since only one broker can register it successfully, the broker that registers successfully becomes the Controller; the brokers that fail conclude from the failure that the cluster already has a Controller.
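The first-registration-wins election can be sketched with a toy stand-in for ZooKeeper (FakeZooKeeper and its create method are illustrative inventions, not the real ZooKeeper client API, and real brokers race concurrently rather than in sequence):

```python
class FakeZooKeeper:
    """Toy stand-in for ZK: only the first create() of a path succeeds,
    mimicking the ephemeral /Controller node."""
    def __init__(self):
        self.nodes = {}

    def create(self, path: str, data) -> bool:
        if path in self.nodes:
            return False  # node already exists: another broker won
        self.nodes[path] = data
        return True

def elect_controller(zk: FakeZooKeeper, broker_ids: list) -> int:
    controller = None
    for broker_id in broker_ids:   # real brokers race; sequential here for clarity
        if zk.create("/Controller", broker_id):
            controller = broker_id  # this broker won the election
        # brokers whose create() fails learn a Controller already exists
    return controller

zk = FakeZooKeeper()
print(elect_controller(zk, [0, 1, 2]))  # 0: the first successful registrant wins
```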

  3. Topic: every message published to a Kafka cluster has a category, called its Topic. A Topic does not guarantee the ordering of its data unless it has only one Partition. (Different Topics are stored separately at the physical level. Although the messages of one Topic are logically stored on one or more brokers, the user only needs to specify a Topic to produce or consume data, without caring where the data is actually stored.)

  4. Partition: the data of a Topic is physically divided into one or more Partitions, located on different Brokers; the data inside a single Partition is ordered. As the number of Partitions increases, the Topic's throughput increases.

  5. Producer: data producer; pushes the data it produces to the Kafka cluster.

  6. Consumer: data consumer; pulls data from the Kafka cluster.

  7. Consumer Group: a group of Consumers. While the Kafka cluster is running stably (no Rebalance happening), the data of each Partition is consumed by exactly one Consumer within the Consumer Group. A group name can be specified for each Consumer; if no group name is specified, the Consumer falls into the default group.
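The "each Partition consumed by exactly one Consumer in the group" rule can be sketched as a simple round-robin assignment (a simplification: real Kafka ships several configurable assignors, such as range, round-robin, and sticky, and reassigns on Rebalance):

```python
def assign_round_robin(partitions: list, consumers: list) -> dict:
    """Distribute partitions over the group so that each partition
    is owned by exactly one consumer."""
    assignment = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment

# 5 partitions shared by a 2-consumer group:
print(assign_round_robin([0, 1, 2, 3, 4], ["c1", "c2"]))
# {'c1': [0, 2, 4], 'c2': [1, 3]}
```

Note that no partition appears under two consumers, which is why ordering within a Partition survives group consumption.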

  8. Offset: each Consumer Group keeps an Offset value for every Partition, recording how far into that Partition the CG has consumed. Offsets are recorded in a Topic named __consumer_offsets.

  9. Partition Replication: since each Topic is divided into multiple Partitions, a Topic's data is split into several parts stored on different Kafka Brokers. If a Broker fails, the part of the data stored on that Broker would be lost. Kafka therefore introduces the concept of Replication: each Topic can be given one or more Replications, i.e. redundant copies of its Partition data. If the Replication count is set to 3, each Partition's data exists in three copies in the Kafka cluster; if it is set to 1, each Partition's data exists in only one copy.

  10. Partition Leader: because of Replication, each Partition's data exists on multiple Broker nodes, so a Partition Leader must be chosen. Producers and Consumers read and write a Partition's data only through the Partition Leader. The other Brokers holding replicas act only as Followers of the Partition, and the Leader continuously syncs data to the Followers. If the Leader goes down, the Controller designates a new Partition Leader.

Partition:

When a Producer sends a message to a broker, it chooses which Partition to store it in according to the partitioning mechanism. If the partitioning mechanism is set up reasonably, all messages are distributed evenly across the different Partitions, achieving load balancing. If a Topic corresponded to a single file, the I/O of the machine holding that file would become the Topic's performance bottleneck; with Partitions, different messages can be written in parallel to different Partitions on different brokers, which greatly improves throughput. The default number of Partitions for a new Topic can be specified via the num.partitions configuration item in $KAFKA_HOME/config/server.properties; it can also be specified with a parameter when creating the Topic, and can even be modified after the Topic is created using the tools Kafka provides.
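A sketch of the two configuration paths mentioned above (the property name and script come from the text; the exact flag set varies by Kafka version, so treat the command as illustrative):

```shell
# $KAFKA_HOME/config/server.properties: default partition count for new topics
num.partitions=3

# Overriding the default when creating a topic (recent versions use
# --bootstrap-server; older releases used --zookeeper instead):
# kafka-topics.sh --create --topic demo --partitions 6 \
#     --replication-factor 2 --bootstrap-server localhost:9092
```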

When sending a message, you can specify a key for it, and the Producer's partitioning mechanism determines from the key which Partition the message should be sent to. Messages with the same key are sent to the same Partition, but a Partition is not guaranteed to contain data for only one key (this is a personal guess).
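A minimal sketch of key-based partitioning, using zlib.crc32 as a deterministic stand-in for the murmur2 hash that Kafka's Java client actually applies to the key bytes:

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    # Hash the key bytes, then map into the partition range.
    # Same key -> same hash -> same partition, every time.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# All messages for one key land in one Partition, preserving their order:
p1 = partition_for("user-42", 3)
p2 = partition_for("user-42", 3)
print(p1 == p2)  # True
```

Because the mapping is many-to-one, several keys can share a Partition, which matches the note above that one Partition is not limited to a single key.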

Consume at least once, at most once, exactly once:

At least once: 1. receive data; 2. process data; 3. commit Offset.

At most once: 1. receive data; 2. commit Offset; 3. process data.

Exactly once: 1. receive data; 2. process data, but do not persist the result yet; 3. persist the processing result and the Offset together (a distributed transaction).
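The first two orderings can be sketched with a toy replay, assuming an in-memory list of (offset, message) pairs stands in for a Partition; exactly-once needs a real transactional store, so only at-least-once and at-most-once are simulated here:

```python
def at_least_once(records, crash_at):
    """Process first, commit after. A crash between the two steps means
    the record is processed again on restart: possible duplicates, no loss."""
    processed, committed = [], 0
    for offset, msg in records:            # first run
        processed.append(msg)              # 1. process
        if offset == crash_at:
            break                          # crash before committing!
        committed = offset + 1             # 2. commit offset
    for offset, msg in records[committed:]:  # restart from committed offset
        processed.append(msg)
        committed = offset + 1
    return processed

def at_most_once(records, crash_at):
    """Commit first, process after. A crash between the two steps means
    the record is skipped on restart: possible loss, no duplicates."""
    processed, committed = [], 0
    for offset, msg in records:            # first run
        committed = offset + 1             # 1. commit offset
        if offset == crash_at:
            break                          # crash before processing!
        processed.append(msg)              # 2. process
    for offset, msg in records[committed:]:  # restart from committed offset
        committed = offset + 1
        processed.append(msg)
    return processed

records = [(0, "a"), (1, "b"), (2, "c")]
print(at_least_once(records, crash_at=1))  # ['a', 'b', 'b', 'c']: "b" duplicated
print(at_most_once(records, crash_at=1))   # ['a', 'c']: "b" lost
```

The crash point is the same in both runs; only the commit/process ordering differs, and that alone decides whether the failure shows up as a duplicate or a lost message.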

 


Origin www.cnblogs.com/ybonfire/p/11815786.html