Kafka

############# Kafka
Kafka is a distributed messaging system written in Scala. It serves as the basis for activity streaming and operational data processing pipelines.

1. Typical usage: background analysis of user behavior, statistics on search keywords, etc.

Supported protocol: an imitation of AMQP (not a full AMQP implementation)

Does not support transactions; transaction support is sacrificed in pursuit of high performance


#### AMQP protocol
Consumer: a client application that requests messages from the message queue
Producer: a client application that publishes messages to the broker
AMQP server (broker): receives the messages sent by producers and routes them to queues within the server

Most mainstream languages can communicate with Kafka; that is, you can write your own consumers and producers

######### Kafka Architecture
Topic: the category to which messages are published
Partition: the message data in a topic is organized into multiple partitions; a partition is the smallest unit of Kafka's message queue organization.
                  A partition can be regarded as a FIFO queue.
Replica (replication): to ensure distributed reliability, Kafka 0.8 began backing up each partition's data (onto different brokers), so that one broker going down does not make the partition's data unavailable.
ZooKeeper: a cluster that provides distributed state management, distributed configuration management, distributed lock services, etc.
If Kafka throughput is insufficient, you can increase the number of partitions.
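The terms above fit together as follows. This is a minimal illustrative sketch, not Kafka's actual API: `Topic`, `produce` and `consume` are hypothetical names, and a real broker persists partitions to disk rather than holding them in memory.

```python
class Topic:
    """Minimal sketch (not Kafka's API) of the topic/partition model:
    each partition is an append-only FIFO queue, and a message's
    position within its partition is its offset."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key, value):
        # Messages with the same key always land in the same partition,
        # so per-key ordering is preserved.
        p = hash(key) % len(self.partitions)
        self.partitions[p].append(value)
        return p, len(self.partitions[p]) - 1   # (partition, offset)

    def consume(self, partition, offset):
        # Consumers read by (partition, offset); ordering is guaranteed
        # only within a single partition, not across the topic.
        return self.partitions[partition][offset]
```

Note that ordering and FIFO semantics hold per partition only; adding partitions raises parallelism but never reorders an existing partition.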


######### Cluster setup
Configure the config/server.properties file on each broker.
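A minimal server.properties sketch; every hostname, path and value below is an illustrative placeholder, not a recommended production setting:

```properties
# config/server.properties (illustrative values only)
broker.id=0                                   # must be unique per broker
listeners=PLAINTEXT://:9092                   # port clients connect to
log.dirs=/data/kafka-logs                     # where partition logs live
num.partitions=1                              # default partitions per new topic
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181  # the ZooKeeper ensemble
```

Each broker in the cluster gets the same file with only `broker.id` (and possibly `log.dirs`) changed.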

########## Kafka message processing and cluster maintenance
1. Message organization
    1. Disk access basics:
        to read data from a disk, the track and sector must be located:
        first the platter surface is selected and the head is moved to the corresponding track (seek time),
        then the target sector must rotate under the head (rotational delay).
        
        A disk access request (read/write) is completed in three steps:
        seek time: the head moves to the specified track
        rotational delay: wait for the specified sector of the track to pass under the head
        data transfer time: data is transferred between disk, memory and network
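The three components simply add up. A back-of-envelope sketch (the function name and all the numbers are illustrative, not measurements):

```python
def disk_access_time_ms(seek_ms, rpm, transfer_mb_s, request_kb):
    """Back-of-envelope disk access time:
    seek time + average rotational delay + data transfer time."""
    # On average the target sector is half a revolution away.
    rotational_delay_ms = 0.5 * (60_000 / rpm)
    transfer_ms = (request_kb / 1024) / transfer_mb_s * 1000
    return seek_ms + rotational_delay_ms + transfer_ms

# A 7200 RPM disk alone costs ~4.17 ms of rotational delay per random
# access, which is why Kafka favors sequential writes over random ones.
total = disk_access_time_ms(seek_ms=9, rpm=7200, transfer_mb_s=100, request_kb=4)
```

For a small 4 KB request, seek and rotation dominate; the transfer itself is a few hundredths of a millisecond.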
        
    2. Kafka message write/read principle (zero-copy data transfer; data is written sequentially instead of randomly)
    Production: network -> page cache -> disk
    Consumption: disk -> network
    
    3. Kafka message deletion principle
    Deletion starts from the oldest log segment (.log file) (deletion is in units of log segments) and moves forward segment by segment, until it reaches a log segment that does not satisfy the conditions.
    Deletion conditions:
        1. The segment satisfies the given predicate (specified by the configuration items log.retention.{ms,minutes,hours} and log.retention.bytes)
        2. It is not the currently active log segment, i.e. the segment currently being written to
        3. Its size is not smaller than the minimum log segment size (configuration item log.segment.bytes)
        4. If all log segments would qualify for deletion, the roll method is called first to roll a new segment, because Kafka must retain at least one log segment
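The deletion pass above can be sketched as follows. This is a simplification: `deletable_segments` and the segment dicts are hypothetical, and the real broker evaluates the full set of predicates (time and size retention), not just age:

```python
def deletable_segments(segments, retention_ms, now_ms, active_segment):
    """Walk segments oldest-first and collect those past retention,
    stopping at the first segment that no longer qualifies. The active
    segment is never deleted, and at least one segment is always kept."""
    to_delete = []
    for seg in segments:                      # ordered oldest -> newest
        if seg is active_segment:
            break                             # never touch the active segment
        if now_ms - seg["last_modified_ms"] <= retention_ms:
            break                             # first non-expired segment ends the pass
        to_delete.append(seg)
    if to_delete and len(to_delete) == len(segments):
        to_delete.pop()                       # keep at least one segment
    return to_delete
```

In real Kafka the "keep at least one" case is handled by rolling a fresh segment before deleting, which this sketch only approximates by retaining the newest one.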
    
    Related background tasks:
        1. Deletion: log.retention.check.interval.ms specifies the check interval
        2. Flush: log.flush.scheduler.interval.ms specifies the time interval
        3. Log checkpoint: log.flush.offset.checkpoint.interval.ms specifies the time interval
        4. Compaction (if any): runs continuously (log.cleaner.enable specifies whether it is enabled)

2. Message retrieval (each partition's data consists of .index files and .log files)
    1.segment file (.log) composition and physical structure
    
    2.index file (.index) composition and physical structure
    
    3. Retrieval process
        Step 1: search the segment file list; a binary search over the segments' base offsets quickly locates the .log file containing the target offset
        Step 2: find the message within that segment via its sparse .index file
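The two-step lookup can be sketched with two binary searches; `locate` and the data shapes below are hypothetical, chosen only to mirror the base-offset list and the sparse (relative offset, file position) index entries:

```python
import bisect

def locate(offset, segment_base_offsets, index_for_segment):
    """Two-step lookup sketch.
    Step 1: binary-search the sorted base offsets to find which .log
            segment holds `offset`.
    Step 2: binary-search that segment's sparse .index entries for the
            nearest entry at or before the target; the .log file is then
            scanned forward from that file position."""
    # Step 1: rightmost base offset <= target offset
    i = bisect.bisect_right(segment_base_offsets, offset) - 1
    base = segment_base_offsets[i]
    # Step 2: the index is sparse, so find the closest preceding entry
    entries = index_for_segment[base]         # [(relative_offset, position), ...]
    j = bisect.bisect_right([rel for rel, _ in entries], offset - base) - 1
    return base, entries[j][1]                # (segment base, scan-from position)
```

Because the index is sparse, step 2 yields a starting position, after which a short linear scan of the .log file finds the exact message.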


3. Cluster maintenance
    1. Real-time viewing and modification of basic cluster information (topic tool)
        1. List all topics currently available in the cluster:
        bin/kafka-topics.sh --list --zookeeper zookeeper_address
        
        2. View information about a specific topic in the cluster:
        bin/kafka-topics.sh --describe --zookeeper zookeeper_address --topic topic_name
        
        3. Create a topic:
        bin/kafka-topics.sh --create --zookeeper zookeeper_address --replication-factor 1 --partitions 1 --topic topic_name
        
        4. Increase (you cannot decrease) the number of partitions (the trailing 4 is the new total):
        bin/kafka-topics.sh --zookeeper zookeeper_address --alter --topic topic_name --partitions 4
        
        
    2. Cluster leader balancing mechanism (useful when machines frequently go online and offline)
    All replicas of a partition are called the "assigned replicas", and the first replica in the "assigned replicas" is called the "preferred replica".
    For a newly created topic, the "preferred replica" is generally the leader.
    
    Rebalance cluster leaders manually:
    bin/kafka-preferred-replica-election.sh --zookeeper zookeeper_address
    
    This can also be done automatically via the configuration item auto.leader.rebalance.enable=true
    
    3. Partition log migration (migrate data to a new cluster)
    Migrate topic data to other brokers:
    1. Write a JSON file in the following format:
      cat topics-to-move.json
      {
        "topics": [{"topic": "foo1"}, {"topic": "foo2"}],
        "version": 1
      }
     2. Use --generate to produce a migration plan (the following moves topics foo1 and foo2 to brokers 5 and 6):
     bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --topics-to-move-json-file topics-to-move.json --broker-list "5,6" --generate
     This step only generates the plan and does not migrate any data; save the proposed assignment to the file expand-cluster-reassignment.json
     
     3. Use --execute to execute the plan:
     bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file expand-cluster-reassignment.json --execute
     It is best to save the current assignment before running --execute, so that it can be rolled back if needed.
     
     4. Use --verify to check whether the migration is complete:
     bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file expand-cluster-reassignment.json --verify
     
     To migrate only specific partitions of a topic to other brokers, the steps are the same; only the JSON file changes.
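What --generate produces can be approximated by a round-robin assignment across the target brokers. This sketch (`generate_plan` is a hypothetical name) only illustrates the shape of the resulting reassignment JSON, not the real tool's leader-balancing logic:

```python
import itertools

def generate_plan(topics_partitions, broker_list, replication_factor=1):
    """Spread each topic's partitions over the target brokers
    round-robin, producing a dict shaped like the reassignment JSON
    that kafka-reassign-partitions.sh emits."""
    brokers = itertools.cycle(broker_list)
    plan = {"version": 1, "partitions": []}
    for topic, num_partitions in topics_partitions.items():
        for p in range(num_partitions):
            # One broker per replica, cycling through the target list.
            replicas = [next(brokers) for _ in range(replication_factor)]
            plan["partitions"].append(
                {"topic": topic, "partition": p, "replicas": replicas})
    return plan
```

The real tool also considers existing replica placement and rack awareness (in later versions); the point here is only that the plan is plain JSON you can inspect and save before executing.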
    
    Note:
    The kafka-reassign-partitions.sh tool copies the log files on disk; only after the copy is complete are the pre-migration log files deleted. Things to note when migrating partition logs:
    1. The granularity of kafka-reassign-partitions.sh only goes down to the broker, not to a broker's individual log directories (if a broker is configured with multiple directories, partitions are distributed evenly by the number of partitions already resident on each disk),
      so if data is uneven between topics, or between a topic's partitions, disk usage is likely to end up uneven
    2. Migrating partitions that hold a lot of data takes a long time, so it is recommended to perform migration when there are few topics or little live data on disk
    3. When migrating partitions, it is best to keep one partition on the original disk, so that normal consumption and production are not affected

4. Cluster monitoring
    1. Kafka Offset Monitor (monitors a single cluster); it shows:
        1. The set of currently live brokers
        2. The list of topics currently active in the cluster
        3. The list of consumers
        4. The consumer offset lag per group,
          i.e. how many messages in each partition of a topic are backlogged and not yet consumed
          
    2. Kafka Manager (can monitor multiple clusters); it can not only observe but also modify:
        1. Manage multiple clusters
        2. Check cluster status (topic, brokers, replica distribution, partition distribution)
        3. Select replica to view
        4. Reassign partitions
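The lag figure these tools report can be sketched with a hypothetical helper, assuming the log-end offsets and the group's committed offsets are already known:

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """For each (topic, partition), lag = log-end offset (the newest
    message) minus the group's committed offset, i.e. how many messages
    are backlogged and not yet consumed. A partition with no committed
    offset is treated as fully unconsumed."""
    return {tp: log_end_offsets[tp] - committed_offsets.get(tp, 0)
            for tp in log_end_offsets}
```

A steadily growing lag on some partition means consumers are not keeping up with producers there, which is exactly the condition the monitors are meant to surface.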
        

              
