Kafka and cluster deployment

Kafka Introduction

    Official website: http://kafka.apache.org

    Kafka is a high-performance, distributed message queuing middleware. Because of its high throughput, Kafka is widely used in big-data scenarios such as internet log collection. Strictly speaking, Kafka is a stream-processing platform. The concept can be hard to grasp; it is called a "stream" platform because it works like a high-throughput pipeline: data flows in at one end like water and is read out at the other end. For practical purposes, Kafka can be viewed as a specialized message queue middleware.

    In Kafka, messages are stored and categorized by Topic. The sender of a message is called a Producer and the receiver is called a Consumer; in addition, a Kafka cluster consists of multiple Kafka instances, each of which (a server) is called a broker. The Kafka cluster, as well as the producers and consumers, all rely on ZooKeeper to store system metadata and guarantee cluster availability. The Kafka cluster itself maintains almost no consumer or producer state information; that is kept in ZooKeeper. As a result, the producer and consumer clients are very lightweight: they can come and go freely without placing any extra burden on the cluster.

    Compared with traditional messaging systems, Kafka has the following distinguishing features:

1) It is designed as a distributed system, making it easy to scale out.
2) It provides high throughput for both publishing and subscribing.
3) It supports multiple subscribers and automatically rebalances when a consumer fails.
4) It persists messages to disk, so it can serve batch consumption (such as ETL) as well as real-time applications.

1. Kafka's key concepts and roles

1) Producer

The message producer: the source of messages, responsible for generating messages and sending them to Kafka.

2) Consumer

The message consumer: responsible for consuming messages from the Kafka server.

3) Topic

A topic is defined by the user and configured on the Kafka server to establish the subscription relationship between producers and consumers: producers send messages to a specified Topic, and consumers then fetch messages from that Topic.

     A Topic can be thought of as a group of messages. Each topic is divided into multiple partitions, and at the partition level each partition is stored as an append-only log file. Any message published to a partition is appended directly to the end of its log file. The position of each message in the file is called its offset, a long integer that uniquely identifies the message. Kafka provides no additional index structure over offsets, because Kafka barely allows "random access" to messages.
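The append-only log and offset mechanism described above can be illustrated with a minimal Python sketch (a toy model for illustration only, not Kafka's actual storage code):

```python
# Toy model of a Kafka partition: an append-only log where each
# message's position (its offset) uniquely identifies it.
class Partition:
    def __init__(self):
        self.log = []  # append-only list standing in for the log file

    def append(self, message):
        offset = len(self.log)    # the next position in the log
        self.log.append(message)  # messages are only ever appended
        return offset

    def read(self, offset):
        # Reading is just indexing by offset; Kafka itself reads
        # sequentially from a given offset onward.
        return self.log[offset]

p = Partition()
first = p.append("hello")   # offset 0
second = p.append("world")  # offset 1
print(first, second, p.read(1))  # 0 1 world
```

Because the log is append-only, an offset assigned to a message never changes, which is why no extra index is needed.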

4) Partition

A message partition: a Topic contains multiple Partitions; each Partition is an ordered queue, and every message in a Partition is assigned a sequential id.

     The most fundamental reason for the partition design is that Kafka's file-based storage is organized by partition: spreading the log contents across multiple servers avoids hitting the file-size limit of a single machine's disk, and each partition is held by the server (Kafka instance) on which it lives. A topic can be split into any number of partitions, which improves the efficiency of storing and consuming messages; in addition, more partitions means more consumers can be attached, effectively increasing consumption parallelism.

5) Broker

This is the Kafka server itself. Whether Kafka runs as a single node or a cluster, each server is uniformly called a Broker; some materials translate the term as "agent" or "broker".

6) Group

A consumer group: consumers of the same kind are grouped together. In Kafka, multiple consumers can jointly consume the messages in one Topic, with each consumer consuming a portion of the messages. These consumers form a consumer group and share the same group name.

A Kafka cluster uses ZooKeeper for configuration management and leader election.

2. Kafka features

1) Kafka: memory, disk, and database storage; supports large message backlogs

    A partition is Kafka's smallest storage unit. A topic comprises multiple partitions; when a Kafka topic is created, its partitions are distributed across multiple servers, where a server is typically one broker.

    Partition leaders are distributed evenly across different servers, and partition replicas are likewise spread evenly across different servers, ensuring load balancing and high availability. When a new broker joins the cluster, some replicas are moved to the new broker.

    According to the log-directory configuration, Kafka assigns each new partition to the log directory that currently holds the fewest partitions.

    By default, the partitioning algorithm uses round-robin to distribute messages evenly across the partitions of the same topic. When a message is delivered with a key, the target partition is chosen by taking the key's hash code modulo the number of partitions.
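The default partition-selection rule described above can be sketched as follows (a simplified illustration: Kafka's real default partitioner uses murmur2 hashing, and Python's built-in hash() merely stands in for it here):

```python
import itertools

# Toy partitioner illustrating Kafka's default behavior: keyless
# messages are spread round-robin; keyed messages hash to a fixed
# partition so all messages with one key stay ordered together.
def make_partitioner(num_partitions):
    rr = itertools.cycle(range(num_partitions))  # round-robin counter

    def choose_partition(key):
        if key is None:
            return next(rr)                # no key: round-robin
        return hash(key) % num_partitions  # key: hash modulo partitions

    return choose_partition

choose = make_partitioner(3)
# Keyless messages rotate through partitions 0, 1, 2, 0, ...
print([choose(None) for _ in range(4)])  # [0, 1, 2, 0]
# The same key always lands in the same partition.
print(choose("user-42") == choose("user-42"))  # True
```

Keyed partitioning is what gives per-key ordering: within one partition the log is strictly ordered, so all messages sharing a key are consumed in the order they were produced.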

2) Kafka: supports load balancing
a) A broker is usually one server node. For the different partitions of the same Topic, Kafka tries to distribute the partitions across different broker servers; ZooKeeper stores the metadata for brokers, topics, and partitions. A partition's leader handles production requests from clients, and Kafka assigns partition leaders to different broker servers so that the brokers share the work.
     Every broker caches the cluster metadata, so a client can obtain the metadata from any broker, cache it, and use it to decide where to send each request.
b) Consumers in the same consumer group subscribe to a topic together, so that each consumer is assigned roughly the same number of partitions and the load is shared as evenly as possible.
c) When a consumer joins or leaves its consumer group, a rebalance is triggered and partitions are reassigned among the consumers, sharing the load.
   Most of Kafka's load balancing is done automatically, and partition assignment is also handled by Kafka itself. This hides many details and avoids tedious configuration and the load problems human error can cause.
d) The sending side determines the topic and, via the key, the partition a message goes to. If the key is null, round-robin is used to spread messages evenly across the partitions of the same topic. If the key is not null, the message is sent to the partition computed from the key's hash code modulo the partition count.
3) Clustering: a naturally "Leader-Slave" stateless cluster, in which every server is both a Master and a Slave.
    Partition leaders are spread evenly across the Kafka servers, and partition replicas are likewise spread evenly across the Kafka servers, so every Kafka server hosts both partition leaders and partition replicas; each Kafka server is simultaneously a slave to some other Kafka server and a leader to another.
     A Kafka cluster depends on ZooKeeper, and ZooKeeper supports hot scaling: brokers, consumers, and partitions can all be added or removed dynamically without stopping the service. Compared with message queues that do not rely on ZooKeeper, this is Kafka's biggest advantage.

Kafka workflow

1) Producers send messages to a topic at regular intervals.

2) The Kafka broker stores all messages in the partitions configured for that particular topic, ensuring that messages are shared equally between partitions. If a producer sends two messages and there are two partitions, Kafka stores the first message in the first partition and the second message in the second partition.

3) Consumers subscribe to a particular topic.

4) Once a consumer subscribes to a topic, Kafka provides the consumer with the topic's current offset and saves the offset in ZooKeeper.

5) Consumers request new messages from Kafka at regular intervals.

6) Once Kafka receives messages from producers, it forwards them to the consumers.

7) The consumer receives the message and processes it.

8) Once a message is processed, the consumer sends an acknowledgment to the Kafka broker.

9) Upon receiving the acknowledgment, Kafka advances the offset to the new value and updates it in ZooKeeper. Because offsets are retained in ZooKeeper, consumers can read the correct next message even after a server failure.
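The poll-process-commit loop of steps 5 through 9 can be sketched in Python (a toy simulation of the offset-commit mechanism, not real Kafka client code):

```python
# Toy simulation of steps 5-9: the consumer polls from its committed
# offset, processes the messages, and commits the new offset. The
# committed offset is what lets a restarted consumer resume without
# re-reading or skipping messages.
log = ["m0", "m1", "m2", "m3", "m4"]   # messages in one partition
committed = {"offset": 0}              # stands in for the state kept in ZooKeeper

def poll(max_records=2):
    start = committed["offset"]
    return log[start:start + max_records]

def commit(n_processed):
    committed["offset"] += n_processed  # step 9: advance the offset

processed = []
batch = poll()            # step 5: fetch starting at the committed offset
processed.extend(batch)   # step 7: process the messages
commit(len(batch))        # steps 8-9: acknowledge and advance

# Simulate a consumer restart: polling resumes at the committed offset,
# so nothing is processed twice and nothing is skipped.
batch = poll()
print(batch)  # ['m2', 'm3']
```

If the consumer crashes after processing but before committing, the batch is re-delivered on restart, which is why this scheme gives at-least-once rather than exactly-once delivery.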

Kafka cluster deployment:

(1) The Kafka architecture consists of: producers (message producers); consumers (message consumers); brokers (the Kafka cluster servers, which handle message read and write requests and store messages; the cluster at this level contains a number of Brokers); topics (the message queues/categories, i.e. the queues that producers and consumers operate on); and ZooKeeper (which stores the metadata, including consumption offsets, topic information, and partition information).

(2) Messages inside Kafka are organized by topic. You can picture a simple queue: one queue is one topic. Each topic is then divided into a number of partitions to enable parallelism. Within each partition messages are strictly ordered, like an ordered queue, and each message carries a sequence number, the offset (for example 0 through 12), written and read from front to back. A partition belongs to one broker, and one broker can manage multiple partitions: for example, if a topic has six partitions and there are two brokers, each broker manages three of them. A partition can easily be pictured as a file: when data arrives it is appended to the partition, append after append; messages do not pass through an in-memory buffer but are written straight to the file. Here Kafka differs from many messaging systems: many systems delete a message once it has been consumed, whereas Kafka deletes messages based on a time-based retention policy rather than on consumption. Kafka has no notion of a "fully consumed" message, only of an expired one.

(3) The producer decides which partition to write to, following one of several strategies, such as hashing the key so related data does not scatter across multiple partitions. Consumers each keep track of the offset they have consumed up to, and each consumer belongs to a group. Within a group, consumption follows a queue model: individual consumers consume different partitions, so a message is consumed only once within the group. Between groups, consumption follows a publish-subscribe model: each group consumes independently of the others, so a message is consumed once by each group.
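These delivery semantics, once per message within a group and once per group across groups, can be illustrated with a small Python sketch (a toy model; the modulo assignment here merely stands in for Kafka's actual partition assignors):

```python
# Toy model of consumer-group semantics: within a group each partition
# (and thus each message) goes to exactly one consumer, while every
# group independently receives the full set of messages.
partitions = {0: ["a"], 1: ["b"], 2: ["c"], 3: ["d"]}

def assign(partition_ids, consumers):
    """Assign each partition to one consumer in the group (simple modulo)."""
    return {p: consumers[i % len(consumers)]
            for i, p in enumerate(sorted(partition_ids))}

def deliver(group_consumers):
    """Return {consumer: messages} for one group subscribing to the topic."""
    assignment = assign(partitions.keys(), group_consumers)
    out = {c: [] for c in group_consumers}
    for p, msgs in partitions.items():
        out[assignment[p]].extend(msgs)
    return out

group_a = deliver(["a1", "a2"])   # queue model inside one group
group_b = deliver(["b1"])         # a second group gets everything again
print(group_a)  # {'a1': ['a', 'c'], 'a2': ['b', 'd']}
print(group_b)  # {'b1': ['a', 'b', 'c', 'd']}
```

Note that a partition is assigned to exactly one consumer in a group, which is also why running more consumers in a group than there are partitions leaves the extra consumers idle.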

Kafka cluster installation and configuration:

   1) The Kafka cluster depends on ZooKeeper; before building the Kafka cluster, you must first deploy a working ZooKeeper cluster.

   2) OpenJDK must be installed as the runtime environment.

   3) Copy the Kafka distribution to all cluster hosts.

   4) Modify the configuration file.

   5) broker.id must be different on every server.

   6) zookeeper.connect lists the cluster addresses; you do not need to list them all, a subset is enough.
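For example, on the three hosts used below, the host-specific lines of config/server.properties might differ like this (a sketch showing only the keys that vary per host; all other settings stay identical):

```
# on 192.168.1.234
broker.id=1
listeners=PLAINTEXT://192.168.1.234:9092

# on 192.168.1.206
broker.id=2
listeners=PLAINTEXT://192.168.1.206:9092

# on 192.168.1.45
broker.id=3
listeners=PLAINTEXT://192.168.1.45:9092

# identical on all hosts
zookeeper.connect=192.168.1.234:2181,192.168.1.206:2181,192.168.1.45:2181
```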

Deployment environment:

Operating System    IP               Kafka version
rhel6.5             192.168.1.234    2.1.1
rhel6.5             192.168.1.206    2.1.1
rhel6.5             192.168.1.45     2.1.1

1. Create a user and download Kafka on all hosts

[root@kafka-0001 ~]# useradd wushaoyu
[root@kafka-0001 ~]# su - wushaoyu
[wushaoyu@kafka-0001 ~]$ wget https://mirrors.tuna.tsinghua.edu.cn/apache/kafka/2.1.1/kafka_2.11-2.1.1.tgz

2. Create the message directory and modify the configuration file

[wushaoyu@kafka-0001 ~]$ cd kafka_2.11-2.1.1
[wushaoyu@kafka-0001 kafka_2.11-2.1.1]$ mkdir logs
[wushaoyu@kafka-0001 kafka_2.11-2.1.1]$ cd config/
[wushaoyu@kafka-0001 config]$ cat server.properties | egrep -v "^$|^#"
broker.id=1    # globally unique broker id, must not repeat; keeping it consistent with the ZooKeeper myid is recommended
listeners=PLAINTEXT://192.168.1.234:9092    # ip and port the broker listens on
advertised.listeners=PLAINTEXT://192.168.1.234:9092
num.network.threads=3    # number of broker network-handling threads
num.io.threads=8    # number of broker I/O threads
socket.send.buffer.bytes=102400    # send buffer size: messages are first placed in the send buffer and sent together once it is full
socket.receive.buffer.bytes=102400    # receive buffer size: received messages are buffered first and synced to disk once this size is reached
socket.request.max.bytes=104857600    # maximum bytes per request to the Kafka socket, preventing broker OutOfMemory; should not exceed the Java heap size
log.dirs=/home/wushaoyu/kafka_2.11-2.1.1/logs    # message storage directory (not the log directory)
num.partitions=1    # default number of partitions per topic
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
log.retention.hours=168    # message retention time; the default is one week
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=192.168.1.234:2181,192.168.1.206:2181,192.168.1.45:2181
zookeeper.connection.timeout.ms=6000
group.initial.rebalance.delay.ms=0

 3. Start Kafka

[wushaoyu@kafka-0002 kafka_2.11-2.1.1]$ ./bin/kafka-server-start.sh ./config/server.properties
OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000c0000000, 1073741824, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 1073741824 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /home/wushaoyu/kafka_2.11-2.1.1/hs_err_pid23272.log

An error appears here: unable to allocate enough memory. The deployment environment is a cloud host with only 1 GB of RAM, so the problem can be resolved by adding a swap partition.

Check the memory size:

[wushaoyu@kafka-0002 kafka_2.11-2.1.1]$ free -m
             total       used       free     shared    buffers     cached
Mem:           995        920         75          0         80        641
-/+ buffers/cache:        198        797
Swap:            0          0          0

Create the swap file:

[root@kafka-0001 ~]# dd if=/dev/zero of=/tmp/swap bs=1M count=8192    # create a file of size 8G
8192+0 records in
8192+0 records out
8589934592 bytes (8.6 GB) copied, 52.9978 s, 162 MB/s
[root@kafka-0001 ~]# mkswap /tmp/swap    # create the swap area
mkswap: /tmp/swap: warning: don't erase bootbits sectors on whole disk. Use -f to force.
Setting up swapspace version 1, size = 8388604 KiB
no label, UUID=84ea82c7-35a3-46be-926a-73dfc7e18548
[root@kafka-0001 ~]# swapon /tmp/swap    # enable the swap area
[root@kafka-0001 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:           995        928         67          0         22        719
-/+ buffers/cache:        186        809
Swap:         8191          0       8191

Start again:

[wushaoyu@kafka-0001 kafka_2.11-2.1.1]$ ./bin/kafka-server-start.sh ./config/server.properties &    # runs in the foreground by default
or
[wushaoyu@kafka-0001 kafka_2.11-2.1.1]$ ./bin/kafka-server-start.sh -daemon ./config/server.properties

 ZooKeeper + Kafka cluster verification and message publishing

server.1 acts as the producer, server.2 as the consumer

# Execute on server.1 

Create a topic with one partition and two replicas:
[wushaoyu@kafka-0001 kafka_2.11-2.1.1]$ ./bin/kafka-topics.sh --create --zookeeper 192.168.1.234:2181 --partitions 1 --replication-factor 2 --topic mymsg
OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
Created topic "mymsg".

Simulate a producer publishing messages (the message publisher):
[wushaoyu@kafka-0001 kafka_2.11-2.1.1]$ ./bin/kafka-console-producer.sh --broker-list 192.168.1.234:9092 --topic mymsg
OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
>hello,world!
>nice to meet you.
>i love you
>

# Execute on server.2

Simulate a consumer receiving messages (the message recipient):
[wushaoyu@kafka-0002 kafka_2.11-2.1.1]$ ./bin/kafka-console-consumer.sh --bootstrap-server 192.168.1.234:9092 --topic mymsg --from-beginning
OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
hello,world!
nice to meet you.
i love you

   # Messages entered at the producer are displayed identically at the consumer, indicating successful consumption.

   # --from-beginning means receive all messages from the beginning; otherwise only newly produced messages are received.

With messages produced and consumed without problems, the Kafka cluster deployment is complete.


Commonly used Kafka commands

1) List topics

[wushaoyu@kafka-0001 kafka_2.11-2.1.1]$ ./bin/kafka-topics.sh --list --zookeeper 192.168.1.234:2181
OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
__consumer_offsets
mymsg

2) View details of topic mymsg

[wushaoyu@kafka-0001 kafka_2.11-2.1.1]$ ./bin/kafka-topics.sh --describe --zookeeper 192.168.1.234:2181 --topic mymsg
OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
Topic:mymsg    PartitionCount:1    ReplicationFactor:2    Configs:
    Topic: mymsg    Partition: 0    Leader: 1    Replicas: 1,3    Isr: 1,3

3) Delete topic

[wushaoyu@kafka-0001 kafka_2.11-2.1.1]$ ./bin/kafka-topics.sh --delete --zookeeper 192.168.1.234:2181 --topic mymsg

4) View producer parameters

[wushaoyu@kafka-0001 kafka_2.11-2.1.1]$ ./bin/kafka-console-producer.sh

5) View consumer parameters

[wushaoyu@kafka-0001 kafka_2.11-2.1.1]$ ./bin/kafka-console-consumer.sh


Origin www.cnblogs.com/wusy/p/11216812.html