Hello everyone, my name is Xie Wei, and I am a programmer.
Today's topic: a Kafka user guide, single-node version.
1. Scenario
Suppose you are a back-end engineer and the application you designed has been running normally in production. Then, during a flash-sale ("spike") event, the system suddenly collapses. Investigation shows that it could not handle the surge of traffic and hung. At this point there are two ideas: (1) use nginx as a reverse proxy to forward requests to more internal servers, achieving load balancing; (2) use a messaging system: "cache" the excess requests in a piece of middleware, and let the back-end system continuously take requests from that cache for further processing.
The latter, using a messaging system, is one of Kafka's use cases.
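The second idea, buffering requests in middleware and letting the back end take them at its own pace, can be sketched in plain Go without Kafka. This is only a toy, assuming a buffered channel stands in for the messaging system; the function name drain and its parameters are made up for this illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// drain pushes a burst of n requests into a buffered channel and lets a
// fixed number of workers pull from it at their own pace. It returns how
// many requests were processed.
func drain(n, buffer, workers int) int {
	queue := make(chan int, buffer) // stands in for the messaging middleware
	var (
		wg        sync.WaitGroup
		mu        sync.Mutex
		processed int
	)
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range queue { // each worker takes a request when it is free
				mu.Lock()
				processed++
				mu.Unlock()
			}
		}()
	}
	for i := 0; i < n; i++ {
		queue <- i // the burst; blocks only when the buffer is full
	}
	close(queue)
	wg.Wait()
	return processed
}

func main() {
	fmt.Println(drain(1000, 100, 2)) // prints 1000: every request was handled
}
```

The sender and the workers never call each other directly; the queue absorbs the difference in speed, which is the role the messaging system plays between system A and system B.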
So what is Kafka?
Kafka is a distributed messaging system; it is now positioned as a distributed streaming platform.
Put simply, system A sends messages to the messaging system, and system B takes messages from the messaging system for subsequent processing.
A phrase commonly used to describe Kafka's scenario is "peak shaving and valley filling": reduce the traffic at the peaks, fill the troughs, and keep the load on the system as smooth as possible.
From this we get Kafka's three typical application scenarios:
- Messaging System
- Storage System
- Distributed stream processing platform
The messaging system is the most widely used. Messages in transit must be stored so that downstream systems can pull them later, so Kafka can also serve as a storage system. After messages are pulled, the downstream system processes them, so why not let Kafka itself also handle that data processing? That is roughly what "distributed stream processing platform" means.
The rest of this article covers the core application: messaging.
2. Basic Concepts
System A produces a message and sends it to the messaging system; system B pulls the message from the messaging system. Several concepts are involved.
- System A is called the producer: the party that sends messages
- The messaging system is called the broker: essentially a service process whose job is to accept messages from producers, serve pull requests from consumers, and persist messages
- System B is called the consumer: the party that pulls messages from the messaging system
Producers and consumers each have their own configuration parameters, which determine how they behave.
To send a message, a producer must first know where to send it, that is, the address of the broker. The broker (the Kafka server) configuration in turn sets its listening address, persistent storage, and other behavior. Beyond that, how are different kinds of messages told apart? Kafka uses a logical concept for this: the topic. Producers that specify different topics have their messages stored in different places.
In the simple case, messages keep arriving for a topic and are persisted by appending to storage. That works until there is too much message data, which makes consumption inefficient. A simple remedy is to split the data across several "files" that are each appended to, reducing the size of each one. Kafka calls this concept a partition. Each partition has a number and a starting position; messages are continuously appended to a partition, and each message is given a sequence number within it called the offset.
To pull messages from the broker, a consumer must first know the broker address and then the topic. It can also specify, at a finer grain, the partition and the offset from which to start consuming.
What if messages are lost? A simple approach is redundancy: replication. Multiple copies are kept; one of them is the leader and the others are followers. The leader handles messages directly; followers do not, they only interact with the leader and continuously sync its data.
A Kafka cluster is composed of multiple brokers. If one of them goes down, Kafka relies on ZooKeeper to elect a new leader.
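The leader and follower roles can be illustrated with a toy model. Replica, syncFrom, and electLeader are hypothetical names, and choosing the replica with the longest log is a simplification assumed for this sketch; it is not a description of how Kafka and ZooKeeper actually carry out the election.

```go
package main

import "fmt"

// Replica is a hypothetical copy of one partition's log.
type Replica struct {
	ID  int
	Log []string
}

// syncFrom copies any entries the follower is missing from the leader.
func (r *Replica) syncFrom(leader *Replica) {
	if len(r.Log) < len(leader.Log) {
		r.Log = append(r.Log, leader.Log[len(r.Log):]...)
	}
}

// electLeader picks the surviving replica with the longest log.
// This rule is an assumption made for illustration only.
func electLeader(replicas []*Replica) *Replica {
	var best *Replica
	for _, r := range replicas {
		if best == nil || len(r.Log) > len(best.Log) {
			best = r
		}
	}
	return best
}

func main() {
	leader := &Replica{ID: 0}
	followers := []*Replica{{ID: 1}, {ID: 2}}

	leader.Log = append(leader.Log, "m0", "m1")
	followers[0].syncFrom(leader) // follower 1 has m0 and m1
	leader.Log = append(leader.Log, "m2")
	followers[1].syncFrom(leader) // follower 2 syncs later and gets all three

	// The leader's broker goes down; choose a new leader among the followers.
	newLeader := electLeader(followers)
	fmt.Println(newLeader.ID, newLeader.Log) // 2 [m0 m1 m2]
}
```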
[Figure: Kafka cluster]
[Figure: Kafka topic and the partition concept]
[Figure: Kafka cluster]
3. Using the client
Based on the concepts above, how do we build a Kafka service and a complete messaging system?
- Start the service process: the broker
Pseudocode:
type Broker struct {
	Addr
	Config
	...
}
- The producer connects to the broker
Pseudocode:
type Producer struct {
	Config
	Message
	...
}
- The consumer connects to the broker
Pseudocode:
type Consumer struct {
	Config
	Topic
	Partitions
	Offset
	...
}
The basic idea:
- Start the Kafka service
- System A connects to the service and sends messages
- System B connects to the service and consumes messages
Following the example on the official website, here is how to complete basic messaging.
Download the installation package: kafka_2.12-2.3.0.tgz
- 2.12 is the Scala version Kafka was compiled with
- 2.3.0 is the Kafka version
After unpacking, the two most important directories are:
- bin: a set of scripts, for example to start the ZooKeeper service, create topics, produce messages as a producer, and consume messages as a consumer
zookeeper-server-start.sh
zookeeper-server-stop.sh
kafka-configs.sh
kafka-console-consumer.sh
kafka-console-producer.sh
kafka-consumer-groups.sh
kafka-topics.sh
kafka-server-start.sh
kafka-server-stop.sh
...
- config: configuration files, covering for example the ZooKeeper port, the Kafka log storage directory, the externally exposed port, the maximum message size, and how long messages are retained
zookeeper.properties
server.properties
producer.properties
consumer.properties
...
There are probably more than 200 parameters; sorry, I cannot remember them all. So what to do? Give up learning? Then there is no money to earn and no way to support a family.
Most settings can stay at their defaults. The key settings, by file:
- zookeeper.properties
Kafka depends on ZooKeeper for distributed coordination
dataDir=/tmp/zookeeper
clientPort=2181
Remember this default: clientPort=2181
- server.properties
The Kafka server
# log storage directory
log.dirs=/tmp/kafka-logs
# how long logs are retained, in hours
log.retention.hours=168
# default broker id; in a cluster, each broker is given its own number
broker.id=0
# address on which the service is exposed to clients
listeners=PLAINTEXT://:9092
# ZooKeeper cluster address
zookeeper.connect=localhost:2181
...
- producer.properties
Settings for how messages are produced, etc.
- consumer.properties
Settings for how messages are consumed, etc.
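All four files share the same simple key=value format, so checking what a file actually sets is easy to script. Below is a minimal sketch, assuming only plain key=value lines and # comments; parseProperties is a made-up helper name, and it does not handle the escapes or line continuations that the full properties format allows.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseProperties reads key=value lines, skipping blank lines and # comments.
func parseProperties(text string) map[string]string {
	props := make(map[string]string)
	scanner := bufio.NewScanner(strings.NewReader(text))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		parts := strings.SplitN(line, "=", 2) // split on the first "=" only
		if len(parts) != 2 {
			continue // not a key=value line
		}
		props[strings.TrimSpace(parts[0])] = strings.TrimSpace(parts[1])
	}
	return props
}

func main() {
	props := parseProperties(`
# a fragment of config/server.properties
broker.id=0
listeners=PLAINTEXT://:9092
zookeeper.connect=localhost:2181
`)
	fmt.Println(props["listeners"])         // PLAINTEXT://:9092
	fmt.Println(props["zookeeper.connect"]) // localhost:2181
}
```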
With the parameters configured:
- Start ZooKeeper
> bin/zookeeper-server-start.sh config/zookeeper.properties
- Start the Kafka service process
> bin/kafka-server-start.sh config/server.properties
To create a topic or query topics, use kafka-topics.sh
To produce messages as a producer, use kafka-console-producer.sh
To consume messages as a consumer, use kafka-console-consumer.sh
Of course, these scripts are typically used only for testing; in practice you use the client library for your programming language.
4. Demonstration
The Go client for Kafka:
Download and install:
go get -u -v github.com/Shopify/sarama
4.1 Producer
System A
- Producer
type KafkaAction struct {
	DataSyncProducer  sarama.SyncProducer
	DataAsyncProducer sarama.AsyncProducer
}
// Synchronous producer
func newDataSyncProducer(brokerList []string) sarama.SyncProducer {
	config := sarama.NewConfig()
	config.Producer.RequiredAcks = sarama.WaitForAll // Wait for all in-sync replicas to ack the message
	config.Producer.Retry.Max = 5                    // Retry up to 5 times to produce the message
	config.Producer.Return.Successes = true          // Required by the sync producer
	config.Producer.Partitioner = sarama.NewRoundRobinPartitioner
	producer, err := sarama.NewSyncProducer(brokerList, config)
	if err != nil {
		log.Fatalln("Failed to start Sarama producer1:", err)
	}
	return producer
}
// Asynchronous producer
func newDataAsyncProducer(brokerList []string) sarama.AsyncProducer {
	config := sarama.NewConfig()
	sarama.Logger = log.New(os.Stdout, "[KAFKA] ", log.LstdFlags)
	config.Producer.RequiredAcks = sarama.WaitForLocal       // Only wait for the leader to ack
	config.Producer.Compression = sarama.CompressionSnappy   // Compress messages
	config.Producer.Flush.Frequency = 500 * time.Millisecond // Flush batches every 500ms
	config.Producer.Partitioner = sarama.NewRoundRobinPartitioner
	producer, err := sarama.NewAsyncProducer(brokerList, config)
	if err != nil {
		log.Fatalln("Failed to start Sarama producer2:", err)
	}
	// Drain the error channel so that failed sends are logged
	go func() {
		for err := range producer.Errors() {
			log.Println("Failed to write access log entry:", err)
		}
	}()
	return producer
}
Remember that producers have a set of configuration parameters? That is the role of this config: every field has a default value, and you can set your own values as needed.
For example, the compression algorithm:
config.Producer.Compression = sarama.CompressionSnappy
Commonly used compression algorithms are:
- gzip
- snappy
- lz4
- zstd
Different compression algorithms differ primarily in the compression ratio and throughput.
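To get a feel for compression ratio, the sketch below compresses a batch of near-identical JSON messages. It uses Go's standard compress/gzip package directly rather than the Kafka client, and gzipBytes and gunzipBytes are helper names made up here; the actual ratio depends on your data and on the algorithm chosen.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"strings"
)

// gzipBytes compresses data with the standard library's gzip writer.
func gzipBytes(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil { // Close flushes the remaining data
		return nil, err
	}
	return buf.Bytes(), nil
}

// gunzipBytes reverses gzipBytes.
func gunzipBytes(data []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	// A batch of 100 identical JSON messages, which compresses very well.
	msg := `{"method":"get5","url":"www.baidu.com4","value":"da4","date":"12344"}`
	batch := []byte(strings.Repeat(msg, 100))
	packed, err := gzipBytes(batch)
	if err != nil {
		panic(err)
	}
	fmt.Printf("raw=%d bytes, gzip=%d bytes\n", len(batch), len(packed))
}
```

Batches of similar messages compress far better than single small messages, which is one reason compression and batching are usually tuned together.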
For example, the partitioning rule:
config.Producer.Partitioner = sarama.NewRoundRobinPartitioner
Commonly used partitioning rules:
- Round-robin
- Random
- By message key
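The three rules above can be sketched in a few lines each. The functions below are illustrative stand-ins written for this article, not the sarama partitioner implementations, and the use of the FNV-1a hash for key partitioning is an assumption made for the sketch.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
)

// roundRobin returns a function that cycles through partitions 0..n-1.
func roundRobin(n int) func() int {
	next := 0
	return func() int {
		p := next
		next = (next + 1) % n
		return p
	}
}

// random picks any partition with equal probability.
func random(n int) int {
	return rand.Intn(n)
}

// byKey hashes the key so that equal keys always map to the same partition.
func byKey(key string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(n))
}

func main() {
	rr := roundRobin(3)
	fmt.Println(rr(), rr(), rr(), rr())                     // 0 1 2 0
	fmt.Println(byKey("user-1", 10) == byKey("user-1", 10)) // true
	fmt.Println(random(10) < 10)                            // true
}
```

Round-robin and random spread load evenly but give no ordering guarantee across messages; partitioning by key keeps all messages with the same key in one partition, so they stay in order relative to each other.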
For example, which acknowledgment to wait for before a send counts as successful:
config.Producer.RequiredAcks = sarama.WaitForLocal
- Message: the producer transmits only byte data.
The interface:
type Encoder interface {
Encode() ([]byte, error)
Length() int
}
A message to be sent must implement the Encoder interface, that is, the message struct must define the Encode and Length methods.
type SendMessage struct {
	Method  string `json:"method"`
	URL     string `json:"url"`
	Value   string `json:"value"`
	Date    string `json:"date"`
	encoded []byte
	err     error
}
// ensureEncoded marshals the message once and caches the result,
// so Length and Encode work regardless of which is called first.
func (S *SendMessage) ensureEncoded() {
	if S.encoded == nil && S.err == nil {
		S.encoded, S.err = json.Marshal(S)
	}
}
func (S *SendMessage) Length() int {
	S.ensureEncoded()
	return len(S.encoded)
}
func (S *SendMessage) Encode() ([]byte, error) {
	S.ensureEncoded()
	return S.encoded, S.err
}
- Send a message
func (K *KafkaAction) Do(v interface{}) {
	message, ok := v.(SendMessage)
	if !ok {
		log.Printf("unexpected message type %T", v)
		return
	}
	// SendMessage returns the partition and offset of the sent message
	partition, offset, err := K.DataSyncProducer.SendMessage(&sarama.ProducerMessage{
		Topic: TOPIC,
		Value: &message,
	})
	if err != nil {
		log.Println(err)
		return
	}
	value := map[string]string{
		"method": message.Method,
		"url":    message.URL,
		"value":  message.Value,
		"date":   message.Date,
	}
	fmt.Printf("/%d/%d/%+v\n", partition, offset, value)
}
For example, sending messages with the configuration above to the topic topic-golang prints partition / offset / value:
/0/0/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/0/1/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/0/2/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/0/3/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
There is only one partition here, and the offset increases with each message.
Create another topic, topic-python, with 10 partitions.
What does this look like in the log directory?
// cd into log.dirs, as set in server.properties
topic-golang-0
topic-python-0
topic-python-1
topic-python-2
topic-python-3
topic-python-4
topic-python-5
topic-python-6
topic-python-7
topic-python-8
topic-python-9
Send log for topic-python, using the round-robin partitioning rule:
/0/0/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/1/0/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/2/0/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/3/0/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/4/0/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/5/0/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/6/0/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/7/0/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/8/0/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/9/0/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/0/1/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/1/1/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
Messages are stored in the partitions in round-robin order.
4.2 Consumer
System B
func main() {
	config := sarama.NewConfig()
	config.Consumer.Return.Errors = true
	brokers := []string{"127.0.0.1:9092"}
	master, err := sarama.NewConsumer(brokers, config)
	if err != nil {
		panic(err)
	}
	defer func() {
		if err := master.Close(); err != nil {
			panic(err)
		}
	}()
	_, e := master.Partitions("topic-python")
	if e != nil {
		log.Println(e)
	}
	// Consume partition 0 of topic-python, starting from the oldest offset
	consumer, err := master.ConsumePartition("topic-python", 0, sarama.OffsetOldest)
	if err != nil {
		panic(err)
	}
	signals := make(chan os.Signal, 1)
	signal.Notify(signals, os.Interrupt)
	doneCh := make(chan struct{})
	go func() {
		for {
			select {
			case err := <-consumer.Errors():
				fmt.Println(err)
			case msg := <-consumer.Messages():
				fmt.Println("Received messages", string(msg.Key), string(msg.Value), msg.Topic)
			case <-signals:
				fmt.Println("Interrupt is detected")
				doneCh <- struct{}{}
				return // stop the loop once shutdown has been signalled
			}
		}
	}()
	<-doneCh
}
- The consumer specifies the topic: topic-python
- The consumer specifies the partition: 0
Remember the messages the producer sent to topic-python? partition / offset / value
/0/0/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/1/0/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/2/0/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/3/0/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/4/0/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/5/0/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/6/0/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/7/0/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/8/0/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/9/0/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/0/1/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/1/1/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
As you can see, partition 0 holds two messages. Since the consumer specified that partition, it can consume only those two messages.
Received messages {"method":"get5","url":"www.baidu.com4","value":"da4","date":"12344"} topic-python
Received messages {"method":"get5","url":"www.baidu.com4","value":"da4","date":"12344"} topic-python
4.3 Other
When using a Kafka client, what other features do we need?
- Topic creation, description, deletion, etc.
- Consumer group description, etc.
- Metadata
type ClusterAdmin interface {
	CreateTopic(topic string, detail *TopicDetail, validateOnly bool) error
	ListTopics() (map[string]TopicDetail, error)
	DescribeTopics(topics []string) (metadata []*TopicMetadata, err error)
	DeleteTopic(topic string) error
	CreatePartitions(topic string, count int32, assignment [][]int32, validateOnly bool) error
	DeleteRecords(topic string, partitionOffsets map[int32]int64) error
	DescribeConfig(resource ConfigResource) ([]ConfigEntry, error)
	AlterConfig(resourceType ConfigResourceType, name string, entries map[string]*string, validateOnly bool) error
	CreateACL(resource Resource, acl Acl) error
	ListAcls(filter AclFilter) ([]ResourceAcls, error)
	DeleteACL(filter AclFilter, validateOnly bool) ([]MatchingAcl, error)
	ListConsumerGroups() (map[string]string, error)
	DescribeConsumerGroups(groups []string) ([]*GroupDescription, error)
	ListConsumerGroupOffsets(group string, topicPartitions map[string][]int32) (*OffsetFetchResponse, error)
	DeleteConsumerGroup(group string) error
	DescribeCluster() (brokers []*Broker, controllerID int32, err error)
	Close() error
}
That covers the basic use of single-node Kafka.
5. Container Service
Any system that provides a service can be run as a container, and Kafka is no exception. Its settings can be supplied as environment variables.
docker-compose.yml
version: '2'
services:
  ui:
    image: index.docker.io/sheepkiller/kafka-manager:latest
    depends_on:
      - zookeeper
    ports:
      - 9000:9000
    environment:
      ZK_HOSTS: zookeeper:2181
  zookeeper:
    image: index.docker.io/wurstmeister/zookeeper:latest
    ports:
      - 2181:2181
  server:
    image: index.docker.io/wurstmeister/kafka:latest
    depends_on:
      - zookeeper
    ports:
      - 9092:9092
    environment:
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_ADVERTISED_HOST_NAME: 127.0.0.1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
- zookeeper: the distributed coordination system
- kafka server: the Kafka service
- kafka-manager: the Kafka management platform
A follow-up article will cover the cluster version.
<End>