Kafka Start Guide: Single Node

Hello everybody, my name is Xie Wei, a programmer.

Today's topic: kafka user guide, single-node version.

1. Scenario

Suppose you are a back-end engineer. Your application normally runs fine in production, but during a flash-sale event the system suddenly collapses. The investigation finds that it could not handle the surge of traffic, which brought it down. At this point there are two ideas: 1. use an nginx reverse proxy to forward requests to more internal servers, achieving load balancing; 2. use a messaging system: "cache" the excess requests in a middleware layer, and let the downstream system continuously take requests from that cache for further processing.

The latter approach, using a messaging system, is precisely a use case for Kafka.

So what is kafka?

Kafka is a distributed messaging system; it is now positioned as a distributed streaming platform.

System A simply sends messages to the messaging system; system B takes messages from the messaging system and does the subsequent processing.

A common phrase used to describe Kafka's scenario is peak shaving and valley filling: cut the peak traffic rate, fill in the troughs, and smooth the load on the system as much as possible.

From this we get Kafka's three typical application scenarios:

  • Messaging System
  • Storage System
  • Distributed stream processing platform

The messaging system is the most widely used of the three. Because messages must be stored until downstream systems pull them, Kafka can also serve as a storage system. And since the pulled messages are processed by downstream systems anyway, why not let Kafka carry out further data processing itself? That is roughly what "distributed stream processing platform" means.

The core application, and the focus of everything below, is messaging.

2. Basic Concepts

System A generates a message and sends it to the messaging system; system B pulls the message from the messaging system. This flow involves several concepts:

  • System A is called the producer; it is the party that sends messages
  • The messaging system is called the broker; it is essentially a service process whose job is to accept messages from producers, serve pull requests from consumers, and persist messages
  • System B is called the consumer; it is the party that pulls messages from the messaging system

Producers and consumers each have their own parameter settings, which determine their behavior.

For a producer to send a message, it must first know where to send it, i.e. the broker's address. The broker (the Kafka server) configures things such as its listening address and how messages are persisted. Beyond that, how are different kinds of messages distinguished? Kafka introduces a logical concept for this: the Topic. Producers specify different Topics, and the messages are stored separately.

Within a Topic, the simple scheme is that messages keep arriving and persistent storage keeps appending them. That works at small scale, but once there is too much message data a single log becomes unwieldy for consumers. A simple idea: split the appends across several "files" so each stays smaller. In Kafka this concept is called the partition: messages are continuously appended to a partition, each partition has a number, and every message appended to a partition is assigned a sequential number called the offset (a toy sketch follows).
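
To make partition and offset concrete, here is a toy model in Go (my own illustration, not Kafka code): a partition is an append-only log, and each appended message receives the next offset.

// Toy model of a partition: an append-only log where each message
// gets the next sequential offset. Illustration only, not Kafka code.
type partition struct {
	messages [][]byte // the message at index i has offset i
}

// append adds a message to the end of the log and returns its offset.
func (p *partition) append(msg []byte) (offset int64) {
	p.messages = append(p.messages, msg)
	return int64(len(p.messages) - 1)
}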

For a consumer to pull messages from the broker, it must likewise know the broker's address, then the Topic; it can also be more fine-grained and specify which partition to read and which offset to start consuming from.

What if a message is lost? A simple approach is redundancy: replication, keeping multiple copies. One copy is the leader, the others are followers. Only the leader exchanges messages with clients; followers never talk to clients directly and are only responsible for continuously syncing data from the leader.

A Kafka cluster consists of multiple brokers; if one goes down, Kafka relies on ZooKeeper to elect a new leader.

[Figure: Kafka cluster]

[Figure: Kafka topic and partitions]

[Figure: Kafka cluster]

3. Using the Client

Given the concepts above: how do we stand up a Kafka service and complete a messaging system?

  • Start the service process: broker

Pseudocode:

type Broker struct{
    Addr 
    Config
    ...
}
  • Producer connects to the broker

Pseudocode:


type Producer struct{
    Config
    Message 
    ...
}

  • Consumer connects to the broker

Pseudocode:

type Consumer struct{
    Config
    Topic 
    Partitions
    Offset
    ...
}

The basic idea:

  • Start the Kafka service
  • System A connects to the service and sends messages
  • System B connects to the service and consumes messages

Following the example on the official website, here is how to complete basic messaging.

Download the installation package: kafka_2.12-2.3.0.tgz

  • 2.12 is the Scala compiler version
  • 2.3.0 is the Kafka version

After unpacking, the two most important directories are:

  • bin: a collection of scripts, e.g. for starting the ZooKeeper service, creating topics, producing messages, and consuming messages
zookeeper-server-start.sh
zookeeper-server-stop.sh
kafka-configs.sh
kafka-console-consumer.sh
kafka-console-producer.sh
kafka-consumer-groups.sh
kafka-topics.sh
kafka-server-start.sh
kafka-server-stop.sh
...

  • config: configuration files, e.g. the ZooKeeper port, the Kafka log storage directory, the externally exposed port, the maximum message size, and other frequently used settings
zookeeper.properties
server.properties
producer.properties
consumer.properties
...

All told there are probably 200+ parameters. Sorry, I cannot remember them all. What to do? We still have to learn them; otherwise we cannot earn money or support a family.

Keep the defaults for the most part; here are some notable settings by category:

  • zookeeper.properties

Kafka depends on ZooKeeper for distributed coordination.

dataDir=/tmp/zookeeper
clientPort=2181

Remember this default: clientPort=2181.

  • server.properties

Settings for the Kafka server process:

log.dirs=/tmp/kafka-logs            # log storage directory
log.retention.hours=168             # log retention period
broker.id=0                         # broker id; in a cluster, give each broker a unique number
listeners=PLAINTEXT://:9092         # externally exposed service address
zookeeper.connect=localhost:2181    # ZooKeeper cluster address
...
  • producer.properties

Conventions for the content of produced messages, etc.

  • consumer.properties

Conventions for the content of consumed messages, etc.

With the configuration in place, start the services:

  • Start zookeeper
> bin/zookeeper-server-start.sh config/zookeeper.properties
  • Start kafka service process
> bin/kafka-server-start.sh config/server.properties

To create and query topics, use: kafka-topics.sh

To produce messages, use: kafka-console-producer.sh

To consume messages, use: kafka-console-consumer.sh
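
Typical invocations look like this (a sketch: host, port, and topic name are assumptions matching the defaults above):

> bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic topic-golang
> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic topic-golang
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic topic-golang --from-beginning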

Of course, these scripts are typically used only for testing; in real applications you use a client library for your language.

4. Demonstration

A Go client for Kafka:

Download and install:

go get -u -v github.com/Shopify/sarama

4.1 Producer

System A

  • Producer
type KafkaAction struct {
	DataSyncProducer  sarama.SyncProducer
	DataAsyncProducer sarama.AsyncProducer
}
// Synchronous producer

func newDataSyncProducer(brokerList []string) sarama.SyncProducer {
	config := sarama.NewConfig()
	config.Producer.RequiredAcks = sarama.WaitForAll // Wait for all in-sync replicas to ack the message
	config.Producer.Retry.Max = 5                    // Retry up to 5 times to produce the message
	config.Producer.Return.Successes = true
	config.Producer.Partitioner = sarama.NewRoundRobinPartitioner
	producer, err := sarama.NewSyncProducer(brokerList, config)
	if err != nil {
		log.Fatalln("Failed to start Sarama producer1:", err)
	}
	return producer

}

// Asynchronous producer
func newDataAsyncProducer(brokerList []string) sarama.AsyncProducer {
	config := sarama.NewConfig()
	sarama.Logger = log.New(os.Stdout, "[KAFKA] ", log.LstdFlags)
	config.Producer.RequiredAcks = sarama.WaitForLocal       // Only wait for the leader to ack
	config.Producer.Compression = sarama.CompressionSnappy   // Compress messages
	config.Producer.Flush.Frequency = 500 * time.Millisecond // Flush batches every 500ms
	config.Producer.Partitioner = sarama.NewRoundRobinPartitioner
	producer, err := sarama.NewAsyncProducer(brokerList, config)
	if err != nil {
		log.Fatalln("Failed to start Sarama producer2:", err)
	}
	go func() {
		for err := range producer.Errors() {
			log.Println("Failed to write access log entry:", err)
		}
	}()
	return producer
}


Remember that producers have a set of configuration parameters? That is what this config is for: every option has a default value, and you can set your own values as needed.

For example, the compression algorithm:

config.Producer.Compression = sarama.CompressionSnappy

Commonly used compression algorithms are:

  • gzip
  • snappy
  • lz4
  • zstd

Different compression algorithms differ primarily in the compression ratio and throughput.
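
In sarama each algorithm corresponds to a built-in constant on the producer config; a quick sketch (pick exactly one):

config.Producer.Compression = sarama.CompressionGZIP   // highest ratio, more CPU
config.Producer.Compression = sarama.CompressionSnappy // fast, moderate ratio
config.Producer.Compression = sarama.CompressionLZ4    // fast, moderate ratio
config.Producer.Compression = sarama.CompressionZSTD   // good ratio and speed; needs a broker new enough to support zstd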

For example, the partitioning rule:

config.Producer.Partitioner = sarama.NewRoundRobinPartitioner

Commonly used partitioning rules (each maps to a sarama constructor, sketched below):

  • Round-robin
  • Random
  • By message key
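
A quick sketch of the corresponding built-in constructors (pick exactly one):

config.Producer.Partitioner = sarama.NewRoundRobinPartitioner // walk the partitions in turn
config.Producer.Partitioner = sarama.NewRandomPartitioner     // choose a random partition
config.Producer.Partitioner = sarama.NewHashPartitioner       // partition by hash of the message key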

For example, the acknowledgment level that decides when a send counts as successful (the three options are sketched below):

config.Producer.RequiredAcks = sarama.WaitForLocal
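
sarama offers three acknowledgment levels; a quick sketch, from fastest to safest:

config.Producer.RequiredAcks = sarama.NoResponse   // fire and forget: no broker acknowledgment
config.Producer.RequiredAcks = sarama.WaitForLocal // the leader has written the message
config.Producer.RequiredAcks = sarama.WaitForAll   // all in-sync replicas have the message
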
  • Message: the producer transmits only byte data.

The interface:

type Encoder interface {
	Encode() ([]byte, error)
	Length() int
}

A message to be sent needs to implement the Encoder interface: define your message struct and implement the Encode and Length methods.

type SendMessage struct {
	Method  string `json:"method"`
	URL     string `json:"url"`
	Value   string `json:"value"`
	Date    string `json:"date"`
	encoded []byte
	err     error
}

func (S *SendMessage) Length() int {
	// sarama calls Length() before Encode(), so marshal and cache the JSON here
	b, e := json.Marshal(S)
	S.encoded = b
	S.err = e
	return len(b)
}

func (S *SendMessage) Encode() ([]byte, error) {
	return S.encoded, S.err
}
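
For plain strings or raw bytes no custom type is needed: sarama already ships StringEncoder and ByteEncoder, which implement this interface. For example:

// sarama's built-in encoder for plain strings:
msg := &sarama.ProducerMessage{Topic: "topic-golang", Value: sarama.StringEncoder("hello")}
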
  • Send a message
func (K *KafkaAction) Do(v interface{}) {
	message := v.(SendMessage)
	// sending returns the partition and offset the message was written to
	partition, offset, err := K.DataSyncProducer.SendMessage(&sarama.ProducerMessage{
		Topic: TOPIC,
		Value: &message,
	})
	if err != nil {
		log.Println(err)
		return
	}
	value := map[string]string{
		"method": message.Method,
		"url":    message.URL,
		"value":  message.Value,
		"date":   message.Date,
	}
	fmt.Println(fmt.Sprintf("/%d/%d/%+v", partition, offset, value))
}

For example, with the configuration above, sending messages to the topic topic-golang prints partition/offset/value:

/0/0/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/0/1/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/0/2/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/0/3/map[date:12344 method:get5 url:www.baidu.com4 value:da4]

There is only one partition above, so the offset keeps increasing.

Now create another topic with 10 partitions: topic-python (see the command below).
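
A command along these lines creates it, assuming the broker from the configuration above:

> bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 10 --topic topic-python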

How does this look in the log directory?

# cd into log.dirs (the directory set in server.properties)

topic-golang-0
topic-python-0
topic-python-1
topic-python-2
topic-python-3
topic-python-4
topic-python-5
topic-python-6
topic-python-7
topic-python-8
topic-python-9

Sending messages to topic-python, the round-robin partitioning rule produces this log:

/0/0/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/1/0/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/2/0/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/3/0/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/4/0/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/5/0/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/6/0/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/7/0/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/8/0/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/9/0/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/0/1/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/1/1/map[date:12344 method:get5 url:www.baidu.com4 value:da4]

Round-robin: messages are stored in the partitions in turn.

4.2 Consumer

System B

package main

import (
	"fmt"
	"log"
	"os"
	"os/signal"

	"github.com/Shopify/sarama"
)

func main() {
	config := sarama.NewConfig()
	config.Consumer.Return.Errors = true
	brokers := []string{"127.0.0.1:9092"}
	master, err := sarama.NewConsumer(brokers, config)
	if err != nil {
		panic(err)
	}
	defer func() {
		if err := master.Close(); err != nil {
			panic(err)
		}
	}()
	// list the topic's partitions (unused in this demo; see the sketch at the end of 4.2)
	_, e := master.Partitions("topic-python")
	if e != nil {
		log.Println(e)
	}
	consumer, err := master.ConsumePartition("topic-python", 0, sarama.OffsetOldest)
	if err != nil {
		panic(err)
	}
	signals := make(chan os.Signal, 1)
	signal.Notify(signals, os.Interrupt)
	doneCh := make(chan struct{})
	go func() {
		for {
			select {
			case err := <-consumer.Errors():
				fmt.Println(err)
			case msg := <-consumer.Messages():
				fmt.Println("Received messages", string(msg.Key), string(msg.Value), msg.Topic)
			case <-signals:
				fmt.Println("Interrupt is detected")
				doneCh <- struct{}{}
			}
		}
	}()
	<-doneCh
}
  • The consumer specifies the topic: topic-python
  • The consumer specifies the partition: 0

Remember the messages the producer sent into topic-python? partition/offset/value:

/0/0/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/1/0/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/2/0/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/3/0/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/4/0/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/5/0/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/6/0/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/7/0/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/8/0/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/9/0/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/0/1/map[date:12344 method:get5 url:www.baidu.com4 value:da4]
/1/1/map[date:12344 method:get5 url:www.baidu.com4 value:da4]

You can see that partition 0 contains two messages. Since the consumer pinned partition 0, it consumes only those two messages:

Received messages  {"method":"get5","url":"www.baidu.com4","value":"da4","date":"12344"} topic-python
Received messages  {"method":"get5","url":"www.baidu.com4","value":"da4","date":"12344"} topic-python
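
To consume the whole topic rather than a single partition, one option is to start a PartitionConsumer for each partition. A minimal sketch, reusing the master consumer from above (in production a consumer group is usually the better tool):

partitions, err := master.Partitions("topic-python")
if err != nil {
	log.Fatal(err)
}
for _, p := range partitions {
	pc, err := master.ConsumePartition("topic-python", p, sarama.OffsetOldest)
	if err != nil {
		log.Fatal(err)
	}
	// one goroutine per partition, draining that partition's message channel
	go func(pc sarama.PartitionConsumer) {
		for msg := range pc.Messages() {
			fmt.Printf("partition=%d offset=%d value=%s\n", msg.Partition, msg.Offset, msg.Value)
		}
	}(pc)
}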

4.3 Other

Beyond producing and consuming, what other features do we need from a Kafka client?

  • Topic creation, description, deletion, etc.
  • Consumer group descriptions, etc.
  • Meta information: metadata
type ClusterAdmin interface {
	CreateTopic(topic string, detail *TopicDetail, validateOnly bool) error
	ListTopics() (map[string]TopicDetail, error)
	DescribeTopics(topics []string) (metadata []*TopicMetadata, err error)
	DeleteTopic(topic string) error
	CreatePartitions(topic string, count int32, assignment [][]int32, validateOnly bool) error
	DeleteRecords(topic string, partitionOffsets map[int32]int64) error
	DescribeConfig(resource ConfigResource) ([]ConfigEntry, error)
	AlterConfig(resourceType ConfigResourceType, name string, entries map[string]*string, validateOnly bool) error
	CreateACL(resource Resource, acl Acl) error
	ListAcls(filter AclFilter) ([]ResourceAcls, error)
	DeleteACL(filter AclFilter, validateOnly bool) ([]MatchingAcl, error)
	ListConsumerGroups() (map[string]string, error)
	DescribeConsumerGroups(groups []string) ([]*GroupDescription, error)
	ListConsumerGroupOffsets(group string, topicPartitions map[string][]int32) (*OffsetFetchResponse, error)
	DeleteConsumerGroup(group string) error
	DescribeCluster() (brokers []*Broker, controllerID int32, err error)
	Close() error
}
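
A minimal sketch of using it: sarama.NewClusterAdmin is the constructor, the topic name below is just an example, and I am assuming a sarama version that defines V2_3_0_0 (admin requests need the protocol version set high enough).

config := sarama.NewConfig()
config.Version = sarama.V2_3_0_0 // assumption: match your broker's version
admin, err := sarama.NewClusterAdmin([]string{"127.0.0.1:9092"}, config)
if err != nil {
	log.Fatal(err)
}
defer admin.Close()
// create a topic with 3 partitions and a single replica
if err := admin.CreateTopic("topic-admin", &sarama.TopicDetail{
	NumPartitions:     3,
	ReplicationFactor: 1,
}, false); err != nil {
	log.Fatal(err)
}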

That covers the basic usage of single-node Kafka.

5. Container Service

Any service can also be delivered as a container, and Kafka is no exception; its settings can be supplied as environment variables.

docker-compose.yml

version: '2'
services:
  ui:
    image: index.docker.io/sheepkiller/kafka-manager:latest
    depends_on:
      - zookeeper
    ports:
      - 9000:9000
    environment:
      ZK_HOSTS: zookeeper:2181
  zookeeper:
    image: index.docker.io/wurstmeister/zookeeper:latest
    ports:
      - 2181:2181
  server:
    image: index.docker.io/wurstmeister/kafka:latest
    depends_on:
      - zookeeper
    ports:
      - 9092:9092
    environment:
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_ADVERTISED_HOST_NAME: 127.0.0.1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
  • zookeeper: the distributed coordination system
  • server: the Kafka service
  • ui: kafka-manager, a management console for Kafka
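
Bring everything up with one command, then open the kafka-manager UI on port 9000:

> docker-compose up -d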

A cluster version will follow in a later post.

<End>

Code: github.com/wuxiaoxiaos...

Origin: juejin.im/post/5dbe8d7a6fb9a020775fce09