Huawei Cloud Yaoyun Server L instance evaluation | Installing kafka on Huawei Cloud

1. Introduction to kafka

Kafka is an open-source distributed event streaming platform developed by LinkedIn and written in Scala and Java. Its main purpose is to provide a unified, high-throughput, low-latency platform for processing real-time data. At its core it is a message engine system based on the publish-subscribe model.

Kafka has the following features:

  • High throughput and low latency: Kafka sends and receives messages very quickly, and the message processing delay using the cluster can be as low as 2ms.
  • High scalability: Kafka can elastically expand and contract, can scale to thousands of brokers, hundreds of thousands of partitions, and process trillions of messages every day.
  • Persistent storage: Kafka can securely store data in a distributed, durable, fault-tolerant cluster.
  • High availability: Kafka clusters can be stretched across availability zones; if a node goes down, the cluster keeps working normally.

kafka core components:

  • Topic
    Messages are classified by Topic, which can be thought of as a queue. When a producer generates a message, it attaches a Topic label to it; when a consumer needs to read messages, it reads the data under that specific Topic.

  • Producer
    The message producer is the client that sends messages to the Kafka broker; it is responsible for delivering the messages it generates to the Kafka server.

  • Consumer
    The message consumer is the client that fetches messages from the Kafka broker.

  • Consumer Group
    Each consumer belongs to a specific consumer group.

  • Broker
    Each Kafka instance (server) is a broker. A cluster is composed of multiple brokers, and a broker can host multiple topics.

  • Zookeeper
    Kafka relies on ZooKeeper to store cluster metadata.

Kafka is a distributed message queue with high performance, persistence, multi-replica backup, and horizontal scalability. Producers write messages to the queue, and consumers retrieve messages from it to run business logic. In architecture design it typically plays the roles of decoupling, peak shaving, and asynchronous processing.

2. Huawei cloud host preparation

  1. When purchasing the Huawei Cloud host, the configuration used for this evaluation is as follows:
    Note: In this article we use a 2C4G environment for testing, not 2C2G.

  2. Create a new security group and open all ports for testing.
    Then change the instance's security group to this newly created security group that opens all ports.

  3. After opening all the ports, we can log in to the Huawei Cloud host via SSH.

3. kafka installation

Official quick start: https://kafka.apache.org/quickstart

Version information tested and verified in this article:

kafka_2.13-3.2.3.tgz
openjdk-17.0.1_linux-x64_bin.tar.gz

1. What version of java should be installed?

Idea:

  1. According to the Kafka version requirements, download and install the corresponding version of Java.
  2. Configure the JAVA_HOME environment variable to point to the Java installation directory.
    Configure Kafka to use a specific Java version by setting the JAVA_HOME variable.

Binary downloads:
Scala 2.12 - kafka_2.12-3.5.0.tgz (asc, sha512)
Scala 2.13 - kafka_2.13-3.5.0.tgz (asc, sha512)
From Kafka's downloads we can see that it provides precompiled packages built against Scala 2.12 and Scala 2.13.

Which version of Java should we use to run Kafka?
The Scala 2.12 build requires Java 8 or higher, while the Scala 2.13 build requires Java 11 or higher.

Kafka provides packaged downloads based on Scala 2.12 and 2.13. The main differences are as follows:

  1. Scala version
    Scala 2.12 and 2.13 are the two main versions of Scala. Kafka uses Scala for development, so it needs to be compiled and packaged corresponding to different Scala versions.
  2. Compatibility
    Scala 2.12 version has better compatibility with older versions, but does not have the new features of Scala 2.13. Scala 2.13 removes some old features but supports new syntax.
  3. Runtime performance
    Scala 2.13 has been optimized and its runtime performance is improved compared to Scala 2.12.
  4. Compilation speed
    Scala 2.13 compiles faster than 2.12.
  5. Community support
    Scala 2.12 has broader library support and a more mature community; support for Scala 2.13 is steadily growing.

After weighing these points: if you need to stay compatible with old projects that depend on many older libraries, Scala 2.12 is recommended; for a new project, or if you want better runtime performance, choose Scala 2.13.

Therefore, here we choose the Scala 2.13 version, so the version information we select here is as follows:

kafka_2.13-3.2.3.tgz
openjdk-17.0.1_linux-x64_bin.tar.gz

Just decompress the OpenJDK binary package directly, for example:

#!/bin/bash
# unpack the JDK only if it has not been extracted yet
if [ ! -d "/myproject/kafka/jdk-17.0.1/" ]; then
  tar -xf openjdk-17.0.1_linux-x64_bin.tar.gz -C /myproject/kafka/
fi
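
After unpacking, the JDK can be made visible to the current shell. A minimal sketch, assuming the JDK was extracted to /myproject/kafka/jdk-17.0.1 as in the script above:

export JAVA_HOME=/myproject/kafka/jdk-17.0.1
export PATH="$JAVA_HOME/bin:$PATH"
# should print something like: openjdk version "17.0.1"
java -version

Later sections set JAVA_HOME explicitly (in the systemd units and before running the client scripts), so a system-wide Java installation is not required.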

2. Install the zookeeper service

Kafka depends on ZooKeeper (ZK). The installation package already ships with a ZK, or you can point Kafka at an existing running ZK. If you use an external ZK, modify zookeeper.connect in the config/server.properties file in the Kafka installation directory. The built-in ZK is used here: just modify its configuration file and start it.

For Kafka to run normally, ZooKeeper must be configured; otherwise neither the Kafka cluster nor its producers and consumers will work properly. Therefore, we need to configure and start the ZooKeeper service first.

  1. First download and unpack Kafka (the version tested in this article):
wget https://archive.apache.org/dist/kafka/3.2.3/kafka_2.13-3.2.3.tgz
tar -xzf kafka_2.13-3.2.3.tgz
cd kafka_2.13-3.2.3
  2. Modify the ZooKeeper configuration
    zookeeper.properties:
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# the directory where the snapshot is stored.
dataDir=/opt/lighthouse/server/env/kafka/zookeeper
# the port at which the clients will connect
clientPort=2181
# disable the per-ip limit on the number of connections since this is a non-production config
# maximum number of client connections; 0 means unlimited
maxClientCnxns=0
# Disable the adminserver by default to avoid port conflicts.
# Set the port to something non-conflicting if choosing to enable this
# This feature is disabled by default. If set to true, an embedded Jetty server is started (port 8080 by default).
# admin.enableServer exists mainly to provide convenient monitoring and management; enable it when you need to inspect server state or manage the cluster, but leaving it on during normal operation adds some overhead.
admin.enableServer=false
# Maximum time for the initial connection, measured in ticks. tickTime is the heartbeat interval between ZooKeeper servers, or between a client and a server; the default tickTime is 2000 ms (2 seconds).
initLimit=5
# Maximum time between sending a request and receiving a response, measured in ticks
syncLimit=2
# admin.serverPort=8080
# Allow all four-letter commands. Four Letter Words are simple commands provided by ZooKeeper for querying server status.
# Each command is a 4-character string sent via telnet or nc to the ZooKeeper client port (2181 by default).
# Four-letter commands supported by ZooKeeper include:
# - conf: print details of the server configuration.
# - cons: list details of all client connections/sessions connected to this server.
# - crst: reset connection/session statistics for all connections on this server.
# - dump: list outstanding sessions and ephemeral nodes.
# - envi: print details about the server environment.
# - ruok: test whether the server is running correctly; returns "imok" if healthy, otherwise no response.
# - stat: print brief information such as the number of client connections and packets received/sent.
# - srst: reset the statistics in server stat.
# - wchs: list brief information about the server's watches.
# - wchc: list detailed watch information grouped by session.
# - wchp: list detailed watch information grouped by path.
4lw.commands.whitelist=*

# Servers participating in the cluster, one per line:
# server.id=host:port:port
#     The first port is used for follower-to-leader communication, the second for leader election.
# These addresses are only used in cluster mode so that the ZooKeeper nodes can reach each other; in a real production cluster every ZooKeeper server should have its own IP address.
# In standalone mode this setting is not used for connections (the process simply uses the local host), so even an incorrect address here will not prevent a standalone ZooKeeper from starting.
# Ports 12888 (follower-to-leader communication) and 13888 (leader election) are likewise unnecessary in standalone mode, because with a single server there is no notion of follower and leader.
server.1=127.0.0.1:12888:13888
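
Before the first start it does no harm to create the dataDir referenced above (a convenience step; ZooKeeper can usually create it itself). In cluster mode each node would also need a myid file under dataDir matching its server.N entry; standalone mode does not use it:

mkdir -p /opt/lighthouse/server/env/kafka/zookeeper
# only needed if you actually run a multi-node cluster configuration:
# echo 1 > /opt/lighthouse/server/env/kafka/zookeeper/myid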

3. Use systemctl to manage and start the ZooKeeper service

kafka_zookeeper.service: here we can directly use the zookeeper-server-start.sh script shipped with Kafka, wrapped in a systemd unit file.

[Unit]
Description=Apache Zookeeper server (Kafka)
Documentation=http://zookeeper.apache.org
Requires=network.target remote-fs.target
After=network.target remote-fs.target

[Service]
Environment="KAFKA_HEAP_OPTS=-Xmx256M -Xms256M"
Type=simple
Restart=always
Environment=JAVA_HOME=/opt/lighthouse/server/env/kafka/jdk-17.0.1
WorkingDirectory=/opt/lighthouse/server/env/kafka
ExecStart=/opt/lighthouse/server/env/kafka/kafka_2.13-3.2.3/bin/zookeeper-server-start.sh /opt/lighthouse/server/conf/zookeeper/zookeeper.properties
ExecStop=/opt/lighthouse/server/env/kafka/kafka_2.13-3.2.3/bin/zookeeper-server-stop.sh
CPUQuota=25%
MemoryMax=512M
MemoryLimit=512M

[Install]
WantedBy=multi-user.target
Install and start the ZooKeeper service:

sudo rm -rf /etc/systemd/system/kafka_zookeeper.service

sudo cp $SERVER_CONF_PATH/kafka_zookeeper.service /etc/systemd/system/kafka_zookeeper.service
sudo systemctl daemon-reload
sudo systemctl enable kafka_zookeeper
sudo systemctl restart kafka_zookeeper
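
Once the unit is running, a quick way to verify ZooKeeper is to use the four-letter commands whitelisted in the configuration above (this assumes nc/netcat is installed on the host):

sudo systemctl status kafka_zookeeper --no-pager
# 'ruok' should answer "imok" if the server is healthy
echo ruok | nc 127.0.0.1 2181
# 'stat' prints connection counts and the server mode
echo stat | nc 127.0.0.1 2181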

4. Modify kafka configuration

server.properties:

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# see kafka.server.KafkaConfig for additional details and defaults

############################# Server Basics #############################

# The id of the broker. This must be set to a unique integer for each broker.
broker.id=0

############################# Socket Server Settings #############################

# The address the socket server listens on. It will get the value returned from
# java.net.InetAddress.getCanonicalHostName() if not configured.
#   FORMAT:
#     listeners = listener_name://host_name:port
#   EXAMPLE:
#     listeners = PLAINTEXT://your.host.name:9092

security.inter.broker.protocol=SASL_PLAINTEXT
sasl.mechanism.inter.broker.protocol=PLAIN
# sasl.enabled.mechanisms - the SASL mechanisms to enable, e.g. PLAIN or SCRAM
sasl.enabled.mechanisms=PLAIN

# - SASL_PLAINTEXT means SASL (Simple Authentication and Security Layer) is enabled on the connection; SASL provides authentication between Kafka clients and the broker.
# - PLAINTEXT means an unencrypted connection. It is mainly for development; SSL-encrypted connections are recommended in production.
listeners=SASL_PLAINTEXT://127.0.0.1:9092

# Hostname and port the broker will advertise to producers and consumers. If not set,
# it uses the value for "listeners" if configured.  Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
# This setting lets clients connect to the broker's externally reachable address rather than only the internal address.
# The broker's in-cluster address (the listeners config) may be a non-routable internal address such as 192.168.0.1, which external clients cannot reach.
# To let external clients connect, configure an externally routable address (e.g. a public IP) and expose it to clients via advertised.listeners.
advertised.listeners=SASL_PLAINTEXT://127.0.0.1:9092

# Maps listener names to security protocols, the default is for them to be the same. See the config documentation for more details
#listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL

# The number of threads that the server uses for receiving requests from the network and sending responses to the network
num.network.threads=3

# The number of threads that the server uses for processing requests, which may include disk I/O
num.io.threads=8

# socket.send.buffer.bytes and socket.receive.buffer.bytes configure the socket send/receive buffer sizes
# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400

# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400

# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600


############################# Log Basics #############################

# A comma separated list of directories under which to store log files
# log.dirs points to where the broker stores its message logs: Kafka's message data is kept as log files under this directory.
# Note: this is different from Kafka's own runtime log output; log.dirs holds topic/partition data, so it is best thought of as the broker's "data directory" rather than a "log directory".
log.dirs=/opt/lighthouse/server/env/kafka/kafka-logs

# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=12

# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1

############################# Internal Topic Settings  #############################
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
# For anything other than development testing, a value greater than 1 is recommended to ensure availability such as 3.
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1

############################# Log Flush Policy #############################

# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
#    1. Durability: Unflushed data may be lost if you are not using replication.
#    2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
#    3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.

# The number of messages to accept before forcing a flush of data to disk
# This setting controls how often Kafka flushes the message log to disk.
# It specifies how many messages may accumulate before Kafka flushes the log to the filesystem.
# The default is 9223372036854775807 (the maximum long value), which means no flush is triggered by message count.
#log.flush.interval.messages=10000

# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000

############################# Log Retention Policy #############################

# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.

# The minimum age of a log file to be eligible for deletion due to age
log.retention.hours=2
# Time interval for rolling log segments: when it is reached, a new log segment is created. Default is 168 hours; here it is set to 1 hour.
log.roll.hours = 1
retention.ms = 3600000
log.retention.check.interval.ms = 120000
log.cleanup.interval.mins = 5
log.segment.delete.delay.ms = 60000
# Whether the log cleaner (compaction) is enabled. Default true; compaction can reduce disk usage.
log.cleaner.enable=true

# A size-based retention policy for logs. Segments are pruned from the log unless the remaining
# segments drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824

# The maximum size of a log segment file. When this size is reached a new log segment will be created.
# Size-based log retention policy: when the total size of the segments exceeds this value, old segments are deleted. Default is -1 (no size limit); here it is 15 GB.
log.retention.bytes = 16106127360
# Size of each log segment; a new segment is created when this size is reached. Default is 1 GB; here it is about 512 MB.
log.segment.bytes = 536870913

# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
# the default is 5 minutes
log.retention.check.interval.ms=300000

############################# Zookeeper #############################

# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=127.0.0.1:2181

# Timeout in ms for connecting to zookeeper
# the default is 6 seconds; increased to 60 seconds here
zookeeper.connection.timeout.ms=60000


############################# Group Coordinator Settings #############################

# The following configuration specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance.
# The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms.
# The default value for this is 3 seconds.
# We override this to 0 here as it makes for a better out-of-the-box experience for development and testing.
# However, in production environments the default value of 3 seconds is more suitable as this will help to avoid unnecessary, and potentially expensive, rebalances during application startup.
group.initial.rebalance.delay.ms=0

# Controls the maximum number of bytes a replica fetches from the leader in a single request.
# The default is 1048576 bytes; here it is increased to 20 MB. A larger value reduces how often followers issue fetch requests to the leader.
replica.fetch.max.bytes=20971520

# Controls the maximum size of a message in Kafka; the default is 1000012 bytes. Here it is increased to 20 MB to allow larger messages; a message may not exceed this limit.
message.max.bytes=20971520

5. Use systemctl to manage and start the kafka service

kafka.service:

[Unit]
Description=Apache Kafka server (broker)
Documentation=http://kafka.apache.org/documentation.html
Requires=network.target remote-fs.target
After=network.target remote-fs.target kafka_zookeeper.service

[Service]
CPUQuota=200%
MemoryMax=4G
MemoryLimit=4G
Environment="KAFKA_HEAP_OPTS=-Xmx2048M -Xms2048M"
Environment="KAFKA_JVM_PERFORMANCE_OPTS=-XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:G1HeapRegionSize=16M -XX:MinMetaspaceFreeRatio=50 -XX:MaxMetaspaceFreeRatio=80 -XX:+ExplicitGCInvokesConcurrent"
Environment="KAFKA_OPTS=-Djava.security.debug=jaas -Djava.security.auth.login.config=/opt/lighthouse/server/env/kafka/kafka_2.13-3.2.3/config/kafka_server_jaas.conf"
Type=simple
Restart=always
LimitNOFILE=1024768
Environment=JAVA_HOME=/opt/lighthouse/server/env/kafka/jdk-17.0.1
WorkingDirectory=/opt/lighthouse/server/env/kafka
ExecStart=/opt/lighthouse/server/env/kafka/kafka_2.13-3.2.3/bin/kafka-server-start.sh /opt/lighthouse/server/conf/kafka/server.properties
ExecStop=/opt/lighthouse/server/env/kafka/kafka_2.13-3.2.3/bin/kafka-server-stop.sh

[Install]
WantedBy=multi-user.target

Start kafka:

sudo rm -rf /etc/systemd/system/kafka.service
sudo cp $SERVER_CONF_PATH/kafka.service /etc/systemd/system/kafka.service

sudo systemctl daemon-reload
sudo systemctl enable kafka
sudo systemctl restart kafka
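
A few optional checks after starting the unit, to confirm the broker is up and listening on port 9092 (journalctl and ss are assumed to be available on the image):

sudo systemctl status kafka --no-pager
# the broker should be listening on the SASL_PLAINTEXT listener
sudo ss -lntp | grep 9092
# follow the broker output captured by systemd
sudo journalctl -u kafka -f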

Note the line Environment="KAFKA_OPTS=-Djava.security.debug=jaas -Djava.security.auth.login.config=/opt/lighthouse/server/env/kafka/kafka_2.13-3.2.3/config/kafka_server_jaas.conf".
The kafka service uses the kafka_server_jaas.conf configuration, while the kafka clients use kafka_client_jaas.conf.
This configuration is important.

6. Create a test topic

SASL_PLAINTEXT and PLAINTEXT basics

SASL (Simple Authentication and Security Layer) is an application layer network protocol used to add authentication support.

JAAS (Java Authentication and Authorization Service) is Java's authentication and authorization service. Kafka uses JAAS to implement SASL authentication and authorization.

The kafka configuration is as follows:

listeners=SASL_PLAINTEXT://127.0.0.1:9092
# Hostname and port the broker will advertise to producers and consumers. If not set,
# it uses the value for "listeners" if configured.  Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
# This setting lets clients connect to the broker's externally reachable address rather than only the internal address.
# The broker's in-cluster address (listeners) may be a non-routable internal address such as 192.168.0.1, which external clients cannot reach.
# To let external clients connect, configure an externally routable address (e.g. a public IP) and expose it via advertised.listeners.
advertised.listeners=SASL_PLAINTEXT://127.0.0.1:9092

SASL_PLAINTEXT is the PLAINTEXT protocol with SASL authentication enabled; clients that do not use SASL will be unable to connect.

If you only need Kafka internally, the PLAINTEXT protocol is recommended: it is simple to configure and requires no SASL settings. SASL_PLAINTEXT is only needed when the client's identity must be verified.
You can modify the Kafka configuration to turn off SASL authentication like this:

# comment out or delete the SASL-related settings
#security.inter.broker.protocol=SASL_PLAINTEXT
#sasl.mechanism.inter.broker.protocol=PLAIN
#sasl.enabled.mechanisms=PLAIN

listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://localhost:9092

# remove sasl.jaas.config

Here we mainly demonstrate the setup with an account and password.
SASL authentication on the Kafka server side is managed through the JAAS mechanism, mainly configured through the kafka_server_jaas.conf file.

kafka_server_jaas.conf contents:

KafkaServer {
    org.apache.kafka.common.security.plain.PlainLoginModule required
    username="admin"
    password="elkeid"
    user_admin="elkeid"
    user_alice="elkeid";
};

We need to modify the official built-in script kafka-run-class.sh and add the following configuration, specifying that the kafka_server_jaas.conf file should be used.
We define a custom KAFKA_SASL_OPTS environment variable. It is used to specify the SASL-related JAAS configuration for the Kafka process.

  • -Djava.security.auth.login.config: the Java system property that sets the JAAS login configuration file.
  • /xxx/kafka/kafka_2.13-3.2.3/config/kafka_server_jaas.conf: the path to the JAAS configuration file.
    The effect of this environment variable is:
  • the JAAS configuration file path for the Kafka process is set to /xxx/kafka/kafka_2.13-3.2.3/config/kafka_server_jaas.conf;
  • when the Kafka process starts, this JAAS configuration file is loaded to obtain the SASL authentication settings.

KAFKA_SASL_OPTS="-Djava.security.auth.login.config=/xxx/kafka/kafka_2.13-3.2.3/config/kafka_server_jaas.conf"

Summary of the idea: specify the JAAS configuration file for Kafka by modifying the official Kafka startup script kafka-run-class.sh, adding this variable to the command that starts the Kafka process.

After testing and verification, this implementation is not recommended. It is only workable if you do not use the other official client scripts, because it is best not to hardcode extra configuration in kafka-run-class.sh; instead, pass it in through environment variables so the script stays general-purpose.

Idea 1: You can imitate the existing logic in kafka-run-class.sh, for example:

if [ -z "$KAFKA_OPTS" ]; then
  KAFKA_OPTS=""
fi

By adding similar logic to the kafka-run-class.sh script, you can implement the function of customizing the JAAS configuration file path:

# JAAS configuration
if [ -z "$KAFKA_SASL_OPTS" ]; then
  KAFKA_SASL_OPTS="" 
fi
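
For context, the launch command near the end of kafka-run-class.sh looks roughly like the simplified sketch below (the exact line varies between Kafka versions, so this is not the verbatim upstream script); Idea 1 would splice the new variable into it:

# simplified sketch of the java launch line in kafka-run-class.sh
exec "$JAVA" $KAFKA_HEAP_OPTS $KAFKA_JVM_PERFORMANCE_OPTS $KAFKA_GC_LOG_OPTS \
  $KAFKA_JMX_OPTS $KAFKA_LOG4J_OPTS $KAFKA_SASL_OPTS -cp "$CLASSPATH" $KAFKA_OPTS "$@"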

Then when starting Kafka, if you need to use non-default JAAS configuration:

export KAFKA_SASL_OPTS="-Djava.security.auth.login.config=/custom/jaas.conf"

You can easily switch between different JAAS configuration files by exporting KAFKA_SASL_OPTS. Compared with hard-coding the specified JAAS file path, this implementation is more flexible and versatile.

Idea 2: There is no need to modify the official script at all; the KAFKA_OPTS environment variable already supported by the official script meets our needs.

export KAFKA_OPTS="-Djava.security.debug=jaas -Djava.security.auth.login.config=/opt/lighthouse/server/env/kafka/kafka_2.13-3.2.3/config/kafka_client_jaas.conf"

Note: kafka_client_jaas.conf is used here

Here I recommend using idea 2.

Create a test topic

Load java environment variables so that java can be found

export JAVA_HOME=/opt/lighthouse/server/env/kafka/jdk-17.0.1

Enter the kafka installation directory:

cd /opt/lighthouse/server/env/kafka/kafka_2.13-3.2.3/
./bin/kafka-topics.sh --create --topic test --partitions 1 --replication-factor 1 --bootstrap-server localhost:9092

If there are no errors, the topic can be created successfully.
However, our kafka service is actually configured with SASL/PLAIN, which is an authentication method based on account and password, so an error should be reported here.

Therefore, we need to configure and modify the official client operation-related scripts so that it can support account and password access to kafka.

SASL/PLAIN client configuration (when the server configuration enables SASL/PLAIN, then the client needs to configure authentication information when connecting)

When a client connects to a server with SASL authentication enabled, it needs to be specified in the client configuration:

security.protocol=SASL_PLAINTEXT
sasl.mechanism=PLAIN

These two parameters are specified separately:

  • Communicate using the SASL_PLAINTEXT protocol
  • Use the PLAIN mechanism for username and password verification

These can be added to the client's configuration files (such as consumer.properties, producer.properties, etc.):

security.protocol=SASL_PLAINTEXT
sasl.mechanism=PLAIN

The specific steps are as follows:

  1. Add a new configuration file, jaas.properties, in the kafka/config directory. Configure SASL there so that the security protocol and authentication mechanism used by the client are consistent with the server.
vi jaas.properties
security.protocol=SASL_PLAINTEXT
sasl.mechanism=PLAIN

Once the SASL parameters of the client and server are consistent, with the correct Jaas configuration, the client should be able to successfully establish a connection with the server through SASL/PLAIN.
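
Equivalently, the file can be written non-interactively; a minimal sketch, assuming the same config directory used in the rest of this article:

cat > /opt/lighthouse/server/env/kafka/kafka_2.13-3.2.3/config/jaas.properties <<'EOF'
security.protocol=SASL_PLAINTEXT
sasl.mechanism=PLAIN
EOF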

  2. Add a new configuration file, kafka_client_jaas.conf, in the kafka/config directory and specify the user's login credentials.
KafkaClient {
  org.apache.kafka.common.security.plain.PlainLoginModule required
  username="admin"
  password="elkeid";
};

Note: The user here must match a user configured in the server-side kafka_server_jaas.conf configuration file, otherwise an error will be reported.

  3. Modify kafka-topics.sh, kafka-console-producer.sh, and kafka-console-consumer.sh.
    For the kafka-topics.sh, kafka-console-producer.sh and kafka-console-consumer.sh scripts in the kafka/bin directory, add the following configuration.
    Here kafka-topics.sh is used as an example, pointing it at the kafka_client_jaas.conf configuration file.

Taking kafka-topics.sh as an example, we create a topic:

export JAVA_HOME=/opt/lighthouse/server/env/kafka/jdk-17.0.1
cd /opt/lighthouse/server/env/kafka/kafka_2.13-3.2.3/
export KAFKA_OPTS="-Djava.security.debug=jaas -Djava.security.auth.login.config=/opt/lighthouse/server/env/kafka/kafka_2.13-3.2.3/config/kafka_client_jaas.conf"

./bin/kafka-topics.sh --create --topic test --partitions 1 --replication-factor 1 --bootstrap-server localhost:9092 --command-config=/opt/lighthouse/server/env/kafka/kafka_2.13-3.2.3/config/jaas.properties

Note: There is no need to define a custom KAFKA_SASL_OPTS here; the KAFKA_OPTS environment variable already used by the official scripts can be set directly to point at the kafka_client_jaas.conf configuration file.
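
If the create call succeeds, the same client configuration can be reused to check that the topic really exists, for example:

./bin/kafka-topics.sh --describe --topic test --bootstrap-server localhost:9092 \
  --command-config /opt/lighthouse/server/env/kafka/kafka_2.13-3.2.3/config/jaas.properties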

7. Send and consume a test message

At this point, we have started kafka and successfully created a topic. Next, we send and consume a test message.

Enter the kafka installation directory:

export JAVA_HOME=/opt/lighthouse/server/env/kafka/jdk-17.0.1
cd /opt/lighthouse/server/env/kafka/kafka_2.13-3.2.3/
export KAFKA_OPTS="-Djava.security.debug=jaas -Djava.security.auth.login.config=/opt/lighthouse/server/env/kafka/kafka_2.13-3.2.3/config/kafka_client_jaas.conf"

Produce a message:

./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test  --producer.config=/opt/lighthouse/server/env/kafka/kafka_2.13-3.2.3/config/producer.properties

Consume messages:

./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning   --consumer.config=/opt/lighthouse/server/env/kafka/kafka_2.13-3.2.3/config/consumer.properties

Note: producer.properties and consumer.properties exist by default; as with jaas.properties earlier, we append the following settings to them.

security.protocol=SASL_PLAINTEXT
sasl.mechanism=PLAIN

If messages can be sent and received, Kafka is basically working.
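
For a scripted check (instead of the interactive console above), a message can be piped into the producer and read back with a bounded consumer; a small sketch using the same client configuration files:

echo "hello kafka $(date +%s)" | ./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test \
  --producer.config /opt/lighthouse/server/env/kafka/kafka_2.13-3.2.3/config/producer.properties
# --max-messages makes the consumer exit after reading one record
./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning --max-messages 1 \
  --consumer.config /opt/lighthouse/server/env/kafka/kafka_2.13-3.2.3/config/consumer.properties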

8. Problems encountered during the process

Error when creating a topic: INFO [SocketServer listenerType=ZK_BROKER, nodeId=0] Failed authentication with /127.0.0.1 (channelId=127.0.0.1:9092-127.0.0.1:54982-14) (Unexpected Kafka request of type METADATA during SASL handshake.) (org.apache.kafka.common.network.Selector)

Problem analysis:
These "Failed authentication" errors indicate that the SASL authentication between the client and the broker failed when creating a Kafka topic.
The main reason: Kafka broker enabled SASL authentication, but the client did not configure the corresponding configuration when connecting.

The command line client kafka-topics.sh is used when creating a Kafka topic.
This client does not enable SASL authentication by default, so it cannot authenticate normally with the Kafka broker that has SASL authentication enabled, causing this problem.

Problem Solving:
To solve this problem, you need to enable SASL authentication through Jaas configuration when using command line clients such as kafka-topics.sh. The steps are as follows:

  1. In the Kafka configuration directory, add a JAAS configuration file, such as kafka_client_jaas.conf:
KafkaClient {
  org.apache.kafka.common.security.plain.PlainLoginModule required
  username="admin"
  password="admin-secret";
};
  2. When running the kafka-topics.sh command, load the JAAS file via KAFKA_OPTS and pass the client SASL properties (security.protocol / sasl.mechanism) via --command-config, as in the sections above:
export KAFKA_OPTS="-Djava.security.auth.login.config=/path/to/kafka_client_jaas.conf"
./bin/kafka-topics.sh --create --topic test --partitions 1 --replication-factor 1 --bootstrap-server localhost:9092 --command-config /path/to/jaas.properties

4. Kafka graphical tool selection

1. EFAK (Eagle For Apache Kafka, formerly known as Kafka Eagle)

Source code: https://github.com/smartloli/kafka-eagle
Download: http://download.kafka-eagle.org/
Official documentation: https://www.kafka-eagle.org/articles/docs/documentation.html

EFAK (Eagle For Apache Kafka, formerly known as Kafka Eagle) is a Kafka cluster monitoring system open-sourced in China. It can monitor a Kafka cluster's broker status, Topic information, IO, memory, consumer threads, offsets and other information and present it visually. Its KQL feature can also query data in Kafka online through SQL.

After a quick look, the project's code activity is relatively high and the documentation is fairly detailed, so this is the solution recommended here.

2. Kafka Manager

Kafka Manager is an open source project developed by Yahoo for managing and monitoring Kafka clusters. It provides a user-friendly Web UI to view and manage Kafka's topics, consumer groups, partitions, offsets and other information.

This is Yahoo's open-source Kafka management tool; it focuses more on collecting Kafka cluster metrics and also provides some topic management functions.

3. Kafka Monitor

This is a monitoring tool developed by LinkedIn that monitors the health and performance of Kafka clusters and provides a web-based user interface.

The Kafka monitoring tool developed by LinkedIn is very powerful and can help Kafka administrators quickly discover problems in the Kafka cluster and take timely measures to repair them.

References

Kafka installation and deployment configuration
Reference URL: https://www.cnblogs.com/yb38156/p/15978055.html

Big data Hadoop - Kafka graphical tool EFAK (EFAK environment deployment)
Reference URL: https://blog.csdn.net/qq_35745940/article/details/124764824
