Kafka Design Analysis (5) - Kafka Performance Test Method and Benchmark Report

Reprinted from: http://www.jasongj.com/2015/12/31/KafkaColumn5_kafka_benchmark/

Summary

  This article introduces how to use Kafka's built-in performance test scripts and Kafka Manager to test Kafka's performance, how to use Kafka Manager to monitor the working status of a Kafka cluster, and finally presents a Kafka performance test report.

Performance testing and cluster monitoring tools

  Kafka provides many useful tools, such as the operations tools covered in Kafka Design Analysis (3) - Kafka High Availability (Part 2): the Partition Reassignment Tool, Preferred Replica Leader Election Tool, Replica Verification Tool, and State Change Log Merge Tool. This article introduces the performance testing tools that ship with Kafka, its metrics reporting mechanism, and Yahoo's open-source Kafka Manager.

Kafka performance test script

  • $KAFKA_HOME/bin/kafka-producer-perf-test.sh This script tests Kafka Producer performance. It reports four main metrics: the total amount of data sent (MB), the data throughput per second (MB/second), the total number of messages sent, and the message throughput per second (records/second). Besides printing results to standard output, the script also provides a CSV reporter that writes the results to a CSV file, making it easy to feed them into other analysis tools
  • $KAFKA_HOME/bin/kafka-consumer-perf-test.sh This script tests Kafka Consumer performance; it reports the same metrics as the Producer test script (example invocations of both scripts are sketched below)
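
  The exact command-line flags differ between Kafka releases; the sketch below follows the 0.8.x-era tools this article was written against, the broker, ZooKeeper, and topic names are placeholders, and the flags should be checked against --help for the installed release.

```bash
# Producer performance test: send 1,000,000 messages of 100 bytes to a test topic.
# In the 0.8.x tool, --csv-reporter-enabled --metrics-dir <dir> can also write CSV output.
$KAFKA_HOME/bin/kafka-producer-perf-test.sh \
    --broker-list broker1:9092,broker2:9092,broker3:9092 \
    --topics perf-test \
    --messages 1000000 \
    --message-size 100 \
    --threads 1

# Consumer performance test against the same topic; it reports the same metrics.
$KAFKA_HOME/bin/kafka-consumer-perf-test.sh \
    --zookeeper zk1:2181 \
    --topic perf-test \
    --messages 1000000 \
    --threads 1
```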

Kafka Metrics

  Kafka uses Yammer Metrics to report metrics on both the server side and the client side. Yammer Metrics 3.1.0 provides six kinds of metric collection: Meters, Gauges, Counters, Histograms, Timers, and Health Checks. It also decouples the collection of metrics from their reporting (publishing), so the two can be combined freely as needed. The reporters it currently supports include the Console Reporter, JMX Reporter, HTTP Reporter, CSV Reporter, SLF4J Reporter, Ganglia Reporter, and Graphite Reporter, and Kafka can therefore expose its metrics through any of these reporters.

View Single Server Metrics with JConsole

  Viewing Kafka server metrics with JConsole over JMX is one of the simplest and most convenient approaches, because no additional tools are needed: since Kafka is installed, Java must be installed, and JConsole ships with the JDK.
  Kafka's JMX reporter must first be enabled by setting the environment variable JMX_PORT to a valid port before starting the broker, e.g. export JMX_PORT=19797. JConsole can then connect to the Kafka server on that port to view its metrics, as shown in the figure below.

[Figure: JConsole Kafka JMX]
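
  A minimal sketch of this setup (the port matches the article; the host name and installation paths are placeholders):

```bash
# On the broker host: export JMX_PORT before starting the broker so that
# kafka-run-class.sh enables the JMX remote port.
export JMX_PORT=19797
$KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties

# On any machine with a JDK: point JConsole at the broker's JMX port.
jconsole broker1:19797
```
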
  The advantage of JConsole is that no additional tools need to be installed. The disadvantages are equally obvious: the data display is not intuitive, the data is not organized in a friendly way, and, more importantly, it cannot monitor the metrics of the whole cluster at once. In the figure above, under kafka.cluster -> Partition -> UnderReplicated -> topic4, only the two nodes 2 and 5 appear. This is not because only these two Partitions of topic4 are in an under-replicated state; in fact, this Broker hosts only these two Partitions of topic4, while the other Partitions live on other Brokers, so only these two are visible through this server's JMX reporter.

View metrics for the entire cluster through Kafka Manager

  Kafka Manager is an open-source Kafka management tool from Yahoo. It supports the following functions:

  • Manage multiple clusters
  • Easy to view cluster status
  • Run preferred replica election
  • Generate and execute Partition allocation plans for multiple topics in batches
  • Create Topic
  • Delete Topic (only supported on 0.8.2 and above, and requires delete.topic.enable to be set to true on the Broker)
  • Add Partition to Existing Topic
  • Update topic configuration
  • Poll broker-level and topic-level metrics (requires the Broker's JMX reporter to be enabled)
  • Monitor Consumer Group and its consumption status
  • Support adding and viewing LogKafka
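
  As a rough sketch of getting Kafka Manager running (details vary by release; host names are placeholders), the tool is pointed at the ZooKeeper ensemble that stores Kafka Manager's own state and then started as a web application, by default on port 9000:

```bash
# conf/application.conf - ZooKeeper used by Kafka Manager for its own state
# (many builds also accept this via the ZK_HOSTS environment variable):
#   kafka-manager.zkhosts="zk1:2181,zk2:2181,zk3:2181"

# Start the web UI; the HTTP port can be overridden if 9000 is already taken.
bin/kafka-manager -Dhttp.port=9000

# Note: the Delete Topic feature additionally requires delete.topic.enable=true
# in each Broker's server.properties.
```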

  After installing Kafka Manager, adding a cluster is very convenient: just specify the ZooKeeper list used by the cluster and the Kafka version, as shown in the figure below.

[Figure: Add Cluster]

  Note that adding a cluster here means adding an existing Kafka cluster to the monitoring list, not deploying a new Kafka cluster through Kafka Manager; in this respect it differs from Cloudera Manager.

Kafka Benchmark

  One of Kafka's core characteristics is high throughput, so the tests in this article focus on Kafka's throughput.
  The tests use six virtual machines running Red Hat 6.6: three act as Brokers, and the other three act as Producers or Consumers. Each virtual machine is configured as follows:

  • CPU: 8 vCPUs, Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, 2 sockets, 4 cores per socket, 1 thread per core
  • Memory: 16 GB
  • Disk: 500 GB

  Kafka's JMX reporter is enabled on port 19797, and Kafka Manager's JMX polling feature is used to monitor throughput during the performance tests.

  The following scenarios are tested; the main metrics are megabytes of data per second (MB/s) and messages per second (records/s).

Producer Only

  This group of tests uses no Consumers; only Brokers and Producers are started.

Producer Number VS. Throughput

  Experimental conditions: 3 Brokers, 1 Topic, 6 Partitions, no Replication, asynchronous mode, message payload of 100 bytes
  Test items: measure throughput with 1, 2, and 3 Producers
  Test goal: as described in Kafka Design Analysis (1) - Kafka Background and Architecture, multiple Producers can send data to the same Topic at the same time. Before the Brokers saturate, the more Producers there are, the more messages the cluster should receive per second, and in theory the growth should be linear. This experiment verifies that behavior. As a performance test, it also monitors a single Broker's CPU and memory usage during the run
  Test results: the total throughput with different numbers of Producers is shown in the figure below
[Figure: Producer Number vs. Throughput]

  As the figure shows, a single Producer can successfully send about 1.28 million 100-byte messages per second, and the total number of messages sent per second grows linearly with the number of Producers, which is consistent with the earlier analysis.

  During the performance test, the Broker's CPU and memory usage were as shown in the figure below.
[Figure: Broker CPU Usage]

  As the figure shows, while receiving about 1.17 million messages per second (the 3 Producers send about 3.5 million messages per second in total, so on average each Broker receives about 1.17 million per second), a single Broker uses roughly 248% CPU and 601 MB of memory.
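
  A hedged sketch of how this scenario could be reproduced with the scripts introduced earlier (topic name, host names, and message counts are assumptions; the flags follow the 0.8.x-era tools):

```bash
# Create the test topic: 6 partitions, no replication (replication factor 1).
$KAFKA_HOME/bin/kafka-topics.sh --create --zookeeper zk1:2181 \
    --topic topic_p6_r1 --partitions 6 --replication-factor 1

# Run on 1, 2, or 3 producer machines in parallel; 100-byte payload.
# The 0.8.x tool sends asynchronously unless --sync is passed.
$KAFKA_HOME/bin/kafka-producer-perf-test.sh \
    --broker-list broker1:9092,broker2:9092,broker3:9092 \
    --topics topic_p6_r1 \
    --messages 10000000 \
    --message-size 100
```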

Message Size VS. Throughput

  Experimental conditions: 3 Brokers, 1 Topic, 6 Partitions, no Replication, asynchronous mode, 3 Producers
  Test items: measure total cluster throughput with message sizes of 10, 20, 40, 60, 80, 100, 150, 200, 400, 800, 1000, 2000, 5000, and 10000 bytes
  Test results: the total cluster throughput at different message sizes is shown in the figure below
[Figure: Message Size vs. Throughput]

  As the figure shows, the longer the message, the fewer messages can be sent per second, but the more data (MB) can be sent per second. In addition, each message carries metadata besides its payload, so the amount of data sent per second is larger than the number of messages per second multiplied by 100 bytes; the larger the payload, the smaller the metadata's share, and the larger each batch of messages becomes, so a higher data rate (MB/s) is easier to reach. The other tests use a 100-byte payload; this (relatively) short message size was chosen precisely to measure Kafka's throughput under comparatively unfavorable conditions.
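
  The message-size sweep could be scripted roughly as follows (same assumptions as the previous sketch):

```bash
# One run per payload size against the same 6-partition, replication-factor-1 topic.
for size in 10 20 40 60 80 100 150 200 400 800 1000 2000 5000 10000; do
    $KAFKA_HOME/bin/kafka-producer-perf-test.sh \
        --broker-list broker1:9092,broker2:9092,broker3:9092 \
        --topics topic_p6_r1 \
        --messages 5000000 \
        --message-size "$size"
done
```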

Partition Number VS. Throughput

  Experimental conditions: 3 Brokers, 1 Topic, no Replication, asynchronous mode, 3 Producers, message payload of 100 bytes
  Test items: measure throughput with 1 to 9 Partitions
  Test results: the total cluster throughput with different numbers of Partitions is shown in the figure below
[Figure: Partition Number vs. Throughput]

  As the figure shows, when the number of Partitions is smaller than the number of Brokers (3), throughput rises linearly as the number of Partitions grows. All experiments in this article start only 3 Brokers, and a Partition can exist on only 1 Broker (ignoring Replication; even with Replication, only its Leader accepts reads and writes), so when a Topic has only 1 Partition, only 1 Broker actually serves that Topic. As discussed in earlier articles, Kafka distributes all Partitions evenly across the Brokers, so with 2 Partitions two Brokers serve the Topic, and with 3 Partitions three Brokers serve it. In other words, with 3 or fewer Partitions, more Partitions means more Brokers serving the Topic. Since data is inserted into different Brokers in parallel, this explains why throughput rises linearly with the number of Partitions when there are at most 3.
  When there are more Partitions than Brokers, total throughput does not improve and even drops slightly. A possible reason is that with 4 or 5 Partitions, the Brokers host different numbers of Partitions while the Producer spreads data evenly across Partitions, so the Brokers are unevenly loaded and the cluster's throughput cannot be maximized. The fact that throughput in the figure is clearly higher whenever the Partition count is an integer multiple of the Broker count supports this explanation.
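
  One topic per Partition count could be created along these lines (topic naming is an assumption):

```bash
# Create topics topic_p1 ... topic_p9 with 1 to 9 partitions and no replication;
# the producer performance test is then run against each topic in turn.
for p in 1 2 3 4 5 6 7 8 9; do
    $KAFKA_HOME/bin/kafka-topics.sh --create --zookeeper zk1:2181 \
        --topic "topic_p${p}" --partitions "$p" --replication-factor 1
done
```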

Replica Number VS. Throughput

  Experimental conditions: 3 Brokers, 1 Topic, 6 Partitions, asynchronous mode, 3 Producers, message payload of 100 bytes
  Test items: measure throughput with 1 to 3 Replicas
  Test results: shown in the figure below
[Figure: Replica Number vs. Throughput]

  As the figure shows, throughput drops as the number of Replicas increases, but the drop is not linear, because the Followers replicate data in parallel rather than serially.
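
  Similarly, the Replica test needs one topic per replication factor; a sketch under the same assumptions:

```bash
# Topics with 6 partitions and replication factors 1, 2, and 3; the producer
# performance test is then run against each topic with the same 100-byte payload.
for r in 1 2 3; do
    $KAFKA_HOME/bin/kafka-topics.sh --create --zookeeper zk1:2181 \
        --topic "topic_p6_r${r}" --partitions 6 --replication-factor "$r"
done
```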

  

Consumer Only

  Experimental conditions: 3 Brokers, 1 Topic, 6 Partitions, no Replication, asynchronous mode, message payload of 100 bytes
  Test items: measure total cluster throughput with 1 to 3 Consumers
  Test results: with a large number of messages already stored in the cluster, the total cluster throughput with 1 to 3 Consumers is shown in the figure below

[Figure: Consumer Number vs. Throughput]

  As the figure shows, a single Consumer can consume about 3.06 million messages per second, far more than a single Producer can send per second, which ensures that under a reasonable configuration messages can be processed in time. Moreover, total cluster throughput grows linearly with the number of Consumers.
  As described in Kafka Design Analysis (4) - Kafka Consumer Design, when multiple Consumers consume messages, Partitions are the unit of assignment. With only 1 Consumer, that Consumer has to pull messages from all 6 Partitions at once, and the I/O of the machine hosting it becomes the bottleneck of the whole consumption process. When the number of Consumers rises to 2 or 3, multiple Consumers pull messages from the cluster at the same time and make full use of the cluster's throughput.
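
  A hedged sketch of the consumer side (the 0.8.x consumer performance tool reads via ZooKeeper; newer releases take a broker list instead, and the group name here is a placeholder):

```bash
# Run one instance per consumer machine; consumers started with the same --group
# split the 6 partitions among themselves.
$KAFKA_HOME/bin/kafka-consumer-perf-test.sh \
    --zookeeper zk1:2181 \
    --topic topic_p6_r1 \
    --group perf-consumer \
    --messages 10000000 \
    --threads 1
```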

Producer Consumer pair

  Experimental conditions: 3 Brokers, 1 Topic, 6 Partitions, no Replication, asynchronous mode, message payload of 100 bytes
  Test items: measure how many messages the Consumer can consume while one Producer and one Consumer work at the same time
  Test results: 1,215,613 records/second
