Kafka in a simple way

Kafka without the pain


Consumer group coordinator mechanism

How is the coordinator chosen? It is one of the brokers in the cluster.
For a consumer group:
group.id: my_consumer -> hash the group.id -> the result is a number
number % 50 => a value between 0 and 49 is obtained. Suppose the result is 8;
then find
which broker hosts the leader of partition 8 of __consumer_offsets. That broker is the coordinator for this group.
__consumer_offsets has 50 partitions by default.
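A minimal sketch of the lookup described above, assuming the coordinator partition is simply the group.id hash modulo the number of __consumer_offsets partitions (50 by default); the group name and partition count are illustrative values.

```java
// Sketch only: derive which __consumer_offsets partition a group maps to.
public class CoordinatorLookup {
    public static void main(String[] args) {
        String groupId = "my_consumer";      // group.id (example value)
        int offsetsPartitions = 50;          // __consumer_offsets partition count (default)
        int partition = Math.abs(groupId.hashCode()) % offsetsPartitions;
        System.out.println("Coordinator = broker hosting the leader of __consumer_offsets-" + partition);
    }
}
```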

Kafka cluster evaluation

To handle 1 billion requests, 5 physical machines are required, each with 11 SAS disks of 7 TB.

Memory evaluation

We have seen that Kafka reads and writes data through the OS cache. In other words, if the OS cache were unlimited,
the whole of Kafka would effectively be operating in memory, and operating in memory means very good performance.
But memory is limited.

  1. Give as much memory as possible to the OS cache.
  2. Kafka's core code is written in Scala and the client code is written in Java.
    Both run on the JVM, so we also have to give some memory to the JVM.
    Kafka's design does not keep many data structures inside the JVM, so the JVM does not need much memory.
    From experience, 10G is enough.
    Contrast this with the NameNode:
    its metadata (tens of GB) is kept inside the JVM, so its JVM must be given a lot of memory, for example 100G.

Suppose this project handles 1 billion requests and has 100 topics in total.
100 topics * 5 partitions * 2 replicas = 1000 partitions.
A partition is simply a directory on the physical machine, and that directory contains many .log files.
The .log files are the data files; by default a .log file is 1G in size.
If we want the data of the latest .log file of all 1000 partitions to be in memory, performance is at its best: 1000 * 1G = 1000G of memory.
In practice we only need the latest 25% of the current .log file to be in memory:
250M * 1000 = 0.25G * 1000 = 250G of memory.
250G of memory / 5 machines = 50G of memory per machine.
50G + 10G (for the JVM) = 60G of memory.
A 64G machine leaves the other 4G for the operating system, which also needs memory.
In fact, Kafka's JVM does not even need as much as 10G,
so 64G is estimated to be workable.
Of course, if you can use servers with 128G of memory, that is best.
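The back-of-the-envelope arithmetic above, written out as a runnable sketch; all numbers are the article's own assumptions (100 topics, 5 partitions, 2 replicas, 1G segments, keep the latest 25% hot, 5 machines, roughly 10G for the JVM).

```java
// Sketch of the memory estimate from the text, not a sizing recommendation.
public class MemoryEstimate {
    public static void main(String[] args) {
        int partitions = 100 * 5 * 2;                        // 100 topics * 5 partitions * 2 replicas = 1000
        double hotGBPerPartition = 1.0 * 0.25;               // 25% of a 1G active .log segment
        double totalCacheGB = partitions * hotGBPerPartition; // 250G of OS cache in total
        double cachePerMachineGB = totalCacheGB / 5;          // 50G per machine
        double perMachineGB = cachePerMachineGB + 10;         // plus ~10G JVM heap = 60G
        System.out.println("OS cache per broker: " + cachePerMachineGB + "G");
        System.out.println("Total memory per broker: " + perMachineGB + "G (round up to 64G or 128G)");
    }
}
```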
When evaluating above I used topics with 5 partitions, but a topic with a large data volume may have 10 partitions.
Summary:
To handle 1 billion requests, 5 physical machines are needed, each with 11 SAS disks of 7 TB, 64G of memory (128G is better), and 16 CPU cores (32 is better).

What kind of network card do we need?

Generally it is either a Gigabit network card (1 Gb/s) or a 10 Gigabit network card (10 Gb/s).
During peak periods 55,000 requests per second flow in; 55,000 / 5 means roughly 10,000 requests per second flow into each server.
As we said before,
10,000 * 50KB = 488M, i.e. each server receives about 488M of data per second. The data also has replicas, and synchronization between replicas
is also traffic over the network: 488 * 2 = 976M/s.
Note: in many companies a single request is nowhere near 50KB. In our company the producer side
batches multiple records together before sending, which is why a single request is this big.
Note:
under normal circumstances a network card cannot be driven to its nominal limit; with a Gigabit card we can generally use only about 700M.
So ideally we still use 10 Gigabit network cards.
With 10 Gigabit it is very comfortable.
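The same bandwidth arithmetic as a quick sketch, using the article's assumptions (peak of 55,000 requests/s, 5 servers, ~50KB per request, replication factor 2).

```java
// Sketch of the per-server network load estimate from the text.
public class NetworkEstimate {
    public static void main(String[] args) {
        int peakRequestsPerSecond = 55_000;
        int servers = 5;
        int requestSizeKB = 50;
        int perServerReqs = peakRequestsPerSecond / servers;       // 11,000; the text rounds to ~10,000
        double inboundMBps = 10_000 * requestSizeKB / 1024.0;      // ~488 MB/s of producer traffic
        double withReplication = inboundMBps * 2;                  // ~976 MB/s including replica sync
        System.out.println(perServerReqs + " req/s per server, ~" + (int) withReplication + " MB/s on the wire");
    }
}
```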

Exceptions

1) LeaderNotAvailableException: this happens if a machine goes down and the leader replica is unavailable at that moment, causing your
write to fail. You have to wait for another follower replica to be switched to leader before you can continue writing; at that point
you can simply retry the send. If you restart a Kafka broker process, it will definitely trigger leader switches, and those will definitely cause write errors,
namely LeaderNotAvailableException.
2) NotControllerException: the same idea. If the broker hosting the Controller goes down,
there will be a problem and you need to wait for the Controller to be re-elected. Again, just retry.
3) NetworkException: network exception / timeout.
a. Configure the retries parameter and the producer will retry automatically.
b. But if it still fails after several retries,
the client surfaces the exception to us. After we catch the exception,
we handle the message separately, for example by sending it down a backup link, as sketched below.
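A minimal sketch of that pattern, not the author's exact code: configure retries on the producer and fall back to a backup path when a send still fails after retrying. The broker address, topic name, and sendToBackupLink helper are illustrative assumptions.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class RetryingProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");   // assumed address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.RETRIES_CONFIG, 3);                          // automatic retries

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("my_topic", "key", "value");         // assumed topic
            producer.send(record, (metadata, exception) -> {
                if (exception != null) {
                    // Retries are exhausted (e.g. LeaderNotAvailableException,
                    // NotControllerException, NetworkException): hand the message
                    // to a backup link instead of losing it.
                    sendToBackupLink(record);
                }
            });
        }
    }

    // Hypothetical backup link: a local file, another cluster, a database, etc.
    private static void sendToBackupLink(ProducerRecord<String, String> record) {
        System.err.println("Falling back for record: " + record.value());
    }
}
```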

How to improve throughput

Parameter 1:
buffer.memory: the size of the producer's send buffer; the default value is 33554432, i.e. 32MB.
Parameter 2:
compression.type: the default is none (no compression), but lz4 compression can be used and its efficiency is quite good.
After compression the amount of data shrinks and throughput rises, at the cost of extra CPU overhead on the producer side.
Parameter 3:
batch.size: the size of a batch. If the batch is too small, it leads to frequent network requests and lower throughput;
if the batch is too large, a message may wait a long time before being sent, and it puts a lot of pressure on the memory buffer
because too much data is buffered in memory. The default value is 16384, i.e. 16KB, meaning a batch is sent out once it is 16KB full.
In a real production environment this value is usually increased to raise throughput.

If our messages are relatively few,
we use the linger.ms parameter. Its default value is 0, meaning a message is sent immediately, which is not ideal.
Generally set it to a value such as 100 milliseconds. In that case, after a message is submitted it first enters a batch.
If the batch fills up to 16KB within 100 milliseconds it is naturally sent out; otherwise the batch is sent anyway once the 100 milliseconds are up.
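A combined sketch of the four settings discussed above; the concrete values (64KB batch, 100ms linger, lz4) are illustrative assumptions, not tuning advice, and the broker address is made up.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

public class ThroughputTuning {
    // Producer properties reflecting the throughput parameters from the text.
    public static Properties tunedProducerProps() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");   // assumed address
        props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 33554432L);            // buffer.memory: 32MB (default)
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");             // compression.type: lz4
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 65536);                   // batch.size: raised from 16KB to 64KB
        props.put(ProducerConfig.LINGER_MS_CONFIG, 100);                      // linger.ms: wait up to 100ms for a batch to fill
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }
}
```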

Origin blog.csdn.net/m0_46449152/article/details/114998935