caseStudy-20180913: Kafka process crash and how it was handled

Problem Description

At about 16:20 on 2018-xx-xx, xxx happened to see that xxx was troubleshooting a problem with the online Kafka cluster and asked about it. The Kafka process on one machine had died, and he was going through the error logs on the lark platform. I then joined in and we troubleshot the problem together.
Accident Duration: 2018-xx-xx 16:30 to 17:25 on September 13, 2018
Business Impact: In theory none. The business side is automatically fault tolerant: failed produce and consume reads and writes are retried and routed to another Kafka node.
People involved in handling the incident: xxx, xxx

Process

2018-xx-xx 16:42 xxx viewed the lark logs and analyzed the error log
2018-xx-xx 16:48 xxx logged in to the Kafka machine 10.136.40.2 to view the system parameters
2018-xx-xx 16:55 xxx compiled the list of system parameters to optimize
2018-xx-xx 17:12 xxx modified the system parameters
2018-xx-xx 17:25 xxx restarted the Kafka service from the open-source Cloudera platform and observed the logs; it started normally

Locating the problem

1. Check the error log in lark

The log shows this error: Caused by: java.lang.OutOfMemoryError: Map failed at sun.nio.ch.FileChannelImpl.map0(Native Method)

Log in to the machine (ssh 10.136.40.2) to check:

The output above shows that this is not a JVM heap OutOfMemoryError. The error occurs when a newly created index file is memory-mapped with mmap. From the analysis, the likely cause is that max_map_count is too low.
Google search for the cause of the problem: https://stackoverflow.com/questions/43042144/kafka-server-failed-to-start-java-io-ioexception-map-failed
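
The distinction drawn here, between a heap OutOfMemoryError and a mapping failure, can be read directly from the exception message. A minimal Python sketch of that check follows. The function name classify_oom and the category labels are made up for illustration; the two matched substrings are messages the JVM emits for these cases.

```python
# Hypothetical helper: the labels on the right are illustrative, not JVM terms.
OOM_KINDS = [
    ("Map failed", "mmap failure (check vm.max_map_count)"),
    ("Java heap space", "JVM heap exhausted"),
]


def classify_oom(log_line):
    """Return a rough category for an OutOfMemoryError log line, or None."""
    if "java.lang.OutOfMemoryError" not in log_line:
        return None  # not an OutOfMemoryError line at all
    for needle, kind in OOM_KINDS:
        if needle in log_line:
            return kind
    return "other OutOfMemoryError"
```

Applied to the line seen in this incident, the function returns the mmap category rather than the heap one, which is what points the investigation at max_map_count instead of the JVM heap settings.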

This basically confirms that the cause is a max_map_count setting that is too low.
View the system parameter:
$ sudo sysctl -a | grep "max_map_count"
vm.max_map_count = 65530
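
To see how close a process is to this limit, the limit can be compared with the number of mappings the process currently holds. On Linux, each line of /proc/<pid>/maps describes one mapping. The following is a minimal Python sketch, assuming a Linux host; the function names are illustrative and not part of any tool used in this incident.

```python
from pathlib import Path


def parse_sysctl_line(line):
    """Parse a line such as 'vm.max_map_count = 65530' into (name, value)."""
    name, _, value = line.partition("=")
    return name.strip(), int(value.strip())


def count_mappings(maps_text):
    """Count mappings in the contents of a /proc/<pid>/maps file (one per line)."""
    return sum(1 for line in maps_text.splitlines() if line.strip())


def current_map_count(pid="self"):
    """Linux only: number of memory mappings currently held by a process."""
    return count_mappings(Path(f"/proc/{pid}/maps").read_text())
```

For a Kafka broker, pass the broker's PID to current_map_count. Reading the maps file of a process owned by another user may require root.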

Kafka's dependence on system configuration parameters brings to mind Elasticsearch, which enforces its required system settings at startup and otherwise refuses to start with an error.

The vm.max_map_count parameter in Elasticsearch (es):

The way es forces this tuning at startup is a practice worth learning from.

Why did this failure occur

  • 1. There was no complete monitoring and alerting system, so no timely warning was possible
  • 2. We did not have a good grasp or understanding of how the Kafka cluster runs in production
  • 3. The CentOS system parameters had not been tuned

Why was there no monitoring and alerting system

  • Because our Kafka ecosystem is still under construction

Why did we not have a grasp or understanding of how the Kafka cluster runs in production

  • Because we previously understood Kafka poorly and lacked both control over it and an overall awareness of it

Why had the system parameters not been tuned

  • Because operations installed the system and we simply used it, so we did not notice the requirements Kafka places on its operating environment; we lacked a deep understanding of Kafka

Analysis of what max_map_count does

Definition: the maximum number of virtual memory areas a process may map.
The official explanation: https://www.oschina.net/translate/understanding-virtual-memory?print
The max_map_count file contains a limit on the number of VMAs (virtual memory areas) a process can have. A virtual memory area is a contiguous region of virtual address space. During the life of a process, these areas are created whenever the program maps a file into memory, attaches to a shared memory segment, or allocates heap space. Tuning this value limits the number of VMAs a process can have. Limiting the total number of VMAs can cause application errors, because the operating system returns out-of-memory errors once a process reaches its VMA limit; in exchange, it frees up a small amount of memory for other kernel uses. If your operating system is running low on memory in the NORMAL zone, lowering this value can help free up memory for kernel use.
Baidu Quality Analysis Reference: http://www.10tiao.com/html/473/201606/2651473114/1.html

At a lower level, memory-mapped regions are produced when a program calls malloc, when it calls mmap or mprotect directly, when shared libraries are loaded, and in Java by the FileChannel.map method.

The Kafka service is developed in Scala and runs on the JVM. Because the JVM heap is pre-allocated, it occupies very few virtual memory areas. The largest consumer in a Kafka process is file mapping (mmap calls). Kafka stores data in data files and index files. Data files are read and written directly, without mmap, while index files are mapped with mmap to speed up reads and writes. Each index file that is created adds one virtual memory mapping and increments the process's map count. Once the map count reaches the system max_map_count, the next mapping fails, an OutOfMemoryError is thrown, and the Java process exits. (The original post showed Kafka's index file mapping code here as a screenshot.)
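
The effect of mapping one file per index can be reproduced outside Kafka. The sketch below is not Kafka's code. It is a small Python illustration that creates a few empty files and maps each one with mmap, the same system call that FileChannel.map relies on under Linux. The function name, the file names, and the 4096-byte size are assumptions made for the example. Each mapping that stays open normally adds one entry to the process's /proc/<pid>/maps.

```python
import mmap
import os
import tempfile


def map_index_files(directory, count, size=4096):
    """Create `count` fixed-size files and keep each one memory-mapped."""
    mappings = []
    for i in range(count):
        # Illustrative file name, loosely modelled on offset-named index files.
        path = os.path.join(directory, f"{i:020d}.index")
        with open(path, "wb") as f:
            f.truncate(size)  # preallocate so the whole file can be mapped
        with open(path, "r+b") as f:
            # mmap duplicates the descriptor, so the mapping outlives `f`.
            mappings.append(mmap.mmap(f.fileno(), size))
    return mappings


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        held = map_index_files(d, 3)
        held[0][:4] = b"\x00\x00\x00\x01"  # write through the mapping
        print(len(held), "files mapped")
        for m in held:
            m.close()  # unmapping releases the virtual memory area
```

A process that keeps creating such mappings without closing them eventually hits vm.max_map_count. In Python the failed mmap call raises OSError, while the JVM reports the same condition as the "Map failed" error seen in this incident.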

Follow-up

Risks facing the Kafka cluster
If the other nodes do not have their system parameters tuned, a Kafka node could also be at risk of going down because of a request peak or data skew. All Kafka nodes in the cluster must therefore be adjusted and preventive measures taken. The TODO items are:

Compile the list of system parameters to optimize, covering all systems: xxx, xx xx (date), DONE
Script the changes and submit a work order to run them in production, with operations responsible for carrying it out: xxx, xx xx (date), DONE
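
As a further preventive measure for the remaining nodes, the mapping count of each broker process could be compared against the limit and flagged before the limit is reached. The following is a minimal Python sketch of such a check. The function name, the 80% warning threshold, and the status labels are assumptions for illustration, not values taken from this incident.

```python
def map_count_status(current, limit, warn_ratio=0.8):
    """Classify how close a process is to vm.max_map_count.

    `current` is the number of mappings the process holds and `limit` is the
    value of vm.max_map_count. The 0.8 default is an assumed threshold.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    if current >= limit:
        return "critical"  # the next mmap call is expected to fail
    if current >= warn_ratio * limit:
        return "warning"   # approaching the limit; raise it or investigate
    return "ok"
```

With the limit of 65530 found on the failed machine, a broker holding 60000 mappings would be reported as "warning", which leaves time to raise the limit before the process dies.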

Blog address reference: https://www.cnblogs.com/lizherui/p/12650254.html
