Building a Log Collection System with ELK + Kafka
ELK Overview
ELK refers to the three main open source tools Elasticsearch, Logstash and Kibana, which are often used together for real-time log analysis and data visualization. In log collection systems, ELK is usually used in conjunction with Kafka.
1. Elasticsearch
Elasticsearch is an open source distributed search and analysis engine built on the Lucene library. It is designed to handle large-scale data sets, enabling fast full-text search, structured search, analysis and real-time data processing. Elasticsearch is highly scalable and reliable, can automatically handle data sharding and replication, and supports distributed search and aggregation operations.
2. Logstash
Logstash is an open source data collection and processing engine used to collect, process and transmit various types of data (such as logs, events, metrics, etc.) from multiple sources to Elasticsearch or other storage and analysis tools. Logstash supports multiple data input sources and output destinations, and can perform data conversion, standardization, filtering and enhancement to make the data consistent and structured.
3. Kibana
Kibana is an open source data visualization platform for creating and sharing real-time data visualization dashboards on Elasticsearch. It provides a rich set of visual components such as charts, tables, maps, and dashboards, allowing users to explore and analyze data in an intuitive way. Kibana also supports interactive querying and filtering, enabling you to quickly demonstrate and share insights from your data.
4. Kafka
Kafka acts as a data buffer queue. As a message queue it decouples producers from consumers and improves scalability; during traffic peaks, it lets key components absorb sudden bursts of requests instead of collapsing under the overload.
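The buffering effect can be sketched with a small, self-contained Python example (Python is used here purely for illustration): a burst of messages is enqueued almost instantly, while a slower consumer drains the queue at its own pace.

```python
import queue
import threading

# A burst of 100 log messages arrives at once; the consumer (standing in
# for Logstash/Elasticsearch) drains them at its own pace. The queue
# absorbs the spike so neither side blocks the other.
buf = queue.Queue()
consumed = []

def consumer():
    while len(consumed) < 100:
        consumed.append(buf.get())  # blocks until a message is available

for i in range(100):
    buf.put(f"log-{i}")             # enqueueing is near-instant

t = threading.Thread(target=consumer)
t.start()
t.join()
print(len(consumed))  # -> 100: every message survives the burst
```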
Build and configure
Use Docker Compose to deploy ELK (Elasticsearch, Logstash, Kibana) together with Kafka for log collection.
docker-compose.yml
Create the docker-compose.yml file with the following services:
ZooKeeper: dependency service for Kafka, listening on port 2181
Kafka: message queue for log collection, listening on port 9092 and connected to ZooKeeper
Elasticsearch: stores and indexes log data, listening on port 9200
Logstash: receives log data from Kafka and forwards it to Elasticsearch
Kibana: visualizes and searches log data, listening on port 5601 and connected to Elasticsearch
vim docker-compose.yml
version: '3.7'
services:
  zookeeper:
    image: zookeeper:3.8
    container_name: zookeeper
    ports:
      - "2181:2181"
    environment:
      - ALLOW_ANONYMOUS_LOGIN=yes
    restart: always
  kafka:
    image: bitnami/kafka:3.3.2
    container_name: kafka1
    hostname: kafka
    volumes:
      - ./kafka_data:/bitnami/kafka # grant the kafka_data directory write permissions: chmod 777 kafka_data
    ports:
      - "9092:9092"
    depends_on:
      - zookeeper
    environment:
      KAFKA_CFG_ZOOKEEPER_CONNECT: zookeeper:2181 # ZooKeeper address Kafka connects to
      KAFKA_ENABLE_KRAFT: no # whether to use KRaft (Kafka replacing ZooKeeper); default: yes
      KAFKA_CFG_LISTENERS: PLAINTEXT://:9092 # socket the Kafka broker listens on; default: PLAINTEXT://:9092,CONTROLLER://:9093
      KAFKA_CFG_ADVERTISED_LISTENERS: PLAINTEXT://192.168.30.30:9092 # externally advertised address (host IP and port); default: PLAINTEXT://:9092
      KAFKA_KRAFT_CLUSTER_ID: FDAF211E728140229F6FCDF4ADDC0B32 # cluster id when using KRaft; every broker in the cluster must initialize with the same id; any generated UUID works
      ALLOW_PLAINTEXT_LISTENER: yes # allow the PLAINTEXT listener; default false; not recommended in production
      KAFKA_HEAP_OPTS: -Xmx512M -Xms256M # maximum and initial heap size of the broker
      KAFKA_BROKER_ID: 1 # broker.id, must be unique
    restart: always
  elasticsearch:
    image: elasticsearch:7.4.2
    container_name: elasticsearch
    hostname: elasticsearch
    volumes:
      - ./es_data:/usr/share/elasticsearch/data # grant the es_data directory write permissions: chmod 777 es_data
    restart: always
    environment:
      - "discovery.type=single-node"
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ports:
      - "9200:9200"
  logstash:
    image: logstash:7.4.2
    container_name: logstash
    volumes:
      - ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf
      - ./logstash.yml:/usr/share/logstash/config/logstash.yml
    depends_on:
      - elasticsearch
    environment:
      LS_JAVA_OPTS: "-Xmx256m -Xms128m"
      ELASTICSEARCH_HOST: "http://192.168.30.30:9200"
  kibana:
    image: kibana:7.4.2
    restart: always
    container_name: kibana1
    ports:
      - 5601:5601
    environment:
      ELASTICSEARCH_URL: "http://192.168.30.30:9200"
    depends_on:
      - elasticsearch
Configure log collection rules
Create Kafka's data storage directory and grant it write permissions:
chmod 777 kafka_data
Create the ES data storage directory and grant it write permissions:
chmod 777 es_data
Create the logstash.yml configuration file and set the ES connection address:
http.host: "0.0.0.0"
xpack.monitoring.elasticsearch.hosts: [ "http://192.168.30.30:9200" ]
Create the logstash.conf configuration file and define log collection rules
input {
  kafka {
    bootstrap_servers => "192.168.30.30:9092"
    topics => ["user_logs"]
  }
}
filter {
}
output {
  elasticsearch {
    hosts => ["192.168.30.30:9200"]
    index => "user_logs"
  }
}
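What this pipeline does can be approximated in a few lines of Python (a hypothetical stand-in for illustration, not Logstash's actual implementation): each Kafka message becomes an event, the empty filter block passes it through unchanged, and the output stage targets the user_logs index.

```python
# Hypothetical stand-in for the pipeline above (not Logstash itself):
# each Kafka message becomes an event, the empty filter {} block passes
# it through unchanged, and the output stage targets the user_logs index.
def run_pipeline(kafka_messages):
    indexed = []
    for raw in kafka_messages:
        event = {"message": raw}  # the kafka input wraps the raw payload
        # filter {} -- no transformations configured
        indexed.append({"_index": "user_logs", "_source": event})
    return indexed

docs = run_pipeline(['{"request_url": "/test"}'])
```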
Start service
In the docker-compose.yml file directory, run the following command to start the service:
docker-compose up -d
After all containers are started, open http://192.168.30.30:5601/ to access Kibana and start visualizing and querying the log data.
Simulate sending log messages
Create a log queue plus a worker thread: callers push log messages onto the queue, and the thread takes them off and sends them to MQ asynchronously.
Log sending queue
@Component
public class LogDeque {

    /**
     * Local queue
     */
    private static LinkedBlockingDeque<String> logMsgs = new LinkedBlockingDeque<>();

    @Autowired
    private KafkaTemplate<String, Object> kafkaTemplate;

    public void log(String msg) {
        logMsgs.offer(msg);
    }

    /**
     * Start the worker thread only after dependency injection has finished,
     * so kafkaTemplate is guaranteed to be set before the first send.
     */
    @PostConstruct
    public void init() {
        new LogThread().start();
    }

    /**
     * Worker thread: takes log messages from the queue and sends them to MQ asynchronously
     */
    class LogThread extends Thread {
        @Override
        public void run() {
            while (true) {
                String msgLog = logMsgs.poll();
                if (!StringUtils.isEmpty(msgLog)) {
                    // send the message
                    kafkaTemplate.send("user_logs", msgLog);
                }
                // sleep briefly to avoid spinning the CPU when the queue is empty
                try {
                    Thread.sleep(200);
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
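The same queue-plus-worker pattern, sketched in Python with the Kafka send stubbed out (names are illustrative). A blocking take replaces the poll-and-sleep loop of the Java version, which removes the 200 ms delivery latency and the idle wake-ups.

```python
import queue
import threading

class LogDeque:
    """Sketch of the LogDeque above: callers enqueue log lines, a
    background thread drains the queue and hands each line to a send
    function (kafkaTemplate.send in the real code, stubbed here)."""

    def __init__(self, send):
        self._q = queue.Queue()
        self._send = send
        threading.Thread(target=self._drain, daemon=True).start()

    def log(self, msg):
        self._q.put(msg)

    def flush(self):
        self._q.join()  # wait until every queued message has been sent

    def _drain(self):
        while True:
            msg = self._q.get()           # blocks instead of poll + sleep
            self._send("user_logs", msg)  # topic name from the article
            self._q.task_done()

sent = []
dq = LogDeque(lambda topic, msg: sent.append((topic, msg)))
dq.log("hello")
dq.flush()
```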
Log aspect
@Aspect
@Component
@Slf4j
public class AopLogAspect {

    @Autowired
    private LogDeque logDeque;

    /**
     * Declare a pointcut with an execution expression
     */
    @Pointcut("execution(* cn.ybzy.demo.controller.*.*(..))")
    private void logAspect() {
    }

    /**
     * Log request details before the target method executes
     *
     * @param joinPoint
     */
    @Before(value = "logAspect()")
    public void methodBefore(JoinPoint joinPoint) {
        ServletRequestAttributes requestAttributes = (ServletRequestAttributes) RequestContextHolder.getRequestAttributes();
        HttpServletRequest request = requestAttributes.getRequest();
        JSONObject jsonObject = new JSONObject();
        // set the date format
        SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        jsonObject.put("request_time", df.format(new Date()));
        jsonObject.put("request_url", request.getRequestURL().toString());
        jsonObject.put("request_ip", request.getRemoteAddr());
        jsonObject.put("request_method", request.getMethod());
        jsonObject.put("request_args", Arrays.toString(joinPoint.getArgs()));
        // deliver the log message to MQ
        String logMsg = jsonObject.toJSONString();
        log.info("<AOP log ===> message delivered to MQ: {}>", logMsg);
        // enqueue the message
        logDeque.log(logMsg);
    }
}
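For reference, the JSON payload assembled by methodBefore() can be reproduced in a short Python sketch (field names are taken from the aspect above; the function itself is hypothetical):

```python
import json
from datetime import datetime

# Hypothetical mirror of the payload methodBefore() assembles before
# queueing it for Kafka; field names come from the aspect above.
def build_log_entry(url, ip, method, args):
    return json.dumps({
        "request_time": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
        "request_url": url,
        "request_ip": ip,
        "request_method": method,
        "request_args": str(args),
    })

entry = json.loads(build_log_entry(
    "http://localhost:8888/test", "127.0.0.1", "GET", []))
```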
Configure application.yaml
server:
  port: 8888
spring:
  datasource:
    driver-class-name: com.mysql.jdbc.Driver
    url: jdbc:mysql://127.0.0.1:3306/demo?useUnicode=true&characterEncoding=utf-8&useSSL=false&allowMultiQueries=true&serverTimezone=UTC
    username: root
    password: 123456
  application:
    # service name
    name: elkk
  jackson:
    date-format: yyyy-MM-dd HH:mm:ss
  kafka:
    bootstrap-servers: 192.168.30.30:9092 # Kafka server address; for a cluster, separate multiple addresses with commas
    producer:
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: org.apache.kafka.common.serialization.StringSerializer
    consumer:
      group-id: default_consumer_group # consumer group ID
      enable-auto-commit: true
      auto-commit-interval: 1000
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      value-deserializer: org.apache.kafka.common.serialization.StringDeserializer
Send log message
@Slf4j
@RestController
public class TestController {

    @RequestMapping("/test")
    public String test() {
        return "OK";
    }
}
Using Kibana
To use Kibana, you need to tell it which Elasticsearch indexes to explore by configuring one or more index patterns. The index pattern is the prerequisite for Kibana visualization: it tells Kibana which indexes to use as data for visual display.
Create an index pattern
An index pattern identifies one or more Elasticsearch indexes that you want to explore through Kibana. Kibana looks for index names that match the specified pattern. An asterisk (*) in a pattern matches zero or more characters.
Find Management in the left menu, then click Index Patterns --> Create index pattern. Enter the name of the index pattern; Kibana will automatically display the matching indexes. Click Next.
In Configure settings, select the time field of the index. This field makes it easy to filter data by time. Select @timestamp from the drop-down menu here and click Create index pattern.
Search data with Discover
Click Discover to view log data and search the logs.
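Under the hood, Discover issues a search against the selected index pattern, filtered on the time field chosen when the index pattern was created. A simplified sketch of such a request body (values illustrative):

```python
# A simplified version of the search Discover effectively issues: match
# all documents in the index pattern, restricted to a window on the
# @timestamp field chosen when the index pattern was created.
def discover_query(gte, lte):
    return {
        "query": {
            "bool": {
                "filter": [
                    {"range": {"@timestamp": {"gte": gte, "lte": lte}}}
                ]
            }
        },
        "sort": [{"@timestamp": {"order": "desc"}}],  # newest first
    }

body = discover_query("now-15m", "now")
```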
Visualize data
Kibana comes with many visual components to facilitate visual display of aggregated results.
Select Visualize from the left menu, then click the + button on the right.
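Each Visualize chart is backed by an Elasticsearch aggregation. As an illustration, a terms aggregation counting log entries per HTTP method might be shaped like this (request_method comes from the aspect's log JSON; the .keyword suffix assumes ES default dynamic mapping for string fields):

```python
# A terms aggregation counting log entries per HTTP method -- the kind
# of request a Visualize bar chart sends. request_method comes from the
# aspect's log JSON; .keyword assumes default dynamic mapping.
def method_count_agg():
    return {
        "size": 0,  # no raw hits, aggregation buckets only
        "aggs": {
            "methods": {
                "terms": {"field": "request_method.keyword"}
            }
        },
    }

agg = method_count_agg()
```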
ELK+RabbitMQ
After building the ELK+Kafka log collection system, you can build an ELK+RabbitMQ log collection system in much the same way. The implementations are similar; the following is for reference:
Send log message
@Autowired
private RabbitTemplate rabbitTemplate;

@Test
public void logs() {
    // JUnit test methods cannot take parameters; use a sample payload instead
    String logMsg = "test log message";
    for (int i = 0; i < 500000; i++) {
        rabbitTemplate.convertAndSend("elk_logs_exchange", "user_logs", logMsg);
        try {
            Thread.sleep(500);
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }
}
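Note that the routing key passed to convertAndSend must match the key the user_logs queue is bound with, otherwise the exchange drops the message. A minimal in-memory Python sketch of direct-exchange routing (purely illustrative, not RabbitMQ's implementation):

```python
# Minimal in-memory sketch of direct-exchange routing: a queue receives
# a message only when the publish routing key exactly matches the key
# it was bound with -- which is why the producer's routing key and the
# logstash `key` setting must agree.
class DirectExchange:
    def __init__(self):
        self.bindings = {}  # routing key -> bound queues (message lists)

    def bind(self, key, q):
        self.bindings.setdefault(key, []).append(q)

    def publish(self, key, msg):
        for q in self.bindings.get(key, []):
            q.append(msg)

user_logs_queue = []
ex = DirectExchange()
ex.bind("user_logs", user_logs_queue)

ex.publish("user_logs", "hit")   # matching key: delivered
ex.publish("user.logs", "miss")  # mismatched key: dropped
```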
Configure log collection rules
Create the logstash.conf configuration file and define log collection rules
input {
  rabbitmq {
    host => "192.168.30.30"
    port => 5672
    user => "work"
    password => "12345678"
    vhost => "/"
    queue => "user_logs"
    durable => true
    exchange => "elk_logs_exchange"
    key => "user_logs"
    codec => "json"
  }
}
output {
  elasticsearch {
    hosts => ["192.168.30.30:9200"]
    index => "base_logs"
  }
}