Synchronizing data from Kafka to ES (nginx → filebeat → kafka → ES)

Kafka: a high-throughput, highly scalable distributed message queue service. It is widely used in scenarios such as log collection, monitoring data aggregation, streaming data processing, and online/offline analysis, and is an indispensable component in these pipelines.

Elasticsearch: a search and analysis engine used for database acceleration, data analysis, information retrieval, intelligent operations monitoring, and similar scenarios. It is well suited to time-series workloads with high write TPS, large fluctuations in write traffic, and low search QPS, such as log search and analysis, metric monitoring, and IoT device data collection and monitoring.

===========================================

Install and configure nginx  
Step 1 Install nginx: yum -y install nginx
Step 2 Modify the log format in the nginx configuration file:
vim /etc/nginx/nginx.conf
Set the log printing format as follows:
log_format main '[\"$remote_addr\",\"$remote_user\",\"$time_iso8601\",\"$request\",'
                '\"$status\",\"$body_bytes_sent\",\"$http_referer\",'
                '\"$http_user_agent\",\"$http_x_forwarded_for\"]';

user nginx;
worker_processes auto;
error_log /var/log/nginx/error.log;
pid /run/nginx.pid;

include /usr/share/nginx/modules/*.conf;

events {
    worker_connections 1024;
}

http {
    log_format main '[\"$remote_addr\",\"$remote_user\",\"$time_iso8601\",\"$request\",'
                    '\"$status\",\"$body_bytes_sent\",\"$http_referer\",'
                    '\"$http_user_agent\",\"$http_x_forwarded_for\"]';
    access_log /var/log/nginx/access.log main;
}

Step 3 Start nginx, and check that it is running:
/usr/sbin/nginx
ps -ef | grep nginx
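To confirm the log format before wiring up filebeat, you can send one request and inspect the newest log line (a quick sanity check, assuming nginx serves the default page on port 80):

curl -s http://localhost/ > /dev/null
tail -n 1 /var/log/nginx/access.log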
============================================

Install and configure filebeat
Step 1 Install filebeat
wget https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.10.0-x86_64.rpm
rpm -ivh filebeat-7.10.0-x86_64.rpm
Edit the configuration file (vim /etc/filebeat/filebeat.yml) to ship the nginx logs to Kafka.
Step 2 Modify the filebeat.inputs configuration item.
Change false on line 24 to true, and set the full path of the nginx log, /var/log/nginx/access.log, on line 28.

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/nginx/access.log
Step 3 Modify the Elasticsearch Output configuration item.
Comment out lines 176 to 186.
Step 4 Confirm the Logstash Output configuration items.
Make sure that lines 189 to 201 are all commented out.
Step 5 Insert the Kafka output configuration items:
# ---------------------------- Kafka Output ----------------------------
output.kafka:
  hosts: ["192.168.30.209:9092","192.168.30.210:9092","192.168.30.208:9092"]
  topic: nginx_log
  version: 0.10.2
Step 6 Modify the processors configuration item.
Comment out the original entries and insert:
processors:
  - drop_fields:
      fields: ["ecs","input","host","log","agent"]
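With those fields dropped, each record that filebeat publishes to Kafka is a small JSON document whose message field carries the raw nginx log line. An illustrative record (values made up, remaining metadata omitted) looks roughly like:

{"@timestamp":"2021-12-20T02:00:00.000Z","message":"[\"192.168.30.1\",\"-\",\"2021-12-20T10:00:00+08:00\",\"GET / HTTP/1.1\",\"200\",\"612\",\"-\",\"curl/7.61.1\",\"-\"]"}

This is the shape the Flink SQL job below relies on when it reads $.message.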
Step 7 Start filebeat:
service filebeat start
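Before relying on the pipeline, you can validate the configuration and the connection to Kafka with filebeat's built-in test subcommands:

filebeat test config -c /etc/filebeat/filebeat.yml
filebeat test output -c /etc/filebeat/filebeat.yml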
===========================================

Install the load-testing tool
Step 1 Install httpd-tools.
yum -y install httpd-tools
Step 2 Test access to nginx:
ab -n100 -c10 http://localhost/
Step 3 Log in to the Kafka console and check whether the messages have been synchronized to Kafka.
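One way to check is with the console consumer that ships with Kafka (the installation path below is an assumption; adjust it to your deployment):

/opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server 192.168.30.209:9092 --topic nginx_log --from-beginning --max-messages 5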

==========================================

Create an index template for Elasticsearch
Step 1 Create an index template.
Name the template nginx-log-template.
Set the index patterns to nginx-* and nginx*.
Settings are configured as follows:
{ "index.mapping.total_fields.limit": "3000", "index.translog.flush_threshold_size": "2gb", "index.number_of_replicas": "1" , "index.translog.sync_interval": "100s", "index.refresh_interval": "60s", "index.translog.durability": "async", "index.merge.policy.segments_per_tier": "10", " index.routing.allocation.total_shards_per_node": "200", "index.merge.policy.max_merged_segment": "512m", "index.










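If you prefer the REST API to the Kibana console, a sketch using the legacy template endpoint of Elasticsearch 7.x looks like this (the host, the credentials, and the reduced settings block are placeholders/assumptions to fill in):

curl -u elastic:*** -H 'Content-Type: application/json' -X PUT \
  'http://<es-host>:9200/_template/nginx-log-template' -d '
{
  "index_patterns": ["nginx-*", "nginx*"],
  "settings": {
    "index.number_of_replicas": "1",
    "index.refresh_interval": "60s",
    "index.mapping.total_fields.limit": "3000"
  }
}'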
==========================================

Parse the Kafka data into ES with Flink
-- Source table: read the raw filebeat JSON records from the Kafka topic
CREATE TEMPORARY TABLE `kafka_table` (
  msg VARCHAR
) WITH (
  'connector' = 'kafka',
  'topic' = 'nginx_log',
  'properties.bootstrap.servers' = 'xxxxx:9092,xxxxx:9092,xxxxxx:9092',
  'properties.group.id' = 'point2',
  'format' = 'raw'
);

-- Sink table: write each row into a daily nginx-yyyy.MM.dd index
CREATE TEMPORARY TABLE es_sink (
  client VARCHAR,
  users VARCHAR,
  access_time TIMESTAMP,
  request VARCHAR,
  status INT,
  body_bytes_sent INT,
  http_referer VARCHAR,
  http_user_agent VARCHAR,
  http_x_forwarded_for VARCHAR,
  `@timestamp` TIMESTAMP
) WITH (
  'connector' = 'elasticsearch-7',
  'hosts' = 'http://xxxxxxxxxxx:9200',
  'index' = 'nginx-{access_time|yyyy.MM.dd}',
  'username' = 'elastic',
  'password' = '***'
);

-- The inner JSON_VALUE extracts the message field (the raw nginx log line,
-- itself a JSON array); the outer JSON_VALUE indexes into that array.
-- REGEXP_REPLACE turns the ISO-8601 'T' separator into a space so the
-- timestamp can be cast.
INSERT INTO es_sink
SELECT
  JSON_VALUE(JSON_VALUE(msg, '$.message'), '$[0]'),
  JSON_VALUE(JSON_VALUE(msg, '$.message'), '$[1]'),
  CAST(DATE_FORMAT(REGEXP_REPLACE(JSON_VALUE(JSON_VALUE(msg, '$.message'), '$[2]'), 'T', ' '), 'yyyy-MM-dd HH:mm:ss') AS TIMESTAMP),
  JSON_VALUE(JSON_VALUE(msg, '$.message'), '$[3]'),
  CAST(JSON_VALUE(JSON_VALUE(msg, '$.message'), '$[4]') AS INT),
  CAST(JSON_VALUE(JSON_VALUE(msg, '$.message'), '$[5]') AS INT),
  JSON_VALUE(JSON_VALUE(msg, '$.message'), '$[6]'),
  JSON_VALUE(JSON_VALUE(msg, '$.message'), '$[7]'),
  JSON_VALUE(JSON_VALUE(msg, '$.message'), '$[8]'),
  CAST(DATE_FORMAT(REGEXP_REPLACE(JSON_VALUE(JSON_VALUE(msg, '$.message'), '$[2]'), 'T', ' '), 'yyyy-MM-dd HH:mm:ss') AS TIMESTAMP)
FROM kafka_table;
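Given the illustrative Kafka record shown earlier, this job would emit a row along the lines of client='192.168.30.1', users='-', access_time=2021-12-20 10:00:00, request='GET / HTTP/1.1', status=200, body_bytes_sent=612, and write it to the daily index nginx-2021.12.20.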

Origin: blog.csdn.net/victory0508/article/details/122040072