This was my first time touching Kafka and Spark. I didn't lay any theoretical groundwork beforehand, just dove in and hit a few pitfalls; doing it over now, though, the process is quick and simple.
One note up front: my environment is Ubuntu.
1. Install Kafka (5 minutes)
As usual, follow the official quickstart and start the Kafka service the way it describes. This step takes only about 5 minutes.
https://kafka.apache.org/quickstart
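For reference, the quickstart boils down to roughly the following commands, run from the Kafka directory (exact flags vary by Kafka version; on older releases topic creation uses --zookeeper localhost:2181 instead of --bootstrap-server). The topic name messages matches the Fluent-bit config in step 3.
bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic messages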
2. Install Spark (5 minutes)
Download the latest package from the official downloads page https://spark.apache.org/downloads.html
wget http://mirrors.tuna.tsinghua.edu.cn/apache/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.7.tgz
tar xvf spark-2.4.3-bin-hadoop2.7.tgz
Start Spark (the shell script lives under bin/ in the extracted directory):
./bin/spark-shell
3. Install Fluent-bit and configure it to ship logs to both Kafka and ES (5 minutes)
Again, follow the official docs; this only takes about 3 minutes: https://fluentbit.io/documentation/0.13/installation/ubuntu.html
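The gist, once the Fluent-bit apt repository and GPG key have been added as that page describes (I'm paraphrasing the docs here, so check them for the exact repo lines), is just:
sudo apt-get update
sudo apt-get install td-agent-bit
sudo service td-agent-bit start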
Once installed, edit the td-agent-bit.conf file under /etc/td-agent-bit. In my setup the input is the forward listener, and the collected logs are forwarded to both Kafka and Elasticsearch:
[INPUT]
    Name   forward
    Listen 0.0.0.0
    Port   24224

[OUTPUT]
    Name    kafka
    Match   *
    Brokers localhost:9092
    Topics  messages

[OUTPUT]
    Name  es
    Match *
    Host  localhost
    Port  9200
    Index fluentbit-gw
    Type  docker
After editing, restart the service:
sudo service td-agent-bit restart
and check the service status:
sudo service td-agent-bit status
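To confirm that records are actually reaching Kafka, the bundled console consumer can tail the topic (run from the Kafka directory; the flag is --bootstrap-server on recent versions):
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic messages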
4. Example: start a Docker container whose logs are collected by Fluent-bit
Here I start Elasticsearch itself as the Docker container used to verify the pipeline.
docker run -p 9200:9200 -p 9300:9300 -t -i --log-driver=fluentd --log-opt fluentd-address=localhost:24224 -e "discovery.type=single-node" --name elasticsearch docker.elastic.co/elasticsearch/elasticsearch:6.7.1
Note: if your Fluent-bit instance runs locally, you don't need the --log-opt fluentd-address=host:port option; --log-driver=fluentd alone is enough. If Fluent-bit runs on a remote host, you do need to specify it.
With that, log messages are shipped to both Kafka and ES. To browse the data through Kibana:
docker run --link elasticsearch:elasticsearch -p 5601:5601 -e "elasticsearch.hosts=http://elasticsearch:9200" --name kibana docker.elastic.co/kibana/kibana:6.7.1
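Once the container is up, Kibana should be reachable at http://localhost:5601 (the port mapped above).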
5. Consume and monitor the Kafka log data with Spark in Java
There's a demo written by a developer abroad that I adapted to my own needs:
https://github.com/eugenp/tutorials/tree/master/apache-spark
Pull the code down and add your own class, LogMonitor, to monitor (i.e. print out) the error log messages.
package com.baeldung.data.pipeline;

import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.log4j.Level;
import org.apache.log4j.Logger;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;
import scala.Tuple2;

public class LogMonitor {
    public static void main(String[] args) throws InterruptedException {
        // Silence Spark's own logging so the matched log lines stand out
        Logger.getLogger("org").setLevel(Level.OFF);
        Logger.getLogger("akka").setLevel(Level.OFF);
        // Kafka consumer settings; "messages" below is the topic Fluent-bit writes to
        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "localhost:9092");
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "use_a_separate_group_id_for_each_stream");
        kafkaParams.put("auto.offset.reset", "latest");
        kafkaParams.put("enable.auto.commit", false);
        Collection<String> topics = Arrays.asList("messages");
        SparkConf sparkConf = new SparkConf();
        sparkConf.setMaster("local[2]");
        sparkConf.setAppName("WordCountingApp");
        // One-second micro-batches
        JavaStreamingContext streamingContext = new JavaStreamingContext(sparkConf, Durations.seconds(1));
        JavaInputDStream<ConsumerRecord<String, String>> messages = KafkaUtils.createDirectStream(streamingContext,
                LocationStrategies.PreferConsistent(), ConsumerStrategies.<String, String> Subscribe(topics, kafkaParams));
        JavaPairDStream<String, String> results = messages.mapToPair(record -> new Tuple2<>(record.key(), record.value()));
        JavaDStream<String> lines = results.map(tuple2 -> tuple2._2());
        // Keep only lines containing "error" or "warn" (case-insensitive) and print them
        JavaDStream<String> errorLines = lines.filter(x -> x.toLowerCase().contains("error") || x.toLowerCase().contains("warn"));
        errorLines.print();
        streamingContext.start();
        streamingContext.awaitTermination();
    }
}
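The tutorial project builds with Maven; running something like the following in the apache-spark module should produce the jar-with-dependencies used below (this assumes the assembly-plugin setup in that repo, so treat it as a sketch):
mvn clean package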
With the jar built, execute the job from the Spark installation's bin directory (spark-2.4.3-bin-hadoop2.7/bin):
./spark-submit --class com.baeldung.data.pipeline.LogMonitor --master local[2] /home/ubuntu/apache-spark-1.0-SNAPSHOT-jar-with-dependencies.jar
You will then see warn and error messages captured and printed as they arrive. The example below catches a warning message.
ubuntu@ubuntu:~/spark-2.4.3-bin-hadoop2.7/bin$ ./spark-submit --class com.baeldung.data.pipeline.LogMonitor --master local[2] /home/ubuntu/apache-spark-1.0-SNAPSHOT-jar-with-dependencies.jar
19/06/05 02:15:12 WARN Utils: Your hostname, ubuntu resolves to a loopback address: 127.0.1.1; using 139.24.217.54 instead (on interface enp0s17)
19/06/05 02:15:12 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
19/06/05 02:15:13 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
-------------------------------------------
Time: 1559726123000 ms
-------------------------------------------
-------------------------------------------
Time: 1559726124000 ms
-------------------------------------------
{"@timestamp":1559724977.000000, "container_id":"ba05eb706392c888eecaeee5f2612d8b3562a4bdf71e486c831f593aa0ab4601", "container_name":"/elasticsearch", "source":"stdout", "log":"OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.\r"}
6. Hook up DingTalk (or another channel) for alerting
Here is a Java demo snippet from the web for pushing alerts to DingTalk, for reference:
// Requires Apache HttpClient (org.apache.httpcomponents:httpclient); imports:
// java.io.IOException, org.apache.http.{HttpResponse, HttpStatus},
// org.apache.http.client.HttpClient, org.apache.http.client.methods.HttpPost,
// org.apache.http.entity.StringEntity, org.apache.http.impl.client.HttpClients,
// org.apache.http.util.EntityUtils
public static void dingtalk() {
    // Webhook URL of your DingTalk robot, including its access token
    String WEBHOOK_TOKEN = "https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxx";
    HttpClient httpclient = HttpClients.createDefault();
    HttpPost httppost = new HttpPost(WEBHOOK_TOKEN);
    httppost.addHeader("Content-Type", "application/json; charset=utf-8");
    // DingTalk expects a JSON payload; this one is a plain text message
    String textMsg = "{ \"msgtype\": \"text\", \"text\": {\"content\": \"this is msg\"}}";
    StringEntity se = new StringEntity(textMsg, "utf-8");
    httppost.setEntity(se);
    HttpResponse response = null;
    try {
        response = httpclient.execute(httppost);
        if (response.getStatusLine().getStatusCode() == HttpStatus.SC_OK) {
            String result = EntityUtils.toString(response.getEntity(), "utf-8");
            System.out.println(result);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
This snippet is only a bare-bones example and needs to be reworked for real use.
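For instance (my own sketch, not part of the original demo): if dingtalk() is adapted to take the message text as a parameter, the alert can be wired into LogMonitor by replacing errorLines.print() with:
// Hypothetical wiring: dingtalk(String msg) is an assumed variant of the method above
errorLines.foreachRDD(rdd -> rdd.foreach(line -> dingtalk(line)));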
And that's the basic process.
I still need to go back and fill in the theory later.
Reposted from: https://www.jianshu.com/p/e247934c2a81