Suspect Object Surveillance Analysis
The previous section analyzed the "suspect objects" offline, determined which of them were likely suspects, and wrote those records into the suspect result table. This section adds the real-time angle: it reads the records back from the suspect result table, analyzes them against the live data stream, and infers the probable trajectories of those suspect objects.
Code Snippets
- Fetch the suspect records from the database
// Load the suspect records written by the previous section from MySQL
Dataset<Row> dataFromSql = spark.read()
        .format("jdbc")
        .option("url", "jdbc:mysql://bigdata01:3306/test?characterEncoding=UTF-8")
        .option("dbtable", "t_verify_result")
        .option("user", "root")
        .option("password", "123456")
        .load();
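The comparison step later in this section expects this table as a JavaPairRDD<String, String> named pairSQLRDD, keyed by the ID number. A minimal sketch of that conversion, assuming t_verify_result carries the ID in an RSFZ column (the database-side value never appears in the join output, so an empty string is enough):
// Sketch: key the suspect records by ID number for the join below.
// Assumption: the RSFZ column of t_verify_result holds the ID.
JavaPairRDD<String, String> pairSQLRDD = dataFromSql.javaRDD()
        .mapToPair(row -> new Tuple2<>(row.<String>getAs("RSFZ"), ""));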
- Consume the real-time data stream from Kafka
// Consume the real-time records from Kafka
String broker = "bigdata01:9092";
String topics = "peopledata";
// 5-second micro-batches on top of the existing JavaSparkContext jsc
JavaStreamingContext scc = new JavaStreamingContext(jsc, Durations.seconds(5));
Collection<String> toplist = new HashSet<>(Arrays.asList(topics.split(",")));
Map<String, Object> kafkaParams = new HashMap<>();
kafkaParams.put("bootstrap.servers", broker);
kafkaParams.put("group.id", "vcs_group");
kafkaParams.put("auto.offset.reset", "latest");
kafkaParams.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
kafkaParams.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
// No stored offsets: start from the position given by auto.offset.reset
Map<TopicPartition, Long> offsets = new HashMap<>();
JavaInputDStream<ConsumerRecord<String, String>> kafkaDStream = KafkaUtils.createDirectStream(
        scc,
        LocationStrategies.PreferConsistent(),
        ConsumerStrategies.<String, String>Subscribe(toplist, kafkaParams, offsets)
);
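The comparison step below consumes the stream as a pairDStream keyed the same way as pairSQLRDD. A sketch of that mapping, assuming each Kafka message is a comma-separated record whose first field is the ID number:
// Sketch: key each Kafka record by its first field (rsfz), keep the rest as the value.
// Assumed record layout: rsfz,grxb,psqk,tgsj,sbbh
JavaPairDStream<String, String> pairDStream = kafkaDStream.mapToPair(record -> {
    String line = record.value();
    int firstComma = line.indexOf(',');
    return new Tuple2<>(line.substring(0, firstComma), line.substring(firstComma + 1));
});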
- Keep only the records that appear in both data sets
JavaDStream<String> resultDStream = pairDStream.transform(new Function<JavaPairRDD<String, String>, JavaRDD<String>>() {
    @Override
    public JavaRDD<String> call(JavaPairRDD<String, String> v1) throws Exception {
        // Inner join each stream batch with the suspect records loaded from MySQL;
        // only keys present on both sides survive
        JavaRDD<String> resultRDD = v1.join(pairSQLRDD)
                .map(new Function<Tuple2<String, Tuple2<String, String>>, String>() {
                    @Override
                    public String call(Tuple2<String, Tuple2<String, String>> t1) throws Exception {
                        // Rebuild the full record: join key plus the stream-side value
                        return t1._1 + "," + t1._2._1;
                    }
                });
        return resultRDD;
    }
});
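Since t1._2._1 is the stream-side value, each output line is the full five-field record rsfz,grxb,psqk,tgsj,sbbh that the save step below splits apart. If the project targets Java 8, the same transform can also be written more compactly with lambdas:
// Lambda form of the transform above; the behavior is identical.
JavaDStream<String> resultDStream = pairDStream.transform(rdd ->
        rdd.join(pairSQLRDD)
           .map(t -> t._1 + "," + t._2._1));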
- Save the results to the database
resultDStream.foreachRDD(new VoidFunction<JavaRDD<String>>() {
    @Override
    public void call(JavaRDD<String> rdd) throws Exception {
        rdd.foreachPartition(new VoidFunction<Iterator<String>>() {
            @Override
            public void call(Iterator<String> iter) throws Exception {
                Connection conn = JdbcUtils.getConnection();
                conn.setAutoCommit(false);
                // Prepare the statement and date parser once per partition, not once per record
                String sql = "insert into t_streaming_result (jsbh,rsfz,grxb,psqk,tgsj,sbbh,cjsj) values (?,?,?,?,?,?,?)";
                PreparedStatement pstmt = conn.prepareStatement(sql);
                // HH = 24-hour clock, matching timestamps such as 2024-01-01 18:30:00
                SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
                while (iter.hasNext()) {
                    // Record layout: rsfz,grxb,psqk,tgsj,sbbh
                    String[] fields = iter.next().split(",");
                    String rsfz = fields[0];
                    String grxb = fields[1];
                    String psqk = fields[2];
                    String tgsj = fields[3];
                    String sbbh = fields[4];
                    Date tgsjDate = sdf.parse(tgsj);
                    long jsbh = System.currentTimeMillis();
                    pstmt.setString(1, jsbh + "_streaming");
                    pstmt.setString(2, rsfz);
                    pstmt.setString(3, grxb);
                    pstmt.setString(4, psqk);
                    pstmt.setTimestamp(5, new Timestamp(tgsjDate.getTime()));
                    pstmt.setString(6, sbbh);
                    pstmt.setTimestamp(7, new Timestamp(jsbh));
                    pstmt.executeUpdate();
                }
                // Commit the whole partition in one transaction, then release resources
                conn.commit();
                JdbcUtils.free(pstmt, conn);
            }
        });
    }
});
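Two details the snippet above leaves implicit. First, the streaming context still has to be started at the end of the driver, or no batch is ever processed:
// Start the streaming job and block until it is stopped
scc.start();
scc.awaitTermination();
Second, JdbcUtils is assumed to be a small helper class of our own, not a library API. A minimal sketch, using plain DriverManager connections (a pooled DataSource would be a drop-in improvement):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class JdbcUtils {
    // Assumed connection settings, matching the JDBC options used earlier
    private static final String URL =
            "jdbc:mysql://bigdata01:3306/test?characterEncoding=UTF-8";

    public static Connection getConnection() throws SQLException {
        return DriverManager.getConnection(URL, "root", "123456");
    }

    // Close the statement, then the connection, logging any close failure
    public static void free(PreparedStatement pstmt, Connection conn) {
        try {
            if (pstmt != null) pstmt.close();
        } catch (SQLException e) {
            e.printStackTrace();
        } finally {
            try {
                if (conn != null) conn.close();
            } catch (SQLException e) {
                e.printStackTrace();
            }
        }
    }
}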
Hands-On Steps
The computation in this section builds on the results of the previous one, so nothing extra is needed on the database side: just make sure the previous section's results have been saved to the database. With that in place, the hands-on steps are as follows.
- Create the result table that stores the final computation output.
CREATE TABLE t_streaming_result (
    JSBH text,
    RSFZ text,
    GRXB text,
    PSQK text,
    TGSJ datetime,
    SBBH text,
    CJSJ datetime
) charset utf8 collate utf8_general_ci;
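The column names are pinyin abbreviations carried over from the source data. Judging from the code, RSFZ appears to be the person's ID number (the join key), GRXB the gender, TGSJ the pass-through time, SBBH the capturing device's ID, JSBH the generated record number, and CJSJ the time the row was created.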
- Write the KafkaProducer, reusing the program code from before, and copy the complete jar to the compute environment made up of the virtual machines. Start the producer only once the Spark streaming job is running:
java -cp /root/monitoranalysis-1.0-SNAPSHOT-jar-with-dependencies.jar com.monitor.produce.PeopleProducer
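For reference, a minimal sketch of what a PeopleProducer might look like. The field values below are made-up placeholders; the real producer from the earlier chapters only needs to emit comma-separated records in the rsfz,grxb,psqk,tgsj,sbbh layout the Spark job parses:
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class PeopleProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "bigdata01:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            while (true) {
                // Hypothetical sample record: rsfz,grxb,psqk,tgsj,sbbh
                String record = "110101199001011234,M,1," + sdf.format(new Date()) + ",SBBH001";
                producer.send(new ProducerRecord<>("peopledata", record));
                Thread.sleep(1000); // throttle to one record per second
            }
        }
    }
}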
- Upload the Spark application jar and submit it for execution:
./spark-submit \
--master spark://bigdata01:7077 \
--class com.monitor.compare.streaming.VerifyComputeSteaming \
--deploy-mode client \
/root/monitoranalysis-1.0-SNAPSHOT.jar
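Client deploy mode keeps the driver on the submitting machine, so the job's log output is printed directly in the terminal, which is convenient while verifying the pipeline; cluster mode is the usual choice for long-running production jobs.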
Together, the code and these steps produce the desired computation results in t_streaming_result.