Spark Project in Practice: Data Service Analysis (Part 7)

Surveillance Analysis of Suspicious Targets

The previous section performed an offline analysis of "suspicious targets", determined which of them were likely suspects, and wrote those records into a "suspect result table". This section approaches the problem from the real-time angle: the data in that result table is matched against the live data stream to trace the probable movements of each suspicious target.

Code Snippets

  1. Read the suspect data from the database
        Dataset<Row> dataFromSql = spark.read()
                .format("jdbc")
                .option("url","jdbc:mysql://bigdata01:3306/test?characterEncoding=UTF-8")
                .option("dbtable","t_verify_result")
                .option("user","root")
                .option("password","123456")
                .load();
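
        // The join in step 3 below expects a JavaPairRDD named pairSQLRDD keyed by the
        // ID-card number. A minimal sketch of how it could be derived from dataFromSql,
        // assuming the table exposes an RSFZ column (an assumption based on the field
        // names used in step 4):
        JavaPairRDD<String, String> pairSQLRDD = dataFromSql.toJavaRDD()
                .mapToPair(new PairFunction<Row, String, String>() {
                    @Override
                    public Tuple2<String, String> call(Row row) throws Exception {
                        String rsfz = row.getAs("RSFZ");
                        // Key: ID-card number; the value is unused by the join output, so the key is reused
                        return new Tuple2<String, String>(rsfz, rsfz);
                    }
                });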
  2. Read the real-time data stream from Kafka
        // Fetch data from Kafka
        String broker = "bigdata01:9092";
        String topics = "peopledata";

        JavaStreamingContext scc = new JavaStreamingContext(jsc, Durations.seconds(5));

        Collection<String> toplist = new HashSet<String>(Arrays.asList(topics.split(",")));
        Map<String, Object> kafkaParams = new HashMap<String, Object>();

        kafkaParams.put("bootstrap.servers", broker);
        kafkaParams.put("group.id", "vcs_group");
        kafkaParams.put("auto.offset.reset", "latest");
        kafkaParams.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        kafkaParams.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Map<TopicPartition, Long> offsets = new HashMap<TopicPartition, Long>();

        JavaInputDStream<ConsumerRecord<String, String>> kafkaDStream = KafkaUtils.createDirectStream(
                scc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.Subscribe(toplist, kafkaParams, offsets)
        );
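
        // Step 3 below transforms a pairDStream keyed by the ID-card number. A minimal
        // sketch of how it could be built from kafkaDStream, assuming each message value
        // is a comma-separated record of the form RSFZ,GRXB,PSQK,TGSJ,SBBH (matching the
        // parsing in step 4):
        JavaPairDStream<String, String> pairDStream = kafkaDStream.mapToPair(
                new PairFunction<ConsumerRecord<String, String>, String, String>() {
                    @Override
                    public Tuple2<String, String> call(ConsumerRecord<String, String> record) throws Exception {
                        String value = record.value();
                        int firstComma = value.indexOf(',');
                        // Key: RSFZ; value: the remaining fields, still comma-separated
                        return new Tuple2<String, String>(value.substring(0, firstComma),
                                value.substring(firstComma + 1));
                    }
                });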
  3. Get the records that exist in both datasets (join)
        JavaDStream<String> resultDStream = pairDStream.transform(new Function<JavaPairRDD<String, String>, JavaRDD<String>>() {
            @Override
            public JavaRDD<String> call(JavaPairRDD<String, String> v1) throws Exception {

                JavaRDD<String> resultRDD = v1.join(pairSQLRDD)
                        .map(new Function<Tuple2<String, Tuple2<String, String>>, String>() {
                            @Override
                            public String call(Tuple2<String, Tuple2<String, String>> t1) throws Exception {
                                return t1._1 + "," + t1._2._1;
                            }
                        });
                return resultRDD;
            }
        });
  4. Save the results to the database
        resultDStream.foreachRDD(new VoidFunction<JavaRDD<String>>() {
            @Override
            public void call(JavaRDD<String> rdd) throws Exception {
//                rdd.foreach(new VoidFunction<String>() {
//                    @Override
//                    public void call(String s) throws Exception {
//                        System.out.println("s= "+s);
//                    }
//                });

                rdd.foreachPartition(new VoidFunction<Iterator<String>>() {
                    @Override
                    public void call(Iterator<String> iter) throws Exception {

                        Connection conn = JdbcUtils.getConnection();
                        conn.setAutoCommit(false);

                        String sql = "insert into t_streaming_result (jsbh,rsfz,grxb,psqk,tgsj,sbbh,cjsj) values (?,?,?,?,?,?,?)";
                        // Prepare the statement once per partition and reuse it for every record
                        PreparedStatement pstmt = conn.prepareStatement(sql);
                        // TGSJ is a 24-hour timestamp, so the pattern must use HH rather than hh
                        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

                        while (iter.hasNext()) {
                            String data = iter.next();
                            String[] fields = data.split(",");
                            String rsfz = fields[0];
                            String grxb = fields[1];
                            String psqk = fields[2];
                            String tgsj = fields[3];
                            String sbbh = fields[4];

                            Date tgsjDate = sdf.parse(tgsj);
                            long jsbh = System.currentTimeMillis();
                            pstmt.setString(1,jsbh+"_streaming");
                            pstmt.setString(2,rsfz);
                            pstmt.setString(3,grxb);
                            pstmt.setString(4,psqk);
                            pstmt.setTimestamp(5,new Timestamp(tgsjDate.getTime()));
                            pstmt.setString(6, sbbh);
                            pstmt.setTimestamp(7,new Timestamp(jsbh));
                            pstmt.executeUpdate();
                        }
                        conn.commit();
                        JdbcUtils.free(pstmt,conn);
                    }
                });
            }
        });
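
None of the snippets above actually starts the streaming context; once the output operation in step 4 is wired up, the driver still needs to call:

        scc.start();
        scc.awaitTermination();

The snippets also rely on a JdbcUtils helper (getConnection / free) whose implementation is not shown here. The following is only a minimal sketch, assuming the helper wraps DriverManager directly; the project's own class may differ, for example by pooling connections:

        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.PreparedStatement;

        public class JdbcUtils {
            private static final String URL =
                    "jdbc:mysql://bigdata01:3306/test?characterEncoding=UTF-8";

            // Open a new connection per call; a pooled DataSource would be preferable in production
            public static Connection getConnection() throws Exception {
                Class.forName("com.mysql.jdbc.Driver");
                return DriverManager.getConnection(URL, "root", "123456");
            }

            // Close the statement and connection, swallowing close-time exceptions
            public static void free(PreparedStatement pstmt, Connection conn) {
                try {
                    if (pstmt != null) pstmt.close();
                } catch (Exception ignored) {
                }
                try {
                    if (conn != null) conn.close();
                } catch (Exception ignored) {
                }
            }
        }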

Hands-On Steps

The computation in this section builds on the results of the previous section, so there is nothing special to do on the database side: just make sure the previous section's results have been saved to the database.

With that in place, here are the hands-on steps for this section.

  1. Create the result table t_streaming_result to hold the final output:
CREATE TABLE t_streaming_result(
JSBH text,
RSFZ text,
GRXB text,
PSQK text,
TGSJ text,
SBBH text,
CJSJ text
)charset utf8 collate utf8_general_ci;
  2. Write the KafkaProducer, reusing the same producer code as in the earlier sections, and copy the complete jar to the virtual-machine cluster. Start the producer only once the Spark job (step 3) is running (a minimal producer sketch is shown after these steps):
java -cp /root/monitoranalysis-1.0-SNAPSHOT-jar-with-dependencies.jar com.monitor.produce.PeopleProducer
  3. Upload the Spark program jar and submit it:
./spark-submit \
--master spark://bigdata01:7077 \
--class com.monitor.compare.streaming.VerifyComputeSteaming \
--deploy-mode client \
/root/monitoranalysis-1.0-SNAPSHOT.jar
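
For reference, step 2's producer could look roughly like the sketch below. This is only an illustration of the message format assumed by the streaming code (one comma-separated record per sighting: RSFZ,GRXB,PSQK,TGSJ,SBBH, with TGSJ as a yyyy-MM-dd HH:mm:ss timestamp); it is not the actual PeopleProducer from the repository.

        import java.util.Properties;

        import org.apache.kafka.clients.producer.KafkaProducer;
        import org.apache.kafka.clients.producer.ProducerRecord;

        public class PeopleProducerSketch {
            public static void main(String[] args) {
                Properties props = new Properties();
                props.put("bootstrap.servers", "bigdata01:9092");
                props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
                props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

                KafkaProducer<String, String> producer = new KafkaProducer<String, String>(props);
                // Sample record only: RSFZ,GRXB,PSQK,TGSJ,SBBH
                String record = "110101199001011234,1,normal,2019-05-01 12:00:00,SBBH0001";
                producer.send(new ProducerRecord<String, String>("peopledata", record));
                producer.close();
            }
        }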

With the code and these operations combined, you get the desired computation results.

Example Code

https://github.com/yy1028500451/MonitorAnalysis/tree/master/src/main/java/com/monitor/compare/streaming

Originally published at blog.csdn.net/dec_sun/article/details/89766371