Spark Project in Action: Data Service Analysis (Part 5)

Business logic processing

Companion objects

To decide whether two objects are traveling together, we can check whether they passed through several of the same places; this can be done with latitude/longitude coordinates. Alternatively, each monitoring device can be given an identifier, so that an object passing the device is captured by it (the code below takes this approach and uses the device ID, SBBH).

Approach:

  1. Group the records by the monitored object's RSFZ (ID number), collect all places the object passed within the specified time window, and sort them by passing time. The processed data looks like:
    rsfz,[(tgsj3, sbbh3), (tgsj2, sbbh2), (tgsj4, sbbh4), (tgsj5, sbbh5)]

  2. Split each object's sequence of places into segments whose length equals the number of common places required for two objects to count as companions (3 here); see the standalone sketch after this list. The data above is split into:
    sbbh3 -> sbbh2 -> sbbh4,(rsfz,[ sbbh3 tgsj3,sbbh2 tgsj2,sbbh4 tgsj4 ])
    sbbh2 -> sbbh4 -> sbbh5,(rsfz,[ sbbh2 tgsj2,sbbh4 tgsj4,sbbh5 tgsj5 ])

  3. Group the monitored objects that passed the same sequence of places:
    sbbh3 -> sbbh2 -> sbbh4,[(rsfz1,[sbbh3 tgsj3, sbbh2 tgsj2, sbbh4 tgsj4]),(rsfz2,[sbbh3 tgsj3, sbbh2 tgsj2, sbbh4 tgsj4]),(rsfz3,[sbbh3 tgsj3, sbbh2 tgsj2, sbbh4 tgsj4])]

  4. For objects that passed the same sequence of places, compare the time differences at each shared place to decide whether they satisfy the traveling-together criterion (here, within 3 minutes).
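
To make step 2 concrete, here is a minimal, standalone sketch (illustration only, not the project code; the class name SegmentDemo and the tgsj/sbbh values are placeholders taken from the example above) that slides a window of three records over one object's time-sorted trace:

import java.util.Arrays;
import java.util.List;

// Illustration only: split one object's time-sorted (tgsj, sbbh) trace into
// overlapping segments of three consecutive locations (step 2 above).
public class SegmentDemo {
    public static void main(String[] args) {
        // Each record is {tgsj, sbbh}; the values are placeholders.
        List<String[]> trace = Arrays.asList(
                new String[]{"tgsj3", "sbbh3"},
                new String[]{"tgsj2", "sbbh2"},
                new String[]{"tgsj4", "sbbh4"},
                new String[]{"tgsj5", "sbbh5"});

        for (int i = 0; i + 3 <= trace.size(); i++) {
            StringBuilder sbbhSeq = new StringBuilder();
            StringBuilder tgsjSeq = new StringBuilder();
            for (int j = i; j < i + 3; j++) {
                sbbhSeq.append(trace.get(j)[1]).append(",");
                tgsjSeq.append(trace.get(j)[0]).append(",");
            }
            // e.g. "sbbh3,sbbh2,sbbh4," paired with "tgsj3,tgsj2,tgsj4,"
            System.out.println(sbbhSeq + " -> " + tgsjSeq);
        }
    }
}

The Spark job below performs the same windowing per RSFZ inside flatMap, keyed by the resulting SBBH sequence.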

Code

package com.monitor.together;

import org.apache.commons.collections.IteratorUtils;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.api.java.function.VoidFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import scala.Tuple2;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// TimeUtils and JdbcUtils are project utility classes; see the sample code link at the end.
public class TogetherCompute {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .enableHiveSupport()
                .appName("TogetherCompute")
                .getOrCreate();

        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
        jsc.setLogLevel("ERROR");
        Dataset<Row> allData = spark.sql("select * from t_people_together");
        JavaRDD<Row> allDataRDD = allData.javaRDD();

        JavaPairRDD<String, Tuple2<String, String>> allPairRDD = allDataRDD.mapToPair(new PairFunction<Row, String, Tuple2<String, String>>() {
            @Override
            public Tuple2<String, Tuple2<String, String>> call(Row row) throws Exception {
                String rsfz = row.getAs("rsfz");
                String tgsj = row.getAs("tgsj");
                String sbbh = row.getAs("sbbh");
                return new Tuple2<String, Tuple2<String, String>>(rsfz, new Tuple2<String, String>(tgsj, sbbh));
            }
        });

        JavaPairRDD<String, Iterable<Tuple2<String, String>>> groupDataRDD = allPairRDD.groupByKey();
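        // groupDataRDD: rsfz -> all of that object's (tgsj, sbbh) records,
        // i.e. the "rsfz,[(tgsj, sbbh), ...]" shape from step 1 of the approach.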

        // Note: this foreach only walks the grouped records and builds local strings;
        // nothing is returned or stored, so it serves as an inspection/debug pass only.
        groupDataRDD.foreach(new VoidFunction<Tuple2<String, Iterable<Tuple2<String, String>>>>() {
            @Override
            public void call(Tuple2<String, Iterable<Tuple2<String, String>>> s2) throws Exception {
                String t1 = s2._1;
                StringBuilder sbt2 = new StringBuilder();
                StringBuilder sbt3 = new StringBuilder();

                Iterator<Tuple2<String, String>> iter = s2._2.iterator();

                while (iter.hasNext()) {
                    Tuple2<String, String> tuple = iter.next();
                    sbt2.append(tuple._1).append(",");
                    sbt3.append(tuple._2).append(",");
                }
            }
        });

        JavaRDD<Tuple2<String, Tuple2<String, String>>> flatDataRDD = groupDataRDD.flatMap(new FlatMapFunction<Tuple2<String, Iterable<Tuple2<String, String>>>, Tuple2<String, Tuple2<String, String>>>() {
            @Override
            public Iterator<Tuple2<String, Tuple2<String, String>>> call(Tuple2<String, Iterable<Tuple2<String, String>>> s2) throws Exception {
                List<Tuple2<String, Tuple2<String, String>>> result = new ArrayList<Tuple2<String, Tuple2<String, String>>>();
                List<Tuple2<String, String>> list = IteratorUtils.toList(s2._2.iterator());
                /**
                 * The data has already been grouped by key (rsfz) via groupByKey.
                 * flatMap now splits each object's records into segments of 3
                 * consecutive (tgsj, sbbh) entries (the records are assumed to be
                 * sorted by tgsj, as described in step 1 of the approach).
                 * Output: [sbbh-sequence, (rsfz, tgsj-sequence)]
                 */

                for (int i = 0; i < list.size() - 2; i++) {
                    StringBuilder sbTGSJ = new StringBuilder();
                    StringBuilder sbSBBH = new StringBuilder();
                    for (int j = 0; j < 3; j++) {
                        if (j + i < list.size()) {
                            sbTGSJ.append(list.get(j + i)._1).append(",");
                            sbSBBH.append(list.get(j + i)._2).append(",");
                        } else {
                            break;
                        }
                    }
                    System.out.println("sbTime:" + sbTGSJ.toString());
                    System.out.println("sbKkbh:" + sbSBBH.toString());
                    result.add(new Tuple2<String, Tuple2<String, String>>(sbSBBH.toString(), new Tuple2<String, String>(s2._1, sbTGSJ.toString())));
                }

                return result.iterator();
            }
        });

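        // Final stage: key each 3-location segment by its sbbh sequence, group the
        // (rsfz, tgsj-sequence) pairs that share a sequence, compare their passing
        // times pairwise, and write companion records to MySQL.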
        flatDataRDD.mapToPair(new PairFunction<Tuple2<String, Tuple2<String, String>>, String, Tuple2<String, String>>() {
            @Override
            public Tuple2<String, Tuple2<String, String>> call(Tuple2<String, Tuple2<String, String>> t2) throws Exception {
                return new Tuple2<String, Tuple2<String, String>>(t2._1, t2._2);
            }
        }).groupByKey().map(new Function<Tuple2<String, Iterable<Tuple2<String, String>>>, String>() {
            @Override
            public String call(Tuple2<String, Iterable<Tuple2<String, String>>> v1) throws Exception {
                Set<String> rsfzSet = new HashSet<String>();
                Set<String> tgsjSet = new HashSet<String>();
                StringBuilder sbrsfz = new StringBuilder();
                StringBuilder sbtgsj = new StringBuilder();

                String sbbh = v1._1;

                List<Tuple2<String, String>> list = IteratorUtils.toList(v1._2.iterator());
                for (int i = 0; i < list.size(); i++) {
                    for (int j = i + 1; j < list.size(); j++) {
                        String tgsj1 = list.get(i)._2;
                        String tgsj2 = list.get(j)._2;

                        String rsfz1 = list.get(i)._1;
                        String rsfz2 = list.get(j)._1;

                        String[] times01 = tgsj1.split(",");
                        String[] times02 = tgsj2.split(",");

                        // Compare the k-th passing time of the two objects at the same device;
                        // a gap of no more than 3 minutes counts as traveling together.
                        for (int k = 0; k < Math.min(times01.length, times02.length); k++) {
                            double subMinutes = TimeUtils.getSubMinutes(times01[k], times02[k]);
                            if (subMinutes <= 3) {
                                rsfzSet.add(rsfz1);
                                rsfzSet.add(rsfz2);
                                tgsjSet.add(tgsj1);
                                tgsjSet.add(tgsj2);
                            }
                        }
                    }
                }
                for (String rsfz : rsfzSet) {
                    sbrsfz.append(rsfz).append(",");
                }
                for (String tgsj : tgsjSet) {
                    sbtgsj.append(tgsj).append(",");
                }
                // Result format: sbbh-sequence & rsfz-list & tgsj-list
                String resultStr = sbbh + "&" + sbrsfz.toString() + "&" + sbtgsj.toString();
                return resultStr;
            }
        }).filter(new Function<String, Boolean>() {
            @Override
            public Boolean call(String v1) throws Exception {
                // Keep only device sequences for which at least one companion pair was found.
                return v1.split("&").length >= 3;
            }
        }).foreach(new VoidFunction<String>() {
            @Override
            public void call(String s) throws Exception {

                String[] fields = s.split("&");
                String sbbh = fields[0];
                String rsfz = fields[1];
                String tgsj = fields[2];

                Connection conn = JdbcUtils.getConnection();

                String sql = "insert into t_people_result2 (CJSJ,RSFZ,TGSJ,SBBH) values (?,?,?,?)";
                PreparedStatement pstmt = conn.prepareStatement(sql);

                // record the creation timestamp (CJSJ)
                long cjsj = System.currentTimeMillis();
                pstmt.setString(1, cjsj + "");
                pstmt.setString(2, rsfz);
                pstmt.setString(3, tgsj);
                pstmt.setString(4, sbbh);
                pstmt.executeUpdate();
                JdbcUtils.free(pstmt, conn);
            }
        });
    }
}
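
TimeUtils and JdbcUtils above are the project's own utility classes, not Spark or JDK APIs. Below is a minimal sketch of what they might look like; the timestamp format, JDBC URL, driver class, and credentials are all assumptions, so refer to the sample code repository linked at the end for the actual implementations.

// Sketch only: assumed shapes of the project's helper classes (each would live in its own file).
class TimeUtils {
    // Absolute difference between two timestamps in minutes;
    // the "yyyy-MM-dd HH:mm:ss" format is an assumption about the tgsj field.
    public static double getSubMinutes(String t1, String t2) throws Exception {
        java.text.SimpleDateFormat fmt = new java.text.SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        return Math.abs(fmt.parse(t1).getTime() - fmt.parse(t2).getTime()) / (1000.0 * 60);
    }
}

class JdbcUtils {
    // Placeholder connection settings; replace with your own MySQL host, database and credentials.
    private static final String URL = "jdbc:mysql://bigdata01:3306/monitor?useSSL=false&characterEncoding=utf8";
    private static final String USER = "root";
    private static final String PASSWORD = "123456";

    public static java.sql.Connection getConnection() throws Exception {
        Class.forName("com.mysql.jdbc.Driver");
        return java.sql.DriverManager.getConnection(URL, USER, PASSWORD);
    }

    // Close the statement and connection, swallowing close failures.
    public static void free(java.sql.PreparedStatement pstmt, java.sql.Connection conn) {
        try { if (pstmt != null) pstmt.close(); } catch (Exception ignored) { }
        try { if (conn != null) conn.close(); } catch (Exception ignored) { }
    }
}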

Hands-on steps:

  1. Create a directory on HDFS and upload the data file into it
-- create the /people_together directory on HDFS
bin/hadoop dfs -mkdir /people_together

-- upload the local people01.csv to the /people_together directory on HDFS
bin/hdfs dfs -put /root/people01.csv /people_together
  2. Open Hive and create a table
-- hive 
CREATE EXTERNAL TABLE t_people_together (ID string,
RSFZ string,
GRXB string,
PSQK string,
SSYZ string,
SSYS string,
XSYZ string,
XSYS string,
TGSJ string,
SBBH string,
JWZB string)
row format delimited fields terminated by ','
lines terminated by '\n'
location '/people_together'
tblproperties ("skip.header.line.count"="1", "skip.footer.line.count"="1");

-- check the data
select * from t_people_together;
  3. Create the result table in MySQL
CREATE TABLE t_people_result2 (
CJSJ text,
RSFZ text,
TGSJ text,
SBBH text
)charset utf8 collate utf8_general_ci;
  4. Submit the Spark job
./spark-submit \
--master spark://bigdata01:7077 \
--class com.monitor.together.TogetherCompute \
--deploy-mode client \
/root/monitoranalysis-1.0-SNAPSHOT.jar

Sample code

https://github.com/yy1028500451/MonitorAnalysis/tree/master/src/main/java/com/monitor/together


Reposted from blog.csdn.net/dec_sun/article/details/89715692