InfluxDB in Practice: Storing and Querying Qiangsheng Taxi Location Data

I. Data Preparation

Download the open Qiangsheng taxi data from http://shanghai.sodachallenges.com/data.html. For this test, 30 days of trajectories were downloaded, roughly 60 million records per day. The raw data is stored as .txt files, in a directory layout like this:
数据//HT160801//01//00//1608010000.txt
Each .txt file holds all taxi positions for one minute.
A record looks like this:
14082|A|0|1|V|0|0|0|2016-08-01|2016-08-01 00:01:05|121.480657|31.180292|44.0|318.0|0|032
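The record is pipe-delimited. As a quick sanity check, the positions of the fields used later (car id, timestamp, longitude, latitude, speed, heading, state) can be confirmed by splitting the sample record; a minimal sketch (the class and method names here are mine):

```java
public class RecordParser {

    // Split one raw record into its pipe-delimited fields.
    // Note: '|' is a regex metacharacter, so it must be escaped.
    public static String[] parse(String line) {
        return line.split("\\|");
    }

    public static void main(String[] args) {
        String sample = "14082|A|0|1|V|0|0|0|2016-08-01|2016-08-01 00:01:05"
                + "|121.480657|31.180292|44.0|318.0|0|032";
        String[] f = parse(sample);
        // index 0 = car id, 9 = timestamp, 10 = longitude, 11 = latitude,
        // 12 = speed, 13 = heading, 14 = state
        System.out.println(f[0] + " " + f[9] + " " + f[10] + " " + f[11]);
    }
}
```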

II. Data Description

A taxi position record carries many fields; to keep the InfluxDB storage simple, only the main ones are kept, giving the SHTaxi entity below:

private String carid;    // car id
private String datetime; // fix time
private double lon;      // longitude
private double lat;      // latitude
private double speed;    // speed
private double dir;      // heading (azimuth)
private int state;       // operating state (set by the parsing code below)

III. Storage Design

1. Create the shtaxi measurement (table)
2. Use the carid attribute as a tag
3. Store lon, lat, speed, etc. as fields
4. Convert datetime to a long timestamp and store it as time
5. Insert in batches; here, all records in each .txt file are parsed and then inserted in a single batch
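Step 4 above, converting the datetime string to a long epoch timestamp, is handled in the code below by a DateUtils helper. A minimal stand-alone sketch of that conversion, assuming the records use Shanghai local time (the class name and timezone choice are my assumptions, not from the original code):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

public class TimeConv {

    // Convert "yyyy-MM-dd HH:mm:ss" to epoch milliseconds,
    // interpreting the string as Shanghai local time (UTC+8).
    public static long toEpochMillis(String datetime) throws ParseException {
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        sdf.setTimeZone(TimeZone.getTimeZone("Asia/Shanghai"));
        return sdf.parse(datetime).getTime();
    }
}
```

The resulting value is passed to Point.time(..., TimeUnit.MILLISECONDS); InfluxDB stores timestamps internally with nanosecond precision, which is why the query results further down show 19-digit time values.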

private static void insertData() {
    InfluxdbUtils influxdb = new InfluxdbUtils();
    try {
        influxdb.setUp();
        String basepath = "D:\\Data\\强生taxi\\HT160801\\01\\";
        for (int i = 0; i < 24; i++) {          // hour
            for (int j = 0; j < 60; j++) {      // minute
                String stri = String.valueOf(i);
                if (i < 10) {
                    stri = "0" + i;
                }

                String strj = String.valueOf(j);
                if (j < 10) {
                    strj = "0" + j;
                }
                strj = "160801" + stri + strj;  // file name: yyMMddHHmm

                String path = basepath + stri + "\\" + strj + ".txt";
                System.out.println("About to write " + path);
                BatchPoints pts = setUpBatchPoints(getData(path));
                influxdb.write(pts);
                System.out.println("Batch written");
            }
        }

        System.out.println("All data written");

        influxdb.closeInfluxDBClient();
    } catch (InterruptedException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
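The zero-padding and path assembly in the loop above can also be expressed more compactly with String.format; an equivalent helper (the name is hypothetical) that builds the per-minute file path:

```java
public class PathBuilder {

    // Build the path of the file holding one minute of positions,
    // e.g. basepath + "01\" + "1608010105.txt" for hour 1, minute 5.
    public static String minuteFile(String basepath, int hour, int minute) {
        return String.format("%s%02d\\160801%02d%02d.txt", basepath, hour, hour, minute);
    }
}
```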

private static List<SHTaxi> getData(String basepath) {

    File file = new File(basepath);
    if (file.exists()) {
        BufferedReader reader = null;
        try {
            System.out.println("Reading the file line by line:");
            reader = new BufferedReader(new FileReader(file));
            String tempString = null;
            List<SHTaxi> taxis = new ArrayList<SHTaxi>();

            // read one line at a time; null marks end of file
            while ((tempString = reader.readLine()) != null) {
                // '|' is a regex metacharacter, so it must be escaped
                String[] infos = tempString.split("\\|");
                SHTaxi taxi = new SHTaxi();
                taxi.setCarid(infos[0]);

                taxi.setDatetime(infos[9]);
                taxi.setLon(Double.parseDouble(infos[10]));
                taxi.setLat(Double.parseDouble(infos[11]));
                taxi.setSpeed(Double.parseDouble(infos[12]));
                taxi.setDir(Double.parseDouble(infos[13]));
                taxi.setState(Integer.parseInt(infos[14]));

                taxis.add(taxi);
            }

            return taxis;

        } catch (IOException e) {
            e.printStackTrace();
            return null;
        } finally {
            // the reader is closed here, so no separate close in the try block
            if (reader != null) {
                try {
                    reader.close();
                } catch (IOException e1) {
                }
            }
        }
    } else {
        System.out.println("File not found");
        return null;
    }

}

 public static BatchPoints setUpBatchPoints(List<SHTaxi> taxis) {
    System.out.println("Building BatchPoints");

    BatchPoints batchPoints = BatchPoints.database(InfluxdbUtils.DATABASE).tag("async", "true")
            .retentionPolicy(InfluxdbUtils.rpName).consistency(ConsistencyLevel.ALL).build();
    for (SHTaxi taxi : taxis) {
        Point point = Point.measurement("shtaxi").tag("carid", taxi.getCarid())
                .time(DateUtils.formatDate(taxi.getDatetime()).getTime(), TimeUnit.MILLISECONDS)
                .addField("lon", taxi.getLon()).addField("lat", taxi.getLat()).addField("head", taxi.getDir())
                .addField("speed", taxi.getSpeed()).build();

        batchPoints.point(point);
    }
    return batchPoints;
}

IV. Basic Queries

1. Total number of rows in the measurement

select count(*) from shtaxi
Query result:
time count_head count_lat count_lon count_speed
0 64934444 64934444 64934444 64934444
There are over 60 million rows, and this query is very slow, about 30 s.

2. Fetch ten rows

SELECT * from shtaxi limit 10
Query result:
time async carid head lat lon speed
1222639851000000000 true 18860 716 31.276538 121 0
1262275200000000000 true 00003 0 31.313642 121.427585 0
1262275200000000000 true 00007 0 31.262738 121.581457 0
1262275200000000000 true 00016 0 31.299445 121.11689 0
1262275200000000000 true 00020 0 31.229033 121.279512 0
1262275200000000000 true 00025 0 31.265292 121.511433 0
1262275200000000000 true 00035 0 31.108333 121.44878 0
1262275200000000000 true 00037 0 31.217817 121.406355 0
1262275200000000000 true 00048 0 31.153637 121.547815 0
1262275200000000000 true 00049 0 31.276552 121.35836 0

This query is much faster than the first one, but still takes about 10 s.

3. Query positions in a given time range

SELECT * FROM shtaxi WHERE time >= '2016-07-31T20:33:32Z' AND time <= '2016-07-31T20:33:33Z'
Query result:
time async carid head lat lon speed
1469997213000000000 true 13617 96 30.990268 121.496652 33
1469997213000000000 true 13626 333 31.009327 121.72234 0
1469997213000000000 true 13657 347 31.286078 121.550668 0
1469997213000000000 true 13662 108 31.301087 121.454312 0
1469997213000000000 true 13684 242 31.360207 121.417495 0
1469997213000000000 true 13696 240 31.110172 121.258982 0
1469997213000000000 true 13726 301 31.340192 121.455047 0
1469997213000000000 true 13738 134 30.990753 121.501022 0
1469997213000000000 true 13744 325 31.30235 121.528888 0
1469997213000000000 true 13751 325 31.35154 121.396623 0
1469997213000000000 true 13758 32 31.126795 121.393505 0
1469997213000000000 true 13765 115 31.321905 121.418807 0
1469997213000000000 true 13792 60 31.386887 121.360457 0
1469997213000000000 true 13797 81 31.26792 121.599462 0
1469997213000000000 true 13803 244 31.349768 121.3945 0
1469997213000000000 true 13804 105 31.306233 121.392213 0
1469997213000000000 true 13807 309 31.25703 121.566315 0

The query is very fast, returning at millisecond latency.
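Issuing this kind of time-range query from Java just means composing the same InfluxQL string and handing it to the client's query method. A small helper for building the WHERE clause (the helper name is mine):

```java
public class InfluxQL {

    // Compose an InfluxQL time-range query over a measurement,
    // using RFC3339 timestamps as in the example above.
    public static String timeRange(String measurement, String startIso, String endIso) {
        return "SELECT * FROM " + measurement
                + " WHERE time >= '" + startIso + "'"
                + " AND time <= '" + endIso + "'";
    }
}
```

The resulting string can then be passed to influxdb-java as new Query(sql, database) and executed with influxDB.query(...).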

V. Summary

The goal of this test was to store a fairly large dataset and evaluate its query performance. So far, the advantage of queries filtered by time is very clear.

Reposted from blog.csdn.net/xinlingjun2007/article/details/80273348