Applying TDengine in "one map and one database" to help the transportation industry with its information transformation

Little T's introduction: In building a large transportation data resource management system and its related applications, "one map and one database", there are many time series data scenarios. The most critical is storing and using the time series data generated by vehicle operation. After evaluating candidate databases, the team chose TDengine. This article describes the cluster architecture implementation and the write and query results in detail, and summarizes the related experience and suggestions.


Company Profile

Beijing Jinhai Leyou Software Co., Ltd. is a modern technology company whose core business is computer software architecture development and the integration of software and hardware systems. The company develops system software for the transportation industry. Its technology covers, but is not limited to: industry business logic implemented on the Java Spring framework (optional technical frameworks: C/C++, Python, Go); secondary development and algorithm implementation on commercial software such as TransCAD, TransModeler, SuperMap and ArcGIS; big data analysis and data middle platforms; and traffic asset management.


Background of the project

To strengthen the city's transportation management, coordinate comprehensive transportation development, and improve the efficiency of transportation operation and management, a city-level management unit has built a large transportation data resource management system and its related applications, "one map and one database". The "One Database" part mainly covers data access, data storage, and data sharing. The "One Map" part mainly covers the visual presentation of GIS information and its associated data on two-dimensional and three-dimensional maps.

In essence, the large transportation data resource management system is an efficient and usable industry data middle platform for a city's transportation sector. Its basic structure is as follows:

In building the data middle platform, there are many time series data scenarios, the most critical of which is storing and using the time series data generated by vehicle operation. As shown in the figure below, GPS time series data is an extremely important data resource in the integrated traffic operation monitoring system:

The data uploaded in real time by vehicles and vessels in taxi, online car-hailing, bus, rail transit, water transport, long- and short-distance passenger transport, "two passenger, one hazardous" (chartered tour buses, long-distance coaches, and hazardous-goods vehicles), and railway services all has the characteristics of time series data. The daily ingested volume can easily exceed 100 million records, and the vehicle and vessel operation data of a first-tier city or a provincial-level region can reach several hundred million or even billions of records. Efficiently storing time series data in the database has therefore become a core requirement of data middle platforms in the transportation industry.

In addition, the information systems of the transportation industry have a core need to query the real-time location of vehicles efficiently. In a typical use case, users frequently need to obtain information such as the number of online vehicles (tens of thousands to hundreds of thousands), their latest positions, and business indicators, all within seconds. Moreover, as informatization progresses, users will often raise new time series query requirements that have to be implemented and deployed quickly and efficiently.

Third, given the importance and sensitivity of data in the transportation industry, the reliability and scalability of data storage are also extremely important core requirements.

one.

Database selection and implementation process



Why TDengine?


At the start of the selection process, we considered three candidate databases: InfluxDB, ClickHouse and TDengine.

  • InfluxDB: A mature and established time series database, but important cluster features require the commercial version. Given the special nature of foreign commercial software, there are certain risks in approval, payment, and subsequent security.

  • ClickHouse: A high-performance database developed in Russia. Its non-standard SQL has a high learning cost, and cluster maintenance costs are also high.

  • TDengine: A domestically developed database with a Chinese-language open source community, excellent write speed, and an easy-to-maintain cluster architecture. These three reasons finally led us to choose TDengine as the project's time series database.


Reference articles for the selection:
  • TDengine Testing Report (Link: https://www.taosdata.com/downloads/TDengine_Testing_Report_cn.pdf)
  • System Properties Comparison ClickHouse vs. InfluxDB vs. TDengine (Link: https://db-engines.com/en/system/ClickHouse%3BInfluxDB%3BTDengine)


 

Cluster architecture implementation


We used 5 servers to build the TDengine dnode cluster, and set the number of mnodes and the number of database replicas to 3 each.

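When the cluster was first set up, the default number of mnodes in the current version had changed from 3 to 1, so the system's mnode initially had no replica. We used the following procedure to perform a safe restart and raise the total number of mnodes to 3. It is adapted from the TDengine cluster upgrade steps provided in the official community group: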

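  1. Make sure every cluster node is in a normal state (show dnodes;) and that reads and writes work.

  2. Stop the database service on all nodes: systemctl stop taosd

  3. Back up everything under the data file directory to a location outside that directory.

  4. cd into the data file directory on each node.

  5. Use the tree command to check whether the wal directory under every vnode directory is empty.

  6. If they are all empty, go to step 8.

  7. If any is not empty, start the database process and stop it again, repeating until every wal directory is empty.

  8. With the taosd service stopped, edit the configuration file on every node and set numOfMnodes to 3.

  9. Start the taosd service on every node: systemctl start taosd

  10. Run show dnodes to check the node status.

  11. Check the data ...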
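The wal check in steps 5 to 7 can be scripted instead of read off the tree output by eye. The following is a minimal Python sketch, not part of the official procedure: it walks a directory tree and lists every directory named wal that still contains entries. The function name, the demo directory layout and the file name wal0 are assumptions made for illustration.

```python
import os
import tempfile


def non_empty_wal_dirs(data_dir):
    """Return the paths of directories named 'wal' under data_dir that are not empty."""
    pending = []
    for root, dirs, files in os.walk(data_dir):
        if os.path.basename(root) == "wal" and (dirs or files):
            pending.append(root)
    return sorted(pending)


# Self-contained demo on a temporary tree (hypothetical layout and file name).
with tempfile.TemporaryDirectory() as demo:
    os.makedirs(os.path.join(demo, "vnode", "vnode2", "wal"))  # empty wal
    os.makedirs(os.path.join(demo, "vnode", "vnode3", "wal"))  # wal with one file
    with open(os.path.join(demo, "vnode", "vnode3", "wal", "wal0"), "w") as f:
        f.write("x")
    demo_pending = [os.path.relpath(p, demo) for p in non_empty_wal_dirs(demo)]
    print(demo_pending)
```

On a real node the function would be pointed at the data file directory configured for taosd. An empty result corresponds to step 6, and a non-empty result corresponds to step 7.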

 

Data write architecture implementation


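Because our business development framework is Spring, there are two ways to write data when developing with TAOS-JDBCDriver: JDBC-JNI or JDBC-RESTful. The TDengine official website states explicitly that "JDBC-RESTful performance is 50% to 90% of JDBC-JNI", so we chose JDBC-JNI for multi-threaded writes.

With JDBC-JNI there are still two implementation options on top of a database connection pool (Hikari, Druid): executing writes with native SQL, or executing writes through an ORM framework (MyBatis and the like). Early in the trial run we wrote data through an ORM framework, and at the current write volume this caused no major problems.

However, a member of the community chat group pointed out: "Most ORM frameworks are designed for relational database development, where a throughput of tens of thousands of rows per second is already a lot. In time series write scenarios that is barely a drop in the bucket. Because the target scenarios differ, the fit differs, although queries are not much affected." We realized that the ORM framework itself could become a performance bottleneck, so in later versions we adopted a database connection pool (Hikari, Druid) plus native SQL as the main write mode.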
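To illustrate what "native SQL" writing can look like, here is a minimal sketch that assembles one multi-table, multi-row INSERT statement, a form that TDengine SQL accepts. The article does not show the team's Java code, so this is an illustration only, written in Python to keep it short. The table names, the column layout (timestamp, longitude, latitude) and the function name are hypothetical.

```python
def build_batch_insert(rows_by_table):
    """Build a single INSERT statement covering several tables and rows.

    rows_by_table maps a table name to a list of (ts_ms, longitude, latitude)
    tuples. Values are assumed to be trusted numbers; no escaping is done.
    """
    parts = []
    for table, rows in rows_by_table.items():
        values = " ".join(f"({ts}, {lon}, {lat})" for ts, lon, lat in rows)
        parts.append(f"{table} VALUES {values}")
    return "INSERT INTO " + " ".join(parts)


# Hypothetical per-vehicle tables with two and one buffered rows.
sql = build_batch_insert({
    "gps_car_001": [(1626000000000, 116.40, 39.90), (1626000030000, 116.41, 39.91)],
    "gps_car_002": [(1626000000000, 121.47, 31.23)],
})
print(sql)
```

Batching many rows into one statement and sending it over a pooled connection avoids the per-row object mapping an ORM performs, which is the overhead the community comment points to.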

two.

Results after adopting TDengine



 

Write performance


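The project's target write volume is 100 million records per day, or about 1,158 records per second. We analyzed the data with TDengine's built-in log feature to confirm write efficiency. As the figure below shows, in the five-node cluster the current instantaneous write capacity, taking a rough maximum, is the 23,107 records of node dn1. The log.dn table is sampled every 30 seconds, so the measured peak write rate of dn1 is about 770 records per second. Because the five-node cluster inserts data in a distributed fashion, an insert rate of 770*5=3850 records per second can be fully guaranteed, which completely meets our business needs. As for the upper limit of this cluster's insert performance, it should be more than 100 times this measured value, with a great deal of room for growth.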
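The arithmetic in this paragraph can be reproduced in a few lines. This sketch uses only the figures quoted above; the variable names are our own.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60

# Target: 100 million records per day, expressed per second.
target_per_day = 100_000_000
required_rate = target_per_day / SECONDS_PER_DAY

# Observed: 23,107 records on dn1 within one 30-second log.dn sampling period.
dn1_max_per_window = 23_107
window_seconds = 30
dn1_rate = dn1_max_per_window / window_seconds

# The article rounds dn1 down to 770 records per second and multiplies by 5 nodes.
cluster_rate = int(dn1_rate) * 5

print(math.ceil(required_rate), int(dn1_rate), cluster_rate)
```

The required rate rounds up to the article's 1,158 records per second, and the five-node figure of 3,850 records per second is a little over three times that requirement.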


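Note that log.dn is an extremely important built-in TDengine table of runtime status data. We can use it to monitor how TDengine is running, and we will use it again later when looking at resource usage. The fields of the log table are described in the figure below. (According to staff in the official community, TDinsight in newer versions is a better monitoring tool, and we plan to try it when we get the chance.)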


 

Query performance


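Querying the latest position of all vehicles is the top priority in traffic operation monitoring. At first, the question of which query statement would give an efficient query troubled us a great deal. Later, with help from the TDengine community team, we used the hidden column name tbname together with group by to query the latest vehicle positions efficiently. As the figure below shows, under frequent querying, the positions of nearly 60,000 vehicles were returned in less than 1 second. This is simple and efficient, and fully meets our business needs.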
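The article does not reproduce the statement itself. A plausible shape, assuming TDengine 2.x syntax, one sub-table per vehicle, and a hypothetical super table named vehicle_gps, is shown in the sketch below, together with a small in-memory emulation of what such a query returns. All names here are assumptions for illustration.

```python
# Assumed query shape (TDengine 2.x syntax, hypothetical super table name).
# LAST returns the most recent non-NULL value of each column; when no column
# contains NULLs this is the newest row of each sub-table.
LATEST_POSITION_SQL = "SELECT LAST(*) FROM vehicle_gps GROUP BY tbname"


def latest_per_table(records):
    """Emulate the grouping in memory.

    records is an iterable of (tbname, ts, lon, lat) tuples. The result maps
    each tbname to the (ts, lon, lat) tuple with the largest ts.
    """
    latest = {}
    for tbname, ts, lon, lat in records:
        if tbname not in latest or ts > latest[tbname][0]:
            latest[tbname] = (ts, lon, lat)
    return latest


sample = [
    ("car_a", 100, 116.40, 39.90),
    ("car_b", 105, 121.47, 31.23),
    ("car_a", 130, 116.41, 39.91),
]
latest = latest_per_table(sample)
print(latest)
```

Grouping by tbname lets one statement over the super table return one row per vehicle, rather than issuing one query per vehicle table.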



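Statistical analysis is another feature that many business systems need to implement. Here is one more example: for a table holding 64 days of data, a downsampled count of records per day also takes less than 1 second.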
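A per-day count of this kind is typically written with a time window clause. The sketch below shows an assumed statement (TDengine 2.x syntax, hypothetical table name) and emulates the bucketing in memory. Bucketing by UTC calendar day is an assumption of this sketch, not something the article states.

```python
from collections import Counter
from datetime import datetime, timezone

# Assumed query shape (hypothetical table name).
DAILY_COUNT_SQL = "SELECT COUNT(*) FROM gps_car_001 INTERVAL(1d)"


def daily_counts(timestamps_ms):
    """Count records per UTC calendar day from millisecond timestamps."""
    counts = Counter()
    for ts in timestamps_ms:
        day = datetime.fromtimestamp(ts / 1000, tz=timezone.utc).date().isoformat()
        counts[day] += 1
    return dict(sorted(counts.items()))


def ms(year, month, day, hour):
    """Helper for the demo: a UTC wall-clock time as epoch milliseconds."""
    return int(datetime(year, month, day, hour, tzinfo=timezone.utc).timestamp() * 1000)


counts = daily_counts([ms(2021, 7, 1, 0), ms(2021, 7, 1, 23), ms(2021, 7, 2, 8)])
print(counts)
```

In the database the aggregation runs inside the storage engine, so only one row per day travels back to the client, 64 rows for the example in the article.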



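With well-designed SQL, TDengine's query efficiency fully meets the need for efficient queries over time series data. It greatly simplified development and reduced operations costs, and the whole team is satisfied with it.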

 

Resource usage


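The figure below lists CPU, memory, bandwidth and IO read/write data for each node over one day (maximum sampled values). Resource consumption, CPU in particular, is very stable and controllable: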



three.

Conclusion



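In this project, TDengine delivered very significant performance results. It helped massive time series data workloads in the transportation industry go live quickly and with high quality, and greatly reduced development and operations costs. TDengine is a fine example of a domestic product, and we are willing to give it more patience and confidence, and even to take part in the open source community's development activities to help build a healthy community ecosystem.

We also have further hopes for the TDengine product itself:
  • During the current period of rapid iteration, provide seamless upgrades without service interruption, especially for cluster customers
  • Set up a training, certification and service revenue-sharing system, and cultivate more certified service agents
  • Provide more domain-specific functions, such as a spatial function library; MySQL's spatial functions could serve as a reference for the feature set
  • Open a marketplace for third-party function plugins; with a plugin development specification in place, more users will contribute domain-specific function plugins

With China's huge market behind it, may TDengine soon grow into the Oracle of the time series database field!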

Author | Wang Yang, Technical Director, Beijing Jinhai Leyou Software Co., Ltd.

This article is shared from the WeChat official account TDengine (taosdata_news).