Complex scenarios: best practices for migrating from OpenTSDB to TDengine

In the previous article, we introduced how to migrate from OpenTSDB to TDengine in an operations and maintenance (O&M) monitoring scenario.

If your application is particularly complex, or its domain is not O&M monitoring, this article covers the more advanced topic of migrating OpenTSDB applications to TDengine in a more comprehensive and in-depth manner.

Migration Assessment and Strategy for Other Scenarios

1. Differences between TDengine and OpenTSDB

This section details the differences between OpenTSDB and TDengine at the system feature level.

After reading it, you should be able to evaluate comprehensively whether a complex OpenTSDB-based application can be migrated to TDengine, and understand the issues that deserve attention after migration.

TDengine currently supports only Grafana for dashboard rendering, so if other dashboards (such as TSDash, Status Wolf, etc.) are used in the application, they cannot be migrated to TDengine directly for the time being; they need to be re-adapted to Grafana before they can run normally.

As of version 2.3.0.x, TDengine supports only collectd and StatsD as data collection and aggregation software; support for more such software will be provided in the future. If the collection side uses another type of data aggregator, it must be adapted to one of these two systems before data can be written. Beyond the protocols of these two aggregators, TDengine also supports writing data directly via InfluxDB's line protocol and OpenTSDB's data writing protocols (text lines and JSON format), so you can rewrite the logic of the data push side to feed TDengine through any line protocol it supports.
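For illustration only (the metric, tag, and field names below are made-up placeholders, not part of this article's scenario), the two non-OpenTSDB schemaless formats look roughly like this:

cpu_load,host=vm130,region=cn value=0.64 1632979445000000000
{"metric": "cpu_load", "timestamp": 1632979445, "value": 0.64, "tags": {"host": "vm130", "region": "cn"}}

The first line is InfluxDB line protocol (measurement, comma-separated tags, fields, and a nanosecond timestamp); the second is OpenTSDB's JSON write format.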

In addition, if any of the following OpenTSDB features are used in the application, the considerations below need to be understood before migration:

  1. /api/stats: If the application uses this interface to monitor the service status of OpenTSDB and has built linkage logic on top of it, that status-collection logic needs to be re-adapted to TDengine. TDengine provides a new mechanism for monitoring cluster status that can meet the application's monitoring and maintenance needs.
  2. /api/tree: If you rely on this OpenTSDB feature for hierarchical organization and maintenance of timelines, it cannot be migrated to TDengine directly. TDengine organizes and maintains timelines in a database -> supertable -> subtable hierarchy: all timelines belonging to the same supertable sit at the same level in the system, but a multi-level structure matching the application logic can be simulated through careful construction of the tag values.
  3. Rollup and PreAggregates: With rollups and pre-aggregates, the application must decide where to read the rollup result and, in some cases, fall back to the raw data; the opacity of this structure makes application processing logic extremely complex and not portable at all. We regard this strategy as a compromise made when a time-series database cannot provide high-performance aggregation. TDengine currently does not support automatic downsampling of multiple timelines or (time-range) pre-aggregation; thanks to its high-performance query processing logic, it can provide very fast query responses even without relying on rollup or pre-aggregated results, and it keeps your application's query logic much simpler.
  4. Rate: TDengine provides two functions for computing the rate of change of a value: Derivative (whose results are consistent with the Derivative function of InfluxDB) and IRate (whose results are consistent with the IRate function of Prometheus). The results of these two functions differ slightly from Rate, but they are more powerful overall; see the sketch after this list. In addition, all of the calculation functions offered by OpenTSDB have corresponding query functions in TDengine, and TDengine's query functions go far beyond what OpenTSDB supports, which can greatly simplify the application's processing logic.
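As a small sketch (assuming the single-value supertable memory and the subtable naming scheme created later in this article), rate-style queries could look like this:

select derivative(val, 1s, 0) from memory_vm130_memory_buffer_collectd where ts > now - 1h;
select irate(val) from memory_vm130_memory_buffer_collectd where ts > now - 1h;

In derivative, the second argument is the time unit of the result and the third controls whether negative deltas are ignored.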

The above introduction should give you a clear picture of the changes that a migration from OpenTSDB to TDengine brings, and help you correctly judge whether migrating your application to TDengine is acceptable, so that you can experience the powerful time-series data processing capabilities and convenient user experience that TDengine provides.

2. Migration strategy

The migration of an OpenTSDB-based system involves data schema design, system scale estimation, data write-side transformation, data distribution, and application adaptation; after that, run the two systems in parallel for a period of time and then migrate the historical data into TDengine. Of course, if parts of your application strongly depend on the OpenTSDB features above and you do not want to stop using them, you can keep the original OpenTSDB system running while starting TDengine to provide the main services.

Data model design

On the one hand, TDengine requires the data it stores to have strict schema definitions. On the other hand, TDengine's data model is richer than OpenTSDB's: the multi-value model can cover all the modeling needs of the single-value model. Now let us assume an O&M monitoring scenario in which collectd is used to collect basic device metrics, including memory, swap, and disk. The schema in OpenTSDB is as follows:

No. | metric | value name | type | tags
1 | memory | value | double | host, memory_type, memory_type_instance, source
2 | swap | value | double | host, swap_type, swap_type_instance, source
3 | disk | value | double | host, disk_point, disk_instance, disk_type, source

TDengine requires the stored data to have a schema; that is, a supertable must be created and its schema specified before data is written. There are two ways to establish the data schema:

1) Make full use of TDengine's native support for OpenTSDB data writing: call the API provided by TDengine to write data (in text-line or JSON format), and single-value models are built automatically. This approach requires no major adjustment to the data-writing application and no conversion of the written data format. At the C language level, TDengine provides taos_insert_lines to write OpenTSDB-format data directly (in version 2.3.x this function corresponds to taos_schemaless_insert). For reference code, see the sample schemaless.c in the installation package directory.
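For illustration, OpenTSDB-format text lines such as the following (the tag values here are illustrative, mirroring the collectd example used below) can be passed to the schemaless write API; TDengine creates the supertable and subtable automatically on first write:

memory 1632979445 3.0656 host=vm130 memory_type=memory memory_type_instance=buffer source=collectd
swap 1632979445 9.1342 host=vm130 swap_type=swap swap_type_instance=used source=collectd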

2) On the basis of a full understanding of TDengine's data model, and in light of the characteristics of the generated data, manually establish a mapping between the OpenTSDB data model and the TDengine data model. Given that OpenTSDB uses a single-value model, and TDengine supports both multi-value and single-value models, we recommend modeling with the single-value model in TDengine.

  • Single-value model

The specific steps are as follows: use the name of the metric as the name of the TDengine supertable, which is created with two basic data columns, timestamp and value; the tags of the supertable correspond to the tag information of the metric, and the number of tags equals the number of tags of the metric. Subtables are named according to a fixed rule: metric + '_' + tag1_value + '_' + tag2_value + '_' + tag3_value ... serves as the subtable name.

Create the 3 supertables in TDengine:

create stable memory(ts timestamp, val double) tags(host binary(12), memory_type binary(20), memory_type_instance binary(20), source binary(20));
create stable swap(ts timestamp, val double) tags(host binary(12), swap_type binary(20), swap_type_instance binary(20), source binary(20));
create stable disk(ts timestamp, val double) tags(host binary(12), disk_point binary(20), disk_instance binary(20), disk_type binary(20), source binary(20));

Subtables are created using the automatic (dynamic) table creation syntax, as follows:

insert into memory_vm130_memory_buffer_collectd using memory tags('vm130', 'memory', 'buffer', 'collectd') values(1632979445000, 3.0656);

In the final system, about 340 subtables and 3 supertables will be created. Note that if concatenating tag values produces a subtable name that exceeds the system limit (191 bytes), an encoding such as MD5 must be applied to bring it down to an acceptable length.

  • Multi-value model

If you want to use TDengine's multi-value model, the following precondition must be met: different collected quantities share the same collection frequency and can reach the data write side through the message queue at the same time, so that multiple metrics can be written in a single SQL statement. Use the measurement name as the supertable name and build a multi-column data model for quantities that have the same collection frequency and arrive together; subtables are again named according to a fixed rule. Each of the measurements above contains only one measured value, so none of them can be turned into a multi-value model; a hypothetical sketch follows.
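As a purely hypothetical sketch (the cpu metric and its columns are invented for illustration and are not part of the collectd scenario above), a multi-value supertable for metrics that arrive together could look like this:

create stable cpu(ts timestamp, usage_user double, usage_system double, usage_idle double) tags(host binary(12), source binary(20));
insert into cpu_vm130_collectd using cpu tags('vm130', 'collectd') values(1632979445000, 13.2, 5.1, 81.7);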

Data offloading and application adaptation

Subscribe to the data from the message queue and start the adapted writer to write the data.

After data has been written for a period of time, you can use an SQL statement to check whether the volume of written data meets expectations.

Use the following SQL statement to count the volume of data:

select count(*) from memory;

After the query completes, if the written data matches expectations and the writing program itself reports no abnormal errors, you can confirm that data writing is complete and valid.
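To dig a little deeper (a small sketch against the memory supertable above), the counts can also be broken down per day or per data source to spot gaps:

select count(*) from memory interval(1d);
select count(*) from memory group by host;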

TDengine does not support OpenTSDB's query syntax for queries or data retrieval, but it provides corresponding support for each OpenTSDB query. For details, please refer to the relevant documentation.
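For example (an illustrative sketch, assuming the memory supertable above), an OpenTSDB query that sums a metric over 5-minute windows grouped by host could be rewritten as:

select sum(val) from memory where ts >= now - 1h interval(5m) group by host;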

TDengine supports the standard JDBC 3.0 interface for manipulating databases, and you can also use other types of high-level-language connectors to query and read data to adapt the application. Please refer to the user manual for detailed instructions.

Historical data migration

1. Use tools to automatically migrate data

To facilitate the migration of historical data, we provide a plug-in for the data synchronization tool DataX that can automatically write data into TDengine. Note that DataX's automated migration supports only the single-value model. For details on DataX and how to use it to write data to TDengine, please refer to its help manual: github.com/taosdata/datax.

2. Manual data migration

If you need to write data using the multi-value model, you must develop your own tool to export data from OpenTSDB, determine which timelines can be merged into the same timeline, and then write the rows whose timestamps align into the database in batches via SQL statements; see the sketch below.
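For example (a hypothetical sketch reusing the invented cpu supertable from the multi-value section above), merged rows can be written in batches with multi-row inserts:

insert into cpu_vm130_collectd using cpu tags('vm130', 'collectd') values(1632979445000, 13.2, 5.1, 81.7) (1632979455000, 12.8, 4.9, 82.3);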

Manual migration of data requires attention to the following two issues:

1) When storing exported data on disk, make sure the disk has enough space to fully hold the exported data files. To avoid running short of disk space after a full export, you can adopt a partial-import mode: preferentially export the timelines belonging to the same supertable, import those data files into TDengine, and then continue with the next batch.

2) While the system runs under full load, if there are enough spare computing and IO resources, you can establish a multi-threaded import mechanism to maximize the efficiency of data migration. Considering the heavy CPU load that data parsing brings, the maximum number of parallel tasks must be controlled to avoid overloading the system while importing historical data.

Thanks to TDengine's ease of operation, no index maintenance or data format conversion is needed at any point in this process; the whole procedure can simply be executed step by step.

After the historical data has been fully imported into TDengine, the two systems run simultaneously; query requests can then be switched over to TDengine, achieving a seamless application switchover.
