How to use the Alibaba Cloud HiTSDB time series database to read and write millions of data points per second

Read the full text: http://click.aliyun.com/m/23220/

At the Shanghai Summit of the 2017 Yunqi Conference, Alibaba Cloud released HiTSDB, a time series database for IoT scenarios. It supports writing 10 million time-series data points per second, offers PB-level storage with efficient compression algorithms that reduce overall storage costs by 90%, and provides interpolation, downsampling (reduced-precision calculation), and aggregation along both the time dimension and the spatial dimension.
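
The three kinds of calculation named above can be illustrated with a small sketch in plain Python. This is not HiTSDB's API: the function names (downsample_avg, interpolate_linear, aggregate_sum) and the sample data are hypothetical, and averaging and summing are only two of many possible aggregators.

```python
from collections import defaultdict

def downsample_avg(points, interval):
    """Reduce precision: average (timestamp, value) points into fixed-width time buckets."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % interval].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]

def interpolate_linear(points, ts):
    """Estimate the value at ts by linear interpolation between neighboring points."""
    points = sorted(points)
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= ts <= t1:
            return v0 + (v1 - v0) * (ts - t0) / (t1 - t0)
    raise ValueError("timestamp outside the series range")

def aggregate_sum(series_list):
    """Aggregate across series (for example across hosts): sum values sharing a timestamp."""
    totals = defaultdict(float)
    for points in series_list:
        for ts, value in points:
            totals[ts] += value
    return sorted(totals.items())

# Hypothetical QPS samples from two hosts, one sample every 10 seconds.
host_a = [(0, 10.0), (10, 20.0), (20, 30.0), (30, 40.0)]
host_b = [(0, 1.0), (10, 2.0), (20, 3.0), (30, 4.0)]

print(downsample_avg(host_a, 20))       # time-dimension aggregation into 20-second buckets
print(interpolate_linear(host_a, 15))   # value estimated between two samples
print(aggregate_sum([host_a, host_b]))  # spatial-dimension aggregation over both hosts
```

Downsampling trades resolution for smaller result sets, while aggregation across series collapses many machines into a single curve; a monitoring dashboard typically needs both.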

The capabilities of HiTSDB grew out of Alibaba's years of practice. Built to cope with very large clusters, it offers distinctive analysis and computing capabilities. This article analyzes HiTSDB in depth from the perspective of e-commerce.

Alibaba operates the world's largest e-commerce trading platform, with a single-day turnover of 120.7 billion yuan on Double Eleven in 2016. This huge business scale is supported by thousands of application services, tens of thousands of servers, and hundreds of millions of service calls every day.

Monitoring applications at this scale requires a global APM (Application Performance Monitoring) service. By collecting and tracking the operational data and business metrics of every service and machine, it gives a global view of business and service status and helps with troubleshooting, diagnosis, and business evaluation.

Alibaba's global monitoring service is called Ali360. Through Ali360, business and services can be monitored globally. Its main technical challenge is cluster size. The scale it initially faced was hundreds of applications and tens of thousands of machines, with QPS and other service metrics monitored on every machine. The application systems generate data according to the Metric specification, which results in tens of millions of data points written and hundreds of thousands of data points queried. This scale is staggering.

(Image: the Tmall Double 11 big screen, which is also part of APM)

At the same time, Ali360's average write rate stays at about 2 million data points per second (200W/s), and each data point averages 200 bytes, so about 0.4 GB is written every second and roughly 34 TB of data is generated every day. APM is a typical time-series scenario in which data is written continuously. At this write rate, the storage cost accumulated each year becomes enormous, and cost optimization is urgently needed.
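
The write-volume figures can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 GB = 10^9 bytes) and ignores compression and replication; the yearly figure is simply the daily figure multiplied by 365 and does not appear in the original numbers.

```python
POINTS_PER_SECOND = 2_000_000   # "200W/S": W stands for 10,000, so 200 x 10,000
BYTES_PER_POINT = 200           # average size of one data point
SECONDS_PER_DAY = 24 * 60 * 60

bytes_per_second = POINTS_PER_SECOND * BYTES_PER_POINT
bytes_per_day = bytes_per_second * SECONDS_PER_DAY

# Decimal units are an assumption: 1 GB = 10**9 bytes, 1 TB = 10**12, 1 PB = 10**15.
gb_per_second = bytes_per_second / 10**9
tb_per_day = bytes_per_day / 10**12
pb_per_year = bytes_per_day * 365 / 10**15

print(f"{gb_per_second:.1f} GB/s, {tb_per_day:.2f} TB/day, {pb_per_year:.1f} PB/year")
```

Under these assumptions the raw stream comes to 0.4 GB per second, about 34.56 TB per day, and on the order of 12.6 PB per year, which is why the storage format and compression ratio dominate the cost.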

Faced with this huge challenge, we had to find the most suitable approach.

The first option to be ruled out was a relational database. Writing millions of data points puts enormous pressure on a relational database. The indexes it creates to support multi-dimensional queries reduce write efficiency, and the space those indexes occupy drives up the cost of the whole solution, so neither cost nor performance is acceptable.

The second option to be ruled out was NoSQL storage. The problem with a key-value (KV) store is that appending data is usually converted into a get operation followed by a put operation. That pattern suits large numbers of small, hot records, but it does not suit the heavy, continuous writes of monitoring data, where write efficiency becomes very poor.
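
The cost of turning an append into a get plus a put can be shown with a toy model. The sketch below is not any particular NoSQL product: CountingKVStore and append_point are hypothetical, and it assumes the store has no native append and that a whole series is kept under a single key.

```python
class CountingKVStore:
    """A toy in-memory key-value store that counts operations and bytes written."""

    def __init__(self):
        self._data = {}
        self.gets = 0
        self.puts = 0
        self.bytes_written = 0

    def get(self, key):
        self.gets += 1
        return self._data.get(key, b"")

    def put(self, key, value):
        self.puts += 1
        self.bytes_written += len(value)
        self._data[key] = value

def append_point(store, key, encoded_point):
    """Append by read-modify-write: fetch the whole value, extend it, write it all back."""
    current = store.get(key)
    store.put(key, current + encoded_point)

store = CountingKVStore()
point = b"x" * 200  # a 200-byte stand-in for one encoded data point
for _ in range(1000):
    append_point(store, "app1.qps", point)

payload = 1000 * len(point)
print(store.gets, store.puts)         # one get and one put per appended point
print(store.bytes_written / payload)  # write amplification versus the raw payload
```

In this model every append rewrites the entire value, so the bytes written grow quadratically with the number of points while the useful payload grows only linearly.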

After analyzing the form of the monitored business and the characteristics of its data, we settled on the Alibaba Cloud HiTSDB time series database to solve this problem. Business monitoring data is ultimately presented along the time dimension. In the technical field such data is collectively called "time series data", and the continuous sequence of data points for one particular metric is called a "timeline". What the monitoring system finally presents is therefore a set of timelines. A time series database is a database product specially optimized for managing time series data.
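
As a rough illustration of the timeline idea, the sketch below groups raw data points into timelines, treating a metric name plus its full tag set as the identity of one timeline. That identity rule, the series_key helper, and the sample values are assumptions made for illustration and are not taken from HiTSDB itself.

```python
from collections import defaultdict

def series_key(metric, tags):
    """Identify a timeline by its metric name plus its order-independent tag set."""
    return (metric, tuple(sorted(tags.items())))

# Hypothetical raw points: (metric, tags, Unix timestamp, value).
raw_points = [
    ("qps", {"app": "cart", "ip": "10.0.0.1"}, 1478793600, 120.0),
    ("qps", {"app": "cart", "ip": "10.0.0.2"}, 1478793600, 95.0),
    ("qps", {"ip": "10.0.0.1", "app": "cart"}, 1478793601, 130.0),
]

timelines = defaultdict(list)
for metric, tags, ts, value in raw_points:
    timelines[series_key(metric, tags)].append((ts, value))

# Each key is one timeline; its value is that timeline's ordered data points.
for key, points in timelines.items():
    print(key, sorted(points))
```

The first and third points carry the same tags in a different order, so they land on the same timeline, while the point from the second machine starts a timeline of its own.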

The system writes data through the interface provided by HiTSDB, following the Metric specification. Each write can carry arbitrary tags, such as machine room, region, IP, application, service, and method name, together with the metric being recorded, such as exception count, QPS, or TPS. The time series data is stored in HiTSDB through this interface. Monitoring applications can then query the metrics along any dimension, and HiTSDB serves the results to them in the form of timelines.
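
A minimal sketch of what such a data point and a tag-based query could look like follows. The field names (metric, timestamp, value, tags) follow the OpenTSDB-style JSON convention and are an assumption, since the article does not show HiTSDB's actual interface; make_point, select, and all sample values are hypothetical, and nothing is sent over the network.

```python
import json

def make_point(metric, timestamp, value, **tags):
    """Build one data point: a metric name, a Unix timestamp, a value, and arbitrary tags."""
    return {"metric": metric, "timestamp": timestamp, "value": value, "tags": tags}

def select(points, metric, **tag_filter):
    """Return the points of a metric whose tags match every key/value in tag_filter."""
    return [
        p for p in points
        if p["metric"] == metric
        and all(p["tags"].get(k) == v for k, v in tag_filter.items())
    ]

points = [
    make_point("qps", 1478793600, 120.0, room="hz-a", app="cart", ip="10.0.0.1"),
    make_point("qps", 1478793600, 95.0, room="hz-b", app="cart", ip="10.0.0.2"),
    make_point("exception_count", 1478793600, 3, room="hz-a", app="cart", ip="10.0.0.1"),
]

# The JSON body a client might submit in a batch write (the format is an assumption).
print(json.dumps(points[:1], indent=2))

# Query along any tag dimension: all QPS points for one machine room.
print(select(points, "qps", room="hz-a"))
```

Because tags are free-form key/value pairs, the same stored points can be sliced by machine room, application, or IP without changing how they were written.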