DTT livestream recap: a comprehensive introduction to openGemini in one article

On July 19, the openGemini community and Huawei Cloud DTT (a public technical livestream course series) jointly held a livestream on the theme "openGemini Time Series Database Application Scenarios and Technical Practices". Ren Xiangyu, Huawei Cloud open source DTSE technology evangelist and openGemini community initiator, talked with developers online and gave a comprehensive, detailed introduction covering 8 aspects: openGemini's characteristics, application scenarios, open source goals and values, differentiated competitiveness, core capabilities, technology ecosystem, operation and maintenance management, and community roadmap. Near the end, Xiangyu said that openGemini is an open, inclusive, and cooperative open source community that welcomes more developers and partners to join and promote technological innovation together.

3 major features of openGemini

openGemini is a time series database that pays equal attention to storage and analysis. It has three significant features:

  • Open source

openGemini is released under the Apache 2.0 license, which is business-friendly. Partners and developers can release their own commercial versions based on openGemini, build operation and maintenance monitoring systems, develop monitoring products and services, and build Internet of Vehicles, Internet of Things, and Industrial Internet of Things platforms on top of it.

  • High performance

From incubation to open source, openGemini has long supported Huawei Cloud's SRE operation and maintenance monitoring business. Refined through that production use, it offers excellent read and write performance and efficient data analysis capabilities.

  • Distributed

A stand-alone database is always limited by the computing resources of a single machine and cannot scale to higher throughput and performance. openGemini was therefore designed with a distributed cluster architecture from the outset, which gives it good scalability and flexibility.

Focus on massive telemetry data storage and analysis scenarios

In recent years, with the development and spread of new technologies such as cloud computing, AI, 5G, and the Internet of Things, digital transformation has been in full swing. In fields such as the Internet of Vehicles, manufacturing, logistics, electricity, the Internet of Things, the industrial Internet, and operation and maintenance monitoring, the amount of data has increased dramatically. For example, the vehicle data collected by a large car company in one day reaches the petabyte level, and the operation and maintenance data collected by top cloud vendors exceeds tens of terabytes every day.

Faced with such massive volumes of telemetry data, openGemini draws on an in-depth understanding of the data and business characteristics of these scenarios to provide targeted design and technical optimizations, resulting in a clustered time series database system with high concurrency, high scalability, low latency, and low cost.

At present, openGemini is in commercial use in the Huawei Cloud Industrial IoT Platform and also supports the operation and maintenance monitoring business of the entire Huawei Cloud. About 25 clusters are deployed across the network, with a maximum cluster size of 70 nodes. The average daily data volume processed is 20 TB, the write throughput is 40 million records per second, and the read rate is 50,000 queries per second.

In the few months since openGemini was open sourced, 46 known companies have contacted the community and started testing and adapting it for their businesses. A single spark can start a prairie fire.

6 major capabilities highlight openGemini’s differentiated competitiveness

  • Performance advantage: High performance is the most important part of openGemini's differentiated competitiveness. Compared with open source InfluxDB, openGemini improves performance by more than 2 times in simple query scenarios and by more than 5 times in medium query scenarios. In complex query scenarios, openGemini still responds quickly, while InfluxDB fails with an out-of-memory (OOM) error. In addition, openGemini's newly developed high-cardinality engine supports an unlimited number of timelines, further expanding the range of applications. For performance comparisons with other similar products, you can request them through the contact information on the official website.

In addition, openGemini has launched a series of practical functions in data storage and data analysis to build more differentiated competitiveness. The main functions are as follows:

  • Streaming aggregation: Streaming aggregation is a pre-aggregation method that downsamples data as it is written. It addresses a weakness of traditional downsampling, which reads large amounts of historical data back from disk for calculation and causes serious I/O amplification.

  • Multi-level downsampling: For existing historical data, traditional downsampling methods retain the detailed historical data. In some scenarios those details are not important and only the characteristics of the data need to be kept. Multi-level downsampling extracts features from the detailed historical data and replaces the details in place, which can further reduce storage costs by 50%.

  • Log retrieval: Log data is a special kind of time series data. Most time series databases support log storage, but storing logs is not enough, because retrieval and analysis are the real purpose of keeping them. The mainstream approach uses the ELK technology stack for log processing, but Elasticsearch (ES) struggles when faced with massive log volumes. openGemini implements full-text indexing in the kernel using dynamic word segmentation, with low memory usage and high retrieval efficiency. Everyone is welcome to try it and provide feedback.

  • Anomaly detection and prediction: For the end use of time series data, openGemini has developed an AI-based data analysis framework that performs anomaly detection and prediction on time series data. It can detect 13 common anomaly scenarios and offers fast detection, high accuracy, and integrated stream and batch processing, so that data can be processed close to where it resides and analysis efficiency improves.

  • High-cardinality engine: High cardinality causes index bloat, which leads to excessive memory consumption and reduced read and write performance. This problem has long held back time series databases. openGemini borrowed a solution from analytical processing (AP) systems and developed a new high-cardinality engine, HSCE, which supports an unlimited number of timelines. The core capabilities are already in place, and the aggregation methods available under the high-cardinality engine are being improved (planned for completion in September).
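The streaming aggregation idea in the list above (downsampling at write time instead of re-reading history from disk) can be illustrated with a small sketch. This is not openGemini's implementation: the `StreamingAggregator` class, its `on_write` hook, and the fixed 60-second window are hypothetical names and choices made only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Running aggregates for one (series, time window) pair."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def mean(self) -> float:
        return self.total / self.count


class StreamingAggregator:
    """Hypothetical write-path hook that keeps downsampled windows up to date."""

    def __init__(self, window_seconds: int) -> None:
        self.window_seconds = window_seconds
        # Maps (series_key, window_start) to the running aggregates of that window.
        self.windows = defaultdict(WindowStats)

    def on_write(self, series_key: str, timestamp: int, value: float) -> None:
        # Align the timestamp to the start of its window, then fold the point
        # into the running aggregate. No historical data is read back.
        window_start = timestamp - (timestamp % self.window_seconds)
        self.windows[(series_key, window_start)].add(value)


if __name__ == "__main__":
    agg = StreamingAggregator(window_seconds=60)
    for ts, value in [(0, 1.0), (30, 3.0), (61, 10.0)]:
        agg.on_write("cpu,host=a", ts, value)
    for (series, start), stats in sorted(agg.windows.items()):
        print(series, start, stats.count, stats.mean)
```

Each incoming point touches only one small in-memory record, so the downsampled result is available without a later batch job that scans raw data, which is the I/O amplification the streaming approach is meant to avoid.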

Supported by core capabilities, the application scenarios are broader

In addition to the differentiated capabilities above, openGemini's core capabilities also include full compatibility with InfluxDB 1. rollout), materialized views, data partitioning and sharding (with support for specifying partition keys), data retention policies, and more.

Powerful components improve operation and maintenance management capabilities

To make openGemini easier to operate and maintain, the community developed the ts-monitor component, which collects node and kernel metrics and can be used with Grafana to monitor the running state of openGemini comprehensively. Metrics such as CPU and memory utilization, write bandwidth, write latency, write concurrency, and QPS can be seen at a glance in the visual interface.

Embrace the ecosystem and support application development

Because openGemini is compatible with InfluxDB, the data ingestion tools, SDKs, data insight tools, and big data analysis tools used with InfluxDB can be applied directly to openGemini.

In terms of operating systems, openGemini currently supports mainstream Linux distributions on x86 and ARM64 CPU architectures, and will support macOS and Windows in the next version.

In terms of cloud native, openGemini supports deployment on Docker, Kubernetes (K8s), KubeEdge, and other platforms. To make deployment on K8s easier, the community created the openGemini-operator project.

In terms of data migration, a tool for migrating data from InfluxDB to openGemini is provided. A tool for migrating data from Elasticsearch (ES) to openGemini is under development and is expected to be available in August.

In terms of management tools, data export is already supported. Backup and recovery and GUI management tools are under development in the community and are expected to be available in September.

To sum up, openGemini supports multiple mainstream development languages and operating system platforms, integrates seamlessly with InfluxDB's third-party tools, and supports multiple forms of deployment and application.

Imagine the future with us

At several industry summits held in China and abroad this year, openGemini gave technical talks and ran exhibition booths. We felt everyone's strong interest in openGemini and had technical and business exchanges with many developers. We are very grateful to the Huawei Cloud DTT public technical course series for the opportunity to introduce every aspect of openGemini comprehensively and in detail to a broad developer audience. We hope everyone now has a deeper understanding of openGemini.

Thanks to all the friends, new and old, who listened and joined the interaction. We wish you every success in your careers, and we wish the students among you success in your studies and a bright future!

openGemini will continue to focus on the storage and analysis of massive telemetry data, and provide the industry with open source solutions that can effectively handle the storage and analysis of massive data. At the same time, we hope to grow into a first-class time series database technology community, cultivate more outstanding database technology talents, and promote the vigorous development of the database industry!

Finally, openGemini is a young technical open source community with vast room to grow and endless possibilities. The community belongs to all developers. We hope that more companies and developers will take part, create a healthy open source community culture together, and let open source benefit millions. Let's build, govern, and share the future together!


openGemini official website: http://www.openGemini.org

openGemini open source address: https://github.com/openGemini

openGemini public account:

You are welcome to follow us. We sincerely invite you to join the openGemini community to build, govern, and share the future together!


Origin my.oschina.net/u/3234792/blog/10111951