Operating and Maintaining Massive Data Takes Real Power: GaussDB (for Cassandra) Can Help

Abstract: Application Operations Management (AOM) and GaussDB (for Cassandra) work closely together. The combination is an efficient solution that gives enterprises a significant advantage in application operations and maintenance (O&M). This article introduces the strengths and characteristics of AOM and GaussDB (for Cassandra) and explains how they help enterprises stay competitive.

This article is shared from the Huawei Cloud Community post "Massive data operation and maintenance needs to be powerful, Huawei Cloud GaussDB (for Cassandra) to help", author: Selected by Huawei Cloud Community.

Introduction

Container technology has become widespread, and more and more enterprises now build applications on microservice frameworks. Businesses are moving to services that run in the cloud, and O&M is shifting toward cloud O&M services. In this environment, the O&M of cloud applications faces new challenges:

  1. O&M staff need advanced skills and must maintain several systems at once, each with complicated configuration. Distributed tracing systems are costly to learn and use, unstable, and poor value for money.
  2. Diagnosing problems in distributed applications in the cloud is difficult. The main difficulties are visualizing the dependencies between microservices, improving application performance, correlating scattered logs, and tracing problems quickly.

In response to the above challenges, AOM came into being.

What is AOM?

AOM is a one-stop, multi-dimensional O&M management platform for cloud applications, developed by Huawei Cloud. It consists of four sub-services:

- application resource management
- monitoring center (observability analysis)
- automated O&M
- collection management

Its observability analysis and automated O&M solutions rapidly collect metrics, logs, and performance data from both the cloud and on-premises environments. This helps users detect faults promptly and get a complete, real-time view of the status of applications, resources, and services. It also improves the automation, capacity, and efficiency of enterprise O&M for massive data.

AOM's many advantages and powerful features depend on the intelligent data foundation that handles its massive data: Huawei Cloud GaussDB (for Cassandra).

Why choose GaussDB (for Cassandra)?

HUAWEI CLOUD GaussDB (for Cassandra) is a cloud-native NoSQL database compatible with the Cassandra ecosystem. It supports CQL, an SQL-like query language. It offers high performance, high availability, high reliability, high security, and elastic scalability. On top of these, it provides:

- one-click deployment
- fast backup and restoration
- independent scaling of compute and storage
- monitoring and alarms

It is especially well suited to massive data processing and high-concurrency business scenarios, for example:

  • Businesses with data hotspots. For example, a news app manages a large amount of news and current-affairs data. When a major social event occurs, requests for related news surge. The app must keep running normally and keep its request success rate stable.
  • Businesses that model time series data. For example, a weather station collects the temperature every minute and stores the results. The data must be timely, and expired data must be deleted automatically.
  • Businesses that model conversational message data. For example, a social app stores large numbers of users and session messages. Switching between sessions must be fast, with a high success rate and short response times.
  • Businesses that require high-speed data processing. For example, a business needs to quickly process data from many different devices or sensors.
  • Services that require real-time monitoring data. For example, an O&M platform monitors data across many dimensions in real time, collects metrics accurately, and completes O&M tasks quickly.
  • Businesses that use social media analytics and recommendation engines. For example, an e-commerce app analyzes users' product preferences and recommends products accordingly to improve user experience and product competitiveness.
  • ……
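The weather-station example above relies on expired data being deleted automatically. The following Python sketch models how per-row expiry behaves. It is a hypothetical in-memory simulation, not the GaussDB (for Cassandra) client API. In CQL the same effect is normally achieved by writing rows with `USING TTL`. The class name `TtlTable` and the assumption that each reading is written at its sample time are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class TtlTable:
    """Hypothetical in-memory model of per-row TTL; not a database client."""

    # (station_id, sample_ts) -> (temperature, expires_at), all times in seconds
    rows: dict = field(default_factory=dict)

    def insert(self, station_id: str, ts: int, temp: float, ttl: int) -> None:
        # Mirrors "INSERT ... USING TTL <ttl>": the row expires ttl seconds
        # after it is written (assumed here to be the sample time ts)
        self.rows[(station_id, ts)] = (temp, ts + ttl)

    def select(self, station_id: str, now: int) -> list:
        # Expired rows are invisible to reads; returns (ts, temp) sorted by time
        return sorted(
            (ts, temp)
            for (sid, ts), (temp, expires_at) in self.rows.items()
            if sid == station_id and now < expires_at
        )


# Usage: one reading per minute for 10 minutes, each kept for 5 minutes
table = TtlTable()
for i in range(10):
    table.insert("ws-01", ts=i * 60, temp=20.0 + i, ttl=300)
print(table.select("ws-01", now=600))  # [(360, 26.0), (420, 27.0), (480, 28.0), (540, 29.0)]
```

The sketch only hides expired rows on read. It does not model how Cassandra turns expired data into tombstones that are later purged during compaction.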

GaussDB (for Cassandra) also offers rich features that make it applicable to a wide range of business scenarios:

  • Big data applications: GaussDB (for Cassandra) handles massive data volumes with high-throughput, low-latency reads and writes, which suits big data applications.
  • Internet applications: GaussDB (for Cassandra) handles highly concurrent read and write requests and supports multi-data-center deployment, which suits Internet applications.
  • Time series data: GaussDB (for Cassandra) supports storing and querying time series data. This suits scenarios such as the Internet of Things (IoT) and log analysis.
  • High-availability businesses: GaussDB (for Cassandra) uses multi-replica replication to ensure data availability and reliability. When a node fails, the system automatically restores data from other nodes to preserve data integrity and consistency.
  • Scalable businesses: GaussDB (for Cassandra) scales easily to hundreds of nodes and petabyte-scale datasets. It supports adding and removing nodes dynamically, so the system's scale and performance can be adjusted to actual needs.
  • Distributed storage applications: GaussDB (for Cassandra) stores data across multiple nodes, each of which can process read and write requests independently. This improves data availability and reliability as well as system throughput and scalability.
  • Distributed query applications: GaussDB (for Cassandra) supports distributed queries. It dispatches query requests to multiple nodes for parallel processing, which improves query efficiency and response speed.
  • ……
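The replication and distributed storage features above rest on placing each partition on several nodes of a token ring. The sketch below is a simplified illustration in the spirit of Apache Cassandra's SimpleStrategy. It makes several assumptions:

- an MD5-derived token
- one token per node

Real Cassandra clusters use the Murmur3 partitioner and virtual nodes, and this article does not describe GaussDB (for Cassandra)'s internal placement. The sketch also shows why a hotspot is hard to spread: every request for one partition key lands on the same few replicas.

```python
import bisect
import hashlib


def token(key: str) -> int:
    # Assumption: a 64-bit token taken from MD5 for illustration only;
    # Apache Cassandra's default partitioner is Murmur3.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class Ring:
    """Simplified token ring: one token per node, replicas walk clockwise."""

    def __init__(self, nodes: list) -> None:
        self.ring = sorted((token(n), n) for n in nodes)
        self.positions = [t for t, _ in self.ring]

    def replicas(self, partition_key: str, rf: int) -> list:
        # The first node at or after the key's token owns the partition
        # (wrapping around); the next rf - 1 nodes hold the other copies.
        start = bisect.bisect_left(self.positions, token(partition_key))
        n = len(self.ring)
        return [self.ring[(start + i) % n][1] for i in range(min(rf, n))]

    def live_replicas(self, partition_key: str, rf: int, down: set) -> list:
        # If a node fails, the remaining replicas can still serve the partition
        return [node for node in self.replicas(partition_key, rf) if node not in down]


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
owners = ring.replicas("movie-123", rf=3)
print(owners)  # the same three nodes receive every request for "movie-123"
print(ring.live_replicas("movie-123", rf=3, down={owners[0]}))
```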

In short, GaussDB (for Cassandra) is well suited to big data analysis, real-time data processing, social networks, IoT, and distributed storage and querying.

Real-World Scenario: Data Hotspots

AOM is feature-rich and covers many typical business scenarios, such as data hotspots, time series data, and real-time monitoring. This is why GaussDB (for Cassandra) was chosen as its underlying data engine. The following uses data hotspots as an example to show how GaussDB (for Cassandra) keeps AOM running stably when hotspots occur.

Scenario:

While massive amounts of data are being monitored and maintained, the access frequency of specific data in a table suddenly rises, and some partitions receive hot traffic. The table's primary key is poorly designed, so when traffic under one partition surges, the whole impact falls on that single partition. As a result, the CPU usage of the node that owns the partition's token stays persistently high.

Root cause of the problem:

GaussDB (for Cassandra) is a highly scalable, high-performance distributed database for big data scenarios that can manage massive amounts of structured data. As business volume and data traffic keep growing, however, flaws in the business design are gradually exposed and reduce cluster stability and availability. Examples include:

- poorly designed primary keys
- too many records, or too much data, in a single partition
- oversized partition keys

All of these unbalance node load and reduce cluster stability, and they are collectively called large key problems. The main cause of a large key is a poorly designed primary key that puts an excessive number of records or volume of data into a single partition. Once a partition holds a large amount of data, every access to it raises the load on the server hosting it. In severe cases the node can even run out of memory (OOM).

Popular events happen all the time. For example, when a hot news item in an app receives tens of thousands of clicks and comments, a large volume of requests builds up. Frequent operations on the same key drive up the CPU load of the node that holds the key. This affects other requests on the node and lowers the business success rate.

Promotions of popular products and live streams by internet celebrities behave the same way: these read-heavy, write-light scenarios also create data hotspots. When requests for one key on a host exceed the server's limits, a hot key problem results. Large keys are often an indirect cause of hot keys. Hot keys cause the following harms:

- traffic concentrates on one node and reaches the bandwidth limit of its physical NIC
- the flood of requests overwhelms the cache shard that serves them
- the database breaks down, triggering a cascading failure (a "business avalanche")

In the scenario above, the root cause is an unreasonable primary key structure, which produces large keys and hot keys. The table structure is shown below. The movie table stores information about short videos. Its partition key is movieid, and it also stores user information (uid). Because uid is a clustering column, every user who accesses a video adds a row to that video's partition. If the video identified by movieid is popular and liked by tens or even hundreds of millions of users, its partition becomes very large.

CREATE TABLE movie (
  movieid text,
  appid int,
  uid bigint,
  accessstring text,
  moviename text,
  access_time timestamp,
  PRIMARY KEY (movieid, appid, uid, accessstring, moviename)
);
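The following Python sketch counts how many rows each partition receives. It assumes the standard CQL rule that the first primary-key component (`movieid` here) is the partition key and the remaining components are clustering columns. The helper `partition_row_counts` and the sample data are hypothetical. With one row per user, a video accessed by 150,000 users produces a single 150,000-row partition. By contrast, the `hotmovieaccess` design proposed in the solution below keeps only one row per (`movieid`, `appid`).

```python
from collections import Counter

MOVIE_PK = ("movieid", "appid", "uid", "accessstring", "moviename")  # movie table
HOT_PK = ("movieid", "appid")  # hotmovieaccess table


def partition_row_counts(rows, primary_key, partition_key_len=1):
    """Count rows per partition for a given CQL primary key.

    partition_key_len is the number of leading primary-key columns that form
    the partition key (1 for PRIMARY KEY (movieid, ...)).
    """
    # Writes with an identical full primary key overwrite each other (upsert),
    # so only distinct primary keys count as rows.
    distinct = {tuple(r[c] for c in primary_key) for r in rows}
    return Counter(pk[:partition_key_len] for pk in distinct)


# Hypothetical access log: 150,000 users open one popular video, 1 user a quiet one
events = [
    {"movieid": "hot", "appid": 1, "uid": uid, "accessstring": "x", "moviename": "m1"}
    for uid in range(150_000)
]
events.append({"movieid": "quiet", "appid": 1, "uid": 7, "accessstring": "x", "moviename": "m2"})

print(partition_row_counts(events, MOVIE_PK)[("hot",)])  # 150000 rows in one partition
print(partition_row_counts(events, HOT_PK)[("hot",)])    # 1
```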

Solution:

  • Adjust the table structure. Compared with other databases, GaussDB (for Cassandra) has a more flexible data model and lets you set the primary key and partition key freely. Choosing sensible primary and partition keys, adjusting the table structure and query statements, and splitting the data in the table can effectively speed up queries and improve O&M efficiency. In the scenario above, the movie table's primary key is poorly designed, so queries return large amounts of data and take a long time. Creating a new table with the following structure significantly reduces the amount of data in the table. The new table stores information about popular short videos. It keeps only their public information and excludes user information, so that the table does not produce large partitions.
CREATE TABLE hotmovieaccess (
  movieid text,
  appid int,
  accessstring text,
  access_time timestamp,
  PRIMARY KEY (movieid, appid)
);
  • Use caching. A cache speeds up reads by holding data in extra memory, which minimizes the number of disk reads required. The larger the cache, the more requests ("hits") can be served from memory. The built-in caches of GaussDB (for Cassandra) are the key cache and the row cache. The key cache maps partition keys to their row index locations, which speeds up access to SSTables on disk. The row cache can cache rows of each partition to speed up reads of frequently accessed rows.

In the scenario above, caching can absorb traffic spikes and reduce the number of database queries. The overall logic flow is as follows: the application reads hotspot data from the cache first, and queries the database only if the data is not found in the cache.
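The cache-first read path described above, often called cache-aside, can be sketched in Python as follows. This is an application-side cache in front of the database, separate from the built-in key cache and row cache. Several parts are assumptions for illustration:

- the class `CacheAside`
- the bounded LRU eviction policy
- the `query_db` stand-in for a real database query

```python
from collections import OrderedDict


class CacheAside:
    """Read path: try the cache first, query the database only on a miss."""

    def __init__(self, load_from_db, capacity: int) -> None:
        self.load_from_db = load_from_db  # hypothetical database query function
        self.capacity = capacity
        self.cache = OrderedDict()  # key -> value, least recently used first
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        value = self.load_from_db(key)  # only cache misses reach the database
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used key
        return value


db_calls = []


def query_db(key):
    # Stand-in for a real database query
    db_calls.append(key)
    return f"row:{key}"


reader = CacheAside(query_db, capacity=2)
for _ in range(1000):
    reader.get("hot-news")
print(reader.hits, reader.misses, len(db_calls))  # 999 1 1
```

A thousand reads of the same hot key reach the database only once; the rest are served from memory. The capacity bound keeps the cache from growing without limit. The cost is that evicted keys must be read from the database again.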

Data hotspot detection tools:

Data hotspots put pressure on the business and disrupt its normal operation, and dealing with them only after they appear is too late. Hotspots therefore need to be predicted, and solutions designed, in advance to keep the business running normally. To this end, GaussDB (for Cassandra) provides detection and early-warning tools for large keys and hot keys.

    • Large key detection. Based on observation of large-scale workloads, GaussDB (for Cassandra) defines a large key as one that exceeds either of the following thresholds: (1) more than 100,000 rows in a single partition; (2) a single partition larger than 100 MB.
    • Hot key detection. Based on observation of large-scale workloads, GaussDB (for Cassandra) defines a hot key as one accessed more than 100,000 times per minute.
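The two rules can be expressed as simple checks. The Python sketch below is hypothetical and shows only the logic of the rules as stated: more than 100,000 rows or more than 100 MB per partition, and more than 100,000 accesses per minute. It uses a per-key sliding 60-second window for hot keys. The article does not describe how GaussDB (for Cassandra) actually collects these statistics. Reading 100 MB as 100 × 1024 × 1024 bytes is also an assumption.

```python
from collections import defaultdict, deque

# Thresholds as stated in the article
MAX_ROWS_PER_PARTITION = 100_000
MAX_PARTITION_BYTES = 100 * 1024 * 1024  # 100 MB; binary MB is an assumption
HOT_KEY_ACCESSES_PER_MIN = 100_000


def is_large_key(row_count: int, partition_bytes: int) -> bool:
    # Large if either threshold is exceeded
    return row_count > MAX_ROWS_PER_PARTITION or partition_bytes > MAX_PARTITION_BYTES


class HotKeyDetector:
    """Per-key sliding window of access timestamps (seconds)."""

    def __init__(self, threshold: int = HOT_KEY_ACCESSES_PER_MIN, window_s: float = 60.0) -> None:
        self.threshold = threshold
        self.window_s = window_s
        self.accesses = defaultdict(deque)

    def record(self, key, now: float) -> bool:
        """Record one access and return True if the key is currently hot."""
        window = self.accesses[key]
        window.append(now)
        # Drop accesses that fell out of the last window_s seconds
        while window and window[0] <= now - self.window_s:
            window.popleft()
        return len(window) > self.threshold


print(is_large_key(row_count=150_000, partition_bytes=20 * 1024 * 1024))  # True
detector = HotKeyDetector(threshold=3)  # tiny threshold for the demo
print([detector.record("movie-123", t) for t in (0, 10, 20, 30)])  # [False, False, False, True]
```

Keeping every timestamp is fine for a sketch. A production detector would more likely rely on sampling or aggregated counters.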

GaussDB (for Cassandra) provides large key and hot key detection and alarm tools. Customers can configure large key and hot key alarms for their instances in the console according to their business needs. When a large key or hot key event occurs, the system sends an early-warning notification. Customers can then view the monitoring events in the console and handle the alarms promptly to avoid business fluctuations.

Summary:

For data hotspots, GaussDB (for Cassandra) provides three capabilities:

- real-time detection of large keys and hot keys, which helps businesses design sound solutions and avoid stability risks
- real-time monitoring of large keys and hot keys, so that business risks are noticed as soon as they arise
- solutions for large keys and hot keys, which improve cluster stability and availability during data-volume peaks

Together they safeguard the continuous, stable operation of customers' businesses.

In short, online businesses that use GaussDB (for Cassandra) must follow the relevant development rules and usage specifications and reduce risks during the development and design phase. The general governance process is "set specifications" → "access review" → "regular inspection" → "optimize rules". A sound design avoids most risks. The design of every table must consider whether it could create large keys or hot keys and whether it could cause load skew.

A data aging mechanism is also needed: data in a table must not grow without limit without ever being deleted or aged out. For read-heavy, write-light scenarios, add a caching mechanism to absorb read hotspots and improve query performance. Keep the size of each partition and each row under control, and optimize it promptly once it exceeds the limits. Otherwise, performance and stability will suffer.
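One common way to keep partitions bounded, not described in the article itself, is to add a time bucket to the partition key so that no partition grows forever. This pairs naturally with TTL-based aging. The Python arithmetic below applies the idea to the per-minute weather-station example. The function names and the `(station_id, day)` key are hypothetical; they correspond to a CQL key such as `PRIMARY KEY ((station_id, day), ts)`.

```python
from datetime import datetime, timezone

SAMPLES_PER_DAY = 24 * 60  # one reading per minute


def partition_key(station_id: str, ts: datetime) -> tuple:
    # Hypothetical compound partition key (station_id, day), as in
    # CQL: PRIMARY KEY ((station_id, day), ts)
    return (station_id, ts.strftime("%Y-%m-%d"))


def max_rows_per_partition(days_retained: int, bucket_by_day: bool) -> int:
    # Without a day bucket, one station's partition holds every retained sample
    return SAMPLES_PER_DAY if bucket_by_day else SAMPLES_PER_DAY * days_retained


print(max_rows_per_partition(365, bucket_by_day=False))  # 525600
print(max_rows_per_partition(365, bucket_by_day=True))   # 1440
print(partition_key("ws-01", datetime(2023, 6, 1, 23, 59, tzinfo=timezone.utc)))  # ('ws-01', '2023-06-01')
```

Keeping a year of one station's data in a single partition yields 525,600 rows, well past the 100,000-row large key threshold. A daily bucket caps each partition at 1,440 rows.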

Conclusion

Together, AOM and GaussDB (for Cassandra) form an efficient, scalable, high-performance, flexible, and customizable platform for monitoring and operating massive data. The platform helps enterprises manage and use monitoring data better and improve O&M efficiency. In turn, this helps them stay competitive in an ever-changing market.

 

Origin my.oschina.net/u/4526289/blog/9719124