In 2023, these new AIOps trends cannot be ignored

This article comes from the official account of Dr. Bu (senior product expert at Qingchuang Technology).

Foreword: In recent years, research on artificial intelligence and its application in industry have grown sharply. Although AI can still seem like a fantasy from the movies, it is undeniable that it has been applied successfully in many areas of our work and life and has changed them considerably. ChatGPT, which has recently become so popular, is the best proof.

For example, when we look for information about a product, a search engine identifies our search intent, matches it against the large body of knowledge it has collected, and returns what we want. Shopping platforms use data such as our shopping records to automatically recommend products we may be interested in. News and information websites and apps likewise recommend content that may interest us based on what we browse, and the recommendations feel more and more accurate.

……...

AI can improve many of the services we use every day while also encouraging innovation. In IT operations and maintenance (O&M), artificial intelligence and machine learning are likewise being used to help O&M keep pace with other industries. Let's look at a few new AIOps trends that we have recently seen in the market and among customers and that are worth paying attention to.

Trend 1: Need for faster alarm event response

Both internationally and domestically, we see AIOps being widely used for intelligent alarm response and handling. By comprehensively analyzing the data it collects, such as alarms, changes, logs, knowledge, and metrics, AIOps gives O&M engineers richer context at the moment an alarm occurs, which speeds up both alarm recognition and root cause analysis.

For example, when an alarm occurs, it can provide:

1. Similar alarm identification

Has a similar alarm occurred before, and how was it resolved? This history helps O&M engineers handle the new alarm quickly.
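The lookup of similar historical alarms could be sketched as follows. This is a minimal illustration that assumes alarms are plain text messages and uses Jaccard similarity over word tokens; the `HISTORY` records, the field names, and the 0.5 threshold are hypothetical, and a real product may use a very different similarity measure.

```python
import re

def tokenize(text):
    """Lowercase an alarm message and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))

def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def find_similar_alarms(new_message, history, threshold=0.5):
    """Return (score, past_alarm) pairs at or above the threshold, best first."""
    new_tokens = tokenize(new_message)
    scored = []
    for past in history:
        score = jaccard(new_tokens, tokenize(past["message"]))
        if score >= threshold:
            scored.append((score, past))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored

# Hypothetical historical alarms with the resolution that was applied.
HISTORY = [
    {"message": "disk usage above 90% on db-01", "resolution": "purged old binlogs"},
    {"message": "cpu load high on web-03", "resolution": "restarted worker pool"},
]

matches = find_similar_alarms("disk usage above 90% on db-02", HISTORY)
for score, past in matches:
    print(f"{score:.2f} {past['message']} -> {past['resolution']}")
```

Returning the past resolution alongside the score is the point of the exercise: the engineer sees not only that the alarm looks familiar but also what worked last time.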

2. Impact Analysis

When an alarm occurs, does it affect the business and technical services that depend on the failing component? For example, if a metric of the shared storage service becomes abnormal, does this affect the database service built on top of it, and how large is the affected cluster?
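One way to picture impact analysis is as a walk over a service dependency graph, starting from the failing component and moving up to everything that relies on it. The sketch below assumes such a graph is available (for example from configuration item dependencies); the service names in `DEPENDS_ON` are hypothetical.

```python
from collections import deque

# Hypothetical dependency map: each service lists the services it depends on.
DEPENDS_ON = {
    "order-api": ["orders-db"],
    "report-job": ["orders-db"],
    "orders-db": ["shared-storage"],
    "shared-storage": [],
}

def build_dependents(depends_on):
    """Invert the map so each service points at the services that rely on it."""
    dependents = {name: [] for name in depends_on}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(service)
    return dependents

def impacted_services(failing, depends_on):
    """Breadth-first walk upward from the failing service.

    Returns the impacted service names in the order they are reached.
    """
    dependents = build_dependents(depends_on)
    seen = {failing}
    queue = deque([failing])
    impacted = []
    while queue:
        current = queue.popleft()
        for upstream in dependents.get(current, []):
            if upstream not in seen:
                seen.add(upstream)
                impacted.append(upstream)
                queue.append(upstream)
    return impacted

print(impacted_services("shared-storage", DEPENDS_ON))
```

The length of the returned list gives a rough measure of the blast radius, which corresponds to the "affected cluster size" question above.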

3. Complete 360-degree alarm view

This is now more commonly called observability. Under this trend, AIOps intelligently gathers the metrics, alarms, changes, and log data relevant to an alarm, depending on the alarm's scenario, and presents them to O&M engineers in a single view, where they can complete the analysis, response, and handling of the alarm.

4. Root cause analysis

In fact, calling this root cause analysis is not recommended. The output of the algorithm is usually probabilistic, so it is better described as a suspected root cause: after alarms occur, the algorithm recommends a list of suspected root causes from among a pile of disorganized alarms, which helps O&M personnel narrow the scope of investigation. (The product solution for root cause analysis will be introduced in detail in subsequent chapters.)
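To illustrate what a ranked list of suspected root causes might look like, here is a deliberately simple heuristic, not the product's algorithm: among the services that are currently alarming, a service scores higher the more of the other alarming services depend on it. The dependency map and service names are hypothetical.

```python
# Hypothetical dependency map: each service lists the services it depends on.
DEPENDS_ON = {
    "order-api": ["orders-db"],
    "orders-db": ["shared-storage"],
    "shared-storage": [],
}

def transitive_deps(service, depends_on):
    """All services that `service` depends on, directly or indirectly."""
    found = set()
    stack = list(depends_on.get(service, []))
    while stack:
        dep = stack.pop()
        if dep not in found:
            found.add(dep)
            stack.extend(depends_on.get(dep, []))
    return found

def rank_suspected_root_causes(alarming, depends_on):
    """Rank alarming services by the share of other alarming services that depend on them."""
    alarming = list(alarming)
    others = max(len(alarming) - 1, 1)
    scores = {name: 0 for name in alarming}
    for service in alarming:
        for dep in transitive_deps(service, depends_on):
            if dep in scores and dep != service:
                scores[dep] += 1
    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [(name, count / others) for name, count in ranked]

suspects = rank_suspected_root_causes(
    ["order-api", "orders-db", "shared-storage"], DEPENDS_ON
)
for name, score in suspects:
    print(f"{name}: {score:.2f}")
```

The output is a ranking with scores rather than a single verdict, which matches the point above that the result is a suspicion to investigate, not a confirmed cause.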

5. Alarm correlation analysis

Much like the association analysis used to identify products that are bought together in a shopping cart, AIOps correlation algorithms identify which alarms are related to one another during an alarm period and mine historical data for recurring associations. This further reduces the volume of alarms to be handled and improves collaboration efficiency.
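The shopping cart analogy can be made concrete with a small association-rule sketch. It assumes historical alarms have already been bucketed into time windows, treats each window as a "basket", and keeps pairs whose support and confidence exceed thresholds. The alarm names and threshold values are hypothetical.

```python
from collections import Counter
from itertools import combinations

def correlated_alarm_pairs(windows, min_support=0.5, min_confidence=0.8):
    """Find rules "a implies b" where b usually fires in the same window as a.

    windows: list of sets, each holding the alarm names seen in one time window.
    Returns sorted (a, b, support, confidence) tuples.
    """
    total = len(windows)
    if total == 0:
        return []
    single = Counter()
    pair = Counter()
    for window in windows:
        items = sorted(set(window))
        single.update(items)
        pair.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair.items():
        support = count / total  # share of windows containing both alarms
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / single[lhs]  # how often rhs accompanies lhs
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

# Hypothetical alarm history, one set per time window.
WINDOWS = [
    {"disk_full", "db_slow"},
    {"disk_full", "db_slow", "api_5xx"},
    {"cpu_high"},
    {"disk_full", "db_slow"},
]

for lhs, rhs, support, confidence in correlated_alarm_pairs(WINDOWS):
    print(f"{lhs} -> {rhs} (support={support:.2f}, confidence={confidence:.2f})")
```

Alarms linked by a strong rule are candidates for being grouped into a single incident, which is how correlation reduces the number of items an engineer has to handle.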

6. Predictive alarm analysis

AIOps will make wider use of predictive analytics based on artificial intelligence and machine learning, with the aim of letting your team detect potential problems minutes, or even longer, before an alarm fires and deal with them in advance.
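As a minimal illustration of predicting an alarm before it fires, the sketch below fits a straight line to recent metric samples by least squares and estimates how long until the line crosses the alarm threshold. A linear trend is an assumption made for simplicity; production systems typically use richer forecasting models. The sample values and the threshold are hypothetical.

```python
def minutes_until_threshold(samples, threshold):
    """Fit a line to (minute, value) samples and estimate when it crosses threshold.

    Returns the number of minutes after the last sample, 0.0 if the fitted value
    is already at or above the threshold, or None when there is too little data
    or the trend is flat or falling.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    intercept = mean_v - slope * mean_t
    last_t = samples[-1][0]
    current = slope * last_t + intercept
    if current >= threshold:
        return 0.0
    if slope <= 0:
        return None
    return (threshold - current) / slope

# Hypothetical disk usage (%) sampled once per minute, with a 90% alarm threshold.
usage = [(0, 70.0), (1, 72.0), (2, 74.0), (3, 76.0)]
print(minutes_until_threshold(usage, 90.0))
```

A result of a few minutes gives the team a window to act before the alarm starts, which is the goal described above.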

7. Key metric curves

When an alarm occurs, show the curves of key metrics during the alarm period, together with a comparison against a normal period.
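The comparison with a normal period can be summarized numerically as well as plotted. The sketch below assumes two lists of samples for the same metric, one from a normal period and one from the alarm period, and reports the difference in means along with a z-score relative to the baseline. The function name and the sample values are hypothetical.

```python
from statistics import mean, pstdev

def compare_with_baseline(baseline, alarm_period):
    """Summarize how far the alarm-period values sit from the normal-period baseline."""
    base_mean = mean(baseline)
    base_std = pstdev(baseline)
    alarm_mean = mean(alarm_period)
    # Guard against a perfectly flat baseline, where a z-score is undefined.
    z_score = None if base_std == 0 else (alarm_mean - base_mean) / base_std
    return {
        "baseline_mean": base_mean,
        "alarm_mean": alarm_mean,
        "delta": alarm_mean - base_mean,
        "z_score": z_score,
    }

# Hypothetical response times (ms) for a normal period and for the alarm period.
summary = compare_with_baseline([10, 12, 10, 12], [20, 22])
print(summary)
```

A large z-score tells the engineer at a glance that the metric is well outside its usual range, before they study the curve itself.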

Trend 2: Focus on building an integrated "unified operation and maintenance platform"

Recently, we've seen a number of composite requirements emerge with similar characteristics, including:

  • Unified monitoring management platform (a unified policy management and control platform for monitoring systems such as Zabbix, Prometheus, APM, and NPM)

  • Unified alarm management platform (a single management platform that integrates the alarms generated by monitoring systems such as Zabbix, Prometheus, APM, and NPM)

  • Unified data management platform (an O&M data platform that brings together metrics, logs, alarms, changes, and other data)

  • Unified collection and control management platform (unified data collection and processing across different environments such as public cloud, private cloud, and containers)

  • ...

Meeting these requirements changes the old way of working, in which O&M personnel had to log in to multiple tool systems and manually pull data from each of them to complete a given task (such as handling alarms or making changes).

We use and create new AI algorithms that process multiple data types at once through a single O&M tool system, the "unified O&M platform". The platform lets the tool see all of the available data (metrics, logs, changes, alarms, work orders, knowledge, configuration items and their dependencies, and so on), run correlation analysis on it, and combine it in specific scenarios to reduce alarm noise, improve alarm handling efficiency, and give O&M personnel more alarm context.
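A first step toward such a platform is usually to normalize alarms from different tools into one schema so that they can be analyzed together. The sketch below is only an illustration of that idea: `source_a`, `source_b`, and every field name in `FIELD_MAPS` are hypothetical placeholders and do not reflect the real payload formats of Zabbix, Prometheus, or any other tool.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UnifiedAlarm:
    """Common alarm record shared by all sources."""
    source: str
    host: str
    severity: str
    message: str

# Hypothetical per-source field mappings; real payloads will differ.
FIELD_MAPS = {
    "source_a": {"host": "hostname", "severity": "level", "message": "text"},
    "source_b": {"host": "instance", "severity": "severity", "message": "summary"},
}

def normalize(source, payload):
    """Translate one source-specific payload into a UnifiedAlarm."""
    mapping = FIELD_MAPS[source]
    return UnifiedAlarm(
        source=source,
        host=payload[mapping["host"]],
        severity=str(payload[mapping["severity"]]).lower(),
        message=payload[mapping["message"]],
    )

def group_by_host(alarms):
    """Collect alarms per host so related alarms can be reviewed together."""
    grouped = {}
    for alarm in alarms:
        grouped.setdefault(alarm.host, []).append(alarm)
    return grouped

alarms = [
    normalize("source_a", {"hostname": "db-01", "level": "HIGH", "text": "disk full"}),
    normalize("source_b", {"instance": "db-01", "severity": "critical", "summary": "slow queries"}),
]
for host, items in group_by_host(alarms).items():
    print(host, [a.message for a in items])
```

Once alarms share a schema, grouping them by host (or by configuration item) is a simple way to surface context that would otherwise be spread across several tools.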

In the future, AIOps will build on the data foundation of the "unified O&M platform" and save enterprises more time and money through further innovative applications.

Trend 3: Alarm handling is more automated

In the previous article, we mentioned the five pillars of AIOps (data, expert experience, automation, visualization, and AI algorithms). Before 2022, O&M teams were very cautious about automated alarm analysis and automatic alarm handling, but in 2022 more and more organizations in the financial industry began to pay attention to automation technology. AIOps can help O&M engineers analyze and troubleshoot problems automatically and even repair them automatically. This gives the O&M team more time to focus on innovation in O&M tools and efficiency, and so to deliver a better customer service experience.

Closing note: What is AIOps?

Gartner's definition is: AIOps combines big data and machine learning to automate IT operations processes, including event correlation, anomaly detection and causality determination. In essence, AIOps uses artificial intelligence to automate some of the complicated tasks in day-to-day O&M, freeing people from trivial work so that they can focus on innovation and on building better O&M tools that speed up alarm response, and so provide end users with a better product and service experience.


Qingchuang Technology is a benchmark supplier in the AIOps field that has been recommended by Gartner for consecutive years. The company is committed to helping enterprise customers gain deeper insight into O&M data, optimize O&M efficiency, and fully demonstrate the impact of technical O&M on business operations.

We have been deeply involved in the field of intelligent operation and maintenance for nearly ten years

See you next time

Origin blog.csdn.net/qq_37641528/article/details/130090279