Application case of AIOps intelligent operation and maintenance

This case comes from Qingchuang's collection of implementation cases. The customer is a national financial services organization.

1. Case background

1. Policy needs

In recent years, the wave of digital transformation among domestic financial institutions has continued to build. As one of the main industrial development strategies, Xinchuang has become a new driving force for domestic economic development and a bellwether for innovation across industries. [State-owned Assets Supervision and Administration Commission Xinchuang No. 79] clearly states that all central enterprises are to complete the Xinchuang replacement of their information systems by the end of 2027.

The financial industry is the only industry in which China's Xinchuang strategy is being comprehensively promoted and piloted, and it is a key industry for the country. It has already achieved remarkable results.

This case explains in detail how to quickly build an integrated, observable Xinchuang operation and maintenance platform that achieves domestic replacement and completes the Xinchuang digital transformation. It covers data governance, monitoring system innovation, and disaster recovery improvement.

2. Business systems change faster than the original monitoring can handle

As Xinchuang entered its deep-water stage, enterprise cloud adoption accelerated, and business volume grew rapidly, the customer's original payment system gradually became unable to cope with the load. The operation and maintenance system supporting it in the background also had to change, from a relatively simple and static form to one with diverse capabilities, dynamic real-time behavior, and fast response. The foreign monitoring tools used in the system also needed to be replaced as soon as possible.

After an in-depth analysis of the operation and maintenance monitoring system, we found that the customer's computing functions were scattered and duplicated, and that its ability to converge and centralize data was seriously insufficient.

The main symptoms were:

  • Many data collection tools were in use with differing data standards, so data could not be screened effectively and offered little value to downstream consumers

  • Because business rules were complex and the monitoring configuration lacked functional support, new components and application systems could not be managed or associated effectively when they appeared

  • More than 30 foreign monitoring systems were in place, with insufficient processing and analysis capability. With multiple network management systems coexisting, operation and maintenance work could not be completed properly, and its cost was extremely high

2. Implementation steps

1. Domestic replacement: building an intelligent integrated monitoring platform

01. Capability replacement and optimization

Replace Tivoli's data collection, indicator monitoring, centralized log monitoring, centralized alarm management, and other capabilities. On that basis, add AI algorithms, visual configuration, stream and batch processing, policy configuration, and other functions, and optimize the system's processing and storage efficiency for big data to achieve high availability

02. Data standardization

Unify the format of multi-dimensional data according to the operation and maintenance data governance standard, and enrich and extend time data using the CMDB and self-managed data. At the same time, establish a streamlined indicator system to standardize the management of indicator names and tags, and add identifiers to log data so that it can be classified for management and storage
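The standardization step above can be illustrated with a minimal sketch. It is not Qingchuang's implementation: the CMDB table, the alias map, the standard metric names, and the record fields are all hypothetical, chosen only to show how names and tags from different collection tools might be unified and then enriched from a CMDB.

```python
import re

# Hypothetical CMDB lookup table: host -> configuration attributes.
CMDB = {
    "pay-db-01": {"business_system": "payment", "data_center": "dc-a"},
}

# Hypothetical alias map from tool-specific metric names to one standard name.
METRIC_ALIASES = {
    "cpu_util": "system.cpu.usage",
    "cpuusage": "system.cpu.usage",
    "mem_used_pct": "system.memory.usage",
}


def standard_metric_name(raw_name):
    """Lower-case the name, collapse separators, then apply the alias map."""
    key = re.sub(r"[^a-z0-9]+", "_", raw_name.strip().lower()).strip("_")
    return METRIC_ALIASES.get(key, key)


def normalize(record):
    """Return a record in the unified format, enriched with CMDB attributes."""
    host = record["host"].strip().lower()
    # Tag keys are lower-cased so that "Env" and "env" become the same tag.
    tags = {k.strip().lower(): str(v) for k, v in record.get("tags", {}).items()}
    # CMDB attributes overwrite tool-supplied tags with the same key.
    tags.update(CMDB.get(host, {}))
    return {
        "metric": standard_metric_name(record["name"]),
        "host": host,
        "timestamp": int(record["ts"]),
        "value": float(record["value"]),
        "tags": tags,
    }


# A raw record as one collection tool might emit it (assumed shape).
raw = {
    "name": "CPU-Util",
    "host": "PAY-DB-01",
    "ts": "1688000000",
    "value": "87.5",
    "tags": {"Env": "prod"},
}
normalized = normalize(raw)
print(normalized)
```

Letting the CMDB overwrite tool-supplied tags is one possible design choice. It keeps a single source of truth for ownership attributes, at the cost of hiding whatever the collection tool reported for the same key.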

03. Intelligent analysis and visualization

Alarm data association analysis is built on the CMDB, and AI algorithms are embedded alongside it to provide anomaly analysis for indicators and logs. Rich reporting capabilities also support visual decision-making tools such as the leadership view, the CCPC view, and business scenario monitoring
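As a rough illustration of CMDB-based alarm association, the sketch below groups alarms whose configuration items belong to the same business system and that arrive within a short time window. The CMDB mapping, the alarm fields, and the 300-second window are assumptions made for this example, and the grouping rule is a simple stand-in for whatever analysis the platform actually performs.

```python
from collections import defaultdict

# Hypothetical CMDB relation: configuration item -> business system it supports.
CMDB_BUSINESS = {
    "pay-app-01": "payment",
    "pay-db-01": "payment",
    "core-sw-03": "network",
}

WINDOW_SECONDS = 300  # assumed correlation window


def correlate(alarms):
    """Group alarms by business system, splitting groups on the time window.

    Returns {business_system: [incident, ...]}, where each incident is a
    list of alarms ordered by timestamp.
    """
    groups = defaultdict(list)
    for alarm in sorted(alarms, key=lambda a: a["ts"]):
        system = CMDB_BUSINESS.get(alarm["ci"], "unknown")
        incidents = groups[system]
        # Join the latest incident if this alarm is within the window of
        # that incident's first alarm; otherwise open a new incident.
        if incidents and alarm["ts"] - incidents[-1][0]["ts"] <= WINDOW_SECONDS:
            incidents[-1].append(alarm)
        else:
            incidents.append([alarm])
    return dict(groups)


alarms = [
    {"ci": "pay-app-01", "ts": 1060, "msg": "request timeout"},
    {"ci": "pay-db-01", "ts": 1000, "msg": "slow query"},
    {"ci": "core-sw-03", "ts": 1100, "msg": "port flapping"},
    {"ci": "pay-app-01", "ts": 2000, "msg": "request timeout"},
]
incidents = correlate(alarms)
for system, items in incidents.items():
    print(system, [[a["ci"] for a in incident] for incident in items])
```

In this sample, the database and application alarms that arrive 60 seconds apart fall into one payment incident, while the later timeout opens a second one.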

04. System integration and function linkage

Replace the systems managed by Tivoli, such as PaaS, BaaS, host monitoring, and open platforms. At the same time, complete data access and functional linkage with peripheral systems such as network monitoring, dial testing, CMDB, process platforms, and automated scheduling, as well as cloud platforms and hyper-converged platforms

2. Achieving panoramic observability and improving business continuity

With construction complete, the platform is fully independent and controllable, and it has reduced overall operating costs while increasing efficiency. From a business perspective, it meets the goal of having operation and maintenance deliver value to business operations.

The specific results are as follows:

  • The two data centers use distributed deployment, an active-active system design, and dual data acquisition with synchronization. This allows flexible switchover, reduces the impact of dependencies between components, improves scaling efficiency, and, through the decoupled design, comprehensively improves disaster recovery and self-healing capabilities

  • A comprehensive one-stop monitoring platform was created. Operation and maintenance monitoring covers the full range of business and serves all types of operation and maintenance staff, which avoids the problems of running and maintaining multiple platforms side by side

  • In terms of data analysis and processing, alarm processing reaches 4,000 records per second and log processing reaches 100,000 records per second. The new system can run more than 100 stream and batch jobs at the same time, and it has a policy library of more than 5,000 keywords that matches tens of thousands of records per second

  • Data access and analysis cover x86 servers, virtual machines, mainframes, storage, network equipment, and other sources, with more than 30 data types. The monitoring scope has grown to 6,000+

  • Visual analysis and statistics are built around business concerns and provide a data basis for operational decisions. Scenarios implemented so far include more than 15 data reports and more than 10 scenario monitoring views
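The keyword policy library mentioned in the results can be pictured with a small sketch. The policies, the severities, and the sample log line below are hypothetical, and this is not the platform's matching engine. It only shows one common way to test a log line against many keywords in a single pass, by compiling them into one regular expression. It makes no claim about reproducing the throughput figures reported above.

```python
import re

# Hypothetical keyword policies: keyword -> severity.
POLICIES = {
    "OutOfMemoryError": "critical",
    "connection refused": "major",
    "timeout": "minor",
}

# One compiled alternation lets each log line be scanned in a single pass.
# Longer keywords are listed first so they win over shorter overlapping ones.
_PATTERN = re.compile(
    "|".join(re.escape(k) for k in sorted(POLICIES, key=len, reverse=True)),
    re.IGNORECASE,
)

# Case-insensitive lookup from matched text back to the policy entry.
_BY_LOWER = {k.lower(): (k, v) for k, v in POLICIES.items()}


def match_line(line):
    """Return (keyword, severity) pairs hit by one log line, in order of appearance."""
    return [_BY_LOWER[m.group(0).lower()] for m in _PATTERN.finditer(line)]


hits = match_line(
    "2023-07-05 ERROR java.lang.OutOfMemoryError after Connection Refused by db"
)
print(hits)
```

For a library of several thousand keywords, a dedicated multi-pattern algorithm such as Aho-Corasick is a typical alternative to a single large alternation. Which approach the platform uses is not stated in this case.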

3. Final results

  • The project helped the customer build diversified data collection capabilities and bring multi-dimensional data under a unified standard.

  • Business-oriented scenario capabilities, combined with visualization, let the customer mine the value of operation and maintenance data in depth, gain real-time insight into business operation status, give leadership an effective basis for decisions, and ensure business continuity.

The pace of domestic innovation may accelerate further in the near future. Qingchuang will continue to work with ecosystem partners in various fields to complete independent and controllable Xinchuang adaptation and to prepare in advance.

In the future, we hope to cooperate with customers in more industries, jointly promote operation and maintenance practice across industries, and help customers improve business service quality through technology.


Qingchuang Technology is a benchmark AIOps supplier that Gartner has recommended for consecutive years. The company is committed to helping enterprise customers gain better insight into operation and maintenance data, improve operation and maintenance efficiency, and fully demonstrate the impact of technical operation and maintenance on business operations.


Origin blog.csdn.net/qq_37641528/article/details/131558245