Enterprise in Practice | Building a Banking Operations and Maintenance Indicator System Under Complex Business Relationships

Background

As banking IT systems evolve through cloud migration, containerization, centralization, and microservices, the relationships between system architecture and business invocations have grown complex, and operations and maintenance management has become increasingly difficult. Banking systems are mostly built as isolated silos (the "chimney" approach), which makes data hard to share and leaves operations and maintenance teams unable to detect, respond to, and resolve problems in time. The banking industry as a whole can now collect fairly complete IT indicator data tied to business scenarios, and it urgently needs an indicator analysis system that provides quantifiable, visual, and consolidated decision support for IT management and business analysis.

However, the Chinese banking industry faces the following challenges when implementing an indicator management system:

  • The challenge of coordinating operations and maintenance data sources across the whole organization

China's banking industry is in transition from traditional to Internet-based models, and emerging technologies such as cloud computing and big data are gradually being adopted. The industry also needs massive IT computing power and real-time responsiveness to keep launching innovative services. In short, the volume of data from banking operations data sources keeps growing, while the response times that business units expect from real-time processing keep shrinking.

In addition, for the indicator data of a single business department, staff can rely on personal experience with the business scenario to make quick judgments and apply them to IT operations and maintenance management. The banking system as a whole, however, generates large amounts of indicator data continuously. IT managers cannot judge the importance and priority of each indicator to the business, let alone sort out the correlations between indicators and the business, so the indicator data cannot realize its potential value.

  • The challenge of continuously innovating intelligent algorithm libraries

The digital transformation of China's banking industry is challenging the way IT is organized and managed. Technology is increasingly integrated into the business, and IT is no longer just a supporting function but also a driving force for innovation. Because of its own characteristics, the banking industry urgently needs to speed up business innovation, and intelligent algorithm models built for innovative business scenarios are the key to meeting that need.

  • The challenge of keeping up with new theories of intelligent operations and maintenance

Although intelligent operations and maintenance has been developing for many years, it is still at an exploratory stage. While putting an indicator management system into practice, teams need to keep learning and absorbing new theoretical frameworks and standards in the field, such as ITIL 4 and IT4IT, so that the indicator management system can do more to advance intelligent operations and maintenance for the business.

Implementing a bank indicator management system in practice

A complete indicator management system should be based on top-level planning for the enterprise's business and IT operations and maintenance management. It classifies and tiers the isolated data of each business system so that the indicator data of business scenarios can be presented more systematically and hierarchically. The result is data-driven, business-oriented monitoring and management of business operations, which lets IT administrators simplify complex IT management work, improve IT management methods, and raise the overall efficiency of the enterprise's IT operations.

Implementation plan

The bank's indicator management system project is driven by top-level indicator management. It starts from the business perspective, is organized around business scenarios, and aims at business continuity. Working from the business scenarios, the team traced the IT call chain forward and connected data sources in reverse, and ultimately built an indicator management system that gives an overview of the health of all business scenarios and a multi-dimensional view of IT indicators.

First, starting from the bank's core business scenarios, the IT data sources and business data sources of the application systems go through unified data collection, indicator extraction, and data storage on a professional operations and maintenance database platform. Second, consultation and research on the indicator management system are carried out, and the IT data and business data are sorted into indicators and a construction plan, forming indicator specifications and an implementation system. Finally, the construction of the bank's indicator management system is completed. At the same time, based on the daily operations and maintenance scenarios of the banking industry, functional modules such as the workbench, visual control, and AIOps are implemented as upper-layer applications of the indicator management system platform.

Indicator system construction

  1. Business research: focus on business scenarios and sort out business indicators

Business research is used to sort out the bank's core business, including offline payment (for example, over-the-counter deposit), online payment (for example, mobile banking), wealth management, and so on. Departmental reports and the business indicators that leaders refer to when making decisions are then used to identify the key indicators of the core business scenarios. Examples are as follows:

  2. Data access: map the IT call chain topology and measure technical indicators

After researching the data of the bank's IT systems and automated configuration platform, we sorted out the application systems that support the bank's core business, such as the omni-channel payment system, pre-payment system, and payment and settlement system. Each business system has a complete IT stack. Based on the monitoring status of the core business systems and the indicators they focus on, the technical indicator monitoring system is divided into five layers that follow the top-down call chain dependencies: application layer, service layer, middleware layer, process layer (virtual layer), and host layer. Drawing on experience from building indicator management systems at multiple banks, an indicator management system for the monitoring sources is built, together with a topology dependency matrix between the indicators of each layer. Examples are as follows:

The technical indicators of each layer above are independent real-time time-series data streams. The topology of call chain relationships between the technical indicators of each layer is established from the configuration item data of the bank's automated configuration platform. An example is as follows:
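A minimal Python sketch of such a call chain topology follows. The component names, the `CALL_CHAIN` mapping, and both helper functions are hypothetical and are not taken from the bank's configuration platform; they only illustrate how configuration items could link the five layers.

```python
from collections import deque

# Hypothetical configuration items: each component maps to the components
# it calls in a lower layer. The prefix before the colon names the layer.
CALL_CHAIN = {
    "app:counter-deposit": ["svc:deposit-api"],
    "svc:deposit-api": ["mw:message-queue", "mw:database"],
    "mw:message-queue": ["proc:mq-broker"],
    "mw:database": ["proc:db-server"],
    "proc:mq-broker": ["host:node-1"],
    "proc:db-server": ["host:node-2"],
}


def downstream(root, call_chain):
    """Return every component reachable from root, in breadth-first order."""
    seen = {root}
    order = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in call_chain.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def by_layer(components):
    """Group component names by their layer prefix."""
    layers = {}
    for name in components:
        layer = name.split(":", 1)[0]
        layers.setdefault(layer, []).append(name)
    return layers


if __name__ == "__main__":
    deps = downstream("app:counter-deposit", CALL_CHAIN)
    for layer, names in by_layer(deps).items():
        print(layer, names)
```

Walking the mapping from an application entry yields the set of lower-layer components whose indicator streams are relevant to that application.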

  3. Model configuration: quantify business attention and model indicator health

After the indicator management system has been layered, the core business indicators are combined with result-oriented reverse reasoning to evaluate the weights of the indicators at each layer: the stability of business indicators depends on the stability of the business subsystems, which depends on the stability of the IT application systems, which depends on the stability of every layer of the IT system, which in turn depends on the stability of the individual technical indicators. So when an atomic-level technical indicator becomes unstable, how does the risk escalate and propagate upward? The influence of each technical indicator is quantified through weighted calculation, by rating the technical indicators and assigning them weights.
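One way to picture this upward propagation is a weighted roll-up over the dependency hierarchy. The sketch below is an assumption for illustration only: the tree, the weights, the 0 to 100 score scale, and the `rollup` function are hypothetical, and the source does not state the actual aggregation formula.

```python
# Hypothetical hierarchy: each parent lists (child, weight) pairs.
# Leaves are atomic technical indicators with a score from 0 to 100.
TREE = {
    "business:counter-deposit": [("app:deposit", 1.0)],
    "app:deposit": [("layer:service", 0.75), ("layer:host", 0.25)],
    "layer:service": [
        ("ind:request_success_rate", 0.75),
        ("ind:avg_response_time", 0.25),
    ],
    "layer:host": [("ind:cpu_usage", 0.5), ("ind:memory_usage", 0.5)],
}


def rollup(node, tree, leaf_scores):
    """Health of a node: its own score if it is a leaf, otherwise the
    weighted average of the rolled-up health of its children."""
    children = tree.get(node)
    if not children:
        return leaf_scores[node]
    total_weight = sum(weight for _, weight in children)
    weighted_sum = sum(
        rollup(child, tree, leaf_scores) * weight for child, weight in children
    )
    return weighted_sum / total_weight


if __name__ == "__main__":
    scores = {
        "ind:request_success_rate": 40.0,  # one unstable atomic indicator
        "ind:avg_response_time": 100.0,
        "ind:cpu_usage": 100.0,
        "ind:memory_usage": 100.0,
    }
    print(rollup("business:counter-deposit", TREE, scores))
```

With these assumed weights, a drop in a heavily weighted service-layer indicator lowers the business-level score noticeably, while the same drop in a lightly weighted indicator would have a smaller effect.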

For example, over-the-counter deposit is a core basic business, and its health needs to be modeled along three dimensions: life-and-death indicators, key indicators, and standard indicators:

The life-and-death indicator of over-the-counter deposits: transaction success rate; a single indicator that reflects business availability.

The calculation method of the transaction success rate: the number of successful transactions per unit time divided by the total number of transactions in the same unit time.
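The calculation can be sketched in Python as follows. The function name and the handling of an empty window are assumptions added for illustration; the source only defines the ratio itself.

```python
def transaction_success_rate(successful, total):
    """Successful transactions divided by total transactions in the same
    time window. Returns None for an empty window (an assumption), since
    the ratio is undefined when no transactions occurred."""
    if successful < 0 or total < 0 or successful > total:
        raise ValueError("counts must satisfy 0 <= successful <= total")
    if total == 0:
        return None
    return successful / total


if __name__ == "__main__":
    # 990 successful transactions out of 1000 in one window.
    print(transaction_success_rate(990, 1000))
```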

Key indicators of over-the-counter deposits: request success rate, average response time; a set of technical indicators that directly affect the business situation.

Interpretation of indicators: When the request success rate falls below the expected threshold, end users are frequently failing when they use the over-the-counter deposit function, which hurts the user experience and in turn raises the customer churn rate.

Standard indicators of over-the-counter deposits: memory usage rate, CPU usage rate; individual monitoring technical indicators related to business situation.

Interpretation of indicators: When the CPU usage and memory usage of physical resources at the host layer suddenly increase, a single node of the IT application system may become unstable, but with microservices and a distributed architecture this risk does not spread upward to affect the business layer.

Classifying and rating indicators as life-and-death, key, or standard makes it possible to quantify more accurately how much each indicator weighs on the health of a business topic, which is an important factor in modeling the health of business scenarios. The weights of the technical indicators are then used to compute the health score of the business scenario.
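A minimal sketch of a tier-weighted health score is shown below. The tier weights, the 0 to 100 score scale, and the rule that a failing life-and-death indicator forces the score to zero are all assumptions made for illustration; the source does not give the actual weights or formula.

```python
from dataclasses import dataclass


@dataclass
class Indicator:
    name: str
    tier: str     # "life_and_death", "key", or "standard"
    score: float  # normalized health of this indicator, 0 to 100


# Assumed weights per tier; the real values are not given in the source.
TIER_WEIGHTS = {"life_and_death": 0.5, "key": 0.375, "standard": 0.125}
# Assumed floor: below this, a life-and-death indicator marks the business as down.
LIFE_AND_DEATH_FLOOR = 60.0


def health_score(indicators):
    """Weighted average of per-tier mean scores, gated by life-and-death indicators."""
    for ind in indicators:
        if ind.tier == "life_and_death" and ind.score < LIFE_AND_DEATH_FLOOR:
            return 0.0
    weighted_sum = 0.0
    total_weight = 0.0
    for tier, weight in TIER_WEIGHTS.items():
        tier_scores = [ind.score for ind in indicators if ind.tier == tier]
        if tier_scores:
            weighted_sum += weight * sum(tier_scores) / len(tier_scores)
            total_weight += weight
    if total_weight == 0.0:
        raise ValueError("no indicators with a known tier")
    return weighted_sum / total_weight


if __name__ == "__main__":
    deposit = [
        Indicator("transaction_success_rate", "life_and_death", 100.0),
        Indicator("request_success_rate", "key", 80.0),
        Indicator("avg_response_time", "key", 100.0),
        Indicator("cpu_usage", "standard", 60.0),
        Indicator("memory_usage", "standard", 100.0),
    ]
    print(health_score(deposit))
```

Averaging within a tier before applying the tier weight keeps the score stable when a tier gains or loses indicators, which is one possible design choice among several.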

After health is defined and the weights are set, a health model covering the over-the-counter deposit business is established. An example is as follows:

  4. Global overview: the workbench gives an overview of business scenarios, and the application wall gives a bird's-eye view of indicator status

As the upper-layer application of the indicator management system, the workbench embodies the design concept of indicator system management. It supports viewing the health status of each layer from tabs such as the business scenario wall, application wall, service wall, middleware wall, and host wall. It also manages the health of business indicators and IT indicators in a unified way, so that users can drill down from the business scenario wall to the health status of the technical indicators at every layer of the IT system, ultimately ensuring the continuity of business operations.

  • Operational Perspective of Business Units

Business departments can view the health score of each business scenario on the business scenario wall and click a business scenario to see the topology map of the business subsystems it depends on. Color coding (green for healthy, orange for danger, red for disaster) shows at a glance the running status of each IT application system in a business subsystem. Clicking an IT application system shows the running status and trend charts of its key indicators. This makes it possible to analyze the health of business scenarios, pinpoint the source of a problem, and improve the efficiency of cross-departmental communication.

Business Perspective - Business Subsystem Dependence Topology

Business Perspective - In-depth Analysis of Operation and Maintenance Situation of Business Subsystems
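The color coding used on the business scenario wall can be thought of as a threshold mapping from health score to status. The sketch below assumes a 0 to 100 score and cut-off values of 80 and 60; these thresholds and the function name are hypothetical, since the source names only the three colors.

```python
def status_color(score, danger_below=80.0, disaster_below=60.0):
    """Map a 0-100 health score to a display color.

    The thresholds are assumed defaults: green for healthy,
    orange for danger, red for disaster."""
    if not 0.0 <= score <= 100.0:
        raise ValueError("score must be between 0 and 100")
    if score < disaster_below:
        return "red"
    if score < danger_below:
        return "orange"
    return "green"


if __name__ == "__main__":
    for value in (95.0, 72.5, 30.0):
        print(value, status_color(value))
```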

  • Operational perspective of the technical department

The operations and maintenance department gets an overview of the health scores of the technical indicators at each layer of the IT application systems through the application wall, service wall, middleware wall, and host wall. Staff can click an IT application system their department is responsible for to see the indicator topology of each layer of the IT system, then click an instance object in any layer to view real-time data such as the running status and trend charts of its indicators. This supports troubleshooting anomalies in IT system technical indicators and enables prevention beforehand, alerting during an incident, and tracing afterward, which improves operations and maintenance efficiency.

Operation and maintenance perspective - three-dimensional topology of the application system

Operation and Maintenance Perspective - In-depth Analysis of Instance Object Index Situation

Summary and Outlook

The practical results of the bank indicator management system described above can be summarized in two aspects:

  • The layered, chained call chain construction approach strengthens the integrity of the operations and maintenance process: the call relationships between application systems at every level of the IT operations and maintenance process are connected, achieving end-to-end coverage, ensuring the continuity of IT system operations and maintenance, and improving overall operations and maintenance efficiency.

  • A topology visualization design that combines business with operations and maintenance improves business operations efficiency: the link between business and operations and maintenance becomes more transparent, and processes are optimized on the basis of measurement, establishing an organizational culture of efficient collaboration, high empowerment, and continuous improvement.

IT system construction in the banking industry is still upgrading and evolving, and bottlenecks in business operations and maintenance come and go. The indicator management system will likewise keep pace with the trends toward finer-grained processes, more intelligent algorithms, and integrated operations and maintenance.

Open source benefits

Cloudwise has open-sourced FlyFish, a data visualization orchestration platform. By configuring data models, it offers users hundreds of visual graphic components, so eye-catching large-screen dashboards that fit their own business needs can be built with zero coding. FlyFish also provides flexible extension capabilities, supporting component development, custom functions, and global events through configuration, to keep development and delivery efficient in complex scenarios.

Follow the links below; you are welcome to like FlyFish and give it a Star. Take part in component development for a chance at more than 10,000 yuan in cash.

GitHub address: https://github.com/CloudWise-OpenSource/FlyFish

Gitee address: https://gitee.com/CloudWise/fly-fish

Super Experience Officer Event: http://bbs.aiops.cloudwise.com/d/712-flyfish

10,000 yuan cash activity: http://bbs.aiops.cloudwise.com/t/Activity

Scan the QR code below with WeChat and add the note [FlyFish] to join the AIOps community FlyFish developer group and talk directly with the FlyFish project PMC.


Origin my.oschina.net/yunzhihui/blog/5550015