Intelligent operation and maintenance solutions for universities (with an implementation case)

Background and Challenges

Since the informatization plan of the "Twelfth Five-Year Plan" period, "three links and two platforms" has been the focus of informatization in the education sector. Against this background, local education departments and schools have carried out a large number of construction projects.

As new educational applications keep emerging and university informatization moves from the digital campus to the smart campus, campus IT systems face new changes and challenges, mainly in two areas:

  • Application platform level: Important business systems on the education resource public service platform and the education management public service platform, such as the campus one-card, finance, and grading systems, demand higher availability and agility;

  • IT operation and maintenance level: The large number of systems and devices places higher demands on operation and maintenance. The systems are complex and effective monitoring tools are lacking, so operation and maintenance problems are hard to locate quickly and the results of operation and maintenance work are hard to evaluate.

In addition, the move from digital campus to smart campus brings the following pain points to campus IT operation and maintenance:

  • Health visibility: It is difficult to establish an overall health scoring system for the education resource public service platform and the education management public service platform, and there is no management by quantitative indicators;

  • Difficult fault localization: Because the quality of systems provided by third parties is hard to guarantee, the problems teachers and students run into when using the various platforms (such as the lesson preparation system, the self-learning system, and the campus comprehensive management system) are hard to reproduce and locate;

  • High concurrency: Concurrency bottlenecks are hard to evaluate in advance, so periodic traffic peaks, such as those around exams, often cause system downtime.

Solution

Intelligent business operation and maintenance in colleges and universities proceeds in three stages: completing the operation and maintenance toolset, standardizing IT operation and maintenance management, and intelligent operation and maintenance.

Completing the university operation and maintenance toolset

The tool completion stage fills the gaps in campus IT monitoring: infrastructure monitoring, business application monitoring, user experience monitoring, and centralized alarms.

In addition, as smart campuses are built and school IT infrastructure gradually improves, multiple campuses often share the same computer room, which makes the continuous availability and security of the IT infrastructure particularly important.

Against this background, Cloudwise infrastructure monitoring can quickly bring new resources under monitoring through its broad protocol access and model-based definition capabilities, providing integrated monitoring of resources and a real-time view of the health of servers, networks, hardware, and software. It evaluates and measures infrastructure utilization, provides accurate data for infrastructure optimization, helps users understand the processing capacity of their equipment, predicts potential failures, and gives early warning.

For visual real-time alarms in the campus computer room, Cloudwise provides a three-dimensional panoramic view of the room, covering the cabinets as well as individual equipment outside the cabinets. Temperature and humidity data are transmitted to the Cloudwise server for unified monitoring.

For indicator detection, Cloudwise ships with tens of thousands of built-in indicator items that work out of the box.

Cloudwise can proactively discover business problems and help ensure the high availability of the dedicated lines serving the teaching buildings on each campus. On the one hand, Cloudwise runs 7×24 active probing tests against the externally facing services of colleges and universities from nodes across the country and even around the world, so that problems are detected in time and alarms are raised proactively. On the other hand, monitoring the quality of the dedicated lines helps improve the user experience.
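The idea of probing one service from many nodes can be illustrated with a minimal sketch. It is not the product's actual implementation: the function name `probe_from_nodes`, the node names, and both thresholds are assumptions chosen for illustration. Each node is represented by a callable that returns the measured latency in seconds or raises an exception on failure, and an alarm is raised only when enough nodes agree that the service is down or slow.

```python
from typing import Callable, Dict


def probe_from_nodes(
    probes: Dict[str, Callable[[], float]],
    max_latency_s: float = 2.0,   # hypothetical latency budget
    alarm_ratio: float = 0.5,     # hypothetical share of failing nodes that triggers an alarm
) -> dict:
    """Run one probe per node and decide whether to raise an alarm.

    `probes` maps a node name to a callable that returns the latency in
    seconds, or raises an exception if the service could not be reached.
    """
    failed_nodes = {}
    for node, probe in probes.items():
        try:
            latency = probe()
        except Exception as exc:  # any failure counts as "unreachable from this node"
            failed_nodes[node] = f"error: {exc}"
            continue
        if latency > max_latency_s:
            failed_nodes[node] = f"slow: {latency:.2f}s"

    ratio = len(failed_nodes) / len(probes) if probes else 0.0
    return {
        "failed_nodes": failed_nodes,
        "failure_ratio": ratio,
        "alarm": ratio >= alarm_ratio,
    }
```

Requiring agreement between several nodes is one simple way to avoid alarming on a problem that is local to a single probing node rather than to the service itself.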

As the business systems on campus mature, the calling relationships between them grow more complicated, which makes it hard to locate problems quickly. Faced with complex and diverse systems, a school needs visual tools to manage its business systems centrally and to quantify the quality of systems provided by third parties, so that faults can be located and analyzed.

Cloudwise provides end-to-end, full-stack application performance management. It supports mobile and smart devices to better understand the real user experience, supports end-to-end tracing of highly virtualized applications to track load changes, and supports deployment across public cloud, private cloud, and hybrid cloud environments. These capabilities help locate problems in the educational administration system quickly, in the following respects:

  • Fine-grained operation and maintenance: including automatic discovery of the global topology, rapid location of performance problems, and correlation analysis between applications;

  • User experience: including automatic acquisition of all user behaviors, fine-grained tracking of real user behaviors, operations and process performance;

  • In-depth diagnosis: including code-level problem diagnosis, analysis of the performance impact of statements in the call stack, and detailed analysis of database SQL;

  • Behavior analysis: including statistical analysis of business behavior, end-to-end transaction tracking, and quick location of performance problems.

For business analysis, Cloudwise can automatically link an entire request together using a unique request ID, from the front end through the back-end application code to the infrastructure, and restore a snapshot of the problem from the sequence of a single request. This helps colleges and universities work from the outside in toward the root cause of a system problem, so that problems encountered by teachers and students can be reproduced quickly.
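Linking records by request ID can be sketched as follows. This is a simplified illustration under stated assumptions, not the product's data model: the field names `request_id`, `layer`, `start_ms`, and `self_ms` (time spent in a layer itself, excluding downstream calls) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def build_traces(spans: List[dict]) -> Dict[str, List[dict]]:
    """Group spans by request ID and order each group by start time."""
    traces = defaultdict(list)
    for span in spans:
        traces[span["request_id"]].append(span)
    for chain in traces.values():
        chain.sort(key=lambda s: s["start_ms"])
    return dict(traces)


def slowest_span(chain: List[dict]) -> dict:
    """Return the span that spent the most time in its own layer."""
    return max(chain, key=lambda s: s["self_ms"])


# Spans arrive out of order and mixed across requests.
spans = [
    {"request_id": "r1", "layer": "database", "start_ms": 30, "self_ms": 420},
    {"request_id": "r1", "layer": "browser", "start_ms": 0, "self_ms": 15},
    {"request_id": "r2", "layer": "browser", "start_ms": 5, "self_ms": 10},
    {"request_id": "r1", "layer": "app", "start_ms": 12, "self_ms": 40},
]
traces = build_traces(spans)
```

Here `traces["r1"]` walks the request from the browser through the application to the database, and `slowest_span(traces["r1"])` points at the database layer as the place to look first.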

Cloudwise can quickly locate code-level problems. Its capabilities include problem discovery based on business topology, crash analysis for mobile code, real user experience monitoring and analysis on the web side, analysis of requests and key transactions, and single-request analysis.

Cloudwise can integrate and organize applications and IT resources around the various educational administration systems. On the one hand, the system architecture topology diagram shows, layer by layer, the health of all objects in the system and the dependencies between them. On the other hand, users can quickly view vertical dependencies by resource and analyze associated objects, which speeds up root cause troubleshooting.

Cloudwise provides a unified outlet for alarms, making alarm handling centralized, automated, diversified, intelligent, and user-friendly. This mainly includes the following:

  • Aggregating scattered alarms, converting them to a standard format, and processing them centrally;

  • Automation of alarm handling, confirmation, dispatch, escalation, and recovery;

  • The alarm notification function supports a variety of notification methods to ensure that the notification of problem events can be delivered immediately;

  • Rule-based compression and merging of massive, continuous, redundant alarm messages, which reduces both the number and the frequency of alarm messages;

  • An alarm silence option that silently processes alarms raised within a system maintenance time window, reducing unnecessary alarm disturbance.
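The compression and silence rules above can be sketched in a few lines. This is a minimal illustration, not the product's rule engine: the `Alarm` fields, the five-minute merge window, and the shape of the `silences` mapping are all assumptions. Alarms from the same source and rule that arrive close together are merged into one group with a counter, and alarms raised inside a maintenance window are dropped.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Alarm:
    source: str        # host or system that raised the alarm
    rule: str          # name of the rule that fired
    timestamp: float   # seconds since some epoch


def compress_alarms(
    alarms: List[Alarm],
    merge_window_s: float = 300.0,  # hypothetical merge window
    silences: Optional[Dict[str, List[Tuple[float, float]]]] = None,
) -> List[dict]:
    """Drop silenced alarms and merge repeats of the same (source, rule)."""
    silences = silences or {}
    merged: List[dict] = []
    latest_group: Dict[Tuple[str, str], dict] = {}

    for alarm in sorted(alarms, key=lambda a: a.timestamp):
        # Silence: skip alarms raised inside a maintenance window [start, end).
        windows = silences.get(alarm.source, [])
        if any(start <= alarm.timestamp < end for start, end in windows):
            continue

        key = (alarm.source, alarm.rule)
        group = latest_group.get(key)
        if group and alarm.timestamp - group["last_seen"] <= merge_window_s:
            # Repeat of a recent alarm: count it instead of notifying again.
            group["count"] += 1
            group["last_seen"] = alarm.timestamp
        else:
            group = {
                "source": alarm.source,
                "rule": alarm.rule,
                "first_seen": alarm.timestamp,
                "last_seen": alarm.timestamp,
                "count": 1,
            }
            latest_group[key] = group
            merged.append(group)
    return merged
```

Each returned group would correspond to one notification, so a burst of identical alarms produces a single message with a repeat count rather than a flood.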

Standardization of operation and maintenance management in colleges and universities

The standardization stage of operation and maintenance management in colleges and universities covers the implementation of ITSM, CMDB, and operation and maintenance automation. Cloudwise uses standardized management processes to regulate third-party services and thereby improve the experience of teachers and students.

For the overall design of IT service management (ITSM), the Cloudwise digital operation service management product can meet the needs of building an IT service management system in colleges and universities. The requirements of a smart campus project can be met by combining existing product functions, system APIs, custom processes, and secondary development or customization of some functions.

For centralized management and control of information assets, Cloudwise maintains CMDB data based on automatic discovery. It automatically collects configuration item information from the IaaS, PaaS, and SaaS layers through methods such as agents and APIs. It supports federated collection from multiple data sources and reconciles the data collected from each source to ensure that configuration item information in the CMDB is complete and accurate.
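Reconciliation across data sources can be illustrated with a simple priority-based merge. This is a sketch under stated assumptions, not the product's reconciliation logic: the function name, the `ci_id` key, and the rule "the higher-priority source wins, but empty values never overwrite real ones" are chosen for illustration.

```python
from typing import Dict, List


def reconcile(
    records_by_source: Dict[str, List[dict]],
    priority: List[str],
) -> Dict[str, dict]:
    """Merge configuration item records from several sources.

    `priority` lists source names from most to least trusted. Sources are
    applied from least to most trusted, so trusted values overwrite the rest.
    Empty values are ignored, which lets a less trusted source fill gaps.
    """
    merged: Dict[str, dict] = {}
    for source in reversed(priority):
        for record in records_by_source.get(source, []):
            item = merged.setdefault(record["ci_id"], {})
            for field, value in record.items():
                if value not in (None, ""):
                    item[field] = value
    return merged


records = {
    "agent": [{"ci_id": "srv-1", "ip": "10.0.0.5", "os": "CentOS 7", "owner": ""}],
    "api": [{"ci_id": "srv-1", "ip": "10.0.0.9", "owner": "network-center"}],
}
cmdb = reconcile(records, priority=["agent", "api"])
```

In this example the agent's IP address wins the conflict, while the owner, which the agent did not report, is filled in from the API source.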

The following figure shows the overall architecture design of the CMDB. With the Cloudwise CMDB, operation and maintenance metadata can be managed comprehensively from the IaaS and PaaS layers up to the SaaS layer, providing complete and accurate metadata for systems such as monitoring, service management, and automation.

The following shows the virtuous data cycle when the CMDB is applied to monitoring, alarm processing, and the automation platform. As the configuration information base for all kinds of resources, the CMDB lets alarm information be drilled down, so that a single-point alarm can be extended to the specific scope it affects. The automation platform triggers system repair through alarm self-healing, obtaining the target list for task execution from the CMDB, which improves the accuracy and feasibility of the task. After the faulty system is repaired, the CMDB automatically collects system information again and updates the original records to complete the archive.

For the visualization of information office work orders, Cloudwise uses work order statistics to make the work of the university information office traceable, so that it can be optimized continuously.
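Extending a single-point alarm to its impact scope amounts to walking the dependency relationships stored in the CMDB. The sketch below assumes a hypothetical `depends_on` mapping from each configuration item to the items it relies on; the item names are made up. It inverts the mapping and uses a breadth-first search to collect everything that directly or indirectly depends on the failed item.

```python
from collections import deque
from typing import Dict, List


def impacted_items(depends_on: Dict[str, List[str]], failed_ci: str) -> List[str]:
    """Return all items that depend, directly or indirectly, on `failed_ci`."""
    # Invert the relation: for each item, who depends on it?
    dependents: Dict[str, List[str]] = {}
    for item, dependencies in depends_on.items():
        for dependency in dependencies:
            dependents.setdefault(dependency, []).append(item)

    seen = {failed_ci}
    queue = deque([failed_ci])
    impacted: List[str] = []
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in seen:  # guards against cycles and duplicates
                seen.add(parent)
                impacted.append(parent)
                queue.append(parent)
    return impacted


depends_on = {
    "grading-system": ["app-server-1"],
    "one-card": ["app-server-2"],
    "app-server-1": ["db-1"],
    "app-server-2": ["db-1"],
    "library": ["db-2"],
}
scope = impacted_items(depends_on, "db-1")
```

An alarm on `db-1` is thus extended to both application servers and the two business systems on top of them, while the unrelated `library` system is left out. The same list could serve as the target list for a self-healing task.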

Intelligent operation and maintenance of colleges and universities

Intelligent operation and maintenance in colleges and universities draws on intelligent analysis, predictive analysis, machine learning, and other AI technologies. On the one hand, implementing AIOps scenarios transforms the operation and maintenance management model and raises the level of intelligence and automation. On the other hand, artificial intelligence algorithms detect hidden business risks from the characteristics of the data, so that faults can be predicted from historical data.

The indicator anomaly detection provided by Cloudwise uses algorithms to find abnormal points in the time series of KPIs (Key Performance Indicators) and then notifies operation and maintenance personnel of the related risks through alarms. Indicator anomaly detection is also a prerequisite for other AIOps scenarios: its results are the input for later scenarios such as alarm convergence, root cause location, and fault self-healing.
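As a point of reference, the simplest form of anomaly detection on a KPI time series is a rolling z-score: each point is compared with the mean and standard deviation of the points just before it. The sketch below uses that technique only as an illustration; the article does not say which algorithms the product uses, and the window size and threshold are assumptions.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def detect_anomalies(
    values: Sequence[float],
    window: int = 5,         # hypothetical number of preceding points to compare against
    threshold: float = 3.0,  # hypothetical z-score above which a point is flagged
) -> List[int]:
    """Return the indexes of points that deviate strongly from recent history."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            # Flat history gives no scale to judge against; skipped in this sketch.
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Response times in milliseconds with one spike at index 6.
response_ms = [100, 102, 98, 101, 99, 100, 250, 101, 100]
spikes = detect_anomalies(response_ms)
```

A fixed threshold like this ignores daily and weekly cycles, and a spike inflates the statistics of the windows that follow it, which is part of why production systems tend to rely on more robust, learned baselines.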

The single-index intelligent anomaly detection is shown in the following figure:

The multi-index intelligent root cause analysis is shown in the following figure:

Intelligent log anomaly detection includes log anomaly pattern detection, log statistics anomaly detection, log sequence anomaly detection, and other anomaly detection.

Cloudwise offers an approach to operation and maintenance centered on user experience. From data monitoring, to analysis and optimization, to management guidance, it raises the overall satisfaction of teachers and students. It moves through four stages (basic, improvement, management, and advanced) and completes the evolution from tool-based operation and maintenance to intelligent operation and maintenance.

Implementation practice

Case: a unified monitoring platform at a university

Requirements and pain points

  • There are many business systems, and the calling relationships between them are hard to visualize;

  • The user access experience is hard to perceive;

  • When a problem occurs, the root cause cannot be located quickly;

  • The many systems need centralized management and centralized monitoring.

Solution highlights

  • A unified monitoring platform built for the existing business systems;

  • End-to-end tracking of the user experience of business systems using APM probe technology.

Solution value

  • Management value: overall control over a large, diverse, and complex set of business systems;

  • Operation, maintenance, and development value: effective monitoring of all access data of the business systems, with accurate localization and in-depth diagnosis of problems from the global view down to the local one (for example, business system runtime topology, access efficiency, database query statements, and host information);

  • Overall value: rapid improvement of business system performance, a satisfactory experience for teachers and students across the school, and assurance that normal teaching activities can proceed.

FlyFish open source benefits

Cloudwise has open-sourced FlyFish, a data visualization orchestration platform. By configuring a data model, users get access to hundreds of visual graphic components and can build a polished large-screen visualization that meets their business needs with zero coding. FlyFish also provides flexible extension capabilities, supporting component development, custom functions, and global event configuration, to ensure efficient development and delivery in complex scenarios.

If you like our project, please visit the code repository addresses below and click Star on the GitHub or Gitee repository; we need your encouragement and support. You can also join the FlyFish project and become a FlyFish Contributor, with 10,000 yuan in cash rewards on offer.

GitHub address: https://github.com/CloudWise-OpenSource/FlyFish

Gitee address: https://gitee.com/CloudWise/fly-fish

Origin my.oschina.net/yunzhihui/blog/5577583