EasyStack: "Zero Intervention" Inspection for Ultra-Large-Scale Cloud Computing Centers

Editor's note:

EasyStack ECS, EasyStack's new-generation private cloud, productizes the operation and maintenance experience gained from more than 1,000 large and medium-sized enterprise customers and from cloud platforms at the scale of tens of thousands of nodes, in order to make operation and maintenance lightweight. It is built on a secure, stable, and efficient new-generation distributed cloud operating system for data centers. Its integrated, scenario-based design separates the platform from the services, which makes the whole platform evolvable and light to operate. On the operations side, it provides intelligent, unified operation and maintenance for ultra-large-scale cloud computing centers: it visualizes and automates logs, monitoring, and alarms, and it also detects changes in system topology and service status on its own, which enables intelligent sensing, fault pre-diagnosis and analysis, and rapid self-healing.

This article is the intelligent inspection installment of the EasyStack lightweight operation and maintenance series.

Regular inspections detect system abnormalities in time and help avoid incidents. Traditional private cloud inspections, however, rely on staff who check the status of servers, storage, network, and other equipment one by one every day, or dig useful information out of thousands of log entries. This work is time-consuming and labor-intensive, with long cycles and poor reliability. By contrast, intelligent inspection automatically inspects the cloud platform's infrastructure, raises an alarm automatically when it finds an abnormality, and collects inspection logs with one click, making the entire inspection process intelligent.

Unified inspection of ultra-large-scale distributed cloud computing centers

After a phase of large-scale growth, cloud computing centers are now moving toward distributed architectures that integrate physical and virtual resources into a unified logical resource pool, improving resource utilization and management efficiency. In a distributed architecture, the functional modules of the application systems are deployed in a distributed manner, and the fine-grained division of business functions leads to many coexisting versions and complex call relationships between modules. A traditional private cloud can only operate and maintain each regional resource pool independently, with a separate operation and maintenance system built for each pool, so integrated operation and maintenance of the entire cloud platform is hard to achieve.

Schematic diagram of intelligent inspection in EasyStack's new-generation private cloud

EasyStack ECS, EasyStack's new-generation private cloud, is built on a secure, stable, and efficient new-generation distributed cloud operating system for data centers, and provides intelligent, unified operation and maintenance for ultra-large-scale cloud computing centers.

First, EasyStack has productized the operation and maintenance experience gained from more than 1,000 large and medium-sized enterprise customers and from cloud platforms at the scale of tens of thousands of nodes into an operation and maintenance knowledge base. It then performs inspections and other operations automatically through the event grid service. The event grid service is a built-in capability of EasyStack's new-generation private cloud and provides event orchestration. Through its API, it uses the most efficient path between cloud services and physical devices to sense events promptly and execute operation and maintenance actions dynamically, regardless of deployment scale or deployment form, helping enterprises improve situational awareness and agility.
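The idea of event orchestration can be illustrated with a minimal publish/subscribe sketch. This is a toy in-process router, not the EasyStack event grid API: the class `EventGrid`, the event type `host.unreachable`, the node name, and both handler functions are assumptions made up for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical event shape: a plain dict with "type" and "source" keys.
Event = Dict[str, str]
Handler = Callable[[Event], str]


class EventGrid:
    """Toy in-process event router (illustrative only, not a real product API)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        # Register an action to run whenever this event type is published.
        self._handlers[event_type].append(handler)

    def publish(self, event: Event) -> List[str]:
        # Run every handler registered for the event type, in subscription order.
        return [handler(event) for handler in self._handlers.get(event["type"], [])]


def run_inspection(event: Event) -> str:
    return f"inspect {event['source']}"


def collect_logs(event: Event) -> str:
    return f"collect logs from {event['source']}"


grid = EventGrid()
grid.subscribe("host.unreachable", run_inspection)
grid.subscribe("host.unreachable", collect_logs)

actions = grid.publish({"type": "host.unreachable", "source": "node-17"})
print(actions)
```

Keeping the mapping from event types to actions in one place is what lets the same actions fire regardless of where the event originated.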

"Zero intervention" throughout the inspection process, avoiding the risks of manual operation

In a traditional private cloud, the monitoring, inspection, and logging systems are built separately, and operation and maintenance staff often have to inspect manually and enter inspection logs by hand. EasyStack ECS, EasyStack's new-generation private cloud, lets operation and maintenance workflows be defined flexibly for each business scenario and upgrades manual checks to 7×24 unattended automated inspection. Because the whole inspection runs unattended, the risks of manual operation are avoided. The operation of the cloud platform is tracked and recorded in real time, so abnormal conditions are identified early and anticipated failures are flagged in advance, which improves inspection efficiency and keeps the business running safely and stably.

1. Zero intervention in the inspection process: 360° perception of cloud platform operating status

Daily inspection covers routine health checks of the cloud platform and checks for abnormal resource performance indicators. Automatic inspection scripts are built into EasyStack's new-generation private cloud, ECS, and the inspection tasks and their schedule are defined in advance. Without affecting the customer's business, the system automatically checks and analyzes the operating status and capacity status of physical resources, computing resources, storage resources, network resources, cloud services, and operating systems. This gives customers a 360° in-depth view of the cloud platform's operating status, so that managers can inspect remotely, find, report, and handle problems promptly, and prevent problems before they occur. It also makes real-time, remote handling of alarms possible.
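A predefined inspection task of this kind can be sketched as a list of checks, each with a resource category, a metric name, and a threshold. Everything named here is an assumption for illustration: the metric names, the thresholds, and the `read_metric` callable stand in for whatever the platform actually collects, and scheduling is left to an external scheduler.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CheckResult:
    category: str
    name: str
    value: float
    threshold: float

    @property
    def ok(self) -> bool:
        # A check passes when the measured value does not exceed its threshold.
        return self.value <= self.threshold


# Hypothetical inspection task: (resource category, metric name, threshold).
CHECKS = [
    ("compute", "cpu_used_percent", 85.0),
    ("storage", "pool_used_percent", 80.0),
    ("network", "packet_loss_percent", 1.0),
]


def run_inspection(read_metric: Callable[[str], float]) -> List[CheckResult]:
    """Read every configured metric once and compare it with its threshold."""
    return [
        CheckResult(category, name, read_metric(name), threshold)
        for category, name, threshold in CHECKS
    ]


# Made-up sample readings standing in for real probes.
sample = {
    "cpu_used_percent": 42.0,
    "pool_used_percent": 91.5,
    "packet_loss_percent": 0.2,
}

results = run_inspection(sample.__getitem__)
abnormal = [r.name for r in results if not r.ok]
print(abnormal)
```

Passing the metric reader in as a callable keeps the inspection logic independent of how the values are gathered, so the same task definition can run against any resource pool.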

2. Zero intervention in log management: unified logs, real-time or scheduled inspection report notifications

Operation and maintenance logs are a good reflection of the cloud platform's operating status. When a problem occurs in the system, the logs are the place to start troubleshooting.

EasyStack's new-generation private cloud, ECS, visualizes and automates logs, monitoring, and alarms. It provides a chain of services from one-click log collection and log storage to log retrieval and analysis, helping operation and maintenance staff analyze system faults and health comprehensively and systematically and locate problems quickly. In addition, once alarm mailboxes are configured in advance, inspection logs are sent on a schedule, so users are notified of abnormalities in time and can look up their causes.

Configure the alarm mailbox in advance and send inspection logs on a schedule
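The scheduled report notification can be pictured as composing an email from the inspection findings. The sketch below only builds the message with Python's standard `email.message.EmailMessage`; the addresses, the subject format, and the sample finding are assumptions, and nothing here reflects how ECS formats its own reports.

```python
from email.message import EmailMessage
from typing import List


def build_report_email(
    sender: str, recipients: List[str], findings: List[str]
) -> EmailMessage:
    """Compose an inspection report; an empty findings list means all checks passed."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    status = "ABNORMAL" if findings else "OK"
    msg["Subject"] = f"[Inspection] status: {status}"
    body = "\n".join(findings) if findings else "No abnormal items found."
    msg.set_content(body)
    return msg


# Placeholder addresses and a made-up finding, for illustration only.
report = build_report_email(
    "inspection@example.com",
    ["ops@example.com"],
    ["storage: pool_used_percent 91.5 > 80.0"],
)
print(report["Subject"])
```

To actually deliver the message, it could be handed to `smtplib.SMTP(...).send_message(report)` against a mail server configured for the alarm mailbox.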

3. Zero intervention in alarm handling: automatic alarms for abnormalities and self-healing of faults

To help enterprises assess their current alarm management capability more quantitatively, EasyStack's new-generation private cloud, ECS, builds on intelligent inspection and sensing to raise automatic alarms at three severity levels (critical, warning, and informational) for services, storage, hosts, and logs, and offers both automatic repair and manual alarm handling. It gives ample warning before a failure or loss of control, which enables proactive operation and maintenance and reduces the failure rate. When equipment fails, is damaged, or shows abnormal load, an automatic alarm is triggered, and the whole process from fault discovery and diagnosis to self-healing runs automatically. Operation and maintenance assurance is thus achieved with little or no human involvement, keeping the platform safe and reliable.

Provide automatic alarms of different levels and corresponding handling solutions
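The split between automatic repair and manual handling can be sketched as a small routing function. The three severity levels follow the text; the alarm names, the `REPAIRS` table, and the returned strings are hypothetical and only illustrate one way such routing might work.

```python
from enum import Enum
from typing import Callable, Dict


class Severity(Enum):
    INFO = 1
    WARNING = 2
    CRITICAL = 3


# Hypothetical repair actions keyed by alarm name; each returns True on success.
REPAIRS: Dict[str, Callable[[], bool]] = {
    "service_down": lambda: True,  # stand-in for restarting a service
}


def handle_alarm(name: str, severity: Severity) -> str:
    """Route an alarm: record it, try an automatic repair, or hand it to a person."""
    if severity is Severity.INFO:
        return "logged"
    repair = REPAIRS.get(name)
    if repair is not None and repair():
        return "self-healed"
    # No known repair, or the repair failed: fall back to manual handling.
    return "escalated to operator"


print(handle_alarm("service_down", Severity.CRITICAL))
print(handle_alarm("disk_failed", Severity.CRITICAL))
print(handle_alarm("backup_finished", Severity.INFO))
```

Falling back to an operator whenever no repair is registered keeps unknown faults from being silently dropped.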

Case: Intelligent inspection delivers efficient operation and maintenance at a large tertiary hospital

ECS intelligent inspection in EasyStack's new-generation private cloud has no technical barrier to entry. It helps enterprises proactively identify causes of reduced system availability and degraded performance, and find potential serious software and hardware failures as well as performance bottlenecks in business systems, which goes a long way toward ensuring the availability and stability of users' business environments.

Take a large tertiary hospital as an example. The hospital built its internal private cloud platform on EasyStack's new-generation private cloud, ECS. The platform speeds up fault diagnosis and operation and maintenance decision-making and monitors hardware, systems, services, and performance from every angle. Rich features such as visualized, multi-dimensional, fine-grained monitoring indicators, automated operation and maintenance with in-depth analysis, and email notification of alarms and automatic inspection reports have effectively reduced the operation and maintenance workload, allowing the hospital to focus more on extending its business systems and optimizing its services.

As the scale of enterprise IT management keeps growing, the intelligent inspection service of EasyStack's new-generation private cloud, ECS, can greatly improve the efficiency of operation, maintenance, and inspection staff and make inspection work more convenient and accurate. This approach to operation and maintenance raises the service management level of the cloud platform, helps enterprise cloud platforms achieve high reliability and high availability, and accelerates enterprises' digital transformation.

Origin blog.csdn.net/k8scaptain/article/details/107696528