8th China Software Cup - Design Ideas for a Cloud Monitoring System

Foreword

I haven't written a blog post in a long time, mainly because for the past two months I've been busy with two competitions, the 8th China Software Cup and Alibaba's 5th Middleware Performance Challenge, while also preparing for autumn recruitment. As a result I went almost three months without posting. The China Software Cup has now concluded and the preliminary round of the 5th Middleware Performance Challenge is over, so I'm using some free time to summarize both competitions.
First, the results: in the China Software Cup we won a (national) third prize, and in the preliminary round of the 5th Middleware Performance Challenge we finished sixth out of 4,094 teams, so we have advanced to the semi-finals.
Let me start with the China Software Cup. The result came as quite a surprise to me, and frankly I was disappointed with this competition. At the start I assumed it was a technical competition, but in the end it leaned toward a PPT-style presentation contest. Allow me to vent a little about why. The problem I entered was a cloud platform monitoring and alarm system: roughly, use the sponsoring company's HTTP API to fetch cloud server performance metrics, monitor those metrics, and trigger an alarm when they match user-defined rules. The requirements looked clear and unproblematic, and my teammate and I implemented them faithfully, yet we did not make it to the finals (only three teams reached the finals for this problem). Afterwards, watching the finals video, I learned that the finalist teams had used artificial intelligence, blockchain, face recognition, AR, and other trendy technologies: blockchain to guarantee that data fetched from the API is neither lost nor tampered with, AR to view the cloud servers' monitoring information, face recognition to verify identity, artificial intelligence to ... It sounds like a pile of buzzwords stacked together, and I won't comment on what any of it has to do with a monitoring and alarm system. Setting aside whether they actually implemented it, these terms alone were enough to impress the judging "experts", so it's no wonder they made the finals. My teammate and I did not chase these flashy technologies, so we ended up with only a prize.
Now for the preliminary round of the 5th Middleware Performance Challenge: I'm very satisfied with that result. We placed 5th on the first day, held first place for a few days along the way, and finished sixth overall. On the whole the result was not a surprise, since we stayed near the top of the leaderboard from the first day to the last.

That's all about the two competitions. Next I'll share the design ideas my teammate and I used for the China Software Cup, with the source code and the demo video linked at the end.

If you like this post, please give it a thumbs-up ~ If it gets enough likes, my next post will cover my approach to the preliminary round of the 5th Middleware Performance Challenge.

Problem analysis

Problem statement: Design of a public cloud monitoring system based on the Huayun public cloud platform

In today's cloud era, monolithic applications can no longer keep up with rapidly growing business demands. Our design idea is therefore to split the monolith by business function into multiple small services, each providing a specialized business function. Services communicate with each other via RPC or HTTP, which decouples the system into multiple services that can each be developed, deployed, maintained, and managed independently, and scaled horizontally per service for finer-grained scaling.

Architecture design

Overall structure

Based on the problem requirements, we split the system by business function into five main services:

  • Data collection: collects data from different applications, cleans it, and publishes it to the message queue middleware.
  • Data storage and retrieval: persists the data and provides a retrieval service externally. It subscribes to the message queue to receive the collected data, processes it asynchronously and stores it in the search framework for easy retrieval, and also persists it to the database to prevent data loss.
  • Monitoring and alarming: lets users customize alarm rules and raises alarms on anomalies. It exposes an interface for defining custom alarm rules, calls the retrieval service of data storage and retrieval over RPC to retrieve and statistically analyze the data, and raises an alarm automatically when the data satisfies a rule.
  • Unified service layer: unifies the APIs that the various services expose externally.
  • Front end: provides pages for cloud host management and custom monitoring alarms, and visualizes cloud host performance data.

Data collection pulls Huayun's data in real time to ensure its validity. Data collection and data storage and retrieval are decoupled and made asynchronous through RabbitMQ, which makes collecting large volumes of data feasible, while RabbitMQ's reliable messaging ensures the data is not lost. Data storage and retrieval subscribes to RabbitMQ and stores the real-time data in both MySQL and Elasticsearch; Elasticsearch improves retrieval efficiency and makes searching large volumes of data practical.

Monitoring and alarming monitors, analyzes, and aggregates the real-time data. When the data matches a rule, the alarm information is promptly sent to the user by email or SMS, keeping users informed about the security of their cloud hosts. Users can customize alarm rules along multiple dimensions, such as the monitored metric, the monitoring period, the number of cycles, and whether to alarm once when the rule is met or to keep alarming for as long as it remains met, giving users multi-dimensional control over cloud host performance.
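The choice between alarming once and alarming for as long as a rule stays satisfied can be modeled as a small per-rule state machine. The Java sketch below is only an illustration under our own assumptions: the `AlarmNotifier` class, its `Mode` values, and the decision to re-arm a one-shot rule after the metric recovers are hypothetical and not taken from the project's source.

```java
/**
 * Hypothetical sketch: decides, once per monitoring period, whether an
 * alarm notification (email/SMS) should be sent for a single rule.
 */
class AlarmNotifier {
    /** ONCE: notify only on the first breaching period; ALWAYS: notify on every breaching period. */
    enum Mode { ONCE, ALWAYS }

    private final Mode mode;
    // True while the rule is currently breached. Assumption: a ONCE rule re-arms after recovery.
    private boolean inAlarm = false;

    AlarmNotifier(Mode mode) {
        this.mode = mode;
    }

    /**
     * Called once per period with whether the rule matched in that period.
     * Returns true if a notification should be sent now.
     */
    boolean onPeriod(boolean ruleMatched) {
        if (!ruleMatched) {
            inAlarm = false; // metric recovered: reset state
            return false;
        }
        boolean firstBreach = !inAlarm;
        inAlarm = true;
        return mode == Mode.ALWAYS || firstBreach;
    }
}
```

With `Mode.ONCE`, three consecutive breaching periods produce a single notification; with `Mode.ALWAYS` they produce three.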

The unified service layer integrates data storage and retrieval, monitoring and alarming, and cloud host management, and exposes them externally through a single unified API.

The front end provides the user with interfaces for cloud host management, data visualization, and monitoring and alarms.

Technology Selection

1. Development Language

We chose Java as the development language, mainly because of its stability and security, its huge ecosystem, and its cross-platform nature.

2. Development Framework

We use Spring Boot as the development framework because the Spring ecosystem is excellent, enables rapid, agile development and deployment, and improves long-term maintainability. As a sub-project of the Spring family, Spring Boot makes it quick to integrate third-party frameworks and can package the system as a standalone executable application, which makes microservices practical.

3. Message Queuing

We use RabbitMQ as the message queue middleware because of its low latency, high availability, and high message reliability.

4. RPC framework

We use Dubbo as the RPC framework. Dubbo is an excellent high-performance service framework open-sourced by Alibaba that lets applications export and import services through high-performance RPC, and it integrates seamlessly with the Spring framework.

5. Distributed coordination service

We use ZooKeeper for distributed service coordination. ZooKeeper is a distributed, open-source coordination service for distributed applications.

6. Full-text search framework

We use Elasticsearch, a search server based on Lucene. It provides a distributed, multi-tenant full-text search engine with a RESTful web interface. Elasticsearch is developed in Java, released as open source under the Apache License, and is a popular enterprise search engine. It is designed for the cloud and offers real-time search while being stable, reliable, fast, and easy to install.

7. Caching framework

We adopt Redis as the caching framework. Redis is an open-source, networked, in-memory key-value database written in ANSI C, with optional log-based persistence.

8. Database

We use MySQL as the database.

Back-end implementation

Data collection

Data collection extracts the business log data: it collects data dynamically from various sources, filters, parses, enriches, and normalizes it into a uniform format, and then stores it for later use. This gives unified management of the logs generated by different servers, which is important for two reasons:

  • Log analysis: logs can be analyzed and aggregated on a unified platform, so operations staff no longer need to log in to each server to read its logs.
  • Data lookup: by searching the logs, you can keep track of server load and operating status.
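The filter, parse, enrich, and normalize steps can be illustrated with a small Java sketch. It assumes a hypothetical raw line format of comma-separated `key=value` pairs with `host`, `metric`, `value`, and `ts` fields; the real Huayun API returns its own format, and the `LogNormalizer` and `MetricRecord` names are ours.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch of the collection-side cleaning step:
 * filter malformed lines, parse fields, enrich with the source, normalize names.
 * The "key=value,key=value" input format is an assumption for illustration.
 */
class LogNormalizer {
    /** Uniform record format handed on to the message queue. */
    record MetricRecord(String source, String host, String metric, double value, long timestamp) {}

    static List<MetricRecord> normalize(String source, List<String> rawLines) {
        List<MetricRecord> out = new ArrayList<>();
        for (String line : rawLines) {
            // Parse: split "key=value" pairs; keys are trimmed and lower-cased.
            Map<String, String> fields = new HashMap<>();
            for (String part : line.split(",")) {
                String[] kv = part.split("=", 2);
                if (kv.length == 2) {
                    fields.put(kv[0].trim().toLowerCase(Locale.ROOT), kv[1].trim());
                }
            }
            String host = fields.get("host");
            String metric = fields.get("metric");
            String value = fields.get("value");
            String ts = fields.get("ts");
            // Filter: drop lines missing any required field.
            if (host == null || host.isEmpty() || metric == null || metric.isEmpty()
                    || value == null || ts == null) {
                continue;
            }
            try {
                // Normalize metric names to lower case and enrich with the source name.
                out.add(new MetricRecord(source, host, metric.toLowerCase(Locale.ROOT),
                        Double.parseDouble(value), Long.parseLong(ts)));
            } catch (NumberFormatException e) {
                // Filter: drop lines whose value or timestamp is not numeric.
            }
        }
        return out;
    }
}
```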

A scheduled task fetches the cloud host performance data provided by Huayun, which keeps our data consistent with Huayun's official data and ensures it is timely and valid. The fetched data is cleaned so that only the useful fields are kept, and is then published to the RabbitMQ message queue middleware. The fanout exchange huayun.fanout delivers each message to two queues, huayun.es and huayun.persistence: the former is used to store messages in Elasticsearch, the latter to store them in MySQL.
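RabbitMQ performs the fanout routing itself; the in-memory Java sketch below only mimics its semantics to show that each queue bound to a fanout exchange receives its own copy of every message, regardless of routing key. The `FanoutExchange` class is a hypothetical stand-in, not the RabbitMQ client API; the exchange and queue names come from the design above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical in-memory stand-in for a RabbitMQ fanout exchange (not the client API). */
class FanoutExchange {
    private final String name;
    // Queue name -> queue; LinkedHashMap keeps binding order deterministic.
    private final Map<String, BlockingQueue<String>> bindings = new LinkedHashMap<>();

    FanoutExchange(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }

    /** Binds a queue to this exchange; fanout ignores routing keys, so none is taken. */
    BlockingQueue<String> bind(String queueName) {
        return bindings.computeIfAbsent(queueName, q -> new LinkedBlockingQueue<>());
    }

    /** Copies the message into every bound queue; returns how many queues received it. */
    int publish(String message) {
        for (BlockingQueue<String> q : bindings.values()) {
            q.add(message);
        }
        return bindings.size();
    }
}
```

With the official RabbitMQ Java client, the same topology would be declared with `Channel.exchangeDeclare("huayun.fanout", "fanout")`, followed by declaring each queue and calling `Channel.queueBind(queue, "huayun.fanout", "")`; the empty routing key works because fanout exchanges ignore it.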

Data reliability is safeguarded by delayed message delivery. After the upstream data collection service sends a message to RabbitMQ, it also sends a delayed copy of that message to RabbitMQ. When the downstream service consumes the message, it delivers a confirmation message to RabbitMQ. The callback service (Callback Service) subscribes to these downstream confirmation messages, so it knows which messages were consumed successfully. When the upstream service's delayed message arrives at the callback service, the callback service checks whether the corresponding downstream confirmation has been received; if it has not, the callback service calls the upstream service to resend the message, ensuring reliable delivery.
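The callback service's bookkeeping can be reduced to two events: a downstream confirmation arriving, and the upstream's delayed check message arriving. The Java sketch below models them as plain method calls instead of RabbitMQ messages; `CallbackService`, `onConfirmation`, `onDelayedCheck`, and the resend hook are hypothetical names.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/**
 * Hypothetical sketch of the callback service's reliability check.
 * In the real design the two inputs arrive as RabbitMQ messages; here they are method calls.
 */
class CallbackService {
    // IDs of messages the downstream consumer has confirmed (thread-safe set).
    private final Set<String> confirmed = ConcurrentHashMap.newKeySet();
    // Hook used to ask the upstream (data collection) service to resend a message.
    private final Consumer<String> resendUpstream;

    CallbackService(Consumer<String> resendUpstream) {
        this.resendUpstream = resendUpstream;
    }

    /** Downstream confirmation for messageId has arrived. */
    void onConfirmation(String messageId) {
        confirmed.add(messageId);
    }

    /**
     * The upstream's delayed check message for messageId has arrived.
     * Returns true if a resend was requested because no confirmation was seen.
     */
    boolean onDelayedCheck(String messageId) {
        if (confirmed.remove(messageId)) {
            return false; // consumed successfully; forget the ID
        }
        resendUpstream.accept(messageId);
        return true;
    }
}
```

Because a resend can race with a late confirmation, the same message may be consumed twice, so downstream consumers should be idempotent, for example by deduplicating on the message ID.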

Data storage and retrieval

We use Elasticsearch for full-text search of the data. It is highly scalable, extending to hundreds of servers and petabyte-scale data, and it retrieves large amounts of data quickly. Dubbo + ZooKeeper provide distributed RPC services externally, with ZooKeeper handling distributed service governance: service providers publish their services to the registry, service consumers subscribe to services through the registry and receive notifications when providers change, and ZooKeeper also makes it possible to monitor and analyze service calls.
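The register, subscribe, and notify flow can be illustrated with an in-memory registry. This is a hypothetical Java stand-in written for this explanation, not the Dubbo or ZooKeeper API: in the real system Dubbo writes provider addresses to ZooKeeper nodes and consumers learn about changes through ZooKeeper watches. The service name and address used with it are made up.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** Hypothetical in-memory registry showing the publish/subscribe/notify flow. */
class ServiceRegistry {
    // Service name -> provider addresses.
    private final Map<String, List<String>> providers = new ConcurrentHashMap<>();
    // Service name -> consumer callbacks.
    private final Map<String, List<Consumer<List<String>>>> subscribers = new ConcurrentHashMap<>();

    /** A provider publishes its address for a service. */
    void register(String service, String address) {
        providers.computeIfAbsent(service, s -> new CopyOnWriteArrayList<>()).add(address);
        notifySubscribers(service);
    }

    /** A provider goes offline (with ZooKeeper, typically its ephemeral node disappears). */
    void unregister(String service, String address) {
        List<String> list = providers.get(service);
        if (list != null && list.remove(address)) {
            notifySubscribers(service);
        }
    }

    /** A consumer subscribes and immediately receives the current provider list. */
    void subscribe(String service, Consumer<List<String>> listener) {
        subscribers.computeIfAbsent(service, s -> new CopyOnWriteArrayList<>()).add(listener);
        listener.accept(snapshot(service));
    }

    private void notifySubscribers(String service) {
        List<String> current = snapshot(service);
        for (Consumer<List<String>> listener : subscribers.getOrDefault(service, List.of())) {
            listener.accept(current);
        }
    }

    private List<String> snapshot(String service) {
        return List.copyOf(providers.getOrDefault(service, List.of()));
    }
}
```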

The service subscribes to the RabbitMQ message queue middleware. The Elasticsearch service layer listens on the queue huayun.es; when a message arrives, RabbitMQ actively pushes it over, and the Elasticsearch service layer stores the received message in Elasticsearch. Likewise, the MySQL service layer stores the messages pushed to it in MySQL.

Besides persisting data, data storage and retrieval also provides an external data retrieval service, which is registered with ZooKeeper and exposed through Dubbo RPC.
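A retrieval service of this kind boils down to a shared Java interface that the provider exports and the consumer references over RPC. In the sketch below, the `MetricSearchService` interface, its `search` signature, and the in-memory implementation standing in for the Elasticsearch-backed one are all hypothetical.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical RPC contract; a provider would export an implementation of it via Dubbo. */
interface MetricSearchService {
    /** One data point; RPC payloads are typically required to be Serializable. */
    record Point(String host, String metric, long timestamp, double value) implements Serializable {}

    /** Returns points for host/metric with from <= timestamp < to, ordered by time. */
    List<Point> search(String host, String metric, long from, long to);
}

/** In-memory stand-in for the Elasticsearch-backed implementation. */
class InMemoryMetricSearchService implements MetricSearchService {
    private final List<Point> points = new ArrayList<>();

    /** Stores a point (the real service would index it into Elasticsearch). */
    void index(Point p) {
        synchronized (points) {
            points.add(p);
        }
    }

    @Override
    public List<Point> search(String host, String metric, long from, long to) {
        synchronized (points) {
            return points.stream()
                    .filter(p -> p.host().equals(host) && p.metric().equals(metric))
                    .filter(p -> p.timestamp() >= from && p.timestamp() < to)
                    .sorted(Comparator.comparingLong(Point::timestamp))
                    .toList();
        }
    }
}
```

Keeping the interface in a module shared by provider and consumer is what lets monitoring and alarming call the search service as if it were a local object.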

Monitoring alarm

Based on the retrieval service provided by data storage and retrieval, users can define custom rules along multiple dimensions to monitor server status in real time. When a server behaves abnormally, monitoring and alarming notifies the user promptly by sending an alarm message, so developers can see at a glance where the problem is and resolve it quickly.

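Monitoring and alarming obtains service information by subscribing to ZooKeeper and, through Dubbo, gets the retrieval service provided by data storage and retrieval. Using that service, it periodically retrieves and analyzes the data according to the alarm rules; when the data meets an alarm condition, it raises an alarm automatically, sends the alarm information to the user in real time, and stores it in MySQL for later queries.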

Users can set alarm rules along multiple dimensions, such as the monitored metric, the monitoring period, the number of cycles, and the threshold that triggers the alarm.
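A rule built from these dimensions can be evaluated by averaging samples per monitoring period and checking whether the most recent N consecutive periods all cross the threshold. The Java sketch below is one possible interpretation, not the project's actual logic; `AlarmRule`, the averaging aggregation, and the `GREATER_THAN`/`LESS_THAN` comparisons are assumptions.

```java
import java.util.List;
import java.util.TreeMap;

/**
 * Hypothetical alarm rule: fires when the per-period average of a metric
 * breaches the threshold for `cycles` consecutive periods (most recent last).
 */
record AlarmRule(String metric, long periodSeconds, int cycles, Comparison op, double threshold) {

    enum Comparison { GREATER_THAN, LESS_THAN }

    /** A raw sample for this rule's metric: epoch seconds and value. */
    record Sample(long timestamp, double value) {}

    AlarmRule {
        if (periodSeconds <= 0 || cycles <= 0) {
            throw new IllegalArgumentException("periodSeconds and cycles must be positive");
        }
    }

    private boolean breaches(double v) {
        return op == Comparison.GREATER_THAN ? v > threshold : v < threshold;
    }

    boolean evaluate(List<Sample> samples) {
        // Aggregate: period start -> {sum, count}, kept in time order.
        TreeMap<Long, double[]> buckets = new TreeMap<>();
        for (Sample s : samples) {
            long start = s.timestamp() - Math.floorMod(s.timestamp(), periodSeconds);
            double[] acc = buckets.computeIfAbsent(start, k -> new double[2]);
            acc[0] += s.value();
            acc[1] += 1;
        }
        if (buckets.isEmpty()) {
            return false;
        }
        // Check the latest `cycles` periods; a gap or a non-breaching average ends the streak.
        long latest = buckets.lastKey();
        for (int i = 0; i < cycles; i++) {
            double[] acc = buckets.get(latest - i * periodSeconds);
            if (acc == null || !breaches(acc[0] / acc[1])) {
                return false;
            }
        }
        return true;
    }
}
```

In a scheduled job the latest period may still be filling up, so a production version would likely evaluate only complete periods.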

Unified service layer

The unified service layer exposes a single unified API, through which the front end can retrieve data, set alarm rules, and manage cloud hosts, images, and snapshots.

It calls the data retrieval and monitoring and alarm services via RPC and adds a Redis cache layer: query results are cached in Redis to improve efficiency.
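The Redis layer follows the common cache-aside pattern: read from the cache, and on a miss call the backing RPC service and store the result with an expiry. The Java sketch below replaces Redis with an in-memory map so it runs on its own; `QueryCache` and its injectable clock are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical cache-aside wrapper; an in-memory map with a TTL stands in for Redis. */
class QueryCache<K, V> {
    private record Entry<T>(T value, long expiresAtMillis) {}

    private final Map<K, Entry<V>> store = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis; injectable for tests

    QueryCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns a fresh cached value, or calls loader (e.g. the RPC query) and caches its result. */
    V get(K key, Function<K, V> loader) {
        long now = clock.getAsLong();
        Entry<V> e = store.get(key);
        if (e != null && e.expiresAtMillis() > now) {
            return e.value(); // cache hit
        }
        V value = loader.apply(key); // cache miss or expired: query the backing service
        store.put(key, new Entry<>(value, now + ttlMillis));
        return value;
    }

    /** Drops a key, e.g. after the underlying data (such as an alarm rule) changes. */
    void invalidate(K key) {
        store.remove(key);
    }
}
```

In Redis, the expiry would be set with the `EX` option of `SET` (or with `SETEX`), and `invalidate` corresponds to `DEL`.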

Source

Our team's work is open source; criticism and suggestions are welcome.

Source: github.com/xue8/huayun...

Demo Video: www.bilibili.com/video/av610...

Original post: ddnd.cn/2019/07/28/...


Origin: juejin.im/post/5d380a32e51d45572c060123