Didi open-sources Nightingale: an enterprise-class monitoring solution

Didi has once again released a new open source project!

Nightingale is an open source, enterprise-class monitoring solution developed jointly by Didi's infrastructure platform team and Didi Cloud. It is designed to meet enterprise-level monitoring requirements in the cloud-native era.

Nightingale Introduction

According to Didi's official introduction, this is an enterprise-class monitoring solution that scales from a handful of machines up to hundreds of thousands. It covers both cloud-native and bare-metal environments, supports application monitoring as well as system monitoring, and offers a flexible plug-in mechanism, a rich set of plug-ins, and a high degree of flexibility and scalability.

GitHub address: https://github.com/didi/nightingale

Nightingale builds on Open-Falcon and incorporates best practices from inside Didi, with substantial improvements in performance, maintainability, and ease of use. As the group's unified monitoring solution, it supports billions of monitoring metrics internally and covers monitoring needs at every level, from systems and containers to applications.

Nightingale uses tree-node navigation, also referred to as the object tree. The object tree is essentially a grouping mechanism for managing monitored objects: it makes it easy to find and view monitored objects, to set monitoring strategies on them, and to perform other management actions. From top to bottom, a typical tree might describe the organizational structure, the relationship between products and service modules, and the mounting of machines under machine rooms. Users can customize the navigation tree flexibly to suit their own needs.

When a monitoring strategy is applied to a node, every machine mounted under that node and under all of its child nodes uses the strategy. If any of those machines crosses the threshold, the associated alarm is generated.
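The inheritance rule described above can be sketched in a few lines. This is only an illustration under assumptions, not Nightingale's actual code: the `Node` class, the `evaluate` function, the strategy name, and the simple "value > threshold" rule are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """One node of a hypothetical object tree (all names are illustrative)."""
    name: str
    children: List["Node"] = field(default_factory=list)
    machines: List[str] = field(default_factory=list)
    # strategy name -> threshold; a strategy here is simply "value > threshold"
    strategies: Dict[str, float] = field(default_factory=dict)

    def add_child(self, name: str) -> "Node":
        child = Node(name)
        self.children.append(child)
        return child

    def all_machines(self) -> List[str]:
        """Machines mounted on this node and on every node below it."""
        found = list(self.machines)
        for child in self.children:
            found.extend(child.all_machines())
        return found


def evaluate(node: Node, strategy: str, values: Dict[str, float]) -> List[str]:
    """Return the machines under `node` whose value exceeds the threshold."""
    threshold = node.strategies[strategy]
    return [m for m in node.all_machines()
            if m in values and values[m] > threshold]


# Example tree: product -> service modules -> machines
product = Node("product-a")
web = product.add_child("web")
web.machines.extend(["10.0.0.1", "10.0.0.2"])
db = product.add_child("db")
db.machines.append("10.0.0.3")

# Binding the strategy to the product node covers both modules.
product.strategies["cpu.util.high"] = 90.0
alarms = evaluate(product, "cpu.util.high",
                  {"10.0.0.1": 95.0, "10.0.0.2": 40.0, "10.0.0.3": 99.0})
print(alarms)
```

Binding the strategy once at the product node is enough to cover machines in both the web and db modules, which is the convenience the object tree is meant to provide.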

The custom monitoring dashboard has received substantial usability improvements: charts support threshold values and can be grouped into categories, and adding, sorting, and managing charts are all done in WYSIWYG mode, so building a custom monitoring dashboard is no longer difficult.

Nightingale evolved from Open-Falcon. As one of the most widely used monitoring solutions in China, Open-Falcon provided a great deal of reference for the design and development of Nightingale.

Comparison: Nightingale and Open-Falcon

Differences

Alarm engine refactoring: in Open-Falcon, strategy evaluation is triggered as monitoring data is pushed in. The advantage of this "push" model is that strategies are evaluated with very little delay, but it makes more advanced alarm strategies hard to support and extend; combined alarm conditions, for example, are difficult to support. Nightingale switches to a combined push and pull model: the push mode keeps most strategy evaluation efficient, while the pull mode supports AND-condition alarms and nodata alarms;

Introduction of the navigation object tree: Open-Falcon uses a flat HostGroup, whereas Nightingale introduces a navigation object tree. The object tree is essentially a grouping mechanism for managing monitored objects, making it easy to find and view them, to set monitoring strategies on them, and to perform other management actions. Nightingale also removes the concept of alarm templates and binds alarm strategies directly to tree nodes, which simplifies the design and significantly improves flexibility and ease of use;

Index module upgrade: Open-Falcon stores metric index data in MySQL, which is a bottleneck for scalability and flexibility. Based on its monitoring requirements, Nightingale designs and develops a new in-memory index module, index. It supports more varied queries with higher query efficiency, and it avoids the maintenance and optimization work that MySQL demands once the index data reaches the hundred-million level;

Time-series database optimization: building on Open-Falcon's Graph storage module, Nightingale introduces Facebook's Gorilla compression scheme and keeps the most recent few hours of data in memory, which significantly improves query efficiency. Long-term data is still stored on disk in the rrdtool format. The performance and stability of the time-series database have been further improved as well;

Alarm engine availability enhancements: the alarm engine module, judge, uses a heartbeat mechanism to remove failed instances automatically, so there is no longer any need to worry that a single judge going down will leave some strategies ineffective and require human intervention. The index module ensures availability in a similar way;

Built-in log monitoring: the Nightingale client natively supports log matching and metric extraction. Log matching rules can be configured on the web console, and configuration files in a specific directory on the target machine can also be read, which makes monitoring business metrics easier;

Improved operability: portal (the api in falcon-plus), uic, dashboard, hbs, and alarm are merged into a single module, monapi. This makes the whole system simpler to deploy, and some of the calls that used to go between modules become in-process method calls with higher performance;

Centralized configuration files: the configuration files have been reworked for ease of use. The shared database configuration is extracted into mysql.yml, ports, instance addresses, and related settings are extracted into address.yml, and a large number of settings are now default values in code, making the configuration files clearer and easier to maintain.
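To make the push and pull combination from the alarm engine item concrete, here is a minimal sketch. It is not Nightingale's judge code: the function names, the in-memory `Store`, the series keys, and the "value > threshold" rule are assumptions chosen for illustration.

```python
from typing import Dict, List, Optional, Tuple

# series key -> list of (timestamp, value); stands in for recently stored data
Store = Dict[str, List[Tuple[int, float]]]


def on_push(store: Store, key: str, ts: int, value: float,
            threshold: float) -> bool:
    """Push mode: judge a point the moment it arrives."""
    store.setdefault(key, []).append((ts, value))
    return value > threshold


def latest(store: Store, key: str) -> Optional[Tuple[int, float]]:
    """Most recent point of a series, or None if nothing was ever received."""
    points = store.get(key)
    return points[-1] if points else None


def pull_nodata(store: Store, key: str, now: int, max_silence: int) -> bool:
    """Pull mode: a periodic check that fires when no point arrived recently."""
    last = latest(store, key)
    return last is None or now - last[0] > max_silence


def pull_and(store: Store, conditions: List[Tuple[str, float]]) -> bool:
    """Pull mode: fire only if every (key, threshold) condition holds."""
    for key, threshold in conditions:
        last = latest(store, key)
        if last is None or last[1] <= threshold:
            return False
    return True


store: Store = {}
cpu_alarm = on_push(store, "host-1/cpu.util", 100, 95.0, 90.0)   # fires at once
mem_alarm = on_push(store, "host-1/mem.util", 100, 50.0, 80.0)   # does not fire
both_high = pull_and(store, [("host-1/cpu.util", 90.0),
                             ("host-1/mem.util", 80.0)])
silent = pull_nodata(store, "host-1/cpu.util", now=200, max_silence=60)
```

A nodata check cannot be driven by incoming points, because by definition no point arrives to trigger it. That is why it has to run in the pull path, on a timer.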

Similarities

The data model has not changed: data is still organized by metric, endpoint, and tags, so the agent can largely be reused. Nightingale calls its agent collector; it merges the logic of Open-Falcon's original agent and falcon-log-agent, and the various monitoring plug-ins can also be reused.

The overall data flow and processing logic are similar: Nightingale still uses a flexible push model, and the flow splits into two paths, data storage and alarm evaluation.
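As an illustration of the metric, endpoint, and tags model, the sketch below represents one data point and derives a stable series identifier from it. Only the three field names come from the text above; the `Point` class, the `timestamp` and `value` fields, and the key layout are assumptions and do not reproduce the actual wire format.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Point:
    """A single monitoring sample in a hypothetical in-memory form."""
    metric: str      # what is measured, e.g. "cpu.util"
    endpoint: str    # where it was measured, e.g. a host name or IP
    timestamp: int   # assumed: Unix seconds
    value: float
    tags: Dict[str, str] = field(default_factory=dict)

    def series_key(self) -> str:
        """Identify the time series; tags are sorted so order does not matter."""
        base = f"{self.endpoint}/{self.metric}"
        if not self.tags:
            return base
        tag_part = ",".join(f"{k}={v}" for k, v in sorted(self.tags.items()))
        return f"{base}/{tag_part}"


p1 = Point("http.latency", "10.0.0.1", 1585000000, 12.5,
           {"service": "web", "api": "/login"})
p2 = Point("http.latency", "10.0.0.1", 1585000010, 14.0,
           {"api": "/login", "service": "web"})
print(p1.series_key())
```

Both points belong to the same series even though their tags were supplied in a different order, which is what a storage or alarm component needs in order to group samples consistently.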

Nightingale architecture

collector is the agent. It collects common machine metrics, natively supports log monitoring, supports a plug-in mechanism, and lets business code report data directly through an interface;

transfer provides an rpc interface that receives the data reported by collector, and then uses consistent hashing to forward the data to multiple tsdb and multiple judge instances;

tsdb corresponds to the graph component in open-falcon and stores historical data. It can be configured in dual-write mode to improve the system's disaster recovery capability, and it forwards a copy of the monitoring data to index so that the index can be built;

index is the in-memory index module. It replaces the original MySQL approach and builds the index in memory to serve subsequent data retrieval, which greatly improves both the flexibility and the performance of retrieval;

judge is the alarm engine. It synchronizes monitoring strategies from monapi (portal) and evaluates the data it receives against them; if a threshold is met, it generates an alarm event and pushes it to a redis queue;

monapi (alarm) reads the events generated by judge from the redis queue, processes them further by adding some meta information, generates alarm messages, and pushes them back to a redis queue;

The sending components, such as mail-sender and sms-sender, read alarm messages from redis and send the alarms. The senders are abstracted out to make later customization easier;

monapi merges the functions of several original modules and provides the interfaces called by the front-end js, with the api prefix /api/portal. Data queries go through transfer, which takes the place of the original query component in open-falcon, with the api prefix /api/transfer. Index queries use the api prefix /api/index. A single Nginx therefore has to be set up in front, using different location blocks to forward requests to the different backends;

The database is still MySQL. It mainly stores user information, team information, tree nodes, alarm strategies, monitoring dashboards, alarm-muting strategies, collection strategies, and heartbeat information for some components.
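The consistent hashing that transfer uses to spread data across tsdb and judge instances can be sketched as follows. This is a generic hash ring, not the code in Nightingale: the `HashRing` class, the virtual-node count, the choice of SHA-256, and the instance names are all assumptions.

```python
import bisect
import hashlib
from typing import List, Tuple


def _hash(text: str) -> int:
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha256(text.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """A generic consistent-hash ring with virtual nodes."""

    def __init__(self, nodes: List[str], replicas: int = 100) -> None:
        if not nodes:
            raise ValueError("at least one node is required")
        # Each node appears `replicas` times on the ring to even out the load.
        self._ring: List[Tuple[int, str]] = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._hashes = [h for h, _ in self._ring]

    def node_for(self, key: str) -> str:
        """Pick the first virtual node clockwise from the key's position."""
        idx = bisect.bisect_right(self._hashes, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["tsdb-1", "tsdb-2", "tsdb-3"])
keys = [f"host-{i}/cpu.util" for i in range(200)]
before = {k: ring.node_for(k) for k in keys}

# Remove one instance: only the series it owned should move elsewhere.
smaller = HashRing(["tsdb-1", "tsdb-2"])
moved = [k for k in keys if smaller.node_for(k) != before[k]]
```

The point of the technique is in the last lines: when an instance disappears, series that lived on the surviving instances stay where they were, so most stored data and in-flight alarm state remain valid.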

Follow-up plans

  • Provide a component for aggregating monitoring metrics. The current architecture handles machine-level and module-level monitoring, but cluster-level metrics require aggregating the metrics of all modules and machines in the whole cluster, with operations such as summing and averaging. The aggregation component is under active development;

  • Seamless integration with k8s is under way;

  • More complete monitoring plug-ins, by consolidating and maintaining the community's existing plug-ins.
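The cluster-level aggregation mentioned in the first item amounts to combining one value per machine into summary values. The sketch below only illustrates the sum and average operations named there; the planned aggregation component is not described in this article, so the `aggregate` function and its output keys are hypothetical.

```python
from statistics import mean
from typing import Dict


def aggregate(values_by_machine: Dict[str, float]) -> Dict[str, float]:
    """Combine per-machine values of one metric into cluster-level values."""
    values = list(values_by_machine.values())
    if not values:
        raise ValueError("no machine reported a value")
    return {"sum": sum(values), "avg": mean(values), "count": len(values)}


# Requests per second reported by each machine of a hypothetical cluster.
cluster_qps = aggregate({"10.0.0.1": 10.0, "10.0.0.2": 30.0})
print(cluster_qps)
```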

Origin blog.csdn.net/csdnnews/article/details/105108963