Nagios: network operation and maintenance monitoring software for Linux servers

      Keeping servers running well depends on the demanding work of network operation and maintenance management. Network administrators use a number of tools to monitor server health and to watch how network traffic rises and falls. They also have to make sure that the entire server network runs smoothly, because even a single minute of network outage can disrupt the work of the whole organization.

      One of the most important ways to keep the server network running smoothly is to use network operation and maintenance management software. Many such products exist, but they are often expensive, so it is worth taking time before purchasing to study their applicability, performance, professionalism, and other characteristics. Many factors have to be weighed, and it is not easy to understand all of these indicators and make a choice in a short time.

      With the rapid development of the Internet industry, the number of users of some Internet services has reached 100 million. For example, Taobao has reached 370 million registered users, and the number of active users on Double Eleven in 2015 exceeded 100 million. The hardware foundation supporting such a large number of users is a large-scale server cluster. It is very important to obtain the running status of each server, learn about potential hidden dangers in time, and locate and eliminate problems in time. Computer room operation and maintenance personnel and senior decision makers can only make effective decisions when they have this information in real time. Examples include whether to shut down a service or start a backup service when access traffic is excessive or a malicious attack occurs, and, after a service outage, whether staff should go to the computer room to deal with hardware problems or simply restart the server remotely. These and similar decisions, simple or complex, all require the support of information from an underlying expert system.

Today's more mature open source solutions for low-level server data collection mainly include SugarNMS and Nagios.

Nagios: server operation and maintenance monitoring software

      Nagios is a monitoring system that monitors system operating status and network information. It can monitor specified local or remote hosts and services, and it sends notifications when anomalies occur. Nagios runs on Linux/Unix platforms and provides an optional browser-based web interface for system administrators to view network status, various system problems, logs, and more.

Nagios provides the following features:

  1. Monitor network services (SMTP, POP3, HTTP, NNTP, PING, etc.);

  2. Monitor host resources (processor load, disk utilization, etc.);

  3. A simple plug-in design allows users to easily add checks for their own services;

  4. Parallel service check mechanism;

  5. The ability to define the network hierarchy: "parent" host definitions express the relationships between network hosts, and these relationships are used to distinguish hosts that are down from hosts that are unreachable;

  6. Send alerts to contacts (via email, SMS, or user-defined methods) when service or host problems occur and when they are resolved;

  7. Event handlers can be defined to take preventive action when a service or host fails;

  8. Automatic log rotation;

  9. Support for redundant monitoring of hosts;

  10. The optional web interface is used to view the current network status, notification and fault history, log files, etc.;

  11. System monitoring information can be viewed on a mobile phone.
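
The plug-in design mentioned in the list can be illustrated with a small sketch. Nagios plug-ins are ordinary programs that print one status line and report the result through their exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The disk check below, its function names, and its 80%/90% thresholds are assumptions chosen for illustration and are not part of any shipped plug-in.

```python
"""Minimal sketch of a Nagios-style plug-in that checks disk utilization."""
import shutil

# Exit codes that Nagios plug-ins use to report status.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(used_percent, warn=80.0, crit=90.0):
    """Map a utilization percentage to an (exit_code, status_line) pair.

    The thresholds are illustrative defaults, not Nagios defaults.
    """
    if used_percent >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif used_percent >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    # Text after "|" is optional performance data: label=value;warn;crit
    line = (f"DISK {label} - {used_percent:.1f}% used"
            f" | used={used_percent:.1f}%;{warn};{crit}")
    return code, line


def main(path="/"):
    """Check one filesystem, print the status line, return the exit code."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        print(f"DISK UNKNOWN - {exc}")
        return UNKNOWN
    code, line = evaluate(100.0 * usage.used / usage.total)
    print(line)
    return code

# When saved as a plug-in script, finish with:
#     import sys; sys.exit(main())
```

Keeping the threshold logic in `evaluate` separate from the measurement in `main` makes the status mapping easy to test without touching a real disk.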

SugarNMS: server operation and maintenance monitoring software

      The Zhihe network management platform (SugarNMS) monitors server operation and maintenance mainly through the SNMP network protocol. Devices that use other protocols can also be supported by extending the platform with those protocols.

      The platform adopts mature technologies such as J2SE, XML, Web Service, Web, HTML5, JavaScript, Struts, Spring, Hibernate, SNMP, HTTP, JDBC, Swing, RMI, OM Mapping, OR Mapping, and multithreading. It uses a multi-layer architecture made up of a presentation layer, a business layer, a data layer, and a device middle layer, and it provides CORBA and Web Service interfaces. The framework uses the device middle layer to hide the differences between the device management protocols of different manufacturers, so it can manage different types of managed devices.

Automatic server discovery

      During automatic discovery, the platform can search for servers, identify the manufacturer and model of each server, generate a panel diagram of the device, and search for device resources such as boards, ports, CPU, memory, and disks. A topology map of the discovered devices is generated automatically.

View the overall performance of the server

      Select the server on the topology map, right-click, and select Comprehensive Device Information. You can view overall reports, details, management recommendations, and more for the server.

Server failure monitoring

      The Zhihe network management platform monitors the running status of the network and its devices in real time. Alarms reflect the running status of a device. Each fault monitor (working-state monitor) of a device can be in only one state at a time, and a device has as many status lights as it has fault monitors.

      Device and resource icons display the color of the most serious status light of the device. For example, if a device has 4 fault monitors whose status lights are red, yellow, blue, and green, the device icon shows red. Similarly, a network icon displays the color of the most serious device status light under that network. If a network contains 2 devices, one whose most serious light is red and one whose most serious light is yellow, the network icon shows red.
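
The roll-up rule described above (a device shows its worst monitor state, a network shows its worst device) can be sketched in a few lines. This is not the platform's code; the severity ranking red > yellow > blue > green and the device and monitor names are assumptions made for the example.

```python
# Assumed ranking, most serious first. The text lists the colors as
# red, yellow, blue, green but does not define the ranking explicitly.
SEVERITY = {"red": 3, "yellow": 2, "blue": 1, "green": 0}


def worst(colors):
    """Return the most serious color in an iterable, or "green" if empty."""
    return max(colors, key=SEVERITY.__getitem__, default="green")


def device_color(monitor_states):
    """A device icon shows the worst state among its fault monitors."""
    return worst(monitor_states.values())


def network_color(devices):
    """A network icon shows the worst device color beneath it."""
    return worst(device_color(monitors) for monitors in devices.values())


# Hypothetical network with two devices; each fault monitor has one state.
devices = {
    "server-a": {"cpu": "green", "disk": "red", "memory": "yellow", "link": "blue"},
    "server-b": {"cpu": "yellow", "disk": "green"},
}

print(device_color(devices["server-a"]))  # red
print(device_color(devices["server-b"]))  # yellow
print(network_color(devices))             # red
```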

      The Zhihe network management platform displays alarm information in real time on the topology interface, resource view, network-wide working status, and alarm list.

Server performance monitoring

      The platform comprehensively collects performance information about server resources, applications, services, and more. The performance data can be displayed as charts along dimensions such as time, resource, and performance type. Real-time performance data for a given resource of a device can be viewed by combining resource type, monitor type, and time interval. For a chosen time range and resource monitor type, the detailed performance values of a single resource of the device are shown as graphs and lists.
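
As a rough illustration of selecting performance data by resource type, monitor type, and time range, the sketch below filters a list of in-memory samples and summarizes them. The `Sample` record, its field names, and the `query` and `summarize` helpers are hypothetical and do not reflect the platform's actual data model or API.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Sample:
    """One collected performance value (hypothetical record layout)."""
    resource_type: str   # e.g. "cpu", "disk"
    monitor_type: str    # e.g. "utilization"
    timestamp: datetime
    value: float


def query(samples, resource_type, monitor_type, start, end):
    """Return matching samples with start <= timestamp < end, oldest first."""
    hits = [
        s for s in samples
        if s.resource_type == resource_type
        and s.monitor_type == monitor_type
        and start <= s.timestamp < end
    ]
    return sorted(hits, key=lambda s: s.timestamp)


def summarize(hits):
    """Return min, max, and average of the values, or None if no samples."""
    values = [s.value for s in hits]
    if not values:
        return None
    return {"min": min(values), "max": max(values),
            "avg": sum(values) / len(values)}


# Made-up sample data for the example.
samples = [
    Sample("cpu", "utilization", datetime(2024, 1, 1, 10, 0), 40.0),
    Sample("cpu", "utilization", datetime(2024, 1, 1, 10, 5), 60.0),
    Sample("disk", "utilization", datetime(2024, 1, 1, 10, 5), 75.0),
    Sample("cpu", "utilization", datetime(2024, 1, 1, 11, 0), 90.0),
]

cpu = query(samples, "cpu", "utilization",
            datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 11, 0))
print(summarize(cpu))
```

The list returned by `query` corresponds to the list view of a single resource, and the same values could be passed to a charting library for the graph view.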

Other common functions

  1. Automatic discovery: During automatic discovery, the platform can search for network devices, identify the device type and the manufacturer and model, generate a panel diagram of the device, search for device resources such as boards, ports, CPU, memory, and disks, and discover the link relationships between devices.

  2. Topology management: Display network devices and their connection relationships in a visualized topology diagram, which can be edited by users. Devices, device resources, and connections can be managed through the topology map.

  3. Device management: Through the topology view, users can easily manage devices and their configuration parameters.

  4. Device resource management: Starting from the topology diagram, the platform can display further device details, including the physical components of the device, the services on the server (web server, middleware application service, database server, mail server), and other monitoring objects defined by the user.

  5. Connection management: Users can edit connections through the topology view and select the performance data items displayed in real time for the connection.

  6. Security management: Supports a variety of security management functions, such as QoS security policies, MAC-IP binding, blacklists and whitelists, and admission control.

  7. Statistical reports: Supports statistics on many kinds of data, giving users a comprehensive and intuitive understanding of the network. Statistical charts in the software can be exported or printed for backup or comparison.

      The whole system is built on the open source server operation and maintenance monitoring solution of the Zhihe network management platform, extended through secondary development and integration. The result is a server operation and maintenance monitoring expert system that supports management, monitoring, and alarms. The platform lets users manage and monitor the computer room and, more importantly, connects this information with the data of sibling units, sub-units, and service units. It ensures that users know the operating status of the business links at all levels of the forecast center clearly, accurately, and in real time. Once a problem occurs, users can make timely decisions based on this information so that business information is released on time.

 
