Open source or commercial? Top 10 cloud operation and maintenance monitoring tools

Original URL link: http://url.cn/kBXk3X
With the rapid development of cloud computing and the Internet, many applications must run across different network terminals and rely heavily on third-party services (such as payment, login, and navigation), so IT system architectures are becoming more and more complex. Rapid product iteration and the demand for a good user experience require IT operation and maintenance (O&M) managers to keep core business stable and available at all times, and the following pain points in enterprise O&M urgently need to be solved:

  1. Business-oriented O&M cares not only about the running status of individual IT resources, but also about the health of the entire business system

  2. If an enterprise uses a large number of APIs and modular applications, O&M must track the performance changes and metrics of each interface

  3. O&M supervisors and enterprise management particularly need a large wall-mounted monitoring screen

  4. O&M staff need to review weekly and monthly trend reports, but exporting data from traditional O&M tools is difficult

  5. Faults must be detected immediately and the faulty node located quickly, to reduce the loss caused by business interruption

Cloud Wisdom compares the mainstream open source and commercial O&M monitoring systems in the industry and analyzes the positioning, target users, and features of each product, hoping to help O&M staff, developers, and entrepreneurs find the tool that suits them best.

Open source operation and maintenance monitoring products

Zabbix

Recommended stars: five

Zabbix is an enterprise-level open source O&M platform that provides distributed system monitoring and network monitoring through a web interface. It is also the most widely used monitoring software among Internet companies in China: more than 85% of the users Cloud Wisdom has encountered use Zabbix as their monitoring solution.

Easy to get started, powerful, and free and open source: that is Cloud Wisdom's most immediate impression of Zabbix. Zabbix is easy to manage and configure and can generate attractive data graphs. Its automatic discovery feature greatly reduces the daily management workload, its rich data collection methods and APIs let users collect data flexibly, and its distributed architecture makes it possible to monitor more devices. In theory, Zabbix's plug-in architecture can meet any enterprise requirement.

User group: more than 85% of pan-Internet companies.

Advantages:

  1. Enterprise-level, distributed, open source monitoring software that supports multiple platforms

  2. Simple installation and deployment, with flexible integration of various data collection plug-ins

  3. Powerful functionality, including complex, multi-condition alarms

  4. Built-in graphing that plots the collected data

  5. A variety of APIs, plus support for calling scripts

  6. When a problem occurs, commands can be executed remotely and automatically (execution permission must be set on the agent)

Disadvantages:

  1. Modifying items in batches is inconvenient

  2. Although the community is mature, Chinese-language material is relatively scarce and service support is limited

  3. Getting started and setting up basic monitoring is easy, but deeper requirements demand great familiarity with Zabbix and extensive secondary development, which is difficult

  4. There are many system-level alarm settings, so without filtering there will be a flood of alarm emails; alarms for custom items must be configured by hand, which is cumbersome

  5. Data aggregation is lacking: for example, viewing the average value across a group of servers requires secondary development

  6. Data reports require dedicated secondary development
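
The aggregation gap mentioned in item 5 can be pictured with a small sketch. It only illustrates the kind of calculation such secondary development might perform: the `group_average` helper, the host and group names, and the sample readings are all hypothetical, and in practice the readings would be pulled from the monitoring system rather than hard-coded.

```python
from statistics import mean

# Hypothetical sample data: latest CPU load reading per host, keyed by host group.
# The values are invented for illustration only.
latest_cpu_load = {
    "web": {"web-01": 0.5, "web-02": 1.5},
    "db": {"db-01": 2.0},
    "cache": {},  # a group with no reporting hosts
}


def group_average(readings_by_group):
    """Return the mean of the latest readings for each host group.

    Groups without any readings map to None instead of raising an error.
    """
    averages = {}
    for group, readings in readings_by_group.items():
        values = list(readings.values())
        averages[group] = mean(values) if values else None
    return averages


for group, average in group_average(latest_cpu_load).items():
    print(group, average)
```

Returning `None` for an empty group is a design choice made here so that a group whose hosts have all stopped reporting remains visible instead of silently disappearing from the result.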

Nagios

Recommended stars: four

Nagios is an open source enterprise-level monitoring system that monitors basic system parameters such as CPU, disk, and network, as well as basic services such as SMTP, POP3, HTTP, and NNTP. In addition, by installing plug-ins and writing monitoring scripts, users can implement application monitoring and deploy a hierarchical monitoring architecture for a large number of monitored hosts and multiple objects.

The biggest feature of Nagios is its powerful management center. Although its purpose is to monitor services and hosts, Nagios itself contains no code for these checks: all monitoring and alarm functions are carried out by plug-ins.
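
Because the checks live in plug-ins, a plug-in is essentially a small program that prints one status line and reports its result through its exit code (by convention 0 for OK, 1 for WARNING, 2 for CRITICAL, and 3 for UNKNOWN). The following is a minimal sketch of a disk usage check written under that convention. The function names and the 80%/90% thresholds are assumptions chosen for illustration, not part of Nagios itself.

```python
import shutil

# Exit codes that Nagios plug-ins conventionally return.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(used_percent, warn=80.0, crit=90.0):
    """Map a disk usage percentage to a plug-in exit code and status line.

    The warn/crit defaults are illustrative assumptions.
    """
    if not 0.0 <= used_percent <= 100.0:
        return UNKNOWN, "DISK UNKNOWN - invalid reading"
    if used_percent >= crit:
        code = CRITICAL
    elif used_percent >= warn:
        code = WARNING
    else:
        code = OK
    # Text after "|" is performance data in the form label=value;warn;crit
    message = (
        f"DISK {LABELS[code]} - {used_percent:.1f}% used"
        f" | used={used_percent:.1f}%;{warn:.0f};{crit:.0f}"
    )
    return code, message


def main(path="/"):
    """Check the file system containing `path`, print the status line, return the code."""
    usage = shutil.disk_usage(path)
    code, message = evaluate(100.0 * usage.used / usage.total)
    print(message)
    return code
```

When used as a real plug-in, the script would end by passing the value returned by `main()` to `sys.exit`, so that Nagios receives the exit code. Keeping the threshold logic in `evaluate` makes it testable without touching a real disk.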

User group: enterprises with complex IT environments

Advantages:

  1. Failed servers, applications, and devices can be restarted automatically, and logs are rotated automatically

  2. Flexible configuration: shell scripts can be customized, and a distributed monitoring mode is supported

  3. Supports redundant host monitoring, with a variety of alarm settings

  4. Configuration files can be reloaded by command without interrupting Nagios

Disadvantages:

  1. Weak event console and poor plug-in usability

  2. Poor handling of performance, traffic, and similar metrics

  3. No historical data is visible, only alarm events, which makes it hard to trace the cause of a failure

  4. Configuration is complex, and beginners must invest a lot of time, effort, and money

Ganglia

Recommended stars: four

Ganglia is an open source cluster monitoring project started at the University of California, Berkeley, and originally designed to monitor thousands of network nodes. It is a cross-platform, scalable, distributed monitoring system for high-performance computing systems, and it has been ported to a wide range of operating systems and processor architectures.

User group: users of large server clusters.

Advantages:

  1. Well suited to monitoring system performance; the working status of each node is easy to read from the graphs

  2. Monitoring items can be customized; data can be displayed as tables or images, and a mobile version is supported

  3. Easy to deploy; tens of thousands of machines can be managed through a layered hierarchy, with no need to add configurations one by one

Disadvantages:

  1. No built-in message notification system

  2. No alarm mechanism, so problems cannot be reported in time

Zenoss

Recommended stars: four

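Zenoss Core is the open source version of Zenoss; the commercial version is Zenoss Enterprise. As enterprise-level intelligent monitoring software, Zenoss Core lets IT administrators monitor the status and health of the network architecture from a single web console. Its power comes from a detailed inventory and configuration management database (CMDB) used to discover and manage the assets in a company's IT environment. Zenoss also provides an event and error management system tied to the CMDB, which helps improve the efficiency of managing events and alerts.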

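Advantages:

  1. Zenoss's standout feature is its dashboard, which can be configured with many portlets

  2. Each user's interface is managed separately, so customizing a dashboard does not affect other users

  3. Powerful monitoring covers servers, routers and switches, firewalls, storage, databases, and middleware

  4. Uses HBase-based OpenTSDB to store data for any time period

  5. Integrates status monitoring, performance monitoring, resource management, and a good reporting mechanism

Disadvantages:

  1. High resource requirements: even when managing only a few devices, it consumes a large amount of hardware, memory, and other resources

  2. For Windows systems, the open source version provides only SNMP; WMI-based detection of CPU, disk, software and hardware, and performance is available only in the paid version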

Open-falcon

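Recommended stars: three

Open-falcon is an Internet-oriented, enterprise-level open source monitoring product developed by Xiaomi's O&M team. It starts from the needs of Internet companies and draws on years of O&M experience, together with usage experience and feedback from SREs, SAs, and developers.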

Figure: Open-falcon architecture

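User group: currently used to varying degrees by dozens of enterprises.

Advantages:

  1. Automatic discovery; supports falcon-agent, SNMP, active push by users, and user-defined plug-ins

  2. Supports hundreds of millions of data collection, alarm evaluation, historical data storage, and query operations per cycle

  3. Efficient portal; supports policy templates, template inheritance and overriding, multiple alarm methods, and callback invocation

  4. A single machine supports reporting, archiving, and storage of 2 million metrics

  5. Uses rrdtool's data archiving strategy, returning one year of historical data for hundreds of metrics within seconds

  6. Multi-dimensional data display and user-defined Screens

  7. Through various plug-ins, currently supports monitoring of Linux, Windows, MySQL, Redis, Memcache, RabbitMQ, and switches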
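
The archiving strategy in item 5 rests on a general idea: older data is kept at a coarser resolution by consolidating several fine-grained samples into one point, for example with an average or a maximum, so a long time range can be answered from far fewer points. The sketch below is a simplified illustration of that idea only, not rrdtool's or Open-falcon's actual code. The `consolidate` helper and the sample values are hypothetical.

```python
from statistics import mean


def consolidate(samples, factor, func=mean):
    """Collapse every `factor` consecutive samples into one consolidated point.

    A trailing partial bucket is dropped, so only complete buckets are archived.
    """
    if factor < 1:
        raise ValueError("factor must be at least 1")
    complete = len(samples) - len(samples) % factor
    return [func(samples[i:i + factor]) for i in range(0, complete, factor)]


# Hypothetical one-minute samples of a metric (values invented for illustration).
one_minute = [1.0, 2.0, 3.0, 4.0, 5.0, 2.0, 4.0, 6.0, 8.0, 10.0, 7.0]

five_minute_avg = consolidate(one_minute, 5)       # average archive
five_minute_max = consolidate(one_minute, 5, max)  # maximum archive
print(five_minute_avg, five_minute_max)
```

Keeping several archives with different consolidation functions trades storage for flexibility: the average archive shows the trend, while the maximum archive preserves spikes that averaging would smooth away.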

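Disadvantages: Because it was released only recently, many basic service monitoring plug-ins (such as Tomcat and Apache) are not yet supported, and many features are still being improved. In addition, because dedicated support is lacking, problems are resolved relatively slowly even though there is an open community.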

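Commercial operation and maintenance monitoring products

Jiankongbao (监控宝)

Recommended stars: five

Jiankongbao is a SaaS product from Cloud Wisdom that provides IT performance monitoring (IT Performance Monitoring), including website monitoring, server monitoring, middleware monitoring, database monitoring, application monitoring, API monitoring, and page performance monitoring. It is available in free, premium, and enterprise editions and currently has about 400,000 users. The Jiankongbao app is also the only product in China that provides mobile monitoring services.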

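User group: hundreds of thousands of users across industries such as e-commerce, mobile Internet, advertising and media, online games, and education and healthcare. Leading companies in many industries use Jiankongbao, including Xiaomi, Momo, AutoNavi, Yonyou, Kingsoft, Tuniu, Jumei, Lufax, Ping An, the CCB Credit Card Center, Chunyu Doctor, Changyou, State Grid, China Telecom, Didi Dache, Spring Airlines, and ifeng.com, as do more than 30% of China's top 100 Internet companies.

Advantages:

  1. As the earliest SaaS-based network monitoring platform in China, Jiankongbao not only offers a free standard service to entry-level users but also lets enterprise users purchase monitoring and alerting resources on demand, minimizing enterprise O&M costs

  2. Through more than 300 distributed monitoring nodes around the world, Jiankongbao actively monitors and analyzes network stability and availability in real time. It supports multiple protocols, including HTTP(S), FTP, ping, UDP, TCP, SMTP, and traceroute, measures CDN effectiveness and DNS status, and analyzes performance trends across the whole network and all regions

  3. Captures deep server performance metrics in real time and supports Linux/Unix/Windows systems and cloud platforms. It covers physical metrics such as CPU usage, CPU load average, memory usage ratio, disk I/O, disk space usage, network traffic, and system process count, as well as more than 30 application services. Cloud host monitoring can be enabled with one click, without complex configuration. For application services, Jiankongbao already supports common types including Apache, Lighttpd, Nginx, Tomcat, IIS, Memcache, and Redis; storage-layer monitoring covers the health and performance of Hadoop, MySQL, MongoDB, SQL Server, and Oracle

  4. Jiankongbao is currently the only network monitoring product in China that supports API monitoring. It simulates user workflows through API calls and supports real-time monitoring of six request methods: GET, POST, PUT, DELETE, HEAD, and OPTIONS. It supports JSON, XML, Text, and Response Status validation, as well as Postman script import

  5. Docker monitoring is also exclusive to Jiankongbao. It monitors the CPU, memory, network traffic, and swap status of Docker containers in real time, so developers and O&M staff clearly understand resource consumption when using Docker

  6. Jiankongbao provides page performance management: it defines a page performance index based on international standards, identifies the status and correctness of loaded elements, analyzes load response times across the whole network and all users, pinpoints problem elements, and offers optimization suggestions

  7. Timely and effective alert notification is critical for O&M. Jiankongbao can set alert thresholds according to SLAs and send alert notifications immediately. It covers the most comprehensive set of notification methods: email, SMS, voice call, URL callback, App Push, and more. It also provides tiered alert notification, pushing different alerts to different people according to the level of the alert event, which supports hierarchical management in enterprises

  8. Jiankongbao has open-sourced its Smart Agent, so users can customize the Agent for their business needs while keeping their data secure

  9. Jiankongbao offers a private deployment solution to meet the needs of government, enterprise, and financial customers for monitoring dedicated networks

  10. Senior enterprise IT service experts from companies such as Compuware, CA, and IBM, more than five years of localized enterprise SaaS experience, and a technical service team of more than 100 people provide users with the best service guarantee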
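
To make the API monitoring in item 4 more concrete, here is a minimal sketch of what a Response Status and JSON validation step might look like. It is not Jiankongbao's implementation: the `check_api_response` helper, the field names, and the sample responses are hypothetical, and the function works on a status code and body that have already been received, so it performs no network access.

```python
import json


def check_api_response(status, body, expected_status=200, required_fields=()):
    """Validate an API response; return a list of problems (empty means healthy)."""
    problems = []
    if status != expected_status:
        problems.append(f"status {status}, expected {expected_status}")
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        problems.append("body is not valid JSON")
        return problems
    if not isinstance(payload, dict):
        problems.append("JSON body is not an object")
        return problems
    for field in required_fields:
        if field not in payload:
            problems.append(f"missing field: {field}")
    return problems


# Hypothetical responses from a login API (invented for illustration).
healthy = check_api_response(
    200, '{"token": "abc", "user_id": 1}', required_fields=("token", "user_id")
)
broken = check_api_response(500, '{"error": "timeout"}', required_fields=("token",))
print(healthy, broken)
```

Collecting every problem in a list, instead of stopping at the first one, gives the person receiving the alert a fuller picture of what went wrong in a single notification.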

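360 Website Service Monitoring (360网站服务监控)

Recommended stars: two

360 Website Service Monitoring is a website monitoring product aimed at webmasters, providing free website and server monitoring.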

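User group: individual webmasters

Advantages:

  1. Free service; supports website HTTP monitoring, ping monitoring, domain DNS monitoring, and server monitoring

  2. Provides panoramic website access data and simple configuration information

Disadvantages:

  1. Supports only simple website and server monitoring; historical data is retained for 15 days, and only four free monitoring points are available

  2. The last product update was in September 2014; updates and operational support have since stopped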

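Alibaba Cloud Monitor (阿里云监控)

Recommended stars: four

Alibaba Cloud Monitor is a free website monitoring product that can monitor sites and servers and provides several alert methods: SMS, Wangwang, and email.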

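User group: Alibaba Cloud users

Advantages:

  1. Tightly bundled with Alibaba Cloud services; allows users to define custom data monitoring

  2. Data is transferred over the internal network between Alibaba Cloud's multiple IDCs, without consuming customers' public network resources

  3. Supports general statistics on business data, reflecting how a service is running from multiple angles

Disadvantages:

  1. All services are based on Alibaba Cloud; functionality is narrow and extensibility is poor

  2. Not powerful enough; meets only basic monitoring needs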

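Baidu Cloud Observation (百度云观测)

Recommended stars: two

Baidu Cloud Observation is a cloud service product launched by Baidu, similar to 360 cloud monitoring and Alibaba Cloud Monitor. It mainly provides webmasters with free one-stop website monitoring and early-warning services, covering website running status, security, and access speed.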

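User group: individual webmasters

Advantages:

  1. Performs security checks on the sites users visit daily

  2. Cloud nodes cover major cities in China; supports CDN, DNS status, and host monitoring

Disadvantages:

  1. Website verification is required

  2. Few monitoring points and simple functionality; it can only monitor website status and does not support server or application monitoring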

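Xiaomifeng Website Monitoring (小蜜蜂网站监测)

Recommended stars: one

Xiaomifeng Website Monitoring is an online tool built for small and medium-sized enterprises that comprehensively measures how a website is operating. It can periodically monitor the availability (uptime) of a website or server and send an alert notification as soon as the website becomes unreachable or the server returns an error.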
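
Availability of this kind is commonly expressed as the share of periodic checks that succeeded. The sketch below shows one way to compute it and to decide when to alert. The helper names and the probe results are hypothetical, and the `consecutive_failures` parameter is an assumption added here: the default of 1 alerts as soon as a check fails, while a higher value waits for repeated failures to reduce false alarms.

```python
def uptime_percent(check_results):
    """Return the percentage of successful checks, or None if there are no checks."""
    if not check_results:
        return None
    successes = sum(1 for ok in check_results if ok)
    return 100.0 * successes / len(check_results)


def should_alert(check_results, consecutive_failures=1):
    """Alert when the most recent `consecutive_failures` checks all failed."""
    if consecutive_failures < 1:
        raise ValueError("consecutive_failures must be at least 1")
    recent = check_results[-consecutive_failures:]
    return len(recent) == consecutive_failures and not any(recent)


# Hypothetical results of periodic probes: True = reachable, False = failed.
results = [True, True, True, True, True, False, False, False]
print(uptime_percent(results), should_alert(results, consecutive_failures=3))
```

With these sample results, five of eight checks succeeded and the last three failed in a row, so the availability figure drops and the alert condition is met.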

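User group: website administrators at small and medium-sized enterprises

Advantages:

  1. Xiaomifeng monitors website availability in two modes, probe nodes and Last Mile, and supports multiple site monitoring types and different network transport protocols

  2. Provides varied monitoring alert settings and supports real-time in-site alert messages and RSS

Disadvantages:

  1. Supports only basic website monitoring; monitoring points cannot be selected, and the monitoring service is unstable

  2. Historical website performance data is not detailed enough and cannot be exported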

As new technologies continue to develop, cloud services have become a necessity for Internet companies, but traditional physical hosts and cloud hosts, and private and public clouds, will coexist for a long time. Internet companies also grow very quickly: many, such as Xiaomi and Didi Chuxing, have grown up in just a few years. It is therefore essential to choose a suitable cloud monitoring product that can grow with the company.
