[Reprint] Beyond the ELK open-source monitoring stack: InfluxData's TICK

Source | InfluxData

Translator | Key to Mori

Choosing the right tool depends on what you're doing.

Applications express themselves, and time-series data is one of their languages. DevOps, cloud computing, and container technology have changed the way we write and run applications. Built on a family of open-source projects, InfluxData and its community are committed to providing a modern, flexible monitoring kit.

Over the past decade, containers, virtual machines, and cloud computing have changed everything. These changes happen quickly, and we need to respond to environmental change at the same pace; applications also need a more convenient way to keep evolving. We therefore need to understand application behavior and be prepared for failure, so that we can improve our applications. We already have the right tools and techniques; we only need to integrate them with our applications to understand how they run and how the infrastructure evolves, and ultimately to understand system failures and improve performance.

Monitoring logs

We have been reading logs for a long time, and there are tools to help us understand application behavior from them. We do this because:

We have to trust something. Without information in front of us, we cannot understand application behavior. We need to know how users use the application and how many exceptions have occurred. There are many metrics we can track, and we combine them to build trust in our systems.

We want to be able to predict the future. We want to make predictions based on the metrics and behaviors we have identified, so we can judge whether we are growing, by how much, and how fast we can keep growing. With this information, we can design a plan that may anticipate some of the less pleasant events.

 

Sample Log

People who monitor systems use a powerful command called "tail" to read logs. In general, this is how we use it: a log has a baseline or normal state. If the log output keeps flowing in that normal state, everything is fine. If the output comes too fast or too slow, something is wrong and we need to take corrective action.

 

Reading logs is not the most intelligent way to understand a system, but it is the most common, and everyone monitors applications this way. We can certainly do better, but that requires looking at the characteristics of logs.

Logs are descriptive and contain a large amount of information, which also makes them expensive to store in a database. Since logs are usually plain text, they are not easy to index. This means the engine must try to understand relationships between log entries and support searching their contents. If you have a lot of logs, or applications that are constantly producing them, you need a good system behind you. It is hard, but not impossible.

There are many tools and services that can aggregate and analyze log events as they happen, such as Logstash, Kibana, Elasticsearch, New Relic, CloudWatch, Graphite, and so on. Some are offered as services, some are open-source projects, and some come in both forms. The key point is that there are plenty of choices.

Choosing a log monitoring tool

Choosing the right tool depends on what you're doing. In some scenarios you need logs to settle a debate with someone, or simply for archiving. Since logs contain detailed information about the events taking place, you can use them for these scenarios. When an exception occurs, you can determine its type. Logs are mostly used to obtain this kind of information.

In other scenarios, however, you just want to know how the application is running: for example, whether the log shows more or fewer exceptions over time, and how they are distributed. You don't need to know exactly what happened or why; you just need to know that the application's behavior has changed. On the other hand, you also use time-series data every day to understand system behavior. Time-series data is not as detailed as logs; it speaks another language. CPU and memory usage, for example, are time-series data.

You cannot use only time-series data and no logs, because some problems can only be resolved with logs. I'm not here to argue whether logs or time-series data are better, because you will most likely use both; each has its value. In fact, a log is a form of time-series data. If you value time series and want to simplify your logs, you can do some computation over them, and the logs become easier to index.

 

You are effectively converting logs into time-series data. Think about how many logins or exceptions occur in your application, or, if you are a financial company, how many transactions: these are all time series, because each is a value at a point in time, such as a single login. They form a distribution over time. That is what time-series data means, and logs can be converted into it. It is not just an integer or a value; it is a log seen from a different angle.

Simply put, you can reduce a log entry to just a value and a corresponding timestamp, and then aggregate, compare, and otherwise work with those points. If you spend ten minutes thinking about your application, you will find plenty of time-series data.
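As a minimal sketch of this reduction, the snippet below collapses raw log lines into a per-minute count of errors. The log format and function name are assumptions for illustration, not something defined in the article.

```python
import re
from collections import Counter

def exceptions_per_minute(log_lines):
    """Reduce raw log lines to a time series: ERROR count per minute."""
    counts = Counter()
    for line in log_lines:
        # Assumed log format: "YYYY-MM-DD HH:MM:SS LEVEL message" (hypothetical)
        m = re.match(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2} (\w+)", line)
        if m and m.group(2) == "ERROR":
            counts[m.group(1)] += 1  # key = timestamp truncated to the minute
    return dict(counts)

logs = [
    "2018-11-16 10:23:05 INFO user login",
    "2018-11-16 10:23:45 ERROR NullPointerException",
    "2018-11-16 10:23:59 ERROR timeout",
    "2018-11-16 10:24:10 ERROR timeout",
]
print(exceptions_per_minute(logs))
# {'2018-11-16 10:23': 2, '2018-11-16 10:24': 1}
```

Each key/value pair is exactly the shape described above: a point in time and a number, ready to be stored, aggregated, or charted.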

In addition, every resource you obtain and use from your servers is time-series data. You can visualize it alongside your application statistics to understand, say, how a spike in the exception rate drives up memory usage.

 

As developers, we know that everything we did five years ago now looks complicated. Our goal today is to simplify. Simple things are easier to explain to others and easier to maintain. For time-series data, that is what I do: a value and a time, where the value is a number. With this model you can compute over the data, aggregate it, build charts, and extract information from your application at a reasonable cost. Compared with general-purpose tools such as Cassandra, MySQL, or MongoDB, InfluxDB is better suited to this kind of data, because it provides features built for this specific use case, such as continuous queries and retention policies, along with optimizations for series storage and compression.

Using InfluxDB as a log store

InfluxDB is a time-series database. You can push all the information your applications or servers produce into it. It is a Go binary you can download for Windows and Mac, and it is easy to install and start. InfluxDB speaks InfluxQL, which means you can query the database with a language very similar to SQL; you already know SQL, so there is no new language to learn. Here is a summary of reasons to choose InfluxDB:
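To give a feel for the SQL-like syntax, here is what a query over the per-minute exception series might look like in InfluxQL. The measurement and tag names ("exceptions", "service") are illustrative assumptions, not from the article.

```sql
-- Count exceptions per minute over the last hour, grouped by service.
-- "exceptions" and "service" are hypothetical names.
SELECT count("value")
FROM "exceptions"
WHERE time > now() - 1h
GROUP BY time(1m), "service"
```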

Easy to get started

Familiar query syntax

No external dependencies

Open source

Horizontally scalable

Part of a tightly integrated time-series platform

InfluxDB has a large user base and community. Combined with the other components of the InfluxData platform discussed below, it forms a full-stack monitoring system that supports both irregular time-series data (events that occur at arbitrary intervals) and regular time-series data (metrics sampled at fixed intervals), as shown below.

 

At InfluxData, we ran a series of benchmarks to show why you should choose a purpose-built time-series database rather than whatever kind of database you happen to like. The difference in write performance between InfluxDB and comparable databases is large. Benchmarks are often biased, but we try to make ours more objective through independent testing. See the benchmarks comparing InfluxDB with Elasticsearch, MongoDB, Cassandra, and OpenTSDB.

Building a modern monitoring system

InfluxData maintains a full stack of open-source projects -- Telegraf (https://www.influxdata.com/time-series-platform/telegraf/), InfluxDB (https://www.influxdata.com/time-series-platform/influxdb/), Chronograf (https://www.influxdata.com/time-series-platform/chronograf/), and Kapacitor (https://www.influxdata.com/time-series-platform/kapacitor/). Together they form the TICK stack.

 

The complete stack for building a monitoring or event system

Telegraf is a server-side agent for collecting and sending metrics. It is a Go binary you can download and start, and it is very simple to use. You install one Telegraf on each server and configure it to collect information from that machine. Telegraf has integrations for all kinds of metrics, events, and logs from the container or system it runs on, for metrics pulled from third-party APIs, and even for metrics received by listening as a StatsD or Kafka consumer service. It is plugin-based, with input and output plugins; output plugins can send metrics to a variety of data stores, services, and message queues such as InfluxDB, Graphite, OpenTSDB, Datadog, Librato, Kafka, MQTT, NSQ, and more. If you already have a monitoring system and are looking for a powerful collector, you can use Telegraf.
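A minimal Telegraf configuration sketch might look like the following. The plugin names are real Telegraf plugins, but the specific values (URL, database name) are illustrative assumptions.

```toml
# Collect CPU and memory metrics from this host and write them
# to a local InfluxDB (URL and database name are examples).
[[inputs.cpu]]
  percpu = true
  totalcpu = true

[[inputs.mem]]

[[outputs.influxdb]]
  urls = ["http://127.0.0.1:8086"]
  database = "telegraf"
```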

 

InfluxDB is the storage engine, a data store for any use case involving large amounts of timestamped data, including DevOps monitoring, log data, application metrics, IoT sensor data, and real-time analytics. All the metrics from Telegraf can be sent to InfluxDB. InfluxDB can be configured to keep data only for a specified duration, automatically expiring and deleting data the system no longer needs, which saves storage space on the machine. InfluxDB also provides a SQL-like query language for interacting with the data.
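Automatic expiry is configured through retention policies. A sketch in InfluxQL (the database and policy names here are illustrative):

```sql
-- Automatically expire points older than 30 days in the "monitoring"
-- database; names are hypothetical examples.
CREATE RETENTION POLICY "thirty_days" ON "monitoring"
  DURATION 30d REPLICATION 1 DEFAULT
```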

 

Chronograf is the user-interface component of the InfluxData platform's TICK stack. In Chronograf you can see all the data stored in InfluxDB and build robust queries and alerts on top of it. It is simple to use and includes templates and libraries that let you rapidly build dashboards with real-time data visualizations. You can also administer InfluxDB and Kapacitor from Chronograf. If you don't plan to use Chronograf, other projects implement InfluxDB integrations, including Grafana.

 

Kapacitor is the TICK stack's native real-time streaming data processing engine. It can be configured to act preemptively on events as they happen, based on the metrics it observes. It processes both streaming data and batch data from InfluxDB. Kapacitor lets you plug in custom logic or user-defined functions to handle alerts with dynamic thresholds, match metrics against patterns, compute statistical anomalies, and take specific actions based on alerts, such as dynamic load rebalancing. You can send Kapacitor alerts to compatible event-management integrations, including HipChat, OpsGenie, Alerta, Sensu, PagerDuty, Slack, and more. For example, Kapacitor can send a message to PagerDuty so you are notified if something goes wrong during the night, or post a message to Slack.
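Kapacitor tasks are written in TICKscript. As a sketch, the stream task below raises a critical Slack alert when CPU idle drops under 20%; the measurement and field names follow Telegraf's cpu plugin, while the threshold and channel are illustrative.

```
// Stream task: alert to Slack when CPU idle falls below 20%
// (threshold and channel are example values).
stream
    |from()
        .measurement('cpu')
    |alert()
        .crit(lambda: "usage_idle" < 20)
        .slack()
        .channel('#alerts')
```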

 

Getting InfluxDB up and running, along with the whole TICK stack, is fairly simple. You can run the binaries or Docker containers, and you have a working monitoring system. But the real goal of a monitoring system is to notify you when your infrastructure has problems or your application goes down. If your monitoring system goes down together with your servers, it isn't doing its job. So you need to be able to trust your monitoring system. You need to decouple it from your applications and infrastructure, so you can be 100% sure it keeps working when the applications and servers fail. Know that this is not a trivial goal, and it means more than a few docker run commands.
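For a local trial of the stack (not the decoupled production setup the paragraph above warns about), a docker-compose sketch could look like this. The official images exist on Docker Hub, but the tags and wiring are illustrative, and Telegraf and Kapacitor would still need config files pointing at InfluxDB.

```yaml
# Illustrative docker-compose sketch for trying the TICK stack locally.
version: "2"
services:
  influxdb:
    image: influxdb:1.7
    ports: ["8086:8086"]       # HTTP API
  telegraf:
    image: telegraf:1.9
    depends_on: [influxdb]
  chronograf:
    image: chronograf:1.7
    ports: ["8888:8888"]       # web UI
    depends_on: [influxdb]
  kapacitor:
    image: kapacitor:1.5
    ports: ["9092:9092"]
    depends_on: [influxdb]
```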

 

Not everyone can run a monitoring system

Reference links:

https://www.influxdata.com/time-series-platform/telegraf/

https://www.influxdata.com/time-series-platform/influxdb/

https://www.influxdata.com/time-series-platform/chronograf/

https://www.influxdata.com/time-series-platform/kapacitor/

  • Published: 2018-11-16
  • Original link: https://kuaibao.qq.com/s/20181116B0PN3B00?refer=cp_1026
