Metric Graphs 101: Summary Graphs

This is the second article in a series about visualizing monitoring data. It focuses on summary graphs.

In the first part of this series, we discussed timeseries graphs: visualizations that show infrastructure metrics over time. In this article we introduce summary graphs, which flatten a particular span of time to provide a summary window into your infrastructure: single-value summaries, toplists, change graphs, host maps, and distributions.

For each graph type, we will explain how it works and when to use it. But first, we will quickly cover two concepts that are necessary to understand summary graphs: aggregation across time (which you can think of as "time flattening" or "snapshotting") and aggregation across space.

Aggregation across time

To provide a summary view of a metric, a visualization must flatten a timeseries to a single value by compressing the time dimension out of view. This aggregation across time can mean simply displaying the latest value returned by a metric query, or it can be a more complex aggregation that returns a value computed over a moving time window.

For example, instead of displaying only the latest reported value for a metric query, you may want to display the maximum value reported by each host over the past 60 minutes to surface problematic spikes:

[Figure: Redis latency graph]
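The two flavors of time aggregation described above can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the function names, the `(timestamp, value)` point format, and the sample latencies are hypothetical and do not come from any monitoring product's API.

```python
from datetime import datetime, timedelta

def flatten_latest(points):
    """Return the value of the most recent (timestamp, value) point."""
    return max(points, key=lambda p: p[0])[1]

def flatten_max(points, now, window=timedelta(minutes=60)):
    """Return the maximum value reported within the trailing window, or None."""
    recent = [v for t, v in points if now - window <= t <= now]
    return max(recent) if recent else None

now = datetime(2020, 1, 1, 12, 0)
# Hypothetical latency samples (ms) reported by one host
series = [
    (now - timedelta(minutes=90), 40.0),  # older than the 60-minute window
    (now - timedelta(minutes=45), 12.5),
    (now - timedelta(minutes=20), 31.0),  # spike inside the window
    (now - timedelta(minutes=1), 3.2),
]

latest = flatten_latest(series)  # looks healthy
peak = flatten_max(series, now)  # surfaces the spike
print(latest, peak)
```

The latest value hides the spike that the 60-minute maximum reveals, which is why the choice of time aggregation matters for a summary view.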

Aggregation across space

Not all metrics make sense when broken out by host, container, or other unit of infrastructure. So you will often need some aggregation across space to create a visualization that sensibly reflects your infrastructure. This aggregation can take many forms: aggregating metrics by message queue, by database table, by application, or by some attribute of the hosts themselves (operating system, availability zone, hardware profile, etc.).

Aggregation across space allows you to slice and dice your infrastructure to isolate exactly the metrics that make your critical systems observable.

Instead of listing peak Redis latencies at the host level, as in the example above, it may be more useful to see peak latencies for each internal service that is built on Redis. Or you can surface only the maximum value reported by any one host in your infrastructure:

[Figure: Redis latency] Aggregation across space: grouping hosts by service name (top), or compressing a list of hosts down to a single value (bottom)
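As a rough sketch of the two space aggregations in the figure, the snippet below groups per-host peaks by service and then collapses all hosts into one value. The host names, the `service` field, and the numbers are made up for illustration.

```python
from collections import defaultdict

# Hypothetical per-host peak latencies (ms), each labeled with the service it backs
host_peaks = [
    {"host": "redis-1", "service": "sessions", "peak_ms": 4.1},
    {"host": "redis-2", "service": "sessions", "peak_ms": 9.7},
    {"host": "redis-3", "service": "cache", "peak_ms": 2.3},
    {"host": "redis-4", "service": "cache", "peak_ms": 1.8},
]

def max_by(records, key, value="peak_ms"):
    """Group records by `key` and keep the maximum `value` in each group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record[value])
    return {group: max(values) for group, values in groups.items()}

# Top of the figure: one value per service
by_service = max_by(host_peaks, "service")
# Bottom of the figure: the whole host list compressed to a single value
overall = max(record["peak_ms"] for record in host_peaks)
print(by_service, overall)
```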

Aggregation across space is also useful in timeseries graphs. For example, it is hard to make sense of a host-level graph of web requests, but the same data is easy to interpret when the metrics are aggregated by availability zone:

[Figure] From unaggregated (line graph, top) to aggregated across space (stacked area graph, bottom)

The primary reason to tag your metrics is to enable aggregation across space.
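To make the role of tags concrete, here is a minimal sketch that sums host-level request rates into one series per availability zone. The tag keys, host names, and values are assumptions chosen for the example, not a real query language or API.

```python
from collections import defaultdict

# Hypothetical tagged points: (timestamp in seconds, tags, requests per second)
points = [
    (0, {"host": "web-1", "availability-zone": "us-east-1a"}, 120),
    (0, {"host": "web-2", "availability-zone": "us-east-1a"}, 80),
    (0, {"host": "web-3", "availability-zone": "us-east-1b"}, 100),
    (60, {"host": "web-1", "availability-zone": "us-east-1a"}, 130),
    (60, {"host": "web-2", "availability-zone": "us-east-1a"}, 70),
    (60, {"host": "web-3", "availability-zone": "us-east-1b"}, 110),
]

def sum_by_tag(points, tag):
    """Sum values per timestamp within each tag value: one series per tag value."""
    series = defaultdict(lambda: defaultdict(int))
    for timestamp, tags, value in points:
        series[tags[tag]][timestamp] += value
    return {name: dict(sorted(s.items())) for name, s in series.items()}

by_zone = sum_by_tag(points, "availability-zone")
print(by_zone)
```

Because every point carries its tags, the same data could be regrouped by any other tag key without changing how it is collected.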

Single-value summaries

Single-value summaries display the current value of a given metric query, with conditional formatting (such as a green/yellow/red background) to convey whether the value is within the expected range. The value displayed by a single-value summary does not necessarily represent an instantaneous measurement. The widget can display the latest value reported, or an aggregate computed from all query values across the time window. These visualizations provide a narrow but unambiguous window into your infrastructure.

[Figure: host count widget]
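The logic behind such a widget can be approximated as follows. The thresholds, the aggregate names, and the assumption that higher values are worse are all illustrative choices, not the behavior of any particular dashboard tool.

```python
def single_value(values, aggregate="last"):
    """Flatten the query values in the time window to the one number displayed."""
    if aggregate == "last":
        return values[-1]
    if aggregate == "avg":
        return sum(values) / len(values)
    if aggregate == "max":
        return max(values)
    raise ValueError(f"unknown aggregate: {aggregate}")

def status_color(value, warn, crit):
    """Map a value to a background color, assuming higher values are worse."""
    if value >= crit:
        return "red"
    if value >= warn:
        return "yellow"
    return "green"

# Hypothetical values returned by a query over the widget's time window
window = [210, 340, 480, 390]

for aggregate in ("last", "avg", "max"):
    value = single_value(window, aggregate)
    print(aggregate, value, status_color(value, warn=400, crit=450))
```

The same window is green when summarized by its latest value or average and red when summarized by its maximum, so the aggregate should be chosen to match what the widget is meant to warn about.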

When to use single-value summaries

| What | Why | Example |
| --- | --- | --- |
| Work metrics from a given system | To make key metrics immediately visible | Web server requests per second (figure: NGINX requests per second) |
| Critical resource metrics | To provide an overview of resource status and health at a glance | Healthy hosts behind a load balancer (figure: ELB host count) |
| Error metrics | To quickly draw attention to potential problems | Fatal database exceptions (figure: Cassandra unavailable exceptions) |
| Computed metric changes as compared to previous values | To communicate key trends clearly | Hosts in use versus one week ago (figure: EC2 host increase) |

Toplists

Toplists are ordered lists that allow you to rank hosts, clusters, or any other segment of your infrastructure by their metric values. Because they are so easy to interpret, toplists are especially useful in high-level status boards.

Compared to single-value summaries, toplists have an additional layer of aggregation across space, in that the value of the metric query is broken out by group. Each group can be a single host or an aggregation of related hosts.

[Figure: maximum Redis latency by availability zone]
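Once each group has been flattened to a single value, building a toplist is just a sort and a cut. The sketch below assumes the grouping has already happened; the zone names and latencies are invented.

```python
def toplist(values_by_group, limit=5, descending=True):
    """Rank groups by their flattened metric value and keep the first `limit`."""
    ranked = sorted(values_by_group.items(), key=lambda kv: kv[1], reverse=descending)
    return ranked[:limit]

# Hypothetical maximum Redis latency (ms) per availability zone
max_latency_by_az = {
    "us-east-1a": 9.7,
    "us-east-1b": 2.3,
    "us-east-1c": 14.2,
    "us-east-1d": 0.9,
}

top3 = toplist(max_latency_by_az, limit=3)
for rank, (group, value) in enumerate(top3, start=1):
    print(rank, group, value)
```

Passing `descending=False` ranks from the lowest value instead, which suits metrics where small numbers are the concern (free disk space, for example).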

When to use toplists

| What | Why | Example |
| --- | --- | --- |
| Work or resource metrics taken from different hosts or groups | To spot outliers, underperformers, or resource overconsumers at a glance | Points processed per app server (figure: app server toplist) |
| Custom metrics returned as a list of values | To convey KPIs in an easy-to-read format (e.g. for status boards on wall-mounted displays) | Versions of the Datadog Agent in use (figure: Agent version toplist) |

Change graphs

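Whereas toplists give you a summary of recent metric values, change graphs compare a metric's current value against its value at a point in the past.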

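The key difference between change graphs and other visualizations is that change graphs take two different timeframes as parameters: one for the size of the evaluation window and one to set the lookback window.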

[Figure: change graph of login failures]
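The two timeframes can be illustrated with a small sketch: the evaluation window sets how much data is averaged, and the lookback sets how far back the comparison window sits. Averaging within each window is an assumption made here for simplicity, and the function names and sample data are hypothetical.

```python
def window_avg(points, start, end):
    """Average the values whose timestamp t satisfies start <= t < end, or None."""
    values = [v for t, v in points if start <= t < end]
    return sum(values) / len(values) if values else None

def change(points, now, eval_window, lookback):
    """Compare the current evaluation window with the same-sized window `lookback` ago."""
    current = window_avg(points, now - eval_window, now)
    previous = window_avg(points, now - lookback - eval_window, now - lookback)
    if current is None or previous is None:
        return None
    absolute = current - previous
    percent = 100.0 * absolute / previous if previous else None
    return {"current": current, "previous": previous,
            "absolute": absolute, "percent": percent}

HOUR = 3600
DAY = 24 * HOUR
now = 8 * DAY  # timestamps are plain seconds in this sketch

# Hypothetical login-failure counts: two samples a week ago, two in the last hour
points = [
    (now - 7 * DAY - 1800, 40), (now - 7 * DAY - 600, 60),
    (now - 1800, 70), (now - 600, 80),
]

result = change(points, now, eval_window=HOUR, lookback=7 * DAY)
print(result)
```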

When to use change graphs

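| What | Why | Example |
| --- | --- | --- |
| Cyclic metrics that rise and fall daily, weekly, or monthly | To separate genuine trends from periodic baselines | Database write throughput, compared to the same time last week (figure: Cassandra write throughput) |
| High-level infrastructure metrics | To quickly identify large-scale trends | Total host count, compared to the same time yesterday (figure: EC2 host counts) |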

Host maps

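Host maps are a unique way to see your entire infrastructure, or any slice of it, at a glance. However you slice and dice your infrastructure (by data center, by service name, by instance type, etc.), you will see each host in the selected group as a hexagon, color-coded and sized by any metrics reported by those hosts.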

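This particular visualization type is unique to Datadog. As such, it is specifically designed for infrastructure monitoring, in contrast to the general-purpose visualizations described elsewhere in this article.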

[Figure: host map by instance type]
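The data preparation behind a host map can be imagined roughly as below: group hosts by an attribute, then color each host by a metric. This is not Datadog's implementation; the palette, the 0 to 100 range, and the host records are assumptions, and sizing the hexagons by a second metric is left out.

```python
from collections import defaultdict

def color_bucket(value, lo, hi, palette=("green", "yellow", "orange", "red")):
    """Pick a palette color by where `value` falls in [lo, hi], clamping at the ends."""
    if hi <= lo:
        return palette[0]
    fraction = min(max((value - lo) / (hi - lo), 0.0), 1.0)
    index = min(int(fraction * len(palette)), len(palette) - 1)
    return palette[index]

def host_map(hosts, group_by, metric, lo=0.0, hi=100.0):
    """Return {group: [(host, color), ...]}, i.e. one colored cell per host."""
    groups = defaultdict(list)
    for h in hosts:
        groups[h[group_by]].append((h["host"], color_bucket(h[metric], lo, hi)))
    return dict(groups)

# Hypothetical hosts reporting CPU utilization (%)
hosts = [
    {"host": "i-01", "instance_type": "m5.large", "cpu": 12.0},
    {"host": "i-02", "instance_type": "m5.large", "cpu": 55.0},
    {"host": "i-03", "instance_type": "c5.xlarge", "cpu": 97.0},
]

cells = host_map(hosts, "instance_type", "cpu")
print(cells)
```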

When to use host maps

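| What | Why | Example |
| --- | --- | --- |
| Resource utilization metrics | To spot overloaded components at a glance | Load per app host, grouped by cluster (figure: load per host by cluster) |
| Resource utilization metrics | To identify resource misallocation (e.g. whether any instances are over- or undersized) | CPU usage per EC2 instance type (figure: CPU by instance type) |
| Error or other work metrics | To quickly identify degraded hosts | HAProxy 5xx errors per server (figure: HAProxy server errors per host) |
| Related metrics | To see correlations in a single graph | App server throughput versus memory used |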

Distributions

Distribution graphs show a histogram of a metric's value across a segment of infrastructure. Each bar in the graph represents a range of binned values, and its height corresponds to the number of entities reporting values in that range.

Distribution graphs are closely related to heat maps. The key difference between the two is that heat maps show change over time, whereas distributions are a summary of a time window. Like heat maps, distributions handily visualize a large number of entities reporting a particular metric, so they are often used to graph metrics at the individual host or container level.

[Figure: latency per web server]
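The binning behind a distribution graph can be sketched as a simple fixed-width histogram. The bin width and the latency values below are arbitrary; each input value stands for one entity's flattened metric over the time window.

```python
def histogram(values, bin_width):
    """Count values per bin; the bin starting at s covers [s, s + bin_width)."""
    counts = {}
    for v in values:
        start = int(v // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical latency (ms) per web server, one value per host
latency_ms = [12, 18, 25, 27, 29, 31, 44, 95]

bins = histogram(latency_ms, bin_width=10)
for start, count in bins.items():
    # One text "bar" per bin: its length is the number of hosts in that range
    print(f"{start:>3}-{start + 10:<3} {'#' * count}")
```

Most hosts cluster between 10 and 50 ms, while the lone host in the 90 ms bin stands out as an outlier worth investigating.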

When to use distributions

| What | Why | Example |
| --- | --- | --- |
| Single metric reported by a large number of entities | To convey general health or status at a glance | Web latency per host (figure: latency distribution per host) |
| Single metric reported by a large number of entities | To see variations across members of a group | Uptime per host (figure: uptime distribution per server) |

Origin blog.51cto.com/dba10g/2473037