Observability: OpenTelemetry Metrics

1. What is OpenTelemetry

Before looking at metrics, it helps to understand what OpenTelemetry is.

A microservices architecture enables developers to build and release software faster while having greater independence because they are no longer subject to the cumbersome release process associated with monolithic architectures. As these now distributed systems scale, it becomes increasingly difficult for developers to see how their own services depend on or affect other services, especially during deployments or during downtime, where speed and accuracy are critical. Observability enables both developers and operators to gain visibility into their systems.

For a system to be observable, it must be instrumented. That is, the code must emit traces, metrics, and logs. The telemetry data must then be sent to an observability backend. There are many observability backends on the market, ranging from self-hosted open source tools such as Jaeger and Zipkin to commercial SaaS offerings.

In the past, code was instrumented differently for each backend, because each observability backend had its own instrumentation libraries and agents for emitting data to its tooling. As a result, there was no standardized data format for sending data to observability backends. Additionally, a company that chose to switch backends had to re-instrument its code and configure new agents before it could emit telemetry data to the new tools.

This lack of standardization meant a lack of data portability and a burden on users to maintain instrumentation libraries. Recognizing the need for standardization, the cloud-native community came together and two open source projects were born: OpenTracing (a Cloud Native Computing Foundation (CNCF) project) and OpenCensus (a Google open source community project).

OpenTracing provides a vendor-neutral API for sending telemetry data to an observability backend; however, it relies on developers to implement their own libraries that comply with the specification.

OpenCensus provides a set of language-specific libraries that developers can use to instrument their code and send the resulting telemetry to any of the supported backends.

For the reasons described above, OpenTelemetry was born.

In order to have a single standard, OpenCensus and OpenTracing merged in May 2019 to form OpenTelemetry (OTel for short). As a CNCF incubating project, OpenTelemetry combines the strengths of both and builds on them. The goal of OTel is to provide a standardized set of vendor-neutral SDKs, APIs, and tools for ingesting, transforming, and sending data to observability backends (open source or commercial).

In short, what is OpenTelemetry?

  • OpenTelemetry is an open source collection of observability tools and APIs for generating, collecting, and processing telemetry data from distributed systems. It provides a standardized way to track the performance and behavior of distributed applications and to send this data to different observability platforms for storage, analysis, and visualization. In other words, OpenTelemetry is a set of observability standards.
  • OpenTelemetry is language-independent, supports a wide range of programming languages and frameworks, and can be integrated with multiple observability platforms. By using OpenTelemetry, developers and operators can more easily understand and optimize the performance and reliability of distributed applications.

2. OpenTelemetry Metrics instrument types

The Metrics API defines various instruments. Instruments record measurements, which are aggregated by the Metrics SDK and finally exported out of process. Instruments are available in both synchronous and asynchronous forms. Synchronous instruments record measurements as they occur. Asynchronous instruments register a callback that is invoked once per collection, and the measurement is recorded at that point in time. The following instruments are available:

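  • LongCounter/DoubleCounter: Records only non-negative values, with synchronous and asynchronous options. Useful for counting things like the number of bytes sent over the wire. Counter measurements are aggregated by default as a monotonic sum, so the value can only increase and never decrease. Traffic statistics are a typical use. The Java implementation offers both synchronous and asynchronous forms.
  • LongUpDownCounter/DoubleUpDownCounter: Records positive and negative values, with synchronous and asynchronous options. Useful for counting things that go up and down, such as the size of a queue. UpDownCounter measurements are aggregated by default as a non-monotonic sum. The Java implementation offers both synchronous and asynchronous forms.
  • LongGauge/DoubleGauge: Measures an instantaneous value with an asynchronous callback. Useful for recording values that cannot be merged across attributes, such as CPU utilization percentage, the number of TCP connections of an application, or memory usage. Gauge measurements are aggregated by default as gauges (the last value). In the Java implementation, gauges are asynchronous only.
  • LongHistogram/DoubleHistogram: Records measurements that are most useful to analyze as a histogram distribution. No asynchronous option is available. Useful for recording things like the time an HTTP server takes to handle a request. Histogram measurements are aggregated by default as explicit bucket histograms. In the Java implementation, histograms are synchronous only.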
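
The difference between synchronous and asynchronous instruments can be illustrated with a small toy model. This is a minimal sketch in plain Java using only the standard library; `ToyMeter`, `add`, `observe`, and `collect` are hypothetical names chosen for illustration and are not the OpenTelemetry API. It only shows when a measurement is taken in each case.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Toy model of a meter: NOT the OpenTelemetry API, only the recording pattern. */
final class ToyMeter {
    private final Map<String, AtomicLong> syncCounters = new LinkedHashMap<>();
    private final Map<String, LongSupplier> asyncCallbacks = new LinkedHashMap<>();

    /** Synchronous instrument: the caller records a measurement when the event happens. */
    void add(String name, long value) {
        if (value < 0) {
            // This toy model rejects negative increments with an exception.
            throw new IllegalArgumentException("a counter only accepts non-negative values");
        }
        syncCounters.computeIfAbsent(name, k -> new AtomicLong()).addAndGet(value);
    }

    /** Asynchronous instrument: only a callback is registered; nothing is measured yet. */
    void observe(String name, LongSupplier callback) {
        asyncCallbacks.put(name, callback);
    }

    /** One collection cycle: read the synchronous sums and invoke each callback once. */
    Map<String, Long> collect() {
        Map<String, Long> snapshot = new LinkedHashMap<>();
        syncCounters.forEach((name, sum) -> snapshot.put(name, sum.get()));
        asyncCallbacks.forEach((name, callback) -> snapshot.put(name, callback.getAsLong()));
        return snapshot;
    }

    public static void main(String[] args) {
        ToyMeter meter = new ToyMeter();
        long[] queueSize = {0};
        meter.observe("queue.size", () -> queueSize[0]);

        meter.add("bytes.sent", 100);
        meter.add("bytes.sent", 50);
        queueSize[0] = 7; // never observed: no collection happens while the value is 7
        queueSize[0] = 3;

        System.out.println(meter.collect()); // prints {bytes.sent=150, queue.size=3}
    }
}
```

Every call to `add` contributes to the synchronous sum, while the asynchronous instrument only sees the value that is current at the moment `collect` runs. That is why asynchronous instruments suit values that are cheap to read on demand, such as a queue size or memory usage.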

Synchronous instruments:

Instrument      Properties   Aggregation         Example
Counter         monotonic    sum -> delta        request count, request size
UpDownCounter   additive     last value -> sum   number of connections
Histogram       grouping     histogram           request duration, request size

Asynchronous instruments:

Instrument              Properties   Aggregation              Example
CounterObserver         monotonic    sum -> delta             CPU time used
UpDownCounterObserver   additive     last value -> sum        memory usage (size)
GaugeObserver           grouping     last value -> none/avg   memory usage (%)

2.1 Instrument usage notes

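  • Gauge: a single value that can change at any time and represents an instantaneous state or count, such as CPU utilization or the number of users currently online. In OpenTelemetry, a Gauge reports a series of values over time, each reflecting the state of a single metric at that moment, for example the percentage of memory currently in use.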

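  • Counter: a continuously accumulating value that represents a total over a period of time, such as the number of requests or errors. In OpenTelemetry, a Counter sums the recorded increments. Whether the exported value is the running total (cumulative temporality) or is reset to 0 at the start of each collection window (delta temporality) depends on how the exporter is configured.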

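  • Histogram: a metric that represents a distribution. It provides information such as the number of values, the maximum, the minimum, and the average (derivable from the sum and the count). Histograms are typically used for the distribution of values such as response times or payload sizes. A Histogram records the values observed over a period of time and sorts them into buckets; each bucket covers a range and records how many values fell into it.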
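
To make the bucketing concrete, here is a minimal sketch of an explicit-bucket histogram in plain Java. `ToyHistogram` and the bounds `10, 50, 100` are hypothetical and chosen only for illustration; this is not the OpenTelemetry SDK implementation. It assumes that each bucket includes its upper bound, so a value equal to a bound falls into the lower bucket.

```java
import java.util.Arrays;

/** Toy explicit-bucket histogram: illustrates the aggregation, not the OpenTelemetry SDK. */
final class ToyHistogram {
    private final double[] bounds; // sorted upper bounds, e.g. {10, 50, 100}
    private final long[] counts;   // one bucket per bound plus one overflow bucket
    private long count;
    private double sum;
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

    ToyHistogram(double... bounds) {
        this.bounds = bounds.clone();
        this.counts = new long[bounds.length + 1];
    }

    /** Records one measurement into the first bucket whose upper bound is >= value. */
    void record(double value) {
        int i = 0;
        while (i < bounds.length && value > bounds[i]) {
            i++;
        }
        counts[i]++;
        count++;
        sum += value;
        min = Math.min(min, value);
        max = Math.max(max, value);
    }

    long[] bucketCounts() { return counts.clone(); }
    long count() { return count; }
    double sum() { return sum; }
    double min() { return min; }
    double max() { return max; }

    public static void main(String[] args) {
        ToyHistogram latency = new ToyHistogram(10, 50, 100); // hypothetical bounds, in ms
        for (double ms : new double[] {4, 10, 37, 120, 48}) {
            latency.record(ms);
        }
        System.out.println(Arrays.toString(latency.bucketCounts())); // prints [2, 2, 0, 1]
        System.out.println("count=" + latency.count() + " sum=" + latency.sum());
    }
}
```

Only the per-bucket counts plus the overall count, sum, minimum, and maximum are kept, so memory use stays constant no matter how many measurements are recorded. The average can be derived as sum divided by count.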

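Specifically, in OpenTelemetry a Gauge measures an instantaneous value, such as CPU utilization at a given moment; a Counter records a cumulative quantity, such as the number of requests; and a Histogram sorts measurements into buckets so that their distribution can be examined.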
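
A Counter's recorded increments can be reported either as a running total (cumulative temporality) or as the amount added since the previous collection (delta temporality). The following is a minimal sketch in plain Java of that difference. `ToyCounter`, `collectCumulative`, and `collectDelta` are hypothetical names for illustration, not OpenTelemetry API calls; in a real SDK the temporality is chosen by the exporter configuration rather than by the caller.

```java
/** Toy counter that reports the same measurements as cumulative or delta values. */
final class ToyCounter {
    private long total;        // monotonically increasing sum since the counter was created
    private long lastExported; // value of total at the previous delta collection

    void add(long value) {
        if (value < 0) {
            // This toy model rejects negative increments with an exception.
            throw new IllegalArgumentException("counter increments must be non-negative");
        }
        total += value;
    }

    /** Cumulative temporality: always the sum since the counter was created. */
    long collectCumulative() {
        return total;
    }

    /** Delta temporality: only what was added since the previous delta collection. */
    long collectDelta() {
        long delta = total - lastExported;
        lastExported = total;
        return delta;
    }

    public static void main(String[] args) {
        ToyCounter requests = new ToyCounter();
        long[][] intervals = {{3, 2}, {4}, {}}; // increments in three collection intervals
        for (long[] interval : intervals) {
            for (long n : interval) {
                requests.add(n);
            }
            System.out.println("cumulative=" + requests.collectCumulative()
                + " delta=" + requests.collectDelta());
        }
        // prints:
        // cumulative=5 delta=5
        // cumulative=9 delta=4
        // cumulative=9 delta=0
    }
}
```

The cumulative series never decreases, which is what a pull-based backend such as Prometheus expects, while the delta series starts again from zero in every interval.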

3. Manual instrumentation steps

Add the JARs to the project (using Gradle as an example):

dependencies {
    implementation 'io.opentelemetry:opentelemetry-api:1.24.0'
    implementation 'io.opentelemetry:opentelemetry-sdk:1.24.0'
    implementation 'io.opentelemetry:opentelemetry-exporter-otlp:1.24.0'
    implementation 'io.opentelemetry:opentelemetry-semconv:1.24.0-alpha'
}

Example setup:

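import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.trace.propagation.W3CTraceContextPropagator;
import io.opentelemetry.context.propagation.ContextPropagators;
import io.opentelemetry.exporter.otlp.metrics.OtlpGrpcMetricExporter;
import io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporter;
import io.opentelemetry.sdk.OpenTelemetrySdk;
import io.opentelemetry.sdk.metrics.SdkMeterProvider;
import io.opentelemetry.sdk.metrics.export.PeriodicMetricReader;
import io.opentelemetry.sdk.resources.Resource;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.export.BatchSpanProcessor;
import io.opentelemetry.semconv.resource.attributes.ResourceAttributes;

// Create the resource that identifies this service in the exported telemetry
Resource resource = Resource.getDefault()
  .merge(Resource.create(Attributes.of(ResourceAttributes.SERVICE_NAME, "logical-service-name")));

// Create the TracerProvider, exporting spans in batches over OTLP/gRPC
SdkTracerProvider sdkTracerProvider = SdkTracerProvider.builder()
  .addSpanProcessor(BatchSpanProcessor.builder(OtlpGrpcSpanExporter.builder().build()).build())
  .setResource(resource)
  .build();

// Create the MeterProvider, exporting metrics periodically over OTLP/gRPC
SdkMeterProvider sdkMeterProvider = SdkMeterProvider.builder()
  .registerMetricReader(PeriodicMetricReader.builder(OtlpGrpcMetricExporter.builder().build()).build())
  .setResource(resource)
  .build();

// Create the OpenTelemetry instance and register it as the global instance
OpenTelemetry openTelemetry = OpenTelemetrySdk.builder()
  .setTracerProvider(sdkTracerProvider)
  .setMeterProvider(sdkMeterProvider)
  .setPropagators(ContextPropagators.create(W3CTraceContextPropagator.getInstance()))
  .buildAndRegisterGlobal();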

4. Exporting OpenTelemetry Metrics to Prometheus

The following example exports OpenTelemetry Metrics to Prometheus. One more dependency is needed:

implementation 'io.opentelemetry:opentelemetry-exporter-prometheus:1.24.0-alpha'

The code is as follows:

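import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.metrics.LongCounter;
import io.opentelemetry.api.metrics.Meter;
import io.opentelemetry.exporter.prometheus.PrometheusHttpServer;
import io.opentelemetry.sdk.OpenTelemetrySdk;
import io.opentelemetry.sdk.metrics.SdkMeterProvider;
import io.opentelemetry.sdk.resources.Resource;

public class OpenTelemetryTest {

    public static void main(String[] args) throws InterruptedException {
        // Expose the metrics over HTTP on port 7070 so that Prometheus can scrape them.
        SdkMeterProvider meterProvider = SdkMeterProvider.builder()
            .setResource(Resource.getDefault())
            .registerMetricReader(PrometheusHttpServer.builder().setPort(7070).build())
            .build();
        OpenTelemetrySdk openTelemetrySdk = OpenTelemetrySdk.builder()
            .setMeterProvider(meterProvider)
            .buildAndRegisterGlobal();

        Meter meter = openTelemetrySdk.getMeter("mxsm");
        MemoryMXBean memoryBean = ManagementFactory.getMemoryMXBean();
        AtomicLong heapUsed = new AtomicLong();
        Attributes heap = Attributes.of(AttributeKey.stringKey("type"), "heap");

        // Asynchronous up-down counters: each callback runs once per collection (scrape).
        // The first reports the JVM's total memory, the second the heap usage sampled below.
        meter.upDownCounterBuilder("process.runtime.jvm.memory.usage").setUnit("By")
            .buildWithCallback(m -> m.record(Runtime.getRuntime().totalMemory(), heap));
        meter.upDownCounterBuilder("process.runtime.jvm.memory.usage_after_last_gc").setUnit("By")
            .buildWithCallback(m -> m.record(heapUsed.longValue(), heap));

        // Synchronous counter: incremented by 1 every second.
        LongCounter counter = meter.counterBuilder("mxsm.qqq").setUnit("1").build();
        while (true) {
            heapUsed.set(memoryBean.getHeapMemoryUsage().getUsed());
            counter.add(1);
            TimeUnit.SECONDS.sleep(1);
        }
    }
}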

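First run Prometheus locally and configure it to scrape the program's metrics endpoint (the example above serves metrics on port 7070). Then run the program and open the Prometheus web console to query the exported metrics.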

For more usage examples, see the author's upgrade PR for ISSUE#3430 in the Apache EventMesh project.

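5. Summary

Using OpenTelemetry Metrics brings the following advantages:

  1. Standardized metrics: OpenTelemetry Metrics defines a standard metric format and naming conventions, which makes it easier for applications and observability platforms to share and understand metric data. It also avoids the confusion caused by different applications using different metric formats and naming rules.
  2. Support for many languages and frameworks: OpenTelemetry Metrics provides APIs for a variety of programming languages and frameworks, so applications can easily generate and expose metrics. Applications written in different languages and frameworks can therefore use the same set of metrics.
  3. Extensibility: OpenTelemetry Metrics supports custom metrics and custom aggregations, which makes it easier for applications to collect and aggregate metrics that are specific to the business or the application.
  4. Pluggability: OpenTelemetry Metrics supports multiple observability platforms and storage backends, such as Prometheus, Grafana, and InfluxDB. Applications can easily send metric data to different platforms for storage, analysis, and visualization.
  5. Real-time data analysis: OpenTelemetry Metrics makes metric data available for real-time analysis and visualization, which helps developers and operators detect and resolve performance problems faster and improves the reliability and stability of applications.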

Origin juejin.im/post/7229186336154419259