CI/CD Observability: New Opportunities for OpenTelemetry

By shifting our observability focus to the left, we can resolve CI/CD issues before they escalate, two Grafana engineers argue.

Translated from "CI/CD Observability: A Rich New Opportunity for OpenTelemetry" by Giordano Ricci and Dimitris Sotirakis.

Continuous integration and continuous deployment (CI/CD) are the backbone of modern software delivery, but visibility into their processes remains limited. Here's how OpenTelemetry (OTel) is changing that, and why these changes are so exciting.

CI/CD has different definitions depending on who you ask, but the consistent part is that it is continuous: a never-ending feedback loop that is all about reducing manual processes, producing deployable software, and catching issues before they reach production.

This practice has become essential for reducing manual work, producing deployable software, and increasing confidence in the delivery process, yet we still lack the tools to keep the pipelines themselves from becoming unstable.

Observability for CI systems is still in its early stages, but a combination of factors now makes it possible. Let's take a closer look at the historically unobservable aspects of CI/CD pipelines, how OpenTelemetry and related work enable CI observability, and the high ceiling for future developer productivity improvements.

There's still plenty of room to shift left

CI and alerting have traditionally been used as solutions that serve a common goal. They work closely together as essential components of continuous automated monitoring. Continuous integration is the guardian in the early stages: it detects changes, maintains build health, and continuously monitors system signals. Alerts usually come into play in later stages, identifying the issues that CI missed. CI lays the foundation and alerts respond to threats: a continuous collaboration to solve the same problem.

But historically, observability has focused on the "running" part of things, ignoring valuable insights from earlier stages such as build, test, and deployment, along with other opportunities early in the CI pipeline.

We deploy things, we see things catch fire, and then we try to put out the fire.

But if we only look at the final stages of the development and deployment cycle, it is already too late. We don't know what happened during the build or test phases, we struggle with root cause analysis and see mean time to recovery increase, and we miss optimization opportunities. We know our CI pipelines take a long time to run, but when we want them to run faster, we don't know what to improve.

If we shift our observability focus to the left, we can resolve issues before they escalate, increase efficiency by removing problems from the process, make our tests more robust and complete, and minimize the costs of post-deployment fixes and outages.

There's a reason OpenTelemetry is one of the most active projects in the Cloud Native Computing Foundation (CNCF); technically, it is the "second-highest velocity project". It has been an excellent protocol for defining semantic conventions and unifying signal types for logs, metrics, and traces (the "three pillars" of observability), as well as profiling and other emerging signal types.

We've seen OTel make waves over the last year after adding broad support for open standards and establishing common ground in what were once black-box areas. Formerly highly proprietary areas of observability, such as databases, cloud providers, query languages, and log file formats, have been cracked open by a well-defined protocol that works with nearly every popular programming language in our modern polyglot world.

The world of CI/CD vendor tools has its own black box. Every development team uses a CI system, and most teams use several. The concept of "owning your own CI data" is gaining traction among users who are tired of complex workarounds and want that data in their own well-understood backend, rather than struggling with context switching and dedicated backends.

This is why there was so much excitement when the OTel CI/CD working group first proposed new semantic conventions for CI/CD observability, and subsequently proposed a new Special Interest Group (SIG) dedicated to CI/CD observability.

What the future of observability data will look like

Owning your own data means you decide where it goes and how it is stored. When we run OpenTelemetry between our CI system and the destination of our choice, it takes care of transforming the data into the database and schema we want, which means a lot of innovation built on previously siloed CI data is now entering the observability tooling space.

For example, we built an OpenTelemetry Collector distribution: a binary whose receivers, processors, and exporters extract CI data from Drone, convert it to the format you need, and then send that data to the database. Jenkins, for its part, has a plugin that exports data via the OpenTelemetry Protocol (OTLP).
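To make the receiver, processor, exporter flow more concrete, here is a minimal Python sketch that uses only the standard library. It is not the Collector's API (the Collector itself is a Go binary driven by configuration). The build record shape, the function names, and the in-memory "backend" are all hypothetical, and the `cicd.*` attribute names are modeled on the OpenTelemetry CI/CD semantic conventions, so check them against the current specification before relying on them.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """A simplified, span-like record (not the OTel SDK Span type)."""
    name: str
    start_s: float
    end_s: float
    attributes: dict = field(default_factory=dict)

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


def receive(build: dict) -> list:
    """Receiver: turn a hypothetical CI build record into one span per stage."""
    return [
        Span(stage["name"], stage["started"], stage["finished"],
             {"status": stage["status"]})
        for stage in build["stages"]
    ]


def process(spans: list, build: dict) -> list:
    """Processor: enrich every span with pipeline-level attributes."""
    for span in spans:
        # Attribute names assumed from the CI/CD semantic conventions.
        span.attributes["cicd.pipeline.name"] = build["pipeline"]
        span.attributes["cicd.pipeline.run.id"] = build["run_id"]
    return spans


def export(spans: list, backend: list) -> None:
    """Exporter: append to an in-memory list standing in for a database."""
    backend.extend(spans)


# Hypothetical build record, shaped as a CI system's API might return it.
build = {
    "pipeline": "deploy",
    "run_id": "1042",
    "stages": [
        {"name": "build", "started": 0.0, "finished": 60.0, "status": "success"},
        {"name": "test", "started": 60.0, "finished": 150.0, "status": "failure"},
        {"name": "publish", "started": 150.0, "finished": 150.0, "status": "skipped"},
    ],
}

backend = []
export(process(receive(build), build), backend)

for span in backend:
    print(span.name, span.duration_s, span.attributes["status"])
```

Keeping the three steps as separate functions mirrors the reason the Collector splits them: the source (Drone, Jenkins, or another CI system) and the destination can change independently of the enrichment logic in the middle.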

This is a very exciting time for the observability community. By taking the data from our CI and integrating it with the observability system, we can drill back into the build logs and see important information from our CI, such as the time of the first failure. From there, we can work out what caused the error and pinpoint more accurately when it occurred.
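As a rough illustration of the "time of the first failure" idea, the sketch below scans a build history and returns the first failed build after the most recent green one. The record fields (`commit`, `finished`, `status`) and the sample data are hypothetical; a real query would run against whatever backend holds the CI telemetry.

```python
from datetime import datetime

# Hypothetical build history, in no particular order.
builds = [
    {"commit": "c3d", "finished": "2024-05-01T11:15:00", "status": "failure"},
    {"commit": "a1f", "finished": "2024-05-01T09:00:00", "status": "success"},
    {"commit": "b2e", "finished": "2024-05-01T10:30:00", "status": "failure"},
]


def first_failure(builds):
    """Return the first failed build of the current failure streak, or None."""
    ordered = sorted(builds, key=lambda b: datetime.fromisoformat(b["finished"]))
    candidate = None
    for build in ordered:
        if build["status"] == "success":
            candidate = None          # a green build ends any earlier streak
        elif candidate is None:
            candidate = build         # first red build after the last green one
    return candidate


culprit = first_failure(builds)
if culprit is not None:
    print("first failure:", culprit["commit"], "at", culprit["finished"])
```

The commit attached to that build is a natural starting point for root cause analysis, since everything before it was still green.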

The CI/CD space unlocks vast amounts of "pre-crime" data for observability systems. Obtaining telemetry from your builds allows you to build a timeline across your deployment branches and gain insight into what is failing, troubleshoot a variety of flaky test issues, easily find and reproduce the root cause of problems, and analyze and troubleshoot CI/CD pipeline performance and duration.
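One simple way to surface flaky tests from such telemetry is to look for tests that both passed and failed against the same commit. The sketch below assumes a hypothetical flat list of `(commit, test, outcome)` tuples; the heuristic is only one of several possible definitions of "flaky".

```python
from collections import defaultdict

# Hypothetical test results gathered from several CI runs.
results = [
    ("a1f", "test_login", "pass"),
    ("a1f", "test_login", "fail"),      # same commit, different outcome
    ("a1f", "test_checkout", "pass"),
    ("b2e", "test_checkout", "fail"),   # different commit: a real regression?
    ("b2e", "test_checkout", "fail"),
]


def flaky_tests(results):
    """Return tests that both passed and failed on at least one commit."""
    outcomes = defaultdict(set)
    for commit, test, outcome in results:
        outcomes[(commit, test)].add(outcome)
    return sorted(
        {test for (_, test), seen in outcomes.items() if {"pass", "fail"} <= seen}
    )


flaky = flaky_tests(results)
print("flaky tests:", flaky)
```

Grouping by commit matters here: a test that fails consistently on a new commit points to a code change, while mixed outcomes on unchanged code point to the test or its environment.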

As observability continues to move further left in the CI pipeline, we can resolve issues before they escalate, increase efficiency by removing problems from the process, improve test robustness and completeness, and minimize the costs associated with post-deployment fixes and downtime.

Driven by OpenTelemetry, we expect the CI/CD space to become one of the hottest evolving areas for observability, joining other major observability use cases such as infrastructure monitoring and application performance monitoring.

CI/CD is the foundation—and often the prerequisite—of every modern production system, so we should emphasize its importance by applying to it the best practices we use for production services.

This article was first published on Yunyunzhongsheng (https://yylives.cc/); everyone is welcome to visit.