Top 10 open source projects for SREs and DevOps

Building scalable, highly reliable software systems is the ultimate goal of every SRE (Site Reliability Engineer), and the industry now offers a large number of outstanding SRE/DevOps open source projects and software products to learn from and use.

Below, we introduce ten of the most popular open source projects for monitoring, deployment, and maintenance. Among other things, they can simulate network traffic and help users model unpredictable (or chaotic) events in order to build reliable systems.

  1. Cloudprober
    Cloudprober is an active monitoring tool that helps you detect failures before your users do. It uses "active" monitoring, running "probes" that check whether target components behave as expected. For example, a probe can verify whether a website's front end can reach its back end, or whether a local system can actually reach a target virtual machine in the cloud. Because probes observe the system from the outside, this approach works independently of how the application is implemented, makes it easy to track the application's configuration, and helps surface potential problems early.
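To make the active-probing idea concrete, here is a minimal Python sketch of a single HTTP probe using only the standard library. It is not Cloudprober code: `http_probe` and `ProbeResult` are hypothetical names, and "success" is assumed to mean a 2xx response within the timeout.

```python
import http.client
import time
import urllib.error
import urllib.request
from dataclasses import dataclass

# Bypass proxy settings from the environment so the probe hits the target directly.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


@dataclass
class ProbeResult:
    target: str
    success: bool
    latency_ms: float
    detail: str


def http_probe(url: str, timeout_s: float = 2.0) -> ProbeResult:
    """Actively request `url`; report success only for a 2xx answer within the timeout."""
    start = time.perf_counter()
    try:
        with _OPENER.open(url, timeout=timeout_s) as resp:
            success, detail = 200 <= resp.status < 300, f"HTTP {resp.status}"
    except urllib.error.HTTPError as err:  # the server answered, but with 4xx/5xx
        success, detail = False, f"HTTP {err.code}"
    except (OSError, http.client.HTTPException) as err:  # refused, DNS failure, timeout, bad reply
        success, detail = False, f"error: {err}"
    latency_ms = (time.perf_counter() - start) * 1000
    return ProbeResult(url, success, latency_ms, detail)
```

A real prober runs checks like this on a fixed interval against many targets and exports success counts and latency, for example to Prometheus. In Cloudprober itself, probes are declared in its configuration file rather than written as code.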

Features:

Cloudprober integrates natively with open source monitoring stacks such as Prometheus and Grafana, and it can also export probe results to other systems.
Automatic discovery of cloud targets provides out-of-the-box support for GCE and Kubernetes, with easy configuration for other cloud services.
Deployment is simple: Cloudprober is written entirely in Go and compiles to a static binary, so it can be deployed quickly as a Docker container. Thanks to automatic target discovery, most updates do not require redeploying or reconfiguring Cloudprober.
Its Docker image is small, containing only the statically compiled binary, and it needs little CPU and RAM even when running a large number of probes.

Image source - https://github.com/google/cloudprober

  2. Cloud Operations Sandbox (Alpha)
    Cloud Operations Sandbox is an open source platform that helps users learn Google's Site Reliability Engineering practices and use Cloud Operations (formerly Stackdriver) to manage their cloud systems. It is based on Hipster Shop, a cloud-native microservices demo application, and using it requires a Google Cloud account.

Features:

A demo application built on a cloud-native microservices architecture.
One-click deployment of the services to Google Cloud Platform via a script.
A load generator, a component that simulates traffic against the demo services.

Image source - https://github.com/GoogleCloudPlatform/cloud-ops-sandbox

  3. Version Checker for Kubernetes
    Version Checker is a Kubernetes tool that lets users observe the image versions running in the cluster, along with the latest versions available upstream. It also lets users view the current image versions in a table on a Grafana dashboard.

Features:

Supports multiple self-hosted image registries at once.
Exposes version information as Prometheus metrics.
Supports image registries such as ACR, Docker Hub, and ECR.
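Conceptually, the check boils down to comparing a running image's tag against the newest tag available in the registry. The sketch below shows only that comparison, assuming plain `MAJOR.MINOR.PATCH` tags (optionally prefixed with `v`). It does not query any registry, and `latest_version` and `is_latest` are hypothetical helpers, not Version Checker's API.

```python
import re
from typing import Iterable, Optional, Tuple

# Only plain release tags such as "1.2.3" or "v1.2.3"; "latest" or "2.0.0-rc1" are ignored.
SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def parse(tag: str) -> Optional[Tuple[int, int, int]]:
    """Turn a release tag into a comparable (major, minor, patch) tuple, or None."""
    m = SEMVER.match(tag)
    return tuple(int(part) for part in m.groups()) if m else None


def latest_version(tags: Iterable[str]) -> Optional[str]:
    """Return the tag with the highest semantic version, or None if none qualify."""
    versioned = [(parse(t), t) for t in tags if parse(t) is not None]
    return max(versioned)[1] if versioned else None


def is_latest(image: str, registry_tags: Iterable[str]) -> bool:
    """Check whether a running image reference ("repo:tag") uses the newest tag."""
    _, _, current = image.rpartition(":")
    newest = latest_version(registry_tags)
    return newest is not None and parse(current) == parse(newest)
```

Comparing parsed integer tuples rather than raw strings matters here: as strings, "1.9.9" would sort after "1.10.0".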

Image source - https://github.com/jetstack/version-checker

  4. Istio
    Istio is an open framework for integrating microservices. It can manage and monitor the traffic flowing between microservices, enforce policies, and aggregate telemetry data in a standardized way. Istio's control plane provides an abstraction layer over the underlying cluster management platform, such as Kubernetes.

Features:

Automatic load balancing for HTTP, gRPC, WebSocket, and TCP traffic.
Fine-grained control of traffic behavior with rich routing rules, retries, failover, and fault injection.
A pluggable policy layer and configuration API supporting access control, rate limits, and quotas.
Automatic metrics, logs, and traces for all traffic within the cluster, including cluster ingress and egress.
Secure service-to-service communication in the cluster with strong authentication and authorization.
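Istio configures routing declaratively (for example, weighted routes and fault injection in a VirtualService), and its Envoy sidecar proxies carry those rules out. As a rough illustration of the decision such rules encode, the following Python sketch sends 90% of requests to one version and 10% to a canary, and aborts a small fraction with HTTP 503. The route names and percentages are made up for the example; this is a simulation, not Istio's implementation.

```python
import random
from collections import Counter

# Hypothetical route table: 90% of traffic to subset v1, 10% to the canary v2.
ROUTES = [("reviews-v1", 90), ("reviews-v2", 10)]
ABORT_FRACTION = 0.05  # fault injection: answer 5% of requests with HTTP 503


def route(rng: random.Random) -> str:
    """Return the destination for one request, or "abort-503" if a fault is injected."""
    if rng.random() < ABORT_FRACTION:
        return "abort-503"
    names = [name for name, _ in ROUTES]
    weights = [weight for _, weight in ROUTES]
    return rng.choices(names, weights=weights, k=1)[0]


def simulate(n: int, seed: int = 42) -> Counter:
    """Route `n` requests with a seeded RNG and count where they ended up."""
    rng = random.Random(seed)
    return Counter(route(rng) for _ in range(n))


if __name__ == "__main__":
    print(simulate(10_000))
```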

Image source - https://istio.io/

  5. Checkov
    Checkov is a static code analysis tool for Infrastructure as Code. It scans cloud infrastructure definitions written with Terraform, CloudFormation, Kubernetes, Serverless, and ARM templates, and detects security and compliance misconfigurations in them.
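To illustrate what a static IaC check does, here is a minimal Python rule over Terraform resources that are assumed to be already parsed into dictionaries. It flags security groups that open SSH (port 22) to the whole internet, a kind of misconfiguration that Checkov's built-in policies cover. The input shape and the `open_ssh_findings` function are hypothetical; Checkov has its own parser and its own framework for custom checks.

```python
from typing import Dict, List


def open_ssh_findings(resources: Dict[str, dict]) -> List[str]:
    """Flag aws_security_group resources whose ingress opens SSH to 0.0.0.0/0.

    `resources` maps Terraform addresses such as "aws_security_group.web" to
    their (already parsed) arguments; this input shape is an assumption.
    """
    findings = []
    for address, config in resources.items():
        if not address.startswith("aws_security_group."):
            continue
        for rule in config.get("ingress", []):
            # In AWS security groups, protocol "-1" means all protocols and ports.
            all_traffic = str(rule.get("protocol")) == "-1"
            port_in_range = rule.get("from_port", 0) <= 22 <= rule.get("to_port", -1)
            open_to_world = "0.0.0.0/0" in rule.get("cidr_blocks", [])
            if (all_traffic or port_in_range) and open_to_world:
                findings.append(f"{address}: SSH (port 22) open to 0.0.0.0/0")
    return findings
```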

Features:

More than 400 built-in policies covering security best practices for AWS, Azure, and Google Cloud.
Evaluates Terraform provider settings to monitor the creation, maintenance, and updating of IaaS, PaaS, or SaaS resources managed through Terraform.
Detects AWS credentials in EC2 user data, Lambda environment variables, and Terraform providers.
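Credential detection of this kind can be approximated with pattern matching. The sketch below scans text, such as an EC2 user-data script, for strings shaped like long-term AWS access key IDs ("AKIA" followed by 16 upper-case letters or digits). It is a simplified, hypothetical detector, not Checkov's implementation, and it does not look for secret access keys, which have no fixed prefix.

```python
import re
from typing import List, Tuple

# Long-term AWS access key IDs: "AKIA" plus 16 upper-case letters or digits.
ACCESS_KEY_ID = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")


def find_access_keys(text: str) -> List[Tuple[int, str]]:
    """Return (line number, key ID) for every access-key-shaped string in `text`."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in ACCESS_KEY_ID.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits
```

The example key below is the placeholder AWS uses in its own documentation:

```python
user_data = "#!/bin/bash\nexport AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE\n"
print(find_access_keys(user_data))  # [(2, 'AKIAIOSFODNN7EXAMPLE')]
```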

Image source - https://www.checkov.io/

  6. Litmus
    Litmus is a toolset for cloud-native chaos engineering. It provides tools to orchestrate chaos on Kubernetes, helping SREs (Site Reliability Engineers) find weaknesses in their deployments. SREs use Litmus to run chaos experiments, initially in a staging environment and eventually in production, to find bugs and vulnerabilities; fixing those weaknesses makes the system more resilient.

Features:

Developers can run chaos experiments during application development as an extension of unit or integration testing.
CI (continuous integration) pipeline builders can run chaos as a pipeline stage to find bugs when the application is subjected to failure paths.
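The idea of chaos testing as an extension of unit testing can be shown without a cluster. In this hypothetical Python example, a wrapper injects connection failures into half of the calls to a dependency, and a standard `unittest` case asserts that the code under test still returns a usable answer. Litmus itself runs experiments against Kubernetes workloads; `FlakyDependency`, `get_price`, and the fallback price are illustrative names only, not Litmus's API.

```python
import random
import unittest


class FlakyDependency:
    """Wrap a callable and make a fraction of calls fail: a tiny fault injector."""

    def __init__(self, func, failure_rate: float, seed: int = 0):
        self.func = func
        self.failure_rate = failure_rate
        self.rng = random.Random(seed)

    def __call__(self, *args):
        if self.rng.random() < self.failure_rate:
            raise ConnectionError("injected fault")
        return self.func(*args)


def get_price(item: str, fetch) -> float:
    """Code under test: retry once, then fall back to a (hypothetical) cached price."""
    for _ in range(2):
        try:
            return fetch(item)
        except ConnectionError:
            continue
    return 9.99


class ChaosTest(unittest.TestCase):
    def test_price_survives_injected_faults(self):
        flaky = FlakyDependency(lambda item: 10.0, failure_rate=0.5, seed=1)
        prices = [get_price("book", flaky) for _ in range(200)]
        # No call may raise: each returns either the live or the fallback price.
        self.assertTrue(all(p in (10.0, 9.99) for p in prices))
        self.assertIn(9.99, prices)  # the fallback path was actually exercised


if __name__ == "__main__":
    unittest.main()
```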

Image source - https://github.com/litmuschaos/litmus

  7. Locust
    Locust is an easy-to-use, scriptable, and scalable performance testing tool. You define user behavior in ordinary Python code instead of a complex UI (user interface) or a domain-specific language. This makes Locust both extensible and developer-friendly.
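In Locust, user behavior is written as a Python class that subclasses `HttpUser`, with its actions marked by the `@task` decorator. So that the example runs without installing Locust, the sketch below uses only the standard library to show the same underlying idea: several concurrent virtual users repeatedly executing a plain Python function while latency and failures are recorded. The function names are hypothetical, and Locust adds much more (ramp-up, distributed workers, the web UI).

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def simulate_user(task: Callable[[], None], iterations: int) -> Tuple[List[float], int]:
    """One virtual user: run `task` repeatedly, recording latencies and failures."""
    latencies, failures = [], 0
    for _ in range(iterations):
        start = time.perf_counter()
        try:
            task()
        except Exception:  # any exception counts as a failed request
            failures += 1
            continue
        latencies.append(time.perf_counter() - start)
    return latencies, failures


def run_load_test(task: Callable[[], None], users: int, iterations: int) -> Dict[str, float]:
    """Run `users` virtual users concurrently and summarize the results."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [pool.submit(simulate_user, task, iterations) for _ in range(users)]
        outcomes = [f.result() for f in futures]
    latencies = [lat for lats, _ in outcomes for lat in lats]
    failures = sum(fails for _, fails in outcomes)
    return {
        "requests": len(latencies) + failures,
        "failures": failures,
        "median_ms": statistics.median(latencies) * 1000 if latencies else 0.0,
    }
```

In practice, `task` would issue a request to the system under test, for example with `urllib.request`.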

Features:

Locust is distributed and scalable, yet easy to get started with.
A web-based UI shows test progress in real time.
With minor modifications, it can test a wide variety of systems.

Image source - https://github.com/locustio/locust

  8. Prometheus
    A Cloud Native Computing Foundation project, Prometheus monitors a wide variety of systems and services. It collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts when specified conditions are observed.
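Alerting rules in Prometheus pair a PromQL expression (`expr`) with an optional `for` duration: an alert whose condition becomes true is first "pending" and only starts "firing" once the condition has held for the whole `for` period. A minimal Python model of that state machine might look like the following, assuming a simple "value above threshold" condition evaluated once per sample; `alert_states` is a hypothetical helper, not Prometheus code.

```python
from typing import List, Tuple


def alert_states(samples: List[Tuple[float, float]], threshold: float, for_s: float) -> List[str]:
    """Evaluate `value > threshold` at each (timestamp_s, value) sample.

    The alert is "pending" as soon as the condition is true, becomes "firing"
    once it has stayed true for at least `for_s` seconds, and any false
    evaluation resets it to "inactive".
    """
    states, active_since = [], None
    for ts, value in samples:
        if value > threshold:
            if active_since is None:
                active_since = ts
            states.append("firing" if ts - active_since >= for_s else "pending")
        else:
            active_since = None
            states.append("inactive")
    return states


# One evaluation per minute, with a rule roughly like: expr: error_ratio > 0.5, for: 2m
samples = [(0, 0.1), (60, 0.6), (120, 0.7), (180, 0.8), (240, 0.9), (300, 0.2)]
print(alert_states(samples, threshold=0.5, for_s=120))
# ['inactive', 'pending', 'pending', 'firing', 'firing', 'inactive']
```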

Features:

A multi-dimensional data model, with time series identified by a metric name and a set of key/value pairs (labels).
Targets are discovered via service discovery or static configuration.
No reliance on distributed storage; single server nodes are autonomous.
A powerful and flexible query language, PromQL.
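To illustrate the multi-dimensional data model, the following Python sketch stores each series under its metric name plus its full label set, and implements a rough equivalent of the PromQL query `sum by (code) (http_requests_total{job="api"})` over the latest values. The helper names and sample numbers are hypothetical; in practice you would usually wrap a counter in `rate()` before aggregating it.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Tuple

# A series is identified by its metric name plus its complete label set.
SeriesKey = Tuple[str, FrozenSet[Tuple[str, str]]]


def key(name: str, **labels: str) -> SeriesKey:
    """Build the identity of one time series."""
    return name, frozenset(labels.items())


def sum_by(store: Dict[SeriesKey, float], name: str, by: str, **matchers: str) -> Dict[str, float]:
    """Sum matching series grouped by one label, like `sum by (<by>) (name{k="v"})`."""
    totals: Dict[str, float] = defaultdict(float)
    for (metric, labels), value in store.items():
        label_map = dict(labels)
        if metric != name:
            continue
        if any(label_map.get(k) != v for k, v in matchers.items()):
            continue  # equality matchers: every given label must match exactly
        totals[label_map.get(by, "")] += value  # a missing label groups as ""
    return dict(totals)


store = {
    key("http_requests_total", job="api", method="GET", code="200"): 1027,
    key("http_requests_total", job="api", method="POST", code="200"): 3,
    key("http_requests_total", job="api", method="GET", code="500"): 12,
    key("http_requests_total", job="web", method="GET", code="200"): 40,
}
print(sum_by(store, "http_requests_total", "code", job="api"))  # {'200': 1030.0, '500': 12.0}
```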

Image source - https://github.com/prometheus/prometheus

  9. Kube-monkey
    Kube-monkey is an implementation of Netflix's Chaos Monkey (https://netflix.github.io/chaosmonkey/) for Kubernetes clusters. It randomly deletes Kubernetes pods in the cluster, encouraging and validating the development of failure-resilient services.

Features:

Kube-monkey works on an opt-in model: it only schedules terminations for Kubernetes (k8s) apps that have explicitly agreed to have their pods terminated by kube-monkey.
Its behavior is highly customizable.
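The opt-in model can be sketched in a few lines of Python: only pods whose app carries an opt-in label are eligible, protected namespaces are skipped, and one victim is chosen at random. The label key `chaos/enabled`, the excluded namespace set, and the pod dictionaries are assumptions for illustration (kube-monkey defines its own labels and configuration). The sketch only picks a pod; it does not call the Kubernetes API to delete it.

```python
import random
from typing import Dict, List, Optional

OPT_IN_LABEL = "chaos/enabled"         # hypothetical opt-in label key
EXCLUDED_NAMESPACES = {"kube-system"}  # assumed protected namespaces


def choose_victim(pods: List[Dict], rng: random.Random) -> Optional[Dict]:
    """Pick one pod to terminate, only among opted-in apps; None if none qualify."""
    candidates = [
        pod for pod in pods
        if pod["labels"].get(OPT_IN_LABEL) == "true"
        and pod["namespace"] not in EXCLUDED_NAMESPACES
    ]
    return rng.choice(candidates) if candidates else None
```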

Image source - https://www.slideshare.net/arungupta1/chaos-engineering-with-kubernetes

  10. PowerfulSeal
    PowerfulSeal injects failures into Kubernetes clusters so that users can detect problems as early as possible. It also lets users write scenarios that describe complete chaos experiments.

Features:

Works with Kubernetes, OpenStack, AWS, Azure, GCP, and local machines.
Integrates with Prometheus and Datadog (https://www.datadoghq.com/) to collect metrics.
Offers multiple modes of operation and can be customized for various use cases.

Image source - https://github.com/powerfulseal/powerfulseal

Summary
As microservice architectures continue to dominate cloud computing, we need reliable tools to monitor instances and troubleshoot failures promptly. The biggest advantage of open source technology is extensibility: you can add features to a tool as needed to better fit your own architecture. All ten projects introduced above have extensive documentation and active user communities, so you can choose among them based on the needs of your project.

Original title: Top 10 Open Source Projects for SREs and DevOps, Author: Nir Sharma

[Translated by 51CTO. When reprinting on partner sites, please credit the original translator and cite 51CTO.com as the source.]

Origin blog.51cto.com/15144514/2677723