A Detailed Look at Argo, a Kubernetes-Native CI/CD Framework

Author: FogDong (only cloud)

Editor: Bach (Caiyun)

What Is a Pipeline?

In computing, a pipeline is a technique that decomposes a repetitive process into several sub-processes, each of which runs in parallel with the others. Because this way of working closely resembles a factory production line, it is also called assembly-line processing. Essentially, pipelining is a form of time parallelism. Take the "build an image" process as an example:

(image: the stages of an image-build pipeline)

Each time we build an image, we pull the code from the code repository, compile it, build the image, and finally push it to the image registry. After every code change, this process stays the same. A pipeline tool can greatly improve the efficiency of this process: with simple configuration, the repetitive tasks are completed automatically. Such a process is also called CI.

The process above uses Jenkins. Jenkins is well known as a veteran pipeline framework, and in the cloud-native era it launched Jenkins X, a new-generation pipeline based on Kubernetes. The cloud-native era also gave birth to two other pipeline frameworks: Argo and Tekton. This article introduces Argo in detail.

A companion article, "Kubernetes-Native CI/CD Framework Tekton in Detail", introduces Tekton.

Argo

Argo Workflows is an open-source, container-native workflow engine that orchestrates parallel jobs on Kubernetes. It is implemented as a set of Kubernetes CRDs.

Quick Start

Argo is based on Kubernetes and can be installed directly with kubectl. The installation mainly includes some CRDs, their corresponding controller, and a server.

(image: installation commands and the installed components)

Note that this installation only executes Workflows in the same namespace. For a cluster-wide install, see the documentation: https://github.com/argoproj/argo/blob/master/docs/installation.md

Three-level definition

To understand the CRDs Argo defines, start with its three-level definition. Conceptually, from largest to smallest, these are WorkflowTemplate, Workflow, and template. The names are similar, so take care to distinguish them.

Template

Starting with the simplest, a template comes in several types: container, script, dag, steps, resource, and suspend. A template can be loosely understood as a Pod: a container/script/resource template actually controls a Pod, while a dag/steps template is composed of multiple templates of the basic types (container/script/resource).

  • container: The most common template type; it is consistent with the Kubernetes container spec.

  • script: Based on container, this type additionally lets the user define a script in the template. The source field holds the script body, and command specifies the environment in which the script runs.

  • resource: This type operates on Kubernetes resources from within the template. An action field specifies the operation type (create, apply, delete, etc.), and related success and failure conditions can be set to determine whether the template succeeded or failed.

  • suspend: A suspend template pauses execution, either for a duration or until it is manually resumed. Execution can be resumed from the CLI (with argo resume), the API, or the UI.

  • steps: A steps template lets the user define tasks as a series of steps. Steps are a list of lists: outer items (written `- -` in YAML) run sequentially, while items within the same inner list (written `-`) run in parallel.

  • dag: A DAG template lets the user define tasks as a directed acyclic graph with dependencies. In a DAG, the dependencies field lists the tasks that must complete before a given task starts; tasks without any dependencies run immediately. For the detailed DAG logic, see the source code: https://github.com/argoproj/argo/blob/master/workflow/controller/dag.go#L204
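For instance, the "diamond" DAG from the Argo documentation expresses B and C depending on A, and D depending on both, so B and C run in parallel:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: dag-diamond-
spec:
  entrypoint: diamond
  templates:
  - name: echo
    inputs:
      parameters:
      - name: message
    container:
      image: alpine:3.7
      command: [echo, "{{inputs.parameters.message}}"]
  - name: diamond
    dag:
      tasks:
      - name: A
        template: echo
        arguments:
          parameters: [{name: message, value: A}]
      - name: B
        dependencies: [A]       # waits for A
        template: echo
        arguments:
          parameters: [{name: message, value: B}]
      - name: C
        dependencies: [A]       # runs in parallel with B
        template: echo
        arguments:
          parameters: [{name: message, value: C}]
      - name: D
        dependencies: [B, C]    # waits for both B and C
        template: echo
        arguments:
          parameters: [{name: message, value: D}]
```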

Workflow

In a Workflow, the spec has a field named templates, which must contain at least one template as a component task.

A simple example of hello world is as follows:

(image: a hello-world Workflow example)
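The canonical hello-world Workflow from the Argo documentation looks like this:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-    # Argo appends a random suffix on submit
spec:
  entrypoint: whalesay          # the template execution starts from
  templates:
  - name: whalesay
    container:                  # a plain Kubernetes container spec
      image: docker/whalesay
      command: [cowsay]
      args: ["hello world"]
```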

In this example, a template of type container is specified in the Workflow's templates field, using the whalesay image.

Here is a slightly more complicated workflow:

(image: a multi-step Workflow example)
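As a sketch of such a workflow, a steps template can mix sequential and parallel steps; the following is modeled on the steps example in the Argo docs:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: steps-
spec:
  entrypoint: hello-hello-hello
  templates:
  - name: hello-hello-hello
    steps:
    - - name: hello1            # first outer item: runs first
        template: whalesay
        arguments:
          parameters: [{name: message, value: "hello1"}]
    - - name: hello2a           # next outer item: runs after hello1
        template: whalesay
        arguments:
          parameters: [{name: message, value: "hello2a"}]
      - name: hello2b           # same inner list as hello2a: runs in parallel with it
        template: whalesay
        arguments:
          parameters: [{name: message, value: "hello2b"}]
  - name: whalesay
    inputs:
      parameters:
      - name: message
    container:
      image: docker/whalesay
      command: [cowsay]
      args: ["{{inputs.parameters.message}}"]
```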

WorkflowTemplate

A WorkflowTemplate is in effect a template library for Workflows; like a Workflow, it also consists of templates. After creating a WorkflowTemplate, a user can submit it directly to create and execute a Workflow.

(image: a WorkflowTemplate example)
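A WorkflowTemplate is defined just like a Workflow apart from its kind; a minimal sketch:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: workflow-template-whalesay
spec:
  entrypoint: whalesay
  templates:
  - name: whalesay
    container:
      image: docker/whalesay
      command: [cowsay]
      args: ["hello from a template"]
```

In recent Argo versions it can then be run with `argo submit --from workflowtemplate/workflow-template-whalesay`.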

Workflow Overview

(image: Workflow overview diagram)

Having understood Argo's three-level definition, let's dive into Argo's most critical resource. Workflow is the most important resource in Argo and serves two important functions:

  • Defines the workflow to be executed.

  • The state of the workflow is stored.

Because of these dual responsibilities, a Workflow should be regarded as a "live" object: it is not just a static definition but also an instance of that definition.

The definition of a WorkflowTemplate is almost identical to that of a Workflow, apart from its kind. Precisely because a Workflow can be either a definition or an instance, WorkflowTemplate is needed as a template for Workflows. After a WorkflowTemplate is defined, a Workflow can be created from it by submitting it.

A Workflow consists of an entrypoint and a series of templates. The entrypoint defines where execution of the workflow begins, and each executing template runs as a Pod, where the user-defined content runs as the Main Container. In addition, two sidecars assist the run.

Sidecar

In Argo, these sidecars use the argoexec image. Argo uses this executor to implement its process control.

Init

When a user's template consumes an artifact in its inputs or is of the script type (script templates need the script injected), Argo adds an Init Container to the Pod. Its image is argoexec, and its command is argoexec init.

In this Init Container, the main job is to load artifacts:

(image: Init Container artifact-loading logic, from the source code)

Wait

For every template except the resource type, Argo injects a Wait Container to wait for the Main Container to finish and to terminate all sidecars. The image of this Wait Container is also argoexec, and its command is argoexec wait. (The resource type does not need one because a resource template runs argoexec directly as the Main Container.)

(image: Wait Container injection, from the source code)

Inputs and Outputs

When running a Workflow, a common scenario is passing outputs between steps: the output of one step usually serves as the input of subsequent steps. In Argo, outputs can be passed as Artifacts or as Parameters.

Artifact

To use Argo Artifacts, you must first configure an artifact repository. It can be configured either by modifying the default ConfigMap that holds the artifact-repository information, or by specifying the repository explicitly in the Workflow. For details, see the configuration documentation, which is not repeated here. The table below shows the repository types Argo supports.

(image: table of supported artifact repository types)

A simple example of using Artifact is as follows:

(image: an artifact-passing Workflow example)
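Following the artifact-passing example in the Argo documentation (with the file name used in this article), the Workflow looks roughly like:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-passing-
spec:
  entrypoint: artifact-example
  templates:
  - name: artifact-example
    steps:
    - - name: generate-artifact
        template: whalesay
    - - name: consume-artifact
        template: print-message
        arguments:
          artifacts:
          - name: message       # wire the output artifact to the input
            from: "{{steps.generate-artifact.outputs.artifacts.hello-art}}"
  - name: whalesay
    container:
      image: docker/whalesay:latest
      command: [sh, -c]
      args: ["cowsay hello world | tee /tmp/hello-world.txt"]
    outputs:
      artifacts:
      - name: hello-art         # exported artifact
        path: /tmp/hello-world.txt
  - name: print-message
    inputs:
      artifacts:
      - name: message
        path: /tmp/message      # the artifact is unpacked here
    container:
      image: alpine:latest
      command: [sh, -c]
      args: ["cat /tmp/message"]
```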

By default, artifacts are packaged as gzipped tarballs; the archive field can specify a different archiving strategy.

In the example above, the template named whalesay uses the cowsay command to generate a file at /tmp/hello-world.txt, then exposes that file as an output artifact named hello-art. The template named print-message takes an input artifact named message, unpacks it at the path /tmp/message, and uses the cat command to print its contents.

As mentioned in the Sidecar section, the Init Container is mainly used to pull artifacts. These sidecars are the key to how outputs are delivered. Below, we look at another delivery mechanism to see how Argo passes outputs around.

Scripts

Let's look at a simple example:

(image: a script template example)
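A sketch of such a workflow, following the scripts example in the Argo docs:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: scripts-
spec:
  entrypoint: main
  templates:
  - name: main
    steps:
    - - name: generate
        template: gen-random-int
    - - name: print
        template: print-message
        arguments:
          parameters:
          - name: message       # consume the script's stdout
            value: "{{steps.generate.outputs.result}}"
  - name: gen-random-int
    script:
      image: python:alpine3.6
      command: [python]         # interpreter; the script body is passed as a temp file
      source: |
        import random
        print(random.randint(1, 100))
  - name: print-message
    inputs:
      parameters:
      - name: message
    container:
      image: alpine:latest
      command: [sh, -c]
      args: ["echo result was: {{inputs.parameters.message}}"]
```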

In the example above, there are two templates of the script type. The script type allows a source field that holds the script body. Argo creates a temporary file containing that body and passes the file name as the last argument to command (the interpreter for the script), so that scripts of different kinds (bash, python, javascript, etc.) can be run conveniently.

A script template assigns the script's standard output to a special output parameter named result, which other templates can consume. Here, the script output of the step named generate can be obtained via {{steps.generate.outputs.result}}.

{{xxx}} is Argo's fixed variable-substitution format; for example, {{inputs.parameters.message}}, {{steps.generate.outputs.result}}, and {{workflow.name}} are all valid references.

So, how should the script output be obtained inside the container?

Let's go back to the sidecars. In the Wait Container, there is logic like this:

(image: Wait Container output-capture logic, from the source code)

Let's take a look at the Volume Mount of this Wait Container:

(image: the Wait Container's volume mounts)

Now it is clear: the Wait Container obtains the Main Container's output by mounting docker.sock and the service account, and saves the result into the Workflow object. Of course, because so much information is stored in the Workflow, when a Workflow has many steps the entire Workflow object becomes very large.

Parameter

Parameters provide a general mechanism for using the result of one step as input to another. They work much like the script result, except that the value of the output parameter is taken from the contents of a generated file rather than from stdout. For example:
(image: an output-parameter example)
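A sketch based on the output-parameters example in the Argo docs: the parameter's value is read from a file via valueFrom.path.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: output-parameter-
spec:
  entrypoint: output-parameter
  templates:
  - name: output-parameter
    steps:
    - - name: generate-parameter
        template: whalesay
    - - name: consume-parameter
        template: print-message
        arguments:
          parameters:
          - name: message
            value: "{{steps.generate-parameter.outputs.parameters.hello-param}}"
  - name: whalesay
    container:
      image: docker/whalesay:latest
      command: [sh, -c]
      args: ["echo -n hello world > /tmp/hello_world.txt"]
    outputs:
      parameters:
      - name: hello-param
        valueFrom:
          path: /tmp/hello_world.txt   # the file contents become the parameter value
  - name: print-message
    inputs:
      parameters:
      - name: message
    container:
      image: docker/whalesay:latest
      command: [cowsay]
      args: ["{{inputs.parameters.message}}"]
```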

Volume

This is not Argo's standard way of passing outputs, but shared storage obviously also lets steps share results. Of course, when using a Volume, Inputs and Outputs are not needed.

In the Workflow spec, we define a volume claim template:

(image: a volumeClaimTemplates definition)

And mount the volume in other templates:

(image: mounting the volume in a template)
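Putting the two pieces together, a sketch following the volumes example in the Argo docs:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: volumes-pvc-
spec:
  entrypoint: volumes-example
  volumeClaimTemplates:         # a PVC created for the lifetime of the workflow
  - metadata:
      name: workdir
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
  templates:
  - name: volumes-example
    steps:
    - - name: generate
        template: whalesay
    - - name: print
        template: print-message
  - name: whalesay
    container:
      image: docker/whalesay:latest
      command: [sh, -c]
      args: ["cowsay hello world | tee /mnt/vol/hello_world.txt"]
      volumeMounts:             # mount the shared volume
      - name: workdir
        mountPath: /mnt/vol
  - name: print-message
    container:
      image: alpine:latest
      command: [sh, -c]
      args: ["cat /mnt/vol/hello_world.txt"]
      volumeMounts:
      - name: workdir
        mountPath: /mnt/vol
```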

Other process control functions

Loops

When writing a Workflow, it is often very useful to loop over a set of inputs, as in the following example:

(image: a withItems loop example)
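A sketch of a withItems loop, following the loops example in the Argo docs:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: loops-
spec:
  entrypoint: loop-example
  templates:
  - name: loop-example
    steps:
    - - name: print-message
        template: whalesay
        arguments:
          parameters:
          - name: message
            value: "{{item}}"   # substituted with each list element
        withItems:              # the step is expanded once per item
        - hello world
        - goodbye world
  - name: whalesay
    inputs:
      parameters:
      - name: message
    container:
      image: docker/whalesay:latest
      command: [cowsay]
      args: ["{{inputs.parameters.message}}"]
```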

In the source code, the controller checks withItems; if it is present, the step is expanded once for each element.

(image: withItems expansion, from the controller source code)

Conditionals

Conditions are specified with the when keyword:

(image: a when conditional example)
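A fragment in the style of Argo's coin-flip example; each branch runs only when its when expression is true (the flip-coin, heads, and tails templates are assumed to be defined elsewhere in the Workflow):

```yaml
  - name: coinflip
    steps:
    - - name: flip-coin
        template: flip-coin
    - - name: heads
        template: heads
        when: "{{steps.flip-coin.outputs.result}} == heads"
      - name: tails
        template: tails
        when: "{{steps.flip-coin.outputs.result}} == tails"
```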

Retrying on failure

(image: a retryStrategy example)
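Retries are configured per template with retryStrategy; a sketch based on the retry example in the Argo docs:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: retry-example-
spec:
  entrypoint: retry-container
  templates:
  - name: retry-container
    retryStrategy:
      limit: 10                 # retry the failing container up to 10 times
    container:
      image: python:alpine3.6
      command: ["python", "-c"]
      # exits non-zero with probability 2/3, so retries will likely be needed
      args: ["import random; import sys; sys.exit(random.choice([0, 1, 1]))"]
```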

Recursion

Templates can call each other recursively, which is a very useful feature. For example, in a machine-learning scenario, you can keep training until the accuracy reaches a target value. In the coin-toss example below, we keep flipping a coin until it comes up heads, and only then does the workflow end.

(image: a recursive coin-flip example)
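The recursion is achieved simply by letting a template call itself; a sketch of the relevant templates from the coin-flip workflow:

```yaml
  - name: coinflip
    steps:
    - - name: flip-coin
        template: flip-coin
    - - name: heads
        template: heads
        when: "{{steps.flip-coin.outputs.result}} == heads"
      - name: tails             # recurse until we get heads
        template: coinflip
        when: "{{steps.flip-coin.outputs.result}} == tails"
  - name: flip-coin
    script:
      image: python:alpine3.6
      command: [python]
      source: |
        import random
        print("heads" if random.randint(0, 1) == 0 else "tails")
  - name: heads
    container:
      image: alpine:3.7
      command: [sh, -c]
      args: ["echo it was heads"]
```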

Below are the results of two executions. The first run flipped heads immediately and ended; the second flipped three times before getting heads and ending the process.

(image: results of the two executions)

Exit processing

An exit handler is a template specified to run at the end of the workflow, regardless of success or failure.

(image: an exit-handler example)
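An exit handler is attached with the onExit field; a sketch following the exit-handlers example in the Argo docs:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: exit-handlers-
spec:
  entrypoint: intentional-fail
  onExit: exit-handler          # always runs when the workflow ends
  templates:
  - name: intentional-fail
    container:
      image: alpine:latest
      command: [sh, -c]
      args: ["echo intentional failure; exit 1"]
  - name: exit-handler
    container:
      image: alpine:latest
      command: [sh, -c]
      # {{workflow.status}} tells the handler how the workflow ended
      args: ["echo workflow finished with status {{workflow.status}}"]
```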

Comparison with Tekton

Compared with Tekton, Argo's process-control features are richer: it supports loops and recursion, which suit machine-learning scenarios well. The Argo community also positions the project for MLOps, AIOps, and data/batch processing, which is why Kubeflow Pipelines is built on Argo underneath (although KFP is also working on a Tekton backend).

In terms of access control, however, Argo is not as strong as Tekton, and I personally find Tekton's structures more clearly defined. Each has its advantages and disadvantages; choose according to your needs.



Source: blog.51cto.com/14133165/2590575