Argo CD Practical Tutorial 06

3.4 Planning for Disaster Recovery

Argo CD doesn't use any database directly (Redis is used as a cache), so it may appear to have no state. Earlier, we saw how to implement a highly available installation, primarily by increasing the number of replicas per deployment. However, we also have application definitions (such as the Git source and target cluster), and details on how to access the Kubernetes cluster or how to connect to a private Git repo or a private Helm repository. These things make up the state of Argo CD, and they are kept in Kubernetes resources: either built-in ones, such as Secrets for connection details, or custom ones for applications and application projects.
Disasters can happen due to human intervention, such as a Kubernetes cluster or the Argo CD namespace being deleted, or due to a problem at the cloud provider. We may also have scenarios where we want to move the Argo CD installation from one cluster to another. For example, maybe the current cluster was created with a technology we no longer want to support, such as kubeadm (https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/), and now we want to move to a cluster managed by a cloud provider.
You might be thinking: "but this is GitOps, so everything is kept in the Git repo, which means it's easy to recreate?" First of all, not everything is kept in the Git repo. For example, when registering a new cluster in Argo CD, we have to run a command, which keeps those details out of Git (which is OK for security reasons). Second, recreating everything from GitOps repos can take a lot of time: we could have thousands of applications, hundreds of clusters, and thousands of Git repos. It is much faster to restore all the resources from a backup than to recreate them from scratch.

3.4.1 Install CLI

Argo CD provides a utility as part of the main CLI (the argocd admin subcommands) that can be used to create backups (export all the relevant data to a YAML file) or import data from an existing file. The CLI can be found in the main Docker image or installed separately.
To install version v2.1.1 of the CLI, run the following commands (this is the macOS version; for other options, check the official page: https://argo-cd.readthedocs.io/en/stable/cli_installation/):

curl -sSL -o /usr/local/bin/argocd https://github.com/argoproj/argo-cd/releases/download/v2.1.1/argocd-darwin-amd64
chmod +x /usr/local/bin/argocd
argocd version --client

If everything worked, the output of the previous command should show the Argo CD client version, for example:

argocd: v2.1.1+aab9542
 BuildDate: 2021-08-25T15:14:05Z
 GitCommit: aab9542f8b3354f0940945c4295b67622f0af296
 GitTreeState: clean
 GoVersion: go1.16.5
 Compiler: gc
 Platform: darwin/amd64

Now that we have the CLI installed, we can use it to create backups.

3.4.2 Create a backup

We can now connect to the cluster and create backups. You should be connected to the cluster that has Argo CD installed (with your kubectl context pointing to that cluster). Run the following command, which will create a file with a name based on the current date and time (this way, you can run it daily or even more frequently):

argocd admin export -n argocd > backup-$(date +"%Y-%m-%d_%H:%M").yml
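If you want to run this export on a schedule from inside the cluster, a Kubernetes CronJob is one option. This is a sketch only: the image tag, the argocd-server service account, and the emptyDir volume (which does not survive the pod, so a real job must also ship the file to external storage) are all assumptions you should adapt to your installation:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: argocd-backup
  namespace: argocd
spec:
  schedule: "0 1 * * *"          # every day at 01:00
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: argocd-server   # assumed to have access to Argo CD resources
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: quay.io/argoproj/argocd:v2.1.1   # the main image also ships the CLI
            command:
            - sh
            - -c
            - argocd admin export -n argocd > /backup/backup-$(date +"%Y-%m-%d_%H:%M").yml
            volumeMounts:
            - name: backup
              mountPath: /backup
          volumes:
          - name: backup
            emptyDir: {}          # replace with a real volume or an upload step
```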

Even though we only installed a single application (Argo CD itself), you can see that the backup is a fairly large file (it should contain almost 1,000 lines). This is because the Argo CD application is itself fairly large (we deploy many resources), and it keeps the history of all its syncs. You will find the backup file I generated for the HA installation in the Git repository (https://github.com/PacktPublishing/ArgoCD-in-Practice) in the ch03/disaster recovery folder.
Next, we should take this backup file, save it in a cloud storage system (such as AWS S3, Azure Blob Storage, or Google Cloud Storage), encrypt it, and put an access policy around it. For a real installation, a lot of sensitive information will be stored there, including access to your production Kubernetes clusters.
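Before wiring up real storage, it can help to see the shape of such a backup-and-upload script. The following dry-run sketch only prints the commands a real run would execute; the bucket name, the gpg recipient, and the use of the aws CLI are all assumptions, not something the book prescribes:

```shell
# backup_cmds prints the three steps a real backup run would execute:
# export, encrypt, upload. Swap the echoes for real invocations once
# the bucket and key exist.
backup_cmds() {
  BACKUP_FILE="backup-$(date +"%Y-%m-%d_%H:%M").yml"
  echo "argocd admin export -n argocd > $BACKUP_FILE"
  echo "gpg --encrypt --recipient ops@example.com $BACKUP_FILE"
  echo "aws s3 cp $BACKUP_FILE.gpg s3://my-argocd-backups/$BACKUP_FILE.gpg"
}
backup_cmds
```

Running it prints the command list, so you can review exactly what would touch your cluster and your bucket before removing the dry-run indirection.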

3.4.3 Restoring on a different cluster

To restore a backup, you need to have Argo CD installed in the target cluster. This is because the backup contains Argo CD's configuration, all its ConfigMaps and Secrets, so everything we changed for the initial installation will be there. However, the backup does not store the actual Deployments or StatefulSets, which means they need to be created before the backup is restored. The same goes for the custom resource definitions: we'll have all the instances of applications and application projects, but not the definitions of those custom resources.
So, in the new cluster, perform the same installation as in the HA installation we did earlier with Kustomize. Then, run the following command (you will need to change the filename to match yours):

argocd admin import - < backup-2021-09-15_18:16.yml

You should now have a fresh installation with all the state (applications, clusters, and Git repos) you had when you created the backup. The only difference is that the Redis cache is now empty, so Argo CD needs to recalculate all the manifests from the Git repos, which may affect the performance of the system for the first few minutes. After that, everything should work as usual.
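After an import, a quick sanity check is to compare resource counts between the backup file and the new cluster. The sketch below counts resource kinds in a small, hypothetical backup file; against a real backup you would compare these numbers with the output of kubectl get in the restored cluster:

```shell
# Build a tiny stand-in backup file (a real one is the argocd admin
# export output, with resources separated by "---").
cat > /tmp/backup-sample.yml <<'EOF'
apiVersion: v1
kind: Secret
metadata:
  name: cluster-prod
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: argocd
---
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: default
EOF

# Count how many resources of a given kind the backup contains.
count_kind() { grep -c "^kind: $1$" "$2"; }

count_kind Application /tmp/backup-sample.yml   # → 1
count_kind Secret /tmp/backup-sample.yml        # → 1
count_kind AppProject /tmp/backup-sample.yml    # → 1
```

For a real check you might compare `count_kind Application backup.yml` against `kubectl get applications -n argocd --no-headers | wc -l` after the import.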
In this section, we saw how the Argo CD CLI makes it easy to automate everything from creating regular backups to restoring them on a newly created cluster. It is important to have a backup strategy and to run recovery exercises from time to time. We should be prepared for when disaster strikes and have the necessary runbooks so that we can produce the same outcome whether it's 2 A.M. or 2 P.M. Disasters are rare, but we encounter many other situations in our daily operations. This could be an increase in the number of applications being synchronized, a specific version of a YAML templating tool causing timeouts, or even an unresponsive system. For all this, we need a good observability strategy. We will explore this in the next section.

3.5 Enabling observability

Observability is important because it can provide answers about the health, performance, and behavior of the system. When you're working in a large organization with dozens of teams deploying their monoliths and microservices to Kubernetes, there's a good chance that things don't always go as smoothly as you'd expect. There's always some wrong setup, an old version that shouldn't be used anymore, an update attempted on an immutable field, many apps that need to be synced at the same time, a team trying to use a private repo without SSH keys set up, or a large app that times out.
Fortunately, Argo CD exposes a number of metrics that allow us to understand whether the system is underutilized or overutilized, and what to do about it. It also gives us ways to directly alert the responsible development team when a sync fails for a particular application. The alerts we will create can be split into two categories: ones for the team that operates Argo CD, and ones for the teams that build microservices.
In this section, we'll learn how to monitor Argo CD with Prometheus, which has become the default choice for monitoring dynamic environments, such as microservices running in containers on Kubernetes. Because of its focus on reliability, Prometheus is one of the best tools for finding out the current state of the system and easily identifying possible problems.

3.5.1 Monitoring with Prometheus

Just as Kubernetes became the standard for container orchestration, Prometheus has become the standard for monitoring. It was the second project to join the Cloud Native Computing Foundation (CNCF), Kubernetes being the first. In the cloud-native world, we have an operator for running Prometheus in Kubernetes (just like Argo CD is an operator for GitOps), called the Prometheus Operator (https://prometheus-operator.dev/). The Argo CD components expose metrics in Prometheus format, which makes it easy to install the Prometheus Operator in your cluster and start scraping these endpoints. There is a Helm chart you can use to install it (usually in a separate namespace called monitoring): https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack.
Once it is installed, we need to tell Prometheus where it can find the endpoints that expose metrics. For this, we can use the custom ServiceMonitor resource (https://prometheus-operator.dev/docs/operator/design/#servicemonitor). Three ServiceMonitors should be created: one for the application controller, one for the API server, and one for the repository server, thus covering all the Argo CD components. You can find the ServiceMonitor resources in the official documentation at https://argo-cd.readthedocs.io/en/stable/operator-manual/metrics/#prometheus-operator. A copy of them is also kept in our Git repository (https://github.com/PacktPublishing/ArgoCD-in-Practice) in the ch03/server folder.
You can apply them using GitOps by placing the files in a folder in your Git repository, and then creating an application that points to it.
After we have the ServiceMonitor resources and the scraping process has started, there is a Grafana dashboard (https://grafana.com/grafana/dashboards) you can use, linked at https://argo-cd.readthedocs.io/en/stable/operator-manual/metrics/#dashboards. Follow the official documentation on how to import a dashboard into your own Prometheus Operator installation: https://grafana.com/docs/grafana/latest/dashboards/export-import/#import-dashboard. We'll cover monitoring from two different angles: one for the team responsible for Argo CD, which we'll call the operations team, and one for the teams building applications, which we'll call the microservices teams.
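As an illustration, a ServiceMonitor for the application controller might look like the following. This is adapted from the official documentation; the monitoring namespace, the namespaceSelector, and the release: prometheus label are assumptions that depend on how you installed kube-prometheus-stack:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: argocd-metrics
  namespace: monitoring
  labels:
    release: prometheus            # assumed: the label your Prometheus selects monitors by
spec:
  namespaceSelector:
    matchNames:
    - argocd                       # where the Argo CD metrics services live
  selector:
    matchLabels:
      app.kubernetes.io/name: argocd-metrics
  endpoints:
  - port: metrics
```

Similar resources are needed for the argocd-server-metrics and argocd-repo-server services, as shown in the official documentation.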

3.5.2 Metrics for Operations Teams

To perform synchronization, Argo CD uses the repository server and the application controller. These are the most important components we need to monitor, and taking care of them will give us a well-performing system. Various metrics can help us understand their behavior, so let's explore some of them.
OOMKilled

Over time, we've found that the most valuable signal for these two components is not one of the metrics Argo CD exposes, but the out-of-memory (OOM) kills the node's OS performs on containers that try to use too many resources. This is a good indicator that you have not set enough resources for the containers, or that you have allowed too much parallelism. The Argo CD documentation provides a good explanation of when OOM kills can occur and which parameters to use to reduce parallelism: https://argo-cd.readthedocs.io/en/stable/operator-manual/high_availability/. For the repository server, the cause is too many manifests being generated at the same time; for the application controller, it is too many manifests being applied at the same time.
You can use the following query to get alerts on these events. It checks whether a container was restarted within the last 5 minutes and whether the reason for its last termination was OOMKilled (the query comes from this old but valuable Kubernetes thread: https://github.com/kubernetes/kubernetes/issues/69676#issuecomment-455533596):

sum by (pod, container, namespace) (kube_pod_container_
status_last_terminated_reason{reason="OOMKilled"}) * 
on (pod,container) group_left sum by (pod, container) 
(changes(kube_pod_container_status_restarts_total[5m])) > 0

If you get one or two of these alerts a week, it's probably not that bad, and you can wait a few more weeks to see what happens. If they occur several times a day, either for the repository server or for the controller, you should take action. Here are some things you can do:

  • Increase the number of replicas of the Deployment/StatefulSet so that when application synchronization is required, the load is spread across more instances.

  • Set more CPU and memory resources for the container.

  • Reduce parallelism by lowering the repository server's parallelism limit and the application controller's --kubectl-parallelism-limit flag (both are discussed in the HA documentation linked previously).

OOM kills relate to how much work the controller needs to do to reconcile the application state. This means that if you haven't deployed for a few days, they might not happen at all, whereas if you start syncing many apps at the same time, you might start getting OOM alerts. If so, we should see a correlation with the load metrics the system exposes. We'll look at these next.
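To get alerted on these events automatically, the OOMKilled query from earlier can be wrapped in a PrometheusRule handled by the Prometheus Operator. This is a sketch: the rule name, severity, and the release: prometheus selector label are assumptions about your kube-prometheus-stack setup:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: argocd-oom
  namespace: monitoring
  labels:
    release: prometheus            # assumed: the label your Prometheus selects rules by
spec:
  groups:
  - name: argocd-oom
    rules:
    - alert: ArgoCDContainerOOMKilled
      expr: >
        sum by (pod, container, namespace)
          (kube_pod_container_status_last_terminated_reason{reason="OOMKilled"})
        * on (pod, container) group_left
        sum by (pod, container)
          (changes(kube_pod_container_status_restarts_total[5m])) > 0
      labels:
        severity: warning
      annotations:
        summary: "Container {{ $labels.container }} in pod {{ $labels.pod }} was OOMKilled"
```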
System Load Metrics

There are metrics that reveal the load on the system. Here, we'll look at one related to the repository server and one related to the application controller.
The task of the repository server is to fetch the contents of the Git repo and then create the manifests based on the templating engine used. Once the final manifests are created, the application controller takes over and applies them. We've seen that generating too many manifests at the same time can cause OOM issues, but what happens when we have many requests to fetch the contents of Git repositories? In this case, there is a metric called argocd_repo_pending_request_total (it is a gauge, as such metrics are called in Prometheus), which shows the number of pending requests on each repository server instance. This number should be as close to zero as possible, indicating that the current number of repo server instances can handle the load. It is not a problem if it rises for a short period of time, but it becomes one when the value stays high over a long period, so this is something you should keep an eye on.
NOTE - Scaling the repo server with HPA
If you are already thinking about autoscaling the repo server with an HPA based on this metric, please join the discussion in this thread, as it is not that easy: https://github.com/argoproj/argo-cd/issues/2559.
On the application controller side, there is another important metric that shows system load, namely argocd_kubectl_exec_pending. This shows the number of apply and auth commands that are waiting to be executed on the destination clusters. The maximum number that can actually run is equal to the --kubectl-parallelism-limit flag, since that is how many parallel threads can run commands against destination clusters. It is not a problem if the pending value reaches the maximum for a short period, but when it stays high for longer, problems such as syncs taking a long time can arise.
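Both load metrics can be turned into alerts that only fire when the backlog is sustained. The following PrometheusRule is a sketch: the 15-minute windows, severities, and the release: prometheus selector label are assumptions you should tune to your environment:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: argocd-load
  namespace: monitoring
  labels:
    release: prometheus            # assumed: the label your Prometheus selects rules by
spec:
  groups:
  - name: argocd-load
    rules:
    - alert: ArgoCDRepoServerBacklog
      expr: sum(argocd_repo_pending_request_total) > 0
      for: 15m                     # a short spike is fine; a sustained backlog is not
      labels:
        severity: warning
      annotations:
        summary: "Repo server has had pending Git requests for 15 minutes"
    - alert: ArgoCDKubectlExecPending
      expr: sum(argocd_kubectl_exec_pending) > 0
      for: 15m
      labels:
        severity: warning
      annotations:
        summary: "Application controller has had pending kubectl commands for 15 minutes"
```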

3.5.3 Metrics for Microservices Teams

If you're trying to apply the idea of a platform team creating a self-service platform for the development teams, then you should allow the development teams to monitor, get alerts for, and take action when something goes wrong with their live deployments. One way is to allow them to set up alerts for the Argo CD applications they use to take their microservices to production. There are two metrics that provide value to development teams. One can be used for the sync status, especially for failures that occur during the sync process. The other is for the application health status, specifically the Degraded state, which means something is not functioning as expected.
Application Sync Status

It's useful to be notified of the sync status so that teams don't need to watch the UI or run commands through the CLI to find out the status of a new version's deployment. This matters when you deploy several times a week, let alone more often. Teams can set up alerts for the applications they manage so that they are notified if syncing a new version of a Docker image, or any other change to their manifests, fails. The argocd_app_sync_total metric can be used for this.
The following query can be used to get alerted about any application whose sync status changed to Failed within the past 5 minutes. It only looks at applications from the argocd namespace whose name starts with accounting (so the accounting team would be interested in it):

sum by (name) (changes(argocd_app_sync_total{phase="Failed", 
exported_namespace="argocd", name=~"accounting.*"}[5m]))>0

If nothing is wrong, we shouldn't get any results. However, if we do get apps with a failed sync state, we should start investigating why.
Application Health Status

Health status is different from sync status because it can change independently of syncs. We're usually looking for the Degraded state, which occurs when something is not working properly; for example, you asked for three replicas in a StatefulSet, but only two are running, while the third is still initializing after a long time, was terminated, or cannot be scheduled and is still pending. Such a scenario can happen at any time while the application runs in production, and it is not directly related to sync events. The metric that tracks it is argocd_app_info. You can use the following query to find Argo CD applications from the argocd namespace that are in a Degraded state, whose name starts with prod but doesn't end with app (the suffix exclusion can be useful for the intermediate applications created by the app-of-apps pattern: https://argo-cd.readthedocs.io/en/stable/operator-manual/cluster-bootstrapping/#app-of-apps-pattern):

argocd_app_info{health_status="Degraded",exported_
namespace="argocd",name=~"prod.*",name!~".*app"}

Getting an application with a Degraded status in the results clearly indicates that some issue in the cluster is preventing the application from functioning properly, so it needs to be checked.
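The two PromQL queries in this section can be packaged as Prometheus Operator alerting rules that the microservices teams own. The following is a sketch; the rule names, severities, the five-minute sustain window, and the release: prometheus selector label are assumptions, while the app-name prefixes mirror the examples above:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: argocd-app-alerts
  namespace: monitoring
  labels:
    release: prometheus            # assumed: the label your Prometheus selects rules by
spec:
  groups:
  - name: argocd-apps
    rules:
    - alert: ArgoCDAppSyncFailed
      expr: >
        sum by (name) (changes(argocd_app_sync_total{phase="Failed",
          exported_namespace="argocd", name=~"accounting.*"}[5m])) > 0
      labels:
        severity: warning
      annotations:
        summary: "Sync failed for Argo CD application {{ $labels.name }}"
    - alert: ArgoCDAppDegraded
      expr: >
        argocd_app_info{health_status="Degraded", exported_namespace="argocd",
          name=~"prod.*", name!~".*app"} == 1
      for: 5m                      # ignore brief Degraded flaps during rollouts
      labels:
        severity: warning
      annotations:
        summary: "Argo CD application {{ $labels.name }} is Degraded"
```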
Next, we'll learn how to notify users about events that occur in Argo CD, such as whether an application was deployed successfully. This can be achieved with different tools. We'll take a look at those that are specific to Argo CD: the Argo CD Notifications project and the custom webhooks built into Argo CD.

3.6 Notification to end users

For syncing applications, Argo CD can work in two different ways. The first is manual: new commits to the GitOps repository don't have any direct effect until you trigger a sync via the CLI, the UI, or an API call. The second mode, and I think the most commonly used one, is automatic: after a push to the repository, Argo CD starts reconciling the cluster state to match the declared state.
Developers performing state changes are interested in the outcome of the reconciliation - they want to know if their microservices are running correctly, or if they had some issues with the new configuration or the new container image.
Earlier, we learned how to monitor sync performance with Prometheus, using the application health and sync status metrics exposed by Argo CD. However, there is another way to notify development teams that there are failures in their microservices, or that everything is going perfectly: the Argo CD Notifications project. It was built specifically for Argo CD, so it can provide users with more useful details. You can learn more about it at https://github.com/argoproj-labs/argocd-notifications.

3.6.1 Install Argo CD Notifications

Like Argo CD, the Notifications project can be installed in three different ways: via its Helm chart (https://github.com/argoproj/argo-helm/tree/master/charts/argocd-notifications), using raw manifests, or via Kustomize. We will use the Kustomize option and install it in a GitOps way. All the notification code we will build can be found in the ch03/notifications folder of https://github.com/PacktPublishing/ArgoCD-in-Practice.git.
In the same repo you used to install Argo CD, create a new folder called notifications. In that folder, create a file called kustomization.yaml and add the following content. As you can see, we kept the same argocd namespace; there is no reason to use another one, as this is not a standalone application. It requires an Argo CD instance to work (in other words, if Argo CD is not installed, there is no reason to have Argo CD Notifications):

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: argocd
bases:
  - github.com/argoproj-labs/argocd-notifications/manifests/controller?ref=v1.1.1

Commit the file to the repo and push it to the remote so that we can create the Application file. Name it argocd-notifications-app.yaml and place it in the top folder this time (it should be at the same level as the argocd-app.yaml file we created earlier in this chapter when we set up Argo CD self-management). Here are the contents of the file (just make sure you replace the path and repoURL with your own values):

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: argocd-notifications
spec:
  destination:
    namespace: argocd
    server: https://kubernetes.default.svc
  project: default
  source:
    path: ch03/notifications
    repoURL: https://github.com/PacktPublishing/ArgoCD-in-Practice.git
    targetRevision: main
  syncPolicy:
    automated: {}

Now that we have created the file, we need to apply it using the following command (don't forget to commit and push it to your repo for future reference):

 kubectl apply -n argocd -f argocd-notifications-app.yaml  

After applying it, you should have the Argo CD Notifications application, installed by Argo CD, running in your cluster. In the UI, it should look like this:
Figure 3.2 - The Argo CD Notifications app in the Argo CD UI
Next, we'll learn how to start a GitLab pipeline from Argo CD Notifications. This can be used to update the deployment status of our application in various tracking systems and can be seen as a way of closing the GitOps reconciliation loop.

3.6.2 Starting the pipeline

There's a good article on the Argo blog that explains how to set up the configuration needed to send notifications via email: https://blog.argoproj.io/notifications-for-argo-bb7338231604. The official documentation has more examples of notification services, such as Slack (https://argocd-notifications.readthedocs.io/en/latest/services/slack/), Microsoft Teams (https://argocd-notifications.readthedocs.io/en/latest/services/teams/), Telegram (https://argocd-notifications.readthedocs.io/en/latest/services/telegram/), and a few others. There are also examples of how to use webhooks to set the status of a GitHub commit (https://argocd-notifications.readthedocs.io/en/latest/services/webhook/#set-github-commit-status) or a custom hook to post data.
We will build an example pipeline in GitLab. GitLab is increasingly used for CI/CD because it allows pipelines to run in a cloud-native fashion, on containers, with its Kubernetes executor: https://docs.gitlab.com/runner/executors/kubernetes.html. For our demonstration, we will not be using the Kubernetes executor; we will use the shared runners provided by GitLab, which are based on Docker Machine: https://docs.gitlab.com/runner/executors/docker_machine.html. The pipeline we create will run the same on Kubernetes and Docker Machine runners.
First, create a user account on GitLab by going to https://gitlab.com/users/sign_up. Once you have your account up and running, go ahead and create a project. There should be a New project button in the upper-right corner of the GitLab UI. Select Create blank project on the next page, after which you should be able to set a name for the project.
In my case, I named it Recovery-Manual-Pipeline and made the project public so I can share it with everyone. You can set it up however you want:

Figure 3.3 - Creating a new GitLab project
Once we have created the project, before adding any code, we need to set up authentication for the Git repository using SSH keys. Go to the top-right corner, find the last link on the right, and click it; you should see the Preferences menu item. This takes you to a page with a large menu on the left that includes an entry for SSH Keys. Clicking it takes you to a page where you can add your SSH key (follow steps 1, 2, and 3 in the following screenshot to get to the SSH Keys page). There will be a link explaining how to generate a new key pair. After creating it, paste the public key into the text box (not the private one), give it a title, and click Add key:
Figure 3.4 - How to get to the SSH Keys page
Now that we have the correct setup, we can clone, pull, and push to our Git repo without any issues.
Now, back in our repo, we should clone it locally and open it in an editor. We'll build a pipeline with one job, called update-deploy-status. It will use an Alpine Docker image and run a dummy script that displays the application's name, its status, and the commit SHA applied by Argo CD. The idea is that such a job could make changes such as setting tags on Git commits, or marking certain tasks as live in production after a sync event. Ours is a dummy one meant to explain the connection between events and pipelines, but yours can be more advanced. So, create a new file next to the README.md file, name it .gitlab-ci.yml, and add the following pipeline definition:

update-deploy-status:
  stage: .post
  script:
    - echo "Deploy status $APPLICATION_DEPLOY_STATUS for Application $APPLICATION_NAME on commit $APPLICATION_GIT_COMMIT"
  image: alpine:3.14
  only:
    variables:
      - $APPLICATION_DEPLOY_STATUS
      - $APPLICATION_NAME
      - $APPLICATION_GIT_COMMIT

NOTE - Manual GitLab pipelines
Pipelines that use conditions to decide whether a job should run (see the only: section) can also be started manually from the GitLab UI, which is a great way to debug them.
Next, commit the .gitlab-ci.yml file we created and push it to the remote repo. Before we define the webhook, we need a way for Argo CD Notifications calls to authenticate to GitLab pipelines. We'll use a pipeline trigger token for this: https://docs.gitlab.com/ee/api/pipeline_triggers.html. We'll create it from GitLab's UI. On the project's home page, in the left menu, there is an entry for Settings. After clicking it, you will see a CI/CD item in its submenu. Clicking that takes you to a page with several expandable sections, one of which is Pipeline triggers. There, you can create a new trigger; I named mine My Argo CD Notify Webhook. After clicking Add trigger, the token will appear:
Figure 3.5 - Creating a pipeline trigger: give it a name and click the Add trigger button
Now we have a token we can use for authentication when calling the webhook from Argo CD Notifications to start the pipeline. In the Pipeline triggers section, there is an example of what a webhook call should look like; all we need to do is adapt it to our configuration. The token is the one we just created, and REF_NAME is the main branch in our case. As for the variables, we will populate them from the Argo CD notification template:

curl -X POST \
  -F token=TOKEN \
  -F "ref=REF_NAME" \
  -F "variables[RUN_NIGHTLY_BUILD]=true" \
  https://gitlab.com/api/v4/projects/29851922/trigger/pipeline

For the next few steps, we will follow the official documentation on how to configure notifications in Argo CD (https://argocd-notifications.readthedocs.io/en/stable/).
We need to modify the argocd-notifications-cm config map, which we will do through Git. Inside the notifications folder we created when we installed Argo CD Notifications, add a new folder called patches. In it, add a file called argocd-notifications-cm.yaml, where we'll define the trigger (when the webhook is sent) and what the webhook should look like (the notification template). We name the trigger on-sync; it fires when the sync operation ends in success, error, or failure, and it is linked to the gitlab-webhook template. The template, in turn, is linked to the gitlab webhook service, and it shows that an HTTP POST request will be made with the variables needed to start our job, ref set to main, and the auth token (which you will need to set to the real value you created previously):

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
data:
  trigger.on-sync: |
    - when: app.status.operationState.phase in ['Succeeded', 'Error', 'Failed']
      send: [gitlab-webhook]
  service.webhook.gitlab: |
    url: https://gitlab.com/api/v4/projects/29851922/trigger/pipeline
    headers:
    - name: Content-Type
      value: multipart/form-data
  template.gitlab-webhook: |
    webhook:
      gitlab:
        method: POST
        body: ref=main&token=<token-goes-here>&variables[APPLICATION_DEPLOY_STATUS]={{.app.status.sync.status}}&variables[APPLICATION_NAME]={{.app.metadata.name}}&variables[APPLICATION_GIT_COMMIT]={{.app.status.operationState.operation.sync.revision}}
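To sanity-check what GitLab will receive, we can expand the template body by hand for a hypothetical sync event. The function below mimics the substitution Argo CD Notifications performs; the status, name, and commit values are made up, and in a real call Argo CD fills them from the application state:

```shell
# Simulate the gitlab-webhook template expansion for one sync event.
# Arguments: sync status, application name, synced commit SHA.
render_webhook_body() {
  status="$1"; name="$2"; commit="$3"
  echo "ref=main&token=<token-goes-here>&variables[APPLICATION_DEPLOY_STATUS]=${status}&variables[APPLICATION_NAME]=${name}&variables[APPLICATION_GIT_COMMIT]=${commit}"
}

# Hypothetical event: the argocd app synced commit 1a2b3c4 successfully.
render_webhook_body Synced argocd 1a2b3c4
```

The printed string is exactly the form body the webhook service will POST to the pipeline trigger endpoint, so you can paste it into a manual curl call against GitLab to verify that the job starts with the expected variables.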

We need an entry in the kustomization.yaml file to reference this new file:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: argocd
bases:
  - github.com/argoproj-labs/argocd-notifications/manifests/controller?ref=v1.1.1
patchesStrategicMerge:
  - patches/argocd-notifications-cm.yaml

Now, create a commit, push it to the remote, and make sure the Argo CD application is synced so that it includes our changes. We should now be ready to update one of our Application custom resources to subscribe to the webhook we created. In our case, we have the Application definitions in Git, but they are not directly tracked by Argo CD, so if we change them, we still need to apply them manually. In Chapter 5, Argo CD Bootstrapping a K8s Cluster, we'll look at the app-of-apps pattern, which allows us to keep all Application definitions in Git. For now, we can perform these small changes manually.
Earlier, in the ch03 folder, we created an argocd-app.yaml file. Here, we'll modify it to include an annotation specifying that it subscribes to the gitlab webhook with the on-sync trigger (see the highlighted code):

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  annotations:
    notifications.argoproj.io/subscribe.on-sync.gitlab: ""
  name: argocd
spec:
  destination:
    namespace: argocd
    server: https://kubernetes.default.svc
  project: default
  source:
    path: ch03/kustomize-installation
    repoURL: https://github.com/PacktPublishing/ArgoCD-in-Practice.git
    targetRevision: main

We need to apply this file manually with the following command:

 kubectl apply -f argocd-app.yaml  

The output should be as follows:

 application.argoproj.io/argocd configured  

With this, we have everything set up so that every change and sync we make to the Argo CD app will start a pipeline in the GitLab project. To try it out, we can modify the argocd-cm config map we added earlier in this chapter when we discussed Argo CD self-management. After we've pushed the change to the remote, we should have a pipeline run that provides output similar to the following:
Figure 3.6 - GitLab job output for a pipeline initiated by an Argo CD notification
This was a long demo, in which we created a small GitLab pipeline with a job that is triggered via a notification when a failed or successful sync occurs for an Argo CD application. The alternative would have been to poll the application sync state from the pipeline after new commits, until it reaches the state we're waiting for, and only then do what we need. Notifications allow us to embrace the pull-based nature of GitOps without creating workarounds that turn it back into a push approach.

3.7 Summary

We started this chapter by installing Argo CD. We chose a cluster managed by a cloud provider because we needed more nodes to experiment with an HA deployment. We saw how Argo CD can update itself and how to make configuration changes to the installation. While a production Kubernetes cluster is highly available and the cloud provider manages it for us, there are still scenarios where disaster can strike, so we need a working disaster recovery strategy. We saw how to create a backup of the Argo CD state and then restore it in a new cluster.
Observability is an important topic, and we discussed which metrics can be used to monitor an Argo CD installation, from OOM container restarts to issues that microservices teams need to be aware of. Finally, we learned how to link sync results to pipelines so that everything can be automated.
In the next chapter, we will discover how to use Argo CD to bootstrap a new Kubernetes cluster in AWS, including how to set up applications such as external DNS and Istio in the newly created cluster.

3.8 Further reading

To learn more about the topics covered in this chapter, check out the following resources:


Origin blog.csdn.net/github_35631540/article/details/131144744