Four simple things to help improve the deployment process

Through all the changes in how we build and run software, some questions stay the same. First, how can we deploy code to production with minimal effort and without disruption? Second, how do we know whether the service is healthy: is it up or down, and, if we configured it correctly, will it behave as expected?

Here are four simple things that can be done in any environment to help improve the deployment process. They give you better insight and more confidence that your application is running and configured correctly.

  1. Application health check
  2. Event notes
  3. Pod: Minimize the impact
  4. Blue-green deployment

Application health check

The first step in improving application deployment and management is knowing whether your application is functioning properly: that it is up and able to do its job, can talk to downstream services, and is running the correct version. Monitoring is obviously crucial, but how we monitor is what determines whether we can use it for automated deployment. Everywhere I have worked, we monitored applications and databases in some form, but not every team had application health checks.

Recently, at Kountable, we added a /public/health endpoint to all of our applications. This health check tells us four things about the application. First, whether it is healthy (started and ready). Second, which version of the code (commit) it is running. Third, its uptime. Finally, the connection_status, which tells us whether the application can connect to its database or downstream services. If it cannot, we can then check whether the cause is a network problem, a password problem or a downstream service that is offline. This narrows the search and shortens the time it takes to diagnose an application failure. Here is an example of the health check output.

{
"healthy": true,
"commit": "1e98e46",
"uptime": "05:22:47:21",
"connection_status": true
}

This health check can be used not only to monitor services but also as part of the deployment process. During a blue-green deployment, it can verify the installed version (commit) as well as the health and connection status. If all of these pass, along with other, more comprehensive tests, we can automatically promote the deployment to production.
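As a rough sketch of how such an endpoint might assemble its response, the Python below builds the same four fields. The function names, the check_connection probe and the reading of the uptime field as days:hours:minutes:seconds are assumptions for illustration, not details of the Kountable implementation.

```python
import time
from typing import Callable, Optional


def format_uptime(seconds: float) -> str:
    """Format a duration as DD:HH:MM:SS (assumed meaning of the uptime field)."""
    total = int(seconds)
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{days:02d}:{hours:02d}:{minutes:02d}:{secs:02d}"


def build_health(commit: str, started_at: float,
                 check_connection: Callable[[], bool],
                 ready: bool = True,
                 now: Optional[float] = None) -> dict:
    """Build the health payload; check_connection is a hypothetical probe
    that returns True when the database or downstream service is reachable."""
    now = time.time() if now is None else now
    try:
        connected = bool(check_connection())
    except Exception:
        # A probe that raises is reported as a failed connection.
        connected = False
    return {
        "healthy": ready,
        "commit": commit,
        "uptime": format_uptime(now - started_at),
        "connection_status": connected,
    }
```

A web handler for /public/health would then serialize the returned dictionary as JSON, with commit injected at build time and started_at captured when the process starts.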

Early on with this setup, we deployed a service to AWS ECS that failed its health check: the commit ID it reported did not match the commit ID we were deploying. If you already run services on ECS, you know that AWS does this job well, letting you deploy a new version of an ECS task with minimal impact on the currently running service. ECS starts a new task, checks the health check endpoint configured in the target group, and only when it passes drains the old task and puts the new one into service. In the past, I have seen many new ECS task deployments get stuck in a loop of starting and failing. AWS reports no error on the task deployment. The only option is to look at the CloudWatch logs, where you will see the service starting and stopping every minute. Tracking this down can take some time.

With the commit ID or version in the application health check, combined with blue-green deployment, we are able to catch deployment failures. The deployment tool compares the commit ID being deployed with the commit ID reported by the health check. When they do not match, the deployment stops. This simple check saved us more than 30 minutes of identifying the problem and kept it out of production.
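The comparison the deployment tool makes can be sketched in a few lines of Python. This is a minimal illustration, assuming the health check returns the JSON shown earlier; the function names and the base_url parameter are hypothetical.

```python
import json
import urllib.request


def fetch_health(base_url: str, timeout: float = 5.0) -> dict:
    """GET <base_url>/public/health and parse the JSON body."""
    with urllib.request.urlopen(f"{base_url}/public/health", timeout=timeout) as resp:
        return json.load(resp)


def deployment_problems(health: dict, expected_commit: str) -> list:
    """Return the reasons a deployment should stop; an empty list means proceed."""
    problems = []
    if health.get("healthy") is not True:
        problems.append("service is not healthy")
    if health.get("connection_status") is not True:
        problems.append("service cannot reach its database or downstream services")
    if health.get("commit") != expected_commit:
        problems.append(
            f"running commit {health.get('commit')!r} != expected {expected_commit!r}"
        )
    return problems
```

A deployment script could call deployment_problems(fetch_health(url), commit) against the newly started version and abort when the list is not empty.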

Event notes

One trend I have seen over and over is that when nothing changes in the system, application or environment, there are hardly any problems or outages. In my early days at Apigee, our customer base was growing rapidly and code was released continuously. During this period of rapid development and continuous deployment, we ran into many problems in production. In quiet periods with no production deployments, the problems almost disappeared.

In a constantly changing environment, it is hard to keep track of every change. When something breaks, it takes time to narrow down which change caused it, especially when changes are rolled out gradually and globally. One thing I find easy to implement and very helpful is to record each change event in your monitoring system. The deployment tool can do this itself by sending a deployment event to the monitoring system.
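If the monitoring system is Grafana, one way a deployment tool might record the event is through Grafana's annotations HTTP API (POST /api/annotations). The sketch below is an assumption about how this could look, not the tooling used here; the service name, tags and token handling are hypothetical.

```python
import json
import time
import urllib.request
from typing import Optional


def deployment_annotation(service: str, commit: str,
                          when: Optional[float] = None) -> dict:
    """Build the JSON body for a Grafana annotation marking a deployment."""
    when = time.time() if when is None else when
    return {
        "time": int(when * 1000),  # Grafana expects epoch milliseconds
        "tags": ["deployment", service],
        "text": f"Deployed {service} at commit {commit}",
    }


def post_annotation(grafana_url: str, api_token: str, body: dict) -> int:
    """POST the annotation to Grafana and return the HTTP status code."""
    request = urllib.request.Request(
        f"{grafana_url}/api/annotations",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

Tagging each event with the service name makes it possible to filter annotations per dashboard, so a graph shows only the changes relevant to it.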

Here is an example: we deployed an application recently and its response time increased immediately. The Grafana annotation marks the deployment time, and the response time spike follows right after it.

Besides helping to pin down the cause quickly, recording events is easy to add to any deployment process or other automated process. I think every change to the environment (configuration management runs, patching, backups and even manual changes) should be recorded.

I have found that recording backup events helps too: overlaying the backup window on system resource usage (CPU, memory, etc.) is a quick and easy way to see whether the backup process is the culprit behind CPU and memory spikes.

Pod: Minimize the impact

The concept of a Pod has many incarnations, from data center design and VMware Pods to Kubernetes Pods, and Pods can be used or designed in many ways. The key is to design applications and infrastructure so that any failure affects only a subset of components, customers or services.

We put this concept into practice at Apigee, where we designed the application and infrastructure together. Operations worked with Engineering to design a multi-tenant application that runs customers on two or more application Pods. For us, a Pod is a set of application services to which 1 to X customers are assigned. For example, you might have one Pod for the core application and another for analytics or logging. On AWS, you can have application Pods per AWS region and then assign customers to Pods in all regions or in just a few.

If a Pod in a specific region has a problem, whether from a cloud outage, a deployment issue or something else, the impact is isolated to the customers on that Pod. Customers deployed to multiple regions usually never notice the problem.
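To make the isolation concrete, here is a small Python sketch that works out which customers a Pod failure touches. The customer names, the region-named Pods and the down/degraded split are all made up for illustration.

```python
from typing import Dict, List, Set

# Hypothetical placement: customer -> Pods (named by region) that serve it.
ASSIGNMENTS: Dict[str, List[str]] = {
    "acme": ["us-east-1", "eu-west-1"],
    "globex": ["us-east-1"],
    "initech": ["eu-west-1", "ap-southeast-1"],
}


def blast_radius(assignments: Dict[str, List[str]],
                 failed_pods: Set[str]) -> Dict[str, List[str]]:
    """Split affected customers into 'down' (no healthy Pod left)
    and 'degraded' (at least one healthy Pod remains)."""
    impact: Dict[str, List[str]] = {"down": [], "degraded": []}
    for customer, pods in sorted(assignments.items()):
        failed = [pod for pod in pods if pod in failed_pods]
        if not failed:
            continue  # customer is not on any failed Pod
        if len(failed) == len(pods):
            impact["down"].append(customer)
        else:
            impact["degraded"].append(customer)
    return impact
```

With this placement, blast_radius(ASSIGNMENTS, {"us-east-1"}) reports globex as down and acme as degraded, while initech is untouched.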

Designing the application and infrastructure together gives you a better chance of reducing the blast radius of a problem, and a better end result.

Blue-green deployment

Blue-green deployment lets you run two different versions of the application side by side while only one of them serves live traffic. You can set this up in several ways. In the past, I have run two versions of an application in ECS, both pointing to the same database.

Your application and database need to be forward and backward compatible. The key to compatibility is how you handle database schema changes: for example, defer dropping a column until neither version needs it.
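The column rule can be expressed as a simple set calculation. The sketch below is only an illustration: the table, the column names and the idea of tracking which columns each version uses are assumptions, not part of any particular migration tool.

```python
from typing import Dict, Set


def droppable_columns(table_columns: Set[str],
                      columns_used_by: Dict[str, Set[str]],
                      live_versions: Set[str]) -> Set[str]:
    """Return the columns that no live version still reads or writes."""
    still_needed: Set[str] = set()
    for version in live_versions:
        still_needed |= columns_used_by[version]
    return table_columns - still_needed


# Hypothetical example: v1.0.5 replaces legacy_name with display_name.
COLUMNS = {"id", "email", "legacy_name", "display_name"}
USED_BY = {
    "v1.0.3": {"id", "email", "legacy_name"},
    "v1.0.5": {"id", "email", "display_name"},
}
```

While both versions are live, droppable_columns(COLUMNS, USED_BY, {"v1.0.3", "v1.0.5"}) is empty; once only v1.0.5 remains, legacy_name becomes safe to drop.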

To switch between v1.0.3 and v1.0.5, the AWS ALB is set up with two rules, one for blue and one for green. The ALB listener rule is switched from blue to green, and then all of the old (blue) connections are drained.

About us

Zeyang is a Certified Jenkins Engineer and DevOps practitioner who shares enterprise-level DevOps operations and development practices, mainly through courses on modern Linux operations and DevOps technology. The courses draw on front-line experience and real enterprise applications, with an emphasis on practicality that students have responded well to. You are welcome to join.

Class link: https://edu.51cto.com/lecturer/11054706.html

Origin blog.51cto.com/11064706/2540583