What can I do to increase the availability of applications?

Table of Contents

Item #1 - Stateful services

Item #2 - Tight coupling

Item #3 - Enable scale-out

Item #4 - Externalize configuration from the application

Item #5 - Build an automated deployment pipeline

Item #6 - Monitoring, monitoring, monitoring

Item #7 - Control configuration

Item #8 - Eliminate environment "uniqueness"

Item #9 - Start with availability patterns

Item #10 - Remember that this is an evolving process


In this article, we explore what you should avoid or remove from your applications, namely stateful services and tight coupling, and what you should add: scale-out, configuration pulled out of the application, an automated deployment pipeline, monitoring, controlled configuration, and the elimination of environment "uniqueness".

So, I have written several articles about how to improve the availability and resiliency of applications, and over the past few months I have come to a conclusion: for many people, this is a very important topic. In previous articles I discussed what it means to go down this route and how to design solutions for availability.

But the next question that comes up is: what kinds of things should I actually do to achieve greater availability and resiliency in my applications? How do I upgrade legacy applications to improve their resiliency? As a developer, what should I keep in mind?

So, I thought it would be useful to put together a summary list of the changes that will increase your application's availability and/or resiliency, and the things you should avoid if you want to improve either of those two factors.

Let's start with the two terms I keep coming back to, because I have found that improving either one only happens when we are all talking about the same thing:

  • Availability - the ability of your application to keep running even when a key component suffers significant downtime; in other words, the ability to continue providing service with minimal impact on users.
  • Resiliency - the ability of your application to keep working even in the event of a major or transient failure, and to complete the work that is currently in progress.

Digging in further, then, the question becomes: what should I avoid, or remove from my application, in order to improve on both fronts going forward?

Item #1 - Stateful services

Overall, this is a key element to eliminate for both availability and resiliency, and it may be a controversial one, but I want to talk about it here. If a service holds state (in memory or elsewhere), it means that failing over to another location becomes harder. I know I have to replicate that state, and if it is kept in memory that becomes very difficult. If it lives in an independent store such as SQL or Redis, it gets easier, but it also adds complexity, which makes this form of availability harder to achieve, especially when you are adding "9"s to your SLA. So, in general, it is best to avoid application components that depend on state.

In addition, stateful services cause other problems in the cloud, including limiting your ability to grow and scale with demand. The perfect example is "sticky sessions", which means that once you are routed to a server, you keep being sent back to that same server. This is the opposite of scaling and should be avoided at all costs.

If you are dealing with a legacy application and removing the state entirely is not possible, then at least make sure the state is managed somewhere other than memory. For example, if you cannot remove session state, move it to SQL and replicate it.
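
As a minimal sketch of that idea, session data can be pushed into an external store such as Redis instead of process memory; the host name, key prefix, and TTL below are illustrative assumptions, not details from the original article:

```python
import json

import redis  # external session store instead of in-process memory

# Hypothetical connection details; in practice these would come from configuration.
store = redis.Redis(host="session-cache.example.internal", port=6379, decode_responses=True)

SESSION_TTL_SECONDS = 30 * 60  # expire idle sessions after 30 minutes


def save_session(session_id: str, data: dict) -> None:
    """Persist session data externally so any server instance can read it."""
    store.setex(f"session:{session_id}", SESSION_TTL_SECONDS, json.dumps(data))


def load_session(session_id: str) -> dict:
    """Load session data regardless of which instance handles the request."""
    raw = store.get(f"session:{session_id}")
    return json.loads(raw) if raw else {}
```

Because no instance holds the session in memory, any replica can serve the next request, and "sticky sessions" are no longer required.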

Item #2 - Tight coupling

This points back to the two key terms I outlined above. When application components are tightly coupled, you create something that will ultimately fail and cannot scale. It blocks your ability to build a solution that scales well.

Let's take a common example. Suppose your application has an API layer, and that API is bundled into the same web project as your UI front end. The API then talks directly to the database.

This is a very common traditional model. The problem it causes is that the load on the web application and the demand on the back-end API are tightly coupled, so a failure in one means a failure in the other.

Now, let's go further and say that you expose the API to the outside world (following secure practices) so that your application can be extended. Sounds good, right?

Except that, on closer inspection, because all of the application's elements communicate directly with each other, you have now created a program in which a cascading failure can take down your entire application.

For example, your customers decide to hit your API quite hard: many of them decide to pull a complete dump of their data every 30 seconds, or you sign up a new API customer. That leads to the following chain of effects:

  1. The increased API demand drives up memory and CPU consumption on the web tier.
  2. That causes performance problems loading pages in the application.
  3. That can cause intermittent outages, because the higher API transaction volume increases demand on SQL, and the increased SQL demand causes your application to hit resource deadlocks.
  4. Those resource deadlocks cause application failures and further problems for the user experience.

Now you may be thinking, "Yes, Kevin, but I can enable auto-scaling in the cloud, and that solves all of these problems." My answer is: sure, and at the same time your bill inflates out of control. So, obviously, your CFO will find runaway cost an acceptable price for papering over a bad practice.

One way we can solve this problem is to split the API out into a separate compute layer, so that we can manage its compute separately rather than scaling everything indiscriminately to cover the problem. That gives me the option of scaling the individual parts of my application.

I can also implement a queue as a load-leveling practice, which lets me scale out the next tier only when the queue depth grows beyond what yields a reasonable response time. I can also throttle requests from the API or prioritize messages from the application. Then I can replicate the message queue to provide greater resiliency.
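
A minimal, single-process sketch of that load-leveling idea follows; in a real system the queue would be a durable service such as a message broker, and the function names and limits here are illustrative assumptions:

```python
import queue
import threading
import time

# Bounded in-memory queue standing in for a durable message broker.
work_queue: "queue.Queue[dict]" = queue.Queue(maxsize=1000)


def enqueue_request(payload: dict) -> bool:
    """Front end: accept work quickly, or push back when the buffer is full."""
    try:
        work_queue.put_nowait(payload)
        return True   # accepted; respond to the caller immediately
    except queue.Full:
        return False  # tell the caller to retry later (throttling)


def process(item: dict) -> None:
    """Placeholder for the real back-end work, e.g. writing to SQL."""
    time.sleep(0.1)   # simulate a bounded, controlled unit of work


def worker() -> None:
    """Back end drains the queue at its own pace, protecting the database."""
    while True:
        item = work_queue.get()
        process(item)
        work_queue.task_done()


threading.Thread(target=worker, daemon=True).start()
```

The queue depth then becomes the scaling signal: add workers only when the depth stays above a threshold, instead of scaling the whole application.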

Item #3 - Enable scale-out

Now, I know I just said that scaling things out indiscriminately is bad, but the key word is "controlled". What I mean is that by making services stateless and implementing decoupling practices, you can create a program in which one or more copies of a service can run, and from a resiliency and availability point of view that brings all kinds of benefits. It turns your services from pets into cattle: you no longer care whether one goes down, because another has already replaced it. It is a bit like a hydra, which is a good way to think about it.

Item #4 - Externalize configuration from the application

The more tightly your settings are tied to your application code, the harder they are to change. If your configuration is baked into the code and a deployment is required to make a configuration change, then changing even an endpoint becomes increasingly difficult. So the best thing you can do from the start is pull configuration settings out of the application (a minimal sketch follows the list below). However you look at it, this is important, for the following reasons:

  • Security
  • Maintainability
  • Automation
  • Change management
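
As a minimal sketch, configuration can be resolved at startup from the environment (or a configuration service) rather than hard-coded; the variable names and defaults below are illustrative assumptions:

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Settings resolved at startup from the environment, not from code."""
    api_base_url: str
    sql_connection_string: str
    request_timeout_seconds: int


def load_settings() -> Settings:
    # Changing any of these is a configuration change, not a redeployment.
    return Settings(
        api_base_url=os.environ.get("API_BASE_URL", "http://localhost:8080"),
        # Required value: fail fast at startup if it is missing.
        sql_connection_string=os.environ["SQL_CONNECTION_STRING"],
        request_timeout_seconds=int(os.environ.get("REQUEST_TIMEOUT_SECONDS", "30")),
    )
```

Because the values live outside the code, they can be changed per environment, audited, and rotated without rebuilding or redeploying the application.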

Item #5 - Build an automated deployment pipeline

In many cases, automation is the key to high availability, especially if you are chasing a higher number of 9s in your uptime targets. The simple fact is that at that level it comes down to seconds.

Automated deployment also helps manage configuration drift. The simple fact is that the more configuration drift you have, the harder it is to maintain a secondary region, because you must manage it to make sure nothing in that region is missed. Forcing everything through an automated deployment pipeline eliminates this situation: if every change must be scripted and automated, it is almost impossible for configuration drift to creep into an environment.

Item #6 - Monitoring, monitoring, monitoring

Another element of high availability and resiliency is monitoring. A few years ago, the first question on most developers' minds was "How do I secure this?" That is still a question, but many developers treat another one as an afterthought, and it is the bigger question: "How do I monitor it and know whether it is working?" Given the rise of microservices and serverless computing, we really do need to be able to monitor every piece of code we deploy. So we need to build everything new with that question in mind.

This may be as simple as logging custom telemetry to Application Insights, or recording incoming and outgoing requests and logging exceptions. But if you do not implement these measures, you cannot be sure that a given program is actually working.
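
As a minimal, framework-agnostic sketch (the operation and handler names are illustrative assumptions), each request can be timed and its outcome and exceptions recorded, so the telemetry exists to answer "is it working?":

```python
import logging
import time

logger = logging.getLogger("telemetry")
logging.basicConfig(level=logging.INFO)


def handle_with_telemetry(operation: str, handler, *args, **kwargs):
    """Wrap any request handler so duration, outcome, and exceptions are recorded."""
    start = time.perf_counter()
    try:
        result = handler(*args, **kwargs)
        logger.info("operation=%s status=success duration_ms=%.1f",
                    operation, (time.perf_counter() - start) * 1000)
        return result
    except Exception:
        # logger.exception records the full traceback for the failed request.
        logger.exception("operation=%s status=failure duration_ms=%.1f",
                         operation, (time.perf_counter() - start) * 1000)
        raise
```

The same shape works whether the sink is a log file or a telemetry service such as Application Insights; what matters is that every deployed piece of code emits something you can observe and alert on.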

Item #7 - Control configuration

This builds on my comments above. The biggest mistake I see people make with this type of implementation is that they do not manage how configuration changes get made to their environments. Eventually this leads back to the "pets" vs "cattle" mentality. Earlier in my career I had a boss with a banner above his office that read: "Servers are cattle, not pets... and sometimes you have to make hamburgers."

As funny as that statement is, there is an element of truth to it. If you allow configuration changes and fixes to be applied directly to an environment, you end up unable to place any trust in your automation. It also renders the monitoring and all the other elements of a real high-availability or resiliency architecture largely meaningless.

So the best thing you can do is use an automated pipeline, and when changes are necessary, push them through that pipeline; ideally, beyond reading metrics and logs, people should not have access to the production environment at all.

Item #8 - Eliminate environment "uniqueness"

As above, we need to make sure everything in our environments is repeatable. In theory, I should be able to destroy an environment and, at the click of a button, deploy a new one. That is only possible if everything is scripted. I am a big fan of Terraform, which can help solve this problem, but Bash scripts, PowerShell, or a CLI will also get you there.

The more of this "uniqueness" you can remove, the easier it is to replicate an environment and create at least an active/passive setup.

Item #9 - Start with availability patterns

If you are going down the path of making your application more resilient, then when building new services you should consider the following patterns, which help you build resiliency in from the start. These patterns include:

  • Health Endpoint Monitoring - implement functional checks in your application that external tools can call to verify it is healthy.
  • Queue-Based Load Leveling - use a queue that acts as a buffer, an abstraction layer between applications, so that incoming requests are handled in a more resilient manner.
  • Throttling - this pattern helps manage resource consumption so that you can meet system demand while keeping consumption under control.
  • Circuit Breaker - in my experience this pattern is extremely valuable. Your services should be smart enough to use incremental retries and to back off when a downstream service is struggling.
  • Bulkhead - this pattern uses isolation and separation of concerns to ensure that one service going down does not take the whole application down with it.
  • Compensating Transaction - very important if you use any kind of partitioning or bulkheads, or have separated concerns; it rolls the affected work back to its original state.
  • Retry - the most basic pattern to implement, and essential for building tolerance of transient faults (a combined retry and circuit-breaker sketch follows this list).
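
Here is a minimal sketch combining the last two patterns; the thresholds, delays, and function names are illustrative assumptions rather than prescriptions from the article:

```python
import time


class CircuitBreaker:
    """Stops calling a failing downstream service until a cooldown has passed."""

    def __init__(self, failure_threshold: int = 3, reset_after_seconds: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_after_seconds = reset_after_seconds
        self.failure_count = 0
        self.opened_at = None  # timestamp when the circuit opened

    def call(self, func, *args, **kwargs):
        # While open, fail fast instead of piling load onto a struggling dependency.
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after_seconds:
                raise RuntimeError("circuit open: downstream service unavailable")
            self.opened_at = None  # half-open: allow a trial call through
        try:
            result = func(*args, **kwargs)
            self.failure_count = 0  # success closes the circuit
            return result
        except Exception:
            self.failure_count += 1
            if self.failure_count >= self.failure_threshold:
                self.opened_at = time.monotonic()
            raise


def call_with_retry(breaker: CircuitBreaker, func, attempts: int = 3, base_delay: float = 0.5):
    """Incremental (exponential) retry for transient faults, routed through the breaker."""
    for attempt in range(attempts):
        try:
            return breaker.call(func)
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # back off before the next attempt
```

Retries absorb transient faults, while the breaker keeps those retries from hammering a downstream service that is already in trouble; in a production implementation the half-open state would typically allow only a single probe call.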

Item #10 - Remember that this is an evolving process

As described throughout this article, the point is that if you want to take more advantage of the cloud and thereby increase the resiliency of your application, the best advice I can offer is to remember that this is an iterative process, and to keep looking for opportunities to update your application and improve its resiliency.

For example, suppose I have an API that sends notifications. When I need to make updates to it, maybe I can implement a queue and logging, and make a few changes to break that service out as a microservice to improve resiliency. As you keep doing this, you will find your application's position improving piece by piece.
