How to Improve Kubernetes Resilience with Advanced Traffic Management

Author: Jenn Gile - Manager of Product Marketing, F5 NGINX

Original article: How to Improve Kubernetes Resilience with Advanced Traffic Management

Reprinted from: the official NGINX Chinese website


NGINX's only official Chinese community, at nginx.org.cn

Editor's note: This post is part of a ten-part series:

  1. Reduce Complexity with Production-Grade Kubernetes

  2. How to Improve Kubernetes Resilience with Advanced Traffic Management (this post)

  3. How to Improve Visibility in Your Kubernetes Environment

  4. Six Ways to Secure Kubernetes Using Traffic Management Tools

  5. A Guide to Choosing an Ingress Controller, Part 1: Identify Your Requirements

  6. A Guide to Choosing an Ingress Controller, Part 2: Risks and Future-Proofing

  7. A Guide to Choosing an Ingress Controller, Part 3: Open Source vs. Default vs. Commercial Capabilities

  8. A Guide to Choosing an Ingress Controller, Part 4: NGINX Ingress Controller Options

  9. How to Choose a Service Mesh

  10. Performance Testing NGINX Ingress Controllers in a Dynamic Kubernetes Cloud Environment

You can also download the complete set of posts as a free ebook, Kubernetes: From Test to Production.

How can you tell whether a company is successfully using modern app development technologies? Simple: look at whether its customers are complaining on social media. They might be complaining that they can't stream the latest release, get into online banking, or check out before the shopping cart times out.

[Figure: a customer's tweet - "I'm trying to watch a movie I rented, but I keep getting a 'video playback error' message, and the site's help page won't open either. My app is up to date, and I've tried signing out and back in and restarting my smart TV. Any solutions?"]

Even when customers aren't complaining publicly, that doesn't mean there isn't a problem. One of our customers, a large insurance company, told us that they lose customers when their homepage takes more than 3 seconds to load.

All of these user complaints about poor performance and outages point to a common culprit: a lack of resilience. The beauty of microservices technologies, including containers and Kubernetes, is that they can improve resilience and significantly improve the customer experience. How? It all comes down to the architecture.

Microservices architectures differ from monolithic architectures in a fundamental way. An analogy: in an old-fashioned string of holiday lights, when one bulb fails, the whole string goes dark, and if you can't replace the bulb, you have to throw away the entire string. A monolithic app is like that string of lights: its components are tightly coupled, so when one fails, everything fails.

Both the lighting industry and the software industry discovered this pain point. When a bulb fails on a modern string of lights, the rest keep shining, and likewise, when individual service instances fail in a well-designed microservices app, the app as a whole keeps running.

Traffic Management in Kubernetes

Containers are a popular choice for microservices architectures because they are lightweight, portable, and easy to scale, making them well suited to building apps from small, independent components. Kubernetes has become the de facto standard for container orchestration, but it raises many challenges on the way to production. One important factor in making Kubernetes apps more controllable and resilient is a mature traffic management strategy that lets you control services rather than packets and adapt traffic management rules dynamically or with the Kubernetes API. Traffic management matters in any architecture, but for high-performance apps two traffic management tools are essential: traffic control and traffic splitting.

Traffic Control

Traffic control (sometimes called traffic routing or traffic shaping) refers to the act of controlling where traffic goes and how it gets there. It's a necessity when running Kubernetes in production because it protects your infrastructure and apps from attacks and traffic spikes. Two techniques to incorporate into your app development cycle are rate limiting and circuit breaking.

  • Use case: I want to protect my services from receiving too many requests
    Solution: rate limiting

    Whether malicious (for example, password brute force and DDoS attacks) or benign (for example, customers flocking to a sale), a high volume of HTTP requests can overwhelm your services and crash your apps. Rate limiting restricts the number of requests a user can make in a given time period. The requests can be as simple as a GET request for the homepage of a website or a POST request on a login page. Under a DDoS attack, for example, you can use rate limiting to cap the incoming request rate at a value typical of real users.

  • Use case: I want to avoid cascading failures
    Solution: circuit breaking

    When a service is unavailable or experiencing high latency, it can take a long time for incoming requests to time out and for clients to receive an error response. Long timeouts can cause cascading failures, in which the outage of one service leads to timeouts in other services and, ultimately, the failure of the app as a whole.

    A circuit breaker monitors a service for failures and prevents cascading failures. When the number of failed requests to a service exceeds a preset threshold, the circuit breaker trips and starts returning an error response to clients as soon as requests arrive, effectively throttling the service.

    The circuit breaker continues to intercept and reject requests for a defined amount of time before allowing a limited number of requests to pass through as a test. If those requests succeed, the circuit breaker stops throttling traffic. If they fail, the clock resets and the circuit breaker again rejects requests for the defined time.
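To make the rate-limiting technique concrete, here is a minimal sketch using the NGINX Ingress Controller Policy resource. The resource name, rate, and zone size are illustrative placeholders; check your NGINX Ingress Controller release's documentation for the exact schema.

```yaml
# Hypothetical sketch: limit each client IP to 10 requests per second.
apiVersion: k8s.nginx.org/v1
kind: Policy
metadata:
  name: rate-limit-policy
spec:
  rateLimit:
    rate: 10r/s                 # requests allowed per key per second
    key: ${binary_remote_addr}  # rate-limit key: the client IP address
    zoneSize: 10M               # shared-memory zone for tracking clients
```

The policy is then referenced from a VirtualServer's `policies` field; requests above the limit receive an error response.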

Traffic Splitting

Traffic splitting (sometimes called traffic testing) is a subcategory of traffic control. It refers to controlling the proportion of incoming traffic directed to different versions of a backend app running simultaneously in an environment (usually the current production version and an updated version). It's an essential part of the app development cycle because it lets teams test the functionality and stability of new features and versions without negatively impacting customers. Useful deployment scenarios include debug routing, canary deployment, A/B testing, and blue-green deployment. (There is a fair amount of inconsistency in how the industry uses these four terms; here we use them as we understand their definitions.)

  • Use case: I'm getting ready to test a new version in production
    Solution: debug routing

    Let's say you have a banking app and you want to add a credit scoring feature. Before testing with customers, you probably want to see how it performs in production. Debug routing lets you deploy the new feature publicly while "hiding" it from actual users, allowing only certain users to access it based on Layer 7 attributes such as a session cookie, session ID, or group ID. For example, you might permit access only to users who have an admin session cookie - their requests are routed to the new version with credit scoring, while everyone else continues on the stable version.
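A debug-routing rule like this can be sketched with the NGINX Ingress Controller VirtualServer resource; the hostname, service names, and cookie name below are all hypothetical.

```yaml
apiVersion: k8s.nginx.org/v1
kind: VirtualServer
metadata:
  name: bank-app
spec:
  host: bank.example.com
  upstreams:
  - name: stable            # current production version
    service: bank-svc
    port: 80
  - name: credit-scoring    # new version with the hidden feature
    service: bank-credit-svc
    port: 80
  routes:
  - path: /
    matches:
    - conditions:           # Layer 7 condition: a session cookie value
      - cookie: session
        value: tester
      action:
        pass: credit-scoring
    action:
      pass: stable          # everyone else stays on the stable version
```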

  • Use case: I need to make sure the new version is stable
    Solution: canary deployment (also known as "grayscale deployment")

    The concept of a canary deployment comes from a historic mining practice: miners took a caged canary into the mine and evacuated if the canary was poisoned by gas, making the canary a gas alarm. In the world of apps, no birds are harmed. Canary deployments provide a safe and agile way to test the stability of a new feature or version. A typical canary deployment starts with the large majority of users (say 99%) on the stable version and moves a small group (the remaining 1%) to the new version. If the new version fails, for example by crashing or returning errors to clients, you can immediately move the test group back to the stable version. If it succeeds, you can migrate users from the stable version to the new one, either all at once or (more commonly) in a gradual, controlled manner.

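A 99/1 canary split like the one just described can be sketched with the `splits` field of the NGINX Ingress Controller VirtualServer resource; hostnames and service names are placeholders.

```yaml
apiVersion: k8s.nginx.org/v1
kind: VirtualServer
metadata:
  name: app-canary
spec:
  host: app.example.com
  upstreams:
  - name: stable
    service: app-stable-svc
    port: 80
  - name: canary
    service: app-canary-svc
    port: 80
  routes:
  - path: /
    splits:
    - weight: 99        # the vast majority stays on the stable version
      action:
        pass: stable
    - weight: 1         # 1% of traffic tries the new version
      action:
        pass: canary
```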

  • Use case: I need to find out whether customers like a new version better than the current one
    Solution: A/B testing

    Once you've confirmed that a new feature works correctly in production, you may want to measure customer feedback on it, tracking key performance indicators (KPIs) such as number of clicks, repeat users, or explicit ratings. Many industries use A/B testing to measure and compare user behavior, with the goal of determining the relative popularity of different product or app versions among customer groups. In a typical A/B test, 50% of users get version A (the current app version) and the remaining 50% get version B (the version with the new, stable feature). The winner is the version with the better overall KPIs.


  • Use case: I want to move users to a new version without downtime
    Solution: blue-green deployment

    Now suppose your banking app is due for a major version change. Congratulations! In the past, a version upgrade usually meant downtime for users, because you had to take down the old version in production before pushing the new one. In today's competitive environment, however, downtime for upgrades is unacceptable to most users. Blue-green deployments greatly reduce, or even eliminate, upgrade downtime: you keep the old version (blue) running in production while deploying the new version (green) alongside it in the same environment.

    Most enterprises don't want to move 100% of users from blue to green at once. After all, what if the green version fails?! The solution is to use a canary deployment to move users in whatever increments best fit your risk-mitigation strategy. If the new version turns out to be a disaster, you can easily move everyone back to the stable version in just a couple of keystrokes.

How NGINX can help you

Most Ingress controllers and service meshes can help you implement advanced traffic control and splitting. Which technology to use depends on your app architecture and use cases. For example, an Ingress controller is appropriate in these three scenarios:

  • Your apps have just one endpoint, as with a simple app or a monolith you have migrated to Kubernetes.

  • There is no service-to-service communication in your cluster.

  • There is service-to-service communication in your cluster, but you aren't yet using a service mesh.

If your deployment is complex enough to need a service mesh, a common use case is splitting traffic between services to test or upgrade individual microservices. For example, you might want to do a canary deployment between two different versions of the geolocation microservice API used by your mobile front end.

However, setting up traffic splitting with some Ingress controllers and service meshes can be time-consuming and error-prone, for several reasons:

  • Ingress controllers and service meshes from various vendors implement Kubernetes features in different ways.

  • Kubernetes isn't really designed to manage or understand Layer 7 traffic.

  • Some Ingress controllers and service meshes don't support sophisticated traffic management.

With NGINX Ingress Controller and NGINX Service Mesh, you can easily configure robust traffic routing and splitting policies in seconds.

Simplify Configuration with NGINX Ingress Resources and the SMI Specification

The following NGINX features simplify the configuration process:

  • NGINX Ingress resources for NGINX Ingress Controller - While the standard Kubernetes Ingress resource makes it simple to configure SSL/TLS termination, HTTP load balancing, and Layer 7 routing, it doesn't include the kind of customization required for circuit breaking, A/B testing, and blue-green deployment. As a result, non-NGINX users have to resort to annotations, ConfigMaps, and custom templates, all of which lack fine-grained control and can be insecure, error-prone, and difficult to use.

    NGINX Ingress Controller comes with NGINX Ingress resources as an alternative to the standard Ingress resource (which it also supports). They provide a native, type-safe, and indented configuration style that simplifies implementation of Ingress load balancing. NGINX Ingress resources have an added benefit for existing NGINX users: they make it easier to reuse load-balancing configurations from non-Kubernetes environments, so all your NGINX load balancers can use the same configuration.

  • NGINX Service Mesh with SMI - NGINX Service Mesh implements the Service Mesh Interface (SMI), a specification that defines a standard interface for service meshes on Kubernetes, with typed resources such as TrafficSplit, TrafficTarget, and HTTPRouteGroup. Using standard Kubernetes configuration methods, NGINX Service Mesh and the NGINX SMI extensions make traffic-splitting policies, such as canary deployments, simple to deploy with minimal interruption to production traffic. For example, here's how to define a canary deployment with NGINX Service Mesh:

apiVersion: split.smi-spec.io/v1alpha2
kind: TrafficSplit
metadata:
  name: target-ts
spec:
  service: target-svc
  backends:
  - service: target-v1-0
    weight: 90
  - service: target-v2-0
    weight: 10

Our tutorial, Deployments Using Traffic Splitting, walks through sample deployment patterns that leverage traffic splitting, including canary and blue-green deployments.

Use Advanced Customization for More Sophisticated Traffic Control and Splitting

The following NGINX features simplify traffic control and splitting in more advanced ways:

  • Key-value store for canary deployments - When you run an A/B test or a blue-green deployment, you might want to transition traffic to the new version in specific increments, such as 0%, 5%, 10%, 25%, 50%, and 100%. With most tools this is a very manual process, because you have to edit the percentage and reload the configuration file for each increment. Given that workload, you might be tempted to jump straight from 5% to 100%. With the NGINX Plus-based version of NGINX Ingress Controller, however, you can change the percentages using a key-value store, with no need to reload the configuration file.
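As a rough illustration of the underlying idea (not necessarily how NGINX Ingress Controller implements it internally), the NGINX Plus key-value store lets a variable's value be changed at runtime through the NGINX Plus API, so a percentage change takes effect without a configuration reload. The zone, key, and variable names below are made up:

```nginx
# Hypothetical sketch: $split_level holds the current canary percentage,
# read from the "split" key-value zone. The zone can be updated at
# runtime through the NGINX Plus API without a reload, e.g.:
#   curl -X POST -d '{"app.example.com":"25"}' \
#        http://localhost:8080/api/8/http/keyvals/split
keyval_zone zone=split:64k;
keyval $host $split_level zone=split;
```

In a full configuration, `$split_level` would then feed the decision (for example via split_clients-style buckets) about which app version each request reaches.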

  • Circuit breaking with NGINX Ingress Controller - Advanced circuit breakers save time and improve resilience by detecting failures and failing over faster, and can even activate custom, correctly formatted error pages for unhealthy upstream services. For example, a circuit breaker for a search service might return a correctly formatted but empty set of search results. To achieve this level of sophistication, the NGINX Plus-based version of NGINX Ingress Controller uses active health checks that proactively monitor the health of your TCP and UDP upstream servers. Because the monitoring happens in real time, your clients are less likely to encounter apps that return errors. The circuit breaker itself is configured with three fields:

  • errors - the number of errors before the circuit trips

  • timeoutSeconds - the window of time for the errors to occur before the circuit trips, and the amount of time to wait before the circuit closes again

  • fallback - the name and port of the Kubernetes service to which traffic is rerouted after the circuit has tripped

errors and timeoutSeconds are standard circuit-breaker functions, while fallback further improves resilience by letting you define a backup server. If your backup server's responses are distinctive, they can serve as an early indicator of trouble in your cluster, letting you begin troubleshooting right away.
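Putting the three fields together, a circuit-breaker definition might look like the following sketch, modeled on the NGINX Service Mesh CircuitBreaker custom resource. The API group/version, names, and values are illustrative assumptions; check your release's documentation for the exact schema.

```yaml
apiVersion: specs.smi.nginx.com/v1alpha1  # assumed API group/version
kind: CircuitBreaker
metadata:
  name: target-cb
  namespace: default
spec:
  destination:              # the service being protected
    kind: Service
    name: target-svc
    namespace: default
  errors: 3                 # trip after 3 errors...
  timeoutSeconds: 30        # ...within a 30-second window; also how long
                            # to wait before the circuit closes again
  fallback:                 # where traffic is rerouted while tripped
    service: default/target-fallback-svc
    port: 80
```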

Interpret Traffic-Splitting Results with Dashboards

Now that you've implemented traffic splitting... what's next? It's time to analyze the results. This can be the hardest part, because many enterprises are missing key insights into how their Kubernetes traffic and apps are performing. NGINX makes it easier with the NGINX Plus dashboard and pre-built Grafana dashboards that visualize the metrics exposed by the Prometheus exporter. To learn more about improving visibility and gaining insight, read our blog post How to Improve Visibility in Your Kubernetes Environment.

NGINX helps you easily control microservices

The NGINX Plus-based version of NGINX Ingress Controller is available for a 30-day free trial that includes NGINX App Protect to secure your containerized apps.

To try the NGINX open source-based version of NGINX Ingress Controller, you can obtain the release source code or download a pre-built container from DockerHub.

You can download the always-free NGINX Service Mesh at f5.com.


