One article to help you achieve a flexible K8S infrastructure!

This article is reprinted from Rancher Labs.

About the Author

VIGNESH TV, founder, CEO and CTO of Timecampus.

Kubernetes is currently the most popular open source container orchestration platform and has become the first choice for many enterprises building their infrastructure. In this article, we will explore the best way to build infrastructure for your use case, and the decisions you may have to make given your constraints.
 

Architecture design

Your architecture should be designed largely around your use case, so you need to be very careful during the design process to ensure the infrastructure can support it; you can also seek help from an external professional team when necessary. It is very important to get the direction right at the beginning of architecture design, but this does not mean mistakes will not occur. With new technologies and research emerging every day, change has become the norm, and your architectural thinking may become outdated.

This is why I strongly recommend that you adopt the principle of Architecting for Change and make your architecture modular, so that you have the flexibility to change it internally when needed in the future.

Let us see how to achieve this goal, considering a system architecture based on the client-server model.

Entry point: DNS

In any typical infrastructure (cloud-native or not), a request must first be resolved by a DNS server, which returns the IP address of the server. How you set up your DNS should be based on the availability you need. If you need higher availability, you may want to distribute your servers across multiple regions or cloud providers, with the specific implementation depending on the level of availability you want to achieve.
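For example, if you run ExternalDNS in your cluster, a single annotation on a Service is enough to have the corresponding DNS record created for you. The following is a minimal sketch; the hostname, names, and ports are placeholders, not values from this article:

```yaml
# A minimal sketch, assuming ExternalDNS is deployed in the cluster.
# The hostname, service name, and ports are placeholders.
apiVersion: v1
kind: Service
metadata:
  name: my-app
  annotations:
    external-dns.alpha.kubernetes.io/hostname: app.example.com
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
```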
 
Content Delivery Network (CDN)
 
In some cases, you may need to serve users with as little latency as possible while also reducing the load on your servers. This is where a content delivery network (CDN) plays an important role.
Does the client often request a set of static assets from the server? Do you want to speed up content delivery to users while reducing server load? In that case, using an edge CDN to serve those static assets may actually help reduce both user latency and server load.

Is all your content dynamic? Can you serve users slightly stale content to reduce complexity? Or does your application receive very little traffic? In these cases, using a CDN may not make much sense, and you can send all traffic directly to a global load balancer. Note, however, that a CDN does have the advantage of distributing traffic, which is very helpful when your servers come under DDoS attacks.
 
CDN providers include Cloudflare CDN, Fastly, Akamai CDN, StackPath, and your cloud provider may also offer a CDN service, such as Google Cloud Platform's Cloud CDN, AWS' CloudFront, Microsoft Azure's Azure CDN, etc.

Load Balancer

If a request cannot be served by your CDN, it is next sent to your load balancer. This can be a regional IP or a global anycast IP. In some cases, you can also use load balancers to manage internal traffic.

In addition to routing and proxying traffic to the appropriate backend services, load balancers can also take on responsibilities such as SSL termination, integration with the CDN, and even managing certain aspects of network traffic.

Although hardware load balancers exist, software load balancers offer greater flexibility, lower cost, and elastic scalability.

Similar to CDNs, your cloud provider should also be able to provide a load balancer for you (such as GCP's GLB, AWS's ELB, Azure's ALB, etc.). More interestingly, you can provision these load balancers directly from Kubernetes. For example, creating an Ingress in GKE also creates a GLB on the backend to receive traffic, and other features such as CDN and SSL redirection can be set up by configuring your Ingress. Visit the following link for details:

https://cloud.google.com/kubernetes-engine/docs/how-to/ingress-features
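As an illustration, a minimal GKE Ingress might look like the sketch below; the host and service names are placeholders, and creating such an object is what triggers GKE to provision a GLB behind the scenes:

```yaml
# A minimal sketch of a GKE Ingress. Host and service names are placeholders.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
  annotations:
    kubernetes.io/ingress.class: "gce"  # ask GKE to provision an external GLB
spec:
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-service  # an existing Service, assumed here
                port:
                  number: 80
```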

Although you will always start small, load balancers allow you to gradually scale out to architectures like the one below:
[Figure: a load-balanced architecture at scale]

Network and security architecture

 
The next thing to pay attention to is the network. If you want to improve security, you may need a private cluster. There, you can regulate inbound and outbound traffic, hide IP addresses behind NATs, isolate networks with multiple subnets across multiple VPCs, and so on.
 
How you set up your network usually depends on the degree of flexibility you are looking for and how you intend to achieve it. Setting up the right network is about reducing the attack surface as much as possible while still allowing normal operations. Protecting your infrastructure with the right network usually involves firewalls with the right rules and restrictions to limit the ingress and egress of traffic to and from the various backend services. In many cases, you can protect these private clusters by setting up a bastion host (also known as a jump host) and performing all operations in the cluster through a tunnel, since the bastion is the only thing you need to expose to the public network; it is usually set up on the same network as the cluster. Some cloud providers also offer custom solutions for zero-trust security. For example, GCP provides its users with Identity-Aware Proxy (IAP), which can be used in place of typical implementations.

 
After everything is done, the next step is to set up networking within the cluster itself, according to your use case.
This involves the following tasks:
 
Setting up service discovery within the cluster (which can be handled by CoreDNS)

If necessary, set up a service mesh (such as Linkerd, Istio, Consul, etc.)

Set up the Ingress controller and API gateway (for example: Nginx, Ambassador, Kong, Gloo, etc.)

Set up a network plug-in using CNI to facilitate networking within the cluster

Set network policies to regulate the communication between services, and expose services using various service types as needed (see the sketch after this list)

Set up inter-service communication between services using protocols and tools such as gRPC, Thrift, or HTTP

Set up A/B testing, which can be easier to implement if you use a service mesh like Istio or Linkerd
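To illustrate the network-policy item above, here is a minimal sketch (namespace, labels, and port are placeholders) that allows backend pods to receive traffic only from frontend pods:

```yaml
# A minimal sketch of a NetworkPolicy: backend pods accept ingress only
# from frontend pods on TCP 8080. Labels and namespace are placeholders.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-allow-frontend
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
```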

If you want to see some example implementations, I suggest you take a look at this repo (https://github.com/terraform-google-modules/cloud-foundation-fabric), which helps users set up all these different network models in GCP, including hub-and-spoke via VPN, internal DNS and Google Private Access, a shared VPC supporting GKE, and so on, all using Terraform.
The interesting thing about networking in the cloud is that it is not limited to cloud providers within your region, but can span multiple regions as needed. This is where projects like Kubefed or Crossplane can help.

If you want to explore more best practices for setting up VPCs, subnets, and the overall network, I suggest you visit the following webpage; the same concepts apply to any cloud provider you use:

https://cloud.google.com/solutions/best-practices-vpc-design

Kubernetes

 
If you are using a managed cluster such as GKE, EKS, or AKS, Kubernetes itself is managed automatically for you, which reduces operational complexity.
 
If you manage Kubernetes yourself, you need to deal with many things, such as backing up and encrypting the etcd store, establishing networking between the nodes in the cluster, regularly patching your nodes with the latest operating system patches, and managing cluster upgrades to keep pace with upstream Kubernetes releases. Because of this, self-managing Kubernetes is only recommended if you have a dedicated team to maintain these things.

Site Reliability Engineering (SRE)

When you maintain a complex infrastructure, it is very important to have a proper observability stack, so that you can detect errors before your users notice them, predict possible changes, identify anomalies, and have the ability to drill down into where the problem lies.
 
Now, this requires you to have agents that expose metrics for specific tools or applications to collect and analyze (following either a pull or push mechanism). And if you are using a service mesh with sidecars, they often come with their own metrics without requiring custom configuration.
 
In any scenario, you can use a tool such as Prometheus as a time-series database to collect all of your metrics, together with something like OpenTelemetry to expose metrics from your applications and the various tools via built-in exporters. A tool such as Alertmanager can send notifications and alerts to multiple channels, and Grafana provides dashboards that give users complete visibility across the entire infrastructure.
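As a rough illustration, a minimal Prometheus scrape configuration for such a setup might look like the sketch below, which discovers pods via the Kubernetes API and keeps only those that opt in through the conventional prometheus.io/scrape annotation (the interval and the opt-in convention are illustrative choices, not from this article):

```yaml
# A minimal sketch of a prometheus.yml. The scrape interval and the
# annotation-based opt-in convention are illustrative choices.
global:
  scrape_interval: 30s
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod          # discover scrape targets from the Kubernetes API
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep       # scrape only pods annotated prometheus.io/scrape=true
        regex: "true"
```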
 
In summary, this is Prometheus's solution for observability:
[Figure: Prometheus architecture. Source: https://prometheus.io/docs/introduction/overview/]

With such a complex system, you also need a log aggregation system so that all the logs flow into one place for easier debugging. Most companies tend to use an ELK or EFK stack, with Logstash or Fluentd doing the log aggregation and filtering for you according to your constraints. But there are also new players in the logging space, such as Loki and Promtail.
 
The following figure illustrates how a log aggregation system like Fluentd can simplify your architecture:
[Figure: Fluentd log aggregation architecture. Source: https://www.fluentd.org/architecture]
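If you go the Loki route instead, a minimal Promtail configuration could look like the following sketch; the Loki URL, port, and label choices are placeholders:

```yaml
# A minimal sketch of a Promtail config that discovers pod logs and
# pushes them to Loki. The Loki URL is a placeholder.
server:
  http_listen_port: 9080
positions:
  filename: /tmp/positions.yaml   # where Promtail remembers read offsets
clients:
  - url: http://loki:3100/loki/api/v1/push
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod         # attach the pod name as a log label
```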

But what if you want to trace requests that span multiple microservices and tools? This is where distributed tracing comes into play, especially given the complexity of microservices. Tools like Zipkin and Jaeger have been the pioneers in this space, and the newest entrant is Tempo.
 
Although log aggregation gives you information from various sources, it may not give you the context of a request. This is where tracing really helps. But keep in mind that adding tracing to your stack adds overhead to your requests, because the context must be propagated between services along with the request.
 
The figure below is a typical distributed tracing architecture:
[Figure: Jaeger distributed tracing architecture. Source: https://www.jaegertracing.io/docs/1.21/architecture/]

However, the reliability of a site does not stop at monitoring, visualization, and alerting. You must be prepared to handle failures in any part of the system, performing regular backups and failovers so that data loss is at least minimized. You can do this with tools like Velero.
 
Velero helps you maintain regular backups of the various components in your cluster, including your workloads, storage, and more, by leveraging the same Kubernetes constructs you already use. Velero's architecture is as follows:
[Figure: Velero architecture]
As you can see, there is a backup controller that regularly backs up objects and pushes them to a destination of your choice, at a frequency based on the schedule you set. Because almost all objects are backed up, this can be used for failover and migration as well.
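For example, a Velero Schedule like the sketch below (the namespace, cron expression, and retention period are illustrative, not from this article) takes a daily backup of a single namespace and keeps it for 30 days:

```yaml
# A minimal sketch of a Velero Schedule. Namespace, cron expression,
# and TTL are illustrative values.
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-backup
  namespace: velero
spec:
  schedule: "0 2 * * *"       # run every day at 02:00
  template:
    includedNamespaces:
      - production            # back up only this namespace
    ttl: 720h0m0s             # keep each backup for 30 days
```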
 

Storage

There are many different storage provisioners and file systems available, and they can vary greatly between cloud providers. This is where standards like the Container Storage Interface (CSI) come in, allowing most volume plug-ins to live outside the Kubernetes core, making them easy to maintain and evolve without becoming a core bottleneck.
 
The following figure shows the CSI architecture, which usually supports various volume plug-ins:
[Figure: CSI architecture. Source: https://kubernetes.io/blog/2018/08/02/dynamically-expand-volume-with-csi-and-kubernetes/]
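In practice this usually surfaces as a StorageClass that names a CSI driver. The sketch below uses GCP's Persistent Disk CSI driver as one example; swap in whichever driver your provider ships, and note that names and sizes are placeholders:

```yaml
# A minimal sketch: a StorageClass backed by a CSI driver, plus a claim
# that uses it. The GCE PD driver here is just one example provisioner.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 10Gi
```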

 
What about the clustering and scaling challenges that come with distributed storage? A file system like Ceph has proven itself here, but considering that Ceph was not built with Kubernetes in mind and is somewhat difficult to deploy and manage, projects such as Rook are worth considering.
 
Although Rook is not coupled to Ceph and also supports other file systems such as EdgeFS, NFS, etc., Rook and Ceph CSI are like a match made in heaven. The architecture of Rook with Ceph is as follows:
[Figure: Rook and Ceph architecture. Source: https://rook.io/docs/rook/v1.5/ceph-storage.html]

As you can see, Rook takes on the installation, configuration, and management of Ceph within the Kubernetes cluster, and automatically provisions the underlying storage according to the user's preferences. All of this happens without exposing the application to any of the complexity.
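As a small taste of what that looks like, the sketch below (pool name and replica count are borrowed from common Rook examples, not from this article) asks Rook to create a replicated Ceph block pool:

```yaml
# A minimal sketch of a Rook-managed Ceph block pool with 3 replicas.
# Pool name and sizing are illustrative.
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph      # assumes the Rook operator runs here
spec:
  failureDomain: host       # spread replicas across different hosts
  replicated:
    size: 3
```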
 

Image registry

 
The image registry provides you with a user interface where you can manage user accounts, push and pull images, manage quotas, receive event notifications via webhooks, perform vulnerability scans, sign pushed images, and handle operations such as replicating images across multiple registries.
 
If you are using a cloud provider, they likely already provide an image registry as a service (such as GCR, ECR, ACR, etc.), which removes a lot of the complexity. If your cloud provider does not provide one, you can also choose a third-party registry such as Docker Hub, Quay, etc.
 
But what if you want to host your own image registry?
 
You may need to host one if you want to deploy a registry on-premises within your enterprise, want more control over the registry itself, or want to reduce the costs associated with operations like vulnerability scanning.
 
If this is the case, then choosing a private image registry like Harbor will help. The architecture of Harbor is as follows:
[Figure: Harbor architecture. Source: https://goharbor.io/docs/1.10/install-config/harbor-ha-helm/]
 
Harbor is an OCI-compliant image registry composed of various open source components, including the Docker Registry V2, the Harbor UI, Clair, and Notary.
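If you deploy Harbor via its official Helm chart, a hedged sketch of the values might look like this; the host, URL, and password are placeholders, and you should consult the chart's documentation for the full option set:

```yaml
# A minimal sketch of Helm values for the goharbor/harbor-helm chart.
# Host, URL, and password are placeholders.
expose:
  type: ingress
  ingress:
    hosts:
      core: harbor.example.com
externalURL: https://harbor.example.com
harborAdminPassword: "change-me"   # set a real secret in practice
```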
 

CI/CD architecture

 
Kubernetes can host workloads at any scale, but this also calls for a standard way of deploying applications and a streamlined CI/CD workflow. The following figure shows a typical CI/CD pipeline:
[Figure: a typical CI/CD pipeline]
Third-party services such as Travis CI, Circle CI, GitLab CI, or GitHub Actions come with their own CI runners; you only need to define the steps of the pipeline you want to build. These typically include building the image, scanning it for possible vulnerabilities, running tests, and pushing it to the image registry, and in some cases spinning up a preview environment for approvals.
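Taking GitHub Actions as one example, a pipeline covering those steps might be sketched as below; the registry and image names are placeholders, registry login is omitted, and the scan step assumes the Trivy CLI is available on the runner:

```yaml
# A minimal sketch of a GitHub Actions pipeline: build, scan, push.
# Registry and image names are placeholders; registry login is omitted,
# and the scan step assumes the Trivy CLI is installed on the runner.
name: build-and-push
on:
  push:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build image
        run: docker build -t registry.example.com/my-app:${{ github.sha }} .
      - name: Scan image for vulnerabilities
        run: trivy image registry.example.com/my-app:${{ github.sha }}
      - name: Push image
        run: docker push registry.example.com/my-app:${{ github.sha }}
```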
 
Now, if you manage your own CI runners, the steps remain largely the same, but you need to configure the runners to be set up inside or outside the cluster with the appropriate permissions to push assets to the image registry.
 

Summary

 
We have walked through the architecture of a cloud-native infrastructure based on Kubernetes. As we have seen, various tools solve different infrastructure problems. They are like Lego bricks: each focuses on a specific problem and abstracts away a lot of complexity for you.

This allows you to get started with Kubernetes incrementally, using only the tools you need from the entire stack according to your use case.

Original source: blog.51cto.com/12462495/2664678