Kubernetes Container Cloud Platform: Construction in Practice

[51CTO.com original article] Kubernetes is a container orchestration engine open-sourced by Google that supports automated deployment and the management of large-scale, scalable containerized applications. With the rapid rise of cloud-native technology, Kubernetes has become the de facto standard for application container platforms; it is favored by more and more enterprises and is being used ever more widely in production.

Our container platform effort began in early 2016 and has gone through three stages: preliminary research and exploration, system construction, and platform rollout.

 

Here I will share the journey of building our container cloud platform from the perspectives of Kubernetes networking, storage, cluster management, and monitoring and operations, in the hope of offering some food for thought and inspiration.

1. Kubernetes networking

Container networking has by now settled into a two-camp landscape: Docker's CNM on one side and CNI, led by Google, CoreOS, and Kubernetes, on the other. To be clear, CNM and CNI are not network implementations; they are network specifications and network frameworks. From a development point of view they are just a set of interfaces: whether you use Flannel or Calico underneath is not their concern. What CNM and CNI care about is the problem of network management.

Our survey of network requirements found that business units focus on the following points: 1. the container network must be connected to the physical network; 2. the faster the better; 3. as few changes as possible; 4. as few risk points as possible.

Container network solutions can be classified along three dimensions: protocol stack layer, traversal form, and isolation method.

 

Protocol stack layer: Layer 2 is easy to understand and is common in traditional server rooms and virtualized scenarios; it relies on bridging with ARP + MAC learning, and its biggest flaw is broadcast, because Layer 2 broadcast limits how many nodes the network can scale to. Layer 3 (pure routed forwarding) stacks are generally based on BGP and autonomously learn the routing state of the entire data center. Their biggest advantage is IP reachability: as long as the network is reachable over IP, traffic can traverse it. This is clearly a big advantage at scale and gives good scalability in node count.

In actual deployments, however, most corporate networks are tightly controlled. For example, some enterprises do not let developers run BGP for security reasons, or the corporate network itself is not BGP-based; in such cases you are constrained. A Layer 2 plus Layer 3 stack solves both the scaling problem of pure Layer 2 and the restrictions of pure Layer 3, and in cloud VPC scenarios in particular it can take advantage of the VPC's cross-node Layer 3 forwarding. So among the Kubernetes network solutions actually deployed today, Layer 2 plus Layer 3 is also quite common.

Traversal form:

This depends on the actual deployment environment. There are two traversal forms: Underlay and Overlay.

Underlay: In a well-controlled network we generally use Underlay. A plain way to put it: whether the hosts underneath are bare metal or virtual machines, as long as the entire network (physical plus virtual) is under your control and containers can be reached across it directly, that is Underlay.

Overlay: Overlay is more common in cloud scenarios. The VPC network here is controlled: when an IP or MAC falls outside the range the VPC manages, the VPC will not let that IP or MAC through. In that situation we can use Overlay mode.

Overlay networks virtualize the physical network and pool its resources, and they are the key to converging cloud and network. Overlay is usually used together with SDN, with the SDN controller serving as the control plane of the Overlay network; this makes it easier to integrate the network with the compute components and is an ideal choice for networks evolving toward cloud platform services.

Isolation method:

Isolation is usually done in one of two ways, VLAN or VXLAN:

VLAN: VLAN is widely used in server rooms, but it has a real limitation: the total number of tenants is capped. As is well known, the VLAN ID space allows only around 4,000 VLANs.

VXLAN: VXLAN is the more mainstream isolation method today, because it scales much further and, being carried over IP, traverses networks better.

We analyzed several common Kubernetes network components (Calico, Contiv, Flannel, OpenShift SDN, custom routing) along these three dimensions of protocol stack layer, traversal form, and isolation method, in both traditional data-center networks and cloud VPC networks, and used a wiring diagram to show how they relate.

 

[Figure: mapping of common Kubernetes network components (Calico, Contiv, Flannel, OpenShift SDN, custom routing) onto traditional data-center and cloud VPC network scenarios]
First, whether in a traditional data-center network or a cloud VPC network, the Overlay approach is universally applicable; it is probably used more in cloud scenarios because of its good traversal properties.

In the figure above, the solid red lines point to the traditional data-center network, which deserves special mention. The Underlay + Layer 3 solution is very popular in traditional data-center networks, and its performance is quite good, so it sees wide use. The Underlay + Layer 2 + Layer 3 solution is also fairly mainstream in cloud VPC scenarios (especially public cloud), relying on the VPC's custom routes to complete forwarding.

The dashed green lines point to the cloud VPC network. Underlay + Layer 3 can also be used in a cloud VPC scenario, but only in a restricted way. "Restricted" means exactly that: you can use it, but not every provider lets you, because each cloud vendor defines the protection of its own network differently. Take Calico: its BGP setup is easy to run on AWS but is not allowed on Azure, because Azure's VPC does not let IPs outside its control pass through.

The solid yellow lines also point to the cloud VPC network. Overlay + Layer 2 or Layer 3 is fairly common in cloud scenarios; underneath the Overlay sits the controlled VPC network, which makes management more convenient.

Of course, the cloud VPC scenario also has some problems, as shown in the figure below.

 

[Figure: problems encountered in the cloud VPC scenario]
Network isolation between tenants

 

Kubernetes introduced the network policy mechanism in version 1.3; network policies let you define ingress and egress access rules between Pods.

Network policies are applied to groups of Pods identified by common labels, and those labels can then be used to mimic a traditionally segmented network: specific "segment" labels identify frontend and backend Pods, and policies control the traffic between those segments and even traffic from external sources. Not every network backend supports policies, though; Flannel, for example, does not. Many vendors have since strengthened their work in this area and there are many new solutions, which I will not list one by one.
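As a rough illustration of the segment-label idea, the sketch below uses the official Kubernetes Python client to create a NetworkPolicy that only allows Pods labeled app: frontend to reach Pods labeled app: backend on port 8080. The namespace, labels, and port are made-up placeholders, and the policy only takes effect on a CNI backend that enforces NetworkPolicy (Calico does, plain Flannel does not).

```python
from kubernetes import client, config

# Load credentials from ~/.kube/config; use config.load_incluster_config() inside a Pod.
config.load_kube_config()

# The dict mirrors a networking.k8s.io/v1 NetworkPolicy manifest; names and labels are examples.
policy = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "NetworkPolicy",
    "metadata": {"name": "backend-allow-frontend", "namespace": "demo"},
    "spec": {
        "podSelector": {"matchLabels": {"app": "backend"}},  # Pods the policy protects
        "policyTypes": ["Ingress"],
        "ingress": [{
            "from": [{"podSelector": {"matchLabels": {"app": "frontend"}}}],
            "ports": [{"protocol": "TCP", "port": 8080}],
        }],
    },
}

client.NetworkingV1Api().create_namespaced_network_policy(namespace="demo", body=policy)
```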

Managing Ingress at the cluster boundary

 

Ingress only appeared in Kubernetes 1.2. Container applications are exposed as Services by default, but a Service is reachable only inside the cluster; to serve clients outside the cluster, the Service has to be exposed through an Ingress.
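As an aside, here is a minimal sketch of exposing a Service this way with the Kubernetes Python client. It assumes a hypothetical Service named web-svc on port 80 in a namespace called demo, and it uses the networking.k8s.io/v1 Ingress API of current clusters (the article above refers to the much older 1.2-era version); an Ingress Controller must be running for the rule to take effect.

```python
from kubernetes import client, config

config.load_kube_config()

# Mirrors a networking.k8s.io/v1 Ingress manifest; host, service name, and port are placeholders.
ingress = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "Ingress",
    "metadata": {"name": "web-ingress", "namespace": "demo"},
    "spec": {
        "rules": [{
            "host": "web.example.com",
            "http": {"paths": [{
                "path": "/",
                "pathType": "Prefix",
                "backend": {"service": {"name": "web-svc", "port": {"number": 80}}},
            }]},
        }],
    },
}

client.NetworkingV1Api().create_namespaced_ingress(namespace="demo", body=ingress)
```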

The table below compares the common Ingress Controllers:

 

[Table: comparison of common Ingress Controllers]

2. Kubernetes storage

Kubernetes was originally used to manage stateless services, but as more and more applications move onto the platform, managing storage resources has become a very important capability.

Storage in Kubernetes is used mainly in the following areas:

reading basic service configuration files and managing passwords and keys; persisting service state and reading and writing data; and sharing data between different services or applications. Roughly the following scenarios are involved:

 

[Figure: common Kubernetes storage usage scenarios]

Kubernetes storage was designed in line with Kubernetes' consistent philosophy of a declarative architecture. To be compatible with as many storage platforms as possible, Kubernetes connects to different storage systems through in-tree plugins, so users can pick the plugins their business needs to provide storage to containers; it also supports user-defined plugins through FlexVolume and CSI. Compared with Docker Volumes, the supported storage features are richer and more diverse.

A breakdown of Kubernetes storage plugins:

1. in-tree plugins: the storage code is tightly integrated with Kubernetes, so the coupling is too strong.

2. FlexVolume: the storage plugin is installed on the host and requires root privileges on the host.

3. The CSI specification: decouples the storage code from Kubernetes completely (Kubernetes 1.10 and above, with CSI attacher version 0.2.0).

 


The CSI specification greatly simplifies plugin development, maintenance, and integration, and has very good prospects.

Kubernetes manages storage with two resources:

PersistentVolume (PV): a description of a piece of storage added by an administrator. It is a cluster-wide resource that records the storage type, size, access mode, and so on. Its lifecycle is independent of any Pod; for example, destroying a Pod that uses it has no effect on the PV.

PersistentVolumeClaim (PVC): a namespaced resource that describes a request for a PV. The request includes the desired size, access mode, and so on.

A PV can be seen as an available storage resource, and a PVC as a demand for storage; a PVC is automatically bound to a suitable PV according to the Pod's requirements. The relationship between PV and PVC follows the lifecycle shown in the figure below.

 

[Figure: the PV and PVC lifecycle]

PVs come in static and dynamic flavors. Static PVs are typically used for NFS, FC, and iSCSI, while dynamic PVs cover GlusterFS, Cinder, Ceph RBD, vSphere, ScaleIO, AWS, Azure, and others. With static provisioning the administrator creates and manages the PVs; with dynamic provisioning the system automatically creates a PV and binds it to the PVC.
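To make the PV/PVC model concrete, here is a minimal sketch of dynamic provisioning with the Kubernetes Python client: the claim asks a hypothetical StorageClass named ceph-rbd for 5 GiB of ReadWriteOnce storage, and a dynamic provisioner (if one backs that class) creates and binds a matching PV. The namespace, claim name, and class name are assumptions for illustration.

```python
from kubernetes import client, config

config.load_kube_config()

# Mirrors a v1 PersistentVolumeClaim manifest; the storage class name is an example.
pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "data-claim", "namespace": "demo"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "storageClassName": "ceph-rbd",              # dynamic provisioning via this class
        "resources": {"requests": {"storage": "5Gi"}},
    },
}

client.CoreV1Api().create_namespaced_persistent_volume_claim(namespace="demo", body=pvc)

# A Pod then mounts the claim by name; deleting the Pod leaves the PVC and its bound PV intact.
```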

A brief note on image management in Kubernetes: in production there are many images for different applications and versions, so managing images is also an important part of the picture.

 


Multi-tenant permission management for images:

1. Images belonging to different tenants should be isolated from each other.

2. Different tenants have different permissions on images, for example read/write, read-only, upload, and download.

3. The image registry provides functions such as querying, updating, and deleting images.

For image management across regions and multiple data centers, remote replication between registries needs attention to:

1. In a multi-data-center or cross-region, multi-site environment, at least two tiers of registries are needed to speed up image downloads across regions: a master registry and sub-registries.

2. Near-real-time incremental synchronization between registries.

 


3. Kubernetes cluster management

In production systems, managing multiple Kubernetes clusters mainly involves:

1. Service operations

2. Centralized configuration

3. Scaling and upgrades

4. Resource quotas (a minimal sketch follows this list)
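To make the last point concrete, here is a minimal sketch (namespace and limits are invented for illustration) of setting a per-namespace ResourceQuota with the Kubernetes Python client; this is one common way to enforce quotas per tenant or per team.

```python
from kubernetes import client, config

config.load_kube_config()

# Mirrors a v1 ResourceQuota manifest; the numbers below are purely illustrative.
quota = {
    "apiVersion": "v1",
    "kind": "ResourceQuota",
    "metadata": {"name": "team-a-quota", "namespace": "team-a"},
    "spec": {"hard": {
        "requests.cpu": "20",
        "requests.memory": "64Gi",
        "limits.cpu": "40",
        "limits.memory": "128Gi",
        "pods": "100",
    }},
}

client.CoreV1Api().create_namespaced_resource_quota(namespace="team-a", body=quota)
```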

Let's start with scheduling management across multiple clusters.

1. Scheduling policies in Kubernetes fall roughly into two kinds: global scheduling policies and runtime scheduling policies.

2. Node isolation and recovery; node scale-out; dynamic scaling of Pods.

3. Affinity allows co-located deployment, and together with stronger networking it enables near-path routing for communication and reduces network overhead. Anti-affinity is mainly about reliability: spread instances out as much as possible (see the sketch after this list).

4. Microservice dependencies, with a defined startup order.

5. Applications of different departments are not co-located.

6. The API gateway and applications on GPU nodes get dedicated nodes.
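For point 3, the sketch below patches a hypothetical order-service Deployment with a podAntiAffinity rule so that replicas prefer to land on different nodes; the labels, names, and namespace are placeholders, and "preferred" (soft) anti-affinity is used so scheduling still succeeds when nodes are scarce.

```python
from kubernetes import client, config

config.load_kube_config()

# Strategic-merge patch adding soft anti-affinity to the Deployment's Pod template.
anti_affinity_patch = {
    "spec": {"template": {"spec": {"affinity": {"podAntiAffinity": {
        "preferredDuringSchedulingIgnoredDuringExecution": [{
            "weight": 100,
            "podAffinityTerm": {
                "labelSelector": {"matchLabels": {"app": "order-service"}},
                "topologyKey": "kubernetes.io/hostname",  # spread replicas across nodes
            },
        }],
    }}}}},
}

client.AppsV1Api().patch_namespaced_deployment(
    name="order-service", namespace="demo", body=anti_affinity_patch)
```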

 


Elastic scaling of applications in multi-cluster management:

1. Manual scaling: used when changes in business volume are known in advance.

2. Automatic scaling based on CPU utilization: the HPA controller was introduced in v1.1; Pods must declare a CPU resource request.

3. Automatic scaling based on custom business metrics: HPA was redesigned in v1.7 with additional components, known as HPA v2.

In practice HPA still has many rough edges, and many vendors rely on their own monitoring systems to watch business metrics and drive automatic scaling. A minimal sketch of the CPU-based case follows.
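The sketch below, with made-up names and thresholds, shows the CPU-based case (point 2 above): an autoscaling/v1 HPA that keeps a hypothetical web Deployment between 2 and 10 replicas, targeting 60% CPU utilization. As noted, the target Pods must declare a CPU request for the controller to work.

```python
from kubernetes import client, config

config.load_kube_config()

# Mirrors an autoscaling/v1 HorizontalPodAutoscaler manifest; numbers are illustrative.
hpa = {
    "apiVersion": "autoscaling/v1",
    "kind": "HorizontalPodAutoscaler",
    "metadata": {"name": "web-hpa", "namespace": "demo"},
    "spec": {
        "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": "web"},
        "minReplicas": 2,
        "maxReplicas": 10,
        "targetCPUUtilizationPercentage": 60,
    },
}

client.AutoscalingV1Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="demo", body=hpa)
```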

Tuning multiple Kubernetes clusters:

There are three main difficulties.

The first is how to allocate resources. When a user chooses multi-cluster deployment, the system decides how many containers each cluster gets based on each cluster's resource usage, and guarantees that every cluster has at least one container. When the clusters scale automatically, containers are created and reclaimed in the same proportion (a small sketch of this allocation rule follows these three points).

The second is failure migration. The cluster controller mainly handles automatic scaling across clusters and container migration when a cluster fails: it probes the clusters' nodes periodically, and after repeated failures it triggers migration of containers between clusters so that services keep running reliably.

The third is interconnecting network and storage. Because the networks of different data centers must be interconnected, we use a VXLAN-based network solution, and storage is interconnected over leased lines. Harbor serves as the container image registry, with synchronization policies configured between clusters, and each cluster has its own DNS resolution pointing to its local registry.
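As promised above, here is a small sketch of the allocation rule from the first point. It assumes, for illustration, that each cluster's "resource usage" boils down to one free-capacity number: every cluster is guaranteed one container, and the remainder is split in proportion to free capacity.

```python
def allocate(total_containers, free_capacity):
    """Split total_containers across clusters in proportion to free capacity,
    guaranteeing at least one container per cluster.

    free_capacity: dict of cluster name -> free capacity (any consistent unit).
    """
    clusters = list(free_capacity)
    if total_containers < len(clusters):
        raise ValueError("need at least one container per cluster")

    # Reserve one container per cluster, then split the rest proportionally.
    allocation = {c: 1 for c in clusters}
    remaining = total_containers - len(clusters)
    total_free = sum(free_capacity.values()) or 1
    shares = {c: remaining * free_capacity[c] / total_free for c in clusters}
    for c in clusters:
        allocation[c] += int(shares[c])

    # Hand out containers lost to rounding, largest fractional share first.
    leftover = total_containers - sum(allocation.values())
    for c in sorted(clusters, key=lambda c: shares[c] - int(shares[c]), reverse=True):
        if leftover == 0:
            break
        allocation[c] += 1
        leftover -= 1
    return allocation


# Example: 10 containers over two clusters with a 3:1 free-capacity ratio.
print(allocate(10, {"cluster-a": 300, "cluster-b": 100}))  # {'cluster-a': 7, 'cluster-b': 3}
```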

 


On high availability of the Kubernetes master: the core of a Kubernetes cluster is its master node, but by default there is only one. Once the master node has a problem, the cluster is effectively "paralyzed": cluster management, Pod scheduling, and so on can no longer be carried out. Hence the later one-active, multi-standby architectures, in which the master node components, etcd, and the rest can all be designed for high availability.

 


Federation: the cluster federation architecture

In a cloud computing environment, the distance over which a service operates ranges, from near to far: same host (node); across hosts within an availability zone; across availability zones within a region; across regions within one cloud service provider; and across cloud platforms. Kubernetes was designed for a single cluster within one region, because only within a single region can the network performance satisfy Kubernetes' requirements for scheduling and for connecting compute with storage. Cluster federation (Federation) was designed to provide Kubernetes cluster services across regions and across providers, giving the business high availability.

Federation was introduced in version 1.3. The federation/v1beta1 API extends DNS-based service discovery: with DNS, Pods can resolve services across clusters transparently.

Version 1.6 added cascading deletion of federated resources; version 1.8 claimed support for 5,000-node clusters; and Cluster Federation v2 followed.

 


Problems that remain today:

1. Increased network bandwidth and cost.

2. Weakened isolation between clusters.

3. Insufficient maturity; it has not yet been formally used in production.

4. Kubernetes monitoring and operations

For a monitoring system, the common monitoring dimensions are resource monitoring and application monitoring. Resource monitoring covers the resource usage of nodes and applications; in a container scenario this extends to node resource utilization, cluster resource utilization, Pod resource utilization, and so on. Application monitoring refers to metrics internal to the application: for example, we count the application's online users in real time and expose the number through a port, which enables business-level monitoring and alerting. So in Kubernetes, what entities does monitoring break down into?

System components

The components built into a Kubernetes cluster, including the apiserver, controller-manager, etcd, and so on.

Static resource entities

Mainly the nodes' resource state, kernel events, and so on.

Dynamic resource entities

Mainly the entities behind Kubernetes' workload abstractions, such as Deployment, DaemonSet, and Pod.

Custom applications

Mainly the monitoring data and metrics that the application itself defines and exposes.

A comparison of different container cloud monitoring solutions:

 

[Table: comparison of container cloud monitoring solutions]

Prometheus monitoring:

Two points deserve particular attention:

  • wrapping the query API (a minimal sketch follows)
  • distributing configuration files
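A minimal sketch of the first point: a thin wrapper over Prometheus' HTTP query API (GET /api/v1/query) that a platform dashboard or alerting layer could call. The in-cluster endpoint and the example metric are assumptions; the endpoint in particular depends on how Prometheus is deployed.

```python
import requests


class PrometheusClient:
    """Thin wrapper around the Prometheus HTTP API (instant queries only)."""

    def __init__(self, base_url="http://prometheus.monitoring.svc:9090"):
        self.base_url = base_url.rstrip("/")

    def query(self, promql):
        """Run an instant PromQL query and return the list of result samples."""
        resp = requests.get(f"{self.base_url}/api/v1/query",
                            params={"query": promql}, timeout=5)
        resp.raise_for_status()
        payload = resp.json()
        if payload.get("status") != "success":
            raise RuntimeError(f"Prometheus query failed: {payload}")
        return payload["data"]["result"]

    def pod_cpu_usage(self, namespace):
        """CPU usage (cores) per Pod in a namespace over the last 5 minutes."""
        return self.query(
            "sum by (pod) (rate(container_cpu_usage_seconds_total"
            f'{{namespace="{namespace}"}}[5m]))')


# Example usage (assumes the service name above is reachable):
# for sample in PrometheusClient().pod_cpu_usage("demo"):
#     print(sample["metric"]["pod"], sample["value"][1])
```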

Thoughts on operations --- integrating development and operations

 

[Figure: development and operations integrated on the container platform]

Thoughts on operations --- high availability

  • OCP platform:

1. Load-balancing Router HA cluster: 2 nodes

2. EFK HA cluster: 3 Elasticsearch nodes + n Fluentd nodes

3. Image registry HA cluster: 2 registries

  • Microservice architecture:

1. Registry center HA cluster (Eureka): 3 nodes

2. Configuration center HA cluster: 3 nodes

3. Gateway HA cluster: 2 nodes

4. Every key microservice runs as an HA cluster

Thoughts on operations --- high concurrency

  • OCP platform:

1. Configure elastic scaling for backend microservices (Pods); Kubernetes autoscaling plus the second-level startup of Docker containers can sustain continuous growth in user volume.

2. Reserve 20% of resources in advance so that capacity can be expanded urgently when a high-concurrency situation arises.

  • Microservice architecture:

1. Increase the circuit-breaker thread count for microservices on critical call paths to improve the concurrent response capacity of the core business.

2. Degrade microservices off the critical path through circuit breaking and rate limiting, or even shut them down.

3. Circuit breaking improves the container cloud's fault tolerance under high concurrency, prevents cascading failures and microservice avalanches, and raises system availability.

  • Middleware:

1. Besides the clusters in active use, add cold-standby clusters in advance.

2. When a high-concurrency scenario is about to arrive, capacity can be expanded urgently.

Finally, a summary of the road to the container cloud.

1. At the business level: large enterprises have relatively high requirements for business stability and continuity, so the containerization path must move from edge businesses to core businesses and from simple applications to complex applications. Concretely, you can first consider migrating the web frontend into containers and move the backend business last.

2. At the technical level: native Docker still has many shortcomings in service discovery, load balancing, container lifecycle management, inter-container networking, storage, and so on. Open-source solutions and the commercial versions offered by many third-party vendors each have their own characteristics, and it is hard to call a winner. Whichever product a user selects, reliability and flexibility are two important factors that need careful consideration.

3. Weigh cost against benefit: consider the balance between the cost paid for containerization and the benefits it will bring in the future.

4. Consider the load capacity of the existing hardware: containers are not a panacea. Some concurrent workloads with high throughput requirements run directly on bare metal and improve performance through system tuning; moving them into containers may not be the best choice.

5. Keep updating: always remind yourself to keep learning and to embrace change, so that you can see the platform's shortcomings and keep iterating toward a better product.

In production practice, only by consolidating the foundations and continuously improving the products and the ecosystem built around the container cloud platform can we plan far ahead for the future.

[51CTO.com original article; partner sites reprinting it should credit the original author and cite 51CTO.com as the source]
