Technical Practice of a Self-Developed Container Management Platform Based on Kubernetes

1. Background

With the rise of microservice architecture, and with open-source microservice frameworks such as Dubbo and Spring Cloud, many business lines inside Yixin gradually migrated from the original monolithic architecture to a microservice architecture. Applications moved from stateful to stateless; state data such as sessions and user data were moved into middleware services.


Although splitting into microservices reduces the complexity of each individual service, the number of service instances grows explosively, which increases the difficulty of operations: on the one hand service deployment and upgrades, on the other hand fault monitoring and service recovery.

In 2016, container technology, especially Docker, quickly became popular, and the company began trying to run services in containers. Containerization solved the service-publishing problem, but hand-operating a large fleet of containers still left operations stretched thin. Yixin is a fintech company, and when introducing open-source components it treats stability and reliability as the most important criteria. Kubernetes gradually matured in early 2017, became the de facto standard for container management, and was adopted by many companies at home and abroad. Against this background, Yixin drew on the open-source community and commercial PaaS products to build a self-developed container management platform based on Kubernetes.

2. Overall Architecture

The whole infrastructure is built around Kubernetes and divided into four layers. The lowest layer is basic resources, including network, compute and storage: all containers are deployed on physical servers, commercial NAS storage is mounted into containers, and the network is interconnected through vxlan. The middle layer is the core resource-scheduling layer, which handles multi-cluster management, release and deployment, intelligent scheduling, auto-scaling and so on; this layer does resource management and service orchestration. On the left is system security, mainly host security and container-image security; on the right are the systems for automatic code compilation, automatic image build and automatic deployment. The middleware layer provides common middleware services, Nginx configuration, and monitoring and alerting. The topmost layer is user access, which provides the user's operation entry point. The overall architecture is shown below:

(Figure: overall architecture of the platform)

3. Nginx Self-Management

Most of the company's services are exposed through Nginx reverse proxies. To isolate services and balance load, there are more than a dozen Nginx clusters. These Nginx instances have different versions and different configuration methods, so operating them purely by hand is costly and error-prone; moreover, container IP addresses are not fixed and cannot be configured directly as Nginx backends. We therefore developed our own Nginx management system, mainly to solve templated Nginx configuration, as shown below:

(Figure: nginx management system architecture)

nginx-mgr provides an HTTP API that receives nginx configuration requests and writes them to etcd; each nginx-agent watches etcd and refreshes its nginx configuration in batches. In the actual production environment we deploy Alibaba's open-source Tengine rather than nginx, but since the configuration is essentially the same we make no distinction. Each service is configured with a health check, which guarantees automatic failover when a backend fails. For scenarios where virtual machines need to be switched manually, the following figure shows the manual nginx switchover page:

(Figure: manual nginx switchover page)

Since many businesses run a mix of virtual machines and containers, when the backend is a container we obtain its IP address through the Kubernetes API and refresh the nginx configuration dynamically.
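As a rough illustration of the templating step described above, here is a minimal Python sketch that renders an upstream block plus a virtual server from a list of backends. The function name, fields and health-check parameters are invented for the example and are not the platform's actual schema:

```python
# Sketch of nginx config templating: service definition -> nginx text.
# All names (render_nginx_conf, the template fields) are illustrative.

UPSTREAM_TMPL = """upstream {name} {{
{servers}
}}

server {{
    listen 80;
    server_name {domain};
    location / {{
        proxy_pass http://{name};
    }}
}}
"""

def render_nginx_conf(name, domain, backends):
    """backends: list of (ip, port, weight) tuples -> nginx config text."""
    servers = "\n".join(
        f"    server {ip}:{port} weight={weight} max_fails=2 fail_timeout=10s;"
        for ip, port, weight in backends
    )
    return UPSTREAM_TMPL.format(name=name, domain=domain, servers=servers)

if __name__ == "__main__":
    print(render_nginx_conf(
        "demo-svc", "demo.example.com",
        [("10.0.0.11", 8080, 1), ("10.0.0.12", 8080, 1)],
    ))
```

In the real system the rendered text would be written to etcd by nginx-mgr and picked up by the agents, rather than printed.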

4. Multi-Cluster Management

Although Kubernetes itself uses a highly available deployment architecture and avoids single points of failure, this is far from enough. On the one hand, a single Kubernetes cluster is deployed in one machine room, so a room-level failure would interrupt service; on the other hand, a single Kubernetes cluster can fail on its own, for example a network misconfiguration can break the whole cluster's network and affect services. Yixin therefore deploys Kubernetes in multiple machine rooms connected by leased lines. Multi-cluster management then becomes the main difficulty. The first problem is resource allocation: when a user chooses multi-cluster deployment, the system decides how many containers to assign to each cluster based on each cluster's resource usage, while guaranteeing at least one container per cluster; during auto-scaling, containers are created and reclaimed following this ratio. The second problem is failure migration: the cluster controller shown in the figure below mainly handles auto-scaling across clusters and container migration on cluster failure. The controller periodically probes the nodes of each cluster, and if a cluster fails it triggers migration of that cluster's containers, keeping the services running reliably.

(Figure: multi-cluster controller)

The third problem is network and storage interconnection. Since the network must be interconnected across machine rooms, we adopted a vxlan network scheme; storage is likewise interconnected over leased lines. Harbor is used as the container image registry, with synchronization policies configured between the clusters, and each cluster has its own DNS resolution that resolves the registry domain to its local image registry.
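The proportional allocation rule described above (at least one container per cluster, the remainder split by each cluster's free capacity) might look like the following sketch; the function name and the "free units" metric are assumptions for illustration:

```python
def allocate(replicas, free):
    """Split `replicas` containers across clusters proportionally to
    free capacity (`free`: cluster name -> free units), guaranteeing
    at least one container per cluster."""
    names = list(free)
    if replicas < len(names):
        raise ValueError("need at least one replica per cluster")
    total = sum(free.values()) or 1
    # one-per-cluster floor, then split the remainder proportionally
    remaining = replicas - len(names)
    alloc = {c: 1 + remaining * free[c] // total for c in names}
    # integer truncation may leave a few replicas unassigned;
    # hand them to the clusters with the most free capacity
    leftover = replicas - sum(alloc.values())
    for c in sorted(names, key=lambda c: free[c], reverse=True)[:leftover]:
        alloc[c] += 1
    return alloc

if __name__ == "__main__":
    print(allocate(10, {"bj": 300, "sh": 100}))  # -> {'bj': 7, 'sh': 3}
```

The same ratio would then drive auto-scaling: scale-out and scale-in both re-run the split so the clusters keep their proportions.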

5. DNS Resolution

Because business developers still had doubts about container technology, most applications were mixed deployments of virtual machines and containers. Containers accessing virtual machines by domain name, and virtual machines accessing containers by domain name, are both common. To manage domain names uniformly, we did not adopt the kube-dns (CoreDNS) that ships with Kubernetes, but instead use bind to provide DNS resolution. Through the Default DNS policy supported by Kubernetes, container DNS queries are pointed at the company's DNS servers, and records are added dynamically through the DNS management API.

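With the Default DNS policy, a pod inherits name resolution from the node it runs on, so its queries go to the company's bind servers configured in the node's /etc/resolv.conf. A minimal pod spec (names and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-app
spec:
  dnsPolicy: Default   # inherit the node's /etc/resolv.conf (company DNS)
  containers:
  - name: app
    image: registry.example.com/demo/app:1.0
```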

6. Network Solution

There are many CNI network solutions for Kubernetes, broadly divided into layer-2, layer-3 and overlay schemes. Our machine rooms do not allow running the BGP protocol, and hosts in different rooms need to interconnect, so we adopted flannel's vxlan scheme. To achieve cross-room connectivity, the flannel instances of both clusters connect to the same etcd cluster, which keeps the network configuration consistent. Old versions of flannel have many problems, including too many route entries and ARP cache expiry. We recommend switching to per-subnet routes and making the ARP entries permanent, to avoid the cluster network collapsing because of a failure in etcd or elsewhere.


Using flannel also requires some configuration tuning. By default it renews its etcd lease every day, and if renewal fails it deletes the subnet information from etcd; to avoid subnet changes, the ttl of the etcd data node can be set to 0 (never expire). Docker by default masquerades all packets leaving the host, which hides the source container's IP address inside the flannel network; by adding ip-masq exceptions, packets destined for the flannel subnet are excluded from masquerading. Since flannel uses vxlan, enabling vxlan offloading on the NIC brings a large performance gain. Flannel itself has no network isolation, so to implement Kubernetes network policy we adopted canal, the plugin that uses calico to provide Kubernetes network policy on top of flannel.

7. CI/CD

To support the DevOps workflow, in the first version we tried using Jenkins for code compilation, but Jenkins' multi-tenancy support is poor. In the second version we use the Kubernetes Job mechanism: each user build starts a compile Job, which first downloads the user's code and selects the build image matching the project's language. Compilation produces an artifact, such as a jar or war file, which is packaged into a Docker image via a Dockerfile and pushed to the image registry; a registry webhook then triggers the rolling-upgrade process.

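A per-build compile Job of the kind described above could look roughly like this; the names, image and repository URL are placeholders, not the platform's actual manifests:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: build-demo-app-42        # one Job per user build
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: compile
        # image chosen to match the project's language, e.g. a Maven
        # image for Java projects
        image: registry.example.com/build/maven:3-jdk-8
        command: ["sh", "-c"]
        args:
        - git clone "$REPO" /src && cd /src && mvn -q package
        env:
        - name: REPO
          value: https://git.example.com/demo/app.git
```

The resulting artifact would then be baked into an image and pushed, with the registry webhook kicking off the rolling upgrade.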

8. Service Orchestration

The system introduces the logical concept of an application. Kubernetes has the concept of a service, but it lacks relationships between services; a complete application usually consists of multiple services such as a frontend, backend APIs and middleware, and these services call and constrain one another. By defining the application concept, we can not only control the startup order of services but also start and stop a group of services as a unit.

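Controlling startup order for a group of interdependent services amounts to a topological sort of the dependency graph. A minimal sketch (the service names are made up, and the real platform's data model is certainly richer):

```python
# Start services in dependency order: a service starts only after
# everything it depends on is up. Uses the stdlib graphlib (Python 3.9+).
from graphlib import TopologicalSorter

def start_order(deps):
    """deps: service -> set of services it depends on.
    Returns a start order with dependencies first."""
    return list(TopologicalSorter(deps).static_order())

if __name__ == "__main__":
    app = {
        "frontend": {"api"},
        "api": {"mysql", "redis"},
        "mysql": set(),
        "redis": set(),
    }
    print(start_order(app))
```

Stopping the application would simply walk the same order in reverse.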

9. Logging

Container logs are collected by the company's self-developed watchdog log system. A log-collection agent is deployed on every host via a DaemonSet; the agent obtains the containers to collect from and their log paths through the Docker API, then gathers the logs and ships them to the log center. The log center is built on Elasticsearch and provides multi-dimensional log search and export.


10. Monitoring

Resource-level performance monitoring of the containers themselves uses cAdvisor + Prometheus. Monitoring of the business inside containers integrates the open-source APM system UAVStack (https://github.com/uavorg/uavstack) for application performance monitoring. UAV's distributed tracing is based on the JavaAgent mechanism: if the user ticks "use uav monitoring" when deploying an application, the system embeds the uav agent into the image at build time and modifies the startup parameters.

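The agent-injection step might look something like the Dockerfile fragment below. This is purely illustrative: the agent directory, jar name and environment variable are assumptions, not UAVStack's actual layout.

```dockerfile
FROM openjdk:8-jre
# Hypothetical: copy the uav agent into the image at build time...
COPY uav-agent/ /opt/uav-agent/
# ...and extend the JVM start parameters to load it as a javaagent
ENV JAVA_OPTS="$JAVA_OPTS -javaagent:/opt/uav-agent/uav-agent.jar"
```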

Besides the modules above, the system also integrates Harbor for multi-tenant management of container images and for image scanning. Log auditing records the users' operations in the management UI; webshell gives users web-console access to containers, and to support security auditing the backend intercepts and stores every command a user runs in the webshell. Storage management mainly integrates the company's commercial NAS storage to provide data sharing and persistence directly to containers. The application store uses Kubernetes operators to provide middleware services for development and test scenarios.

11. Implementation Practice

11.1 Docker is not a virtual machine

In the early days of container adoption, business developers were not very familiar with containers and instinctively assumed a container is a virtual machine. In fact the difference is not just in how they are used but in how they are implemented: a virtual machine emulates hardware instructions to virtualize a full operating-system environment, while a container provides resource isolation and limits on top of a shared kernel. The figure below shows the seven namespaces that Linux supports in the 4.8 kernel.

(Figure: the seven Linux namespaces supported by the 4.8 kernel)

In other words, everything else is shared. Take the clock: all containers share the operating system's clock, and if the system time is changed, the time in every container changes. Likewise, the proc filesystem inside a container is not isolated, so what you see is the host's information, which confuses many applications. The JVM's default initial heap size is 1/4 of total memory; if a container is limited to 2G of memory while the host typically has 200+G, the JVM easily triggers OOM. The usual solution is to set the JVM parameters at startup according to the container's memory and CPU limits, or to use tools such as lxcfs.
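The "set the JVM from the container's limits" fix can be sketched as below: read the cgroup memory limit instead of the host total, then size the heap from it. The heap ratio and the entrypoint wiring are assumptions for illustration (the path shown is the cgroup v1 layout):

```python
# Derive JVM heap flags from the container's cgroup memory limit
# rather than the host's total memory.

def jvm_opts(limit_bytes, heap_ratio=0.5):
    """Return -Xms/-Xmx flags sized as a fraction of the container limit.
    The 0.5 ratio is an example choice, leaving headroom for non-heap use."""
    heap_mb = int(limit_bytes * heap_ratio) // (1024 * 1024)
    return f"-Xms{heap_mb}m -Xmx{heap_mb}m"

def container_memory_limit(path="/sys/fs/cgroup/memory/memory.limit_in_bytes"):
    """Read the cgroup v1 memory limit; only works inside a limited cgroup."""
    with open(path) as f:
        return int(f.read().strip())

if __name__ == "__main__":
    # A container limited to 2 GiB gets a 1 GiB heap instead of the
    # 50+ GiB the JVM would pick from a 200 G host.
    print(jvm_opts(2 * 1024 ** 3))
```

Newer JVMs also offer container-aware flags such as `-XX:MaxRAMPercentage`, which achieve the same effect without an entrypoint script.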

Cgroup resource limits are currently weak for network and disk IO. Cgroup v1 only supports limits on direct IO, but real production workloads mostly use buffered IO. We are currently testing cgroup v2's IO limits. The latest CNI already supports network rate limiting, and combined with tc this works well.

11.2 Kubernetes optimization

Kubernetes ships with many scheduling algorithms that run before a container is started; each of them filters and scores every node, which greatly increases container deployment time. By removing some unused scheduling algorithms we improved deployment speed. Containers use an anti-affinity policy to reduce the impact of a physical machine failure on a service.
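The anti-affinity policy is standard Kubernetes configuration; a fragment like the following (labels are placeholders) asks the scheduler to prefer spreading replicas of one service across different hosts, so a single machine failure takes out at most one replica:

```yaml
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: demo-svc
        topologyKey: kubernetes.io/hostname
```

Using `requiredDuringSchedulingIgnoredDuringExecution` instead makes the spread mandatory, at the cost of unschedulable pods when hosts run out.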

Although Kubernetes has RBAC enabled, it is still unwise to mount the Kubernetes token into business containers; disabling the ServiceAccount token mount improves system security.

Docker image storage uses direct-lvm for better performance; a separate vg is carved out at deployment time, so that Docker problems cannot affect the operating system disk. Through devicemapper each container's root filesystem is limited to 10G, preventing business containers from exhausting the host's disk space; the maximum number of processes per container also needs to be limited, to avoid fork bombs.
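A daemon.json along the following lines configures direct-lvm and the 10G per-container base size; the device path is a placeholder for whatever block device backs the dedicated vg:

```json
{
  "storage-driver": "devicemapper",
  "storage-opts": [
    "dm.directlvm_device=/dev/sdb",
    "dm.basesize=10G"
  ]
}
```

The per-container process cap can be enforced with `docker run --pids-limit` or, under Kubernetes, with the kubelet's pod PID limit setting.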

etcd records Kubernetes' core data, so etcd high availability and scheduled backups are essential. In a Kubernetes cluster with more than a hundred nodes, query speed drops, and SSDs can effectively improve it. Besides what Kubernetes stores, the system also keeps its own service data in a database.

Watch certificate validity. When deploying a Kubernetes cluster, many certificates are self-signed, and unless specified otherwise openssl defaults to one year of validity. Renewing certificates must be done very carefully, because the whole Kubernetes API is built on certificates and every associated service needs to be updated.

12. Summary

Docker containers plus Kubernetes orchestration is the current mainstream practice of container cloud, and Yixin's container cluster management platform follows this approach. This article shared some of Yixin's exploration and practice in container cloud platform technology, covering Nginx self-management, multi-cluster management, DNS resolution, the network solution, CI/CD, service orchestration, logging, monitoring and Kubernetes optimization, as well as some thoughts on Yixin's internal container cloud platform. Of course we still have many shortcomings, and we welcome everyone to communicate and exchange ideas with Yixin in depth!

Author: Chen Xiaoyu

Source: CreditEase Institute of Technology


Origin blog.51cto.com/14159827/2424791