All of Ant Financial's Major Talks at This Year's KubeCon in One Article

On June 24, the most important cloud native conference in China kicks off! KubeCon + CloudNativeCon + Open Source Summit China 2019 will be held in Shanghai. Ant Financial is taking a major part: several of its technical experts will give talks, and the company will run a workshop, offering attendees a technical feast.

At the conference, Ant Financial will share its work on cutting-edge topics including Kubernetes cluster management, large-scale deployment and tuning of deep learning jobs on Kubernetes, Internet finance, and secure containers. Ant Financial began using Kubernetes in depth in 2016 and, as an end user, has been featured by CNCF as an official case study.

Beyond contributing to CNCF cloud native open source technologies, Ant Financial has also open-sourced SOFAStack, its financial-grade cloud native distributed solution. At this conference, Ant Financial will run a workshop showing how to use SOFAStack to put Service Mesh and Serverless into practice quickly. Stay tuned.

The talks are as follows:

List of topics

1. Co-locating CPU and GPU Workloads for Efficient Resource Utilization

  • Cenpeng Hao, Technical Expert, Platform Data Technology Group, Ant Financial
  • He Jian, Senior Technical Expert, Container Platform, Alibaba Cloud

Abstract
This talk describes how we co-locate AI training jobs and long-running services on a Kubernetes cluster. The main goal is to improve resource utilization, and thereby save resources, by co-locating different kinds of workloads. We will describe how we achieve co-location from several angles, including QoS classes, cgroups, and scheduling, and how we evaluate utilization. Over the past few months we have built a co-located GPU and CPU cluster of several hundred nodes, and we will present best practices for co-locating long-running services and AI batch jobs in a production cluster.

2. No More Confusion: Kubernetes Audit and Inspection at Scale

  • Chen Jie, Technical Expert, Container Platform, Alibaba Cloud
  • Ma Jinjing, Senior Development Engineer, Ant Financial

Abstract
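Accurate anomaly detection and fast problem analysis are key to keeping a Kubernetes cluster available and stable. However, Kubernetes as a whole exposes a vast number of monitoring metrics. In our own Kubernetes clusters alone, thousands of such monitoring data points are generated every second. Making sensible use of this large and complex body of data, recording and analyzing it effectively, and turning it into easy-to-understand visualizations and accurate alerts is a very challenging task.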

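In this talk, we will share Alibaba's practices and experience in monitoring, auditing, and inspecting Kubernetes clusters. First, we will discuss the key stability-related data and metrics in Kubernetes and how to interpret them. Next, through case studies, we will explain how we integrate and parse these data and metrics. Finally, we will share Alibaba's best practices for automated, efficient, real-time inspection and analysis of this data.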

3. Managing Large-Scale Kubernetes Clusters Effectively and Reliably

  • Zhang Yong, Senior Development Engineer, Ant Financial
  • Lin Zhixian, Technical Expert, Ant Financial

Abstract
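As our business grows, we need to deploy Kubernetes to multiple data centers around the world, and a single data center alone has tens of thousands of nodes. The key challenge we face is how to manage multiple large-scale Kubernetes clusters within a data center efficiently and reliably.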

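In this talk, we will share our experience and practices in automating large-scale cluster management. First, we will introduce fully automated node lifecycle management and explain how we automatically detect and recover from node failures using NPD (Node Problem Detector), Autoscaler, and custom operators. Next, we will share our experience and solutions for deploying and upgrading Kubernetes clusters. Finally, we will present a risk prevention and control system built on Prometheus and operators, which keeps clusters reliable through automatic fault detection and isolation.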

4. Extending Deployment for Mission-Critical Internet Finance Scenarios

  • Zhou Mengyi, Senior Development Engineer, Ant Financial
  • Wu Ke, Technical Expert, Ant Financial

Abstract
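The default Deployment provides a good solution for routine version upgrades. Deploying large-scale, highly available, and reliable services for Internet finance applications is another matter entirely, not to mention the compatibility problems such workloads face with existing operations and maintenance systems.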

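The new workload introduced by Ant Financial solves these problems. It extends deployment capabilities with reliable and flexible distribution, risk-controlled deployment strategies, and high-performance in-place updates. In particular, it removes technical barriers facing the financial services industry, allowing developers and operators to focus on their core business.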

5. Large-Scale Distributed Deep Learning on Kubernetes Clusters

  • Tang Yuan, Technical Expert, Ant Financial
  • Yong Tang, Director of Engineering, MobileIron

Abstract
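This talk focuses on deploying large-scale distributed deep learning on Kubernetes. It will also cover how to use operators to manage and automate the machine learning training process. We will share our experience and compare two open source Kubernetes operators, tf-operator and mpi-operator. Both manage TensorFlow training jobs but use different distribution strategies, which produce different results in CPU, GPU, and network utilization.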

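Deep learning jobs are both network-intensive and GPU-intensive, so optimizing their orchestration properly is very important. Imbalances occur easily and leave compute capacity idle, which is far more costly on GPU nodes than on CPU nodes. We will share our experience in the hope of offering useful insights for getting better cost efficiency out of machine learning jobs.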

6. Intro: SIG Cluster Lifecycle

  • Xu Di, Senior R&D Engineer, Ant Financial
  • Alexander Kanevskiy, Cloud Software Architect, Intel

Abstract
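SIG Cluster Lifecycle is a special interest group focused on deploying and upgrading clusters. Our SIG is working to improve the user experience of bootstrapping a minimum viable Kubernetes cluster that follows best practices. With our primary installation tool, kubeadm, installation and upgrades are simplified and well managed. We recently launched a new Kubernetes project called Cluster API, which brings declarative, Kubernetes-style APIs to cluster creation, configuration, and management. In this intro session, we will present the SIG's mission statement, review recent updates, and discuss our roadmap. We will also introduce some new lifecycle projects. You are very welcome to join our SIG and contribute.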

7. Are Security Sandboxes Production-Ready? Kata Containers, gVisor, and More

  • Wang Xu, Senior Technical Expert, Ant Financial
  • Li Fupan, Technical Expert, Ant Financial

Abstract
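At KubeCon NA 2018, we presented a quantitative comparison of Kata Containers and gVisor, showing Kata's reasonable CPU and network performance, the performance penalty of filesystem storage, Kata's memory consumption, and gVisor's system call overhead.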

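Since then, Kata Containers has released version 1.5, which supports lightweight hypervisors (NEMU and Firecracker). We also introduced virtio-fs for filesystem sharing, which offers better POSIX compatibility and performance. Together with seamless containerd integration through shimv2, it looks like 2019 will bring better production-ready secure sandbox support to Kubernetes.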

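In this talk, we will present benchmarks of these newly released technologies using an updated test suite and help users judge whether they are production-ready.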

SOFAStack Cloud Native Workshop

Service Mesh moves inter-service communication down into the infrastructure, decoupling applications and making them lighter. But the complexity of Service Mesh itself remains, so how can you put Service Mesh technology into practice easily? In this workshop, we will walk you through CloudMesh, a Service Mesh hosted in the cloud, to help you adopt Service Mesh technology with ease.

As one of the leading directions in cloud native technology, Serverless architecture lets you further improve resource utilization and focus more on business development. You can try Serverless features such as quickly creating applications, automatic 0-1-N scaling based on requests per second, viewing logs for quick troubleshooting, and time-triggered applications.

In a microservice architecture, distributed transactions are an industry-wide challenge. In this session, you can try the AT and TCC modes of Seata, an open source distributed transaction framework, to solve the ultimate problem of keeping business data consistent.


Full schedule

The schedule is subject to change. The official conference website is authoritative.

  • June 24, 09:00-16:00 SOFAStack Cloud Native Workshop
  • June 25, 13:35-14:10 Co-locating CPU and GPU Workloads for Efficient Resource Utilization
  • June 25, 17:30-18:05 No More Confusion: Kubernetes Audit and Inspection at Scale
  • June 25, 17:30-18:05 Managing Large-Scale Kubernetes Clusters Effectively and Reliably
  • June 25, 16:00-16:35 Extending Deployment for Mission-Critical Internet Finance Scenarios
  • June 25, 16:00-16:35 Large-Scale Distributed Deep Learning on Kubernetes Clusters
  • June 25, 11:00-11:35 Intro: SIG Cluster Lifecycle
  • June 25, 11:45-12:20 Are Security Sandboxes Production-Ready? Kata Containers, gVisor, and More


Source: yq.aliyun.com/articles/706118