Ant Financial at this year's KubeCon: a roundup of its key talks

Ant Financial will be heavily involved in the conference, with talks from a number of its technical experts and a workshop, offering attendees a technical feast. Author / Source: Ant Financial

On June 24, China's most important cloud native conference gets under way. KubeCon + CloudNativeCon + Open Source Summit China 2019 will be held in Shanghai, and Ant Financial will be heavily involved, with talks from a number of its technical experts and a workshop, offering attendees a technical feast.

At this conference, Ant Financial will focus on topics including managing Kubernetes clusters, deploying and tuning deep learning tasks at large scale on Kubernetes, internet finance, and cutting-edge container security. Ant Financial has used Kubernetes in depth since 2016 and is officially recommended by the CNCF as an end-user case study.


Ant Financial not only contributes to the cloud native open source technologies around the CNCF, but has also open-sourced SOFAStack, its financial-grade cloud native distributed solution. At this conference, Ant Financial will show in a workshop how to use SOFAStack to put Service Mesh and Serverless into practice quickly. Stay tuned.

The details of the talks are as follows:

List of topics 

Co-hosting CPU and GPU workloads to achieve efficient use of resources

◈ Cen Penghao (Cooper), technical expert in platform data technology, Ant Financial ◈ He Jian, senior technical expert, container platform, Alibaba Cloud

Talk introduction

This talk describes how to co-locate AI training jobs and long-running services on a Kubernetes cluster. The main goal is to raise resource utilization, and thereby save resources, by co-locating different kinds of workloads. We will describe how we achieve co-location from several dimensions, including QoS class, cgroups, and scheduling, and how we assess utilization. Over the past few months we have built a co-located GPU and CPU cluster of a few hundred nodes, and we will introduce best practices for co-locating long-running services and AI batch jobs in a production cluster.
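
The QoS class mentioned above is something Kubernetes derives from each container's CPU and memory requests and limits. A minimal sketch of that classification rule follows, as an illustration only and not the speakers' implementation. The function name `qos_class` and the dict layout are assumptions made for this example, and quantities are compared as plain strings rather than parsed.

```python
from typing import Dict, List

Resources = Dict[str, str]  # e.g. {"cpu": "500m", "memory": "1Gi"}


def qos_class(containers: List[Dict[str, Resources]]) -> str:
    """Classify a pod as Guaranteed, Burstable, or BestEffort.

    Each container is a dict with optional "requests" and "limits" maps.
    Only cpu and memory are considered. Simplification: quantities are
    compared as strings, so "500m" and "0.5" would not count as equal.
    """
    any_set = False
    all_guaranteed = True
    for c in containers:
        limits = c.get("limits", {})
        # A missing request defaults to the limit for that resource.
        requests = {**limits, **c.get("requests", {})}
        for res in ("cpu", "memory"):
            if res in requests or res in limits:
                any_set = True
            # Guaranteed needs a limit for both resources, with request == limit.
            if res not in limits or requests.get(res) != limits[res]:
                all_guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if all_guaranteed else "Burstable"
```

In a co-located cluster, a latency-sensitive long-running service would typically be given matching requests and limits so that it lands in the Guaranteed class, while batch training jobs that can tolerate throttling or eviction fall into Burstable or BestEffort.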

No confusion: Kubernetes audit and inspection at scale

◈ Chen Jie, technical expert, Alibaba Cloud container platform ◈ Ma Jinjing, senior development engineer, Ant Financial

Talk introduction

As we all know, accurate anomaly detection and rapid problem analysis are key to keeping a Kubernetes cluster available and stable. But the Kubernetes project as a whole exposes a very large number of monitoring metrics. Taking our own Kubernetes clusters as an example, we observe several thousand such monitoring data points being produced every second. Making sensible use of this large and complex body of data and metrics, recording and analyzing it effectively, and turning it into easy-to-understand visualizations and accurate alerts is a very challenging task.

In this talk, we hope to share Alibaba's practice and experience in monitoring, auditing, and inspecting Kubernetes clusters. First, we will talk about the important Kubernetes data and metrics related to stability and how to understand them. We will then use case studies to discuss how we integrate and analyze these data and metrics. Finally, we will share Alibaba's best practices for efficient, real-time, automated inspection and analysis of the data.
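
To make the idea of turning a metric stream into alerts concrete, here is a small sketch of one common technique, a rolling z-score check. It is not the method described in the talk; the class name `RollingAnomalyDetector`, the window size, and the threshold are all assumptions chosen for illustration.

```python
from collections import deque
from statistics import mean, pstdev


class RollingAnomalyDetector:
    """Flag samples that deviate strongly from a recent sliding window."""

    def __init__(self, window: int = 30, threshold: float = 3.0):
        self.samples = deque(maxlen=window)  # most recent observations
        self.threshold = threshold           # allowed deviation, in std devs

    def observe(self, value: float) -> bool:
        """Record a sample and return True if it looks anomalous."""
        is_anomaly = False
        if len(self.samples) >= 5:  # require some history before judging
            mu = mean(self.samples)
            sigma = pstdev(self.samples)
            if sigma > 0:
                is_anomaly = abs(value - mu) > self.threshold * sigma
            else:
                # A perfectly flat history: any change is worth a look.
                is_anomaly = value != mu
        self.samples.append(value)
        return is_anomaly
```

One detector instance would be kept per metric series, for example per API server latency series, and fed each new sample as it arrives.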

Managing large-scale Kubernetes clusters effectively and reliably

◈ Zhang Yong (Cang desert), senior development engineer, Ant Financial ◈ Lin Zhixian (Xiao Lin), technical expert, Ant Financial

Talk introduction

As the business grows, we need to deploy Kubernetes to multiple data centers around the world. A single data center can have tens of thousands of nodes. The key challenge we face is how to manage multiple large-scale Kubernetes clusters across these data centers efficiently and reliably.

In this talk, we will share our experience and practices in automating large-scale cluster management. First, we will introduce fully automated node lifecycle management, and how we automatically detect and recover from node failures using NPD (node-problem-detector), the Autoscaler, and custom operators. Then we will share our experience and solutions for deploying and upgrading Kubernetes clusters. Finally, we will share a risk prevention and control system based on Prometheus and operators, which ensures cluster reliability and is capable of automatic fault detection and isolation.
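
The detect-then-recover loop described above can be pictured with a small decision function. This is a hypothetical policy sketched for illustration, not Ant Financial's operator logic: the `NodeState` type, the action names, and the failure threshold are assumptions, and the condition map only imitates the kind of node conditions a problem detector reports.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class NodeState:
    name: str
    # Condition name -> True when the problem is currently present.
    conditions: Dict[str, bool] = field(default_factory=dict)
    consecutive_failures: int = 0


def plan_action(node: NodeState, max_failures: int = 3) -> str:
    """Pick a remediation step for one node on each inspection pass."""
    if not any(node.conditions.values()):
        node.consecutive_failures = 0
        return "none"
    node.consecutive_failures += 1
    if node.consecutive_failures < max_failures:
        # Stop scheduling new pods onto the node and keep observing.
        return "cordon"
    # The problem persisted: evict pods and let the node be replaced.
    return "drain-and-replace"
```

Escalating only after several consecutive failed checks is one way to avoid draining a node because of a single transient report.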

Extended deployment for mission-critical internet finance scenarios

◈ Zhou Meng Yi (Feng Sheng), senior development engineer, Ant Financial ◈ Wu Ke (HEC), technical expert, Ant Financial

Talk introduction

The default Deployment provides a good solution for routine upgrades. However, deploying large-scale internet financial services with high availability and reliability is another matter, not to mention the workload compatibility problems that arise under an existing operations and maintenance system.

The new workload introduced by Ant Financial solves these problems. It extends the capabilities of Deployment with reliable and flexible rollout strategies, risk control, and high-performance in-place updates. In particular, it removes technical barriers faced by the financial services industry, so that developers and operators can concentrate on their core business.

Large-scale distributed deep learning on Kubernetes clusters

◈ Tang Yuan (ceremony), technical expert, Ant Financial ◈ Yong Tang, Director of Engineering, MobileIron

Talk introduction

This talk focuses on deploying distributed deep learning at large scale on Kubernetes. It will also cover how to use operators to manage and automate the machine learning training process. We will share our experience and compare two open-source Kubernetes operators: tf-operator and mpi-operator. Both operators manage TensorFlow training jobs, but they use different distribution strategies, which leads to different results in CPU, GPU, and network utilization.

Deep learning is both network-intensive and GPU-intensive, so optimizing placement properly is important. An imbalance easily leaves computing capacity unused, which is too costly for GPU nodes (compared with CPU nodes). We will share our experience in the hope of providing useful insight for getting better value for money from machine learning jobs.

Intro: SIG Cluster Lifecycle

◈ Xu Di (Xun Ming), senior R&D engineer, Ant Financial ◈ Alexander Kanevskiy, Cloud Software Architect, Intel

Talk introduction

SIG Cluster Lifecycle is the Special Interest Group focused on cluster deployment and upgrades. Our SIG works to improve the user experience of bootstrapping a minimum viable Kubernetes cluster that follows best practices. Our main installation tool, kubeadm, simplifies and manages the installation and upgrade process. We recently started a new project called Cluster API, which brings declarative, Kubernetes-style APIs to cluster creation, configuration, and management. In this session, we will introduce the SIG's mission statement, review the latest updates, and discuss our roadmap. We will also introduce some new projects under SIG Cluster Lifecycle. You are welcome to join the SIG and contribute.

Are secure sandboxes production-ready yet? Kata Containers, gVisor, and more

◈ Wang Xu (loop), senior technical expert, Ant Financial ◈ Alan Li Phang (Yates), technical expert, Ant Financial

Talk introduction

At KubeCon NA 2018, we compared Kata Containers and gVisor quantitatively. At the time, we showed reasonable CPU and network performance, the performance loss of file system storage, and the memory overhead of Kata and gVisor, among other results.

After that event, Kata Containers released version 1.5, which supports lightweight hypervisors (NEMU and Firecracker). We also introduced virtio-fs for file system sharing, which can provide better POSIX compatibility and performance. Virtio-fs integrates seamlessly with containerd shim v2, so it looks as though secure sandboxes will be able to offer better production-ready support for Kubernetes in 2019.

In this talk, we will use a benchmark suite to demonstrate the current state of these technologies and help users understand whether they are production-ready.

SOFAStack Cloud Native Workshop

Service Mesh sinks inter-service communication capabilities into the infrastructure, making applications decoupled and lightweight. But the complexity of Service Mesh itself remains, so how can Service Mesh technology be put into practice easily? At this event, we will walk you through CloudMesh, a Service Mesh hosted on the cloud, to help you put Service Mesh technology into practice with ease.

As one of the directions in which cloud native technology is moving forward, the Serverless architecture lets you further improve resource utilization and focus more on business development. In this workshop you can try out new Serverless product features such as creating applications quickly, automatic 0-1-N scaling within seconds according to service requests, quick troubleshooting by viewing logs, and time-triggered applications.

Under a microservice architecture, distributed transactions are an industry-wide problem. In this workshop, you can learn how to use the AT mode and the TCC mode of Seata, an open-source distributed transaction framework, to solve the eventual consistency problem of business data.
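
The 0-1-N scaling behaviour mentioned above can be sketched as a replica calculation. This is a simplified illustration under stated assumptions, not the product's actual algorithm: the function name `desired_replicas` and its parameters are invented for this example, and a real autoscaler would also smooth its inputs over time.

```python
import math


def desired_replicas(requests_per_second: float,
                     target_rps_per_replica: float,
                     max_replicas: int) -> int:
    """Return how many instances to run for the current request rate."""
    if target_rps_per_replica <= 0:
        raise ValueError("target_rps_per_replica must be positive")
    if requests_per_second <= 0:
        return 0  # no traffic: scale all the way down to zero
    # Enough replicas so that each one stays at or below its target load.
    needed = math.ceil(requests_per_second / target_rps_per_replica)
    # Any traffic needs at least one replica; never exceed the cap.
    return max(1, min(needed, max_replicas))
```

With a target of 50 requests per second per instance and a cap of 10, an idle application gets 0 instances, a trickle of traffic gets 1, and 120 requests per second gets 3.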
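
The TCC (Try, Confirm, Cancel) pattern named above can be illustrated without any framework. The sketch below is plain Python and does not use Seata's API. The `Account` class, its method names, and the `transfer` coordinator are hypothetical, and a real coordinator would also persist transaction state and retry failed confirms or cancels.

```python
from typing import Dict, List


class Account:
    """A participant that supports Try / Confirm / Cancel on its balance."""

    def __init__(self, balance: int):
        self.balance = balance
        self.reserved: Dict[str, int] = {}  # transaction id -> signed amount held

    def try_(self, txid: str, delta: int) -> bool:
        # Try: check the business rule and reserve resources, commit nothing.
        held_debits = -sum(v for v in self.reserved.values() if v < 0)
        if delta < 0 and self.balance - held_debits + delta < 0:
            return False
        self.reserved[txid] = delta
        return True

    def confirm(self, txid: str) -> None:
        # Confirm: apply the reserved change. pop() makes a repeat a no-op.
        self.balance += self.reserved.pop(txid, 0)

    def cancel(self, txid: str) -> None:
        # Cancel: release the reservation without touching the balance.
        self.reserved.pop(txid, None)


def transfer(txid: str, src: Account, dst: Account, amount: int) -> bool:
    """Coordinate a two-participant transfer with Try, then Confirm or Cancel."""
    steps = [(src, -amount), (dst, amount)]
    tried: List[Account] = []
    for account, delta in steps:
        if not account.try_(txid, delta):
            for done in tried:  # undo every reservation made so far
                done.cancel(txid)
            return False
        tried.append(account)
    for account, _ in steps:
        account.confirm(txid)
    return True
```

The point of the pattern is that business code defines all three phases itself, whereas an automatic mode such as AT derives the rollback from the data changes.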


Full schedule 

The actual schedule is subject to the conference's official website.


Time                       Topic
June 24, 9:00 to 16:00     SOFAStack Cloud Native Workshop
June 25, 13:35 to 14:10    Co-hosting CPU and GPU workloads to achieve efficient use of resources
June 25, 17:30 to 18:05    No confusion: Kubernetes audit and inspection at scale
June 25, 17:30 to 18:05    Managing large-scale Kubernetes clusters effectively and reliably
June 25, 16:00 to 16:35    Extended deployment for mission-critical internet finance scenarios
June 25, 16:00 to 16:35    Large-scale distributed deep learning on Kubernetes clusters
June 25, 11:00 to 11:35    Intro: SIG Cluster Lifecycle
June 25, 11:45 to 12:20    Are secure sandboxes production-ready yet? Kata Containers, gVisor, and more

 


Origin blog.csdn.net/F8qG7f9YD02Pe/article/details/93377388