Cloud Native at Scale: Alibaba Cloud Makes a Major Appearance at KubeCon China

From June 24 to 26, 2019, KubeCon + CloudNativeCon + Open Source Summit China, the cloud native technology conference organized by the Cloud Native Computing Foundation (CNCF), will open in Shanghai, China.


Following the first successful KubeCon in China in 2018, this year's event is expected to draw thousands of technologists from around the world for in-depth discussions and case studies covering every CNCF project and topic, and to hear from CNCF project maintainers and end users. The program committee of KubeCon + CloudNativeCon + Open Source Summit, made up of 75 experts, reviewed 618 proposals, and Alibaba had a total of 26 technical presentations accepted for KubeCon China 2019.


At this KubeCon, Ding Yu (Shutong), head of the Alibaba Cloud Intelligence container platform; Li Xiang, CNCF TOC member, etcd author, and senior technical expert on the Alibaba Cloud container platform; Zhang Lei, CNCF Ambassador, Kubernetes project maintainer, and senior technical expert at Alibaba Cloud; and many other cloud native experts will attend and give technical talks. They will present the latest developments in Virtual Cluster, an open source design for strongly isolated multi-tenancy; the open source OpenKruise project; the open Cloud Native App Hub; and other advanced cloud native work. We look forward to meeting you at KubeCon China, where you can talk with the Alibaba Cloud container platform team or explore technical collaboration.


The KubeCon + CloudNativeCon Alibaba Special Page Is Live


A complete overview of Alibaba Cloud's talks at this KubeCon and its cloud native ecosystem work
The "KubeCon + CloudNativeCon Alibaba special page" is now live. There you can browse the topics of Alibaba's KubeCon talks, follow curriculum updates for the "CNCF x Alibaba Cloud Native Technology Open Course", keep up with Alibaba Cloud's cloud native products, and find the schedule of the June 24 hands-on salon. Click the link below, or "Read the original" at the end of this article, to go to the special page.

Special page link: https://yq.aliyun.com/promotion/833



We recommend the following talks:

Kubernetes Is Here Now, and the Future of Cloud Native Is Bright

Speaker
Ding Yu (Shutong), Head of Container Platform, Alibaba Cloud Intelligence

Abstract
As a practitioner of cloud native applications, Alibaba Cloud not only supports the enormous traffic peaks of Double 11 (Singles' Day) but also carries the daily business of the entire Alibaba economy. This talk shares Alibaba Cloud's reflections on succeeding with Kubernetes and looks ahead to where cloud native is heading.

Keynote: Cloud Native at Alibaba Scale

Speaker: Li Xiang, Senior Technical Expert, Alibaba Cloud Container Platform

Abstract: Alibaba Cloud has successfully adopted cloud native at large scale. This talk shares those experiences with the audience, covering scalability, reliability, development efficiency, and migration strategy, and explores optimizations for large-scale scenarios. Cloud native works for Alibaba. Cloud native works for (almost) everyone.

Alibaba: Highly Available and Scalable Prometheus with Thanos


Speaker
Qin Guoan (Yanlie), Senior Technical Expert, Alibaba Cloud Container Platform
Li Tao (Lufeng), Senior Development Engineer, Alibaba Cloud Container Platform
Abstract
Alibaba Group uses Kubernetes to support the world's largest e-commerce business. Providing reliable, fine-grained monitoring and alerting that is also highly available and scalable is a real challenge. This talk shares our experience building a highly available, scalable, fine-grained monitoring system on the open source projects Prometheus and Thanos. The system supports Alibaba's cluster management system, handling 800 million TPS and 10K requests. Topics include:

  • How can Prometheus support large-scale scenarios?
  • How can Thanos solve the data query problems caused by running multiple Prometheus instances?
  • What lessons did we learn from configuring Prometheus and Thanos, for example target discovery, recording rules, and alerting rules?
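Querying several Prometheus replicas through Thanos raises the question of how to merge series that are identical except for a replica label. The sketch below illustrates only that idea under stated assumptions: the data layout and the label name `replica` are hypothetical, and the merge rule (union the samples, first replica wins on a timestamp collision) is a deliberate simplification, not the penalty-based deduplication that Thanos Querier actually uses.

```python
from collections import defaultdict

# Hypothetical layout: a series is (labels, [(timestamp, value), ...]).
Series = tuple[dict[str, str], list[tuple[int, float]]]


def dedup(series: list[Series], replica_label: str = "replica") -> dict:
    """Merge series that differ only in their replica label.

    Simplified rule (not Thanos's real algorithm): samples from all replicas
    are unioned, and on a timestamp collision the first replica seen wins.
    """
    merged = defaultdict(dict)
    for labels, samples in series:
        # Series identity once the replica label is ignored.
        key = tuple(sorted((k, v) for k, v in labels.items() if k != replica_label))
        for ts, value in samples:
            merged[key].setdefault(ts, value)
    # Return plain dicts with samples in timestamp order.
    return {key: sorted(points.items()) for key, points in merged.items()}


# Replica A missed the sample at t=30; replica B fills the gap.
a = ({"job": "api", "replica": "A"}, [(0, 1.0), (15, 2.0)])
b = ({"job": "api", "replica": "B"}, [(15, 2.5), (30, 3.0)])
print(dedup([a, b]))  # {(('job', 'api'),): [(0, 1.0), (15, 2.0), (30, 3.0)]}
```

The example shows why deduplicating across replicas helps availability: a gap in one replica is covered by the other, while the query sees a single series.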

Managing Microservices Across Regions and Clusters with Istio

Speaker
Wang Xining (Tony Ding), Senior Technical Expert, Alibaba Cloud Container Platform
Xiaozhong Liu, Backend Architect, UniCareer


Abstract
UniCareer is a professional e-learning and development platform designed for working professionals and students around the world, and it serves users in many regions. Its applications are deployed on multiple Kubernetes clusters in different Alibaba Cloud regions to reduce access latency for users in each area. Managing these microservices effectively requires a multi-cluster service mesh that controls traffic and secures service-to-service communication.


Istio is a service mesh built on Kubernetes that supports multiple topologies for managing application traffic across multiple Kubernetes clusters under unified control. Through this case study, we will share the design and techniques for managing traffic in multi-cluster deployments with the Istio service mesh, and discuss the challenges we faced and the practices we adopted given the needs and constraints of the underlying platform.

Co-locating CPU and GPU Workloads for Efficient Resource Utilization

Speaker
He Jian, Senior Technical Expert, Alibaba Cloud Container Platform
Cen Penghao (Cooper), Technical Expert, Data Technology Platform, Ant Financial




Abstract
This talk describes how to co-locate AI training jobs and long-running services in a Kubernetes cluster. The main goal is to raise resource utilization, and thereby save resources, by co-locating different kinds of workloads. We will explain how we implement co-location from several angles, including QoS classes, cgroups, and scheduling, and how we evaluate utilization. Over the past few months we built a co-located cluster of several hundred GPU and CPU nodes, and we will present best practices for co-locating long-running services and AI batch jobs in production clusters.
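The abstract lists QoS classes as one lever for co-location. Kubernetes assigns every Pod one of three QoS classes (Guaranteed, Burstable, BestEffort) from its containers' CPU and memory requests and limits. The sketch below reproduces that rule in simplified form so the classes are concrete; the `Container` type is a hypothetical stand-in rather than the Kubernetes API, and init containers, other resource types, and quantity parsing are deliberately ignored. How Alibaba maps its workloads onto these classes is not described in the source.

```python
from __future__ import annotations

from dataclasses import dataclass, field

COMPUTE = ("cpu", "memory")


@dataclass
class Container:
    # Hypothetical stand-in for a container spec; quantities are plain ints
    # (e.g. millicores, bytes) and unset resources are simply absent.
    requests: dict[str, int] = field(default_factory=dict)
    limits: dict[str, int] = field(default_factory=dict)


def qos_class(containers: list[Container]) -> str:
    """Simplified Kubernetes Pod QoS classification (CPU and memory only)."""
    any_set = False
    guaranteed = True
    for c in containers:
        for r in COMPUTE:
            limit = c.limits.get(r)
            # Kubernetes defaults an unset request to the limit.
            request = c.requests.get(r, limit)
            if request is not None or limit is not None:
                any_set = True
            # Guaranteed needs a limit on every resource, with request == limit.
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


print(qos_class([Container(limits={"cpu": 500, "memory": 2**28})]))     # Guaranteed
print(qos_class([Container(requests={"cpu": 250}, limits={"cpu": 500})]))  # Burstable
print(qos_class([Container()]))                                          # BestEffort
```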

1-5-10: Recovering Quickly from Large-Scale Container Failures


Speaker
Xiong Huan, Technical Expert, Alibaba Cloud Container Platform


Abstract
In the cloud era, as the number of containerized enterprise applications surges, the likelihood of container failures caused by manual operations and hardware faults rises sharply. Ensuring the reliability of large-scale containers without increasing resource investment has therefore become a major challenge for cloud platforms. Running millions of containers, Alibaba proposed the 1-5-10 principle for recovering from container failures: a mean time to detect (MTTD) of 1 minute, a mean time to identify (MTTI) of 5 minutes, and a mean time to resolve (MTTR) of 10 minutes. In this session we will discuss how 1-5-10 improves the reliability of large-scale containers:

  • How to build an effective node-local agent that detects problems within 1 minute;
  • How to diagnose container issues intelligently using expert knowledge;
  • How to recover from container issues automatically in a fault-driven way.
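To make the 1-5-10 targets concrete, the sketch below computes the three means from a list of incidents and checks them against 1, 5, and 10 minutes. The record format is hypothetical, and it assumes each target bounds the duration of its own stage (detect, then identify, then resolve) rather than the time since the fault began; the abstract does not say which reading Alibaba uses.

```python
from __future__ import annotations

from dataclasses import dataclass

# 1-5-10 targets from the abstract, in minutes.
TARGETS = {"MTTD": 1.0, "MTTI": 5.0, "MTTR": 10.0}


@dataclass
class Incident:
    # Minutes elapsed since the fault began (hypothetical record format).
    detected: float
    identified: float
    resolved: float


def mean_times(incidents: list[Incident]) -> dict[str, float]:
    # Assumption: each target bounds its own stage, not the time since the fault.
    n = len(incidents)
    return {
        "MTTD": sum(i.detected for i in incidents) / n,
        "MTTI": sum(i.identified - i.detected for i in incidents) / n,
        "MTTR": sum(i.resolved - i.identified for i in incidents) / n,
    }


def meets_1_5_10(incidents: list[Incident]) -> dict[str, bool]:
    return {k: v <= TARGETS[k] for k, v in mean_times(incidents).items()}


history = [Incident(0.5, 4.5, 12.5), Incident(1.5, 5.5, 17.5)]
print(mean_times(history))    # {'MTTD': 1.0, 'MTTI': 4.0, 'MTTR': 10.0}
print(meets_1_5_10(history))  # {'MTTD': True, 'MTTI': True, 'MTTR': True}
```

Note that the second incident alone would miss two targets (1.5 minutes to detect, 12 minutes to resolve); the principle is stated in terms of averages, which the example reflects.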

Understanding Kubernetes Master Scalability and Performance

Speaker
Chen Xingyu (Yumu), Senior Software Engineer, Alibaba Cloud Container Platform
Zeng Fansong (Zhuling), Senior Technical Expert, Alibaba Cloud Container Platform


Abstract
Kubernetes currently supports clusters of up to 5,000 nodes, so you may not be able to use it to manage a cluster of, say, 10,000 nodes. What are the performance bottlenecks when Kubernetes manages more than 5,000 nodes? When you push scalability to a new level, which component holds you back: etcd, the apiserver, or the scheduler? Understanding these questions is key to operating large Kubernetes clusters. At Alibaba we ran into many problems; for example, as clusters grew, pod creation became very slow. In this talk we share how we ran benchmarks and analysis to identify the bottlenecks, how we tuned the control plane components, and how we achieved a more than 100x performance improvement.

Intro: containerd


Speaker
Fu Wei (Yusong), Senior Development Engineer, Alibaba Cloud Container Platform
Lantao Liu, Software Engineer, Google


Abstract
This talk introduces containerd's architecture and design philosophy and shows how to extend containerd through its plugin mechanism to provide different image storage backends and strongly isolated container runtimes. It also demonstrates containerd integrated with runtimes such as gVisor and Firecracker, so that the audience can better understand the best ways to integrate with containerd.

Alibaba: Building Serverless with K8s, Kata Containers, and Bare-Metal Cloud Servers


Speaker
Zhang Yifei (Wupeng), Technical Expert, Alibaba Cloud Container Platform
Tang Huamin (Huamin), Senior Development Engineer, Alibaba Cloud Container Platform


Abstract
Serverless computing is a popular computing model that greatly reduces the cost for developers to deploy, manage, and run applications. On a serverless platform, workloads from different users are usually co-located on the same node, so a trusted runtime environment is needed in multi-tenant scenarios. At Alibaba we use Kata Containers as a secure container runtime and ensure strong multi-tenant isolation and runtime performance across storage, networking, hardware, and other areas. In this session, drawing on our production practice, we discuss in detail how to achieve strongly isolated, high-performance multi-tenancy and service runtimes.

Alibaba: Exploring Data-Driven Open Source Community Operations


Speaker
Zhao Shengyu (Shengyu), Senior Community Manager, Alibaba Open Source Office


Abstract
Community operations have long been a pain point in open source software development, especially for purely developer-led communities. How can we manage an open source community effectively, discover its active contributors, and use data to identify and solve community management problems? This talk covers:

  • How can we measure an individual developer's activity in a community?
  • How can we measure the overall activity of an open source community?
  • What insights can we draw from analyzing the patterns of today's top open source projects?
  • What role should community management tools play in an open source community?
  • Based on the above, what has Alibaba tried, and what results has it achieved?


Alibaba: Lessons from an E-commerce Giant's Evolution to Cloud Native

Speaker
Zhang Lei, Senior Technical Expert, Alibaba Cloud Container Platform
Wang Siyu (Jiuzhu), Senior Development Engineer, Container Platform


Abstract
Migrating a global e-commerce giant like Alibaba to a cloud native platform is no easy task. In this talk we share the lessons we learned over the past year from both an engineering and a community perspective, including:

  • What were the main obstacles to migrating Alibaba to cloud native technology?
  • What was Alibaba's main technical debt? How did we address it, and was our approach effective?
  • What do you do if your application management and organizational structure are completely different from Kubernetes?
  • Why is predictability essential for e-commerce? Is Kubernetes predictable out of the box? If not, why not, and how can this be solved (if it can be solved at all)?
  • How do we verify scalability in clusters with thousands of nodes?
  • Can a large team achieve win-win collaboration with the upstream community?

Intro: Dragonfly

Speaker
Hu Zuozheng (Zhengxi), Technical Expert, Alibaba Cloud Application Platform O&M
Zhang Jin (Taiyun), Senior Development Engineer, Alibaba Cloud Application Platform O&M


Abstract
As container technology is adopted more widely across the industry, distributing images safely and efficiently has become a new challenge for engineers. Dragonfly is an open source, P2P-based intelligent image and file distribution system that aims to solve distribution problems in cloud native scenarios. Dragonfly currently focuses on the following areas:

  • Simple: a well-defined, user-facing HTTP API that is non-invasive to all container engines
  • Efficient: CDN support and P2P-based file distribution to save enterprise bandwidth
  • Intelligent: host-level speed limits and intelligent traffic control based on host detection
  • Secure: encrypted block transfers and support for HTTPS connections


In this talk we focus on distributing container images with Dragonfly. We review the challenges organizations face, including large-scale distribution, transfer security, and bandwidth cost, present solutions to them, and discuss real-world use cases.

No Confusion: Auditing and Inspecting Kubernetes at Massive Scale


Speaker
Chen Jie, Technical Expert, Alibaba Cloud Container Platform
Ma Jinjing, Senior Development Engineer, Ant Financial


Abstract
Accurate anomaly detection and fast problem analysis are key to keeping Kubernetes clusters available and stable. Yet Kubernetes exposes a huge volume of monitoring metrics. In our own Kubernetes clusters, for example, we observe thousands of such monitoring data points generated every second. Making sensible use of this large and complex body of data and metrics, recording and analyzing it effectively, and turning it into easy-to-understand visualizations and accurate alerts is a very challenging task.


In this talk we share Alibaba's practices and experience in monitoring, auditing, and inspecting Kubernetes clusters. First, we discuss the important Kubernetes data and metrics related to stability and how to interpret them. Next, through case studies, we describe how we integrate and analyze these data and metrics. Finally, we share Alibaba's best practices for efficient, real-time, automated inspection and analysis.

Minimizing the Cost of Running Deep Learning on GPUs with Kubernetes

Speaker
Zhang Kai, Senior Technical Expert, Alibaba Cloud Container Platform
Che Yang (Biran), Technical Expert, Alibaba Cloud Container Platform


Abstract
More and more data scientists run deep learning workloads on NVIDIA GPUs in Kubernetes. At the same time, they find that idle GPUs in their clusters waste more than 40% of the cost. Improving GPU utilization has therefore become a major challenge. We will present a Kubernetes-native GPU sharing solution:

  • How to define an API for shared GPUs
  • How to schedule shared GPUs without changing the scheduler's code
  • How to integrate GPU isolation solutions with Kubernetes
  • A demo showing users running different TensorFlow jobs on the same GPU device in a Kubernetes cluster
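One way to picture GPU sharing is as a packing problem: each GPU has some free memory, and each job asks for a slice of it. The sketch below places memory requests with a best-fit rule (pick the GPU with the least free memory that still fits). Everything here is hypothetical and chosen for illustration: the functions, the GiB unit, and the best-fit policy are not taken from the talk, and the sketch covers only placement accounting, not isolation or the Kubernetes API the speakers describe.

```python
from __future__ import annotations


def pick_gpu(free_mem: list[int], request: int) -> int | None:
    """Best-fit: the GPU with the least free memory that still fits the request.

    free_mem[i] is unallocated memory on GPU i (hypothetical unit: GiB).
    Returns a GPU index, or None if no single GPU has room.
    """
    fits = [(free, i) for i, free in enumerate(free_mem) if free >= request]
    if not fits:
        return None
    _, index = min(fits)  # ties go to the lower index
    return index


def place(free_mem: list[int], requests: list[int]) -> list[int | None]:
    """Place memory requests one by one, updating a copy of the free list."""
    free = list(free_mem)
    placements = []
    for req in requests:
        index = pick_gpu(free, req)
        if index is not None:
            free[index] -= req
        placements.append(index)
    return placements


# Two 16 GiB GPUs and five requests; the last one no longer fits anywhere.
print(place([16, 16], [4, 10, 8, 6, 4]))  # [0, 0, 1, 1, None]
```

Best-fit packs jobs tightly and keeps larger free blocks available for big requests; a spread policy would instead favor the emptiest GPU to reduce contention between co-located jobs.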

Three Ways to Accelerate Image Distribution in the Cloud Native Era


Speaker
Jiang Yong (Yifang), Technical Expert, Alibaba Cloud Container Platform


Abstract
This talk shares practices and lessons learned from improving image distribution efficiency at Alibaba's network scale. We use different image distribution methods for different scenarios. P2P distribution based on CNCF Dragonfly is the most direct way to relieve bandwidth pressure on the central registry and cut distribution time. Beyond that, with a remote filesystem snapshotter for CNCF containerd, images stay in remote storage and the container engine reads image content over the network, so distribution takes almost no time. Because this second approach depends on network stability, how can image content be loaded dynamically from remote storage to local storage, driven by read requests, as a trade-off? Finally, we summarize how to choose an image distribution method.

Dynamically Adjusting Pod Resource Limits in 10,000-Node Clusters


Speaker
Wang Cheng, Technical Expert, Alibaba Cloud Container Platform
ROCKETS (Tongyuan), Technical Expert, Alibaba Cloud Container Platform


Abstract
For a global e-commerce giant at Alibaba's scale, the number and variety of applications are enormous. Managing resources for all these containers scientifically and rationally has always been a huge challenge for us. In this talk we share our practical experience and technical results from the perspectives of technology and community evolution, including:

  • What is the current state of container resource management in the community?
  • What specific challenges does application deployment at Alibaba's scale present?
  • How do we diagnose and treat the stubborn problems of resource management?
  • How do we significantly improve resource utilization while keeping online services stable?
  • How do we balance the evolution toward cloud native with the need for rapid delivery?
  • What can our experience offer you, and how can we give back to the community for a win-win result?




Overview of Alibaba's Technical Presentations at KubeCon China 2019




Special Offer


We have prepared a special 50% discount code for tickets to KubeCon + CloudNativeCon + Open Source Summit, held in Shanghai on June 24-26, 2019!


Follow the Alibaba Cloud Native official account and send "优惠" ("discount") in the chat to receive the discount code immediately! We look forward to seeing you in Shanghai!



To buy tickets for KubeCon + CloudNativeCon + Open Source Summit, scan the QR code directly.


Source: yq.aliyun.com/articles/705161