Disaster Recovery Switching Time Cut by 99%: How Cloud-Edge Collaboration Improves the Efficiency and Stability of Film and Performance Services

After three years of silence, the offline performance market is seeing a "revenge" rebound. At a long-awaited concert, whether ticket checking runs smoothly and whether you can get into the venue quickly directly shapes every audience member's experience of, and opinion about, the whole performance service. Many readers will know this from personal experience.

Alibaba Pictures Group is an Internet-driven film and television company. It operates an entertainment platform spanning the entire industry chain: content production, online promotion and distribution, licensing and integrated development of derivative products, cinema ticketing management, and data services. It is an important vertical business of Alibaba Culture and Entertainment Group. Through technology and architectural innovation, Alibaba Pictures has been leading the digital and intelligent transformation of the entire industry chain.

As industry traffic explodes, film and live-performance scenarios keep expanding and diversifying. To meet the severe pressure this puts on on-site service efficiency, system stability, and high availability, Alibaba Pictures has built a cloud-edge-device integrated hybrid cloud architecture for on-site film and performance services, based on ACK@Edge, Alibaba Cloud's edge container service. By supporting access for massive numbers of heterogeneous devices and delivering high availability, high stability, and scalability, the architecture can meet future needs for highly latency-sensitive operational messaging and rapid business growth. Notably, this is the first deployment in the performance industry to achieve cloud-edge-device integration and cloud-native collaboration between cloud services and edge clusters, and in June 2023 it received the "Trusted Edge Best Practice Case" award from the China Academy of Information and Communications Technology.

Heavy Crowds and Complex Environments: Why On-site Film and Performance Services Need Cloud-Edge Collaboration

In offline performance scenarios, Alibaba Pictures serves three main groups: consumers, organizers, and regulators. Regulators require safety and stability; consumers need accurate ticket verification and fast entry; and organizers, in addition to all of the above, also need costs kept as low as possible.

The on-site service management platform is Alibaba Pictures' core business system. Network infrastructure varies widely between venues and on-site traffic spikes sharply over short periods, yet the business system depends heavily on network availability, so its robustness could not be guaranteed. As on-site services grew rapidly, serious bottlenecks emerged on the edge and device side: rules for multiple on-site projects could not be configured in a coordinated way, multiple venues could not be monitored and managed centrally, and massive numbers of heterogeneous devices could not be operated, maintained, and scheduled uniformly. Traditional cloud-to-device and device-to-device architectures could no longer meet the needs of real edge scenarios. Solving these bottlenecks, and the expansion problems ahead, required a highly available, highly stable, and scalable cloud-edge-device integrated hybrid cloud architecture capable of connecting massive numbers of heterogeneous devices:

  • Convenient cloud-edge collaboration: With the rapid development of cloud computing, edge computing, and the Internet of Things, demand for collaborative work keeps growing. Cloud-edge collaboration makes full use of these technologies to give users a more efficient and convenient way of working together.
  • Efficient data processing: Data has become one of the most important assets of businesses and organizations. With the explosive growth of film and performance data, demand for data storage, processing, and analysis has risen accordingly. Cloud-edge collaboration helps users manage and use these data resources better and, because it transcends geographic and time-zone limits, improves work efficiency.
  • Low latency: In film and performance scenarios, real-time requirements for data processing and feedback are very high. Cloud-edge collaboration uses edge computing to process data quickly on local devices, reducing latency and meeting these real-time requirements.
  • Significant cost reduction: Cloud-edge collaboration performs part of the data processing on local devices, which reduces the volume of data sent over the network and therefore lowers bandwidth requirements and communication costs. It also uses machine resources more efficiently, cutting hardware purchase and transportation costs.
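To make the low-latency and bandwidth points concrete, here is a minimal sketch, assuming a hypothetical `EdgeGate` class that validates tickets against a locally held list and sends the cloud only aggregate counts instead of every scan event. It is an illustration of the general idea, not Alibaba Pictures' implementation; a real deployment would also have to share used-ticket state across gates and reconcile it with the cloud.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class EdgeGate:
    """Hypothetical edge-side ticket checker: verifies locally, uploads only summaries."""
    valid_tickets: set
    seen: set = field(default_factory=set)
    stats: Counter = field(default_factory=Counter)

    def check(self, ticket_id: str) -> bool:
        # The admit/reject decision is made locally, so no cloud round trip
        # sits on the critical path of each scan.
        if ticket_id in self.valid_tickets and ticket_id not in self.seen:
            self.seen.add(ticket_id)
            self.stats["admitted"] += 1
            return True
        self.stats["rejected"] += 1
        return False

    def summary_for_cloud(self) -> dict:
        # Only aggregate counters leave the venue, not every raw scan event.
        return dict(self.stats)
```

Batching results this way trades some freshness in the cloud view for far less upstream traffic, which is the cost argument made above.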

Alibaba Pictures' Cloud-Edge Collaborative IoT Architecture Practice Based on ACK@Edge

Alibaba Pictures' edge IoT service system uses a cloud-edge-device collaboration architecture designed for on-site ticket-checking scenarios. Its core principles are cloud-controlled edge, edge autonomy, and device intelligence. By realizing cloud-edge collaboration and diversified deployment, it delivers highly available, high-performance, and highly scalable services on site.

ACK@Edge is a cloud-edge integrated hosting solution that Alibaba Cloud Container Service offers for edge computing scenarios. For large-scale edge computing, ACK@Edge holds the "excellent node management" product capability certification from the China Academy of Information and Communications Technology. It enhances native Kubernetes non-intrusively to support unified application lifecycle management and unified resource scheduling in edge computing scenarios, so enterprises can focus on developing and managing containerized applications.

Figure 1: Alibaba Cloud Edge Container Service ACK

In its overall architecture, the cloud-edge-device integrated hosting solution pushes cloud computing capabilities down to the edge and device sides, focusing on storage, networking, security, monitoring, logging, and related capabilities. For cluster management, the API server and scheduler include extensive built-in performance optimizations. For the cloud-edge network, optimizations to the Flannel network plugin greatly reduce cloud-edge traffic overhead. Together with edge traffic management, a lightweight footprint, and native operations API support, these features natively support unified application lifecycle management and unified resource scheduling in edge computing scenarios and keep edge workloads stable.

ACK@Edge has been widely used in CDN, real-time audio and video cloud services, online education, transportation, smart city, smart industry, IoT, logistics, water conservancy, energy, agriculture and other scenarios.

Alibaba Pictures' cloud-edge-device collaboration architecture uses ACK@Edge as the hosting foundation for scheduling its underlying cloud-native edge infrastructure, and relies on the edge autonomy, edge management, and service operations capabilities of ACK@Edge to implement the design principles of cloud-controlled edge and edge autonomy.

Figure 2: The overall architecture of Alibaba Pictures' cloud-side collaboration solution

In practice, on-site edge servers are spread across venues and their locations change from event to event. Typically, before an edge server ships from the factory, it is joined as a node to the ACK@Edge cluster's master, and business deployment, operations, and control are then handled from a self-built device monitoring platform in the cloud. ACK@Edge's edge autonomy keeps services on on-site nodes starting and running normally even when the network is extremely weak or unavailable, so ticket exchange and ticket checking continue to work on site. In addition, ACK@Edge's observability features monitor on-site service nodes and raise alarms, so problems with on-site services are discovered proactively and their availability is maintained.
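Edge autonomy of this kind is commonly built around a node-local proxy that caches the last known good responses from the cloud control plane and serves them when the link drops; the open-source OpenYurt project, on which ACK@Edge is built, does this with its YurtHub component. The following is only a toy model of that idea under stated assumptions: `CachingEdgeProxy`, `FetchError`, and the `upstream` callable are hypothetical names, not YurtHub's code or API.

```python
from typing import Callable, Dict


class FetchError(Exception):
    """Raised by the upstream fetcher when the cloud is unreachable (hypothetical)."""


class CachingEdgeProxy:
    """Toy model of a node-local caching proxy (not YurtHub's implementation)."""

    def __init__(self, upstream: Callable[[str], str]):
        self.upstream = upstream          # function that queries the cloud control plane
        self.cache: Dict[str, str] = {}   # last known good response per resource key

    def get(self, key: str) -> str:
        try:
            value = self.upstream(key)    # online: fetch fresh state from the cloud
            self.cache[key] = value       # and remember it for offline use
            return value
        except FetchError:
            if key in self.cache:         # offline: fall back to the cached copy
                return self.cache[key]
            raise                         # nothing cached: cannot serve autonomously
```

The key design point is that the cache is filled during normal operation, so a node that has been online at least once can keep restarting its workloads after the venue loses connectivity.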

2.1 Efficient edge service customization management

Figure 3: Edge Service Orchestration

ACK@Edge provides a higher level of abstraction for managing multiple Deployments together, including creating, updating, and deleting them. A single template defines an application, and multiple workloads are deployed from it to different regions, with each region defined as a node pool.

Currently, unitized deployment supports two workload types: StatefulSet and Deployment. Based on the node pool configuration in a unitized deployment, the controller creates a child workload resource object for each pool, and each object has its own expected number of replica Pods. A single unitized deployment instance automatically maintains multiple Deployment or StatefulSet resources while allowing differentiated per-pool settings such as Name, NodeSelectors, and Replicas.
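The template-plus-node-pool expansion described above can be sketched as follows. This is a simplified model of what such a controller produces, assuming manifests are plain dicts shaped like Kubernetes `apps/v1` Deployments; the function name, the pool list format, and the label key `example.com/nodepool` are hypothetical, and the real controller also reconciles existing objects, handles deletions, and supports StatefulSet templates.

```python
import copy
from typing import Any, Dict, List


def expand_unitized_deployment(name: str, template: Dict[str, Any],
                               pools: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Render one Deployment manifest per node pool from a shared template."""
    workloads = []
    for pool in pools:
        spec = copy.deepcopy(template)       # each pool gets an independent copy
        spec["replicas"] = pool["replicas"]  # per-pool replica count
        pod_spec = spec.setdefault("template", {}).setdefault("spec", {})
        # Pin the Pods to this pool's nodes; the label key is hypothetical.
        pod_spec["nodeSelector"] = {"example.com/nodepool": pool["name"]}
        workloads.append({
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "metadata": {"name": f"{name}-{pool['name']}"},  # per-pool name
            "spec": spec,
        })
    return workloads
```

With this shape, adding a venue or changing its replica count only means editing the pool list, while the shared template stays untouched.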

The on-site operations management platform provides service discovery for edge devices and differentiated configuration of edge services. It adjusts deployment configurations dynamically to match each event's on-site needs and relies on ACK@Edge for efficient, customized management of edge services.

2.2 Edge autonomy, seamless automatic switching of node tasks

On-site ticket checking for large and very large performances demands more reliability than other kinds of venues. This places higher requirements on equipment reliability, and the mean time between equipment failures must support round-the-clock ticket checking. At the same time, on-site disaster recovery must detect failures and switch services automatically, reducing the time on-site operations staff spend troubleshooting. On-site ticket-checking equipment must reach a reliability of 0.999 or higher and provide service disaster recovery through both multi-machine operation and cloud-integrated failover.
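As a back-of-the-envelope illustration of why multi-machine operation helps reach a 0.999 target, assume (a simplification) that machines fail independently and the service stays up while at least one machine is up. The combined availability of n machines, each with availability a, is then 1 - (1 - a)^n. The per-machine figures used below are hypothetical, not measurements from the article.

```python
def parallel_availability(a: float, n: int) -> float:
    """Availability of n redundant machines, each independently available with probability a.

    The service is down only when all n machines are down at the same time.
    """
    if not 0.0 <= a <= 1.0 or n < 1:
        raise ValueError("a must be in [0, 1] and n >= 1")
    return 1.0 - (1.0 - a) ** n


def yearly_downtime_hours(availability: float) -> float:
    """Expected downtime per 365-day year for a given availability."""
    return (1.0 - availability) * 365 * 24
```

Under these assumptions, two machines that are each 99% available give about 99.99%, while 0.999 on its own still allows roughly 8.76 hours of downtime a year. Correlated failures (shared power or a shared uplink) break the independence assumption, which is one reason cloud-side failover is listed alongside local redundancy.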

Edge autonomy is the ability of edge nodes to negotiate, make decisions, and execute tasks independently. It makes edge nodes more intelligent, lets them adapt automatically to environmental changes, and keeps the system stable and reliable. Seamless automatic switching means that when a node in an edge computing system fails or becomes unavailable, the system automatically moves its tasks to other nodes, achieving seamless task handover and fault tolerance. Together, edge autonomy and seamless automatic switching let edge computing schedule tasks and use resources more flexibly and efficiently while improving reliability and fault tolerance.
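One common way to implement the automatic switching described above is heartbeat-based failover: each node reports periodically, and traffic goes to the highest-priority node whose last heartbeat is recent enough. The sketch below assumes that approach; `FailoverGroup`, the priority list, and the timeout are hypothetical and are not taken from Alibaba Pictures' system. The clock is injectable so the behavior can be tested without waiting.

```python
import time
from typing import Callable, Dict, List, Optional


class FailoverGroup:
    """Hypothetical heartbeat-based failover among on-site nodes (illustration only).

    The first healthy node in priority order serves traffic; a node counts as
    unhealthy once its last heartbeat is older than `timeout` seconds.
    """

    def __init__(self, nodes: List[str], timeout: float,
                 clock: Callable[[], float] = time.monotonic):
        self.nodes = nodes            # priority order: primary first, then backups
        self.timeout = timeout
        self.clock = clock
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, node: str) -> None:
        self.last_seen[node] = self.clock()

    def active(self) -> Optional[str]:
        now = self.clock()
        for node in self.nodes:
            seen = self.last_seen.get(node)
            if seen is not None and now - seen <= self.timeout:
                return node           # highest-priority node that is still alive
        return None                   # no healthy node; the caller may fall back to the cloud
```

Because the decision is made locally from heartbeats, switchover needs no operator action, which is the property the "no manual monitoring" claim in the results relies on.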

On the device side, terminals connect to both the edge and the cloud to provide ticket-checking capabilities. A decision-making SDK on the device assesses network status and monitors service behavior to decide automatically whether to connect to the edge or to cloud services. The edge exchanges data with the cloud over multiple channels through a data synchronization service, keeping cloud and edge data consistent.
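The article does not describe the SDK's actual decision logic, so the following is only a plausible policy sketched under stated assumptions: the device probes both endpoints, prefers the on-site edge when it answers within a latency budget, otherwise uses the cloud if it does, and otherwise picks whatever is reachable. `ProbeResult`, `choose_endpoint`, and the 200 ms budget are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    reachable: bool
    latency_ms: Optional[float] = None  # None when the endpoint did not answer


def choose_endpoint(edge: ProbeResult, cloud: ProbeResult,
                    max_latency_ms: float = 200.0) -> str:
    """Pick where a gate device sends ticket checks (hypothetical policy)."""
    candidates = (("edge", edge), ("cloud", cloud))  # edge preferred over cloud
    # First pass: an endpoint that is reachable and within the latency budget.
    for name, probe in candidates:
        if (probe.reachable and probe.latency_ms is not None
                and probe.latency_ms <= max_latency_ms):
            return name
    # Second pass: degraded, but anything reachable beats nothing.
    for name, probe in candidates:
        if probe.reachable:
            return name
    return "offline"  # neither reachable: the caller must queue or use a local fallback
```

Preferring the edge keeps ticket checks on the venue network, while the cloud path acts as the "cloud-integrated" backup when the edge server itself is down.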

2.3 Cloud-edge collaboration to ensure consistent on-site rules

Cloud-edge collaboration combines cloud computing and edge computing into a more efficient, flexible, and reliable computing model. Keeping on-site rules consistent across multiple instances means that, in an edge computing environment, the same applications and services can be quickly replicated, deployed, and managed to serve multiple on-site nodes. Together, cloud-edge collaboration and consistent multi-instance rules distribute computing resources and applications to edge nodes more effectively, improving system response speed and performance while meeting the diverse needs of each site.

Specifically, the cloud controls deployment across the central cloud and the edge clouds and actively coordinates with the edge, pushing data to the edge for real-time collaboration. Cloud projects and edge projects share the same on-site rules; cloud and edge configuration is coordinated centrally, and changes made at the edge flow back to the cloud; on-site rules stay consistent across cloud, edge, and multiple instances; and the cloud controls the edge while devices get fast access to it. The goal is summed up as "the cloud controls the devices, the edge reports back to the cloud, and everything stays consistent."
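The article does not specify the synchronization protocol, but one simple way to get "cloud controls, edge reports back, everything consistent" is a versioned rule set: the cloud holds the authoritative copy, edges pull a full snapshot whenever they are behind, and edge-originated changes are accepted only if they were made against the latest version (optimistic concurrency). The classes below are hypothetical and serve only to illustrate that technique.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RuleStore:
    """Versioned key -> rule map; the version increases on every accepted write."""
    version: int = 0
    rules: Dict[str, str] = field(default_factory=dict)


class CloudRules(RuleStore):
    def write(self, key: str, value: str, base_version: int) -> bool:
        """Accept a change (from the cloud console or edge reflow) only if based on the latest version."""
        if base_version != self.version:
            return False               # stale writer must re-sync first
        self.rules[key] = value
        self.version += 1
        return True


def sync_edge(edge: RuleStore, cloud: CloudRules) -> bool:
    """Pull the cloud's rules into an edge replica if it is behind; returns True if updated."""
    if edge.version >= cloud.version:
        return False
    edge.rules = dict(cloud.rules)     # full snapshot keeps every edge identical to the cloud
    edge.version = cloud.version
    return True
```

Full snapshots are the simplest way to guarantee identical rules at every venue; with large rule sets, shipping diffs keyed by version would be the natural refinement.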

Figure 4: Consistent Field Rules

2.4 Service Security and Intelligent Health Checks

In an edge computing environment, service security refers to the technologies and strategies that protect data and services from attack and misuse. Long data transmission paths, complex network topologies, and high security risks make service security especially important in edge computing scenarios. Intelligent health checks, in turn, perform comprehensive security checks and analysis of edge devices, the network environment, and services, so that security risks are found and resolved promptly and the system stays secure and stable. Together, service security and intelligent health checks improve the security and reliability of edge computing systems and protect the security and availability of data and services. Alibaba Pictures' IoT cloud-edge system was designed with both in mind to keep the system secure and reliable.

Edge services automatically check a range of system indicators and upload the health-check metrics. They detect and repair problems on their own or guide on-site staff through fixing alarms, and send detection data to the cloud in real time, so that problems on every on-site edge server are known, repaired, and handled early.
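The detection step can be sketched as a threshold comparison that turns collected metrics into a report with remediation hints for on-site staff. This minimal sketch assumes the metrics are already collected as a name-to-value map; the metric names, thresholds, and advice strings are hypothetical, and the upload to the cloud is left out.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Check:
    metric: str
    max_value: float
    advice: str  # remediation hint shown to on-site staff


# Hypothetical thresholds; real values depend on the hardware and workload.
CHECKS = [
    Check("cpu_percent", 85.0, "Reduce load or move a workload to the backup node"),
    Check("disk_percent", 90.0, "Clean up old logs and image layers"),
    Check("gate_latency_ms", 300.0, "Check the venue network link to the edge server"),
]


def run_health_check(node: str, metrics: Dict[str, float]) -> Dict[str, object]:
    """Compare collected metrics with thresholds and build a report to upload to the cloud."""
    alarms: List[Dict[str, object]] = []
    for check in CHECKS:
        value = metrics.get(check.metric)
        if value is None:
            # A missing metric is itself a problem: the collector may be down.
            alarms.append({"metric": check.metric, "problem": "missing",
                           "advice": "Check the metrics agent on this node"})
        elif value > check.max_value:
            alarms.append({"metric": check.metric, "value": value,
                           "advice": check.advice})
    return {"node": node, "healthy": not alarms, "alarms": alarms}
```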

Figure 5: Intelligent physical examination

ACK@Edge Helps Alibaba Pictures' IoT Cloud-Edge Collaboration Increase Efficiency and Reduce Costs

By using ACK@Edge as the foundation of its IoT cloud-edge-device architecture, Alibaba Pictures has connected its existing cloud PaaS platform with edge-side service configuration management for on-site film and performance services, and extended cloud-native capabilities to the edge. The combined collaboration of cloud, network, and edge meets strong demands for fast response, low latency, and massive numbers of connections.

The architecture is now in production for on-site services: across more than 200 projects, nearly 100,000 tickets have been checked in total, and business results have improved in several respects:

  1. Containerizing services solved the stability problems caused by the lack of resource isolation, unified device operating systems and configuration environments, reduced on-site device compatibility issues by 98%, made deployment by on-site staff more than 45% faster, and lowered on-site staffing costs;
  2. Edge disaster recovery handles load balancing within the local cluster without manual monitoring or intervention, cutting switching time by 99% and making switchover between primary and backup machines smooth and imperceptible to users. This greatly strengthens the disaster recovery capability of on-site services and, while keeping services stable, also improves the ticket-checking experience: each ticket check completes within 1 second, and average ticket-checking time per person dropped by 70%;
  3. More efficient use of machine resources allows multiple nodes to run on one machine, cutting hardware investment and deployment costs by 50%;
  4. Edge device management supports image release, rollback, and upgrades on edge devices, along with monitoring data and service discovery. It enables unified remote management and control of all nodes and synchronized version releases across them, reducing entry problems caused by inconsistent or outdated versions.

By implementing the ACK@Edge-based cloud-edge integrated collaboration architecture, Alibaba Pictures has expanded into more performance-industry scenarios and improved overall service stability and availability. This has significantly strengthened organizers' trust in Alibaba Pictures and raised consumer satisfaction, providing important support for Alibaba Pictures' leading position in on-site services.

Going forward, Alibaba Pictures will continue its "content + technology" dual-engine strategy: accelerating its upstream content portfolio, strengthening its technology advantages, continuously improving operational efficiency, and diversifying its business. Alibaba Cloud Container Service will continue to work closely with customers' businesses, helping Alibaba Pictures deliver a rich and satisfying entertainment experience to users, the market, and the industry.

To learn more about the product details of ACK@Edge, Alibaba Cloud's edge container service, you are welcome to join the DingTalk exchange group (group number: 21976595).

Origin my.oschina.net/u/3874284/blog/10090435