EMAS mobile DevOps solution-Mobile DevOps

Alibaba Cloud cloud native application development platform EMAS Peng Zhao (Zhou Mu)

1. Introduction to Mobile DevOps

1. What is mobile DevOps

1) DevOps as everyone knows

By 2020, DevOps is no longer a new concept. Most people have their own understanding of it, yet describing precisely what DevOps is turns out to be difficult. In fact, the industry has not yet agreed on a single definition. DevOps is hard to define because it is a concept, or even a collection of concepts, and is hard to make concrete. Literally, the word "DevOps" covers the full software life cycle from Dev (Development) to Ops (Operations). Among the many definitions available, I personally find the one from Azure DevOps [1] the most precise and specific:

DevOps is a compound word of development (Dev) and operation (Ops). It combines people, process and technology to continuously provide value to customers.
What does DevOps mean to the team? DevOps enables previously isolated roles (development, IT operations, quality engineering, and security) to coordinate and collaborate to produce better, more reliable products.
By adopting DevOps culture, practices, and tools, the team can better respond to customer needs, increase confidence in the applications they build, and achieve business goals faster.

The key points of this definition can be summarized as follows:
① The combination of people, process, and technology
② DevOps enables coordination and collaboration of previously isolated roles
③ DevOps is a concept that requires both a culture and the support of automation tools
④ The purpose is to produce better and more reliable products faster

2) From DevOps to mobile DevOps

DevOps is usually discussed in a server-side context. Since DevOps is an excellent software delivery concept, why not apply it to mobile delivery as well? That is the mobile DevOps introduced in this article.
Because mobile and server-side scenarios differ, mobile DevOps differs considerably from server-side DevOps, mainly in the following respects:

Mobile application automation is more complicated

• Build environment fragmentation

The Android and iOS platforms require build environments based on different operating systems and build toolchains. Even within one platform, the build toolchain is fragmented across versions. For example, multiple versions of the Android SDK and Gradle, on which Android builds depend, must be supported at the same time, and the same is true of the Xcode and Ruby versions on which iOS builds depend.

• Mobile builds involve data security issues such as certificate hosting
• The Mac devices that iOS builds depend on are non-standard equipment in a data center

A Mac is not a standard server and cannot be deployed in a standard data center. Teams usually have to build their own Mac machine room, which makes operability and stability a challenge.

Automated builds are an indispensable capability in DevOps, so mobile DevOps has to solve the client-side build automation and one-click packaging problems described above through technical means.

Mobile device fragmentation is severe, and application compatibility is a major challenge

Unlike the consistent deployment environment on the server side, the runtime environment of mobile applications is highly fragmented, and achieving compatibility test coverage is much harder than on the server side. Fragmentation is particularly serious on Android, mainly in the following respects:

• Fragmentation of phone models

The Android market has many phone manufacturers and a huge number of models, and manufacturers "optimize" the system at a low level. In theory, any model not covered by testing may have compatibility issues. The figure below shows the distribution of top Android models published by Baidu Statistics Traffic Research Institute [2] as of October 2020. The top 10 models together hold less than 15% of the market, which shows how serious model fragmentation is.
image.png

• Fragmentation of operating system version

Operating system differences affect application behavior even more directly. It is not uncommon for a major system upgrade to break application compatibility. Every major OS release is a test of application compatibility, and while supporting new systems, an application cannot abandon users of older ones.
The figure below shows the Android version distribution from Baidu Statistics Traffic Research Institute as of October 2020. Android 10.0 had been out for more than a year, yet its market share was below 50%, and operating systems released two years earlier were still mainstream.
image.png

Due to the fragmentation of end devices, mobile DevOps is required to have mobile testing capabilities to automatically complete a large number of real-device compatibility tests.

Long mobile app release and update cycle

Two weeks after a new application version is released, its update rate may still be below 50%, whereas on the server side a release can reach every server in a short time. A long release cycle raises the cost of mistakes: a version with a bug may take a long time to be replaced through updates.

This requires two things of mobile DevOps. First, it needs a complete gray release mechanism, so that a problematic application is not pushed to all users at once. Second, once a buggy version has shipped, it needs hotfix capability, which fixes bugs faster and with less overhead by releasing incremental patch packages.
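
As a rough illustration of the gray release idea, the sketch below hashes each device ID into a stable bucket and offers the new version only to buckets below the current rollout percentage. The function names, the device-ID format, and the use of SHA-256 are assumptions made for this example; the article does not describe how EMAS implements gray release.

```python
import hashlib


def rollout_bucket(device_id: str, release: str) -> int:
    """Map a device to a stable bucket in [0, 100) for a given release."""
    digest = hashlib.sha256(f"{release}:{device_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def in_gray_release(device_id: str, release: str, percent: int) -> bool:
    """Return True if the device falls inside the current rollout percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return rollout_bucket(device_id, release) < percent


if __name__ == "__main__":
    # Hypothetical device IDs, used only to show how coverage grows.
    devices = [f"device-{i}" for i in range(1000)]
    for percent in (1, 10, 50, 100):
        reached = sum(in_gray_release(d, "v2.0.0", percent) for d in devices)
        print(f"{percent:>3}% rollout reaches {reached} of {len(devices)} devices")
```

Because the bucket depends only on the release and the device ID, a device that received the update at 10% still receives it at 50%, so the rollout can be widened step by step or halted if monitoring reports problems.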

Mobile applications run on a large number of mobile devices

Server-side services run in specific clusters that can be managed, controlled, and operated in a unified way. Mobile applications, by contrast, run on users' phones, and for super apps such as Mobile Taobao that means devices on the order of a billion.

This requires mobile monitoring products that use big data technology to monitor applications on end devices, and even a remote logging feature that pulls error logs from designated devices to locate and troubleshoot errors.

Based on the points above, and with reference to the software delivery life cycle in the DevOps definition, the mobile DevOps application life cycle and the capabilities required at each stage can be summarized as follows:
image.png

2. What is Mobile DevOps

1) Mobile DevOps is a concrete realization of the EMAS mobile DevOps concept

First, an introduction to EMAS (Enterprise Mobile Application Studio). EMAS is a leading cloud-native application development platform in China (for mobile apps, H5 applications, mini programs, web applications, and so on) from Alibaba Cloud. Built on a wide range of cloud-native technologies (Backend as a Service, Serverless, DevOps, low code, and others), it aims to provide enterprises and developers with one-stop application R&D management services covering the entire application life cycle: development, testing, operation and maintenance, and business operations. For more information about EMAS, see the EMAS details page on the Alibaba Cloud official website.
Mobile DevOps is the concrete product realization of the EMAS mobile DevOps concept. It is the central product of EMAS: it links all EMAS products together to realize the mobile DevOps concept described above. As shown in the figure above, Mobile DevOps connects products that were previously isolated in separate life-cycle stages into a complete closed loop, upgrading EMAS from a mobile middleware platform to a mobile R&D platform. Mobile DevOps combines with the following EMAS products to form the EMAS mobile DevOps solution:
• R&D domain: Mobile DevOps
• Test domain: mobile testing
• Release domain: Mobile DevOps
• Operation and maintenance domain: mobile monitoring, mobile hotfix
• Business operations domain: mobile push, mobile user feedback

2) The history of Mobile DevOps

Mobile DevOps is the commercial version of Alibaba Group's internal mobile R&D platform. Alibaba Cloud and the Taotao team developed the first version for proprietary cloud as early as 2017. The first public cloud version was launched in April 2020.
The figure below shows the development history of Mobile DevOps. In a sense, this is also the history of Alibaba Group's mobile R&D technology: it distills the mobile technology and engineering practices Alibaba has accumulated over the past ten years.
image.png

3) The status quo of Mobile DevOps

The proprietary cloud version has begun to take shape.
The Mobile DevOps proprietary cloud mainly targets large customers, especially those undergoing digital transformation. These customers have high security requirements and can only accept proprietary cloud deployment, and they are willing to invest in improving R&D efficiency.
In 2018, Mobile DevOps was officially launched for proprietary cloud scenarios. It has created value for dozens of major customers in multiple industries and supported the digital transformation of enterprise R&D processes.
The public cloud version is in free public beta.
Compared with the proprietary cloud, the Mobile DevOps public cloud is aimed more at small, medium, and micro enterprises. These customers want to improve R&D efficiency but are price-sensitive, and the public cloud is a good fit for them. In addition, some externally delivered businesses inside Alibaba Group (such as exclusive DingTalk) cannot use the group's internal R&D platform for mobile DevOps, and for them the Mobile DevOps public cloud is also a good choice.
The Mobile DevOps public cloud has been in free public beta since July 2020. It already serves many small, medium, and micro customers, as well as customers inside Alibaba Group such as DingTalk, the government-affairs edition of DingTalk, and Singing Duck.

2. Cloud-native Mobile DevOps

Compared with proprietary clouds, building cloud-native Mobile DevOps in public cloud scenarios faces more technical challenges. This chapter will share with you our thinking, challenges and our solutions in the process of building cloud-native Mobile DevOps.

1. Why do you need Mobile DevOps in the public cloud?

1) Provide inclusive Mobile DevOps services for small, medium and micro customers

Although proprietary cloud deployment offers advantages such as dedicated resources and intranet security isolation, its high delivery cost means that only the largest players in an industry can afford it. The estimated cost of a proprietary cloud Mobile DevOps deployment is as follows (W denotes 10,000):
• One-time investment: one million (100W) in procurement costs
• Ongoing investment: at least 30W per year in server costs plus 20W per year in staffing and maintenance costs
On this basis, the costs for the first, second, and third years are 150W, 50W, and 50W respectively, which adds up to 200W after two years and 250W after three. This is unacceptable for small, medium, and micro customers.
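
The yearly figures can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in the list above; treating "W" as 10,000 and charging the one-time procurement cost entirely to the first year are assumptions of this example.

```python
from itertools import accumulate

W = 10_000  # assumption: "W" in the text denotes ten thousand

ONE_TIME_PROCUREMENT = 100 * W  # one million, charged to year 1 in this sketch
YEARLY_SERVER = 30 * W          # the "at least" figure from the text
YEARLY_OPERATIONS = 20 * W      # staffing and maintenance


def yearly_costs(years: int) -> list[int]:
    """Cost per year: recurring costs every year, procurement only in year 1."""
    costs = []
    for year in range(1, years + 1):
        cost = YEARLY_SERVER + YEARLY_OPERATIONS
        if year == 1:
            cost += ONE_TIME_PROCUREMENT
        costs.append(cost)
    return costs


per_year = yearly_costs(3)
running_total = list(accumulate(per_year))

print([c // W for c in per_year])       # [150, 50, 50]
print([c // W for c in running_total])  # [150, 200, 250]
```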
As infrastructure for the new era, Alibaba Cloud needs to provide affordable cloud services to small, medium, and micro enterprises, not only to large customers, and Mobile DevOps on the public cloud fits this goal. With cloud-native elastic scaling and pay-as-you-go billing, it can greatly reduce the cost of using Mobile DevOps for these customers. It can also offer a DevOps R&D process tailored to the characteristics of small, medium, and micro customers in the public cloud scenario.

2) Linking the EMAS product line to provide developers with a one-stop mobile R&D platform

Launching Mobile DevOps on the public cloud links EMAS's existing products, such as mobile testing, mobile monitoring, and mobile hotfix, so that EMAS covers the entire application life cycle. This completes the upgrade of EMAS from mobile middleware to a mobile R&D platform and improves user experience and stickiness.
Compared with traditional self-built CI/CD platforms based on open source solutions such as Jenkins and GitLab Runner, the EMAS one-stop mobile R&D platform has clear advantages in cost, high availability, and technical support, and it covers full life-cycle management of application build, testing, release, operation and maintenance, and business operations in one place. Compared with the siloed, independent open source systems of a traditional self-built CI/CD setup, it also has clear advantages in R&D collaboration efficiency.

2. Challenges Facing Public Cloud Mobile DevOps

Compared with the scenarios of private cloud intranet deployment and internal staff use, Mobile DevOps in the form of public cloud will face more technical challenges, which are mainly reflected in the following aspects:

1) Security

• Tenant isolation
The first problem a public cloud faces is tenant isolation. Different customers use shared resources at the same time but must not be able to see each other's data. In the build scenario, the build tasks of different customers may affect each other, and the build environment also holds private information such as user code and certificates. A complete solution is needed to keep each user's build environment isolated.
• Security of private data such as code, certificates, and secret keys
Builds inevitably involve user code, certificates, and secret keys, all of which are extremely sensitive. Any problem in how the public cloud stores, transmits, or uses them may cause significant losses to users.
• External attacks
The public cloud is exposed to the public network and can be used by anyone, so it also faces the risk of malicious attacks. The build scenario in particular involves a large number of user-defined commands, and a robust mechanism is needed to prevent attackers from executing malicious custom commands that leave a backdoor in the build environment.

2) High availability

• Elastic scaling must be supported
As the public cloud business grows, the service must be able to scale out and in quickly, otherwise service anomalies will occur. This requires the product to follow a distributed architecture, and in particular requires the build cluster to be stateless so that it can scale quickly.
• Stability of the build environment
The build environment should be stable, and damage caused by accidents or abnormal use (to environment variables, build toolchains, and so on) must be avoided.
• High-standard SLA: always online, never down
A high-standard SLA is not only a promise to customers but also a matter of respect for the Alibaba Cloud brand.

3) Scalability

• Diverse application architectures lead to large differences in build processes
The proprietary cloud has a limited number of customers and well-established technical support for KA (key account) customers, so differences between applications are limited and dedicated staff support onboarding. The public cloud, however, has many customers, and the diversity of their application architectures places higher demands on the generality and extensibility of the system.
• Diverse R&D processes
Public cloud customers differ in R&D team size, R&D culture, and R&D process, which also places higher demands on the extensibility of the Mobile DevOps R&D process.

3. Our solution

To address the public cloud challenges above, we rely on the following two technical approaches:

1) General build architecture based on pipelines

The pipeline architecture makes builds general purpose: the build process is orchestrated as a customizable pipeline, and pipeline capabilities are extended through task plug-ins, which solves the extensibility problem described above. This architecture has the following features:
• A general build architecture that supports builds for all platforms
• Customized orchestration of the build process based on YAML
• Visual orchestration of pipelines
• Unlimited extension of pipeline capabilities through task plug-ins

2) Build cluster based on containerization/virtualization

Containerization (Linux) and virtualization (macOS) can thoroughly solve the security and stability problems caused by resource sharing. Each build task starts in a new container or virtual machine, which is destroyed as soon as the task completes. This effectively isolates the runtime environments of different tasks, and because the build environment is always fresh, it also avoids a damaged build environment. In addition, a stable, stateless containerized/virtualized build cluster ensures high availability of the build service.
Chapters 3 and 4 below elaborate on these two points and explain the architecture and technical details.

3. General build architecture based on pipelines

1. Technical pre-research

Pipeline-based products from industry peers already exist, especially outside China. Azure DevOps Pipeline and GitHub Actions are two excellent examples: in terms of features, ease of use, documentation, and user base, they have many advantages over other products.
The predecessor of Azure DevOps is Visual Studio Team Services (VSTS), a software development collaboration platform with a history of more than ten years. Its Azure Pipeline product was released in April 2018 [3]. GitHub Actions, released in August 2019 [4], is a heavyweight product launched after Microsoft acquired GitHub. Both are relatively new platforms, and even Azure Pipeline is only a little over two years old.
The pre-research turned up an interesting finding. Because GitHub is now a Microsoft subsidiary, the two pipeline products are similar in design and concept, and they also share the same Mac virtualization technology and even the same Mac virtualization cluster and machine room. The difference is that GitHub Actions is leaner and more elegant than Azure Pipeline. GitHub Actions also continues GitHub's open source tradition: its pipeline plug-ins are all open source, and although it has been online for little more than a year, there are already 5000+ open source plug-ins. From a plug-in perspective, this is a gold mine. If these plug-ins could be used directly in Mobile DevOps, its basic pipeline plug-ins would be on par with the open source community. To keep open the possibility of supporting these plug-ins in the future, the final Mobile DevOps architecture was designed to align with GitHub Actions and the open source community.

2. The core concept of the pipeline

image.png

• Trigger
A Trigger actively starts a pipeline execution.
• Pipeline
A Pipeline is the smallest unit that can be triggered to run. A pipeline contains one or more Jobs.
• Job
A Job is the smallest unit that is scheduled. Depending on the execution environment a Job is scheduled to, there are two types: Agent (build cluster) and Agentless (server).
Jobs without dependencies can run in parallel, and Jobs with dependencies run in sequence. The relationships among Jobs can be represented as a DAG.
Each Job contains one or more Steps.
• Step
A Step is the smallest unit that is executed. Each Job consists of Steps that are executed in sequence.
• Task
A Task is a task plug-in with a predefined specification and function that can be declared and referenced for execution in a Step. A Step contains exactly one Task.
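
To make the relationships between these concepts concrete, here is a minimal data-model sketch. The class and field names (`needs`, `inputs`, `validate`) and the task names (`android_build`, `notification`) are hypothetical and chosen for illustration only; they are not the Mobile DevOps DSL, which the article does not show.

```python
from dataclasses import dataclass, field
from enum import Enum


class JobKind(Enum):
    AGENT = "agent"          # scheduled to the build cluster
    AGENTLESS = "agentless"  # executed on the server side


@dataclass
class Step:
    name: str
    task: str                                   # exactly one Task plug-in per Step
    inputs: dict = field(default_factory=dict)  # parameters passed to the Task


@dataclass
class Job:
    name: str
    kind: JobKind
    steps: list[Step]                               # executed in sequence
    needs: list[str] = field(default_factory=list)  # names of Jobs this Job depends on


@dataclass
class Pipeline:
    name: str
    trigger: str      # for example "manual", "timer", or "event"
    jobs: list[Job]

    def validate(self) -> None:
        """Check the structural rules stated in the concept list."""
        if not self.jobs:
            raise ValueError("a pipeline needs at least one Job")
        names = {job.name for job in self.jobs}
        for job in self.jobs:
            if not job.steps:
                raise ValueError(f"Job {job.name!r} needs at least one Step")
            missing = set(job.needs) - names
            if missing:
                raise ValueError(f"Job {job.name!r} depends on unknown Jobs: {sorted(missing)}")


pipeline = Pipeline(
    name="release-android",
    trigger="manual",
    jobs=[
        Job("build", JobKind.AGENT, [Step("compile", task="android_build")]),
        Job("notify", JobKind.AGENTLESS, [Step("message", task="notification")], needs=["build"]),
    ],
)
pipeline.validate()  # raises ValueError if a structural rule is violated
```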

3. The technical architecture of the pipeline

image.png

The pipeline consists of the following core systems:

1) Pipeline process engine

The pipeline process engine is responsible for triggering, orchestrating, and executing pipelines (including state transitions), and for maintaining pipeline metadata.
• Pipeline trigger module
The trigger module is responsible for triggering a pipeline execution and supports three trigger methods: manual, timer, and event (Git events, webhook callbacks, and so on). Triggers are the only entry point for pipeline execution. This layer can verify the caller, and different trigger parameters can be passed in to control how the pipeline is executed and scheduled.
• Pipeline orchestration module
The orchestration module defines a DSL for describing pipelines. With this DSL, a pipeline that can be scheduled and executed can be defined precisely.
• Pipeline execution module
The execution module ensures that all Jobs in a pipeline run either in parallel or in the correct dependency order, and it updates the pipeline state in real time.
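
One way to derive a valid execution order from Job dependencies is a topological sort of the Job DAG. The sketch below uses Python's standard `graphlib` module to group Jobs into "waves" whose members have no unmet dependencies and could therefore run in parallel. The Job names are hypothetical, and this is only an illustration of the idea, not the actual execution module.

```python
from graphlib import TopologicalSorter


def execution_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group Jobs into waves; each wave only depends on earlier waves.

    `dependencies` maps a Job name to the set of Jobs it depends on.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # Jobs whose dependencies are all done
        waves.append(ready)
        sorter.done(*ready)                 # mark the whole wave as finished
    return waves


# Hypothetical pipeline: two builds in parallel, then test, approval, release.
jobs = {
    "build_android": set(),
    "build_ios": set(),
    "compat_test": {"build_android", "build_ios"},
    "approval": {"compat_test"},
    "release": {"approval"},
}

print(execution_waves(jobs))
# [['build_android', 'build_ios'], ['compat_test'], ['approval'], ['release']]
```

A real engine would mark each Job as done when it actually finishes rather than wave by wave, but the dependency rule is the same.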

2) Job scheduling engine

A Job is the smallest unit that is scheduled in the pipeline. The Job scheduling engine is mainly responsible for dispatching each Job produced by the pipeline process engine to the right machine in the build cluster.

3) Integration engine

There are two types of task plug-ins in the pipeline. The first is Agent tasks, such as Android and iOS builds. These tasks need a specific build environment, so the Job scheduling engine naturally schedules them to build machines. The second is Agentless tasks, such as approvals, notifications, and calls to external systems. These tasks can be completed on an ordinary server without occupying valuable build resources, so the Job scheduling engine schedules them to the integration engine for execution. Most Agentless tasks involve integration with external services.

4) Channel service

The Channel service is mainly responsible for the communication link and protocol between the build cluster and the server. Its main functions are as follows:
• Unified authentication of build cluster requests
For security reasons, the build cluster is in a different VPC from the other microservices. Complete network isolation ensures that the build cluster cannot directly access the server-side intranet. As a result, in the pipeline technical architecture diagram above, requests from the build cluster to the server go over public network HTTPS and must be authenticated. Channel acts as the authentication gateway.
• Unified entry point for build cluster requests
The build cluster needs to keep heartbeats, status reports, task pulls, and task execution status in sync with the server in real time. Channel is the entry point for these requests and routes requests for different business functions to the corresponding microservices.
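
The two Channel responsibilities, authenticating build machine requests and routing them to the right backend, can be sketched as follows. The article does not say how requests are authenticated, so the HMAC-SHA256 signature, the per-machine shared secret, and every name below are assumptions made purely for illustration.

```python
import hashlib
import hmac
from typing import Callable

# Hypothetical per-machine secrets; how credentials are really issued is not described.
MACHINE_SECRETS = {"builder-01": b"example-secret"}


def sign(machine_id: str, body: bytes) -> str:
    """Signature a build machine would attach to its request."""
    return hmac.new(MACHINE_SECRETS[machine_id], body, hashlib.sha256).hexdigest()


def authenticate(machine_id: str, body: bytes, signature: str) -> bool:
    """Verify the request signature with a constant-time comparison."""
    secret = MACHINE_SECRETS.get(machine_id)
    if secret is None:
        return False
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)


# Request kind -> handler standing in for a backend microservice.
ROUTES: dict[str, Callable[[bytes], str]] = {
    "heartbeat": lambda body: "heartbeat recorded",
    "status": lambda body: "status recorded",
    "pull_task": lambda body: "no task pending",
    "task_result": lambda body: "result stored",
}


def handle(machine_id: str, kind: str, body: bytes, signature: str) -> str:
    """Single entry point: authenticate first, then route by request kind."""
    if not authenticate(machine_id, body, signature):
        return "401 unauthorized"
    handler = ROUTES.get(kind)
    if handler is None:
        return "404 unknown request kind"
    return handler(body)


body = b'{"load": 0.2}'
print(handle("builder-01", "heartbeat", body, sign("builder-01", body)))  # heartbeat recorded
print(handle("builder-01", "heartbeat", body, "forged-signature"))        # 401 unauthorized
```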

5) Build cluster

The build cluster is mainly responsible for pulling and executing Agent-type build tasks. The service running in the build cluster starts an isolated build environment that matches the task type:
• Starting a Docker container on the Linux platform
Android builds run on Linux, where Docker containerization is the best choice for environment isolation. A serverless Docker container is started on ACK serverless (the Kubernetes product on Alibaba Cloud public cloud) and is automatically destroyed and reclaimed after execution. Cloud-native ACK serverless maximizes the elasticity of the build cluster: it consumes almost no compute resources when no build is running, which keeps build costs well under control.
• Starting a virtual machine on the macOS platform
Because of the constraints of Apple's ecosystem, iOS and Mac apps can only be built on macOS, and there is currently no mature Docker-like container solution for macOS. We therefore implement environment isolation with virtualization. We built a Mac virtualization cluster on a cloud architecture that fully pools physical Mac resources and can scale the cluster out and in quickly, which is fully in line with the cloud-native concept. For each build, a virtual machine is created dynamically from the virtualization cluster and destroyed as soon as the build finishes.
It is worth mentioning that the Mac virtualization cluster is one of our technical advantages. Chapter 4 describes in detail how Mobile DevOps puts Mac virtualization clusters into practice.
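
The "fresh environment per build, destroyed right after" pattern can be sketched with a context manager. In the example below a temporary directory stands in for the real Docker container or Mac virtual machine, and the platform-to-isolation mapping and all names are assumptions for illustration only.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path

# Isolation mechanism per platform, as described in the text above.
ISOLATION = {
    "android": "docker-container",
    "ios": "mac-virtual-machine",
    "macos": "mac-virtual-machine",
}


def isolation_for(platform: str) -> str:
    """Pick the isolation mechanism for a build platform."""
    try:
        return ISOLATION[platform]
    except KeyError:
        raise ValueError(f"unsupported platform: {platform}") from None


@contextmanager
def ephemeral_environment(platform: str):
    """Create a fresh environment for one build and always destroy it afterwards.

    A temporary directory stands in for the real container or virtual machine.
    """
    workdir = Path(tempfile.mkdtemp(prefix=f"{isolation_for(platform)}-"))
    try:
        yield workdir
    finally:
        shutil.rmtree(workdir, ignore_errors=True)  # destroyed even if the build fails


def run_build(platform: str) -> Path:
    """Run a pretend build and return the (already destroyed) environment path."""
    with ephemeral_environment(platform) as env:
        (env / "artifact.txt").write_text(f"built for {platform}\n")
        return env


env_path = run_build("ios")
print(env_path.exists())  # False: nothing from this build survives for the next one
```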

4. Mac virtualization build cluster

At present, the Mac virtualization build cluster solution of Mobile DevOps is in a clear leading position in China. We are "perhaps" the first DevOps platform in China to implement iOS builds on Mac virtualization technology. Almost no vendors in China support iOS builds, and the underlying reason is the limitation of Mac virtualization technology: traditional builds on bare-metal Mac hardware can only be used in internal environments and are not suitable for an open public cloud service. The Mac virtualization build cluster is therefore a technical advantage of Mobile DevOps.

1. Virtualization solution selection

Because of kernel limitations on the macOS platform, containerization on macOS is currently extremely immature, so virtualization is essentially the only way to isolate environments on macOS.
Choice of virtualization type
The two types of virtualization solution are shown in the following figure. Both are based on a hypervisor. They compare as follows:
image.png

Virtualization solution 1:
• VMs are virtualized by a hypervisor that runs directly on the hardware without a host OS. Resource utilization is high, which makes this solution better suited to cloud services.
• It places higher demands on hardware compatibility.
Virtualization solution 2:
• VMs are virtualized by a hypervisor that runs on top of a host OS, which makes this solution better suited to desktop users.
• Because of the host OS, hardware compatibility is better.
Since Mobile DevOps provides a public cloud service, solution 1 improves resource utilization more effectively, and hardware compatibility problems can be avoided by choosing suitable hardware.
Compliance issues in Apple's ecosystem
Apple's ecosystem is closed and imposes many security and compliance restrictions. The Mac platform has the following legal compliance restrictions:

1. macOS must run on Apple hardware
2. For commercial use, a single piece of Apple hardware may run only one macOS instance

image.png

Of the four virtualization options compared above, only Option 4 satisfies both Apple's compliance requirements and compatibility. Option 4 is in fact virtualization solution 1 chosen in the previous section, so taking virtualization type, Apple compliance, and compatibility together, we selected Option 4.

2. Cloud-based virtualization clusters

Providing a public build service on the cloud takes more than a virtualization solution. It requires a virtualization cluster solution that conforms to cloud architecture and meets the following requirements of the Mobile DevOps build cluster:
① Mac hardware resource pooling: every Mac in the cluster should be stateless. All Mac hardware together forms a resource pool that the cluster allocates and schedules in a unified way.
② Elastic scaling: public cloud business volume fluctuates, so the virtualization cluster must adapt to it, scale out and in quickly and flexibly, and keep up with business growth.
③ High availability: when individual Mac hardware devices fail, the cluster can quickly and automatically assign tasks to new virtual machines, improving the task success rate.
Moving from a single virtual machine to a virtual machine cluster requires more than the Mac hardware resource pooling described above: the distributed storage and distributed networking problems introduced by clustering the hardware also have to be solved. The following figure shows the move from a virtualized single machine to a virtualized cluster:
image.png
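
The three requirements (pooling, elastic scaling, and failover) can be illustrated with a toy scheduler. This is a deliberately simplified sketch under the assumption that every host is interchangeable and runs one task at a time; the class and method names are hypothetical, and distributed storage and networking are out of scope here.

```python
class MacPool:
    """Toy pool of interchangeable (stateless) Mac hosts, one task per host."""

    def __init__(self, hosts: list[str]):
        self.idle: list[str] = list(hosts)
        self.busy: dict[str, str] = {}  # task id -> host

    def add_host(self, host: str) -> None:
        """Scale out: a new host becomes schedulable immediately."""
        self.idle.append(host)

    def allocate(self, task_id: str) -> str:
        """Assign the task to any idle host; hosts are interchangeable."""
        if not self.idle:
            raise RuntimeError("no idle Mac host available")
        host = self.idle.pop(0)
        self.busy[task_id] = host
        return host

    def release(self, task_id: str) -> None:
        """Return the host to the pool when the build finishes."""
        self.idle.append(self.busy.pop(task_id))

    def fail_host(self, host: str) -> list[str]:
        """Remove a broken host and move its tasks to other hosts.

        Raises RuntimeError if no idle host is left to take over a task.
        """
        if host in self.idle:
            self.idle.remove(host)
        affected = [task for task, h in self.busy.items() if h == host]
        for task_id in affected:
            del self.busy[task_id]
            self.allocate(task_id)
        return affected


pool = MacPool(["mac-a", "mac-b"])
pool.allocate("build-1")        # runs on mac-a
pool.fail_host("mac-a")         # build-1 is reassigned
print(pool.busy["build-1"])     # mac-b
```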

5. Future Outlook

At present, the public cloud Mobile DevOps is still in public beta, and much work remains in the following directions:
• Add intelligent analysis and hints for build errors. With a large number of public cloud users, answering questions about build errors by hand is a huge labor cost. Going forward, technical means such as keyword matching, big data analysis, and even AI-based automatic error classification should point directly to the cause of a build error and reduce the cost of manual Q&A
• Strengthen linkage with other EMAS products, so that Mobile DevOps connects the complete application development life cycle
• Maintain closer affinity with the community. Support migrating pipelines from GitHub Actions, Azure Pipeline, and other platforms to Mobile DevOps, and let task plug-ins directly support the 5000+ open source GitHub Actions plug-ins to benefit from the open source community
• Strengthen the ability to be integrated, so that the Mobile DevOps mobile R&D platform fits better into customers' existing R&D processes
• Deeply optimize compilation and build efficiency to reduce application build time. The ultimate goal is for builds on the cloud to be significantly faster than local builds, so that developers can directly feel the advantage of building on the cloud.
If you are interested in mobile build and compilation technology, mobile R&D technology, or cloud native, and you enjoy technical challenges, you are welcome to join us. Our goal is to "be a leading international mobile DevOps brand".

Citation:

[1] Azure DevOps: What is DevOps?
[2] Baidu Statistics Traffic Research Institute
[3] Microsoft released Azure Pipelines, open source projects can use CI/CD without restriction
[4] All open source projects are free to use, and GitHub built-in CI/CD is finally here!

Origin blog.51cto.com/14989488/2555961