Efficient application of Baidu Hydra tool in mobile UI compatibility testing

Introduction: Although automated testing technology advances rapidly, issues such as the cost of building automated cases and the stability of their execution mean that manual testing remains an important means of mobile quality assurance. In traditional manual testing, test cases must be executed by hand, and efficiency gains depend on the operational proficiency of testers. This article first describes the current state of UI compatibility testing at Baidu and introduces Hydra, a tool built around the concept of "one machine, multiple controls". It then covers, from the perspective of technical implementation, Hydra's overall design and the design of several core modules.

1. Background

1.1 Mobile UI Compatibility Test

Mobile UI compatibility testing, as the name suggests, verifies that a mobile application's UI renders consistently across devices of different models, resolutions, and screen sizes.

As an important part of mobile application quality assurance, compatibility testing has long been carried out in a purely manual way. In Baidu's application development process, traditional manual compatibility testing occurs mainly in the following two stages:

a. Functional testing stage: usually 1~2 testers, a small number of business-line phones (about 1~3 units), and a total testing time of 10~20 hours.

b. Full-feature (regression) testing stage before release: 1~4 testers, 5~12 business-line phones, and a total testing time of 20~50 hours.

1.2 Problems faced

Manual mobile compatibility testing currently faces two problems:

a. Efficiency is hard to improve
Take a simple example. In one regression test, UI compatibility is verified on 10 devices for 100 cases, and each case takes 1 minute, so verifying 10 × 100 = 1000 case-device combinations takes about 17 hours in total. These cases require testers to operate each device manually, so the only way to shorten the testing phase is to add testing manpower.

b. Insufficient compatibility coverage
For UI compatibility testing, the covered brands, models, system versions, UI versions, and other factors all affect the test's recall rate. At present, however, mobile devices are owned by and confined to individual business lines, and it is difficult to circulate devices between them.

2. "One machine, multiple controls" and Hydra

2.1 How Hydra solves the problems of traditional manual mobile compatibility testing

Mobile UI compatibility testing is still performed manually for objective reasons that cannot currently be resolved: mobile application UIs iterate quickly, so the cost of producing automated test cases is high; UI compatibility issues are hard to define in a standardized way, making them difficult to recall automatically; and so on.

As for efficiency, Hydra tries to improve manual test execution from a different angle: "one machine, multiple controls". As the name implies, the tester operates one "master" device, and each control action simultaneously drives multiple "slave" devices while manual UI verification is performed, achieving more tests per unit of time. The problem of insufficient devices is solved by connecting the tool to a cloud device platform, getting rid of the limitations of physical devices and using cloud devices to improve compatibility test coverage.

2.2 User needs

What do front-line testers expect from a one-machine-multiple-controls tool that improves the efficiency of manual testing? In summary, four things:

a. Accurate: positions and effects controlled on the "master" device can be accurately replicated on the "slave" devices. This is the basic function of a one-machine-multiple-controls tool.
b. Many: support for a large number of devices, a wide variety of devices, and a wide variety of applications.
c. Easy to use: the manipulation experience and interaction should be convenient, fast, and in line with testers' habits.
d. Fast: manipulation should respond quickly.

2.3 Hydra's solution

After comprehensively weighing these user needs, Hydra's basic form and technical solution were determined:

First, consider the factor of "many". Since there are two major mobile systems, Android and iOS, device driving methods and toolsets differ greatly; moreover, non-native application forms such as mini programs and H5 pages keep emerging, and the driving methods for native applications do not suit these new forms. Therefore, we decided to use an image algorithm as the core algorithm for action "replication".

Using image algorithms raises the problems of acquiring images quickly and computing on them quickly. Therefore, a wired connection through a PC is adopted to better meet users' needs for accuracy and speed.

Hydra's basic form is a PC program that lets users connect local devices via cable, or cloud devices over the network. Images of the application under test on all devices are displayed directly in the browser, so testers can verify the UI more intuitively and thereby recall UI compatibility issues. This display method also resolves the verification-efficiency problem caused by a growing number of devices.

To accommodate testers' habits, Hydra also supports using a mobile device as the "master" for operation.

3. Hydra's technical architecture

Hydra adopts a B/S (browser/server) architecture overall and communicates with the front-end display via the HTTP/WebSocket protocols. The presentation form is decoupled from the capability implementation, making it easy to extend to new presentation types, such as mobile clients.

Among the functional components, the core pieces are the group control engine and the image synthesizer, responsible respectively for the input and output sides of the "one machine, multiple controls" function.

The user's real-time input on the "master" device is captured through the browser/client, replicated by the group control engine, and applied to each "slave" device simultaneously. The feedback of the operation is displayed on the user's browser/client via a "real-time image stream". In this way, a "what you see is what you get" real-time control experience is achieved.


The design and implementation of several core modules in the functional components will be introduced below.

3.1 Group Control Engine

The design goal of the group control engine is that one user action is executed many times, once on each device.

The difficulties lie in:

a. Accurately mapping coordinate-related user input across different resolutions.
b. Keeping the execution order of a single action consistent across devices with different performance.
c. Preserving the timing of combined actions (e.g., distinguishing a click from a long press).

The coordinate mapping in (a) is solved by the "multi-scene high-performance image algorithm", which is introduced in detail in a later section.

For (b), when the group control engine executes actions on the "slaves" in parallel, it waits for all coordinate mapping to complete and then starts execution in unison. To the user, an action appears to respond "simultaneously" on all devices.

For (c), the group control engine establishes an action execution queue for the "master" and each "slave", and records the timestamp of every "master" action. When a "slave" executes actions, it follows the relative intervals between the "master's" actions, keeping the execution sequence as consistent as possible with the user's input.
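The relative-interval replay described above can be sketched as follows; this is an illustrative simplification in Python, not Hydra's actual implementation:

```python
def replay_schedule(timestamped_actions):
    """Given (timestamp_ms, action) pairs recorded from the "master",
    return (delay_ms, action) pairs for a "slave", so that replaying
    with these delays preserves the master's relative intervals
    (e.g. keeping a long press distinct from a quick tap)."""
    schedule = []
    prev_ts = None
    for ts, action in timestamped_actions:
        delay = 0 if prev_ts is None else ts - prev_ts
        schedule.append((delay, action))
        prev_ts = ts
    return schedule
```

A slave-side executor would then sleep for each delay before dispatching the corresponding action.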

3.2 Real-Time Image Stream

The real-time image stream is the transmission path from device images to their presentation. Its design goal is "real-time" presentation of images from many devices, letting users see operation feedback faster.

Among the difficulties:

a. Output image frame rates differ with device performance.
b. The image output frame rate of network-connected devices is unstable.
c. The front-end display suffers a callback explosion.

To solve these stability problems, first be clear that for a real-time image stream, low latency matters more than fluency (frame rate): the user's input actions are discrete, and the UI compatibility issues to be identified are usually static. Therefore, we first limited each device's input frame rate to 16 frames per second and reduced the size of each frame image as a trade-off, meeting the basic requirement of "smoothness" while minimizing the performance problems caused by data volume.

After limiting the frame rate of a single device, the front-end callback explosion caused by stacking many devices is addressed through a custom data protocol: multi-device images are combined to reduce the frame rate received by the presentation layer. In the implementation, a fixed composite frame rate is used, and the input images of all devices are collected periodically, shielding the front-end display from the frame-rate differences of individual devices.

As shown above, with n devices, device #1 outputs images stably at a fixed frame rate, device #2 stably at a lower frame rate, and device #n erratically. Hydra establishes a separate "image composition" thread that grabs the latest image from every device at the expected fixed frame rate (16 fps) and composites them into the image displayed to the user.
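A minimal sketch of this latest-frame sampling, assuming a simple in-memory store (names like `FrameStore` and `composite_loop` are illustrative, not Hydra's real API):

```python
import threading
import time

class FrameStore:
    """Holds only the newest frame per device. Devices push at whatever
    rate they can; a separate composition thread samples all devices at
    a fixed rate, so per-device frame-rate differences are hidden from
    the presentation layer."""
    def __init__(self):
        self._latest = {}
        self._lock = threading.Lock()

    def push(self, device_id, frame):
        with self._lock:
            self._latest[device_id] = frame  # older frame is discarded

    def snapshot(self):
        with self._lock:
            return dict(self._latest)

def composite_loop(store, emit, fps=16, max_frames=None):
    """Grab the latest image of every device at a fixed rate and hand
    the combined snapshot to `emit` (e.g. the protocol packer)."""
    period, sent = 1.0 / fps, 0
    while max_frames is None or sent < max_frames:
        emit(store.snapshot())
        sent += 1
        time.sleep(period)
```

Each device's capture thread calls `push`; slow or jittery devices simply have their last frame re-sampled.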

The figure above shows the design of the custom composite data protocol. Each frame contains the data of all images; the frame header describes basic information about the frame, including how many device images are included. Each device's image data is then preceded by its own data header describing the device and the image. Finally, the data portion of the entire frame is compressed again to reduce the transmission volume.
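The header-plus-compressed-body layout might be packed as in the sketch below; the exact field widths and the use of zlib are assumptions for illustration:

```python
import struct
import zlib

def pack_composite_frame(images):
    """images: dict mapping device_id (str) -> encoded image bytes.
    Frame header: device count + compressed-payload length.
    Per-image header: id length + image data length."""
    body = b""
    for dev_id, data in images.items():
        id_bytes = dev_id.encode("utf-8")
        body += struct.pack(">HI", len(id_bytes), len(data)) + id_bytes + data
    payload = zlib.compress(body)  # compress the whole data portion again
    return struct.pack(">HI", len(images), len(payload)) + payload

def unpack_composite_frame(frame):
    count, plen = struct.unpack(">HI", frame[:6])
    body = zlib.decompress(frame[6:6 + plen])
    images, off = {}, 0
    for _ in range(count):
        id_len, data_len = struct.unpack(">HI", body[off:off + 6])
        off += 6
        dev_id = body[off:off + id_len].decode("utf-8")
        off += id_len
        images[dev_id] = body[off:off + data_len]
        off += data_len
    return images
```

The presentation layer then receives one callback per composite frame instead of one per device frame.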

3.3 Multi-scene high-performance image algorithm

With the group control engine and the real-time image stream in place, the basic prototype of group control works. But to make the group control experience better, the image algorithms introduced in this section are needed.

Coordinate mapping is at the core of the architecture and must satisfy performance, accuracy, and versatility. The chosen algorithm has to weigh these requirements against each other and make trade-offs; since the system as a whole is real-time, the performance requirement is especially high.

At the most basic level, we could mathematically convert coordinate values according to the ratio of screen sizes. This algorithm is obviously the simplest and the fastest, but its accuracy is unacceptable: it maps correctly only when the images on all devices are exactly the same. Image template matching is a high-performance, high-accuracy image algorithm, but it fails once the image is deformed, so its versatility is unacceptable.
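The ratio-based conversion mentioned above is trivial to write down, which also makes its limitation clear:

```python
def map_by_ratio(x, y, master_res, slave_res):
    """Map a tap coordinate from master to slave purely by the ratio of
    screen resolutions. This is only correct when both screens show
    pixel-for-pixel proportional layouts, which is why accuracy is
    insufficient in practice (different DPIs, notches, UI variants)."""
    mw, mh = master_res
    sw, sh = slave_res
    return round(x * sw / mw), round(y * sh / mh)
```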

We chose the SIFT algorithm as the basic image algorithm, performing the conversion mapping through computed feature points. Applied directly, SIFT performs well in accuracy and versatility but has a severe performance bottleneck: the average processing time for one group of images is 2 s, which cannot meet the real-time requirement. We therefore optimized how SIFT is applied:

First, we crop a region from the master and slave images to reduce the algorithm's computation. Second, we do not map directly with the computed feature points, because their distribution depends on the image while the input coordinate can be arbitrary. Instead, we select two existing feature points a and b (the red and green points in the figure) and compute the relationship (offset angle and distance) between the target point c and a and b on the master image. Applying this relationship to the slave image yields the target coordinates there.
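The two-feature-point relation can be sketched with complex numbers, where a single complex ratio encodes both the offset angle and the distance (an illustrative sketch, not Hydra's code; the SIFT matching that produces the point pairs is assumed to have run already):

```python
def map_via_two_features(a_m, b_m, c_m, a_s, b_s):
    """Given two matched feature points a, b on the master image and
    their counterparts a_s, b_s on the slave, express the tap target c
    relative to the segment a->b and re-apply that relation on the
    slave. Points are (x, y) tuples."""
    am, bm, cm = complex(*a_m), complex(*b_m), complex(*c_m)
    as_, bs_ = complex(*a_s), complex(*b_s)
    # One complex ratio captures rotation, scale, and offset of c
    # with respect to the segment a->b.
    ratio = (cm - am) / (bm - am)
    cs = as_ + ratio * (bs_ - as_)
    return round(cs.real), round(cs.imag)
```

Because the relation is relative, the mapping survives uniform scaling and translation between the two screens.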

The above algorithm performs well in most scenarios, but accuracy suffers in some (such as lists), so further optimization is needed.

When a group control mapping is initiated, a CNN classifier first identifies whether the current page is a list page, and DNN object detection then checks whether a pre-trained icon appears in the image. This analysis runs only once per group control mapping. Based on its results, a different strategy is adopted when mapping to each device: if a preset icon is present, DNN object detection is used directly; otherwise, the SIFT parameters (a larger crop box, a higher feature threshold) are chosen according to whether the page is a list page.
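The per-device strategy dispatch might look like the sketch below; the parameter names and values are assumptions for illustration, not Hydra's real configuration:

```python
def choose_mapping_strategy(has_preset_icon, is_list_page):
    """Pick a per-device mapping strategy from the one-time page
    analysis (icon detection + list-page classification)."""
    if has_preset_icon:
        # A trained icon was detected: map by object detection directly.
        return {"method": "dnn_icon"}
    if is_list_page:
        # List pages get a larger crop box and a stricter feature threshold.
        return {"method": "sift", "crop_scale": 0.8, "feature_threshold": 0.75}
    return {"method": "sift", "crop_scale": 0.4, "feature_threshold": 0.5}
```

Running the classification once per group mapping keeps the per-device cost down to the SIFT/detection step itself.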

Based on the above algorithms, Hydra achieves coordinate mapping in an average of 160 ms with an accuracy of 97.52%.

3.4 Consistency Repair

In actual use, inconsistent device images are a relatively common phenomenon. The cause may be that a "slave" did not click as expected following the "master's" action, or that an unexpected interface such as a system pop-up appeared. Inconsistency is expected in group control, so efficient repair methods were designed to preserve efficiency.

First, Hydra's control design supports viewing and controlling each device independently during group control. When individual devices become inconsistent, they can be repaired immediately and returned to the normal test flow.

Second, for scenarios prone to inconsistency, such as list pages, a semi-automatic sliding consistency fix is provided.

As shown in the figure above, when the scroll positions of the "master" and a "slave" on a list page differ, SIFT is run on both images to obtain corresponding feature points. The slope of the lines connecting matched feature points indicates whether the current "slave" image has a scrolling inconsistency.

Not all feature points are usable. Consider a list page: the navigation bar and the tab bar cannot scroll; only the content in the middle can scroll and become inconsistent. We therefore divide the image into four regions from top to bottom and give the feature points in each region different weights. The unchanging top and bottom regions get the lowest weight. The scrollable middle is further split into upper and lower halves: because of screen-size differences, the lower half may display slightly different content heights, which is misleading, so it gets a medium weight; the upper half is the main basis for judgment and gets the highest weight.

With the matched feature points weighted, we take a weighted average, judge from the result whether there is a scrolling inconsistency, compute the difference from the slope, and then automatically drive the device to perform a reverse scroll as compensation, completing the consistency fix.
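The weighted-average judgment can be sketched as follows; the region boundaries and weight values are illustrative assumptions, and the matched feature-point pairs are assumed to come from the SIFT step above:

```python
def weighted_scroll_offset(matches, height):
    """matches: list of ((x_m, y_m), (x_s, y_s)) matched feature-point
    pairs between master and slave list-page images. Weight each pair
    by the vertical region of its master point: the fixed top
    (navigation bar) and bottom (tab bar) get the lowest weight, the
    upper scrollable area the highest. Returns the weighted average
    vertical offset; a reverse scroll by this amount compensates."""
    def region_weight(y):
        r = y / height
        if r < 0.1 or r > 0.9:   # non-scrolling bars: least informative
            return 0.1
        if r < 0.55:             # upper scroll area: most reliable
            return 1.0
        return 0.5               # lower scroll area: medium weight
    total_w = sum(region_weight(m[1]) for m, _ in matches)
    if total_w == 0:
        return 0.0
    return sum(region_weight(m[1]) * (s[1] - m[1]) for m, s in matches) / total_w
```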

3.5 Mobile Handheld Control

Testers traditionally control handheld devices directly, and some cannot adapt to mouse-and-keyboard control. Hydra therefore also provides an operation mode through a handheld mobile device, with two options:

1. Remote control scheme

In this scheme, as with the browser scheme, a device is first selected as the master. Then, leveraging existing front-end technology, the image collected from the "master" device is sent to and displayed in the browser of the remote control device, and user input is collected there and sent back to the real "master". The effect is as if the user were remotely controlling the "master" device.

The advantage of this scheme is that it is simple to implement, reuses existing front-end technology, and has good compatibility, working on both Android and iOS.

However, the disadvantages are also obvious: (1) the remote control device itself cannot be used for testing, which wastes a device; (2) if the resolution of the remote control device differs from the master's, the display on the remote control device is scaled and looks poor.

2. Android client scheme

This scheme takes advantage of the openness of the Android platform. Hydra provides an Android client; when such a device is used as the master, it can be controlled directly. Hydra's client collects user input and directly drives all "slave" devices in the group.

So how does the client capture user input?

The client creates a two-layer floating window on the phone in advance. When the user operates the phone (for example, tapping coordinates (400, 1000)), the coordinates of the touch action are intercepted by layer #1; if the user presses a virtual button such as back, it is intercepted by layer #2. Once the action coordinates and key information are obtained, the client sends them to the group control engine, which "replicates" them to all "slave" machines for execution. Meanwhile, because the user's actions have been intercepted, they cannot take effect directly, so the Hydra client also includes a miniature execution engine that interprets the user's actions and performs them on the current device.

Based on these two schemes, users can drive one-machine-multiple-controls with nothing more than a mobile device in hand.

4. Hydra's results in practice

Hydra has been adopted by multiple business lines within Baidu and has achieved positive returns in business scenarios such as app regression, advertising testing, and operational activities, with an average weekly efficiency improvement of 20% to 70%.

Of course, Hydra also has limitations; in some scenarios it does not apply, or its effect is limited:

1. Scenarios unsuited to group control, such as logging in with different accounts on each device, which require an external customized input scheme.
2. Icon clicks over dynamic backgrounds such as video, where recognition hits a relatively large bottleneck.
3. Cases with long operation chains and complex operation types: after group control runs for a while, the probability of inconsistency rises.
4. Some complex gestures, such as pinch-zoom and rotation, are not yet supported.

We are continuously optimizing and improving on these problems.

5. Further improving the efficiency of manual testing: the toolbox

Usually, to complete mobile manual testing, testers must find, learn, and use the tools the test requires. This process carries a learning cost, and its results are hard to guarantee. Hydra is a tool for improving manual compatibility testing on mobile, and this usage pattern reflects our thinking about the field of mobile manual testing: a "bridge" is needed between testers and devices, namely a toolbox that can be used out of the box to accomplish specific test goals.

Going further, the manual testing process itself can be standardized, and this standardization is reflected in the definition of test tasks, the use of tools, the collection of test data, and the verification of results. Depending on the test scenario, recommended toolsets can be offered for each stage. The process can also be connected with external case management and bug tracking systems to form a closed loop. In this way, not only local test tasks but also the overall completion efficiency of manual testing can be further improved.

Original link: https://mp.weixin.qq.com/s/OHmWsHS-_ANrNXj8c5bDNg

 
