Practical Insights: Totoro's Deep Work and Gains in Automated Testing

Totoro is an automated testing framework developed in-house by the Experiment Platform technology group of Ant Financial's Terminal Engineering Technology department. It supports mobile automated test scenarios including Android, iOS, HTML5, mini programs, Weex, and Cube.

To let the Ant Financial mobile testing platform run automation tasks stably and efficiently in a clustered environment, and to support the company's many business scenarios quickly and flexibly, Totoro went from 0 to 1 and then from 1 to 2. It has gradually become one of the most widely used and most stable automated testing frameworks within the Alibaba group: it supports the daily automated testing of 10+ BUs and is offered externally together with the mobile development platform mPaaS.

This article focuses on the pitfalls Totoro hit along the way and on how it iterated toward maturity, and shares the methods and solutions we distilled from that experience:

  • Totoro's C/S architecture design
  • Building stability across the whole link
  • Fully automatic intelligent installation of Android apps
  • Popup handling across all scenarios
  • Key milestones and future plans

 

Totoro's C/S architecture design

The Ant Financial mobile testing platform initially adopted the open-source Appium solution. However, Appium was complex to deploy, its interfaces were unstable, devices dropped offline, its service call chain had many hops, and community maintenance was not fast enough. A comprehensive assessment showed that similar frameworks in the industry share these pain points, so we decided to design our own solution: one suited to a cloud-testing cluster environment and able to keep up with the fast iteration of different business domains.

Given these pain points, we decided that Totoro's design had to keep the call link as short as possible and the project structure as simple and transparent as possible, so that as few sources of instability as possible sit on the test link. It also had to let us locate problems quickly when something goes wrong and to have some self-healing capability. Whereas frameworks commonly used in the industry combine three or more layers, Totoro settled on a two-layer C/S design.

The two-layer design brings Totoro several advantages, for example:

  1. Services are consolidated on the phone side, which reduces complex calls on the PC side: as soon as the Client is connected to the phone, automation can start, with no intermediate command relay service or resource management service;
  2. A true C/S architecture: both the speed and the stability of the automation call link improved qualitatively, and the simple architecture has made the framework's recent rapid iteration feasible.

 

Building stability across the whole link

Under the stringent automation requirements of the Ant cloud-testing cluster, stability problems still surfaced and became a hard problem that Totoro had to solve.

An exception can occur at any node on the link of an automation task, so stability spans several levels, for example:

  • Stability of the framework's own API functions;
  • Stability of the on-device host services;
  • Stability of the device connection (staying online);
  • Stability of the device network;
  • Stability of the hardware hub.

Below we describe the work we have done along the entire call link in these five areas.

1. Comprehensive exception governance

In Totoro's early development, routine maintenance consumed a great deal of effort every day, dealing both with exceptions caused by the framework's own defects and with exceptions in the business code itself. Each kind of exception also had to be triaged by hand before the framework team or the business side could be pushed to fix it. As a result, most cloud-testing tasks were terminated by code problems, and test development was not stable enough.

To reduce the task instability caused by exceptions, eliminate problems in the framework SDK itself, and classify business exceptions intelligently, we first collected full-stack statistics on the reported exception types. The statistics divide roughly into "business logic layer exceptions" and "SDK layer exceptions". For SDK-layer problems, we concentrated dedicated development effort on fixing the exceptions caused by unreasonable framework logic. For the large volume of business exceptions, we added a layer of abstract classification: each exception classified as business logic gets an explicit hint and suggestion, and we encourage adding checkpoints and state checks. Today, for certain exception types, retrying at the step level even improves the case success rate noticeably.
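The two-way classification described above can be sketched roughly as follows. This is a minimal illustration, not Totoro's implementation: the package prefix, the exception type names, and the hint texts are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical markers; a real deployment would derive them from the reported statistics.
SDK_PACKAGE_PREFIXES = ("com.example.totoro.sdk.",)
BUSINESS_HINTS = {
    "ElementNotFound": "Add a checkpoint that waits for the page to load before tapping.",
    "AssertionFailed": "Verify the expected state with an explicit status check.",
}


@dataclass
class Classification:
    layer: str  # "sdk" or "business"
    hint: str   # suggestion shown to whoever has to fix the problem


def classify(exception_type: str, top_frame: str) -> Classification:
    """Classify a reported exception by the stack frame that raised it."""
    if top_frame.startswith(SDK_PACKAGE_PREFIXES):
        return Classification("sdk", "Report to the framework team.")
    hint = BUSINESS_HINTS.get(exception_type, "Check the case's business logic.")
    return Classification("business", hint)
```

Keying on the top stack frame is only one possible signal; the point is that every report ends up in exactly one layer and carries a suggestion with it.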

While working on exception governance, we found that most businesses need to encapsulate some business logic at each stage of a case run, while the SDK layer has initialization steps of its own. When cases are run through JUnit, the program can become unstable if the business wrapper or the SDK-layer interfaces are called at an inappropriate point. Based on the business requirements placed on the Totoro framework and the problems found in daily use, we therefore defined our own standardized Totoro case lifecycle, in which business cases can hook their logic into the corresponding lifecycle nodes.
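A standardized lifecycle with hook points might look roughly like the sketch below. The stage names are assumptions made for illustration; the article does not list Totoro's actual lifecycle nodes.

```python
class CaseLifecycle:
    """Runs hooks in a fixed order so SDK setup always precedes business logic."""

    # Hypothetical stage names.
    STAGES = ("sdk_init", "before_case", "run_case", "after_case", "sdk_teardown")

    def __init__(self):
        self._hooks = {stage: [] for stage in self.STAGES}

    def hook(self, stage, func):
        """Attach business or SDK logic to one lifecycle stage."""
        if stage not in self._hooks:
            raise ValueError(f"unknown lifecycle stage: {stage}")
        self._hooks[stage].append(func)

    def run(self):
        """Run all stages in order and return the names of the stages executed."""
        executed = []
        try:
            for stage in ("sdk_init", "before_case", "run_case"):
                for func in self._hooks[stage]:
                    func()
                executed.append(stage)
        finally:
            # Cleanup stages run even if an earlier stage raised.
            for stage in ("after_case", "sdk_teardown"):
                for func in self._hooks[stage]:
                    func()
                executed.append(stage)
        return executed
```

Because the order is owned by the lifecycle rather than by each case, the order in which hooks are registered no longer matters.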

 

 

2. Keeping the on-device host services stable

While cases execute, Totoro's core on-device services (TotoroUiautomator / TotoroWDA) can run into failed connections, unavailable services, and similar problems. Most of this instability comes from system restrictions, so what we can do is restart the service at the right moment to keep the whole automation flow running normally.

  • Analyze exceptions at the API level, filter out service exceptions, and initiate a restart, while preserving or restoring the case's execution context.
  • An agent daemon polls the service and dynamically restarts it when it fails (for TotoroWDA).
  • Analyze pseudo-exceptions caused by failed port forwarding: a network probe decides whether the restart logic should really be triggered.
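The polling and pseudo-exception checks above could be combined along the following lines. This is a sketch under assumptions: the three boolean probes and the action names are hypothetical, and the real agent's decision rules are not described in the article.

```python
import time
from typing import Callable, Tuple


def decide_action(service_alive: bool, port_forward_ok: bool, network_ok: bool) -> str:
    """Return the recovery action for one polling round."""
    if service_alive and port_forward_ok:
        return "none"
    if not network_ok:
        # The failure is a symptom of the network, so a restart would not help.
        return "wait_for_network"
    if service_alive and not port_forward_ok:
        return "redo_port_forward"
    return "restart_service"


def watch(check: Callable[[], Tuple[bool, bool, bool]],
          restart: Callable[[], None],
          rounds: int,
          interval: float = 0.0) -> int:
    """Poll `rounds` times and restart only on a genuine service failure."""
    restarts = 0
    for _ in range(rounds):
        if decide_action(*check()) == "restart_service":
            restart()
            restarts += 1
        time.sleep(interval)
    return restarts
```

Separating the decision from the polling loop keeps the false-positive rules easy to test without a device attached.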

 

3. Keeping the phone connection stable

Devices dropping offline is a problem every automation task has to face. Together with the Ant cloud-testing platform, Totoro provides full-link online support in both software and hardware.

Ⅰ. Recovering dropped devices on the software link

The software-link capability is a recovery routine integrated into the Totoro Client on the terminal side and embedded in the underlying communication interface. Once a device is found to have dropped, a message is sent through a remote web service to the core service on the phone, which uses device-owner permissions to restart ADB on the phone. If that still fails, a usbreset is performed on the link from the PC side.

Under normal circumstances, restarting ADB on the phone up to three times almost always restores the connection. In the few cases where recovery fails, detailed on-site information is reported and the changedevices strategy is triggered, which replaces the phone and reruns the task so that the flow continues normally. If analysis of the historical reports shows that an aging device is unstable and drops often, it is downgraded: moved to a device pool with lower connection requirements (such as the monkey pool) or taken offline.
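The escalation order described here (restart ADB up to three times, then reset USB, then switch devices) can be sketched as below. The callables are placeholders: how Totoro actually restarts ADB or resets USB is not shown in the article.

```python
from typing import Callable


def recover_device(is_online: Callable[[], bool],
                   restart_adb: Callable[[], None],
                   usb_reset: Callable[[], None],
                   max_adb_restarts: int = 3) -> str:
    """Escalate through recovery steps and return the step that succeeded."""
    if is_online():
        return "online"
    for _ in range(max_adb_restarts):
        restart_adb()
        if is_online():
            return "adb_restart"
    usb_reset()
    if is_online():
        return "usb_reset"
    # Nothing worked: the caller should report details and switch devices.
    return "change_device"
```

Returning the step name, rather than a boolean, leaves a record that the historical analysis of unstable devices could build on.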

Ⅱ. Safeguarding the device connection on the hardware link

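For hardware-link stability, most cloud-testing platforms choose to buy good-quality USB hubs. The Ant cloud-testing platform, however, faces 7k+ automation tasks per day and the case-stability demands of the mPaaS financial cloud. In our experiments, even the best hubs on the market could not meet all of the engineering quality standards we need, and they lack an intelligent control module. The Experiment Platform group of Ant's Terminal Engineering Technology department therefore developed its own SmartHub, which has an independent, stable power supply module and whose every port can be controlled remotely and programmatically (voltage, power, reset, and so on). SmartHub is now in full mass production and in use, as shown in the figure below: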

4. Device network stability

To keep the device network service stable, we have mainly tried the following:

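  • A network-probe snapshot at the checkpoint where a case fails, to quickly determine whether there was a network problem at the point of failure;
  • The SLM cloud terminal service manages phone networking: it can connect automatically to a specified network and reset the connection after a network exception;
  • The cloud-testing cluster environment is being upgraded to cabinets that isolate the network and keep the network hotspots stable. (The terminal lab's cluster service will provide stable mobile automation in units of standardized shielded cabinets.)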

 

5. Multi-dimensional strategies to raise the case success rate

In a real case-building environment, many detailed strategies are needed to keep the whole service running stably. The main ones are listed here:

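  • For stubborn, intermittent instability, apply the DeviceChange strategy (switch to another device).
  • For system limits such as phone memory and resources, apply the DeviceReboot strategy (restart the phone).
  • For a fixed set of abstract exception types that occur intermittently, rerun the case, which effectively raises the success rate.
  • For anomalies in any scenario, raise a DingTalk alert and release a patch promptly.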
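One way to picture how these strategies fit together is a small dispatch table. DeviceChange and DeviceReboot are the names used above; the failure categories, the "Rerun" and "Alert" labels, and the attempt limit are assumptions added for illustration.

```python
# Hypothetical failure categories mapped to the strategies listed above.
STRATEGY_BY_CATEGORY = {
    "device_flaky": "DeviceChange",
    "system_resource": "DeviceReboot",
    "transient": "Rerun",
}


def choose_strategy(category: str, attempts_so_far: int, max_attempts: int = 2) -> str:
    """Pick a recovery strategy for a failed case and alert when none applies."""
    if attempts_so_far >= max_attempts:
        # Repeated recovery did not help, so escalate to a human.
        return "Alert"
    return STRATEGY_BY_CATEGORY.get(category, "Alert")
```

For example, `choose_strategy("system_resource", 0)` selects a reboot, while an unrecognized category goes straight to an alert.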

 

Fully automatic intelligent installation of Android apps

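In the Ant cloud-testing automation cluster, fully automatic intelligent app installation is one of the most common scenarios. However, the fragmentation of Android ROMs and vendor customization mean that all kinds of popups must be handled during installation; some vendors even require a logged-in state and ask for an account and password. This makes fully automatic installation a challenge in a cluster with thousands of device models. Some installation popup scenarios are shown in the figure below: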

 

1. Technology selection

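Totoro's automation service is deeply customized on top of Uiautomator2, so the whole service is installed on the phone as an APK. A complete, fully automatic installation solution therefore cannot be implemented inside the Totoro service APK, since that APK itself must first be installed.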

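In the end we implemented it with Uiautomator1, which can run directly on the phone without installation, and we iterate on it as a dedicated, independent installation-popup handling project.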

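For domestic device models and every model in the cloud-testing lab, the installation-popup project first abstracted the click patterns of popups with full coverage in mind: it dumps the page's control information, searches for keywords, and adapts per device model. Each task also has a failure alert mechanism as a safety net, so developers can quickly add support for problem models and UI changes.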

The result is a continuously iterated intelligent handler that covers the installation popups of most ROMs.
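The "dump the controls, then search for keywords" step can be illustrated with the XML that `uiautomator dump` produces, in which each `node` carries `text` and `bounds="[x1,y1][x2,y2]"` attributes. The sketch below is not the project's code; the sample hierarchy and keyword list are made up.

```python
import re
import xml.etree.ElementTree as ET

BOUNDS = re.compile(r"\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]")


def find_tap_point(dump_xml: str, keywords):
    """Return the center (x, y) of the first node whose text contains a keyword."""
    root = ET.fromstring(dump_xml)
    for keyword in keywords:  # earlier keywords take priority
        for node in root.iter("node"):
            match = BOUNDS.fullmatch(node.get("bounds", ""))
            if match and keyword in node.get("text", ""):
                x1, y1, x2, y2 = map(int, match.groups())
                return ((x1 + x2) // 2, (y1 + y2) // 2)
    return None


# Made-up dump of an installation confirmation dialog.
SAMPLE = """<hierarchy rotation="0">
  <node class="android.widget.FrameLayout" text="" bounds="[0,0][1080,1920]">
    <node class="android.widget.Button" text="Cancel" bounds="[100,1700][500,1800]"/>
    <node class="android.widget.Button" text="Continue install" bounds="[580,1700][980,1800]"/>
  </node>
</hierarchy>"""

print(find_tap_point(SAMPLE, ["Continue", "Install"]))  # (780, 1750)
```

Per-model adaptation would then amount to choosing a different keyword list for each vendor or ROM.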

2. Smart blind tapping

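Because the whole popup handler depends on dumping control information, it runs into trouble with some vendors (Huawei, vivo, OPPO, and others): to block abuse by underground operators and for other security reasons, they disable dump on some popup pages along the installation path. We then cannot obtain the page information and cannot determine the coordinates to tap.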

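For this scenario we studied installation on the lab's phones extensively and found that the position and meaning of popup buttons follow certain patterns: some popups can be dumped only after the service is restarted, some have UI styles that vary predictably with OS version and model, and some need a special phone Action before the corresponding event can be obtained. We abstracted and classified these patterns into a smart blind-tap logic that can be extended to cover scenarios where dump is unavailable.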
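A rule table keyed on vendor and OS version is one plausible shape for such blind-tap logic. Everything in this sketch is hypothetical, including the vendor labels, version ranges, and screen ratios; real rules would come from surveying the lab's devices.

```python
from typing import NamedTuple, Optional, Tuple


class BlindTapRule(NamedTuple):
    vendor: str
    min_os: int     # inclusive Android major version
    max_os: int     # inclusive
    x_ratio: float  # tap position as a fraction of screen width
    y_ratio: float  # tap position as a fraction of screen height


# Illustrative values only.
RULES = [
    BlindTapRule("vendor_a", 8, 9, 0.75, 0.92),
    BlindTapRule("vendor_a", 10, 12, 0.50, 0.95),
    BlindTapRule("vendor_b", 8, 12, 0.25, 0.90),
]


def blind_tap_point(vendor: str, os_version: int,
                    width: int, height: int) -> Optional[Tuple[int, int]]:
    """Return the screen point to tap when the popup cannot be dumped."""
    for rule in RULES:
        if rule.vendor == vendor and rule.min_os <= os_version <= rule.max_os:
            return (round(width * rule.x_ratio), round(height * rule.y_ratio))
    return None
```

Expressing positions as ratios rather than pixels lets one rule cover devices with different resolutions, and covering a new model only requires adding a row.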

3. Algorithm-assisted practice

Smart blind tapping still fails in the occasional scenario whose pattern was not fully anticipated. So how can we build an adaptive capability?

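We therefore considered whether AI could analyze the page information intelligently, with the algorithm's output providing the concrete tap path, so that leftover scenarios can be covered quickly.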

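Today, combined with an OCR service, Totoro can analyze the screen content, obtain the exact coordinates of the tap target, and complete popup handling. Next we plan to apply deep learning with installation-scenario model data, so that the algorithm proposes the operations directly and the whole flow becomes adaptive.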

4. Cloud-testing demo video

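After compatibility work across many dimensions, the automatic installation component now has some adaptive capability and can complete daily automated installation tasks, and its maintenance cost is now very low. Besides daily automation tasks, the feature is also embedded in the cloud-testing platform's remote device rental function. The installation result is shown below: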

Popup handling across all scenarios

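The various phone popups that appear during mobile automated testing are one of the main factors affecting stable case execution. To handle popups of all types and scenarios, the Totoro framework includes a self-developed, all-scenario popup governance solution: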

 

1. Deep rework of the Android Watcher interface

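For handling unexpected popups, the Android framework provides the UiDevice.registerWatcher interface. In practice, however, we found that its callback is not reliable: according to the official explanation, the callback fires only when a control lookup fails during automation.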

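To build listeners for many scenarios, we need a page-monitoring interface whose callback fires reliably. Reading the UiWatcher source code, we found that runWatchers() can be triggered actively through a hook. What remains is to trigger it reliably whenever a popup changes the page.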

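An Android Accessibility service can register to listen for popups, page changes, and even small control changes. To balance performance, it is enough to register only the popup change callback. This gives a stable popup-listening callback mechanism.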
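The mechanism can be modeled in a few lines. The sketch below is a plain Python model of the idea, not Android code: `register_watcher` and `run_watchers` mirror the roles of UiDevice.registerWatcher and runWatchers(), and the event-type strings are stand-ins for accessibility event types.

```python
class WatcherRegistry:
    """Runs every watcher on each popup event instead of waiting for a failed lookup."""

    def __init__(self):
        self._watchers = {}

    def register_watcher(self, name, check_for_condition):
        # check_for_condition returns True when it found and handled a popup.
        self._watchers[name] = check_for_condition

    def run_watchers(self):
        """Return the names of the watchers that reported handling something."""
        return [name for name, check in self._watchers.items() if check()]

    def on_accessibility_event(self, event_type):
        # Only window (popup) changes trigger the watchers, to limit overhead.
        if event_type == "WINDOW_STATE_CHANGED":
            return self.run_watchers()
        return []
```

The trade-off is explicit in `on_accessibility_event`: listening to every fine-grained control change would catch more, but at a cost that the popup use case does not need to pay.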

2. Registering listeners along multiple dimensions

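With a mechanism that makes the registerWatcher callback reliable, we can rely on this interface to watch for changes in the page UI and handle popups consistently. Based on business needs and everyday case scenarios, the Totoro framework can watch page changes along the following dimensions, which covers popups in almost every scenario.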

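  • Register a keyword text listener
  • Register a fuzzy content match that taps the exact target control
  • Register desc text
  • Register a resource ID
  • Register a target control that triggers an Action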
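These dimensions map naturally onto predicates over a control's attributes. In the sketch below a control is a plain dict using the attribute names found in UI dumps (`text`, `content-desc`, `resource-id`); the function names and the sample controls are hypothetical.

```python
import re


def make_matcher(text=None, text_pattern=None, desc=None, resource_id=None):
    """Build a predicate over a control; every criterion given must match."""
    def matches(control):
        if text is not None and control.get("text") != text:
            return False
        if text_pattern is not None and not re.search(text_pattern, control.get("text", "")):
            return False
        if desc is not None and control.get("content-desc") != desc:
            return False
        if resource_id is not None and control.get("resource-id") != resource_id:
            return False
        return True
    return matches


def handle_popups(controls, rules):
    """rules is a list of (matcher, action); run each action on its first match."""
    performed = []
    for matcher, action in rules:
        for control in controls:
            if matcher(control):
                performed.append(action(control))
                break
    return performed
```

A rule is then just a matcher paired with an Action, for example `(make_matcher(text="Allow"), tap)` where `tap` is whatever callable performs the click.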

 

3. Machine-learning image detection

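However, for non-native pages (H5 / mini programs) whose control information cannot be dumped, we need machine learning: algorithms analyze the page's UI structure and handle any unexpected popups on it.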

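Totoro's algorithm engineers developed their own control-dump algorithm. Independent of platform and rendering method, it turns an App screenshot into a map of the page's original controls, which enables popup handling in non-native scenarios.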

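The machine-learning analysis is still iterating quickly. Beyond popup analysis, it is also used to detect page anomaly types (including load failures, truncated controls, and black or white screens). It has been deployed in important business lines such as daily admission checks for mini programs and daily compatibility testing of the Alipay wallet, and it will be rolled out to more businesses so that "AI empowerment" is more than a slogan.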

 

Key milestones and plans

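Totoro has been in development for nearly three years since the project started and is still iterating quickly. Over the past year the project's own stability has improved qualitatively, and thanks to joint work with the Ant cloud-testing platform, more and more BUs inside the group are choosing Ant cloud testing and Totoro as their mobile automated cloud-testing solution.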

Plans

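To better support mobile automated testing inside the group and on mPaaS, and to deliver the Totoro experiment SDK efficiently, there is still much we can improve.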

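Going forward we will focus on the following areas, working toward a framework that is standardized, extensible, multi-language, and plugin-based.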

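  • Keep lowering the cost of maintaining cases;
  • Improve script-language support across platforms;
  • Standardize documentation, project configuration, and related build setup;
  • Strengthen AI empowerment and keep finding scenarios where it can land;
  • Build a developer community, embrace developers, support more business lines inside the group, and maximize the project's business value.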

 

Origin blog.csdn.net/weixin_44326589/article/details/96994529