The Evolution of Kiwi Screen Casting

Author's note: Kiwi screen casting has grown alongside Kiwi TV and, as of 2022, has more than 3 million daily active users. Both users and our own team have raised more demands and higher expectations for the features and performance of screen casting, so starting in 2022 we systematically expanded and optimized both. Written from the TV side, this article describes the difficulties we faced and the solutions we adopted while optimizing iQIYI's in-site screen casting. Corrections and suggestions are welcome.

01

   Optimization process review

Since taking over the screen casting function at the beginning of 2022, we have worked on feature expansion and on making troubleshooting more efficient. By the end of 2022, however, we still found iteration and problem handling for screen casting too slow. As the bridge between phones and TVs, screen casting must be highly reliable and stable, and only a solid foundation allows steady, long-term progress. We therefore set out to optimize screen casting and to find thorough solutions to three major problems: an unstable screen casting service, unavailable online data, and inefficient online troubleshooting.

Problem 1: Screencasting service is unstable

To maximize availability, the screen casting service must be able to survive independently of the client process, so it is started in a child process. To iterate flexibly and fix online problems quickly, it must be deployable and upgradable on its own, so it ships as an independent plug-in. The earlier service architecture supported both points, but its single-service design (the screen casting service was registered with the client through ModuleManager) could not properly support stable two-way communication, monitoring of the screen casting service, or keep-alive.

The new solution adopts a dual-service design built on the Android Binder mechanism, which lets each side reliably sense the state of its peer and monitor the connection. The Service is started with both bind and start calls, which raises the priority of the screen casting process for better keep-alive and provides more stable two-way communication.


Problem 2: Online data is not available

In the old screen casting service architecture, data reporting did not cover the entire process. As a result, the reported data was incomplete, the online quality of the screen casting service could not be monitored, and online problems could not be analyzed or solved.

The new screen casting service architecture is designed with three levels of reporting and monitoring:

  • Operation and reliability of the screen casting service module
  • Startup of the screen casting protocol and its result
  • Each step of the video push link

Each level has its own business session mechanism. Every business flow generates a unique SessionId as its session identifier. The SessionId ties together the whole life cycle of the business logic, and business data is reported at each key node as the basis for online data analysis.
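
As a rough illustration of the session mechanism described above, the following Python sketch creates one session per business flow and tags every reported node with its SessionId. The class name `CastSession`, the level and node names, and the event fields are all hypothetical: the article does not show the real data schema, and a real client would upload each record rather than keep it in memory.

```python
import time
import uuid
from dataclasses import dataclass, field


@dataclass
class CastSession:
    """One business flow, identified by a unique session ID (hypothetical model)."""
    level: str  # assumed level names: "service", "protocol", "push"
    session_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    events: list = field(default_factory=list)

    def report(self, node: str, **data) -> dict:
        """Record one key node; a real client would upload this record."""
        event = {
            "session_id": self.session_id,
            "level": self.level,
            "node": node,
            "timestamp_ms": int(time.time() * 1000),
            **data,
        }
        self.events.append(event)
        return event


# Every event of one protocol startup carries the same session ID,
# so the whole flow can be reassembled during analysis.
session = CastSession(level="protocol")
session.report("start_requested", entrance="app_launch")
session.report("start_finished", success=False, error="no_ipv4")
```

Because every record carries the same identifier, reports that arrive out of order or from different modules can still be grouped back into one flow when the data is analyzed.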

  1. Screen casting service module

The design goal at this level is to safeguard and improve the overall reliability of screen casting by collecting data on service functions, process keep-alive, retries, reconnections, and so on.

This module collects keep-alive status information from the processes of online devices. The data exposed and confirmed the causes of instability in the old architecture, and the new version works around them in a targeted way. For example:

| Problem exposed by data feedback | Avoidance and improvement |
| --- | --- |
| Starting the child process with startService leaves it at low priority, so it is easily reclaimed and restarts frequently. | Add a bind call to raise the process priority. |
| On low-performance devices the process takes a long time to start, and newer Android versions raise an ANR (Service.startForeground is not called in time). | Start the Service with bind first, then call startService once the bind succeeds. |
| The plug-in mechanism is defective: the flag parameter of bindService that controls process priority is lost, so the child process gets reclaimed. | In plug-in mode, abandon the child-process mode and run the screen casting service in the main process. |
| The LMK (low memory killer) mechanism of some ROMs is strict: when memory is tight, the foreground-visible main process may be killed in order to keep the higher-priority child process alive. | For affected devices, abandon the child-process mode and switch the screen casting service to main-process mode. |
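
The last two workarounds above amount to a decision about where the service runs. The sketch below expresses that decision in Python purely as an illustration. The function name, the enum, and the idea of a configurable set of affected device models are assumptions, not the actual Android implementation.

```python
from enum import Enum


class ProcessMode(Enum):
    CHILD_PROCESS = "child_process"
    MAIN_PROCESS = "main_process"


def choose_process_mode(is_plugin: bool, device_model: str,
                        affected_models: frozenset) -> ProcessMode:
    """Decide where the screen casting service should run (hypothetical helper)."""
    # In plug-in mode the priority flag of the bind call is lost, so a child
    # process would be reclaimed easily: stay in the main process.
    if is_plugin:
        return ProcessMode.MAIN_PROCESS
    # On devices with a strict low memory killer, a child process may cause
    # the foreground main process to be killed: stay in the main process too.
    if device_model in affected_models:
        return ProcessMode.MAIN_PROCESS
    return ProcessMode.CHILD_PROCESS


# "ModelX" is a placeholder, not a device named in the article.
affected = frozenset({"ModelX"})
print(choose_process_mode(False, "ModelY", affected).value)
```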

  2. Screen casting protocol startup

The core functionality of the screen casting service lies in the protocol layer and the network layer. The design goals at this level are to start the screen casting protocol module and track the result, and to monitor system network changes and restart the protocol module promptly so that the screen casting service remains available on the new network.

After verification and refinement, this module now monitors screen casting protocol startup and gathers statistics on failure causes. It also collects and summarizes the network card and connection information of each online device, as well as the distribution of protocol startup failures across entry scenarios, providing source data and optimization feedback for analyzing and improving the protocol startup success rate. The problems found through online analysis and their solutions are as follows:

| Problem exposed by data feedback | Avoidance and improvement |
| --- | --- |
| The protocol startup success rate is low in network-change scenarios. | The device network is usually already unstable when the network changes, so this cannot be avoided for now. |
| While the process stays alive in the background, device sleep and wake-up close and reopen the network. This frequently triggers network-change events and restarts of the protocol module, and the network is not yet ready right after wake-up. | Delaying the handling of network-change events avoids some abnormal scenarios, but the delay cannot be exact: it cannot fully skip the period when the network is not ready, and it cannot handle cases where the device wakes only briefly before the network sleeps again. In addition, check for a network card and IP at startup to try to eliminate startup failures caused by a missing IPv4 address. |
| Some devices have two network cards connected at once. Wi-Fi frequently disconnects and reconnects, so the protocol module restarts often and the failure rate rises. | Optimize the network card selection strategy: prefer the card whose type matches the system's active network, so that alternating between cards no longer causes frequent restarts of the protocol module. |
| On some devices, querying the system's active network fails or reports no network although the network is actually usable, so some reports carry the no-network error code. | Optimize the network card selection strategy: treat the system's active network only as a reference, and go on to start the protocol module whenever a network card and IP are available. |
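
The delayed handling of network-change events mentioned above is essentially a debounce. Here is a minimal Python sketch of the idea, assuming a caller that passes in the current time. The class name, the method names, and the 3-second quiet period are illustrative choices, not values from the article.

```python
class NetworkChangeDebouncer:
    """Restart the protocol module only after the network has been quiet for a while."""

    def __init__(self, quiet_period_s: float):
        self.quiet_period_s = quiet_period_s
        self._last_change = None  # time of the most recent unhandled change

    def on_network_change(self, now: float) -> None:
        # Each new event pushes the deadline back, so a burst of
        # notifications collapses into a single restart.
        self._last_change = now

    def should_restart(self, now: float) -> bool:
        """Return True exactly once, after the quiet period has elapsed."""
        if self._last_change is None:
            return False
        if now - self._last_change < self.quiet_period_s:
            return False
        self._last_change = None
        return True


debouncer = NetworkChangeDebouncer(quiet_period_s=3.0)
for t in (0.0, 1.0, 2.0):  # a burst of change notifications
    debouncer.on_network_change(t)

# Poll at 4 s (still inside the quiet period), 5 s, and 6 s.
decisions = [debouncer.should_restart(t) for t in (4.0, 5.0, 6.0)]
print(decisions)
```

As the article notes, a fixed delay is only a heuristic: it reduces restarts during a burst but cannot guarantee that the network is ready when the quiet period ends.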

  3. Video push link

This link covers the TV side receiving the push request, verifying the data and local capabilities, pre-caching the playback start-up data, launching the playback interface, and recording each stage along with the first-frame rendering time.

The statistics at this level make it possible to analyze, for both Qimo and DLNA screen casting, the loss at each stage, the time spent in each stage, the playback start success rate, and the start-up time. The optimization points for the video push flow are as follows:

| Problem exposed by data feedback | Avoidance and improvement |
| --- | --- |
| Loss when launching across processes | Adapt per device: switch to main-process mode so that no new process has to be launched. |
| Loss during the Activity startup phase | Loss from launching an Activity in the background comes from system restrictions and cannot be avoided. Guiding users to open the Kiwi app in advance avoids the background-launch scenario. |
| Loss at first-frame rendering | The first-frame rendering rate depends on the playback success rate and on the resources of the pushed video, so it cannot be solved at the push-link level. |
| Time cost of Qimo screen casting | For Qimo screen casting, remove interface calls from the link: in coordination with the iQIYI app, the necessary fields are added to the push so that no interface has to be requested when a video is pushed. |
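
The per-stage loss discussed above can be computed as a simple funnel over the number of sessions that reach each stage. The following Python sketch shows one way to do it. The stage names paraphrase the steps of the push link, and the counts are invented for illustration.

```python
def funnel_loss(stage_counts):
    """Return (transition, loss_rate) for each pair of consecutive stages.

    stage_counts is an ordered list of (stage_name, sessions_that_reached_it).
    """
    losses = []
    for (prev_name, prev_count), (name, count) in zip(stage_counts, stage_counts[1:]):
        # Share of sessions that reached the previous stage but not this one.
        rate = 0.0 if prev_count == 0 else (prev_count - count) / prev_count
        losses.append((f"{prev_name} -> {name}", rate))
    return losses


# Made-up counts for illustration only; they are not data from the article.
stages = [
    ("push_received", 1000),
    ("data_verified", 980),
    ("start_data_cached", 970),
    ("activity_launched", 900),
    ("first_frame_rendered", 855),
]

for transition, rate in funnel_loss(stages):
    print(f"{transition}: {rate:.1%} lost")
```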

  4. Screen casting metrics system

We built a dashboard for the screen casting quality system. After each new version launches, we watch the trends of key metrics and compare them with the previous version over the same period. The metrics include the startup success rate of the screen casting service, the startup success rate of the screen casting protocol, and the average start-up time of Qimo video playback.
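
Comparing a new version with the old one "over the same period" means lining the versions up by days since release. A minimal Python sketch of that aggregation follows. The record layout and the function name are assumptions, and the sample records are invented.

```python
from collections import defaultdict


def success_rate_by_release_day(records):
    """Map (version, days_since_release) to the startup success rate.

    records is an iterable of (version, days_since_release, succeeded) tuples.
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for version, day, succeeded in records:
        totals[(version, day)] += 1
        if succeeded:
            successes[(version, day)] += 1
    return {key: successes[key] / totals[key] for key in totals}


# Made-up records for illustration only.
records = [
    ("13.6", 1, True), ("13.6", 1, False), ("13.6", 1, True), ("13.6", 1, False),
    ("13.7", 1, True), ("13.7", 1, True), ("13.7", 1, True), ("13.7", 1, False),
]
rates = success_rate_by_release_day(records)

# Compare the two versions on the same day after release.
print(rates[("13.6", 1)], rates[("13.7", 1)])
```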

  5. Problem discovery and analysis examples

5.1 Screen casting protocol SDK startup failures and their optimization

1) Problem discovery

At the start of each release, the startup success rate of the screen casting SDK routinely dropped below 90%, as shown below.

2) Cause analysis

Every anomaly has a cause. We analyzed the SDK startup reports from the problem period and ranked them by device. The SDK startup failures had the following characteristics:

  • The device models are concentrated: two models, MagicBox_M20C/A, contribute 80% of the errors.
  • The device IDs are concentrated and trigger the problem frequently: the affected device IDs make up only 3-4% of the DAU of the two models.
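
The device-dimension ranking described above can be sketched in a few lines of Python: count failures per model, compute each model's share, and count how many distinct devices produced them. The function name and record layout are assumptions, and the sample data is invented to mimic the pattern (many failures from very few devices), not taken from the article.

```python
from collections import Counter


def failure_concentration(failures, top_n=2):
    """Rank device models by startup failures.

    failures is an iterable of (device_model, device_id), one per failed startup.
    Returns [(model, share_of_all_failures, distinct_device_count), ...].
    """
    failures = list(failures)
    if not failures:
        return []
    by_model = Counter(model for model, _ in failures)
    devices = {}
    for model, device_id in failures:
        devices.setdefault(model, set()).add(device_id)
    total = len(failures)
    return [(model, count / total, len(devices[model]))
            for model, count in by_model.most_common(top_n)]


# Made-up sample for illustration only.
sample = (
    [("MagicBox_M20C", "dev-1")] * 5
    + [("MagicBox_M20A", "dev-2")] * 3
    + [("OtherModel", "dev-3"), ("OtherModel", "dev-4")]
)
print(failure_concentration(sample))
```

A high failure share combined with a low distinct-device count is the signature the article describes: a small number of devices generating most of the abnormal data.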

Difficulties in reproducing the problem:

  • Neither model is in our test device library.
  • Both are older models that can no longer be purchased.

All we could do was analyze in depth the report sequences of screen casting service startup and protocol startup on individual devices and look for what they had in common. We spot-checked several of the most severely affected device IDs and found:

  • When the failure occurs, the system's active network is wired while Wi-Fi is also connected, and Wi-Fi frequently sends change notifications.
  • When the protocol starts, eth0 and wlan0 are selected alternately. Each change of network card restarts the SDK, producing a huge number of startup failures. The interval between restarts can be as short as 6 seconds.
  • The number of abnormal devices is small, but they generate a large amount of abnormal data.

From this we can infer the scenario in which the problem occurs:

  • When both the wired and the wireless network cards are connected, the ROM of the two Tmall Magic Box models M20A/C frequently (interval <5s) sends notifications that the Wi-Fi network has disconnected or reconnected.
  • Kiwi's old network card selection strategy prefers the Wi-Fi card. In this scenario the wired and wireless cards are therefore used alternately, and the screen casting SDK is restarted each time.
  • During the network change the network card state is unstable, and the frequent restarts raise the probability that the screen casting SDK fails to start.

3) Optimization plan and data verification

We upgraded the network card selection: a new selection strategy was added, and cloud configuration can switch between the old and new strategies so that data from the two can be compared. The new strategy:

  • Prefer the system's active network card
  • Prefer wired over Wi-Fi
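
To make the difference between the two strategies concrete, here is a small Python sketch. It assumes the rules are applied in the order listed (system-active type first, then wired before Wi-Fi) and models the cloud configuration switch as a plain boolean. The `Nic` type, `select_nic`, and the IP addresses are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Nic:
    name: str             # e.g. "eth0" or "wlan0"
    kind: str             # "wired" or "wifi"
    ipv4: Optional[str]   # None while the interface has no address


def select_nic(nics, active_kind, use_new_strategy):
    """Return the interface the protocol module should use, or None."""
    usable = [nic for nic in nics if nic.ipv4]
    if not usable:
        return None
    if use_new_strategy:
        # New strategy: system-active type first, then wired before Wi-Fi.
        return min(usable, key=lambda nic: (nic.kind != active_kind,
                                            nic.kind != "wired"))
    # Old strategy as described in the article: prefer the Wi-Fi card.
    return min(usable, key=lambda nic: nic.kind != "wifi")


eth0 = Nic("eth0", "wired", "192.168.1.10")
wifi_up = Nic("wlan0", "wifi", "192.168.1.11")
wifi_down = Nic("wlan0", "wifi", None)

# Wi-Fi flaps while the wired network stays active.
for wlan0 in (wifi_up, wifi_down, wifi_up):
    old = select_nic([eth0, wlan0], "wired", use_new_strategy=False)
    new = select_nic([eth0, wlan0], "wired", use_new_strategy=True)
    print(old.name, new.name)
```

Under the old rule the chosen card follows the flapping Wi-Fi link, so every flap would restart the SDK. Under the new rule the choice stays on the wired card as long as it remains the system's active network.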

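After the version supporting the new network card selection strategy went live, we used cloud configuration to enable the new strategy on M20A/C devices. As the orange line (v13.6) in the figure below shows, the screen casting SDK success rate turned clearly upward: once the cloud configuration took effect (red circle), the decline stopped, which shows that the new strategy works. In later versions the curve no longer shows severe dips (below 90%).

5.2 High share of no-network error codes in screen casting SDK startup

1) Problem discovery and analysis

After a version reached full rollout, the screen casting SDK success rate still hovered around 98%, short of the 99% target. We therefore needed to focus on the error causes and tackle the largest sources of error data in order to raise the success rate quickly.

We collected the SDK startup data, aggregated it by device, and ranked the devices in descending order of total errors. We found:

  • Sony devices took 9 of the top 10 places, making them a typical case
  • By error type, no-network errors made up a large share; the corresponding cause was that querying the system's current active network either failed or reported no network

2) Optimization plan and data verification

  • Change what the network-availability check depends on: the system's active network is only a reference, and a failed check does not block the subsequent startup
  • Use the network card's IP as a fallback: if a network card has a suitable IP, the system's active network can be ignored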
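
The fallback check described in the two points above can be sketched as follows in Python. What counts as a "suitable" IP is not specified in the article, so this sketch assumes it means an IPv4 address that is not loopback, link-local, or unspecified. The function names and the tri-state `active_network_ok` argument are likewise illustrative.

```python
import ipaddress
from typing import Optional


def is_suitable_ipv4(address: Optional[str]) -> bool:
    """Assumed rule: a valid IPv4 that is not loopback, link-local, or 0.0.0.0."""
    if not address:
        return False
    try:
        ip = ipaddress.IPv4Address(address)
    except ipaddress.AddressValueError:
        return False
    return not (ip.is_loopback or ip.is_link_local or ip.is_unspecified)


def can_start_protocol(active_network_ok: Optional[bool], nic_ips: dict) -> bool:
    """Decide whether to start the protocol module.

    active_network_ok is True or False from the system query, or None if the
    query itself failed. nic_ips maps interface name to its IPv4 string or None.
    """
    if active_network_ok:
        return True
    # Fallback: a failed or negative system query does not block startup
    # as long as some interface holds a suitable IP.
    return any(is_suitable_ipv4(ip) for ip in nic_ips.values())


# The system query failed, but eth0 has a usable address, so startup proceeds.
print(can_start_protocol(None, {"eth0": "192.168.1.20", "wlan0": None}))
```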

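After the new version went live, we used cloud configuration to apply the new network-check strategy to this batch of devices and compared online data before and after the change for 40 device models:

  • 10 models (including Sony and Xiaomi): error count/rate fell by more than 90%, a significant effect
  • 9 models: error count/rate fell by more than 50%, a clear effect
  • 10 models: error count/rate fell by only a little over 20%, a moderate effect
  • 4 models: improvement below or close to 10%, no noticeable effect
  • 6 Samsung devices: not covered by the upgrade, almost no effect

With the new strategy applied, the overall no-network error rate after full rollout fell by about half. As the figure below shows, in the full-rollout region marked by the red box, 13.7 and 13.8 improved by nearly 50% over 13.6 at the same stage. The red circle marks the decline in the 13.6 error rate during the period when the new strategy was applied.

After this round of adaptation and optimization, the screen casting protocol startup success rate after full rollout reaches 98.5%+.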

Problem 3: Online fault reports for screen casting are resolved inefficiently

  1. Difficulties and countermeasures

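| Difficulty | Impact | Solution |
| --- | --- | --- |
| TV-side logs are incomplete | Key logs are missing, so problems cannot be located | The new screen casting service architecture improves log reporting from the screen casting process. The new logging system makes it possible to add more key logs. |
| Logs come from one end only | Joint analysis of both ends is not possible | Added linked fault reporting: when the mobile side reports a screen casting fault, it instructs the TV to append a copy of the TV logs to the same ticket. For cases where the TV device cannot be found, customer service guides the user to report from both ends. |
| Only in-app logs can be collected, no system logs | System behavior cannot be analyzed | Cannot be solved for now. We can only log as many of the app's calls to system interfaces as possible. |
| Only case-by-case analysis is possible | Individual cases share almost no common features, so they cannot be generalized, analyzed, and solved, and their impact cannot be judged | Analyze cases together with online data and try to solve a class of problems rather than a single problem. Expand the ways of discovering devices: add LAN QR-code screen casting to address failures to find the TV caused by network jitter and other network instability. Expand the communication channels: add remote screen casting and establish a WAN communication channel. |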

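  2. Batch analysis method

By correlating fault reports with the quality reporting data, we established a batch analysis process for user fault reports, which makes analyzing user feedback more efficient. The process is shown in the figure below.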

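02

   A promising future

We sum up the past in order to create a better future. Through the joint efforts of multiple teams, by the end of 2023 the screen casting function had achieved milestone results in stability (99%+), success rate (98.5%+), and monitorability, laying a solid foundation for its further development and innovation.

Where does screen casting go from here? The TV's position as the center of home entertainment will not be shaken easily in the short term, and the phone, as an indispensable personal device, will be hard to replace any time soon. As the bridge between phone and TV, screen casting aims to achieve a 1+1>2 effect:

  1. Play to each device's strengths:
    1. The TV offers a better viewing experience (large screen, high picture quality, good sound), but is less convenient to control;
    2. The phone is convenient to control, but its viewing experience falls short of the TV's;
  2. Break new ground: breaking boundaries and closing distances opens up more possibilities.
    1. Remote screen casting: extending phone-TV interaction from the LAN to the WAN pushes out the boundary of screen casting and also brings people closer together, so that your phone can connect to your parents' TV;
    2. Internet of everything: the Internet of Things, part of the current wave of technological innovation, is already making its mark. The TV as the center of the home and the phone as an extension of the individual are already connected through screen casting. As more home devices join the Internet of Things, the bridge of screen casting will surely open up more possibilities.

The future is already here. We look forward to working with everyone to build a new iQIYI screen casting ecosystem.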



This article is shared from the WeChat official account 爱奇艺技术产品团队 (iQIYI-TP).

Origin my.oschina.net/u/4484233/blog/11044122