DoKit Multi-Control: Non-Invasive JS Injection into WebView | DiDi Open Source

Introduction

Since it was officially open-sourced in 2018, DoKit has been refined for five years and has grown into a fairly complete ecosystem that supports six major platforms (Android, iOS, Web, Mini Programs, Flutter, and PC). It is a flagship incubation project of the Didi Open Source Committee. Outside Didi, it is widely used by many leading companies and has earned a good reputation.

Within Didi, DoKit has been adopted by all business lines and apps. It covers essentially all day-to-day development and testing scenarios and improves the daily efficiency of R&D and QA engineers. However, as Didi's internal cross-platform R&D solutions became widespread, they significantly improved R&D efficiency but also put more pressure on the quality team. As the twin of the cross-platform R&D solution, DoKit will therefore continue to focus on testing efficiency in 2023, with technologies such as UI automation testing and multi-control (operating one device to control many). Here we share the technical difficulties we ran into while developing multi-control, and how we solved them.

This article is divided into:

1. Two major problems encountered in multi-control

2. What are the conventional JS injection methods?

  • Injecting with loadUrl

  • Injecting with evaluateJavascript

  • Modifying the HTML to insert a <script> tag

3. Why must the injected code run first?

  • A failed first attempt

  • The injected code must run first

  • Injecting by inserting a <script> tag into the HTML

4. How to implement injection by inserting a <script> tag into the HTML?

  • Why does inserting a <script> tag let the code run first?

  • How to intercept HTML file loading?

  • How to insert a <script> tag into the HTML file?

5. How to achieve non-intrusive code?

  • Setting WebViewClientProxy non-intrusively

  • When to insert the code

  • Setting WebViewClientProxy

  • Handling compatibility issues

6. Summary

During mobile development, developers often encounter three problems:

1. Many debugging and testing tools have to be built ad hoc during development;

2. The tools have many separate entry points, and it is often hard to find the function you need;

3. Designers have to go through tedious steps to check whether the UI matches the design draft during UI acceptance...

To solve these three everyday problems, the DoKit team has built more than 30 basic R&D efficiency tools (including many testing and visual tools) and gathered them behind a single entry point, so every tool is clearly visible and easy to find. To address scattered tool entry points and the high cost of building tools during daily development, DoKit also offers extensive customization, so that business-specific custom tools can be integrated. The goal is to be a solid R&D efficiency toolkit for mobile developers.

1. Two major problems encountered in multi-control

DoKit multi-control is, simply put, a feature that lets you operate one phone and have multiple phones perform the same operations at the same time. It aims to improve the quality team's testing efficiency, especially for multi-device compatibility testing. Over the past year we have kept developing and improving it, and it is now fairly complete and stable. After finishing multi-control for both the native side and H5, we found that H5 multi-control only works if dokit-for-web.js is included when the H5 page is published to the test environment.

On the native side, multi-control is likewise compiled in and initialized by adding the DoKit dependency. But when the app's business flow contains H5 pages, native and H5 multi-control must be enabled together; otherwise the multi-control session is interrupted as soon as the test reaches an H5 page, and the devices can no longer operate in sync.

(Figure: screenshot of the DoKit multi-control demo)

Making H5 support multi-control is not difficult: the page only needs to include dokit-for-web.js. The multi-control code, however, must not be shipped to the production environment.

In the Debug build of the native app, you can switch between environments (test/production) at any time. As a result, H5 multi-control becomes unavailable once the app is switched to the production environment.

To keep the experience smooth and let H5 multi-control run in both the test and production environments, H5 multi-control needs to be governed by the same environment switch as the native side. However, the web front-end R&D system differs from the client's. Although test-environment deployment services exist, switching H5 to the test environment with one click is difficult: the H5 pages are numerous and scattered, and coordinating across teams is costly.

Injecting the dokit-for-web.js code from the client is therefore the ideal choice. It solves the problem of switching environments in sync, and it also solves the problem that the production H5 services cannot integrate the multi-control code.

Through repeated exploration of injecting JavaScript into WebView, we settled on an implementation that fits DoKit's design philosophy.

Let's look at how DoKit explored and implemented JavaScript injection into WebView on Android.

2. What are the conventional JS injection methods?

A search of the available material shows the common injection approaches: set a WebViewClient on the WebView and, in its onPageFinished() callback (or, at the earliest, after onPageStarted()), load the JS to inject through loadUrl() or evaluateJavascript(); or modify the HTML file directly and insert a <script> tag that carries the JS code.

Inject using loadUrl

Use the loadUrl method provided by WebView to inject JS code. This API has been available since the earliest Android versions and works on all of them. JavaScript support must be enabled on the WebView.

Sample code:

// Enable JavaScript execution support
webView.getSettings().setJavaScriptEnabled(true);


@Override
public void onPageFinished(WebView webView, String url) {
    webView.loadUrl("javascript:javacalljs()");
    super.onPageFinished(webView, url);
}
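Because loadUrl() takes a URL, the JS to inject has to be packed into a javascript: URL. The following is a minimal plain-Java sketch of such a helper. The class and method names are hypothetical, and wrapping the snippet in an immediately invoked function is an assumption made here to keep temporary variables out of the page's global scope; the article itself does not prescribe it.

```java
final class JsUrlBuilder {
    private static final String SCHEME = "javascript:";

    /** Wraps a single-line JS snippet in an IIFE and prefixes the javascript: scheme. */
    static String toJavascriptUrl(String js) {
        String body = js.trim();
        // Strip an existing scheme prefix so it is not applied twice.
        if (body.startsWith(SCHEME)) {
            body = body.substring(SCHEME.length());
        }
        return SCHEME + "(function(){" + body + "})();";
    }

    public static void main(String[] args) {
        // prints: javascript:(function(){javacalljs()})();
        System.out.println(toJavascriptUrl("javacalljs()"));
    }
}
```

The returned string would be what gets passed to webView.loadUrl(). The sketch assumes the snippet contains no line comments, since everything ends up on one line.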

Inject using evaluateJavascript

Use WebView's evaluateJavascript to inject JS code. Note that this API is only available on Android 4.4 and above. evaluateJavascript delivers the result of the injected code directly to a callback, which is very convenient when a return value is needed; with loadUrl, getting a result requires a more complicated callback mechanism.

Sample code:

public void onPageFinished(WebView webView, String url) {
    // Available on Android 4.4 and above
    webView.evaluateJavascript("javacalljs()", new ValueCallback<String>() {
        @Override
        public void onReceiveValue(String value) {
            // TODO: handle the return value of the JS call
        }
    });
}
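The value handed to onReceiveValue() is the JS result encoded as JSON, so a JS string arrives wrapped in double quotes and a missing result arrives as the text null. A minimal plain-Java sketch of unwrapping such a value is shown below. The class name is hypothetical, and the sketch assumes well-formed input; a real project would more likely use a JSON library.

```java
final class JsResultDecoder {
    /** Decodes a JSON string literal; other JSON values are returned as raw text. */
    static String decode(String value) {
        if (value == null || value.equals("null")) {
            return null;
        }
        int last = value.length() - 1;
        if (last < 1 || value.charAt(0) != '"' || value.charAt(last) != '"') {
            return value; // number, boolean, object or array: keep the JSON text
        }
        StringBuilder out = new StringBuilder();
        for (int i = 1; i < last; i++) {
            char c = value.charAt(i);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            char next = value.charAt(++i); // character after the backslash
            switch (next) {
                case 'n': out.append('\n'); break;
                case 't': out.append('\t'); break;
                case 'r': out.append('\r'); break;
                case 'b': out.append('\b'); break;
                case 'f': out.append('\f'); break;
                case 'u': // four hex digits follow
                    out.append((char) Integer.parseInt(value.substring(i + 1, i + 5), 16));
                    i += 4;
                    break;
                default: out.append(next); // \" \\ \/
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("\"ok\"")); // prints: ok
    }
}
```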

Modifying the HTML to insert a <script> tag

Modifying the HTML to insert a <script> tag is a pure front-end way of injecting JS: it is equivalent to adding a piece of JS code to the page source directly. It is typically seen in scenarios where malicious code is injected by intercepting and modifying network traffic. This method is not limited by browser APIs; it works as long as the network communication can be intercepted.

A normal html file is as follows:

<html>
<head>
    <meta charset="utf-8">
    <meta name="renderer" content="webkit">
    <script type="text/javascript" src="https://dokit.cn/js/h.js/"></script>
</head>
<body>
    <div class="main">...</div>
    <div class="pan">....</div>
    <script type="text/javascript" src="https://dokit.cn/js/a.js/"></script>
    <script type="text/javascript" src="https://dokit.cn/js/b.js/"></script>
    <script type="text/javascript" src="https://dokit.cn/js/c.js/"></script>
</body>
</html>

The modified html file is as follows:

<html>
<head>
    <meta charset="utf-8">
    <meta name="renderer" content="webkit">
    <script type="text/javascript" src="https://dokit.cn/js/h.js/"></script>
</head>
<body>
    <div class="main">...</div>
    <div class="pan">....</div>
    <script type="text/javascript" src="https://dokit.cn/js/a.js/"></script>
    <script type="text/javascript" src="https://dokit.cn/js/b.js/"></script>
    <script type="text/javascript" src="https://dokit.cn/js/c.js/"></script>
    <!-- Injected code below -->
    <script>javacalljs()</script>
</body>
</html>

3. Why must the injected code run first?

A failed first attempt

At first, we injected the multi-control JS code with loadUrl and evaluateJavascript. It quickly passed on our test page, but many problems surfaced once we moved to real business scenarios. Investigation showed that not all network requests were intercepted, so some data requested by the master could not be synchronized to the slaves. The slaves then failed to load data and render the page properly, and some events were not intercepted either.

We concluded that the test page was too simple and did not reflect real usage. Business pages usually rely on various frameworks, and these frameworks often perform operations that they make sure run first, which causes the multi-control hooks to fail. On the one hand, we need to be compatible with the frameworks that the business uses. On the other hand, operations inside those frameworks often interfere with the hook points, so the multi-control code must run first in order to hook the key points in advance.

The injected code must run first

If the injected code cannot be guaranteed to run first, some of the hooks it needs will not work properly. For example, if network interception is not in place at the earliest moment, some requests are missed and their data cannot be synchronized between master and slave. Likewise, if the hooks are installed only after the page has rendered, some user interaction events on the page can no longer be hooked.

Injecting the multi-control JS through loadUrl or evaluateJavascript relies on interfaces provided by WebView, and these interfaces offer no injection point precise enough to guarantee both that the injected code runs successfully and that it runs before any other JS code.
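The ordering problem can be illustrated with a small plain-Java simulation. This is only an analogy under stated assumptions: the map stands in for the page's global object, "fetch" for a network API, and the "framework" is a hypothetical piece of code that caches a reference to that API while it initializes. None of these names come from DoKit.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class HookOrderDemo {
    /** Returns the requests the hook managed to intercept for the given order. */
    static List<String> run(boolean hookFirst) {
        Map<String, Function<String, String>> globals = new HashMap<>();
        List<String> intercepted = new ArrayList<>();
        globals.put("fetch", url -> "response for " + url);

        // The hook replaces the global entry with a wrapper that records each request.
        Runnable installHook = () -> {
            Function<String, String> original = globals.get("fetch");
            globals.put("fetch", url -> {
                intercepted.add(url);
                return original.apply(url);
            });
        };

        if (hookFirst) {
            installHook.run();
        }
        // The framework caches whatever "fetch" is at the time it initializes.
        Function<String, String> cachedByFramework = globals.get("fetch");
        if (!hookFirst) {
            installHook.run();
        }

        cachedByFramework.apply("/api/orders");
        return intercepted;
    }

    public static void main(String[] args) {
        System.out.println("hook first: " + run(true));
        System.out.println("hook late:  " + run(false));
    }
}
```

When the hook is installed late, the framework keeps calling the original function it cached, so the request never reaches the hook. This is the kind of miss described above.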

Injecting by inserting a <script> tag into the HTML

After talking with our front-end colleagues, we concluded that making the injected code run first requires a front-end approach. WebView itself offers no API that meets this need, short of shipping a custom-modified browser engine that provides it.

Inserting a <script> tag provides the basis for running first, because WebView uses a single-threaded model: as long as no multi-threaded APIs are used, page loading and JS execution happen sequentially.

So as long as we follow the rules of HTML, we can control precisely when the inserted <script> tag is executed.

4. How to implement injection by inserting a <script> tag into the HTML?

Why does inserting a <script> tag let the code run first?

Besides WebView's single-threaded model mentioned above, HTML pages load according to fixed rules: the <head> tag is processed first, and then the <body> tag.

When a <script> tag is encountered inside <head>, the parser checks whether it references an external JS file or contains inline code. An external file has to be downloaded first. Inline code pauses the loading of the HTML page while the JavaScript engine executes it, and the remaining tags are processed once the code finishes. The content of <body> is loaded only after the content of <head> has been processed.

A <script> tag inside <body> is handled the same way: an external file is downloaded first, while inline code pauses page loading, runs in the JavaScript engine, and then lets the page continue loading.

The injected <script> tag should therefore go inside <head>, at the very beginning, and it must contain inline code rather than reference a JS file.

Sample code:

<html>
<head>
    <!-- Injected code below; it must be placed at the very start of head -->
    <script>javacalljs()</script>
    <meta charset="utf-8">
    <meta name="renderer" content="webkit">
    <script type="text/javascript" src="https://dokit.cn/js/h.js/"></script>
</head>
<body>
    <div class="main">...</div>
    <div class="pan">....</div>
    <script type="text/javascript" src="https://dokit.cn/js/a.js/"></script>
    <script type="text/javascript" src="https://dokit.cn/js/b.js/"></script>
    <script type="text/javascript" src="https://dokit.cn/js/c.js/"></script>
    <!-- Injected code below; placed here it will not run first -->
    <script>javacalljs()</script>
</body>
</html>
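The placement rule can be sketched in plain Java without any HTML library. Note that this is a substitute technique for illustration only: it uses a regular expression to find the opening <head> tag, whereas DoKit parses the document with Jsoup. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HeadInjector {
    // Matches the opening <head> tag, with or without attributes, in any letter case.
    // It does not match <header>, because only whitespace or '>' may follow "head".
    private static final Pattern HEAD_OPEN =
            Pattern.compile("<head(\\s[^>]*)?>", Pattern.CASE_INSENSITIVE);

    /** Inserts an inline script right after the opening <head> tag. */
    static String injectFirst(String html, String js) {
        Matcher m = HEAD_OPEN.matcher(html);
        if (!m.find()) {
            return html; // no <head>: this sketch leaves the document untouched
        }
        String tag = "<script>" + js + "</script>";
        return html.substring(0, m.end()) + tag + html.substring(m.end());
    }

    public static void main(String[] args) {
        String page = "<html><head><meta charset=\"utf-8\"></head><body></body></html>";
        System.out.println(injectFirst(page, "javacalljs()"));
    }
}
```

A regular expression cannot tell a real tag from one that appears inside a comment or a string, which is one reason a proper HTML parser is the safer choice for arbitrary pages.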

How to intercept HTML file loading?

Reading the code of WebView and WebViewClient shows that the network requests sent by a WebView can be intercepted. Through WebViewClient's shouldInterceptRequest() method, the WebView's network requests can be intercepted, including the request that loads the HTML file.

Use webRequest.isForMainFrame() to identify the request for the HTML file. On older Android versions, where the callback only receives the URL, check whether the file suffix is .html. After intercepting the request, you can either return a WebResourceResponse directly or fetch the HTML file through your own network library, for example with an OkHttp request. Note that the HTML file must be fetched synchronously, blocking the current thread.

Sample code:

// Request interception callback; deprecated since Android API 21
public WebResourceResponse shouldInterceptRequest(WebView view, String url) {
    if (DoKitUtils.isHtmlRequest(url)) {
        // TODO: proxy the HTML request and inject the JS code
        return DoKitUtils.hookWebViewHtmlRequest(url);
    }
    return super.shouldInterceptRequest(view, url);
}
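The article does not show how the URL check is implemented. One possible shape, in plain Java, is sketched below; the class name and the logic are assumptions for illustration, not DoKit's actual code. It looks only at the path, so a query string or fragment does not hide the .html suffix.

```java
import java.net.URI;
import java.util.Locale;

final class HtmlRequestMatcher {
    /** Returns true when the URL's path ends with .html (case-insensitive). */
    static boolean isHtmlRequest(String url) {
        if (url == null) {
            return false;
        }
        String path;
        try {
            path = URI.create(url).getPath();
        } catch (IllegalArgumentException e) {
            return false; // not a parseable URL
        }
        // Opaque URLs such as "javascript:..." have no path at all.
        return path != null && path.toLowerCase(Locale.ROOT).endsWith(".html");
    }

    public static void main(String[] args) {
        System.out.println(isHtmlRequest("https://dokit.cn/index.html?from=app")); // prints: true
        System.out.println(isHtmlRequest("https://dokit.cn/js/a.js"));             // prints: false
    }
}
```

A suffix check misses pages served from extensionless URLs, which is why isForMainFrame() is the preferred signal where it is available.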

Sample code:

// Request interception callback; added in Android API 21
public WebResourceResponse shouldInterceptRequest(WebView view, WebResourceRequest request) {
    if (request.isForMainFrame()) {
        // TODO: proxy the HTML request and inject the JS code
        return DoKitUtils.hookWebViewHtmlRequest(request);
    }
    return super.shouldInterceptRequest(view, request);
}

How to insert <script> tag in HTML file?

Parsing the HTML text

The shouldInterceptRequest() callback provided by WebViewClient lets us intercept the network request for the HTML file and obtain the file. The HTML then has to be parsed so that the injected <script> tag can be inserted at the desired position.

Parsing the HTML yourself with an XML parsing library is complicated, and it does not let you look up and modify the desired tag directly. DoKit uses Jsoup to parse the HTML text.

Sample code:

public String handleWebViewHtml(String html){
    Document document = Jsoup.parse(html);
    //todo insert dokit <script>
    return document.toString();
}

Inserting the <script> tag

Once the HTML file has been fetched and parsed with Jsoup, find the <head> tag and insert the multi-control <script> tag at the very beginning of it. Finally, construct a WebResourceResponse object and return it from the shouldInterceptRequest() callback.

Sample code:

/**
 * Intercepts the HTML request and injects the JS code
 */
public static WebResourceResponse hookWebViewHtmlRequest(WebResourceRequest request) {
    Response response = requestWebViewHtml(request);
    if (response != null) {
        String html = injectScript(toHtml(response));
        return new WebResourceResponse("text/html",
            response.header("content-encoding", "utf-8"),
            ConvertUtils.string2InputStream(html, "utf-8"));
    }
    return null;
}


/**
 * Inserts the script tag
 */
private static String injectScript(String html) {
    // Read the local JS hook code
    String dokitScript = ResourceUtils.readAssets2String("dokit/dokit_script.html");
    Document doc = Jsoup.parse(html);
    doc.outputSettings().prettyPrint(true);
    Elements elements = doc.getElementsByTag("head");
    if (elements.size() > 0) {
        elements.get(0).prepend(dokitScript);
    }
    return doc.toString();
}
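WebResourceResponse takes the MIME type and the text encoding as two separate arguments. The sample above reads the encoding from the content-encoding header, which on most servers carries a compression scheme such as gzip rather than a charset. As a hedged alternative, the charset could be taken from the Content-Type header instead. The plain-Java sketch below shows one way to split that header; the class name and the text/html and utf-8 fallbacks are assumptions for illustration, not part of DoKit.

```java
import java.util.Locale;

final class ContentTypeParser {
    /** Splits a Content-Type value into {mimeType, charset}, with fallbacks. */
    static String[] parse(String contentType) {
        String mime = "text/html"; // assumed fallback for an HTML main-frame request
        String charset = "utf-8";  // assumed fallback encoding
        if (contentType != null) {
            String[] parts = contentType.split(";");
            if (parts.length > 0 && !parts[0].trim().isEmpty()) {
                mime = parts[0].trim().toLowerCase(Locale.ROOT);
            }
            for (int i = 1; i < parts.length; i++) {
                String param = parts[i].trim();
                if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String value = param.substring("charset=".length()).trim().replace("\"", "");
                    if (!value.isEmpty()) {
                        charset = value;
                    }
                }
            }
        }
        return new String[] {mime, charset};
    }

    public static void main(String[] args) {
        String[] parsed = parse("text/html; charset=UTF-8");
        System.out.println(parsed[0] + " / " + parsed[1]); // prints: text/html / UTF-8
    }
}
```

The two elements would then be passed as the first and second constructor arguments of WebResourceResponse.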

5. How to achieve non-intrusive code?

At first, setting the WebViewClient seemed easy: just call setWebViewClient() with a WebViewClientProxy after creating the WebView. Bytecode modification is already used by many DoKit features, and there was already a reference implementation for modifying WebView-related bytecode. In practice, however, a fully non-intrusive bytecode modification still requires attention to several details.

Non-intrusive setting of WebViewClientProxy

First, the solution must not intrude on the host code (a consistent principle of DoKit features: no code intrusion, usable or removable at any time), so we cannot ask integrators to set the WebViewClientProxy themselves after creating a WebView. We chose a mature technique already used widely in DoKit: modifying bytecode with ASM during the transform phase to set the WebViewClientProxy on the WebView.

Setting the WebViewClientProxy through bytecode modification requires the DoKit plug-in, and the plug-in's switch must be turned on.

When to insert code

First candidate insertion point

The first candidate is the moment the WebView is created: ASM would insert code into the constructor to set the WebViewClient. But WebView is a system class. It is not part of the compiled output and is not written by the app, so code cannot be injected into its constructor. In addition, the business side usually sets its own WebViewClient on the WebView, so even a client set in the constructor or at creation time risks being overwritten. Setting the WebViewClient in the constructor is therefore not sufficient.

Sample code:

public class WebView {
  // Constructor
  public WebView(Context context) {
    setWebViewClient(new WebViewClientProxy());
  }
}

Second candidate insertion point

The second candidate is setWebViewClient(): ASM would modify this method and inject code that wraps the client in a WebViewClientProxy. Again, because WebView is a system class that is not part of the compiled output, its bytecode cannot be modified. Moreover, not every WebView has a WebViewClient set. If a WebView never calls setWebViewClient(), the WebViewClientProxy is never installed, so even if WebView could be modified, this approach would not cover all scenarios.

Sample code:

public void setWebViewClient(WebViewClient webViewClient) {
  WebViewClientProxy proxy = new WebViewClientProxy(webViewClient);
  super.setWebViewClient(proxy);
}
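The wrapping idea behind WebViewClientProxy is ordinary delegation: the proxy does its own work and then forwards the call to the client the business code supplied, so business behavior is preserved. A minimal plain-Java sketch follows. PageClient is a hypothetical one-method stand-in for WebViewClient, used so the example runs outside Android; it is not DoKit's class.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for WebViewClient with a single callback. */
interface PageClient {
    void onPageStarted(String url);
}

final class PageClientProxy implements PageClient {
    final List<String> log = new ArrayList<>();
    private final PageClient delegate; // the business client; may be null

    PageClientProxy(PageClient delegate) {
        this.delegate = delegate;
    }

    @Override
    public void onPageStarted(String url) {
        // Tool-side work happens first.
        log.add("proxy saw " + url);
        // Then the original client still receives the callback.
        if (delegate != null) {
            delegate.onPageStarted(url);
        }
    }

    public static void main(String[] args) {
        PageClientProxy proxy =
                new PageClientProxy(url -> System.out.println("business saw " + url));
        proxy.onPageStarted("https://dokit.cn/");
        System.out.println(proxy.log);
    }
}
```

A real proxy would forward every WebViewClient callback in the same way, not just one.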

Third candidate insertion point

The third candidate is the call to WebView's loadUrl() method, which loads a page or JS code. By the time loadUrl is called, the WebView should be fully set up: if the business code needs a WebViewClient, it should already have set one. When setting the WebViewClientProxy, we can therefore fetch the WebViewClient that was already set and proxy it without affecting business behavior. loadUrl is our last chance to set the WebViewClientProxy, and it is also a method that must be called to load a page.

There is still a problem: the bytecode of WebView's loadUrl method cannot be modified directly to insert the logic that sets the WebViewClientProxy.

Since WebView's loadUrl method cannot be modified, the only option is to modify the bytecode at every call site of loadUrl. This requires scanning all of the code, which makes the bytecode transformation relatively slow. If the integrator uses a custom WebView subclass that overrides loadUrl, the expensive full scan can be avoided.

Sample code:

klass.methods.forEach { method ->
    method.instructions?.iterator()?.asIterable()
        ?.filterIsInstance(MethodInsnNode::class.java)?.filter {
            if ("loadUrl".equals(it.name)) {
                "hook loadUrl() all ${className} ^${superName}^${it.owner} :: ${it.name} , ${it.desc} ,${it.opcode}".println()
            }
            (it.opcode == Opcodes.INVOKEVIRTUAL || it.opcode == Opcodes.INVOKESPECIAL)
                && it.name == "loadUrl"
                && (it.desc == "(Ljava/lang/String;)V" || it.desc == "(Ljava/lang/String;Ljava/util/Map;)V")
                && isWebViewOwnerNameMatched(it.owner)
        }?.forEach {
            "${context.projectDir.lastPath()}->hook WebView#loadurl method  succeed in :  ${className}_${method.name}_${method.desc} | ${it.owner}".println()
            if (it.desc == "(Ljava/lang/String;)V") {
                method.instructions.insertBefore(it, createWebViewInsnList())
            } else {
                method.instructions.insertBefore(it, createWebViewInsnList(method))
            }
        }
}

Sample code:

/**
 * Creates the instruction list for the WebView hook
 * Reference: https://www.jianshu.com/p/7d623f441bed
 */
private fun createWebViewInsnList(): InsnList {
    return with(InsnList()) {
        // Duplicate the top two values on the stack; the loads become e.g. aload 2 aload0 / aload 2 aload0
        add(InsnNode(Opcodes.DUP2))
        // Pop the topmost value; the loads become aload 2 aload0 / aload 2, where aload 2 is the object we need
        add(InsnNode(Opcodes.POP))
        add(
            MethodInsnNode(
                Opcodes.INVOKESTATIC,
                "com/didichuxing/doraemonkit/aop/WebViewHook",
                "inject",
                "(Ljava/lang/Object;)V",
                false
            )
        )
        add(
            MethodInsnNode(
                Opcodes.INVOKESTATIC,
                "com/didichuxing/doraemonkit/aop/WebViewHook",
                "getSafeUrl",
                "(Ljava/lang/String;)Ljava/lang/String;",
                false
            )
        )
        this
    }
}
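The effect of these four instructions on the operand stack is easier to follow as a trace. The plain-Java simulation below models the stack just before a loadUrl(String) call, where the receiver and the URL argument have already been pushed. It is only a model of the stack discipline: the strings are placeholders for the real references, and the two-argument loadUrl variant is not covered.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

final class OperandStackDemo {
    /** Returns the stack contents, bottom first, after each step. */
    static List<String> trace() {
        Deque<String> stack = new ArrayDeque<>();
        List<String> steps = new ArrayList<>();

        stack.push("webView"); // receiver of loadUrl
        stack.push("url");     // its String argument
        steps.add(snapshot(stack));

        // DUP2: duplicate the top two values.
        String top = stack.pop();
        String below = stack.pop();
        stack.push(below);
        stack.push(top);
        stack.push(below);
        stack.push(top);
        steps.add(snapshot(stack));

        // POP: drop the duplicated url, leaving the duplicated webView on top.
        stack.pop();
        steps.add(snapshot(stack));

        // INVOKESTATIC inject(Object)V: consumes the webView copy, returns nothing.
        stack.pop();
        steps.add(snapshot(stack));

        // INVOKESTATIC getSafeUrl(String)String: replaces url with the returned value.
        stack.push("safe(" + stack.pop() + ")");
        steps.add(snapshot(stack));
        return steps;
    }

    private static String snapshot(Deque<String> stack) {
        List<String> items = new ArrayList<>(stack); // iteration starts at the top
        Collections.reverse(items);
        return items.toString();
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println);
    }
}
```

After the last step the stack again holds exactly a receiver and one String, so the original loadUrl call that follows runs unchanged, only with the URL returned by getSafeUrl.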

Set WebViewClientProxy

The code above hooks every call to loadUrl on a WebView. After interception, the actual logic is written in ordinary Java. It first checks whether the incoming object really is a WebView (the method name was confirmed to be loadUrl at insertion time, but that does not guarantee the receiver is a WebView). It then enables JavaScript support and other settings so that the injected JS code can run.

Sample code:

private static void injectNormal(Object webView) {
    if (webView instanceof WebView) {
        WebView view = (WebView) webView;
//        webView.setWebContentsDebuggingEnabled(true);
        // Install the proxy only if it has not been installed yet
        if (!(WebViewCompat.getWebViewClient(view) instanceof DoKitWebViewClient)) {
            WebSettings settings = view.getSettings();
            settings.setJavaScriptEnabled(true);
            settings.setDatabaseEnabled(true);
            settings.setDomStorageEnabled(true);
            settings.setAllowUniversalAccessFromFileURLs(true);
            view.addJavascriptInterface(new DoKitJSI(), "dokitJsi");
            view.setWebViewClient(new DoKitWebViewClient(WebViewCompat.getWebViewClient(view), settings.getUserAgentString()));
        }
    }
}

Handling Compatibility Issues

Compatibility with different Android versions

WebViewCompat.getWebViewClient(webView) has a version limitation: it only works on API level 26 and above.

To support API levels 21-25, hook WebView's setWebViewClient method, store the WebViewClient object when it is set, and pass it to the WebViewClientProxy when needed.
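The "store it when it is set" step needs some place to keep the client for each WebView. One possible shape is a weak-keyed map, sketched below in plain Java. The class and method names are hypothetical, and Object is used for both the WebView and the client so the example runs outside Android; this is not DoKit's implementation.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

final class WebViewClientRegistry {
    // Weak keys: an entry can disappear once its WebView is no longer referenced.
    private static final Map<Object, Object> CLIENTS =
            Collections.synchronizedMap(new WeakHashMap<>());

    /** Called from the hooked setWebViewClient(); a null client clears the entry. */
    static void remember(Object webView, Object client) {
        if (client == null) {
            CLIENTS.remove(webView);
        } else {
            CLIENTS.put(webView, client);
        }
    }

    /** Returns the client last set on this WebView, or null if none was recorded. */
    static Object lookup(Object webView) {
        return CLIENTS.get(webView);
    }

    public static void main(String[] args) {
        Object webView = new Object();
        Object client = new Object();
        remember(webView, client);
        System.out.println(lookup(webView) == client); // prints: true
    }
}
```

Values in a WeakHashMap are held strongly, so a client that references its own WebView keeps the entry alive. A production version would likely also clear the entry explicitly when the WebView is destroyed.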

Support for different WebView implementations

To support X5 WebView: when setting the WebViewClientProxy, if the incoming object turns out to belong to another browser engine, the corresponding proxy implementation is used in place of WebViewClientProxy.

Sample code:

if (X5WebViewUtil.INSTANCE.hasImpX5WebViewLib()) {
    if (webView instanceof WebView) {
        injectNormal((WebView) webView);
    } else if (webView instanceof com.tencent.smtt.sdk.WebView) {
        injectX5((com.tencent.smtt.sdk.WebView) webView);
    }
} else {
    if (webView instanceof WebView) {
        injectNormal((WebView) webView);
    }
}

6. Summary

By intercepting the download of the HTML file and inserting a <script> tag containing the JS code at the beginning of the HTML <head> tag, we achieved non-invasive JS injection on Android, which reliably supports running DoKit multi-control on H5 pages.

Overall, DoKit's approach is neither the most efficient nor the one with the smallest code change, but it fits the needs of DoKit's multi-control scenario. We hope this attempt at injecting JS code into WebView helps those who need it.

You are welcome to join the [DoKit official exchange group] and give us feedback there if you run into problems. You are also welcome to submit issues and PRs through the official DoKit code repository.

(Figure: QR code; scan with QQ to join the official DoKit discussion group)

Origin blog.csdn.net/DiDi_Tech/article/details/131447474