Uncovering the mystery of JS "no buried point" (codeless tracking) technology

1. Background

Many people have come across the concept of the **"buried point"** (event tracking). Whether in front-end or back-end development, we can use this technique to produce raw operational data (interface latency, program installation/startup, user interaction behavior, etc.), analyze it to obtain abstract indicators (such as retention rate and conversion rate), and then decide the direction of product operation or code optimization. There are many well-known data platforms in the industry, such as Google Analytics, Facebook Pixel, Mixpanel, GrowingIO, Zhuge IO, TalkingData, and Shence Data, and the list goes on. Some of these platforms focus purely on data analysis, while others also serve specific areas such as advertising conversion monitoring. All of them provide buried point SDKs for multiple platforms (Android, iOS, Web, mini programs, React Native) and fairly comprehensive BI services. In the past two years, many platforms have begun to promote a technique called **"no buried point"**. The following takes the Web side as an example to uncover its mystery.

2. What is no buried point?

**"No buried point"** is called Codeless Tracking on some foreign platforms. As the name implies, it lets you write less tracking code. With the **"code buried point"**, developers generally have to write code that listens to events on a given HTML element and then calls the data reporting interface to send data. With no buried point, non-technical staff (such as operations or product managers) can do the configuration in a visual tool, and the behavior occurring on the HTML element is then reported to the backend. Below is a screenshot of the visualization tool of the Mixpanel platform.

In this tool, you first enter the URL of the page. After the page loads, a toolbar for visual configuration appears. Click Create Event to enter element selection mode, click an element on the page (such as a button or an a element), and set the name of the event (say, TEST) in the dialog box that pops up. After this configuration is saved, whenever the page is viewed in a browser and the configured button is clicked, a TEST event is reported to the backend. We can also attach properties to the TEST event when it is reported; these properties are likewise selected on the page with the mouse and then saved.
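
As a rough illustration, the configuration saved by such a tool might look like the record below. This is a hypothetical shape, not Mixpanel's actual format: every field name (`pageUrl`, `eventName`, `eventType`, `selector`, `properties`) and the helper `isValidConfig` are assumptions made for this sketch.

```javascript
// Hypothetical configuration record saved by a visual configuration tool.
// None of these field names come from a real platform.
const sampleConfig = {
  pageUrl: 'http://a.b.com/c.html', // page the configuration applies to
  eventName: 'TEST',                // name entered in the dialog box
  eventType: 'click',               // DOM event to listen for
  selector: '#buy-button',          // identifier of the selected element
  properties: [
    // each property is read from another element picked on the page
    { name: 'price', selector: '#price' }
  ]
};

// Minimal sanity check a tool could run before saving the record.
function isValidConfig(config) {
  return Boolean(config) &&
    typeof config.pageUrl === 'string' && config.pageUrl.length > 0 &&
    typeof config.eventName === 'string' && config.eventName.length > 0 &&
    typeof config.selector === 'string' && config.selector.length > 0;
}
```

The record only needs enough information to find the page, find the element, and name the event; everything else is optional detail.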

At this point, from the product perspective, we have a more concrete understanding of what "no buried point" is: use a visualization tool to configure which elements on the page need to be monitored, and what data needs to be reported when those elements produce behavior. But one critical point must be mentioned. For "no buried point" to work, a piece of JS SDK basic code must still be embedded in the page; what is no longer needed is calling the SDK's specific data reporting interfaces.

Therefore, the key to the "no buried point" technology is:

Operate the visual configuration tool and save the configuration
How the SDK basic code reports behavior based on configuration

The sections below describe how to implement these two key points.

3. Key technology

1. Basic code

As with the code buried point, for "no buried point" to work there must be a section of "basic code" in the web page.

<!-- start Mixpanel --><script type="text/javascript">(function(e,a){if(!a.__SV){var b=window;try{var c,l,i,j=b.location,g=j.hash;c=function(a,b){return(l=a.match(RegExp(b+"=([^&]*)")))?l[1]:null};g&&c(g,"state")&&(i=JSON.parse(decodeURIComponent(c(g,"state"))),"mpeditor"===i.action&&(b.sessionStorage.setItem("_mpcehash",g),history.replaceState(i.desiredHash||"",e.title,j.pathname+j.search)))}catch(m){}var k,h;window.mixpanel=a;a._i=[];a.init=function(b,c,f){function e(b,a){var c=a.split(".");2==c.length&&(b=b[c[0]],a=c[1]);b[a]=function(){b.push([a].concat(Array.prototype.slice.call(arguments,
0)))}}var d=a;"undefined"!==typeof f?d=a[f]=[]:f="mixpanel";d.people=d.people||[];d.toString=function(b){var a="mixpanel";"mixpanel"!==f&&(a+="."+f);b||(a+=" (stub)");return a};d.people.toString=function(){return d.toString(1)+".people (stub)"};k="disable time_event track track_pageview track_links track_forms register register_once alias unregister identify name_tag set_config reset opt_in_tracking opt_out_tracking has_opted_in_tracking has_opted_out_tracking clear_opt_in_out_tracking people.set people.set_once people.unset people.increment people.append people.union people.track_charge people.clear_charges people.delete_user".split(" ");
for(h=0;h<k.length;h++)e(d,k[h]);a._i.push([b,c,f])};a.__SV=1.2;b=e.createElement("script");b.type="text/javascript";b.async=!0;b.src="undefined"!==typeof MIXPANEL_CUSTOM_LIB_URL?MIXPANEL_CUSTOM_LIB_URL:"file:"===e.location.protocol&&"//cdn4.mxpnl.com/libs/mixpanel-2-latest.min.js".match(/^\/\//)?"https://cdn4.mxpnl.com/libs/mixpanel-2-latest.min.js":"//cdn4.mxpnl.com/libs/mixpanel-2-latest.min.js";c=e.getElementsByTagName("script")[0];c.parentNode.insertBefore(b,c)}})(document,window.mixpanel||[]);
mixpanel.init("46042714e64a7536dde6f02af1aec923");</script><!-- end Mixpanel -->

The above is the basic code of the Mixpanel platform. The basic code of other platforms is much the same: a piece of minified JS in IIFE form. When it finishes executing, it inserts a new script tag into the page, which asynchronously downloads the real SDK core code. So it is not the basic code itself that reports behavior according to the configuration; rather, the basic code downloads a "bigger" SDK core, and that core provides the real functionality of the SDK.

The advantages of this approach are that the basic code is very short, so it does not hurt page performance when it loads, and that the SDK core can be updated without users having to update the basic code.
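
To make the pattern easier to read than the minified snippet, here is a simplified sketch of what such basic code does. It is not Mixpanel's code: the global name `tracker`, the function `installLoader`, and the `sdkUrl` argument are hypothetical, and the window and document are passed in as parameters only to keep the sketch self-contained.

```javascript
// Simplified loader in the spirit of the basic code above (not Mixpanel's code).
// `tracker`, `installLoader` and `sdkUrl` are hypothetical names for this sketch.
function installLoader(win, doc, sdkUrl) {
  if (win.tracker) {
    return win.tracker; // already installed, do nothing
  }
  // Stub object: calls made before the core SDK arrives are queued.
  const queue = [];
  const stub = { _queue: queue };
  ['init', 'track'].forEach(function (method) {
    stub[method] = function () {
      queue.push([method].concat(Array.prototype.slice.call(arguments)));
    };
  });
  win.tracker = stub;

  // Download the real SDK asynchronously so page rendering is not blocked.
  const script = doc.createElement('script');
  script.async = true;
  script.src = sdkUrl;
  const first = doc.getElementsByTagName('script')[0];
  first.parentNode.insertBefore(script, first);
  return stub;
}
```

In a page this would be called as `installLoader(window, document, sdkUrl)`. Once the core SDK has loaded, it would be expected to replace the stub and replay the calls stored in `tracker._queue`.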

2. Unique identification of the page

When configuring element behavior, you need to uniquely identify the page, so that the configuration of page A is not delivered to page B and does not cause page B to produce the behavior configured for page A. On the Web, a page is identified by its URL, which consists of protocol, domain, port, path, and parameters. When storing the configuration, the URL parameters must be handled before storage (for example, sorted), because their order can vary: urlA (http://a.b.com/c.html?pa=1&pb=2) and urlB (http://a.b.com/c.html?pb=2&pa=1) satisfy urlA !== urlB as strings, yet they are actually the same page.
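
One way to make the two example URLs compare equal is to sort the query parameters before using the URL as a page key. The sketch below uses the standard `URL` and `URLSearchParams` APIs; the function name `normalizePageUrl` and the decision to drop the hash are assumptions of this sketch.

```javascript
// Build a page key that ignores the order of query parameters.
// Dropping the hash is an assumption of this sketch; single-page apps that
// route on the hash would need to keep it.
function normalizePageUrl(rawUrl) {
  const url = new URL(rawUrl);
  url.searchParams.sort(); // sort parameters by name
  url.hash = '';
  return url.toString();
}
```

With this, `normalizePageUrl('http://a.b.com/c.html?pb=2&pa=1')` and `normalizePageUrl('http://a.b.com/c.html?pa=1&pb=2')` produce the same string, so both visits map to one stored configuration.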

3. Unique identification of the element

After the page is uniquely identified, the next step is to uniquely identify elements within the page, so that the SDK can find element A1 configured on page A and monitor the events it produces.

In HTML, elements are organized in the DOM tree. Starting from element A1, if you record each parent and the element's index within that parent all the way up to the root node body, you obtain a path that uniquely identifies A1 in the DOM tree.
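
The walk from an element up to body can be sketched as follows. This is a minimal illustration of the idea, not what optimal-select or any platform actually does; the function name `getDomPath` and the choice to express the path as a CSS selector with `:nth-child` are assumptions.

```javascript
// Walk from an element up to the body element, recording the tag name and
// the 1-based position among the parent's element children at each step,
// then join the steps into a CSS selector.
function getDomPath(element) {
  const steps = [];
  let node = element;
  while (node && node.tagName && node.tagName.toLowerCase() !== 'body') {
    const parent = node.parentElement;
    if (!parent) {
      return null; // the element is not attached under body
    }
    const index = Array.prototype.indexOf.call(parent.children, node) + 1;
    steps.unshift(node.tagName.toLowerCase() + ':nth-child(' + index + ')');
    node = parent;
  }
  steps.unshift('body');
  return steps.join(' > ');
}
```

For a button that is the second child of a div, which is itself the second child of body, this yields `body > div:nth-child(2) > button:nth-child(2)`. A purely positional path like this is easy to generate but breaks as soon as a sibling is added or removed, which is one reason real libraries also consider ids and classes.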

HTML elements also have many attributes; the CSS class and id, for example, can be used to locate an element. Using Chrome Developer Tools, you can see that the Mixpanel visualization tool uses the https://github.com/Autarc/optimal-select library to generate the unique identifier of an element during configuration. There is also a library on GitHub, https://github.com/rowthan/whats-element, that can generate a unique identifier for an element in the DOM tree.

In addition, some platforms use XPath to identify elements. This is another option.

4. How to find elements

As mentioned above, an element can have a unique identifier, and with that identifier you can find the element by reversing the way it was generated. A very useful API is document.querySelector(), which finds the matching element based on a CSS selector. Depending on how the element is identified, document.getElementById() and document.getElementsByName() can also be used to look up elements.
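
Putting lookup and reporting together, the SDK core might do something like the sketch below once it has received the configuration for the current page. The config fields (`eventName`, `eventType`, `selector`), the `report` callback, and the function name `bindConfiguredEvents` are hypothetical; a real SDK would send the data to its own backend.

```javascript
// Attach listeners for each configured element and report when they fire.
// `configs` entries ({ eventName, eventType, selector }) and the `report`
// callback are hypothetical; a real SDK would send data to its backend.
function bindConfiguredEvents(doc, configs, report) {
  const unbound = [];
  configs.forEach(function (config) {
    const element = doc.querySelector(config.selector);
    if (!element) {
      unbound.push(config); // element not found on this page
      return;
    }
    element.addEventListener(config.eventType || 'click', function () {
      report(config.eventName, { selector: config.selector });
    });
  });
  return unbound; // configurations that could not be bound
}
```

In a page, `doc` would simply be `document`. Returning the unbound configurations gives the caller a hook for noticing that a configured element has disappeared.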

What needs to be emphasized here is that if the page is modified after the configuration is complete and the DOM tree changes, the unique identifier of the monitored element may change too. The previous configuration may then fail to find the element, or find an element other than the one we want to monitor, which shows up as a noticeable change in the number of events produced. For the stability and accuracy of the data, there should be monitoring and alerting to handle this case and prompt the user to reconfigure the page. I personally think this is the biggest shortcoming of no buried point.
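
One simple client-side signal for such an alert is to check how many elements each configured selector matches on the live page. The sketch below is only one possible check; the function name `auditSelectors` and the result shape are assumptions, and how the result reaches an alerting system is left open.

```javascript
// Classify configured selectors by how many elements they match right now.
// A healthy selector matches exactly one element.
function auditSelectors(doc, selectors) {
  const result = { missing: [], ambiguous: [] };
  selectors.forEach(function (selector) {
    const count = doc.querySelectorAll(selector).length;
    if (count === 0) {
      result.missing.push(selector);   // element is gone or was moved
    } else if (count > 1) {
      result.ambiguous.push(selector); // selector no longer unique
    }
  });
  return result;
}
```

This check cannot detect the case where the selector still matches exactly one element but it is now a different element, so watching for sudden changes in event counts on the backend remains necessary.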

5. Implementation of highlighting effect and visual interaction when marking elements

This is a finer point. Anyone familiar with JS knows there are countless ways to implement a hover-like effect when the mouse moves over an element, and to pop up a dialog box after the element is clicked so the user can enter configuration information. What I want to point out is that if we implement the visualization tool's interface by dynamically adding elements to the page, we may disturb the page's original DOM tree structure. That leads to errors when generating the unique identifier of an element, so it must be handled carefully to ensure that the generated identifier is not affected.

I noticed that Mixpanel uses Custom Elements and Shadow DOM, and that all the functions of the visualization tool are implemented as custom Web Components, even though at the time of writing Web Components were only supported by Chrome, which is a bit awkward. This way, the tool's custom elements and interactions do not affect the DOM of the user's web page. Of course, if your visualization tool is implemented in a very lightweight way, for example by simply placing the user's web page in an iframe and leaving most of the interaction to the iframe's parent page, that can also minimize the impact on the user's page.
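
A minimal sketch of the Shadow DOM approach is shown below. It is not Mixpanel's implementation: the tag name `x-tracking-tool`, the function name `mountToolUi`, and the toolbar content are made up for illustration, and the document is passed in as a parameter to keep the sketch self-contained.

```javascript
// Mount the tool's UI inside a shadow root so that its markup and styles
// stay separate from the page's own DOM. The tag name is made up.
function mountToolUi(doc) {
  const host = doc.createElement('x-tracking-tool');
  const shadow = host.attachShadow({ mode: 'open' });
  const toolbar = doc.createElement('div');
  toolbar.textContent = 'Create Event';
  shadow.appendChild(toolbar);
  // Appending the host as the last child of body leaves the positions of
  // the existing children (counted from the start) unchanged.
  doc.body.appendChild(host);
  return host;
}
```

Only a single host element is added to the page; everything else lives in the shadow tree, where page selectors such as `document.querySelector()` do not reach. The tool still has to skip its own host element when the user is selecting elements.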

6. How to control page jump in configuration tool

In the visual configuration state, the user clicks an element and a dialog box pops up for configuring it. But what if the element's own click behavior is a page jump (navigation)? What should we do?

This is essentially an interaction design issue. In the visual configuration tool, there should be two basic interactive operations. One is to allow the user to select an element for configuration; the other is to allow the user to trigger the original behavior of the page.

Why is the second interaction needed? Because our tool must let users visually configure secondary pages. For example, a dialog box may pop up on the user's page, and that dialog box contains a button the user wants to monitor and configure. Simply put, the original click behavior on the user's page may change the page structure, for example by jumping to another page or popping up a dialog box within the page.

The problem is easy to solve. Keep plain clicking for selecting elements, and design a separate interaction for triggering the original click behavior of the user's page, such as "right click" or "hold shift + click". In any case, avoid interactions that easily conflict with the page's default ones.

Finally, one more thing: for a long time I did not know how to prevent the page from jumping when the user clicks. Later I learned that the DOM event flow has three phases: capture, target, and bubbling. So to prevent the user's click from making the page jump, add a listener on document in the capture phase that intercepts the event and stops it from being dispatched any further.

The simple sample code is as follows:

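document.addEventListener('click', e => {
  // A shift+click keeps the page's original behavior
  if (e.shiftKey) {
    return;
  }
  // A plain click is intercepted: cancel the default action and stop dispatch
  e.preventDefault();
  e.stopImmediatePropagation();
  // Get the element's unique identifier, then let the user configure it, etc.
  // (the arrow function keeps the outer `this`, i.e. the tool instance)
  this._selectElement(e.target);
}, true); // useCapture must be true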

4. Summary

As we can see, "no buried point" is not zero-intrusion. The user's web page still needs to load the SDK code (unless you are a browser vendor and can inject the basic code when the page loads). The difference is that the reporting code for each behavior event no longer has to be written by developers by hand; instead it is configured by operations staff with a visualization tool, so **"visual buried point"** may be a more fitting name. Data collection is the basis and prerequisite of data analysis; if data collection is not done well, everything else is a castle in the air.

Here is a summary of the pros and cons of the "no buried point" technique. Its advantages are that the technical cost is low and it is very user-friendly: no redeployment is needed, and a configuration takes effect as soon as it is completed. But its shortcomings are also obvious. It lacks the flexibility and depth of the code buried point: it can only collect data visible to the user's naked eye, cannot obtain data held in memory, and cannot adapt to changes in page structure. So in actual production, each buried point technique should be chosen selectively, where it fits.

A little more on trade-offs in product design and technical solutions. Can the product support collecting in-memory data? Of course it can. For example, the "custom analysis" feature of WeChat mini programs supports reporting properties under a page's data. In that case, although the configuration is still visual, operations staff will certainly not know the variable names in the code, so developers must take part in the configuration for it to work.

There are also ways to address data loss after the page structure changes. For example, Codeless Tracking on the Mixpanel platform actually collects and reports the click events of all elements on the page, and the backend then computes the number of conversions based on the user's configuration. The advantage is that reporting is complete: if the user receives an alert and updates the configuration after the page changes, the platform can recompute past data retroactively. The approach described earlier, by contrast, delivers the configuration to the page, finds and monitors only the specified elements, and reports on demand, so erroneous data cannot be backfilled. However, full reporting is costly. The data volume is large: the front end consumes more resources, the backend's storage pressure increases in order to support backfilling, and most of the stored data ends up unused. That cost is rather high.
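
The difference between the two reporting schemes can be made concrete with a small backend-side sketch. Under full reporting, every click is stored, so a configuration created or corrected later can still be applied to older records. The record shape (`pageUrl`, `selector`, `timestamp`) and the function name `countConfiguredClicks` are hypothetical and do not describe any platform's real storage format.

```javascript
// Count how often a configured element was clicked, given a full click log.
// The record shape ({ pageUrl, selector, timestamp }) is hypothetical.
function countConfiguredClicks(clickLog, config, sinceTimestamp) {
  return clickLog.filter(function (record) {
    return record.pageUrl === config.pageUrl &&
      record.selector === config.selector &&
      record.timestamp >= sinceTimestamp;
  }).length;
}
```

With on-demand reporting there is no such log to query: clicks on elements that were not configured at the time were never sent, so nothing can be recomputed afterwards.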

5. Reference materials

JS buried point technical analysis
https://github.com/Autarc/optimal-select
https://github.com/rowthan/whats-element
https://www.zhihu.com/question/38000812
**Author:** unclechen
Link to this article: 2018/06/24/Uncover the mystery of JS technology without buried point/
Copyright notice: This article is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Please credit the source!