[JS tracking code for web data collection]

The architecture of a statistical analysis platform faces challenges in the following five areas:

(1) Log collection

(2) Metadata management

(3) Business data modeling

(4) Task scheduling

(5) OLAP engine

 



 

Three Phases of Data Collection

(1) Embedding stage

(2) Data collection stage

(3) Back-end processing stage

 

As shown above, the flow runs from the browser to the page, then to the javascript tracking script, and finally to the back end. The browser still receives the normal result of the page request, while the log ends up written to a local file on the back end. The tracking script is like a piece of "hidden code" buried in the user's page, invisibly recording the user's behavior.
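As a rough illustration of the last hop in this flow, the sketch below turns one tracking request into a single log line that the back end could append to its local file. The function name `formatLogLine`, the tab-separated layout, and the field order are assumptions made for this example, not a format given in the article.

```javascript
// Turn one tracking request into a single tab-separated log line.
// The field order (time, client IP, then query parameters) is an assumption.
function formatLogLine(requestUrl, clientIp, isoTime) {
  // A base URL is needed because a server usually sees only path + query.
  const url = new URL(requestUrl, 'http://localhost');
  const fields = [isoTime, clientIp];
  // searchParams yields already-decoded [key, value] pairs.
  for (const [key, value] of url.searchParams) {
    fields.push(key + '=' + value);
  }
  return fields.join('\t');
}
```

A server would call this once per request and append the returned string, plus a newline, to its log file.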

 

Once it has been requested, the data collection script is executed. It generally does the following:

(1) Collect information through the browser's built-in javascript objects, such as the page title (via document.title), the referrer (the previous-hop url, via document.referrer), the user's display resolution (via window.screen), cookie information (via document.cookie), and so on.

(2) Parse _gaq to collect configuration information. This may include user-defined event tracking and business data (such as the item number on an e-commerce website).

(3) Parse the data collected in the two steps above and concatenate it in a predefined format.

(4) Request a back-end script, carrying the information to it in the http request parameters.
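Steps (1) to (3) can be sketched roughly as follows. This is a minimal illustration, not GA's actual implementation: the output field names (`title`, `referrer`, `resolution`, `cookie`, `account`, `event`) and the `|` separator are hypothetical. `_setAccount` and `_trackEvent` are command names from the classic `_gaq` queue. The `doc`, `win`, and `queue` arguments stand in for `document`, `window`, and `window._gaq` so the functions can also be exercised outside a browser.

```javascript
// Steps (1) and (2): read browser objects and the _gaq command queue.
function collectParams(doc, win, queue) {
  const params = {
    title: doc.title,                                   // page title
    referrer: doc.referrer,                             // previous-hop URL
    resolution: win.screen.width + 'x' + win.screen.height,
    cookie: doc.cookie,
  };
  // Each _gaq entry is an array: [commandName, ...arguments].
  for (const [command, ...args] of queue || []) {
    if (command === '_setAccount') {
      params.account = args[0];
    } else if (command === '_trackEvent') {
      params.event = args.join('|');                    // hypothetical separator
    }
  }
  return params;
}

// Step (3): join the collected fields into a URL query string.
function toQueryString(params) {
  return Object.keys(params)
    .map((key) => encodeURIComponent(key) + '=' + encodeURIComponent(String(params[key])))
    .join('&');
}
```

In a page this would be called as `toQueryString(collectParams(document, window, window._gaq))`, and the resulting string becomes the parameter part of the request in step (4).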

The only difficulty here is step (4). The usual way for javascript to request a back-end script is ajax, but under the browser's same-origin policy ajax cannot make cross-domain requests unless the target server explicitly allows them. Here ga.js runs in the domain of the website being measured, while the back-end script lives in another domain (GA's back-end statistics script is http://www.google-analytics.com/__utm.gif), so plain ajax will not work. A common workaround is to have the js script create an Image object and point its src attribute at the back-end script with the parameters appended to the url; loading the image then performs the cross-domain request. This is why back-end scripts are often disguised as gif files.
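The Image technique described above can be sketched in a few lines. The function name `sendViaImage` and the endpoint in the usage note are hypothetical. The `ImageCtor` parameter exists only so the sketch can be run outside a browser; in a page it falls back to the built-in `Image`.

```javascript
// Send collected data as a cross-domain GET request by loading an image.
function sendViaImage(endpoint, queryString, ImageCtor) {
  // Fall back to the browser's Image when no constructor is injected.
  const Ctor = ImageCtor || Image;
  const img = new Ctor(1, 1);
  // Assigning src triggers the request; the returned image is never displayed.
  img.src = endpoint + '?' + queryString;
  return img;
}
```

In a browser, `sendViaImage('http://stats.example.com/log.gif', 'title=Home')` would issue the request. The script never reads the response, so the back end only needs to record the parameters and return a small image.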

 

Show results



 
