Big Data Course L1 - Overview and Overall Architecture of the Website Traffic Project

Email of the author of the article: [email protected] Address: Huizhou, Guangdong

 ▲ Chapter objectives

⚪ Understand the case overview of the website traffic project;

⚪ Understand data tracking (buried points) and data collection for the website traffic project;

⚪ Understand the overall architecture of the website traffic project.

1. Overview of website traffic projects

1. Background description

Website traffic statistics are one of the important means to improve website services. By obtaining user behavior on the website, we can analyze which content is popular and which pages have problems, so that website improvement activities can be more targeted.

2. Description of statistical indicators

Commonly used website traffic indicators generally fall into the following kinds of analysis:

1. Analysis by online situation

Online situation analysis records the activity of each online user, including visit time, visitor region, referring page, and the page currently being viewed. This helps an enterprise monitor its own website traffic in real time.

2. Analysis by time period

Time period analysis shows how website traffic changes over any chosen period, or from one period to another, for example the hourly distribution or the daily distribution of visits. This helps an enterprise understand at what times users browse its pages.
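As a rough illustration of the hourly and daily distributions mentioned above, here is a minimal sketch. The record layout (a `time` field holding an ISO timestamp) and the sample data are assumptions made for the example, not part of the project.

```javascript
// Hypothetical page-view records; the field name `time` is an assumption.
const pageViews = [
  { time: "2023-09-07T08:15:00Z" },
  { time: "2023-09-07T08:47:00Z" },
  { time: "2023-09-07T13:05:00Z" },
  { time: "2023-09-08T08:30:00Z" },
];

// Count page views per hour of day (0-23). UTC is used so the result
// does not depend on the time zone of the machine running the code.
function hourlyDistribution(records) {
  const counts = new Array(24).fill(0);
  for (const r of records) {
    counts[new Date(r.time).getUTCHours()] += 1;
  }
  return counts;
}

// Count page views per calendar day (UTC), keyed by "YYYY-MM-DD".
function dailyDistribution(records) {
  const counts = {};
  for (const r of records) {
    const day = new Date(r.time).toISOString().slice(0, 10);
    counts[day] = (counts[day] || 0) + 1;
  }
  return counts;
}

const byHour = hourlyDistribution(pageViews);
const byDay = dailyDistribution(pageViews);
console.log(byHour[8], byHour[13]); // 3 1
console.log(byDay);
```

In a real deployment this aggregation would run over the cleaned logs rather than an in-memory array; the sketch only shows the grouping logic.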

3. Analysis by source

Source analysis provides, for each referring domain, data such as the number of visits, IPs, unique visitors, new visitors, page views by new visitors, and total site page views that it brought. This data shows a company directly where its promotion results come from, and therefore which websites advertise most effectively.
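To make the grouping by referring domain concrete, here is a minimal sketch. The field names (`referrer`, `ip`, `visitorId`), the "(direct)" label for visits without a referrer, and the sample records are all assumptions for illustration.

```javascript
// Hypothetical visit records; field names and values are assumptions.
const visits = [
  { referrer: "https://www.example-search.com/s?q=shoes", ip: "203.0.113.1", visitorId: "a" },
  { referrer: "https://www.example-search.com/s?q=boots", ip: "203.0.113.1", visitorId: "b" },
  { referrer: "https://ads.example.net/banner", ip: "203.0.113.2", visitorId: "a" },
  { referrer: "", ip: "203.0.113.3", visitorId: "c" },
];

// Reduce a referrer URL to its domain; an empty referrer counts as direct traffic.
function sourceDomain(referrer) {
  if (!referrer) return "(direct)";
  try {
    return new URL(referrer).hostname;
  } catch (e) {
    return "(unknown)"; // malformed referrer string
  }
}

// For each source domain, count visits, distinct IPs, and distinct visitors.
function analyzeBySource(records) {
  const stats = new Map();
  for (const r of records) {
    const domain = sourceDomain(r.referrer);
    if (!stats.has(domain)) {
      stats.set(domain, { visits: 0, ips: new Set(), visitors: new Set() });
    }
    const s = stats.get(domain);
    s.visits += 1;
    s.ips.add(r.ip);
    s.visitors.add(r.visitorId);
  }
  const result = {};
  for (const [domain, s] of stats) {
    result[domain] = {
      visits: s.visits,
      uniqueIps: s.ips.size,
      uniqueVisitors: s.visitors.size,
    };
  }
  return result;
}

const bySource = analyzeBySource(visits);
console.log(bySource);
```

With the sample data, the two visits from the search domain share one IP but come from two different visitors, which is why IP counts and unique-visitor counts are reported separately.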

2. Data tracking (buried points) and collection

1. Overview

The so-called "burying point" (tracking point) means collecting information at specific points in an application's flow in order to track how the application is used. The collected data is then used to further optimize the product or to support operations. Typical data collected through tracking points includes visits, visitors, time on site, page views, and bounce rate.
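The metrics listed above can be derived from raw page-view events. The sketch below is one possible way to do it under stated assumptions: each event carries a hypothetical `visitorId`, `sessionId`, and millisecond timestamp `ts`; a visit is one session; time on site is approximated as the gap between the first and last event of a session; and a bounce is a session with exactly one page view. Real analytics tools may define these differently.

```javascript
// Hypothetical tracking events; field names and values are assumptions.
const events = [
  { visitorId: "u1", sessionId: "s1", url: "/home", ts: 0 },
  { visitorId: "u1", sessionId: "s1", url: "/item", ts: 60000 },
  { visitorId: "u2", sessionId: "s2", url: "/home", ts: 5000 },
  { visitorId: "u1", sessionId: "s3", url: "/home", ts: 900000 },
];

function summarize(evts) {
  const sessions = new Map(); // sessionId -> { pages, first, last }
  const visitors = new Set();
  for (const e of evts) {
    visitors.add(e.visitorId);
    const s = sessions.get(e.sessionId) || { pages: 0, first: e.ts, last: e.ts };
    s.pages += 1;
    s.first = Math.min(s.first, e.ts);
    s.last = Math.max(s.last, e.ts);
    sessions.set(e.sessionId, s);
  }
  let bounces = 0;
  let totalMs = 0;
  for (const s of sessions.values()) {
    if (s.pages === 1) bounces += 1; // single-page session counts as a bounce
    totalMs += s.last - s.first;     // approximate time on site for the session
  }
  const visitCount = sessions.size;
  return {
    visits: visitCount,
    visitors: visitors.size,
    pageViews: evts.length,
    avgTimeOnSiteSec: visitCount ? totalMs / visitCount / 1000 : 0,
    bounceRate: visitCount ? bounces / visitCount : 0,
  };
}

const summary = summarize(events);
console.log(summary);
```

Note that a single-page session contributes zero seconds to time on site under this approximation, because there is no later event to measure against.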

A typical data platform consists of the following five steps for data processing:

Among them, we consider the first step, data tracking and collection, to be the most fundamental. Whether the collected data is rich, accurate, and timely directly affects how useful the entire data platform is.

Two ways to implement tracking points:

1. Your own company develops tracking code and injects it into the product. For example, write the tracking code in a JS file and then include that file in the application's website.

2. Use third-party statistics tools, such as Umeng, Baidu Mobile, Rubik's Cube, App Annie, TalkingData, etc.

Implementation of this project:

We implement tracking points with JS code: we write a dedicated JS script and embed it in every web page whose logs need to be analyzed (in practice, the JS file is included through a <script> tag).

3. Description of log data collection module

1. Overview

The ultimate purpose of log collection is to aggregate the logs of user visits to the target website into a specific directory in the HDFS file system, so that they can be passed to the next step, the data cleaning module, for processing. This work consists of the following steps:

1. JS buried points

2. Log server setup

3. Log collection

2. JS buried points

To collect access data for a web page, the common approach is to embed a JS script in the page. When a user visits the page, the script dynamically adds an <img> tag to it, and the src attribute of that <img> points to the URL of a transparent image on the log server.

The URL carries user access information as parameters (such as the URL of the visited page and a cookie that identifies the user), so this information can be recovered by analyzing the access log of the log server (usually nginx or Apache). The JS tracking script can be written by hand. A code snippet follows:

Main function of the JS script:

function ar_main() {
    // Path to which the collected log data is submitted
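The snippet above breaks off in the source. Separately from it, the image-beacon mechanism described in this section can be sketched as follows. This is not the author's ar_main; the endpoint `LOG_URL`, the parameter names `url` and `uid`, and all helper function names are hypothetical and chosen only for illustration.

```javascript
// Hypothetical log-server endpoint that serves a transparent image (assumption).
const LOG_URL = "http://log.example.com/log.gif";

// Build the image URL: each collected field becomes a URL-encoded query parameter.
function buildBeaconUrl(baseUrl, fields) {
  const query = Object.keys(fields)
    .map((k) => encodeURIComponent(k) + "=" + encodeURIComponent(fields[k]))
    .join("&");
  return baseUrl + "?" + query;
}

// In a browser, setting src on an image triggers the HTTP request, and that
// request is what the log server records in its access log.
function sendViaImage(beaconUrl) {
  if (typeof Image !== "undefined") {
    new Image(1, 1).src = beaconUrl;
    return true;
  }
  return false; // not running in a browser (for example, under Node.js)
}

// Log-analysis side: recover the fields from a logged request path or URL.
function parseBeaconRequest(request) {
  const query = request.split("?")[1] || "";
  return Object.fromEntries(new URLSearchParams(query));
}

const beaconUrl = buildBeaconUrl(LOG_URL, {
  url: "http://www.example.com/a?b=1", // visited page (sample value)
  uid: "u-123",                        // cookie-based user id (sample value)
});
sendViaImage(beaconUrl);
console.log(beaconUrl);
console.log(parseBeaconRequest(beaconUrl));
```

Encoding every value with encodeURIComponent matters here because the visited page URL itself contains characters such as "?" and "=", which would otherwise be confused with the beacon's own query string.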


Origin blog.csdn.net/u013955758/article/details/132723480