User behavior monitoring practice of front-end monitoring 2 (data statistics mongodb)

1. Technology stack introduction

For our current project, the backend is built with node, and the database is a non-relational database mongodb.

2. Data introduction

The log storage format is as follows:

 mainly include:

key significance
type current access type
actionTime interview time
content access content
erp、fullname、orgname、fullOrgname User Info

3. Statistical needs

Requirement 1: Under the statistical report page, pv and uv of different reports

The statistics of data in mongodb are mainly realized through aggregate (aggregation).

Here, by default, you who read this article have a certain understanding of mogodb. If you don't understand it, it doesn't matter, I will provide you with some information, and explain the code and ideas in detail.

Step 1: Filter the list of logs that match the time, log type is page, and access is report page.

      { 
        // mongodb的筛选使用 $match。
        $match: {
          type:'browse', // 类型为页面
          actionTime: {   // 筛选时间范围内的
            '$gte': formatStart,   // $gte 为mongodb中的大于操作符
            '$lte': formatEnd   // $lte 为mongodb中的小于操作符
          },
          // 这里content 匹配,是根据业务而定,报表页面都具有 /analysis/event ; 且包含/table 的报表url 才是有效的
          // 这里通过 $all 实现同时满足的查询; $in 为其中一个满足;$nin 为其中一个不满足
          content: { '$all': [ /\/analysis\/event/, /table/] },
        },
      },

Step 2: Take the tableName part in content

The page URL we record must be a complete URL, but the last display is a part of the URL. Therefore, the content of content needs to be intercepted. Through the previous article, we unified the URL into a format that all ends in /table/tableName. In other words, we only need to intercept the last tableName.

Since URLs are of variable length, we cannot directly specify the capture index. It can only be done by:

      {
        // project 为聚合操作中的一个步骤,主要是进行映射
        $project: {
          _id: 1,
          // 通过 $split 对url 进行分割
          content: { '$split': ['$content', '/'] },  
          actionTime: 1, // 不变的话,则可以设置为 1
          erp: 1,
          fullname: 1
        },
      },
      {
        $project: {
          _id: 1,
          // 通过 $arrayElemAt 取最后一项,即tableName
          content: { $arrayElemAt: ['$content', -1] },
          actionTime: 1,
          erp: 1,
          fullname: 1
        },
      },

 Step 3: Perform group statistics

From here, the statistical logic of uv and pv is different.

Think about it first, pv calculates the same page under the same time range. The pv group aggregation code is:

{
  // $group 对应 sql 中的 group by
  $group: {
    _id: {
      content: '$content',
      // 分组一定是按天分组,用 $substrBytes 取天
      actionTime: { $substrBytes: ['$actionTime', 0, 10] },
    },
    // total 表示将分组统计的结果记为 total
    total: { $sum: 1 }
  }
},
{
  // 由于我们是按 content 和 actiontime 进行分组的,分组后,需要将这两个值展开
  $project: { _id: 0, content: '$_id.content', total: 1, date: '$_id.actionTime' }
},

 However, when uv is counted, it is counted by person, time, and page at the same time. The uv statistics code is:

{
  // 先对每天每个人访问的内容进行去重
  $group: {
    _id: {
      actionTime: { $substrBytes: ['$actionTime', 0, 10] },
      erp: '$erp',
      fullname: '$fullname',
      content: '$content'
    },
    num: { $sum: 1 }
  }
},
{
  // 上面分组之后,再统计uv
  $group: {
    _id: {
      actionTime: '$_id.actionTime',
      content: '$_id.content',
    },
    total: { $sum: 1 }
  }
},
{
  $project: { _id: 0, content: '$_id.content', total: 1, date: '$_id.actionTime' }
},

Requirement 2: Statistics of PV and UV under the overall platform

The pv under the platform as a whole is even simpler.

This is the stats only code.

PV statistics code:

{ 
  $match: {
    type:'browse',
    actionTime: {
      '$gte': formatStart,
      '$lte': formatEnd
    }
  },
}, 
{
  $group: {
    _id: { actionTime: { $substrBytes: ['$actionTime', 0, 10] } },
    total: { $sum: 1 }
  }
},
{
  $project: { _id: 0, total: 1, date: '$_id.actionTime' }
},

uv statistics code:

{ 
  // 筛选满足条件的数据
  $match: {
    type:'browse',
    actionTime: {
      '$gte': formatStart,
      '$lte': formatEnd
    }
  },
}, 
{
  // 按人去重
  $group: {
    _id: {
      actionTime: { $substrBytes: ['$actionTime', 0, 10] },
      erp: '$erp',
      fullname: '$fullname',
    },
    num: { $sum: 1 }
  }
},
{
  $group: {
    _id: {
      // 统计人数
      actionTime: '$_id.actionTime',
    },
    total: { $sum: 1 }
  }
},
{
  $project: { _id: 0, total: 1, date: '$_id.actionTime' }
},

reference documents

Aggregation-related documents are as follows:

Official Aggregation Example Book: 

https://www.practical-mongodb-aggregations.com/front-cover.html

mongodb official documentation: Steps for aggregation support 

Aggregation Pipeline Stages — MongoDB Manual

mongodb official documentation: operators supported by aggregation

Aggregation Pipeline Operators — MongoDB Manual

Four. Summary 

At this point, the recording of user log monitoring requirements is over.

Looking back now, this requirement is actually not complicated, the difficulty is,

Learn how to use mongodb for data statistics and collect the desired information.

The purpose of the summary is also to strengthen the ability to collect information and statistics.

In fact, the Internet industry is often empirical. Your statistical needs, in the eyes of those who have done a complete front-end monitoring, may be a big demand, but for my first close contact with front-end monitoring To someone else, it seems so difficult.

In the follow-up, we will have a deeper understanding, to summarize other types of monitoring implementation methods, and to learn and use mongodb.

Guess you like

Origin blog.csdn.net/qq_34539486/article/details/129184256