1. Technology stack introduction
For our current project, the backend is built with Node.js, and the database is MongoDB, a non-relational (document-oriented) database.
2. Data introduction
The log storage format is as follows. The main fields are:

| key | meaning |
| --- | --- |
| type | current access type |
| actionTime | access time |
| content | accessed content |
| erp, fullname, orgname, fullOrgname | user info |
3. Statistical needs
Requirement 1: On the statistical report page, count the PV and UV of each report
In MongoDB, statistics are mainly computed through the aggregation pipeline (aggregate).
This article assumes some familiarity with MongoDB. If you are not familiar with it, don't worry: I will provide some references and explain the code and ideas in detail.
Step 1: Filter the log list to entries within the time range whose type is page and whose access target is a report page.
{
  // Filtering in MongoDB is done with $match.
  $match: {
    type: 'browse', // the type "browse" means a page view
    actionTime: { // only entries within the time range
      '$gte': formatStart, // $gte is MongoDB's greater-than-or-equal operator
      '$lte': formatEnd // $lte is MongoDB's less-than-or-equal operator
    },
    // The content match is business-specific: all report pages contain /analysis/event,
    // and only report URLs that also contain /table are valid.
    // $all requires every condition to match; $in matches if any one does; $nin matches if any one does not.
    content: { '$all': [ /\/analysis\/event/, /table/] },
  },
},
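A stage like this can be built as a plain JavaScript object and assembled into a pipeline array for `collection.aggregate()`. Here is a minimal sketch; the function name, the collection name `logs`, and the date strings are my own illustrative assumptions, not from the original project:

```javascript
// Sketch: build the $match stage for report-page logs.
// formatStart/formatEnd are assumed to be date-time strings.
function buildReportMatchStage(formatStart, formatEnd) {
  return {
    $match: {
      type: 'browse',
      actionTime: { $gte: formatStart, $lte: formatEnd },
      // only URLs containing both /analysis/event and /table are report pages
      content: { $all: [/\/analysis\/event/, /table/] },
    },
  };
}

// With the official mongodb driver this would be used roughly as:
//   const docs = await db.collection('logs')
//     .aggregate([buildReportMatchStage(start, end) /* , ...later stages */])
//     .toArray();
```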
Step 2: Extract the tableName part of content
The page URL we record is a complete URL, but the final display only needs part of it, so content must be truncated. As described in the previous article, we unified all URLs into a format ending in /table/tableName. In other words, we only need to extract the trailing tableName.
Since URLs have variable length, we cannot capture it with a fixed index. Instead, it is done as follows:
{
  // $project is a pipeline stage that maps (reshapes) each document
  $project: {
    _id: 1,
    // split the URL on '/' with $split
    content: { '$split': ['$content', '/'] },
    actionTime: 1, // fields that pass through unchanged are set to 1
    erp: 1,
    fullname: 1
  },
},
{
  $project: {
    _id: 1,
    // take the last element with $arrayElemAt, i.e. the tableName
    content: { $arrayElemAt: ['$content', -1] },
    actionTime: 1,
    erp: 1,
    fullname: 1
  },
},
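The effect of these two $project stages can be reproduced in plain JavaScript, which makes the logic easy to check outside the database. This is just a sketch; the sample URL is a made-up example:

```javascript
// Simulate $split + $arrayElemAt: split the URL on '/' and take the last item.
function extractTableName(content) {
  const parts = content.split('/');   // like $split: ['$content', '/']
  return parts[parts.length - 1];     // like $arrayElemAt: ['$content', -1]
}

// Example with a hypothetical report URL ending in /table/<tableName>:
// extractTableName('/analysis/event/table/orderReport') → 'orderReport'
```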
Step 3: Group statistics
From here, the statistical logic for PV and UV diverges.
Think about it first: PV counts views of the same page within the same time range. The PV group aggregation code is:
{
  // $group corresponds to GROUP BY in SQL
  $group: {
    _id: {
      content: '$content',
      // grouping is always by day; use $substrBytes to take the date part
      actionTime: { $substrBytes: ['$actionTime', 0, 10] },
    },
    // record the count of each group as total
    total: { $sum: 1 }
  }
},
{
  // since we grouped by content and actionTime, unpack those two values after grouping
  $project: { _id: 0, content: '$_id.content', total: 1, date: '$_id.actionTime' }
},
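To make the PV grouping concrete, here is a plain-JavaScript simulation of what the $group and $project stages compute. The sample logs are made-up data for illustration only:

```javascript
// Simulate the PV $group stage: count log entries per (content, day).
function groupPv(logs) {
  const counts = new Map();
  for (const log of logs) {
    const date = log.actionTime.slice(0, 10);    // like $substrBytes: [..., 0, 10]
    const key = `${log.content}|${date}`;
    counts.set(key, (counts.get(key) || 0) + 1); // like total: { $sum: 1 }
  }
  return [...counts].map(([key, total]) => {
    const [content, date] = key.split('|');
    return { content, total, date };             // like the final $project
  });
}

// Made-up sample data:
const logs = [
  { content: 'orderReport', actionTime: '2023-01-01 09:00:00' },
  { content: 'orderReport', actionTime: '2023-01-01 10:00:00' },
  { content: 'userReport',  actionTime: '2023-01-01 11:00:00' },
];
// groupPv(logs) → [{ content: 'orderReport', total: 2, date: '2023-01-01' }, ...]
```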
However, UV is counted per person, per day, and per page simultaneously. The UV statistics code is:
{
  // first deduplicate: one record per (day, person, content)
  $group: {
    _id: {
      actionTime: { $substrBytes: ['$actionTime', 0, 10] },
      erp: '$erp',
      fullname: '$fullname',
      content: '$content'
    },
    num: { $sum: 1 }
  }
},
{
  // after the deduplication above, count the UV
  $group: {
    _id: {
      actionTime: '$_id.actionTime',
      content: '$_id.content',
    },
    total: { $sum: 1 }
  }
},
{
  $project: { _id: 0, content: '$_id.content', total: 1, date: '$_id.actionTime' }
},
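The two-stage grouping is the key trick: the first $group deduplicates, the second counts the deduplicated records. A plain-JavaScript sketch of the same idea, with made-up sample data:

```javascript
// Simulate the two-stage UV grouping: first deduplicate by (day, user, content),
// then count distinct users per (day, content).
function groupUv(logs) {
  const distinct = new Set();
  for (const log of logs) {
    const date = log.actionTime.slice(0, 10);
    distinct.add(`${date}|${log.content}|${log.erp}`); // first $group: dedup
  }
  const counts = new Map();
  for (const key of distinct) {
    const [date, content] = key.split('|');
    const k = `${date}|${content}`;
    counts.set(k, (counts.get(k) || 0) + 1);           // second $group: count users
  }
  return [...counts].map(([k, total]) => {
    const [date, content] = k.split('|');
    return { content, total, date };
  });
}

// Made-up sample data:
const uvLogs = [
  { content: 'orderReport', actionTime: '2023-01-01 09:00:00', erp: 'alice' },
  { content: 'orderReport', actionTime: '2023-01-01 10:00:00', erp: 'alice' }, // same user, counted once
  { content: 'orderReport', actionTime: '2023-01-01 11:00:00', erp: 'bob' },
];
// groupUv(uvLogs) → [{ content: 'orderReport', total: 2, date: '2023-01-01' }]
```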
Requirement 2: PV and UV of the platform as a whole
Platform-wide PV is even simpler: there is no per-report dimension, only filtering and counting.
PV statistics code:
{
$match: {
type:'browse',
actionTime: {
'$gte': formatStart,
'$lte': formatEnd
}
},
},
{
$group: {
_id: { actionTime: { $substrBytes: ['$actionTime', 0, 10] } },
total: { $sum: 1 }
}
},
{
$project: { _id: 0, total: 1, date: '$_id.actionTime' }
},
UV statistics code:
{
  // filter the data matching the conditions
  $match: {
    type: 'browse',
    actionTime: {
      '$gte': formatStart,
      '$lte': formatEnd
    }
  },
},
{
  // deduplicate by person (one record per user per day)
  $group: {
    _id: {
      actionTime: { $substrBytes: ['$actionTime', 0, 10] },
      erp: '$erp',
      fullname: '$fullname',
    },
    num: { $sum: 1 }
  }
},
{
  // count the number of distinct people per day
  $group: {
    _id: {
      actionTime: '$_id.actionTime',
    },
    total: { $sum: 1 }
  }
},
{
  $project: { _id: 0, total: 1, date: '$_id.actionTime' }
},
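Platform-wide UV follows the same dedup-then-count pattern, just without the content dimension. A plain-JavaScript sketch with made-up sample data:

```javascript
// Simulate platform-wide daily UV: deduplicate by (day, user),
// then count distinct users per day.
function platformUv(logs) {
  const distinct = new Set();
  for (const log of logs) {
    distinct.add(`${log.actionTime.slice(0, 10)}|${log.erp}`); // dedup per day+user
  }
  const counts = new Map();
  for (const key of distinct) {
    const date = key.split('|')[0];
    counts.set(date, (counts.get(date) || 0) + 1);             // count users per day
  }
  return [...counts].map(([date, total]) => ({ date, total }));
}

// Made-up sample data:
const platformLogs = [
  { actionTime: '2023-01-01 09:00:00', erp: 'alice' },
  { actionTime: '2023-01-01 10:00:00', erp: 'alice' }, // same user, same day: counted once
  { actionTime: '2023-01-02 09:00:00', erp: 'bob' },
];
// platformUv(platformLogs) → [{ date: '2023-01-01', total: 1 }, { date: '2023-01-02', total: 1 }]
```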
Reference documents
Aggregation-related documents:
Practical MongoDB Aggregations (official example book): https://www.practical-mongodb-aggregations.com/front-cover.html
MongoDB official documentation, stages supported by aggregation: Aggregation Pipeline Stages — MongoDB Manual
MongoDB official documentation, operators supported by aggregation: Aggregation Pipeline Operators — MongoDB Manual
4. Summary
At this point, the record of the user-log monitoring requirement is complete.
Looking back, the requirement itself is not complicated; the difficulty lies in
learning how to use MongoDB for data statistics and collecting the information you need.
One purpose of this summary is also to strengthen the ability to gather information and do statistics.
In fact, the Internet industry often runs on experience. A statistical requirement like this may look trivial to someone who has built complete front-end monitoring before, but to someone like me, encountering front-end monitoring for the first time, it seemed quite difficult.
In follow-up articles, we will go deeper, summarize how other types of monitoring are implemented, and continue learning and using MongoDB.