Aggregation operations in MongoDB

In MongoDB, we sometimes need to analyze data, for example to compute statistics. Simple queries (find) cannot handle such requirements, so we use the aggregation framework instead. MongoDB offers three ways to perform aggregation: the aggregation pipeline, the map-reduce function, and single-purpose aggregation methods. This article mainly explains how to use the aggregation pipeline.

Aggregation Pipeline

MongoDB's aggregation framework is based on the concept of data processing pipelines. Documents enter a multi-stage pipeline whose stages transform them into aggregated results through operations such as projection, filtering, sorting, and grouping. Pipeline stages can also use operators to perform tasks such as calculating averages or concatenating strings.
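To make the pipeline idea concrete, here is a minimal plain-JavaScript sketch (not the MongoDB shell or driver API). It assumes each stage can be modeled as a function that takes an array of documents and returns a new array; the helper `runPipeline` and the two sample stages are hypothetical names chosen for illustration.

```javascript
// Plain-JS model of a pipeline: each stage maps an array of documents to a new array.
// runPipeline, onlyAdults and nameOnly are hypothetical helpers, not MongoDB APIs.
function runPipeline(docs, stages) {
  // Feed the output of each stage into the next one, in order.
  return stages.reduce((current, stage) => stage(current), docs);
}

// A filtering stage (similar in spirit to $match).
const onlyAdults = (docs) => docs.filter((d) => d.age >= 18);
// A projection stage (similar in spirit to $project).
const nameOnly = (docs) => docs.map((d) => ({ name: d.name }));

const people = [
  { name: "Ann", age: 30 },
  { name: "Bob", age: 12 },
  { name: "Cy", age: 45 },
];

const result = runPipeline(people, [onlyAdults, nameOnly]);
console.log(result); // [ { name: 'Ann' }, { name: 'Cy' } ]
```

Because the stages run in array order, swapping them here would fail: after `nameOnly` the `age` field is gone, so `onlyAdults` would drop everything. The same ordering concern applies to real MongoDB pipelines.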

 

The following figure, taken from the official MongoDB website, shows a simple aggregation example.

As shown in the figure above, $match first filters out the documents whose status equals A, and then $group groups them by cust_id and uses $sum to compute a sum for each group.
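The figure's two stages can be sketched in plain JavaScript as below. This is only a model of what the stages compute, not how MongoDB executes them; the sample orders and the `amount` field being summed are assumptions for illustration, since the figure itself is not reproduced here.

```javascript
// Plain-JS sketch of:
//   { $match: { status: "A" } },
//   { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
// The orders and the "amount" field are hypothetical sample data.
const orders = [
  { cust_id: "A123", amount: 500, status: "A" },
  { cust_id: "A123", amount: 250, status: "A" },
  { cust_id: "B212", amount: 200, status: "A" },
  { cust_id: "A123", amount: 300, status: "D" },
];

// $match: keep only documents whose status equals "A".
const matched = orders.filter((o) => o.status === "A");

// $group: one output document per cust_id, summing amount within each group.
// (A Map keeps insertion order; MongoDB does not guarantee the order of $group output.)
const totals = new Map();
for (const o of matched) {
  totals.set(o.cust_id, (totals.get(o.cust_id) ?? 0) + o.amount);
}
const grouped = [...totals].map(([id, total]) => ({ _id: id, total }));

console.log(grouped); // [ { _id: 'A123', total: 750 }, { _id: 'B212', total: 200 } ]
```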

 

Aggregation pipeline limitations

       1. Document size limit

A single document returned by an aggregation cannot exceed 16 MB (the BSON document size limit), but documents may exceed 16 MB while they are being processed inside the pipeline.

       2. Memory limit

By default, each aggregation pipeline stage can use at most 100 MB of memory; if a stage exceeds this, an error is reported. To process data that needs more than 100 MB, set allowDiskUse to true (pass { allowDiskUse: true } as the options argument of aggregate()) so that stages can write data to temporary files. The $graphLookup stage is an exception: its memory is still limited to 100 MB, and allowDiskUse=true has no effect on that stage, although the option still takes effect for the other stages in the pipeline. When allowDiskUse=false, exceeding the memory limit raises an error.

 

Aggregation pipeline stages

$match: filters documents; only matching documents are passed to the next stage
$sort: sorts documents by the specified key(s)
$limit: limits how many documents are passed to the next stage
$skip: skips the specified number of documents
$project: projects fields, i.e. chooses which fields to return (similar to a, b, c in SELECT a, b, c); it can also compute new fields
$group: groups documents; the _id field specifies the field (or expression) to group by
$count: returns the number of documents that reach this stage
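$count is the one stage in the list that is not demonstrated in the examples that follow, so here is a small plain-JavaScript sketch of what it computes (a model, not the MongoDB API). It follows the documented form { $count: "<field name>" }, which emits a single document holding the count; `countStage` is a hypothetical helper name, and the empty-input branch reflects our understanding that $count produces no document when nothing reaches it.

```javascript
// Plain-JS sketch of { $count: "total" }: emit one document whose single field,
// named by the string argument, holds the number of input documents.
// countStage is a hypothetical helper, not a MongoDB function.
function countStage(docs, fieldName) {
  // When no documents reach the stage, no output document is produced.
  if (docs.length === 0) return [];
  return [{ [fieldName]: docs.length }];
}

const docs = [{ age: 24 }, { age: 25 }, { age: 23 }];
console.log(countStage(docs, "total")); // [ { total: 3 } ]
console.log(countStage([], "total")); // []
```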

For more pipeline stages, and for the operators available inside them, see the official MongoDB documentation.

 

Basic syntax

db.collection.aggregate( [ { <stage> }, ... ] )
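In this syntax the pipeline is just an ordered array of stage documents, and each stage document has a single key naming the stage. The sketch below builds such an array in plain JavaScript and checks that shape; `stageNames` is a hypothetical helper for illustration, not part of MongoDB.

```javascript
// A pipeline is an ordered array of stage documents, as passed to aggregate().
const pipeline = [
  { $match: { age: { $gt: 22 } } },
  { $group: { _id: "$dept", count: { $sum: 1 } } },
];

// Hypothetical helper: each stage object should have exactly one key,
// and that key is the stage name (it starts with "$").
function stageNames(stages) {
  return stages.map((stage) => {
    const keys = Object.keys(stage);
    if (keys.length !== 1 || !keys[0].startsWith("$")) {
      throw new Error(`Invalid stage: ${JSON.stringify(stage)}`);
    }
    return keys[0];
  });
}

console.log(stageNames(pipeline)); // [ '$match', '$group' ]
```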

 

Prepare data

db.persons.insertMany([
    {userId : '001',age : 24,salary : 5000,dept : 'Department 1'},    
    {userId : '002',age : 25,salary : 7000,dept : 'Department 2'},    
    {userId : '003',age : 23,salary : 8000,dept : 'Department 1'},    
    {userId : '004',age : 26,salary : 1000,dept : 'Department 3'},    
    {userId : '005',age : 27,salary : 2000,dept : 'Department 2'},    
    {userId : '006',age : 22,salary : 7000,dept : 'Department 1'},    
    {userId : '007',age : 25,salary : 6000,dept : 'Department 3'},    
    {userId : '008',age : 26,salary : 4000,dept : 'Department 3'},    
    {userId : '009',age : 28,salary : 9000,dept : 'Department 2'}
])

 1. Use $project to project the required fields

       * Exclude the _id field

       * Return the age field

       * Generate a new field newAge whose value is the value of the original age field plus 1

db.persons.aggregate([
    {$project : {_id : 0,age : 1,newAge : {$add : ['$age',1]}}}    
])
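As a rough model of what this $project stage returns, the plain-JavaScript sketch below applies the same projection to the first two sample documents. It only mimics the result shape (the in-memory objects have no _id, which the real stage excludes explicitly); it is not how MongoDB evaluates $add.

```javascript
// First two documents of the sample data (without _id, which MongoDB would add).
const persons = [
  { userId: "001", age: 24, salary: 5000, dept: "Department 1" },
  { userId: "002", age: 25, salary: 7000, dept: "Department 2" },
];

// Mirrors { $project: { _id: 0, age: 1, newAge: { $add: ["$age", 1] } } }:
// keep age, compute newAge, and drop every other field.
const projected = persons.map((p) => ({ age: p.age, newAge: p.age + 1 }));

console.log(projected); // [ { age: 24, newAge: 25 }, { age: 25, newAge: 26 } ]
```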

 2. Use $match to filter data

The filter syntax is the same as for ordinary find() query conditions.

db.persons.aggregate([
    {$match : {age : {$gt : 22}} }
])
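Against the sample data, this filter keeps every person except 006, because $gt is a strict comparison and 006 is exactly 22. The plain-JavaScript sketch below reproduces that check in memory (ages copied from the inserted documents; it is a model, not a database query).

```javascript
// userId and age from the sample persons collection.
const persons = [
  { userId: "001", age: 24 }, { userId: "002", age: 25 }, { userId: "003", age: 23 },
  { userId: "004", age: 26 }, { userId: "005", age: 27 }, { userId: "006", age: 22 },
  { userId: "007", age: 25 }, { userId: "008", age: 26 }, { userId: "009", age: 28 },
];

// Mirrors { $match: { age: { $gt: 22 } } }: strictly greater than 22.
const matched = persons.filter((p) => p.age > 22);

console.log(matched.length); // 8
```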

 3. Use $sort to sort

db.persons.aggregate([
    {$match : {age : {$gt : 22}} },
    {$sort : {age : 1}}
])
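Several people share the same age (25 and 26), and MongoDB does not guarantee the relative order of documents with equal sort keys. If a stable order matters, a common approach is to add a unique field as a tie-breaker, e.g. {$sort : {age : 1, userId : 1}}. The plain-JavaScript sketch below models that tie-broken sort on the sample data.

```javascript
const persons = [
  { userId: "001", age: 24 }, { userId: "002", age: 25 }, { userId: "003", age: 23 },
  { userId: "004", age: 26 }, { userId: "005", age: 27 }, { userId: "006", age: 22 },
  { userId: "007", age: 25 }, { userId: "008", age: 26 }, { userId: "009", age: 28 },
];

// Mirrors $match { age: { $gt: 22 } } then $sort { age: 1, userId: 1 }:
// ascending by age, and by userId when ages are equal.
const sorted = persons
  .filter((p) => p.age > 22)
  .sort((a, b) => a.age - b.age || a.userId.localeCompare(b.userId));

console.log(sorted.map((p) => p.userId));
```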

 4. Use $limit and $skip to limit and skip documents

db.persons.aggregate([
    {$match : {age : {$gt : 22}} },
    {$sort : {age : -1}},
    {$limit : 6},
    {$skip : 3}
])
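The order of $limit and $skip matters: $limit 6 followed by $skip 3 returns only the 4th to 6th documents of the sorted list, while $skip 3 followed by $limit 6 would return up to six documents after the first three. The plain-JavaScript sketch below compares both orders on the sample ages (an in-memory model; ties between equal ages keep input order here, which MongoDB does not promise).

```javascript
const persons = [
  { userId: "001", age: 24 }, { userId: "002", age: 25 }, { userId: "003", age: 23 },
  { userId: "004", age: 26 }, { userId: "005", age: 27 }, { userId: "006", age: 22 },
  { userId: "007", age: 25 }, { userId: "008", age: 26 }, { userId: "009", age: 28 },
];

// $match { age: { $gt: 22 } } then $sort { age: -1 }
const sortedDesc = persons
  .filter((p) => p.age > 22)
  .sort((a, b) => b.age - a.age);

// $limit 6 then $skip 3: keep the first 6, then drop the first 3 of those.
const limitThenSkip = sortedDesc.slice(0, 6).slice(3);
// $skip 3 then $limit 6: drop the first 3, then take up to 6 (only 5 remain).
const skipThenLimit = sortedDesc.slice(3).slice(0, 6);

console.log(limitThenSkip.map((p) => p.age)); // [ 26, 25, 25 ]
console.log(skipThenLimit.length); // 5
```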

 5. Use $group for grouping operations

db.persons.aggregate([
    {$group : {_id : "$dept",count : {$sum : 1}}}
]);
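With the sample data, each department has three people, so every group's count is 3. The plain-JavaScript sketch below models this $group stage in memory: the group key plays the role of _id, and {$sum : 1} adds 1 per document.

```javascript
const persons = [
  { userId: "001", dept: "Department 1" }, { userId: "002", dept: "Department 2" },
  { userId: "003", dept: "Department 1" }, { userId: "004", dept: "Department 3" },
  { userId: "005", dept: "Department 2" }, { userId: "006", dept: "Department 1" },
  { userId: "007", dept: "Department 3" }, { userId: "008", dept: "Department 3" },
  { userId: "009", dept: "Department 2" },
];

// Mirrors { $group: { _id: "$dept", count: { $sum: 1 } } }.
const counts = new Map();
for (const p of persons) {
  counts.set(p.dept, (counts.get(p.dept) ?? 0) + 1);
}
const grouped = [...counts].map(([dept, count]) => ({ _id: dept, count }));

console.log(grouped);
```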

 

 With these basics in hand, let's work through a simple exercise.

 Requirement: Take the employees older than 22 and, if a salary is 1000 or less, set it to 4000. Sort them by age in descending order, keep the 7 oldest, and then drop the single oldest one, leaving 6 users. For these users, group employees of the same department and age, compute each group's average salary, and return the three groups with the highest average salary.

 Approach: 1. Project the age, dept, and salary fields (keeping the original salary as oldSalary), and raise any salary of 1000 or less to 4000

            2. Keep only employees older than 22

            3. Sort by age in descending order

            4. Take the first 7 documents, then skip the first one (the oldest employee)

            5. Group by department and age, counting the employees and computing the average salary

            6. Sort the groups by average salary in descending order

            7. Return the first 3 groups

 The code is as follows:

db.persons.aggregate([
    { $project : {age : 1,dept : 1,oldSalary : "$salary",salary : {
        $switch : {
            branches : [
                { case : { $lte : ["$salary",1000] }, then : 4000 }
            ],
            default : '$salary'
        }
    }} },
    { $match : {age : {$gt : 22}} },
    { $sort : {age : -1}},
    { $limit : 7},
    { $skip : 1 },
    { $group : { _id : {dept : "$dept",age : "$age"},pers : {$sum : 1} , deptAvgSalary : { $avg : "$salary"} } },
    { $sort : {deptAvgSalary : -1}},
    { $limit : 3}
]);
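To check what the pipeline should return, here is a plain-JavaScript sketch that replays the same steps on the sample data in memory. It assumes the salary rule "1000 or less becomes 4000"; with this data, that rule only changes the average of the (Department 3, 26) group, which does not make the top three either way. Only the set of six selected users depends on the age sort, and the ages at the cut points (23, 28) are unique, so the result is deterministic.

```javascript
const persons = [
  { userId: "001", age: 24, salary: 5000, dept: "Department 1" },
  { userId: "002", age: 25, salary: 7000, dept: "Department 2" },
  { userId: "003", age: 23, salary: 8000, dept: "Department 1" },
  { userId: "004", age: 26, salary: 1000, dept: "Department 3" },
  { userId: "005", age: 27, salary: 2000, dept: "Department 2" },
  { userId: "006", age: 22, salary: 7000, dept: "Department 1" },
  { userId: "007", age: 25, salary: 6000, dept: "Department 3" },
  { userId: "008", age: 26, salary: 4000, dept: "Department 3" },
  { userId: "009", age: 28, salary: 9000, dept: "Department 2" },
];

// $project + $switch: keep the old salary, raise salaries of 1000 or less to 4000.
const projected = persons.map((p) => ({
  age: p.age,
  dept: p.dept,
  oldSalary: p.salary,
  salary: p.salary <= 1000 ? 4000 : p.salary,
}));

// $match { age: { $gt: 22 } }, $sort { age: -1 }, $limit 7, $skip 1
const selected = projected
  .filter((p) => p.age > 22)
  .sort((a, b) => b.age - a.age)
  .slice(0, 7)
  .slice(1);

// $group by { dept, age }: count people (pers) and average their salary.
const groups = new Map();
for (const p of selected) {
  const key = `${p.dept}|${p.age}`;
  const g = groups.get(key) ?? { _id: { dept: p.dept, age: p.age }, pers: 0, total: 0 };
  g.pers += 1;
  g.total += p.salary;
  groups.set(key, g);
}
const grouped = [...groups.values()].map(({ _id, pers, total }) => ({
  _id,
  pers,
  deptAvgSalary: total / pers,
}));

// $sort { deptAvgSalary: -1 }, $limit 3
const top3 = grouped.sort((a, b) => b.deptAvgSalary - a.deptAvgSalary).slice(0, 3);
console.log(top3);
```

The three groups it reports are (Department 2, 25) at 7000, (Department 3, 25) at 6000, and (Department 1, 24) at 5000, each containing one person.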

 Running result:

 

 


Origin http://43.154.161.224:23101/article/api/json?id=326049517&siteId=291194637