MongoDB Aggregation Operations - 02

One, Aggregation operations

Aggregation operations process data records and return computed results. They group values from multiple documents and can perform a variety of operations on the grouped data to return a single result. There are three types of aggregation operations: single-purpose aggregation, the aggregation pipeline, and MapReduce.
Single-purpose aggregation: provides simple access to common aggregation tasks; each of these operations aggregates documents from a single collection.
Aggregation pipeline: a framework for data aggregation modeled on the concept of a data-processing pipeline. Documents enter a multi-stage pipeline that transforms them into aggregated results.
MapReduce: has two phases, a map phase that processes each document and emits one or more objects per input document, and a reduce phase that combines the output of the map operations.

1. Single-purpose aggregation

MongoDB provides the single-purpose aggregation methods db.collection.estimatedDocumentCount(), db.collection.count(), and db.collection.distinct(). All of these operations aggregate data from the documents of a single collection. While they provide easy access to common aggregation processes, they lack the flexibility and power of the aggregation pipeline and mapReduce.

 

db.collection.estimatedDocumentCount(): returns the count of all documents in a collection or view.
db.collection.count(): returns the count of documents matching a find() query on a collection or view; equivalent to the db.collection.find(query).count() construct.
db.collection.distinct(): finds the distinct values of the specified field in a single collection or view and returns the results in an array.
// Retrieve the count of all documents in the books collection
db.books.estimatedDocumentCount()
// Count all documents matching a query
db.books.count({favCount:{$gt:50}})
// Return an array of the distinct values of type
db.books.distinct("type")
// Return an array of the distinct types among documents with favCount greater than 90
db.books.distinct("type",{favCount:{$gt:90}})
Note: On a sharded cluster, db.collection.count() without a query predicate can return an inaccurate count if orphaned documents exist or a chunk migration is in progress. To avoid these situations, use the db.collection.aggregate() method on sharded clusters.
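A minimal sketch of the aggregate()-based alternative, using a $count stage (the output field name total is just an illustrative choice):
// Count all books via the aggregation pipeline (accurate on sharded clusters)
db.books.aggregate([
  { $count: "total" }
])
// Count only the books matching a predicate
db.books.aggregate([
  { $match: { favCount: { $gt: 50 } } },
  { $count: "total" }
])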

2. Aggregation pipeline

What is the MongoDB Aggregation Framework

The MongoDB Aggregation Framework is a computing framework that can operate on one or more collections, perform a series of operations on the data in those collections, and transform the data into the desired form.
Functionally, the aggregation framework is roughly equivalent to GROUP BY, LEFT OUTER JOIN, AS, and similar constructs in SQL queries.
Pipeline and Stage
The entire aggregation process is called a pipeline, and a pipeline is composed of multiple stages. Each pipeline accepts a series of documents (the raw data), each stage performs a set of operations on those documents, and the resulting documents are passed to the next stage.

Aggregation Pipeline Operation Syntax
pipeline = [$stage1, $stage2, ...$stageN];
db.collection.aggregate(pipeline, {options})
pipeline: a list of data aggregation stages. Except for the $out, $merge, and $geoNear stages, each stage can appear multiple times in the pipeline.
options: optional; additional parameters for the aggregation operation, such as the query plan (explain), whether temporary files may be used (allowDiskUse), cursor settings, the maximum operation time (maxTimeMS), read/write concerns, a forced index (hint), and so on.
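A small sketch of passing options as the second argument; allowDiskUse and maxTimeMS are standard option names from the MongoDB manual, and the pipeline itself is only illustrative:
// Allow temporary files on disk and cap the operation at 5 seconds
db.books.aggregate(
  [
    { $match: { type: "technology" } },
    { $group: { _id: "$author.name", total: { $sum: "$favCount" } } }
  ],
  { allowDiskUse: true, maxTimeMS: 5000 }
)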

 

Common pipeline aggregation stages

The aggregation pipeline offers a rich set of aggregation stages; the most commonly used ones, each covered below, include $project, $match, $count, $group, $unwind, $limit, $skip, $sort, and $lookup.

 

Data preparation

Prepare the dataset by executing the following script:
var tags = ["nosql","mongodb","document","developer","popular"];
var types = ["technology","sociality","travel","novel","literature"];
var books = [];
for (var i = 0; i < 50; i++) {
    var typeIdx = Math.floor(Math.random() * types.length);
    var tagIdx = Math.floor(Math.random() * tags.length);
    var tagIdx2 = Math.floor(Math.random() * tags.length);
    var favCount = Math.floor(Math.random() * 100);
    var username = "xx00" + Math.floor(Math.random() * 10);
    var age = 20 + Math.floor(Math.random() * 15);
    var book = {
        title: "book-" + i,
        type: types[typeIdx],
        tag: [tags[tagIdx], tags[tagIdx2]],
        favCount: favCount,
        author: { name: username, age: age }
    };
    books.push(book);
}
db.books.insertMany(books);

$project

Projection: projects an original field under a specified name. For example, project the title field in the collection as name:
db.books.aggregate([{$project:{name:"$title"}}])
$project can flexibly control the format of the output document and can also exclude unneeded fields:
db.books.aggregate([{$project:{name:"$title",_id:0,type:1,author:1}}])
Exclude fields from nested documents
db.books.aggregate([
 {$project:{name:"$title",_id:0,type:1,"author.name":1}}
])
or
db.books.aggregate([
 {$project:{name:"$title",_id:0,type:1,author:{name:1}}}
])

$match

$match is used to filter documents; subsequent stages then aggregate over the resulting subset. $match can use all of the regular query operators except the geospatial ones. In practice, place $match as close to the front of the pipeline as possible. This has two advantages: first, unneeded documents are filtered out quickly, which reduces the pipeline's workload; second, if $match runs before projection and grouping, the query can use an index.
db.books.aggregate([{$match:{type:"technology"}}])
When the filtering stage is combined with other pipeline stages, putting it at the beginning reduces the number of documents that the later stages must process and improves efficiency.
db.books.aggregate([
 {$match:{type:"technology"}},
 {$project:{name:"$title",_id:0,type:1,author:{name:1}}}
 ])
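As a sketch of how to check that a leading $match can use an index, assuming for illustration that an index on type is created first, the explain option of aggregate() returns the query plan:
// Assumption for illustration: create an index on type, then ask for the query plan
db.books.createIndex({ type: 1 })
db.books.aggregate(
  [
    { $match: { type: "technology" } },
    { $project: { name: "$title", _id: 0, type: 1 } }
  ],
  { explain: true }
)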

$count

 Count and return the number of results matching the query

db.books.aggregate([
   {$match:{type:"technology"}},
   {$count: "type_count"}
 ])
The $match stage filters the documents so that only those whose type is technology are passed to the next stage; the $count stage returns the count of documents remaining in the pipeline and assigns that value to type_count.

$group

Groups documents by the specified expression and outputs one document per distinct group to the next stage. Each output document contains an _id field holding the distinct group key. The output documents can also contain computed fields, which hold the values of accumulator expressions grouped by the $group _id. $group does not output the individual documents, only the statistics.
{ $group: { _id: <expression>, <field1>: { <accumulator1>: <expression1> }, ... } }
1. The _id field is mandatory; however, you can specify _id as null to compute accumulated values over all input documents.
2. The remaining computed fields are optional and are computed using the <accumulator> operators.
3. The _id and <accumulator> expressions can accept any valid expression.
Accumulator operators (with their SQL analogs where one exists):
$avg: computes the average. (SQL: avg)
$first: returns the first document in each group; the sort order is used if the documents are sorted, otherwise the default storage order. (SQL: limit 0,1)
$last: returns the last document in each group; the sort order is used if the documents are sorted, otherwise the default storage order.
$max: returns the maximum value across all documents in each group. (SQL: max)
$min: returns the minimum value across all documents in each group. (SQL: min)
$push: appends the value of the specified expression to an array.
$addToSet: adds the value of the expression to a set (no duplicates, unordered).
$sum: computes the sum. (SQL: sum)
$stdDevPop: returns the population standard deviation of the input values.
$stdDevSamp: returns the sample standard deviation of the input values.
The memory limit for the $group stage is 100 MB. By default, $group returns an error if the stage exceeds this limit. To allow processing of large datasets, set the allowDiskUse option to true, which lets $group operations write to temporary files.
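A minimal sketch of enabling this, reusing the allowDiskUse option described earlier (the grouping itself is only illustrative):
// Allow $group to spill to temporary files if it exceeds the memory limit
db.books.aggregate(
  [ { $group: { _id: "$type", total: { $sum: "$favCount" } } } ],
  { allowDiskUse: true }
)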
Compute the number of books, the total favCount, and the average favCount:
db.books.aggregate([
 {$group:{_id:null,count:{$sum:1},pop:{$sum:"$favCount"},avg:{$avg:"$favCount"}}}
 ])
Count the total favCount of each author's books:
db.books.aggregate([
 {$group:{_id:"$author.name",pop:{$sum:"$favCount"}}}
 ])
Count the favCount of each book by each author:
db.books.aggregate([
 {$group:{_id:{name:"$author.name",title:"$title"},pop:{$sum:"$favCount"}}}
 ])
The set of types of each author's books:
db.books.aggregate([
  {$group:{_id:"$author.name",types:{$addToSet:"$type"}}}
])

$unwind

$unwind splits an array into separate documents, one output document per array element.
{
  $unwind:
    {
      // To specify the field path, prefix the field name with $ and enclose it in quotes.
      path: <field path>,
      // Optional. The name of a new field that holds the array index of the element. The name cannot start with $.
      includeArrayIndex: <string>,
      // Optional, default false. If true, $unwind still outputs a document when the path is null, missing, or an empty array.
      preserveNullAndEmptyArrays: <boolean>
    }
}
Split the tag array of the books by the author named xx006 into multiple documents:
db.books.aggregate([
 {$match:{"author.name":"xx006"}},
 {$unwind:"$tag"}
 ])

 db.books.aggregate([
 {$match:{"author.name":"xx006"}}
 ])
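The two optional parameters can be illustrated with a sketch like the following (the index field name tagIndex is just an illustrative choice):
// Record each element's array index and keep documents whose tag array is missing or empty
db.books.aggregate([
  { $match: { "author.name": "xx006" } },
  { $unwind: {
      path: "$tag",
      includeArrayIndex: "tagIndex",
      preserveNullAndEmptyArrays: true
  } }
])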
The set of tags of each author's books:
db.books.aggregate([
 {$unwind:"$tag"},
 {$group:{_id:"$author.name",types:{$addToSet:"$tag"}}}
 ])

$limit

db.books.aggregate([
 {$limit : 5 }
 ])
This operation returns only the first 5 documents passed to it by the pipeline. $limit does not modify the content of the documents it passes along.
Note: When $sort occurs immediately before $limit in the pipeline, the $sort operation only keeps the top n results as it processes the documents, where n is the specified limit, so MongoDB only needs to hold n items in memory.
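For example, a sketch of a top-3 query where this optimization applies:
// $sort immediately followed by $limit: only the top 3 documents are held in memory
db.books.aggregate([
  { $sort: { favCount: -1 } },
  { $limit: 3 }
])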

$skip

Skip the specified number of documents entering the stage and pass the rest to the next stage in the pipeline
db.books.aggregate([
 {$skip : 5 }
 ])
This action will skip the first 5 documents that are passed to it by the pipeline. $skip has no effect on the content of documents passed along the pipeline.
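Combined with $sort and $limit, $skip gives a simple pagination sketch (the page size of 10 and the choice of page 2 are illustrative):
// Page 2 of books ordered by favCount (documents 11-20)
db.books.aggregate([
  { $sort: { favCount: -1 } },
  { $skip: 10 },
  { $limit: 10 }
])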

$sort

Sorts all input documents and returns them to the pipeline in sorted order.
Syntax:
{ $sort: { <field1>: <sort order>, <field2>: <sort order> ... } }
To sort on a field, set the sort order to 1 or -1 to specify an ascending or descending sort, respectively, as shown in the following example:
db.books.aggregate([
 {$sort : {favCount:-1,title:1}}
 ])

$lookup

New in MongoDB 3.2, $lookup is mainly used to implement multi-collection joins, the equivalent of a multi-table join query in a relational database. Each input document is processed by the $lookup stage, and each output document contains a new array field (which you can name as needed). The array holds the matching documents from the joined collection; if there are no matches, the array is empty (i.e. [ ]).
Syntax:
db.collection.aggregate([{
    $lookup: {
        from: "<collection to join>",
        localField: "<field from the input documents>",
        foreignField: "<field from the documents of the from collection>",
        as: "<output array field>"
    }
}])
from: the collection in the same database to join with.
localField: the field of the input documents used as the match value. If an input document does not have the localField key, it is treated as having a localField: null key-value pair during processing.
foreignField: the field of the joined collection used as the match value. If a document in the joined collection does not have the foreignField key, it is treated as having a foreignField: null key-value pair during processing.
as: the name of the new array field added to the output documents. If the field already exists in the input document, it is overwritten.
Note: for matching purposes, null equals null, so null matches null.
Its syntax function is similar to the following pseudo-SQL statement:
SELECT *, <output array field>
FROM collection
WHERE <output array field> IN (
    SELECT *
    FROM <collection to join>
    WHERE <foreignField> = <collection.localField>
);
Example
Data preparation
db.customer.insert({customerCode:1,name:"customer1",phone:"13112345678",address:"test1"})
db.customer.insert({customerCode:2,name:"customer2",phone:"13112345679",address:"test2"})
db.order.insert({orderId:1,orderCode:"order001",customerCode:1,price:200})
db.order.insert({orderId:2,orderCode:"order002",customerCode:2,price:400})
db.orderItem.insert({itemId:1,productName:"apples",qutity:2,orderId:1})
db.orderItem.insert({itemId:2,productName:"oranges",qutity:2,orderId:1})
db.orderItem.insert({itemId:3,productName:"mangoes",qutity:2,orderId:1})
db.orderItem.insert({itemId:4,productName:"apples",qutity:2,orderId:2})
db.orderItem.insert({itemId:5,productName:"oranges",qutity:2,orderId:2})
db.orderItem.insert({itemId:6,productName:"mangoes",qutity:2,orderId:2})
Join query (the collections are joined on customerCode, matching the fields inserted above):
db.customer.aggregate([
  {$lookup: {
      from: "order",
      localField: "customerCode",
      foreignField: "customerCode",
      as: "customerOrder"
  }}
])

db.order.aggregate([
  {$lookup: {
      from: "customer",
      localField: "customerCode",
      foreignField: "customerCode",
      as: "customer"
  }},
  {$lookup: {
      from: "orderItem",
      localField: "orderId",
      foreignField: "orderId",
      as: "orderItem"
  }}
])
Aggregation example 1
Count the number of book documents in each category:
db.books.aggregate([
 {$group:{_id:"$type",total:{$sum:1}}},
 {$sort:{total:-1}}
 ])
Rank tags by popularity, where a tag's popularity is calculated from the favCount of the book documents associated with it:
db.books.aggregate([
 {$match:{favCount:{$gt:0}}},
 {$unwind:"$tag"},
 {$group:{_id:"$tag",total:{$sum:"$favCount"}}},
 {$sort:{total:-1}}
 ])
1. $match stage: filters out documents whose favCount is 0.
2. $unwind stage: expands the tag array, so that a document containing 3 tags is split into 3 documents.
3. $group stage: groups the expanded documents and computes the statistics; $sum: "$favCount" accumulates the favCount field.
4. $sort stage: receives the output of the grouping and sorts by the total.
Count the number of book documents whose favCount falls into the buckets [0,10), [10,60), [60,80), [80,100), and [100,+∞):
db.books.aggregate([{
 $bucket:{
 groupBy:"$favCount",
 boundaries:[0,10,60,80,100],
 default:"other",
 output:{"count":{$sum:1}}
 }
 }])

Two, MapReduce

A MapReduce operation splits a large data-processing job across multiple threads that work in parallel, and then merges the results together. The Map-Reduce support provided by MongoDB is flexible and practical for large-scale data analysis.
MapReduce has two phases:
1. A map phase that groups document data sharing the same key.
2. A reduce phase that combines the map results and produces the statistical output.
Starting in MongoDB 5.0, map-reduce operations are deprecated. Aggregation pipelines provide better performance and usability, and map-reduce operations can be rewritten using aggregation pipeline operators such as $group and $merge.
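As a hedged sketch of what such a rewrite looks like, the following compares a (deprecated) mapReduce call with the equivalent $group pipeline, using the books collection from earlier; the output collection name map_reduce_out is an illustrative choice:
// Deprecated map-reduce form: sum favCount per type
db.books.mapReduce(
  function() { emit(this.type, this.favCount); },        // map: emit (key, value) pairs
  function(key, values) { return Array.sum(values); },   // reduce: combine values per key
  { out: "map_reduce_out" }
)
// Equivalent aggregation pipeline using $group
db.books.aggregate([
  { $group: { _id: "$type", total: { $sum: "$favCount" } } }
])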

Origin blog.csdn.net/u011134399/article/details/131258880