6.1 Aggregation framework overview

A call to the aggregation framework defines a pipeline , the aggregation pipeline, where the output from each step in the pipeline provides input to the next step. Each step executes a single operation on the input documents to transform the input and generate output documents.

  • $project: specify fields to be placed in the output document
  • $match
  • $limit
  • $skip
  • $unwind
  • $group
  • $sort
  • $geoNear:Select documents near a geospatial location
  • $out
  • $redact: Control access to certain data

image.png

  1. db.products.aggregate([ {$match: …}, {$group: …}, {$sort: …} ] )

image.png

a detailed comparison of SQL commands to the aggregation framework operators
image.png

6.2 E-commerce aggregation example

image.png

6.2.1 Products, categories, and reviews

an example of counting the number of reviews for a given product using this query:

  1. product = db.products.findOne({'slug': 'wheelbarrow-9092'})
  2. reviews_count = db.reviews.count({'product_id': product['_id']})

how to do this using the aggregation framework

  1. db.reviews.aggregate([
  2. {$group : { _id:'$product_id', // Group the input documents by product_id
  3. count:{$sum:1} }} // Count the number of reviews for each product
  4. ]);

image.png


add one more operator to your pipline so that you select only the one product you want to get a count for:

  1. product = db.products.findOne({'slug': 'wheelbarrow-9092'})
  2. ratingSummary = db.reviews.aggregate([
  3. {$match : { product_id: product['_id']} }, // Select only a single product
  4. {$group : { _id:'$product_id',
  5. count:{$sum:1} }}
  6. ]).next(); // Return the first document in the results

image.png


Calculating the average review

  1. product = db.products.findOne({'slug': 'wheelbarrow-9092'})
  2. ratingSummary = db.reviews.aggregate([
  3. {$match : {'product_id': product['_id']}},
  4. {$group : { _id:'$product_id',
  5. average:{$avg:'$rating'}, // Calculate the average rating for a product
  6. count: {$sum:1}}}
  7. ]).next();

image.png

Counting reviews by rating

image.png

  1. countsByRating = db.reviews.aggregate([
  2. {$match : {'product_id': product['_id']}}, // Select product
  3. {$group : { _id:'$rating', // Group by value of rating: '$rating'
  4. count:{$sum:1}}} // Count number of reviews for each rating
  5. ]).toArray(); // Convert resulting cursor to an array

image.png

Joining collections

  1. db.mainCategorySummary.remove({}); // Remove existing documents from mainCategorySummary collection
  2. db.products.aggregate([
  3. {$group : { _id:'$main_cat_id',
  4. count:{$sum:1}}}
  5. ]).forEach(function(doc){
  6. var category = db.categories.findOne({_id:doc._id}); // Read category for a result
  7. if (category !== null) { // You aren't guaranteed the category actually exists!
  8. doc.category_name = category.name;
  9. }
  10. else {
  11. doc.category_name = 'not found';
  12. }
  13. db.mainCategorySummary.insert(doc); // Insert cominned result into your summary collection
  14. })

$out and $project

With the $out operator, you can automatically save the output from a pipeline into a collection.

  1. db.products.aggregate([
  2. {$group : { _id:'$main_cat_id',
  3. count:{$sum:1}}},
  4. {$out : 'mainCategorySummary'} // Save pipeline results to collection mainCategorySummary
  5. ])

The $project operator allows you to filter which fields will be passed to the next stage of the pipeline.
Although $match allows you to limit how much data is passed to the next stage by limiting the number of documents passed, $project can be used to limit the size of each document passed to the next stage.

The following is an example of a $project operator that limits the output documents to just the list of category IDs used for each product:

  1. db.products.aggregate([
  2. {$project : {category_ids:1}}
  3. ]);

image.png

Faster joins with $unwind

  1. db.products.aggregate([
  2. {$project : {category_ids:1}},
  3. {$unwind : '$category_ids'}, // Create an output document for every array entry in category_ids
  4. {$group : { _id:'$category_ids',
  5. count:{$sum:1}}},
  6. {$out : 'countsByCategory'}
  7. ]);

image.png

6.2.2 User and order