Tutorial 4 of 4

Data Analysis with MongoDB Aggregation Framework

1. Introduction

In this tutorial, we will explore how to analyze data with MongoDB's aggregation framework. We will use the $group stage to summarize documents and the $lookup stage to join documents from another collection.

By the end of this tutorial, you will be able to:
- Understand the MongoDB Aggregation Framework
- Use group and lookup operations for data analysis
- Create complex aggregation pipelines

Prerequisites: Basic understanding of MongoDB and JavaScript.

2. Step-by-Step Guide

MongoDB's aggregation framework is modeled on the concept of data processing pipelines. Documents enter a multi-stage pipeline that transforms the documents into an aggregated result. The most basic pipeline stages provide filters that operate like queries and document transformations that modify the form of the output document.
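The pipeline idea can be pictured in plain JavaScript, without a database: each stage is a function that takes an array of documents and returns a new array, and the stages run in order. The sketch below is only an analogy for how documents flow through stages, not how MongoDB implements aggregation. The sample documents and the helper names matchStage, projectStage, and runPipeline are made up for illustration.

```javascript
// Hypothetical sample documents standing in for a 'sales' collection.
const sampleSales = [
  { item: "abc", price: 10, quantity: 2 },
  { item: "xyz", price: 5, quantity: 10 },
  { item: "abc", price: 10, quantity: 1 },
];

// A filter stage: keeps only documents that satisfy a predicate (like a query).
const matchStage = (predicate) => (docs) => docs.filter(predicate);

// A transformation stage: reshapes each document.
const projectStage = (shape) => (docs) => docs.map(shape);

// Feed the output of each stage into the next one.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

const pipelineResult = runPipeline(sampleSales, [
  matchStage((doc) => doc.item === "abc"),
  projectStage((doc) => ({ item: doc.item, amount: doc.price * doc.quantity })),
]);

console.log(pipelineResult);
```

With this sample data, the filter stage passes the two "abc" documents on, and the transformation stage reduces each of them to an item name and an amount (20 and 10).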

Group Operations

The $group stage groups input documents by a specified identifier expression (the _id field) and applies the accumulator expression(s) to each group, producing one output document per distinct group. The _id expression can reference field(s) from the input documents.

Example:

db.sales.aggregate([
  {
    $group: {
      // Group by the 'item' field
      _id: "$item",
      // Sum the product of 'price' and 'quantity' across each group
      totalSaleAmount: { $sum: { $multiply: ["$price", "$quantity"] } }
    }
  }
])
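To make the result of this stage concrete, here is a minimal plain-JavaScript sketch of the same computation on an in-memory array. It assumes three hypothetical sales documents and a made-up helper named groupTotalSaleAmount; it imitates what $group computes and is not MongoDB code.

```javascript
// Hypothetical sample documents; the field names match the $group example.
const salesDocs = [
  { item: "abc", price: 10, quantity: 2 },
  { item: "xyz", price: 5, quantity: 10 },
  { item: "abc", price: 10, quantity: 1 },
];

// Plain-JavaScript equivalent of grouping by "$item" and summing price * quantity.
function groupTotalSaleAmount(docs) {
  const totals = new Map();
  for (const doc of docs) {
    const previous = totals.get(doc.item) ?? 0;
    totals.set(doc.item, previous + doc.price * doc.quantity);
  }
  // One output document per distinct group key, as $group produces.
  return [...totals].map(([item, totalSaleAmount]) => ({ _id: item, totalSaleAmount }));
}

const grouped = groupTotalSaleAmount(salesDocs);
console.log(grouped);
```

For this sample data the totals are 30 for "abc" and 50 for "xyz". Note that MongoDB does not guarantee the order of the documents that $group returns, whereas this sketch happens to keep first-seen order.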

Lookup Operations

The $lookup stage performs a left outer join with another collection in the same database. Every input document is kept, and the matching documents from the joined collection are added to it as a new array field.

Example:

db.orders.aggregate([
  {
    $lookup: {
      from: "inventory",       // collection to join with
      localField: "item",      // field in the orders collection
      foreignField: "sku",     // field in the inventory collection
      as: "inventory_docs"     // output array field
    }
  }
])
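The left outer join behaviour is easiest to see on a small data set. The following plain-JavaScript sketch imitates this $lookup on two in-memory arrays. The sample documents (including the instock field) and the helper name lookupInventory are assumptions made for illustration, not part of MongoDB.

```javascript
// Hypothetical sample documents for the two collections.
const sampleOrders = [
  { _id: 1, item: "almonds", quantity: 2 },
  { _id: 2, item: "pecans", quantity: 1 },
  { _id: 3, item: "walnuts", quantity: 5 }, // no matching inventory document
];
const sampleInventory = [
  { _id: 10, sku: "almonds", instock: 120 },
  { _id: 11, sku: "pecans", instock: 70 },
];

// Plain-JavaScript equivalent of the $lookup: every order is kept, and the
// matching inventory documents are collected into an array field.
function lookupInventory(orderDocs, inventoryDocs) {
  return orderDocs.map((order) => ({
    ...order,
    inventory_docs: inventoryDocs.filter((inv) => inv.sku === order.item),
  }));
}

const joined = lookupInventory(sampleOrders, sampleInventory);
console.log(JSON.stringify(joined, null, 2));
```

All three orders appear in the output. The "walnuts" order has no match, so its inventory_docs field is an empty array rather than the order being dropped, which is what distinguishes a left outer join from an inner join.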

3. Code Examples

Example 1: Group Operation

// Group sales data by the 'item' field and calculate the total sale amount for each item
db.sales.aggregate([
  {
    $group: {
      _id: "$item",
      totalSaleAmount: { $sum: { $multiply: ["$price", "$quantity"] } }
    }
  }
])

This outputs one document per distinct 'item' value, with _id set to that value and totalSaleAmount set to the sum of 'price' multiplied by 'quantity' over all documents in the group.

Example 2: Lookup Operation

// Join 'orders' collection with 'inventory' collection based on 'item'/'sku' match
db.orders.aggregate([
  {
    $lookup: {
      from: "inventory",
      localField: "item",
      foreignField: "sku",
      as: "inventory_docs"
    }
  }
])

This outputs every document from the 'orders' collection with an additional 'inventory_docs' array field that holds the matching documents from the 'inventory' collection. For an order with no match, 'inventory_docs' is an empty array.

4. Summary

We've learned how to use MongoDB's aggregation framework for data analysis: grouping documents and performing calculations with $group, and joining documents from another collection with $lookup.

Next steps for learning include exploring other pipeline stages such as $project, $match, and $unwind. You can refer to the official MongoDB documentation for more details.

5. Practice Exercises

  1. Group the 'orders' collection by the 'customer' field and calculate the total quantity for each customer.
  2. Join the 'orders' collection with the 'customers' collection based on a 'customer'/'name' match.
  3. Group the 'orders' collection by the 'item' field and calculate the average price for each item.
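One possible way to approach these exercises is sketched below; try them yourself first. The sketch assumes that 'orders' documents carry numeric 'quantity' and 'price' fields, which the exercises imply but do not state, and the output field names totalQuantity, customer_docs, and averagePrice are arbitrary choices. The pipelines run only when a mongosh db object is available.

```javascript
// Exercise 1: total quantity per customer (assumes a numeric 'quantity' field).
const totalQuantityByCustomer = [
  { $group: { _id: "$customer", totalQuantity: { $sum: "$quantity" } } },
];

// Exercise 2: attach matching customer documents to each order.
// 'customer_docs' is an arbitrary name for the output array field.
const ordersWithCustomers = [
  {
    $lookup: {
      from: "customers",
      localField: "customer",
      foreignField: "name",
      as: "customer_docs",
    },
  },
];

// Exercise 3: average price per item (assumes a numeric 'price' field).
const averagePriceByItem = [
  { $group: { _id: "$item", averagePrice: { $avg: "$price" } } },
];

// Run the pipelines only when a mongosh 'db' object is available.
if (typeof db !== "undefined") {
  for (const pipeline of [totalQuantityByCustomer, ordersWithCustomers, averagePriceByItem]) {
    printjson(db.orders.aggregate(pipeline).toArray());
  }
}
```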

Further examples and practice material can be found in the official MongoDB documentation. Remember, the key to mastering MongoDB's aggregation framework is practice and exploration!