4.1 Principles of schema design

some question for table when modeling data with any database system:

  • What are your application access patterns?
  • What’s the basic unit of data?
  • What are the capabilities of your database?
  • What makes a good unique id or primary key for a record?

    4.2 Designing an e-commerce data model

    4.2.1 Schema basics

    1. {
    2. _id: ObjectId("4c4b1476238d3b4dd5003981"),
    3. slug: "wheelbarrow-9092",
    4. sku: "9092",
    5. name: "Extra Large Wheelbarrow",
    6. description: "Heavy duty wheelbarrow...",
    7. details: {
    8. weight: 47,
    9. weight_units: "lbs",
    10. model_num: 4039283402,
    11. manufacturer: "Acme",
    12. color: "Green"
    13. },
    14. total_reviews: 4,
    15. average_review: 4.5,
    16. pricing: {
    17. retail: 589700,
    18. sale: 489700,
    19. },
    20. price_history: [
    21. {
    22. retail: 529700,
    23. sale: 429700,
    24. start: new Date(2010, 4, 1),
    25. end: new Date(2010, 4, 8)
    26. },
    27. {
    28. retail: 529700,
    29. sale: 529700,
    30. start: new Date(2010, 4, 9),
    31. end: new Date(2010, 4, 16)
    32. },
    33. ],
    34. primary_category: ObjectId("6a5b1476238d3b4dd5000048"),
    35. category_ids: [
    36. ObjectId("6a5b1476238d3b4dd5000048"),
    37. ObjectId("6a5b1476238d3b4dd5000049")
    38. ],
    39. main_cat_id: ObjectId("6a5b1476238d3b4dd5000048"),
    40. tags: ["tools", "gardening", "soil"],
    41. }
  • Unique slug

    1. db.products.createIndex({slug: 1}, {unique: true})
  • nested documents

  • one-to-many relationships
  • many-to-many relationships
    1. {
    2. _id: ObjectId("6a5b1476238d3b4dd5000048"),
    3. slug: "gardening-tools",
    4. name: "Gardening Tools",
    5. description: "Gardening gadgets galore!",
    6. parent_id: ObjectId("55804822812cb336b78728f9"),
    7. ancestors: [
    8. {
    9. name: "Home",
    10. _id: ObjectId("558048f0812cb336b78728fa"),
    11. slug: "home"
    12. },
    13. {
    14. name: "Outdoors",
    15. _id: ObjectId("55804822812cb336b78728f9"),
    16. slug: "outdoors"
    17. }
    18. ]
    19. }
    to query for all products in the Gardening Tools category, the code is simple:
    1. db.products.find({category_ids: ObjectId('6a5b1476238d3b4dd5000048')})
    To query for all categories from a given product, you use the $in operator:
    1. product = db.products.findOne({"slug": "wheelbarrow-9092"})
    2. db.categories.find({_id: {$in: product['category_ids']}})

    4.2.2 Users and orders

    1. {
    2. _id: ObjectId("6a5b1476238d3b4dd5000048"),
    3. user_id: ObjectId("4c4b1476238d3b4dd5000001"),
    4. state: "CART",
    5. line_items: [
    6. {
    7. _id: ObjectId("4c4b1476238d3b4dd5003981"),
    8. sku: "9092",
    9. name: "Extra Large Wheelbarrow",
    10. quantity: 1,
    11. pricing: {
    12. retail: 5897,
    13. sale: 4897,
    14. }
    15. },
    16. {
    17. _id: ObjectId("4c4b1476238d3b4dd5003982"),
    18. sku: "10027",
    19. name: "Rubberized Work Glove, Black",
    20. quantity: 2,
    21. pricing: {
    22. retail: 1499,
    23. sale: 1299
    24. }
    25. }
    26. ],
    27. shipping_address: {
    28. street: "588 5th Street",
    29. city: "Brooklyn",
    30. state: "NY",
    31. zip: 11215
    32. },
    33. sub_total: 6196
    34. }
    Finding all orders for a given user:
    1. db.orders.find{user_id: user['_id']}
    The query for getting the user for a particular order is equally simple:
    1. db.users.findOne({_id: order['user_id']})
    1. {
    2. _id: ObjectId("4c4b1476238d3b4dd5000001"),
    3. username: "kbanker",
    4. email: "kylebanker@gmail.com",
    5. first_name: "Kyle",
    6. last_name: "Banker",
    7. hashed_password: "bd1cfa194c3a603e7186780824b04419",
    8. addresses: [
    9. {
    10. name: "home",
    11. street: "588 5th Street",
    12. city: "Brooklyn",
    13. state: "NY",
    14. zip: 11215
    15. },
    16. {
    17. name: "work",
    18. street: "1 E. 23rd Street",
    19. city: "New York",
    20. state: "NY",
    21. zip: 10010
    22. }
    23. ],
    24. payment_methods: [
    25. {
    26. name: "VISA",
    27. payment_token: "43f6ba1dfda6b8106dc7"
    28. }
    29. ]
    30. }
    1. {
    2. _id: ObjectId("4c4b1476238d3b4dd5000041"),
    3. product_id: ObjectId("4c4b1476238d3b4dd5003981"),
    4. date: new Date(2010, 5, 7),
    5. title: "Amazing",
    6. text: "Has a squeaky wheel, but still a darn good wheelbarrow.",
    7. rating: 4,
    8. user_id: ObjectId("4c4b1476238d3b4dd5000042"),
    9. username: "dgreenthumb",
    10. helpful_votes: 3,
    11. voter_ids: [
    12. ObjectId("4c4b1476238d3b4dd5000033"),
    13. ObjectId("7a4f0376238d3b4dd5000003"),
    14. ObjectId("92c21476238d3b4dd5000032")
    15. ]
    16. }

    4.3 Nuts and bolts: on databases, collections and documents

    4.3.1 Databases

    a database is created automatically once you write to a collection in that database. Have a look at this Ruby code:
    1. connection = Mongo::Client.new( [ '127.0.0.1:27017' ], :database => 'garden' )
    2. db = connection.database
    1. products = db['products']
    2. products.insert_one({:name => "Extra Large Wheelbarrow"})
    You can delete all the data in this collection by calling:
    1. products.find({}).delete_many
    This command doesn’t remove the collection itself; it only empties it. To remove a collection entirely, you use the drop method, like this:
    1. products.drop
    You can drop the garden database from Ruby like so:
    1. db.drop
    From the MongoDB shell, run the dropDatabase() method using JavaScript:
    1. use garden
    2. db.dropDatabase();
    The data files reside in whichever directory you designated as the dbpath when starting mongod. When left unspecified, mongod stores all its files in /data/db.3 Let’s see how this directory looks after creating the garden database:
    image.png
    First note the mongod.lock file, which stores the server’s process ID.
    You can always check the amount of space used versus the amount allocated by using the stats command in the JavaScript shell:
    image.png
    the fileSize field indicates the total size of files allocated for this database.
    the dataSize is the actural size of the BSON objects in the database
    the storageSize includes extra space reserved for collection growth and also unallocated deleted space

4.3.2 Collections

  1. db.createCollection("users")
  1. db.createCollection("users", {size: 20000})

reanme collection

  1. db.products.renameCollection("store_products")

capped collection is distinguished from standard collections by their fixed size. This means that once a capped collection reaches its maximum size, subsequent inserts will overwirte the least-recently-inserted documents in the collection.

  1. require 'mongo'
  2. VIEW_PRODUCT = 0 # action type constants
  3. ADD_TO_CART = 1
  4. CHECKOUT = 2
  5. PURCHASE = 3
  6. client = Mongo::Client.new([ '127.0.0.1:27017' ], :database => 'garden')
  7. client[:user_actions].drop
  8. actions = client[:user_actions, :capped => true, :size => 16384]
  9. actions.create
  10. 500.times do |n| # loop 500 times, using n as the iterator
  11. doc = {
  12. :username => "kbanker",
  13. :action_code => rand(4), # random value between 0 and 3, inclusive
  14. :time => Time.now.utc,
  15. :n => n
  16. }
  17. actions.insert_one(doc)
  18. end

query the collection from the shell:

  1. use garden
  2. db.user_actions.count();

mongodb allows you to specify a maximum number of documents for a capped collection with the max parameter
this size configuration has precedence

  1. db.createCollection("users.actions", {capped: true, size: 16384, max: 100})

capped collections don’t allow all operations available for a normal collection.


mongodb also allows you to expire documents from a collection after a certain amount of time has passed.
this functionality is actually implemented using a special kind of index.

  1. db.reviews.createIndex({time_field: 1}, {expireAfterSeconds: 3600})
  1. db.reviews.insert({
  2. time_field: new Date(),
  3. ...
  4. })

TTL indexes just measure the difference between the indexed value and the current time, to compare to expireAfterSeconds
TTL indexes ‘s restrictions:
can’t have a TTL index on _id or on a field used in another index
can’t use TTL indexexs with capped collections because they don’t support removing individual documents
can’t have compound TTL indexes, though you can have an array of timestamps in the indexes field.


Two of the special system collections are system.namespaces and system.indexes

  1. db.system.namespaces.find()

image.png

system.indexes stores each index definition for the current database

  1. db.system.indexes.find()

image.png

4.3.3 Documents and insertion

Document serialization, types, and limits

all documents are serialized to BSON befor being sent to MongoDB; they’re later deserialized from BSON

  1. doc = {
  2. :_id => BSON::ObjectId.new,
  3. :username => "kbanker",
  4. :action_code => rand(5),
  5. :time => Time.now.utc,
  6. :n => 1
  7. }
  8. bson = doc.to_bson
  9. puts "Document #{doc.inspect} takes up #{bson.length} bytes as BSON"

The serialize method returns a byte array

  1. string_io = StringIO.new(bson)
  2. deserialized_doc = String.from_bson(string_io)
  3. puts "Here's our document deserialized from BSON:"
  4. puts deserialized_doc.inspect

BSON types:

  • strings
  • numbers: double, int , and long

    1. db.numbers.save({n: 5});
    2. db.numbers.save({n: NumberLong(5)})
    1. db.numbers.find({n: 5});

    image.png
    you’ll see that doubles are type 1 and 64-bit integers are type 18. Thus, you can query the collection for values by type:
    image.png
    The only other issue that commonly arises with BSON numeric types is the lack of decimal support. This means that if you’re planning on storing currency values in MongoDB, you need to use an integer type and keep the values in cents.

  • datetimes

  • virtual types
  • limits on documents
  • bulk inserts