MongoDB Hackathon 02
Vivek A. Ganesan
vivganes@gmail.com
Big Data Gods Meetup, Santa Clara, CA, May 18, 2013
Before we start
Copyright 2013, Vivek A. Ganesan, All rights reserved 1
o A BIG thank you to our sponsors: Big Data Cloud
o Meeting Space
o Food + Drinks
o Consulting/Training
Agenda
o Review of Hackathon 01
o Data Modeling
o Indexing
o Aggregation
o Map/Reduce
Introduction
o This is a hackathon, not a class
o Which means we work on stuff together
o Please consult and help your team mates
o There will be labs (that’s when we learn!)
o Talk to your team mates
o Figure out what problem you want to solve
o Think about your data sets and how to model them in MongoDB
Review – MongoDB Basics
o MongoDB is a document-oriented NoSQL data store
o It stores data internally as BSON (Binary JSON)
o A mongo data store may hold multiple databases
o A database may have multiple collections (analogous to tables)
o A collection is a container of documents
o Documents contain Key/Value pairs
o A default key of “_id” is inserted by MongoDB for all documents
o Users can set “_id” to any value they want, as long as it is unique within the collection
o Documents are schema-free
o No fixed structure to a collection
o A collection can have documents with different key/value pairs
Review – Shell and Clients
o The Mongo Shell is a CLI client for MongoDB
o Shell commands are JavaScript functions
o You can write your own JavaScript code within the shell
o You can also import JavaScript modules using load()
o Mongo Shell looks for an initialization file : ~/.mongorc.js
o Set up global variables here
o To use your favorite editor within the Mongo shell :
o Set the environment variable EDITOR to your editor
o MongoDB supports clients in several programming languages :
o JS, Java, C, C++, C#, Scala, Python, Ruby, Perl and Erlang
Review – MongoDB Objects
o Note : Mongo Shell commands are in blue and output is in green
o Mongo uses a hierarchical naming scheme for database objects
o The current database is always in the db object
o The db command prints the name of the current db
o A collection called “mycollection” in the current database :
o db.mycollection (Note : This is a mongodb object)
o Commands are methods invoked on objects
o E.g., to insert a document into the db.mycollection collection :
o db.mycollection.insert command
o E.g., to find documents in the db.mycollection collection :
o db.mycollection.find command
Review – Create
o First exercise :
o Create a new database called “blog”
o Create a collection called “users” and a collection called “posts”
o Solution to first exercise :
o use blog;
o db; => blog
o show collections; => system.indexes
o db.createCollection("users"); => { "ok" : 1 }
o db.createCollection("posts"); => { "ok" : 1 }
o show collections; => posts, system.indexes, users
Review – Insert
o Second Exercise :
o In the “users” collection :
o Insert a single document, {username: "admin"}
o In the “posts” collection :
o Insert ten posts using a loop
o Blog data : post_title, post_body and post_tags as CSV
o Solution to Second Exercise :
o db.users.insert({username : "admin"});
o for (var i = 1; i <= 10; i++) { db.posts.insert({post_title:
"Title", post_body: "Post Body", post_tags:
"tag1,tag2,tag3,tag4,tag5"}); }
Review – Updates with modifier
o Third Exercise :
o In the “posts” collection :
o Update ten posts with an updated_at key and set it to the
current timestamp
o Solution to the Third Exercise :
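o Note : An update call without a modifier replaces the entire document (modifiers start with a ‘$’ symbol)
o db.posts.update({}, {$set : {updated_at: new Date()}}, false, true);
o The third argument (false) disables upsert; the fourth (true) updates all matching documents instead of only the first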
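To see why the modifier matters, here is a minimal in-memory sketch in plain JavaScript. It does not talk to MongoDB; replaceUpdate and setUpdate are hypothetical helper names (not shell or driver functions) that only mimic the two behaviours described in the note above.

```javascript
// Mimics an update WITHOUT a modifier: the stored document is replaced by
// the update document; only _id is carried over.
function replaceUpdate(doc, updateDoc) {
  return Object.assign({ _id: doc._id }, updateDoc);
}

// Mimics an update WITH the $set modifier: the listed fields are added or
// overwritten and every other field is left untouched.
function setUpdate(doc, update) {
  return Object.assign({}, doc, update.$set);
}

var post = { _id: 1, post_title: "Title", post_body: "Post Body" };
var stamp = new Date();

var replaced = replaceUpdate(post, { updated_at: stamp });
var modified = setUpdate(post, { $set: { updated_at: stamp } });

console.log(Object.keys(replaced)); // only _id and updated_at remain
console.log(Object.keys(modified)); // original fields plus updated_at
```

The first result has lost post_title and post_body, which is usually not what you want when adding a timestamp.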
Review – Selective Updates
o Fourth Exercise :
o In the “posts” collection :
o Update the posts such that the first three posts have a “foo”
tag (use the cursor functionality to iterate)
o Solution to the Fourth Exercise :
o c = db.posts.find().limit(3);
o while ( c.hasNext() ) {
o post = c.next();
o post["post_tags"] = post["post_tags"] + ",foo";
o db.posts.save(post);
o }
Review – Mastering find
o In a Mongo Shell,
o Find all posts but extract only the post_title field
o db.posts.find({}, {post_title: 1, _id: 0});
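o List all posts in reverse order of creation (there is no created_on field, but the default ObjectId in _id starts with a creation timestamp)
o db.posts.find().sort({_id: -1});
o Do the same as above but paginate in sets of three (skip(3) returns the second page)
o db.posts.find().sort({_id: -1}).skip(3).limit(3);
o Find all posts that contain a tag called “foo” (the regex also matches longer tags such as “food”)
o db.posts.find({post_tags: /foo/});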
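The skip/limit arithmetic behind the pagination query can be sketched in plain JavaScript. This is an in-memory stand-in, not shell code: pageWindow and paginate are hypothetical helpers, and the numeric _id values are assumed purely to keep the example readable.

```javascript
// Hypothetical helper: skip/limit values for a 1-based page number.
function pageWindow(pageNumber, pageSize) {
  return { skip: (pageNumber - 1) * pageSize, limit: pageSize };
}

// In-memory stand-in for find().sort({_id: -1}).skip(s).limit(l).
function paginate(docs, pageNumber, pageSize) {
  var w = pageWindow(pageNumber, pageSize);
  return docs
    .slice() // copy so the caller's array is not reordered
    .sort(function (a, b) { return b._id - a._id; })
    .slice(w.skip, w.skip + w.limit);
}

var posts = [];
for (var i = 1; i <= 10; i++) {
  posts.push({ _id: i, post_title: "Title " + i });
}

var secondPage = paginate(posts, 2, 3).map(function (p) { return p._id; });
console.log(secondPage); // [ 7, 6, 5 ]
```

Page 1 corresponds to skip(0), so the skip(3).limit(3) query in the slide is page 2.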
Review – Modifiers
o Fifth Exercise :
o Modify “posts” collection
o Change the post_tags field to an array instead of a CSV list
o c = db.posts.find();
o while ( c.hasNext() ) {
o post = c.next();
o post["post_tags"] = post["post_tags"].split(",");
o db.posts.save(post);
o }
Data Modeling
o http://docs.mongodb.org/manual/core/data-modeling/
o When to reference?
o When it makes sense, e.g. for many-to-many relationships
o When document size is a concern
o Some drivers may do this automatically
o When to embed?
o When it is “natural”, e.g. a blog post and its comments
o When there is a need for atomic operations
o When read performance is critical
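The two modeling choices can be illustrated with plain JavaScript objects shaped like the blog documents used earlier. This is only a sketch: the comments, tag_ids and tags names are assumptions made for illustration, and resolveTags is a hypothetical helper that stands in for the second query a referenced model needs.

```javascript
// Embedded model: comments live inside the post document, so a single read
// returns everything and a single write changes post and comments together.
var embeddedPost = {
  _id: 1,
  post_title: "Title",
  comments: [
    { author: "admin", text: "First!" },
    { author: "guest", text: "Nice post" }
  ]
};

// Referenced model: a post stores tag ids; tags are separate documents that
// many posts can share (a many-to-many relationship).
var tags = [
  { _id: "t1", name: "mongodb" },
  { _id: "t2", name: "hackathon" }
];
var referencedPost = { _id: 2, post_title: "Title", tag_ids: ["t1", "t2"] };

// Resolving references takes a second lookup, done here in memory.
function resolveTags(post, allTags) {
  return post.tag_ids.map(function (id) {
    return allTags.filter(function (t) { return t._id === id; })[0].name;
  });
}

console.log(embeddedPost.comments.length);
console.log(resolveTags(referencedPost, tags));
```

The extra lookup in resolveTags is the read cost of referencing; the growing comments array is the document-size cost of embedding.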
Lab 01 – Model your data set
o Break – 15 minutes
o Lab 01 – 45 minutes - With your team :
o Look at your data set and figure out how you will model it
o How would you bulk load the data?
o How would you handle errors while loading?
o Implement the schema for your data set
o Bulk load a small portion of your data set
o Verify the load and also run some sample queries
o Figure out what queries you would run frequently
Indexes
o http://docs.mongodb.org/manual/core/indexes/
o When to index?
o Improve find performance
o Improve sort performance
o Note : There is a performance impact for writes
o What to index?
o Depends on the query
o Usually, the most frequently searched fields
o Sometimes, fields in embedded documents as well
Types of Indexes and Options
o Unique indexes (_id has a unique index by default)
o Simple
o Compound Indexes
o Prefix order is important!
o Text indexes
o Sparse Indexes
o Multi-key indexes (for arrays)
o Geospatial and Geohaystack indexes
o Indexes can be built in the background (recommended!)
o Indexes can be named explicitly (definitely recommended!)
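Why prefix order matters for compound indexes can be sketched with a small plain-JavaScript check. This is a deliberate simplification, not how the query planner is implemented: prefixLength is a hypothetical helper, and the index field order is assumed for illustration.

```javascript
// Hypothetical helper: counts how many leading fields of a compound index
// appear in the query. 0 means the query does not start at the index prefix.
function prefixLength(indexFields, queryFields) {
  var n = 0;
  while (n < indexFields.length && queryFields.indexOf(indexFields[n]) !== -1) {
    n++;
  }
  return n;
}

var index = ["username", "created_on", "post_title"]; // assumed field order

console.log(prefixLength(index, ["username"]));               // 1
console.log(prefixLength(index, ["created_on", "username"])); // 2
console.log(prefixLength(index, ["created_on"]));             // 0
```

A query on created_on alone skips the leading username field, so this index is a poor fit for it; the order in which the query lists its fields does not matter, only the order of fields in the index.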
Lab 02 – Indexes
o Lab 02 – 30 minutes - With your team :
o Look at the frequent queries from Lab 01 and :
o Which would you index and why?
o What kind of indexes are needed?
o Since this is predominantly a read use case, index away
o Would you use the sparse index? For what and how?
o Would you use the geospatial index? For what and how?
o Would you use the TTL index? For what and how?
Aggregation
o Used for “group by”-like queries
o Aggregation Framework (introduced in 2.1)
o http://docs.mongodb.org/manual/aggregation/
o Simple count : db.posts.count();
o Using Aggregation Framework : db.posts.aggregate([{ $group: { _id: null, count: {$sum: 1}}}]);
o Check the reference for comparison with SQL group by
o Still supports Map/Reduce (older approach and still relevant)
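What the $group stage with a {$sum: 1} accumulator computes can be sketched in plain JavaScript. This is an in-memory emulation, not the Aggregation Framework itself: groupCount is a hypothetical helper and the author field is assumed for illustration.

```javascript
// Hypothetical in-memory stand-in for one $group stage with a $sum: 1
// accumulator. keyOf picks the grouping key; return null to group everything.
function groupCount(docs, keyOf) {
  var counts = Object.create(null);
  var order = [];
  docs.forEach(function (doc) {
    var key = keyOf(doc);
    var slot = String(key);
    if (!(slot in counts)) {
      counts[slot] = { _id: key, count: 0 };
      order.push(slot);
    }
    counts[slot].count += 1;
  });
  return order.map(function (slot) { return counts[slot]; });
}

var posts = [{ author: "admin" }, { author: "admin" }, { author: "guest" }];

// Same shape as the slide's pipeline: one group holding the total count.
var total = groupCount(posts, function () { return null; });
// Grouping on a field gives the SQL "GROUP BY author" equivalent.
var perAuthor = groupCount(posts, function (p) { return p.author; });

console.log(total);
console.log(perAuthor);
```

Grouping on null puts every document in one bucket, which is why the slide's pipeline returns the same number as count().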
Lab 03 – Aggregation
o Lab 03 – 30 minutes - With your team :
o Figure out what aggregations to run on the data set :
o E.g., average rating per user?
o Or, average number of movies rated by all users?
o Write the queries for these aggregations and test them
o Are indexes helpful in aggregations? Why/Why not?
o Are you better off just doing these in your client code? Why/Why not?
o When would you use pipelined aggregations?
Map/Reduce
o Scatter/Gather framework
o db.collection.mapReduce(map_fn, red_fn, {out: output_coll})
o http://docs.mongodb.org/manual/aggregation/
o Mapper – just emits key/value pairs
o Framework – Groups and sorts mapper output => Reducer
o Reducer – Applies a function on the input => Output Coll.
o Distributed computation framework for full table scans
o http://docs.mongodb.org/manual/tutorial/map-reduce-examples/
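The mapper, grouping and reducer steps listed above can be sketched as a small in-memory emulation in plain JavaScript. It is not db.collection.mapReduce: the local mapReduce function, mapTags and sumCounts are hypothetical names, and emit is passed in as an argument here, whereas in the shell it is available to the map function without being passed. Real MongoDB may also skip the reducer for keys with a single value and may call it more than once per key; this sketch calls it exactly once per key.

```javascript
// Hypothetical in-memory emulation of the map -> group -> reduce flow.
function mapReduce(docs, mapFn, reduceFn) {
  var groups = Object.create(null);

  // Map phase: mapFn runs with `this` bound to each document and calls
  // emit(key, value) any number of times.
  docs.forEach(function (doc) {
    mapFn.call(doc, function emit(key, value) {
      (groups[key] = groups[key] || []).push(value);
    });
  });

  // Group/sort phase, then reduce phase: one reduceFn call per key.
  return Object.keys(groups).sort().map(function (key) {
    return { _id: key, value: reduceFn(key, groups[key]) };
  });
}

var posts = [
  { post_tags: ["tag1", "foo"] },
  { post_tags: ["tag1"] },
  { post_tags: ["foo", "tag2"] }
];

// Mapper: emit (tag, 1) for every tag on the post.
function mapTags(emit) {
  this.post_tags.forEach(function (tag) { emit(tag, 1); });
}

// Reducer: add up the emitted counts for one tag.
function sumCounts(key, values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

var tagCounts = mapReduce(posts, mapTags, sumCounts);
console.log(tagCounts);
```

Because the reducer may be applied to partial results, it should return the same kind of value it receives, as sumCounts does with numbers.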
Lab 04 – Map/Reduce
o Lab 04 – 30 minutes - With your team :
o Go through the Map/Reduce examples
o Figure out what Map/Reduce functions you would use
o Implement these functions (on a small data set)
o Some things to think about :
o Can you use Map/Reduce to “seed” your
recommendations?
o Can you use incremental Map/Reduce to “update”
your recommendations? How would you do this?
Questions? Comments?
Thank You!
E-mail: vivganes@gmail.com
Twitter : onevivek
