ANALYTICS WITH MONGODB


      ROGER BODAMER
YOU WANT TO ANALYZE THIS
LIKE THIS
BUT HOW ?



• These graphs are the end result of a process

• To get here, there are a few things you need to do and explore
A WORD ON NON-NATIVE APPROACHES
•   Yes, you can

    •   map your document schema to a relational schema

    •   then export your data from MongoDB to a relational db

        •   and set up a cron job to do this every day

    •   then use your BI tool to map relational to “objects”

    •   and then Report and do Analytics
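The mapping step above can be sketched in a few lines. This is only an illustration: the helper name toRelationalRows and the two target tables, gamers(name, location) and games(gamer_name, game, duration), are assumptions, and the document shape is the gamer document used throughout this deck.

```javascript
// Hypothetical helper (not from the deck): flattens one gamer document
// into rows for two assumed relational tables.
function toRelationalRows(doc) {
  // Parent row; a missing location becomes NULL in the relational world.
  const gamerRow = { name: doc.name, location: doc.location ?? null };
  // One child row per embedded game, keyed back to the parent by name.
  const gameRows = (doc.games ?? []).map((g) => ({
    gamer_name: doc.name,
    game: g.game,
    duration: g.duration,
  }));
  return { gamerRow, gameRows };
}

const bobDoc = {
  name: "Bob",
  location: "us",
  games: [
    { game: "WoW", duration: 2910 },
    { game: "Tetris", duration: 593 },
  ],
};

const flattened = toRelationalRows(bobDoc);
console.log(flattened.gamerRow);
console.log(flattened.gameRows);
```

Every embedded array turns into another table and another join, which is part of why this route gets tedious quickly.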
BUT THAT WOULD BE NO FUN


• Analytics using native queries

• A simple process
PROCESS: NAIVE

• Take a sample document

• Develop query

• Put on chart

• Done!

  • and a gold star from your boss!
PROCESS: REALITY
• Understand your schema
  • multiple schemas in a single collection
  • multiple collections / multiple data sources
• Iterate:
  • define metric
  • develop query and report on metrics
    • understand and drill down or discard
    • repeat
• Operationalize metrics: dashboard
  • Dimensions
  • Plotting
WHY ITERATE ?
UNDERSTAND YOUR SCHEMA

{
    "name" : "Mario",
    "games" : [{"game" : "WoW",
                "duration" : 130},
               {"game" : "Tetris",
                "duration" : 130}]
}
BUT ALSO:
• Schemas can be polymorphic

{
    "name" : "Bob",
    "location" : "us",
    "games" : [{"game" : "WoW",
                "duration" : 2910},
               {"game" : "Tetris",
                "duration" : 593}]
}
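One way to find out how many schemas live in a collection is to count the distinct sets of top-level keys. The sketch below does that over an in-memory array; the function name countShapes is an assumption, and in the mongo shell the input could come from something like db.gamers.find().limit(1000).toArray().

```javascript
// Hypothetical helper: counts how many documents share each "shape",
// where a shape is the sorted list of top-level keys (ignoring _id).
function countShapes(docs) {
  const counts = {};
  for (const doc of docs) {
    const shape = Object.keys(doc)
      .filter((key) => key !== "_id")
      .sort()
      .join(",");
    counts[shape] = (counts[shape] || 0) + 1;
  }
  return counts;
}

// The two sample documents from the slides.
const sampleDocs = [
  { name: "Mario", games: [{ game: "WoW", duration: 130 }, { game: "Tetris", duration: 130 }] },
  { name: "Bob", location: "us", games: [{ game: "WoW", duration: 2910 }, { game: "Tetris", duration: 593 }] },
];

console.log(countShapes(sampleDocs));
// → { 'games,name': 1, 'games,location,name': 1 }
```

This only looks at top-level keys; nested documents would need the same treatment recursively.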
SO NOW WHAT ?
•   Only report on common attributes

    •   probably missing the most recent / interesting data
SO NOW WHAT ?
•   Write 2 programs, one for each schema

    •   2 graphs / reports

    •   2 programs writing to 1 graph (basically merging instance data in 2
        places)
SO NOW WHAT ?

•   Unify Schema

    •   deal with absent, null values

    •   translate(NULL, “EU”);
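The translate(NULL, “EU”) idea can be sketched two ways: client-side on a plain document, or as a pipeline stage using the $ifNull operator. The names unifyGamer and unifyStage are assumptions, and so is the rule itself: treating a missing location as "EU" follows the slide, not any property of the data.

```javascript
// Client-side: fill in the location when it is absent or null.
function unifyGamer(doc, defaultLocation = "EU") {
  return { ...doc, location: doc.location ?? defaultLocation };
}

// Server-side: a $project stage that does the same inside a pipeline.
// $ifNull returns its second argument when the field is missing or null.
function unifyStage(defaultLocation = "EU") {
  return {
    $project: {
      name: 1,
      games: 1,
      location: { $ifNull: ["$location", defaultLocation] },
    },
  };
}

console.log(unifyGamer({ name: "Mario", games: [] }));
console.log(JSON.stringify(unifyStage()));
```

Putting the stage first in a pipeline means every later stage sees one schema.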
ITERATE



• Total time and number of games people play in the US vs. the EU?
QUERY
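// note: MongoDB 3.6 and later also require a cursor : {} field in this command
db.runCommand(
{ aggregate : "gamers", pipeline : [
    // keep only the fields the aggregation needs
    { $project : {
        location : 1,
        games : 1
    }},
    // one output document per entry in the games array
    { $unwind : "$games" },
    // the group key must be a field reference ("$location"), not the literal 1
    { $group : {
        _id : { location : "$location" },
        number_games : { $sum : 1 },
        total_duration : { $sum : "$games.duration" }
    }},
    // expose the key as a plain field and hide _id
    { $project : {
        _id : 0,
        location : "$_id.location",
        number_games : 1,
        total_duration : 1
    }}
]})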
SIDEBAR: WRITING AGGREGATION QUERIES
•   Prepare Data
    •   Extract relevant properties from collection documents
    •   Unwind a sub-collection if its documents contribute to the aggregation
•   Aggregate data
    •   determine the key (_id) on which the aggregates should be done
    •   name aggregates
•   Project Data
    •   For final results
EXAMPLE
{
    "name" : "Alice",
    "location" : "us",
    "games" : [{
        "game" : "WoW",
        "duration" : 200
      }, {
        "game" : "Tetris",
        "duration" : 100
      }]
}
PREPARE
• Only use location and games:

{ $project : {
    location : 1,
    games : 1
}}


• Unwind games, since properties of its documents are aggregated over:

{ $unwind : "$games" }
AGGREGATE DATA
• Aggregate on number of games (add 1 per game)
  and total duration (add duration per game)
  using location as key


{ $group : {
    _id : { location : "$location" },
    number_games : { $sum : 1 },
    total_duration : { $sum : "$games.duration" }
}}
PROJECT
• Only show location and aggregates, do not show _id


{ $project : {
    _id : 0,
    location : "$_id.location",
    number_games : 1,
    total_duration : 1
}}
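To see what the prepare / aggregate / project steps compute, here is the same logic in plain JavaScript, run against the Alice document from the EXAMPLE slide. It is a teaching sketch, not how the server executes a pipeline, and the function name totalsByLocation is an assumption.

```javascript
// Plain-JavaScript emulation of the three steps: unwind games,
// group by location, and emit location plus the two aggregates.
function totalsByLocation(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // "$unwind": visit each embedded game on its own
    for (const game of doc.games || []) {
      // "$group": location is the key; start a new group when unseen
      const group = groups.get(doc.location) ||
        { location: doc.location, number_games: 0, total_duration: 0 };
      group.number_games += 1;              // { $sum : 1 }
      group.total_duration += game.duration; // { $sum : "$games.duration" }
      groups.set(doc.location, group);
    }
  }
  // "$project": only location and the aggregates, no _id
  return [...groups.values()];
}

const alice = {
  name: "Alice",
  location: "us",
  games: [
    { game: "WoW", duration: 200 },
    { game: "Tetris", duration: 100 },
  ],
};

console.log(totalsByLocation([alice]));
// → [ { location: 'us', number_games: 2, total_duration: 300 } ]
```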
RESULT 1




• People spend a little more time playing in the US
• More games played in the EU
RING....
CHALLENGE 2


• Since we found that the EU and the US play a similar amount of time and about the same number of games, the new challenge is:


• Let's see what the distribution of different games is in the 2 locations
QUERY 2
db.runCommand(
{ aggregate : "gamers", pipeline : [
    { $project : {
        location : 1,
        games : 1
    }},
    { $unwind : "$games" },
    { $project : {
        location : 1,
        game : "$games.game",
        duration : "$games.duration"
    }},
    { $group : {
        _id : { location : "$location", game : "$game" },
        number_games : { $sum : 1 },
        total_duration : { $sum : "$duration" }
    }},
    { $project : {
        _id : 0,
        location : "$_id.location",
        game : "$_id.game",
        number_games : 1,
        total_duration : 1
    }}
]})
QUERY 2
db.runCommand(
{ aggregate : "gamers", pipeline : [
    { $project : {                  // location, games
        location : 1,
        games : 1
    }},
    { $unwind : "$games" },
    { $project : {                  // location, game, duration
        location : 1,
        game : "$games.game",
        duration : "$games.duration"
    }},
    { $group : {                    // key: aggregate on location and game
        _id : { location : "$location", game : "$game" },
        number_games : { $sum : 1 },
        total_duration : { $sum : "$duration" }
    }},
    { $project : {                  // project: location, game, total(#games), sum(duration)
        _id : 0,
        location : "$_id.location",
        game : "$_id.game",
        number_games : 1,
        total_duration : 1
    }}
]})
RESULT 2




• Count: EU - WoW, US - Tetris
• Time: the EU spends more time on WoW; in the US it’s more evenly spread
RING....
CHALLENGE 3:



• How do I compare Bob to everyone else in the EU?
QUERY

• 2 aggregations happening at the same time:

  • 1 by user

  • 1 by location

• This query needs to be broken up into several queries

• Fairly complex

• Currently easiest to process in Ruby/Java/Python/...
// Query A: total duration per user, location and game
db.runCommand(
{ aggregate : "gamers", pipeline : [
    { $project : {
        name : 1,
        location : 1,
        games : 1
    }},
    { $unwind : "$games" },
    { $project : {
        name : 1,
        location : 1,
        game : "$games.game",
        duration : "$games.duration"
    }},
    { $group : {
        _id : { location : "$location", name : "$name", game : "$game" },
        total_duration : { $sum : "$duration" }
    }},
    { $project : {
        name : "$_id.name",
        _id : 0,
        location : "$_id.location",
        game : "$_id.game",
        total_duration : 1
    }}
]})

// Query B: total duration per location
db.runCommand(
{ aggregate : "gamers", pipeline : [
    { $project : {
        location : 1,
        games : 1
    }},
    { $unwind : "$games" },
    { $project : {
        location : 1,
        duration : "$games.duration"
    }},
    // the group key must be a field reference ("$location"), not the literal 1
    { $group : {
        _id : { location : "$location" },
        total_duration : { $sum : "$duration" }
    }},
    // the location is reported under the key "name"
    { $project : {
        name : "$_id.location",
        _id : 0,
        total_duration : 1
    }}
]})
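The two result sets then have to be combined in application code. A minimal sketch of that step is below, assuming the per-user query returns rows of { name, location, game, total_duration } and the per-location query returns rows of { name, total_duration } with the location stored under name. The function name shareOfLocation is hypothetical, and the sample rows are derived from the Bob and Alice documents shown earlier.

```javascript
// Hypothetical client-side join: for each per-user row, compute what
// fraction of the location's total play time it represents.
function shareOfLocation(userRows, locationRows) {
  const totalByLocation = new Map(
    locationRows.map((row) => [row.name, row.total_duration])
  );
  return userRows.map((row) => {
    const total = totalByLocation.get(row.location);
    return {
      name: row.name,
      location: row.location,
      game: row.game,
      // null when the location is unknown or has no recorded time
      share: total ? row.total_duration / total : null,
    };
  });
}

// Rows as the two queries would return them for Bob and Alice (both "us").
const perUser = [
  { name: "Bob", location: "us", game: "WoW", total_duration: 2910 },
  { name: "Bob", location: "us", game: "Tetris", total_duration: 593 },
  { name: "Alice", location: "us", game: "WoW", total_duration: 200 },
  { name: "Alice", location: "us", game: "Tetris", total_duration: 100 },
];
const perLocation = [{ name: "us", total_duration: 2910 + 593 + 200 + 100 }];

console.log(shareOfLocation(perUser, perLocation));
```

A per-game comparison like the one in the result slide would need the location query to group by game as well; the join itself stays the same.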
RESULT 3




• Bob plays >20% WoW in comparison to the Europeans, but plays 200% more Tetris
A NOTE ON QUERIES


• There’s no notion of a declared schema

• The augmented schema is coded in the queries

• Reuse is very hard; it happens at the query-language level
DIMENSIONS
• Most questions / graphs have a dimension

 • Time, Geo

 • Categories

 • Relative: what’s X’s contribution of revenue to total

• You will need to be able to pass in dimensions as a predicate for your queries

 • or cache the result and post-process client-side
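Passing dimensions in can be as simple as generating the pipeline from a list of grouping keys plus a filter. The sketch below is one possible shape for that; the function name buildPipeline and the { name, path } description of a dimension are assumptions, while $match, $unwind, $group and $project are standard pipeline stages.

```javascript
// Hypothetical pipeline builder: dimensions become the group key,
// and an optional predicate becomes a leading $match stage.
function buildPipeline(dimensions, match = {}) {
  const groupKey = {};
  const projection = { _id: 0, number_games: 1, total_duration: 1 };
  for (const dim of dimensions) {
    groupKey[dim.name] = dim.path;               // e.g. location: "$location"
    projection[dim.name] = "$_id." + dim.name;   // lift the key out of _id
  }
  return [
    { $match: match },                            // {} matches every document
    { $unwind: "$games" },
    { $group: {
        _id: groupKey,
        number_games: { $sum: 1 },
        total_duration: { $sum: "$games.duration" },
    } },
    { $project: projection },
  ];
}

const byLocationAndGame = buildPipeline(
  [
    { name: "location", path: "$location" },
    { name: "game", path: "$games.game" },
  ],
  { location: "us" }
);
console.log(JSON.stringify(byLocationAndGame, null, 2));
```

In the mongo shell the result could be passed to db.gamers.aggregate(...); dropping the second dimension gives back the first query of this deck.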
A WORD ON RENDERING GRAPHS / REPORTS
• Several libraries available for Ruby / Python / Java

  • Gruff, Scruffy, StockCharts, D3, JRafael, JQuery Vizualize, MooCharts, etc.

• Also some services: John Nunemaker’s work (http://get.gaug.es/)

• But basically:

  • you know how to program, right!
REVIEW
• Understand your schema
  • multiple schemas in a single collection
  • multiple collections / multiple data sources
• Iterate:
  • define metric
  • develop query and report on metrics
    • understand and drill down or discard
    • repeat
• Operationalize metrics: dashboard
  • Dimensions
  • Plotting
PUNCHLINES

• We have described a software engineering process

  • but requirements will be very fluid

• When you know how to write Ruby / Java / Python etc., life is good

• If you’re a business analyst, you have a problem

  • better be BFF with some engineer :)
PLUG

• We’ve been working on a declarative analytics product

• (initially) uses Excel as its presentation layer

• Reach out to me if you’re interested

  @rogerb
  roger@norellan.com
THANK YOU / QUESTIONS

Thoughts on MongoDB Analytics
