Pre-aggregation with counters




       © Copyright 2010 10gen Inc.
Goals
• Dashboard style reports
• (Known) Reports
• Real-time numbers
Framework
• Know your metrics/counters
• Prepared reports
• Calculate during write
• Fast queries
• Always up-to-date data
• Record time-series collections
Rationale
• Documents are updated in-place*
• $inc update operator
• Working set is small
• Aggregations are much smaller*
Dashboard
[Chart: Projects, Lines, and Events per language (JavaScript, Java, Ruby, Python); legend: Monday, Tuesday, Thursday, Friday]
Demo Dashboard
Roads not traveled
• Map/Reduce
  • Reprocess raw data
  • Now possible to do partial reduce
• Aggregation Framework (aggregate in 2.2)
  • Also reprocess data on operation (initial release)
  • Optimizations to come
• More costly during reads
Not Appropriate For
• Ad-hoc aggregations (unknown metrics)
• One-off reports
• Possibly complex calculations
Processing
• Event received
• Split into many updates w/$inc
• Aggregate
  • Input Field(s)
  • Time periods (hourly, daily, monthly)
  • Defined Metrics
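The time-period step above can be sketched as a small bucketing function. This is only an illustration: the helper name `bucketFor` and the layout (one document per day for hourly stats, per month for daily stats, per year for monthly stats) are assumptions modelled on the hourly stats document shown later in this deck, not something the slides spell out.

```javascript
// Hypothetical helper: maps an event timestamp to a period key ("p")
// and the slot inside that period that should be incremented.
// The daily and monthly layouts are assumptions, not taken from the slides.
function bucketFor(when, granularity) {
  const y = when.getUTCFullYear();
  const m = when.getUTCMonth();
  const d = when.getUTCDate();
  switch (granularity) {
    case "hourly": // one document per day, one slot per hour (0-23)
      return { p: new Date(Date.UTC(y, m, d)), slot: when.getUTCHours() };
    case "daily": // one document per month, one slot per day of month (1-31)
      return { p: new Date(Date.UTC(y, m, 1)), slot: d };
    case "monthly": // one document per year, one slot per month (1-12)
      return { p: new Date(Date.UTC(y, 0, 1)), slot: m + 1 };
    default:
      throw new Error("unknown granularity: " + granularity);
  }
}

const when = new Date("2012-05-21T07:30:00Z");
for (const g of ["hourly", "daily", "monthly"]) {
  const b = bucketFor(when, g);
  console.log(g, b.p.toISOString(), b.slot);
}
```

All arithmetic is done in UTC so that the same event always lands in the same bucket regardless of the server's local time zone.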
Example Data: github
> db.events.findOne()
{
     "repository" : {
           "url" : "https://github.com/vidageek/games",
           ...
           "open_issues" : 25,
           "watchers" : 6,
           "pushed_at" : "2012/03/10 08:34:00 -0800",
           "language" : "Java"
     },
     "actor_attributes" : {...},
     "created_at" : "2012/03/11 15:20:24 -0700",
     "public" : true,
     "actor" : "juliano",
     "payload" : {...},
     "url" : "https://github.com/...",
     "type" : "CommitCommentEvent"
}
Define Metrics
• “actor”
• “repository.name”
• “repository.language”
• “type”
  PushEvent, IssuesEvent, WatchEvent, GistEvent
• “payload.ref”
  refs/heads/improved_history, refs/heads/master, refs/heads/signs
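The metric names above are dotted paths into the event document. A minimal sketch of pulling them out of an event follows; `getPath`, `extractMetrics`, and `METRIC_FIELDS` are hypothetical names, and the sample event keeps only the fields whose values are visible in the example data (elided fields are left out rather than guessed).

```javascript
// Dotted paths taken from the "Define Metrics" slide.
const METRIC_FIELDS = [
  "actor",
  "repository.name",
  "repository.language",
  "type",
  "payload.ref",
];

// Hypothetical helper: walks a dotted path, returning undefined if any step is missing.
function getPath(doc, path) {
  return path
    .split(".")
    .reduce((node, key) => (node == null ? undefined : node[key]), doc);
}

// Hypothetical helper: collects every metric that is present on the event.
function extractMetrics(event) {
  const found = {};
  for (const field of METRIC_FIELDS) {
    const value = getPath(event, field);
    if (value !== undefined) found[field] = value;
  }
  return found;
}

// Trimmed copy of the example event; elided fields are omitted.
const sampleEvent = {
  repository: { open_issues: 25, watchers: 6, language: "Java" },
  actor: "juliano",
  type: "CommitCommentEvent",
};

console.log(extractMetrics(sampleEvent));
```

Metrics that are absent from a given event (here `repository.name` and `payload.ref`) are simply skipped, so such an event produces no counter update for them.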
Aggregations
TimePeriod, type #
TimePeriod, author #
TimePeriod, project #
Stats Collections
stats_[hourly/daily/monthly].actors

stats_[hourly/daily/monthly].projects

stats_[hourly/daily/monthly].langs

stats_[hourly/daily/monthly].types
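The naming pattern above expands to twelve collections. The sketch below enumerates them; the mapping from collection suffix to event field (for example `langs` to `repository.language`) is an assumption inferred from the metric list, and the names `PERIODS`, `METRIC_SOURCES`, and `statsCollectionNames` are hypothetical.

```javascript
const PERIODS = ["hourly", "daily", "monthly"];

// Assumed mapping from collection suffix to the event field it counts.
const METRIC_SOURCES = {
  actors: "actor",
  projects: "repository.name",
  langs: "repository.language",
  types: "type",
};

// Hypothetical helper: lists every stats collection implied by the pattern.
function statsCollectionNames() {
  const names = [];
  for (const period of PERIODS) {
    for (const metric of Object.keys(METRIC_SOURCES)) {
      names.push("stats_" + period + "." + metric);
    }
  }
  return names;
}

console.log(statsCollectionNames().join("\n"));
```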
Stats
> db.stats_hourly.types.find({"_id.type":"GistEvent"})
{
     "_id" : {
              "p" : ISODate("2012-05-21T00:00:00Z"),
              "type" : "GistEvent"
     },
     "hour" : {
           "2" : { "count" : 65 },
           "3" : { "count" : 2 },
           "7" : { "count" : 130 },
           "8" : { "count" : 5 }
     },
     "total" : { "count" : 202 }
}
Updates Increment
Query:
{ "p" : Date(…), "actor" : "neoplastic" }

Update:
{ "$inc" : { "h.21.c" : 1 , "t.c" : 1}}

Upsert : true
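To make the increment concrete, here is a sketch that builds the query and `$inc` document for one event and then applies it to an in-memory object. The short field names `h`, `c`, and `t` follow the slide; `buildIncrement` and `applyInc` are hypothetical helpers, and `applyInc` is only a stand-in for how dotted paths create nested counters, not MongoDB's implementation.

```javascript
// Hypothetical helper: builds the arguments for one upsert against an
// hourly actor-stats collection. Uses UTC so buckets are stable.
function buildIncrement(actor, when) {
  const day = new Date(
    Date.UTC(when.getUTCFullYear(), when.getUTCMonth(), when.getUTCDate())
  );
  const hour = when.getUTCHours();
  return {
    query: { p: day, actor: actor },
    // One counter for the hour slot, one for the period total.
    update: { $inc: { ["h." + hour + ".c"]: 1, "t.c": 1 } },
    options: { upsert: true },
  };
}

// In-memory stand-in for $inc on dotted paths: missing intermediate
// objects are created and missing counters start from zero.
function applyInc(doc, inc) {
  for (const [path, amount] of Object.entries(inc)) {
    const keys = path.split(".");
    let node = doc;
    for (const key of keys.slice(0, -1)) {
      if (typeof node[key] !== "object" || node[key] === null) node[key] = {};
      node = node[key];
    }
    const leaf = keys[keys.length - 1];
    node[leaf] = (node[leaf] || 0) + amount;
  }
  return doc;
}

const spec = buildIncrement("neoplastic", new Date("2012-05-21T21:15:00Z"));
const stored = {};
applyInc(stored, spec.update.$inc); // first event creates the counters
applyInc(stored, spec.update.$inc); // second event increments them
console.log(JSON.stringify(spec.update), JSON.stringify(stored));
```

Against a real server the three parts of `spec` would be passed to the collection's update call with the upsert option set, so the first event for a given period and actor creates the document and later events only increment it.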
Query/Graphing
• Select by grouping (by date, by type/value)
• Documents hold many data points
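Because one stats document holds a whole day of hourly counters, a single read is enough to plot a day. The sketch below turns the GistEvent document from the Stats slide into a 24-point series; `hourlySeries` is a hypothetical helper and hours with no recorded events are filled with zero.

```javascript
// Hypothetical helper: expands the sparse "hour" sub-document into
// a dense array with one value per hour of the day.
function hourlySeries(doc) {
  const series = new Array(24).fill(0);
  for (const [hour, stats] of Object.entries(doc.hour || {})) {
    series[Number(hour)] = stats.count;
  }
  return series;
}

// The document shown on the Stats slide.
const gistStats = {
  _id: { p: new Date("2012-05-21T00:00:00Z"), type: "GistEvent" },
  hour: {
    2: { count: 65 },
    3: { count: 2 },
    7: { count: 130 },
    8: { count: 5 },
  },
  total: { count: 202 },
};

const points = hourlySeries(gistStats);
const sum = points.reduce((a, b) => a + b, 0);
console.log(points.join(","), "sum:", sum);
```

The hourly values add up to the stored `total.count`, which is a cheap consistency check a dashboard could run before drawing.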
The Whys
• Writing more data up front helps with reads
• Multiple data points per document
• Documents hold many timed points
• Good for graphs by time, or types
• Nested for improved performance
Thanks for coming… questions?
drivers at mongodb.org
mongodb.org Supported               Community Supported

      C                            REST             node.js
      C#                           ActionScript3    Objective C
      C++                          C# and .NET      PHP
      Erlang                       Clojure          PowerShell
      Haskell                      ColdFusion       Blog post
      Java                         Delphi           Python
      Javascript                   Erlang           Ruby
      Perl                         F#               Scala
      PHP                          Go: gomongo      Scheme (PLT)
      Python                       Groovy           Smalltalk: Dolphin
      Ruby                         Haskell          Smalltalk
                                   Javascript
                                   Lua




download at mongodb.org

        conferences, appearances, and meetups
                  http://www.10gen.com/events



   Facebook             |         Twitter                 |         LinkedIn
http://bit.ly/mongofb            @mongodb                     http://linkd.in/joinmongo


 support, training, and this talk brought to you by





Realtime Analytics with MongoDB Counters (mongonyc 2012)
