Simple Analytics with MongoDB
About Me
I’m Ross Affandy, Senior Developer cum System Administrator at Carlist.MY
and MongoPress Core Developer
I will be talking about:
- Our stack (architecture)
- Our problem
- Our solution
- Our lesson
Stack in cloud

Platform – Linux (Amazon Distro)
Database – MongoDB
Language – PHP (API)
Webserver – Nginx
(Sorry node.js – I’m not doing event-driven programming and don’t need
long-polling persistent connections)
Using an Amazon EC2 micro instance
600MB RAM
8GB EBS root partition
30GB EBS partition for MongoDB storage (formatted as an XFS filesystem)
Why Amazon Cloud?
I want to save 70% of the time I spend managing infrastructure and focus on writing code
Business Analytics Essential
- Banks use business analytics to predict & prevent credit card fraud
- Retailers use business analytics to predict the best store locations
and reach their target market
- Even sports teams use business analytics to determine game
strategy and ticket prices
Problem to solve
Real-time data collection:
- Implementing pageview counter
- Simple Analytics

Why MongoDB?
- MySQL is usually blocked on file system reads
- Good at saving large volumes of data
- Supports asynchronous inserts (fire & forget)
- Fast access to large binary objects
- Read/write ratio is highly skewed to reads
- Upsert (simplifies my code)
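To illustrate why an upsert simplifies a pageview counter, here is a minimal sketch in Python rather than the PHP used in the talk. The field names `page` and `views` are hypothetical. The filter/update pair has the shape that pymongo's `update_one(filter, update, upsert=True)` accepts; to stay runnable without a server, a small in-memory function imitates only the `$inc` upsert behaviour.

```python
def pageview_upsert(page_url):
    """Build the (filter, update) pair for a one-step counter upsert."""
    query = {"page": page_url}          # hypothetical field name
    update = {"$inc": {"views": 1}}     # hypothetical counter field
    return query, update


def apply_upsert(store, query, update):
    """In-memory stand-in for an upsert that supports only $inc.

    `store` maps page -> document. With a real server the same pair
    would be passed to pymongo's update_one(query, update, upsert=True).
    """
    doc = store.setdefault(query["page"], dict(query))  # insert if missing
    for field, amount in update["$inc"].items():
        doc[field] = doc.get(field, 0) + amount
    return doc


store = {}
for url in ["/cars/1", "/cars/2", "/cars/1"]:
    apply_upsert(store, *pageview_upsert(url))
print(store["/cars/1"]["views"])  # 2
```

Without an upsert, the application would need a separate "does this page exist yet?" check before choosing between insert and update.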
Klmug presentation - Simple Analytics with MongoDB
Data structure: what does it look like?
Now the story begins!
Problem / Challenge
We faced many exciting challenges (expect the unexpected)
Implementation
We use map reduce to summarize the information that we collect
What is map reduce in MongoDB and why do we use it?
- Equivalent to the count/sum/avg/group by functions in MySQL
- Map reduce is easier to understand
- Useful for processing large datasets concurrently across a large cluster of machines
(sorry, we don’t have the budget for that yet)
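The comparison with MySQL's group by can be sketched in a few lines. This is an in-memory Python emulation of the map reduce flow, not MongoDB's own engine (where map and reduce are JavaScript functions); the document shape and the field name `page` are assumptions. It computes the same result as `SELECT page, COUNT(*) ... GROUP BY page`.

```python
from collections import defaultdict


def map_pageview(doc):
    """Map step: emit one (key, value) pair per pageview document."""
    yield doc["page"], 1


def reduce_count(key, values):
    """Reduce step: combine all values emitted for one key.

    Summation also works on partial sums, which matters because MongoDB
    may call reduce more than once for the same key.
    """
    return sum(values)


def map_reduce(docs, map_fn, reduce_fn):
    """Tiny in-memory emulation of the map reduce flow."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


# Hypothetical pageview documents.
events = [{"page": "/cars/1"}, {"page": "/cars/2"}, {"page": "/cars/1"}]
totals = map_reduce(events, map_pageview, reduce_count)
print(totals)  # {'/cars/1': 2, '/cars/2': 1}
```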
Problem
Map reduce was very slow and crashed the server, due to the JavaScript engine
and a lack of processing power (low RAM and CPU)
MongoDB also has a group() function. Why not use it?
group() only returns a single BSON object (less than 16MB), so it is not
useful when there are more than 10,000 unique values
Moving to aggregation framework
Quickly upgraded to the latest version of MongoDB just to get the aggregation
framework
Changed the PHP queries to use aggregation instead of map reduce
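As a hedged sketch of what such a query change might look like, the pipeline below counts pageviews per page with the aggregation framework's `$group`, `$sum` and `$sort` stages. It is written as a Python list of the kind passed to pymongo's `collection.aggregate(pipeline)`; the field names are hypothetical and the talk's actual queries are not shown in the slides. A pure-Python equivalent is included so the example runs without a server.

```python
from collections import Counter

# Pipeline as it would be passed to collection.aggregate(PIPELINE).
PIPELINE = [
    {"$group": {"_id": "$page", "views": {"$sum": 1}}},  # count per page
    {"$sort": {"views": -1}},                            # busiest first
]


def run_locally(docs):
    """Pure-Python equivalent of PIPELINE, for illustration only."""
    counts = Counter(doc["page"] for doc in docs)
    # most_common() orders by count, descending, like the $sort stage.
    return [{"_id": page, "views": n} for page, n in counts.most_common()]


# Hypothetical pageview documents.
events = [{"page": "/cars/1"}, {"page": "/cars/2"}, {"page": "/cars/1"}]
print(run_locally(events))
# [{'_id': '/cars/1', 'views': 2}, {'_id': '/cars/2', 'views': 1}]
```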

Good news

Server did not crash

Bad news

Aggregation is better but still needs more RAM to process 2 million
documents. Still slow.
Experiment

Test run on Amazon SSD + 64GB RAM (Virginia)
- Copied 12GB of data to another Amazon EC2 instance
- Ran the map reduce and aggregation queries to see what breaks.
Nothing broke. The server looked happy.
Problem solved?
Yes, but the server cost is too high.
Solution

Denormalization
- In computing, denormalization is the process of attempting to optimize the
read performance of a database by adding redundant data or by grouping
data. In some cases, denormalization helps cover up the inefficiencies
inherent in relational database software. A normalized relational database
imposes a heavy access load on the physical storage of data even if it is well
tuned for high performance.
- Copying the same data into multiple documents or tables in order to
simplify/optimize query processing
- Be careful: duplicated data can easily make the database big
When to denormalize?
Query data volume or IO per query vs. total data volume.
Processing complexity vs. total data volume.
Now, every time a user accesses a page, we run 2 queries:
1) Capture the data for analytics
2) Update another collection that replaces the group by. It is later used to display
results to the user.
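The two-query pattern above can be sketched as follows. This is an in-memory Python illustration, not the production code: a list stands in for the raw analytics collection and a dict for the summary collection, and the per-day bucket and all names are hypothetical. Against a real server, step 1 would be an insert (pymongo's `insert_one`) and step 2 an upsert with `$inc` (pymongo's `update_one(..., upsert=True)`).

```python
def record_pageview(raw_events, page_totals, page_url, day):
    """Run both writes for one page access."""
    # 1) Capture the raw event for later analytics.
    raw_events.append({"page": page_url, "day": day})
    # 2) Bump the precomputed counter that replaces the group by.
    key = (page_url, day)
    page_totals[key] = page_totals.get(key, 0) + 1


def views_for(page_totals, page_url, day):
    """Read path: a single key lookup, no scan over the raw events."""
    return page_totals.get((page_url, day), 0)


raw_events, page_totals = [], {}
for url in ["/cars/1", "/cars/1", "/cars/2"]:
    record_pageview(raw_events, page_totals, url, "2013-01-01")
print(views_for(page_totals, "/cars/1", "2013-01-01"))  # 2
```

The trade-off is one extra write per pageview in exchange for reads that no longer need map reduce or aggregation over millions of documents.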
Summary / Lessons learned
- We learned what makes MongoDB a good analytics tool
- Data modeling is important.
What questions do I have?
What answers do I have?
- Design the queries before designing the schema
- Simplify everything
MapReduce is slower and is not meant to be used in “real time.”
TIPS
Always run load / stress tests before going live
1) capacity planning
2) capacity testing
3) performance tuning
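As a rough capacity-planning illustration, the figures quoted earlier in this talk (12GB of data, a 600MB micro instance, a 64GB test instance) can be compared directly. This is back-of-the-envelope arithmetic under stated assumptions: decimal units, and no allowance for indexes, the operating system or other processes.

```python
GB = 10**9
MB = 10**6


def data_to_ram_ratio(data_bytes, ram_bytes):
    """How many times larger the data set is than available RAM."""
    return data_bytes / ram_bytes


micro = data_to_ram_ratio(12 * GB, 600 * MB)   # EC2 micro instance
large = data_to_ram_ratio(12 * GB, 64 * GB)    # 64GB test instance
print(micro)  # 20.0
print(large)  # 0.1875
```

A ratio well above 1 means a query that touches the whole data set has to read from disk, which is consistent with the slow runs on the micro instance; a ratio below 1 means the data can fit in memory, as on the 64GB machine.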
Tools
1) Dex, the performance tuning tool from MongoLab, is really helpful: https://github.com/mongolab/dex
It's not about winning,
It's all about taking part!
Contact
Website: http://www.carlist.my
Email: enquiries@carlist.my
We are also hiring!
jobs@carlist.my
Q&A?

