How Capital Markets Firms Use
MongoDB as a Tick Database
Antoine Girbal, Technical Account Manager
Email: antoine@10gen.com
Twitter: @antoinegirbal
2
• MongoDB Introduction
• FS Use Cases
• Writing/Capturing Market Data
• Reading/Analyzing Market Data
• Performance, Scalability, & High Availability
• Q&A
Agenda
3
Introduction
10gen is the company behind MongoDB,
the leading next-generation database
• Document-Oriented
• Open-Source
• General Purpose
4
10gen Overview
200+ employees 500+ customers
Over $81 million in funding
Offices in New York, Palo Alto, Washington
DC, London, Dublin, Barcelona and Sydney
5
Database Landscape
• No Automatic Joins
• Document Transactions
• Fast, Scalable Read/Writes
6
MongoDB Business Benefits
Increased Developer Productivity Better Customer Experience
Faster Time to Market Lower TCO
7
MongoDB Technical Benefits
• Horizontally Scalable
– Sharding
• Agile & Flexible
• High Performance
– Indexes
– RAM
• Highly Available
– Replica Sets
Application
{ author: "roger",
date: new Date(),
text: "Spirited Away",
tags: ["Tezuka", "Manga"]}
8
Most Common FS Use Cases
1. Tick Data Capture & Analysis
2. Reference Data Management
3. Risk Analysis & Reporting
4. Trade Repository
5. Portfolio Reporting
9
Tick Data Capture & Analysis -
Requirements
• Capture real-time market data (multi-asset, top of
book, depth of book, even news)
• Load historical data
• Aggregate data into bars, daily, monthly intervals
• Enable queries & analysis on raw ticks or
aggregates
• Drive backtesting or automated signals
10
Tick Data Capture & Analysis –
Why MongoDB?
• High throughput => can capture real-time feeds for all
products/asset classes needed
• High scalability => all data and depth for all historical time periods
can be captured
• Flexible & Range-based indexing => fast querying on time ranges
and any fields
• Aggregation Framework => can shape raw data into aggregates
(e.g. ticks to bars)
• Map-reduce capability (Native MR or Hadoop Connector) => batch
analysis looking for patterns and opportunities
• Easy to use => native language drivers and JSON expressions that
you can apply for most operational database needs as well
• Low TCO => Low software license cost and commodity hardware
Writing/Capturing Tick Data
12
High Level Trading Architecture
[Architecture diagram; components and flows shown:]
• Exchanges/Markets/Brokers
• News & social networking sources
• Feed Handler
• Capturing Application
• Low Latency Applications
• Higher Latency Trading Applications
• Backtesting and Analysis Applications
• Flows: Market Data, Cached Static & Aggregated Data, Trades/metrics, Orders
13
High Level Trading Architecture
[Architecture diagram repeated from the previous slide]
Data Types
• Top of book
• Depth of book
• Multi-asset
• Derivatives (e.g. strips)
• News (text, video)
• Social Networking
14
{
_id : ObjectId("4e2e3f92268cdda473b628f6"),
symbol : "DIS",
timestamp: ISODate("2013-02-15 10:00"),
bidPrice: 55.37,
offerPrice: 55.58,
bidQuantity: 500,
offerQuantity: 700
}
> db.ticks.find( {symbol: "DIS",
bidPrice: {$gt: 55.36} } )
Top of book [e.g. equities]
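As a rough sketch of the capture step, the function below shapes a raw quote message into a top-of-book document like the one on this slide. The incoming field names (sym, ts, bid, ask, bidSize, askSize) are assumptions made for illustration; a real feed handler will have its own message layout.

```javascript
// Hypothetical raw quote message -> top-of-book tick document.
// The input field names are assumptions, not part of any real feed API.
function toTickDocument(msg) {
  return {
    symbol: msg.sym,
    timestamp: new Date(msg.ts), // epoch milliseconds -> Date
    bidPrice: msg.bid,
    offerPrice: msg.ask,
    bidQuantity: msg.bidSize,
    offerQuantity: msg.askSize
  };
}

const exampleTick = toTickDocument({
  sym: "DIS",
  ts: Date.UTC(2013, 1, 15, 10, 0), // 2013-02-15 10:00 UTC (months are 0-based)
  bid: 55.37, ask: 55.58, bidSize: 500, askSize: 700
});
console.log(exampleTick);
```

The resulting object could then be handed to whichever driver insert call the capturing application uses.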
15
{
_id : ObjectId("4e2e3f92268cdda473b628f6"),
symbol : "DIS",
timestamp: ISODate("2013-02-15 10:00"),
bidPrices: [55.37, 55.36, 55.35],
offerPrices: [55.58, 55.59, 55.60],
bidQuantities: [500, 1000, 2000],
offerQuantities: [1000, 2000, 3000]
}
> db.ticks.find( {bidPrices: {$gt: 55.36} } )
Depth of book
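The query on this slide relies on how a comparison against an array field behaves: the document matches when any element of the array satisfies the condition. The plain JavaScript below only emulates that rule to make it concrete; the helper name is made up for this sketch and is not a MongoDB function.

```javascript
// Emulates the "any element matches" rule for a $gt comparison on an array field.
// anyGreaterThan is a hypothetical helper, not part of MongoDB.
function anyGreaterThan(values, threshold) {
  return values.some(function (v) { return v > threshold; });
}

const depthDoc = { symbol: "DIS", bidPrices: [55.37, 55.36, 55.35] };
console.log(anyGreaterThan(depthDoc.bidPrices, 55.36)); // best bid 55.37 qualifies
console.log(anyGreaterThan(depthDoc.bidPrices, 55.40)); // no level qualifies
```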
16
{
_id : ObjectId("4e2e3f92268cdda473b628f6"),
symbol : "DIS",
timestamp: ISODate("2013-02-15 10:00"),
bids: [
{price: 55.37, amount: 500},
{price: 55.36, amount: 1000},
{price: 55.35, amount: 2000} ],
offers: [
{price: 55.58, amount: 1000},
{price: 55.59, amount: 2000},
{price: 55.60, amount: 3000} ]
}
> db.ticks.find( {"bids.price": {$gt: 55.36} } )
or any way your app uses it
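The parallel-array layout and the embedded-document layout carry the same information, so an application can convert between them. A minimal sketch, assuming prices and quantities are index-aligned (the function name toLevels is hypothetical):

```javascript
// Zips index-aligned price and quantity arrays into {price, amount} levels.
function toLevels(prices, quantities) {
  if (prices.length !== quantities.length) {
    throw new Error("prices and quantities must have the same length");
  }
  return prices.map(function (price, i) {
    return { price: price, amount: quantities[i] };
  });
}

console.log(toLevels([55.37, 55.36, 55.35], [500, 1000, 2000]));
```

Keeping each level in one embedded document makes it harder for a price and its quantity to drift out of alignment, at the cost of slightly larger documents.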
17
{
_id : ObjectId("4e2e3f92268cdda473b628f6"),
symbol : "DIS",
timestamp: ISODate("2013-02-15 10:00"),
spreadPrice: 0.58,
leg1: {symbol: "CLM13", price: 97.34},
leg2: {symbol: "CLK13", price: 96.92}
}
> db.ticks.find( { "leg1.symbol": "CLM13",
"leg2.symbol": "CLK13",
spreadPrice: {$gt: 0.50} } )
Synthetic spreads
18
{
_id : ObjectId("4e2e3f92268cdda473b628f6"),
symbol : "DIS",
timestamp: ISODate("2013-02-15 10:00"),
title: "Disney Earnings…",
body: "Walt Disney Company reported…",
tags: ["earnings", "media", "walt disney"]
}
News
19
{
_id : ObjectId("4e2e3f92268cdda473b628f6"),
timestamp: ISODate("2013-02-15 10:00"),
twitterHandle: "jdoe",
tweet: "Heard @DisneyPictures is releasing…",
usernamesIncluded: ["DisneyPictures"],
hashTags: ["movierumors", "disney"]
}
Social networking
20
{
_id : ObjectId("4e2e3f92268cdda473b628f6"),
symbol : "DIS",
openTS: ISODate("2013-02-15 10:00"),
closeTS: ISODate("2013-02-15 10:05"),
open: 55.36,
high: 55.80,
low: 55.20,
close: 55.70
}
Aggregates (bars, daily, etc.)
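One way a capturing application might maintain a bar document like this is to update it incrementally as ticks arrive. The sketch below assumes 5-minute bars aligned to UTC and a single price per tick; the function names and the bar length constant are illustrative choices, not something the slides prescribe.

```javascript
const BAR_MS = 5 * 60 * 1000; // assumed bar length: 5 minutes

// Floors a timestamp to the start of its 5-minute bar.
function barStart(timestamp) {
  return new Date(Math.floor(timestamp.getTime() / BAR_MS) * BAR_MS);
}

// Returns the bar after applying one tick. When the tick belongs to a new
// interval, a fresh bar is returned; the caller is expected to persist the
// previous bar before replacing it.
function applyTick(bar, symbol, timestamp, price) {
  const openTS = barStart(timestamp);
  if (!bar || bar.openTS.getTime() !== openTS.getTime()) {
    return {
      symbol: symbol,
      openTS: openTS,
      closeTS: new Date(openTS.getTime() + BAR_MS),
      open: price, high: price, low: price, close: price
    };
  }
  return Object.assign({}, bar, {
    high: Math.max(bar.high, price),
    low: Math.min(bar.low, price),
    close: price
  });
}

let currentBar = null;
currentBar = applyTick(currentBar, "DIS", new Date("2013-02-15T10:00:00Z"), 55.36);
currentBar = applyTick(currentBar, "DIS", new Date("2013-02-15T10:01:00Z"), 55.80);
currentBar = applyTick(currentBar, "DIS", new Date("2013-02-15T10:03:00Z"), 55.20);
currentBar = applyTick(currentBar, "DIS", new Date("2013-02-15T10:04:00Z"), 55.70);
console.log(currentBar);
```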
Querying/Analyzing Tick Data
22
Architecture for Querying Data
Higher Latency
Trading
Applications
Backtesting
Applications
• Ticks
• Bars
• Other analysis
Research &
Analysis
Applications
23
Index any fields: arrays, nested, etc.
// Compound indexes
> db.ticks.ensureIndex( {symbol: 1, timestamp: 1} )
// Index on arrays
> db.ticks.ensureIndex( {bidPrices: -1} )
// Index on any depth
> db.ticks.ensureIndex( {"bids.price": 1} )
// Full text search
> db.ticks.ensureIndex( {tweet: "text"} )
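To see why a compound index on symbol and timestamp suits tick queries, it can help to picture the index as entries sorted by (symbol, timestamp): all ticks for one symbol in a time range then form one contiguous slice. The code below is a deliberately simplified model using a sorted array and binary search; it is not how MongoDB implements its indexes, and every name in it is made up for this sketch.

```javascript
// Orders entries by symbol first, then by timestamp (numeric here for brevity).
function compareKey(a, b) {
  if (a.symbol !== b.symbol) return a.symbol < b.symbol ? -1 : 1;
  return a.timestamp - b.timestamp;
}

// Binary search: first position whose key is >= target.
function lowerBound(entries, target) {
  let lo = 0;
  let hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (compareKey(entries[mid], target) < 0) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// All entries for one symbol with from <= timestamp < to.
function rangeScan(entries, symbol, from, to) {
  const start = lowerBound(entries, { symbol: symbol, timestamp: from });
  const end = lowerBound(entries, { symbol: symbol, timestamp: to });
  return entries.slice(start, end);
}

const sortedEntries = [
  { symbol: "CBS", timestamp: 1 },
  { symbol: "DIS", timestamp: 1 },
  { symbol: "DIS", timestamp: 5 },
  { symbol: "DIS", timestamp: 9 },
  { symbol: "VIA", timestamp: 2 }
];
console.log(rangeScan(sortedEntries, "DIS", 2, 9));
```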
24
Query for ticks by time; price
threshold
// Ticks for last month for media companies
> db.ticks.find({
symbol: {$in: ["DIS", "VIA", "CBS"]},
timestamp: {$gte: ISODate("2013-01-01"),
$lt: ISODate("2013-02-01")}})
// Ticks when Disney's bid breached 55.50 this month
> db.ticks.find({
symbol: "DIS",
bidPrice: {$gt: 55.50},
timestamp: {$gte: ISODate("2013-02-01")}})
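When building time-range filters in JavaScript, note that an object literal cannot hold the same key twice: a second timestamp key silently replaces the first, so both bounds need to live inside one timestamp sub-document. The helper below is a small sketch of that pattern for a whole calendar month in UTC; the name monthRangeFilter is hypothetical.

```javascript
// Builds a filter for all ticks of the given symbols in one calendar month (UTC).
// month is 1-12. The range is half-open: start of month <= timestamp < start of next month.
function monthRangeFilter(symbols, year, month) {
  const from = new Date(Date.UTC(year, month - 1, 1));
  const to = new Date(Date.UTC(year, month, 1)); // rolls over into the next year for December
  return {
    symbol: { $in: symbols },
    timestamp: { $gte: from, $lt: to }
  };
}

console.log(monthRangeFilter(["DIS", "VIA", "CBS"], 2013, 1));
```

The half-open upper bound avoids having to guess the last representable instant of the month.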
25
• Custom application code
– Run your queries, compute your results
• Aggregation framework
– Declarative, pipeline-based approach
• Native Map/Reduce in MongoDB
– JavaScript functions distributed across cluster
• Hadoop Connector
– Offline batch processing/computation
Analyzing/Aggregating Options
26
// Aggregate minute bars for Disney for this month
db.ticks.aggregate( [
{ $match: {symbol: "DIS", timestamp: {$gte: ISODate("2013-02-01")}}},
{ $project: {
year: {$year: "$timestamp"},
month: {$month: "$timestamp"},
day: {$dayOfMonth: "$timestamp"},
hour: {$hour: "$timestamp"},
minute: {$minute: "$timestamp"},
second: {$second: "$timestamp"},
timestamp: 1,
price: 1}},
{ $sort: { timestamp: 1}},
{ $group :
{ _id : {year: "$year", month: "$month", day: "$day", hour: "$hour", minute: "$minute"},
open: {$first: "$price"},
high: {$max: "$price"},
low: {$min: "$price"},
close: {$last: "$price"} }} ] )
Aggregate into min bars
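For readers who want to check what the pipeline computes, here is a plain JavaScript sketch of the same logic: sort by time, group by calendar minute, and take the first, maximum, minimum, and last price in each group. It assumes each tick has a Date timestamp and a numeric price, and it groups by UTC minute; it is an illustration of the calculation, not a replacement for running the aggregation in the database.

```javascript
// Groups ticks into one-minute OHLC bars (UTC), mirroring $sort + $group above.
function minuteBars(ticks) {
  const sorted = ticks.slice().sort(function (a, b) {
    return a.timestamp - b.timestamp; // chronological order, like $sort
  });
  const bars = new Map();
  for (const t of sorted) {
    const key = t.timestamp.toISOString().slice(0, 16); // "YYYY-MM-DDTHH:MM"
    const bar = bars.get(key);
    if (!bar) {
      bars.set(key, { minute: key, open: t.price, high: t.price, low: t.price, close: t.price });
    } else {
      bar.high = Math.max(bar.high, t.price);
      bar.low = Math.min(bar.low, t.price);
      bar.close = t.price; // last price seen in chronological order
    }
  }
  return Array.from(bars.values());
}

console.log(minuteBars([
  { timestamp: new Date("2013-02-15T10:00:05Z"), price: 55.40 },
  { timestamp: new Date("2013-02-15T10:00:40Z"), price: 55.30 },
  { timestamp: new Date("2013-02-15T10:00:20Z"), price: 55.50 },
  { timestamp: new Date("2013-02-15T10:01:10Z"), price: 55.60 }
]));
```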
27
…
// then count the number of down bars
{ $project: {
downBar: {$lt: ["$close", "$open"] },
open: 1, high: 1, low: 1, close: 1}},
{ $group: {
_id: "$downBar",
sum: {$sum: 1}}} ] )
Add analysis on the bars
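The down-bar count can be sanity-checked outside the database with a few lines of JavaScript. This sketch treats a bar as "down" when its close is strictly below its open, which is the same comparison the $lt expression makes; the function name and result shape are choices made for this example.

```javascript
// Counts bars whose close is strictly below their open.
function countDownBars(bars) {
  let down = 0;
  for (const bar of bars) {
    if (bar.close < bar.open) down += 1;
  }
  return { down: down, notDown: bars.length - down };
}

console.log(countDownBars([
  { open: 55.40, close: 55.30 },
  { open: 55.60, close: 55.60 },
  { open: 55.10, close: 55.90 }
]));
```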
28
var mapFunction = function () {
emit(this.symbol, this.bidPrice);
};
var reduceFunction = function (symbol, priceList) {
return Array.sum(priceList);
};
> db.ticks.mapReduce(
mapFunction, reduceFunction, {out: "tickSums"})
Map-Reduce Example: Sum
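The flow of a map-reduce job can be sketched in plain JavaScript: map emits key/value pairs, values are grouped by key, and reduce collapses each group. The runner below is a toy, single-process stand-in written for this explanation. Two differences from the shell are assumptions of the sketch: emit is passed to the map function as an argument rather than being a global, and Array.sum (a shell helper) is replaced with a standard reduce call.

```javascript
// Toy map-reduce runner: groups emitted values by key, then reduces each group.
function runMapReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  const emit = function (key, value) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) mapFn.call(doc, emit); // "this" is the current document
  const out = [];
  for (const [key, values] of groups) {
    // As in MongoDB, reduce is skipped for keys that have a single value.
    out.push({ _id: key, value: values.length === 1 ? values[0] : reduceFn(key, values) });
  }
  return out;
}

const sumMap = function (emit) { emit(this.symbol, this.bidPrice); };
const sumReduce = function (symbol, priceList) {
  return priceList.reduce(function (sum, p) { return sum + p; }, 0);
};

console.log(runMapReduce([
  { symbol: "DIS", bidPrice: 55.5 },
  { symbol: "CBS", bidPrice: 40 },
  { symbol: "DIS", bidPrice: 55.25 }
], sumMap, sumReduce));
```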
29
• MongoDB’s Hadoop Connector
• Supports Map/Reduce, Streaming, Pig
• MongoDB as input/output storage for Hadoop
jobs
– No need to go through HDFS
• Leverage power of Hadoop ecosystem against
operational data in MongoDB
Process Data on Hadoop
Performance, Scalability, and
High Availability
31
Why MongoDB is fast and scalable
Better data locality
Relational MongoDB
In-Memory
Caching
Auto-Sharding
Read/write scaling
32
Auto-Sharding for Horizontal Scale
mongod
Read/Write Scalability
Key Range
Symbol: A…Z
33
Auto-Sharding for Horizontal Scale
Read/Write Scalability
mongod mongod
Key Range
Symbol: A…J
Key Range
Symbol: K…Z
34
Sharding
mongod mongod
mongod mongod
Read/Write Scalability
Key Range
Symbol: A…F
Key Range
Symbol: G…J
Key Range
Symbol: K…O
Key Range
Symbol: P…Z
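The key ranges in these diagrams can be read as a routing table: each symbol falls into exactly one range, and that range lives on one shard. The sketch below models that lookup for the four ranges shown. The shard names and the table itself are hypothetical; in a real cluster the ranges are tracked by the config servers and routing is done by mongos, not by application code.

```javascript
// Hypothetical routing table for the ranges A…F, G…J, K…O, P…Z.
// min is inclusive, max is exclusive.
const CHUNKS = [
  { shard: "shard0", min: "A", max: "G" },
  { shard: "shard1", min: "G", max: "K" },
  { shard: "shard2", min: "K", max: "P" },
  { shard: "shard3", min: "P", max: "[" } // "[" is the character right after "Z" in ASCII
];

// Returns the name of the shard whose key range contains the symbol.
function shardFor(symbol) {
  const key = symbol.toUpperCase();
  const chunk = CHUNKS.find(function (c) { return key >= c.min && key < c.max; });
  if (!chunk) throw new Error("no chunk covers symbol " + symbol);
  return chunk.shard;
}

console.log(shardFor("DIS"), shardFor("GOOG"), shardFor("MSFT"), shardFor("VIA"));
```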
35
Primary
Secondary
Secondary
Primary
Secondary
Secondary
Primary
Secondary
Secondary
Primary
Secondary
Secondary
MongoS MongoS MongoS
Application
Key Range
Symbol: A…F,
Time
Key Range
Symbol: G…J,
Time
Key Range
Symbol: K…O,
Time
Key Range
Symbol: P…Z,
Time
36
Subscriptions
Professional Support, Enterprise Edition and Commercial License
10gen Products and Services
Consulting
Expert Resources for All Phases of MongoDB Implementations
Training
Online and In-Person, for Developers and Administrators
37
• MongoDB is high performance for tick data
• Scales horizontally through auto-sharding
• Fast, flexible querying, analysis, & aggregation
• Dynamic schema can handle any data types
• MongoDB has all these features with low TCO
• 10gen can support you with anything discussed
Summary
38
Resource Location
MongoDB Downloads www.mongodb.org/download
Free Online Training education.10gen.com
Webinars and Events www.10gen.com/events
White Papers www.10gen.com/white-papers
Customer Case Studies www.10gen.com/customers
Presentations www.10gen.com/presentations
Documentation docs.mongodb.org
Additional Info info@10gen.com
For More Information
Resource User Data Management
How Capital Markets Firms Use
MongoDB as a Tick Database
Matt Kalan, Sr. Solution Architect
Email: Matt.kalan@10gen.com
Twitter: @matthewkalan
Webinar: How Banks Use MongoDB as a Tick Database



Editor's Notes

  • #6: Mention tick databases
  • #15: JSON document – contains key value pairs, different types, values can also be arrays and other documents
  • #16: because of the way MongoDB lets you update documents atomically we can be sure totals and list of voters will stay in sync
  • #17: because of the way MongoDB lets you update documents atomically we can be sure totals and list of voters will stay in sync
  • #18: because of the way MongoDB lets you update documents atomically we can be sure totals and list of voters will stay in sync
  • #19: comments is an array of JSON documents; we can query by fields inside embedded documents as well as array members.
  • #20: secondary indexes, compound indexes, multikey indexes. Why is it important to have all of a document together? Data locality.
  • #21: secondary indexes, compound indexes, multikey indexes. Why is it important to have all of a document together? Data locality.
  • #32: Fewer reads, data is together, memory mapped files, caching handled by OS, naturally leaves most frequently accessed data in RAM (have enough RAM to fit indexes and working data set into RAM for best performance), horizontal scaling is "built-in" to the product by design from the start.
  • #36: Full deployment. As many mongoS processes as you have app servers (for example); Config DBs are small but hold the critical information about where ranges of data are located on disk/shards.