Achieving 100,000
Transactions Per Second
 with a NoSQL Database



         Eric David Bloch
           @eedeebee

          23 May 2012
A bit about me

• I’ve written software used by millions of people.

    Apps, libraries, compilers, device drivers, operating systems

• This is my third Gluecon and my first talk

• I’ve been the Community Director at MarkLogic for the last 2 years.

• I survived having 3 kids in less than 2 years.
Like me, but not me
Actually, me
Storyline

• Why?
• How?
  • Whirlwind tour of database architecture
  • Techniques to get to 100K
Real World Big Data, 100,000 transactions a second in a NoSQL Database
 It’s about money.




 Top 5 bank needed to manage trades
 Trades look more like documents than tables
 Schemas for trades change all the time
 Transactions
 Scale and velocity (“Big Data”)
Trades and Positions


 1 million trades per day
 Followed by 1 million position reports at end of day
    Roll up trades of current date for each “book, instrument” pair
    Group-by, with key = “date, book, instrument”

[Figure: timeline of Day 1, Day 2, Day 3, Day 4, Day 5, ... labeled “1M trades” and “1M positions”]
Trades and Positions

<trade>
   <quantity>8540882</quantity>
   <quantity2>1193.71</quantity2>
   <instrument>WASAX</instrument>
   <book>679</book>
   <trade-date>2011-03-13-07:00</trade-date>
   <settle-date>2011-03-17-07:00</settle-date>
</trade>


<position>
  <instrument>EAAFX</instrument>
  <book>679</book>
  <quantity>3</quantity>
  <business-date>2011-03-25Z</business-date>
  <position-date>2011-03-24Z</position-date>
</position>
Requirements

   NoSQL flexibility, performance & scale

   Enterprise-grade transactional guarantees
Now show us

• 15K inserts per second
• Linear scalability
What NoSQL Database out there can we use?




                     A quote from the bank
“We threw everything we had at MarkLogic and it didn’t break a sweat”
We weren’t content with 15K


So we showed them…




Slide 13   Copyright © 2012 MarkLogic® Corporation. All rights reserved.
What is MarkLogic?


 Non-relational, document-oriented, distributed database
            Shared nothing clustering, linear scale out
            Multi-Version Concurrency Control (MVCC)
            Transactions (ACID)


 Search Engine
               Web scale (Big Data)
               Inverted indexes (term lists)
               Real-time updates
               Composable queries

Architecture



 Data model
 Indexing
 Clustering
 Query execution
 Transactions



Key (URI)                    Value (Document)

/trade/153748994             <trade>
                              <id>8</id>
                              <time>2012-02-20T14:00:00</time>
                              <instrument>BYME AAA</instrument>
                              <price cur="usd">600.27</price>
                             </trade>



/user/eedeebee               {
                                 "name" : "Eric Bloch",
                                 "age" : 47,
                                 "hair" : "gray",
                                 "kids" : [ "Grace", "Ryan", "Owen" ]
                             }

/book5293                    It was the best of times, it was the worst of times, it
                             was the age of wisdom, ...


/2012-02-20T14:47:53/01445   .mp3
                             .avi
                             [your favorite binary format]
Inverted Index

Term                          Document References

“which”                       123, 127, 129, 152, 344, 791 . . .
“uniquely”                    122, 125, 126, 129, 130, 167 . . .
“identify”                    123, 126, 130, 142, 143, 167 . . .
“each”                        123, 130, 131, 135, 162, 177 . . .
“uniquely identify”           126, 130, 167, 212, 219, 377 . . .
<article>                     ...
article/abstract/@author      ...
<product>IMS</product>
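A minimal sketch of the term-list idea above, in Python. It indexes single words only (the slide also lists phrases, elements, and attribute paths as terms), and the function names and sample documents are made up for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase word to the sorted ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search_all(index, terms):
    """Return ids of documents containing every term (intersection of term lists)."""
    lists = [set(index.get(term.lower(), ())) for term in terms]
    return sorted(set.intersection(*lists)) if lists else []


# Hypothetical documents keyed by document id.
docs = {
    126: "keys which uniquely identify each row",
    129: "which values are uniquely chosen",
    142: "identify the owner",
}
index = build_inverted_index(docs)
print(search_all(index, ["uniquely", "identify"]))  # [126]
```

A multi-term query becomes an intersection of pre-computed lists, so no document has to be read to answer it.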
Range Index

Rows

<trade>
 <trader_id>8</trader_id>
 <time>2012-02-20T14:00:00</time>
 <instrument>IBM</instrument>
 …
</trade>

<trade>
 <trader_id>13</trader_id>
 <time>2012-02-20T14:30:00</time>
 <instrument>AAPL</instrument>
 …
</trade>

<trade>
 <trader_id>0</trader_id>
 <time>2012-02-20T15:30:00</time>
 <instrument>GOOG</instrument>
 …
</trade>

Value   Docid          Docid   Value
0       287            287     0
8       1129           531     13
13      531            1129    8
…       …              …       …

• Column Oriented
• Memory Mapped
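The two columns on this slide can be sketched in Python as a value-sorted list plus a docid lookup. This is only an illustration of the layout: the class name and methods are hypothetical, and nothing here is memory mapped.

```python
from bisect import bisect_left, bisect_right


class RangeIndex:
    """Toy range index over one element, e.g. trader_id (illustrative only)."""

    def __init__(self, values_by_docid):
        # Docid -> Value column.
        self.docid_to_value = dict(values_by_docid)
        # Value -> Docid column, kept sorted by value.
        self.by_value = sorted((v, d) for d, v in self.docid_to_value.items())
        self._values = [v for v, _ in self.by_value]

    def docids_in_range(self, low, high):
        """Doc ids whose value lies in [low, high], found by binary search."""
        lo = bisect_left(self._values, low)
        hi = bisect_right(self._values, high)
        return [d for _, d in self.by_value[lo:hi]]

    def value_of(self, docid):
        return self.docid_to_value[docid]


# Values taken from the slide: trader_id 0, 8, 13 in docs 287, 1129, 531.
idx = RangeIndex({287: 0, 1129: 8, 531: 13})
print(idx.docids_in_range(5, 20))  # [1129, 531]
print(idx.value_of(287))           # 0
```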
Shared-Nothing Clustering




     Host D1                                        Host D2                Host D3   …   Host Dj




Query Evaluation – “Map”

[Diagram: a query Q arrives at one of the E hosts (Host E1 … Host Ei) and is sent to every D host (Host D1 … Host Dj); each D host runs Q against its forests F1 … Fn]
Query Evaluation – “Reduce”

[Diagram: each D host (Host D1 … Host Dj) returns its own Top 10 from its forests F1 … Fn; the E host (Host E1 … Host Ei) combines them into the final Top 10]
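The “Map” and “Reduce” slides can be sketched together in Python: each D host computes a local top k, and the E host merges the partial lists. Hosts are plain lists here, and the scores and URIs are invented for the example.

```python
import heapq


def top_k_on_host(scored_docs, k):
    """"Map" step: one D host returns only its k best (score, uri) pairs."""
    return heapq.nlargest(k, scored_docs)


def top_k_across_hosts(hosts, k):
    """"Reduce" step: the E host merges the partial top-k lists."""
    partials = [top_k_on_host(docs, k) for docs in hosts]
    return heapq.nlargest(k, (hit for partial in partials for hit in partial))


# Hypothetical (score, uri) pairs held by three D hosts.
hosts = [
    [(0.9, "/trade/1"), (0.2, "/trade/2")],
    [(0.7, "/trade/3"), (0.8, "/trade/4")],
    [(0.1, "/trade/5")],
]
print(top_k_across_hosts(hosts, 2))  # [(0.9, '/trade/1'), (0.8, '/trade/4')]
```

Because every host returns at most k results, the E host merges a small, bounded amount of data no matter how many documents each host stores.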
Queries/Updates with MVCC


 Every query has a timestamp
 Documents do not change

    Reads are lock-free
    Inserts – see next slide
    Deletes – mark as deleted
    Edits –
            copy
            edit
            insert the copy
            mark the original as deleted
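The rules above can be sketched in Python as a toy in-memory store. The class and its methods are hypothetical; it shows only the timestamp logic, not locking or persistence.

```python
import copy
import itertools


class MVCCStore:
    """Toy multi-version store: content never changes, versions get timestamps."""

    def __init__(self):
        self._clock = itertools.count(1)
        # uri -> list of [created_ts, deleted_ts or None, content]
        self._versions = {}

    def insert(self, uri, content):
        ts = next(self._clock)
        self._versions.setdefault(uri, []).append([ts, None, content])
        return ts

    def delete(self, uri):
        ts = next(self._clock)
        for version in self._versions.get(uri, []):
            if version[1] is None:
                version[1] = ts  # mark as deleted; content is untouched
        return ts

    def edit(self, uri, change):
        ts = next(self._clock)
        versions = self._versions[uri]
        current = next(v for v in versions if v[1] is None)
        edited = change(copy.deepcopy(current[2]))  # copy, then edit the copy
        versions.append([ts, None, edited])         # insert the copy
        current[1] = ts                             # mark the original as deleted
        return ts

    def read(self, uri, at):
        """Lock-free read: return the version visible at timestamp `at`."""
        for created, deleted, content in self._versions.get(uri, []):
            if created <= at and (deleted is None or deleted > at):
                return content
        return None


store = MVCCStore()
t1 = store.insert("/trade/1", {"quantity": 3})
t2 = store.edit("/trade/1", lambda doc: {**doc, "quantity": 5})
print(store.read("/trade/1", at=t1))  # {'quantity': 3}
print(store.read("/trade/1", at=t2))  # {'quantity': 5}
```

A query that started at `t1` keeps seeing the old version even after the edit, which is why reads need no locks.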

Insert Mechanics


1)         New URI+Document arrive at E-node
2)         URI Probe – determine whether URI exists in any forest
3)         URI Lock – write locks are taken on D node(s)
4)         Forest Assignment – URI is deterministically placed in Forest
5)         Indexing
6)         Journaling
7)         Commit – transaction complete
8)         Release URI Locks – D node(s) are notified to release lock




Save-and-merge
       (Log-Structured Merge Tree)

[Diagram: a Database holds Forest 1 and Forest 2. Each Insert/Update is journaled (J) on disk and goes into the in-memory S0; Save writes S0 to disk; on-disk S1 and S2 are merged into S3]
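A rough Python sketch of save-and-merge for a single forest. The names are hypothetical, “disk” is just a list of dictionaries, and the journal is never truncated, so this shows the flow rather than a real storage engine.

```python
class Forest:
    """Toy forest: journal every insert, buffer in memory, save, then merge."""

    def __init__(self, memory_limit=2):
        self.memory_limit = memory_limit
        self.journal = []    # J: append-only record of every insert
        self.in_memory = {}  # S0
        self.on_disk = []    # S1, S2, ... oldest first (simulated)

    def insert(self, uri, doc):
        self.journal.append((uri, doc))
        self.in_memory[uri] = doc
        if len(self.in_memory) >= self.memory_limit:
            self.save()

    def save(self):
        """Write the in-memory segment out as a new on-disk segment."""
        if self.in_memory:
            self.on_disk.append(dict(self.in_memory))
            self.in_memory = {}

    def merge(self):
        """Combine on-disk segments into one; newer entries win."""
        merged = {}
        for segment in self.on_disk:
            merged.update(segment)
        self.on_disk = [merged] if merged else []

    def get(self, uri):
        if uri in self.in_memory:
            return self.in_memory[uri]
        for segment in reversed(self.on_disk):  # newest segment first
            if uri in segment:
                return segment[uri]
        return None


forest = Forest(memory_limit=2)
for uri, doc in [("/a", 1), ("/b", 2), ("/c", 3), ("/a", 4)]:
    forest.insert(uri, doc)
print(len(forest.on_disk))  # 2
forest.merge()
print(len(forest.on_disk), forest.get("/a"))  # 1 4
```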
Back to
the money
Trades and Positions

 1 million trades per day
 followed by 1 million position reports at end of day
        Roll up trades of the current “date” for each “book:instrument” pair
        Group-by, with key = “book:date:instrument”


[Figure: timeline of Day 1, Day 2, Day 3, Day 4, Day 5, ... labeled “1M trades” and “1M positions”]
Naive Query Pseudocode



for each book
  for each instrument in that book
    position = position(yesterday, book, instrument)
    for each trade of that instrument in this book today
      position += trade.quantity
    insert(today, book, instrument, position)
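The pseudocode above can be made runnable in Python roughly as follows. The data shapes (a dictionary of books, a list of trade dictionaries) are assumptions for the sketch; in the talk these are queries against the database.

```python
def naive_positions(books, yesterday_positions, todays_trades, today):
    """Roll up today's trades per (book, instrument), one pair at a time.

    books: {book: [instrument, ...]}
    yesterday_positions: {(book, instrument): quantity}
    todays_trades: list of {"book", "instrument", "quantity"} dicts
    """
    new_positions = {}
    for book, instruments in books.items():
        for instrument in instruments:
            position = yesterday_positions.get((book, instrument), 0)
            # One pass over the trades for every (book, instrument) pair.
            for trade in todays_trades:
                if trade["book"] == book and trade["instrument"] == instrument:
                    position += trade["quantity"]
            new_positions[(today, book, instrument)] = position
    return new_positions


# Hypothetical sample data.
books = {679: ["WASAX", "EAAFX"]}
yesterday = {(679, "EAAFX"): 3}
trades = [
    {"book": 679, "instrument": "WASAX", "quantity": 10},
    {"book": 679, "instrument": "WASAX", "quantity": -4},
    {"book": 679, "instrument": "EAAFX", "quantity": 2},
]
positions = naive_positions(books, yesterday, trades, "2011-03-25")
print(positions[("2011-03-25", 679, "WASAX")])  # 6
print(positions[("2011-03-25", 679, "EAAFX")])  # 5
```

The cost is one lookup per (book, instrument) pair plus a scan of that pair's trades, which is the per-pair work the later techniques remove.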




Initial results




Single node – 19,000 inserts per second




Initial results with cluster


[Chart: doc/sec (0 to 70,000) by # of nodes (1DE, 2DE, 3DE); series: Report Query 2]
Techniques to get to 100K


 Query for Computing New Positions
            Materialized compound key, Co-Occurrence Query and Aggregation
 Insert of New Positions
            Batching
            In-Forest Eval




Materializing a compound key

<trade>
   <quantity2>1193.71</quantity2>
   <instrument>WASAX</instrument>
   <book>679</book>
   <trade-date>2011-03-13-07:00</trade-date>
   <settle-date>2011-03-17-07:00</settle-date>
</trade>




Materializing a compound key

<trade>
   <roll-up book-date-instrument="151333445566782303"/>
   <quantity2>1193.71</quantity2>
   <instrument>WASAX</instrument>
   <book>679</book>
   <trade-date>2011-03-13-07:00</trade-date>
   <settle-date>2011-03-17-07:00</settle-date>
</trade>
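One way to produce such a key, sketched in Python. The slides do not say how the number is computed; hashing the string "book:date:instrument" with BLAKE2b down to 64 bits is an assumption made for this example, and the function names are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET


def compound_key(book, trade_date, instrument):
    """Hash the three fields into one 64-bit integer (assumed scheme)."""
    raw = f"{book}:{trade_date}:{instrument}".encode("utf-8")
    digest = hashlib.blake2b(raw, digest_size=8).digest()
    return int.from_bytes(digest, "big")


def add_roll_up(trade_xml):
    """Return the trade with a <roll-up> element carrying the compound key."""
    trade = ET.fromstring(trade_xml)
    key = compound_key(
        trade.findtext("book"),
        trade.findtext("trade-date"),
        trade.findtext("instrument"),
    )
    roll_up = ET.Element("roll-up", {"book-date-instrument": str(key)})
    trade.insert(0, roll_up)  # first child, as on the slide
    return ET.tostring(trade, encoding="unicode")


sample_trade = (
    "<trade>"
    "<quantity2>1193.71</quantity2>"
    "<instrument>WASAX</instrument>"
    "<book>679</book>"
    "<trade-date>2011-03-13-07:00</trade-date>"
    "<settle-date>2011-03-17-07:00</settle-date>"
    "</trade>"
)
print(add_roll_up(sample_trade))
```

With the key stored in the document at insert time, a single range index on `book-date-instrument` can stand in for the three-field group-by key.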




Co-Occurrence and Distributed Aggregation

<trade>
 <roll-up book-date-instrument="151373445566703"/>
 <quantity>8540882</quantity>
 <instrument>WASAX</instrument>
 <book>679</book>
 <trade-date>2011-03-13-07:00</trade-date>
 <settle-date>2011-03-17-07:00</settle-date>
</trade>

[Diagram: two range indexes, each with a Value → Docid column and a Docid → Value column; the roll-up key 15137… and the quantity 8540882 both point at docid 1129]

• Co-occurrences:
     Find pairings of range indexed values
• Aggregate on the D nodes (Map/Reduce):
     Sum up the quantities above
• Similar to a Group-by,
     • in a column-oriented, in-memory database
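A small Python sketch of the idea: pair the roll-up key and the quantity wherever they occur in the same document, sum per key on each D node, then add up the partial sums. Each range index is modeled as a docid-to-value dictionary, and the keys and most quantities are invented.

```python
from collections import defaultdict


def aggregate_on_d_node(key_by_docid, quantity_by_docid):
    """Sum quantities per key for values that co-occur in the same document."""
    sums = defaultdict(int)
    for docid, key in key_by_docid.items():
        if docid in quantity_by_docid:
            sums[key] += quantity_by_docid[docid]
    return dict(sums)


def aggregate_cluster(d_nodes):
    """Combine the per-node partial sums into cluster-wide totals."""
    totals = defaultdict(int)
    for key_index, quantity_index in d_nodes:
        partial = aggregate_on_d_node(key_index, quantity_index)
        for key, subtotal in partial.items():
            totals[key] += subtotal
    return dict(totals)


# Two hypothetical D nodes, each with (key index, quantity index).
d_nodes = [
    ({1129: "k1", 1130: "k1"}, {1129: 8540882, 1130: 100}),
    ({7: "k1", 8: "k2"}, {7: 18, 8: 5}),
]
print(aggregate_cluster(d_nodes))  # {'k1': 8541000, 'k2': 5}
```

Only the index columns are touched; no trade document is fetched, and each node ships one subtotal per key instead of one row per trade.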
Initial results with new query




      > 30K inserts/second on a single node




Co-Occurrence + Aggregate versus
Naïve approach

[Chart: doc/sec (0 to 70,000) by # of nodes (1DE, 2DE, 3DE); series: Report Query 3, Report Query 2]
Techniques


 Computing Positions
            Materialized compound key, Co-Occurrence Query and Aggregation
 Updates
            Batching
            In-Forest Eval




Transaction Size and
Throughput

[Chart: insert throughput in #docs per sec. (0 to 35,000) by # doc inserts per transaction (1, 2, 4, 8, 16, 32, 100, 500, 1000, 10000)]
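Batching can be sketched in Python as grouping documents so that each transaction carries many inserts. `CountingDatabase` is a hypothetical stand-in that only counts commits; it is not a real client API, and the batch size of 100 is arbitrary.

```python
def batches(docs, size):
    """Yield consecutive slices of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


class CountingDatabase:
    """Hypothetical stand-in for a database client; counts commits."""

    def __init__(self):
        self.docs = {}
        self.commits = 0

    def insert_in_one_transaction(self, batch):
        for uri, doc in batch:
            self.docs[uri] = doc
        self.commits += 1  # one commit for the whole batch


docs = [(f"/position/{i}", {"quantity": i}) for i in range(1000)]
db = CountingDatabase()
for batch in batches(docs, 100):
    db.insert_in_one_transaction(batch)
print(db.commits, len(db.docs))  # 10 1000
```

The per-transaction overhead is paid 10 times instead of 1000 times for the same documents.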
Techniques


 Computing Positions (Data Warehouse)
            Hash Key Decoration
            Co-Occurrence
 Updates (Transaction)
            Batching
            In-Forest Eval




Insert Mechanics


1)         New URI+Document arrive at E-node
2)         *URI Probe – determine whether URI exists in any forest
3)         *URI Lock – write locks are created
4)         Forest Assignment – URI is deterministically placed in Forest
5)         Indexing
6)         Journaling
7)         Commit – transaction complete
8)         *Release URI Locks – D nodes are notified to release lock



* Overhead of these operations increases with cluster size


Deterministic Placement


4) Forest Assignment – URI is deterministically placed in Forest

  URI → Hash function → 64-bit number → Bucketed into a Forest → Fi




  • Done in C++ within server
  • But…
       • Can also be done in the client
       • Server allows queries to be evaluated against only one
       forest…
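A client-side version of this placement might look like the Python sketch below. The server's actual hash function and bucketing scheme are not given in the slides, so BLAKE2b and a simple modulo are stand-ins, and a real client would have to match the server exactly.

```python
import hashlib


def uri_hash64(uri):
    """Hash a URI to a 64-bit number (stand-in for the server's hash)."""
    digest = hashlib.blake2b(uri.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def assign_forest(uri, forest_count):
    """Bucket the 64-bit number into a forest index (assumed: modulo)."""
    return uri_hash64(uri) % forest_count


def group_by_forest(uris, forest_count):
    """Group URIs by the forest they would land in."""
    groups = {i: [] for i in range(forest_count)}
    for uri in uris:
        groups[assign_forest(uri, forest_count)].append(uri)
    return groups


uris = [f"/trade/{i}" for i in range(100)]
groups = group_by_forest(uris, 4)
print({forest: len(members) for forest, members in groups.items()})
```

Because the mapping depends only on the URI and the number of forests, any client can compute Fi without asking the cluster.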




In-Forest Eval


[Diagram: E node above D nodes D1 (forests F1, F2) and D2 (forests F3, F4); the query Q goes from E to a single forest]
In-Forest Eval Insert Mechanics


1) New URI+Document arrive at E-node
           a)      Compute Fi using
           b)      Ask server to evaluate the insert query only against Fi
2)         URI Probe – Fi Only
3)         URI Lock – Fi Only
4)         Forest Assignment – Fi Only
5)         Indexing
6)         Journaling
7)         Commit – transaction complete
8)         Lock Release - Fi Only
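The difference between the two insert paths can be illustrated with a toy Python model that counts how many forests each insert has to visit. Everything here is hypothetical: forests are dictionaries, CRC32 stands in for the placement hash, and only the URI probe is modeled.

```python
import zlib


class Cluster:
    """Toy cluster that counts forest visits per insert (illustrative only)."""

    def __init__(self, forest_count):
        self.forests = [dict() for _ in range(forest_count)]
        self.forest_visits = 0

    def forest_for(self, uri):
        # Stand-in placement hash; the real one is not given in the slides.
        return zlib.crc32(uri.encode("utf-8")) % len(self.forests)

    def regular_insert(self, uri, doc):
        # URI probe: check every forest for an existing copy of the URI.
        for forest in self.forests:
            self.forest_visits += 1
            _exists = uri in forest
        self.forests[self.forest_for(uri)][uri] = doc

    def in_forest_insert(self, uri, doc):
        # The client already knows Fi, so only that forest is probed.
        target = self.forests[self.forest_for(uri)]
        self.forest_visits += 1
        _exists = uri in target
        target[uri] = doc


regular = Cluster(forest_count=4)
in_forest = Cluster(forest_count=4)
for i in range(10):
    regular.regular_insert(f"/position/{i}", {"quantity": i})
    in_forest.in_forest_insert(f"/position/{i}", {"quantity": i})
print(regular.forest_visits, in_forest.forest_visits)  # 40 10
```

In this model the regular path grows with the number of forests while the in-forest path stays at one visit per insert, which matches the starred steps whose overhead increases with cluster size.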



Regular Insert Vs. In-Forest Eval


[Chart: docs/second (0 to 70,000) by # of nodes (1DE, 2DE, 3DE); series: regular insert, In-Forest Eval]
Tada! Achieving 100K Updates




                                                                           In-Forest Eval




NoSQL document-oriented database
with real-time full-text search and transactions
    Doing 100K transactions per second




                                     (Yes that’s marker felt)
Thank You!

@eedeebee
eric.bloch@marklogic.com





More Related Content

PPTX
Edb 100k-trans-qcon-rev1
PPTX
Internal training - Eda
PDF
Hibernate OGM - JPA for Infinispan and NoSQL
PPTX
Scaling Crittercism to 30,000 Requests Per Second and Beyond with MongoDB
PPTX
Scaling to 30,000 Requests Per Second and Beyond with MongoDB
PDF
Benchmark slideshow
PDF
Adam Coates at AI Frontiers: AI for 100 Million People with Deep Learning
PDF
Andres Rodriguez at AI Frontiers: Catalyzing Deep Learning's Impact in the En...
Edb 100k-trans-qcon-rev1
Internal training - Eda
Hibernate OGM - JPA for Infinispan and NoSQL
Scaling Crittercism to 30,000 Requests Per Second and Beyond with MongoDB
Scaling to 30,000 Requests Per Second and Beyond with MongoDB
Benchmark slideshow
Adam Coates at AI Frontiers: AI for 100 Million People with Deep Learning
Andres Rodriguez at AI Frontiers: Catalyzing Deep Learning's Impact in the En...

Viewers also liked (20)

PDF
MarkLogic Overview and Use Cases
PDF
Runaway complexity in Big Data... and a plan to stop it
PPTX
MiFID II: The Top 5 Data Challenges
PPT
36518008 hdfc-bank-3
PDF
Junli Gu at AI Frontiers: Autonomous Driving Revolution
PDF
Naghi Prasad at AI Frontiers: Building AI systems to automate enterprise proc...
PDF
Speed up your Symfony2 application and build awesome features with Redis
DOCX
hdfc Bank project
PDF
Organization study on hdfc bank
PPTX
HDFC Persentation
PPTX
GDPR: A Step-By-Step Guide To Compliance
PDF
Project report format
PPTX
Preparing Detailed Project Report and Presenting Business Plan to Investors
PPTX
Hdfc bank ppt
PPT
Preparation of project report for bank finance
PPT
A study on financial analysis of hdfc bank
PDF
Jeff Dean at AI Frontiers: Trends and Developments in Deep Learning Research
PDF
Soumith Chintala at AI Frontiers: A Dynamic View of the Deep Learning World
PDF
Jisheng Wang at AI Frontiers: Deep Learning in Security
PDF
Charles Fan at AI Frontiers: The New Era of AI Plus
MarkLogic Overview and Use Cases
Runaway complexity in Big Data... and a plan to stop it
MiFID II: The Top 5 Data Challenges
36518008 hdfc-bank-3
Junli Gu at AI Frontiers: Autonomous Driving Revolution
Naghi Prasad at AI Frontiers: Building AI systems to automate enterprise proc...
Speed up your Symfony2 application and build awesome features with Redis
hdfc Bank project
Organization study on hdfc bank
HDFC Persentation
GDPR: A Step-By-Step Guide To Compliance
Project report format
Preparing Detailed Project Report and Presenting Business Plan to Investors
Hdfc bank ppt
Preparation of project report for bank finance
A study on financial analysis of hdfc bank
Jeff Dean at AI Frontiers: Trends and Developments in Deep Learning Research
Soumith Chintala at AI Frontiers: A Dynamic View of the Deep Learning World
Jisheng Wang at AI Frontiers: Deep Learning in Security
Charles Fan at AI Frontiers: The New Era of AI Plus
Ad

Similar to Real World Big Data, 100,000 transactions a second in a NoSQL Database (20)

PPT
BugDay2012 Test Design with CTE XL(TrainingDay)
PDF
Harmony intune final
PDF
Playing in Tune: How We Refactored Cube to Terabyte Scale
PPT
Multidimensional Indexing
PDF
MicroService Architecture
KEY
A Data Scientist And A Log File Walk Into A Bar...
PDF
Backtesting Trading Strategies with R
PDF
Cascading meetup #4 @ BlueKai
PDF
Unlocking the secrets to how essbase thinks e roske in sync10 oracle epm track
PPTX
KEY
Approximating Change Sets at Philips Healthcare: A Case Study
PDF
MongoDB @ Frankfurt NoSql User Group
PDF
Process Mining - Chapter 10 - Tool Support
PDF
Process mining chapter_10_tool_support
PDF
Query optimization how to search millions of record in sql table faster -
PPTX
Architecture of Search Systems and Measuring the Search Effectiveness
PDF
Memory Deduction Games
PPTX
(ATS3-PLAT07) Pipeline Pilot Protocol Tips, Tricks, and Challenges
PPTX
Experiences and requirements for a User Interaction Modeling Language
PDF
Functional programming for optimization problems in Big Data
BugDay2012 Test Design with CTE XL(TrainingDay)
Harmony intune final
Playing in Tune: How We Refactored Cube to Terabyte Scale
Multidimensional Indexing
MicroService Architecture
A Data Scientist And A Log File Walk Into A Bar...
Backtesting Trading Strategies with R
Cascading meetup #4 @ BlueKai
Unlocking the secrets to how essbase thinks e roske in sync10 oracle epm track
Approximating Change Sets at Philips Healthcare: A Case Study
MongoDB @ Frankfurt NoSql User Group
Process Mining - Chapter 10 - Tool Support
Process mining chapter_10_tool_support
Query optimization how to search millions of record in sql table faster -
Architecture of Search Systems and Measuring the Search Effectiveness
Memory Deduction Games
(ATS3-PLAT07) Pipeline Pilot Protocol Tips, Tricks, and Challenges
Experiences and requirements for a User Interaction Modeling Language
Functional programming for optimization problems in Big Data
Ad

Recently uploaded (20)

PPTX
Chapter 5: Probability Theory and Statistics
PDF
Heart disease approach using modified random forest and particle swarm optimi...
PDF
1 - Historical Antecedents, Social Consideration.pdf
PDF
Encapsulation_ Review paper, used for researhc scholars
PDF
Hindi spoken digit analysis for native and non-native speakers
PDF
Encapsulation theory and applications.pdf
PDF
A novel scalable deep ensemble learning framework for big data classification...
PPTX
Tartificialntelligence_presentation.pptx
PDF
Hybrid model detection and classification of lung cancer
PDF
Transform Your ITIL® 4 & ITSM Strategy with AI in 2025.pdf
PDF
NewMind AI Weekly Chronicles - August'25-Week II
PDF
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
PDF
DASA ADMISSION 2024_FirstRound_FirstRank_LastRank.pdf
PDF
Approach and Philosophy of On baking technology
PDF
Mushroom cultivation and it's methods.pdf
PDF
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
PDF
Building Integrated photovoltaic BIPV_UPV.pdf
PPTX
SOPHOS-XG Firewall Administrator PPT.pptx
PPTX
cloud_computing_Infrastucture_as_cloud_p
PDF
Enhancing emotion recognition model for a student engagement use case through...
Chapter 5: Probability Theory and Statistics
Heart disease approach using modified random forest and particle swarm optimi...
1 - Historical Antecedents, Social Consideration.pdf
Encapsulation_ Review paper, used for researhc scholars
Hindi spoken digit analysis for native and non-native speakers
Encapsulation theory and applications.pdf
A novel scalable deep ensemble learning framework for big data classification...
Tartificialntelligence_presentation.pptx
Hybrid model detection and classification of lung cancer
Transform Your ITIL® 4 & ITSM Strategy with AI in 2025.pdf
NewMind AI Weekly Chronicles - August'25-Week II
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
DASA ADMISSION 2024_FirstRound_FirstRank_LastRank.pdf
Approach and Philosophy of On baking technology
Mushroom cultivation and it's methods.pdf
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
Building Integrated photovoltaic BIPV_UPV.pdf
SOPHOS-XG Firewall Administrator PPT.pptx
cloud_computing_Infrastucture_as_cloud_p
Enhancing emotion recognition model for a student engagement use case through...

Real World Big Data, 100,000 transactions a second in a NoSQL Database

  • 1. Achieving 100,000 Transactions Per Second with a NoSQL Database Eric David Bloch @eedeebee 23 May 2012
  • 2. A bit about me • I’ve written software used by millions of people. Apps, libraries, compilers, device drivers, operating systems • This is my third Gluecon and my first talk • I’m the Community Director at MarkLogic, last 2 years. • I survived having 3 kids in less than 2 years.
  • 3. Like me, but not me
  • 5. Storyline • Why? • How? • Whirl-wind tour of database architecture • Techniques to get to 100K
  • 7.  It’s about money.  Top 5 bank needed to manage trades  Trades look more like documents than tables  Schemas for trades change all the time  Transactions  Scale and velocity (“Big Data”)
  • 8. Trades and Positions  1 million trades per day  Followed by 1 million position reports at end of day  Roll up trades of current date for each “book, instrument” pair  Group-by, with key = “date, book, instrument” 1M positions 1M trades Day Day Day Day Day ... 1 2 3 4 5
  • 9. Trades and Positions <trade> <quantity>8540882</quantity> <quantity2>1193.71</quantity2> <instrument>WASAX</instrument> <book>679</book> <trade-date>2011-03-13-07:00</trade-date> <settle-date>2011-03-17-07:00</settle-date> </trade> <position> <instrument>EAAFX</instrument> <book>679</book> <quantity>3</quantity> <business-date>2011-03-25Z</business-date> <position-date>2011-03-24Z</position-date> </position>
  • 10. Requirements NoSQL flexibility, performance & scale Enterprise-grade transactional guarantees
  • 11. Now show us • 15K inserts per second • Linear scalability
  • 12. What NoSQL Database out there can we use? A quote from the bank “We threw everything we had at MarkLogic and it didn’t break a sweat”
  • 13. We weren’t content with 15K So we showed them… Slide 13 Copyright © 2012 MarkLogic® Corporation. All rights reserved.
  • 14. What is MarkLogic?  Non-relational, document-oriented, distributed database  Shared nothing clustering, linear scale out  Multi-Version Concurrency Control (MVCC)  Transactions (ACID)  Search Engine  Web scale (Big Data)  Inverted indexes (term lists)  Real-time updates  Compose-able queries Slide 14 Copyright © 2012 MarkLogic® Corporation. All rights reserved.
  • 15. Architecture  Data model  Indexing  Clustering  Query execution  Transactions Slide 15 Copyright © 2012 MarkLogic® Corporation. All rights reserved.
  • 16. Key (URI) Value (Document) /trade/153748994 <trade> <id>8</id> <time>2012-02-20T14:00:00</time> <instrument>BYME AAA</instrument> <price cur=“usd”>600.27</price> </trade> /user/eedeebee { “name” : “Eric Bloch”, “age” : 47, “hair” : “gray”, “kids” : [ “Grace”, “Ryan”, “Owen” ] } /book5293 It was the best of times, it was the worst of times, it was the age of wisdom, ... /2012-02-20T14:47:53/01445 .mp3 .avi [your favorite binary format]
  • 17. Inverted Index “which” 123, 127, 129, 152, 344, 791 . . . “uniquely” 122, 125, 126, 129, 130, 167 . . . “identify” 123, 126, 130, 142, 143, 167 . . . “each” 123, 130, 131, 135, 162, 177 . . . Document “uniquely identify” 126, 130, 167, 212, 219, 377 . . . References <article> ... 126, 130, 167, 212, 219, 377 . . . article/abstract/@author ... <product>IMS</product> Slide 17 Copyright © 2012 MarkLogic® Corporation. All rights reserved.
  • 18. Range Index <trade> <trader_id>8</trader_id> Value Docid <time>2012-02-20T14:00:00</time> <instrument>IBM</instrument> 0 287 … </trade> 8 1129 13 531 Docid Value <trade> <trader_id>13</trader_id> … … 287 0 <time>2012-02-20T14:30:00</time> Rows <instrument>AAPL</instrument> … … 531 13 … 1129 8 </trade> … … <trade> <trader_id>0</trader_id> … … <time>2012-02-20T15:30:00</time> <instrument>GOOG</instrument> • Column Oriented … • Memory Mapped </trade> Slide 18 Copyright © 2012 MarkLogic® Corporation. All rights reserved.
  • 19. Shared-Nothing Clustering Host D1 Host D2 Host D3 … Host Dj Slide 19 Copyright © 2012 MarkLogic® Corporation. All rights reserved.
  • 20. Query Evaluation – “Map” Q Host E1 Host E2 Host E3 … Host Ei Host D1 Host D2 Host D3 Host Dj … Q Q Q Q F1 … Fn F1 … Fn F1 … Fn F1 … Fn Slide 21 Copyright © 2012 MarkLogic® Corporation. All rights reserved.
  • 21. Query Evaluation – “Reduce” Top 10 Host E1 Host E2 Host E3 … Host Ei Host D1 Host D2 Host D3 Host Dj … Top Top Top Top 10 10 10 10 F1 … Fn F1 … Fn F1 … Fn F1 … Fn Slide 22 Copyright © 2012 MarkLogic® Corporation. All rights reserved.
  • 22. Queries/Updates with MVCC  Every query has a timestamp  Documents do not change  Reads are lock-free  Inserts – see next slide  Deletes – mark as deleted  Edits –  copy  edit  insert the copy  mark the original as deleted Slide 23 Copyright © 2012 MarkLogic® Corporation. All rights reserved.
  • 23. Insert Mechanics 1) New URI+Document arrive at E-node 2) URI Probe – determine whether URI exists in any forest 3) URI Lock – write locks are taken on D node(s) 4) Forest Assignment – URI is deterministically placed in Forest 5) Indexing 6) Journaling 7) Commit – transaction complete 8) Release URI Locks – D node(s) are notified to release lock
  • 24. Save-and-merge (Log-Structured Merge Tree) [Diagram: a database with Forest 1 and Forest 2; inserts/updates are journaled (J) and land in S0 in memory; Save writes S0 to disk as S1, S2, S3; Merge combines the on-disk S1, S2, S3]
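The save-and-merge cycle can be sketched as a toy forest with one in-memory segment (S0 on the slide) and a list of immutable on-disk segments (S1, S2, ...). The class, the `memory_limit` parameter, and the use of plain dicts for segments are all assumptions for illustration; nothing is actually written to disk.

```python
class Forest:
    """Toy save-and-merge forest: journal, in-memory S0, on-disk segments."""

    def __init__(self, memory_limit):
        self.memory_limit = memory_limit
        self.journal = []  # append-only record of every accepted write
        self.memory = {}   # S0
        self.disk = []     # S1, S2, ... with the newest last

    def insert(self, uri, doc):
        self.journal.append((uri, doc))
        self.memory[uri] = doc
        if len(self.memory) >= self.memory_limit:
            self.save()

    def save(self):
        """Freeze S0 as a new on-disk segment and start a fresh S0."""
        self.disk.append(dict(self.memory))
        self.memory = {}

    def merge(self):
        """Combine all on-disk segments into one; newer values win."""
        merged = {}
        for segment in self.disk:  # oldest first
            merged.update(segment)
        self.disk = [merged]

    def get(self, uri):
        """Look in S0 first, then in on-disk segments from newest to oldest."""
        if uri in self.memory:
            return self.memory[uri]
        for segment in reversed(self.disk):
            if uri in segment:
                return segment[uri]
        return None


forest = Forest(memory_limit=2)
for uri, doc in [("/a", 1), ("/b", 2), ("/a", 3), ("/c", 4), ("/d", 5)]:
    forest.insert(uri, doc)
segments_before = len(forest.disk)
forest.merge()
```

Writes only ever append (to the journal and to S0), which is why this layout favors high insert rates; merging keeps the number of segments a read has to consult small.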
  • 26. Trades and Positions • 1 million trades per day • followed by 1 million position reports at end of day • Roll up trades of the current “date” for each “book:instrument” pair • Group-by, with key = “book:date:instrument” [Diagram: 1M trades and 1M positions per day, Day 1 through Day 5, ...]
  • 27. Naive Query Pseudocode
      for each book
        for each instrument in that book
          position = position(yesterday, book, instrument)
          for each trade of that instrument in this book
            position += trade(today, book, instrument).quantity
          insert(today, book, instrument, position)
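A runnable rendering of the naive pseudocode, assuming trades are dicts with `date`, `book`, `instrument`, and `quantity` fields and positions live in a dict keyed by (date, book, instrument). Those shapes and the function name are assumptions for the example, not the bank's schema.

```python
def naive_rollup(trades, positions, yesterday, today):
    """Compute today's positions the slow way: one scan per (book, instrument)."""
    pairs = {(t["book"], t["instrument"]) for t in trades}
    pairs |= {(book, instrument) for (_, book, instrument) in positions}
    for book, instrument in sorted(pairs):
        position = positions.get((yesterday, book, instrument), 0)
        for t in trades:  # full scan for every pair: this is the naive part
            if (t["date"], t["book"], t["instrument"]) == (today, book, instrument):
                position += t["quantity"]
        positions[(today, book, instrument)] = position
    return positions


trades = [
    {"date": "2011-03-25", "book": 679, "instrument": "WASAX", "quantity": 5},
    {"date": "2011-03-25", "book": 679, "instrument": "WASAX", "quantity": -2},
    {"date": "2011-03-25", "book": 679, "instrument": "EAAFX", "quantity": 3},
]
positions = {("2011-03-24", 679, "WASAX"): 10}
result = naive_rollup(trades, positions, "2011-03-24", "2011-03-25")
```

The work grows with the number of pairs times the number of trades, and each new position is its own insert, which is the cost the later techniques attack.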
  • 28. Initial results: single node, 19,000 inserts per second
  • 29. Initial results with cluster [Chart: doc/sec (0 to 70,000) by number of nodes (1DE, 2DE, 3DE); series: Report Query 2]
  • 30. Techniques to get to 100K • Query for Computing New Positions: Materialized compound key, Co-Occurrence Query and Aggregation • Insert of New Positions: Batching, In-Forest Eval
  • 31. Materializing a compound key <trade> <quantity2>1193.71</quantity2> <instrument>WASAX</instrument> <book>679</book> <trade-date>2011-03-13-07:00</trade-date> <settle-date>2011-03-17-07:00</settle-date> </trade>
  • 32. Materializing a compound key <trade> <roll-up book-date-instrument="151333445566782303"/> <quantity2>1193.71</quantity2> <instrument>WASAX</instrument> <book>679</book> <trade-date>2011-03-13-07:00</trade-date> <settle-date>2011-03-17-07:00</settle-date> </trade>
  • 33. Co-Occurrence and Distributed Aggregation <trade> <roll-up book-date-instrument="151373445566703"/> <quantity>8540882</quantity> <instrument>WASAX</instrument> <book>679</book> <trade-date>2011-03-13-07:00</trade-date> <settle-date>2011-03-17-07:00</settle-date> </trade> [Diagram: value/docid range index columns for the roll-up key and for quantity; docid 1129 pairs key 15137… with quantity 8540882] • Co-occurrences: find pairings of range-indexed values • Aggregate on the D nodes (Map/Reduce): sum up the quantities above • Similar to a group-by in a column-oriented, in-memory database
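The co-occurrence plus aggregation idea can be sketched as follows: on each D node, pair the roll-up key and the quantity that share a doc id, sum per key, and let the E node add up the partial sums. The function names, the short keys `k1`/`k2`, and all values other than docid 1129 and quantity 8540882 are hypothetical.

```python
from collections import defaultdict


def forest_partial_sums(key_index, quantity_index):
    """On one D node: pair two docid -> value columns and sum quantity per key."""
    sums = defaultdict(int)
    for doc_id, key in key_index.items():
        if doc_id in quantity_index:
            sums[key] += quantity_index[doc_id]
    return dict(sums)


def merge_partial_sums(partials):
    """On the E node: add up the per-forest sums for each key."""
    total = defaultdict(int)
    for partial in partials:
        for key, value in partial.items():
            total[key] += value
    return dict(total)


# Two hypothetical forests; "k1" and "k2" stand in for materialized
# book-date-instrument keys.
forest_1 = forest_partial_sums(
    {1129: "k1", 1130: "k1", 1131: "k2"},
    {1129: 8540882, 1130: 100, 1131: 7},
)
forest_2 = forest_partial_sums({2001: "k1"}, {2001: 18})
positions = merge_partial_sums([forest_1, forest_2])
```

Because the compound key is materialized as a single indexed value, the group-by touches only two index columns and never opens the documents themselves.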
  • 34. Initial results with new query: > 30K inserts/second on a single node
  • 35. Co-Occurrence + Aggregate versus Naïve approach [Chart: doc/sec (0 to 70,000) by number of nodes (1DE, 2DE, 3DE); series: Report Query 3, Report Query 2]
  • 36. Techniques • Computing Positions: Materialized compound key, Co-Occurrence Query and Aggregation • Updates: Batching, In-Forest Eval
  • 37. Transaction Size and Throughput [Chart: insert throughput in docs per second (0 to 35,000) by number of doc inserts per transaction (1, 2, 4, 8, 16, 32, 100, 500, 1000, 10000)]
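Batching here means committing many document inserts in one transaction instead of one transaction per document. A minimal sketch, assuming a caller-supplied `commit` function that stands in for whatever actually runs the transaction; both helper names are hypothetical.

```python
from itertools import islice


def batches(docs, size):
    """Yield successive lists of at most `size` documents."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def insert_all(docs, size, commit):
    """Call `commit` once per batch; return the number of transactions used."""
    count = 0
    for chunk in batches(docs, size):
        commit(chunk)
        count += 1
    return count


committed = []
transactions = insert_all(range(10), 4, committed.append)
```

Ten documents in batches of four take three transactions rather than ten, so the fixed per-transaction overhead is paid fewer times.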
  • 38. Techniques • Computing Positions (Data Warehouse): Hash Key Decoration, Co-Occurrence • Updates (Transaction): Batching, In-Forest Eval
  • 39. Insert Mechanics 1) New URI+Document arrive at E-node 2) *URI Probe – determine whether URI exists in any forest 3) *URI Lock – write locks are created 4) Forest Assignment – URI is deterministically placed in Forest 5) Indexing 6) Journaling 7) Commit – transaction complete 8) *Release URI Locks – D nodes are notified to release lock * Overhead of these operations increases with cluster size
  • 40. Deterministic Placement 4) Forest Assignment – URI is deterministically placed in Forest [Diagram: URI → hash function → 64-bit number → bucketed into a forest Fi] • Done in C++ within server • But… • Can also be done in the client • Server allows queries to be evaluated against only one forest…
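Client-side placement can be sketched as hashing the URI to a 64-bit number and bucketing it into a forest. The slide does not name the hash function or the bucketing rule, so truncated SHA-256 and a modulo are assumptions made purely for illustration; a real client would have to reproduce the server's own algorithm exactly.

```python
import hashlib


def uri_hash64(uri):
    """Hash a URI to an unsigned 64-bit integer (SHA-256 truncated to 8 bytes)."""
    digest = hashlib.sha256(uri.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def assign_forest(uri, forest_count):
    """Bucket the 64-bit hash into one of `forest_count` forests (assumed modulo)."""
    return uri_hash64(uri) % forest_count


forest_index = assign_forest("/trade/153748994", 4)
```

Since the result depends only on the URI and the forest count, any client computes the same forest as any other, which is what lets the insert be sent straight to that forest.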
  • 41. In-Forest Eval [Diagram: query Q goes from the E host to a single forest among F1 … F4, which are spread across hosts D1 and D2]
  • 42. In-Forest Eval Insert Mechanics 1) New URI+Document arrive at E-node a) Compute Fi using … b) Ask server to evaluate the insert query only against Fi 2) URI Probe – Fi Only 3) URI Lock – Fi Only 4) Forest Assignment – Fi Only 5) Indexing 6) Journaling 7) Commit – transaction complete 8) Lock Release – Fi Only
  • 43. Regular Insert vs. In-Forest Eval [Chart: docs/second (0 to 70,000) by number of nodes (1DE, 2DE, 3DE); series: regular insert, in-forest eval]
  • 44. Tada! Achieving 100K Updates In-Forest Eval
  • 45. NoSQL document-oriented database with real-time full-text search and transactions Doing 100K transactions per second (Yes that’s marker felt)
  • 48. Thank You! @eedeebee eric.bloch@marklogic.com

Editor's Notes

  • #8: Pain points; three-tiered architecture; housekeeping & availability; straw in a swimming pool
  • #9: Pain points; three-tiered architecture; housekeeping & availability; straw in a swimming pool
  • #10: Pain points; three-tiered architecture; housekeeping & availability; straw in a swimming pool
  • #15: Word and phrase search, boolean search, proximity, wildcarding, stemming, tokenization, decompounding, case-sensitivity options, punctuation-sensitivity options, diacritic-sensitivity options, document quality settings, numerous relevance algorithms, individual term weighting, topic clustering, faceted navigation, custom-indexed fields, and more.