To Scale or Not to Scale: Which API Is Right for You?
Uri Cohen, GigaSpaces (@uri1803)
> SELECT * FROM devoxx2010.speakers WHERE name='Uri Cohen'
+-----------------------------------------------------+
| Name      | Company    | Role            | Twitter  |
+-----------------------------------------------------+
| Uri Cohen | GigaSpaces | Product Manager | @uri1803 |
+-----------------------------------------------------+

> db.devoxx_speakers.find({name: "Uri Cohen"})
{
  "name": "Uri Cohen",
  "company": {
    "name": "GigaSpaces",
    "products": ["XAP", "IMDG"],
    "domain": "In memory data grids"
  },
  "role": "product manager",
  "twitter": "@uri1803"
}
Agenda
- SQL
  - What it is and isn't good for
- NoSQL
  - Motivation & main concepts
  - Common interaction models: Key/Value, Column, Document
  - NOT consistency and distribution algorithms
- One Data Store, Multiple APIs
  - Brief intro to GigaSpaces
  - Key/Value challenges
  - SQL challenges: ad hoc querying, relationships (JPA)
A Few (more) Words About SQL
SQL
- (Usually) centralized
  - Transactional, consistent
  - Hard to scale
- Static, normalized data schema
  - Don't duplicate, use FKs
- Ad hoc query support
  - Model first, query later
- Standard
  - Well known
  - Rich ecosystem
(Brief) NoSQL Recap
NoSQL (or a Naive Attempt to Define It)
- A loosely coupled collection of non-relational data stores
- (Mostly) distributed
- Scalable (up & out)
- Not (always) ACID
  - BASE, anyone?
Why Now?
- Timing is everything…
- Exponential increase in data & throughput
- Non- or semi-structured data that changes frequently
A Universe of Data Models
[Figure: Key/Value, Column, and Document models. The sample document reads:]
{ "name": "uri", "ssn": "213445", "hobbies": ["…", "…"], "…": { "…": "…", "…": "…" } }
Key/Value
- Have the key? Get the value
  - That's about it when it comes to querying
  - Map/Reduce (sometimes)
- Good for
  - Cache aside (e.g. Hibernate 2nd level cache)
  - Simple, ID-based interactions (e.g. user profiles)
- In most cases, values are opaque
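
The cache-aside use mentioned above can be sketched roughly as follows. This is a minimal illustration, not any particular product's API: the `CacheAside` class, the in-process dictionary standing in for a store such as Memcached, and the `PROFILES` data are all assumptions made for the example. Values are kept as opaque bytes, as the slide suggests.

```python
from typing import Callable, Dict, Optional


class CacheAside:
    """Cache-aside: look in the cache first, fall back to the system of record."""

    def __init__(self, load_from_store: Callable[[str], Optional[bytes]]) -> None:
        # A plain dict stands in for a key/value store (assumption for the sketch).
        self._cache: Dict[str, bytes] = {}
        self._load = load_from_store
        self.misses = 0

    def get(self, key: str) -> Optional[bytes]:
        value = self._cache.get(key)
        if value is None:
            # Cache miss: read from the backing store and remember the result.
            self.misses += 1
            value = self._load(key)
            if value is not None:
                self._cache[key] = value
        return value

    def invalidate(self, key: str) -> None:
        # Called after the backing store is updated, so the next read reloads.
        self._cache.pop(key, None)


# Hypothetical system of record: user profiles keyed by id, stored as opaque bytes.
PROFILES = {"user:42": b'{"name": "uri"}'}

cache = CacheAside(PROFILES.get)
first = cache.get("user:42")   # miss, loaded from PROFILES
second = cache.get("user:42")  # served from the cache
```

The only query is "get by key"; the cache never looks inside the value, which is why this model is cheap to scale but poor at ad hoc querying.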
Key/Value
- Scaling out is relatively easy (just hash the keys)
  - Some will do that automatically for you
  - Fixed vs. consistent hashing
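
To make the fixed vs. consistent hashing distinction concrete, here is a small sketch that places the same keys on three nodes and then on four, and measures how many keys change owner under each scheme. The node names, the key format, the SHA-256 hash, and the choice of 100 virtual points per node are all illustrative assumptions, not settings of any specific store.

```python
import bisect
import hashlib
from typing import Dict, List


def stable_hash(text: str) -> int:
    # A hash that is stable across processes (Python's built-in hash() is not).
    return int(hashlib.sha256(text.encode("utf-8")).hexdigest(), 16)


def fixed_node(key: str, nodes: List[str]) -> str:
    """Fixed hashing: the owner is hash(key) modulo the number of nodes."""
    return nodes[stable_hash(key) % len(nodes)]


class HashRing:
    """Consistent hashing: each node owns many points on a ring of hash values."""

    def __init__(self, nodes: List[str], replicas: int = 100) -> None:
        self._ring = sorted(
            (stable_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # The owner is the first ring point after the key's hash, wrapping around.
        idx = bisect.bisect(self._points, stable_hash(key)) % len(self._ring)
        return self._ring[idx][1]


def moved_fraction(before: Dict[str, str], after: Dict[str, str]) -> float:
    moved = sum(1 for key in before if before[key] != after[key])
    return moved / len(before)


keys = [f"user:{i}" for i in range(2000)]
old_nodes = ["node-a", "node-b", "node-c"]
new_nodes = old_nodes + ["node-d"]

fixed_before = {k: fixed_node(k, old_nodes) for k in keys}
fixed_after = {k: fixed_node(k, new_nodes) for k in keys}

old_ring, new_ring = HashRing(old_nodes), HashRing(new_nodes)
ring_before = {k: old_ring.node_for(k) for k in keys}
ring_after = {k: new_ring.node_for(k) for k in keys}

fixed_moved = moved_fraction(fixed_before, fixed_after)
ring_moved = moved_fraction(ring_before, ring_after)
```

With modulo hashing, changing the node count reshuffles most keys; with the ring, the only keys that move are the ones taken over by the new node. The price is a slightly more involved lookup and some unevenness in how keys are spread.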
Key/Value
- Implementations: Memcached, Redis, Riak
- In-memory data grids (mostly Java-based) started this way
  - GigaSpaces, Oracle Coherence, WebSphere XS, JBoss Infinispan, etc.
Column Based
Column Based
- Mostly derived from Google's BigTable / Amazon Dynamo papers
- One giant table of rows and columns
  - Column == pair (name and a value, sometimes a timestamp)
  - Each row can have a different number of columns
  - Table is sparse: (#rows) × (#columns) ≥ (#values)
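
The sparse table described above can be modeled as a map of row keys to maps of columns. The sketch below is an assumption-laden illustration: the `SparseTable` class and its method names are invented for the example, and resolving conflicting writes by keeping the newest timestamp is one common choice, not something the slides specify.

```python
from typing import Any, Dict, NamedTuple, Set


class Column(NamedTuple):
    name: str
    value: Any
    timestamp: int


class SparseTable:
    """Rows keyed by row key; each row holds only the columns it actually has."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, Column]] = {}

    def put(self, row_key: str, name: str, value: Any, timestamp: int) -> None:
        row = self._rows.setdefault(row_key, {})
        current = row.get(name)
        # Assumption: the write with the newest timestamp wins.
        if current is None or timestamp >= current.timestamp:
            row[name] = Column(name, value, timestamp)

    def get_row(self, row_key: str) -> Dict[str, Any]:
        return {c.name: c.value for c in self._rows.get(row_key, {}).values()}

    def row_count(self) -> int:
        return len(self._rows)

    def column_names(self) -> Set[str]:
        return {name for row in self._rows.values() for name in row}

    def value_count(self) -> int:
        return sum(len(row) for row in self._rows.values())


table = SparseTable()
table.put("uri", "name", "Uri Cohen", timestamp=1)
table.put("uri", "company", "GigaSpaces", timestamp=1)
table.put("uri", "twitter", "@uri1803", timestamp=1)
table.put("gaga", "name", "Lady Gaga", timestamp=1)  # this row has one column only
table.put("uri", "company", "stale value", timestamp=0)  # older write is ignored
```

Here two rows and three distinct column names hold only four values, which is the sparseness inequality from the slide: (#rows) × (#columns) ≥ (#values).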
Column Based
- Query on row key
  - Or column value (aka secondary index)
- Good for a constantly changing (albeit flat) domain model
Document
- Think JSON (or BSON, or XML)
{
  "name": "Lady Gaga",
  "ssn": "213445",
  "hobbies": ["Dressing up", "Singing"],
  "albums": [
    {"name": "The Fame", "release_year": "2008"},
    {"name": "Born This Way", "release_year": "2011"}
  ]
}
{ ... }
{ ... }
Document
- Model is not flat, and the data store is aware of it
  - Arrays, nested documents
- Better support for ad hoc queries
  - MongoDB excels at this
- Very intuitive model
- Flexible schema
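
Because the store understands arrays and nested documents, a query can reach inside them. The following sketch shows the idea with a tiny dotted-path matcher over in-memory dictionaries. It is not MongoDB's API or query engine; `resolve` and `find` are hypothetical helpers written for this illustration, and the two documents are the samples from the earlier slides.

```python
from typing import Any, Dict, List


def resolve(doc: Any, path: str) -> List[Any]:
    """Collect every value reachable through a dotted path, fanning out over arrays."""
    current: List[Any] = [doc]
    for part in path.split("."):
        next_level: List[Any] = []
        for node in current:
            candidates = node if isinstance(node, list) else [node]
            for candidate in candidates:
                if isinstance(candidate, dict) and part in candidate:
                    next_level.append(candidate[part])
        current = next_level
    # Flatten a trailing array so that a path like "hobbies" matches its elements.
    flat: List[Any] = []
    for value in current:
        flat.extend(value if isinstance(value, list) else [value])
    return flat


def find(docs: List[Dict[str, Any]], path: str, expected: Any) -> List[Dict[str, Any]]:
    """Return the documents where any value at the path equals the expected value."""
    return [doc for doc in docs if expected in resolve(doc, path)]


docs = [
    {
        "name": "Lady Gaga",
        "hobbies": ["Dressing up", "Singing"],
        "albums": [
            {"name": "The Fame", "release_year": "2008"},
            {"name": "Born This Way", "release_year": "2011"},
        ],
    },
    {
        "name": "Uri Cohen",
        "company": {"name": "GigaSpaces", "products": ["XAP", "IMDG"]},
    },
]

by_album_year = find(docs, "albums.release_year", "2011")
by_product = find(docs, "company.products", "XAP")
by_hobby = find(docs, "hobbies", "Singing")
```

Note that the two documents do not share a schema, and the same `find` call works on both. That is the flexibility the slide refers to; a real document store would add indexes so such queries do not scan every document.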
What if you didn't have to choose?
[Figure: JPA, a JSON document, and JDBC shown side by side.]
A Brief Intro to GigaSpaces
- In-memory data grid
  - With optional write-behind to secondary storage
- Tuple based
  - Aware of nested tuples (and soon collections)
  - Document-like
  - Rich querying and map/reduce semantics
- Transparent partitioning & HA
  - Fixed hashing based on a chosen property
- Transactional (like, ACID)
  - Local (single partition)
  - Distributed (multiple partitions)

Use the Right API for the Job
- Even for the same data…
  - POJO & JPA for Java apps with a complex domain model
  - Document for a more dynamic view
  - Memcached for simple, language-neutral data access
  - JDBC for:
    - Interaction with legacy apps
    - Flexible ad hoc querying (e.g. projections)
Memcached (the Daemon is in the Details)
SQL/JDBC – Query Them All
- Query may involve Map/Reduce
- Reduce phase includes merging and sorting
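
A rough sketch of what "query them all" can look like for an ORDER BY query over a partitioned store: each partition filters and sorts its own rows (the map phase), and the caller merges the already-sorted partial results (the reduce phase). The function names, the row layout, and the sample data are assumptions made for this illustration; this is not how any specific JDBC driver is implemented.

```python
import heapq
import itertools
from operator import itemgetter
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


def map_phase(partition: List[Row], predicate: Callable[[Row], bool], sort_key: str) -> List[Row]:
    """Runs on each partition: apply the WHERE clause, then sort locally."""
    return sorted((row for row in partition if predicate(row)), key=itemgetter(sort_key))


def reduce_phase(partials: List[List[Row]], sort_key: str, limit: Optional[int] = None) -> List[Row]:
    """Runs on the caller: merge the sorted partial results into one sorted result."""
    merged = heapq.merge(*partials, key=itemgetter(sort_key))
    return list(itertools.islice(merged, limit))


# Hypothetical data, spread over two partitions.
partitions = [
    [{"name": "dana", "age": 31}, {"name": "avi", "age": 45}],
    [{"name": "chen", "age": 27}, {"name": "bar", "age": 39}],
]

# Roughly: SELECT * FROM people WHERE age >= 30 ORDER BY name
partials = [map_phase(p, lambda row: row["age"] >= 30, "name") for p in partitions]
result = reduce_phase(partials, "name")
names = [row["name"] for row in result]
```

The merge step only ever compares the heads of the partial results, so it stays cheap when a limit is applied, but every partition still has to be visited and sorted.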
SQL/JDBC – Things to Consider
- Unique and FK constraints are not practically enforceable
- Sorting and aggregation may be expensive
- Distributed transactions are evil
  - Stay local…
JPA
- It's all about relationships…
JPA Relationships
- To embed or not to embed, that is the question…
- Embed:
  - Easy to partition and scale
  - Easy to query: user.accounts.type = 'checking'
  - Owned relationships only
- Don't embed:
  - Any type of relationship
  - Partitioning is hard
  - Querying involves joining

Summary
- One API doesn't fit all
  - Use the right API for the job
- Know the tradeoffs
  - Always ask what you're giving up, not just what you're gaining
Thank You!
@uri1803
http://www.gigaspaces.com
To scale or not to scale: Key/Value, Document, SQL, JPA – What’s right for my app?