To scale or not to scale: Key/Value, Document, SQL, JPA – What’s right for my app?

To Scale or Not to Scale: Which API is Right for You?
Uri Cohen, GigaSpaces
@uri1803

> SELECT * FROM devoxx2010.speakers WHERE name = 'Uri Cohen'
+-----------+------------+-----------------+----------+
| Name      | Company    | Role            | Twitter  |
+-----------+------------+-----------------+----------+
| Uri Cohen | GigaSpaces | Product Manager | @uri1803 |
+-----------+------------+-----------------+----------+

> db.devoxx_speakers.find({name: "Uri Cohen"})
{
  "name": "Uri Cohen",
  "company": {
    "name": "GigaSpaces",
    "products": ["XAP", "IMDG"],
    "domain": "In memory data grids"
  },
  "role": "product manager",
  "twitter": "@uri1803"
}

Agenda
- SQL: what it is and isn’t good for
- NoSQL: motivation & main concepts
- Common interaction models: Key/Value, Column, Document
  - NOT consistency and distribution algorithms
- One data store, multiple APIs
  - Brief intro to GigaSpaces
  - Key/Value challenges
  - SQL challenges: ad hoc querying, relationships (JPA)

SQL
- (Usually) centralized
- Transactional, consistent
- Hard to scale

SQL
- Static, normalized data schema
- Don’t duplicate, use FKs

SQL
- Ad hoc query support
- Model first, query later

SQL
- Standard
- Well known
- Rich ecosystem

NoSQL (or a Naive Attempt to Define It)
- A loosely coupled collection of non-relational data stores
- (Mostly) distributed
- Scalable (up & out)
- Not (always) ACID: BASE, anyone?

Why Now?
Timing is everything…
- Exponential increase in data & throughput
- Non-structured or semi-structured data that changes frequently

A Universe of Data Models
Key/Value, Column, Document

Key/Value
- Have the key? Get the value
- That’s about it when it comes to querying (Map/Reduce, sometimes)
- Good for:
  - Cache aside (e.g. Hibernate 2nd level cache)
  - Simple, id-based interactions (e.g. user profiles)
- In most cases, values are opaque
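To make the cache-aside pattern concrete, here is a minimal Java sketch. It assumes a plain HashMap as the key/value store and a hypothetical loader function standing in for the database; it is not Hibernate’s second-level cache API. The cache only ever sees opaque byte arrays, which is why querying stops at "get by key".

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Minimal cache-aside sketch. The HashMap stands in for a key/value store and the
// loader is a hypothetical stand-in for the system of record (e.g. a SQL database).
final class CacheAside {
    // Values are opaque byte arrays: the store can return them but not query inside them.
    private final Map<String, byte[]> cache = new HashMap<>();
    private final Function<String, byte[]> loader;
    private int loads = 0;

    CacheAside(Function<String, byte[]> loader) {
        this.loader = loader;
    }

    byte[] get(String key) {
        byte[] value = cache.get(key);   // have the key? get the value
        if (value == null) {             // miss: fall back to the system of record
            value = loader.apply(key);
            loads++;
            if (value != null) {
                cache.put(key, value);   // populate so the next read is a hit
            }
        }
        return value;
    }

    // On update, the application writes to the database and evicts the cached copy.
    void invalidate(String key) {
        cache.remove(key);
    }

    int loads() {
        return loads;
    }

    public static void main(String[] args) {
        CacheAside profiles = new CacheAside(
                id -> ("profile of " + id).getBytes(StandardCharsets.UTF_8));
        profiles.get("user:42");   // miss: loaded from the "database"
        profiles.get("user:42");   // hit: served from the cache
        System.out.println(profiles.loads()); // prints 1
    }
}
```

Writes go to the database first and then evict the cached entry, so the next read reloads fresh data.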

Key/Value
- Scaling out is relatively easy (just hash the keys)
- Some will do that automatically for you
- Fixed vs. consistent hashing
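The difference between fixed and consistent hashing shows up when the cluster changes size. The Java sketch below counts how many of 10,000 keys change owner when a fifth node joins four. The node names, key names and the 100 virtual points per node are arbitrary assumptions. With modulo hashing, a key stays put only when its hash leaves the same remainder mod 4 and mod 5, so roughly four out of five keys move. With a hash ring, only the keys that now fall on the new node’s arcs move.

```java
import java.util.Map;
import java.util.TreeMap;

// Compares fixed (modulo) hashing with consistent hashing when a fifth node joins
// a four-node cluster. Node and key names are made up for illustration.
final class HashingDemo {
    // MurmurHash3 32-bit finalizer, used to spread String.hashCode() values.
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Fixed hashing: the node depends directly on the node count.
    static int fixedNode(String key, int nodes) {
        return Math.floorMod(mix(key.hashCode()), nodes);
    }

    // Consistent hashing: nodes own arcs of a hash ring (via several virtual points each).
    static final class Ring {
        private final TreeMap<Integer, String> ring = new TreeMap<>();
        private final int virtualNodes;

        Ring(int virtualNodes) {
            this.virtualNodes = virtualNodes;
        }

        void addNode(String node) {
            for (int i = 0; i < virtualNodes; i++) {
                ring.put(mix((node + "#" + i).hashCode()), node);
            }
        }

        // A key belongs to the first node point at or after its hash, wrapping around.
        String nodeFor(String key) {
            Map.Entry<Integer, String> e = ring.ceilingEntry(mix(key.hashCode()));
            return (e != null ? e : ring.firstEntry()).getValue();
        }
    }

    public static void main(String[] args) {
        Ring before = new Ring(100);
        Ring after = new Ring(100);
        for (int n = 0; n < 4; n++) {
            before.addNode("node" + n);
            after.addNode("node" + n);
        }
        after.addNode("node4");

        int keys = 10_000, fixedMoved = 0, ringMoved = 0;
        for (int i = 0; i < keys; i++) {
            String key = "key-" + i;
            if (fixedNode(key, 4) != fixedNode(key, 5)) fixedMoved++;
            if (!before.nodeFor(key).equals(after.nodeFor(key))) ringMoved++;
        }
        // Expect roughly 80% of keys to move with fixed hashing, roughly 20% with the ring.
        System.out.printf("fixed: %d of %d keys moved, consistent: %d moved%n",
                fixedMoved, keys, ringMoved);
    }
}
```

Fixed hashing is simpler and gives every node a predictable share of the keys. Consistent hashing pays off when nodes are added or removed often, because keys only ever move to the newcomer.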

Key/Value
- Implementations: Memcached, Redis, Riak
- In memory data grids (mostly Java-based) started this way: GigaSpaces, Oracle Coherence, WebSphere XS, JBoss Infinispan, etc.

Column Based
- Mostly derived from Google’s BigTable / Amazon Dynamo papers
- One giant table of rows and columns
  - Column == a name/value pair (sometimes with a timestamp)
  - Each row can have a different number of columns
  - The table is sparse: (#rows) × (#columns) ≥ (#values)

Column Based
- Query on row key
- Or on column value (aka secondary index)
- Good for a constantly changing (albeit flat) domain model
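Here is a minimal in-memory sketch of the column-based model. It is a sparse map from each row key to only the columns that row actually has, with each column carrying a value and a timestamp. A secondary index supports querying by column value. The class and method names are illustrative assumptions, not the API of BigTable, Cassandra or any other product, and the "newer timestamp wins" rule is also an assumption of the sketch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

// In-memory sketch of a column-family layout: a sparse table keyed by row, plus a
// secondary index on column values. Not the API of any particular product.
final class ColumnStore {
    // A column as stored under a row key: value plus write timestamp.
    record Cell(String value, long timestamp) {}

    // Only the columns a row actually has are stored, so the table stays sparse.
    private final Map<String, SortedMap<String, Cell>> rows = new HashMap<>();
    // Secondary index: column name -> column value -> keys of rows holding that value.
    private final Map<String, Map<String, Set<String>>> index = new HashMap<>();

    void put(String rowKey, String column, String value, long timestamp) {
        SortedMap<String, Cell> row = rows.computeIfAbsent(rowKey, k -> new TreeMap<>());
        Cell old = row.get(column);
        if (old != null && old.timestamp() > timestamp) {
            return; // assumption for this sketch: the newer timestamp wins
        }
        if (old != null) {
            index.get(column).get(old.value()).remove(rowKey); // drop the stale index entry
        }
        row.put(column, new Cell(value, timestamp));
        index.computeIfAbsent(column, c -> new HashMap<>())
             .computeIfAbsent(value, v -> new TreeSet<>())
             .add(rowKey);
    }

    // Query on row key.
    SortedMap<String, Cell> row(String rowKey) {
        return rows.getOrDefault(rowKey, Collections.emptySortedMap());
    }

    // Query on column value, answered from the secondary index rather than a full scan.
    Set<String> rowsWhere(String column, String value) {
        return index.getOrDefault(column, Map.of()).getOrDefault(value, Set.of());
    }

    public static void main(String[] args) {
        ColumnStore people = new ColumnStore();
        people.put("uri", "company", "GigaSpaces", 1L);
        people.put("uri", "twitter", "@uri1803", 1L);
        people.put("gaga", "album", "The Fame", 1L); // a different set of columns
        System.out.println(people.row("uri").keySet());                // [company, twitter]
        System.out.println(people.rowsWhere("company", "GigaSpaces")); // [uri]
    }
}
```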

Document
Think JSON (or BSON, or XML):
{
  "name": "Lady Gaga",
  "ssn": "213445",
  "hobbies": ["Dressing up", "Singing"],
  "albums": [
    {"name": "The Fame", "release_year": "2008"},
    {"name": "Born This Way", "release_year": "2011"}
  ]
}

Document
- The model is not flat, and the data store is aware of it
  - Arrays, nested documents
- Better support for ad hoc queries
  - MongoDB excels at this
- Very intuitive model
- Flexible schema
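This sketch shows what "the data store is aware of the model" buys for ad hoc queries. It represents a document as nested Maps and Lists and matches a dotted path such as albums.name against every element of an array, loosely mimicking MongoDB’s dot notation. It is a hypothetical helper, not the MongoDB Java driver, and it scans every document instead of using an index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Sketch of querying nested documents (Map = document, List = array) by a dotted path.
// It loosely mimics MongoDB's dot notation; it is not the MongoDB driver API.
final class DocumentQuery {
    // Collect every value reachable at path[depth..], fanning out over arrays.
    static List<Object> valuesAt(Object node, String[] path, int depth) {
        if (node instanceof List<?> list) {
            List<Object> out = new ArrayList<>();
            for (Object element : list) {
                out.addAll(valuesAt(element, path, depth));
            }
            return out;
        }
        if (depth == path.length) {
            return Collections.singletonList(node);
        }
        if (node instanceof Map<?, ?> map && map.containsKey(path[depth])) {
            return valuesAt(map.get(path[depth]), path, depth + 1);
        }
        return List.of(); // path does not exist in this document
    }

    // True if any value at the path equals the expected value.
    static boolean matches(Map<String, Object> doc, String path, Object expected) {
        return valuesAt(doc, path.split("\\."), 0).contains(expected);
    }

    // A full-scan "find": fine for a sketch, whereas a real store would use indexes.
    static List<Map<String, Object>> find(List<Map<String, Object>> docs, String path, Object expected) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> doc : docs) {
            if (matches(doc, path, expected)) {
                result.add(doc);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> gaga = Map.of(
                "name", "Lady Gaga",
                "hobbies", List.of("Dressing up", "Singing"),
                "albums", List.of(
                        Map.of("name", "The Fame", "release_year", "2008"),
                        Map.of("name", "Born This Way", "release_year", "2011")));
        System.out.println(matches(gaga, "albums.name", "Born This Way")); // true
        System.out.println(matches(gaga, "hobbies", "Singing"));           // true
        System.out.println(matches(gaga, "albums.release_year", "1999"));  // false
        System.out.println(find(List.of(gaga), "albums.release_year", "2008").size()); // 1
    }
}
```

With a key/value store, the same question would require fetching every opaque value and parsing it on the client side.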

What if you didn’t have to choose?
[Diagram: the same data accessed through JPA, as documents, and through JDBC]

A Brief Intro to GigaSpaces
- In Memory Data Grid
  - With optional write-behind to secondary storage
- Tuple based
  - Aware of nested tuples (and soon collections)
  - Document-like
  - Rich querying and map/reduce semantics
- Transparent partitioning & HA
  - Fixed hashing based on a chosen property
- Transactional (like, ACID)
  - Local (single partition)
  - Distributed (multiple partitions)

Use the Right API for the Job
Even for the same data…
- POJO & JPA for Java apps with a complex domain model
- Document for a more dynamic view
- Memcached for simple, language-neutral data access
- JDBC for:
  - Interaction with legacy apps
  - Flexible ad hoc querying (e.g. projections)

Memcached (the Daemon is in the Details)
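Much of memcached’s language neutrality comes from its plain-text protocol. A client only has to write lines like these to a TCP socket. This Java sketch just builds the set and get command strings and does no networking. The key and value are hypothetical, and nothing here is specific to how GigaSpaces exposes its memcached interface.

```java
import java.nio.charset.StandardCharsets;

// Builds memcached text-protocol commands as strings (no network I/O). Any language
// that can write these bytes to a TCP socket can talk to a memcached-compatible server.
final class MemcachedText {
    // Keys: at most 250 bytes, no whitespace or control characters.
    static String checkKey(String key) {
        boolean bad = key.isEmpty()
                || key.getBytes(StandardCharsets.UTF_8).length > 250
                || key.chars().anyMatch(c -> c <= ' ' || c == 127);
        if (bad) {
            throw new IllegalArgumentException("invalid memcached key: " + key);
        }
        return key;
    }

    // set <key> <flags> <exptime> <bytes>\r\n<data block>\r\n
    static String set(String key, int flags, int exptimeSeconds, String value) {
        int bytes = value.getBytes(StandardCharsets.UTF_8).length; // length in bytes, not chars
        return "set " + checkKey(key) + " " + flags + " " + exptimeSeconds + " " + bytes
                + "\r\n" + value + "\r\n";
    }

    // get <key> [<key> ...]\r\n  (the server answers with VALUE lines followed by END)
    static String get(String... keys) {
        for (String key : keys) {
            checkKey(key);
        }
        return "get " + String.join(" ", keys) + "\r\n";
    }

    public static void main(String[] args) {
        // Hypothetical key and value, echoing the speaker record from the opening slide.
        System.out.print(set("speaker:uri", 0, 3600, "{\"company\":\"GigaSpaces\"}"));
        System.out.print(get("speaker:uri"));
    }
}
```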

SQL/JDBC – Query Them All
- Query may involve Map/Reduce
- Reduce phase includes merging and sorting
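This is a sketch of how a query such as SELECT name, balance FROM accounts ORDER BY balance DESC LIMIT 3 might run over a partitioned store. Each partition sorts and limits its own rows (map), and the reducer does a k-way merge of the already-sorted partial results. The table, columns and data are hypothetical. This illustrates the merge-and-sort idea from the slide, not GigaSpaces’ query engine.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Sketch of answering
//   SELECT name, balance FROM accounts ORDER BY balance DESC LIMIT n
// over a partitioned store: each partition sorts and limits locally (map phase),
// then a reducer merges the sorted partial results (reduce phase).
// The table and column names are hypothetical.
final class ScatterGather {
    record Row(String name, int balance) {}

    static final Comparator<Row> BY_BALANCE_DESC =
            Comparator.comparingInt(Row::balance).reversed();

    // Map phase, run on each partition: local sort plus local LIMIT.
    static List<Row> mapPhase(List<Row> partitionData, int limit) {
        List<Row> sorted = new ArrayList<>(partitionData);
        sorted.sort(BY_BALANCE_DESC);
        return new ArrayList<>(sorted.subList(0, Math.min(limit, sorted.size())));
    }

    // Reduce phase: k-way merge of already-sorted partial results, stopping at the limit.
    static List<Row> mergeTopN(List<List<Row>> partials, int limit) {
        // Heap entries are {partition index, position in that partition's list}.
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> BY_BALANCE_DESC.compare(
                partials.get(a[0]).get(a[1]), partials.get(b[0]).get(b[1])));
        for (int p = 0; p < partials.size(); p++) {
            if (!partials.get(p).isEmpty()) {
                heap.add(new int[] {p, 0});
            }
        }
        List<Row> merged = new ArrayList<>();
        while (!heap.isEmpty() && merged.size() < limit) {
            int[] top = heap.poll();
            List<Row> source = partials.get(top[0]);
            merged.add(source.get(top[1]));
            if (top[1] + 1 < source.size()) {
                heap.add(new int[] {top[0], top[1] + 1});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<Row>> partitions = List.of(
                List.of(new Row("a", 10), new Row("b", 70)),
                List.of(new Row("c", 40), new Row("d", 90), new Row("e", 5)));
        List<List<Row>> partials = new ArrayList<>();
        for (List<Row> data : partitions) {
            partials.add(mapPhase(data, 3));
        }
        System.out.println(mergeTopN(partials, 3)); // rows d (90), b (70), c (40)
    }
}
```

Each partition has to return up to the full LIMIT, because the global top 3 may all live on one partition. The merge itself is cheap, but the data shipped to the reducer grows with the number of partitions. That is part of why distributed sorting and aggregation can be expensive.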

SQL/JDBC – Things to Consider
- Unique and FK constraints are not practically enforceable
- Sorting and aggregation may be expensive
- Distributed transactions are evil
  - Stay local…
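Here is what "stay local" means in practice. If entries are routed by fixed hashing on a chosen property, a transaction is local only when every entry it touches shares a partition. The sketch routes a hypothetical Account by its ownerId, so moving money between two accounts of the same user stays on one partition, while touching another user’s account spans two. The class and field names are illustrative, and the routing function is a plain hashCode modulo, not GigaSpaces’ actual implementation.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Sketch of routing entries by a chosen property and checking whether a transaction
// would stay local. Account and ownerId are hypothetical names.
final class Routing {
    // Routed by ownerId, so all accounts of one user land on the same partition.
    record Account(String accountId, String ownerId) {}

    // Fixed hashing on the routing property.
    static int partitionOf(Object routingValue, int partitions) {
        return Math.floorMod(routingValue.hashCode(), partitions);
    }

    // The set of partitions a transaction over these accounts would touch.
    static Set<Integer> partitionsTouched(List<Account> accounts, int partitions) {
        Set<Integer> touched = new TreeSet<>();
        for (Account account : accounts) {
            touched.add(partitionOf(account.ownerId(), partitions));
        }
        return touched;
    }

    // Local (single-partition) transaction vs. distributed (multi-partition) one.
    static boolean isLocal(List<Account> accounts, int partitions) {
        return partitionsTouched(accounts, partitions).size() <= 1;
    }

    public static void main(String[] args) {
        int partitions = 4;
        Account checking = new Account("acc-1", "user-7");
        Account savings = new Account("acc-2", "user-7");
        Account someoneElse = new Account("acc-3", "user-9");
        System.out.println(isLocal(List.of(checking, savings), partitions)); // true
        // Two partitions here: this would be a distributed transaction.
        System.out.println(partitionsTouched(List.of(checking, someoneElse), partitions));
    }
}
```

By the same reasoning, a uniqueness check on any property other than the routing property would have to consult every partition. That is why such constraints are hard to enforce in a partitioned store.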

JPA
It’s all about relationships…
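The embed-or-reference choice discussed next can be sketched in plain Java. The User, Account and *Row records are hypothetical. In JPA terms, the first form roughly corresponds to an @ElementCollection of @Embeddable objects and the second to a @OneToMany association with a separate entity. With embedding, a user and its accounts form one entry that can be routed to one partition, and "users with a checking account" is a filter over each user. With references, the same question needs a join on userId, and the joined entries may live on different partitions.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Plain-Java sketch contrasting an embedded (owned) relationship with a referenced one.
// All record and field names are hypothetical.
final class Relationships {
    // Embedded: accounts are part of the user entry.
    record Account(String id, String type) {}
    record User(String id, String name, List<Account> accounts) {}

    // Referenced: accounts are separate entries holding a foreign key to their user.
    record UserRow(String id, String name) {}
    record AccountRow(String id, String userId, String type) {}

    // Embedded query (user.accounts.type = 'checking'): each user is evaluated on its own.
    static List<String> usersWithEmbedded(List<User> users, String type) {
        return users.stream()
                .filter(u -> u.accounts().stream().anyMatch(a -> a.type().equals(type)))
                .map(User::name)
                .collect(Collectors.toList());
    }

    // Referenced query: the same question needs a join on userId across two collections.
    static List<String> usersWithJoined(List<UserRow> users, List<AccountRow> accounts, String type) {
        Set<String> userIds = accounts.stream()
                .filter(a -> a.type().equals(type))
                .map(AccountRow::userId)
                .collect(Collectors.toSet());
        return users.stream()
                .filter(u -> userIds.contains(u.id()))
                .map(UserRow::name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<User> embedded = List.of(
                new User("u1", "Alice", List.of(new Account("a1", "checking"))),
                new User("u2", "Bob", List.of(new Account("a2", "savings"))));
        System.out.println(usersWithEmbedded(embedded, "checking")); // [Alice]

        List<UserRow> users = List.of(new UserRow("u1", "Alice"), new UserRow("u2", "Bob"));
        List<AccountRow> accounts = List.of(
                new AccountRow("a1", "u1", "checking"), new AccountRow("a2", "u2", "savings"));
        System.out.println(usersWithJoined(users, accounts, "checking")); // [Alice]
    }
}
```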

JPA Relationships
To embed or not to embed, that is the question…
Embedded:
- Easy to partition and scale
- Easy to query: user.accounts.type = 'checking'
- Owned relationships only

Not embedded:
- Any type of relationship
- Querying involves joining

Summary
- One API doesn’t fit all
- Use the right API for the job
- Know the tradeoffs
  - Always ask what you’re giving up, not just what you’re gaining

Thank You!
@uri1803
http://www.gigaspaces.com
