Non Relational Databases

§ Focus
§ Raising awareness
§ Trends
§ High level

§ Questions
§ Why are non-relational databases increasing in usage?
§ What types or categories exist?
§ What are some examples in each category?
§ Why should I [the developer, the administrator, etc.] care?

A View of the Non-Relational Database Landscape

§ Trend 1: Data is becoming more and more connected
§ Joins, joins, and more joins (relationships are exploding)

§ Trend 2: Data sets are becoming larger and larger
§ Instruments dump massive amounts of data in the lab

§ Trend 3: Data is becoming less and less structured

Why Are Non-Relational DBs Increasing In Usage?

§ “Trend” 4: Cloud Computing
§ ..and perhaps more specifically, the scaling and fault tolerance needs.
§ For cloud providers, these are required hence addressed from the outset.
§ Backing up is replaced with having multiple active copies…
§ Data sets exist over multiple machines…
§ Nodes can crash and applications live to see another day…
§ Nodes can be added (or removed) at any point in time…


Why Are Non-Relational DBs Increasing In Usage?

§ What is ACID?
§ A promise ring your RDBMS wears.
§ Atomic, Consistent, Isolated, Durable
§ ACID trips when:
§ Downtime is unacceptable
§ Reliability requires two or more nodes
§ Transactions have to work across a network

§ What is CAP Theorem?
§ A distributed system can guarantee at most two of these three:
§ Consistency (every read sees the latest write)
§ Availability (reads and writes succeed all the time)
§ Partition Tolerance (keeps working when nodes cannot reach each other)

§ What is BASE?
§ More people much smarter than me came up with an ACID alternative:
§ Basically Available (appears to work all the time)
§ Soft state (doesn’t have to be consistent all the time…)
§ Eventually consistent (…but eventually it will be)
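
The "soft state, eventually consistent" idea can be sketched with two replicas that accept writes independently and reconcile later. This is a toy illustration only: the `Replica` class, the `sync` function, and the last-write-wins rule based on a version number are assumptions made for the example, not how any particular product works.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of the data; each key maps to (version, value)."""
    data: dict = field(default_factory=dict)

    def write(self, key, value, version):
        # Keep the write only if it is newer than what this replica holds.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(a, b):
    """Exchange entries in both directions so both replicas converge."""
    for key, (version, value) in list(a.data.items()):
        b.write(key, value, version)
    for key, (version, value) in list(b.data.items()):
        a.write(key, value, version)


node1, node2 = Replica(), Replica()
node1.write("color", "green", version=1)  # accepted by node1 only
stale = node2.read("color")               # node2 has not heard about it yet
sync(node1, node2)                        # background replication catches up
fresh = node2.read("color")               # now both replicas agree
```

Between the write and the `sync` call the two replicas disagree (soft state); after it they hold the same value (eventual consistency).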

Turn Up The BASE

§ Key Value Databases: stores entities as key value pairs in large hash tables
§ Column-Oriented Databases: stores entities by column (versus row)
§ Document Databases: stores documents (JSON)
§ Graph Databases: stores entities as nodes and edges
§ Distributed Databases: more attribute than type!

Non-Relational Database Landscape

Database System | Type | Open Source/Commercial/Proprietary
Dynamo | Key Value | Proprietary (Amazon)
SimpleDB | Key Value | Commercial (Amazon Web Services)
Project Voldemort | Key Value | Open Source (started @ LinkedIn)
Memcached | Key Value | Open Source
Redis | Key Value | Open Source
Tokyo Cabinet | Key Value | Open Source
Cassandra | Column-oriented * | Open Source (started @ Facebook)
BigTable | Column-oriented * | Proprietary (Google), Commercial (AppEngine)
Hypertable | Column-oriented * | Open Source (implementation of BigTable)
HBase | Column-oriented * | Open Source (implementation of BigTable)
CouchDB | Document | Open Source
MongoDB | Document | Open Source
Neo4j | Graph | Open Source

Notable Non-Relational Databases

§ Concepts
§ Domains: similar to table concept except schema-less.
§ Keys: arbitrary value.
§ Values: arbitrary blobs.
§ No explicit relationships between domains or within a domain.

§ Access
§ API (often SOAP or RESTful).
§ Some provide SQL-like syntax.
§ Basic filter predicates (=, !=, <, >, <=, >=).

§ Integrity
§ Often contained in application code

Key | Attributes
1 | Make: Nissan, Model: Pathfinder, Color: Green, Year: 2003
2 | Make: Nissan, Model: Pathfinder, Color: Green, Year: 2003, Transmission: Auto
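
A minimal sketch of the concepts above, assuming a domain can be modeled as a plain dictionary of schema-less items. The `select` helper and the `PREDICATES` table are hypothetical names invented for this illustration; they are not the API of SimpleDB or any other product.

```python
import operator

# A "domain" modeled as a dict: key -> dict of attributes (no fixed schema).
cars = {
    1: {"Make": "Nissan", "Model": "Pathfinder", "Color": "Green", "Year": 2003},
    2: {"Make": "Nissan", "Model": "Pathfinder", "Color": "Green", "Year": 2003,
        "Transmission": "Auto"},
}

# The basic filter predicates listed on the slide.
PREDICATES = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, ">": operator.gt,
    "<=": operator.le, ">=": operator.ge,
}


def select(domain, attribute, op, value):
    """Return keys whose item has `attribute` and satisfies the predicate."""
    compare = PREDICATES[op]
    return [key for key, item in domain.items()
            if attribute in item and compare(item[attribute], value)]


autos = select(cars, "Transmission", "=", "Auto")   # only item 2 has this attribute
recent = select(cars, "Year", ">=", 2000)           # both items qualify
```

Item 1 simply lacks a `Transmission` attribute; nothing in the store forces the two items to share a shape, which is why integrity checks tend to live in application code.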

Key Value Databases

§ Memcached
§ Originally developed to speed up LiveJournal.com.
§ Generic in nature but intended for use in alleviating database load.
§ Lightning fast, distributed, RAM only, no persistence.
§ “Everyone” uses it: Facebook, Digg, Slashdot, Twitter, YouTube, SourceForge, …
// Before: every call hits the database.
function get_foo(int userid)
{
    result = db_select("SELECT * FROM users WHERE userid = ?", userid);
    return result;
}

// After: check memcached first; on a miss, query the database and cache the row.
function get_foo(int userid)
{
    result = memcached_fetch("userrow:" + userid);
    if (!result) {
        result = db_select("SELECT * FROM users WHERE userid = ?", userid);
        memcached_add("userrow:" + userid, result);
    }
    return result;
}
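
The same read-through pattern can be sketched in runnable Python. Everything here is a stand-in so the example runs without a server: `FakeCache` is a hypothetical in-memory substitute for a memcached client, and `USERS` with `db_select_user` substitutes for the users table and the SQL query.

```python
class FakeCache:
    """In-memory stand-in for a memcached client (hypothetical, for illustration)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def add(self, key, value):
        # Store only if the key is absent, mirroring the "add" call on the slide.
        self._store.setdefault(key, value)


USERS = {42: {"userid": 42, "name": "Moe"}}  # stand-in for the users table
db_calls = 0                                  # counts simulated database hits


def db_select_user(userid):
    """Stand-in for SELECT * FROM users WHERE userid = ?"""
    global db_calls
    db_calls += 1
    return USERS.get(userid)


cache = FakeCache()


def get_user(userid):
    key = f"userrow:{userid}"
    result = cache.get(key)
    if result is None:                  # cache miss: fall back to the database
        result = db_select_user(userid)
        if result is not None:
            cache.add(key, result)      # populate the cache for next time
    return result


first = get_user(42)    # miss: goes to the "database"
second = get_user(42)   # hit: served from the cache
```

After both calls `db_calls` is 1, which is the load reduction the slide is pointing at. Because the cache has no persistence, the application must still be able to rebuild every entry from the database.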

Key Value Databases: Memcached

§ SimpleDB
§ Written in Erlang (luckily you don’t need to know it to use it).
§ Eventual consistency is a key feature (concurrency!!)
§ Available via Amazon Web Services at very low cost.
§ Very common to use it in conjunction with other AWS offerings (EC2, S3, SQS).

Key Value Databases: SimpleDB

§ SimpleDB Limitations

Key Value Databases: SimpleDB

§ Overview
EmployeeID Name Position
1 Moe Director
2 Larry Developer
3 Curly Analyst

A gross (emphasis on gross) simplification of what this serializes to…
ROW: 1,Moe,Director;2,Larry,Developer;3,Curly,Analyst
COLUMN: 1,2,3;Moe,Larry,Curly;Director,Developer,Analyst
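
The two layouts can be reproduced with a short sketch. The function names are made up for this illustration, and the comma/semicolon format is only the slide's simplification, not a real on-disk format.

```python
rows = [
    (1, "Moe", "Director"),
    (2, "Larry", "Developer"),
    (3, "Curly", "Analyst"),
]


def serialize_row_oriented(records):
    """Write each record's values together, one record after another."""
    return ";".join(",".join(str(v) for v in record) for record in records)


def serialize_column_oriented(records):
    """Write each column's values together, one column after another."""
    columns = zip(*records)  # transpose rows into columns
    return ";".join(",".join(str(v) for v in column) for column in columns)


row_layout = serialize_row_oriented(rows)
column_layout = serialize_column_oriented(rows)
```

In the column layout all names sit next to each other, so a query that needs only the Name column reads one contiguous run instead of skipping through every record.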

§ Where It Shines
§ Querying many rows for smaller subsets of data (not all columns)
§ Maximizes disk performance (read scans)

§ Where It Is Outperformed
§ Querying all columns of a single row
§ Writing a new row if all of the column data is supplied at the same time

Column Oriented Databases

§ BigTable (and HBase, and Hypertable)
§ BigTable == Google
§ HBase == Interpretation of BigTable (Java) + Hadoop
§ Hypertable == Interpretation of BigTable (C++) + Hadoop

§ Collections of “Multi-dimensional Sparse Maps”
§ cell => (row, column, timestamp)

§ Rows
§ Name is an arbitrary string.
§ Ordered lexicographically.
§ Atomic access.
§ Creation is implicit.

§ Columns
§ Two level naming structure
§ family:optional_qualifier
§ Families are a unit of access.
§ Few column families in a table
§ Families can be marked with attributes.
§ Families can be assigned to locality groups
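
A minimal sketch of a "multi-dimensional sparse map", assuming nested dictionaries are an acceptable toy model. `SparseTable` and its methods are hypothetical names for this illustration and do not mirror the BigTable, HBase, or Hypertable APIs.

```python
from collections import defaultdict


class SparseTable:
    """Toy sparse map: (row, column, timestamp) -> value."""

    def __init__(self):
        # row -> column -> {timestamp: value}; rows and columns appear on first write.
        self._cells = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        self._cells[row][column][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the value at `timestamp`, or the newest version if omitted."""
        versions = self._cells.get(row, {}).get(column, {})
        if not versions:
            return None
        if timestamp is None:
            timestamp = max(versions)
        return versions.get(timestamp)

    def scan(self, family):
        """Yield (row, column) pairs for one family, rows in lexicographic order."""
        prefix = family + ":"
        for row in sorted(self._cells):
            for column in sorted(self._cells[row]):
                if column == family or column.startswith(prefix):
                    yield row, column


table = SparseTable()
table.put("www.cnn.com", "contents", 1, "<html>v1</html>")
table.put("www.cnn.com", "contents", 2, "<html>v2</html>")
table.put("www.cnn.com", "anchor:bms.com", 1, "CNN")

latest = table.get("www.cnn.com", "contents")
older = table.get("www.cnn.com", "contents", timestamp=1)
anchors = list(table.scan("anchor"))
```

Only cells that were written take up space, and column names follow the `family:optional_qualifier` shape, so `scan("anchor")` can treat the family as a unit.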

Column Like Databases: BigTable & Co.

[Figure: three example tables in the row / column-family model]
§ Rows “www.cnn.com”, “www.cnn.com/...”, “www.cnn.com/.../...”; columns content, language, anchor: bms.com
§ Rows “Application A”, “Application B”, “Application C”; columns jonest: settings, schmoej: settings
§ Rows “Sample X”, “Sample Y”, “Sample Z”; columns assay:a, assay:b

Column Like Databases: BigTable & Co.

§ Overview
§ Similar to key value stores
§ Most employ JSON.
§ Inherently schema-less
§ Most are denormalized.
§ Often composed of collections (akin to tables w/o schema)

Document Databases

“… is a distributed, fault tolerant, and schema-free
document-oriented database accessible via a RESTful
HTTP/JSON API…”

§ Other Tidbits
§ Believe it or not, idea was inspired by Lotus Notes.
§ Hosted at the Apache Software Foundation, written in Erlang.
§ Futon: clean, stream-lined administrator interface.

§ Basic API
§ Create: HTTP PUT (or POST to let the server assign an ID)
§ Read: HTTP GET
§ Update: HTTP PUT (supplying the current revision)
§ Delete: HTTP DELETE

§ Adding Structure To Semi-Structured Data
§ Views are the method of aggregating and reporting on documents.
§ Built on-demand, dynamically, and do not affect underlying documents.
§ Views are persisted.
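
The idea behind a view can be sketched as a map function run over every document to produce sorted key/value rows. CouchDB views are normally written as JavaScript map functions stored in design documents; the Python below only mimics the concept, and `build_view`, `by_make`, and the sample documents are hypothetical.

```python
documents = [
    {"_id": "a", "type": "car", "make": "Nissan", "year": 2003},
    {"_id": "b", "type": "car", "make": "Toyota"},
    {"_id": "c", "type": "note", "text": "no make here"},
]


def by_make(doc):
    """Map function: yield (key, value) rows for documents that qualify."""
    if "make" in doc:
        yield doc["make"], doc["_id"]


def build_view(docs, map_fn):
    """Run the map function over every document and sort rows by key."""
    rows = [row for doc in docs for row in map_fn(doc)]
    return sorted(rows)


view = build_view(documents, by_make)
```

The documents keep their different shapes and are left untouched; the view is a separate, derived index that adds structure only for the query at hand.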

Document Databases: CouchDB

§ Overview
§ Nodes represent entities.
§ Edges represent relationships.
§ Nodes and edges can have associated attributes (key values).
§ Most anything can be described as a graph.
§ Key value store with full support for relationships.
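
A minimal sketch of nodes, edges, and attributes on both, using plain Python structures. The relationship name `REPORTS_TO`, the `since` values, and the helper functions are invented for illustration and are not the Neo4j API.

```python
from collections import deque

# Nodes: key -> attributes (key values).
nodes = {
    "moe": {"kind": "person", "position": "Director"},
    "larry": {"kind": "person", "position": "Developer"},
    "curly": {"kind": "person", "position": "Analyst"},
}

# Edges: (source, relationship, target, attributes).
edges = [
    ("larry", "REPORTS_TO", "moe", {"since": 2008}),
    ("curly", "REPORTS_TO", "larry", {"since": 2009}),
]


def neighbors(node, relationship):
    """Targets reachable from `node` over one edge of the given type."""
    return [target for source, rel, target, _ in edges
            if source == node and rel == relationship]


def reachable(start, relationship):
    """Breadth-first traversal following one relationship type."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        current = queue.popleft()
        for nxt in neighbors(current, relationship):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


chain = reachable("curly", "REPORTS_TO")  # curly's management chain
```

The traversal follows relationships directly instead of joining tables, which is the "full support for relationships" the slide refers to.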

Graph Databases

§ Overview
§ Open source.
§ Java based.
§ Lightweight (single <500k JAR with minimal dependencies).
§ Still very early in development but looks promising.
§ Can handle graphs of several billion nodes/relationships/properties.
§ Disk based, solid state drive (SSD) ready.
§ Optional layers to expose it as an RDF store (OWL, SPARQL).
§ Has RDBMS features (ACID, durable persistence)

Graph Databases: Neo4j

§ If you’re in the cloud, you’re going to use them.
§ Amazon Web Services: SimpleDB
§ Google App Engine: BigTable
§ Open Source: Memcached, HBase, Hypertable, Cassandra, and more…

§ Break the habit; relational databases do not fit every problem.
§ Stuffing files into a RDBMS, maybe there’s something better?
§ Using a RDBMS for caching, perhaps a lighter-weight solution is better?
§ Cramming log data into a RDBMS, perhaps a key value store is better?

§ Despite the hype, relational databases are not doomed.
§ Though in my opinion their role and place will certainly change.

§ Scaling is a real challenge for relational databases.
§ Sharding is a band-aid, not feasible beyond a few nodes.

§ There is a hit in overcoming the initial learning curve
§ It changes how you build applications

Parting Thoughts & Musings
