Riak
How does Riak compare to Cassandra?
/usr/bin/whoami

•   Russell Smith

•   Work for UKD1, a consultancy for web-related tech

•   Help with application design, infrastructure, capacity planning, etc.

•   Mainly for the video-games industry & web startups

•   Twitter: @ukd1
What is Riak?
•   Pronounced ‘ree-ack’

•   A scalable, high-availability, distributed key-value store

•   Modelled on Amazon’s description of Dynamo, like Cassandra

•   Commercially supported / developed by Basho

•   Written in Erlang

•   Open source - Apache License (2.0)
What isn’t Riak?


•   Schema enforced - store what you want

•   A relational database - no joins or constraint enforcement, as there are no global locks

•   A competitor to in-memory column-based databases - it is not intended to compete with them
What versions are available?

•   Riak

•   Riak Search (Riak + distributed full-text indexing / search)

•   Riak Enterprise - commercially licensed - supports extra features for
    enterprise use (SNMP, data-centre awareness, etc.)

•   Luwak (Riak + app for storing large files; it’s bundled by default)
Riak’s take on CAP

•   The CAP trade-off is exposed to the end user - allowing tuning of N, R & W

•   N - # of replicas (nodes each value is stored on), set per bucket (default of 3)

•   R - # of replicas that must respond for a read to succeed (per request)

•   W - # of replicas that must respond for a write to succeed (a number, all, quorum
    or default for the bucket)
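
The interplay of N, R and W can be sketched in a few lines of Python. This is an illustration rather than Riak code: the function names are hypothetical, and "quorum" is assumed here to mean a simple majority of N.

```python
def quorum(n: int) -> int:
    """Majority of n replicas (the assumed meaning of 'quorum')."""
    return n // 2 + 1


def resolve(setting, n: int, bucket_default="quorum") -> int:
    """Turn an R or W setting (number, 'all', 'quorum', 'default') into a count."""
    if setting == "default":
        setting = bucket_default
    if setting == "all":
        return n
    if setting == "quorum":
        return quorum(n)
    count = int(setting)
    if not 1 <= count <= n:
        raise ValueError(f"value must be between 1 and {n}")
    return count


def read_and_write_sets_overlap(n: int, r, w) -> bool:
    """True when R + W > N, so any read set shares a replica with any write set."""
    return resolve(r, n) + resolve(w, n) > n
```

With N = 3, quorum reads and quorum writes overlap (2 + 2 > 3), while R = 1 and W = 1 favour latency and availability at the cost of possibly reading a stale value. The overlap rule describes a healthy cluster only; it says nothing about behaviour during failures.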
Client libraries


•   PHP, Python, Ruby, Java, Erlang, JavaScript, .NET

•   Community client libraries;

    •   C, Clojure, Go, Griffon, Groovy, Haskell, Perl, Scala, Smalltalk
What can you store?

•   Values against keys

•   Keys are organised into buckets

•   Practical value limit of 64 MB

•   For large files; Luwak (built in > 0.13) splits them into smaller blocks
Querying

•   Two main interfaces; HTTP & Protocol Buffers

•   HTTP API is mainly REST - GET, PUT, DELETE

•   Riak stores the key, value & metadata about the key;

    •   Content Type, Charset, Encoding & link data

    •   Also: any custom metadata
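
A minimal sketch of what the HTTP interface looks like from a client, using only the Python standard library. The node address, bucket and key are placeholders, the helper names are hypothetical, and the `/riak/<bucket>/<key>` path and `X-Riak-Meta-` header prefix reflect the HTTP API of this era, so check them against the version you run. The requests are only built here, not sent.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumed address of a local node; adjust to your own cluster.
BASE_URL = "http://127.0.0.1:8098/riak"


def object_url(bucket: str, key: str) -> str:
    """URL of a single object inside a bucket."""
    return f"{BASE_URL}/{quote(bucket, safe='')}/{quote(key, safe='')}"


def store_request(bucket, key, body: bytes, content_type: str, meta=None) -> Request:
    """PUT a value with its content type and optional custom metadata."""
    headers = {"Content-Type": content_type}
    for name, value in (meta or {}).items():
        headers[f"X-Riak-Meta-{name}"] = value
    return Request(object_url(bucket, key), data=body, headers=headers, method="PUT")


def fetch_request(bucket, key, r=None) -> Request:
    """GET a value, optionally overriding R for this request."""
    url = object_url(bucket, key)
    if r is not None:
        url += f"?r={r}"
    return Request(url, method="GET")


def delete_request(bucket, key) -> Request:
    """DELETE a value."""
    return Request(object_url(bucket, key), method="DELETE")
```

Passing one of these objects to `urllib.request.urlopen` would perform the call against a running node.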
Links


•   Used to store one-way relationships between objects

•   Stored in object metadata

•   Link-walking uses MapReduce
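
As a rough sketch, links travel over HTTP as a `Link` header on the object, and a link walk is expressed as extra path segments of the form `bucket,tag,keep` appended to the object URL. The helper names and the sample buckets, keys and tags below are hypothetical; the header and path shapes follow the HTTP API of this era and should be verified against your version.

```python
def link_header(links) -> str:
    """Build a Link header value from (bucket, key, tag) tuples."""
    return ", ".join(
        f'</riak/{bucket}/{key}>; riaktag="{tag}"' for bucket, key, tag in links
    )


def walk_path(bucket: str, key: str, steps) -> str:
    """Build a link-walking path.

    Each step is (bucket, tag, keep); None acts as the '_' wildcard,
    and keep is True/False to force returning (or not) that step's results.
    """
    def part(value):
        if value is None:
            return "_"
        if value is True:
            return "1"
        if value is False:
            return "0"
        return str(value)

    segments = [",".join(part(v) for v in step) for step in steps]
    return "/".join([f"/riak/{bucket}/{key}"] + segments)
```

For example, `walk_path("people", "alice", [("people", "friend", True)])` describes following every link tagged `friend` from `alice` into the `people` bucket.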
MapReduce

•   Designed to be used for web-page-speed requests

•   Built in

•   Map / Reduce functions are written in JavaScript or Erlang

•   Can do re-reduce

•   Streaming MapReduce
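
Re-reduce means a reduce function may be fed its own earlier output, so it has to accept partial results as input. The sketch below models only that data flow in Python; real Riak functions are written in JavaScript or Erlang, and the function names and batching here are illustrative assumptions.

```python
from collections import Counter


def map_words(value: str):
    """Map phase: one partial word count per stored value."""
    return [Counter(value.split())]


def reduce_counts(partials):
    """Reduce phase: merges partial counts, including its own earlier output."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return [total]


def run_job(values, batch_size=2):
    mapped = [out for value in values for out in map_words(value)]
    # First pass: reduce in batches, as if results arrived from separate nodes.
    partials = []
    for start in range(0, len(mapped), batch_size):
        partials.extend(reduce_counts(mapped[start:start + batch_size]))
    # Re-reduce: the same function merges the batch results.
    return reduce_counts(partials)[0]
```

Because `reduce_counts` returns the same shape it consumes, the final answer does not depend on how the inputs were batched.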
Vector clocks
•   Each value is tagged with a vector clock

•   Riak can determine if values;

    •   Are direct descendants of a single object

    •   Share a common parent

    •   Are unrelated

•   In Riak each object has a vector clock

•   Cassandra uses timestamps - problems can occur with out-of-sync clocks
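
A minimal sketch of how two vector clocks are compared, assuming a clock is represented as a mapping from actor ID to update counter. The representation and function names are illustrative, not Riak's internal format.

```python
def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen every update recorded in clock b."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a: dict, b: dict) -> str:
    """Classify the relationship of clock a to clock b."""
    if a == b:
        return "equal"
    if descends(a, b):
        return "descendant"
    if descends(b, a):
        return "ancestor"
    # Neither has seen all of the other's updates: the values conflict.
    return "concurrent"
```

In this simplified model, "concurrent" covers both the common-parent and the unrelated cases: neither value can safely overwrite the other. A timestamp comparison cannot express that outcome, which is why skewed clocks can silently pick the wrong winner.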
Siblings
•   Siblings are different versions of the same document which Riak has
    not merged

•   Occur only if allow_mult is enabled on a bucket AND one of the following happens;

    •   Concurrent writes with the same vector clock value

    •   A write with a stale vector clock

    •   A write with no vector clock passed
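
The conditions above can be modelled with a toy in-memory bucket. This is a deliberately simplified sketch: the `Bucket` class, its methods and the clock handling are hypothetical and only approximate how a node decides whether a write supersedes what is stored.

```python
def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen every update recorded in clock b."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


class Bucket:
    def __init__(self, allow_mult: bool = False):
        self.allow_mult = allow_mult
        self.objects = {}  # key -> list of (clock, value) versions

    def get(self, key):
        return list(self.objects.get(key, []))

    def put(self, key, value, client_id, context=None):
        """Store a value; context is the clock the client last read, if any."""
        base = dict(context or {})
        existing = self.objects.get(key, [])
        seen = max([clock.get(client_id, 0) for clock, _ in existing]
                   + [base.get(client_id, 0)])
        new_clock = {**base, client_id: seen + 1}
        if self.allow_mult:
            # Keep every stored version the client had not seen: a sibling.
            kept = [(c, v) for c, v in existing if not descends(base, c)]
        else:
            kept = []  # simplification: the newest write simply wins
        self.objects[key] = kept + [(new_clock, value)]
        return new_clock
```

A client resolves siblings by reading all versions, merging them in application code, and writing the result back with a context that covers every sibling's clock.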
Pre & Post Commit Hooks

•   Allow the object to be written

•   Modify the object

•   Fail the update

•   They are per-bucket (stored in the properties)

•   Written in JavaScript (pre-hooks) or Erlang (pre/post-hooks)
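
The three outcomes of a hook (allow, modify, fail) can be modelled as a small pipeline. Real hooks are JavaScript or Erlang functions registered in the bucket properties; the Python below is only a language-neutral sketch of the control flow, and every name in it is hypothetical.

```python
import re


class HookFailure(Exception):
    """Raised by a pre-commit hook to reject the write."""


def require_title(obj: dict) -> dict:
    """Pre-commit: fail the update unless a title is present, else allow it."""
    if not obj.get("title"):
        raise HookFailure("title is required")
    return obj


def add_slug(obj: dict) -> dict:
    """Pre-commit: modify the object before it is written."""
    slug = re.sub(r"[^a-z0-9]+", "-", obj["title"].lower()).strip("-")
    return {**obj, "slug": slug}


def write(store: dict, key, obj, precommit=(), postcommit=()):
    """Run pre-commit hooks, store the object, then run post-commit hooks."""
    for hook in precommit:
        obj = hook(obj)  # a HookFailure here aborts before anything is stored
    store[key] = obj
    for hook in postcommit:
        hook(key, obj)  # runs after the write; cannot change or reject it
    return obj
```

Ordering matters in this model: `require_title` runs before `add_slug`, so a rejected object never reaches the modifying hook.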
Admin

•   Super simple;

    •   riak-admin join <node-in-cluster>

    •   riak-admin leave

•   Backup tools are provided...
Backup / restore

•   riak-admin backup|restore <node> <cookie> <output_file> [[node|all]]

•   Alternative is filesystem backup for Bitcask, as it uses append-only files

•   riak-admin backup is storage-engine agnostic

•   riak-admin only backs up KV data; not search indexes (Riak-Search)
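
For scripted backups, the command line above can be assembled programmatically. The sketch below only builds the argument list following the syntax on this slide; the helper name is hypothetical, and the node name, cookie and path in the usage note are placeholders rather than values to copy.

```python
import shlex


def backup_command(node: str, cookie: str, output_file: str, scope: str = "all"):
    """Build the argv for `riak-admin backup` (scope is 'node' or 'all')."""
    if scope not in ("node", "all"):
        raise ValueError("scope must be 'node' or 'all'")
    return ["riak-admin", "backup", node, cookie, output_file, scope]


def as_shell(argv) -> str:
    """Render an argv list as a safely quoted shell command line."""
    return shlex.join(argv)
```

Calling `as_shell(backup_command("riak@127.0.0.1", "riak", "/tmp/riak.bak"))` yields a line that can be logged or handed to `subprocess.run` as a list.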
Storage engines

•   Ships with two default storage engines;

    •   Bitcask - default, best when keyspace < RAM

    •   InnoDB - suggested when keyspace > RAM

•   Also available - Google’s LevelDB. It’s BSD licensed & recently
    integrated, good for large sets.
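
The "keyspace vs RAM" rule of thumb is simple arithmetic. The sketch below is a back-of-the-envelope estimate only: the function names are hypothetical, and the per-key overhead is an assumed placeholder, not a measured figure, so substitute the number documented for your version.

```python
def estimated_keyspace_bytes(num_keys: int, avg_key_bytes: int,
                             overhead_per_key: int = 40) -> int:
    """Rough memory needed to hold every key in RAM (overhead is an assumption)."""
    return num_keys * (avg_key_bytes + overhead_per_key)


def suggest_backend(num_keys: int, avg_key_bytes: int, ram_bytes: int,
                    overhead_per_key: int = 40) -> str:
    """Apply the slide's rule: Bitcask if the keyspace fits in RAM, else InnoDB."""
    needed = estimated_keyspace_bytes(num_keys, avg_key_bytes, overhead_per_key)
    return "bitcask" if needed < ram_bytes else "innodb"
```

One million 60-byte keys need roughly 100 MB under these assumptions, which fits comfortably in RAM; a hundred million such keys need about 10 GB per full copy of the keyspace, which may not.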
Riak-Search
•   Full-text search engine built on top of Riak

•   Realtime

•   Uses Lucene Analyzers; custom ones may be written in Erlang / Java

•   Supports term / field searches, boolean operators, grouping, lexical
    range queries and end-of-word wildcards

•   Will be part of Riak as default from 1.0
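
To make the query types concrete, here is a toy in-memory term index supporting term lookups, end-of-word wildcards and lexical ranges, with boolean operators expressed as set operations. It is a teaching sketch with hypothetical names and has no relation to how Riak Search stores or distributes its index.

```python
import bisect
from collections import defaultdict


class TermIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document IDs
        self.terms = []                   # sorted list of known terms

    def add(self, doc_id, text: str):
        """Index a document using a naive lowercase/whitespace analyzer."""
        for term in text.lower().split():
            if term not in self.postings:
                bisect.insort(self.terms, term)
            self.postings[term].add(doc_id)

    def term(self, word: str) -> set:
        """Exact term search."""
        return set(self.postings.get(word, set()))

    def prefix(self, start: str) -> set:
        """End-of-word wildcard, e.g. prefix('ri') behaves like ri*."""
        result = set()
        for term in self.terms[bisect.bisect_left(self.terms, start):]:
            if not term.startswith(start):
                break
            result |= self.postings[term]
        return result

    def lexical_range(self, low: str, high: str) -> set:
        """Documents containing any term between low and high, inclusive."""
        lo = bisect.bisect_left(self.terms, low)
        hi = bisect.bisect_right(self.terms, high)
        result = set()
        for term in self.terms[lo:hi]:
            result |= self.postings[term]
        return result
```

Boolean AND, OR and NOT then map onto `&`, `|` and `-` between the returned sets, and grouping onto ordinary parentheses.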
Riak > Cassandra

•   Extremely simple to add or remove nodes from a cluster

•   No pre-setup of data model

•   REST & Protobuf API access

•   Commercial support from the original developers, Basho
Riak = Cassandra

•   No single point of failure

•   Linearly scalable

•   High availability

•   Eventually consistent

•   You can choose your own consistency requirements
Riak < Cassandra
•   CQL; an SQL-ish language

•   Range / cover queries are built in (no need to write MapReduce functions)

•   ‘Enterprise’ features (dc / rack awareness) are free & in the open-source build

•   Wide support / training from 3rd-party commercial vendors; DataStax / Acunu / Impetus / Onzra
    http://wiki.apache.org/cassandra/ThirdPartySupport

•   Cassandra is seemingly more popular & has a bigger community

•   Fixed partitions vs MD5 of RandomPartitioner; you can’t reconfigure Riak’s partition count later if you need to - plan carefully with Riak!
    http://wiki.basho.com/Cluster-Capacity-Planning.html
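
Why the partition count matters can be seen from a sketch of a fixed-partition hash ring. The assumptions here are loud ones: a 160-bit SHA-1 ring split into a fixed number of equal partitions, a simplified way of hashing the bucket and key, and a simplified choice of replica partitions. The function names are hypothetical and the details differ from Riak's real implementation.

```python
import hashlib

RING_TOP = 2 ** 160  # size of the SHA-1 hash space


def partition_index(bucket: str, key: str, ring_size: int = 64) -> int:
    """Map a bucket/key pair to one of ring_size equal partitions."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")
    return position // (RING_TOP // ring_size)


def preference_list(bucket: str, key: str, n: int = 3, ring_size: int = 64):
    """Simplified replica placement: the owning partition and the next n - 1."""
    first = partition_index(bucket, key, ring_size)
    return [(first + i) % ring_size for i in range(n)]
```

Every key's home partition is derived from the partition count, so changing that count would remap the keys of every partition, which is why the ring size has to be chosen up front during capacity planning.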
Further reading

•   Basho’s slide decks; http://wiki.basho.com/Slide-Decks.html

•   Commit hooks; http://wiki.basho.com/Pre--and-Post-Commit-Hooks.html

•   Riak / Cassandra; http://wiki.basho.com/Riak-Compared-to-Cassandra.html
Questions?

How does Riak compare to Cassandra? [Cassandra London User Group July 2011]
