Introducing Riak
Kevin A. Smith
Senior Developer
Basho Technologies
What Is Riak?

• Key/Value store
• Document-oriented database
• Web-shaped storage
Key/Value Store


• Data organized by bucket/key pairs
• Simple REST API (GET, PUT, DELETE)
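As a rough sketch of the bucket/key REST model, the helper below builds a request description for each verb. The /riak/<bucket>/<key> path follows the URL form used in the link examples later in the deck; the name kvRequest and the shape of the returned object are hypothetical and chosen only for illustration.

```javascript
// Build a description of a key/value HTTP request.
// Assumption: objects live at /riak/<bucket>/<key>. kvRequest and its
// return shape are illustrative, not part of any Riak client library.
function kvRequest(method, bucket, key, value) {
  const allowed = ["GET", "PUT", "DELETE"];
  if (!allowed.includes(method)) {
    throw new Error("unsupported method: " + method);
  }
  const request = {
    method: method,
    path: "/riak/" + encodeURIComponent(bucket) + "/" + encodeURIComponent(key),
  };
  if (method === "PUT") {
    // Only PUT carries a body; here the value is sent as JSON.
    request.headers = { "Content-Type": "application/json" };
    request.body = JSON.stringify(value);
  }
  return request;
}

console.log(kvRequest("PUT", "artist", "REM", { name: "R.E.M." }));
console.log(kvRequest("GET", "artist", "REM"));
console.log(kvRequest("DELETE", "artist", "REM"));
```

Keeping the request as plain data makes it easy to hand to whichever HTTP client is in use.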
Document Store

• Store values as JSON
• Many clients support automatic JSON
  encoding/decoding
• Javascript Map/Reduce built on top of JSON
Web-Shaped Storage

• Content neutral
• Highly distributed
• Replicated
• Fault-tolerant
What Is Riak?
A flexible storage engine...
...with a REST API...
...and map/reduce capability...
...designed to be fault-tolerant...
...distributed...
...and ops friendly
Influences

• CAP Theorem
• Amazon’s Dynamo Paper
• Experience running large networks
  (Akamai)
CAP Theorem

• Consistent: Reads and writes reflect a globally consistent system state
• Available: System is available for reads and writes
• Partition Tolerant: System can handle the failure of individual parts
Common Wisdom


   Pick two.
The Riak Way

     Pick Two.

For each operation.
Dynamo Influences

• N = The number of replicas
• R = The number of replicas needed for a successful read
• W = The number of replicas needed for a successful write
Dynamo Math


N - R = read fault tolerance
N - W = write fault tolerance
Dynamo Math
N = 4, W = 2, R = 1

4 - 2 = 2 hosts can be down and Riak can still perform writes.
4 - 1 = 3 hosts can be down and Riak can still perform reads.
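The arithmetic above can be written as a small function. This is only a sketch of the N - R and N - W formulas from the slide; the function name and the range check are additions for illustration.

```javascript
// Fault tolerance per the slide's formulas:
// reads tolerate N - R failed hosts, writes tolerate N - W.
function faultTolerance(n, r, w) {
  if (r < 1 || w < 1 || r > n || w > n) {
    throw new RangeError("R and W must be between 1 and N");
  }
  return { read: n - r, write: n - w };
}

console.log(faultTolerance(4, 1, 2)); // { read: 3, write: 2 }
```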
Riak Improvements

• N can vary per bucket
• R and W can vary per operation
   Choose your own fault tolerance/performance tradeoff
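Consistent Hashing

[Diagram: a ring covering the hash space from 0 to 2^160, with marks at 2^160/4 and 2^160/2. Partitions of the ring are assigned to node 0, node 1, node 2, and node 3. hash(<<"artist">>,<<"REM">>) places the bucket/key pair at a point on the ring.]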
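A minimal sketch of the idea in the diagram: hash a bucket/key pair onto a 2^160 ring and find which partition and node the point falls in. SHA-1 is used because it yields 160-bit values. How the pair is encoded before hashing, the partition count of 8, and the round-robin assignment of partitions to nodes are all assumptions made for illustration and are not taken from Riak itself.

```javascript
const crypto = require("crypto");

const RING_SIZE = 2n ** 160n; // SHA-1 digests are 160 bits wide

// Hash a bucket/key pair to a point on the ring.
// Assumption: bucket and key are joined with "/" before hashing;
// the real encoding used by Riak is different.
function ringPosition(bucket, key) {
  const hex = crypto.createHash("sha1").update(bucket + "/" + key).digest("hex");
  return BigInt("0x" + hex);
}

// Split the ring into equal partitions and assign them to nodes
// round-robin (an illustrative scheme, not Riak's claim algorithm).
function ownerOf(bucket, key, partitions, nodes) {
  const sliceSize = RING_SIZE / BigInt(partitions);
  const raw = Number(ringPosition(bucket, key) / sliceSize);
  const partition = Math.min(raw, partitions - 1); // guard uneven division
  return { partition: partition, node: nodes[partition % nodes.length] };
}

const nodes = ["node 0", "node 1", "node 2", "node 3"];
console.log(ownerOf("artist", "REM", 8, nodes));
```

Because the position depends only on the bucket and key, any node can compute where an object lives without consulting a central directory.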
R value

get(<<"artist">>,<<"REM">>, R=2) with N=3 returns {ok, Object}

[Diagram: one of the three replicas is marked X (unavailable); the read still succeeds.]
W value

put(<<"artist">>,<<"REM">>, W=2) with N=3 returns ok

[Diagram: one of the three replicas is marked X (unavailable); the write still succeeds.]
N=10, R/W=2

get/put("artist", "REM", R/W=2) with N=10 returns {ok, Object}

[Diagram: eight of the ten replicas are marked X (unavailable); the operation still succeeds.]
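The three scenarios above come down to counting reachable replicas. The sketch below models each replica as a boolean and checks whether enough of them respond; quorumMet is a hypothetical helper, not a Riak API.

```javascript
// A request succeeds when at least `quorum` of the N replicas respond.
// replicasUp holds one boolean per replica (true = reachable).
function quorumMet(replicasUp, quorum) {
  const responding = replicasUp.filter(Boolean).length;
  return responding >= quorum;
}

// N=3 with one replica down, R or W = 2.
console.log(quorumMet([true, true, false], 2)); // true

// N=10 with eight replicas down, R/W = 2.
const nTen = Array(10).fill(false);
nTen[0] = true;
nTen[1] = true;
console.log(quorumMet(nTen, 2)); // true
console.log(quorumMet(nTen, 3)); // false
```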
Resolving Conflicts

• Riak focuses on the AP of CAP
• Data could be briefly inconsistent
• Inconsistency must be resolved
Detecting & Resolving Conflicts

[Diagram sequence on a four-node cluster (nodes 0 to 3). Frames show: Object v0; two copies of Object v0; one copy at v1 and one copy still at v0; Object v1; two copies of Object v1.]
Client Resolution

• Can be set per-bucket or server-wide
• Conflicting data is “bubbled up” to the
  client
• Client picks the winner
Server Resolution

• “Last write wins”
• Enabled by default
• What most apps need 80% of the time
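The two resolution styles can be contrasted in a few lines. This is a sketch under assumptions: each conflicting version is modelled as {value, lastModified}, which is an invented shape, and both function names are hypothetical.

```javascript
// Server-style resolution: the version with the newest timestamp wins.
// Assumption: each version is {value, lastModified} with a numeric timestamp.
function lastWriteWins(versions) {
  if (versions.length === 0) {
    throw new Error("no versions to resolve");
  }
  return versions.reduce(function (winner, candidate) {
    return candidate.lastModified > winner.lastModified ? candidate : winner;
  });
}

// Client-style resolution: conflicting versions are handed to the
// application, which supplies its own rule for picking the winner.
function resolveOnClient(versions, pickWinner) {
  return versions.length > 1 ? pickWinner(versions) : versions[0];
}

const versions = [
  { value: "v0", lastModified: 1000 },
  { value: "v1", lastModified: 2000 },
];
console.log(lastWriteWins(versions).value); // v1
console.log(resolveOnClient(versions, function (all) { return all[0]; }).value); // v0
```

Last write wins is simple but silently discards the older version, which is why the client-side option exists for data where that loss matters.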
Live Demo!
Linking Objects

• Objects can store pointers, or links, to
  other objects
• Linked objects don’t have to be in the same bucket
• Object links described in a Link header
Link Header Format

</riak/demo/test1>; riaktag="userinfo"

• Object URL: </riak/demo/test1>
• Link tag: riaktag="userinfo"
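To make the format concrete, here is a small parser sketch for headers of the form shown above. It assumes every link uses the </riak/bucket/key>; riaktag="tag" layout from the slide; parseLinkHeader is a hypothetical helper, not a client library function.

```javascript
// Parse links of the form </riak/<bucket>/<key>>; riaktag="<tag>".
// Assumption: all entries follow exactly this layout.
function parseLinkHeader(header) {
  const links = [];
  const pattern = /<([^>]+)>\s*;\s*riaktag="([^"]*)"/g;
  let match;
  while ((match = pattern.exec(header)) !== null) {
    const parts = match[1].split("/"); // ["", "riak", bucket, key]
    links.push({ url: match[1], bucket: parts[2], key: parts[3], tag: match[2] });
  }
  return links;
}

console.log(parseLinkHeader('</riak/demo/test1>; riaktag="userinfo"'));
```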
Link Walking

• Ask Riak to “walk” a sequence of links
• Optionally, collect objects along the walk
  and return them
• Can be arbitrarily deep
Link Walking Examples


      /riak/demo/test1/_,_,1
Start walking at /demo/test1 and return all
linked objects
Link Walking Examples


    /riak/demo/test1/demo,_,1
Start walking at /demo/test1 and return all
linked objects contained in the demo bucket
Link Walking Examples


     /riak/demo/test1/_,_,0/_,_,1
Start walking at /demo/test1, find any linked objects,
then find and return any objects linked to those
Link Walking Examples

  /riak/demo/test1/_,child,0/_,_,1
Start walking at /demo/test1, find any linked objects
with the link tag “child”, then find and return any objects
linked to those
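The walk URLs in these examples follow a regular pattern: a starting object followed by one bucket,tag,keep segment per step, with "_" as a wildcard. The helper below is a sketch that assembles such a path; linkWalkPath and the step object shape are assumptions made for illustration.

```javascript
// Build a link-walk path from a starting object and a list of steps.
// Each step is {bucket, tag, keep}; a missing bucket or tag becomes "_",
// and keep decides whether that step's objects are returned (1) or not (0).
function linkWalkPath(bucket, key, steps) {
  const segments = steps.map(function (step) {
    return [step.bucket || "_", step.tag || "_", step.keep ? 1 : 0].join(",");
  });
  return ["/riak", bucket, key].concat(segments).join("/");
}

console.log(linkWalkPath("demo", "test1", [{ keep: true }]));
// /riak/demo/test1/_,_,1
console.log(linkWalkPath("demo", "test1", [{ tag: "child", keep: false }, { keep: true }]));
// /riak/demo/test1/_,child,0/_,_,1
```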
Map/Reduce Terms

• Phase: A step within a job
• Job: A sequence of phases and inputs
• Map: Data collection phase
• Reduce: Data collation or processing phase
Map/Reduce Overview

• Map phases execute in parallel w/data locality
• Reduce phases execute in parallel on the node where job was submitted
• Results are not cached or stored
• Phases can be written in Erlang or Javascript
Map Phase

• Inputs must be bucket/key pairs
• Must return a list
• Parallel results are aggregated into a single
  list
Parallel Map
Erlang Map Phase

• Two types: modfun and qfun
• modfuns reference the module and name of the Erlang function to call
• qfuns are anonymous Erlang functions*

*Must be on the server-side codepath
Erlang Map Phase

map_object_value(Obj, _KeyData, _Arg) ->
    [riak_object:get_value(Obj)].

• Obj: riak_object retrieved from bucket/key
• KeyData: Static argument specified with the bucket/key
• Arg: Static argument specified with the phase
Erlang Map Built-Ins

riak_mapreduce:map_object_value/3
• Returns object value wrapped in a list

riak_mapreduce:map_object_value_list/3
• Returns object value. Object value must already be a list
Javascript Map Phase

• Two types: jsanon and jsfun
• jsanons are anonymous JS functions:
  function(value) { return [value]; }
• jsfuns are named JS functions:
  Riak.mapValuesJson
Erlang & Javascript

• Same environment as Firefox minus
  browser bits
• Erlang to Javascript data is JSON encoded
• Javascript to Erlang data is JSON decoded
Javascript Map Phase

function(value, keyData, arg)

• value: JSON-encoded version of riak_object
• keyData: Same as Erlang
• arg: Same as Erlang
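A sketch of a map function using all three arguments may help here. It assumes the JSON-encoded riak_object has the shape {values: [{data: "<JSON string>"}]} and that stored documents carry a "symbol" field; both the field and the function name mapBySymbol are invented for the example.

```javascript
// Map phase sketch: emit objects whose JSON body has a matching "symbol".
// Assumptions: the encoded object looks like {values: [{data: "<JSON>"}]},
// and "symbol" is a made-up field in the stored document.
function mapBySymbol(value, keyData, arg) {
  var doc = JSON.parse(value.values[0].data);
  // A map phase must return a list, even when there is nothing to emit.
  return doc.symbol === arg ? [doc] : [];
}

// Stand-in for what would be passed to the function for one bucket/key.
var stored = { values: [{ data: JSON.stringify({ symbol: "GOOG", close: 100 }) }] };
console.log(mapBySymbol(stored, null, "GOOG"));
console.log(mapBySymbol(stored, null, "CSCO"));
```

Here arg plays the role of the static per-phase argument, so the same function can be reused across jobs with different filters.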
Javascript Map Built-Ins

Riak.mapValues
• Returns object values. Handles detecting when/if to use list wrapping.

Riak.mapValuesJson
• Returns JSON parsed object values. Also performs list wrapping, if needed.
Reduce Phase

• Performed on the node coordinating the
  map/reduce job
• Two processes per reduce phase to add
  minor parallelism
• Must return a list
Erlang Reduce Built-Ins

riak_mapreduce:reduce_set_union/2
• Returns unique set of values

riak_mapreduce:reduce_sum/2
• Returns the sum of inputs

riak_mapreduce:reduce_sort/2
• Returns the sorted list of inputs
Javascript Reduce Built-Ins

Riak.reduceMin
• Returns the minimum value of the input set

Riak.reduceMax
• Returns the maximum value of the input set

Riak.reduceSort
• Returns a sorted list of the input set
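For comparison, a hand-written reduce function with the same contract as the minimum built-in might look like the sketch below. It is not the source of Riak.reduceMin; the name minOfInputs is hypothetical. Since a reduce phase must return a list, the result is wrapped, which also lets the function accept its own output if it is applied to partial results.

```javascript
// Reduce phase sketch: take a list of numbers, return the minimum
// wrapped in a list (a reduce phase must return a list).
function minOfInputs(values) {
  if (values.length === 0) {
    return [];
  }
  return [Math.min.apply(null, values)];
}

console.log(minOfInputs([3, 1, 2])); // [ 1 ]

// Applying the function to the combined output of two partial runs
// gives the same answer as one run over all inputs.
var partial = minOfInputs([5, 4]).concat(minOfInputs([9, 2]));
console.log(minOfInputs(partial)); // [ 2 ]
```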
Building M/R Job

• Job is a list of phases and starting inputs
• Each phase can:
  • Receive a static argument
  • Accumulate and return results
Submitting Jobs via HTTP

• Riak exposes M/R via its REST API
• Job is described in JSON
• Submitted via POST
• Default URL is /mapred
Erlang Phase (JSON)

{Type:{"language":"erlang", "module": Module,
       "function": Function, "keep":Flag}}

• Type: "map" or "reduce"
• Module: String name of Erlang module
• Function: String name of Erlang function
• Flag: Boolean accumulation toggle
Javascript Phase (JSON)

{Type:{"language":"javascript",
       "source": Source, "keep":Flag}}

• Type: "map" or "reduce"
• Source: Source for anonymous function
• Flag: Boolean accumulation toggle
Javascript Phase (JSON)

{Type:{"language":"javascript",
       "name":Name, "keep":Flag}}

• Type: "map" or "reduce"
• Name: String name of Javascript function
• Flag: Boolean accumulation toggle
Putting It Together

{"inputs": [["stocks", "goog"]],
 "query": [{"map":{"language":"javascript",
                   "name": "Riak.mapValuesJson",
                   "keep": true}}]}

{"inputs": [["stocks", "goog"],
            ["stocks", "csco"]],
 "query": [{"map":{"language":"javascript",
                   "name": "Riak.mapValuesJson",
                   "keep": true}}]}

{"inputs": "stocks",
 "query": [{"map":{"language":"javascript",
                   "name": "App.extractTickers",
                   "arg": "GOOG",
                   "keep": false}},
           {"reduce":{"language":"javascript",
                      "name": "Riak.reduceMin",
                      "keep": true}}]}
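A job like the last one can also be assembled in code and POSTed to /mapred. The sketch below follows the phase template from the earlier slides, {Type:{..., "keep":Flag}}, so "keep" sits inside the phase object. The helper names phase, buildJob, and submitJob are hypothetical, the base URL passed to submitJob is a placeholder for a real node address, and submitJob assumes a runtime with a global fetch (for example Node 18 or later).

```javascript
// Build one named-function phase: {Type: {language, name, keep[, arg]}}.
function phase(type, name, keep, arg) {
  var spec = { language: "javascript", name: name, keep: keep };
  if (arg !== undefined) {
    spec.arg = arg; // static per-phase argument
  }
  var wrapped = {};
  wrapped[type] = spec;
  return wrapped;
}

// A job is its starting inputs plus an ordered list of phases.
function buildJob(inputs, phases) {
  return { inputs: inputs, query: phases };
}

// POST the job as JSON to /mapred. baseUrl is a placeholder supplied by
// the caller; this function is defined here but not called.
async function submitJob(baseUrl, job) {
  const response = await fetch(baseUrl + "/mapred", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(job),
  });
  return response.json();
}

var job = buildJob("stocks", [
  phase("map", "App.extractTickers", false, "GOOG"),
  phase("reduce", "Riak.reduceMin", true),
]);
console.log(JSON.stringify(job, null, 2));
```

Only the reduce phase sets keep to true, so the intermediate map output is dropped and just the final result comes back.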
Live Demo!
Thank You

     Kevin A. Smith
Email: ksmith@basho.com
  Twitter: @kevsmith

Introducing Riak

  • 1. Introducing Riak Kevin A. Smith Senior Developer Basho Technologies
  • 3. What Is Riak? • Key/Value store
  • 4. What Is Riak? • Key/Value store • Document-oriented database
  • 5. What Is Riak? • Key/Value store • Document-oriented database • Web-shaped storage
  • 7. Key/Value Store • Data organized by bucket/key pairs
  • 8. Key/Value Store • Data organized by bucket/key pairs • Simple REST API (GET, PUT, DELETE)
  • 10. Document Store • Store values as JSON
  • 11. Document Store • Store values as JSON • Many clients support automatic JSON encoding/decoding
  • 12. Document Store • Store values as JSON • Many clients support automatic JSON encoding/decoding • Javascript Map/Reduce built on top of JSON
  • 14. Web-Shaped Storage • Content neutral
  • 15. Web-Shaped Storage • Content neutral • Highly distributed
  • 16. Web-Shaped Storage • Content neutral • Highly distributed • Replicated
  • 17. Web-Shaped Storage • Content neutral • Highly distributed • Replicated • Fault-tolerant
  • 19. What Is Riak? A flexible storage engine...
  • 20. What Is Riak? A flexible storage engine... ...with a REST API...
  • 21. What Is Riak? A flexible storage engine... ...with a REST API... ...and map/reduce capability...
  • 22. What Is Riak? A flexible storage engine... ...with a REST API... ...and map/reduce capability... ....designed to be fault-tolerant...
  • 23. What Is Riak? A flexible storage engine... ...with a REST API... ...and map/reduce capability... ....designed to be fault-tolerant... ...distributed...
  • 24. What Is Riak? A flexible storage engine... ...with a REST API... ...and map/reduce capability... ....designed to be fault-tolerant... ...distributed... ...and ops friendly
  • 27. Influences • CAP Theorem • Amazon’s Dynamo Paper
  • 28. Influences • CAP Theorem • Amazon’s Dynamo Paper • Experience running large networks (Akamai)
  • 30. CAP Theorem Consistent Reads and writes reflect a globally consistent system state
  • 31. CAP Theorem Consistent Reads and writes reflect a globally consistent system state
  • 32. CAP Theorem Consistent Reads and writes reflect a globally consistent system state Available System is available for reads and writes
  • 33. CAP Theorem Consistent Reads and writes reflect a globally consistent system state Available System is available for reads and writes
  • 34. CAP Theorem Consistent Reads and writes reflect a globally consistent system state Available System is available for reads and writes Partition Tolerant System can handle the failure of individual parts
  • 36. Common Wisdom Pick two.
  • 38. The Riak Way Pick Two.
  • 39. The Riak Way Pick Two. For each operation.
  • 41. Dynamo Influences • N = The number of replicas
  • 42. Dynamo Influences • N = The number of replicas • R = The number of replicas needed for a successful read
  • 43. Dynamo Influences • N = The number of replicas • R = The number of replicas needed for a successful read • W = The number of replicas needed for a successful write
  • 45. Dynamo Math N - R = read fault tolerance
  • 46. Dynamo Math N - R = read fault tolerance N - W = write fault tolerance
  • 48. Dynamo Math N = 4, W = 2, R = 1
  • 49. Dynamo Math N = 4, W = 2, R = 1
  • 50. Dynamo Math N = 4, W = 2, R = 1 4 - 2 = 2 hosts can be down and Riak can still perform writes.
  • 51. Dynamo Math N = 4, W = 2, R = 1 4 - 2 = 2 hosts can be down and Riak can still perform writes. 4 - 1 = 3 hosts can be down and Riak can still perform reads.
  • 53. Riak Improvements • N can vary per bucket
  • 54. Riak Improvements • N can vary per bucket • R and W can vary per operation
  • 55. Riak Improvements • N can vary per bucket • R and W can vary per operation Choose your own fault tolerance/performance tradeoff
  • 56. Consistent Hashing 2160 0 node 0 node 1 2160/4 node 2 node 3 hash(<<"artist">>,<<"REM">>) 2160/2
  • 57. R value get(<<"artist">>,<<"REM">>, R=2) (N=3) {ok, Object} X
  • 58. W value put(<<"artist">>,<<"REM">>, W=2) (N=3) ok X
  • 59. N=10, R/W=2 get/put("artist", "REM", R/W=2) (N=10) {ok, Object} X X X X X X X X
  • 61. Resolving Conflicts • Riak focuses on the AP of CAP
  • 62. Resolving Conflicts • Riak focuses on the AP of CAP • Data could be briefly inconsistent
  • 63. Resolving Conflicts • Riak focuses on the AP of CAP • Data could be briefly inconsistent • Inconsistency must be resolved
  • 64. Detecting & Resolving Conflicts 0 1 Object v0 2 3
  • 65. Detecting & Resolving Conflicts Object 0 1 v0 Object 2 3 v0
  • 66. Detecting & Resolving Conflicts Object 0 1 v1 Object 2 3 v0
  • 67. Detecting & Resolving Conflicts 0 1 Object v1 2 3
  • 68. Detecting & Resolving Conflicts Object 0 1 v1 Object 2 3 v1
  • 70. Client Resolution • Can be set per-bucket or server-wide
  • 71. Client Resolution • Can be set per-bucket or server-wide • Conflicting data is “bubbled up” to the client
  • 72. Client Resolution • Can be set per-bucket or server-wide • Conflicting data is “bubbled up” to the client • Client picks the winner
  • 75. Server Resolution • “Last write wins” • Enabled by default
  • 76. Server Resolution • “Last write wins” • Enabled by default • What most apps need 80% of the time
  • 79. Linking Objects • Objects can store pointers, or links, to other objects
  • 80. Linking Objects • Objects can store pointers, or links, to other objects • Doesn’t have to be the same bucket
  • 81. Linking Objects • Objects can store pointers, or links, to other objects • Doesn’t have to be the same bucket • Object links described in a Link header
  • 82. Link Header Format Object URL </riak/demo/test1>; riaktag="userinfo" Link tag
  • 84. Link Walking • Ask Riak to “walk” a sequence of links
  • 85. Link Walking • Ask Riak to “walk” a sequence of links • Optionally, collect objects along the walk and return them
  • 86. Link Walking • Ask Riak to “walk” a sequence of links • Optionally, collect objects along the walk and return them • Can be arbitrarily deep
  • 88. Link Walking Examples /riak/demo/test1/_,_,1
  • 89. Link Walking Examples /riak/demo/test1/_,_,1 Start walking at /demo/test1 and return all linked objects
  • 91. Link Walking Examples /riak/demo/test1/demo,_,1
  • 92. Link Walking Examples /riak/demo/test1/demo,_,1 Start walking at /demo/test1 and return all linked objects contained in the demo bucket
  • 94. Link Walking Examples /riak/demo/test1/_,_,0/_,_,1
  • 95. Link Walking Examples /riak/demo/test1/_,_,0/_,_,1 Start walking at /demo/test1, find any linked objects, then find and return any objects linked to those
  • 98. Link Walking Examples /riak/demo/test1/_,child,0/_,_,1 Start walking at /demo/test1, find any linked objects with the link tag “child”, then find and return any objects linked to those
  • 100. Map/Reduce Terms • Phase: A step within a job
  • 101. Map/Reduce Terms • Phase: A step within a job • Job: A sequence of phases and inputs
  • 102. Map/Reduce Terms • Phase: A step within a job • Job: A sequence of phases and inputs • Map: Data collection phase
  • 103. Map/Reduce Terms • Phase: A step within a job • Job: A sequence of phases and inputs • Map: Data collection phase • Reduce: Data collation or processing phase
  • 105. Map/Reduce Overview • Map phases execute in parallel w/data locality
  • 106. Map/Reduce Overview • Map phases execute in parallel w/data locality • Reduce phases execute in parallel on the node where job was submitted
  • 107. Map/Reduce Overview • Map phases execute in parallel w/data locality • Reduce phases execute in parallel on the node where job was submitted • Results are not cached or stored
  • 108. Map/Reduce Overview • Map phases execute in parallel w/data locality • Reduce phases execute in parallel on the node where job was submitted • Results are not cached or stored • Phases can be written in Erlang or Javascript
  • 112. Map Phase • Inputs must be bucket/key pairs • Must return a list • Parallel results are aggregated into a single list
  • 121. Erlang Map Phase • Two types: modfun and qfun • modfuns reference the module and name of the Erlang function to call • qfuns are anonymous Erlang functions* *Must be on the server-side codepath
  • 128. Erlang Map Phase map_object_value(Obj, _KeyData, _Arg) -> [riak_object:get_value(Obj)]. • Obj: riak_object retrieved from bucket/key • KeyData: Static argument specified with the bucket/key • Arg: Static argument specified with the phase
  • 133. Erlang Map Built-Ins riak_mapreduce:map_object_value/3 • Returns object value wrapped in a list riak_mapreduce:map_object_value_list/3 • Returns object value. Object value must already be a list
  • 139. Javascript Map Phase • Two types: jsanon and jsfun • jsanons are anonymous JS functions: function(value) { return [value]; } • jsfuns are named JS functions: Riak.mapValuesJson
  • 143. Erlang & Javascript • Javascript runs in the same environment as Firefox minus browser bits • Erlang to Javascript data is JSON encoded • Javascript to Erlang data is JSON decoded
  • 149. Javascript Map Phase function(value, keyData, arg) • value: JSON-encoded version of riak_object • keyData: Same as Erlang • arg: Same as Erlang
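To make the `function(value, keyData, arg)` signature concrete, here is a minimal sketch of a map function exercised locally against a hand-built object. The shape of `sampleObject` (a `values` array whose entries carry a `data` string) is an assumption about how the JSON-encoded riak_object looks, and `extractClose`, the `ticker` and `close` fields are hypothetical names made up for this illustration.

```javascript
// Assumed shape of the JSON-encoded riak_object passed to a map function.
// Field names here are illustrative, not taken from the slides.
var sampleObject = {
  bucket: "stocks",
  key: "goog",
  values: [{ metadata: {}, data: '{"ticker":"GOOG","close":550}' }]
};

// Hypothetical map function: value is the object, keyData comes with the
// bucket/key input, arg comes with the phase. It must return a list.
function extractClose(value, keyData, arg) {
  var results = [];
  for (var i = 0; i < value.values.length; i++) {
    var doc = JSON.parse(value.values[i].data);
    // When a phase argument is given, keep only the matching ticker.
    if (arg === undefined || doc.ticker === arg) {
      results.push(doc.close);
    }
  }
  return results;
}

console.log(JSON.stringify(extractClose(sampleObject, undefined, "GOOG"))); // [550]
console.log(JSON.stringify(extractClose(sampleObject, undefined, "CSCO"))); // []
```

Returning an empty list for non-matching objects keeps the "must return a list" rule intact while contributing nothing to the aggregated result.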
  • 154. Javascript Map Built-Ins Riak.mapValues • Returns object values. Handles detecting when/if to use list wrapping. Riak.mapValuesJson • Returns JSON parsed object values. Also performs list wrapping, if needed.
  • 158. Reduce Phase • Performed on the node coordinating the map/reduce job • Two processes per reduce phase to add minor parallelism • Must return a list
  • 165. Erlang Reduce Built-Ins riak_mapreduce:reduce_set_union/2 • Returns unique set of values riak_mapreduce:reduce_sum/2 • Returns the sum of inputs riak_mapreduce:reduce_sort/2 • Returns the sorted list of inputs
  • 172. Javascript Reduce Built-Ins Riak.reduceMin • Returns the minimum value of the input set Riak.reduceMax • Returns the maximum value of the input set Riak.reduceSort • Returns a sorted list of the input set
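The behaviour described for the reduce built-ins can be approximated locally. The functions below are stand-ins written for this illustration, not Riak's own implementations; they assume numeric inputs and follow the rule that a reduce phase takes a list and returns a list.

```javascript
// Local stand-ins for the behaviour described above (not Riak's source).
// Each takes a list of numbers and returns a list.
function reduceMin(values) {
  return values.length === 0 ? [] : [Math.min.apply(null, values)];
}

function reduceMax(values) {
  return values.length === 0 ? [] : [Math.max.apply(null, values)];
}

function reduceSort(values) {
  // Copy first so the caller's list is left untouched; numeric ascending order.
  return values.slice().sort(function (a, b) { return a - b; });
}

console.log(JSON.stringify(reduceMin([3, 1, 2])));  // [1]
console.log(JSON.stringify(reduceMax([3, 1, 2])));  // [3]
console.log(JSON.stringify(reduceSort([3, 1, 2]))); // [1,2,3]

// Because output is a list of the same kind as the input, partial results
// from separate reduce processes can be combined and reduced again.
var partials = reduceMin([3, 1]).concat(reduceMin([2, 5]));
console.log(JSON.stringify(reduceMin(partials)));   // [1]
```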
  • 177. Building M/R Job • Job is a list of phases and starting inputs • Each phase can: • Receive a static argument • Accumulate and return results
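The idea of a job as inputs plus an ordered list of phases, each with an optional static argument and an accumulation flag, can be sketched as a small single-process simulation. Everything here (`runJob`, the `type`/`fn`/`arg`/`keep` fields) is a hypothetical model for illustration; it ignores distribution and data locality and is not how Riak executes jobs.

```javascript
// Hypothetical local model of a job: run phases in order, feeding each
// phase's output to the next, and collect output only where keep is true.
function runJob(inputs, phases) {
  const kept = [];
  let current = inputs;
  for (const phase of phases) {
    if (phase.type === "map") {
      // Map: each input is handled independently and must yield a list;
      // the per-input lists are aggregated into a single list.
      current = current.flatMap((input) => phase.fn(input, phase.arg));
    } else {
      // Reduce: the whole list goes in, a list comes out.
      current = phase.fn(current, phase.arg);
    }
    if (phase.keep) kept.push(current);
  }
  return kept;
}

const phases = [
  { type: "map", fn: (x, factor) => [x * factor], arg: 2, keep: false },
  { type: "reduce", fn: (xs) => [xs.reduce((a, b) => a + b, 0)], keep: true },
];

console.log(JSON.stringify(runJob([3, 1, 2], phases))); // [[12]]
```

With `keep: false` on the map phase, only the reduce output appears in the result; flipping it to `true` would also return the intermediate list.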
  • 182. Submitting Jobs via HTTP • Riak exposes M/R via its REST API • Job is described in JSON • Submitted via POST • Default URL is /mapred
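A minimal sketch of submitting a job as described above: a JSON job description sent by POST to `/mapred`. The base URL `http://localhost:8098`, the `Content-Type` header, and the helper names `buildMapredRequest` and `submitJob` are assumptions for this example; `fetch` is assumed to be available as a global (as in current browsers and recent Node.js releases).

```javascript
// Hypothetical helper: turns a job description into a URL plus fetch options.
// The default base URL is an assumption about a local node.
function buildMapredRequest(job, baseUrl = "http://localhost:8098") {
  return {
    url: baseUrl + "/mapred",
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(job),
    },
  };
}

// Sends the job and returns the decoded JSON result.
// Not called here, since it needs a running node to talk to.
async function submitJob(job, baseUrl) {
  const { url, options } = buildMapredRequest(job, baseUrl);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error("M/R job failed: HTTP " + response.status);
  }
  return response.json();
}

const request = buildMapredRequest({
  inputs: [["stocks", "goog"]],
  query: [{ map: { language: "javascript", name: "Riak.mapValuesJson", keep: true } }],
});
console.log(request.url);            // http://localhost:8098/mapred
console.log(request.options.method); // POST
```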
  • 190. Erlang Phase (JSON) {Type:{"language":"erlang", "module": Module, "function": Function, "keep": Flag}} • Type: "map" or "reduce" • Module: String name of Erlang module • Function: String name of Erlang function • Flag: Boolean accumulation toggle
  • 197. Javascript Phase (JSON) {Type:{"language":"javascript", "source": Source, "keep": Flag}} • Type: "map" or "reduce" • Source: Source for anonymous function • Flag: Boolean accumulation toggle
  • 204. Javascript Phase (JSON) {Type:{"language":"javascript", "name": Name, "keep": Flag}} • Type: "map" or "reduce" • Name: String name of Javascript function • Flag: Boolean accumulation toggle
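The three phase templates above differ only in which fields sit next to `language` and `keep`. A minimal sketch of builders that fill in those templates follows; the function names `erlangPhase`, `jsSourcePhase`, and `jsNamedPhase` are hypothetical, and the field names are taken from the slides.

```javascript
// Hypothetical builders for the three phase templates on the slides.
function checkType(type) {
  if (type !== "map" && type !== "reduce") {
    throw new Error('phase type must be "map" or "reduce"');
  }
}

function erlangPhase(type, moduleName, functionName, keep) {
  checkType(type);
  return { [type]: { language: "erlang", module: moduleName, function: functionName, keep: keep } };
}

function jsSourcePhase(type, source, keep) {
  checkType(type);
  return { [type]: { language: "javascript", source: source, keep: keep } };
}

function jsNamedPhase(type, name, keep) {
  checkType(type);
  return { [type]: { language: "javascript", name: name, keep: keep } };
}

console.log(JSON.stringify(erlangPhase("map", "riak_mapreduce", "map_object_value", false)));
// {"map":{"language":"erlang","module":"riak_mapreduce","function":"map_object_value","keep":false}}
console.log(JSON.stringify(jsNamedPhase("reduce", "Riak.reduceMin", true)));
// {"reduce":{"language":"javascript","name":"Riak.reduceMin","keep":true}}
```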
  • 205. Putting It Together {"inputs": [["stocks", "goog"]], "query": [{"map":{"language":"javascript", "name": "Riak.mapValuesJson", "keep": true}}]}
  • 206. Putting It Together {"inputs": [["stocks", "goog"], ["stocks", "csco"]], "query": [{"map":{"language":"javascript", "name": "Riak.mapValuesJson", "keep": true}}]}
  • 207. Putting It Together {"inputs": "stocks", "query": [{"map":{"language":"javascript", "name": "App.extractTickers", "arg": "GOOG", "keep": false}}, {"reduce":{"language":"javascript", "name": "Riak.reduceMin", "keep": true}}]}
  • 209. Thank You Kevin A. Smith Email: ksmith@basho.com Twitter: @kevsmith