Introducing Riak
Kevin A. Smith
Senior Developer
Basho Technologies
What Is Riak?

• Key/Value store
• Document-oriented database
• Web-shaped storage
Key/Value Store


• Data organized by bucket/key pairs
• Simple REST API (GET, PUT, DELETE)
Document Store

• Store values as JSON
• Many clients support automatic JSON
  encoding/decoding
• Javascript Map/Reduce built on top of JSON
Web-Shaped Storage

• Content neutral
• Highly distributed
• Replicated
• Fault-tolerant
What Is Riak?
A flexible storage engine...
...with a REST API...
...and map/reduce capability...
...designed to be fault-tolerant...
...distributed...
...and ops friendly
Influences

• CAP Theorem
• Amazon’s Dynamo Paper
• Experience running large networks
  (Akamai)
CAP Theorem
Consistent: Reads and writes reflect a
globally consistent system state

Available: System is available for reads and
writes

Partition Tolerant: System can handle
the failure of individual parts
Common Wisdom

Pick two.
The Riak Way

Pick Two.

For each operation.
Dynamo Influences
• N = The number of replicas
• R = The number of replicas needed for a
  successful read
• W = The number of replicas needed for a
  successful write
Dynamo Math


N - R = read fault tolerance
N - W = write fault tolerance
Dynamo Math
N = 4, W = 2, R = 1


4 - 2 = 2 hosts can be down and Riak can
still perform writes.
4 - 1 = 3 hosts can be down and Riak can
still perform reads.
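
The arithmetic on this slide can be sketched in a few lines of Javascript. This is only an illustration: `faultTolerance` is a hypothetical helper, not part of Riak or any client library.

```javascript
// Fault tolerance implied by N, R and W, following the slide's arithmetic.
// `faultTolerance` is a hypothetical helper, not a Riak API.
function faultTolerance({ n, r, w }) {
  if (r < 1 || w < 1 || r > n || w > n) {
    throw new RangeError("R and W must be between 1 and N");
  }
  return {
    read: n - r,  // replicas that can be down while reads still succeed
    write: n - w, // replicas that can be down while writes still succeed
  };
}

const tolerance = faultTolerance({ n: 4, r: 1, w: 2 });
console.log(tolerance); // { read: 3, write: 2 }
```

Raising R or W lowers the matching tolerance, which is the tradeoff the N/R/W values express.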
Riak Improvements

• N can vary per bucket
• R and W can vary per operation
   Choose your own fault tolerance/performance tradeoff
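Consistent Hashing
[Diagram: a hash ring running from 0 to 2^160, with 2^160/4 and 2^160/2 marked,
divided among node 0, node 1, node 2 and node 3.
hash(<<"artist">>,<<"REM">>) maps a bucket/key pair to a position on the ring.]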
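
A minimal sketch of the ring idea, under stated assumptions: the key is hashed with SHA-1 over the string "bucket/key" (the real encoding may differ), the ring is cut into 8 equal partitions (an arbitrary number chosen for illustration), and partitions are assigned to the four nodes round-robin. `ringPosition` and `preferenceList` are hypothetical names.

```javascript
const crypto = require("crypto");

const RING_SIZE = 2n ** 160n; // size of the hash space shown in the diagram
const NODES = ["node 0", "node 1", "node 2", "node 3"];
const PARTITIONS = 8; // assumed partition count, for illustration only
const PARTITION_SIZE = RING_SIZE / BigInt(PARTITIONS);

// Hash a bucket/key pair to a position on the ring.
// Assumption: SHA-1 over "bucket/key"; the real key encoding may differ.
function ringPosition(bucket, key) {
  const hex = crypto.createHash("sha1").update(`${bucket}/${key}`).digest("hex");
  return BigInt("0x" + hex);
}

// Return the nodes holding the n replicas: the partition that owns the key's
// position, followed by the next n - 1 partitions around the ring.
function preferenceList(bucket, key, n) {
  const first = Number(ringPosition(bucket, key) / PARTITION_SIZE);
  const owners = [];
  for (let i = 0; i < n; i++) {
    const partition = (first + i) % PARTITIONS;
    owners.push(NODES[partition % NODES.length]); // round-robin assignment
  }
  return owners;
}

const replicas = preferenceList("artist", "REM", 3);
console.log(replicas);
```

Because neighbouring partitions belong to different nodes in this layout, the three replicas land on three distinct nodes.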
R value
get(<<"artist">>,<<"REM">>, R=2)
(N=3)
{ok, Object}
[Diagram: one of the three replicas is marked X (unavailable).]
W value
put(<<"artist">>,<<"REM">>, W=2)
(N=3)
ok
[Diagram: one of the three replicas is marked X (unavailable).]
N=10, R/W=2
get/put("artist", "REM", R/W=2)
(N=10)
{ok, Object}
[Diagram: eight of the ten replicas are marked X (unavailable).]
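
The three diagrams above reduce to one check: a request succeeds when at least R (or W) replicas respond. A small model of that check, where `quorumMet` is a hypothetical function and each replica is just a boolean:

```javascript
// Hypothetical model: each entry says whether that replica is reachable.
function quorumMet(replicaUp, quorum) {
  const responders = replicaUp.filter(Boolean).length;
  return responders >= quorum;
}

// N=3 with one replica down, R=2 or W=2.
const smallCluster = [true, true, false];
// N=10 with eight replicas down, R/W=2.
const largeCluster = [true, true, ...Array(8).fill(false)];

console.log(quorumMet(smallCluster, 2)); // true
console.log(quorumMet(largeCluster, 2)); // true
console.log(quorumMet(largeCluster, 3)); // false
```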
Resolving Conflicts

• Riak focuses on the AP of CAP
• Data could be briefly inconsistent
• Inconsistency must be resolved
Detecting & Resolving Conflicts
[Diagram sequence over nodes 0, 1, 2 and 3:
 1. One copy of Object at v0
 2. Two copies of Object, both at v0
 3. One copy at v1, the other still at v0
 4. One copy of Object at v1
 5. Two copies of Object, both at v1]
Client Resolution

• Can be set per-bucket or server-wide
• Conflicting data is “bubbled up” to the
  client
• Client picks the winner
Server Resolution

• “Last write wins”
• Enabled by default
• What most apps need 80% of the time
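
The two resolution styles can be contrasted with a short sketch. Everything here is an assumption made for illustration: the `{ value, lastModified }` shape, the function names, and the sample data. It does not show how Riak tracks versions internally.

```javascript
// Hypothetical sibling shape: { value, lastModified }, lastModified in ms.

// Server-style resolution: keep the most recently written version.
function lastWriteWins(siblings) {
  if (siblings.length === 0) throw new Error("no siblings to resolve");
  return siblings.reduce((winner, s) =>
    s.lastModified > winner.lastModified ? s : winner
  );
}

// Client-style resolution: conflicting versions are handed to the
// application, which supplies its own rule for picking the winner.
function resolveOnClient(siblings, pickWinner) {
  return siblings.length === 1 ? siblings[0] : pickWinner(siblings);
}

// Made-up sample data.
const siblings = [
  { value: { name: "REM", albums: 15 }, lastModified: 1000 },
  { value: { name: "R.E.M.", albums: 14 }, lastModified: 2000 },
];

const serverChoice = lastWriteWins(siblings);
const clientChoice = resolveOnClient(siblings, (all) =>
  all.reduce((a, b) => (b.value.albums > a.value.albums ? b : a))
);
console.log(serverChoice.value.name);   // R.E.M.
console.log(clientChoice.value.albums); // 15
```

The sample is chosen so the two strategies disagree: the latest write is not the version the application-specific rule prefers.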
Live Demo!
Linking Objects

• Objects can store pointers, or links, to
  other objects
• Doesn’t have to be the same bucket
• Object links described in a Link header
Link Header Format

</riak/demo/test1>; riaktag="userinfo"

• </riak/demo/test1>: Object URL
• riaktag="userinfo": Link tag
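
As a rough illustration of the format, the header can be split into bucket, key and tag with a regular expression. `parseLinkHeader` is a hypothetical helper, and it assumes every link follows exactly the `</riak/bucket/key>; riaktag="tag"` shape shown on the slide.

```javascript
// Parse one or more comma-separated links of the form
// </riak/bucket/key>; riaktag="tag" into plain objects.
function parseLinkHeader(header) {
  const links = [];
  const pattern = /<\/riak\/([^/>]+)\/([^>]+)>\s*;\s*riaktag="([^"]*)"/g;
  for (const match of header.matchAll(pattern)) {
    links.push({ bucket: match[1], key: match[2], tag: match[3] });
  }
  return links;
}

const links = parseLinkHeader('</riak/demo/test1>; riaktag="userinfo"');
console.log(links); // [ { bucket: 'demo', key: 'test1', tag: 'userinfo' } ]
```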
Link Walking

• Ask Riak to “walk” a sequence of links
• Optionally, collect objects along the walk
  and return them
• Can be arbitrarily deep
Link Walking Examples


      /riak/demo/test1/_,_,1
Start walking at /demo/test1 and return all
linked objects
Link Walking Examples


    /riak/demo/test1/demo,_,1
Start walking at /demo/test1 and return all
linked objects contained in the demo bucket
Link Walking Examples


     /riak/demo/test1/_,_,0/_,_,1
Start walking at /demo/test1, find any linked objects,
then find and return any objects linked to those
Link Walking Examples

  /riak/demo/test1/_,child,0/_,_,1
Start walking at /demo/test1, find any linked objects
with the link tag “child”, then find and return any objects
linked to those
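
Each step in the examples above is a `bucket,tag,keep` triple, with `_` as a wildcard and 1 or 0 saying whether that step's objects are returned. A small, hypothetical builder (`linkWalkUrl` is not part of any Riak client) that reproduces those URLs:

```javascript
// Each step is { bucket, tag, keep }; an omitted bucket or tag becomes the
// "_" wildcard, and keep defaults to false (0).
function linkWalkUrl(bucket, key, steps) {
  const segments = steps.map(({ bucket: b = "_", tag = "_", keep = false }) =>
    [b, tag, keep ? 1 : 0].map(String).map(encodeURIComponent).join(",")
  );
  return ["/riak", encodeURIComponent(bucket), encodeURIComponent(key), ...segments].join("/");
}

console.log(linkWalkUrl("demo", "test1", [{ keep: true }]));
// /riak/demo/test1/_,_,1
console.log(linkWalkUrl("demo", "test1", [{ tag: "child" }, { keep: true }]));
// /riak/demo/test1/_,child,0/_,_,1
```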
Map/Reduce Terms
• Phase: A step within a job
• Job: A sequence of phases and inputs
• Map: Data collection phase
• Reduce: Data collation or processing
  phase
Map/Reduce Overview
• Map phases execute in parallel with data
  locality
• Reduce phases execute in parallel on the
  node where the job was submitted
• Results are not cached or stored
• Phases can be written in Erlang or
  Javascript
Map Phase

• Inputs must be bucket/key pairs
• Must return a list
• Parallel results are aggregated into a single
  list
Parallel Map
Erlang Map Phase

• Two types: modfun and qfun
• modfuns reference the module and name
  of the Erlang function to call
• qfuns are anonymous Erlang functions*

*Must be on the server-side codepath
Erlang Map Phase
map_object_value(Obj, _KeyData, _Arg) ->
    [riak_object:get_value(Obj)].

• Obj: riak_object retrieved from bucket/key
• KeyData: Static argument specified with the bucket/key
• Arg: Static argument specified with the phase
Erlang Map Built-Ins
riak_mapreduce:map_object_value/3

• Returns object value wrapped in a list
riak_mapreduce:map_object_value_list/3

• Returns object value. Object value must already
  be a list
Javascript Map Phase
• Two types: jsanon and jsfun
• jsanons are anonymous JS functions:
  function(value) { return [value]; }

• jsfuns are named JS functions:
      Riak.mapValuesJson
Erlang & Javascript

• Same environment as Firefox minus
  browser bits
• Erlang to Javascript data is JSON encoded
• Javascript to Erlang data is JSON decoded
Javascript Map Phase
function(value, keyData, arg)


• value: JSON-encoded version of
  riak_object

• keyData: Same as Erlang
• arg: Same as Erlang
Javascript Map Built-Ins
Riak.mapValues

• Returns object values. Handles detecting
  when/if to use list wrapping.
Riak.mapValuesJson

• Returns JSON parsed object values. Also
  performs list wrapping, if needed.
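
To make the difference between the two built-ins concrete, here is a sketch of map functions in the same spirit. These are hypothetical stand-ins, not the built-ins' actual source, and the input shape (stored data under `values[i].data`) and sample values are assumptions made for illustration.

```javascript
// Assumed input shape for a Javascript map phase: the JSON-encoded
// riak_object with stored data under values[i].data. The field names and
// the sample data below are assumptions.
const sampleObject = {
  bucket: "stocks",
  key: "goog",
  values: [{ metadata: {}, data: '{"ticker":"GOOG","close":530}' }],
};

// Stand-in for a mapValues-style function: return the raw stored values
// as a list.
function mapValuesSketch(value, keyData, arg) {
  return value.values.map((v) => v.data);
}

// Stand-in for a mapValuesJson-style function: same, but each value is
// parsed from JSON first.
function mapValuesJsonSketch(value, keyData, arg) {
  return mapValuesSketch(value, keyData, arg).map((raw) => JSON.parse(raw));
}

const rawValues = mapValuesSketch(sampleObject);
const parsedValues = mapValuesJsonSketch(sampleObject);
console.log(typeof rawValues[0]);    // string
console.log(parsedValues[0].ticker); // GOOG
```

Both functions return a list, as a map phase must.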
Reduce Phase

• Performed on the node coordinating the
  map/reduce job
• Two processes per reduce phase to add
  minor parallelism
• Must return a list
Erlang Reduce Built-Ins
riak_mapreduce:reduce_set_union/2
• Returns unique set of values
riak_mapreduce:reduce_sum/2
• Returns the sum of inputs
riak_mapreduce:reduce_sort/2
• Returns the sorted list of inputs
Javascript Reduce Built-Ins
  Riak.reduceMin

• Returns the minimum value of the input set
  Riak.reduceMax

• Returns the maximum value of the input set
  Riak.reduceSort

• Returns a sorted list of the input set
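
The behaviour described for these built-ins can be approximated as follows. The functions are hypothetical re-implementations written for illustration, not the built-ins themselves. Each takes a list and returns a list, so it can be applied to partial results and then again to their combination.

```javascript
// Hypothetical re-implementations; each takes a list and returns a list,
// as a reduce phase must.
function reduceMinSketch(values) {
  return values.length === 0 ? [] : [values.reduce((a, b) => (b < a ? b : a))];
}

function reduceMaxSketch(values) {
  return values.length === 0 ? [] : [values.reduce((a, b) => (b > a ? b : a))];
}

function reduceSortSketch(values) {
  // Copy first so the input list is left untouched.
  return [...values].sort((a, b) => (a < b ? -1 : a > b ? 1 : 0));
}

// Reducing two partial results and then reducing their combination gives
// the same answer as reducing everything at once.
const partials = [reduceMinSketch([7, 3, 9]), reduceMinSketch([4, 8])];
const overallMin = reduceMinSketch(partials.flat());
console.log(overallMin);                    // [ 3 ]
console.log(reduceMaxSketch([7, 3, 9]));    // [ 9 ]
console.log(reduceSortSketch([7, 3, 9, 4])); // [ 3, 4, 7, 9 ]
```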
Building M/R Job

• Job is a list of phases and starting inputs
• Each phase can:
 • Receive a static argument
 • Accumulate and return results
Submitting Jobs via HTTP
• Riak exposes M/R via its REST API
• Job is described in JSON
• Submitted via POST
• Default URL is /mapred
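
A sketch of what such a submission might look like from Javascript. The helper name `mapredRequest`, the base URL `http://localhost:8098`, and the `Content-Type` header are assumptions; only the POST method, the JSON body, and the /mapred path come from the slide. The code only builds the request, so it runs without a server.

```javascript
// Build the pieces of an HTTP request for a map/reduce job.
// Assumptions: baseUrl is a placeholder for wherever the server listens,
// and the JSON body is sent with an application/json content type.
function mapredRequest(baseUrl, job) {
  return {
    url: new URL("/mapred", baseUrl).toString(),
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(job),
    },
  };
}

const job = {
  inputs: [["stocks", "goog"]],
  query: [{ map: { language: "javascript", name: "Riak.mapValuesJson", keep: true } }],
};

const request = mapredRequest("http://localhost:8098", job);
console.log(request.url); // http://localhost:8098/mapred
// With a reachable server, this could be sent with
// fetch(request.url, request.options).
```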
Erlang Phase (JSON)
{Type: {"language": "erlang", "module": Module,
        "function": Function, "keep": Flag}}

• Type: "map" or "reduce"
• Module: String name of Erlang module
• Function: String name of Erlang function
• Flag: Boolean accumulation toggle
Javascript Phase (JSON)
{Type: {"language": "javascript",
        "source": Source, "keep": Flag}}

• Type: "map" or "reduce"
• Source: Source for anonymous function
• Flag: Boolean accumulation toggle
Javascript Phase (JSON)
{Type: {"language": "javascript",
        "name": Name, "keep": Flag}}

• Type: "map" or "reduce"
• Name: String name of Javascript function
• Flag: Boolean accumulation toggle
Putting It Together

{"inputs": [["stocks", "goog"]],
 "query": [{"map": {"language": "javascript",
                    "name": "Riak.mapValuesJson",
                    "keep": true}}]}
Putting It Together

{"inputs": [["stocks", "goog"],
            ["stocks", "csco"]],
 "query": [{"map": {"language": "javascript",
                    "name": "Riak.mapValuesJson",
                    "keep": true}}]}
Putting It Together

{"inputs": "stocks",
 "query": [{"map": {"language": "javascript",
                    "name": "App.extractTickers",
                    "arg": "GOOG",
                    "keep": false}},
           {"reduce": {"language": "javascript",
                       "name": "Riak.reduceMin",
                       "keep": true}}]}
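
To see how the phases and `keep` flags in a job like this interact, here is a local simulation in plain Javascript. It is only a sketch: `runJobLocally`, `closesFor`, `minOf` and the sample data are all hypothetical, the deck does not show what `App.extractTickers` does, and nothing here reflects how Riak actually schedules phases across nodes.

```javascript
// Run phases in order; only phases with keep: true contribute to the output.
function runJobLocally(inputs, phases) {
  const kept = [];
  let current = inputs;
  for (const phase of phases) {
    current =
      phase.type === "map"
        ? current.flatMap((obj) => phase.fn(obj, undefined, phase.arg))
        : phase.fn(current, phase.arg);
    if (phase.keep) kept.push(current);
  }
  return kept;
}

// Made-up sample data standing in for the objects in a "stocks" bucket.
const stocks = [
  { ticker: "GOOG", close: 530 },
  { ticker: "GOOG", close: 525 },
  { ticker: "CSCO", close: 24 },
];

// Hypothetical map function: emit the closing price when the ticker
// matches the phase's static argument.
const closesFor = (obj, keyData, arg) => (obj.ticker === arg ? [obj.close] : []);

// Hypothetical reduce function: minimum of the inputs, returned as a list.
const minOf = (values) => (values.length ? [Math.min(...values)] : []);

const results = runJobLocally(stocks, [
  { type: "map", fn: closesFor, arg: "GOOG", keep: false },
  { type: "reduce", fn: minOf, keep: true },
]);
console.log(results); // [ [ 525 ] ]
```

Because the map phase has `keep: false`, only the reduce output appears in the result.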
Live Demo!
Thank You

     Kevin A. Smith
Email: ksmith@basho.com
  Twitter: @kevsmith

Introducing Riak

  • 1. Introducing Riak Kevin A. Smith Senior Developer Basho Technologies
  • 3. What Is Riak? • Key/Value store
  • 4. What Is Riak? • Key/Value store • Document-oriented database
  • 5. What Is Riak? • Key/Value store • Document-oriented database • Web-shaped storage
  • 7. Key/Value Store • Data organized by bucket/key pairs
  • 8. Key/Value Store • Data organized by bucket/key pairs • Simple REST API (GET, PUT, DELETE)
  • 10. Document Store • Store values as JSON
  • 11. Document Store • Store values as JSON • Many clients support automatic JSON encoding/decoding
  • 12. Document Store • Store values as JSON • Many clients support automatic JSON encoding/decoding • Javascript Map/Reduce built on top of JSON
  • 14. Web-Shaped Storage • Content neutral
  • 15. Web-Shaped Storage • Content neutral • Highly distributed
  • 16. Web-Shaped Storage • Content neutral • Highly distributed • Replicated
  • 17. Web-Shaped Storage • Content neutral • Highly distributed • Replicated • Fault-tolerant
  • 19. What Is Riak? A flexible storage engine...
  • 20. What Is Riak? A flexible storage engine... ...with a REST API...
  • 21. What Is Riak? A flexible storage engine... ...with a REST API... ...and map/reduce capability...
  • 22. What Is Riak? A flexible storage engine... ...with a REST API... ...and map/reduce capability... ....designed to be fault-tolerant...
  • 23. What Is Riak? A flexible storage engine... ...with a REST API... ...and map/reduce capability... ....designed to be fault-tolerant... ...distributed...
  • 24. What Is Riak? A flexible storage engine... ...with a REST API... ...and map/reduce capability... ....designed to be fault-tolerant... ...distributed... ...and ops friendly
  • 27. Influences • CAP Theorem • Amazon’s Dynamo Paper
  • 28. Influences • CAP Theorem • Amazon’s Dynamo Paper • Experience running large networks (Akamai)
  • 30. CAP Theorem Consistent Reads and writes reflect a globally consistent system state
  • 31. CAP Theorem Consistent Reads and writes reflect a globally consistent system state
  • 32. CAP Theorem Consistent Reads and writes reflect a globally consistent system state Available System is available for reads and writes
  • 33. CAP Theorem Consistent Reads and writes reflect a globally consistent system state Available System is available for reads and writes
  • 34. CAP Theorem Consistent Reads and writes reflect a globally consistent system state Available System is available for reads and writes Partition Tolerant System can handle the failure of individual parts
  • 36. Common Wisdom Pick two.
  • 38. The Riak Way Pick Two.
  • 39. The Riak Way Pick Two. For each operation.
  • 41. Dynamo Influences • N = The number of replicas
  • 42. Dynamo Influences • N = The number of replicas • R = The number of replicas needed for a successful read
  • 43. Dynamo Influences • N = The number of replicas • R = The number of replicas needed for a successful read • W = The number of replicas needed for a successful write
  • 45. Dynamo Math N - R = read fault tolerance
  • 46. Dynamo Math N - R = read fault tolerance N - W = write fault tolerance
  • 48. Dynamo Math N = 4, W = 2, R = 1
  • 49. Dynamo Math N = 4, W = 2, R = 1
  • 50. Dynamo Math N = 4, W = 2, R = 1 4 - 2 = 2 hosts can be down and Riak can still perform writes.
  • 51. Dynamo Math N = 4, W = 2, R = 1 4 - 2 = 2 hosts can be down and Riak can still perform writes. 4 - 1 = 3 hosts can be down and Riak can still perform reads.
  • 53. Riak Improvements • N can vary per bucket
  • 54. Riak Improvements • N can vary per bucket • R and W can vary per operation
  • 55. Riak Improvements • N can vary per bucket • R and W can vary per operation Choose your own fault tolerance/performance tradeoff
  • 56. Consistent Hashing 2160 0 node 0 node 1 2160/4 node 2 node 3 hash(<<"artist">>,<<"REM">>) 2160/2
  • 57. R value get(<<"artist">>,<<"REM">>, R=2) (N=3) {ok, Object} X
  • 58. W value put(<<"artist">>,<<"REM">>, W=2) (N=3) ok X
  • 59. N=10, R/W=2 get/put("artist", "REM", R/W=2) (N=10) {ok, Object} X X X X X X X X
  • 61. Resolving Conflicts • Riak focuses on the AP of CAP
  • 62. Resolving Conflicts • Riak focuses on the AP of CAP • Data could be briefly inconsistent
  • 63. Resolving Conflicts • Riak focuses on the AP of CAP • Data could be briefly inconsistent • Inconsistency must be resolved
  • 64. Detecting & Resolving Conflicts 0 1 Object v0 2 3
  • 65. Detecting & Resolving Conflicts Object 0 1 v0 Object 2 3 v0
  • 66. Detecting & Resolving Conflicts Object 0 1 v1 Object 2 3 v0
  • 67. Detecting & Resolving Conflicts 0 1 Object v1 2 3
  • 68. Detecting & Resolving Conflicts Object 0 1 v1 Object 2 3 v1
  • 70. Client Resolution • Can be set per-bucket or server-wide
  • 71. Client Resolution • Can be set per-bucket or server-wide • Conflicting data is “bubbled up” to the client
  • 72. Client Resolution • Can be set per-bucket or server-wide • Conflicting data is “bubbled up” to the client • Client picks the winner
  • 75. Server Resolution • “Last write wins” • Enabled by default
  • 76. Server Resolution • “Last write wins” • Enabled by default • What most apps need 80% of the time
  • 79. Linking Objects • Objects can store pointers, or links, to other objects
  • 80. Linking Objects • Objects can store pointers, or links, to other objects • Doesn’t have to be the same bucket
  • 81. Linking Objects • Objects can store pointers, or links, to other objects • Doesn’t have to be the same bucket • Object links described in a Link header
  • 82. Link Header Format Object URL </riak/demo/test1>; riaktag="userinfo" Link tag
  • 84. Link Walking • Ask Riak to “walk” a sequence of links
  • 85. Link Walking • Ask Riak to “walk” a sequence of links • Optionally, collect objects along the walk and return them
  • 86. Link Walking • Ask Riak to “walk” a sequence of links • Optionally, collect objects along the walk and return them • Can be arbitrarily deep
  • 88. Link Walking Examples /riak/demo/test1/_,_,1
  • 89. Link Walking Examples /riak/demo/test1/_,_,1 Start walking at /demo/test1 and return all linked objects
  • 91. Link Walking Examples /riak/demo/test1/demo,_,1
  • 92. Link Walking Examples /riak/demo/test1/demo,_,1 Start walking at /demo/test1 and return all linked objects contained in the demo bucket
  • 94. Link Walking Examples /riak/demo/test1/_,_,0/_,_,1
  • 95. Link Walking Examples /riak/demo/test1/_,_,0/_,_,1 Start walking at /demo/test1, find any linked objects, then find and return any objects linked to those
  • 98. Link Walking Examples /riak/demo/test1/_,child,0/_,_,1 Start walking at /demo/test1, find any linked objects with the link tag “child”, then find and return any objects linked to those
  • 100. Map/Reduce Terms • Phase: A step within a job
  • 101. Map/Reduce Terms • Phase: A step within a job • Job: A sequence of phases and inputs
  • 102. Map/Reduce Terms • Phase: A step within a job • Job: A sequence of phases and inputs • Map: Data collection phase
  • 103. Map/Reduce Terms • Phase: A step within a job • Job: A sequence of phases and inputs • Map: Data collection phase • Reduce: Data collation or processing phase
  • 108. Map/Reduce Overview • Map phases execute in parallel w/data locality • Reduce phases execute in parallel on the node where job was submitted • Results are not cached or stored • Phases can be written in Erlang or Javascript
  • 112. Map Phase • Inputs must be bucket/key pairs • Must return a list • Parallel results are aggregated into a single list
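The three map-phase rules can be illustrated with a local simulation. This is not how Riak executes a map phase; it is a single-process sketch in which an in-memory `Map` stands in for the cluster, and the store contents and the `runMapPhase` name are invented for the example.

```javascript
// In-memory stand-in for stored objects, keyed by "bucket/key".
// The sample data is made up for illustration.
const store = new Map([
  ["stocks/goog", { ticker: "GOOG" }],
  ["stocks/csco", { ticker: "CSCO" }],
]);

// Inputs are [bucket, key] or [bucket, key, keyData] entries.
// Every map call must return a list; the lists are merged into one result list.
function runMapPhase(inputs, mapFn, arg) {
  const results = [];
  for (const [bucket, key, keyData] of inputs) {
    const obj = store.get(`${bucket}/${key}`);
    if (obj === undefined) continue; // skip keys with no stored object
    const out = mapFn(obj, keyData, arg);
    if (!Array.isArray(out)) {
      throw new TypeError("map function must return a list");
    }
    results.push(...out);
  }
  return results;
}

console.log(runMapPhase([["stocks", "goog"], ["stocks", "csco"]], (o) => [o.ticker]));
// → [ 'GOOG', 'CSCO' ]
```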
  • 121. Erlang Map Phase • Two types: modfun and qfun • modfuns reference the module and name of the Erlang function to call • qfuns are anonymous Erlang functions* *Must be on the server-side codepath
  • 128. Erlang Map Phase map_object_value(Obj, _KeyData, _Arg) -> [riak_object:get_value(Obj)]. • Obj: riak_object retrieved from bucket/key • KeyData: Static argument specified with the bucket/key • Arg: Static argument specified with the phase
  • 133. Erlang Map Built-Ins riak_mapreduce:map_object_value/3 • Returns object value wrapped in a list riak_mapreduce:map_object_value_list/3 • Returns object value. Object value must already be a list
  • 139. Javascript Map Phase • Two types: jsanon and jsfun • jsanons are anonymous JS functions: function(value) { return [value]; } • jsfuns are named JS functions: Riak.mapValuesJson
  • 143. Erlang & Javascript • Same environment as Firefox minus browser bits • Erlang to Javascript data is JSON encoded • Javascript to Erlang data is JSON decoded
  • 149. Javascript Map Phase function(value, keyData, arg) • value: JSON-encoded version of riak_object • keyData: Same as Erlang • arg: Same as Erlang
  • 154. Javascript Map Built-Ins Riak.mapValues • Returns object values. Handles detecting when/if to use list wrapping. Riak.mapValuesJson • Returns JSON parsed object values. Also performs list wrapping, if needed.
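A sketch of what a JavaScript map function might look like. The shape of `value` used here (a `values` array whose entries carry the stored body as a string in `data`) is an assumption about the JSON-encoded riak_object, and the sample object is invented; `mapValuesJsonSketch` only imitates the described behavior of Riak.mapValuesJson and is not its actual source.

```javascript
// Assumed shape of the JSON-encoded riak_object passed to a map function.
// The field names and the sample data are illustrative assumptions.
const sample = {
  bucket: "stocks",
  key: "goog",
  values: [{ metadata: {}, data: '{"ticker":"GOOG","close":512.5}' }],
};

// Stand-in for the described behavior of Riak.mapValuesJson:
// parse each stored value as JSON and return the results as a list.
function mapValuesJsonSketch(value, keyData, arg) {
  return value.values.map((v) => JSON.parse(v.data));
}

// A custom map function that uses the phase argument as a filter.
function closingPrice(value, keyData, arg) {
  return value.values
    .map((v) => JSON.parse(v.data))
    .filter((doc) => doc.ticker === arg)
    .map((doc) => doc.close);
}

console.log(mapValuesJsonSketch(sample));
console.log(closingPrice(sample, undefined, "GOOG")); // → [ 512.5 ]
```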
  • 158. Reduce Phase • Performed on the node coordinating the map/reduce job • Two processes per reduce phase to add minor parallelism • Must return a list
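Because a reduce phase is split across more than one process, a reduce function may end up being applied to the output of other reduce calls, so it helps if list-in, list-out reduction gives the same answer either way. The sketch below illustrates that property locally; the way the input is split in two is an assumption made for the example, not a description of how Riak partitions the work.

```javascript
// A reduce function takes a list and returns a list.
function reduceSumSketch(values) {
  return [values.reduce((acc, v) => acc + v, 0)];
}

// Simulate two workers: each reduces half of the input, then the partial
// results are reduced again to produce the final list.
function reduceInTwoBatches(values, reduceFn) {
  const mid = Math.ceil(values.length / 2);
  const partials = reduceFn(values.slice(0, mid)).concat(reduceFn(values.slice(mid)));
  return reduceFn(partials);
}

console.log(reduceSumSketch([1, 2, 3, 4, 5]));                     // → [ 15 ]
console.log(reduceInTwoBatches([1, 2, 3, 4, 5], reduceSumSketch)); // → [ 15 ]
```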
  • 165. Erlang Reduce Built-Ins riak_mapreduce:reduce_set_union/2 • Returns unique set of values riak_mapreduce:reduce_sum/2 • Returns the sum of inputs riak_mapreduce:reduce_sort/2 • Returns the sorted list of inputs
  • 172. Javascript Reduce Built-Ins Riak.reduceMin • Returns the minimum value of the input set Riak.reduceMax • Returns the maximum value of the input set Riak.reduceSort • Returns a sorted list of the input set
  • 177. Building M/R Job • Job is a list of phases and starting inputs • Each phase can: • Receive a static argument • Accumulate and return results
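The job structure described above (inputs, a list of phases, a static argument per phase, and optional accumulation) can be modeled with a small local runner. This is a single-process toy, not Riak's scheduler; `runJob` and the phase object fields `type`, `fn`, `arg`, and `keep` are names chosen for the sketch.

```javascript
// Run phases in order. A map phase is applied to each item and its list
// results are merged; a reduce phase receives the whole list at once.
// Phases with keep: true have their output included in the job result.
function runJob(inputs, phases) {
  let current = inputs;
  const kept = [];
  for (const phase of phases) {
    if (phase.type === "map") {
      current = current.flatMap((item) => phase.fn(item, phase.arg));
    } else {
      current = phase.fn(current, phase.arg);
    }
    if (phase.keep) kept.push(current);
  }
  // One kept phase: return its list directly; otherwise one list per kept phase.
  return kept.length === 1 ? kept[0] : kept;
}

const result = runJob([3, 1, 2], [
  { type: "map", fn: (x, factor) => [x * factor], arg: 10, keep: false },
  { type: "reduce", fn: (xs) => [...xs].sort((a, b) => a - b), keep: true },
]);
console.log(result); // → [ 10, 20, 30 ]
```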
  • 182. Submitting Jobs via HTTP • Riak exposes M/R via its REST API • Job is described in JSON • Submitted via POST • Default URL is /mapred
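A sketch of preparing such a request. `buildMapredRequest` is a hypothetical helper, and the base URL (a local node on port 8098) is an assumption; adjust it for a real cluster. The function only builds the request description, so it can be inspected without a running node.

```javascript
// Describe the HTTP request for a map/reduce job without sending it.
// Assumption: the node is reachable at baseUrl and uses the default /mapred path.
function buildMapredRequest(job, baseUrl = "http://127.0.0.1:8098") {
  return {
    url: baseUrl + "/mapred",
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(job),
  };
}

const request = buildMapredRequest({
  inputs: [["stocks", "goog"]],
  query: [{ map: { language: "javascript", name: "Riak.mapValuesJson", keep: true } }],
});
console.log(request.method, request.url);

// With a runtime that provides fetch, the request could be sent like this:
//   const { url, ...init } = request;
//   const results = await (await fetch(url, init)).json();
```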
  • 190. Erlang Phase (JSON) {Type: {"language": "erlang", "module": Module, "function": Function, "keep": Flag}} • Type: "map" or "reduce" • Module: String name of Erlang module • Function: String name of Erlang function • Flag: Boolean accumulation toggle
  • 197. Javascript Phase (JSON) {Type: {"language": "javascript", "source": Source, "keep": Flag}} • Type: "map" or "reduce" • Source: Source for anonymous function • Flag: Boolean accumulation toggle
  • 204. Javascript Phase (JSON) {Type: {"language": "javascript", "name": Name, "keep": Flag}} • Type: "map" or "reduce" • Name: String name of Javascript function • Flag: Boolean accumulation toggle
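The three phase shapes (Erlang module/function, JavaScript source, JavaScript named function) can be generated by small builder functions. The builder names below are hypothetical; only the JSON field names come from the slides.

```javascript
// Guard against phase types other than the two the slides list.
function checkType(type) {
  if (type !== "map" && type !== "reduce") {
    throw new Error(`phase type must be "map" or "reduce", got ${type}`);
  }
}

// Erlang phase: references a module and function by name.
function erlangPhase(type, moduleName, functionName, keep = false) {
  checkType(type);
  return { [type]: { language: "erlang", module: moduleName, function: functionName, keep } };
}

// JavaScript phase with inline source for an anonymous function.
function jsSourcePhase(type, source, keep = false) {
  checkType(type);
  return { [type]: { language: "javascript", source, keep } };
}

// JavaScript phase that references a named function.
function jsNamedPhase(type, name, keep = false) {
  checkType(type);
  return { [type]: { language: "javascript", name, keep } };
}

console.log(JSON.stringify(erlangPhase("map", "riak_mapreduce", "map_object_value")));
console.log(JSON.stringify(jsSourcePhase("map", "function(value) { return [value]; }")));
console.log(JSON.stringify(jsNamedPhase("reduce", "Riak.reduceMin", true)));
```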
  • 205. Putting It Together {"inputs": [["stocks", "goog"]], "query": [{"map": {"language": "javascript", "name": "Riak.mapValuesJson", "keep": true}}]}
  • 206. Putting It Together {"inputs": [["stocks", "goog"], ["stocks", "csco"]], "query": [{"map": {"language": "javascript", "name": "Riak.mapValuesJson", "keep": true}}]}
  • 207. Putting It Together {"inputs": "stocks", "query": [{"map": {"language": "javascript", "name": "App.extractTickers", "arg": "GOOG", "keep": false}}, {"reduce": {"language": "javascript", "name": "Riak.reduceMin", "keep": true}}]}
  • 209. Thank You Kevin A. Smith Email: ksmith@basho.com Twitter: @kevsmith