Querying Riak Just Got Easier
 Secondary Indices in Riak


         OSCON Data
 Portland, Oregon · July 2011


  Basho Technologies   rusty@basho.com
     Rusty Klophaus    twitter: rustyio
tl;dr:

Secondary Indices fundamentally
 change data modeling in Riak.

 Model one-to-one, one-to-many,
 or many-to-many relationships
     simply and efficiently.
But first, a little bit
about tradeoffs and NoSQL.




Which one would you choose?

It depends on their abilities.
It also depends on the quest.


                    Fools!




Rule #1
  Your character will fare better
    in certain environments
depending on his (or her) abilities.




Rule #2
There are always tradeoffs




Databases are like
    RPG character classes.

They focus on certain abilities.




Database Abilities (1/2)
Schema
 Flexible Schema ⟩ Pre-Defined Schema ⟩ Typed Fields ⟩ Untyped Fields ⟩ Blob

Operation Skew
 Mostly Writes ⟩ 50/50 ⟩ Mostly Reads

Disk Persistence
 Every Operation ⟩ Delayed Batch ⟩ Flush ⟩ Never

Transactions
 Global ⟩ Table ⟩ Object ⟩ None

Data Relations
 Ad-Hoc ⟩ Pre-Defined ⟩ None

Operation Order
 Random ⟩ Sequential



Database Abilities (2/2)
Secondary Queries
 Ad-Hoc ⟩ Pre-Determined ⟩ None

Native Data Types
 Tables ⟩ XML ⟩ JSON ⟩ Text ⟩ Blob

Scalable
 Data Center ⟩ Cluster ⟩ Single Machine

Failure-Tolerance
 Data Center ⟩ Network ⟩ Machine ⟩ Disk ⟩ Sector ⟩ None

Stability
 Predictable Latency ⟩ Variable Latency

Performance
 Ops Per Second
And so on...
For a long time, industry focused on just one
     manifestation of database abilities:


                  SELECT * FROM Quests




           «Relational Database»

The World Has Changed
Global Internet
 “Everyone is connected...”

Mobile Computing
 “...all the time...”

Social Networking
 “...producing more data (in more varieties) than ever.”

[Figure: three overlapping circles labeled Mobile, Social, and Global]




NoSQL is about alternatives.

     Focus on different
  abilities, environments,
        and tradeoffs.



As a database consumer,
  you need to understand the
basic tradeoffs of each solution.

 In turn, database producers
     should strive to make
     those tradeoffs clear.


There are always tradeoffs.

Databases that claim to do
everything well are lying.




Where Does Riak Fit?




Have you ever been burned by:

     hardware failure?
    overloaded servers?
    emergencies at 2am?



Does your quest involve:

SLAs that mention uptime or latency?
  data that you can’t afford to lose?




If so, Riak’s tradeoffs will make sense.

          If not, they won’t.




Riak KV - Some Tradeoffs (1/3)
Amazon’s Dynamo Architecture
 ✔ Distributed, scalable, no single point of failure.
 ✘ No transactions; trade strong consistency for eventual consistency.




Riak KV - Some Tradeoffs (2/3)
Extremely Focused on Operations
 ✔ Simple to install, manage, connect a cluster.
 ✔ Has been called “plumbing”, i.e., it just works.
 ✘ Historically, developer-facing features lagged behind.
   This is rapidly changing.


               # Scale out...
               riak-admin join nodename@hostname

               # Scale back in...
               riak-admin leave



Riak KV - Some Tradeoffs (3/3)
Key/Value Model
 ✔ Simple, straightforward, content-type agnostic.
 ✘ More difficult to discover your data. (Queryability.)




            Let’s dive deeper into “queryability.”




Current Options for Querying Riak
MapReduce
 Provide set of starting keys, filter via map.
 MapReduce is meant for calculations/aggregations, not queries.

Riak Search
 Full-text search in Riak.
 Opinionated, assumes your document is prose.

Roll Your Own Indices
 Difficult to get right.
 More code to maintain.
 Often introduces SPOFs.

New Feature: Secondary Indices




What are Secondary Indices?




What Are Secondary Indices?
Goals
 Provide *simple* indexing on Riak objects.
 Maintain Riak’s operational advantages.
 Make a developer’s life easier.

How Does It Work?
 At write time, tag your data with key/value metadata.
 Query the metadata, get matching objects.




For example...

BUCKET   KEY          VALUE                       INDEX METADATA

loot     gauntlet24   “Gauntlets of Shininess”    category: armor
                                                  price: 400
                                                  ...


Index an Object

# Store an object with:
#   Bucket: loot
#   Key:     gauntlet24
#   Fields:
#      - category: armor
#      - price: 400
curl \
  -X PUT \
  -d "OPAQUE_VALUE" \
  -H "x-riak-index-category_bin: armor" \
  -H "x-riak-index-price_int: 400" \
  http://127.0.0.1:8098/riak/loot/gauntlet24
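
The same request can be assembled from a program. A minimal Python sketch, assuming a hypothetical helper named index_headers (it is not part of any Riak client) that picks the _bin or _int suffix from the Python type of each value. The request is built but never sent, so the sketch runs without a Riak node:

```python
import urllib.request


def index_headers(fields):
    """Turn {"category": "armor", "price": 400} into x-riak-index-* headers.

    Hypothetical helper: integers get the _int suffix, strings get _bin.
    """
    headers = {}
    for name, value in fields.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool):
            raise TypeError(f"unsupported index value for {name!r}: {value!r}")
        if isinstance(value, int):
            headers[f"x-riak-index-{name}_int"] = str(value)
        elif isinstance(value, str):
            headers[f"x-riak-index-{name}_bin"] = value
        else:
            raise TypeError(f"unsupported index value for {name!r}: {value!r}")
    return headers


headers = index_headers({"category": "armor", "price": 400})

# Build (but do not send) the same PUT as the curl command above.
request = urllib.request.Request(
    "http://127.0.0.1:8098/riak/loot/gauntlet24",
    data=b"OPAQUE_VALUE",
    headers=headers,
    method="PUT",
)
# urllib.request.urlopen(request) would send it to a running node.
```

Deriving the suffix from the value keeps the field name and the type from drifting apart in calling code.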



Query the Index

# Query for category_bin = "armor"
curl \
  http://127.0.0.1:8098/buckets/loot/index/category_bin/armor

{"keys":["gauntlet24"]}



# Query for price_int between 300 and 500
curl \
  http://127.0.0.1:8098/buckets/loot/index/price_int/300/500

{"keys":["gauntlet24"]}




Query Syntax
 /buckets/$BUCKET/index/$FIELDNAME/$VALUE
 /buckets/$BUCKET/index/$FIELDNAME/$START/$END

$BUCKET
 Bucket to query.
$FIELDNAME
 Must end with “_bin” for binaries, “_int” for integers.
 Special field $key for key range lookups.
$VALUE / $START / $END
 Equality or range queries.
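
The two URL shapes above can be generated by one small function. A Python sketch under stated assumptions: the helper names index_query_path and keys_from_response are made up for illustration, percent-encoding of path segments is this sketch's choice rather than something the slides specify, and the response shape is the {"keys": [...]} JSON shown on the previous slide:

```python
import json
from urllib.parse import quote


def index_query_path(bucket, field, start, end=None):
    """Build the path for an equality query (end=None) or a range query."""
    if not (field.endswith(("_bin", "_int")) or field == "$key"):
        raise ValueError(f"field {field!r} must end in _bin or _int, or be $key")
    segments = ["buckets", bucket, "index", field, str(start)]
    if end is not None:
        segments.append(str(end))
    # Percent-encode each segment; "$" is kept literal for the $key field.
    return "/" + "/".join(quote(s, safe="$") for s in segments)


def keys_from_response(body):
    """Extract the list of matching keys from a JSON response body."""
    return json.loads(body)["keys"]


equality_path = index_query_path("loot", "category_bin", "armor")
range_path = index_query_path("loot", "price_int", 300, 500)
keys = keys_from_response('{"keys":["gauntlet24"]}')
```

Here equality_path and range_path reproduce the two paths queried with curl earlier, and keys holds the single matching key from the sample response.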



Data Modeling with
Secondary Indices




Key/Value Lookups
     Retrieve a user’s session.
     Retrieve an object by key.



         sessions/                 {
         8b6cfaa                       foo: "bar",
                                       ...
                                   }




Key/Value Lookups
     Retrieve a user’s session.            (Use Case)
     Retrieve an object by key.            (Generic Case)

                            Object

         sessions/                   {
         8b6cfaa                         foo: "bar",
                                         ...
                                     }

               Key                         Value




Alternate Keys / One-to-One Relationships
      Retrieve a user by username or by email address.
      An object has multiple names.

    users/                    {
    rusty                         username: "rusty",
                                  email: "rusty@basho.com",
                                  twitter: "rustyio",
                                  ...
                              }

    emails/                   "rusty"
    rusty@basho.com




Alternate Keys (With Secondary Indices!)
     Retrieve a user by username or by email address.
     An object has multiple names.

    users/                   {
    rusty                        username: "rusty",
                                 email: "rusty@basho.com",
                                 twitter: "rustyio",
                                 ...
                             }
                             Indexes:
                               email_bin: rusty@basho.com
                               twitter_bin: rustyio
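
To make the lookup concrete, here is a toy, in-memory model in Python. ToyBucket and its methods are invented for this sketch and only imitate the behavior described on the slide; they are not Riak's implementation or client API:

```python
class ToyBucket:
    """In-memory stand-in for a bucket with secondary indexes (not Riak)."""

    def __init__(self):
        self.objects = {}   # key -> value
        self.indexes = {}   # key -> set of (field, value) index entries

    def put(self, key, value, indexes=()):
        self.objects[key] = value
        self.indexes[key] = set(indexes)

    def index_eq(self, field, value):
        """Return the sorted keys whose index entries contain (field, value)."""
        return sorted(k for k, entries in self.indexes.items()
                      if (field, value) in entries)


users = ToyBucket()
users.put(
    "rusty",
    {"username": "rusty", "email": "rusty@basho.com", "twitter": "rustyio"},
    indexes=[("email_bin", "rusty@basho.com"), ("twitter_bin", "rustyio")],
)

# One object, three names: the key plus two index entries.
by_email = users.index_eq("email_bin", "rusty@basho.com")
by_twitter = users.index_eq("twitter_bin", "rustyio")
```

Both lookups resolve to the key rusty, so the separate emails/ bucket from the previous slide is no longer needed.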




Ownership / One-to-Many Relationships
      A person has many cars.
      Parent has a *small* number of children.

    people/                    {
    frank                          cars: [
                                     {
                                       plate: "ET7-B928",
                                       color: "red",
                                       type: "corvette",
                                     }
                                     ...
                                   ]
                               }




Ownership / One-to-Many Relationships (With Secondary Indices!)
      A person has many cars.
      Parent has a *small* number of children.

    people/                    {
    frank                          cars: [
                                     {
                                        plate: "ET7-B928",
                                        color: "red",
                                        type: "corvette",
                                     },
                                     ...
                                   ]
                               }
                               Indexes:
                                 cars_plate_bin: ET7-B928
                                 cars_plate_bin: BUB-7911
                                 ...
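
The index entries for the parent can be derived from the embedded children at write time. A short Python sketch, assuming a hypothetical function car_plate_index_entries; the color and type of the second car are invented here, since only its plate appears on the slide:

```python
def car_plate_index_entries(person):
    """Return one ("cars_plate_bin", plate) entry per car in the object."""
    return [("cars_plate_bin", car["plate"]) for car in person.get("cars", [])]


frank = {
    "cars": [
        {"plate": "ET7-B928", "color": "red", "type": "corvette"},
        # Hypothetical details; only the plate BUB-7911 is on the slide.
        {"plate": "BUB-7911", "color": "blue", "type": "truck"},
    ]
}

entries = car_plate_index_entries(frank)
```

The same field name appears once per car, which is what lets a query on a single plate find its owner.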
Ownership / One-to-Many Relationships
      A user has many status updates.
      Parent has a *large* number of children.
    users/                    {
    rustyio                       status_updates: [
                                    "18258713",
                                    "87187597",
                                    "71117389",
                                    ...
                                  ]
                              }

    statuses/                 {
    18258713                      author: "rustyio"
                                  reply_to: "barackobama"
                                  text: "Sorry, can't hang out
                                         now, I'm speaking at
                                         OSCON Data."
                              }
Ownership / One-to-Many Relationships (With Secondary Indices!)
      A user has many status updates.
      Parent has a *large* number of children.
   users/                     {
   rustyio                        ...
                              }

   statuses/                  {
   18258713                       author: "rustyio"
                                  reply_to: "barackobama"
                                  text: "Sorry, can't hang out
                                         now, I'm speaking at
                                         OSCON Data."
                              }
                              Indexes:
                                author_bin: rustyio
                                reply_to_bin: barackobama
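
The practical difference is on the write path: posting a status no longer requires rewriting an ever-growing list on the user object. A toy Python sketch of that idea, with invented names (post_status, statuses_by) and plain dictionaries standing in for the statuses bucket and its author_bin index; the second and third updates are made-up sample data:

```python
from collections import defaultdict

statuses = {}                       # status key -> status object
author_index = defaultdict(list)    # author_bin value -> status keys


def post_status(key, author, text):
    """Write one status object and tag it; the user object is never touched."""
    statuses[key] = {"author": author, "text": text}
    author_index[author].append(key)


def statuses_by(author):
    """Equivalent of querying statuses for author_bin = <author>."""
    return list(author_index.get(author, []))


post_status("18258713", "rustyio",
            "Sorry, can't hang out now, I'm speaking at OSCON Data.")
post_status("87187597", "rustyio", "Hypothetical second update.")
post_status("55500001", "frank", "Hypothetical update from another user.")

timeline = statuses_by("rustyio")
```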
Membership / Many-to-Many Relationships
      A user joins one or more clubs, a club has users.
      A group has many members.

   users/                    {
   rusty                         clubs: [
                                   "dc_larping_club",
                                   ...
                                 ]
                             }

   clubs/                    {
   dc_larpers                    users: [
                                   "rusty",
                                   "frank"
                                 ]
                             }

Membership / Many-to-Many Relationships (With Secondary Indices!)
      A user joins one or more clubs, a club has users.
      A group has many members.

   users/                    {
   rusty                         clubs: [
                                   "dc_larping_club",
                                   "nascar_fans",
                                   ...
                                 ]
                             }
                             Indexes:
                               club_bin: dc_larping_club
                               club_bin: nascar_fans

   clubs/                    ...
   dc_larpers
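
With the membership stored only on the user, "who is in this club?" becomes an index query, and leaving a club is a single write to one user object. A toy Python sketch with an invented ClubIndex class; it assumes that each write replaces the object's previous index entries, which is this sketch's simplification and is not stated on the slides:

```python
class ClubIndex:
    """Toy club_bin index over user objects (in-memory, not Riak)."""

    def __init__(self):
        self.entries = {}  # user key -> set of club_bin values

    def put_user(self, user_key, clubs):
        # Assumption: each write replaces the user's previous index entries.
        self.entries[user_key] = set(clubs)

    def members_of(self, club):
        """Equivalent of querying users for club_bin = <club>."""
        return sorted(u for u, clubs in self.entries.items() if club in clubs)


index = ClubIndex()
index.put_user("rusty", ["dc_larping_club", "nascar_fans"])
index.put_user("frank", ["dc_larping_club"])
members_before = index.members_of("dc_larping_club")

# Rusty leaves the club: rewrite one user object, nothing else.
index.put_user("rusty", ["nascar_fans"])
members_after = index.members_of("dc_larping_club")
```

There is no second list on the club object to keep in sync, which is the bookkeeping the earlier version of this model required.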
What Were The Challenges?




Challenge: Ambitious Prototyping (1/5)
Why Difficult?
 Early prototypes contained support for:
 • A SQL-like query language (RQL), with compound queries
 • Sorting and pagination, with intelligent caching
 • Inline map/reduce
 • Extensible data types
 Arguably too clever.
 Allowed developers to shoot selves in foot.

How Solved?
 Ruthlessly cut features & simplify.

Challenge: Data Types (2/5)
Why Difficult?
 What type is a given field? Naming convention, or global dictionary?
 What if the user wants to change the type?
 What if the user provides a value of the wrong type?

How Solved?
 Field type determined by suffix. (field1_bin, field2_int)
 Different type == different field name.
 Pre-commit hook to validate data types.
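
The suffix rule is simple enough to check in a few lines. The Python sketch below only illustrates the kind of validation described; validate_index_entry is a hypothetical name, and the actual pre-commit hook runs inside Riak and is not shown in the slides:

```python
def validate_index_entry(field, value):
    """Return the value coerced to the type named by the field suffix.

    Raises ValueError for an unknown suffix or a value of the wrong type.
    """
    if field.endswith("_int"):
        try:
            return int(str(value).strip())
        except ValueError:
            raise ValueError(f"{field}: {value!r} is not an integer") from None
    if field.endswith("_bin"):
        if not isinstance(value, (str, bytes)):
            raise ValueError(f"{field}: {value!r} is not a binary/string")
        return value
    raise ValueError(f"{field}: name must end in _bin or _int")


price = validate_index_entry("price_int", "400")
category = validate_index_entry("category_bin", "armor")
```

Because the type lives in the name, changing a field's type means introducing a new field (for example price_bin next to price_int) instead of migrating existing entries.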




Challenge: Disk Based Storage (3/5)
Why Difficult?
 Disk is slow. (http://highscalability.com/numbers-everyone-should-know)
 Need data structures that are both read and write efficient, for data
 of unpredictable sizes and shapes.

How Solved?
 Leverage merge_index (data engine from Riak Search).
 Investigate LevelDB (library from Google)




Challenge: Atomicity (4/5)
Why Difficult?
 Need to keep the index synchronized with the object value.
 Account for eventual consistency, siblings, handoff, replication, etc.
 What happens during node failure / partial cluster situations?

How Solved?
 Make the KV object the authoritative data.
 Indexed data is discarded if the object moves.




Challenge: System is Distributed (5/5)
Why Difficult?
 Index is split over many different partitions.
 *Don’t* need to query every partition.
 *Do* need to be smart about which partitions to query.

How Solved?
 Extend riak_core (distribution layer) with ability to broadcast
 command to covering set of partitions.
 (h/t to Kelly McLaughlin, @_klm)
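
The idea of a covering set can be sketched in Python. This is a simplified model and not riak_core's algorithm: it assumes a ring of ring_size partitions, that each object (and its index entries) is replicated to n_val consecutive partitions, and that every node is up. The function names and the values 64 and 3 are illustrative:

```python
def preference_list(partition, ring_size, n_val):
    """Partitions holding replicas of data whose primary is `partition`."""
    return [(partition + i) % ring_size for i in range(n_val)]


def covering_set(ring_size, n_val):
    """Pick every n_val-th partition so each replica set is seen at least once."""
    return list(range(0, ring_size, n_val))


def is_covering(chosen, ring_size, n_val):
    """True if every preference list contains at least one chosen partition."""
    chosen = set(chosen)
    return all(
        chosen.intersection(preference_list(p, ring_size, n_val))
        for p in range(ring_size)
    )


plan = covering_set(ring_size=64, n_val=3)
```

In this model the query touches 22 of 64 partitions instead of all of them. A real planner also has to route around unavailable nodes and avoid returning the same key twice where replica sets overlap, which this sketch ignores.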




Indexing
 Client API
   User tags the object with metadata.
 Cluster
   Write Coordinator (riak_kv_put_fsm)
     Validate that metadata parses.
   PUT Request
 Node
   VNode Coordinator (riak_core_vnode_master)
 VNode (Virtual Node)
   Core VNode (riak_core_vnode)
   KV VNode (riak_kv_vnode)
   Backend (riak_kv_index_backend)
     Stores object in bitcask, index metadata in merge_index.

Querying
 Client API
   User issues a query against metadata.
 Cluster
   Query Coordinator (riak_index_query_fsm)
   Coverage Logic (riak_core_coverage_fsm)
   Query Request
 Node
   VNode Coordinator (riak_core_vnode_master)
 VNode (Virtual Node)
   Core VNode (riak_core_vnode)
   KV VNode (riak_kv_vnode)
   Backend (riak_kv_index_backend)
     Runs query against index, replies with results.

Next Steps
Publish
 We are open source. Code is available now (for the foolhardy.)
 Beta version soon (for the adventurous.)
 Included in Riak version 1.0 (for the masses.)




API design is like sex:
     Make one mistake and
support it for the rest of your life.
           - @joshbloch

(Everything is subject to change.)


About Basho Technologies
<plug type=“shameless”>
 • Distributed company: ~30 people
 • Cambridge, MA / San Francisco, CA / Reston, VA
 • “We hate downtime, we hate overtime.”
 • Riak KV (Open Source)
 • Riak Support, Services, and Enterprise Features ($)

</plug>




Thanks! / Questions?
 Also at OSCON...


 Rusty Klophaus        Mark Phillips
rusty@basho.com        mark@basho.com
         @rustyio      @pharkmillups




    Ryan Zezeski        Tony Falco
rzezeski@basho.com      tony@basho.com
        @rzezeski       @antonyfalco
END

More Related Content

PDF
Riak from Small to Large
PDF
Intro To Couch Db
PDF
02 20180605 meetup_fdw_v1
PDF
OrientDB introduction - NoSQL
PPTX
OrientDB the graph database
PDF
Icinga 2009 at OSMC
PDF
Elastify you application: from SQL to NoSQL in less than one hour!
PDF
Relevance trilogy may dream be with you! (dec17)
Riak from Small to Large
Intro To Couch Db
02 20180605 meetup_fdw_v1
OrientDB introduction - NoSQL
OrientDB the graph database
Icinga 2009 at OSMC
Elastify you application: from SQL to NoSQL in less than one hour!
Relevance trilogy may dream be with you! (dec17)

What's hot (7)

PDF
OrientDB
PDF
NoSQL and JavaScript: a Love Story
PDF
Search@airbnb
PPTX
Couch db
PDF
N hidden gems in hippo forge and experience plugins (dec17)
PDF
N hidden gems in forge (as of may '17)
ODP
OrientDB
NoSQL and JavaScript: a Love Story
Search@airbnb
Couch db
N hidden gems in hippo forge and experience plugins (dec17)
N hidden gems in forge (as of may '17)
Ad

Viewers also liked (14)

PPTX
A little about Message Queues - Boston Riak Meetup
PPTX
Humans & Machines Ethics Canvas
PPS
Subway in Lisbon
PDF
Performance-Based Funding – A New Era in Accountability?
KEY
Social Media Strategies for Schools for OASBO Conference
PDF
Ux och design som konverterar del 3
PDF
The threat to small business retirement savings
PDF
"Unë do t'ju tregoj të ardhmen 1"
PPT
Linkedin Slideshare Driving Force Btec
PPT
CP2 Newport Beach 2010
PDF
Презентация агентства PRCI.Storytellers
PDF
Storytelling + Experiences: Ingredients of a Successful Redesign
PPTX
Semantic Techniques for Enabling Knowledge Reuse in Conceptual Modelling
PDF
Glassdoor Summit: Ana Recio
A little about Message Queues - Boston Riak Meetup
Humans & Machines Ethics Canvas
Subway in Lisbon
Performance-Based Funding – A New Era in Accountability?
Social Media Strategies for Schools for OASBO Conference
Ux och design som konverterar del 3
The threat to small business retirement savings
"Unë do t'ju tregoj të ardhmen 1"
Linkedin Slideshare Driving Force Btec
CP2 Newport Beach 2010
Презентация агентства PRCI.Storytellers
Storytelling + Experiences: Ingredients of a Successful Redesign
Semantic Techniques for Enabling Knowledge Reuse in Conceptual Modelling
Glassdoor Summit: Ana Recio
Ad

Similar to Querying Riak Just Got Easier - Introducing Secondary Indices (20)

PPT
MongoDb and Windows Azure
PPT
OOP programming for engineering students
PPTX
Elastic search and Symfony3 - A practical approach
PDF
Freebase: Wikipedia Mining 20080416
PDF
Terrastore - A document database for developers
PDF
Slides: Moving from a Relational Model to NoSQL
PDF
PDF
Application development with Oracle NoSQL Database 3.0
PDF
Couchdb Nosql
PDF
Real-time Data De-duplication using Locality-sensitive Hashing powered by Sto...
PDF
MongoDB at FrozenRails
PDF
Introduction to MongoDB
PDF
Schema on read is obsolete. Welcome metaprogramming..pdf
PPTX
Probabilistic Data Structures (Edmonton Data Science Meetup, March 2018)
KEY
Managing Social Content with MongoDB
PDF
CouchDB Open Source Bridge
PDF
Schema registries and Snowplow
PPTX
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
PDF
MongoDB Meetup
PDF
Bids talk 9.18
MongoDb and Windows Azure
OOP programming for engineering students
Elastic search and Symfony3 - A practical approach
Freebase: Wikipedia Mining 20080416
Terrastore - A document database for developers
Slides: Moving from a Relational Model to NoSQL
Application development with Oracle NoSQL Database 3.0
Couchdb Nosql
Real-time Data De-duplication using Locality-sensitive Hashing powered by Sto...
MongoDB at FrozenRails
Introduction to MongoDB
Schema on read is obsolete. Welcome metaprogramming..pdf
Probabilistic Data Structures (Edmonton Data Science Meetup, March 2018)
Managing Social Content with MongoDB
CouchDB Open Source Bridge
Schema registries and Snowplow
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
MongoDB Meetup
Bids talk 9.18

More from Rusty Klophaus (9)

PDF
Everybody Polyglot! - Cross-Language RPC with Erlang
KEY
Winning the Erlang Edit•Build•Test Cycle
PDF
Masterless Distributed Computing with Riak Core - EUC 2010
PDF
Riak - From Small to Large - StrangeLoop
PDF
Riak - From Small to Large
PDF
Riak Core: Building Distributed Applications Without Shared State
PDF
Riak Search - Erlang Factory London 2010
PDF
Riak Search - Berlin Buzzwords 2010
PDF
Getting Started with Riak - NoSQL Live 2010 - Boston
Everybody Polyglot! - Cross-Language RPC with Erlang
Winning the Erlang Edit•Build•Test Cycle
Masterless Distributed Computing with Riak Core - EUC 2010
Riak - From Small to Large - StrangeLoop
Riak - From Small to Large
Riak Core: Building Distributed Applications Without Shared State
Riak Search - Erlang Factory London 2010
Riak Search - Berlin Buzzwords 2010
Getting Started with Riak - NoSQL Live 2010 - Boston

Recently uploaded (20)

PPTX
MYSQL Presentation for SQL database connectivity
PDF
The Rise and Fall of 3GPP – Time for a Sabbatical?
PDF
Machine learning based COVID-19 study performance prediction
PPT
“AI and Expert System Decision Support & Business Intelligence Systems”
PDF
KodekX | Application Modernization Development
PDF
Spectral efficient network and resource selection model in 5G networks
PDF
Reach Out and Touch Someone: Haptics and Empathic Computing
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PPTX
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
PDF
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PDF
Encapsulation theory and applications.pdf
PDF
NewMind AI Weekly Chronicles - August'25 Week I
PDF
MIND Revenue Release Quarter 2 2025 Press Release
PDF
Unlocking AI with Model Context Protocol (MCP)
PDF
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
PDF
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
PDF
Approach and Philosophy of On baking technology
PDF
Review of recent advances in non-invasive hemoglobin estimation
MYSQL Presentation for SQL database connectivity
The Rise and Fall of 3GPP – Time for a Sabbatical?
Machine learning based COVID-19 study performance prediction
“AI and Expert System Decision Support & Business Intelligence Systems”
KodekX | Application Modernization Development
Spectral efficient network and resource selection model in 5G networks
Reach Out and Touch Someone: Haptics and Empathic Computing
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
Advanced methodologies resolving dimensionality complications for autism neur...
Diabetes mellitus diagnosis method based random forest with bat algorithm
Encapsulation theory and applications.pdf
NewMind AI Weekly Chronicles - August'25 Week I
MIND Revenue Release Quarter 2 2025 Press Release
Unlocking AI with Model Context Protocol (MCP)
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
Approach and Philosophy of On baking technology
Review of recent advances in non-invasive hemoglobin estimation

Querying Riak Just Got Easier - Introducing Secondary Indices

  • 1. Querying Riak Just Got Easier Secondary Indices in Riak OSCON Data Portland, Oregon · July 2011 Basho Technologies rusty@basho.com Rusty Klophaus twitter: rustyio
  • 2. tl;dr: Secondary Indices fundamentally change data modeling in Riak. Model one-to-one, one-to-many, or many-to-many relationships simply and efficiently. 2
  • 3. But first, a little bit about tradeoffs and NoSQL. 3
  • 5. Which one would you choose? It depends on their abilities. It also depends on the quest. Fools! 5
  • 6. Rule #1 Your character will fare better in certain environments depending on his (or her) abilities. 6
  • 7. Rule #2 There are always tradeoffs 7
  • 8. Databases are like RPG character classes. They focus on certain abilities. 8
  • 9. Database Abilities (1/2) Schema Flexible Schema ⟩ Pre-Defined Schema ⟩ Typed Fields ⟩ Untyped Fields ⟩ Blob Operation Skew Mostly Writes ⟩ 50/50 ⟩ Mostly Reads Disk Persistence Every Operation ⟩ Delayed Batch ⟩ Flush ⟩ Never Transactions Global ⟩ Table ⟩ Object ⟩ None Data Relations Ad-Hoc ⟩ Pre-Defined ⟩ None Operation Order Random ⟩ Sequential 9
  • 10. Database Abilities (2/2) Secondary Queries Ad-Hoc ⟩ Pre-Determined ⟩ None Native Data Types Tables ⟩ XML ⟩ JSON ⟩ Text ⟩ Blob Scalable Data Center ⟩ Cluster ⟩ Single Machine Failure-Tolerance Data Center ⟩ Network ⟩ Machine ⟩ Disk ⟩ Sector ⟩ None Stability Predictable Latency ⟩ Variable Latency Performance Ops Per Second And so on... 10
  • 11. For a long time, industry focused on just one manifestation of database abilities: SELECT * FROM Quests «Relational Database» 11
  • 12. The World Has Changed Global Internet “Everyone is connected...” ile Soc ob ial M Mobile Computing Global “...all the time...” Social Networking “...producing more data (in more varieties) than ever.” 12
  • 13. NoSQL is about alternatives. Focus on different abilities, environments, and tradeoffs. 13
  • 14. As a database consumer, you need to understand the basic tradeoffs of each solution. In turn, database producers should strive to make those tradeoffs clear. 14
  • 15. There are always tradeoffs. Databases that claim to do everything well are lying. 15
  • 16. Where Does Riak Fit? 16
  • 17. 17
  • 18. Have you ever been burned by: hardware failure? overloaded servers? emergencies at 2am? 17
  • 19. Does your quest involve: SLAs that mention uptime or latency? data that you can’t afford to lose? 18
  • 20. If so, Riak’s tradeoffs will make sense. If not, they won’t. 19
  • 21. Riak KV - Some Tradeoffs (1/3) Amazon’s Dynamo Architecture ✔Distributed, scalable, no single point of failure. ✘ No transactions; trade strong consistency for eventual consistency. 20
  • 22. Riak KV - Some Tradeoffs (2/3) Extremely Focused on Operations ✔Simple to install, manage, connect a cluster. ✔Has been called “plumbing”, ie: it just works. ✘ Historically, developer-facing features lagged behind. This is rapidly changing. # Scale out... riak-admin join nodename@hostname # Scale back in... riak-admin leave 21
  • 23. Riak KV - Some Tradeoffs (3/3) Key/Value Model ✔Simple, straightforward, content-type agnostic. ✘ More difficult to discover your data. (Queryability.) Let’s dive deeper into “queryability.” 22
  • 24. Current Options for Querying Riak MapReduce Provide set of starting keys, filter via map. MapReduce is meant for calculations/aggregations, not queries. Riak Search Full-text search in Riak. Opinionated, assumes your document is prose. Roll Your Own Indices Difficult to get right. More code to maintain. Often introduces SPOFs. 23
  • 26. What are Secondary Indices? 25
  • 27. What Are Secondary Indices? Goals Provide *simple* indexing on Riak objects. Maintain Riak’s operational advantages. Make a developer’s life easier. How Does It Work? At write time, tag your data with key/value metadata. Query the metadata, get matching objects. 26
  • 28. For example... BUCKET KEY VALUE INDEX METADATA loot gauntlet24 category: armor price: 400 ... “Gauntlets of Shininess” 27
  • 29. Index an Object # Store an object with: # Bucket: loot # Key: gauntlet24 # Fields: # - category: armor # - price: 400 curl -X PUT -d "OPAQUE_VALUE" -H "x-riak-index-category_bin: armor" -H "x-riak-index-price_int: 400" http://127.0.0.1:8098/riak/loot/gauntlet24 28
  • 30. Query the Index # Query for category_bin = "armor" curl http://127.0.0.1:8098/buckets/loot/index/category_bin/armor {"keys":["gauntlet24"]} # Query for price_int between 300 and 500 curl http://127.0.0.1:8098/buckets/loot/index/price_int/300/500 {"keys":["gauntlet24"]} 29
  • 31. Query Syntax /buckets/$BUCKET/index/$FIELDNAME/$VALUE /buckets/$BUCKET/index/$FIELDNAME/$START/$END $BUCKET Bucket to query. $FIELDNAME Must end with “_bin” for binaries, “_int” for integers. Special field $key for key range lookups. $VALUE / $START / $END Equality or range queries. 30
  • 33. Key/Value Lookups Retrieve a user’s session. Retrieve an object by key. sessions/ { 8b6cfaa foo: "bar", ... } 32
  • 34. Key/Value Lookups Use Case Retrieve a user’s session. Retrieve an object by key. Generic Case Object sessions/ { 8b6cfaa foo: "bar", ... } Key Value 33
  • 35. Alternate Keys / One-to-One Relationships Retrieve a user by username or by email address. An object has multiple names. users/ { rusty username: "rusty", email: "rusty@basho.com", twitter: "rustyio", ... } emails/ "rusty" rusty@basho.com 34
  • 36. Alternate Keys Wit Retieve a user by username or by email address. Seco h nda Ind ry An object has multiple names. ices ! users/ { rusty username: "rusty", email: "rusty@basho.com", twitter: "rustyio", ... } Indexes: email_bin: rusty@basho.com twitter_bin: rustyio 35
  • 37. Ownership / One-to-Many Relationships A person has many cars. Parent has a *small* number of children. people/ { frank cars: [ { plate: "ET7-B928", color: "red", type: "corvette", } ... ] } 36
  • 38. Ownership / One-to-Many Relationships Wit A person has many cars. Seco h nda Ind ry Parent has a *small* number of children. ices ! people/ { frank cars: [ { plate: "ET7-B928", color: "red", type: "corvette", }, ... ] } Indexes: cars_plate_bin: ET7-B928 cars_plate_bin: BUB-7911 ... 37
  • 39. Ownership / One-to-Many Relationships A user has many status updates. Parent has a *large* number of children. users/ { rustyio status_updates: [ "18258713", "87187597", "71117389", ... ] } statuses/ { 18258713 author: "rustyio" reply_to: "barackobama" text: "Sorry, can't hang out now, I'm speaking at OSCON Data." }38
  • 40. Ownership / One-to-Many Relationships (With Secondary Indices!) A user has many status updates. Parent has a *large* number of children. users/ { rustyio ... } statuses/ { 18258713 author: "rustyio" reply_to: "barackobama" text: "Sorry, can't hang out now, I'm speaking at OSCON Data." } Indexes: author_bin: rustyio reply_to_bin: barackobama 39
  • 41. Membership / Many-to-Many Relationships A user joins one or more clubs, a club has users. A group has many members. users/ { rusty clubs: [ "dc_larping_club", ... ] } clubs/ { dc_larpers users: [ "rusty", "frank" ] } 40
  • 42. Membership / Many-to-Many Relationships (With Secondary Indices!) A user joins one or more clubs, a club has users. A group has many members. users/ { rusty clubs: [ "dc_larping_club", "nascar_fans", ... ] } Indexes: club_bin: dc_larping_club club_bin: nascar_fans clubs/ ... dc_larpers 41
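
The many-to-many case relies on one object carrying several values for the same index field. The toy in-memory index below illustrates that idea only; `ToyIndex` is a hypothetical stand-in written for this example and says nothing about how Riak stores index entries.

```python
from collections import defaultdict


class ToyIndex:
    """In-memory stand-in for one bucket's secondary index (illustration only)."""

    def __init__(self):
        self._entries = defaultdict(set)  # (field, value) -> set of object keys

    def put(self, key, indexes):
        # Drop any stale entries for this key, then index the new tags.
        for keys in self._entries.values():
            keys.discard(key)
        for field, value in indexes:
            self._entries[(field, value)].add(key)

    def query(self, field, value):
        # Equality query: every key tagged with field == value.
        return sorted(self._entries.get((field, value), ()))


users = ToyIndex()
users.put("rusty", [("club_bin", "dc_larping_club"), ("club_bin", "nascar_fans")])
users.put("frank", [("club_bin", "dc_larping_club")])

members = users.query("club_bin", "dc_larping_club")
```

Membership is recorded only on the user side; the club's member list falls out of a query, so the `clubs/` object no longer needs its own `users` array.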
  • 43. What Were The Challenges? 42
  • 44. Challenge: Ambitious Prototyping (1/5) Why Difficult? Early prototypes contained support for: • A SQL-like query language (RQL), with compound queries • Sorting and pagination, with intelligent caching • Inline map/reduce • Extensible data types Arguably too clever. Allowed developers to shoot selves in foot. How Solved? Ruthlessly cut features & simplify. 43
  • 45. Challenge: Data Types (2/5) Why Difficult? What type is a given field? Naming convention, or global dictionary? What if the user wants to change the type? What if the user provides a value of the wrong type? How Solved? Field type determined by suffix. (field1_bin, field2_int) Different type == different field name. Pre-commit hook to validate data types. 44
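
The suffix rule lends itself to a small validation step. The sketch below shows the kind of check a pre-commit hook could perform, written in Python purely for illustration; `validate_indexes` is a hypothetical name, and the actual hook in Riak is not shown in this deck.

```python
def validate_indexes(indexes):
    """Return a list of error strings; an empty list means all fields are valid."""
    errors = []
    for field, value in indexes.items():
        if field.endswith("_int"):
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                errors.append(f"{field}: expected an integer, got {value!r}")
        elif field.endswith("_bin"):
            if not isinstance(value, (str, bytes)):
                errors.append(f"{field}: expected a binary/string, got {value!r}")
        else:
            errors.append(f"{field}: name must end with _bin or _int")
    return errors


ok = validate_indexes({"category_bin": "armor", "price_int": 400})
bad = validate_indexes({"price_int": "cheap", "weight": 3})
```

Because the type lives in the field name, changing a field's type means introducing a new field (for example `price_bin` alongside `price_int`) rather than migrating an existing one.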
  • 46. Challenge: Disk Based Storage (3/5) Why Difficult? Disk is slow. (http://highscalability.com/numbers-everyone-should-know) Need data structures that are both read and write efficient, for data of unpredictable sizes and shapes. How Solved? Leverage merge_index (data engine from Riak Search). Investigate LevelDB (library from Google). 45
  • 47. Challenge: Atomicity (4/5) Why Difficult? Need to keep the index synchronized with the object value. Account for eventual consistency, siblings, handoff, replication, etc. What happens during node failure / partial cluster situations? How Solved? Make the KV object the authoritative data. Indexed data is discarded if the object moves. 46
  • 48. Challenge: System is Distributed (5/5) Why Difficult? Index is split over many different partitions. *Don’t* need to query every partition. *Do* need to be smart about which partitions to query. How Solved? Extend riak_core (distribution layer) with ability to broadcast command to covering set of partitions. (h/t to Kelly McLaughlin, @_klm) 47
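
Why a subset of partitions suffices can be seen in a simplified model. The sketch below assumes a ring of `ring_size` partitions in which every key is replicated to `n_val` consecutive partitions, so picking every `n_val`-th partition touches at least one replica of every key. This is an illustration under those assumptions, not riak_core's actual coverage algorithm, which must also avoid returning the same key from overlapping replicas.

```python
def covering_partitions(ring_size, n_val):
    """Pick every n_val-th partition, starting at partition 0."""
    return list(range(0, ring_size, n_val))


def is_covering(chosen, ring_size, n_val):
    """Check that every run of n_val consecutive partitions (with wraparound)
    contains at least one chosen partition."""
    chosen = set(chosen)
    return all(
        any((start + k) % ring_size in chosen for k in range(n_val))
        for start in range(ring_size)
    )


plan = covering_partitions(ring_size=64, n_val=3)
```

In this model a query on a 64-partition ring with three replicas contacts 22 partitions instead of all 64.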
  • 49. Indexing. Client API: User tags the object with metadata. Cluster: Write Coordinator (riak_kv_put_fsm): Validate that metadata parses. PUT Request → Node: VNode Coordinator (riak_core_vnode_master). VNode (Virtual Node): Core VNode (riak_core_vnode), KV VNode (riak_kv_vnode), Backend (riak_kv_index_backend): Stores object in bitcask, index metadata in merge_index.
  • 50. Querying. Client API: User issues a query against metadata. Cluster: Query Coordinator (riak_index_query_fsm), Coverage Logic (riak_core_coverage_fsm). Query Request → Node: VNode Coordinator (riak_core_vnode_master). VNode (Virtual Node): Core VNode (riak_core_vnode), KV VNode (riak_kv_vnode), Backend (riak_kv_index_backend): Runs query against index, replies with results.
  • 51. Next Steps Publish We are open source. Code is available now (for the foolhardy.) Beta version soon (for the adventurous.) Included in Riak version 1.0 (for the masses.) 50
  • 52. API design is like sex: Make one mistake and support it for the rest of your life. - @joshbloch (Everything is subject to change.) 51
  • 53. About Basho Technologies <plug type=“shameless”> • Distributed company: ~30 people • Cambridge, MA / San Francisco, CA / Reston, VA • “We hate downtime, we hate overtime.” • Riak KV (Open Source) • Riak Support, Services, and Enterprise Features ($) </plug> 52
  • 54. Thanks! / Questions? Also at OSCON... Rusty Klophaus (rusty@basho.com, @rustyio) · Mark Phillips (mark@basho.com, @pharkmillups) · Ryan Zezeski (rzezeski@basho.com, @rzezeski) · Tony Falco (tony@basho.com, @antonyfalco)
  • 55. END