SlideShare a Scribd company logo
Cassandra Basics
              Indexing

     Benjamin Black, b@b3k.us
Relational stores are
SCHEMA ORIENTED
Start from your SCHEMA &
WORK FORWARDS
Column stores are
QUERY ORIENTED
Start from your QUERIES &
WORK BACKWARDS
AT SCALE
AT SCALE
           Denormalization is
              THE NORM
AT SCALE
AT SCALE
           Everything depends on
               THE INDICES
Cassandra is an
INDEX CONSTRUCTION KIT
Column Family
Two-level Map

key: {
  column: value,
  column: value,
  ...
 }
Super Column Family
Three-level Map
key: {
   supercolumn: {
       column:value,
      column: value
   },
   supercolumn: {
     ...
   }
 }
column sorting defined by
         CompareWith/
CompareSubcolumnsWith
TimeUUIDType
  UTF8Type
                ASCIIType
LongType

     LexicalUUIDType
row placement determined by
             Partitioner
RandomPartitioner
Place based on MD5 of key




        OrderPreservingPartitioner
               Place based on actual key
Rows are sorted by key on each node
Regardless of partitioner
One example in
TWO ACTS
Prelude
A USER DATABASE
<ColumnFamily Name=”Users”
       CompareWith=”UTF8Type” />
“b”:    {“name”:”Ben”, “street”:”1234 Oak St.”,
        “city”:”Seattle”, “state”:”WA”}
“jason”: {”name”:”Jason”, “street”:”456 First Ave.”,
        “city”:”Bellingham”, “state”:”WA”}
“zack”:     {”name”: “Zack”, “street”: “4321 Pine St.”,
          “city”: “Seattle”, “state”: “WA”}
“jen1982”: {”name”:”Jennifer”, “street”:”1120 Foo Lane”,
         “city”:”San Francisco”, “state”:”CA”}
“albert”: {”name”:”Albert”, “street”:”2364 South St.”,
         “city”:”Boston”, “state”:”MA”}
SELECT name FROM Users
WHERE state=”WA”
SELECT name FROM Users
               WHERE state=”WA”

How is WHERE clause
formed?
Act One
Supercolumn Indexing
<ColumnFamily Name=”LocationUserIndexSCF”
       CompareWith=”UTF8Type”
       CompareSubcolumnsWith=”UTF8Type”
       ColumnType=”Super” />
[state]: {
  [city1]: {[name1]:[user1], [name2]:[user2], ... },
  [city2]: {[name3]:[user3], [name4]:[user4], ... },
  ...
  [cityX]: {[name5]:[user5], [name6]:[user6], ... }
}
“CA”: {

 “San Francisco”: {”Jennifer”: “jen1982”}
}
“MA”: {

 “Boston”: {”Albert”: “albert”}
}
“WA”: {

 “Bellingham”: {”Jason”: “jason”},

 “Seattle”: {”Ben”: “b”, ”Zack”: “zack”}
}
Row Key


“CA”: {

 “San Francisco”: {”Jennifer”: “jen1982”}
}
“MA”: {

 “Boston”: {”Albert”: “albert”}
}
“WA”: {

 “Bellingham”: {”Jason”: “jason”},

 “Seattle”: {”Ben”: “b”, ”Zack”: “zack”}
}
Row Key
                 Super Column

“CA”: {

 “San Francisco”: {”Jennifer”: “jen1982”}
}
“MA”: {

 “Boston”: {”Albert”: “albert”}
}
“WA”: {

 “Bellingham”: {”Jason”: “jason”},

 “Seattle”: {”Ben”: “b”, ”Zack”: “zack”}
}
Row Key
                                     Colum
                 Super Column
                                     n
“CA”: {

 “San Francisco”: {”Jennifer”: “jen1982”}
}
“MA”: {

 “Boston”: {”Albert”: “albert”}
}
“WA”: {

 “Bellingham”: {”Jason”: “jason”},

 “Seattle”: {”Ben”: “b”, ”Zack”: “zack”}
}
Row Key
                                     Colum
                 Super Column                Value
                                     n
“CA”: {

 “San Francisco”: {”Jennifer”: “jen1982”}
}
“MA”: {

 “Boston”: {”Albert”: “albert”}
}
“WA”: {

 “Bellingham”: {”Jason”: “jason”},

 “Seattle”: {”Ben”: “b”, ”Zack”: “zack”}
}
Show me
EVERYONE IN WASHINGTON
get(:LocationUserIndexSCF, ‘WA’)
{

   “Bellingham”: {”Jason”: “jason”},

   “Seattle”: {”Ben”: “b”, ”Zack”: “zack”}
}
Act Two
Composite Key Indexing
Order Preserving Partitioner
                          +
        Range Queries
<ColumnFamily Name=”LocationUserIndexCF”
       CompareWith=”UTF8Type” />
[state1]/[city1]:   {[name1]:[user1], [name2]:[user2], ... }
[state1]/[city2]:   {[name3]:[user3], [name4]:[user4], ... }
[state2]/[city1]:   {[name5]:[user5], [name6]:[user6], ... }
...
[stateX]/[cityY]:   {[name7]:[user7], [name8]:[user8], ... }
“CA/San Francisco”: {”Jennifer”: “jen1982”}
“MA/Boston”: {”Albert”: “albert”}
“WA/Bellingham”: {”Jason”: “jason”}
“WA/Seattle”: {”Ben”: “b”, “Zack”: “zack”}
Show me
EVERYONE IN WASHINGTON
get_range(:LocationUserIndexCF, {:start: 'WA',
                          :finish:'WB'})
{
    ”WA/Bellingham”: {”Jason”: “jason”},
    “WA/Seattle”: {”Ben”: “b”, “Zack”: “zack”}
}
Finale
BUILD SOMETHING AWESOME
(This part is up to you)
Appendix
EXAMPLE KEYSPACE
<Keyspace Name="UserDb">
  <ColumnFamily Name="Users"
          CompareWith="UTF8Type" />

  <ColumnFamily Name="LocationUserIndexSCF"

   
         CompareWith="UTF8Type"
  
       
     CompareSubcolumnsWith="UTF8Type"

   
         ColumnType="Super" />

   
  <ColumnFamily Name="LocationUserIndexCF"

   
         CompareWith="UTF8Type" />

   
  <ReplicaPlacementStrategy>
      org.apache.cassandra.locator.RackUnawareStrategy
  </ReplicaPlacementStrategy>
  <ReplicationFactor>1</ReplicationFactor>
  <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
</Keyspace>

More Related Content

ODP
TXT
Exemple de création de base
DOC
SetFocus Portfolio
PDF
Growing jQuery
PDF
Cassandra Explained
PDF
[Infographie] Comment ameliorer la qualité de vos données pour votre DMP mark...
PDF
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
PDF
Cassandra Day SV 2014: Basic Operations with Apache Cassandra
Exemple de création de base
SetFocus Portfolio
Growing jQuery
Cassandra Explained
[Infographie] Comment ameliorer la qualité de vos données pour votre DMP mark...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Cassandra Day SV 2014: Basic Operations with Apache Cassandra

Viewers also liked (20)

PDF
Cassandra
PDF
Graphite cluster setup blueprint
PDF
Understanding BYOE and How Today's User Experience Drives Value for UC
PPTX
The Big 3 - 3 Keys to the Customer Kingdom - Business process, Big data, and ...
PPTX
What is a DMP
PDF
Highly Available Graphite
PDF
Cassandra Basics, Counters and Time Series Modeling
PDF
Cassandra and Spark
PPTX
data science toolkit 101: set up Python, Spark, & Jupyter
PDF
Introduction to Apache Spark
PPTX
Presentation of Apache Cassandra
PDF
Introduction to Cassandra - Denver
KEY
Developers summit cassandraで見るNoSQL
PDF
Intro to py spark (and cassandra)
PDF
The Nitty Gritty of Advanced Analytics Using Apache Spark in Python
PDF
Python & Cassandra - Best Friends
PDF
Diagnosing Problems in Production: Cassandra Summit 2014
PDF
Intro to Cassandra
PDF
The Cassandra Distributed Database
PDF
PySpark Cassandra - Amsterdam Spark Meetup
Cassandra
Graphite cluster setup blueprint
Understanding BYOE and How Today's User Experience Drives Value for UC
The Big 3 - 3 Keys to the Customer Kingdom - Business process, Big data, and ...
What is a DMP
Highly Available Graphite
Cassandra Basics, Counters and Time Series Modeling
Cassandra and Spark
data science toolkit 101: set up Python, Spark, & Jupyter
Introduction to Apache Spark
Presentation of Apache Cassandra
Introduction to Cassandra - Denver
Developers summit cassandraで見るNoSQL
Intro to py spark (and cassandra)
The Nitty Gritty of Advanced Analytics Using Apache Spark in Python
Python & Cassandra - Best Friends
Diagnosing Problems in Production: Cassandra Summit 2014
Intro to Cassandra
The Cassandra Distributed Database
PySpark Cassandra - Amsterdam Spark Meetup
Ad

Similar to Cassandra Basics: Indexing (10)

PPTX
Building Your First Java Application with MongoDB
PDF
Elasticsearch for SQL Users
PPTX
MongoDB - Features and Operations
PDF
Json at work overview and ecosystem-v2.0
PDF
Elasticsearch for SQL Users
PDF
Embedding a language into string interpolator
PDF
Native json in the Cache' ObjectScript 2016.*
PPTX
The Aggregation Framework
PDF
MongoDB .local Bengaluru 2019: Aggregation Pipeline Power++: How MongoDB 4.2 ...
PDF
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
Building Your First Java Application with MongoDB
Elasticsearch for SQL Users
MongoDB - Features and Operations
Json at work overview and ecosystem-v2.0
Elasticsearch for SQL Users
Embedding a language into string interpolator
Native json in the Cache' ObjectScript 2016.*
The Aggregation Framework
MongoDB .local Bengaluru 2019: Aggregation Pipeline Power++: How MongoDB 4.2 ...
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
Ad

Recently uploaded (20)

PDF
KodekX | Application Modernization Development
PDF
Encapsulation_ Review paper, used for researhc scholars
PPTX
Spectroscopy.pptx food analysis technology
PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
PDF
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
PDF
Building Integrated photovoltaic BIPV_UPV.pdf
PDF
Electronic commerce courselecture one. Pdf
PDF
Encapsulation theory and applications.pdf
PPTX
ACSFv1EN-58255 AWS Academy Cloud Security Foundations.pptx
PDF
Reach Out and Touch Someone: Haptics and Empathic Computing
PDF
The Rise and Fall of 3GPP – Time for a Sabbatical?
PDF
Approach and Philosophy of On baking technology
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PDF
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
PDF
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
PDF
cuic standard and advanced reporting.pdf
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PDF
Spectral efficient network and resource selection model in 5G networks
KodekX | Application Modernization Development
Encapsulation_ Review paper, used for researhc scholars
Spectroscopy.pptx food analysis technology
Mobile App Security Testing_ A Comprehensive Guide.pdf
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
Per capita expenditure prediction using model stacking based on satellite ima...
Building Integrated photovoltaic BIPV_UPV.pdf
Electronic commerce courselecture one. Pdf
Encapsulation theory and applications.pdf
ACSFv1EN-58255 AWS Academy Cloud Security Foundations.pptx
Reach Out and Touch Someone: Haptics and Empathic Computing
The Rise and Fall of 3GPP – Time for a Sabbatical?
Approach and Philosophy of On baking technology
Advanced methodologies resolving dimensionality complications for autism neur...
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
cuic standard and advanced reporting.pdf
Diabetes mellitus diagnosis method based random forest with bat algorithm
Spectral efficient network and resource selection model in 5G networks

Cassandra Basics: Indexing

  • 1. Cassandra Basics Indexing Benjamin Black, b@b3k.us
  • 3. Start from your SCHEMA & WORK FORWARDS
  • 5. Start from your QUERIES & WORK BACKWARDS
  • 7. AT SCALE Denormalization is THE NORM
  • 9. AT SCALE Everything depends on THE INDICES
  • 10. Cassandra is an INDEX CONSTRUCTION KIT
  • 12. Two-level Map key: { column: value, column: value, ... }
  • 14. Three-level Map key: { supercolumn: { column:value, column: value }, supercolumn: { ... } }
  • 15. column sorting defined by CompareWith/ CompareSubcolumnsWith
  • 16. TimeUUIDType UTF8Type ASCIIType LongType LexicalUUIDType
  • 17. row placement determined by Partitioner
  • 18. RandomPartitioner Place based on MD5 of key OrderPreservingPartitioner Place based on actual key
  • 19. Rows are sorted by key on each node Regardless of partitioner
  • 22. <ColumnFamily Name=”Users” CompareWith=”UTF8Type” />
  • 23. “b”: {“name”:”Ben”, “street”:”1234 Oak St.”, “city”:”Seattle”, “state”:”WA”} “jason”: {”name”:”Jason”, “street”:”456 First Ave.”, “city”:”Bellingham”, “state”:”WA”} “zack”: {”name”: “Zack”, “street”: “4321 Pine St.”, “city”: “Seattle”, “state”: “WA”} “jen1982”: {”name”:”Jennifer”, “street”:”1120 Foo Lane”, “city”:”San Francisco”, “state”:”CA”} “albert”: {”name”:”Albert”, “street”:”2364 South St.”, “city”:”Boston”, “state”:”MA”}
  • 24. SELECT name FROM Users WHERE state=”WA”
  • 25. SELECT name FROM Users WHERE state=”WA” How is WHERE clause formed?
  • 27. <ColumnFamily Name=”LocationUserIndexSCF” CompareWith=”UTF8Type” CompareSubcolumnsWith=”UTF8Type” ColumnType=”Super” />
  • 28. [state]: { [city1]: {[name1]:[user1], [name2]:[user2], ... }, [city2]: {[name3]:[user3], [name4]:[user4], ... }, ... [cityX]: {[name5]:[user5], [name6]:[user6], ... } }
  • 29. “CA”: { “San Francisco”: {”Jennifer”: “jen1982”} } “MA”: { “Boston”: {”Albert”: “albert”} } “WA”: { “Bellingham”: {”Jason”: “jason”}, “Seattle”: {”Ben”: “b”, ”Zack”: “zack”} }
  • 30. Row Key “CA”: { “San Francisco”: {”Jennifer”: “jen1982”} } “MA”: { “Boston”: {”Albert”: “albert”} } “WA”: { “Bellingham”: {”Jason”: “jason”}, “Seattle”: {”Ben”: “b”, ”Zack”: “zack”} }
  • 31. Row Key Super Column “CA”: { “San Francisco”: {”Jennifer”: “jen1982”} } “MA”: { “Boston”: {”Albert”: “albert”} } “WA”: { “Bellingham”: {”Jason”: “jason”}, “Seattle”: {”Ben”: “b”, ”Zack”: “zack”} }
  • 32. Row Key Colum Super Column n “CA”: { “San Francisco”: {”Jennifer”: “jen1982”} } “MA”: { “Boston”: {”Albert”: “albert”} } “WA”: { “Bellingham”: {”Jason”: “jason”}, “Seattle”: {”Ben”: “b”, ”Zack”: “zack”} }
  • 33. Row Key Colum Super Column Value n “CA”: { “San Francisco”: {”Jennifer”: “jen1982”} } “MA”: { “Boston”: {”Albert”: “albert”} } “WA”: { “Bellingham”: {”Jason”: “jason”}, “Seattle”: {”Ben”: “b”, ”Zack”: “zack”} }
  • 34. Show me EVERYONE IN WASHINGTON
  • 36. { “Bellingham”: {”Jason”: “jason”}, “Seattle”: {”Ben”: “b”, ”Zack”: “zack”} }
  • 38. Order Preserving Partitioner + Range Queries
  • 39. <ColumnFamily Name=”LocationUserIndexCF” CompareWith=”UTF8Type” />
  • 40. [state1]/[city1]: {[name1]:[user1], [name2]:[user2], ... } [state1]/[city2]: {[name3]:[user3], [name4]:[user4], ... } [state2]/[city1]: {[name5]:[user5], [name6]:[user6], ... } ... [stateX]/[cityY]: {[name7]:[user7], [name8]:[user8], ... }
  • 41. “CA/San Francisco”: {”Jennifer”: “jen1982”} “MA/Boston”: {”Albert”: “albert”} “WA/Bellingham”: {”Jason”: “jason”} “WA/Seattle”: {”Ben”: “b”, “Zack”: “zack”}
  • 42. Show me EVERYONE IN WASHINGTON
  • 44. { ”WA/Bellingham”: {”Jason”: “jason”}, “WA/Seattle”: {”Ben”: “b”, “Zack”: “zack”} }
  • 46. (This part is up to you)
  • 48. <Keyspace Name="UserDb"> <ColumnFamily Name="Users" CompareWith="UTF8Type" /> <ColumnFamily Name="LocationUserIndexSCF" CompareWith="UTF8Type" CompareSubcolumnsWith="UTF8Type" ColumnType="Super" /> <ColumnFamily Name="LocationUserIndexCF" CompareWith="UTF8Type" /> <ReplicaPlacementStrategy> org.apache.cassandra.locator.RackUnawareStrategy </ReplicaPlacementStrategy> <ReplicationFactor>1</ReplicationFactor> <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch> </Keyspace>