High-order bits from Cassandra & Hadoop. srisatish ambati, @srisatish
Thank You! svccg in first page of search results for “cloud” on google!
NoSQL-Know your queries.
Points: Usecases; Why cassandra?; Usecase: Hadoop, Brisk; FUD: Consistency; Why facebook is not using Cassandra?; Anti-patterns; Community, Code, Tools; Q&A
Users. Netflix. Key by Customer, read-heavy. Key by Customer:Movie, write-heavy.
TimeSeries: (several customers) periodic readings: dev0, dev1… deviceID:metric:timestamp -> value. Metrics are typically a far larger dataset than users.
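The deviceID:metric:timestamp -> value layout can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `make_key`, `parse_key` and `readings`, the plain dict standing in for a column family, and the sample values are all hypothetical and not taken from the talk.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a datastore: composite key -> value.
store: Dict[str, float] = {}

def make_key(device_id: str, metric: str, timestamp: int) -> str:
    """Join the three parts into one composite key."""
    return f"{device_id}:{metric}:{timestamp}"

def parse_key(key: str) -> Tuple[str, str, int]:
    """Split a composite key back into device, metric and timestamp."""
    device_id, metric, timestamp = key.split(":")
    return device_id, metric, int(timestamp)

def readings(device_id: str, metric: str) -> List[Tuple[int, float]]:
    """Return (timestamp, value) pairs for one device and metric, oldest first."""
    prefix = f"{device_id}:{metric}:"
    return sorted(
        (parse_key(k)[2], v) for k, v in store.items() if k.startswith(prefix)
    )

# Made-up periodic readings from two devices.
store[make_key("dev0", "temp", 1303344060)] = 21.7
store[make_key("dev0", "temp", 1303344000)] = 21.5
store[make_key("dev1", "temp", 1303344000)] = 19.2
```

Calling `readings("dev0", "temp")` returns the two dev0 samples in timestamp order. Knowing this query up front is what drives the key design.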
Why Cassandra?
Operational simplicity: peer-to-peer
Replication: Multi-datacenter; Multi-region ec2, aws; Multi-availability zones. Reads local (dc1, dc2).
4.21.2011, Amazon Web Services outage: “Movie marathons on Netflix awaiting AWS to come back up.” #ec2disabled
4.21.2011, Amazon Web Services outage: Netflix was running on AWS.
fast durable writes. fast reads.
Writes: Sequential, append-only. ~1-5 ms. On cloud: ephemeral disks rock!
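A toy sketch of what "sequential, append-only" means for a write path: each write is appended to the end of a log file and then applied to an in-memory table, with no seek or in-place update. The class `AppendOnlyLog`, the JSON record format and the file name are assumptions made for illustration. This is not Cassandra's actual commit log or memtable code.

```python
import json
import os
import tempfile

class AppendOnlyLog:
    """Toy write path: append to a log file, then update an in-memory table."""

    def __init__(self, path):
        self.path = path
        self.memtable = {}

    def write(self, key, value):
        # Sequential append only: the file is never rewritten in place.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"k": key, "v": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())  # force the record to disk before acknowledging
        self.memtable[key] = value

    def replay(self):
        """Rebuild the in-memory table from the log; later records win."""
        table = {}
        if os.path.exists(self.path):
            with open(self.path, encoding="utf-8") as f:
                for line in f:
                    record = json.loads(line)
                    table[record["k"]] = record["v"]
        self.memtable = table
        return table

def demo():
    """Write twice to the same key, then recover state from the log alone."""
    path = os.path.join(tempfile.mkdtemp(), "commit.log")
    log = AppendOnlyLog(path)
    log.write("cust1", "a")
    log.write("cust1", "b")
    return AppendOnlyLog(path).replay()
```

An update is just another appended record, so a fresh instance that replays the log ends up with the latest value for each key.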
Reads: Local. Key & row caches (also, jna-based 0xffheap, i.e. off-heap); indexes, materialized. ssds: improved read performance!
Distribution between nodes: Gossip; Anti-entropy; Failure-detector. Lightweight.
Clients: cql, thrift; pycassa, phpcassa; hector, pelops (scala, ruby, clojure)
Usecase #3: hadoop. Hdfs, cassandra, hive. Logs, stats, analytics.
Brisk: Truly peer-to-peer hadoop.
mv computation, not data
Parallel Execution View
jobtracker, tasktracker. hdfs: namenode, datanode.
cloudera; amazon: elastic map reduce; hortonworks; mapR; brisk
Tools & Analytics: Hive, Pig, R; Karmasphere; Datameer… dozens of stealth startups!
Namenode decomposition, explained.
Use column families (tables): inode, sblock
near-real time hadoop. Low latency: cassandra_dc nodes. Batch Analytics: brisk_dc nodes.
FUD, acronym: fear, uncertainty, doubt.
Consistency: R + W > N. ORACLE, 2-node: R=1, W=2, N=2 (T=2). DNS. * N is replication factor. Not to be confused with T = total # of nodes.
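The R + W > N rule can be checked mechanically: when the number of replicas read plus the number written exceeds the replication factor, every read set must share at least one replica with every write set. A minimal sketch follows. The function name `overlaps` is a hypothetical helper, not part of any client library.

```python
def overlaps(r: int, w: int, n: int) -> bool:
    """Return True when R + W > N, i.e. read and write replica sets must intersect."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must each be between 1 and N")
    return r + w > n

# The 2-node example from the slide: R=1, W=2, N=2.
print(overlaps(1, 2, 2))  # True

# R=1, W=1 with N=3: a read can land on a replica the write never reached.
print(overlaps(1, 1, 3))  # False
```

Only N, the replication factor, enters the check. The total node count T plays no role.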
Tune-able, flexibility. For High Consistency: read:quorum, write:quorum. For High Availability: high W, low R.
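To make the quorum setting concrete, the sketch below computes a quorum as a majority of the N replicas (floor(N/2) + 1) and shows that quorum reads combined with quorum writes always satisfy R + W > N. The helper names `quorum` and `tolerated_failures` are assumptions for illustration only.

```python
def quorum(n: int) -> int:
    """Majority of n replicas: floor(n / 2) + 1."""
    if n < 1:
        raise ValueError("replication factor must be at least 1")
    return n // 2 + 1

def tolerated_failures(level: int, n: int) -> int:
    """Replicas that can be down while an operation needing `level` acks still succeeds."""
    return n - level

for n in (1, 2, 3, 5):
    q = quorum(n)
    # Two majorities of the same replica set always share at least one replica.
    assert q + q > n
    print(n, q, tolerated_failures(q, n))
```

With N=3 a quorum is 2, so reads and writes at quorum each survive one replica being down. With N=2 a quorum is also 2, so nothing can be down.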
Inbox Search: 600+ cores, 120+ TB (2008). Went from 100-500m users. Average NoSQL deployment size: ~6-12 nodes.
Usecase #5: search. Apache Solr + Cassandra = Solandra. Other inbox/file searches: xobni, c3. github.com/tjake/solandra
“Eventual consistency is harder to program.” mostly immutable data. complex systems at scale.
Miscellaneous. Myth: data-loss, partial rows. writes are durable.
Anti-Patterns: Transactions; Joins; Read before write
Anti-Patterns for cloud: ebs; jvm, virtualized; single region
Three good reasons for Cassandra...
Tools: AMIs, OpsCenter, DataStax; AppDynamics. Netflix just builds AMIs for deployment!
Beautiful C0de = new code(); //less is more. ~90k. java.concurrent. @annotate. bloomfilters, merkletrees. non-blocking, staged-event-driven. bigtable, dynamo.
Current & Future Focus: Distributed Counters, CQL. Simple client. operational smoothing. compaction.
Community: Robust. Rapid. # Professional support from DataStax. Filesystem innovation from Acunu. Engineers: independent, startups, large companies, Rackspace, Twitter, Netflix.. Come join the efforts!
Usecase #4: first NoSQL, then scale! simpledb -> Cassandra; mongodb -> Cassandra
Copyright: xkcd
Copyright: plantoys. … more than one way to do it!
Summary: high scale peer-to-peer datastore; best friend for multi-region, multi-zone availability. Hadoop – HDFS engulfing the DataWorld
Q&A. @srisatish
