Cozy with Cassandra
Getting to know the Cassandra Codebase
Gary Dusbabek • Rackspace
@gdusbabek
Cassandra Summit • Mission Bay Conference Center • San Francisco • 10 August 2010
Outline
  • Code Themes
  • Startup Sequence
  • Key Classes
  • Read Path
  • Write Path
  • Stages & Threading
  • Bootstrap & Streaming
  • Tests & IDE considerations
  • Adding API methods
  • Questions
Themes and Patterns
  • Layers
Themes and Patterns
  • Services
Themes and Patterns
  • Singletons and statics
Themes and Patterns
  • Stages & thread pools
Startup Process
CassandraDaemon
  • Loads configuration
  • Transport initialization
  • Storage (Keyspace initialization)
  • CommitLog recovery
  • StorageService.initServer()
  • Initializes CassandraServer
  • Passes it off to transport
CassandraServer
  • Implements IDL interface methods (cassandra.thrift, cassandra.genavro)
  • Good place to start diving when adding or troubleshooting API methods
Configuration
  • DatabaseDescriptor
  • Via CassandraDaemon.setup()
  • Looks for config path, loads yaml
  • Doesn’t spin anything up
  • Defines system tables
  • KS and CF described by CFMetaData and KSMetaData
Code: Some Classes
Main Controllers
  • End with *Service or *Manager
  • StorageService, MessagingService
  • CompactionManager, HintedHandoffManager, StageManager, StreamInManager
StorageProxy
  • Put & Get methods
  • Collection of static methods
  • Merges local and distributed operations
  • Tracks latency
  • Exposed via StorageProxyMBean
StorageService
  • initServer(): starts services
  • Registers verb handlers (in MessagingService)
  • Main event responders
  • Repository of replication strategies and TokenMetadata
  • Ring topology & token information
MessagingService
  • Verb handlers reside here
  • Sets up socket listeners
  • Gateway for outbound messages
  • MS.sendRR()
  • MS.sendOneWay()
  • Inbound too
  • MS.receive()
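
The idea that handlers are registered per verb and looked up when a message arrives can be sketched roughly as follows. This is an illustration only: `VerbDispatchSketch`, `VerbHandler`, `register` and the string verbs are hypothetical names chosen for the example and are not the real MessagingService API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of verb-based dispatch; not Cassandra's actual classes.
class VerbDispatchSketch {
    // A handler turns an inbound payload into a reply (both plain strings here).
    interface VerbHandler {
        String handle(String payload);
    }

    private final Map<String, VerbHandler> handlers = new HashMap<>();

    // Comparable to registering verb handlers while the server starts up.
    void register(String verb, VerbHandler handler) {
        handlers.put(verb, handler);
    }

    // Comparable to the inbound path: find the handler for the message's verb.
    String receive(String verb, String payload) {
        VerbHandler handler = handlers.get(verb);
        if (handler == null) {
            throw new IllegalArgumentException("no handler for verb " + verb);
        }
        return handler.handle(payload);
    }

    public static void main(String[] args) {
        VerbDispatchSketch ms = new VerbDispatchSketch();
        ms.register("READ", payload -> "read:" + payload);
        ms.register("MUTATION", payload -> "applied:" + payload);
        System.out.println(ms.receive("READ", "key1"));
    }
}
```

Keeping the lookup table in one place is what makes a messaging layer a reasonable first stop when tracing how a given message type is handled.
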
Table & ColumnFamilyStore
  • Also RowMutation
  • Low-level storage operations
  • o.a.c.db.*
  • SSTable
  • Local operations
Read+Write Paths
Reading
  • Socket->CassandraServer
  • Permissions
  • Request validation
  • Marshalling
Reading
  • StorageProxy
  • Ranges
  • Collectors
  • Local & remote branches
Reading
  • StorageProxy local
  • Table, ColumnFamilyStore
  • CFS
  • Make QueryFilter
  • Query Memtables
  • Query SSTables
  • Coalesce in iterators
  • o.a.c.db package
  • o.a.c.db.filter
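
A rough sketch of the local read step: several sources (a memtable and some SSTables) are queried and the results are coalesced, with the newest timestamp winning for each column. The names `LocalReadSketch`, `Column` and `collate` are hypothetical, and the merge here is done eagerly over maps for brevity, whereas the slide describes coalescing in iterators.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of reconciling columns from several sources.
class LocalReadSketch {
    // A column value plus the timestamp used to reconcile versions.
    static final class Column {
        final String value;
        final long timestamp;

        Column(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // For every column name, keep the version with the highest timestamp.
    static Map<String, Column> collate(List<Map<String, Column>> sources) {
        Map<String, Column> merged = new TreeMap<>();
        for (Map<String, Column> source : sources) {
            for (Map.Entry<String, Column> entry : source.entrySet()) {
                Column seen = merged.get(entry.getKey());
                if (seen == null || entry.getValue().timestamp > seen.timestamp) {
                    merged.put(entry.getKey(), entry.getValue());
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Column> memtable = new TreeMap<>();
        memtable.put("email", new Column("new@example.com", 20L));

        Map<String, Column> sstable = new TreeMap<>();
        sstable.put("email", new Column("old@example.com", 10L));
        sstable.put("name", new Column("gary", 5L));

        List<Map<String, Column>> sources = new ArrayList<>();
        sources.add(memtable);
        sources.add(sstable);

        for (Map.Entry<String, Column> entry : collate(sources).entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue().value);
        }
    }
}
```
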
Reading
  • StorageProxy remote
  • read command
  • Response handler
  • Send to remote nodes
Writing
  • Socket->CassandraServer
  • Validation
  • Convert to Mutation (IDL object)
  • Penalties!
Writing
  • StorageProxy
  • blocking/non-blocking mutate
  • local/remote branch
  • RowMutation
  • one ColumnFamily per
  • ColumnFamily
  • Collection of column modifications
Writing
  • RM.apply->Table.apply
  • Write to CL (commit log)
  • Iterate over RM CFs
  • CFS.apply()
  • Overwrites results on pre-existing column families
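
The ordering on this slide (log first, then apply each column family's changes) can be sketched as below. Everything here is a simplified stand-in: `WritePathSketch`, the in-memory `commitLog` list and the nested maps are hypothetical, and timestamps and flushing are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "write to the commit log, then apply per column family".
class WritePathSketch {
    // Stand-in for the commit log: an append-only list of entries.
    final List<String> commitLog = new ArrayList<>();
    // Stand-in for one memtable per column family: cfName -> ("rowKey:column" -> value).
    final Map<String, Map<String, String>> memtables = new HashMap<>();

    // mutation maps cfName -> (columnName -> value) for a single row key.
    void apply(String rowKey, Map<String, Map<String, String>> mutation) {
        // 1. Record the mutation in the log before touching in-memory state.
        commitLog.add(rowKey + " " + mutation);
        // 2. Iterate over the column families in the mutation and apply each one.
        for (Map.Entry<String, Map<String, String>> cf : mutation.entrySet()) {
            Map<String, String> memtable =
                    memtables.computeIfAbsent(cf.getKey(), name -> new HashMap<>());
            for (Map.Entry<String, String> column : cf.getValue().entrySet()) {
                // A later write to the same column replaces the earlier value.
                memtable.put(rowKey + ":" + column.getKey(), column.getValue());
            }
        }
    }

    public static void main(String[] args) {
        WritePathSketch table = new WritePathSketch();
        Map<String, String> columns = new HashMap<>();
        columns.put("email", "gary@example.com");
        Map<String, Map<String, String>> mutation = new HashMap<>();
        mutation.put("Users", columns);

        table.apply("row1", mutation);
        System.out.println("log entries: " + table.commitLog.size());
        System.out.println("Users memtable: " + table.memtables.get("Users"));
    }
}
```
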
Writing
  • RM is serialized into a Message and sent to other nodes
  • Waits for ACKs depending on CL (consistency level)
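
One way to picture "waits for ACKs depending on CL" is a latch sized by the consistency level. The sketch below is an assumption-laden illustration, not the project's response handler: `AckWaitSketch` and `AckCounter` are hypothetical names, and only the QUORUM case (replication factor / 2 + 1) is shown.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of blocking until enough replicas have acknowledged a write.
class AckWaitSketch {
    // Acknowledgements needed for a quorum at the given replication factor.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    static final class AckCounter {
        private final CountDownLatch latch;

        AckCounter(int blockFor) {
            this.latch = new CountDownLatch(blockFor);
        }

        // Called once per replica acknowledgement.
        void ack() {
            latch.countDown();
        }

        // True if enough acks arrived within the timeout, false otherwise.
        boolean await(long timeoutMillis) {
            try {
                return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        int replicationFactor = 3;
        AckCounter counter = new AckCounter(quorum(replicationFactor));
        // Simulate two of the three replicas answering from other threads.
        for (int i = 0; i < 2; i++) {
            new Thread(counter::ack).start();
        }
        System.out.println("quorum reached: " + counter.await(1000));
    }
}
```

A timeout matters in this design: if too few replicas answer, the caller gets a failure rather than blocking forever.
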
Stages & Threading
Stages
  • SEDA
  • http://www.eecs.harvard.edu/~mdw/papers/seda-sosp01.pdf
  • o.a.c.concurrent.StageManager
  • Read, mutation, stream, gossip, response, anti entropy, load balance, migration
  • Thread pools that consume tasks
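
The "thread pools that consume tasks" idea can be sketched as a map from stage name to executor. This is a minimal illustration under assumptions: `StageSketch`, the three stage names registered and the pool sizes are made up for the example and do not reflect StageManager's real configuration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

// Hypothetical sketch: each named stage owns a thread pool that consumes tasks.
class StageSketch {
    private final Map<String, ExecutorService> stages = new HashMap<>();

    StageSketch() {
        // Pool sizes are arbitrary choices for the sketch.
        stages.put("READ", Executors.newFixedThreadPool(4, daemonFactory("READ")));
        stages.put("MUTATION", Executors.newFixedThreadPool(4, daemonFactory("MUTATION")));
        stages.put("GOSSIP", Executors.newSingleThreadExecutor(daemonFactory("GOSSIP")));
    }

    // Daemon threads named after their stage, so the JVM can exit without an explicit shutdown.
    private static ThreadFactory daemonFactory(String stage) {
        return runnable -> {
            Thread thread = new Thread(runnable, stage + "-stage");
            thread.setDaemon(true);
            return thread;
        };
    }

    // Hand a task to the named stage and wait for its result.
    <T> T call(String stage, Callable<T> task) {
        ExecutorService pool = stages.get(stage);
        if (pool == null) {
            throw new IllegalArgumentException("unknown stage " + stage);
        }
        try {
            return pool.submit(task).get();
        } catch (Exception e) {
            throw new RuntimeException("task failed on stage " + stage, e);
        }
    }

    void shutdown() {
        for (ExecutorService pool : stages.values()) {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        StageSketch stagesDemo = new StageSketch();
        // The task reports which thread ran it, showing it executed on the stage's pool.
        System.out.println(stagesDemo.call("READ", () -> Thread.currentThread().getName()));
        stagesDemo.shutdown();
    }
}
```
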
Threads
  • MessagingService.listen() spawns thread.
  • Each incoming connection spawns a new short-lived thread (IncomingTcpConnection)
  • Non-stream ops go to MS.messageDeserializerExecutor_
  • Stream ops handled there.
  • Anti-entropy repair
Bootstrapping!
  • Any interest?
  • //FIXME TODO CRAP
<=0.6 Bootstrapping
  • A wants data, B has data.
  • StreamingRequestMessage A->B
  • Handled on B by StreamRequestVerbHandler
  • For each range StreamOut.transferRanges()
  • Flush, anticompaction
  • StreamInitiateMessage B->A for each range transfer
  • Meanwhile, back on A…
  • StreamInitiateVerbHandler gets the SIM from B, does some nesting.
  • StreamInitiateDone A->B
  • Back to B…
  • StreamInitiateDoneHandler gets the SID from A
  • Calls StreamOutManager.startNext() which sends a single file to A
  • MessagingService on A picks this up and the file is streamed.
  • SSTable is created
  • STREAM_FINISHED A->B
  • B gets rid of the file, calls SOM.startNext()
0.7 Bootstrapping
  • A wants data, B has data
  • StreamRequestMessage A->B
  • On B, StreamRequestVerbHandler
  • If single file, sends it.
  • If range, StreamOut.transferRangesForRequest()
  • Send next file (first will contain metadata about all files)
  • On A, IncomingStreamReader.read()
  • Data is received, SSTable created
  • Ack, request next file
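
The 0.7 exchange boils down to a pull loop: the receiver takes one file, acknowledges it, and asks for the next until nothing is pending. The sketch below models that loop in a single process with hypothetical names (`StreamSketch`, `Sender`, `manifest`, `receiveAll`); there is no networking and no real SSTable handling.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of one-file-at-a-time streaming with an ack between files.
class StreamSketch {
    // Sender side (node B in the slide): hands out one pending file at a time.
    static final class Sender {
        private final Deque<String> pending = new ArrayDeque<>();

        Sender(List<String> files) {
            pending.addAll(files);
        }

        // Names of all files still to be sent; in the slide this travels with the first file.
        List<String> manifest() {
            return new ArrayList<>(pending);
        }

        // Next file to stream, or null once everything has been sent.
        String next() {
            return pending.poll();
        }
    }

    // Receiver side (node A): read a file, then ack by requesting the next one.
    static List<String> receiveAll(Sender sender) {
        List<String> received = new ArrayList<>();
        String file = sender.next();
        while (file != null) {
            received.add(file);   // data received, SSTable created
            file = sender.next(); // ack and request the next file
        }
        return received;
    }

    public static void main(String[] args) {
        Sender sender = new Sender(Arrays.asList("Users-1-Data.db", "Users-2-Data.db"));
        System.out.println("manifest: " + sender.manifest());
        System.out.println("received: " + receiveAll(sender));
    }
}
```
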
Tests
  • Testable & Untestable
  • Unit tests
  • ant clean build test
  • System tests
  • ant gen-thrift-py
  • nosetests test/system/test_thrift_server.py
IDE
  • Configuration file must be in the classpath
  • Treat as source
  • lib vs build/lib
  • Log at debug
IDE
    -ea
    -Xms128M
    -Xmx2G
    -Dcom.sun.management.jmxremote.port=8081
    -Dcom.sun.management.jmxremote.ssl=false
    -Dcom.sun.management.jmxremote.authenticate=false
    -Dcassandra-foreground=yes
    -Dlog4j.configuration=log4j-server.properties
    -Dmx4jport=9081
Adding API methods
  • Same goes for modifying
  • Define method and structures in IDL
  • interface/cassandra.thrift
  • Regenerate files
  • ant gen-thrift-java gen-thrift-py
  • Implement methods in o.a.c.thrift.CassandraServer
  • Create a system test (test/system/test_thrift_server.py)
Questions?
  • gdusbabek@gmail.com
  • @gdusbabek

Editor's Notes

  • #3: Talk about Cassandra processes and the classes that relate to them.
  • #4: Data operated on, then passed off to another layer. Evident on R/W path. Onion analogy.
  • #5: Disjoint. Coupled when necessary.
  • #6: Model of class/object design. Suck it up. Not a class project.
  • #15: Services: Gossiper, MessagingService, MigrationManager, Bootstrap, preloaded cache. Handle to server-related tasks.
  • #28: layers