©2013 DataStax Confidential. Do not distribute without consent.
@chbatey
Christopher Batey
Manchester Hadoop and Big Data Meetup
Who am I?
• Maintainer of Stubbed Cassandra
• Other OS projects: akka-persistence, wiremock
• Advocate for Apache Cassandra
• Part-time consultant
Agenda
• Why - running Spark + C*
• How - Spark partitions are built up
• Example - KillrWeather
OLTP OLAP Batch
Weather data streaming
Incoming weather events
Apache Kafka
Producer
Consumer
NodeGuardian
Dashboard
Run this yourself
• https://github.com/killrweather/killrweather
The details
Pop quiz!
• Spark RDD
• Spark partition
• Spark worker
• Spark task
• Cassandra row
• Cassandra partition
• Cassandra token range
Spark architecture
org.apache.spark.rdd.RDD
• Resilient Distributed Dataset (RDD)
• Created through transformations on data (map, filter, ...) or other RDDs
• Immutable
• Partitioned
• Reusable
RDD Operations
• Transformations - similar to the Scala collections API
- Produce new RDDs
- filter, flatMap, map, distinct, groupBy, union, zip, reduceByKey, subtract
• Actions
- Require materialization of the records to generate a value
- collect: Array[T], count, fold, reduce, ...
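The split between transformations and actions can be sketched without Spark. The `LazyDataset` class below is a hypothetical toy, not the Spark API: transformations only record work and return a new dataset, and nothing runs until an action asks for a value.

```python
from functools import reduce


class LazyDataset:
    """Toy stand-in for an RDD: transformations record work, actions run it."""

    def __init__(self, source, steps=()):
        self._source = source  # any re-iterable collection
        self._steps = steps    # recorded transformations, applied in order

    # Transformations: return a new dataset, nothing is computed yet.
    def map(self, fn):
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, predicate):
        return LazyDataset(self._source, self._steps + (("filter", predicate),))

    def _iterate(self):
        items = iter(self._source)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return items

    # Actions: materialize the records to produce a value.
    def collect(self):
        return list(self._iterate())

    def count(self):
        return sum(1 for _ in self._iterate())

    def reduce(self, fn):
        return reduce(fn, self._iterate())


temps_celsius = LazyDataset([-5.6, -5.1, -4.9, -5.3])
fahrenheit = temps_celsius.map(lambda c: c * 9 / 5 + 32)  # transformation
above_23 = fahrenheit.filter(lambda f: f > 23.0)          # transformation
print(above_23.count())                                    # action, work happens here
```

The original dataset is never modified, which mirrors the "Immutable" bullet: each transformation produces a new object that remembers how to derive its records.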
Spark RDDs Represent a Large Amount of Data Partitioned into Chunks
[Diagram: an RDD made of chunks 1-9, first shown whole and then with its chunks spread across Workers 1-4]
Cassandra table
CREATE TABLE daily_aggregate_precip (
weather_station text,
year int,
month int,
day int,
precipitation counter,
PRIMARY KEY ((weather_station), year, month, day)
)
PRIMARY KEY ((weather_station), year, month, day)
Partition key: weather_station. Clustering columns: year, month, day.
Cassandra Data is Distributed By Token Range
[Diagram: a token ring running from 0 to 999, divided between Nodes 1-4; shown first without vnodes and then with vnodes]
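A rough sketch of how a partition key ends up on a node, using the toy 0-999 ring from the slides. Everything here is an assumption for illustration: the MD5-based `token_for` stands in for the cluster's real partitioner (Murmur3 by default), and the node names and token assignments are made up.

```python
import bisect
import hashlib

RING_SIZE = 1000  # toy ring 0..999, as drawn on the slides


def token_for(partition_key: str) -> int:
    """Toy hash of the partition key onto the ring (not the real partitioner)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % RING_SIZE


def build_ring(node_tokens):
    """node_tokens: {node: [token, ...]} -> sorted list of (token, node)."""
    return sorted((t, n) for n, tokens in node_tokens.items() for t in tokens)


def owner(ring, token):
    """A node owns the range (previous token, its token]; wrap at the end."""
    tokens = [t for t, _ in ring]
    index = bisect.bisect_left(tokens, token)
    return ring[index % len(ring)][1]


# Without vnodes: one token, so one contiguous range, per node.
without_vnodes = build_ring(
    {"node1": [249], "node2": [499], "node3": [749], "node4": [999]}
)

# With vnodes: several smaller ranges per node, scattered around the ring.
with_vnodes = build_ring({
    "node1": [49, 399, 829],
    "node2": [119, 499, 909],
    "node3": [219, 599, 999],
    "node4": [299, 779],
})

station_token = token_for("10010:99999")
print(station_token, owner(without_vnodes, station_token), owner(with_vnodes, station_token))
```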
Replication strategy
• NetworkTopologyStrategy
- Every Cassandra node knows its DC and rack
- Replicas won't be put on the same rack unless replication factor > number of racks
- Unfortunately Cassandra can't create servers and racks on the fly to fix this :(
Replication
[Diagram: a client WRITE at CL = 1 across DC1 (RF3) and DC2 (RF3), with nodes labelled C and RC]
We have replication!
Pop quiz!
• Spark RDD
• Spark partition
• Spark worker
• Spark task
• Cassandra row
• Cassandra partition
• Cassandra token range
Goals
• Spark partitions made up of token ranges on the same node
• Tasks to be executed on workers co-located with that node
• Same(ish) amount of data in each Spark partition
The Connector Uses Information on the Node to Make Spark Partitions
• spark.cassandra.input.split.size_in_mb 64
• system.size_estimates (# partitions & mean size)
• tokens per Spark partition
[Animation: Node 1 owns token ranges 0-50, 120-220, 300-500 and 780-830. The connector assigns them to Spark partitions 1-4 one step at a time; the large 300-500 range is split into 300-400 and 400-500, and the small 0-50 and 780-830 ranges end up together in Spark partition 4.]
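The walk-through above can be approximated with a small packing routine. This is a sketch under stated assumptions, not the connector's actual algorithm: data is assumed to be spread evenly at a made-up 640 KB per token, the 64 MB split size is rounded to 64,000 KB, and `plan_partitions` is a hypothetical helper that splits oversized ranges and then packs pieces first-fit, largest first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenRange:
    start: int
    end: int

    @property
    def width(self) -> int:
        return self.end - self.start


def split_range(token_range, max_width):
    """Cut a range into consecutive pieces no wider than max_width."""
    pieces = []
    start = token_range.start
    while start < token_range.end:
        end = min(start + max_width, token_range.end)
        pieces.append(TokenRange(start, end))
        start = end
    return pieces


def plan_partitions(ranges, kb_per_token, split_size_kb):
    """Group one node's token ranges into Spark partitions of similar size."""
    max_width = max(1, split_size_kb // kb_per_token)  # tokens per Spark partition
    pieces = [p for r in ranges for p in split_range(r, max_width)]
    pieces.sort(key=lambda p: p.width, reverse=True)   # largest first
    partitions = []
    for piece in pieces:
        for group in partitions:
            if sum(p.width for p in group) + piece.width <= max_width:
                group.append(piece)                    # fits in an existing partition
                break
        else:
            partitions.append([piece])                 # start a new Spark partition
    return partitions


node1_ranges = [
    TokenRange(0, 50),
    TokenRange(120, 220),
    TokenRange(300, 500),
    TokenRange(780, 830),
]
plan = plan_partitions(node1_ranges, kb_per_token=640, split_size_kb=64_000)
for number, group in enumerate(plan, start=1):
    print(number, [(p.start, p.end) for p in group])
```

With these numbers the result has four Spark partitions, with 300-500 cut in two and 0-50 sharing a partition with 780-830, which is the same shape as the animation.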
Key classes
• CassandraTableScanRDD, CassandraRDD
- getPreferredLocations
• CassandraTableRowReaderProvider
- DataSizeEstimates - goes to C*
• CassandraPartitioner
- Gets ring information from the driver
• CassandraPartition
- endpoints
- tokenRanges
Data is Retrieved Using the DataStax Java Driver
spark.cassandra.input.fetch.size_in_rows 50 (later slides show this setting as spark.cassandra.input.page.row.size 50)
Spark partition 4 on Node 1 holds token ranges 0-50 and 780-830
SELECT * FROM keyspace.table WHERE
token(pk) > 780 and token(pk) <= 830
SELECT * FROM keyspace.table WHERE
token(pk) > 0 and token(pk) <= 50
[Animation: each query's results arrive in pages of 50 CQL rows until its token range is exhausted]
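A minimal sketch of the read path just described: one token-range query per range in the Spark partition, with results consumed a page at a time. The row counts are invented and `pages` only imitates driver paging over an in-memory list; no Cassandra driver is involved.

```python
def token_range_query(keyspace, table, partition_key, start, end):
    """CQL for one token range, in the shape shown on the slides."""
    return (
        f"SELECT * FROM {keyspace}.{table} WHERE "
        f"token({partition_key}) > {start} AND token({partition_key}) <= {end}"
    )


def pages(rows, fetch_size):
    """Yield rows in pages of at most fetch_size, like a paged result set."""
    for offset in range(0, len(rows), fetch_size):
        yield rows[offset:offset + fetch_size]


def read_spark_partition(token_ranges, rows_by_range, fetch_size=50):
    """Run one query per token range and stream its pages."""
    for start, end in token_ranges:
        query = token_range_query("keyspace", "table", "pk", start, end)
        for page in pages(rows_by_range[(start, end)], fetch_size):
            yield query, page


# Hypothetical contents: 120 rows in (780, 830] and 75 rows in (0, 50].
rows_by_range = {(780, 830): list(range(120)), (0, 50): list(range(75))}
page_sizes = [
    len(page)
    for _, page in read_spark_partition([(780, 830), (0, 50)], rows_by_range)
]
print(page_sizes)
```

A smaller fetch size means more round trips per token range but less memory held per page, which is the trade-off the fetch size setting controls.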
Paging
Other bits and bobs
• LocalNodeFirstLoadBalancingPolicy
Then we're into Spark land
• Spark partitions are made up of C* partitions that exist on the same node
• C* connector tells Spark which workers to use via information from the C* driver
RDD -> C* Table
[Diagram: RDD chunks 1-9 spread across Nodes 1-4]
The Spark Cassandra Connector saveToCassandra method can be called on almost all RDDs
rdd.saveToCassandra("Keyspace","Table")
[Animation: rows from a Spark partition are written to Node 1 through the Java Driver]
spark.cassandra.output.batch.grouping.key partition
spark.cassandra.output.batch.size.rows 4
spark.cassandra.output.batch.grouping.buffer.size 3
spark.cassandra.output.concurrent.writes 2
Rows to write (first value is the partition key): 1,1,1 1,2,1 2,1,1 3,8,1 3,2,1 3,4,1 3,5,1 3,1,1 1,4,1 5,4,1 2,4,1 8,4,1 9,4,1 3,9,1
[Animation steps: rows are grouped into batches by partition key (PK=1, PK=2, PK=3). The PK=3 batch is sent once it holds 4 rows. Only 3 batches are buffered, so when a row for PK=5 arrives the PK=1 batch is sent to make room, and likewise the PK=2 batch when PK=8 arrives. Up to 2 batches are in flight at once; a slot frees up when a write is acknowledged.]
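The batching walk-through can be replayed with a short simulation. This is a sketch of the grouping behaviour only, under assumptions: `group_into_batches` is a hypothetical helper, evicting the largest buffered batch when the buffer is full is an assumed policy that happens to reproduce the order on the slides, and the concurrent-writes limit is not modelled.

```python
from collections import OrderedDict


def group_into_batches(rows, batch_size_rows=4, buffer_size=3):
    """Yield batches of rows that share a partition key (the first field).

    - a batch is sent as soon as it holds batch_size_rows rows
    - at most buffer_size batches are buffered; a row for a new key that
      finds the buffer full pushes out the largest buffered batch (assumed)
    - whatever is still buffered at the end is flushed
    """
    buffer = OrderedDict()  # partition key -> rows collected so far
    for row in rows:
        key = row[0]
        if key not in buffer and len(buffer) == buffer_size:
            largest = max(buffer, key=lambda k: len(buffer[k]))
            yield buffer.pop(largest)
        buffer.setdefault(key, []).append(row)
        if len(buffer[key]) == batch_size_rows:
            yield buffer.pop(key)
    for batch in buffer.values():
        yield batch


rows = [
    (1, 1, 1), (1, 2, 1), (2, 1, 1), (3, 8, 1), (3, 2, 1), (3, 4, 1), (3, 5, 1),
    (3, 1, 1), (1, 4, 1), (5, 4, 1), (2, 4, 1), (8, 4, 1), (9, 4, 1), (3, 9, 1),
]
batches = list(group_into_batches(rows))
for batch in batches:
    print(batch)
```

Grouping by partition key keeps every batch on a single Cassandra partition, so each batch can be handled by one replica set rather than fanned out by a coordinator.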
Example
Weather Station Analysis
• Weather station collects data
• Cassandra stores in sequence
• Spark rolls up data into new tables
Windsor California
July 1, 2014
High: 73.4F
Low: 51.4F
raw_weather_data
CREATE TABLE raw_weather_data (
weather_station text, // Composite of Air Force Datsav3 station number and NCDC WBAN number
year int, // Year collected
month int, // Month collected
day int, // Day collected
hour int, // Hour collected
temperature double, // Air temperature (degrees Celsius)
dewpoint double, // Dew point temperature (degrees Celsius)
pressure double, // Sea level pressure (hectopascals)
wind_direction int, // Wind direction in degrees. 0-359
wind_speed double, // Wind speed (meters per second)
sky_condition int, // Total cloud cover (coded, see format documentation)
sky_condition_text text, // Non-coded sky conditions
one_hour_precip double, // One-hour accumulated liquid precipitation (millimeters)
six_hour_precip double, // Six-hour accumulated liquid precipitation (millimeters)
PRIMARY KEY ((weather_station), year, month, day, hour)
) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC);
Reverses data in the storage engine.
Primary key relationship
PRIMARY KEY ((weather_station), year, month, day, hour)
WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC);
Partition key: weather_station. Clustering columns: year, month, day, hour.
[Diagram: partition 10010:99999 stores one cell per clustering key: 2005:12:1:10:temp = -5.3, 2005:12:1:9:temp = -4.9, 2005:12:1:8:temp = -5.1, 2005:12:1:7:temp = -5.6]
Data Locality
weather_station='10010:99999' ?
1000 Node Cluster
You are here!
Query patterns
• Range queries
• "Slice" operation on disk
SELECT weather_station, hour, temperature
FROM raw_weather_data
WHERE weather_station = '10010:99999'
AND year = 2005 AND month = 12 AND day = 1
AND hour >= 7 AND hour <= 10;
Single seek on disk
Partition key for locality
[Diagram: partition 10010:99999 holds cells 2005:12:1:12 = -5.4, 2005:12:1:11 = -4.9, 2005:12:1:10 = -5.3, 2005:12:1:9 = -4.9, 2005:12:1:8 = -5.1, 2005:12:1:7 = -5.6; the query reads the contiguous slice for hours 7 to 10]
Programmers like this
Sorted by event_time
 weather_station | hour         | temperature
-----------------+--------------+-------------
 10010:99999     | 2005:12:1:7  | -5.6
 10010:99999     | 2005:12:1:8  | -5.1
 10010:99999     | 2005:12:1:9  | -4.9
 10010:99999     | 2005:12:1:10 | -5.3
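Why a range query on clustering columns is a single slice can be shown with a sorted list. This is an illustrative sketch, not Cassandra's storage engine: one partition is modelled as a list kept in clustering order (newest first), and `slice_hours` is a hypothetical helper that finds both ends of the range by binary search and returns the rows in between, in storage order.

```python
import bisect

# One partition (weather station 10010:99999), sorted by
# (year, month, day, hour) descending, as CLUSTERING ORDER BY ... DESC implies.
partition = [
    ((2005, 12, 1, 12), -5.4),
    ((2005, 12, 1, 11), -4.9),
    ((2005, 12, 1, 10), -5.3),
    ((2005, 12, 1, 9), -4.9),
    ((2005, 12, 1, 8), -5.1),
    ((2005, 12, 1, 7), -5.6),
]


def slice_hours(rows, year, month, day, first_hour, last_hour):
    """Return the contiguous run of rows with first_hour <= hour <= last_hour."""
    # bisect needs ascending keys, so negate every component of the key.
    keys = [tuple(-part for part in key) for key, _ in rows]
    low = bisect.bisect_left(keys, (-year, -month, -day, -last_hour))
    high = bisect.bisect_right(keys, (-year, -month, -day, -first_hour))
    return rows[low:high]


result = slice_hours(partition, 2005, 12, 1, first_hour=7, last_hour=10)
for key, temperature in result:
    print(key, temperature)
```

Because the rows for one station sit next to each other in sorted order, the range needs no scan of unrelated data: two lookups bound it and everything in between is the answer.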
weather_station
CREATE TABLE weather_station (
id text PRIMARY KEY, // Composite of Air Force Datsav3 station number and NCDC WBAN number
name text, // Name of reporting station
country_code text, // 2 letter ISO Country ID
state_code text, // 2 letter state code for US stations
call_sign text, // International station call sign
lat double, // Latitude in decimal degrees
long double, // Longitude in decimal degrees
elevation double // Elevation in meters
);
Lookup table
daily_aggregate_temperature
CREATE TABLE daily_aggregate_temperature (
weather_station text,
year int,
month int,
day int,
high double,
low double,
mean double,
variance double,
stdev double,
PRIMARY KEY ((weather_station), year, month, day)
) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC);
SELECT high, low FROM daily_aggregate_temperature
WHERE weather_station='010010:99999'
AND year=2005 AND month=12 AND day=3;
high | low
------+------
1.8 | -1.5
daily_aggregate_precip
CREATE TABLE daily_aggregate_precip (
weather_station text,
year int,
month int,
day int,
precipitation counter,
PRIMARY KEY ((weather_station), year, month, day)
) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC);
SELECT precipitation FROM daily_aggregate_precip
WHERE weather_station='010010:99999'
AND year=2005 AND month=12 AND day>=1 AND day <= 7;
[Bar chart: precipitation for days 1-7 on a 0-40 axis; values shown: 17, 26, 2, 0, 33, 12, 0]
Weather Station Stream Analysis
• Weather station collects data
• Data processed in stream
• Data stored in Cassandra
Windsor California
Today
Rainfall total: 1.2cm
High: 73.4F
Low: 51.4F
Incoming data from Kafka
725030:14732,2008,01,01,00,5.0,-3.9,1020.4,270,4.6,2,0.0,0.0
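A sketch of turning one incoming line into a typed record before it is saved. The field order is an assumption: the sample has 13 values, so they are mapped onto the raw_weather_data columns in declaration order with sky_condition_text left out. `RawWeatherData` and `parse_raw_weather` are hypothetical names for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawWeatherData:
    weather_station: str
    year: int
    month: int
    day: int
    hour: int
    temperature: float      # degrees Celsius
    dewpoint: float         # degrees Celsius
    pressure: float         # hectopascals
    wind_direction: int     # degrees, 0-359
    wind_speed: float       # meters per second
    sky_condition: int      # coded cloud cover
    one_hour_precip: float  # millimeters
    six_hour_precip: float  # millimeters


def parse_raw_weather(line: str) -> RawWeatherData:
    """Parse one comma-separated weather event (assumed field order)."""
    fields = line.strip().split(",")
    if len(fields) != 13:
        raise ValueError(f"expected 13 fields, got {len(fields)}")
    return RawWeatherData(
        weather_station=fields[0],
        year=int(fields[1]),
        month=int(fields[2]),
        day=int(fields[3]),
        hour=int(fields[4]),
        temperature=float(fields[5]),
        dewpoint=float(fields[6]),
        pressure=float(fields[7]),
        wind_direction=int(fields[8]),
        wind_speed=float(fields[9]),
        sky_condition=int(fields[10]),
        one_hour_precip=float(fields[11]),
        six_hour_precip=float(fields[12]),
    )


event = parse_raw_weather(
    "725030:14732,2008,01,01,00,5.0,-3.9,1020.4,270,4.6,2,0.0,0.0"
)
print(event)
```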
Creating a Stream
Saving the raw data
Building an aggregate
CREATE TABLE daily_aggregate_precip (
weather_station text,
year int,
month int,
day int,
precipitation counter,
PRIMARY KEY ((weather_station), year, month, day)
) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC);
CQL Counter
Batch job on the fly?
(count: 24, mean: 14.428150, stdev: 7.092196, max: 28.034969, min: 0.675863)
(count: 11242, mean: 8.921956, stdev: 7.428311, max: 29.997986, min: -2.200000)
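Summaries like the two lines above (count, mean, stdev, max, min) can be computed in a single pass over the values. The sketch below uses Welford's online algorithm and reports the population standard deviation; `summarize` is a hypothetical helper and the sample values are made up, so it does not reproduce the figures on the slide.

```python
import math


def summarize(values):
    """One-pass count, mean, population stdev, max and min (Welford's method)."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    highest = -math.inf
    lowest = math.inf
    for value in values:
        count += 1
        delta = value - mean
        mean += delta / count
        m2 += delta * (value - mean)
        highest = max(highest, value)
        lowest = min(lowest, value)
    stdev = math.sqrt(m2 / count) if count else math.nan
    return {"count": count, "mean": mean, "stdev": stdev, "max": highest, "min": lowest}


stats = summarize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(
    "(count: {count}, mean: {mean:.6f}, stdev: {stdev:.6f}, "
    "max: {max:.6f}, min: {min:.6f})".format(**stats)
)
```

A single pass matters here because the same routine works whether the input is a day of readings from one station or a much larger scan; nothing has to be held in memory beyond the running totals.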
Weather data streaming
Load Generator or Data import
Apache Kafka
Producer
Consumer
NodeGuardian
Dashboard
Summary
• Cassandra
- always-on operational database
• Spark
- Batch analytics
- Stream processing and saving back to Cassandra
Thanks for listening
• Follow me on Twitter @chbatey
• Cassandra + fault tolerance posts aplenty:
• http://christopher-batey.blogspot.co.uk/
• Cassandra resources: http://planetcassandra.org/

Manchester Hadoop Meetup: Cassandra Spark internals