Big Data Analytics mit
Spark & Cassandra_
JUG Stuttgart 01/2016
Matthias Niehoff
•Cassandra
•Spark
•Spark & Cassandra
•Spark Applications
•Spark Streaming
•Spark SQL
•Spark MLLib
Agenda_
2
Cassandra
3
•Distributed database
•Highly Available
•Linearly Scalable
•Multi Datacenter Support
•No Single Point Of Failure
•CQL Query Language
• Similar to SQL
• No joins or aggregates
•Eventual Consistency ("Tunable Consistency")
Cassandra_
4
Distributed Data Storage_
5
[Diagram: a ring of four nodes; the token ranges 1-25, 26-50, 51-75 and 76-0 are distributed across Node 1 to Node 4]
CQL - Query Language With Limitations_
6
SELECT	*	FROM	performer	WHERE	name	=	'ACDC'	
—>	ok	
SELECT	*	FROM	performer	WHERE	name	=	'ACDC'	and	country	=	
'Australia'	
—>	not	ok	
SELECT	country,	COUNT(*)	as	quantity	FROM	artists	GROUP	BY	
country	ORDER	BY	quantity	DESC	
—>	not	supported	
performer
name (PK)
genre
country
Spark
7
•Open Source & Apache project since 2010
•Data processing Framework
• Batch processing
• Stream processing
What Is Apache Spark_
8
•Fast
• up to 100 times faster than Hadoop
• a lot of in-memory processing
• scales linearly with additional nodes
•Easy
• Scala, Java and Python APIs
• clean code (e.g. with lambdas in Java 8)
• rich API: map, reduce, filter, groupBy, sort, union, join,
reduceByKey, groupByKey, sample, take, first, count
•Fault-Tolerant
• easily reproducible
Why Use Spark_
9
•RDDs – Resilient Distributed Datasets
• Read-only description of a collection of objects
• Distributed among the cluster (in memory or on disk)
• Determined through transformations
• Allows automatic rebuild on failure
•Operations
• Transformations (map,filter,reduce...) —> new RDD
• Actions (count, collect, save)
•Only Actions start processing!
Easily Reproducible?_
10
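Since only actions trigger processing, the laziness is easy to observe in the shell; a minimal sketch (values are made up):

val numbers = sc.parallelize(1 to 1000000) // builds only a description, no processing yet
val evens = numbers.filter(_ % 2 == 0)     // transformation: returns immediately
val count = evens.count()                  // action: only now does the cluster do any work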
•Partitions
• Describes the partitions (e.g. one per Cassandra partition)
•Dependencies
• Dependencies on parent RDDs
•Compute
• The function to compute the RDD's partitions
•(Optional) Partitioner
• How is the data partitioned? (hash, range, ..)
•(Optional) Preferred Location
• Where to get the data (e.g. list of Cassandra node IPs)
Properties Of An RDD_
11
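These properties can be inspected on any RDD; a small sketch, assuming a spark-shell session with a local README.md:

val rdd = sc.textFile("README.md")
rdd.partitions.length                      // number of partitions
rdd.dependencies                           // dependencies on parent RDDs
rdd.partitioner                            // None here, text files are not hash/range partitioned
rdd.preferredLocations(rdd.partitions(0))  // preferred nodes for the first partition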
RDD Example_
12
scala>	val	textFile	=	sc.textFile("README.md")	
textFile:	spark.RDD[String]	=	spark.MappedRDD@2ee9b6e3	
scala>	val	linesWithSpark	=	textFile.filter(line	=>	
line.contains("Spark"))	
linesWithSpark:	spark.RDD[String]	=	spark.FilteredRDD@7dd4af09	
scala>	linesWithSpark.count()		
res0:	Long	=	126
Reproduce RDDs Using A Tree_
13
[Diagram: a lineage tree rooted at the data source; transformations such as map(..), filter(..), union(..) and sample(..) derive rdd1-rdd6 from their parents, cache() marks an RDD for reuse, and actions such as count() produce the values val1-val3]
•Transformations
• map, flatMap
• sample, filter, distinct
• union, intersection, cartesian
•Actions
• reduce
• count
• collect, first, take
• saveAsTextFile
• foreach
Spark Transformations & Actions_
14
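A compact sketch combining the two kinds of operations listed above: the transformations stay lazy, the final action triggers the job (numbers are made up):

sc.parallelize(1 to 10)
  .filter(_ % 2 == 0) // transformation
  .map(_ * 10)        // transformation
  .reduce(_ + _)      // action: triggers execution, returns 300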
Run Spark In A Cluster_
15
•Memory
• A lot of data in memory
• More memory —> Less disk IO —> Faster processing
• Minimum 8 GB / Node
•Network
• Communication between Driver, Cluster Manager & Worker
• Important for reduce operations
• 10 Gigabit LAN or better
•CPU
• Less communication between threads
• Good to parallelize
• Minimum 8 – 16 Cores / Node
What About Hardware?_
16
•Master Web UI (8080)
How To Monitor? (1/3)_
17
•Worker Web UI (8081)
How To Monitor? (2/3)_
18
•Application Web UI (4040)
How To Monitor? (3/3)_
19
([atomic,collection,object]	,	[atomic,collection,object])	
val	fluege	=		
List(	("Thomas",	"Berlin"),("Mark",	"Paris"),("Thomas",	"Madrid"))	
val	pairRDD	=	sc.parallelize(fluege)	
pairRDD.filter(_._1	==	"Thomas")	
.collect	
.foreach(t	=>	println(t._1	+	"	flog	nach	"	+	t._2))	
Pair RDDs_
20
key (not unique) – value
•Parallelization!
• keys are used for partitioning
• pairs with different keys are distributed across the cluster
•Efficient processing of
• aggregate by key
• group by key
• sort by key
• joins and unions based on keys (see the sketch below)
Why Use Pair RDDs_
21
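A minimal sketch of such key-based processing, with made-up data:

val visits = sc.parallelize(Seq(("Thomas", 1), ("Mark", 1), ("Thomas", 1)))
visits.reduceByKey(_ + _) // aggregate by key: (Thomas,2), (Mark,1)
  .sortByKey()            // sort by key
  .collect()
  .foreach(println)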
RDD Dependencies_
22
"Narrow" (pipeline-able)
map, filter
union
join on co-partitioned data
RDD Dependencies_
23
"Wide" (shuffle)
groupBy on non-partitioned data
join on non-co-partitioned data
Spark Demo
24
Spark & Cassandra
25
Use Spark And Cassandra In A Cluster_
26
[Diagram: a Spark client talks to the Spark driver; the cluster consists of a Spark master and four nodes, each running a Cassandra node (C*) co-located with a Spark worker node (WN)]
Two Datacenter - Two Purposes_
27
[Diagram: two datacenters. DC1 - Online: four Cassandra nodes serving operational traffic. DC2 - Analytics: four Cassandra nodes, each co-located with a Spark worker node, plus the Spark master]
•Spark Cassandra Connector by Datastax
• https://github.com/datastax/spark-cassandra-connector
•Cassandra tables as Spark RDD (read & write)
•Mapping of C* tables and rows onto Java/Scala objects
•Server-side filtering ("where")
•Compatible with
• Spark ≥ 0.9
• Cassandra ≥ 2.0
•Clone & Compile with SBT or download at Maven Central
Connecting Spark With Cassandra_
28
•Start the shell
bin/spark-shell
  --jars ~/path/to/jar/spark-cassandra-connector-assembly-1.3.0.jar
  --conf spark.cassandra.connection.host=localhost
•Import Cassandra classes
scala> import com.datastax.spark.connector._
Use The Connector In The Shell_
29
•Read complete table
val	movies	=	sc.cassandraTable("movie","movies")	
//	returns	CassandraRDD[CassandraRow]	
•Read selected columns
val	movies	=	sc.cassandraTable("movie","movies").select("title","year")	
•Filter rows
val movies = sc.cassandraTable("movie","movies").where("title = 'Die Hard'")
•Access Columns in Result Set
movies.collect.foreach(r	=>	println(r.get[String]("title")))	
Read A Cassandra Table_
30
Read As Tuple
val movies =
  sc.cassandraTable[(String,Int)]("movie","movies")
    .select("title","year")

val movies =
  sc.cassandraTable("movie","movies")
    .select("title","year")
    .as((_: String, _: Int))

// both result in a CassandraRDD[(String,Int)]
Read A Cassandra Table_
31
Read As Case Class
case	class	Movie(title:	String,	year:	Int)	
sc.cassandraTable[Movie]("movie","movies").select("title","year")	
sc.cassandraTable("movie","movies").select("title","year").as(Movie)	
Read A Cassandra Table_
32
•Every RDD can be saved
• Using Tuples
val tuples = sc.parallelize(Seq(("Hobbit",2012),("96 Hours",2008)))
tuples.saveToCassandra("movie","movies", SomeColumns("title","year"))
• Using Case Classes
case class Movie(title: String, year: Int)
val objects =
  sc.parallelize(Seq(Movie("Hobbit",2012),Movie("96 Hours",2008)))
objects.saveToCassandra("movie","movies")
Write Table_
33
//	Load	and	format	as	Pair	RDD	
val	pairRDD	=	sc.cassandraTable("movie","director")	
.map(r	=>	(r.getString("country"),r))	
//	Directors	/	Country,	sorted	
pairRDD.mapValues(v	=>	1).reduceByKey(_+_)	
.sortBy(-_._2).collect.foreach(println)	
//	or,	unsorted	
pairRDD.countByKey().foreach(println)	
// All countries
pairRDD.keys
Pair RDDs With Cassandra_
34
Table: director – name text (PK), country text
•Joins can be expensive as they may require shuffling
val directors = sc.cassandraTable(..)
  .map(r => (r.getString("name"),r))
val movies = sc.cassandraTable()
  .map(r => (r.getString("director"),r))
movies.join(directors)
// RDD[(String, (CassandraRow, CassandraRow))]
Pair RDDs With Cassandra - Join_
35
Tables: director – name text (PK), country text; movie – title text (PK), director text
•Automatically on read
•Not automatically on write
• No shuffling Spark operations —> writes are local
• Shuffling Spark operations:
  • fan-out writes to Cassandra
  • repartitionByCassandraReplica("keyspace", "table") before write
•Joins with data locality
Using Data Locality With Cassandra_
36
sc.cassandraTable[CassandraRow](KEYSPACE, A)
  .repartitionByCassandraReplica(KEYSPACE, B)
  .joinWithCassandraTable[CassandraRow](KEYSPACE, B)
  .on(SomeColumns("id"))
•cassandraCount()
• Utilizes Cassandra query
• vs load the table into memory and do a count
•spanBy(), spanByKey()
• group data by Cassandra partition key
• does not need shuffling
• should be preferred over groupBy/groupByKey
CREATE TABLE events (year int, month int, ts timestamp, data varchar, PRIMARY KEY (year, month, ts));
sc.cassandraTable("test",	"events")	
		.spanBy(row	=>	(row.getInt("year"),	row.getInt("month")))	
sc.cassandraTable("test",	"events")	
		.keyBy(row	=>	(row.getInt("year"),	row.getInt("month")))	
		.spanByKey	
Further Transformations & Actions_
37
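A one-line sketch of cassandraCount(), assuming the movie keyspace from the earlier slides: the counting is executed by Cassandra instead of materializing all rows in Spark first.

val count = sc.cassandraTable("movie", "movies").cassandraCount()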
Spark & Cassandra Demo
38
Create an Application
39
•Normal Scala Application
•SBT as build tool
•source in src/main/scala-2.10
•assembly.sbt in root and project directory
•build.sbt in root directory
•sbt assembly to build
Scala Application_
40
libraryDependencies	+=	"com.datastax.spark"	%	"spark-cassandra-connector"	%	"1.3.0"	
libraryDependencies	+=	"org.apache.spark"	%	"spark-core"	%	"1.3.1"	%	"provided"	
libraryDependencies	+=	"org.apache.spark"	%	"spark-mllib_2.10"	%	"1.3.1"	%	"provided"	
libraryDependencies += "org.apache.spark" % "spark-streaming_2.10" % "1.3.1" % "provided"
•Normal Java Application
•Java 8!
•MVN as build tool
•source in src/main/java
•in pom.xml
• dependencies (spark-core, spark-streaming, spark-mllib, spark-cassandra-connector)
• assembly-plugin or shade-plugin
•mvn clean install to build
Java Application_
41
•Special classes for Java
SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("Java")
  .set("spark.cassandra.connection.host", "127.0.0.1");
JavaSparkContext sc = new JavaSparkContext(conf);
JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(1L));
JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5, 6));
rdd.filter(e -> e % 2 == 0).foreach(System.out::println);
Java Specials_
42
•Special classes for Java
import static com.datastax.spark.connector.japi.CassandraJavaUtil.*;

CassandraTableScanJavaRDD<CassandraRow> table =
  javaFunctions(sc.sparkContext())
    .cassandraTable("keyspace", "table");

CassandraTableScanJavaRDD<Entity> entities =
  javaFunctions(sc.sparkContext())
    .cassandraTable("keyspace", "table", mapRowTo(Entity.class));

javaFunctions(someRDD)
  .writerBuilder("keyspace", "table", mapToRow(Entity.class))
  .saveToCassandra();
Java Specials - Cassandra_
43
Spark SQL
44
•SQL Queries with Spark (SQL & HiveQL)
• On structured data
• On DataFrame
• Every result of Spark SQL is a DataFrame
• All operations of the generic RDD available
•Supports (even on non-primary-key columns)
• Joins
• Union
• Group By
• Having
• Order By
Spark SQL_
45
val	sqlContext	=	new	SQLContext(sc)	
val	persons	=	sqlContext.jsonFile(path)	
//	Show	the	schema	
persons.printSchema()	
persons.registerTempTable("persons")	
val	adults	=		
sqlContext.sql("SELECT	name	FROM	persons	WHERE	age	>	18")	
adults.collect.foreach(println)	
Spark SQL - JSON Example_
46
{"name":"Michael"}	
{"name":"Jan",	"age":30}	
{"name":"Tim",	"age":17}
val csc = new CassandraSQLContext(sc)
csc.setKeyspace("musicdb")
val result = csc.sql("SELECT country, COUNT(*) as anzahl " +
                     "FROM artists GROUP BY country " +
                     "ORDER BY anzahl DESC")
result.collect.foreach(println)
Spark SQL - Cassandra Example_
47
Spark SQL Demo
48
Spark Streaming
49
•Real Time Processing using micro batches
•Supported sources: TCP, S3, Kafka, Twitter,..
•Data as Discretized Stream (DStream)
•Same programming model as for batches
•All operations of the generic RDD, Spark SQL & MLLib available
•Stateful Operations & Sliding Windows
Stream Processing With Spark Streaming_
50
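A hedged sketch of a sliding window, assuming a DStream like the one created on the next slide: aggregate the events of the last 30 seconds, recomputed every 10 seconds.

val counts = stream.map(x => 1)
  .reduceByWindow(_ + _, Seconds(30), Seconds(10)) // window length, slide interval
counts.print()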
import	org.apache.spark.streaming._	
val	ssc	=	new	StreamingContext(sc,Seconds(1))	
val	stream	=	ssc.socketTextStream("127.0.0.1",9999)	
stream.map(x	=>	1).reduce(_	+	_).print()	
ssc.start()	
//	await	manual	termination	or	error	
ssc.awaitTermination()	
//	manual	termination	
ssc.stop()	
Spark Streaming - Example_
51
•Maintain State for each key in a DStream: updateStateByKey
Spark Streaming - Stateful Operations_
52
def updateAlbumCount(newValues: Seq[Int], runningCount: Option[Int]): Option[Int] = {
  val newCount = runningCount.getOrElse(0) + newValues.size
  Some(newCount)
}
val countStream = stream.updateStateByKey[Int](updateAlbumCount _)
// stream is a DStream of pair RDDs
•One Receiver -> One Node
• Start more receivers and union them
val	numStreams	=	5	
val	kafkaStreams	=	(1	to	numStreams).map	{	i	=>	
KafkaUtils.createStream(...)	}	
val	unifiedStream	=	streamingContext.union(kafkaStreams)	
unifiedStream.print()	
•Received data will be split up into blocks
• 1 block => 1 task
• blocks per batch = batch interval / block interval
•Repartition data to distribute over the cluster (see the sketch below)
Spark Streaming - Parallelism_
53
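A one-line sketch of the repartitioning mentioned above, reusing unifiedStream; the partition count of 10 is an arbitrary example value:

val repartitioned = unifiedStream.repartition(10)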
Spark Streaming Demo
54
Spark MLLib
55
•Fully integrated in Spark
• Scalable
• Scala, Java & Python APIs
• Use with Spark Streaming & Spark SQL
•Packages various algorithms for machine learning
•Includes
• Clustering
• Classification
• Prediction
• Collaborative Filtering
•Still under development
• performance, algorithms
Spark MLLib_
56
MLLib Example - Clustering_
57
[Diagram: data points plotted by age and income – left: the raw set of data points, right: meaningful clusters]
// Load and parse data
val data = sc.textFile("data/mllib/kmeans_data.txt")
val parsedData = data
  .map(s => Vectors.dense(s.split(' ').map(_.toDouble)))
  .cache()

// Cluster the data into 2 classes using KMeans with 20 iterations
val clusters = KMeans.train(parsedData, 2, 20)

// Evaluate clustering by computing the Sum of Squared Errors
val SSE = clusters.computeCost(parsedData)
println("Sum of Squared Errors = " + SSE)
MLLib Example - Clustering (using KMeans)_
58
MLLib Example - Classification_
59
//	Load	training	data	in	LIBSVM	format.	
val	data	=		
MLUtils.loadLibSVMFile(sc,	"sample_libsvm_data.txt")	
//	Split	data	into	training	(60%)	and	test	(40%).	
val	splits	=	data.randomSplit(Array(0.6,	0.4),	seed	=	11L)	
val	training	=	splits(0).cache()	
val	test	=	splits(1)	
//	Run	training	algorithm	to	build	the	model	
val	numIterations	=	100	
val	model	=	SVMWithSGD.train(training,	numIterations)
MLLib Example - Classification (Linear SVM)_
61
//	Compute	raw	scores	on	the	test	set.	
val	scoreAndLabels	=	test.map	{	point	=>	
		val	score	=	model.predict(point.features)	
		(score,	point.label)	
}	
//	Get	evaluation	metrics.	
val metrics = new BinaryClassificationMetrics(scoreAndLabels)
val	auROC	=	metrics.areaUnderROC()	
println("Area	under	ROC	=	"	+	auROC)
MLLib Example - Classification (Linear SVM)_
62
MLLib Example - Collaborative Filtering_
63
// Load and parse the data (userid, itemid, rating)
val data = sc.textFile("data/mllib/als/test.data")
val ratings = data.map(_.split(',') match {
  case Array(user, item, rate) =>
    Rating(user.toInt, item.toInt, rate.toDouble)
})
// Build the recommendation model using ALS
val rank = 10
val numIterations = 20
val model = ALS.train(ratings, rank, numIterations, 0.01)
MLLib Example - Collaborative Filtering using ALS_
64
// Evaluate the model on rating data
val usersProducts = ratings.map {
  case Rating(user, product, rate) => (user, product)
}
val predictions = model.predict(usersProducts).map {
  case Rating(user, product, rate) => ((user, product), rate)
}
val ratesAndPredictions = ratings.map {
  case Rating(user, product, rate) => ((user, product), rate)
}.join(predictions)
val MSE = ratesAndPredictions.map {
  case ((user, product), (r1, r2)) => val err = r1 - r2; err * err
}.mean()
println("Mean Squared Error = " + MSE)
MLLib Example - Collaborative Filtering using ALS_
65
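The trained model can also score a single user/product pair; a minimal sketch with example IDs:

val predictedRating = model.predict(1, 2) // user 1, product 2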
Use Cases
66
•In particular for huge amounts of external data
•Support for CSV, TSV, XML, JSON and others
Use Cases for Spark and Cassandra_
67
Data Loading
case class User(id: java.util.UUID, name: String)
val users = sc.textFile("users.csv")
  .repartition(2 * sc.defaultParallelism)
  .map(line => line.split(",") match {
    case Array(id, name) => User(java.util.UUID.fromString(id), name)
  })
users.saveToCassandra("keyspace", "users")
Validate consistency in a Cassandra database (see the sketch below)
•syntactic
• Uniqueness (only relevant for columns not in the PK)
• Referential integrity
• Consistency of duplicated (denormalized) data
•semantic
• Business or application constraints
• e.g.: at least one genre per movie, a maximum of 10 tags per blog post
Use Cases for Spark and Cassandra_
68
Validation & Normalization
•Modelling, Mining, Transforming, ....
•Use Cases
• Recommendation
• Fraud Detection
• Link Analysis (Social Networks, Web)
• Advertising
• Data Stream Analytics (—> Spark Streaming)
• Machine Learning (—> Spark ML)
Use Cases for Spark and Cassandra_
69
Analyses (Joins, Transformations,..)
•Changes on existing tables
• New table required when changing primary key
• Otherwise changes could be performed in-place
•Creating new tables
• data derived from existing tables
• Support new queries
•Use the Cassandra connector in Spark
Use Cases for Spark and Cassandra_
70
Schema Migration
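A minimal sketch of such a migration, assuming a new table movies_by_year (a made-up name) whose primary key differs from the original: read the old table, reshape the rows, write into the new table.

sc.cassandraTable("movie", "movies")
  .map(r => (r.getString("title"), r.getInt("year")))
  .saveToCassandra("movie", "movies_by_year", SomeColumns("title", "year"))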
Thank you for your attention!
71
Questions?
Matthias Niehoff,
IT-Consultant
codecentric AG
Zeppelinstraße 2
76185 Karlsruhe, Germany
mobile: +49 (0) 172.1702676
matthias.niehoff@codecentric.de
www.codecentric.de
blog.codecentric.de
matthiasniehoff