GRAPH 101- GETTING STARTED WITH TITAN AND CASSANDRA

Shaunak Das

PURPOSE OF THIS SESSION
Get users comfortable with DSE Graph
- DSE Graph is currently under heavy development
- Titan DB serves as a prototype in the meantime
DSE Graph is built ‘on top’ of DSE, i.e. it will use many of the other features
provided in DSE (e.g. Spark, Hadoop, Solr)
- Today, we use Titan with Cassandra for persistent storage of graphs

WHAT IS A GRAPH DATABASE?
You all know what a graph is. Abstractly, it consists of vertices and edges
connecting pairs of vertices. Edges may be directed, pointing from one vertex to
another.
A graph database stores a graph as a data structure in which each vertex and
edge instance can hold multiple key-value pairs (properties).
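To make the definition concrete, here is a minimal plain-Python sketch of a graph whose vertices and edges carry key-value pairs. It is not how Titan stores data; the class names, the 'knows' label, and all property values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Vertex:
    id: int
    # Arbitrary key-value pairs attached to the vertex.
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    label: str
    out_v: int  # id of the vertex the edge leaves
    in_v: int   # id of the vertex the edge enters
    # Edges can hold key-value pairs too.
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class PropertyGraph:
    vertices: Dict[int, Vertex] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def add_vertex(self, vid: int, **props: Any) -> Vertex:
        vertex = Vertex(vid, dict(props))
        self.vertices[vid] = vertex
        return vertex

    def add_edge(self, label: str, out_v: int, in_v: int, **props: Any) -> Edge:
        edge = Edge(label, out_v, in_v, dict(props))
        self.edges.append(edge)
        return edge


# Hypothetical example data: two vertices joined by one directed edge.
graph = PropertyGraph()
alice = graph.add_vertex(1, name="alice")
bob = graph.add_vertex(2, name="bob")
graph.add_edge("knows", alice.id, bob.id, since=2015)
```

The direction of an edge is captured by which vertex id is stored as `out_v` and which as `in_v`.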

EXAMPLE: AMAZON DATA SET
Id: 1
ASIN: 0827229534
title: Patterns of Preaching: A Sermon Sampler
group: Book
salesrank: 396585
similar: 5 0804215715 156101074X 0687023955 0687074231 082721619X
categories: 2
|Books[283155]|Subjects[1000]|Religion & Spirituality[22]|Christianity[12290]|Clergy[12360]|Preaching[12368]
|Books[283155]|Subjects[1000]|Religion & Spirituality[22]|Christianity[12290]|Clergy[12360]|Sermons[12370]
reviews: total: 2 downloaded: 2 avg rating: 5
2000-7-28 customer: A2JW67OY8U6HHK rating: 5 votes: 10 helpful: 9
2003-12-14 customer: A2VE83MZF98ITY rating: 5 votes: 6 helpful: 5

AMAZON GRAPH SCHEMA
image courtesy of Pierre LaPorte

IN THE MEANTIME...
As mentioned, we currently have Titan DB as a stand-in for DSE Graph.
Titan is what we will be using today.
There are several names you may have heard associated with graph databases:
Titan
TinkerPop
Gremlin
Let me briefly (and perhaps incorrectly) distinguish what each is for.

TITAN? TINKERPOP?
With data, we care about (simplification):
how to ‘effectively’ store it
● serialization, compaction strategies
● This is Titan
how to ‘effectively’ retrieve/query it
● query algorithms, OLAP vs. OLTP
● this is TinkerPop
DSE Graph will encompass both parts of the above: graph storage and
graph querying/traversing

GETTING STARTED WITH TITAN
One can download a pre-built version of Titan 1.0, with TinkerPop
http://s3.thinkaurelius.com/downloads/titan/titan-1.0.0-hadoop1.zip
We will download and unpack it in a moment.

GREMLIN?
With this Titan distribution comes the Gremlin query language
The Gremlin query language is a graph traversal language, used to navigate and
query graph instances.
“Gremlin is to Titan what CQL is to Cassandra”
(This analogy is not perfect, but for our purposes, is good)
Just like Cassandra comes with a CQL shell, Titan comes with a Gremlin shell.
This will be how the user primarily interfaces with graphs. Let’s use it.

root@perf-lab-03b:~/titan-1.0.0-hadoop1# bin/gremlin.sh
         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----
plugin activated: aurelius.titan
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/root/titan-1.0.0-hadoop1/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/titan-1.0.0-hadoop1/lib/logback-classic-1.1.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
13:19:16 INFO org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph - HADOOP_GREMLIN_LIBS is set to: /root/titan-1.0.0-hadoop1/lib
plugin activated: tinkerpop.hadoop
plugin activated: tinkerpop.tinkergraph
gremlin>

USING A TINKERGRAPH
gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> g.V().count()
==>0
gremlin> g.E().count()
==>0

TITAN WITH CASSANDRA
You should be asking yourself: where did that graph go?
TinkerGraph = in-memory graph. Once we closed it, the data in that TinkerGraph
instance is gone.
TitanGraph = persistent graph. This is where Cassandra comes into play.
How do we get Titan to play with Cassandra, in order to store a persistent
graph, which we can ‘microwave’ up for future querying and modifications?

TITAN WITH CASSANDRA
PREREQUISITE: Cassandra is running on your machine/cluster.
Go back into the Gremlin REPL. Specify the type of graph, the host machine
Cassandra is running on, and the keyspace in which we want to store this graph:
conf = new BaseConfiguration()
conf.setProperty('gremlin.graph', 'com.thinkaurelius.titan.core.TitanFactory')
conf.setProperty('storage.backend', 'cassandra')
conf.setProperty('storage.hostname', 'localhost')
conf.setProperty('storage.cassandra.keyspace', 'graph')
Instantiate your graph with the configuration specified above:
graph = GraphFactory.open(conf)

AUTOMATING DATA LOADING
So that was quite a bit of work to just get two vertices into Cassandra.
What if we are dealing with a large data set that needs to get into a TitanGraph?
The Gremlin shell accepts parser scripts for automating the loading of data.

We have the following parser script for this data set:
https://github.com/riptano/automaton/blob/master/resources/tests/graph/scripts/AmazonTitan.groovy
Let’s take a high-level glance at what is involved here.
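As a rough idea of the parsing half of such a job, here is a plain-Python sketch that turns one review line from the sample record into a description of a 'rated' edge. It does not reproduce AmazonTitan.groovy, which is not shown here. The 'rated' label and 'rating' property come from the traversal examples later in this deck; the function names and the other property names are assumptions.

```python
import re
from typing import Any, Dict

# Matches a review line such as:
# 2000-7-28 customer: A2JW67OY8U6HHK rating: 5 votes: 10 helpful: 9
REVIEW_LINE = re.compile(
    r"(?P<date>\d{4}-\d{1,2}-\d{1,2})\s+"
    r"customer:\s*(?P<customer>\S+)\s+"
    r"rating:\s*(?P<rating>\d+)\s+"
    r"votes:\s*(?P<votes>\d+)\s+"
    r"helpful:\s*(?P<helpful>\d+)"
)


def parse_review(line: str) -> Dict[str, Any]:
    """Extract the fields of one review line; raise if it does not match."""
    match = REVIEW_LINE.search(line)
    if match is None:
        raise ValueError(f"not a review line: {line!r}")
    return {
        "date": match.group("date"),
        "customer": match.group("customer"),
        "rating": int(match.group("rating")),
        "votes": int(match.group("votes")),
        "helpful": int(match.group("helpful")),
    }


def review_to_edge(asin: str, review: Dict[str, Any]) -> Dict[str, Any]:
    """Describe a 'rated' edge pointing from the customer to the product."""
    return {
        "label": "rated",
        "out": review["customer"],  # edge leaves the customer vertex
        "in": asin,                 # edge enters the product vertex
        "properties": {
            "rating": review["rating"],
            "votes": review["votes"],
            "helpful": review["helpful"],
            "date": review["date"],
        },
    }


sample = "2000-7-28 customer: A2JW67OY8U6HHK rating: 5 votes: 10 helpful: 9"
edge = review_to_edge("0827229534", parse_review(sample))
print(edge["out"], "-rated->", edge["in"], edge["properties"]["rating"])
```

A real loader would then create the vertices and edges in the graph and commit them in batches; that part depends on the Titan API and is left out of this sketch.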

QUESTIONS WE CAN ‘ANSWER’ WITH GRAPH
Let’s return to our Amazon data set example. Suppose we want to find all
customers who liked a particular item, i.e. gave a rating of 5 to the item with ASIN X.
g.V() ← get all vertices
g.V().has('ASIN', 'X') ← ...with ASIN value X
g.V().has('ASIN', 'X').inE('rated') ← grab its incoming 'rated' edges
g.V().has('ASIN', 'X').inE('rated').has('rating', 5) ← ...with rating value 5
g.V().has('ASIN', 'X').inE('rated').has('rating', 5).outV() ← the customers at the other end
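To see what each step of that traversal filters out, here is a plain-Python imitation over a tiny in-memory list of 'rated' edges. It only mimics the logic; it is not Gremlin and does not touch Titan. The customer ids, ASINs, and ratings are made up.

```python
from typing import List, Tuple

# Each tuple stands in for one 'rated' edge: (customer, ASIN, rating).
# All values are hypothetical.
RATED_EDGES: List[Tuple[str, str, int]] = [
    ("c1", "X", 5),
    ("c2", "X", 3),
    ("c3", "X", 5),
    ("c1", "Y", 5),
]


def customers_who_liked(
    edges: List[Tuple[str, str, int]], asin: str, liked_rating: int = 5
) -> List[str]:
    # has('ASIN', asin) + inE('rated'): keep edges that point at this item.
    incoming = [edge for edge in edges if edge[1] == asin]
    # has('rating', liked_rating): keep only edges with that rating.
    liked = [edge for edge in incoming if edge[2] == liked_rating]
    # outV(): step to the vertex each edge came from, i.e. the customer.
    return [customer for customer, _, _ in liked]


print(customers_who_liked(RATED_EDGES, "X"))  # prints ['c1', 'c3']
```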

RECOMMENDATION SYSTEM?
Now suppose we want to get the top ten items that were liked by customers who
also liked the item with ASIN value X.
What kind of traversal query should we make now?
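One way to reason about the answer before writing any Gremlin: find the customers who liked X, walk back out along their 'rated' edges to the other items they liked, count how often each item shows up, and keep the ten most frequent. The plain-Python sketch below follows that logic on made-up data. It assumes "liked" means a rating of 5, as on the previous slide, and it is not a Gremlin query.

```python
from collections import Counter
from typing import List, Tuple

# Each tuple stands in for one 'rated' edge: (customer, ASIN, rating).
# All values are hypothetical.
RATED_EDGES: List[Tuple[str, str, int]] = [
    ("c1", "X", 5), ("c2", "X", 5), ("c3", "X", 2),
    ("c1", "Y", 5), ("c2", "Y", 5), ("c2", "Z", 5),
    ("c1", "Z", 4), ("c3", "W", 5),
]


def recommend(
    edges: List[Tuple[str, str, int]],
    asin: str,
    liked_rating: int = 5,
    top_n: int = 10,
) -> List[Tuple[str, int]]:
    # Step 1: customers who liked the starting item.
    fans = {c for c, a, r in edges if a == asin and r == liked_rating}
    # Step 2: other items those customers liked, counted per item.
    counts = Counter(
        a for c, a, r in edges
        if c in fans and r == liked_rating and a != asin
    )
    # Step 3: the top_n items by number of shared fans.
    return counts.most_common(top_n)


print(recommend(RATED_EDGES, "X"))  # prints [('Y', 2), ('Z', 1)]
```

Here c3 does not count as a fan of X (rating 2), and c1's rating of 4 for Z does not count as a like, so Y wins with two shared fans.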

POTENTIAL FUTURE SESSIONS AND TOPICS
Defining Graph Schema: indexing
Using Hadoop and Spark for ‘OLAP-Querying’ a Graph
Using Hadoop and Spark for Bulk Loading Graph Data into Cassandra
Your suggestions!
