Powering a Graph Data System with Scylla + JanusGraph

Ryan Stauffer, Founder & CEO

Presenter
Ryan Stauffer, Founder & CEO
Ryan founded Enharmonic to change the way we interact with
data. He has experience building modern data solutions for fast-moving
companies, both as a consultant and as the leader of
Data Strategy and Analytics at Private Equity-backed Driven
Brands. He received his MBA from Washington University in St.
Louis, and has additional experience in Investment Banking and
as a U.S. Army Infantry Officer. In his free time, he makes music
and tries to set PRs running up Potrero Hill.

Graph Data System
We can break down the concept of a “Graph Data System” into two pieces:
■ Graph - we’re modelling our data as a property graph
● Vertices model logical entities (Customer, Product, Order)
● Edges model logical relationships between entities (PURCHASED, IN_ORDER)
● Properties model attributes of entities/relationships (name, purchaseDate)
■ Data System - we use several components in a single system to store
and retrieve our data
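The property graph model described above can be sketched in a few lines of plain Python. This is only an in-memory illustration, not how JanusGraph stores data; the class names, the sample values ("Alice", "Widget", the date) and the `out_neighbors` helper are assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """A logical entity, e.g. a Customer or a Product."""
    id: int
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A directed, labelled relationship between two vertices."""
    label: str
    out_v: int  # id of the vertex the edge leaves
    in_v: int   # id of the vertex the edge enters
    properties: dict = field(default_factory=dict)


# Hypothetical sample data
customer = Vertex(1, "Customer", {"name": "Alice"})
product = Vertex(2, "Product", {"name": "Widget"})
purchased = Edge("PURCHASED", out_v=customer.id, in_v=product.id,
                 properties={"purchaseDate": "2019-11-05"})

vertices = {v.id: v for v in (customer, product)}
edges = [purchased]


def out_neighbors(vertex_id, edge_label):
    """Return the vertices reached by following outgoing edges with the given label."""
    return [vertices[e.in_v] for e in edges
            if e.out_v == vertex_id and e.label == edge_label]


print([v.properties["name"] for v in out_neighbors(customer.id, "PURCHASED")])
```

Vertices, edges and properties are three separate concepts here: adding a new kind of relationship means adding another `Edge` label, not changing the shape of `Vertex`.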

3 Core Benefits
■ Flexibility
■ Schema support
■ OLTP & OLAP support (at the graph level; distinct from Scylla Workload Prioritization)

Flexibility
The “killer feature” of a graph data model is flexibility
■ Changing database schemas to support new business logic and data
sources is tough!
■ The nature of a graph’s data model makes it easier to evolve the data
model over time
■ Iterate on our model to match our understanding as we learn,
without having to start from scratch
■ In practice
● Incorporate fresh data sources without breaking existing workloads
● Write query results directly to the graph as new vertices & edges
● Share production-quality data between teams

Schema Support
By supporting a defined schema, our data system can enforce business
logic, and minimize duplicative application code
■ Flexible schema support out-of-the-box
■ We can pre-define the properties and datatypes that are possible for
a given vertex or edge, without requiring that each vertex/edge
contain every property
■ We can pre-define which edge types are allowed to connect a pair of
vertices, without requiring every pair of vertices to have this edge
■ Simplifies testing on new use cases
■ Separates data integrity maintenance from business logic
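A minimal sketch of the two schema rules above, written as plain Python rather than against the JanusGraph management API. The `SCHEMA` layout and both function names are hypothetical; the labels and property keys are borrowed from the examples elsewhere in this deck.

```python
# Hypothetical schema: allowed property keys/datatypes per vertex label,
# and allowed (out-label, in-label) pairs per edge label.
SCHEMA = {
    "vertex_properties": {
        "Product": {"name": str, "productId": int},
        "Customer": {"name": str},
    },
    "edge_connections": {
        "PURCHASED": {("Customer", "Product")},
    },
}


def validate_vertex(label, properties):
    """Every supplied property must be declared with the right type.

    Declared properties may be omitted: the schema says what is possible,
    not what is required.
    """
    allowed = SCHEMA["vertex_properties"].get(label)
    if allowed is None:
        return False
    return all(key in allowed and isinstance(value, allowed[key])
               for key, value in properties.items())


def validate_edge(edge_label, out_label, in_label):
    """An edge label may only connect the vertex-label pairs declared for it."""
    allowed_pairs = SCHEMA["edge_connections"].get(edge_label, set())
    return (out_label, in_label) in allowed_pairs


print(validate_vertex("Product", {"name": "Widget"}))   # productId omitted
print(validate_edge("PURCHASED", "Product", "Customer"))  # wrong direction
```

Keeping these checks in the data system means each application does not have to re-implement them.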

OLTP + OLAP
■ Transactional (graph-local) workloads
● Begin with a small number of vertices (found with the help of an index)
● Traverse across a reasonably small number of edges and vertices
● Goal is to minimize latency
● With Scylla, we can achieve scalable, single-digit millisecond response times
■ Analytical (graph-global) workloads
● Traverse all (or a substantial portion) of the vertices and edges
● Includes many classic graph algorithms
● Goal is to maximize throughput (might leverage Spark)
■ The same traversal language (Gremlin) can be used to write both
types of workloads
■ This OLTP/OLAP distinction applies at the graph level, and is distinct
from Scylla workload prioritization
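The difference between the two workload shapes can be illustrated with a toy adjacency list in Python. This is a sketch under stated assumptions: the vertex ids, the `name_index` dictionary standing in for an index, and both function names are invented for the example, and nothing here uses Gremlin or Spark.

```python
from collections import Counter

# Hypothetical adjacency list: vertex id -> outgoing (edge label, target vertex id)
out_edges = {
    "customer:1": [("PURCHASED", "product:10"), ("PURCHASED", "product:11")],
    "customer:2": [("PURCHASED", "product:10")],
}

# Stand-in for an index that finds the starting vertex by property value
name_index = {"Alice": "customer:1", "Bob": "customer:2"}


def products_purchased_by(name):
    """Graph-local (OLTP-style): index lookup, then touch only one vertex's edges."""
    start = name_index[name]
    return [target for label, target in out_edges.get(start, [])
            if label == "PURCHASED"]


def purchase_counts():
    """Graph-global (OLAP-style): visit every edge of every vertex."""
    counts = Counter()
    for edge_list in out_edges.values():
        for label, target in edge_list:
            if label == "PURCHASED":
                counts[target] += 1
    return counts


print(products_purchased_by("Alice"))
print(purchase_counts().most_common(1))
```

The first function's cost depends on the neighbourhood of one vertex; the second grows with the whole graph, which is why throughput rather than latency is the goal there.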

Where to Deploy?
■ VMs
■ Bare Metal

Kubernetes
■ Open-source system for managing containerized applications
■ Groups application containers into logical units
■ Builds abstractions on top of the basic resources
● Compute
● Memory
● Disk
● Network

Deployment Overview
[Diagram components: Client, Load Balancer, Deployment, Headless Service, Stateful Set, Storage Class]
■ The “stateful” components of our system are Scylla & Elasticsearch
■ JanusGraph is deployed as a stateless server that stores and
retrieves data to and from the stateful systems

Scylla
■ Use your existing deployment == Zero lift!
■ New keyspace for JanusGraph data
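For illustration, a dedicated keyspace could be created with a CQL statement like the one this Python helper builds. The helper name is hypothetical, and the replication settings (`SimpleStrategy`, factor 3) are assumptions for the sketch; an existing Scylla deployment will have its own replication policy. The keyspace name `graphdev` is the one used in the JanusGraph manifests in this deck.

```python
import re


def create_keyspace_cql(keyspace, replication_factor=3):
    """Build a CREATE KEYSPACE statement for a JanusGraph keyspace.

    The replication class and factor are illustrative assumptions.
    """
    # Accept only simple unquoted identifiers to keep the statement safe to build
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", keyspace):
        raise ValueError(f"invalid keyspace name: {keyspace!r}")
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
        f"WITH replication = {{'class': 'SimpleStrategy', "
        f"'replication_factor': {replication_factor}}};"
    )


print(create_keyspace_cql("graphdev"))
```

The resulting statement can be run in `cqlsh` against the existing cluster.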

Elasticsearch
[Diagram components: Stateful Set, Headless Service, Storage Class]

Elasticsearch - Manifest Summary
■ Stateful Set
    kind: StatefulSet
    metadata: ...
    spec:
      serviceName: es
      replicas: 3
      selector: { matchLabels: { app: es }}
      template:
        metadata: { labels: { app: es }}
        spec:
          containers:
          - name: elasticsearch
            image: .../elasticsearch-oss:6.6.0
            env:
            - name: discovery.zen.ping.unicast.hosts
              value: "es-0.es.default.svc.cluster.local,..."
            volumeMounts:
            - name: data
              mountPath: /usr/share/elasticsearch/data
      volumeClaimTemplates:
      - metadata: { name: data }
        spec:
          accessModes: [ ReadWriteOnce ]
          storageClassName: elasticsearch-ssd
■ Headless Service
    kind: Service
    metadata:
      name: es
      labels: { app: es }
    spec:
      clusterIP: None
      ports:
      - port: 9200
      - port: 9300
      selector:
        app: es
■ Storage Class
    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: elasticsearch-ssd
    provisioner: kubernetes.io/gce-pd
    parameters:
      type: pd-ssd
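The `discovery.zen.ping.unicast.hosts` value relies on the stable DNS names that Kubernetes gives StatefulSet pods behind a headless Service: `<statefulset>-<ordinal>.<service>.<namespace>.svc.<cluster-domain>`. The Python sketch below shows how that list could be generated. The function name is hypothetical, and the StatefulSet name `es` is an assumption inferred from the first hostname in the manifest, since the manifest's metadata is elided.

```python
def statefulset_pod_hosts(statefulset_name, service_name, replicas,
                          namespace="default", cluster_domain="cluster.local"):
    """Return the stable per-pod DNS names for a StatefulSet behind a headless Service."""
    return [
        f"{statefulset_name}-{ordinal}.{service_name}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


# Assumed StatefulSet name "es"; service name and replica count come from the manifest
unicast_hosts = ",".join(statefulset_pod_hosts("es", "es", 3))
print(unicast_hosts)
```

Because the names are predictable, the host list can be written into the manifest before any pod exists.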

Elasticsearch - Deploy
$ kubectl apply -f elasticsearch.yaml
storageclass.storage.k8s.io/elasticsearch-ssd created
service/es created
statefulset.apps/elasticsearch created
$ kubectl get all -l app=elasticsearch
NAME                             READY   AGE
statefulset.apps/elasticsearch   3/3     2m10s
NAME                  READY   STATUS    RESTARTS   AGE
pod/elasticsearch-0   1/1     Running   0          2m9s
pod/elasticsearch-1   1/1     Running   0          87s
pod/elasticsearch-2   1/1     Running   0          44s
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)             AGE
service/es   ClusterIP   None         <none>        9200/TCP,9300/TCP   2m9s

JanusGraph Image
■ There are already official JanusGraph images on Docker Hub
$ docker pull janusgraph/janusgraph:0.4.0
■ You can also build your own using the JanusGraph project build
scripts and push it to a private image repository (e.g., on GCP)
$ git clone https://github.com/JanusGraph/janusgraph-docker.git
$ cd janusgraph-docker
$ sudo ./build-images.sh 0.4
# Push the image to your private project repository
$ docker tag janusgraph/janusgraph:0.4.0 gcr.io/$PROJECT/janusgraph:0.4.0
$ gcloud auth configure-docker
$ docker push gcr.io/$PROJECT/janusgraph:0.4.0

JanusGraph Console
(Just a Pod…)

JanusGraph Console - Manifest Summary
■ Run JanusGraph in a Pod, and connect to it directly
● Graph is only accessible through this console connection, but actions are persisted
in Scylla and Elasticsearch
    kind: Pod
    spec:
      containers:
      - name: janusgraph
        image: .../janusgraph:0.4.0
        env:
        - name: JANUS_PROPS_TEMPLATE
          value: cql-es
        - name: janusgraph.storage.hostname
          value: 10.138.0.3
        - name: janusgraph.storage.cql.keyspace
          value: graphdev
        - name: janusgraph.index.search.hostname
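The environment variables above follow the janusgraph-docker image's convention, as best understood here: `JANUS_PROPS_TEMPLATE` selects a properties template, and any variable prefixed with `janusgraph.` overrides the matching key in the generated properties file. The Python sketch below only simulates that mapping to make it concrete. It is not the image's actual startup script, and the template contents and function name are assumptions.

```python
PREFIX = "janusgraph."


def apply_env_overrides(template_props, env):
    """Overlay `janusgraph.`-prefixed environment variables onto template properties."""
    props = dict(template_props)
    for name, value in env.items():
        if name.startswith(PREFIX):
            # Strip the prefix: janusgraph.storage.hostname -> storage.hostname
            props[name[len(PREFIX):]] = value
    return props


# Assumed, simplified contents of a cql-es style template
template = {
    "storage.backend": "cql",
    "storage.hostname": "127.0.0.1",
    "index.search.backend": "elasticsearch",
}

# Values taken from the Pod manifest above
env = {
    "JANUS_PROPS_TEMPLATE": "cql-es",
    "janusgraph.storage.hostname": "10.138.0.3",
    "janusgraph.storage.cql.keyspace": "graphdev",
}

for key, value in sorted(apply_env_overrides(template, env).items()):
    print(f"{key}={value}")
```

The practical consequence is that the same image can point at a different Scylla host or keyspace by changing only the manifest.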

JanusGraph Console - Deploy & Define Schema
$ kubectl create -f janusgraph-gremlin-console.yaml
$ kubectl exec -it janusgraph-gremlin-console -- bin/gremlin.sh
         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----
...
gremlin>
graph = JanusGraphFactory.open('/etc/opt/janusgraph/janusgraph.properties')
mgmt = graph.openManagement()
// Define Schema for a Product Vertex and Properties
Product = mgmt.makeVertexLabel("Product").make()
name = mgmt.makePropertyKey("name").
    dataType(String.class).cardinality(Cardinality.SINGLE).make()
productId = mgmt.makePropertyKey("productId").
    dataType(Integer.class).cardinality(Cardinality.SINGLE).make()
mgmt.addProperties(Product, name, productId)
mgmt.commit()

JanusGraph Server
[Diagram components: Deployment, Load Balancer]

JanusGraph Server - Manifest Summary
■ Deploy JanusGraph as a standalone server
● Uses TinkerPop Gremlin Server
● Graph will be accessible to a wide range of client languages (Python, Java, JS, etc.)
■ Deployment
    kind: Deployment
    labels:
      app: janusgraph
    spec:
      replicas: 1
      template:
        spec:
          containers:
          - name: janusgraph
            image: .../janusgraph:0.4.0
            env:
            - name: JANUS_PROPS_TEMPLATE
              value: cql-es
            - name: janusgraph.storage.hostname
              value: 10.138.0.3
            - name: janusgraph.storage.cql.keyspace
              value: graphdev
            - name: janusgraph.index.search.hostname
■ Service
    kind: Service
    metadata:
      name: janusgraph-service-lb
    spec:
      type: LoadBalancer
      selector:
        app: janusgraph
      ports:
      - name: gremlin-server-websocket
        protocol: TCP
        port: 8182
        targetPort: 8182
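As one example of client access, a Python application could reach the Gremlin Server websocket through the load balancer roughly as follows. This is a sketch, assuming the `gremlinpython` package (a 3.4.x release to match JanusGraph 0.4) is installed, that the server uses Gremlin Server's default `/gremlin` path, and that the traversal source is named `g`. The function names are hypothetical, and `list_product_names` needs a reachable server to run.

```python
def gremlin_endpoint(host, port=8182):
    """Build the websocket URL for a Gremlin Server (default path assumed)."""
    return f"ws://{host}:{port}/gremlin"


def list_product_names(host, limit=5):
    """Return up to `limit` Product names from the remote graph."""
    # Imported lazily so this module loads even without gremlinpython installed
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
    from gremlin_python.process.anonymous_traversal import traversal

    connection = DriverRemoteConnection(gremlin_endpoint(host), "g")
    try:
        g = traversal().withRemote(connection)
        return g.V().hasLabel("Product").values("name").limit(limit).toList()
    finally:
        connection.close()


# Port 8182 is the one exposed by the LoadBalancer Service
print(gremlin_endpoint("janusgraph.example.internal"))
```

The hostname in the last line is a placeholder; in practice it would be the external IP or DNS name assigned to the `janusgraph-service-lb` Service.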

JanusGraph Server - Deploy
$ kubectl apply -f janusgraph.yaml
service/janusgraph-service-lb created
deployment.apps/janusgraph-server created
$ kubectl get all -l app=janusgraph
NAME                                     READY   STATUS    RESTARTS   AGE
pod/janusgraph-server-5d77dd9ddf-nc87p   1/1     Running   0          1m2s
NAME                            TYPE           CLUSTER-IP    EXTERNAL-IP      PORT(S)    AGE
service/janusgraph-service-lb   LoadBalancer   10.0.12.109   35.121.171.101   8182/TCP   1m3s
NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/janusgraph-server   1/1     1            1           1m3s
NAME                                           DESIRED   CURRENT   READY   AGE
replicaset.apps/janusgraph-server-5d77dd9ddf   1         1         1       1m2s

A Better Way - Helm Charts
■ Nobody has time to manage all of these individual manifest files!
■ Use Helm (https://helm.sh) - the “package manager” for k8s
■ Makes it easy to define, deploy & upgrade Kubernetes applications
■ You can find our opinionated take on deploying JanusGraph with
Helm at https://github.com/EnharmonicAI/janusgraph-helm

With Kubernetes, it’s easy
to deploy JanusGraph on
top of Scylla

Flexible, scalable graph
data system for building
applications

Thank you
Any questions?
Stay in touch
Ryan Stauffer
ryan@enharmonic.ai
@RyantheStauffer
