Extensible RESTful Applications with Apache TinkerPop
Graph Day SF 2018
About Us
LIKES
{
"first_name": "Varun",
"last_name": "Ganesh",
}
{
"first_name": "Harshvardhan",
"last_name": "Joshi",
}
LIKES
About
• Connecting to business stacks
• Visualisation: custom-built infographics
• Natural language generated insights
• Export & share stories: email, PowerPoint, TV, web, embedded SDK
• Automating the process of data storytelling
• For more information, visit www.nugit.co
Agenda
• Use Cases
• The Slack APIs
• Defining the Entities
• Graph Design and Considerations
• Making the Graph RESTful
• Building a DSL
• Testing the Application
• Scaling the Graph
Use Cases – Communities
• View contribution to communication
• Participation across channels
• Identify collaborative groups
• Users connected by mentions and reactions
• Identify influential users per channel
Use Cases – Top Posts
• Highlight engaging conversations
• Top videos, GIFs, links
• Get insights across channels
Defining the Entities
Top Post:
• Files shared
• Messages with attachments
• Posts without replies or reactions are not considered
Defining the Entities
Notable Message:
• Messages with reactions or replies
• Replies and Comments that have reactions
• Other alerts that gather reactions
Defining the Entities
Mention:
• Replies and Comments can have mentions too
• Ignore mentions that are unnecessary or already captured in a relationship
Defining the Entities
• Narrows down data required for the use case
• Helps “whiteboarding” process for graph design
• Allows defining schema for payloads
• Requires understanding the nuances of the platform
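The entity rules above can be read as predicates over a raw Slack message. The following is a minimal sketch, assuming the Slack payload field names `file`, `attachments`, `reactions` and `reply_count`; the function names are illustrative and do not come from the talk.

```python
def has_engagement(message):
    """True when the message gathered at least one reaction or reply."""
    return bool(message.get("reactions")) or message.get("reply_count", 0) > 0


def is_top_post(message):
    """A shared file or a message with attachments, provided it has engagement."""
    shares_content = bool(message.get("file")) or bool(message.get("attachments"))
    return shares_content and has_engagement(message)


def is_notable_message(message):
    """Any message that has reactions or replies."""
    return has_engagement(message)
```

Keeping the rules as small predicates makes it easy to adjust a definition (for example, what counts as engagement) without touching the graph schema.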
Graph Design and Considerations
• Team node acts as root node
• Allows maintaining separate graphs for different organisations
Graph Design and Considerations
• Top posts, notable messages are both message nodes
• Differentiated using edge labels
• Edge traversals favoured over property lookup
Graph Design and Considerations
• Any user can comment on, react to or be mentioned in any message
• Reaction type modelled as edge property
• Efficient as use-case does not need filtering by reaction type
Graph Design and Considerations
• Same file shared across channels shares common pool of reactions
• Schema respects Slack-specific behaviour
• Handles idempotency based on unique ID maintained by Slack
Graph Design and Considerations
The Slack APIs
Endpoint: https://slack.com/api/conversations.history
{
  "type": "message",
  "user": "U2FQG2G9F",
  "text": "next time you want cereal: \n<https://www.instagram.com/p/BcDN4eWFjac/?taken-by=therock>",
  "attachments": [
    {
      "service_name": "Instagram",
      "title": "Instagram post by @therock • Nov 28, 2017 at 7:14pm UTC",
      "title_link": "https://www.instagram.com/p/BcDN4eWFjac/?taken-by=therock",
      "text": "346.3k Likes, 2,167 Comments - @therock on Instagram:”……”",
      "fallback": "Instagram: Instagram post by @therock • Nov 28, 2017 at 7:14pm UTC",
      "image_url": "https://scontent-iad3-1.cdninstagram.com/t51.2885-15/e35/24178_n.jpg",
      "from_url": "https://www.instagram.com/p/BcDN4eWFjac/?taken-by=therock",
      "image_width": 334,
      "image_height": 250,
      "image_bytes": 178559,
      "service_icon": "https://www.instagram.com/static/images/ico/appl.png/932e4d9af891.png",
      "id": 1
    }
  ],
  "thread_ts": "1511936426.000178",
  "reply_count": 3,
  "replies": [
    {
      "user": "U193XDML7",
      "ts": "1511953167.000138"
    },
    {
      "user": "U2FQG2G9F",
      "ts": "1511953180.000044"
    },
    {
      "user": "U193XDML7",
      "ts": "1511953192.000230"
    }
  ],
  "ts": "1511936426.000178",
  "reactions": [
    {
      "name": "smile",
      "users": [
        "U193XDML7"
      ],
      "count": 1
    },
    {
      "name": "obesecat",
      "users": [
        "U193XDML7"
      ],
      "count": 1
    }
  ]
}
[
  {
    "type": "message",
    "user": "U4BPQR94L",
    "text": "Yinghui Malmsteen <@U2FQG2G9F>\n<https://www.youtube.com/watch?v=D4OxW_0qqv8>",
    "attachments": [
      {
        ...
      }
    ],
    "ts": "1536057373.000100",
    "reactions": [
      {
        "name": "flag-se",
        "users": [
          "U58LYK8Q6"
        ],
        "count": 1
      }
    ]
  }
]
[
  {
    "user": "U2Q2U37SA",
    "inviter": "U0LPSJQR0",
    "text": "<@U2Q2U57SA> has joined the channel",
    "type": "message",
    "subtype": "channel_join",
    "ts": "1536138265.000200"
  }
]
The Slack APIs
Endpoint: https://slack.com/api/users.list
[
  {
    "id": "U4C0FDU2J",
    "team_id": "T028ZLMQN",
    "name": "friendlybotdev",
    "deleted": true,
    "profile": {
      "title": "",
      "phone": "",
      "skype": "",
      "real_name": "Friendly Bot",
      "real_name_normalized": "Friendly Bot",
      "display_name": "friendlybotdev",
      "display_name_normalized": "friendlybotdev",
      "status_text": "",
      "status_emoji": "",
      "status_expiration": 0,
      "avatar_hash": "123456",
      "bot_id": "B4B47T0G3",
      "api_app_id": "A4B92ZEER",
      "always_active": true,
      "image_original": "https://slack-edge.com/2017-06-21/123456_original.png",
      "first_name": "Friendly",
      "last_name": "Bot",
      "image_24": "https://slack-edge.com/2017-06-21/123456_24.png",
      "image_32": "https://slack-edge.com/2017-06-21/123456_32.png",
      "image_48": "https://slack-edge.com/2017-06-21/123456_48.png",
      "image_72": "https://slack-edge.com/2017-06-21/123456_72.png",
      "image_192": "https://slack-edge.com/2017-06-21/123456_192.png",
      "image_512": "https://slack-edge.com/2017-06-21/123456_512.png",
      "image_1024": "https://slack-edge.com/2017-06-21/123456_1024.png",
      "status_text_canonical": "",
      "team": "T028Z5MQN"
    },
    "is_bot": true,
    "is_app_user": false,
    "updated": 1517305013
  }
]
Endpoint: https://slack.com/api/channels.info
[
  {
    "id": "C8KMHCN5D",
    "name": "arandomchannel",
    "is_channel": true,
    "created": 1507613685,
    "creator": "U5BG5XU6T",
    "is_shared": false,
    "is_member": true,
    "is_private": false,
    "last_read": "1533892238.000324",
    "latest": {
      "type": "message",
      "user": "U84K3ZTF9",
      "text": "let's meetup tomorrow",
      "ts": "1536139470.000100"
    },
    "unread_count": 7,
    "unread_count_display": 7,
    "members": [
      "U08ED90CD",
      "U0LPSJQR0",
      "U193XDML7",
      "U9LKWV9C1",
      "UBJ4CHV5L"
    ],
    "topic": {
      "value": "place for people who are interested in sharing and learning",
      "creator": "U5BGLXU6T",
      "last_set": 1507613720
    },
    "purpose": {
      "value": "",
      "creator": "",
      "last_set": 0
    },
    "previous_names": []
  }
]
The Journey So Far
• Defining entities and modelling them into Graph
• Iterative, feedback-driven process
• Understanding the data available from the API
• Identifying unique IDs
• Filtering out required fields
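Filtering the payload down to the fields the graph needs, and pulling mentions out of the message text, could be sketched as follows. This assumes Slack's `<@USER_ID>` mention markup as seen in the sample payloads; `REQUIRED_FIELDS` is an assumed subset chosen for illustration, and both function names are hypothetical.

```python
import re

# Slack encodes a user mention in message text as <@USER_ID>, optionally <@USER_ID|name>.
MENTION_PATTERN = re.compile(r"<@([UW][A-Z0-9]+)(?:\|[^>]*)?>")

# Assumed subset of fields the graph needs; adjust per use case.
REQUIRED_FIELDS = ("ts", "user", "text", "thread_ts", "reactions", "replies", "attachments")


def extract_mentions(text):
    """Return the user IDs mentioned in a message text, in order, without repeats."""
    seen = []
    for user_id in MENTION_PATTERN.findall(text or ""):
        if user_id not in seen:
            seen.append(user_id)
    return seen


def filter_message(raw):
    """Keep only the required fields and attach the extracted mentions."""
    slim = {key: raw[key] for key in REQUIRED_FIELDS if key in raw}
    slim["mentions"] = extract_mentions(raw.get("text"))
    return slim
```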
Data Ingestion and Extraction
• Apache Flink cluster retrieves, parses and filters Slack data
• GraphQL service requests data for visualization
• Flask REST service ingests/queries data to/from TinkerPop
[Diagram labels: POST, PUT, GET; Gremlin-Python; Gremlin Bytecode]
Why TinkerPop?
• Abstraction that lets us avoid vendor lock-in
• Reduces rework when switching data stores
• Gremlin query language
• Hadoop and SparkGraphComputer
Making the Graph RESTful
• Defining REST Endpoints
• Defining the Resources
• Remote Traversals
Defining REST Endpoints
• Write endpoints for seeding
  • POST /teams/<team_uid>/channels
  • POST /teams/<team_uid>/channels/<channel_uid>/messages
• Read endpoints for queries
  • GET /teams/<team_uid>/top_posts
• Handling idempotency
  • Replace the default strategy with "ElementIdStrategy"
  • Enables creation of nodes with Slack-specific unique IDs
// scripts/empty-sample.groovy
globals << [g : graph.traversal(), sg: graph.traversal().withStrategies(ElementIdStrategy.build().create())]
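What idempotent seeding means for a write endpoint can be shown without a web framework or a graph. The sketch below is not the talk's Flask code: `ChannelStore` is a hypothetical in-memory stand-in for the graph, and the 201/200 status codes are an assumed convention.

```python
class ChannelStore:
    """In-memory stand-in for the graph, keyed by Slack's unique IDs."""

    def __init__(self):
        self.channels = {}

    def seed_channel(self, team_uid, channel):
        """Create the channel if it is new, otherwise update it in place.

        Returns an (HTTP status, body) pair: 201 on first creation, 200 on a repeat.
        """
        key = (team_uid, channel["uid"])
        status = 200 if key in self.channels else 201
        # Merging into the existing record keeps repeated requests from duplicating nodes.
        self.channels.setdefault(key, {}).update(channel)
        return status, self.channels[key]
```

Because the key is derived from Slack's own IDs, replaying the same POST leaves exactly one record, which is the property the ElementIdStrategy setup is meant to provide at the graph level.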
Making the Graph RESTful
• Setting up REST Endpoints
• Defining the Resources
• Remote Traversals
Defining the Resources
from marshmallow import Schema, fields, pre_load, pre_dump, post_load, validates_schema
from marshmallow.exceptions import ValidationError
...
class MessageSchema(Schema):
    """ Holds all the required fields for a message object."""
    ts = fields.Float(required=True)
    text = fields.Str()
    comment = fields.Str()
    subtype = fields.Str()
    bot_id = fields.Str(validate=is_bot_uid)
    user = fields.Str(validate=is_user_uid)
    thread_ts = fields.Str()
    file_share = fields.Nested(FileShareSchema, load_from="file")
    attachments = fields.Nested(AttachmentSchema, many=True)
    reactions = fields.Nested(ReactionSchema, many=True)
    comments = fields.Nested(CommentSchema, many=True, load_from="replies")
    mentions = fields.List(fields.Str(validate=is_user_uid))
class AttachmentSchema(Schema):
    """ Holds all the required fields for an Attachment object."""
class ReactionSchema(Schema):
    """ Holds all the required fields for a reaction object."""
class CommentSchema(Schema):
    """ Holds all the required fields for a comment object."""
...
• Organized code with single point of reference
Defining the Resources
from marshmallow import Schema, fields, pre_load, pre_dump, post_load, validates_schema
from marshmallow.exceptions import ValidationError
...
class MessageSchema(Schema):
    """ Holds all the required fields for a message object."""
    @validates_schema
    def validate_message(self, data):
        """ Validate if the message contains any of comments, mentions or reactions. """
        if not any([f(data) for f in (has_comments, has_mentions, has_reactions)]):
            raise ValidationError("The message must contain comments, mentions or reactions")
    ts = fields.Float(required=True)
    text = fields.Str()
    comment = fields.Str()
    subtype = fields.Str()
    bot_id = fields.Str(validate=is_bot_uid)
    user = fields.Str(validate=is_user_uid)
    thread_ts = fields.Str()
    file_share = fields.Nested(FileShareSchema, load_from="file")
    attachments = fields.Nested(AttachmentSchema, many=True)
    reactions = fields.Nested(ReactionSchema, many=True)
    comments = fields.Nested(CommentSchema, many=True, load_from="replies")
    mentions = fields.List(fields.Str(validate=is_user_uid))
class AttachmentSchema(Schema):
    """ Holds all the required fields for an Attachment object."""
class ReactionSchema(Schema):
    """ Holds all the required fields for a reaction object."""
class CommentSchema(Schema):
    """ Holds all the required fields for a comment object."""
...
• Organized code with single point of reference
• Validate data before ingestion
• Enforce types and required fields
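The slides do not show `has_comments`, `has_mentions` and `has_reactions`. A plausible reading, written as plain functions and raising `ValueError` in place of marshmallow's `ValidationError` so the sketch has no dependencies, is the following; the helper bodies are assumptions.

```python
def has_comments(data):
    """True when the message carries at least one comment (reply)."""
    return bool(data.get("comments"))


def has_mentions(data):
    """True when the message mentions at least one user."""
    return bool(data.get("mentions"))


def has_reactions(data):
    """True when the message received at least one reaction."""
    return bool(data.get("reactions"))


def validate_message(data):
    """Reject messages that carry no comments, mentions or reactions."""
    if not any(check(data) for check in (has_comments, has_mentions, has_reactions)):
        raise ValueError("The message must contain comments, mentions or reactions")
    return data
```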
Defining the Resources
from marshmallow import Schema, fields, pre_load, pre_dump, post_load, validates_schema
from marshmallow.exceptions import ValidationError
...
class MessageSchema(Schema):
    """ Holds all the required fields for a message object."""
class AttachmentSchema(Schema):
    """ Holds all the required fields for an Attachment object."""
    title = fields.Str()
    fallback = fields.Str()
    text = fields.Str()
    thumb_url = fields.Str()
    image_url = fields.Str()
    title_link = fields.Str()
    @post_load
    def reshape_attachment(self, data):
        """ Apply required transformations on the Attachment object. """
        # Create a post_title field
        collapse_keys(data, "post_title", *("fallback", "title", "text"))
        # Create a post_thumbnail field
        collapse_keys(data, "post_thumbnail", *("thumb_url", "image_url", "title_link"))
        # Set post_type to URL
        data["post_type"] = "URL"
        # Return the reshaped data so the hook works across marshmallow versions
        return data
class ReactionSchema(Schema):
    """ Holds all the required fields for a reaction object."""
class CommentSchema(Schema):
    """ Holds all the required fields for a comment object."""
class FileShareSchema(Schema):
    """ Holds all the required fields for a File Share object."""
class UserSchema(Schema):
    """ Holds all the required fields for a User object."""
...
• Organized code with single point of reference
• Validate data before ingestion
• Enforce types and required fields
• Normalize fields with post-processing
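The `collapse_keys` helper is called but never shown. One guess at its behaviour, assuming it keeps the first non-empty value in the order given and removes the source keys, is sketched here; the real helper may differ.

```python
def collapse_keys(data, target, *candidates):
    """Store the first non-empty candidate value under `target` and drop the candidates."""
    values = [data.pop(key, None) for key in candidates]
    chosen = next((value for value in values if value), None)
    if chosen is not None:
        data[target] = chosen
    return data
```

With this reading, an attachment that has both a title and a text ends up with a single `post_title`, taken from whichever candidate comes first in the argument list.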
Making the Graph RESTful
• Schema enforcement and validation
• Handling Idempotency of endpoints
• Custom Traversal Source
Remote Traversals
• Bytecode sent over network instead of string
• Allows using custom traversal source for a Domain Specific Language (DSL)
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.structure.graph import Graph
...
conn = DriverRemoteConnection(GREMLIN_SERVER_HOST, 'sg')
slack = Graph().traversal(SlackTraversalSource).withRemote(conn)
Building a DSL
• Motivations
• Custom Workflows
Building a DSL - Motivations
• Custom traversal source can also specify useful shorthands
• E.g. Traversing to all the Channel nodes
class SlackTraversalSource(BaseTraversalSource):
    """ Module to initialise a Graph with the methods listed under SlackTraversal. """
    def __init__(self, *args, **kwargs):
        super(SlackTraversalSource, self).__init__(*args, **kwargs)
        self.graph_traversal = SlackTraversal
    def channels(self, *channel_ids):
        """ Shorthand to identify all channel nodes"""
        traversal = self.get_graph_traversal()
        traversal.bytecode.add_step("V")
        traversal.bytecode.add_step("hasLabel", NODES.channel)
        if channel_ids:
            traversal.bytecode.add_step("has", "__id", P.within(channel_ids))
        return traversal
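To make the bytecode idea concrete without a Gremlin Server, the sketch below records steps as data instead of building a query string. `Bytecode`, `MiniTraversal` and `MiniTraversalSource` are simplified stand-ins written for this illustration; they are not gremlin_python classes, and the plain list of IDs stands in for `P.within`.

```python
class Bytecode:
    """Ordered list of (step name, arguments) pairs; no query string is ever built."""

    def __init__(self):
        self.steps = []

    def add_step(self, name, *args):
        self.steps.append((name, args))


class MiniTraversal:
    """Holds the bytecode accumulated so far."""

    def __init__(self):
        self.bytecode = Bytecode()


class MiniTraversalSource:
    """Offers a domain shorthand that expands into base steps."""

    def channels(self, *channel_ids):
        traversal = MiniTraversal()
        traversal.bytecode.add_step("V")
        traversal.bytecode.add_step("hasLabel", "channel")
        if channel_ids:
            # Stand-in for P.within(channel_ids)
            traversal.bytecode.add_step("has", "__id", list(channel_ids))
        return traversal
```

The recorded step list is what would be serialised and sent to the server, which is why a DSL method only needs to append steps rather than format strings.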
Building a DSL - Motivations
• Custom traversal source specifies business logic behind traversals
• E.g. Connecting a User node to a Channel node
class SlackTraversal(BaseTraversal):
    def addPartOfChannelEdges(self, channel_uid, *user_uids, **kwargs):
        """ Add an edge to a channel from the users who were/are a part of the channel. """
        for user_uid in user_uids:
            edge_uid = construct_uid(user_uid, channel_uid, EDGES.part_of.name, delim="|")
            (self.getOrAddEdgeFrom(edge_label=EDGES.part_of, edge_uid=edge_uid,
                                   node_label=NODES.user, node_uid=user_uid)
                 .upsertProperties(kwargs.get("properties")).inV())
        return self
Building a DSL - Motivations
• BaseTraversal handles creation of nodes and edges
• These methods should guarantee idempotency
• E.g. creation of an edge between two nodes first checks for an existing edge
• The edge is created only if it doesn’t already exist
from gremlin_python.process.graph_traversal import GraphTraversal
from gremlin_python.process.graph_traversal import GraphTraversalSource, __
from gremlin_python.process.traversal import T
class BaseTraversal(GraphTraversal):
    def getOrAddEdgeFrom(self, edge_label, edge_uid, node_label, node_uid):
        """
        Adds an edge from the node with the given label and uid only if the edge doesn’t exist.
        """
        return self.coalesce(
            # First branch: return the edge if it already exists
            __.inE(edge_label).hasId(edge_uid).and_(
                __.outV().hasId(node_uid), __.outV().hasLabel(node_label)),
            # Second branch: otherwise create it with the given ID
            __.addE(edge_label).property(T.id, edge_uid).from_(
                __.V().getNode(node_label, node_uid)))
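The get-or-add pattern relies on a deterministic edge ID, so a repeated request finds the existing edge instead of creating a second one. A minimal in-memory analogue follows; `construct_uid` here is a hypothetical implementation of the helper named in the slides, and `EdgeStore` is an assumption standing in for the graph.

```python
def construct_uid(*parts, delim="|"):
    """Join the parts into a deterministic edge ID (hypothetical implementation)."""
    return delim.join(str(part) for part in parts)


class EdgeStore:
    """Dictionary-backed stand-in for the graph's edges."""

    def __init__(self):
        self.edges = {}

    def get_or_add_edge(self, label, from_uid, to_uid):
        """Return the existing edge with this ID, creating it only when absent."""
        edge_uid = construct_uid(from_uid, to_uid, label)
        if edge_uid not in self.edges:
            self.edges[edge_uid] = {"label": label, "from": from_uid, "to": to_uid}
        return edge_uid, self.edges[edge_uid]
```

The membership check plays the role of the first `coalesce` branch and the assignment plays the role of the second.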
Building a DSL – Custom Workflows
• Standardized steps for generating a visualization are defined in the BaseTraversal
• Custom maps define traversal paths for fields that vary across visualizations
def build_visualization(self, traversal_source, **kwargs):
    """ The below are standardized steps that are required to generate data for any visualization."""
    return (self.start(traversal_source)
            .filterByDate(self.date_dimension,
                          kwargs.get("start_time"),
                          kwargs.get("end_time"))
            .filterByFields(self.filters_map,
                            kwargs.get("filters"))
            .sortByFields(self.sorting_map,
                          kwargs.get("sort_field"),
                          kwargs.get("sort_direction"))
            .buildObject(self.object_map).toList())
Building a DSL – Custom Workflows
• The DSL takes in functions/paths that map fields to their traversals
• Maps customized based on the visualization that is needed
# Sample filter from frontend
filter_obj = {'_and': [{"field": 'reactions', '_gte': 100},
                       {"field": 'post_creator',
                        '_in': ['bob', 'chloe']
                        }]}
filter_map = {"post_creator": lambda pred:
                  __.in_(EDGES.created_post).has(USER.display_name, pred),
              "reactions": lambda pred:
                  __.inE(EDGES.reacted_to).count().is_(pred)
              }
object_map = {
    "post_creator": {"uid": [__.in_(EDGES.created_post).values("__id"),
                             __.constant("")],
                     "image": ...,  # define similar path here
                     },
    "reactions":
        __.inE(EDGES.reacted_to).groupCount().by(__.values(REACTION.name))
}
start = lambda traversal_source: traversal_source.posts()
# Inject maps into DSL methods
(start(slack)
    .filterByFields(self.filters_map, kwargs.get("filters"))
    .buildObject(self.object_map)
    .toList())
# DSL generates the required lower level base traversals
(slack.posts().where(
    __.and_(
        __.inE(EDGES.reacted_to).count().is_(P.gte(100)),
        __.in_(EDGES.created_post).has(USER.display_name,
                                       P.within(['bob', 'chloe']))))
    .project("post_creator", "reactions").by(
        __.project("image", "display_name", "uid").by(
            __.in_(EDGES.created_post).values(USER.image),
            __.in_(EDGES.created_post).values(USER.display_name),
            __.in_(EDGES.created_post).values("__id"))).by(
        __.inE(EDGES.reacted_to).groupCount()).toList())
Building a DSL – Custom Workflows
{
  "reactions": {
    "palm_tree": 82,
    "robot_face": 18
  },
  "post_creator": {
    "image": "https://url_of_image.jpg",
    "display_name": "chloe",
    "uid": "U024ZH7HL"
  }
}
• The traversals generated churn out the final response objects
• Objects rendered into visualizations by the client
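The translation from a frontend filter object to per-field predicates can be tried out on plain dictionaries. This sketch mirrors the `_and`, `_gte` and `_in` keys from the sample filter; `_lte` and `_eq` are assumed additions, and `compile_filter` is a hypothetical name. It evaluates in Python rather than generating Gremlin steps.

```python
import operator

# Comparison operators accepted in filter objects; "_in" tests membership.
OPERATORS = {
    "_gte": operator.ge,
    "_lte": operator.le,
    "_eq": operator.eq,
    "_in": lambda value, options: value in options,
}


def compile_filter(filter_obj, field_map):
    """Turn a filter object into a predicate over a post.

    field_map maps a field name to a function that extracts its value from a post.
    """
    if "_and" in filter_obj:
        parts = [compile_filter(part, field_map) for part in filter_obj["_and"]]
        return lambda post: all(part(post) for part in parts)
    extract = field_map[filter_obj["field"]]
    checks = [(OPERATORS[key], expected)
              for key, expected in filter_obj.items() if key != "field"]
    return lambda post: all(op(extract(post), expected) for op, expected in checks)
```

Swapping the `field_map` changes how each field is resolved without touching the filter logic, which is the same separation the custom maps provide in the DSL.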
Testing the Application
• Unit Tests
• Validating traversals on Gremlin Server
Testing Our Application – Unit Testing
[Diagram: test workflow with the steps Start Gremlin Server, Use Fixtures, Write a failing test, Write code to make the test pass, Check if test passes]
class TestNodeMethods(object):
    """ Test methods that help in retrieval and creation of Nodes. """
    def test_node_retrieval(self, graph):
        """ Test if getNode retrieves an existing node. """
        assert (graph.V().getNode(label="person", uid=100)
                .count().next() == 1)
        assert (graph.V().getNode(label="person", uid=101)
                .count().next() == 1)
Testing Our Application – Unit Testing
def getNode(self, label, uid):
    """
    Returns the node with the given label and uid.
    Args:
        label (string): The label of the node to return
        uid (string): Unique ID of the node
    Raises:
        StopIteration: Node with the given label and uid does not exist
    """
    return self.and_(__.hasLabel(label), __.has(T.id, uid))
Testing Our Application – Unit Testing
$ bin/gremlin-server.sh conf/gremlin-server-neo4j-python.yaml
class TestBasicTraversal(object):
    """
    Tests for methods that help create edges or nodes
    and methods that help populate the properties of these objects.
    """
    @pytest.fixture(scope="module")
    def graph(self):
        """ Graph with two nodes and one edge connecting them. """
        graph = (Graph().traversal(CerebroTraversalSource)
                 .withRemote(
                     DriverRemoteConnection(GREMLIN_SERVER_HOST,
                                            GREMLIN_SERVER_TRAVERSER)))
        graph.V().clear()
        from_node = graph.addV("person").property(T.id, 100).next()
        to_node = graph.addV("person").property(T.id, 101).next()
        (graph.addE("knows").from_(from_node).to(to_node)
              .property("__id", "1")
              .next())
        yield graph
        graph.V().clear()
Testing Our Application – Unit Testing
• Fixture used to test if the MessageSchema class is implemented correctly
class MessageSchema(Schema):
    """ Holds all the required fields for a message object."""
    ...
[
  {
    "reactions": [
      {
        "name": "joy",
        "users": [
          "U5K7JUATE"
        ]
      }
    ],
    "attachments": [
      {
        ...
      }
    ],
    "text": "<https://www.youtube.com/watch?v=4iEh1ykb13w>",
    "ts": "1465895473.000050",
    "user": "U37BF9457",
    "type": "message"
  }
]
Testing Our Application – Unit Testing
• MessageSchema needs to include mentions
• Update the fixture to be able to test that the schema includes mentions
• Need to validate if traversals pick up mentions
class MessageSchema(Schema):
    """ Holds all the required fields for a message object."""
    mentions = fields.List(fields.Str(validate=is_user_uid))
[
  {
    "reactions": [
      {
        "name": "joy",
        "users": [
          "U5K7JUATE"
        ]
      }
    ],
    "attachments": [
      {...}
    ],
    "text": "<@U123456> <https://www.youtube.com/watch?v=4iEh1ykb13w>",
    "mentions": [
      "U123456"
    ],
    "ts": ...,
    "type": "message"
  }
]
Testing Our Application – Unit Testing
• Update JSON & Generate GraphSON
gremlin> graph.io(graphson()).writeGraph("graph_name.json")
Testing Our Application – Unit Testing
@pytest.fixture(scope="module")
def slack_graph():
    """ Open a subgraph on localhost for testing. """
    slack.V().clear()
    slack_client = Client(GREMLIN_SERVER_HOST, SLACK_TRAVERSER)
    path_to_fixture = str(Path.cwd().joinpath(
        "tests/fixtures/slack_graph.json"))
    graphson_statement = 'graph.io(graphson()).readGraph("{}")'.format(
        path_to_fixture)
    slack_client.submit(graphson_statement).all().result()
    yield slack
    slack.V().clear()
Testing the Application – CI/CD
• Automated tests using CircleCI
• Custom Configuration for Gremlin Server
• Caching Dependencies for Faster Tests
steps: # CircleCI 2.0
  ...
  - run:
      command: |
        if [ ! -d ./apache-tinkerpop-gremlin-server-3.3.3 ]; then
          curl -O https://archive.apache.org/dist/tinkerpop/3.3.3/apache-tinkerpop-gremlin-server-3.3.3-bin.zip
          unzip -q apache-tinkerpop-gremlin-server-3.3.3-bin.zip
          # Install gremlin-python
          cd ./apache-tinkerpop-gremlin-server-3.3.3 && \
            ./bin/gremlin-server.sh install org.apache.tinkerpop gremlin-python 3.3.3
          # Change max content length and traversal strategy
          sed -i -- 's/.*maxContentLength:.*/maxContentLength: 2621440/g' conf/gremlin-server.yaml
          sed -i -- 's/graph.traversal()]/graph.traversal(),sg: graph.traversal().withStrategies(ElementIdStrategy.build().create())]/g' \
            ./scripts/empty-sample.groovy
        fi
  ...
Testing the Application – CI/CD
steps: # CircleCI 2.0
  - checkout
  - restore_cache:
      keys:
        - v1-dependencies-{{ .Branch }}
        - v1-dependencies-master
  - run:
      # Download and install Gremlin server
      ...
  # Cache the installation
  - save_cache:
      key: v1-dependencies-{{ .Branch }}
      paths:
        - ~/src/app_name/apache-tinkerpop-gremlin-server-3.3.3
  # Test
  - run:
      # Starting Gremlin Server
      command: |
        cd ./apache-tinkerpop-gremlin-server-3.3.3 && ./bin/gremlin-server.sh \
          ./conf/gremlin-server.yaml
      background: true
  # Sleep to give the gremlin server enough time to start
  - run: sleep 10
  - run: pycodestyle app_name
  - run: coverage run --source=app_name -m pytest tests --capture=no --strict
  - run: coverage report -m --fail-under=95
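A fixed `sleep 10` can be too short on a slow build machine and wasteful on a fast one. One alternative, offered here as a sketch and not as part of the talk's pipeline, is to poll the server's port until it accepts connections. The helper name `wait_for_port` and its defaults are assumptions; 8182 is the port used elsewhere in these slides.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll until a TCP connection to host:port succeeds or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

In a CI step this could be called as `wait_for_port("localhost", 8182)` before running the tests, failing the build if it returns `False`.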
Scaling Our Graph
• Async Traversals
• HA Cluster and Load Balancing
Async Traversals
• Seed individual entities using “next”
• Each call to “next” is blocking
def seed_channels(data, team_uid):
    for channel_data in data:
        channel_uid, creator, members = (channel_data.pop(key) for
                                         key in ["uid", "creator", "members"])
        slack.V().addChannel(channel_uid, properties=channel_data).next()
        slack.teams(team_uid).addTeamHasChannelEdge(team_uid, channel_uid).next()
        slack.users(creator).addCreatedChannelEdge(creator, channel_uid).next()
        slack.channels(channel_uid).addPartOfChannelEdges(channel_uid, *members).next()
• Seed subgraph using “next”
• Reduce number of blocking calls to one per channel
def seed_channels(data, team_uid):
    for channel_data in data:
        channel_uid, creator, members = (channel_data.pop(key) for
                                         key in ["uid", "creator", "members"])
        (slack.V().addChannel(channel_uid, properties=channel_data)
              .addTeamHasChannelEdge(team_uid, channel_uid).inV()
              .addCreatedChannelEdge(creator, channel_uid).inV()
              .addPartOfChannelEdges(channel_uid, *members).next())
• Seed subgraph using “promise”
• Make seeding asynchronous, no blocking calls
• Verify that the returned futures were successful
def seed_channels(data, team_uid):
    for channel_data in data:
        channel_uid, creator, members = (channel_data.pop(key) for
                                         key in ["uid", "creator", "members"])
        (slack.V().addChannel(channel_uid, properties=channel_data)
              .addTeamHasChannelEdge(team_uid, channel_uid).inV()
              .addCreatedChannelEdge(creator, channel_uid).inV()
              .addPartOfChannelEdges(channel_uid, *members).promise())
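Verifying that the returned futures succeeded is the step most easily forgotten once the blocking calls are gone. The sketch below shows the pattern with the standard library: a thread pool stands in for the `promise()` call, and `seed_all` and `seed_one` are hypothetical names, not part of the talk's code.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def seed_all(channels, seed_one, max_workers=4):
    """Submit one seeding task per channel and collect failures instead of blocking on each."""
    failures = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the channel it seeds so failures can be reported by ID.
        futures = {pool.submit(seed_one, channel): channel["uid"] for channel in channels}
        for future in as_completed(futures):
            error = future.exception()
            if error is not None:
                failures[futures[future]] = error
    return failures
```

An empty result means every channel was seeded; otherwise the dictionary names the channels to retry.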
HA Cluster and Load Balancing
• Preparing for high availability with Neo4j and Gremlin
  • Configuring Gremlin Server and Neo4j
  • Understanding the Neo4j HA architecture
• Advantages
  • Data replication
  • Spread writes across instances
  • Handle greater read loads
• HA cluster is fronted by a load balancer like HAProxy
• Reference:
  • https://neo4j.com/docs/operations-manual/current/ha-cluster/architecture/
  • http://tinkerpop.apache.org/docs/3.3.3/reference/#_high_availability_configuration
HA Cluster and Load Balancing
• Tuning parameters for the cluster
  • Frequency of pulling updates from other members of the cluster
    • gremlin.neo4j.conf.ha.pull_interval
  • Number of slaves a transaction should be committed to
    • gremlin.neo4j.conf.ha.tx_push_factor
• Tuning parameters for the Load Balancer
  • Routing requests across the cluster
    • balance
  • Checking if the members in the cluster are responsive
    • option httpchk
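A load balancer's HTTP health check amounts to sending a cheap request and looking at the status code. The sketch below does the same from Python, assuming the cluster member exposes Gremlin Server's HTTP endpoint and accepts a `gremlin` query parameter; `is_healthy` is a hypothetical helper and only the HTTP status is inspected, not the response body.

```python
import urllib.request

# Ignore any proxy settings: the probe should reach the cluster member directly.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_healthy(base_url, timeout=2.0):
    """Return True when the member answers a trivial Gremlin query with HTTP 200."""
    probe = base_url.rstrip("/") + "/?gremlin=100-1"
    try:
        with _OPENER.open(probe, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts and HTTP error statuses.
        return False
```

A call such as `is_healthy("http://localhost:8182")` can be used in a smoke test to confirm each member is reachable before putting it behind the balancer.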
// gremlin-server-neo4j-ha-{1..3}.yaml
channelizer: org.apache.tinkerpop.gremlin.server.channel.WsAndHttpChannelizer
> curl "http://localhost:8182?gremlin=100-1"
Thank You
Graph Day SF 2018

More Related Content

PPTX
Elasticsearch
PDF
Streamlining JSON mapping
PDF
Elasticsearch first-steps
PDF
CouchDB at New York PHP
PDF
Entity Relationships in a Document Database at CouchConf Boston
KEY
JSON-LD: JSON for Linked Data
PPTX
ElasticSearch - Introduction to Aggregations
PDF
Building Highly Flexible, High Performance Query Engines
Elasticsearch
Streamlining JSON mapping
Elasticsearch first-steps
CouchDB at New York PHP
Entity Relationships in a Document Database at CouchConf Boston
JSON-LD: JSON for Linked Data
ElasticSearch - Introduction to Aggregations
Building Highly Flexible, High Performance Query Engines

What's hot (6)

PDF
Elasticsearch: You know, for search! and more!
PPTX
Introduction to Elasticsearch with basics of Lucene
PDF
Data Exploration with Elasticsearch
PPTX
Elasticsearch
PPTX
Query log analytics - using logstash, elasticsearch and kibana 28.11.2013
PDF
tutorial2-notes2
Elasticsearch: You know, for search! and more!
Introduction to Elasticsearch with basics of Lucene
Data Exploration with Elasticsearch
Elasticsearch
Query log analytics - using logstash, elasticsearch and kibana 28.11.2013
tutorial2-notes2
Ad

Similar to Extensible RESTful Applications with Apache TinkerPop (20)

PDF
Python RESTful webservices with Python: Flask and Django solutions
PDF
PDF
Max euro python 2015
PDF
Django at Scale
PDF
The Art of Social Media Analysis with Twitter & Python-OSCON 2012
PDF
The Art of Social Media Analysis with Twitter & Python
PDF
Writing a REST Interconnection Library in Swift
PDF
Intuitive APIs and Developer Education
PDF
API Design & Security in django
PDF
The Serverless GraphQL Backend Architecture
PDF
SFScon 2020 - Nikola Milisavljevic - BASE - Python REST API framework
PDF
GraphQL Subscriptions
PDF
Brubeck
PPTX
Build restful ap is with python and flask
KEY
Messaging, interoperability and log aggregation - a new framework
PDF
Diving into GraphQL, React & Apollo
PDF
Wieldy remote apis with Kekkonen - ClojureD 2016
PPTX
Scaling GraphQL Subscriptions
PPTX
REST with Eve and Python
PDF
Dexterity in the Wild
Python RESTful webservices with Python: Flask and Django solutions
Max euro python 2015
Django at Scale
The Art of Social Media Analysis with Twitter & Python-OSCON 2012
The Art of Social Media Analysis with Twitter & Python
Writing a REST Interconnection Library in Swift
Intuitive APIs and Developer Education
API Design & Security in django
The Serverless GraphQL Backend Architecture
SFScon 2020 - Nikola Milisavljevic - BASE - Python REST API framework
GraphQL Subscriptions
Brubeck
Build restful ap is with python and flask
Messaging, interoperability and log aggregation - a new framework
Diving into GraphQL, React & Apollo
Wieldy remote apis with Kekkonen - ClojureD 2016
Scaling GraphQL Subscriptions
REST with Eve and Python
Dexterity in the Wild
Ad

Recently uploaded (20)

PPTX
Introduction to machine learning and Linear Models
PDF
BF and FI - Blockchain, fintech and Financial Innovation Lesson 2.pdf
PPTX
Microsoft-Fabric-Unifying-Analytics-for-the-Modern-Enterprise Solution.pptx
PPTX
DISORDERS OF THE LIVER, GALLBLADDER AND PANCREASE (1).pptx
PPT
Miokarditis (Inflamasi pada Otot Jantung)
PPTX
Introduction to Basics of Ethical Hacking and Penetration Testing -Unit No. 1...
PDF
Fluorescence-microscope_Botany_detailed content
PPTX
Introduction to Knowledge Engineering Part 1
PPTX
IBA_Chapter_11_Slides_Final_Accessible.pptx
PPTX
MODULE 8 - DISASTER risk PREPAREDNESS.pptx
PPTX
climate analysis of Dhaka ,Banglades.pptx
PPTX
Market Analysis -202507- Wind-Solar+Hybrid+Street+Lights+for+the+North+Amer...
PPTX
Acceptance and paychological effects of mandatory extra coach I classes.pptx
PPTX
IB Computer Science - Internal Assessment.pptx
PPT
Quality review (1)_presentation of this 21
PPTX
STUDY DESIGN details- Lt Col Maksud (21).pptx
PPTX
Supervised vs unsupervised machine learning algorithms
PDF
Business Analytics and business intelligence.pdf
PPTX
STERILIZATION AND DISINFECTION-1.ppthhhbx
PPTX
Business Ppt On Nestle.pptx huunnnhhgfvu
Introduction to machine learning and Linear Models
BF and FI - Blockchain, fintech and Financial Innovation Lesson 2.pdf
Microsoft-Fabric-Unifying-Analytics-for-the-Modern-Enterprise Solution.pptx
DISORDERS OF THE LIVER, GALLBLADDER AND PANCREASE (1).pptx
Miokarditis (Inflamasi pada Otot Jantung)
Introduction to Basics of Ethical Hacking and Penetration Testing -Unit No. 1...
Fluorescence-microscope_Botany_detailed content
Introduction to Knowledge Engineering Part 1
IBA_Chapter_11_Slides_Final_Accessible.pptx
MODULE 8 - DISASTER risk PREPAREDNESS.pptx
climate analysis of Dhaka ,Banglades.pptx
Market Analysis -202507- Wind-Solar+Hybrid+Street+Lights+for+the+North+Amer...
Acceptance and paychological effects of mandatory extra coach I classes.pptx
IB Computer Science - Internal Assessment.pptx
Quality review (1)_presentation of this 21
STUDY DESIGN details- Lt Col Maksud (21).pptx
Supervised vs unsupervised machine learning algorithms
Business Analytics and business intelligence.pdf
STERILIZATION AND DISINFECTION-1.ppthhhbx
Business Ppt On Nestle.pptx huunnnhhgfvu

Extensible RESTful Applications with Apache TinkerPop

  • 1. Extensible RESTful Applications with Apache Tinkerpop Graph Day SF 2018
  • 2. About Us LIKES { "first_name": "Varun", "last_name": "Ganesh", } { "first_name": "Harshvardhan", "last_name": "Joshi", } LIKES
  • 3. CONNECTING TO BUSINESS STACKS VISUALISATION CUSTOM BUILT INFOGRAPHICS NATURAL LANGUAGE GENERATED INSIGHTS EXPORT & SHARE STORIES EMAIL POWERPOINT, TV WEB Embedded SDK About CLIENTS • Automating the process of data storytelling • For more information, visit www.nugit.co
  • 4. Agenda • Use Cases • The Slack APIs • Defining the Entities • Graph Design and Considerations • Making the Graph RESTful • Building a DSL • Testing the Application • Scaling the Graph
  • 5. Use Cases - Communities • View contribution to communication • Participation across channels • Identify collaborative groups • Users connected by mentions and reactions • Identify influential users per channel
  • 6. Use Cases – Top Posts • Highlight engaging conversations • Top videos, GIFs, links • Get insights across channels
  • 7. Defining the Entities Top Post: • Files shared • Messages with attachments • Posts without replies or reactions are not considered
  • 8. Defining the Entities Notable Message: • Messages with reactions or replies • Replies and Comments that have reactions • Other alerts that gather reactions
  • 9. Defining the Entities Mention: • Replies and Comments can have mentions too • Ignore mentions that are unnecessary or already captured in a relationship
  • 10. Defining the Entities • Narrows down data required for the use case • Helps “whiteboarding” process for graph design • Allows defining schema for payloads • Requires understanding the nuances of the platform
  • 11. Graph Design and Considerations • Team node acts as root node • Allows maintaining separate graphs for different organisations
  • 12. Graph Design and Considerations • Top posts, notable messages are both message nodes • Differentiated using edge labels • Edge traversals favoured over property lookup
  • 13. Graph Design and Considerations • Any user can comment on, react to or be mentioned in any message • Reaction type modelled as edge property • Efficient as use-case does not need filtering by reaction type
  • 14. Graph Design and Considerations • Same file shared across channels shares common pool of reactions • Schema respects Slack specific behaviour • Handles idempotency based on unique ID maintained by Slack
  • 15. Graph Design and Considerations
  • 16. The Slack APIs Endpoint: https://slack.com/api/conversations.history { "type": "message", "user": "U2FQG2G9F", "text": "next time you want cereal: \n<https://www.instagram.com/p/BcDN4eWFjac/?taken-by=therock>", "attachments": [ { "service_name": "Instagram", "title": "Instagram post by @therock • Nov 28, 2017 at 7:14pm UTC", "title_link": "https://www.instagram.com/p/BcDN4eWFjac/?taken-by=therock", "text": "346.3k Likes, 2,167 Comments - @therock on Instagram: “……”", "fallback": "Instagram: Instagram post by @therock • Nov 28, 2017 at 7:14pm UTC", "image_url": "https://scontent-iad3-1.cdninstagram.com/t51.2885-15/e35/24178_n.jpg", "from_url": "https://www.instagram.com/p/BcDN4eWFjac/?taken-by=therock", "image_width": 334, "image_height": 250, "image_bytes": 178559, "service_icon": "https://www.instagram.com/static/images/ico/appl.png/932e4d9af891.png", "id": 1 } ], "thread_ts": "1511936426.000178", "reply_count": 3, "replies": [ { "user": "U193XDML7", "ts": "1511953167.000138" }, { "user": "U2FQG2G9F", "ts": "1511953180.000044" }, { "user": "U193XDML7", "ts": "1511953192.000230" } ], "ts": "1511936426.000178", "reactions": [ { "name": "smile", "users": [ "U193XDML7" ], "count": 1 }, { "name": "obesecat", "users": [ "U193XDML7" ], "count": 1 } ] }
  • 17. The Slack APIs Endpoint: https://slack.com/api/conversations.history [ { "type": "message", "user": "U4BPQR94L", "text": "Yinghui Malmsteen <@U2FQG2G9F>\n<https://www.youtube.com/watch?v=D4OxW_0qqv8>", "attachments": [ { ... } ], "ts": "1536057373.000100", "reactions": [ { "name": "flag-se", "users": [ "U58LYK8Q6" ], "count": 1 } ] } ] [ { "user": "U2Q2U37SA", "inviter": "U0LPSJQR0", "text": "<@U2Q2U57SA> has joined the channel", "type": "message", "subtype": "channel_join", "ts": "1536138265.000200" } ]
  • 18. The Slack APIs Endpoint: https://slack.com/api/users.list [ { "id": "U4C0FDU2J", "team_id": "T028ZLMQN", "name": "friendlybotdev", "deleted": true, "profile": { "title": "", "phone": "", "skype": "", "real_name": "Friendly Bot", "real_name_normalized": "Friendly Bot", "display_name": "friendlybotdev", "display_name_normalized": "friendlybotdev", "status_text": "", "status_emoji": "", "status_expiration": 0, "avatar_hash": "123456", "bot_id": "B4B47T0G3", "api_app_id": "A4B92ZEER", "always_active": true, "image_original": "https://slack-edge.com/2017-06-21/123456_original.png", "first_name": "Friendly", "last_name": "Bot", "image_24": "https://slack-edge.com/2017-06-21/123456_24.png", "image_32": "https://slack-edge.com/2017-06-21/123456_32.png", "image_48": "https://slack-edge.com/2017-06-21/123456_48.png", "image_72": "https://slack-edge.com/2017-06-21/123456_72.png", "image_192": "https://slack-edge.com/2017-06-21/123456_192.png", "image_512": "https://slack-edge.com/2017-06-21/123456_512.png", "image_1024": "https://slack-edge.com/2017-06-21/123456_1024.png", "status_text_canonical": "", "team": "T028Z5MQN" }, "is_bot": true, "is_app_user": false, "updated": 1517305013 } ] Endpoint: https://slack.com/api/channels.info [ { "id": "C8KMHCN5D", "name": "arandomchannel", "is_channel": true, "created": 1507613685, "creator": "U5BG5XU6T", "is_shared": false, "is_member": true, "is_private": false, "last_read": "1533892238.000324", "latest": { "type": "message", "user": "U84K3ZTF9", "text": "let's meetup tomorrow", "ts": "1536139470.000100" }, "unread_count": 7, "unread_count_display": 7, "members": [ "U08ED90CD", "U0LPSJQR0", "U193XDML7", "U9LKWV9C1", "UBJ4CHV5L" ], "topic": { "value": "place for people who are interested in sharing and learning", "creator": "U5BGLXU6T", "last_set": 1507613720 }, "purpose": { "value": "", "creator": "", "last_set": 0 }, "previous_names": [] } ]
  • 19. The Journey So Far • Defining entities and modelling them into Graph • Iterative feedback-driven process • Understanding the data available from the API • Identifying unique IDs • Filtering out required fields
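The "filtering out required fields" and "identifying unique IDs" steps can be sketched in a few lines. This is a minimal illustration rather than the speakers' Flink job: the `REQUIRED_FIELDS` whitelist and the `filter_message` helper are assumptions made for the example, and the only Slack behaviour relied on is the `<@USERID>` mention syntax visible in the payloads above.

```python
import re

# Slack encodes user mentions in message text as <@USERID>.
MENTION_PATTERN = re.compile(r"<@(U[A-Z0-9]+)>")

# Hypothetical whitelist; the talk does not list the exact set of fields kept.
REQUIRED_FIELDS = ("ts", "user", "text", "reactions", "replies", "attachments")


def filter_message(raw):
    """Keep only the whitelisted fields and derive the list of mentioned user IDs."""
    message = {key: raw[key] for key in REQUIRED_FIELDS if key in raw}
    message["mentions"] = MENTION_PATTERN.findall(raw.get("text", ""))
    return message


raw = {
    "type": "message",
    "user": "U4BPQR94L",
    "text": "Yinghui Malmsteen <@U2FQG2G9F>\n<https://www.youtube.com/watch?v=D4OxW_0qqv8>",
    "ts": "1536057373.000100",
    "reactions": [{"name": "flag-se", "users": ["U58LYK8Q6"], "count": 1}],
}
filtered = filter_message(raw)
```

Here `filtered` drops the `type` key and gains a `mentions` list, while the Slack-assigned `ts` and `user` values remain available as stable identifiers.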
  • 20. Data Ingestion and Extraction • Apache Flink cluster retrieves, parses and filters Slack data • GraphQL service requests data for visualization • Flask REST service ingests/queries data to/from TinkerPop • Diagram labels: POST, PUT, GET; Gremlin-Python; Gremlin Bytecode
  • 21. Why TinkerPop? • Abstraction that lets us avoid vendor lock-in • Reduces rework when switching data stores • Gremlin query language • Hadoop and SparkGraphComputer
  • 22. Making the Graph RESTful • Defining REST Endpoints • Defining the Resources • Remote Traversals
  • 23. Defining REST Endpoints • Write endpoints for seeding • POST /teams/<team_uid>/channels • POST /teams/<team_uid>/channels/<channel_uid>/messages • Handling idempotency • Replace default strategy with "ElementIdStrategy" • Enables creation of nodes with Slack-specific unique IDs • Read endpoints for queries • GET /teams/<team_uid>/top_posts
      // scripts/empty-sample.groovy
      globals << [g : graph.traversal(), sg: graph.traversal().withStrategies(ElementIdStrategy.build().create())]
  • 24. Making the Graph RESTful • Setting up REST Endpoints • Defining the Resources • Remote Traversals
  • 25. Defining the Resources • Organized code with single point of reference
      from marshmallow import Schema, fields, pre_load, pre_dump, post_load, validates_schema
      from marshmallow.exceptions import ValidationError
      ...
      class MessageSchema(Schema):
          """Holds all the required fields for a message object."""
          ts = fields.Float(required=True)
          text = fields.Str()
          comment = fields.Str()
          subtype = fields.Str()
          bot_id = fields.Str(validate=is_bot_uid)
          user = fields.Str(validate=is_user_uid)
          thread_ts = fields.Str()
          file_share = fields.Nested(FileShareSchema, load_from="file")
          attachments = fields.Nested(AttachmentSchema, many=True)
          reactions = fields.Nested(ReactionSchema, many=True)
          comments = fields.Nested(CommentSchema, many=True, load_from="replies")
          mentions = fields.List(fields.Str(validate=is_user_uid))

      class AttachmentSchema(Schema):
          """Holds all the required fields for an Attachment object."""

      class ReactionSchema(Schema):
          """Holds all the required fields for a reaction object."""

      class CommentSchema(Schema):
          """Holds all the required fields for a comment object."""
      ...
  • 26. Defining the Resources • Organized code with single point of reference • Validate data before ingestion • Enforce types and required fields
      from marshmallow import Schema, fields, pre_load, pre_dump, post_load, validates_schema
      from marshmallow.exceptions import ValidationError
      ...
      class MessageSchema(Schema):
          """Holds all the required fields for a message object."""

          @validates_schema
          def validate_message(self, data):
              """Validate if the message contains any of comments, mentions or reactions."""
              if not any(f(data) for f in (has_comments, has_mentions, has_reactions)):
                  raise ValidationError("The message must contain comments, mentions or reactions")

          ts = fields.Float(required=True)
          text = fields.Str()
          comment = fields.Str()
          subtype = fields.Str()
          bot_id = fields.Str(validate=is_bot_uid)
          user = fields.Str(validate=is_user_uid)
          thread_ts = fields.Str()
          file_share = fields.Nested(FileShareSchema, load_from="file")
          attachments = fields.Nested(AttachmentSchema, many=True)
          reactions = fields.Nested(ReactionSchema, many=True)
          comments = fields.Nested(CommentSchema, many=True, load_from="replies")
          mentions = fields.List(fields.Str(validate=is_user_uid))

      class AttachmentSchema(Schema):
          """Holds all the required fields for an Attachment object."""

      class ReactionSchema(Schema):
          """Holds all the required fields for a reaction object."""

      class CommentSchema(Schema):
          """Holds all the required fields for a comment object."""
      ...
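The schema-level rule on slide 26 relies on helpers (`has_comments`, `has_mentions`, `has_reactions`) that the slides do not show. The sketch below is a dependency-free approximation of that rule, with the helper bodies and the `MessageValidationError` class assumed for illustration; it is not the marshmallow implementation from the talk.

```python
class MessageValidationError(ValueError):
    """Raised when a message payload should not be ingested (hypothetical name)."""


def has_comments(data):
    # Assumed behaviour: Slack calls them "replies", the schema calls them "comments".
    return bool(data.get("comments") or data.get("replies"))


def has_mentions(data):
    return bool(data.get("mentions"))


def has_reactions(data):
    return bool(data.get("reactions"))


def validate_message(data):
    """Reject messages without a timestamp or without any engagement signal."""
    if "ts" not in data:
        raise MessageValidationError("ts is required")
    if not any(check(data) for check in (has_comments, has_mentions, has_reactions)):
        raise MessageValidationError(
            "The message must contain comments, mentions or reactions")
    return data


def is_valid(data):
    """Convenience wrapper that turns the exception into a boolean."""
    try:
        validate_message(data)
        return True
    except MessageValidationError:
        return False


notable = {"ts": "1465895473.000050",
           "reactions": [{"name": "joy", "users": ["U5K7JUATE"]}]}
plain = {"ts": "1465895473.000051", "text": "hello"}
```

With these inputs, `notable` passes because it carries a reaction, while `plain` is rejected before it reaches the graph.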
  • 27. Defining the Resources • Organized code with single point of reference • Validate data before ingestion • Enforce types and required fields • Normalize fields with post-processing
      from marshmallow import Schema, fields, pre_load, pre_dump, post_load, validates_schema
      from marshmallow.exceptions import ValidationError
      ...
      class MessageSchema(Schema):
          """Holds all the required fields for a message object."""

      class AttachmentSchema(Schema):
          """Holds all the required fields for an Attachment object."""
          title = fields.Str()
          fallback = fields.Str()
          text = fields.Str()
          thumb_url = fields.Str()
          image_url = fields.Str()
          title_link = fields.Str()

          @post_load
          def reshape_attachment(self, data):
              """Apply required transformations on the Attachment object."""
              # Create a post_title field
              collapse_keys(data, "post_title", *("fallback", "title", "text"))
              # Create a post_thumbnail field
              collapse_keys(data, "post_thumbnail", *("thumb_url", "image_url", "title_link"))
              # Set post_type to URL
              data["post_type"] = "URL"
              # Return the reshaped data so the hook also works where a return value is required
              return data

      class ReactionSchema(Schema):
          """Holds all the required fields for a reaction object."""

      class CommentSchema(Schema):
          """Holds all the required fields for a comment object."""

      class FileShareSchema(Schema):
          """Holds all the required fields for a File Share object."""

      class UserSchema(Schema):
          """Holds all the required fields for a User object."""
      ...
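The `collapse_keys` helper called on slide 27 is not shown in the deck. One plausible reading, sketched here as an assumption, is that it moves the first non-empty candidate value into the target key and drops the candidates; the real helper may behave differently.

```python
def collapse_keys(data, target, *candidates):
    """Move the first non-empty candidate value into `target` and drop the candidates.

    Hypothetical stand-in for the helper used on the slide.
    """
    chosen = None
    for key in candidates:
        value = data.pop(key, None)
        if chosen is None and value:
            chosen = value
    if chosen is not None:
        data[target] = chosen
    return data


attachment = {
    "fallback": "",
    "title": "Instagram post by @therock",
    "text": "346.3k Likes",
    "image_url": "https://example.com/a.jpg",  # placeholder URL for the example
}
collapse_keys(attachment, "post_title", "fallback", "title", "text")
collapse_keys(attachment, "post_thumbnail", "thumb_url", "image_url", "title_link")
attachment["post_type"] = "URL"
```

After these calls the attachment holds only `post_title`, `post_thumbnail` and `post_type`, which is the normalized shape the bullet on slide 27 describes.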
  • 28. Making the Graph RESTful • Schema enforcement and validation • Handling Idempotency of endpoints • Custom Traversal Source
  • 29. Remote Traversals • Bytecode sent over network instead of string • Allows using custom traversal source for a Domain Specific Language (DSL)
      from gremlin_python.structure.graph import Graph
      from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
      ...
      conn = DriverRemoteConnection(GREMLIN_SERVER_HOST, 'sg')
      slack = Graph().traversal(SlackTraversalSource).withRemote(conn)
  • 30. Building a DSL • Motivations • Custom Workflows
  • 31. Building a DSL - Motivations • Custom traversal source can also specify useful shorthands • E.g. Traversing to all the Channel nodes
      class SlackTraversalSource(BaseTraversalSource):
          """Module to initialise a Graph with the methods listed under SlackTraversal."""

          def __init__(self, *args, **kwargs):
              super(SlackTraversalSource, self).__init__(*args, **kwargs)
              self.graph_traversal = SlackTraversal

          def channels(self, *channel_ids):
              """Shorthand to identify all channel nodes."""
              traversal = self.get_graph_traversal()
              traversal.bytecode.add_step("V")
              traversal.bytecode.add_step("hasLabel", NODES.channel)
              if channel_ids:
                  traversal.bytecode.add_step("has", "__id", P.within(channel_ids))
              return traversal
  • 32. Building a DSL - Motivations • Custom traversal source specifies business logic behind traversals • E.g. Connecting a User node to a Channel node
      class SlackTraversal(BaseTraversal):
          def addPartOfChannelEdges(self, channel_uid, *user_uids, **kwargs):
              """Add an edge to a channel from the users who were/are a part of the channel."""
              for user_uid in user_uids:
                  edge_uid = construct_uid(user_uid, channel_uid, EDGES.part_of.name, delim="|")
                  (self.getOrAddEdgeFrom(edge_label=EDGES.part_of, edge_uid=edge_uid,
                                         node_label=NODES.user, node_uid=user_uid)
                   .upsertProperties(kwargs.get("properties")).inV())
              return self
  • 33. Building a DSL - Motivations • BaseTraversal handles creation of nodes and edges • These methods should guarantee idempotency • E.g. Creation of edges between two nodes… • ...checks for an existing edge
      from gremlin_python.process.graph_traversal import GraphTraversal
      from gremlin_python.process.graph_traversal import GraphTraversalSource, __
      from gremlin_python.process.traversal import T

      class BaseTraversal(GraphTraversal):
          def getOrAddEdgeFrom(self, edge_label, edge_uid, node_label, node_uid):
              """Adds an edge from the node with the given label and uid only if the edge doesn't exist."""
              return self.coalesce(
                  # First branch: reuse the edge if it already exists
                  __.inE(edge_label).hasId(edge_uid).and_(
                      __.outV().hasId(node_uid),
                      __.outV().hasLabel(node_label)),
                  # Second branch: otherwise create it with the supplied ID
                  __.addE(edge_label).property(T.id, edge_uid).from_(
                      __.V().getNode(node_label, node_uid)))
  • 34. Building a DSL - Motivations • The edge is created only if it doesn't already exist
      from gremlin_python.process.graph_traversal import GraphTraversal
      from gremlin_python.process.graph_traversal import GraphTraversalSource, __
      from gremlin_python.process.traversal import T

      class BaseTraversal(GraphTraversal):
          def getOrAddEdgeFrom(self, edge_label, edge_uid, node_label, node_uid):
              """Adds an edge from the node with the given label and uid only if the edge doesn't exist."""
              return self.coalesce(
                  __.inE(edge_label).hasId(edge_uid).and_(
                      __.outV().hasId(node_uid),
                      __.outV().hasLabel(node_label)),
                  __.addE(edge_label).property(T.id, edge_uid).from_(
                      __.V().getNode(node_label, node_uid)))
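The get-or-create behaviour that `coalesce` provides can be seen without a Gremlin Server. The sketch below uses a hypothetical in-memory `TinyGraph` and a guessed `construct_uid` (the deck only shows its call signature) to illustrate why deterministic edge IDs make repeated ingestion harmless; it is a stand-in for the idea, not the DSL itself.

```python
def construct_uid(*parts, delim="|"):
    """Join the parts into a deterministic edge ID (assumed behaviour of the helper)."""
    return delim.join(str(part) for part in parts)


class TinyGraph:
    """In-memory stand-in for a graph that accepts user-supplied edge IDs."""

    def __init__(self):
        self.edges = {}

    def get_or_add_edge(self, edge_uid, label, out_uid, in_uid):
        """Return the existing edge with this ID, or create it if absent."""
        if edge_uid not in self.edges:
            self.edges[edge_uid] = {"label": label, "out": out_uid, "in": in_uid}
        return self.edges[edge_uid]


graph = TinyGraph()
for _ in range(3):  # simulate the same Slack payload being ingested three times
    uid = construct_uid("U193XDML7", "C8KMHCN5D", "part_of")
    graph.get_or_add_edge(uid, "part_of", "U193XDML7", "C8KMHCN5D")
```

Because the edge ID is derived from the user, the channel and the edge label, the three ingestions collapse into a single `part_of` edge.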
  • 35. Building a DSL – Custom Workflows • Standardized steps for generating a visualization are defined in the BaseTraversal • Custom maps define traversal paths for fields that vary across visualizations
      def build_visualization(self, traversal_source, **kwargs):
          """The below are standardized steps that are required to generate data for any visualization."""
          return (self.start(traversal_source)
                  .filterByDate(self.date_dimension, kwargs.get("start_time"), kwargs.get("end_time"))
                  .filterByFields(self.filters_map, kwargs.get("filters"))
                  .sortByFields(self.sorting_map, kwargs.get("sort_field"), kwargs.get("sort_direction"))
                  .buildObject(self.object_map).toList())
  • 36. Building a DSL – Custom Workflows • The DSL takes in functions/paths that map fields to their traversals • Maps customized based on the visualization that is needed
      # Sample filter from frontend
      filter_obj = {'_and': [{"field": 'reactions', '_gte': 100},
                             {"field": 'post_creator', '_in': ['bob', 'chloe']}]}

      filter_map = {
          "post_creator": lambda pred: __.in_(EDGES.created_post).has(USER.display_name, pred),
          "reactions": lambda pred: __.inE(EDGES.reacted_to).count().is_(pred),
      }

      object_map = {
          "post_creator": {
              "uid": [__.in_(EDGES.created_post).values("__id"), __.constant("")],
              "image": ...,  # define similar path here
          },
          "reactions": __.inE(EDGES.reacted_to).groupCount().by(__.values(REACTION.name)),
      }

      start = lambda traversal_source: traversal_source.posts()

      # Inject maps into DSL methods
      (start(slack)
       .filterByFields(self.filters_map, kwargs.get("filters"))
       .buildObject(self.object_map)
       .toList())

      # DSL generates the required lower level base traversals
      # (each projected key gets its own by() modulator)
      (slack.posts().where(
          __.and_(
              __.inE(EDGES.reacted_to).count().is_(P.gte(100)),
              __.in_(EDGES.created_post).has(USER.display_name, P.within(['bob', 'chloe']))))
       .project("post_creator", "reactions")
       .by(__.project("image", "display_name", "uid")
           .by(__.in_(EDGES.created_post).values(USER.image))
           .by(__.in_(EDGES.created_post).values(USER.display_name))
           .by(__.in_(EDGES.created_post).values("__id")))
       .by(__.inE(EDGES.reacted_to).groupCount())
       .toList())
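To make the field-to-path mapping on slide 36 concrete, here is a small sketch that compiles the same style of frontend filter object into a predicate over plain dictionaries. The `OPERATORS` and `FIELD_ACCESSORS` tables and the `build_predicate` function are illustrative assumptions: in the talk the accessors are Gremlin traversals and the operators become `P` predicates, whereas this version runs on in-memory data only.

```python
import operator

# Operator keys seen in the sample filter, mapped to plain Python comparisons.
OPERATORS = {
    "_gte": operator.ge,
    "_in": lambda value, options: value in options,
}

# Hypothetical accessors standing in for the traversal paths in filter_map.
FIELD_ACCESSORS = {
    "reactions": lambda post: len(post["reactions"]),
    "post_creator": lambda post: post["creator"],
}


def build_predicate(filter_obj):
    """Compile a filter object into a function that accepts or rejects a post."""
    if "_and" in filter_obj:
        parts = [build_predicate(clause) for clause in filter_obj["_and"]]
        return lambda post: all(part(post) for part in parts)
    accessor = FIELD_ACCESSORS[filter_obj["field"]]
    # Every leaf clause carries exactly one operator key besides "field".
    op_name, operand = next(
        (key, value) for key, value in filter_obj.items() if key != "field")
    compare = OPERATORS[op_name]
    return lambda post: compare(accessor(post), operand)


filter_obj = {"_and": [{"field": "reactions", "_gte": 2},
                       {"field": "post_creator", "_in": ["bob", "chloe"]}]}
posts = [
    {"creator": "chloe", "reactions": ["palm_tree", "robot_face"]},
    {"creator": "dave", "reactions": ["joy", "joy", "joy"]},
    {"creator": "bob", "reactions": ["joy"]},
]
accepts = build_predicate(filter_obj)
matching = [post["creator"] for post in posts if accepts(post)]
```

Only the post by `chloe` satisfies both clauses. Adding a new filterable field means adding one accessor entry, which mirrors how the maps are swapped per visualization.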
  • 37. Building a DSL – Custom Workflows • The traversals generated churn out the final response objects • Objects rendered into visualizations by the client { "reactions": { "palm_tree": 82, "robot_face": 18 }, "post_creator": { "image": "https://url_of_image.jpg", "display_name": "chloe", "uid": "U024ZH7HL" } }
  • 38. Testing the Application • Unit Tests • Validating traversals on Gremlin Server
  • 39. Testing Our Application – Unit Testing • Workflow steps: Start Gremlin Server, Use Fixtures, Write a failing test, Write code to make the test pass, Check if test passes
      class TestNodeMethods(object):
          """Test methods that help in retrieval and creation of Nodes."""

          def test_node_retrieval(self, graph):
              """Test if getNode retrieves an existing node."""
              assert graph.V().getNode(label="person", uid=100).count().next() == 1
              assert graph.V().getNode(label="person", uid=101).count().next() == 1
  • 40. Testing Our Application – Unit Testing • Workflow steps: Start Gremlin Server, Use Fixtures, Write a failing test, Write code to make the test pass, Check if test passes
      def getNode(self, label, uid):
          """Returns the node with the given label and uid.

          Args:
              label (string): The label of the node to return
              uid (string): Unique ID of the node
          Raises:
              StopIteration: Node with the given label and uid does not exist
          """
          return self.and_(__.hasLabel(label), __.has(T.id, uid))
  • 41. Testing Our Application – Unit Testing • Workflow steps: Start Gremlin Server, Use Fixtures, Write a failing test, Write code to make the test pass, Check if test passes
      $ bin/gremlin-server.sh conf/gremlin-server-neo4j-python.yaml

      class TestBasicTraversal(object):
          """Tests for methods that help create edges or nodes and methods
          that help populate the properties of these objects."""

          @pytest.fixture(scope="module")
          def graph(self):
              """Graph with two nodes and one edge connecting them."""
              graph = (Graph().traversal(CerebroTraversalSource)
                       .withRemote(DriverRemoteConnection(GREMLIN_SERVER_HOST,
                                                          GREMLIN_SERVER_TRAVERSER)))
              graph.V().clear()
              from_node = graph.addV("person").property(T.id, 100).next()
              to_node = graph.addV("person").property(T.id, 101).next()
              (graph.addE("knows").from_(from_node).to(to_node)
               .property("__id", "1")
               .next())
              yield graph
              graph.V().clear()
  • 42. Testing Our Application – Unit Testing • Workflow steps: Start Gremlin Server, Use Fixtures, Write a failing test, Write code to make the test pass, Check if test passes
      class TestNodeMethods(object):
          """Test methods that help in retrieval and creation of Nodes."""

          def test_node_retrieval(self, graph):
              """Test if getNode retrieves an existing node."""
              assert graph.V().getNode(label="person", uid=100).count().next() == 1
              assert graph.V().getNode(label="person", uid=101).count().next() == 1
  • 43. Testing Our Application – Unit Testing • Fixture used to test if the MessageSchema class is implemented correctly [ { "reactions": [ { "name": "joy", "users": [ "U5K7JUATE" ] } ], "attachments": [ { ... } ], "text": "<https://www.youtube.com/watch?v=4iEh1ykb13w>", "ts": "1465895473.000050", "user": "U37BF9457", "type": "message" } ]
      class MessageSchema(Schema):
          """Holds all the required fields for a message object."""
          ...
  • 44. Testing Our Application – Unit Testing • MessageSchema needs to include mentions • Update the fixture to be able to test that the schema includes mentions • Need to validate if traversals pick up mentions [ { "reactions": [ { "name": "joy", "users": [ "U5K7JUATE" ] } ], "attachments": [ {...} ], "text": " <@U123456> <https://www.youtube.com/watch?v=4iEh1ykb13w>", "mentions": [ "U123456" ], "ts": "...", "type": "message" } ]
      class MessageSchema(Schema):
          """Holds all the required fields for a message object."""
          mentions = fields.List(fields.Str(validate=is_user_uid))
  • 45. Testing Our Application – Unit Testing • Workflow steps: Start Gremlin Server, Use Fixtures, Update JSON & Generate GraphSON, Write a failing test, Write code to make the test pass, Check if test passes [ { "reactions": [ { "name": "joy", "users": [ "U5K7JUATE" ] } ], "attachments": [ {...} ], "text": " <@U123456> <https://www.youtube.com/watch?v=4iEh1ykb13w>", "mentions": [ "U123456" ], "ts": "...", "type": "message" } ]
      gremlin> graph.io(graphson()).writeGraph("graph_name.json")
  • 46. Testing Our Application – Unit Testing • Workflow steps: Start Gremlin Server, Use Fixtures, Update JSON & Generate GraphSON, Write a failing test, Write code to make the test pass, Check if test passes
      @pytest.fixture(scope="module")
      def slack_graph():
          """Open a subgraph on localhost for testing."""
          slack.V().clear()
          slack_client = Client(GREMLIN_SERVER_HOST, SLACK_TRAVERSER)
          path_to_fixture = str(Path.cwd().joinpath("tests/fixtures/slack_graph.json"))
          graphson_statement = 'graph.io(graphson()).readGraph("{}")'.format(path_to_fixture)
          slack_client.submit(graphson_statement).all().result()
          yield slack
          slack.V().clear()
  • 47. Testing the Application – CI/CD • Automated tests using CircleCI • Custom Configuration for Gremlin Server • Caching Dependencies for Faster Tests
  • 48. Testing the Application – CI/CD
      steps: # CircleCI 2.0
        ...
        - run:
            command: |
              if [ ! -d ./apache-tinkerpop-gremlin-server-3.3.3 ]; then
                curl -O https://archive.apache.org/dist/tinkerpop/3.3.3/apache-tinkerpop-gremlin-server-3.3.3-bin.zip
                unzip -q apache-tinkerpop-gremlin-server-3.3.3-bin.zip
                # Install gremlin-python
                cd ./apache-tinkerpop-gremlin-server-3.3.3 && ./bin/gremlin-server.sh install org.apache.tinkerpop gremlin-python 3.3.3
                # Change max content length and traversal strategy
                sed -i -- 's/.*maxContentLength:.*/maxContentLength: 2621440/g' conf/gremlin-server.yaml
                sed -i -- 's/graph.traversal()]/graph.traversal(),sg: graph.traversal().withStrategies(ElementIdStrategy.build().create())]/g' ./scripts/empty-sample.groovy
              fi
        ...
  • 49. Testing the Application – CI/CD
      steps: # CircleCI 2.0
        - checkout
        - restore_cache:
            keys:
              - v1-dependencies-{{ .Branch }}
              - v1-dependencies-master
        - run: # Download and install Gremlin server
            ...
        # Cache the installation
        - save_cache:
            key: v1-dependencies-{{ .Branch }}
            paths:
              - ~/src/app_name/apache-tinkerpop-gremlin-server-3.3.3
  • 50. Testing the Application – CI/CD
      # Test
      - run: # Starting Gremlin Server
          command: |
            cd ./apache-tinkerpop-gremlin-server-3.3.3 && ./bin/gremlin-server.sh ./conf/gremlin-server.yaml
          background: true
      # Sleep to give the gremlin server enough time to start
      - run: sleep 10
      - run: pycodestyle app_name
      - run: coverage run --source=app_name -m pytest tests --capture=no --strict
      - run: coverage report -m --fail-under=95
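The fixed `sleep 10` on slide 50 is simple but can be either too short or needlessly long. One alternative, offered here as a suggestion that is not part of the talk, is to poll the server port until it accepts connections. The `wait_for_port` helper below is hypothetical, and an open port is only a rough readiness signal.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

A CI step could call `wait_for_port("localhost", 8182)` before running pytest, assuming Gremlin Server listens on port 8182 as in the `curl` example on the final slide.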
  • 51. Scaling Our Graph • Async Traversals • HA Cluster and Load Balancing
  • 52. Async Traversals
      • Seed individual entities using "next" • Each call to "next" is blocking
          def seed_channels(data, team_uid):
              for channel_data in data:
                  channel_uid, creator, members = (channel_data.pop(key) for key in ["uid", "creator", "members"])
                  slack.V().addChannel(channel_uid, properties=channel_data).next()
                  slack.teams(team_uid).addTeamHasChannelEdge(team_uid, channel_uid).next()
                  slack.users(creator).addCreatedChannelEdge(creator, channel_uid).next()
                  slack.channels(channel_uid).addPartOfChannelEdges(channel_uid, *members).next()
      • Seed subgraph using "next" • Reduce number of blocking calls to one per channel
          def seed_channels(data, team_uid):
              for channel_data in data:
                  channel_uid, creator, members = (channel_data.pop(key) for key in ["uid", "creator", "members"])
                  (slack.V().addChannel(channel_uid, properties=channel_data)
                   .addTeamHasChannelEdge(team_uid, channel_uid).inV()
                   .addCreatedChannelEdge(creator, channel_uid).inV()
                   .addPartOfChannelEdges(channel_uid, *members).next())
      • Seed subgraph using "promise" • Make seeding asynchronous, no blocking calls • Verify that the returned futures were successful
          def seed_channels(data, team_uid):
              for channel_data in data:
                  channel_uid, creator, members = (channel_data.pop(key) for key in ["uid", "creator", "members"])
                  (slack.V().addChannel(channel_uid, properties=channel_data)
                   .addTeamHasChannelEdge(team_uid, channel_uid).inV()
                   .addCreatedChannelEdge(creator, channel_uid).inV()
                   .addPartOfChannelEdges(channel_uid, *members).promise())
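The last bullet on slide 52, verifying that the returned futures were successful, is easy to overlook. The sketch below shows the pattern with the standard library's `concurrent.futures` as a stand-in for the futures returned by `promise()`; `seed_channel` and `seed_all` are hypothetical names, and the failure condition is contrived for the example.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def seed_channel(channel_uid):
    """Stand-in for one chained seeding traversal; returns the seeded channel ID."""
    if not channel_uid.startswith("C"):
        raise ValueError("not a channel ID: {}".format(channel_uid))
    return channel_uid


def seed_all(channel_uids):
    """Submit every seeding call without blocking, then inspect each future."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {pool.submit(seed_channel, uid): uid for uid in channel_uids}
        wait(futures)  # block once, after everything has been submitted
    failed = [uid for future, uid in futures.items()
              if future.exception() is not None]
    seeded = [future.result() for future in futures
              if future.exception() is None]
    return seeded, failed


seeded, failed = seed_all(["C8KMHCN5D", "C0000001", "U193XDML7"])
```

Collecting the failed IDs gives the caller something to retry or report, instead of silently losing a channel when one asynchronous write fails.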
  • 53. HA Cluster and Load Balancing • Preparing for high availability with Neo4j and Gremlin • Configuring Gremlin Server and Neo4j • Understanding the Neo4j HA Architecture • Advantages • Data replication • Spread writes across instances • Handle greater read loads • HA cluster is fronted by a load balancer like HAProxy • Reference: • https://neo4j.com/docs/operations-manual/current/ha-cluster/architecture/ • http://tinkerpop.apache.org/docs/3.3.3/reference/#_high_availability_configuration
  • 54. HA Cluster and Load Balancing • Tuning parameters for the cluster • Frequency of pulling updates from other members of the cluster • gremlin.neo4j.conf.ha.pull_interval • Number of slaves a transaction should be committed to • gremlin.neo4j.conf.ha.tx_push_factor • Tuning parameters for the load balancer • Routing requests across the cluster • balance • Checking if the members in the cluster are responsive • option httpchk
      // gremlin-server-neo4j-ha-{1..3}.yaml
      channelizer: org.apache.tinkerpop.gremlin.server.channel.WsAndHttpChannelizer
      > curl "http://localhost:8182?gremlin=100-1"
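The `curl` call on slide 54 evaluates a trivial script over HTTP, which is what makes it usable as a load-balancer health probe. The sketch below only builds that URL and classifies a status code; both function names are assumptions for illustration, and treating 2xx and 3xx as healthy is assumed to mirror HAProxy's default handling of `option httpchk`.

```python
from urllib.parse import urlencode


def health_check_url(host, port=8182, script="100-1"):
    """Build the HTTP GET URL that asks Gremlin Server to evaluate a trivial script."""
    return "http://{}:{}?{}".format(host, port, urlencode({"gremlin": script}))


def is_responsive(status_code):
    """Treat 2xx and 3xx responses as healthy."""
    return 200 <= status_code < 400
```

Calling `health_check_url("localhost")` reproduces the URL from the slide, so the same probe can be reused in monitoring scripts outside the load balancer.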