Gremlin       G = (V, E)

A Graph-Based Programming Language
             Marko A. Rodriguez
       T-5, Center for Nonlinear Studies
       Los Alamos National Laboratory
        http://markorodriguez.com
      http://gremlin.tinkerpop.com

              February 25, 2010
Abstract
Gremlin is a Turing-complete, graph-based programming language
developed for key/value-pair multi-relational graphs called property graphs.
Gremlin makes extensive use of XPath 1.0 to support complex graph
traversals. Connectors exist to various graph databases and frameworks.
This language has application in the areas of graph query, analysis, and
manipulation.




                Center for Nonlinear Studies PostDoc Seminar – Los Alamos National Laboratory – February 25, 2010
Acknowledgements
• Marko A. Rodriguez [http://markorodriguez.com]
  designed, developed, tested, and documented Gremlin.
• Peter Neubauer [http://www.linkedin.com/in/neubauer]
  aided in the design and the evangelizing of Gremlin.
• Pavel Yaskevich [http://github.com/xedin]
  aided in the development of user defined functions in Gremlin.
• Joshua Shinavier [http://fortytwo.net]
  provided initial conceptual support for Gremlin.
• Ketrina Yim [http://csillustrated.berkeley.edu]
  designed the logo for Gremlin.
• Gremlin-Users Group [http://groups.google.com/group/gremlin-users]
  provided much direction in the design and implementation of Gremlin.




Outline

• Introduction to Graphs and Graph Software

• Basic Gremlin Concepts

• Gremlin Language Description

• Advanced Gremlin Concepts

• Conclusions




What is a Graph?
• A graph (network) is composed of a collection of vertices (dots) and edges (lines).
  There are many types of graphs: directed/undirected, weighted, attributed, etc.



[Figure: a single drawing that illustrates many kinds of graph at once.
Graph types labeled in the figure: vertex-labeled, vertex-attributed,
edge-labeled, edge-attributed, weighted, directed, undirected, multi,
hyper, pseudo, half-edge, regular, semantic, and resource description
framework. Example annotations shown: a, knows, hired, 0.2,
created=2-01-09, modified=2-11-09, type="person", name="emil",
http://ex.com/123.]



Why Use a Graph?

• A graph is a very general data structure that can be used to model
  various systems.
    A graph can model the structure of transportation, technological,
    bibliographic, etc. systems.
    A graph can model a list, a map, a tree, etc.

• There are numerous graph algorithms that are defined independently of
  the domain of the graph model.

• There are numerous graph databases, frameworks, packages, etc.
  that aid in the creation, manipulation, and analysis of graphs.




Graph Databases, Frameworks, and Packages
•   Neo4j Graph Database [http://neo4j.org]
•   AllegroGraph Quad Store [http://www.franz.com/agraph]
•   HyperGraphDB [http://www.kobrix.com/hgdb.jsp]
•   Java Universal Network/Graph Framework [http://jung.sourceforge.net]
•   OpenRDF Sesame Framework [http://www.openrdf.org]
•   InfoGrid Graph Database [http://infogrid.org]
•   Filament Graph Toolkit [http://filament.sourceforge.net]
•   OWLim Semantic Repository [http://www.ontotext.com/owlim]
•   Sones Graph Database [http://www.sones.com]
•   NetworkX Graph Toolkit [http://networkx.lanl.gov]
•   iGraph Toolkit [http://igraph.sourceforge.net]
•   Blueprints Graph API [http://blueprints.tinkerpop.com]
•   ... and many more.



What Makes Gremlin Different?
• Gremlin is a domain specific language for working with graphs.

• Gremlin is not an application programming interface (API).

• Gremlin makes use of various graph databases, frameworks, and packages.

• Gremlin is a language that currently has a virtual machine
  implementation written in Java.

• What can be succinctly expressed in Gremlin is verbose/clumsy to
  express in general purpose languages such as Java, Python, Ruby, etc.

• Gremlin allows one to map single-relational graph analysis algorithms
  over to the multi-relational domain.


Single-Relational Graphs
• In single-relational graphs, all edges have the same meaning
  (e.g. all edges are either friendship, kinship, worksWith, knows, etc.).
       G = (V, E ⊆ (V × V ))

• Most graph algorithms are defined for single-relational graphs
  (e.g. centrality/ranking, clustering/community detection, etc.).

[Figure: a single-relational graph over the vertices person-a, person-b,
and person-c.]




NOTE: These types of graphs are also known as directed, vertex-labeled graphs.


Multi-Relational Graphs
• In multi-relational graphs, edges can have different meanings.
       G = (V, E ⊆ (V × V ), ω : E → Σ*)
       (ω assigns each edge a label, i.e. a string over the alphabet Σ.)

• Most graph software is designed for multi-relational graphs (e.g. arbitrary
  objects as vertices and edges, knowledge-based reasoning systems, etc.).


[Figure: a multi-relational graph over the vertices person-a, book-b, and
book-c, with edges labeled read, authored, and cites.]




NOTE: These types of graphs are also known as directed, vertex/edge-labeled graphs.
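The difference between the two definitions can be sketched in a few lines of Python. This is only an illustration: the vertex names come from the figures, but the exact endpoints of each labeled edge, and the names `omega` and `project`, are assumptions made for the example.

```python
# Single-relational: E is a set of (tail, head) pairs with one shared meaning.
V_single = {"person-a", "person-b", "person-c"}
E_single = {("person-a", "person-b"), ("person-b", "person-c")}

# Multi-relational: the same kind of edge set, plus omega : E -> label.
# The endpoints chosen here are assumed for illustration.
V_multi = {"person-a", "book-b", "book-c"}
omega = {
    ("person-a", "book-c"): "read",
    ("person-a", "book-b"): "authored",
    ("book-c", "book-b"): "cites",
}
E_multi = set(omega)

def project(label):
    """Return the single-relational edge set induced by one edge label."""
    return {edge for edge, edge_label in omega.items() if edge_label == label}

print(project("authored"))  # {('person-a', 'book-b')}
```

Projecting onto one label is the simplest way to hand a multi-relational graph to a single-relational algorithm; it ignores paths that mix labels.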


Gremlin and Multi-Relational Graphs

• Gremlin provides a means to elegantly map single-relational graph
  analysis algorithms over to the multi-relational graph domain.

• Gremlin provides an elegant way to do automated reasoning in
  multi-relational graphs using path expressions.

These two points form the primary thesis of this presentation.


Rodriguez, M.A., Shinavier, J., “Exposing Multi-Relational Networks to Single-Relational Network Analysis
Algorithms,” Journal of Informetrics, 4(1), 29–41, doi:10.1016/j.joi.2009.06.004, LA-UR-08-03931,
http://arxiv.org/abs/0806.2274, December 2009.




Property Graphs

• Gremlin works with a type of multi-relational graph called a property
  graph.
       Vertices and edges are labeled with unique identifiers.
       Edges are directed, labeled, and can form loops.
       Multiple edges of the same label can exist for the same vertex pair.
       Vertices and edges can have any number of key/value pair
       properties/attributes.

Property graphs are a relatively general graph structure that can be constrained to model other graph
structures — though, a property-based hypergraph would be the most general (see HyperGraphDB and the
JUNG API).
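The four constraints above can be made concrete with a minimal in-memory sketch in Python. This is not the Blueprints API; the class and method names are hypothetical, and edge ids 100 and 101 are invented to show a parallel edge and a loop.

```python
from dataclasses import dataclass, field

@dataclass
class Vertex:
    id: int
    properties: dict = field(default_factory=dict)

@dataclass
class Edge:
    id: int
    out_v: int   # tail vertex id
    label: str
    in_v: int    # head vertex id
    properties: dict = field(default_factory=dict)

class PropertyGraph:
    """Directed, labeled, attributed multigraph with unique element ids."""

    def __init__(self):
        self.vertices = {}
        self.edges = {}

    def add_vertex(self, vid, **properties):
        if vid in self.vertices:
            raise ValueError(f"vertex id {vid} already in use")
        self.vertices[vid] = Vertex(vid, properties)
        return self.vertices[vid]

    def add_edge(self, eid, out_v, label, in_v, **properties):
        if eid in self.edges:
            raise ValueError(f"edge id {eid} already in use")
        if out_v not in self.vertices or in_v not in self.vertices:
            raise KeyError("both endpoints must already exist")
        self.edges[eid] = Edge(eid, out_v, label, in_v, properties)
        return self.edges[eid]

g = PropertyGraph()
g.add_vertex(1, name="marko", age=29)
g.add_vertex(3, name="lop", lang="java")
g.add_edge(9, 1, "created", 3, weight=0.4)
g.add_edge(100, 1, "created", 3, weight=0.9)  # parallel edge, same label
g.add_edge(101, 1, "knows", 1)                # loop on vertex 1
print(len(g.vertices), len(g.edges))          # 2 3
```

Keying edges by their own identifier, rather than by endpoint pair, is what lets several edges with the same label coexist between the same two vertices.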




Property Graphs
[Figure: the example property graph used throughout this presentation.]

Vertices:
  1   name = "marko",  age = 29
  2   name = "vadas",  age = 27
  3   name = "lop",    lang = "java"
  4   name = "josh",   age = 32
  5   name = "ripple", lang = "java"
  6   name = "peter",  age = 35

Edges:
  7   1 -knows->   2   weight = 0.5
  8   1 -knows->   4   weight = 1.0
  9   1 -created-> 3   weight = 0.4
  10  4 -created-> 5   weight = 1.0
  11  4 -created-> 3   weight = 0.4
  12  6 -created-> 3   weight = 0.2




Gremlin System Architecture

[Figure: architecture diagram with the components Gremlin Console,
Gremlin ScriptEngine, Neo4j, NativeStore, and TinkerGraph.]

• The Gremlin console is a scripting environment that allows for the
  dynamic evaluation of Gremlin code.
• Gremlin implements JSR 223, which allows Gremlin to be used within the
  Java language and thus as a virtual machine directly accessible to Java
  applications. Popular JSR 223 implementations include Jython, JRuby, and
  Groovy. For a list of implementations, see
  https://scripting.dev.java.net.
• Blueprints is a set of interfaces for abstract data structures such as
  graphs and documents. Implementations of these interfaces exist for
  various data management systems.
• There exist many graph data management systems that span various graph
  data models (e.g. edge-labeled graphs, RDF graphs, hypergraphs, etc.).



“Hello World” in the Gremlin Console


marko$ ./gremlin.sh

         \,,,/
         (o o)
-----oOOo-(_)-oOOo-----
gremlin>
gremlin> concat('goodbye', ' ', 'self')
==>goodbye self




Simple Traversals in Gremlin
[Figure: the example property graph from the "Property Graphs" slide.]

gremlin> $_ := g:key('name','marko')
==>v[1]
gremlin> .
==>v[1]
gremlin> ./outE
==>e[7][1-knows->2]
==>e[9][1-created->3]
==>e[8][1-knows->4]
gremlin> ./outE/@weight
==>0.5
==>0.4
==>1.0



./outE/@weight: “Get the current object(s). Then get the outgoing edges of those objects. Then get the
weights of those edges.”
$_ is a reserved variable meaning the root list of objects.
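To make the step-by-step reading concrete, the same traversal can be mimicked in plain Python. This is a sketch over a hand-written edge list, not Gremlin's evaluator; the helper names `out_e` and `prop` are hypothetical, and the edges are listed in the order the console printed them.

```python
# Each edge is (edge id, tail vertex id, label, head vertex id, properties).
EDGES = [
    (7, 1, "knows", 2, {"weight": 0.5}),
    (9, 1, "created", 3, {"weight": 0.4}),
    (8, 1, "knows", 4, {"weight": 1.0}),
    (11, 4, "created", 3, {"weight": 0.4}),
]

def out_e(vertex_ids):
    """Step `outE`: the edges whose tail is one of the current vertices."""
    return [edge for edge in EDGES if edge[1] in vertex_ids]

def prop(edges, key):
    """Step `@key`: the value of one property on each current edge."""
    return [edge[4][key] for edge in edges if key in edge[4]]

current = [1]                     # `.`              -> v[1]
edges = out_e(current)            # `./outE`         -> edges 7, 9, 8
weights = prop(edges, "weight")   # `./outE/@weight`
print(weights)                    # [0.5, 0.4, 1.0]
```

Each step consumes the list produced by the previous one, which is the same left-to-right reading given in the quoted explanation.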


Simple Traversals in Gremlin
[Figure: the example property graph from the "Property Graphs" slide.]

gremlin> .
==>v[1]
gremlin> ./outE[@label='created']/inV
==>v[3]
gremlin> $_ := $_last
==>v[3]
gremlin> ./@name
==>lop
gremlin> g:map(.)
==>name=lop
==>lang=java




./outE[@label='created']/inV: “Get the current object(s). Then get the outgoing edges of those
objects, where their labels equal 'created'. Then get the incoming vertices of those 'created' edges.”
$_last is a reserved variable meaning the last value evaluated.


Simple Traversals in Gremlin
[Figure: the example property graph from the "Property Graphs" slide.]

./outE[@label='knows']/inV[matches(@name,'va.{3}') and @age > 21]/@name
==>vadas

Simple Traversals in Gremlin
./outE[@label='knows']/inV[matches(@name,'va.{3}') and @age > 21]/@name


1. .: Get the current object(s).

2. outE[@label='knows']: Get the outgoing edges of the current
   object(s), where their labels equal 'knows'.

3. inV[matches(@name,'va.{3}') and @age > 21]: Get the incoming
   vertices of those 'knows' edges, where the names of those vertices are 5
   characters long, start with 'va', and whose age is greater than 21.

4. @name: Get the name of those particular incoming vertices.
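The four steps can be traced in plain Python as follows. This is a sketch under assumptions: the data is copied by hand from the example graph, and `re.fullmatch` is used because the description above treats the pattern as matching the whole name (5 characters); whether Gremlin's matches() anchors the pattern this way is assumed from that description.

```python
import re

VERTICES = {
    1: {"name": "marko", "age": 29},
    2: {"name": "vadas", "age": 27},
    4: {"name": "josh", "age": 32},
}
# Each edge is (edge id, tail vertex id, label, head vertex id).
EDGES = [(7, 1, "knows", 2), (8, 1, "knows", 4), (9, 1, "created", 3)]

current = [1]                                            # step 1: .
knows = [e for e in EDGES
         if e[1] in current and e[2] == "knows"]         # step 2: outE[...]
heads = [e[3] for e in knows]                            # step 3: inV ...
kept = [v for v in heads
        if re.fullmatch(r"va.{3}", VERTICES[v]["name"])
        and VERTICES[v]["age"] > 21]                     # ... with its filter
names = [VERTICES[v]["name"] for v in kept]              # step 4: @name
print(names)                                             # ['vadas']
```

Vertex 4 (josh) passes the age test but fails the name pattern, so only vadas survives the filter.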



Knowledge-Based Reasoning
• Blueprints implements the Sesame SAIL interfaces and thus, Gremlin
  can be used over the many Resource Description Framework (RDF)
  triple/quad stores. In such cases, RDF is modeled as a property graph
  where the named graph component is the @ng edge property.

• Gremlin makes use of the Sesame SAIL SPARQL engine to allow for
  queries based on graph-pattern matching.

gremlin> sail:sparql('SELECT ?x ?y WHERE { ?x foaf:knows ?y }')
==>{y=v[http://ex.com#2], x=v[http://ex.com#1]}
==>{y=v[http://ex.com#4], x=v[http://ex.com#1]}

• Gremlin is useful for knowledge-based reasoning using path
  expressions.
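Graph-pattern matching of the kind SPARQL performs can be illustrated with a single triple pattern in Python. This is a toy matcher, not the Sesame engine; the function `match` and the predicate `ex:created` are hypothetical, and the URIs mirror the console output above.

```python
TRIPLES = [
    ("http://ex.com#1", "foaf:knows", "http://ex.com#2"),
    ("http://ex.com#1", "foaf:knows", "http://ex.com#4"),
    ("http://ex.com#1", "ex:created", "http://ex.com#3"),
]

def match(pattern, triples):
    """Yield one {variable: value} binding per triple that fits the pattern.

    Pattern terms that start with '?' are variables; all others are constants.
    """
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break  # the same variable would need two different values
            elif term != value:
                break      # a constant in the pattern does not match
        else:
            yield binding

results = list(match(("?x", "foaf:knows", "?y"), TRIPLES))
for row in results:
    print(row)
```

A full SPARQL engine joins the bindings of several such patterns; the sketch covers only the single-pattern case used in the query above.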


Reasoning as Defining New Types of Adjacency
[Figure: the example graph (marko, vadas, josh, peter, lop, ripple) with
its knows and created edges, plus inferred co-developer edges.]

• Graph-based reasoning is the process of making explicit what is implicit
  in the graph.
• A reasoner takes a graph G and a collection of graph-patterns
  (i.e. transformation/rewrite rules) and creates a new graph G′ (usually,
  G ⊂ G′). G′ has new relationships/edges and thus, new definitions of
  vertex adjacency.
• Example: The co-developers of person A are those people who have created
  the same software as person A and who are themselves not person A (as
  person A has created the same software as him or herself).

For these “co-developer” examples, we will use vertex 1 (marko) as the
source of the reasoning process.
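The idea of a rule that turns G into a larger G′ can be sketched in Python with sets of edges. This is an illustration only: the edge list is taken from the figure, and the function name `co_developer_rule` and the tuple encoding are assumptions.

```python
# Edges of G as (tail, label, head), using the names from the figure.
G = {
    ("marko", "knows", "vadas"),
    ("marko", "knows", "josh"),
    ("marko", "created", "lop"),
    ("josh", "created", "lop"),
    ("josh", "created", "ripple"),
    ("peter", "created", "lop"),
}

def co_developer_rule(edges):
    """If a and b both created the same software and a != b,
    infer the new edge a -co-developer-> b."""
    created = [(tail, head) for (tail, label, head) in edges
               if label == "created"]
    return {
        (a, "co-developer", b)
        for (a, software_a) in created
        for (b, software_b) in created
        if software_a == software_b and a != b
    }

G_prime = G | co_developer_rule(G)   # G' keeps every edge of G and adds more
marko_co_developers = sorted(
    head for (tail, label, head) in G_prime
    if tail == "marko" and label == "co-developer"
)
print(marko_co_developers)           # ['josh', 'peter']
```

Here the inferred edges are materialized up front; a path expression instead computes the same adjacency on demand for whichever vertex is the current object.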


The Co-Developers of Marko A. Rodriguez in SPARQL


[Figure: the example property graph annotated with the query variables.
?y is bound to vertex 3 (lop), ?z to vertices 4 (josh) and 6 (peter), and
?x to the names of those vertices.]

SELECT ?x WHERE {
  marko created ?y .
  ?z created ?y .
  ?z != marko .
  ?z name ?x
}

This query would return: josh and peter.




The Co-Developers of Marko A. Rodriguez in Gremlin
[Figure: the example graph (marko, vadas, josh, peter, lop, ripple) with
its knows and created edges, plus inferred co-developer edges.]

gremlin> ./@name
==>marko
gremlin> ./outE[@label='created']/inV/inE[@label='created']/outV[g:except($_)]/@name
==>josh
==>peter


The Co-Developers of Marko A. Rodriguez in Gremlin

./outE[@label='created']/inV/inE[@label='created']/outV[g:except($_)]/@name


1. .: Get the current object(s) (i.e. vertex 1, denoting Marko).
2. outE[@label='created']: Get the outgoing edges of the Marko vertex, where their
   labels equal 'created'.
3. inV: Get the incoming (i.e. head) vertices of those 'created' edges.
4. inE[@label='created']: Get the incoming edges of those vertices, where their
   labels equal 'created'.
5. outV[g:except($_)]: Get the outgoing (i.e. tail) vertices of those 'created' edges,
   where those vertices are not the Marko vertex.
6. @name: Get the name of those non-Marko vertices.
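The six steps can be followed one at a time in a plain Python sketch. The data is copied from the example property graph; the helper names `out_e` and `in_e` are hypothetical, and the list comprehension in step 5 stands in for g:except($_).

```python
VERTICES = {1: "marko", 2: "vadas", 3: "lop", 4: "josh", 5: "ripple", 6: "peter"}
# Each edge is (edge id, tail vertex id, label, head vertex id).
EDGES = [
    (7, 1, "knows", 2), (8, 1, "knows", 4), (9, 1, "created", 3),
    (10, 4, "created", 5), (11, 4, "created", 3), (12, 6, "created", 3),
]

def out_e(vertex_ids, label):
    """Outgoing edges of the given vertices that carry the given label."""
    return [e for e in EDGES if e[1] in vertex_ids and e[2] == label]

def in_e(vertex_ids, label):
    """Incoming edges of the given vertices that carry the given label."""
    return [e for e in EDGES if e[3] in vertex_ids and e[2] == label]

root = [1]                                              # 1. .
created = out_e(root, "created")                        # 2. outE[@label='created']
projects = [e[3] for e in created]                      # 3. inV
creators = in_e(projects, "created")                    # 4. inE[@label='created']
others = [e[1] for e in creators if e[1] not in root]   # 5. outV[g:except($_)]
names = [VERTICES[v] for v in others]                   # 6. @name
print(names)                                            # ['josh', 'peter']
```

Without the exclusion in step 5, vertex 1 would appear in its own result, since Marko is trivially a creator of the software he created.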




Defining Co-Developers in Gremlin


path co-developer
  ./outE[@label='created']/inV/inE[@label='created']/outV[g:except($_)]
  end

Once defined, you can use it like any other path segment.
gremlin> ./co-developer
==>v[4]
==>v[6]
gremlin> ./co-developer/@name
==>josh
==>peter




Defining Co-Developers in Java
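import java.util.ArrayList;
import java.util.List;

// Vertex and Edge are Blueprints types and Path is a Gremlin type;
// their imports are omitted here, as on the original slide.
public class CoDeveloperPath implements Path {
   public List invoke(Object root) {
      if(root instanceof Vertex) {
         // Step 1: collect the projects that the root vertex created.
         List<Vertex> projects = new ArrayList<Vertex>();
         for(Edge edge : ((Vertex)root).getOutEdges()) {
             if(edge.getLabel().equals("created")) {
                projects.add(edge.getInVertex());
             }
         }
         // Step 2: collect the other creators of those projects.
         List<Vertex> coDevelopers = new ArrayList<Vertex>();
         for(Vertex project : projects) {
             for(Edge edge : project.getInEdges()) {
                // equals() rather than != so the check does not rely on object identity
                if(edge.getLabel().equals("created") && !edge.getOutVertex().equals(root)) {
                    coDevelopers.add(edge.getOutVertex());
                }
             }
         }
         return coDevelopers;
      } else {
         return null;
      }
   }
}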



Gremlin Type System

object
  element
    vertex
    edge
  graph
  number
  string
  boolean
  map
  list




Predefined Paths and Properties
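[Figure: a fragment of the example property graph (vertices 1, 3, and 4;
edges 8, 9, and 11) annotated with: vertex 1 out edges, vertex 3 in edges,
edge 9 out vertex, edge 9 in vertex, edge 9 label (created), edge 9 id,
vertex 4 id, and vertex 4 properties (name = "josh", age = 32).]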




object        property   description                             example
graph         V          the vertex iterator of the graph        $g/V
graph         E          the edge iterator of the graph          $g/E
vertex/edge   @id        the identifier of the element           $v/@id
vertex        outE       the outgoing edges of the vertex        $v/outE
vertex        inE        the incoming edges of the vertex        $v/inE
vertex        bothE      both in and out edges of the vertex     $v/bothE
edge          outV       the outgoing tail vertex of the edge    $e/outV
edge          inV        the incoming head vertex of the edge    $e/inV
edge          bothV      both in and out vertices of the edge    $e/bothV
edge          @label     the label of the edge                   $e/@label
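The vertex and edge rows of the table can be read as small functions over an edge list, sketched here in Python. The function names simply transliterate the path names and are not a real API; results are sorted for readability, and the order in which both_v returns its two vertices is an arbitrary choice.

```python
# edge id -> (tail vertex id, label, head vertex id), from the example graph.
EDGES = {
    8: (1, "knows", 4),
    9: (1, "created", 3),
    10: (4, "created", 5),
    11: (4, "created", 3),
}

def out_e(v):
    """outE: ids of the edges leaving vertex v."""
    return sorted(e for e, (tail, _, _) in EDGES.items() if tail == v)

def in_e(v):
    """inE: ids of the edges arriving at vertex v."""
    return sorted(e for e, (_, _, head) in EDGES.items() if head == v)

def both_e(v):
    """bothE: ids of all edges incident to vertex v."""
    return sorted(set(out_e(v)) | set(in_e(v)))

def out_v(e):
    """outV: the tail vertex of edge e."""
    return EDGES[e][0]

def in_v(e):
    """inV: the head vertex of edge e."""
    return EDGES[e][2]

def both_v(e):
    """bothV: both vertices of edge e."""
    return [out_v(e), in_v(e)]

def label(e):
    """@label: the label of edge e."""
    return EDGES[e][1]

print(out_e(1), in_e(3), both_e(4))    # [8, 9] [9, 11] [8, 10, 11]
print(out_v(9), in_v(9), label(9))     # 1 3 created
```

The naming is from the edge's point of view: outV is the vertex the edge comes out of, and inV is the vertex it goes into.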




Predefined Functions

g:assign()        g:remove-idx()           g:list()                 g:sort()                  g:print()
g:assign()        g:load()                 g:dedup()                g:map()                   g:time()
g:unassign()      g:save()                 g:union()                g:keys()                  g:p()
g:id()            g:clear()                g:intersect()            g:values()                g:to-json()
g:key()           g:close()                g:difference()           g:rand-nat()              g:from-json()
g:add-v()         g:keys()                 g:retain()               g:rand-real()             ...
g:add-e()         g:values()               g:except()               g:prob()                  ..
g:remove-ve()     g:map()                  g:remove()               g:cont()                  .
g:idx-all()       g:get()                  g:get()                  g:halt()
g:add-idx()       g:op-value()             g:op-value()             g:type()


There are over 70 predefined functions. See the following for a description of each.
http://wiki.github.com/tinkerpop/gremlin/core-function-library
http://wiki.github.com/tinkerpop/gremlin/gremlin-function-library



Working With Non-Graph Types
gremlin> 1.2 + 6
==>7.2
gremlin> 'this is a string'
==>this is a string
gremlin> true() or false()
==>true
gremlin> g:map('marko','lanl','peter','neotech','josh','rpi')
==>marko=lanl
==>peter=neotech
==>josh=rpi
gremlin> g:list('graphs','hockey','motorcycles',6)
==>graphs
==>hockey
==>motorcycles
==>6.0

Working With Non-Graph Types
gremlin> $m := g:map('hobbies',g:list('hockey','graphs'),
   'location', g:map('state','new mexico', 'city', 'santa fe',
      'zipcode', 87501), 'age', 30)
==>location={zipcode=87501.0, state=new mexico, city=santa fe}
==>age=30.0
==>hobbies=[hockey, graphs]
gremlin> $m/@age
==>30.0
gremlin> $m/@hobbies[2]
==>graphs
gremlin> $m/@location/@city
==>santa fe




Variables

• Variables in Gremlin are prefixed with a $ character.

• There is a collection of reserved variables that all begin with $_.
     $_ is the root list of objects.
     $_last is the last result evaluated by the evaluator.
     $_g is the “working graph” to reduce typing with graph functions.

gremlin> $x := 1
==>1.0
gremlin> $y := 2
==>2.0
gremlin> $x + $y
==>3.0

Language Statements
Variable Assignment

gremlin> $i := 1 + 5
==>6.0
gremlin> $i
==>6.0

If/Else

gremlin> if true()
    $i := 1
  else
    $i := 2
  end
==>1.0

Repeat

gremlin> $i := 0
==>0.0
gremlin> repeat 10
  $i := $i + 1
  end
==>10.0

While

gremlin> $i := ‘g’
==>g
gremlin> while not(matches($i, ‘ggg’))
  $i := concat($i,‘g’)
  end
==>ggg


Language Statements
Foreach

gremlin> $i := 0
==>0.0
gremlin> foreach $j in 1 | 2 | 3
   $i := $i + $j
   end
==>6.0

Path

gremlin> path friend_name
   ./outE[@label=‘knows’]/inV/@name
   end
gremlin> ./friend_name
==>vadas
==>josh

Function

gremlin> func ex:hello($name)
   concat(‘hello ’, $name)
   end
gremlin> ex:hello(‘pavel’)
==>hello pavel

You can define functions and paths in native Gremlin (as demonstrated above) or in Java.


XPath Filters

• Use [ ] filters to filter objects in a path expression (i.e. “such that” or
  “where”)

• The evaluated result of [ ] must be a number or boolean.
      If it is a number, it is treated as the position within an array (i.e. list).
      If it is boolean, it is treated as whether to include or exclude the
      object from the next path in the sequence.

gremlin> ./outE[@label=‘knows’]
==>e[7][1-knows->2]
==>e[8][1-knows->4]
gremlin> ./outE[@label=‘knows’ and @weight>0.5]/inV[@age<21 or @name=‘josh’][true()][1]
==>v[4]
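
The two filter rules above can be mimicked outside Gremlin. The following is a minimal Python sketch, not Gremlin's implementation: `apply_filter` and the sample edge dictionaries are hypothetical names and data chosen for illustration. A numeric filter selects by 1-based position, and a boolean predicate keeps or drops each object.

```python
def apply_filter(objects, condition):
    """Mimic an XPath-style [ ] filter over a list of objects.

    - bool condition: keep everything (True) or nothing (False).
    - int condition: treated as a 1-based position in the list.
    - callable condition: a predicate returning True (keep) or False (drop).
    """
    if isinstance(condition, bool):
        # bool is checked first because bool is a subclass of int in Python
        return list(objects) if condition else []
    if isinstance(condition, int):
        # positions start at 1; an out-of-range position yields nothing
        return [objects[condition - 1]] if 1 <= condition <= len(objects) else []
    return [obj for obj in objects if condition(obj)]


# Hypothetical out-edges of one vertex, loosely modelled on the example above.
edges = [
    {"id": 7, "label": "knows", "weight": 0.5, "in": 2},
    {"id": 8, "label": "knows", "weight": 1.0, "in": 4},
    {"id": 9, "label": "created", "weight": 0.4, "in": 3},
]

knows = apply_filter(edges, lambda e: e["label"] == "knows")
strong = apply_filter(knows, lambda e: e["weight"] > 0.5)
first = apply_filter(strong, 1)
print([e["id"] for e in knows])  # ids of the 'knows' edges
print([e["in"] for e in first])  # head vertex of the first strong edge
```

Chaining calls plays the role of consecutive [ ] filters in a path expression: each filter only sees what the previous one let through.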




Outline

• Introduction to Graphs and Graph Software

• Basic Gremlin Concepts

• Gremlin Language Description

• Advanced Gremlin Concepts

• Conclusions




A Grateful Dead Dataset




2,500 concerts
35,000 songs played
600 songs
30 years
11 members
1 band
... the Grateful Dead.



A Grateful Dead Dataset
• vertices denote songs and artists
     type: “song” or “artist”.
     name: name of song or artist.
     performances: number of times song was played in concert.
     song_type: whether the song was a “cover” or “original”.

• edges denote followed_by, sung_by, written_by
     weight: number of times a song was followed by another song over all concerts played.


Rodriguez, M.A., Gintautas, V., Pepe, A., “A Grateful Dead Analysis: The Relationship Between Concert and Listening Behavior,” First Monday, 14(1), University of Illinois at Chicago Library, http://arxiv.org/abs/0807.2466, January 2009.

NOTE: A portion of the raw dataset courtesy of Mark Leone http://www.cs.cmu.edu/~mleone/gdead/setlists.html



A Grateful Dead Dataset

[Figure: a setlist excerpt and the property graph derived from it.
 Setlist: Stanley Theater, Pittsburgh, PA (11/30/79), 2nd Set: Scarlet Begonias, Fire on the Mountain, Passenger, Terrapin Station, ...
 Graph: song vertices 1 (name="Scarlet.."), 2 (name="Fire on.."), 3 (name="Pass.."), and 4 (name="Terrap..") are chained by followed_by edges with weight=239, weight=1, and weight=2, in that order. Artist vertices 5 (name="Garcia"), 6 (name="Lesh"), and 7 (name="Hunter") are attached to the songs by sung_by and written_by edges.]




A Grateful Dead Dataset – Load Data/Basic Stats

gremlin> g:load(‘data/graph-example-2.xml’)
==>true
gremlin> count($_g/V)
==>809.0
gremlin> count($_g/E)
==>8049.0




A Grateful Dead Dataset – Out-Degree of Each Vertex


gremlin> $degrees := g:map()
gremlin> foreach $v in $_g/V
  $degrees[@name=$v/@name] := count($v/outE)
end




A Grateful Dead Dataset – Out-Degree of Each Vertex

gremlin> g:sort($degrees, ‘value’, true())
==>PLAYING IN THE BAND=96.0
==>SUGAR MAGNOLIA=92.0
==>PROMISED LAND=89.0
==>GOOD LOVING=87.0
==>NOT FADE AWAY=86.0
==>I KNOW YOU RIDER=85.0
==>CASSIDY=83.0
==>DEAL=82.0
==>JACK STRAW=81.0
==>ONE MORE SATURDAY NIGHT=81.0
==>EL PASO=80.0
==>MEXICALI BLUES=79.0
...
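
For comparison, the same out-degree computation can be sketched in Python. This is an illustrative sketch over a hypothetical edge list: the song names and edges below are made up for the example and are not taken from the dataset, and `out_degrees` is an assumed helper name.

```python
from collections import Counter


def out_degrees(edges):
    """Count outgoing edges per tail vertex for (tail, label, head) triples."""
    return Counter(tail for tail, _label, _head in edges)


# Hypothetical toy edges, not the real Grateful Dead graph.
edges = [
    ("SONG A", "followed_by", "SONG B"),
    ("SONG A", "followed_by", "SONG C"),
    ("SONG A", "sung_by", "ARTIST X"),
    ("SONG B", "followed_by", "SONG A"),
]

degrees = out_degrees(edges)
# Sort by value, largest first, mirroring the descending order shown above.
for name, degree in sorted(degrees.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}={degree}")
```

As in the Gremlin version, every edge label counts toward the out-degree. Restricting the count to one label would only require filtering the triples first.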

A Grateful Dead Dataset – Inspecting Single Vertex


gremlin> $v := g:key(‘name’,‘CHINA DOLL’)[1]
==>v[129]
gremlin> g:map($v)
==>name=CHINA DOLL
==>song_type=original
==>performances=114
==>type=song
gremlin> $v/outE[@label=‘sung_by’]/inV/@name
==>Garcia




A Grateful Dead Dataset – Inspecting Single Vertex
gremlin> $v/outE[@label=‘followed_by’]/inV/@name
==>BIG RIVER
==>THROWING STONES
==>SAMSON AND DELILAH
==>TRUCKING
==>CASEY JONES
==>HIGH TIME
...
gremlin> $v/outE[@label=‘followed_by’]/@weight
==>2
==>8
==>1
==>2
==>1
==>1
...

Introduction to PageRank
• The remainder of this section will discuss the PageRank algorithm and
  its application to multi-relational graphs.

• The arguments made and the examples presented generalize to all other
  single-relational graph algorithms. However, for the sake of brevity and
  consistency, only PageRank will be discussed.




Introduction to Matrix-Based PageRank

• PageRank is a centrality measure based on the primary eigenvector
  of a modified version of a graph. Let $A \in \mathbb{R}_+^{|V| \times |V|}$ denote the
  adjacency matrix representing the graph.

• In order to ensure positive real values in the eigenvector, the graph
  must be strongly connected. PageRank induces strong connectivity
  by overlaying a low probability (defined by $\alpha \in [0, 1]$, usually 0.15)
  “teleportation” graph over the original graph. Let $B \in \left\{\frac{1}{|V|}\right\}^{|V| \times |V|}$ denote
  a teleportation adjacency matrix where every vertex is connected to every vertex
  with equal probability.
     $C = (1 - \alpha)A + \alpha B$, where $C \in \mathbb{R}_+^{|V| \times |V|}$.
     $\lambda = \lambda C$, where $\lambda \in \mathbb{R}_+^{|V|}$ is the PageRank vector over $V$.
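
The fixed point λ = λC can be approximated by power iteration. Below is a minimal pure-Python sketch, assuming A is row-normalized so that each row sums to 1 (the slide does not state the normalization) and that every vertex has at least one out-edge. The function name `pagerank_power` and the 3-vertex example graph are hypothetical.

```python
def pagerank_power(adjacency, alpha=0.15, iterations=100):
    """Approximate the left eigenvector of C = (1 - alpha) * A + alpha * B.

    adjacency: square list-of-lists of non-negative weights; every row is
    assumed to have at least one non-zero entry.
    """
    n = len(adjacency)
    # Row-normalize A so each row is a probability distribution.
    a = [[w / sum(row) for w in row] for row in adjacency]
    # C mixes the graph with a uniform teleportation matrix B (entries 1/n).
    c = [[(1 - alpha) * a[i][j] + alpha / n for j in range(n)] for i in range(n)]
    rank = [1.0 / n] * n
    for _ in range(iterations):
        # One step of rank <- rank * C (row vector times matrix).
        rank = [sum(rank[i] * c[i][j] for i in range(n)) for j in range(n)]
    return rank


# Hypothetical 3-vertex graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
graph = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]
ranks = pagerank_power(graph)
print([round(r, 3) for r in ranks])
```

Because every row of C sums to 1, the entries of the rank vector keep summing to 1 across iterations, so no renormalization step is needed in this sketch.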


Introduction to Random Walk-Based PageRank
• PageRank can be implemented by a random walk.

• Create a vertex counter map, m : V → N+.

• Place a walker on a random vertex in V . Denote the walker’s current
  vertex i ∈ V .
 1.   increment the vertex counter by 1 (i.e. m(i) ← m(i) + 1).
 2.   the walker chooses a random adjacent vertex with probability 1 − α.
 3.   the walker chooses a random vertex in V with probability α.
 4.   rinse and repeat until m reaches a stationary probability distribution
      (continually normalize m if you want a probability distribution).

• We will use this random walk model in the Gremlin examples to follow.
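
The walker procedure can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: `alpha` is taken to be the teleportation probability (matching α = 0.15 in the matrix formulation), a walker at a vertex with no out-edges always teleports, and the function name and toy graph are hypothetical.

```python
import random


def random_walk_pagerank(out_neighbors, steps=10000, alpha=0.15, seed=42):
    """Estimate PageRank by counting the visits of a single random walker.

    out_neighbors: dict mapping each vertex to a list of adjacent vertices.
    alpha: probability of teleporting to a uniformly random vertex.
    """
    rng = random.Random(seed)
    vertices = list(out_neighbors)
    counts = {v: 0 for v in vertices}
    current = rng.choice(vertices)
    for _ in range(steps):
        counts[current] += 1  # increment the vertex counter m(i)
        neighbors = out_neighbors[current]
        if neighbors and rng.random() >= alpha:
            current = rng.choice(neighbors)  # follow a random out-edge
        else:
            current = rng.choice(vertices)  # teleport (also used at dead ends)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}  # normalize m


# Hypothetical graph: a -> b, a -> c, b -> c, c -> a.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
scores = random_walk_pagerank(graph)
print(sorted(scores, key=scores.get, reverse=True))
```

A fixed number of steps stands in for "repeat until stationary". In practice one would compare successive normalized counts and stop when they change by less than a tolerance.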


PageRank over Multi-Relational Graphs

• PageRank was designed for single-relational graphs (i.e. where all edges
  have the same meaning).

• In a multi-relational graph, what does it mean to find the centrality
  of a vertex when vertices can be related by various types of edges?
  For example, if there exists “socializes with” and “met once”, then the
  person who “met once” many people could be the most centrally located
  in the graph. Also, what if your graph has more than just “person”-type
  vertices (e.g. cars, pets, buildings, articles, etc.) and “person”-type
  edges (e.g. owns, walks, livesAt, cites, etc.)?




PageRank over Multi-Relational Graphs
• Calculating single-relational PageRank would yield Person as the most central
  vertex.
• You can boolean filter certain edge labels (e.g. ignore type edges; in such cases,
  you would have the centrality scores over the knows social graph).
• However, what if you only wanted to traverse knows edges if and only if the
  adjacent vertex knows more than 10 other people?
• In the end, you want complete control (universal computability) over the paths
  that the traverser/walker can take through a graph.

[Figure: the vertices Herbert, Johan, Marko, Josh, Jen, ... each have a type edge to the vertex Person and are connected to one another by knows edges.]
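
The question about traversing knows edges only toward well-connected people can be made concrete with a small Python sketch. It is an assumption-laden illustration rather than Gremlin code: `knows_popular`, the `min_known` parameter, and the toy graph are hypothetical.

```python
def knows_popular(graph, vertex, min_known=10):
    """Return vertices reachable from `vertex` over 'knows' edges whose
    target itself knows more than `min_known` other people."""

    def knows(v):
        return [head for label, head in graph.get(v, []) if label == "knows"]

    return [head for head in knows(vertex) if len(knows(head)) > min_known]


# Hypothetical graph: vertex -> list of (edge label, head vertex).
graph = {
    "marko": [("type", "Person"), ("knows", "josh"), ("knows", "jen")],
    "josh": [("type", "Person")] + [("knows", f"friend{i}") for i in range(11)],
    "jen": [("type", "Person"), ("knows", "marko")],
}

print(knows_popular(graph, "marko"))  # only josh knows more than 10 people
```

The label test ignores type edges, and the length test is the extra condition that a plain edge-label filter cannot express.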


PageRank over Multi-Relational Graphs
• In multi-relational graphs, the meaning of your graph algorithm’s results is
  defined by your definition of adjacency.
• With respect to random walk-based PageRank, define the path that the walker
  should take. That path is the definition of adjacency.
• The stationary probability distribution created from this walk yields a path-dependent
  centrality.
• Thus, in a multi-relational graph, there are many types of PageRanks that can
  be calculated — one for each type of path defined for a walker.
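
One way to picture path-dependent centrality is to pass the definition of adjacency to the walker as a function. The Python sketch below is illustrative only: `path_rank`, `by_label`, and the toy edge list are hypothetical, and `alpha` is assumed to be the teleportation probability. The same walker yields different visit counts for different path definitions.

```python
import random


def path_rank(vertices, path, steps=5000, alpha=0.15, seed=7):
    """Count walker visits where `path(vertex)` defines adjacency."""
    rng = random.Random(seed)
    counts = {v: 0 for v in vertices}
    current = rng.choice(vertices)
    for _ in range(steps):
        counts[current] += 1
        adjacent = path(current)
        if adjacent and rng.random() >= alpha:
            current = rng.choice(adjacent)  # move along the defined path
        else:
            current = rng.choice(vertices)  # teleport, or escape a dead end
    return counts


# Hypothetical multi-relational graph: (tail, label, head) triples.
edges = [
    ("a", "knows", "b"), ("b", "knows", "a"), ("c", "knows", "a"),
    ("a", "cites", "c"), ("b", "cites", "c"),
]
vertices = ["a", "b", "c"]


def by_label(label):
    """Build a path function that follows only edges with the given label."""
    return lambda v: [head for tail, lab, head in edges if tail == v and lab == label]


knows_rank = path_rank(vertices, by_label("knows"))
cites_rank = path_rank(vertices, by_label("cites"))
print(knows_rank)
print(cites_rank)
```

Here vertex c is rarely visited under the knows path but is the most visited under the cites path, so "most central" depends on which path the walker is given.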


Rodriguez, M.A., “Grammar-Based Random Walkers in Semantic Networks”, Knowledge-Based Systems,
21(7), 727–739, http://arxiv.org/abs/0803.4355, October 2008.




PageRank over “Garcia Followed By” SubGraph

• Define a path that will go from song-to-song by “followed by” edges and
  only traverse songs that are “sung by” Jerry Garcia.

(./outE[@label=‘followed_by’]/inV/outE[@label=‘sung_by’]
         /inV[@name=‘Garcia’]/../..)[g:rand-nat()]

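[Figure: the path drawn as four steps from the current vertex (.). A: follow followed_by edges. B: arrive at the following songs. C: follow their sung_by edges. D: keep only artists with name="Garcia" (a branch ending at name="Weir" is dropped). /../.. then steps back to the songs, and g:rand-nat() selects one of them at random.]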




PageRank over “Garcia Followed By” SubGraph
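path garcia-followed_by
   # follow followed_by to the next songs, keep those sung by Garcia,
   # step back to the song vertices, and pick one at random
   (./outE[@label=‘followed_by’]/inV/outE[@label=‘sung_by’]
         /inV[@name=‘Garcia’]/../..)[g:rand-nat()]
   end

$m := g:map()
$alpha := 0.15
# start the walker on a random song
$_ := g:key(‘type’, ‘song’)[g:rand-nat()]
repeat 2500
  $_ := ./garcia-followed_by
  if count($_) > 0
    # count the visit to the song the walker landed on
    g:op-value(‘+’,$m,$_[1]/@name, 1.0)
  end
  # teleport with probability $alpha, or when the walker is stuck
  if g:rand-real() < $alpha or count($_) = 0
    $_ := g:key(‘type’, ‘song’)[g:rand-nat()]
  end
end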
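
The path step itself can be sketched in Python to show what one application does. This is a hypothetical illustration over made-up toy data, not the real dataset or Gremlin's evaluator: `out` and `garcia_followed_by` are assumed helper names.

```python
import random

# Hypothetical toy data: song -> [(edge label, head vertex)].
graph = {
    "SONG A": [("followed_by", "SONG B"), ("followed_by", "SONG C"), ("sung_by", "Garcia")],
    "SONG B": [("followed_by", "SONG A"), ("sung_by", "Garcia")],
    "SONG C": [("followed_by", "SONG A"), ("sung_by", "Weir")],
}


def out(vertex, label):
    """Heads of the out-edges of `vertex` carrying `label`."""
    return [head for edge_label, head in graph.get(vertex, []) if edge_label == label]


def garcia_followed_by(song, rng=random):
    """Pick one random song that follows `song` and is sung by Garcia, or None."""
    candidates = [s for s in out(song, "followed_by") if "Garcia" in out(s, "sung_by")]
    return rng.choice(candidates) if candidates else None


print(garcia_followed_by("SONG A"))  # SONG C is excluded: it is sung by Weir
```

Returning None when there are no candidates corresponds to the count($_) = 0 case in the Gremlin loop, which is where the walker teleports to a random song.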

PageRank over “Garcia Followed By” SubGraph
gremlin> g:sort($m,‘value’,true())
==>CRAZY FINGERS=98.0
==>HES GONE=85.0
==>CHINA CAT SUNFLOWER=79.0
==>BERTHA=76.0
==>UNCLE JOHNS BAND=74.0
==>TERRAPIN STATION=72.0
==>GOING DOWN THE ROAD FEELING BAD=71.0
==>WHARF RAT=71.0
==>EYES OF THE WORLD=65.0
==>COLD RAIN AND SNOW=62.0
==>SHIP OF FOOLS=58.0
==>RAMBLE ON ROSE=53.0
==>CASEY JONES=51.0
==>DARK STAR=47.0
==>DEAL=46.0
...

Universal Computation in Paths
path path-name
  # any arbitrary computation can occur here
  end

• A path definition can be used to define adjacencies.
    adjacency can be expressed as anything that can be computed by a Turing machine.
    path definitions are used to create “semantically meaningful” results from single-
    relational graph algorithms applied to multi-relational graphs.
    path definitions make explicit what is implicit in the structure of the graph. This
    has applications to knowledge-based reasoning.
• A path definition can perform any arbitrary computation.
    path definitions can check/set vertex/edge properties.
    path definitions can create new vertices and edges.
    path definitions can call/define functions.

This allows fine-grained control over how your traverser/walker moves through a graph.


Outline

• Introduction to Graphs and Graph Software

• Basic Gremlin Concepts

• Gremlin Language Description

• Advanced Gremlin Concepts

• Conclusions




The Current Gremlin Ecosystem
• Webling: Web console for Gremlin
  (developed by Pavel Yaskevich w/ funding from Neo Technology)

• Project Gargamel: Distributed Graph Computing
  (uses Linked Process and Gremlin)

• ReXster: A Graph-Based Recommender Engine




Thank You
Please enjoy Gremlin at http://gremlin.tinkerpop.com ...




My homepage is http://markorodriguez.com.
Please feel free to contact me with any questions or comments.





Gremlin: A Graph-Based Programming Language

  • 1. Gremlin G = (V, E) A Graph-Based Programming Language Marko A. Rodriguez T-5, Center for Nonlinear Studies Los Alamos National Laboratory http://markorodriguez.com http://gremlin.tinkerpop.com February 25, 2010
  • 2. Abstract Gremlin is a Turing-complete, graph-based programming language developed for key/value-pair multi-relational graphs called property graphs. Gremlin makes extensive use of XPath 1.0 to support complex graph traversals. Connectors exist to various graph databases and frameworks. This language has application in the areas of graph query, analysis, and manipulation. Center for Nonlinear Studies PostDoc Seminar – Los Alamos National Laboratory – February 25, 2010
  • 3. Acknowledgements • Marko A. Rodriguez [http://markorodriguez.com] designed, developed, tested, and documented Gremlin. • Peter Neubauer [http://www.linkedin.com/in/neubauer] aided in the design and the evangelizing of Gremlin. • Pavel Yaskevich [http://github.com/xedin] aided in the development of user defined functions in Gremlin. • Joshua Shinavier [http://fortytwo.net] provided initial conceptual support for Gremlin. • Ketrina Yim [http://csillustrated.berkeley.edu] designed the logo for Gremlin. • Gremlin-Users Group [http://groups.google.com/group/gremlin-users] provided much direction in the design and implementation of Gremlin. Center for Nonlinear Studies PostDoc Seminar – Los Alamos National Laboratory – February 25, 2010
  • 4. Outline • Introduction to Graphs and Graph Software • Basic Gremlin Concepts • Gremlin Language Description • Advanced Gremlin Concepts • Conclusions Center for Nonlinear Studies PostDoc Seminar – Los Alamos National Laboratory – February 25, 2010
  • 5. Outline • Introduction to Graphs and Graph Software • Basic Gremlin Concepts • Gremlin Language Description • Advanced Gremlin Concepts • Conclusions Center for Nonlinear Studies PostDoc Seminar – Los Alamos National Laboratory – February 25, 2010
  • 6. What is a Graph? • A graph (network) is composed of a collection of vertices (dots) and edges (lines). There are many types of graphs: directed/undirected, weighted, attributed, etc. vertex-labeled a hyper d edge-attributed ed bele ht e-la multi ig edgknows created=2-01-09 we 0.2 modified=2-11-09 cted tic undire di an re ct m hired ed se reg ge ula half-ed r pseudo http://ex.com/123 type="person" name="emil" resource description framework vertex-attributed Center for Nonlinear Studies PostDoc Seminar – Los Alamos National Laboratory – February 25, 2010
  • 7. Why Use a Graph? • A graph is a very general data structure that can be used to model various systems. A graph can model the structure of transportation, technological, bibliographic, etc. systems. A graph can model a list, a map, a tree, etc. • There are numerous graph algorithms that are defined independent of the domain of the graph model. • There are numerous graph databases, frameworks, packages, etc. that aid in the creation, manipulation, and analysis of graphs. Center for Nonlinear Studies PostDoc Seminar – Los Alamos National Laboratory – February 25, 2010
  • 8. Graph Databases, Frameworks, and Packages
      • Neo4j Graph Database [http://neo4j.org]
      • AllegroGraph Quad Store [http://www.franz.com/agraph]
      • HyperGraphDB [http://www.kobrix.com/hgdb.jsp]
      • Java Universal Network/Graph Framework [http://jung.sourceforge.net]
      • OpenRDF Sesame Framework [http://www.openrdf.org]
      • InfoGrid Graph Database [http://infogrid.org]
      • Filament Graph Toolkit [http://filament.sourceforge.net]
      • OWLim Semantic Repository [http://www.ontotext.com/owlim]
      • Sones Graph Database [http://www.sones.com]
      • NetworkX Graph Toolkit [http://networkx.lanl.gov]
      • iGraph Toolkit [http://igraph.sourceforge.net]
      • Blueprints Graph API [http://blueprints.tinkerpop.com]
      • ... and many more.
  • 9. What Makes Gremlin Different? • Gremlin is a domain specific language for working with graphs. • Gremlin is not an application programming interface (API). • Gremlin makes use of various graph databases, frameworks, packages. • Gremlin is a language that currently has a virtual machine implementation written in Java. • What can be succinctly expressed in Gremlin is verbose/clumsy to express in general purpose languages such as Java, Python, Ruby, etc. • Gremlin allows one to map single-relational graph analysis algorithms over to the multi-relational domain.
  • 10. Single-Relational Graphs
      • In single-relational graphs, all edges have the same meaning (e.g. all edges are either friendship, kinship, worksWith, knows, etc.).
        G = (V, E ⊆ (V × V))
      • Most graph algorithms are defined for single-relational graphs (e.g. centrality/ranking, clustering/community detection, etc.).
      [Figure: three vertices, person-a, person-b, and person-c, joined by directed edges of a single type.]
      NOTE: These types of graphs are also known as directed, vertex-labeled graphs.
  • 11. Multi-Relational Graphs
      • In multi-relational graphs, edges can have different meanings.
        G = (V, E ⊆ (V × V), ω : E → Σ∗)
      • Most graph software is designed for multi-relational graphs (e.g. arbitrary objects as vertices and edges, knowledge-based reasoning systems, etc.).
      [Figure: vertices person-a, book-b, and book-c joined by directed edges labeled read, cites, and authored.]
      NOTE: These types of graphs are also known as directed, vertex/edge-labeled graphs.
  • 12. Gremlin and Multi-Relational Graphs • Gremlin provides a means to elegantly map single-relational graph analysis algorithms over to the multi-relational graph domain. • Gremlin provides an elegant way to do automated reasoning in multi-relational graphs using path expressions. These two points form the primary thesis of this presentation. Rodriguez, M.A., Shinavier, J., “Exposing Multi-Relational Networks to Single-Relational Network Analysis Algorithms,” Journal of Informetrics, 4(1), 29–41, doi:10.1016/j.joi.2009.06.004, LA-UR-08-03931, http://arxiv.org/abs/0806.2274, December 2009.
  • 13. Property Graphs
      • Gremlin works with a type of multi-relational graph called a property graph.
          • Vertices and edges are labeled with unique identifiers.
          • Edges are directed, labeled, and can form loops.
          • Multiple edges of the same label can exist for the same vertex pair.
          • Vertices and edges can have any number of key/value pair properties/attributes.
      Property graphs are a relatively general graph structure that can be constrained to model other graph structures, though a property-based hypergraph would be the most general (see HyperGraphDB and the JUNG API).
  • 14. Property Graphs
      [Figure: the example property graph, listed here as vertices and edges.]
      vertex  properties
      1       name = "marko", age = 29
      2       name = "vadas", age = 27
      3       name = "lop", lang = "java"
      4       name = "josh", age = 32
      5       name = "ripple", lang = "java"
      6       name = "peter", age = 35
      edge  out vertex  label    in vertex  weight
      7     1           knows    2          0.5
      8     1           knows    4          1.0
      9     1           created  3          0.4
      10    4           created  5          1.0
      11    4           created  3          0.4
      12    6           created  3          0.2
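The property graph model described on the two slides above can be sketched in a few lines of Python. This is an illustrative sketch only, not how Gremlin or Blueprints store graphs; the names `PropertyGraph`, `add_vertex`, `add_edge`, `out_edges`, `in_edges`, and `toy_graph` are hypothetical, and the data is assumed to be the six-vertex example graph pictured on this slide.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: int
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    id: int
    out_v: int      # tail vertex id (the edge leaves this vertex)
    label: str
    in_v: int       # head vertex id (the edge enters this vertex)
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Directed, labeled multigraph with key/value properties on vertices and edges."""

    def __init__(self):
        self.vertices = {}
        self.edges = {}

    def add_vertex(self, vertex_id, **properties):
        self.vertices[vertex_id] = Vertex(vertex_id, dict(properties))

    def add_edge(self, edge_id, out_v, label, in_v, **properties):
        # Edges are keyed by their own id, so several edges with the same
        # label may connect the same vertex pair.
        self.edges[edge_id] = Edge(edge_id, out_v, label, in_v, dict(properties))

    def out_edges(self, vertex_id):
        return [e for e in self.edges.values() if e.out_v == vertex_id]

    def in_edges(self, vertex_id):
        return [e for e in self.edges.values() if e.in_v == vertex_id]


def toy_graph():
    g = PropertyGraph()
    g.add_vertex(1, name="marko", age=29)
    g.add_vertex(2, name="vadas", age=27)
    g.add_vertex(3, name="lop", lang="java")
    g.add_vertex(4, name="josh", age=32)
    g.add_vertex(5, name="ripple", lang="java")
    g.add_vertex(6, name="peter", age=35)
    g.add_edge(7, 1, "knows", 2, weight=0.5)
    g.add_edge(8, 1, "knows", 4, weight=1.0)
    g.add_edge(9, 1, "created", 3, weight=0.4)
    g.add_edge(10, 4, "created", 5, weight=1.0)
    g.add_edge(11, 4, "created", 3, weight=0.4)
    g.add_edge(12, 6, "created", 3, weight=0.2)
    return g


g = toy_graph()
print(sorted(e.id for e in g.out_edges(1)))
```

Keying edges by their own identifier, rather than by the vertex pair, is what lets this structure hold multiple same-labeled edges between two vertices.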
  • 15. Outline • Introduction to Graphs and Graph Software • Basic Gremlin Concepts • Gremlin Language Description • Advanced Gremlin Concepts • Conclusions
  • 16. Gremlin System Architecture
      • The Gremlin console is a scripting environment which allows for the dynamic evaluation of Gremlin code.
      • Gremlin implements JSR 223, which allows Gremlin to also be used within the Java language and thus, as a virtual machine directly accessible to Java applications. Popular JSR 223 implementations include Jython, JRuby, and Groovy. For a fine list of implementations see https://scripting.dev.java.net.
      • Blueprints is a set of interfaces for abstract data structures such as graphs and documents. Implementations of these interfaces exist for various data management systems.
      • There exist many graph data management systems that span various graph data models (e.g. edge labeled graphs, RDF graphs, hypergraphs, etc.).
      [Figure: architecture stack with the Gremlin Console and Gremlin ScriptEngine at the top and the graph data stores Neo4j, NativeStore, and TinkerGraph at the bottom.]
  • 17. “Hello World” in the Gremlin Console
      marko$ ./gremlin.sh

               \,,,/
               (o o)
      -----oOOo-(_)-oOOo-----
      gremlin>
      gremlin> concat('goodbye', ' ', 'self')
      ==>goodbye self
  • 18. Simple Traversals in Gremlin
      gremlin> $_ := g:key('name','marko')
      ==>v[1]
      gremlin> .
      ==>v[1]
      gremlin> ./outE
      ==>e[7][1-knows->2]
      ==>e[9][1-created->3]
      ==>e[8][1-knows->4]
      gremlin> ./outE/@weight
      ==>0.5
      ==>0.4
      ==>1.0
      ./outE/@weight: “Get the current object(s). Then get the outgoing edges of those objects. Then get the weights of those edges.”
      $_ is a reserved variable meaning the root list of objects.
  • 19. Simple Traversals in Gremlin
      gremlin> .
      ==>v[1]
      gremlin> ./outE[@label='created']/inV
      ==>v[3]
      gremlin> $_ := $_last
      ==>v[3]
      gremlin> ./@name
      ==>lop
      gremlin> g:map(.)
      ==>name=lop
      ==>lang=java
      ./outE[@label='created']/inV: “Get the current object(s). Then get the outgoing edges of those objects, where their labels equal 'created'. Then get the incoming vertices of those 'created' edges.”
      $_last is a reserved variable meaning the last value evaluated.
  • 20. Simple Traversals in Gremlin
      gremlin> ./outE[@label='knows']/inV[matches(@name,'va.{3}') and @age > 21]/@name
      ==>vadas
  • 21. Simple Traversals in Gremlin
      ./outE[@label='knows']/inV[matches(@name,'va.{3}') and @age > 21]/@name
      1. .: Get the current object(s).
      2. outE[@label='knows']: Get the outgoing edges of the current object(s), where their labels equal 'knows'.
      3. inV[matches(@name,'va.{3}') and @age > 21]: Get the incoming vertices of those 'knows' edges, where the names of those vertices are 5 characters long, start with 'va', and whose age is greater than 21.
      4. @name: Get the name of those particular incoming vertices.
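The four steps above can be mirrored in a general-purpose language to see what the path expression does. The following is a minimal Python sketch, not Gremlin's evaluator: the data is a subset of the example graph, the helper names `out_edges` and `traverse` are hypothetical, and `matches()` is assumed to test the whole string (which is what the "5 characters long" reading on the slide implies).

```python
import re

# Vertex id -> properties (a subset of the example property graph).
vertices = {
    1: {"name": "marko", "age": 29},
    2: {"name": "vadas", "age": 27},
    3: {"name": "lop", "lang": "java"},
    4: {"name": "josh", "age": 32},
}

# Edges as (edge id, out vertex, label, in vertex).
edges = [
    (7, 1, "knows", 2),
    (8, 1, "knows", 4),
    (9, 1, "created", 3),
]


def out_edges(vertex_id, label):
    """Step 2: outgoing edges of a vertex, kept only if the label matches."""
    return [e for e in edges if e[1] == vertex_id and e[2] == label]


def traverse(start):
    names = []
    for _edge_id, _out_v, _label, in_v in out_edges(start, "knows"):
        props = vertices[in_v]
        # Step 3: filter the head vertices by name pattern and age.
        # Assumption: the pattern must match the entire name.
        if re.fullmatch(r"va.{3}", props["name"]) and props.get("age", 0) > 21:
            # Step 4: project the name property.
            names.append(props["name"])
    return names


result = traverse(1)  # Step 1: the current object is vertex 1 (marko).
print(result)
```

The single Gremlin line replaces the explicit loop, the label test, and the property lookups, which is the succinctness argument made earlier in the talk.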
  • 22. Knowledge-Based Reasoning
      • Blueprints implements the Sesame SAIL interfaces and thus, Gremlin can be used over the many Resource Description Framework (RDF) triple/quad stores. In such cases, RDF is modeled as a property graph where the named graph component is the @ng edge property.
      • Gremlin makes use of the Sesame SAIL SPARQL engine to allow for queries based on graph-pattern matching.
        gremlin> sail:sparql('SELECT ?x ?y WHERE { ?x foaf:knows ?y }')
        ==>{y=v[http://ex.com#2], x=v[http://ex.com#1]}
        ==>{y=v[http://ex.com#4], x=v[http://ex.com#1]}
      • Gremlin is useful for knowledge-based reasoning using path expressions.
  • 23. Reasoning as Defining New Types of Adjacency
      • Graph-based reasoning is the process of making explicit what is implicit in the graph.
      • A reasoner takes a graph G and a collection of graph-patterns (i.e. transformation/rewrite rules) and creates a new graph G′ (usually, G ⊂ G′). G′ has new relationships/edges and thus, new definitions of vertex adjacency.
      • Example: The co-developers of person A are those people who have created the same software as person A and who are not themselves person A (since person A has, trivially, created the same software as him or herself).
      For these “co-developer” examples, we will use vertex 1 (marko) as the source of the reasoning process.
      [Figure: the example graph (marko, vadas, josh, peter, lop, ripple with knows and created edges) with inferred co-developer edges added.]
  • 24. The Co-Developers of Marko A. Rodriguez in SPARQL
      SELECT ?x WHERE {
        marko created ?y .
        ?z created ?y .
        FILTER (?z != marko) .
        ?z name ?x
      }
      This query would return: josh and peter.
      [Figure: the query pattern overlaid on the example graph, with the variables ?y, ?z, and ?x marked on the vertices and properties they bind to.]
  • 25. The Co-Developers of Marko A. Rodriguez in Gremlin
      gremlin> ./@name
      ==>marko
      gremlin> ./outE[@label='created']/inV/inE[@label='created']/outV[g:except($_)]/@name
      ==>josh
      ==>peter
      [Figure: the example graph with co-developer edges added between the people who created lop.]
  • 26. The Co-Developers of Marko A. Rodriguez in Gremlin
      ./outE[@label='created']/inV/inE[@label='created']/outV[g:except($_)]/@name
      1. .: Get the current object(s) (i.e. vertex 1, denoting Marko).
      2. outE[@label='created']: Get the outgoing edges of the Marko vertex, where their labels equal 'created'.
      3. inV: Get the incoming (i.e. head) vertices of those 'created' edges.
      4. inE[@label='created']: Get the incoming edges of those vertices, where their labels equal 'created'.
      5. outV[g:except($_)]: Get the outgoing (i.e. tail) vertices of those 'created' edges, where those vertices are not the Marko vertex.
      6. @name: Get the name of those non-Marko vertices.
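The six steps above amount to a two-hop walk (out along 'created', then back in along 'created') followed by excluding the starting vertex. A minimal Python sketch of that reasoning is given here under the assumption that the example graph's edges are as listed; the function name `co_developers` and the flat edge-tuple layout are hypothetical choices made for illustration.

```python
# Vertex id -> name, and edges as (out vertex, label, in vertex),
# taken from the example property graph.
names = {1: "marko", 2: "vadas", 3: "lop", 4: "josh", 5: "ripple", 6: "peter"}
edges = [
    (1, "knows", 2),
    (1, "knows", 4),
    (1, "created", 3),
    (4, "created", 5),
    (4, "created", 3),
    (6, "created", 3),
]


def co_developers(root):
    # Steps 2 and 3: heads of the root's outgoing 'created' edges.
    projects = [in_v for out_v, label, in_v in edges
                if out_v == root and label == "created"]
    found = []
    for project in projects:
        # Steps 4 and 5: tails of the project's incoming 'created' edges,
        # excluding the root itself (the role played by g:except($_)).
        for out_v, label, in_v in edges:
            if in_v == project and label == "created" and out_v != root:
                found.append(out_v)
    return found


marko_co_developers = co_developers(1)
# Step 6: project the name property.
print([names[v] for v in marko_co_developers])
```

Calling `co_developers` on every person vertex and storing the results as new edges would materialize the inferred graph G′ described on the reasoning slide.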
  • 27. Defining Co-Developers in Gremlin
      path co-developer
        ./outE[@label='created']/inV/inE[@label='created']/outV[g:except($_)]
      end
      Once defined, you can use it like any other path segment.
      gremlin> ./co-developer
      ==>v[4]
      ==>v[6]
      gremlin> ./co-developer/@name
      ==>josh
      ==>peter
  • 28. Defining Co-Developers in Java
      import java.util.ArrayList;
      import java.util.List;
      // Path, Vertex, and Edge come from the Gremlin and Blueprints libraries
      // (their imports are not shown on the slide).

      public class CoDeveloperPath implements Path {
        public List invoke(Object root) {
          if (root instanceof Vertex) {
            // Collect the projects the root vertex has created.
            List<Vertex> projects = new ArrayList<Vertex>();
            for (Edge edge : ((Vertex) root).getOutEdges()) {
              if (edge.getLabel().equals("created")) {
                projects.add(edge.getInVertex());
              }
            }
            // Collect everyone else who created one of those projects.
            List<Vertex> coDevelopers = new ArrayList<Vertex>();
            for (Vertex project : projects) {
              for (Edge edge : project.getInEdges()) {
                if (edge.getLabel().equals("created") && edge.getOutVertex() != root) {
                  coDevelopers.add(edge.getOutVertex());
                }
              }
            }
            return coDevelopers;
          } else {
            return null;
          }
        }
      }
  • 29. Outline • Introduction to Graphs and Graph Software • Basic Gremlin Concepts • Gremlin Language Description • Advanced Gremlin Concepts • Conclusions
  • 30. Gremlin Type System
      [Figure: type hierarchy rooted at object. The types shown beneath object are element, graph, number, string, boolean, map, and list; vertex and edge are the two kinds of element.]
  • 31. Predefined Paths and Properties
      [Figure: annotated fragment of the example graph showing vertex 1's out edges, vertex 3's in edges, edge 9's id, label, out vertex, and in vertex, and vertex 4's id and properties (name = "josh", age = 32).]
      object       property  description                               example
      graph        V         the vertex iterator of the graph          $g/V
      graph        E         the edge iterator of the graph            $g/E
      vertex/edge  @id       the identifier of the element             $v/@id
      vertex       outE      the outgoing edges of the vertex          $v/outE
      vertex       inE       the incoming edges of the vertex          $v/inE
      vertex       bothE     both in and out edges of the vertex       $v/bothE
      edge         outV      the outgoing tail vertex of the edge      $e/outV
      edge         inV       the incoming head vertex of the edge      $e/inV
      edge         bothV     both in and out vertices of the edge      $e/bothV
      edge         @label    the label of the edge                     $e/@label
  • 32. Predefined Functions
      g:assign()     g:remove-idx()  g:list()        g:sort()       g:print()
      g:assign()     g:load()        g:dedup()       g:map()        g:time()
      g:unassign()   g:save()        g:union()       g:keys()       g:p()
      g:id()         g:clear()       g:intersect()   g:values()     g:to-json()
      g:key()        g:close()       g:difference()  g:rand-nat()   g:from-json()
      g:add-v()      g:keys()        g:retain()      g:rand-real()  ...
      g:add-e()      g:values()      g:except()      g:prob()
      g:remove-ve()  g:map()         g:remove()      g:cont()
      g:idx-all()    g:get()         g:get()         g:halt()
      g:add-idx()    g:op-value()    g:op-value()    g:type()
      There are over 70 predefined functions. See the following for a description of each.
      http://wiki.github.com/tinkerpop/gremlin/core-function-library
      http://wiki.github.com/tinkerpop/gremlin/gremlin-function-library
  • 33. Working With Non-Graph Types
      gremlin> 1.2 + 6
      ==>7.2
      gremlin> 'this is a string'
      ==>this is a string
      gremlin> true() or false()
      ==>true
      gremlin> g:map('marko','lanl','peter','neotech','josh','rpi')
      ==>marko=lanl
      ==>peter=neotech
      ==>josh=rpi
      gremlin> g:list('graphs','hockey','motorcycles',6)
      ==>graphs
      ==>hockey
      ==>motorcycles
      ==>6.0
  • 34. Working With Non-Graph Types
      gremlin> $m := g:map('hobbies',g:list('hockey','graphs'),
                 'location', g:map('state','new mexico', 'city', 'santa fe', 'zipcode', 87501),
                 'age', 30)
      ==>location={zipcode=87501.0, state=new mexico, city=santa fe}
      ==>age=30.0
      ==>hobbies=[hockey, graphs]
      gremlin> $m/@age
      ==>30.0
      gremlin> $m/@hobbies[2]
      ==>graphs
      gremlin> $m/@location/@city
      ==>santa fe
  • 35. Variables
      • Variables in Gremlin are prefixed with a $ character.
      • There is a collection of reserved variables that all begin with $_.
          • $_ is the root list of objects.
          • $_last is the last result evaluated by the evaluator.
          • $_g is the “working graph” to reduce typing with graph functions.
      gremlin> $x := 1
      ==>1.0
      gremlin> $y := 2
      ==>2.0
      gremlin> $x + $y
      ==>3.0
  • 36. Language Statements
      Variable Assignment
        gremlin> $i := 1 + 5
        ==>6.0
        gremlin> $i
        ==>6.0
      Repeat
        gremlin> $i := 0
        ==>0.0
        gremlin> repeat 10
                   $i := $i + 1
                 end
        ==>10.0
      If/Else
        gremlin> if true()
                   $i := 1
                 else
                   $i := 2
                 end
        ==>1.0
      While
        gremlin> $i := 'g'
        ==>g
        gremlin> while not(matches($i, 'ggg'))
                   $i := concat($i,'g')
                 end
        ==>ggg
  • 37. Language Statements
      Foreach
        gremlin> $i := 0
        ==>0.0
        gremlin> foreach $j in 1 | 2 | 3
                   $i := $i + $j
                 end
        ==>6.0
      Path
        gremlin> path friend_name
                   ./outE[@label='knows']/inV/@name
                 end
        gremlin> ./friend_name
        ==>vadas
        ==>josh
      Function
        gremlin> func ex:hello($name)
                   concat('hello ', $name)
                 end
        gremlin> ex:hello('pavel')
        ==>hello pavel
      You can define functions and paths in native Gremlin (as demonstrated above) or in Java.
  • 38. XPath Filters
      • Use [ ] filters to filter objects in a path expression (i.e. “such that” or “where”).
      • The evaluated result of [ ] must be a number or boolean. If it is a number, it is treated as the position within an array (i.e. list). If it is a boolean, it is treated as whether to include or exclude the object from the next path in the sequence.
      gremlin> ./outE[@label='knows']
      ==>e[7][1-knows->2]
      ==>e[8][1-knows->4]
      gremlin> ./outE[@label='knows' and @weight>0.5]/inV[@age<21 or @name='josh'][true()][1]
      ==>v[4]
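The two filter behaviours described above (a number selects by position, a boolean includes or excludes) can be sketched as a single helper. This Python sketch is an illustration of the rule as stated on the slide, not Gremlin's implementation; `apply_filter` and the dictionary layout are hypothetical, and positions are assumed to be 1-based as in XPath.

```python
def apply_filter(items, predicate):
    """Apply one [ ] filter to a list of objects.

    predicate(item) must return a boolean (include/exclude the item)
    or a number (keep only the item at that 1-based position).
    """
    kept = []
    for position, item in enumerate(items, start=1):
        outcome = predicate(item)
        # bool is checked first because Python treats True/False as integers.
        if isinstance(outcome, bool):
            if outcome:
                kept.append(item)
        elif isinstance(outcome, (int, float)):
            if position == outcome:
                kept.append(item)
        else:
            raise TypeError("a filter must evaluate to a number or a boolean")
    return kept


# The 'knows' edges of vertex 1 and their head vertices from the example graph.
vertices = {2: {"name": "vadas", "age": 27}, 4: {"name": "josh", "age": 32}}
knows_edges = [
    {"id": 7, "label": "knows", "weight": 0.5, "in_v": 2},
    {"id": 8, "label": "knows", "weight": 1.0, "in_v": 4},
]

# ./outE[@label='knows' and @weight>0.5]
heavy = apply_filter(knows_edges, lambda e: e["label"] == "knows" and e["weight"] > 0.5)
# /inV[@age<21 or @name='josh']
heads = [e["in_v"] for e in heavy]
matching = apply_filter(heads, lambda v: vertices[v]["age"] < 21 or vertices[v]["name"] == "josh")
# [true()] keeps everything; [1] keeps the first remaining object.
unchanged = apply_filter(matching, lambda v: True)
first = apply_filter(unchanged, lambda v: 1)
print(first)
```

Each filter in a chain sees only what the previous one let through, which is why `[1]` at the end refers to the first surviving vertex rather than the first vertex overall.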
  • 39. Outline • Introduction to Graphs and Graph Software • Basic Gremlin Concepts • Gremlin Language Description • Advanced Gremlin Concepts • Conclusions
  • 40. A Grateful Dead Dataset
      2,500 concerts
      35,000 songs played
      600 songs
      30 years
      11 members
      1 band ... the Grateful Dead.
  • 41. A Grateful Dead Dataset
      • Vertices denote songs and artists.
          • type: “song” or “artist”.
          • name: name of song or artist.
          • performances: number of times the song was played in concert.
          • song_type: whether the song was a “cover” or “original”.
      • Edges denote followed_by, sung_by, and written_by relationships.
          • weight: number of times a song was followed by another song over all concerts played.
      Rodriguez, M.A., Gintautas, V., Pepe, A., “A Grateful Dead Analysis: The Relationship Between Concert and Listening Behavior,” First Monday, 14(1), University of Illinois at Chicago Library, http://arxiv.org/abs/0807.2466, January 2009.
      NOTE: A portion of the raw dataset courtesy of Mark Leone http://www.cs.cmu.edu/~mleone/gdead/setlists.html
  • 42. A Grateful Dead Dataset
      [Figure: a set list (Stanley Theater, Pittsburgh, PA, 11/30/79, 2nd Set: Scarlet Begonias, Fire on the Mountain, Passenger, Terrapin Station, ...) shown beside its graph encoding. Song vertices (type="song"; names "Scarlet..", "Fire on..", "Pass..", "Terrap..") are chained by followed_by edges with weights 239, 1, and 2, and are linked by sung_by and written_by edges to artist vertices (type="artist"; names "Hunter", "Garcia", "Lesh").]
  • 43. A Grateful Dead Dataset – Load Data/Basic Stats
      gremlin> g:load('data/graph-example-2.xml')
      ==>true
      gremlin> count($_g/V)
      ==>809.0
      gremlin> count($_g/E)
      ==>8049.0
  • 44. A Grateful Dead Dataset – Out-Degree of Each Vertex
      gremlin> $degrees := g:map()
      gremlin> foreach $v in $_g/V
                 $degrees[@name=$v/@name] := count($v/outE)
               end
  • 45. A Grateful Dead Dataset – Out-Degree of Each Vertex
      gremlin> g:sort($degrees, 'value', true())
      ==>PLAYING IN THE BAND=96.0
      ==>SUGAR MAGNOLIA=92.0
      ==>PROMISED LAND=89.0
      ==>GOOD LOVING=87.0
      ==>NOT FADE AWAY=86.0
      ==>I KNOW YOU RIDER=85.0
      ==>CASSIDY=83.0
      ==>DEAL=82.0
      ==>JACK STRAW=81.0
      ==>ONE MORE SATURDAY NIGHT=81.0
      ==>EL PASO=80.0
      ==>MEXICALI BLUES=79.0
      ...
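The out-degree computation on the two slides above builds a name-to-count map and then sorts it by value in descending order. The same pattern is sketched below in Python. Because the Grateful Dead data file is not reproduced here, the sketch substitutes the small example property graph from earlier in the talk; the variable names are hypothetical.

```python
# Vertex id -> name, and edges as (out vertex, label, in vertex).
# Assumption: the six-vertex example graph stands in for the real dataset.
names = {1: "marko", 2: "vadas", 3: "lop", 4: "josh", 5: "ripple", 6: "peter"}
edges = [
    (1, "knows", 2),
    (1, "knows", 4),
    (1, "created", 3),
    (4, "created", 5),
    (4, "created", 3),
    (6, "created", 3),
]

# Equivalent of the foreach loop: one counter per vertex name, starting at 0.
degrees = {name: 0 for name in names.values()}
for out_v, _label, _in_v in edges:
    degrees[names[out_v]] += 1

# Equivalent of g:sort($degrees, 'value', true()): sort by value, descending.
ranked = sorted(degrees.items(), key=lambda item: item[1], reverse=True)
for name, degree in ranked:
    print(f"{name}={degree}")
```

Note that this counts every outgoing edge regardless of label, exactly as `count($v/outE)` does; restricting the count to one label would give a label-specific degree.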
  • 46. A Grateful Dead Dataset – Inspecting Single Vertex
      gremlin> $v := g:key('name','CHINA DOLL')[1]
      ==>v[129]
      gremlin> g:map($v)
      ==>name=CHINA DOLL
      ==>song_type=original
      ==>performances=114
      ==>type=song
      gremlin> $v/outE[@label='sung_by']/inV/@name
      ==>Garcia
  • 47. A Grateful Dead Dataset – Inspecting Single Vertex
      gremlin> $v/outE[@label='followed_by']/inV/@name
      ==>BIG RIVER
      ==>THROWING STONES
      ==>SAMSON AND DELILAH
      ==>TRUCKING
      ==>CASEY JONES
      ==>HIGH TIME
      ...
      gremlin> $v/outE[@label='followed_by']/@weight
      ==>2
      ==>8
      ==>1
      ==>2
      ==>1
      ==>1
      ...
  • 48. Introduction to PageRank • The remainder of this section will discuss the PageRank algorithm and its application to multi-relational graphs. • The arguments made and the examples presented generalize to all other single-relational graph algorithms. However, for the sake of brevity and consistency, only PageRank will be discussed.
  • 49. Introduction to Matrix-Based PageRank
      • PageRank is a centrality measure based on the primary eigenvector of a modified version of a graph. Let $A \in \mathbb{R}_{+}^{|V| \times |V|}$ denote the adjacency matrix representing the graph.
      • In order to ensure positive real values in the eigenvector, the graph must be strongly connected. PageRank induces strong connectivity by overlaying a low probability (defined by $\alpha \in [0, 1]$, usually 0.15) “teleportation” graph over the original graph. Let $B \in \left\{\tfrac{1}{|V|}\right\}^{|V| \times |V|}$ denote a teleportation adjacency matrix where every vertex is connected to every vertex with equal probability.
      $$C = (1 - \alpha) A + \alpha B, \quad \text{where } C \in \mathbb{R}_{+}^{|V| \times |V|}$$
      $$\lambda = \lambda C, \quad \text{where } \lambda \in \mathbb{R}_{+}^{|V|} \text{ is the PageRank vector over } V.$$
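One common way to find a vector satisfying λ = λC is power iteration: start from a uniform vector and repeatedly multiply it by C until it stops changing. The slide does not say how the eigenvector is computed, so the Python sketch below is one possible approach under stated assumptions: A is taken to be row-normalized (each row sums to 1), the three-vertex graph is a made-up example, and the function name `pagerank_power_iteration` is hypothetical.

```python
def pagerank_power_iteration(A, alpha=0.15, tolerance=1e-12, max_iterations=1000):
    """Return the left eigenvector of C = (1 - alpha) * A + alpha * B.

    A is a row-normalized adjacency matrix given as a list of lists;
    every entry of the teleportation matrix B is 1 / |V|.
    """
    n = len(A)
    C = [[(1 - alpha) * A[i][j] + alpha / n for j in range(n)] for i in range(n)]
    rank = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iterations):
        # One step of rank <- rank * C (row vector times matrix).
        new_rank = [sum(rank[i] * C[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new_rank, rank)) < tolerance:
            return new_rank
        rank = new_rank
    return rank


# Hypothetical graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0, with rows normalized.
A = [
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
]
rank = pagerank_power_iteration(A)
print(rank)
```

Because every row of C sums to 1, the iteration keeps the vector's entries summing to 1, so the result can be read directly as a probability distribution over V.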
  • 50. Introduction to Random Walk-Based PageRank
      • PageRank can be implemented by a random walk.
      • Create a vertex counter map, m : V → N+.
      • Place a walker on a random vertex in V. Denote the walker's current vertex i ∈ V.
          1. Increment the vertex counter by 1 (i.e. m(i) ← m(i) + 1).
          2. The walker chooses a random adjacent vertex with probability 1 − α.
          3. The walker chooses a random vertex in V with probability α (teleportation).
          4. Rinse and repeat until m reaches a stationary probability distribution (continually normalize m if you want a probability distribution).
      • We will use this random walk model in the Gremlin examples to follow.
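The random walk procedure above can be sketched directly in Python. This is a minimal illustration under assumptions not on the slide: α is the teleportation probability (0.15, matching the matrix formulation and the Gremlin code later in the talk), a walker with no outgoing edges always teleports, the walk runs for a fixed number of steps instead of testing for stationarity, and the three-vertex graph and the name `random_walk_pagerank` are hypothetical.

```python
import random


def random_walk_pagerank(out_neighbors, steps, alpha=0.15, seed=0):
    """Estimate PageRank by counting the visits of a single random walker.

    out_neighbors maps each vertex to the list of vertices it points to.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    vertices = list(out_neighbors)
    counts = {v: 0 for v in vertices}
    current = rng.choice(vertices)
    for _ in range(steps):
        counts[current] += 1  # step 1: m(i) <- m(i) + 1
        neighbors = out_neighbors[current]
        if rng.random() < alpha or not neighbors:
            current = rng.choice(vertices)   # teleport to a random vertex in V
        else:
            current = rng.choice(neighbors)  # follow a random outgoing edge
    # Normalize the counter map into a probability distribution.
    return {v: count / steps for v, count in counts.items()}


# Hypothetical graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
out_neighbors = {0: [1, 2], 1: [2], 2: [0]}
estimate = random_walk_pagerank(out_neighbors, steps=200000)
print(estimate)
```

With enough steps the visit frequencies approach the eigenvector of the matrix formulation; the estimate is noisy, so only approximate agreement should be expected.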
  • 51. PageRank over Multi-Relational Graphs
      • PageRank was designed for single-relational graphs (i.e. where all edges have the same meaning).
      • In a multi-relational graph, what does it mean to find the centrality of a vertex when vertices can be related by various types of edges?
          • For example, if there exist “socializes with” and “met once” edges, then the person who “met once” many people could be the most centrally located in the graph.
          • Also, what if your graph has more than just “person”-type vertices (e.g. cars, pets, buildings, articles, etc.) and “person”-type edges (e.g. owns, walks, livesAt, cites, etc.)?
  • 52. PageRank over Multi-Relational Graphs
      • Calculating single-relational PageRank would yield Person as the most central vertex.
      • You can boolean filter certain edge labels (e.g. ignore type edges; in such cases, you would have the centrality scores over the knows social graph).
      • However, what if you only wanted to traverse knows edges if and only if the adjacent vertex knows more than 10 other people?
      • In the end, you want complete control (universal computability) over the paths that the traverser/walker can take through a graph.
      [Figure: a Person vertex receiving type edges from many person vertices (Herbert, Johan, Marko, Josh, Jen, ...), which are connected to one another by knows edges.]
  • 53. PageRank over Multi-Relational Graphs
      • In multi-relational graphs, the meaning of your graph algorithm's results is defined by your definition of adjacency.
      • With respect to random walk-based PageRank, define the path that the walker should take. That path is the definition of adjacency.
      • The stationary probability distribution created from this walk yields a path-dependent centrality.
      • Thus, in a multi-relational graph, there are many types of PageRanks that can be calculated: one for each type of path defined for a walker.
      Rodriguez, M.A., “Grammar-Based Random Walkers in Semantic Networks”, Knowledge-Based Systems, 21(7), 727–739, http://arxiv.org/abs/0803.4355, October 2008.
  • 54. PageRank over “Garcia Followed By” SubGraph
      • Define a path that will go from song-to-song by “followed by” edges and only traverse songs that are “sung by” Jerry Garcia.
      (./outE[@label='followed_by']/inV/outE[@label='sung_by']
        /inV[@name='Garcia']/../..)[g:rand-nat()]
      [Figure: the path in four stages (A to D). From the current song (.), followed_by edges lead to three songs whose sung_by edges lead to artists named "Garcia", "Garcia", and "Weir"; the Garcia branches back up two steps (/../..) to their songs, and g:rand-nat() picks one of them.]
  • 55. PageRank over “Garcia Followed By” SubGraph
      path garcia-followed_by
        (./outE[@label='followed_by']/inV/outE[@label='sung_by']
          /inV[@name='Garcia']/../..)[g:rand-nat()]
      end

      $m := g:map()
      $alpha := 0.15
      $_ := g:key('type', 'song')[g:rand-nat()]
      repeat 2500
        $_ := ./garcia-followed_by
        if count($_) > 0
          g:op-value('+',$m,$_[1]/@name, 1.0)
        end
        if g:rand-real() < $alpha or count($_) = 0
          $_ := g:key('type', 'song')[g:rand-nat()]
        end
      end
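The script above separates two concerns: the path decides where the walker may step next, and the loop counts visits and teleports. That separation can be sketched in Python by passing the path in as a function. Everything concrete below is an assumption made for illustration: the four songs, their singers, and the names `garcia_followed_by` and `path_pagerank` are hypothetical and are not drawn from the real dataset.

```python
import random

# Hypothetical miniature dataset: which songs follow which, and who sings them.
followed_by = {
    "SONG_A": ["SONG_B", "SONG_C"],
    "SONG_B": ["SONG_A", "SONG_D"],
    "SONG_C": ["SONG_A"],
    "SONG_D": ["SONG_B"],
}
sung_by = {"SONG_A": "Garcia", "SONG_B": "Garcia", "SONG_C": "Weir", "SONG_D": "Garcia"}


def garcia_followed_by(song, rng):
    """Path: step to a random following song, considering only Garcia-sung songs."""
    candidates = [s for s in followed_by.get(song, []) if sung_by.get(s) == "Garcia"]
    return rng.choice(candidates) if candidates else None


def path_pagerank(path, start_vertices, steps, alpha=0.15, seed=0):
    """Count visits of a walker whose notion of adjacency is the given path."""
    rng = random.Random(seed)
    counts = {}
    current = rng.choice(start_vertices)
    for _ in range(steps):
        current = path(current, rng)
        if current is not None:
            counts[current] = counts.get(current, 0) + 1
        # Teleport with probability alpha, or when the path leads nowhere.
        if current is None or rng.random() < alpha:
            current = rng.choice(start_vertices)
    return counts


counts = path_pagerank(garcia_followed_by, list(followed_by), steps=2500)
print(sorted(counts.items(), key=lambda item: item[1], reverse=True))
```

Swapping in a different path function, for example one that only follows heavily weighted edges, would yield a different path-dependent centrality without touching the walking loop.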
  • 56. PageRank over “Garcia Followed By” SubGraph
      gremlin> g:sort($m,'value',true())
      ==>CRAZY FINGERS=98.0
      ==>HES GONE=85.0
      ==>CHINA CAT SUNFLOWER=79.0
      ==>BERTHA=76.0
      ==>UNCLE JOHNS BAND=74.0
      ==>TERRAPIN STATION=72.0
      ==>GOING DOWN THE ROAD FEELING BAD=71.0
      ==>WHARF RAT=71.0
      ==>EYES OF THE WORLD=65.0
      ==>COLD RAIN AND SNOW=62.0
      ==>SHIP OF FOOLS=58.0
      ==>RAMBLE ON ROSE=53.0
      ==>CASEY JONES=51.0
      ==>DARK STAR=47.0
      ==>DEAL=46.0
      ...
  • 57. Universal Computation in Paths
      path path-name
        # any arbitrary computation can occur here
      end
      • A path definition can be used to define adjacencies.
          • Adjacency can be expressed as anything that can be computed by a Turing machine.
          • Path definitions are used to create “semantically meaningful” results from single-relational graph algorithms applied to multi-relational graphs.
          • Path definitions make explicit what is implicit in the structure of the graph. This has applications to knowledge-based reasoning.
      • A path definition can perform any arbitrary computation.
          • Path definitions can check/set vertex/edge properties.
          • Path definitions can create new vertices and edges.
          • Path definitions can call/define functions.
      This allows fine grained control over how your traverser/walker moves through a graph.
  • 58. Outline • Introduction to Graphs and Graph Software • Basic Gremlin Concepts • Gremlin Language Description • Advanced Gremlin Concepts • Conclusions
  • 59. The Current Gremlin EcoSystems
      • Webling: Web console for Gremlin (developed by Pavel Yaskevich w/ funding from Neo Technology)
      • Project Gargamel: Distributed Graph Computing (uses Linked Process and Gremlin)
      • ReXster: A Graph-Based Recommender Engine
  • 60. Thank You Please enjoy Gremlin at http://gremlin.tinkerpop.com ... My homepage is http://markorodriguez.com. Please feel free to contact me with any questions or comments.