Neo4j: a NOSQL overview and the benefits of graph databases
#neo4j
Emil Eifrem, CEO, Neo Technology
@emileifrem
emil@neotechnology.com
What's the plan?

 Why now? – Four trends

 NOSQL overview

 Graph databases && Neo4j

 Conclusions

 Food
Trend 1: data set size
 [Figure: bar chart, 40 in 2007 rising to 988 in 2010. Source: IDC 2007]
Trend 2: connectedness
 [Figure: information connectivity (y-axis) over time (x-axis: 1990, 2000, 2010, 2020; eras web 1.0, web 2.0, “web 3.0”).
  Points in rising order: Text documents, Hypertext, RSS, Blogs, Wikis, User-generated content, Tagging, Folksonomies, RDF, Ontologies, Giant Global Graph (GGG)]
Trend 3: semi-structure
 Individualization of content!
   In the salary lists of the 1970s, all elements had
   exactly one job
   In the salary lists of the 2000s, we need 5 job
   columns! Or 8? Or 15?

 Trend accelerated by the decentralization of
 content generation that is the hallmark of the age
 of participation (“web 2.0”)
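Aside: RDBMS performance
 [Figure: relational database performance (y-axis) falling as data complexity (x-axis) rises.
  Points in order of increasing complexity: Salary list, Majority of webapps, Social network, Semantic, Trading. A brace labelled “custom” marks the high-complexity end]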
Trend 4: architecture
 1990s: Database as integration hub
 2000s: (Slowly towards...) Decoupled services with own backend
Why NoSQL 2009?

 Trend 1: Size.

 Trend 2: Connectivity.

 Trend 3: Semi-structure.

 Trend 4: Architecture.
NoSQL
overview
First off: the damn name

 NoSQL is NOT “Never SQL”

 NoSQL is NOT “No To SQL”

 NoSQL is NOT “WE HATE CHRIS' DOG”
NOSQL
    is simply


Not Only SQL!
Four (emerging) NOSQL categories
 Key-value stores
   Based on DHTs / Amazon's Dynamo paper
   Data model: (global) collection of K-V pairs
   Example: Dynomite, Voldemort, Tokyo

 BigTable clones
   Based on Google's BigTable paper
   Data model: big table, column families
   Example: HBase, Hypertable
Four (emerging) NOSQL categories
 Document databases
   Inspired by Lotus Notes
   Data model: collections of K-V collections
   Example: CouchDB, MongoDB

 Graph databases
   Inspired by Euler & graph theory
   Data model: nodes, rels, K-V on both
   Example: AllegroGraph, VertexDB, Neo4j
NOSQL data models
 [Figure: data size (y-axis) against complexity (x-axis).
  From largest and simplest to smallest and most complex: Key-value stores, Bigtable clones, Document databases, Graph databases.
  Annotation at the graph database end: “(This is still billions of nodes & relationships)”. A marked region is labelled “90% of use cases”]
Graph DBs
& Neo4j intro
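The Graph DB model: representation
 Core abstractions:
   Nodes
   Relationships between nodes
   Properties on both
 [Figure: three nodes (1, 2, 3) joined by relationships. Example properties: name = “Emil”, age = 29, sex = “yes” on a person node; type = KNOWS, time = 4 years on a relationship; type = car, vendor = “SAAB”, model = “95 Aero” on node 3]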
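The three abstractions can be sketched in a few lines of plain Java. This is a hypothetical in-memory model for illustration, not the Neo4j API; the class names, the `relateTo` helper, and the choice of which two nodes the KNOWS relationship joins are all assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model (not the Neo4j API): nodes, relationships,
// and key-value properties on both.
class PropertyGraphSketch
{
    static class Node
    {
        final Map<String, Object> properties = new HashMap<>();
        final List<Relationship> outgoing = new ArrayList<>();

        // Creates a typed relationship from this node to another node.
        Relationship relateTo( Node other, String type )
        {
            Relationship rel = new Relationship( this, other, type );
            outgoing.add( rel );
            return rel;
        }
    }

    static class Relationship
    {
        final Node start;
        final Node end;
        final String type;
        final Map<String, Object> properties = new HashMap<>();

        Relationship( Node start, Node end, String type )
        {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    // Builds the person-to-person part of the slide's example.
    static Node buildExample()
    {
        Node emil = new Node();
        emil.properties.put( "name", "Emil" );
        emil.properties.put( "age", 29 );

        Node friend = new Node(); // the slide shows no properties for this node

        // Properties live on the relationship too, not only on nodes.
        Relationship knows = emil.relateTo( friend, "KNOWS" );
        knows.properties.put( "time", "4 years" );
        return emil;
    }

    public static void main( String[] args )
    {
        Node emil = buildExample();
        Relationship knows = emil.outgoing.get( 0 );
        System.out.println( emil.properties.get( "name" ) + " -[" + knows.type
            + ", time = " + knows.properties.get( "time" ) + "]-> friend" );
    }
}
```

The point of the sketch is that a relationship is a first-class object with its own type and properties, rather than a foreign key or a join table.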
Example: The Matrix
 [Figure: the Matrix example graph]
 Nodes:
   1: name = “Thomas Anderson”, age = 29
   7: name = “Morpheus”, rank = “Captain”, occupation = “Total badass”
   2: name = “Trinity”
   3: name = “Cypher”, last name = “Reagan”
   13: name = “Agent Smith”, version = 1.0b, language = C++
   42: name = “The Architect”
 Relationships:
   1 KNOWS 7
   1 KNOWS 2 (age = 3 days)
   7 KNOWS 2
   7 KNOWS 3 (disclosure = public)
   3 KNOWS 13 (disclosure = secret, age = 6 months)
   13 CODED_BY 42
Code (1): Building a node space
NeoService neo = ... // Get factory
Transaction tx = neo.beginTx();

// Create Thomas 'Neo' Anderson
Node mrAnderson = neo.createNode();
mrAnderson.setProperty( "name", "Thomas Anderson" );
mrAnderson.setProperty( "age", 29 );

// Create Morpheus
Node morpheus = neo.createNode();
morpheus.setProperty( "name", "Morpheus" );
morpheus.setProperty( "rank", "Captain" );
morpheus.setProperty( "occupation", "Total bad ass" );

// Create a relationship representing that they know each other
mrAnderson.createRelationshipTo( morpheus, RelTypes.KNOWS );
// ...create Trinity, Cypher, Agent Smith, Architect similarly

tx.commit();
Code (1b): Defining RelationshipTypes
// In package org.neo4j.api.core
public interface RelationshipType
{
   String name();
}

// In package org.yourdomain.yourapp
// Example on how to roll dynamic RelationshipTypes
class MyDynamicRelType implements RelationshipType
{
   private final String name;
   MyDynamicRelType( String name ){ this.name = name; }
   public String name() { return this.name; }
}

// Example on how to kick it, static-RelationshipType-like
enum MyStaticRelTypes implements RelationshipType
{
   KNOWS,
   WORKS_FOR,
}
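One detail worth spelling out: the enum needs no method body because every Java enum already has a public `name()` method, which satisfies the interface. A minimal self-contained sketch follows; the interface is redeclared locally so it compiles without Neo4j on the classpath, and `DynamicRelType`, `StaticRelTypes`, `RelTypeDemo` and `sameType` are hypothetical names for illustration.

```java
// Local stand-in for the interface shown on the slide.
interface RelationshipType
{
    String name();
}

// Dynamic variant: the type name is only known at runtime.
class DynamicRelType implements RelationshipType
{
    private final String name;
    DynamicRelType( String name ) { this.name = name; }
    public String name() { return this.name; }
}

// Static variant: Enum.name() fulfils the interface, so no body is needed.
enum StaticRelTypes implements RelationshipType
{
    KNOWS,
    WORKS_FOR
}

class RelTypeDemo
{
    // Hypothetical helper: compares two relationship types by their name.
    static boolean sameType( RelationshipType a, RelationshipType b )
    {
        return a.name().equals( b.name() );
    }

    public static void main( String[] args )
    {
        RelationshipType fromEnum = StaticRelTypes.KNOWS;
        RelationshipType fromString = new DynamicRelType( "KNOWS" );
        System.out.println( fromEnum.name() + " / " + fromString.name() );
        System.out.println( sameType( fromEnum, fromString ) );
    }
}
```

The enum form gives compile-time checking of type names; the dynamic form suits types that are only discovered at runtime, for example from user input.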
Whiteboard friendly
 [Figure: whiteboard sketch with nodes Björn, Big Car and DayCare, joined by relationships labelled owns, build and drives]
A NOSQL Overview And The Benefits Of Graph Databases (nosql east 2009)
The Graph DB model: traversal
 Traverser framework for high-performance traversing across the node space
 [Figure: the same three-node example as in “The Graph DB model: representation”]
Example: Mr Anderson’s friends
 [Figure: the same graph as in “Example: The Matrix”]
Code (2): Traversing a node space
// Instantiate a traverser that returns Mr Anderson's friends
Traverser friendsTraverser = mrAnderson.traverse(
   Traverser.Order.BREADTH_FIRST,
   StopEvaluator.END_OF_GRAPH,
   ReturnableEvaluator.ALL_BUT_START_NODE,
   RelTypes.KNOWS,
   Direction.OUTGOING );

// Traverse the node space and print out the result
System.out.println( "Mr Anderson's friends:" );
for ( Node friend : friendsTraverser )
{
   System.out.printf( "At depth %d => %s%n",
       friendsTraverser.currentPosition().getDepth(),
       friend.getProperty( "name" ) );
}
Output of the friends traverser:
$ bin/start-neo-example
Mr Anderson's friends:

At depth 1 => Morpheus
At depth 1 => Trinity
At depth 2 => Cypher
At depth 3 => Agent Smith
$
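To see where those depths come from, here is a minimal breadth-first traversal in plain Java that follows only outgoing KNOWS edges. It is a sketch, not the Neo4j traverser: the adjacency map is an assumption read off the diagram, and `FriendsBfsSketch`, `knows` and `friendsByDepth` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

class FriendsBfsSketch
{
    // Outgoing KNOWS edges, as read off the diagram (an assumption).
    // CODED_BY is deliberately absent: the traverser only follows KNOWS.
    static Map<String, List<String>> knows()
    {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put( "Thomas Anderson", Arrays.asList( "Morpheus", "Trinity" ) );
        graph.put( "Morpheus", Arrays.asList( "Trinity", "Cypher" ) );
        graph.put( "Cypher", Arrays.asList( "Agent Smith" ) );
        return graph;
    }

    // Breadth-first walk that reports every reachable node except the start,
    // mirroring BREADTH_FIRST + END_OF_GRAPH + ALL_BUT_START_NODE.
    static List<String> friendsByDepth( Map<String, List<String>> graph, String start )
    {
        List<String> lines = new ArrayList<>();
        Map<String, Integer> depth = new HashMap<>();
        Set<String> visited = new HashSet<>();
        Queue<String> queue = new ArrayDeque<>();

        visited.add( start );
        depth.put( start, 0 );
        queue.add( start );
        while ( !queue.isEmpty() )
        {
            String current = queue.remove();
            int d = depth.get( current );
            if ( d > 0 )
            {
                lines.add( "At depth " + d + " => " + current );
            }
            for ( String next : graph.getOrDefault( current, Collections.<String>emptyList() ) )
            {
                if ( visited.add( next ) ) // true only the first time a node is seen
                {
                    depth.put( next, d + 1 );
                    queue.add( next );
                }
            }
        }
        return lines;
    }

    public static void main( String[] args )
    {
        System.out.println( "Mr Anderson's friends:" );
        for ( String line : friendsByDepth( knows(), "Thomas Anderson" ) )
        {
            System.out.println( line );
        }
    }
}
```

Trinity is reachable both directly and via Morpheus, but the visited set ensures she is reported once, at the shallower depth.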
Example: Friends in love?
 [Figure: the same graph as in “Example: The Matrix”, with one added LOVES relationship between node 2 (Trinity) and node 1 (Thomas Anderson)]
Code (3a): Custom traverser
// Create a traverser that returns all “friends in love”
Traverser loveTraverser = mrAnderson.traverse(
   Traverser.Order.BREADTH_FIRST,
   StopEvaluator.END_OF_GRAPH,
   new ReturnableEvaluator()
   {
       public boolean isReturnableNode( TraversalPosition pos )
       {
          return pos.currentNode().hasRelationship(
              RelTypes.LOVES, Direction.OUTGOING );
       }
   },
   RelTypes.KNOWS,
   Direction.OUTGOING );
Code (3b): Custom traverser
// Traverse the node space and print out the result
System.out.println( "Who’s a lover?" );
for ( Node person : loveTraverser )
{
   System.out.printf( "At depth %d => %s%n",
       loveTraverser.currentPosition().getDepth(),
       person.getProperty( "name" ) );
}
Output of the custom traverser:
$ bin/start-neo-example
Who’s a lover?

At depth 1 => Trinity
$
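The idea behind a custom returnable evaluator is that which nodes are visited and which nodes are returned are decided separately. A plain-Java sketch of that split follows, using `java.util.function.Predicate` in place of `ReturnableEvaluator`. The graph, the direction of the LOVES edge (Trinity to Thomas Anderson, inferred from the output) and all class and method names are assumptions for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.function.Predicate;

class LoveTraverserSketch
{
    static class Person
    {
        final String name;
        final List<Person> knows = new ArrayList<>(); // outgoing KNOWS
        final List<Person> loves = new ArrayList<>(); // outgoing LOVES
        Person( String name ) { this.name = name; }
    }

    // Walks outgoing KNOWS edges breadth first; the predicate alone decides
    // which of the visited persons end up in the result.
    static List<String> traverse( Person start, Predicate<Person> isReturnable )
    {
        List<String> returned = new ArrayList<>();
        Set<Person> visited = new HashSet<>();
        Queue<Person> queue = new ArrayDeque<>();
        visited.add( start );
        queue.add( start );
        while ( !queue.isEmpty() )
        {
            Person current = queue.remove();
            if ( isReturnable.test( current ) )
            {
                returned.add( current.name );
            }
            for ( Person next : current.knows )
            {
                if ( visited.add( next ) )
                {
                    queue.add( next );
                }
            }
        }
        return returned;
    }

    // The example graph as read off the diagram (an assumption).
    static Person buildGraph()
    {
        Person anderson = new Person( "Thomas Anderson" );
        Person morpheus = new Person( "Morpheus" );
        Person trinity = new Person( "Trinity" );
        Person cypher = new Person( "Cypher" );
        Person smith = new Person( "Agent Smith" );
        anderson.knows.add( morpheus );
        anderson.knows.add( trinity );
        morpheus.knows.add( trinity );
        morpheus.knows.add( cypher );
        cypher.knows.add( smith );
        trinity.loves.add( anderson );
        return anderson;
    }

    public static void main( String[] args )
    {
        // Return only persons with at least one outgoing LOVES relationship.
        System.out.println( traverse( buildGraph(), p -> !p.loves.isEmpty() ) );
    }
}
```

Every person is still visited; the predicate only filters what is reported, which is why the whole KNOWS network can be searched while just one name comes back.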
Bonus code: domain model
    How do you implement your domain model?
    Use the delegator pattern, i.e. every domain entity
    wraps a Neo4j primitive:
// In package org.yourdomain.yourapp
class PersonImpl implements Person
{
   private final Node underlyingNode;
   PersonImpl( Node node ){ this.underlyingNode = node; }

    public String getName()
    {
       // getProperty returns Object, so cast to the declared return type
       return (String) this.underlyingNode.getProperty( "name" );
    }
    public void setName( String name )
    {
       this.underlyingNode.setProperty( "name", name );
    }
}
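A self-contained sketch of the same delegator idea, runnable without Neo4j: `FakeNode` is a hypothetical map-backed stand-in for a Neo4j node, and the `Person` interface is assumed, since the slide does not show it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Neo4j node: just a property bag.
class FakeNode
{
    private final Map<String, Object> properties = new HashMap<>();
    Object getProperty( String key ) { return properties.get( key ); }
    void setProperty( String key, Object value ) { properties.put( key, value ); }
}

// Assumed domain interface; the slide only shows the implementation.
interface Person
{
    String getName();
    void setName( String name );
}

// The domain entity keeps no state of its own; it delegates to the node.
class PersonImpl implements Person
{
    private final FakeNode underlyingNode;
    PersonImpl( FakeNode node ) { this.underlyingNode = node; }

    public String getName()
    {
        return (String) this.underlyingNode.getProperty( "name" );
    }

    public void setName( String name )
    {
        this.underlyingNode.setProperty( "name", name );
    }
}

class DelegatorDemo
{
    public static void main( String[] args )
    {
        FakeNode node = new FakeNode();
        Person person = new PersonImpl( node );
        person.setName( "Trinity" );

        // The value lives on the node, so a second wrapper sees it too.
        System.out.println( node.getProperty( "name" ) );
        System.out.println( new PersonImpl( node ).getName() );
    }
}
```

Because the wrapper is stateless, domain objects are cheap to create on demand and never drift out of sync with the stored data.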
Domain layer frameworks
 Qi4j (www.qi4j.org)
   Framework for doing DDD in pure Java5
   Defines Entities / Associations / Properties
     Sound familiar? Nodes / Rel’s / Properties!
   Neo4j is an “EntityStore” backend

 NeoWeaver (http://components.neo4j.org/neo-weaver)
   Weaves Neo4j-backed persistence into domain
   objects in runtime (dynamic proxy / cglib based)
   Veeeery alpha
Neo4j system characteristics
 Disk-based
   Native graph storage engine with custom binary
   on-disk format
 Transactional
   JTA/JTS, XA, 2PC, Tx recovery, deadlock
   detection, MVCC, etc
 Scales up (what's the x and the y?)
   Several billions of nodes/rels/props on single JVM
 Robust
   6+ years in 24/7 production
Social network pathExists()
 ~1k persons
 Avg 50 friends per person
 pathExists(a, b) limit depth 4
 Two backends
 Eliminate disk IO so warm up caches
Social network pathExists()
 [Figure: small social graph of persons 1 Mike, 2 Emil, 3 Marcus, 4 Leigh, 5 Kevin, 7 John, 9 Bruce]

                          # persons   query time
 Relational database          1 000     2 000 ms
 Graph database (Neo4j)       1 000         2 ms
 Graph database (Neo4j)   1 000 000         2 ms
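The slides do not show how pathExists was implemented in either backend. As a rough sketch of what a depth-limited check can look like on an adjacency structure, here is a plain-Java breadth-first version; `PathExistsSketch`, `chain` and the integer person ids are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PathExistsSketch
{
    // Returns true if b is reachable from a in at most maxDepth hops.
    static boolean pathExists( Map<Integer, List<Integer>> friends, int a, int b, int maxDepth )
    {
        if ( a == b )
        {
            return true;
        }
        Set<Integer> visited = new HashSet<>();
        visited.add( a );
        List<Integer> frontier = Collections.singletonList( a );
        // Expand one hop per iteration and stop once the depth limit is hit.
        for ( int depth = 1; depth <= maxDepth && !frontier.isEmpty(); depth++ )
        {
            List<Integer> next = new ArrayList<>();
            for ( int person : frontier )
            {
                for ( int friend : friends.getOrDefault( person, Collections.<Integer>emptyList() ) )
                {
                    if ( friend == b )
                    {
                        return true;
                    }
                    if ( visited.add( friend ) )
                    {
                        next.add( friend );
                    }
                }
            }
            frontier = next;
        }
        return false;
    }

    // Test data: a simple chain 1 -> 2 -> ... -> n.
    static Map<Integer, List<Integer>> chain( int n )
    {
        Map<Integer, List<Integer>> friends = new HashMap<>();
        for ( int i = 1; i < n; i++ )
        {
            friends.put( i, Collections.singletonList( i + 1 ) );
        }
        return friends;
    }

    public static void main( String[] args )
    {
        Map<Integer, List<Integer>> friends = chain( 6 );
        System.out.println( pathExists( friends, 1, 5, 4 ) ); // 4 hops, within the limit
        System.out.println( pathExists( friends, 1, 6, 4 ) ); // 5 hops, beyond the limit
    }
}
```

In a check of this shape the work is bounded by the neighbourhood explored within the depth limit rather than by the total number of persons stored, which is one plausible reading of why the graph database timing in the table does not change between 1 000 and 1 000 000 persons.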
Pros & Cons compared to RDBMS
+ No O/R impedance mismatch (whiteboard friendly)
+ Can easily evolve schemas
+ Can represent semi-structured info
+ Can represent graphs/networks (with performance)


- Lacks in tool and framework support
- Few other implementations => potential lock in
- No support for ad-hoc queries
More consequences
 Ability to capture semi-structured information
   => allowing individualization of content
 No predefined schema
   => easier to evolve model
   => can capture ad-hoc relationships
 Can capture non-normative relations
   => easy to model specific links to specific sets
 All state is kept in transactional memory
   => improves application concurrency
The Neo4j ecosystem
 Neo4j is an embedded database
   Tiny teeny lil jar file
 Component ecosystem
   index-util
   neo-meta
   neo-utils
   pattern-match
   sparql-engine
   ...
 See http://components.neo4j.org
Language bindings
 Neo4j.py – bindings for Jython and CPython
   http://components.neo4j.org/neo4j.py

 Neo4jrb – bindings for JRuby (incl RESTful API)
   http://wiki.neo4j.org/content/Ruby

 Clojure
   http://wiki.neo4j.org/content/Clojure

 Scala (incl RESTful API)
   http://wiki.neo4j.org/content/Scala

 … .NET? Erlang?
Grails Neoclipse screendump
Scale out – replication
 Rolling out Neo4j HA before end-of-year
   Side note: ppl roll it today w/ REST frontends & onlinebackup

 Master-slave replication, 1st configuration
   MySQL style... ish
   Except all instances can write, synchronously
   between writing slave & master (strong consistency)
   Updates are asynchronously propagated to the
   other slaves (eventual consistency)
 This can handle billions of entities...
 … but not 100B
Scale out – partitioning
 Sharding possible today
   … but you have to do manual work
   … just as with MySQL
   Great option: shard on top of resilient, scalable
   OSS app server, see: www.codecauldron.org
 Transparent partitioning? Neo4j 2.0
   100B? Easy to say. Sliiiiightly harder to do.
   Fundamentals: BASE & eventual consistency
   Generic clustering algorithm as base case, but
   give lots of knobs for developers
How ego are you? (aka other impls?)
 Franz’ AllegroGraph (http://agraph.franz.com)
   Proprietary, Lisp, RDF-oriented but real graphdb
 FreeBase graphd (http://bit.ly/13VITB)
   In-house at Metaweb
 Kloudshare (http://kloudshare.com)
   Graph database in the cloud, still stealth mode
 Google Pregel (http://bit.ly/dP9IP)

   We are oh-so-secret
 Some academic papers from ~10 years ago
   G = {V, E}   #FAIL
Conclusion
 Graphs && Neo4j => teh awesome!
 Available NOW under AGPLv3 / commercial license
   AGPLv3: “if you’re open source, we’re open source”
   If you have proprietary software? Must buy a
   commercial license
   But up to 1M primitives it’s free for all uses!
 Download
   http://neo4j.org
 Feedback
   http://lists.neo4j.org
Party pooper slides
Poop 1
 Key-value stores?
   => the awesome
   … if you have 1000s of BILLIONS records OR you
   don't care about programmer productivity

 What if you had no variables at all in your programs
 except a single globally accessible hashtable?
 Would your software be maintainable?
Poop 2
 In a not-suck architecture...

 … the only thing that makes sense is to have an
 embedded database.
Poop 3
 Exposing your data model on the wire is bad.
 Period.

 Adding a couple of buzzwords doesn't make it less
 bad.

 If it was bad with SQL-over-sockets (hint: it was)
 then – surprise! – it's still bad even tho you use
 Hype-compliant(tm) JSON-over-REST.

 We don't want to couple everything to a specific
 data model again!
Poop 4
 In-memory database

 What the hell?
   That's an oxymoron!
   Up next: ascii-only JPEG
   Up next: loopback-only web server

 If you're not durable, you're a cache!
 If you happen to asynchronously spill over to disk,
 you're a cache that asynchronously spills over to
 disk.
Ait
so, srsly?
Looking ahead: polyglot persistence


      SQL     &&      NoSQL
Questions?




             Image credit: lost again! Sorry :(
http://neotechnology.com

More Related Content

ZIP
NoSQL databases
PDF
Graph database Use Cases
PDF
Workshop - Neo4j Graph Data Science
PPTX
Chapter1: NoSQL: It’s about making intelligent choices
PDF
ENEL Electricity Topology Network on Neo4j Graph DB
PDF
Lecture 6: Infrastructure & Tooling (Full Stack Deep Learning - Spring 2021)
PDF
Graph based data models
NoSQL databases
Graph database Use Cases
Workshop - Neo4j Graph Data Science
Chapter1: NoSQL: It’s about making intelligent choices
ENEL Electricity Topology Network on Neo4j Graph DB
Lecture 6: Infrastructure & Tooling (Full Stack Deep Learning - Spring 2021)
Graph based data models

What's hot (20)

PDF
Introduction to Neo4j
PPT
Database connectivity and web technologies
PPTX
Introduction to Graph Databases
PPT
Neo4J : Introduction to Graph Database
PDF
Tutorial On Database Management System
PDF
Tag.bio: Self Service Data Mesh Platform
PPTX
Hadoop File system (HDFS)
PPT
DBMS an Example
PDF
Neo4j Presentation
PPTX
Oracle REST Data Services: Options for your Web Services
PPTX
Object oriented database
PDF
Introduction to column oriented databases
ODP
Partitioning
PDF
BigTable And Hbase
PPTX
Chapter1
PPT
Graph database
PPT
10. Graph Databases
PDF
What are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
PPTX
NoSQL Graph Databases - Why, When and Where
PPTX
Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Neo4j
Database connectivity and web technologies
Introduction to Graph Databases
Neo4J : Introduction to Graph Database
Tutorial On Database Management System
Tag.bio: Self Service Data Mesh Platform
Hadoop File system (HDFS)
DBMS an Example
Neo4j Presentation
Oracle REST Data Services: Options for your Web Services
Object oriented database
Introduction to column oriented databases
Partitioning
BigTable And Hbase
Chapter1
Graph database
10. Graph Databases
What are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
NoSQL Graph Databases - Why, When and Where
Introduction to Big Data & Hadoop Architecture - Module 1
Ad

Viewers also liked (16)

PDF
GraphConnect SF 2013 Keynote
PDF
Neo4 jv2 english
PDF
Graph Search: The Power of Connected Data
PPTX
OrientDB vs Neo4j - and an introduction to NoSQL databases
PPTX
NOSQL vs SQL
PDF
Dbta Webinar Realize Value of Big Data with graph 011713
PDF
Graph Databases - Where Do We Do the Modeling Part?
PPTX
Graph Databases for SQL Server Professionals
PDF
A walk in graph databases v1.0
PPTX
An Introduction to NOSQL, Graph Databases and Neo4j
PDF
GraphTalks Rome - The Italian Business Graph
PPT
An Introduction to Graph Databases
PDF
Converting Relational to Graph Databases
PDF
Intro to Graph Databases Using Tinkerpop, TitanDB, and Gremlin
DOCX
PFE :: Application de gestion des dus d'enseignement
PDF
Webinar: RDBMS to Graphs
GraphConnect SF 2013 Keynote
Neo4 jv2 english
Graph Search: The Power of Connected Data
OrientDB vs Neo4j - and an introduction to NoSQL databases
NOSQL vs SQL
Dbta Webinar Realize Value of Big Data with graph 011713
Graph Databases - Where Do We Do the Modeling Part?
Graph Databases for SQL Server Professionals
A walk in graph databases v1.0
An Introduction to NOSQL, Graph Databases and Neo4j
GraphTalks Rome - The Italian Business Graph
An Introduction to Graph Databases
Converting Relational to Graph Databases
Intro to Graph Databases Using Tinkerpop, TitanDB, and Gremlin
PFE :: Application de gestion des dus d'enseignement
Webinar: RDBMS to Graphs
Ad

Similar to A NOSQL Overview And The Benefits Of Graph Databases (nosql east 2009) (20)

PDF
NOSQL Overview, Neo4j Intro And Production Example (QCon London 2010)
PDF
NOSQL overview and intro to graph databases with Neo4j (Geeknight May 2010)
PDF
Eifrem neo4j
PDF
NOSQL Overview Lightning Talk (Scalability Geekcruise 2009)
PDF
Neo4j -- or why graph dbs kick ass
PDF
NOSQLEU - Graph Databases and Neo4j
PDF
Neo4j - The Benefits of Graph Databases (OSCON 2009)
PPTX
CSC 8101 Non Relational Databases
PDF
Django and Neo4j - Domain modeling that kicks ass
PPTX
No Sql Movement
ODP
Grails goes Graph
PDF
Neo4j Nosqllive
PDF
Graph Theory and Databases
PDF
NoSQL intro for YaJUG / NoSQL UG Luxembourg
PDF
An overview of NOSQL (JFokus 2011)
PPTX
Anti-social Databases
PDF
NoSQL with Hadoop and HBase
PDF
Gephi short introduction
PDF
sones company presentation
NOSQL Overview, Neo4j Intro And Production Example (QCon London 2010)
NOSQL overview and intro to graph databases with Neo4j (Geeknight May 2010)
Eifrem neo4j
NOSQL Overview Lightning Talk (Scalability Geekcruise 2009)
Neo4j -- or why graph dbs kick ass
NOSQLEU - Graph Databases and Neo4j
Neo4j - The Benefits of Graph Databases (OSCON 2009)
CSC 8101 Non Relational Databases
Django and Neo4j - Domain modeling that kicks ass
No Sql Movement
Grails goes Graph
Neo4j Nosqllive
Graph Theory and Databases
NoSQL intro for YaJUG / NoSQL UG Luxembourg
An overview of NOSQL (JFokus 2011)
Anti-social Databases
NoSQL with Hadoop and HBase
Gephi short introduction
sones company presentation

A NOSQL Overview And The Benefits Of Graph Databases (nosql east 2009)

  • 1. Neo4j: a NOSQL overview and the benefits of graph databases
      #neo4j
      Emil Eifrem (@emileifrem), CEO, Neo Technology, emil@neotechnology.com
  • 2. What's the plan?
      Why now? – Four trends
      NOSQL overview
      Graph databases && Neo4j
      Conclusions
      Food
  • 3. Trend 1: data set size
      [Bar chart: 40 in 2007. Source: IDC 2007]
  • 4. Trend 1: data set size
      [Bar chart: 40 in 2007, 988 in 2010. Source: IDC 2007]
  • 5. Trend 2: connectedness
      [Figure: information connectivity rising over time (1990, 2000, 2010, 2020) through web 1.0, web 2.0 and “web 3.0”. In ascending order: text documents, hypertext, RSS, blogs, wikis, user-generated content, tagging, folksonomies, RDF, ontologies, Giant Global Graph (GGG)]
  • 6. Trend 3: semi-structure
      Individualization of content!
        In the salary lists of the 1970s, all elements had exactly one job
        In the salary lists of the 2000s, we need 5 job columns! Or 8? Or 15?
      Trend accelerated by the decentralization of content generation that is the hallmark of the age of participation (“web 2.0”)
  • 7. Aside: RDBMS performance
      [Figure: relational database performance plotted against data complexity. Points along the complexity axis: salary list, majority of webapps, social network, semantic trading; the last ones are bracketed as “custom”]
  • 8. Trend 4: architecture 1990s: Database as integration hub
  • 9. Trend 4: architecture 2000s: (Slowly towards...) Decoupled services with own backend
  • 10. Why NoSQL 2009? Trend 1: Size. Trend 2: Connectivity. Trend 3: Semi-structure. Trend 4: Architecture.
  • 12. First off: the damn name
      NoSQL is NOT “Never SQL”
      NoSQL is NOT “No To SQL”
      NoSQL is NOT “WE HATE CHRIS' DOG”
  • 13. NOSQL is simply Not Only SQL!
  • 14. Four (emerging) NOSQL categories
      Key-value stores
        Based on DHTs / Amazon's Dynamo paper
        Data model: (global) collection of K-V pairs
        Example: Dynomite, Voldemort, Tokyo
      BigTable clones
        Based on Google's BigTable paper
        Data model: big table, column families
        Example: HBase, Hypertable
  • 15. Four (emerging) NOSQL categories
      Document databases
        Inspired by Lotus Notes
        Data model: collections of K-V collections
        Example: CouchDB, MongoDB
      Graph databases
        Inspired by Euler & graph theory
        Data model: nodes, rels, K-V on both
        Example: AllegroGraph, VertexDB, Neo4j
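The "Data model" lines on slides 14 and 15 can be pictured as plain Java collection shapes. The sketch below is only an illustration of those shapes under made-up names: `DataModelShapes`, `Rel`, and every key and value are assumptions for the example, and none of this is the API of any store named on the slides.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the four data-model shapes; not a real store API.
class DataModelShapes {

    // Key-value store: one (global) collection of K-V pairs.
    static Map<String, String> keyValue() {
        Map<String, String> store = new HashMap<>();
        store.put("person:1", "Emil");
        return store;
    }

    // BigTable clone: row key -> column family -> column -> value.
    static Map<String, Map<String, Map<String, String>>> bigTable() {
        Map<String, String> info = new HashMap<>();
        info.put("name", "Emil");
        Map<String, Map<String, String>> row = new HashMap<>();
        row.put("info", info);
        Map<String, Map<String, Map<String, String>>> table = new HashMap<>();
        table.put("person:1", row);
        return table;
    }

    // Document database: a collection of K-V collections (documents).
    static List<Map<String, Object>> documents() {
        Map<String, Object> doc = new HashMap<>();
        doc.put("name", "Emil");
        doc.put("age", 29);
        List<Map<String, Object>> collection = new ArrayList<>();
        collection.add(doc);
        return collection;
    }

    // Graph database: nodes and relationships, with K-V properties on both.
    static class Rel {
        final int from;
        final int to;
        final String type;
        final Map<String, Object> properties = new HashMap<>();

        Rel(int from, int to, String type) {
            this.from = from;
            this.to = to;
            this.type = type;
        }
    }

    // Fills the given node map (id -> properties) and returns the relationships.
    static List<Rel> graphRelationships(Map<Integer, Map<String, Object>> nodes) {
        Map<String, Object> emil = new HashMap<>();
        emil.put("name", "Emil");
        Map<String, Object> other = new HashMap<>();
        nodes.put(1, emil);
        nodes.put(2, other);
        Rel knows = new Rel(1, 2, "KNOWS");
        knows.properties.put("time", "4 years");
        List<Rel> rels = new ArrayList<>();
        rels.add(knows);
        return rels;
    }
}
```

Reading the shapes top to bottom, each one nests a little more structure than the last; only the graph shape keeps the relationships as first-class values with their own properties.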
  • 16. NOSQL data models
      [Figure: data size plotted against complexity. In order of decreasing size and increasing complexity: key-value stores, Bigtable clones, document databases, graph databases]
  • 17. NOSQL data models
      [Figure: same plot as slide 16, with two annotations: “90% of use cases” and, at the graph database end, “(This is still billions of nodes & relationships)”]
  • 19. The Graph DB model: representation
      Core abstractions:
        Nodes
        Relationships between nodes
        Properties on both
      [Figure: example graph with nodes 1, 2 and 3. One node has name = “Emil”, age = 29, sex = “yes”; one relationship has type = KNOWS, time = 4 years; one node has type = car, vendor = “SAAB”, model = “95 Aero”]
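The three core abstractions on slide 19 (nodes, relationships between nodes, properties on both) fit in a few small classes. This is a minimal in-memory sketch and not the Neo4j API: the class names `SketchProperties`, `SketchNode`, `SketchRelationship` and `PropertyGraphSketch` are hypothetical, and the direction of the KNOWS relationship (node 1 to node 2) is an assumption, since the flattened figure does not say.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the three core abstractions; NOT the Neo4j API.

// Shared by nodes and relationships: "properties on both".
class SketchProperties {
    private final Map<String, Object> properties = new LinkedHashMap<>();

    void setProperty(String key, Object value) { properties.put(key, value); }
    Object getProperty(String key) { return properties.get(key); }
}

class SketchNode extends SketchProperties {
    final int id;
    final List<SketchRelationship> outgoing = new ArrayList<>();

    SketchNode(int id) { this.id = id; }

    // Creates a typed, directed relationship from this node to another node.
    SketchRelationship createRelationshipTo(SketchNode other, String type) {
        SketchRelationship rel = new SketchRelationship(this, other, type);
        outgoing.add(rel);
        return rel;
    }
}

class SketchRelationship extends SketchProperties {
    final SketchNode start;
    final SketchNode end;
    final String type;

    SketchRelationship(SketchNode start, SketchNode end, String type) {
        this.start = start;
        this.end = end;
        this.type = type;
    }
}

class PropertyGraphSketch {
    public static void main(String[] args) {
        SketchNode emil = new SketchNode(1);
        emil.setProperty("name", "Emil");
        emil.setProperty("age", 29);

        SketchNode friend = new SketchNode(2);

        // Assumption: the KNOWS relationship on the slide runs from node 1 to node 2.
        SketchRelationship knows = emil.createRelationshipTo(friend, "KNOWS");
        knows.setProperty("time", "4 years");

        System.out.println(emil.getProperty("name") + " -" + knows.type + "-> node " + knows.end.id);
    }
}
```

Putting the property map in a shared base class is one way to express that relationships carry key-value data just like nodes do.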
  • 20. Example: The Matrix
      [Figure: graph of Matrix characters; node ids shown: 1, 2, 3, 7, 13, 42]
      Nodes:
        name = “Thomas Anderson”, age = 29
        name = “Morpheus”, rank = “Captain”, occupation = “Total badass”
        name = “Trinity”
        name = “Cypher”, last name = “Reagan”
        name = “Agent Smith”, version = 1.0b, language = C++
        name = “The Architect”
      Relationship types: KNOWS, CODED_BY
      Relationship properties shown: disclosure = public, disclosure = secret, age = 3 days, age = 6 months
  • 21. Code (1): Building a node space
        NeoService neo = ... // Get factory

        // Create Thomas 'Neo' Anderson
        Node mrAnderson = neo.createNode();
        mrAnderson.setProperty( "name", "Thomas Anderson" );
        mrAnderson.setProperty( "age", 29 );

        // Create Morpheus
        Node morpheus = neo.createNode();
        morpheus.setProperty( "name", "Morpheus" );
        morpheus.setProperty( "rank", "Captain" );
        morpheus.setProperty( "occupation", "Total bad ass" );

        // Create a relationship representing that they know each other
        mrAnderson.createRelationshipTo( morpheus, RelTypes.KNOWS );
        // ...create Trinity, Cypher, Agent Smith, Architect similarly
  • 22. Code (1): Building a node space
        NeoService neo = ... // Get factory
        Transaction tx = neo.beginTx();

        // Create Thomas 'Neo' Anderson
        Node mrAnderson = neo.createNode();
        mrAnderson.setProperty( "name", "Thomas Anderson" );
        mrAnderson.setProperty( "age", 29 );

        // Create Morpheus
        Node morpheus = neo.createNode();
        morpheus.setProperty( "name", "Morpheus" );
        morpheus.setProperty( "rank", "Captain" );
        morpheus.setProperty( "occupation", "Total bad ass" );

        // Create a relationship representing that they know each other
        mrAnderson.createRelationshipTo( morpheus, RelTypes.KNOWS );
        // ...create Trinity, Cypher, Agent Smith, Architect similarly

        tx.commit();
  • 23. Code (1b): Defining RelationshipTypes
        // In package org.neo4j.api.core
        public interface RelationshipType
        {
            String name();
        }

        // In package org.yourdomain.yourapp
        // Example on how to roll dynamic RelationshipTypes
        class MyDynamicRelType implements RelationshipType
        {
            private final String name;
            MyDynamicRelType( String name ) { this.name = name; }
            public String name() { return this.name; }
        }

        // Example on how to kick it, static-RelationshipType-like
        enum MyStaticRelTypes implements RelationshipType
        {
            KNOWS,
            WORKS_FOR,
        }
  • 24. Whiteboard friendly
      [Figure: whiteboard sketch with nodes Björn, Big Car and DayCare and relationships labelled owns, build and drives]
  • 26. The Graph DB model: traversal
      Traverser framework for high-performance traversing across the node space
      [Figure: the example graph from slide 19]
  • 27. Example: Mr Anderson’s friends
      [Figure: the Matrix character graph from slide 20]
  • 28. Code (2): Traversing a node space
        // Instantiate a traverser that returns Mr Anderson's friends
        Traverser friendsTraverser = mrAnderson.traverse(
            Traverser.Order.BREADTH_FIRST,
            StopEvaluator.END_OF_GRAPH,
            ReturnableEvaluator.ALL_BUT_START_NODE,
            RelTypes.KNOWS,
            Direction.OUTGOING );

        // Traverse the node space and print out the result
        System.out.println( "Mr Anderson's friends:" );
        for ( Node friend : friendsTraverser )
        {
            System.out.printf( "At depth %d => %s%n",
                friendsTraverser.currentPosition().getDepth(),
                friend.getProperty( "name" ) );
        }
  • 29. [Figure: the Matrix character graph from slide 20, shown next to the traverser call and its console output]
        friendsTraverser = mrAnderson.traverse(
            Traverser.Order.BREADTH_FIRST,
            StopEvaluator.END_OF_GRAPH,
            ReturnableEvaluator.ALL_BUT_START_NODE,
            RelTypes.KNOWS,
            Direction.OUTGOING );

        $ bin/start-neo-example
        Mr Anderson's friends:
        At depth 1 => Morpheus
        At depth 1 => Trinity
        At depth 2 => Cypher
        At depth 3 => Agent Smith
        $
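The breadth-first order and the depths in the slide 29 output can be reproduced with a plain queue-based walk. The sketch below is a stand-in for the idea, not the Neo4j Traverser API: `BreadthFirstSketch`, `matrixKnows` and `friendsOf` are hypothetical names, and the outgoing KNOWS edges are an assumption chosen to be consistent with the printed result, because the flattened figure does not show which nodes each relationship connects.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical stand-in for the traverser; NOT the Neo4j Traverser API.
class BreadthFirstSketch {

    // Assumed outgoing KNOWS edges, chosen to be consistent with the slide output.
    static Map<String, List<String>> matrixKnows() {
        Map<String, List<String>> knows = new HashMap<>();
        knows.put("Thomas Anderson", Arrays.asList("Morpheus", "Trinity"));
        knows.put("Morpheus", Arrays.asList("Trinity", "Cypher"));
        knows.put("Cypher", Arrays.asList("Agent Smith"));
        return knows;
    }

    // Breadth-first walk to the end of the graph, reporting every node except the start.
    static List<String> friendsOf(Map<String, List<String>> knows, String start) {
        List<String> lines = new ArrayList<>();
        Map<String, Integer> depth = new HashMap<>(); // doubles as the visited set
        Queue<String> queue = new ArrayDeque<>();
        depth.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            List<String> friends = knows.getOrDefault(current, Collections.<String>emptyList());
            for (String friend : friends) {
                if (depth.containsKey(friend)) {
                    continue; // already reached at the same or a smaller depth
                }
                depth.put(friend, depth.get(current) + 1);
                lines.add("At depth " + depth.get(friend) + " => " + friend);
                queue.add(friend);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println("Mr Anderson's friends:");
        for (String line : friendsOf(matrixKnows(), "Thomas Anderson")) {
            System.out.println(line);
        }
    }
}
```

Recording the depth when a node is first enqueued is what makes Trinity appear at depth 1 even though she is also reachable through Morpheus at depth 2.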
  • 30. Example: Friends in love?
      [Figure: the Matrix character graph from slide 20, with an added LOVES relationship]
  • 31. Code (3a): Custom traverser
        // Create a traverser that returns all “friends in love”
        Traverser loveTraverser = mrAnderson.traverse(
            Traverser.Order.BREADTH_FIRST,
            StopEvaluator.END_OF_GRAPH,
            new ReturnableEvaluator()
            {
                public boolean isReturnableNode( TraversalPosition pos )
                {
                    return pos.currentNode().hasRelationship(
                        RelTypes.LOVES, Direction.OUTGOING );
                }
            },
            RelTypes.KNOWS,
            Direction.OUTGOING );
  • 32. Code (3a): Custom traverser
        // Traverse the node space and print out the result
        System.out.println( "Who’s a lover?" );
        for ( Node person : loveTraverser )
        {
            System.out.printf( "At depth %d => %s%n",
                loveTraverser.currentPosition().getDepth(),
                person.getProperty( "name" ) );
        }
  • 33. [Figure: the Matrix character graph with the LOVES relationship from slide 30, shown next to the custom evaluator and its console output]
        new ReturnableEvaluator()
        {
            public boolean isReturnableNode( TraversalPosition pos )
            {
                return pos.currentNode().hasRelationship(
                    RelTypes.LOVES, Direction.OUTGOING );
            }
        },

        $ bin/start-neo-example
        Who’s a lover?
        At depth 1 => Trinity
        $
  • 34. Bonus code: domain model
      How do you implement your domain model? Use the delegator pattern, i.e. every domain entity wraps a Neo4j primitive:
        // In package org.yourdomain.yourapp
        class PersonImpl implements Person
        {
            private final Node underlyingNode;
            PersonImpl( Node node ) { this.underlyingNode = node; }

            public String getName()
            {
                // getProperty() returns Object, so cast to String
                return (String) this.underlyingNode.getProperty( "name" );
            }

            public void setName( String name )
            {
                this.underlyingNode.setProperty( "name", name );
            }
        }
  • 35. Domain layer frameworks
      Qi4j (www.qi4j.org)
        Framework for doing DDD in pure Java5
        Defines Entities / Associations / Properties
          Sound familiar? Nodes / Rel’s / Properties!
        Neo4j is an “EntityStore” backend
      NeoWeaver (http://components.neo4j.org/neo-weaver)
        Weaves Neo4j-backed persistence into domain objects in runtime (dynamic proxy / cglib based)
        Veeeery alpha
  • 36. Neo4j system characteristics
      Disk-based
        Native graph storage engine with custom binary on-disk format
      Transactional
        JTA/JTS, XA, 2PC, Tx recovery, deadlock detection, MVCC, etc
      Scales up (what's the x and the y?)
        Several billions of nodes/rels/props on single JVM
      Robust
        6+ years in 24/7 production
  • 37. Social network pathExists()
      ~1k persons
      Avg 50 friends per person
      pathExists(a, b) limit depth 4
      Two backends
      Eliminate disk IO so warm up caches
      [Figure: sample social graph with numbered person nodes]
  • 38. Social network pathExists()
      [Figure: sample social graph with persons Emil, Mike, Kevin, John, Marcus, Bruce and Leigh]

      Backend                   # persons    Query time
      Relational database       1 000        2 000 ms
      Graph database (Neo4j)    1 000        2 ms
      Graph database (Neo4j)    1 000 000    2 ms
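The query measured on slides 37 and 38, pathExists(a, b) limited to depth 4, amounts to a depth-limited breadth-first search. The sketch below shows one way to write it over an in-memory adjacency map. It makes no claim about how either benchmarked backend implements the query and does not reproduce the timings: the class `PathExistsSketch`, the integer person ids and the directed friend lists are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical depth-limited search; not taken from the benchmarked backends.
class PathExistsSketch {

    // Returns true if b can be reached from a by following at most maxDepth friend links.
    static boolean pathExists(Map<Integer, List<Integer>> friends, int a, int b, int maxDepth) {
        if (a == b) {
            return true;
        }
        Set<Integer> visited = new HashSet<>();
        visited.add(a);
        List<Integer> frontier = new ArrayList<>();
        frontier.add(a);

        // Expand one level of friends per iteration, stopping at the depth limit.
        for (int depth = 1; depth <= maxDepth && !frontier.isEmpty(); depth++) {
            List<Integer> next = new ArrayList<>();
            for (int person : frontier) {
                List<Integer> direct = friends.getOrDefault(person, Collections.<Integer>emptyList());
                for (int friend : direct) {
                    if (friend == b) {
                        return true;
                    }
                    if (visited.add(friend)) {
                        next.add(friend); // first time seen: explore it on the next level
                    }
                }
            }
            frontier = next;
        }
        return false;
    }
}
```

For a chain of persons 1 → 2 → 3 → 4 → 5 → 6, `pathExists(chain, 1, 5, 4)` finds the path while `pathExists(chain, 1, 6, 4)` gives up at the limit. With an average of 50 friends per person, the depth limit is what keeps the number of visited persons bounded.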
  • 41. Pros & Cons compared to RDBMS
      + No O/R impedance mismatch (whiteboard friendly)
      + Can easily evolve schemas
      + Can represent semi-structured info
      + Can represent graphs/networks (with performance)
      - Lacks in tool and framework support
      - Few other implementations => potential lock in
      - No support for ad-hoc queries
  • 42. More consequences
      Ability to capture semi-structured information
        => allowing individualization of content
      No predefined schema
        => easier to evolve model
        => can capture ad-hoc relationships
      Can capture non-normative relations
        => easy to model specific links to specific sets
      All state is kept in transactional memory
        => improves application concurrency
  • 43. The Neo4j ecosystem
      Neo4j is an embedded database
        Tiny teeny lil jar file
      Component ecosystem
        index-util
        neo-meta
        neo-utils
        pattern-match
        sparql-engine
        ...
      See http://components.neo4j.org
  • 44. Language bindings
      Neo4j.py – bindings for Jython and CPython
        http://components.neo4j.org/neo4j.py
      Neo4jrb – bindings for JRuby (incl RESTful API)
        http://wiki.neo4j.org/content/Ruby
      Clojure
        http://wiki.neo4j.org/content/Clojure
      Scala (incl RESTful API)
        http://wiki.neo4j.org/content/Scala
      … .NET? Erlang?
  • 48. Scale out – replication
      Rolling out Neo4j HA before end-of-year
        Side note: ppl roll it today w/ REST frontends & onlinebackup
      Master-slave replication, 1st configuration
        MySQL style... ish
        Except all instances can write, synchronously between writing slave & master (strong consistency)
        Updates are asynchronously propagated to the other slaves (eventual consistency)
      This can handle billions of entities...
      … but not 100B
  • 49. Scale out – partitioning
      Sharding possible today
        … but you have to do manual work
        … just as with MySQL
        Great option: shard on top of resilient, scalable OSS app server, see: www.codecauldron.org
      Transparent partitioning? Neo4j 2.0
        100B? Easy to say. Sliiiiightly harder to do.
        Fundamentals: BASE & eventual consistency
        Generic clustering algorithm as base case, but give lots of knobs for developers
  • 50. How ego are you? (aka other impls?)
      Franz’ AllegroGraph (http://agraph.franz.com)
        Proprietary, Lisp, RDF-oriented but real graphdb
      FreeBase graphd (http://bit.ly/13VITB)
        In-house at Metaweb
      Kloudshare (http://kloudshare.com)
        Graph database in the cloud, still stealth mode
      Google Pregel (http://bit.ly/dP9IP)
        We are oh-so-secret
      Some academic papers from ~10 years ago
        G = {V, E} #FAIL
  • 51. Conclusion
      Graphs && Neo4j => teh awesome!
      Available NOW under AGPLv3 / commercial license
        AGPLv3: “if you’re open source, we’re open source”
        If you have proprietary software? Must buy a commercial license
        But up to 1M primitives it’s free for all uses!
      Download
        http://neo4j.org
      Feedback
        http://lists.neo4j.org
  • 53. Poop 1
      Key-value stores?
      => the awesome
        … if you have 1000s of BILLIONS records OR you don't care about programmer productivity
      What if you had no variables at all in your programs except a single globally accessible hashtable?
      Would your software be maintainable?
  • 54. Poop 2 In a not-suck architecture... … the only thing that makes sense is to have an embedded database.
  • 55. Poop 3 Exposing your data model on the wire is bad. Period. Adding a couple of buzzwords doesn't make it less bad. If it was bad with SQL-over-sockets (hint: it was) then – surprise! – it's still bad even tho you use Hype-compliant(tm) JSON-over-REST. We don't want to couple everything to a specific data model again!
  • 56. Poop 4
      In-memory database
        What the hell? That's an oxymoron!
        Up next: ascii-only JPEG
        Up next: loopback-only web server
      If you're not durable, you're a cache!
      If you happen to asynchronously spill over to disk, you're a cache that asynchronously spills over to disk.
  • 58. Looking ahead: polyglot persistence SQL && NoSQL
  • 60. Questions? Image credit: lost again! Sorry :(