Running Neo4j in Production
Tips, Tricks and Optimizations
This Talk...
● How we scaled our prod graph
● Challenges faced doing this
● Various lessons we learned and techniques we used
● Some stuff I’m looking forward to in Neo4j
SNAP Interactive
● Presented by David Fox (Big Data Engineer)
● Social dating app AYI (Are You Interested?)
● Friends and interests
How We Use Neo4j
● Model the friend data of our millions of users
● Indicate connections everywhere on app
● 1.1+ billion nodes
● 8.5+ billion relationships
● 450 GB+ store
● 3-instance cluster
Importing lots of data
● Find the right tool
o First try normal Cypher
o No good? Bring out the big guns: Java Batch Inserter
● Java Batch Inserter
o Sort relationships (GNU sort)
o Keep index lookups in memory only
▪ Giant HashMap!
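A minimal sketch of the "giant HashMap" idea, assuming friend pairs arrive as external user IDs. `ImportSketch`, `createNode` and `resolveAndSort` are hypothetical names, not the Batch Inserter API, and the in-memory sort stands in for the GNU sort pass over the input file.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: resolve external user IDs to node IDs from an
// in-memory HashMap so the import never falls back to an on-disk index.
class ImportSketch {
    // External (application) user ID -> node ID handed out at insert time.
    private final Map<String, Long> nodeIdByUserId = new HashMap<>();
    private long nextNodeId = 0;

    // Stand-in for a batch inserter's "create node" call (assumption).
    long createNode(String userId) {
        return nodeIdByUserId.computeIfAbsent(userId, id -> nextNodeId++);
    }

    // Turns (userId, friendId) pairs into (startNode, endNode) pairs,
    // sorted by start node and then end node before insertion.
    List<long[]> resolveAndSort(List<String[]> friendPairs) {
        List<long[]> edges = new ArrayList<>();
        for (String[] pair : friendPairs) {
            edges.add(new long[] { createNode(pair[0]), createNode(pair[1]) });
        }
        edges.sort(Comparator.<long[]>comparingLong(e -> e[0])
                .thenComparingLong(e -> e[1]));
        return edges;
    }
}
```

Sorting first means relationships for the same start node are written together instead of scattered across the store.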
But wait!!!
● Cypher CSV import
o Available in 2.1 M01
o Supposed to be good for importing large data sets
o Anyone tried it?
Read Querying
● Always try Cypher first
o Performance is being improved
● How can you tell if performance is where you need it to be?
o Time queries (cold vs. warm cache)
o Load testing!
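A rough sketch of timing cold vs. warm runs. `QueryTimer` is a hypothetical helper, and the `Supplier` in `main` is a placeholder workload; in practice it would run the Cypher query under test against a freshly restarted instance.

```java
import java.util.function.Supplier;

// Hypothetical helper: time the same query repeatedly so the first (cold)
// run can be compared with the later (warm) runs.
class QueryTimer {
    // Runs the query `runs` times and returns each duration in nanoseconds.
    static long[] time(Supplier<?> query, int runs) {
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            query.get();
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    // Average of every run except the first, i.e. the warm-cache runs.
    static double warmAverage(long[] nanos) {
        if (nanos.length < 2) {
            throw new IllegalArgumentException("need at least two runs");
        }
        long sum = 0;
        for (int i = 1; i < nanos.length; i++) {
            sum += nanos[i];
        }
        return (double) sum / (nanos.length - 1);
    }

    public static void main(String[] args) {
        // Placeholder workload (assumption), not a real database call.
        Supplier<Long> query = () -> {
            long total = 0;
            for (int i = 0; i < 1_000_000; i++) {
                total += i;
            }
            return total;
        };
        long[] t = time(query, 5);
        System.out.printf("cold: %d ns, warm avg: %.0f ns%n", t[0], warmAverage(t));
    }
}
```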
Read Querying cont.
● Dark querying: run the new queries against live traffic without showing users the results
o Great for benchmarking a system where Neo4j functionality is being introduced
o Mitigates risk
o Provides results that are very close to real-world patterns
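One way dark querying could be wired up, as a sketch only. `DarkQuery`, `live` and `shadow` are hypothetical names: the live path keeps serving users, while the shadow path (the new Neo4j-backed query) runs on the same request and only its latency and failures are recorded.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

// Hypothetical sketch of a dark query wrapper.
class DarkQuery<Q, R> {
    private final Function<Q, R> live;    // existing path, result is served
    private final Function<Q, ?> shadow;  // new path, result is discarded
    private final ExecutorService pool = Executors.newFixedThreadPool(2);

    final AtomicLong shadowCalls = new AtomicLong();
    final AtomicLong shadowFailures = new AtomicLong();
    final AtomicLong shadowNanos = new AtomicLong();

    DarkQuery(Function<Q, R> live, Function<Q, ?> shadow) {
        this.live = live;
        this.shadow = shadow;
    }

    R handle(Q request) {
        // Fire the shadow query in the background; users never see it.
        pool.execute(() -> {
            long start = System.nanoTime();
            try {
                shadow.apply(request);
            } catch (RuntimeException e) {
                shadowFailures.incrementAndGet();
            } finally {
                shadowNanos.addAndGet(System.nanoTime() - start);
                shadowCalls.incrementAndGet();
            }
        });
        return live.apply(request);
    }

    // Stops accepting work and waits briefly for in-flight shadow queries.
    void shutdown() {
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A failing shadow query only bumps a counter, which is where the risk mitigation comes from.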
Read Querying cont.
● Reads too slow? Try these things.
o Write high-throughput, business-critical queries in Java
▪ unmanaged extension
▪ faster
▪ hard limits
o Cache sharding
▪ route requests by country, age, gender, etc.
▪ you hit warm cache more often
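A sketch of one possible reading of cache sharding: requests with the same country, gender and age bucket always go to the same cluster instance, so that instance's cache stays warm for that slice of the graph. `CacheShardRouter`, the age bucket and the instance names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical router: picks a cluster instance from request attributes.
class CacheShardRouter {
    private final List<String> instances;

    CacheShardRouter(List<String> instances) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("need at least one instance");
        }
        this.instances = new ArrayList<>(instances);
    }

    // The same (country, gender, ageBucket) always maps to the same instance.
    String instanceFor(String country, String gender, int ageBucket) {
        int hash = Objects.hash(country, gender, ageBucket);
        return instances.get(Math.floorMod(hash, instances.size()));
    }
}
```

With a 3-instance cluster, each instance ends up caching roughly a third of the attribute combinations instead of all of them.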
Read Querying cont.
● Warm the cache!
o Touch all the nodes
o Touch all the relationships
Writing
● Decide which writes need to be synchronous and which can be asynchronous
● Queue up asynchronous writes (routine updates, non-vital to immediate user experience)
o Try to evenly distribute them
o How do we do this? Baserunner!
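A minimal sketch of the sync/async split, assuming updates can be represented as strings. `WriteDispatcher` and its methods are hypothetical; `store` stands in for whatever actually writes to Neo4j.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical sketch: vital writes go straight through, routine writes
// wait in a queue and are applied in small batches.
class WriteDispatcher {
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
    private final Consumer<String> store; // stand-in for the real write path

    WriteDispatcher(Consumer<String> store) {
        this.store = store;
    }

    // Vital to the immediate user experience: apply right away.
    void writeSync(String update) {
        store.accept(update);
    }

    // Routine update: queue it for later.
    void writeAsync(String update) {
        pending.add(update);
    }

    // Applies at most `max` queued updates and returns how many were applied.
    // Calling this on a fixed schedule spreads the write load evenly.
    int drain(int max) {
        List<String> batch = new ArrayList<>();
        pending.drainTo(batch, max);
        batch.forEach(store);
        return batch.size();
    }
}
```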
Baserunner
● Written by a SNAP developer
● Walks the userbase randomly instead of sequentially
o This avoids pockets of heavily increased write queries
o Allows us to do high-velocity updating of our data
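Baserunner's internals are not described in this talk, so the following is only a guess at the core idea: visit user IDs in shuffled order rather than ascending order. `RandomWalk` and `visitOrder` are hypothetical names, and contiguous user IDs are an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: produce a random visiting order over a range of
// user IDs so updates are not clustered on neighbouring users.
class RandomWalk {
    static List<Long> visitOrder(long firstUserId, int count, long seed) {
        List<Long> ids = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            ids.add(firstUserId + i);
        }
        // A seeded shuffle keeps a given run reproducible.
        Collections.shuffle(ids, new Random(seed));
        return ids;
    }
}
```

Every user is still visited exactly once per pass; only the order changes.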
Tuning the JVM
● For a really high-throughput environment, G1 GC has been very helpful
o Good at adapting itself
o We experienced fewer system-stopping pauses than with CMS
o Try CMS first, but remember G1 as an option
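On HotSpot JVMs that ship both collectors, the choice is made with the `-XX:+UseConcMarkSweepGC` or `-XX:+UseG1GC` startup flag. A small sketch for checking which collectors a running JVM actually uses and how much time they have taken; `GcReport` is a hypothetical name, the management API calls are standard.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: list the active garbage collectors with their
// collection counts and accumulated collection time.
class GcReport {
    static List<String> collectorNames() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + " collections=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        collectorNames().forEach(System.out::println);
    }
}
```

Comparing these numbers under load before and after switching collectors gives a first indication of whether pauses improved.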
Hardware is Important
● Lots of memory
● Working set too big for memory?
o SSDs are helpful
o The optimization techniques discussed become much more important
Not Everything is Your Fault!
● Like any software, Neo4j has bugs
● Developers are receptive
● File reports on GitHub when you find issues
Some stuff to look forward to...
● Relationship grouping (2.1 M01)
o helps mitigate the super node/dense node problem
● Ronja (rewrite of the Cypher query engine, 2.1?)
● More flexible label index searching (after 2.1)
Questions?
