Offshore Leaks:
Graph Databases For
Investigative Journalism
Justin Fine
Santa Monica, CA
Feb 2019
Panama Papers
Paradise Papers
GRAPHTOUR – San Francisco
https://neo4j.com/graphtour/
https://www.eventbrite.com/e/graphtour-san-francisco-ca-tickets-58600670182
Neo4j Graph Algorithms
https://neo4j.com/blog/graph-algorithms-neo4j-graph-technology-ai-applications/
Machine Learning and Graph Algorithms in Neo4j
Pathfinding & Search
• Parallel Breadth First Search & DFS
• Shortest Path
• Single-Source Shortest Path
• All Pairs Shortest Path
• Minimum Spanning Tree
• A* Shortest Path
• Yen’s K Shortest Path
• K-Spanning Tree (MST)
Centrality / Importance
• Degree Centrality
• Closeness Centrality
• CC Variations: Harmonic, Dangalchev, & Wasserman & Faust
• Betweenness Centrality
• Approximate Betweenness Centrality
• PageRank
• Personalized PageRank
• ArticleRank
Community Detection
• Triangle Count
• Clustering Coefficients
• Connected Components (Union Find)
• Strongly Connected Components
• Label Propagation
• Louvain Modularity – 1 Step & Multi-Step
• Balanced Triad (identification)
Similarity & ML Workflow
• Euclidean Distance
• Cosine Similarity
• Jaccard Similarity
• Overlap Similarity
• Random Walk
• One Hot Encoding
Reference Implementations for Graph Embeddings (Node to Vector)
• DeepGL
• DeepWalk
https://neo4j.com/docs/graph-algorithms/3.5/
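To give a feel for what one of the listed similarity measures computes, here is a minimal Python sketch of Jaccard similarity over sets of linked entities. It is not the Neo4j Graph Algorithms implementation; the function name `jaccard`, the officer and entity names, and the choice to return 0.0 for two empty sets are all assumptions made for the example.

```python
def jaccard(a, b):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


# Hypothetical data: which offshore entities each officer is linked to.
linked_entities = {
    "officer_a": {"entity_1", "entity_2", "entity_3"},
    "officer_b": {"entity_2", "entity_3", "entity_4"},
    "officer_c": {"entity_5"},
}

ab = jaccard(linked_entities["officer_a"], linked_entities["officer_b"])
ac = jaccard(linked_entities["officer_a"], linked_entities["officer_c"])
print(ab, ac)
```

Two officers who share most of their entities score close to 1, and officers with nothing in common score 0, which is the kind of signal that can help decide whom to look at next.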
Justin Fine
Field Engineer, Neo4j
Justin.Fine@neo4j.com
https://www.linkedin.com/in/justin-fine-a7604523
https://github.com/JustinFineNeo/IntroToNeo4J
Justin’s Graph
https://www.nbcnews.com/tech/social-media/russian-trolls-went-attack-during-key-election-moments-n827176
Graph Databases: insight, scandal and the speed you always wanted!
What’s a Graph Database???
Chart Graph
Common Use Cases
● Real-time recommendations
● Fraud Detection
● Network & IT Management
● Social Networks
● Bill of Materials
● Knowledge Graphs
● Master Data Management
● Access Management
● Microservices Analysis
● IoT
● ...
Neo4j Secret Sauce
• Fixed Sized Records
• “Joins” on Creation
• Spin Spin Spin through this data structure
• Pointers instead of Lookups
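The four points above can be illustrated with a small Python sketch. It is a deliberately simplified model, not Neo4j's actual store format: the record fields, the `Store` class and the singly linked relationship chains are assumptions for illustration. Records sit at fixed positions in a list, a relationship is wired into both nodes' chains when it is created, and traversal follows stored positions instead of searching an index.

```python
from dataclasses import dataclass

NO_REL = -1  # sentinel meaning "end of chain"


@dataclass
class NodeRecord:
    first_rel: int = NO_REL  # position of the first relationship in this node's chain


@dataclass
class RelRecord:
    start: int       # position of the start node record
    end: int         # position of the end node record
    start_next: int  # next relationship in the start node's chain
    end_next: int    # next relationship in the end node's chain


class Store:
    def __init__(self, node_count):
        self.nodes = [NodeRecord() for _ in range(node_count)]
        self.rels = []

    def connect(self, start, end):
        """The "join" happens once, at creation: link the record into both chains."""
        rel_id = len(self.rels)
        self.rels.append(
            RelRecord(start, end, self.nodes[start].first_rel, self.nodes[end].first_rel)
        )
        self.nodes[start].first_rel = rel_id
        self.nodes[end].first_rel = rel_id
        return rel_id

    def neighbors(self, node):
        """Walk the node's chain by following stored positions, with no scan or index."""
        rel_id = self.nodes[node].first_rel
        while rel_id != NO_REL:
            rel = self.rels[rel_id]
            if rel.start == node:
                yield rel.end
                rel_id = rel.start_next
            else:
                yield rel.start
                rel_id = rel.end_next


store = Store(node_count=4)
store.connect(0, 1)
store.connect(0, 2)
store.connect(3, 0)
print(list(store.neighbors(0)))
```

The cost of listing a node's neighbors in this sketch depends only on how many relationships that node has, not on how many records the store holds overall.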
Neo4j Mission Statement
To help the world make sense of data
bills.csv
committees.csv
votes.csv
https://www.govtrack.us/
SQL ER Diagrams
Relational Versus Graph Models
Relational Model
• Tables: Person, Person-Friend, Friend
• Rows: ANDREAS, DELIA, TOBIAS, MICA
Graph Model
• Nodes: ANDREAS, TOBIAS, MICA, DELIA
• Relationships: KNOWS, KNOWS, KNOWS
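A rough Python sketch of the contrast on this slide follows. The slide text does not say who knows whom, so the wiring (ANDREAS knows the other three) is an assumption, as are the numeric ids and the function names.

```python
# Relational shape: a person table plus a join table.
person = {1: "ANDREAS", 2: "TOBIAS", 3: "MICA", 4: "DELIA"}
person_friend = [(1, 2), (1, 3), (1, 4)]  # (person_id, friend_id); assumed wiring


def friends_relational(name):
    """Resolve name to id, scan the join table, then resolve the ids back to names."""
    ids = [pid for pid, n in person.items() if n == name]
    return sorted(person[fid] for pid, fid in person_friend if pid in ids)


# Graph shape: each node keeps its own KNOWS relationships.
class Node:
    def __init__(self, name):
        self.name = name
        self.knows = []  # direct references to other Node objects


nodes = {n: Node(n) for n in person.values()}
for pid, fid in person_friend:
    nodes[person[pid]].knows.append(nodes[person[fid]])


def friends_graph(name):
    """Start at the node and follow its relationships."""
    return sorted(friend.name for friend in nodes[name].knows)


print(friends_relational("ANDREAS"))
print(friends_graph("ANDREAS"))
```

Both functions return the same answer; the difference is that the relational version re-derives the connections at query time, while the graph version reads connections that were stored with the node.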
Property Graph Model Components
Nodes
• The objects in the graph
• Can have name-value properties
• Can be labeled
Relationships
• Relate nodes by type and direction
• Can have name-value properties
[Figure: example property graph]
Nodes:
• PERSON name: “Dan”, born: May 29, 1970, twitter: “@dan”
• PERSON name: “Ann”, born: Dec 5, 1975
• CAR brand: “Volvo”, model: “V70”
Relationships: LOVES, LOVES, LIVES WITH, DRIVES, OWNS; one carries the property since: Jan 10, 2011
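The components above map onto a few lines of Python. This is only a sketch of the model, not of how Neo4j stores it. The node properties come from the slide, but the slide text does not show which person drives or owns the car, or which relationship carries `since`, so that wiring is assumed and marked as such in the comments.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    type: str
    start: Node  # relationships always have a direction: start -> end
    end: Node
    properties: dict = field(default_factory=dict)


dan = Node({"Person"}, {"name": "Dan", "born": "May 29, 1970", "twitter": "@dan"})
ann = Node({"Person"}, {"name": "Ann", "born": "Dec 5, 1975"})
car = Node({"Car"}, {"brand": "Volvo", "model": "V70"})

relationships = [
    Relationship("LOVES", dan, ann),
    Relationship("LOVES", ann, dan),
    Relationship("LIVES_WITH", dan, ann),
    Relationship("DRIVES", dan, car, {"since": "Jan 10, 2011"}),  # assumed wiring
    Relationship("OWNS", ann, car),  # assumed wiring
]


def outgoing(node, rel_type):
    """Nodes reached from `node` over relationships of the given type."""
    return [r.end for r in relationships if r.start is node and r.type == rel_type]


print([n.properties["name"] for n in outgoing(dan, "LOVES")])
```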
Cypher: Powerful and Expressive Query Language
CREATE (:Person {name: "Dan"})-[:LOVES]->(:Person {name: "Ann"})
[Figure: the pattern above, annotated. Each (:Person {name: ...}) is a NODE with a LABEL and a PROPERTY; -[:LOVES]-> is the relationship from Dan to Ann]
Cypher Query Language
SQL for graphs
https://offshoreleaks.icij.org/pages/
Almost 200 journalists
Based in 65 countries
“Our aim is to bring journalists from different countries together in teams - eliminating rivalry and promoting collaboration. Together, we aim to be the world’s best cross-border investigative team.”
icij.org/about
Exposure of Hidden Secrets
Exposed the offshore holdings of 12 current and former world leaders.
And dealings of 128 more politicians and public officials around the world.
INSIDE THE 2.6 TB
Context is King
[Figure, shown twice: first as unconnected records, then with relationships drawn between them]
Records:
• PERSON name: “John”, last: “Miller”, role: “Negotiator”
• PERSON name: “Maria”, last: “Osara”
• PERSON name: “Jose”, last: “Pereia”, position: “Governor”
• PERSON name: “Alice”, last: “Smith”, role: “Advisor”
• name: “Some Media Ltd”, value: “$70M”
• Icons: $ and @
Relationships added in the second view: MENTIONS, SUPPORTS, SENT, WROTE, CONTAINS, OWNS, CREATED; one carries the property since: Jan 10, 2011
We need a data model
Meta Data Entities
• Document, Email, Contract, DB-Record
• Meta: Author, Date, Source, Keywords
• Conversation: Sender, Receiver, Topic
• Money Flows
Actual Entities
• Person
• Representative (Officer)
• Address
• Client
• Company
• Account
Based either on our use cases & questions or on the entities present in our meta-data and data.
Data model – relationships
Meta-Data
• sent, received, cc'ed
• mentioned, topic-of
• created, signed
• attached
• roles
• family relationships
Activities
• open account
• manage
• has shares
• registered address
• money flow
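Once entities and relationships like these are loaded, "connecting the dots" becomes a path search. The sketch below is a plain breadth-first search in Python over hypothetical sample data. The relationship names are uppercase adaptations of the lists above, the entity names (Person A, Company X and so on) are invented, and this is not the query ICIJ or Neo4j actually ran.

```python
from collections import deque

# Hypothetical sample data: (start, relationship type, end).
edges = [
    ("Person A", "SENT", "Email 1"),
    ("Email 1", "MENTIONED", "Company X"),
    ("Person B", "HAS_SHARES", "Company X"),
    ("Company X", "REGISTERED_ADDRESS", "Address 1"),
    ("Person C", "MANAGE", "Account 9"),
]


def build_adjacency(edges):
    """Index each entity's connections, ignoring direction for path finding."""
    adjacency = {}
    for start, rel_type, end in edges:
        adjacency.setdefault(start, []).append((rel_type, end))
        adjacency.setdefault(end, []).append((rel_type, start))
    return adjacency


def shortest_path(edges, source, target):
    """Breadth-first search: a shortest chain of entities from source to target, or None."""
    adjacency = build_adjacency(edges)
    previous = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for _rel_type, neighbor in adjacency.get(current, []):
            if neighbor not in previous:
                previous[neighbor] = current
                queue.append(neighbor)
    return None


print(shortest_path(edges, "Person A", "Person B"))
print(shortest_path(edges, "Person A", "Person C"))
```

In this sample, Person A and Person B never appear in the same record, yet the search links them through an email that mentions a company one of them holds shares in. Person C has no such chain, so the search returns None.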
The Basic ICIJ Data Model
The Real ICIJ Data Model
The Predictions
Actually found because of Neo4j
http://www.bbc.com/news/world-us-canada-41876939
https://www.icij.org/investigations/paradise-papers/us-president-donald-trumps-influencers/
Bit more connected though
Make Sense Of Data
https://neo4j.com/sandbox-v2/
Neo4j Desktop For ICIJ
https://offshoreleaks.icij.org/pages/database
https://www.eventbrite.com/e/introduction-to-neo4j-san-diego-tickets-56209010664
https://www.eventbrite.com/e/graphtour-san-francisco-ca-tickets-58600670182
https://maxdemarzi.com
(you)-[:HAVE]->(questions)<-[:ANSWERS]-(neo)
