Kettle & Neo4j
Matt Casters, matt.casters@neo4j.com
mattcasters
Agenda
•What is Kettle?
•The Neo4j plugins
•Loading data into Neo4j (with demo)
•Extracting data from Neo4j (with demo)
•Recap
•Q&A
What is Kettle?
Kettle: Introduction
•a.k.a Pentaho Data Integration
•One of the most widely used ETL tools
•Ready for the most demanding tasks
•Open source: Apache License 2.0
•Well maintained
•Large community, marketplace, ...
•Easy to embed, install, package, rebrand
•Download from SourceForge / Pentaho / PDI-CE
Kettle: Introduction
• Kettle
• Extraction
• Transformation
•Transportation
• Loading
• Environment
• PDI: Pentaho Data Integration @ Hitachi Vantara
Kettle: Architecture
•Metadata driven, engine based:
•No code generation
•Define what you need to happen
-> GUI, Web, code, rules, …
•Execute wherever you need to
-> From Raspberry Pi to Hadoop
•Types of work:
● Jobs for workflows
● Transformations for parallel streaming
Kettle: Design
• 100% Exposure of our engine through UI elements
• Everyone should be able to play along: plugins!
•We built integration points for others: run everywhere!
• Allow the user to avoid programming anything
• Allow the user to program anything: JavaScript, Java,
SQL, RegEx, Rules, Python, Ruby, R, OO Formula, Pig, …
• Transparency wins: top class logging, data lineage,
execution lineage, debugging, data previewing, row
sniff testing, …
Kettle: Cool things
• SpoonGit: UI integration with git
• WebSpoon: web interface to the full Spoon UI
•Data Sets: build transformation unit tests
• Large marketplace with:
http://www.pentaho.com/marketplace/
• Project on GitHub has over 1,000 forks
https://github.com/pentaho/pentaho-kettle
Kettle: Quick Spoon intro
Neo4j Kettle Plugins
Plugins: Neo4j Cypher
•For reading and writing
•Dynamic Cypher
•Batching and UNWIND
•Parallel execution
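
To illustrate the "Batching and UNWIND" bullet: instead of sending one statement per row, rows can be grouped and passed as a single list parameter that Cypher unpacks with UNWIND. The sketch below only prepares the statement and parameters; it does not talk to a database and is not the plugin's actual implementation. The Person label, the id and name properties, and the function names are assumptions made for illustration.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical statement: label and property names are illustrative only.
UNWIND_MERGE = (
    "UNWIND $rows AS row "
    "MERGE (p:Person {id: row.id}) "
    "SET p.name = row.name"
)


def batches(rows: Iterable[Dict], size: int) -> Iterator[List[Dict]]:
    """Group incoming rows into lists of at most `size` rows."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[Dict] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, partially filled batch
        yield batch


def unwind_calls(rows: Iterable[Dict], size: int) -> Iterator[Tuple[str, Dict]]:
    """Yield one (statement, parameters) pair per batch of rows."""
    for batch in batches(rows, size):
        yield UNWIND_MERGE, {"rows": batch}
```

Each pair yielded by unwind_calls would correspond to one round trip to the database, so five rows with a batch size of two need three calls rather than five.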
Plugins: Neo4j Output
•Easy node creation
•Create/Merge of ()-[]-()
•Batching and UNWIND
•Parallel execution
•Dynamic labels
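
A rough sketch of what "Create/Merge" with "Dynamic labels" implies: Cypher parameters have traditionally carried values rather than labels, so a label taken from an input field has to be written into the statement text and quoted safely. The function names, and the $key and $props parameter names, are assumptions for illustration and do not describe how the plugin builds its statements.

```python
def escape_name(name: str) -> str:
    """Quote a label or property name with backticks, doubling embedded ones."""
    if not name:
        raise ValueError("name must not be empty")
    return "`" + name.replace("`", "``") + "`"


def node_statement(label: str, key: str, merge: bool = True) -> str:
    """Build a CREATE or MERGE statement for a node whose label is chosen at run time."""
    verb = "MERGE" if merge else "CREATE"
    # The key value and the remaining properties stay as parameters;
    # only the label and the key property name are part of the text.
    return (
        f"{verb} (n:{escape_name(label)} "
        f"{{{escape_name(key)}: $key}}) SET n += $props"
    )
```

With label "Customer" and key "id" this produces a MERGE on (n:`Customer` {`id`: $key}); passing merge=False switches to CREATE.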
Plugins: Neo4j Graph Output
•Update parts of a graph
•Auto-generate Cypher
•Using model
•Using field mapping
Plugins: Check Neo4j Connection
•Job Entry
•Validate DBs are up
•Used for error diagnostics
•Defensive setup
Plugins: Neo4j Cypher Script
•Job Entry
•Executes a series of Cypher statements
Loading data into Neo4j
Loading Neo4j: loading nodes
•Demonstrates the Neo4j Output step
•Read a CSV file in parallel
•Load the data into nodes in parallel
Loading Neo4j: remove all data
•Demonstrates the Neo4j Cypher step
•Calls procedures
•Uses dynamic Cypher statements
•Reads and updates Neo4j
•Removes all nodes and relationships in batches
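
One common way to remove a graph in batches is to delete a limited number of nodes per statement and repeat until nothing is left. The sketch below shows that loop; it is an assumption about the general approach, not the transformation shown in the demo (which also calls procedures). The run callable is hypothetical: it stands for whatever executes a statement and returns the deleted count.

```python
from typing import Callable, Dict

# Deletes at most $batchSize nodes, together with their relationships,
# and reports how many nodes were removed.
DELETE_BATCH = (
    "MATCH (n) WITH n LIMIT $batchSize "
    "DETACH DELETE n RETURN count(*) AS deleted"
)


def delete_all(run: Callable[[str, Dict], int], batch_size: int = 10000) -> int:
    """Repeat the batch delete until a round deletes nothing; return the total.

    `run` is a hypothetical callable that executes the statement with the
    given parameters and returns the value of `deleted`.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    total = 0
    while True:
        deleted = run(DELETE_BATCH, {"batchSize": batch_size})
        if deleted == 0:
            return total
        total += deleted
```

Keeping each delete small bounds the size of every transaction, which is the usual reason for batching the removal of a large graph.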
Loading Neo4j: update graphs
•Demonstrates the Neo4j Graph Output step
•Updates multiple nodes and relationships at once
•Takes key values into account: nodes without key values are ignored
•Automatically generates MERGE statements
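
A minimal sketch of generating MERGE statements from a model plus a field mapping, under the assumption that a node is skipped when its key field is empty in the input row. The Customer/Order model, the PLACED relationship, the field names, and the function name are all hypothetical; the real step works from a graph model defined in Kettle.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping: node alias -> (label, key property, input field with the key)
NODES = {
    "c": ("Customer", "id", "customer_id"),
    "o": ("Order", "id", "order_id"),
}
# Hypothetical relationships: (source alias, relationship type, target alias)
RELATIONSHIPS = [("c", "PLACED", "o")]


def merge_statement(row: Dict[str, object]) -> Tuple[str, Dict[str, object]]:
    """Build one MERGE statement and its parameters for a single input row."""
    parts: List[str] = []
    params: Dict[str, object] = {}
    for alias, (label, key, field) in NODES.items():
        value = row.get(field)
        if value is None:
            continue  # no key value: leave this node out of the update
        parts.append(f"MERGE ({alias}:{label} {{{key}: ${alias}_key}})")
        params[f"{alias}_key"] = value
    for source, rel_type, target in RELATIONSHIPS:
        # A relationship is only merged when both of its nodes are present.
        if f"{source}_key" in params and f"{target}_key" in params:
            parts.append(f"MERGE ({source})-[:{rel_type}]->({target})")
    return " ".join(parts), params
```

A row carrying both keys yields two node MERGE clauses and one relationship MERGE; a row with only customer_id yields a single node MERGE, and a row with no keys yields an empty statement that the caller would skip.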
Loading Neo4j: Kafka updating Neo4j
• Demonstrates Kafka integration
• Stream data using a Kafka consumer
• Continuously update Neo4j
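
The streaming pattern can be sketched as a loop that parses incoming messages and flushes them to the graph in small groups. Here a plain iterable of JSON strings stands in for the Kafka consumer and write_batch stands in for the Neo4j write; both, along with the function name, are assumptions for illustration and use no Kafka or Neo4j API.

```python
import json
from typing import Callable, Dict, Iterable, List


def stream_to_neo4j(
    messages: Iterable[str],
    write_batch: Callable[[List[Dict]], None],
    batch_size: int = 500,
) -> int:
    """Parse JSON messages and pass them to `write_batch` in groups.

    Returns the number of records handed over.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    buffer: List[Dict] = []
    written = 0
    for message in messages:
        buffer.append(json.loads(message))
        if len(buffer) >= batch_size:
            write_batch(buffer)
            written += len(buffer)
            buffer = []  # start a fresh list; the old one belongs to the writer
    if buffer:  # only reached when the message source is finite
        write_batch(buffer)
        written += len(buffer)
    return written
```

A real consumer never runs dry, so a production loop would presumably also flush on a timer to keep the graph current during quiet periods.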
Extracting data with Kettle
Sourcing Neo4j: simple reading
● Read using a Cypher query
● Write to an Excel file
Sourcing Neo4j: Kettle JDBC
● Expose Neo4j queries as a virtual SQL table
● Allow SQL queries to run against Neo4j
Recap
Take-aways
With Kettle & Neo4j plugins:
•Work faster, tackle harder problems
•Reduce risk by showing results faster
•Keep maintenance costs under control
Kettle & Neo4j: Q&A

GraphDay Paris - Integrating data flows into Neo4j with the open source ETL tool Kettle
