Graph-Based
Stream Processing
This is Katarina
● Born on the Croatian coast, in the town of Šibenik
● Master’s in Mathematics and Computer Science from
the Faculty of Science in Zagreb
● I work as a Developer Relations Engineer at Memgraph
● I love to travel, cook and eat tasty food (check out my
Instagram)
This is Ivan
● Born and living in Croatia
● Master’s in Computer science from the Faculty of
Electrical Engineering and Computing in Zagreb
● I work as a Developer Relations Engineer at
Memgraph
● I love to travel, cook and watch Netflix too much
The graph data model
When to use a graph database?
What are graphs?
A graph is a network structure that consists
of a set of nodes (vertices) and a set of
relationships (edges) connecting them.
● Nodes - structures that represent
entities
● Relationships - connections between
these entities
● Properties - associated values
(key-value pairs) belonging to either
nodes or relationships
Labeled property graph model
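As a rough illustration of the labeled property graph model, the three building blocks can be sketched as plain Python data classes. The class names, fields and sample values below are assumptions made for this example and are not taken from Memgraph or GQLAlchemy.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Set


@dataclass
class Node:
    """An entity, tagged with labels and carrying key-value properties."""
    id: int
    labels: Set[str] = field(default_factory=set)
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    """A typed, directed connection between two nodes, with its own properties."""
    start: int
    end: int
    type: str
    properties: Dict[str, Any] = field(default_factory=dict)


# Hypothetical sample data: one customer who bought one product.
customer = Node(1, {"Customer"}, {"customer_id": "customer-one"})
product = Node(2, {"Product"}, {"product_name": "keyboard"})
bought = Relationship(customer.id, product.id, "BOUGHT", {"quantity": 1})
```

Both nodes and relationships carry properties, which is what distinguishes this model from a bare set of vertices and edges.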
Graph database vs relational database
Cypher query language
Cypher is the most widely adopted, fully specified, open query language for property graph
databases. It provides an intuitive way to work with property graphs.
Cypher contains:
- clauses such as MATCH, DELETE, SET, RETURN…
- functions such as round(), cos(), toString()...
- custom procedures, which in Memgraph can be written in Python, C/C++ and Rust
Cypher query language
MATCH (u:Customer {customer_id: 'customer-one'})-[:BOUGHT]->(p:Product)
      <-[:BOUGHT]-(peer:Customer)-[:BOUGHT]->(reco:Product)
WHERE NOT (u)-[:BOUGHT]->(reco)
RETURN reco AS Recommendation, count(*) AS Frequency
ORDER BY Frequency DESC
LIMIT 5;
SQL
SELECT product.product_name AS Recommendation, count(1) AS Frequency
FROM product, customer_product_mapping, (
    SELECT cpm3.product_id, cpm3.customer_id
    FROM customer_product_mapping cpm,
         customer_product_mapping cpm2,
         customer_product_mapping cpm3
    WHERE cpm.customer_id = 'customer-one'
      AND cpm.product_id = cpm2.product_id
      AND cpm2.customer_id != 'customer-one'
      AND cpm3.customer_id = cpm2.customer_id
      AND cpm3.product_id NOT IN (
          SELECT DISTINCT product_id
          FROM customer_product_mapping cpm
          WHERE cpm.customer_id = 'customer-one')
) recommended_products
WHERE customer_product_mapping.product_id = product.product_id
  AND customer_product_mapping.product_id = recommended_products.product_id
  AND customer_product_mapping.customer_id = recommended_products.customer_id
GROUP BY product.product_name
ORDER BY Frequency DESC
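To make the traversal behind both queries easier to follow, here is a minimal Python sketch of the same "customers who bought this also bought" logic over an in-memory mapping. The function name, the dictionary layout and the sample data are assumptions for illustration only. The sketch assumes at most one BOUGHT relationship per customer and product.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def recommend(bought: Dict[str, Set[str]], customer: str,
              limit: int = 5) -> List[Tuple[str, int]]:
    """Rank products bought by peers who share a purchase with `customer`."""
    own = bought.get(customer, set())
    frequency: Counter = Counter()
    for peer, peer_products in bought.items():
        if peer == customer:
            continue
        shared = own & peer_products          # products linking customer and peer
        if not shared:
            continue
        for product in peer_products - own:   # skip what the customer already has
            # One match per (shared product, peer, recommended product) path.
            frequency[product] += len(shared)
    return frequency.most_common(limit)


# Hypothetical purchases keyed by customer id.
purchases = {
    "customer-one": {"a", "b"},
    "customer-two": {"a", "b", "x"},
    "customer-three": {"a", "x", "y"},
    "customer-four": {"z"},
}
recommendations = recommend(purchases, "customer-one")
```

In this sample, product "x" is reached through three paths and "y" through one, so "x" ranks first.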
Graph analytics in action
When to use graph analytics?
Graph analytics
Graph analytics, also called network analysis,
generates insights hidden in the relationships of
the network structure.
Some common algorithms include:
● Clustering & community detection
● Connected components
● PageRank
● Shortest path
● BFS & DFS
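As one example from the list above, PageRank can be sketched with a short power iteration in Python. This is a static, textbook-style version and not the dynamic variant used later in the talk. The damping factor, iteration count and sample graph are assumptions chosen for illustration.

```python
from typing import Dict, List


def pagerank(graph: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Approximate PageRank for a directed graph given as adjacency lists."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the same baseline share of rank.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical retweet graph: an edge points from the retweeter to the author.
ranks = pagerank({"alice": ["carol"], "bob": ["carol"], "carol": []})
```

The account that receives the most incoming edges ends up with the highest score, which is the intuition behind using PageRank to find influential accounts.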
Graph analytics in action
Google was built on the
PageRank algorithm measuring
the importance of web pages.
Facebook’s social graph uses
Community Detection to infer unknown
data about their users based on similar
network behavior of other users to
power their ad-targeting engine.
Amazon uses Collaborative Filtering
to deliver high quality real-time
product recommendations.
Pinterest uses Random Walks and
Graph-Machine Learning to deliver
high-quality personalized
recommendations responsible for more
than 80% of all user engagements.
“Graph-Machine Learning
features proved the most
valuable of all other features
when determining the quality and
relevancy of our dish and
restaurant recommendations.”
Social networks
[Apache Kafka® Meetup by Confluent] Graph-based stream processing
Recommendation engines
Supply chain management
Fraud detection
Graph stream processing
Analyze your data in real-time instead of in batches
Graph stream processing
● Stream processing is a type of big data
architecture in which the data is analyzed
in real time
● It is used whenever you need to extract
additional information from your data as
it’s being consumed
● The processing happens asynchronously -
the data source and the stream processor
work independently without waiting for a
response from one another
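The asynchronous decoupling described above can be sketched with a queue between a producer and a consumer. This is only an in-process analogy using Python's asyncio, with the queue standing in for a message broker such as Kafka. The event names and the upper-casing "analysis" step are placeholders.

```python
import asyncio
from typing import List


async def producer(queue: asyncio.Queue, events: List[str]) -> None:
    """Data source: publishes events without waiting for them to be processed."""
    for event in events:
        await queue.put(event)
    await queue.put(None)  # sentinel marking the end of the stream


async def consumer(queue: asyncio.Queue, seen: List[str]) -> None:
    """Stream processor: handles each event as soon as it arrives."""
    while True:
        event = await queue.get()
        if event is None:
            break
        seen.append(event.upper())  # placeholder for real analysis


async def main() -> List[str]:
    queue: asyncio.Queue = asyncio.Queue()
    seen: List[str] = []
    await asyncio.gather(
        producer(queue, ["retweet-1", "retweet-2"]),
        consumer(queue, seen),
    )
    return seen


processed = asyncio.run(main())
```

Neither side calls the other directly. Each one only talks to the queue, which is what lets the source and the processor run at their own pace.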
Memgraph
Memgraph is a platform for graph computation
on streaming data powered by an in-memory
graph database.
Memgraph is used to:
● Store graph data in memory
● Run graph analytics
● Analyze streaming data
GQLAlchemy
GQLAlchemy is a fully open-source Python
library. It is an Object Graph Mapper (OGM) - a
link between Graph Database objects and Python
objects.
GQLAlchemy includes:
● OGM capabilities
● Query builder
● On-disk storage
● Graph schema validation
Jupyter Notebook Demo:
● https://github.com/memgraph/jupyter-memgraph-tutorials/tree/main/twitter_network_analysis
The stream processing architecture
What do you need for stream processing?
Stream processing architecture
Demo time!
The Twitter dataset
Analyzing a Twitter network
We are going to find the most christmassy
person on Twitter using the hashtag #christmas
by analyzing retweets of the original tweet:
● Dynamic PageRank algorithm is used to find
the most influential account
● Dynamic community detection algorithm is
used to figure out which accounts belong to
the same community
Graph stream processing with Memgraph
1. Transformation module
Transforming Kafka messages into graph objects:
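A minimal sketch of the core of such a transformation, assuming each Kafka message is a JSON object with the hypothetical fields `source_username` and `target_username`. The field names, the `User` label and the `RETWEETED` relationship type are assumptions and may differ from the demo's actual schema.

```python
import json
from typing import Any, Dict, Tuple

# Cypher that upserts both accounts and the retweet relationship between them.
RETWEET_QUERY = (
    "MERGE (source:User {username: $source}) "
    "MERGE (target:User {username: $target}) "
    "MERGE (source)-[:RETWEETED]->(target)"
)


def retweet_to_query(payload: bytes) -> Tuple[str, Dict[str, Any]]:
    """Turn one raw Kafka message payload into a Cypher query plus parameters."""
    tweet = json.loads(payload.decode("utf-8"))
    parameters = {
        "source": tweet["source_username"],
        "target": tweet["target_username"],
    }
    return RETWEET_QUERY, parameters


# Inside Memgraph, a helper like this would typically be called from a
# function decorated with @mgp.transformation, once per message in the batch.
# The mgp module is only importable inside Memgraph, so it is left out here.
query, params = retweet_to_query(
    b'{"source_username": "alice", "target_username": "bob"}'
)
```

Using MERGE instead of CREATE keeps the graph free of duplicate accounts when the same user appears in many messages.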
2. PageRank
Set up everything and run the dynamic PageRank algorithm:
3. Create and start the stream
4. Create trigger
Stream processing in Memgraph
1. Create a transformation module in Python
2. Run the PageRank and community detection algorithms with GQLAlchemy
3. Create and start the stream with GQLAlchemy
4. Create a trigger and module to send PageRank results back to a Kafka topic
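Steps 3 and 4 can be sketched as Cypher statements assembled in Python. This is an assumption-laden outline: the stream name, topic, transformation name, broker address and the procedure `publisher.send_ranks` are all hypothetical, and the statements follow the general shape of Memgraph's `CREATE KAFKA STREAM` and `CREATE TRIGGER` syntax rather than the demo's exact code.

```python
from typing import List


def stream_setup_queries(stream: str, topic: str, transform: str,
                         bootstrap_servers: str) -> List[str]:
    """Build the Cypher statements that create and then start a Kafka stream."""
    return [
        f"CREATE KAFKA STREAM {stream} TOPICS {topic} "
        f"TRANSFORM {transform} BOOTSTRAP_SERVERS '{bootstrap_servers}';",
        f"START STREAM {stream};",
    ]


def trigger_query(name: str, procedure_call: str) -> str:
    """Build a trigger that runs a statement after relationships are created."""
    return (
        f"CREATE TRIGGER {name} ON --> CREATE AFTER COMMIT "
        f"EXECUTE {procedure_call};"
    )


# Hypothetical names throughout; publisher.send_ranks is not a real procedure.
queries = stream_setup_queries(
    "retweets", "retweets", "twitter.tweet", "localhost:9092"
)
queries.append(
    trigger_query("publish_ranks", "CALL publisher.send_ranks(createdEdges)")
)
```

Each string could then be sent to the database through whichever client is in use, for example a GQLAlchemy connection.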
Want to play with the Twitter dataset?
Check out the Memgraph Playground:
https://playground.memgraph.com/sandbox/twitter-christmas-retweets
Thank you for your attention!
www.memgraph.io
If you like what we do,
throw us a star! ⭐
https://github.com/memgraph/memgraph
Check out the Twitter network
analysis repository!
https://github.com/memgraph/twitter-network-analysis
Join our Discord server!
memgr.ph/discord
supe_katarina
katarina.supe@memgraph.io
ivan_g_despot
ivan.despot@memgraph.io
