1 KYOTO UNIVERSITY
KYOTO UNIVERSITY
GRADUATE SCHOOL OF INFORMATICS
Graph Machine Learning
- Past, Present, and Future -
Hisashi Kashima
Kyoto University
2 KYOTO UNIVERSITY
◼ Graph machine learning and graph signal processing have much in
common, but they have developed relatively separately
◼ A history from the standpoint of graph machine learning, particularly
predictive modeling, together with some of the recent developments
◼ Past: The age of data mining and kernel machines
◼ Current: The age of graph neural networks
◼ Future?: Fusion with causal inference
Graph machine learning
Past, present, and future
(Figure: Graph Machine Learning and Graph Signal Processing; today’s topic is graph machine learning, covering local and global features)
3 KYOTO UNIVERSITY
Graphs are versatile tools that model relationships between entities using nodes (points) and edges (lines connecting them). In the real world, graphs represent various complex systems and interactions:
◼ Social networks: Each person is a node, and friendships or professional connections are edges. Graphs help identify influential people, community structures, and the spread of information, aiding marketing strategies and social behavior research.
◼ Transportation networks: Cities or intersections are nodes, and roads or railways are edges. Graphs optimize routes, manage traffic, and improve urban planning; navigation apps use graph algorithms to find the quickest routes.
◼ Biological networks: Proteins or genes are nodes, and their interactions are edges. These graphs help us understand cellular functions and disease mechanisms, guiding the development of targeted therapies.
◼ Communication networks: Devices like computers or servers are nodes, and communication links are edges. Graphs support efficient data transfer, robust network design, and better cybersecurity.
◼ Recommendation systems: Users and products are nodes, and interactions (purchases or ratings) are edges. Graphs enhance recommendation accuracy, improving user experience on platforms like Amazon and Netflix.
Graphs are crucial for visualizing and analyzing complex relationships, optimizing processes, and improving decision-making across fields from social media and urban development to biology and technology.
Graph machine learning
Graphs are everywhere!
# Do not read them seriously, texts and figures are generated by ChatGPT
4 KYOTO UNIVERSITY
(The same ChatGPT-generated “graphs are everywhere” text as the previous slide, repeated verbatim.)
Graph machine learning
Graphs are everywhere!
We Just Skip This Because
WE ALL L♥VE GRAPHS !!
# Do not read them seriously, texts and figures are generated by ChatGPT
5 KYOTO UNIVERSITY
The Age of Data Mining
6 KYOTO UNIVERSITY
◼ “Data mining” emerged in the 1990s
◼ originated from the database community
◼ with the aim of discovering knowledge from large databases
◼ Association rules: One of the major inventions of data mining
◼ Rules with the form “If 𝐴, then 𝐵” that satisfy
◼ Pr(𝐴 ∧ 𝐵) > 𝜃 : support constraint
◼ Pr(𝐵 | 𝐴) > 𝜂 : confidence constraint
◼ Example: “If buy(burger) ∧ buy(fries), then buy(soda)”
◼ Key technical challenge: How to enumerate all rules efficiently
Data Mining
Aiming for knowledge discovery from huge databases
(Figure: if buy(burger) ∧ buy(fries), then buy(soda), with high probability)
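To make the support and confidence constraints concrete, here is a minimal Python sketch; the toy transaction database and the threshold values are illustrative assumptions, following the slide’s burger/fries/soda example:

```python
# A minimal sketch (toy data, not from the slides): computing support and
# confidence of a candidate rule over a small transaction database.
transactions = [
    {"burger", "fries", "soda"},
    {"burger", "fries"},
    {"burger", "fries", "soda"},
    {"fries", "soda"},
]

def support(itemset):
    """Pr(itemset): fraction of transactions containing all its items."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Pr(consequent | antecedent)."""
    return support(antecedent | consequent) / support(antecedent)

A = {"burger", "fries"}
B = {"soda"}
print(support(A | B))    # support of "if A then B" -> 0.5
print(confidence(A, B))  # confidence -> 2/3
```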
7 KYOTO UNIVERSITY
◼ Itemset pattern mining problem:
Enumerate all itemsets (combinations of items) that frequently appear
in database
◼ Find all itemsets appearing at least 𝑘 times
◼ Challenge: Exponential number of candidate itemset patterns exist
◼ We need to explore the huge space of the combinations efficiently
Itemset pattern mining
Discover all frequent item combinations
8 KYOTO UNIVERSITY
◼ Search space composition: Make a non-redundant search space
◼ To avoid evaluating the same patterns over and over again
◼ Search space pruning: Exploiting monotonicity
◼ If itemset {A} appears fewer than 𝑘 times, any superset {A, B} will never appear 𝑘 times
◼ Explore smaller item sets first, larger item sets later
Techniques for itemset pattern mining
Composition and pruning of search space
(Figure: if itemset {A} appears fewer than 𝑘 times, the whole branch of its supersets can be pruned)
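A minimal Apriori-style sketch of these two techniques (the toy database and threshold are illustrative; this is not the original algorithm’s code): smaller itemsets are explored first, and any candidate with an infrequent subset is pruned using the monotonicity of support:

```python
from itertools import combinations

def frequent_itemsets(transactions, k):
    items = {i for t in transactions for i in t}
    level = [frozenset([i]) for i in items]
    frequent = []
    while level:
        # keep only itemsets appearing at least k times in the database
        counted = [s for s in level if sum(s <= t for t in transactions) >= k]
        frequent.extend(counted)
        keep = set(counted)
        # join frequent size-n sets into size-(n+1) candidates; any candidate
        # with an infrequent size-n subset is never generated (pruning)
        level = list({a | b for a, b in combinations(counted, 2)
                      if len(a | b) == len(a) + 1
                      and all(frozenset(c) in keep
                              for c in combinations(a | b, len(a)))})
    return frequent

db = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
print(frequent_itemsets(db, k=3))  # singletons and all pairs; {A,B,C} fails
```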
9 KYOTO UNIVERSITY
◼ Attempts to extend pattern mining to graphs began around 2000
◼ AGM algorithm: Seminal work by Inokuchi et al. (2000)
◼ What is “knowledge” in graph data mining? - Subgraphs
◼ Substructures determine the properties of structured data
◼ Goal: Find all subgraph patterns appearing at least 𝑘 times
Graph Mining
Extension of frequent itemsets to graph data
(Quoted from Takigawa, I., & Mamitsuka, H. (2013). Graph mining: procedure, application to drug discovery and recent advances. Drug Discovery Today, 18(1-2), 50-57.)
Inokuchi, A., Washio, T., & Motoda, H. (2000). An apriori-based algorithm for mining frequent substructures from graph data.
In Principles of Data Mining and Knowledge Discovery (PKDD), 2000
10 KYOTO UNIVERSITY
◼ It is non-trivial to define an efficient search space for graphs
◼ Smart graph coding to avoid duplicate checking of isomorphic graphs
◼ AGM [Inokuchi et al., 2000] employed a vertex sorting code
◼ gSpan [Yan & Han, 2002] employed a depth-first search code
(and depth-first search using them)
Technical challenges in graph mining
Design of search space
(Quoted from Yan, X., & Han, J. (2002) gSpan: Graph-based substructure pattern mining. In IEEE International Conference on Data Mining (ICDM))
11 KYOTO UNIVERSITY
◼ Not explicitly oriented to prediction tasks (in machine learning)
◼ One exception is an interesting idea by Kudo et al. (2004):
Graph patterns are used as weak learners in the boosting algorithm
◼ Other limitations:
◼ Only discrete labels are assumed
◼ Node and edge labels are restricted to discrete values
due to the construction of the discrete search space
◼ Continuous labels are usually discretized in advance
◼ High demands on computation and memory
Limitations of graph mining
Not explicitly oriented to prediction tasks
Kudo, T., Maeda, E., & Matsumoto, Y. (2004). An application of boosting to graph classification.
Advances in Neural Information Processing Systems, 17.
12 KYOTO UNIVERSITY
The Age of Kernel Machines
13 KYOTO UNIVERSITY
◼ Increased interest in graphs in machine learning,
where prediction is a major target:
◼ Graph classification, node classification, link prediction, …
◼ In data mining, knowledge discovery is the goal
◼ Using subgraphs as features in a predictive model seems like a natural
idea…, but the exponential number of features is still a barrier
Graph machine learning
Tasks that aim directly at prediction (and others)
(Figure: given molecular graphs labeled safe/poisonous, build a predictive model on subgraph features to predict “poisonous or safe?” for a new molecule)
14 KYOTO UNIVERSITY
◼ Support vector machine proposed by Cortes & Vapnik in 1995
◼ Linear model for data mapped to an ultra-high-dimensional feature
space: 𝑓(𝐱) = 𝐰⊤𝛟(𝐱)
◼ Equivalent to a non-linear model in the original space
◼ Can also be represented as a linear combination of kernel functions:
𝑓(𝐱) = Σ𝑖=1…𝑁 𝛼(𝑖) ⟨𝛟(𝐱), 𝛟(𝐱(𝑖))⟩
◼ “Kernel trick”: No matter how high (or even infinite) the dimensionality of
the feature space is, predictions can be computed as long as the inner
product (= kernel function) can be evaluated efficiently
◼ An attractive framework with a high degree of freedom that is
applicable to any type of data as long as the kernel function is available
Kernel methods
Realization of nonlinear prediction via “kernel trick”
(the inner product ⟨𝛟(𝐱), 𝛟(𝐱(𝑖))⟩ is the kernel function 𝑘(𝐱, 𝐱(𝑖)))
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20, 273-297.
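A minimal sketch of the dual-form prediction; the RBF kernel and the toy data/coefficients are illustrative assumptions, not something specified on the slide:

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    # an inner product in an infinite-dimensional feature space,
    # computed without ever materializing that space
    return np.exp(-gamma * np.sum((x - y) ** 2))

def predict(x, X_train, alpha):
    # f(x) = sum_i alpha_i * k(x, x_i): only kernel evaluations are needed
    return sum(a * rbf_kernel(x, xi) for a, xi in zip(alpha, X_train))

X_train = np.array([[0.0, 0.0], [1.0, 1.0]])
alpha = [1.0, -1.0]
print(predict(np.array([0.1, 0.0]), X_train, alpha))
```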
15 KYOTO UNIVERSITY
◼ Many attempts to design kernel functions for various data types
◼ Convolution kernels for structured data [Haussler, 1999]
◼ Break down target structured data into parts (substructures)
◼ Define kernel functions by accumulating similarities between parts
◼ Graph kernels: 𝑘(𝐺, 𝐺′) = ⟨𝛟(𝐺), 𝛟(𝐺′)⟩
◼ The idea of convolution kernels is also applicable to graphs
◼ How to define 𝛟? - Natural idea is to use subgraphs as the parts
◼ Trade-off between expressiveness of a class of subgraphs
and computational efficiency needs to be considered
Graph kernels
Kernel methods for graph-structured data
Haussler, D. (1999). Convolution kernels on discrete structures.
Technical report, Department of Computer Science, University of California at Santa Cruz.
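As one illustration of the convolution-kernel recipe, the sketch below uses labeled edges as the “parts”; this choice is a deliberate simplification for illustration, not a specific published graph kernel:

```python
from collections import Counter

# Decompose each labeled graph into parts (here: labeled edges) and take the
# inner product of the part-count histograms, i.e., <phi(G), phi(G')>.
def parts(graph):
    labels, edges = graph  # node labels as a dict, edges as (u, v) pairs
    return Counter(tuple(sorted((labels[u], labels[v]))) for u, v in edges)

def convolution_kernel(g1, g2):
    p1, p2 = parts(g1), parts(g2)
    return sum(p1[p] * p2[p] for p in p1)

g1 = ({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])  # C-C-O
g2 = ({0: "C", 1: "O"}, [(0, 1)])                  # C-O
print(convolution_kernel(g1, g2))  # one shared C-O part -> 1
```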
16 KYOTO UNIVERSITY
◼ Requirement: Trade-off between expressiveness of a class of
subgraphs and computational efficiency
◼ Random walk kernel [Kashima et al. (2003), Gärtner et al. (2003)]
◼ Use label sequences generated by random walks on graphs
◼ Infinite number of label sequences exist
◼ Can be computed in polynomial time by solving linear equations
◼ In practice, a few matrix multiplications with the power method
suffice
◼ Numerous extensions: tree-like patterns, small subgraphs, …
Random walk graph kernel
Infinite number of graph features in polynomial time
(Example label sequences generated by random walks: (5), (4, 3, 4), (4, 5, 2, 3), (1, 4), …)
Kashima, H., Tsuda, K., & Inokuchi, A. (2003). Marginalized kernels between labeled graphs. In ICML.
Gärtner, T., Flach, P., & Wrobel, S. (2003). On graph kernels: Hardness results and efficient alternatives. In COLT.
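A sketch of the random walk kernel for unlabeled graphs (label handling is omitted for brevity, and the decay weight is an illustrative choice); the infinite sum over walk lengths reduces to one linear system on the direct product graph:

```python
import numpy as np

def random_walk_kernel(A1, A2, lam=0.1):
    # adjacency of the direct product graph: walks on it correspond to
    # simultaneous walks on both input graphs
    Ax = np.kron(A1, A2)
    n = Ax.shape[0]
    # sum over walks of all lengths: sum_k lam^k Ax^k = (I - lam*Ax)^{-1},
    # obtained by solving a linear system rather than inverting a matrix
    v = np.linalg.solve(np.eye(n) - lam * Ax, np.ones(n))
    return np.ones(n) @ v

A1 = np.array([[0, 1], [1, 0]], dtype=float)                   # a single edge
A2 = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)  # a triangle
print(random_walk_kernel(A1, A2))
```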
17 KYOTO UNIVERSITY
◼ Kernel trick elegantly solved the curse of dimensionality
◼ Feature dimensionality does not matter,
as long as the kernel function can be computed efficiently
◼ However, the kernel trick also created a new “curse of big data”
◼ Size of the problem/model depends on training data size 𝑁:
𝑓(𝐱) = Σ𝑖=1…𝑁 𝛼(𝑖) 𝑘(𝐱, 𝐱(𝑖))
◼ Serious bottleneck when dealing with large data
◼ Some remedies were proposed, including compression of kernel
matrices …
The “curse” of kernel trick
Vulnerable to data size increase
18 KYOTO UNIVERSITY
◼ Now is the time to move from dual space back to primal space!
◼ Explicit feature composition in primal space by discarding kernel trick
◼ Weisfeiler-Lehman (WL) kernel [Shervashidze et al., 2011]
◼ Based on WL graph isomorphism test
◼ Each node obtains explicit feature representation of local structure
by message passing from neighborhood nodes
◼ (BTW, Hido & Kashima (2009) proposed essentially the same idea… )
Weisfeiler-Lehman (WL) kernel
Feature construction in primal space by message passing
Update
Shervashidze, N., Schweitzer, P., Van Leeuwen, E. J., Mehlhorn, K., & Borgwardt, K. M. (2011). Weisfeiler-Lehman graph kernels.
Journal of Machine Learning Research, 12(9).
Hido, S., & Kashima, H. (2009). A linear-time graph kernel. In IEEE International Conference on Data Mining (ICDM).
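A minimal sketch of WL feature construction; the label-compression table is simplified to a per-call lookup here, whereas in practice it is shared across all graphs being compared:

```python
from collections import Counter

def wl_iteration(labels, adj):
    # signature = (own label, sorted multiset of neighbor labels)
    sig = {v: (labels[v], tuple(sorted(labels[u] for u in adj[v])))
           for v in adj}
    # compress each distinct signature into a new integer label
    table = {s: i for i, s in enumerate(sorted(set(sig.values())))}
    return {v: table[sig[v]] for v in adj}

def wl_features(labels, adj, h=3):
    # histogram over (iteration, label) pairs = explicit feature vector;
    # the WL kernel is the inner product of two such histograms
    hist = Counter((0, l) for l in labels.values())
    for it in range(1, h + 1):
        labels = wl_iteration(labels, adj)
        hist.update((it, l) for l in labels.values())
    return hist

path = {0: [1], 1: [0, 2], 2: [1]}  # a 3-node path graph
print(wl_features({0: 0, 1: 0, 2: 0}, path))
```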
19 KYOTO UNIVERSITY
The Age of Deep Neural Networks
20 KYOTO UNIVERSITY
◼ The 2010s saw the rise of deep learning; the trend extended to graph
machine learning ⇒ Graph neural networks (GNNs)
◼ Two (eventually similar) streams of graph neural network design
◼ Graph convolutional neural network
◼ Originates from graph signal processing
◼ Message passing graph neural network
◼ Based on the idea of aggregation of graph substructures
𝐱𝑖^NEW = aggr(𝐱𝑖, Σ𝑗∈𝑁𝑖 𝐱𝑗)
Graph neural network (GNN)
Graph convolution and message passing
Update node representation by
aggregating information of adjacent vertices
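A minimal numpy sketch of one message-passing layer; the ReLU and the two linear maps are illustrative choices of the aggr function, not a specific published architecture:

```python
import numpy as np

def message_passing_layer(X, A, W_self, W_nbr):
    # X: (n, d) node representations, A: (n, n) adjacency matrix
    neighbor_sum = A @ X  # sum of x_j over j in N_i
    # aggr(x_i, sum_j x_j) realized as a learned linear map + nonlinearity
    return np.maximum(0.0, X @ W_self + neighbor_sum @ W_nbr)

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # 3-node path
X = rng.random((3, 4))
X_new = message_passing_layer(X, A, rng.random((4, 8)), rng.random((4, 8)))
print(X_new.shape)  # (3, 8)
```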
21 KYOTO UNIVERSITY
Structure of graph neural networks
Capture large substructures with multiple layers
Forward path
from input graph
to output label
(Quote and edit from Pope et al. CVPR (2019))
Each layer recognizes larger
subgraph features
Prediction based on
entire graph representation
22 KYOTO UNIVERSITY
◼ Various extensions of graph neural networks
◼ Graph attention: introducing attention mechanism into GNNs
◼ Focus on important vertices in information aggregation
◼ Extensions of target graph class
◼ Heterogeneous graph
◼ Hypergraphs
◼ Graph of Graphs (GoG)
◼ E.g., chemical networks
Extensions of GNNs
Attention mechanism and more general graphs
Harada et al. (2020)
Harada, S., Akita, H., Tsubaki, M., Baba, Y., Takigawa, I., Yamanishi, Y., & Kashima, H. (2020).
Dual graph convolutional neural network for predicting chemical networks. BMC bioinformatics, 21, 1-13.
23 KYOTO UNIVERSITY
◼ Traffic forecasting is an important fundamental technology
for realizing intelligent transportation systems
◼ Promising GNN application!
◼ Traffic network can be represented as a graph
◼ Traffic flows on road segments have complex relationships to each other
Application of GNN
Traffic prediction
Shirakami, R., Kitahara, T., Takeuchi, K., & Kashima, H. (2023). QTNet: Theory-based queue length prediction for
urban traffic. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining
24 KYOTO UNIVERSITY
◼ Goal: Predict speed, flow, and queue length at each time and location
◼ Physics-informed ML + GNN:
Efficiently incorporate knowledge of traffic engineering as a constraint
◼ Achieved better prediction performance, especially under severe congestion
Physics-informed GNN for traffic prediction
Incorporate known knowledge into graph ML
Known relationship among
speed, flow, and queue length
Shirakami, R., Kitahara, T., Takeuchi, K., & Kashima, H. (2023). QTNet: Theory-based queue length prediction for
urban traffic. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining
25 KYOTO UNIVERSITY
1. Over-smoothing:
◼ Using more layers makes all vertices converge to the same
representation, which degrades performance
◼ Remedies: Pruning, skip-connection, selective layer weighting, …
2. Limited representation power
Two major technical issues of GNNs
Over-smoothing and limited representation power
https://minyoungg.github.io/MIT-deeplearning-blogs/2021/12/09/oversquashing-in-gnns/
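A tiny numpy sketch isolating the over-smoothing effect; a toy 4-cycle with pure neighbor averaging and no learned weights is an illustrative simplification of a deep GNN:

```python
import numpy as np

# Repeatedly averaging every node's representation with its neighbors'
# drives all rows toward the same vector.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
A_hat = (A + np.eye(4)) / 3.0  # row-normalized adjacency with self-loops
X = np.random.default_rng(0).random((4, 2))
for _ in range(50):
    X = A_hat @ X              # 50 "layers" of pure aggregation
print(X)                       # all rows are (nearly) identical
```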
26 KYOTO UNIVERSITY
1. Over-smoothing
2. Limited representation power
◼ There exist different graphs that cannot be distinguished by the WL test or GNNs
◼ ↓ All vertices end up with the same feature after message passing
Two major technical issues of GNNs
Over-smoothing and limited representation power
(Figure: two indistinguishable graphs; in both, every vertex always has two neighbors with the same color)
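A sketch of this classic failure case, two triangles versus one hexagon: since both graphs are 2-regular, WL color refinement (and hence message passing from identical initial features) never separates them:

```python
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}

def wl_histogram(adj, h=3):
    labels = {v: 0 for v in adj}  # identical initial colors
    for _ in range(h):
        sig = {v: (labels[v], tuple(sorted(labels[u] for u in adj[v])))
               for v in adj}
        table = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        labels = {v: table[sig[v]] for v in adj}
    return sorted(labels.values())

print(wl_histogram(two_triangles) == wl_histogram(hexagon))  # True
```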
27 KYOTO UNIVERSITY
◼ Graph Isomorphism Network (GIN) [Xu et al., 2019] attains the
discriminative power of the WL test
◼ Adding random features to nodes further strengthens the
representation power [Sato et al., 2021]
◼ Performs well also in practice!
Making GNNs more powerful than standard GNNs
Just adding random features strengthens GNNs
Xu, K., Hu, W., Leskovec, J., & Jegelka, S. (2019). How Powerful are Graph Neural Networks?. In ICLR.
Sato, R., Yamada, M., & Kashima, H. (2021). Random features strengthen graph neural networks. In SDM.
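A minimal sketch of the random-feature trick (the dimensionality and RNG are illustrative choices); with distinct per-node identifiers, symmetric neighborhoods like the triangles/hexagon above become distinguishable with high probability:

```python
import numpy as np

def add_random_features(X, seed=0):
    # append one random identifier column to the (n, d) node feature matrix,
    # breaking the symmetry between nodes of regular graphs
    rng = np.random.default_rng(seed)
    r = rng.random((X.shape[0], 1))
    return np.concatenate([X, r], axis=1)
```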
28 KYOTO UNIVERSITY
The Age of Causal Inference
29 KYOTO UNIVERSITY
◼ Prediction for decision making:
One of the most important uses of predictive machine learning is to
support or automate decision making
◼ If we know that a certain customer is likely to buy some products,
we can simply recommend them
◼ Treatment effect prediction for better decision making:
◼ If we issue discount coupons for products,
we should issue coupons only for products with the highest effect
◼ We need to consider the causal effect of recommendations on
propensity to buy
Treatment effect prediction
Decision support based on causal effects of actions
30 KYOTO UNIVERSITY
◼ Treatment effect: Differences in outcomes with and without treatment
= Outcome of treatment 𝑌T − Outcome without treatment 𝑌C
◼ Treatment effect can be predicted if both 𝑌T and 𝑌C are predicted
Treatment effect
Quantification of strength of causal relationships
(Figure: outcome with a coupon 𝑌T vs. outcome without a coupon 𝑌C; the treatment effect 𝑌T − 𝑌C measures how much the discount coupon promoted the propensity to buy)
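A minimal “two-model” sketch of this idea, assuming scikit-learn is available (an illustrative baseline, not the representation-learning methods on the following slides): fit separate outcome models on treated and control samples, then predict the effect as the difference of their predictions. Note that this naive approach inherits any bias in how treatments were assigned, which motivates the next slides:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_effect_predictor(X, z, y):
    # X: (n, d) features, z: (n,) binary treatments, y: (n,) outcomes
    m_t = LinearRegression().fit(X[z == 1], y[z == 1])  # model of Y_T
    m_c = LinearRegression().fit(X[z == 0], y[z == 0])  # model of Y_C
    return lambda X_new: m_t.predict(X_new) - m_c.predict(X_new)  # Y_T - Y_C

rng = np.random.default_rng(0)
X = rng.random((100, 3))
z = rng.integers(0, 2, 100)
y = X @ np.array([1.0, 2.0, 0.5]) + 3.0 * z + rng.normal(0, 0.1, 100)
effect = fit_effect_predictor(X, z, y)
print(effect(X[:3]))  # close to the true constant effect of 3.0
```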
31 KYOTO UNIVERSITY
Treatment effect prediction problem
Learning models from data including biased treatments
▪ Training data {(𝐱𝑖, 𝑧𝑖, 𝑦𝑖)}, 𝑖 = 1, …, 𝑁
⚫ 𝐱: input (target of treatment)
⚫ 𝑧 ∈ {0,1}: treatment
⚫ 𝑦: outcome
▪ Goal: prediction model 𝑓: 𝒳 × 𝒵 → 𝒴
◼ Given a target and treatment, predict its outcome
▪ Challenge: Learn unbiased prediction model from biased treatment data
(Figure: training data table with columns target 𝐱, treated? 𝑧, and outcome 𝑦. “Let us give coupons to rich people!”: if potential car buyers receive coupons with higher probability, the treatment effect risks being over-estimated)
32 KYOTO UNIVERSITY
Deep learning approach to treatment effect prediction
Learning representations independent of treatments
▪ If data is biased, the prediction model will also be biased
▪ Various bias reduction methods were proposed in causal inference
▪ In deep learning, intermediate representation of target is learned to be
independent of treatments [Shalit et al., 2017]
– Use an independence measure as a regularizer in representation learning
(Figure: the target 𝐱 is encoded into a representation that predicts the outcome; an independence measure (IPM) between the representations of treated and untreated samples serves as a regularizer)
Shalit, U., Johansson, F. D., & Sontag, D. (2017). Estimating individual treatment effect:
generalization bounds and algorithms. In International Conference on Machine Learning (ICML)
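A sketch of this balancing loss; the linear MMD (difference of representation means) stands in for the IPMs used by Shalit et al., and the weight alpha is an illustrative choice:

```python
import numpy as np

def balanced_loss(phi, y_hat, y, z, alpha=1.0):
    # phi: (n, d) learned representations; z: (n,) binary treatment indicators
    pred_loss = np.mean((y_hat - y) ** 2)
    # linear MMD between treated and control representation distributions
    ipm = np.linalg.norm(phi[z == 1].mean(axis=0) - phi[z == 0].mean(axis=0))
    return pred_loss + alpha * ipm
```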
33 KYOTO UNIVERSITY
◼ Integration of graph machine learning with treatment effect prediction
◼ What are graphs in treatment effect prediction?
◼ Target inputs as graphs:
◼ Input space has graph structure
◼ E.g., Targeted marketing in SNS
◼ Challenge: Interference/spillover effects
◼ Treatments as graphs
◼ Each treatment has a graph structure
◼ E.g., Drug effect estimation
◼ Challenge: Infinite number of treatments
Treatment effect prediction + Graph ML
Treatment targets or treatments as graphs
34 KYOTO UNIVERSITY
Graph-structured treatment targets
GNN considers treatment interference on graph
▪ Treatment effect prediction in a graph-structured input space (e.g., SNS)
▪ Interference of treatments can occur between neighbors
– “My friends have coupons, but I don't...”
▪ GNN extracts features independent of treatments [Ma & Tresp, 2021]
– GNN incorporates neighborhood information
– Independence regularization acquires
treatment-independent representations
▪ Extensions to heterogeneous networks
and unknown networks
[Lin et al., 2023,2024]
Ma, Y., & Tresp, V. (2021). Causal inference under networked interference and intervention policy enhancement. In AISTATS.
Lin, X., Zhang, G., Lu, X., Bao, H., Takeuchi, K., & Kashima, H. (2023). Estimating treatment effects under heterogeneous interference. In ECML PKDD.
Lin, X., Zhang, G., Lu, X., & Kashima, H. (2024). Treatment Effect Estimation Under Unknown Interference. In PAKDD.
35 KYOTO UNIVERSITY
Graph-structured treatments
GNN extracts features of graph treatments
▪ Treatment effect prediction of graph-structured treatments (e.g., drugs)
[Harada&Kashima, 2021]
▪ Ensure independence between the target representation and the treatment
representation extracted from the graph treatment by a GNN
▪ Zero-shot treatment effect prediction: Applicable to first-time treatments
(Figure: a GNN encodes the graph-structured treatment into a treatment representation; HSIC regularization encourages independence between the target representation and the treatment representation, and both feed the outcome prediction)
Harada, S., & Kashima, H. (2021). GraphITE: Estimating individual effects of graph-structured treatments.
In International Conference on Information & Knowledge Management (CIKM)
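A sketch of the (biased) empirical HSIC estimator that such an independence regularizer can build on; driving HSIC(R, T) toward zero encourages independence between target representations R and treatment representations T. The linear kernel here is an illustrative simplification:

```python
import numpy as np

def hsic(R, T):
    # R: (n, d1) target representations, T: (n, d2) treatment representations
    n = R.shape[0]
    K, L = R @ R.T, T @ T.T              # Gram matrices (linear kernel)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2
```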
36 KYOTO UNIVERSITY
Summary
37 KYOTO UNIVERSITY
◼ A (personal and biased) look at the history of graph machine learning,
from data mining and kernel methods to graph neural networks
◼ Although the techniques vary from time to time, the ideas of focusing
on substructures and message propagation on graphs are inherited
◼ Many topics omitted: ranking/clustering, structured output prediction
… as well as historical developments in graph signal processing
◼ Graph generation is one of the most important future topics
… as well as dynamic/heterogeneous graphs, privacy/fairness/security
Graph machine learning
Graph mining, graph kernels, GNN, and causal inference