Visualization of Graph Neural Networks
Have you ever found it challenging to represent a graph from a very large dataset while building a graph neural network model?
This article presents a method to sample and visualize a subgraph from such datasets.
🎯 Introduction
🎨 NetworkX library
Overview
Sampling large graphs
Implementation
Layouts
Graph representation
Comparing layouts
📘 References
💡 Appendix
What you will learn: How to sample and visualize a graph from a very large dataset for modeling graph neural networks.
Notes:
✅ Please subscribe to Hands-on Geometric Deep Learning for in-depth topics on Geometric learning, reviews and exercises.
🎯 Introduction
This article focuses on visualizing subgraphs within the context of graph neural networks. It does not cover the introduction or explanation of graph neural network architectures and models, as those topics are beyond its scope [ref 2, 3].
📌 There are several Python libraries available for analyzing and visualizing graphs, including Plotly, PyVis, and NetworKit. In some of our future articles on graph neural networks and geometric deep learning, we will use NetworkX for visualization.
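As a reminder, graph data is fully described by an instance of the Data class of the torch_geometric.data package, whose main attributes include the node features (x), the edge indices (edge_index), the optional edge attributes (edge_attr), and the labels (y).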
🎨 NetworkX library
Overview
NetworkX is a powerful and flexible BSD-licensed Python library for the creation, manipulation, and analysis of complex networks and graphs. It supports various types of graphs, including undirected, directed, and multi-graphs, allowing users to model relationships and structures efficiently [ref 4].
NetworkX provides a wide range of algorithms for graph theory and network analysis, such as shortest paths, clustering, centrality measures, and more. It is designed to handle graphs with millions of nodes and edges, making it suitable for applications in social networks, biology, transportation systems, and other domains. With its intuitive API and rich visualization capabilities, NetworkX is an essential tool for researchers and developers working with network data.
The library implements many standard graph algorithms, including clustering, link analysis, minimum spanning trees, shortest paths, cliques, coloring, cuts, and graph polynomials, along with random graph generators such as the Erdős-Rényi model.
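To make these capabilities concrete, here is a minimal sketch on a small, made-up weighted graph. The node labels and weights are illustrative assumptions, not data from this article; the calls themselves are standard NetworkX functions.

```python
import networkx as nx

# Small undirected, weighted graph; nodes and weights are illustrative only
graph = nx.Graph()
graph.add_weighted_edges_from([
    ("a", "b", 1.0),
    ("b", "c", 2.0),
    ("a", "c", 4.0),
    ("c", "d", 1.0),
])

# Weighted shortest path between two nodes
path = nx.shortest_path(graph, source="a", target="d", weight="weight")

# Local clustering coefficient of each node
clustering = nx.clustering(graph)

# Degree centrality of each node
centrality = nx.degree_centrality(graph)

# Minimum spanning tree of the weighted graph
mst = nx.minimum_spanning_tree(graph, weight="weight")

print("shortest path:", path)
print("clustering:", clustering)
print("degree centrality:", centrality)
print("MST edge count:", mst.number_of_edges())
```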
Sampling large graphs
Most datasets included in the PyG library contain an extremely large number of nodes and edges, making them impractical to visualize directly. To address this, we can extract (or sample) one or more subgraphs that are easier to display.
In our design, a subgraph is derived from the original large graph by sampling its nodes and edges based on a specified range of indices, as shown below:
In the illustration above, the sampled nodes are [12, ..., 19].
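The idea can be sketched in plain Python on a toy edge index. This is only one possible reading of "sampling by a range of indices": it assumes that an edge is kept when at least one of its endpoints falls in the inclusive node range. The function name sample_edges and the toy data are hypothetical.

```python
from typing import List, Set, Tuple


def sample_edges(
    edge_index: List[List[int]],
    node_range: Tuple[int, int],
) -> Tuple[List[Tuple[int, int]], Set[int]]:
    """Keep the edges that touch a node whose index lies in [start, end]."""
    start, end = node_range
    sources, targets = edge_index
    sampled_edges = [
        (u, v)
        for u, v in zip(sources, targets)
        if start <= u <= end or start <= v <= end
    ]
    # Every endpoint of a kept edge belongs to the sampled subgraph
    sampled_nodes = {node for edge in sampled_edges for node in edge}
    return sampled_edges, sampled_nodes


# Toy edge index in the [sources, targets] layout used by PyG (made-up values)
edge_index = [
    [3, 12, 14, 19, 25, 40],
    [12, 13, 30, 7, 26, 41],
]
edges, nodes = sample_edges(edge_index, (12, 19))
print(edges)
print(sorted(nodes))
```

Under this assumption the subgraph contains more nodes than the index range itself, because neighbors outside the range are pulled in along with their edges.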
Implementation
For the sake of simplicity, let's wrap the visualization of graph neural network data in a class, GNNPlotter, whose constructor takes 3 parameters [ref 4]:
Simplified constructors for undirected (build) and directed (build_directed) graphs are also provided.
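A minimal skeleton of such a class might look as follows. Only the names GNNPlotter, build and build_directed come from the text; the three constructor parameters shown here (graph, edges, sampled_node_index_range) are hypothetical, and a plain list of edges stands in for the PyG Data instance so that the sketch runs without PyTorch.

```python
from typing import List, Optional, Tuple

import networkx as nx


class GNNPlotter:
    """Sketch of a plotting wrapper; parameter names are assumptions."""

    def __init__(
        self,
        graph: nx.Graph,
        edges: List[Tuple[int, int]],
        sampled_node_index_range: Optional[Tuple[int, int]] = None,
    ) -> None:
        self.graph = graph    # empty nx.Graph or nx.DiGraph to be filled
        self.edges = edges    # stand-in for the edge index of a Data object
        self.sampled_node_index_range = sampled_node_index_range

    @classmethod
    def build(
        cls,
        edges: List[Tuple[int, int]],
        sampled_node_index_range: Optional[Tuple[int, int]] = None,
    ) -> "GNNPlotter":
        """Simplified constructor for an undirected graph."""
        return cls(nx.Graph(), edges, sampled_node_index_range)

    @classmethod
    def build_directed(
        cls,
        edges: List[Tuple[int, int]],
        sampled_node_index_range: Optional[Tuple[int, int]] = None,
    ) -> "GNNPlotter":
        """Simplified constructor for a directed graph."""
        return cls(nx.DiGraph(), edges, sampled_node_index_range)


plotter = GNNPlotter.build([(12, 13), (13, 30)], (12, 21))
directed_plotter = GNNPlotter.build_directed([(12, 13), (13, 30)], (12, 21))
print(plotter.graph.is_directed(), directed_plotter.graph.is_directed())
```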
Let’s first review our implementation for sampling graph vertices and edges, which involves the following steps:
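One way to carry out the sampling with NetworkX is sketched below. The four steps in the comments are an assumption about a reasonable procedure, not a transcription of the original code; sample_subgraph and the toy edge list are hypothetical.

```python
from typing import List, Tuple

import networkx as nx


def sample_subgraph(
    graph: nx.Graph,
    edges: List[Tuple[int, int]],
    node_range: Tuple[int, int],
) -> nx.Graph:
    # Step 1: load every edge of the original dataset into the graph
    graph.add_edges_from(edges)

    # Step 2: select the nodes of the inclusive index range that exist
    start, end = node_range
    seeds = [n for n in range(start, end + 1) if graph.has_node(n)]

    # Step 3: collect the direct neighbors of the selected nodes
    selected = set(seeds)
    for node in seeds:
        selected.update(graph.neighbors(node))

    # Step 4: extract the subgraph induced by the selected nodes
    return graph.subgraph(selected).copy()


# Made-up edges standing in for a large dataset
edges = [(3, 12), (12, 13), (13, 30), (14, 30), (19, 7), (25, 26), (40, 41)]
subgraph = sample_subgraph(nx.Graph(), edges, (12, 19))
print(subgraph.number_of_nodes(), subgraph.number_of_edges())
```

For a directed graph, neighbors returns successors only, so predecessors would need to be added explicitly if they should also appear in the sample.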
Finally, the visualization of the graph is achieved by drawing it with the draw method and overlaying the edges using draw_networkx_edges.
In our basic application, we define the layout, customize the color and size of the nodes, and set the title for the display.
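The drawing step can be sketched as follows, assuming Matplotlib as the rendering backend and a small made-up graph in place of the sampled subgraph. The colors, sizes and title are illustrative choices.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; remove this line in a notebook
import matplotlib.pyplot as plt
import networkx as nx

# Made-up subgraph standing in for a sample of a large dataset
graph = nx.Graph([(3, 12), (12, 13), (13, 30), (14, 30), (19, 7)])

# Define the layout; the seed makes the node positions reproducible
pos = nx.spring_layout(graph, seed=42)

fig, ax = plt.subplots(figsize=(6, 6))

# Draw the nodes and their labels with a custom color and size
nx.draw(
    graph,
    pos,
    ax=ax,
    with_labels=True,
    node_color="blue",
    node_size=300,
    font_color="white",
    font_size=8,
)

# Overlay the edges with their own style
nx.draw_networkx_edges(graph, pos, ax=ax, edge_color="grey", width=1.0)

# Set the title for the display
ax.set_title("Sampled subgraph, spring layout")

# Call plt.show() or fig.savefig("subgraph.png") to view the result
```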
Layouts
NetworkX provides several graph layouts to visually display undirected graphs. Each layout arranges nodes in a specific pattern, suited to different types of graphs and visualization purposes. Here's a brief overview of the common layouts available in NetworkX [ref 4]:
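In NetworkX, a layout is simply a function that maps each node to 2D coordinates. The sketch below computes positions for five commonly used layouts on a small ring graph; the graph and the choice of layouts are illustrative assumptions.

```python
import networkx as nx

graph = nx.cycle_graph(8)  # 8 nodes arranged in a ring
graph.add_edge(0, 4)       # plus one chord across the ring

positions = {
    # Force-directed placement; the seed fixes the random starting point
    "spring": nx.spring_layout(graph, seed=42),
    # Nodes evenly spaced on a circle
    "circular": nx.circular_layout(graph),
    # Nodes on concentric circles (a single shell by default)
    "shell": nx.shell_layout(graph),
    # Coordinates derived from eigenvectors of the graph Laplacian
    "spectral": nx.spectral_layout(graph),
    # Nodes placed uniformly at random in the unit square
    "random": nx.random_layout(graph, seed=42),
}

for name, pos in positions.items():
    x, y = pos[0]
    print(f"{name:>8}: node 0 at ({x:+.2f}, {y:+.2f})")
```

Each returned dictionary can be passed directly as the pos argument of the NetworkX drawing functions.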
Graph representation
Let's consider the Flickr dataset included in Torch Geometric (PyG) and described in [ref 5]. As a reminder, the Flickr dataset is a graph where nodes represent images and edges signify similarities between them [ref 6]. It includes 89,250 images and 899,756 relationships. Node features consist of image descriptions and shared properties.
Let's apply the GNNPlotter methods to the Flickr dataset, selecting the edges of the nodes with indices 12 through 21 inclusive and using the spring layout.
Output: 319
The 319 vertices of the sampled Flickr subgraph and their undirected edges are visualized using the spring layout. The visualization covers 319/89,250 ≈ 0.36% of the images in the dataset.
Comparing layouts
Undirected graph representation
Let’s demonstrate six common layouts for displaying subgraphs sampled from the Flickr dataset using 67 nodes.
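A side-by-side comparison can be sketched with a grid of Matplotlib subplots. The six layouts chosen here and the random 67-node graph are assumptions made for illustration; the graph is a stand-in, not a sample of the Flickr data.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; remove this line in a notebook
import matplotlib.pyplot as plt
import networkx as nx

# Random stand-in for a sampled subgraph of 67 nodes
graph = nx.gnm_random_graph(67, 120, seed=7)

layout_names = ["spring", "circular", "shell", "spectral", "random", "spiral"]

fig, axes = plt.subplots(2, 3, figsize=(15, 10))
for ax, name in zip(axes.flat, layout_names):
    # Resolve nx.spring_layout, nx.circular_layout, ... by name
    layout_func = getattr(nx, f"{name}_layout")
    pos = layout_func(graph)
    nx.draw(
        graph,
        pos,
        ax=ax,
        node_size=40,
        node_color="blue",
        edge_color="grey",
    )
    ax.set_title(f"{name} layout")

fig.tight_layout()
# Call plt.show() or fig.savefig("layouts.png") to view the grid
```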
Directed graph representation
Finally, let’s apply the same layout to a directed graph derived from the Flickr dataset.
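For the directed case, a short sketch on made-up edges shows what changes: opposite edges stay distinct, and the edges are drawn with arrows. The spring layout is assumed here, since the text does not name the layout.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; remove this line in a notebook
import matplotlib.pyplot as plt
import networkx as nx

# Made-up edges; (12, 13) and (13, 12) point in opposite directions
edges = [(3, 12), (12, 13), (13, 12), (13, 30), (14, 30), (19, 7)]

undirected = nx.Graph(edges)
directed = nx.DiGraph(edges)

# The two opposite edges collapse into one in the undirected graph
print(undirected.number_of_edges(), directed.number_of_edges())

pos = nx.spring_layout(directed, seed=42)
fig, ax = plt.subplots(figsize=(6, 6))
nx.draw_networkx_nodes(directed, pos, ax=ax, node_size=300, node_color="blue")
nx.draw_networkx_labels(directed, pos, ax=ax, font_color="white", font_size=8)
# Arrows indicate the direction of each edge
nx.draw_networkx_edges(
    directed, pos, ax=ax, arrows=True, arrowsize=15, edge_color="grey"
)
ax.set_title("Directed subgraph, spring layout")
```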
✅ Thanks for reading. For comprehensive topics on geometric learning, including detailed analysis, reviews and exercises, subscribe to Hands-on Geometric Deep Learning
📘 References
Patrick Nicolas has over 25 years of experience in software and data engineering, architecture design and end-to-end deployment and support with extensive knowledge in machine learning. He has been director of data engineering at Aideo Technologies since 2017 and he is the author of "Scala for Machine Learning", Packt Publishing ISBN 978-1-78712-238-3 and Hands-on Geometric Deep Learning newsletter.
💡 Appendix
A selection of edge indices in the range [10, 28] produces a graph with 1,057 nodes and about 1.184% coverage.
A selection of edge indices in the range [10, 21] produces a graph with 351 nodes and about 0.393% coverage.
A selection of edge indices in the range [10, 15] produces a graph with 80 nodes and about 0.090% coverage.
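Coverage is the number of nodes in the sampled subgraph divided by the 89,250 nodes of the Flickr dataset. The short check below recomputes it from the node counts quoted in this article; the variable names are illustrative.

```python
TOTAL_NODES = 89_250  # images in the Flickr dataset

# Node counts of the sampled subgraphs quoted in this article
sampled_node_counts = [1057, 351, 319, 80]

coverage = {count: count / TOTAL_NODES for count in sampled_node_counts}
for count, ratio in coverage.items():
    print(f"{count:>5} nodes -> {ratio:.3%} coverage")
```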
#GeometricDeepLearning #GraphNeuralNetwork #PyTorchGeometric #NetworkX