Mini-Batch Consistent Slot Set Encoder For Scalable Set Encoding
Andreis Bruno1, Jeffrey Ryan Willette1, Juho Lee1,2, Sung Ju Hwang1,2
1KAIST, South Korea
2AITRICS, South Korea
The Set Encoding Problem
Many problems in machine learning involve converting a set of arbitrary size into a single vector or a set of vectors: the set encoding/representation.
[Figure: an Encoder maps an input set to its Set Encoding.]
This places a few symmetry-based (and sometimes probabilistic) restrictions on the encoder.
Permutation Invariance & Equivariance
Property 1 A function f: 2^X → Y acting on sets must be permutation invariant to the order of objects in the set, i.e. for any permutation π:
f(x_1, …, x_M) = f(x_{π(1)}, …, x_{π(M)}).
Exchangeability A distribution for a set of random variables X = {x_i}_{i=1}^M is exchangeable if for any permutation π:
p(X) = p(π(X)).
Property 2 A function f: X^M → Y^M acting on sets is permutation equivariant if a permutation of the input instances permutes the output labels, i.e. for any permutation π:
f(x_{π(1)}, …, x_{π(M)}) = [f_{π(1)}(x), …, f_{π(M)}(x)].
Bloem-Reddy, Benjamin, and Yee Whye Teh. "Probabilistic Symmetries and Invariant Neural Networks." J. Mach. Learn. Res. 21(90) (2020): 1–61.
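Property 1 can be checked numerically with a minimal DeepSets-style encoder; this is a sketch, with a random feature map standing in for a learned network:

```python
import numpy as np

rng = np.random.default_rng(0)
phi = rng.normal(size=(4, 8))  # stand-in for a learned per-element feature map

def f(X):
    # DeepSets-style encoder: elementwise features followed by sum pooling,
    # so the output cannot depend on the order of the rows of X
    return np.tanh(X @ phi).sum(axis=0)

X = rng.normal(size=(5, 4))
perm = rng.permutation(5)
print(np.allclose(f(X), f(X[perm])))  # True: Property 1 holds
```

Any pooling that commutes with row reordering (sum, mean, max) yields the same invariance.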
Mini-Batch Consistent (MBC) Set Encoding
Given large sets, we want to be able to process the elements of the set in mini-batches based on the available computational and memory resources.
Set encoders such as DeepSets and Set Transformer can be modified to do this, but not all can perform mini-batch encoding consistently. We formalize the requirements for MBC set encoding below:
Property 5 Let X ∈ R^{M×d} be partitioned such that X = X_1 ∪ X_2 ∪ ⋯ ∪ X_p, and let f: R^{M_i×d} → R^{d'} be a set encoding function such that f(X) = Z. Given an aggregation function g: {Z_j ∈ R^{d'}}_{j=1}^{p} → R^{d'}, g and f are Mini-Batch Consistent if and only if g(f(X_1), …, f(X_p)) = f(X).
Andreis, B., Willette, J., Lee, J., & Hwang, S. J. (2021). Mini-Batch Consistent Slot Set Encoder for Scalable Set Encoding. arXiv preprint arXiv:2103.01615.
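Property 5 can be verified for a sum-pooled encoder; a NumPy sketch with a random stand-in feature map and g = sum:

```python
import numpy as np

rng = np.random.default_rng(1)
phi = rng.normal(size=(4, 8))  # stand-in for a learned feature map

def f(X):
    # sum-pooled set encoding: f(X) = Z
    return np.tanh(X @ phi).sum(axis=0)

X = rng.normal(size=(12, 4))
parts = np.array_split(X, 3)           # X = X1 ∪ X2 ∪ X3
Z_mbc = sum(f(Xi) for Xi in parts)     # g = sum over per-batch encodings
print(np.allclose(Z_mbc, f(X)))        # True: g and f are Mini-Batch Consistent
```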
Violation of MBC: Set Transformer
We train Set Transformer on an image reconstruction task. At test time, we increase
the number of pixels and encode them in a mini-batch fashion.
The performance of the model degrades in the mini-batch setting. Additionally, it is
not immediately clear how to aggregate the encodings of the mini-batches.
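The failure mode can be reproduced with a toy attention-pooling layer (a hypothetical stand-in for Set Transformer's pooling, with a random seed vector): because the softmax normalizes over all elements of the set, per-batch encodings cannot be aggregated back into the full-set encoding.

```python
import numpy as np

rng = np.random.default_rng(0)
seed = rng.normal(size=4)  # stand-in for a learned seed/query vector

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attention_pool(X):
    # softmax over the *elements*: each weight depends on the whole set,
    # so encoding a mini-batch alone uses the wrong normalizer
    w = softmax(X @ seed / 2.0)
    return w @ X

X = rng.normal(size=(8, 4))
full = attention_pool(X)
per_batch = [attention_pool(Xi) for Xi in np.array_split(X, 2)]
print(np.allclose(full, np.mean(per_batch, axis=0)))  # False: not MBC
```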
MBC Set Encoding
DeepSets can trivially satisfy MBC by removing its message-passing layers. Set Transformer, which is attention-based, violates MBC.
Our goal is to design an attention-based set encoder, in the spirit of Set Transformer, that satisfies MBC. We achieve this by using slots.
Slot Set Encoder (SSE)
We realize an MBC set encoder, SSE, by computing attention over slots instead of between the elements of the set. This makes SSE amenable to mini-batch processing.
The SSE in Algorithm 1 is functionally composable over any partition of the input X for a given slot initialization.
S ∼ N(μ, diag(σ)) ∈ R^{K×d}
attn_{i,j} := σ(M_{i,j}), where M := (1/√d) k(X) · q(S)^T ∈ R^{n×K}
Ŝ := W^T · v(X) ∈ R^{K×d'}, where W_{i,j} := attn_{i,j} / Σ_{l=1}^{K} attn_{i,l}
f(X) = g(f(X_1), f(X_2), …, f(X_p)), g ∈ {mean, sum, max, min}
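A minimal NumPy sketch of Algorithm 1, with random matrices standing in for the learned projections k, q, v and a fixed slot initialization shared across mini-batches (dimensions are made up for illustration). Because the attention is normalized over slots rather than over elements, each element's contribution is independent, and g = sum recovers the full-set encoding:

```python
import numpy as np

rng = np.random.default_rng(0)
d, d_h, K = 4, 8, 3                     # input dim, projection dim, number of slots
Wk, Wq, Wv = (rng.normal(size=(d, d_h)) for _ in range(3))  # stand-ins for k, q, v
S = rng.normal(size=(K, d))             # fixed slot initialization, shared by all batches

def sse(X):
    M = (X @ Wk) @ (S @ Wq).T / np.sqrt(d_h)     # (n, K) element-slot scores
    attn = 1.0 / (1.0 + np.exp(-M))              # sigmoid: no coupling across elements
    W = attn / attn.sum(axis=1, keepdims=True)   # normalize over slots, per element
    return W.T @ (X @ Wv)                        # S_hat in R^{K x d_h}

X = rng.normal(size=(10, d))
partial = sum(sse(Xi) for Xi in np.array_split(X, 3))  # g = sum over mini-batches
print(np.allclose(sse(X), partial))                    # True: SSE is MBC
```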
Slot Set Encoder (SSE)
SSE is permutation invariant with respect to partitions of the input set and permutation equivariant with respect to the order of slots.
Proposition 3 For a given input set X ∈ R^{n×d} and slot initialization S ∈ R^{K×d}, the functions f and g as defined in Algorithm 1 are MBC for any partition of X and hence satisfy the MBC property.
Proposition 4 Let X ∈ R^{n×d} and S ∈ R^{K×d} be an input set and slot initialization respectively. Additionally, let SSE(X, S) be the output of Algorithm 1, and let π_X ∈ R^{n×n} and π_S ∈ R^{K×K} be arbitrary permutation matrices. Then
SSE(π_X · X, π_S · S) = π_S · SSE(X, S)
Hierarchical Slot Set Encoder
We can stack multiple Slot Set Encoders on top of each other to obtain a hierarchy of slot set encoders. This allows us to model higher-order interactions across slots.
f(X) = SSE_n(… SSE_2(SSE_1(X)))
The resulting set encoding function f(X) satisfies the MBC property as well as Propositions 3 & 4.
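A sketch of a two-level stack under the same illustrative assumptions (random stand-in projections and slot initializations): since the first SSE is MBC, aggregating its per-batch outputs with g = sum before applying the second SSE reproduces the full-set encoding.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_sse(d_in, d_out, n_slots):
    # each layer gets its own (random, stand-in) projections and slot initialization
    Wk, Wq, Wv = (rng.normal(size=(d_in, d_out)) for _ in range(3))
    S = rng.normal(size=(n_slots, d_in))
    def sse(X):
        M = (X @ Wk) @ (S @ Wq).T / np.sqrt(d_out)
        attn = 1.0 / (1.0 + np.exp(-M))
        W = attn / attn.sum(axis=1, keepdims=True)   # normalize over slots
        return W.T @ (X @ Wv)                        # (n_slots, d_out)
    return sse

sse1, sse2 = make_sse(4, 6, 5), make_sse(6, 6, 2)

X = np.random.default_rng(1).normal(size=(9, 4))
# mini-batch path: aggregate SSE1 outputs with g = sum, then apply SSE2 once
partial = sum(sse1(Xi) for Xi in np.array_split(X, 3))
print(np.allclose(sse2(partial), sse2(sse1(X))))  # True: the stack stays MBC
```

Only the first layer ever touches raw set elements, so the higher layers never need to be mini-batched.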
Approximate Mini-Batch Training of MBC Encoders
How can we train Slot Set Encoders in the large-scale or streaming setting?
Both DeepSets and Set Transformer require gradients to be taken with respect to the full set at train time.
In the mini-batch consistent setting, this is not feasible for large sets or when set elements arrive in a stream.
We therefore train MBC models on partitions of sets sampled at each iteration of the optimization process, and find that this works well empirically.
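A toy sketch of this approximate training scheme (the task, dimensions, and learning rate are all made up for illustration): each step draws a fresh set but computes gradients only on a small sampled partition of it, and a mean-pooled linear encoder still approximately recovers the underlying map.

```python
import numpy as np

rng = np.random.default_rng(0)
A_true = rng.normal(size=(3, 3))   # hypothetical ground-truth map to recover
w = np.zeros((3, 3))               # parameters of a mean-pooled linear set encoder

for step in range(5000):
    mu = rng.normal(size=3)                            # each set has its own statistic
    X = mu + rng.normal(size=(2000, 3))                # a large set, never encoded whole
    Xb = X[rng.choice(2000, size=32, replace=False)]   # partition sampled this iteration
    m = Xb.mean(axis=0)                                # encode only the mini-batch
    grad = 2 * np.outer(m, m @ w - mu @ A_true)        # gradient of ||m @ w - target||^2
    w -= 0.02 * grad

# w approaches A_true even though no gradient step ever saw a full set
print(np.linalg.norm(w - A_true))
```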
Experiments: Point Cloud Classification (ModelNet40)
We first show that SSE is a valid set encoding function on the point cloud
classification task. Here, no mini-batch encoding is used.
[Figure: Encoder → Set Encoding → Classifier pipeline.]
Experiments: Image Reconstruction (CelebA)
We perform image reconstruction using Conditional Neural Processes, replacing the aggregation function with DeepSets, Set Transformer, or the Slot Set Encoder. We test the model in the mini-batch setting where data arrives in a stream.