Mini-Batch Consistent Slot Set Encoder For Scalable Set Encoding
Andreis Bruno1, Jeffrey Ryan Willette1, Juho Lee1,2, Sung Ju Hwang1,2
1KAIST, South Korea
2AITRICS, South Korea
The Set Encoding Problem
Many problems in machine learning involve converting a set of arbitrary size into a single vector or a set of vectors: the set encoding/representation.
[Figure: an Encoder maps an input set to its Set Encoding.]
This places a few symmetry-based (and sometimes probabilistic) restrictions on the encoder.
Permutation Invariance & Equivariance
Property 1 A function f: 2^X → Y acting on sets must be permutation invariant to the order of objects in the set, i.e. for any permutation π:
f(x_1, …, x_M) = f(x_{π(1)}, …, x_{π(M)}).
Exchangeability A distribution for a set of random variables X = {x_i}_{i=1}^M is exchangeable if for any permutation π:
p(X) = p(π(X)).
Property 2 A function f: X^M → Y^M acting on sets is permutation equivariant if a permutation of the input instances permutes the output labels, i.e. for any permutation π:
f(x_{π(1)}, …, x_{π(M)}) = [f_{π(1)}(x), …, f_{π(M)}(x)].
Bloem-Reddy, Benjamin, and Yee Whye Teh. "Probabilistic Symmetries and Invariant Neural Networks." J. Mach. Learn. Res. 21(90) (2020): 1–61.
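Property 1 can be checked numerically with a minimal DeepSets-style encoder; this is a sketch, with a random feature map standing in for a learned network:

```python
import numpy as np

rng = np.random.default_rng(0)
phi = rng.normal(size=(4, 8))  # stand-in for a learned per-element feature map

def f(X):
    # DeepSets-style encoder: elementwise features followed by sum pooling,
    # so the output cannot depend on the order of the rows of X
    return np.tanh(X @ phi).sum(axis=0)

X = rng.normal(size=(5, 4))
perm = rng.permutation(5)
print(np.allclose(f(X), f(X[perm])))  # True: Property 1 holds
```

Any pooling that commutes with row reordering (sum, mean, max) yields the same invariance.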
Mini-Batch Consistent (MBC) Set Encoding
Given large sets, we want to be able to process the elements of the set in mini-batches based on the available computational and memory resources.
Set encoders such as DeepSets and Set Transformer can be modified to do this, but not all can perform mini-batch encoding consistently. We formalize the requirements for MBC set encoding below:
Property 5 Let X ∈ R^{M×d} be partitioned such that X = X_1 ∪ X_2 ∪ ⋯ ∪ X_p, and let f: R^{M_i×d} → R^{d'} be a set encoding function such that f(X) = Z. Given an aggregation function g: {Z_j ∈ R^{d'}}_{j=1}^{p} → R^{d'}, g and f are Mini-Batch Consistent if and only if g(f(X_1), …, f(X_p)) = f(X).
Andreis, B., Willette, J., Lee, J., & Hwang, S. J. (2021). Mini-Batch Consistent Slot Set Encoder for Scalable Set Encoding. arXiv preprint arXiv:2103.01615.
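Property 5 can be verified for a sum-pooled encoder; a NumPy sketch with a random stand-in feature map and g = sum:

```python
import numpy as np

rng = np.random.default_rng(1)
phi = rng.normal(size=(4, 8))  # stand-in for a learned feature map

def f(X):
    # sum-pooled set encoding: f(X) = Z
    return np.tanh(X @ phi).sum(axis=0)

X = rng.normal(size=(12, 4))
parts = np.array_split(X, 3)           # X = X1 ∪ X2 ∪ X3
Z_mbc = sum(f(Xi) for Xi in parts)     # g = sum over per-batch encodings
print(np.allclose(Z_mbc, f(X)))        # True: g and f are Mini-Batch Consistent
```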
Violation of MBC: Set Transformer
We train Set Transformer on an image reconstruction task. At test time, we increase
the number of pixels and encode them in a mini-batch fashion.
The performance of the model degrades in the mini-batch setting. Additionally, it is
not immediately clear how to aggregate the encodings of the mini-batches.
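The failure mode can be reproduced with a toy attention-pooling layer (a hypothetical stand-in for Set Transformer's pooling, with a random seed vector): because the softmax normalizes over all elements of the set, per-batch encodings cannot be aggregated back into the full-set encoding.

```python
import numpy as np

rng = np.random.default_rng(0)
seed = rng.normal(size=4)  # stand-in for a learned seed/query vector

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attention_pool(X):
    # softmax over the *elements*: each weight depends on the whole set,
    # so encoding a mini-batch alone uses the wrong normalizer
    w = softmax(X @ seed / 2.0)
    return w @ X

X = rng.normal(size=(8, 4))
full = attention_pool(X)
per_batch = [attention_pool(Xi) for Xi in np.array_split(X, 2)]
print(np.allclose(full, np.mean(per_batch, axis=0)))  # False: not MBC
```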
MBC Set Encoding
DeepSets can trivially satisfy MBC by removing its message-passing layers. Set Transformer, which is attention-based, violates MBC.
Our goal is to design an attention-based set encoder, in the spirit of Set Transformer, that satisfies MBC. We achieve this by using slots.
Slot Set Encoder (SSE)
We realize an MBC set encoder, SSE, by computing attention over slots instead of between the elements of the set. This makes SSE amenable to mini-batch processing.
The SSE in Algorithm 1 is functionally composable over any partition of the input X for a given slot initialization.
S ∼ N(μ, diag(σ)) ∈ R^{K×d}
attn_{i,j} := σ(M_{i,j}), where M := (1/√d) k(X) · q(S)^T ∈ R^{n×K}
Ŝ := W^T · v(X) ∈ R^{K×d'}, where W_{i,j} := attn_{i,j} / Σ_{l=1}^{K} attn_{i,l}
f(X) = g(f(X_1), f(X_2), …, f(X_p)), g ∈ {mean, sum, max, min}
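A minimal NumPy sketch of Algorithm 1, with random matrices standing in for the learned projections k, q, v and a fixed slot initialization shared across mini-batches (dimensions are made up for illustration). Because the attention is normalized over slots rather than over elements, each element's contribution is independent, and g = sum recovers the full-set encoding:

```python
import numpy as np

rng = np.random.default_rng(0)
d, d_h, K = 4, 8, 3                     # input dim, projection dim, number of slots
Wk, Wq, Wv = (rng.normal(size=(d, d_h)) for _ in range(3))  # stand-ins for k, q, v
S = rng.normal(size=(K, d))             # fixed slot initialization, shared by all batches

def sse(X):
    M = (X @ Wk) @ (S @ Wq).T / np.sqrt(d_h)     # (n, K) element-slot scores
    attn = 1.0 / (1.0 + np.exp(-M))              # sigmoid: no coupling across elements
    W = attn / attn.sum(axis=1, keepdims=True)   # normalize over slots, per element
    return W.T @ (X @ Wv)                        # S_hat in R^{K x d_h}

X = rng.normal(size=(10, d))
partial = sum(sse(Xi) for Xi in np.array_split(X, 3))  # g = sum over mini-batches
print(np.allclose(sse(X), partial))                    # True: SSE is MBC
```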
Slot Set Encoder (SSE)
SSE is permutation invariant with respect to partitions of the input set and permutation equivariant with respect to the order of slots.
Proposition 3 For a given input set X ∈ R^{n×d} and slot initialization S ∈ R^{K×d}, the functions f and g as defined in Algorithm 1 are MBC for any partition of X and hence satisfy the MBC property.
Proposition 4 Let X ∈ R^{n×d} and S ∈ R^{K×d} be an input set and slot initialization respectively. Additionally, let SSE(X, S) be the output of Algorithm 1, and let π_X ∈ R^{n×n} and π_S ∈ R^{K×K} be arbitrary permutation matrices. Then
SSE(π_X · X, π_S · S) = π_S · SSE(X, S)
Hierarchical Slot Set Encoder
We can stack multiple Slot Set Encoders on top of each other to obtain a hierarchy of slot set encoders. This allows us to model higher-order interactions across slots.
f(X) = SSE_n(… SSE_2(SSE_1(X)))
The resulting set encoding function f(X) satisfies the MBC property as well as Propositions 3 & 4.
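A sketch of a two-level stack under the same illustrative assumptions (random stand-in projections and slot initializations): since the first SSE is MBC, aggregating its per-batch outputs with g = sum before applying the second SSE reproduces the full-set encoding.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_sse(d_in, d_out, n_slots):
    # each layer gets its own (random, stand-in) projections and slot initialization
    Wk, Wq, Wv = (rng.normal(size=(d_in, d_out)) for _ in range(3))
    S = rng.normal(size=(n_slots, d_in))
    def sse(X):
        M = (X @ Wk) @ (S @ Wq).T / np.sqrt(d_out)
        attn = 1.0 / (1.0 + np.exp(-M))
        W = attn / attn.sum(axis=1, keepdims=True)   # normalize over slots
        return W.T @ (X @ Wv)                        # (n_slots, d_out)
    return sse

sse1, sse2 = make_sse(4, 6, 5), make_sse(6, 6, 2)

X = np.random.default_rng(1).normal(size=(9, 4))
# mini-batch path: aggregate SSE1 outputs with g = sum, then apply SSE2 once
partial = sum(sse1(Xi) for Xi in np.array_split(X, 3))
print(np.allclose(sse2(partial), sse2(sse1(X))))  # True: the stack stays MBC
```

Only the first layer ever touches raw set elements, so the higher layers never need to be mini-batched.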
Approximate Mini-Batch Training of MBC Encoders
How can we train Slot Set Encoders in the large-scale or streaming setting?
Both DeepSets and Set Transformer require gradients to be taken with respect to the full set at train time.
In the mini-batch consistent setting, this is not feasible for large sets or when set elements arrive in a stream.
We therefore train MBC models on partitions of sets sampled at each iteration of the optimization process, and find that this works well empirically.
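A toy sketch of this approximate training scheme (the task, dimensions, and learning rate are all made up for illustration): each step draws a fresh set but computes gradients only on a small sampled partition of it, and a mean-pooled linear encoder still approximately recovers the underlying map.

```python
import numpy as np

rng = np.random.default_rng(0)
A_true = rng.normal(size=(3, 3))   # hypothetical ground-truth map to recover
w = np.zeros((3, 3))               # parameters of a mean-pooled linear set encoder

for step in range(5000):
    mu = rng.normal(size=3)                            # each set has its own statistic
    X = mu + rng.normal(size=(2000, 3))                # a large set, never encoded whole
    Xb = X[rng.choice(2000, size=32, replace=False)]   # partition sampled this iteration
    m = Xb.mean(axis=0)                                # encode only the mini-batch
    grad = 2 * np.outer(m, m @ w - mu @ A_true)        # gradient of ||m @ w - target||^2
    w -= 0.02 * grad

# w approaches A_true even though no gradient step ever saw a full set
print(np.linalg.norm(w - A_true))
```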
Experiments: Point Cloud Classification (ModelNet40)
We first show that SSE is a valid set encoding function on the point cloud
classification task. Here, no mini-batch encoding is used.
[Figure: Encoder → Set Encoding → Classifier pipeline.]
Experiments: Image Reconstruction (CelebA)
We perform image reconstruction using Conditional Neural Processes, replacing the aggregation function with DeepSets, Set Transformer, or the Slot Set Encoder. We test the model in the mini-batch setting where data arrives in a stream.