From Deep Learning to Deep Reasoning
14/08/2021 1
Tutorial at KDD, August 14th 2021
Truyen Tran, Vuong Le, Hung Le and Thao Le
{truyen.tran,vuong.le,thai.le,thao.le}@deakin.edu.au
https://bit.ly/37DYQn7
Part A: Learning to reason
Logistics
14/08/2021 2
Truyen Tran Vuong Le Hung Le Thao Le
https://bit.ly/37DYQn7
Agenda
• Introduction
• Part A: Learning-to-reason framework
• Part B: Reasoning over unstructured and structured data
• Part C: Memory | Data efficiency | Recursive reasoning
14/08/2021 3
2012
2016
AusDM 2016
Turing Awards 2018
GPT-3 2020
DL: 8 years snapshot
DL has been fantastic, but …
• It is great at interpolating
• → data hungry to cover all variations and smooth local manifolds
• → little systematic generalization (novel combinations)
• Lacks human-perceived reasoning capability
• Lacks a natural mechanism to incorporate prior knowledge, e.g., common sense
• No built-in causal mechanisms
• → Has trust issues!
• To be fair, many of these problems are common in statistical learning!
14/08/2021 5
Why still DL in 2021?
Theoretical
Expressiveness: Neural
nets can approximate any
function.
Learnability: Neural nets
are trained easily.
Generalisability: Neural
nets generalize surprisingly
well to unseen data.
Practical
Generality: Applicable to
many domains.
Competitive: DL is hard to
beat as long as there are
data to train.
Scalability: DL is better with
more data, and it is very
scalable.
The next AI/ML challenge
2020s-2030s
• Learning + reasoning, general
purpose, human-like
• Has contextual and common-
sense reasoning
• Requires less data
• Adapts to change
• Explainable
Photo credit: DARPA
Toward deeper reasoning
System 1: Intuitive
• Fast
• Implicit/automatic
• Pattern recognition
• Multiple
System 2: Analytical
• Slow
• Deliberate/rational
• Careful analysis
• Single, sequential
Image credit: VectorStock | Wikimedia
Perception
Theory of mind
Recursive reasoning
Facts
Semantics
Events and relations
Working space
Memory
System 2
• Holds hypothetical thought
• Decoupling from representation
• Working memory size is not essential.
Its attentional control is.
14/08/2021 9
Figure credit: Jonathan Hui
Reasoning in Probabilistic Graphical Models (PGM)
• Assuming models are fully specified
(e.g., by hand or learnt)
• Estimate MAP as energy
minimization
• Compute marginal probability
• Compute expectation &
normalisation constant
• Key algorithm: Pearl’s Belief
Propagation, a.k.a Sum-Product
algorithm in factor graphs.
• Known result from 2001-2003: BP
minimises the Bethe free energy.
14/08/2021 10
Heskes, Tom. "Stable fixed points of loopy belief propagation are local minima of the bethe free
energy." Advances in neural information processing systems. 2003.
Can we learn to infer directly from data
without full specification of models?
14/08/2021 11
Agenda
• Introduction
• Part A: Learning-to-reason framework
• Part B: Reasoning over unstructured and structured data
• Part C: Memory | Data efficiency | Recursive reasoning
14/08/2021 12
Part A: Sub-topics
• Reasoning as a prediction skill that can be learnt from data.
• Question answering as zero-shot learning.
• Neural network operations for learning to reason:
• Concept-object binding.
• Attention & transformers.
• Dynamic neural networks, conditional computation & differentiable programming.
• Reasoning as iterative representation refinement & query-driven program
synthesis and execution
• Compositional attention networks.
• Neural module networks.
• Combinatorics reasoning
14/08/2021 13
Learning to reason
• Learning is to self-improve by experiencing ~
acquiring knowledge & skills
• Reasoning is to deduce knowledge from
previously acquired knowledge in response to a
query (or a cue)
• Learning to reason is to improve the ability to
decide if a knowledge base entails a predicate.
• E.g., given a video f, determine whether the person with the
hat turns before singing.
• Hypotheses:
• Reasoning as just-in-time program synthesis.
• It employs conditional computation.
14/08/2021 14
Khardon, Roni, and Dan Roth. "Learning to reason." Journal of the ACM
(JACM) 44.5 (1997): 697-725.
(Dan Roth; ACM Fellow; IJCAI
John McCarthy Award)
Learning to reason, a definition
14/08/2021 15
Khardon, Roni, and Dan Roth. "Learning to reason." Journal of the ACM
(JACM) 44.5 (1997): 697-725.
E.g., given a video f, determine whether the person with the
hat turns before singing.
Practical setting: (query,database,answer) triplets
• This is very general:
• Classification: Query = what is this? Database = data.
• Regression: Query = how much? Database = data.
• QA: Query = NLP question. Database = context/image/text.
• Multi-task learning: Query = task ID. Database = data.
• Zero-shot learning: Query = task description. Database = data.
• Drug-protein binding: Query = drug. Database = protein.
• Recommender system: Query = user (or item). Database =
inventories (or user base).
14/08/2021 16
Can neural networks reason?
Reasoning is not necessarily
achieved by making logical
inferences
There is a continuity between
[algebraically rich inference] and
[connecting together trainable
learning systems]
Central to reasoning is composition
rules to guide the combinations of
modules to address new tasks
14/08/2021 17
“When we observe a visual scene, when we
hear a complex sentence, we are able to
explain in formal terms the relation of the
objects in the scene, or the precise meaning
of the sentence components. However, there
is no evidence that such a formal analysis
necessarily takes place: we see a scene, we
hear a sentence, and we just know what they
mean. This suggests the existence of a
middle layer, already a form of reasoning, but
not yet formal or logical.”
Bottou, Léon. "From machine learning to machine
reasoning." Machine learning 94.2 (2014): 133-149.
Hypotheses
• Reasoning as just-in-time program synthesis.
• It employs conditional computation.
• Reasoning is recursive, e.g., mental travel.
14/08/2021 18
Two approaches to neural reasoning
• Implicit chaining of predicates through recurrence:
• Step-wise query-specific attention to relevant concepts & relations.
• Iterative concept refinement & combination, e.g., through a working
memory.
• Answer is computed from the last memory state & question embedding.
• Explicit program synthesis:
• There is a set of modules, each performing a pre-defined operation.
• The question is parsed into a symbolic program.
• The program is implemented as a computational graph constructed by
chaining separate modules.
• The program is executed to compute an answer.
14/08/2021 19
In search for basic neural operators for reasoning
• Basics:
• Neuron as feature detector → Sensor, filter
• Computational graph → Circuit
• Skip-connection → Short circuit
• Essentials
• Multiplicative gates → AND gate, Transistor,
Resistor
• Attention mechanism → SWITCH gate
• Memory + forgetting → Capacitor + leakage
• Compositionality → Modular design
• …
14/08/2021 20
Photo credit: Nicola Asuni
Part A: Sub-topics
• Reasoning as a prediction skill that can be learnt from data.
• Question answering as zero-shot learning.
• Neural network operations for learning to reason:
• Concept-object binding.
• Attention & transformers.
• Dynamic neural networks, conditional computation & differentiable programming.
• Reasoning as iterative representation refinement & query-driven program
synthesis and execution.
• Compositional attention networks.
• Reasoning as Neural module networks.
• Combinatorics reasoning
14/08/2021 21
Concept-object binding
• Perceived data (e.g., visual objects) may not share the same semantic space
with high-level concepts.
• Binding between concept-object enables reasoning at the concept level
14/08/2021 22
Example of concept-object binding in LOGNet (Le et al, IJCAI’2020)
More reading: Greff, Klaus, Sjoerd van Steenkiste, and Jürgen Schmidhuber. "On the
binding problem in artificial neural networks." arXiv preprint arXiv:2012.05208 (2020).
Attentions: Picking up only what is needed at a step
• Need attention model to select or ignore
certain computations or inputs
• Can be “soft” (differentiable) or “hard”
(requires RL)
• Needed for selecting predicates in
reasoning.
• Attention provides a short-cut for long-
term dependencies
• Needed for long chain of reasoning.
• Also encourages sparsity if done right!
http://distill.pub/2016/augmented-rnns/
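As a rough illustration of the “soft” (differentiable) variant described above, the sketch below selects from a set of inputs via a weighted sum; the names and dimensions are illustrative assumptions rather than any specific published model.

```python
# Minimal sketch of soft, differentiable attention over a set of inputs.
import torch
import torch.nn.functional as F

def soft_attention(query, keys, values):
    """query: [d], keys: [n, d], values: [n, d_v] -> attended read: [d_v]."""
    scores = keys @ query / keys.shape[-1] ** 0.5   # [n] similarity scores
    weights = F.softmax(scores, dim=0)              # differentiable "soft" selection
    return weights @ values                         # weighted sum of selected values

q = torch.randn(16)
K = torch.randn(5, 16)
V = torch.randn(5, 32)
print(soft_attention(q, K, V).shape)  # torch.Size([32])
```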
Fast weights | HyperNet – the multiplicative interaction
• Early ideas in early 1990s by Juergen Schmidhuber and
collaborators.
• Data-dependent weights | Using a controller to generate weights of
the main net.
14/08/2021 24
Ha, David, Andrew Dai, and Quoc V. Le. "Hypernetworks." arXiv preprint arXiv:1609.09106 (2016).
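A minimal sketch of the fast-weight / hypernetwork idea, assuming a toy controller that emits the weights of a single linear layer conditioned on a context vector; this is an illustrative reduction, not the exact architecture of Ha et al. (2016).

```python
# Toy hypernetwork: a small controller generates data-dependent ("fast") weights
# for the main network. All layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class HyperLinear(nn.Module):
    def __init__(self, ctx_dim, in_dim, out_dim):
        super().__init__()
        self.in_dim, self.out_dim = in_dim, out_dim
        self.weight_gen = nn.Linear(ctx_dim, in_dim * out_dim)  # controller generates weights
        self.bias_gen = nn.Linear(ctx_dim, out_dim)

    def forward(self, x, context):
        W = self.weight_gen(context).view(self.out_dim, self.in_dim)  # fast weights
        b = self.bias_gen(context)
        return x @ W.t() + b  # main-net layer applied with generated weights

layer = HyperLinear(ctx_dim=8, in_dim=16, out_dim=4)
x, ctx = torch.randn(16), torch.randn(8)
print(layer(x, ctx).shape)  # torch.Size([4])
```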
Memory networks: Holding the data ready for inference
• Input is a set → loaded into
memory, which is NOT updated.
• State is an RNN with attention
reading from inputs.
• Concepts: query, key and
content + content addressing.
• Deep models, but constant path
length from input to output.
• Equivalent to an RNN with a shared
input set.
14/08/2021 25
Sukhbaatar, Sainbayar, Jason Weston, and Rob
Fergus. "End-to-end memory networks." Advances in
neural information processing systems. 2015.
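A minimal sketch of the multi-hop reading pattern described above: a fixed memory, a state that is refined by repeated attention reads, and a constant path length per hop. The additive state update is a simplifying assumption of this sketch.

```python
# Multi-hop reading in an end-to-end memory network style.
import torch
import torch.nn.functional as F

def memory_hops(query, memory, n_hops=3):
    """query: [d], memory: [n, d] -> refined state after n_hops attention reads."""
    state = query
    for _ in range(n_hops):
        weights = F.softmax(memory @ state, dim=0)  # content-based addressing
        read = weights @ memory                     # attended read vector
        state = state + read                        # refine state; memory is not updated
    return state

q = torch.randn(32)
M = torch.randn(10, 32)
print(memory_hops(q, M).shape)  # torch.Size([32])
```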
Transformers: Analogical reasoning through self-
attention
14/08/2021 26
Tay, Yi, et al. "Efficient transformers: A survey." arXiv
preprint arXiv:2009.06732 (2020).
State
Key
Query Memory
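The core self-attention step is sketched below as a single head without masking or feed-forward layers: every element of the working set is updated by attending over all others, which is where the relational, iterative refinement comes from.

```python
# Bare-bones single-head self-attention, the core Transformer operation.
import torch
import torch.nn.functional as F

def self_attention(X, Wq, Wk, Wv):
    """X: [n, d] set of token/object states -> updated states [n, d_v]."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.t() / K.shape[-1] ** 0.5   # pairwise relation scores [n, n]
    A = F.softmax(scores, dim=-1)             # each row: attention over the whole set
    return A @ V                              # relational update of every element

n, d = 6, 16
X = torch.randn(n, d)
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # torch.Size([6, 16])
```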
Transformer as implicit reasoning
• Recall: Reasoning as (free-) energy minimisation
• The classic Belief Propagation algorithm is minimization algorithm
of the Bethe free-energy!
• Transformer has relational, iterative state refinement makes it
a great candidate for implicit relational reasoning.
14/08/2021 27
Ramsauer, Hubert, et al. "Hopfield networks is all you need." arXiv preprint
arXiv:2008.02217 (2020).
Transformer vs. memory networks
• Memory network:
• Attention to input set
• One hidden state update at a time.
• Final state integrates information of the set, conditioned on the query.
• Transformer:
• Loads all inputs into working memory
• Assigns one hidden state per input element.
• All hidden states (including those from the query) are used to compute the answer.
14/08/2021 28
Universal transformers
14/08/2021 29
https://ai.googleblog.com/2018/08/moving-beyond-translation-with.html
Dehghani, Mostafa, et al. "Universal
Transformers." International Conference on
Learning Representations. 2018.
Dynamic neural networks
• Memory-Augmented Neural Networks
• Modular program layout
• Program synthesis
14/08/2021 30
Neural Turing machine (NTM)
A memory-augmented neural network (MANN)
• A controller that takes
input/output and talks to an
external memory module.
• Memory has read/write
operations.
• The main issue is where to
write, and how to update the
memory state.
• All operations are
differentiable.
Source: rylanschaeffer.github.io
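A sketch of content-based addressing, the differentiable lookup at the heart of NTM-style memories: the controller emits a key that is compared with every memory row. The cosine-plus-softmax form follows the NTM paper; the fixed sharpening value beta is an assumption of this sketch.

```python
# NTM-style content-based addressing and differentiable read.
import torch
import torch.nn.functional as F

def content_addressing(key, memory, beta=5.0):
    """key: [d], memory: [n, d] -> address weights [n] summing to 1."""
    sim = F.cosine_similarity(memory, key.expand_as(memory), dim=-1)  # [n]
    return F.softmax(beta * sim, dim=0)                               # sharpened, differentiable

def read(memory, weights):
    return weights @ memory  # weighted read over all rows

M = torch.randn(8, 20)
k = torch.randn(20)
w = content_addressing(k, M)
print(read(M, w).shape)  # torch.Size([20])
```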
MANN for reasoning
• Three steps:
• Store data into memory
• Read query, process sequentially, consult memory
• Output answer
• Behind the scene:
• Memory contains data & results of intermediate steps
• LOGNet does the same, memory consists of object
representations
• Drawbacks of current MANNs:
• No memory of controllers → less modularity and
compositionality when the query is complex
• No memory of relations → much harder to chain predicates.
14/08/2021 32
Source: rylanschaeffer.github.io
Part A: Sub-topics
• Reasoning as a prediction skill that can be learnt from data.
• Question answering as zero-shot learning.
• Neural network operations for learning to reason:
• Concept-object binding.
• Attention & transformers.
• Dynamic neural networks, conditional computation & differentiable programming.
• Reasoning as iterative representation refinement & query-driven
program synthesis and execution.
• Compositional attention networks.
• Reasoning as Neural module networks.
• Combinatorics reasoning
14/08/2021 33
MAC Net: Recurrent,
iterative representation
refinement
14/08/2021 34
Hudson, Drew A., and Christopher D. Manning. "Compositional attention
networks for machine reasoning." ICLR 2018.
Module networks
(reasoning by constructing and executing neural programs)
• Reasoning as laying
out modules to reach
an answer
• Composable neural
architecture →
question parsed as a
program (layout of
modules)
• A module is a function
(x → y); it could be a
sub-reasoning process
((x, q) → y).
14/08/2021 35
https://bair.berkeley.edu/blog/2017/06/20/learning-to-reason-with-neural-module-networks/
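A toy composition of neural modules following a question-derived layout. The module names (“find”, “relate”, “answer”) and the linear program below are made-up placeholders; real module networks use typed modules and a learned or parsed layout.

```python
# Executing a program (layout of modules) as a chained computational graph.
import torch
import torch.nn as nn

class NeuralModule(nn.Module):
    """Generic module: maps (current state x, question embedding q) -> new state."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())

    def forward(self, x, q):
        return self.net(torch.cat([x, q], dim=-1))

dim = 16
modules = nn.ModuleDict({name: NeuralModule(dim) for name in ["find", "relate", "answer"]})
program = ["find", "relate", "answer"]   # hypothetical layout parsed from a question

x = torch.randn(dim)                     # e.g. pooled image features
q = torch.randn(dim)                     # question embedding
for name in program:                     # execute the composed computation graph
    x = modules[name](x, q)
print(x.shape)                           # torch.Size([16])
```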
Putting things together:
A framework for visual
reasoning
14/08/2021 36
@Truyen Tran & Vuong Le, Deakin Uni
Part A: Sub-topics
• Reasoning as a prediction skill that can be learnt from data.
• Question answering as zero-shot learning.
• Neural network operations for learning to reason:
• Concept-object binding.
• Attention & transformers.
• Dynamic neural networks, conditional computation & differentiable programming.
• Reasoning as iterative representation refinement & query-driven program
synthesis and execution.
• Compositional attention networks.
• Reasoning as Neural module networks.
• Combinatorics reasoning
14/08/2021 37
Implement combinatorial algorithms
with neural networks
38
Generalizable
Inflexible
Noisy
High dimensional
Train neural processor P to imitate algorithm A
Processor P:
(a) aligned with the
computations of the target
algorithm;
(b) operates by matrix
multiplications, hence
natively admits useful
gradients;
(c) operates over high-
dimensional latent spaces
Veličković, Petar, and Charles Blundell. "Neural Algorithmic Reasoning." arXiv preprint arXiv:2105.02761 (2021).
Processor as RNN
• Does not assume knowledge of the
structure of the input; the input is treated as a
sequence
→ not really reasonable, harder to
generalize
• RNN is Turing-complete
→ can simulate any algorithm
• But it is not easy to learn the
simulation from data (input-
output)
Pointer network
39
Assumes O(N) memory
and O(N^2) computation,
where N is the size of the input
Vinyals, Oriol, Meire Fortunato, and Navdeep Jaitly. "Pointer networks."
In Proceedings of the 28th International Conference on Neural Information Processing Systems-Volume 2, pp. 2692-2700. 2015.
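A sketch of the pointer-network decoding step: the output distribution is an attention over the N input elements rather than over a fixed vocabulary, which is where the per-step O(N) cost comes from. Shapes and features are illustrative assumptions.

```python
# Pointer-network output step: "point" to one of the encoded inputs.
import torch
import torch.nn.functional as F

def pointer_step(decoder_state, encoder_states):
    """decoder_state: [d], encoder_states: [N, d] -> distribution over inputs [N]."""
    scores = encoder_states @ decoder_state   # O(N) scores at each decoding step
    return F.softmax(scores, dim=0)           # probability of pointing to each input element

enc = torch.randn(7, 32)   # e.g. 7 embedded city coordinates for a TSP instance
dec = torch.randn(32)
print(pointer_step(dec, enc).argmax())  # index of the selected input element
```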
Processor as MANN
• MANN simulates a neural
computer or Turing
machine → ideal for
implementing algorithms
• Sequential input, no
assumption on input
structure
• Assume O(1) memory
and O(N) computation
40
Graves, A., Wayne, G., Reynolds, M. et al. Hybrid computing using a neural network with dynamic external memory. Nature 538, 471–476 (2016)
Sequential encoding of graphs
41
• Each node is associated with random one-hot
or binary features
• Output is the features of the solution
[x1,y1, feature1],
[x2,y2, feature2],
…
[feature4],
[feature2],
…
Geometry
[node_feature1, node_feature2, edge12],
[node_feature1, node_feature2, edge13],
…
[node_feature4],
[node_feature2],
…
Graph
Convex
Hull
TSP
Shortest
Path
Minimum
Spanning
Tree
Le, Hung, Truyen Tran, and Svetha Venkatesh. "Self-attentive associative memory." In International Conference on Machine Learning, pp. 5682-5691. PMLR, 2020.
DNC: graph
reasoning
42
Graves, A., Wayne, G., Reynolds, M. et al. Hybrid computing using a neural network with dynamic external memory. Nature 538, 471–476 (2016)
NUTM: learning multiple algorithms at once
43
Le, Hung, Truyen Tran, and Svetha Venkatesh. "Neural Stored-program Memory."
In International Conference on Learning Representations. 2019.
Processor as graph neural network (GNN)
44
https://petar-v.com/talks/Algo-WWW.pdf
Veličković, Petar, Rex Ying, Matilde Padovano, Raia Hadsell, and Charles Blundell.
"Neural Execution of Graph Algorithms." In International Conference on Learning Representations. 2019.
Motivation:
• Many algorithm operates on graphs
• Supervise graph neural networks with algorithm operation/step/final output
• Encoder-Process-Decode framework:
Attention Message
passing
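A minimal message-passing step for the GNN processor in the encode-process-decode recipe; the mean aggregation and ReLU updates below are assumptions of this sketch rather than the exact operators used in the cited works.

```python
# One message-passing step over a small graph (the "process" stage).
import torch
import torch.nn as nn

class MessagePassing(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(2 * dim, dim)   # message from a (sender, receiver) pair
        self.upd = nn.Linear(2 * dim, dim)   # node update from (state, aggregated message)

    def forward(self, h, edges):
        """h: [n, dim] node states, edges: list of (src, dst) index pairs."""
        agg = torch.zeros_like(h)
        count = torch.zeros(h.shape[0], 1)
        for s, d in edges:
            agg[d] += torch.relu(self.msg(torch.cat([h[s], h[d]])))
            count[d] += 1
        agg = agg / count.clamp(min=1)        # mean-aggregate incoming messages
        return torch.relu(self.upd(torch.cat([h, agg], dim=-1)))

h = torch.randn(4, 8)
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(MessagePassing(8)(h, edges).shape)  # torch.Size([4, 8])
```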
Example: GNN for a specific problem (DNF counting)
• Count #assignments that satisfy a disjunctive normal
form (DNF) formula
• Exact counting is #P-hard; the classical approximate algorithm is O(mn)
• m: #clauses, n: #variables
• Supervised training on output-level
45
Best: O(m+n)
Abboud, Ralph, Ismail Ceylan, and Thomas Lukasiewicz. "Learning to reason: Leveraging neural networks for approximate DNF counting.“
In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 04, pp. 3097-3104. 2020.
Neural networks and algorithms alignment
46
Xu, Keyulu, Jingling Li, Mozhi Zhang, Simon S. Du, Ken-ichi Kawarabayashi, and Stefanie Jegelka. "What Can Neural Networks Reason About?." ICLR 2020 (2020).
https://petar-v.com/talks/Algo-WWW.pdf
Neural exhaustive
search
GNN is aligned with Dynamic
Programming (DP)
47
Neural exhaustive
search
If alignment exists → step-by-step supervision
48
Veličković, Petar, Rex Ying, Matilde Padovano, Raia Hadsell, and Charles Blundell. "Neural Execution of Graph Algorithms." In International Conference on Learning Representations. 2019.
• Merely simulates the
classical graph algorithm →
generalizable
• No algorithm discovery
Joint training is
encouraged
Processor as Transformer
• Back to input sequence
(set), but stronger
generalization
• Transformer with encoder
mask ~ graph attention
• Use Transformer with:
• Binary representation of
numbers
• Dynamic conditional masking
49
Yan, Yujun, Kevin Swersky, Danai Koutra, Parthasarathy Ranganathan, and Milad Hashemi.
"Neural Execution Engines: Learning to Execute Subroutines." Advances in Neural Information Processing Systems 33 (2020).
Next step
Masked
encoding
Decoding
Mask
prediction
Training with execution trace
50
End of part A
14/08/2021 51
https://bit.ly/37DYQn7
From Deep Learning to Deep Reasoning
14/08/2021 1
Tutorial at KDD, August 14th 2021
Truyen Tran, Vuong Le, Hung Le and Thao Le
{truyen.tran,vuong.le,thai.le,thao.le}@deakin.edu.au
https://bit.ly/37DYQn7
Part B: Reasoning over unstructured and structured data
Agenda
• Cross-modality reasoning, the case of vision-language
integration.
• Reasoning as set-set interaction.
• Relational reasoning
• Temporal reasoning
• Video question answering.
2
14/08/2021
Learning to Reason formulation
• Input:
• A knowledge context C
• A query q
• Output: an answer a satisfying the query q given the context C
• C can be
• structured: knowledge graphs
• unstructured: text, image, sound, video
Q: Is it simply an optimization problem like recognition, detection or even translation?
→ No, because the logic that maps C and q to a is more complex than in other solved optimization problems
→ We can solve (some parts of) it with good structures and inference strategies
Q: “What affects her mobility?”
14/08/2021 3
A case study: Image Question Answering
• Realization
• C: visual content of an image
• q: a linguistic question
• a: a linguistic phrase as
the answer to q regarding C
• Challenges
• Reasoning through facts and logics
• Cross-modality integration
14/08/2021 4
Image QA: Question types
14/08/2021 Slide credit: Thao Minh Le 5
Image QA datasets
14/08/2021 Slide credit: Thao Minh Le 6
The two main themes in Image QA
• Neuro-symbolic reasoning
• Parse the question into a “program” of small steps
• Learn the generic steps as neural modules
• Use and reuse the modules for different programs
• Compositional reasoning
• Extract visual and linguistic individual- and joint- representation
• Reasoning happens on the structure of the representation
• Sets/graphs/sequences
• The representation got refined through multi-step compositional
reasoning
14/08/2021 7
Agenda
• Cross-modality reasoning, the case of vision-language
integration.
• Reasoning as set-set interaction.
• Relational reasoning
• Temporal reasoning
• Video question answering.
8
14/08/2021
A simple approach
→ Issue: This is very
susceptible to the nuances of
images and questions
14/08/2021 Agrawal et al., 2015, Slide credit: Thao Minh Le 9
Reasoning as set-set interaction
• O: a set of context objects
• Faster-RCNN regions
• CNN tubes
• L: a set of linguistic objects from q
- biLSTM embedding of q
→ Reasoning is formulated as the interaction between the two sets O and L
for the answer a
14/08/2021 10
Set operations
• Reducing operation (eg: sum/average/max)
• Attention-based combination (Bahdanau et al. 2015)
• Attention weights as query-key dot product (Vaswani et al., 2017)
→ Attention-based set ops seem very suitable for visual reasoning
14/08/2021 11
Attention-based reasoning
• Unidirectional attention
• Find relation score between parts in the context C to the question
q:
Options for f:
• Hermann et al. (2015)
• Chen et al. (2016)
• Normalized by softmax into attention weights
• Attended context vector:
→ We can now extract information from the context that is “relevant” to the query
14/08/2021 12
Bottom-up-top-down attention (Anderson et al 2017)
• Bottom-up set construction: Choosing Faster-RCNN regions with
high class scores
• Top-down attention: Attending on visual features by question
→ Q: How about attention from vision objects to linguistic objects?
14/08/2021 13
Bi-directional attention
• Question-context similarity measure
• Question-guided context attention
• Softmax across columns
• Context-guided question attention
• Softmax across rows
→ Q: Probably not working for image QA, where single words
do not have co-reference with a region?
14/08/2021
Dynamic coattention networks for question answering (Seo et al., ICLR
2017) 14
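A sketch of the bi-directional attention described above: a single question-context similarity matrix, softmaxed along different axes to obtain question-guided context attention and context-guided question attention. Dimensions are illustrative.

```python
# Bi-directional (co-)attention from one similarity matrix.
import torch
import torch.nn.functional as F

C = torch.randn(12, 64)   # 12 context objects/words
Q = torch.randn(5, 64)    # 5 question words

S = C @ Q.t()                        # similarity matrix [12, 5]
ctx_att = F.softmax(S, dim=0)        # softmax across context: question-guided context attention
q_att = F.softmax(S, dim=1)          # softmax across question: context-guided question attention

attended_context = ctx_att.t() @ C   # [5, 64]: context summary per question word
attended_question = q_att @ Q        # [12, 64]: question summary per context object
print(attended_context.shape, attended_question.shape)
```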
Hierarchical co-attention for ImageQA
• The co-attention is found on a word-phrase-sentence hierarchy
→ better cross-domain co-references
→ Q: Can this be done on text QA as well?
→ Q: How about questions with many reasoning hops?
14/08/2021 15
Multi-step compositional reasoning
• Complex question need multiple hops
of reasoning
• Relations inside the context are multi-
step themselves
• Single shot of attention won’t be
enough
• Single shot of information gathering is
definitely not enough
16
→ Q: How to do multi-hop attentional reasoning?
14/08/2021 Figure: Hudson and Manning – ICLR 2018
Multi-step reasoning - Memory, Attention, and Composition (MAC
Nets)
• Attention reasoning is done through multiple sequential steps.
• Each step is done with a recurrent neural cell
• What are the key differences from a normal RNN (LSTM/GRU) cell?
• Not sequential input; it is sequential processing on a static input set.
• Guided by the question through a controller.
14/08/2021 MAC network, Hudson and Manning – ICLR 2018 17
Multi-step attentional reasoning
• At each step, the controller decides what to
look at next
• After each step, a piece of information is
gathered, represented through the
attention map on question words and
visual objects
• A common memory keeps all the
information extracted toward an answer
14/08/2021
MAC network, Hudson and Manning – ICLR 2018
18
Multi-step attentional reasoning
• Step 1: attends to the “tiny blue
block”, updating m1
• Step 2: looks for “the sphere in
front”, updating m2
• Step 3: traverses from the cyan ball
to the final objective – the purple
cylinder
19
14/08/2021
Reasoning as set-set interaction – a look back
• O: a set of context objects
• L: a set of linguistic objects from q
• Reasoning is formulated as the
interaction between the two
sets O and L for the answer a
Q: What is the brown
animal sitting inside of?
→ Q: Set-set interaction falls short for questions about relations between objects
14/08/2021 20
Agenda
• Cross-modality reasoning, the case of vision-language
integration.
• Reasoning as set-set interaction.
• Relational reasoning
• Temporal reasoning
• Video question answering.
21
14/08/2021
Reasoning on Graphs
• Relational questions: requiring explicit reasoning about the
relations between multiple objects
14/08/2021 Figure credit: Santoro et al 2017 22
Relation networks (Santoro et al 2017)
• RN(O) = f_φ( Σ_{i,j} g_θ(o_i, o_j) )
• g_θ and f_φ are neural functions (e.g., MLPs)
• g_θ generates the “relation” between the two objects o_i and o_j
• the sum followed by f_φ is the aggregation function
→ The relations here are implicit, complete, pair-wise – inefficient, and lack expressiveness
14/08/2021 23
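A compact sketch of a Relation Network following the formula above: g_θ scores every (complete, pair-wise) object pair and f_φ aggregates the summed relations. Layer sizes are illustrative; in the VQA setting the question embedding is also concatenated into each pair.

```python
# Relation Network sketch: RN(O) = f_phi( sum_{i,j} g_theta(o_i, o_j) ).
import torch
import torch.nn as nn

class RelationNetwork(nn.Module):
    def __init__(self, obj_dim, hid=64, out=10):
        super().__init__()
        self.g = nn.Sequential(nn.Linear(2 * obj_dim, hid), nn.ReLU())               # g_theta
        self.f = nn.Sequential(nn.Linear(hid, hid), nn.ReLU(), nn.Linear(hid, out))  # f_phi

    def forward(self, objects):
        """objects: [n, obj_dim] -> answer logits [out]."""
        n = objects.shape[0]
        oi = objects.unsqueeze(1).expand(n, n, -1)   # object i of every pair
        oj = objects.unsqueeze(0).expand(n, n, -1)   # object j of every pair
        pairs = torch.cat([oi, oj], dim=-1)          # all n*n (complete) pairs
        relations = self.g(pairs).sum(dim=(0, 1))    # aggregate pairwise relations
        return self.f(relations)

objects = torch.randn(8, 32)                          # e.g. 8 CNN object features
print(RelationNetwork(32)(objects).shape)             # torch.Size([10])
```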
Reasoning with Graph convolution networks
• Input graph is built from image entities and question
• GCN is used to gather facts and produce answer
→ The relations are now explicit
and pruned
→ But the graph building is very
stiff:
- Unrecoverable if it makes a
mistake
- Information during reasoning is
not used to build the graph
14/08/2021 Narasimhan et.al NIPS2018 24
Reasoning with Graph attention networks
• The graph is determined during reasoning process with
attention mechanism
The relations are now
adaptive and integrated
with reasoning
→ Are the relations
singular and static?
14/08/2021 ReGAT model, Li et.al. ICCV19 25
Dynamic reasoning graphs
• On complex questions,
multiple sets of relations
are needed
• We need not only multi-
step but also multi-form
structures
• Let’s do multiple
dynamically–built graphs!
14/08/2021 LCGN, Hu et.al. ICCV19 26
Dynamic reasoning graphs
The questions so far act as an unstructured command in the process
Aren’t their structures and relations important too?
14/08/2021 LCGN, Hu et.al. ICCV19 27
Reasoning on cross-modality graphs
• Two types of nodes: Linguistic entities and visual objects
• Two types of edges:
• Visual
• Linguistic-visual binding (as a fuzzy grounding)
• Adaptively updated during reasoning
14/08/2021 LOGNet, T.M Le et.al. IJCAI2020 28
Language-binding Object Graph (LOG) Unit
• Graph constructor: build the dynamic vision graph
• Language binding constructor: find the dynamic L-V relations
14/08/2021 LOGNet, T.M Le et.al. IJCAI2020 29
LOGNet: multi-step visual-linguistic binding
• Object-centric representation ✓
• Multi-step/multi-structure compositional reasoning ✓
• Linguistic-vision detail interaction ✓
14/08/2021 LOGNet, T.M Le et.al. IJCAI2020 30
Dynamic language-vision graphs in
actions
14/08/2021 LOGNet, T.M Le et.al. IJCAI2020 31
We got sets and graphs, how about sequences?
• Videos pose another challenge for visual reasoning: the
dynamics through time.
• Sets and graphs now become sequences of such.
• Temporal relations are the key factors
• The size of context is a core issue
14/08/2021 32
Agenda
• Cross-modality reasoning, the case of vision-language
integration.
• Reasoning as set-set interaction.
• Relational reasoning
• Temporal reasoning
• Video question answering.
33
14/08/2021
Overview
• Goals of this part of the tutorial
• Understanding Video QA as a complete testbed of
visual reasoning.
• Representative state-of-the-art approaches for
spatio-temporal reasoning.
34
14/08/2021
Video Question Answering
Short-form Video Question Answering
Movie Question Answering
35
14/08/2021
36
Reasoning
Qualitative spatial
reasoning
Relational, temporal
inference
Commonsense
Object recognition
Scene graphs
Computer Vision
Natural Language
Processing
Machine
learning
Visual QA
Parsing
Symbol binding
Systematic generalization
Learning to classify
entailment
Unsupervised
learning
Reinforcement
learning
Program synthesis
Action graphs
Event detection
Object
discovery
14/08/2021 36
Challenges
37
37
• Difficulties in data annotation.
• Content for performing reasoning spreads over space-
time and multiple modalities (videos, subtitles, speech
etc.)
14/08/2021
Video QA Datasets
38
38
Movie QA
(Tapaswi, M., et al.,
2016)
MSRVTT-QA and
MSVD-QA
(Xu, D., et al., 2017)
TGIF-QA
(Jang, Y., et al.,
2017)
MarioQA
(Mun, J., et al.,
2017)
CLEVRER
(Yi, K., et al., 2019)
KnowIT VQA
(Garcia, N., et al.,
2020)
14/08/2021
Video QA datasets
39
39
(TGIF-QA, Jang et al., 2018) (CLEVRER, Yi, Kexin, et al., 2020)
14/08/2021
Video QA as a spatio-temporal
extension of Image QA
40
(a) Extended end-to-end
memory network
(b) Extended simple
VQA model
(c) Extended temporal
attention model
(d) Extended sequence-
to-sequence model
14/08/2021
Zeng, Kuo-Hao, et al. "Leveraging video descriptions to learn video question answering." AAAI’17.
Spatio-temporal cross-modality
alignment
41
Key ideas:
• Explore the correlation
between vision and
language via attention
mechanisms.
• Joint representations
are query-driven
spatio-temporal
features of a given
videos.
14/08/2021 Zhao, Zhou, et al. "Video question answering via hierarchical dual-level attention network learning." ACL’17.
Memory-based Video QA
42
General Dynamic Memory Network (DMN)
Co-memory attention networks for Video QA
Key ideas:
• DMN refines attention over a set of
facts to extract reasoning clues.
• Motion and appearance features are
complementary clues for question
answering.
14/08/2021 Gao, Jiyang, et al. "Motion-appearance co-memory networks for video question answering." CVPR’18.
Memory-based Video QA
43
Heterogeneous video memory for Video QA
Key differences:
• Learning a joint representation of
multimodal inputs at each memory
read/write step.
• Utilizing external question memory
to model context-dependent
question words.
14/08/2021
Fan, Chenyou, et al. "Heterogeneous memory enhanced multimodal attention model for video question answering." CVPR’19.
Multimodal reasoning units for Video QA
44
• CRN: Conditional Relation
Networks.
• Inputs:
• Frame-based
appearance features
• Motion features
• Query features
• Outputs:
• Joint representations
encoding temporal
relations, motion, query.
.
14/08/2021 Le, Thao Minh, et al. "Hierarchical conditional relation networks for video question answering.“ CVPR’20
Object-oriented spatio-temporal reasoning for
Video QA
45
• OSTR: Object-oriented
Spatio-Temporal Reasoning.
• Inputs:
• Object lives tracked
through time.
• Context (motion).
• Query features.
• Outputs:
• Joint representations
encoding temporal
relations, motion, query. .
14/08/2021 Dang, Long Hoang, et al. "Hierarchical Object-oriented Spatio-Temporal Reasoning for Video Question Answering." IJCAI’21
Video QA as a down-stream task of
video language pre-training
46
VideoBERT
Apr., 2019
HowTo100M
Jun., 2019
MIL-NCE
Dec., 2019
UniViLM
Feb., 2020
HERO
May, 2020
ClipBERT
Feb., 2021
14/08/2021
VideoBERT: a joint model for video
and language representation learning
47
• Data for training: Sample videos and texts from YouCook II.
Instructions in text given by ASR toolkit
Subsampled video segments
Sun, Chen, et al. "Videobert: A joint model for video and language representation learning.“ ICCV’19.
14/08/2021
VideoBERT: a joint model for video
and language representation learning
48
Sun, Chen, et al. "Videobert: A joint model for video and language representation learning.“ ICCV’19.
• Linguistic representations:
• Tokenized texts into
WordPieces, similar as
BERT.
• Visual representations:
• S3D features for each segmented
video clips.
• Tokenized into clusters using
hierarchical k-means.
Pre-training
14/08/2021
VideoBERT: a joint model for video
and language representation learning
49
Pre-training
Down-stream
tasks
Sun, Chen, et al. "Videobert: A joint model for video and language representation learning.“ ICCV’19.
Video
captioning
Video question
answering
Zero-shot action
classification
14/08/2021
CLIPBERT: video language pre-training
with sparse sampling
50
Lei, Jie, et al. "Less is more: Clipbert for video-and-language learning via sparse sampling." CVPR’21.
ClipBERT
Prev. methods
ClipBERT overview
Procedure:
• Pretraining on large-scale image-text datasets.
• Finetuning on video-text tasks.
14/08/2021
From short-form Video QA to Movie QA
51
Lei, Jie, et al. "Tvqa: Localized, compositional video question answering." EMNLP’18.
Long-term temporal relationships
Multimodal inputs
14/08/2021
Conventional methods for Movie QA
52
Question-driven multi-stream
models:
• Short-term temporal relationships are
less important.
• Long-term temporal relationships and
multimodal interactions are key.
• Language is dominant over visual
counterpart.
Le, Thao Minh, et al. "Hierarchical conditional
relation networks for video question answering.“
IJCV’21.
Lei, Jie, et al. "Tvqa: Localized, compositional video question answering." EMNLP’18.
14/08/2021
HERO: large-scale pre-training for Movie QA
53
Li, Linjie, et al. "Hero: Hierarchical encoder for video+ language omni-representation pre-training." EMNLP’20.
• Pre-trained on 7.6M
videos and
associated subtitles.
• Achieved state-of-
the-art results on all
datasets.
14/08/2021
End of part B
14/08/2021 54
https://bit.ly/37DYQn7
From Deep Learning to Deep Reasoning
14/08/2021 1
Tutorial at KDD, August 14th 2021
Truyen Tran, Vuong Le, Hung Le and Thao Le
{truyen.tran,vuong.le,thai.le,thao.le}@deakin.edu.au
https://bit.ly/37DYQn7
Part C: Memory | Data efficiency | Recursive reasoning
Agenda
• Reasoning with external memories
• Memory of entities – memory-augmented neural networks
• Memory of relations with tensors and graphs
• Memory of programs & neural program construction.
• Learning to reason with less labels
• Data augmentation with analogical and counterfactual examples
• Question generation
• Self-supervised learning for question answering
• Learning with external knowledge graphs
• Recursive reasoning with neural theory of mind.
2
Agenda
• Reasoning with external memories
• Memory of entities – memory-augmented neural networks
• Memory of relations with tensors and graphs
• Memory of programs & neural program construction.
• Learning to reason with less labels:
• Data augmentation with analogical and counterfactual examples
• Question generation
• Self-supervised learning for question answering
• Learning with external knowledge graphs
• Recursive reasoning with neural theory of mind.
3
Introduction
4
Memory is part of intelligence
• Memory is the ability to
store, retain and recall
information
• Brain memory stores
items, events and high-
level structures
• Computer memory
stores data and
temporary variables
5
Memory-reasoning analogy
6
• 2 processes: fast-slow
o Memory: familiarity-
recollection
• Cognitive test:
o Corresponding reasoning and
memorization performance
o Increasing # premises,
inductive/deductive
reasoning is affected
Heit, Evan, and Brett K. Hayes. "Predicting reasoning from memory." Journal of Experimental Psychology: General 140, no. 1 (2011): 76.
Common memory activities
• Encode: write information to
the memory, often requiring
compression capability
• Retain: keep the information
overtime. This is often assumed
in machinery memory
• Retrieve: read information from
the memory to solve the task at
hand
Encode
Retain
Retrieve
7
Memory taxonomy based on memory content
8
Item
Memory
• Objects, events, items,
variables, entities
Relational
Memory
• Relationships, structures,
graphs
Program
Memory
• Programs, functions,
procedures, how-to knowledge
Item memory
Associative memory
RAM-like memory
Independent memory
9
Distributed item memory as
associative memory
10
"Green" means
"go," but what
does "red" mean?
Language
birthday party on
30th Jan
Time Object
Where is my pen?
What is the
password?
Behaviour
10
Semantic
memory
Episodic
memory
Working
memory
Motor
memory
Associate memory can be implemented as
Hopfield network
Correlation matrix memory Hopfield network
Encode Retrieve Retrieve
Feed-forward
retrieval
Recurrent
retrieval 11
“Fast-weight” memory matrix M
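A sketch of a correlation-matrix (“fast-weight”) associative memory: key-value pairs are written with outer products into a single matrix M and read back with one feed-forward matrix product. Retrieval is exact only when the keys are (near-)orthogonal, which the demo below enforces for clarity.

```python
# Correlation-matrix associative memory: encode with outer products, retrieve feed-forward.
import torch

def encode(keys, values):
    """keys: [n, d_k], values: [n, d_v] -> memory matrix M: [d_v, d_k]."""
    return sum(torch.outer(v, k) for k, v in zip(keys, values))

def retrieve(M, key):
    return M @ key   # approximate value bound to key (exact for orthogonal keys)

K = torch.eye(4)                  # orthogonal keys for a clean demo
V = torch.randn(4, 6)
M = encode(K, V)
print(torch.allclose(retrieve(M, K[2]), V[2], atol=1e-5))  # True
```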
Rule-based reasoning with associative
memory
• Encode a set of rules:
“pre-conditions →
post-conditions”
• Support variable
binding, rule-conflict
handling and partial
rule input
• Example of encoding
rule “A:1, B:3, C:4 → X”
12
Outer product
for binding
Austin, Jim. "Distributed associative memories for high-speed symbolic reasoning." Fuzzy Sets and Systems 82, no. 2 (1996): 223-233.
Memory-augmented neural networks:
computation-storage separation
13
RNN Symposium 2016: Alex Graves - Differentiable Neural Computer
RAM
Neural Turing Machine (NTM)
• Memory is a 2d matrix
• Controller is a neural
network
• The controller
read/writes to memory
at certain addresses.
• Trained end-to-end,
differentiable
• Simulates a Turing Machine
→ supports symbolic
reasoning, algorithm
solving
14
Graves, Alex, Greg Wayne, and Ivo Danihelka. "Neural turing machines." arXiv preprint arXiv:1410.5401 (2014).
Addressing mechanism in NTM
Input: erase vector e_t, add vector a_t
Memory writing Memory reading
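A sketch of the differentiable NTM write using the erase vector e_t and add vector a_t above: every memory row is partially erased and then updated in proportion to the write weights. The write weights w would come from content- and location-based addressing.

```python
# Differentiable NTM-style memory write with erase and add vectors.
import torch

def ntm_write(memory, w, e_t, a_t):
    """memory: [n, d], w: [n] write weights, e_t, a_t: [d]."""
    erase = torch.outer(w, e_t)          # how much to erase in each memory cell
    add = torch.outer(w, a_t)            # what to add to each memory cell
    return memory * (1 - erase) + add    # fully differentiable update

M = torch.randn(8, 20)
w = torch.softmax(torch.randn(8), dim=0)
e, a = torch.sigmoid(torch.randn(20)), torch.randn(20)
print(ntm_write(M, w, e, a).shape)       # torch.Size([8, 20])
```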
Algorithmic reasoning
16
Copy
Associative
recall
Priority sort
Optimal memory writing for
memorization
• Simple finding: writing too often
deteriorates memory content (not
retainable)
• Given input sequence of length T
and only D writes, when should we
write to the memory?
17
Le, Hung, Truyen Tran, and Svetha Venkatesh. "Learning to Remember More with Less Memorization." In International Conference on Learning Representations. 2018.
Uniform writing is optimal for
memorization
Better memorization means better algorithmic reasoning
18
T=50, D=5
Regular Uniform (cached)
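The uniform-writing schedule itself is simple to state: with a budget of D writes over a sequence of length T, write at evenly spaced time steps (every T/D steps). A tiny sketch:

```python
# Uniform writing schedule: D writes evenly spaced over a sequence of length T.
def uniform_write_steps(T, D):
    interval = T / D
    return [int((i + 1) * interval) - 1 for i in range(D)]

print(uniform_write_steps(T=50, D=5))  # [9, 19, 29, 39, 49]
```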
Memory of independent entities
• Each slot store one or some entities
• Memory writing is done separately for
each memory slot
→ each slot maintains the life of one or
more entities
• The memory is a set of N parallel RNNs
19
John Apple __ John Apple Office
Apple John __
John Apple Kitchen
Apple John Office Apple John Kitchen
Weston, Jason, Bordes, Antoine, Chopra, Sumit, and Mikolov, Tomas.
Towards ai-complete question answering: A set of prerequisite toy tasks. CoRR, abs/1502.05698, 2015.
RNN 1
RNN 2
…
Time
Recurrent entity network
20
Garden
Henaff, Mikael, Jason Weston, Arthur Szlam, Antoine Bordes, and Yann LeCun.
"Tracking the world state with recurrent entity networks."
In 5th International Conference on Learning Representations, ICLR 2017. 2017.
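A simplified sketch loosely following the recurrent entity network update: every slot is gated independently by the current input, so a slot can track the state of one entity without disturbing the others. The gating and normalisation details are reduced for brevity.

```python
# Entity-slot memory: slot-wise gated writes (loosely in the spirit of EntNet).
import torch
import torch.nn as nn

class EntityMemory(nn.Module):
    def __init__(self, n_slots, dim):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(n_slots, dim))   # one key per entity slot
        self.U = nn.Linear(dim, dim, bias=False)

    def forward(self, memory, x):
        """memory: [n_slots, dim], x: [dim] current input fact."""
        gate = torch.sigmoid(memory @ x + self.keys @ x)       # relevance of x to each slot
        candidate = torch.tanh(self.U(memory) + x)              # proposed new slot contents
        memory = memory + gate.unsqueeze(-1) * candidate        # gated, slot-wise write
        return memory / memory.norm(dim=-1, keepdim=True)       # normalise each slot

mem = torch.randn(5, 16)
print(EntityMemory(5, 16)(mem, torch.randn(16)).shape)          # torch.Size([5, 16])
```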
Recurrent Independent Mechanisms
21
Goyal, Anirudh, Alex Lamb, Jordan Hoffmann, Shagun Sodhani, Sergey Levine, Yoshua Bengio, and Bernhard Schölkopf. "Recurrent independent mechanisms.“ ICLR21.
Reasoning with independent
dynamics
22
Copy
Ball
dynamics
Relational memory
Graph memory
Tensor memory
23
Why relational memory? Item memory
is weak at recognizing relationships
Item
Memory
• Stores and retrieves individual items
• Relates pairs of items of the same time step
• Fails to relate temporally distant items
24
Dual process in memory
25
• Store items
• Simple, low-order
• System 1
Relational
Memory
• Store relationships between items
• Complicated, high-order
• System 2
Item
Memory
Howard Eichenbaum, Memory, amnesia, and the hippocampal system (MIT press, 1993).
Alex Konkel and Neal J Cohen, "Relational memory and the hippocampus: representations and methods", Frontiers in neuroscience 3 (2009).
Memory as graph
• Memory is a static graph with
fixed nodes and edges
• Relationship is somehow
known
• Each memory node stores
the state of the graph’s node
• Write to node via message
passing
• Read from node via MLP
26
Palm, Rasmus Berg, Ulrich Paquet, and Ole Winther. "Recurrent Relational Networks." In NeurIPS. 2018.
bAbI
27
Fact 1
Fact 2
Fact 3
Question
Node
Edge
Answer
CLEVR
Node
(colour, shape, position)
Edge
(distance)
Memory of graphs access conditioned on query
• Encode multiple graphs, each
graph is stored in a set of
memory row
• For each graph, the controller
reads/writes to the memory:
• Read uses content-based
attention
• Write uses message passing
• Aggregate read vectors from
all graphs to create output
28
Pham, Trang, Truyen Tran, and Svetha Venkatesh. "Relational dynamic memory networks." arXiv preprint arXiv:1808.04247 (2018).
Capturing relationship can be done via
memory slot interactions using attention
• Graph memory needs customization to an explicit design of nodes and
edges
• Can we automatically learn structure with a 2D tensor memory?
• Capture relationship: each slot interacts with all other slots (self-
attention)
29
Santoro, Adam, Ryan Faulkner, David Raposo, Jack Rae, Mike Chrzanowski, Théophane Weber, Daan Wierstra, Oriol Vinyals, Razvan Pascanu, and Timothy Lillicrap.
"Relational recurrent neural networks." In Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 7310-7321. 2018.
Relational Memory Core (RMC) operation
30
RNN-like
Interface
31
Allowing pair-wise interactions can answer
questions on temporal relationship
Dot product attention works for
simple relationship, but …
32
What is
most
similar to
me?
0.7 0.9 - 0.1 0.4
What is most
similar to me
but different
from tiger?
For hard relationships, a scalar
attention score is a limited representation
Complicated relationships need high-
order relational memory
33
Extract items
Item
memory
Associate every pair of them
…
3d relational
tensor
Relational
memory
Le, Hung, Truyen Tran, and Svetha Venkatesh. "Self-
attentive associative memory." In International Conference
on Machine Learning, pp. 5682-5691. PMLR, 2020.
Program memory
Module memory
Stored-program memory
34
Predefining program for subtask
• A program designed for a
task becomes a module
• Parse a question to module
layout (order of program
execution)
• Learn the weight of each
module to master the task
35
Andreas, Jacob, Marcus Rohrbach, Trevor Darrell, and Dan Klein. "Neural module networks." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 39-48. 2016.
Program selection is based on
a parser; the rest is trained end-to-end
36
5 module
templates
1 2
3
4
5
Parsing
The most powerful memory is one that stores
both program and data
• Computer architecture:
Universal Turing
Machine/Harvard/von Neumann
• Stored-program principle
• Break a big task into subtasks,
each of which can be handled by a
TM/single-purpose program
stored in a program memory
37
https://en.wikipedia.org/
NUTM: Learn to select program (neural weight)
via program attention
• Neural stored-program memory
(NSM) stores key (the address)
and values (the weight)
• The weight is selected and
loaded to the controller of NTM
• The stored NTM weights and
the weights of the NUTM are
learnt end-to-end by
backpropagation
38
Le, Hung, Truyen Tran, and Svetha Venkatesh. "Neural Stored-program Memory."
In International Conference on Learning Representations. 2019.
Scaling with memory of mini-programs
• Prior, 1 program = 1 neural
network (millions of
parameters)
• Parameter inefficiency since
the programs do not share
common parameters
• Solution: store sharable
mini-programs to compose
infinite number of programs
39
it is analogous to building Lego structures
corresponding to inputs from basic Lego bricks.
Recurrent program attention to retrieve
singular components of a program
40
Le, Hung, and Svetha Venkatesh. "Neurocoder: Learning General-Purpose Computation Using Stored Neural Programs." arXiv preprint arXiv:2009.11443 (2020).
41
Program attention is equivalent to
binary decision tree reasoning
Recurrent program attention automatically
detects task boundaries
Agenda
• Reasoning with external memories
• Memory of entities – memory-augmented neural networks
• Memory of relations with tensors and graphs
• Memory of programs & neural program construction.
• Learning to reason with less labels:
• Data augmentation with analogical and counterfactual examples
• Question generation
• Self-supervised learning for question answering
• Learning with external knowledge graphs
• Recursive reasoning with neural theory of mind.
42
Data Augmentation with Analogical and
Counterfactual Examples
43
• Poor generalization when training under the independent
and identically distributed (i.i.d.) assumption.
• Intuition: augmenting counterfactual samples to allow
machines to understand the critical changes in the
input that lead to changes in the answer space.
• Perceptually similar, yet
• Semantically dissimilar realistic samples
Visual counterfactual example
Language counterfactual examples
Gokhale, Tejas, et al. "Mutant: A training paradigm for out-of-distribution
generalization in visual question answering." EMNLP’20.
Question Generation
44
Li, Yikang, et al. "Visual question generation as dual task of visual question answering." CVPR’18.
Krishna, Ranjay, Michael Bernstein, and Li Fei-Fei. "Information maximizing visual question
generation." CVPR’19.
• Question answering is a zero-shot
learning problem. Question
generation helps cover a wider
range of concepts.
• Question generation can be done
with either supervised or
unsupervised learning.
BERT: Transformer That Predicts Its Own
Masked Parts
46
BERT is like parallel
approximate pseudo-
likelihood
• ~ Maximizing the
conditional likelihood of
some variables given the
rest.
• When the number of
variables is large, this
converges to the MLE
(maximum likelihood
estimate).
[Slide credit: Truyen Tran]
https://towardsdatascience.com/bert-explained-state-of-the-art-language-model-for-nlp-f8b21a9b6270
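A toy sketch of the masked-prediction objective behind BERT-style pre-training: mask a random subset of tokens and train the network to reconstruct them from the remaining context. The tiny encoder below is a placeholder, not BERT, and all sizes are illustrative.

```python
# Masked-token prediction: the model is trained to fill in its own masked parts.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, dim, seq_len = 100, 32, 10
embed = nn.Embedding(vocab + 1, dim)     # last index reserved as the [MASK] token
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2)
head = nn.Linear(dim, vocab)

tokens = torch.randint(0, vocab, (1, seq_len))
mask = torch.rand(1, seq_len) < 0.15      # choose ~15% of positions to mask
mask[0, 0] = True                          # ensure at least one masked position for the demo
inputs = tokens.masked_fill(mask, vocab)  # replace chosen positions with [MASK]

logits = head(encoder(embed(inputs)))                 # predict a token at every position
loss = F.cross_entropy(logits[mask], tokens[mask])    # but score only the masked ones
print(loss.item())
```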
Visual QA as a Down-stream Task of Visual-
Language BERT Pre-trained Models
47
Numerous pre-trained visual language models during 2019-2021.
VisualBERT (Li, Liunian Harold, et al., 2019)
VL-BERT (Su, Weijie, et al., 2019)
UNITER (Chen, Yen-Chun, et al., 2019)
12-in-1 (Lu, Jiasen, et al., 2020)
Pixel-BERT (Huang, Zhicheng, et al., 2019)
OSCAR (Li, Xiujun, et al., 2020)
Single-stream model Two-stream model
ViLBERT (Lu, Jiasen, et al. , 2019)
LXMERT (Tan, Hao, and Mohit Bansal, 2019)
[Slide credit: Licheng Yu et al.]
Learning with External Knowledge
48
Why external knowledge
for reasoning?
• Questions can be beyond
visual recognition (e.g.
firetrucks usually use a fire
hydrant).
• Human’s prior knowledge for
cognition-level reasoning (e.g.
human’s goals, intents etc.)
Q: What sort of vehicle uses this item?
A: firetruck
Q: What is the sports position of the
man in the orange shirt?
A: goalie/goalkeeper
Marino, Kenneth, et al. "Ok-vqa: A visual question
answering benchmark requiring external
knowledge." CVPR’19.
Zellers, Rowan, et al. "From recognition to cognition: Visual commonsense reasoning." CVPR’19.
Learning with External Knowledge
49
Retrieved by Wikipedia search API
Marino, Kenneth, et al. "Ok-vqa: A visual question
answering benchmark requiring external
knowledge." CVPR’19.
Shah, Sanket, et al. "Kvqa: Knowledge-aware visual question
answering." AAAI’19.
Agenda
• Reasoning with external memories
• Memory of entities – memory-augmented neural networks
• Memory of relations with tensors and graphs
• Memory of programs & neural program construction.
• Learning to reason with less labels:
• Data augmentation with analogical and counterfactual examples
• Question generation
• Self-supervised learning for question answering
• Learning with external knowledge graphs
• Recursive reasoning with neural theory of mind.
50
Source: religious studies project
Core AI faculty:
Theory of mind
Where would ToM fit in?
System 1: Intuitive
• Fast
• Implicit/automatic
• Pattern recognition
• Multiple
System 2: Analytical
• Slow
• Deliberate/rational
• Careful analysis
• Single, sequential
Image credit: VectorStock | Wikimedia
Perception
Theory of mind
Recursive reasoning
Facts
Semantics
Events and relations
Working space
Memory
Contextualized recursive reasoning
• Thus far, QA tasks are straightforward and objective:
• Questioner: I will ask about what I don’t know.
• Answerer: I will answer what I know.
• Real life can be tricky, more subjective:
• Questioner: I will ask only questions I think they can
answer.
• Answerer 1: This is what I think they want from an answer.
• Answerer 2: I will answer only what I think they think I can.
14/08/2021 53
→ We need Theory of Mind to function socially.
Social dilemma: Stag Hunt games
• Difficult decision: individual outcomes (selfish)
or group outcomes (cooperative).
• Together hunt Stag (both are cooperative): Both have more
meat.
• Solely hunt Hare (both are selfish): Both have less meat.
• One hunts Stag (cooperative), the other hunts Hare (selfish): only
the one hunting Hare has meat.
• Human evidence: Self-interested but
considerate of others (cultures vary).
• Idea: Belief-based guilt-aversion
• One experiences loss if it lets other down.
• Necessitates Theory of Mind: reasoning about other’s mind.
Theory of Mind Agent with Guilt Aversion (ToMAGA)
Update Theory of Mind
• Predict whether the other’s behaviour is
cooperative or uncooperative
• Update the zero-order belief (what the
other will do)
• Update the first-order belief (what the other
thinks about me)
Guilt Aversion
• Compute the expected material reward
of the other based on Theory of Mind
• Compute the psychological rewards, i.e.
“feeling guilty”
• Reward shaping: subtract the expected
loss of the other.
Nguyen, Dung, et al. "Theory of Mind with Guilt Aversion Facilitates
Cooperative Reinforcement Learning." Asian Conference on Machine
Learning. PMLR, 2020.
[Slide credit: Dung Nguyen]
Machine Theory of Mind Architecture (inside the Observer)
Successor
representations
next-step action
probability
goal
Rabinowitz, Neil, et al. "Machine theory of mind." International conference on machine learning. PMLR, 2018.
[Slide credit: Dung Nguyen]
A ToM
architecture
• Observer maintains memory of
previous episodes of the agent.
• It theorizes the “traits” of the
agent.
• Implemented as Hyper Networks.
• Given the current episode, the
observer tries to infer goal,
intention, action, etc of the
agent.
• Implemented as memory retrieval
through attention mechanisms.
14/08/2021 57
Wrapping up
58
Wrapping up
• Reasoning as the next challenge for deep neural networks
• Part A: Learning-to-reason framework
• Reasoning as a prediction skill that can be learnt from data
• Dynamic neural networks are capable
• Combinatorics reasoning
• Part B: Reasoning over unstructured and structured data
• Reasoning over unstructured sets
• Relational reasoning over structured data
• Part C: Memory | Data efficiency | Recursive reasoning
• Memories of items, relations and programs
• Learning with less labels
• Theory of mind
14/08/2021 59
A possible framework for learning and reasoning
with deep neural networks
System 1: Intuitive
• Fast
• Implicit/automatic
• Pattern recognition
• Multiple
System 2: Analytical
• Slow
• Deliberate/rational
• Careful analysis
• Single, sequential
Image credit: VectorStock | Wikimedia
Perception
Theory of mind
Recursive reasoning
Facts
Semantics
Events and relations
Working space
Memory
QA
14/08/2021 61
https://bit.ly/37DYQn7

More Related Content

PPTX
Q1 Memory Fabric Forum: Using CXL with AI Applications - Steve Scargall.pptx
PPTX
Q1 Memory Fabric Forum: Advantages of Optical CXL​ for Disaggregated Compute ...
PPT
Genetic algorithm
PDF
1st Place in EY Data Science Challenge
PPTX
Understanding RNN and LSTM
PPTX
Data Storage in DNA
PPTX
System interconnect architecture
PDF
AI Chip Trends and Forecast
Q1 Memory Fabric Forum: Using CXL with AI Applications - Steve Scargall.pptx
Q1 Memory Fabric Forum: Advantages of Optical CXL​ for Disaggregated Compute ...
Genetic algorithm
1st Place in EY Data Science Challenge
Understanding RNN and LSTM
Data Storage in DNA
System interconnect architecture
AI Chip Trends and Forecast

What's hot (20)

PDF
Shared Memory Centric Computing with CXL & OMI
PDF
High Performance Computing
PPTX
Enfabrica - Bridging the Network and Memory Worlds
PDF
RNN and its applications
PDF
VQ-VAE
PDF
Machine Learning: Generative and Discriminative Models
PPTX
Proposed Lightweight Block Cipher Algorithm for Securing Internet of Things
PPTX
Distributed shred memory architecture
PPTX
Artificial nueral network slideshare
PPTX
6장 공영방송의 독립성과 지배구조
PDF
Julien Simon - Deep Dive - Model Merging
PPTX
Neural network & its applications
PPTX
Classifying and understanding financial data using graph neural network
PPTX
Mtech Fourth progress presentation
PPTX
CPAC Connectome Analysis in the Cloud
PDF
Unit 5 Advanced Computer Architecture
PDF
PR-395: Variational Image Compression with a Scale Hyperprior
PPTX
CXL Fabric Management Standards
PPTX
Memory management
PPTX
artificial intelligence that covers frames
Shared Memory Centric Computing with CXL & OMI
High Performance Computing
Enfabrica - Bridging the Network and Memory Worlds
RNN and its applications
VQ-VAE
Machine Learning: Generative and Discriminative Models
Proposed Lightweight Block Cipher Algorithm for Securing Internet of Things
Distributed shred memory architecture
Artificial nueral network slideshare
6장 공영방송의 독립성과 지배구조
Julien Simon - Deep Dive - Model Merging
Neural network & its applications
Classifying and understanding financial data using graph neural network
Mtech Fourth progress presentation
CPAC Connectome Analysis in the Cloud
Unit 5 Advanced Computer Architecture
PR-395: Variational Image Compression with a Scale Hyperprior
CXL Fabric Management Standards
Memory management
artificial intelligence that covers frames
Ad

Similar to From deep learning to deep reasoning (20)

PDF
Deep analytics via learning to reason
PDF
Deep Learning 2.0
PDF
Deep learning 1.0 and Beyond, Part 2
PPTX
Knowledge-Enhanced Neural Machine Reasoning.pptx
PDF
Artificial intelligence in the post-deep learning era
PDF
Buku panduan untuk Machine Learning.pdf
PDF
Introduction to Deep Learning: Concepts, Architectures, and Applications
PDF
Machine Learning- Perceptron_Backpropogation_Module 3.pdf
PPT
Machine Learning
PDF
Week 3 Deep Learning And POS Tagging Hands-On
PDF
Machine Reasoning at A2I2, Deakin University
PPT
Emergence Berkeley presentation for devices
PDF
SOCIAL DISTANCING MONITORING IN COVID-19 USING DEEP LEARNING
PDF
Deep learning and reasoning: Recent advances
PPTX
What Deep Learning Means for Artificial Intelligence
PDF
A Quick Overview of Artificial Intelligence and Machine Learning (revised ver...
PPT
Different learning Techniques in Artificial Intelligence
PPT
ML_Overview.ppt
PPTX
ML_Overview.pptx
PPT
ML_Overview.ppt
Deep analytics via learning to reason
Deep Learning 2.0
From deep learning to deep reasoning

  • 1. From Deep Learning to Deep Reasoning 14/08/2021 1 Tutorial at KDD, August 14th 2021 Truyen Tran, Vuong Le, Hung Le and Thao Le {truyen.tran,vuong.le,thai.le,thao.le}@deakin.edu.au https://bit.ly/37DYQn7 Part A: Learning to reason
  • 2. Logistics 14/08/2021 2 Truyen Tran Vuong Le Hung Le Thao Le https://bit.ly/37DYQn7
  • 3. Agenda • Introduction • Part A: Learning-to-reason framework • Part B: Reasoning over unstructured and structured data • Part C: Memory | Data efficiency | Recursive reasoning 14/08/2021 3
  • 4. 2012 2016 AusDM 2016 Turing Awards 2018 GPT-3 2020 DL: 8 years snapshot
  • 5. DL has been fantastic, but … • It is great at interpolating •  data hungry to cover all variations and smooth local manifolds •  little systematic generalization (novel combinations) • Lack of human-perceived reasoning capability • Lacks a natural mechanism to incorporate prior knowledge, e.g., common sense • No built-in causal mechanisms •  Have trust issues! • To be fair, many of these problems are common in statistical learning! 14/08/2021 5
  • 6. Why still DL in 2021? Theoretical Expressiveness: Neural nets can approximate any function. Learnability: Neural nets are trained easily. Generalisability: Neural nets generalize surprisingly well to unseen data. Practical Generality: Applicable to many domains. Competitive: DL is hard to beat as long as there are data to train. Scalability: DL is better with more data, and it is very scalable.
  • 7. The next AI/ML challenge 2020s-2030s  Learning + reasoning, general purpose, human-like  Has contextual and common- sense reasoning  Requires less data  Adapt to change  Explainable Photo credit: DARPA
  • 8. Toward deeper reasoning System 1: Intuitive System 1: Intuitive System 1: Intuitive • Fast • Implicit/automatic • Pattern recognition • Multiple System 2: Analytical • Slow • Deliberate/rational • Careful analysis • Single, sequential Single Image credit: VectorStock | Wikimedia Perception Theory of mind Recursive reasoning Facts Semantics Events and relations Working space Memory
  • 9. System 2 • Holds hypothetical thought • Decoupling from representation • Working memory size is not essential. Its attentional control is. 14/08/2021 9
  • 10. Figure credit: Jonathan Hui Reasoning in Probabilistic Graphical Models (PGM) • Assuming models are fully specified (e.g., by hand or learnt) • Estimate MAP as energy minimization • Compute marginal probability • Compute expectation & normalisation constant • Key algorithm: Pearl’s Belief Propagation, a.k.a. the Sum-Product algorithm in factor graphs. • Known result in 2001-2003: BP minimises the Bethe free energy. 14/08/2021 10 Heskes, Tom. "Stable fixed points of loopy belief propagation are local minima of the bethe free energy." Advances in neural information processing systems. 2003.
  • 11. Can we learn to infer directly from data without full specification of models? 14/08/2021 11
  • 12. Agenda • Introduction • Part A: Learning-to-reason framework • Part B: Reasoning over unstructured and structured data • Part C: Memory | Data efficiency | Recursive reasoning 14/08/2021 12
  • 13. Part A: Sub-topics • Reasoning as a prediction skill that can be learnt from data. • Question answering as zero-shot learning. • Neural network operations for learning to reason: • Concept-object binding. • Attention & transformers. • Dynamic neural networks, conditional computation & differentiable programming. • Reasoning as iterative representation refinement & query-driven program synthesis and execution • Compositional attention networks. • Neural module networks. • Combinatorics reasoning 14/08/2021 13
  • 14. Learning to reason • Learning is to self-improve by experiencing ~ acquiring knowledge & skills • Reasoning is to deduce knowledge from previously acquired knowledge in response to a query (or a cue) • Learning to reason is to improve the ability to decide if a knowledge base entails a predicate. • E.g., given a video f, determine whether the person with the hat turns before singing. • Hypotheses: • Reasoning as just-in-time program synthesis. • It employs conditional computation. 14/08/2021 14 Khardon, Roni, and Dan Roth. "Learning to reason." Journal of the ACM (JACM) 44.5 (1997): 697-725. (Dan Roth; ACM Fellow; IJCAI John McCarthy Award)
  • 15. Learning to reason, a definition 14/08/2021 15 Khardon, Roni, and Dan Roth. "Learning to reason." Journal of the ACM (JACM) 44.5 (1997): 697-725. E.g., given a video f, determine whether the person with the hat turns before singing.
  • 16. Practical setting: (query,database,answer) triplets • This is very general: • Classification: Query = what is this? Database = data. • Regression: Query = how much? Database = data. • QA: Query = NLP question. Database = context/image/text. • Multi-task learning: Query = task ID. Database = data. • Zero-shot learning: Query = task description. Database = data. • Drug-protein binding: Query = drug. Database = protein. • Recommender system: Query = User (or item). Database = inventories (or user base); 14/08/2021 16
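To make the (query, database, answer) framing concrete, here is a minimal illustrative sketch in Python; the class and field names are hypothetical, not from the tutorial:

    from dataclasses import dataclass
    from typing import Any

    @dataclass
    class ReasoningTriplet:
        query: Any     # what is asked: a task ID, an NL question, a drug, a user, ...
        database: Any  # the knowledge context: an image, a document, a protein, an inventory, ...
        answer: Any    # the target to predict

    # The same interface covers many of the settings listed above:
    examples = [
        ReasoningTriplet("what is this?", "image_0042.png", "cat"),             # classification
        ReasoningTriplet("how many cubes are red?", "image_0042.png", 3),       # visual QA
        ReasoningTriplet("task: sentiment", "the movie was great", "positive"), # multi-task learning
    ]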
  • 17. Can neural networks reason? Reasoning is not necessarily achieved by making logical inferences There is a continuity between [algebraically rich inference] and [connecting together trainable learning systems] Central to reasoning is composition rules to guide the combinations of modules to address new tasks 14/08/2021 17 “When we observe a visual scene, when we hear a complex sentence, we are able to explain in formal terms the relation of the objects in the scene, or the precise meaning of the sentence components. However, there is no evidence that such a formal analysis necessarily takes place: we see a scene, we hear a sentence, and we just know what they mean. This suggests the existence of a middle layer, already a form of reasoning, but not yet formal or logical.” Bottou, Léon. "From machine learning to machine reasoning." Machine learning 94.2 (2014): 133-149.
  • 18. Hypotheses • Reasoning as just-in-time program synthesis. • It employs conditional computation. • Reasoning is recursive, e.g., mental travel. 14/08/2021 18
  • 19. Two approaches to neural reasoning • Implicit chaining of predicates through recurrence: • Step-wise query-specific attention to relevant concepts & relations. • Iterative concept refinement & combination, e.g., through a working memory. • Answer is computed from the last memory state & question embedding. • Explicit program synthesis: • There is a set of modules, each performs a pre-defined operation. • Question is parsed into a symbolic program. • The program is implemented as a computational graph constructed by chaining separate modules. • The program is executed to compute an answer. 14/08/2021 19
  • 20. In search for basic neural operators for reasoning • Basics: • Neuron as feature detector → Sensor, filter • Computational graph → Circuit • Skip-connection → Short circuit • Essentials • Multiplicative gates → AND gate, Transistor, Resistor • Attention mechanism → SWITCH gate • Memory + forgetting → Capacitor + leakage • Compositionality → Modular design • .. 14/08/2021 20 Photo credit: Nicola Asuni
  • 21. Part A: Sub-topics • Reasoning as a prediction skill that can be learnt from data. • Question answering as zero-shot learning. • Neural network operations for learning to reason: • Concept-object binding. • Attention & transformers. • Dynamic neural networks, conditional computation & differentiable programming. • Reasoning as iterative representation refinement & query-driven program synthesis and execution. • Compositional attention networks. • Reasoning as Neural module networks. • Combinatorics reasoning 14/08/2021 21
  • 22. Concept-object binding • Perceived data (e.g., visual objects) may not share the same semantic space with high-level concepts. • Binding between concept-object enables reasoning at the concept level 14/08/2021 22 Example of concept-object binding in LOGNet (Le et al, IJCAI’2020) More reading: Greff, Klaus, Sjoerd van Steenkiste, and Jürgen Schmidhuber. "On the binding problem in artificial neural networks." arXiv preprint arXiv:2012.05208 (2020).
  • 23. Attentions: Picking up only what is needed at a step • Need attention model to select or ignore certain computations or inputs • Can be “soft” (differentiable) or “hard” (requires RL) • Needed for selecting predicates in reasoning. • Attention provides a short-cut → long-term dependencies • Needed for long chain of reasoning. • Also encourages sparsity if done right! http://distill.pub/2016/augmented-rnns/
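A minimal numpy sketch of the soft (differentiable) attention described above; the shapes and variable names are illustrative only:

    import numpy as np

    def soft_attention(query, keys, values):
        """Read from `values` with weights given by query-key similarity.

        query:  (d,)    current reasoning state / question summary
        keys:   (n, d)  one key per candidate input (predicate, object, word)
        values: (n, v)  the content that can be read
        """
        scores = keys @ query / np.sqrt(keys.shape[-1])  # scaled dot-product similarity
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                         # softmax: a soft, differentiable selection
        return weights @ values                          # weighted read-out

    rng = np.random.default_rng(0)
    context = soft_attention(rng.normal(size=8), rng.normal(size=(5, 8)), rng.normal(size=(5, 16)))
    print(context.shape)  # (16,)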
  • 24. Fast weights | HyperNet – the multiplicative interaction • Early ideas in early 1990s by Juergen Schmidhuber and collaborators. • Data-dependent weights | Using a controller to generate weights of the main net. 14/08/2021 24 Ha, David, Andrew Dai, and Quoc V. Le. "Hypernetworks." arXiv preprint arXiv:1609.09106 (2016).
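A toy sketch of the fast-weight / hypernetwork idea: a small controller generates the weights of the main layer from a context vector, so the main computation becomes data-dependent and multiplicative (sizes are arbitrary, weights untrained):

    import numpy as np

    rng = np.random.default_rng(1)
    d_in, d_out, d_ctx = 6, 4, 3

    # Controller ("hyper") parameters: map a context vector to the main layer's weight matrix.
    W_hyper = rng.normal(size=(d_ctx, d_in * d_out)) * 0.1

    def fast_weight_layer(x, context):
        W_main = (context @ W_hyper).reshape(d_in, d_out)  # weights generated on the fly
        return x @ W_main                                  # multiplicative, data-dependent interaction

    y = fast_weight_layer(rng.normal(size=d_in), rng.normal(size=d_ctx))
    print(y.shape)  # (4,)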
  • 25. Memory networks: Holding the data ready for inference • Input is a set → load into memory, which is NOT updated. • State is an RNN with attention reading from inputs • Concepts: Query, key and content + Content addressing. • Deep models, but constant path length from input to output. • Equivalent to an RNN with shared input set. 14/08/2021 25 Sukhbaatar, Sainbayar, Jason Weston, and Rob Fergus. "End-to-end memory networks." Advances in neural information processing systems. 2015.
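A sketch of one hop of a memory network in the spirit described above: the input set is loaded into a static memory, the state attends over it, and the read is folded back into the state (toy numpy, untrained):

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def memory_hop(state, mem_keys, mem_values):
        """One read from a static memory: content addressing followed by a weighted sum."""
        attn = softmax(mem_keys @ state)   # address the memory by query-key match
        read = attn @ mem_values           # retrieve content
        return state + read                # update the reasoning state

    rng = np.random.default_rng(2)
    state = rng.normal(size=16)            # question embedding
    mem_k = rng.normal(size=(10, 16))      # one key per input sentence / object
    mem_v = rng.normal(size=(10, 16))
    for _ in range(3):                     # multiple hops: a constant-depth chain of reads
        state = memory_hop(state, mem_k, mem_v)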
  • 26. Transformers: Analogical reasoning through self-attention 14/08/2021 26 Tay, Yi, et al. "Efficient transformers: A survey." arXiv preprint arXiv:2009.06732 (2020). State Key Query Memory
  • 27. Transformer as implicit reasoning • Recall: Reasoning as (free-) energy minimisation • The classic Belief Propagation algorithm is a minimisation algorithm for the Bethe free energy! • The Transformer’s relational, iterative state refinement makes it a great candidate for implicit relational reasoning. 14/08/2021 27 Ramsauer, Hubert, et al. "Hopfield networks is all you need." arXiv preprint arXiv:2008.02217 (2020).
  • 28. Transformer vs. memory networks • Memory network: • Attention to input set • One hidden state update at a time. • Final state integrates information of the set, conditioned on the query. • Transformer: • Loading all inputs into working memory • Assigns one hidden state per input element. • All hidden states (including those from the query) are used to compute the answer. 14/08/2021 28
  • 29. Universal transformers 14/08/2021 29 https://ai.googleblog.com/2018/08/moving-beyond-translation-with.html Dehghani, Mostafa, et al. "Universal Transformers." International Conference on Learning Representations. 2018.
  • 30. Dynamic neural networks • Memory-Augmented Neural Networks • Modular program layout • Program synthesis 14/08/2021 30
  • 31. Neural Turing machine (NTM) A memory-augmented neural network (MANN) • A controller that takes input/output and talks to an external memory module. • Memory has read/write operations. • The main issue is where to write, and how to update the memory state. • All operations are differentiable. Source: rylanschaeffer.github.io
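A sketch of content-based addressing, the core of NTM-style differentiable memory access: the controller emits a key, and read weights come from sharpened cosine similarity to each memory slot (toy numpy; the full NTM also has location-based addressing, omitted here):

    import numpy as np

    def content_address(memory, key, beta=5.0):
        """memory: (slots, width); key: (width,); beta: sharpening strength."""
        sim = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
        w = np.exp(beta * sim)
        return w / w.sum()                 # soft, differentiable address over slots

    rng = np.random.default_rng(3)
    M = rng.normal(size=(8, 20))           # external memory: 8 slots of width 20
    w_read = content_address(M, rng.normal(size=20))
    read_vector = w_read @ M               # soft read: a mixture of slots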
  • 32. MANN for reasoning • Three steps: • Store data into memory • Read query, process sequentially, consult memory • Output answer • Behind the scenes: • Memory contains data & results of intermediate steps • LOGNet does the same, memory consists of object representations • Drawbacks of current MANNs: • No memory of controllers → Less modularity and compositionality when query is complex • No memory of relations → Much harder to chain predicates. 14/08/2021 32 Source: rylanschaeffer.github.io
  • 33. Part A: Sub-topics • Reasoning as a prediction skill that can be learnt from data. • Question answering as zero-shot learning. • Neural network operations for learning to reason: • Concept-object binding. • Attention & transformers. • Dynamic neural networks, conditional computation & differentiable programming. • Reasoning as iterative representation refinement & query-driven program synthesis and execution. • Compositional attention networks. • Reasoning as Neural module networks. • Combinatorics reasoning 14/08/2021 33
  • 34. MAC Net: Recurrent, iterative representation refinement 14/08/2021 34 Hudson, Drew A., and Christopher D. Manning. "Compositional attention networks for machine reasoning." ICLR 2018.
  • 35. Module networks (reasoning by constructing and executing neural programs) • Reasoning as laying out modules to reach an answer • Composable neural architecture → question parsed as program (layout of modules) • A module is a function (x → y), could be a sub-reasoning process ((x, q) → y). 14/08/2021 35 https://bair.berkeley.edu/blog/2017/06/20/learning-to-reason-with-neural-module-networks/
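A toy sketch of the module-network idea: a parsed question yields a layout (a small program), and the answer is computed by chaining modules along that layout. The parser is omitted and the modules here are placeholders, not the actual learned ones:

    import numpy as np

    rng = np.random.default_rng(4)

    # Each module maps (object features, optional argument vector) -> object features.
    modules = {
        "find":   lambda feats, arg: feats * (feats @ arg > 0)[:, None],  # keep objects matching the argument
        "relate": lambda feats, arg: np.roll(feats, 1, axis=0),           # placeholder for a relation shift
        "answer": lambda feats, arg: feats.mean(axis=0),                  # reduce the set to an answer vector
    }

    def execute(layout, image_feats):
        """layout: a list of (module_name, argument) produced by a question parser."""
        feats = image_feats
        for name, arg in layout:
            feats = modules[name](feats, arg)
        return feats

    image_feats = rng.normal(size=(5, 8))  # 5 detected objects
    layout = [("find", rng.normal(size=8)), ("relate", None), ("answer", None)]
    print(execute(layout, image_feats).shape)  # (8,) answer embedding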
  • 36. Putting things together: A framework for visual reasoning 14/08/2021 36 @Truyen Tran & Vuong Le, Deakin Uni
  • 37. Part A: Sub-topics • Reasoning as a prediction skill that can be learnt from data. • Question answering as zero-shot learning. • Neural network operations for learning to reason: • Concept-object binding. • Attention & transformers. • Dynamic neural networks, conditional computation & differentiable programming. • Reasoning as iterative representation refinement & query-driven program synthesis and execution. • Compositional attention networks. • Reasoning as Neural module networks. • Combinatorics reasoning 14/08/2021 37
  • 38. Implement combinatorial algorithms with neural networks 38 Generalizable Inflexible Noisy High dimensional Train neural processor P to imitate algorithm A Processor P: (a) aligned with the computations of the target algorithm; (b) operates by matrix multiplications, hence natively admits useful gradients; (c) operates over high- dimensional latent spaces Veličković, Petar, and Charles Blundell. "Neural Algorithmic Reasoning." arXiv preprint arXiv:2105.02761 (2021).
  • 39. Processor as RNN • Does not assume knowledge of the input structure; treating the input as a plain sequence is not always reasonable and is harder to generalize • RNN is Turing-complete → can simulate any algorithm • But it is not easy to learn the simulation from (input, output) data Pointer network 39 Assumes O(N) memory and O(N^2) computation, where N is the size of the input Vinyals, Oriol, Meire Fortunato, and Navdeep Jaitly. "Pointer networks." In Proceedings of the 28th International Conference on Neural Information Processing Systems-Volume 2, pp. 2692-2700. 2015.
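A sketch of the pointer-network readout: instead of a fixed output vocabulary, the attention distribution over input positions is itself the output, so the decoder "points" at input elements (toy shapes, untrained weights):

    import numpy as np

    def pointer_step(decoder_state, encoder_states, W1, W2, v):
        """Return a distribution over input positions (the 'pointer')."""
        scores = np.tanh(encoder_states @ W1 + decoder_state @ W2) @ v  # additive attention
        e = np.exp(scores - scores.max())
        return e / e.sum()

    rng = np.random.default_rng(5)
    d = 16
    enc = rng.normal(size=(7, d))   # encodings of 7 input elements (e.g., 2-D points)
    dec = rng.normal(size=d)        # current decoder state
    probs = pointer_step(dec, enc, rng.normal(size=(d, d)), rng.normal(size=(d, d)), rng.normal(size=d))
    print(probs.argmax())           # index of the input element pointed at this step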
  • 40. Processor as MANN • MANN simulates neural computers or Turing machine ideal for implement algorithms • Sequential input, no assumption on input structure • Assume O(1) memory and O(N) computation 40 Graves, A., Wayne, G., Reynolds, M. et al. Hybrid computing using a neural network with dynamic external memory. Nature 538, 471–476 (2016)
  • 41. Sequential encoding of graphs 41 • Each node is associated with random one-hot or binary features • Output is the features of the solution [x1,y1, feature1], [x2,y2, feature2], … [feature4], [feature2], … Geometry [node_feature1, node_feature2, edge12], [node_feature1, node_feature2, edge13], … [node_feature4], [node_feature2], … Graph Convex Hull TSP Shortest Path Minimum Spanning Tree Le, Hung, Truyen Tran, and Svetha Venkatesh. "Self-attentive associative memory." In International Conference on Machine Learning, pp. 5682-5691. PMLR, 2020.
  • 42. DNC: graph reasoning 42 Graves, A., Wayne, G., Reynolds, M. et al. Hybrid computing using a neural network with dynamic external memory. Nature 538, 471–476 (2016)
  • 43. NUTM: learning multiple algorithms at once 43 Le, Hung, Truyen Tran, and Svetha Venkatesh. "Neural Stored-program Memory." In International Conference on Learning Representations. 2019.
  • 44. Processor as graph neural network (GNN) 44 https://petar-v.com/talks/Algo-WWW.pdf Veličković, Petar, Rex Ying, Matilde Padovano, Raia Hadsell, and Charles Blundell. "Neural Execution of Graph Algorithms." In International Conference on Learning Representations. 2019. Motivation: • Many algorithms operate on graphs • Supervise graph neural networks with algorithm operation/step/final output • Encoder-Process-Decode framework: Attention Message passing
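A sketch of one processor step in the encode-process-decode framework: nodes exchange messages along edges and the aggregated messages update the node states (toy numpy, untrained weights; real processors use learned message and update networks):

    import numpy as np

    def message_passing_step(node_states, edges, W_msg, W_upd):
        """edges: list of (src, dst) pairs; one round of messages, then a node update."""
        n, d = node_states.shape
        incoming = np.zeros((n, d))
        for src, dst in edges:
            incoming[dst] += np.maximum(node_states[src] @ W_msg, 0.0)  # message from a neighbour
        combined = np.concatenate([node_states, incoming], axis=1)
        return np.maximum(combined @ W_upd, 0.0)                        # updated node states

    rng = np.random.default_rng(6)
    d = 8
    states = rng.normal(size=(4, d))
    edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
    states = message_passing_step(states, edges, rng.normal(size=(d, d)), rng.normal(size=(2 * d, d)))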
  • 45. Example: GNN for a specific problem (DNF counting) • Count #assignments that satisfy a disjunctive normal form (DNF) formula • Exact counting is #P-hard; the classical approximate algorithm is O(mn) • m: #clauses, n: #variables • Supervised training at the output level 45 Best: O(m+n) Abboud, Ralph, Ismail Ceylan, and Thomas Lukasiewicz. "Learning to reason: Leveraging neural networks for approximate DNF counting.“ In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 04, pp. 3097-3104. 2020.
  • 46. Neural networks and algorithms alignment 46 Xu, Keylu, Jingling Li, Mozhi Zhang, Simon S. Du, Ken-ichi Kawarabayashi, and Stefanie Jegelka. "What Can Neural Networks Reason About?." ICLR 2020 (2020). https://petar-v.com/talks/Algo-WWW.pdf Neural exhaustive search
  • 47. GNN is aligned with Dynamic Programming (DP) 47 Neural exhaustive search
  • 48. If alignment exists → step-by-step supervision 48 Veličković, Petar, Rex Ying, Matilde Padovano, Raia Hadsell, and Charles Blundell. "Neural Execution of Graph Algorithms." In International Conference on Learning Representations. 2019. • Merely simulate the classical graph algorithm, generalizable • No algorithm discovery Joint training is encouraged
  • 49. Processor as Transformer • Back to input sequence (set), but stronger generalization • Transformer with encoder mask ~ graph attention • Use Transformer with: • Binary representation of numbers • Dynamic conditional masking 49 Yan, Yujun, Kevin Swersky, Danai Koutra, Parthasarathy Ranganathan, and Milad Hashemi. "Neural Execution Engines: Learning to Execute Subroutines." Advances in Neural Information Processing Systems 33 (2020). Next step Masked encoding Decoding Mask prediction
  • 51. End of part A 14/08/2021 51 https://bit.ly/37DYQn7
  • 52. From Deep Learning to Deep Reasoning 14/08/2021 1 Tutorial at KDD, August 14th 2021 Truyen Tran, Vuong Le, Hung Le and Thao Le {truyen.tran,vuong.le,thai.le,thao.le}@deakin.edu.au https://bit.ly/37DYQn7 Part B: Reasoning over unstructured and structured data
  • 53. Agenda • Cross-modality reasoning, the case of vision-language integration. • Reasoning as set-set interaction. • Relational reasoning • Temporal reasoning • Video question answering. 2 14/08/2021
  • 54. Learning to Reason formulation • Input: • A knowledge context C • A query q • Output: an answer satisfying • C can be • structured: knowledge graphs • unstructured: text, image, sound, video Q: Is it simply an optimization problem like recognition, detection or even translation? → No, because the logic from (C, q) to a is more complex than in other solved optimization problems → We can solve (some parts of) it with good structures and inference strategies Q: “What affects her mobility?” 14/08/2021 3
  • 55. A case study: Image Question Answering • Realization • C: visual content of an image • q: a linguistic question • a: a linguistic phrase as the answer to q regarding K • Challenges • Reasoning through facts and logics • Cross-modality integration 14/08/2021 4
  • 56. Image QA: Question types 14/08/2021 Slide credit: Thao Minh Le 5
  • 57. Image QA datasets 14/08/2021 Slide credit: Thao Minh Le 6
  • 58. The two main themes in Image QA • Neuro-symbolic reasoning • Parse the question into a “program” of small steps • Learn the generic steps as neural modules • Use and reuse the modules for different programs • Compositional reasoning • Extract visual and linguistic individual- and joint- representation • Reasoning happens on the structure of the representation • Sets/graphs/sequences • The representation got refined through multi-step compositional reasoning 14/08/2021 7
  • 59. Agenda • Cross-modality reasoning, the case of vision-language integration. • Reasoning as set-set interaction. • Relational reasoning • Temporal reasoning • Video question answering. 8 14/08/2021
  • 60. A simple approach  Issue: This is very susceptible to the nuances of images and questions 14/08/2021 Agrawal et al., 2015, Slide credit: Thao Minh Le 9
  • 61. Reasoning as set-set interaction • O: a set of context objects • Faster-RCNN regions • CNN tubes • L: a set of linguistic objects - biLSTM embeddings of the question q → Reasoning is formulated as the interaction between the two sets O and L for the answer a 14/08/2021 10
  • 62. Set operations • Reducing operation (eg: sum/average/max) • Attention-based combination (Bahdanau et al. 2015) • Attention weights as query-key dot product (Vaswani et al., 2017)  Attention-based set ops seem very suitable for visual reasoning 14/08/2021 11
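A small numpy sketch contrasting the set operations listed above: a plain reducing operation versus query-driven attention pooling over the visual object set (illustrative shapes only):

    import numpy as np

    rng = np.random.default_rng(7)
    objects = rng.normal(size=(6, 12))   # O: region features from a detector
    question = rng.normal(size=12)       # pooled question embedding

    mean_pool = objects.mean(axis=0)     # reducing op: ignores the question entirely

    scores = objects @ question / np.sqrt(12)   # query-key dot-product relevance
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                        # attention weights over the set
    attended = alpha @ objects                  # question-guided summary of the objects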
  • 63. Attention-based reasoning • Unidirectional attention • Find relation score between parts in the context C to the question q: Options for f: • Hermann et al. (2015) • Chen et al. (2016) • Normalized by softmax into attention weights • Attended context vector:  We can now extract information from the context that is “relevant” to the query 14/08/2021 12
  • 64. Bottom-up-top-down attention (Anderson et al 2017) • Bottom-up set construction: Choosing Faster-RCNN regions with high class scores • Top-down attention: Attending on visual features by question  Q: How about attention from vision objects to linguistic objects? 14/08/2021 13
  • 65. Bi-directional attention • Question-context similarity measure • Question-guided context attention • Softmax across columns • Context-guided question attention • Softmax across rows → Q: Probably not working for image QA, where single words do not have co-references with regions? 14/08/2021 Dynamic coattention networks for question answering (Seo et al., ICLR 2017) 14
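A sketch of the bi-directional (co-)attention above: build a word-region affinity matrix, then softmax along each axis to get question-guided context attention and context-guided question attention (toy shapes, untrained):

    import numpy as np

    def softmax(x, axis):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    rng = np.random.default_rng(8)
    words = rng.normal(size=(9, 12))     # question word embeddings
    regions = rng.normal(size=(5, 12))   # context embeddings (visual regions or passage tokens)

    S = words @ regions.T                        # (9, 5) similarity / affinity matrix
    attn_ctx_per_word = softmax(S, axis=1)       # question-guided context attention (rows)
    attn_word_per_ctx = softmax(S, axis=0)       # context-guided question attention (columns)

    word_aware_regions = attn_ctx_per_word @ regions   # (9, 12)
    region_aware_words = attn_word_per_ctx.T @ words   # (5, 12)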
  • 66. Hierarchical co-attention for ImageQA • The co-attention is found on a word-phrase-sentence hierarchy  better cross-domain co-references  Q: Can this be done on text qa as well?  Q: How about questions with many reasoning hops? 14/08/2021 15
  • 67. Multi-step compositional reasoning • Complex questions need multiple hops of reasoning • Relations inside the context are multi-step themselves • A single shot of attention won’t be enough • A single shot of information gathering is definitely not enough 16 → Q: How to do multi-hop attentional reasoning? 14/08/2021 Figure: Hudson and Manning – ICLR 2018
  • 68. Multi-step reasoning - Memory, Attention, and Composition (MAC Nets) • Attention reasoning is done through multiple sequential steps. • Each step is done with a recurrent neural cell • What is the key differences to the normal RNN (LSTM/GRU) cell? • Not a sequential input, it is sequential processing on static input set. • Guided by the question through a controller. 14/08/2021 MAC network, Hudson and Manning – ICLR 2018 17
  • 69. Multi-step attentional reasoning • At each step, the controller decide what to look next • After each step, a piece of information is gathered, represented through the attention map on question words and visual objects • A common memory kept all the information extracted toward an answer 14/08/2021 MAC network, Hudson and Manning – ICLR 2018 18
  • 70. Multi-step attentional reasoning • Step 1: attend to the “tiny blue block”, updating m1 • Step 2: look for “the sphere in front”, updating m2 • Step 3: traverse from the cyan ball to the final objective – the purple cylinder. 19 14/08/2021
  • 71. Reasoning as set-set interaction – a look back • O: a set of context objects • L: a set of linguistic objects (from the question q) • Reasoning is formulated as the interaction between the two sets O and L for the answer a Q: What is the brown animal sitting inside of? → Q: Set-set interaction falls short for questions about relations between objects 14/08/2021 20
  • 72. Agenda • Cross-modality reasoning, the case of vision-language integration. • Reasoning as set-set interaction. • Relational reasoning • Temporal reasoning • Video question answering. 21 14/08/2021
  • 73. Reasoning on Graphs • Relational questions: requiring explicit reasoning about the relations between multiple objects 14/08/2021 Figure credit: Santoro et al 2017 22
  • 74. • Relation networks: RN(O) = f_φ( Σ_{i,j} g_θ(o_i, o_j, q) ) • g_θ and f_φ are neural functions • g_θ generates the “relation” between the two objects • Σ is the aggregation function Relation networks (Santoro et al 2017) → The relations here are implicit, complete, pair-wise – inefficient, and lack expressiveness 14/08/2021 23
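A direct (inefficient, complete pair-wise) sketch of the relation-network computation: a shared function g over every object pair conditioned on the question, summed, then passed through f. Untrained toy weights stand in for the neural functions:

    import numpy as np

    def relation_network(objects, question, g, f):
        pair_sum = 0.0
        for i in range(len(objects)):
            for j in range(len(objects)):                 # all pair-wise relations
                pair_sum = pair_sum + g(np.concatenate([objects[i], objects[j], question]))
        return f(pair_sum)

    rng = np.random.default_rng(9)
    d = 8
    Wg = rng.normal(size=(3 * d, d))
    Wf = rng.normal(size=(d, 4))
    g = lambda x: np.maximum(x @ Wg, 0.0)   # "relation" between two objects, given the question
    f = lambda h: h @ Wf                    # aggregated relations -> answer logits
    logits = relation_network(rng.normal(size=(5, d)), rng.normal(size=d), g, f)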
  • 75. Reasoning with Graph convolution networks • Input graph is built from image entities and question • GCN is used to gather facts and produce answer → The relations are now explicit and pruned → But the graph building is very stiff: - Unrecoverable if it makes a mistake? - Information during reasoning is not used to build graphs 14/08/2021 Narasimhan et al. NIPS2018 24
  • 76. Reasoning with Graph attention networks • The graph is determined during reasoning process with attention mechanism The relations are now adaptive and integrated with reasoning  Are the relations singular and static? 14/08/2021 ReGAT model, Li et.al. ICCV19 25
  • 77. Dynamic reasoning graphs • On complex questions, multiple sets of relations are needed • We need not only multi- step but also multi-form structures • Let’s do multiple dynamically–built graphs! 14/08/2021 LCGN, Hu et.al. ICCV19 26
  • 78. Dynamic reasoning graphs The questions so far act as an unstructured command in the process Aren’t their structures and relations important too? 14/08/2021 LCGN, Hu et.al. ICCV19 27
  • 79. Reasoning on cross-modality graphs • Two types of nodes: Linguistic entities and visual objects • Two types of edges: • Visual • Linguistic-visual binding (as a fuzzy grounding) • Adaptively updated during reasoning 14/08/2021 LOGNet, T.M Le et.al. IJCAI2020 28
  • 80. Language-binding Object Graph (LOG) Unit • Graph constructor: build the dynamic vision graph • Language binding constructor: find the dynamic L-V relations 14/08/2021 LOGNet, T.M Le et.al. IJCAI2020 29
  • 81. LOGNet: multi-step visual-linguistic binding • Object-centric representation  • Multi-step/multi-structure compositional reasoning  • Linguistic-vision detail interaction  14/08/2021 LOGNet, T.M Le et.al. IJCAI2020 30
  • 82. Dynamic language-vision graphs in actions 14/08/2021 LOGNet, T.M Le et.al. IJCAI2020 31
  • 83. We got sets and graphs, how about sequences? • Videos pose another challenge for visual reasoning: the dynamics through time. • Sets and graphs now become sequences of such. • Temporal relations are the key factors • The size of context is a core issue 14/08/2021 32
  • 84. Agenda • Cross-modality reasoning, the case of vision-language integration. • Reasoning as set-set interaction. • Relational reasoning • Temporal reasoning • Video question answering. 33 14/08/2021
  • 85. Overview • Goals of this part of the tutorial • Understanding Video QA as a complete testbed of visual reasoning. • Representative state-of-the-art approaches for spatio-temporal reasoning. 34 14/08/2021
  • 86. Video Question Answering Short-form Video Question Answering Movie Question Answering 35 14/08/2021
  • 87. 36 [Figure: the Visual QA landscape at the intersection of Computer Vision, Natural Language Processing and Machine Learning: qualitative spatial reasoning, relational/temporal inference, commonsense, object recognition, scene graphs, parsing, symbol binding, systematic generalization, learning to classify entailment, unsupervised learning, reinforcement learning, program synthesis, action graphs, event detection, object discovery.] 14/08/2021 36
  • 88. Challenges 37 37 • Difficulties in data annotation. • Content for performing reasoning spreads over space- time and multiple modalities (videos, subtitles, speech etc.) 14/08/2021
  • 89. Video QA Datasets 38 38 Movie QA (Tapaswi, M., et al., 2016) MSRVTT-QA and MSVD-QA (Xu, D., et al., 2017) TGIF-QA (Jang, Y., et al., 2017) MarioQA (Mun, J., et al., 2017) CLEVRER (Yi, K., et al., 2019) KnowIT VQA (Garcia, N., et al., 2020) 14/08/2021
  • 90. Video QA datasets 39 39 (TGIF-QA, Jang et al., 2018) (CLEVRER, Yi, Kexin, et al., 2020) 14/08/2021
  • 91. Video QA as a spatio-temporal extension of Image QA 40 (a) Extended end-to-end memory network (b) Extended simple VQA model (c) Extended temporal attention model (d) Extended sequence- to-sequence model 14/08/2021 Zeng, Kuo-Hao, et al. "Leveraging video descriptions to learn video question answering." AAAI’17.
  • 92. Spatio-temporal cross-modality alignment 41 Key ideas: • Explore the correlation between vision and language via attention mechanisms. • Joint representations are query-driven spatio-temporal features of a given videos. 14/08/2021 Zhao, Zhou, et al. "Video question answering via hierarchical dual-level attention network learning." ACL’17.
  • 93. Memory-based Video QA 42 General Dynamic Memory Network (DMN) Co-memory attention networks for Video QA Key ideas: • DMN refines attention over a set of facts to extract reasoning clues. • Motion and appearance features are complementary clues for question answering. 14/08/2021 Gao, Jiyang, et al. "Motion-appearance co-memory networks for video question answering." CVPR’18.
  • 94. Memory-based Video QA 43 Heterogeneous video memory for Video QA Key differences: • Learning a joint representation of multimodal inputs at each memory read/write step. • Utilizing external question memory to model context-dependent question words. 14/08/2021 Fan, Chenyou, et al. "Heterogeneous memory enhanced multimodal attention model for video question answering." CVPR’19.
  • 95. Multimodal reasoning units for Video QA 44 • CRN: Conditional Relation Networks. • Inputs: • Frame-based appearance features • Motion features • Query features • Outputs: • Joint representations encoding temporal relations, motion, query. . 14/08/2021 Le, Thao Minh, et al. "Hierarchical conditional relation networks for video question answering.“ CVPR’20
  • 96. Object-oriented spatio-temporal reasoning for Video QA 45 • OSTR: Object-oriented Spatio-Temporal Reasoning. • Inputs: • Object lives tracked through time. • Context (motion). • Query features. • Outputs: • Joint representations encoding temporal relations, motion, query. . 14/08/2021 Dang, Long Hoang, et al. "Hierarchical Object-oriented Spatio-Temporal Reasoning for Video Question Answering." IJCAI’21
  • 97. Video QA as a down-stream task of video language pre-training 46 VideoBERT Apr., 2019 HowTo100M Jun., 2019 MIL-NCE Dec., 2019 UniViLM Feb., 2020 HERO May, 2020 ClipBERT Feb., 2021 14/08/2021
  • 98. VideoBERT: a joint model for video and language representation learning 47 • Data for training: Sample videos and texts from YouCook II. Instructions in text given by ASR toolkit Subsampled video segments Sun, Chen, et al. "Videobert: A joint model for video and language representation learning.“ ICCV’19. 14/08/2021
  • 99. VideoBERT: a joint model for video and language representation learning 48 Sun, Chen, et al. "Videobert: A joint model for video and language representation learning.“ ICCV’19. • Linguistic representations: • Tokenized texts into WordPieces, similar as BERT. • Visual representations: • S3D features for each segmented video clips. • Tokenized into clusters using hierarchical k-means. Pre-training 14/08/2021
  • 100. VideoBERT: a joint model for video and language representation learning 49 Pre-training Down-stream tasks Sun, Chen, et al. "Videobert: A joint model for video and language representation learning.“ ICCV’19. Video captioning Video question answering Zero-shot action classification 14/08/2021
  • 101. CLIPBERT: video language pre-training with sparse sampling 50 Lei, Jie, et al. "Less is more: Clipbert for video-and-language learning via sparse sampling." CVPR’21. ClipBERT Prev. methods ClipBERT overview Procedure: • Pretraining on large-scale image-text datasets. • Finetuning on video-text tasks. 14/08/2021
  • 102. From short-form Video QA to Movie QA 51 Lei, Jie, et al. "Tvqa: Localized, compositional video question answering." EMNLP’18. Long-term temporal relationships Multimodal inputs 14/08/2021
  • 103. Conventional methods for Movie QA 52 Question-driven multi-stream models: • Short-term temporal relationships are less important. • Long-term temporal relationships and multimodal interactions are key. • Language is dominant over visual counterpart. Le, Thao Minh, et al. "Hierarchical conditional relation networks for video question answering.“ IJCV’21. Lei, Jie, et al. "Tvqa: Localized, compositional video question answering." EMNLP’18. 14/08/2021
  • 104. HERO: large-scale pre-training for Movie QA 53 Li, Linjie, et al. "Hero: Hierarchical encoder for video+ language omni-representation pre-training." EMNLP’20. • Pre-trained on 7.6M videos and associated subtitles. • Achieved state-of- the-art results on all datasets. 14/08/2021
  • 105. End of part B 14/08/2021 54 https://bit.ly/37DYQn7
  • 106. From Deep Learning to Deep Reasoning 14/08/2021 1 Tutorial at KDD, August 14th 2021 Truyen Tran, Vuong Le, Hung Le and Thao Le {truyen.tran,vuong.le,thai.le,thao.le}@deakin.edu.au https://bit.ly/37DYQn7 Part C: Memory | Data efficiency | Recursive reasoning
  • 107. Agenda • Reasoning with external memories • Memory of entities – memory-augmented neural networks • Memory of relations with tensors and graphs • Memory of programs & neural program construction. • Learning to reason with less labels • Data augmentation with analogical and counterfactual examples • Question generation • Self-supervised learning for question answering • Learning with external knowledge graphs • Recursive reasoning with neural theory of mind. 2
  • 108. Agenda • Reasoning with external memories • Memory of entities – memory-augmented neural networks • Memory of relations with tensors and graphs • Memory of programs & neural program construction. • Learning to reason with less labels: • Data augmentation with analogical and counterfactual examples • Question generation • Self-supervised learning for question answering • Learning with external knowledge graphs • Recursive reasoning with neural theory of mind. 3
  • 110. Memory is part of intelligence • Memory is the ability to store, retain and recall information • Brain memory stores items, events and high- level structures • Computer memory stores data and temporary variables 5
  • 111. Memory-reasoning analogy 6 • 2 processes: fast-slow o Memory: familiarity- recollection • Cognitive test: o Corresponding reasoning and memorization performance o Increasing # premises, inductive/deductive reasoning is affected Heit, Evan, and Brett K. Hayes. "Predicting reasoning from memory." Journal of Experimental Psychology: General 140, no. 1 (2011): 76.
  • 112. Common memory activities • Encode: write information to the memory, often requiring compression capability • Retain: keep the information overtime. This is often assumed in machinery memory • Retrieve: read information from the memory to solve the task at hand Encode Retain Retrieve 7
  • 113. Memory taxonomy based on memory content 8 Item Memory • Objects, events, items, variables, entities Relational Memory • Relationships, structures, graphs Program Memory • Programs, functions, procedures, how-to knowledge
  • 114. Item memory Associative memory RAM-like memory Independent memory 9
  • 115. Distributed item memory as associative memory 10 "Green" means "go," but what does "red" mean? Language birthday party on 30th Jan Time Object Where is my pen? What is the password? Behaviour 10 Semantic memory Episodic memory Working memory Motor memory
  • 116. Associative memory can be implemented as a Hopfield network. Correlation matrix memory (encode, retrieve; feed-forward retrieval) vs. Hopfield network (encode, retrieve; recurrent retrieval) 11 “Fast-weight” matrix M
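A sketch of a correlation-matrix associative memory with feed-forward retrieval: key-value pairs are stored as a sum of outer products (the "fast weights" M) and recalled with a single matrix multiply; a Hopfield network would instead iterate the retrieval. Toy bipolar patterns:

    import numpy as np

    rng = np.random.default_rng(10)
    d = 64
    keys = rng.choice([-1.0, 1.0], size=(3, d))     # cue patterns
    values = rng.choice([-1.0, 1.0], size=(3, d))   # associated contents

    M = sum(np.outer(k, v) for k, v in zip(keys, values))   # encode: sum of outer products

    noisy_cue = keys[0] * rng.choice([1.0, 1.0, 1.0, -1.0], size=d)  # corrupt ~25% of the cue
    recalled = np.sign(noisy_cue @ M)                                # retrieve: one feed-forward pass
    print((recalled == values[0]).mean())                            # fraction of bits recovered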
  • 117. Rule-based reasoning with associative memory • Encode a set of rules: “pre-conditions → post-conditions” • Supports variable binding, rule-conflict handling and partial rule input • Example of encoding the rule “A:1, B:3, C:4 → X” 12 Outer product for binding Austin, Jim. "Distributed associative memories for high-speed symbolic reasoning." Fuzzy Sets and Systems 82, no. 2 (1996): 223-233.
  • 118. Memory-augmented neural networks: computation-storage separation 13 RNN Symposium 2016: Alex Graves - Differentiable Neural Computer RAM
  • 119. Neural Turing Machine (NTM) • Memory is a 2d matrix • Controller is a neural network • The controller reads/writes to memory at certain addresses. • Trained end-to-end, differentiable • Simulates a Turing Machine → supports symbolic reasoning, algorithm solving 14 Graves, Alex, Greg Wayne, and Ivo Danihelka. "Neural turing machines." arXiv preprint arXiv:1410.5401 (2014).
  • 120. Addressing mechanism in NTM • Input: erase and add vectors e_t, a_t • Memory writing • Memory reading
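A sketch of the NTM memory write that uses the erase/add vectors mentioned above: with write weights w over slots, each slot is partially erased by e_t and then additively updated by a_t (this follows the standard NTM write rule; shapes are toy):

    import numpy as np

    def ntm_write(memory, w, erase, add):
        """memory: (slots, width); w: (slots,) write weights; erase, add: (width,), erase in [0, 1]."""
        memory = memory * (1.0 - np.outer(w, erase))  # element-wise erase where we write
        return memory + np.outer(w, add)              # then add the new content

    rng = np.random.default_rng(11)
    M = rng.normal(size=(6, 10))
    w = np.eye(6)[2]                                  # a sharp (one-hot) write address, for illustration
    M = ntm_write(M, w, erase=np.ones(10), add=rng.normal(size=10))  # slot 2 fully overwritten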
  • 122. Optimal memory writing for memorization • Simple finding: writing too often deteriorates memory content (not retainable) • Given input sequence of length T and only D writes, when should we write to the memory? 17 Le, Hung, Truyen Tran, and Svetha Venkatesh. "Learning to Remember More with Less Memorization." In International Conference on Learning Representations. 2018. Uniform writing is optimal for memorization
  • 123. Better memorization means better algorithmic reasoning 18 T=50, D=5 Regular Uniform (cached)
  • 124. Memory of independent entities • Each slot stores one or a few entities • Memory writing is done separately for each memory slot → each slot maintains the life of one or more entities • The memory is a set of N parallel RNNs 19 John Apple __ John Apple Office Apple John __ John Apple Kitchen Apple John Office Apple John Kitchen Weston, Jason, Bordes, Antoine, Chopra, Sumit, and Mikolov, Tomas. Towards ai-complete question answering: A set of prerequisite toy tasks. CoRR, abs/1502.05698, 2015. RNN 1 RNN 2 … Time
  • 125. Recurrent entity network 20 Garden Henaff, Mikael, Jason Weston, Arthur Szlam, Antoine Bordes, and Yann LeCun. "Tracking the world state with recurrent entity networks." In 5th International Conference on Learning Representations, ICLR 2017. 2017.
  • 126. Recurrent Independent Mechanisms 21 Goyal, Anirudh, Alex Lamb, Jordan Hoffmann, Shagun Sodhani, Sergey Levine, Yoshua Bengio, and Bernhard Schölkopf. "Recurrent independent mechanisms.“ ICLR21.
  • 129. Why relational memory? Item memory is weak at recognizing relationships Item Memory • Store and retrieve individual items • Relate pair of items of the same time step • Fail to relate temporally distant items 24
  • 130. Dual process in memory 25 • Store items • Simple, low-order • System 1 Relational Memory • Store relationships between items • Complicated, high-order • System 2 Item Memory Howard Eichenbaum, Memory, amnesia, and the hippocampal system (MIT press, 1993). Alex Konkel and Neal J Cohen, "Relational memory and the hippocampus: representations and methods", Frontiers in neuroscience 3 (2009).
  • 131. Memory as graph • Memory is a static graph with fixed nodes and edges • Relationship is somehow known • Each memory node stores the state of the graph’s node • Write to node via message passing • Read from node via MLP 26 Palm, Rasmus Berg, Ulrich Paquet, and Ole Winther. "Recurrent Relational Networks." In NeurIPS. 2018.
  • 132. bAbI 27 Fact 1 Fact 2 Fact 3 Question Node Edge Answer CLEVER Node (colour, shape. position) Edge (distance)
  • 133. Memory of graphs access conditioned on query • Encode multiple graphs, each graph is stored in a set of memory row • For each graph, the controller read/write to the memory: • Read uses content-based attention • Write use message passing • Aggregate read vectors from all graphs to create output 28 Pham, Trang, Truyen Tran, and Svetha Venkatesh. "Relational dynamic memory networks." arXiv preprint arXiv:1808.04247 (2018).
  • 134. Capturing relationships can be done via memory slot interactions using attention • Graph memory needs customization to an explicit design of nodes and edges • Can we automatically learn structure with a 2d tensor memory? • Capture relationships: each slot interacts with all other slots (self-attention) 29 Santoro, Adam, Ryan Faulkner, David Raposo, Jack Rae, Mike Chrzanowski, Théophane Weber, Daan Wierstra, Oriol Vinyals, Razvan Pascanu, and Timothy Lillicrap. "Relational recurrent neural networks." In Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 7310-7321. 2018.
  • 135. Relational Memory Core (RMC) operation 30 RNN-like Interface
  • 136. 31 Allowing pair-wise interactions can answer questions on temporal relationship
  • 137. Dot product attention works for simple relationship, but … 32 What is most similar to me? 0.7 0.9 - 0.1 0.4 What is most similar to me but different from tiger? For hard relationship, scalar representation is limited
  • 138. Complicated relationships need high-order relational memory 33 Extract items Item memory Associate every pair of them … 3d relational tensor Relational memory Le, Hung, Truyen Tran, and Svetha Venkatesh. "Self-attentive associative memory." In International Conference on Machine Learning, pp. 5682-5691. PMLR, 2020.
  • 140. Predefining program for subtask • A program designed for a task becomes a module • Parse a question to module layout (order of program execution) • Learn the weight of each module to master the task 35 Andreas, Jacob, Marcus Rohrbach, Trevor Darrell, and Dan Klein. "Neural module networks." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 39-48. 2016.
  • 141. Program selection is based on parser, others are end2end trained 36 5 module templates 1 2 3 4 5 Parsing
  • 142. The most powerful memory is one that stores both program and data • Computer architecture: Universal Turing Machines/Harvard/VNM • Stored-program principle • Break a big task into subtasks, each can be handled by a TM/single purposed program stored in a program memory 37 https://en.wikipedia.org/
  • 143. NUTM: Learn to select program (neural weight) via program attention • Neural stored-program memory (NSM) stores key (the address) and values (the weight) • The weight is selected and loaded to the controller of NTM • The stored NTM weights and the weight of the NUTM is learnt end-to-end by backpropagation 38 Le, Hung, Truyen Tran, and Svetha Venkatesh. "Neural Stored-program Memory." In International Conference on Learning Representations. 2019.
  • 144. Scaling with memory of mini-programs • Prior, 1 program = 1 neural network (millions of parameters) • Parameter inefficiency since the programs do not share common parameters • Solution: store sharable mini-programs to compose infinite number of programs 39 it is analogous to building Lego structures corresponding to inputs from basic Lego bricks.
  • 145. Recurrent program attention to retrieve singular components of a program 40 Le, Hung, and Svetha Venkatesh. "Neurocoder: Learning General-Purpose Computation Using Stored Neural Programs." arXiv preprint arXiv:2009.11443 (2020).
  • 146. 41 Program attention is equivalent to binary decision tree reasoning Recurrent program attention auto detects task boundary
  • 147. Agenda • Reasoning with external memories • Memory of entities – memory-augmented neural networks • Memory of relations with tensors and graphs • Memory of programs & neural program construction. • Learning to reason with less labels: • Data augmentation with analogical and counterfactual examples • Question generation • Self-supervised learning for question answering • Learning with external knowledge graphs • Recursive reasoning with neural theory of mind. 42
  • 148. Data Augmentation with Analogical and Counterfactual Examples 43 • Poor generalization when training under independent and identically distributed assumption. • Intuition: augmenting counterfactual samples to allow machines to understand the critical changes in the input that lead to changes in the answer space. • Perceptually similar, yet • Semantically dissimilar realistic samples Visual counterfactual example Language counterfactual examples Gokhale, Tejas, et al. "Mutant: A training paradigm for out-of-distribution generalization in visual question answering." EMNLP’20.
  • 149. Question Generations 44 Li, Yikang, et al. "Visual question generation as dual task of visual question answering." CVPR’18. Krishna, Ranjay, Michael Bernstein, and Li Fei-Fei. "Information maximizing visual question generation." CVPR’19. • Question answering is a zero-shot learning problem. Question generation helps cover a wider range of concepts. • Question generation can be done with either supervised and unsupervised learning.
  • 150. BERT: Transformer That Predicts Its Own Masked Parts 46 BERT is like parallel approximate pseudo-likelihood • ~ Maximizing the conditional likelihood of some variables given the rest. • When the number of variables is large, this converges to MLE (maximum likelihood estimate). [Slide credit: Truyen Tran] https://towardsdatascience.com/bert-explained-state-of-the-art-language-model-for-nlp-f8b21a9b6270
  • 151. Visual QA as a Down-stream Task of Visual- Language BERT Pre-trained Models 47 Numerous pre-trained visual language models during 2019-2021. VisualBERT (Li, Liunian Harold, et al., 2019) VL-BERT (Su, Weijie, et al., 2019) UNITER (Chen, Yen-Chun, et al., 2019) 12-in-1 (Lu, Jiasen, et al., 2020) Pixel-BERT (Huang, Zhicheng, et al., 2019) OSCAR (Li, Xiujun, et al., 2020) Single-stream model Two-stream model ViLBERT (Lu, Jiasen, et al. , 2019) LXMERT (Tan, Hao, and Mohit Bansal, 2019) [Slide credit: Licheng Yu et al.]
  • 152. Learning with External Knowledge 48 Why external knowledge for reasoning? • Questions can be beyond visual recognition (e.g. firetrucks usually use a fire hydrant). • Human’s prior knowledge for cognition-level reasoning (e.g. human’s goals, intents etc.) Q: What sort of vehicle uses this item? A: firetruck Q: What is the sports position of the man in the orange shirt? A: goalie/goalkeeper Marino, Kenneth, et al. "Ok-vqa: A visual question answering benchmark requiring external knowledge." CVPR’19. Zellers, Rowan, et al. "From recognition to cognition: Visual commonsense reasoning." CVPR’19.
  • 153. Learning with External Knowledge 49 Retrieved by Wikipedia search API Marino, Kenneth, et al. "Ok-vqa: A visual question answering benchmark requiring external knowledge." CVPR’19. Shah, Sanket, et al. "Kvqa: Knowledge-aware visual question answering." AAAI’19.
  • 154. Agenda • Reasoning with external memories • Memory of entities – memory-augmented neural networks • Memory of relations with tensors and graphs • Memory of programs & neural program construction. • Learning to reason with less labels: • Data augmentation with analogical and counterfactual examples • Question generation • Self-supervised learning for question answering • Learning with external knowledge graphs • Recursive reasoning with neural theory of mind. 50
  • 155. Source: religious studies project Core AI faculty: Theory of mind
  • 156. Where would ToM fit in? System 1: Intuitive System 1: Intuitive System 1: Intuitive • Fast • Implicit/automatic • Pattern recognition • Multiple System 2: Analytical • Slow • Deliberate/rational • Careful analysis • Single, sequential Single Image credit: VectorStock | Wikimedia Perception Theory of mind Recursive reasoning Facts Semantics Events and relations Working space Memory
  • 157. Contextualized recursive reasoning • Thus far, QA tasks are straightforward and objective: • Questioner: I will ask about what I don’t know. • Answerer: I will answer what I know. • Real life can be tricky, more subjective: • Questioner: I will ask only questions I think they can answer. • Answerer 1: This is what I think they want from an answer. • Answerer 2: I will answer only what I think they think I can. 14/08/2021 53  We need Theory of Mind to function socially.
  • 158. Social dilemma: Stag Hunt games • Difficult decision: individual outcomes (selfish) or group outcomes (cooperative). • Together hunt Stag (both are cooperative): Both have more meat. • Solely hunt Hare (both are selfish): Both have less meat. • One hunts Stag (cooperative), the other hunts Hare (selfish): Only the one hunting Hare gets meat. • Human evidence: Self-interested but considerate of others (cultures vary). • Idea: Belief-based guilt-aversion • One experiences a loss if it lets the other down. • Necessitates Theory of Mind: reasoning about the other’s mind.
  • 159. Theory of Mind Agent with Guilt Aversion (ToMAGA) Update Theory of Mind • Predict whether the other’s behaviour is cooperative or uncooperative • Update the zero-order belief (what the other will do) • Update the first-order belief (what the other thinks about me) Guilt Aversion • Compute the expected material reward of the other based on Theory of Mind • Compute the psychological rewards, i.e. “feeling guilty” • Reward shaping: subtract the expected loss of the other. Nguyen, Dung, et al. "Theory of Mind with Guilt Aversion Facilitates Cooperative Reinforcement Learning." Asian Conference on Machine Learning. PMLR, 2020. [Slide credit: Dung Nguyen]
  • 160. Machine Theory of Mind Architecture (inside the Observer) Successor representations next-step action probability goal Rabinowitz, Neil, et al. "Machine theory of mind." International conference on machine learning. PMLR, 2018. [Slide credit: Dung Nguyen]
  • 161. A ToM architecture • Observer maintains memory of previous episodes of the agent. • It theorizes the “traits” of the agent. • Implemented as Hyper Networks. • Given the current episode, the observer tries to infer goal, intention, action, etc of the agent. • Implemented as memory retrieval through attention mechanisms. 14/08/2021 57
  • 163. Wrapping up • Reasoning as the next challenge for deep neural networks • Part A: Learning-to-reason framework • Reasoning as a prediction skill that can be learnt from data • Dynamic neural networks are capable • Combinatorics reasoning • Part B: Reasoning over unstructured and structured data • Reasoning over unstructured sets • Relational reasoning over structured data • Part C: Memory | Data efficiency | Recursive reasoning • Memories of items, relations and programs • Learning with less labels • Theory of mind 14/08/2021 59
  • 164. A possible framework for learning and reasoning with deep neural networks System 1: Intuitive System 1: Intuitive System 1: Intuitive • Fast • Implicit/automatic • Pattern recognition • Multiple System 2: Analytical • Slow • Deliberate/rational • Careful analysis • Single, sequential Single Image credit: VectorStock | Wikimedia Perception Theory of mind Recursive reasoning Facts Semantics Events and relations Working space Memory