Multi-modal embeddings: from discriminative to generative models and creative ai

@graphific
Roelof Pieters
Multi-Modal Embeddings: from Discriminative to Generative Models and Creative AI
2 May 2016
KTH
www.csc.kth.se/~roelof/
roelof@kth.se

Common Generative Architectures
• AE/VAE
• DBN
• Latent Vectors / Manifold Walking
• RNN / LSTM / GRU
• Modded RNNs (e.g. Biaxial RNN)
• CNN + LSTM/GRU
• X + Mixture Density Network (MDN)
• DRAW
• GRAN
• DCGAN
• DeepDream & other CNN visualisations
• Splitting/Remixing NNs:
  • Image Analogy
  • Style Transfer
  • Semantic Style Transfer
  • Texture Synthesis
• Compositional Pattern-Producing Networks (CPPN)
  • NEAT
  • CPPN w/ GAN+VAE

Wanna Play?
Alex Graves (2014) Generating Sequences With Recurrent Neural Networks

Text Prediction (1)

Text Prediction (2)

Handwriting Prediction
(we’re skipping the mixture density network details for now)

Wanna Play?
Text generation
Karpathy (2015), The Unreasonable Effectiveness of Recurrent Neural Networks (blog)

Andrej Karpathy, Justin Johnson, Li Fei-Fei (2015) Visualizing and
Understanding Recurrent Networks

http://www.creativeai.net/posts/aeh3orR8g6k65Cy9M/generating-magic-cards-using-deep-recurrent-convolutional

More…
more at:
http://gitxiv.com/category/natural-language-processing-nlp
http://www.creativeai.net/?cat%5B0%5D=read-write

Turn Convnet Around: “Deep Dream”
Image -> NN -> What do you (think) you see? -> What’s the (text) label?
Image -> NN -> What do you (think) you see? -> feed back activations -> optimize image to “fit” the ConvNet’s “hallucination” (iteratively)
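
A minimal sketch of the feedback loop above, assuming a toy one-layer ReLU "network" with random weights in place of a trained ConvNet. The names `dream`, `objective` and `W`, and all sizes, are illustrative and not from the talk. It runs gradient ascent on the input so the layer's activations grow, scaling each step by the mean absolute gradient (a normalisation commonly used in Deep Dream implementations):

```python
import numpy as np

def objective(image, weights):
    """Half the squared norm of the toy layer's ReLU activations."""
    return 0.5 * np.sum(np.maximum(weights @ image, 0.0) ** 2)

def dream(image, weights, steps=50, step_size=0.1):
    """Iteratively change the input so the layer's activations get stronger."""
    img = np.asarray(image, dtype=float).copy()
    for _ in range(steps):
        act = np.maximum(weights @ img, 0.0)   # forward pass: what the layer "sees"
        # Gradient of `objective` w.r.t. the input is W^T act
        # (act is already zero wherever the ReLU is switched off).
        grad = weights.T @ act
        grad /= np.abs(grad).mean() + 1e-8     # normalise the step size
        img += step_size * grad                # gradient ASCENT on the image
    return img

# Toy usage: start from noise and amplify whatever the layer responds to.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))    # stand-in for trained weights (assumption)
noise = rng.normal(size=16)     # stand-in for a flattened input image
dreamed = dream(noise, W)
```

A real Deep Dream run would replace the toy layer with a chosen layer of a trained ConvNet and obtain the gradient by backpropagation; the loop itself stays the same.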

Google, Inceptionism: Going Deeper into Neural Networks
see also: www.csc.kth.se/~roelof/deepdream/

code
youtube
Roelof Pieters 2015

https://www.flickr.com/photos/graphific/albums/72157657250972188
Single Units
Roelof Pieters 2015

Multifaceted Feature Visualization
Anh Nguyen, Jason Yosinski, Jeff Clune (2016)
Multifaceted Feature Visualization: Uncovering the
Different Types of Features Learned By Each Neuron in
Deep Neural Networks

Preferred stimuli generation
Anh Nguyen, Alexey Dosovitskiy, Jason Yosinski, Thomas Brox, and Jeff Clune (2016) AI Neuroscience: Understanding Deep
Neural Networks by Synthetically Generating the Preferred Stimuli for Each of Their Neurons

Inter-modal: Style Transfer (“Style Net” 2015)
Leon A. Gatys, Alexander S. Ecker, Matthias Bethge, 2015.
A Neural Algorithm of Artistic Style (GitXiv)
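
In the Gatys et al. formulation, style is captured by the Gram matrix of a layer's feature maps, which keeps channel correlations and discards spatial layout. A minimal NumPy sketch of the per-layer style loss, assuming the activations have already been extracted from some ConvNet (`gram_matrix` and `style_loss` are illustrative names, and the random arrays below only stand in for real feature maps):

```python
import numpy as np

def gram_matrix(features):
    """Channel-by-channel correlations of one layer's activations.

    features: array of shape (channels, height, width).
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.T

def style_loss(generated, style):
    """Squared Gram-matrix difference, scaled by 1 / (4 * N^2 * M^2)."""
    c, h, w = generated.shape
    diff = gram_matrix(generated) - gram_matrix(style)
    return np.sum(diff ** 2) / (4.0 * c ** 2 * (h * w) ** 2)

# Toy usage with random stand-ins for feature maps.
rng = np.random.default_rng(0)
style_feats = rng.normal(size=(3, 4, 5))
generated_feats = rng.normal(size=(3, 4, 5))
loss = style_loss(generated_feats, style_feats)
```

The full method sums this loss over several layers and adds a content loss, then optimises the image pixels by gradient descent.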

Inter-modal: Image Analogies (2001)
A. Hertzmann, C. Jacobs, N. Oliver, B. Curless, D. Salesin.
(2001) Image Analogies, SIGGRAPH 2001 Conference Proceedings.
A. Hertzmann (2001) Algorithms for Rendering in Artistic Styles.
Ph.D. thesis, New York University, May 2001.

style layers: 5_1 + 5_2 + 5_3 + 5_4

Gene Kogan, 2015. Why is a Raven Like a Writing Desk? (vimeo)

Inter-modal: Style Transfer+MRF (2016)
Combining Markov Random Fields and Convolutional Neural Networks for Image Synthesis, 2016, Chuan Li, Michael Wand

Inter-modal: Pretrained Style Transfer (2016)
Texture Networks: Feed-forward Synthesis of Textures and Stylized Images, 2016, Dmitry Ulyanov, Vadim Lebedev, Andrea
Vedaldi, Victor Lempitsky
500x speedup! (avg min loss from 10s to 20ms)

Inter-modal: Pretrained Style Transfer #2 (2016)
Precomputed Real-Time Texture Synthesis with Markovian Generative Adversarial Networks, Chuan Li, Michael Wand, 2016
similarly 500x speedup

Inter-modal: Perceptual Loss ST (2016)
Perceptual Losses for Real-Time Style Transfer and Super-Resolution, 2016, Justin Johnson, Alexandre Alahi, Li Fei-Fei

@DeepForger

Inter-modal: Semantic Style Transfer

Semantic Style Transfer (“Neural Doodle”)
https://github.com/alexjc/neural-doodle

Synthesise textures (random weights)
Random Weights (kinda like “Extreme Learning Machines”)
Experiment:
- which activation function should I use?
- pooling?
- min nr of units?
https://nucl.ai/blog/extreme-style-machines/

Synthesise textures (random weights)
totally random initialised weights:

Synthesise textures (random weights)
Activation Functions

Down-sampling

Nr of units

“Style Transfer” papers
• Image Analogies, 2001, A. Hertzmann, C. Jacobs, N. Oliver, B. Curless, D. Salesin
• A Neural Algorithm of Artistic Style, 2015, Leon A. Gatys, Alexander S. Ecker, Matthias Bethge
• Combining Markov Random Fields and Convolutional Neural Networks for Image Synthesis, 2016, Chuan Li, Michael Wand
• Semantic Style Transfer and Turning Two-Bit Doodles into Fine Artworks, 2016, Alex J. Champandard
• Texture Networks: Feed-forward Synthesis of Textures and Stylized Images, 2016, Dmitry Ulyanov, Vadim Lebedev, Andrea Vedaldi, Victor Lempitsky
• Perceptual Losses for Real-Time Style Transfer and Super-Resolution, 2016, Justin Johnson, Alexandre Alahi, Li Fei-Fei
• Precomputed Real-Time Texture Synthesis with Markovian Generative Adversarial Networks, 2016, Chuan Li, Michael Wand
• @DeepForger

Caption -> Image generation
“A stop sign is flying in blue skies.”
“A herd of elephants flying in the blue skies.”
Elman Mansimov, Emilio Parisotto, Jimmy Lei Ba, Ruslan Salakhutdinov, 2015. Generating Images from Captions with Attention (arxiv) (examples)

More…
more at:
http://gitxiv.com/category/computer-vision
http://www.creativeai.net/ (no category for images just yet)

Early LSTM music composition (2002)
Douglas Eck and Jurgen Schmidhuber (2002) Learning The Long-Term Structure of the Blues

Markov constraints
http://www.flow-machines.com/

Audio Generation: Midi
https://soundcloud.com/graphific/pyotr-lstm-tchaikovsky
A Recurrent Latent Variable Model for Sequential Data, 2016, J. Chung, K. Kastner, L. Dinh, K. Goel, A. Courville, Y. Bengio
+ “modded” VRNN

Audio Generation: Midi
https://soundcloud.com/graphific/neural-remix-net
A Recurrent Latent Variable Model for Sequential Data, 2016, J. Chung, K. Kastner, L. Dinh, K. Goel, A. Courville, Y. Bengio
+ “modded” VRNN

Audio Generation: Raw
Gated Recurrent Unit (GRU), Stanford CS224d project
Aran Nayebi, Matt Vitelli (2015) GRUV: Algorithmic Music Generation using Recurrent Neural Networks

LSTM improvements
• Recurrent Batch Normalization http://gitxiv.com/posts/MwSDm6A4wPG7TcuPZ/recurrent-batch-normalization
  • also normalizes the hidden-to-hidden transition (earlier work normalized only the input-to-hidden transformation of RNNs)
  • faster convergence and improved generalization
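
A minimal sketch of the idea, assuming a plain tanh RNN step instead of the LSTM used in the paper, batch statistics only (no running averages for test time), and illustrative names and gain values (`batch_norm`, `bn_rnn_step`, `gamma_x`, `gamma_h`). The point it shows is that the input-to-hidden and hidden-to-hidden terms are normalised separately before being summed:

```python
import numpy as np

def batch_norm(x, gamma, eps=1e-5):
    """Normalise each feature over the batch axis, then rescale by gamma."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps)

def bn_rnn_step(x_t, h_prev, W_x, W_h, b, gamma_x=0.1, gamma_h=0.1):
    """One recurrent step with separate normalisation of both transitions.

    x_t: (batch, n_in), h_prev: (batch, n_hid),
    W_x: (n_in, n_hid), W_h: (n_hid, n_hid), b: (n_hid,).
    """
    input_term = batch_norm(x_t @ W_x, gamma_x)        # input-to-hidden
    recurrent_term = batch_norm(h_prev @ W_h, gamma_h)  # hidden-to-hidden
    return np.tanh(input_term + recurrent_term + b)

# Toy usage on a random batch.
rng = np.random.default_rng(0)
x_t = rng.normal(size=(32, 3))
h_prev = rng.normal(size=(32, 5))
h_t = bn_rnn_step(x_t, h_prev, rng.normal(size=(3, 5)),
                  rng.normal(size=(5, 5)), np.zeros(5))
```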

• weight normalisation http://gitxiv.com/posts/p9B6i9Kzbkc5rP3cp/weight-normalization-a-simple-reparameterization-to
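
Weight normalisation reparameterises each weight vector as a direction `v` and a separate scale `g`, so that w = g * v / ||v||. A minimal NumPy sketch (the function name and array layout are illustrative assumptions):

```python
import numpy as np

def weight_norm(v, g):
    """Build weights whose per-unit length is set by g, direction by v.

    v: (n_out, n_in) direction parameters, g: (n_out,) scale parameters.
    """
    norms = np.linalg.norm(v, axis=1, keepdims=True)
    return g[:, None] * v / norms

# Toy usage: two output units with three inputs each.
v = np.array([[1.0, 2.0, 2.0], [0.0, 3.0, 4.0]])
g = np.array([1.5, 0.5])
w = weight_norm(v, g)   # each row of w has length g[i]
```

Because rescaling `v` leaves `w` unchanged, the optimiser adjusts length and direction independently.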

• Associative Long Short-Term Memory http://gitxiv.com/posts/jpfdiFPsu5c6LLsF4/associative-long-short-term-memory

• Bayesian RNN dropout http://gitxiv.com/posts/CsCDjy7WpfcBvZ88R/bayesianrnn
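
A minimal sketch of the core idea as it is commonly described: sample one dropout mask per sequence and reuse it at every time step, on the recurrent connections as well as the inputs. This assumes a plain tanh RNN and inverted-dropout scaling; `rnn_variational_dropout` and its parameters are illustrative, not the linked implementation:

```python
import numpy as np

def rnn_variational_dropout(xs, W_x, W_h, p_drop=0.25, rng=None):
    """Run a tanh RNN over xs with dropout masks shared across time.

    xs: (T, n_in), W_x: (n_in, n_hid), W_h: (n_hid, n_hid).
    Returns the hidden states (T, n_hid) and the two masks used.
    """
    if rng is None:
        rng = np.random.default_rng()
    n_in, n_hid = W_x.shape
    keep = 1.0 - p_drop
    # Masks are drawn ONCE per sequence, not once per time step.
    mask_x = rng.binomial(1, keep, size=n_in) / keep
    mask_h = rng.binomial(1, keep, size=n_hid) / keep
    h = np.zeros(n_hid)
    states = []
    for x_t in xs:
        h = np.tanh((x_t * mask_x) @ W_x + (h * mask_h) @ W_h)
        states.append(h)
    return np.stack(states), mask_x, mask_h
```

Keeping the masks active at prediction time and averaging several stochastic forward passes gives a rough uncertainty estimate, which is where the "Bayesian" reading comes from.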

Chor-RNN
Continuous Generation
Luka Crnkovic-Friis & Louise Crnkovic-Friis (2016) Generative Choreography
using Deep Learning

• Mixture Density LSTM
Generative
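
With a mixture density output, the LSTM's raw outputs are read as the parameters of a Gaussian mixture, and generation draws the next frame from that mixture. A minimal sampling sketch, assuming diagonal Gaussians and hypothetical names and shapes (`sample_mdn`, `logits`, `means`, `log_sigmas`); the numbers in the usage lines are made up:

```python
import numpy as np

def sample_mdn(logits, means, log_sigmas, rng):
    """Draw one sample from a diagonal Gaussian mixture.

    logits: (K,) unnormalised mixture weights,
    means: (K, D) component means, log_sigmas: (K, D) log std-devs.
    """
    pi = np.exp(logits - logits.max())
    pi /= pi.sum()                         # softmax -> mixture weights
    k = rng.choice(len(pi), p=pi)          # pick one component
    noise = rng.standard_normal(means.shape[1])
    return means[k] + np.exp(log_sigmas[k]) * noise

# Toy usage: three components over a 2-D output.
rng = np.random.default_rng(0)
logits = np.array([2.0, 0.5, -1.0])
means = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]])
log_sigmas = np.full((3, 2), -1.0)
next_frame = sample_mdn(logits, means, log_sigmas, rng)
```

For continuous generation the sampled frame is fed back as the next input.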

Deep Learning with Python
Python has a wide range of deep learning-related libraries available
Low level
deeplearning.net/software/theano
caffe.berkeleyvision.org
tensorflow.org/
High level
lasagne.readthedocs.org/en/latest
and of course:
keras.io

Code & Papers?
http://gitxiv.com/ #GitXiv

Creative AI projects?
http://www.creativeai.net/ #CreativeAI

Questions?
love letters? existential dilemmas? academic questions? gifts?
find me at:
www.csc.kth.se/~roelof/
roelof@kth.se
@graphific
Oh, and soon we’re looking for Creative AI enthusiasts!
- job
- internship
- thesis work
in AI (Deep Learning) & Creativity

https://medium.com/@ArtificialExperience/creativeai-9d4b2346faf3

Creative AI > a “brush” > rapid experimentation
human-machine collaboration

(YouTube, Paper)

(Vimeo, Paper)

Generative Adversarial Nets
Emily Denton, Soumith Chintala, Arthur Szlam, Rob Fergus, 2015.
Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks (GitXiv)

Alec Radford, Luke Metz, Soumith Chintala, 2015.
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks (GitXiv)

“turn” vector created from four averaged samples of faces looking left vs looking right.
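
The “turn” vector is plain arithmetic on latent codes: average the codes of each group and take the difference. A minimal sketch, assuming the latent codes of the left- and right-looking faces are already known (random stand-ins here; the 100-dimensional size and all names are illustrative, and the generator that decodes a code into an image is not included):

```python
import numpy as np

def attribute_vector(z_with, z_without):
    """Difference between the mean latent codes of two groups of samples."""
    return z_with.mean(axis=0) - z_without.mean(axis=0)

def apply_attribute(z, direction, strength=1.0):
    """Move a latent code along an attribute direction."""
    return z + strength * direction

# Toy usage: four codes per group, as on the slide.
rng = np.random.default_rng(0)
z_left = rng.uniform(-1, 1, size=(4, 100))    # stand-ins for "looking left"
z_right = rng.uniform(-1, 1, size=(4, 100))   # stand-ins for "looking right"
turn = attribute_vector(z_left, z_right)
z_new = apply_attribute(rng.uniform(-1, 1, size=100), turn, strength=0.5)
```

Decoding `z_new` with the trained generator would then be expected to show the same face with a changed pose.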

walking through the manifold
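
A walk through the manifold can be sketched as interpolation between two latent points, decoding every intermediate code. This assumes straight-line interpolation, and `latent_walk` is an illustrative name; the decoder is left out:

```python
import numpy as np

def latent_walk(z_start, z_end, steps=9):
    """Evenly spaced latent codes on the line from z_start to z_end."""
    ts = np.linspace(0.0, 1.0, steps)[:, None]
    return (1.0 - ts) * z_start + ts * z_end

# Toy usage: nine codes between two random latent points.
rng = np.random.default_rng(0)
path = latent_walk(rng.uniform(-1, 1, size=100), rng.uniform(-1, 1, size=100))
```

Smooth changes in the decoded images along `path` suggest the model has learned a continuous representation instead of memorising samples.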

top: unmodified samples
bottom: same samples dropping out “window” filters
