Introduction to Neural
Networks + Art
Nandita Naik
Day 2: Session 2
Agenda
1. Can ML Create Images?
2. What is Style Transfer and
how does it work?
3. What is DeepDream and how
does it work?
4. Can ML write poetry or
compose music?
a. How?
Machine Learning for Art
Suppose we have a neural network which
does facial recognition.
Image Recognizer in Action
Each mini-image corresponds to an edge.
Hidden layers, from first to last: lines, parts of the face, entire face
https://www.slideshare.net/roelofp/python-for-image-understanding-deep-learning-with-convolutional-neural-nets
Using a method like image
recognition, we can generate
images.
From white noise, a model similar to a face recognition
model generates faces.
Image Generator in Action
https://www.slideshare.net/roelofp/python-for-image-understanding-deep-learning-with-convolutional-neural-nets
Text-image synthesis
Another Example of Image Generation:
Generating Images from a Description
One network: generates an image from the description
Other network: pushes it to be as close as possible to a “real” image
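The push toward a “real”-looking image can be sketched as two competing losses, one for the network that generates and one for the network that judges. This is a minimal sketch, assuming the judging network outputs a probability that an image is real and that both losses use the common binary cross-entropy form. The function names and the scores are hypothetical, and no network is actually trained here.

```python
import math

def discriminator_loss(p_real, p_fake):
    """Low when real images are rated near 1 and generated images near 0."""
    return -(math.log(p_real) + math.log(1.0 - p_fake))

def generator_loss(p_fake):
    """Low when the judge is fooled into rating generated images near 1."""
    return -math.log(p_fake)

# Hypothetical scores. Early on, generated images are easy to spot.
early = {"p_real": 0.9, "p_fake": 0.1}
# Later, the generator has improved and the judge is less sure.
late = {"p_real": 0.6, "p_fake": 0.5}

for name, scores in (("early", early), ("late", late)):
    d = discriminator_loss(scores["p_real"], scores["p_fake"])
    g = generator_loss(scores["p_fake"])
    print(f"{name}: discriminator loss {d:.3f}, generator loss {g:.3f}")
```

As the generator improves, its loss falls while the judge's loss rises, which is the tug-of-war that keeps both networks learning.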
Questions?
● GANs
● How an image recognition network can be
repurposed into an image generating network
● Text-image synthesis
● Embeddings
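For the embeddings item, here is a minimal sketch of the idea that text and images are turned into vectors whose distance reflects how close they are in meaning. The vectors below are made up for illustration (a real system would learn them), and Euclidean distance is an assumption.

```python
import math

def distance(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical embedding of a text description of food.
text_embedding = [0.9, 0.1, 0.8]

# Hypothetical embeddings of two images.
image_embeddings = {
    "photo_of_food": [0.8, 0.2, 0.7],
    "photo_of_car": [0.1, 0.9, 0.2],
}

# The image closest in meaning is the one with the smallest distance.
closest = min(
    image_embeddings,
    key=lambda name: distance(text_embedding, image_embeddings[name]),
)
print(closest)
```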
Image Style Transfer
What is style transfer?
Content image
Style image
Merged image
Style Transfer Between Images
Original Image
Reference Image
Style-Transferred Original Image
How Does Style Transfer Work?
Input: Two images S and C: Image S provides the style
and image C provides the content.
A neural network extracts the style of S and the content
of C. (How? We’ll go into these terms later.)
Then it merges the two to create an image with the style
of S and the content of C.
What is content?
What can you see in
the picture?
- Wolf
- Mountain
- Clouds
What is style?
Think of something
common across all
hidden layers,
such as colors, texture,
brush strokes
How do content & style extraction
work?
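Image Recognition Neural Network
Diagram: input image, hidden layers 1 to n, output “rabbit”. “Content” is marked on individual hidden layers; “style” is marked on the correlations.
Correlations between the feature activations at different layers indicate which
features the network thinks are most important.
This is the style.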
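In the usual formulation of style transfer, these correlations are computed between the feature channels (activations) of a layer and collected in a Gram matrix. The sketch below assumes a layer's output is given as a list of channels, each a flat list of activations. The numbers and the channel meanings are made up.

```python
def gram_matrix(features):
    """features: list of channels, each a list of activations per position.

    Entry [i][j] says how strongly channel i and channel j fire together,
    averaged over all positions in the image.
    """
    n = len(features[0])
    return [
        [sum(a * b for a, b in zip(ci, cj)) / n for cj in features]
        for ci in features
    ]

# Two hypothetical channels over four image positions.
features = [
    [1.0, 0.0, 1.0, 0.0],  # e.g. responds to "red"
    [1.0, 0.0, 1.0, 0.0],  # e.g. responds to "brush stroke"
]

# The same activations moved to different positions in the image.
shuffled = [list(reversed(channel)) for channel in features]

print(gram_matrix(features))
print(gram_matrix(shuffled))
```

Both calls print the same matrix: the summary records which features occur together, not where they occur, which is why it behaves like style rather than content.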
Then we merge the content
and style.
How?
How does merging work?
1. Start with white noise (call it “our_image”)
2. Run our_image and content_image through the content extractor
Diagram: our_image (e.g. white noise) goes through the content extractor to give the content of our_image; content_image (e.g. bunny) goes through the same extractor to give the content of content_image.
How does merging work?
2. Run our_image and content_image through the content extractor
3. Content loss = difference between the content of content_image
and the content of our_image
Diagram: content of content_image compared with content of our_image gives the content loss.
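A minimal sketch of the content loss, assuming the content extractor returns a list of numbers for each image and that “difference” means the mean squared difference. Both extractor outputs below are hypothetical values, not the result of running a real network.

```python
def content_loss(content_a, content_b):
    """Mean squared difference between two extracted content vectors."""
    return sum((a - b) ** 2 for a, b in zip(content_a, content_b)) / len(content_a)

# Hypothetical extractor outputs.
content_of_content_image = [0.2, 0.9, 0.4]  # e.g. the bunny
content_of_our_image = [0.5, 0.5, 0.5]      # e.g. white noise

print(content_loss(content_of_content_image, content_of_our_image))
print(content_loss(content_of_content_image, content_of_content_image))
```

The loss is zero only when the two contents match, so making it smaller pulls our_image toward the content of content_image.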
How does merging work?
4. Do the exact same for style.
5. So we have two loss functions, content loss and
style loss.
6. Use gradient descent on our_image to minimize both.
7. The image that minimizes the content loss and the
style loss is the style-transferred image.
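The whole merging loop can be sketched on a toy four-pixel “image”. Everything here is an assumption made for illustration: the two extractors are simple stand-ins (neighbouring-pixel differences for content, brightness and contrast for style) rather than layers of a trained network, and the gradient is estimated numerically instead of by backpropagation.

```python
def content(img):
    # Stand-in content extractor: differences between neighbouring pixels.
    return [img[i + 1] - img[i] for i in range(len(img) - 1)]

def style(img):
    # Stand-in style extractor: overall brightness and contrast.
    mean = sum(img) / len(img)
    var = sum((p - mean) ** 2 for p in img) / len(img)
    return [mean, var]

def sq_diff(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

content_image = [0.0, 1.0, 0.0, 1.0]  # hypothetical content image
style_image = [0.8, 0.9, 0.8, 0.9]    # hypothetical style image

def total_loss(img):
    # Content loss plus style loss, weighted equally.
    return (sq_diff(content(img), content(content_image))
            + sq_diff(style(img), style(style_image)))

def gradient(f, img, eps=1e-5):
    # Numerical estimate of how the loss changes with each pixel.
    grad = []
    for i in range(len(img)):
        up = img[:i] + [img[i] + eps] + img[i + 1:]
        down = img[:i] + [img[i] - eps] + img[i + 1:]
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

our_image = [0.3, 0.6, 0.1, 0.7]  # fixed "noise" so the run is repeatable
start_loss = total_loss(our_image)

for _ in range(500):
    grad = gradient(total_loss, our_image)
    # Gradient descent: change the pixels, not any network weights.
    our_image = [p - 0.05 * g for p, g in zip(our_image, grad)]

end_loss = total_loss(our_image)
print(f"loss went from {start_loss:.4f} to {end_loss:.4f}")
```

The two losses pull in different directions here, so the final image is a compromise between them rather than a perfect match to either.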
Questions?
DeepDream: Convert Images
Into (Trippy) Art
What is DeepDream?
DeepDream: Creating Dogs When There Are None
Can anyone guess how this is
created?
What is the computer doing?
Finding visual patterns and emphasizing them.
How does it work?
Think about running an image recognition network backwards.
Individual neurons output patterns, each with a confidence level.
The original image is then modified to boost the confidence level of the
output neurons.
Normally, we would fix the input and change the weights. In this
case, we’re fixing the weights and changing the input.
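Fixing the weights and changing the input can be sketched with a single neuron. This is a toy under stated assumptions: the weights are made-up numbers standing in for a trained pattern detector, and the “confidence” is a plain weighted sum, whereas the real technique works on a deep network and gets its gradients by backpropagation.

```python
# Fixed: a hypothetical, already-trained pattern detector.
weights = [0.5, -1.0, 2.0]

# The input image (three pixels) is the only thing we change.
image = [0.2, 0.4, 0.1]

def confidence(img):
    """How strongly the neuron fires on this image."""
    return sum(w * p for w, p in zip(weights, img))

before = confidence(image)

for _ in range(10):
    # The gradient of the confidence with respect to pixel i is weights[i].
    # Step up the gradient so the neuron becomes more confident.
    image = [p + 0.1 * w for p, w in zip(image, weights)]

after = confidence(image)
print(f"confidence before: {before:.2f}, after: {after:.2f}")
```

Each step nudges the image toward whatever the neuron already responds to, which is how faint patterns end up exaggerated.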
Diagram: input, hidden layers 1 to n, output “rabbit”. The hidden layers yield patterns (with a confidence level).
Questions?
Creating Music
Music generators use something called a recurrent neural network (RNN), which is a
neural network that can remember what happened previously.
Train it on previously composed music.
Recommended: Project Magenta, https://deepjazz.io/. Pretty famous
on SoundCloud.
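What “remembering” means can be sketched with one number of memory. This is a minimal sketch, assuming a single hidden value updated with tanh; the weights are hypothetical and untrained, and notes are represented as plain numbers.

```python
import math

def rnn_step(x, h_prev, w_x=1.0, w_h=0.5):
    """One RNN step: the new memory mixes the current input with the old memory."""
    return math.tanh(w_x * x + w_h * h_prev)

def run(sequence):
    h = 0.0  # empty memory at the start
    for x in sequence:
        h = rnn_step(x, h)
    return h

# Both melodies end on the same note (1.0) but arrive there differently.
rising = run([0.0, 0.5, 1.0])
falling = run([2.0, 1.5, 1.0])

print(rising, falling)
```

The final memory differs even though the last note is the same, so whatever the network predicts next can depend on the notes that came before.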
Creating Literature
A recurrent neural network that talks like Shakespeare!
Input a bunch of words, and ask it to generate the
words that come right after.
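The generate-the-next-word loop can be sketched as follows. To keep the example tiny, the trained RNN is swapped for a plain lookup table of which word followed which in a one-line training text; this stand-in is not a recurrent network, and the training text is chosen only for illustration.

```python
from collections import Counter, defaultdict

training_text = "to be or not to be that is the question".split()

# Count which word follows which. A trained RNN would learn something
# richer, because it can look further back than one word.
following = defaultdict(Counter)
for current, nxt in zip(training_text, training_text[1:]):
    following[current][nxt] += 1

def generate(seed, n_words):
    """Repeatedly append the most likely next word after the last one."""
    words = seed.split()
    for _ in range(n_words):
        candidates = following[words[-1]]
        if not candidates:
            break  # nothing was ever seen after this word
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

print(generate("to be or not", 2))
```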
PANDARUS:
Alas, I think he shall be come approached and the day
When little srain would be attain'd into being never fed,
And who is but a chain and subjects of his death,
I should not sleep.
Second Senator:
They are away this miseries, produced upon my soul,
Breaking and strongly should be buried, when I perish
The earth and thoughts of many states.
DUKE VINCENTIO:
Well, your wit is in the care of side and that.
Second Lord:
They would be ruled after this chamber, and
my fair nues begun out of the fact, to be conveyed,
Whose noble souls I'll have the heart of the wars.
Clown:
Come, sir, I will make did behold your worship.
VIOLA:
I'll drink it.
to be or not?
to be
Recap
Generative adversarial networks: two networks against each other, one
which generates and one which discriminates
Style transfer: extract content, style, calculate content loss, style loss,
optimize
DeepDream: network goes, “I found a pattern! Let me change the original
image so I am more confident that my pattern exists.”
AI+Music and AI+Literature: use an RNN which can remember what
happened previously
Questions?

Editor's Notes

  • #3: These are really advanced topics, so we will only give the high-level intuition here.
  • #5: https://www.usnews.com/dims4/USNEWS/853ba66/2147483647/thumbnail/970x647/quality/85/?url=%2Fcmsmedia%2Ffd%2F95%2F26afe35a4dde8cf716252ac8ad7d%2F140721-designmain-editorial.jpg
  • #6: Each edge out of each neuron corresponds to an image. So what does each hidden layer appear to be doing here? Each hidden layer is composing the output of the previous layers to identify the key parts of the face.
  • #8: Here we can see an example. So our program creates these faces from scratch, from this image of noise and static. To clarify, these faces aren’t readymade pictures taken by a human--they are generated entirely by our program. (Source: https://raw.githubusercontent.com/torch/torch.github.io/master/blog/_posts/images/model.png)
  • #9: Let us get back to our earlier neural network. Suppose we change the output layer so that, given a white noise image as input, it tries to make its output image as close to a face as possible (meaning it has all the attributes of a face). That is, roughly speaking, a face generator. https://www.slideshare.net/roelofp/python-for-image-understanding-deep-learning-with-convolutional-neural-nets
  • #10: So here we are giving the program a description: two plates of food that include beans, guacamole and rice. We are asking it to generate an image which contains everything in this description. It seems to have done pretty well! This type of generation problem is called text-image synthesis. This is a very high-level explanation; feel free to talk to me later to find out more. We create something called embeddings for both the text and the images in the training data such that the distance between text and any image is meaningful. That is, if they are close in meaning, then they have a small distance; if they are far apart in meaning, they have a large distance. By distance I mean vector distance. http://imgur.com/Zt7W2vI
  • #11: This is called a GAN, and it is a system that can generate images out of white noise. (hard concept, will explain intuitively) A GAN is made up of two networks. One network makes these images. How does it know whether its image is realistic or not? Going back to our linear regression days, we were able to measure a loss--how well or badly we were doing. For GANs, to evaluate the image, we need a whole other network called a discriminator network. This network classifies whether the image the generator network produces is fake or not, and challenges the generator to improve. Let’s take an analogy here. Suppose we have a bank, and someone who counterfeits money. At the beginning, it’s easy for the bank to detect which money is fake, because the counterfeiter isn’t very good. But the counterfeiter learns from which fakes succeed, so some money may get past the bank’s defences. The bank also keeps learning how to tell the fakes apart. So, end result, the counterfeiter is able to produce near-exact replicas of money. Back to our situation, the bank is the discriminator network--it tells the generator network, aka the counterfeiter, what it is doing wrong. The key observation here is that both networks keep learning, so over time, the quality of the images produced gets better and better. This is how a GAN works. When I say “real image,” I mean Source: https://camo.githubusercontent.com/1925e23b5b6e19efa60f45daa3787f1f4a098ef3/687474703a2f2f692e696d6775722e636f6d2f644e6c32486b5a2e6a7067
  • #12: Ask Pamela all the deep technical questions! I understand that we went through a lot of concepts. My goal is to provide an intuitive understanding of these concepts.
  • #14: Remember how yesterday we talked about a pastiche, an art piece which imitates the style of another painting? Style transfer is when a computer makes a pastiche. So you take an image, and say, for example, “I want this to look like an oil painting.” Then you transfer (Source: http://static.boredpanda.com/blog/wp-content/uploads/2015/09/cute-bunnies-25__605.jpg, www.deepdream.com)
  • #19: This is hard to define, but let me try to explain it this way. Suppose we have a line. No matter whether we flip it, rotate it, or compose it with something else, this line will always be red.
  • #20: Remember that slide we saw, with the faces and the different hidden layers? Source: https://blog.paperspace.com/art-with-neural-networks/
  • #21: Look at the feature activations of each layer and the correlations between them, and then pick out the features that the neural network thinks are most important. This is the style. Content is something that is specific to a couple of hidden layers. But all hidden layers preserve style. If the activations at all the hidden layers emphasize a certain feature, then we know it’s important. And
  • #25: Remember that these are represented by pixels.
  • #26: Do the exact same for style. So for style we’d run both the style image and our_image through a style extractor, and then compute style_loss by taking the difference between the two styles.
  • #29: DeepDream converts one image into another. Let’s look at a bunch of examples and see if we can guess how.
  • #30: Once upon a time, this was Starry Night by Van Gogh. Then it was invaded by snakes, fish, birds, ducks, cars… :)
  • #32: This is my all-time favorite. Thanks to Pamela for showing this to me. Can anybody guess how DeepDream is created?
  • #33: DRINK WATER
  • #34: Suppose you’re playing the game Telephone. You say a phrase: “Tiny elephants like bubbles.” One person thinks they hear something else, convinces themselves that they’re right, and then passes it on. The original phrase ends up massively distorted, like “Slimy sea slugs slime trouble.” This is basically what the computer is doing. It detects a pattern, convinces itself it is absolutely right, and adjusts the original image so it’s more confident about the patterns it sees.
  • #35: So basically, what the computer is saying to itself is: “Oh, I’ve detected a really small pattern right here. But I feel really confident, and I’m totally right that it is a dog! So let me go back and adjust the picture so I’m more confident that my pattern exists.”
  • #36: Where do we get these patterns? From the hidden layers, of course!
  • #38: So yesterday, with AI Experiments for Google, you may have played with some music-generation stuff! I think the intersection of AI + Music is cool, it’s something that’s very deeply human. Intuition behind RNN: We have a sequence of observations. We assume the next observation depends on the current state as well as previous “hidden states”. (By states, I mean N-dimensional vectors) We need an RNN in music because music falls into many patterns. If we finish a little early, we can watch some AI-generated music called “Daddy’s Car” which is supposed to sound like the Beatles.
  • #41: http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/RNN-unrolled.png
  • #42: Note that with image recognition, there’s always a right or wrong. There’s no universal definition of what “good art” is.