RBM for unsupervised feature
learning on handwritten digits
(MNIST dataset)
Neural networks
Feedforward ANN: compute
unit states layer-by-layer
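A standard formulation of the layer-by-layer computation (notation mine, not from the slide): each layer's activations are obtained from the previous layer's outputs,

  a^{(l+1)} = \sigma\big( W^{(l)} a^{(l)} + b^{(l)} \big)

where \sigma is an element-wise nonlinearity such as the sigmoid.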
Hopfield networks
Recurrent ANN (Hopfield net): update
units until convergence
Hopfield networks
Energy minima correspond to learned patterns
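For reference, the standard Hopfield update rule and energy (notation assumed, not taken from the slide's figures):

  s_i \leftarrow \operatorname{sign}\big( \sum_j w_{ij} s_j + b_i \big)

  E(s) = -\tfrac{1}{2} \sum_{i \neq j} w_{ij} s_i s_j - \sum_i b_i s_i

Each asynchronous update never increases E, so the dynamics settle into a local energy minimum, i.e. a stored pattern.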
Boltzmann machine
A stochastic neural network
with a probability distribution defined over its states:
low-energy states have higher probability
“Partition function”: normalization constant
(Boltzmann distribution is a probability distribution
of particles in a system over various possible states)
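The distribution referred to here is, in standard form (usual notation assumed),

  p(s) = \frac{e^{-E(s)}}{Z}, \qquad Z = \sum_{s'} e^{-E(s')}

where Z is the partition function that normalizes the probabilities.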
Boltzmann machine
Same form of energy as for Hopfield networks,
but with a probabilistic unit-state update:
(sigmoid function)
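The probabilistic update typically takes the form (standard formulation, notation assumed):

  p(s_i = 1) = \sigma\big( \sum_j w_{ij} s_j + b_i \big), \qquad \sigma(x) = \frac{1}{1 + e^{-x}}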
Boltzmann machine
v: visible units, connected to observed data
h: hidden units, capture dependencies and
patterns in the visible variables
We are interested in modeling the observed data,
so we marginalize over the hidden variables
(x denotes the visible variables):
Defining the “free energy”
allows us to define
the distribution
of visible states:
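In the usual notation (assumed here; the slide writes x for a visible configuration, later slides use v), the free energy and the resulting marginal over visible states are

  F(v) = -\log \sum_h e^{-E(v, h)}

  p(v) = \frac{e^{-F(v)}}{Z}, \qquad Z = \sum_{v'} e^{-F(v')}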
Boltzmann machine
Log-likelihood given a training sample (v):
Gradient with respect to model parameters:
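The gradient has the standard two-term form, a data-driven term and a model-expectation term (standard result, notation assumed):

  \frac{\partial \log p(v)}{\partial \theta}
    = -\frac{\partial F(v)}{\partial \theta}
      + \sum_{\tilde v} p(\tilde v)\, \frac{\partial F(\tilde v)}{\partial \theta}

For an RBM weight this reduces to the familiar difference of correlations:

  \frac{\partial \log p(v)}{\partial w_{ij}}
    = \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}}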
RBM
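The defining restriction (standard RBM facts, stated with the same b and c visible/hidden biases used in the code later on): visible and hidden units form a bipartite graph with no visible-visible or hidden-hidden connections, so the conditionals factorize and each layer can be updated in one pass:

  p(h_j = 1 \mid v) = \sigma\big( c_j + \sum_i v_i w_{ij} \big)

  p(v_i = 1 \mid h) = \sigma\big( b_i + \sum_j w_{ij} h_j \big)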
Approximating gradient
“Contrastive divergence”: to obtain model samples,
initialize the Markov chain from the training sample
Gibbs sampling: alternating update of visible and hidden units
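With one Gibbs step (CD-1) this yields the approximate weight update used in the training code below (standard form; \epsilon is the learning rate):

  \Delta w_{ij} = \epsilon \big( v_i^{(0)} h_j^{(0)} - v_i^{(1)} h_j^{(1)} \big)

where (v^{(0)}, h^{(0)}) comes from the training sample and (v^{(1)}, h^{(1)}) from the reconstruction; in practice (and in the code below) the hidden probabilities are used in place of sampled hidden states when accumulating these statistics.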
Contrastive divergence
Persistent Contrastive Divergence
Run several sampling chains in parallel
...
For example, one for each training sample
MNIST dataset
60,000 training images and 10,000 testing images
28 x 28 grayscale images
pixels connected
to 784 visible units
v0 v1 v2 v3 v4 v5 v6 v7 ... v783
h0 h1 h2 h3 h4 h5 h6 h7 ... h99
use 100 hidden units
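A minimal sketch of mapping one 28 x 28 grayscale image onto the visible layer (the pixels array and the sampling-based binarization are my assumptions, not taken from the slides):

const int n_visible = 28 * 28; // 784 pixels
for (int i = 0; i < n_visible; i++) {
  float p = pixels[i] / 255.0f;                  // normalize intensity to [0, 1]
  v_data[i] = ofRandom(1.0f) < p ? 1.0f : 0.0f;  // sample a binary visible state
}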
Demo
v0 v1 v2 v3 v4 v5 v6 v7 ... v783
h0
Training code
// (positive phase)
// compute hidden node activations and probabilities
for (int i = 0; i < n_hidden; i++) {
  h_prob[i] = c[i]; // hidden bias, added once (not inside the loop)
  for (int j = 0; j < n_visible; j++) {
    h_prob[i] += v_data[j] * W[j * n_hidden + i];
  }
  h_prob[i] = sigmoid(h_prob[i]);
  h[i] = ofRandom(1.0f) > h_prob[i] ? 0.0f : 1.0f; // sample binary hidden state
}
for (int i = 0; i < n_visible * n_hidden; i++) {
  pos_weights[i] = v_data[i / n_hidden] * h_prob[i % n_hidden];
}
// (negative phase)
// sample visible nodes
for (int i = 0; i < n_visible; i++) {
  v_negative_prob[i] = b[i]; // visible bias, added once
  for (int j = 0; j < n_hidden; j++) {
    v_negative_prob[i] += h[j] * W[i * n_hidden + j];
  }
  v_negative_prob[i] = sigmoid(v_negative_prob[i]);
  v_negative[i] = ofRandom(1.0f) > v_negative_prob[i] ? 0.0f : 1.0f;
}
Training code
// and hidden nodes once again
for (int i = 0; i < n_hidden; i++) {
  h_negative_prob[i] = c[i]; // hidden bias, added once
  for (int j = 0; j < n_visible; j++) {
    h_negative_prob[i] += v_negative[j] * W[j * n_hidden + i];
  }
  h_negative_prob[i] = sigmoid(h_negative_prob[i]);
  h_negative[i] = ofRandom(1.0f) > h_negative_prob[i] ? 0.0f : 1.0f;
}
for (int i = 0; i < n_visible * n_hidden; i++) {
  neg_weights[i] = v_negative[i / n_hidden] * h_negative_prob[i % n_hidden];
}
// update weights
for (int i = 0; i < n_visible * n_hidden; i++) {
  W[i] += learning_rate * (pos_weights[i] - neg_weights[i]);
}
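The code assumes a sigmoid helper; a minimal version consistent with how it is called above (my sketch, not from the slides):

// logistic function, maps an activation to a probability in (0, 1); requires <cmath>
float sigmoid(float x) {
  return 1.0f / (1.0f + expf(-x));
}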
Training
● Matrix multiplication
● Mini-batch training
● Regularization
● Weights update momentum
Training code
// (positive phase)
clear(h_data_prob, n_hidden * batch_size);
multiply(v_data, W, h_data_prob, batch_size, n_visible, n_hidden);
getState(h_data_prob, h_data, n_hidden * batch_size);
transpose(v_data, vt, batch_size, n_visible);
multiply(vt, h_data, pos_weights, n_visible, batch_size, n_hidden);
for (int cdk = 0; cdk < k; cdk++) {
  // run update for CD-k, or for the persistent chain in PCD
  clear(v_prob, n_visible * batch_size);
  transpose(W, Wt, n_visible, n_hidden);
  multiply(h, Wt, v_prob, batch_size, n_hidden, n_visible);
  getState(v_prob, v, n_visible * batch_size);
  // and hidden nodes
  clear(h_prob, n_hidden * batch_size);
  multiply(v, W, h_prob, batch_size, n_visible, n_hidden);
  getState(h_prob, h, n_hidden * batch_size);
}
// neg_weights is presumably accumulated from the final v and h_prob,
// analogously to pos_weights above (not shown on this slide)
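The mini-batch code relies on clear, multiply, transpose and getState helpers that are not shown; a plausible reconstruction from the call sites above (row-major layout and overwrite-on-multiply are my assumptions):

// set n floats to zero
void clear(float* a, int n) {
  for (int i = 0; i < n; i++) a[i] = 0.0f;
}

// C = A * B, where A is m x k, B is k x n, C is m x n (row-major)
void multiply(const float* A, const float* B, float* C, int m, int k, int n) {
  for (int i = 0; i < m; i++) {
    for (int j = 0; j < n; j++) {
      float sum = 0.0f;
      for (int p = 0; p < k; p++) sum += A[i * k + p] * B[p * n + j];
      C[i * n + j] = sum;
    }
  }
}

// At = transpose of A, where A is rows x cols
void transpose(const float* A, float* At, int rows, int cols) {
  for (int i = 0; i < rows; i++)
    for (int j = 0; j < cols; j++)
      At[j * rows + i] = A[i * cols + j];
}

// turn raw activations into probabilities (sigmoid, in place)
// and sample binary states from them
void getState(float* prob, float* state, int n) {
  for (int i = 0; i < n; i++) {
    prob[i] = sigmoid(prob[i]);
    state[i] = ofRandom(1.0f) > prob[i] ? 0.0f : 1.0f;
  }
}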
Training code
for (int i = 0; i < n_visible * n_hidden; i++) {
  W_inc[i] *= momentum;
  W_inc[i] += learning_rate * (pos_weights[i] - neg_weights[i]) / (float) batch_size
              - weightcost * W[i];
  W[i] += W_inc[i];
}
Sparseness & Selectivity
// increasing selectivity
float activity_smoothing = 0.99f;
float total_active = 0.0f;
for (int i = 0; i < n_hidden; i++) {
  float activity = pos_weights[i] / (float)batch_size;
  // exponentially smoothed mean activity of hidden unit i
  mean_activity[i] = mean_activity[i] * activity_smoothing
                     + activity * (1.0f - activity_smoothing);
  // small correction pushing the unit's mean activity towards the 0.01 target
  W[i] += (0.01f - mean_activity[i]) * 0.01f;
  total_active += activity;
}
// increasing sparseness
// q: smoothed mean activity across all hidden units
q = activity_smoothing * q
    + (1.0f - activity_smoothing) * (total_active / (float)n_hidden);
for (int i = 0; i < n_hidden; i++) {
  // small correction pushing the overall mean activity towards the 0.1 target
  W[i] += (0.1f - q) * 0.01f;
}
Classification with RBM
● use hidden-unit activations as inputs for some standard discriminative
method (classifier)
● train a separate RBM on each class (one scoring rule is sketched below)
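For the second option, a common scoring rule (not spelled out on the slide) is to pick the class whose RBM assigns the lowest free energy, optionally with learned per-class offsets a_c to compensate for the different partition functions of the class-specific models:

  \hat{y} = \arg\min_c \big( F_c(v) + a_c \big)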
Greedy layer-wise training
Higher layers capture more abstract features:
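Stated compactly (standard procedure, not detailed on the slide): train the first RBM on the raw data v; compute its hidden probabilities p(h | v) and use them as the training data for a second RBM; repeat for as many layers as desired. Each new RBM models the structure remaining in the previous layer's features, which is why higher layers tend to capture more abstract features.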
