Reducing the Dimensionality of Data
with Neural Networks
@St_Hakky
Geoffrey E. Hinton; R. R. Salakhutdinov (2006-07-28). “Reducing the
Dimensionality of Data with Neural Networks”. Science 313 (5786)
Dimensionality Reduction
• Dimensionality Reduction facilitates…
• Classification
• Visualization
• Communication
• Storage of high-dimensional data
Principal Components Analysis
• PCA (Principal Components Analysis)
• A simple and widely used method
• Finds the directions of greatest variance in the data set
• Represents each data point by its coordinates along each
of these directions
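A minimal NumPy sketch of this idea (the data matrix X and the number of components k are illustrative assumptions):

```python
import numpy as np

def pca(X, k):
    """Project each row of X onto the k directions of greatest variance."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are unit-length principal directions, ordered by variance.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:k]                    # (k, n_features)
    codes = X_centered @ components.T      # coordinates along each direction
    reconstruction = codes @ components + X.mean(axis=0)
    return codes, reconstruction
```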
“Encoder” and “Decoder” Network
• This paper describes a nonlinear generalization of PCA: the autoencoder
• It uses an adaptive, multilayer “encoder” network to transform the high-dimensional data into a low-dimensional code
• and a similar “decoder” network to recover the data from the code
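As an illustration only (not the paper's exact architecture), a small encoder/decoder pair might look like the following PyTorch sketch; the layer sizes are assumptions:

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, n_input=784, n_hidden=256, n_code=6):
        super().__init__()
        # Encoder: high-dimensional input -> low-dimensional code.
        self.encoder = nn.Sequential(
            nn.Linear(n_input, n_hidden), nn.Sigmoid(),
            nn.Linear(n_hidden, n_code),        # linear code layer
        )
        # Decoder: code -> reconstruction of the input.
        self.decoder = nn.Sequential(
            nn.Linear(n_code, n_hidden), nn.Sigmoid(),
            nn.Linear(n_hidden, n_input), nn.Sigmoid(),
        )

    def forward(self, x):
        code = self.encoder(x)
        return self.decoder(code), code
```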
AutoEncoder
[Figure: Input → Encoder → Code → Decoder → Output]
AutoEncoder
[Figure: input layer (input data) → hidden layers performing the dimensionality reduction → output layer (reconstructed data)]
How to train the AutoEncoder
・ Start with random weights in the two networks.
[Figure: the same encoder-decoder network as on the previous slide]
・ The networks are trained by minimizing the discrepancy between the original data and its reconstruction.
・ Gradients are obtained by using the chain rule to back-propagate the error from the decoder network to the encoder network.
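A minimal training-loop sketch of this procedure, assuming the AutoEncoder class sketched earlier and a data loader `loader` yielding batches of flattened images in [0, 1]:

```python
model = AutoEncoder()                     # weights start at random values
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()                    # discrepancy between data and reconstruction

for epoch in range(10):
    for x in loader:                      # x: (batch_size, 784)
        recon, _ = model(x)
        loss = loss_fn(recon, x)
        optimizer.zero_grad()
        loss.backward()                   # chain rule: error back-propagated
        optimizer.step()                  #   from the decoder to the encoder
```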
It is difficult to optimize multilayer autoencoders
• It is difficult to optimize the weights in nonlinear autoencoders that have multiple hidden layers (2-4).
• With large initial weights:
• autoencoders typically find poor local minima
• With small initial weights:
• the gradients in the early layers are tiny, making it infeasible to
train autoencoders with many hidden layers
• If the initial weights are close to a good solution, gradient descent works well. However, finding such initial weights is very difficult.
Pretraining
• This paper introduces a “pretraining” procedure for binary data, generalizes it to real-valued data, and shows that it works well for a variety of data sets.
Restricted Boltzmann Machine (RBM)
[Figure: an RBM with visible units 𝑣𝑖, hidden units ℎ𝑗, biases 𝑏𝑖, 𝑏𝑗, and weights 𝑤𝑖𝑗]
The input data correspond to the “visible” units of the RBM and the feature detectors correspond to the “hidden” units.
A joint configuration (𝑣, ℎ) of the visible and hidden units has an energy given by equation (1).
The network assigns a probability to every possible data vector via this energy function.
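For reference, the energy of equation (1) in the paper, and the probability the network assigns through it (up to normalization), take the form:

```latex
E(v, h) = -\sum_{i} b_i v_i - \sum_{j} b_j h_j - \sum_{i,j} v_i h_j w_{ij},
\qquad
p(v) \propto \sum_{h} e^{-E(v, h)}
```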
Pretraining consists of learning a stack of RBMs
・ The first layer of feature detectors then becomes the visible units for learning the next RBM.
・ This layer-by-layer learning can
be repeated as many times as
desired.
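A rough Python/PyTorch sketch of one contrastive-divergence (CD-1) weight update for a single RBM layer, whose hidden activities then serve as the visible data for the next RBM; the learning rate and tensor shapes are illustrative assumptions, not the paper's settings:

```python
import torch

def cd1_step(v0, W, b_vis, b_hid, lr=0.1):
    """One CD-1 update; v0: (batch, n_visible), W: (n_visible, n_hidden)."""
    # Up pass: hidden probabilities and a binary sample given the data.
    p_h0 = torch.sigmoid(v0 @ W + b_hid)
    h0 = torch.bernoulli(p_h0)
    # Down and up again: a one-step reconstruction of the data.
    p_v1 = torch.sigmoid(h0 @ W.t() + b_vis)
    p_h1 = torch.sigmoid(p_v1 @ W + b_hid)
    # Raise the probability of the data, lower that of the reconstruction.
    batch = v0.shape[0]
    W     += lr * (v0.t() @ p_h0 - p_v1.t() @ p_h1) / batch
    b_vis += lr * (v0 - p_v1).mean(dim=0)
    b_hid += lr * (p_h0 - p_h1).mean(dim=0)
    return p_h0        # hidden activities: training data for the next RBM
```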
Experiment(2-A)
Used AutoEncoder’s Network
Encoder: 784 (28 × 28) → 400 → 200 → 100 → 50 → 25 → 6
Decoder: 6 → 25 → 50 → 100 → 200 → 400 → 784 (28 × 28)
The function of each layer: the six units in the code layer were linear and all the other units were logistic.
Data: the network was trained on 20,000 images and tested on 10,000 new images.
Observed Results: the autoencoder discovered how to convert each 784-pixel image into six real numbers that allow almost perfect reconstruction.
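A sketch of this 784-400-200-100-50-25-6 network in PyTorch (logistic units everywhere except the linear 6-unit code layer, with a mirrored decoder); the layer-wise RBM pretraining and fine-tuning details are omitted:

```python
import torch.nn as nn

sizes = [784, 400, 200, 100, 50, 25, 6]

encoder, decoder = [], []
for n_in, n_out in zip(sizes[:-1], sizes[1:]):
    encoder.append(nn.Linear(n_in, n_out))
    if n_out != sizes[-1]:
        encoder.append(nn.Sigmoid())       # logistic units; code layer stays linear
for n_in, n_out in zip(sizes[::-1][:-1], sizes[::-1][1:]):
    decoder += [nn.Linear(n_in, n_out), nn.Sigmoid()]

autoencoder_2a = nn.Sequential(*encoder, *decoder)
```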
Experiment(2-A)
[Figure, rows from top to bottom:]
(1) Random samples of curves from the test data set
(2) Reconstructions produced by the six-dimensional deep autoencoder
(3) Reconstructions by logistic PCA using six components
(4) Reconstructions by logistic PCA using 18 components
(5) Reconstructions by standard PCA using 18 components
The average squared error per image for the last four rows is 1.44, 7.64, 2.45, and 5.90.
Experiment(2-B)
Used AutoEncoder’s Network
Encoder: 784 → 1000 → 500 → 250 → 30
Decoder: 30 → 250 → 500 → 1000 → 784
The function of each layer: the 30 units in the code layer were linear and all the other units were logistic.
Data: the network was trained on 60,000 images and tested on 10,000 new images.
Experiment(2-B): MNIST
[Figure, rows from top to bottom:]
(1) A random test image from each class
(2) Reconstructions by the 30-dimensional autoencoder
(3) Reconstructions by 30-dimensional logistic PCA
(4) Reconstructions by standard PCA
The average squared errors for the last three rows are 3.00, 8.01, and 13.87.
Experiment(2-B)
A two-dimensional autoencoder produced a better visualization of the data than
did the first two principal components.
(A) The two-dimensional codes for 500
digits of each class produced by taking
the first two principal components of
all 60,000 training images.
(B) The two-dimensional codes found by a 784-1000-500-250-2 autoencoder.
Experiment(2-C)
Used AutoEncoder’s Network
Encoder: 625 (25 × 25) → 2000 → 1000 → 500 → 30
Decoder: 30 → 500 → 1000 → 2000 → 625
The function of each layer: the 30 units in the code layer were linear and all the other units were logistic.
Data: Olivetti face data set
Observed Results: the autoencoder clearly outperformed PCA.
Experiment(2-C)
[Figure, rows from top to bottom:]
(1) Random samples from the test data set
(2) Reconstructions by the 30-dimensional autoencoder
(3) Reconstructions by 30-dimensional PCA
The average squared errors are 126 and 135.
Conclusion
• It has been obvious since the 1980s that backpropagation through deep autoencoders would be very effective for nonlinear dimensionality reduction, provided that…
• Computers were fast enough
• Data sets were big enough
• The initial weights were close enough to a good solution.
Conclusion
• Autoencoders give mappings in both directions
between the data and code spaces.
• They can be applied to very large data sets, because both the pretraining and the fine-tuning scale linearly in time and space with the number of training cases.