Siamese Networks
Textual Record Matching
By Nick McClure
@nfmcclure
Outline
● Motivation
– Why are siamese networks useful?
– Why should we care?
– What are the benefits?
● Structure and Loss
– What are they?
– How does the loss work?
● Use Cases
● Address Matching Example Code
Motivation
● Neural networks can exhibit unintended behaviors.
– Sensitivity to outliers
– Model bias
– Unexplainable results
● Siamese Networks impose a structure that helps
combat these problems.
● Siamese Networks let us train on far more data points
than standard setups provide.
– They let us exploit relationships between data points!
[Figure: example data in two labeled classes, blue and red]
Structural Definition
● Siamese networks train a similarity measure
between labeled points.
● Two input data points (textual embeddings,
images, etc.) are run simultaneously through the
same neural network, each mapped to a vector
of shape N×1.
● A standard numerical function then measures
the distance between the two vectors (e.g.
cosine similarity).
Structural Definition
[Diagram: Input A and Input B each feed into the same
neural network architecture (same parameters, same
structure), producing Vector A Output and Vector B Output;
cosine similarity compares the two vectors, giving
-1 <= output <= 1.]
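As a concrete illustration, here is a minimal NumPy sketch of the forward pass in the diagram. The two-layer tower and its weight shapes are hypothetical stand-ins for whatever architecture is used; the key point is that both inputs go through the same parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shared parameters: one hidden layer mapping
# 8-dim inputs to N=4-dim output vectors.
W1, b1 = rng.normal(size=(16, 8)), np.zeros(16)
W2, b2 = rng.normal(size=(4, 16)), np.zeros(4)

def embed(x):
    """Shared tower: both inputs are mapped with the SAME weights."""
    h = np.tanh(W1 @ x + b1)
    return W2 @ h + b2

def cosine_similarity(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

a, b = rng.normal(size=8), rng.normal(size=8)
s = cosine_similarity(embed(a), embed(b))  # guaranteed to lie in [-1, 1]
print(f"similarity(A, B) = {s:+.3f}")
```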
Training Dataset
● Siamese Networks must be trained on data that has two
inputs and a target similarity.
– [‘input a1’, ‘input a2’, 1]
– [‘input a2’, ‘input a3’, 1]
– [‘input a2’, ‘input b1’, -1]
– …
● There must be similar inputs (+1) and dissimilar inputs (-1).
● A dissimilar-to-similar ratio between roughly 2:1 and
10:1 is commonly reported to work best.
– The right ratio depends on the problem and on how
specific the model needs to be (see the pair-building
sketch after this list).
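As a rough sketch of how such a pair dataset might be assembled (the record groups, the `make_pairs` helper, and the 3:1 ratio below are illustrative assumptions, not from the slides):

```python
import itertools
import random

random.seed(42)

def make_pairs(groups, neg_per_pos=3):
    """Build (a, b, target) triples: +1 for records in the same group,
    -1 for records drawn from different groups, with dissimilar pairs
    subsampled to the requested dissimilar-to-similar ratio."""
    similar = [(a, b, 1) for g in groups
               for a, b in itertools.combinations(g, 2)]
    dissimilar = [(a, b, -1)
                  for g1, g2 in itertools.combinations(groups, 2)
                  for a in g1 for b in g2]
    k = min(len(dissimilar), neg_per_pos * len(similar))
    return similar + random.sample(dissimilar, k)

groups = [['input a1', 'input a2', 'input a3'], ['input b1', 'input b2']]
for triple in make_pairs(groups):
    print(triple)
```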
Training Dataset
● Since we generate similar and dissimilar pairs, the
effective amount of training data is far larger than usual.
● For example, in the UCI machine learning data set of
spam/ham text messages, there are 656 observations, 577
are ham and only 79 are spam.
● With the siamese architecture, we can consider up to 79
choose 2 = 3,081 similar spam comparisons and 577
choose 2 = 166,176 similar ham comparisons, while
having 45,583 total dissimilar comparisons!!!
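Those counts are easy to verify (a quick check; `math.comb` counts the pairwise combinations):

```python
from math import comb

spam, ham = 79, 577
print(comb(spam, 2))  # 3,081 similar spam pairs (79 choose 2)
print(comb(ham, 2))   # 166,176 similar ham pairs (577 choose 2)
print(spam * ham)     # 45,583 dissimilar spam-ham pairs
```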
Dealing with New Data
● Another benefit is that siamese similarity networks
generalize to inputs and outputs that have never been
seen before.
– This parallels how a person can make
predictions about unseen instances and events.
Help Explain Results
● With siamese networks, we can always list the nearest
points in the output-vector space. Because of this, we can
say that a data point has a specific label because it is
nearest to a set of points.
● This type of explanation does not depend on how
complicated the internal structure is.
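Continuing the earlier NumPy sketch (reusing the hypothetical `embed`, `cosine_similarity`, and `rng` defined there), an explanation is just a ranked list of the nearest labeled points in the output-vector space:

```python
def explain(query, labeled_points, top_k=3):
    """Rank labeled training points by similarity to the query
    in the learned output-vector space."""
    q = embed(query)
    scored = [(cosine_similarity(q, embed(x)), label)
              for x, label in labeled_points]
    return sorted(scored, reverse=True)[:top_k]

# The query's label is justified by its nearest labeled neighbors:
labeled = [(rng.normal(size=8), f"point_{i}") for i in range(10)]
for sim, label in explain(rng.normal(size=8), labeled):
    print(f"{label}: similarity {sim:+.3f}")
```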
Siamese Loss Function: Contrastive Loss
● The loss function is a combination of a similar-loss (L+),
applied to similar pairs, and a dissimilar-loss (L-),
applied to dissimilar pairs.
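The slide does not reproduce the formula itself. One standard contrastive formulation (from Hadsell, Chopra & LeCun, 2006; not necessarily the exact variant used in the demo) gates the two terms on the pair label, with d the distance between the two output vectors:

```python
import numpy as np

def contrastive_loss(d, y, margin=1.0):
    """d: distance between the two output vectors; y: +1 similar, -1 dissimilar.
    Similar pairs are pulled together (L+ = d^2); dissimilar pairs are
    pushed apart until they clear the margin (L- = max(0, margin - d)^2)."""
    l_plus = d ** 2
    l_minus = np.maximum(0.0, margin - d) ** 2
    return np.where(y == 1, l_plus, l_minus)

print(contrastive_loss(np.array([0.2, 0.2]), np.array([1, -1])))
# similar pair: small distance -> small loss;
# dissimilar pair: loss stays positive until d exceeds the margin
```

For the cosine-similarity head shown earlier, one option is to define d = 1 - s, so similar pairs (s near 1) get a small distance.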
How Does the Backpropagation Work?
● Siamese networks are constrained to have the same
parameters on both sides of the network.
● To train the network, a pair of output vectors needs to be
either closer together (similar) or further apart (dissimilar).
● It is standard to average the gradients of the two ‘sides’
before performing the gradient update step.
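A minimal NumPy sketch of why the two sides' gradients combine, using a toy pair loss L = ||W·a - W·b||² over a single shared weight matrix W (the loss and sizes are illustrative assumptions): the true gradient of the shared W is exactly the sum of the gradients computed through each tower's copy, and averaging instead of summing only rescales the step size.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 5))
a, b = rng.normal(size=5), rng.normal(size=5)

# Toy pair loss on a shared linear tower: L = ||W a - W b||^2
d = W @ a - W @ b
grad_side_a = 2 * np.outer(d, a)   # gradient through tower A's copy of W
grad_side_b = -2 * np.outer(d, b)  # gradient through tower B's copy of W

# The true gradient of the shared W is the sum of the per-side gradients:
grad_shared = 2 * np.outer(d, a - b)
assert np.allclose(grad_side_a + grad_side_b, grad_shared)
```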
Potential Use Cases
● Natural Language Processing:
– Ontology creation: How similar are words/phrases?
– Job Title Matching: ‘VP of HR’ == ‘V.P. of People’
– Topic matching: Which topic is this phrase referring to?
● Others:
– Image recognition
– Image search
– Signature/Speech recognition
Address Matching!
● Input addresses can contain typos. We need to
process these addresses and match each one to
the best address in a canonical truth set.
● E.g. ‘123 MaiinSt’ matches to ‘123 Main St’.
– Why? Fat fingers, image→text (OCR) errors,
encoding errors, etc.
● Our siamese network will be a bidirectional
LSTM with a fully connected layer on the top.
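A hedged Keras sketch of that architecture (the vocabulary, sequence length, and layer sizes below are made up; the actual demo lives in the cookbook repo linked on the next slide): one shared character-level bidirectional-LSTM tower with a dense layer on top, applied to both addresses, followed by cosine similarity.

```python
import tensorflow as tf

SEQ_LEN, VOCAB, EMBED, UNITS = 30, 64, 32, 64  # illustrative sizes

# One shared tower: both addresses pass through these exact layers/weights.
tower = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB, EMBED),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(UNITS)),
    tf.keras.layers.Dense(UNITS),  # fully connected layer on top
])

addr_a = tf.keras.Input(shape=(SEQ_LEN,), dtype="int32")
addr_b = tf.keras.Input(shape=(SEQ_LEN,), dtype="int32")
vec_a, vec_b = tower(addr_a), tower(addr_b)

# Cosine similarity of the two output vectors, in [-1, 1].
sim = tf.keras.layers.Dot(axes=1, normalize=True)([vec_a, vec_b])
model = tf.keras.Model([addr_a, addr_b], sim)
model.compile(optimizer="adam", loss="mse")  # regress toward +1 / -1 targets
model.summary()
```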
Address Matching!
TensorFlow Demo!
Here: https://github.com/nfmcclure/tensorflow_cookbook
Navigate to Chapter 09: RNNs, then Section 06: Training a Siamese Similarity Measure
Conclusions and Summary
● Advantages:
– Can predict out-of-training-set data.
– Makes use of relationships, using more data.
– Explainable results, regardless of network complexity.
● Disadvantages:
– More computationally intensive (though precomputing embeddings helps).
– More hyperparameters and fine-tuning necessary.
– Generally, more training needed.
● When to use:
– Want to exploit relationships between data points.
– Can easily label ‘similar’ and ‘dissimilar’ points.
● When not to use:
– There are no meaningful relationships between data points,
or ‘similar’/‘dissimilar’ pairs are hard to label.
Further References
● Signature Verification using a Siamese Time Delay Neural Network,
Bromley, Guyon, LeCun, et al., Bell Labs, NIPS 1993,
http://papers.nips.cc/paper/769-signature-verification-using-a-siamese-time-delay-neural-network.pdf
● ABCNN: Attention-Based Convolutional Neural Network for Modeling
Sentence Pairs, 2015, https://arxiv.org/pdf/1512.05193v2.pdf
● Learning Text Similarity with Siamese Recurrent Networks, 2016,
http://anthology.aclweb.org/W16-1617
● Sketch-based Image Retrieval via Siamese CNNs, 2016,
http://qugank.github.io/papers/ICIP16.pdf