SEMI-AUTOMATIC GROUND TRUTH GENERATION
USING UNSUPERVISED CLUSTERING AND LIMITED
MANUAL LABELING:
APPLICATION TO HANDWRITTEN CHARACTER
RECOGNITION
Szilárd Vajda, Yves Rangoni, Hubert Cecotti
Pattern Recognition Letters, 2015
Ground-truth generation
Unlabeled → Labeled
• Usually, real world data is not labeled.
• Large data collections need accurate labels.
Labeling strategy
• Unsupervised clustering
• Labeling by human expert
– the closest real data point for each centroid is labeled
Labeling strategy
[Figure: image dataset → feature representations (pixels, profiles, LBP, Radon, encoder) → clusters labeled a, b, c]
• Each point inherits the label of its cluster
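The strategy on these slides (cluster the data, have an expert label the real data point closest to each centroid, let every point inherit the label of its cluster) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses plain k-means, and the names `kmeans`, `label_by_clusters` and the `ask_expert` callback (which stands in for the human expert) are hypothetical.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_centroid(point, centroids):
    return min(range(len(centroids)),
               key=lambda c: squared_distance(point, centroids[c]))

def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm. Returns (centroids, cluster index of each point)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        assignment = [nearest_centroid(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster runs empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Recompute so the assignment matches the final centroids.
    assignment = [nearest_centroid(p, centroids) for p in points]
    return centroids, assignment

def label_by_clusters(points, k, ask_expert, seed=0):
    """ask_expert(index) -> label; hypothetical stand-in for the human expert.

    Only k points are shown to the expert: for each centroid, the real data
    point closest to it. Every other point inherits its cluster's label.
    """
    centroids, assignment = kmeans(points, k, seed=seed)
    cluster_label = {}
    for c, centroid in enumerate(centroids):
        representative = min(range(len(points)),
                             key=lambda i: squared_distance(points[i], centroid))
        cluster_label[c] = ask_expert(representative)
    return [cluster_label[a] for a in assignment]

# Toy usage: two well-separated groups, two expert queries in total.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
truth = ["a", "a", "a", "b", "b", "b"]
queries = []

def expert(index):
    queries.append(index)
    return truth[index]

labels = label_by_clusters(points, k=2, ask_expert=expert)
```

The expert effort is k queries regardless of the dataset size, which is the point of the approach.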
Labeling strategy
[Figure: input image → feature representations (pixels, profiles, LBP, Radon, encoder) → one label per representation (5, 8, 5, 5, 5) → consensus voting / majority voting → final label: 5]
Feature representations
• Raw pixels
– Pixel intensity in raw images
• Profiles (upper/lower/left/right)
– consider only the outer shape of the character
– e.g., the upper profile measures, for each column, the distance from the upper boundary of the image to the closest character pixel
• Local Binary Patterns (LBP)
– local texture and rotation invariant
representation
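As an illustration of the profile features, the sketch below computes the four profiles of a small binary image. It assumes the image is a list of rows in which a nonzero value marks a character pixel, and that a row or column without any character pixel receives the full image length; both conventions, and the function name `profiles`, are assumptions rather than details from the slides.

```python
def profiles(image):
    """Return the upper, lower, left and right profiles of a binary image."""
    height, width = len(image), len(image[0])

    def first_ink(values):
        # Distance from the start of the scan line to the first character
        # pixel; the full length if the line contains none.
        for distance, value in enumerate(values):
            if value:
                return distance
        return len(values)

    columns = [[image[r][c] for r in range(height)] for c in range(width)]
    return {
        "upper": [first_ink(col) for col in columns],        # scan top-down
        "lower": [first_ink(col[::-1]) for col in columns],  # scan bottom-up
        "left": [first_ink(row) for row in image],           # scan left-to-right
        "right": [first_ink(row[::-1]) for row in image],    # scan right-to-left
    }

glyph = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
features = profiles(glyph)
```

Because only the first character pixel along each scan line matters, interior strokes do not affect the result, which matches the "outer shape only" remark above.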
L. Heutte, T. Paquet, J.V. Moreau, Y. Lecourtier, C. Olivier, A structural/statistical feature-based vector for handwritten character recognition, Pattern Recognit. Lett. 19 (7) (1998) 629–641.
Feature representations
• Radon transform
– takes multiple parallel-beam projections of the image from different angles
• Encoder network
– a special kind of deep learning architecture
– data-driven
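To make the Radon transform concrete, here is a rough, pixel-driven approximation: for each angle, every pixel is projected onto a line through the image center and its intensity is added to the nearest bin. This is a simplified sketch (nearest-bin accumulation, no interpolation), not the implementation used in the paper; the function name `radon_projections` is hypothetical.

```python
import math

def radon_projections(image, angles_deg):
    """Return one projection (list of bin sums) per angle."""
    height, width = len(image), len(image[0])
    cx, cy = (width - 1) / 2, (height - 1) / 2
    # Enough bins on each side of the center to hold any projected pixel.
    half = math.ceil(math.hypot(width, height) / 2)
    sinogram = []
    for angle in angles_deg:
        theta = math.radians(angle)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        bins = [0.0] * (2 * half + 1)
        for y, row in enumerate(image):
            for x, value in enumerate(row):
                if value:
                    # Signed position of the pixel along the projection axis.
                    s = (x - cx) * cos_t + (y - cy) * sin_t
                    bins[half + round(s)] += value
        sinogram.append(bins)
    return sinogram

image = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
]
sinogram = radon_projections(image, [0, 90, 45])
```

At 0 degrees the projection reduces to the column sums and at 90 degrees to the row sums; intermediate angles give the oblique views that make the representation informative about stroke direction.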
Classifiers
• Definitions
– the number of patterns that should be assigned to the i-th class
– the number of patterns that are assigned to the i-th class after classification
– $N_p = N_{dec} + N_{rej}$
– $N_{dec}$: patterns that have a class assigned; $N_{rej}$: patterns that have no class assigned (rejected)
– $N^+$ / $N^-$: patterns that have been correctly / incorrectly classified
• Voting scheme: consensus, majority voting
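The counts defined above can be computed directly from a list of decisions. The sketch below assumes that a rejected pattern is represented by `None`; that encoding and the function name `decision_counts` are illustrative assumptions.

```python
def decision_counts(predicted, truth):
    """Count decided, rejected, correct and incorrect patterns.

    predicted: one label per pattern, or None when the pattern was rejected.
    truth: the reference label of each pattern.
    """
    n_p = len(truth)
    n_rej = sum(1 for p in predicted if p is None)
    n_dec = n_p - n_rej  # so that N_p = N_dec + N_rej holds by construction
    n_plus = sum(1 for p, t in zip(predicted, truth) if p is not None and p == t)
    n_minus = n_dec - n_plus
    return {"N_p": n_p, "N_dec": n_dec, "N_rej": n_rej,
            "N_plus": n_plus, "N_minus": n_minus}

counts = decision_counts(predicted=[5, None, 3, 5, None], truth=[5, 8, 5, 5, 1])
```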
Classifiers
• Unsupervised clustering
– K-means clustering (Lloyd's algorithm)
– Self-Organizing Map (SOM): a special type of neural network, trained in an unsupervised fashion, that produces a two-dimensional mapping of the input data
– Growing Neural Gas (GNG): unlike the SOM, imposes no constraints on the topology
• Supervised classification
– The k-nearest neighbor (k-nn) classifier
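For reference, a k-nearest neighbor classifier fits in a few lines. This sketch assumes squared Euclidean distance and an unweighted vote among the k closest training points; the slides do not state which distance or value of k was used, and `knn_predict` is a hypothetical name.

```python
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Return the most frequent label among the k nearest training points."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(range(len(train_points)),
                     key=lambda i: squared_distance(train_points[i], query))[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

train_points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_labels = ["a", "a", "a", "b", "b", "b"]
near_a = knn_predict(train_points, train_labels, (0.2, 0.2))
near_b = knn_predict(train_points, train_labels, (5.5, 5.5))
```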
Classifiers
• Evaluation
• a measure that combines inter-class and intra-class variances
• a measure of the reliability of the labeling strategy
X: total number of vectors to be clustered
Dataset
• MNIST
– Arabic digits
– 10 classes (0,1,…,9)
– 60,000 training / 10,000 test images
• Lampung
– multi-writer handwritten collection produced by 82 high school students from
Bandar Lampung, Indonesia
– 20 character classes
– 23,447 characters for training
– 7,853 characters for testing
Results
• Performance of features
Results
• Compactness of clustering techniques
Results
• Clustering performance
Results
• Labeling performance
– Majority voting: at least 3 of the 5 methods provide the same label
– Consensus voting: all 5 methods provide the same label
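The two voting rules can be sketched as below, with one label per feature representation as input. Returning `None` when the required agreement is not reached (leaving the sample unlabeled) is an assumption of this sketch, as is the function name `vote`.

```python
from collections import Counter

def vote(labels, scheme="majority"):
    """Combine one label per method into a final label, or None if no agreement.

    majority: more than half of the methods agree (3 of 5).
    consensus: all methods agree (5 of 5).
    """
    label, count = Counter(labels).most_common(1)[0]
    required = len(labels) if scheme == "consensus" else len(labels) // 2 + 1
    return label if count >= required else None

majority_label = vote([5, 8, 5, 5, 5])                # four of five agree
consensus_label = vote([5, 8, 5, 5, 5], "consensus")  # one method disagrees
unanimous_label = vote([5, 5, 5, 5, 5], "consensus")
no_majority = vote([1, 2, 3, 1, 2])
```

Consensus voting labels fewer samples but with fewer wrong labels; majority voting trades some label noise for coverage.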
Results
• Labeling performance
Competitive performance is shown with
few human-labeled samples
Results
• Classification performance
– compared against Monte Carlo simulations (100 runs), each of which picks random samples from the complete training set
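A random-selection baseline of this kind can be sketched as follows: in each run, a random subset of the training set is treated as the only labeled data, a nearest-neighbor classifier is evaluated on the test set, and the accuracies are averaged over the runs. The 1-nn classifier, the accuracy metric and all names here are assumptions made for illustration; the slides do not give these details.

```python
import random

def nearest_label(points, labels, query):
    """1-nn: label of the training point closest to the query."""
    best = min(range(len(points)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(points[i], query)))
    return labels[best]

def monte_carlo_baseline(train_x, train_y, test_x, test_y,
                         n_labeled, runs=100, seed=0):
    """Mean test accuracy over `runs` random choices of n_labeled samples."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(runs):
        chosen = rng.sample(range(len(train_x)), n_labeled)
        xs = [train_x[i] for i in chosen]
        ys = [train_y[i] for i in chosen]
        correct = sum(nearest_label(xs, ys, q) == t
                      for q, t in zip(test_x, test_y))
        accuracies.append(correct / len(test_x))
    return sum(accuracies) / runs

train_x = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
train_y = ["a", "a", "a", "b", "b", "b"]
test_x = [(0.5, 0.5), (10.5, 10.5)]
test_y = ["a", "b"]

all_labeled = monte_carlo_baseline(train_x, train_y, test_x, test_y,
                                   n_labeled=6, runs=5)
one_labeled = monte_carlo_baseline(train_x, train_y, test_x, test_y,
                                   n_labeled=1, runs=5)
```

With a single labeled sample only one class can ever be predicted, so the toy accuracy drops to one half; this is the kind of gap that choosing cluster representatives instead of random samples is meant to close.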
Results
• Classification performance (different voting)
– classification with a fully connected multilayer perceptron
96.69 96.74 96.77
The network is more sensitive to the samples with wrong labels
Conclusion
• Semi-automatic labeling scheme with minimal human
involvement.
• The labels produced by this scheme are compared, using a k-nn classifier, against randomly selected labeled samples and against the complete, fully labeled data.
Thank you!
Q & A
