International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 12 | Dec 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 1041
Review of Tencent ML-Images Large-Scale Multi-Label Image Database
Luke Breitfeller1, Sahib Singh2, Abhinav Reddy Chamakura3
1Language Technology Institute, Carnegie Mellon University
2Heinz College, Carnegie Mellon University
3Heinz College, Carnegie Mellon University
---------------------------------------------------------------------***----------------------------------------------------------------------
Abstract - This paper examines and re-implements the paper Tencent ML-Images: A Large-Scale Multi-Label Image Database for
Visual Representation Learning by Wu et al. [1]. The original paper implements a fine-tuned ResNet visual representation model, trained on
a novel 10M-image dataset of the authors’ invention.
Key Words: Visual Representation Learning, ResNet, Multi-Label Image Database.
1. INTRODUCTION
This paper is a re-implementation of the paper Tencent ML-Images: A Large-Scale Multi-Label Image Database for Visual
Representation Learning by Wu et al. [1], which seeks to define and test a large-scale image database annotated with multiple
labels per image. The database, known as Tencent ML-Images, contains 10M images with 14k possible image labels. Wu et al.
test this database using the ResNet image classifier model, modifying the model to run on a distributed multi-GPU system.
They report improved efficiency and a top-1 accuracy of 79.2 when the model is used in transfer learning to single-label image
prediction tasks.
1.1 Tencent ML-Images
The Tencent ML-Images database contains 10M total images. These images were assembled by scraping the existing
ImageNet and OpenImage databases. As ImageNet is a single-label database, Wu et al. extrapolated multi-label annotations using
entailment within a semantic multi-label hierarchy (where the image label "animal" might be a parent of "dog", and thus all
images labeled "dog" must also be labeled "animal") and also by predicting labels based on the co-occurrence matrix of
multi-labeled OpenImage images.
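The two annotation ideas described above can be sketched in a few lines. This is a minimal illustration, not the code of Wu et al.: the toy hierarchy `PARENT` and the helper names `expand_with_ancestors` and `cooccurrence_counts` are hypothetical, and the real pipeline may apply further filtering that is not shown here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy hierarchy: child label -> parent label (None marks a root).
PARENT = {
    "dog": "animal",
    "cat": "animal",
    "animal": None,
}


def expand_with_ancestors(labels, parent=PARENT):
    """Return the given labels plus every ancestor implied by the hierarchy."""
    expanded = set()
    for label in labels:
        node = label
        # Walk upward; stop early if this part of the chain was already added.
        while node is not None and node not in expanded:
            expanded.add(node)
            node = parent.get(node)
    return expanded


def cooccurrence_counts(annotations):
    """Count how often each unordered pair of labels appears on the same image."""
    counts = Counter()
    for labels in annotations:
        for a, b in combinations(sorted(set(labels)), 2):
            counts[(a, b)] += 1
    return counts


# A single-label "dog" image gains the entailed label "animal".
multi_label = expand_with_ancestors(["dog"])

# Pair counts over a toy set of multi-labeled images.
pair_counts = cooccurrence_counts([["dog", "animal"], ["dog", "animal", "cat"]])
```

Pairs with high counts are the kind of evidence a co-occurrence-based method could use to propose an extra label for a single-label image.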
1.2 Implementation of ResNet
To test the database, Wu et al. train a ResNet visual representation model on their database and test the accuracy for all 14k
labels. They cite prior SOTA usages of ResNet [2] which were run sequentially for two months to train over 50 epochs. Wu et al.
instead implement the model in a distributed fashion on a system of 128 linked GPUs. With this system, they were able to
train the model over 60 epochs in 90 hours.
1.3 Fine-tuned checkpoints
Wu et al. produced five checkpoints with their implementation of ResNet. One checkpoint, trained only on ImageNet
data, serves as their baseline. The remaining checkpoints are trained on the Tencent ML-Images database and fine-tuned on
ImageNet, as previous work has demonstrated that this technique improves accuracy (Sun et al. [2]). These checkpoints use either
a fixed learning rate (checkpoint 2) or adaptive learning rates of different types (checkpoints 3-5).
1.4 Evaluation
Wu et al. used instance-level metrics to measure the results of their trained model. These included instance-level precision,
recall, and F1. Wu et al. note their results are not particularly high (F1 scores of 23.3 for top-5 prediction and 28.1 for top-10
prediction) and cite missing examples of certain labels in their validation data as one significant gap in their evaluation.
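As a rough illustration of instance-level metrics for top-k prediction, the sketch below computes precision and recall per image, averages them over images, and derives F1 from the two averages. This is one common definition and an assumption on our part; Wu et al. may aggregate differently (for example, by averaging a per-image F1). The helper names `top_k_labels` and `instance_level_scores` are hypothetical.

```python
def top_k_labels(scores, k):
    """Return the set of the k highest-scoring labels from a {label: score} dict."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def instance_level_scores(predicted, truth):
    """Average per-image precision and recall, then combine them into F1.

    predicted, truth: lists of label sets, one set per image.
    """
    precisions, recalls = [], []
    for pred, true in zip(predicted, truth):
        hits = len(pred & true)
        precisions.append(hits / len(pred) if pred else 0.0)
        recalls.append(hits / len(true) if true else 0.0)
    precision = sum(precisions) / len(precisions)
    recall = sum(recalls) / len(recalls)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example with two images (labels and scores are made up).
predicted = [
    top_k_labels({"dog": 0.9, "animal": 0.8, "cat": 0.4, "car": 0.1}, k=3),
    top_k_labels({"car": 0.7, "road": 0.6, "dog": 0.2}, k=2),
]
truth = [{"dog", "animal"}, {"car"}]
precision, recall, f1 = instance_level_scores(predicted, truth)
```

With a fixed k, precision is capped whenever an image has fewer than k true labels, which is one reason top-k scores on sparsely annotated data can look low.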
1.5 Paper’s impact in the field
Wu et al. present their work as filling an existing gap in the field of visual representation. The large-scale, publicly available
image databases that previously existed used single-label annotations; while multi-label image databases did exist before, they
were relatively small, limiting their usefulness as training tools for complex machine learning models.
The paper asserts that the reliance on single-label image annotations prevents visual representation models from being able to
analyze images with more than one object, or from being able to make inferences about an image’s content based on other
factors of the image (for example, one may conclude that if an image contains a doctor, it is highly likely to be located inside a
hospital).
An additional contribution of the paper is implementing the commonly used ResNet model in a distributed framework (as
discussed in "Implementation of ResNet"), significantly cutting down on the time required to run the model on large-scale
databases.
2. IMPLEMENTATION
2.1 Implementation Challenges
A significant challenge in re-implementing the paper was working around comparatively limited resources. The distributed GPU
framework run by Wu et al. required over 1000 GPU-hours to train on the data, which we determined would require a prohibitively
large budget to run using AWS.
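To make the scale concrete, the arithmetic can be sketched from the figures quoted in "Implementation of ResNet" (128 GPUs, 60 epochs, 90 hours). The price per GPU-hour below is a placeholder chosen only to show the calculation; it is not an actual AWS rate, and the function names are hypothetical.

```python
def gpu_hours(num_gpus, wall_clock_hours):
    """Total GPU-hours consumed by a job that keeps every GPU busy."""
    return num_gpus * wall_clock_hours


def estimated_cost(total_gpu_hours, price_per_gpu_hour):
    """Cost estimate for a given (assumed) hourly price per GPU."""
    return total_gpu_hours * price_per_gpu_hour


total = gpu_hours(num_gpus=128, wall_clock_hours=90)   # 128 * 90 = 11520
hours_per_epoch = 90 / 60                              # 1.5 wall-clock hours per epoch

# Placeholder price, NOT a real AWS figure: substitute the current rate.
cost = estimated_cost(total, price_per_gpu_hour=1.0)
```

Under these figures the full run comes to 11,520 GPU-hours, consistent with the "over 1000 GPU-hours" stated above.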
An additional challenge was the size of the database. The full Tencent ML-Images database consists of 10M images. Each image
varies in storage size, but many take over a MB of storage space on their own. We did not have the storage space necessary to
access the full dataset.
2.2 New Code
Though Wu et al. published the code used to download Tencent ML-Images, establish the ResNet model, and train on the data
sequentially, they did not publish the parallelized version of the code. For this paper, we converted the code to a parallel format
using TensorFlow.
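The idea behind a data-parallel conversion can be illustrated without any framework. The sketch below is not the TensorFlow code from this re-implementation; it is a plain-Python toy, under the assumption of synchronous data parallelism, in which each worker computes a gradient on its shard of the batch and the gradients are averaged before a single weight update. The one-parameter model and all function names are hypothetical.

```python
def gradient(w, batch):
    """Gradient of the mean squared error for the 1-D model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def split_into_shards(batch, num_workers):
    """Split a batch into equally sized shards, one per worker."""
    if len(batch) % num_workers != 0:
        raise ValueError("batch size must be divisible by num_workers")
    size = len(batch) // num_workers
    return [batch[i * size:(i + 1) * size] for i in range(num_workers)]


def data_parallel_step(w, batch, num_workers, lr):
    """One synchronous update: per-shard gradients are averaged, then applied."""
    shards = split_into_shards(batch, num_workers)
    # On real hardware each shard's gradient would be computed on its own GPU.
    grads = [gradient(w, shard) for shard in shards]
    avg_grad = sum(grads) / num_workers  # stands in for the all-reduce step
    return w - lr * avg_grad


batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w_parallel = data_parallel_step(0.0, batch, num_workers=2, lr=0.01)
w_single = 0.0 - 0.01 * gradient(0.0, batch)
```

With equally sized shards, the averaged gradient equals the full-batch gradient, so `w_parallel` and `w_single` agree; the gain from parallelism is wall-clock time, not a different result. In TensorFlow, `tf.distribute.MirroredStrategy` offers this pattern for multiple GPUs on one machine, though the source does not state which mechanism was used.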
We also developed additional tools, based on the provided image download code, to extract only excerpts of the dataset.
3. RESULTS
3.1 Training Data
A significant hurdle in training on the Tencent ML-Images dataset was the size of the data. We found that every 500 images
from the source ImageNet and OpenImage datasets took a full GB of memory, which prevented us from loading the full dataset
into our implementation of the ResNet training. We therefore ran with a significantly reduced training set, which prevented us
from matching the F1 scores obtained on the full dataset.
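A back-of-the-envelope estimate shows why the full dataset was out of reach. The sketch assumes the observed rate of 500 images per GB holds across all 10M images, which is an extrapolation; the 16 GB budget in the second call is a hypothetical figure used only for illustration.

```python
def estimated_gigabytes(num_images, images_per_gb=500):
    """Memory needed if every 500 images take roughly 1 GB."""
    return num_images / images_per_gb


def max_images_for_budget(budget_gb, images_per_gb=500):
    """Largest subset that fits in a given memory budget at the same rate."""
    return int(budget_gb * images_per_gb)


full_dataset_gb = estimated_gigabytes(10_000_000)   # 10M / 500 = 20000 GB
subset_size = max_images_for_budget(16)             # hypothetical 16 GB machine
```

At this rate the full 10M images would need about 20,000 GB, while a 16 GB budget would hold about 8,000 images, a very small fraction of the dataset.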
3.2 Fine Tuning Results
The fine-tuning step further trains the model that was previously trained on the Tencent ML-Images database, starting from the
checkpoint the authors provided for public use. The model is then trained on the entire ImageNet dataset with an image size of
224 by 229. Our re-implementation results are listed below.
4. CONCLUSION
4.1 Literature Review
Though the database is quite large, many image categories are barely represented in the data, and the average number of
images per label is 1447.2. Given that the paper puts so much emphasis on the size of the dataset, it seems that for any single
label a smaller, more specialized dataset would perform the task equally well and at lower time cost.
Another note is that the dataset contains very few human-annotated images, and in fact the process of scraping images from
ImageNet includes additional machine annotations that may not be correct, such as annotating a single-label image with a second
label that had high co-occurrence in the OpenImage annotations. A pre-processing step also selects random bounding boxes from the
original image for use in training, meaning that certain image labels may be rendered entirely incorrect once they are used for
training.
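The cropping concern can be illustrated with a small sketch. The source does not describe the pre-processing in detail, so everything here is an assumption: a uniformly random fixed-size crop, a hypothetical object box for a labeled "dog", and the helper names `random_crop` and `overlap_fraction`.

```python
import random


def random_crop(width, height, crop_w, crop_h, rng):
    """Pick a random crop box (left, top, right, bottom) inside the image."""
    left = rng.randint(0, width - crop_w)
    top = rng.randint(0, height - crop_h)
    return (left, top, left + crop_w, top + crop_h)


def overlap_fraction(obj, crop):
    """Fraction of the object box's area that falls inside the crop box."""
    ix = max(0, min(obj[2], crop[2]) - max(obj[0], crop[0]))
    iy = max(0, min(obj[3], crop[3]) - max(obj[1], crop[1]))
    obj_area = (obj[2] - obj[0]) * (obj[3] - obj[1])
    return (ix * iy) / obj_area


# Hypothetical 500x500 image whose "dog" occupies the bottom-right corner.
dog_box = (400, 400, 480, 480)
rng = random.Random(0)
crops = [random_crop(500, 500, 224, 224, rng) for _ in range(1000)]

# Share of crops that contain none of the dog yet would still carry its label.
missed = sum(1 for crop in crops if overlap_fraction(dog_box, crop) == 0.0)
missed_share = missed / len(crops)
```

Any crop counted in `missed` would be trained with the label "dog" while showing no part of the dog, which is the kind of label error described above.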
Possibly as a result of the above, the paper presents low scores for multi-label prediction. Though in some respects even those
scores are impressive for 11k possible labels, they are a far cry from the informed visual representation that can learn from other
labels that the authors originally envisioned for the database.
4.2 Our Work
An unexpected hurdle was the size of the database, given the naturally large size of image data. Knowing what we
know now about working with image files, we think we would instead have chosen a challenge analyzing text data, to reduce storage
concerns.
Another concern was that the paper did not implement anything particularly novel in its model–rather, the model and
evaluation were designed to demonstrate the usefulness of the new dataset. As our limited resources prevented us from
utilizing the full dataset, this by extension limited our ability to gain useful data from the re-implementation.
REFERENCES
[1] B. Wu et al., “Tencent ML-Images: A large-scale multi-label image database for visual representation learning,” arXiv
preprint arXiv:1901.01703, 2019.
[2] C. Sun, A. Shrivastava, S. Singh, and A. Gupta, “Revisiting unreasonable effectiveness of data in deep learning era,” in ICCV.
IEEE, 2017, pp. 843–852.
