UNIT III
Cloud APIs for Computer Vision: The landscape of visual Recognition APIs, Clarifai, Microsoft
cognitive services, Google Cloud Vision, IBM Watson Visual Recognition. Getting up and running with
cloud APIs, Training our custom classifier. Performance tuning for cloud APIs: Effect of Resizing on
image labelling APIs, Effect of Compression on Image Labelling APIs, Effect of compression on OCR
APIs, Effect of Resizing on OCR APIs
Google Cloud ML Engine: Pros of using cloud ML Engine, Cons of using Cloud ML Engine, Building
classification API, TensorFlow Serving, KubeFlow: Pipelines, Fairing.
Edge ML: Constraints and Optimizations, TensorFlow Lite, Running TensorFlow Lite, Processing the
Image Buffer, Federated Learning
The Landscape of Visual Recognition APIs
Clarifai
• Clarifai was one of the first visual recognition APIs, started by Matthew Zeiler, a graduate student
from New York University.
• It offers multilingual tagging in more than 23 languages, visual similarity search among previously
uploaded photographs, face-based multicultural appearance classifier, photograph aesthetic
scorer, focus scorer, and embedding vector generation to help us build our own reverse-image search.
• It also offers recognition in specialized domains including clothing and fashion, travel and
hospitality, and weddings.
• Through its public API, the image tagger supports 11,000 concepts.
Microsoft Cognitive Services
• With the creation of ResNet-152 in 2015, Microsoft won seven tasks at the ILSVRC, the COCO Image
Captioning Challenge, and the Emotion Recognition in the Wild challenge, spanning classification,
detection (localization), and image description.
• Originally started as Project Oxford at Microsoft Research in 2015, it was renamed Cognitive
Services in 2016, and much of this research was translated into cloud APIs.
• It’s a comprehensive set of more than 50 APIs ranging from vision, natural language processing, speech,
search, knowledge graph linkage, and more.
• Historically, many of the same libraries were run by divisions at Xbox (image tagging) and Bing
(image search and tagging), but they are now exposed to external developers.
• Some viral applications showcasing creative ways developers use these APIs include how-old.net (How Old Do I
Look?), Mimicker Alarm (which requires making a particular facial expression in order to defuse the morning
alarm), and CaptionBot.ai.
As illustrated in Figure 8-2, the API offers image captioning, handwriting understanding, and
headwear recognition. Because it serves many enterprise customers, Cognitive Services does not use
customer image data to improve its services.
Google Cloud Vision
• Google won the 2014 ILSVRC (ImageNet Large Scale Visual Recognition Challenge)
using GoogLeNet, a deep 22-layer neural network.
• This led to the development of the now-standard Inception architectures.
• In December 2015, Google released a set of Vision APIs to complement the
Inception models.
• With access to vast amounts of consumer data, Google can significantly improve its
classifiers.
• For instance, insights from Google Street View help enhance real-world text
extraction, such as reading billboards.
• For human faces, it provides the most detailed facial key points (Figure 8-3)
including roll, tilt, and pan to accurately localize the facial features.
• The APIs also return images from the web that are similar to the given input. A simple way to
try out the performance of Google’s system without writing code is to upload
photographs to Google Photos and search through the tags.
Amazon Rekognition
• The Amazon Rekognition API is largely based on Orbeus, a Sunnyvale, California-based
startup that Amazon acquired in late 2015.
• Founded in 2012, its chief scientist also had winning entries in the ILSVRC 2014
detection challenge.
• The same APIs powered PhotoTime, a popular photo-organization app.
The API’s services are available as part of the AWS offerings.
• Rekognition also offers license plate recognition, video recognition APIs, and end-to-end
integration examples with AWS offerings such as Kinesis Video
Streams and Lambda.
• Also, Amazon’s API is the only one that can determine whether the subject’s eyes
are open or closed.
IBM Watson Visual Recognition
• Under the Watson brand, IBM’s Visual Recognition offering started in early 2015.
• After purchasing AlchemyAPI, a Denver-based deep learning startup, IBM used its
AlchemyVision technology to power the Visual Recognition APIs.
• Like others, IBM also offers custom classifier training.
Algorithmia
• Algorithmia is a marketplace for hosting algorithms as APIs on the cloud.
• Founded in 2013, this Seattle-based startup hosts both its own in-house algorithms and
those created by others. In testing, this API tended to have the slowest response time.
• It offers a colorization service for black-and-white photos (Figure 8-6), image
stylization, image similarity, and the ability to run these services on premises
or on any cloud provider.
Getting Up and Running with Cloud APIs
• Calling these cloud services requires minimal code.
• At a high level, get an API key, load the image, specify the intent, make a POST
request with the proper encoding (e.g., base64 for the image), and receive the
results.
• Most of the cloud providers offer software development kits (SDKs) and sample code
showcasing how to call their services.
• They additionally provide pip installable Python packages to further simplify calling
them.
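The high-level flow above can be sketched in a few lines of Python. This is a generic pattern, not any one provider's SDK: the endpoint URL, payload keys, and `Authorization` header below are placeholders that vary by provider, so check the relevant API reference before using it.

```python
import base64
import json
from urllib import request


def encode_image(image_path):
    """Read an image file and return its base64 string, as most JSON APIs expect."""
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


def tag_image(image_path, endpoint, api_key):
    """Generic pattern: encode the image, POST it, parse the JSON response.
    The exact URL, payload shape, and auth header differ per provider."""
    payload = json.dumps({"image": encode_image(image_path)}).encode("utf-8")
    req = request.Request(
        endpoint,
        data=payload,
        headers={
            "Content-Type": "application/json",
            # Some providers use a custom header instead of a bearer token
            "Authorization": "Bearer " + api_key,
        },
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())
```

The provider SDKs and pip packages mentioned above wrap exactly this request/response cycle.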
• Now, let’s test the same image using the Google Cloud Vision API. Get an API key from the
Google Cloud website and use it in the code.
google_cloud_tagimage('DogAndBaby.jpg')
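The `google_cloud_tagimage` helper is not defined on the slides; a minimal sketch against the Cloud Vision REST endpoint (`images:annotate`) might look like the following. The `LABEL_DETECTION` feature type and request shape follow Google's REST reference; the key-in-query-string auth is a simplification.

```python
import base64
import json
from urllib import request

VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_annotate_request(image_b64, max_results=10):
    """Payload shape for a label-detection request, per the Cloud Vision REST API."""
    return {"requests": [{
        "image": {"content": image_b64},
        "features": [{"type": "LABEL_DETECTION", "maxResults": max_results}],
    }]}


def google_cloud_tagimage(image_path, api_key):
    """Send one image for label detection and return the parsed JSON response."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    body = json.dumps(build_annotate_request(image_b64)).encode("utf-8")
    req = request.Request(
        VISION_URL + "?key=" + api_key,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        # Labels arrive under responses[0]["labelAnnotations"]
        return json.loads(resp.read())
```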
Similarly, we can tag the same image with Microsoft Cognitive Services:
cognitive_services_tagimage('DogAndBaby.jpg')
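Likewise, `cognitive_services_tagimage` is left undefined on the slides; a sketch against the Computer Vision "analyze" endpoint might look like this. The v2.0 URL, the `Ocp-Apim-Subscription-Key` header, and the raw octet-stream upload follow Microsoft's REST reference, but the region and API version are assumptions to adjust for your own resource.

```python
import json
from urllib import request


def analyze_url(region, features):
    """Build the Computer Vision 'analyze' URL for a given Azure region."""
    return ("https://{}.api.cognitive.microsoft.com/vision/v2.0/analyze"
            "?visualFeatures={}").format(region, features)


def cognitive_services_tagimage(image_path, api_key,
                                region="westus",
                                features="Description,Tags"):
    """POST raw image bytes to the analyze endpoint and return the parsed JSON."""
    with open(image_path, "rb") as f:
        image_bytes = f.read()
    req = request.Request(
        analyze_url(region, features),
        data=image_bytes,  # raw bytes, not base64, for octet-stream uploads
        headers={
            "Content-Type": "application/octet-stream",
            "Ocp-Apim-Subscription-Key": api_key,
        },
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())
```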
Training Our Own Custom Classifier
A few of these cloud providers give us the ability to train our own
custom classifier using a simple drag-and-drop interface. The
polished user interfaces give no indication that, under the hood,
they are using transfer learning. Cognitive Services
Custom Vision, Google AutoML, Clarifai, and IBM Watson all provide
us the option for custom training.
Additionally, some of them even allow building custom detectors,
which can identify the location of objects with a bounding box.
The key process in all of them is the following:
1. Upload images
2. Label them
3. Train a model
4. Evaluate the model
5. Publish the model as a REST API
6. Bonus: Download a mobile-friendly model for inference on
smartphones and edge devices
Here is a step-by-step example using Microsoft’s Custom
Vision.
1. Create a project (Figure 8-14): Choose a domain
that best describes our use case. For most purposes,
“General” would be optimal. For more specialized
scenarios, we might want to choose a relevant
domain.
As an example, if we have an ecommerce website with
photos of products against a pure white background, we
might want to select the “Retail” domain.
If we intend to run this model on a mobile phone
eventually, we should choose the “Compact” version
of the model, instead; it is smaller in size with only a
slight loss in accuracy.
2. Upload (Figure 8-15): For each category, upload
images and tag them.
It’s important to upload at least 30 photographs per
category.
For our test, we uploaded more than 30 images of
Maltese dogs and tagged them appropriately.
3. Train (Figure 8-16): Click the Train button and, in about three minutes, we
have a brand-new classifier ready.
4. Analyze the model’s performance: Check the precision and recall of the model. By default,
the system sets the threshold at 90% confidence and gives the precision and recall metrics at
that value.
For higher precision, increase the confidence threshold. This would come at the expense of
reduced recall. Figure 8-17 shows example output.
5. Ready to go: We now have a production-ready API endpoint that we can call from any
application.
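Calling that published endpoint is a plain HTTP POST. In the sketch below, the `Prediction-Key` header and octet-stream upload follow the Custom Vision prediction API, but the endpoint URL itself is a placeholder you copy from the project's Performance tab.

```python
import json
from urllib import request


def parse_predictions(result):
    """Extract (tag, probability) pairs from a Custom Vision prediction response."""
    return [(p["tagName"], p["probability"]) for p in result.get("predictions", [])]


def classify_image(image_path, prediction_key, endpoint_url):
    """POST an image to a published Custom Vision endpoint.
    `endpoint_url` is the prediction URL shown in the portal for your
    project and iteration; do not hardcode it."""
    with open(image_path, "rb") as f:
        data = f.read()
    req = request.Request(
        endpoint_url,
        data=data,
        headers={
            "Content-Type": "application/octet-stream",
            "Prediction-Key": prediction_key,
        },
    )
    with request.urlopen(req) as resp:
        return parse_predictions(json.loads(resp.read()))
```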
Performance Tuning for Cloud APIs
A photograph taken by a modern cell phone can have a high resolution and be
upward of 4 MB in size.
Depending on the network quality, it can take a few seconds to upload such an
image to the service.
There are two ways to reduce the size of the image:
Resizing
Most CNNs take an input image with a size of 224 x 224 or 448 x 448 pixels. Much of a cell
phone photo’s resolution would be unnecessary for a CNN. It would make sense to
downsize the image prior to sending it over the network, instead of sending a large image
over the network and then downsizing it on the server.
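A resize-before-upload step might look like the sketch below. `fit_within` preserves the aspect ratio, and the 448-pixel cap is an assumption based on the input sizes above, not a universal rule; Pillow is imported inside the function so the pure helper needs no dependencies.

```python
import io


def fit_within(width, height, max_side=448):
    """New (width, height) so the longer side is at most max_side,
    preserving aspect ratio; already-small images are left untouched."""
    scale = max_side / max(width, height)
    if scale >= 1:
        return width, height
    return round(width * scale), round(height * scale)


def resize_for_api(image_path, max_side=448):
    """Return JPEG bytes of a downsized copy, ready to POST.
    Requires Pillow (pip install Pillow)."""
    from PIL import Image  # lazy import: only needed when actually resizing
    img = Image.open(image_path)
    img = img.resize(fit_within(img.width, img.height, max_side))
    buf = io.BytesIO()
    img.convert("RGB").save(buf, format="JPEG")
    return buf.getvalue()
```

For example, a 4000 × 3000 photo is downsized to 448 × 336 before it ever leaves the device.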
Compression
Most image libraries perform lossy compression while saving a file. Even a little bit of
compression can go a long way in reducing the size of the image while minimally affecting
the quality of the image itself. Compression does introduce noise, but CNNs are usually
robust enough to deal with some of it.
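A compression step can be sketched with Pillow's JPEG `quality` save option (roughly 1–95; lower means smaller files and more artifacts). The default of 70 below is an arbitrary illustration, not a recommendation from the slides; in practice you would measure the effect on the API's labels, as the following sections do.

```python
import io


def compress_jpeg(image_path, quality=70):
    """Re-encode an image as JPEG at the given quality and return the bytes.
    Requires Pillow (pip install Pillow)."""
    from PIL import Image  # lazy import, as above
    img = Image.open(image_path)
    buf = io.BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=quality)
    return buf.getvalue()
```

Comparing `len(compress_jpeg(path, q))` across quality values against the original file size shows how quickly the payload shrinks.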