2. Microsoft Azure AI Fundamentals: Computer
Vision
Sreya E P
3. Agenda
•Fundamentals of Computer Vision
•Azure AI Vision
•Fundamentals of Facial Recognition
•Facial Analysis
•Azure Face Services
•Responsible AI Use
•Optical Character Recognition
•Azure AI Vision OCR Engine
•Vision Studio
•Demo
•Conclusion
5. Fundamentals of Computer Vision
Computer vision is one of the core areas of artificial intelligence (AI), and
focuses on creating solutions that enable AI applications to "see" the
world and make sense of it.
6. Azure AI Vision
• While you can train your own machine learning models for
computer vision, the architecture of such models can be
complex, and training them requires significant volumes of
training images and compute power.
• Microsoft's Azure AI Vision service provides prebuilt and
customizable computer vision models that are based on the
Florence foundation model and provide various powerful
capabilities.
7. Analyzing images with the Azure AI Vision service
Azure AI Vision supports multiple image analysis
capabilities, including:
•Optical character recognition (OCR) - extracting
text from images.
•Generating captions and descriptions of images.
•Detection of thousands of common objects in
images.
•Tagging visual features in images.
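The analysis results come back as structured JSON. As a rough illustration, the sketch below parses a hypothetical, simplified response containing a caption and tags; the exact field names vary by API version, so treat the shape as an assumption:

```python
import json

# Hypothetical (simplified) image-analysis response -- field names
# are illustrative and may differ by API version.
sample_response = json.loads("""
{
  "captionResult": {"text": "a dog running on a beach", "confidence": 0.81},
  "tagsResult": {"values": [
    {"name": "dog", "confidence": 0.98},
    {"name": "beach", "confidence": 0.92},
    {"name": "grass", "confidence": 0.41}
  ]}
}
""")

def confident_tags(response, threshold=0.5):
    """Return tag names whose confidence meets the threshold."""
    return [t["name"] for t in response["tagsResult"]["values"]
            if t["confidence"] >= threshold]

print(sample_response["captionResult"]["text"])  # the generated caption
print(confident_tags(sample_response))           # ['dog', 'beach']
```

Filtering tags by confidence, as here, is a common way to keep only the labels the service is reasonably sure about.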
9. Introduction
Face detection and analysis is an area of artificial intelligence (AI) which uses algorithms to locate and analyze human faces
in images or video content.
There are many applications for face detection, analysis, and recognition. For example,
•Security - facial recognition can be used in building security applications, and it is increasingly used in smartphone
operating systems for unlocking devices.
•Social media - facial recognition can be used to automatically tag known friends in photographs.
•Intelligent monitoring - for example, an automobile might include a system that monitors the driver's face to determine
if the driver is looking at the road, looking at a mobile device, or shows signs of tiredness.
•Advertising - analyzing faces in an image can help direct advertisements to an appropriate demographic audience.
•Missing persons - using public camera systems, facial recognition can be used to identify whether a missing person is in
the image frame.
•Identity validation - useful at ports of entry kiosks where a person holds a special entry permit.
10. Understand facial analysis
Face detection involves identifying regions of
an image that contain a human face, typically
by returning bounding box coordinates that
form a rectangle around the face.
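A minimal sketch of working with such a bounding box, assuming the top/left/width/height rectangle shape that the Azure face APIs typically return (the field names here are an assumption):

```python
# A detected face is typically described by a rectangle given as
# top/left/width/height in pixels (field names assumed here).
face_rectangle = {"top": 60, "left": 120, "width": 90, "height": 110}

def rectangle_corners(rect):
    """Convert a top/left/width/height rectangle to corner coordinates."""
    x1, y1 = rect["left"], rect["top"]
    x2, y2 = x1 + rect["width"], y1 + rect["height"]
    return (x1, y1), (x2, y2)

top_left, bottom_right = rectangle_corners(face_rectangle)
print(top_left, bottom_right)  # (120, 60) (210, 170)
```

The corner coordinates are what you would pass to an image library to draw the rectangle over the photo.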
11. Understand facial analysis
With face analysis, machine learning models
can be trained to return additional information
about a face, such as the locations of facial
features like the nose, eyes, eyebrows, and lips.
12. Understand facial analysis
A further application of facial analysis is to train
a machine learning model to identify known
individuals from their facial features. This is
known as facial recognition, and it uses multiple
images of an individual to train the model so
that it can detect those individuals in new
images it wasn't trained on.
13. Get started with facial analysis on Azure
Microsoft Azure provides multiple Azure AI services that you can use to detect and analyze faces, including:
•Azure AI Vision, which offers face detection and some basic face analysis, such as returning the bounding box
coordinates for the faces detected in an image.
•Azure AI Video Indexer, which you can use to detect and identify faces in a video.
•Azure AI Face, which offers pre-built algorithms that can detect, recognize, and analyze faces.
Of these, Face offers the widest range of facial analysis capabilities.
14. Azure AI Face service
The Azure AI Face service can return the rectangle coordinates for any human faces that are found in an image, as
well as a series of related attributes:
• Accessories: indicates whether the given face has accessories. This attribute returns possible accessories
including headwear, glasses, and mask, with confidence score between zero and one for each accessory.
• Blur: how blurred the face is, which can be an indication of how likely the face is to be the main focus of the
image.
• Exposure: whether the face is underexposed or overexposed. This applies to the face in the image
and not the overall image exposure.
• Glasses: whether or not the person is wearing glasses.
• Head pose: the face's orientation in a 3D space.
• Mask: indicates whether the face is wearing a mask.
• Noise: refers to visual noise in the image. A photo taken with a high ISO setting in low light can look grainy
or full of tiny dots that make the image less clear.
• Occlusion: determines if there might be objects blocking the face in the image.
• Quality For Recognition: a rating of high, medium, or low that reflects if the image is of sufficient quality to
attempt face recognition on.
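As an illustration of how these attributes might be used, the sketch below filters hypothetical detection results down to faces worth attempting recognition on. The attribute names follow the list above, but the exact response shape is an assumption:

```python
# Hypothetical detection results -- attribute names mirror the list
# above, but the exact JSON shape depends on the API version.
detected_faces = [
    {"faceId": "a1", "attributes": {"qualityForRecognition": "high",
                                    "glasses": "NoGlasses", "mask": False}},
    {"faceId": "b2", "attributes": {"qualityForRecognition": "low",
                                    "glasses": "Sunglasses", "mask": False}},
    {"faceId": "c3", "attributes": {"qualityForRecognition": "medium",
                                    "glasses": "NoGlasses", "mask": True}},
]

def suitable_for_recognition(faces):
    """Keep only faces rated good enough to attempt recognition on."""
    return [f["faceId"] for f in faces
            if f["attributes"]["qualityForRecognition"] in ("high", "medium")]

print(suitable_for_recognition(detected_faces))  # ['a1', 'c3']
```

Pre-filtering on quality avoids wasting recognition calls on blurry or poorly exposed faces that would likely fail anyway.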
15. Responsible AI Use
Anyone can use the Face service to:
• Detect the location of faces in an image.
• Determine if a person is wearing glasses.
• Determine if there's occlusion, blur, noise, or over/under exposure for any of the faces.
• Return the head pose coordinates for each face in an image.
The Limited Access policy requires customers to submit an intake form to access additional Azure AI Face service capabilities
including:
•Face verification: the ability to compare faces for similarity.
•Face identification: the ability to identify named individuals in an image.
•Liveness detection: the ability to detect and mitigate content and/or behaviors that indicate a violation
of policies (e.g., whether the input video stream is real or fake).
17. Introduction
OCR, or Optical Character Recognition, is a technology that converts different
types of documents, such as scanned paper documents, PDFs, or images
captured by a digital camera, into editable and searchable data.
Key Points About OCR
1. Functionality:
•OCR scans text characters in images and converts them into
machine-encoded text. This includes recognizing printed text,
handwritten text, or other textual content within images.
2. Applications:
•Digitizing documents: converting physical paper documents into
digital formats, making them easier to store, search, and share.
•Text extraction: extracting text from images and PDFs to use in
other applications, such as databases, word processors, and
spreadsheets.
•Automation: automating data entry processes, reducing manual
input errors, and increasing efficiency.
18. Get started with Azure AI Vision
• The ability for computer systems to
process written and printed text is an
area of AI where computer
vision intersects with natural language
processing.
• Vision capabilities are needed to "read"
the text, and then natural language
processing capabilities make sense of it.
• OCR is the foundation of processing text
in images and uses machine learning
models that are trained to recognize
individual shapes as letters, numerals,
punctuation, or other elements of text.
19. Azure AI Vision's OCR Engine
• The Azure AI Vision service has the ability to extract machine-readable text from images. Azure AI Vision's Read API is
the OCR engine that powers text extraction from images, PDFs, and TIFF files. OCR for images is optimized for general,
non-document images, which makes it easier to embed OCR in your user experience scenarios.
• The Read API, otherwise known as the Read OCR engine, uses the latest recognition models and is optimized for images
that have a significant amount of text or considerable visual noise. It can automatically determine the proper
recognition model to use, taking into consideration the number of lines of text, images that include text, and
handwriting.
• The OCR engine takes in an image file and identifies bounding boxes, or coordinates, where items are located within
an image. In OCR, the model identifies bounding boxes around anything that appears to be text in the image.
20. Azure AI Vision's OCR Engine
Calling the Read API returns results
arranged into the following
hierarchy:
•Pages - One for each page of
text, including information about
the page size and orientation.
•Lines - The lines of text on a
page.
•Words - The words in a line of
text, including the bounding box
coordinates and text itself.
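The pages → lines → words hierarchy can be walked with ordinary dictionary traversal. The sketch below flattens a trimmed, hypothetical Read result into plain text; the field names are illustrative, not the exact API contract:

```python
# A trimmed, hypothetical Read API result following the
# pages -> lines -> words hierarchy (field names are illustrative).
read_result = {
    "pages": [{
        "width": 800, "height": 600, "angle": 0,
        "lines": [
            {"text": "Hello world",
             "words": [{"text": "Hello", "boundingBox": [10, 10, 60, 10, 60, 30, 10, 30]},
                       {"text": "world", "boundingBox": [70, 10, 130, 10, 130, 30, 70, 30]}]},
            {"text": "Azure AI Vision",
             "words": [{"text": "Azure", "boundingBox": [10, 40, 60, 40, 60, 60, 10, 60]},
                       {"text": "AI", "boundingBox": [70, 40, 90, 40, 90, 60, 70, 60]},
                       {"text": "Vision", "boundingBox": [100, 40, 160, 40, 160, 60, 100, 60]}]},
        ],
    }]
}

def extract_text(result):
    """Flatten the pages -> lines hierarchy into a single string."""
    return "\n".join(line["text"]
                     for page in result["pages"]
                     for line in page["lines"])

print(extract_text(read_result))
```

The word-level bounding boxes are what you would use to highlight recognized text on top of the original image.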
21. Get started with Vision Studio on Azure
To use the Azure AI Vision service you must first create a resource for it in your Azure subscription. You can use
either of the following resource types:
•Azure AI Vision: A specific resource for vision services. Use this resource type if you don't intend to use any
other AI services, or if you want to track utilization and costs for your AI Vision resource separately.
•Azure AI services: A general resource that includes Azure AI Vision along with many other Azure AI services
such as Azure AI Language, Azure AI Speech, and others. Use this resource type if you plan to use multiple
Azure AI services and want to simplify administration and development.
Once you've created a resource, there are several ways to use Azure AI Vision's Read API:
•Vision Studio
•REST API
•Software Development Kits (SDKs): Python, C#, JavaScript
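For the REST route, a call is a plain HTTPS POST that carries the resource key in a header. The sketch below only constructs such a request with the standard library; the endpoint and key are placeholders, and the request is not actually sent, since that requires a live Azure resource:

```python
import urllib.request

# Placeholder endpoint and key -- substitute the values from your own
# Azure AI Vision resource. The request is constructed but not sent.
endpoint = "https://my-vision-resource.cognitiveservices.azure.com"
key = "<your-resource-key>"

request = urllib.request.Request(
    url=endpoint + "/computervision/imageanalysis:analyze"
        "?api-version=2023-10-01&features=read",
    data=b'{"url": "https://example.com/sample-image.png"}',
    headers={
        "Ocp-Apim-Subscription-Key": key,   # authenticates the call
        "Content-Type": "application/json",
    },
    method="POST",
)

print(request.full_url)
print(request.get_method())  # POST
```

Sending it would be a single `urllib.request.urlopen(request)` call, with the JSON result arriving in the pages/lines/words hierarchy described earlier.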