Unlocking Text Analytics with Amazon Comprehend

We have been exploring the Google and Azure Document Intelligence features lately; in this article we focus on the AWS document intelligence offerings (mainly Amazon Comprehend). Intelligent Document Processing (IDP) on AWS consists of AI services that automate the extraction of data from documents and text. Amazon Comprehend is a fully managed natural language processing (NLP) service from AWS that uses machine learning to uncover insights in text. It can automatically analyze blocks of text and identify useful information without requiring you to train your own NLP models. The picture below shows the key IDP-related AWS services (including Comprehend).

(Figure: key AWS IDP-related services, including Amazon Comprehend)

Amazon Comprehend provides out-of-the-box (OOTB) models that are ready to use without training, and it also supports custom models for classification and entity recognition when we need to define our own categories or entity types. These services appear to be better suited to running-text documents such as contracts and mortgage documents. However, the same analysis can also be applied to the text extracted from form documents.

[Key capabilities of Amazon Comprehend] 

[1] Entity Recognition - Identifying named entities in text, such as people, places, organizations, dates, quantities, etc. For example, it can find names like “Sunil Kumar” (Person) or “Pune” (Location) in a sentence. 
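
The same capability is available outside the console through the AWS SDK. The snippet below is a minimal sketch, assuming boto3 is installed and AWS credentials are configured; the region name, the 0.8 score threshold, the helper names, and the sample response are illustrative assumptions, not output captured from the service.

```python
from collections import defaultdict


def group_entities(response, min_score=0.8):
    """Group entity texts by type from a DetectEntities-style response."""
    grouped = defaultdict(list)
    for entity in response.get("Entities", []):
        # Keep only entities the model is reasonably confident about.
        if entity["Score"] >= min_score:
            grouped[entity["Type"]].append(entity["Text"])
    return dict(grouped)


def detect_entities(text, language_code="en", region_name="us-east-1"):
    """Call Amazon Comprehend. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so group_entities works offline

    client = boto3.client("comprehend", region_name=region_name)
    return client.detect_entities(Text=text, LanguageCode=language_code)


# Hand-written sample in the documented response shape (not real output).
sample_response = {
    "Entities": [
        {"Text": "Sunil Kumar", "Type": "PERSON", "Score": 0.99,
         "BeginOffset": 0, "EndOffset": 11},
        {"Text": "Pune", "Type": "LOCATION", "Score": 0.98,
         "BeginOffset": 21, "EndOffset": 25},
        {"Text": "soon", "Type": "DATE", "Score": 0.41,
         "BeginOffset": 26, "EndOffset": 30},
    ]
}
print(group_entities(sample_response))
```

With credentials in place, `group_entities(detect_entities("Sunil Kumar moved to Pune soon"))` would run the same grouping on a live response.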

[2] Sentiment Analysis - Determines the overall sentiment or tone (positive, negative, neutral, or mixed) and provides a confidence score for each. This is useful for analyzing customer feedback or reviews.
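
As a rough illustration of how the label and its confidence scores fit together, here is a small sketch, assuming boto3 and configured credentials for the live call. The helper name `summarize_sentiment`, the region, and the sample values are assumptions made up for the example.

```python
def summarize_sentiment(response):
    """Return (label, confidence) from a DetectSentiment-style response."""
    label = response["Sentiment"]  # POSITIVE, NEGATIVE, NEUTRAL or MIXED
    # SentimentScore keys are capitalized: Positive, Negative, Neutral, Mixed.
    score = response["SentimentScore"][label.capitalize()]
    return label, round(score, 2)


def detect_sentiment(text, language_code="en", region_name="us-east-1"):
    """Call Amazon Comprehend. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so summarize_sentiment works offline

    client = boto3.client("comprehend", region_name=region_name)
    return client.detect_sentiment(Text=text, LanguageCode=language_code)


# Hand-written sample in the documented response shape (not real output).
sample_response = {
    "Sentiment": "NEGATIVE",
    "SentimentScore": {
        "Positive": 0.02, "Negative": 0.91, "Neutral": 0.06, "Mixed": 0.01,
    },
}
print(summarize_sentiment(sample_response))
```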

[3] Key Phrase Extraction – Extracts important key phrases (mainly noun phrases) from text. This behaves a little differently from key phrase extraction in the GCP or Azure document intelligence services. In our test it returned many junk words that we would not generally consider key phrases; this could be due to the nature of the text (a text extract from a form document was used here).
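
One possible way to cut down the junk is to post-filter the returned phrases. The sketch below is only an illustration of that idea: the score threshold, the minimum length, and the helper names are assumptions to tune against your own documents, and the sample response is hand-written rather than real service output.

```python
def filter_key_phrases(response, min_score=0.9, min_chars=4):
    """Keep confident, non-trivial, de-duplicated key phrases."""
    kept, seen = [], set()
    for phrase in response.get("KeyPhrases", []):
        text = phrase["Text"].strip()
        if phrase["Score"] < min_score:
            continue  # low-confidence phrase
        if len(text) < min_chars:
            continue  # very short fragment, e.g. a form label like "No"
        if not any(ch.isalpha() for ch in text):
            continue  # numbers or punctuation only
        if text.lower() in seen:
            continue  # case-insensitive duplicate
        seen.add(text.lower())
        kept.append(text)
    return kept


def detect_key_phrases(text, language_code="en", region_name="us-east-1"):
    """Call Amazon Comprehend. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so filter_key_phrases works offline

    client = boto3.client("comprehend", region_name=region_name)
    return client.detect_key_phrases(Text=text, LanguageCode=language_code)


# Hand-written sample in the documented response shape (not real output).
sample_response = {
    "KeyPhrases": [
        {"Text": "Loan Application", "Score": 0.99},
        {"Text": "1234", "Score": 0.97},
        {"Text": "No", "Score": 0.95},
        {"Text": "loan application", "Score": 0.93},
        {"Text": "the borrower", "Score": 0.62},
    ]
}
print(filter_key_phrases(sample_response))
```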

[4] Language Detection – Identifies the dominant language of a text (from among 100+ supported languages) along with a confidence score.  
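
A typical use of this feature is to pick a language code before calling the other analyses. The sketch below assumes boto3 and configured credentials for the live call; the 0.5 threshold, region, helper names, and sample values are illustrative assumptions.

```python
def dominant_language(response, min_score=0.5):
    """Return the highest-scoring language code, or None if unsure."""
    languages = response.get("Languages", [])
    if not languages:
        return None
    best = max(languages, key=lambda lang: lang["Score"])
    return best["LanguageCode"] if best["Score"] >= min_score else None


def detect_language(text, region_name="us-east-1"):
    """Call Amazon Comprehend. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so dominant_language works offline

    client = boto3.client("comprehend", region_name=region_name)
    return client.detect_dominant_language(Text=text)


# Hand-written sample in the documented response shape (not real output).
sample_response = {
    "Languages": [
        {"LanguageCode": "hi", "Score": 0.12},
        {"LanguageCode": "en", "Score": 0.87},
    ]
}
print(dominant_language(sample_response))
```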

[5] Syntax Analysis – Analyzes the syntax of text by tagging each word with its part of speech (noun, verb, adjective, and so on).
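
To show what the part-of-speech tags look like programmatically, here is a small sketch that pulls out the words carrying a given tag. It assumes boto3 and configured credentials for the live call; the helper names, region, threshold, and sample tokens are made up for illustration.

```python
def words_with_tag(response, tag="NOUN", min_score=0.8):
    """Return token texts whose part-of-speech tag matches `tag`."""
    return [
        token["Text"]
        for token in response.get("SyntaxTokens", [])
        if token["PartOfSpeech"]["Tag"] == tag
        and token["PartOfSpeech"]["Score"] >= min_score
    ]


def detect_syntax(text, language_code="en", region_name="us-east-1"):
    """Call Amazon Comprehend. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so words_with_tag works offline

    client = boto3.client("comprehend", region_name=region_name)
    return client.detect_syntax(Text=text, LanguageCode=language_code)


# Hand-written sample in the documented response shape (not real output).
sample_response = {
    "SyntaxTokens": [
        {"TokenId": 1, "Text": "Sunil", "PartOfSpeech": {"Tag": "PROPN", "Score": 0.99}},
        {"TokenId": 2, "Text": "lives", "PartOfSpeech": {"Tag": "VERB", "Score": 0.98}},
        {"TokenId": 3, "Text": "in", "PartOfSpeech": {"Tag": "ADP", "Score": 0.99}},
        {"TokenId": 4, "Text": "Pune", "PartOfSpeech": {"Tag": "PROPN", "Score": 0.97}},
    ]
}
print(words_with_tag(sample_response, tag="PROPN"))
```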

[6] PII Detection – Detects personally identifiable information (PII), such as addresses and credit card numbers, in the text.
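
The PII response reports each match as a type with character offsets rather than the matched text, so a common follow-up is to mask those spans. The sketch below assumes boto3 and configured credentials for the live call and plain ASCII input for the offsets; the helper names, threshold, and sample values are illustrative assumptions.

```python
def redact_pii(text, response, min_score=0.5):
    """Replace each detected PII span with its type, e.g. [NAME]."""
    redacted = text
    # Work from the end of the text so earlier offsets stay valid.
    entities = sorted(
        response.get("Entities", []),
        key=lambda entity: entity["BeginOffset"],
        reverse=True,
    )
    for entity in entities:
        if entity["Score"] >= min_score:
            redacted = (
                redacted[: entity["BeginOffset"]]
                + f"[{entity['Type']}]"
                + redacted[entity["EndOffset"]:]
            )
    return redacted


def detect_pii(text, language_code="en", region_name="us-east-1"):
    """Call Amazon Comprehend. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so redact_pii works offline

    client = boto3.client("comprehend", region_name=region_name)
    return client.detect_pii_entities(Text=text, LanguageCode=language_code)


# Hand-written sample in the documented response shape (not real output).
sample_text = "Card 4111111111111111 belongs to Sunil Kumar."
sample_response = {
    "Entities": [
        {"Type": "CREDIT_DEBIT_NUMBER", "Score": 0.99, "BeginOffset": 5, "EndOffset": 21},
        {"Type": "NAME", "Score": 0.98, "BeginOffset": 33, "EndOffset": 44},
    ]
}
print(redact_pii(sample_text, sample_response))
```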

[7] Topic Modeling – Groups a collection of documents into topics (unsupervised document classification based on common themes). Follow the steps below to create an analysis job and explore this feature.

  • Navigate to Analysis jobs in the Amazon Comprehend console

  • Start a New Job  

  • Job Name – Under Job settings enter a name for your job 

  • Analysis Type – Choose Topic modeling from the dropdown of analysis types 

  • Specify Input Data – Under Input data, choose the Amazon S3 location of the source documents

  • Specify Output Data – choose where the results of the job will be stored  

  • Create the Job – Review your settings, then click “Create job” 

  • Initially, the job status will show as “In Progress”

  • View Job Results  

  • Inside the output, you will find at least two CSV files: topic-terms.csv and doc-topics.csv 
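
The console steps above can also be sketched with the SDK. This is a minimal illustration, assuming boto3, configured credentials, and an IAM role that Comprehend can use to read and write the S3 locations. The function names, region, and sample rows are hypothetical, and the parser assumes the columns of topic-terms.csv are ordered topic, term, weight with a header row.

```python
import csv
import io
from collections import defaultdict


def build_topics_job_request(job_name, input_s3_uri, output_s3_uri,
                             role_arn, number_of_topics=10):
    """Build the parameters for a topic modeling job."""
    return {
        "JobName": job_name,
        "NumberOfTopics": number_of_topics,
        "InputDataConfig": {"S3Uri": input_s3_uri,
                            "InputFormat": "ONE_DOC_PER_FILE"},
        "OutputDataConfig": {"S3Uri": output_s3_uri},
        "DataAccessRoleArn": role_arn,
    }


def start_topics_job(request, region_name="us-east-1"):
    """Submit the job and return its id. Needs boto3 and credentials."""
    import boto3  # imported lazily so the helpers work offline

    client = boto3.client("comprehend", region_name=region_name)
    return client.start_topics_detection_job(**request)["JobId"]


def top_terms_by_topic(topic_terms_csv, top_n=3):
    """Return the highest-weighted terms per topic from topic-terms.csv text."""
    reader = csv.reader(io.StringIO(topic_terms_csv), skipinitialspace=True)
    next(reader, None)  # skip the header row (assumed present)
    topics = defaultdict(list)
    for row in reader:
        if len(row) < 3:
            continue  # ignore blank or malformed lines
        topic, term, weight = row[0], row[1], row[2]
        topics[int(topic)].append((term, float(weight)))
    return {
        topic: [term for term, _ in
                sorted(terms, key=lambda pair: pair[1], reverse=True)[:top_n]]
        for topic, terms in topics.items()
    }


# Hand-written sample rows (not real job output).
sample_csv = "\n".join([
    "topic,term,weight",
    "000,loan,0.12",
    "000,mortgage,0.30",
    "000,rate,0.05",
    "001,policy,0.20",
    "001,claim,0.25",
])
print(top_terms_by_topic(sample_csv, top_n=2))
```

After submitting, the job status can be polled with `describe_topics_detection_job(JobId=...)` until it leaves the in-progress state, and the CSV files can then be downloaded from the output S3 location.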

Note - More on this will be covered in upcoming articles.

This covers some of the common Amazon Comprehend features. AWS continuously updates these services, so keep checking the documentation to stay current (flywheels, custom classification, and custom entity recognition are some of the features we did not cover here). In upcoming articles, we will explore how to use these services in more depth and build our understanding of Amazon's document and text intelligence offerings.

[Summary] 

In this article we covered how to use Amazon Comprehend through the AWS Management Console to perform basic NLP tasks on the text layer of an image file. With this understanding you should be able to try out Amazon Comprehend using the AWS console. You can experiment with your own text in real time to see how Comprehend extracts insights from it. For example, you can paste a paragraph from a news article to see which entities are mentioned and whether the sentiment is positive or negative. You can also run analysis jobs on larger text data (using S3) to process many documents at once. The console is a friendly way to get familiar with Comprehend's capabilities before automating things via the AWS SDKs, APIs, or CLI.




