Unlocking Text Analytics with Amazon Comprehend

Vijay Chaudhary

Lead Software Engineer

Published May 4, 2025

We have been exploring the Google and Azure Document Intelligence features lately. In this article we focus on the AWS document intelligence offerings, mainly Amazon Comprehend. Intelligent Document Processing (IDP) on AWS consists of AI services that automate the extraction of data from documents and text. Amazon Comprehend is a fully managed natural language processing (NLP) service from AWS that uses machine learning to uncover insights in text. It can automatically analyze blocks of text and identify useful information without requiring you to train your own NLP models. The picture below shows the key AWS IDP-related services (including Comprehend).

Amazon Comprehend provides out-of-the-box (OOTB) models that are ready to use without training, and it also supports custom models for classification and entity recognition when we need to define our own categories or entity types. These services appear to be better suited to running-text documents such as contracts and mortgage documents. However, the analysis can also be run on the text extracted from form documents.

[Key capabilities of Amazon Comprehend]

[1] Entity Recognition - Identifies named entities in text, such as people, places, organizations, dates, and quantities. For example, it can find names like “Sunil Kumar” (Person) or “Pune” (Location) in a sentence.

[2] Sentiment Analysis - Determines the overall sentiment or tone (positive, negative, neutral, or mixed) and provides confidence scores. This is useful for analyzing customer feedback or reviews.

[3] Key Phrase Extraction - Extracts important key phrases (mainly noun phrases) from text. This is a little different from key phrase extraction in the GCP or Azure Document Intelligence services. In our test it returned many junk words that we would not generally consider key phrases; this could be due to the nature of the text (the text extract of a form document was used here).

[4] Language Detection – Identifies the dominant language of a text (from among 100+ supported languages) along with a confidence score.

[5] Syntax Analysis - Analyzes the syntax of text by tagging each word with its part of speech.

[6] PII Detection - Detects personally identifiable information (PII) such as addresses and credit card numbers in the text.

[7] Topic Modeling - Groups a collection of documents into topics (unsupervised document classification based on common themes). Follow the steps below to create a job to explore this feature.

Navigate to Analysis jobs in the Amazon Comprehend console

Start a New Job

Job Name – Under Job settings enter a name for your job

Analysis Type – Choose Topic modeling from the dropdown of analysis types

Specify Input Data – Under Input data, choose the Amazon S3 location of the source documents

Specify Output Data – Choose the Amazon S3 location where the results of the job will be stored

Create the Job – Review your settings, then click “Create job”

Initially, the job status will show as “In Progress”

View Job Results

Inside the output, you will find at least two CSV files: topic-terms.csv and doc-topics.csv
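The console steps above can also be sketched with the AWS SDK for Python (boto3). This is a minimal sketch, not the only way to do it: the helper names, the default region, the bucket URIs, and the IAM role ARN are all placeholders we introduce for illustration. It assumes a role that Comprehend can use to read the input location and write the output location already exists.

```python
import time


def make_comprehend_client(region_name="us-east-1"):
    """Create a Comprehend client (region is an assumed default).

    boto3 is imported lazily so the helpers below can also be exercised
    with a stand-in client object.
    """
    import boto3  # needs the boto3 package and configured AWS credentials
    return boto3.client("comprehend", region_name=region_name)


def start_topic_job(client, job_name, input_s3_uri, output_s3_uri, role_arn,
                    number_of_topics=10):
    """Submit a topic modeling job, mirroring the console steps above."""
    response = client.start_topics_detection_job(
        JobName=job_name,
        NumberOfTopics=number_of_topics,
        InputDataConfig={
            "S3Uri": input_s3_uri,
            # One document per file; "ONE_DOC_PER_LINE" is the alternative.
            "InputFormat": "ONE_DOC_PER_FILE",
        },
        OutputDataConfig={"S3Uri": output_s3_uri},
        DataAccessRoleArn=role_arn,
    )
    return response["JobId"]


def wait_for_topic_job(client, job_id, poll_seconds=60, max_polls=120):
    """Poll the job until it reaches a terminal status and return its properties."""
    for _ in range(max_polls):
        props = client.describe_topics_detection_job(JobId=job_id)[
            "TopicsDetectionJobProperties"
        ]
        if props["JobStatus"] in ("COMPLETED", "FAILED", "STOPPED"):
            return props
        time.sleep(poll_seconds)
    raise TimeoutError(f"Job {job_id} not finished after {max_polls} polls")


# Example usage (placeholder values, not real resources):
# client = make_comprehend_client()
# job_id = start_topic_job(
#     client, "demo-topics", "s3://my-input-bucket/docs/",
#     "s3://my-output-bucket/results/",
#     "arn:aws:iam::123456789012:role/my-comprehend-role",
# )
# print(wait_for_topic_job(client, job_id)["JobStatus"])
```

Passing the client in as a parameter keeps the helpers easy to try against a stub before pointing them at a real account.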

Note - More on this will be covered in upcoming articles.
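As a small preview, the sketch below groups the rows of topic-terms.csv by topic and keeps the highest-weighted terms. It assumes the file has the columns topic, term, and weight; check the header of your own output before relying on that. The sample rows are made up for illustration and are not real job output.

```python
import csv
import io
from collections import defaultdict


def top_terms_per_topic(topic_terms_csv, limit=5):
    """Return {topic: [terms]} with the highest-weighted terms first.

    Assumes a header row with the columns topic, term, weight.
    """
    by_topic = defaultdict(list)
    for row in csv.DictReader(io.StringIO(topic_terms_csv)):
        by_topic[int(row["topic"])].append((row["term"], float(row["weight"])))
    return {
        topic: [term for term, _ in
                sorted(terms, key=lambda pair: pair[1], reverse=True)[:limit]]
        for topic, terms in sorted(by_topic.items())
    }


# Made-up sample in the assumed layout.
sample = """topic,term,weight
0,loan,0.12
0,mortgage,0.30
1,invoice,0.25
0,interest,0.05
1,payment,0.40
"""

print(top_terms_per_topic(sample, limit=2))
# {0: ['mortgage', 'loan'], 1: ['payment', 'invoice']}
```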

This covers some of the common Amazon Comprehend features. AWS updates these services continuously, so keep checking the documentation to stay up to date (flywheels, custom classification, and custom entity recognition are some of the features we didn't cover here). In upcoming articles, we will explore how to use these services in more depth and build our understanding of Amazon's document and text intelligence offerings.
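Several of the capabilities listed earlier (language detection, key phrase extraction, PII detection) are also available as real-time calls in boto3. The sketch below is one possible way to wrap them; the helper names and thresholds are our own assumptions. The score and length filter is just a simple attempt to drop the junk key phrases mentioned in [3], and the masking step assumes the returned offsets index into the Python string as character positions.

```python
def dominant_language(client, text):
    """Return (language_code, score) for the highest-scoring detected language."""
    languages = client.detect_dominant_language(Text=text)["Languages"]
    best = max(languages, key=lambda lang: lang["Score"])
    return best["LanguageCode"], best["Score"]


def confident_key_phrases(client, text, language_code="en",
                          min_score=0.9, min_chars=4):
    """Keep key phrases that are confident and not trivially short.

    min_score and min_chars are illustrative thresholds, not recommended values.
    """
    response = client.detect_key_phrases(Text=text, LanguageCode=language_code)
    return [
        phrase["Text"]
        for phrase in response["KeyPhrases"]
        if phrase["Score"] >= min_score and len(phrase["Text"].strip()) >= min_chars
    ]


def mask_pii(client, text, language_code="en", min_score=0.5):
    """Replace each detected PII span with its entity type, e.g. [NAME]."""
    response = client.detect_pii_entities(Text=text, LanguageCode=language_code)
    # Work from the end of the text so earlier offsets stay valid.
    entities = sorted(response["Entities"],
                      key=lambda entity: entity["BeginOffset"], reverse=True)
    for entity in entities:
        if entity["Score"] >= min_score:
            text = (text[: entity["BeginOffset"]]
                    + f"[{entity['Type']}]"
                    + text[entity["EndOffset"]:])
    return text


# Example usage (needs boto3 and AWS credentials):
# import boto3
# client = boto3.client("comprehend", region_name="us-east-1")
# code, _ = dominant_language(client, "Please call Sunil Kumar in Pune.")
# print(confident_key_phrases(client, "Please call Sunil Kumar in Pune.", code))
# print(mask_pii(client, "Please call Sunil Kumar in Pune.", code))
```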

[Summary]

In this article we covered how to use Amazon Comprehend through the AWS Management Console to perform basic NLP tasks on the text layer of an image file. With this understanding you should be able to try out Amazon Comprehend using the AWS console. You can experiment with your own text in real time to see how Comprehend extracts insights from it. For example, you can enter a paragraph from a news article to see which entities are mentioned and whether the sentiment is positive or negative. You can also run analysis jobs on larger text data (using S3) to process many documents at once. The console is a friendly way to get familiar with Comprehend’s capabilities before automating things via the AWS SDKs, APIs, or CLI.
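As a first step toward that automation, the news-paragraph experiment described above might look like this with boto3. This is a sketch under assumptions: summarize_text is a hypothetical helper of ours, English text is assumed by default, and the client is passed in so the function can be tried with a stub before calling the real service.

```python
def summarize_text(client, text, language_code="en"):
    """Collect the entities and overall sentiment for one block of text."""
    entities = client.detect_entities(
        Text=text, LanguageCode=language_code
    )["Entities"]
    sentiment = client.detect_sentiment(Text=text, LanguageCode=language_code)

    label = sentiment["Sentiment"]  # POSITIVE, NEGATIVE, NEUTRAL, or MIXED
    return {
        # (text, type, score) for every entity found, e.g. PERSON or LOCATION.
        "entities": [
            (entity["Text"], entity["Type"], round(entity["Score"], 2))
            for entity in entities
        ],
        "sentiment": label,
        # SentimentScore keys are capitalized: Positive, Negative, Neutral, Mixed.
        "sentiment_score": sentiment["SentimentScore"][label.capitalize()],
    }


# Example usage (needs boto3 and AWS credentials):
# import boto3
# client = boto3.client("comprehend", region_name="us-east-1")
# print(summarize_text(client, "Sunil Kumar opened a new office in Pune."))
```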
