From Templates to Prompts: How Document Capture Is Embracing GenAI


Generative AI is making a big impact on enterprise applications, and the image capture space is starting to see these changes as well. GenAI brings new ways to handle tasks that could reshape how this field works, and most intelligent capture platforms now integrate generative AI to enhance document capture and understanding. In this article we will look at how Tungsten (formerly Kofax) TotalAgility has brought LLMs into its low-code process orchestration through AI-driven extraction and conversational interfaces. These capabilities let users interact with the system in natural language, build solutions much faster, and draw insights from documents.

Sharing a few thoughts on how GenAI could reshape image capture work.

  • Building capture projects will take less time because GenAI makes it faster and easier to set up data extraction. 

  • Complex documents can be handled without needing a lot of custom rules or templates. 

  • We can still use rule-based methods to double-check important data fields that GenAI pulls out, to make sure they are correct. 

  • Existing capture tools may shift their role: instead of doing all the extraction work, they could be used to catch errors or “hallucinations” from GenAI within document processing workflows. 

  • Processing large numbers of documents will still need applications that can track and audit everything that happens. 

  • User interfaces where people can review and tag documents will still matter, especially when the AI isn’t confident. Straight-through processing is risky at this point in time.

  • The real value will come from writing smart prompts that help GenAI get the right data out of documents. 

  • Prompts used for classifying and extracting data can be reused across different tools, making them flexible and easy to transfer between platforms. 
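As an illustration of the rule-based validation idea above, here is a minimal Python sketch. The field names and checks (invoice_date, total, vat_number) are hypothetical examples of my own, not TotalAgility's actual schema or API:

```python
import re
from datetime import datetime

def validate_invoice_fields(fields: dict) -> list[str]:
    """Run deterministic checks on fields an LLM extracted from an invoice.

    Returns a list of human-readable problems; an empty list means all
    checks passed and the document can skip manual review.
    """
    problems = []

    # Dates must parse in a known format.
    try:
        datetime.strptime(fields.get("invoice_date", ""), "%Y-%m-%d")
    except ValueError:
        problems.append("invoice_date is not a valid YYYY-MM-DD date")

    # Totals must be numeric, and line items must sum to the total.
    try:
        total = float(fields.get("total", "x"))
        line_sum = sum(float(v) for v in fields.get("line_amounts", []))
        if abs(total - line_sum) > 0.01:
            problems.append("line amounts do not sum to the stated total")
    except ValueError:
        problems.append("total is not numeric")

    # VAT numbers follow a fixed pattern (simplified EU-style check).
    if not re.fullmatch(r"[A-Z]{2}\d{8,12}", fields.get("vat_number", "")):
        problems.append("vat_number does not match the expected pattern")

    return problems
```

Any document with a non-empty problem list would be routed to a human review queue rather than processed straight through.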

In this article we will cover some of the recent additions to the TotalAgility platform, focusing on features with GenAI capabilities.  

[Conversational Copilot Interfaces] 

  1. Copilot for Development - A user can type a request or upload a hand-drawn design sketch for a workflow, and Copilot generates workflows, data models, forms, and business rules from that description in real time using LLMs. This speeds up solution development by allowing citizen developers to build automation using everyday language. 

  2. Copilot for Extraction - Users describe in natural language the fields or information they want from a document, and the platform generates the extraction setup. Users can describe what they want to extract in multiple languages, cutting model development time by up to 80%. It works on unstructured or highly variable documents without lengthy training, by intelligently breaking down documents, extracting text, and applying the user’s prompt to retrieve the specified data points. Copilot for Extraction eliminates manual field definition and training, handles high variability in layouts, and reduces maintenance overhead. 

  3. Copilot for Insights - Provides a conversational AI assistant for data analysis and document understanding. Users can converse with their data naturally, asking questions and getting instant answers with annotations pointing back to the source, even across large collections of documents in a repository. This is still a work in progress; I believe more use cases and integrations will come in the near future.
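The describe-fields-in-natural-language pattern behind a feature like Copilot for Extraction can be sketched as a prompt builder. The message structure and instructions below are my own assumption of how such a prompt might look; the platform's internal prompts are not public:

```python
def build_extraction_prompt(field_descriptions: dict[str, str],
                            document_text: str) -> list[dict]:
    """Build a chat-completion message list asking an LLM to extract fields.

    field_descriptions maps a field name to the user's natural-language
    description of it; document_text is the document's text layer.
    """
    field_spec = "\n".join(f"- {name}: {desc}"
                           for name, desc in field_descriptions.items())
    system = ("You extract fields from business documents. "
              "Return a JSON object with exactly the requested keys; "
              "use null when a field is not present in the document.")
    user = f"Fields to extract:\n{field_spec}\n\nDocument text:\n{document_text}"
    return [{"role": "system", "content": system},
            {"role": "user", "content": user}]
```

Because the field descriptions are plain language, the same prompt can be handed to any chat-completion-style model, which is what makes this approach portable across tools.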

[Intelligent Document Understanding with LLMs] 

A feature called Auto-Extract uses an LLM to automatically pull key-value data from documents with no prior template or training. This is essentially zero-shot extraction: given a document, the AI identifies important fields (like dates, totals, and names) without any extraction training, reducing build time by up to 90%.  
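Zero-shot extraction replies still need defensive handling before they reach a workflow. A minimal sketch, assuming the LLM was asked to reply with JSON; the helper name is mine, not a platform API:

```python
import json

def parse_auto_extract_reply(reply_text: str,
                             expected_keys: list[str]) -> tuple[dict, list[str]]:
    """Parse an LLM's JSON reply from zero-shot extraction and flag gaps.

    Keys the model omitted or set to null are returned separately, so the
    workflow can route those fields to manual review instead of failing.
    """
    try:
        data = json.loads(reply_text)
    except json.JSONDecodeError:
        # Unparseable reply: send everything to review.
        return {}, list(expected_keys)
    extracted = {k: data[k] for k in expected_keys if data.get(k) is not None}
    missing = [k for k in expected_keys if k not in extracted]
    return extracted, missing
```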

TotalAgility also integrates with Azure AI Document Intelligence (or Google Document OCR) for improved text-layer extraction from documents (especially handwritten data), and then feeds that text to the generative Auto-Extract. The idea is to get the best text layer available and send it as context to an advanced LLM along with the prompt (with the document's text layer supplied as context, RAG-style), leading to the best extraction results.
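The "best text layer available" step can be as simple as falling back to an OCR engine's output when a document's embedded text is sparse. This is a hypothetical sketch of that decision, and the character threshold is an illustrative assumption:

```python
def choose_text_layer(embedded_text: str, ocr_text: str,
                      min_chars: int = 50) -> str:
    """Pick the better text layer to send to the LLM as context.

    Scanned or handwritten pages often yield little or no embedded text,
    so fall back to the OCR engine's output when the embedded layer
    looks effectively empty.
    """
    if len(embedded_text.strip()) >= min_chars:
        return embedded_text
    return ocr_text
```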

[Knowledge Integration with LLMs] 

In the roadmap, generative AI is coupled with knowledge sources to produce field results. It uses techniques like retrieval-augmented generation (RAG) to feed corporate (document/image) data to the LLMs. This means the tool can connect to enterprise content (documents, databases, CRM records) so that the AI’s answers reference up-to-date, relevant information, mitigating hallucinations. For example, a Copilot might pull in relevant policy documents or past cases from a knowledge base when a user asks a question, ensuring the answer is grounded in actual data.

Intelligent Search & Retrieval allows users to search across documents and get summaries of key points or direct Q&A with cited references. This feature effectively uses an LLM to read complex documents and pinpoint information, boosting knowledge workers in content-heavy tasks. The idea is to intelligently chunk long-form content and provide a curated context to the LLM, which reduces the risk of hallucinations. By combining IDP extraction, search, and the LLM, TotalAgility can deliver summaries of lengthy documents and answer queries with evidence.  
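The chunk-and-curate step can be sketched with a deliberately simple keyword-overlap ranking. Production RAG systems typically use embedding similarity instead; this helper is my illustration of the principle, not the platform's implementation:

```python
def top_chunks(question: str, chunks: list[str], k: int = 3) -> list[str]:
    """Rank document chunks by word overlap with the question, keep top k.

    Only the best-matching chunks are then sent to the LLM as context,
    which keeps prompts small and reduces hallucination risk.
    """
    q_words = set(question.lower().split())
    scored = sorted(
        chunks,
        key=lambda c: len(q_words & set(c.lower().split())),
        reverse=True,
    )
    return scored[:k]
```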

Coming to the technical details, you can connect LLMs to the TotalAgility platform in one of three ways.  

[1] OpenAI ChatGPT - Integration with OpenAI’s models; most of the platform’s GenAI features are available with this option. 

[2] Azure OpenAI - Integration with Azure OpenAI generative AI models; likewise, most of the platform’s GenAI features are available with this option. 

[3] Custom LLM - Allows us to integrate other language models by defining a REST API or a custom workflow as the intermediary. Here you can invoke a chat completion API from other providers. This means an enterprise could use an on-prem model or another third-party LLM (e.g. via AWS SageMaker or Google Vertex AI) by writing a small adapter process.  

Note - The Copilot for Extraction and Auto-Extract features work only with OpenAI ChatGPT or Microsoft Azure OpenAI, not with custom LLMs. In other words, TotalAgility’s integrations with OpenAI and Azure OpenAI are tighter.
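For the Custom LLM option, the adapter essentially boils down to assembling a chat-completion HTTP request. A sketch assuming an OpenAI-compatible endpoint; the base URL and model name are placeholders:

```python
def build_chat_request(base_url: str, api_key: str, prompt: str,
                       model: str = "my-model") -> dict:
    """Assemble keyword arguments for requests.post() targeting an
    OpenAI-compatible chat-completions endpoint.

    The URL path and payload shape follow the widely adopted
    chat-completions convention; adjust both for your provider.
    """
    return {
        "url": f"{base_url.rstrip('/')}/v1/chat/completions",
        "headers": {"Authorization": f"Bearer {api_key}",
                    "Content-Type": "application/json"},
        "json": {"model": model,
                 "messages": [{"role": "user", "content": prompt}]},
        "timeout": 60,
    }
```

Sending it is then `requests.post(**build_chat_request(...))`, and the reply text is read from `choices[0].message.content` in the JSON response.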

[Summary] 

Generative AI is transforming how document capture and automation projects are delivered. These capabilities simplify the handling of complex or unstructured documents, which previously required time-consuming rule- or template-based configuration. While GenAI offers powerful automation, it is equally important to validate its outputs: rule-based checks and human-in-the-loop interfaces remain critical, especially in high-volume environments where accuracy matters. These advances bring speed and cost savings, especially as LLM APIs become more affordable (today, cost is still a big question), but they also introduce challenges. Generative AI models can behave unpredictably, which makes them risky for straight-through processing without oversight.

These new enhancements offer a blueprint for doing this within a governed, enterprise workflow context, combining the conversational power of modern LLMs with the structure of enterprise data, business rules, and audit trails. Other platforms are moving in this direction as well, bringing together document intelligence, BPM, RPA, and generative AI into one solution. To summarize, this approach pairs the strengths of generative AI with the reliability of traditional capture, creating an efficient way to automate document-heavy business processes. 
