
How to Successfully Integrate Computer Vision, Large Language Models, and Intelligent Document Processing

by Marcelo Ribeiro Araújo, Solution Consultant
It is possible to integrate technologies such as computer vision and large language models into intelligent document processing to obtain an automated process with the least level of human interaction possible. Integrating these technologies allows for a multidisciplinary approach to automation, dealing with different types of data and information.

Hyperautomation is an approach organizations use to quickly identify, verify, and automate as many processes as possible. In this context, intelligent document processing (IDP) is certainly a key component, since most organizational processes are based on electronic or paper documents.

IDP is capable of capturing, extracting, and processing data from a variety of document formats. An IDP solution applies artificial intelligence (AI) and machine learning (ML) techniques to process structured, semi-structured, and unstructured documents. These technologies understand the content of documents and are able to extract the data, allowing for process automation.

However, as the use of new technologies advances, some business processes expand beyond documents alone to include images, videos, and document interpretation. In this scenario, additional technologies are needed to achieve hyperautomation.

In business processes that include images and videos, the use of computer vision (CV) is more efficient than IDP technology alone. Computer vision is an area of artificial intelligence that focuses on the ability of machines to interpret and understand visual content; CV models can be trained to recognize a wide variety of objects, patterns, and features in images. CV can be added to the automation process by consuming services such as Microsoft Azure Cognitive Services, Amazon Rekognition, and Google Cloud Vision, among others, or through open-source libraries such as OpenCV together with the Python programming language.
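As a minimal illustration of how CV output feeds an automation decision, suppose a CV service returns a list of labels with confidence scores for a submitted photo. The response shape below is a simplified assumption, not any specific vendor's schema; real services such as Amazon Rekognition or Google Cloud Vision return richer structures, but the interpretation logic is similar:

```python
# Sketch: interpreting labels returned by a generic CV service.
# The label format below is an illustrative assumption, not a real
# vendor schema; adapt the parsing to the service you actually call.

def contains_vehicle(labels, min_confidence=0.80):
    """Return True if any vehicle-related label meets the confidence bar."""
    vehicle_terms = {"car", "vehicle", "automobile", "truck"}
    return any(
        label["name"].lower() in vehicle_terms
        and label["confidence"] >= min_confidence
        for label in labels
    )

# Example labels a CV service might return for a claim photo
labels = [
    {"name": "Car", "confidence": 0.97},
    {"name": "Wheel", "confidence": 0.91},
    {"name": "Road", "confidence": 0.64},
]

print(contains_vehicle(labels))  # → True
```

A check like this lets the workflow reject an irrelevant photo automatically instead of routing it to a human reviewer.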

Similarly, there are scenarios that go beyond data extraction, in which a large language model (LLM) can contribute to a higher level of automation. LLMs are an area of artificial intelligence that generate human language based on very large sets of reference text. If the task involves understanding complex contexts, such as document summarization, translation, or simply answering a question about the content, an LLM may be more suitable. OpenAI (ChatGPT), Microsoft (Bing), and Google (Bard) offer services that can be integrated into IDP solutions to perform these activities.
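The portable part of such an integration is usually the prompt that wraps the extracted document text. A minimal sketch, in which `call_llm` is a placeholder for whatever chat-completion client you use (the prompt wording and word limit are illustrative assumptions):

```python
# Sketch: handing extracted document text to an LLM for summarization.
# build_summary_prompt is the vendor-neutral part; call_llm is a stub.

def build_summary_prompt(document_text, max_words=60):
    """Wrap extracted text in a summarization instruction."""
    return (
        f"Summarize the following insurance document in at most {max_words} words. "
        "Focus on parties involved, dates, and claimed amounts.\n\n"
        f"{document_text}"
    )

def call_llm(prompt):
    # Placeholder: replace with a real API call (e.g. an OpenAI
    # chat completion) using your own credentials and model choice.
    raise NotImplementedError

prompt = build_summary_prompt("Police report: collision on 2023-05-01 ...")
print(prompt.splitlines()[0])
```

Keeping the prompt construction separate from the API call makes it easy to swap LLM providers without touching the rest of the workflow.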

Using ABBYY Vantage, our low-code intelligent document processing platform, CV and LLM technologies can be integrated directly into the IDP workflow, combining these different types of data and information into a single automated process with minimal human interaction.

Claims automation: An ABBYY Vantage use case

The ABBYY Vantage intelligent document processing platform provides easily consumable, artificial intelligence-based skills (models) for understanding documents. Vantage offers over 150 pre-trained skills that can read, understand, and extract information from business documents, helping businesses accelerate hyperautomation.

Additional technologies such as CV and LLMs can be easily consumed by ABBYY Vantage through API execution. Serverless components such as Azure Functions, AWS Lambda, or Google Cloud Functions are flexible, scalable, and pay-as-you-go, making them a practical way to incorporate these technologies.
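The glue code in such a serverless function is typically a small router that inspects the incoming document and decides which add-on technology should process it. A minimal sketch of that dispatch logic; the payload fields (`documentType`) and route names are illustrative assumptions, not the actual Vantage webhook schema:

```python
# Sketch of the routing logic inside a serverless handler (e.g. an
# Azure Function or AWS Lambda) invoked by the IDP platform over HTTP.
# Field names and routes are hypothetical, for illustration only.

def handle_document(payload):
    """Route an incoming document to the appropriate add-on technology."""
    doc_type = payload.get("documentType", "")
    if doc_type in ("vehicle_photo", "odometer_photo"):
        return {"route": "computer_vision"}   # image interpretation
    if doc_type in ("police_report",):
        return {"route": "llm_summary"}       # contextual summarization
    return {"route": "idp_extraction"}        # default: standard IDP skills

print(handle_document({"documentType": "vehicle_photo"}))  # {'route': 'computer_vision'}
```

Because the handler is stateless, it fits the pay-as-you-go serverless model well: each document triggers one short-lived invocation.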

In this practical insurance use case, we created a process for claims automation in which various documents, such as vehicle photos, odometer photos, driver's licenses, police reports, and invoices, can be submitted through the ABBYY Vantage mobile capture application.

The photos of the vehicle must be validated according to the vehicle's position and must be taken from the front left, front right, rear left, and rear right to document the condition of the vehicle. The odometer photo, on the other hand, proves the vehicle's current mileage. The driver's license, police report, and invoice provide complementary information for the process.
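The four-angle requirement reduces to a simple completeness check once a CV model (or capture metadata) has tagged each photo with a position. A minimal sketch, with the position tags as plain strings for illustration:

```python
# Sketch: verifying that all four required vehicle angles were submitted.
# Position tags would come from a CV classifier or capture metadata;
# the string values here are illustrative assumptions.

REQUIRED_POSITIONS = {"front_left", "front_right", "rear_left", "rear_right"}

def missing_positions(submitted):
    """Return the set of required photo angles not yet provided."""
    return REQUIRED_POSITIONS - set(submitted)

photos = ["front_left", "front_right", "rear_left"]
print(sorted(missing_positions(photos)))  # → ['rear_right']
```

If the set is non-empty, the mobile capture flow can prompt the claimant for the missing angles before the claim is submitted, avoiding a manual follow-up later.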



This usage scenario using ABBYY Vantage demonstrates how hyperautomation requires an integrated approach, combining specialized technologies such as intelligent document processing, computer vision, and large language models, among others. For an in-depth, step-by-step description of all of the activities and technologies at play in this use case, download the full white paper.

By merging these technologies, organizations can automate processes, increase efficiency, reduce errors, and promote smarter, more adaptable automation and a better customer experience. This integrated approach not only drives productivity but also positions businesses to address growing challenges, enabling an agile response to ever-evolving market demands.
