ABBYY Artificial Intelligence
Purpose-Built AI Center

At the heart of ABBYY solutions is a combination of technologies that delivers best-in-class intelligent document processing (IDP).
Innovative AI is built into every step of ABBYY’s IDP pipeline, from image enhancement to object detection, OCR/ICR, classification, extraction from semi-structured documents, and extraction from unstructured documents.
Using the right combination of technologies and techniques, ABBYY IDP solutions can process any kind of document—any format, any language, any structure. All our specialized techniques are optimized to deliver the best possible inferences with the fewest resources, minimizing cost and delivering the greatest ROI for our customers.

Cutting-edge AI tools powering ABBYY’s
purpose-built solutions
A combination of AI models and algorithms, each highly optimized for its task.
Phoenix 1.0
Phoenix 1.0 is a cutting-edge multimodal model that combines advanced image and text analysis by integrating Convolutional Neural Networks (CNNs) for visual data processing with the RoBERTa language model for text comprehension. Phoenix features an innovative AI-driven pipeline that offers zero-shot key/value pair extraction capabilities, automating the most cumbersome tasks of document model training. Unlike broad language models on their own, which address a wide range of language understanding tasks, Phoenix provides a more robust framework for document processing, particularly when dealing with multimodal data. It offers enhanced capabilities in feature extraction, efficiency in processing workflows, and a deeper understanding of context that broad language models alone may not fully achieve. This specialization makes it an ideal choice for use cases that rely heavily on information transmitted through documents, ensuring data is processed with precision and fast turnaround times.
Phoenix was developed with a targeted focus on enhancing the efficiency and effectiveness of document processing tasks. By leveraging the strengths of Convolutional Neural Networks for image analysis alongside the advanced language comprehension of RoBERTa, this integration allows for a nuanced understanding of complex documents that contain both textual and visual elements. This focused approach means that businesses can achieve superior accuracy in extracting and analyzing information compared to using general-purpose models. Furthermore, the design minimizes resource consumption by streamlining the processing workflow, ultimately improving speed and reducing operational costs. As a result, organizations can process documents more effectively, yielding significant value in the realm of document processing and enhancing overall productivity.
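The fusion described above—visual features from a CNN combined with text embeddings from RoBERTa—can be illustrated with a minimal sketch. Everything here is hypothetical: the encoders are stubbed out with fixed vectors, and in a real pipeline they would be learned models producing high-dimensional features.

```python
# Illustrative-only sketch of late multimodal fusion. The encoder functions
# are stand-ins (hypothetical), not ABBYY's actual Phoenix architecture.

def cnn_image_features(page_image):
    """Stub for a CNN visual encoder: in practice, activations that
    summarize layout, logos, table lines, etc."""
    return [0.2, 0.7, 0.1]

def roberta_text_features(text):
    """Stub for a RoBERTa text encoder: in practice, a pooled
    embedding of the token sequence."""
    return [0.9, 0.3]

def fuse(image_vec, text_vec):
    """Late fusion by concatenation: the combined vector would feed a
    downstream extraction/classification head."""
    return image_vec + text_vec

fused = fuse(cnn_image_features("invoice.png"),
             roberta_text_features("Invoice No. 42"))
print(len(fused))  # 5: the visual and textual features side by side
```

Concatenation is only one fusion strategy; attention-based cross-modal fusion is another common design, but the principle—one representation carrying both visual and textual evidence—is the same.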
Machine learning
Our intelligent document processing leverages a mix of technologies to deliver unparalleled performance. A combination of deep machine learning and fast machine learning maximizes the straight-through processing (STP) rate. With our document-specific AI models, pre-trained using deep machine learning, our customers can achieve as much as 90 percent accuracy right out of the box. But with the inclusion of fast machine learning, that accuracy climbs above 95 percent. Fast machine learning memorizes the outliers that deep machine learning misses, and it works quickly, requiring just a few variations of the documents in question. And with the data we collect from that process, our deep learning continually improves to deliver higher and higher accuracy over time.
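One way to picture how the two models complement each other is a confidence-based fallback: the pre-trained deep model handles layouts it generalizes to, and memorized fast-ML results cover the outliers. This is a hypothetical sketch—the model stubs, confidence scores, and threshold are all invented for illustration, not ABBYY's actual routing logic.

```python
# Hypothetical sketch: combine a generalizing deep model with a fast,
# layout-memorizing model to raise the straight-through processing rate.

def deep_extract(doc):
    """Stub for a pre-trained deep model. Returns (fields, confidence)."""
    if doc["layout"] == "common":
        return {"total": "100.00"}, 0.97
    return {"total": None}, 0.40  # an outlier layout it struggles with

# Fast ML learns outlier layouts from just one or two samples.
MEMORIZED = {"outlier-42": {"total": "250.00"}}

def fast_extract(doc):
    """Stub: return memorized fields for layouts already 'seen'."""
    return MEMORIZED.get(doc["layout"])

def extract(doc, threshold=0.9):
    fields, conf = deep_extract(doc)
    if conf >= threshold:
        return fields  # deep model is confident: straight-through
    memorized = fast_extract(doc)
    return memorized if memorized is not None else fields

print(extract({"layout": "common"}))      # deep model handles it
print(extract({"layout": "outlier-42"}))  # fast ML covers the outlier
```

Documents the fast model has memorized no longer need manual correction, which is what pushes the combined accuracy above what either approach achieves alone.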
Deep learning allows us to pre-train AI models specifically for document processing tasks. Unlike general-purpose LLMs or Gen AI, which are designed for a broad range of tasks, our deep learning models excel in their specialized purpose, providing more reliable and accurate results.
- Deep machine learning (ML) uses CNNs (convolutional neural networks), RNNs (recurrent neural networks), and NLP (natural language processing) to extract information from semi-structured documents. It generalizes across various document formats, effectively handling unseen layouts without relying on templates. Although it requires a substantial amount of labeled data—between 500 and 10,000 documents—for accurate field extraction, the extended training process ensures high precision, making it a powerful tool for complex data interpretation.
- Fast machine learning (ML) focuses on textual and visual patterns, working efficiently with as few as one or two documents per set. It uses clustering technology that groups similar-looking document layouts together and internally trains a field extraction model for each cluster. Unlike deep ML, this approach handles document variations it has already “seen” rather than generalizing the patterns. Its clear advantage is that it accelerates the learning process, requires less CPU power, and yields shorter processing times.
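The clustering idea behind fast ML can be sketched in a few lines: documents whose layouts look alike are grouped, and each group would get its own small field extractor. The layout signatures, similarity measure, and threshold below are all hypothetical simplifications, not ABBYY's clustering technology.

```python
# Illustrative sketch: group similar-looking layouts so each cluster can
# train its own extraction model. All data and parameters are invented.

def layout_signature(doc):
    """Reduce a document to a coarse set of layout tokens (hypothetical)."""
    return set(doc["labels"])

def similarity(a, b):
    """Jaccard similarity between two layout signatures."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def assign_cluster(doc, clusters, threshold=0.5):
    """Place a document in the most similar cluster, or start a new one."""
    sig = layout_signature(doc)
    best, best_sim = None, 0.0
    for cid, rep in clusters.items():
        s = similarity(sig, rep)
        if s > best_sim:
            best, best_sim = cid, s
    if best is not None and best_sim >= threshold:
        return best
    new_id = len(clusters)
    clusters[new_id] = sig
    return new_id

clusters = {}
a = assign_cluster({"labels": ["Invoice No.", "Total", "Date"]}, clusters)
b = assign_cluster({"labels": ["Invoice No.", "Total", "Due"]}, clusters)
c = assign_cluster({"labels": ["Patient", "Diagnosis", "Dosage"]}, clusters)
print(a, b, c)  # the two invoices share a cluster; the medical form does not
```

Because each cluster only has to cover layouts it has already seen, a per-cluster extractor can be trained from one or two samples—which is exactly what keeps fast ML cheap and quick compared with the deep, generalizing approach.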
