ABBYY FineReader Engine

The most comprehensive OCR SDK for software developers.

Integrate AI-powered OCR features into your applications.

Document classification using Machine Learning

About technologies

ABBYY FineReader Engine provides an API for document classification, allowing you to create applications that automatically categorize documents and sort them into predefined document classes. The classification relies on machine learning, which can detect even subtle differences between individual document categories. This makes it possible to set up flexible, scalable classification processes that distinguish among many document categories at a fine level of detail.

The Image Classifier collects and processes visual information about document images and delivers fast classification results. The Text Classifier extracts and processes information about the documents’ content, which increases classification accuracy. The Image Classifier and the Text Classifier can be used individually or in combination.

How does it work?

In principle, the classification process consists of three steps:

Preparing data sets for classification training

At this step, the required document classes are defined. For each document class, several example documents with similar appearance and/or content are selected. Using machine learning algorithms, the ABBYY technology analyzes the training documents within each document class and determines the parameters that should be used to identify that class.

Training the Classification Model

During this step, information about the document classes and their parameters is imported into the Classification Model, and the model is trained. The model can use the Image Classifier, the Text Classifier, or a combination of both. Performance can be tuned by setting the balance between high recall and high precision. Cross-validation of the data is available to test the quality of the Classification Model.
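As a rough illustration of what cross-validation means in general, the sketch below splits a training set into k folds so that each document is used for testing exactly once. It is plain Python and does not call the FineReader Engine API; the function name `k_fold_splits` and the sample counts are assumptions made for this example only.

```python
from typing import Iterator, List, Tuple


def k_fold_splits(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Hypothetical helper for illustration; not part of any ABBYY API.
    """
    if k < 2 or k > n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Usage: 10 training documents, 3 folds.
for train, test in k_fold_splits(10, 3):
    print(len(train), "training documents,", len(test), "held out")
```

A model would be trained on each training portion and scored on the held-out portion; averaging the scores gives an estimate of how well the trained model generalizes.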

Classification deployment

During the classification process, the Classification Model analyzes each incoming document. To determine the document type, the model calculates the requested parameters for each document and compares them with the information it received during the training step. Developers can create a routine that allows users to update the training data set and re-train the Classification Model as needed.

In addition to the detected document categories, the probability that a document belongs to each of them is provided. This probability can be used to determine the next processing steps, such as forwarding documents to the relevant company departments or re-classifying them.
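A minimal sketch of how such probabilities might drive the next step is shown here. It assumes the classification result has already been converted into a plain dictionary of class names and probabilities; the function `route_document`, the `manual_review` label, and the 0.8 threshold are hypothetical and are not taken from the FineReader Engine API.

```python
from typing import Dict, Tuple


def route_document(probabilities: Dict[str, float],
                   threshold: float = 0.8) -> Tuple[str, float]:
    """Return the most probable class, or 'manual_review' if confidence is low.

    Hypothetical routing helper; the threshold value is an assumption.
    """
    if not probabilities:
        raise ValueError("no classification results supplied")
    best_class, best_prob = max(probabilities.items(), key=lambda kv: kv[1])
    if best_prob >= threshold:
        return best_class, best_prob
    # Below the threshold: send the document to a person or a second pass.
    return "manual_review", best_prob


# Usage with made-up probabilities.
print(route_document({"invoice": 0.93, "contract": 0.05}))
print(route_document({"invoice": 0.55, "contract": 0.45}))
```

Raising the threshold sends more documents to review and fewer to the wrong department; lowering it does the opposite.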

ABBYY FineReader Engine’s documentation illustrates the classification process with a code sample, which can be used for testing, adjusted, and integrated into your own applications.

Classification modes

Depending on the usage scenario, the classification can be optimized for high precision, high recall or a balance between these.

High precision mode

This mode is recommended in scenarios where it is important to classify documents into the right categories and keep wrong class assignments to a minimum.

Documents identified as belonging to class A should really belong to class A and not to class B. In exchange, it is acceptable that ‘uncertain’ documents belonging to class A are not classified as such and are left out.

Key focus: Precisely categorize documents and limit the risk of assigning documents to wrong document classes.

High recall mode

This mode is recommended in scenarios where it is important to detect all documents belonging to a certain category among all available documents and to limit the risk of missing them.

Key focus: Within a document batch, detect all documents belonging to a certain class and limit the risk of leaving them out.
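The trade-off between the two modes can be illustrated with the standard definitions of precision and recall. The sketch below is generic Python with made-up scores and labels for a single class; it does not show how FineReader Engine implements its modes, and the idea of switching modes through a single score threshold is an assumption made for illustration.

```python
from typing import List, Tuple


def precision_recall(scores: List[float], labels: List[bool],
                     threshold: float) -> Tuple[float, float]:
    """Compute precision and recall for one class at a given score threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Made-up classifier scores for "class A" and whether each document truly is class A.
scores = [0.95, 0.85, 0.70, 0.60, 0.40, 0.20]
labels = [True, True, False, True, True, False]

print(precision_recall(scores, labels, threshold=0.8))  # strict: (1.0, 0.5)
print(precision_recall(scores, labels, threshold=0.3))  # lenient: (0.8, 1.0)
```

With the strict threshold, every document assigned to class A is correct but half of the true class A documents are left out. With the lenient threshold, all of them are found at the cost of one wrong assignment.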
