Document analysis in data capture

Document analysis refers to methods for automatically identifying the components of a document. ABBYY’s award-winning FlexiCapture technology addresses a wide variety of data capture problems by giving the system increased intelligence and flexibility. Through the FlexiLayout™, a logical definition of the layout of data, FlexiCapture frees you from the restrictions of template-based form matching (such as reliance on the exact placement of fields on the page). The system can find fields even when they appear elsewhere on the page, using any information available: relation to other objects on the page, contents of the field, its size, lines drawn around it, etc.

An important feature of the data capture scenario is that only certain fields are recognized. FlexiCapture imitates the way humans recognize objects. To detect the required data, a human operator looks for specific fields on the document, finds a field, and analyzes the area around it. Our product does the same. It finds the required fields on flexible forms by using a special formalized description called a FlexiLayout™, which is created with a special visual tool, FlexiLayout Studio. The program then analyzes the area surrounding each element and makes inferences about the nature of the fields and their content.


Developing with FlexiCapture Engine is usually done in two steps:

  1. Analyze the nature of the documents to be used for data extraction and create appropriate Document Definitions, which can be based on either Fixed Form Templates or Flexible Layouts.
  2. Integrate the Engine into your application.

Development of document templates for fixed forms

The Document Template Editor allows fast and intuitive development of document templates to process static, fixed forms.

  1. Load the different segments of the multipage form into the editor.
  2. Define the elements that are used to match the document: anchors, static texts and separators.
  3. In a graphical editor, define the recognition areas where items such as text blocks, tables, checkmarks, checkmark groups, barcodes and pictures are located.
  4. Set up the recognition properties for each area (e.g. OCR or ICR), and attach data type definitions, dictionaries and verification rules.




Development of multipage FlexiLayout templates with FlexiLayout Studio


The FlexiLayout Studio user interface is designed to simplify FlexiLayout creation by directing the developer through a set of dialog boxes. In complicated cases that require more detailed customization, FlexiLayout Studio provides direct access to its internal structural language for greater flexibility and more control.

  1. Load a selection of documents with different layouts.
  2. Define generic elements that help identify documents and that can be used for orientation within a document, e.g. text strings, lines, spaces between elements.
  3. Define search elements for the data you are looking for, e.g. text, numbers, dates, tables, the length of the string, the set of characters, one or multiple words, one or multiple lines.

Additionally, these search elements are placed in relation to the elements set up in step 2, for example to the right of or below them.
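
The idea of locating a field by its relation to another element, rather than by fixed page coordinates, can be sketched in a few lines of Python. This is only an illustration of the concept under simple assumptions: the Word class, the find_right_of and find_below helpers, the tolerance values and the sample layouts are all hypothetical and are not part of FlexiLayout Studio or the FlexiCapture Engine API.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    """A recognized word with its bounding box (hypothetical structure)."""
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def _find_anchor(words: List[Word], label: str) -> Optional[Word]:
    # The anchor is the first word whose text equals the label (case-insensitive).
    return next((wd for wd in words if wd.text.lower() == label.lower()), None)


def _matches(word: Word, pattern: Optional[str]) -> bool:
    # Optional content check: the whole word must match the regular expression.
    return pattern is None or re.fullmatch(pattern, word.text) is not None


def find_right_of(words: List[Word], label: str, pattern: Optional[str] = None,
                  max_gap: float = 200.0, y_tol: float = 5.0) -> Optional[Word]:
    """Nearest word on the same line to the right of the anchor label."""
    anchor = _find_anchor(words, label)
    if anchor is None:
        return None
    candidates = [
        wd for wd in words
        if wd is not anchor
        and abs(wd.y - anchor.y) <= y_tol
        and 0 <= wd.x - (anchor.x + anchor.w) <= max_gap
        and _matches(wd, pattern)
    ]
    return min(candidates, key=lambda wd: wd.x, default=None)


def find_below(words: List[Word], label: str, pattern: Optional[str] = None,
               max_gap: float = 60.0, x_tol: float = 20.0) -> Optional[Word]:
    """Nearest word directly below the anchor label."""
    anchor = _find_anchor(words, label)
    if anchor is None:
        return None
    candidates = [
        wd for wd in words
        if wd is not anchor
        and abs(wd.x - anchor.x) <= x_tol
        and 0 <= wd.y - (anchor.y + anchor.h) <= max_gap
        and _matches(wd, pattern)
    ]
    return min(candidates, key=lambda wd: wd.y, default=None)


# Two made-up layouts of the same kind of document: the amount sits to the
# right of "Total:" in one and below it in the other.
layout_a = [Word("Invoice", 10, 10, 60, 12),
            Word("Total:", 300, 400, 40, 12),
            Word("128.50", 350, 400, 45, 12)]
layout_b = [Word("Total:", 50, 700, 40, 12),
            Word("99.00", 50, 716, 40, 12)]

amount = r"\d+\.\d{2}"
for layout in (layout_a, layout_b):
    hit = (find_right_of(layout, "Total:", amount)
           or find_below(layout, "Total:", amount))
    print(hit.text if hit else "not found")
```

Because the search starts from the "Total:" label and not from a fixed position, the same two rules locate the amount in both layouts.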

Creating document definitions on-the-fly

In ABBYY FlexiCapture Engine developers can create Document Definitions on-the-fly using methods provided by ABBYY FlexiCapture Engine:

Creating simple document definition (with only one section*)

  • Use the CreateDocumentDefinitionFromAFL method of the Engine object to create a document definition from ABBYY Flexible Layout
  • Use the CreateDocumentDefinitionFromXFD method of the Engine object to create a document definition from XML Form Definition

Creating compound document definition (with several sections*)

  • Create an empty document definition using the CreateDocumentDefinition method of the Engine object
  • Create a new fixed section from an XFD file
  • Create a new flexible section from an AFL file

*A section of a document definition is a set of pages (or a single page), together with the fields on these pages, that corresponds to a logically complete part of a document.
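
The difference between a simple and a compound document definition can be pictured with a small data model. The Python sketch below is a toy illustration only: the Section and DocumentDefinition classes, their attributes and the file names are hypothetical and do not reproduce the object model of ABBYY FlexiCapture Engine. It assumes only what is stated above, namely that a fixed section comes from an XFD file, a flexible section comes from an AFL file, and a compound definition has several sections.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Section:
    """A logically complete part of a document (hypothetical model)."""
    name: str
    kind: str          # "fixed" (built from XFD) or "flexible" (built from AFL)
    source_file: str   # illustrative file name, not a real path
    fields: List[str] = field(default_factory=list)


@dataclass
class DocumentDefinition:
    """A named collection of sections (hypothetical model)."""
    name: str
    sections: List[Section] = field(default_factory=list)

    def add_section(self, section: Section) -> None:
        # Only the two section kinds described in the text are accepted.
        if section.kind not in ("fixed", "flexible"):
            raise ValueError(f"unknown section kind: {section.kind}")
        self.sections.append(section)

    @property
    def is_compound(self) -> bool:
        # A compound definition has several sections; a simple one has exactly one.
        return len(self.sections) > 1


# Simple definition: a single flexible section.
invoice = DocumentDefinition("Invoice")
invoice.add_section(Section("Invoice", "flexible", "invoice.afl",
                            ["Number", "Date", "Total"]))

# Compound definition: start empty, then add a fixed and a flexible section.
application = DocumentDefinition("Application")
application.add_section(Section("Cover form", "fixed", "cover.xfd",
                                ["Name", "Signature"]))
application.add_section(Section("Attached invoice", "flexible", "invoice.afl",
                                ["Total"]))

print(invoice.is_compound, application.is_compound)
```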

