ABBYY FineReader Engine 10 for Windows

OCR and Other Recognition Technologies

Optical Character Recognition (OCR)

  • OCR technology — printed text recognition is available for 198 languages, including:
    • European languages (Latin, Cyrillic, Armenian, Greek alphabets) 
    • Chinese (Simplified and Traditional), Japanese, and Korean (CJK) 
    • Thai, Vietnamese and Hebrew
    • Arabic — technical preview version 
    • FineReader XIX is an OCR module designed specifically for digitizing and archiving old documents, books and newspapers published in the 17th to 20th centuries, many of which are rare or unique. Stored in the historical archives of libraries and government organizations, they are part of the national heritage that must be preserved. FineReader XIX recognizes texts published from 1600 to 1937 in English, French, German, Italian and Spanish, and supports old fonts such as Fraktur, Schwabacher and the majority of Gothic fonts.
  • 47 languages have dictionary/morphology support, which significantly improves OCR accuracy.
  • Multilingual document recognition handles several languages in the same document, e.g. German and Chinese, or English, Russian and Korean.
  • Dot-matrix documents recognition — ABBYY FineReader Engine recognizes printed dot matrix texts of many types. It has been trained using several thousand samples produced by a variety of printers including dot matrix, daisy wheel, chain and band printers, as well as using draft and Near Letter Quality (NLQ) printing modes. 
  • Typewritten documents recognition. 
  • Recognition of OCR-A, OCR-B, MICR (E13B) and CMC7 fonts.  

 See The Full List of Supported Languages

Intelligent Character Recognition (ICR)

Optical Mark Recognition (OMR)

ABBYY’s OMR technology recognizes simple checkmarks, grouped checkmarks, model checkmarks and checkmarks with “corrections” made by hand, in different variations:

  • Char Box Series
  • Comb In Frame
  • Grey Boxes
  • Partitioned Frame
  • Simple Comb
  • Text In Frame
  • Underlined Text

OMR delivers an accuracy rate of 99.995%.

Optical Barcode Recognition (OBR)

  • 1D and 2D barcode types. ABBYY OCR SDK recognizes popular types of 1D and 2D barcodes. See The Full List of Supported Barcodes
  • Fast barcode extraction. This feature automatically detects and recognizes barcodes at any angle on a document. It works for both 1D and 2D barcodes.

Recognition modes

The Engine's predefined processing modes let developers quickly tune processing speed and accuracy to suit their needs. In addition to the default processing mode, both OCR and ICR can be performed in normal, fast and balanced recognition modes.

 

Full Text and Field-Level Recognition

Recognition falls into two types: full text and field-level recognition. Full text recognition usually relies on OCR and is used for document conversion. Field-level recognition combines OCR, ICR and other technologies, applied to a local area of the page to recognize and extract particular data.

The following table shows specifications of these recognition types:

 

 

| Specification | Full text recognition | Field-level recognition |
| --- | --- | --- |
| Where used | Document conversion, book archiving | Data capture |
| Document analysis | General document analysis, document analysis for invoices, document analysis for full-text indexing | Manual block specification for field-level recognition |
| Recognition | OCR with general accuracy of about 96-99% | OCR, ICR, OMR and barcode recognition with predefined data types and value ranges. Accuracy is about 100% |
| Verification | Recommended for content reuse | Obligatory in most cases |
| Synthesis | Used for document retrieval | Not used |
| Export of recognition results | Document files (RTF, DOC, PDF, etc.) | Export to XML file or database |
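
To put the accuracy figures above in perspective, the short calculation below estimates how many characters would be misrecognized on one page. It is only an illustration: it assumes the percentages are per-character accuracies and that a page holds about 2,000 characters, neither of which is stated in the specification. The function name `expected_errors` is hypothetical.

```python
def expected_errors(char_count, accuracy):
    """Expected number of misrecognized characters at a given per-character accuracy."""
    return char_count * (1.0 - accuracy)


PAGE_CHARS = 2000  # assumed number of characters on a typical page

for accuracy in (0.96, 0.99):
    errors = expected_errors(PAGE_CHARS, accuracy)
    print(f"{accuracy:.0%} accuracy: about {errors:.0f} errors per page")
```

Under these assumptions, even the upper end of the range leaves a noticeable number of errors per page, which is consistent with verification being recommended when the text is to be reused.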

 

Full text recognition

Full text recognition is the basic recognition type for a range of tasks, such as:

  • Document and book conversion for archiving
  • Document conversion for content reuse
  • Ground text extraction for field detection and document classification

Learn more about recognition tasks ›

All of them require recognition (OCR) of the whole text on a document (page). Before recognition, document analysis usually splits pages, corrects their orientation, and detects text blocks, pictures and other objects.

After OCR, document synthesis rebuilds the structure and layout of the document (for the content reuse task) or just retrieves the correct text order for complex documents with several text columns and pictures (for the archive scenario). Depending on the task, the resulting text is exported as plain text or as a document in a supported format.
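
As a rough illustration of what "retrieving the correct text order" involves for a multi-column page, here is a minimal sketch. It is not the Engine's synthesis algorithm. The block format (dicts with `left`, `right`, `top` and `text` keys) and the function `reading_order` are assumptions made for this example. The sketch groups blocks whose horizontal extents overlap into columns, then reads the columns left to right and each column top to bottom.

```python
def reading_order(blocks):
    """Order text blocks column by column, top to bottom within each column.

    Each block is a dict with 'left', 'right', 'top' and 'text' keys (pixel units).
    Blocks whose horizontal extents overlap are treated as one column.
    """
    columns = []  # each column: {'left', 'right', 'blocks'}
    for block in sorted(blocks, key=lambda b: b["left"]):
        for column in columns:
            overlaps = block["left"] < column["right"] and block["right"] > column["left"]
            if overlaps:
                column["blocks"].append(block)
                column["left"] = min(column["left"], block["left"])
                column["right"] = max(column["right"], block["right"])
                break
        else:
            # No existing column overlaps this block, so it starts a new one.
            columns.append({"left": block["left"], "right": block["right"], "blocks": [block]})

    ordered = []
    for column in sorted(columns, key=lambda c: c["left"]):
        ordered.extend(sorted(column["blocks"], key=lambda b: b["top"]))
    return [b["text"] for b in ordered]
```

A page-wide block such as a headline would merge all columns under this simple rule, so a real layout analysis needs considerably more logic than this.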

The text can be manually verified to increase its accuracy, especially when it is intended for future reuse.

Field-level recognition

ABBYY FineReader Engine 10 delivers complete field-level recognition capabilities to support key business processes such as forms processing, keyword classification, and keyword indexing. Powerful image processing functions improve its ability to detect small zones of any quality, whatever graphic features may affect recognition accuracy (e.g. underlined text, scanning garbage, spaces in the text).
Key functionality for field-level or zonal recognition includes multilingual OCR and ICR, OMR, barcode recognition and a range of specific functions, such as:

  • Data extraction from fields with various borders and frames, including combo-box, underlined fields, boxes, and even fields where the data does not fit within the field border
  • Definition of field content by setting alphabets, dictionaries, regular expressions, types of segmentations, handwriting styles, etc. 
  • Detection of in-field spacing, accurately recognizing fields where spaces are allowed. ABBYY FineReader Engine 10 also allows the use of dictionaries that contain word combinations with spaces
  • Intelligent processing of blocks with intersecting parts and lines, which recognizes only the text (words and symbols) located entirely within the block borders, saving time otherwise spent recognizing non-relevant text
  • Text block despeckle, with the ability to specify the size of white or black "garbage"
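
The despeckle item above can be pictured as removing connected groups of pixels below a chosen size. The following is a minimal sketch of that idea, not the Engine's implementation. The function `despeckle`, its `max_speck_size` parameter and the list-of-lists image format are assumptions made for this example.

```python
from collections import deque


def despeckle(image, max_speck_size):
    """Remove black specks of at most max_speck_size pixels.

    `image` is a list of rows of 0 (white) and 1 (black). A speck is a
    4-connected group of black pixels. A new image is returned.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = [row[:] for row in image]
    seen = [[False] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search collects one connected component.
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and image[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Small components are treated as garbage and erased.
            if len(component) <= max_speck_size:
                for cy, cx in component:
                    result[cy][cx] = 0
    return result
```

White "garbage" (small holes in black areas) could be handled the same way by inverting the image, despeckling, and inverting it back.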

Field-level recognition is supported by the Engine’s special tools for developers such as Voting API and "On-the-Fly" Recognition Tuning. For details, please see Advanced Development Tools.
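
The general idea of voting can be shown without the Engine: several recognition passes each propose a reading for a field, and the reading with the most support wins. The sketch below is a generic confidence-weighted vote and does not reproduce the Voting API. The function `vote` and the `(text, confidence)` input format are hypothetical, and confidences are assumed to be positive.

```python
from collections import defaultdict


def vote(hypotheses):
    """Pick the reading with the highest total confidence.

    `hypotheses` is a non-empty list of (text, confidence) pairs, one per
    recognition pass. Returns (best_text, share), where share is the
    fraction of the total confidence that supports best_text.
    """
    if not hypotheses:
        raise ValueError("at least one hypothesis is required")
    totals = defaultdict(float)
    for text, confidence in hypotheses:
        totals[text] += confidence
    best = max(totals, key=totals.get)
    return best, totals[best] / sum(totals.values())
```

A low share could be used as a signal to send the field to manual verification rather than accept it automatically.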

User languages

ABBYY FineReader Engine provides an API for creating and editing recognition languages, creating copies of predefined recognition languages and adjusting them, and adding new words to user languages.

Below are two examples illustrating how user languages can help you to improve recognition quality:

  • In documents filled out by hand, the values in the form fields usually belong to a specific set such as city names, countries, zip codes, product codes, sums, etc. To improve the quality of ICR recognition, you can use user languages to describe the information which may be entered in each field.
  • If a document contains "structures" such as product codes, telephone numbers, passport numbers etc., recognition errors may occur. This happens because the program reads such structures letter by letter. To improve the recognition of product codes and the like, you can create a new recognition language which will help the program to read specific types of data correctly.
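
The two cases above can be sketched in plain Python to show why restricting the expected content helps. This is only an illustration of the principle and does not use the FineReader Engine API. The city list, the `AA-9999` product code format and both function names are hypothetical.

```python
import difflib
import re

# Hypothetical field definitions, not part of the FineReader Engine API.
CITY_DICTIONARY = ["Berlin", "Munich", "Hamburg", "Cologne"]
PRODUCT_CODE = re.compile(r"[A-Z]{2}-\d{4}")  # e.g. "AB-1234"

# Common OCR confusions between letters and digits.
TO_DIGIT = str.maketrans("OIlSB", "01158")
TO_LETTER = str.maketrans("0158", "OISB")


def match_dictionary(raw, words=CITY_DICTIONARY, cutoff=0.6):
    """Return the closest dictionary word for a raw reading, or None."""
    hits = difflib.get_close_matches(raw, words, n=1, cutoff=cutoff)
    return hits[0] if hits else None


def fix_product_code(raw):
    """Coerce a raw reading to the AA-9999 structure, or return None."""
    if len(raw) != 7 or raw[2] != "-":
        return None
    letters = raw[:2].upper().translate(TO_LETTER)  # positions that must be letters
    digits = raw[3:].translate(TO_DIGIT)            # positions that must be digits
    candidate = letters + "-" + digits
    return candidate if PRODUCT_CODE.fullmatch(candidate) else None
```

For instance, `match_dictionary("Ber1in")` resolves to `"Berlin"`, and `fix_product_code("A8-12O4")` yields `"AB-1204"`, because each position is interpreted according to the structure the field is known to have.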

Pattern Training

In the vast majority of cases FineReader Engine can successfully read texts without prior training. However, in such cases as recognition of decorative or outlined fonts or bulk input of low print quality documents, preliminary pattern training will prove useful.

The OCR SDK allows you to create and use user patterns or import them from the ABBYY FineReader desktop application (Professional or Corporate Edition). FineReader Engine is flexible enough to be built into an application of any architecture, whether a client workstation or a server-based solution.

 

