
ABBYY FineReader Engine 8.0 EPS

Key Features

Image Manipulation and Pre-processing

ABBYY FineReader Engine can receive images in two ways: directly from memory or by opening files. It supports major image formats, including multi-page TIFF and JPEG 2000 (Part 1), and works with black-and-white, grayscale and color images. It can also open PDF files, converting them into images using Adobe® PDF Library technology.
The FineReader Engine can also save original and modified images in various formats. For the full list of supported image formats, see the Specifications section.

Upon receiving images, the FineReader Engine can perform the following preprocessing functions to improve recognition:

  • Automatically de-skew images
    This feature is essential when images come from scanners and need compensation for skew. It does not require leading-edge borders or lines.
  • Split dual pages
    Used when books are scanned as two-page spreads, with the left and right pages in a single image. Recognition quality is higher if, after scanning, the image is split in two so that each part corresponds to a single book page. Recognition and layout analysis are then performed separately for each page, along with de-skewing if required.
  • Despeckle image (or image clean-up)
    Designed to eliminate random noise (speckles). An image may contain a large amount of "dust", i.e. a large number of excess dots. These dots typically appear on documents of medium-to-low print quality, and dots located close to character outlines may have an adverse effect on recognition quality. In these cases, despeckling helps to improve recognition quality.
  • Texture filtering and adaptive binarization
    Texture filtering technology helps to filter out background "noise" such as color and texture, increasing accuracy for difficult-to-read documents such as newsprint, color documents, faxes, and copies.
    Innovative Adaptive Binarization technology dynamically adjusts the brightness threshold for each image fragment during recognition. By applying individual parameters to each fragment, it produces significantly more accurate recognition results for documents with gray, color or variable-contrast backgrounds and textures.
  • Auto-detection of page orientation (90, 180, 270 degrees)
    This feature is very important in bulk input systems, where the direction in which each image was scanned is unknown. The FineReader Engine automatically detects the orientation of each page and corrects it if needed.
  • Manipulation of text color and background color inside rectangles
    This is an important feature for customers working with document management systems (DMS). A typical archiving scenario is as follows. A recognized document is stored in an archive both as an image and as plain text. The archive's text index also contains the coordinates of each character on the image. When a user receives search results from the archive, they get the image of the source document. On that image, using this FineReader Engine function, the search text is highlighted by changing the text color and background color within a rectangle that completely encloses the found text.
  • Adaptive image pre-processing for camera images
    This technology applies different processing algorithms to correct the specific distortions typically seen in digital camera images. It improves OCR accuracy on digital camera images by 40% on average.
  • Despeckling of an image in individual blocks (or zones), with the ability to specify the size of black dots.
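
The despeckling idea described above, including the ability to specify the size of black dots, can be illustrated with a small sketch. This is not the FineReader Engine API or its actual algorithm. It is a generic connected-component filter written in Python, and the names `despeckle` and `max_speck_size` are hypothetical, chosen only for illustration.

```python
from collections import deque


def despeckle(image, max_speck_size):
    """Return a copy of a binary image with small black blobs removed.

    image: list of rows, each a list of 0 (white) or 1 (black).
    max_speck_size: blobs with this many pixels or fewer are erased
                    (a hypothetical parameter, not an engine setting).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = [row[:] for row in image]
    seen = [[False] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Collect one 8-connected blob of black pixels with a BFS.
            blob = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                blob.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Blobs at or below the size limit are treated as dust.
            if len(blob) <= max_speck_size:
                for by, bx in blob:
                    result[by][bx] = 0
    return result
```

With a limit of one pixel, isolated dots are erased while larger shapes such as character strokes are kept. Raising the limit removes larger specks, at the risk of erasing punctuation or diacritics.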

The FineReader Engine also offers a number of useful image manipulation functions, such as image scaling, image clipping, creating previews, rotating (90, 180, 270 degrees), mirroring and inverting.
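
The Adaptive Binarization feature described earlier in this section adjusts the brightness threshold for each image fragment. The following Python sketch shows the general idea using one simple scheme: a mean-based threshold per square tile. It is an assumption made for illustration, not the engine's algorithm, and `adaptive_binarize`, `tile` and `offset` are hypothetical names.

```python
def adaptive_binarize(gray, tile=8, offset=10):
    """Binarize a grayscale image with one threshold per tile.

    gray: list of rows of brightness values (0 = black, 255 = white).
    tile: side length of the square fragments, in pixels (assumed scheme).
    offset: how far below the local mean a pixel must be to count as ink.
    Returns rows of 1 (ink) and 0 (background).
    """
    height = len(gray)
    width = len(gray[0]) if height else 0
    out = [[0] * width for _ in range(height)]
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            bottom = min(top + tile, height)
            right = min(left + tile, width)
            pixels = [gray[y][x]
                      for y in range(top, bottom)
                      for x in range(left, right)]
            # Each fragment gets its own threshold from its own brightness.
            threshold = sum(pixels) / len(pixels) - offset
            for y in range(top, bottom):
                for x in range(left, right):
                    out[y][x] = 1 if gray[y][x] < threshold else 0
    return out


# Left half: light background (200) with ink at 120.
# Right half: darker background (100) with ink at 40.
page = [
    [200, 200, 100, 100],
    [200, 120, 100, 40],
]
print(adaptive_binarize(page, tile=2))
```

In this example each 2x2 tile is thresholded against its own mean, so only the two ink pixels are marked. A single global threshold derived the same way (overall mean 132.5 minus 10) would also mark the darker background value 100 as ink.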

Document Analysis and Full Layout Retention

The document analysis functions of the FineReader Engine API handle tasks such as automatic document conversion with full-page layout retention, zonal OCR with manually defined blocks, etc. They include:

  • auto-detection of page orientation - 90, 180, 270 degrees (see above in Image Manipulation and Pre-processing);
  • auto-detection of text blocks, tables, barcodes and pictures;
  • auto-detection of vertical text in table cells;
  • manual block zoning (adding, removing and editing blocks).

The unique FineReader Engine features are:

  • Document Analysis for Archiving Tasks
    This function automatically detects and recognizes all text on documents including text embedded in pictures, charts, and diagrams. Developers may choose to use this mode of document analysis to extract exhaustive full-text information on documents needed for document index building (as in DMS, CMS, Archiving systems).
  • Document Analysis for Invoices
    A special document analysis function designed as a preprocessing engine for converting semi-structured documents, such as invoices, payment drafts, checks, transfers, business cards, agreements, health claim forms, resumes, etc. In this preprocessing role, the function is designed to find as much text on these documents as possible, including characters and numbers, even if this information is located within stamps, pictures, logos or small-text areas. Unlike standard full-page document analysis, this specialized document analysis assumes that all printed information on the documents is text. It also ensures that important text information is not identified as graphic elements and that words or numerical values are not separated into multiple characters. As a result, maximum information about the text, including its coordinates, is available for analysis, field-by-field processing and parsing at subsequent processing stages by other systems.
  • Export to Multiple Formats including PDF, RTF/DOC/WordML, XML, and HTML with exact layout retention.

Recognition

OCR

  • Recognition of 190 OCR languages.
  • 170 languages with Latin, Cyrillic, Greek, Hebrew and Armenian characters
  • 46 languages have dictionary/morphology support.
  • Recognition of multilingual documents.
  • Recognition of typewritten documents.
  • Chinese, Japanese and Korean (CJK) character recognition (for more details about CJK OCR, see the “Licensing Policy, add-on modules” section).
  • Fast mode recognition.
    Designed for high-volume document processing applications where speed is more important than accuracy. This mode increases processing speed by 200-250%, making it particularly useful with document management and archiving systems.
  • Recognition of OCR-A, OCR-B, MICR (E13B), and CMC7.
  • FineReader XIX.
    There are many old documents, books, and newspapers published around the world between the 17th and 20th centuries. The set of ABBYY FineReader Engine functions called “FineReader XIX” provides a unique capability to recognize texts published between 1600 and 1937 in English, French, German, Italian, and Spanish. FineReader XIX supports special fonts such as Fraktur, Schwabacher and the majority of Gothic fonts.
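
To put the fast mode figure above into perspective, the short Python sketch below converts a percentage speed increase into processing time. It assumes that "increases processing speed by N%" means the new speed is (1 + N/100) times the standard speed. That reading is an assumption, and the function name is hypothetical.

```python
def time_after_speedup(baseline_minutes, increase_percent):
    """Processing time once speed has increased by the given percentage.

    Assumes an N% increase means speed becomes (1 + N/100) times baseline.
    """
    speed_factor = 1 + increase_percent / 100
    return baseline_minutes / speed_factor


for percent in (200, 250):
    minutes = time_after_speedup(60, percent)
    print(f"{percent}% faster: {minutes:.1f} minutes")
```

Under this reading, a batch that takes 60 minutes in standard mode would take 20 minutes at a 200% increase and about 17 minutes at 250%.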

For the full list of supported OCR languages, see the “Specifications” section.

Barcode Recognition

  • Recognition of 1D barcodes.
    For the full list of supported barcode types, see the Specifications section.
  • 2D barcode recognition (PDF417).
    2D barcode recognition supports PDF417, the industry standard for 2D barcodes. PDF417 encodes up to 1.1 kilobytes of data, including text and graphics information.
  • Fast barcode extraction.
    This feature automatically finds and recognizes barcodes at any angle on a document. It works for both 1D and 2D barcodes.

Field-level/Zonal Recognition Support

The SDK provides powerful field-level/zonal recognition capabilities that improve accuracy and speed on small fields and zones. This functionality is crucial for processing tasks such as data extraction, keyword indexing, and keyword classification. Key functionality for field-level or zonal recognition includes multilingual OCR and barcode recognition, as well as:

  • Definition of field content by setting alphabets and dictionaries.
  • Detection of in-field spacing - Detection and recognition of fields where spaces are allowed. The FineReader Engine 8.0 also allows the use of dictionaries that contain word combinations with spaces.
  • Text block despeckle, with the ability to specify the size of white or black "garbage".
  • External recognition tuning features - Provide integrators with multiple word-level and character-level hypotheses and allow them to influence the choice of hypothesis by inserting additional ranking criteria during the recognition process.
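
The way alphabets, dictionaries and additional ranking criteria can narrow the choice among recognition hypotheses can be sketched as follows. This Python example does not use the FineReader Engine API. The data layout (text and confidence pairs) and the names `choose_hypothesis` and `extra_score` are hypothetical and serve only to illustrate the concept.

```python
def choose_hypothesis(hypotheses, alphabet, dictionary=None, extra_score=None):
    """Pick the best text for a field from scored recognition hypotheses.

    hypotheses: list of (text, confidence) pairs, confidence in 0..1.
    alphabet: set of characters permitted in the field.
    dictionary: optional set of permitted values (entries may contain spaces).
    extra_score: optional callable(text) -> float added to the confidence,
                 standing in for an integrator's own ranking criterion.
    Returns the winning text, or None if nothing satisfies the constraints.
    """
    best_text, best_score = None, float("-inf")
    for text, confidence in hypotheses:
        if any(ch not in alphabet for ch in text):
            continue  # violates the field alphabet
        if dictionary is not None and text not in dictionary:
            continue  # not an allowed field value
        score = confidence + (extra_score(text) if extra_score else 0.0)
        if score > best_score:
            best_text, best_score = text, score
    return best_text


# A numeric field: the letter "O" and lowercase "l" are ruled out by the alphabet.
digits = set("0123456789")
print(choose_hypothesis([("1O5", 0.9), ("105", 0.8), ("l05", 0.7)], digits))
```

In the numeric-field example, the highest-confidence hypothesis "1O5" is rejected because it contains a letter, so "105" is selected. Passing a dictionary that contains entries such as "NEW YORK" lets a field accept values with internal spaces.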

PDF Conversion

The SDK includes powerful PDF conversion technology with extensive functions for PDF input and output including:

PDF Input

  • Fast and accurate PDF processing. The engine analyzes internal information in source PDF files, such as annotations, metadata, text objects, font dictionaries and content streams, and uses it to process PDFs faster and more accurately. When text is embedded, it examines the integrity of the text layer and decides, block by block, whether to extract the text or apply OCR.
  • Capture of internal PDF information. Extracts internal PDF links, hyperlinks and document properties such as subject, author, title, and keywords.
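
The block-by-block choice between extracting embedded text and applying OCR can be pictured with the sketch below. The source does not describe how text layer integrity is actually assessed, so the heuristic here (the share of replacement or control characters) is purely an assumption, as are the names `plan_block`, `plan_page` and `max_bad_ratio`.

```python
def plan_block(embedded_text, max_bad_ratio=0.05):
    """Return "extract" or "ocr" for one block of a PDF page.

    embedded_text: text found in the block's text layer ("" or None if absent).
    max_bad_ratio: tolerated share of unusable characters (an assumed cutoff).
    """
    if not embedded_text or not embedded_text.strip():
        return "ocr"  # no usable text layer in this block
    # Count characters that suggest a broken font mapping: the Unicode
    # replacement character and control characters other than tab/newline/CR.
    bad = sum(
        1 for ch in embedded_text
        if ch == "\ufffd" or (ord(ch) < 32 and ch not in "\t\n\r")
    )
    return "extract" if bad / len(embedded_text) <= max_bad_ratio else "ocr"


def plan_page(blocks, max_bad_ratio=0.05):
    """Apply the per-block decision to every block on a page."""
    return [plan_block(text, max_bad_ratio) for text in blocks]


print(plan_page(["Invoice 2041", "", "\ufffd\ufffd\ufffd abc"]))
```

Blocks with a clean text layer are extracted directly, which avoids OCR work, while empty or damaged blocks fall back to OCR.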

PDF Output

  • PDF Security and Encryption Support
    The SDK supports a variety of PDF security settings, increasing its applicability for government agencies and other organizations demanding high security.
      o "Open File" password settings designed to prevent unauthorized access to a document.
      o Restriction of certain operations, such as printing, editing or extracting file content, by assigning permission passwords.
      o Support for the latest encryption standards.

  • Output in tagged PDF format – Tagged PDF can be "reflowed" to fit different page or screen widths, which makes it ideal for handheld devices (PDAs) and for screen readers typically used by visually impaired users.
  • Page size – Ability to set the size for all pages of an output PDF file.
  • Links in PDF files – Re-creates hyperlinks within a PDF file.