ABBYY Empowers Professionals to Develop Highly Efficient Applications Using Key Cornerstone Technologies in a Single SDK

September 27, 2005

FineReader Engine 8.0 Raises the Bar with Field-Level/Zone Recognition Support, New PDF Processing and Development Platform Enhancements

ABBYY today announced FineReader Engine 8.0, the latest platform release of its powerful recognition SDK. By integrating full-page recognition, field-level recognition, PDF conversion and data capture capabilities in one SDK, Engine 8.0 essentially provides a single source for developers to integrate ABBYY's technology into a variety of DMS and ECM applications including: document/content processing, classification, indexing, archiving, document/PDF conversion, forms processing, and data capture from semi-structured forms and documents.

For the first time, FineReader Engine 8.0 addresses key new audiences with major field-level recognition enhancements, making it an ideal platform for applications such as keyword indexing and document classification, control and verification systems, and data extraction from different documents by intelligent analysis (checks, invoices, passports). These features, combined with enhanced PDF conversion and new customization tools for added developer support, make FineReader Engine 8.0 the most accurate and comprehensive software development kit for document conversion and data capture. Unlike any other toolkit in its class, FineReader Engine includes all the key functionality needed to support today's DMS and ECM applications.

ABBYY FineReader Engine 8.0 supports 189 OCR and 91 ICR languages, OMR, plus 1D and 2D barcodes. The new version delivers an overall boost in recognition accuracy, enhanced field-level recognition, new document analysis tools, and new features such as full-text index preprocessing, making it an effective tool for different tasks. It also provides programming-specific tools to aid developers in creating accurate and efficient applications, such as external Voting API support (for solutions with multiple engines) and lower-level access for "on-the-fly" recognition tuning. Developers can also take advantage of the code sample database, complete with sample images and benchmark data for common use cases. ABBYY also offers professional services and works closely with its developer community to help achieve the optimal balance between speed and accuracy for each particular application.

Overall Recognition Enhancements

OCR accuracy enhancement. ABBYY FineReader Engine 8.0 delivers a significant increase in overall recognition accuracy with up to 30 percent accuracy improvement for "difficult-to-read" images such as faxes and documents scanned at low resolution.
Fast mode for ICR. When fast mode is selected, FineReader Engine performs field-level ICR up to two times faster.
Adaptive image pre-processing for camera images. The new technology applies different processing algorithms and corrects specific image distortions typically seen on digital camera images. This provides an improvement of up to 40% in digital camera OCR (compared to previous versions of the technology).

Field-level/Zonal Recognition Improvements

ABBYY FineReader Engine 8.0 contains a complete set of field-level recognition functions that use OCR, ICR, OMR or barcode recognition to extract text or data from specified zones or snippets of images. Special enhancements in version 8.0 improve both accuracy and speed on small fields/zones. These improvements include:

Fast mode ICR, performing ICR up to two times faster.
Better text extraction from the fields, even when the text is overlapped by field lines.
Detection of in-field spacing, recognizing fields in which the spaces are allowed. Version 8.0 also includes dictionaries which may contain word combinations with spaces.
Intelligent processing of blocks with intersecting parts and lines, recognizing the text that is completely located within the block borders, without wasting time on recognizing surrounding irrelevant text blocks.
Text block despeckle, with the ability to specify the size of white or black "garbage".
Voting API, supplying word- and character-level hypotheses for use in subsequent voting scenarios.
"On-the-fly" recognition tuning, allowing integrators to influence hypothesis choice by inserting additional ranking criteria during the recognition process.

Full Page Recognition/Document (PDF) Conversion Features

With significant technology enhancements, ABBYY FineReader Engine 8.0 offers higher performance and recognition up to twice as fast when converting source PDF files. With extensive functions for both PDF input and output, version 8.0 also provides developers with powerful new tools to create PDF conversion applications (including PDF to a variety of formats, or image to searchable PDF).

Enhanced PDF Conversion (PDF Input)

More accurate and up to two times faster PDF processing
When processing PDF files, ABBYY FineReader Engine determines whether or not the text is embedded, examines the integrity of the text layer, and analyses internal information within the source PDF files (such as annotations, metadata, text objects, font dictionaries and content streams). Using all this information, it makes a decision as to whether to extract the text or apply OCR. It examines each block individually, and selects the most appropriate method to apply to each block. This process ensures more accurate and faster PDF conversion.
Extraction of internal PDF links and hyperlinks
Compliance with the security settings of source PDF files
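
The per-block decision described above (extract embedded text where the text layer can be trusted, otherwise apply OCR) can be illustrated with a minimal Python sketch. This is not ABBYY's API or algorithm: the PdfBlock class, its fields and the choose_method function are hypothetical names invented for illustration, and the real Engine weighs more signals (annotations, metadata, fonts, content streams) than the two flags shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PdfBlock:
    """Hypothetical stand-in for one block of a source PDF page."""
    embedded_text: Optional[str]   # text found in the PDF text layer, if any
    text_layer_intact: bool        # assumed result of an integrity check


def choose_method(block: PdfBlock) -> str:
    """Return "extract" when embedded text exists and looks reliable, else "ocr"."""
    if block.embedded_text and block.text_layer_intact:
        return "extract"
    return "ocr"


blocks = [
    PdfBlock(embedded_text="Invoice 2005-09", text_layer_intact=True),
    PdfBlock(embedded_text="Inv?ice ##", text_layer_intact=False),  # damaged layer
    PdfBlock(embedded_text=None, text_layer_intact=False),          # image only
]

# Each block is examined individually, as in the description above.
plan = [choose_method(b) for b in blocks]
```

Here only the first block would be extracted directly; the other two would be sent to OCR.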

Enhanced PDF Output

PDF Security Settings and Encryption Support. ABBYY FineReader Engine 8.0 supports open and permission passwords for output PDF files, allowing users to restrict printing, editing, or extraction of file content. This important feature makes it well suited for professionals working in government ministries and other organizations demanding high security.
Tagged PDF. In addition to output to a variety of searchable PDFs and image only PDFs, version 8.0 now creates Tagged PDFs that allow text to be reflowed to fit different page or screen widths. This makes it easy to generate PDF files that are optimized for viewing on handheld devices and accessible by screen readers typically used by visually impaired people.
Metadata for PDF files. It is possible to add the following metadata during PDF export: bookmarks, hyperlinks, and document properties.

Document Analysis for Full Text Indexing

This feature supports automatic detection and recognition of text on an image, including the text embedded in pictures, charts and diagrams. Document Analysis for Full Text Indexing provides exhaustive information on text that is vital for further document index building. This makes FineReader Engine 8.0 truly indispensable for indexing solutions (for building an index in or for DMS, CMS and archiving systems).

Data Capture from Semi-Structured Forms and Documents

The new ABBYY FineReader Engine offers processing of semi-structured forms and documents through support for the latest ABBYY FlexiCapture Studio 1.5 tool. This makes processing forms and semi-structured documents even more accurate and minimizes the number of adjustments required for each project. New features supported by FlexiCapture Studio 1.5 include:

Table Element Support, enabling proper reading of tables in documents and providing easy extraction of line-item details. Ideal for processing invoices and a great variety of other documents.
Specialized Numerical Element Support, with the new "Phone" and "Currency" element types streamlining the description of these data types on the form and thus increasing capture quality.
Texture Filtering, offering enhanced pre-processing technologies to screen out irrelevant texture that may affect recognition quality.
Multiple Language Selection for Pre-recognition, enabling the pre-selection of mixed-language combinations, for example English-German, for easier processing of multilingual documents.

Development Platform Function Enhancement

Sample Code for Maximum Performance and Efficiency
The new SDK is supplied with a database of common Engine usage samples that help tune FineReader Engine appropriately for each particular project. The samples are a set of "ready-to-load" profiles that balance speed and accuracy. The profiles are designed for particular tasks, such as field-level recognition, archiving with imaging and indexing (e.g. searchable PDFs), full-page conversion to RTF and HTML, etc. The database also contains sample images and benchmarks.

External Voting Algorithm Support
When FineReader Engine is used as one of the participating engines in a third-party application, it supplies recognition alternatives (or hypotheses) with a relevant confidence level for characters, words and intercharacter separation. This information helps developers design an efficient and accurate voting algorithm. For example, when recognizing an "O", ABBYY FineReader Engine may return three hypotheses: "0" (zero), with 60% confidence; capital "O", with 80% confidence; and capital "C", with 10% confidence. As an example of intercharacter separation, the possible hypotheses for an "m" would be "m", "rn", and "in".
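
As a rough illustration of how a third-party application might combine such hypotheses, the Python sketch below sums weighted confidences across engines and picks the highest total. It is one simple voting scheme among many, not the Voting API itself: the vote function, the per-engine weights and the second engine's output are assumptions made for the example, and only the first engine's values come from the example above.

```python
from collections import defaultdict


def vote(engine_outputs, weights=None):
    """Combine (text, confidence) hypotheses from several engines.

    Returns the winning text and the full ranking, scored by the
    weighted sum of confidences each candidate received.
    """
    if weights is None:
        weights = [1.0] * len(engine_outputs)
    scores = defaultdict(float)
    for hypotheses, weight in zip(engine_outputs, weights):
        for text, confidence in hypotheses:
            scores[text] += weight * confidence
    if not scores:
        raise ValueError("no hypotheses supplied")
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[0][0], ranked


# Hypotheses taken from the example above: "0", capital "O", capital "C".
engine_a = [("0", 0.60), ("O", 0.80), ("C", 0.10)]
# Hypothetical output of a second engine, invented for this sketch.
engine_b = [("0", 0.70), ("O", 0.40)]

winner, ranked = vote([engine_a, engine_b])
```

With equal weights the zero wins, because both engines give it moderate support; lowering the second engine's weight lets the first engine's preferred capital "O" prevail.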

On-the-Fly Core Recognition Tuning
ABBYY FineReader Engine 8.0 provides developers with the access and ability to manipulate the engine during the recognition process on a core level. The FineReader recognition engine generates hypotheses (or recognition alternatives) and allows developers to influence or fine-tune the procedure of setting the confidence level for each hypothesis (or selecting the best hypothesis) using their own specific ranking criteria.
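
The effect of a custom ranking criterion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rerank and numeric_field functions are hypothetical, the hypothesis values reuse the "O" example from the voting section, and the sketch rescales a finished hypothesis list, whereas the Engine applies such criteria during the recognition process itself.

```python
def rerank(hypotheses, criterion):
    """Rescale each (text, confidence) pair by a caller-supplied factor
    and return the hypotheses ordered from most to least likely."""
    rescored = [(text, confidence * criterion(text)) for text, confidence in hypotheses]
    return sorted(rescored, key=lambda h: h[1], reverse=True)


def numeric_field(text):
    """Hypothetical criterion for a digits-only field: halve the
    confidence of any hypothesis that is not made up of digits."""
    return 1.0 if text.isdigit() else 0.5


hypotheses = [("0", 0.60), ("O", 0.80), ("C", 0.10)]
best_text, best_confidence = rerank(hypotheses, numeric_field)[0]
```

Without the criterion the capital "O" ranks first; with it, the zero becomes the best hypothesis for a field known to contain only digits.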

"Our developer customers want to use FineReader Engine to enhance their ISV applications with document conversion and data capture capabilities that deliver the optimal balance between accuracy and speed," explained Alex Rylov, chief product manager for ABBYY's technology licensing products.

"FineReader Engine 8.0 delivers a powerful combination of core technologies, and builds upon that by delivering productivity tools such as diagnostic tools, pre-defined samples for popular processing scenarios, a Voting API, and recognition tuning. We give our customers the tools they need to significantly influence their productivity, while our technical teams can work closely with theirs to help them achieve their ideal levels of performance - whatever the application."

Input/Output Formats Support for All Types of Functions

ABBYY FineReader Engine supports a variety of input image formats (including BMP, PCX, DCX, JPEG, PNG, TIF and PDF) and document saving formats (including DOC, RTF, PDF, HTML, PPT, TXT, XLS, DBF, and three types of XML). The new version also supports new input formats: GIF and DjVu, which are very useful for web publishing, online archiving, SPAM filtering and other Web-related tasks.

Availability and Pricing

ABBYY FineReader Engine consists of a set of Dynamic Link Libraries (DLLs) and is accessible through a standard programming interface that conforms to the COM (Component Object Model) standard, supporting development environments such as C/C++, Visual Basic, and Visual Studio .NET.

The full product is scheduled to ship starting in November 2005. Information on licensing models, pricing, and other technical information is available from your local ABBYY office. For additional product and sales information, please visit www.ABBYY.com.
