ABBYY Announces FineReader Engine 7.0 SDK

September 16, 2003

Next Generation Platform Adds Extended Functionality to Address Key Vertical Markets

ABBYY®, a leading developer of document recognition and linguistic technologies, today announced ABBYY FineReader Engine 7.0, the next generation of the company’s Software Development Kit (SDK) for integrating ABBYY OCR, ICR, OMR and barcode recognition technologies into Windows applications.

With the announcement of FineReader Engine 7.0, ABBYY dramatically expands the reach of its technology by offering extended functionality to address key vertical markets, including the financial and banking sectors, library archiving organizations and Asian markets. In addition to core platform enhancements to overall accuracy, document analysis, and export functions, FineReader Engine 7.0 adds sophisticated new modules for recognition of ancient and historical texts, PDF files, invoices, barcodes, and Asian characters.

“ABBYY’s goal is to deliver recognition technologies that help organizations transform documents into manageable data that can be processed, searched, indexed, edited, sent, or tabulated. As recognition technologies become more advanced, the true technological challenge in achieving this lies in the ability to address specialized texts and document formats,” explained Vadim Tereshchenko, FineReader division vice president. “With FineReader 7.0, we offer new add-on modules offering technology breakthroughs that expand the ability of our software to perform in key vertical markets.”

With the release of ABBYY FineReader Engine 7.0, developers will gain access to the powerful functionality of an advanced OCR system which is already being used by many leading companies worldwide, such as Canon Australia, Cardiff, Fujitsu, Kofax, Panasonic, Saperion, and ZyLab. ABBYY FineReader, ABBYY’s flagship OCR application based on the FineReader Engine, has won more than 100 awards worldwide since 1998.

Platform Enhancements in 7.0

FineReader Engine 7.0 is based on an entirely new recognition platform that offers the following enhancements:

Recognition Accuracy

Enhancements to ABBYY’s proprietary IPA Technology and other tools for fine-tuning recognition increase FineReader’s accuracy significantly over previous versions. A major contributor to overall letter, word, and line recognition accuracy is the addition of new structural character models. In addition, new image preprocessing algorithms improve the technology’s ability to read text printed over an image, low-contrast documents, and poorly scanned pages. These gains come from further enhancements to two image preprocessing technologies that aid in recognizing this kind of text: Adaptive Binarization and Intelligent Background Removal. Adaptive Binarization uses a “dynamic” or “intelligent threshold” technique that tunes the image contrast line by line and word by word, optimizing character quality to achieve the most accurate recognition results. Intelligent Background Removal removes textures and background “noise” that could interfere with text recognition, even on complex or degraded documents.
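
The idea behind a locally tuned threshold can be illustrated with a generic local-mean binarization. This is only a minimal sketch of the general technique, not ABBYY's Adaptive Binarization algorithm, whose details are not given here; the function names, the 3x3 window, and the `bias` value are all assumptions chosen for illustration.

```python
def adaptive_binarize(gray, window=3, bias=20):
    """Binarize a grayscale image (list of rows, values 0-255) with a local-mean threshold.

    A pixel becomes 1 (ink) when it is darker than the mean of its
    window x window neighborhood minus `bias`; otherwise it becomes 0.
    """
    height, width = len(gray), len(gray[0])
    half = window // 2
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect the neighborhood, clipped at the image borders.
            neighbors = [
                gray[ny][nx]
                for ny in range(max(0, y - half), min(height, y + half + 1))
                for nx in range(max(0, x - half), min(width, x + half + 1))
            ]
            local_mean = sum(neighbors) / len(neighbors)
            row.append(1 if gray[y][x] < local_mean - bias else 0)
        result.append(row)
    return result


def global_binarize(gray, threshold=128):
    """Fixed-threshold baseline, shown only for comparison."""
    return [[1 if value < threshold else 0 for value in row] for row in gray]


# Hypothetical test page: the background fades from 220 to 85 (uneven
# illumination) and two dark vertical strokes sit at columns 2 and 7.
ROW = [220, 205, 60, 175, 160, 145, 130, 40, 100, 85]
page = [list(ROW) for _ in range(3)]

adaptive = adaptive_binarize(page)
fixed = global_binarize(page)
print(adaptive[0])  # [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
print(fixed[0])     # [0, 0, 1, 0, 0, 0, 0, 1, 1, 1]
```

On this sample the fixed threshold mistakes the darker background on the right for ink, while the local threshold keeps only the two strokes.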

Improved Document and Image Analysis

FineReader Engine 7.0 offers a new algorithm, Multilevel Document Analysis (MDA), a process that examines the document at several levels, from characters to words, lines, and paragraphs, and finally reconstructs the entire document. With this sophisticated document and image analysis algorithm, FineReader Engine “understands” each formatting element in a document. As a result, applications developed using the Engine will be able to retain complex layouts, such as the placement of images and columns on the page, the formatting of tables, and font sizing. Other key benefits include improved recognition accuracy for complex tables, multiple-column documents with images, HTML formatting, and bullet points.
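
A bottom-up pass of the kind described, from characters to words to lines, can be sketched as follows. This is a simplified illustration under stated assumptions, not the MDA algorithm itself: the `CharBox` type, the pixel tolerances, and the function names are hypothetical, and the paragraph and page levels are omitted.

```python
from dataclasses import dataclass


@dataclass
class CharBox:
    char: str
    x: int       # left edge in pixels
    y: int       # baseline position in pixels
    width: int


def group_into_lines(chars, line_tolerance=5):
    """Cluster characters whose baselines lie within line_tolerance pixels."""
    lines = []
    for box in sorted(chars, key=lambda b: (b.y, b.x)):
        if lines and abs(lines[-1][-1].y - box.y) <= line_tolerance:
            lines[-1].append(box)
        else:
            lines.append([box])
    # Within each line, restore left-to-right reading order.
    return [sorted(line, key=lambda b: b.x) for line in lines]


def group_into_words(line, gap_threshold=8):
    """Split one line into words wherever the horizontal gap is large."""
    words, current = [], [line[0]]
    for prev, box in zip(line, line[1:]):
        gap = box.x - (prev.x + prev.width)
        if gap > gap_threshold:
            words.append(current)
            current = [box]
        else:
            current.append(box)
    words.append(current)
    return ["".join(b.char for b in word) for word in words]


def analyze(chars):
    """Characters -> lines -> words; returns a list of word lists, one per line."""
    return [group_into_words(line) for line in group_into_lines(chars)]


# Unordered character boxes with slightly jittered baselines.
boxes = [
    CharBox("S", 50, 10, 10), CharBox("7", 0, 40, 10), CharBox("R", 24, 9, 10),
    CharBox("O", 0, 10, 10), CharBox("0", 18, 41, 10), CharBox("K", 74, 11, 10),
    CharBox("C", 12, 11, 10), CharBox(".", 12, 40, 4), CharBox("D", 62, 10, 10),
]
print(analyze(boxes))  # [['OCR', 'SDK'], ['7.0']]
```

A real layout analyzer would also feed higher-level findings (columns, tables) back into the lower levels; this sketch only runs in one direction.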

New Export and Synthesis Capabilities

ABBYY FineReader 7.0 also delivers significant improvements in export and synthesis, which include:

- Improved PDF export. FineReader Engine now creates “linearized” PDF files that are optimized for publishing on the Web.
- Improved WYSIWYG HTML output. The retention of complex formatting elements (e.g. text flowing around non-rectangular images) has been improved in HTML. The resulting HTML files are now smaller in size, which is particularly important for documents published on the Internet.
- Output to Microsoft PowerPoint.
- Smaller file sizes when exporting results to Microsoft Word.

Extended Functionality with New Add-On Modules

With the development of FineReader Engine 7.0, ABBYY has focused on fine-tuning the technology to deliver special features and functions that address key markets. FineReader Engine’s add-on modules offer specialized functionality to support software developers, system integrators and VARs working with specific types of documents and files. FineReader Engine 7.0 add-on modules include:

1. PDF Opening

ABBYY FineReader Engine 7.0 offers an intelligent PDF conversion process. It first performs standard recognition on the image file, then extracts the “text layer” (if available) to check accuracy. This scheme circumvents potential recognition errors when the text layer in the PDF is written using specially encoded fonts.
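
One way such a cross-check could work is to compare the recognized text with the embedded text layer and trust the layer only when the two largely agree. The sketch below is an assumption about the general approach, not the Engine's actual logic; `choose_text` and the 0.8 agreement threshold are hypothetical.

```python
from difflib import SequenceMatcher


def choose_text(ocr_text, text_layer, min_agreement=0.8):
    """Pick between OCR output and a PDF text layer.

    Returns (text, source). The text layer is used only when it exists and
    is similar enough to the OCR result; a layer written with a custom font
    encoding tends to look like unrelated characters and is rejected.
    """
    if not text_layer:
        return ocr_text, "ocr"
    agreement = SequenceMatcher(None, ocr_text, text_layer).ratio()
    if agreement >= min_agreement:
        return text_layer, "text_layer"
    return ocr_text, "ocr"


# The layer agrees with the OCR result apart from one misread character,
# so the (exact) text layer is preferred.
print(choose_text("lnvoice 2003", "Invoice 2003"))

# The layer uses a shifted glyph encoding and extracts as gibberish,
# so the OCR result is kept.
print(choose_text("Invoice 2003", "Lqyrlfh 5336"))
```

The threshold trades off two risks: set too low, a scrambled layer may be accepted; set too high, a good layer may be discarded because of a few OCR misreads.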

2. FineReader XIX: Fraktur/Black Letter Script Recognition

FineReader 7.0 offers the industry’s first omnifont OCR solution for “Fraktur” or “Black Letter” prints used in historical texts from the 19th and 20th centuries. FineReader will recognize elaborate type prints as well as old-style roman type characters, such as the elongated “s” used in early English or French texts. This feature, developed together with the European METAe archiving project, is already being tested by leading universities. Well-suited for archiving a variety of old books and documents, FineReader XIX includes dictionaries to support German, English, French, Italian, and Spanish.

3. Extended XML Output Module

The Extended XML Output module exports recognition results tagged with document structure information, including the location of graphics, tables, paragraphs and even characters, as well as the full formatting information about characters, paragraphs and tables. Post-recognition processing makes it easy to export this information to external applications, such as document management and content management systems and databases (e.g. MS SQL Server, Oracle or MS SharePoint). XML output is offered in the following formats:
- Native XML (includes full information about the recognized text)
- Microsoft Word XML. Recognized files can be exported as native XML files using the schema defined by Microsoft Word 2003.
- ASCII XML Output. A special ASCII XML Output module has been designed for DMS and archiving applications. Resulting files contain information about character positions and character confidence levels and can be easily indexed. The module automatically eliminates those parts of the text that have a low confidence level.
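
To make the idea of per-character positions and confidence levels concrete, here is a minimal sketch that writes such data to XML and drops low-confidence characters. The element and attribute names (`document`, `char`, `left`, `confidence`, and so on) and the cutoff of 50 are invented for illustration; they are not the schema the Engine produces.

```python
import xml.etree.ElementTree as ET


def export_chars(chars, min_confidence=50):
    """Serialize recognized characters to XML, skipping low-confidence ones.

    `chars` is a list of dicts with keys: value, left, top, right, bottom,
    confidence (0-100). All names here are hypothetical.
    """
    root = ET.Element("document")
    text = ET.SubElement(root, "text")
    for ch in chars:
        if ch["confidence"] < min_confidence:
            continue  # eliminate characters recognized with low confidence
        node = ET.SubElement(
            text, "char",
            left=str(ch["left"]), top=str(ch["top"]),
            right=str(ch["right"]), bottom=str(ch["bottom"]),
            confidence=str(ch["confidence"]),
        )
        node.text = ch["value"]
    return ET.tostring(root, encoding="unicode")


recognized = [
    {"value": "A", "left": 10, "top": 5, "right": 22, "bottom": 25, "confidence": 98},
    {"value": "B", "left": 24, "top": 5, "right": 36, "bottom": 25, "confidence": 20},
    {"value": "C", "left": 38, "top": 5, "right": 50, "bottom": 25, "confidence": 91},
]
xml_text = export_chars(recognized)

# A consumer such as an indexer can read the surviving characters back.
parsed = ET.fromstring(xml_text)
print("".join(node.text for node in parsed.iter("char")))  # AC
```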

4. Chinese and Japanese Recognition

ABBYY FineReader Engine 7.0 now has an add-on module for Chinese (Traditional and Simplified) and Japanese (Hiragana, Katakana and Kanji) OCR. Seamlessly integrated with the core engine, this module allows developers to use FineReader Engine’s existing API to execute recognition for Chinese and Japanese texts. Functions include: recognition of multi-language documents (Chinese-English and Japanese-English texts), automatic recognition of vertical and horizontal texts, automatic detection of text blocks, tables, columns and pictures on a document, manual drawing of recognition zones, detailed information about recognized characters, and export of recognized text into RTF, XML, HTML, TXT, CSV, and DBF file formats.

5. Document Analysis for Invoices

A special OCR module developed for the financial and banking market segments, Document Analysis for Invoices can be used as a pre-processing engine for the conversion of semi-structured documents such as invoices, payment drafts, checks and transfers. In this pre-processing role, the module is designed to find as much text on these documents as possible, including characters and numbers — even if this information is located within stamps, logos or small text areas.

In contrast to standard OCR, this specialized OCR module assumes all printed information on documents is text and ensures that important text information is not incorrectly identified as graphic elements and that words or numerical values are not separated into multiple characters. As a result, a maximum of textual information is obtained from the document, including the coordinates, and is available for analysis, field-by-field processing and parsing, which are performed at subsequent processing stages by other systems.

6. OMR (Optical Mark Recognition) Module

The Optical Mark Recognition (OMR) module recognizes simple check marks, grouped check marks, model check marks and check marks with “corrections” made by hand.
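
A common, simple approach to mark detection is to measure how much of a checkbox region is filled with dark pixels. The sketch below illustrates that general idea for single and grouped check marks; it is not the module's method, and the function names, the sample bitmaps, and the 0.15 threshold are assumptions.

```python
def fill_ratio(box):
    """Fraction of dark pixels (1s) in a binarized checkbox region."""
    total = sum(len(row) for row in box)
    dark = sum(sum(row) for row in box)
    return dark / total if total else 0.0


def is_checked(box, threshold=0.15):
    """Treat the box as marked when enough of its area is dark."""
    return fill_ratio(box) >= threshold


def read_group(boxes, threshold=0.15):
    """Return the labels of the checked boxes in a group, in input order."""
    return [label for label, box in boxes.items() if is_checked(box, threshold)]


# Hypothetical 4x4 interiors of three checkboxes (frame already removed).
CROSS = [
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
]
EMPTY = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
SPECK = [  # a single stray pixel, e.g. scanner dust
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

group = {"yes": CROSS, "no": EMPTY, "unsure": SPECK}
print(read_group(group))  # ['yes']
```

Handling hand-made corrections, such as a mark that has been scribbled out, would need more than a single fill threshold and is left out of this sketch.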

7. 2D Barcode Recognition Module (PDF417)

The 2D barcode module recognizes PDF417, the industry standard for 2D barcodes. It is ideal for recognizing and categorizing product labels and packages. PDF417 encodes up to 1.1 kilobytes of data, including text and graphics information.

Specifications

The FineReader Engine SDK consists of a set of DLLs (Dynamic Link Libraries) and an API (Application Programming Interface) that conforms to the COM (Component Object Model) standard and is easily accessed with Visual Studio .NET, C/C++, Visual Basic or any other development tool supporting COM components. The FineReader Engine offers complete access to low-level OCR/ICR/OMR/barcode functionality and does not require a graphical user interface.

ABBYY also offers a version of its OCR development tool kit for the Linux platform. The FineReader Engine software development kit for Linux supports a Linux-based programming and operating environment and provides access to ABBYY OCR functionality through an application programming interface (API) and via a command-line interface.

Trial Version

Starting October 1st, ABBYY will offer a free trial version of ABBYY FineReader Engine 7.0, allowing prospective customers to test Engine 7.0 under real working conditions without any limitation of functionality. To obtain an evaluation copy, please contact your ABBYY salesperson.

Pricing and Availability

ABBYY FineReader Engine 7.0 is scheduled to ship in the fourth quarter of 2003. ABBYY offers flexible pricing options that allow developers to select the type of licensing model that is best suited to their product and sales strategy. For additional product information, visit ABBYY’s website at https://www.abbyy.com/.
