ABBYY FineReader Engine First to Combine Extensive Language Support in a Single, Comprehensive Software Development Kit

January 24, 2007

ABBYY FineReader Engine Software Development Kit (SDK) is the first in the market to combine OCR (optical character recognition) on Thai, European (Latin/Roman), Hebrew, Chinese/Japanese/Korean (CJK), Cyrillic, Greek, and Armenian languages in a single SDK.

ABBYY, a world leader in the development of document recognition, data capture and linguistic technologies, today announced the newest edition of ABBYY FineReader Engine Software Development Kit (SDK), which is the first in the market to combine OCR (optical character recognition) on Thai, European (Latin/Roman), Hebrew, Chinese/Japanese/Korean (CJK), Cyrillic, Greek, and Armenian languages in a single SDK. The new version, which delivers breakthrough features such as the ability to export documents to PDF/Archive (PDF/A) format, provides an intelligent approach for tuning the speed and accuracy ratio for OCR and PDF conversion to give developers unmatched choice and flexibility.

PDF/A Export

FineReader Engine allows documents to be exported to the newly developed searchable PDF/A format for long-term electronic preservation of page-oriented documents. The format provides reliable data exchange in enterprise and government environments and promises to become the primary document storage standard. It has already been widely adopted by national archives, records management divisions and agencies, state ministry archives and other influential organizations.

“ABBYY is dedicated to continuing to evolve its technologies to support the latest standards,” said Alexander Rylov, ABBYY Chief Product Manager. “With ABBYY's rich linguistic and artificial intelligence expertise along with the addition of the PDF/A archiving format to FineReader Engine, ABBYY reinforces its status as a clear leader providing the most advanced SDK for document conversion and data capture.”

Thai and Hebrew OCR

The newest edition of ABBYY FineReader Engine now processes documents in Thai and Hebrew by means of its proprietary linguistic technology and advanced OCR. The Thai language, used by over 70 million people, is one of the most difficult for accurate natural language processing. It has about 80 characters including consonants, vowels, diacritics, and numerals.

Thai words can be composed of four levels, with vowels written behind, over, under or around the consonants and diacritics located over and under the basic characters. In addition, there are no spaces between words. ABBYY’s unique technology detects characters and separates text strings from each other to provide reliable recognition, and is 50% more accurate than competing Thai OCR.

Hebrew, used by approximately 9 million people around the world, is written from right to left using the Hebrew alphabet, while numbers are written in the opposite direction, from left to right (most Hebrew texts today use European digits).

In addition, texts in Hebrew often include words in English and other “left-to-right” languages. ABBYY FineReader Engine’s bi-directional recognition overcomes the challenge of processing Hebrew texts for OCR in both directions simultaneously within a single document.
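The mixed-direction problem described above can be illustrated with a minimal sketch. It uses the Unicode bidirectional classes exposed by Python's standard `unicodedata` module to split a line into right-to-left, left-to-right and digit runs. This is a simplification for illustration only: it is not the full Unicode Bidirectional Algorithm and it does not represent how FineReader Engine works internally. The function names `direction` and `direction_runs` are hypothetical.

```python
import unicodedata
from itertools import groupby


def direction(ch: str) -> str:
    """Map a character to a coarse direction label using its Unicode bidi class."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class in ("R", "AL"):   # Hebrew letters are class "R"
        return "RTL"
    if bidi_class == "L":           # Latin letters and other strong left-to-right characters
        return "LTR"
    if bidi_class == "EN":          # European digits, read left to right
        return "digit"
    return "neutral"                # spaces, punctuation and everything else


def direction_runs(text: str) -> list[tuple[str, str]]:
    """Group consecutive characters that share the same direction label."""
    return [(label, "".join(chars)) for label, chars in groupby(text, key=direction)]


# A Hebrew word (written with escapes to avoid display-order confusion),
# followed by an English word and a number.
sample = "\u05e9\u05dc\u05d5\u05dd ABBYY 2007"
for label, run in direction_runs(sample):
    print(label, repr(run))
```

A real bi-directional engine must also resolve the neutral runs (spaces and punctuation) against their neighbours, which this sketch leaves as separate runs.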

“This is much more than just support for two additional OCR languages,” said Alexander Rylov. “ABBYY is proud to have reached a new level in developing recognition technologies in order to overcome the most challenging recognition hurdles, such as those posed by Thai and Hebrew.”

Other Featured Enhancements

Featured upgrades to ABBYY FineReader Engine also include:

Extended CJK export to PDF and RTF: Expanded export capabilities for Chinese/Japanese/Korean (CJK) documents to PDF and RTF, with retention of vertical text and complex layouts.
Tuning of PDF conversion accuracy and speed balance: Developers can now select one of four modes that balance PDF conversion accuracy against speed, choosing the one that matches their specific processing requirements.
Balanced recognition mode for OCR: In addition to Accurate and Fast modes, the new Balanced recognition mode provides a middle ground between recognition speed and accuracy. These pre-defined modes allow developers to quickly select the quality and speed ratio that best suits their project's requirements.
Support for EAN 13 supplemental barcode and MICR CMC-7 font: The EAN 13 supplemental barcode remains the standard in the publishing industry for encoding ISBN numbers on books. CMC-7 processing provides high accuracy when recognizing bank checks and institutional payment remittances.
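
As background on the barcode item above, an ISBN-13 is encoded as a standard EAN-13 number whose last digit is a checksum over the first twelve. The sketch below shows that standard EAN-13 checksum, which an application could use to sanity-check a decoded value. It is general barcode arithmetic, not part of the FineReader Engine API, and the function names `ean13_check_digit` and `is_valid_ean13` are hypothetical.

```python
def ean13_check_digit(first12: str) -> int:
    """Compute the EAN-13 check digit from the first 12 digits."""
    if len(first12) != 12 or not (first12.isascii() and first12.isdigit()):
        raise ValueError("expected exactly 12 ASCII digits")
    # Digits are weighted 1, 3, 1, 3, ... from left to right.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    # The check digit brings the weighted sum up to a multiple of 10.
    return (10 - total % 10) % 10


def is_valid_ean13(code: str) -> bool:
    """Return True if a 13-digit string carries a correct EAN-13 check digit."""
    if len(code) != 13 or not (code.isascii() and code.isdigit()):
        return False
    return ean13_check_digit(code[:12]) == int(code[12])


# ISBN 978-0-306-40615-7 written as a plain EAN-13 digit string.
print(is_valid_ean13("9780306406157"))
```

A single misread digit always changes the weighted sum, so a check of this kind rejects any one-digit recognition error in the barcode.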

About ABBYY FineReader Engine

ABBYY's platform for computer intelligence technologies includes all the technologies needed for developing state-of-the-art data capture, document conversion, archiving, and content/document management systems. The FineReader Engine SDK enables developers to build applications of any scale and complexity: from client-oriented solutions to server-based distributed projects.

Availability and Pricing

ABBYY FineReader Engine is available through ABBYY’s network of reseller partners. FineReader Engine is sold via a flexible, modular licensing policy. Developers may select the best combination of tools and pricing options for their project: they can choose only the functions and features they need. Pricing varies according to the number of pages processed.

A special time-limited trial version is also available for testing. Details on licensing models, pricing, and technical specifications are available from your local ABBYY office.
