For anyone in Asia looking for extensive Korean studies data, one place stands out: the National Assembly Library of Korea. Founded on 20 February 1952, it started out as a small unit of the Secretariat, meant to support the legislative activities of the National Assembly Members. Today it is one of the leading Korean libraries. Over 160 million pages of valuable materials have been made available in its digital library, which people can access anytime, anywhere. Its contents include parliamentary materials, government publications from different countries, and materials from many international organizations, such as the UN and the EU.
Like most libraries, the National Assembly Library of Korea used to keep much of its data as images and image-only PDFs, which severely limited their use. The main problem was that the text inside scanned documents and books could not be searched. Furthermore, those materials were completely inaccessible to blind people.
To overcome these constraints, the National Assembly Library of Korea concluded that all available materials had to be digitized, and implemented an OCR solution “setting the groundwork for providing the best online national library service for blind people”. It enlisted the help of ABBYY Partner ReTIA, the go-to solution specialist in the country. Together with the developer DBPortal, which specializes in IT development for libraries, they deployed a web-based system for creating a library database that would allow visitors to efficiently search for any published information and would provide a voice service for blind people.
ReTIA and DBPortal introduced powerful server-based OCR functionality for automated document capture and PDF conversion. The web-based system makes it possible to upload large volumes of images and image-only PDFs from the National Assembly Library of Korea into the library database.
The process runs as follows: the administrator specifies OCR options -> the data are recognized -> XML files are created -> verification operators verify the data in the web environment.
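The four steps above can be pictured as a strictly ordered sequence of stages that each job passes through. The following is only a minimal sketch of that idea, not the system's actual implementation: the names `Stage`, `OcrJob`, and `advance`, the file name, and the option key are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    """Hypothetical stage names mirroring the four steps in the text."""
    OPTIONS_SET = 1   # the administrator has specified OCR options
    RECOGNIZED = 2    # the data have been recognized
    XML_CREATED = 3   # XML files have been created
    VERIFIED = 4      # a verification operator has checked the data


@dataclass
class OcrJob:
    source_file: str
    options: dict = field(default_factory=dict)
    stage: Stage = Stage.OPTIONS_SET

    def advance(self) -> Stage:
        """Move the job to the next stage; stages cannot be skipped."""
        if self.stage is Stage.VERIFIED:
            raise ValueError("job is already verified")
        self.stage = Stage(self.stage.value + 1)
        return self.stage


# Walk one illustrative job through every stage in order.
job = OcrJob("scan_0001.tif", {"language": "Korean"})
while job.stage is not Stage.VERIFIED:
    job.advance()
```

Modelling the stages as an ordered enumeration makes it impossible to mark a job as verified before recognition and XML creation have taken place.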
During this process, image files and image-only PDF files are converted into the ALTO XML format by ABBYY Recognition Server 4. This solution has already been used in a number of Korean public organizations and enterprises, including the Korea Customs Service, the Korea Institute of Patent Information, the Fair Trade Commission, the Korea Internet and Security Agency, LG Electronics Inc., and GS Engineering & Construction Corp. It is a full-text recognition solution that can be applied to document search systems, document management systems, and similar applications.
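To illustrate why ALTO XML makes scanned pages searchable, here is a small sketch that pulls the recognized text out of an ALTO document with Python's standard library. In ALTO, each recognized word is a `String` element inside a `TextLine`, with the text in its `CONTENT` attribute. The sample document below is invented for illustration and is not output from the library's system; the helper names `local_name` and `alto_lines` are likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample; real ALTO files carry more layout detail and metadata.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock ID="B1">
          <TextLine>
            <String CONTENT="National" HPOS="100" VPOS="200"/>
            <SP/>
            <String CONTENT="Assembly" HPOS="190" VPOS="200"/>
          </TextLine>
          <TextLine>
            <String CONTENT="Library" HPOS="100" VPOS="230"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix so any ALTO version is handled."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(xml_text: str) -> list:
    """Return the recognized text of each TextLine as one string."""
    root = ET.fromstring(xml_text)
    lines = []
    for element in root.iter():
        if local_name(element.tag) == "TextLine":
            words = [child.get("CONTENT", "")
                     for child in element
                     if local_name(child.tag) == "String"]
            lines.append(" ".join(words))
    return lines


print(alto_lines(SAMPLE_ALTO))
```

Text extracted this way can be fed into a full-text index, while the position attributes that ALTO keeps for each word allow a search hit to be highlighted on the original page image.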
The web-based system, whose OCR function is provided by ABBYY Recognition Server, handles both internal library processing tasks and external online OCR requests. It saves recognition results to the digital library system of the National Assembly Library of Korea and generates log files that include various metadata (for example, the creation time and the work unit). An outstanding feature of the system is its user-friendly and intuitive interface, which has been designed specifically for the tasks of the library.
In the OCR management interface, library workers can register files for processing, check recognition results, and verify them. The web-based system also provides user registration and the creation and management of work statistics, which makes it easy to monitor the amount of work done by each user.
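As a rough illustration of how per-user work statistics could be derived from such log data, the sketch below aggregates a handful of log records by user. The record layout (`user`, `created`, `work_unit`, `pages`) and every value in it are made up for the example; the actual log format of the system is not described in this article.

```python
from collections import defaultdict

# Made-up log records; field names and values are assumptions.
log_records = [
    {"user": "operator_a", "created": "2000-01-01T09:00:00",
     "work_unit": "batch-001", "pages": 120},
    {"user": "operator_b", "created": "2000-01-01T09:30:00",
     "work_unit": "batch-002", "pages": 45},
    {"user": "operator_a", "created": "2000-01-01T10:00:00",
     "work_unit": "batch-003", "pages": 80},
]


def work_statistics(records):
    """Count work units and total pages handled by each user."""
    stats = defaultdict(lambda: {"work_units": 0, "pages": 0})
    for record in records:
        entry = stats[record["user"]]
        entry["work_units"] += 1
        entry["pages"] += record["pages"]
    return dict(stats)


for user, totals in sorted(work_statistics(log_records).items()):
    print(user, totals["work_units"], totals["pages"])
```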
The system also provides an additional, highly innovative function: real-time OCR. It allows users to run recognition on data in the library database that has not yet been processed. The combination of real-time OCR and text-to-speech (TTS) has resulted in a groundbreaking solution for the National Assembly Library of Korea.
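One plausible reading of real-time OCR is recognition on demand: a document is recognized the first time someone requests its text, and the result is stored for later requests. The sketch below shows that pattern under this assumption only. `OnDemandTextStore` and `fake_recognize` are hypothetical names, and the stand-in function does not call any real OCR engine.

```python
from typing import Callable, Dict


class OnDemandTextStore:
    """Return stored text, recognizing a document only on first request."""

    def __init__(self, recognize: Callable[[str], str]) -> None:
        self._recognize = recognize        # stand-in for the OCR call
        self._text: Dict[str, str] = {}    # document id -> recognized text

    def get_text(self, document_id: str) -> str:
        if document_id not in self._text:
            # Not processed yet: recognize now and keep the result.
            self._text[document_id] = self._recognize(document_id)
        return self._text[document_id]


def fake_recognize(document_id: str) -> str:
    """Placeholder that stands in for a real OCR request."""
    return f"recognized text of {document_id}"


store = OnDemandTextStore(fake_recognize)
first = store.get_text("doc-42")    # triggers recognition
second = store.get_text("doc-42")   # served from the stored result
```

The text returned by `get_text` is what a TTS component would then read aloud, which is how the two features could work together for blind users.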
The OCR function available in the system is soon to be adapted for the online voice service of the National Assembly Digital Library of Korea.
Thanks to the web-based OCR system, the staff effort and time needed to create a searchable library were greatly reduced.
The easy-to-use search function of the database, as well as the range of voice services for blind people, should help the National Assembly Library of Korea achieve its mission of contributing to the development of parliamentary democracy and people’s quality of life, as well as preserving the intellectual cultural heritage of mankind for future generations.