ABBYY

Education & Science | PDF and Document Conversion

The Royal Botanic Garden Edinburgh’s Herbarium Collection Evolves with ABBYY Recognition Server


Customer Overview

Name: Royal Botanic Garden Edinburgh
Headquarters: Edinburgh, United Kingdom

High Volume OCR Server Product Enhances Database and Enables Web Search of Valuable Collection

Extending over four gardens boasting a rich living collection of plants, the Royal Botanic Garden Edinburgh (RBGE) is a world-renowned centre for science and education whose mission is to explore and explain the world of plants for a better future. The RBGE has been growing and studying plants for over 330 years and is therefore well placed to help document and conserve the world's diversity of flora. The RBGE recently set out to do what many academic institutions and libraries are currently doing: make part of its collection available online, not only within the organisation but to the outside world as well.

Challenge

With plants and their habitats under threat around the world, there is an urgent need to understand, protect and conserve them. With its long tradition of working with plants and the environment, the Royal Botanic Garden Edinburgh is well placed to serve both researchers and enthusiasts. The RBGE's extensive herbarium represents half to two-thirds of the world's flora and is considered one of the world’s leading botanical collections. Every year people travel great distances simply to visit the institution and gain access to the herbarium specimens. Recently the RBGE sought a solution that would help automate the retrieval of important data from plant specimens and increase access to its collections through the web.

The RBGE faced the challenge of digitising its vast collection of 3 million herbarium specimens of pressed plants and collection data. Specifically, it was looking for a solution that could capture the text on labels of herbarium specimens and the text on printed sheets of collection data. Previously, much of this work was done through manual data entry by an employee, a process that was time-consuming and resulted in incomplete records in the RBGE database.

An additional challenge was that the original material came in a wide range of formats, with either handwritten or typed information on labels, some dating back as far as 1690. A variety of fonts and languages, together with the presence of barcodes, added to the complexity of the digitisation project. Given the importance of this information to researchers, there was a real need for efficient methods of capturing label data with minimal loss of information.

Solution 

When the RBGE made the decision to seek out a new solution they had two goals in mind. Firstly, they were looking for a solution that could capture the text on labels of herbarium specimens and text on printed sheets of collection data without losing any information, even from source materials of poor quality or high complexity. Secondly, they wanted a solution that could be incorporated into an existing workflow and one that could also access the RBGE’s Image Management System where the TIFFs were stored.

After attending a workshop run by the British Library, the RBGE discovered that OCR (optical character recognition) technology might provide the answer. OCR software is able to ‘read’ the text contained in scans or camera images of paper documents and PDFs, and convert that data to editable, useable text. Following several recommendations from other botanical institutes and the British Library, the RBGE team selected ABBYY Recognition Server as the best-fit solution for its digitisation needs. Recognition Server is a robust, scalable solution designed to automate the recognition and document/PDF conversion process for high-volume projects.

ABBYY Recognition Server delivers highly accurate conversion of images to text documents for the purpose of classifying, searching and exporting information to the RBGE’s internal system for document storage and management. Recognition Server accesses the existing TIFFs in a folder on the RBGE’s Image Management System. Processing the high-resolution specimen images through Recognition Server creates two output files. The first is a searchable image PDF that the RBGE uses as a backup. The second, a plain text file, is saved in a specified folder on the server. The RBGE’s existing workflow software picks up the text file from this location and enters it into a MySQL database. The database makes the RBGE’s vast collection easily searchable by researchers according to several criteria, and all records in the internal database are now available through the RBGE’s website as well.
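
The hand-off from the OCR output folder to the database can be pictured with a short sketch. This is only an illustration under stated assumptions, not the RBGE's actual workflow software: the table name `specimen_text`, the function names, and the idea that each text file is named after its specimen identifier are all hypothetical, and Python's built-in `sqlite3` module stands in for the MySQL database so that the example runs without a server.

```python
import sqlite3
from pathlib import Path


def open_database(path=":memory:"):
    """Open the stand-in database (sqlite3 here; the source describes MySQL)."""
    conn = sqlite3.connect(path)
    # Hypothetical schema: one row of recognised label text per specimen.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS specimen_text ("
        "specimen_id TEXT PRIMARY KEY, label_text TEXT NOT NULL)"
    )
    return conn


def ingest_text_files(folder, conn):
    """Load every .txt file in the OCR output folder into the database.

    Assumption: the file name (without extension) identifies the specimen.
    Returns the number of files processed.
    """
    count = 0
    for txt_path in sorted(Path(folder).glob("*.txt")):
        label_text = txt_path.read_text(encoding="utf-8", errors="replace").strip()
        # INSERT OR REPLACE keeps a re-run from creating duplicate rows.
        conn.execute(
            "INSERT OR REPLACE INTO specimen_text (specimen_id, label_text) "
            "VALUES (?, ?)",
            (txt_path.stem, label_text),
        )
        count += 1
    conn.commit()
    return count


def search_labels(conn, term):
    """Return the identifiers of specimens whose label text contains `term`."""
    rows = conn.execute(
        "SELECT specimen_id FROM specimen_text "
        "WHERE label_text LIKE ? ORDER BY specimen_id",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]
```

Calling `ingest_text_files("ocr_output", open_database())` would load whatever text files sit in a folder named `ocr_output` (again a placeholder), after which `search_labels` gives a simple substring search over the recognised label text.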

Outcome

The RBGE has seen numerous benefits from implementing ABBYY Recognition Server. It now has a system that is accurate and reliable and that greatly reduces the amount of time spent on manual data entry. Another advantage is the amount of data on each specimen now available in the database; the RBGE attributes this to Recognition Server’s ability to accurately read and convert a wide range of text, including a multitude of font styles and sizes and numerous languages (47 with dictionary support plus an additional 139 recognition languages).

“ABBYY Recognition Server has been essential in making our collection easily searchable in the database. The ability to find specimens quickly according to a wide range of label information puts us in a very strong position when doing academic research,” said Dr Elspeth Haston, Assistant Curator for Digitisation at the Royal Botanic Garden Edinburgh.

The RBGE is also pleased with the increased access to its collections that it can now provide to researchers, both within and outside the organisation. In the past, researchers had to invest vast amounts of time and expense in either travelling to Scotland to study the specimens or applying to the RBGE to have them sent on loan by post. With the collections available online, researchers can have immediate access at any time, no matter where they are.

“With Recognition Server we are much closer to our final aim to make RBGE’s invaluable collection of millions of specimens digitised, available from anywhere around the world and ultimately preserved for future generations. We are extremely pleased with the performance of ABBYY’s OCR technology and plan to use it as part of our digitisation projects in the future,” concluded Dr Haston.
