PDF documents can be categorized in three different types, depending on the way the file originated. How it was originally created also defines whether the content of the PDF (text, images, tables) can be accessed or whether it is “locked” in an image of the page.
Digitally created PDFs, also known as “true“ PDFs, are created using software such as Microsoft® Word, Excel® or via the “print” function within a software application (virtual printer). They consist of text and images.
Both the characters in the text and the meta-information have an electronic character designation. With ABBYY FineReader 14 you can easily search through these PDFs and select, edit or delete text similar to how you would do that in other editable formats, such as Microsoft® Word. The images in digitally created documents can be resized, moved, or deleted.
When scanning hard copy documents on MFPs and office scanners, or when converting a camera image, jpg, tiff or screenshot into a PDF, the content is “locked” in a snapshot-like image.
Such PDF documents contain just the scanned/photographed images of pages, without an underlying text layer. Consequently, scanned PDF files are not searchable, and their text usually cannot be modified or marked up. A scanned PDF can be made searchable by applying OCR with which a text layer is added, normally under the page image. Application of automatic OCR processing, like in FineReader, also enables direct editing of scanned PDFs without the additional step of first converting the document to an editable file format such as Microsoft Word.
Searchable PDFs usually result through the application of OCR (Optical Character Recognition) to scanned PDFs or other image-based documents. During the text recognition process, characters and the document structure are analyzed and “read”. A text layer is added to the image layer, usually placed underneath. Such PDF files are almost indistinguishable from the original documents and are fully searchable. Text in searchable PDF documents can be selected, copied, and marked up.
Text recognition technology can be applied in different ways during the document conversion process, each requiring different levels of involvement by the user. It can be:
Or as a “service” in the cloud.