- Not every document can be clearly identified by visual characteristics alone, especially if its page structure and form are either very generic or very unusual. An e-mail body usually has no fixed structure, and neither does a standard printed letter. Either could contain a customer complaint or a notification of the client’s change of address in continuous text, or even both. Automatically allocating such communication to the relevant business case requires a ‘content classification’ step that analyses the text and identifies predefined and statistically trained key words and phrases.
- While reading the content of an e-mail or an attached Microsoft Office document is a solvable task for software, “unlocking” text trapped in an image-only PDF of a scanned paper document requires reliable and sophisticated optical character recognition (OCR) technology, including language recognition and semantic analysis logic. These have been ABBYY’s core competences for more than 25 years.
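
The content classification described above can be illustrated with a minimal sketch. It assumes the text has already been extracted (from an e-mail body or via OCR) and scores it against weighted key phrases. The category names, phrases, weights, and threshold are hypothetical and chosen only for illustration; this is not ABBYY’s method, and a production system would learn such weights statistically from labelled sample documents.

```python
import re

# Hypothetical key phrases and weights (assumptions for illustration only).
# In practice these would be predefined by the business and trained
# statistically on labelled sample documents.
KEY_PHRASES = {
    "complaint": {"complaint": 2.0, "dissatisfied": 1.5, "refund": 1.0},
    "address_change": {
        "new address": 2.0,
        "moved to": 1.5,
        "change of address": 2.0,
    },
}


def classify(text, threshold=1.5):
    """Return every category whose summed phrase weight reaches the threshold.

    A document may match several categories at once, for example a letter
    that contains both a complaint and an address change.
    """
    # Lower-case and collapse whitespace so phrases match across line breaks.
    normalized = re.sub(r"\s+", " ", text.lower())
    scores = {
        category: sum(
            weight for phrase, weight in phrases.items() if phrase in normalized
        )
        for category, phrases in KEY_PHRASES.items()
    }
    return sorted(c for c, score in scores.items() if score >= threshold)


message = (
    "I am writing to file a complaint about my last invoice. "
    "Also, I have moved to Berlin; my new address is below."
)
print(classify(message))
```

Returning a list rather than a single label lets one message be routed to several business cases, which covers the case where a letter contains both a complaint and an address change.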