The Power of Intelligent Document Processing to Transform Contracts into Actionable Intelligence

Michael Simon

November 06, 2018

Capabilities needed to leverage content intelligence – ABBYY Blog Post #2 of 5: Content Analytics

Now that we have covered the value of contract analytics, and before we get into the specific use cases, it might be useful to take a closer look at the capabilities needed to leverage the power of intelligent document processing.

First, as table stakes for this game, are powerful recognition capabilities.  If your system doesn’t have a highly accurate, scalable data processing and document capture solution that intelligently extracts, classifies, and serves critical data from incoming image, email, and document streams, it’s time to find a new system.  Yet even the most accurate systems are never 100% accurate.  Thus, once captured, business-critical data must be automatically validated for accuracy and compliance.  Finally, to round out the fundamental capabilities, your solution must integrate with corporate information systems such as CLM, ECM, ERP, and EDMS – having to export or migrate data from a siloed system simply makes no sense in this day and age.

Next up on our list of needs is, of course, entity extraction.  Our system needs to automatically identify and extract entities such as names, organizations, locations, dates, quantities, and monetary values from contracts.  Once you get to this point, you can start driving efficiencies and creating business value:

AI software, however, can easily extract data and clarify the content of contracts . . . It can let companies review contracts more rapidly, organize and locate large amounts of contract data more easily, decrease the potential for contract disputes. . . and increase the volume of contracts it is able to negotiate and execute.

– “How AI Is Changing Contracts,” Harvard Business Review, Feb. 12, 2018.
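To make the idea concrete, here is a minimal, self-contained sketch of entity extraction using hand-written regular expressions.  This is a toy stand-in, not any production NER engine; the sample text and patterns are invented for illustration:

```python
import re

# Toy contract snippet for illustration; a real system would use a
# trained NER model rather than hand-written patterns.
text = ("This Agreement is entered into on January 5, 2018 between "
        "Acme Corp and Widget LLC for a total fee of $250,000.")

# Hand-rolled patterns for a few entity types -- a sketch only.
patterns = {
    "date": r"\b(?:January|February|March|April|May|June|July|August|"
            r"September|October|November|December)\s+\d{1,2},\s+\d{4}",
    "money": r"\$\d[\d,]*(?:\.\d{2})?",
    "organization": r"\b[A-Z][a-z]+\s+(?:Corp|LLC|Inc)\b",
}

def extract_entities(text):
    """Return {entity_type: [matches]} for each pattern that fires."""
    return {label: re.findall(rx, text) for label, rx in patterns.items()}

entities = extract_entities(text)
print(entities)
```

Even this crude version hints at the payoff: once dates, parties, and amounts are structured fields instead of buried prose, they can be searched, compared, and routed downstream.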

Last on our list of “must-haves” is Natural Language Processing (NLP).  NLP helps to read and analyze textual information, infer meaning in context, and determine which parts of the document are important by analyzing the co-occurrence of text and their relationships within and between documents.  It’s this document understanding that transforms content into intelligence.  NLP can be – and commonly is – combined with mathematical techniques to better cover specialized or uncommon language where NLP, which is driven by ontologies of known language usages, can have trouble.
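As a toy illustration of the co-occurrence side of that analysis – the sample sentences and stop-word list here are invented, and real NLP layers ontologies and context on top of raw counts like these:

```python
from collections import Counter
from itertools import combinations

# Which content words tend to appear together in the same sentence?
sentences = [
    "the licensee shall pay the fee",
    "the licensor may terminate for non payment of the fee",
    "payment of the fee is due monthly",
]

stop_words = {"the", "of", "is", "for", "may", "shall", "non"}

pairs = Counter()
for s in sentences:
    words = set(s.split()) - stop_words
    # Count each unordered pair of content words once per sentence.
    pairs.update(combinations(sorted(words), 2))

# Frequent pairs suggest related concepts, e.g. ("fee", "payment").
print(pairs.most_common(3))
```

Pairs like ("fee", "payment") surface from nothing but counts; it is the ontology-driven side of NLP that then decides what such relationships actually mean.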

However, purely mathematically-based approaches have a distinct tendency to produce results that only a machine can love – and that people tend to hate because it doesn’t reflect the way that we talk or even think about language.  Personally, I’ve seen and even worked with far too many systems that use such pure mathematical approaches, and I’m tired of trying to explain to end users why a few good or even highly insightful results are buried under a pile of useless and sometimes even bizarre-seeming noise.

At this point, I’ve just described what you need to get into the contract analytics game.  To play that game to its fullest, for real ROI, you need more.  One of the things to look for in advanced systems is the ability to leverage the human-centered guidance built into documents – or, as it is better known, document sections and headlines.  After all, those sections and headlines are there for a reason: nobody wants to (or even can) make sense of dozens or even hundreds of pages of contracts without them.  The advanced capability to leverage such guidance, which we call “Document Sectioning,” provides a uniquely differentiated approach: it identifies business-critical information from that human-generated organization of sectioned documents, then maps it into NLP algorithms to automate the contextual analysis of clauses within sections and lift out associated entities and facts in a way that mirrors how humans implicitly read documents.  More advanced systems even provide micro-ontologies that dramatically improve precision along with recall – something that is typically impossible to do at the same time!
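A minimal sketch of the sectioning idea – the contract text and heading format below are invented, and a real system handles far messier layouts than numbered headings:

```python
import re

# Hypothetical contract text with human-written section headings.
contract = """1. Definitions
"Effective Date" means March 1, 2019.

2. Payment Terms
Licensee shall pay $10,000 within 30 days of invoice.

3. Termination
Either party may terminate on 60 days written notice.
"""

def split_sections(text):
    """Split on numbered headings like '2. Payment Terms' and return
    {heading: body} -- a toy stand-in for document sectioning."""
    parts = re.split(r"(?m)^(\d+\.\s+.+)$", text)
    # parts alternates: [preamble, heading, body, heading, body, ...]
    return {parts[i].strip(): parts[i + 1].strip()
            for i in range(1, len(parts) - 1, 2)}

sections = split_sections(contract)

# Contextual analysis: look for amounts only where they carry meaning.
payment_clause = sections["2. Payment Terms"]
amounts = re.findall(r"\$[\d,]+", payment_clause)
```

The point of the sketch: a dollar figure found under “Payment Terms” means something different from one found under “Termination,” and sectioning is what lets the downstream NLP exploit that context.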

The second big new feature to look for is an open architecture that provides the flexibility to leverage NLP capabilities for a wider variety of use cases, including embedded or customized ones.  This might seem like a small thing, perhaps even trivial, but it’s not.  Older contract analytics systems tend to have rigid structures that require focusing the design and code around a specific use case or user scenario.  Thus, even powerful systems that provide excellent end-user contract review capabilities – and there are indeed many such excellent systems now available on the market – are not adaptable to other use cases.  Consultants and service providers like me work within a business model where we build highly targeted solutions for clients who expect those solutions to fit their needs not just mostly, or even almost entirely, but exactly.  Having to wrangle a rigid solution for our demanding clients is not just frustrating; it burns otherwise billable hours and wrecks our hourly rates within flat-rated projects.

In contrast, open-architected systems can provide the baseline technology stack for value-added partners who want to focus on building apps, not infrastructure.  There are clear economic advantages to using outside technology for those who make their money and reputation from add-on services.  Service providers need to focus on their own overall client strategy, not the technology provider’s.  Providers also appreciate the ability to white-label such solutions to promote their own business – again, not the technology provider’s.

Add in a multi-tenant system, to provide an extensible, secure environment for customers to focus on extracting insights from their clients’ business-critical documents, and you now have a serious solution for value-added partners to develop their own approaches on top of that technology stack.  Going beyond the generalities, what kinds of features are required to be considered “open architecture”?

  1. A modular, open, configurable solution that can “snap in” or “snap out” functionality as necessary for specific workflows;
  2. Support for highly configurable processing workflows for different document types and projects, as well as multi-tier review and analysis with defined escalation and release procedures;
  3. Guided review and analysis through easy-to-use interfaces tuned to the specific requirements of clause and section analysis, obligation analysis, and compliance analysis;
  4. Plug-and-play of best-of-breed AI and NLP components, whether from the provider or a third party;
  5. The ability for individual customers to create different workflows using specific microservices such as recognition, classification, entity extraction, semantic analysis, and NLP;
  6. Clause comparison and analysis, with auto-tagging, to accelerate determining whether, and to what degree, contracts comply with regulatory requirements as defined in your contract or regulatory clause library; and
  7. Transfer of entities and terms for further processing within a CLM system or other system of record.
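The “snap-in / snap-out” idea in items 1, 4, and 5 can be sketched as a pipeline of swappable stages.  The stage names and behaviors below are invented for illustration; they are not any product’s actual microservices:

```python
import re

# Each stage is a plain function taking and returning a document dict,
# so stages can be added, removed, or reordered per workflow.
def recognize(doc):
    # Stand-in for OCR/capture: normalize the raw input.
    doc["text"] = doc.pop("raw").strip()
    return doc

def classify(doc):
    # Stand-in for document classification.
    doc["type"] = "contract" if "agreement" in doc["text"].lower() else "other"
    return doc

def extract(doc):
    # Stand-in for entity extraction: naive capitalized-word grab.
    doc["entities"] = re.findall(r"\b[A-Z][a-z]+\b", doc["text"])
    return doc

def run_workflow(doc, stages):
    """Apply each configured stage in order; changing the list
    reconfigures the workflow without touching the stages."""
    for stage in stages:
        doc = stage(doc)
    return doc

result = run_workflow({"raw": "  This Agreement binds Acme and Widget.  "},
                      [recognize, classify, extract])
```

Swapping `[recognize, classify, extract]` for a different stage list is the whole configuration step – which is the economic argument for open architecture: partners compose workflows instead of rebuilding the stack.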

For further information, browse this page:

Intelligent Document Processing (IDP) Legal

Michael Simon

Michael Simon is the Principal of Seventh Samurai, an e-Discovery and Information Governance expert consulting firm.  As a trial attorney in Chicago, he was an early innovator in using electronic evidence to win cases for his clients.  He is an adjunct professor at Michigan State University College of Law (and formerly at Boston University School of Law), teaching classes in e-Discovery.  He has advised a number of companies and government agencies on how to best mitigate the risks arising from their information while best optimizing value, and provides strategic consulting for companies in the analytics, security, privacy, and legal technology markets.

Michael is a legal technology thought leader, having made over 100 presentations and written dozens of articles on e-Discovery and legal technology topics, including a book on Internet Law in 2002.  He received his J.D. from Loyola University Chicago School of Law, and his B.A. cum laude from Tufts University.
