Unstructured Documents – Definitions, The Challenges And The Methods To Manage Unstructured Content – Chapter 3

The third chapter in our “Best Practices for Managing Unstructured Data” series focuses on the definition of Unstructured Documents. Later chapters will cover solutions and best practices for managing this information.

Unstructured Documents

The third document classification type, Unstructured Documents, presents the biggest challenge for Document Imaging. These documents have little structure or consistency; they are free-flowing reports, like the one you are reading today. Examples include Correspondence, Deeds, Title Releases, Contracts, Plant Records, Claims, and (hopefully not) Complaints.

Those familiar with the documents processed by the Mortgage and Title industry will not be surprised to learn that, by some estimates, nearly 80 percent of all business documents fall into this category.

The challenge stems from several factors. First, the index data or metadata that clients wish to extract is free-form and unstructured; it could be a sentence, a paragraph, a whole page, or a few keywords embedded within a description. For example, on a Release document the Borrower Name is usually embedded within a sentence on the first page, but the wording of that sentence varies with how the Title Insurance company chooses to phrase it.
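To make the Borrower Name problem concrete, here is a minimal sketch of why a fixed template struggles with free-form wording. Everything in it is an assumption made for illustration: the pattern, the function name `extract_borrower`, and both sample sentences are invented, and it does not represent how any particular imaging product works.

```python
import re

# Hypothetical pattern written for ONE way of phrasing a Release.
# The wording it expects is an assumption, not a real industry template.
PATTERN = re.compile(
    r"executed by (?P<name>[A-Z][a-z]+(?: [A-Z][a-z]+)+), as Borrower"
)


def extract_borrower(sentence):
    """Return the borrower name if the sentence fits the fixed template, else None."""
    match = PATTERN.search(sentence)
    return match.group("name") if match else None


# Two invented sentences stating the same fact in different words.
sample_a = "That certain Deed of Trust executed by Jane Doe, as Borrower, is hereby released."
sample_b = "The undersigned releases the lien granted by borrower Jane Doe under the Deed of Trust."

print(extract_borrower(sample_a))  # the template fits this phrasing
print(extract_borrower(sample_b))  # same fact, different wording: no match
```

The first sentence yields the name, while the second returns nothing even though a human reader finds the borrower immediately. Each new phrasing would need its own rule, which is the maintenance burden described above.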

On a Grant Deed, the Borrower Name presents the same issue. Even worse, the Legal Description can extend over two or more pages. The formats are irregular; even a human has to be trained to read the document and determine which part is actually the Legal Description.

Because of these complexities, it has been next to impossible to configure a typical document imaging solution to handle these formats, and organizations have had to fall back on either having their own staff enter the data into a line-of-business system or shipping the images to a DPO for data entry. Axis Technical Group saw an opportunity to introduce new technology to overcome this challenge and launched a software solution to address these very complex formats: Axis Smart Data Extraction™.

In our next chapter, we’ll focus on handwritten documents and forms.
