Paper Comes in All Kinds of Shapes, Sizes, and Formats.
Different challenges come with each variation of the form, content, layout, and complexity of a document. For those familiar with a loan package, think about structured documents and all the different types, page sizes, designs, colors, formats, sources, and file types that exist.
Specifically, we’ll describe these document types as they appear in the mortgage and title industry, since most people have experience with the documents in these business transactions, and illustrate where the challenges lie and how they are being addressed. There are three main paper or document formats: structured, semi-structured, and unstructured.
Structured documents are generally the easiest to index and store. They are created as forms, and then someone fills in the form. The data is always in the same place, and the indexes are clearly defined because the form shows the client where to enter each piece of information. Think of these fields like fields in an electronic form or database: each field on the page generally maps one-to-one to an index field. For example, fields might include First Name, Last Name, Street Address, Zip Code, Loan Number, and so on. Examples of forms in the mortgage and title market include the HUD-1, tax forms, and loan applications.
For software to know how and what data to extract, a sample document is scanned into the system, and the fields are mapped out as a template. Nothing moves around on these pages, so the software knows to look in the same place every time for each piece of information.
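To make the template idea concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular imaging product works. The names `Box`, `Word`, `LOAN_FORM_TEMPLATE`, and `extract_fields`, along with every coordinate, are hypothetical. It assumes an OCR step has already produced a list of recognized words with page positions, and that each field fits on a single line.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A rectangle on the page (hypothetical units: pixels, origin at top-left)."""
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x: int, y: int) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


@dataclass(frozen=True)
class Word:
    """One recognized word and the center point where it was found."""
    text: str
    x: int
    y: int


# Hypothetical template: mapped once from a sample scan of the form.
LOAN_FORM_TEMPLATE = {
    "first_name": Box(100, 50, 300, 80),
    "last_name": Box(320, 50, 520, 80),
    "zip_code": Box(100, 130, 200, 160),
    "loan_number": Box(320, 130, 520, 160),
}


def extract_fields(words, template):
    """Collect the words that fall inside each field's box.

    Assumes single-line fields, so words are ordered left to right only.
    """
    fields = {}
    for name, box in template.items():
        inside = sorted(
            (w for w in words if box.contains(w.x, w.y)),
            key=lambda w: w.x,
        )
        fields[name] = " ".join(w.text for w in inside)
    return fields


# Made-up OCR output for one filled-in page.
sample_words = [
    Word("Jane", 150, 65),
    Word("Doe", 400, 65),
    Word("90210", 140, 145),
    Word("LN-000123", 400, 145),
    Word("Signature", 150, 400),  # outside every box, so it is ignored
]

result = extract_fields(sample_words, LOAN_FORM_TEMPLATE)
```

Because the layout never changes, the same template works for every filled-in copy of the form. Only the list of recognized words differs from page to page.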
You can see an example of how easily this works by using software like Adobe Acrobat Professional. Run an image of a form through this software, and it’ll automatically identify areas it thinks are form data. The imaging industry has had great success with these document types for more than a decade.
Blog Sequence Index
- Chapter One – Structured Forms
- Chapter Two – Semi-Structured Documents
- Chapter Three – Unstructured Documents
- Chapter Four – Handwritten Documents
In our next chapter, we’ll focus on Semi-Structured Documents.