What is Document Classification Software?
Document Classification Software identifies documents and sorts them
into groups using intelligent page layout and keyword analysis. The more sophisticated
applications have interactive training capabilities that let them learn to recognize
new documents automatically. Less expensive applications are also available; they
require more document-specific configuration but work well for many classification
tasks.
Who can benefit from document classification software?
Companies that need to digitize or capture data from many different types
of documents that do not come pre-sorted should invest in document classification
technology.
Enterprise Mailroom Automation is the classic example of a classification
solution. The system is trained to identify and label all the documents commonly
received throughout the organization and route them electronically to the correct person
or workflow. These types of solutions are best suited to mid-sized and large companies
with very high volumes of documents.
Smaller businesses and departments can use less expensive classification solutions
to perform a variety of useful tasks. The key question is whether you can use keywords or
templates to identify and group your documents. If you can define a unique list of keywords
or phrases that identify a document, there are very cost-effective solutions available. If
you have a limited set of documents to identify and can provide a sample of each, then
template-based solutions are available that are more robust but still affordable.
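To make the keyword approach concrete, here is a minimal sketch of how a list of identifying phrases can be used to label a document's text. It is an illustration only, not how any particular product works: the rule set, the `classify_by_keywords` function, and the `min_hits` parameter are all hypothetical, and the text is assumed to have already been extracted by OCR.

```python
import re

# Hypothetical rule set: each document type maps to phrases that identify it.
KEYWORD_RULES = {
    "invoice": ["invoice number", "amount due", "remit to"],
    "purchase_order": ["purchase order", "ship to"],
    "resume": ["work experience", "education", "references"],
}


def classify_by_keywords(text, rules=KEYWORD_RULES, min_hits=2):
    """Return the label with the most matching phrases, or None if no
    label reaches min_hits matches."""
    # Lowercase and collapse whitespace so line breaks in OCR output
    # do not prevent a phrase from matching.
    normalized = re.sub(r"\s+", " ", text.lower())
    best_label, best_hits = None, 0
    for label, phrases in rules.items():
        hits = sum(1 for phrase in phrases if phrase in normalized)
        if hits > best_hits:
            best_label, best_hits = label, hits
    return best_label if best_hits >= min_hits else None


if __name__ == "__main__":
    sample = "INVOICE NUMBER: 1001\nAmount   due: $50.00"
    print(classify_by_keywords(sample))
```

Requiring at least two matching phrases is one simple way to avoid labeling a document on the strength of a single common word. Documents that return `None` would be left for a person to sort.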
How much do Document Classification systems cost?
The total cost of a Document Classification solution includes several items:
- Cost of the software
- Time to install and configure the software
- Time to configure classification rules and train the classification engine
- Time to configure Data Capture Workflows for automatic filing or data entry
- User and administrator training
- Labor required to verify classification results
- IT infrastructure and maintenance costs
The major factor in software cost is the classification technology that is employed.
Not counting configuration, keyword-based classification software can be purchased for around
$1,000. Template-based solutions can be found for under $5,000. Enterprise solutions start
around $15,000. However, depending on the volume and types of documents involved, and especially
the number of Data Capture Workflows that are integrated into the process, the cost of
configuring and testing will be the biggest influence on total cost.
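As a rough illustration of why configuration tends to dominate, the sketch below adds labor to the software prices quoted above. The function name, its parameters, and the hours and hourly rate in the example are placeholders chosen for the arithmetic, not figures from any real project.

```python
# Software prices quoted in the text, in US dollars. The template figure is
# an upper bound ("under $5,000") and the enterprise figure is a starting price.
SOFTWARE_PRICE = {"keyword": 1_000, "template": 5_000, "enterprise": 15_000}


def estimate_total_cost(tier, config_hours, hourly_rate,
                        workflow_count=0, hours_per_workflow=0.0):
    """Software price plus labor for configuration and workflow setup."""
    labor_hours = config_hours + workflow_count * hours_per_workflow
    return SOFTWARE_PRICE[tier] + labor_hours * hourly_rate


if __name__ == "__main__":
    # Placeholder inputs: 40 hours of general configuration, three Data
    # Capture Workflows at 20 hours each, and a labor rate of $100 per hour.
    total = estimate_total_cost("template", config_hours=40, hourly_rate=100,
                                workflow_count=3, hours_per_workflow=20)
    print(total)  # 5000 + (40 + 60) * 100 = 15000
```

With these placeholder inputs, labor accounts for $10,000 of the $15,000 total, twice the price of the software itself.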
If you have an IT staff that is familiar with document scanning and OCR
applications, it is possible to do most of the configuration and maintenance
in-house. If not, then it is highly recommended that you use our Consulting Services
to guide you through the setup process.
Contact Us to get a professional analysis of your project
requirements and a complete time and cost estimate.
What is the typical Document Classification workflow?
The Enterprise Digital Mailroom illustrates the most comprehensive type of
implementation. Its solution workflow involves the following steps:
- Mail is opened and prepped for scanning (unfolded, staples removed, junk discarded, etc.)
- Paper documents are scanned on a high-speed document scanner
- Electronic documents can be input via email or hotfolder
- Classification engine identifies the documents
- Unknown or uncertain documents are verified by a user, further training the system
- Documents are routed to appropriate data capture workflow
- OCR, ICR, checkboxes and barcodes are used to index or enter data from documents
- Uncertain data fields are presented to users for manual review and correction
- Once all errors are corrected, data and documents are exported
- Data is sent to relevant databases and/or used to file documents automatically in your file system, SharePoint, or document management system
- Documents can then enter electronic workflows for routing to email recipients and performing
any necessary reviews or approval processes
There are many steps involved in the process, including some manual review,
but the end result is that the vast majority of paper in the organization is digitized, and all associated
data entry and filing tasks are automated.
Simpler solutions combine the classification and data extraction steps, and
don't interactively train the classification engine.
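The classify, verify, and route steps of the mailroom workflow above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the `Classification` type, the confidence score, the 0.8 threshold, and the workflow names are all hypothetical, and the sketch does not cover scanning, data extraction, or retraining the engine from user corrections.

```python
from dataclasses import dataclass


@dataclass
class Classification:
    label: str         # document type proposed by a hypothetical engine
    confidence: float  # assumed to range from 0.0 to 1.0


# Hypothetical mapping from document type to a data capture workflow.
WORKFLOWS = {"invoice": "accounts_payable", "resume": "human_resources"}


def route_document(result, threshold=0.8, workflows=WORKFLOWS):
    """Return the workflow for a confident, known label; otherwise send
    the document to manual review."""
    if result.confidence < threshold or result.label not in workflows:
        return "manual_review"
    return workflows[result.label]


def process_batch(results, threshold=0.8):
    """Group document positions by the queue each one is routed to."""
    queues = {}
    for index, result in enumerate(results):
        queue = route_document(result, threshold)
        queues.setdefault(queue, []).append(index)
    return queues


if __name__ == "__main__":
    batch = [
        Classification("invoice", 0.95),
        Classification("invoice", 0.50),   # uncertain: needs a person
        Classification("contract", 0.99),  # unknown type: needs a person
    ]
    print(process_batch(batch))
```

Raising the threshold sends more documents to manual review and lowers the chance of a misrouted document; lowering it does the opposite. The right balance depends on how costly a routing mistake is for a given document type.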
How do I find out more?