M-Files Smart Classifier - Overview

Dec 19, 2022 12:34:09 PM

M-Files offers quite a few Intelligence Services to improve document tagging, recognition, and creation. These services include the M-Files Smart Extractor, Smart Metadata, and today's topic - Smart Classifier.

What is Smart Classifier?

The purpose of Smart Classifier is to equip your Document Management System with tools that automatically provide document class suggestions upon data ingestion. Each time a document is added to your vault, its content and structure are analyzed and compared to existing documents. M-Files then provides a suggestion for the classification of the new document based on data trends within your vault.

Each time a new document is classified, M-Files goes through an internal process of learning and validation, and over time it gets better at analyzing content and suggesting the right class. This internal process is a background operation that periodically samples a document learning set and uses a naïve Bayes classifier to determine results.
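To make the naïve Bayes idea more concrete, here is a minimal sketch in Python. It is not M-Files code and does not reflect how Smart Classifier is actually implemented: the class name `NaiveBayesSketch`, the `train` and `suggest` methods, and the sample documents are all made up for illustration, and it only looks at words, whereas the real service considers other document characteristics as well.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase words (a deliberately simple tokenizer)."""
    return text.lower().split()


class NaiveBayesSketch:
    """Hypothetical multinomial naive Bayes classifier over word counts."""

    def __init__(self):
        self.class_doc_counts = Counter()        # documents seen per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.vocabulary = set()                  # every word seen in training

    def train(self, text, label):
        """Add one already-classified document to the learning set."""
        self.class_doc_counts[label] += 1
        for word in tokenize(text):
            self.word_counts[label][word] += 1
            self.vocabulary.add(word)

    def suggest(self, text):
        """Return the class with the highest log-probability for the text."""
        if not self.class_doc_counts:
            raise ValueError("no training documents have been added yet")
        total_docs = sum(self.class_doc_counts.values())
        vocab_size = len(self.vocabulary)
        words = tokenize(text)
        scores = {}
        for label, doc_count in self.class_doc_counts.items():
            # Prior: how common this class is among the training documents.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                # Add-one (Laplace) smoothing keeps unseen words from
                # driving the probability to zero.
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            scores[label] = score
        return max(scores, key=scores.get)


# Illustrative learning set (invented text, not real vault content).
classifier = NaiveBayesSketch()
classifier.train("invoice number total amount due payment", "Invoice")
classifier.train("invoice total due date amount", "Invoice")
classifier.train("agreement between parties terms and conditions", "Contract")
classifier.train("contract terms parties signature agreement", "Contract")

print(classifier.suggest("amount due on invoice"))
```

The sketch also shows why the number and consistency of training documents matter: the suggestion is driven entirely by how often each word has appeared in each class so far.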

Smart Classifier is not designed to extract data like an OCR Engine would. Instead, M-Files is looking at the overall characteristics of a document, such as headers, logos, page counts, pixel gradients, language family, vernacular, etc. This means that the documents that use this service must have the following attributes:

  • Consistent internal structure and vernacular
  • Noticeable differences from the other intelligently analyzed classes

When these two attributes do not apply to the document classes with which you want to use Smart Classifier, M-Files may confuse the documents and suggest the incorrect class. In addition, for proper recognition and analysis, it's important to prime your vault with samples of each document class you'd like to train - usually 50-75 documents per class.
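Before enabling the service, it can help to check which classes still fall short of that sample guideline. The snippet below is a rough sketch of such a check; it assumes you have already exported a plain list containing the class name of each document in your vault. It does not call any M-Files API, and the function name `classes_needing_samples` and the threshold constant are hypothetical.

```python
from collections import Counter

# Lower end of the 50-75 documents-per-class guideline mentioned above.
MIN_SAMPLES_PER_CLASS = 50


def classes_needing_samples(document_classes, expected_classes=(),
                            minimum=MIN_SAMPLES_PER_CLASS):
    """Return {class name: number of additional samples needed}.

    document_classes: one class name per existing document.
    expected_classes: classes you plan to train, so that classes with
    no documents at all are reported too.
    """
    counts = Counter(document_classes)
    shortfalls = {}
    for cls in set(counts) | set(expected_classes):
        if counts[cls] < minimum:
            shortfalls[cls] = minimum - counts[cls]
    return shortfalls


# Illustrative data: 60 invoices, 12 contracts, no purchase orders yet.
labels = ["Invoice"] * 60 + ["Contract"] * 12
report = classes_needing_samples(
    labels, expected_classes=("Invoice", "Contract", "Purchase Order")
)
for cls, missing in sorted(report.items()):
    print(f"{cls}: needs {missing} more sample(s)")
```

Any class that appears in the report is a candidate for more priming before you rely on its suggestions.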

When Is Smart Classifier Not Viable?

There are a few scenarios where Smart Classifier may not be the best option for use with particular classes, and will not provide optimal results:

  • The document layout and design vary drastically within the same class
  • The contents of two or more document classes are very similar to each other 
  • The language used within a document class differs (e.g., English and French)

In these cases, do not enable Smart Classifier for the affected document classes, and exclude those classes from the document training set.

In addition, Smart Classifier often comes up with clients when discussing initial document migration efforts. As mentioned previously, though, Smart Classifier requires that each class it identifies already has documents within the system, because those documents are used to train the intelligence service. That makes it a poor fit for the very start of a migration, but it can be used later in your migration efforts once your vault is primed.

Interested in learning more? Check out our other insights from TEAM IM, or reach out to us on our website at www.teamim.com
