Mastering AutoClassification: Pro Tips (Part 2)

You didn’t think we’d only have 10 tips, did you? Here are 10 more AutoClassification Pro Tips:


11. Document the “Before”

Prove your ROI. 

Accurately and completely document your current “Before” practices for tagging files, searching for them, retrieving them and managing them. Make sure your documentation is metrics-based. For example, how many minutes (or hours or days) did it take to locate your content? How many files were missing, wrong or inconsistent (error rates)? What is the hourly cost of the manual labor put in? These Before metrics will help cement the validity and return of your After ones.
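As a rough illustration of a metrics-based comparison, the sketch below turns search time and hourly labor cost into a monthly figure for the Before and After states. Every number and name here (minutes per search, searches per month, hourly rate, `monthly_search_cost`) is a made-up placeholder for illustration, not a benchmark; substitute your own measurements.

```python
def monthly_search_cost(minutes_per_search: float,
                        searches_per_month: int,
                        hourly_rate: float) -> float:
    """Labor cost of locating content for one month."""
    hours = minutes_per_search * searches_per_month / 60
    return hours * hourly_rate


# Placeholder figures only: replace with your measured Before/After values.
before = monthly_search_cost(minutes_per_search=12, searches_per_month=500, hourly_rate=40.0)
after = monthly_search_cost(minutes_per_search=3, searches_per_month=500, hourly_rate=40.0)

print(f"Before: {before:.2f}  After: {after:.2f}  Monthly savings: {before - after:.2f}")
```

The same pattern extends to error rates: count missing, wrong or inconsistent files in a sample before and after, and report both as a percentage of the sample.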


12. Ensure Proper Subjects

Title vs. DocumentType vs. FileName. 

Don’t confuse the goals of these three essential metadata tags. Title should be the exact, given title of the file and NOT the filename or the type of document it is. DocumentType is the kind of document (its form and function), and FileName is the file label the creating party (or system) gave it. Consider a Letter with a Re: line and a filename:
♦ DocumentType = Letter
♦ Title/Subject = In regards to our meeting of 24 January 2018
♦ FileName = responseletterJJ.docx
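
One way to keep the three tags from blurring together is to store them as separate fields and flag records where the Title simply repeats the FileName. The sketch below is illustrative only: the `DocumentTags` class and the `title_looks_like_filename` check are hypothetical names, not part of any AutoClassification product API.

```python
from dataclasses import dataclass
from pathlib import PurePath


@dataclass(frozen=True)
class DocumentTags:
    document_type: str  # kind of document (its form and function)
    title: str          # exact, given title of the file
    file_name: str      # label the creating party or system gave it


def title_looks_like_filename(tags: DocumentTags) -> bool:
    """Return True when Title merely repeats FileName, with or without its extension."""
    name = PurePath(tags.file_name)
    candidates = {name.name.casefold(), name.stem.casefold()}
    return tags.title.strip().casefold() in candidates


# The letter example from above, with each tag in its own field.
letter = DocumentTags(
    document_type="Letter",
    title="In regards to our meeting of 24 January 2018",
    file_name="responseletterJJ.docx",
)

# A record where the Title was wrongly filled in from the filename.
suspect = DocumentTags("Letter", "responseletterJJ", "responseletterJJ.docx")

print(title_looks_like_filename(letter))   # False
print(title_looks_like_filename(suspect))  # True
```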


13. Use Custom DocTypes

Match your Document Types to your company’s operations.

Every organization is different, with different workflows, practices and documentation needs. Some store critical information in contracts, others in financial transactions, and still others in personnel or client files. AutoClassification supports an infinite array of custom Document Types to ensure proper classification that fits your company’s use cases and content types. For even more flexibility, consider using hierarchical or conditional DocTypes that offer layers of nuance and customization.


14. More About DocTypes

Focus your energies on DocumentType. 

We cannot stress DocTypes enough, and so have devoted two tips this time to Document Types! If you have limited bandwidth to set up an AutoClassification project (and who doesn’t?), spend your time and energy on creating an ideal set of generic + custom Document Types. Strong DocTypes will properly set up all your future lifecycle and disposition handling, and are the best place for your investment of time, energy and budget. Furthermore, good DocTypes help ensure easy and complete search and retrieval results for a wide variety of downstream data requests.


15. Aliases & Codenames

Actively track “insider” terms.

Many organizations use codenames to label specific, sometimes covert, operations. Typically these codenames mean nothing outside of the organization and are eventually retired in favor of a more formal, public-use name (such as a launched brand), or when the project has run its course. Unfortunately, the content created during the codenamed period lives on long after the “tribal knowledge” about its origins has faded. Codenames are an excellent rich metadata tag and should be properly managed so AutoClassification can benefit from their intent and classification properties. We recommend managing codenames as a list, file or database, similar to other key enterprise data lists, such as employee names, office locations, product names or supplier lists.
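
A managed codename list can be as simple as a lookup table that maps each codename to its formal name, which is then used to tag content at processing time. The following is a minimal sketch under that assumption; the codename, the product name and the `find_codenames` helper are all invented for illustration.

```python
import re

# Hypothetical codename registry: codename -> formal, public-use name.
CODENAMES = {
    "project bluebird": "Acme Widget 2.0",
    "operation tango": "2018 Office Relocation",
}


def find_codenames(text: str, registry: dict[str, str]) -> dict[str, str]:
    """Return every registered codename found in the text, with its formal name."""
    found = {}
    for codename, formal_name in registry.items():
        # Whole-phrase, case-insensitive match so casing differences do not matter.
        pattern = r"\b" + re.escape(codename) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found[codename] = formal_name
    return found


tags = find_codenames("Notes from the Project Bluebird kickoff", CODENAMES)
print(tags)  # {'project bluebird': 'Acme Widget 2.0'}
```

Keeping the registry in one place means that content written years ago under a codename can still be tied to the formal name it eventually received.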


16. Standardize on Currency Formats

Dollars & Sense. 

Currency can be tricky. Most people in the US write amounts as a dollar sign, then dollars, then two-digit cents, like this: $27.99. But this is definitely not standard worldwide. To save a lot of difficulty in managing, organizing, searching and retrieving down the line, standardize on one currency and one format. Then convert other currencies and formats to the standard schema at processing time.
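
Here is a hedged sketch of what converting to a standard schema at processing time might look like. It assumes USD as the standard currency and a caller who already knows each value's currency and decimal separator; the exchange rates are placeholder values, and `parse_amount` and `to_standard` are illustrative names rather than any product's API.

```python
from decimal import Decimal, ROUND_HALF_UP

# Placeholder conversion rates into the standard currency (USD). Not real rates.
RATES_TO_USD = {"USD": Decimal("1"), "EUR": Decimal("1.10")}


def parse_amount(raw: str, decimal_sep: str) -> Decimal:
    """Parse a numeric string whose decimal separator is '.' or ','."""
    group_sep = "," if decimal_sep == "." else "."
    cleaned = raw.replace(group_sep, "").replace(decimal_sep, ".")
    return Decimal(cleaned)


def to_standard(raw: str, currency: str, decimal_sep: str) -> str:
    """Convert an amount to the standard schema: 'USD' plus two decimal places."""
    amount = parse_amount(raw, decimal_sep) * RATES_TO_USD[currency]
    rounded = amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return f"USD {rounded}"


print(to_standard("27.99", "USD", "."))     # USD 27.99
print(to_standard("1.234,56", "EUR", ","))  # USD 1358.02
```

`Decimal` is used instead of floating point so that cents are not lost to rounding error during conversion.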


17. Leave Room to Experiment

Iterate fields as you go. 

AutoClassification opens a lot of possibilities for analyzing and tagging content. You don’t need to have your final “ideal state” designed on day one. Leave yourself permission and room to experiment over time with new fields, new rules and new ways of organizing and sorting your data. A flexible processing engine should be able to support multiple iterations and configurations, even suggesting the best configuration based on the data itself.


18. Go Beyond Identical (Forensic) Dupes

Understand functional dupes & near dupes. 

Forensically identical content is straight-up ROT, and can likely be safely, defensibly deleted from content stores. But what about Functional Duplicates that are similar in every way, but not forensically identical (based on hash)? Functional dupes include things like an original Word or Excel file also saved in PDF format. Keep both formats or only one? And if only one, which? Near Dupes are files that are similar but not as close as Functional or Forensic dupes are. Good examples of Near Dupes include versions, templates and “stock” content like form letters. For Near Dupes, the decisions revolve more around association, storage and retrieval, rather than removal.
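
To make the distinction concrete, the sketch below groups forensic dupes by SHA-256 hash of the file bytes and flags near dupes by text similarity using Python's standard `difflib`. The 0.8 similarity threshold, the sample file contents and the function names are assumptions chosen for illustration; a production engine would likely use more robust similarity measures.

```python
import hashlib
from collections import defaultdict
from difflib import SequenceMatcher


def forensic_groups(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose bytes hash identically (forensic dupes)."""
    by_hash = defaultdict(list)
    for name, data in files.items():
        by_hash[hashlib.sha256(data).hexdigest()].append(name)
    return [sorted(names) for names in by_hash.values() if len(names) > 1]


def is_near_dupe(text_a: str, text_b: str, threshold: float = 0.8) -> bool:
    """Flag texts that are similar but not identical (near dupes)."""
    if text_a == text_b:
        return False  # identical text is a forensic or functional dupe, not a near dupe
    return SequenceMatcher(None, text_a, text_b).ratio() >= threshold


# Made-up sample content.
files = {"a.txt": b"hello", "b.txt": b"hello", "c.txt": b"other"}
print(forensic_groups(files))  # [['a.txt', 'b.txt']]

letter_1 = "Dear Ms. Smith, thank you for your order."
letter_2 = "Dear Mr. Jones, thank you for your order."
print(is_near_dupe(letter_1, letter_2))  # True
```

Functional dupes (the same document in Word and PDF, for example) fall between the two: their bytes hash differently, so detecting them requires comparing extracted text or metadata rather than the raw files.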


19. Consider Storage Costs

Live files or secondary copy?

Solutions that require a second set of content to be stored in addition to the first will increase your storage costs commensurately. In addition, you are setting yourself up for inconsistencies and confusion between the two copies. Aim to work on live data where possible, and auto-sync where it is not.


20. Aliases

Understand and manage nicknames.

Who are you? Robert, Rob, Robbie, Bob, Bobby or “Stretch”? If the answer is, “all of them,” welcome to the wonderful world of aliases and nicknames. Some nicknames are so far from the original person’s name that they can be almost impossible to trace back. Like codenames (see Pro Tip #15), we recommend managing aliases as a list, file or database, similar or perhaps tied to other key enterprise data lists, such as employee names, office locations, product names or supplier lists.