Organize, analyze and tag enterprise documents based on file content, not just file type.
Document classification assigns contextual attributes (rich metadata) based on the content of each file, not just its file type. By tagging or classifying enterprise data and understanding the context of its content, organizations can make better decisions about what to do with it, where it belongs, who should have access to it and where it should be stored.
File metadata defines the physical attributes of the files themselves:
- Creation date
- Last modified date
- File type, file size, file path
- Hash value
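As an illustration of how these file-level attributes can be gathered, here is a minimal Python sketch that uses only the standard library. The function name `file_metadata` and the choice of SHA-256 as the hash value are assumptions for this example, not details of Valora's platform.

```python
import hashlib
import mimetypes
from datetime import datetime, timezone
from pathlib import Path


def file_metadata(path: str) -> dict:
    """Collect file-level (physical) metadata for a single file."""
    p = Path(path)
    st = p.stat()
    digest = hashlib.sha256()
    with p.open("rb") as fh:
        # Read in 1 MiB chunks so large files are never loaded into memory whole.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    mime, _encoding = mimetypes.guess_type(p.name)
    return {
        "path": str(p.resolve()),
        "file_type": mime or p.suffix.lower() or "unknown",
        "size_bytes": st.st_size,
        # st_ctime is creation time on Windows but last metadata change on most Unix systems.
        "created": datetime.fromtimestamp(st.st_ctime, tz=timezone.utc).isoformat(),
        "last_modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat(),
        "sha256": digest.hexdigest(),
    }
```

Because file metadata is derived from the file system rather than the content, it can be collected quickly, but it says nothing about what the document actually contains.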
Rich metadata defines contextual attributes based on the content of the document and can include:
- Risk level: based on the types of PII present
- Document category: contract, blueprint, health record, mortgage application, etc.
- Keyword identification: for complex contextual searches
- Disposition: expiration or destruction dates
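A rich-metadata attribute such as risk level can be derived from the types of PII found in the text. The sketch below is a hypothetical, regex-based version: the pattern names, the weights and the low/medium/high thresholds are assumptions made for illustration, and the regular expressions are deliberately simple.

```python
import re

# Hypothetical detectors; production PII detection also validates matches
# (e.g., Luhn checksums for card numbers) to reduce false positives.
PII_PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}

# Hypothetical sensitivity weight per PII type.
PII_WEIGHTS = {"us_ssn": 3, "credit_card": 3, "email": 1}


def detect_pii(text: str) -> set:
    """Return the names of PII types whose pattern matches anywhere in text."""
    return {name for name, rx in PII_PATTERNS.items() if rx.search(text)}


def risk_level(text: str) -> str:
    """Map the PII types present to a coarse risk tier."""
    score = sum(PII_WEIGHTS[name] for name in detect_pii(text))
    if score >= 3:
        return "high"
    if score >= 1:
        return "medium"
    return "low"
```

For example, `risk_level("SSN 123-45-6789")` returns `"high"`, while a document containing only an email address is rated `"medium"`.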
The only way to understand your data is to classify it.
Valora’s technology platform automates the tagging of documents with rich metadata and custom metadata, speeding up processing and eliminating human error and omissions.
Automating Classification = AutoClassification
Valora’s approach to automating the classification of data combines the machine-learning functionality of Valora’s PowerHouse AutoClassification Platform with a proven 5-step methodology for locating, identifying, analyzing, actioning and monitoring content across multiple data stores.
1. Scan & Locate - Where is it?
- Scan anything from a single document to 100,000,000+ documents
- Single and multiple shared drives
- Email repositories
- Document Management Systems & Enterprise Content Management (ECM) systems
- eDiscovery repositories
- HR, ERP & billing systems
- On-prem and cloud-based document repositories
2. Search & Identify - What is it?
- Search by file metadata: doc type, size, date, revisions
- Search by keyword or by pattern matching (regular expressions)
- Search by creator or custodian
- Identify duplicates and near-duplicates
- Distinguish meaningful content from redundant, obsolete and trivial (ROT) content
- Identify different types of content (contracts, blueprints, etc.)
- Identify PII (Personally Identifiable Information)
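Exact duplicates can be found by comparing content hashes, and near-duplicates by measuring how much text two documents share. This sketch uses word trigram "shingles" and Jaccard similarity, which is one common technique rather than Valora's method; the function names and the 0.8 similarity threshold are assumptions.

```python
import hashlib
import re
from itertools import combinations


def shingles(text: str, k: int = 3) -> set:
    """Return the set of overlapping k-word sequences in the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_duplicates(docs: dict, threshold: float = 0.8):
    """Return (exact_pairs, near_pairs) for a mapping of name -> text."""
    digests = {n: hashlib.sha256(t.encode("utf-8")).hexdigest() for n, t in docs.items()}
    sh = {n: shingles(t) for n, t in docs.items()}
    exact, near = [], []
    for a, b in combinations(sorted(docs), 2):
        if digests[a] == digests[b]:
            exact.append((a, b))
        elif jaccard(sh[a], sh[b]) >= threshold:
            near.append((a, b))
    return exact, near
```

Comparing every pair grows quadratically with the number of documents, so at the 100,000,000+ scale mentioned above a real system would need an indexing approach such as MinHash with locality-sensitive hashing instead of this pairwise loop.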
3. Analyze & Understand - What am I looking at?
- Preview documents
- OCR files without machine-readable text (images, scanned PDFs)
- Translate foreign-language content into English
- Transcribe audio files into text
- Produce reports (high-level and drill-down)
4. Decide & Action - What do I do with it?
- Rules-based and machine-learning automation for disposition
- Apply rich metadata
- Apply retention schedules
- Apply security access controls
- Migrate on demand
- Delete and sequester
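Rules-based disposition can be expressed as a function from a document's rich metadata to an action. In this hypothetical sketch, the retention period per category, the `Doc` fields and the rule that expired high-risk documents are sequestered rather than deleted are all illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical retention schedule: document category -> years to retain after creation.
RETENTION_YEARS = {"contract": 7, "health record": 10, "mortgage application": 5}


@dataclass
class Doc:
    path: str
    category: str
    risk_level: str  # "low", "medium" or "high"
    created: date


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # Feb 29 in a non-leap target year
        return d.replace(year=d.year + years, day=28)


def disposition(doc: Doc, today: date) -> str:
    """Return a hypothetical action: 'retain', 'delete', 'sequester' or 'review'."""
    years = RETENTION_YEARS.get(doc.category)
    if years is None:
        return "review"  # no rule matches, so route the document to a person
    if today < add_years(doc.created, years):
        return "retain"
    # Retention has expired: sequester high-risk content instead of deleting it outright.
    return "sequester" if doc.risk_level == "high" else "delete"
```

Keeping the rules in a plain lookup table makes the schedule easy to audit; a machine-learning component would typically feed this function (for example, by predicting `category`) rather than replace it.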
5. Monitor & Audit - How often do I update?
- Set customized refresh and retention schedules based on content type or location
- Crawls, identifies and actions only new or edited data
- Runs in the background without degrading the performance of systems or repositories
- Ensures retention schedules and compliance requirements are executed on time
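Acting only on new or edited data requires comparing the current state of a repository with the previous crawl. A minimal sketch, assuming a local or mounted file share and using modification time plus file size as the change signal (an assumption; a content hash is more reliable but requires reading every file):

```python
import os


def snapshot(root: str) -> dict:
    """Map every file under root to (mtime in nanoseconds, size in bytes)."""
    snap = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except FileNotFoundError:
                continue  # file was removed between listing and stat
            snap[full] = (st.st_mtime_ns, st.st_size)
    return snap


def changed_since(previous: dict, current: dict):
    """Return (new, edited, deleted) file paths between two snapshots."""
    new = sorted(current.keys() - previous.keys())
    edited = sorted(p for p in current.keys() & previous.keys() if current[p] != previous[p])
    deleted = sorted(previous.keys() - current.keys())
    return new, edited, deleted
```

A scheduler would store each snapshot, run `changed_since` on the next refresh and pass only the `new` and `edited` paths to classification, while `deleted` paths can be recorded for the audit trail.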