Data Classification
Organize, analyze and tag enterprise documents based on file content, not just file type.
Classification assigns contextual attributes (rich metadata) to a document based on its content, not just its file type. By tagging enterprise data and understanding the context of the content, organizations can make better decisions about what to do with it, where to store it and who should have access to it.
File metadata defines the physical attributes of the files themselves:
- Creation date
- Author
- Last modified date
- File type, file size, file path
- Hash value
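As a rough illustration of the attributes listed above, the following Python sketch reads file metadata from a local file system. The function name `file_metadata` and the choice of SHA-256 are assumptions made for this example, not part of Valora's platform. Author and creation date are left out because they are not exposed portably by the file system and usually come from the document format itself.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def file_metadata(path):
    """Collect physical attributes of one file (illustrative helper only)."""
    p = Path(path)
    stat = p.stat()
    # Hash the file in chunks so large files do not need to fit in memory.
    digest = hashlib.sha256()
    with p.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return {
        "file_path": str(p.resolve()),
        "file_type": p.suffix.lower(),  # extension only, e.g. ".pdf"
        "file_size": stat.st_size,      # bytes
        "last_modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc),
        "hash_value": digest.hexdigest(),
    }
```

None of these values say anything about what the document contains, which is the gap rich metadata is meant to fill.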
Rich metadata defines the contextual attributes based on the content of the document and can include:
- Risk level: based on types of PII present
- Document category: contract, blueprint, health record, mortgage application, etc.
- Keyword identification: for complex contextual searches
- Disposition: expiration or destruction dates
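One way to picture a rich metadata record is as a small data structure. The sketch below is hypothetical: the class name `RichMetadata`, its field names and the `PII_RISK` weights are invented for illustration and do not describe Valora's schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

# Hypothetical risk weights per PII type; the values are illustrative only.
PII_RISK = {"email": 1, "phone": 1, "ssn": 3, "credit_card": 3}


@dataclass
class RichMetadata:
    document_category: str                       # e.g. "contract"
    pii_types: set = field(default_factory=set)  # PII types found in the content
    keywords: list = field(default_factory=list)
    destruction_date: Optional[date] = None      # disposition date, if any

    @property
    def risk_level(self):
        # Risk follows the most sensitive PII type present; unknown types count as low.
        score = max((PII_RISK.get(t, 1) for t in self.pii_types), default=0)
        return {0: "none", 1: "low", 2: "medium", 3: "high"}[score]
```

For example, `RichMetadata("contract", {"email", "ssn"}).risk_level` evaluates to `"high"` under these assumed weights, while a record with no PII types evaluates to `"none"`.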
The only way to understand what your data is, is to classify it.
Valora’s technology platform automates the classification or tagging of rich metadata and custom metadata to expedite processing and eliminate human error or omission.
Automating Classification = AutoClassification
Valora’s approach to automating the classification of data combines the machine-learning functionality of Valora’s PowerHouse AutoClassification Platform with a proven 5-step methodology for locating, identifying, analyzing, actioning and monitoring content across multiple data stores.
1. Scan & Locate - Where is it?
- Scan one or many (100,000,000+) documents
- Single and multiple shared drives
- Email repositories
- Document Management & Enterprise Content Management Systems (ECM)
- eDiscovery repositories
- HR, ERP & billing systems
- On-prem and cloud-based document repositories
2. Search & Identify - What is it?
- Search by file metadata: doc type, size, date, revisions
- Search by keywords or pattern-matching (regular expression)
- Search by creator or custodian
- Identify duplicates and near-duplicates
- Identify meaningful content vs. Redundant, Obsolete & Trivial content (ROT)
- Identify different types of content (contracts, blueprints, etc.)
- Identify PII (Personally Identifiable Information)
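Two of the identification tasks above, pattern matching for PII and exact-duplicate detection, can be sketched in a few lines of Python. This is a simplified illustration, not Valora's method: the patterns in `PII_PATTERNS` are deliberately basic assumptions, and the hash comparison catches only byte-identical text, not near-duplicates.

```python
import hashlib
import re
from collections import defaultdict

# Simplified, assumed patterns; real PII detection needs far more cases and validation.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(text):
    """Return the set of PII pattern names that occur in the text."""
    return {name for name, pattern in PII_PATTERNS.items() if pattern.search(text)}


def group_exact_duplicates(docs):
    """Group document names whose text is identical.

    docs maps a document name to its extracted text.
    """
    by_hash = defaultdict(list)
    for name, text in docs.items():
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        by_hash[key].append(name)
    # Only groups with more than one member are duplicates.
    return [names for names in by_hash.values() if len(names) > 1]
```

Near-duplicate detection generally requires a similarity measure over the content rather than an exact hash, so it is out of scope for this sketch.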
3. Analyze & Understand - What am I looking at?
- Preview documents
- OCR unreadable files (images, scanned PDFs)
- Translate foreign-language content into English
- Transcribe audio files into text
- Produce reports (high level and drill-down)
4. Decide & Action - What do I do with it?
- Rules-based and machine learning automation for disposition
- Apply rich metadata
- Apply retention schedules
- Apply security access controls
- Migrate on demand
- Delete and sequester
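The rules-based side of this step can be pictured as an ordered set of checks that map a classified document to an action. The sketch below is hypothetical: the function `decide`, the `RETENTION_YEARS` table and the action names are assumptions for illustration, and a year is approximated as 365 days.

```python
from datetime import date, timedelta

# Hypothetical retention periods in years, keyed by document category.
RETENTION_YEARS = {"contract": 7, "health record": 10}


def decide(doc, today):
    """Return (action, detail) for a classified document.

    doc is a dict with keys: category, created (a date), and optionally
    is_duplicate and pii_types. Rules are checked in order; first match wins.
    """
    if doc.get("is_duplicate"):
        return ("delete", "redundant copy")
    years = RETENTION_YEARS.get(doc["category"])
    if years is not None:
        # Approximation: ignores leap days.
        expires = doc["created"] + timedelta(days=365 * years)
        if expires <= today:
            return ("delete", "retention period elapsed")
        if doc.get("pii_types"):
            return ("restrict_access", f"retain until {expires.isoformat()}")
        return ("retain", f"retain until {expires.isoformat()}")
    # Nothing matched: leave the decision to a person.
    return ("review", "no rule matched")
```

Keeping the rules in one ordered function makes the precedence explicit: a duplicate is removed before any retention rule is considered, and uncategorized content falls through to manual review.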
5. Monitor & Audit - How often do I update?
- Set customized refresh and retention schedules based on content type or location
- Crawl, identify and action only new or edited data
- Run in the background with no performance draw on systems or repositories
- Ensure retention schedules and compliance requirements are executed on time
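Processing only new or edited data implies comparing each crawl against the state recorded by the previous one. A minimal sketch of that idea for a local directory tree follows; the function `changed_files` and the use of modification time plus size as a change signature are assumptions for illustration, not a description of how Valora's crawler works.

```python
import os


def changed_files(root, previous):
    """Find files under root that are new or modified since the last crawl.

    previous maps path -> (mtime_ns, size) as recorded by the prior run.
    Returns (changed_paths, current_state); pass current_state to the next run.
    """
    current = {}
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            signature = (st.st_mtime_ns, st.st_size)
            current[path] = signature
            # A missing or different signature means the file is new or edited.
            if previous.get(path) != signature:
                changed.append(path)
    return sorted(changed), current
```

On the first run, with an empty `previous`, every file is reported. Later runs report only files whose signature differs, so downstream classification work scales with the amount of change rather than the size of the repository.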
Data Privacy
See how Valora locates and actions Personally Identifiable Information (PII) across the enterprise.
Records Management
Discover how Records and Information Management professionals leverage Valora’s technology.
ROT & File Clean-Up
See how Valora reduces the amount of Redundant, Obsolete and Trivial data across the enterprise.
Related Resources
Explore Valora Technologies’ Resource Library for helpful articles, videos, presentations, white papers, blog posts and more.
PowerHouse AutoClassification
The world-class Machine Learning technology automatically determines document content, attributes & purpose,…
Using Predictive Analytics and Document Data Mining to Classify and Catalog Contracts
A large, international Fortune 50 consumer products company had saved and stored every contract and agreement since the dawn of time…
Data Privacy: Practical Applications & How to Get it Right
Learn how a holistic approach to information management can expedite the identification, evaluation and management of sensitive content within your organization…
Implementing a Virtual Vault with AutoClassification
Learn what a “Virtual Vault” is and how AutoClassification can be leveraged to tie your organization’s content together across time, space, people, and topics.