AutoClassification 101

Leverage the power of methodology + technology to intelligently automate document processing, analysis and disposition.

What is AutoClassification?

AutoClassification is a suite of software tools that automatically analyzes and classifies digital content and files, hence the name “AutoClassification.”

AutoClassification software uses both pattern-matching algorithms and machine learning to detect the contents and attributes of each document or file, then assigns it contextual attributes (rich metadata) and a disposition (rules). In short, AutoClassification answers two questions: What is this content? How should it be managed throughout its lifecycle?

Classification answers the question “What is this content and what should I do with it?” AutoClassification automates both the analysis and the decision, continuously.

Why AutoClassify?

SPEED
AutoClassification software processes content faster than any person, or team of people, could. It performs 10,000,000 computations per second, saving months or years of manual file categorization and attribution.

ACCURACY
Sophisticated algorithms remove the possibility of human error or oversight through the automated crawling and classification of every file – nothing is missed or overlooked.

CONSISTENCY
Automating the classification of data allows for consistent and perpetual reconciliation across multiple data repositories, regardless of time of day, time of year, data set size or location.

AutoClassification is used where there are large amounts of disparate content across many data stores. Use AutoClassification for:

data and content management (records management)
records retention
content migration
litigation compliance (legal hold)
data privacy (PII/PHI/PCI)
file security (access & permissions)
data governance (regulated industries)

How it Works

Valora Technologies’ PowerHouse AutoClassification Suite applies a 5-step methodology for locating, identifying, analyzing, actioning and monitoring content across multiple data stores.

1. Crawl & Locate - Where is it?

Crawl one document or many (100,000,000+) across:
Single and multiple shared drives
Email repositories and servers
Document Management Systems (DMS) & Enterprise Content Management Systems (ECM)
Collaborative sites (SharePoint, Box, Dropbox, Drive)
Personal shares and laptops
eDiscovery repositories
HR, ERP & billing systems
On-prem and cloud-based document repositories
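
As a rough illustration of the Crawl & Locate step, the Python sketch below walks a single directory tree and records each file's path, size, modification time and extension. It is not PowerHouse code: the `crawl` function and the `FileRecord` fields are hypothetical, and a real crawler would also need connectors for email servers, DMS/ECM platforms and cloud sites.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass
class FileRecord:
    """One inventory entry produced by the crawl (hypothetical schema)."""
    path: str
    size: int        # bytes
    modified: float  # POSIX timestamp of last modification
    extension: str   # lowercase, including the dot, e.g. ".pdf"


def crawl(root: str) -> list[FileRecord]:
    """Walk a directory tree and record basic attributes of every file found."""
    inventory = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except OSError:
                # File vanished or is unreadable: skip it rather than abort the crawl.
                continue
            inventory.append(FileRecord(
                path=full,
                size=st.st_size,
                modified=st.st_mtime,
                extension=Path(name).suffix.lower(),
            ))
    return inventory
```

The resulting inventory is the raw material for the later steps: each record can be tagged, analyzed and actioned.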

2. Identify & Tag - What is it?

Tag for file metadata: doc type, size, date, revisions
Identify basic metadata tags: DocumentType, DocumentTitle, Author, Recipients & CCs, and Date
Tag for custom metadata: topics, locations, departments, product names, employee ID
Tag by keyword or pattern match (regular expressions)
Search by creator or custodian
Identify duplicates and near-duplicates
Identify meaningful content vs. Redundant, Obsolete & Trivial content (ROT)
Identify different types of content (contracts, messages, financial)
Identify PII/PHI/PCI
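
Pattern-matching tags (regular expressions) can be sketched in a few lines. The example below is a hypothetical illustration, not PowerHouse's rule set: `PII_PATTERNS` covers only US Social Security numbers and email addresses, and `tag_pii` simply reports which patterns matched and what they matched.

```python
import re

# Illustrative patterns only: real PII detection needs far broader coverage
# and validation to keep false positives down.
PII_PATTERNS = {
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def tag_pii(text: str) -> dict[str, list[str]]:
    """Return {tag: [matches]} for every pattern that matches at least once."""
    tags = {}
    for tag, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            tags[tag] = matches
    return tags
```

For example, `tag_pii("Contact jane.doe@example.com, SSN 123-45-6789.")` returns `{'US_SSN': ['123-45-6789'], 'EMAIL': ['jane.doe@example.com']}`.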

3. Analyze & Understand - What am I looking at?

Preview documents
OCR unreadable files (images, scanned PDFs)
Translate foreign-language content into English
Transcribe audio and video files into text
Produce reports (high-level and drill-down)
Send notifications (via email, IM, text)

4. Decide & Action - What do I do with it?

Rules-based and machine learning automation for disposition
Apply or append rich metadata
Apply retention schedules and legal hold
Apply security access controls
Migrate on demand
Delete and sequester
AutoRedact sensitive information
Initiate custom workflows
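
Rules-based disposition can be modeled as an ordered list of conditions where the first match decides the action. This sketch assumes a hypothetical `Document` with a type, a set of tags from the Identify & Tag step and a Legal Hold flag; the rule order and action names are illustrative only, not PowerHouse's configuration.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_type: str
    tags: set[str] = field(default_factory=set)
    legal_hold: bool = False


# Ordered (condition, action) pairs: the first condition that matches wins.
# Tags and action names are hypothetical examples.
RULES = [
    (lambda d: d.legal_hold, "PRESERVE"),            # holds override everything else
    (lambda d: "ROT" in d.tags, "DELETE"),
    (lambda d: "PII" in d.tags, "RESTRICT_ACCESS"),
    (lambda d: d.doc_type == "contract", "MIGRATE_TO_ECM"),
]


def decide(doc: Document, default: str = "RETAIN") -> str:
    """Return the action of the first matching rule, or the default."""
    for condition, action in RULES:
        if condition(doc):
            return action
    return default
```

Placing Legal Hold first means held content is preserved even when it is also tagged as ROT.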

5. Monitor & Audit - How often do I update?

Set customized refresh and retention schedules based on content type or location
Crawl, identify and action only new, edited or acquired data
Runs in the background without degrading the performance of systems or repositories
Ensures retention schedules and compliance requirements are executed on time
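
Processing only new, edited or acquired data amounts to comparing successive crawl snapshots. This minimal sketch assumes each snapshot is a hypothetical `{path: (mtime, size)}` mapping; a production system might instead compare content hashes or rely on change notifications from the repository.

```python
def diff_snapshots(previous: dict[str, tuple[float, int]],
                   current: dict[str, tuple[float, int]]):
    """Compare two {path: (mtime, size)} snapshots from successive crawls."""
    new = sorted(p for p in current if p not in previous)
    # A changed timestamp or size is treated as an edit.
    changed = sorted(p for p in current
                     if p in previous and current[p] != previous[p])
    removed = sorted(p for p in previous if p not in current)
    return new, changed, removed
```

Only the `new` and `changed` paths need to be re-identified and re-actioned; `removed` paths can be cleared from the index.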

AutoClassification in Action

AutoClassification can be applied anywhere documents, files or content need to be located, identified, analyzed and actioned across one or many data environments.

Organizations use Valora Technologies’ PowerHouse AutoClassification engine for:

File Clean-Up / ROT Processing

Keep relevant content, remove the junk.

Valora crawls and monitors multiple shared drives, folders, ECMs and email servers to locate, identify, tag and eliminate Redundant, Obsolete and Trivial (ROT) content within your organization.

Keeps repositories clean and compliant
Reduces storage costs by eliminating ROT, which averages 30-40% of content
Identifies & removes unnecessary content based on:
- relevant content (keywords) or lack of relevant content (spam)
- exact copies (duplicates) and versions (near duplicates)
- file type (temp files) and file size (0 byte files)
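
Several of the ROT criteria above lend themselves to short examples: exact duplicates (identical content hashes), near-duplicates (overlap between word shingles, measured by Jaccard similarity) and trivial files (zero-byte or temp files). This is a hypothetical Python sketch, not Valora's algorithm; `TEMP_EXTENSIONS` and the shingle size `k=3` are assumptions, and the similarity threshold for calling two files near duplicates is left to policy.

```python
import hashlib
import os
from collections import defaultdict

# Hypothetical list of extensions treated as temporary files.
TEMP_EXTENSIONS = {".tmp", ".bak"}


def find_exact_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose contents have the same SHA-256 digest."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, data in files.items():
        groups[hashlib.sha256(data).hexdigest()].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Overlapping k-word sequences from the lowercased text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Shared shingles divided by all distinct shingles (1.0 = same shingles)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        # Both texts are too short to shingle: fall back to an exact word comparison.
        return float(a.lower().split() == b.lower().split())
    return len(sa & sb) / len(sa | sb)


def is_trivial(name: str, size: int) -> bool:
    """Zero-byte files and temp files are candidates for removal."""
    return size == 0 or os.path.splitext(name)[1].lower() in TEMP_EXTENSIONS
```

For example, `jaccard("the quick brown fox jumps", "the quick brown fox leaps")` returns `0.5`: the two texts share two of their four distinct three-word shingles.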

Retention & Content Lifecycle Management

Easily implement retention schedules and workflows.

PowerHouse ensures files are only kept for the amount of time required per policy, per business process, per repository.

Continuously monitor files across multiple data repositories
Dispose of files properly and on time
Make special provision for content under Legal Hold
Migrate permanent records to appropriate archives
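
Retention periods are often written as CY+N (calendar year plus N years). The sketch below assumes one common reading: a record is kept through December 31 of its trigger year plus N and becomes eligible for disposal the next day, while Legal Hold suspends disposal entirely. The function names are hypothetical, and the exact convention should be checked against your own retention schedule.

```python
from datetime import date


def disposal_date(trigger: date, years: int) -> date:
    """CY+N (assumed reading): keep through Dec 31 of trigger year + N, eligible the next day."""
    return date(trigger.year + years + 1, 1, 1)


def is_eligible_for_disposal(trigger: date, years: int, today: date,
                             on_legal_hold: bool) -> bool:
    """A record on Legal Hold is never eligible, whatever its age."""
    if on_legal_hold:
        return False
    return today >= disposal_date(trigger, years)
```

For example, a record triggered on `date(2017, 3, 15)` under CY+7 gets a disposal date of `date(2025, 1, 1)`.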

Content Migration

Migrate content from multiple repositories into new ones. Migrate content to the cloud.

PowerHouse connects with over 30 different on-prem and cloud-based content management systems to analyze, migrate or consolidate content from one place to another.

Removes unnecessary content
Classifies and applies rich metadata to content
Migrates content into new systems
Migrates content for archival purposes
Promotes content to the cloud
Establishes and respects taxonomies and ontologies

Mergers, Acquisitions & Divestitures

Faster due diligence, organized data rooms, post-acquisition data merges.

PowerHouse speeds up the due diligence process and eliminates the need for manual data searches by locating, identifying and tagging relevant content. After the deal closes, PowerHouse migrates content into the acquiring organization’s data environment or, for a divestiture, separates it into the divested organization’s environment.

Pre-Due Diligence: remove Redundant, Obsolete and Trivial data (ROT) from repositories
Due Diligence: locate, identify and tag relevant content (corporate records, contracts, bank records, etc.)
Data room: migrate clean, organized content into third-party virtual data rooms
Post-acquisition: merge data into new business environments and/or separate divested data into its component environments
Legal Hold: coordinate across organizations, matters and jurisdictions

Data Privacy & Security

Identify Personally Identifiable Information (PII) and other sensitive data.

PowerHouse locates and identifies files that contain PII, PHI or PCI, and flags documents that may be sensitive in nature (contracts, employment agreements, etc.). It also identifies Personal Data (PD) of individuals in the EU that may be subject to GDPR.

Locate and identify files that contain PII, PHI or PCI
Classify documents as PII-sensitive for proper handling
Apply security access controls and business processes for PII
Satisfy data protection and privacy legislation
Support GDPR compliance for data controllers and data processors
Automate Data Subject Access Requests (DSARs)
Satisfy CCPA, NYDFS and other emerging privacy regulations
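
Card numbers (PCI) are a case where a regular expression alone produces many false positives, so candidates are commonly checked against the Luhn checksum used by major payment card numbers. A hypothetical sketch, not PowerHouse's detector: `CANDIDATE` matches 13 to 19 digits with optional space or hyphen separators, and only Luhn-valid matches are reported.

```python
import re

# Candidate card numbers: 13-19 digits, optionally separated by spaces or hyphens.
CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number: str) -> bool:
    """Luhn checksum over the digits of `number`, ignoring separators."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    # Double every second digit from the right; subtract 9 when the result exceeds 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return candidate card numbers in `text` that pass the Luhn check."""
    return [m.group() for m in CANDIDATE.finditer(text) if luhn_valid(m.group())]
```

For example, `find_card_numbers("Card: 4111 1111 1111 1111 exp")` returns `['4111 1111 1111 1111']` (a well-known test number); changing the last digit to 2 makes it fail the checksum.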

Data Under Management

Set it and forget it with a single platform that is aware of content across the enterprise and the world.

Valora PowerHouse AutoClassification implements a tooled and automated approach to Content Management that runs perpetually in the background across all data repositories and silos.

Constantly monitors for and indexes new, edited, or acquired content
Sets and applies rich metadata attribution
Implements multi-faceted disposition automatically
Runs without interfering with other systems or business processes
Schedules heavy system loads and refresh activities automatically
Machine learning rules update automatically with changes in strategy, systems, personnel and regulations

Legal & eDiscovery

Efficient and cost-effective document analysis.

Valora offers rapid, low-cost and highly defensible eDiscovery processing, review, hosting and professional services. Our rules-based Technology-Assisted Review (TAR) options are optimized for document collection and analysis.

Support the left side of the Electronic Discovery Reference Model (EDRM)
Gather and analyze relevant documents for Early Data Assessment (EDA)
Identify appropriate content to be placed under Legal Hold
Automate presumptive privilege, responsiveness and data requests
Migrate final data sets to third-party review platforms

Records & Information Management

Automate customized end-to-end records management solutions.

Valora’s PowerHouse content management platform analyzes, manages and automates large-scale and customizable records management solutions for any size and type of enterprise.

Create customized records management solutions, tailored to your Retention Schedules and policies
Manage and automate records and content management disposition across multiple data silos
Implement records retention schedules and workflows
Tag and remove ROT and duplicates
AutoClassify record types based on rich metadata (Document Type, Topic, Custodian, Source, CY+ dates)
Professional Services for document management & workflow consulting

Resources

Case Studies



AutoClassification for Legal Hold, Retention & Relevance



Security through AutoClassification



AmLaw 10 Firm AutoClassifies Extensive Data Backlog While Maintaining Strict Country-wide Data Security Protocols



Helping the FDIC process critical mountains of data from the subprime mortgage crisis with PowerHouse AutoClassification

Webinars



Universal AutoClassification Webinar Series



AutoClassification 101 Webinar Series

Related Resources

Explore Valora Technologies’ Resource Library for helpful articles, videos, presentations, white papers, blog posts and more.

Data Privacy: Practical Applications & How to Get it Right

Learn how a holistic approach to information management can expedite the identification, evaluation and management of sensitive content within your organization…

5 Things Corporate Litigation Professionals Can Learn From Their Records Management & Information Governance (RMIG) Counterparts

While Litigation and Records Management & Information Governance (RMIG) departments may have different goals, there are commonalities…

Constructing a “Virtual Vault” for Legal Contracts

Accumulating over many decades of business, through multiple corporate structures, reorganizations and mergers…

Organization and Control for 30 Years of Electronic Documents

Seeking to streamline their work processes, including easy, on-demand search and retrieval of information…
