Data Discovery & Classification
Positively identify your content, where it is and what it is, so you can determine how and when it should be handled.


Discover & analyze content in place.
Even with the advances of consolidated content platforms like Microsoft 365, the reality is most organizations will continue to use multiple systems to house enterprise content. The real risk many teams face is not knowing where their content is, what their content is, or who has access to it in public folders or systems.
Valora brings all previously unknown content into view by connecting to structured and unstructured data systems, crawling, analyzing and reporting on that content in place. We deploy custom connectors using APIs or other direct database extraction methods to access, evaluate, protect, repair, and disposition the content at source.
Uncover all enterprise content.
Valora connects to unstructured and structured cloud, on-prem and hybrid data environments including:
- On-prem unstructured systems: Windows fileshares.
- On-prem structured systems: ECMs, databases, etc.
- Cloud unstructured systems: SharePoint Online, Dropbox, Box, etc.
- Cloud structured systems: Data lakes/warehouses, Oracle, MSSQL, etc.
- Cloud applications: Salesforce, Workday, SAP, NetSuite, etc.
- Email systems: Microsoft 365, Google mail, etc.


AutoClassify every file.
Valora’s AutoClassification platform performs a full-text analysis of every file, not just the existing file or system-generated metadata, identifying key data elements unique to the document, and important to your organization, industry or jurisdiction.
Based on this analysis, Valora AutoClassifies each document to your Records Retention Schedule, separates the content with legitimate business value from the content without it (ROT), and unifies the detailed analysis, management, reporting and disposition of your enterprise data estate.

Scanning unstructured content.
Manually categorizing unstructured data is challenging due to its volume, complexity, variability, and lack of predefined organization.
Valora’s PowerHouse processes and classifies unstructured data, such as emails and other corporate documents, using advanced AutoClassification algorithms. This full-text analysis allows PowerHouse to read and analyze every file without a human having to open and classify it.
Scanning structured data environments.
Although databases house data in structured fields, it can be difficult for Information Governance teams to access, classify and act on that data. Valora’s PowerHouse connects to, scans, analyzes, classifies and performs defensible disposition on fielded data in structured databases.
PowerHouse takes the extracted database text and turns each database record into its own “mockument” so users can read, understand, and interact with the content in BlackCat.
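As a rough illustration of the idea, the sketch below renders one database record as a readable text document. It is not Valora's implementation: the function name `record_to_mockument`, the `invoices` table, and the sample fields are all hypothetical, and the real layout of a mockument in BlackCat is not described here.

```python
def record_to_mockument(table: str, record: dict) -> str:
    """Render one database record as a plain-text document (hypothetical layout)."""
    lines = [f"Source table: {table}"]
    for field_name, value in record.items():
        # Show empty fields as blank rather than the literal string "None".
        shown = "" if value is None else str(value)
        lines.append(f"{field_name}: {shown}")
    return "\n".join(lines)


# Hypothetical sample row, as it might come back from a database query.
row = {"invoice_id": 1042, "customer": "Acme Corp", "amount": 1250.0, "notes": None}
mockument = record_to_mockument("invoices", row)
print(mockument)
```

Once each record is text like this, it can pass through the same full-text analysis as any other document.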

Rich Metadata Tagging
AutoClassification enables the creation of a virtually unlimited range of enriched metadata tags. At Valora, we’ve encountered everything from straightforward tags like Document Type, Custodian and Keywords to more niche examples, such as Latitude/Longitude, Japanese Showa Dates, and everything in between. However, there is a core set of basic tags that almost every AutoClassification project will produce. They include:
Basic Identifiers
Document Type, Document Title, Document Date, Geographic Location, Document ID, Attachment Range
People Fields
Author/Custodian, Recipient/Audience, Employee Name, Customer/Patient Name, Contract Signatories
Records Management Fields
Record Class, Regulatory Citation, Retention Period, Expiration Date, Legal Hold, and ROT
Data Privacy Fields
Data Privacy Type & Detail: PII, PHI, PCI, Sensitivity Class, Minimization Status, Redacted/Pseudonymized
Attributes Fields
Keywords, Duplicate, Version, Language, Product Names, Client/Customer Data, Employee/Personnel Data
From there, the possibilities are endless. It is easy to create custom AutoClassification tags for certain verticals, document types, and specialized needs that might be for just your organization or department. Examples of custom rich metadata tags are:
Custom Document Types
Geographic Tags
Personnel Fields
Lines of Business
Product name/number, Case matter name/number, Supervisor, etc.
Contractual & Finance Fields
Benefits of using Valora for Data Discovery & Classification

True AutoClassification
Valora’s PowerHouse platform “reads” every file in every repository, performing a full-text analysis of the content. It AutoClassifies each file based on content and context – not just the existing file- or system-generated metadata – to identify and flag important data facets.
Determine True Document Type, Not File Type
Knowing how many Word docs, Excel spreadsheets or PDFs you have is useful for sizing a given repository, but it tells you nothing about the contents of those files. PowerHouse reads every file to determine the true Document Type of each file, for example: a vendor contract vs. an employee contract.
Comprehensive Data Mapping
More than a simple data map, AutoClassification tells you precisely where the land mines are. Way beyond how many files you have and where, PowerHouse determines the specific Document Type for each file (ex. Contract, Board Minutes, Balance Sheet, etc.), including whether it contains sensitive information (ex. employee records, client personal data, financial reports, etc.), how long it should be retained, and under what circumstances it can be defensibly deleted.
Reduce Risk
By tagging or classifying enterprise content, you are actively taking responsibility for the data you hold. You are removing costly “surprises” that can arise from litigation, eDiscovery, data privacy and information security events, when unplanned activities necessitate a deep dive into “what you have or hold.”
Reduce Human Effort & Error
AutoClassification software processes content at speeds that no human, or team of humans, possibly can. It greatly reduces the frequency and impact of human error, inconsistency, and oversight – providing a comprehensive and thorough review of all enterprise data systematically.
Unlimited Metadata Values
Unlike other classification tools that are limited to point-in-time, single metadata values, Valora’s rich metadata applications support both multiple and hierarchical taxonomic data structures, including automatic updates based on workflows, events and other triggers. A single file can be classified as 1) a Contract, and more specifically 2) a Vendor Contract, and also tagged as 3) containing sensitive information about a certain topic or person, while also being 4) flagged as under Legal Hold.
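The four-part example above can be pictured as a small data structure. This is only a sketch under assumptions: the `FileTags` class, its field names, and the sample path are hypothetical and do not reflect how Valora stores metadata.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass
class FileTags:
    """Hypothetical tag record for one file."""
    path: str
    # Hierarchical tag: ordered from the broadest level to the most specific.
    document_type: Tuple[str, ...] = ()
    # Multi-valued tag: one file can carry several sensitivity labels.
    sensitive_topics: Set[str] = field(default_factory=set)
    # Flag that a workflow or event could switch on or off later.
    legal_hold: bool = False

    def is_a(self, doc_type: str) -> bool:
        """True if the file matches doc_type at any level of the hierarchy."""
        return doc_type in self.document_type


tags = FileTags(
    path="contracts/acme_msa.pdf",
    document_type=("Contract", "Vendor Contract"),
    sensitive_topics={"Pricing"},
    legal_hold=True,
)
print(tags.is_a("Contract"), tags.is_a("Vendor Contract"), tags.legal_hold)
```

Because the Document Type is stored as a path through the taxonomy, a search for all Contracts and a search for Vendor Contracts both find the same file.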
Data Discovery & Classification FAQ
The first baseline run – scanning and full-text analysis of every file – runs at about 0.5 GB per processor-hour of uncompressed/expanded data. Valora’s AutoClassification engine, PowerHouse, is offered in three tiers. The higher the tier, the higher the number of processors and the faster the processing.
- PowerHouse Starter – can process 2.5GB, or approximately 6,250 files per hour
- PowerHouse Foundation – can process 7.5GB, or approximately 18,750 files per hour
- PowerHouse Enterprise – can process 20GB, or approximately 50,000 files per hour
Subsequent data processing runs (for data updates, new configurations or handling rules, etc.) typically run at about 1.5 – 5 GB per processor-hour.
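For planning purposes, the figures above can be turned into a rough estimate of how long a baseline run might take. The sketch below uses only the published tier rates. The processor counts it derives are an inference (tier rate divided by the 0.5 GB per processor-hour baseline), not a stated specification, and the function names are hypothetical.

```python
BASELINE_GB_PER_PROCESSOR_HOUR = 0.5  # baseline-run rate quoted above

# Published tier throughput for a baseline run, in GB per hour.
TIER_GB_PER_HOUR = {"Starter": 2.5, "Foundation": 7.5, "Enterprise": 20.0}

# 6,250 files per 2.5 GB works out to 2,500 files per GB for every tier.
FILES_PER_GB = 6250 / 2.5


def implied_processors(tier: str) -> float:
    """Processor count implied by the tier rate (an inference, not a spec)."""
    return TIER_GB_PER_HOUR[tier] / BASELINE_GB_PER_PROCESSOR_HOUR


def baseline_hours(data_gb: float, tier: str) -> float:
    """Estimated hours for a first baseline run over data_gb of expanded data."""
    return data_gb / TIER_GB_PER_HOUR[tier]


for tier in TIER_GB_PER_HOUR:
    hours = baseline_hours(1000, tier)
    print(f"{tier}: ~{implied_processors(tier):.0f} processors, "
          f"{hours:.0f} hours for 1,000 GB")
```

On these numbers, 1,000 GB of expanded data would take roughly 400 hours on Starter, about 133 hours on Foundation, and 50 hours on Enterprise. Actual times depend on the data and the environment.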
Yes. Some clients opt for Lite Processing – a fast, thorough scan of the file inventory of a file share or other repository. Lite Processing results in a comprehensive file listing, including all identical duplicates, their size, full path, last accessed date, and last modified date. This approach is often used to create a starting point for risk assessment, gap analysis, and processing recommendations. The final result of Lite Processing yields an automated, rules-driven recommendation for remediating each share analyzed per its resulting risk profile.
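To make the shape of such a file listing concrete, here is a minimal sketch of an inventory scan using only the Python standard library. It is not Lite Processing itself: the `inventory` function, the use of SHA-256 hashes to find identical duplicates, and the demo files are assumptions made for illustration.

```python
import hashlib
import os
import tempfile
from collections import defaultdict
from datetime import datetime, timezone


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def inventory(root):
    """List every file under root with size, dates, and duplicate count."""
    rows, by_hash = [], defaultdict(list)
    for folder, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(folder, name)
            info = os.stat(path)
            row = {
                "path": os.path.abspath(path),
                "size_bytes": info.st_size,
                "last_accessed": datetime.fromtimestamp(info.st_atime, tz=timezone.utc),
                "last_modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
                "sha256": sha256_of(path),
            }
            rows.append(row)
            by_hash[row["sha256"]].append(row)
    for row in rows:
        # Number of other files with byte-identical content.
        row["duplicate_count"] = len(by_hash[row["sha256"]]) - 1
    return rows


# Demo on a throwaway folder: a.txt and b.txt are identical, c.txt is not.
with tempfile.TemporaryDirectory() as demo_root:
    for name, text in [("a.txt", "same"), ("b.txt", "same"), ("c.txt", "different")]:
        with open(os.path.join(demo_root, name), "w") as handle:
            handle.write(text)
    listing = inventory(demo_root)

for row in listing:
    print(os.path.basename(row["path"]), row["size_bytes"], row["duplicate_count"])
```

A listing like this is the kind of raw input from which rules-driven remediation recommendations could be generated for each share.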
Yes. While we have processed and identified thousands of different Document Types over the years, there may be Document Types unique to your organization or industry that we have not seen before. In such cases, Valora trains the system to identify your unique Document Type formats and attributes for accurate classification going forward.
Yes. There are two ways Valora integrates with physical records.
- Valora inherits digitized physical documents and metadata from your document storage provider. Each scanned file acts, appears, and is handled like its electronically stored siblings, including full-text analysis, enriched metadata, and automated handling rules.
- For boxes and documents still in physical format, we integrate with your storage vendor’s inventory tracking systems, representing each physical box (or file) as a “Mockument” – a record placeholder inside BlackCat used to report on or trigger actions at the Box or Document level.
Yes. Some languages are supported with native, in-language processing, such as French, Spanish, and other Roman character-set languages.
For other language analysis, Valora identifies non-English documents and AutoTranslates them into whatever language your team is comfortable with. We integrate with Google Translate, which supports over 240 world languages. We create your own Google API key and download languages to the same location where Valora systems are deployed (our cloud, your cloud, or on-prem). This allows for translating “onsite” without the content going to Google’s cloud.
For initial set-up, if you know you have certain file types or documents you want to train the system on, it could be helpful for your team to provide a description of the document or format, or better yet, provide 2-3 examples of specific files. This guidance data will help us to train the system on what to look for and how to identify your unique Document Types.
Other than providing us a handful of templates (if you have them), the impact to your team is minimal. We will occasionally have tagging questions for your InfoGov or content-holder teams, and we will require basic service account access and permissions from your IT team.
Yes. Valora can read, inherit and apply existing metadata during the analysis process to maintain the metadata values already applied to your data. Furthermore, we will incorporate prior-tagged data into the resulting taxonomy and any subsequent rules processing.
Yes. Valora integrates with third-party systems that use metadata tags. With the correct write permissions, Valora can send or migrate the file itself and its associated metadata to the target system.