PowerHouse
Valora’s AutoClassification Engine: Transforming Data into Insights, Action, and Compliance

Comprehensive AutoClassification & Data Analysis Engine

How it Works
1) Data Discovery
Valora’s PowerHouse uses source repository APIs and service accounts to connect to and scan multiple data repositories, including unstructured and structured data in cloud and on-prem environments.
2) Intelligent Analysis
PowerHouse performs a full-text analysis of every file’s contents, not just the file metadata, identifying key dates, personal data, sensitive data, key words, and other important data elements.
3) Automated Classification
PowerHouse AutoClassifies each document to your Records Retention Schedule based on the content and context of every file, applying rich metadata elements that enable detailed analysis, point-and-click drilldown, bulk editing, and instant reporting.
4) Defensible Disposition
PowerHouse applies automated rules and workflows based on your document retention, legal hold, regulatory, or other business policies, enabling auditable, defensible disposition.
Content Analysis
Full-text content analysis identifies the relevant data facets in each file. PowerHouse then automatically tags and classifies each document and applies the appropriate retention dates, data auditing, and disposition handling rules.
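As a rough illustration of what extracting data facets from full text can look like, here is a minimal Python sketch. It is not PowerHouse's implementation: the regular expressions, the keyword list, and the `extract_facets` function are assumptions made up for this example, and a real analysis engine would rely on far richer models than pattern matching.

```python
import re

# Hypothetical patterns, chosen only for illustration.
PATTERNS = {
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}
# Hypothetical key words of interest.
KEYWORDS = {"invoice", "contract", "confidential"}


def extract_facets(text: str) -> dict[str, list[str]]:
    """Return every pattern match and keyword hit found in the text."""
    facets = {name: pattern.findall(text) for name, pattern in PATTERNS.items()}
    words = set(re.findall(r"[a-z]+", text.lower()))
    facets["keywords"] = sorted(KEYWORDS & words)
    return facets


sample = (
    "Confidential contract signed 2021-03-15. "
    "Contact jane.doe@example.com, SSN 123-45-6789."
)
facets = extract_facets(sample)
for name, values in facets.items():
    print(f"{name}: {values}")
```

The resulting dictionary is the kind of per-file facet set that tagging and retention rules could then be applied to.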

Why PowerHouse?

Flexible Deployment
Valora PowerHouse Suite is deployed in Valora’s cloud, in your private or public cloud, or on-prem. The Valora Professional Services team offers full-service project management, operational support, and technical support.
True AutoClassification
Valora’s PowerHouse platform “reads” every file in every repository, performing a full-text analysis of the content. It AutoClassifies each file based on content and context – not just the existing file- or system-generated metadata – to identify and flag important data facets.
Processing in Place
PowerHouse uses custom connectors to access and crawl multiple data sources and repositories to identify and analyze files – leaving them where they are, accessible to users, and unaffected by the crawl-analysis process.
Repository Agnostic
Valora PowerHouse scans and analyzes cloud and on-prem, unstructured and structured data systems, including fileshares, archival systems, CRM systems, HRIS systems, data lakes, and more. We account for it all.
Machine Learning AI
PowerHouse incorporates elements of probabilistic systems, Bayesian learning, natural language processing, and machine learning (AI) to power its complex content analysis.
Defensible Disposition with Event Triggers
PowerHouse initiates disposition workflows in response to events, thresholds, activity or inactivity, and/or the passing of a set time frame. PowerHouse automates the defensible disposition of content at source, or provides disposition reports to support manual disposition.
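To make the idea of trigger-based disposition concrete, the following Python sketch evaluates a simple rule set: a legal hold always blocks disposition, and otherwise a record becomes eligible once its retention period has elapsed since a trigger event. The `Record` class, the `RETENTION_DAYS` table, and the action labels are hypothetical and are not taken from PowerHouse itself.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention periods per record category, in days.
RETENTION_DAYS = {"invoice": 7 * 365, "hr_file": 3 * 365}


@dataclass
class Record:
    path: str
    category: str
    trigger_date: date        # event that starts the retention clock
    legal_hold: bool = False


def disposition_action(record: Record, today: date) -> str:
    """Decide what to do with one record on a given day."""
    if record.legal_hold:
        return "retain (legal hold)"
    days = RETENTION_DAYS.get(record.category)
    if days is None:
        return "review (no retention rule)"
    if today >= record.trigger_date + timedelta(days=days):
        return "dispose"
    return "retain"


today = date(2024, 1, 1)
records = [
    Record("inv-001.pdf", "invoice", date(2015, 1, 1)),
    Record("inv-002.pdf", "invoice", date(2015, 1, 1), legal_hold=True),
    Record("hr-001.docx", "hr_file", date(2023, 1, 1)),
    Record("memo.txt", "uncategorized", date(2010, 1, 1)),
]
# A simple disposition report: path -> recommended action.
report = {r.path: disposition_action(r, today) for r in records}
for path, action in report.items():
    print(f"{path}: {action}")
```

Checking the legal hold first means a hold always wins over an expired retention period, which is what keeps the outcome defensible. The report dictionary stands in for the disposition reports that support manual review.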
Additional Features
AutoRedaction
Identifies and redacts sensitive content per user permissions
AutoTranscription
Transcribes audio and video files into searchable content
AutoTranslation
Identifies and translates documents into English (or any other language)
Mockuments
Creates a user-friendly way to view and interact with structured data
PowerHouse FAQ
Short answer: It depends.
Long answer: It depends on how many files PowerHouse is processing and analyzing, and on their file types and sizes. For example, a 2-page document processes faster than a 250-page document, and a single email with one attachment processes faster than a .pst file with 100,000 embedded email messages and attachments.
The system applies OCR (Optical Character Recognition) to PDFs and other scanned image formats, giving them readable text as it goes. PowerHouse will also AutoTranslate non-English documents into English (or other languages) and AutoTranscribe audio and video files into text. These are examples of “heavy lifting” processing that takes a little longer than “reading” a straightforward Word file.
To put an actual number to it, the first baseline processing run – scanning and full text analysis of every file – runs at about 0.5 GB per processor-hour of uncompressed, expanded data. The more processors utilized, the more processing accomplished per hour.
Valora’s PowerHouse AutoClassification engine is offered in three tiers. The higher the tier, the higher the number of processors, the faster the processing.
- PowerHouse Starter – processes about 2.5 GB per hour, or approximately 6,250 files per hour
- PowerHouse Foundation – processes about 7.5 GB per hour, or approximately 18,750 files per hour
- PowerHouse Enterprise – processes about 20.0 GB per hour, or approximately 50,000 files per hour
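The tier figures above can be turned into a quick back-of-the-envelope estimate. The Python sketch below uses only the numbers quoted in this FAQ. The processor counts it prints are simply each tier's hourly rate divided by the 0.5 GB per processor-hour baseline, so they are an inference rather than a published specification, and the 1,000 GB dataset is a made-up example.

```python
# Figures quoted in the FAQ above.
BASELINE_GB_PER_PROCESSOR_HOUR = 0.5
TIER_GB_PER_HOUR = {"Starter": 2.5, "Foundation": 7.5, "Enterprise": 20.0}
TIER_FILES_PER_HOUR = {"Starter": 6_250, "Foundation": 18_750, "Enterprise": 50_000}


def implied_processors(tier: str) -> float:
    """Processors implied by the tier rate and the baseline rate (an inference)."""
    return TIER_GB_PER_HOUR[tier] / BASELINE_GB_PER_PROCESSOR_HOUR


def baseline_hours(dataset_gb: float, tier: str) -> float:
    """Estimated hours for the first baseline run over a dataset."""
    return dataset_gb / TIER_GB_PER_HOUR[tier]


def files_per_gb(tier: str) -> float:
    """Average file density assumed by the quoted tier figures."""
    return TIER_FILES_PER_HOUR[tier] / TIER_GB_PER_HOUR[tier]


dataset_gb = 1_000  # hypothetical dataset size
for tier in TIER_GB_PER_HOUR:
    print(
        f"{tier}: ~{implied_processors(tier):.0f} processors, "
        f"~{baseline_hours(dataset_gb, tier):.1f} hours for {dataset_gb} GB, "
        f"{files_per_gb(tier):.0f} files per GB"
    )
```

All three tiers assume the same density of 2,500 files per GB, or roughly 0.4 MB per file on average. Content with much larger or smaller files will shift the files-per-hour figures accordingly.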
In the first few weeks after set-up and configuration, we benchmark how fast things are processing based on your actual content so we can better forecast infrastructure needs going forward.
Subsequent data processing runs (for data updates, new configurations or handling rules, etc.) typically run at about 1.5 – 5.0 GB per processor-hour.
PowerHouse is deployed in Valora’s cloud (SaaS), in your private or public cloud, on-prem, or in a hybrid combination of environments (as is sometimes required for specialized cross-border data storage regulations or highly fragmented data estates).
Valora PowerHouse connects to structured and unstructured data systems to crawl and analyze content. Some examples of systems we connect to include:
- On-prem unstructured systems: Windows fileshares.
- On-prem structured or semi-structured systems: ECMs, DMSs, ERPs, databases, etc.
- Cloud unstructured systems: SharePoint Online, Dropbox, Box, etc.
- Cloud structured or semi-structured systems: Data lakes/warehouses, Oracle, MSSQL, etc.
- Cloud applications: Salesforce, Workday, SAP, NetSuite, Jira, etc.
- Email systems: Microsoft 365/Outlook, Google mail, etc.
Valora’s Implementation team builds custom connectors using APIs or other direct database extraction methods to access the content at source. Some examples of the techniques used for custom connector development include:
- Open APIs – application programming interfaces made publicly available to software developers by the application or repository.
- Microsoft Graph API – a RESTful web API that enables access to Microsoft Cloud service resources.
- JDBC (Java Database Connectivity) – the Java API that manages connecting to a database, issuing queries and commands, and handling result sets obtained from the database.
- Open Database Connectivity (ODBC) interface – a C programming language interface that makes it possible for applications to access data from a variety of database management systems (DBMSs).
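The database-oriented techniques above share a common shape: open a connection, issue read-only queries, and iterate over the result sets while the data stays where it is. The Python sketch below shows that shape using the standard library's `sqlite3` module as a stand-in for a real JDBC or ODBC source. The `crawl_table` function and the `employees` table with its columns are hypothetical, and this is not Valora's connector code.

```python
import sqlite3
from typing import Iterator


def crawl_table(
    conn: sqlite3.Connection, table: str, batch_size: int = 2
) -> Iterator[dict]:
    """Yield each row of a table as a dict, fetching in small batches."""
    # The table name is interpolated into the SQL, so in real use it must
    # come from a trusted allow-list, never from user input.
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [col[0] for col in cursor.description]
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        for row in rows:
            yield dict(zip(columns, row))


# Demo against an in-memory database standing in for a source system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, notes TEXT)")
conn.executemany(
    "INSERT INTO employees (name, notes) VALUES (?, ?)",
    [("Ada", "Offer letter on file"), ("Grace", "Contract renewed")],
)
records = list(crawl_table(conn, "employees"))
for record in records:
    print(record)
conn.close()
```

Fetching in batches keeps memory use flat on large tables, and because the crawl only issues SELECT statements the source rows are left untouched.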
There are many options for what PowerHouse does with the tags, rules, and dispositions that it creates. All of this meta-information lives within its internal database, and is available for push or promotion to any/all of the following destinations:
- As inputs to the BlackCat data visualization/graphical representation platform (or other dashboard)
- As reports or data files (CSV, Excel, JSON, etc.)
- As database/repository/DMS/ECM fields
- As SharePoint or other collaborative file-share fields or metadata
- As file tags, appended to the file metadata (as supported by the file type) or as part of a file naming convention
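As an illustration of the report and data-file destination listed above, this Python sketch writes a small set of classification tags to both JSON and CSV using only the standard library. The field names (`path`, `category`, `retention_until`, `contains_pii`) and the sample values are assumptions invented for the example; they do not reflect PowerHouse's actual schema.

```python
import csv
import io
import json

# Hypothetical classification results for two files.
tags = [
    {
        "path": "//fileshare/finance/inv-001.pdf",
        "category": "invoice",
        "retention_until": "2031-12-31",
        "contains_pii": False,
    },
    {
        "path": "//fileshare/hr/offer-letter.docx",
        "category": "hr_file",
        "retention_until": "2027-06-30",
        "contains_pii": True,
    },
]


def to_json(records: list[dict]) -> str:
    """Serialize the records as an indented JSON array."""
    return json.dumps(records, indent=2)


def to_csv(records: list[dict]) -> str:
    """Serialize the records as CSV with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


json_text = to_json(tags)
csv_text = to_csv(tags)
print(json_text)
print(csv_text)
```

The same list of dictionaries could just as easily be mapped onto database fields or SharePoint columns, since each key corresponds to one metadata field.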
Explore the AutoClassification Suite
Valora’s PowerHouse, BlackCat, and Connectors work together to deliver an integrated Information Governance platform to help enterprises understand, manage, repair, govern, action, protect, and report on their data.
PowerHouse
Valora’s AutoClassification engine scans, analyzes, and applies customized classification tags and automated disposition rules to all enterprise content.
BlackCat
Valora’s metadata visualization and management tool helps users interact with, manage, collaborate, report on, and action all enterprise data through a “single pane of glass.”
Connectors
Custom connectors enable PowerHouse to connect to, crawl, analyze, and repair enterprise content within unstructured, structured, and hybrid systems, whether in cloud or on-prem environments.