PowerHouse

Valora’s AutoClassification Engine: Transforming Data into Insights, Action, and Compliance

Comprehensive AutoClassification & Data Analysis Engine

Valora’s PowerHouse is an advanced, machine-learning AutoClassification engine designed to revolutionize how organizations manage and analyze corporate information. With the ability to connect to diverse data repositories, PowerHouse performs full-text analysis on files, identifying critical information and applying rich metadata for better organization, disposition, and compliance.

How it Works

1) Data Discovery

Valora’s PowerHouse uses source repository APIs and service accounts to connect to and scan multiple data repositories, including unstructured and structured data in cloud and on-prem environments.

2) Intelligent Analysis

PowerHouse performs a full-text analysis of every file’s contents, not just the file metadata, identifying key dates, personal data, sensitive data, key words, and other important data elements.
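As a rough illustration of what identifying data elements in file content can look like, here is a minimal sketch that uses plain regular expressions. This is not how PowerHouse works internally (the product description points to machine learning and natural language processing); the pattern names, the patterns themselves, and the `extract_facets` function are assumptions made for this example.

```python
import re

# Hypothetical, deliberately simple patterns. A real engine would use far
# richer models than regular expressions.
FACET_PATTERNS = {
    "iso_date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def extract_facets(text: str) -> dict[str, list[str]]:
    """Return every match of each facet pattern found in the text."""
    return {name: pattern.findall(text) for name, pattern in FACET_PATTERNS.items()}


sample = "Signed 2021-03-15 by jane.doe@example.com; SSN 123-45-6789 on file."
facets = extract_facets(sample)
print(facets)
```

The point of the sketch is only that the scan runs over the text of the file rather than over its name or system metadata.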

3) Automated Classification

PowerHouse AutoClassifies each document to your Records Retention Schedule based on the content and context of every file, applying rich metadata elements that enable detailed analysis, point-and-click drilldown, bulk editing, and instant reporting.

4) Defensible Disposition

PowerHouse applies automated rules and workflows based on your document retention, legal hold, regulatory, or other business policies, enabling auditable, defensible disposition.

Content Analysis

Full-text content analysis identifies the relevant data facets in each file. PowerHouse automatically tags and classifies each document, applying the appropriate retention dates, data auditing rules, and disposition handling rules.

Why PowerHouse?

Flexible Deployment

Valora PowerHouse Suite is deployed in Valora’s cloud, in your private or public cloud, or on-prem. The Valora Professional Services team offers full-service project management, operational support, and technical support.

True AutoClassification

Valora’s PowerHouse platform “reads” every file in every repository, performing a full-text analysis of the content. It AutoClassifies each file based on content and context – not just the existing file- or system-generated metadata – to identify and flag important data facets.

Processing in Place

PowerHouse uses custom connectors to access and crawl multiple data sources and repositories to identify and analyze files – leaving them where they are, accessible to users, and unaffected by the crawl-analysis process.

Repository Agnostic

Valora PowerHouse scans and analyzes cloud and on-prem, unstructured and structured data systems, including fileshares, archival systems, CRM systems, HRIS systems, data lakes, and more.  We account for it all.

Machine Learning AI

PowerHouse incorporates elements of probabilistic systems, Bayesian learning, natural language processing, and machine learning (AI) to power its complex content analysis.

Defensible Disposition with Event Triggers

PowerHouse initiates disposition workflows based on events, thresholds, activity or inactivity, and elapsed time frames. It either automates the defensible disposition of content at source or provides disposition reports to support manual disposition.
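To make the idea of trigger-based disposition concrete, here is a minimal sketch of how such rules might be evaluated for a single file. The `FileRecord` fields, the `disposition_action` function, the rule order, and the 365-day inactivity threshold are all assumptions for illustration; they are not PowerHouse's actual rule model.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class FileRecord:
    """Hypothetical record holding the facts a disposition rule needs."""
    path: str
    retention_expires: date  # date derived from the retention schedule
    last_accessed: date
    legal_hold: bool = False


def disposition_action(record: FileRecord, today: date, inactivity_days: int = 365) -> str:
    """Return 'hold', 'dispose', 'review', or 'retain' for one file."""
    if record.legal_hold:
        return "hold"  # a legal hold overrides every other rule
    if today >= record.retention_expires:
        return "dispose"  # time-based trigger: retention period has elapsed
    if today - record.last_accessed > timedelta(days=inactivity_days):
        return "review"  # inactivity trigger: candidate for a disposition report
    return "retain"


today = date(2024, 6, 1)
record = FileRecord("/share/finance/q1.xlsx", date(2024, 1, 1), date(2024, 5, 1))
print(disposition_action(record, today))
```

Checking the legal hold first reflects the usual expectation that a hold blocks disposition regardless of retention dates; the other rule priorities here are a design choice of the sketch.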

Additional Features

AutoRedaction

Identifies and redacts sensitive content per user permissions

AutoTranscription

Transcribes audio and video files into searchable content

AutoTranslation

Identifies and translates documents to English (or any other language)

Mockuments

Creates a user-friendly way to view and interact with structured data

PowerHouse FAQ

How fast does PowerHouse crawl and scan repositories and fileshares?

Short answer: It depends.

Long answer: It depends on the number, type, and size of the files that PowerHouse is processing and analyzing. For example, a 2-page document processes faster than a 250-page document, and a single email with one attachment processes faster than a .pst file with 100,000 embedded email messages and attachments.

The system applies OCR (Optical Character Recognition) to PDFs and other scanned image formats, giving them readable text as it goes. PowerHouse also AutoTranslates non-English documents into English (or other languages) and AutoTranscribes audio and video files into text. These are examples of “heavy lifting” processing that takes a little longer than “reading” a plain Word file.

To put an actual number to it, the first baseline processing run – scanning and full text analysis of every file – runs at about 0.5 GB per processor-hour of uncompressed, expanded data.  The more processors utilized, the more processing accomplished per hour.

Valora’s PowerHouse AutoClassification engine is offered in three tiers. The higher the tier, the higher the number of processors, the faster the processing.

  • PowerHouse Starter – processes 2.5 GB per hour, or approximately 6,250 files
  • PowerHouse Foundation – processes 7.5 GB per hour, or approximately 18,750 files
  • PowerHouse Enterprise – processes 20.0 GB per hour, or approximately 50,000 files
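The tier figures can be cross-checked with a little arithmetic. The sketch below takes the quoted baseline rate and tier throughputs and derives two numbers that are not stated on this page: the processor count each tier implies and the average number of files per GB. Treat both as inferences from the published figures, not as official specifications; the function names are made up for this example.

```python
BASELINE_GB_PER_PROCESSOR_HOUR = 0.5  # first-run rate quoted above

# Published tier throughput for the baseline run.
TIER_GB_PER_HOUR = {"Starter": 2.5, "Foundation": 7.5, "Enterprise": 20.0}
TIER_FILES_PER_HOUR = {"Starter": 6_250, "Foundation": 18_750, "Enterprise": 50_000}


def implied_processors(tier: str) -> float:
    """Processor count implied by dividing tier throughput by the baseline rate."""
    return TIER_GB_PER_HOUR[tier] / BASELINE_GB_PER_PROCESSOR_HOUR


def files_per_gb(tier: str) -> float:
    """Average files per GB implied by the published tier figures."""
    return TIER_FILES_PER_HOUR[tier] / TIER_GB_PER_HOUR[tier]


def baseline_hours(tier: str, data_gb: float) -> float:
    """Estimated hours for a first full run over data_gb of expanded data."""
    return data_gb / TIER_GB_PER_HOUR[tier]


for tier in TIER_GB_PER_HOUR:
    print(tier, implied_processors(tier), files_per_gb(tier), baseline_hours(tier, 1000))
```

Under these figures, every tier works out to the same 2,500 files per GB (about 0.4 MB per file on average), and a first pass over 1,000 GB would take roughly 400 hours on Starter, 133 hours on Foundation, and 50 hours on Enterprise. Actual rates depend on file types and on how much OCR, translation, and transcription is involved.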

In the first few weeks after set-up and configuration, we benchmark how fast things are processing based on your actual content so we can better forecast infrastructure needs going forward.

Subsequent data processing runs (for data updates, new configurations or handling rules, etc.) typically run at about 1.5 – 5.0 GB per processor-hour.

Where is PowerHouse deployed?

PowerHouse is deployed in Valora’s cloud (SaaS), in your private or public cloud, on-premises, or in a hybrid combination of environments (as is sometimes required for specialized cross-border data storage regulations or highly fragmented data estates).

What kinds of systems can PowerHouse scan?

Valora PowerHouse connects to structured and unstructured data systems to crawl and analyze content. Some examples of systems we connect to include:

  • On-prem unstructured systems: Windows fileshares.
  • On-prem structured or semi-structured systems: ECMs, DMSs, ERPs, databases, etc.
  • Cloud unstructured systems: SharePoint Online, Dropbox, Box, etc.
  • Cloud structured or semi-structured systems: Data lakes/warehouses, Oracle, MSSQL, etc.
  • Cloud applications: Salesforce, Workday, SAP, NetSuite, Jira, etc.
  • Email systems: Microsoft 365/Outlook, Gmail, etc.

How does PowerHouse connect to these disparate systems?

Valora’s Implementation team builds custom connectors using APIs or other direct database extraction methods to access the content at source. Some examples of the techniques used for custom connector development include:

  • Open APIs – application programming interfaces made publicly available to software developers by the application or repository.
  • Microsoft Graph API – a RESTful web API that enables access to Microsoft Cloud service resources.
  • JDBC (Java Database Connectivity) – the Java API that manages connecting to a database, issuing queries and commands, and handling result sets obtained from the database.
  • Open Database Connectivity (ODBC) interface – a C programming language interface that makes it possible for applications to access data from a variety of database management systems (DBMSs).
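As an illustration of the connector idea, the sketch below defines a small read-only connector interface and one implementation backed by a SQL query. Everything here is hypothetical: the `Connector` protocol, the `SqlConnector` class, and the demo table are invented for this example, and Python's built-in `sqlite3` module stands in for a database that would otherwise be reached through JDBC or ODBC.

```python
import sqlite3
from typing import Iterator, Protocol


class Connector(Protocol):
    """Hypothetical interface: yield (item_id, text) pairs from a source."""

    def crawl(self) -> Iterator[tuple[str, str]]:
        ...


class SqlConnector:
    """Reads rows with a SELECT query only, so the source data is left unchanged."""

    def __init__(self, conn: sqlite3.Connection, query: str) -> None:
        self.conn = conn
        self.query = query

    def crawl(self) -> Iterator[tuple[str, str]]:
        for item_id, text in self.conn.execute(self.query):
            yield str(item_id), text


def build_demo_source() -> sqlite3.Connection:
    """Create an in-memory database that plays the role of a source system."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO notes (body) VALUES (?)",
        [("Contract renewal",), ("Invoice 1042",)],
    )
    return conn


connector: Connector = SqlConnector(build_demo_source(), "SELECT id, body FROM notes ORDER BY id")
items = list(connector.crawl())
print(items)
```

A common interface of this kind lets the analysis step treat a fileshare, a database, and a cloud API the same way, with each source needing only its own `crawl` implementation.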

Where do the resulting metadata tags go?

There are many options for what PowerHouse does with the tags, rules, and dispositions that it creates. All of this meta-information lives within its internal database, and is available for push or promotion to any/all of the following destinations:

  • As inputs to the BlackCat data visualization/graphical representation platform (or other dashboard)
  • As reports or data files (CSV, Excel, JSON, etc.)
  • As database/repository/DMS/ECM fields
  • As SharePoint or other collaborative file-share fields or metadata
  • As file tags, appended to the file metadata (as supported by the file type) or as part of a file naming convention
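For the report and data file option, a minimal sketch of exporting classification tags to JSON and CSV with the Python standard library might look like the following. The tag fields (`path`, `record_class`, `retain_until`) and their values are invented for this example and do not reflect PowerHouse's actual schema.

```python
import csv
import io
import json

# Hypothetical tag rows; a real export would come from the engine's database.
tags = [
    {"path": "/share/hr/offer.docx", "record_class": "HR-010", "retain_until": "2031-12-31"},
    {"path": "/share/fin/invoice.pdf", "record_class": "FIN-200", "retain_until": "2027-06-30"},
]


def to_json(rows: list[dict[str, str]]) -> str:
    """Serialize tag rows as a JSON array."""
    return json.dumps(rows, indent=2)


def to_csv(rows: list[dict[str, str]]) -> str:
    """Serialize tag rows as CSV, using the first row's keys as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(tags))
print(to_json(tags))
```

The same rows could equally be written to database fields or repository metadata; flat files are simply the easiest destination to show in a few lines.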

Explore the AutoClassification Suite

Valora’s PowerHouse, BlackCat, and Connectors work together to deliver an integrated Information Governance platform to help enterprises understand, manage, repair, govern, action, protect, and report on their data.

PowerHouse

Valora’s AutoClassification engine scans, analyzes, and applies customized classification tags and automated disposition rules to all enterprise content.

BlackCat

Valora’s metadata visualization and management tool helps users interact with, manage, collaborate, report on, and action all enterprise data through a “single pane of glass.”

Connectors

Custom connectors enable PowerHouse to connect to, crawl, analyze, and repair enterprise content within unstructured, structured, and hybrid systems, whether in cloud or on-prem environments.