Extracting Fields from an Enormous Data Set

Case Study


Large Shipping Reinsurer: A Scalable, Automated Solution for an Ever-Changing, Massive Data Set


Valora was approached by a large reinsurer that primarily serves the global shipping industry. As a major corporation with international monetary interests in commercial shipping, the Client was becoming overwhelmed by the volume of information it aggregated every day (over 100 shipping documents and manifests per minute).

Solutions Applied:


Document Analytics

Electronic File Processing

OCR & Text Extraction

Analytics & Data Mining

Products Used:


PowerHouse



The Client required a vendor and a solution that could pass stringent international security, technology, and data privacy audits. As a data aggregator and controller, the Client's content analysis team is a textbook ClearingHouse use case: it has a regular, ongoing need for a wide breadth of content to be tagged, analyzed, and reported upon. The set of fields the Client needs extracted also changes constantly.


The Client turned to Valora for a number of reasons. First, it wanted a technology that could keep up with its ever-changing information extraction needs and so increase the efficiency of its manual data extraction teams. PowerHouse allows dynamic, self-modifying configuration changes, so that new fields of information are captured and reported on automatically. Second, the Client's data flow is inconsistent, so it needed a solution that would scale up and down in concert with the document flow. When provisioned in an Amazon Web Services (AWS) cloud, PowerHouse can ramp up its processing power and speed within a few seconds.
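To make the idea of configuration-driven field extraction concrete, here is a minimal sketch in Python. It is not PowerHouse's implementation or API. The field names, the patterns, the sample manifest, and the extract_fields helper are all hypothetical, chosen only to show how a new field can be added by changing configuration rather than code.

```python
import re
from typing import Dict, Optional

# Hypothetical field definitions: field name -> regex with one capture group.
# These are illustrative assumptions, not fields from the Client's data.
FIELD_PATTERNS: Dict[str, str] = {
    "vessel_name": r"Vessel:\s*(.+)",
    "port_of_loading": r"Port of Loading:\s*(.+)",
}


def extract_fields(text: str, patterns: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return one value per configured field, or None when the field is absent."""
    results: Dict[str, Optional[str]] = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text)
        results[name] = match.group(1).strip() if match else None
    return results


# Made-up sample document, for illustration only.
SAMPLE_MANIFEST = """Vessel: MV Example Star
Port of Loading: Rotterdam
Container Count: 42
"""

if __name__ == "__main__":
    print(extract_fields(SAMPLE_MANIFEST, FIELD_PATTERNS))

    # Adding a field is a configuration change; extract_fields is untouched.
    extended = dict(FIELD_PATTERNS, container_count=r"Container Count:\s*(\d+)")
    print(extract_fields(SAMPLE_MANIFEST, extended))
```

Because the extraction loop only reads the pattern table, growing from a handful of fields to several hundred in a design like this means adding table entries, not rewriting extraction logic.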


The Client started with a series of increasingly complex pilots of the PowerHouse software, implementing first 12 and then 35 unique fields of information. By that point, the proof of concept had been amply demonstrated and the Client could see an immediate gain in performance. Once fully operational, PowerHouse will support the analysis and extraction of 250+ unique fields of information.