Frequently Asked Questions

Don’t be shy!  We get asked these things a lot.  Here are a few common questions to start you off.

It’s really just a semantic difference, but coding can sometimes mean the action of applying an actual “code” to a record’s field, while indexing can sometimes mean the action of abstracting information for a field. The difference is subtle and largely ignored. Both terms are used interchangeably to mean extracting or otherwise deriving meta-information about a document and its contents, and then placing that information into data fields.
Coding provides useful information that may not be obvious or even attainable at all from searching. For example, you can search for the name Suzy Smith in a document and if it is present (and legible) it will come up. However, there is no way to search whether Suzy is the Author of that document, rather than simply being named in its contents. If she is, in fact, the Author, then there is additional information about her knowledge of the document’s contents, her intent in creating the document and the time at which this all took place. In addition, coding forms the basis for much more sophisticated analysis, utilized in rules processing, trend forecasting and data visualization.
This is a matter of considerable debate and is most dependent upon your needs for the project. Most litigation documents are coded for the “Basic Bibliographic” set of fields: Author, Recipient, CC/BCC, Date, Subject and Document Type. Most records management and information governance documents are coded for easy retrieval and analysis, with fields such as: Document Type, Date, Title, Key Topics and Source. Finally, specific industry coding often applies to medical records, financial & mortgage documents and construction & real estate matters.
AutoCoding is performed by machine algorithms, with human input, training and auditing of the work. Manual Coding is performed entirely by human effort and is often known as data entry. Hybrid Coding is a technique that utilizes a first pass of machine coding (Auto), followed by manual Quality Control (eyes-on) to review and verify the results. The goal of Hybrid Coding is to take the speed and cost basis of AutoCoding and marry it with the judgment and nuance available in Manual Coding.
Almost all documents can be AutoCoded, so long as their contents are readable by machine processing. That means they must be typeset (not handwritten) and of relatively clean image quality (the ideal is black text on a white background). There are literally thousands of Document Types that can be AutoCoded with excellent results. When in doubt, ask to run a sample set through to gauge results.
All Valora Coding Services, of any variety, are always performed at our Bedford, MA Document Center or onsite at our clients’ locations. Nothing is ever sent offshore.
Valora uses our proprietary platform, PowerHouse, for both AutoCoding and subsequent QC of all work. PowerHouse is an integrated processing platform that has extremely sophisticated capabilities for both statistical pattern-matching and for managing large-scale coding or review teams.
Yes, but be prepared for strict security and hardware processes to be put in place.
Yes; as of 2018, PowerHouse can be licensed.
The amount of time is highly contingent upon the availability of the case’s subject-matter experts (SMEs), often outside counsel attorneys who will provide input and feedback to Valora’s project managers. Examples of SME input and feedback include: details of what they are looking to find, or rules for categorizing (review-labeling) documents in the collection. Another factor to consider is the number of times the rules are modified (otherwise known as “iterations”) before the attorneys have confidence in the process. A typical project breaks down as follows:

- 2-3 days to configure and run the rules. Simultaneously, Valora is ingesting the data and running it through our proprietary process to extract and tag attributes about the individual documents.

- 1-2 days of run time for applying and performing quality control on the rules. This is the step that often iterates as SMEs discover new elements of the documents or the case.

- Final QC using law firm or contract attorneys is optional. If elected, the percent of documents reviewed will drive the estimated time to complete. With Valora’s TAR process, typically <5% of the documents are reviewed manually.

Valora recommends that 1 Sr. Associate or 1 Partner participate in the process. This person should be someone who has an intimate knowledge of the case and pertinent issues driving discovery review. This attorney should be available to provide feedback to their designated PM regarding how documents are being automatically coded. The success of the review is highly contingent upon collaboration between Valora and the attorney.

Some of Valora’s clients also utilize contract attorneys to perform quality control after the final set of rules has been applied and the work product has been delivered back to the client. The number of attorneys needed is driven by the deadline, as well as the percentage or number of documents your team believes should be reviewed by humans. With Valora’s TAR process, typically <5% of the documents are reviewed manually.

Since our inception, Valora has been providing services powered by our proprietary pattern-matching technology. Valora uses Hierarchical Probabilistic Context-Free Grammars, a type of Natural Language Processing. Our PowerHouse™ system (Valora’s proprietary processing platform and administrative dashboard) uses a voting engine to identify syntactic and semantic elements of a document (e.g., phone number or person’s name) and combines these elements into progressively larger hierarchical grammar units. It uses probabilities associated with each level in the hierarchy to select the matching recognition patterns.
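As a rough illustration only (PowerHouse itself is proprietary, and the patterns, labels and probabilities below are invented for the sketch), the general idea of hierarchical probabilistic matching looks something like this: low-level patterns recognize elements, a higher-level grammar rule combines them, and probabilities score the resulting interpretation.

```python
import re

# Invented element patterns with invented confidences -- purely illustrative.
ELEMENT_PATTERNS = [
    # (element label, regex, probability that a match truly is that element)
    ("phone_number", re.compile(r"\b\d{3}-\d{3}-\d{4}\b"), 0.95),
    ("person_name",  re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b"), 0.60),
]

def recognize_elements(text):
    """First level of the hierarchy: tag low-level syntactic elements."""
    found = []
    for label, pattern, prob in ELEMENT_PATTERNS:
        for match in pattern.finditer(text):
            found.append((label, match.group(), prob))
    return found

def score_contact_line(text):
    """Second level: a 'contact line' grammar unit is a person name plus a
    phone number; its score is the product of its parts' probabilities."""
    elements = recognize_elements(text)
    labels = [label for label, _, _ in elements]
    if "person_name" in labels and "phone_number" in labels:
        prob = 1.0
        for _, _, p in elements:
            prob *= p
        return prob
    return 0.0

print(score_contact_line("Suzy Smith 617-555-0123"))  # roughly 0.57 (0.95 x 0.60)
```

In a real hierarchical grammar, many competing interpretations would be scored this way and the voting engine would keep the highest-probability parse.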
Absolutely! Valora has a visualization tool, BlackCat™, which allows users to drill into their dataset to uncover recurring themes, relevant issues and other pertinent information. During the visual review in BlackCat, Valora’s PM will work with the SME attorney to build an appropriate Ruleset based on what was uncovered in their analysis of the data. From there, our standard process for TAR commences.
Our process of tagging documents is what makes Valora so unique. Our technology analyzes each document as it comes over the PowerHouse threshold, looking for unique and distinctive attributes about each of those documents that can be used at a later point to search for relevant or privileged content. Examples of the 90+ attributes that we extract and tag are:

- Document Type (memo, inventory report, patent application, etc.)

- Tone of a document (neutral, hostile, informal, formal, personal, etc.)

- Actual dates and authors (not relying on metadata)

By tagging documents in this way, Valora essentially obtains a DNA sequence for each document that can later be used for analysis and assessment. Valora’s tag list is a result of more than a decade of document-tagging experience. As a general rule of thumb, if there is some element of information present on a document, Valora’s system can identify and tag it. Interestingly, Valora’s tagging approach works on both scanned paper documents and ESI documents with limited or no metadata, such as attachments and native files. Valora tags all documents in a collection.
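The “DNA sequence” idea can be pictured as a per-document dictionary of extracted attributes. The sketch below is hypothetical — the real system tags 90+ attributes, and these attribute names and patterns are invented for illustration.

```python
import re

def fingerprint(text: str) -> dict:
    """Invented mini-fingerprint: a few attributes derived from document text
    alone, without relying on file metadata."""
    return {
        "word_count": len(text.split()),
        "has_date": bool(re.search(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b", text)),
        "mentions_person": bool(re.search(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b", text)),
    }

doc = "Memo from Suzy Smith, dated 3/14/2012, re: quarterly inventory."
print(fingerprint(doc))
```

Because the attributes come from the document body itself, the same approach works on scanned paper and on native files with little or no metadata.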

Primarily, the value is in being able to utilize the tags to find and classify what you are looking for in the dataset. With Valora, you have access to information that you wouldn’t normally be able to obtain from a manual or “predictive coding” style document review. While Valora’s Ruleset is utilizing both content and tags, each of the tags collected can be made available to you in your database, if desired. This would provide you with useful ancillary information, such as who communicates with whom and which highly unusual concepts are present in a document, etc.

Ultimately, an automated TAR process replaces almost all of the doc-by-doc manual labor, which reduces the cost and time associated with document analysis. Furthermore, since Valora is primarily searching the tags when applying client rules, the speed of each iteration isn’t dependent on the size of the actual dataset. Typical results of each iteration are provided back for attorney comment within hours, not days!

Valora is not suggesting that the machine fully take over and perform the work of an attorney. Instead, our process is designed to reduce the amount of low-level manual labor required for document review and leave the high-level analysis in the hands of SMEs. The result is reduced cost and time, and increased, measurable accuracy.

Many of our clients use Valora to perform first pass review to determine relevancy and issues as well as to identify potentially privileged documents. Once Valora has delivered the output to your review platform, prioritization of “eyes on” review can be established. There are many ways to do this. Valora can work with your team to design a plan that gets you to your comfort level. Additionally, we can assist you with sampling the dataset using statistically sound and highly defensible methods for testing quality.

After more than a decade of coding and reviewing documents, at Valora we understand that nothing in litigation review is static, except for maybe the deadline! We have designed our process to accommodate changes at any stage in the process. When new issues arise, the SME attorney working with Valora explains what they need to the assigned Valora PM. The PM modifies the RuleSet to reflect the changes, and begins a process iteration accordingly. All iterations are saved and available for review at any time. Iterations proceed until the SME attorney is satisfied and formally signs off on the results.

The best part is that issue- and accuracy-oriented iterations are free! Valora encourages clients to hypothesize about how they would organize and classify their collection in different ways, and to specifically test out different variables per iteration. There is no additional charge for this type of dataset exploration, nor for the addition or removal of issues and rules over time.

Regarding adding new data, this is easy for Valora. New data installments are incorporated into the existing workflow and RuleSet. Once the new data is ingested and tagged (see above) into PowerHouse, the most recent set of rules is applied to the dataset and the results are delivered to BlackCat and/or to the client. In cases where the new data installment has an effect on the output of the prior installments (as can happen for detection and labeling of NearDuplicates, for example), Valora will supply a fresh overlay of the results for the entire dataset, incorporating the results of the new installment. Valora’s BlackCat interface always reflects the current state of the dataset, and astute observers can actually see the new data installment being incorporated in real-time! New data installments do have an additional per document charge for ingestion and processing, applied only to the new data added to the collection.

Valora uses a stringent method of statistical sampling to ensure accuracy at all stages. The specific method used is called Stratified Sampling. Stratified Sampling is a method where documents are randomly selected from specific segments of the population. An example of a segment is Document Type (Chart, Cash Flow Statement, Lab Results, etc.). So instead of pulling random samples from the entire population, Valora pulls a representative sample from each document type as identified by Valora’s tagging technology at ingestion. We are essentially creating a “mini me” sample set from each document type, date range, issue concentration, and so on. (Remember the 90+ attributes we tag for? Those ultimately become the basis for the sampling strata.)
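A minimal sketch of stratified sampling, assuming each document already carries a hypothetical `doc_type` tag assigned at ingestion (field names and the sampling fraction are invented for the example):

```python
import random
from collections import defaultdict

def stratified_sample(documents, fraction=0.04, seed=42):
    """Draw a proportional random sample from each stratum.

    Here the stratum is the document's 'doc_type' tag; a real run could
    stratify on any tagged attribute (date range, issue, and so on).
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for doc in documents:
        strata[doc["doc_type"]].append(doc)
    sample = []
    for doc_type, docs in strata.items():
        k = max(1, round(len(docs) * fraction))  # at least one per stratum
        sample.extend(rng.sample(docs, k))
    return sample

docs = [{"id": i, "doc_type": t}
        for i, t in enumerate(["memo"] * 500 + ["chart"] * 100 + ["lab_results"] * 25)]
picked = stratified_sample(docs)
# Every document type is represented, even the rare 'lab_results' stratum.
assert {d["doc_type"] for d in picked} == {"memo", "chart", "lab_results"}
```

The key property is the last assertion: a plain random sample of this size could easily miss the 25-document `lab_results` stratum entirely, while the stratified sample guarantees each “mini me” segment is represented.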

Each document in the selected strata is then manually reviewed (audited, really) for accuracy by either Valora’s personnel or the client’s. If any errors are found, they are recorded and fed back into the system for re-iteration. Valora will modify the rules to rectify the error and kick off an iteration. This process repeats until no errors are found in the sample set. It typically takes 3-4 tries, which are often combined with iterations for new issues or new data installments.

As with all statistical sampling, the number of documents pulled for each stratified sample is mainly contingent on the confidence level desired regarding the accuracy of the coding, as well as the acceptable margin of error. The original size of the dataset is factored in to some degree, but its impact on the sample size is minimal. For example, in a document population of 100,000 documents, if you want to be 99% confident in your accuracy, with a 2% margin of error, your sample size would be roughly 4,000 documents.
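The arithmetic behind that example can be checked with the standard sample-size formula (Cochran’s formula with a finite-population correction); the function name here is our own, not part of Valora’s toolset:

```python
import math

def sample_size(population: int, z: float, margin: float, p: float = 0.5) -> int:
    """Cochran's sample-size formula with finite-population correction.

    z is the z-score for the desired confidence level (2.576 for 99%),
    margin is the acceptable margin of error, and p = 0.5 is the most
    conservative assumption about the underlying proportion.
    """
    # Unadjusted sample size for an effectively infinite population
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
    # Correct for the finite size of the document collection
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

# 100,000 documents, 99% confidence (z = 2.576), 2% margin of error
print(sample_size(100_000, 2.576, 0.02))  # 3983 -- the "roughly 4,000" above
```

Note how weakly the population size matters: doubling the collection to 200,000 documents raises the required sample only slightly, which is why the sample stays under 5% of even modest collections.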

For those not well-versed in statistical sampling, this can seem like an impossibly small number, since it is less than 5% of the collection. Valora recognizes the gulf between statistical sampling and “comfort sampling.” For those who have concerns, Valora suggests that they review the percentage that makes them the most comfortable. Typically, this is 10-20% of the population.