Frequently Asked Questions
Don’t be shy! We get asked these things a lot. Here are a few common questions to start you off.
2-3 days to configure and run the rules. Simultaneously, Valora is ingesting the data and running it through our proprietary process to extract and tag attributes about the individual documents.
1-2 days of run time for applying and performing quality control on the rules. This is the step that often iterates as SMEs discover new elements of the documents or the case.
Final QC using law firm or contract attorneys is optional. If elected, the percent of documents reviewed will drive the estimated time to complete. With Valora’s TAR process, typically <5% of the documents are reviewed manually.
Some of Valora’s clients also utilize contract attorneys to perform quality control after the final set of rules has been applied and the work product has been delivered back to the client. The number of attorneys needed is driven by the deadline as well as by the percentage or number of documents your team believes should be reviewed by humans. With Valora’s TAR process, typically <5% of the documents are reviewed manually.
Document Type (memo, inventory report, patent application, etc.)
Tone of a document (neutral, hostile, informal, formal, personal, etc.)
Actual dates and authors (not relying on metadata)
By tagging documents in this way, Valora essentially obtains a DNA sequence for each document that can later be used for analysis and assessment. Valora’s tag list is a result of more than a decade of document-tagging experience. As a general rule of thumb, if there is some element of information present on a document, Valora’s system can identify and tag it. Interestingly, Valora’s tagging approach works on both scanned paper documents and ESI documents with limited or no metadata, such as attachments and native files. Valora tags all documents in a collection.
Ultimately, an automated TAR process replaces almost all of the doc-by-doc manual labor, which reduces the cost and time associated with document analysis. Furthermore, since Valora is primarily searching the tags when applying client rules, the speed of each iteration isn’t dependent on the size of the actual dataset. Typical results of each iteration are provided back for attorney comment within hours, not days!
Many of our clients use Valora to perform first pass review to determine relevancy and issues as well as to identify potentially privileged documents. Once Valora has delivered the output to your review platform, prioritization of “eyes on” review can be established. There are many ways to do this. Valora can work with your team to design a plan that gets you to your comfort level. Additionally, we can assist you with sampling the dataset using statistically sound and highly defensible methods for testing quality.
The best part is that issue- and accuracy-oriented iterations are free! Valora encourages clients to hypothesize about how they would organize and classify their collection in different ways, and to specifically test out different variables per iteration. There is no additional charge for this type of dataset exploration, nor for the addition or removal of issues and rules over time.
Regarding adding new data, this is easy for Valora. New data installments are incorporated into the existing workflow and RuleSet. Once the new data is ingested and tagged (see above) into PowerHouse, the most recent set of rules is applied to the dataset and the results are delivered to BlackCat and/or to the client. In cases where the new data installment has an effect on the output of the prior installments (as can happen for detection and labeling of NearDuplicates, for example), Valora will supply a fresh overlay of the results for the entire dataset, incorporating the results of the new installment. Valora’s BlackCat interface always reflects the current state of the dataset, and astute observers can actually see the new data installment being incorporated in real-time! New data installments do have an additional per document charge for ingestion and processing, applied only to the new data added to the collection.
Each of the documents in the selected strata is then manually reviewed (audited, really) for accuracy by either Valora’s personnel or the client’s. If any errors are found, they are recorded and fed back into the system for re-iteration: Valora modifies the rules to rectify the error and kicks off a new iteration. This process repeats until no errors are found in the sample set. It typically takes 3-4 tries, which are often combined with iterations for new issues or new data installments.
As with any statistical sample, the number of documents pulled for each stratified sample depends mainly on the desired confidence level regarding the accuracy of the coding, as well as on the acceptable margin of error. The original size of the dataset is factored in to some degree, but its impact on the sample size is minimal. For example, in a document population of 100,000 documents, if you want to be 99% confident in your accuracy with a 2% margin of error, your sample size would be roughly 4,000 documents.
To many, this seems like too small a number, since it is less than 5% of the collection; for those not well-versed in statistical sampling, it can seem impossibly small. Valora recognizes the gulf between statistical sampling and “comfort sampling.” For those who have concerns, Valora suggests that they review the percentage that makes them the most comfortable. Typically, this is 10-20% of the population.
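For readers who want to check the arithmetic, the ~4,000-document figure can be reproduced with the standard Cochran sample-size formula plus a finite-population correction. This is only an illustrative sketch of the textbook calculation, not a description of Valora’s internal method; the function name and parameters are our own.

```python
import math

def sample_size(population, z, margin_of_error, p=0.5):
    """Cochran's sample-size formula with finite-population correction.

    p=0.5 is the most conservative assumption about coding accuracy
    (it maximizes the required sample size).
    """
    # Infinite-population sample size
    n0 = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    # Finite-population correction shrinks the sample slightly
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

# 99% confidence (z ~ 2.576), 2% margin of error, 100,000 documents
print(sample_size(100_000, 2.576, 0.02))  # -> 3983, i.e. "roughly 4,000"
```

Note how weakly the result depends on collection size: doubling the population to 200,000 documents raises the sample only slightly, which is why the sample can stay under 5% of the collection.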