In the first session of Mark’s new blog series, we’ll be covering who Mark is, where he came from, and what to expect from this series.
Creating Structure from Unstructured Data Webinar Recording
Join industry experts Karen Frazier and Sandra Serkes for part two of the 4-part webinar series.
Part 2 in our AutoClassification webinar series focused on the details of creating structure from unstructured data.
In this session, we explored:
- The difference between structured, semi-structured and unstructured data
- How to make sense of large piles of unclassified content, how to generate context, and how to leverage line-of-business SMEs to assure quality classification
- Strategies for choosing a good place to start a content assessment project
- Tips, tricks and pitfalls learned from the field when conducting content assessment projects
- File type challenges, the value of OCR, clustering, and more!
View the PowerPoint featured in this video here.
Why is the General Data Protection Regulation (GDPR) Important?
As of May 25, 2018 the European Union (EU) will alter business requirements for companies that possess personal information pertaining to EU residents. The General Data Protection Regulation (GDPR) applies to any company doing business with customers in the EU, and will have a far-reaching impact, greater than many corporations realize. While a primary purpose of the GDPR is to harmonize data privacy protection regulations across the various EU member nations, the potential business interruption for organizations around the world that will result from these new standards is a serious concern.
The applicable fines set forth in the GDPR for failing to comply with regulations are significant. Corporations that handle EU customer data, regardless of where the company is based, can face up to EUR 20 million (approximately $22.3 Million U.S. Dollars) in fines, or 4% of their total global revenue for the preceding fiscal year, whichever is higher, for GDPR noncompliance. Hence, if a company has customers based in the EU, these GDPR requirements must be taken seriously.
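The "whichever is higher" rule above reduces to a single comparison. The following is a minimal sketch in Python, assuming revenue is expressed in euros; the function and constant names are illustrative and do not come from any official tool.

```python
FIXED_CAP_EUR = 20_000_000    # EUR 20 million, as stated above
REVENUE_SHARE_PERCENT = 4     # 4% of total global revenue for the preceding fiscal year


def max_gdpr_fine_eur(global_revenue_eur: float) -> float:
    """Return the upper bound of the fine: the higher of the fixed cap
    and the percentage share of global revenue."""
    revenue_based = global_revenue_eur * REVENUE_SHARE_PERCENT / 100
    return max(FIXED_CAP_EUR, revenue_based)


# Revenue of EUR 100 million: 4% is EUR 4 million, so the fixed cap applies.
print(max_gdpr_fine_eur(100_000_000))
# Revenue of EUR 2 billion: 4% is EUR 80 million, which exceeds the fixed cap.
print(max_gdpr_fine_eur(2_000_000_000))
```

The revenue-based figure overtakes the fixed cap once global revenue passes EUR 500 million, which is why large enterprises face the greatest exposure.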
The GDPR data protection protocols must be in place for “Personally Identifiable Information” (PII) of all living EU citizens, regardless of where that information is sent, processed, or stored. In addition, the company possessing such PII data must have a process in place to verify and prove that valid protections exist. Corporations are not exempt from the GDPR simply because they don’t have offices in the EU, or don’t process data there. The EU’s concept of data privacy differs greatly from that of the United States, but U.S.-based corporations doing business with EU citizens will still have to adhere to the strict requirements of the GDPR. The impact of the GDPR reaches nearly all companies, including many that are seemingly unaware of its regulations. In certain specific circumstances, companies must create a position of “Data Protection Officer” (DPO), who will address GDPR compliance. Hence, the costs to prepare for compliance will include requirements for trained personnel and financial investment in technology.
Having a means to comply with the stringent requirements of the GDPR is no simple task. Planning is required to comply, which is why the regulations are not meant to take effect until May 2018. Some of the complex issues that need to be addressed for GDPR compliance include:
- How electronic information is stored, transferred, accessed, and secured.
- Document retention schedules, and how they are enforced.
- Written proof of compliance.
Creating an effective compliance strategy will be costly, and many companies have not set aside money for it in their projected annual budgets, which means the required funds will have to come from emergency or other contingency budgets.
Those corporations who have already begun to address their information management capabilities in general will have an advantage in complying with the GDPR requirements. Many of the key elements of a corporate “Information Governance” (IG) plan are related to the issues of concern for GDPR compliance. The ability to manage information, and address data governance, corporate risk, and regulatory compliance, are existing concerns for corporations, notwithstanding the GDPR. Existing technology for cybersecurity and “Data Loss Prevention” (DLP) can also be utilized to help prepare for the GDPR. Moreover, search and retrieval technology and techniques used for eDiscovery purposes also serve as a means to assist in managing information. The illustration below from Susan Bennett, of Sibenco, provides useful insight into aspects of information governance, many of which help address the specific needs of the GDPR.
(Source: “What is Information Governance and How Does it Differ from Data Governance,” Sibenco, 2017, Information Governance vs Data Governance)
Handling sensitive information, such as PII, is a challenge that pertains to both IG and GDPR compliance. Restrictions imposed on the transfer of PII by the GDPR can be addressed by the use of technology. Identification of sensitive content within a business record, and the ability to redact portions of content, can impact whether that specific file is transferrable under the rules set forth by the GDPR. Having a means in place to identify the content of data will be essential for GDPR compliance. In addition to IG protocols, “Knowledge Management” (KM) practices will also enhance a corporation’s ability to comply with the GDPR. The ability to garner business intelligence about the information the corporation possesses will serve as a significant advantage for GDPR compliance. Knowledge of not only files in possession and control of a business, but also about the content level within those files, will be a prerequisite for doing business with EU customers.
Addressing GDPR Compliance
Since the GDPR specifically requires the ability to prove data protections are in place, documentation of existing privacy safeguards is essential. All documentation and processes must clearly address issues such as: where is the data; what type of data exists; who has access to the data; what is in the data content; how is data stored; how is data transferred; how is newly created data incorporated? Without answers to these questions, GDPR compliance is impossible.
Below are suggestions for IG best practices which can be specifically implemented to address the requirements of the GDPR:
Data Mapping. If a DPO does not know the location and/or the contents of corporate data, it is impossible to fully protect that information per the GDPR requirements. The need for data mapping is rather obvious since the risk of non-compliance is too high without the knowledge of location of all the corporate sources of data. If the data map for the corporation is incomplete or inadequate, a discussion with the I.T. stakeholders in the company should take place to update this information. Collaboration between I.T., management, and the corporate legal department, in order to create a comprehensive data management plan is a vital step toward GDPR compliance. Any corporate data stored by third-party providers, including cloud services vendors, or data archival companies, requires attention. The data in the possession of third-party providers is also subject to the GDPR regulations applicable to the corporation, including information retained by outside counsel law firms. If data in possession or control of the corporation contains PII of EU citizens, GDPR compliance requires steps to protect such information.
Understanding File Contents. Many corporations seem ill prepared for the requirement to know the contents of their internal data. Knowing where data resides is only part of the equation. A corporation must also know what the data is and contains. For example, are the files legally binding in nature, such as contracts and agreements? Do the files contain any sensitive data, such as PII or PHI?
Consent. A key requirement of the GDPR is the need to obtain specific consent from an individual before obtaining, storing or utilizing their personal data. The individual must provide a clear affirmative action or statement giving the corporation permission to process their data. In addition, the GDPR establishes that the individual has a “Right to be Forgotten,” and can request their personal information be explicitly removed from use. Without some other legal reason to process an individual’s information, the corporation must respect a request to delete data without undue delay.
Information Request. On a similar note, an individual has a right to request access to the personal information being gathered and stored about them. The individual may request information from a company about any of their personal data, including who has access to their information, how the data is accessed, where it is being accessed, and the purpose for which it is being accessed. Furthermore, an individual can also seek corrections to their personal data if the EU resident feels the information is inaccurate. The individual may also object to the use of their data for profiling by the corporation.
Retention Schedules. Enforcing corporate document retention schedules, while also maintaining proper litigation hold protocols, is already a challenge for many corporations. There are inherent risks associated with maintaining information when there is no legal obligation to retain possession of that data. An effective means of dispensing with specific information that is outside of an applicable document retention schedule is an important component for both IG and GDPR compliance.
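The interplay between retention schedules and litigation holds described above can be expressed as a simple check. This is a hedged sketch only: the `Record` fields and the `eligible_for_disposal` helper are hypothetical names chosen for illustration, and a real schedule would carry many more conditions.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Record:
    name: str
    created: date
    retention_days: int          # hypothetical retention period for this record type
    on_legal_hold: bool = False  # True while a litigation hold applies


def eligible_for_disposal(record: Record, today: date) -> bool:
    """A record may be disposed of only when its retention period has
    lapsed AND no litigation hold applies to it."""
    expires = record.created + timedelta(days=record.retention_days)
    return today > expires and not record.on_legal_hold


memo = Record("old memo", date(2010, 1, 1), retention_days=365)
held = Record("disputed contract", date(2010, 1, 1), retention_days=365, on_legal_hold=True)
today = date(2017, 1, 1)
print(eligible_for_disposal(memo, today))  # retention lapsed, no hold
print(eligible_for_disposal(held, today))  # retention lapsed, but hold applies
```

Keeping the hold check inside the same function as the expiry check means a record under hold can never be flagged for deletion by the schedule alone.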
Security Breaches. An overarching component of the GDPR is the need to provide cybersecurity protections to prevent data breaches, as well as express provisions regarding notifications of data breaches to both the supervisory authority and to individuals whose information has been exposed. Hence, corporations must not only be aware when a breach has occurred but also must have a means to notify those impacted by the breach of what specifically was exposed.
Data Transfer. The GDPR places explicit restrictions on transfers of personal information. Corporations must have an enforceable plan to prevent unauthorized data transfers, and the GDPR puts forth stringent requirements regarding data transfers to locations outside of the EU. Determining whether a data transfer is permissible under the rules of the GDPR will require answers to a series of queries about the content of the information. If PII, or otherwise sensitive information, exists in the data at issue, additional restrictions will be applied, possibly revoking permission for the transfer of that information. An entire file might be improper to transfer under certain circumstances, thereby prohibiting persons outside of the EU from viewing such information. In other instances, a portion of the content of a file might block the permissible transfer; however, if actions are taken to redact the specific content in question, the remainder of the file might be permissible for a data transfer.
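The idea that redacting specific content can make the remainder of a file transferable can be sketched as follows. Everything here is an assumption made for illustration: the two regular expressions catch only email addresses and phone-like digit runs, and the policy of "redact when leaving the EU" is a deliberately simplified stand-in for the actual legal analysis.

```python
import re

# Illustrative patterns only; real PII detection needs far broader coverage.
PII_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),  # email addresses
    re.compile(r"\+?\d[\d\s-]{7,}\d"),            # phone-like digit runs
]


def contains_pii(text: str) -> bool:
    """True if any illustrative PII pattern matches the text."""
    return any(pattern.search(text) for pattern in PII_PATTERNS)


def redact(text: str, mask: str = "[REDACTED]") -> str:
    """Replace every match of the PII patterns with a mask."""
    for pattern in PII_PATTERNS:
        text = pattern.sub(mask, text)
    return text


def prepare_for_transfer(text: str, destination_in_eu: bool) -> str:
    """Hypothetical policy: content leaving the EU is redacted first
    whenever it contains PII; otherwise it is passed through unchanged."""
    if destination_in_eu or not contains_pii(text):
        return text
    return redact(text)


sample = "Contact anna@example.eu or +49 30 1234567."
print(prepare_for_transfer(sample, destination_in_eu=False))
```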
What Is Auto-Classification and How Does It Assist with GDPR Compliance?
It is clear that properly managing all data in a corporation’s possession to comply with GDPR regulations is an extremely onerous task for most businesses. The GDPR requirements necessarily create an increased reliance upon automation in order to properly manage the lifecycle of corporate information.
The explosion in the volume of data in the possession of corporations has already led to advances in various technologies that assist in managing information. Corporate best practices for IG, KM, E-Discovery, compliance and cybersecurity all provide guidance for the use of technology that helps address GDPR regulations. One particular automation technology that will serve as a tremendous asset to corporations struggling with GDPR mandates is referred to as “Auto-Classification.”
Auto-Classification software data mines information at the content level, and then categorizes files based on the information’s substance. This technique is already being utilized by many corporations as part of their IG strategy. Auto-Classification’s ability to group information by category or by specific characteristics will prove useful for GDPR compliance. Similarly, Auto-Classification’s ability to detect the presence of PII and other sensitive content will likely become a best practice when it comes to establishing GDPR protections.
One impediment to complying with the GDPR is the vast amount of “Dark Data” currently residing in most corporate networks. Dark data is information existing on shared file servers, or in employees’ email inboxes, whose content or purpose is largely unknown. Auto-Classification helps manage unwieldy unknown information and sheds light on the contents and origins of such data. Corporations utilizing document management systems (DMSs) or enterprise content management systems (ECMs) rely on Auto-Classification to categorize files outside of the document/content management platform, subsequently placing that information into folder-level taxonomies within their systems.
Auto-Classification software uses both pattern-matching algorithms and artificial intelligence to detect file contents and attributes such as personal information, authorship and origin, type or format of document, and expected retention period. In addition, Auto-Classification technologies are configured to follow a set of customized rules regarding file disposition. For example, a rules-based Auto-Classification system will enforce a specific document’s retention schedule, and then place the file into the proper folder taxonomy structure. Auto-Classification technology specifically meets the GDPR requirements to have a system in place that can detect what information it has, where it lives and how it will be handled under differing circumstances.
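The pattern-matching half of this description can be illustrated with a small sketch. It is not how any particular product works: the document types, trigger phrases, and retention periods below are hypothetical, the PII check looks only for email addresses, and the artificial intelligence component mentioned above is not modeled at all.

```python
import re
from dataclasses import dataclass
from typing import Optional

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

# Hypothetical document types: trigger phrases and retention period in years.
DOC_TYPES = {
    "contract": (("this agreement", "hereinafter"), 10),
    "invoice": (("invoice number", "amount due"), 7),
}


@dataclass
class Classification:
    doc_type: str = "unknown"
    retention_years: Optional[int] = None
    contains_pii: bool = False


def classify(text: str) -> Classification:
    """Assign a document type, retention period, and PII flag by
    simple phrase and pattern matching."""
    lowered = text.lower()
    result = Classification(contains_pii=bool(EMAIL.search(text)))
    for doc_type, (phrases, years) in DOC_TYPES.items():
        if any(phrase in lowered for phrase in phrases):
            result.doc_type = doc_type
            result.retention_years = years
            break  # first matching type wins in this sketch
    return result


print(classify("This Agreement is made with jan@example.com"))
```

The resulting `Classification` carries the three attributes the paragraph names (type, expected retention period, presence of personal information), which is the metadata a downstream rule set would act on.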
With a proper Rules Engine, sensitive information is protected via individual security level restrictions, including limitations based on the geographical location of the user attempting access. Rules are also used to block improper information transfer to locations outside the EU. Furthermore, rules are used to trigger certain events, such as an expiration date associated with certain data which would make such information eligible for deletion.
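A rules engine of this kind can be pictured as a list of named predicates evaluated against each access or transfer request. The sketch below is an assumption-laden illustration: the `Request` fields, the rule names, the sensitivity labels, and the deliberately incomplete set of EU country codes are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Request:
    user_country: str   # country code of the user requesting access
    sensitivity: str    # hypothetical labels such as "public" or "restricted"
    contains_pii: bool


# Hypothetical and deliberately incomplete set of EU country codes.
EU_COUNTRIES = {"DE", "FR", "IE", "NL"}

# Each rule is a (name, predicate) pair; the predicate returns True
# when the rule blocks the request.
RULES: List[Tuple[str, Callable[[Request], bool]]] = [
    ("pii-outside-eu",
     lambda r: r.contains_pii and r.user_country not in EU_COUNTRIES),
    ("restricted-outside-eu",
     lambda r: r.sensitivity == "restricted" and r.user_country not in EU_COUNTRIES),
]


def blocking_rules(request: Request) -> List[str]:
    """Return the names of all rules that block the request.
    An empty list means the request is allowed."""
    return [name for name, blocks in RULES if blocks(request)]


print(blocking_rules(Request("US", "public", contains_pii=True)))
print(blocking_rules(Request("DE", "restricted", contains_pii=True)))
```

Returning the names of the blocking rules, rather than a bare yes or no, gives an audit trail that supports the documentation requirement discussed earlier.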
Conclusion: Advantages Of Using Automation For GDPR Compliance
While compliance with GDPR regulations will be no small task for most enterprises, the use of automation makes the task more manageable. Though not every organization is as proactive as they should be, there is still time for those companies to prepare for the GDPR regulations, and avoid the imposition of fines. Enterprises that have been more proactive in automating their IG strategies are in a better position to comply with the GDPR than others. Companies most likely to avoid fines are those with a DPO in place, who can document the automated steps taken to provide the required protections to personal sensitive data. Similarly, corporations with established IT security protocols and passed audits will have an easier path toward GDPR compliance.
Return on investment is often a key metric required by corporations before they approve expenditure of funds. While companies may have been reticent about investing in IG technology previously, the GDPR requirements serve as a stark turning point for that strategy. The potential for business interruption caused by the GDPR, not to mention its stringent fines for non-compliance, proves out any return on investment calculation several times over. Furthermore, the benefits derived from improved information management techniques assist not only GDPR compliance, but also corporate efficiency and knowledge management capabilities.
Certainly technology is creating some unique challenges for business. Protecting the privacy of individuals is increasingly difficult as the volume of personal data in possession of corporations continues to explode. However, through intelligent use of a proper combination of people, process and technology, the challenges of GDPR compliance can be adequately met. Conversely, waiting for the deadline of May, 2018 to approach without taking steps to address that challenge could prove very costly.
Lack of preparedness for the GDPR is an alarming concern. According to a Symantec survey in 2016, 91% of 900 business IT decision makers polled in the U.K., France, and Germany have serious concerns about their ability to be compliant by May 2018. The attention paid to the looming threat from the GDPR’s effective date of May 25, 2018, will only grow as that date approaches.
Valora in the News
Webinars, Events, and Appearances
The long awaited book by numerous industry luminaries features Sandra Serkes’ chapter on “Predictive Analytics for Information Governance.” Order your copy here.
Featuring Sandra Serkes and Nick Inglis, Information Coalition President
Valora Technologies CEO
Valora Product Updates
- Rules Engine now supports Data Driven Rules, which automatically generate rules from data lists, retention policies, and corporate databases and ERP systems.
- New tunable Confidence Level Rankings available for all metadata extraction and rules dispositions in PowerHouse. Tune processing to your level of comfort!
- Full integration with Tesseract OCR from Google, versions 3.0 & higher, for better text recognition and search results.
- AutoRedaction now supports full, partial, transparent and blackbar redactions, with full or black & white TIFF images.
- Point of View Pivoting toggles the center of reference across any important attribute, such as Document Count, Employee Count, Incident or Location.
- Automatic link support from HR and ERP systems, such as UltiPro, takes users directly from a database record to the supporting documentation behind it.
- Custom maps display geographic data across a wide array of visualization options, from global to street-level, with many custom options.
- Global editing lets users make single metadata changes or additions and have them apply everywhere applicable.
“DLP” (Data Loss Prevention, also referred to as Data Leak Prevention), is a term referring to the use of technology to protect confidential data from being shared with unauthorized parties. DLP systems monitor data in use, at rest, and in motion, seeking to prevent data breaches in real time. DLP technology relies on algorithms that determine which data transfers to block.
Why is DLP important? The value of information cannot be overstated, and the risks associated with the exposure of sensitive data give rise to the need for improvement in DLP practices. Due to growing concerns regarding issues such as corporate espionage, cybersecurity data breaches, and changes in data privacy obligations, DLP technologies are a required business component for corporations and law firms. DLP protocols ensure that the flow of information within and outside of a corporate enterprise, or a law firm, follows an established path.
An element of DLP workflows requires interactions with other systems, such as a DMS (Document Management System), and/or a CMS (Content Management System). DLP protection relies on some other intelligence about the data in order to determine permissible transfers of information. DMS protocols established to determine where the files reside are often used to share information with the DLP technology; however, this does not generally address the specific content within each file. CMS technology can be used to identify content within files that requires the DLP to block the transfer of that information. Determining “What” the file’s contents are is the first step in crafting a plan to protect such material. Files might require varying levels of security based on their content. Transfer of certain information outside the corporation might never be permitted by the DLP, while other transfers might be permissible but only under some limited set of circumstances.
Many corporations and law firms rely on their users to create and assign “tags” to newly created files, and the sensitivity level of an individual file, or portions thereof, is often determined from that user classification. However, manual user-generated classification is often inaccurate, and it is not an efficient process since it reduces employee productivity. Using technology to auto-classify documents helps improve the DLP technology’s performance. Auto-classification is programmable to identify certain terms or phrases within a document that would trigger the DLP protections.
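Term-based triggering of this kind can be sketched in a few lines. The trigger phrases and tag names below are hypothetical examples, not a vendor's actual rule set, and `dlp_should_block` is an illustrative stand-in for whatever decision a real DLP system makes from the tags.

```python
import re
from typing import Set

# Hypothetical trigger phrases mapped to the tag each one assigns.
TRIGGERS = {
    "attorney-client privilege": "privileged",
    "confidential": "confidential",
    "social security number": "pii",
}


def auto_tag(text: str) -> Set[str]:
    """Return the set of tags whose trigger phrase appears in the text."""
    tags = set()
    for phrase, tag in TRIGGERS.items():
        # \b keeps a phrase from matching inside a longer word.
        if re.search(rf"\b{re.escape(phrase)}\b", text, re.IGNORECASE):
            tags.add(tag)
    return tags


def dlp_should_block(text: str) -> bool:
    """In this sketch, any assigned tag triggers the DLP protection."""
    return bool(auto_tag(text))


print(auto_tag("CONFIDENTIAL: covered by attorney-client privilege"))
print(dlp_should_block("Lunch menu for Friday"))
```

Because the tags are derived from the content itself rather than typed in by employees, the same document always receives the same classification.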
One of the shortcomings of DLP protocols is that there is a high number of “false-positive” incidents. These false-positive occurrences require I.T. involvement, and also delay transmission of information, which can frustrate employees, thereby reducing productivity. By having files classified properly in a DMS, and using proper auto-classification technology to organize information, the frequency of false-positives that trigger the DLP protections is substantially reduced.
DLP systems are only effective if they have accurate knowledge about the data they are trying to protect. Through the use of effective information governance and file classification practices, the performance of a DLP system can be dramatically improved. In addition, technology can be customized to add further enhancement to DLP practices. Certain files might be permissible for transfer if specific material is redacted. Technology can be employed to locate and auto-redact sensitive content. Through the use of automated redaction, DLP systems will permit transfers of data, while still preventing unauthorized sharing of specific sensitive information. Auto-redaction technology can reduce the burden that DLP systems impose on I.T. personnel by reducing false-positives, and also increase business productivity by allowing transfers of content that employees are authorized to share.
Valora’s unique proprietary technology, “PowerHouse”, serves to fill many of the needs that DLP systems require to function effectively. PowerHouse identifies the content of the data, and provides intelligence about each individual file or point of data. Valora’s technology is a “Rules-Based” system that can be custom configured to program the specific types of information that a corporation or law firm deems to require DLP protection. The Rules within PowerHouse include algorithms and elements of pattern matching recognition, and are used to auto-classify information, assigning categorization tags to files. The information classified by PowerHouse can be integrated into any DMS, and can also work in conjunction with other CMS software.
Valora’s PowerHouse works not only at the file level, but also at the content level within each document. Hence, the DLP can rely upon the auto-classification provided by PowerHouse to determine what information requires extra levels of protection. Using PowerHouse not only increases the efficiency of the DLP, but also enhances the performance of any DMS. In addition, PowerHouse increases business productivity by increasing the efficiency of data transfers. PowerHouse reduces the number of false-positive incidents attributable to the DLP. In addition, Valora’s technology not only seamlessly enables the user to determine “What” the information in their possession is, but also helps enforce “How” that content needs to be handled by the DLP.
Moreover, PowerHouse enables the DLP to determine which individual users might have access to view transferred data, by enforcing established security permission levels. Hence, a DLP might permit internal transfer of information between certain individuals, but restrict others from having access to files, or specific portions of any document.
Since most DLP technology does not have the ability to determine what the contents of a file are, relying on PowerHouse to serve this function is an effective automated solution. In addition, PowerHouse is able to classify both structured and unstructured data. The classification provided by PowerHouse remains with each point of data, while it is at rest, in use or in motion.
Should you wish to learn more about Valora Technologies, and our proprietary solutions such as PowerHouse, please feel free to register at Valora’s website for our free resource information.
Guest Blogger: Joe Bartolo, J.D.
By Greg Buckles
It seems that I was not the only one interested in how Valora’s PowerHouse could address the sprawling corporate digital landfills. DTI has made a minority investment in Valora to provide the functionality to their Information Governance clients. In the M&A world, a minority investment from what is essentially a giant channel partner is ‘neither fish nor fowl,’ to use a 17th Century idiom for something that is not easily categorized. We understand acquisitions, but Valora remains an independent organization from DTI and maintains its woman-owned status. The real question is how this will affect the nascent partner channel that Valora had just started to cultivate. Will DTI competitors be willing to use PowerHouse when they imagine that DTI can undercut them on the license fees? Will Valora essentially become a captive technology that is resold exclusively as a DTI service? This is essentially what happened to Patterns when FTI acquired Attenex. I don’t think that DTI’s investment comes at the price of Valora’s independent status, but I do think that the Valora team will have to work a bit harder to reassure potential channel partners or non-DTI customers of that fact. This investment confirms my new webinar slide (below) that shows DTI having the largest number of acquisitions/investments in the eDiscovery market. Early M&A Impact poll and interview results are starting to paint a picture of the concerns facing consumers when their provider is acquired or takes a large investment that changes their Go To Market strategy. So take my poll to see the results and join the ILTA webinar by Duane Lites and myself on April 12th to hear our take on how the eDiscovery market is consolidating and how you can mitigate the risks.
Stay skeptical my friends!
Greg Buckles wants your feedback, questions or project inquiries at Greg@eDJGroupInc.com. He solves problems and creates eDiscovery solutions for enterprise and law firm clients. His active research topics include analytics, mobile device discovery, the discovery impact of the cloud, Microsoft’s Office 365/2013 eDiscovery Center and multi-matter discovery. Recent consulting engagements include managing preservation during enterprise migrations, legacy tape eliminations, retention enablement and many more.
Greg’s blog perspectives are personal opinions and should not be interpreted as a professional judgment. Greg is no longer a journalist, and all perspectives are based on the best public information. Blog content is neither approved nor reviewed by any providers prior to being posted.
Global eDiscovery and Legal Services Market Leader Invests in AutoClassification Information Governance Pioneer
Atlanta, GA and Bedford, MA – March 29, 2017 – DTI, a global legal process outsourcing (LPO) company providing eDiscovery, management services, litigation support, and court reporting, and Valora Technologies, Inc., the leading innovator in AutoClassification, Predictive Analytics and Document Data Mining Technologies for Information Governance, eDiscovery, and Records Management, today announced that DTI has successfully completed a strategic, minority investment in Valora Technologies, Inc. The investment marks the beginning of the coming-of-age of AutoClassification and the clear commitment of DTI to invest in leading-edge Information Governance solutions. Learn more.