Structured data is data organized in a predefined format, such as the rows and columns of a database or spreadsheet. It is generally created so that computers can easily access, search, analyze, and process it. For people, however, structured data can be voluminous, complex, and difficult to search or use directly, let alone govern. Common forms of enterprise structured data include billing and financial records, employee and HR information, product and knowledge databases, customer or patient records, and many more.
While structured data formats are commonplace, they come with governance headaches to watch out for, especially as the data grows in volume and scope. In fact, per IDC’s annual study, “Structured Databases/Data Management workloads drive the largest share of enterprise IT infrastructure spending, at $6.4 billion of compute and storage hardware infrastructure to support this workload category.” [Source: IDC] Accordingly, here’s a look at some of the biggest governance headaches we’ve found associated with structured data:
Inventory & Classification
Many governance teams do not know exactly what data is housed in their various structured systems. In fact, there is considerable concern over whether they even have a full list of all the structured systems their organizations employ! Research from Forrester and other sources suggests that large organizations often have 20 or more structured data sources, each housing 5 TB of data or more.
This is a staggering amount of “dark” data to attempt to manage. To do so requires 1) finding and cataloguing all of the structured data repositories, 2) understanding in a defensible manner what type of information is stored in each repository and whether or not it should be kept, for how long, who should have access to it, whether any of it is under legal hold, and all the usual Info Gov parameters, 3) having a mechanism to “reach into” each of these structured systems to remove or modify data, as circumstances require, 4) having an ability to document the actions taken, and 5) being able to repeat this process on a regular basis, including on demand when needed.
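As a rough illustration of the first two steps above, the sketch below models a catalogue of repositories and reports which governance attributes are still undocumented for each one. The `Repository` class, its field names, and the sample entries are hypothetical and chosen only for illustration; a real inventory would be populated from discovery tooling rather than typed in by hand.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Repository:
    """One structured system in the inventory (all fields are illustrative)."""
    name: str
    owner: Optional[str] = None
    data_types: list[str] = field(default_factory=list)
    retention_years: Optional[int] = None
    legal_hold: bool = False


def inventory_gaps(repo: Repository) -> list[str]:
    """Return the governance attributes that are still undocumented."""
    gaps = []
    if not repo.owner:
        gaps.append("owner")
    if not repo.data_types:
        gaps.append("data_types")
    if repo.retention_years is None:
        gaps.append("retention_years")
    return gaps


# Hypothetical catalogue: one documented system, one "dark" system.
catalog = [
    Repository("billing_db", owner="finance",
               data_types=["invoices"], retention_years=7),
    Repository("legacy_hr_db"),
]

# Map each repository with gaps to the attributes it is missing.
gap_report = {r.name: inventory_gaps(r) for r in catalog if inventory_gaps(r)}
print(gap_report)
```

Re-running a report like this on a schedule is one simple way to support the fifth step, repeating the process on a regular basis.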
Inventory needs around structured data tend to pose internal-facing governance headaches: significant labor (at times requiring subject-matter expertise), cost, and effort to document what is present, how it is being used (if at all), and what the costs (and limitations) of maintaining it are.
Compliance
Much of the structured data that organizations obtain, utilize, and store must adhere to regulatory requirements, such as those outlined in the CCPA (California Consumer Privacy Act), HIPAA (Health Insurance Portability and Accountability Act), and the GDPR (General Data Protection Regulation). Each of these regulations has strict guidelines on issues such as how data is obtained, utilized, updated, stored, mined, and deleted, as well as how to maintain data privacy throughout the data lifecycle.
Compliance regulations pose governance headaches for organizations because failure to follow them can result in heavy fines, sanctions, whistleblower actions and revocation of business licenses. Organizations may also lose consumer trust for failing to comply, resulting in some reputational damage.
Sadly, many structured systems have no mechanism for lifecycle management of their contents. It is simply not a concept they utilize and there is no mechanism for tagging or tracking data with retention windows or record class designations. Many also lack the ability to put content under legal hold. This tends to promote a “keep everything forever” mentality, which can fly directly in the face of privacy, consumer and patient regulations, while also increasing hosting, management and upgrade costs.
Similarly, most structured systems have no mechanism for tracking data provenance or lineage. The data is simply “added,” with no indication of its source or consent and no indication of which connecting systems, agents, or crawlers may be utilizing it for downstream purposes. Again, where data comes from (including how and why it was obtained) and how it will be used beyond its storage system are fundamentals of compliance auditing.
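To make the missing lifecycle and provenance metadata concrete, here is a minimal sketch of what tagging a record with a record class, retention window, legal hold flag, source, and consent basis could look like. The `GovernedRecord` class, its field names, and the sample values are assumptions made for illustration, not a feature of any particular system, and the retention rule (age in days only) is deliberately simplistic.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class GovernedRecord:
    """A record carrying hypothetical lifecycle and provenance tags."""
    record_id: str
    record_class: str
    created: date
    retention_days: int
    source: Optional[str] = None         # where the data came from
    consent_basis: Optional[str] = None  # why it may be held
    legal_hold: bool = False


def eligible_for_disposition(rec: GovernedRecord, today: date) -> bool:
    """A record may be disposed of only if it is past retention and not on hold."""
    if rec.legal_hold:
        return False
    return today > rec.created + timedelta(days=rec.retention_days)


def missing_provenance(rec: GovernedRecord) -> bool:
    """Flag records whose source or consent basis was never captured."""
    return rec.source is None or rec.consent_basis is None


today = date(2024, 1, 1)
expired = GovernedRecord("r1", "invoice", date(2015, 1, 1), 7 * 365,
                         source="billing_import", consent_basis="contract")
held = GovernedRecord("r2", "invoice", date(2015, 1, 1), 7 * 365,
                      legal_hold=True)

print(eligible_for_disposition(expired, today))  # past retention, no hold
print(eligible_for_disposition(held, today))     # legal hold blocks disposal
print(missing_provenance(held))                  # no source or consent recorded
```

Checking the legal hold first means a hold always overrides an expired retention window, which is the conservative default.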
Quality
Data quality is one of the biggest concerns associated with structured data, as few organizations keep their structured data up to date. Many enter or ingest that data once, at its origin or acquisition, and do not update it as time passes, new events materialize, or new regulations come out. Often, there is no accountability or ownership for structured data accuracy at all. IT owns the systems, and IG owns the records aspect, but whose job is it to ensure the data is accurate and up to date?
Structured data quality further degrades with common issues such as duplicate records, incorrect data entry, and incomplete information. When personnel (and increasingly, their AI proxies) rely on the quality and integrity of their structured data for accurate analysis, reporting, decision-making, LLM development, and the design of efficient business processes, poor quality data can cause an untold number of headaches!
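Two of the issues named above, duplicate records and incomplete information, are simple to detect in principle. The sketch below assumes rows are plain dictionaries and treats a normalized email address as the duplicate key; the required field names and sample rows are hypothetical, and real matching logic is usually far fuzzier than this.

```python
from collections import Counter

# Hypothetical set of fields every customer row is expected to carry.
REQUIRED_FIELDS = ("customer_id", "email", "country")


def normalize_email(value: str) -> str:
    """Trim whitespace and lowercase so trivially different entries match."""
    return value.strip().lower()


def quality_report(rows: list[dict]) -> dict:
    """Report row indexes with missing fields and emails that appear more than once."""
    incomplete = [
        i for i, row in enumerate(rows)
        if any(not row.get(f) for f in REQUIRED_FIELDS)
    ]
    counts = Counter(
        normalize_email(row["email"]) for row in rows if row.get("email")
    )
    duplicates = sorted(email for email, n in counts.items() if n > 1)
    return {"incomplete_rows": incomplete, "duplicate_emails": duplicates}


rows = [
    {"customer_id": "1", "email": "a@example.com", "country": "US"},
    {"customer_id": "2", "email": " A@Example.com ", "country": "US"},
    {"customer_id": "3", "email": "b@example.com", "country": ""},
]
report = quality_report(rows)
print(report)
```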
Standardization
Another big information governance headache associated with structured data involves the collection and storage of data across different departments and systems in larger organizations. Many of these organizations lack a standardized method for data storage and classification, which means that structured data within a single organization could exist in many (and potentially contradictory) formats. This also leads to a lack of standardization in governance policies.
What are the repercussions of this? Well, this creates data silos that can cause severe headaches, including inconsistent or outright negligent governance. The fix here is to standardize data governance policies, including taxonomies, across the organization and to encourage better collaboration between departments, potentially via the IG team serving as the common bond.
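One small piece of that fix, a shared taxonomy, can be pictured as a lookup that maps each department's local labels onto a single set of record classes. The labels and class names below are invented for illustration; in practice the mapping would be maintained by the IG team.

```python
from typing import Optional

# Hypothetical mapping from department-specific labels to standard record classes.
STANDARD_CLASSES = {
    "cust_rec": "customer_record",
    "client file": "customer_record",
    "inv": "invoice",
    "invoice": "invoice",
}


def to_standard_class(label: str) -> Optional[str]:
    """Return the standard class for a local label, or None if it is unmapped."""
    return STANDARD_CLASSES.get(label.strip().lower())


labels = ["Cust_Rec", "client file ", "INV", "misc"]

# Labels with no standard class are candidates for taxonomy review.
unmapped = [label for label in labels if to_standard_class(label) is None]
print(unmapped)
```

Returning None for unknown labels, rather than guessing, keeps unmapped data visible so that it can be reviewed instead of silently misclassified.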
Security
Large volumes of structured data can cause security issues because as data repositories grow, the “attack surface” grows with them, and the risks and impacts of cyber threats increase. The goals here are 1) to make sure that sensitive information is protected, even as the volume grows, and 2) to remove as much ROT (redundant, obsolete, and trivial data) as possible, as soon as it is defensible to do so. While standard security tactics including access control procedures, encryption, multi-factor authentication, and data masking techniques all help, they are generally oriented around preventing the intrusion or penetration of bad actors. They do not help with the nature or disposition of the data itself. For that you need strong classification and adherence to disposition policies, as well as auditable actioning.
In general, governing large volumes of structured data and preventing these headaches takes constant vigilance and purpose-built tools. Stay tuned for our talk at InfoNext on April 29th on the governance headaches associated with Structured Data.