The average organization has 240 TB of data under management at any given time[1]. For large organizations (10,000 employees or more), that number balloons to 150 PB, or 15 TB per head[2]! In fact, our collective data stores increased by 42% in just the last 18 months!
There is no question that much of this enterprise information is of questionable value, particularly since significant portions of it are old, unused, or duplicative, what we commonly call “ROT[3]” in the information governance space.
In his famous, timeless joke, Steven Wright said, “It’s a small world, but I wouldn’t want to paint it.” In other words, these data volumes defy manual management techniques. We simply are not keeping up, and the sheer volume of this data is overwhelming us.
In Valora’s 25 years of operation, we’ve found most organizations have ROT levels between 40% and 50% of their total data stored. For a large organization holding 150 PB, that’s 60 to 75 PB of ROT, if you’ve been keeping track here!
So, what is the true impact of having so much data, so much ROT, in our companies? Is it really so bad or simply a nuisance? What are the real costs of having so much ROT?
There are three main impact areas of having excessive ROT floating around: Costs, Risks, and Efficiency.
Costs
Naturally, there are hard costs and soft costs to managing so much enterprise data. Let’s start with the pure, hard costs of data storage. Whether your data lives entirely on-prem, fully in the cloud, or in some hybrid of the two, you are likely facing between $10 million and $50 million in storage costs annually[4].
With close to half of this data being useless ROT, that represents the potential for $5 million to $25 million in savings, if only this data could be catalogued, assessed and removed – what we call data/ROT remediation.
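The storage arithmetic above can be sketched in a few lines. This is a minimal illustration, not a pricing model: the function names are made up for this example, the per-PB rates come from footnote [4], and the 14 PB volume is a hypothetical figure chosen only because the footnoted rates land close to the quoted $10 million to $50 million range at that size. Substitute your own volume and rates.

```python
def annual_storage_cost(volume_pb: float, rate_per_pb_month: float) -> float:
    """Annual storage cost in dollars for a volume (PB) at a monthly per-PB rate."""
    return volume_pb * rate_per_pb_month * 12


def rot_savings(annual_cost: float, rot_fraction: float) -> float:
    """Share of the annual cost attributable to ROT (rot_fraction between 0 and 1)."""
    if not 0.0 <= rot_fraction <= 1.0:
        raise ValueError("rot_fraction must be between 0 and 1")
    return annual_cost * rot_fraction


if __name__ == "__main__":
    volume_pb = 14  # hypothetical volume, not a figure from the article
    for rate in (60_000, 300_000):  # low and high all-in rates from footnote [4]
        cost = annual_storage_cost(volume_pb, rate)
        saved = rot_savings(cost, 0.5)  # assumes ROT is half of stored data
        print(f"${rate:,}/PB/mo: annual cost ${cost:,.0f}, ROT share ${saved:,.0f}")
```

The savings figure assumes storage cost scales linearly with volume; tiered pricing or fixed contracts would change the result.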
There are two kinds of soft costs to outline here: those that involve regular, or routine, reasons to work through enterprise data, and those that involve single, more event-driven reasons to go through enterprise data. Yet another DSAR or FOIA request is a good example of routine data review, and a data breach or litigation request is an example of a more event-driven effort. To best evaluate soft costs, calculate the number of person-hours needed to perform the tasks, and multiply by the fully loaded costs of those people. If the time delay in performing those tasks carries any additional cost, then add that in as well. As a general rule, routine tasks typically cost an organization $400,000 to $1,000,000 per year and event-driven tasks typically cost $500,000 to $1,500,000 per year, depending on their severity.
With ROT comprising close to half of most enterprises’ data, there is a clear and strong ROI to be able to remove close to half of these costs annually!
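The soft-cost calculation described above (person-hours times fully loaded cost, plus any cost of delay) can be sketched as follows. Everything here is an assumption for illustration: the `ReviewTask` class, its field names, and the sample hours, rates and request counts are hypothetical, and the ROT share assumes review effort scales directly with data volume.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ReviewTask:
    """One kind of data-review work, routine or event-driven (hypothetical model)."""
    name: str
    person_hours: float          # hours needed per occurrence
    loaded_hourly_rate: float    # fully loaded cost per person-hour, in dollars
    delay_cost: float = 0.0      # extra cost of delay per occurrence, if any
    occurrences_per_year: int = 1

    def annual_cost(self) -> float:
        per_occurrence = self.person_hours * self.loaded_hourly_rate + self.delay_cost
        return per_occurrence * self.occurrences_per_year


def total_soft_cost(tasks: Iterable[ReviewTask]) -> float:
    """Sum the annual cost of all review tasks."""
    return sum(task.annual_cost() for task in tasks)


def cost_attributable_to_rot(total: float, rot_fraction: float) -> float:
    """Portion of soft cost spent wading through ROT, assuming effort scales with volume."""
    return total * rot_fraction


if __name__ == "__main__":
    tasks = [
        # Routine: illustrative DSAR workload, numbers are made up
        ReviewTask("DSAR responses", person_hours=40, loaded_hourly_rate=120,
                   occurrences_per_year=100),
        # Event-driven: illustrative breach review, numbers are made up
        ReviewTask("Breach review", person_hours=4000, loaded_hourly_rate=150,
                   delay_cost=100_000),
    ]
    total = total_soft_cost(tasks)
    print(f"Total soft cost: ${total:,.0f}")
    print(f"Attributable to ROT at 50%: ${cost_attributable_to_rot(total, 0.5):,.0f}")
```

Keeping routine and event-driven work as separate entries makes it easy to see which one dominates in a given year.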
Risks
While there are multiple risks to storing useless data alongside and intermingled with the valuable data, three big areas stand out for their scale and impact: Compliance Risks, Security Risks and Operational Risks.
Compliance risks are those that arise due to regulatory or legal/litigation activity. The more data your organization holds, the greater the burden to assess it for litigation document requests and productions. This directly results in higher eDiscovery costs for document review, data hosting and production.
Furthermore, the risks of non-compliance with regulatory requirements for proper data collection, use and storage can be even greater, as there are costs related to both the holding of the data itself and the penalties for keeping data you should not have!
And finally, for those organizations who must routinely respond to regulatory reporting requests, such as pharmaceutical GxP and HIPAA requirements, having excess ROT complicates your efforts to produce the most accurate and appropriate data on a regular basis.
Security Risks of maintaining excess data “bloat” (i.e., ROT) are obvious. Simply stated, the more data you possess, the larger the attack surface for bad actors. More ROT = larger surface. By identifying and removing ROT, you shrink the playing field for breaches and other malfeasance. Furthermore, identifying compromised data after a cyberattack is a much harder and costlier endeavor with the ROT still mixed in.
An interesting, and often overlooked, security risk concerns vendor risk. Are your third-party vendors retaining only necessary data and complying with your retention policies? How are they handling their own ROT, as well as yours?
Efficiency
Excess, useless data hampers everyday business activities and stymies people trying to do their jobs. So much so that people will take steps to actively circumvent the status quo, including policies and requirements, just to perform their work. When it takes people 2-3 times as long to locate what they need (often due to over-preservation of ROT), they will either abandon the effort or re-create the content from scratch, both of which are costly (and potentially risky) outcomes.
One of the biggest obstacles to operational efficiency is ROT. Combatting ROT takes more than simply putting a ticking time window on files. Personnel accustomed to using their data their way will simply circumvent the tools and policies in place. This leads to an openly unsanctioned data “black market.” Examples include copying email files out into PSTs so that emails can be kept past a 90-day or 120-day auto-deletion, and copying folders and files to desktops and OneDrive folders to make “personal backup copies.”
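The “personal backup copies” described above are the redundant part of ROT, and exact copies are the easiest kind to surface automatically. The sketch below is a simplified illustration and not a description of any particular product: the function names are hypothetical, duplicates are detected by SHA-256 content hash (so near-duplicates are missed), and last-modified time is used as a rough proxy for “obsolete,” which will not suit every retention policy.

```python
import hashlib
import time
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Optional


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> List[List[Path]]:
    """Group files under root that have byte-identical content."""
    by_digest: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]


def find_stale(root: Path, max_age_days: float, now: Optional[float] = None) -> List[Path]:
    """List files under root not modified within the last max_age_days days."""
    reference = time.time() if now is None else now
    cutoff = reference - max_age_days * 86_400  # seconds per day
    return [p for p in sorted(root.rglob("*"))
            if p.is_file() and p.stat().st_mtime < cutoff]


if __name__ == "__main__":
    root = Path(".")  # hypothetical scan location; point this at a real share
    for group in find_duplicates(root):
        print("Duplicate set:", ", ".join(str(p) for p in group))
    for path in find_stale(root, max_age_days=365):
        print("Not modified in a year:", path)
```

A scan like this only catalogues candidates; deciding what can actually be removed still depends on retention policy and legal holds.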
Asking people to manually manage their content is akin to asking them to walk to work or wash their clothes by hand. It’s simply too overwhelming and taxing to bother with. The results of that kind of misguidance are reduced efficiency, higher costs and unnecessary risks. Join us for an in-depth webinar on automated ROT remediation. Please visit this page for the replay.
[1] Rubrik Zero Labs, State of Data Security Report, November, 2023
[2] Hitachi Vantara, State of Data Infrastructure Global Report, 2024
[3] ROT stands for Redundant, Obsolete or Trivial information.
[4] Assumes ~ $60,000-$300,000/PB/mo all-in (including redundancy, archival/cold storage, active use, read-write operations, and so on).