Eliminate Shadow and Stale Data

Introduction to Stale Data

Michael Ness
Security Researcher
July 25, 2023

The first question to ask is: what is stale data, and why does it matter? Stale data is data that is rarely touched by anyone or anything, and it is a natural consequence of data sprawl. Sensitive stale data raises an obvious question: why does this still exist? If sensitive data is rarely used, does it need to be there, and what produced it? You can only answer these questions once you know the data is stale in the first place, and answering them helps you further protect your organization. Non-sensitive stale data poses a different, cost-related problem: if you are paying to host this data and it contains no critical information, why keep it? Identifying this data, where it came from, and whether it is still required can save your organization money. The rest of this blog post covers how we identify stale data here at Open Raven and alert customers about it for investigation.

Detecting Stale Data

To detect stale data, we first look at the activity of each data store to identify which stores are stale, before looking at the data itself. This analysis is performed within the Open Raven Data Security Platform, which maintains an index of cloud-based data stores. We measure the activity of these data stores each day and calculate a staleness score. The score ranges from 0 to 100: the more stale an asset, the higher its score. We also provide context on what goes into the score, such as read and write activity. The score is normalized across the range of assets so that we find the assets that are truly stale relative to each individual environment. Score context is currently communicated through rules and policies. Using the policy engine, users can query data stores by both sensitive data findings and staleness scores to find assets of interest. For example, a policy can combine a personal data finding with a stale data store to trigger an alert.
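To make the idea of a normalized score concrete, here is a minimal sketch in Python. It is not Open Raven's actual formula, which this post does not describe. It assumes that activity is simply reads plus writes over a measurement window, and that normalization is a min-max rescale across the stores in one environment. The names StoreActivity and staleness_scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StoreActivity:
    """Hypothetical record of one data store's activity in a time window."""
    name: str
    reads: int   # read operations observed in the window
    writes: int  # write operations observed in the window


def staleness_scores(stores):
    """Return {store name: score}, where 0 is the most active store
    and 100 is the least active (most stale) store in this environment.

    Assumption: activity = reads + writes, rescaled with min-max
    normalization. A real scoring system may weight these differently.
    """
    if not stores:
        return {}
    activity = {s.name: s.reads + s.writes for s in stores}
    lo, hi = min(activity.values()), max(activity.values())
    if hi == lo:
        # Every store has identical activity, so there is nothing to rank.
        # Treat an environment with zero activity as fully stale.
        flat = 100.0 if hi == 0 else 0.0
        return {name: flat for name in activity}
    return {
        name: round(100 * (hi - value) / (hi - lo), 1)
        for name, value in activity.items()
    }


example_stores = [
    StoreActivity("orders-db", reads=1000, writes=200),
    StoreActivity("old-exports", reads=10, writes=0),
    StoreActivity("legacy-backup", reads=0, writes=0),
]
example_scores = staleness_scores(example_stores)
```

Because the rescale is relative to the busiest and quietest stores in the same environment, a store's score reflects how stale it is compared with its neighbors rather than against a fixed global threshold.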

When an alert like this fires, users can follow up and investigate why a data store containing sensitive data is stale, whether it poses a risk, and whether the data should be retained or removed. To help control or minimize costs, users can ask the same questions of non-sensitive stale data stores.
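The kind of policy described above, one that combines a sensitive data finding with a staleness score, can be sketched as a simple filter. This is an illustration only and does not use the Open Raven policy engine or its query syntax. The DataStore type, the "personal_data" finding label, and the threshold of 80 are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class DataStore:
    """Hypothetical view of a data store: its score and its findings."""
    name: str
    staleness: float                            # 0 to 100, higher is more stale
    findings: set = field(default_factory=set)  # labels of sensitive data found


def stale_sensitive_stores(stores, threshold=80.0,
                           sensitive_types=frozenset({"personal_data"})):
    """Return names of stores that are both stale and hold sensitive data.

    Assumptions: a store counts as stale at or above `threshold`, and a
    finding is sensitive if its label appears in `sensitive_types`.
    """
    return [
        s.name
        for s in stores
        if s.staleness >= threshold and s.findings & sensitive_types
    ]


def stale_nonsensitive_stores(stores, threshold=80.0):
    """Return names of stale stores with no findings (cost review candidates)."""
    return [s.name for s in stores if s.staleness >= threshold and not s.findings]


inventory = [
    DataStore("orders-db", staleness=3.0, findings={"personal_data"}),
    DataStore("old-exports", staleness=97.5, findings={"personal_data"}),
    DataStore("legacy-backup", staleness=100.0),
]
risk_alerts = stale_sensitive_stores(inventory)
cost_candidates = stale_nonsensitive_stores(inventory)
```

Here only old-exports would raise a risk alert, since it is both stale and contains personal data, while legacy-backup would surface as a candidate for cost review.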

Conclusion

Stale data, sensitive or not, can affect any organization by introducing risk, cost, or both. Open Raven can identify stale data and help security teams act on it to reduce that risk and cost.
