Finding Sensitive Data in Cold AWS S3 Buckets
S3 buckets are unbounded storage. S3 buckets are simple to use. S3 buckets are cheap to use. S3 buckets are reliable and durable. S3 buckets are awesome.
But (and there is always a but, isn't there?) because they are so awesome, people spin them up left, right, and center and, more often than not, leave them hanging around. Hackers are lazy and will take what they can get with little effort. If there is an open bucket with six-month-old credit card numbers or year-old PII, they will take it.
Smart security teams are adopting the strategy of looking for sensitive data that is not being used and either deleting it or moving it to secure backups.
Reduce the attack surface
Big bang for the buck and remarkably low effort. Here is how you do it.
First, classify your S3 buckets using Open Raven.
Once the scans run, you will get a data catalog showing you all of the data across all of your buckets.
You can also drill into any bucket and see exactly what's inside it.
And you can even look at the data inside the files.
Now you know what data you have and can inspect it, but you want to know what's “cold”. This is where AWS Storage Lens comes into play. Storage Lens can be enabled across an AWS Organization and provides, among other things, the ability to see how much activity each bucket is getting. Storage Lens itself is free, but you will need to pay for the Advanced Metrics (see below) to find cold buckets.
After enabling Storage Lens, you need to enable those Advanced Metrics, which give you access to the activity metrics.
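If you would rather script this step than click through the console, here is a minimal sketch using boto3's s3control client. It assumes that switching on ActivityMetrics at the account and bucket level is what moves the dashboard onto the paid advanced tier; the function names and the config id are made up for illustration, and an Organization-wide dashboard would need extra settings that are not shown here.

```python
from typing import Any, Dict


def build_storage_lens_config(config_id: str) -> Dict[str, Any]:
    """Return a Storage Lens configuration with activity metrics switched on."""
    return {
        "Id": config_id,
        "IsEnabled": True,
        "AccountLevel": {
            # Activity metrics aggregated across the whole account.
            "ActivityMetrics": {"IsEnabled": True},
            "BucketLevel": {
                # Per-bucket activity metrics: what you need to spot cold buckets.
                "ActivityMetrics": {"IsEnabled": True},
            },
        },
    }


def apply_storage_lens_config(account_id: str, config_id: str) -> None:
    """Create or update the dashboard configuration (needs AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    s3control = boto3.client("s3control")
    s3control.put_storage_lens_configuration(
        ConfigId=config_id,
        AccountId=account_id,
        StorageLensConfiguration=build_storage_lens_config(config_id),
    )
```

Calling apply_storage_lens_config("123456789012", "cold-bucket-lens") with a real account id would push the configuration; check the pricing page first, since this is the part you pay for.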
Metrics like GET requests and download bytes are solid indicators of how often your buckets are accessed each day. Trending this data over time (and yes, you will need to pay even more to extend data retention) will help you spot buckets that are no longer being accessed. You can even do clever things like see how much of a bucket is being accessed using the % retrieval rate metric, computed as download bytes / total storage.
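To make the arithmetic concrete, here is a small sketch of the retrieval rate (download bytes divided by total storage, as a percentage) plus a helper for the trending idea. Both functions and their inputs are hypothetical; the sketch assumes you have already pulled the daily numbers out of a Storage Lens export, ordered oldest day first.

```python
from typing import List, Optional


def retrieval_rate(download_bytes: int, total_storage_bytes: int) -> float:
    """Percentage of stored bytes that were downloaded in the period."""
    if total_storage_bytes == 0:
        return 0.0  # an empty bucket has nothing to retrieve
    return 100.0 * download_bytes / total_storage_bytes


def days_since_last_get(daily_get_requests: List[int]) -> Optional[int]:
    """Days since the latest day with at least one GET request.

    The list is ordered oldest first, so the last element is the most
    recent day. Returns None if the window contains no GETs at all.
    """
    for days_ago, count in enumerate(reversed(daily_get_requests)):
        if count > 0:
            return days_ago
    return None
```

For example, a bucket whose daily GET counts over six days were [120, 40, 3, 0, 0, 0] was last read three days ago, and a bucket that comes back as None for your whole retention window is a strong cold candidate.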
Look at Total storage, % retrieval rate, and Avg. object size together: any bucket with a retrieval rate of zero (or near zero) and a large relative storage size is a prime candidate for a bucket that has gone cold.
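That rule of thumb can be sketched as a simple filter. The BucketMetrics record, the function name, and both thresholds are assumptions picked for illustration, not Storage Lens defaults; "large relative storage size" is interpreted here as the bucket's share of all storage in the list. Avg. object size is carried along for context when you review the results but is not part of the filter.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BucketMetrics:
    name: str
    total_storage_bytes: int
    retrieval_rate_pct: float
    avg_object_size_bytes: float


def find_cold_buckets(
    buckets: List[BucketMetrics],
    max_retrieval_rate_pct: float = 0.1,  # hypothetical "near zero" cutoff
    min_share_of_storage: float = 0.05,   # hypothetical "large" cutoff
) -> List[BucketMetrics]:
    """Return barely-read buckets that hold a large share of storage, biggest first."""
    total = sum(b.total_storage_bytes for b in buckets)
    if total == 0:
        return []
    cold = [
        b
        for b in buckets
        if b.retrieval_rate_pct <= max_retrieval_rate_pct
        and b.total_storage_bytes / total >= min_share_of_storage
    ]
    return sorted(cold, key=lambda b: b.total_storage_bytes, reverse=True)
```

Tune the two thresholds to your environment; a tiny bucket with a zero retrieval rate is filtered out here only because it is a lower priority, not because it is safe.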
In the screenshot below, you have a bucket of Athena query results that will almost definitely have sensitive data hanging around, just waiting to be stolen if you have a security misconfiguration or an insider threat.
Now, you can go back to the Open Raven data catalog and see exactly what type of data is in the bucket and make an informed decision.
If S3 stands for Simple Storage Service, maybe I should coin C3: Crazy Cold Catalog.