Open Raven Platform Release: Streamlined AWS S3 Data Classification
We are pleased to push a release this week that includes a slick, streamlined AWS S3 scanning experience along with a host of other improvements across the platform:
- Streamlined AWS S3 data classification scan experience, which expedites the setup of sensitive data scanning in S3 buckets
- Support for one-time scans
- Extended Apache Parquet file scanning capabilities and improved handling of very large files
- Filtering by supported MIME types when building a data inventory
- Updates to data classes: ODBC/JDBC database connector, Facebook API Tokens
Streamlined AWS S3 Data Classification Scanning
Our updated approach to data classification scan jobs provides an efficient way to configure discovery of sensitive data in your S3 buckets. This new interface, combined with our expansive predefined set of data classes, gives you the fastest way to start a scan and identify problems with minimal effort.
Navigate to Scans → Data Scan Jobs to find a new experience for creating data scan jobs.
Upon creating a new data scan job, you’ll be presented with a straightforward, single screen for setting up a scan. You can now specify the following options, with an illustrative sketch after the list:
- Scan schedule. Set to run once or repeat as often as every hour.
- Data to find, selected by data collection. Use any of our pre-configured data collections or create your own.
- File types or names to ignore. The job scans for all of the supported file formats by default, but you can narrow this down however you would like.
- Sample size. The job scans all files by default, but you can scan a subset if you prefer (e.g., for faster completion).
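For readers who think in code, here is a purely hypothetical sketch of those settings as a plain data structure; the field names and values are ours for illustration only and are not Open Raven’s configuration format or API.

```python
# Hypothetical illustration only: the scan-job options above restated as a
# plain dictionary. Field names and values are invented for this sketch and
# are not Open Raven's configuration format or API.
scan_job = {
    "schedule": {"repeat": "hourly"},       # or {"repeat": "once"} for a one-time scan
    "data_collection": "Personal Data",     # a pre-configured or custom data collection
    "ignore": {
        "file_types": [".zip", ".gz"],      # example exclusions; all supported formats are scanned by default
        "file_names": ["README.md"],
    },
    "sample_size": 0.25,                    # scan a 25% sample; the default is all files
}
```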
Finally, select the S3 buckets you wish to analyze from the complete list already found by our discovery engine. You can filter the list by AWS account ID, AWS region, or by a few familiar S3 bucket security configurations, illustrated in the sketch after this list:
- Public accessibility
- Encryption status
- Back-up status
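Open Raven collects these security configurations automatically during discovery. For readers who want to spot-check a bucket themselves, the sketch below uses boto3 and assumes AWS credentials are already configured; treating versioning and replication as a stand-in for back-up status is our assumption for illustration, not a product definition.

```python
# Illustrative only: inspecting the same S3 security configurations that the
# bucket filters expose, using boto3. Assumes AWS credentials are configured.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
bucket = "my-example-bucket"  # hypothetical bucket name

# Public accessibility: are all public access block settings enabled?
try:
    pab = s3.get_public_access_block(Bucket=bucket)
    print("Public access fully blocked:", all(pab["PublicAccessBlockConfiguration"].values()))
except ClientError:
    print("No public access block configuration found")

# Encryption status: default server-side encryption rules for the bucket.
try:
    enc = s3.get_bucket_encryption(Bucket=bucket)
    print("Default encryption rules:", enc["ServerSideEncryptionConfiguration"]["Rules"])
except ClientError:
    print("No default encryption configured")

# Back-up status: approximated here by versioning and replication settings.
versioning = s3.get_bucket_versioning(Bucket=bucket)
print("Versioning status:", versioning.get("Status", "Disabled"))
try:
    repl = s3.get_bucket_replication(Bucket=bucket)
    print("Replication configured:", "ReplicationConfiguration" in repl)
except ClientError:
    print("No replication configuration found")
```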
You will also notice two bars at the top of the page that measure the volume of S3 data to be scanned, helping you confirm the job is properly sized for successful completion.
Additional Improvements
Also included in this release:
- Extended support for scanning Parquet files and improved handling of very large files
- Filtering by supported MIME types when building a scan inventory
- Updates to data classes: ODBC/JDBC database connector, Facebook API Tokens
We have continued our progress in supporting scanning of even larger Apache Parquet files, and very large files in general. Some of this work requires sophisticated techniques such as breaking files into chunks (“file chunking”). Read the multi-part blog post here.
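As a general illustration of the technique (not Open Raven’s actual implementation), pyarrow can read a large Parquet file one bounded batch at a time instead of loading it whole; the file path and batch size below are placeholders.

```python
# A minimal sketch of chunked Parquet scanning: read bounded batches so memory
# use stays flat regardless of file size. Illustrates the general technique,
# not Open Raven's implementation.
import pyarrow.parquet as pq

parquet_path = "large_dataset.parquet"  # hypothetical file
pf = pq.ParquetFile(parquet_path)

for batch in pf.iter_batches(batch_size=10_000):
    for i, column_name in enumerate(batch.schema.names):
        values = batch.column(i).to_pylist()
        # ...run data classification over `values` for this chunk...
```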
We understand that not all files in your S3 buckets will have recognizable suffixes (e.g., .log, .parquet, or .txt), and in many cases files have no extension at all. No problem. We have improved the scanning engine’s ability to determine file type with extended MIME type analysis, as well as by using file extensions when they are present.
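As a rough sketch of that idea, assuming the python-magic package (this is not the scanning engine’s code), content sniffing identifies extension-less files and the extension serves only as a fallback:

```python
# Illustrative sketch: determine a MIME type from file content first, and fall
# back to the extension only when the sniff is inconclusive. Assumes the
# python-magic package is installed.
import mimetypes
import magic

def detect_mime_type(path: str) -> str:
    with open(path, "rb") as f:
        sniffed = magic.from_buffer(f.read(4096), mime=True)  # content-based detection
    if sniffed and sniffed != "application/octet-stream":
        return sniffed
    guessed, _ = mimetypes.guess_type(path)  # extension-based fallback, if any
    return guessed or sniffed

print(detect_mime_type("events.log"))  # e.g. "text/plain"
print(detect_mime_type("datafile"))    # works even with no extension at all
```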
Finally, and just as notable, a few incremental updates were made to two important developer credential data classes: the JDBC/ODBC database connector string and the Facebook API token. The database connector class now captures a larger set of strings, including those used for Redshift, MySQL, PostgreSQL, SQL Server, and many more. Our team is committed to ensuring the accuracy of these data classes, and the changes here improve it further.
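For illustration only, a simplified pattern along these lines will match JDBC connection strings for several common engines; the actual data class definition is broader and tuned for accuracy.

```python
# Purely illustrative: a simplified JDBC connection string pattern covering a
# few common engines. Open Raven's actual data class is more comprehensive.
import re

JDBC_PATTERN = re.compile(
    r"jdbc:(redshift|mysql|postgresql|sqlserver|mariadb)://\S+",
    re.IGNORECASE,
)

samples = [
    "jdbc:postgresql://db.internal:5432/analytics",
    "jdbc:redshift://examplecluster.abc123.us-west-2.redshift.amazonaws.com:5439/dev",
    "not a connection string",
]
for s in samples:
    print(s, "->", bool(JDBC_PATTERN.search(s)))
```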