Open Raven Platform Release: Streamlined AWS S3 Data Classification
We are pleased to push a release this week that includes a slick, streamlined AWS S3 scanning experience along with a host of other improvements across the platform:
- Streamlined AWS S3 data classification scan experience, which expedites the setup of sensitive data scanning in S3 buckets
- Support for one-time scans
- Extended Apache Parquet file scanning capabilities and improved handling of very large files
- Filtering by supported MIME types when building a data inventory
- Updates to data classes: ODBC/JDBC database connector, Facebook API Tokens
Streamlined AWS S3 Data Classification Scanning
Our updated approach to data classification scan jobs provides an efficient way to configure discovery of sensitive data in your S3 buckets. This new interface, combined with our expansive predefined set of data classes, gives you the fastest way to start a scan and identify problems with minimal effort.
Navigate to Scans → Data Scan Jobs to find a new experience for creating data scan jobs.
Upon creating a new data scan job, you’ll be presented with a straightforward, single screen for setting up a scan. You can now specify the following options, with an illustrative sketch after the list:
- Scan schedule. Set to run once or repeat as often as every hour.
- Data to find, selected by data collection. Use any of our pre-configured data collections or create your own.
- File types or names to ignore. The job scans for all of the supported file formats by default, but you can narrow this down however you would like.
- Sample size. The job scans all files by default, but you can scan a subset if you prefer (e.g., for faster completion).
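For readers who think in code, here is a purely hypothetical sketch of those settings as a plain data structure; the field names and values are ours for illustration only and are not Open Raven’s configuration format or API.

```python
# Hypothetical illustration only: the scan-job options above restated as a
# plain dictionary. Field names and values are invented for this sketch and
# are not Open Raven's configuration format or API.
scan_job = {
    "schedule": {"repeat": "hourly"},       # or {"repeat": "once"} for a one-time scan
    "data_collection": "Personal Data",     # a pre-configured or custom data collection
    "ignore": {
        "file_types": [".zip", ".gz"],      # example exclusions; all supported formats are scanned by default
        "file_names": ["README.md"],
    },
    "sample_size": 0.25,                    # scan a 25% sample; the default is all files
}
```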
Finally, select the S3 buckets you wish to analyze from the complete list already found by our discovery engine. You can filter the list by AWS account ID, AWS region, or by a few familiar S3 bucket security configurations, illustrated in the sketch after this list:
- Public accessibility
- Encryption status
- Back-up status
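Open Raven collects these security configurations automatically during discovery. For readers who want to spot-check a bucket themselves, the sketch below uses boto3 and assumes AWS credentials are already configured; treating versioning and replication as a stand-in for back-up status is our assumption for illustration, not a product definition.

```python
# Illustrative only: inspecting the same S3 security configurations that the
# bucket filters expose, using boto3. Assumes AWS credentials are configured.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
bucket = "my-example-bucket"  # hypothetical bucket name

# Public accessibility: are all public access block settings enabled?
try:
    pab = s3.get_public_access_block(Bucket=bucket)
    print("Public access fully blocked:", all(pab["PublicAccessBlockConfiguration"].values()))
except ClientError:
    print("No public access block configuration found")

# Encryption status: default server-side encryption rules for the bucket.
try:
    enc = s3.get_bucket_encryption(Bucket=bucket)
    print("Default encryption rules:", enc["ServerSideEncryptionConfiguration"]["Rules"])
except ClientError:
    print("No default encryption configured")

# Back-up status: approximated here by versioning and replication settings.
versioning = s3.get_bucket_versioning(Bucket=bucket)
print("Versioning status:", versioning.get("Status", "Disabled"))
try:
    repl = s3.get_bucket_replication(Bucket=bucket)
    print("Replication configured:", "ReplicationConfiguration" in repl)
except ClientError:
    print("No replication configuration found")
```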
You will also notice two bars at the top of the page that measure the volume of S3 data to be scanned, helping you confirm the job is properly sized for successful completion.
Additional Improvements
Also included in this release:
- Extended support for scanning Parquet files and improved handling of very large files
- Filtering by supported MIME types when building a scan inventory
- Updates to data classes: ODBC/JDBC database connector, Facebook API Tokens
We have continued our progress in supporting scanning of even larger Apache Parquet files, and very large files in general. Some of this work requires sophisticated techniques such as breaking files into chunks (“file chunking”). Read the multi-part blog post here.
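As a general illustration of the technique (not Open Raven’s actual implementation), pyarrow can read a large Parquet file one bounded batch at a time instead of loading it whole; the file path and batch size below are placeholders.

```python
# A minimal sketch of chunked Parquet scanning: read bounded batches so memory
# use stays flat regardless of file size. Illustrates the general technique,
# not Open Raven's implementation.
import pyarrow.parquet as pq

parquet_path = "large_dataset.parquet"  # hypothetical file
pf = pq.ParquetFile(parquet_path)

for batch in pf.iter_batches(batch_size=10_000):
    for i, column_name in enumerate(batch.schema.names):
        values = batch.column(i).to_pylist()
        # ...run data classification over `values` for this chunk...
```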
We understand that not all files in your S3 buckets will have recognizable suffixes (e.g., .log, .parquet, or .txt), and in many cases files have no extension at all. No problem. We have improved the scanning engine’s ability to determine file type with extended MIME type analysis, as well as by using file extensions when they are present.
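As a rough sketch of that idea, assuming the python-magic package (this is not the scanning engine’s code), content sniffing identifies extension-less files and the extension serves only as a fallback:

```python
# Illustrative sketch: determine a MIME type from file content first, and fall
# back to the extension only when the sniff is inconclusive. Assumes the
# python-magic package is installed.
import mimetypes
import magic

def detect_mime_type(path: str) -> str:
    with open(path, "rb") as f:
        sniffed = magic.from_buffer(f.read(4096), mime=True)  # content-based detection
    if sniffed and sniffed != "application/octet-stream":
        return sniffed
    guessed, _ = mimetypes.guess_type(path)  # extension-based fallback, if any
    return guessed or sniffed

print(detect_mime_type("events.log"))  # e.g. "text/plain"
print(detect_mime_type("datafile"))    # works even with no extension at all
```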
Finally, and just as notable, a few incremental updates were made to two important developer credential data classes: the JDBC/ODBC database connector string and the Facebook API token. The database connector class now captures a larger set of strings, including those used for Redshift, MySQL, PostgreSQL, SQL Server, and many more. Our team is committed to ensuring the accuracy of these data classes, and the changes here improve it further.
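For illustration only, a simplified pattern along these lines will match JDBC connection strings for several common engines; the actual data class definition is broader and tuned for accuracy.

```python
# Purely illustrative: a simplified JDBC connection string pattern covering a
# few common engines. Open Raven's actual data class is more comprehensive.
import re

JDBC_PATTERN = re.compile(
    r"jdbc:(redshift|mysql|postgresql|sqlserver|mariadb)://\S+",
    re.IGNORECASE,
)

samples = [
    "jdbc:postgresql://db.internal:5432/analytics",
    "jdbc:redshift://examplecluster.abc123.us-west-2.redshift.amazonaws.com:5439/dev",
    "not a connection string",
]
for s in samples:
    print(s, "->", bool(JDBC_PATTERN.search(s)))
```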