Structured Cloud Data Discovery & Classification: What You Should Know About Snapshot Scanning
The Open Raven Data Security Platform supports two modes for scanning databases - credentialed scanning and snapshot scanning - that enable broad insights into where sensitive data is stored within cloud infrastructure.
When scanning for data in structured (RDBMS) datastores, our original and preferred approach is to connect directly to the datastore via JDBC, using credentials (e.g. username + password) provided by the customer. This way, Open Raven accesses and scans only the current data, in place, with nothing copied or moved to another location, even temporarily.
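As an illustration of what a direct, credentialed scan looks like in practice, the sketch below uses Python and the psycopg2 driver against a PostgreSQL instance (the platform itself connects over JDBC); the endpoint, credentials, and row-sampling strategy are hypothetical and not Open Raven's actual scanner.

```python
# Illustrative only: a minimal in-place, credentialed scan of a PostgreSQL
# database, analogous to the JDBC-based approach described above.
# Endpoint, credentials, and sampling logic are hypothetical.
import psycopg2

conn = psycopg2.connect(
    host="customer-db.example.internal",  # hypothetical endpoint
    port=5432,
    dbname="appdb",
    user="readonly_scanner",              # customer-provided credential
    password="********",
)
cur = conn.cursor()

# Enumerate user tables, then sample a few rows from each for classification.
cur.execute(
    """
    SELECT table_schema, table_name
    FROM information_schema.tables
    WHERE table_type = 'BASE TABLE'
      AND table_schema NOT IN ('pg_catalog', 'information_schema')
    """
)
tables = cur.fetchall()

for schema, table in tables:
    cur.execute(f'SELECT * FROM "{schema}"."{table}" LIMIT 100')
    rows = cur.fetchall()
    # classify(rows) would run classification here, entirely in place;
    # no data is copied or moved outside the customer's environment.
    print(f"{schema}.{table}: sampled {len(rows)} rows")

cur.close()
conn.close()
```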
Our concern around copying or moving data of any kind is that if a datastore holds sensitive data, the security, access control, audit logging, and sovereignty of that data must extend to wherever the data is copied or moved. If data moves away from its primary location, the controls applied there no longer follow it. Take audit logging, for example: if a database, or data within it, is moved to a location where the customer does not have the necessary controls or monitoring enabled, they cannot tell who accessed that data, what was accessed, when, or for how long, leaving them without visibility and forcing them to accept risk.
However, in some cases users don't have credentials for all the databases they want to scan, or can't spend time mapping credentials to databases. In those cases another approach is needed, and the tradeoff of creating short-lived, temporary copies of data is acceptable. Enter "snapshot scanning".
When snapshot scanning, Open Raven schedules a snapshot "backup" of a target database (if a recent one does not already exist) in the same location as the primary datastore. This snapshot is then "restored" as a new database instance, again in the same location as the original target database, preserving its security, access, and logging configuration. The restored database's master password is then reset to a single-use temporary password that Open Raven generates. Once the restored database is available, the Open Raven scanner can access it directly, just as it would with a customer-provided credential. The entire process happens in the customer's environment, with no data leaving or being accessed outside of it, and when scanning completes, Open Raven removes the restored database and the snapshot (unless an existing snapshot was reused), cleaning up all artifacts.
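For an RDS-style database, that lifecycle could look roughly like the following boto3 sketch. The identifiers, region, generated password, and the scan step itself are placeholders, and the platform's actual orchestration will differ; the point is simply that every step runs inside the customer's own account.

```python
# Illustrative sketch of the snapshot-scan lifecycle on AWS RDS using boto3.
# Instance/snapshot identifiers and the scan step are hypothetical placeholders;
# everything runs inside the customer's own account and region.
import secrets
import boto3

rds = boto3.client("rds", region_name="us-east-1")

SOURCE_DB = "customer-prod-db"            # target database (hypothetical)
SNAPSHOT_ID = f"{SOURCE_DB}-scan-snapshot"
RESTORED_DB = f"{SOURCE_DB}-scan-restore"

# 1. Take a snapshot of the target database (skipped if a recent one exists).
rds.create_db_snapshot(DBSnapshotIdentifier=SNAPSHOT_ID,
                       DBInstanceIdentifier=SOURCE_DB)
rds.get_waiter("db_snapshot_available").wait(DBSnapshotIdentifier=SNAPSHOT_ID)

# 2. Restore the snapshot as a new instance in the same account/region,
#    preserving the original security, access, and logging configuration.
rds.restore_db_instance_from_db_snapshot(
    DBInstanceIdentifier=RESTORED_DB,
    DBSnapshotIdentifier=SNAPSHOT_ID,
)
rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier=RESTORED_DB)

# 3. Reset the restored instance's master password to a single-use value.
temp_password = secrets.token_urlsafe(24)
rds.modify_db_instance(
    DBInstanceIdentifier=RESTORED_DB,
    MasterUserPassword=temp_password,
    ApplyImmediately=True,
)

# 4. Scan the restored copy directly, just like a credentialed scan.
#    scan_database() is a placeholder for the classification step.
# scan_database(RESTORED_DB, password=temp_password)

# 5. Clean up: delete the restored instance and the snapshot
#    (the snapshot would be kept if a pre-existing one was reused).
rds.delete_db_instance(DBInstanceIdentifier=RESTORED_DB,
                       SkipFinalSnapshot=True)
rds.delete_db_snapshot(DBSnapshotIdentifier=SNAPSHOT_ID)
```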
From the customer's perspective, scanning via snapshots can be considerably easier than direct-access scanning via credentials, although it adds time to an end-to-end scan as snapshots are created, databases restored, and artifacts cleaned up. This overhead ranges from tens of minutes for small databases to over an hour for very large ones. There is also an additional cost factor for the snapshot and the restored database, but because these resources are only temporary, the overhead in practice is minimal, on the order of cents per snapshot scan.
To help customers understand which scan options are available and which will be used, Open Raven marks databases with validated credentials with a key icon and uses direct scanning wherever possible. When credentials are unavailable or incorrect, snapshot scanning is used instead, and a warning makes the customer aware of the scanning mode so they understand that additional resources and temporary data copies will be created.
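Expressed as a hypothetical decision sketch (the function and field names below are assumptions for illustration, not the platform's API), the mode selection might look like this:

```python
# Hypothetical sketch of how a scan mode could be chosen per database;
# names are illustrative, not Open Raven's actual implementation.
from dataclasses import dataclass

@dataclass
class Database:
    name: str
    has_valid_credentials: bool  # surfaced in the UI with a key icon

def choose_scan_mode(db: Database) -> str:
    if db.has_valid_credentials:
        return "direct"  # credentialed, in-place scan
    # Otherwise fall back to snapshot scanning and surface a warning that
    # temporary resources and data copies will be created.
    print(f"warning: {db.name} will be scanned via snapshot; "
          "a temporary snapshot and restored instance will be created")
    return "snapshot"
```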
Once scanning is complete, the findings appear in the data catalog, associated with the original target database.
To summarize, when conditions permit, Open Raven’s preferred method of scanning structured datastores is a direct connection. When that isn't possible, snapshot scanning, performed in a way that balances security against expediency, provides customers with the right blend of capabilities to discover and classify structured datastores.