Scanning for Developer Secrets in Git Repositories using Open Raven
Introduction
Open Raven’s initial focus was scanning unstructured and semi-structured data at scale in AWS S3 buckets. Since just about anything can find its way into an S3 bucket, our platform’s analysis can extend to many kinds of content, including source code repositories. Using infrastructure provisioned by CloudFormation templates, customers can sync any of their Git repositories into an S3 bucket in their cloud environment using AWS native services. Syncing happens any time the main branch of a repository changes, so everything in the bucket remains up to date. The bucket containing the repos can then be scanned in the product on a scheduled basis to detect sensitive personal information and developer secrets. Once set up, the process is fully automated, allowing customers to continuously evaluate their codebases for sensitive information before attackers have the chance to find it.
Developer secrets fall under several categories, but the general concept is that they are string-based credentials that provide access to different types of resources. If an attacker finds one of these keys, they can obtain access to the service it is used for and extract information as if they were the victim company itself. Open Raven currently supports 22 different developer secret data classes, with immediate plans to add more throughout this quarter. Examples of those supported include, but are not limited to, AWS Access Keys, GitHub and GitLab API keys, and PayPal and Stripe API keys. These strings should never be hardcoded into repositories, and Open Raven can help identify those that are so they can be remediated immediately.
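To make the idea of a string-based credential concrete, here is a minimal sketch of pattern-based detection for one secret type. It is an illustration only and not Open Raven's data class implementation. It assumes the widely documented shape of long-term AWS access key IDs (20 characters beginning with "AKIA"); the function name and the sample text are hypothetical, and the key shown is the placeholder value AWS uses in its own documentation.

```python
from __future__ import annotations

import re

# Assumption: long-term AWS access key IDs are 20 characters, starting with
# "AKIA" followed by 16 uppercase letters or digits.
AWS_ACCESS_KEY_ID = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")


def find_aws_access_key_ids(text: str) -> list[tuple[int, str]]:
    """Return (line_number, key_id) pairs for each candidate key ID in text."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in AWS_ACCESS_KEY_ID.finditer(line):
            findings.append((line_number, match.group(1)))
    return findings


# Hypothetical file contents using AWS's documented placeholder key ID.
sample = 'region = "us-east-1"\naws_access_key_id = "AKIAIOSFODNN7EXAMPLE"\n'
print(find_aws_access_key_ids(sample))  # [(2, 'AKIAIOSFODNN7EXAMPLE')]
```

A pattern like this only finds candidates. A real scanner would typically add context checks or validation to reduce false positives.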
Git Repository Syncing
The first stage in enabling scanning is syncing an organization’s Git repositories into an S3 bucket within their AWS account. Syncing is automated for any new changes to a repository, so the bucket always contains the most up-to-date copy of each one. The process combines several tools: webhooks, Amazon API Gateway, Lambda functions and CodeBuild. All of this is available in a CloudFormation template provided by AWS. The infrastructure that the CloudFormation template provisions is shown in Figure 1 below.
The principle is that a webhook is set on specific repositories, or at an organizational level. When specific conditions are met (for example, a merge to the main branch), the webhook is triggered and sends an HTTP request to the API gateway configured in the hook. The gateway receives the request and triggers a Lambda function to process and validate it. If the Lambda function determines the request is legitimate, it starts AWS CodeBuild, which clones the Git repository, packages it into a zip archive and uploads the archive to an S3 bucket. This happens any time a repo is updated, removing the old zip file and replacing it with the most recent version of that repository.
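The validation step can be sketched as follows. This is not the code shipped in the AWS-provided template; it is a minimal example of one common approach, assuming a GitHub-style webhook that signs the request body with HMAC-SHA256 and sends the result in an X-Hub-Signature-256 header, and a push payload whose "ref" field names the branch. The function name, the shared secret and the branch name are hypothetical.

```python
import hashlib
import hmac
import json


def is_legitimate_push(body: bytes, signature_header: str, secret: bytes,
                       main_ref: str = "refs/heads/main") -> bool:
    """Return True only for a correctly signed push to the main branch."""
    # Recompute the signature over the raw request body.
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how much of the signature matched.
    supplied = (signature_header or "").encode("utf-8")
    if not hmac.compare_digest(expected.encode("utf-8"), supplied):
        return False
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    # Assumption: the payload is a JSON object with a "ref" such as "refs/heads/main".
    return isinstance(payload, dict) and payload.get("ref") == main_ref


# Hypothetical usage with a locally generated signature.
secret = b"hypothetical-shared-secret"
body = json.dumps({"ref": "refs/heads/main"}).encode("utf-8")
signature = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
print(is_legitimate_push(body, signature, secret))  # True
```

Checking the signature before parsing the body means unauthenticated input is rejected as early as possible, and only a verified push to the main branch would go on to start a build.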
Scanning your S3 bucket
Once you have synced your repositories into the S3 bucket, scanning them within the product is straightforward. All you need to do is set up your scan in the UI shown below in Figure 2.
Click continue to submit the scan with your selected options. The scan will then run at the specified time intervals, which means the whole process, both syncing and scanning, can be fully automated.
The results from the scan are presented in the platform under the particular asset you scanned. From there you can review all of the findings, as seen in Figure 3.
Summary
Syncing GitHub repositories is extremely useful for Open Raven’s customers, as it allows them to safeguard another aspect of their organization within the same product. With continuous syncing and scheduled scans of the repository bucket, the process is automated and customers only need to act on the findings that come through. The downside to this approach is that customers have to copy their repositories into a bucket. The associated risk is minor, and it is further reduced by the fact that the bucket and the scanning both exist within the customer’s own cloud environment rather than ours, which keeps the security of the bucket in their own hands.