From the course: GitHub Advanced Security (GHAS)

Secret scanning overview

- [Narrator] Secret scanning is a part of GitHub Advanced Security that does what its name suggests: it detects secrets in your code. Secrets can be anything you don't want to leak out into the public, for example a database connection string, an API token, or the credentials you use to log in to your cloud provider. Accidentally committing a secret into the repository is something that happens to every developer. You add something temporarily in your source, during testing for example, and then forget to remove it before you commit. The issue with having secrets stored in your repository is that anyone with access to that repository could have access to that secret. They might misuse it to impersonate you or, worse, download your data and sell it. We have seen numerous leaks where someone got access to an API token for a cloud service, for example, and then downloaded entire databases of customer data. Or they started using compute resources to mine cryptocurrency that your company then gets the bill for.

This is also the reason why GitHub has this feature enabled by default for public repositories. Malicious hackers have been actively searching for these secrets in public repositories and misusing them for years. This way, GitHub makes the entire community a little safer. GitHub found this so important that they even decided that you cannot disable secret scanning for public repositories at all. GitHub is actively working on improving code security and, for that, partners with third parties that help identify their own secrets. There are a lot of secret scanning partners already, and the number keeps growing. All the major cloud vendors are already partners, since they see the benefit of preventing their customers' secrets from being leaked.

Secret scanning has three ways of operating. The first is when you activate the feature: it scans the entire repository, including all branches and history.
A secret might be stored anywhere in the history, so only checking the latest commit is not enough. This, of course, means that it could take a while before secrets in your repository show up here, for example if there is a lot of history to go through or when the repository contains large files. All branches, tags, and commits are scanned for known secret patterns. If a secret is found, an alert is generated and sent to the right people in your environment.

The second way of operating is on push. When a developer pushes their code to GitHub, the incoming files are checked after they have been stored on GitHub's side. This means that the secret can already be visible in the repository on GitHub before the scan is completed and the alert is created.

The third way of operating is push protection. The incoming changes are scanned before they are stored in the repository. If a secret is found, the push is rejected. This means that the secret will not be stored in the repository, which reduces the chance of it being visible to anyone other than the person sending in the commit.

Let's see how secret scanning works by looking at the on-push process. It starts with a developer who has secrets in the code in their copy of the repository. They then execute a git push from their command line or IDE and send the data to GitHub. GitHub receives the data and stores it on its end. Then a background process starts that scans the new data for secrets. The scans happen by executing regular expressions, also called regexes. An example of a regex could be that the secret is always 16 characters, alphanumeric, and follows a certain pattern. Using that pattern, GitHub can check if the code contains any of these secrets. If a match is found, it is sent to a verification website hosted by the secret scanning partners. Each partner has their own set of regular expressions that they configure for GitHub to use.
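To make the regex step concrete, here is a minimal sketch of pattern-based scanning in Python. It is not GitHub's implementation: the partner names, the token formats, and the `scan_text` helper are all hypothetical, invented only to illustrate how a table of per-partner patterns lets a scanner tell which partner a match belongs to.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical partners and token formats, made up for this illustration.
# Real secret scanning partners define their own patterns.
PARTNER_PATTERNS = {
    # "EXC_" followed by exactly 16 alphanumeric characters
    "example-cloud": re.compile(r"\bEXC_[A-Za-z0-9]{16}\b"),
    # "expay_live_" or "expay_test_" followed by 24 lowercase hex digits
    "example-payments": re.compile(r"\bexpay_(?:live|test)_[0-9a-f]{24}\b"),
}


@dataclass(frozen=True)
class Match:
    partner: str      # which partner's pattern produced the match
    line_number: int  # 1-based line number in the scanned text
    secret: str       # the matched text


def scan_text(text: str) -> List[Match]:
    """Return every match of every partner pattern, line by line."""
    matches = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for partner, pattern in PARTNER_PATTERNS.items():
            for found in pattern.finditer(line):
                matches.append(Match(partner, line_number, found.group(0)))
    return matches
```

Because each pattern is stored under the name of the partner that supplied it, every match already carries the information needed to route it to the right partner.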
So GitHub knows which partner to notify when a regular expression returns a match. The partner can then check if it's a real match and decide what to do next. It's fully up to the partner to choose what they want to do with that secret. They can choose to do nothing or to revoke the secret. Revoking the secret renders it invalid and might break your environment. That is the reason that this decision is up to the secret scanning partner. Revoking the database connection string for your production web shop might have some unwanted consequences, for example. Often, we see that the decision to revoke the secret or not is based on the visibility of the repository. If the repo is public, they revoke the secret. If the repo is private, they do not revoke it. That way, we still have the option to revoke it ourselves and clean up the secret from the repository history.

If a real match is found, a secret scanning alert is generated in GitHub and sent to the organization and repository admins, as well as the people who have the security manager role for that repository.
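The partner-side decision described above can be sketched as a small function. This is only an illustration of the commonly seen policy (revoke for public repositories, leave active for private ones), not any partner's actual logic: the `Report` type, the `Action` values, and the `is_real_secret` callback are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Action(Enum):
    IGNORE = "ignore"        # not a real secret, nothing to do
    REVOKE = "revoke"        # invalidate the secret immediately
    LEAVE_ACTIVE = "leave"   # real secret, but the owner revokes and cleans up


@dataclass(frozen=True)
class Report:
    secret: str           # candidate secret reported by the scanner
    repo_is_public: bool  # visibility of the repository it was found in


def decide(report: Report, is_real_secret: Callable[[str], bool]) -> Action:
    """Apply the visibility-based policy to one reported match."""
    # The partner first verifies that the regex match is a real secret.
    if not is_real_secret(report.secret):
        return Action.IGNORE
    # Public exposure means anyone may have seen it, so revoke right away.
    if report.repo_is_public:
        return Action.REVOKE
    # In a private repository, revoking could break production unexpectedly,
    # so the secret stays valid and the owner handles revocation.
    return Action.LEAVE_ACTIVE
```

In this sketch, a verified secret from a private repository is never revoked automatically, which matches the reasoning that the owner should get the chance to rotate it without an unplanned outage.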
