The Hard Work of Cloud Security
What cloud security engineers actually do, and how to do it the right way
In any discipline of security, visibility is step one. You can’t secure what you can’t see. Today, cloud visibility is a solved problem: cloud-native tools like AWS Config and Google Cloud Asset Inventory, and third-party solutions like Wiz, give detailed insight into deployed resources and how they align with security requirements and best practices.
What do Cloud Security teams do after adopting a visibility solution? Their job is to effectively reduce risk and ensure compliance with the requirements of common standards such as CIS. Security teams have traditionally hesitated to enforce gates on deployment, so instead, they’re left with retroactive clean-up work.
Unfortunately, most of this clean-up work is laborious, slow, and risky. Projects to document public infrastructure, migrate away from IAM users, downsize roles, apply SCPs, or add backup/encryption to datastores take quarters or years. Often, the underlying issues aren’t fixed in time, leading to yet another public cloud data breach. If not a breach, they frequently result in an embarrassing audit finding. Still other times, security teams trying to do the right thing end up breaking something (IMDSv2, anyone?) and losing trust with developers.
It is possible to broadly improve cloud infrastructure, but the work requires careful execution and tooling. In this post, we’ll analyze a few of these projects, the pitfalls, and how to make them successful.
Tagging
Resource tagging is generally helpful in a few cases:
Starting our list is tagging. Tagging is a generally benign change, with less risk involved than others we’ll discuss in this post. The “hard” part of tagging projects ends up being reaching out to the suspected owners, collecting the necessary information from them, and getting them to actually make the changes. This project ends up being a lot of:
The typical tools of the trade here are Slack, email, Jira, and Confluence. While tags are valuable for a lot of other security and compliance work, they’re painful to roll out. Even successful organizations take many quarters to get this done, and only for a subset of their infrastructure.
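Before the outreach starts, it helps to know exactly which resources are missing which tags, and who to ask. The sketch below is one way to build that outreach queue from an inventory export. The required tag keys, the `suspected_owner` field, and the inventory shape are all assumptions for illustration, not the output format of any particular tool.

```python
from collections import defaultdict

# Hypothetical tag policy; substitute whatever keys your organization requires.
REQUIRED_TAGS = {"owner", "service", "environment"}


def missing_tags(resource, required=REQUIRED_TAGS):
    """Return the sorted required tag keys that are absent or blank."""
    tags = resource.get("tags") or {}
    return sorted(k for k in required if not str(tags.get(k, "")).strip())


def outreach_queue(resources, required=REQUIRED_TAGS):
    """Group under-tagged resources by suspected owner for follow-up.

    Resources with no suspected owner land under "unclaimed".
    """
    queue = defaultdict(list)
    for res in resources:
        gaps = missing_tags(res, required)
        if gaps:
            contact = res.get("suspected_owner") or "unclaimed"
            queue[contact].append({"id": res["id"], "missing": gaps})
    return dict(queue)
```

Each key in the result is one conversation to have (or one Jira ticket to file), and the "unclaimed" bucket is the list of infrastructure nobody has claimed yet.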
Sometimes, you’ll end up with a list of infrastructure that nobody claims ownership of. At this point, you can:
Deploying SCP/Org Policies
We’ll cover SCPs in this post, but Google’s Org Policies are very similar. Unlike tags, deploying SCPs can be a destructive change – if you get it wrong, you might break something. This project should involve careful coordination between the security team and the appropriate owner (hopefully you got them tagged previously!).
The process for a good SCP rollout involves:
Security teams should typically roll out SCPs themselves (vs. giving users instructions) because of the type of permissions required to deploy an SCP. If you’re updating an existing SCP, you should store the previous version so you can quickly roll back if required. Coordinating the timing with the user is important. You’ll both want to do this at a time when people are around to roll back AND when it isn’t peak time for whatever your business does.
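Storing the previous version can be as simple as writing the current policy document to disk before pushing the new one. Here is a minimal sketch, assuming `org_client` is a boto3 AWS Organizations client (or anything exposing the same `describe_policy` and `update_policy` calls). The function names, the backup directory layout, and the policy ID in the usage note are hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def update_scp_with_backup(org_client, policy_id, new_content, backup_dir):
    """Save the current SCP document to disk, then apply the new one.

    Returns the backup file path so it can be handed to rollback_scp().
    """
    json.loads(new_content)  # fail fast on malformed JSON before changing anything
    current = org_client.describe_policy(PolicyId=policy_id)["Policy"]["Content"]

    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    backup_path = backup_dir / f"{policy_id}-{stamp}.json"
    backup_path.write_text(current)

    org_client.update_policy(PolicyId=policy_id, Content=new_content)
    return backup_path


def rollback_scp(org_client, policy_id, backup_path):
    """Restore the SCP content saved by update_scp_with_backup()."""
    previous = Path(backup_path).read_text()
    org_client.update_policy(PolicyId=policy_id, Content=previous)
```

In practice you might call this with `boto3.client("organizations")` and a placeholder ID such as `p-example`, keeping the returned path handy for the duration of the change window. Committing the backup to version control instead of local disk would be a reasonable alternative.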
Removing IAM users
Rounding out our list is removing IAM users. IAM users (and their often associated static keys) present a huge risk to organizations using cloud, and these findings often top lists of cloud breach vectors. Unfortunately, some legacy vendor tooling still doesn’t support IAM role assumption, so some of the IAM users/keys are going to be required and will need an exception granted.
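A useful first pass is to split access keys into those that look abandoned and those still in use, since the second group needs either a migration to roles or a documented exception. The sketch below works on plain records you would assemble yourself (for example from the IAM `list_access_keys` and `get_access_key_last_used` calls). The record shape, bucket names, and the 90-day threshold are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def triage_access_keys(keys, now=None, stale_after_days=90):
    """Sort IAM user access keys into cleanup buckets.

    Each key is a dict with "user", "key_id", "status" ("Active" or
    "Inactive"), and "last_used" (timezone-aware datetime, or None if
    the key has never been used).
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=stale_after_days)
    buckets = {"delete_candidate": [], "needs_migration": []}
    for key in keys:
        last_used = key.get("last_used")
        # Inactive, never-used, and long-idle keys are likely safe to remove.
        unused = (
            key["status"] == "Inactive"
            or last_used is None
            or last_used < cutoff
        )
        bucket = "delete_candidate" if unused else "needs_migration"
        buckets[bucket].append(key["key_id"])
    return buckets
```

Keys in `delete_candidate` still deserve a check with the owner before deletion; keys in `needs_migration` are the ones where vendor limitations may force an exception.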
A successful project to clean up IAM users involves:
Lessons learned from Repokid
At Netflix, I ran a project and tool called Repokid, which used IAM Access Advisor and CloudTrail data to downsize IAM roles to only the used permissions. The project was effective in removing 59% of permissions across our AWS accounts, and is still running today.
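To make the idea concrete, here is a simplified sketch of service-level downsizing. It is not Repokid's actual implementation: it only looks at which services a role has authenticated to recently and drops actions for the rest. The input mirrors the `ServicesLastAccessed` entries returned by IAM's `get_service_last_accessed_details`, where `LastAuthenticated` is absent for services that were not accessed in the tracking period. The function names and the 90-day window are assumptions.

```python
from datetime import datetime, timedelta, timezone


def used_services(services_last_accessed, now=None, window_days=90):
    """Return the service namespaces authenticated to within the window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=window_days)
    return {
        s["ServiceNamespace"]
        for s in services_last_accessed
        if s.get("LastAuthenticated") and s["LastAuthenticated"] >= cutoff
    }


def downsize_actions(actions, keep_services):
    """Split policy actions into (kept, removed) by service usage.

    Actions are expected in "service:Action" form. A bare "*" is kept
    so that a human can review it instead of silently removing it.
    """
    kept, removed = [], []
    for action in actions:
        if action == "*":
            kept.append(action)
            continue
        service = action.split(":", 1)[0].lower()
        (kept if service in keep_services else removed).append(action)
    return kept, removed
```

The `removed` list is what you would show the role owner before changing anything, and keeping it around makes it straightforward to restore permissions if something breaks.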
Our goals were:
We built all of these functions into Repokid, which enabled the project to succeed. Building these features was time-consuming, and we had a few advantages that many organizations wouldn’t for the projects I mentioned above:
Not enough of these projects get completed because organizations have to build this kind of tooling and process themselves, and then do the hard work of manual outreach and coordination. In the rare cases where they succeed, the projects take a long time and visible progress is slow.
At Resourcely, we’re building the suite of tools I wish I had to help Repokid be successful, including: