The Hard Work of Cloud Security
Some combo of AI magic and crappy Photoshop

What cloud security engineers actually do, and how to do it the right way


In any discipline of security, visibility is step one: you can’t secure what you can’t see. Today, cloud visibility is largely a solved problem. Cloud-native tools like AWS Config and Google Cloud Asset Inventory, and third-party solutions like Wiz, give detailed insight into deployed resources and how they align with security requirements and best practices.

What do Cloud Security teams do after adopting a visibility solution? Their job is to effectively reduce risk and ensure compliance with the requirements of common standards such as CIS. Security teams have traditionally hesitated to enforce gates on deployment, so instead, they’re left with retroactive clean-up work. 

Unfortunately, most of this clean-up work is laborious, slow, and risky. Projects to document public infrastructure, migrate away from IAM users, downsize roles, apply SCPs, or add backup/encryption to datastores take quarters or years. Often, the underlying issues aren’t cleaned up in a timely fashion, leading to yet another public cloud data breach or, short of a breach, an embarrassing audit finding. Other times, security teams trying to do the right thing end up breaking something (IMDSv2, anyone?) and losing trust with developers.

It is possible to broadly improve cloud infrastructure, but the work requires careful execution and tooling. In this post, we’ll analyze a few of these projects, the pitfalls, and how to make them successful.

Tagging

Resource tagging is generally helpful in a few cases:

  • Ownership – assign vulnerability tickets to the responsible team and coordinate necessary changes
  • Public exceptions – document anything that is supposed to be public, and why
  • FinOps – tracking and categorizing cloud spend

Tagging starts our list. It is a generally benign change, with less risk than the others we’ll discuss in this post. The “hard” part of a tagging project is reaching out to the suspected owners, collecting the necessary information from them, and getting them to actually make the changes. The project ends up being a lot of:

  1. Send a Slack message/email to somebody and confirm if they’re the owner
  2. Send the owner instructions about how to apply tags and what the tag standard is
  3. Follow up repeatedly until the tags are applied
  4. Track the progress and try to rally the hold-outs

The typical tools of the trade here are Slack, email, Jira, and Confluence. While tags are valuable for a lot of other security and compliance work, they’re painful to roll out. Even successful organizations take many quarters to get this done, and then only for a subset of their infrastructure.
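As a rough illustration of the tracking side of this work, the sketch below finds resources that miss a tag standard, groups them by suspected owner for outreach, and reports overall coverage. The tag keys, the `Resource` shape, and every function name are hypothetical; in practice the inventory would come from whatever visibility tool you already use.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical tag standard; substitute your organization's required keys.
REQUIRED_TAGS = ("owner", "cost-center")


@dataclass
class Resource:
    arn: str
    suspected_owner: str  # best guess, e.g. the resource creator or account contact
    tags: dict = field(default_factory=dict)


def missing_tags(resource, required=REQUIRED_TAGS):
    """Return the required tag keys that are absent or blank on the resource."""
    return [key for key in required if not resource.tags.get(key, "").strip()]


def outreach_queue(resources, required=REQUIRED_TAGS):
    """Group non-compliant resources by suspected owner for follow-up."""
    queue = defaultdict(list)
    for resource in resources:
        gaps = missing_tags(resource, required)
        if gaps:
            queue[resource.suspected_owner].append((resource.arn, gaps))
    return dict(queue)


def coverage(resources, required=REQUIRED_TAGS):
    """Fraction of resources meeting the tag standard (1.0 for an empty list)."""
    if not resources:
        return 1.0
    compliant = sum(1 for r in resources if not missing_tags(r, required))
    return compliant / len(resources)
```

Running `outreach_queue` on a schedule gives one follow-up list per person, and `coverage` gives a single number to track while rallying the hold-outs.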

Sometimes, you’ll end up with a list of infrastructure that nobody claims ownership of. At this point, you can:

  1. Give up
  2. Escalate to a leader and let them delegate an owner
  3. Start making (restorable) changes to shut it off and wait for somebody to come screaming; they’re either the owner or know who is

Deploying SCP/Org Policies

We’ll cover AWS Service Control Policies (SCPs) in this post, but Google’s Org Policies are very similar. Unlike tags, deploying SCPs can be a destructive change: if you get it wrong, you might break something. This project should involve careful coordination between the security team and the appropriate owner (hopefully you got them tagged previously!).

The process for a good SCP rollout involves:

  1. Make a list of accounts where desired SCPs are not applied
  2. Reach out to the owner of that account and inform them of the change and potential impact
  3. Document any known carveouts (conditions that will apply to the SCP to limit what it applies to)
  4. Craft the SCP (with conditions)
  5. Coordinate a good time to deploy the SCP with the owner
  6. Deploy it, and be ready to roll it back if needed
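To make steps 1, 3, and 4 concrete, here is a minimal sketch that lists accounts missing a desired SCP and builds a policy document with a carve-out condition. Requiring IMDSv2 on instance launch is used only as an illustration, and the exempt role ARN pattern and all function names are hypothetical. Carve-outs are expressed with `ArnNotLike` on `aws:PrincipalArn`, one common pattern; your conditions may differ.

```python
import json


def accounts_missing_scp(attached_by_account, desired_policy_name):
    """Step 1: list account IDs where the desired SCP is not attached."""
    return sorted(
        account_id
        for account_id, policy_names in attached_by_account.items()
        if desired_policy_name not in policy_names
    )


def build_imdsv2_scp(exempt_role_arns=()):
    """Steps 3-4: deny ec2:RunInstances unless the launch requires IMDSv2.

    exempt_role_arns are the documented carve-outs: principals matching
    these ARN patterns are not subject to the deny.
    """
    condition = {"StringNotEquals": {"ec2:MetadataHttpTokens": "required"}}
    if exempt_role_arns:
        # Condition operators are ANDed, so the deny applies only when the
        # launch lacks IMDSv2 AND the caller is not an exempt principal.
        condition["ArnNotLike"] = {"aws:PrincipalArn": sorted(exempt_role_arns)}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "RequireImdsV2",
                "Effect": "Deny",
                "Action": "ec2:RunInstances",
                "Resource": "arn:aws:ec2:*:*:instance/*",
                "Condition": condition,
            }
        ],
    }


def render(policy):
    """Serialize the policy to the JSON string an SCP is stored as."""
    return json.dumps(policy, indent=2)
```

Keeping the carve-outs as an explicit argument means the list of exceptions agreed with each owner lives in one reviewable place rather than being hand-edited into the JSON.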

Security teams should typically roll out SCPs themselves (vs. giving users instructions) because of the type of permissions required to deploy an SCP. If you’re updating an existing SCP, you should store the previous version so you can quickly roll back if required. Coordinating the timing with the owner is important. Pick a time when people are around to roll back and that isn’t a peak period for your business.
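One way to keep a previous version on hand is to snapshot the current policy content before every update. The sketch below assumes a client object shaped like boto3's Organizations client (`describe_policy` and `update_policy` with `PolicyId` and `Content`); the `ScpRollout` class is hypothetical, and it keeps history in memory only to stay short. A real rollout would persist snapshots somewhere durable.

```python
class ScpRollout:
    """Update SCPs while keeping earlier versions available for rollback."""

    def __init__(self, org_client):
        # org_client is assumed to expose describe_policy and update_policy
        # in the shape of boto3's Organizations client.
        self.org = org_client
        self.history = {}  # policy_id -> list of previous Content strings

    def update(self, policy_id, new_content):
        """Save the current content, then apply the new content."""
        current = self.org.describe_policy(PolicyId=policy_id)["Policy"]["Content"]
        self.history.setdefault(policy_id, []).append(current)
        self.org.update_policy(PolicyId=policy_id, Content=new_content)

    def roll_back(self, policy_id):
        """Restore the most recently saved content and return it."""
        versions = self.history.get(policy_id)
        if not versions:
            raise LookupError(f"no saved version for {policy_id}")
        previous = versions.pop()
        self.org.update_policy(PolicyId=policy_id, Content=previous)
        return previous
```

Because the snapshot is taken inside `update`, nobody has to remember to save the old version under time pressure.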

Removing IAM users

Rounding out our list is removing IAM users. IAM users (and their often associated static keys) present a huge risk to organizations using cloud, and these findings often top lists of cloud breach vectors. Unfortunately, some legacy vendor tooling still doesn’t support IAM role assumption, so some of the IAM users/keys are going to be required and will need an exception granted.

A successful project to clean up IAM users involves:

  1. Make a list of users and associated owners
  2. For each owner, document a reason why the IAM user exists
  3. Select a time to make changes with lower business impact
  4. Select a strategy to neutralize the key while allowing for rollback
  5. Apply the changes gradually and document the project
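For steps 4 and 5, one reversible strategy is to mark an access key inactive instead of deleting it, and to do so in small batches. The sketch below assumes key records have already been collected into dictionaries (the field names are hypothetical) and that the client is shaped like boto3's IAM client, whose `update_access_key` call takes `UserName`, `AccessKeyId`, and `Status`. The 90-day threshold and batch size are arbitrary placeholders.

```python
from datetime import datetime, timedelta, timezone


def stale_keys(keys, now, unused_days=90, exceptions=frozenset()):
    """Pick active keys not used within unused_days, skipping granted exceptions."""
    cutoff = now - timedelta(days=unused_days)
    picked = []
    for key in keys:
        if key["access_key_id"] in exceptions or key["status"] != "Active":
            continue
        last_used = key.get("last_used")  # None means the key was never used
        if last_used is None or last_used < cutoff:
            picked.append(key)
    return picked


def deactivate(iam_client, keys, batch_size=5):
    """Mark one batch of keys Inactive; return the log needed to undo it."""
    changed = []
    for key in keys[:batch_size]:
        iam_client.update_access_key(
            UserName=key["user"],
            AccessKeyId=key["access_key_id"],
            Status="Inactive",
        )
        changed.append((key["user"], key["access_key_id"]))
    return changed


def reactivate(iam_client, changed):
    """Roll back a batch by setting each key Active again."""
    for user, key_id in changed:
        iam_client.update_access_key(
            UserName=user, AccessKeyId=key_id, Status="Active"
        )
```

The list returned by `deactivate` doubles as the project record: it states exactly what changed and is the input to `reactivate` if an owner reports breakage.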

Lessons learned from Repokid

At Netflix, I ran a project and tool called Repokid, which used IAM Access Advisor and CloudTrail data to downsize IAM roles to only the used permissions. The project was effective in removing 59% of permissions across our AWS accounts, and is still running today.

Our goals were:

  1. Limit developer action required – developers shouldn’t have to spend time on IAM least privilege; they just get the role permissions they need
  2. Limit disruption to environments – we didn’t want to cause outages
  3. Enable fast restoration – sometimes we’d remove permissions from a role that hadn’t been used in a while, but the owner would need those permissions back later. We had to account for this.
  4. Tell developers what we were doing and when
  5. Allow developers to opt-out at any point
  6. Reporting – we wanted to automatically gather progress metrics regularly so we could report them to our stakeholders

We built all of these functions into Repokid, which enabled the project to succeed. Building these features was time-consuming, and we had a few advantages that many organizations wouldn’t for the projects I mentioned above:

  1. It only had to work for IAM role permissions
  2. It made direct cloud API calls; it didn’t have to work with IaC
  3. We didn’t have to collect user input – users got an email notifying them we were making changes and giving them an option to opt-out
  4. We didn’t have to coordinate changes at a specific time, because the permissions we were removing had been unused for at least a quarter
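The core decision described above can be sketched in a few lines. This is not Repokid's code; it is a simplified illustration that assumes last-used timestamps per service have already been gathered (for example from Access Advisor data), treats "a quarter" as 90 days, and uses hypothetical function names. It honors opt-outs and produces a number for reporting.

```python
from datetime import datetime, timedelta, timezone


def plan_downsize(granted, last_used, now, opted_out=False, min_unused_days=90):
    """Split a role's granted services into those to keep and those to remove.

    granted: iterable of service namespaces the role can access
    last_used: mapping of service -> datetime of last use (missing or None = never)
    """
    if opted_out:
        # Owners who opt out keep everything they have.
        return {"keep": sorted(granted), "remove": []}
    cutoff = now - timedelta(days=min_unused_days)
    keep, remove = [], []
    for service in sorted(granted):
        used_at = last_used.get(service)
        if used_at is not None and used_at >= cutoff:
            keep.append(service)
        else:
            remove.append(service)
    return {"keep": keep, "remove": remove}


def percent_removed(plans):
    """Reporting metric: share of granted services that the plans remove."""
    total = sum(len(p["keep"]) + len(p["remove"]) for p in plans)
    removed = sum(len(p["remove"]) for p in plans)
    return 0.0 if total == 0 else 100.0 * removed / total
```

Fast restoration is not shown here; it would mean storing each role's policy before the removal is applied so it can be put back on request.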

Not enough of these projects get completed because organizations have to build this kind of tooling and process and also do the hard work of manual outreach and coordination. In the rare cases where they succeed, they take a long time and visible progress is slow.

At Resourcely, we’re building the suite of tools I wish I had to help Repokid be successful, including:

  1. Coordinating with developers, in their preferred communication method
  2. Managing the pipeline of changes
  3. Applying changes at the right location (IaC, cloud API)
  4. Enabling fast and effective rollback
  5. Continuous progress tracking/metrics
  6. Opt-outs/exception tracking

Thanks for reading this far! I would love to compare notes if you’re working on these projects. You can reach me on LinkedIn or Twitter, or by email.
