🚀 KRM-Native GitOps: Yes — Without Flux, No. (FluxCD or Nothing.)

Written by a battle-hardened Platform Engineer after 10 years in production Kubernetes, and hundreds of hours spent in real-life incident response, CI/CD strategy, audits, and training.

This article is opinionated, strongly so. It’s forged through experience with large-scale Kubernetes deployments at Radio France, BlaBlaCar, Brittany Ferries, BforBank, and LCL. It’s fueled by debates with peers, conversations with OSS maintainers, hundreds of CKA/CKAD students trained… and more than a few outages. 🙃

🎙️ Recently, I watched a KubeCon Paris 2024 session where the speakers tried to argue that KRM was not a viable model for platform engineering.

I believe they missed the point. What they were actually struggling with... was ArgoCD.


🧠 What is KRM-Native GitOps?

KRM-native GitOps is more than a buzzword — it's an architecture pattern. One that finally brings true simplicity to platform engineering.

It’s GitOps, powered by the Kubernetes Resource Model (KRM), managing everything: from deployments to IAM, DNS, managed SQL, and firewall rules — with one API, one model, and one Git source of truth.

With tools like Config Connector (GCP), Crossplane, ACK (AWS) or Pulumi’s KRM output, your entire infrastructure becomes a set of Kubernetes resources.

🧱 Everything is declared in YAML.

  • Your Deployment? YAML.

  • Your VPC? YAML.

  • Your Service Account? YAML.

  • Your CloudSQL instance? YAML.

  • Your Alerting Policy? YAML.

🎯 Everything is declarative, immutable, composable — and versioned in Git.
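
To make that concrete, here is a minimal sketch of what this looks like in practice: an application Deployment next to a Config Connector-managed CloudSQL instance, both plain KRM (names, namespaces, and the image path are illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: billing-api            # hypothetical application
  namespace: payment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: billing-api
  template:
    metadata:
      labels:
        app: billing-api
    spec:
      containers:
        - name: api
          image: europe-docker.pkg.dev/my-project/apps/billing-api:1.2.3
---
apiVersion: sql.cnrm.cloud.google.com/v1beta1
kind: SQLInstance              # Config Connector resource, reconciled like any other KRM object
metadata:
  name: billing-db
  namespace: payment
spec:
  region: europe-west1
  databaseVersion: POSTGRES_15
  settings:
    tier: db-custom-1-3840

The same controller loop that keeps your Pods converged keeps the database converged.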


🌍 Why KRM-Native GitOps Is the Future

One deployment contract to rule them all

As a platform engineer, you’re building a system to host many applications and services, possibly developed by dozens of teams. You want:

  • 🔒 A secure, auditable way to define and provision infra

  • 🧠 A model understandable by humans and machines

  • 🚀 A foundation to enable automation, AI, and smart workflows

  • 🔄 A way to reduce complexity and toil

KRM delivers on all of this — because it’s:

  • 🌐 API-first — Everything is driven by Kubernetes controllers

  • ✍️ Declarative — No scripts, no imperative mess

  • 🔁 Reconciling — Kubernetes ensures convergence to the desired state

  • 🤖 AI-friendly — All resources are machine-readable in the same format

Example: the Full-Stack GitOps Application

Let’s say you’re deploying a microservice that needs:

  • A Kubernetes Deployment

  • A GKE Service with LoadBalancer

  • A PostgreSQL database on CloudSQL

  • IAM bindings for the workload identity

  • A firewall rule for backend-to-database traffic

  • Monitoring alerts and dashboards

Traditionally, this would mean:

  • One Terraform repo

  • One Helm chart

  • One cloud team ticket

  • One devops script

  • And one poor engineer trying to glue everything together.

With KRM-native GitOps, all of that goes into one Git repo, under a clear folder structure, fully declarative. The platform reconciles the rest.
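
As a sketch, that single folder might contain nothing more than a kustomization.yaml tying the pieces together (file names are illustrative; the exact kinds depend on your provider and controllers):

# clusters/prod-eu-west1/payment/api-gateway/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml        # the workload
  - service.yaml           # LoadBalancer Service
  - sqlinstance.yaml       # CloudSQL instance via Config Connector
  - iampolicymember.yaml   # Workload Identity binding
  - computefirewall.yaml   # backend-to-database firewall rule
  - alertpolicy.yaml       # monitoring alert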

Benefits:

  • 🧩 Unified model: You manage all resources the same way.

  • 🔍 Full visibility: The graph of dependencies becomes observable.

  • 🚨 AI-OPS ready: MCP servers can reason about your platform.

  • 📜 Version control: Every change is tracked.

  • 🛠️ Operational consistency: Human error is reduced. Automation thrives.

If you’ve ever had to track a production outage down to a missing IAM permission or a forgotten firewall rule — you’ll see the value immediately.


⚖️ Flux vs ArgoCD — or Why One Embraces Kubernetes and the Other Reinvents It

You’re probably here because you’ve tried GitOps. Maybe you’re already running ArgoCD in production. Maybe you’ve just seen it demoed at a conference and thought, “Wow, that UI is slick!” And yes — it is.

But here’s the twist: that shiny interface comes with architectural baggage.

🎭 UX vs Truth: The Philosophical Divide

Let’s make it simple:

  • Flux: opinionated, minimal, Kubernetes-native.

  • ArgoCD: flexible, UI-rich, Kubernetes-compatible.

It’s not that one is better in all contexts — but if your goal is KRM-native GitOps, one of these tools aligns more deeply with Kubernetes' design philosophy. And it’s not the one with tabs and graphs.


🔐 Permissions: Trust the API, Not the App

Let’s start with RBAC.

  • Flux delegates everything to Kubernetes’ RBAC. → No weird backdoors. No duplicated permission model. → You use kubectl auth can-i and you're good.

  • ArgoCD introduces its own access model, layered on top of Kubernetes, allowing users to perform actions through the UI regardless of their in-cluster permissions.

Sounds convenient? It is — until it breaks. Ask anyone who’s watched a dev accidentally delete production workloads via the ArgoCD UI on a Friday evening.

True story:

A team pushed a change to ArgoCD, edited something live through the web UI (against best practice), and broke half the platform — with a 24h delay due to drift reconciliation. Guess what day that 24h landed on? Yes, Sunday.


🔄 Pull vs Push: Who Owns the Truth?

The heart of GitOps is that Git is the source of truth. Not a user, not a dashboard, not a UI — Git.

  • Flux operates in pull mode. It watches Git, reconciles continuously, and applies changes. It's quiet, humble, but rock-solid.

  • ArgoCD also reconciles from Git, but its API server, UI, and CLI invite push-style workflows: manual syncs, overrides, and imperative app management. Many teams end up operating it that way for flexibility and speed.

The risk?

You now have a mutable control plane, accessible to humans (and possibly CI systems), with broad permissions and real-time mutation of cluster state.

Let me ask you:

  • Do you trust a web-accessible application to hold the keys to all your environments?

  • Have you heard of Zero-Day vulnerabilities?

  • Can you prove ArgoCD doesn’t have one right now?

If your GitOps controller has a CVE, you want it isolated and read-only, not sitting in front of your cluster with god-mode enabled.


🖥️ UX and Adoption: Argo’s Secret Weapon

Let’s be honest: ArgoCD is winning hearts with its UI. It looks great, it demos well, and teams feel productive.

  • ✅ Visual diffing

  • ✅ One-click sync

  • ✅ Health indicators

  • ✅ Status dashboards

But that convenience comes at a cost:

  • It pulls decision-making away from Git

  • It encourages click-ops (which is not GitOps)

  • It breaks audit trails unless you’re extremely disciplined

Meanwhile, Flux is… well, a CLI-first tool. It integrates beautifully into GitHub Actions, GitLab CI, and pipelines. It runs as native Kubernetes controllers. It emits events. It integrates with SOPS, Kustomize, OCI, Helm, multi-tenancy, and operator models.

Flux doesn’t look good. Flux works well.

🛠️ Tip: Want a UI for Flux? Use Headlamp — it’s clean, RBAC-respecting, and composable.


⚡ Event-Driven vs Polling

Another key difference: Flux is event-driven. ArgoCD polls.

This means:

  • In ArgoCD, you set a sync interval (e.g. 3 minutes) and wait.

  • In Flux, a webhook receiver turns a new Git commit into an event, and reconciliation kicks off almost immediately, with no waiting for the next poll.

Result? Flux is faster, more reactive, and more Kubernetes-native.

Remember, Kubernetes isn’t about scheduling cron jobs — it’s about reactive convergence.
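
To be precise, the instant reaction comes from Flux's notification-controller: you expose a webhook Receiver, point your Git provider at it, and a push triggers reconciliation right away instead of waiting for the next poll. A minimal sketch, assuming a GitHub webhook and a GitRepository named app-manifests:

apiVersion: notification.toolkit.fluxcd.io/v1
kind: Receiver
metadata:
  name: github-push
  namespace: flux-system
spec:
  type: github
  events:
    - "push"
  secretRef:
    name: webhook-token          # shared secret used to validate incoming webhooks
  resources:
    - apiVersion: source.toolkit.fluxcd.io/v1
      kind: GitRepository
      name: app-manifests        # hypothetical source to reconcile on each push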


🔐 Secrets Management: Flux is Just Smarter

Managing secrets in GitOps is painful. Flux embraces this by integrating with SOPS out-of-the-box.

  • Your secrets are encrypted at rest, as YAML, using age, PGP, or cloud KMS keys.

  • They’re committed to Git — but safely.

  • They’re decrypted only inside the cluster.

  • It’s deterministic, auditable, and works in CI.
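
The wiring on the Flux side is a single field on the Kustomization. A minimal sketch, assuming an age key stored in a Secret named sops-age (all names are illustrative):

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: payment-apps
  namespace: flux-system
spec:
  interval: 10m
  path: ./clusters/prod-eu-west1/payment
  prune: true
  sourceRef:
    kind: GitRepository
    name: platform-repo          # hypothetical Git source
  decryption:
    provider: sops               # decrypt SOPS-encrypted manifests inside the cluster
    secretRef:
      name: sops-age             # Secret holding the age (or KMS) private key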

ArgoCD’s approach?

"GitOps is not for secrets." — Literally in the docs

Instead, ArgoCD punts the problem to vaults, external syncs, or side-channel tooling. Good luck stitching that into a unified deployment model.


😬 Real-World Consequences

Teams love ArgoCD... until something goes wrong:

  • A UI-clicked override drifts from Git → reconciles late → breaks production.

  • A misconfigured policy allows access too broad.

  • A template gets edited without version control.

  • Someone thinks “it’s just one field” — and suddenly your logs are gone.

I've seen it. You probably have too.


✅ Flux: What You See Is What You Deploy

In Flux:

  • Your repo is the truth.

  • What you commit is what you get.

  • You can run everything in pull-only mode, air-gapped, hardened, and clean.

  • No web dashboard can surprise you.

It aligns with Kubernetes principles:

  • Controller reconciliation

  • Declarative state

  • RBAC-first security

  • Separation of concern

Flux isn't just a tool. Flux is a philosophy: "Do it right, or don't do it."


🛠️ Helm vs Kustomize — or Why Templates Are the Wrong Kind of Magic

If the battle between ArgoCD and Flux is about philosophy, then Helm vs Kustomize is about craftsmanship.

Helm is popular. Extremely popular. It gives teams the illusion of speed, reusability, and structure. But under the hood… it’s often a template-driven minefield that explodes under pressure.

Kustomize? It’s quieter, more opinionated, and sometimes frustrating at first. But once you master it, you’ll never look back.

Let’s break it down.


🧙‍♂️ Helm: The Illusion of Simplicity

Helm is like JavaScript frameworks: It solves everything at once — until it doesn't.

  • Go templating in YAML? ✔️

  • Global variables passed through values files? ✔️

  • Conditional logic? ✔️

  • Complex rendering paths? ✔️

  • Debugging YAML using --dry-run and --debug while praying? ✔️

Yes, Helm gives you superpowers. But they come with a heavy price: long-term complexity.

🚨 Real-world pain:

  • You try to add a simple flag to a chart.

  • That flag breaks another team’s config.

  • You introduce if statements to handle use cases.

  • You suddenly have 20 conditionals.

  • Someone tries to “fix” it with a subchart.

  • Now your CI is broken and nobody knows why.

And when it breaks, guess who gets paged at 2 AM?


🪜 Kustomize: Demanding, But Honest

Kustomize takes a different approach.

Instead of rendering templates, it composes resources.

  • No logic.

  • No if/else.

  • Just overlays, patches, and transformers.

It’s declarative all the way down.

This makes it a bit harder at first — you have to understand the model, plan your structure, write your base properly. But once that’s done?

  • ✨ It never breaks in surprising ways.

  • ✨ You can read it.

  • ✨ Your successor can read it.

  • ✨ Even a robot can read it.
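
Here is what that composition looks like in practice: a production overlay that reuses a shared base and patches only what differs (paths and names are illustrative):

# overlays/prod-eu-west1/kustomization.yaml: no logic, just composition
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base                   # shared, environment-agnostic manifests
patches:
  - path: replicas-patch.yaml    # bumps replicas for production, nothing else
    target:
      kind: Deployment
      name: api-gateway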


🧩 Composition vs Templating

Here’s the key insight:

Templates require interpretation. Compositions are just configuration.

In incident response scenarios, simplicity saves lives. If you’ve ever had to troubleshoot a Helm chart with five nested conditionals while your SLO burns… you know what I mean.

Kustomize reduces the cognitive load. You can run kustomize build and see the actual state.

There’s no mental projection. What you see is what Kubernetes gets.


😴 The Helm Trap: Laziness Masquerading as Speed

Here’s a dirty secret: Helm is used everywhere not because it’s good — but because it’s easy to start.

Most companies:

  • Create a "shared" Helm chart to rule them all.

  • Add hundreds of conditionals for different apps.

  • Use it everywhere until it becomes unmaintainable.

Then the SRE team spends the next 6 months trying to refactor it.

🧨 Ask yourself:

  • How many incidents came from Helm chart changes?

  • How many secrets leaked due to template rendering?

  • How many developers blamed “Kubernetes” when Helm was the real villain?

I’ve seen dozens of platforms brought down by Helm chart complexity.


😬 YAML Is Already Hard Enough

Let’s be honest: YAML is nobody’s idea of a pleasant format.

Now add Go templating, indentation-sensitive logic, and nesting.

🔁 The result? You write YAML inside strings inside Go templates inside YAML. What could possibly go wrong?

Kustomize avoids all this. It’s not sexy. But it’s clean.

And isn’t that what we all want when we’re troubleshooting prod?


⚠️ When You Have to Use Helm

There’s one legitimate use case for Helm: when the vendor gives you a chart and you can’t avoid it.

Sometimes it’s certified. Sometimes it’s the only supported install. Sometimes it’s so tightly coupled that writing your own manifests would be insane.

That’s fine.

But even here, Flux wins again.

Why?

  • Flux uses the official Helm library.

  • Flux manages Helm releases natively.

  • Flux supports proper dependency tracking with HelmRelease and HelmChart.

Meanwhile, ArgoCD… renders the chart via helm template and applies the output.

That’s right: Argo doesn’t actually respect the Helm API. It fakes it.
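
For reference, here is a minimal sketch of how Flux drives a vendor chart natively through its Helm controller (the ingress-nginx chart is just an example; API versions depend on your Flux release):

apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: ingress-nginx
  namespace: flux-system
spec:
  interval: 1h
  url: https://kubernetes.github.io/ingress-nginx
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: ingress-nginx
  namespace: flux-system
spec:
  interval: 30m
  chart:
    spec:
      chart: ingress-nginx
      version: "4.x"
      sourceRef:
        kind: HelmRepository
        name: ingress-nginx
  values:
    controller:
      replicaCount: 2

Because releases go through the Helm SDK, Helm's release history and rollback machinery stay intact.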


🎻 Helm as an Orchestrator? Not So Fast…

Here’s another dangerous pattern:

Teams use Helm not just as a templating engine — but as an orchestrator.

They define install order, dependencies, retries, hooks, all inside Helm.

But here’s the problem:

  • Kubernetes is already an orchestrator.

  • Helm is now orchestrating resources that are also orchestrated.

  • You end up with reconciliation of reconciliation.

This is a recipe for drift, duplication, and confusion.


🤖 Real GitOps Orchestration = Operators

If you need orchestration — real, intelligent dependency-aware reconciliation — you should be using operators.

Operators are first-class Kubernetes citizens. They expose CRDs. They follow the controller pattern. They respect convergence logic.

And the best part?

They integrate cleanly with GitOps.

You can declare operators and their CRDs in Git, reconcile them with Flux, and everything behaves as it should.

Don’t fight the model. Embrace it.


💡 Building Operators Is Easier Than You Think

Afraid of writing operators? Don’t be.

You don’t even need Go — unless you want to.

A well-built operator lets you encapsulate logic once, and reuse it safely. No more Helm hacks. No more CLI glue. No more "just rerun the job."


🆚 GitOps vs Gitless GitOps — or How OCI and Security Are Shaping the Next Frontier

Ah, GitOps. We love it.

  • 🧠 Git as the single source of truth.

  • 🛠️ CI builds the artifact.

  • 🤖 CD tools (like Flux or ArgoCD) sync the declared state into Kubernetes.

But recently, a new flavor has emerged from the shadows of KubeCon EU 2025 — something both familiar and radical: Gitless GitOps.

Let’s unpack it.


🛢️ Gitless GitOps: What Is It?

In short, Gitless GitOps replaces Git repositories with OCI (Open Container Initiative) artifacts stored in container registries.

Instead of syncing YAMLs from Git:

  • CI builds your manifests (e.g. with Kustomize or Helm).

  • The result is packaged as an OCI artifact.

  • That artifact is stored in a registry like ghcr.io, gcr.io, Harbor, Artifact Registry, etc.

  • CD tools pull and apply those OCI bundles directly.

The source of truth is now a signed, immutable artifact, not a Git branch.
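
As a sketch of the CI side (GitHub Actions syntax; the registry path and the installer action are assumptions, and registry authentication is omitted):

# Hypothetical job that packages rendered manifests as an OCI artifact
jobs:
  publish-manifests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: fluxcd/flux2/action@main         # installs the flux CLI
      - name: Push manifests to the registry
        run: |
          flux push artifact oci://ghcr.io/my-org/app-manifests:${GITHUB_SHA} \
            --path=./deploy \
            --source="https://github.com/my-org/app-manifests" \
            --revision="${GITHUB_SHA}"
          # optionally: cosign sign ghcr.io/my-org/app-manifests:${GITHUB_SHA}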


🔒 Why Change What Works?

Because Git — for all its greatness — has some downsides in production-grade supply chains:

🔐 1. Software Supply Chain Security

OCI-based workflows allow:

  • Artifact signing with cosign

  • Vulnerability scanning

  • SBOM (Software Bill of Materials) publishing

  • Provenance tracking with SLSA levels

This makes Gitless GitOps ideal for high-compliance environments: finance, healthcare, defense.

With Git, validating that a YAML wasn’t tampered with is hard. With OCI, it’s cryptographically verifiable.

🧱 2. Registry Replication and Edge Resilience

Unlike Git servers, OCI registries are easily replicated across regions and zones.

This is a huge win for:

  • 🌍 Edge environments

  • 🛰️ Air-gapped systems

  • 🧪 Disaster recovery

Need to deploy the same manifest to 50 edge clusters? OCI replication is your friend.

🔐 3. No Git Access Needed in CD

In traditional GitOps:

  • Your controller pulls from Git.

  • You manage SSH keys, PATs, OAuth tokens.

  • You hope nobody hijacks them.

In Gitless GitOps:

  • CD tools pull read-only, prebuilt artifacts from your registry.

  • No Git credentials needed.

  • Access is managed via OCI scopes and short-lived tokens.

Your attack surface just shrank. 🎯


🚀 Performance Gains

Gitless GitOps eliminates git pull overhead. No more checking out full repos. No more parsing hundreds of YAMLs on every sync.

You fetch a single artifact, signed, validated, and ready to go.

That makes Gitless fast. And in CD, speed = safety.


🔁 But Is It Still GitOps?

Yes… and no.

You still:

  • Maintain a declared desired state

  • Rely on controllers to apply it

  • Expect reconciliation to ensure convergence

But you don’t need direct access to Git anymore.

So is it still "GitOps"?

  • 👉 Philosophically: yes.

  • 👉 Technically: Git is no longer a hard requirement.


🛠️ Flux Embraces Gitless

One more reason FluxCD shines?

Flux has first-class support for OCI artifacts:

  • You can define OCIRepository resources

  • Reference them in Kustomization

  • Integrate signing, verification, promotion
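
A minimal sketch of the consumer side (registry URL and names are illustrative; on recent Flux releases the OCIRepository API may be v1):

apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: OCIRepository
metadata:
  name: app-manifests
  namespace: flux-system
spec:
  interval: 5m
  url: oci://ghcr.io/my-org/app-manifests
  ref:
    tag: v1.2.3
  verify:
    provider: cosign             # refuse to apply unsigned or tampered artifacts
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: app
  namespace: flux-system
spec:
  interval: 10m
  path: ./
  prune: true
  sourceRef:
    kind: OCIRepository
    name: app-manifests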

ArgoCD? Not so much. Support is experimental, fragmented, and mostly community-driven.

If you're serious about OCI-based GitOps, Flux is miles ahead.


⚠️ Gitless ≠ Effortless

Let’s be clear: migrating to Gitless GitOps isn’t trivial.

  • Your CI must build and push artifacts.

  • Your CD must understand OCI bundles.

  • Your observability and rollbacks need new strategies.

  • You need to rethink traceability (since Git history is out of the loop).

It’s a tradeoff — more secure and scalable, but more work upfront. Perhaps your platform deserves it?


🤝 Hybrid GitOps: The Best of Both Worlds?

One common pattern today:

  • Use Git during dev and QA

  • Promote to OCI for staging and production

This gives you:

  • 🔍 Visibility in dev

  • 🔐 Security in prod

  • 🔁 A clear promotion pipeline

In short: GitOps for humans, Gitless for machines.


🧭 TL;DR: Gitless GitOps is not a replacement — it’s an evolution. It’s GitOps with OCI-native packaging, better security, and production-ready practices.

And it’s already supported, out of the box, in Flux. Not ArgoCD.


🧩 Platform Engineers Are Just Glue Professionals — So Use Less Glue

If you’ve been in platform engineering for more than five minutes, you’ve probably realized this:

Our real job isn’t writing code — it’s eliminating chaos with structure.

And we do that by gluing tools together. CI to CD. Git to infra. Secrets to apps. DNS to services.

But here’s the trap: The more glue you use, the more brittle your platform becomes.


🧪 The Less Glue, the Better

🧠 Rule of thumb:

The best glue is the one you don’t need.

In a robust GitOps platform, you want to minimize tool overlap:

  • ✅ One Git repository per purpose (infra, apps, etc.)

  • ✅ One CI (e.g., GitHub Actions, GitLab CI)

  • ✅ One CD (Flux 🤓)

  • ✅ One model (KRM)

  • ✅ One state source (Git or OCI)

Every time you add "just one more tool" to fill a gap — you’re introducing:

  • An interface to maintain

  • A security surface

  • A cognitive cost

  • A new source of truth (aka a new place for bugs to hide)

Platform engineering isn’t about building Rube Goldberg machines. It’s about composing simple tools in the simplest possible way.


🗂️ Namespace Naming: Your First UX

Kubernetes is namespaced. That's not optional — it's foundational.

Bad names = bad UX. And not just for humans.

Let’s look at this DNS behavior:

  • foo → resolves foo in the current namespace

  • foo.bar → resolves foo in namespace bar

  • 👉 That’s deterministic.

  • 👉 That’s predictable.

  • 👉 That’s your friend.

Now, here’s what not to do:

  • 🚫 mynamespace-dev-staging-prod

  • 🚫 namespace-team1-infra-env8

Why?

  • It breaks standard patterns

  • It confuses discovery

  • It makes automation harder

  • It signals that you’re overloading one cluster with multiple environments 😬


☠️ The Dangers of Multi-Env Clusters

Some teams think they’re saving money by putting dev, staging, and prod in the same cluster.

They aren’t.

Instead, they’re paying with:

  • 💥 Security incidents

  • 🔥 Accidental rollouts

  • 👻 Ghost errors from shared resources

  • 🤯 Complex pipeline logic

Clusters are cheap. Engineers aren’t. Incidents are expensive. So are lawsuits.

Use separate clusters per environment.

You want:

  • Cluster: prod-eu-west1

  • Namespace: payment

  • App: api-gateway

🧭 Then your Git path becomes clusters/prod-eu-west1/payment/api-gateway.

And your KRM applies cleanly:

  • Namespace: payment

  • Kustomization: api-gateway

  • GitPath: clusters/prod-eu-west1/payment/api-gateway

🧩 It’s systematic, deterministic, CI-friendly, and human-readable.


🧠 Propagate Git Paths to Kubernetes

The best GitOps strategy?

Let your Git structure define your cluster structure.

That way:

  • You reduce complexity

  • You reduce errors

  • You make it impossible to misroute a deployment

Want your CI to build the right artifact, push it to the right cluster, and apply it in the right namespace?

  • 👉 You don’t need config files.

  • 👉 You need consistent paths.

This also enables:

  • ✅ Predictable rollout logic

  • ✅ Easier observability

  • ✅ Better access control

  • ✅ Cleaner permission boundaries


📦 Context Is King

Avoid repeating yourself — especially in YAML.

Use ConfigMaps to define base primitives like:

  • projectId

  • environment

  • region

  • sharedVPC

  • dnsZone

Let them cascade through overlays.

📌 Declare it once. Reference it everywhere.

That way, when your GCP project changes? You update one line — not 400 YAML files.
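
A minimal sketch of such a context ConfigMap (values are illustrative); with Flux, a Kustomization's postBuild.substituteFrom can then feed these values into your manifests as variables:

apiVersion: v1
kind: ConfigMap
metadata:
  name: platform-context
  namespace: flux-system
data:
  projectId: my-gcp-project      # change it here, and only here
  environment: prod
  region: europe-west1
  sharedVPC: shared-vpc-prod
  dnsZone: prod.example.com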

It’s also your golden ticket to multi-region deployments. Or your Disaster Recovery (DR) strategy.

Sometimes, without realizing it, you’re writing your Business Continuity Plan... in YAML. You're welcome, CIO. 😎


🧪 YAML Generation Isn’t Evil

Let’s bust a myth:

“Everything must be handwritten, pure YAML.”

False.

It’s okay to generate YAML using shell scripts, as long as:

  • It’s deterministic

  • It’s idempotent

  • It produces valid KRM

Your Kustomize overlays can be built by:

  • Bash scripts

  • Jinja templates

  • Jsonnet (if you're brave)

Just don’t forget: the generated result must still be declarative.

Don’t introduce imperative magic. Do generate clarity.


🧬 DRY + GitOps = Power

If your YAMLs are structured well, with common base values and environment overlays:

  • You only define secrets, policies, labels, resource limits once

  • You eliminate repetition

  • You remove ambiguity

  • You gain auditability

This makes your platform easy to replicate, easy to test, and easy to scale.

Whether you're deploying to dev, staging, prod, or eu-west2, it's just a matter of flipping context variables.

✨ Context is everything. Treat it like gold.


⚙️ Isoproduction, Isoproduction, Isoproduction — Drift Is the Root of All Evil

If you take one thing away from this article, let it be this:

"The greatest cost in cloud is not the infrastructure. It’s the humans managing it — and the incidents they cause."

Every platform team learns this the hard way. And one of the biggest silent killers is drift — the divergence between environments.


🚨 What Is Drift?

Drift is what happens when:

  • QA is “almost like” production

  • Staging has that one feature flag left on

  • Prod has that secret config someone hotfixed last week

  • The CI pipeline applies a different branch than intended

The problem? Most of these issues don’t show up until it’s too late.

And when things explode in prod, people start asking:

“Why didn’t we catch this in staging?”

The answer is always:

“Because staging wasn’t prod enough.”


🧪 Isoproduction = Predictable Everything

🧠 Isoproduction means:

  • Same Kubernetes version

  • Same config baseline

  • Same IAM policies

  • Same manifests

  • Same secrets logic

  • Same CI/CD pipeline structure

  • Different cluster names, but identical behavior

If your environments drift, you're testing a lie. You're testing something that doesn’t exist in production.

This leads to:

  • False positives

  • Missed failures

  • Unreliable tests

  • Painful rollouts


🛠️ Fail Fast, Fix Fast

Don’t aim for "stable environments."

Aim for environments that are:

  • 🔁 Easily recreated

  • 🔍 Constantly tested

  • 💥 Able to break safely

  • 🔄 Quickly fixed

If staging never breaks, it means you’re not pushing changes fast enough.

Production should be boring. Your test environments should be chaotic. That’s what they’re for.


📦 Apply Small, Frequent Changes

Want to avoid catastrophic failures?

Apply small changes. Apply them often. Apply them everywhere.

If you push a config change:

  • Do it in QA

  • Wait

  • Push to staging

  • Wait

  • Push to prod

  • Rejoice

Tiny changes = tiny blast radius. Big changes = Big regrets.

💡 In GitOps, each commit should be small enough to understand, easy to revert, and safe to promote.


🔄 The 12-Factor Alignment

In a KRM-native GitOps model, your stack should resemble the 12-factor app philosophy:

“An app is one deployable unit + its configuration.”

That means:

  • The image, YAML, secrets, and env are one unit of work

  • Every deployment = everything it needs

  • Context is read from the cluster

  • Configuration is constant — code and behavior may change, but config doesn’t drift

This allows:

  • Reliable rollbacks

  • Deterministic tests

  • Stable metrics

  • Scalable CI pipelines


🗂️ Central or Decentralized Repos?

Now the classic question:

“Should we keep all the YAMLs in one central repo, or colocate them with the apps?”

Let’s explore both sides.


📁 Centralized GitOps Repo

Pros:

  • Easier to audit

  • Easier to rollback

  • One revert = config, image, secrets all roll back

Cons:

  • Requires trust and discipline

  • Devs might feel “disconnected” from ops

But with GitOps, this isn’t a problem — it’s the whole point.

You want everything to live together, because you want to move together.

Example:

One commit = new app version + new alert rule + new firewall rule + updated secret
One revert = rollback it all

That’s GitOps nirvana.


🧩 App-Centric Repos

Pros:

  • Teams own their destiny

  • Smaller surface per repo

  • Aligned with microservices

Cons:

  • Rollbacks span multiple repos

  • Harder to enforce consistency

  • Changes are fragmented

Worse: you can’t atomically promote one change across envs unless you script the glue. (And you don’t want glue, remember? 🧪)


🧨 The Mission: Not So Possible

Let me describe the horror:

You’ve got:

  • Secrets in one repo, or in an external secret manager like Vault

  • App manifests in another

  • Infra config in a third

Now someone asks for a rollback. You’re stuck doing 3 PRs, across 3 repos, synced by timestamp.

🎯 You need Mission-Impossible-style watch syncing just to revert production safely.

Why is this common?

Because some GitOps tools (👀 ArgoCD) don’t scale well with too many Git repos. So teams consolidate YAMLs into massive, chaotic central repos — or worse, accept the drift.

Flux? It supports dozens or hundreds of Git sources and OCI bundles. No problem.


🔐 Secrets Management in GitOps — How to Do It Without Crying

Managing secrets in GitOps is a lot like parenting teenagers:

They don’t belong in public. They change constantly. And one wrong move can lead to absolute chaos. 😬

But in a declarative world, secrets are just another resource. And they must be treated with the same rigor as everything else.

The problem? Not all tools agree on what “good” secrets management looks like.

Let’s dig in.


❌ What Not to Do

Here’s what bad secrets management looks like in GitOps:

  • Keeping unencrypted secrets in Git (yes, it still happens 😱)

  • Storing encrypted blobs that can’t be validated or rotated

  • Managing secrets outside Git and triggering rollouts manually

  • Spreading secrets across multiple systems with no traceability

  • Committing secrets, then removing them later, hoping CI didn’t leak them in logs

This isn’t just fragile — it’s dangerous.

Secrets deserve first-class treatment. Not "side-channel" status.


🤐 SOPS + Flux: A Match Made in GitOps Heaven

SOPS (Secrets OPerationS) by Mozilla is the gold standard for secrets in GitOps.

Here’s how it works with Flux:

  1. Encrypt your Secret.yaml file with SOPS

  2. Commit the encrypted YAML to Git

  3. Flux decrypts it inside the cluster, just before applying it

Benefits:

  • 💥 No plaintext secrets in Git

  • 🔐 Git history is safe

  • 🛡️ Secrets decrypted only where needed

  • ✅ Everything remains declarative

  • 🧾 Fully auditable and traceable

Example:
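
An abridged, illustrative sketch of what lands in Git after running sops --encrypt (the sops metadata block is truncated here):

# secret.yaml, encrypted with an age key: only ciphertext is committed
apiVersion: v1
kind: Secret
metadata:
  name: billing-db-credentials
  namespace: payment
type: Opaque
stringData:
  password: ENC[AES256_GCM,data:...,iv:...,tag:...,type:str]
sops:
  age:
    - recipient: age1examplepublickey...      # public key used for encryption
  encrypted_regex: ^(data|stringData)$
  version: 3.9.0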

That’s safe, versioned, and production-ready.


🚫 ArgoCD: “Secrets? Not Our Problem.”

Now let’s talk about the elephant in the room.

ArgoCD’s official stance on secrets is:

“GitOps isn’t for secrets.” (Source)

Their guidance?

  • Use external secret managers

  • Sync them into your cluster via Vault injectors, sidecars, or CRDs

  • Don’t keep secrets in Git

That’s… fine. But it breaks the GitOps model:

  • You lose auditability

  • You lose repeatability

  • You lose traceability

  • You need extra tooling and scripting

In other words: more glue, more complexity, and less confidence.


🛠️ GitOps Means All State — Including Secrets

You can't say “Git is my source of truth”… and then exclude the most sensitive, most operationally critical parts of your system.

Secrets must be:

  • Versioned

  • Revertible

  • Declarative

  • Environment-aware

  • Safely encrypted

  • Locally decryptable by automation only

And that’s exactly what SOPS + Flux delivers.


🔄 Secrets Rotation and CI

Worried about secret rotation?

Here’s a common pattern:

  • Your CI pipeline rotates secrets on a schedule

  • It re-encrypts the Secret.yaml using SOPS

  • It commits the result to Git

  • Flux detects the commit, reconciles automatically

Bonus:

  • You can build GitHub Actions or GitLab CI jobs to rotate, reencrypt, and commit in one go

  • You maintain full audit history

  • You avoid manual sync hell
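
A hedged sketch of such a job (GitHub Actions syntax; paths, the yq edit, and the rotation step itself are assumptions, and the SOPS key configuration for CI is omitted):

name: rotate-db-password
on:
  schedule:
    - cron: "0 3 1 * *"                        # once a month
jobs:
  rotate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Rotate and re-encrypt the secret
        run: |
          NEW_PASSWORD=$(openssl rand -base64 32)
          # rotate the credential in the backing system here (provider-specific, omitted)
          sops --decrypt --in-place secrets/billing-db.yaml
          NEW_PASSWORD="$NEW_PASSWORD" yq -i '.stringData.password = strenv(NEW_PASSWORD)' secrets/billing-db.yaml
          sops --encrypt --in-place secrets/billing-db.yaml
      - name: Commit the rotated secret
        run: |
          git config user.name "rotation-bot"
          git config user.email "rotation-bot@example.com"
          git commit -am "chore: rotate billing-db password"
          git push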


🔁 One Commit to Rule Them All

This brings us back to a key GitOps principle:

"Everything that changes together, should live together."

That includes:

  • The app image version

  • Its configuration

  • Its secrets

  • Its context

All in one Git commit.

So when you deploy v1.2.3 of your app:

  • The config is correct

  • The secret is in sync

  • The RBAC is ready

  • The DNS is pointed

  • The Git history reflects it all

💣 And when you roll it back?

One Git revert. One change. Full state reversal.

That’s GitOps at its finest.


💬 Bonus Tip: Don’t Be Afraid of Secret YAMLs

Some engineers get nervous when they hear "secrets in Git".

Here’s the truth:

  • SOPS-encrypted secrets are useless to attackers without the key

  • Git histories are auditable (a plus for compliance)

  • GitOps gives you change history — even for secrets

  • All of it is KRM-compliant, so tools like Flux or AI agents can reason about it

  • Git supports hooks, and the SOPS documentation describes how to prevent unencrypted secrets from being committed.

Don’t fear secrets in Git.

Fear secrets that nobody understands, documents, or can rotate.


⚖️ Drift: The Silent Killer — Why Precision Is an Ops Superpower

Let’s be real for a second.

🧑‍💻 Developers love freedom.

🧑‍🔧 Operators love predictability.

Both are valid. But when you're responsible for running production — precision is everything.

Because if your definition doesn’t match what’s running in your cluster…

💥 Drift happens.


💥 What Is Drift?

Drift is the difference between:

  • What you think you’ve deployed (your Git state), and

  • What’s actually running in Kubernetes

It’s the invisible gremlin that causes:

  • 🔍 Debugging nightmares

  • 🧪 Broken tests

  • 📉 Observability lies

  • 🔄 Unexpected behavior after restarts

  • 😤 “But it worked yesterday!” moments

Drift is the antithesis of GitOps. And most teams don’t even realize how much drift they have — until it’s too late.


🔄 Why Does Drift Happen?

Drift creeps in when:

  • Someone edits a resource live via kubectl or a UI

  • A secret is updated out-of-band

  • A CI pipeline applies something but forgets to commit

  • Manual patches are made during incidents and never reconciled

  • Resources are generated by tools that don’t round-trip back to Git

The result? Your source of truth is no longer… true.


🛠️ GitOps Tools: How They Handle Drift

Let’s talk about GitOps tools and their attitude toward drift. Broadly, they fall into two groups: those that reconcile continuously and revert drift automatically by default (Flux, Config Sync), and those that flag resources as out-of-sync and wait for a manual sync or an opt-in self-heal setting.

If you value trust in your system, you want the first group. If you enjoy chasing ghosts, go with the second.


🧠 Operators Think in Systems

Here’s the thing:

For devs, failure is usually about logic bugs.

For ops, failure is usually about state mismatch.

Devs think:

  • “Why is this function not returning the right thing?”

Ops think:

  • “Why does this resource exist in staging, but not in prod?”

  • “Why is this secret different from what’s in Git?”

  • “Why is this config map still here when it was deleted 2 weeks ago?”

These aren’t bugs in code. They’re bugs in infrastructure reality.


🎯 Precision = Confidence

In Ops, precision is survival.

  • One wrong RoleBinding → your app can’t talk to the DB

  • One missing firewall rule → traffic drops silently

  • One env var typo → app crashes

  • One leftover volume → costs spike

  • One mismatched secret → login fails in production

This is why tools like Flux and Kustomize matter so much. They give you strong guarantees:

  • ✅ What’s in Git is what’s in the cluster.

  • ✅ No drift. No surprises.

  • ✅ No mystery state.


🔒 Config Sync: The Drift Hammer

Want to be even stricter?

🛡️ Google’s Config Sync includes an admission controller that:

  • Rejects manual edits to declared resources

  • Enforces Git as the only allowed input

  • Blocks kubectl apply if it’s not coming from Git

This is the no-compromise approach to drift prevention.

It may sound rigid — but in regulated environments, it’s a blessing.


🎵 You Can’t Always Get What You Want…

Let’s borrow from the Rolling Stones:

“You can’t always get what you want, But if you try sometimes, you just might find... You get what you need.”

Kustomize + Flux = exactly that.

No templating noise. No guessing. No secret --dry-run logic. Just pure, declarative infrastructure — that stays declared.

That’s what you need.


💡 Drift Is Optional — If You Choose the Right Tools

To summarize:

  • If you're okay with surprises: use ArgoCD + Helm

  • If you want calm, clean, testable infra: use Flux + Kustomize

  • If you’re aiming for AI-OPS or auto-remediation: drift = your enemy

You can’t build smart, autonomous, resilient systems on sand. And drift is sand.


🚀 How to Migrate from ArgoCD + Helm to Flux + Kustomize — And Why You Should

If you've read this far, you’re probably already convinced — or at least curious.

You want to move from ArgoCD + Helm to Flux + Kustomize, but you might be wondering:

  • 😬 “Is it worth the pain?”

  • 🤯 “How do I even start?”

  • 🧠 “How do I explain this to my team?”

Let’s unpack it all.


❓ Why Should You Migrate?

Because it works better. That’s the short version.

But here’s the long one:

✅ 1. You eliminate the templating trap

  • No more go-templates nested in YAML hell

  • No more “where does this value even come from?”

  • Just readable, declarative manifests

🔐 2. You gain stronger security posture

  • No custom RBAC overlays

  • No UI overrides or click-ops

  • Just Git + Kubernetes + RBAC

🔄 3. You regain full reconciliation

  • Kustomize is stateless and drift-averse

  • Flux syncs declaratively, not by patching templates

  • Rollbacks are real — one git revert away

🧠 4. You align with Kubernetes' philosophy

  • Controllers manage state

  • CRDs describe it

  • RBAC secures it

  • Git drives it

This isn't opinion. It’s how Kubernetes is designed to work.


⚙️ How to Plan the Migration

Migration doesn’t have to be “big bang”. You can run ArgoCD and Flux in parallel. Seriously. They won’t fight — unless they touch the same resources.

🛠️ Step-by-step guide:

  1. Inventory your ArgoCD apps

  2. Choose a non-critical app to migrate first

  3. Extract the Helm values

  4. Translate to raw Kubernetes manifests

  5. Kustomize it

  6. Commit it in Git

  7. Create Flux Kustomization + GitRepository CRDs (see the sketch after this list)

  8. Watch Flux sync it

  9. Done. You’re now running Kustomize + Flux.

  10. 🎉 Delete the old ArgoCD Application when you’re ready
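
For step 7, a minimal sketch of the two CRDs involved (repository URL, paths, and names are illustrative):

apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: platform-repo
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/my-org/platform-repo
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: first-migrated-app
  namespace: flux-system
spec:
  interval: 10m
  path: ./clusters/prod-eu-west1/payment/api-gateway
  prune: true
  sourceRef:
    kind: GitRepository
    name: platform-repo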


📊 Tips for a Smooth Migration

  • Use kustomize build locally to verify overlays

  • Use kubeval or conftest to lint before applying

  • Compare live cluster state with kubectl get + diff tools

  • If you're using Helm-based charts from vendors: use Flux’s Helm controller temporarily

But eventually, the goal is: get out of Helm where you can. Templates are traps. Compositions are forever.


🗣️ How to Convince the Team

Developers may resist. ArgoCD’s UI is slick. Helm feels easier (until it isn’t).

Here's what to tell them:

  • 🧭 "This aligns better with Kubernetes itself."

  • 💥 "We reduce risk by reducing complexity."

  • 🧠 "You’ll understand what’s deployed without reading templates."

  • 🔄 "We gain proper rollbacks."

  • 🔐 "We improve our security posture."

  • 🧰 "We avoid having multiple sources of truth."

  • 🧘 "You’ll sleep better."

And if they push back?

“Only dead leaves go with the flow. It takes strength to swim upstream.”

Show them Headlamp as an alternative UI. Or better: teach them to trust their Git repo.


⚠️ Common Pitfalls

❌ Trying to migrate everything at once

This is a recipe for burnout and failure. Pick one app. Nail it. Then scale.

❌ Not enforcing reconcile only from Git

Letting devs kubectl apply random stuff after migrating breaks the model.

Use admission controllers like Config Sync or OPA Gatekeeper if needed.

❌ Leaving secrets out of the loop

Migrate secret management to SOPS + Flux too. Don’t bolt it on later.

❌ Misaligning repo structure

Stick to deterministic paths:
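
For example, reusing the cluster / namespace / app structure from earlier (the staging cluster name is illustrative):

clusters/
  prod-eu-west1/
    payment/
      api-gateway/
        kustomization.yaml
  staging-eu-west1/
    payment/
      api-gateway/
        kustomization.yaml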

Your CI, CD, and tooling will thank you.


🔄 This Isn’t Just a Migration — It’s an Upgrade

You’re not just switching tools. You’re shifting philosophies:

From:

🧩 Patchable templates + UI override culture + fragile pipelines

To:

🧱 Declarative convergence + controller logic + composable infra

You’re reducing glue. You’re enabling auditability. You’re building trustworthy automation.

That’s how you scale your platform.


🎯 Final Thoughts — KRM Is the Model, Flux Is the Engine, Git Is the Truth

Let’s step back and recap what really matters.

This isn’t just a tooling debate.

It’s a philosophical fork in the road for platform engineering.


🔧 You Can Build a Platform... Or You Can Glue One Together

Over the last decade, we’ve seen dozens of platform trends:

  • YAML

  • Helm

  • Terraform

  • Operators

  • GitOps

  • Service Meshes

  • Internal Developer Portals

  • AI-OPS

But the core question never changes:

“How do we safely, efficiently, and reliably ship software at scale?”

And in 2025, the best answer is:

KRM-native GitOps with FluxCD.

Why?

Because it offers:

  • 🧱 A unified model for all resources

  • 📦 Declarative infrastructure with no imperative hacks

  • 🔐 Security-first principles via native RBAC and pull-based sync

  • 📖 Auditability for every change — app, config, secret

  • 🤖 Machine-readability and compatibility with AI-based automation

  • 🚀 CI/CD alignment that’s repeatable, observable, and rollback-friendly

  • 😌 Calm operations that don’t depend on tribal knowledge or magical templates


🚫 Don’t Let UX Drive Your Architecture

Yes, ArgoCD looks great. Yes, Helm is easy… at first.

But short-term convenience creates long-term chaos.

Choose tools that align with your system’s foundations:

  • Kubernetes is about controllers, not dashboards

  • It’s about convergence, not overrides

  • It’s about declarative state, not imperative flows

If you want to build a reliable, scalable, understandable platform… You need to build with the same DNA as Kubernetes itself.

And that means:

  • KRM

  • FluxCD

  • Kustomize

  • GitOps


💬 A Personal Note

I’ve spent 10 years in production Kubernetes, with organizations like Radio France, BlaBlaCar, Brittany Ferries, BforBank, and LCL. I’ve lived the good, the bad, and the YAML-induced nervous breakdowns. I’ve maintained ArgoCD and Flux in production, and written thousands of lines of Helm charts. I even wrote my own YAML generator in Puppet 🤮 back when Helm didn’t exist yet.

I’ve trained hundreds of engineers (CKA/CKAD), conducted countless audits, and debated endlessly with tool maintainers. And I can tell you with confidence:

Simple wins. Declarative wins. KRM wins.

It’s not flashy. It’s not trendy. But it’s scalable, explainable, and deeply aligned with the system we trust to run our businesses.


💡 One Last Push

If you’re stuck in:

  • ArgoCD sprawl

  • Helm complexity

  • ClickOps confusion

  • Drift disaster

  • Secrets disarray

  • CI/CD overload

You don’t need a miracle. You need a model that works — and the courage to enforce it.

Start with one service. Move it to Flux + Kustomize. Use SOPS for secrets. Adopt Git as truth. Watch the chaos fade.


🗣️ Let’s Keep the Conversation Going

Already using Flux? Migrating away from ArgoCD? Curious about Gitless GitOps or Config Connector?

👇 Let’s connect and share war stories in the comments.

I’d love to hear:

  • What’s worked for you?

  • What blew up at 2am?

  • What’s your favorite GitOps anti-pattern?

Because if you’ve made it this far — you’re not just doing DevOps.

You’re building the next generation of platforms.

One YAML at a time.
