Self-Reinforcing Code Quality Pipeline: My 300% Productivity Boost in Coding with AI

In this series of posts, I want to introduce my Self-Reinforcing Code Quality Pipeline, which allows me to fully automate entire software projects and sub-projects with Cursor and Anthropic - without the usual issues everybody is talking about.

Before I talk about the Planner & Executor Model (coming in the next post), I want to share how I chained Cursor & Co. together to prevent them from deleting or breaking existing code - and how they even test their own code before confidently saying, “I’m done.”

The magic word: Self-reinforcing Pipelines for Testing & Code Quality


Rule #1: Test Coverage

In all backend systems, I enforce 90% test coverage - plus a rule that all tests must pass after every change. This forces Cursor to write tests on its own whenever coverage drops below the threshold. Tasks are only considered “done” once all tests pass and 90% coverage is reached. For frontend development, the coverage target may vary based on the overall structure.
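A minimal sketch of how that gate can be wired up, assuming pytest with the pytest-cov plugin and a src/ layout (both are assumptions, not necessarily my exact setup):

```toml
# pyproject.toml (sketch): fail the test run when coverage drops below 90%
[tool.pytest.ini_options]
addopts = "--cov=src --cov-fail-under=90"

[tool.coverage.report]
fail_under = 90
show_missing = true
```

With this in place, a plain pytest run (and any hook that calls it) exits non-zero as soon as coverage slips - exactly the signal Cursor needs to go back and write more tests.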

Rule #2: Linting + Error Detection

Especially with languages like Python and TypeScript, linting is not just helpful - it's essential for automated code generation. Otherwise, the AIs happily generate code that references things that don't exist or simply doesn't run. Anthropic's models sometimes invent method calls with non-existent names. If such broken code is wrapped in an async background process with a try-catch, you might not notice until much later.
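A hypothetical illustration of that failure mode (all class and method names below are made up): the hallucinated call is swallowed at runtime, while a static checker like mypy flags it immediately.

```python
import asyncio


class ApiClient:
    async def get_users(self) -> list[str]:
        return ["alice", "bob"]


async def refresh_cache(client: ApiClient) -> list[str]:
    try:
        # mypy flags this line: "ApiClient" has no attribute "get_users_v2"  [attr-defined]
        return await client.get_users_v2()
    except Exception:
        # The AttributeError is swallowed - the background job silently returns nothing.
        return []


print(asyncio.run(refresh_cache(ApiClient())))  # prints [], no crash, no data
```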

Tools like mypy and flake8 help detect and eliminate these issues early on. With constant linting (pylint), the code stays human-readable - for the times when we humans need to step in because the LLMs give up.

Rule #3: Enforce Rules via Pre-Commits

Since Cursor often likes to play the rebel and ignores rules, I’ve packed all these points into a pre-commit strategy.

This way, every commit automatically triggers various checks, making it impossible for untested, fragile, or broken code to enter the repo.

Pre-commit hooks are essentially rules enforced by Git itself. Of course, Cursor sometimes tries to cheat by committing with --no-verify. So there's also a rule in place that it's not allowed to do that. 😉
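As a concrete starting point, here is a minimal .pre-commit-config.yaml sketch that chains these rules together; the hook revisions and the src/ path are illustrative placeholders rather than my exact configuration.

```yaml
repos:
  - repo: https://github.com/pycqa/flake8
    rev: 7.1.1            # illustrative version
    hooks:
      - id: flake8
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.11.2          # illustrative version
    hooks:
      - id: mypy
  - repo: local
    hooks:
      - id: pytest-coverage
        name: pytest (90% coverage gate)
        entry: pytest --cov=src --cov-fail-under=90
        language: system
        pass_filenames: false
        always_run: true
```

After pre-commit install, every git commit runs the whole chain, and a single failing hook aborts the commit.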

Rule #4: Let Cursor Commit

It’s crucial that Cursor commits its own changes - because it then watches the entire linting & test suite run every single time!

When tests fail or coverage drops, Cursor instantly knows and must fix it before committing again. This creates a cascade of checks and tests that force Cursor to improve its output.

Sometimes this results in long brute-force sessions, where it keeps predicting and adjusting code until everything finally works and all checks pass. Voilà! Code quality, done right.

Side bonus: This process also generates much better, more detailed commit messages. You can feed these into a vector database or knowledge graph for future searches: “When exactly did we fix that rounding bug in the subscription module?”
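The vector database or knowledge graph is beyond a short snippet, but as a minimal stand-in that assumes nothing beyond Git itself, here is a plain keyword search over the commit history; swapping the keyword match for embeddings is the obvious next step.

```python
import subprocess


def search_commit_messages(keyword: str, repo_path: str = ".") -> list[str]:
    """Return commits (hash, date, subject) whose message mentions the keyword."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--pretty=format:%h %ad %s", "--date=short"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return [line for line in log.splitlines() if keyword.lower() in line.lower()]


# "When exactly did we fix that rounding bug in the subscription module?"
for hit in search_commit_messages("rounding"):
    print(hit)
```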

Rule #5: Whitelist Safe CLI Commands

For Cursor to run tests, linters, type checks, and searches on its own, I maintain a large whitelist of CLI commands that it's allowed to run automatically - no questions asked.

You could go full YOLO mode - but obviously, destructive commands like rm shouldn't be automated. Plus, Cursor sometimes gets shell quoting wrong (for example, mismatched double quotes), which can cause unintended side effects.

That’s why I stick to the whitelist approach:

Around 50 approved commands - from sed and grep to flake8, git add, git commit, etc.
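Cursor manages this allowlist in its own settings, so the snippet below is purely a conceptual sketch of the idea rather than how Cursor implements it: a tiny wrapper that refuses anything outside an approved set (the command lists are hypothetical examples).

```python
import shlex
import subprocess

# Hypothetical allowlist for illustration only - NOT Cursor's actual configuration.
SAFE_COMMANDS = {"pytest", "flake8", "mypy", "grep", "sed", "git"}
SAFE_GIT_SUBCOMMANDS = {"status", "diff", "log", "add", "commit"}


def run_if_whitelisted(command: str) -> int:
    """Run a command only if its executable (and git subcommand) is approved."""
    parts = shlex.split(command)
    if not parts or parts[0] not in SAFE_COMMANDS:
        raise PermissionError(f"not whitelisted: {command}")
    if parts[0] == "git" and (len(parts) < 2 or parts[1] not in SAFE_GIT_SUBCOMMANDS):
        raise PermissionError(f"git subcommand not whitelisted: {command}")
    return subprocess.run(parts, check=False).returncode


print(run_if_whitelisted("git status"))   # allowed
# run_if_whitelisted("rm -rf /tmp/x")     # would raise PermissionError
```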

Rule #6: CodeRabbit

The best tool in this pipeline is CodeRabbit. Our developers requested it long ago, and it works wonders for automated AI development.

CodeRabbit detects potential bugs or side effects that might otherwise go unnoticed. It’s incredibly valuable.

Currently, I feed every single CodeRabbit finding back into Cursor until CodeRabbit has nothing left to complain about. Even this process can now be automated.
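One way to close that loop, assuming CodeRabbit's findings arrive as ordinary GitHub pull-request review comments: pull the comment bodies via the GitHub REST API and hand them to the next Cursor run. The repository details, the GITHUB_TOKEN environment variable, and the bot-login filter below are assumptions for the sketch.

```python
import os

import requests  # third-party: pip install requests


def fetch_coderabbit_findings(owner: str, repo: str, pr_number: int) -> list[str]:
    """Collect review-comment bodies left by the CodeRabbit bot on a pull request."""
    url = f"https://api.github.com/repos/{owner}/{repo}/pulls/{pr_number}/comments"
    headers = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}
    comments = requests.get(url, headers=headers, timeout=30).json()  # pagination omitted
    # Assumption: the bot's login contains "coderabbit" (e.g. coderabbitai[bot]).
    return [c["body"] for c in comments if "coderabbit" in c["user"]["login"].lower()]
```

Each returned finding can then be pasted back into Cursor as a follow-up task until the list comes back empty.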

Rule #7: Custom Linting Helpers

flake8 can be extended with custom tools and plugins.

Example: Cursor sometimes generates unit tests that are actually integration tests - still hitting real APIs, using Supabase Cloud, and skipping mocks.

Or it inserts time.sleep() into tests for async processes, making everything painfully slow.

Sure, you can set Cursor rules to prevent this - but it still happens.

Thanks to custom plugins, you can write detectors that catch such cases and give Cursor a hard time.
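Here is a minimal sketch of such a plugin, targeting the time.sleep() case; the TST001 error code and the file-name heuristic are made up, and the class still needs to be registered under a flake8.extension entry point to take effect.

```python
import ast


class NoSleepInTestsChecker:
    """Flake8 plugin sketch: flag time.sleep() calls inside test files."""

    name = "flake8-no-sleep-in-tests"
    version = "0.1.0"

    def __init__(self, tree: ast.AST, filename: str):
        self.tree = tree
        self.filename = filename

    def run(self):
        if "test" not in self.filename:
            return  # only inspect files that look like tests
        for node in ast.walk(self.tree):
            if (
                isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "sleep"
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == "time"
            ):
                yield (
                    node.lineno,
                    node.col_offset,
                    "TST001 time.sleep() in a test - use mocks or proper async waits",
                    type(self),
                )
```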

And who writes those plugins? Well, obviously: Cursor does. 🙂

Conclusion

What we’re talking about here isn’t just a pipeline - it’s a self-reinforcing pipeline. One where AI tools aren’t merely generating code, but continuously testing, fixing, and improving their own output - until everything meets strict quality standards.

That’s not automation - it’s automated craftsmanship.

It’s not magic - it’s engineering with guardrails. And yes, watching an AI grind through dozens of commit cycles until it’s finally happy with its own output? Weirdly satisfying.

Stay tuned - next post: the Planner & Executor Model that takes this even further.

Comments

Fabian Kaufmann

Developing excellent products in eBusiness Print that will drive efficiency and delight customers

Any thoughts on using Gemini inside firebase/GCP?

Roland Golla

APIs for everyone on the internet

Pre-commit checks? They don't work for me - I have to be able to commit whenever I need to. I once had a debug var in a Twig finder before a push, but that happens too rarely to justify them. So I just run all checks on the PR instead.

Hagen Hübel

CTO @ infobud.ai - Chief of Vectors and Data

I tested #Claude Code with my workflow - but unfortunately, it's not usable in automated environments due to its overly cautious approach to permission handling. With Cursor, I've whitelisted a set of safe commands (e.g., pytest, linters) that can run automatically without repeated prompts. Claude offers a similar concept, but it breaks down in practice:

1. Approvals are session-bound and don't persist. Each new session forgets prior permissions, forcing you to start over.

2. Wildcard support is poor and doesn't work as expected. You can't whitelist something like "python -m pytest *", meaning even harmless test runs require manual approval.

This directly affects efficiency. Right now, I'm stuck in a Claude session after it introduced changes that broke ~10% of tests. Thanks to my pre-commit hooks, it's forced to fix them. But every time it tries to rerun a test file, it asks for permission again! What Cursor would have completed overnight is still blocked on Claude because I've been asked 20 times whether it's okay to run a unit test. Until Claude Code reworks this permission model, it's not ready for serious professional workflows.

Hagen Hübel

CTO @ infobud.ai - Chief of Vectors and Data

Ultimately, this requires a significant mindset shift. Part two of the series is available here: https://www.linkedin.com/feed/update/urn:li:activity:7349569571193802755/
