How to gain the full speed of AI dev without the risk of slop
The art of mitigating technical risk
Here's how we mitigate the risks of using AI to accelerate our development so we can gain a lot of the speed without losing control or risking catastrophic failure from a rogue AI command.
Humans
We review every build log, every pipeline result, every deployment status and every Sentry issue logged. Every time. The AI writes the code. A human reviews it all. Enough said.
Compounding engineering
The way we do everything is exhaustively documented. Whenever anything suboptimal happens, the fix or mitigation gets integrated into our centralised dev documentation ready for all future builds. Rules files are also centralised and reviewed regularly by the whole team to make sure they cover all our bases. Those rules are automatically given to the LLM at the start of every prompt session without us needing to remember.
Our documentation system is connected via MCP so we make central updates from right in the coding environment. No excuses.
Commands
We have hundreds of Make commands (triggered by Cursor commands for speed) which do all manner of tasks including all data operations, code repository usage, continuous integration tasks, health checks, container operations and on and on.
These commands work in parallel with the documentation: they force conformance to our very opinionated way of operating. It is never up to the LLM to decide how to do something.
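A minimal sketch of the pattern, with hypothetical task names: one vetted entry point per operation, so neither the human nor the model improvises commands.

```shell
#!/bin/sh
# Single opinionated entry point: the LLM picks *which* task, never *how*.
# Task names and their bodies are illustrative placeholders.
set -eu

run_task() {
  case "$1" in
    db-backup)    echo "backup: running approved pg_dump recipe" ;;
    health-check) echo "health: polling service endpoints" ;;
    *)            echo "usage: run_task <db-backup|health-check>" >&2
                  return 1 ;;
  esac
}

run_task health-check
```

In practice these live behind Make targets (and Cursor commands that trigger them), but the principle is the same: an unknown task fails loudly instead of being improvised.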
Pre-commit hooks
Before every single piece of code is sent to the repository (Git), it is automatically checked for:
- Syntax errors across every language in the project
- Code formatting
- Static analysis that catches bugs before they run
- Configuration validation so a bad config file never makes it out
- Detection of files that would pass on the developer's machine but fail when built on the server
It also auto-bumps the version number so every single build is traceable back to the exact code that produced it.
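As a sketch, the hook is just a script that runs the checks and then bumps the patch version. The checker steps here are placeholders for whatever linters and validators your stack uses:

```shell
#!/bin/sh
# Sketch of a pre-commit hook (.git/hooks/pre-commit).
set -eu
cd "$(mktemp -d)"   # demo runs in a throwaway directory

run_checks() {
  # Placeholders: swap in your real syntax checkers, formatters,
  # static analysers and config validators here.
  echo "checking syntax across all project languages..."
  echo "running formatter and static analysis..."
  echo "validating config files..."
  echo "diffing local vs server build inputs..."
}

# bump_patch: turn "1.4.9" into "1.4.10" so every build traces to its code.
bump_patch() {
  v="$1"
  printf '%s.%s' "${v%.*}" "$(( ${v##*.} + 1 ))"
}

run_checks
bump_patch "$(cat VERSION 2>/dev/null || echo 0.1.0)"
```

In the real hook the new number is written back to the version file and staged alongside the commit, so a commit and its version can never drift apart.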
Continuous integration/delivery
Every time code is pushed it triggers an automated pipeline with multiple stages.
First, everything is compiled, analysed and tested. If anything fails, nothing else happens. Then container images are built for every part of the application and tagged with the exact code version.
Then those images are deployed and an automated process watches the live environment, polling it until it can confirm the deployment actually landed and everything is healthy. If anything is wrong — wrong version, wrong configuration, anything — the pipeline fails and nothing goes live.
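The watcher at the end is essentially a polling loop. A sketch, where `check_live` stands in for a real HTTP health call (e.g. a curl against a `/healthz` endpoint) and the version number is illustrative:

```shell
#!/bin/sh
# Sketch of the post-deploy watch step: poll the live environment until it
# reports the version we just built, or fail the pipeline.
set -eu

EXPECTED_VERSION="2.7.19"   # illustrative; the pipeline would inject this
attempts=0

check_live() {
  # Stand-in: always healthy here. A real check would fetch the deployed
  # version over HTTP and keep failing until the new pods are serving it.
  echo "2.7.19"
}

until [ "$(check_live)" = "$EXPECTED_VERSION" ]; do
  attempts=$((attempts + 1))
  if [ "$attempts" -ge 30 ]; then
    echo "deploy verification failed: wrong version still live" >&2
    exit 1
  fi
  sleep 2
done
echo "deploy verified: $EXPECTED_VERSION is live and healthy"
```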
Protected branches
The branch that deploys to the live site cannot be pushed to directly. It is protected both locally on the developer machines and on the server. The only way to get code onto the live site is through a review and approval process.
This means every single production deployment has been looked at by a human before it goes out. No cowboy deploys.
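Server-side, this lives in the Git host's branch-protection rules; the local half can be a pre-push hook. A sketch, assuming the protected branch is called `main`:

```shell
#!/bin/sh
# Sketch of a local pre-push guard (.git/hooks/pre-push): refuse direct
# pushes to the deploy branch so code only reaches it via reviewed PRs.
set -eu

PROTECTED="main"   # assumption; substitute your protected branch

refuse_if_protected() {
  if [ "$1" = "refs/heads/$PROTECTED" ]; then
    echo "pre-push: direct pushes to $PROTECTED are blocked; open a PR" >&2
    return 1
  fi
}

# In a real hook, git supplies "<local ref> <local sha> <remote ref> <remote sha>"
# lines on stdin and you would call refuse_if_protected on each remote ref.
refuse_if_protected "refs/heads/feature/login"
echo "push allowed"
```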
Test environments
A full environment is spun up to test every deployment before it goes anywhere near the live site. The automated pipeline deploys there first, verifies everything is working, and even checks that the live site hasn't been affected.
Only after that passes can you promote to production. We implement this on every client project as standard. Just one more layer of sense checking after our reviews.
Sandbox apps
Never fix things "moving forwards". If you've rabbit-holed trying different options to get something working, use that final state to write a brief describing the desired result.
Then, to realise the true speed of AI dev, simply check out the clean state from before you started experimenting and apply the brief from there. Without this to-and-fro process, the AI will keep referencing that abandoned attempt long after the code itself is gone.
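In Git terms the loop looks something like this. A self-contained sketch in a throwaway repo; branch and file names are illustrative, and it assumes `git` is installed:

```shell
#!/bin/sh
# Experiment, distil a brief, reset to the clean state, reapply.
set -eu
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name Dev

echo "stable" > app.txt
git add app.txt
git commit -qm "known-good state"
base=$(git rev-parse --abbrev-ref HEAD)   # main or master, depending on git version

# Rabbit-hole on a branch, then distil what you learned into a brief.
git checkout -qb experiment
echo "messy attempt" >> app.txt
echo "Desired result: X, achieved via approach Y" > BRIEF.md
git add -A
git commit -qm "experiments"
brief=$(mktemp); cp BRIEF.md "$brief"     # keep the brief, discard the mess

# Back to the clean state; apply the brief from here with a fresh prompt.
git checkout -q "$base"
cat app.txt   # the experiment leaves no trace
```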
We also have a sandbox app for each client where we do experiments to create briefs that we then implement in the production apps.
Clean rooms
Just like dogs, AI really doesn't like change! As with the sandbox-app experiments, if you change something foundational along the way you will be hearing about the old ways forever, in the riskiest possible places.
For changes to elements like the architecture or deployment process, create a completely fresh environment (everything from the server upwards) then deploy and test it there. Make sure all docs are up to date with the old ways purged completely.
Kubernetes
Modern web software is built in containers: independent micro-apps that coordinate to make up the whole experience. That orchestration of containers is handled by Kubernetes, the de facto king of container orchestration.
One very cool feature it brings with it is auto-scaling, which automatically adds extra capacity to the system when you have usage spikes.
Another is rolling updates. When you deploy a new version of your app this way, each piece is updated one at a time. The old version stays in place while the new one launches, so traffic carries on uninterrupted while the update happens.
If anything about the new version fails, the process stops, users continue hitting the original app, and they are none the wiser that anything went wrong. Super-stability built-in!
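In Kubernetes terms, that rollout behaviour is configured on the Deployment itself. A minimal sketch of such a manifest, where the service name, image tag, port and probe path are all illustrative:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                  # illustrative service name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0      # never take an old pod down before its replacement is ready
      maxSurge: 1            # bring new pods up one at a time
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: registry.example.com/web:2.7.19   # tagged with the exact code version
          readinessProbe:                          # a failing probe halts the rollout
            httpGet:
              path: /healthz
              port: 8080
```

If the new pods never pass their readiness probe, the rollout stalls with the old pods still serving traffic, which is exactly the behaviour described above.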
Infrastructure as code
The servers, networking and everything underneath the application is defined in code, not configured by hand. That means the entire infrastructure can be rebuilt from scratch at any time and it will be identical.
No "it works because someone configured it that way three years ago and nobody remembers how." If something goes wrong you can tear it down and rebuild it in minutes if not seconds.
Encrypted secrets
API keys, database credentials and everything else sensitive are encrypted and stored safely in the codebase. We have automated checks and commands that compare which credentials exist in testing versus production and validate that nothing is missing before a deployment goes out.
No more "it works in testing but the live site is broken because someone forgot to add an API key."
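The comparison itself is simple: diff the key names between environments, never the values. A sketch with illustrative file names and contents:

```shell
#!/bin/sh
# Sketch of the pre-deploy secrets check: compare key names only.
set -eu
cd "$(mktemp -d)"

cat > staging.env <<'EOF'
DATABASE_URL=placeholder
STRIPE_KEY=placeholder
SENTRY_DSN=placeholder
EOF

cat > production.env <<'EOF'
DATABASE_URL=placeholder
SENTRY_DSN=placeholder
EOF

cut -d= -f1 staging.env | sort > staging.keys
cut -d= -f1 production.env | sort > production.keys

# Keys present in staging but absent in production.
missing=$(comm -23 staging.keys production.keys)

if [ -n "$missing" ]; then
  echo "would block deploy, missing in production: $missing"
fi
```

In the real pipeline a non-empty `missing` list fails the deployment outright rather than just printing a warning.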
Sentry error tracking
Even if everything above fails and something makes it to production, we have real-time error tracking across the entire stack. The moment something breaks in production we get full details of exactly what happened, where and why. It is the last line of defence and it means issues get caught in minutes.
Our ticket system even surfaces new issues as urgent review tasks and suggests a solution. The next step will be triggering automated fixes, ready for review, via the Cursor API, but for now we are happier with this small manual intervention.
Version traceability
Every single build is stamped with the version number, the exact code that produced it and the timestamp. We always know precisely what is running where. The deployment pipeline actually verifies that what landed on the server matches what it just built.
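A tiny sketch of the stamp-and-verify idea, with an illustrative version number and file layout (a real build would also bake in the git commit SHA):

```shell
#!/bin/sh
# Sketch: stamp the build, then confirm the deployed copy matches it.
set -eu
cd "$(mktemp -d)"

VERSION="2.7.19"
STAMP="$VERSION+$(date -u +%Y%m%dT%H%M%SZ)"

# At build time: bake the stamp into the artifact.
echo "$STAMP" > build_info.txt

# Simulate the artifact landing on the server.
cp build_info.txt deployed_info.txt

# After deploy: verify what landed is exactly what we built.
if [ "$(cat deployed_info.txt)" = "$STAMP" ]; then
  echo "verified: $STAMP is live"
else
  echo "mismatch: refusing to mark deploy healthy" >&2
  exit 1
fi
```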
Most people building with AI have no idea what version of their thing is running at any given moment.
Subagents
On top of the agent that writes the code we have a fleet of subagents that run after every commit. Those guys each have a specific job:
- One for reviewing common anti-patterns
- One for overall architecture, making sure the big-picture ethos is maintained throughout
- One for checking file lengths and splitting to protect against context sludging (where the AI loses track of what it's doing because a file has grown too long)
- Another for database checks and migration safety
- And more
The compound effect
This last one is super important for anyone having troubles with AI slop.
All those subagents generate comprehensive reports that feed into the human review process but fundamentally they are a second layer to catch anything loose that comes out of the first wave of code generation.
The base model is literally trained to please us, not to enforce coding standards. But if you follow up with specialists in this way, the chance of something sloppy surviving a two-phase filtering process is radically reduced compared to the one-shot prompting most people are doing.
It may sound like a lot to do. But if you really lean into the principles of compounding engineering, doing all of this by default for everything becomes very quick. Just attach the setup brief document and prompt "Setup project XYZ". It is like templating on speed, and I bet most vibe coders have lost hours to issues that would not exist if everything had been done right from the very first prompt.
We seem to have gone from months and months of technical definition, build and testing to expecting everyone to build things with one prompt written in seven seconds. Set the needle somewhere a little more realistic and the speed and quality you can get now is absolutely magnificent.
Want AI speed without the slop?
Let's talk about how we can help you build with AI-powered workflows that are fast, controlled and production-ready from day one.
Book a consultation