
Reconciliation loops everywhere: GitOps, controllers, agentic feedback
Once you see the reconciliation loop — desired state, observed state, corrective action — you find it under operators, GitOps, and now agentic coding.
There is a particular shape of system I keep noticing. It runs quietly under GitOps, under every Kubernetes operator I've shipped, under event-sourced aggregates, and now — more interestingly — under the agentic coding tools I spend a chunk of my week in.
The shape is the reconciliation loop. Three parts, plainly stated:
- A desired state you have written down somewhere.
- An observed state the world is currently in.
- A controller that compares the two, computes the error signal, and applies a corrective action — then loops again.
That is more or less it. It is a control-theoretic primitive, almost boringly old, and it travels further than anything else I've encountered in twenty-five years of building software. I want to write down where I keep finding it, because once you see it you don't stop seeing it, and that sight has practical consequences for how you build platforms — and now, how you collaborate with models.
The primitive, plainly
A controller does not need much. It needs a record of intent, a way to look at the world, and a way to act. The control loop has been engineered into thermostats, autopilots, biological homeostasis and most non-trivial distributed systems. The reason it survives so well is that it is robust to a class of problem you cannot otherwise design away: the world drifts. Hardware fails. Humans make manual changes. Network partitions happen. A perfectly correct deployment at 09:00 is not necessarily the deployment that is running at 09:30.
If you cannot stop the world from drifting, you stop pretending you have "deployed" anything in a one-shot sense. You declare what should be true, you observe what is actually true, and you keep nudging the second toward the first. The shift is from imperative procedure to declarative outcome — "reach this state and stay there," rather than "run these commands in this order on this server."
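A minimal sketch of that posture, assuming the world can be modelled as a plain dictionary. The names diff, apply and reconcile are illustrative and not taken from any particular framework:

```python
from typing import Dict, List, Tuple

State = Dict[str, str]
Action = Tuple[str, str, str]  # (verb, key, value)


def diff(desired: State, observed: State) -> List[Action]:
    """Compute the error signal as a list of corrective actions."""
    actions: List[Action] = []
    for key, value in desired.items():
        if key not in observed:
            actions.append(("create", key, value))
        elif observed[key] != value:
            actions.append(("update", key, value))
    for key in observed:
        if key not in desired:
            actions.append(("delete", key, ""))
    return actions


def apply(observed: State, action: Action) -> None:
    """Apply one corrective action to the in-memory world."""
    verb, key, value = action
    if verb == "delete":
        observed.pop(key, None)
    else:
        observed[key] = value


def reconcile(desired: State, observed: State, max_cycles: int = 10) -> int:
    """Nudge observed toward desired; return how many cycles applied changes."""
    cycles = 0
    while actions := diff(desired, observed):
        if cycles >= max_cycles:
            raise RuntimeError("did not converge within max_cycles")
        for action in actions:
            apply(observed, action)
        cycles += 1
    return cycles


desired = {"web": "v2", "worker": "v1"}
world = {"web": "v1", "cache": "v1"}  # drifted: stale web, stray cache, no worker
cycles = reconcile(desired, world)
print(cycles, world)
```

Calling reconcile a second time on the converged world applies nothing and returns 0, which is the property that makes it safe to run on a timer.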
I list this primitive in my profile under GitOps & Declarative Infrastructure — "Git as single source of truth, continuous reconciliation, drift detection" — because that is the line of work where the loop became unavoidable for me. But the moment it became a portable mental model rather than a deployment technique was somewhere around year three of doing it.
Setting one: Kubernetes operators in air-gapped hospitals
The first place the loop stopped feeling like an infrastructure detail and started feeling like a primitive was at Philips. I architected the Clinical AI Platform — a container-based Clinical AI App Store deployed across 300+ hospitals globally. Third-party ML models, packaged as containers, running inside hospital VMware estates in Germany, India and the UK. ISO 13485 and IEC 62304 applied. Many of those environments were air-gapped from the public internet. PACS, RIS and imaging equipment integrated over HL7 and FHIR.
You cannot ship that with shell scripts. The site engineer who installs the cluster is not necessarily the person who upgrades it. A clinical model might need a specific GPU node label, a specific licence secret, a specific HL7 endpoint mapping to a specific PACS vendor, and a specific feedback queue for the closed-loop diagnosis messaging that lets clinicians validate the model against real-world outcomes. The configuration surface is wide, the environment is hostile to ad-hoc intervention, and the regulatory record needs to say what was running, where, when, against which version of the model.
The Kubernetes operator pattern is built for exactly this. You define a custom resource (ClinicalModelDeployment, say) that captures intent: which model, which version, which licence, which feedback channel. You write a controller that watches those resources and walks the cluster forward toward what they describe. If a hospital site engineer manually deletes a deployment because they were debugging something at 02:00, the controller notices the divergence and recreates it. If a model is upgraded in the central catalogue, the same controller rolls the change forward at the next reconcile cycle. The intent is declarative, the corrective action is encoded once, and the audit trail falls out naturally because every applied change is a function of a recorded desired state.
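To make the shape concrete, here is a toy sketch that uses an in-memory stand-in for the cluster rather than a real operator SDK. The field names, the FakeCluster class and the example values are assumptions for illustration, not the platform's actual schema:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ClinicalModelDeployment:
    """Hypothetical custom resource: the recorded intent for one model."""
    name: str
    model: str
    version: str
    licence_secret: str
    feedback_queue: str


class FakeCluster:
    """In-memory stand-in for a cluster: deployment name -> running spec."""

    def __init__(self) -> None:
        self.running: Dict[str, ClinicalModelDeployment] = {}


def reconcile(resource: ClinicalModelDeployment, cluster: FakeCluster,
              audit: List[str]) -> None:
    """Walk the cluster toward the resource and record each applied change."""
    current = cluster.running.get(resource.name)
    if current == resource:
        return  # already converged, so reconciling again is a no-op
    verb = "create" if current is None else "update"
    cluster.running[resource.name] = resource
    audit.append(f"{verb} {resource.name} -> {resource.model}:{resource.version}")


cluster = FakeCluster()
audit: List[str] = []
cr = ClinicalModelDeployment("chest-xray", "cxr-triage", "1.4.0",
                             "licence-cxr", "feedback-cxr")
reconcile(cr, cluster, audit)        # first install
del cluster.running["chest-xray"]    # manual deletion during 02:00 debugging
reconcile(cr, cluster, audit)        # controller recreates it
upgraded = ClinicalModelDeployment("chest-xray", "cxr-triage", "1.5.0",
                                   "licence-cxr", "feedback-cxr")
reconcile(upgraded, cluster, audit)  # catalogue upgrade rolls forward
reconcile(upgraded, cluster, audit)  # no divergence, nothing recorded
print("\n".join(audit))
```

The audit list ends up with exactly one entry per applied change, each derived from the desired state that triggered it.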
What made this real for me, rather than a textbook example, was the combination with air-gapped operation. There is no "just SSH in and fix it" path of last resort across a regulated estate. The loop has to be the operational model, not a fallback to manual recovery. Once you have lived with that constraint, you stop reaching for imperative deploy scripts even in environments that would tolerate them. The patterns are easier to reason about, easier to test, and easier to make compliant.
Setting two: ArgoCD, GitOps and drift detection
If the operator is the loop applied to one resource type inside a cluster, GitOps is the same loop applied one level up — across clusters, against a Git repository as the single source of truth. I list ArgoCD and Flux as "reconciliation agents" on my profile for a reason: they are not deploy tools. They are controllers whose desired state happens to live in version control.
The pattern that recurs: you stop treating "deploy" as a verb performed by a pipeline and start treating it as a property of the system. A pipeline's job becomes "make the desired state in Git reflect the change you want." The reconciliation agent's job is to make the cluster catch up. Drift (someone hot-fixing a production deployment with kubectl edit) is no longer a quiet, invisible failure mode. It is a visible signal: the observed state has diverged from the declared state, and the controller either corrects it or surfaces it for a human. Either way, the divergence is named.
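A rough sketch of what "naming the divergence" can look like, assuming manifests are nested dictionaries and that only declared fields are compared. This is not ArgoCD's or Flux's actual diffing logic, and find_drift and heal are hypothetical names:

```python
import copy
from typing import Any, Dict, List, Tuple

Manifest = Dict[str, Any]
Drift = Tuple[str, Any, Any]  # (field path, declared value, live value)


def find_drift(declared: Manifest, live: Manifest, path: str = "") -> List[Drift]:
    """List every declared field whose live value differs."""
    drift: List[Drift] = []
    for key, want in declared.items():
        here = f"{path}.{key}" if path else key
        have = live.get(key)
        if isinstance(want, dict) and isinstance(have, dict):
            drift.extend(find_drift(want, have, here))
        elif want != have:
            drift.append((here, want, have))
    return drift


def heal(declared: Manifest, live: Manifest) -> None:
    """Overwrite declared fields in the live object; leave other fields alone."""
    for key, want in declared.items():
        if isinstance(want, dict) and isinstance(live.get(key), dict):
            heal(want, live[key])
        else:
            live[key] = copy.deepcopy(want)


declared = {"spec": {"replicas": 3, "image": "api:1.8.2"}}
live = {"spec": {"replicas": 5, "image": "api:1.8.2"}, "status": {"ready": 5}}

drift = find_drift(declared, live)
for field, want, have in drift:
    print(f"drift at {field}: declared={want!r} live={have!r}")

self_heal = True  # set to False to surface the drift for a human instead
if drift and self_heal:
    heal(declared, live)
```

With self_heal switched off, the same drift list becomes the report a human reviews, so both branches start from a named divergence.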
I have spent six-plus years inside this model, across very different domains. At Philips it was clinical software in air-gapped hospitals. At BrightInsight it was a multi-cloud regulated platform serving 20M API calls a day, where the same declarative posture extended to indemnity-insurance rules — we encoded those into a JSON-based DSL so underwriters could change products independently, with the platform reconciling configuration against intent in the same spirit as it reconciled infrastructure. At AVEVA, on the Cloud & Platform Services team, the Core AI Services platform sits on AKS and Azure with the same continuous-reconciliation posture under versioned public APIs and SDKs.
The thing I want to underline is not that GitOps is good — that argument has been made enough times. It is that the loop generalises. The reason GitOps tools can be built in the first place is that the underlying control-theoretic shape is sound. Once a team has internalised it for infrastructure, they reach for it for configuration, for feature flags, for policy (OPA, Cloud Custodian — "Policy as Code" on the same profile page), and increasingly for ML model lifecycles. Every one of those is a "desired state vs observed state, applied as a corrective action" formulation in a different costume.
Setting three: event-sourced aggregates as a relative
A brief detour, because it is the same shape from a different angle.
Event sourcing — which I list alongside DDD and CQRS in my pattern inventory, and which I have used in production from Credit Suisse derivatives through BrightInsight to dunnhumby — is what you get when you make the desired state of a business aggregate a function of an immutable log. The aggregate's current state is not a row you mutate. It is a projection that you rebuild by replaying events. If the projection drifts or is corrupted, you do not patch it; you rebuild it from the log.
That is a reconciliation loop with a wider clock period. The log is the desired narrative, the projection is the observed state, and the rebuild is the corrective action. The same intuition — declarative intent over imperative mutation, rebuildability over patching, drift as a signal rather than an error — sits underneath. I keep mentioning this because it is where I first understood that the loop is not a Kubernetes idea. It is a way of organising any system where state can drift and intent must survive.
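The rebuild-instead-of-patch idea fits in a few lines. This is a sketch under simplified assumptions: the event kinds and account names are invented for illustration, and a real aggregate would carry far more than a balance:

```python
from typing import Dict, Iterable, Tuple

Event = Tuple[str, str, int]  # (kind, account, amount), a hypothetical event shape


def project(events: Iterable[Event]) -> Dict[str, int]:
    """Rebuild the projection from scratch by replaying the log in order."""
    balances: Dict[str, int] = {}
    for kind, account, amount in events:
        if kind == "deposited":
            balances[account] = balances.get(account, 0) + amount
        elif kind == "withdrawn":
            balances[account] = balances.get(account, 0) - amount
    return balances


# The log is append-only; a tuple stands in for that immutability here.
LOG: Tuple[Event, ...] = (
    ("deposited", "acc-1", 100),
    ("withdrawn", "acc-1", 30),
    ("deposited", "acc-2", 50),
)

projection = project(LOG)
projection["acc-1"] = 9999  # simulate a corrupted read model

# Corrective action: compare against a fresh replay and rebuild, never patch.
if projection != project(LOG):
    projection = project(LOG)
print(projection)
```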
Setting four: agentic coding as a feedback loop
Here is where it has gotten interesting again.
For the last stretch of my time at AVEVA, alongside the Core AI Services platform work, I have been contributing to the Hypervelocity Engineering initiative, the company-wide push around agentic coding. Day to day, that means I work with GitHub Copilot CLI, Claude Code and ChatGPT Codex the way I used to work with kubectl and argocd. The first time I sat down with Claude Code in earnest, I had a small, surprised moment of recognition: this is the same loop.
Spelled out:
- The desired state is your intent — usually expressed as a prompt, a spec, a failing test, or an issue description. "This function should do X. The build should be green. The contract test should pass."
- The observed state is what the codebase, the build, the linter, and the test runner are telling you right now.
- The controller is the model-plus-tools loop. The model proposes a change. A tool — a compiler, a test, a linter, a type-checker, a script — runs and reports what actually happened. The model reads that feedback, compares it to intent, and proposes the next corrective action. It loops.
The interesting word in that last sentence is tool. A model on its own is a one-shot generator, no different in shape from a one-shot deploy script. What turns it into a controller is the set of tool calls that reach back into the real world: run the tests, read the file, lint the diff, execute the script, fetch the type errors. The closer those tools are to the truth of the system, the more useful the loop. This is, structurally, why agentic-coding workflows feel so much more reliable than pure prompt-based coding: the loop has an observed state to compare against, not just an imagined one.
It is also why the workflows that frustrate me most are the ones where the loop is starved of feedback. If I let an agent write a function with no way to actually run the tests, or with a brittle integration that silently passes, I am back to one-shot deploys — a hopeful procedure without a controller. The model's quality matters, but the loop's quality matters more. Tighten the feedback, get tighter convergence. Loosen it, get drift dressed up as confidence.
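Here is a toy version of that loop. Everything in it is an assumption made for illustration: stub_model is a canned stand-in for a real model call (it ignores the feedback a real model would read), run_checks plays the test runner, and the slugify task is invented. No real agent API is shown:

```python
from typing import Callable, List, Tuple

# Canned proposals, from wrong to right; a real model would generate these.
CANDIDATES = [
    "def slugify(s): return s.lower()",
    "def slugify(s): return s.lower().replace(' ', '-')",
    "def slugify(s): return '-'.join(s.lower().split())",
]


def stub_model(intent: str, feedback: List[str], attempt: int) -> str:
    """Hypothetical stand-in for a model call; returns the next canned proposal."""
    return CANDIDATES[min(attempt, len(CANDIDATES) - 1)]


def run_checks(source: str) -> List[str]:
    """Observed state: run the candidate and return failures (empty means green)."""
    namespace: dict = {}
    cases = {"Hello World": "hello-world", "  Trim me ": "trim-me"}
    try:
        exec(source, namespace)  # a real harness would run this in a sandbox
        fn = namespace["slugify"]
        return [f"slugify({arg!r}) returned {fn(arg)!r}, expected {want!r}"
                for arg, want in cases.items() if fn(arg) != want]
    except Exception as exc:
        return [f"{type(exc).__name__}: {exc}"]


def agent_loop(intent: str,
               propose: Callable[[str, List[str], int], str],
               check: Callable[[str], List[str]],
               max_attempts: int = 5) -> Tuple[str, int]:
    """Propose, observe, compare, repeat; the attempt budget keeps it bounded."""
    feedback: List[str] = []
    for attempt in range(max_attempts):
        candidate = propose(intent, feedback, attempt)
        feedback = check(candidate)
        if not feedback:
            return candidate, attempt + 1
    raise RuntimeError("did not converge; last feedback: " + "; ".join(feedback))


final, attempts = agent_loop("slugify lower-cases and hyphenates words",
                             stub_model, run_checks)
print(attempts, final)
```

Replace run_checks with a function that always returns an empty list and the loop accepts the first wrong candidate, which is the starved-feedback failure in miniature.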
I do not think this analogy is decorative. I think it is the same primitive. The desired-state document happens to be a prompt or a spec rather than a Kubernetes manifest. The observed-state probe happens to be a test runner rather than kubectl get. The corrective action happens to be a code edit rather than a pod restart. But the control shape is identical, and the heuristics that make Kubernetes operators production-grade (small reconcile periods, idempotent corrective actions, explicit divergence reporting, bounded backoff) translate uncomfortably well to how you should set up an agentic coding workflow.
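Of those heuristics, bounded backoff is the easiest to show in isolation. A minimal sketch, assuming a step function that returns True once the system has converged; the base, cap and retry values are arbitrary illustrations, not recommendations:

```python
import time
from typing import Callable, Iterator, List


def backoff_delays(base: float, cap: float, max_retries: int) -> Iterator[float]:
    """Yield exponentially growing delays, capped, for at most max_retries."""
    for attempt in range(max_retries):
        yield min(cap, base * (2 ** attempt))


def reconcile_with_backoff(step: Callable[[], bool],
                           base: float = 1.0,
                           cap: float = 30.0,
                           max_retries: int = 6,
                           sleep: Callable[[float], None] = time.sleep) -> List[float]:
    """Run step() until it reports convergence; return the delays waited.

    Raises once the retry budget is spent, so divergence is reported
    explicitly instead of being retried forever.
    """
    waited: List[float] = []
    if step():
        return waited
    for delay in backoff_delays(base, cap, max_retries):
        sleep(delay)
        waited.append(delay)
        if step():
            return waited
    raise RuntimeError(f"still diverged after {max_retries} retries")


# Example: a step that fails three times, then converges. The sleep function
# is replaced with a no-op so the example runs instantly.
outcomes = iter([False, False, False, True])
waited = reconcile_with_backoff(lambda: next(outcomes), sleep=lambda _: None)
print(waited)
```

The same wrapper works whether step is a cluster reconcile or one agent attempt against a test suite, since both only need to report converged or not.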
What travels, what doesn't
This is where I try to be careful. Not everything I know about GitOps maps cleanly onto agentic coding, and I have watched colleagues over-rotate on the analogy. The differences are worth naming:
- Determinism. A well-written Kubernetes controller, given the same desired and observed state, should choose the same corrective action. A language model will not. The loop tolerates that, but only if your feedback is honest enough to catch the variance. Tests and types do a lot of heavy lifting here.
- Granularity of the desired state. A Kubernetes manifest is a precise machine-readable artefact. A prompt is a fuzzy human-readable one. Some of the hardest learning in agentic coding is figuring out how to make the desired state crisp enough for the loop to converge. Tests, contracts (Pact, in my world), and types are the closest thing to "declarative manifests for code."
- Blast radius. A Kubernetes operator running an unbounded corrective action on the wrong custom resource can take production down. An agent running an unbounded corrective action on the wrong file can rewrite your repo. The lesson from a decade of operator design — bounded scope, dry-runs, idempotency, explicit confirmation for destructive operations — applies almost verbatim to agentic tooling.
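The blast-radius point can be sketched as a guard that sits between proposed changes and the filesystem. The Change type, the plan function and the paths are hypothetical, and the sketch only returns decisions; it never touches any files:

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import List


@dataclass(frozen=True)
class Change:
    verb: str  # "write" or "delete"
    path: str  # path relative to the repository root


def plan(changes: List[Change], allowed_root: str,
         confirm_destructive: bool = False, dry_run: bool = True) -> List[str]:
    """Decide what happens to each proposed change, in order."""
    root = PurePosixPath(allowed_root)
    decisions: List[str] = []
    for change in changes:
        target = PurePosixPath(change.path)
        inside = ".." not in target.parts and (target == root or root in target.parents)
        if not inside:
            # Bounded scope: anything outside the allowed root is refused.
            decisions.append(f"refuse {change.verb} {change.path}: outside {allowed_root}")
        elif change.verb == "delete" and not confirm_destructive:
            # Destructive operations wait for explicit confirmation.
            decisions.append(f"hold {change.verb} {change.path}: needs confirmation")
        elif dry_run:
            decisions.append(f"would {change.verb} {change.path}")
        else:
            decisions.append(f"apply {change.verb} {change.path}")
    return decisions


decisions = plan(
    [
        Change("write", "src/billing/invoice.py"),
        Change("delete", "src/billing/legacy.py"),
        Change("write", ".github/workflows/deploy.yml"),
    ],
    allowed_root="src/billing",
)
print("\n".join(decisions))
```

Dry-run is the default here on purpose: the caller has to opt in to applying changes, and opt in again before anything is deleted.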
What does travel, in my experience, is the posture. Stop thinking in one-shot deploys; start thinking in declared intent. Stop thinking of drift as an embarrassment; start treating it as the loop's most useful signal. Stop coupling intent to execution; let the controller be the thing that closes the gap. Whether the controller is ArgoCD, an operator SDK, or Claude Code with a test harness wired in, the engineering discipline is the same.
Why I keep mentioning this
I have written publicly about GitOps and declarative infrastructure in the way Fowler and his co-authors recommend in Expert Generalists: "favour fundamental knowledge" over tool-specific recipes, and "trust the patterns underneath the tools." Reconciliation loops are one of a small number of patterns I have genuinely watched survive every shift in fashion over the last twenty years — from .NET 1.0 admin dashboards at DesignSquad to Kubernetes operators in air-gapped hospitals to agentic coding in a Hypervelocity Engineering rollout.
It is one of perhaps four or five abstractions I would teach an early-career engineer if I only had a few hours. The pattern is small, the consequences are large, and it composes. You can layer reconciliation loops: agents-of-agents-of-agents, operators-of-operators, GitOps-driving-GitOps. Each layer follows the same primitive shape. Each layer earns the same properties — declarative intent, drift as signal, corrective action as a function of intent and observation.
When people ask me what changes in platform engineering as AI engineering matures, my honest answer is "less than you'd think." We are not throwing out the patterns. We are finding new substrates to run them on. The model is a new kind of controller — fuzzier, faster, more general — wired into loops that look, structurally, very like the ones I was building for hospital clusters in 2019. The job, as ever, is to make the loop tight, honest, and bounded, and to keep the intent legible.
That is not a flashy claim. It is barely a claim at all. But after six years inside this pattern, I think it is the one I would most want younger engineers to internalise: when you understand reconciliation loops, you stop being surprised by how much of modern computing turns out to be variations on the same idea.
— Madu