The Hypervelocity bet: agentic coding at platform scale

There is a flavour of conversation about agentic coding I have stopped finding interesting. A senior engineer demonstrates Claude Code, or GitHub Copilot CLI, or ChatGPT Codex on a non-trivial task; the room makes appreciative noises; somebody asks about productivity; the meeting ends with a vague resolution to "encourage adoption." Repeat in a different building next month.

The demos are useful as recruitment for the practice, but they answer the wrong question. The question I have actually ended up working on, as Principal AI Engineer on Cloud & Platform Services at AVEVA, is not whether an agent can ship a feature faster than a developer can type. It is what an engineering platform (versioned APIs, contract tests, telemetry, FinOps, audit trails) has to look like once a non-trivial share of its code is being written by non-deterministic processes acting on the developer's behalf.

That is the bet I am calling Hypervelocity, after the initiative at AVEVA I'm contributing to. The bet: the platform changes more than the keystroke does. The productivity story is downstream of a platform story, and most of the disappointments I see in adoption trace back to the platform not having been treated as the primary work.

I want to lay out what that platform looks like when it is designed deliberately, what it costs when it isn't, and what is still unsolved. None of this is novel in its parts. The pieces are familiar to anyone who has done platform engineering, versioned an SDK, or run a compliance gate in CI. The question is which of them load-bear differently when the consumer on the other side of the contract is occasionally an LLM-driven agent acting under a human's name.

What I mean by "agentic coding"

By agentic coding I mean using tools like Claude Code, GitHub Copilot CLI and ChatGPT Codex in a mode where the model is driving a multi-step workflow: reading the repository, planning, editing across files, running tests, reading the output, deciding the next move. The developer is in the loop, but the loop is wider than a single keystroke. Across a session, the agent acts on the codebase the way a junior engineer would on a ticket — with the supervisor checking in at points the supervisor chooses.

The structural property that matters is that the action is not deterministic in the way a compiler's action is. Two runs of the same agent on the same prompt against the same codebase will not produce byte-identical diffs. The model's choices are sampled. The final patch is path-dependent on intermediate tool calls the supervising engineer rarely audits in full. The diff is real. The process that produced it is not, strictly, reproducible.

That non-determinism is why the platform questions get sharper. Almost everything an engineering organisation does to manage risk — code review, regression testing, change auditing, compliance evidence, incident forensics — assumes the artefact has an explainable causal history. Agentic coding does not break that assumption. It stretches it, and the stretch is where the platform either holds or doesn't.

Why the platform changes, not the keystroke

I have spent most of my career building platforms for other engineers: at Live Nation; at Philips, on the Clinical AI App Store deployed into more than three hundred hospital VMware estates; at BrightInsight, where compliance and continuous delivery sat together by design; at dunnhumby, where contract testing with Pact cut integration bugs by forty per cent. The pattern that travels across all of those, more than the language or the cloud, is this: the leverage in an engineering organisation lives in the contracts between teams, not in the teams themselves.

The reason agentic coding is a platform problem is that it changes the rate and the shape of the traffic going through those contracts. Three things happen at once.

First, the number of changes per unit time goes up. An agent doing a refactor across twelve files is a single PR, but the rate at which a developer can author such PRs increases substantially, and the cognitive cost of reviewing them does not decrease in step. If the platform was already at the edge of what its review and CI processes could absorb — and at most companies it is — agentic coding moves it past that edge quickly.

Second, the shape of the changes shifts. Agents are very good at work that pattern-matches against a large corpus — refactors, format migrations, glue code, test scaffolding, well-understood adapters. They are less good at work where the constraint is local, recent, or not yet written down. The proportion of "high-volume, medium-stakes" change grows; the proportion of "low-volume, high-judgement" change does not. The platform has to absorb that shift without lowering its standards for the latter.

Third, the locus of authorship gets fuzzier. A PR may contain contributions from two or three agents across two or three sessions, against drafts of a feature that no longer exist. "Who wrote this line" stops resolving to a single human, a single tool, a single time. That is fine for accounting and not fine for forensics. If you need to ask, six months from now, "why did this branch of the if-statement get chosen," you want more than a commit author. You want the trace of the conversation that produced it.

The Hypervelocity bet is, in short, that the platform investments needed to make agentic coding safe at scale are the same investments that make engineering organisations better in general. Stronger contracts. Better telemetry. Honest cost attribution. Audit trails an external auditor would recognise. None is new. All get more important.

What I mean by "the platform"

When I say the platform changes, I do not mean a single product. I mean the union of things an engineering organisation provides to its product teams to make production work feasible, closer in spirit to a Team Topologies enabling platform than to a single internal service. At AVEVA the Core AI Services platform is one piece of that (versioned public APIs and SDKs running on AKS, built around fault-tolerant patterns), but the platform that has to evolve for agentic coding is wider than any single team's deliverable.

Concretely, "at platform scale" means seven surfaces, each already familiar to platform engineers, each stress-tested differently when agents are part of the workflow.

1. Versioned APIs and SDKs

The point of versioning has always been to let consumers move at their own pace. With agents in the loop, the consumer's pace can be much higher and its appetite for breakage much lower. An agent follows an SDK README literally. If the README is six months stale and the SDK has shipped a breaking change, the agent will confidently produce broken code, and the developer will spend the saved time debugging the discrepancy.

The investment that pays for this is the one that always pays: stable contracts, semantic versioning, deprecation windows with telemetry on usage, and machine-readable specifications — OpenAPI or gRPC schemas — that the agent can consume rather than guess at from prose. The non-deterministic consumer raises the cost of every undocumented or implicit API edge.
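As a minimal sketch of the kind of check that turns semantic versioning into something an agent can be held to, the Python below classifies an SDK upgrade as breaking or compatible. It assumes plain `MAJOR.MINOR.PATCH` strings with no pre-release or build metadata, and the function names are hypothetical. SemVer itself says anything may change in `0.y.z`. The sketch simplifies that by treating any minor bump below 1.0 as breaking.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse_version(text: str) -> Version:
    """Parse a plain MAJOR.MINOR.PATCH string (no pre-release or build tags)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return Version(major, minor, patch)


def is_breaking_upgrade(current: str, candidate: str) -> bool:
    """True if moving from `current` to `candidate` may break consumers under SemVer."""
    old, new = parse_version(current), parse_version(candidate)
    if new <= old:  # NamedTuple compares field by field, like a tuple
        raise ValueError(f"{candidate} is not newer than {current}")
    if old.major == 0:
        # Simplification: below 1.0, treat leaving 0.x or any minor bump as breaking.
        return new.major > 0 or new.minor > old.minor
    return new.major > old.major


if __name__ == "__main__":
    for current, candidate in [("2.3.1", "2.4.0"), ("2.4.0", "3.0.0"), ("0.9.2", "0.10.0")]:
        verdict = "breaking" if is_breaking_upgrade(current, candidate) else "compatible"
        print(f"{current} -> {candidate}: {verdict}")
```

A CI job could run a check like this over every dependency bump in an agent-authored PR and require a linked migration note whenever it returns `True`.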

2. Contract tests as the source of truth

When the producer of code is non-deterministic, the only reliable claim about behaviour is the one a contract test makes. I have leaned on Pact-style contract testing for years for entirely different reasons — cross-team coordination overhead, integration regression — and the practice transfers directly. If the agent is generating code against an SDK or a service, the contract the SDK is verified against is the closest thing the team has to a behavioural specification the agent can be held to. Any contract you do not have, the agent will infer; and the inference will be confidently wrong.
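To make "the contract is the specification" concrete, here is a deliberately small, standard-library-only sketch of the consumer-driven idea. The consumer writes down the requests it makes and the response fields it relies on, and the provider is verified by replaying them. This is not Pact's API, and it omits everything Pact adds (a broker, matchers, provider states, versioned pacts). `Interaction`, `verify_provider` and the toy `asset_service` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical provider signature: (method, path) -> (HTTP status, JSON body as a dict)
Handler = Callable[[str, str], tuple[int, dict[str, Any]]]


@dataclass(frozen=True)
class Interaction:
    """One consumer expectation: a request, and the response shape the consumer relies on."""
    description: str
    method: str
    path: str
    status: int
    required_fields: dict[str, type]  # field name -> expected Python type


def verify_provider(handler: Handler, contract: list[Interaction]) -> list[str]:
    """Replay every interaction against the provider and collect human-readable failures."""
    failures = []
    for interaction in contract:
        status, body = handler(interaction.method, interaction.path)
        if status != interaction.status:
            failures.append(f"{interaction.description}: status {status} != {interaction.status}")
            continue  # a wrong status makes the field checks meaningless
        for name, expected_type in interaction.required_fields.items():
            if name not in body:
                failures.append(f"{interaction.description}: missing field '{name}'")
            elif not isinstance(body[name], expected_type):
                failures.append(f"{interaction.description}: '{name}' is not {expected_type.__name__}")
    return failures


def asset_service(method: str, path: str) -> tuple[int, dict[str, Any]]:
    """A toy provider standing in for a real service."""
    if method == "GET" and path == "/assets/42":
        return 200, {"id": "42", "name": "Pump A", "tags": ["site-1"]}
    return 404, {"error": "not found"}


# The consumer's contract: written by the calling team, verified by the owning team.
contract = [
    Interaction("fetch an asset", "GET", "/assets/42", 200, {"id": str, "name": str, "tags": list}),
    Interaction("missing asset", "GET", "/assets/999", 404, {"error": str}),
]

if __name__ == "__main__":
    problems = verify_provider(asset_service, contract)
    print("contract verified" if not problems else "\n".join(problems))
```

For agentic work, the useful property is that the contract is data. An agent generating a client can be pointed at it as the behaviour it must satisfy, rather than inferring that behaviour from a README.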

3. Telemetry: agent runs as first-class events

Most organisations instrument their applications. Far fewer instrument the engineering process that produces them, and where they do — DORA metrics, deployment frequency, lead time — the granularity is the deployment, not the change. Agentic coding wants telemetry a level finer. Per-session: tool calls, files touched, model version, token cost, final outcome. Aggregated: where in the codebase agents are succeeding and where they are quietly failing, which prompts produce churn, which repositories have effectively become unreviewable.

This is the surface where most organisations are weakest, and where the gap matters most. Without it you cannot tell whether agentic coding is helping. You can only tell whether developers feel like it is.
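Here is a minimal sketch of what "agent runs as first-class events" might look like. It has one record per session, carrying the per-session fields listed above, and one aggregation that answers "where are agents quietly failing?". The schema, field names and outcome labels are assumptions, not any tool's actual export format, and the example sessions are made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSession:
    """One agent run, emitted as a single telemetry event when the session ends."""
    session_id: str
    repository: str
    model_version: str
    tool_calls: int
    files_touched: tuple[str, ...]
    input_tokens: int
    output_tokens: int
    outcome: str  # e.g. "merged", "abandoned", "reverted"


def outcome_rates(sessions: list[AgentSession]) -> dict[str, dict[str, float]]:
    """For each repository, the share of agent sessions ending in each outcome."""
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        counts[session.repository][session.outcome] += 1
    rates = {}
    for repository, by_outcome in counts.items():
        total = sum(by_outcome.values())
        rates[repository] = {outcome: n / total for outcome, n in by_outcome.items()}
    return rates


def example(session_id: str, repository: str, outcome: str) -> AgentSession:
    """Build an illustrative session; every number here is invented."""
    return AgentSession(session_id, repository, "model-snapshot-A", tool_calls=12,
                        files_touched=("src/app.py",), input_tokens=40_000,
                        output_tokens=6_000, outcome=outcome)


EXAMPLE_SESSIONS = [
    example("s1", "billing-api", "merged"),
    example("s2", "billing-api", "abandoned"),
    example("s3", "billing-api", "merged"),
    example("s4", "legacy-etl", "abandoned"),
    example("s5", "legacy-etl", "abandoned"),
]

if __name__ == "__main__":
    for repository, rates in outcome_rates(EXAMPLE_SESSIONS).items():
        print(repository, ", ".join(f"{outcome}={share:.0%}" for outcome, share in rates.items()))
```

In practice these events would flow into whatever telemetry pipeline already exists. The point is that the unit of record is the session, not the deployment.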

4. Code review reimagined

Code review does not become less important. It becomes differently shaped. The reviewer is no longer checking whether the author understood the change — frequently the author barely did; the agent did the work and the human ratified it. The reviewer is checking whether the change makes sense in the system the platform team curates, and whether the audit trail is intact. That argues for stronger automated gates on the cheap-to-detect categories — style, structural conformance, security lints, dependency provenance — so that human review is reserved for the questions only humans can answer: is this the right change at all, in this codebase, at this time.
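One way to read "stronger automated gates on the cheap-to-detect categories" is as a function that runs every inexpensive check first and releases the change to a human only when nothing fires. The sketch below assumes a hypothetical `Change` shape and two toy checks: a crude hard-coded-secret pattern and a size limit. Real gates would call existing linters and scanners rather than rely on a regex like this one.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Change:
    """A proposed change, reduced to what these toy checks need (hypothetical shape)."""
    added_lines: tuple[str, ...]


# A check takes a change and returns zero or more findings.
Check = Callable[[Change], list[str]]

# Illustrative pattern only; real secret scanners use far richer rule sets.
SECRET_ASSIGNMENT = re.compile(r"(?i)(api[_-]?key|secret|password)\s*=\s*['\"][^'\"]+['\"]")
MAX_ADDED_LINES = 400  # hypothetical reviewability threshold


def no_hardcoded_secrets(change: Change) -> list[str]:
    return [f"possible hard-coded secret: {line.strip()}"
            for line in change.added_lines if SECRET_ASSIGNMENT.search(line)]


def reviewable_size(change: Change) -> list[str]:
    n = len(change.added_lines)
    return [f"{n} added lines exceeds {MAX_ADDED_LINES}; split the change"] if n > MAX_ADDED_LINES else []


def automated_gate(change: Change, checks: list[Check]) -> list[str]:
    """Run every cheap check; an empty result means the change may go to a human reviewer."""
    return [finding for check in checks for finding in check(change)]


CHECKS = [no_hardcoded_secrets, reviewable_size]

if __name__ == "__main__":
    agent_change = Change(added_lines=('client = Client(region="westeurope")', 'API_KEY = "sk-example"'))
    findings = automated_gate(agent_change, CHECKS)
    print("ready for human review" if not findings else "\n".join(findings))
```

Keeping each check as a plain function that returns findings makes it cheap to add a new one whenever a review comment turns out to be mechanical.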

5. Standards conformance: linters, formatters, policy-as-code

Agents are excellent at conforming to standards that exist in code, and terrible at conforming to standards that exist only in tribal knowledge. Every standard your organisation cares about must be encoded as a lint, a formatter, a policy-as-code rule, or a CI gate. Architecture decision records, naming conventions, library choices, deprecated patterns — anything that lives only in a Confluence page is invisible to the agent and will be quietly violated. Agentic coding changes the calculation from "worth doing" to "non-negotiable."
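As an illustration of moving a "deprecated pattern" out of a wiki page and into code, here is a small lint built on Python's standard `ast` module. It flags absolute imports of modules the organisation has retired. The module names in `DEPRECATED_IMPORTS` (`legacy_auth`, `legacy_http`) and their replacement advice are hypothetical placeholders.

```python
import ast

# Hypothetical retired modules mapped to the guidance a developer (or agent) should see.
DEPRECATED_IMPORTS = {
    "legacy_auth": "use platform_auth instead",
    "legacy_http": "use the shared http_client wrapper instead",
}


def find_deprecated_imports(source: str, filename: str = "<string>") -> list[str]:
    """One finding per deprecated module (or submodule) imported; relative imports are ignored."""
    findings = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            modules = [node.module]
        else:
            continue
        for module in modules:
            root = module.split(".")[0]
            if root in DEPRECATED_IMPORTS:
                findings.append(
                    f"{filename}:{node.lineno}: '{module}' is deprecated; {DEPRECATED_IMPORTS[root]}"
                )
    return findings


SAMPLE = (
    "import os\n"
    "from legacy_auth.tokens import issue_token\n"
    "import legacy_http as http\n"
)

if __name__ == "__main__":
    for finding in find_deprecated_imports(SAMPLE, "agent_patch.py"):
        print(finding)
```

A CI wrapper would exit non-zero whenever `find_deprecated_imports` returns findings. Linters such as Ruff or Semgrep provide richer, configurable versions of the same banned-import idea.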

6. FinOps and cost attribution

Agentic coding is not free. The token cost of a single substantial session is small in isolation and very large in aggregate, and the cost is borne unevenly across teams and repositories. The same FinOps discipline I have always argued for — attribute the spend, surface it to the team that incurred it, govern it with budgets — applies, with the wrinkle that the unit being attributed is not a compute resource but a session. Treating agent usage as a first-class cost centre, with per-team attribution and per-repository budgets, turns a creeping invisible expense into a managed line item. It also surfaces the failure mode where a team is burning tokens against a problem the platform should have solved instead.
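The following is a minimal sketch of treating the session as the unit of attribution. It prices each session from its token counts, rolls spend up by team (or by repository, by swapping the key function), and reports overspend against budgets. The per-million-token prices are placeholders, not any provider's real rates, and the teams, repositories and budgets are illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable

# Placeholder prices per million tokens; substitute the rates in your own provider contract.
PRICE_PER_MILLION_INPUT = 3.00
PRICE_PER_MILLION_OUTPUT = 15.00


@dataclass(frozen=True)
class SessionUsage:
    """Token usage for one agent session, tagged with who incurred it."""
    team: str
    repository: str
    input_tokens: int
    output_tokens: int


def session_cost(usage: SessionUsage) -> float:
    return (usage.input_tokens * PRICE_PER_MILLION_INPUT
            + usage.output_tokens * PRICE_PER_MILLION_OUTPUT) / 1_000_000


def spend_by(sessions: list[SessionUsage], key: Callable[[SessionUsage], str]) -> dict[str, float]:
    """Total cost grouped by an attribution key such as team or repository."""
    totals: dict[str, float] = defaultdict(float)
    for usage in sessions:
        totals[key(usage)] += session_cost(usage)
    return dict(totals)


def over_budget(spend: dict[str, float], budgets: dict[str, float]) -> dict[str, float]:
    """Overspend per key; spend with no budget at all counts entirely as overspend."""
    return {k: amount - budgets.get(k, 0.0)
            for k, amount in spend.items() if amount > budgets.get(k, 0.0)}


# Illustrative usage records and budgets (invented numbers).
USAGE = [
    SessionUsage("ingest-team", "ingest-service", 2_000_000, 200_000),
    SessionUsage("ingest-team", "ingest-service", 1_000_000, 100_000),
    SessionUsage("search-team", "search-ui", 500_000, 50_000),
]
BUDGETS = {"ingest-team": 10.0, "search-team": 50.0}

if __name__ == "__main__":
    by_team = spend_by(USAGE, key=lambda u: u.team)
    print("spend by team:", by_team)
    print("over budget:", over_budget(by_team, BUDGETS))
```

Counting unbudgeted spend as overspend is a deliberate choice. It surfaces teams that have started using agents before anyone agreed who pays.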

7. Audit trails for non-deterministic generation

This is the surface I am least satisfied with. The audit trail you want for a piece of agent-authored code is: the prompt, the model and its version, the tool calls and their outputs, the intermediate file states, the developer's interventions, and the final diff. With current tooling, you can capture some of that some of the time. You cannot reliably capture all of it all of the time. Even if you could, the artefact would be large and unwieldy, and it would have to outlive the contract terms of the model provider. The regulatory and standards regimes I have spent a decade working under (ISO 13485, IEC 62304, GDPR, HIPAA, MDSAP) assume an artefact you can produce on demand to defend a decision. We do not yet have an industry-standard artefact for "this code was co-authored with an agent." Building one is one of the most important pieces of platform work still outstanding.
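The sketch below does not solve that problem, but it shows one possible building block. It keeps a hash-chained, append-only record of a session's prompt, tool calls, developer interventions and final diff. It stores SHA-256 digests rather than raw content, so the record stays small while the bulky artefacts live in whatever retention store the organisation already uses. The `AuditTrail` format, the entry kinds and the model string are hypothetical. The record can show that retained evidence has not been edited, but it cannot make the generation reproducible.

```python
import hashlib
import json
from dataclasses import dataclass, field


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class AuditTrail:
    """Append-only, hash-chained record of one agent session (a hypothetical format)."""
    session_id: str
    model: str  # the provider's model identifier, including any snapshot or version suffix
    entries: list[dict[str, str]] = field(default_factory=list)

    def _genesis(self) -> str:
        # Bind the chain to this session and model so entries cannot be moved between sessions.
        return sha256_hex(json.dumps([self.session_id, self.model]))

    @staticmethod
    def _entry_hash(kind: str, content_sha256: str, previous: str) -> str:
        return sha256_hex(json.dumps([kind, content_sha256, previous]))

    def append(self, kind: str, content: str) -> None:
        """Record a digest of `content`; the raw content is assumed to be retained elsewhere."""
        previous = self.entries[-1]["entry_hash"] if self.entries else self._genesis()
        content_sha256 = sha256_hex(content)
        self.entries.append({
            "kind": kind,
            "content_sha256": content_sha256,
            "previous": previous,
            "entry_hash": self._entry_hash(kind, content_sha256, previous),
        })

    def verify(self) -> bool:
        """False if any entry was altered, reordered or deleted (truncating the tail is not detected)."""
        previous = self._genesis()
        for entry in self.entries:
            expected = self._entry_hash(entry["kind"], entry["content_sha256"], previous)
            if entry["previous"] != previous or entry["entry_hash"] != expected:
                return False
            previous = entry["entry_hash"]
        return True


TRAIL = AuditTrail(session_id="session-0001", model="example-model-snapshot")
for kind, content in [
    ("prompt", "Migrate the config loader to the v2 SDK"),
    ("tool_call", "run_tests -> 42 passed"),
    ("developer_edit", "Reverted change to retry policy"),
    ("final_diff", "--- a/loader.py\n+++ b/loader.py\n"),
]:
    TRAIL.append(kind, content)

if __name__ == "__main__":
    print(f"{len(TRAIL.entries)} entries, chain intact: {TRAIL.verify()}")
```

Verification catches altered, reordered or deleted entries, but not truncation of the tail. Anchoring the latest `entry_hash` somewhere outside the developer's control would close that gap.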

What is still unsolved

I have made a deliberately confident argument above because I think the platform-engineering frame is the right one. There are at least four problems sitting underneath it that are not yet solved, and the honest version of the Hypervelocity bet has to acknowledge them.

Determinism, properly defined. It is tempting to ask for "deterministic agentic coding" and easy to ridicule the request. The subtler question is: what is the smallest piece of the generation pipeline that needs to be reproducible for the audit trail to be useful? The prompt, the model version, and the tool-call sequence are necessary; whether they are sufficient depends on whether the model provider commits to reproducing outputs from a snapshot. None do, today.

Model versioning as a supply-chain concern. If a session ran against a particular model snapshot, and years later the resulting production code has to be defended in front of an auditor after that model has long since been retired, what is the answer? I do not have one, and I have not heard a satisfying one from anyone else. We are going to need SBOM-style provenance for model versions and tool-call traces before regulators ask for it, not after.

Hallucinated dependency chains. Agents will, with great confidence, import packages that do not exist, or are the wrong one, or have a plausible name but malicious contents. The last category is being actively exploited. The mitigation (dependency allow-listing, signed packages, curated registries) is not new, but it is suddenly urgent for organisations that had been treating the public registry as the source of truth.
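Here is a minimal sketch of the allow-listing half of that mitigation. It reads a pip-style requirements file, normalises each package name using the PEP 503 rule, and rejects anything not on a curated list. `ALLOWED_PACKAGES` stands in for an internal registry, and the parser handles only simple `name[extras]<specifier>` lines. It complements, rather than replaces, hash-pinned installs (pip's `--require-hashes` mode) and signed packages.

```python
import re

# Hypothetical curated allow-list; in practice this would be served by an internal registry.
ALLOWED_PACKAGES = {"requests", "pydantic", "numpy", "azure-identity"}

# The distribution name at the start of a simple requirement line, e.g. "pydantic[email]>=2".
REQUIREMENT_NAME = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name: str) -> str:
    """PEP 503 normalisation: collapse runs of '-', '_' and '.' to '-' and lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def unapproved_requirements(requirements_text: str) -> list[str]:
    """Return requirement names that are not on the allow-list, as written in the file."""
    allowed = {normalize(name) for name in ALLOWED_PACKAGES}
    rejected = []
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue  # blank lines, comments, and pip options such as -r or --index-url
        match = REQUIREMENT_NAME.match(line)
        if match and normalize(match.group(1)) not in allowed:
            rejected.append(match.group(1))
    return rejected


SAMPLE_REQUIREMENTS = """\
requests==2.32.3
Azure_Identity>=1.15
reqeusts==2.31.0  # a plausible-looking typo an agent might emit
pydantic[email]>=2
-r base.txt
"""

if __name__ == "__main__":
    for name in unapproved_requirements(SAMPLE_REQUIREMENTS):
        print(f"not on the allow-list: {name}")
```

Normalising names before comparing matters here. Without it, `Azure_Identity` and `azure-identity` would be treated as different packages, and the check would reject a legitimate dependency as well as the typo.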

Knowledge transfer in the other direction. A junior engineer who learns a codebase by reading and modifying it learns the codebase. A junior engineer who learns it by asking an agent to make changes does not, necessarily. The platform question is whether the agentic workflow can be designed to teach its supervisor, the way pair programming does, rather than only to act for them. The teams I have seen do this best treat agent sessions as artefacts to be reviewed and discussed in their own right, not just as inputs to the diff.

What this looks like from inside the work

I want to close with the texture of this work, day to day, rather than the architectural diagram of it.

It looks like spending a week writing a machine-readable specification for an internal SDK that has lived as prose for three years, because the agents using it produce more working code per attempt once the spec exists. It looks like adding telemetry to a CLI nobody has thought about in eighteen months, because that CLI is now the most-invoked tool in the agentic toolchain. It looks like a long conversation with a security colleague about how to log prompts that might contain regulated data, and an equally long one with a finance colleague about how to attribute a shared LLM budget across product teams.

It looks like writing more contracts and fewer scripts. It looks like treating the build, the lint, the policy-as-code gate and the test suite as load-bearing, because they are now the substrate against which a non-deterministic generator is checking itself. It looks, in other words, like the platform engineering I would have advocated for anyway — with the urgency turned up and the cost of half-measures unusually visible.

The bet I am making at AVEVA, alongside colleagues across Cloud & Platform Services and the wider Hypervelocity Engineering effort, is that an organisation that takes the platform work seriously now will be able to use these tools as a multiplier on the work engineers should never have had to spend their time on — and as a forcing function for the platform discipline the organisation should have invested in regardless.

The keystroke part is real. It is also the easy part. The platform part is the bet.

— Madu