Vibing to Prod: Why DORA Misses the AI Inflection Point
DORA shows delivery improving, but AI’s biggest impact happens earlier. AI shifts value upstream into reasoning and decision making, where work is avoided before it exists. Vibing now produces production-ready outcomes, moving rigor earlier and outside what DORA can see.
DORA metrics remain one of the clearest ways we have to understand software delivery health. Across the industry, those metrics show steady, incremental improvement. Teams are shipping a bit faster. Reliability is improving modestly. From that vantage point, AI looks like a useful productivity boost, but not a fundamental shift.
At the same time, many experienced engineers report something very different in practice. Work that once required weeks of exploration now collapses into days. Architectural decisions harden earlier. Entire solution paths disappear before code is written. Some teams move in ways that feel categorically different, even when the metrics barely move.
These two observations appear to be in tension.
Either AI is overhyped, or we are missing something in how we are measuring change.
The resolution is not that DORA is flawed, or that AI impact is imaginary. It is that AI changes where meaningful work happens, and DORA was never designed to observe that part of the system.
What DORA is designed to see
DORA metrics are excellent at what they were built to do. They measure execution quality inside a known delivery system.
Lead time, deployment frequency, change failure rate, and mean time to recovery all share a common premise. Work enters the system in a reasonably stable form, and improvement comes from making that system flow more smoothly and recover more safely.
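To make that premise concrete, all four metrics can be computed from deployment events alone. The sketch below uses hypothetical record fields (`committed`, `deployed`, `failed`, `restored`) rather than any standard DORA schema; the point is that nothing in the input captures work done before a commit existed.

```python
from datetime import datetime, timedelta

# Hypothetical deployment records: each dict is one production deploy.
# Field names are illustrative, not a standard DORA schema.
deployments = [
    {"committed": datetime(2024, 5, 1, 9), "deployed": datetime(2024, 5, 2, 9),
     "failed": False, "restored": None},
    {"committed": datetime(2024, 5, 3, 9), "deployed": datetime(2024, 5, 3, 15),
     "failed": True, "restored": datetime(2024, 5, 3, 17)},
    {"committed": datetime(2024, 5, 6, 9), "deployed": datetime(2024, 5, 7, 9),
     "failed": False, "restored": None},
]

window_days = 7  # observation window for frequency

# Lead time for changes: commit -> running in production.
lead_times = [d["deployed"] - d["committed"] for d in deployments]
avg_lead_time = sum(lead_times, timedelta()) / len(lead_times)

# Deployment frequency: deploys per day over the window.
deploy_frequency = len(deployments) / window_days

# Change failure rate: share of deploys that caused a failure.
failures = [d for d in deployments if d["failed"]]
change_failure_rate = len(failures) / len(deployments)

# Mean time to recovery: failure detected at deploy -> service restored.
mttr = sum((d["restored"] - d["deployed"] for d in failures),
           timedelta()) / len(failures)

print(avg_lead_time, deploy_frequency, change_failure_rate, mttr)
```

Every input to these formulas is an event inside the pipeline. Exploration that prevented a fourth risky deployment from ever being attempted changes none of the numbers.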
Those assumptions emerged during the Agile and DevOps era and were later crystallized by DORA into a clear operational framework. For most of modern software delivery, the primary source of leverage was reducing friction inside the pipeline. Better tooling. Better automation. Better coordination.
DORA raises the floor. It helps organizations become reliable. It turns chaos into competence.
That is not a limitation. It is the point.
Even leaders who operate upstream in product, architecture, or planning often rely on delivery metrics as a shared language for organizational health. That reliance makes shifts in upstream work harder to see unless they are explicitly surfaced.
What AI changes about the shape of work
AI does not simply accelerate execution. It changes where value is created.
Increasingly, the most consequential work happens before code exists, before tickets exist, and sometimes before anyone agrees there is work to do at all. Conversational exploration replaces spike tickets. Architectural options are evaluated through reasoning rather than implementation. Failure modes are surfaced and avoided instead of discovered through rollback.
From the perspective of delivery metrics, very little appears to happen.
From the perspective of outcomes, a great deal has changed.
DORA measures how well teams execute decisions. AI increasingly improves how decisions are formed in the first place. That improvement happens upstream of the pipeline DORA observes.
This is not a tooling gap. It is a visibility gap.
One concrete example helps make this tangible.
Consider a team that spends three days in conversational exploration with AI. Two architectural approaches are explored and discarded. A third is hardened through dialogue. Tests are sketched. Operational constraints are identified. The team ships in a week. DORA records a single deployment. The avoided work never appears.
Vibing as a production pathway
Vibing is often framed as pre-work that must later be formalized or cleaned up before real engineering begins. That framing assumes a clear boundary between exploration and production.
AI collapses that boundary.
When vibing is effective, discipline moves earlier. Tests are often sketched or written during exploration rather than after implementation. Architectural constraints surface in conversation instead of code review. Operational concerns such as failure modes, observability, and rollout strategy are shaped before a single commit exists. By the time code is written, many production decisions are already locked in.
This does not mean shipping directly from a chat window without review or safeguards. Production still requires verification, testing, and operational checks. What changes is not the need for rigor, but where rigor is applied.
Vibing does not replace engineering discipline. It front-loads it.
The result is that production work increasingly reflects decisions that have already been reasoned through deeply. Less work flows through the system because less work needs to.
This shift is real, but its effects show up unevenly, which is why the metrics struggle to capture it.
Why the metrics do not light up
Uneven outcomes appear because AI is not a leveling force: it amplifies a team's existing reasoning and judgment. That unevenness is a consequence of the upstream shift, not its purpose.
Because AI amplifies reasoning rather than equalizing skill, organizations should not expect uniform gains. Teams with strong sensemaking capabilities often see disproportionate benefit, while teams that rely primarily on process and throughput tend to see smaller, incremental improvements. When those experiences are aggregated, the signal flattens into modest averages.
This happens because reasoning quality varies. AI helps some teams eliminate work before it exists by surfacing better decisions earlier. In other cases, AI is applied primarily inside an unchanged delivery pipeline, where its impact is naturally constrained. Both patterns can coexist within the same organization.
DORA reports faithfully on what it can see. It measures execution once work has entered the system. It cannot count work that never needed to be done, or decisions that prevented entire classes of change from entering the pipeline at all.
That is why AI’s impact can feel obvious to practitioners and muted in dashboards at the same time.
How to orient without abandoning rigor
The answer is not to discard delivery metrics. It is to place them correctly.
First, separate exploration from execution in your mental model. Vibing and production are not opposites, but they are not the same phase either. Do not score early sensemaking with late-stage metrics.
Second, allow pre-metric space. Conversational exploration, architectural reasoning, and option mapping create value precisely because they are not immediately legible to dashboards. If every activity must justify itself through throughput, the most valuable work will not happen.
Third, translate insight before scaling. Ask what decisions AI helped you make earlier, which paths were avoided, and which patterns are now stable. Once the shape of the work hardens, metrics become not only appropriate but essential.
Finally, apply DORA where it shines. Use it to stabilize proven workflows and keep production healthy. Let it confirm outcomes rather than police exploration.
The inflection point DORA cannot capture
The AI inflection point is not primarily about shipping faster. It is about arriving at production-worthy decisions earlier.
Execution metrics still matter. They simply no longer tell the whole story on their own.
As AI pushes more leverage into early sensemaking, the most important changes occur before work becomes visible to traditional delivery measurement. If we continue to evaluate AI through execution metrics alone, its impact will look incremental.
If we look upstream, the shift is already clear.
Vibing is no longer just pre-work. In many cases, it is how production happens.