Why Code Review Will Determine Who Wins in the AI Era

For decades, software engineering was defined by writing code. But that balance has shifted. With AI code agents producing high-quality output in seconds, the bottleneck isn't generation anymore; it's everything around it: reviewing, merging, testing, and governing the changes.

We’ve entered a new frontier where engineers spend less time typing and more time orchestrating. Writing code has become the easy part; making sure that code is correct, compliant, and production-ready is where the real challenge now lies.

Codegen CEO Jay Hack joined Merrill Lutsky, co-founder and CEO of Graphite, on AI Hot Takes to dig into why code review is the new bottleneck, and why the way teams rethink their outer loop will decide who ships and who stalls.

Reviews are where teams are getting stuck

The numbers show developers can generate code at a blistering pace. GitHub reported in 2023 that developers using Copilot completed tasks 55.8% faster than control groups, and Amazon says "developers report they spend an average of just one hour per day coding."

But the outer loop hasn’t kept up. An analysis of ~1,000,000 PRs by LinearB shows that cycle time is dominated by review latency, with PRs sitting idle for an average of 4+ days before a reviewer even looks at them. In other words: we can now generate 10x the code, but we can’t yet review or ship it 10x faster.

Lutsky noted:

“If we have these 10x engineering agents, that just makes the problem of code review 10x more important, and more painful for companies that are using them.”

Is stacking how we keep pace?

One of the most powerful responses to this bottleneck is stacked pull requests. Instead of submitting one massive PR, stacking breaks features into small, independently reviewable increments.

This isn’t new. Facebook built Phabricator to support stacked diffs across thousands of engineers, and Google’s Critique adopted similar practices. The reason was simple: smaller diffs are easier to review, unblock dependent work, and reduce the risk of merge conflicts.

Lutsky stated:

“Stacking was invented for orgs with thousands of engineers, but it’s suddenly relevant to every team now that agents can generate code at the scale of those orgs.”

Now, in the era of agents, stacking feels less optional and more like a requirement. Agents generate code in bursts, and humans can't keep up if the output lands as giant, monolithic PRs. Stacking makes agent output digestible, verifiable, and mergeable.
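As a minimal sketch, a stack can be built with plain git by branching each change off the previous one rather than off main. Branch and file names here are illustrative, and the commands only show the branch topology; stacking tools such as Graphite's automate the restacking and PR creation around it.

```shell
# Build a two-change stack in a throwaway repo.
# Branch and file names are illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q -b main
git -c user.name=dev -c user.email=dev@example.com commit -q --allow-empty -m "init"

# Change 1: the base of the stack -- a small, reviewable refactor.
git checkout -q -b refactor-db-layer
echo "db layer" > db.txt
git add db.txt
git -c user.name=dev -c user.email=dev@example.com commit -q -m "refactor db layer"

# Change 2: branched off change 1, not main, because it depends on it.
git checkout -q -b add-caching
echo "cache" > cache.txt
git add cache.txt
git -c user.name=dev -c user.email=dev@example.com commit -q -m "add caching"

# Each branch becomes its own small PR; merging bottom-up keeps
# reviewers looking at one increment at a time.
git rev-list --count main..add-caching   # commits unique to the stack
```

Each branch is opened as its own PR against the one below it, so reviewers see one small diff at a time and the upper changes stay unblocked while the lower ones are in review.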

Solving AI problems with AI solutions

There's a temptation to see AI review as a replacement for rule-based automation, but the reality is that both are necessary.

Deterministic systems such as branch protection, CI pipelines, and merge queues enforce non-negotiables. They ensure that every change passes tests, follows style guides, and respects permission boundaries. But they’re limited. They can’t reason about whether a design decision makes sense.
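As a toy sketch of the deterministic side, a merge gate is simply a script that runs every required check and blocks the merge on any failure. Each command below is a placeholder for a real pipeline step, not an actual CI configuration.

```shell
#!/bin/sh
# Toy deterministic merge gate: every check must pass or the merge is blocked.
# Each echo/true pair stands in for a real CI step.
run_checks() {
  echo "running tests"           && true &&  # e.g. the project's test suite
  echo "running linter"          && true &&  # e.g. style-guide enforcement
  echo "checking owner approval" && true     # e.g. CODEOWNERS validation
}

if run_checks; then
  echo "all checks passed: change may enter the merge queue"
else
  echo "merge blocked" >&2
  exit 1
fi
```

The point is the shape of the guarantee: the gate is mechanical and non-negotiable, which is exactly why it cannot judge whether a design decision makes sense.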

Agentic review fills that gap. Context-aware agents can scan a PR in seconds, check for subtle logic errors, and recommend fixes that a human might miss, especially in unfamiliar parts of the codebase. Studies suggest AI already outperforms humans at spotting certain categories of bugs.

Graphite is already combining merge queues with its review agent, Diamond. Lutsky noted:

“Combining those kinds of deterministic and more traditional methods with agentic review, and having a code review companion…our unique take is that you need both of those combined all into one platform in order to properly handle the volume of code that we’re seeing generated today.”

Deterministic controls guarantee baseline standards, while agentic reviewers accelerate semantic checks. The result is faster throughput without sacrificing safety.

Optimizing the outer loop

We’re in the middle of an exciting shift. Code generation is fast and plentiful. The bottlenecks are now review, orchestration, and governance — the outer loop of development.

Optimizing this outer loop requires:

  • Making stacking the default, so changes are digestible.
  • Blending deterministic rules with agentic review for speed and safety.
  • Building review interfaces that tell a story and scale to agent-level throughput.
  • Treating AI metadata as compliance-critical data, not an afterthought.
  • Meeting developers where they work, whether in Slack, GitHub, or natural language interfaces.

The message for teams of all sizes is clear. Code is no longer the bottleneck. Review is. The winners in this new era will be the teams that redesign their workflows around that fact.

Want to check out the full conversation? Watch Jay Hack and Merrill Lutsky discuss how AI code generation is breaking traditional development workflows, and why code review has become the real bottleneck on AI Hot Takes.

If you’re still stuck in PR purgatory, it’s time to try Codegen. Free to start, or schedule a demo if you want receipts.