AI Native Builders

The Throughput Wall: Redesigning Sprint Velocity When Agents Generate 10x Your Review Capacity

AI agents generate code overnight but human review capacity hasn't scaled. Here's the review throughput formula and sprint redesign engineering managers need.

Strategy & Operating Model · Intermediate · Apr 8, 2026 · 6 min read
By Viktor Bezdek · VP Engineering, Groupon
[Editorial illustration: a quality inspector at a tiny desk buried under an avalanche of packages from a high-speed factory conveyor belt, while robotic assembly arms behind it keep running at full speed, representing human reviewers overwhelmed by AI agent code generation.]
The conveyor belt doesn't stop because you're behind.

Sprint velocity becomes meaningless when agents generate code faster than humans can review it. Story points go up; cycle time stays flat. The sprint board looks productive right up until the review queue contains thirty-seven unmerged pull requests and it's day eight of ten.

This is what happens when teams add coding agents without redesigning the operating model around them. The agents execute overnight — reliably, quickly, without complaint. By morning, GitHub shows dozens of open pull requests. The developers arrive, feel good about the output, and then realize nobody planned for who reviews all of this. The sprint was designed the old way: estimate what the team can build, assign the work, measure velocity in points completed. That model assumed code generation was the rate-limiting step. Agents removed that assumption in one sprint cycle.

The throughput wall is the point where agent code generation capacity outpaces human review capacity. It is not a tooling problem — adding better agents makes it worse. It is a sprint design problem: the operating model was built for a world where writing code was slow and expensive. Redesigning for a world where writing code is nearly free requires inverting the planning logic entirely. You plan sprints backwards now, from review throughput, not forward from agent capability.

When the Constraint Shifts Overnight

Code generation is no longer the slowest stage. Review is. Everything downstream from that fact is different.

  • 87% → 28% — defect detection rate drops from 87% on PRs under 100 lines to 28% on PRs over 1,000 lines (SmartBear/Cisco research via the LinearB benchmark)[2]

  • < 219 lines — median PR size for elite engineering teams in the LinearB analysis of 8.1 million pull requests across 4,800 teams[2]

  • < 7 hours — PR pickup time (time to first review) for elite teams; agent-generated PRs routinely sit for days in the queue on unoptimized teams[2]

Eliyahu Goldratt's Theory of Constraints[1] offers the clearest framework for what happens here. Every system with interdependent stages is bounded by its slowest stage — the constraint. Improve any other stage and you do not improve the system: you build up more work-in-progress in front of the actual bottleneck.

For most software delivery pipelines, code generation was never the constraint. Code review, QA, and deployment approval were slower. AI agents accelerated code generation — a non-constraint stage — without touching any downstream stage. The result is more code arriving at the review gate faster than reviewers can clear it. The queue grows. Review quality degrades as reviewers rush through larger piles of increasingly unfamiliar code. And the problem compounds: agents building on top of unreviewed code produce PRs with assumptions baked in that nobody has validated.[6]

This is why practitioners describe encountering the wall as a sudden shift rather than a gradual slowdown.[4] The agents are efficient enough that a small team can hit review capacity within the first sprint or two of adoption. The code keeps generating. The reviews don't keep up.

Why Adding More Agents Makes the Wall Taller

Optimizing a non-constraint stage builds inventory, not throughput.

[Flow diagram: The Throughput Wall, sprint flow with agents. Sprint start forks into Coding Agents A, B, and C, each executing overnight. Their output joins at the human review gate into a PR queue (the wall) that clears slowly through a single human reviewer. Each PR is either approved, passing QA and deploying, or returned.]
Multiple agents fan into a single human review gate. Adding agents increases the queue height — not the merge rate.

The Agile Leadership Day India framework for AI-augmented Scrum teams[5] puts this plainly: "A 24/7 AI agent will quickly outpace human reviewers. If you do not plan human capacity for code review, your agents will stack up a massive backlog of unmerged pull requests, stalling your entire continuous integration pipeline."

The instinct when facing this is to add more agents — the reasoning being that agents are cheap, so scaling them is low-cost. This is the wrong response. More agents at the generation stage mean more PRs at the review stage. The constraint is not generation; it is review. Adding capacity to the wrong stage increases inventory without improving throughput.[7] The correct response is to constrain agent output to what reviewers can actually clear.

Designing the Sprint Backwards From Review Capacity

The planning sequence flips: review budget first, agent assignment second.

The inversion is this: instead of asking "what can agents generate this sprint?", ask "what can reviewers approve this sprint?" Build the sprint backwards from that number.

This feels counterintuitive because agents have surplus capacity. They could generate far more than the sprint assigns them. That surplus feels like waste. It is not waste — it is capacity above the system's constraint, and leaving it unused is the correct call. Agents sitting idle while the review queue clears is the right state. Agents running while the review queue grows is the wrong state, regardless of how full the sprint board looks.

Agent-first sprint design (common failure mode)
  • Assign tickets to team and agents; estimate all work in story points

  • Agents execute overnight; morning reveals a large, unplanned PR queue

  • Measure velocity by story points or PRs opened

  • Review happens when reviewers have bandwidth — backlog carries forward

  • Sprint ends with open agent PRs; next sprint starts already behind

Review-first sprint design (what works)
  • Calculate review throughput first: reviewers × sustainable hours × sprint days

  • Set sprint PR budget from review capacity — this caps agent ticket assignment

  • Measure velocity by PRs merged, not opened

  • Stage agent execution so PRs arrive in daily batches reviewers can absorb

  • Sprint ends with zero open agent PRs — no carryover, clean start

The Review Throughput Formula

A calculation every engineering manager should run before the next sprint kickoff.

The formula has three inputs: the number of available reviewers, the sustainable daily review hours per reviewer, and the average hours required to review one agent PR.

Sprint review throughput = (reviewers × sustainable review hours per day × sprint days) ÷ hours per agent PR review

The "sustainable" qualifier matters. A senior engineer can maintain focused code review for roughly 2–2.5 hours per day before quality degrades meaningfully. Review beyond that threshold still happens, but defect detection drops — reviewers are reading without fully processing. This is not a criticism of the individuals; it is how sustained cognitive load works under context-switching pressure.

Using 2.5 hours as the sustainable budget: a team with three reviewers across a ten-day sprint has 75 review-hours available. If the average agent PR is 400 lines and requires approximately 75 minutes of careful review, that's roughly 60 PRs the team can responsibly approve in that sprint. Assign agents to work that will generate more than 60 PRs and the sprint is over-committed before it starts.
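The worked example above can be expressed as a small calculator. This is a sketch, not standard tooling: the function name and parameterization are illustrative, and the 2.5-hour budget and 75-minute per-PR estimate are the calibration starting points described in this article.

```python
# Sketch of the sprint review throughput formula. All inputs are
# team-specific calibration values, not fixed constants.

def sprint_pr_budget(reviewers: int,
                     sustainable_hours_per_day: float,
                     sprint_days: int,
                     hours_per_pr_review: float) -> int:
    """Sprint PR budget = total review-hours / hours per agent PR review."""
    total_review_hours = reviewers * sustainable_hours_per_day * sprint_days
    return int(total_review_hours // hours_per_pr_review)

# Worked example from the text: 3 reviewers, 2.5 h/day, 10-day sprint,
# ~400-line PRs at roughly 75 minutes (1.25 h) each.
budget = sprint_pr_budget(reviewers=3,
                          sustainable_hours_per_day=2.5,
                          sprint_days=10,
                          hours_per_pr_review=1.25)
print(budget)  # → 60
```

The result is the hard cap on agent ticket assignment for the sprint; halving average PR review time (for example, by halving PR size) doubles the budget with the same headcount.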

Two variables drive this formula above all others: PR count and PR size. Reducing average PR size from 600 lines to 300 lines roughly doubles review throughput with no additional headcount. This is the highest-leverage control variable in a hybrid team's sprint design.

| PR Size (lines) | Defect Detection Rate | Typical Review Time | Elite Benchmark |
| --- | --- | --- | --- |
| < 100 | 87% | 20–30 min | ✓ Below elite median |
| 101–300 | 78% | 45–75 min | ✓ At elite median |
| 301–600 | 65% | 90–150 min | ⚠ Above elite median |
| 601–1,000 | 42% | 2.5–3.5 hrs | ✗ Well above limit |
| > 1,000 | 28% | 3.5+ hrs (often rushed) | ✗ No review is reliable |

Ticket Sizing When 30–50% of Tickets Are Agent-Executed

Story points estimate human effort. They need a replacement for agent work.

Story points were designed to capture human cognitive and execution cost. An agent executes a 5-point ticket in minutes. Applying story points to agent work produces inflated velocity numbers and, worse, a planning model that has no concept of the review burden being created.

Hybrid teams need a different sizing unit for agent work: scope (what gets changed) rather than effort (how long it takes). The relevant question for an agent ticket is not "how many hours?" but "how many files, how many systems, and how many review-hours does this generate?"
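Scope-based sizing can be sketched as a small classifier. The file and line thresholds below mirror the small/medium/large bands described in the steps that follow; treat them as illustrative defaults to calibrate per team.

```python
# Illustrative scope-based sizing for agent-executed tickets.
# Bands (assumed, per this article's guidance):
#   small  = 1–2 files, under 200 lines
#   medium = 3–8 files, 200–500 lines
#   large  = more than 8 files or more than 500 lines (must be split)

def size_agent_ticket(files_touched: int, estimated_lines: int) -> str:
    """Return a scope band instead of story points for agent work."""
    if files_touched > 8 or estimated_lines > 500:
        return "large"   # break into medium tickets before assignment
    if files_touched >= 3 or estimated_lines >= 200:
        return "medium"
    return "small"

print(size_agent_ticket(2, 150))    # small
print(size_agent_ticket(5, 350))    # medium
print(size_agent_ticket(12, 1200))  # large — split before assigning
```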

  1. Separate agent tickets from human tickets in the backlog

     Tag every ticket as human-executed or agent-executed before backlog grooming. Human tickets retain story points. Agent tickets get scope-based sizing: small (1–2 files, under 200 lines), medium (3–8 files, 200–500 lines), large (more than 8 files or more than 500 lines). Large agent tickets must be broken into medium tickets before assignment — an agent that generates a 1,200-line PR from a single large ticket will produce a review that nobody can responsibly approve in one session.

  2. Calculate the sprint PR budget before assigning agent tickets

     At sprint planning, run the review throughput formula before discussing agent ticket assignments. This number is a hard cap — it does not flex upward because agents have surplus capacity. Assign agent tickets until the cumulative expected PR count hits the budget, then stop. Work beyond the budget is backlogged to the next sprint.

  3. Run a mid-sprint review queue health check

     At the sprint midpoint, check the ratio of open agent PRs to remaining reviewer capacity. If more than 30% of agent PRs have waited more than two days without a first-review comment, the sprint is over-committed or execution was staged poorly. Pause new agent ticket assignment, clear the queue, then resume. This ceremony prevents the compounding failure: agents continuing to generate while the backlog grows, ending the sprint with dozens of unmerged PRs that carry forward as technical debt and planning confusion.
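The mid-sprint health check reduces to one ratio. A hedged sketch, assuming PRs are available as simple records with open and first-review timestamps; the 30% ratio and two-day staleness thresholds follow the rule of thumb above and should be tuned per team.

```python
# Sketch of the mid-sprint review queue health check. The PR record
# shape and both thresholds are assumptions, not a real API.
from datetime import datetime, timedelta

def should_pause_agent_assignment(open_prs, now,
                                  stale_after_days=2,
                                  stale_ratio_limit=0.30) -> bool:
    """Pause new agent tickets when too many PRs await a first review."""
    if not open_prs:
        return False
    stale = [pr for pr in open_prs
             if pr["first_review_at"] is None
             and now - pr["opened_at"] > timedelta(days=stale_after_days)]
    return len(stale) / len(open_prs) > stale_ratio_limit

now = datetime(2026, 4, 8, 9, 0)
queue = [
    {"opened_at": now - timedelta(days=3), "first_review_at": None},  # stale
    {"opened_at": now - timedelta(days=3), "first_review_at": now},   # reviewed
    {"opened_at": now - timedelta(days=1), "first_review_at": None},  # fresh
]
print(should_pause_agent_assignment(queue, now))  # → True (1 of 3 stale > 30%)
```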

The Three Sprint Ceremonies Hybrid Teams Are Missing

Add these to planning cadence before the throughput wall hits you in month two.

Review Capacity Planning
At sprint kickoff: calculate reviewer availability and set the PR budget. This number caps agent ticket assignment before backlog grooming begins — not after.
Agent Work Staging
Schedule agent execution in waves across the sprint so PRs arrive in daily batches reviewers can absorb — not all queued on day 1 from overnight runs.
Daily Review Queue Health
A 5-minute standup check: open agent PRs vs. remaining reviewer capacity. If the queue grows faster than it clears, pause agent assignment immediately.
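The agent work staging ceremony can be sketched as simple batching: spread agent tickets across sprint days in waves no larger than what reviewers clear per day, instead of queueing every PR from one overnight run. The function and daily capacity value are illustrative.

```python
# Sketch of staged agent execution: chunk tickets into daily waves
# capped at the team's daily review capacity (an assumed input).

def stage_agent_tickets(tickets, daily_review_capacity):
    """Split tickets into daily waves of at most daily_review_capacity."""
    waves = []
    for start in range(0, len(tickets), daily_review_capacity):
        waves.append(tickets[start:start + daily_review_capacity])
    return waves

tickets = [f"AGENT-{i}" for i in range(1, 15)]  # 14 hypothetical tickets
waves = stage_agent_tickets(tickets, daily_review_capacity=6)
print([len(w) for w in waves])  # → [6, 6, 2]
```

Each wave is released to agents one day at a time, so the review queue never receives more in a day than reviewers can absorb.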
[Flow diagram: Constraint-First Sprint Model. Sprint planning starts from review capacity, which caps the sprint PR budget. Agent execution forks into Agent Tasks A, B, and C, which join into staged PR batches. At the human review gate, a PR that fits the budget is merged and shipped; an oversize PR is returned for re-scoping.]
Design the sprint from review capacity down, not from agent capability up. The PR budget is the constraint that drives everything else.

Acceptance Criteria That Actually Clear the Wall

An agent PR isn't done when it's submitted. It's done when it's reviewed and merged.

Standard acceptance criteria were written for human-generated PRs, where the developer carries context about what they changed and why. Agent-generated PRs arrive without that human context — the reviewer is reading code from a process that has no standing relationship with the codebase and no institutional memory.

This means agent ticket acceptance criteria need two additions: context requirements (what the agent must include in the PR description to make the reviewer's job feasible) and scope constraints (guardrails that prevent agents from modifying code outside the ticket's stated boundaries, which is a common pattern that silently expands review surface area).

Required acceptance criteria for agent-executed tickets

  • PR size within the sprint limit (team-defined, typically 300–500 lines). Agent must submit separate PRs if scope overflows the limit.

  • PR description includes: what changed, why, and which areas carry the most risk. Agent-generated summaries count if complete.

  • Test coverage at or above the codebase baseline. Agents must generate tests alongside implementation code — not as a separate follow-up ticket.

  • No changes to files outside the ticket's stated scope. Agents commonly 'improve' adjacent code; this creates unplanned review surface.

  • Review-ready signal only after a developer scans for obvious issues. One human pass before the PR enters the queue.
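The criteria above can be enforced as a pre-queue gate. A minimal sketch, assuming a hypothetical PR record shape; the field names, the 400-line limit, and the `agent_pr_review_ready` helper are illustrative, not part of any real review tool.

```python
# Hypothetical pre-queue gate for agent PRs against the acceptance
# criteria above. Record shape and thresholds are assumptions.

def agent_pr_review_ready(pr: dict, allowed_paths: set,
                          size_limit_lines: int = 400) -> list:
    """Return a list of acceptance-criteria violations (empty = ready)."""
    violations = []
    if pr["lines_changed"] > size_limit_lines:
        violations.append("PR exceeds sprint size limit; split it")
    for section in ("what_changed", "why", "risk_areas"):
        if not pr["description"].get(section):
            violations.append(f"PR description missing '{section}'")
    out_of_scope = set(pr["files_touched"]) - allowed_paths
    if out_of_scope:
        violations.append(f"Out-of-scope files: {sorted(out_of_scope)}")
    if not pr["human_prescan_done"]:
        violations.append("Needs one human pre-scan before entering queue")
    return violations

pr = {
    "lines_changed": 320,
    "description": {"what_changed": "Add retry logic", "why": "flaky calls",
                    "risk_areas": "timeout handling"},
    "files_touched": ["svc/client.py", "svc/utils.py"],
    "human_prescan_done": True,
}
print(agent_pr_review_ready(pr, allowed_paths={"svc/client.py", "svc/utils.py"}))
# → []
```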

Warning signals that the throughput wall is already here

  • Review queue depth growing sprint-over-sprint without a matching increase in merged PRs

  • PRs sitting more than three days without a first-review comment

  • Reviewers approving PRs in under ten minutes — pattern recognition, not genuine review

  • Sprint velocity (points completed) rising while cycle time stays flat or grows

  • Agent-generated tickets entering QA at higher defect rates than human-written tickets

How many agent-generated PRs can a senior engineer responsibly review per week?

There is no universal number, but you can calculate it from your team's context. A sustainable upper bound is roughly 2–2.5 hours of focused review per day. At 45–90 minutes per small-to-medium agent PR (200–400 lines), that is 2–3 agent PRs per reviewer per day, or 10–15 per reviewer per week. For larger PRs (600+ lines), the number drops toward 5–10 per week. Beyond these ranges, defect detection rates fall measurably — reviewers start pattern-matching rather than reading carefully. The LinearB benchmark data puts elite team PR sizes below 219 lines for exactly this reason: small PRs review fast and catch more defects per hour of reviewer time.

What should we do when agents generate more PRs than reviewers can handle in a sprint?

Stop assigning new agent tickets. The instinct is to keep agents running because they are cheap and the tickets exist in the backlog. Resist it. A growing review queue creates compounding problems: agents may start building on unreviewed code, reviewers lose context between sessions, and the queue ages in a way that makes it progressively harder to clear. Pause agent execution, clear the existing queue to zero, then resume with a corrected sprint PR budget. One sprint of under-assignment is better than three sprints of compounding overhang.

Do story points still work for sprints with agent-executed tickets?

For human-executed work, yes. For agent-executed work, no — not as a capacity planning tool. Story points capture human effort; agents collapse execution time to near-zero, making point estimates meaningless for agent tickets. Most teams running hybrid sprints eventually split their sprint board: agent tickets tracked by scope and review budget, human tickets tracked by story points. The two systems run in parallel without conflict.

Should we hire more reviewers or improve AI review tooling first?

Improve tooling first. AI-assisted code review tools that handle first-pass flagging (security vulnerabilities, missing test coverage, obvious code smells) can meaningfully reduce the time a human reviewer needs per PR. If automated tooling reduces per-PR review time by 20–30%, that directly expands sprint PR budget without headcount. Once tooling is in place and the PR budget is still insufficient, then additional reviewer headcount is justified — and the tooling makes new reviewers more effective from day one.

How does sprint retrospective change with a hybrid human-agent team?

The core retro questions shift. Instead of 'did we estimate correctly?', ask 'was our review capacity accurately planned?'. Instead of 'what slowed individual developers?', ask 'where did the review queue back up and why?'. Velocity calibration — the traditional retro focus — becomes less relevant because agent execution time is consistent and fast. Throughput system analysis becomes the primary goal: was the sprint PR budget right? Did staging work? Were acceptance criteria enforced? These questions produce more actionable adjustments than estimation accuracy discussions.

Sources and data notes

Defect detection rates by PR size (87% to 28% across size ranges) are from SmartBear/Cisco research as reported and contextualized in Vitalii Petrenko's analysis of the LinearB 8.1M PR dataset. LinearB benchmark figures (elite team median PR size and pickup time) are from the same source. The sustainable review hours estimate (2–2.5 hours per day) is practitioner-derived and consistent with cognitive load research on sustained analytical tasks, but not from a single citable study — treat it as a calibration starting point, not a fixed ceiling. Specific teams report different sustainable windows based on reviewer experience, codebase familiarity, and PR quality.


Sources
  [1] Goldratt Institute: Theory of Constraints (goldratt.com)
  [2] Vitalii Petrenko: The Hidden Cost of Slow Code Reviews — Data from 8 Million PRs (LinearB benchmark, SmartBear/Cisco defect data) (medium.com)
  [3] Abhilash M: Your AI Coding Agent Is a 100x Developer — But Your Code Review Process Isn't (medium.com)
  [4] Kukicola: The Review Bottleneck — When AI Codes Faster Than You Can Read (kukicola.io)
  [5] Agile Leadership Day India: AI-Augmented Scrum Framework — Running Scrum When Half Your Team is AI Agents (agileleadershipdayindia.org)
  [6] ZenBusiness: Breaking Bottlenecks — Applying Theory of Constraints to Software Development (tech.zenbusiness.com)
  [7] iSixSigma: From Concept to Code — Leveraging Theory of Constraints for Software Development (isixsigma.com)
  [8] GitHub: Pull Request Throughput and Time-to-Merge Available in Copilot Usage Metrics API (github.blog)
  [9] Qodo: Build a Code Review Process That Handles 10x More PRs (qodo.ai)
  [10] More Than Monkeys: The Pragmatic Engineer's Guide to the Theory of Constraints (morethanmonkeys.medium.com)