Karpathy’s four coding-agent rules work because they compress senior engineering taste into a small prompt surface. Think Before Coding. Simplicity First. Surgical Changes. Goal-Driven Execution. Good defaults. Better than the usual instruction soup.
But a rule set that works in a demo can still fail in production. Production coding agents do not merely write code. They edit tests, change build files, infer conventions, summarize pull requests, ask clarifying questions, and report completion to humans who are trying to decide whether to trust the diff. That is where the original four rules are too soft.
The evidence points in a specific direction. Change size and diffusion predict defect risk.[1] Review effectiveness drops when a change grows past a few hundred lines.[2] Google’s own engineering guidance says a 100-line changelist is usually reasonable, while 1000 lines is usually too large.[3] Those findings support Surgical Changes and Simplicity First. The weaker point is Goal-Driven Execution: "tests pass" is not a success criterion when the agent can author or weaken the tests. SWE-bench+ found that 31.08% of passing patches in its reviewed sample were suspicious because weak tests let incorrect or incomplete changes pass.[4]
So I rewrote the four principles as production controls. The rewrite keeps the spirit. It adds what production needs: named anti-patterns, a distinction between prototype and trust-boundary code, an explicit test-gaming firewall, and a fifth rule, Calibrated Communication, because agents fail review when the message and the diff disagree.
Key Takeaways
- The original four rules are good taste, not a production control system. They need thresholds, anti-pattern names, and failure-mode mapping.
- The missing fifth principle is Calibrated Communication: report what changed, what was verified, what was not verified, and what assumptions remain. No competence theater.
The Four Rules Are Taste. Production Needs Controls.
A principle is useful only when it prevents the failure mode the agent is most likely to produce.
The original rules do four useful things. They slow the agent down before editing. They push against speculative abstraction. They discourage drive-by refactors. They tell the agent to verify instead of narrate.
The problem is not direction. The problem is operational underspecification. Think Before Coding does not say when a question is worth the user’s attention. Simplicity First does not separate prototype shortcuts from production boundary checks. Surgical Changes does not define how to surface adjacent defects without silently expanding scope. Goal-Driven Execution does not protect the verification loop from test-gaming.
That matters because LLM coding agents do not fail only the way junior developers do. They have agent-specific failure modes: confident hallucination about APIs they did not read, sycophantic agreement when challenged, over-editing disguised as helpfulness, weak-test passing, and phantom completion reports. The rule set has to name those shapes. If an anti-pattern has no name, the agent cannot be instructed to avoid it and the reviewer cannot point to it cleanly.
The production rewrite is therefore not a longer motivational poster. It is a small control layer: thresholds, labels, and stop conditions.
| Principle | Evidence signal | Production gap |
|---|---|---|
| Surgical Changes | Strong: Mockus & Weiss link change size and diffusion to defect risk; SmartBear/Cisco finds review effectiveness drops above 200-400 LOC; Google tells authors to keep CLs small.[1][2][3] | No declared scope, no structured path for adjacent issues, no phantom-change check. |
| Simplicity First | Strong: design literature and code-review practice both penalize shallow abstractions, large interfaces, and speculative generality. | No trust-boundary exception. YAGNI can accidentally delete validation, structured errors, or retry safety. |
| Think Before Coding | Moderate: clarification research shows structured questions improve tool-agent success while reducing question count.[7] | No threshold. Without a budget, it creates question-spam and menu deferral. |
| Goal-Driven Execution | Mixed: verification is essential, but test-first evidence is contested and weak tests can inflate success.[4][5] | No test-gaming firewall, no independent success criterion, no stop rule. |
| Calibrated Communication | Emerging: confidence calibration for code is poor out of the box; users need evidence, not self-assessment.[8] | Missing entirely from the original four-rule set. |
- 31.08% of reviewed SWE-bench passing patches were flagged by SWE-bench+ as suspicious due to inadequate tests.[4]
- 29.6% of plausible SWE-bench Verified patches behaved differently from ground truth under differential testing.[5]
- SAGE-Agent reports higher coverage while asking 1.5-2.7× fewer clarification questions.[7]
Principle 1: Think Before Coding Means Classifying Uncertainty.
Ask only when the ambiguity changes structure. Read when the uncertainty is in the code. Verify when the uncertainty is in the model.
The original Think Before Coding rule is right to resist blind implementation. It is wrong if it becomes a permission slip for interrogation.
There are three different uncertainties hiding under the same sentence. spec_uncertainty means the user’s intent is structurally ambiguous. Ask. code_uncertainty means the agent does not know how the codebase works. Read files, tests, and conventions. model_uncertainty means the agent is unsure whether its approach is correct. State the risk and verify.
The threshold is structural. Ask only when two plausible interpretations would change the schema, API contract, file touched, user-visible behavior, or failure mode. If the difference is naming, style, or a reversible detail, state the assumption and proceed.
The hardened rule also kills the menu anti-pattern. Agents love to list three options and make the user adjudicate. That feels collaborative. It is often decision avoidance. Recommend one path. Justify it in one sentence. Ask once only if the choice would be expensive to reverse.
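The three-way split and the structural threshold fit in a small decision function. The following is a minimal TypeScript sketch. It assumes the agent has already labeled its uncertainty. The names Uncertainty, Difference, and nextAction are hypothetical and are not part of any agent framework.

```typescript
// Hypothetical: the three uncertainty kinds named in Principle 1.
type Uncertainty = "spec_uncertainty" | "code_uncertainty" | "model_uncertainty";

// Dimensions along which two plausible interpretations of a request might differ.
type Difference =
  | "schema"
  | "api_contract"
  | "file_touched"
  | "user_visible_behavior"
  | "failure_mode"
  | "naming"
  | "style"
  | "reversible_detail";

type Action = "ask" | "read" | "state_and_verify" | "state_assumption_and_proceed";

// Only structural differences justify spending the user's attention on a question.
const STRUCTURAL: ReadonlySet<Difference> = new Set<Difference>([
  "schema",
  "api_contract",
  "file_touched",
  "user_visible_behavior",
  "failure_mode",
]);

function nextAction(kind: Uncertainty, differences: Difference[]): Action {
  switch (kind) {
    case "code_uncertainty":
      return "read"; // the answer is in the repo: read files, tests, conventions
    case "model_uncertainty":
      return "state_and_verify"; // state the risk, then verify the approach
    case "spec_uncertainty":
      // Ask only when the competing interpretations differ structurally.
      return differences.some((d) => STRUCTURAL.has(d))
        ? "ask"
        : "state_assumption_and_proceed";
  }
}

console.log(nextAction("spec_uncertainty", ["naming"])); // state_assumption_and_proceed
console.log(nextAction("spec_uncertainty", ["naming", "schema"])); // ask
```

Naming and style differences never reach the user. A schema or contract difference does.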
principle-1-think-before-coding.md
PRINCIPLE 1: Think Before Coding
Surface confusion. State assumptions. Take a position.
Before writing code:
- Read the relevant code first. Do not infer when you can read.
- State your top 2-3 assumptions in one line each. Proceed unless one is
high-impact and uncertain.
- Ask only when the request has multiple plausible interpretations and
the difference changes the schema, API contract, file touched, or failure mode.
- When you ask, ask once. Batch questions as 2-4 options with a recommended default.
- Label uncertainty explicitly:
* spec_uncertainty: ask.
* code_uncertainty: read.
* model_uncertainty: state and verify.
Anti-patterns:
- Menu anti-pattern: listing options to defer a decision.
- Confident hallucination: stating API behavior without reading or verifying.
- Sycophantic agreement: changing position without new evidence.
- Question-spam: asking what the repo or instructions already answer.

Principle 2: Simplicity First Needs a Production Boundary.
Minimum code is good. Missing boundary validation is not simplicity. It is an unowned risk.
Simplicity First is easiest to abuse when an agent treats every safeguard as speculative. That is not simplicity. That is deleting the contract.
The production calibration is the important addition. In prototype code, the smallest working path is often the right path. In production code, inputs crossing a network, file, process, model, user, or dependency boundary need validation. Not because every impossible state deserves ceremony, but because "impossible" must mean excluded by the type system or control flow. It cannot mean the model did not expect it.
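The following is a minimal TypeScript sketch of that boundary contract. It assumes JSON arrives from outside the process. The boundary function parses the input once and returns a structured error. Internal code accepts only the validated type. RefundRequest, parseRefundRequest, and describeRefund are hypothetical names. A real codebase might use a schema library instead of hand-written checks.

```typescript
// Hypothetical request shape: only values that passed validation reach internal code.
interface RefundRequest {
  orderId: string;
  amountCents: number;
}

// Structured error: callers can branch on `field` instead of parsing message strings.
type ParseResult =
  | { ok: true; value: RefundRequest }
  | { ok: false; field: string; reason: string };

// Boundary check: runs once, where untrusted input (network, file, model output) enters.
function parseRefundRequest(raw: string): ParseResult {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, field: "$", reason: "invalid JSON" };
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return { ok: false, field: "$", reason: "expected a JSON object" };
  }
  const record = data as Record<string, unknown>;
  const orderId = record["orderId"];
  const amountCents = record["amountCents"];
  if (typeof orderId !== "string" || orderId.length === 0) {
    return { ok: false, field: "orderId", reason: "expected a non-empty string" };
  }
  if (typeof amountCents !== "number" || !Number.isInteger(amountCents) || amountCents <= 0) {
    return { ok: false, field: "amountCents", reason: "expected a positive integer" };
  }
  return { ok: true, value: { orderId, amountCents } };
}

// Internal path: no re-validation; the RefundRequest type records that the check happened.
function describeRefund(req: RefundRequest): string {
  return `refund ${req.amountCents} cents on order ${req.orderId}`;
}

const parsed = parseRefundRequest('{"orderId":"A-17","amountCents":500}');
if (parsed.ok) {
  console.log(describeRefund(parsed.value)); // refund 500 cents on order A-17
} else {
  console.log(`rejected ${parsed.field}: ${parsed.reason}`);
}
```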
The abstraction rule should also be sharper. Use duplication until the second or third real use case proves a stable shape. Prefer deep modules: small interfaces with enough implementation behind them to hide complexity. A 200-line function with two inputs can be simpler than a 50-line framework with twelve knobs and one caller.
| Avoid | Prefer |
|---|---|
| Skip input validation because the caller is internal. | Validate at trust boundaries. Keep internal paths direct after validation. |
| Add a generic strategy interface for a second use case that does not exist. | Duplicate twice. Abstract when the third real case proves the shape. |
| Hide a single branch behind three helper functions to look tidy. | Inline single-use helpers unless they name a non-obvious operation. |
| Add configuration flags because a future team might need them. | Ship the knob only when a real caller turns it. |
principle-2-simplicity-first.md
PRINCIPLE 2: Simplicity First
Smallest correct solution. Earn every abstraction. Calibrate to environment.
Default to:
- Fewest lines that pass acceptance criteria and survive trust-boundary inputs.
- Three duplications before abstraction. One use is not a pattern.
- Deep modules: small interfaces, rich implementations.
- Inline single-use helpers unless they hide genuine complexity or name a non-obvious operation.
Production calibration:
- Validate inputs at the boundary. That is the boundary contract.
- Handle errors the type system cannot exclude.
- Log at decision points, not every line.
- Make idempotent operations retry-safe. Make non-idempotent operations explicit.
Anti-patterns:
- Premature abstraction.
- Speculative generality.
- Framework-within-a-framework.
- Gold-plating.
- Configuration cancer.
- Premature inlining.

Principle 3: Surgical Changes Needs Declared Scope.
Every changed line should trace to the request. Adjacent issues are surfaced, not silently fixed and not silently ignored.
Surgical Changes has the best empirical backing. Change diffusion matters. Review size matters. File count matters. The agent should feel friction when a one-file task turns into a five-file patch.
The operational handle is declared scope. Before editing, the agent names the files it expects to touch. If it needs another file, that is not forbidden. It is a scope expansion event. The agent states why the file entered scope. This turns invisible drift into a reviewable decision.
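The scope-expansion check can be written as a pure function. This sketch assumes the harness can supply the declared file list and the paths actually edited. checkScope and its types are hypothetical. Expansions are not rejected. Only the ones without a recorded reason are flagged.

```typescript
// One edit outside the declared scope, plus the reason the agent recorded (if any).
interface ScopeExpansion {
  file: string;
  reason: string | undefined;
}

interface ScopeCheck {
  inScope: string[];
  expansions: ScopeExpansion[];
  unexplained: string[]; // expansions with no recorded reason: flag these for review
}

// Compare the paths actually edited against the scope declared before editing.
function checkScope(
  declared: readonly string[],
  edited: readonly string[],
  reasons: ReadonlyMap<string, string> = new Map(),
): ScopeCheck {
  const declaredSet = new Set(declared);
  const inScope: string[] = [];
  const expansions: ScopeExpansion[] = [];
  for (const file of edited) {
    if (declaredSet.has(file)) {
      inScope.push(file);
    } else {
      expansions.push({ file, reason: reasons.get(file) });
    }
  }
  // An empty string does not count as a reason.
  const unexplained = expansions.filter((e) => !e.reason).map((e) => e.file);
  return { inScope, expansions, unexplained };
}

const scope = checkScope(
  ["src/billing/refund.ts"],
  ["src/billing/refund.ts", "src/billing/currency.ts", "README.md"],
  new Map([["src/billing/currency.ts", "refund path calls the rounding helper here"]]),
);
console.log(scope.unexplained.join(", ")); // README.md
```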
The second addition is a structured place for adjacent issues. The original advice (mention it, do not delete it) is directionally correct, but it needs a format: a "Noticed but not changed" entry with file:line, one sentence, and a suggested follow-up. That resolves the Boy Scout trap. The agent does not use adjacent cleanup to inflate the diff, but it also does not bury a real defect because the prompt said "surgical."
principle-3-surgical-changes.md
PRINCIPLE 3: Surgical Changes
Every changed line traces to the request. Declare scope. Surface, do not sweep.
Before editing:
- Declare the files you intend to edit.
- Treat any edit outside that list as explicit scope expansion.
- Read convention sources in order: formatter, linter, EditorConfig, CLAUDE.md, surrounding code.
While editing:
- Touch only what the task requires.
- Remove imports, variables, helpers, and branches that your change made unused.
- Do not remove unrelated dead code that was already dead.
When you notice something unrelated:
- Add it under Noticed but not changed with file:line and one sentence.
- Do not fix it in this diff.
- If it is security or correctness critical, stop and ask whether to open a separate change.
Anti-patterns:
- Boy Scout trap.
- Yak shaving.
- Style drive-by.
- Test-gaming.
- Phantom change.
- Diff inflation.

Principle 4: Goal-Driven Execution Needs a Test-Gaming Firewall.
Verification has to be independent enough to mean something. Otherwise green is just another output the agent learned to optimize.
Loop until verified sounds disciplined. It becomes dangerous when the loop treats any green check as success.
The evidence is blunt. SWE-bench+ found solution leakage and weak-test passing large enough to collapse reported performance.[4] A later study of SWE-bench Verified found that 29.6% of plausible patches behaved differently from ground truth under differential patch testing.[5] Passing tests can be a weak signal. Passing tests the agent wrote can be a tautology.
The hardened rule starts by defining success before coding. A failing test that will pass is ideal. A manual reproduction step can work. A static property can work. What does not work: "I will make the tests pass," without saying which behavior those tests protect.
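One way to make "define success before coding" checkable is to encode the three acceptable criterion kinds as a discriminated union. Each kind must name the behavior it protects. This TypeScript sketch is illustrative only. SuccessCriterion, criterionProblems, and the protects field are hypothetical.

```typescript
// Hypothetical encoding of the three success-criterion kinds named in Principle 4.
// Every variant must say which behavior it protects.
type SuccessCriterion =
  | { kind: "failing_test"; testId: string; protects: string } // fails now, passes after
  | { kind: "manual_repro"; steps: string[]; expected: string; protects: string }
  | { kind: "static_property"; property: string; protects: string }; // currently fails, will hold

// Returns the problems found; an empty list means the criterion is specific enough to start.
function criterionProblems(c: SuccessCriterion): string[] {
  const problems: string[] = [];
  if (c.protects.trim() === "") problems.push("does not name the behavior it protects");
  switch (c.kind) {
    case "failing_test":
      if (c.testId.trim() === "") problems.push("no test identified");
      break;
    case "manual_repro":
      if (c.steps.length === 0) problems.push("no reproduction steps");
      if (c.expected.trim() === "") problems.push("no expected outcome");
      break;
    case "static_property":
      if (c.property.trim() === "") problems.push("no property stated");
      break;
  }
  return problems;
}

const vague: SuccessCriterion = {
  kind: "failing_test",
  testId: "refund.test.ts: rounds half-cents up",
  protects: "",
};
console.log(criterionProblems(vague).join("; ")); // does not name the behavior it protects
```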
It also gives the agent a stopping rule. Three attempts is often enough to tell whether the agent is converging or thrashing. If the verification stack does not pass, stop and report the exact failure. A partial but honest state is safer than a green diff produced by weakening assertions.
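The stopping rule can be sketched as a budgeted loop over the verification stack listed in Principle 4: format, lint, typecheck, unit, integration, acceptance. The sketch assumes a hypothetical runStack callback that runs the stack once and returns the first failing stage. The default budget of 3 mirrors the three-attempt heuristic above. It is an assumption, not a universal constant.

```typescript
// Verification stack from Principle 4, cheapest and fastest checks first.
const STACK = ["format", "lint", "typecheck", "unit", "integration", "acceptance"] as const;
type Stage = (typeof STACK)[number];

// Hypothetical callback: runs the whole stack once, returns the first failing stage or null.
type RunStack = (attempt: number) => Stage | null;

interface VerificationReport {
  status: "verified" | "stopped";
  attempts: number;
  passed: Stage[]; // stages that passed on the last attempt
  failedAt: Stage | null; // first failing stage on the last attempt
}

function verifyWithBudget(runStack: RunStack, budget = 3): VerificationReport {
  let failedAt: Stage | null = null;
  let attempts = 0;
  while (attempts < budget) {
    attempts += 1;
    failedAt = runStack(attempts);
    if (failedAt === null) {
      return { status: "verified", attempts, passed: [...STACK], failedAt: null };
    }
    // A real agent would attempt a fix here, using the new failure as information.
  }
  // Budget exhausted (or zero): stop and report exact state instead of weakening checks.
  const passed = failedAt === null ? [] : STACK.slice(0, STACK.indexOf(failedAt));
  return { status: "stopped", attempts, passed, failedAt };
}

// Simulated run that fails typecheck on every attempt.
const report = verifyWithBudget(() => "typecheck");
console.log(report.status, report.attempts, report.passed.join(",")); // stopped 3 format,lint
```

The report carries what passed and where it stopped, which is exactly the state the agent should hand back instead of a green diff.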
Test Discipline Firewall
- Prefer team-written tests. Existing tests are imperfect, but they are at least independent of the agent’s implementation path.
- If the agent writes tests, say so. "I authored these tests" tells the reviewer that passing them is not independent verification.
- Never weaken an assertion to get green. A weaker assertion is a product decision, not a debugging tactic. Surface the failing condition.
- Do not mock the bug away. Mocks are valid for boundaries. They are not valid when they erase the behavior under test.
- Stop after the iteration budget. Repeated retries without new information are not persistence. They are thrash.
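Part of the firewall can be checked mechanically. This heuristic TypeScript sketch scans a unified diff of test files for newly skipped tests and for removed or rewritten assertion lines. The patterns are assumptions that cover only a few JS/TS and Python idioms. The check cannot tell whether a rewritten assertion is weaker, so it flags every one for an explicit declaration instead of judging it.

```typescript
// Heuristic patterns for a few common JS/TS and Python idioms; not exhaustive.
const SKIP_PATTERNS: RegExp[] = [/\b(it|test|describe)\.skip\(/, /\bx(it|describe)\(/, /@pytest\.mark\.skip/];
const ASSERT_PATTERNS: RegExp[] = [/\bexpect\(/, /\bassert/];

interface TestDiffSignals {
  addedSkips: string[]; // newly skipped tests
  removedAssertions: string[]; // assertion lines deleted or rewritten
}

// Scan a unified diff of test files: "+" lines were added, "-" lines were removed.
function scanTestDiff(diff: string): TestDiffSignals {
  const signals: TestDiffSignals = { addedSkips: [], removedAssertions: [] };
  for (const line of diff.split("\n")) {
    // File headers ("--- a/...", "+++ b/...") are not content lines.
    if (line.startsWith("---") || line.startsWith("+++")) continue;
    const body = line.slice(1).trim();
    if (line.startsWith("+") && SKIP_PATTERNS.some((p) => p.test(body))) {
      signals.addedSkips.push(body);
    } else if (line.startsWith("-") && ASSERT_PATTERNS.some((p) => p.test(body))) {
      signals.removedAssertions.push(body);
    }
  }
  return signals;
}

// Any removed or rewritten assertion, or any new skip, needs an explicit declaration,
// because a replacement assertion may be weaker than the one it replaced.
function needsDeclaration(s: TestDiffSignals): boolean {
  return s.addedSkips.length > 0 || s.removedAssertions.length > 0;
}

const diff = [
  "--- a/src/refund.test.ts",
  "+++ b/src/refund.test.ts",
  "-  expect(total).toBe(500);",
  "+  expect(total).toBeGreaterThan(0);",
  "+it.skip('rounds half-cents', () => {});",
].join("\n");
const signals = scanTestDiff(diff);
console.log(signals.removedAssertions.length, signals.addedSkips.length, needsDeclaration(signals)); // 1 1 true
```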
principle-4-goal-driven-execution.md
PRINCIPLE 4: Goal-Driven Execution
Define success before coding. Verify with adversarial checks. Stop and report on failure.
Before coding:
- Restate the task as one success criterion:
* A failing test that will pass.
* An observable behavior change with a manual reproduction step.
* A static property that currently fails and will hold.
Verification stack:
format -> lint -> typecheck -> unit tests -> integration tests -> acceptance criterion
Test discipline:
- Prefer tests the team already wrote.
- If you write tests, keep them reviewable as their own logical step.
- Say explicitly: I authored these tests.
- Never weaken an assertion, skip a test, or mock away the failure to get green.
Stopping rule:
- State an iteration budget.
- If it fails within budget, stop and report what passes, what fails, what you tried, and what you need next.
Anti-patterns:
- Test-gaming.
- Green-diff fraud.
- Infinite loop.
- Goal drift.
- Mocking the bug away.
- Skipping the assertion that fails.

Principle 5: Calibrated Communication Is the Missing Rule.
The agent should report state, not competence. Completion is evidence, not confidence.
The original four rules stop at implementation. Production review does not. A coding agent also has to report what it did. That report is part of the system.
This is where ego-signaling enters. "Great question," "I carefully reviewed," "this should survive production," and "production-ready" are not evidence. They are verbal confidence markers. Calibration research on code generation finds that generative code models are not well calibrated out of the box.[8] So the agent’s confidence language is not the thing to trust.
Trust the artifact. Trust the verification. Trust the explicit list of what was not checked. The hardened completion schema is deliberately boring: DONE, VERIFIED, NOT VERIFIED, ASSUMED, NOTICED, NEXT. It creates a stable review surface and makes phantom completion harder.
The rule also protects partial completion. If the agent cannot finish, it should stop and report state. A clean failure report is better than a superficial patch that looks complete until a human spends an hour discovering what was not done.
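Because the schema is fixed, it can live as a data type that the agent fills and a renderer prints. The following is a TypeScript sketch under that assumption. CompletionReport and renderReport are hypothetical names. Printing "none" for empty sections is a design choice: an unrun check gets stated, not omitted.

```typescript
// Hypothetical in-memory form of the Principle 5 completion schema.
interface CompletionReport {
  done: string; // one line: what changed
  verified: string[]; // checks that actually ran and passed
  notVerified: string[]; // checks not run, each with the reason
  assumed: string[]; // assumptions whose violation would change the result
  noticed: string[]; // unrelated issues as "file:line: one sentence"
  next: string; // what the reviewer should inspect first
}

// Fixed section order; empty lists render as "none" so a gap is stated, not hidden.
function renderReport(r: CompletionReport): string {
  const list = (items: string[]): string => (items.length > 0 ? items.join("; ") : "none");
  return [
    `DONE: ${r.done}`,
    `VERIFIED: ${list(r.verified)}`,
    `NOT VERIFIED: ${list(r.notVerified)}`,
    `ASSUMED: ${list(r.assumed)}`,
    `NOTICED: ${list(r.noticed)}`,
    `NEXT: ${r.next}`,
  ].join("\n");
}

// A partial completion: the report says what was not checked instead of implying it was.
console.log(
  renderReport({
    done: "Refund rounding now uses half-up on cents; retry path unchanged.",
    verified: ["lint", "typecheck", "unit tests in refund.test.ts"],
    notVerified: ["integration tests (no payment sandbox credentials)"],
    assumed: ["amounts are stored in integer cents"],
    noticed: [],
    next: "rounding branch in refund.ts",
  }),
);
```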
principle-5-calibrated-communication.md
PRINCIPLE 5: Calibrated Communication
Report state, not competence. Match confidence to evidence. Make completion verifiable.
Completion schema:
- DONE: one line, what changed.
- VERIFIED: checks that passed.
- NOT VERIFIED: checks not run and why.
- ASSUMED: assumptions whose violation would change the result.
- NOTICED: unrelated issues observed, file:line, one line each.
- NEXT: what the reviewer should inspect first.
Confidence calibration:
- If tests ran and passed, say tests pass.
- If tests did not run, say tests not run.
- Never claim production-ready, resilient, scalable, or secure unless asked and verified.
Ego-detection suppresses:
- Preambles.
- Great question.
- Competence signaling.
- Apology theater.
- Closing flourishes.
- Phantom completion.

| Failure mode | Control | Why it works |
|---|---|---|
| Confident hallucination | code_uncertainty -> read | The agent cannot cite a library, API, or convention it has not inspected or verified. |
| Question-spam | Structural ambiguity threshold | Clarification is reserved for choices that change contract, file, schema, or behavior. |
| Scope creep through helpfulness | Declared scope + trace test | Every added file needs a reason tied to the request. |
| Test-gaming | Independent verification rule | Tests become adversaries again instead of collaborators the agent can rewrite. |
| Plausible but wrong patch | Acceptance criterion beyond green tests | Manual repro, static property, or differential behavior check can catch what unit tests miss. |
| Phantom completion | DONE/VERIFIED schema | The message must match the diff and the checks actually run. |
| Skill supply-chain exposure | Trust-boundary calibration | Third-party skills, hooks, and tools are treated as inputs crossing a boundary, not harmless instructions.[9] |
The Rewrite Becomes Real Only When Hooks Enforce It.
Principles guide behavior. Hooks make drift visible before the reviewer pays the cost.
The fastest win is to drop the five principles into the agent instruction file. That improves the default. It does not enforce anything.
Enforcement starts with two hooks. First, a pre-edit hook that compares Edit and Write targets against the declared scope list. If a file is outside scope, the agent has to record a scope expansion reason before the edit. Second, a test-file hook that detects changes to *_test.*, *.test.*, test_*, or __tests__/* in the same session as production files. That hook does not block all test edits. It forces a declaration: new failing test, contract change, or test-only stale expectation.
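The test-file hook's core logic is small. This TypeScript sketch approximates the four path globs on repo-relative paths and returns the test files that still need a classification. How a real hook receives edited paths depends on the agent harness, so isTestFile, unclassifiedTestEdits, and the input shape are assumptions. It runs in warning mode: it reports, it does not block.

```typescript
// The three declarations the hook asks for when a test file changes.
type TestChangeKind = "new failing test" | "contract change" | "stale test";

// Approximates the globs *_test.*, *.test.*, test_*, and __tests__/* on a repo-relative path.
function isTestFile(path: string): boolean {
  const segments = path.split("/");
  const base = segments[segments.length - 1] ?? "";
  const dirs = segments.slice(0, -1);
  return (
    base.includes("_test.") ||
    base.includes(".test.") ||
    base.startsWith("test_") ||
    dirs.includes("__tests__")
  );
}

// If a session edits both production and test files,
// every test file needs a declared classification before the agent continues.
function unclassifiedTestEdits(
  edited: readonly string[],
  classified: ReadonlyMap<string, TestChangeKind>,
): string[] {
  const touchesProduction = edited.some((file) => !isTestFile(file));
  if (!touchesProduction) return []; // test-only sessions are outside this hook's trigger
  return edited.filter((file) => isTestFile(file) && !classified.has(file));
}

const pending = unclassifiedTestEdits(
  ["src/refund.ts", "src/__tests__/refund.spec.ts", "src/tax.test.ts"],
  new Map<string, TestChangeKind>([["src/tax.test.ts", "new failing test"]]),
);
console.log(pending.join(", ")); // src/__tests__/refund.spec.ts
```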
The third hook is post-completion. Reject completions that lack DONE, VERIFIED, and NOT VERIFIED. This is not style policing. It is review hygiene. A reviewer should not have to infer from a confident paragraph whether bun run test ran.
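The post-completion check reduces to finding labeled sections in the final message. The following is a minimal sketch. It assumes each section appears as a label at the start of a line followed by a colon. missingSections is a hypothetical helper.

```typescript
// Sections the post-completion hook requires; ASSUMED, NOTICED, and NEXT are recommended.
const REQUIRED = ["DONE", "VERIFIED", "NOT VERIFIED"] as const;

// Returns the required section labels missing from a completion message.
function missingSections(message: string): string[] {
  const labels = new Set<string>();
  for (const line of message.split("\n")) {
    // A section is a line that starts with a schema label followed by a colon.
    const m = /^\s*(DONE|VERIFIED|NOT VERIFIED|ASSUMED|NOTICED|NEXT)\s*:/.exec(line);
    const label = m?.[1];
    if (label) labels.add(label);
  }
  return REQUIRED.filter((label) => !labels.has(label));
}

console.log(missingSections("All done! Tests should pass.").join(", ")); // DONE, VERIFIED, NOT VERIFIED
```

Because the pattern is anchored at the start of the line, a NOT VERIFIED section never satisfies the VERIFIED requirement.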
- [01] Drop in the five principles. Replace the four-rule prompt block with the hardened versions. Keep them terse. The anti-pattern names are the valuable part.
- [02] Add declared-scope enforcement. Before file edits, record intended file paths. Flag edits outside that set and require an expansion reason.
- [03] Add the test-discipline hook. Flag test edits in the same task as production edits. Require the agent to classify the test change before continuing.
- [04] Measure cleanup rate. Track edits outside declared scope, test-file edits, message-diff mismatch, and post-merge human cleanup. Use your own baseline, not social-media claims.
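Measuring needs nothing more than per-task counts. This TypeScript sketch uses hypothetical field names. It assumes your harness or review tooling can record these counts per task. The sample records in the usage lines are made up to show the arithmetic. They are not measured results.

```typescript
// Hypothetical per-task record; field names are illustrative, not from any tool.
interface TaskRecord {
  outOfScopeEdits: number; // edits outside the declared scope
  testFileEdits: number; // test files changed alongside production code
  messageDiffMismatch: boolean; // completion report disagreed with the diff
  humanCleanupCommits: number; // post-merge commits a human made to fix the change
}

// Each rate is the share of tasks where the signal appeared at least once.
interface DriftRates {
  tasks: number;
  scopeDriftRate: number;
  testEditRate: number;
  mismatchRate: number;
  cleanupRate: number;
}

function driftRates(records: readonly TaskRecord[]): DriftRates {
  const n = records.length;
  const share = (hit: (r: TaskRecord) => boolean): number =>
    n === 0 ? 0 : records.filter(hit).length / n;
  return {
    tasks: n,
    scopeDriftRate: share((r) => r.outOfScopeEdits > 0),
    testEditRate: share((r) => r.testFileEdits > 0),
    mismatchRate: share((r) => r.messageDiffMismatch),
    cleanupRate: share((r) => r.humanCleanupCommits > 0),
  };
}

// Illustrative sample data only.
const rates = driftRates([
  { outOfScopeEdits: 0, testFileEdits: 0, messageDiffMismatch: false, humanCleanupCommits: 0 },
  { outOfScopeEdits: 2, testFileEdits: 1, messageDiffMismatch: false, humanCleanupCommits: 1 },
  { outOfScopeEdits: 0, testFileEdits: 1, messageDiffMismatch: true, humanCleanupCommits: 0 },
  { outOfScopeEdits: 1, testFileEdits: 0, messageDiffMismatch: false, humanCleanupCommits: 0 },
]);
console.log(rates.scopeDriftRate, rates.cleanupRate); // 0.5 0.25
```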
Production Coding-Agent Principle Checklist
- CLAUDE.md or equivalent contains the five hardened principles, not only the original four slogans.
- Agent states spec_uncertainty, code_uncertainty, or model_uncertainty when blocked.
- Every task declares intended edit scope before Edit or Write.
- Test edits are classified as new failing test, contract change, or stale test.
- Completion includes DONE, VERIFIED, NOT VERIFIED, ASSUMED, NOTICED, and NEXT.
- Reviewer can map each changed line to the user request or a declared scope expansion.
- Trust-boundary code includes validation, structured error behavior, and retry/idempotency decisions where relevant.
Are Karpathy’s four coding-agent principles wrong?
No. They are good defaults. The issue is that they are underspecified for production. They say be surgical, but not how to declare scope. They say verify, but not how to prevent test-gaming. The hardened version keeps the direction and adds controls.
Why add Calibrated Communication as a fifth principle?
Because a coding agent’s final report shapes review decisions. If the agent claims work it did not do, hides checks it did not run, or performs confidence instead of reporting evidence, the reviewer loses time and may merge risk. DONE plus VERIFIED plus NOT VERIFIED is the minimum review surface.
Should agents be forbidden from editing tests?
No. Agents should be forbidden from weakening tests to get green. New tests are useful when they fail before the implementation and pass after. Contract-change test updates are valid when the contract changed intentionally. The key is declaration.
Does declared scope slow agents down?
A little. That is the point. Scope expansion should carry friction because over-editing is one of the most common ways coding agents create regressions. Start in warning mode and tune the granularity before blocking.
What is the smallest version to ship first?
Ship the five principles and the completion schema first. Then add a warning-only scope hook. Do not start with a heavy enforcement system. You need local data before you know which drift patterns are real in your codebase.
The useful version of the Karpathy principles is not more polite. It is more falsifiable.
Think Before Coding becomes a rule about uncertainty type. Simplicity First becomes a rule about boundaries and earned abstractions. Surgical Changes becomes a declared-scope contract. Goal-Driven Execution becomes adversarial verification with a stop rule. Calibrated Communication makes the final report reviewable.
That is the production move: less agent personality, more state. Less confidence, more evidence. Less green-test theater, more independent verification. The prompt should not ask the agent to be a better developer in the abstract. It should make the bad moves expensive and the honest moves easy.
- [1] Mockus and Weiss: Predicting Risk of Software Changes (mockus.org)
- [2] SmartBear: What Is Code Review? Cisco peer review dataset summary (smartbear.com)
- [3] Google Engineering Practices: Small CLs (google.github.io)
- [4] SWE-Bench+: Enhanced Coding Benchmark for LLMs (arxiv.org)
- [5] Are Solved Issues in SWE-bench Really Solved Correctly? (arxiv.org)
- [6] Claude's Constitution (anthropic.com)
- [7] Structured Uncertainty guided Clarification for LLM Agents (arxiv.org)
- [8] Calibration and Correctness of Language Models for Code (arxiv.org)
- [9] Snyk ToxicSkills: malicious AI agent skills and supply-chain exposure (snyk.io)