The senior engineer you just hired is not confused because they are incompetent. They are confused because the context they need to operate is distributed across systems that were never designed to be read by anyone other than the people who built them. Old PRs. ADR files three migrations behind. Slack threads from 2024. The heads of three engineers who joined before the Series B.
The questions are predictable: why did we build it this way, where does this service talk to that one, what should I actually work on first. The answers exist. They are just not retrievable.
This is an indexing problem, not a documentation problem. The onboarding agent does not write new docs. It extracts the reasoning that is already in the system — PR comments, post-mortems, ADRs, incident channels — and structures it for someone encountering the codebase cold. Three layers, each targeting a question the new hire is actually asking on a specific day of their ramp.
Three Layers, Three Questions, Three Days of Ramp
Each layer collapses a different feedback loop between the new hire and the people who already know the answer.
One agent is not enough. The questions a new hire asks on day one are different from the ones that show up on day fifteen, and the same retrieval pipeline cannot serve both well. Layer 1 hands over the map. Layer 2 explains why the map looks the way it does. Layer 3 picks the first move. Run sequentially, each layer narrows the surface area the next one has to cover.
Layer 1: The Service Brief That Beats Stale READMEs
Generated structural orientation. Not auto-doc theater — a brief built for someone reading the system cold.
The orientation agent runs against every repository the new hire will touch and produces a service brief per repo. This is not the README rewritten by a model. The README is an input — usually a stale one — among several.
For each service, the brief carries four things:
What it does. A two-paragraph plain-language description derived from README, API surface, and route handlers. If the README has not been touched in six months, the agent flags it and falls back to code-level analysis.[1] Stale documentation with no freshness signal is worse than no documentation — it manufactures false confidence.
How it connects. A dependency map extracted from imports, API client configurations, and infrastructure-as-code. What this service calls. What calls it. What shared state it touches. The map is generated, not maintained: every run rebuilds it from source, so it cannot drift the way a hand-edited diagram does.
Where the load-bearing files are. A guided tour of the directories that matter, weighted by commit frequency over the last 90 days. Files that change often are files the new hire will touch. Stable utility code can wait.
Who actually owns it. CODEOWNERS plus the most active contributors in the last 90 days plus the on-call rotation. The right answer to 'who do I ask' is a name, not a team channel.
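The commit-frequency weighting behind the key-files tour can be sketched as a simple count over a lookback window. This is a minimal sketch, not the agent's actual implementation: the `CommitRecord` shape and the `rankHotFiles` name are assumptions made for illustration, and fetching and parsing the commit log is left out. Only the 90-day window comes from the text above.

```typescript
// Hypothetical shape for an already-parsed commit; fetching the log is out of scope.
interface CommitRecord {
  sha: string;
  committedAt: Date;
  files: string[];
}

interface HotFile {
  path: string;
  commits: number;
}

// Rank files by how many commits touched them inside the lookback window.
function rankHotFiles(
  commits: CommitRecord[],
  now: Date,
  windowDays = 90,
  limit = 10
): HotFile[] {
  const cutoff = now.getTime() - windowDays * 24 * 60 * 60 * 1000;
  const counts = new Map<string, number>();

  for (const commit of commits) {
    if (commit.committedAt.getTime() < cutoff) continue; // outside the window
    // De-duplicate so a single commit counts once per file.
    for (const file of Array.from(new Set(commit.files))) {
      counts.set(file, (counts.get(file) ?? 0) + 1);
    }
  }

  return Array.from(counts.entries())
    .map(([path, commitCount]) => ({ path, commits: commitCount }))
    .sort((a, b) => b.commits - a.commits || a.path.localeCompare(b.path))
    .slice(0, limit);
}
```

Ties are broken alphabetically so repeated runs over the same log produce the same tour.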
Service Brief Output: One Folder Per Repo, Five Files
```text
onboarding-briefs/
├── payments-service/
│   ├── overview.md
│   ├── dependency-map.json
│   ├── key-files-guide.md
│   ├── ownership.md
│   └── recent-changes.md
├── user-service/
│   ├── overview.md
│   ├── dependency-map.json
│   ├── key-files-guide.md
│   ├── ownership.md
│   └── recent-changes.md
└── notification-service/
    ├── overview.md
    ├── dependency-map.json
    ├── key-files-guide.md
    ├── ownership.md
    └── recent-changes.md
```

Layer 2: The Reasoning That Was Never Written Down
The historical context exists in PR threads, ADRs, and post-mortems. It just was never indexed for retrieval.
The most expensive sentence in any onboarding conversation is 'there's a reason for that, but nobody remembers exactly what it was.' The reason exists. Some senior engineer typed it out two years ago in a PR review or an incident channel and then closed the tab. Layer 2 indexes what they wrote.
Three sources, ranked by signal density:
Architecture Decision Records. Where they exist, ADRs are the highest-density context source the team produces. The agent parses every ADR, links each decision to the services and code paths it affects, and surfaces them on lookup.[4] When the new hire asks why the notification service polls instead of taking webhooks, the answer is the 2024 ADR, not a Slack ping to the senior on call.
PR review threads. The richest source most teams have, and the most under-indexed. Significant PRs — 8+ review comments, multiple revision cycles, large diffs — carry inline arguments about tradeoffs, rejected alternatives, and the failure modes the author is defending against. The agent indexes those discussions and ties them to the files they touch.
Post-mortem documents. Defensive code looks excessive until you read the post-mortem. The retry wrapper that wraps every external call has a date attached to it: the cascading timeout in 2024 that took the payment pipeline down for three hours. The agent links recommendations to the diffs that implemented them so the new hire reads code with the scar tissue visible.
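Tying context to the files it explains implies a lookup keyed on path. The sketch below shows one plausible way that lookup could work, under stated assumptions: `IndexedContext` is a trimmed-down, illustrative entry shape, `lookupByPath` is a hypothetical name, and a tagged path is treated as matching either the exact file or any file under that directory.

```typescript
// Trimmed-down entry holding only what lookup needs. Field names are illustrative.
interface IndexedContext {
  source: 'adr' | 'pr-discussion' | 'post-mortem';
  sourceUrl: string;
  date: string;            // ISO 8601, so string comparison sorts chronologically
  relevantPaths: string[]; // file or directory paths this context applies to
  summary: string;
}

// Return entries tagged with the file itself or one of its parent directories, newest first.
function lookupByPath(index: IndexedContext[], filePath: string): IndexedContext[] {
  const matches = (tagged: string): boolean => {
    const dir = tagged.endsWith('/') ? tagged : tagged + '/';
    return filePath === tagged || filePath.startsWith(dir);
  };
  return index
    .filter(entry => entry.relevantPaths.some(matches))
    .sort((a, b) => b.date.localeCompare(a.date));
}
```

Matching on directory prefixes means an ADR tagged to a whole service still surfaces when the new hire opens a single file inside it.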
context-index-builder.ts

```typescript
// One entry per indexed source. Path tags are how lookup actually works.
interface ContextEntry {
  source: 'adr' | 'pr-discussion' | 'post-mortem' | 'slack-thread';
  sourceUrl: string;
  date: string;
  relevantPaths: string[];        // files this context applies to
  summary: string;                // 2-3 sentence summary
  keyDecision: string | null;     // the call that was made
  rejectedAlternatives: string[]; // what got considered and dropped
  participants: string[];         // who was in the room
  tags: string[];                 // service names, technology names
}

// Helpers such as parseADRDirectory, getSignificantPRs, and summarizeDocument
// are assumed to be defined elsewhere in the agent.
async function buildContextIndex(
  repos: string[],
  adrPath: string,
  postMortemPath: string
): Promise<ContextEntry[]> {
  const entries: ContextEntry[] = [];

  // ADRs first: highest signal density per token
  const adrs = await parseADRDirectory(adrPath);
  for (const adr of adrs) {
    entries.push({
      source: 'adr',
      sourceUrl: adr.filePath,
      date: adr.date,
      relevantPaths: adr.affectedPaths,
      summary: await summarizeDocument(adr.content),
      keyDecision: adr.decision,
      rejectedAlternatives: adr.alternatives ?? [],
      participants: adr.authors,
      tags: adr.tags,
    });
  }

  // PR threads: filter hard, signal lives in the contested ones
  for (const repo of repos) {
    const prs = await getSignificantPRs(repo, {
      minComments: 8,     // matches the 8+ comment threshold used throughout
      minReviewCycles: 3,
      lookbackDays: 548,  // roughly the 18-month lookback
    });
    for (const pr of prs) {
      entries.push({
        source: 'pr-discussion',
        sourceUrl: pr.url,
        date: pr.mergedAt,
        relevantPaths: pr.changedFiles,
        summary: await summarizePRDiscussion(pr.comments),
        keyDecision: extractDecision(pr.comments),
        rejectedAlternatives: extractRejectedApproaches(pr.comments),
        participants: pr.reviewers,
        tags: inferServiceTags(pr.changedFiles),
      });
    }
  }

  // Post-mortems: explain the defensive code the new hire will otherwise question
  const postMortems = await parsePostMortems(postMortemPath);
  for (const pm of postMortems) {
    entries.push({
      source: 'post-mortem',
      sourceUrl: pm.url,
      date: pm.date,
      relevantPaths: pm.affectedServices.flatMap(s => s.paths),
      summary: pm.summary,
      keyDecision: pm.rootCause,
      rejectedAlternatives: [],
      participants: pm.responders,
      tags: pm.affectedServices.map(s => s.name),
    });
  }

  return entries;
}
```

Layer 3: The First Ticket That Teaches Something
The starter ticket either teaches the codebase or wastes a week. Match scope, skill, and sprint context — or don't ship the agent at all.
Most teams botch the starter ticket. Either it is so trivial it teaches nothing — fix a typo in a doc nobody reads — or it is so sprawling the new hire spends three weeks understanding why the architecture is the way it is before they touch a line of code. The right starter ticket sits in a narrow band: meaningful enough to force engagement with the codebase, scoped enough to ship in two to three days, and connected to something the team will actually merge.
The matching agent takes three inputs and returns a ranked list:
Sprint context. Current backlog, team velocity, upcoming deadlines. The agent surfaces tickets that are needed but not on the critical path — work the team wants done but will not block the sprint if the new hire takes longer than expected.
Skill profile. Stated experience with languages, frameworks, and domain areas, captured during the interview process or a structured intake. A frontend specialist does not get a Kubernetes networking ticket on day three. This is not a kindness. It is a calibration of where the new hire's existing context actually transfers.
Codebase accessibility. From Layer 1: which services have current documentation, the highest test coverage, and the most active reviewers. The agent biases toward those. The first ticket lands where the safety net is densest.
Output is a ranked list of three to five tickets, each with the rationale attached: what the new hire will learn, which services the change will touch, who they should pair with on review.
| Factor | Weight | Score 1 (Cut It) | Score 5 (Ship It) |
|---|---|---|---|
| Scope clarity | 0.25 | Vague requirements. Done criteria undefined. | Bounded scope. Acceptance criteria written down. |
| Learning value | 0.25 | Trivial change. Codebase exposure: zero. | Touches 2-3 services. Forces engagement with core patterns. |
| Safety net | 0.20 | No tests. No active owner. Sole reviewer on PTO. | Strong test suite. Reviewers responsive within hours. |
| Sprint relevance | 0.15 | Backlog filler. Team will not notice if it ships. | Current sprint. Team needs it merged this cycle. |
| Skill match | 0.15 | Requires a stack the new hire has never touched. | Aligns directly with stated experience. |
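The table reduces to a weighted sum per ticket. The sketch below is one way to compute and rank it. The weights are copied from the table, while the `CandidateTicket` shape, the function names, and the default limit of five are assumptions for illustration.

```typescript
type Factor = 'scopeClarity' | 'learningValue' | 'safetyNet' | 'sprintRelevance' | 'skillMatch';

// Weights copied from the table above; they sum to 1.0.
const WEIGHTS: Record<Factor, number> = {
  scopeClarity: 0.25,
  learningValue: 0.25,
  safetyNet: 0.20,
  sprintRelevance: 0.15,
  skillMatch: 0.15,
};

// Hypothetical ticket shape: one 1-5 score per factor.
interface CandidateTicket {
  id: string;
  scores: Record<Factor, number>;
}

// Weighted sum of the five factor scores; the result falls between 1 and 5.
function weightedScore(ticket: CandidateTicket): number {
  let total = 0;
  for (const factor of Object.keys(WEIGHTS) as Factor[]) {
    const score = ticket.scores[factor];
    if (score < 1 || score > 5) {
      throw new RangeError(`${ticket.id}: ${factor} must be between 1 and 5`);
    }
    total += WEIGHTS[factor] * score;
  }
  return total;
}

// Highest score first, capped at the three-to-five tickets the new hire is shown.
function rankTickets(tickets: CandidateTicket[], limit = 5): CandidateTicket[] {
  return tickets
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => weightedScore(b) - weightedScore(a))
    .slice(0, limit);
}
```

As a worked example, a ticket scoring 5, 4, 4, 2, 3 in table order comes out at 0.25·5 + 0.25·4 + 0.20·4 + 0.15·2 + 0.15·3 = 3.8.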
The Highest-Value Context Is the Context Nobody Curated
The most valuable onboarding artifact is not a document anyone wrote. It is the by-product of every team's daily communication that nobody curated for onboarding. The agent harvests this passively — no new process, no one drafting onboarding wikis on top of their actual work.
PR comment mining. Review comments are direct links between code patterns and operational scars. When a reviewer writes 'use the existing retry wrapper — last time someone rolled their own we got the cascading timeout from INC-2847,' that comment is a pointer from a code line to an incident. The agent indexes those pointers.
Slack thread analysis. Project channels carry decision narratives that never become formal docs. The agent scans threads with high engagement — many participants, many messages — in channels tagged to the new hire's team, extracts the decision, and links it to the relevant code or ticket.
Incident context. Post-mortems document what broke. The richer context lives in the incident channel itself: the hypotheses that were tested and rejected, the workarounds that became permanent, the assumptions that turned out to be wrong. The agent indexes the channel transcript alongside the formal write-up.
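The pointer from a code line to an incident can be pulled out of review comments with a pattern match. This is a minimal sketch under assumptions: the `ReviewComment` and `IncidentPointer` shapes are hypothetical, and the ID format (`INC-` followed by digits) is inferred from the single INC-2847 example, so a real tracker may need a different pattern.

```typescript
// Hypothetical shape for a review comment anchored to a line of code.
interface ReviewComment {
  url: string;
  filePath: string;
  line: number;
  body: string;
}

interface IncidentPointer {
  incidentId: string;
  filePath: string;
  line: number;
  commentUrl: string;
}

// Assumed ID format: "INC-" followed by digits, as in the INC-2847 example.
const INCIDENT_ID = /\bINC-\d+\b/g;

// Emit one pointer per distinct incident ID mentioned in each comment.
function extractIncidentPointers(comments: ReviewComment[]): IncidentPointer[] {
  const pointers: IncidentPointer[] = [];
  for (const comment of comments) {
    const ids = new Set<string>(comment.body.match(INCIDENT_ID) ?? []);
    ids.forEach(incidentId => {
      pointers.push({
        incidentId,
        filePath: comment.filePath,
        line: comment.line,
        commentUrl: comment.url,
      });
    });
  }
  return pointers;
}
```

Each pointer can then be joined against the post-mortem index so the incident write-up appears next to the code the reviewer was defending.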
All of this runs continuously. By the time the new hire signs their offer, the index is already populated with months of institutional knowledge.[3]
A caveat that cost us a sprint to learn. The first version pulled every PR comment indiscriminately. Result: noise. Snarky review banter, debates from a pre-migration architecture, context tied to code that had since been deleted. The index is only as useful as its filter. The threshold that worked for us: 8+ review comments, 2+ participants, merged within the last 18 months. Older PRs are excluded unless they touch infrastructure that has not changed. The filter is not optional. It is the product.
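The filter described above is small enough to write down. The sketch below uses the stated thresholds (8+ review comments, 2+ participants, 18 months). The `PullRequestSummary` shape is hypothetical, and `touchesUnchangedInfra` is an assumed flag computed upstream. Applying the comment and participant thresholds to the older infrastructure PRs as well is an interpretation, since the text only exempts them from the age cutoff.

```typescript
// Hypothetical summary of a merged PR; field names are illustrative.
interface PullRequestSummary {
  url: string;
  mergedAt: Date;
  reviewCommentCount: number;
  participants: string[];
  touchesUnchangedInfra: boolean; // assumed to be computed upstream
}

interface PrFilterOptions {
  minComments: number;
  minParticipants: number;
  maxAgeMonths: number;
}

const DEFAULT_PR_FILTER: PrFilterOptions = {
  minComments: 8,
  minParticipants: 2,
  maxAgeMonths: 18,
};

// Decide whether a PR thread is worth indexing.
function isIndexable(
  pr: PullRequestSummary,
  now: Date,
  opts: PrFilterOptions = DEFAULT_PR_FILTER
): boolean {
  if (pr.reviewCommentCount < opts.minComments) return false;
  if (new Set(pr.participants).size < opts.minParticipants) return false;

  const cutoff = new Date(now.getTime());
  cutoff.setUTCMonth(cutoff.getUTCMonth() - opts.maxAgeMonths);

  // Older PRs survive only when they touch infrastructure that has not changed.
  return pr.mergedAt.getTime() >= cutoff.getTime() || pr.touchesUnchangedInfra;
}
```

Keeping the thresholds in one options object makes them easy to review and tune per team.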
Without the agent:
README last touched 14 months ago. Treated as truth.
Shadow a senior engineer for a week. Hope they remember to explain the load-bearing parts.
Ask 'why' in Slack. Wait hours. Get half an answer.
Starter ticket assigned by guesswork. Teaches nothing or everything.
Tribal knowledge transferred by accident, through trial and error.
First meaningful PR: 60-90 days in.

With the agent:
Service brief regenerated weekly. Freshness timestamped.
Query the context index. Historical reasoning surfaces in seconds with sources attached.
PR threads, ADRs, and post-mortems indexed and tied to the files they explain.
Starter ticket matched to skill profile, sprint priority, and codebase safety net.
Institutional knowledge searchable from day one. Senior engineers stop being a verbal cache.
First meaningful PR: 2-4 weeks, depending on codebase complexity.
What the numbers do and do not prove
These ranges come from self-reported outcomes at teams running structured onboarding, not controlled studies. Codebases with active ADR practices and decent test coverage see the biggest gains. Teams with sparse documentation see modest improvements until the context index matures — which takes a quarter, not a week. Pilot with one team. Measure time-to-first-PR before claiming org-wide impact. Numbers without a baseline are theater.
Pre-Launch Checklist: What Has To Be True Before the Agent Ships
Three to five core repositories named — not 'all of them'
Repository scanner running and producing dependency maps without manual edits
CODEOWNERS plus 90-day commit log feeding ownership data
ADRs indexed with service and path tags, gaps flagged where they don't exist
PR thread indexer filtering on 8+ comments, 2+ participants, 18-month lookback
Post-mortems parsed and linked to the diffs that implemented the fix
Slack scanner scoped to public engineering channels — never DMs, never private, list checked into the repo
Skill profile intake form built, answered before the new hire's first day
Sprint backlog wired from Jira or Linear, ticket metadata synced
Starter ticket scoring weights agreed by the engineering lead, not invented at runtime
Outputs packaged into one onboarding portal — not five tabs the new hire has to find
Piloted with the most recent hire. Accuracy reviewed before next hire onboards.
How do you keep the service briefs from going stale?
Run Layer 1 on a weekly cron, not on hire dates. Diff every run. Briefs that change every week are telling you the service is in flux — that is itself useful context for the next new hire. Version each brief with a generated-on timestamp. A stale brief with no freshness signal is worse than no brief at all because it manufactures false confidence.
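The freshness signal and the run-over-run diff can be sketched together. Everything named here is an assumption for illustration: the `BriefSnapshot` shape, the `briefStatus` function, and the seven-day default, which simply mirrors the weekly cron. Comparing raw content stands in for whatever diff the real pipeline would use.

```typescript
// Hypothetical record of one generated brief.
interface BriefSnapshot {
  service: string;
  generatedOn: string; // ISO 8601 timestamp of the run that produced it
  content: string;
}

interface BriefStatus {
  service: string;
  ageDays: number;
  stale: boolean;               // older than the allowed age
  changedSinceLastRun: boolean; // content differs from the previous run
}

// Compare the current brief with the previous run and flag staleness.
function briefStatus(
  current: BriefSnapshot,
  previous: BriefSnapshot | undefined,
  now: Date,
  maxAgeDays = 7
): BriefStatus {
  const ageMs = now.getTime() - Date.parse(current.generatedOn);
  const ageDays = Math.floor(ageMs / (24 * 60 * 60 * 1000));
  return {
    service: current.service,
    ageDays,
    stale: ageDays > maxAgeDays,
    changedSinceLastRun: previous !== undefined && previous.content !== current.content,
  };
}
```

A service whose brief reports `changedSinceLastRun` week after week is the "in flux" case worth surfacing to the next hire.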
What if the team doesn't keep ADRs?
Most teams don't. That is the default state, not the exception. The agent compensates by raising the weight on PR mining — significant PRs with 8+ comments and multiple revision cycles serve as informal ADRs. The agent can also draft retroactive ADRs from the highest-signal PR threads and surface them to engineering leadership for formalization. Onboarding becomes the forcing function for the ADR practice the team should have started two years ago.
How do you keep sensitive material out of the Slack scan?
Scope hard. Public engineering channels only — never DMs, never private channels, never HR or leadership channels. Keyword filter against 'salary', 'performance review', 'layoff', 'HR'. Publish the full list of scanned channels in the repo as a config file so any engineer can read it and propose changes via PR. The list is not a secret. The transparency is the point — it stops the scanner from becoming an unmonitored surveillance vector.
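The scoping rules translate into a config object plus one predicate. This is a sketch under assumptions: the channel names are hypothetical, the `SlackMessage` shape is illustrative rather than the Slack API's own payload, and whole-word, case-insensitive matching is one possible reading of "keyword filter". The keyword list is the one given above.

```typescript
// The checked-in config any engineer can read and change via PR.
interface SlackScanConfig {
  allowedChannels: string[];
  blockedKeywords: string[];
}

// Illustrative message shape, not the Slack API's own payload.
interface SlackMessage {
  channel: string;
  isPrivate: boolean;
  isDirectMessage: boolean;
  text: string;
}

const scanConfig: SlackScanConfig = {
  allowedChannels: ['eng-payments', 'eng-platform'], // hypothetical channel names
  blockedKeywords: ['salary', 'performance review', 'layoff', 'HR'],
};

// Build a whole-word, case-insensitive pattern for one keyword.
function keywordPattern(keyword: string): RegExp {
  const escaped = keyword.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp(`\\b${escaped}\\b`, 'i');
}

// A message is scanned only if it is public, allowlisted, and keyword-free.
function isScannable(msg: SlackMessage, config: SlackScanConfig): boolean {
  if (msg.isDirectMessage || msg.isPrivate) return false;
  if (config.allowedChannels.indexOf(msg.channel) === -1) return false;
  return !config.blockedKeywords.some(k => keywordPattern(k).test(msg.text));
}
```

Whole-word matching keeps "three retries" from tripping the `HR` keyword, though "1 hr outage" would still be excluded. Erring toward exclusion seems the safer default for a scanner like this.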
Does the agent replace the buddy or mentor?
No, and conflating them is the mistake teams make on first deployment. The agent handles information transfer — the structured, factual context about code, architecture, and historical decisions that takes hours to communicate verbally and is mostly already written down somewhere. The buddy handles cultural integration, unwritten norms, and the judgment calls no index will ever capture. The agent's job is to stop the buddy from being a walking encyclopedia so they can focus on the parts of onboarding that actually require a human: psychological safety, team dynamics, advocating for the new hire when nobody else will.
The bottleneck is not access to code. It is access to context. Every engineering organization sits on months of accumulated decisions, tradeoffs, and operational scars embedded in artifacts nobody curates for onboarding. The agent does not write new docs. It indexes the artifacts that already exist and makes them retrievable on the day someone new needs them.
Ship Layer 1 first. Repository access only — minimal integration cost, immediate signal. Add Layer 2 once the service briefs are accurate enough that engineers stop correcting them. Add Layer 3 last, after sprint integration and skill profile intake are stable. The system compounds: every new hire who uses it sharpens the index for the next one.[2]
Tribal knowledge is a single point of failure. Index it before the next hire shows up.
- [1] Cortex: Developer Onboarding Guide (cortex.io)
- [2] Enboarder: AI Onboarding Tool Guide 2026 (enboarder.com)
- [3] Instruqt: Navigating the Codebase: Seamless Engineer Onboarding Plan (instruqt.com)
- [4] AWS Prescriptive Guidance: Architectural Decision Records Process (docs.aws.amazon.com)