Internal AI Playbook: Audit, Distribute, Govern Team Skills

Your Engineers Already Use AI. They Are All Doing It Differently.

Fifty engineers running fifty private AI workflows is not adoption. It is a coordination tax with no owner. Audit what is already running, isolate the workflows with org-wide leverage, ship a versioned skills repo, and govern the blast radius before a shared skill drops a column in production.

Strategy & Operating Model · Beginner · Aug 23, 2025 · 6 min read

By Viktor Bezdek · VP Engineering, Groupon

Your team is already using AI. That part is settled. A 2025 PwC survey of 300 U.S. executives reports roughly 79% of organizations running AI agents in production^[1], and Gartner projects roughly 40% of enterprise applications will ship task-specific AI agents by the end of 2026^[2] — up from under 5% in 2025. The question is no longer whether your engineers adopt AI tooling. It is whether any two of them are doing it the same way.

Every senior engineer has a personal collection of prompts. Staff engineers have built private workflows that shave hours off their week. One team swears by their code review automation. The team across the hall uses something completely different and does not know the first one exists. At five engineers this looks like creativity. At fifty it is a coordination tax with no owner. The engineers running the highest-leverage workflows are usually not the ones posting in Slack. They have integrated AI so deeply into their loop that they no longer think of it as a tool. Your audit will find them anyway.

This is the move from scattered private usage to a governed, version-controlled internal playbook. Audit what is already running. Pick the three workflows that compound. Ship a distribution layer. Govern the blast radius before a shared skill ships a bad migration.

Phase 1: Find Out What Is Already Running

You cannot standardize what you have not seen. Quiet adoption is the rule, not the exception.

Before any policy document, get ground truth. Most engineering leaders overestimate how much they know about their team's daily AI usage. The engineers who post about AI in Slack are the vocal minority — not the representative sample. The interesting adoption happens in private IDE configs, personal shell scripts, and browser extensions nobody mentions in standups.

Run the audit as knowledge sharing, not compliance. Three questions only. What tools are people actually using? What tasks have they automated? Where are they getting real time savings, not the theoretical kind?

[01]
Async survey, specific prompts, single deadline
List every AI tool each engineer used in the last two weeks, what tasks they applied it to, and a rough estimate of time saved. Pre-categorize: code generation, code review, documentation, debugging, architecture, testing, communication. Vague categories produce vague answers.
[02]
Grep the repos for what is already institutionalized
Search every repo for .claude/ directories, CLAUDE.md files, custom MCP configs, .cursorrules, shared prompt libraries. These artifacts surface real patterns better than self-reporting. People underreport in surveys. They commit configuration to source control.
[03]
1:1 shadowing — 30 minutes, three to five engineers, mixed seniority
Watch them work. The patterns people forget to mention are the ones that have become invisible habits. A junior might use AI for every commit message. A staff engineer uses it only for architecture decisions. Both are signal. Neither shows up in the survey.
[04]
Synthesize into a usage map you can argue with
Plot every discovered workflow on a 2x2: frequency (daily vs occasional) against breadth (one person vs multiple teams). Top-right quadrant — high frequency, broad adoption — is where standardization pays. Everything else is decoration.
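Steps 02 and 04 above can be sketched in a few lines. This is a minimal illustration, not a finished audit tool: the artifact names come from the list in step 02, while the function names and both thresholds (`daily_threshold`, `broad_threshold`) are assumptions to tune against your own survey data.

```python
import os
from typing import Iterator, Tuple

# Artifact names taken from step 02; extend both sets for your own tooling.
AI_ARTIFACT_DIRS = {".claude"}
AI_ARTIFACT_FILES = {"CLAUDE.md", ".cursorrules"}


def find_ai_artifacts(root: str) -> Iterator[str]:
    """Yield paths under `root` that look like committed AI configuration."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune VCS internals so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in dirnames:
            if name in AI_ARTIFACT_DIRS:
                yield os.path.join(dirpath, name)
        for name in filenames:
            if name in AI_ARTIFACT_FILES:
                yield os.path.join(dirpath, name)


def quadrant(uses_per_week: float, team_count: int,
             daily_threshold: float = 5, broad_threshold: int = 2) -> Tuple[str, str]:
    """Place a workflow on the frequency x breadth map (thresholds are hypothetical)."""
    frequency = "daily" if uses_per_week >= daily_threshold else "occasional"
    breadth = "broad" if team_count >= broad_threshold else "narrow"
    return frequency, breadth


def is_standardization_candidate(uses_per_week: float, team_count: int) -> bool:
    """Only the high-frequency, broad-adoption quadrant qualifies."""
    return quadrant(uses_per_week, team_count) == ("daily", "broad")
```

Run `find_ai_artifacts` over a directory of checked-out repos, then feed survey counts into `quadrant` to build the map. The scan only finds what was committed, so it complements the survey and shadowing rather than replacing them.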

Phase 2: Three Workflows. Not Thirty.

Standardization has a cost. The trick is finding the workflows where standardization pays it back many times over.

The audit will surface dozens of AI-assisted workflows. The instinct is to standardize all of them. Resist it. The goal is the three to five workflows that deliver outsized returns when adopted consistently across the org. Everything else stays where it is.

Think about this the way you think about platform investments. A workflow has org-wide leverage when three things are true at once. It is performed frequently by many people. The variance between a good and bad execution is high. The output feeds downstream into work other teams depend on. Two of three is interesting. All three is where you spend the standardization budget.
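The three-criteria test is simple enough to write down, which makes it easier to argue about. A minimal sketch: the `Workflow` fields and the tier labels are illustrative names, and each boolean is a judgment call you make from the audit data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workflow:
    name: str
    frequent_and_widespread: bool  # performed frequently by many people
    high_variance: bool            # large gap between good and bad execution
    feeds_downstream: bool         # output feeds work other teams depend on


def leverage_tier(workflow: Workflow) -> str:
    """Map the number of criteria met to a tier: 3 -> standardize, 2 -> interesting."""
    met = sum([workflow.frequent_and_widespread,
               workflow.high_variance,
               workflow.feeds_downstream])
    if met == 3:
        return "standardize"
    if met == 2:
        return "interesting"
    return "keep personal"
```

For example, `leverage_tier(Workflow("pr-review", True, True, True))` returns `"standardize"`, while a personal commit-message formatter that meets only the frequency criterion stays personal.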

Keep Personal

Personal commit message formatting preferences
Individual code snippet generation styles
One-off data analysis scripts
Personal email drafting assistance
Ad-hoc meeting note summarization

Standardize

PR review checklists that enforce team quality standards
Incident response runbook generation from alerts
API documentation generation tied to CI pipelines
Onboarding task scaffolding for new team members
Architecture Decision Record drafting with context

Once the candidate list is short, validate it under load. Pick two or three. Run a two-week pilot where a second team adopts the workflow as documented by the originating team — no extra coaching, no Slack hand-holding. If the second team picks it up inside a day and sees measurable benefit inside a week, the workflow standardizes cleanly. If they hit edge cases the originator forgot to document or find it does not transfer to their domain, the workflow belongs in the recommended-but-optional tier. The pilot is the leverage check the survey cannot run.

Phase 3: A Distribution Layer, Not a Wiki Page

Shared workflows that live in a Notion doc decay. Shared workflows that live in source control compound.

Individual prompt files do not scale. Once you know which workflows deserve standardization, you need a distribution mechanism that handles versioning, dependencies, and team-specific overrides. In a Claude-native organization, that means treating CLAUDE.md files, custom commands, and MCP configurations as a proper internal platform — owned, tested, versioned, deployed.

The pattern that holds up: a monorepo for shared AI configuration with a clear directory structure, a sync script that pushes to consuming repos, and team override slots that do not require forking the base config.

Shared AI Playbook Repository Structure

ai-playbook/
├── skills/
│   ├── pr-review/
│   │   ├── SKILL.md
│   │   ├── README.md
│   │   └── tests/
│   ├── incident-response/
│   │   ├── SKILL.md
│   │   ├── README.md
│   │   └── tests/
│   └── adr-drafting/
│       ├── SKILL.md
│       ├── README.md
│       └── tests/
├── base-configs/
│   ├── CLAUDE.md
│   └── mcp-servers.json
├── team-overrides/
│   ├── platform/
│   ├── frontend/
│   └── data-eng/
├── scripts/
│   ├── sync-to-repos.sh
│   └── validate-skills.ts
├── CHANGELOG.md
└── OWNERS.md

scripts/sync-to-repos.sh

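#!/usr/bin/env bash
# Versioned playbook sync. Runs on merge to main. Per-repo skills, not a monolith.
# Assumes each target repo is already checked out under $WORKDIR/<repo>, and that
# repos.json lists the target repositories plus per-repo team and skill maps.
set -euo pipefail

WORKDIR="${WORKDIR:-/tmp}"
PLAYBOOK_VERSION=$(git describe --tags --abbrev=0)
PLAYBOOK_VERSION="${PLAYBOOK_VERSION#v}"   # tolerate tags with or without a leading v

jq -r '.repositories[]' repos.json | while IFS= read -r repo; do
  echo "Syncing to $repo (v$PLAYBOOK_VERSION)"
  dest="$WORKDIR/$repo/.claude"
  mkdir -p "$dest/commands"

  # Base config first
  cp base-configs/CLAUDE.md "$dest/CLAUDE.md"

  # Team override layered on top, never replacing the base
  team=$(jq -r --arg repo "$repo" '.teams[$repo] // empty' repos.json)
  if [ -n "$team" ] && [ -f "team-overrides/$team/CLAUDE.md" ]; then
    cat "team-overrides/$team/CLAUDE.md" >> "$dest/CLAUDE.md"
  fi

  # Only the skills this repo declared it needs
  jq -r --arg repo "$repo" '.skills[$repo][]?' repos.json | while IFS= read -r skill; do
    [ -n "$skill" ] || continue
    rm -rf "$dest/commands/$skill"   # avoid nesting a stale copy on re-sync
    cp -r "skills/$skill" "$dest/commands/$skill"
  done

  echo "Synced v$PLAYBOOK_VERSION to $repo"
done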
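The script reads three keys from `repos.json`: `repositories`, `teams`, and `skills`. The file itself is not shown, so the shape below is inferred from the `jq` queries, and the repository names are made up for illustration. A small pre-flight check like this, run in CI before the sync, would catch a typo in a skill or team name before it reaches a consuming repo.

```python
import json
from typing import Dict, List, Set

# Inferred shape of repos.json; "checkout-service" and "web-app" are hypothetical repos.
EXAMPLE_REPOS_JSON = """
{
  "repositories": ["checkout-service", "web-app"],
  "teams": {"checkout-service": "platform", "web-app": "frontend"},
  "skills": {
    "checkout-service": ["pr-review", "incident-response"],
    "web-app": ["pr-review"]
  }
}
"""

# Skill and team names as they appear in the repository tree above.
KNOWN_SKILLS = {"pr-review", "incident-response", "adr-drafting"}
KNOWN_TEAMS = {"platform", "frontend", "data-eng"}


def validate_manifest(manifest: Dict, known_skills: Set[str],
                      known_teams: Set[str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the manifest is usable."""
    problems = []
    teams = manifest.get("teams", {})
    skills = manifest.get("skills", {})
    for repo in manifest.get("repositories", []):
        team = teams.get(repo)
        if team is None:
            problems.append(f"{repo}: no team mapping")
        elif team not in known_teams:
            problems.append(f"{repo}: unknown team '{team}'")
        for skill in skills.get(repo, []):
            if skill not in known_skills:
                problems.append(f"{repo}: unknown skill '{skill}'")
    return problems


print(validate_manifest(json.loads(EXAMPLE_REPOS_JSON), KNOWN_SKILLS, KNOWN_TEAMS))
```

In a real pipeline the known sets would be derived from the `skills/` and `team-overrides/` directories rather than hard-coded.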

A SKILL.md File Is Source Code

Semver, changelog, owner, test fixtures. The same rigor you apply to any shared library.

A SKILL.md file is source code. It shapes the behavior of a system that produces artifacts your team depends on. Treat it like a shared npm package or internal SDK, not like a wiki page.

Every SKILL.md needs a version, a changelog, a clear description of intended behavior, and at least one test case that proves it produces the expected output. Updating a skill carries the same constraints as updating any other dependency: backward compatibility by default, explicit breaking changes with migration guides, and the ability to pin a previous version when the new one breaks something specific to one team. If you cannot pin a version, you have a wiki, not a platform.
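A structural check is the cheapest way to enforce the minimum bar. The sketch below assumes the directory layout from the repository tree plus a per-skill `CHANGELOG.md`; the function name is hypothetical, and where the version number lives (a git tag, a field inside SKILL.md) is left open because the layout does not pin it down.

```python
from pathlib import Path
from typing import List

REQUIRED_FILES = ("SKILL.md", "README.md", "CHANGELOG.md")


def check_skill_dir(skill_dir: Path) -> List[str]:
    """Report what a skill directory is missing before it can be shared."""
    problems = [f"missing {name}" for name in REQUIRED_FILES
                if not (skill_dir / name).is_file()]
    tests_dir = skill_dir / "tests"
    # An empty tests/ directory does not count as "at least one test case".
    if not tests_dir.is_dir() or not any(tests_dir.iterdir()):
        problems.append("tests/ has no test case")
    return problems
```

Running this over every directory under `skills/` in CI turns "every skill needs a changelog and a test" from a convention into a failing build.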

Practice	What It Buys You	Implementation
Semantic versioning	Teams pin to majors and adopt minors automatically — no surprise behavior changes	Tag skill files with semver in the playbook repo; the sync script honors version constraints per repo
Per-skill changelog	Engineers know what changed before adopting an update — no archeology required	CHANGELOG.md inside each skill directory, updated on every PR that touches the skill
Automated validation	Catches regressions before they reach production workflows — including model-side drift	CI runs each skill's test suite against sample inputs, checks output structure, fails the build on regression
Deprecation policy	Prevents abrupt removal of workflows that other teams depend on	30-day deprecation window with automated warnings injected by the sync script
Ownership metadata	An unambiguous person to call when the skill misbehaves at 3am	OWNERS.md per skill listing primary and secondary owners with escalation paths
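"Pin to majors, adopt minors automatically" reduces to one resolution rule. The sync script shown earlier copies whatever is on main, so honoring a per-repo constraint would need something like the sketch below. The function names are illustrative, and it assumes plain `MAJOR.MINOR.PATCH` tags with no pre-release suffixes.

```python
from typing import List, Optional, Tuple


def parse(version: str) -> Tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH' (optionally prefixed with 'v') into integers."""
    major, minor, patch = (int(part) for part in version.lstrip("v").split("."))
    return major, minor, patch


def resolve(available: List[str], pinned_major: int) -> Optional[str]:
    """Pick the newest release within the pinned major, or None if there is none."""
    candidates = [v for v in available if parse(v)[0] == pinned_major]
    # Compare as integer tuples so 1.10.1 sorts above 1.2.0.
    return max(candidates, key=parse, default=None)


print(resolve(["1.2.0", "1.10.1", "2.0.0"], pinned_major=1))  # 1.10.1
```

A team pinned to major 1 picks up 1.10.1 automatically and never sees 2.0.0 until it changes its pin.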

Skills Drift. Build the Inspection Loop.

Publishing a skill is the start of the work, not the end. The models change, the codebase changes, the team changes.

AI-assisted workflows need ongoing calibration after they ship because the underlying models evolve, the codebase shifts under them, and the team's needs move. Skills that worked in March produce subtly worse output in November and nobody notices until an audit forces them to look.

Quarterly review cadence. Skill owners present usage data, failure patterns, and proposed improvements. Not bureaucracy — the mechanism that keeps the playbook from decaying into stale documentation nobody trusts.

What we got wrong on the first pass: we built the cadence around 'is this skill good?' Wrong question. The real question is 'is this skill still being used, and if not, why.' Skills that fall out of use never announce themselves. Engineers quietly stop invoking them and revert to doing the work manually. A skill with zero invocations in 30 days is a louder signal than a skill with a 30% override rate, because at least the engineers overriding the output are still engaging with it.

Invocation Count

Is the skill being used at all? Low usage means poor discoverability or low value — both require action.

Override Rate

How often do engineers edit or discard the output? High override means the skill needs tuning, not deprecation.

Time-to-Value

Invocation to useful output. Three minutes of waiting for a result the engineer rewrites is a net negative.

Feedback Loops

Are issues getting filed? Silence usually means engineers stopped using the skill, not that it works.
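Three of these four signals fall out of a plain invocation log. A minimal sketch, assuming you record one row per invocation: the `Invocation` fields, the function names, and the 0.3 override threshold are all assumptions, not an existing telemetry format.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Invocation:
    skill: str
    seconds_to_output: float
    overridden: bool  # the engineer edited or discarded the output


def skill_metrics(log: Iterable[Invocation], skill: str) -> Dict[str, Optional[float]]:
    """Summarize invocation count, override rate, and median time-to-value for one skill."""
    runs = [row for row in log if row.skill == skill]
    if not runs:
        return {"invocations": 0, "override_rate": None, "median_seconds": None}
    return {
        "invocations": len(runs),
        "override_rate": sum(row.overridden for row in runs) / len(runs),
        "median_seconds": median(row.seconds_to_output for row in runs),
    }


def triage(metrics: Dict[str, Optional[float]], override_threshold: float = 0.3) -> str:
    """Zero usage outranks a high override rate, matching the priority argued above."""
    if metrics["invocations"] == 0:
        return "unused"
    if metrics["override_rate"] > override_threshold:
        return "needs tuning"
    return "healthy"
```

The ordering inside `triage` is the design choice: a skill nobody invokes is flagged before a skill people keep correcting.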

Monthly Lightweight Check-ins

✓ Pull the past 30 days of usage metrics: invocation count, override rate, time-to-value
✓ Triage bug reports and feature requests filed against skills
✓ Check whether model updates have shifted output quality on baseline fixtures
✓ Refresh test fixtures if the underlying codebase has moved out from under them

Quarterly Deep Reviews

✓ Skill owners present a retrospective on the skill's performance against the original benchmark
✓ Compare current output quality to the validation suite from launch; drift is the default
✓ Decide explicitly: promote, demote, or retire. Letting a skill linger is a decision too.
✓ Pull cross-team feedback from engineers outside the owning team; they see what owners stop noticing
✓ Update documentation and test suite to match what the skill actually does now
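Both cadences lean on the same two checks: does the output still have the expected structure, and has the pass rate slipped since launch. A rough sketch under stated assumptions: the skill emits headed sections, a fixture lists the headings it expects, and the 0.05 tolerance is a placeholder you would calibrate yourself.

```python
from typing import List


def missing_sections(output: str, required_headings: List[str]) -> List[str]:
    """Return the required headings that do not appear as a line in the output."""
    present = {
        line.strip().lstrip("#").strip().lower()
        for line in output.splitlines()
        if line.strip()
    }
    return [h for h in required_headings if h.lower() not in present]


def has_drifted(baseline_pass_rate: float, current_pass_rate: float,
                tolerance: float = 0.05) -> bool:
    """Flag a skill whose fixture pass rate fell more than `tolerance` below launch."""
    return (baseline_pass_rate - current_pass_rate) > tolerance
```

A structure check will not catch output that is well-formed but wrong, so it is a floor for the review, not a substitute for the owner's retrospective.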

Onboarding: New Hires Productive in Week One

If a new engineer needs a senior to walk them through every skill, your documentation is the thing that broke.

The fastest way to find out whether your AI playbook actually works is to watch a new hire try to use it. If they need a senior engineer to walk them through every skill, your documentation has gaps you have stopped seeing. If they invoke a skill in the wrong context and get confusing output, your guardrails need work. Both are diagnostic — neither is the new hire's fault.

Onboarding in a Claude-native organization treats the AI playbook as a first-class tool, the same as the CI pipeline, monitoring stack, or deployment process. New engineers do not just learn how to code here. They learn how to work with AI here. The two are no longer separable.

AI Playbook Onboarding Checklist

Local environment configured with org CLAUDE.md and team-specific overrides applied
MCP servers connected and validated with a real test query, not a smoke check
Walked through three core skills (PR review, docs generation, incident response) on a real example
Paired with a mentor on a real task using each core skill — not a sandbox exercise
Read the playbook repo structure and OWNERS.md — knows who to call when a skill breaks
Added to the #ai-playbook channel, where updates and incident discussions are posted
Knows the governance model: how to file an issue, request a change, escalate a failure
Shipped a practice change: modified an existing skill and submitted the PR

AI Playbook Lifecycle

A loop, not a project. Each phase feeds the next iteration — and the next audit catches the drift the last cycle missed.

Governance: When a Shared Skill Drops a Column in Production

Shared workflows amplify both good patterns and bad ones. Govern the blast radius before the incident.

Here is the scenario every VP of Engineering needs to think through before it happens. A shared skill generates a database migration that passes code review, gets deployed, and drops a column in production. Or a PR review skill quietly approves a subtle security anti-pattern because its instructions never accounted for your auth model. Shared workflows do not just spread good patterns. They spread bad ones at exactly the same speed.

Governance is not about preventing every mistake. It is about limiting blast radius, naming an owner, and building feedback loops that make the system self-correcting before the next incident review^[7].

AI Playbook Governance Rules

[01]

Every shared skill has a designated owner in OWNERS.md

When a skill misbehaves, there is one person to call — not a Slack channel, not a team alias. Ownership rotates annually so the knowledge does not silo into a single engineer.

[02]

Skills that modify code or infrastructure require a human review gate

Read-only skills (documentation, analysis) run autonomously. Skills that produce code or config destined for production carry a mandatory human review step in the workflow itself, not as an external convention.

[03]

Any production incident traced to a skill triggers a mandatory review within 48 hours

The review must produce one of three artifacts: a skill update, an added test case, or a scope reduction. The finding lands in the skill's CHANGELOG. A review that produces no finding does not count as a review.

[04]

Skills operating on sensitive data log inputs and outputs for 30 days

Audit trails are non-negotiable for workflows touching PII, financial data, or access controls. Structured logging only — anything that requires grep across raw text is not an audit trail, it is a hope.

[05]

Breaking changes to a shared skill require approval from at least two consuming teams

The skill owner cannot unilaterally change behavior other teams depend on. This stops well-intentioned improvements from breaking workflows downstream.
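Rules 02 through 05 are mechanical enough to encode, which keeps them from living only in a policy document. The sketch below is one possible shape. The `Skill` fields and function names are hypothetical, and the 48-hour, 30-day, and two-team values are the ones stated in the rules above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Set


@dataclass(frozen=True)
class Skill:
    name: str
    writes_code_or_infra: bool    # output is code or config destined for production
    touches_sensitive_data: bool  # PII, financial data, or access controls


def requires_human_review(skill: Skill) -> bool:
    """Rule 02: read-only skills run autonomously, everything else is gated."""
    return skill.writes_code_or_infra


def must_log_io(skill: Skill) -> bool:
    """Rule 04: sensitive-data skills log inputs and outputs."""
    return skill.touches_sensitive_data


def log_expiry(logged_at: datetime) -> datetime:
    """Rule 04: logged inputs and outputs are kept for 30 days."""
    return logged_at + timedelta(days=30)


def review_deadline(incident_at: datetime) -> datetime:
    """Rule 03: the incident review is due within 48 hours."""
    return incident_at + timedelta(hours=48)


def breaking_change_approved(approving_teams: Set[str], owning_team: str,
                             consuming_teams: Set[str]) -> bool:
    """Rule 05: at least two consuming teams other than the owner must approve."""
    valid = (approving_teams & consuming_teams) - {owning_team}
    return len(valid) >= 2
```

Excluding the owning team from the approval count is an interpretation of rule 05, since an owner approving their own breaking change would defeat its purpose.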

~79%

of organizations surveyed run AI agents in production today (PwC, 2025). Share varies by industry.

~40%

of enterprise apps projected to ship task-specific AI agents by end of 2026 (Gartner forecast). Up from under 5% in 2025.

48hrs

incident-review window for any production failure traced to a shared skill — calibrate to your team size and on-call capacity.

Ownership: Three Patterns. Pick the One That Matches Your Stage.

Wrong model for your stage produces either a bottleneck or chaos. Both ways the playbook decays.

The ownership model maps to your team size and structure. There is no universally correct answer. There is a wrong answer for your stage — and it produces either a bottleneck or chaos. Both routes end in a playbook nobody trusts.

Model	Mechanism	Where It Fits	Failure Mode
Centralized Platform Team	Two to four engineers own all shared skills, review every PR, run distribution	Orgs with 100+ engineers where consistency matters more than speed	Platform team becomes the bottleneck; skills lose touch with domain-specific reality
Federated Ownership	Each team owns skills in its domain; a lightweight standards body reviews cross-team skills	Orgs with 30-100 engineers spread across distinct product areas	Quality varies by team; cross-cutting skills carry coordination overhead
Guild Model	Voluntary guild of AI-interested engineers maintains the playbook as a 20% project	Orgs with 10-30 engineers where a dedicated platform team is not yet justified	Depends on volunteer attention; stalls the moment guild members get pulled to product work

What to Ship This Quarter

You do not need the entire system in this guide before you see value. The playbook is itself an iterative product. Ship a minimal version, gather feedback, expand based on what your team actually needs — not what looks impressive in an architecture diagram nobody reads.

Start with the audit. One week, zero infrastructure. The findings alone reshape how you think about AI adoption inside your org. From there, pick one high-leverage skill, document it properly, distribute it to two teams, watch what happens. That is the proof of concept.

The orgs that compound over the next two years are not the ones running the newest AI tools^[3]. They are the ones that turned AI workflows into a shared, governed, continuously-improving organizational capability — instead of a collection of private superpowers that walk out the door when the engineer who built them leaves.

How do we handle engineers who refuse to standardize their personal workflows?

Do not force standardization across the board. Make the shared playbook genuinely better than personal setups — invest in testing, documentation, fast iteration. Engineers adopt tools that save them time. If your standardized workflow is slower or weaker than what an engineer built privately, that is a signal to fix the standard, not enforce compliance. Mandates produce surface adoption with private workarounds. Better tooling produces real adoption.

What happens when a model update breaks a shared skill?

Automated validation is the answer. CI runs every skill's test suite on a weekly schedule even when nothing in the playbook has changed — specifically to catch model-side regressions. When a break is detected, the skill owner gets paged automatically and has 48 hours to either fix the skill or pin a specific model version. No automated validation means the breakage discovers itself in production.

Should we version-lock the AI model used by shared skills?

For high-stakes workflows — incident response, security review — yes. Pin the model version and upgrade deliberately after running the validation suite against the new version. For lower-stakes skills like documentation drafting or commit messages, allow automatic model updates and watch the metrics dashboard for quality drift. The pin is a constraint; constraints cost something. Apply them where the cost of a regression exceeds the cost of falling behind.

How do we measure ROI on the AI playbook investment?

Three numbers. Time saved per workflow invocation multiplied by invocation frequency. Reduction in quality-related rework: the bugs caused by inconsistent processes that the playbook removes. Onboarding velocity, the time for new engineers to reach full productivity. The third is the one that ends the ROI conversation: engineers at orgs with mature AI playbooks reach full productivity in 3-4 weeks versus 6-8 weeks without one, a gain of 2-5 weeks per hire. A 50-person team hiring 10 engineers per year recovers roughly 20-50 ramp-up weeks annually from onboarding alone, before counting the first two numbers. That number is the answer.
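Taking the onboarding figures quoted above at face value (they are the article's, not independently verified), the arithmetic for the onboarding delta alone can be sketched as below. Anything beyond that range would have to come from the per-invocation and rework numbers, which the text does not quantify, so they are left as inputs. Both function names are illustrative.

```python
from typing import Tuple


def onboarding_weeks_recovered(hires_per_year: int,
                               weeks_without: Tuple[int, int],
                               weeks_with: Tuple[int, int]) -> Tuple[int, int]:
    """Return the (low, high) ramp-up weeks recovered per year across all hires."""
    # Smallest gain: fastest ramp without a playbook vs slowest ramp with one.
    low = hires_per_year * (weeks_without[0] - weeks_with[1])
    # Largest gain: slowest ramp without a playbook vs fastest ramp with one.
    high = hires_per_year * (weeks_without[1] - weeks_with[0])
    return low, high


def invocation_hours_saved(minutes_saved_per_invocation: float,
                           invocations_per_week: float,
                           weeks: int = 52) -> float:
    """First ROI number: time saved per invocation multiplied by frequency."""
    return minutes_saved_per_invocation * invocations_per_week * weeks / 60


print(onboarding_weeks_recovered(10, weeks_without=(6, 8), weeks_with=(3, 4)))  # (20, 50)
```

Ramp-up weeks are weeks of partial productivity, so treat even this range as an upper bound on recovered full-capacity engineer time.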

Key terms in this piece

AI playbook, AI workflow standardization, engineering team AI adoption, Claude-native organization, SKILL.md version control, AI governance engineering, internal AI standards, VP engineering AI strategy

Sources

[1] CIO, "How Agentic AI Will Reshape Engineering Workflows in 2026" (cio.com)
[2] Gartner, "40% of Enterprise Apps Will Feature AI Agents by 2026" (gartner.com)
[3] Optimum Partners, "Engineering Management 2026: How to Structure an AI-Native Team" (optimumpartners.com)
[4] OpenAI, "Building an AI-Native Engineering Team" (cdn.openai.com)
[5] Anthropic, "Enterprise AI Deployment Guide" (assets.anthropic.com)
[6] Promise Legal, "The Complete AI Governance Playbook for 2025" (blog.promise.legal)
[7] Liminal, "Enterprise AI Governance Guide" (liminal.ai)
