A deep teardown of the production CMS pipeline that turns GitHub Issues into merged PRs while you sleep. 11 workflows, 6 issue types, AI media generation, security hardening, and the exact DAGs that make it work.
Here is how I learned that a dark factory is not a thought experiment. It is a weekend project that accidentally becomes production infrastructure.
Groupon had a careers site. Generic blue gradients, stock photos of handshakes, centered text reading "We are passionate about excellence." It said nothing specific enough to be false. It attracted no one who wanted real problems. The old site lived at grouponcareers.com and it was embarrassing every time a candidate opened it.
I decided to fix it — not in a quarter, not with a roadmap, but over a single weekend. The goal was not a perfect site. It was a site that told the truth about working at Groupon during an AI transformation, built using the same agentic patterns we were already running internally.
The preview lives at groupon-careers.vercel.app. The difference is not just visual. The old site declared conditions we wished were true. The new one declares conditions that actually exist: live marketplace pressure, legacy systems, AI workflows in production, real customer weight. It does not say "join our journey." It says "build while running."
As of December 2025, Spotify's best engineers had stopped writing code manually, with roughly half of all updates flowing through autonomous pipelines. Engineering time on large-scale migrations dropped by 90%.[7]
BCG Platinion's March 2026 report found 3–5x productivity gains already within reach for teams running autonomous delivery pipelines, versus the 30% gains seen with copilot-style assistance.[7]
Pioneering organizations in early 2026 demonstrated that three engineers can operate a factory where humans no longer write code — they define intent and validate outcomes.[7]
A BCG Platinion five-day task force converted two business-critical enterprise applications, hitting 20% productivity gains per application within the first two days.[7]
The build process itself was the experiment. I did not write the content files by hand. I opened GitHub Issues. cms-spotlight-add for employee profiles. cms-insight for leadership articles. cms-location for office pages. The agent read each issue, loaded the correct skills from .claude/skills/, generated headshots via fal-ai/gpt-image-2, wrote the MDX, ran pnpm lint, and opened a PR. I typed /publish on the issue. The PR merged. Vercel deployed.
By Monday morning, the preview site had eight employee spotlights, three insight articles, five location pages, and a full FAQ. None of the content was written by a human at a keyboard. All of it was reviewed by a human before merge. That distinction matters: the factory does not replace judgment. It compresses execution.
What I learned over that weekend became the pipeline this article describes. The BOT_PAT trap, the WAITING/READY signal protocol, the branch detection bug, the recursive comment filter, the media generation fallback chain — every trap here was discovered in real time while building the careers preview. The factory taught itself by failing, and I watched the logs.
A dark software factory is a repository where the control plane is not a dashboard or a scheduler — it is the native GitHub surface: Issues for intake, Pull Requests for implementation, Comments for dialogue, and Labels for state machine transitions. AI agents run inside GitHub Actions, reading the same interfaces humans use, producing the same artifacts humans produce, and following the same quality gates humans follow.
The term borrows from manufacturing: a dark factory runs without lights because there are no humans on the floor. In software, it means the repository keeps shipping while the team sleeps, attends meetings, or works on harder problems. The humans write specs and design harnesses; the agents write code, generate images, and manage state.
This article is a deep teardown of a production CMS pipeline at Groupon that processes content requests through GitHub Issues. It covers the exact workflow DAGs, the AI configuration surface, the media generation pipeline, the issue type taxonomy, the security surface exposed by agentic CI, and the specific traps that will break your factory in production.
GITHUB_TOKEN cannot fire pull_request events on PRs it creates — you need a BOT_PAT with repo scope, or CI gates and auto-merge will never trigger.[9]
The WAITING/READY signal protocol lets an agent in a sandboxed action communicate state to the workflow orchestrator without file system access or side channels.
Clarification loops require careful actor filtering: exclude bot comments, exclude /publish commands, and never remove cms-bot-waiting before PR creation succeeds.
Quality gates must run on the agent's branch (claude/issue-N-*), not the workflow's checkout of main, or you'll silently validate the wrong code.
The claude-code-action Read tool was not sandboxed until v2.1.128 — it could read /proc/self/environ and exfiltrate ANTHROPIC_API_KEY via prompt injection.[6]
The allowed_tools pattern in anthropics/claude-code-action is a security boundary, not a convenience list. It blocks dangerous commands and must exclude gh pr create, rm -rf, and any unapproved operation.
How the factory classifies work before an agent ever sees it
The factory's job queue is not a free-form inbox. It is a typed taxonomy of six issue labels, each with a strict schema, required fields, and a specific set of skills the agent must load before acting. The label is the routing key. The schema is the contract.
When an issue is opened with a cms-* label, the cms.yml workflow fires. The first thing the agent does is read AGENTS.md and load the skills mapped to that label. This is not optional configuration — it is a mandatory skill-loading protocol that determines whether the agent generates a headshot, expands an article draft, or edits a location page.
| Label | Content Type | Required Fields | Primary Skills | Media Pipeline |
|---|---|---|---|---|
| cms-spotlight-add | Employee profile | name, role, team, quote, photo, 3 Q&A pairs | content-engine, cms-headshot-gen, cms-brand-voice-check, ai-slop-cleaner | generate-headshot.ts → public/images/spotlight/{slug}.png |
| cms-spotlight-edit | Profile update | Fields to change (partial update) | content-engine, cms-brand-voice-check (+ cms-headshot-gen if photo) | Headshot if new photo attached |
| cms-insight | Leadership article | title, author, authorRole, body >= 100 words with factual anchors | cms-article-expand, cms-image-gen, cms-brand-voice-check, ai-slop-cleaner | generate-image.ts → public/images/insights/{slug}.png |
| cms-faq | FAQ entry | question, answer (2–4 sentences, direct voice) | content-engine, cms-brand-voice-check | None |
| cms-location | Office page | city, region, timezone, functions, intro, highlights | content-engine (download photo if provided) | download-attachment.ts only; no generation |
| cms-content-edit | Copy or component change | Target file and desired change | frontend-design for components; cms-brand-voice-check for copy | None |
The taxonomy is rigid for a reason. The agent cannot invent a new issue template. It cannot generate a logo because cms-logo is not a label. It cannot write a component change without the frontend-design skill. The harness is the design: human-authored constraints that prevent the agent from wandering outside its lane.
For spotlights, the quote field has a specific quality gate: it must be specific enough to be false. A generic quote like "I love working here" fails the audit because it could apply to anyone. The agent asks for a replacement via the clarification loop. For insights, the body must contain factual anchors (names, dates, tools, outcomes), or the cms-article-expand skill refuses to expand it.
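The label-as-routing-key idea can be sketched as a lookup table. This is a minimal illustration transcribed from the taxonomy table above; the type and function names (`CmsLabel`, `routeIssue`) are hypothetical and not taken from the pipeline's code, and conditional skills such as cms-headshot-gen on a photo edit are left out.

```typescript
// Routing table transcribed from the taxonomy table. Names are illustrative.
type CmsLabel =
  | "cms-spotlight-add"
  | "cms-spotlight-edit"
  | "cms-insight"
  | "cms-faq"
  | "cms-location"
  | "cms-content-edit";

const SKILLS_BY_LABEL: Record<CmsLabel, string[]> = {
  "cms-spotlight-add": ["content-engine", "cms-headshot-gen", "cms-brand-voice-check", "ai-slop-cleaner"],
  "cms-spotlight-edit": ["content-engine", "cms-brand-voice-check"],
  "cms-insight": ["cms-article-expand", "cms-image-gen", "cms-brand-voice-check", "ai-slop-cleaner"],
  "cms-faq": ["content-engine", "cms-brand-voice-check"],
  "cms-location": ["content-engine"],
  // Components use frontend-design, copy uses cms-brand-voice-check; both listed here.
  "cms-content-edit": ["frontend-design", "cms-brand-voice-check"],
};

function isCmsLabel(label: string): label is CmsLabel {
  return Object.prototype.hasOwnProperty.call(SKILLS_BY_LABEL, label);
}

// Returns the skills for the first label that belongs to the taxonomy, or
// null when none does (for example "cms-logo", or only "cms-bot-waiting").
function routeIssue(labels: string[]): { label: CmsLabel; skills: string[] } | null {
  for (const label of labels) {
    if (isCmsLabel(label)) {
      return { label, skills: SKILLS_BY_LABEL[label] };
    }
  }
  return null;
}
```

Because the table is closed over a union type, adding a seventh label is a compile-time change rather than something an agent can improvise at run time.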
The complete control flow from issue opened to PR merged, including auto-healing and code review
A dark factory is not one massive workflow. It is 11 small workflows that compose into a closed loop. Each workflow owns one state transition, reads one event type, and writes one artifact type. This decomposition is what makes the system debuggable when an agent hallucinates a file path or a label filter misfires.
The 11 workflow files are: cms.yml (Issue Intake), issue-clarify.yml (Clarification Loop), cms-ci-healer.yml (CI Auto-Healer), pr-code-review.yml (Code Review + Auto-Fix), cms-publish.yml (Publish/Merge), vercel-preview.yml (Preview Notifications), cms-spend-alert.yml (Cost Monitoring), cms-stale-prs.yml (Stale PR Cleanup), cms-canary.yml (Skill Loading Test), ci.yml (Quality Gates), and pr-checks.yml (Coverage Gate). Alongside them, Vercel runs an implicit build job on every push.
The orchestrator that decides whether to implement or wait
cms.yml is the heart of the factory. It triggers on issues.opened and issues.reopened when the issue carries a cms-* label and does not carry cms-bot-waiting. The first gate is the kill switch: vars.CMS_BOT_ENABLED == 'true'. If the repository variable is anything other than 'true', the workflow exits immediately. This is how you halt the entire factory without touching code.
The workflow uses pnpm/action-setup@v4 with Node 20 caching, checks out the repo with fetch-depth: 0, installs the genmedia CLI for image generation, and then invokes anthropics/claude-code-action@beta. The action is the agent container. Everything that follows depends on three configuration surfaces: model, allowed_tools, and direct_prompt.
The model field pins the agent to claude-sonnet-4-6. This is not the latest model — it is the calibrated model. The factory was tested against this specific version, and changing it without re-calibrating the allowed_tools patterns and prompt instructions risks breaking agent behavior.
The allowed_tools field is the security boundary. It uses a comma-separated list where each Bash command is prefixed with Bash(...) and supports wildcard arguments. Bash(git add:*) allows git add content/spotlights/jana-novotna.mdx but does not allow git add . && rm -rf .git because the action treats the entire string as a single command and rejects &&. This is critical: the action's Bash tool does not support command chaining, pipes, or semicolons. Every Bash call must be a single executable with literal arguments.
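To make the matching rule concrete, here is a simplified model of the behavior described above. It is a sketch under assumptions, not the action's real implementation: the function names are hypothetical, and only `Bash(prefix:*)` patterns are modeled.

```typescript
// A simplified model of allowed_tools matching. NOT the action's real code.

// Extracts the command prefix from a pattern such as "Bash(git add:*)".
function bashPrefix(pattern: string): string | null {
  const match = /^Bash\((.+):\*\)$/.exec(pattern.trim());
  return match ? match[1] : null;
}

function isBashCallAllowed(command: string, allowedTools: string): boolean {
  // Chaining (&&), pipes (|), and semicolons are rejected outright.
  if (/&&|\||;/.test(command)) return false;
  const trimmed = command.trim();
  return allowedTools
    .split(",")
    .map((pattern) => bashPrefix(pattern))
    .some(
      (prefix) =>
        prefix !== null &&
        // Either the bare command, or the prefix followed by arguments.
        (trimmed === prefix || trimmed.indexOf(prefix + " ") === 0),
    );
}
```

Requiring a space after the prefix is a deliberate choice in this sketch: it keeps `Bash(git add:*)` from also admitting a different executable whose name merely begins with the same characters.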
The direct_prompt is the control surface — not a suggestion but a script the agent executes line by line. The prompt ends with an explicit signal instruction: READY or WAITING on its own line. The workflow reads this signal from the action's output JSON file.
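A minimal sketch of the signal parsing, assuming the agent's final text has already been extracted from the output JSON file (the layout of that file is not described here, so the extraction step is omitted). The name `parseSignal` is hypothetical.

```typescript
type AgentSignal = "READY" | "WAITING";

// Scans from the end for the last line that is exactly READY or WAITING.
// A sentence that merely mentions the word does not count as a signal.
function parseSignal(agentOutput: string): AgentSignal | null {
  const lines = agentOutput.split(/\r?\n/).map((line) => line.trim());
  for (let i = lines.length - 1; i >= 0; i--) {
    const line = lines[i];
    if (line === "READY" || line === "WAITING") {
      return line;
    }
  }
  return null;
}
```

Returning null for a missing signal lets the workflow treat "the agent said nothing parseable" as its own failure case instead of defaulting to either state.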
How the factory handles incomplete specs without human intervention
The clarification loop is where most dark factory implementations break. A user opens an issue, the agent asks for a photo, the user uploads one in a comment, and now a second workflow must re-run the agent with the accumulated context. The trigger is issue_comment.created, but the filter is the hardest part of the entire system. The pipeline also filters on github.event.actor != 'github-actions[bot]' as a second guard against recursive triggering.
You must exclude: bot comments (to prevent recursive self-triggering), /publish commands (handled by cms-publish.yml), /cancel commands, comments on pull requests (not issues), and issues that do not have both cms-bot-waiting and a cms-* label. The Groupon pipeline uses a join-based match for label filtering because GitHub Actions' contains() on arrays does exact matching only. contains(join(labels.*.name, ' '), 'cms-') checks whether the joined label string contains cms- anywhere. Strictly speaking that is a substring match rather than a prefix match (a hypothetical label such as legacy-cms-import would also pass), which is acceptable for a prefix-based taxonomy as long as no unrelated label embeds cms-.
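The exclusion rules read more easily as a single predicate. This sketch expresses them in TypeScript rather than as a workflow `if:` expression; the `CommentEvent` shape and the function name are assumptions made for illustration, with comments noting which event fields they stand in for.

```typescript
interface CommentEvent {
  commenterLogin: string; // stands in for github.event.comment.user.login
  body: string;           // stands in for github.event.comment.body
  isPullRequest: boolean; // true when the comment was posted on a PR
  labels: string[];       // stands in for github.event.issue.labels.*.name
}

const BOT_LOGINS = ["github-actions[bot]", "claude-code-action[bot]"];

function shouldRunClarify(event: CommentEvent): boolean {
  // Bot comments would re-trigger the loop recursively.
  if (BOT_LOGINS.indexOf(event.commenterLogin) !== -1) return false;
  // Comments on pull requests are not part of the clarification loop.
  if (event.isPullRequest) return false;
  // Slash commands belong to other workflows.
  const body = event.body.trim();
  if (/^\/(publish|cancel)\b/.test(body)) return false;
  // Both the waiting label and a content label must be present.
  const waiting = event.labels.indexOf("cms-bot-waiting") !== -1;
  const hasContentLabel = event.labels.some(
    (label) => label.indexOf("cms-") === 0 && label !== "cms-bot-waiting",
  );
  return waiting && hasContentLabel;
}
```

Unlike the join-based expression, this version checks each label for a true prefix, which is easy in a general-purpose language and awkward in the Actions expression syntax.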
The clarification workflow has a pre-download step that runs before the agent. It extracts photo URLs from the comment thread using gh issue view --comments --json comments, downloads them with curl and GITHUB_TOKEN auth, and writes them to public/images/spotlight/{slug}.jpg. This solves two sandbox problems at once: the agent cannot access /tmp, and GitHub attachment URLs expire after a short time. By downloading images in a workflow step, the photos are available in the repo workspace when the agent starts.
A critical bug that appeared during calibration: removing the cms-bot-waiting label before PR creation. If the label is removed early and gh pr create fails, the issue no longer has the waiting label, so issue-clarify.yml will not re-trigger on the user's next reply. The issue is stranded. The fix is to remove the label only after successful PR creation, in the success path of the same step that creates the PR.
When CI fails, the factory heals itself
The CI Auto-Healer is the most recent addition to the factory. It triggers on workflow_run.completed when the conclusion is failure, watching CI and CMS — Process content request workflows. It downloads the run log archive via gh api repos/{repo}/actions/runs/{id}/logs, extracts the last 200 error lines, and invokes Claude Code on the exact failing SHA (not main).
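The log-trimming step might look roughly like the following. The error pattern is an assumption; nothing above specifies how the pipeline decides what counts as an error line, and the function name is hypothetical.

```typescript
// Heuristic for "error-looking" lines. This pattern is an assumption.
const ERROR_PATTERN = /(error|fail|exception)/i;

// Keeps lines that look like errors and returns the last `limit` of them,
// preserving their original order.
function lastErrorLines(log: string, limit: number = 200): string[] {
  const errorLines = log
    .split(/\r?\n/)
    .filter((line) => ERROR_PATTERN.test(line));
  return errorLines.slice(Math.max(0, errorLines.length - limit));
}
```

Taking the tail rather than the head keeps the lines closest to the failing step, which is usually where the root cause surfaces in a CI log.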
The agent receives the error context and a diagnostic prompt. It creates a fix/ci-healer-{run_id}-{sha} branch, applies the fix, pushes the branch, and opens a PR. The pr-code-review.yml workflow explicitly skips fix/ci-healer-* branches to prevent the review loop from re-triggering on its own healing PRs.
Every PR gets a Claude review before human eyes see it
The PR Code Review workflow runs on every pull request against main (opened, synchronize, reopened). It fetches the full diff with git diff base...head, pipes it to Claude Code for a structured code review, and then — if the review finds must_fix or should_fix issues — invokes a second pass to auto-fix them on the same branch.
The workflow has a critical guard: it skips branches starting with fix/ci-healer- or claude/. Without this, a bot-fix PR would trigger another review, which might trigger another fix, creating an infinite loop. The allowed_tools list is narrower than the CMS workflows: it excludes media generation and headshot scripts, focusing only on lint, typecheck, and test tools.
The single most expensive trap in dark factory engineering, and how BOT_PAT fixes it
When a GitHub Actions workflow creates a PR using the default GITHUB_TOKEN, that PR does not fire pull_request events.[9] This means your ci.yml, your pr-checks.yml, your required status checks, and your auto-merge rules will never trigger. The PR sits open, green in the sense that no checks failed, but unmergeable because no checks ran.
This is a deliberate GitHub security constraint: events caused by the default GITHUB_TOKEN (with the exception of workflow_dispatch and repository_dispatch) do not start new workflow runs, which prevents recursive automation loops. The limitation is by design, not a bug.
The fix is a Personal Access Token with repo scope, stored as BOT_PAT. PRs created with BOT_PAT fire events normally because GitHub treats them as user-created. The tradeoff: you must rotate this token and scope it to the minimum repositories. Alternatively, a GitHub App token achieves the same effect with tighter per-repo scoping and eliminates the PAT rotation burden — but adds App registration overhead.[9]
In the Groupon pipeline, BOT_PAT is used only for gh pr create. All subsequent operations — labeling, commenting, merging — use GITHUB_TOKEN so the comments are attributed to github-actions[bot] and do not re-trigger issue workflows.
| PR created with GITHUB_TOKEN | PR created with BOT_PAT |
|---|---|
| Does not fire pull_request events | Fires pull_request events normally |
| CI workflows and status checks never trigger | CI workflows, status checks, and required checks trigger automatically |
| Required checks show as Expected but never run | Branch protection rules evaluate correctly |
| Auto-merge rules cannot evaluate | Auto-merge rules can proceed on green CI |
| PR remains unmergeable indefinitely | Human-like event semantics with machine execution |
When the cms-approved label is applied, the pipeline merges with gh pr merge --squash --admin for same-day merges. This bypasses the 24-hour branch protection delay for bot PRs that have passed all quality gates. The PR creation step runs those gates first: pnpm lint and pnpm exec tsc --noEmit. They must run on the agent's branch, not on the workflow's checkout of main. A common latent bug is checking out main and then linting main while the agent's commits sit on an undetected branch. The fix is to detect the action branch with git branch -r | grep origin/claude/issue-N-... and git checkout it before the gates run.
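Branch detection can be sketched as a pure function over the text that `git branch -r` prints. The function name is hypothetical, and because the branch suffix is not specified here, picking the lexicographically last match when several exist is an assumption.

```typescript
// Given the output of `git branch -r`, returns the agent's branch for an
// issue (without the "origin/" prefix), or null when none exists yet.
function detectAgentBranch(gitBranchOutput: string, issueNumber: number): string | null {
  // The trailing "-" keeps issue 12 from matching issue 123.
  const prefix = "origin/claude/issue-" + issueNumber + "-";
  const candidates = gitBranchOutput
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.indexOf(prefix) === 0);
  if (candidates.length === 0) return null;
  // Assumption: if several branches match, the last one in sorted order wins.
  candidates.sort();
  return candidates[candidates.length - 1].slice("origin/".length);
}
```

A null result is the signal to fail the step loudly rather than fall through to linting main.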
The PR body includes a cost report generated by scripts/cms/cost-report.ts. This script reads .cms-fal-log.json for fal.ai spend and estimates Anthropic cost from token counts or issue body length. Every automated PR carries its own price tag in the description.
How /publish becomes cms-approved becomes squash-merged
The publish workflow has two jobs that communicate through labels, not through shared state or message queues. Job 1 listens for /publish comments on issues, finds the associated bot PR by searching for Closes #N in PR bodies, verifies the PR has the bot-generated label, and adds cms-approved. Job 2 listens for pull_request.labeled events where the label is cms-approved and the PR has the bot-generated label, and immediately squash-merges with gh pr merge --squash --admin.
This label-based coordination degrades gracefully. If the associated PR is not bot-generated (missing the bot-generated label), Job 1 replies with a warning and exits. If the user types /publish before the bot has opened a PR, Job 1 replies with a helpful message and exits cleanly. The state machine recovers: the user replies again, the clarification loop re-triggers, and execution continues.
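Job 1's decision can be sketched as a function over a list of open PRs. The types and names below are hypothetical; the sketch only mirrors the three outcomes described above (approve, warn about a non-bot PR, or report that no PR exists yet).

```typescript
interface PullRequestSummary {
  number: number;
  body: string;
  labels: string[];
}

// Matches "Closes #12" but not "Closes #123" when issueNumber is 12.
function closesIssue(body: string, issueNumber: number): boolean {
  return new RegExp("\\bCloses #" + issueNumber + "\\b", "i").test(body);
}

type PublishDecision =
  | { kind: "approve"; pr: number }
  | { kind: "not-bot-generated"; pr: number }
  | { kind: "no-pr" };

function decidePublish(prs: PullRequestSummary[], issueNumber: number): PublishDecision {
  const linked = prs.filter((pr) => closesIssue(pr.body, issueNumber));
  if (linked.length === 0) return { kind: "no-pr" };
  const botPrs = linked.filter((pr) => pr.labels.indexOf("bot-generated") !== -1);
  if (botPrs.length === 0) return { kind: "not-bot-generated", pr: linked[0].number };
  return { kind: "approve", pr: botPrs[0].number };
}
```

The word boundary after the issue number matters: without it, a /publish on issue 12 could approve the PR that closes issue 123.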
How fal.ai, flux, and gpt-image-2 generate assets inside a CI job
The factory generates images, not just code. Two distinct pipelines handle media: insight article headers and spotlight headshots. Both use fal.ai, both enforce a monthly spend cap, and both write structured JSON that the workflow parses.
The insight header pipeline lives in scripts/cms/generate-image.ts. It uses fal-ai/flux/dev as the primary model ($0.025/image at fal.ai rates) and fal-ai/flux/schnell as the fallback. The suffix ". No text, no logos." is appended to the prompt to prevent the model from rendering watermarks or typography. The output is resized to 1200×630 via sharp, EXIF-stripped, and written atomically to public/images/insights/{slug}.png.
The headshot pipeline lives in scripts/cms/generate-headshot.ts. It uses fal-ai/gpt-image-2 as the primary model (image-to-image conditioning) and fal-ai/flux-pro/kontext as the fallback. The input photo is base64-encoded and passed as image_url. The canonical STYLE_PROMPT is "professional headshot, light neutral background, soft fill lighting, square crop 1:1, no text, photographic style". The output is 1024×1024, EXIF-stripped, and written to public/images/spotlight/{slug}.png.
Both pipelines share a cost logging mechanism: .cms-fal-log.json. It tracks requests and costUsd per month. The checkCap function throws when spend reaches 90% of FAL_MONTHLY_CAP_USD. This is not a soft warning — it is a hard stop that prevents the agent from burning the monthly budget on a runaway generation loop.
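A minimal sketch of the cap check follows. The name `checkCap`, the 90% threshold, and the `requests` and `costUsd` fields come from the description above; the log being keyed by "YYYY-MM" and the function signature are assumptions.

```typescript
interface FalMonthEntry {
  requests: number;
  costUsd: number;
}

// Assumption: the log is keyed by month as "YYYY-MM". The real layout of
// .cms-fal-log.json may differ.
type FalLog = Record<string, FalMonthEntry>;

// Throws once the month's spend has reached 90% of the cap.
function checkCap(log: FalLog, month: string, capUsd: number): void {
  const entry = log[month];
  const spent = entry ? entry.costUsd : 0;
  if (spent >= capUsd * 0.9) {
    throw new Error(
      "fal.ai spend $" + spent.toFixed(2) + " has reached 90% of the $" +
        capUsd.toFixed(2) + " cap for " + month,
    );
  }
}

// Checks the cap first, then records the request, so a tripped cap
// prevents the generation call from being logged or made.
function recordRequest(log: FalLog, month: string, costUsd: number, capUsd: number): void {
  checkCap(log, month, capUsd);
  const entry = log[month] || { requests: 0, costUsd: 0 };
  log[month] = { requests: entry.requests + 1, costUsd: entry.costUsd + costUsd };
}
```

Throwing instead of returning a boolean means a caller that forgets to check the result still stops, which is the point of a hard cap.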
The download-attachment.ts script handles location photos and user-uploaded spotlight photos. It validates MIME type (image/*), enforces a 10 MB streaming byte cap, guards against path traversal in the slug, strips EXIF by re-encoding through sharp(input).rotate().png(), and writes atomically using a temp file and fs.renameSync. It also checks that the output path is not a symlink or special file before overwriting.
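Two of those guards, the MIME check and the slug check, are easy to show as pure functions. This is a sketch under assumptions: the slug grammar and the function names are hypothetical, and the streaming byte cap, symlink check, and atomic write are not shown.

```typescript
// Assumption: slugs are lowercase letters and digits separated by single
// hyphens. With this grammar "/", "\\", and "." can never reach the path.
const SLUG_PATTERN = /^[a-z0-9]+(-[a-z0-9]+)*$/;

function isImageMime(contentType: string): boolean {
  // Ignores parameters such as "; charset=binary".
  const mime = contentType.split(";")[0].trim().toLowerCase();
  return mime.indexOf("image/") === 0 && mime.length > "image/".length;
}

function spotlightPathForSlug(slug: string): string {
  if (!SLUG_PATTERN.test(slug)) {
    throw new Error("unsafe slug: " + JSON.stringify(slug));
  }
  return "public/images/spotlight/" + slug + ".png";
}
```

Validating against an allow-list grammar is simpler to reason about than trying to strip "../" sequences after the fact.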
How the factory tracks and controls inference spend before it becomes a problem
Cost governance in a dark factory is a first-class concern because the dominant cost is LLM inference, and runaway agents can burn through a daily budget before anyone notices. The Groupon pipeline has three cost control layers: per-PR cost reporting, daily spend alerts, and monthly media caps.
scripts/cms/cost-report.ts generates a markdown cost block for every PR. It reads .cms-fal-log.json for fal.ai spend and estimates Anthropic cost using Sonnet pricing: $3/MTok input, $15/MTok output for claude-sonnet-4-6. If the action does not expose per-run token counts, the script falls back to estimating from ISSUE_BODY_LENGTH: ~4 characters per token, plus a 2,000 token base for context, with output assumed at 30% of input.
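The fallback estimate is simple arithmetic, sketched here with the rates quoted above. The function name is hypothetical, and applying the 30% output ratio to the input total including the 2,000-token base is an assumption.

```typescript
const INPUT_USD_PER_MTOK = 3;
const OUTPUT_USD_PER_MTOK = 15;
const BASE_CONTEXT_TOKENS = 2000;
const CHARS_PER_TOKEN = 4;
const OUTPUT_RATIO = 0.3;

interface CostEstimate {
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

// Estimates Anthropic spend from the issue body length when per-run
// token counts are unavailable.
function estimateAnthropicCost(issueBodyLength: number): CostEstimate {
  const inputTokens = Math.ceil(issueBodyLength / CHARS_PER_TOKEN) + BASE_CONTEXT_TOKENS;
  // Assumption: the 30% ratio applies to the full input, base included.
  const outputTokens = Math.round(inputTokens * OUTPUT_RATIO);
  const costUsd =
    (inputTokens * INPUT_USD_PER_MTOK + outputTokens * OUTPUT_USD_PER_MTOK) / 1e6;
  return { inputTokens, outputTokens, costUsd };
}
```

For a 4,000-character issue body this gives 3,000 input tokens, 900 output tokens, and (3,000 × 3 + 900 × 15) / 1,000,000 = $0.0225.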
scripts/cms/check-daily-spend.ts runs on a cron at 07:00 UTC (after stale PR cleanup at 06:00 UTC). It estimates yesterday's total spend from .cms-fal-log.json and .cms-token-log.json, compares it against DAILY_SPEND_THRESHOLD_USD (default 50), and opens a GitHub issue with label ops-cost-alert if the threshold is exceeded. The issue body includes a JSON breakdown and instructions to set CMS_BOT_ENABLED=false to pause the factory.
What the /proc/self/environ vulnerability revealed about agentic CI and what it means for your allowed_tools config
In April 2026, Microsoft Threat Intelligence reported a prompt injection vulnerability in claude-code-action that exposed workflow secrets via /proc/self/environ.[6] The attack vector: a malicious actor opens a GitHub issue with a hidden payload. The CMS workflow processes the issue, the agent reads the issue body, and the injected instruction directs the agent to read /proc/self/environ using the Read tool. The Read tool was not subject to the same Bubblewrap sandbox applied to Bash operations. The workflow's ANTHROPIC_API_KEY and, critically, ACTIONS_ID_TOKEN_REQUEST_TOKEN and ACTIONS_ID_TOKEN_REQUEST_URL were readable.
With those OIDC credentials, an attacker could replicate the token exchange and obtain a GitHub App token with write access to repository contents, issues, pull requests, and workflows. The entire repo was compromisable from a single crafted issue body.
Anthropic mitigated this in Claude Code v2.1.128 (May 5, 2026) by having the Read tool unconditionally reject files in /proc/.[6] If you're running a version older than 2.1.128, upgrade immediately. If you're pinning claude-code-action@beta without a SHA, you may already have the fix — verify with gh release list --repo anthropics/claude-code-action.
| Risk | Attack Vector | Mitigation | Configuration |
|---|---|---|---|
| Secret exfiltration via Read | Prompt injection reads /proc/self/environ | Upgrade to claude-code-action ≥ v2.1.128 | uses: anthropics/claude-code-action@v2.1.128 |
| Recursive workflow triggering | Bot comment fires issue-clarify.yml | Filter on comment.user.login != 'github-actions[bot]' | Explicit actor exclusion in workflow if: condition |
| Agent pushes directly to main | No branch protection configured | Branch protection rules + agent is never allowed to git push to main | Remove Bash(git push main:*) from allowed_tools |
| Runaway media spend | Agent loops on image generation | Hard cap at 90% of monthly budget | FAL_MONTHLY_CAP_USD env var + checkCap() throws |
| show_full_output leaks credentials | Debug logs expose API keys | Keep show_full_output: false on public repos | Default setting; do not override in public repos[8] |
The exact failure modes discovered during calibration, and the fixes
Use allowed_tools patterns with no wildcards on dangerous commands. The action rejects &&, |, and ; in Bash calls.
The agent commits to claude/issue-N-* but the workflow checks out main. Always detect the action branch with git branch -r and git checkout it before running quality gates.
Log spend to .cms-fal-log.json for media and .cms-token-log.json for inference.
Exclude github-actions[bot], claude-code-action[bot], and any other app actors explicitly. Use github.event.comment.user.login, not github.event.actor.
cms-stale-prs.yml uses actions/stale@v9 with only-pr-labels: bot-generated and days-before-pr-stale: 7. It closes the PR and re-opens the source issue.
A decision framework for Monday morning
The careers site pipeline took a weekend to build and 30+ commits to stabilize. The calibration cost — in time and tokens — is real. Not every team should build this. The question is whether your content workflow has the right shape.
| Signal | Build It | Don't Build It |
|---|---|---|
| Content volume | 10+ content requests per week with predictable structure | Fewer than 5 per week — manual PR is faster than factory calibration |
| Schema stability | Issue templates don't change often; taxonomy is fixed | Requirements change weekly — the harness will constantly break |
| Review tolerance | Stakeholders can approve via /publish in GitHub | Approval requires external sign-off workflows or legal review |
| Quality bar | Brand voice rules are codifiable in a skill file | Output needs subjective creative judgment on every piece |
| Team GitHub fluency | Team lives in GitHub already — Issues are natural | Team uses Jira/Notion — GitHub Issues creates friction |
| Security posture | Private repo, trusted contributors, allowed_tools locked down | Public repo with external contributors and no PAT rotation policy |
Write the 3–6 issue templates and required field schemas before touching a workflow file. The taxonomy is the contract. Everything else is implementation.
Set vars.CMS_BOT_ENABLED = 'false' in repository variables. Your first workflow will fire incorrectly — you need a circuit breaker before you write line 1.
Create the PAT, store it as BOT_PAT, create a test PR using it, and confirm pull_request events fire in the Actions log before wiring any other workflow.
Ship a version that just reads issues and signals. No PR creation yet. Test with a complete issue (READY) and an incomplete one (WAITING). Verify the labels and signal parsing before adding complexity.
The recursive triggering bug hits when you add the clarification loop (issue-clarify.yml). Verify the comment.user.login filter by posting a bot comment and confirming the workflow does not re-trigger.
Add cms-publish.yml, confirm the label-based merge, then add vercel-preview.yml. Each workflow needs a manual end-to-end test before the next is added.
Operational workflows (cms-stale-prs.yml, cms-spend-alert.yml, cms-ci-healer.yml) make no sense until the happy path is stable. Add them after a week of real runs.
11 workflows: intake, clarification, publish, preview, spend, stale, canary, CI, PR checks, CI healer, and code review, under 700 lines combined
30+ commits: the iterations the Groupon CMS pipeline required to calibrate the clarification loop, BOT_PAT behavior, and branch detection timing
$0.02–$0.50: Claude Code agent cost per content request, measured via cost-report.ts and appended to each PR body
$0.025 per image: fal.ai pricing for flux/dev at standard resolution; the flux/schnell fallback is cheaper. Logged per request in .cms-fal-log.json.
The 19-point check derived from calibration failures, in the order they matter
Can I use this without Claude Code?
Yes. The architecture is independent of the agent implementation. You could use OpenAI's Codex CLI, GitHub Copilot's agent mode, or a custom LLM client inside the action. The critical interfaces are the same: the agent reads a prompt, emits files, runs git commands, and signals READY or WAITING. What changes is the action wrapper and the allowed_tools pattern.
How do I prevent the agent from deleting production data?
Three layers: First, the allowed_tools pattern blocks dangerous Bash commands. Second, branch protection rules on main prevent direct pushes. Third, the agent never runs on main — it always commits to a feature branch. Even if the agent went rogue, it could only delete files on a branch that gets reviewed before merge.
What happens when the agent produces broken code?
The quality gates catch it before PR creation. If pnpm lint or pnpm exec tsc --noEmit fails, the PR creation step fails and the workflow posts an error comment on the issue. The user can reply with corrections, which triggers the clarification loop again. In practice, most agent errors are syntax-level and fixable with a one-sentence correction in the issue thread.
How much does this cost to run?
The dominant cost is LLM inference. A typical content request costs $0.02–$0.50 in Claude API tokens depending on complexity. Image generation via fal.ai adds $0.025 per request for flux/dev. GitHub Actions minutes are negligible for private repos (2000 minutes/month free). The cost-report.ts script tracks per-PR spend so you can set monthly caps via FAL_MONTHLY_CAP_USD and daily alerts via DAILY_SPEND_THRESHOLD_USD.
Can multiple agents run in parallel?
Yes, but they need isolation. The Groupon pipeline uses the Docker sandboxing built into the Claude Code action. For broader multi-agent patterns — where one agent implements, another reviews, and a third merges — see the Dark Factory CLI by Peter Stratton[1], which orchestrates three independent Claude Code instances with separate permissions and non-overlapping allowed_tools lists.
What happens if fal.ai goes down?
Both generate-image.ts and generate-headshot.ts have fallback models. flux/dev falls back to flux/schnell. gpt-image-2 falls back to flux-pro/kontext. If both fail, the agent enters the clarification loop and asks the user to provide an image manually. The factory degrades rather than crashing.
Is GITHUB_TOKEN ever safe enough for PR creation?
Only if you don't need pull_request events on the created PR. If your CI is purely push-triggered and you have no branch protection rules that depend on PR status checks, GITHUB_TOKEN is fine. The moment you add required status checks, auto-merge rules, or any on: pull_request workflow, you need BOT_PAT or a GitHub App token. There is no middle ground.[9]
Should I be concerned about the /proc/self/environ vulnerability?
If you're on claude-code-action < v2.1.128, yes. The Read tool could be directed by prompt injection in issue bodies or PR descriptions to read environment variables including your ANTHROPIC_API_KEY and OIDC tokens.[6] Upgrade to v2.1.128 or later, which unconditionally blocks reads from /proc/. Also: keep show_full_output: false on any public-facing repo, and audit your allowed_tools list to ensure it doesn't include WebFetch or network-capable tools when processing untrusted input.
What is the CI Auto-Healer?
cms-ci-healer.yml is a workflow that triggers when ci.yml or cms.yml fails. It checks out the exact failing SHA, downloads the run logs, and invokes Claude Code with a diagnostic prompt. The agent reads the error logs, identifies the root cause, opens a fix PR on a fix/ci-healer-{run_id}-{sha} branch, and posts a comment on the original issue. This closes the loop between failure and fix without human intervention.
Why does the preview workflow use deployment_status instead of pull_request?
vercel-preview.yml listens on deployment_status because Vercel previews are triggered by pushes, not PR events. The workflow receives the deployment SHA, looks up the associated PR via gh api /repos/{repo}/commits/{sha}/pulls, extracts the Closes #N reference from the PR body, and posts a state-specific comment on the source issue. This handles pending, in-progress, success, and failure states with different messages.
A dark software factory is not about removing humans from software development. It is about removing humans from the parts that don't need them: the translation from spec to branch, from branch to PR, from PR to merge, and from merge to deployed preview. The human still writes the spec, still approves the merge with /publish, and still designs the harness that constrains the agent via AGENTS.md and allowed_tools.
What changes is latency. A content request that once sat in a backlog for two days now ships in twenty minutes. A CI failure that once required finding an available engineer now generates a fix branch while the team is still in standup.
The 11 workflow files are under 700 lines combined. The complexity is not in the code — it's in the edge cases. The BOT_PAT behavior, the recursive trigger prevention, the branch detection timing, the WAITING/READY protocol, the /proc/self/environ attack surface, and the media generation fallback chains are all learned from failed runs, not from documentation. Build the simple version first. Run it for a week. Let the failures teach you what to guard against. The factory that can't fail gracefully isn't a factory. It's a bomb with good PR descriptions.