Forty-five minutes. That is what most engineering directors pay every morning before standup, walking the same loop: Jira, then GitHub, then Slack, then Asana, then a spreadsheet someone pinned in a chat room three weeks ago. The output is a mental model of what needs intervention. Half of what gets flagged is fine.
That loop is a manual triage pipeline running on the most expensive infrastructure in the org. It is also the wrong shape. Reading dashboards trains you to pattern-match against yesterday's picture. The signals that matter are the ones that changed overnight, and the human eye is bad at deltas.
The replacement is structural. Five collector subagents fan out across five systems in parallel. An orchestrator pulls the payloads, cross-references them, scores confidence, and emits a 90-second brief tagged RED, AMBER, GREEN. Total wall-clock: under a minute. The pipeline finishes before the laptop opens.
For a director scanning five or more tools, the time recovered lands somewhere between 30 and 60 minutes a day.[3] The actual gain depends on how many surfaces you currently check and how disciplined the team is about keeping them current — claim the time savings only after you measure them. The bigger payoff is harder to count: signals that crossed system boundaries overnight no longer require a human to notice them.
Dashboards Are Built for Operators. Leaders Are Not Operators.
What a leader needs is a decision, not a screen. Every additional surface widens the gap.
Dashboards were designed for SOC analysts and on-call engineers — people whose job is to stare at the screen. Engineering leaders are not those people. They context-switch between hiring, architecture review, stakeholder management, and the next quarter's plan. A dashboard demands continuous attention. A leader needs a processed conclusion.
The 2025 SANS Detection & Response Survey clocks 46% of all alerts as false positives.[1] Engineering operations track close to that number. Stale PRs blocked on a design call. "Blocked" Jira tickets nobody bothered to close. Asana tasks marked overdue that the team intentionally deprioritized. Signal-to-noise is bad on every surface, and it compounds across them.
The average org runs eight observability tools.[2] For a director overseeing 40 to 80 engineers across squads, that is eight tabs, eight notification channels, eight mental models — all kept warm in the background while real work happens. The cost is not the time per tool. The cost is the cognitive overhead of holding the union of all of them in working memory before the first 1:1 of the day.
Manual triage, every morning:
Open Jira, filter by team, scan for blocked tickets
Switch to GitHub, hunt PR age and review queues
Open Asana, audit 5/15 report compliance by hand
Scroll Google Chat for unanswered escalations
Pull up the business metrics sheet someone pinned
Stitch a mental picture across five surfaces
Miss the signal that crossed two systems overnight
45–60 minutes, inconsistent coverage, fatigue compounding
Signal radar:
Five collector subagents query all sources in parallel
Orchestrator normalizes payloads and cross-references signals
Confidence scoring separates noise from a real fight
RED/AMBER/GREEN with source attribution and a suggested move
Brief lands before standup; reading it takes 90 seconds
Weekly calibration loop pulls false positives back into 10–15%
Catches the multi-source correlations a single tab cannot show
Same coverage every morning, regardless of how rough yesterday was
Five Collectors. One Orchestrator. Nothing Shared Between Them.
Collection fans out. Judgment lives in one place. Coupling is the enemy.
Three layers, sharply separated:
Layer 1 — Collectors run as parallel Claude Code subagents. Each one talks to exactly one system and returns a normalized signal payload. They run simultaneously, so total collection time equals the slowest API response (usually 3–8 seconds), not the sum of all five.[5]
Layer 2 — Orchestrator receives the five payloads, cross-references them (a blocked Jira ticket and a stale PR on the same feature is worse than either alone), runs a confidence scoring pass, and assigns RED/AMBER/GREEN.
Layer 3 — Output formats the brief as a concise summary delivered to a chosen channel — email, Slack, or a pinned Google Doc that updates daily.
The coupling rule is non-negotiable: collectors know nothing about each other. Adding a sixth source means writing one new collector file, not refactoring the orchestrator. The system that resists growth is not a radar — it is a project liability.
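To make the "slowest response, not the sum" claim concrete, here is a minimal sketch of the fan-out in Python. It is not how Claude Code schedules subagents. The `collect` function and the latency numbers are hypothetical stand-ins for real API calls, scaled down so the demo runs quickly.

```python
import asyncio
import time

# Hypothetical per-source latencies in seconds (assumed values, scaled down).
SIMULATED_LATENCY = {"jira": 0.05, "github": 0.08, "asana": 0.03, "gchat": 0.04, "sheets": 0.06}


async def collect(source: str) -> dict:
    """Stand-in for one collector: waits like an API call, returns a payload."""
    await asyncio.sleep(SIMULATED_LATENCY[source])
    return {"source": source, "signals": []}


async def fan_out(sources: list[str]) -> tuple[list[dict], float]:
    """Run every collector concurrently and report total wall-clock time."""
    start = time.perf_counter()
    payloads = await asyncio.gather(*(collect(s) for s in sources))
    return list(payloads), time.perf_counter() - start


if __name__ == "__main__":
    payloads, elapsed = asyncio.run(fan_out(list(SIMULATED_LATENCY)))
    total = sum(SIMULATED_LATENCY.values())
    print(f"{len(payloads)} payloads in {elapsed:.2f}s (sequential would take about {total:.2f}s)")
```

Because no collector reads another collector's output, a sixth source is one more key in the dictionary and nothing else changes.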
Signal Radar Project Structure
signal-radar/
├── collectors/
│ ├── jira-collector.md
│ ├── github-collector.md
│ ├── asana-collector.md
│ ├── gchat-collector.md
│ └── sheets-collector.md
├── orchestrator/
│ ├── scoring-rules.md
│ └── orchestrate.md
├── config/
│ ├── thresholds.json
│ ├── sources.json
│ └── output-template.md
├── logs/
│ ├── false-positives.jsonl
│ └── calibration-history.json
├── run.sh
└── CLAUDE.md
Each Collector Owns One System and Knows Nothing About the Others
Isolation is the leverage point. It is what lets the system grow without the orchestrator carrying the cost.
Each collector subagent is a markdown prompt file Claude Code loads as a task. The design constraint is strict: each collector knows nothing about the others. It queries one API, extracts the signals that matter, and returns a standardized JSON payload. That isolation is what makes the system extend cleanly — adding a sixth source is one new collector file, not an orchestrator rewrite.
What each collector pulls:
| Collector | System | Key Signals | API Method |
|---|---|---|---|
| Jira Collector | Jira Cloud | Blocked ticket count, P1/P0 incidents, sprint burndown deviation, tickets stuck >3 days | REST API v3 with JQL |
| GitHub Collector | GitHub | PR age >48h, review bottlenecks (PRs with 0 reviews), failed CI runs on main, deploy frequency delta | GraphQL API + REST checks |
| Asana Collector | Asana | 5/15 report submission rate, overdue milestones, tasks without assignees, project health drift | Asana REST API |
| Chat Collector | Google Chat | Unanswered threads >4h in key spaces, escalation keywords, unresolved questions from direct reports | Google Chat API |
| Metrics Collector | Google Sheets | Business KPIs vs. targets (revenue, churn, NPS), week-over-week deltas, threshold breaches | Sheets API v4 |
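Whatever the source, the orchestrator only works if every collector returns the same payload shape. The sketch below shows one way to enforce that at the boundary, assuming the fields `source`, `timestamp`, and `signals` (each with `type`, `count`, `severity`, `details`). The class and function names are illustrative, not part of any Claude Code API.

```python
import json
from dataclasses import dataclass

VALID_SEVERITIES = {"red", "amber", "green"}


@dataclass(frozen=True)
class Signal:
    type: str
    count: int
    severity: str
    details: tuple[str, ...]


@dataclass(frozen=True)
class Payload:
    source: str
    timestamp: str
    signals: tuple[Signal, ...]


def parse_payload(raw: str) -> Payload:
    """Parse one collector's JSON output; raise ValueError on any bad shape."""
    data = json.loads(raw)  # JSONDecodeError is already a ValueError subclass
    try:
        signals = []
        for item in data["signals"]:
            if item["severity"] not in VALID_SEVERITIES:
                raise ValueError(f"unknown severity: {item['severity']!r}")
            signals.append(
                Signal(item["type"], int(item["count"]), item["severity"],
                       tuple(item.get("details", [])))
            )
        return Payload(data["source"], data["timestamp"], tuple(signals))
    except (KeyError, TypeError) as exc:
        raise ValueError(f"malformed payload: {exc!r}") from exc


sample = json.dumps({
    "source": "jira",
    "timestamp": "2026-03-19T06:30:00Z",
    "signals": [{"type": "blocked_tickets", "count": 2, "severity": "amber",
                 "details": ["PROJ-123: waiting on design"]}],
})
payload = parse_payload(sample)
print(payload.source, payload.signals[0].severity)
```

Rejecting a malformed payload here keeps a misbehaving collector from silently skewing the scoring pass.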
collectors/jira-collector.md
# Jira Signal Collector
One collector. One system. Returns a structured signal payload.
## Instructions
1. Run these JQL queries via the Jira MCP tool:
- `status = Blocked AND sprint in openSprints()` → count blocked tickets
- `priority in (P0, P1) AND status != Done AND created >= -7d` → active incidents
- `status changed TO "In Progress" before -3d AND status = "In Progress"` → stuck tickets
2. Compute sprint burndown deviation:
- Current sprint progress vs. ideal burndown line
- Flag if >15% behind ideal pace (the AMBER floor in the Severity Rules below)
3. Return this exact JSON shape:
```json
{
"source": "jira",
"timestamp": "<ISO 8601>",
"signals": [
{
"type": "blocked_tickets",
"count": <number>,
"severity": "red|amber|green",
"details": ["PROJ-123: <summary>", ...]
}
]
}
```
## Severity Rules
- RED: any P0, or >3 blocked tickets, or burndown >30% behind
- AMBER: any P1, or 1-3 blocked, or burndown 15-30% behind
- GREEN: no incidents, 0 blocked, burndown on track
The Orchestrator Does Not Summarize. It Correlates and Judges.
Where raw signals turn into a decision. The piece most teams get wrong on the first attempt.
The orchestrator is the most important piece of the system. Its job is not to summarize. Its job is to correlate and judge.
A blocked Jira ticket is AMBER on its own. The same feature with a PR open five days and zero reviews is RED. The blocker is not the ticket status — it is a review bottleneck stalling an entire feature. The orchestrator catches this because it sees both signals at once.
The scoring pass runs in three stages:
- [01]
Signal Normalization
Each collector returns signals in a standard schema, but severity thresholds differ per source. The orchestrator compresses every signal into a unified 0–100 severity scale where 0 is noise and 100 is drop-everything.
- [02]
Cross-Reference Correlation
The orchestrator looks for signals from different systems referencing the same feature, team, or person. Correlated signals get a confidence boost — multiple systems agreeing on a problem is stronger evidence than any one source claiming it alone.
- [03]
Confidence Classification
After normalization and correlation, each signal cluster receives a final confidence score. Clusters above 70 are RED, 40–70 are AMBER, below 40 are GREEN. Only RED and AMBER clusters appear as individual items in the brief; everything else rolls up into the GREEN summary.
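The three stages can be sketched in a few lines of Python. The cutoffs (above 70, 40–70, below 40) come from the text. Everything else is an assumption for illustration: the base score per severity, the size of the correlation boost, and the `feature` key used to match signals across systems.

```python
from collections import defaultdict

# Assumed mapping from collector severity onto the 0-100 scale (illustrative values).
BASE_SCORE = {"green": 10, "amber": 55, "red": 85}
# Assumed boost for each additional system reporting on the same feature.
CORRELATION_BOOST = 20


def classify(score: int) -> str:
    """Cutoffs from the text: above 70 RED, 40-70 AMBER, below 40 GREEN."""
    if score > 70:
        return "RED"
    if score >= 40:
        return "AMBER"
    return "GREEN"


def score_clusters(signals: list[dict]) -> dict[str, dict]:
    """Group signals by feature, boost multi-source clusters, classify each."""
    clusters = defaultdict(list)
    for sig in signals:
        clusters[sig["feature"]].append(sig)

    results = {}
    for feature, members in clusters.items():
        sources = sorted({m["source"] for m in members})
        base = max(BASE_SCORE[m["severity"]] for m in members)
        score = min(100, base + CORRELATION_BOOST * (len(sources) - 1))
        results[feature] = {"score": score, "level": classify(score), "sources": sources}
    return results


signals = [
    {"source": "jira", "feature": "checkout", "severity": "amber"},    # blocked ticket
    {"source": "github", "feature": "checkout", "severity": "amber"},  # stale PR, 0 reviews
    {"source": "asana", "feature": "onboarding", "severity": "amber"}, # single-source signal
]
results = score_clusters(signals)
for feature, result in results.items():
    print(feature, result)
```

With these assumed weights, two AMBER signals on the same feature cross the RED line while the single-source AMBER stays AMBER, which is the blocked-ticket-plus-stale-PR case described earlier.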
Calibration Is Where the System Earns Trust. Most Teams Skip It.
Start too sensitive. Track false positives. Tune weekly. Without this loop, the radar becomes another notification nobody reads.
Here is the failure mode that kills most alert systems: thresholds get set on what feels reasonable, the system ships, nothing gets adjusted. Within two weeks the brief either cries wolf so often that leaders ignore it, or it misses a real incident because the thresholds were too relaxed.
The calibration loop is not optional. It is the feature that separates a useful radar from another notification channel.
Start intentionally too sensitive. Set every threshold at the aggressive end. PR open longer than 24 hours? AMBER. Two blocked tickets? RED. Business metric down 5% week-over-week? RED. The system should over-report in the first week. Tightening down is easy. Discovering missed signals after the fact is not.
- [01]
Log every false positive as you encounter it
logs/false-positives.jsonl (append one line per false positive)
{"date": "2026-03-19", "signal_type": "pr_age", "source": "github", "classified_as": "red", "should_have_been": "green", "reason": "PR is a long-running RFC, not stale", "threshold_at_time": 24}
- [02]
Run the weekly calibration review
# Count false positives by signal type for the past 7 days.
# avg_gap is the average distance between the assigned and the correct level.
# GNU date shown; on macOS use: date -u -v-7d +%F
SINCE=$(date -u -d '7 days ago' +%F)
jq -s --arg since "$SINCE" '
  def rank: if . == "red" then 2 elif . == "amber" then 1 else 0 end;
  map(select(.date >= $since))
  | group_by(.signal_type)
  | map({
      type: .[0].signal_type,
      count: length,
      avg_gap: (map((.classified_as | rank) - (.should_have_been | rank)) | add / length)
    })
' logs/false-positives.jsonl
- [03]
Adjust thresholds based on false positive rate
config/thresholds.json (update after each weekly review)
{
  "pr_age_amber_hours": 48,
  "pr_age_red_hours": 96,
  "blocked_tickets_red": 4,
  "business_metric_delta_red": 0.10,
  "unanswered_thread_hours": 6
}
Previous values: pr_age_amber_hours was 24 (bumped after 6 FPs), pr_age_red_hours was 48, blocked_tickets_red was 2 (bumped after 3 FPs), business_metric_delta_red was 0.05, unanswered_thread_hours was 4. JSON does not allow comments, so keep this change log outside the file.
- [04]
Track calibration history for trend analysis
logs/calibration-history.json (one entry per weekly review)
{
  "reviews": [
    {
      "date": "2026-03-19",
      "total_signals": 47,
      "false_positives": 12,
      "fp_rate": 0.255,
      "changes": ["pr_age_amber: 24→48", "blocked_red: 2→4"]
    }
  ]
}
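The history entry is easy to get wrong by hand, since `fp_rate` has to stay consistent with the two counts next to it. A minimal sketch of a helper that derives the rate and appends the entry, assuming the file layout shown in the calibration history example; `record_review` is a hypothetical name.

```python
import json
import tempfile
from pathlib import Path


def record_review(history_path: Path, date: str, total_signals: int,
                  false_positives: int, changes: list[str]) -> dict:
    """Append one weekly review to the calibration history and return the entry."""
    if total_signals <= 0:
        raise ValueError("total_signals must be positive")
    entry = {
        "date": date,
        "total_signals": total_signals,
        "false_positives": false_positives,
        # Derived, never typed in by hand, so it cannot drift from the counts.
        "fp_rate": round(false_positives / total_signals, 3),
        "changes": changes,
    }
    if history_path.exists():
        history = json.loads(history_path.read_text())
    else:
        history = {"reviews": []}
    history["reviews"].append(entry)
    history_path.write_text(json.dumps(history, indent=2))
    return entry


# Demo against a temporary directory so nothing real is overwritten.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "calibration-history.json"
    entry = record_review(path, "2026-03-19", 47, 12,
                          ["pr_age_amber: 24→48", "blocked_red: 2→4"])
    print(entry["fp_rate"])  # 0.255
```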
The 90-Second Brief Format
Structured for fast scanning. RED first. Source attribution. Suggested move. GREEN summary at the bottom proves the radar ran.
output-template.md
# Signal Radar: {{date}}
## RED (Needs Intervention Today)
### 🔴 Deploy pipeline blocked on main
- **Source**: GitHub (failed CI) + Jira (3 blocked tickets on DEPLOY-Epic)
- **Confidence**: 92% (multi-source correlated)
- **Context**: CI failure started at 2:14 AM, 3 PRs queued behind it
- **Suggested action**: Page on-call to investigate flaky test in auth-service
### 🔴 P1 incident: Payment processing latency spike
- **Source**: Jira (INC-847) + Google Sheets (revenue delta -8%)
- **Confidence**: 88% (metric correlation confirms impact)
- **Context**: Opened 6h ago, assigned to payments team, no resolution ETA
- **Suggested action**: Escalate to payments tech lead, request ETA by 10 AM
---
## AMBER (Monitor Closely)
### 🟡 Review bottleneck on mobile team
- **Source**: GitHub (4 PRs >48h, 0 reviews)
- **Confidence**: 65% (single source)
- **Context**: Mobile team has 2 engineers OOO this week
- **Suggested action**: Redistribute reviews to platform team
---
## GREEN Summary
- Asana 5/15s: 12/14 submitted (86%) — on track
- Sprint burndown: -4% from ideal — normal range
- Google Chat: 0 unanswered escalation threads
- Business metrics: All within ±3% of targets
Each RED and AMBER item carries the same four fields: source attribution, confidence percentage, temporal context, and a concrete suggested action. The leader reads the brief in 90 seconds and walks out knowing what needs attention and what the first move is.
The GREEN summary compresses to bullets on purpose. If everything is healthy, the leader does not need details — they need confirmation that the radar actually checked. A brief with no GREEN section is ambiguous: did the system find nothing, or did it fail silently?
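One way to guarantee the GREEN section can never be dropped is to make it unconditional in the renderer. The sketch below assumes each item is a dictionary carrying the four fields plus a `level` and `title`. The function name and the fallback bullet text are illustrative.

```python
def render_brief(date: str, items: list[dict], green: list[str]) -> str:
    """Render RED items first, then AMBER, and always a GREEN summary."""
    lines = [f"# Signal Radar: {date}"]
    sections = (("RED", "Needs Intervention Today"), ("AMBER", "Monitor Closely"))
    for level, title in sections:
        section = [i for i in items if i["level"] == level]
        if not section:
            continue
        lines.append(f"## {level} ({title})")
        # Highest confidence first within each section.
        for item in sorted(section, key=lambda i: -i["confidence"]):
            lines += [
                f"### {item['title']}",
                f"- **Source**: {item['source']}",
                f"- **Confidence**: {item['confidence']}%",
                f"- **Context**: {item['context']}",
                f"- **Suggested action**: {item['action']}",
            ]
    # Unconditional: this section is the proof that the radar actually ran.
    lines.append("## GREEN Summary")
    lines += [f"- {g}" for g in green] or ["- All sources checked, nothing to report"]
    return "\n".join(lines)


quiet_morning = render_brief("2026-03-19", [], ["Google Chat: 0 unanswered escalation threads"])
print(quiet_morning)
```

A quiet morning still produces a non-empty brief, so a missing brief can only mean the pipeline failed.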
From Zero to Running Radar in One Week
Concrete rollout. Day-by-day. v1 before Friday.
run.sh
#!/bin/bash
# Signal Radar — main entry point.
# Fans out collectors in parallel, then orchestrates scoring.
set -euo pipefail
DATE=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
OUTPUT_DIR="./output/$DATE"
mkdir -p "$OUTPUT_DIR"
echo "[radar] Starting parallel collection at $DATE"
# Five collectors. Parallel. No shared state.
# `claude -p` prints the response to stdout, so each payload is redirected to a file.
# `timeout 30` (GNU coreutils) enforces the 30s ceiling per call.
timeout 30 claude -p "$(cat collectors/jira-collector.md)" \
  > "$OUTPUT_DIR/jira.json" &
PID_JIRA=$!
timeout 30 claude -p "$(cat collectors/github-collector.md)" \
  > "$OUTPUT_DIR/github.json" &
PID_GITHUB=$!
timeout 30 claude -p "$(cat collectors/asana-collector.md)" \
  > "$OUTPUT_DIR/asana.json" &
PID_ASANA=$!
timeout 30 claude -p "$(cat collectors/gchat-collector.md)" \
  > "$OUTPUT_DIR/gchat.json" &
PID_CHAT=$!
timeout 30 claude -p "$(cat collectors/sheets-collector.md)" \
  > "$OUTPUT_DIR/sheets.json" &
PID_SHEETS=$!
# Wait for each collector separately so a failure or timeout is reported, not hidden.
for pid in "$PID_JIRA" "$PID_GITHUB" "$PID_ASANA" "$PID_CHAT" "$PID_SHEETS"; do
  wait "$pid" || echo "[radar] collector (pid $pid) failed or timed out" >&2
done
echo "[radar] All collectors finished"
# Orchestrator runs once, with all five payloads piped in on stdin as context.
cat "$OUTPUT_DIR/jira.json" "$OUTPUT_DIR/github.json" "$OUTPUT_DIR/asana.json" \
    "$OUTPUT_DIR/gchat.json" "$OUTPUT_DIR/sheets.json" \
  | claude -p "$(cat orchestrator/orchestrate.md)" \
  > "$OUTPUT_DIR/brief.md"
echo "[radar] Brief generated: $OUTPUT_DIR/brief.md"
Five Failure Modes That Kill Signal Radars
Anti-Patterns. Avoid Each One.
Skipping the calibration loop
Without weekly threshold tuning, false positive rates climb above 25% within a month. Leaders stop reading the brief. You have built an expensive notification nobody checks.
Letting collectors share state
Each collector queries its source fresh on every run. Shared caches create phantom signals — the Jira collector reports a blocked ticket that was resolved an hour ago because it read stale data from a shared store.
Adding every possible signal
Start with three to five high-value signals per source. A radar that surfaces 40 items per morning is not a radar — it is a dashboard wearing a trench coat. Add signals later, once the system is stable.
Hard-coding thresholds in collector prompts
Thresholds live in thresholds.json, not in the collector markdown files. That separation lets you tune sensitivity without editing agent prompts and risking accidental behavior changes.
Shipping without a GREEN summary section
The GREEN section proves the radar ran and checked everything. Without it, a brief with zero RED/AMBER items is ambiguous — silent failure looks identical to a clean morning.
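The hard-coded-thresholds anti-pattern is the easiest of the five to show in code. In the sketch below the severity function takes its cutoffs as an argument, so a weekly tuning pass edits a JSON file and never touches logic. The inline string stands in for `config/thresholds.json`, and `pr_age_severity` is a hypothetical helper.

```python
import json

# Stand-in for the contents of config/thresholds.json (values from the calibration example).
THRESHOLDS_JSON = """
{
  "pr_age_amber_hours": 48,
  "pr_age_red_hours": 96
}
"""


def pr_age_severity(age_hours: float, thresholds: dict) -> str:
    """Classify a PR's age using externally supplied thresholds."""
    if age_hours > thresholds["pr_age_red_hours"]:
        return "red"
    if age_hours > thresholds["pr_age_amber_hours"]:
        return "amber"
    return "green"


thresholds = json.loads(THRESHOLDS_JSON)
print(pr_age_severity(30, thresholds))   # green
print(pr_age_severity(60, thresholds))   # amber
print(pr_age_severity(120, thresholds))  # red
```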
Scaling Past One Leader: Share Collectors, Personalize the Orchestrator
Once the radar works for one director, the question is whether every engineering manager can run their own. Yes — with one architectural rule.
Collectors can be shared. A single Jira collector pulling all blocked tickets is more efficient than one per manager. The orchestrator must be personalized. Each leader cares about different teams, different projects, and runs different threshold tolerances.
The cleanest cut is a profiles/ directory where each leader has a config file specifying their teams, projects, and custom thresholds. The orchestrator loads the relevant profile and filters collector output accordingly.
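A minimal sketch of that split, assuming each signal carries a `team` field and each profile lists the teams its leader owns. The profile fields and names here are hypothetical, not a fixed schema.

```python
# Hypothetical profile, as it might be loaded from profiles/<leader>.json.
PROFILE = {
    "leader": "dir-platform",
    "teams": {"platform", "payments"},
    "thresholds": {"blocked_tickets_red": 4},
}


def filter_for_profile(signals: list[dict], profile: dict) -> list[dict]:
    """Keep only the signals that belong to the leader's teams."""
    return [s for s in signals if s.get("team") in profile["teams"]]


# Output of one shared collector run, visible to every leader's orchestrator.
shared_signals = [
    {"source": "jira", "team": "payments", "type": "blocked_tickets", "count": 3},
    {"source": "jira", "team": "mobile", "type": "blocked_tickets", "count": 5},
    {"source": "github", "team": "platform", "type": "pr_age", "count": 2},
]
mine = filter_for_profile(shared_signals, PROFILE)
print([s["team"] for s in mine])  # ['payments', 'platform']
```

The collectors run once for everyone. Only the filter and the thresholds differ per leader.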
That structure also opens an organizational view: if every leader's radar data is logged, a CPO or CTO can run a meta-analysis across all briefs. Which teams consistently show RED? Which systems generate the most false positives? Which cross-team dependencies surface as correlated signals?
One uncomfortable finding from teams that have rolled this out at scale: some managers actively resist the brief format. Not because the data is wrong. Because scanning dashboards manually was giving them a reason to open conversations with their teams. The morning Jira ritual was also an excuse to ping a teammate, notice something off in passing, stay close to the work. Automating the scan removes that ambient contact. Worth knowing before you push it to fifteen managers at once. The fix is structural, not motivational: replace the lost contact surface with a deliberate one — a 15-minute pulse with each direct, on a schedule, owned by the leader.
Pre-Launch Verification
All five collector subagents return valid JSON payloads
Orchestrator cross-references signals from 2+ sources correctly
Confidence scoring produces expected RED/AMBER/GREEN on test data
Thresholds externalized in thresholds.json — never hard-coded in prompts
False positive logging pipeline writes to false-positives.jsonl
Output brief renders correctly in the target delivery channel
GREEN summary section appears even when no issues detected
Total pipeline execution completes in under 60 seconds
Cron schedule set for 30 minutes before daily standup
First-week calibration review meeting on the calendar
How much does this cost to run daily?
Each run invokes five parallel Claude Code subagent calls plus one orchestrator call. At typical prompt sizes (2–4K tokens input, 1–2K output per collector), that lands around $0.15–0.30 per run. Once daily costs $4.50–9.00 per month — orders of magnitude under the engineering-leader salary time it claws back.
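The monthly figure is the per-run estimate multiplied out over a 30-day month. A small sketch for rerunning the arithmetic with your own per-run cost or schedule (the per-run range is the estimate above, not a measured price):

```python
def monthly_cost(per_run_usd: float, runs_per_day: int = 1, days: int = 30) -> float:
    """Project a per-run cost over a month, rounded to cents."""
    return round(per_run_usd * runs_per_day * days, 2)


low, high = monthly_cost(0.15), monthly_cost(0.30)
print(f"${low:.2f}-${high:.2f} per month")  # $4.50-$9.00 per month
```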
What if one collector API is down?
Build timeout handling into each collector. If a source is unreachable after 15 seconds, the collector returns a payload with zero signals and a source_status: degraded flag. The orchestrator surfaces this in the brief so the leader knows the source was not checked. Silent failure is the worst outcome — design against it.
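A sketch of that fallback, assuming collectors are exposed as async callables. The wrapper name and the simulated slow collector are hypothetical. The `source_status` field and the 15-second default follow the description above.

```python
import asyncio
from datetime import datetime, timezone


async def collect_with_timeout(source: str, collector, timeout_s: float = 15.0) -> dict:
    """Run one collector; fall back to a degraded payload on timeout or connection error."""
    try:
        return await asyncio.wait_for(collector(), timeout=timeout_s)
    except (asyncio.TimeoutError, ConnectionError):
        return {
            "source": source,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "source_status": "degraded",  # the orchestrator surfaces this in the brief
            "signals": [],
        }


async def slow_collector() -> dict:
    """Simulates an unreachable API by sleeping past the timeout."""
    await asyncio.sleep(1)
    return {"source": "jira", "signals": []}


# Short timeout so the demo finishes quickly.
degraded = asyncio.run(collect_with_timeout("jira", slow_collector, timeout_s=0.05))
print(degraded["source_status"])  # degraded
```

The degraded payload has the same shape as a healthy one, so the orchestrator needs no special parsing path to report that a source was not checked.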
Can this run with something other than Claude Code?
The architecture is model-agnostic. The collector/orchestrator pattern works with any LLM that can make API calls and return structured JSON. Claude Code's subagent model makes parallelism particularly clean. You can implement the same shape with LangChain agents, CrewAI, or scripts hitting the Anthropic API directly. One practical note: Claude Code handles parallelism and subagent spawning natively, which saves roughly 50–100 lines of orchestration boilerplate compared to a hand-rolled LangChain version. For a production deployment running twice daily, that scaffolding cost is worth paying once. For a quick proof-of-concept, any approach works.
How do I handle sensitive data in the brief?
Collector subagents run inside your security boundary — they call APIs with your credentials and process data locally. The brief itself ships through an authenticated channel: private Slack DM, encrypted email, or a permission-locked Google Doc. Never deliver the brief to a public channel. Treat it as production output.
What is the right false positive rate to target?
10–15%. Below 10% the thresholds are too relaxed and real signals slip through. Above 20% the noise erodes trust and the brief stops getting read. Track FP rate weekly. Adjust thresholds to stay in band. The calibration loop is the radar.
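The weekly check can be reduced to a single function. The bands below follow the numbers above. The label for the 15–20% range, which the guidance leaves unnamed, is an assumption.

```python
def fp_band(fp_rate: float) -> str:
    """Map a weekly false positive rate onto the tuning bands."""
    if fp_rate < 0.10:
        return "too_relaxed"   # real signals are probably slipping through
    if fp_rate <= 0.15:
        return "in_band"       # target range
    if fp_rate <= 0.20:
        return "above_target"  # assumed label: tighten before trust erodes
    return "too_noisy"         # the brief is at risk of being ignored


for rate in (0.05, 0.12, 0.18, 0.255):
    print(rate, fp_band(rate))
```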
The signal radar is not a complicated system. Five focused collectors, one orchestrator, a delivery channel, and a calibration loop that tightens the system over time. The hard part is not building it. The hard part is the discipline to log false positives and adjust thresholds every week for the first month.
Once calibrated, the radar restructures the morning. No more 45 minutes of dashboard-scanning. A 90-second brief lands before standup, names what needs intervention, and shows the work.[4] The signals that matter find you. Everything else stays GREEN.
Triage in your head is a coordination tax. Pay it once, in code, and stop paying it every morning.
- [1] Stamus Networks. 2025 SANS Detection & Response Survey: False Positives and Alert Fatigue (stamus-networks.com)
- [2] Syncause. The State of Observability in 2025 (medium.com)
- [3] Dashworks. How to Solve Information Sprawl in 30 Minutes (dashworks.ai)
- [4] Incident.io. Alert Fatigue Solutions for DevOps Teams in 2025 (incident.io)
- [5] Claude Code Docs. Agent Teams (code.claude.com)
- [6] Waydev. 2026 Tech Trends: A Guide for Engineering Leaders (waydev.co)