Every engineering director I know starts the morning the same way: Jira, then GitHub, then Slack, then Asana, then maybe a spreadsheet someone pinned in a Google Chat room three weeks ago. By the time they've assembled a mental picture of what actually needs intervention, 40-50 minutes have evaporated — and half of what they flagged turns out to be fine.
This pattern is a cross-system signal radar. It uses parallel Claude Code subagents to pull data from five systems simultaneously, feeds everything into an orchestrator that runs confidence scoring, and produces a two-minute brief color-coded as RED, AMBER, or GREEN. The entire pipeline runs before you open your laptop.
For many VP- or Director-level engineering leaders, the approach can save roughly 30-60 minutes per day[3] — though the actual gain depends heavily on how many tools you currently scan and how disciplined your team is about keeping them current. The real payoff is not just the time — it is the reduction in missed signals. When you manually scan dashboards, you are pattern-matching against yesterday's mental model. An automated radar catches the things that changed shape overnight.
Why Dashboards Fail Engineering Leaders
The real problem is not missing data — it is too much of it, scattered across too many surfaces.
Dashboards were designed for operators who stare at screens all day. Engineering leaders are not operators. They context-switch between strategy, hiring, architecture reviews, and stakeholder management. A dashboard demands continuous attention; a leader needs processed conclusions.
The 2025 SANS Detection & Response Survey found that 46% of all alerts across organizations turn out to be false positives.[1] In engineering operations, the number tracks similarly: stale PRs that are actually waiting on a design decision, "blocked" Jira tickets that someone forgot to update, Asana tasks that look overdue but were intentionally deprioritized.
The average organization runs eight observability tools.[2] Engineers spend more time toggling between dashboards than actually resolving the problems those dashboards surface. For a director overseeing 40-80 engineers across multiple squads, the cognitive load is multiplicative — every additional tool adds another tab to check, another notification channel to monitor, another mental model to maintain.
**The manual morning scan:**

- Open Jira, filter by team, scan for blocked tickets
- Switch to GitHub, check PR age and review queues
- Open Asana, verify 5/15 report compliance
- Scan Google Chat for unanswered threads
- Pull up business metrics spreadsheet
- Mentally synthesize across 5+ tools
- Miss signals that changed overnight
- Result: ~45-60 minutes, inconsistent coverage

**The signal radar:**

- Parallel subagents pull all 5 sources simultaneously
- Orchestrator normalizes and cross-references signals
- Confidence scoring filters noise from real issues
- RED/AMBER/GREEN classification with explanations
- Two-minute brief delivered before standup
- Threshold calibration reduces false positives weekly
- Catches cross-system correlations humans miss
- Result: ~90 seconds to read, full coverage every time
Architecture: Five Collectors, One Orchestrator
Parallel subagents gather signals independently, then a central agent scores and routes them.
The architecture has three layers:
Layer 1 — Collectors run as parallel Claude Code subagents. Each one talks to exactly one system and returns a normalized signal payload. They run simultaneously, so total collection time equals the slowest API response (usually 3-8 seconds), not the sum of all five.[5]
Layer 2 — Orchestrator receives all five payloads, cross-references them (a blocked Jira ticket and a stale PR on the same feature is worse than either alone), runs a confidence scoring pass, and assigns RED/AMBER/GREEN.
Layer 3 — Output formats the brief as a concise summary delivered to your preferred channel — email, Slack, or a pinned Google Doc that updates daily.
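The Layer 1 claim that total collection time tracks the slowest source, not the sum, is easy to sketch. This toy `asyncio` model uses illustrative stand-in latencies, not real API calls:

```python
import asyncio
import time

# Hypothetical per-source latencies in seconds (illustrative only)
LATENCIES = {"jira": 0.3, "github": 0.5, "asana": 0.2, "gchat": 0.1, "sheets": 0.4}

async def collect(source: str, delay: float) -> dict:
    """Stand-in for one collector subagent call."""
    await asyncio.sleep(delay)  # simulates the API round trip
    return {"source": source, "signals": []}

async def collect_all() -> list[dict]:
    # gather() runs every collector concurrently
    tasks = [collect(s, d) for s, d in LATENCIES.items()]
    return await asyncio.gather(*tasks)

start = time.perf_counter()
payloads = asyncio.run(collect_all())
elapsed = time.perf_counter() - start

# Wall time is roughly max(latencies) = 0.5s, not sum(latencies) = 1.5s
print(f"{len(payloads)} payloads in {elapsed:.2f}s")
```

The same property holds for the real subagents: adding a sixth source costs nothing extra as long as its API is not the slowest one.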
Signal Radar Project Structure
```
signal-radar/
├── collectors/
│   ├── jira-collector.md
│   ├── github-collector.md
│   ├── asana-collector.md
│   ├── gchat-collector.md
│   └── sheets-collector.md
├── orchestrator/
│   ├── scoring-rules.md
│   └── orchestrate.md
├── config/
│   ├── thresholds.json
│   ├── sources.json
│   └── output-template.md
├── logs/
│   ├── false-positives.jsonl
│   └── calibration-history.json
├── run.sh
└── CLAUDE.md
```

Building the Five Collector Subagents
Each collector is a focused Claude Code subagent that queries one system and returns structured signals.
Each collector subagent is defined as a markdown prompt file that Claude Code loads as a task. The key design principle: each collector knows nothing about the others. It queries one API, extracts the signals that matter, and returns a standardized JSON payload. This isolation makes the system easy to extend — adding a sixth source means writing one new collector file, not refactoring the orchestrator.
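To keep collectors honest about that contract, the orchestrator can reject malformed payloads before scoring. A minimal validator sketch, assuming the payload fields used in this section (`source`, `timestamp`, `signals`, per-signal `type` and `severity`):

```python
def validate_payload(payload: dict) -> list[str]:
    """Return a list of contract violations (empty list = valid).

    A sketch of a minimal schema check, not part of the pattern itself.
    """
    errors = []
    # Top-level fields every collector must emit
    for field in ("source", "timestamp", "signals"):
        if field not in payload:
            errors.append(f"missing field: {field}")
    # Per-signal fields the orchestrator relies on
    for i, sig in enumerate(payload.get("signals", [])):
        if sig.get("severity") not in ("red", "amber", "green"):
            errors.append(f"signals[{i}]: bad severity {sig.get('severity')!r}")
        if "type" not in sig:
            errors.append(f"signals[{i}]: missing type")
    return errors

ok = {"source": "jira", "timestamp": "2026-03-19T06:00:00Z",
      "signals": [{"type": "blocked_tickets", "count": 2, "severity": "amber"}]}
bad = {"source": "jira", "signals": [{"count": 2, "severity": "purple"}]}
```

Running the validator at the orchestrator boundary means a misbehaving collector degrades loudly instead of silently skewing the scoring pass.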
Here is what each collector extracts:
| Collector | System | Key Signals | API Method |
|---|---|---|---|
| Jira Collector | Jira Cloud | Blocked ticket count, P1/P0 incidents, sprint burndown deviation, tickets stuck >3 days | REST API v3 with JQL |
| GitHub Collector | GitHub | PR age >48h, review bottlenecks (PRs with 0 reviews), failed CI runs on main, deploy frequency delta | GraphQL API + REST checks |
| Asana Collector | Asana | 5/15 report submission rate, overdue milestones, tasks without assignees, project health drift | Asana REST API |
| Chat Collector | Google Chat | Unanswered threads >4h in key spaces, escalation keywords, unresolved questions from direct reports | Google Chat API |
| Metrics Collector | Google Sheets | Business KPIs vs. targets (revenue, churn, NPS), week-over-week deltas, threshold breaches | Sheets API v4 |
`collectors/jira-collector.md`:

# Jira Signal Collector
You are a signal collector subagent. Query the Jira Cloud REST API and return a structured signal payload.
## Instructions
1. Run these JQL queries using the Jira MCP tool:
- `status = Blocked AND sprint in openSprints()` → count blocked tickets
- `priority in (P0, P1) AND status != Done AND created >= -7d` → active incidents
- `status changed TO "In Progress" before -3d AND status = "In Progress"` → stuck tickets
2. Calculate sprint burndown deviation:
- Get current sprint progress vs. ideal burndown line
- Flag if >20% behind ideal pace
3. Return this exact JSON structure:
```json
{
"source": "jira",
"timestamp": "<ISO 8601>",
"signals": [
{
"type": "blocked_tickets",
"count": <number>,
"severity": "red|amber|green",
"details": ["PROJ-123: <summary>", ...]
}
]
}
```
## Severity Rules
- RED: any P0, or >3 blocked tickets, or burndown >30% behind
- AMBER: any P1, or 1-3 blocked, or burndown 15-30% behind
- GREEN: no incidents, 0 blocked, burndown on track

The Orchestrator: Cross-Reference and Score
Where raw signals become actionable decisions through confidence scoring and correlation.
The orchestrator is the most important piece, and the one most people get wrong on the first attempt. Its job is not to summarize — it is to correlate and judge.
A blocked Jira ticket is AMBER by itself. But if the same feature also has a PR that has been open for five days with zero reviews, that combination is RED. The blocker is not the ticket status; it is a review bottleneck that is stalling an entire feature. The orchestrator catches this because it sees both signals simultaneously.
The confidence scoring pass works in three stages:
1. **Signal Normalization** — Each collector returns signals in a standard schema, but severity thresholds differ per source. The orchestrator normalizes all signals onto a unified severity scale (0-100), where 0 is noise and 100 is drop-everything.
2. **Cross-Reference Correlation** — The orchestrator looks for signals from different systems that reference the same feature, team, or person. Correlated signals get a confidence boost, because multiple systems agreeing on a problem is stronger evidence.
3. **Confidence Classification** — After normalization and correlation, each signal cluster gets a final confidence score. Clusters above 70 are RED, 40-70 are AMBER, below 40 are GREEN. Only RED and AMBER items appear in the brief.
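The three stages can be sketched in a few lines. The base scores, the +25 correlation boost, and the `ref` grouping key are illustrative choices, not canonical values:

```python
# Illustrative anchors for normalizing per-source severity onto 0-100
SEVERITY_BASE = {"red": 80, "amber": 50, "green": 10}
CORRELATION_BOOST = 25  # added when 2+ sources flag the same feature/team

def classify(score: float) -> str:
    if score > 70:
        return "RED"
    if score >= 40:
        return "AMBER"
    return "GREEN"

def score_clusters(signals: list[dict]) -> dict[str, dict]:
    """Group signals by the feature/team they reference, then score each cluster."""
    clusters: dict[str, list[dict]] = {}
    for sig in signals:
        clusters.setdefault(sig["ref"], []).append(sig)

    results = {}
    for ref, sigs in clusters.items():
        score = max(SEVERITY_BASE[s["severity"]] for s in sigs)
        if len({s["source"] for s in sigs}) >= 2:  # multi-source agreement
            score = min(100, score + CORRELATION_BOOST)
        results[ref] = {"score": score, "class": classify(score)}
    return results

signals = [
    {"source": "jira",   "ref": "checkout-v2", "severity": "amber"},  # blocked ticket
    {"source": "github", "ref": "checkout-v2", "severity": "amber"},  # stale PR, 0 reviews
    {"source": "asana",  "ref": "mobile-team", "severity": "amber"},  # overdue milestone
]
```

Note how two AMBER signals on `checkout-v2` cross the RED line once the correlation boost applies, while the lone AMBER on `mobile-team` stays AMBER — exactly the blocked-ticket-plus-stale-PR escalation described above.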
Threshold Calibration: The Part Everyone Skips
Start too sensitive, track false positives, tune weekly. This is where the system earns trust.
Here is the mistake that kills most alert systems: people set thresholds based on what feels reasonable, ship it, and never adjust. Within two weeks, the brief either cries wolf so often that leaders ignore it, or it misses a real incident because the thresholds were too relaxed.
The calibration loop is not optional. It is the core feature that separates a useful radar from another notification channel.
Start intentionally too sensitive. Set every threshold at the aggressive end. PR open longer than 24 hours? AMBER. Two blocked tickets? RED. Business metric down 5% week-over-week? RED. You want the system to over-report in the first week.
1. **Log every false positive as you encounter it**

```json
// false-positives.jsonl — append one line per false positive
{
  "date": "2026-03-19",
  "signal_type": "pr_age",
  "source": "github",
  "classified_as": "red",
  "should_have_been": "green",
  "reason": "PR is a long-running RFC, not stale",
  "threshold_at_time": 24
}
```

2. **Run the weekly calibration review**

```bash
# Count false positives by signal type (rotate the log weekly to scope it to 7 days)
cat logs/false-positives.jsonl | jq -s '
  group_by(.signal_type)
  | map({
      type: .[0].signal_type,
      count: length,
      avg_gap: (map(if .should_have_been == "green" then 2
                    elif .should_have_been == "amber" then 1
                    else 0 end) | add / length)
    })'
```

3. **Adjust thresholds based on false positive rate**

```json
// thresholds.json — adjust after each weekly review
{
  "pr_age_amber_hours": 48,          // was 24, bumped after 6 FPs
  "pr_age_red_hours": 96,            // was 48
  "blocked_tickets_red": 4,          // was 2, bumped after 3 FPs
  "business_metric_delta_red": 0.10, // was 0.05
  "unanswered_thread_hours": 6       // was 4
}
```

4. **Track calibration history for trend analysis**

```json
// calibration-history.json — one entry per weekly review
{
  "reviews": [
    {
      "date": "2026-03-19",
      "total_signals": 47,
      "false_positives": 12,
      "fp_rate": 0.255,
      "changes": ["pr_age_amber: 24→48", "blocked_red: 2→4"]
    }
  ]
}
```
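The weekly review itself is automatable. Here is a Python sketch of the same aggregation the jq one-liner performs, plus a naive tuning rule (any signal type with 3+ false positives gets flagged); the field names follow the false-positive log format, and the 3-FP cutoff is an assumption you should tune:

```python
import json
from collections import Counter

def weekly_review(jsonl_text: str, total_signals: int) -> dict:
    """Summarize false positives and flag signal types that need a threshold bump.

    A sketch: in the real pipeline this would read logs/false-positives.jsonl.
    """
    entries = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    by_type = Counter(e["signal_type"] for e in entries)
    fp_rate = len(entries) / total_signals if total_signals else 0.0
    # Flag any type with 3+ FPs; if none, flag everything when the overall
    # rate drifts above the ~20% band where trust starts to erode
    needs_tuning = sorted(t for t, n in by_type.items() if n >= 3)
    if not needs_tuning and fp_rate > 0.20:
        needs_tuning = ["ALL"]
    return {"fp_rate": round(fp_rate, 3),
            "by_type": dict(by_type),
            "needs_tuning": needs_tuning}

# Hypothetical week of log lines: three pr_age FPs, one blocked_tickets FP
log = "\n".join(json.dumps({"signal_type": t}) for t in
                ["pr_age", "pr_age", "pr_age", "blocked_tickets"])
review = weekly_review(log, total_signals=47)
```

Feeding the output of each review into calibration-history.json gives you the trend line for free.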
The Two-Minute Brief Format
Structured for fast scanning — RED items first, context included, action suggested.
`output-template.md`:

# Signal Radar — {{date}}
## RED (Needs Intervention Today)
### 🔴 Deploy pipeline blocked on main
- **Source**: GitHub (failed CI) + Jira (3 blocked tickets on DEPLOY-Epic)
- **Confidence**: 92% (multi-source correlated)
- **Context**: CI failure started at 2:14 AM, 3 PRs queued behind it
- **Suggested action**: Page on-call to investigate flaky test in auth-service
### 🔴 P1 incident: Payment processing latency spike
- **Source**: Jira (INC-847) + Google Sheets (revenue delta -8%)
- **Confidence**: 88% (metric correlation confirms impact)
- **Context**: Opened 6h ago, assigned to payments team, no resolution ETA
- **Suggested action**: Escalate to payments tech lead, request ETA by 10 AM
---
## AMBER (Monitor Closely)
### 🟡 Review bottleneck on mobile team
- **Source**: GitHub (4 PRs >48h, 0 reviews)
- **Confidence**: 65% (single source)
- **Context**: Mobile team has 2 engineers OOO this week
- **Suggested action**: Redistribute reviews to platform team
---
## GREEN Summary
- Asana 5/15s: 12/14 submitted (86%) — on track
- Sprint burndown: -4% from ideal — normal range
- Google Chat: 0 unanswered escalation threads
- Business metrics: All within ±3% of targets

Notice the structure of each RED and AMBER item: source attribution, confidence percentage, temporal context, and a concrete suggested action. The leader reads this in 90 seconds and knows exactly what needs their attention and what the first move should be.
The GREEN summary is deliberately compressed into bullet points. If everything is healthy, you do not need details — you need confirmation that the radar actually checked.
Implementation: From Zero to Running Radar
One script ties the whole pipeline together: a working v1 you can stand up in a week.
`run.sh`:

```bash
#!/bin/bash
# Signal Radar — main entry point
# Runs all collectors in parallel, then orchestrates scoring
set -euo pipefail

DATE=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
OUTPUT_DIR="./output/$DATE"
mkdir -p "$OUTPUT_DIR"

echo "[radar] Starting parallel collection at $DATE"

# Launch all five collectors as parallel subagents.
# `claude -p` runs one non-interactive prompt and prints the result,
# which is redirected into each collector's output file. GNU `timeout`
# kills any collector that hangs past 30 seconds.
for source in jira github asana gchat sheets; do
  timeout 30 claude -p "$(cat "collectors/${source}-collector.md")" \
    > "$OUTPUT_DIR/${source}.json" &
done

# Wait for all collectors to finish (or be killed by timeout)
wait
echo "[radar] All collectors finished"

# Run the orchestrator with all collector payloads appended to its prompt
claude -p "$(cat orchestrator/orchestrate.md "$OUTPUT_DIR"/*.json)" \
  > "$OUTPUT_DIR/brief.md"

echo "[radar] Brief generated: $OUTPUT_DIR/brief.md"
```

Five Pitfalls That Derail Signal Radars
Avoid These Mistakes
- **Never skip the calibration loop.** Without weekly threshold tuning, false positive rates climb above 25% within a month. Leaders stop reading the brief, and you have built an expensive notification nobody checks.
- **Do not let collectors share state.** Each collector must query its source fresh on every run. Shared caches create phantom signals: the Jira collector reports a blocked ticket that was resolved an hour ago because it read stale data from a shared store.
- **Resist the urge to add every possible signal.** Start with 3-5 high-value signals per source. A radar that surfaces 40 items per morning is not a radar; it is a dashboard wearing a trench coat. You can always add signals later as the system stabilizes.
- **Do not hard-code thresholds in collector prompts.** Thresholds live in `thresholds.json`, not in the collector markdown files. This lets you adjust sensitivity without editing agent prompts and risking accidental behavior changes.
- **Never deploy without a GREEN summary section.** The GREEN section proves the radar ran and checked everything. Without it, a brief with zero RED/AMBER items is ambiguous: did the system find nothing, or did it fail silently?
Scaling Beyond One Leader
Once the radar works for one director, the natural question is: can every engineering manager get their own? Yes, but with a critical architectural change.
The collectors can be shared — a single Jira collector pulling all blocked tickets is more efficient than one per manager. The orchestrator, however, must be personalized. Each leader cares about different teams, different projects, and has different threshold tolerances.
The cleanest approach is a profiles/ directory where each leader has a config file specifying their teams, projects, and custom thresholds. The orchestrator reads the relevant profile and filters the collector outputs accordingly.
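A sketch of that filtering step; the profile fields and the file name (`profiles/avery.json`) are hypothetical examples of what such a config might contain:

```python
import json

# Hypothetical contents of profiles/avery.json — the teams this leader
# owns, plus any personal threshold overrides
PROFILE = json.loads("""{
  "leader": "avery",
  "teams": ["payments", "mobile"],
  "threshold_overrides": {"pr_age_amber_hours": 72}
}""")

def filter_for_profile(signals: list[dict], profile: dict) -> list[dict]:
    """Keep only signals belonging to teams in the leader's profile."""
    teams = set(profile["teams"])
    return [s for s in signals if s.get("team") in teams]

# Shared collectors return signals for every team; each orchestrator
# run narrows them to one leader's scope
signals = [
    {"team": "payments", "type": "blocked_tickets"},
    {"team": "platform", "type": "pr_age"},
    {"team": "mobile",   "type": "overdue_milestone"},
]
```

The orchestrator then applies `threshold_overrides` on top of the shared `thresholds.json` before scoring, so calibration stays global while sensitivity stays personal.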
This also unlocks a powerful organizational view: if every leader's radar data is logged, a CPO or CTO can run a meta-analysis across all briefs. Which teams consistently show RED? Which systems generate the most false positives? Which cross-team dependencies surface as correlated signals?
Pre-Launch Verification
- [ ] All five collector subagents return valid JSON payloads
- [ ] Orchestrator correctly cross-references signals from 2+ sources
- [ ] Confidence scoring produces expected RED/AMBER/GREEN for test data
- [ ] Thresholds are externalized in `thresholds.json` (not hard-coded)
- [ ] False positive logging pipeline writes to `false-positives.jsonl`
- [ ] Output brief renders correctly in target delivery channel
- [ ] GREEN summary section appears even when no issues detected
- [ ] Total pipeline execution completes in under 60 seconds
- [ ] Cron schedule set for 30 minutes before daily standup
- [ ] First-week calibration review meeting scheduled
**How much does this cost to run daily?**

Each run invokes five parallel Claude Code subagent calls plus one orchestrator call. At typical prompt sizes (2-4K tokens input, 1-2K output per collector), you are looking at roughly $0.15-0.30 per run. At once daily, that is $4.50-9.00/month, far less than the cost of the leadership time it saves.
**What if one collector API is down?**

Build timeout handling into each collector. If a source is unreachable after 15 seconds, the collector returns a payload with zero signals and a `source_status: degraded` flag. The orchestrator notes this in the brief so the leader knows that source was not checked.
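That fallback can be sketched as a thin wrapper around each collector call; the timeout mechanics would live in whatever runner invokes the collector, so the outage here is simulated:

```python
from datetime import datetime, timezone

def safe_collect(source: str, collect_fn) -> dict:
    """Wrap a collector call so an outage degrades gracefully instead of failing."""
    try:
        payload = collect_fn()  # the real API call (which may raise or time out)
        payload.setdefault("source_status", "ok")
        return payload
    except Exception as exc:
        # Zero signals plus an explicit degraded flag the orchestrator can surface
        return {
            "source": source,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "signals": [],
            "source_status": "degraded",
            "error": str(exc),
        }

def broken_jira():
    raise TimeoutError("Jira unreachable after 15s")
```

Because the degraded payload still matches the collector schema, the orchestrator needs no special case beyond printing the warning.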
**Can I use this with tools other than Claude Code?**
The architecture is model-agnostic. The collector/orchestrator pattern works with any LLM that can make API calls and return structured JSON. Claude Code's subagent model makes parallelism particularly clean, but you could implement the same pattern with LangChain agents, CrewAI, or custom scripts.
**How do I handle sensitive data in the brief?**
The collector subagents run within your security boundary — they call APIs with your credentials and process data locally. The brief itself should be delivered through an authenticated channel (private Slack DM, encrypted email, or a locked Google Doc). Never send the brief to a public channel.
**What is the right false positive rate to target?**
10-15%. Below 10% usually means your thresholds are too relaxed and you are missing real signals. Above 20% creates noise that erodes trust in the system. Track your FP rate weekly and adjust thresholds to stay in this band.
The signal radar pattern is not complicated. Five focused collectors, one smart orchestrator, a delivery mechanism, and — critically — a calibration loop that tightens the system over time. The hard part is not building it. The hard part is the discipline to log false positives and adjust thresholds every week for the first month.
Once calibrated, the radar fundamentally changes how you start your day. Instead of 45 minutes of anxious dashboard-scanning, you get a 90-second brief that tells you exactly what needs intervention and why.[4] The signals that matter find you. Everything else stays GREEN.
- [1] Stamus Networks — 2025 SANS Detection & Response Survey: False Positives and Alert Fatigue (stamus-networks.com)
- [2] Syncause — The State of Observability in 2025 (medium.com)
- [3] Dashworks — How to Solve Information Sprawl in 30 Minutes (dashworks.ai)
- [4] Incident.io — Alert Fatigue Solutions for DevOps Teams in 2025 (incident.io)
- [5] Claude Code Docs — Agent Teams (code.claude.com)
- [6] Waydev — 2026 Tech Trends: A Guide for Engineering Leaders (waydev.co)