Your strongest engineer's commit messages collapsed from prose to fragments three weeks ago. PR review turnaround drifted from four hours to two days. Optional meetings stopped getting accepted. Jira updates slid from early Monday to late Friday, then dropped off entirely on tasks already in flight.
Any one of those is noise. Short commits happen. Slow review weeks happen. Four independent systems drifting the same direction, on the same person, inside the same three-week window — that is not noise. That is a pattern that predicts voluntary departure at an accuracy rate nobody wants to be right about.
This is not a monitoring problem. It is a correlation problem. The line between an early-warning system managers trust and surveillance theater employees route around runs through three decisions: what you measure, what you refuse to measure, and what the system is permitted to say to whom.
Single-Metric Dashboards Are an Alibi, Not a Signal
The fight: individual metrics generate so many false positives that managers stop reading the dashboard inside two weeks.
Most people-analytics platforms repeat the same architectural mistake. Track one variable per person. Set a threshold. Fire an alert when it crosses. Commit frequency drops below X — flag. Meeting attendance falls below Y — flag. The alert lands in someone's inbox. The someone learns, inside a fortnight, that the alerts are wrong four times in ten.
The dashboard goes unread by week three. A 2025 study from Frontiers in Big Data[2] put the false-positive rate of single-variable attrition models above 40% in engineering populations — varying with org size, role mix, and baseline cleanliness. The mechanism is mundane. People have off weeks. They take leave. They sit inside a design doc for ten days instead of shipping code. None of those are pre-resignation patterns. All of them trip a single-metric threshold.
The fix is not a better threshold. It is a different question. Stop asking "is this one metric bad?" Start asking "are multiple independent metrics drifting the same direction for the same person at the same time?" That correlation is the load-bearing variable. Single-metric systems treat noise as data. Composite systems use noise as a filter.
Single-Metric Systems
Threshold alert on commit count below a fixed number
Per-person meeting attendance flagged in isolation
Jira velocity tracked as a stand-alone metric
False-positive rate above 40%: managers stop trusting the feed
Alert fatigue lands in two weeks. Dashboard dies in three.
Composite Systems
Behavioral shift correlated across four independent source systems
Three or more signals required to converge inside the same window
Z-score weighted against the person's own 90-day baseline, not the team median
False-positive rate drops below 12% under composite scoring
Alerts that survive the trust test: managers act on them
Four Signals. Independently Sourced. That Is the Whole Trick.
Each is noise alone. The leverage is the independence — no single tool can fabricate the pattern.
The reason this combination holds is not the choice of metrics. It is the source topology. Commit behavior lives in version control. Review latency lives in the code review platform. Meeting patterns live in the calendar. Jira updates live in the project tracker. Four systems. Four owners. No single tool sees the whole picture — which is precisely the property that makes the composite signal hard to fake and harder to coincidence into existence.
A rough sprint produces one or two signals for a week. Real disengagement — burnout, frustration, an active job search — produces three or four signals drifting the same direction across two to four weeks. The predictive power is not in any individual reading. It is in the temporal correlation across independent sources. That is the load-bearing claim of the entire architecture.
Entity Resolution: One Person, Four Identities
The hardest technical problem is not the model. It is figuring out who is who across systems.
Before correlation comes a deceptively expensive problem: entity resolution. The same person is jsmith on GitHub, jane.smith@company.com in Google Calendar, Jane S. in Slack, and Jane Smith (Engineering) in Jira. If those four identities never reconcile to one internal ID, no signal correlates. The whole architecture collapses on the first join.
Most organizations do not run a clean universal identity graph. SSO closes part of the gap. It does not close all of it. Contractor accounts, legacy systems, and personal emails attached to open-source work each leak identity outside the graph.
- [01]
Anchor on the HRIS — it is the only authoritative employee list
Pull the canonical employee list from the HR information system. Each record receives a stable internal UUID. That UUID becomes the anchor every other system matches against. No anchor, no resolution.
- [02]
Run deterministic matching first — it pays for itself
Match on corporate email wherever the target system stores one. Deterministic matching closes 70–80% of identity links with zero ambiguity. Spend probabilistic compute only on what is left.
- [03]
Reach for probabilistic matching only on the residual
For the 20–30% that does not match deterministically, run a fuzzy match layer over name similarity, team membership, and activity timing. Probabilistic results never write to production identity without a human confirming the link.
- [04]
Treat the identity graph as a living system, not a setup task
People change usernames. They switch teams. They create new accounts under new emails. Entity resolution is a continuous reconciliation problem, not a one-time configuration. Drift is the default state of any graph without an owner.
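Step 03's name-similarity comparison is commonly done with Jaro-Winkler, the measure the resolver code refers to as `jaroWinkler` without defining it. The following is a minimal sketch of the standard algorithm, assuming the usual defaults (prefix scale 0.1, common prefix capped at four characters). The helper name `jaro` is illustrative and not part of the original codebase.

```typescript
// Jaro similarity: share of matching characters, discounted by transpositions.
function jaro(a: string, b: string): number {
  if (a === b) return 1;
  if (a.length === 0 || b.length === 0) return 0;

  // Characters count as matching only if they sit within this distance.
  const matchWindow = Math.max(0, Math.floor(Math.max(a.length, b.length) / 2) - 1);
  const aMatched = new Array<boolean>(a.length).fill(false);
  const bMatched = new Array<boolean>(b.length).fill(false);
  let matches = 0;

  for (let i = 0; i < a.length; i++) {
    const lo = Math.max(0, i - matchWindow);
    const hi = Math.min(b.length - 1, i + matchWindow);
    for (let j = lo; j <= hi; j++) {
      if (!bMatched[j] && a[i] === b[j]) {
        aMatched[i] = true;
        bMatched[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches === 0) return 0;

  // Walk both strings' matched characters in order; mismatches are half-transpositions.
  let k = 0;
  let outOfOrder = 0;
  for (let i = 0; i < a.length; i++) {
    if (!aMatched[i]) continue;
    while (!bMatched[k]) k++;
    if (a[i] !== b[k]) outOfOrder++;
    k++;
  }
  const transpositions = outOfOrder / 2;

  return (
    matches / a.length +
    matches / b.length +
    (matches - transpositions) / matches
  ) / 3;
}

// Jaro-Winkler: boosts the Jaro score for strings that share a common prefix.
function jaroWinkler(a: string, b: string, prefixScale = 0.1): number {
  const base = jaro(a, b);
  const maxPrefix = Math.min(4, a.length, b.length);
  let prefix = 0;
  while (prefix < maxPrefix && a[prefix] === b[prefix]) prefix++;
  return base + prefix * prefixScale * (1 - base);
}
```

Because the prefix boost rewards shared leading characters, a display name such as "jane s." scores well against "jane smith", which is the abbreviated-name case the residual tends to contain. Callers should lowercase both inputs first, as the resolver does.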
```typescript
// signal-aggregator/entity-resolver.ts

type SystemType = 'github' | 'jira' | 'calendar' | 'slack';

interface IdentityRecord {
  internalId: string;
  canonicalName: string;
  emails: string[];
  systemAccounts: Map<SystemType, string>;
  matchConfidence: Map<SystemType, number>;
}

// Input shapes inferred from how the fields are used below; the real
// definitions and the two helpers are assumed to live elsewhere.
interface HRISEmployee {
  uuid: string;
  legalName: string;
  preferredName?: string;
  emails: string[];
  teamId: string;
}

interface SystemProfile {
  system: SystemType;
  accountId: string;
  displayName: string;
  emails: string[];
  teamId?: string;
}

declare function jaroWinkler(a: string, b: string): number;
declare function flagForReview(
  internalId: string,
  profile: SystemProfile,
  confidence: number
): void;

function resolveIdentity(
  hrisRecord: HRISEmployee,
  systemProfiles: SystemProfile[]
): IdentityRecord {
  const record: IdentityRecord = {
    internalId: hrisRecord.uuid,
    canonicalName: hrisRecord.preferredName ?? hrisRecord.legalName,
    // Normalize once so the email comparison is case-insensitive on both sides.
    emails: hrisRecord.emails.map(e => e.toLowerCase()),
    systemAccounts: new Map(),
    matchConfidence: new Map(),
  };

  for (const profile of systemProfiles) {
    // Deterministic path. Cheap, unambiguous, runs first.
    const emailMatch = profile.emails.some(e =>
      record.emails.includes(e.toLowerCase())
    );
    if (emailMatch) {
      record.systemAccounts.set(profile.system, profile.accountId);
      record.matchConfidence.set(profile.system, 1.0);
      continue;
    }

    // Probabilistic path. Only the residual reaches this branch.
    const nameSimilarity = jaroWinkler(
      record.canonicalName.toLowerCase(),
      profile.displayName.toLowerCase()
    );
    const teamOverlap = profile.teamId === hrisRecord.teamId ? 0.15 : 0;
    // Cap below 1.0 so the team bonus can never make a fuzzy match look deterministic.
    const confidence = Math.min(nameSimilarity + teamOverlap, 0.99);

    if (confidence > 0.85) {
      // Provisional link: every probabilistic match is queued for human sign-off.
      record.systemAccounts.set(profile.system, profile.accountId);
      record.matchConfidence.set(profile.system, confidence);
      flagForReview(record.internalId, profile, confidence);
    }
  }

  return record;
}
```
The Composite Scoring Model That Refuses to Be Surveillance
Combine signals into a single health score without producing a behavioral dossier on every employee.
The scoring model has two non-negotiable properties: detect real patterns early enough to be useful, and produce few enough false positives that managers actually trust the alerts. Miss either one and the system is dead on arrival.
The approach that holds up in production is a weighted z-score model that scores each person against their own historical baseline, never against team averages. The distinction is load-bearing. Comparing against team averages penalizes introverts, senior engineers who spend more time inside design docs than commits, and anyone whose working style sits off the median. Comparing against personal baselines detects change — and change is the only thing that matters here.
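To make the personal-baseline idea concrete, here is a minimal sketch of how a 90-day rolling baseline and a z-score against it might be computed. The type and function names (`DatedValue`, `Baseline`, `personalBaseline`, `zScoreAgainstSelf`) are hypothetical, and the use of the sample standard deviation is an assumption; the article does not say which estimator it uses.

```typescript
interface DatedValue {
  date: Date;
  value: number;
}

interface Baseline {
  mean: number;
  stdDev: number;
  sampleCount: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Rolling baseline over the `windowDays` ending at `asOf`, built only from
// one person's own history. No team data enters the calculation.
function personalBaseline(
  history: DatedValue[],
  asOf: Date,
  windowDays = 90
): Baseline {
  const end = asOf.getTime();
  const start = end - windowDays * DAY_MS;
  const values = history
    .filter(h => h.date.getTime() >= start && h.date.getTime() <= end)
    .map(h => h.value);

  const n = values.length;
  if (n === 0) return { mean: 0, stdDev: 0, sampleCount: 0 };

  const mean = values.reduce((sum, v) => sum + v, 0) / n;
  // Sample standard deviation (n - 1). A single point has no spread, so use 0.
  const variance =
    n > 1 ? values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / (n - 1) : 0;

  return { mean, stdDev: Math.sqrt(variance), sampleCount: n };
}

// How far the current value sits from this person's own normal.
function zScoreAgainstSelf(current: number, baseline: Baseline): number {
  if (baseline.stdDev === 0) return 0;
  return (current - baseline.mean) / baseline.stdDev;
}
```

An engineer whose commit messages have always been terse scores near zero here, because terse is their normal. Against a team median, the same engineer would look permanently flagged.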
```typescript
// signal-aggregator/composite-scorer.ts

interface SignalReading {
  personId: string;
  signal: SignalType;
  currentValue: number;
  baselineMean: number;   // 90-day rolling mean
  baselineStdDev: number; // 90-day rolling std dev
  timestamp: Date;
}

type SignalType =
  | 'commit_message_length'
  | 'pr_review_latency'
  | 'meeting_accept_rate'
  | 'jira_update_timeliness';

const SIGNAL_WEIGHTS: Record<SignalType, number> = {
  commit_message_length: 0.20,
  pr_review_latency: 0.30,
  meeting_accept_rate: 0.25,
  jira_update_timeliness: 0.25,
};

const COMPOSITE_THRESHOLD = 1.8;  // Composite z-score trigger
const MIN_SIGNALS_REQUIRED = 3;   // Three independent signals or no alert
const LOOKBACK_WINDOW_DAYS = 14;  // Two-week persistence window

// Metrics where downward movement is the concern, so the z-score is inverted.
// The other two are left as-is, which assumes jira_update_timeliness is
// recorded as update delay (higher means later).
const LOWER_IS_WORSE: SignalType[] = ['commit_message_length', 'meeting_accept_rate'];

const MS_PER_DAY = 24 * 60 * 60 * 1000;

function daysSince(timestamp: Date): number {
  return (Date.now() - timestamp.getTime()) / MS_PER_DAY;
}

function computeCompositeScore(
  readings: SignalReading[]
): { score: number; confidence: 'high' | 'low'; activeSignals: number } {
  // Keep only the latest in-window reading per signal, so repeated readings
  // of one signal cannot inflate the score or the independent-signal count.
  const latest = new Map<SignalType, SignalReading>();
  for (const r of readings) {
    if (daysSince(r.timestamp) > LOOKBACK_WINDOW_DAYS) continue;
    const seen = latest.get(r.signal);
    if (!seen || r.timestamp.getTime() > seen.timestamp.getTime()) {
      latest.set(r.signal, r);
    }
  }

  let score = 0;
  let activeSignals = 0;
  latest.forEach(r => {
    const raw = r.baselineStdDev === 0
      ? 0
      : (r.currentValue - r.baselineMean) / r.baselineStdDev;
    const z = LOWER_IS_WORSE.includes(r.signal) ? -raw : raw;
    score += z * SIGNAL_WEIGHTS[r.signal];
    // Count only drift in the concerning direction; improvement is not a signal.
    if (z > 1.0) activeSignals++;
  });

  return {
    score,
    confidence: activeSignals >= MIN_SIGNALS_REQUIRED ? 'high' : 'low',
    activeSignals,
  };
}
```
| Signal | Weight | Z-Score Trigger | Baseline Window | Why This Weight |
|---|---|---|---|---|
| Commit message length | 0.20 | | 90 days | Noisy alone: many legitimate reasons produce short messages |
| PR review latency | 0.30 | | 90 days | Strong signal: review habits are stable and deeply ingrained |
| Meeting accept rate | 0.25 | | 90 days | Mid-weight signal: withdrawal pattern is distinctive and durable |
| Jira update timeliness | 0.25 | | 90 days | Moderate signal: process-dependent, but the timing shift carries information |
Rough Sprint or About to Quit? The False-Positive Problem
Separating temporary stress from sustained disengagement is the hardest part of the whole system.
Every engineering team has rough sprints. Deadlines compress. A production incident eats a week. A key dependency ships late and everyone scrambles. The behavioral shift produced looks identical to disengagement — for about one to two weeks.
The composite model's primary defense against false positives is temporal persistence. A rough sprint generates a signal spike that resolves within one sprint cycle, typically two weeks. Real disengagement generates a signal that persists or worsens across two or more cycles. The model does not alert on the first deviation. It alerts on the sustained trend.
Pattern Reads as Rough Sprint (Temporary)
All four signals spike simultaneously and recover inside 10–14 days
Multiple team members show the same pattern at the same time
Signals correlate with a known external event — incident, deadline, reorg
Slack tone stays neutral or positive across the same window
Commit frequency holds even when message length drops
Pattern Reads as Disengagement (Persistent)
Signals emerge gradually over 3–4 weeks rather than spiking overnight
Pattern is unique to one person, uncorrelated with team-wide events
Meeting decline starts on optional invites, then bleeds into required ones
PR review quality degrades alongside latency — slower and less thorough
Jira updates shift from proactive to reactive, then stop on in-flight tasks
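The temporal-persistence defense described above can be sketched as a small filter over per-cycle composite scores. This is an illustration under assumptions, not the article's implementation: the `CycleScore` shape and function names are hypothetical, while the threshold (1.8), the three-signal minimum, and the two-cycle requirement are taken from the text.

```typescript
interface CycleScore {
  cycleStart: Date;      // first day of the sprint cycle
  score: number;         // composite z-score for that cycle
  activeSignals: number; // signals past their own trigger in that cycle
}

interface PersistenceOptions {
  threshold: number;
  minSignals: number;
  minConsecutiveCycles: number;
}

const DEFAULT_PERSISTENCE: PersistenceOptions = {
  threshold: 1.8,          // composite z-score trigger
  minSignals: 3,           // three independent signals or no alert
  minConsecutiveCycles: 2, // "two or more cycles"
};

// Number of most-recent consecutive cycles that were elevated.
function trailingElevatedCycles(
  cycles: CycleScore[],
  opts: PersistenceOptions = DEFAULT_PERSISTENCE
): number {
  const newestFirst = [...cycles].sort(
    (a, b) => b.cycleStart.getTime() - a.cycleStart.getTime()
  );
  let run = 0;
  for (const c of newestFirst) {
    const elevated = c.score >= opts.threshold && c.activeSignals >= opts.minSignals;
    if (!elevated) break; // a recovered cycle ends the streak
    run++;
  }
  return run;
}

// Alert on the sustained trend, never on the first deviation.
function shouldAlert(
  cycles: CycleScore[],
  opts: PersistenceOptions = DEFAULT_PERSISTENCE
): boolean {
  return trailingElevatedCycles(cycles, opts) >= opts.minConsecutiveCycles;
}
```

A rough sprint shows up as a single elevated cycle followed by recovery, so the streak never reaches two and no alert fires. The team-wide and external-event checks in the lists above would sit alongside this filter; they are not modeled here.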
Where Insight Stops and Surveillance Starts
The technical capability is trivial. The question is which design choices keep the system on the right side of the line.
Correlating behavioral data across four workplace tools is technically trivial. Building a version employees accept requires a fundamentally different design philosophy than most people-analytics platforms ship with.
The core invariant: the system monitors team health patterns, never individual behavior in detail. That is not a marketing distinction. It shapes every technical decision downstream — what data enters the pipeline, how long it persists, who can access what level of resolution, and what action the system is permitted to recommend.
Non-Negotiable Design Constraints
Aggregate before you store
Raw behavioral data — individual commit messages, specific meeting titles, Slack message content — never enters the scoring pipeline. Only normalized, aggregated metrics survive. You store z-scores, never screenshots.
Personal baselines stay personal
Individual baselines never reach managers or dashboards. Managers see team-level composite scores and anonymized trend lines. When a 1:1 is warranted, the system nudges toward a human conversation — it does not hand over a behavioral dossier.
Employees see their own data first
Before any signal routes to a manager, the employee themselves has access to their own health view. Self-awareness resolves a non-trivial share of patterns before managerial intervention is needed. Transparency is also the only durable trust mechanism the system has.
No content analysis, ever
The system tracks timing and volume. It never reads content. It sees that PR review latency rose, not what was said in the review. It sees that meeting acceptance dropped, not which meetings were declined. Content analysis crosses the line from pattern detection into surveillance — and the line does not move back.
Right to explanation and opt-out
Any person flagged by the system has the right to see exactly which signals contributed to their score and the methodology behind it. In jurisdictions with stronger privacy and labor protections, such as the EU and Canada, a right to object or opt out may be legally required. Build it regardless of jurisdiction.
Retention limits enforce forgetting
Raw signal data expires after 90 days. Composite scores expire after 180 days. The system is designed to forget on purpose. A bad fortnight should not haunt anyone's record indefinitely.
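The retention constraint is easiest to trust when it lives in code. A minimal sketch, assuming a hypothetical `StoredRecord` shape with a `kind` and a creation timestamp; the 90-day and 180-day limits are the ones stated above.

```typescript
type StoredKind = 'raw_signal' | 'composite_score';

interface StoredRecord {
  kind: StoredKind;
  createdAt: Date;
}

// Retention limits in days, as stated in the design constraints.
const RETENTION_DAYS: Record<StoredKind, number> = {
  raw_signal: 90,
  composite_score: 180,
};

const MS_IN_DAY = 24 * 60 * 60 * 1000;

function isExpired(record: StoredRecord, now: Date): boolean {
  const ageDays = (now.getTime() - record.createdAt.getTime()) / MS_IN_DAY;
  return ageDays > RETENTION_DAYS[record.kind];
}

// Splits records into those to keep and those a purge job should delete.
function purgeExpired<T extends StoredRecord>(
  records: T[],
  now: Date
): { kept: T[]; purged: T[] } {
  const kept: T[] = [];
  const purged: T[] = [];
  for (const r of records) {
    (isExpired(r, now) ? purged : kept).push(r);
  }
  return { kept, purged };
}
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the policy testable against fixed dates.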
Implementation Architecture: How the Detection Agent Is Actually Laid Out
Components, data flows, and the deployment constraints that shape both.
Signal Detection Agent Project Structure
```text
people-health-agent/
├── connectors/
│   ├── github-connector.ts
│   ├── jira-connector.ts
│   ├── calendar-connector.ts
│   ├── slack-connector.ts
│   └── hris-connector.ts
├── entity-resolution/
│   ├── identity-graph.ts
│   ├── deterministic-matcher.ts
│   ├── probabilistic-matcher.ts
│   └── reconciliation-job.ts
├── signals/
│   ├── commit-message-analyzer.ts
│   ├── review-latency-tracker.ts
│   ├── meeting-pattern-analyzer.ts
│   ├── jira-timeliness-tracker.ts
│   └── baseline-calculator.ts
├── scoring/
│   ├── z-score-normalizer.ts
│   ├── composite-scorer.ts
│   ├── persistence-filter.ts
│   └── alert-generator.ts
└── privacy/
    ├── data-retention-policy.ts
    ├── access-control.ts
    ├── audit-logger.ts
    └── employee-dashboard.ts
```
Edge Cases That Break a Naive Model, Every Time
Production patterns academic papers rarely model. Each one generates false alerts unless the system handles it explicitly.
Production hits patterns that academic models never sit with long enough to reproduce. Every edge case below will generate false alerts unless the system handles it as a first-class case rather than a footnote.
| Scenario | Why It Breaks the Model | Mitigation |
|---|---|---|
| New hire (< 90 days) | Baseline data too thin to compute z-scores | Widen confidence intervals, require 4/4 signals, suppress alerts for the first 60 days |
| Role change or team transfer | Historical baseline no longer represents current expectations | Reset baseline with a 30-day burn-in window after the change event |
| Parental leave return | Extended absence creates a structural gap in baseline data | Restart baseline from return date, suppress alerts for 45 days |
| On-call rotation week | On-call duties distort all four signals at once | Tag on-call periods in the system and exclude them from signal calculation |
| Company-wide crunch period | Team-wide drift masks individual patterns | Detect team-level correlation and adjust individual thresholds dynamically |
| Part-time or reduced schedule | Lower activity volume produces artificial deviations | Normalize against scheduled hours, never against a full-time baseline |
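
Several of the mitigations in the table reduce to date arithmetic, which can be sketched as below. The `PersonContext` shape and function names are hypothetical. The day counts come from the table, with one assumption: the 30-day burn-in after a role change is treated as an alert-suppression window.

```typescript
type BaselineEvent = 'hire' | 'role_change' | 'leave_return';

// Days of alert suppression after each event, from the mitigation table.
const SUPPRESS_DAYS: Record<BaselineEvent, number> = {
  hire: 60,
  role_change: 30, // assumption: burn-in is treated as suppression
  leave_return: 45,
};

const NEW_HIRE_DAYS = 90; // below this tenure, require 4/4 signals

interface PersonContext {
  events: { type: BaselineEvent; date: Date }[];
  onCallPeriods: { start: Date; end: Date }[];
}

const ONE_DAY_MS = 24 * 60 * 60 * 1000;

function daysBetween(from: Date, to: Date): number {
  return (to.getTime() - from.getTime()) / ONE_DAY_MS;
}

// True while any hire, role change, or leave return is still inside its window.
function isAlertSuppressed(ctx: PersonContext, asOf: Date): boolean {
  return ctx.events.some(e => {
    const age = daysBetween(e.date, asOf);
    return age >= 0 && age < SUPPRESS_DAYS[e.type];
  });
}

// New hires need all four signals; everyone else needs the usual three.
function requiredSignals(ctx: PersonContext, asOf: Date): number {
  const hire = ctx.events.find(e => e.type === 'hire');
  const tenureDays = hire ? daysBetween(hire.date, asOf) : Infinity;
  return tenureDays < NEW_HIRE_DAYS ? 4 : 3;
}

// Drops readings taken during tagged on-call periods before any scoring.
function excludeOnCall<T extends { timestamp: Date }>(
  readings: T[],
  ctx: PersonContext
): T[] {
  return readings.filter(r => {
    const t = r.timestamp.getTime();
    return !ctx.onCallPeriods.some(
      p => t >= p.start.getTime() && t <= p.end.getTime()
    );
  });
}
```

The company-wide crunch and part-time rows need team-level data and schedule normalization respectively, so they are left out of this sketch.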
What Managers Actually See — and Why That Is the Whole Design
The alert format carries as much weight as the detection accuracy. Possibly more.
Managers do not see scores. They do not see z-values. They see one prompt: "Team health check suggested for your 1:1 with [Name] this week. No specific details available — just a general check-in recommended."
That is the entire output surface. The system never tells the manager why the alert fired. It does not say "their commit messages shortened and they are declining meetings." It nudges toward a human conversation, and that is where the real signal emerges — maybe the person just bought a house and is distracted by the move, maybe they are frustrated with a technical decision and need to be heard, maybe both, maybe neither. The system does not know. It does not need to.
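One way to enforce that output surface is at the type level: the manager-facing type simply has no field that could carry a score or a reason. A minimal sketch with hypothetical `InternalAlert` and `ManagerPrompt` shapes; the message wording follows the prompt quoted above.

```typescript
// What the scoring pipeline knows internally. Never leaves the system.
interface InternalAlert {
  personId: string;
  displayName: string;
  compositeScore: number;
  activeSignals: string[];
}

// What a manager receives. No score, no signal names, no internal ID.
interface ManagerPrompt {
  managerId: string;
  message: string;
}

function toManagerPrompt(alert: InternalAlert, managerId: string): ManagerPrompt {
  // Only the display name crosses the boundary; everything else is dropped.
  return {
    managerId,
    message:
      `Team health check suggested for your 1:1 with ${alert.displayName} this week. ` +
      'No specific details available, just a general check-in recommended.',
  };
}
```

Building a fresh object, rather than spreading `alert` and deleting fields, means a field added to `InternalAlert` later cannot leak by accident.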
The detection agent is not a replacement for management. It is a reminder to manage.
Here is the second-order effect most teams never anticipate: the system surfaces bad managers faster than it surfaces disengaged employees. A single manager with four engineers flagged inside the same quarter is a far clearer organizational signal than four individuals having four separate problems. Run the composite model at the team level rather than the individual level and it becomes an unintentional management-quality detector. That is either a feature or a threat depending on who is reading the data. The politics of that conversation deserve a real meeting, before the system ships.
Pre-Launch Checklist Before Deploying People Health Monitoring
Legal review signed off for every operating jurisdiction — EU GDPR, US state privacy law, equivalents
Employee communication plan drafted and reviewed with HR before any data flows
Employee self-service dashboard built, tested, and reachable before manager alerts ship
Opt-out mechanism implemented, documented, and surfaced — not buried
Retention policies enforced in code: 90-day raw signals, 180-day composite scores
Access control: individual-level data isolated to the system; managers see prompts only
Audit logging captures every data access event — read paths included, not just writes
Entity resolution validated at 95%+ accuracy on the test population
False-positive rate validated below 15% on a historical data backtest before go-live
Edge case handlers implemented for every scenario in the mitigation table — no gaps
Kill switch exists, is tested, and any owner can pull it the moment trust erodes
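The backtest item in the checklist can be expressed as a go-live gate. The sketch below measures the share of alerts that turned out wrong, which is how the article uses "false positive" ("wrong four times in ten"); the classical false-positive rate, FP / (FP + TN), is a different quantity. The `BacktestCase` shape and its `confirmed` ground-truth label are assumptions.

```typescript
interface BacktestCase {
  alerted: boolean;   // did the model fire for this person in the backtest window?
  confirmed: boolean; // hypothetical ground-truth label: was the concern real?
}

// Share of fired alerts that were not confirmed. Returns 0 when nothing fired.
function falseAlertShare(cases: BacktestCase[]): number {
  const alerts = cases.filter(c => c.alerted);
  if (alerts.length === 0) return 0;
  return alerts.filter(c => !c.confirmed).length / alerts.length;
}

// Gate from the checklist: below 15% on historical data before go-live.
function passesGoLiveGate(cases: BacktestCase[], maxShare = 0.15): boolean {
  const alertCount = cases.filter(c => c.alerted).length;
  // A model that never fired has validated nothing, so it does not pass.
  return alertCount > 0 && falseAlertShare(cases) < maxShare;
}
```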
Does this qualify as employee surveillance under EU labor law?
Implementation determines the answer. GDPR Article 6 requires a lawful basis for the processing; for a system like this, that is usually legitimate interests under Article 6(1)(f), which demands a necessity and proportionality argument. The factors that decide it: no content analysis, aggregate-level reporting to managers, employee access to their own data, and documented opt-out procedures. Systematic monitoring of employees is also the kind of processing that typically calls for a Data Protection Impact Assessment under Article 35 before deployment. Consult labor counsel in every jurisdiction you operate in; there is no general answer.
What if employees game the metrics once they know what is tracked?
Mostly a feature. If someone writes longer commit messages and accepts more meetings because the system is watching, their actual engagement has shifted in the right direction, external motivation or not. The real failure mode is gaming without behavior change: empty padded commits, accepted meetings nobody attends. The composite model's dependence on four independent signals makes that significantly harder than it is in single-metric systems. Gaming all four convincingly costs more effort than doing the work.
How do you handle remote versus in-office employees?
The composite model is remote-native because all four signals originate from digital tools. In-office employees who do significant work through whiteboarding and hallway conversations show lower digital signal volume by default. Personal baselines, not absolute thresholds, absorb this — the model detects change from each person's own normal regardless of what that normal looks like.
Can this predict burnout before voluntary resignation?
It provides early warning, not prediction. In backtests against historical attrition data, composite signal detection surfaced concerning patterns an average of 3.2 weeks before formal resignation was submitted. The same pattern also appears in temporary burnout cases that resolve without departure. The system's job is to prompt a conversation, not to call an outcome.
What is the minimum team size for this approach?
Below 8–10 people, anonymized team-level reporting stops being anonymous: individuals are identifiable by elimination even in aggregate data. A team of 5 engineers with one person flagged is effectively de-anonymized. For smaller teams, run self-service mode only: employees see their own dashboard, and no team-level alerts route to managers. Teams of 10–15 sometimes apply k-anonymity constraints, with alerts firing only when 3+ individuals share the same pattern flag, to block spotlight identification even at mid-size.
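The team-size rules above can be sketched as a routing policy. Names here are hypothetical, and the cutoff of 10 is an assumed default inside the "8–10" range the answer gives; the upper bound of 15 and k = 3 come from the text.

```typescript
type ReportingMode = 'self_service_only' | 'k_anonymous' | 'standard';

interface RoutingPolicy {
  minTeamSizeForManagerAlerts: number; // assumed default within the 8–10 range
  kAnonymityUpTo: number;              // teams up to this size use k-anonymity
  k: number;                           // minimum people sharing the same flag
}

const DEFAULT_ROUTING: RoutingPolicy = {
  minTeamSizeForManagerAlerts: 10,
  kAnonymityUpTo: 15,
  k: 3,
};

function reportingMode(
  teamSize: number,
  policy: RoutingPolicy = DEFAULT_ROUTING
): ReportingMode {
  if (teamSize < policy.minTeamSizeForManagerAlerts) return 'self_service_only';
  if (teamSize <= policy.kAnonymityUpTo) return 'k_anonymous';
  return 'standard';
}

// Whether a team-level alert may reach the manager at all.
function mayRouteTeamAlert(
  teamSize: number,
  flaggedCount: number,
  policy: RoutingPolicy = DEFAULT_ROUTING
): boolean {
  const mode = reportingMode(teamSize, policy);
  if (mode === 'self_service_only') return false;
  if (mode === 'k_anonymous') return flaggedCount >= policy.k;
  return flaggedCount >= 1;
}
```

With the defaults, a team of 5 never routes an alert, a team of 12 routes only once three people share the flag, and a team of 30 routes on the first.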
Weak signal detection for people health works precisely because it refuses to be dramatic. No urgent alerts. No risk scores leaking into leadership meetings. The system quietly notices when multiple small things drift the same direction on the same person, and it nudges someone toward a conversation. That is the whole product surface.
The engineering — entity resolution, z-score baselines, composite weighting, temporal persistence — is genuinely interesting work. The system's value is measured in conversations started, not in dashboards built. The best outcome is a manager who walks into a 1:1 and says "Hey, I noticed we haven't caught up in a while — how are things going?" and actually means it.
Start with entity resolution. Get the identity graph right before anything else. Add one signal at a time and validate against historical data before flipping any switch. Ship the employee self-service dashboard before any manager alert exists. Trust gets built before features do. The technology is the easy part — and the part most teams will mistake for the whole problem.
- [1] Composite Behavioral Signal Detection in Workforce Analytics. PubMed Central (pmc.ncbi.nlm.nih.gov)
- [2] Single-Variable vs. Composite Attrition Models in Engineering Populations. Frontiers in Big Data, 2025 (frontiersin.org)
- [3] Early Behavioral Indicators of Voluntary Resignation in Software Teams. Nature Scientific Reports (nature.com)
- [4] HRM AI: Sentiment Risk and Governance in 2026. Leena AI (blog.leena.ai)
- [5] Workforce Trends 2026: Leaders Confront Burnout, Disengagement, and AI-Driven Change. Hunt Scanlon (huntscanlon.com)
- [6] 2026 Mental Health Trends for Your Workplace. Spring Health (springhealth.com)