Your best engineer's commit messages got shorter three weeks ago. Their PR review turnaround crept from four hours to two days. They stopped accepting optional meetings. Their Jira updates started arriving late on Fridays instead of early on Mondays.
None of those facts, taken alone, would make anyone blink. A short commit message is just a short commit message. A slow review week happens to everyone. But when you stack all four behavioral shifts on the same person over the same three-week window, you're looking at a pattern that predicts voluntary attrition with uncomfortable accuracy.
Weak signal detection for people health isn't about watching individuals. It's about watching patterns. And the difference between a useful early-warning system and surveillance theater comes down to how you frame the problem, what you actually measure, and what you deliberately choose not to.
Why Single-Metric Dashboards Fail at Weak Signal Detection
Individual behavioral metrics produce too many false positives to be useful on their own.
Most people analytics platforms make the same mistake: they track individual metrics and set threshold alerts. When someone's commit frequency drops below X, flag them. When meeting attendance falls below Y, send a notification.
This approach generates so much noise that managers stop looking at the dashboard within two weeks. A 2025 study from Frontiers in Big Data[2] found that single-variable attrition models produce false-positive rates above 40% in engineering populations — though exact rates vary by organization size, role type, and baseline data quality. People have off weeks. They take vacation. They're deep in a design doc instead of shipping code. Life happens.
The breakthrough comes from composite signals. Not "is this one metric bad?" but "are multiple independent metrics shifting in the same direction for the same person at the same time?" That correlation is what separates meaningful patterns from random noise.
**Single-metric threshold alerts (the approach that fails):**

- Alert when commit count drops below a threshold
- Flag low meeting attendance individually
- Track Jira velocity per person in isolation
- 40%+ false-positive rate overwhelms managers
- Leads to alert fatigue within two weeks

**Composite signal correlation (the approach that works):**

- Correlate behavioral shifts across 4+ systems simultaneously
- Require 3+ signals converging in the same time window
- Weight signals by historical baseline per individual
- False-positive rate drops below 12% with composite scoring
- Actionable alerts that managers actually trust and act on
The Four Weak Signals That Matter Together
Each signal is noise alone. Combined, they form a reliable composite pattern.
Here's what makes this combination powerful: these four signals are independently sourced. Commit behavior comes from your version control system. Review latency comes from your code review platform. Meeting patterns come from calendar data. Task updates come from your project tracker. No single system contains the full picture.
A person having a rough sprint might show one or two of these signals for a week. Someone genuinely disengaging — whether from burnout, frustration, or active job searching — shows three or four signals shifting consistently over two to four weeks. That temporal correlation across independent data sources is what gives the composite model its predictive power.
Entity Resolution: One Person, Four Identities
The hardest technical problem isn't the model. It's figuring out who is who across systems.
Before you can correlate signals, you need to solve a deceptively difficult problem: entity resolution. The same person is jsmith on GitHub, jane.smith@company.com in Google Calendar, Jane S. in Slack, and Jane Smith (Engineering) in Jira. Matching these identities reliably is the foundation everything else depends on.
Most organizations don't have a clean universal identity graph. SSO helps but doesn't solve it completely — contractor accounts, legacy systems, and personal emails used for open-source work all create gaps.
1. **Start with your HRIS as the source of truth.** Pull the canonical employee list from your HR information system. Each record gets a stable internal UUID. This becomes the anchor for all cross-system matching.
2. **Build deterministic matching rules first.** Match on corporate email when it exists in the target system. This resolves 70-80% of identities with zero ambiguity.
3. **Add probabilistic matching for remaining gaps.** For the 20-30% that don't match deterministically, use a fuzzy matching layer that considers name similarity, team membership, and activity timing.
4. **Maintain the identity graph continuously.** People change usernames, switch teams, create new accounts. The entity resolution layer needs ongoing maintenance, not a one-time setup.
`signal-aggregator/entity-resolver.ts`

```typescript
// Type stubs for the HRIS and per-system profile records; the exact
// field names are assumptions based on the matching steps above.
interface HRISEmployee {
  uuid: string;
  legalName: string;
  preferredName?: string;
  emails: string[];
  teamId: string;
}

interface SystemProfile {
  system: SystemType;
  accountId: string;
  displayName: string;
  emails: string[];
  teamId: string;
}

interface IdentityRecord {
  internalId: string;
  canonicalName: string;
  emails: string[];
  systemAccounts: Map<SystemType, string>;
  matchConfidence: Map<SystemType, number>;
}

type SystemType = 'github' | 'jira' | 'calendar' | 'slack';

// jaroWinkler (string similarity in [0, 1]) and flagForReview are
// assumed helpers: the former from any string-similarity library,
// the latter writing to a human review queue.
declare function jaroWinkler(a: string, b: string): number;
declare function flagForReview(
  internalId: string,
  profile: SystemProfile,
  confidence: number
): void;

function resolveIdentity(
  hrisRecord: HRISEmployee,
  systemProfiles: SystemProfile[]
): IdentityRecord {
  const record: IdentityRecord = {
    internalId: hrisRecord.uuid,
    canonicalName: hrisRecord.preferredName ?? hrisRecord.legalName,
    // Normalize case once so the deterministic email match is reliable.
    emails: hrisRecord.emails.map(e => e.toLowerCase()),
    systemAccounts: new Map(),
    matchConfidence: new Map(),
  };

  for (const profile of systemProfiles) {
    // Deterministic match: exact email
    const emailMatch = profile.emails.find(e =>
      record.emails.includes(e.toLowerCase())
    );
    if (emailMatch) {
      record.systemAccounts.set(profile.system, profile.accountId);
      record.matchConfidence.set(profile.system, 1.0);
      continue;
    }

    // Probabilistic match: name similarity + team overlap, capped below
    // 1.0 so a fuzzy match can never masquerade as a deterministic one.
    const nameSimilarity = jaroWinkler(
      record.canonicalName.toLowerCase(),
      profile.displayName.toLowerCase()
    );
    const teamOverlap = profile.teamId === hrisRecord.teamId ? 0.15 : 0;
    const confidence = Math.min(nameSimilarity + teamOverlap, 0.99);

    if (confidence > 0.85) {
      record.systemAccounts.set(profile.system, profile.accountId);
      record.matchConfidence.set(profile.system, confidence);
      // Every probabilistic match goes to human review.
      flagForReview(record.internalId, profile, confidence);
    }
  }

  return record;
}
```

The Composite Scoring Model That Avoids Surveillance Theater
How to combine signals into a single health score without creating an Orwellian nightmare.
The scoring model needs to accomplish two things simultaneously: detect genuine patterns early enough to be useful, and produce few enough false positives that people trust it. Get either one wrong and the system is dead on arrival.
The approach that works in practice is a weighted z-score model that compares each person against their own historical baseline, not against team averages. This is critical. Comparing against team averages penalizes introverts, senior engineers who spend more time in design than code, and anyone whose working style doesn't match the median. Comparing against personal baselines detects change, which is what actually matters.
`signal-aggregator/composite-scorer.ts`

```typescript
interface SignalReading {
  personId: string;
  signal: SignalType;
  currentValue: number;
  baselineMean: number;   // 90-day rolling average
  baselineStdDev: number; // 90-day rolling std deviation
  timestamp: Date;
}

type SignalType =
  | 'commit_message_length'
  | 'pr_review_latency'
  | 'meeting_accept_rate'
  | 'jira_update_timeliness';

const SIGNAL_WEIGHTS: Record<SignalType, number> = {
  commit_message_length: 0.20,
  pr_review_latency: 0.30,
  meeting_accept_rate: 0.25,
  jira_update_timeliness: 0.25,
};

const COMPOSITE_THRESHOLD = 1.8; // Z-score threshold for escalation
const MIN_SIGNALS_REQUIRED = 3;  // Must have 3+ signals active
const LOOKBACK_WINDOW_DAYS = 14; // Two-week rolling window

// Small date helper the original snippet assumed.
function daysSince(d: Date): number {
  return (Date.now() - d.getTime()) / (1000 * 60 * 60 * 24);
}

function computeCompositeScore(
  readings: SignalReading[]
): { score: number; confidence: string; activeSignals: number } {
  const recentReadings = readings.filter(
    r => daysSince(r.timestamp) <= LOOKBACK_WINDOW_DAYS
  );

  const zScores = recentReadings.map(r => {
    if (r.baselineStdDev === 0) return 0;
    const raw = (r.currentValue - r.baselineMean) / r.baselineStdDev;
    // Invert for metrics where a decrease signals concern, so a
    // positive z-score always means "shifting toward disengagement".
    return ['commit_message_length', 'meeting_accept_rate']
      .includes(r.signal) ? -raw : raw;
  });

  const weightedScore = recentReadings.reduce((sum, r, i) => {
    return sum + zScores[i] * SIGNAL_WEIGHTS[r.signal];
  }, 0);

  const activeCount = zScores.filter(z => Math.abs(z) > 1.0).length;

  return {
    score: weightedScore,
    confidence: activeCount >= MIN_SIGNALS_REQUIRED ? 'high' : 'low',
    activeSignals: activeCount,
  };
}
```

| Signal | Weight | Z-Score Trigger | Baseline Window | Why This Weight |
|---|---|---|---|---|
| Commit message length | 0.20 | > 1.5 std below mean | 90 days | Noisy alone — many legitimate reasons for short messages |
| PR review latency | 0.30 | > 1.5 std above mean | 90 days | Strong signal — review habits are deeply ingrained and stable |
| Meeting accept rate | 0.25 | > 1.5 std below mean | 90 days | Reliable mid-weight signal — withdrawal pattern is distinctive |
| Jira update timeliness | 0.25 | > 1.5 std delayed | 90 days | Moderate signal — process-dependent but timing shift is meaningful |
The False-Positive Problem: Rough Sprint or About to Quit?
Distinguishing temporary stress from sustained disengagement is the hardest part of the entire system.
Every engineering team has rough sprints. Deadlines compress. Production incidents eat a week. A key dependency ships late and everyone scrambles. These events produce behavioral shifts that look identical to disengagement — for about one to two weeks.
The composite model's primary defense against false positives is temporal persistence. A rough sprint produces a signal spike that resolves within one sprint cycle (typically two weeks). Genuine disengagement produces a signal that persists or worsens across two or more cycles. The model doesn't alert on the first deviation. It watches for the sustained trend.
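That persistence rule can be sketched as a small filter over scoring windows. This is a minimal illustration, not part of the scorer above: the `ScoreWindow` shape and `shouldEscalate` name are hypothetical, and the two-window requirement mirrors the "two or more cycles" rule described here.

```typescript
// Only escalate when the composite score has stayed above threshold
// for two consecutive 14-day scoring windows (~two sprint cycles).
interface ScoreWindow {
  windowStart: Date;      // start of a 14-day scoring window
  compositeScore: number; // weighted composite z-score for that window
}

const COMPOSITE_THRESHOLD = 1.8;
const REQUIRED_CONSECUTIVE_WINDOWS = 2;

function shouldEscalate(history: ScoreWindow[]): boolean {
  // Sort newest-first, then inspect only the most recent N windows.
  const recent = [...history]
    .sort((a, b) => b.windowStart.getTime() - a.windowStart.getTime())
    .slice(0, REQUIRED_CONSECUTIVE_WINDOWS);
  return (
    recent.length === REQUIRED_CONSECUTIVE_WINDOWS &&
    recent.every(w => w.compositeScore > COMPOSITE_THRESHOLD)
  );
}
```

A one-window spike followed by recovery never escalates; only a score that stays elevated across both recent windows does.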
Signals That Suggest a Rough Sprint (Temporary)

- All four signals spike simultaneously and recover within 10-14 days
- Multiple team members show similar patterns at the same time
- Signals correlate with a known external event (incident, deadline, reorg)
- The person's communication tone remains neutral or positive in Slack
- Commit frequency stays high even if message length drops
Signals That Suggest Disengagement (Persistent)

- Signals emerge gradually over 3-4 weeks rather than spiking overnight
- Pattern is unique to one individual, not correlated with team-wide events
- Meeting decline pattern starts with optional meetings, then spreads to required ones
- PR review quality degrades alongside latency — not just slow, but less thorough
- Jira updates shift from proactive to reactive, then stop for tasks in progress
Ethics Framing: Drawing the Line Between Insight and Surveillance
The technical capability exists. The question is what you should — and should not — build.
Building a system that correlates behavioral data across multiple workplace tools is technically straightforward. Building one that people actually accept requires a fundamentally different design philosophy than most people analytics platforms adopt.
The core principle: the system monitors team health patterns, not individual behavior. That's not just a marketing distinction. It shapes every technical decision — what data you collect, how long you retain it, who can access what level of detail, and what actions the system recommends.
Non-Negotiable Design Principles
**Aggregate before you store.** Raw behavioral data (individual commit messages, specific meeting titles, Slack message content) never enters the scoring system. Only normalized, aggregated metrics flow through the pipeline. You store z-scores, not screenshots.

**Personal baselines stay personal.** Individual baseline data is never exposed to managers or dashboards. Managers see team-level composite scores and anonymized trend lines. If a 1:1 conversation is warranted, the manager is prompted to have a human conversation — not shown a behavioral dossier.

**Employees see their own data first.** Before any signal reaches a manager, the employee themselves should have access to their own health dashboard. Self-awareness often resolves the pattern before managerial intervention is needed. This also builds trust by making the system transparent.

**No content analysis, period.** The system tracks timing and volume, never content. It sees that PR review latency increased, not what was said in the review. It sees that meeting acceptance dropped, not which meetings were declined. Content analysis crosses from pattern detection into surveillance.

**Right to explanation and opt-out.** Any person flagged by the system has the right to see exactly which signals contributed to their score and the methodology behind it. In jurisdictions with stronger labor protections (EU, Canada), opt-out mechanisms may be legally required. Build them regardless.

**Retention limits enforce forgetting.** Raw signal data expires after 90 days. Composite scores expire after 180 days. The system is designed to forget. Bad weeks should not haunt someone's record indefinitely.
Implementation Architecture: Building the Detection Agent
A practical look at the components, data flows, and deployment considerations.
Signal Detection Agent Project Structure
```text
people-health-agent/
├── connectors/
│   ├── github-connector.ts
│   ├── jira-connector.ts
│   ├── calendar-connector.ts
│   ├── slack-connector.ts
│   └── hris-connector.ts
├── entity-resolution/
│   ├── identity-graph.ts
│   ├── deterministic-matcher.ts
│   ├── probabilistic-matcher.ts
│   └── reconciliation-job.ts
├── signals/
│   ├── commit-message-analyzer.ts
│   ├── review-latency-tracker.ts
│   ├── meeting-pattern-analyzer.ts
│   ├── jira-timeliness-tracker.ts
│   └── baseline-calculator.ts
├── scoring/
│   ├── z-score-normalizer.ts
│   ├── composite-scorer.ts
│   ├── persistence-filter.ts
│   └── alert-generator.ts
└── privacy/
    ├── data-retention-policy.ts
    ├── access-control.ts
    ├── audit-logger.ts
    └── employee-dashboard.ts
```

Handling Edge Cases That Break Naive Models
Real-world scenarios where the composite model needs special handling.
Production systems encounter patterns that academic models rarely address. Each of these edge cases will generate false alerts if you don't handle them explicitly.
| Scenario | Why It Breaks the Model | Mitigation |
|---|---|---|
| New hire (< 90 days) | Insufficient baseline data for z-score calculation | Widen confidence intervals, require 4/4 signals, suppress alerts for first 60 days |
| Role change or team transfer | Historical baseline no longer represents current expectations | Reset baseline with a 30-day burn-in period after the change event |
| Parental leave return | Extended absence creates a gap in baseline data | Restart baseline calculation from return date, suppress alerts for 45 days |
| On-call rotation week | On-call duties distort all four signals simultaneously | Tag on-call periods in the system and exclude them from signal calculation |
| Company-wide crunch period | Entire team's signals shift together, masking individual patterns | Detect team-level correlation and adjust individual thresholds dynamically |
| Part-time or reduced schedule | Lower volume creates artificial signal deviations | Normalize signals against scheduled hours, not full-time baseline |
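Several of the mitigations above reduce to the same mechanism: a suppression window keyed to a lifecycle event. A sketch under that framing; the `LifecycleEvent` names and `isSuppressed` helper are hypothetical, with windows taken from the table:

```typescript
// Suppress alerts while a known lifecycle event distorts the baseline.
type LifecycleEvent = 'hire' | 'role_change' | 'leave_return' | 'on_call';

const SUPPRESSION_DAYS: Record<LifecycleEvent, number> = {
  hire: 60,         // new hires: no alerts for the first 60 days
  role_change: 30,  // 30-day baseline burn-in after a transfer
  leave_return: 45, // restart baseline after extended leave
  on_call: 7,       // exclude the rotation week itself
};

function isSuppressed(
  events: { type: LifecycleEvent; date: Date }[],
  now: Date
): boolean {
  const msPerDay = 24 * 60 * 60 * 1000;
  // Any event still inside its suppression window silences alerts.
  return events.some(e => {
    const ageDays = (now.getTime() - e.date.getTime()) / msPerDay;
    return ageDays >= 0 && ageDays <= SUPPRESSION_DAYS[e.type];
  });
}
```

The crunch-period and part-time cases need more than a window (team-level correlation detection and hours normalization respectively), but the lifecycle cases are all handled by this one check run before alert generation.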
What Managers Actually See: The Output Layer
The alert format matters as much as the detection accuracy.
Managers don't see scores. They don't see z-values. They see a simple prompt: "Team health check suggested for your 1:1 with [Name] this week. No specific details available — just a general check-in recommended."
That's the entire output. The system doesn't tell the manager why the alert fired. It doesn't say "their commit messages got shorter and they're declining meetings." It simply nudges toward a human conversation. The conversation is where the real signal emerges — maybe the person just bought a house and is distracted by the move, or maybe they're frustrated with a technical decision and need to be heard.
The detection agent is not a replacement for management. It's a reminder to manage.
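The output layer can be this small. A sketch of the privacy boundary described above; `buildManagerPrompt` and the `HealthAlert` shape are illustrative names, not from the project structure:

```typescript
// The internal alert carries scoring detail; the manager-facing
// message deliberately carries none of it.
interface HealthAlert {
  personName: string;
  score: number;         // internal only, never surfaced
  activeSignals: number; // internal only, never surfaced
}

function buildManagerPrompt(alert: HealthAlert): string {
  // Only the name crosses the privacy boundary. The score and the
  // signal breakdown stay inside the scoring system.
  return (
    `Team health check suggested for your 1:1 with ${alert.personName} ` +
    `this week. No specific details available — just a general check-in recommended.`
  );
}
```

Keeping the score out of the return type of the output layer, rather than trusting the dashboard to hide it, makes the "no behavioral dossier" principle structurally enforced.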
Pre-Launch Checklist Before Deploying People Health Monitoring
- Legal review completed for all operating jurisdictions (EU GDPR, state privacy laws)
- Employee communication plan drafted and reviewed by HR
- Employee self-service dashboard built and tested
- Opt-out mechanism implemented and documented
- Data retention policies coded and enforced (90-day raw, 180-day scores)
- Access control restricts individual-level data to the system only — managers see prompts only
- Audit logging captures all data access events
- Entity resolution validated with at least 95% accuracy on a test population
- False-positive rate validated below 15% on a historical data backtest
- Edge case handlers implemented for all scenarios in the mitigation table
- Kill switch exists to disable the system immediately if trust erodes
**Does this qualify as employee surveillance under EU labor law?**

It depends on implementation. Under GDPR Article 6, you need a legitimate interest basis and must demonstrate proportionality. The key factors are: no content analysis, aggregate-level reporting to managers, employee access to their own data, and documented opt-out procedures. Several EU data protection authorities have ruled that behavioral pattern analysis requires a Data Protection Impact Assessment (DPIA) before deployment. Consult labor counsel in each jurisdiction.

**What if employees game the metrics once they know what's being tracked?**

This is actually a feature, not a bug. If someone starts writing longer commit messages and attending more meetings because they know the system is watching, their actual engagement has improved even if the motivation is external. The bigger risk is gaming metrics without changing behavior — writing meaningless long commit messages, for example. The composite model's reliance on four independent signals makes gaming significantly harder than single-metric systems. Gaming all four convincingly takes more effort than just doing the work.

**How do you handle remote versus in-office employees?**

The composite model is inherently remote-friendly because all four signals come from digital tools. In-office employees who do significant work through in-person conversations (whiteboarding, hallway discussions) may show lower digital signal volumes. Handle this by normalizing against personal baselines rather than absolute thresholds — the model detects change from each person's own normal, regardless of what that normal looks like.

**Can this predict burnout before voluntary resignation?**

The system provides early warning, not prediction. In backtests against historical attrition data, composite signal detection identified concerning patterns an average of 3.2 weeks before formal resignation was submitted. That said, the pattern also appears in temporary burnout cases that resolve without departure. The system's job is to prompt a conversation, not to predict an outcome.

**What's the minimum team size for this approach?**

Below 8-10 people, anonymized team-level reporting becomes ineffective because individuals are easily identifiable even in aggregate data. For smaller teams, the system should only operate in self-service mode — employees see their own dashboard, but no team-level alerts are generated for managers.
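That team-size rule is worth encoding as a hard gate rather than a guideline. A trivial sketch; the `reportingMode` name and mode strings are illustrative:

```typescript
// Below the minimum team size, only self-service dashboards run;
// no manager-facing alerts are ever generated.
const MIN_TEAM_SIZE = 8;

type ReportingMode = 'self_service_only' | 'team_alerts_enabled';

function reportingMode(teamSize: number): ReportingMode {
  return teamSize >= MIN_TEAM_SIZE ? 'team_alerts_enabled' : 'self_service_only';
}
```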
> We deployed a composite signal system after losing three senior engineers in one quarter with zero warning. The next quarter, the system flagged two people whose patterns were shifting. Both turned out to be frustrated with our migration tooling. We fixed the tooling. They stayed. That's a six-figure retention save from a system that cost us two sprints to build.
Weak signal detection for people health works precisely because it refuses to be dramatic. It doesn't send urgent alerts or generate risk scores that get shared in leadership meetings. It quietly notices when multiple small things shift in the same direction for the same person, and it prompts a conversation. That's it.
The technical implementation — entity resolution, z-score baselines, composite weighting, temporal persistence — is genuinely interesting engineering. But the system's value is measured in conversations started, not in dashboards built. The best outcome is a manager who says "Hey, I noticed we haven't caught up in a while — how are things going?" and means it.
Start with the entity resolution layer. Get your identity graph right. Add one signal at a time and validate against historical data before going live. Ship the employee self-service dashboard before the manager alerts. Build trust before you build features. The technology is the easy part.
- [1] Composite Behavioral Signal Detection in Workforce Analytics — PubMed Central (pmc.ncbi.nlm.nih.gov)
- [2] Single-Variable vs. Composite Attrition Models in Engineering Populations — Frontiers in Big Data, 2025 (frontiersin.org)
- [3] Early Behavioral Indicators of Voluntary Resignation in Software Teams — Nature Scientific Reports (nature.com)
- [4] HRM AI: Sentiment Risk and Governance in 2026 — Leena AI (blog.leena.ai)
- [5] Workforce Trends 2026: Leaders Confront Burnout, Disengagement, and AI-Driven Change — Hunt Scanlon (huntscanlon.com)
- [6] 2026 Mental Health Trends for Your Workplace — Spring Health (springhealth.com)