Most enterprise AI pilots were chosen in a conference room. Someone drew a 2×2 on a whiteboard, the senior voice picked the workflow that sounded exciting, and three months later a team is shipping something the system logs never asked for. The data already knew which processes were painful, repetitive, and worth automating. Nobody pulled it.
Process mining — Celonis, Apromore, Disco, IBM Process Mining — extracts event logs from the systems that actually run your business (SAP, Oracle, ServiceNow, Salesforce, Jira) and reconstructs every path your processes take[5]. Not the SOP from 2019. Not the BPMN diagram a consultant drew. The literal sequence of timestamped events your production systems emit. The output is a ranked list of workflows by volume, cycle time, bottleneck severity, and variant count — the raw material for an evidence-based AI use case pipeline.
The market is moving fast. Process mining software is projected to grow from roughly $850M in 2026 toward $2B by 2031, with Celonis claiming 60% market share and holding category leader status across analyst assessments[1]. Most adoption is still RPA-flavored — using mining to find rule-based automation candidates. The LLM-era question is sharper. Which mined workflows justify a language model? Which ones want rule execution? Which ones are not worth touching at all?
One contrarian point worth stating directly: process mining will not fix your AI selection problem if your executives refuse to act on unglamorous findings. One logistics company ran a full Celonis deployment, surfaced accounts payable exception handling as the highest-ROI candidate, and ignored it because the CFO wanted to do AI for customer experience. The data is only as useful as the decision-making culture that absorbs it.
Why the Room Always Picks the Wrong Workflow
Three structural reasons workshop-driven selection fails — and why log data cuts past every one
Workshop-driven selection fails predictably. The same three structural failures show up across industries.
The loudest voice wins. Every prioritization session has someone senior enough to override the analysis. They watched a vendor demo, they have a pet project, they find AI for contract review more interesting than AI for invoice routing. The team builds the interesting idea, not the high-ROI one. The incentive points at narrative, not throughput.
Nobody admits which workflow is actually painful. Ask a room of VPs which process is most broken and they describe a process in someone else's department. The order-to-cash exception handling that costs the finance team forty hours a week never surfaces because the finance VP will not air dirty laundry in front of the CIO. Mining surfaces the pain from system data. Politics has no vote.
The interesting idea beats the measurable one. AI for customer sentiment sounds innovative. AI to reduce the eight percent exception rate in supplier onboarding sounds unglamorous. ROI can be identical. The second one has a cleaner feedback loop, less hallucination surface, and a sharper success metric. Workshop groups pick the first one anyway. Mining-based selection lets the bottleneck data set the initial ranking before any human applies preferences.
Process Mining in One Page (For the AI Reader)
ETL plus clustering plus visualization on event logs — and why the output changes what AI selection looks like
If your background is ML rather than BPM, here is what process mining actually does. Every enterprise system of record — SAP, Oracle, ServiceNow, Salesforce, Jira — emits an event log: a timestamped record of which case (invoice, ticket, contract, employee record) hit which step, when, and in what sequence. Mining tools ingest those logs and run clustering and graph algorithms over the case sequences to reconstruct a process graph.
The output is not a theoretical BPMN diagram. It is a map of every distinct path actually taken, annotated with frequency and duration. A workflow your SOP claims has two paths might have forty-seven variants in reality — each deviation a data quality issue, an edge case someone built a workaround for, or a policy exception that became the default. Variant count is one of the most diagnostic signals in the entire analysis. It tells you whether you have a stable process (low variants, safe for RPA) or a judgment-intensive one (high variants, candidate for an LLM).
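A minimal sketch of the variant reconstruction described above, in Python with only the standard library. The `EVENTS` rows, the invoice IDs, and the `variants` helper are hypothetical illustrations, not the output or API of any mining tool; commercial tools do considerably more (noise filtering, graph layout, attribute handling).

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical event log rows: (case_id, activity, ISO timestamp).
EVENTS = [
    ("INV-1", "Receive", "2025-01-02T09:00"),
    ("INV-1", "Match", "2025-01-02T10:00"),
    ("INV-1", "Pay", "2025-01-03T08:00"),
    ("INV-2", "Receive", "2025-01-02T09:30"),
    ("INV-2", "Match", "2025-01-02T11:00"),
    ("INV-2", "Pay", "2025-01-04T08:00"),
    ("INV-3", "Receive", "2025-01-05T09:00"),
    ("INV-3", "Match", "2025-01-05T09:30"),
    ("INV-3", "Escalate", "2025-01-06T09:00"),
    ("INV-3", "Pay", "2025-01-08T09:00"),
]


def variants(events):
    """Count how many cases follow each distinct activity sequence."""
    by_case = defaultdict(list)
    for case_id, activity, ts in events:
        by_case[case_id].append((datetime.fromisoformat(ts), activity))
    # Order each case by timestamp, then keep only the activity names.
    traces = (
        tuple(activity for _, activity in sorted(steps))
        for steps in by_case.values()
    )
    return Counter(traces)


variant_counts = variants(EVENTS)
for trace, n in variant_counts.most_common():
    print(n, " -> ".join(trace))
```

In this toy log two cases share the happy path and one takes an escalation detour, so the log holds two variants. The length of the counter is the variant count that feeds the rest of the analysis.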
Mining tools also compute conformance (how often the actual process matches the intended one), bottleneck attribution (which steps accumulate wait time), and root cause attribution (which case attributes correlate with slow cases). For every workflow in your event log, you get a quantified profile: how big it is, how slow it is, how variable it is, where the pain compounds. That profile is the input to an AI suitability scoring model. The whiteboard is not.[5]
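As a rough illustration of the conformance idea, the sketch below measures the share of cases whose full activity sequence matches an allowed path. It is a deliberately crude proxy under stated assumptions: the `TRACES` data, the `ALLOWED` set, and `conformance_rate` are hypothetical, and production tools typically use more forgiving techniques that also score partial matches.

```python
def conformance_rate(traces, allowed):
    """Share of cases whose full activity sequence is an allowed path."""
    if not traces:
        return 0.0
    matching = sum(1 for trace in traces.values() if tuple(trace) in allowed)
    return matching / len(traces)


# Hypothetical traces per case, already ordered by timestamp.
TRACES = {
    "INV-1": ["Receive", "Match", "Pay"],
    "INV-2": ["Receive", "Match", "Pay"],
    "INV-3": ["Receive", "Match", "Escalate", "Pay"],
    "INV-4": ["Receive", "Pay"],  # skipped the matching step
}

# The intended process, as the SOP would describe it (assumed).
ALLOWED = {("Receive", "Match", "Pay")}

rate = conformance_rate(TRACES, ALLOWED)
print(f"{rate:.0%} of cases conform")
```

A low rate on a high-volume workflow is a prompt to look at the variant map before scoring anything.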
What workshop-driven selection runs on:
- SOP documents written in 2019 and never updated
- Happy-path diagrams showing 1-2 process variants
- Estimated cycle times from interviews with process owners
- Bottlenecks identified by whoever complained loudest last quarter
- BPMN diagrams that no longer match the system configuration

What mining-based selection runs on:
- Log-derived event sequences from actual production systems
- Every variant captured, often 20–80 distinct paths through the same process
- Real durations from timestamped event data, not estimates
- Bottleneck ranking by total accumulated wait time across all cases
- Conformance scores showing where the actual process deviates from intent
AI Suitability Score: Volume × Variance × Structure
Three axes, one product. Ranks every mined workflow against itself and forces a verdict on tooling
Once the mined process data exists, you need a scoring model that turns it into a ranked candidate list. The AI Suitability Score is a three-axis product: Volume × Variance × Structure. Score each axis 1–5, multiply, and every workflow in your inventory gets a comparable number between 1 and 125.
Volume measures the ROI ceiling. A workflow that processes 50,000 cases per year at 25 minutes per case represents 20,000+ hours of annual labor. A workflow that runs 50 times a year is a rounding error. Volume is the non-negotiable axis — without throughput, no automation pays back. Score it: 1 = under 1,000 cases/year, 3 = 10,000–50,000, 5 = 100,000+.
Variance is where LLMs eat RPA's lunch. RPA breaks on variance — every new path through a process needs a new rule. The rule of thumb: a process with more than ten to fifteen variants is a poor RPA candidate. LLMs handle variance by reading the case, interpreting context, and applying judgment to anomalies rather than failing on them. A high-variant process (4–5 on this axis) is exactly where you want a language model. Score it: 1 = 1–3 variants (pure RPA), 3 = 10–20 (borderline), 5 = 30+ (LLM territory).
Structure measures input parsability. A fully structured workflow — every input a database field, every decision rule-based — does not need an LLM. RPA or a deterministic rule engine handles it cheaper. When inputs include unstructured text (emails, PDFs, free-form comments, contracts), audio, or contextual judgment (is this expense within policy?), the calculus flips. Score it: 1 = fully structured, 3 = mixed, 5 = primarily unstructured or judgment-required.
The model produces a ranked list. Workflows above 40 are real AI candidates — high enough volume to justify investment, high enough variance that RPA fails, unstructured enough that LLMs beat rule engines. Workflows between 15 and 40 are tool-selection conversations: some are clean RPA, some need better data first, some belong on the future roadmap. Workflows under 15 are usually best left alone. The ROI ceiling does not justify the build cost.
| Workflow | Volume | Variance | Structure | Score | Better Tool |
|---|---|---|---|---|---|
| Invoice approval (3-way match exceptions) | 5 | 3 | 3 | 45 | LLM |
| Customer support ticket triage | 5 | 5 | 5 | 125 | LLM |
| Contract redlining (standard templates) | 3 | 4 | 5 | 60 | LLM |
| Expense report validation | 4 | 2 | 2 | 16 | RPA |
| Employee onboarding (system provisioning) | 3 | 2 | 1 | 6 | RPA or neither |
| Supplier onboarding (documentation review) | 3 | 4 | 4 | 48 | LLM |
| Insurance claims adjudication (complex) | 4 | 5 | 5 | 100 | LLM |
| IT ticket routing (category assignment) | 5 | 3 | 4 | 60 | LLM |
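The scoring and banding above can be sketched in a few lines of Python. The axis values come from the table; the `Workflow` class, the `tier` labels, and the choice to put a score of exactly 40 in the middle band are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workflow:
    name: str
    volume: int     # 1-5: ROI ceiling
    variance: int   # 1-5: number of distinct paths
    structure: int  # 1-5: how unstructured the inputs are

    def __post_init__(self):
        for axis in (self.volume, self.variance, self.structure):
            if not 1 <= axis <= 5:
                raise ValueError("each axis is scored 1-5")

    @property
    def score(self) -> int:
        return self.volume * self.variance * self.structure


def tier(score: int) -> str:
    """Map a score to the three bands described in the text."""
    if score > 40:
        return "AI candidate"
    if score >= 15:
        return "tool-selection conversation"
    return "leave alone"


WORKFLOWS = [
    Workflow("Invoice approval (3-way match exceptions)", 5, 3, 3),
    Workflow("Customer support ticket triage", 5, 5, 5),
    Workflow("Contract redlining (standard templates)", 3, 4, 5),
    Workflow("Expense report validation", 4, 2, 2),
    Workflow("Employee onboarding (system provisioning)", 3, 2, 1),
    Workflow("Supplier onboarding (documentation review)", 3, 4, 4),
    Workflow("Insurance claims adjudication (complex)", 4, 5, 5),
    Workflow("IT ticket routing (category assignment)", 5, 3, 4),
]

ranked = sorted(WORKFLOWS, key=lambda w: w.score, reverse=True)
for w in ranked:
    print(f"{w.score:>3}  {tier(w.score):<28} {w.name}")
```

Because the score is a product, a 1 on any axis caps the result at 25, which is why a fully structured workflow cannot reach the AI-candidate band no matter how large it is.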
Discovery Pipeline: From Logs to Ranked Candidates
A five-step sequence that takes raw system event data to a prioritized shortlist. Sequencing is the leverage point
The pipeline is not complicated. The sequencing is the leverage point. Teams that skip straight to AI suitability scoring before running variant analysis and bottleneck profiling end up scoring assumptions rather than what the log data shows.
Step 1: Source system selection. Not every system of record produces clean event logs. SAP and Oracle are the richest sources for finance and procurement. ServiceNow surfaces IT and HR. Salesforce covers sales and customer success. Jira and similar tools capture engineering and operations. The practical constraint is connector availability — Celonis ships hundreds of pre-built extractors; Apromore and the open-source tools require manual ETL. Start with the two systems that cover your highest-volume workflows. Not the ones that are easiest to connect.
Step 2: Variant analysis. Once the log is loaded, the mining tool reconstructs every distinct case sequence and ranks by frequency. This is the moment of truth. The variant map tells you whether what you thought was a single workflow is six workflows under one name. High-variant processes (30+) flag immediately as LLM territory. Low-variant, high-frequency processes are clean RPA candidates. Processes with high variant counts but low frequency might be worth decomposing — the top three variants might be automatable even if the long tail is messy.
Step 3: Bottleneck identification. The mining tool shows where cases accumulate wait time — which step holds cases the longest in aggregate hours across the year. This is not the same as cycle time. A step that takes two minutes per case but processes 100,000 cases per year at 60% yield is a massive bottleneck even though individual cases move fast. Automate the bottleneck, not the step that is technically interesting. The ROI calculation is straightforward: hours accumulated × hourly cost × fraction automatable.
Step 4: AI suitability overlay. Apply Volume × Variance × Structure to the top twenty bottlenecks by accumulated wait time. That produces your ranked candidate list. Anything above 40 earns a pre-mortem. Anything above 70 earns a funded pilot.
Step 5: Pre-mortem and commit. Before committing to the top three, run a structured pre-mortem against each candidate (next section). The candidates that survive become committed pilots. The ones that do not go back into the pipeline for the next cycle.
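Step 3's accumulated-wait ranking and its ROI formula can be sketched as follows. This is an illustration under assumptions, not how any particular tool computes bottlenecks: the `EVENTS` data is invented, the gap before an event is attributed to the step being entered, and the hourly cost and automatable fraction are placeholder inputs.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event log rows: (case_id, activity, ISO timestamp).
EVENTS = [
    ("T-1", "Submit", "2025-03-01T09:00"),
    ("T-1", "Approve", "2025-03-03T09:00"),
    ("T-1", "Close", "2025-03-03T10:00"),
    ("T-2", "Submit", "2025-03-02T09:00"),
    ("T-2", "Approve", "2025-03-03T09:00"),
    ("T-2", "Close", "2025-03-03T11:00"),
]


def accumulated_wait_hours(events):
    """Total hours cases spent waiting to reach each step, across all cases."""
    by_case = defaultdict(list)
    for case_id, activity, ts in events:
        by_case[case_id].append((datetime.fromisoformat(ts), activity))
    wait = defaultdict(float)
    for steps in by_case.values():
        steps.sort()
        # Assumption: the gap between two events counts against the later step.
        for (prev_ts, _), (ts, activity) in zip(steps, steps[1:]):
            wait[activity] += (ts - prev_ts).total_seconds() / 3600
    return dict(wait)


def annual_value(hours, hourly_cost, fraction_automatable):
    """Hours accumulated x hourly cost x fraction automatable."""
    return hours * hourly_cost * fraction_automatable


wait = accumulated_wait_hours(EVENTS)
worst_step = max(wait, key=wait.get)
value = annual_value(wait[worst_step], hourly_cost=50, fraction_automatable=0.5)
print(worst_step, wait[worst_step], value)
```

Ranking steps by the aggregate rather than the per-case figure is what lets a fast but very high-volume step outrank a slow but rare one.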
Tool Landscape, Honestly Assessed
Six platforms ranked by fit, cost, and what they actually integrate with — not by analyst-quadrant decoration
The process mining market is dominated by Celonis — roughly 60% market share, top of analyst rankings[1][4]. Market leadership is not the same as fit for your stack. The honest assessment turns on three questions: what are your primary source systems, how much technical capacity do you have for setup, and what is your budget horizon?
The enterprise space consolidated hard between 2019 and 2021. UiPath acquired ProcessGold in 2019. SAP acquired Signavio and IBM acquired myInvenio in 2021, within months of each other. That was a signal: the large automation and ERP vendors recognized process discovery as a strategic moat for their automation suites[2]. The consequence: choosing a mining tool is now partly a bet on your broader vendor ecosystem.
Tools Worth Evaluating in 2026
- ✓ Celonis: Market leader at 60% share, built for large enterprise transformation. Strongest SAP and Salesforce connectors, hundreds of pre-built extractors. Expensive at the enterprise tier; PQL literacy required. The right pick when SAP runs the company.
- ✓ Apromore: Open-core with a free Community Edition and an Enterprise tier. Academic roots at the University of Melbourne; strong on non-SAP source systems. Acquired by Salesforce, increasingly the mid-market pick. Best fit for teams that want control over the underlying model without a Celonis budget.
- ✓ Disco (Fluxicon): Single-analyst desktop tool. Best for a quick discovery sprint on one workflow. Not a full enterprise platform, but fast to start with clean XES or CSV exports.
- ✓ UiPath Process Mining (formerly ProcessGold): Tightly coupled to UiPath's RPA suite. If UiPath already runs your automation, the discovery-to-deployment loop is shorter here than anywhere else. RPA-flavored by heritage; less suited for LLM-focused discovery.
- ✓ IBM Process Mining (formerly myInvenio): Embedded in IBM Cloud Pak for Business Automation, with watsonx integration. Right pick when IBM infrastructure or z/OS environments dominate. Discovery and execution stay inside the IBM stack.
- ✓ SAP Signavio: SAP-native process intelligence, tightly integrated with the SAP Business Technology Platform. The obvious pick if SAP is your source of truth and integration friction is the priority constraint. Less competitive outside the SAP ecosystem.
The Pre-Mortem: What Will Block This Automation
A high suitability score does not guarantee delivery. Run the pre-mortem before committing budget
The AI Suitability Score tells you whether a workflow is technically worth automating. The pre-mortem tells you whether your organization can ship it. Different questions. Conflating them is how teams burn six months building pilots that never deploy.
Five blockers, every time.
- Data access: does your team have the permissions and API access for the systems involved, or are you about to spend three months waiting on InfoSec?
- Change management: who owns this process today, and are they a sponsor or a blocker?
- Audit and regulatory: does the workflow carry compliance constraints that mandate human sign-off at specific steps, and has legal confirmed automation is permissible?
- Integration cost: what is the actual API work to wire the upstream source and downstream action systems, and does it fit your pilot budget?
- Ownership dispute: does this workflow span multiple departments, and is there a single accountable owner for the automation? If two departments share ownership, the pilot stalls in cross-team approval loops every time.
Pre-Mortem Checklist for the Top-Ranked Candidates
- ✓ Data access confirmed: system API credentials and read permissions secured for every source system in the workflow
- ✓ Process owner identified and enrolled as a sponsor: actively co-owning the pilot outcome, not merely informed
- ✓ Legal and compliance review complete: audit trail requirements documented, automation permissibility confirmed in writing
- ✓ Integration cost estimated: upstream triggers and downstream actions mapped, not just the LLM inference step
- ✓ Change management plan sketched: who changes their workflow, when they are told, who trains them
- ✓ Success metric agreed: one primary metric (cycle time, exception rate, cost per case) every stakeholder accepts as the scaling decision criterion
Task Mining: When the Logs Are Not Enough
When the workflow lives on screens instead of in databases, you need a different instrument
Process mining works on system-of-record event logs. A significant share of enterprise work happens on surfaces that emit nothing structured: Excel spreadsheets, email threads, browser-based internal tools, legacy desktop applications. When an analyst exports an ERP table into a spreadsheet, reconciles it against a PDF invoice, pastes the result into a web form, and emails the approver — none of those steps appear in your SAP event log. They are invisible to process mining.
Task mining fills the gap by recording screen activity (keystrokes, mouse movements, application context) and using computer vision and NLP — increasingly LLM-based — to reconstruct the actual task sequence[3]. UiPath Process Mining's task capture, Celonis Task Mining, and Microsoft Power Automate Process Mining all offer desktop recording. The output looks similar to process mining: a variant map of how analysts actually execute screen-level tasks, annotated with frequency and duration.
The practical constraint is consent and privacy. Screen recording requires explicit employee consent and disciplined data governance — especially in regulated industries. Setup overhead is higher than connector-based mining. Data quality depends on recording duration and coverage. For knowledge worker workflows with high strategic value but no system-of-record footprint, task mining is the right tool. For everything with clean event logs, stay in traditional process mining.
Anti-Patterns That Burn the Mining Budget
Five ways organizations spend on mining and get no AI candidates out the other end
Mining Without an Executive Sponsor
Process mining surfaces uncomfortable truths about which processes are broken and who owns them. Without a C-suite sponsor explicitly committed to acting on the findings, the output is a beautiful process map nobody uses to make a decision. Lock in the sponsor before the tool contract.
The Complete Map Trap
Some teams insist on mapping every operational process before selecting any candidates. The result is a six-month enterprise-wide inventory and zero pilots. Mine two source systems, score the top twenty, pick three. The remaining processes are still there next quarter.
Confusing a Staffing Problem for an Automation Opportunity
Process mining will faithfully show you that cases accumulate for forty-eight hours at a specific approval step. What it will not tell you is that the wait exists because one approver is overloaded and the backlog is a staffing decision, not an automation opportunity. Always ask whether a bottleneck has a human solution before building a technical one.
The Vanity Variant Count
High variant counts are a signal, not a verdict. Thirty variants might mean twenty-eight of them account for two percent of cases — the high-frequency variants are stable and automatable. Always segment by variant frequency before declaring a process too variable for RPA. The long tail is often noise, not signal.
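A quick way to segment by variant frequency is to ask how many of the top variants are needed to cover a target share of cases. The sketch below uses an invented distribution that mirrors the example in the text (thirty variants, twenty-eight of which share two percent of cases); the `variants_needed` helper and the 90% default are assumptions.

```python
def variants_needed(variant_case_counts, target_share=0.9):
    """Smallest number of top variants whose cases reach target_share."""
    total = sum(variant_case_counts)
    covered = 0
    ordered = sorted(variant_case_counts, reverse=True)
    for rank, count in enumerate(ordered, start=1):
        covered += count
        if covered / total >= target_share:
            return rank
    return len(variant_case_counts)


# Hypothetical: 10,000 cases over 30 variants; two variants dominate and
# the other 28 together hold 200 cases (2%).
COUNTS = [6000, 3800] + [8] * 24 + [2] * 4

needed = variants_needed(COUNTS)
print(f"{needed} of {len(COUNTS)} variants cover 90% of cases")
```

If a handful of variants covers most of the volume, the headline variant count overstates how variable the process really is.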
Treating Mining as a One-Time Project
Mining is most valuable as a continuous intelligence layer, not a one-off discovery exercise. Processes change. Systems are updated. Policies shift. Volume spikes during seasonal periods. Organizations that run one sprint, pick three automations, and cancel the license find their automation portfolio drifting out of sync with how processes actually operate.
First 90 Days: From Zero to Committed Pilots
A concrete sequence — without boiling the ocean
- [01] Pick Two Source Systems and Connect Them (Days 1–30)
Pick the two systems of record carrying the highest operational process volume, typically SAP or Oracle for finance and procurement, ServiceNow or Jira for IT and operations. Avoid starting with CRM systems; their data quality is lower and their processes are harder to scope. Stand up the mining tool (Disco trial for a proof-of-concept sprint; Celonis or Apromore for a production deployment), extract six to twelve months of event log data, and render the first process graph.
- [02] Run the 30-Day Mining Proof (Days 15–45)
A focused analysis sprint. Not a full process inventory. The goal is to score the top twenty bottlenecks from the two source systems against the AI Suitability model. One process analyst (or a data-literate ops manager) owns the sprint. Output: a scored table of twenty candidates with notes on the primary blockers for each.
- [03] Score the Top Ten and Pre-Mortem the Top Five (Days 30–60)
Shortlist the top ten by AI Suitability Score and take the top five into a structured pre-mortem with the relevant process owners. Most organizations find that two or three candidates are technically strong but organizationally blocked: data access, regulatory constraints, ownership disputes. Remove the blocked ones. The survivors become your pilot shortlist.
- [04] Commit to Three Pilots With Funding and Timelines (Days 60–90)
Commit to three pilots with explicit funding, timelines, and success criteria. Structure them as the first article in this companion series recommends: one low-risk/high-signal, one medium-risk/high-value, one exploratory. Each should test something different about your organization's ability to execute automation. Set a 60-day go/no-go review for each pilot.
Common Questions
The objections that surface in every mining conversation
Do we need Celonis specifically, or can we DIY this?
You do not need Celonis. For a proof-of-concept sprint, Disco (Fluxicon) is free for limited datasets and produces variant maps and bottleneck analysis good enough to score your first ten candidates. Apromore's Community Edition is open-source and covers the full mining feature set. Celonis pays back when you are deploying enterprise-wide across hundreds of processes and need pre-built SAP connectors, a monitoring layer, and commercial support. For a 90-day discovery sprint with one analyst, start with a lighter tool. Graduate to a platform vendor after the approach has been validated in your org.
What if our processes do not have clean event logs?
Most processes have better log data than their owners think. The real question is whether you can access it and whether it is in extractable form. SAP stores event data in table structures that Celonis and similar tools already know how to read. ServiceNow has native event logs. Even without a native connector, any table with a case ID, an activity or status field, and a timestamp is enough to build an event log. The blocker is usually InfoSec permissions, not data absence. Start by mapping which of your systems of record have case-level timestamped status tables. You will find more clean log data than you expect.
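To make the "case ID, activity, timestamp" point concrete, here is a minimal sketch that turns a status-history table into an event log and sets aside incomplete rows. The `STATUS_ROWS` data, the column names, and `to_event_log` are hypothetical; a real extract would come from a database query or CSV export.

```python
from datetime import datetime

# Hypothetical status-history rows as they might come out of a ticketing table.
STATUS_ROWS = [
    {"ticket": "T-7", "status": "Open", "changed_at": "2025-02-01T08:00"},
    {"ticket": "T-7", "status": "Assigned", "changed_at": "2025-02-01T12:00"},
    {"ticket": "T-8", "status": "Open", "changed_at": ""},  # missing timestamp
    {"ticket": "", "status": "Closed", "changed_at": "2025-02-02T09:00"},  # missing case ID
    {"ticket": "T-7", "status": "Closed", "changed_at": "2025-02-03T09:00"},
]


def to_event_log(rows, case_key, activity_key, time_key):
    """Split rows into usable (case, activity, timestamp) events and rejects."""
    events, rejected = [], []
    for row in rows:
        case_id = row.get(case_key)
        activity = row.get(activity_key)
        ts = row.get(time_key)
        if not (case_id and activity and ts):
            rejected.append(row)
            continue
        events.append((case_id, activity, datetime.fromisoformat(ts)))
    # Order by case, then by time, so each case reads as a sequence.
    events.sort(key=lambda e: (e[0], e[2]))
    return events, rejected


events, rejected = to_event_log(STATUS_ROWS, "ticket", "status", "changed_at")
print(len(events), "usable events;", len(rejected), "rejected rows")
```

The rejected count doubles as a first data-quality check: if a large share of rows is missing one of the three fields, fix the extract before mining it.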
How do we handle workflows that span systems we do not own?
Cross-system workflows are common and harder to mine cleanly. The standard approach: pick the system that holds the case for the longest portion of the cycle time — that is where bottleneck measurement will be most accurate. For handoff points between systems, you need a shared case identifier that persists across both (an invoice number, a ticket ID, a contract ID). Without a shared key, the cross-system join is unreliable. If you cannot join the logs cleanly, mine each system separately and treat the handoff itself as a candidate bottleneck rather than reconstructing the full cross-system view.
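The shared-key join and the handoff-as-bottleneck idea can be sketched as below. The two dictionaries stand in for per-system extracts keyed by a shared invoice number; the data, the names, and the `handoff_gaps` helper are hypothetical.

```python
from datetime import datetime

# Hypothetical extracts: last event per case upstream, first event downstream.
ERP_LAST_EVENT = {
    "INV-1": "2025-04-01T10:00",
    "INV-2": "2025-04-01T12:00",
    "INV-3": "2025-04-02T09:00",
}
TICKET_FIRST_EVENT = {
    "INV-1": "2025-04-01T16:00",
    "INV-2": "2025-04-03T12:00",
    "INV-9": "2025-04-05T09:00",
}


def handoff_gaps(upstream_last, downstream_first):
    """Hours between systems per shared key, plus keys that fail to join."""
    shared = upstream_last.keys() & downstream_first.keys()
    gaps = {
        key: (
            datetime.fromisoformat(downstream_first[key])
            - datetime.fromisoformat(upstream_last[key])
        ).total_seconds() / 3600
        for key in shared
    }
    unmatched = (upstream_last.keys() | downstream_first.keys()) - shared
    return gaps, unmatched


gaps, unmatched = handoff_gaps(ERP_LAST_EVENT, TICKET_FIRST_EVENT)
match_rate = len(gaps) / (len(gaps) + len(unmatched))
print(f"joined {match_rate:.0%} of keys; total handoff wait {sum(gaps.values())} h")
```

A low match rate is the signal that the cross-system join is unreliable and that each system should be mined separately.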
What is the difference between process mining and task mining?
Process mining works on structured event logs from systems of record: timestamped records of which case hit which state, when. It is best for workflows that run primarily inside enterprise software. Task mining works on screen recordings. It captures what users actually do on their desktops, including operations that never touch a system log (copy-paste from email, Excel manipulation, browser forms). Process mining is faster to set up, cheaper to run, and produces cleaner data. Task mining is necessary when the workflow lives partly or wholly in desktop applications that emit nothing structured. Most organizations should start with process mining and add task mining only for specific knowledge worker workflows where system log coverage is low.
How does this connect to picking our first three AI workflows?
Mining is the upstream data layer for the workflow selection framework covered in the companion article on picking your first three AI workflows. That framework (risk × value × signal quality) tells you how to structure your first three pilots once you have candidates. Mining tells you which candidates to put into the framework in the first place. The two approaches run in sequence: mine first to generate a ranked candidate list, then apply the selection framework to decide which three to commit to as pilots. Using the selection framework without mined data means scoring workshop opinions rather than measured process data — which is where most first AI programs go wrong.
Process Mining for AI Discovery Checklist
Two source systems identified by highest operational volume and accessible event logs
Data access permissions secured before tool procurement (InfoSec, API credentials)
Mining tool matched to stage: Disco or Apromore for a discovery sprint, Celonis or IBM for enterprise deployment
Event logs extracted and validated — case ID, activity, and timestamp fields complete
Variant analysis rendered, top twenty workflows by accumulated wait time identified
Each workflow scored on Volume × Variance × Structure (1–5 per axis, multiplied)
Workflows above score 40 flagged as candidates; above 70 flagged as priority pilots
Pre-mortem run on top five — data access, ownership, compliance, integration cost, success metric
C-suite sponsor enrolled before the mining program commits — actively co-owning, not informed
Legal and compliance sign-off captured in writing for any regulatory-adjacent candidate workflows
Top three pilots committed with funding, named pilot owners (not just sponsors), and 60-day go/no-go criteria
Next 90-day mining cycle scheduled to keep the candidate pipeline fresh as processes evolve
The question your organization keeps asking — which workflows should we automate with AI — has a data answer. It is sitting in your SAP tables, your ServiceNow event log, your Salesforce activity history. The workflows worth building share a profile: high volume, high variance, unstructured inputs. Cases where an LLM's ability to read context and handle exceptions outperforms any rule engine you could write. Mining gives you the ranked list. Volume × Variance × Structure gives you the tooling verdict. The pre-mortem tells you which ones you can actually ship. None of this requires a workshop.
Stop building AI use cases out of opinion. Your event logs already know which workflows are painful, repetitive, and ripe for automation. Mine them, score them, run the pre-mortem, then pick. The team that ships the wrong thing confidently is still shipping the wrong thing. The logs have been telling you that the whole time.
- [1] Celonis Named a Leader in Process Mining Market Assessment. Celonis Blog (celonis.com)
- [2] Celonis process intelligence turns enterprise AI into ROI. SiliconANGLE, February 2026 (siliconangle.com)
- [3] 6 trends shaping process mining in 2026. Process Excellence Network (processexcellencenetwork.com)
- [4] Celonis Process Mining: Products, Features & Competitors. AIMultiple Research (research.aimultiple.com)
- [5] What is Process Mining? IBM Think (ibm.com)
- [6] Top Celonis Process Mining Alternatives: Signavio, Apromore, ARIS, and ABBYY Timeline. mindzie (mindzie.com)
- [7] Process Mining Software Market Size & Share Analysis. Mordor Intelligence (mordorintelligence.com)
- [8] Process Mining AI Explained: Benefits, Use Cases, and ROI. RTS Labs (rtslabs.com)