Scalable Conformance Checking for Business Processes
Daniel Reißner, Raffaele Conforti, Marlon Dumas, Marcello La Rosa, Abel Armas-Cervantes
Process mining
Process mining is a family of methods for analyzing business processes based on event logs.
• Some of the most important process mining operations:
  • Discovery
  • Conformance checking
  • Enhancement
Applications of conformance checking
Conformance checking supports compliance auditing, model quality measures (➢ fitness, precision, etc.), model repair and deviance mining. Typical questions:
• How well do process executions fit a normative model?
• What is the quality of a discovered process model?
• How can we adapt the process model to fit reality better?
• What are the current compliance risks?
• Are there any employee innovations?
Existing approaches: Trace alignments
| id | trace |
|---|---|
| (1) | ⟨C, B, D, F, E⟩ |
| (2) | ⟨B, C, D, E, I, G, D, F⟩ |
[Figure: trace alignment of trace (1) between the event log and the process model, showing one optimal alignment and all optimal alignments; ≫ marks a skipped position]
• Adopt interleaving semantics
• Build a synchronous net for each trace
• Use an A* algorithm to find the closest trace in the model for each trace in the log
• Mismatches are reported as task misalignments, i.e. moves on model or moves on log (≫)
• The one-optimal variant returns one model path with a minimal number of misalignments
• The all-optimal variant returns all model paths with a minimal number of misalignments
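To make the one-optimal versus all-optimal distinction concrete, here is a minimal Python sketch. It does not build a synchronous net or run A*; it swaps in a plain dynamic-programming edit distance (synchronous moves cost 0, moves on log and moves on model cost 1) against an explicitly listed, finite set of model traces. The function name `optimal_alignments` and the four model traces are hypothetical, chosen to mimic an assumed loop-free part of the example model.

```python
SKIP = "≫"  # placeholder for the missing side of a move on log / move on model


def optimal_alignments(trace, model_traces):
    """Return (cost, alignments) listing every optimal alignment of `trace`
    against a finite set of model traces. Synchronous moves cost 0; moves on
    log and moves on model cost 1. Each alignment is a list of (log, model) pairs."""
    best_cost, best = None, []
    for run in model_traces:
        n, m = len(trace), len(run)
        # d[i][j] = minimal cost of aligning trace[:i] with run[:j]
        d = [[i + j if i == 0 or j == 0 else 0 for j in range(m + 1)]
             for i in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d[i][j] = min(d[i - 1][j], d[i][j - 1]) + 1
                if trace[i - 1] == run[j - 1]:
                    d[i][j] = min(d[i][j], d[i - 1][j - 1])

        def backtrack(i, j):
            # Enumerate all optimal alignments of trace[:i] with run[:j]
            if i == 0 and j == 0:
                return [[]]
            found = []
            if i and j and trace[i - 1] == run[j - 1] and d[i - 1][j - 1] == d[i][j]:
                found += [a + [(trace[i - 1], run[j - 1])] for a in backtrack(i - 1, j - 1)]
            if i and d[i - 1][j] + 1 == d[i][j]:  # move on log
                found += [a + [(trace[i - 1], SKIP)] for a in backtrack(i - 1, j)]
            if j and d[i][j - 1] + 1 == d[i][j]:  # move on model
                found += [a + [(SKIP, run[j - 1])] for a in backtrack(i, j - 1)]
            return found

        cost = d[n][m]
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, backtrack(n, m)
        elif cost == best_cost:
            best += backtrack(n, m)
    return best_cost, best


# Hypothetical model language: B and C in either order, then D, then E or F
MODEL = [("B", "C", "D", "E"), ("C", "B", "D", "E"),
         ("B", "C", "D", "F"), ("C", "B", "D", "F")]
cost, aligns = optimal_alignments(("C", "B", "D", "F", "E"), MODEL)
one_optimal = aligns[0]   # one-optimal: any single minimal-cost alignment
print(cost, len(aligns))  # 1 2
```

Here `aligns[0]` is a one-optimal answer and the whole list is the all-optimal answer for trace (1). With loops in the model the set of model traces becomes infinite, which is why real tools search the model's state space instead of enumerating traces.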
Existing approaches: Behavioral alignment
[Figure: the event log (traces (1) and (2)) compared with the process model]
• Adopts true concurrency semantics
• Translates model and log to prime event structures (PES)
• Uses an A* algorithm to find the closest run in the model PES for each run in the log PES
[Figure: the PES of the event log compared with the PES of the process model]
Behavioral mismatch (1): In the log, after “D”, “F” is substituted by “E”.
Behavioral mismatch (2): In the log, after “D”, “F” occurs before “E”, while in the model they are mutually exclusive.
• Mismatches are gathered as event misalignments, i.e. moves on model or moves on log (≫)
• Mismatches of behavioral relations can be detected
• Differences can be reported as natural language statements
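As a small illustration of what the PES-based comparison works with, the sketch below classifies every pair of events of a prime event structure as causality, conflict (direct or inherited) or concurrency (neither). The function `pes_relations` is hypothetical, and both example PESs are assumed encodings: the model PES covers only the loop-free part of the example model (B and C concurrent, D after both, E and F mutually exclusive), and the log run reflects trace (1), where F occurs before E as in mismatch (2).

```python
from itertools import combinations


def pes_relations(events, causality, conflict):
    """Classify every pair of events of a prime event structure (PES).

    causality: set of (e, f) pairs, e being a direct cause of f
    conflict:  set of frozenset({e, f}) pairs in direct conflict
    Returns {frozenset({e, f}): "causality" | "conflict" | "concurrency"}.
    """
    # Transitive closure of causality (Warshall's algorithm)
    after = {e: {f for (c, f) in causality if c == e} for e in events}
    for k in events:
        for i in events:
            if k in after[i]:
                after[i] |= after[k]

    def leq(a, b):  # a causally precedes or equals b
        return a == b or b in after[a]

    relations = {}
    for e, f in combinations(events, 2):
        if f in after[e] or e in after[f]:
            kind = "causality"
        elif any((leq(a, e) and leq(b, f)) or (leq(a, f) and leq(b, e))
                 for a, b in (tuple(pair) for pair in conflict)):
            kind = "conflict"  # direct, or inherited from conflicting causes
        else:
            kind = "concurrency"  # neither causality nor conflict
        relations[frozenset((e, f))] = kind
    return relations


# Assumed model PES: B and C concurrent, D after both, E and F mutually exclusive
model = pes_relations(["B", "C", "D", "E", "F"],
                      {("B", "D"), ("C", "D"), ("D", "E"), ("D", "F")},
                      {frozenset({"E", "F"})})
# Assumed log run for trace (1): D, then F, then E
log = pes_relations(["D", "E", "F"], {("D", "F"), ("F", "E")}, set())
print(model[frozenset({"E", "F"})], log[frozenset({"E", "F"})])  # conflict causality
```

The differing relation for the pair {E, F} (conflict in the model, causality in the log) is the kind of relation mismatch that mismatch (2) verbalizes.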
Scalability challenges of current approaches
• Trace alignment does not scale to large logs; in some cases it cannot compute all optimal alignments
• Behavioral alignment is generally slower than trace alignment
• Scalability issues of conformance checkers also affect other techniques, such as model repair or process discovery, which rely on conformance checking to assess the quality of their outputs
Research question and desiderata
RQ: How can we improve scalability of conformance checking techniques with large
and noisy event logs while still providing a complete set of differences?
Desiderata:
• Compute one- or all-optimal alignments
• Report the results of the conformance checking as trace alignments and
behavioral statements
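Overview and general idea
[Figure: event log → compress → DAFSA; Petri net → expand → reachability graph; DAFSA and reachability graph → compare → PSP (partially synchronized product) → optimal alignments and difference statements; steps numbered (1) to (3)]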
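From event log to DAFSA
| Trace | N |
|---|---|
| ⟨B, D, E⟩ | 5 |
| ⟨B, D, F⟩ | 10 |
| ⟨C, B, D, E⟩ | 15 |
| ⟨C, B, D, F⟩ | 5 |
[Figure: the log drawn as a prefix tree (states s, n1 to n5, f1 to f4) and its compression into a DAFSA]
The DAFSA represents the log through its prefixes (⟨B, D⟩, ⟨C, B, D⟩) and suffixes (⟨D, F⟩, ⟨D, E⟩).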
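The compression step can be sketched in Python as follows: insert the distinct traces into a prefix tree, then merge nodes bottom-up whenever they agree on finality and on their labelled outgoing edges. This is a simple trie-then-minimize construction and not necessarily the algorithm used in ProConformance; `build_dafsa` is a hypothetical name, and the trace frequencies (N) are ignored because they do not change the automaton.

```python
from collections import Counter


def build_dafsa(traces):
    """Build a minimal acyclic DFA (DAFSA) accepting exactly `traces`.

    Returns (start, finals, transitions), where transitions maps
    state -> {label: next_state}."""
    # 1. Prefix tree; each node is [is_final, {label: child_index}]
    trie = [[False, {}]]
    for trace in traces:
        node = 0
        for label in trace:
            children = trie[node][1]
            if label not in children:
                trie.append([False, {}])
                children[label] = len(trie) - 1
            node = children[label]
        trie[node][0] = True

    # 2. Merge equivalent nodes bottom-up: nodes are equivalent when they agree
    #    on finality and their labelled edges lead to equivalent nodes.
    register = {}  # (is_final, edges) signature -> representative node

    def minimize(node):
        is_final, children = trie[node]
        edges = tuple(sorted((label, minimize(child))
                             for label, child in children.items()))
        return register.setdefault((is_final, edges), node)

    start = minimize(0)
    transitions, finals = {}, set()
    for (is_final, edges), rep in register.items():
        transitions[rep] = dict(edges)
        if is_final:
            finals.add(rep)
    return start, finals, transitions


# Example log from the slide: trace -> frequency N (frequencies are not used here)
log = Counter({("B", "D", "E"): 5, ("B", "D", "F"): 10,
               ("C", "B", "D", "E"): 15, ("C", "B", "D", "F"): 5})
start, finals, delta = build_dafsa(log)
print(len(delta), sum(len(out) for out in delta.values()))  # 5 6
```

On the example log, the 10-state prefix tree shrinks to 5 states and 6 transitions, the same shape as the DAFSA with states s, n1, n2, n3 and f1 used in the PSP construction.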
From process model to reachability graph
[Figure: the process model as a Petri net (places p1 to p10; transitions B, C, D, E, F, G, I and τ-transitions), its reachability graph (markings [p1], [p2, p3], [p5, p3], [p2, p4], [p5, p4], [p6], [p7], [p8], [p9], [p10]) and the resulting τ-less reachability graph]
Why remove τ-transitions:
• Reduce the state space for conformance checking
• Reduce uninterpretable conformance results for the end user
How to remove them:
• For each τ not targeting a final marking, insert a copy of each outgoing arc of the target of τ and link it to the source of τ;
• otherwise, use each incoming arc of its source instead
• Remove unconnected markings
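The τ-removal rules above can be sketched as follows, with the reachability graph stored as a set of (source, label, target) arcs. Three points are assumptions rather than details taken from the slides: the graph has no cycle made only of τ-arcs (otherwise the loop may not terminate), "removing unconnected markings" is read as keeping only markings that are reachable from the initial marking and can still reach a final marking, and the example graph is a simplified, hypothetical version of the slide's graph without the I/G loop. `remove_tau` and `trim` are illustrative names.

```python
TAU = "tau"  # label used for silent transitions


def trim(arcs, initial, finals):
    """Keep markings reachable from `initial` that can also reach a final marking."""
    succ, pred = {}, {}
    for src, _, tgt in arcs:
        succ.setdefault(src, set()).add(tgt)
        pred.setdefault(tgt, set()).add(src)

    def closure(seeds, step):
        seen, stack = set(seeds), list(seeds)
        while stack:
            for nxt in step.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    keep = closure([initial], succ) & closure(finals, pred)
    return {(s, a, t) for (s, a, t) in arcs if s in keep and t in keep}


def remove_tau(arcs, initial, finals):
    """Remove tau-arcs from a reachability graph of (source, label, target) arcs.
    Assumes there is no cycle made only of tau-arcs."""
    arcs = set(arcs)
    while True:
        taus = sorted(arc for arc in arcs if arc[1] == TAU)
        if not taus:
            return trim(arcs, initial, finals)
        src, _, tgt = taus[0]
        arcs.discard((src, TAU, tgt))
        if tgt not in finals:
            # Copy each outgoing arc of the tau's target so it leaves from the tau's source
            arcs |= {(src, a, t) for (s, a, t) in arcs if s == tgt}
        else:
            # The tau reaches a final marking: copy each incoming arc of the tau's
            # source and redirect the copy to that final marking
            arcs |= {(s, a, tgt) for (s, a, t) in arcs if t == src}


# Hypothetical, simplified reachability graph (I/G loop omitted)
GRAPH = {
    ("[p1]", TAU, "[p2,p3]"),
    ("[p2,p3]", "B", "[p5,p3]"), ("[p2,p3]", "C", "[p2,p4]"),
    ("[p5,p3]", "C", "[p5,p4]"), ("[p2,p4]", "B", "[p5,p4]"),
    ("[p5,p4]", "D", "[p7]"),
    ("[p7]", "E", "[p8]"), ("[p7]", "F", "[p9]"),
    ("[p8]", TAU, "[p10]"), ("[p9]", TAU, "[p10]"),
}
reduced = remove_tau(GRAPH, "[p1]", {"[p10]"})
for arc in sorted(reduced):
    print(arc)
```

In the result, [p1] leads directly to [p5, p3] via B and to [p2, p4] via C, [p2, p3] disappears as unreachable, and [p7] reaches the final marking [p10] directly via E or F.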
PSP construction with the A* algorithm
[Figure: A* search over pairs (marking of the τ-less reachability graph, DAFSA node), starting from ([p1], s), for the current trace ⟨B, D, E⟩. Each arc is labelled ⟨match, x⟩ (both sides move on x), ⟨lhide, x⟩ (only the DAFSA moves) or ⟨rhide, x⟩ (only the reachability graph moves), and each node carries a cost c = g + h. The optimal path is ⟨match, B⟩, ⟨rhide, C⟩, ⟨match, D⟩, ⟨match, E⟩, ending in ([p10], f1).]
Prefix memoization: the alignment of the shared prefix ⟨B, D⟩ is reused for the trace ⟨B, D, F⟩, which then only needs ⟨match, F⟩.
Suffix memoization: the node ([p5, p4], n1) together with the suffix ⟨D, E⟩ is reused for ⟨C, B, D, E⟩ and ⟨C, B, D, F⟩, which reach it via ⟨match, C⟩ (through ([p2, p4], n3)) and ⟨match, B⟩.
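The search can be sketched for a single trace as follows. Here a search node pairs a marking of the τ-less reachability graph with a position in the trace; the slide pairs it with a DAFSA node instead, which is what enables prefix and suffix memoization, and both memoizations are omitted. Moves are match (cost 0), lhide (move on log, cost 1) and rhide (move on model, cost 1), and nodes are ordered by c = g + h. The heuristic, which counts distinct remaining trace labels that the model can no longer produce, is a hypothetical choice that never overestimates and is not taken from the slides; `align`, `future_labels` and the example graph (I/G loop omitted) are illustrative assumptions.

```python
import heapq
from itertools import count


def future_labels(arcs):
    """Map each marking to the labels that can still occur from it (fixpoint, loop-safe)."""
    fut = {m: set() for (src, _, tgt) in arcs for m in (src, tgt)}
    changed = True
    while changed:
        changed = False
        for src, label, tgt in arcs:
            new = fut[tgt] | {label}
            if not new <= fut[src]:
                fut[src] |= new
                changed = True
    return fut


def align(trace, arcs, initial, finals):
    """A* search for a minimal-cost alignment of one trace against a tau-less
    reachability graph given as (source, label, target) arcs.
    Returns (cost, ops) with ops such as ("match", "B"), or None."""
    out = {}
    for src, label, tgt in arcs:
        out.setdefault(src, []).append((label, tgt))
    fut = future_labels(arcs)

    def h(marking, i):
        # Hypothetical heuristic: distinct remaining trace labels the model can no
        # longer produce; each needs at least one lhide, so h never overestimates.
        return len(set(trace[i:]) - fut.get(marking, set()))

    tie = count()  # tie-breaker so heap entries never compare states or paths
    start = (initial, 0)
    best_g = {start: 0}
    frontier = [(h(*start), 0, next(tie), start, [])]
    while frontier:
        _, g, _, (marking, i), ops = heapq.heappop(frontier)
        if g > best_g[(marking, i)]:
            continue  # stale entry: a cheaper path to this node was found later
        if marking in finals and i == len(trace):
            return g, ops
        moves = []
        if i < len(trace):
            moves.append(((marking, i + 1), 1, ("lhide", trace[i])))  # move on log
        for label, tgt in out.get(marking, []):
            if i < len(trace) and trace[i] == label:
                moves.append(((tgt, i + 1), 0, ("match", label)))  # synchronous move
            moves.append(((tgt, i), 1, ("rhide", label)))  # move on model
        for node, cost, op in moves:
            new_g = g + cost
            if new_g < best_g.get(node, float("inf")):
                best_g[node] = new_g
                heapq.heappush(frontier, (new_g + h(*node), new_g, next(tie), node, ops + [op]))
    return None  # the final marking cannot be reached


# Tau-less reachability graph of the running example (I/G loop omitted)
GRAPH = {("[p1]", "B", "[p5,p3]"), ("[p1]", "C", "[p2,p4]"),
         ("[p5,p3]", "C", "[p5,p4]"), ("[p2,p4]", "B", "[p5,p4]"),
         ("[p5,p4]", "D", "[p7]"), ("[p7]", "E", "[p10]"), ("[p7]", "F", "[p10]")}
cost, ops = align(("B", "D", "E"), GRAPH, "[p1]", {"[p10]"})
print(cost, ops)
```

For ⟨B, D, E⟩ the sketch returns cost 1 with a single ⟨rhide, C⟩. Because B and C are concurrent, it may place rhide C before or after match B; both alignments are optimal.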
Patterns for conformance checking diagnosis
Unfitting behavior:
• Relation mismatch:
  1. Causality-Concurrency
  2. Conflict
• Event mismatch:
  3. Task skipping
  4. Task substitution
  5. Unmatched repetition
  6. Task relocation
  7. Task insertion / absence
L. García-Bañuelos, N. R. T. P. van Beest, M. Dumas, M. La Rosa, and W. Mertens: Complete and interpretable conformance checking of business processes. IEEE Trans. Softw. Eng., 2017.
Pattern detection in the example
[Figure: the PSP of the running example; the traces ⟨B, D, E⟩ and ⟨B, D, F⟩ need ⟨rhide, C⟩, while ⟨C, B, D, E⟩ and ⟨C, B, D, F⟩ are fully matched]
Behavioral alignment feedback:
• In the log, at the start of the trace, “C” is optional
• In the model, after “B”, “C” occurs before “D”
Evaluation setup
• Implemented the approach in an open-source Java tool: ProConformance 2.0 (available from http://apromore.org/platform/tools)
• Tested the approach in three setups:
  • Road traffic fines management process (RTFMP) ➢ publicly available model-log pair
  • BPI Challenge Log 2013 (BPIC13) ➢ artificially generated process model
  • SAP R/3 model collection (120 models) ➢ artificially created logs (2.5% to 10% noise)
    • 480 model-log pairs (120 models × 4 noise levels)
Evaluation results
Key findings:
• For all-optimal alignments, our technique outperforms trace alignment by one to two orders of magnitude
• Trace alignment timed out in 207 of the 480 SAP R/3 cases (given a time bound)
• For one-optimal alignments, our technique is 1.5 to nearly 40 times faster than trace alignment
• In BPIC13, one-optimal trace alignment outperforms our technique
Evaluation results
Number of all-optimal alignments found (values in parentheses: upper bound of the 95% confidence interval)
| Dataset | DAFSA | Trace alignment [#unfiltered] |
|---|---|---|
| RTFMP | 467 | 338 [1,898,182] |
| BPIC13 cp. | 28,656 | 22,259 [1,904,057] |
| SAP R/3 2.5% | 4,253 (22,675) | 1,233 [1,067,533] (6,470 [1,929,629]) |
| SAP R/3 5% | 7,672 (41,133) | 1,751 [1,224,079] (9,178 [2,199,248]) |
| SAP R/3 7.5% | 11,652 (61,504) | 2,154 [1,283,583] (14,207 [3,039,240]) |
| SAP R/3 10% | 15,754 (84,167) | 2,809 [1,286,568] (22,883 [3,302,068]) |
We detected 5 times more (all-optimal) alignments.
Future work
• Improve the handling of concurrency and nested loops
• Evaluate our technique using more complex models and logs
• Extend our technique to detect additional model behavior
• Explore different applications for our technique, e.g., process model repair,
drift detection, log delta analysis, etc.
Pattern detection in PSP
Detecting task skips
• Model: C, A, B in sequence; log: ⟨C, B⟩
• PSP: match(C), rhide(A), match(B)
• Statement: In the log, after “C”, “A” is optional.
Detecting task relocations
• Model: A, B, C in sequence; log: ⟨B, C, A⟩
• PSP: rhide(A), match(B), match(C), lhide(A)
• Statement: In the log, “A” appears after “C” instead of at the initial marking.
Detecting causality-conflict mismatches
• Model: C, A, B in sequence; log: after C, either A or B
• PSP: match(C), rhide(A), match(B) and match(C), match(A), rhide(B)
• Statement: In the model, after “C”, “A” occurs before “B”, while in the log they are mutually exclusive.
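A naive sketch of how the skip and relocation patterns above could be read off a single PSP path: an rhide without a matching lhide is reported as a skip, an rhide and an lhide of the same label as a relocation, and an lhide alone as an extra task. The context is simply the last matched label, so the sketch ignores concurrency (for the running example ⟨B, D, E⟩ it would say after “B” where the slides say at the start of the trace) and does not cover relation mismatches such as causality-conflict, which need several runs. `verbalize` and its exact wording are hypothetical.

```python
def verbalize(ops):
    """Naively verbalize task skips, relocations and extra tasks found on one
    PSP path given as (operation, label) pairs, e.g. ("rhide", "A")."""
    def context(label):
        return "at the start of the trace" if label is None else f'after "{label}"'

    last_match = None
    model_only, log_only = {}, {}  # label -> last matched label before its hide
    for op, label in ops:
        if op == "match":
            last_match = label
        elif op == "rhide":  # task in the model but not in the log
            model_only.setdefault(label, last_match)
        elif op == "lhide":  # task in the log but not in the model
            log_only.setdefault(label, last_match)

    statements = []
    for label, ctx in model_only.items():
        if label in log_only:  # hidden on both sides: relocation
            statements.append(f'In the log, "{label}" occurs {context(log_only[label])} '
                              f'instead of {context(ctx)}.')
        else:  # hidden only on the model side: skip
            statements.append(f'In the log, {context(ctx)}, "{label}" is optional.')
    for label, ctx in log_only.items():
        if label not in model_only:  # hidden only on the log side: extra task
            statements.append(f'In the log, {context(ctx)}, "{label}" occurs '
                              f'but the model does not allow it.')
    return statements


print(verbalize([("match", "C"), ("rhide", "A"), ("match", "B")]))
# ['In the log, after "C", "A" is optional.']
print(verbalize([("rhide", "A"), ("match", "B"), ("match", "C"), ("lhide", "A")]))
# ['In the log, "A" occurs after "C" instead of at the start of the trace.']
```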
Editor's Notes

  • #6: ADOPT
  • #10: [1] H. M. W. Verbeek and W. M. van der Aalst. Merging alignments for decomposed replay. In International Conference on Applications and Theory of Petri Nets and Concurrency, pp. 219–239. Springer International Publishing, June 2016.
    [2] L. García-Bañuelos, N. van Beest, M. Dumas, M. La Rosa, and W. Mertens. Complete and interpretable conformance checking of business processes. IEEE TSE, 43, 2017. In press.
  • #14: Translate nondeterministic to deterministic automaton
  • #16: Unmatched behavior as a way to avoid reporting in the generalization.
    We identified a complete set of mismatch patterns (those on the slide are for conformance checking; we have similar ones for log delta analysis). For each of these patterns we have a verbalization in natural language.
    We only report immediate causality (not transitive causality) and direct conflict (not inherited conflict), because we want to report each mismatch once:
    1. Immediate causality vs. concurrency
    2. Direct conflict vs. concurrency
    3. Direct conflict vs. immediate causality
    Each mismatch occurs in a given context, i.e. a pair of configurations, one for each PES.
    Relation mismatch patterns are O(n), where n is the number of arcs of the PSP (via optimizations of O(n^3)).
    Task absence / insertion is a “catch-all” pattern, essentially saying that there is a task at a given configuration in the PES of the log but not in the corresponding configuration in the PES of the model.
    Complete finding of fitness-related differences: Concurrency (Log) – Conflict; Concurrency (Log) – Causality.
  • #21: (S-components) Additional model behavior: unobserved behavior in the model but present in the log. We proposed a scalable conformance checking technique for handling large and nonconforming event logs. We remapped the problem of conformance checking to automaton synchronization: the DAFSA of an event log vs. the reachability graph of a model. We show that our technique scales well with big event logs, but imprecise process models pose a challenge.