Maximal Pattern Mining
Veronica Liesaputra¹, Sira Yongchareon¹ & Shivadon Chaisiri²
¹Unitec Institute of Technology & ²University of Waikato
New Zealand
Background
• Businesses use process models to help them monitor and improve their performance
  ◦ Hand-made → high level and understandable
  ◦ Do not align with reality → incorrect decisions
• Abundance of event data
  ◦ Generate models based on real processes → process mining
Event Log
• Trace: sequentially recorded events, e.g.
  START → Turn on hot & cold water → Check whether it is too hot/cold → Wait for 2 minutes → Check whether it has enough water → Wait for 2 minutes → Check whether it has enough water → Turn off hot & cold taps → END
• Each event has a name, agent, timestamp, resource, and input and output data
  ◦ To simplify, we only consider the event's name
• Represent each event with a symbol: 〈a, b, c, e, f, e, f, g, h〉
[Figure: process model over events a, b, c, d, e, f, g, h]
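As a rough illustration of reducing an event log to symbols, the Python sketch below keeps only each event's name and maps it to a letter. The `Event` class, its fields and the `SYMBOLS` table are hypothetical; the table is simply read off the example trace above (e.g. "Wait for 2 minutes" → e, "Check whether it has enough water" → f), and real event records carry more attributes than shown here.

```python
from dataclasses import dataclass

@dataclass
class Event:
    # Hypothetical event record; resource and input/output data are omitted.
    name: str
    agent: str = ""
    timestamp: float = 0.0

# Name-to-symbol table read off the example trace on this slide.
SYMBOLS = {
    "START": "a",
    "Turn on hot & cold water": "b",
    "Check whether it is too hot/cold": "c",
    "Wait for 2 minutes": "e",
    "Check whether it has enough water": "f",
    "Turn off hot & cold taps": "g",
    "END": "h",
}

def encode(trace):
    """Drop every attribute except the event name, then map names to symbols."""
    return [SYMBOLS[event.name] for event in trace]

names = ["START", "Turn on hot & cold water", "Check whether it is too hot/cold",
         "Wait for 2 minutes", "Check whether it has enough water",
         "Wait for 2 minutes", "Check whether it has enough water",
         "Turn off hot & cold taps", "END"]
print(encode([Event(n) for n in names]))  # ['a', 'b', 'c', 'e', 'f', 'e', 'f', 'g', 'h']
```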
Process Discovery
• Given a set of traces {〈a, b, c, e, f, e, f, g, h〉, 〈a, b, c, d, c, d, c, d, e, f, g, h〉}, find the actual model
• Criteria [3]
  ◦ Can model complex constructs: sequence, optionality, concurrency, duplicate tasks, non-free choice, self/nested loops, hidden tasks
  ◦ Able to handle noise & incomplete logs
  ◦ No over-fitting & under-fitting
  ◦ Simple & fast
[Figure: process model over events a, b, c, d, e, f, g, h]
[3] Buijs, J. C. A. M., van Dongen, B. F., van der Aalst, W. M. P. (2014) Quality Dimensions in Process Discovery: The importance of fitness, precision, generalization & simplicity
Existing approaches [4]
Criteria: noise, duplicate tasks, hidden tasks, non-free choice, loops, soundness
Miner        Criteria met    Approach
α++          ✓               2 events
Heuristics   ✓ ✓ ✓ ✓         2 events
Genetic      ✓ ✓ ✓ ✓ ✓       Trace
AGNEs        ✓ ✓ ✓ ✓ ✓       Trace
ILP          ✓ ✓ ✓           Trace
[4] De Weerdt, J., Baesens, B., Vanthienen, J. (2012) Business process discovery: new techniques and applications
Maximal Pattern Mining
• Goal: to find maximal patterns that cover most of the traces in the logs
  ◦ Find loops
  ◦ Store frequent patterns and events in vertical format
  ◦ Identify concurrent events
  ◦ Discover the sequential order of events
  ◦ Check for loops
  ◦ Prune non-maximal patterns
  ◦ Generate graph
Notations
 Logs  T = {t0 , t1 … tn}
 T = {〈a,b,c,b,b,c,d,e〉, 〈a,b,c,b,b,c,d,e〉, 〈a,b,b,c,e,d〉}
 Trace  tn = 〈z0 , z1 … zm〉
 t2 = 〈a,b,b,c,e,d〉
 Maximal Pattern  P = {p0 , p1 … pi}
 P = {〈a, (b*, c)*, {d, e}〉}
 Pattern  pi = 〈e0, e1 … ej〉
 Event: a, d, e
 Parallelization: {d, e}
 Loops: b*, (b*, c)*
 Support
 a.support = 3
 p0.support = 3
[Figure: process graph for p0, with d and e between AND gateways]
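The two support values above can be checked with a minimal sketch, assuming each event is one character, event support counts the traces that contain the event, and pattern support counts the traces the whole pattern covers. Translating p0 = 〈a, (b*, c)*, {d, e}〉 into a regular expression by hand (with {d, e} allowed in either order) is our own illustration, not how the slides say MPM checks coverage.

```python
import re

# Log T from this slide, one character per event.
T = ["abcbbcde", "abcbbcde", "abbced"]

def event_support(log, event):
    """Number of traces in which the event occurs at least once."""
    return sum(1 for trace in log if event in trace)

# p0 = 〈a, (b*, c)*, {d, e}〉, hand-translated: the parallel {d, e} allows either order.
P0 = re.compile(r"a(b*c)*(de|ed)")

def pattern_support(log, pattern):
    """Number of traces that the whole pattern covers."""
    return sum(1 for trace in log if pattern.fullmatch(trace))

print(event_support(T, "a"), pattern_support(T, P0))  # 3 3
```

Dropping the parallel alternative (a regex ending in `de` only) would cover just the first two traces, which is why {d, e} is modeled as a parallelization rather than a fixed order.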
Loops
• t0 = 〈a, b, d, d, c, b, b, b, d, c, b, d, c, g, g〉
  ◦ Self loop: 〈a, b, d*, c, b*, d, c, b, d, c, g*〉
  ◦ Nested loop: 〈a, (b, d*, c)*, g*〉
[Figure: process graph over a, b, c, d, g with their loops]
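The self-loop step can be sketched in a few lines of Python: every run of the same event collapses into `x*`. The function name is hypothetical, and only the self-loop step is shown; the nested-loop step, which then folds repeated blocks such as (b, d*, c) into one loop, is not.

```python
def collapse_self_loops(trace):
    """Replace each run of a repeated event x by 'x*' (a self loop)."""
    out, i = [], 0
    while i < len(trace):
        j = i
        while j + 1 < len(trace) and trace[j + 1] == trace[i]:
            j += 1                      # extend the run of identical events
        out.append(trace[i] + "*" if j > i else trace[i])
        i = j + 1
    return out

t0 = list("abddcbbbdcbdcgg")
print(collapse_self_loops(t0))
# ['a', 'b', 'd*', 'c', 'b*', 'd', 'c', 'b', 'd', 'c', 'g*']
```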
Vertical Representation
• Patterns are stored as an IdList in bitset representation [5]
  ◦ P = {〈a, (b*, c)*, d, e, a〉, 〈a, b*, c, e, d, a〉, 〈e, d, a〉}

Event   IdList (pattern id: positions; $ marks the end of a pattern)
a       0: 0, 5    1: 0, 5    2: 2
b       0: 1       1: 1
c       0: 2       1: 2
d       0: 3       1: 4       2: 1
e       0: 4       1: 3       2: 0
$       0: 6       1: 6       2: 3

[5] Ayres, J., Flannick, J., Gehrke, J., Yiu, T. (2002) Sequential pattern mining using a bitmap representation
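A minimal Python sketch of building such IdLists, using Python integers as bitsets (bit k set means "occurs at position k"). It assumes patterns are flattened to one slot per event with the loop markers dropped, which reproduces the positions in the table above; `build_idlists` and `positions` are hypothetical names.

```python
from collections import defaultdict

def build_idlists(patterns):
    """event -> {pattern id -> bitset of positions}; '$' marks each pattern's end."""
    idlists = defaultdict(dict)
    for pid, pattern in enumerate(patterns):
        for pos, event in enumerate(list(pattern) + ["$"]):
            # set bit `pos` in this event's bitset for pattern `pid`
            idlists[event][pid] = idlists[event].get(pid, 0) | (1 << pos)
    return idlists

def positions(bits):
    """Decode a bitset back into a sorted list of positions."""
    return [i for i in range(bits.bit_length()) if bits >> i & 1]

# Patterns from the slide, flattened: loop markers dropped, one slot per event.
P = ["abcdea", "abceda", "eda"]
idlists = build_idlists(P)
for event in "abcde$":
    print(event, {pid: positions(b) for pid, b in idlists[event].items()})
```

For example, event a in pattern 0 is stored as the integer 0b100001, whose set bits decode to positions [0, 5].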
Vertical Representation
• Patterns are stored as an IdList in bitset representation [5]
• Noisy data
  ◦ Keep only frequent events (vk.support ≥ thresh_v) and patterns (pi.support ≥ thresh_p)
  ◦ Use a validation set for parameter tuning
• Frequent super-patterns are generated only from frequent events/sub-patterns
  ◦ Join operations to calculate support
• Only requires a single scan through the logs → fast and memory efficient
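The join idea can be sketched with the S-step of the bitmap approach in [5]: shift x's bitset so that only positions after its first occurrence remain, then AND it with y's bitset. The sketch below is self-contained, builds the vertical format in one pass over the log, and filters events by a threshold. The function names and `thresh_v` are hypothetical, and this is not presented as MPM's exact join.

```python
from collections import defaultdict

def vertical(log):
    """One pass over the log: event -> {trace id -> bitset of positions}."""
    v = defaultdict(dict)
    for tid, trace in enumerate(log):
        for pos, event in enumerate(trace):
            v[event][tid] = v[event].get(tid, 0) | (1 << pos)
    return v

def frequent_events(v, thresh_v):
    """Events whose support (number of traces containing them) is >= thresh_v."""
    return {event for event, ids in v.items() if len(ids) >= thresh_v}

def s_step(x_bits, y_bits):
    """Positions of y strictly after the first occurrence of x (SPAM-style S-step)."""
    first = x_bits & -x_bits            # lowest set bit = first position of x
    after_first = ~((first << 1) - 1)   # every bit above that position
    return after_first & y_bits

def sequence_support(v, x, y):
    """Number of traces in which x occurs and y occurs later."""
    return sum(1 for tid, x_bits in v[x].items()
               if tid in v[y] and s_step(x_bits, v[y][tid]))

T = ["abcbbcde", "abcbbcde", "abbced"]
v = vertical(T)
print(sorted(frequent_events(v, 3)))                                 # ['a', 'b', 'c', 'd', 'e']
print(sequence_support(v, "d", "e"), sequence_support(v, "e", "d"))  # 2 1
```

On the log T from the Notations slide, 〈d, e〉 has support 2 and 〈e, d〉 has support 1. Seeing both orders is the kind of evidence that d and e may be concurrent.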
Concurrency
• T = {〈c, a, b, d〉, 〈c, a, d, b〉, 〈c, d, a, b〉, 〈a, c, d, b, a, b, c, d〉}
• Check for possible cases of parallelization
  ◦ Ignore order within parallelizations for now
  ◦ 〈c, a, b, d〉, 〈c, a, d, b〉 → 〈c, a, {b, d}〉
  ◦ 〈c, a, {b, d}〉, 〈c, d, a, b〉 → 〈c, {a, b, d}〉
• Handle incomplete logs: sub-pattern ⊆ super-pattern
  ◦ 〈c, {a, b, d}〉, 〈a, c, d, b, a, b, c, d〉 → 〈{a, b, c, d}〉, 〈{a, b, c, d}, a, b, c, d〉
[Figure: process graph with a, b, c, d between AND gateways]
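The order-ignoring merge above can be sketched as follows, assuming a pattern is a list of sets and that two patterns are merged when they share a prefix and suffix and their middle sections contain the same events in a different order. `merge_parallel`, `seq` and `show` are hypothetical helpers, not MPM's code.

```python
def seq(s):
    """Turn 'cabd' into [{c}, {a}, {b}, {d}]: one single-event set per position."""
    return [frozenset(e) for e in s]

def merge_parallel(p, q):
    """Merge two patterns that differ only in the order of one middle section."""
    n = min(len(p), len(q))
    i = 0
    while i < n and p[i] == q[i]:                  # common prefix
        i += 1
    j = 0
    while j < n - i and p[-1 - j] == q[-1 - j]:    # common suffix, not overlapping the prefix
        j += 1
    mid_p, mid_q = p[i:len(p) - j], q[i:len(q) - j]
    events_p = frozenset().union(*mid_p)
    events_q = frozenset().union(*mid_q)
    if events_p != events_q:
        return None                                # not just a reordering: no merge
    middle = [events_p] if events_p else []
    return p[:i] + middle + p[len(p) - j:]

def show(pattern):
    return "<" + ", ".join(
        next(iter(s)) if len(s) == 1 else "{" + ", ".join(sorted(s)) + "}"
        for s in pattern) + ">"

m1 = merge_parallel(seq("cabd"), seq("cadb"))
m2 = merge_parallel(m1, seq("cdab"))
print(show(m1), show(m2))   # <c, a, {b, d}> <c, {a, b, d}>
```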
Concurrency
• T = {〈c, a, b, d〉, 〈c, a, d, b〉, 〈c, d, a, b〉, 〈a, c, d, b, a, b, c, d〉}
• Check for possible cases of parallelization
  ◦ Store event order for incomplete inclusion
  ◦ 〈c, a, {b, d}〉, 〈c, d, a, b〉 → 〈c, {a, b, d}〉
      ◦ a → b but b ↛ a: (a, b)
      ◦ a → d and d → a: {a, d}
  ◦ 〈c, a, {b, d}〉, 〈c, d, a, b〉 → 〈c, {(a, b), d}〉
  ◦ 〈c, {(a, b), d}〉, 〈a, c, d, b, a, b, c, d〉 → 〈{(a, b), (c, d)}〉, 〈{(a, b), (c, d)}, a, b, c, d〉
      ◦ c → d but d ↛ c: (c, d)
[Figure: process graph with (a, b) and (c, d) as parallel branches between AND gateways]
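A sketch of the pair classification above, assuming x → y means "x occurs somewhere before y in the same trace" (the slides do not define → precisely). Pairs seen in only one order become sequential pairs (x, y); pairs seen in both orders become parallel pairs {x, y}. Run on the first three traces of T, it yields the relations behind 〈c, {(a, b), d}〉.

```python
from itertools import combinations

def order_relations(traces):
    """Split event pairs into sequential (x, y) and parallel {x, y} pairs."""
    before = set()
    for t in traces:
        for i, x in enumerate(t):
            for y in t[i + 1:]:
                if x != y:
                    before.add((x, y))     # x occurs before y in this trace
    events = sorted({e for t in traces for e in t})
    sequential, parallel = set(), set()
    for x, y in combinations(events, 2):
        xy, yx = (x, y) in before, (y, x) in before
        if xy and yx:
            parallel.add(frozenset((x, y)))   # both orders seen: x → y and y → x
        elif xy:
            sequential.add((x, y))            # x → y but y ↛ x
        elif yx:
            sequential.add((y, x))
    return sequential, parallel

seq_pairs, par_pairs = order_relations(["cabd", "cadb", "cdab"])
print(sorted(seq_pairs))                     # [('a', 'b'), ('c', 'a'), ('c', 'b'), ('c', 'd')]
print(sorted(sorted(p) for p in par_pairs))  # [['a', 'd'], ['b', 'd']]
```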
Concurrency
• T = {〈c, a, b, d〉, 〈c, a, d, b〉, 〈c, d, a, b〉, 〈a, c, d, b, a, b, c, d〉}
• Check for possible cases of parallelization
  ◦ Store event order for incomplete inclusion
  ◦ Solve loops: 〈{(a, b), (c, d)}, a, b, c, d〉 → 〈{(a, b), (c, d)}*〉
[Figure: process graph with (a, b) and (c, d) as parallel branches between AND gateways, inside a loop]
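One way to check that a trace fits a looped parallel block such as 〈{(a, b), (c, d)}*〉 is sketched below. It assumes each loop iteration is one complete interleaving of the branches that keeps every branch's internal order; the slides do not say how MPM performs this step, and `is_interleaving` and `loop_iterations` are hypothetical names.

```python
from functools import lru_cache

def is_interleaving(chunk, branches):
    """True if chunk runs every branch once, keeping each branch's internal order."""
    @lru_cache(maxsize=None)
    def go(k, idx):
        if k == len(chunk):
            return all(i == len(b) for i, b in zip(idx, branches))
        for n, branch in enumerate(branches):
            i = idx[n]
            if i < len(branch) and branch[i] == chunk[k]:
                # advance branch n by one event and continue with the rest of chunk
                if go(k + 1, idx[:n] + (i + 1,) + idx[n + 1:]):
                    return True
        return False
    return go(0, (0,) * len(branches))

def loop_iterations(trace, branches):
    """Number of iterations if trace repeats the parallel block, else None."""
    size = sum(len(b) for b in branches)
    if size == 0 or len(trace) % size:
        return None
    chunks = [tuple(trace[i:i + size]) for i in range(0, len(trace), size)]
    if all(is_interleaving(c, branches) for c in chunks):
        return len(chunks)
    return None

BLOCK = (("a", "b"), ("c", "d"))   # {(a, b), (c, d)}
for t in ["cabd", "cadb", "cdab", "acdbabcd"]:
    print(t, loop_iterations(t, BLOCK))
```

Under this assumption, the first three traces of T are one iteration each and 〈a, c, d, b, a, b, c, d〉 is two, so the whole log fits 〈{(a, b), (c, d)}*〉.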
Maximal Patterns
• P = {〈a, b, c, d, e〉, 〈a, {b, c*}, d, e〉, 〈a, b, c, e〉, 〈a, f, e〉, 〈f, g〉, 〈f, h〉}
• Pattern pi is maximal ⟺ no other pattern pj in P covers all of the traces that pi covers (the same traces or more)
  ◦ 〈a, b, c, d, e〉 is pruned: P = {〈a, {b, c*}, d, e〉, 〈a, b, c, e〉, 〈a, f, e〉, 〈f, g〉, 〈f, h〉}
[Figure: process graph for the pruned P, with AND/XOR gateways and an empty (∅) branch]
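The pruning rule can be sketched as a containment test over the sets of traces each pattern covers. The trace ids below are hypothetical (the slides do not list which traces each pattern covers), and breaking ties between patterns with identical coverage by keeping the first one is our own assumption.

```python
def prune_non_maximal(coverage):
    """Keep a pattern only if no other pattern covers all of its traces."""
    names = list(coverage)
    kept = []
    for i, p in enumerate(names):
        dominated = any(
            coverage[p] < coverage[q]                    # another pattern covers a strict superset
            or (coverage[p] == coverage[q] and j < i)    # identical coverage: keep the first one
            for j, q in enumerate(names) if j != i)
        if not dominated:
            kept.append(p)
    return kept

# Hypothetical trace ids covered by each candidate pattern (not from the slides).
coverage = {
    "<a, b, c, d, e>": {0},
    "<a, {b, c*}, d, e>": {0, 1},
    "<a, b, c, e>": {2},
    "<a, f, e>": {3},
    "<f, g>": {4},
    "<f, h>": {5},
}
print(prune_non_maximal(coverage))
```

With these assumed coverages, only 〈a, b, c, d, e〉 is dropped, because 〈a, {b, c*}, d, e〉 covers every trace it covers.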
Loops {ABCE, ACBE, ABDDCE}
[Figure: models discovered by α++ and by MPM]
Duplicates {ADAF, AEAF, AHBAG, AHCAG}
[Figure: models discovered by Heuristics Miner and by MPM]
Non-free choice {ABC, ABDE, ADBE}
[Figure: models discovered by AGNEs and by MPM]
Duplicates in Parallel {ACBA, ACAB, CAAB, CABA, ABCA}
[Figure: models discovered by Genetic Miner and by MPM]
Evaluation
• Criteria [3]:
  ◦ Replay fitness: alignment distance function [6]
  ◦ Simplicity [3]
  ◦ Precision (no under-fitting) [7]
  ◦ Generalization (no over-fitting): k-fold cross-validation
  ◦ Time
[6] van der Aalst, W. M. P., Adriansyah, A., van Dongen, B. (2012) Replaying History on Process Models for Conformance Checking & Performance Analysis
[7] Adriansyah, A., Munoz-Gama, J., Carmona, J., van Dongen, B. F., van der Aalst, W. M. P. (2015) Measuring Precision of Modeled Behaviour
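To make alignment-based replay fitness concrete, here is a minimal sketch under a strong simplifying assumption: the model's behavior is given as a small finite set of runs, whereas real alignment tools search the model's state space (e.g. with A*). With unit cost for each log-only or model-only move and free synchronous moves, the optimal alignment with one run costs |trace| + |run| − 2·LCS. Fitness is 1 minus that cost divided by the worst case: all events as log moves plus the shortest model run. The model below is hypothetical.

```python
def lcs(s, t):
    """Length of the longest common subsequence of s and t."""
    prev = [0] * (len(t) + 1)
    for x in s:
        cur = [0]
        for j, y in enumerate(t):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]

def alignment_cost(trace, model_trace):
    """Unit-cost alignment: each log-only or model-only move costs 1, synchronous moves 0."""
    return len(trace) + len(model_trace) - 2 * lcs(trace, model_trace)

def trace_fitness(trace, model_language):
    """1 - optimal alignment cost / worst-case cost (all log moves + shortest model run)."""
    best = min(alignment_cost(trace, m) for m in model_language)
    worst = len(trace) + min(len(m) for m in model_language)
    return 1 - best / worst

model = ["abcd", "acbd"]                      # hypothetical model with a finite set of runs
print(trace_fitness("abcd", model))           # 1.0
print(round(trace_fitness("abd", model), 3))  # 0.857
```

The trace abd needs one model-only move (for c) to align with either run, so its cost is 1 out of a worst case of 3 + 4 = 7.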
Synthetic data
• 300 to 350 traces with at most 10 unique events
  ◦ α++: under-fits & unsound
  ◦ Genetic: duplicates & very slow
  ◦ ILP: under-fits & slow
  ◦ MPM: duplicates
  ◦ Heuristics & AGNEs: mid-range

Scores:
Miner        Fitness   Precision   Simplicity   Time
α++          0.7       0.5         1.0          250 ms
Genetic      0.9       0.9         0.7          1 hour
Heuristics   0.8       0.7         0.8          10 s
AGNEs        0.8       0.8         0.6          5 mins
ILP          1.0       0.6         0.9          2 mins
MPM          1.0       0.9         0.7          150 ms

Ranks (1 = best):
Miner        Fitness   Precision   Simplicity   Time
α++          3         3           1            1
Genetic      1         1           3            4
Heuristics   2         3           2            2
AGNEs        2         2           3            3
ILP          1         3           1            3
MPM          1         1           3            1
Real-life data
• Similar results
  ◦ α++: under-fits & unsound
  ◦ Genetic: very slow
  ◦ ILP: under-fits & slow
  ◦ MPM: duplicates
  ◦ Heuristics: unsound & slow
  ◦ AGNEs: incorrect false negatives

Scores (DNF = did not finish):
Miner        Fitness   Precision   Simplicity   Time
α++          0.3       0.4         0.9          10 mins
Genetic      DNF       DNF         DNF          >5 days
Heuristics   0.7       0.8         0.8          1 hour
AGNEs        0.5       0.6         0.3          20 hours
ILP          1.0       0.5         0.8          2 hours
MPM          0.9       0.9         0.7          9 mins

Ranks (1 = best):
Miner        Fitness   Precision   Simplicity   Time
α++          4         3           1            1
Genetic      -         -           -            -
Heuristics   2         1           1            2
AGNEs        3         2           3            3
ILP          1         3           1            2
MPM          1         1           2            1
Conclusions
• MPM
  ◦ Can model complex constructs: sequence, optionality, concurrency, duplicate tasks, non-free choice, self/nested loops, hidden tasks
  ◦ Cannot handle duplicate tasks in parallel processes
  ◦ Able to handle noise & incomplete logs
  ◦ No over-fitting & under-fitting
  ◦ Fast → capable of stream mining
  ◦ Uses duplicate events → low simplicity score
      ◦ Task abstractions
Questions?
Email: vliesaputra@unitec.ac.nz

Efficient Process Model Discovery Using Maximal Pattern Mining
