Chapter 6
Advanced Process
Discovery Techniques
prof.dr.ir. Wil van der Aalst
www.processmining.org
Overview
Chapter 1: Introduction

Part I: Preliminaries
• Chapter 2: Process Modeling and Analysis
• Chapter 3: Data Mining

Part II: From Event Logs to Process Models
• Chapter 4: Getting the Data
• Chapter 5: Process Discovery: An Introduction
• Chapter 6: Advanced Process Discovery Techniques

Part III: Beyond Process Discovery
• Chapter 7: Conformance Checking
• Chapter 8: Mining Additional Perspectives
• Chapter 9: Operational Support

Part IV: Putting Process Mining to Work
• Chapter 10: Tool Support
• Chapter 11: Analyzing “Lasagna Processes”
• Chapter 12: Analyzing “Spaghetti Processes”

Part V: Reflection
• Chapter 13: Cartography and Navigation
• Chapter 14: Epilogue
                                                                          PAGE 1
Process discovery

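[Figure: positioning of process discovery. A software system supports/controls the “world” (business processes, people, machines, components, organizations) and records events (e.g., messages, transactions) in event logs. A (process) model models and analyzes the “world” and specifies, configures, implements, and analyzes the software system. Event logs and models are related through discovery, conformance, and enhancement.]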
                                                                   PAGE 2
Challenge

Process discovery has to balance four quality dimensions:
• Fitness: “able to replay event log”
• Simplicity: “Occam’s razor”
• Generalization: “not overfitting the log”
• Precision: “not underfitting the log”



                                                              PAGE 3
Observing a stable process infinitely long

[Figure: three sets of behavior: the frequent behavior, all behavior (including noise), and the traces in the event log.]




                                                  PAGE 4
Target model


               target model




                              PAGE 5
Non-fitting model


                    non-fitting model




                                        PAGE 6
Overfitting model


                    overfitting model




                                        PAGE 7
Underfitting model


               underfitting model




                                    PAGE 8
Characteristics of process discovery algorithms
• Representational bias
   −   Inability to represent concurrency
   −   Inability to deal with (arbitrary) loops
   −   Inability to represent silent actions
   −   Inability to represent duplicate actions
   −   Inability to model OR-splits/joins
   −   Inability to represent non-free-choice behavior
   −   Inability to represent hierarchy
• Ability to deal with noise
• Completeness notion assumed
• Approach used (direct algorithmic approaches, two-phase approaches, computational intelligence approaches, partial approaches, etc.)
                                                         PAGE 9
Examples
• Algorithmic techniques
  • Alpha miner
  • Alpha+, Alpha++, Alpha#
  • FSM miner
  • Fuzzy miner
  • Heuristic miner
  • Multi-phase miner
• Genetic process mining
  • Single/duplicate tasks
  • Distributed GM
• Region-based process mining
  • State-based regions
  • Language-based regions
• Classical approaches not dealing with concurrency
  • Inductive inference (Mark Gold, Dana Angluin et al.)
  • Sequence mining
                                                           PAGE 10
Heuristic mining

• To deal with noise and incompleteness.
• To have a better representational bias than the α
  algorithm (AND/XOR/OR/skip).
• Uses C-nets.


[Figure: example C-net with activities a (register claim), b (check policy), c (check damage), d (consult expert), and e (close case); a is the first activity and e the last.]
                                                      PAGE 11
Example log; problem α algorithm




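[Figure: Petri net with transitions a, b, c, d, e and places start, p1, p2, p3, p4, p5, end. The example log itself did not survive extraction.]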

                                         PAGE 12
Taking into account frequencies




                                  PAGE 13
Dependency measure
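
The formula on this slide did not survive extraction. A commonly used definition in heuristic mining is |a ⇒ b| = (|a > b| − |b > a|) / (|a > b| + |b > a| + 1) for a ≠ b, and |a > a| / (|a > a| + 1) for a self-loop, where |a > b| counts how often a is directly followed by b. The sketch below computes it in Python. The log `LOG` is an assumption: it is not on the slides, but was reconstructed so that it reproduces the counts and dependency values shown on the threshold slides.

```python
from collections import Counter

# Assumed event log (trace -> frequency). Not taken from the slides;
# chosen so that the counts match the labels in the figures.
LOG = {
    "ae": 5, "abce": 10, "acbe": 10, "abe": 1,
    "ace": 1, "ade": 10, "adde": 2, "addde": 1,
}

def direct_succession(log):
    """Count |a > b|: how often a is directly followed by b."""
    counts = Counter()
    for trace, freq in log.items():
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += freq
    return counts

def dependency(counts, a, b):
    """Dependency measure |a => b|; close to 1 means a strong dependency."""
    ab, ba = counts[(a, b)], counts[(b, a)]
    if a == b:
        return ab / (ab + 1)             # self-loop case
    return (ab - ba) / (ab + ba + 1)

counts = direct_succession(LOG)
for a, b in [("a", "b"), ("a", "d"), ("a", "e"), ("d", "d"), ("b", "c")]:
    print(f"{a} => {b}: {counts[(a, b)]} ({dependency(counts, a, b):.2f})")
```

With this assumed log, the pairs (a, b), (a, d), (a, e) and (d, d) give 11 (0.92), 13 (0.93), 5 (0.83) and 4 (0.80), the labels that appear on the threshold slides. The pair (b, c) gets a dependency of 0 because b and c follow each other equally often in both directions.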




                     PAGE 14
Example




          PAGE 15
Lower threshold (2 direct successions and
a dependency of at least 0.7)
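[Figure: dependency graph over activities a, b, c, d, e. Arc labels, written as count(dependency): one arc with 5(0.83), four arcs with 11(0.92), two arcs with 13(0.93), and one arc with 4(0.80).]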




                                               PAGE 16
Higher threshold (5 direct successions
and a dependency of at least 0.9)

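[Figure: dependency graph over activities a, b, c, d, e. Arc labels, written as count(dependency): four arcs with 11(0.92) and two arcs with 13(0.93). No arc with a lower count or dependency remains.]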




                                         PAGE 17
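
The effect of the two threshold settings can be sketched in Python as a filter over candidate arcs. The direct-succession counts in `COUNTS` are assumptions chosen to agree with the arc labels on the two slides, the dependency formula is the commonly used heuristic-mining definition rather than something recovered from the deck, and the function names are hypothetical.

```python
# Assumed direct-succession counts |a > b|; pairs not listed count as 0.
COUNTS = {
    ("a", "b"): 11, ("a", "c"): 11, ("a", "d"): 13, ("a", "e"): 5,
    ("b", "c"): 10, ("c", "b"): 10, ("b", "e"): 11, ("c", "e"): 11,
    ("d", "d"): 4, ("d", "e"): 13,
}

def dependency(counts, a, b):
    """Dependency measure |a => b| (assumed standard definition)."""
    ab, ba = counts.get((a, b), 0), counts.get((b, a), 0)
    if a == b:
        return ab / (ab + 1)
    return (ab - ba) / (ab + ba + 1)

def dependency_graph(counts, min_count, min_dep):
    """Keep only arcs that meet both thresholds."""
    graph = {}
    for (a, b), n in counts.items():
        dep = dependency(counts, a, b)
        if n >= min_count and dep >= min_dep:
            graph[(a, b)] = (n, round(dep, 2))
    return graph

low = dependency_graph(COUNTS, min_count=2, min_dep=0.7)
high = dependency_graph(COUNTS, min_count=5, min_dep=0.9)
print(sorted(set(low) - set(high)))   # arcs removed by the higher threshold
```

Under these assumptions the lower threshold keeps eight arcs and the higher threshold keeps six: the arc with dependency 0.83 fails the 0.9 bound, and the self-loop with only 4 direct successions fails the count bound.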
Learning splits and joins

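[Figure: C-net over activities a, b, c, d, e annotated with frequencies. The node frequencies appear to be a = 40, b = 21, c = 21, d = 17, e = 40; the remaining numbers (5, 20, 13, 4) annotate arcs and input/output bindings.]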



                                                                     PAGE 18
Alternative visualization

[Figure: the frequency-annotated C-net over a, b, c, d, e from the previous slide, shown next to an alternative notation of the same model with explicit AND nodes after a and before e.]




                                                                                        PAGE 19
Characteristics of heuristic mining

• Can deal with noise and is therefore quite robust.
• Improved representational bias.
• Split and join rules are only considered locally
  (therefore most discovered models are not
  sound and require repair actions).




                                                     PAGE 20
Genetic process mining

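[Figure: the genetic process mining loop. Starting from the event log, create an initial population and compute the fitness of each individual. On termination, select the best individual. Otherwise, form the next generation: elitism carries individuals over directly, tournament selection picks parents, and crossover and mutation produce children. Individuals that are not selected are “dead”.]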



                                                                         PAGE 21
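
The loop in the figure can be sketched as a toy genetic algorithm in Python. Everything in this sketch is a simplifying assumption, not the representation used in genetic process mining itself: an individual is just a set of directly-follows arcs, fitness is the share of traces that can be replayed minus a small size penalty, and the log and all parameter values are hypothetical.

```python
import random

LOG = ["abce", "acbe", "ade"]              # hypothetical toy log
ACTS = sorted(set("".join(LOG)))
ALL_ARCS = [(x, y) for x in ACTS for y in ACTS]

def fitness(model):
    """Share of traces that can be replayed, minus a small size penalty."""
    replayed = sum(all(step in model for step in zip(t, t[1:])) for t in LOG)
    return replayed / len(LOG) - 0.01 * len(model)

def random_model(rng):
    return frozenset(arc for arc in ALL_ARCS if rng.random() < 0.3)

def tournament(pop, rng, k=3):
    """Pick k random individuals and return the fittest."""
    return max(rng.sample(pop, k), key=fitness)

def crossover(p1, p2, rng):
    """Each candidate arc is inherited from a randomly chosen parent."""
    return frozenset(arc for arc in ALL_ARCS
                     if arc in (p1 if rng.random() < 0.5 else p2))

def mutate(model, rng, rate=0.05):
    """Add or remove each candidate arc with a small probability."""
    return frozenset(arc for arc in ALL_ARCS
                     if (arc in model) != (rng.random() < rate))

def evolve(generations=30, size=20, elite=2, seed=0):
    rng = random.Random(seed)
    pop = [random_model(rng) for _ in range(size)]
    history = []                            # best fitness per generation
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        children = []
        while len(children) < size - elite:
            child = crossover(tournament(pop, rng), tournament(pop, rng), rng)
            children.append(mutate(child, rng))
        pop = pop[:elite] + children        # elitism; the others "die"
    return max(pop, key=fitness), history

best, history = evolve()
print(sorted(best), round(fitness(best), 2))
```

Because the best individuals are copied unchanged into the next generation, the best fitness in `history` can never decrease from one generation to the next. That is the purpose of elitism in this loop.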
Design decisions

•   Representation of individuals
•   Initialization
•   Fitness function
•   Selection strategy (tournament and elitism)
•   Crossover
•   Mutation

[Figure: the genetic process mining loop from the previous slide, repeated.]




                                                                                                     PAGE 22
Example: crossover

[Figure: crossover example. Two parent process models (top) are recombined into two child models (bottom). All four models use the same activities between start and end: a (register request), b (examine thoroughly), c (examine casually), d (check ticket), e (decide), f (reinitiate request), g (pay compensation), h (reject request).]




                                                                                                                                            PAGE 23
Example: mutation



[Figure: mutation example. A process model (left) over the activities a (register request), b (examine thoroughly), c (examine casually), d (check ticket), e (decide), f (reinitiate request), g (pay compensation), h (reject request) is mutated into a new model (right). The changes are annotated “remove place” and “added arc”.]




                                                                                                                                        PAGE 24
Characteristics of genetic process mining




• Requires a lot of computing power.
• Can be distributed easily.
• Can deal with noise, infrequent behavior, duplicate tasks,
  invisible tasks, etc.
• Allows for incremental improvement and combinations
  with other approaches (heuristics post-optimization, etc.).
                                                       PAGE 25
Region-based mining

• Two types of region theory:
   − State-based regions
   − Language-based regions
• All about discovering places (like in the α algorithm)!


[Figure: a place p(A,B) with input transitions A = {a1, a2, …, am} and output transitions B = {b1, b2, …, bn}.]
                                                      PAGE 26
State-based regions

Two steps:
1. Discover a transition system (different abstractions
   are possible).
2. Convert the transition system into an “equivalent” Petri
   net.




                                                     PAGE 27
Step 1: learning a transition system

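[Figure: a trace (abcdcdcde faghhhi) with the current state marked inside it. The events before the current state form the past, the events after it form the future. A state can be based on the past, the future, or both.]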

•   past, future, past+future
•   sequence, multiset, set abstraction
•   limited horizon to abstract further
•   filtering e.g. based on transaction type, names, etc.
•   labels based on activity name or other features
                                                            PAGE 28
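
The abstraction options listed above can be sketched in Python for the past-based case; a future-based variant would abstract suffixes instead of prefixes. The function names are hypothetical, and the log is assumed to consist of the three traces abcd, acbd and aed, which is what the state labels on the following slides suggest.

```python
from collections import Counter

def abstract(events, kind="sequence", horizon=None):
    """Turn a prefix of a trace into a hashable state."""
    if horizon is not None:
        # Limited horizon: only the last `horizon` events matter.
        events = events[max(0, len(events) - horizon):]
    if kind == "sequence":
        return tuple(events)
    if kind == "multiset":
        return tuple(sorted(Counter(events).items()))
    if kind == "set":
        return frozenset(events)
    raise ValueError(f"unknown abstraction: {kind}")

def transition_system(log, kind="sequence", horizon=None):
    """Build states and labeled transitions from the past of each event."""
    states, transitions = set(), set()
    for trace in log:
        src = abstract(trace[:0], kind, horizon)      # empty past
        states.add(src)
        for i, activity in enumerate(trace):
            dst = abstract(trace[:i + 1], kind, horizon)
            states.add(dst)
            transitions.add((src, activity, dst))
            src = dst
    return states, transitions

LOG = ["abcd", "acbd", "aed"]   # assumed log, see lead-in

for kind, horizon in [("sequence", None), ("multiset", None), ("sequence", 1)]:
    states, transitions = transition_system(LOG, kind, horizon)
    print(kind, horizon, len(states), "states,", len(transitions), "transitions")
```

Coarser abstractions merge states: on this log the full-sequence abstraction yields 10 states, the multiset abstraction 8, and a horizon of 1 (only the last event matters) 6.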
Past without abstraction (full sequence)


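[Figure: transition system in which each state is the full sequence of past events.]
‹› -a-> ‹a›
‹a› -b-> ‹a,b› -c-> ‹a,b,c› -d-> ‹a,b,c,d›
‹a› -e-> ‹a,e› -d-> ‹a,e,d›
‹a› -c-> ‹a,c› -b-> ‹a,c,b› -d-> ‹a,c,b,d›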

                                                PAGE 29
Future without abstraction


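[Figure: transition system in which each state is the full sequence of future events.]
‹a,b,c,d› -a-> ‹b,c,d› -b-> ‹c,d› -c-> ‹d› -d-> ‹›
‹a,e,d› -a-> ‹e,d› -e-> ‹d›
‹a,c,b,d› -a-> ‹c,b,d› -c-> ‹b,d› -b-> ‹d›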

                                                   PAGE 30
Past with multiset abstraction

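[Figure: transition system in which each state is the multiset of past events.]
[] -a-> [a]
[a] -b-> [a,b] -c-> [a,b,c]
[a] -c-> [a,c] -b-> [a,b,c]
[a] -e-> [a,e] -d-> [a,d,e]
[a,b,c] -d-> [a,b,c,d]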

                                                  PAGE 31
Only last event matters for state

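[Figure: transition system in which each state is determined by the last event only.]
‹› -a-> ‹a›
‹a› -b-> ‹b›, ‹a› -c-> ‹c›, ‹a› -e-> ‹e›
‹b› -c-> ‹c›, ‹c› -b-> ‹b›
‹b› -d-> ‹d›, ‹c› -d-> ‹d›, ‹e› -d-> ‹d›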

                                                     PAGE 32
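Step 2: constructing a Petri net using regions

[Figure: a region R (a set of states) in a transition system, and the corresponding place pR. Transitions a and b enter R and become inputs of pR; transitions c and d exit R and become outputs of pR; transitions e and f do not cross R and are not connected to pR.]

• a = enter
• b = enter
• c = exit
• d = exit
• e = do not cross
• f = do not cross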

                                                               PAGE 33
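
The enter/exit/do-not-cross classification can be turned into a small region check. The Python sketch below assumes the usual definition: a set of states is a region if, for every label, all transitions with that label relate to the set in the same way. The transition system `TS` is the multiset-abstraction example from the earlier slide, and the function names are hypothetical.

```python
# Transition system from the multiset-abstraction slide: (source, label, target).
TS = [
    ("[]", "a", "[a]"),
    ("[a]", "b", "[a,b]"), ("[a]", "c", "[a,c]"), ("[a]", "e", "[a,e]"),
    ("[a,b]", "c", "[a,b,c]"), ("[a,c]", "b", "[a,b,c]"),
    ("[a,b,c]", "d", "[a,b,c,d]"), ("[a,e]", "d", "[a,d,e]"),
]

def classify(region, transitions):
    """For each label, collect how its transitions relate to the region."""
    kinds = {}
    for src, label, dst in transitions:
        if src not in region and dst in region:
            kind = "enter"
        elif src in region and dst not in region:
            kind = "exit"
        else:
            kind = "do not cross"
        kinds.setdefault(label, set()).add(kind)
    return kinds

def is_region(region, transitions):
    """A region requires one consistent classification per label."""
    return all(len(k) == 1 for k in classify(region, transitions).values())

def place_for(region, transitions):
    """Entering labels become inputs of the place, exiting labels outputs."""
    kinds = classify(region, transitions)
    inputs = sorted(l for l, k in kinds.items() if k == {"enter"})
    outputs = sorted(l for l, k in kinds.items() if k == {"exit"})
    return inputs, outputs

print(is_region({"[a]", "[a,c]"}, TS), place_for({"[a]", "[a,c]"}, TS))
print(is_region({"[a]"}, TS))
```

The set {[a], [a,c]} is a region and yields a place with input a and outputs b and e. The set {[a]} is not a region, because one c-transition exits it while another c-transition does not cross it.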
Example

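[Figure, top: the transition system based on the multiset of past events.]
[] -a-> [a]
[a] -b-> [a,b] -c-> [a,b,c]
[a] -c-> [a,c] -b-> [a,b,c]
[a] -e-> [a,e] -d-> [a,d,e]
[a,b,c] -d-> [a,b,c,d]

[Figure, bottom: the Petri net constructed from the regions of this transition system, with transitions a, b, c, d, e and places start, p1, p2, p3, p4, end.]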
                                                                           PAGE 34
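Language-based regions

[Figure: a place pR with a set X of input transitions (a1, a2, c1) and a set Y of output transitions (b1, b2, c1). The figure also shows the transitions d, e, and f and the initial marking c of pR.]

Region R = (X,Y,c) corresponding to place pR:
• X = {a1,a2,c1} = transitions producing a token for pR
• Y = {b1,b2,c1} = transitions consuming a token from pR
• c is the initial marking of pR
                                                                     PAGE 35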
Basic idea: enough tokens should be present when consuming

A place is feasible if it can be added without disabling any of the traces in the event log.

[Figure: the place pR with input transitions X (a1, a2, c1) and output transitions Y (b1, b2, c1) from the previous slide.]



                                               PAGE 36
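
The feasibility condition can be sketched in Python as a replay of each trace against a single candidate place. The sketch assumes arcs of weight one, a hypothetical log, and hypothetical candidate places; it illustrates the condition only and is not a region-based miner.

```python
def feasible(X, Y, c, log):
    """Check that the place (X, Y, c) never blocks a trace in the log.

    X: transitions producing a token, Y: transitions consuming a token,
    c: initial marking. Arc weights of 1 are assumed.
    """
    for trace in log:
        tokens = c
        for t in trace:
            if t in Y:
                tokens -= 1          # consuming needs a token to be present
                if tokens < 0:
                    return False     # the place would disable this trace
            if t in X:
                tokens += 1          # producing happens after consuming
    return True

LOG = ["abcd", "acbd", "aed"]        # hypothetical log

print(feasible({"a"}, {"b", "e"}, 0, LOG))   # a always precedes b and e
print(feasible({"b"}, {"c"}, 0, LOG))        # fails: in acbd, c fires before b
print(feasible({"b"}, {"c"}, 1, LOG))        # an initial token makes it feasible
```

Consuming before producing matters for a transition that occurs in both X and Y, such as c1 on the slide: it can only fire if a token is already present.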
Example




          PAGE 37
Regions




          PAGE 38
Model

[Figure: the discovered Petri net, with transitions a, b, c, d, e and places p1, p2, p3, p4, p5, p6.]




                                        PAGE 39
Characteristics of region-based mining

• Can be used to discover more complex control-flow
  structures.
• Classical approaches need to be adapted
  (overfitting!).
• Representational bias can be parameterized (e.g.,
  free-choice nets, label splitting, etc.).
• Problems dealing with noise.




                                                  PAGE 40
Other approaches, e.g. fuzzy mining




                                      PAGE 41
Evaluating the discovered process



• Fitness: Is the event log possible according to the model?
• Precision: Is the model not underfitting (i.e., not allowing too much behavior)?
• Generalization: Is the model not overfitting (i.e., not allowing only the “accidental” examples in the log)?
• Structure: Is this the simplest model (Occam's razor)?



                                                                          PAGE 42

More Related Content

PDF
Discovering Petri Nets: Evidence-Based Business Process Management
PDF
Process Mining - Chapter 12 - Analyzing Spaghetti Processes
PDF
Distributed Process Discovery and Conformance Checking
PDF
Process Mining - Chapter 5 - Process Discovery
PDF
Process Mining - Chapter 7 - Conformance Checking
PDF
Discovering Concurrency: Learning (Business) Process Models from Examples
PPT
Configurable Declare: Designing Customizable Flexible Models
PDF
Mobility in healthcare
Discovering Petri Nets: Evidence-Based Business Process Management
Process Mining - Chapter 12 - Analyzing Spaghetti Processes
Distributed Process Discovery and Conformance Checking
Process Mining - Chapter 5 - Process Discovery
Process Mining - Chapter 7 - Conformance Checking
Discovering Concurrency: Learning (Business) Process Models from Examples
Configurable Declare: Designing Customizable Flexible Models
Mobility in healthcare

Similar to Process mining chapter_06_advanced_process_discovery_techniques (20)

PDF
Process mining chapter_07_conformance_checking
PDF
Process mining chapter_05_process_discovery
PDF
Process Mining - Chapter 8 - Mining Additional Perspectives
PDF
Process mining chapter_08_mining_additional_perspectives
PPT
Process Mining: Understanding and Improving Desire Lines in Big Data
PDF
Process mining chapter_12_analyzing_spaghetti_processes
PDF
Process mining chapter_01_introduction
PDF
Process Mining - Chapter 1 - Introduction
PDF
Repairing Process Models to Match Reality
PDF
Process Mining - Chapter 14 - Epilogue
PDF
Process mining chapter_14_epilogue
PPT
Process Mining: Data Science in Action - Wil van der Aalst, TU/e, DSC/e, HSE
PDF
Virtual memory
PDF
Keynote Gartner Business Process Management Summit, February 2009, London
PDF
Simplifying Mined Process Models
PDF
Business Process Configuration in the Cloud: How to Support and Analyze Multi...
PDF
Back To The Future
PDF
Introduction to R for Data Mining
PDF
On Failure and Resilience
PDF
Prdc2012
Process mining chapter_07_conformance_checking
Process mining chapter_05_process_discovery
Process Mining - Chapter 8 - Mining Additional Perspectives
Process mining chapter_08_mining_additional_perspectives
Process Mining: Understanding and Improving Desire Lines in Big Data
Process mining chapter_12_analyzing_spaghetti_processes
Process mining chapter_01_introduction
Process Mining - Chapter 1 - Introduction
Repairing Process Models to Match Reality
Process Mining - Chapter 14 - Epilogue
Process mining chapter_14_epilogue
Process Mining: Data Science in Action - Wil van der Aalst, TU/e, DSC/e, HSE
Virtual memory
Keynote Gartner Business Process Management Summit, February 2009, London
Simplifying Mined Process Models
Business Process Configuration in the Cloud: How to Support and Analyze Multi...
Back To The Future
Introduction to R for Data Mining
On Failure and Resilience
Prdc2012
Ad

More from Muhammad Ajmal (9)

PDF
Process mining chapter_13_cartography_and_navigation
PDF
Process mining chapter_11_analyzing_lasagna_processes
PDF
Process mining chapter_10_tool_support
PDF
Process mining chapter_09_operational_support
PPT
Process mining chapter_07_conformance_checking
PDF
Process mining chapter_04_getting_the_data
PDF
Process mining chapter_03_data_mining
PDF
Process mining chapter_02_process_modeling_and_analysis
PDF
Process mining
Process mining chapter_13_cartography_and_navigation
Process mining chapter_11_analyzing_lasagna_processes
Process mining chapter_10_tool_support
Process mining chapter_09_operational_support
Process mining chapter_07_conformance_checking
Process mining chapter_04_getting_the_data
Process mining chapter_03_data_mining
Process mining chapter_02_process_modeling_and_analysis
Process mining
Ad

Process mining chapter_06_advanced_process_discovery_techniques

  • 1. Chapter 6 Advanced Process Discovery Techniques prof.dr.ir. Wil van der Aalst www.processmining.org
  • 2. Overview Chapter 1 Introduction Part I: Preliminaries Chapter 2 Chapter 3 Process Modeling and Data Mining Analysis Part II: From Event Logs to Process Models Chapter 4 Chapter 5 Chapter 6 Getting the Data Process Discovery: An Advanced Process Introduction Discovery Techniques Part III: Beyond Process Discovery Chapter 7 Chapter 8 Chapter 9 Conformance Mining Additional Operational Support Checking Perspectives Part IV: Putting Process Mining to Work Chapter 10 Chapter 11 Chapter 12 Tool Support Analyzing “Lasagna Analyzing “Spaghetti Processes” Processes” Part V: Reflection Chapter 13 Chapter 14 Cartography and Epilogue Navigation PAGE 1
  • 3. Process discovery supports/ “world” business controls processes software people machines system components organizations records events, e.g., messages, specifies transactions, models configures etc. analyzes implements analyzes discovery (process) event conformance model logs enhancement PAGE 2
  • 4. Challenge “able to replay event log” “Occam’s razor” fitness simplicity process discovery generalization precision “not overfitting the log” “not underfitting the log” PAGE 3
  • 5. Observing a stable process infinitely long frequent all behavior behavior trace in (including noise) event log PAGE 4
  • 6. Target model target model PAGE 5
  • 7. Non-fitting model non-fitting model PAGE 6
  • 8. Overfitting model overfitting model PAGE 7
  • 9. Underfitting model underfitting model PAGE 8
  • 10. Characteristics of process discovery algorithms • Representational bias − Inability to represent concurrency − Inability to deal with (arbitrary) loops − Inability to represent silent actions − Inability to represent duplicate actions − Inability to model OR-splits/joins − Inability to represent non-free-choice behavior − Inability to represent hierarchy • Ability to deal with noise • Completeness notion assumed • Approach used (direct algorithmic approaches, two- phase approaches, computational intelligence approaches, partial approaches, etc.) PAGE 9
  • 11. Examples • Algorithmic techniques • Alpha miner • Alpha+, Alpha++, Alpha# • FSM miner • Fuzzy miner • Heuristic miner • Multi phase miner • Genetic process mining • Single/duplicate tasks • Distributed GM • Region-based process mining • State-based regions • Language based regions • Classical approaches not dealing with concurrency • Inductive inference (Mark Gold, Dana Angluin et al.) • Sequence mining PAGE 10
  • 12. Heuristic mining • To deal with noise and incompleteness. • To have a better representational bias than the α algorithm (AND/XOR/OR/skip). • Uses C-nets. b check policy a c e register check close claim damage case d consult expert PAGE 11
  • 13. Example log; problem α algorithm p5 b a p1 d p3 e start end p2 c p4 PAGE 12
  • 14. Taking into account frequencies PAGE 13
  • 16. Example PAGE 15
  • 17. Lower threshold (2 direct successions and a dependency of at least 0.7) 5(0.83) b 11(0.92) 11(0.92) a c e 11(0.92) 11(0.92) 13(0.93) 13(0.93) d 4(0.80) PAGE 16
  • 18. Higher threshold (5 direct successions and a dependency of at least 0.9) b 11(0.92) 11(0.92) a c e 11(0.92) 11(0.92) 13(0.93) 13(0.93) d PAGE 17
  • 19. Learning splits and joins 5 20 b 20 21 5 20 20 5 20 20 20 20 a c e 40 20 21 20 40 13 13 13 13 13 13 d 4 17 4 4 PAGE 18
  • 20. Alternative visualization 5 20 b 20 21 5 20 20 5 20 20 20 20 a c e 40 20 21 20 40 13 13 13 13 13 13 d b 4 17 4 4 AND AND a c e d PAGE 19
  • 21. Characteristics of heuristic mining • Can deal with noise and therefore quite robust. • Improved representational bias. • Split and join rules are only considered locally (therefore most of the discovered model are not sound and require repair actions). PAGE 20
  • 22. Genetic process mining create initial population event log mutation next generation compute fitness elitism termination tournament children crossover select best parents individual “dead” individuals PAGE 21
  • 23. Design decisions • Representation of individuals • Initialization • Fitness function • Selection strategy (tournament and elitism) • Crossover create initial population • Mutation event log mutation next generation compute fitness elitism termination tournament children crossover select best parents individual “dead” individuals PAGE 22
  • 24. Example: crossover [Figure: two parent process models and the two child models obtained by crossover; all four models use the activities a (register request), b (examine thoroughly), c (examine casually), d (check ticket), e (decide), f (reinitiate request), g (pay compensation), h (reject request) between start and end] PAGE 23
  • 25. Example: mutation [Figure: a process model over the same activities a to h before and after mutation; one place is removed and one arc is added] PAGE 24
  • 26. Characteristics of genetic process mining • Requires a lot of computing power. • Can be distributed easily. • Can deal with noise, infrequent behavior, duplicate tasks, invisible tasks, etc. • Allows for incremental improvement and combinations with other approaches (heuristics post-optimization, etc.). PAGE 25
  • 27. Region-based mining • Two types of region theory: − State-based regions − Language-based regions • All about discovering places (like in the α algorithm)! [Figure: place p(A,B) with input transitions A = {a1, a2, …, am} and output transitions B = {b1, b2, …, bn}] PAGE 26
  • 28. State-based regions Two steps: 1. Discover a transition system (different abstractions are possible). 2. Convert the transition system into an “equivalent” Petri net. PAGE 27
  • 29. Step 1: learning a transition system [Figure: trace abcdcdcdefaghhhi with the current state after the prefix abcdcdcde; the past is abcdcdcde, the future is faghhhi, and a state can be based on the past, the future, or both] • past, future, past+future • sequence, multiset, set abstraction • limited horizon to abstract further • filtering, e.g., based on transaction type, names, etc. • labels based on activity name or other features PAGE 28
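The abstraction options listed above can be sketched as a state function plus a loop over all trace positions. The function names and the `view`, `kind`, and `horizon` parameters are hypothetical; the log ⟨a,b,c,d⟩, ⟨a,c,b,d⟩, ⟨a,e,d⟩ is assumed because it is consistent with the states shown on the following slides. Filtering and alternative labels are left out.

```python
from collections import Counter

# Assumed log, consistent with the states on the following slides.
LOG = ["abcd", "acbd", "aed"]


def state(trace, i, view="past", kind="sequence", horizon=None):
    """State after the first i events of a trace, under the chosen abstraction."""
    if view == "past":
        window = trace[:i]
        if horizon is not None:
            window = window[max(0, len(window) - horizon):]  # last `horizon` events
    else:  # view == "future"
        window = trace[i:]
        if horizon is not None:
            window = window[:horizon]  # next `horizon` events
    if kind == "sequence":
        return tuple(window)
    if kind == "multiset":
        return tuple(sorted(Counter(window).items()))
    if kind == "set":
        return frozenset(window)
    raise ValueError(f"unknown abstraction: {kind}")


def transition_system(log, **options):
    """Return (states, transitions); a transition is (source, activity, target)."""
    transitions = set()
    for trace in log:
        for i, activity in enumerate(trace):
            source = state(trace, i, **options)
            target = state(trace, i + 1, **options)
            transitions.add((source, activity, target))
    states = {s for s, _, _ in transitions} | {t for _, _, t in transitions}
    return states, transitions


for options in (
    {"view": "past", "kind": "sequence"},
    {"view": "future", "kind": "sequence"},
    {"view": "past", "kind": "multiset"},
    {"view": "past", "kind": "sequence", "horizon": 1},
):
    states, _ = transition_system(LOG, **options)
    print(options, "->", len(states), "states")
```

Stronger abstractions merge more states: on this log the full-sequence past gives 10 states, the multiset abstraction 8, and a horizon of one event only 6.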
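  • 30. Past without abstraction (full sequence) [Figure: transition system with states ‹›, ‹a›, ‹a,b›, ‹a,b,c›, ‹a,b,c,d›, ‹a,e›, ‹a,e,d›, ‹a,c›, ‹a,c,b›, ‹a,c,b,d› and transitions labeled a, b, c, d, e] PAGE 29
  • 31. Future without abstraction [Figure: transition system with states ‹a,b,c,d›, ‹b,c,d›, ‹c,d›, ‹a,e,d›, ‹e,d›, ‹d›, ‹›, ‹a,c,b,d›, ‹c,b,d›, ‹b,d› and transitions labeled a, b, c, d, e] PAGE 30
  • 32. Past with multiset abstraction [Figure: transition system with states [], [a], [a,b], [a,c], [a,e], [a,b,c], [a,d,e], [a,b,c,d] and transitions labeled a, b, c, d, e] PAGE 31
  • 33. Only last event matters for state [Figure: transition system with states ‹›, ‹a›, ‹b›, ‹c›, ‹d›, ‹e› and transitions labeled a, b, c, d, e] PAGE 32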
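  • 34. Step 2: constructing a Petri net using regions • a = enter • b = enter • c = exit • d = exit • e = do not cross • f = do not cross [Figure: region R in a transition system and the corresponding place pR with input transitions a, b and output transitions c, d; e and f are not connected to pR] PAGE 33
  • 35. Example [Figure: transition system with multiset states [], [a], [a,b], [a,c], [a,e], [a,b,c], [a,d,e], [a,b,c,d] and the Petri net derived from its regions, with transitions a, b, c, d, e and places start, p1, p2, p3, p4, end] PAGE 34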
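The enter/exit/do-not-cross rule can be checked mechanically: a set of states is a region if every activity behaves uniformly with respect to it. The sketch below is a minimal illustration; the transition system is an assumption (the multiset-abstraction system for the log ⟨a,b,c,d⟩, ⟨a,c,b,d⟩, ⟨a,e,d⟩, written out by hand), and the function name `classify` is hypothetical.

```python
# Assumed transition system: (source state, activity, target state).
TRANSITIONS = [
    ("[]", "a", "[a]"),
    ("[a]", "b", "[a,b]"),
    ("[a]", "c", "[a,c]"),
    ("[a]", "e", "[a,e]"),
    ("[a,b]", "c", "[a,b,c]"),
    ("[a,c]", "b", "[a,b,c]"),
    ("[a,b,c]", "d", "[a,b,c,d]"),
    ("[a,e]", "d", "[a,d,e]"),
]


def classify(region, transitions):
    """Return activity -> role if `region` is a region, otherwise None."""
    roles = {}
    for source, activity, target in transitions:
        if source not in region and target in region:
            role = "enter"
        elif source in region and target not in region:
            role = "exit"
        else:
            role = "do not cross"
        # Every transition with the same label must have the same role.
        if roles.setdefault(activity, role) != role:
            return None
    return roles


print(classify({"[a]", "[a,c]"}, TRANSITIONS))
print(classify({"[a]"}, TRANSITIONS))
```

For the set {[a], [a,c]} activity a enters and b and e exit, so the corresponding place has a as input and b and e as outputs. The set {[a]} alone is not a region, because b exits it in one transition and does not cross it in another.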
  • 36. Language-based regions [Figure: place pR with input transitions a1, a2, c1 (set X) and output transitions b1, b2, c1 (set Y), shown together with other transitions c, d, e, f] Region R = (X,Y,c) corresponding to place pR: X = {a1,a2,c1} = transitions producing a token for pR, Y = {b1,b2,c1} = transitions consuming a token from pR, and c is the initial marking of pR. PAGE 35
  • 37. Basic idea: enough tokens should be present when consuming. A place is feasible if it can be added without disabling any of the traces in the event log. [Figure: place pR with input transitions X and output transitions Y] PAGE 36
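The feasibility condition can be sketched as a token replay for a single candidate place. The function name, the example log, and the candidate regions are assumptions for illustration; a transition in both X and Y (such as c1 on the previous slide) must find a token before it produces one, so consumption is checked first.

```python
def is_feasible(log, produce, consume, initial):
    """Check whether the place for region (X, Y, c) disables no trace in the log.

    produce = X, consume = Y, initial = c (initial number of tokens).
    """
    for trace in log:
        tokens = initial
        for activity in trace:
            if activity in consume:
                tokens -= 1
                if tokens < 0:  # a token was needed but not present
                    return False
            if activity in produce:
                tokens += 1
    return True


# Assumed example log.
LOG = ["abcd", "acbd", "aed"]

print(is_feasible(LOG, produce={"a"}, consume={"b", "e"}, initial=0))
print(is_feasible(LOG, produce={"b"}, consume={"a"}, initial=0))
print(is_feasible(LOG, produce={"b"}, consume={"a"}, initial=1))
```

The place between a and {b, e} is feasible. A place that a consumes from is only feasible here if it starts with a token, which shows the role of the initial marking c.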
  • 38. Example PAGE 37
  • 39. Regions PAGE 38
  • 40. Model [Figure: Petri net with transitions a, b, c, d, e and places p1, p2, p3, p4, p5, p6] PAGE 39
  • 41. Characteristics of region-based mining • Can be used to discover more complex control-flow structures. • Classical approaches need to be adapted (overfitting!). • Representational bias can be parameterized (e.g., free-choice nets, label splitting, etc.). • Problems dealing with noise. PAGE 40
  • 42. Other approaches, e.g. fuzzy mining PAGE 41
  • 43. Evaluating the discovered process • Fitness: Is the event log possible according to the model? • Precision: Is the model not underfitting (allow for too much)? • Generalization: Is the model not overfitting (only allow for the “accidental” examples)? • Structure: Is this the simplest model (Occam's Razor)? PAGE 42
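The tension between fitness and precision can be made concrete with two crude trace-level proxies. These are not the measures used in process mining tools; they are assumptions for illustration that only work when the model's language is a small finite set of traces. The log and the three candidate models are hypothetical as well.

```python
def trace_fitness(log, model_language):
    """Share of observed traces (weighted by frequency) that the model allows."""
    total = sum(log.values())
    return sum(freq for trace, freq in log.items() if trace in model_language) / total


def trace_precision(log, model_language):
    """Share of traces allowed by the model that were actually observed."""
    return sum(1 for trace in model_language if trace in log) / len(model_language)


# Assumed log (trace -> frequency) and three hypothetical models, each given
# as the finite set of traces it allows.
LOG = {"abcd": 1, "acbd": 1, "aed": 1}
EXACT = {"abcd", "acbd", "aed"}
NARROW = {"abcd"}
BROAD = EXACT | {"abd", "acd", "aebd", "aecd", "adcb"}

for name, model in [("exact", EXACT), ("narrow", NARROW), ("broad", BROAD)]:
    print(name, round(trace_fitness(LOG, model), 2), round(trace_precision(LOG, model), 2))
```

The narrow model loses fitness, while the broad model keeps perfect fitness but loses precision. Generalization and structure are not captured by these proxies.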