Chapter 5
Process Discovery:
An Introduction
prof.dr.ir. Wil van der Aalst
www.processmining.org
Overview
Chapter 1
Introduction



Part I: Preliminaries

Chapter 2                   Chapter 3
Process Modeling and        Data Mining
Analysis


Part II: From Event Logs to Process Models

Chapter 4                  Chapter 5               Chapter 6
Getting the Data           Process Discovery: An   Advanced Process
                           Introduction            Discovery Techniques


Part III: Beyond Process Discovery

Chapter 7                   Chapter 8              Chapter 9
Conformance                 Mining Additional      Operational Support
Checking                    Perspectives


Part IV: Putting Process Mining to Work

Chapter 10                  Chapter 11             Chapter 12
Tool Support                Analyzing “Lasagna     Analyzing “Spaghetti
                            Processes”             Processes”


Part V: Reflection

Chapter 13                  Chapter 14
Cartography and             Epilogue
Navigation
                                                                          PAGE 1
Process discovery

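[Figure: the “world” (business processes, people, machines, components,
organizations) is supported and controlled by a software system, which records
events (e.g., messages, transactions, etc.) in event logs. A (process) model
models and analyzes the “world” and specifies, configures, implements, and
analyzes the software system. Event logs and models are linked by discovery,
conformance, and enhancement.]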
                                                                   PAGE 2
Process discovery = Play-In
Play-In




      event log                                   process model


Play-Out




                  process model                                event log


Replay

                                                      •   extended model
                                                          showing times,
                                                          frequencies, etc.
                                                      •   diagnostics
                                                      •   predictions
                                                      •   recommendations
     event log                    process model                               PAGE 3
Example

                                   b



            a          p1          e   p3   d

start                                             end

                       p2          c   p4




 Event log contains all possible
 traces of model and vice versa.
                                                PAGE 4
Another example

                  p1                  b        p3




            a              f               e        d

start                                 p5                  end

                  p2                  c        p4




 Generalization: event log contains only
 subset of all possible traces of model.
                                                        PAGE 5
Notation is less relevant (e.g. BPMN)
                  b



         a   p1   e    p3     d

start                                 end   b
             p2   c   p4



                                            c

                                  a             d
                      start                            end
                                            e




                                                    PAGE 6
Another BPMN example
            p1       b        p3




        a        f        e         d

start                p5                    end
                                                         b

            p2       c        p4



                                                         c

                                                 a               d
                                   start                                 end

                                                     f       e




                                                                     PAGE 7
Challenge

• In general, there is a trade-off between the following
  four quality criteria:
1. Fitness: the discovered model should allow for the
   behavior seen in the event log.
2. Precision (avoid underfitting): the discovered model
   should not allow for behavior completely unrelated
   to what was seen in the event log.
3. Generalization (avoid overfitting): the discovered
   model should generalize the example behavior seen
   in the event log.
4. Simplicity: the discovered model should be as
   simple as possible.
                                                      PAGE 8
Process Discovery:
example of algorithm



             α

                       PAGE 9
>,→,||,# relations


• Direct succession: x>y iff for some case x is directly
  followed by y.
• Causality: x→y iff x>y and not y>x.
• Parallel: x||y iff x>y and y>x.
• Choice: x#y iff not x>y and not y>x.

Example log: abcd, acbd, aed

Direct succession: a>b, a>c, a>e, b>c, b>d, c>b, c>d, e>d
Causality:         a→b, a→c, a→e, b→d, c→d, e→d
Parallel:          b||c, c||b
Choice:            b#e, e#b, c#e, a#d, …
                                                         PAGE 10
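The four relations can be derived mechanically from the directly-follows pairs. The following is a minimal sketch, assuming traces are written as strings of single-letter activities; the function name `footprint` is illustrative and not taken from the slides.

```python
from itertools import product

def footprint(log):
    """Derive the >, →, || and # relations from a list of traces."""
    activities = sorted({a for trace in log for a in trace})
    # x > y: y directly follows x in at least one trace.
    follows = {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}
    causal, parallel, choice = set(), set(), set()
    for x, y in product(activities, repeat=2):
        xy, yx = (x, y) in follows, (y, x) in follows
        if xy and not yx:
            causal.add((x, y))      # x → y
        elif xy and yx:
            parallel.add((x, y))    # x || y
        elif not xy and not yx:
            choice.add((x, y))      # x # y
    return follows, causal, parallel, choice

L1 = ["abcd", "acbd", "aed"]
follows, causal, parallel, choice = footprint(L1)
print(sorted(causal))    # the six causal pairs listed on the slide
print(sorted(parallel))  # b||c and c||b
```

Pairs where only y>x holds are the reverse of a causal pair and are therefore not stored separately.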
Basic Idea Used by α Algorithm (1)




        a                     b

     (a) sequence pattern: a→b




                                     PAGE 11
Basic Idea Used by α Algorithm (2)

   (b) XOR-split pattern: a→b, a→c, and b#c
   (c) XOR-join pattern: a→c, b→c, and a#b
                                                         PAGE 12
Basic Idea Used by α Algorithm (3)

   (d) AND-split pattern: a→b, a→c, and b||c
   (e) AND-join pattern: a→c, b→c, and a||b
                                                       PAGE 13
Example Revisited

 Direct succession: a>b, a>c, a>e, b>c, b>d, c>b, c>d, e>d
 Causality:         a→b, a→c, a→e, b→d, c→d, e→d
 Parallel:          b||c, c||b
 Choice:            b#e, e#b, c#e, a#d, …

                                 b


              a          p1      e   p3   d

   start                                      end

                         p2      c   p4
Result produced by α algorithm                PAGE 14
Footprint of L1


                    b



           a   p1   e   p3   d

   start                         end

               p2   c   p4




                                       PAGE 15
Footprint of L2



                     p1       b        p3




                 a        f        e        d

         start                p5                end

                     p2       c        p4




                                                      PAGE 16
Simple patterns

                     a                  b

                  (a) sequence pattern: a→b

                         b          a

    a                                                        c

                             c      b

    (b) XOR-split pattern:           (c) XOR-join pattern:
     a→b, a→c, and b#c               a→c, b→c, and a#b

                         b          a

    a                                                        c

                             c      b

    (d) AND-split pattern:           (e) AND-join pattern:
     a→b, a→c, and b||c               a→c, b→c, and a||b
                                                                 PAGE 17
Algorithm

Let L be an event log over T. α(L) is defined as follows.
1. T_L = { t ∈ T | ∃σ ∈ L [t ∈ σ] },
2. T_I = { t ∈ T | ∃σ ∈ L [t = first(σ)] },
3. T_O = { t ∈ T | ∃σ ∈ L [t = last(σ)] },
4. X_L = { (A,B) | A ⊆ T_L ∧ A ≠ ∅ ∧ B ⊆ T_L ∧ B ≠ ∅ ∧
   ∀a ∈ A ∀b ∈ B [a →_L b] ∧ ∀a1,a2 ∈ A [a1 #_L a2] ∧ ∀b1,b2 ∈ B [b1 #_L b2] },
5. Y_L = { (A,B) ∈ X_L | ∀(A′,B′) ∈ X_L [A ⊆ A′ ∧ B ⊆ B′ ⇒ (A,B) = (A′,B′)] },
6. P_L = { p(A,B) | (A,B) ∈ Y_L } ∪ { i_L, o_L },
7. F_L = { (a, p(A,B)) | (A,B) ∈ Y_L ∧ a ∈ A } ∪ { (p(A,B), b) | (A,B) ∈
   Y_L ∧ b ∈ B } ∪ { (i_L, t) | t ∈ T_I } ∪ { (t, o_L) | t ∈ T_O }, and
8. α(L) = (P_L, T_L, F_L).
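The eight steps above can be sketched directly in code. This is a naive illustration, assuming traces are non-empty strings of single-letter activities; it enumerates all subsets of activities, so it is only practical for small alphabets. Representing a place p(A,B) as the pair `(A, B)` and the source and sink places as the strings `"iL"` and `"oL"` are choices made for this sketch.

```python
from itertools import chain, combinations

def nonempty_subsets(items):
    """All non-empty subsets of items, as tuples."""
    items = sorted(items)
    return chain.from_iterable(
        combinations(items, r) for r in range(1, len(items) + 1))

def alpha(log):
    """Return (P_L, T_L, F_L) for a log given as a list of non-empty traces."""
    T_L = {a for trace in log for a in trace}   # step 1: all activities
    T_I = {trace[0] for trace in log}           # step 2: start activities
    T_O = {trace[-1] for trace in log}          # step 3: end activities

    follows = {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}

    def causal(a, b):   # a → b
        return (a, b) in follows and (b, a) not in follows

    def choice(a, b):   # a # b
        return (a, b) not in follows and (b, a) not in follows

    # Sets whose members are pairwise in the # relation (including a # a).
    unrelated = [frozenset(s) for s in nonempty_subsets(T_L)
                 if all(choice(x, y) for x in s for y in s)]

    # Step 4: every a in A is causally followed by every b in B.
    X_L = {(A, B) for A in unrelated for B in unrelated
           if all(causal(a, b) for a in A for b in B)}

    # Step 5: keep only the maximal pairs.
    Y_L = {(A, B) for (A, B) in X_L
           if not any((A, B) != (A2, B2) and A <= A2 and B <= B2
                      for (A2, B2) in X_L)}

    # Step 6: one place per maximal pair, plus the source and sink place.
    P_L = set(Y_L) | {"iL", "oL"}

    # Step 7: arcs into and out of every place.
    F_L = {(a, (A, B)) for (A, B) in Y_L for a in A}
    F_L |= {((A, B), b) for (A, B) in Y_L for b in B}
    F_L |= {("iL", t) for t in T_I} | {(t, "oL") for t in T_O}

    return P_L, T_L, F_L                        # step 8

P_L, T_L, F_L = alpha(["abcd", "acbd", "aed"])
for A, B in P_L - {"iL", "oL"}:
    print(sorted(A), sorted(B))
```

For the log abcd, acbd, aed this yields the places p({a},{b,e}), p({a},{c,e}), p({b,e},{d}), and p({c,e},{d}), in arbitrary print order.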



                                                                      PAGE 18
Key idea: find places


                       a1                              b1


                       a2                              b2

                       ...           p(A,B)            ...
                       am                              bn


              A={a1,a2, … am}                 B={b1,b2, … bn}

4. X_L = { (A,B) | A ⊆ T_L ∧ A ≠ ∅ ∧ B ⊆ T_L ∧ B ≠ ∅ ∧
   ∀a ∈ A ∀b ∈ B [a →_L b] ∧ ∀a1,a2 ∈ A [a1 #_L a2] ∧ ∀b1,b2 ∈ B [b1 #_L b2] },
5. Y_L = { (A,B) ∈ X_L | ∀(A′,B′) ∈ X_L [A ⊆ A′ ∧ B ⊆ B′ ⇒ (A,B) = (A′,B′)] },
                                                                          PAGE 19
Places as footprints

                a1                          b1


                a2                          b2

                ...         p(A,B)          ...
                am                          bn


          A={a1,a2, … am}            B={b1,b2, … bn}




                                                       PAGE 20
b



        a   p1   e   p3   d

start                               end

            p2   c   p4




                          PAGE 21
Another event log L3




                       PAGE 22
Model for L3




                                            f



                                            c

        a                  b   p({b},{c})       p({c},{e})   e                  g

iL          p({a,f},{b})                    d                    p({e},{f,g})                 oL

                               p({b},{d})       p({d},{e})



                                                                                    PAGE 23
Another event log L4




      a                                      d

                          c

iL         p({a,b},{c})       p({c},{d,e})       oL
      b                                      e




                                                 PAGE 24
Event log L5




               PAGE 25
PAGE 26
Discovered model

                d                            c
                           p({c},{d})

                              b
            p({a,d},{b})                p({b},{c,f})
iL      a                                              f       oL

                              e
            p({a},{e})                  p({e},{f})




                                                           PAGE 27
Limitation of α algorithm
 (implicit places)




                             c
              a
                             d
                                 p1   g
                             e
                                 p2
              b
                             f

Green places are implicit!
                                          PAGE 28
Limitation of α algorithm
(loops of length 1)




              b



      a               c




                                b



                            a       c


                                        PAGE 29
Limitation of α algorithm
(loops of length 2)




                      c


     a                b       d




                                  b

                          a           d

                                  c


                                          PAGE 30
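A small experiment makes the length-two loop problem concrete. The sketch below assumes a hypothetical log for a process a, b, d in which c sends the flow back to b; the log itself is not given on the slide. Because both b>c and c>b are observed, the relations classify b and c as parallel, and no causal pair involves c, so c ends up unconnected.

```python
def direct_succession(log):
    """All pairs (x, y) where y directly follows x in some trace."""
    return {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}

# Hypothetical log: after b, activity c may loop back to b any number of times.
log = ["abd", "abcbd", "abcbcbd"]
follows = direct_succession(log)

def relation(x, y):
    """Classify the pair (x, y) as →, ←, || or #."""
    xy, yx = (x, y) in follows, (y, x) in follows
    if xy and yx:
        return "||"
    if xy:
        return "→"
    if yx:
        return "←"
    return "#"

print(relation("b", "c"))                      # || although b and c form a loop
print(relation("a", "b"), relation("b", "d"))  # both →
print(relation("a", "c"), relation("c", "d"))  # both #, so c is left unconnected
```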
Limitation of α algorithm
(non-local dependencies)




           a                       p1   d

                             c

           b                       p2   e



Green places are not discovered!




                                            PAGE 31
Difficult constructs for α algorithm



           a

                                       c

           b




                                           PAGE 32
Taking the transactional life-cycle into
    account

a                                                    c
      assign                                               assign

                 b
                       start
    assigned                                             assigned
                               suspend


                     running
        start                                                start
                                         suspended

                               resume
                  complete
     running                                              running



    complete                                             complete




                                                                     PAGE 33
Rediscovering process models



                    simulate            discover   discovered
 original process
                                event                process
      model
                                 log                  model
         N                                              N’
                               N=N’ ?


 The rediscovery problem: Is the discovered
 model N’ equivalent to the original model N?



                                                                PAGE 34
Equivalence: trace equivalence,
 bisimilarity, and branching bisimilarity


                s1
                                              s5                    s8
                                      birth                                   curse
             birth                                          birth
        curse                     curse         s6      curse        curse
                              ?                                     s9              s10
                 s2
        curse                     heaven         hell   heaven        hell
                   heaven hell                                               hell
                heaven
curse                                                                        heaven
        s3               s4                s7                   s11
                 TS1                       TS2                  TS3

Three trace-equivalent transition systems: TS1 and TS2
are not bisimilar, but TS2 and TS3 are bisimilar.

                                                                                    PAGE 35
Branching bisimilarity defined for YAWL


   start                      s1                              s6          start

                          check                       check
  check                                                                  check
                              s2

                              τ          τ
                                                              s7
  c1       c2            s3                  s4                                   c3
                                                  ?

reject          accept   reject          accept   reject      accept   reject          accept

   end                                                           s8       end
                              s5

                                   TS1                     TS2
TS1 and TS2 are not branching bisimilar (although trace equivalent).

                                                                                           PAGE 36
Challenge: finding the right
representational bias




                                   a                       a
                     start                     p                     end


 There is no WF-net with unique visible labels that exhibits this behavior.

                                                                         PAGE 37
Another example

                 τ

        a        b          c
start       p1         p1       end
                 (a)


            a



        a        b          c
start       p1         p1       end
                 (b)


        a        b          c
start       p1         p1       end
                 (c)

There is no WF-net with unique visible labels that exhibits this behavior.
                                               PAGE 38
Challenge: noise and incompleteness

• To discover a suitable process model it is
  assumed that the event log contains a
  representative sample of behavior.
• Two related phenomena:
    − Noise: the event log contains rare and
      infrequent behavior not representative for
      the typical behavior of the process.
    − Incompleteness: the event log contains
      too few events to be able to discover
      some of the underlying control-flow
      structures.
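One common way to cope with noise is to drop rare variants before discovery. The sketch below is a simple heuristic, not a method prescribed by these slides; the function name, the 5% threshold, and the example frequencies are assumptions chosen for illustration.

```python
from collections import Counter

def filter_infrequent_variants(log, min_share=0.05):
    """Keep only traces whose variant covers at least min_share of all cases."""
    counts = Counter(log)
    total = sum(counts.values())
    keep = {variant for variant, n in counts.items() if n / total >= min_share}
    return [trace for trace in log if trace in keep]

# Illustrative log: two frequent variants and two rare ones.
log = ["ACD"] * 99 + ["ACE"] * 2 + ["BCE"] * 85 + ["BCD"] * 3
filtered = filter_infrequent_variants(log)
print(Counter(filtered))
```

Whether the removed variants are really noise or merely rare but valid behavior cannot be decided from frequencies alone, which is why the threshold is a judgment call.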
                                              PAGE 39
More on incompleteness




See also Chapter 3 (cross-validation, precision, recall, etc.)
                                                                 PAGE 40
Challenge: Balancing
Between Underfitting and
Overfitting
                           PAGE 41
Challenge: four competing quality
criteria

 “able to replay event log”                 “Occam’s razor”

          fitness                             simplicity

                               process
                              discovery



generalization                                precision
 “not overfitting the log”                “not underfitting the log”



                                                                PAGE 42
Flower model


                  b   c
              a               d



      start                       end



              e
                                  h
                  f       g


                                        PAGE 43
What is the best model?
                 A        D



                     C



 ACD   99        B        E
 ACE   0
 BCE   85
                 A        D
 BCD   0

                     C



                 B        E




                              PAGE 44
What is the best model?
                 A        D



                     C



 ACD   99        B        E
 ACE   88
 BCE   85
                 A        D
 BCD   78

                     C



                 B        E




                              PAGE 45
What is the best model?
                 A        D



                     C



 ACD   99        B        E
 ACE   2
 BCE   85
                 A        D
 BCD   3

                     C



                 B        E




                              PAGE 46
Example: one log four models
Activities: a = register request, b = examine thoroughly, c = examine casually,
d = check ticket, e = decide, f = reinitiate request, g = pay compensation,
h = reject request.

Event log:

   #   trace
 455   acdeh
 191   abdeg
 177   adceh
 144   abdeh
 111   acdeg
  82   adceg
  56   adbeh
  47   acdefdbeh
  38   adbeg
  33   acdefbdeh
  14   acdefbdeg
  11   acdefdbeg
   9   adcefcdeh
   8   adcefdbeh
   5   adcefbdeg
   3   acdefbdefdbeg
   2   adcefdbeg
   2   adcefbdefbdeg
   1   adcefdbefbdeh
   1   adbefbdefdbeg
   1   adcefdbefcdefdbeg
1391   (total)

[Figure: four models N1 to N4 for this log, shown next to the four quality
criteria: fitness, simplicity, generalization, and precision.]

N1 : fitness = +, precision = +, generalization = +, simplicity = +
N2 : fitness = -, precision = +, generalization = -, simplicity = +
     (the single sequence a, c, d, e, h from start to end)
N3 : fitness = +, precision = -, generalization = +, simplicity = +
N4 : fitness = +, precision = +, generalization = -, simplicity = -
     (one separate path from start to end for each of the 21 variants seen in
      the log, e.g., a d c e g, a c d e g, a d c e h, a c d e h, …, a b d e g,
      a d b e h, a b d e h)
                                                                        PAGE 47
Model N1

   #  trace
 455  acdeh
 191  abdeg
 177  adceh
 144  abdeh
 111  acdeg
  82  adceg
  56  adbeh
  47  acdefdbeh
  38  adbeg
  33  acdefbdeh
  14  acdefbdeg
  11  acdefdbeg
   9  adcefcdeh
   8  adcefdbeh
   5  adcefbdeg
   3  acdefbdefdbeg
   2  adcefdbeg
   2  adcefbdefbdeg
   1  adcefdbefbdeh
   1  adbefbdefdbeg
   1  adcefdbefcdefdbeg
1391  (total)

[Figure: model N1, from start to end, with transitions a (register request), b (examine thoroughly), c (examine casually), d (check ticket), e (decide), f (reinitiate request), g (pay compensation), h (reject request).]

N1 : fitness = +, precision = +, generalization = +, simplicity = +
                                                                                             PAGE 48
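The "fitness = +" rating of N1 can be checked with a minimal sketch. It assumes that N1's behavior can be written as a regular expression over the activity letters: after a, either b or c runs in any order with d, then e, and f sends the case back for another round before g or h ends it. The regular expression and the helper `trace_fitness` are illustrative assumptions, not part of the slides, and trace-level replay is only one simple way to quantify fitness.

```python
import re

# Event log from the slide: trace variant -> number of cases.
LOG = {
    "acdeh": 455, "abdeg": 191, "adceh": 177, "abdeh": 144, "acdeg": 111,
    "adceg": 82, "adbeh": 56, "acdefdbeh": 47, "adbeg": 38, "acdefbdeh": 33,
    "acdefbdeg": 14, "acdefdbeg": 11, "adcefcdeh": 9, "adcefdbeh": 8,
    "adcefbdeg": 5, "acdefbdefdbeg": 3, "adcefdbeg": 2, "adcefbdefbdeg": 2,
    "adcefdbefbdeh": 1, "adbefbdefdbeg": 1, "adcefdbefcdefdbeg": 1,
}

# ASSUMPTION: N1's behavior expressed as a regular expression.
# One examination (b or c) and the ticket check d occur in either order.
EXAM_AND_CHECK = r"(?:[bc]d|d[bc])"
N1 = re.compile(rf"a{EXAM_AND_CHECK}e(?:f{EXAM_AND_CHECK}e)*[gh]")


def trace_fitness(model, log):
    """Fraction of cases whose complete trace the model can replay."""
    fitting = sum(count for trace, count in log.items() if model.fullmatch(trace))
    return fitting / sum(log.values())


print(len(LOG), "variants,", sum(LOG.values()), "cases")
print("trace-level fitness of N1:", trace_fitness(N1, LOG))
```

Under this reading, every one of the 21 variants is accepted, which matches the "+" for fitness on the slide.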
Model N2

# trace: same event log as for model N1 (21 variants, 1391 cases in total)

[Figure: model N2, a purely sequential model: start, a (register request), c (examine casually), d (check ticket), e (decide), h (reject request), end.]

N2 : fitness = -, precision = +, generalization = -, simplicity = +
                                                                                            PAGE 49
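Why N2 scores "-" on fitness can be made concrete with a short sketch. It assumes that the sequential model allows exactly one trace, acdeh, and measures the share of cases in the log that follow it. The variable names are illustrative only.

```python
# Event log from the slide: trace variant -> number of cases.
LOG = {
    "acdeh": 455, "abdeg": 191, "adceh": 177, "abdeh": 144, "acdeg": 111,
    "adceg": 82, "adbeh": 56, "acdefdbeh": 47, "adbeg": 38, "acdefbdeh": 33,
    "acdefbdeg": 14, "acdefdbeg": 11, "adcefcdeh": 9, "adcefdbeh": 8,
    "adcefbdeg": 5, "acdefbdefdbeg": 3, "adcefdbeg": 2, "adcefbdefbdeg": 2,
    "adcefdbefbdeh": 1, "adbefbdefdbeg": 1, "adcefdbefcdefdbeg": 1,
}

# ASSUMPTION: the sequential model N2 allows only this single trace.
N2_LANGUAGE = {"acdeh"}

fitting = sum(count for trace, count in LOG.items() if trace in N2_LANGUAGE)
total = sum(LOG.values())
print(f"cases replayable by N2: {fitting}/{total} = {fitting / total:.3f}")
```

N2 captures only the most frequent variant, so roughly two thirds of the cases cannot be replayed.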
Model N3

# trace: same event log as for model N1 (21 variants, 1391 cases in total)

[Figure: model N3, from start to end, with transitions a (register request), b (examine thoroughly), c (examine casually), d (check ticket), e (decide), f (reinitiate request), g (pay compensation), h (reject request).]

N3 : fitness = +, precision = -, generalization = +, simplicity = +
                                                                                                        PAGE 50
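The "precision = -" rating of N3 can be illustrated by counting how much behavior the model allows beyond the log. The sketch assumes N3 is read as a flower-like model: a trace starts with a, ends with g or h, and may contain any number of b, c, d, e, f in any order in between. That reading, the regular expression, and the helper `allowed_traces` are assumptions made for illustration.

```python
import re
from itertools import product

# The 21 trace variants of the event log on the slide.
VARIANTS = (
    "acdeh abdeg adceh abdeh acdeg adceg adbeh acdefdbeh adbeg acdefbdeh "
    "acdefbdeg acdefdbeg adcefcdeh adcefdbeh adcefbdeg acdefbdefdbeg adcefdbeg "
    "adcefbdefbdeg adcefdbefbdeh adbefbdefdbeg adcefdbefcdefdbeg"
).split()

# ASSUMPTION: N3 read as "a, then any mix of b..f, then g or h".
N3 = re.compile(r"a[bcdef]*[gh]")


def allowed_traces(model, length, alphabet="abcdefgh"):
    """All traces of the given length over the alphabet that the model accepts."""
    candidates = ("".join(letters) for letters in product(alphabet, repeat=length))
    return [trace for trace in candidates if model.fullmatch(trace)]


fits_log = all(N3.fullmatch(v) for v in VARIANTS)
seen = [v for v in VARIANTS if len(v) == 5]
allowed = allowed_traces(N3, 5)
print("N3 replays every variant:", fits_log)
print("length-5 traces seen in the log:", len(seen))
print("length-5 traces allowed by N3:  ", len(allowed))
```

Under these assumptions the model replays the whole log, yet for traces of five events it allows many times more behavior than was ever observed, which is what underfitting means here.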
Model N4

# trace: same event log as for model N1 (21 variants, 1391 cases in total)

[Figure: model N4, one separate path from start to end per trace variant:
  a d c e g: register request, check ticket, examine casually, decide, pay compensation
  a c d e g: register request, examine casually, check ticket, decide, pay compensation
  a d c e h: register request, check ticket, examine casually, decide, reject request
  a c d e h: register request, examine casually, check ticket, decide, reject request
  … (all 21 variants seen in the log)
  a b d e g: register request, examine thoroughly, check ticket, decide, pay compensation
  a d b e h: register request, check ticket, examine thoroughly, decide, reject request
  a b d e h: register request, examine thoroughly, check ticket, decide, reject request]

N4 : fitness = +, precision = +, generalization = -, simplicity = -
                                                                                                             PAGE 51
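The ratings of N4 can be illustrated with a small sketch. It assumes that N4 accepts exactly the 21 observed variants and has one transition per event of each variant. It then tests a trace that is absent from the log but plausible for this process: abdefcdeh (register request, examine thoroughly, check ticket, decide, reinitiate request, examine casually, check ticket, decide, reject request). The regular expression used for N1 is the same illustrative assumption about N1's behavior, not something stated on the slides.

```python
import re

# The 21 trace variants of the event log on the slide.
VARIANTS = (
    "acdeh abdeg adceh abdeh acdeg adceg adbeh acdefdbeh adbeg acdefbdeh "
    "acdefbdeg acdefdbeg adcefcdeh adcefdbeh adcefbdeg acdefbdefdbeg adcefdbeg "
    "adcefbdefbdeg adcefdbefbdeh adbefbdefdbeg adcefdbefcdefdbeg"
).split()

# ASSUMPTION: the enumerating model N4 accepts exactly the observed variants.
N4_LANGUAGE = set(VARIANTS)

# ASSUMPTION: N1's behavior expressed as a regular expression.
N1 = re.compile(r"a(?:[bc]d|d[bc])e(?:f(?:[bc]d|d[bc])e)*[gh]")

unseen = "abdefcdeh"  # not in the log
n4_accepts = unseen in N4_LANGUAGE
n1_accepts = bool(N1.fullmatch(unseen))

# ASSUMPTION: one transition per event of each enumerated variant.
n4_transitions = sum(len(v) for v in VARIANTS)

print("N4 accepts unseen trace:", n4_accepts)
print("N1 accepts unseen trace:", n1_accepts)
print("transitions in N4:", n4_transitions, "versus 8 in N1")
```

N4 reproduces the log perfectly but rejects the unseen trace that N1 accepts, and it needs far more transitions than N1, which is consistent with the "-" for both generalization and simplicity.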
Why is process mining such a difficult
problem?

• There are no negative examples (i.e., a log shows
  what has happened but does not show what could
  not happen).
• Due to concurrency, loops, and choices the search
  space has a complex structure and the log typically
  contains only a fraction of all possible behaviors.
• There is no clear relation between the size of a model
  and its behavior (i.e., a smaller model may generate
  more or less behavior although classical analysis
  and evaluation methods typically assume some
  monotonicity property).
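
The third point can be made concrete with a hypothetical toy Petri net (not taken from the slides). Removing a place makes the model smaller and adds behavior, while removing a transition also makes it smaller but takes behavior away. The net, the place names, and the helpers below are assumptions made for illustration. The enumeration assumes a safe, acyclic net.

```python
def t(inputs, outputs):
    """Build a transition as (input places, output places)."""
    return frozenset(inputs.split()), frozenset(outputs.split())


# Hypothetical toy net: after a, either b then c (place p5 forces b before c),
# or the single step e; d completes the case.
BASE = {
    "a": t("start", "p1 p2"),
    "b": t("p1", "p3 p5"),
    "c": t("p2 p5", "p4"),
    "e": t("p1 p2", "p3 p4"),
    "d": t("p3 p4", "end"),
}


def traces(net, marking=frozenset({"start"}), final=frozenset({"end"})):
    """All firing sequences from `marking` to `final` (safe, acyclic nets only)."""
    if marking == final:
        return {""}
    found = set()
    for name, (inputs, outputs) in net.items():
        if inputs <= marking:  # transition is enabled
            successor = (marking - inputs) | outputs
            found |= {name + rest for rest in traces(net, successor, final)}
    return found


def without_place(net, place):
    """Copy of the net with one place and its arcs removed."""
    return {n: (i - {place}, o - {place}) for n, (i, o) in net.items()}


def without_transition(net, name):
    """Copy of the net with one transition removed."""
    return {n: io for n, io in net.items() if n != name}


print("base net:          ", sorted(traces(BASE)))
print("without place p5:  ", sorted(traces(without_place(BASE, "p5"))))
print("without transition e:", sorted(traces(without_transition(BASE, "e"))))
```

Both modified nets are smaller than the base net, yet one allows more traces and the other fewer, so model size alone says nothing about the amount of behavior.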


                                                     PAGE 52
Creating a 2-D slice of a 3-D reality




Creating a 2-D slice of a 3-D reality: the
process is viewed from a specific angle, the
process is scoped using a frame, and the
resolution determines the granularity of the
resulting model                                PAGE 53

Process Mining - Chapter 5 - Process Discovery

  • 1. Chapter 5 Process Discovery: An Introduction prof.dr.ir. Wil van der Aalst www.processmining.org
  • 2. Overview Chapter 1 Introduction Part I: Preliminaries Chapter 2 Chapter 3 Process Modeling and Data Mining Analysis Part II: From Event Logs to Process Models Chapter 4 Chapter 5 Chapter 6 Getting the Data Process Discovery: An Advanced Process Introduction Discovery Techniques Part III: Beyond Process Discovery Chapter 7 Chapter 8 Chapter 9 Conformance Mining Additional Operational Support Checking Perspectives Part IV: Putting Process Mining to Work Chapter 10 Chapter 11 Chapter 12 Tool Support Analyzing “Lasagna Analyzing “Spaghetti Processes” Processes” Part V: Reflection Chapter 13 Chapter 14 Cartography and Epilogue Navigation PAGE 1
  • 3. Process discovery supports/ “world” business controls processes software people machines system components organizations records events, e.g., messages, specifies transactions, models configures etc. analyzes implements analyzes discovery (process) event conformance model logs enhancement PAGE 2
  • 4. Process discovery = Play-In Play-In event log process model Play-Out process model event log Replay • extended model showing times, frequencies, etc. • diagnostics • predictions • recommendations event log process model PAGE 3
  • 5. Example b a p1 e p3 d start end p2 c p4 Event log contains all possible traces of model and vice versa. PAGE 4
  • 6. Another example p1 b p3 a f e d start p5 end p2 c p4 Generalization: event log contains only subset of all possible traces of model. PAGE 5
  • 7. Notation is less relevant (e.g. BPMN) b a p1 e p3 d start end b p2 c p4 c a d start end e PAGE 6
  • 8. Another BPMN example p1 b p3 a f e d start p5 end b p2 c p4 c a d start end f e PAGE 7
  • 9. Challenge • In general, there is a trade-off between the following four quality criteria: 1.Fitness: the discovered model should allow for the behavior seen in the event log. 2.Precision (avoid underfitting): the discovered model should not allow for behavior completely unrelated to what was seen in the event log. 3.Generalization (avoid overfitting): the discovered model should generalize the example behavior seen in the event log. 4.Simplicity: the discovered model should be as simple as possible. PAGE 8
  • 10. Process Discovery: example of algorithm α PAGE 9
  • 11. >,→,||,# relations • Direct succession: x>y iff for some case x is directly followed by y. abcd • Causality: x→y iff x>y and acbd not y>x. aed • Parallel: x||y iff x>y and a>b y>x a>c a→b b#e a>e • Choice: x#y iff not x>y and a→c e#b b>c b||c c#e not y>x. a→e b>d c||b c>b b→d a#d … c>d c→d e>d e→d PAGE 10
  • 12. Basic Idea Used by α Algorithm (1) a b (a) sequence pattern: a→b PAGE 11
  • 13. Basic Idea Used by α Algorithm (2) a b c b a (c) XOR-join pattern: b a→c, b→c, and a#b a c c (b) XOR-split pattern: (b) XOR-split pattern:a→c, and b#c a→b, a→b, a→c, and b#c PAGE 12
  • 14. Basic Idea Used by α Algorithm (3) a b c b a (e) AND-join pattern: b a→c, b→c, and a||b a c c (d) AND-split pattern: (d) AND-split pattern: a→b, a→c, and b||c a→b, a→c, and b||c PAGE 13
  • 15. Example Revisited a>b a→b b||c b#e a>c a→c c||b e#b a>e a→e c#e b>c a#d b→d b>d … c>b c→ d c>d e→d b e>d a p1 e p3 d start end p2 c p4 Result produced by α algorithm PAGE 14
  • 16. Footprint of L1 b a p1 e p3 d start end p2 c p4 PAGE 15
  • 17. Footprint of L2 p1 b p3 a f e d start p5 end p2 c p4 PAGE 16
  • 18. Simple patterns a b (a) sequence pattern: a→b b a a c c b (b) XOR-split pattern: (c) XOR-join pattern: a→b, a→c, and b#c a→c, b→c, and a#b b a a c c b (d) AND-split pattern: (e) AND-join pattern: PAGE 17 a→b, a→c, and b||c a→c, b→c, and a||b
  • 19. Algorithm Let L be an event log over T. α(L) is defined as follows. 1. TL = { t ∈ T | ∃σ ∈ L t ∈ σ}, 2. TI = { t ∈ T | ∃σ ∈ L t = first(σ) }, 3. TO = { t ∈ T | ∃σ ∈ L t = last(σ) }, 4. XL = { (A,B) | A ⊆ TL ∧ A ≠ ø ∧ B ⊆ TL ∧ B ≠ ø ∧ ∀a ∈ A∀b ∈ B a →L b ∧ ∀a1,a2 ∈ A a1#L a2 ∧ ∀b1,b2 ∈ B b1#L b2 }, 5. YL = { (A,B) ∈ XL | ∀(A′,B′) ∈ XL A ⊆ A′ ∧B ⊆ B′⇒ (A,B) = (A′,B′) }, 6. PL = { p(A,B) | (A,B) ∈ YL } ∪{iL,oL}, 7. FL = { (a,p(A,B)) | (A,B) ∈ YL ∧ a ∈ A } ∪ { (p(A,B),b) | (A,B) ∈ YL ∧ b ∈ B } ∪{ (iL,t) | t ∈ TI} ∪{ (t,oL) | t ∈ TO}, and 8. α(L) = (PL,TL,FL). PAGE 18
  • 20. Key idea: find places a1 b1 a2 b2 ... p(A,B) ... am bn A={a1,a2, … am} B={b1,b2, … bn} 4. XL = { (A,B) | A ⊆ TL ∧ A ≠ ø ∧ B ⊆ TL ∧ B ≠ ø ∧ ∀a ∈ A∀b ∈ B a →L b ∧ ∀a1,a2 ∈ A a1#L a2 ∧ ∀b1,b2 ∈ B b1#L b2 }, 5. YL = { (A,B) ∈ XL | ∀(A′,B′) ∈ XL A ⊆ A′ ∧B ⊆ B′⇒ (A,B) = (A′,B′) }, PAGE 19
  • 21. Places as footprints a1 b1 a2 b2 ... p(A,B) ... am bn A={a1,a2, … am} B={b1,b2, … bn} PAGE 20
  • 22. b a p1 e p3 d start end p2 c p4 PAGE 21
  • 23. Another event log L3 PAGE 22
  • 24. Model for L3 f c a b p({b},{c}) p({c},{e}) e g iL p({a,f},{b}) d p({e},{f,g}) oL p({b},{d}) p({d},{e}) PAGE 23
  • 25. Another event log L4 a d c iL p({a,b},{c}) p({c},{d,e}) oL b e PAGE 24
  • 26. Event log L5 PAGE 25
  • 28. Discovered model d c p({c},{d}) b p({a,d},{b}) p({b},{c,f}) iL a f oL e p({a},{e}) p({e},{f}) PAGE 27
  • 29. Limitation of α algorithm (implicit places) c a d p1 g e p2 b f Green places are implicit! PAGE 28
  • 30. Limitation of α algorithm (loops of length 1) b a c b a c PAGE 29
  • 31. Limitation of α algorithm (loops of length 2) c a b d b a d c PAGE 30
  • 32. Limitation of α algorithm (non-local dependencies) a p1 d c b p2 e Green places are not discovered! PAGE 31
  • 33. Difficult constructs for α algorithm a c b PAGE 32
  • 34. Taking the transactional life-cycle into account a c assign assign b start assigned assigned suspend running start start suspended resume complete running running complete complete PAGE 33
  • 35. Rediscovering process models simulate discover discovered original process event process model log model N N’ N=N’ ? The rediscovery problem: Is the discovered model N’ equivalent to the original model N? PAGE 34
  • 36. Equivalence: trace equivalence, bisimilarity, and branching bisimilarity s1 s5 s8 birth curse birth birth curse curse s6 curse curse ? s9 s10 s2 curse heaven hell heaven hell heaven hell hell heaven curse heaven s3 s4 s7 s11 TS1 TS2 TS3 Three trace equivalent transition systems: TS1 and TS2 are not bisimilar, but TS2 and TS3 are bisimilar PAGE 35
  • 37. Branching bisimilarity defined for YAWL start s1 s6 start check check check check s2 τ τ s7 c1 c2 s3 s4 c3 ? reject accept reject accept reject accept reject accept end s8 end s5 TS1 TS2 TS1 and TS2 are not branching bisimilar (although trace equivalent). PAGE 36
  • 38. Challenge: finding the right representational bias a a start p end There is no WF-net with unique visible labels that exhibits this behavior. PAGE 37
  • 39. Another example τ a b c start p1 p1 end (a) a a b c There is no WF- start p1 p1 end net with unique (b) visible labels that exhibits this behavior. a b c start p1 p1 end PAGE 38 (c)
  • 40. Challenge: noise and incompleteness • To discover a suitable process model it is assumed that the event log contains a representative sample of behavior. • Two related phenomena: − Noise: the event log contains rare and infrequent behavior not representative for the typical behavior of the process. − Incompleteness: the event log contains too few events to be able to discover some of the underlying control-flow structures. PAGE 39
  • 41. More on incompleteness See also chapter 3 (cross-validation, precision, recall, etc.) PAGE 40
  • 42. Challenge: Balancing Between Underfitting and Overfitting PAGE 41
  • 43. Challenge: four competing quality criteria “able to replay event log” “Occam’s razor” fitness simplicity process discovery generalization precision “not overfitting the log” “not underfitting the log” PAGE 42
  • 44. Flower model b c a d start end e h f g PAGE 43
  • 45. What is the best model? A D C ACD 99 B E ACE 0 BCE 85 A D BCD 0 C B E PAGE 44
  • 46. What is the best model? A D C ACD 99 B E ACE 88 BCE 85 A D BCD 78 C B E PAGE 45
  • 47. What is the best model? A D C ACD 99 B E ACE 2 BCE 85 A D BCD 3 C B E PAGE 46
  • 48. Example: one log four models b examine thoroughly g pay c compensation a examine e start register casually decide end # trace request h 455 acdeh d reject check ticket request 191 abdeg f reinitiate request 177 adceh N1 : fitness = +, precision = +, generalization = +, simplicity = + 144 abdeh 111 acdeg a c d e h 82 adceg start register examine check decide reject end request casually ticket request 56 adbeh N2 : fitness = -, precision = +, generalization = -, simplicity = + 47 acdefdbeh “able to replay event log” “Occam’s razor” 38 adbeg examine check thoroughly b d ticket g 33 acdefbdeh fitness simplicity pay compensation a 14 acdefbdeg start register examine c end 11 acdefdbeg request casually e f reinitiate h process decide request reject request 9 adcefcdeh discovery N3 : fitness = +, precision = -, generalization = +, simplicity = + 8 adcefdbeh 5 adcefbdeg a d c e g 3 acdefbdefdbeg generalization precision register request check ticket examine casually decide pay compensation 2 adcefdbeg a c d e g 2 adcefbdefbdeg “not overfitting the log” “not underfitting the log” register examine check decide pay request casually ticket compensation 1 adcefdbefbdeh a d c e h 1 adbefbdefdbeg register check examine decide reject request ticket casually request 1 adcefdbefcdefdbeg a c d e h 1391 start end register examine check decide reject request casually ticket request … (all 21 variants seen in the log) a b d e g register examine check decide pay request thoroughly ticket compensation a d b e h register check examine decide reject request ticket thoroughly request a b d e h register examine check decide reject request thoroughly ticket request PAGE 47 N4 : fitness = +, precision = +, generalization = -, simplicity = -
  • 49. Model N1 (same event log as on slide 48: 1391 cases, 21 variants). N1 : fitness = +, precision = +, generalization = +, simplicity = +. [Figure: model N1 from start to end with activities a (register request), b (examine thoroughly), c (examine casually), d (check ticket), e (decide), f (reinitiate request), g (pay compensation), h (reject request)]
  • 50. Model N2 (same event log as on slide 48: 1391 cases, 21 variants). N2 : fitness = -, precision = +, generalization = -, simplicity = +. [Figure: model N2, a single sequence from start to end: a (register request), c (examine casually), d (check ticket), e (decide), h (reject request)]
  • 51. Model N3 (same event log as on slide 48: 1391 cases, 21 variants). N3 : fitness = +, precision = -, generalization = +, simplicity = +. [Figure: model N3 from start to end with activities a (register request), b (examine thoroughly), c (examine casually), d (check ticket), e (decide), f (reinitiate request), g (pay compensation), h (reject request)]
  • 52. Model N4 (same event log as on slide 48: 1391 cases, 21 variants). N4 : fitness = +, precision = +, generalization = -, simplicity = -. [Figure: model N4 lists every variant in the log as a separate sequence between start and end: a d c e g; a c d e g; a d c e h; a c d e h; … (all 21 variants seen in the log); a b d e g; a d b e h; a b d e h]
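The fitness ratings of the models can be checked at the level of whole traces with a short Python sketch. It assumes a simplified notion of fitness (the fraction of cases whose complete trace the model can replay), reads N1 as a regular expression in which b or c is interleaved with d and f loops back, takes N2 as the single sequence acdeh, and takes N4 as the enumeration of the 21 logged variants. The function names and the regular expression are illustrative assumptions, not part of the slides.

```python
import re

# Event log from the slides: trace variant -> number of cases.
LOG = {
    "acdeh": 455, "abdeg": 191, "adceh": 177, "abdeh": 144, "acdeg": 111,
    "adceg": 82, "adbeh": 56, "acdefdbeh": 47, "adbeg": 38, "acdefbdeh": 33,
    "acdefbdeg": 14, "acdefdbeg": 11, "adcefcdeh": 9, "adcefdbeh": 8,
    "adcefbdeg": 5, "acdefbdefdbeg": 3, "adcefdbeg": 2, "adcefbdefbdeg": 2,
    "adcefdbefbdeh": 1, "adbefbdefdbeg": 1, "adcefdbefcdefdbeg": 1,
}

# Assumed reading of N1: a, then (b or c) in either order with d, then e,
# optionally f and another round, finally g or h.
N1_PATTERN = re.compile(r"a(?:[bc]d|d[bc])e(?:f(?:[bc]d|d[bc])e)*[gh]")


def fits_n1(trace):
    return N1_PATTERN.fullmatch(trace) is not None


def fits_n2(trace):
    # N2 is the single sequence a c d e h.
    return trace == "acdeh"


def fits_n4(trace):
    # N4 enumerates exactly the variants seen in the log.
    return trace in LOG


def trace_fitness(fits):
    """Fraction of cases whose full trace is accepted by `fits`."""
    total = sum(LOG.values())
    return sum(count for trace, count in LOG.items() if fits(trace)) / total


if __name__ == "__main__":
    for name, fits in [("N1", fits_n1), ("N2", fits_n2), ("N4", fits_n4)]:
        print(name, round(trace_fitness(fits), 3))
    # A trace that is not in the log but follows the same pattern:
    unseen = "abdefcdeh"
    print(unseen, fits_n1(unseen), fits_n4(unseen))
```

Under these assumptions N1 and N4 replay all 1391 cases, whereas N2 replays only the 455 cases of the most frequent variant. The unseen trace abdefcdeh is accepted by the N1 pattern but rejected by N4, which is the sense in which N4 overfits the log and scores poorly on generalization.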
  • 53. Why is process mining such a difficult problem?
      • There are no negative examples (i.e., a log shows what has happened but does not show what could not happen).
      • Due to concurrency, loops, and choices, the search space has a complex structure, and the log typically contains only a fraction of all possible behaviors.
      • There is no clear relation between the size of a model and its behavior (i.e., a smaller model may generate more or less behavior, although classical analysis and evaluation methods typically assume some monotonicity property).
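The point about concurrency can be illustrated with a small count: n activities that run concurrently can be observed in n! different orders, so even a modest amount of parallelism produces more possible traces than most logs contain. The sketch below is a hypothetical illustration; the helper name `interleavings` and the example activities are assumptions, not taken from the slides.

```python
from itertools import permutations
from math import factorial


def interleavings(activities):
    """All orders in which fully concurrent activities can be observed."""
    return ["".join(order) for order in permutations(activities)]


if __name__ == "__main__":
    # Three concurrent activities already allow six different traces.
    print(interleavings("xyz"))
    # Number of possible orderings for 2 to 10 concurrent activities.
    for n in range(2, 11):
        print(n, factorial(n))
```

Ten concurrent activities already allow 3,628,800 orderings, far more than the 1391 cases in the example log, and a single loop makes the set of possible traces infinite.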
  • 54. Creating a 2-D slice of a 3-D reality: the process is viewed from a specific angle, the process is scoped using a frame, and the resolution determines the granularity of the resulting model.