Predictive Business Process Monitoring with Structured and Unstructured Data
Irene Teinemaa, Marlon Dumas, Fabrizio Maria Maggi, Chiara Di Francescomarino
Predictive Monitoring
Example: Debt recovery process
Debt repayment due → Call the debtor → Send a reminder → Payment received
Predictive Monitoring
Example: Debt recovery process
Debt repayment due → Call the debtor → Send a reminder → Send a warning → Call the debtor → Call the debtor → Send to external debt collection agency
[Figure: ongoing case with repeated "Call the debtor" events]
Predictive Monitoring
[Figure: event log (traces, attributes) → classifier → predictions]
Encoding the traces
Example: Debt recovery process (Call the debtor → Send a reminder)

          Event1            Event2
Trace1    Call the debtor   Send a reminder
Encoding the traces
Example: Debt recovery process (Call the debtor → Send a reminder)

          Event1            Event2            Resource1   Resource2   Debtor
Trace1    Call the debtor   Send a reminder   Sue         Bob         Mark
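The encoding in the table above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `encode_prefix` and the event keys `activity` and `resource` are hypothetical, and the column names simply mirror the slide.

```python
def encode_prefix(events, case_attributes, prefix_length):
    """Flatten the first `prefix_length` events of a trace into one feature row.

    `events` is a list of dicts with the (hypothetical) keys "activity" and
    "resource"; `case_attributes` holds attributes that are fixed per case.
    """
    if len(events) < prefix_length:
        raise ValueError("trace is shorter than the requested prefix length")
    prefix = events[:prefix_length]
    row = {}
    # One column per event position for the activity name...
    for i, event in enumerate(prefix, start=1):
        row[f"Event{i}"] = event["activity"]
    # ...and one column per event position for the resource.
    for i, event in enumerate(prefix, start=1):
        row[f"Resource{i}"] = event["resource"]
    # Case-level attributes (such as the debtor) are appended once.
    row.update(case_attributes)
    return row


trace = [
    {"activity": "Call the debtor", "resource": "Sue"},
    {"activity": "Send a reminder", "resource": "Bob"},
]
row = encode_prefix(trace, {"Debtor": "Mark"}, prefix_length=2)
```

Every trace prefix of the same length yields the same set of columns, which is what lets a standard classifier consume the rows.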
Encoding the traces
Example: Debt recovery process (Call the debtor → Send a reminder)

          Event1            Event2            Resource1   Resource2   Debtor   Summary1   Summary2
Trace1    Call the debtor   Send a reminder   Sue         Bob         Mark     ?          ?
Text Mining
The last ten years have seen a surge of interest in design science research in information
systems, organizations, process modelling and software engineering. In this talk I present a
framework for design science that shows how in design science research, we iterate over
designing new artifacts for a context, and empirically investigating these artifacts in this context.
To be relevant, the artifacts should potentially contribute to organizational goals, and to be
empirically sound, research to validate new artifacts should provide insight into the effects of
using these artifacts in an organizational context. The logic of both of these activities, design
and empirical research, is that of rational decision making. I show how this logic can be used
to structure our technical and empirical research goals and questions, as well as how to
structure reports about our technical or empirical research. This gives us checklists for the
design cycle used in technical research and for the empirical cycle used in empirical research.
Finally, I will discuss in more detail what the role of theories in design science research is, and
how we use theory to state research questions and to generalize the research results.
The tutorial first introduces the PPM including its activities: problem understanding, method
finding, modeling, reconciliation, and validation.
What is a good business process model and how do you get value from it? We have for many
years worked with SEQUAL, a general framework for understanding the quality of models and
modelling languages, which covers all main aspects relative to quality of models. The
framework has been widely cited since the first version was presented in the nineties, and the
tutorial will focus on the most recent version of the framework (2016), specialised for quality of
business process models, with a focus on how to achieve value through long-term usage of
business process models in organizations.
Business process models have gained significant importance due to their critical role for
managing business processes. Process models not only play a fundamental role for obtaining a
common understanding of an organization’s business processes, but are also important assets
for improving business processes and to support the development of information systems. In
this tutorial we will focus on the process of process modeling (PPM) and shed light on how
process models are created.
0.2, 0.1, 0.8, 0.5, …, 0.1
0.4, 0.8, 1.0, 0.2, …, 0.4
0.9, 0.0, 0.4, 0.5, …, 0.2
0.2, 0.3, 0.7, 0.6, …, 0.6
Text Modelling Methods
▷ Bag-of-n-grams
○ 1, 2, 3-grams
Example sentence: Resend the invoice today.

resend   the   invoice   today   pay
1        1     1         1       0
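A bag-of-n-grams vector for the example sentence can be computed with a few lines of standard-library Python. This is a sketch under stated assumptions: the tokenizer (lowercase, alphanumeric runs) and the fixed five-term vocabulary are illustrative choices, and the helper names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, max_n=3):
    """Return all 1-grams up to max_n-grams as space-joined strings."""
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def bag_of_ngrams(text, vocabulary, max_n=3):
    """Count how often each vocabulary term occurs among the text's n-grams."""
    counts = Counter(ngrams(tokenize(text), max_n))
    return [counts[term] for term in vocabulary]


# Illustrative vocabulary taken from the slide (unigrams only).
vocabulary = ["resend", "the", "invoice", "today", "pay"]
vector = bag_of_ngrams("Resend the invoice today.", vocabulary)
all_grams = ngrams(tokenize("Resend the invoice today."))
```

In practice the vocabulary would be learned from the training documents and would also contain 2-grams and 3-grams such as "resend the invoice".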
Text Modelling Methods
▷ Bag-of-n-grams
○ 1, 2, 3-grams
▷ Weighted bag-of-n-grams
○ Naive Bayes log count ratios
Example sentence: Resend the invoice today.

Bag-of-n-grams:
resend   the   invoice   today   pay
1        1     1         1       0

Weighted bag-of-n-grams:
resend   the   invoice   today   pay
0.8      0.5   0.3       0.7     0
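One common way to compute Naive Bayes log count ratios is to take, for each term, the log of the ratio between its smoothed, normalized count in the positive class and in the negative class. The sketch below assumes that formulation, binary term presence, and add-one smoothing; the slides do not show the exact variant used, and the toy documents, labels, and function names are invented for illustration. The weights on the slide are illustrative as well, so the numbers here will not match them.

```python
import math


def nb_log_count_ratios(docs, labels, vocabulary, alpha=1.0):
    """Return one log count ratio per vocabulary term.

    docs: list of token lists; labels: 1 for the positive class
    (for example deviant cases), 0 for the negative class.
    """
    index = {term: j for j, term in enumerate(vocabulary)}
    p = [alpha] * len(vocabulary)  # smoothed counts in the positive class
    q = [alpha] * len(vocabulary)  # smoothed counts in the negative class
    for tokens, label in zip(docs, labels):
        target = p if label == 1 else q
        for term in set(tokens):  # binary presence per document
            if term in index:
                target[index[term]] += 1
    p_total, q_total = sum(p), sum(q)
    return [math.log((p[j] / p_total) / (q[j] / q_total))
            for j in range(len(vocabulary))]


def weight(binary_vector, ratios):
    """Scale a binary bag-of-n-grams vector by the per-term ratios."""
    return [x * r for x, r in zip(binary_vector, ratios)]


vocabulary = ["resend", "the", "invoice", "today", "pay"]
docs = [
    ["resend", "the", "invoice"],
    ["pay", "today"],
    ["resend", "invoice", "today"],
    ["pay", "the", "invoice"],
]
labels = [1, 0, 1, 0]  # hypothetical class labels
ratios = nb_log_count_ratios(docs, labels, vocabulary)
weighted = weight([1, 1, 1, 1, 0], ratios)
```

Terms that occur mostly in positive documents get a positive weight, terms that occur mostly in negative documents get a negative one, and terms absent from the sentence stay at zero.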
Text Modelling Methods
▷ Bag-of-n-grams
○ 1, 2, 3-grams
▷ Weighted bag-of-n-grams
○ Naive Bayes log count ratios
▷ Topic models
○ Latent Dirichlet Allocation
Example sentence: Resend the invoice today.

Bag-of-n-grams:
resend   the   invoice   today   pay
1        1     1         1       0

Weighted bag-of-n-grams:
resend   the   invoice   today   pay
0.8      0.5   0.3       0.7     0

Topic models:
topic1   topic2   topic3
0.6      0.1      0.3
Text Modelling Methods
▷ Bag-of-n-grams
○ 1, 2, 3-grams
▷ Weighted bag-of-n-grams
○ Naive Bayes log count ratios
▷ Topic models
○ Latent Dirichlet Allocation
▷ Neural network
○ Paragraph Vector
“Resend the invoice today” vs. “Resend the bill today”
Example sentence: Resend the invoice today.

Bag-of-n-grams:
resend   the   invoice   today   pay
1        1     1         1       0

Weighted bag-of-n-grams:
resend   the   invoice   today   pay
0.8      0.5   0.3       0.7     0

Topic models:
topic1   topic2   topic3
0.6      0.1      0.3

Paragraph Vector:
t1    t2    t3
0.3   0.6   0.1
Encoding the traces
Example: Debt recovery process (Call the debtor → Send a reminder)

          Event1            Event2            Resource1   Resource2   Debtor   t11   ...   t1n   t21   ...   t2n
Trace1    Call the debtor   Send a reminder   Sue         Bob         Mark     0.2   ...   0.1   0.4   ...   0.4
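Appending the per-event text vectors to the structured columns, as in the table above, might look like the following sketch. The function name and the `t{i}_{j}` column naming are assumptions made here (the underscore avoids ambiguity once n exceeds 9); the slide writes the columns as t11 ... t2n.

```python
def encode_prefix_with_text(structured_row, text_vectors):
    """Append one block of text features per event to a structured row.

    structured_row: dict of structured features for the prefix.
    text_vectors: one numeric vector per event in the prefix, all of the
    same length n (for example produced by a text model).
    """
    row = dict(structured_row)  # copy so the input is left untouched
    for i, vector in enumerate(text_vectors, start=1):
        for j, value in enumerate(vector, start=1):
            row[f"t{i}_{j}"] = value
    return row


structured = {
    "Event1": "Call the debtor", "Event2": "Send a reminder",
    "Resource1": "Sue", "Resource2": "Bob", "Debtor": "Mark",
}
# Hypothetical two-dimensional text vectors for the two events.
combined = encode_prefix_with_text(structured, [[0.2, 0.1], [0.4, 0.4]])
```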
Proposed Framework
▷ Offline component
○ Building a text model and classifier for all historical prefixes of a given length
▷ Online component
○ Producing predictions for a running case
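The split between the two components can be sketched as below. This is an architectural illustration only: structured features are left out for brevity, the word-count text model and the majority-class classifier are deliberately trivial placeholders, and all function names and the `text` event key are hypothetical rather than taken from the talk.

```python
def build_offline(cases, prefix_lengths, fit_text_model, fit_classifier):
    """Offline: fit one text model and one classifier per prefix length.

    cases: list of (events, label); each event is a dict with a "text" key.
    """
    models = {}
    for k in prefix_lengths:
        prefixes = [(events[:k], label) for events, label in cases
                    if len(events) >= k]
        if not prefixes:
            continue
        texts = [event["text"] for events, _ in prefixes for event in events]
        encode_text = fit_text_model(texts)
        rows = [[x for event in events for x in encode_text(event["text"])]
                for events, _ in prefixes]
        labels = [label for _, label in prefixes]
        models[k] = (encode_text, fit_classifier(rows, labels))
    return models


def predict_online(models, running_events):
    """Online: use the models that match the running case's current length."""
    k = len(running_events)
    if k not in models:
        return None  # no model was trained for this prefix length
    encode_text, classify = models[k]
    row = [x for event in running_events for x in encode_text(event["text"])]
    return classify(row)


def fit_word_counts(texts):
    """Placeholder text model: word counts over the training vocabulary."""
    vocab = sorted({word for text in texts for word in text.lower().split()})
    return lambda text: [text.lower().split().count(word) for word in vocab]


def fit_majority_classifier(rows, labels):
    """Placeholder classifier: always predicts the most frequent label."""
    majority = max(set(labels), key=labels.count)
    return lambda row: majority


history = [
    ([{"text": "promised to pay"}, {"text": "resend the invoice"}], "normal"),
    ([{"text": "no answer"}, {"text": "refuses to pay"}], "deviant"),
    ([{"text": "will pay today"}, {"text": "payment confirmed"}], "normal"),
]
models = build_offline(history, [1, 2], fit_word_counts, fit_majority_classifier)
running_case = [{"text": "resend the invoice today"}]
prediction = predict_online(models, running_case)
```

Swapping the two placeholder functions for a real text model and a real classifier leaves the offline and online functions unchanged, which is the point of the separation.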
Evaluation Datasets

Dataset                        Debt recovery   Lead-to-contract
# normal cases                 13608           385
# deviant cases                417             390
Average # words per document   11              8
# lemmas                       11822           2588
Experimental Set-up
▷ Data split: 80% train, 20% test (randomly)
▷ Handling imbalance: oversampling
▷ Classifiers: random forest and logistic regression
▷ Evaluation metrics: F-Score and earliness
▷ Parameter tuning: grid search with 5-fold cross-validation on the training set
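The random 80/20 split and the oversampling step could be sketched as follows. The slide only says "oversampling", so random duplication of minority-class cases, applied to the training set only, is an assumption here; the function names and toy data are hypothetical.

```python
import random


def train_test_split(cases, test_fraction=0.2, seed=0):
    """Shuffle the cases and split them into (train, test)."""
    shuffled = cases[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def oversample(train, seed=0):
    """Duplicate randomly chosen cases until every label is equally frequent."""
    rng = random.Random(seed)
    by_label = {}
    for features, label in train:
        by_label.setdefault(label, []).append((features, label))
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    rng.shuffle(balanced)
    return balanced


# Toy data: 10 deviant and 90 normal cases.
cases = [({"id": i}, "deviant" if i % 10 == 0 else "normal") for i in range(100)]
train, test = train_test_split(cases)
balanced = oversample(train)

counts = {}
for _, label in balanced:
    counts[label] = counts.get(label, 0) + 1
```

Leaving the test set untouched keeps its class distribution the same as in the original data.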
Results
▷ Textual + structured features > structured
▷ Bag-of-n-grams > topic models > NN
▷ Topic models are better when only little textual data is available
▷ Neural network based model requires more
heterogeneous textual data
Future Work
▷ Evaluation on additional datasets
○ More heterogeneous textual data
○ Longer documents
▷ Conversational interactions
▷ Non-boolean predictions
▷ Interpretable explanations
Thanks!
irene.teinemaa@stacc.ee
Encoding the traces
▷ Prefix length = 2

          Event1            Event2            Resource1   Resource2
Trace1    Call the debtor   Send a reminder   Sue         Bob

▷ Prefix length = 3

          Event1            Event2            Event3            Resource1   Resource2   Resource3
Trace1    Call the debtor   Send a reminder   Call the debtor   Sue         Bob         Bob
Offline Component
Online Component
Evaluation metrics
▷ Precision:
▷ Recall:
▷ F-Score:
▷ Earliness:

                 predicted deviant   predicted normal
actual deviant   TP                  FN
actual normal    FP                  TN
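The formulas on this slide did not survive extraction. The standard definitions of precision, recall, and the balanced F-score (F1) in terms of the confusion matrix above are sketched below; whether the talk used exactly F1 is an assumption, and the earliness definition is not reproduced because it cannot be recovered from the slide text. The counts in the example are invented.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 for the deviant (positive) class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 30 true positives, 10 false positives, 20 false negatives.
precision, recall, f1 = precision_recall_f1(tp=30, fp=10, fn=20)
```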
Results
▷ Textual features improve the predictions
▷ Random forest outperforms logistic regression
▷ Bag-of-words based models outperform more complex ones
▷ Topic models perform relatively better when only little textual data is available
▷ Neural network based model requires more heterogeneous textual data
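Computation times
DR = Debt recovery, LtC = Lead-to-contract; BoNG = bag-of-n-grams, NB = Naive Bayes log count ratios, LDA = Latent Dirichlet Allocation, PV = Paragraph Vector

Offline pre-processing (s)
Method   DR     LtC
Base     0.5    0.5
BoNG     5.1    1.4
NB       54.0   1.7
LDA      262    28
PV       212    14.7

Offline classifier training (s)
Method   DR     LtC
Base     41.3   28.1
BoNG     50     29.9
NB       53.9   35.2
LDA      83.6   24.5
PV       61.3   27.3

Online (ms per event)
Method   DR     LtC
Base     0.1    0.3
BoNG     0.4    0.4
NB       2.9    0.5
LDA      7.0    0.7
PV       2      0.5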
