How to "Effectively" "Test" your Chatbot
Soumya Mukherjee
Director QA, DevOps & AIML
Apty.IO
How are we doing our QA today?
• Testing is a black box for testers
• Most testing in organizations is manual:
  • Conversational flow testing
  • Small talk
  • Fallback checks
  • Integrations
• Automation is done at the UI and API layers
• Testing is mostly done on the same data that was used for training
• Models are trained by engineers and are not monitored by QA
• Analytics tools for monitoring exist, but QA needs technical expertise to use them
• Result: more than 90% of the time the bot breaks (no one understands when it will break); most bots fall back and get stuck, and once a bot is stuck it stays stuck
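One way to avoid testing on the training data is to hold out part of the labelled utterances before training. The sketch below is only an illustration in plain Python: the intent names, the utterances and the `holdout_split` helper are assumptions, not something shown in the talk.

```python
import random

def holdout_split(examples, test_fraction=0.2, seed=42):
    """Split {intent: [utterances]} into train and test sets, intent by intent."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = {}, {}
    for intent, utterances in examples.items():
        shuffled = list(utterances)
        rng.shuffle(shuffled)
        # Keep at least one utterance per intent for testing.
        n_test = max(1, int(len(shuffled) * test_fraction))
        test[intent] = shuffled[:n_test]
        train[intent] = shuffled[n_test:]
    return train, test

# Hypothetical labelled utterances.
examples = {
    "bus_fee": ["how much is the bus fee", "bus fee please", "cost of the bus",
                "what do I pay for the bus", "bus charges"],
    "tuition_fee": ["how much is tuition", "tuition fee please", "cost of tuition",
                    "what do I pay for tuition", "tuition charges"],
}
train, test = holdout_split(examples)
```

The test set is then used only for evaluation and never fed back into training. An intent with a single utterance would end up with no training data under this rule, so very small intents need more examples first.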
What are the issues in QA?
• Bots keep evolving, so stories have to be created continuously, which is hard to sustain
• No tool manages story coverage
• Training data may not correspond to new stories, or vice versa (a mismatch); most organizations keep training on the same data
• Most automation tools offer record and playback, but the stories are already written, so the question is how to port them
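The mismatch between stories and training data can be detected mechanically. A minimal sketch, assuming the intents referenced by stories and the training examples have already been loaded into plain Python structures; `find_mismatches` and all intent names are hypothetical.

```python
def find_mismatches(story_intents, training_examples):
    """Compare intents referenced by stories with intents that have training data."""
    in_stories = set(story_intents)
    in_training = {intent for intent, examples in training_examples.items() if examples}
    return {
        "untrained": sorted(in_stories - in_training),  # in a story, no examples
        "unused": sorted(in_training - in_stories),     # has examples, in no story
    }

# Hypothetical data: one new story intent has no examples yet,
# and one trained intent is not used by any story.
story_intents = ["greet", "ask_bus_fee", "ask_hostel_fee"]
training_examples = {
    "greet": ["hi", "hello"],
    "ask_bus_fee": ["what is the bus fee"],
    "ask_tuition_fee": ["what is the tuition fee"],
}
report = find_mismatches(story_intents, training_examples)
```

Running a check like this on every change flags new stories that were added without matching training data.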
What are the issues in QA?
• No unified, centralized dashboard where QA can check the following (everything is scattered):
  • Intent matching
  • Entity testing: slot identification
  • Entity testing: entity validation
  • Confidence score
  • Confusion matrix along with precision, recall and F1-score
• No easy way to reset a failed bot
• Bot versioning is a mess, which makes A/B testing difficult
• Multilingual bot QA is a challenge (two separate bots have to be built)
• A high confidence score can also be a problem: if multiple intents share the same training data, the bot always predicts the one with the highest confidence score, which may be incorrect
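The intent-level metrics listed above can be derived from pairs of expected and predicted intents. The following is a rough sketch in plain Python; the helper names, intent labels and predictions are made up for illustration and do not come from the talk.

```python
from collections import Counter

def confusion_counts(expected, predicted):
    """Count (expected, predicted) intent pairs: the cells of a confusion matrix."""
    return Counter(zip(expected, predicted))

def per_intent_scores(expected, predicted):
    """Return {intent: (precision, recall, f1)} for every intent seen."""
    cells = confusion_counts(expected, predicted)
    scores = {}
    for intent in sorted(set(expected) | set(predicted)):
        tp = cells[(intent, intent)]
        fp = sum(n for (e, p), n in cells.items() if p == intent and e != intent)
        fn = sum(n for (e, p), n in cells.items() if e == intent and p != intent)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[intent] = (precision, recall, f1)
    return scores

# Hypothetical test utterances: one "bus_fee" question is misread as "tuition_fee".
expected = ["bus_fee", "bus_fee", "tuition_fee", "tuition_fee", "greet"]
predicted = ["bus_fee", "tuition_fee", "tuition_fee", "tuition_fee", "greet"]
scores = per_intent_scores(expected, predicted)
```

Off-diagonal cells such as `("bus_fee", "tuition_fee")` point to intents whose training data overlaps, which is the situation where a high confidence score can still be wrong.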
How to make sure your bot never breaks?
How to make your test effective?
• Create scenarios for the happy path, contextual questions, digressions, domain-specific questions and stateless conversations
• Map the proper entities for common scenarios (for example, bus fee vs. tuition fee); the flow should change with the entities in the stories
• Automated tests should consume all stories and run them every time as part of regression testing
• Visualize story coverage
• For manual testing, use a bot emulation product (such as Rasa X or Botfront)
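Replaying every story as a regression test and reporting story coverage could look roughly like this. It is a sketch under strong assumptions: stories are reduced to (user message, expected action) pairs, and `stub_bot` is a stateless stand-in for whatever client talks to the real bot; none of these names come from the talk or from a specific tool.

```python
def story_coverage(all_stories, tested_stories):
    """Return (fraction of stories that have a test, sorted untested story names)."""
    known = set(all_stories)
    untested = sorted(known - set(tested_stories))
    ratio = (len(known) - len(untested)) / len(known) if known else 1.0
    return ratio, untested

def run_stories(stories, bot):
    """Replay each story against `bot`; return the names of failing stories."""
    failures = []
    for name, steps in stories.items():
        for user_message, expected_action in steps:
            if bot(user_message) != expected_action:
                failures.append(name)
                break  # one failed step fails the whole story
    return failures

# Hypothetical stories.
stories = {
    "bus_fee_happy_path": [("hi", "utter_greet"),
                           ("what is the bus fee", "utter_bus_fee")],
    "tuition_fee_happy_path": [("what is the tuition fee", "utter_tuition_fee")],
}

def stub_bot(message):
    """Stand-in for a real bot client; a keyword lookup for illustration only."""
    if message == "hi":
        return "utter_greet"
    if "bus" in message:
        return "utter_bus_fee"
    return "utter_fallback"

failures = run_stories(stories, stub_bot)
coverage, untested = story_coverage(stories, ["bus_fee_happy_path"])
```

A real bot keeps conversation state, so an actual runner would open a fresh session per story instead of calling a stateless function.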
How to make your test effective?
• Central dashboard including:
  • Confusion matrix, precision, recall and F1-score
  • Cumulative accuracy profile
  • Cross-validation results
• Perform exhaustive testing (bot resiliency) and integration checks across platforms and webhooks
• Perform fault tolerance testing through performance testing (bot response, session management) and security testing (API interaction, typing speed check, punctuation, typos)
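The punctuation and typo checks can be automated by generating noisy variants of each test utterance and checking whether the predicted intent stays the same. A minimal sketch, assuming a hypothetical `classify` callable that returns an intent name; the perturbations shown (adjacent character swap, punctuation removal) are just two simple choices.

```python
import random
import string

def swap_typo(text, rng):
    """Swap two adjacent characters to imitate a typing slip."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    chars = list(text)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def strip_punctuation(text):
    """Remove all ASCII punctuation characters."""
    return text.translate(str.maketrans("", "", string.punctuation))

def noisy_variants(text, count=3, seed=0):
    """Return one punctuation-free variant followed by `count` typo variants."""
    rng = random.Random(seed)  # seeded so the same variants are produced each run
    return [strip_punctuation(text)] + [swap_typo(text, rng) for _ in range(count)]

def stable_fraction(classify, text, variants):
    """Fraction of variants that get the same intent as the clean text."""
    baseline = classify(text)
    return sum(classify(v) == baseline for v in variants) / len(variants)

utterance = "What is the bus fee?"
variants = noisy_variants(utterance)
```

A low `stable_fraction` for an utterance suggests the model is brittle around that phrasing and needs more varied training examples.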
Other KPIs to track
• Activity Volume
• Bounce rate
• Retention rate
• Open sessions count
• Session times (conversation length)
• Goal completion rate
• User feedback (sentiments)
• Fallback rate (Confusion rate, reset rate & Human takeover rate)
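Several of these KPIs can be computed directly from conversation logs. The sketch below assumes a made-up log format (a list of sessions, each with its bot turns and a goal flag) and treats fallback rate as the share of bot turns that were fallbacks; both are assumptions, not definitions from the talk.

```python
def kpis(sessions):
    """Compute a few conversation KPIs from a list of session records."""
    total_turns = sum(len(s["turns"]) for s in sessions)
    fallbacks = sum(
        turn["action"] == "fallback" for s in sessions for turn in s["turns"]
    )
    completed = sum(s["goal_completed"] for s in sessions)
    n = len(sessions)
    return {
        "fallback_rate": fallbacks / total_turns if total_turns else 0.0,
        "goal_completion_rate": completed / n if n else 0.0,
        "avg_turns_per_session": total_turns / n if n else 0.0,
    }

# Hypothetical logs: two sessions, one of which hit a fallback.
sessions = [
    {"goal_completed": True,
     "turns": [{"action": "utter_greet"}, {"action": "fallback"},
               {"action": "utter_bus_fee"}]},
    {"goal_completed": False,
     "turns": [{"action": "utter_greet"}]},
]
metrics = kpis(sessions)
```

Tracking these values per bot version makes regressions visible after a retrain or a new story is added.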
Thanks
@QASoumya
Linkedin.com/in/mukherjeesoumya

How to Effectively Test Your Chatbot | Rasa Summit
