REVIEW OF EVALUATION METRICS USED IN LITERATURE + PROPOSED IDEA
DREAMS, EDCC 2021
13 September 2021
Evaluation of Human-in-the-Loop Learning
based Autonomous Systems
Prajit Thazhurazhikath Rajendran, Huascar Espinoza, Chokri Mraidha (CEA, DILS-LSEA),
Agnes Delaborde (LNE)
| 2
DREAMS 2021 | Prajit T Rajendran
Safety challenges of DL/AI components
• The use of DL/AI components in autonomous
systems comes with various challenges:
• Vulnerable to out of distribution data
• Adversarial inputs
• Anomalies
• Lack of transparency
• Stochastic nature
• Unknown unknowns
• Uncertainty
• Safety is an emergent property: it is not a
property of any single component in isolation
• Regulation/qualification/certification of such DL/AI
components is ongoing work in the community
• Traditional approaches do not facilitate safe learning
• Humans can guide the system to safe behavior with
their knowledge, experience and adaptability
[Figure: out-of-distribution samples, showing normal vs. anomaly regions]
Categories of human-in-the-loop learning methods
Active learning
• Semi-supervised ML where only a subset of the training data is labelled
• Human queried interactively to label data points of interest from the unlabelled set
• PROS: Reduces data labelling requirement
• CONS: Performance hinges on selecting the right points to query
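The query-selection problem named in the CON above is commonly handled with an uncertainty criterion. A minimal sketch of least-confidence sampling (the data and function name are illustrative, not from the paper):

```python
import numpy as np

def select_queries(probs, k):
    """Pick the k unlabelled points the model is least sure about.

    probs: (n_samples, n_classes) predicted class probabilities.
    Uses least-confidence sampling; margin and entropy criteria are
    common alternatives.
    """
    uncertainty = 1.0 - probs.max(axis=1)      # low top-class prob = uncertain
    return np.argsort(uncertainty)[-k:][::-1]  # most uncertain first

# Example: three unlabelled samples, binary classifier
probs = np.array([[0.95, 0.05],   # confident
                  [0.55, 0.45],   # uncertain -> query the human
                  [0.80, 0.20]])
print(select_queries(probs, 1))   # -> [1]
```

Only the selected indices are sent to the human annotator, which is what reduces the labelling requirement.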
Categories of human-in-the-loop learning methods
Demonstration
• Human is in full control and provides demonstrations to train the agent
• Agent can mimic human data to use as a safe starting point
• PROS: Leads to safer policies
• CONS: More human effort needed, may be subjective, train-test distribution shift
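Learning from demonstration in its simplest form is behavioral cloning: fit a policy to the expert's (state, action) pairs by supervised learning. A toy sketch with a linear policy and synthetic demonstration data (all made up for illustration):

```python
import numpy as np

# Hypothetical demonstration data: states and the expert's continuous actions.
rng = np.random.default_rng(0)
states = rng.normal(size=(200, 4))           # 200 demo states, 4 features
true_w = np.array([0.5, -1.0, 0.3, 0.0])
actions = states @ true_w                    # expert's (noise-free) actions

# Behavioral cloning as least-squares regression: fit pi(s) ~ a.
w, *_ = np.linalg.lstsq(states, actions, rcond=None)

print(np.allclose(w, true_w, atol=1e-6))     # True on demo-like states
# The CON above (train-test distribution shift) appears as soon as the
# agent's own actions take it to states unlike the demonstrations.
```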
Categories of human-in-the-loop learning methods
Intervention
• Human, agent share control and human intervenes when necessary
• Human takes over control to avoid catastrophic states and agent learns from these
• PROS: Leads to safer policies
• CONS: Need to keep human in the loop for long, slow response time
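One control step under shared control can be sketched as follows; all callables are illustrative stubs, and real systems would use a learned or rule-based risk estimate:

```python
def step_with_intervention(state, agent_policy, human_policy, is_risky):
    """One shared-control step: the agent acts unless `is_risky` flags
    the state as near-catastrophic, in which case the human takes over.
    The source of the action is returned so the agent can learn from
    logged interventions."""
    if is_risky(state):
        return human_policy(state), "human"
    return agent_policy(state), "agent"

# Toy example: the human brakes whenever speed exceeds a threshold.
agent = lambda s: "accelerate"
human = lambda s: "brake"
risky = lambda s: s["speed"] > 100

print(step_with_intervention({"speed": 120}, agent, human, risky))  # ('brake', 'human')
print(step_with_intervention({"speed": 40}, agent, human, risky))   # ('accelerate', 'agent')
```

The CON above (slow human response time) corresponds to the latency between `is_risky` firing and `human_policy` actually returning an action.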
Categories of human-in-the-loop learning methods
Evaluation
• Agent in full control and human provides feedback for tasks
• Human gives feedback based on known objective or preference, which the agent
learns
• PROS: Leads to safer policies
• CONS: Need to keep human in the loop for long, credit attribution problem,
subjective feedback
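The credit attribution problem listed as a CON can be made concrete: one scalar feedback signal arrives for a whole trajectory and must be spread over many steps. A deliberately naive sketch (every visited pair gets the same update; the function and keys are illustrative):

```python
def update_reward_estimates(trajectory, feedback, estimates, lr=0.1):
    """Spread one scalar human feedback signal over a whole trajectory.

    `trajectory` is a list of (state, action) ids; `feedback` is +1/-1.
    Attributing one delayed signal to many steps is the credit
    attribution problem; here it is 'solved' naively by uniform credit.
    """
    for sa in trajectory:
        estimates[sa] = estimates.get(sa, 0.0) + lr * feedback
    return estimates

est = update_reward_estimates(["s0a1", "s1a0"], +1, {})
print(est)  # both pairs receive the same +0.1 update
```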
Common metrics in HITL learning methods
Metric category    | Example measures
-------------------|------------------------------------------------
Safety             | Rate of catastrophes, deviation from thresholds
Performance        | Rate of task completion, average reward
Time               | Response time, training time
Data requirement   | Number of queries, number of interventions
User trust         | Subjective measures, type of interactions
User satisfaction  | Likert scale, binary feedback
Common metrics in HITL learning methods
Safety
• Learning from intervention is used
• Human intervenes to avoid undesirable events or catastrophes
• Policy is constrained to safer regions
• Evaluated based on the number of occurrences of catastrophes
Trial without Error: Towards Safe Reinforcement Learning via Human Intervention, William Saunders et al.
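The safety metric itself is simple to compute from evaluation rollouts; a minimal sketch (episode data is made up):

```python
def catastrophe_rate(episodes):
    """Fraction of evaluation episodes in which a catastrophe occurred.
    `episodes` is a list of booleans (True = catastrophe)."""
    return sum(episodes) / len(episodes)

# 1 catastrophe in 4 evaluation episodes:
print(catastrophe_rate([True, False, False, False]))  # 0.25
```

A per-timestep variant (catastrophes per environment step) is also used when episode lengths vary.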
Common metrics in HITL learning methods
Performance
One-Shot Imitation Learning, Yan Duan et al.
• Learning from demonstration + meta-learning used
• Train networks that are not specific to one task and can adapt to
new tasks
• Evaluated based on average rate of success/task completion
Common metrics in HITL learning methods
Time
A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning, Stéphane Ross et al.
• DAgger approaches: Learning from demonstration + intervention
• Start with imitation of expert policy, collect data
• Train the next policy under the aggregate of all collected datasets
• Hand over control to expert if necessary based on rulesets
• Evaluated based on number of training iterations needed to reach
a significant level of performance
DropoutDAgger: A Bayesian Approach to Safe Imitation Learning, Kunal Menda et al.
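The DAgger steps above can be sketched schematically; the stub components here are illustrative, not any library's API:

```python
def dagger(expert, learner_fit, env_rollout, n_iters):
    """Schematic DAgger loop.

    expert(state) -> action          : queried on the learner's own states
    learner_fit(dataset) -> policy   : supervised training on all data so far
    env_rollout(policy) -> [states]  : states visited by the current policy
    """
    dataset, policy = [], expert          # iteration 0: pure imitation
    for _ in range(n_iters):
        states = env_rollout(policy)
        # Aggregate: label the learner's own states with the expert's actions.
        dataset += [(s, expert(s)) for s in states]
        policy = learner_fit(dataset)     # train on the aggregate dataset
    return policy

# Toy check with stub components:
expert = lambda s: s + 1                  # expert labels states
def learner_fit(dataset):                 # 'training' = memorize labels
    table = dict(dataset)
    return lambda s: table.get(s, 0)
env_rollout = lambda policy: [1, 2, 3]    # states the policy visits
policy = dagger(expert, learner_fit, env_rollout, n_iters=2)
print(policy(2))  # matches the expert on visited states
```

The time metric in this setting is `n_iters`: how many such iterations are needed before the learned policy reaches acceptable performance.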
Common metrics in HITL learning methods
Data
requirement
Overcoming Blind Spots in the Real World: Leveraging Complementary Abilities for Joint Execution, Ramya Ramakrishnan et al.
• Learning from demonstration + intervention used
• Agent and human both are considered to have blindspots
• Choose actor (human vs agent) based on blindspot activation level
• Evaluated based on number of human queries needed vs average
reward
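Actor selection based on blindspot levels can be sketched as a simple comparison; the scoring functions and names are illustrative, not the paper's exact formulation:

```python
def choose_actor(state, agent_blindspot, human_blindspot):
    """Pick whoever is less 'blind' in the current state.

    The blindspot functions return a score in [0, 1]; higher means that
    actor is more likely to err here. Every 'human' choice costs one
    more query, which is why the metric trades queries against reward.
    """
    if agent_blindspot(state) > human_blindspot(state):
        return "human"
    return "agent"

# Toy example: the agent is nearly blind in fog, the human is not.
print(choose_actor("fog", agent_blindspot=lambda s: 0.9,
                   human_blindspot=lambda s: 0.2))  # human
```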
Common metrics in HITL learning methods
User trust
• Trust ≈ the extent to which the human agrees with the AI
• Assessed via questionnaires on use of the system, biological
data, number of interventions, “humanness”, etc.
• Users can be quick to distrust an AI that produces easily
identifiable incorrect results
• Interpretability improves trust
User satisfaction
• Satisfaction w.r.t. interaction, performance and
design
• Can be subjective
• Assessed via questionnaires and evaluative feedback
• Necessary for successful adoption and
widespread use in society
Limitations of prior approaches
• Humans (even experts) are assumed to be always correct
• Interactions between human and AI may not always be flawless
• Uncertainty of DL components not considered
• Presence of errors in data
• No existing measure for data quality
• Data quality may be defined in terms of completeness, accuracy and efficiency
• Possible human failures: cognitive overload, slow response, incorrect response, lack of attention
• Possible system failures: errors in perception, errors in planning, errors in execution
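Since no data-quality measure exists yet, one possible composite score (an illustrative construction, not the proposed paper's definition) combines two of the named characteristics, completeness and accuracy:

```python
def data_quality(samples, is_valid, expected_n):
    """Illustrative composite data-quality score: completeness x accuracy.

    completeness: fraction of expected samples actually present
    accuracy:     fraction of present samples passing a validity check
    `is_valid` stands in for whatever correctness test the domain provides.
    """
    completeness = min(len(samples) / expected_n, 1.0)
    accuracy = sum(map(is_valid, samples)) / len(samples) if samples else 0.0
    return completeness * accuracy

# 4 of 5 expected samples present, 3 of the 4 valid:
score = data_quality([1, 2, -3, 4], is_valid=lambda x: x > 0, expected_n=5)
print(score)  # ~0.6  (0.8 completeness x 0.75 accuracy)
```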
Proposed approach
• Hypothesis: bad demonstration samples compromise safety; full self-exploration by the
system is also infeasible
• Premise: restarting training from scratch is infeasible due to long training times and unsafe exploration
[Architecture diagram: historical data and demonstrations enter a data store; a feature extractor and an unsupervised anomaly detector produce candidate samples, which a human-in-the-loop splits into correct and erroneous samples; correct samples feed the policy learning module, while erroneous samples, combined with an environment dynamics model, feed the anomaly predictor; a non-exploratory and an exploratory training phase connect the modules to the environment.]
Proposed approach
• Non-exploratory training phase:
• Data from the data store is used to train the anomaly predictor and policy learning modules
• Can use human-in-the-loop to classify outliers as correct or erroneous
• Correct samples can directly be used for policy training
• Erroneous samples can be used to predict future anomalies/faults by combining with model of
environment dynamics
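The human-in-the-loop triage of data-store samples described above can be sketched as follows; the threshold, function names, and routing rule are illustrative assumptions:

```python
def triage_outliers(samples, anomaly_score, human_label, threshold=0.8):
    """Route data-store samples for the non-exploratory phase.

    anomaly_score(s) -> [0, 1], from the unsupervised anomaly detector;
    human_label(s)   -> True if the human judges an outlier correct.
    Returns (correct, erroneous): correct samples go to policy training,
    erroneous ones (with the dynamics model) to the anomaly predictor.
    """
    correct, erroneous = [], []
    for s in samples:
        if anomaly_score(s) < threshold:   # inlier: trust it directly
            correct.append(s)
        elif human_label(s):               # outlier the human vouches for
            correct.append(s)
        else:
            erroneous.append(s)
    return correct, erroneous

# Toy run: scores stand in for samples; the human rejects the outlier.
c, e = triage_outliers([0.1, 0.95], anomaly_score=lambda s: s,
                       human_label=lambda s: False)
print(c, e)  # [0.1] [0.95]
```

Only outliers reach the human, which keeps the query load low.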
Proposed approach
• Exploratory training phase:
• System interacts with the environment but chooses actions based on predicted anomaly score
• Facilitates safe exploration by taking previous human feedback into consideration
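Choosing actions based on the predicted anomaly score can be sketched as a penalised argmax; the weighting scheme is an illustrative assumption, not the paper's exact rule:

```python
def pick_action(state, actions, q_value, anomaly_risk, risk_weight=1.0):
    """Trade estimated value against the anomaly predictor's risk score,
    so exploration avoids states the human previously flagged."""
    return max(actions,
               key=lambda a: q_value(state, a) - risk_weight * anomaly_risk(state, a))

# Toy example: the riskier action has higher value but is penalised away.
a = pick_action("s", ["safe", "risky"],
                q_value=lambda s, a: {"safe": 1.0, "risky": 1.5}[a],
                anomaly_risk=lambda s, a: {"safe": 0.0, "risky": 0.9}[a])
print(a)  # safe
```

Raising `risk_weight` makes exploration more conservative; setting it to zero recovers plain greedy action selection.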
Future work
• Evaluation of suitable datasets used in autonomous systems policy/control development
• Development of experimental procedure for design and test of proposed model
• Implementation of human-in-the-loop sample classifier, and anomaly predictor
• Evaluation of system on pre-decided metrics on target domain
Conclusions
• Identified necessity of human-in-the-loop learning, discussed its categories
• Explored the various evaluation metrics of human-in-the-loop approaches presented in
literature
• Defined the requirements for “quality data” with characteristics such as accuracy,
completeness and efficiency
• Proposed a method to measure and improve data quality in human-in-the-loop
approaches
Commissariat à l’énergie atomique et aux énergies alternatives
Institut List | CEA SACLAY NANO-INNOV | BAT. 861 – PC142
91191 Gif-sur-Yvette Cedex - FRANCE
www-list.cea.fr
Établissement public à caractère industriel et commercial | RCS Paris B 775 685 019
Thank you
Prajit Thazhurazhikath Rajendran
prajit.thazhurazhikath@cea.fr
