REVIEW OF EVALUATION METRICS USED IN LITERATURE + PROPOSED IDEA
DREAMS, EDCC 2021
13 September 2021
Evaluation of Human-in-the-Loop Learning
based Autonomous Systems
Prajit Thazhurazhikath Rajendran, Huascar Espinoza, Chokri Mraidha (CEA, DILS-LSEA),
Agnes Delaborde (LNE)
| 2
DREAMS 2021 | Prajit T Rajendran
Safety challenges of DL/AI components
• The use of DL/AI components in autonomous
systems comes with various challenges:
• Vulnerable to out of distribution data
• Adversarial inputs
• Anomalies
• Lack of transparency
• Stochastic nature
• Unknown unknowns
• Uncertainty
• Safety is an emergent property: it is not a
property of any single component in isolation
• Regulation/qualification/certification of such DL/AI
components is ongoing work in the community
• Traditional approaches do not facilitate safe learning
• Humans can guide the system to safe behavior with
their knowledge, experience and adaptability
[Figure: out-of-distribution samples, showing normal vs. anomaly regions]
Categories of human-in-the-loop learning methods
Active learning
• Semi-supervised ML where only a subset of the training data is labelled
• Human queried interactively to label data points of interest from the unlabelled set
• PROS: Reduces data labelling requirement
• CONS: Performance hinges on selecting the right points to query
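The query-selection problem named in the CON above is commonly handled with an uncertainty criterion. A minimal sketch of least-confidence sampling (the data and function name are illustrative, not from the paper):

```python
import numpy as np

def select_queries(probs, k):
    """Pick the k unlabelled points the model is least sure about.

    probs: (n_samples, n_classes) predicted class probabilities.
    Uses least-confidence sampling; margin and entropy criteria are
    common alternatives.
    """
    uncertainty = 1.0 - probs.max(axis=1)      # low top-class prob = uncertain
    return np.argsort(uncertainty)[-k:][::-1]  # most uncertain first

# Example: three unlabelled samples, binary classifier
probs = np.array([[0.95, 0.05],   # confident
                  [0.55, 0.45],   # uncertain -> query the human
                  [0.80, 0.20]])
print(select_queries(probs, 1))   # -> [1]
```

Only the selected indices are sent to the human annotator, which is what reduces the labelling requirement.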
Categories of human-in-the-loop learning methods
Demonstration
• Human is in full control and provides demonstrations to train the agent
• Agent can mimic human data to use as a safe starting point
• PROS: Leads to safer policies
• CONS: More human effort needed, may be subjective, train-test distribution shift
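Learning from demonstration in its simplest form is behavioral cloning: fit a policy to the expert's (state, action) pairs by supervised learning. A toy sketch with a linear policy and synthetic demonstration data (all made up for illustration):

```python
import numpy as np

# Hypothetical demonstration data: states and the expert's continuous actions.
rng = np.random.default_rng(0)
states = rng.normal(size=(200, 4))           # 200 demo states, 4 features
true_w = np.array([0.5, -1.0, 0.3, 0.0])
actions = states @ true_w                    # expert's (noise-free) actions

# Behavioral cloning as least-squares regression: fit pi(s) ~ a.
w, *_ = np.linalg.lstsq(states, actions, rcond=None)

print(np.allclose(w, true_w, atol=1e-6))     # True on demo-like states
# The CON above (train-test distribution shift) appears as soon as the
# agent's own actions take it to states unlike the demonstrations.
```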
Categories of human-in-the-loop learning methods
Intervention
• Human, agent share control and human intervenes when necessary
• Human takes over control to avoid catastrophic states and agent learns from these
• PROS: Leads to safer policies
• CONS: Need to keep human in the loop for long, slow response time
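One control step under shared control can be sketched as follows; all callables are illustrative stubs, and real systems would use a learned or rule-based risk estimate:

```python
def step_with_intervention(state, agent_policy, human_policy, is_risky):
    """One shared-control step: the agent acts unless `is_risky` flags
    the state as near-catastrophic, in which case the human takes over.
    The source of the action is returned so the agent can learn from
    logged interventions."""
    if is_risky(state):
        return human_policy(state), "human"
    return agent_policy(state), "agent"

# Toy example: the human brakes whenever speed exceeds a threshold.
agent = lambda s: "accelerate"
human = lambda s: "brake"
risky = lambda s: s["speed"] > 100

print(step_with_intervention({"speed": 120}, agent, human, risky))  # ('brake', 'human')
print(step_with_intervention({"speed": 40}, agent, human, risky))   # ('accelerate', 'agent')
```

The CON above (slow human response time) corresponds to the latency between `is_risky` firing and `human_policy` actually returning an action.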
Categories of human-in-the-loop learning methods
Evaluation
• Agent in full control and human provides feedback for tasks
• Human gives feedback based on known objective or preference, which the agent
learns
• PROS: Leads to safer policies
• CONS: Need to keep human in the loop for long, credit attribution problem,
subjective feedback
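The credit attribution problem listed as a CON can be made concrete: one scalar feedback signal arrives for a whole trajectory and must be spread over many steps. A deliberately naive sketch (every visited pair gets the same update; the function and keys are illustrative):

```python
def update_reward_estimates(trajectory, feedback, estimates, lr=0.1):
    """Spread one scalar human feedback signal over a whole trajectory.

    `trajectory` is a list of (state, action) ids; `feedback` is +1/-1.
    Attributing one delayed signal to many steps is the credit
    attribution problem; here it is 'solved' naively by uniform credit.
    """
    for sa in trajectory:
        estimates[sa] = estimates.get(sa, 0.0) + lr * feedback
    return estimates

est = update_reward_estimates(["s0a1", "s1a0"], +1, {})
print(est)  # both pairs receive the same +0.1 update
```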
Common metrics in HITL learning methods
Metric category    | Example measures
-------------------|------------------------------------------------
Safety             | Rate of catastrophes, deviation from thresholds
Performance        | Rate of task completion, average reward
Time               | Response time, training time
Data requirement   | Number of queries, number of interventions
User trust         | Subjective measures, type of interactions
User satisfaction  | Likert scale, binary feedback
Common metrics in HITL learning methods
Safety
• Learning from intervention is used
• Human intervenes to avoid undesirable events or catastrophes
• Policy is constrained to safer regions
• Evaluated based on the number of occurrences of catastrophes
Trial without Error: Towards Safe Reinforcement Learning via Human Intervention, William Saunders et al.
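The safety metric itself is simple to compute from evaluation rollouts; a minimal sketch (episode data is made up):

```python
def catastrophe_rate(episodes):
    """Fraction of evaluation episodes in which a catastrophe occurred.
    `episodes` is a list of booleans (True = catastrophe)."""
    return sum(episodes) / len(episodes)

# 1 catastrophe in 4 evaluation episodes:
print(catastrophe_rate([True, False, False, False]))  # 0.25
```

A per-timestep variant (catastrophes per environment step) is also used when episode lengths vary.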
Common metrics in HITL learning methods
Performance
One-Shot Imitation Learning, Yan Duan et al.
• Learning from demonstration + meta-learning used
• Train networks that are not specific to one task and can adapt to
new tasks
• Evaluated based on average rate of success/task completion
Common metrics in HITL learning methods
Time
A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning, Stéphane Ross et al.
• DAgger approaches: Learning from demonstration + intervention
• Start with imitation of expert policy, collect data
• Train the next policy under the aggregate of all collected datasets
• Hand over control to expert if necessary based on rulesets
• Evaluated based on number of training iterations needed to reach
a significant level of performance
DropoutDAgger: A Bayesian Approach to Safe Imitation Learning, Kunal Menda et al.
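The DAgger steps above can be sketched schematically; the stub components here are illustrative, not any library's API:

```python
def dagger(expert, learner_fit, env_rollout, n_iters):
    """Schematic DAgger loop.

    expert(state) -> action          : queried on the learner's own states
    learner_fit(dataset) -> policy   : supervised training on all data so far
    env_rollout(policy) -> [states]  : states visited by the current policy
    """
    dataset, policy = [], expert          # iteration 0: pure imitation
    for _ in range(n_iters):
        states = env_rollout(policy)
        # Aggregate: label the learner's own states with the expert's actions.
        dataset += [(s, expert(s)) for s in states]
        policy = learner_fit(dataset)     # train on the aggregate dataset
    return policy

# Toy check with stub components:
expert = lambda s: s + 1                  # expert labels states
def learner_fit(dataset):                 # 'training' = memorize labels
    table = dict(dataset)
    return lambda s: table.get(s, 0)
env_rollout = lambda policy: [1, 2, 3]    # states the policy visits
policy = dagger(expert, learner_fit, env_rollout, n_iters=2)
print(policy(2))  # matches the expert on visited states
```

The time metric in this setting is `n_iters`: how many such iterations are needed before the learned policy reaches acceptable performance.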
Common metrics in HITL learning methods
Data
requirement
Overcoming Blind Spots in the Real World: Leveraging Complementary Abilities for Joint Execution, Ramya Ramakrishnan et al.
• Learning from demonstration + intervention used
• Agent and human both are considered to have blindspots
• Choose actor (human vs agent) based on blindspot activation level
• Evaluated based on number of human queries needed vs average
reward
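Actor selection based on blindspot levels can be sketched as a simple comparison; the scoring functions and names are illustrative, not the paper's exact formulation:

```python
def choose_actor(state, agent_blindspot, human_blindspot):
    """Pick whoever is less 'blind' in the current state.

    The blindspot functions return a score in [0, 1]; higher means that
    actor is more likely to err here. Every 'human' choice costs one
    more query, which is why the metric trades queries against reward.
    """
    if agent_blindspot(state) > human_blindspot(state):
        return "human"
    return "agent"

# Toy example: the agent is nearly blind in fog, the human is not.
print(choose_actor("fog", agent_blindspot=lambda s: 0.9,
                   human_blindspot=lambda s: 0.2))  # human
```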
Common metrics in HITL learning methods
User trust
• Trust ≈ the extent to which the human agrees with the AI
• Assessed via questionnaires on use of the system, biological
data, number of interventions, “humanness”, etc.
• Users can be quick to distrust an AI that produces easily
identifiable incorrect results
• Interpretability improves trust
User satisfaction
• Satisfaction w.r.t. interaction, performance and
design
• Can be subjective
• Assessed via questionnaires and evaluative feedback
• Necessary for successful adoption and
widespread use in society
Limitations of prior approaches
• Humans (even experts) are assumed to be always correct
• Interactions between human and AI may not always be flawless
• Uncertainty of DL components not considered
• Presence of errors in data
• No existing measure for data quality
• Data quality may be defined in terms of completeness, accuracy and efficiency
• Possible human failures: cognitive overload, slow response, incorrect response, lack of attention
• Possible system failures: errors in perception, errors in planning, errors in execution
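Since no data-quality measure exists yet, one possible composite score (an illustrative construction, not the proposed paper's definition) combines two of the named characteristics, completeness and accuracy:

```python
def data_quality(samples, is_valid, expected_n):
    """Illustrative composite data-quality score: completeness x accuracy.

    completeness: fraction of expected samples actually present
    accuracy:     fraction of present samples passing a validity check
    `is_valid` stands in for whatever correctness test the domain provides.
    """
    completeness = min(len(samples) / expected_n, 1.0)
    accuracy = sum(map(is_valid, samples)) / len(samples) if samples else 0.0
    return completeness * accuracy

# 4 of 5 expected samples present, 3 of the 4 valid:
score = data_quality([1, 2, -3, 4], is_valid=lambda x: x > 0, expected_n=5)
print(score)  # ~0.6  (0.8 completeness x 0.75 accuracy)
```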
Proposed approach
• Hypothesis: bad demonstration samples compromise safety; full self-exploration by the
system is also infeasible
• Premise: restarting training from scratch is infeasible due to long training times and unsafe exploration
[Architecture diagram: historical data and demonstrations enter a data store; a feature extractor and an unsupervised anomaly detector produce candidate samples, which a human-in-the-loop splits into correct and erroneous samples; correct samples feed the policy learning module, while erroneous samples, combined with an environment dynamics model, feed the anomaly predictor; a non-exploratory and an exploratory training phase connect the modules to the environment.]
Proposed approach
• Non-exploratory training phase:
• Data from the data store is used to train the anomaly predictor and policy learning modules
• Can use human-in-the-loop to classify outliers as correct or erroneous
• Correct samples can directly be used for policy training
• Erroneous samples can be used to predict future anomalies/faults by combining with model of
environment dynamics
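The human-in-the-loop triage of data-store samples described above can be sketched as follows; the threshold, function names, and routing rule are illustrative assumptions:

```python
def triage_outliers(samples, anomaly_score, human_label, threshold=0.8):
    """Route data-store samples for the non-exploratory phase.

    anomaly_score(s) -> [0, 1], from the unsupervised anomaly detector;
    human_label(s)   -> True if the human judges an outlier correct.
    Returns (correct, erroneous): correct samples go to policy training,
    erroneous ones (with the dynamics model) to the anomaly predictor.
    """
    correct, erroneous = [], []
    for s in samples:
        if anomaly_score(s) < threshold:   # inlier: trust it directly
            correct.append(s)
        elif human_label(s):               # outlier the human vouches for
            correct.append(s)
        else:
            erroneous.append(s)
    return correct, erroneous

# Toy run: scores stand in for samples; the human rejects the outlier.
c, e = triage_outliers([0.1, 0.95], anomaly_score=lambda s: s,
                       human_label=lambda s: False)
print(c, e)  # [0.1] [0.95]
```

Only outliers reach the human, which keeps the query load low.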
Proposed approach
• Exploratory training phase:
• System interacts with the environment but chooses actions based on predicted anomaly score
• Facilitates safe exploration by taking previous human feedback into consideration
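Choosing actions based on the predicted anomaly score can be sketched as a penalised argmax; the weighting scheme is an illustrative assumption, not the paper's exact rule:

```python
def pick_action(state, actions, q_value, anomaly_risk, risk_weight=1.0):
    """Trade estimated value against the anomaly predictor's risk score,
    so exploration avoids states the human previously flagged."""
    return max(actions,
               key=lambda a: q_value(state, a) - risk_weight * anomaly_risk(state, a))

# Toy example: the riskier action has higher value but is penalised away.
a = pick_action("s", ["safe", "risky"],
                q_value=lambda s, a: {"safe": 1.0, "risky": 1.5}[a],
                anomaly_risk=lambda s, a: {"safe": 0.0, "risky": 0.9}[a])
print(a)  # safe
```

Raising `risk_weight` makes exploration more conservative; setting it to zero recovers plain greedy action selection.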
Future work
• Evaluation of suitable datasets used in autonomous systems policy/control development
• Development of experimental procedure for design and test of proposed model
• Implementation of human-in-the-loop sample classifier, and anomaly predictor
• Evaluation of system on pre-decided metrics on target domain
Conclusions
• Identified necessity of human-in-the-loop learning, discussed its categories
• Explored the various evaluation metrics of human-in-the-loop approaches presented in
literature
• Defined the requirements for “quality data” with characteristics such as accuracy,
completeness and efficiency
• Proposed a method to measure and improve data quality in human-in-the-loop
approaches
Commissariat à l’énergie atomique et aux énergies alternatives
Institut List | CEA SACLAY NANO-INNOV | BAT. 861 – PC142
91191 Gif-sur-Yvette Cedex - FRANCE
www-list.cea.fr
Établissement public à caractère industriel et commercial | RCS Paris B 775 685 019
Thank you
Prajit Thazhurazhikath Rajendran
prajit.thazhurazhikath@cea.fr
