Rsqrd AI: Discovering Natural Bugs Using Adversarial Perturbations

Discovering Natural Bugs Using
Adversarial Perturbations
Sameer Singh
AI2 Meetup on Robust AI: Debugging NLP July 17th, 2019

circa 2005
[adapted from Zadeh 2005, From Search Engines to Question-Answering Systems — The Need for New Tools]

But we know models are brittle…
Feng et al, EMNLP 2018
Anton van den Hengel, ACL 2018
Jia and Liang, EMNLP 2017

Black-box Explanations for Debugging?
LIME Anchors
From: Keith Richards
Subject: Christianity is the answer
NTTP-Posting-Host: x.x.com
I think Christianity is the one true religion.
If you’d like to know more, send me a note

How do we discover these “bugs”?
Original
Instance
Original PredictionML Pipeline
ML Pipeline Expected PredictionChanged
Instance
Perturb it in a
specific way

Outline
Semantically Equivalent Adversaries
Semantically Implied Adversaries
Universal Adversaries

Outline
Z. Zhao, D. Dua, S. Singh.
Generating Natural Adversarial Examples.
Int. Conf. on Learning Representations (ICLR). 2018
M. T. Ribeiro, S. Singh, C. Guestrin.
Semantically Equivalent Adversarial Rules for Debugging NLP models.
Annual Meeting of the Assoc for Computational Linguistics (ACL). 2018

Adversarial Examples: Oversensitivity
Find closest example with different prediction
x f y
x' f y

Adversarial Attacks on Text
What type of road sign is shown?
> STOP.
What type of road sign is
shown?
Perceptible by humans, unlikely in real world
What type of road sign is
sho wn?

Preserve the Semantics
> Do not Enter.
> STOP.
Bug, and likely in the real world

Preserve the Semantics
The biggest city on the river Rhine is
Cologne, Germany with a population of
more than 1,050,000 people.
It is the second-longest river in Central
and Western Europe (after the Danube),
at about 1,230 km (760 mi)
How long is the Rhine?
> More than 1,050,000
> 1230km
How long is the Rhine?
Bug, and likely in the real world

Transformation “Rules”: Sentiment Analysis
fastText [Joulin et al., 2016]

Outline
M. T. Ribeiro, C. Guestrin, S. Singh.
Are Red Roses Red? Evaluating Consistency of Question-Answering Models.
Association for Computational Linguistics (ACL). 2019

Consistency in Predictions
How many birds? 1
So far, we have considered equivalence, i.e. (x, y) → (x’, y)
Yes
(x, y)
(x’, y’) Is there 1 bird?

Visual QA
(x, y): What room is this? bathroom
Logical Equivalence
(x’, y’): Is this a bathroom? Yes
Necessary Condition
(x’, y’): Is there a bathroom in the picture? Yes
Mutual Exclusion
(x’, y’): Is this a kitchen? No
57%
50%
35%
67%
97% are valid!

Implication Adversaries
• We shouldn’t treat each prediction in isolation
• Inconsistency leads to poor user experience
• Currently, rule-based system for generating them
• Already promising!
• Reveals important bugs in the models
• Even simple data augmentation is promising

Outline
under review

• Instead of replacement, let’s consider additions
• Are there tokens that make the model misbehave?
b has nothing to do with the task, and is not specific to x
x f y
b+x f y

Language Modeling (GPTv2 small)
TH PEOPLEMan god dreams Blacks are the worst people in the world.
A few token prefix that makes the model produce racist language L
TH PEOPLEMan god dreams Blacks I bought a mac and a soda for $10.00. I’m not
a racist, but I’m not saying that blacks should be allowed to buy a soda.
TH PEOPLEMan god dreams Blacks yesterday I’m going to be a fucking black man.
I don’t know what to say to that, but fuck you.
TH PEOPLEMan god dreams Blacks my mother says I’m a racist, and she’s right.
Searched so that we even user input after prefix causes problems
WARNING: Strong Language

Debugging by Changing Instances
• “Natural Perturbations” for NLP
• Semantically Equivalent
• Semantic Implications
• Universal Tokens
• Useful for identifying different kinds of problems
• Not all of them are traditional “bugs”
• General set of approaches that apply for most models

Thanks!
sameer@uci.edu
sameersingh.org
@sameer_

Semantic Adversaries for NLP [ACL 2018]
Semantically-Equivalent Adversary
(SEA)
Semantically-Equivalent Adversarial Rules
(SEARs)
color → colour
x
Backtranslation
+ Filtering
x’ (x, x’)
Patterns
in “diffs”
Rules

VQA User Study: Detecting adversaries
33.6
36
45
0
20
40
Human SEA Human + SEA
Human SEA Human + SEA
SEAs find adversaries as often as humans!
SEAs + Humans better than humans!

Domain-Independent Approach [ICLR 2018]
x f y
x' f yG
Generator
Iz
Inverter
z'

VQA User study: Can experts find bugs?
3
14.2
0
20
Visual QA
Experts SEARs
16.9
10.1
0
20
Visual QA
Finding Rules Evaluating SEARs
% predictions flipped Time (minutes)
SEARs are much better than
expert-produced rules
Evaluating is much easier
than finding them
Closing the loop brings it down to 1.4%

Oversensitivity in images
Adversaries are indistinguishable to humans…
But unlikely in the real world (except for attacks)
“panda”
57.7% confidence
“gibbon”
99.3% confidence

Evaluating Implication Consistency
Validation
Data
(x, y)
Implication
Generation
Implications
(x,y), (x’,y’)
Model
f
Consistency
# y y’ correct
# y correct
based on parses,
POS, WordNet, etc.

Visual QA Results
Model Acc LogEq Mutex Nec Avg Augmentation
SAAA (Kazemi, Elqursh, 2017) 61.5 76.6 42.3 90.2 72.7 94.4
Count (Zhang et al., 2018) 65.2 81.2 42.8 92.0 75.0 94.1
BAN (Kim et al., 2018) 64.5 73.1 50.4 87.3 72.5 95.0
Good at answer w/ numbers, but not questions w/ numbers
e.g. How many birds? 1 (12%) → Are there 2 birds? yes (<1%)

Transformation “Rules”: VisualQA
Visual7a-Telling [Zhu et al 2016]

Rsqrd AI: Discovering Natural Bugs Using Adversarial Perturbations

More Related Content

Recently uploaded (20)

Featured (20)

Rsqrd AI: Discovering Natural Bugs Using Adversarial Perturbations