SlideShare a Scribd company logo
Discovering	Natural	Bugs	Using	
Adversarial	Perturbations
Sameer	Singh
AI2	Meetup	on	Robust	AI:	Debugging	NLP July	17th,	2019
circa	2005
[adapted	from	Zadeh 2005,	From	Search	Engines	to	Question-Answering	Systems	— The	Need	for	New	Tools]
2019
NLP	has	come	a	long	way!
But	we	know	models	are	brittle…
Feng	et	al,	EMNLP	2018
Anton van den Hengel, ACL 2018
Jia and	Liang,	EMNLP	2017
Black-box	Explanations	for	Debugging?
LIME Anchors
From: Keith Richards
Subject: Christianity is the answer
NTTP-Posting-Host: x.x.com
I think Christianity is the one true religion.
If you’d like to know more, send me a note
How	do	we	discover	these	“bugs”?
Original	
Instance
Original	PredictionML	Pipeline
ML	Pipeline Expected	PredictionChanged	
Instance
Perturb	it	in	a	
specific	way
Outline
Semantically	Equivalent	Adversaries
Semantically	Implied	Adversaries
Universal	Adversaries
Outline
Semantically	Equivalent	Adversaries
Semantically	Implied	Adversaries
Universal	Adversaries
Z.	Zhao, D.	Dua, S.	Singh.
Generating	Natural	Adversarial	Examples.
Int.	Conf.	on	Learning	Representations	(ICLR). 2018
M.	T.	Ribeiro, S.	Singh, C.	Guestrin.
Semantically	Equivalent	Adversarial	Rules	for	Debugging	NLP	models.
Annual	Meeting	of	the	Assoc for	Computational	Linguistics	(ACL). 2018
Adversarial	Examples:	Oversensitivity
Find	closest	example	with	different	prediction
x f y
x' f y
Adversarial	Attacks	on	Text
What	type	of	road	sign	is	shown?
>	STOP.
What	type	of	road	sign	is	
shown?
Perceptible	by	humans,	unlikely	in	real	world
What				type	of	road	sign	is	
sho wn?
Preserve	the	Semantics
What	type	of	road	sign	is	shown?
>	Do	not	Enter.
>	STOP.
What	type	of	road	sign	is	shown?
Bug,	and	likely	in	the	real	world
Preserve	the	Semantics
The	biggest	city	on	the	river	Rhine	is	
Cologne,	Germany	with	a	population	of	
more	than	1,050,000	people.
It	is	the	second-longest	river	in	Central	
and	Western	Europe	(after	the	Danube),	
at	about	1,230	km	(760	mi)
How	long	is	the	Rhine?
>	More	than	1,050,000
>	1230km
How	long	is	the	Rhine?
Bug,	and	likely	in	the	real	world
Transformation	“Rules”:	Sentiment	Analysis
fastText [Joulin et	al.,	2016]
Outline
Semantically	Equivalent	Adversaries
Semantically	Implied	Adversaries
Universal	Adversaries
M.	T.	Ribeiro, C.	Guestrin, S.	Singh.
Are	Red	Roses	Red?	Evaluating	Consistency	of	Question-Answering	Models.	
Association	for	Computational	Linguistics	(ACL).	2019
Consistency	in	Predictions
How	many	birds?			1
So	far,	we	have	considered	equivalence,	i.e.	(x, y) → (x’, y)
Yes
(x, y)
(x’, y’) Is	there	1	bird?
Visual	QA
(x, y): What	room	is	this?	bathroom
Logical	Equivalence
(x’, y’): Is	this	a	bathroom?	Yes
Necessary	Condition
(x’, y’): Is	there	a	bathroom	in	the	picture?	Yes
Mutual	Exclusion
(x’, y’): Is	this	a	kitchen?	No
57%
50%
35%
67%
97%	are	valid!
Implication	Adversaries
• We	shouldn’t	treat	each	prediction	in	isolation
• Inconsistency	leads	to	poor	user	experience
• Currently,	rule-based	system	for	generating	them
• Already	promising!
• Reveals	important	bugs	in	the	models
• Even	simple	data	augmentation	is	promising
Outline
Semantically	Equivalent	Adversaries
Semantically	Implied	Adversaries
Universal	Adversaries
under	review
Universal	Adversaries
• Instead	of	replacement,	let’s	consider	additions
• Are	there	tokens	that	make	the	model	misbehave?
b has	nothing	to	do	with	the	task,	and	is	not	specific	to	x
x f y
b+x f y
Language	Modeling	(GPTv2	small)
TH	PEOPLEMan god dreams Blacks are	the	worst	people	in	the	world.
A	few	token	prefix	that	makes	the	model	produce	racist	language	L
TH	PEOPLEMan god	dreams	Blacks	I	bought	a	mac and	a	soda	for	$10.00.	I’m	not	
a	racist,	but	I’m	not	saying	that	blacks	should	be	allowed	to	buy	a	soda.
TH	PEOPLEMan god	dreams	Blacks	yesterday	I’m	going	to	be	a	fucking	black	man.	
I	don’t	know	what	to	say	to	that,	but	fuck	you.
TH	PEOPLEMan god	dreams	Blacks	my	mother	says I’m	a	racist,	and	she’s	right.
Searched	so	that	we	even	user	input	after	prefix	causes	problems
WARNING:	Strong	Language
Debugging	by	Changing	Instances
• “Natural	Perturbations”	for	NLP
• Semantically	Equivalent	
• Semantic	Implications
• Universal	Tokens
• Useful	for	identifying	different	kinds	of	problems
• Not	all	of	them	are	traditional	“bugs”
• General	set	of	approaches	that	apply	for	most	models
Thanks!
sameer@uci.edu
sameersingh.org
@sameer_
Semantic	Adversaries	for	NLP			 [ACL	2018]
Semantically-Equivalent	Adversary
(SEA)
Semantically-Equivalent	Adversarial	Rules
(SEARs)
color	→	colour
x
Backtranslation
+	Filtering
x’ (x, x’)
Patterns
in	“diffs”
Rules
VQA	User	Study:	Detecting	adversaries
33.6
36
45
0
20
40
Human SEA Human	+	SEA
Human SEA Human	+	SEA
SEAs	find	adversaries	as	often	as	humans!
SEAs	+	Humans	better	than	humans!
Domain-Independent	Approach												[ICLR	2018]
x f y
x' f yG
Generator
Iz
Inverter
z'
VQA	User	study:	Can	experts	find	bugs?
3
14.2
0
20
Visual	QA
Experts SEARs
16.9
10.1
0
20
Visual	QA
Finding	Rules Evaluating	SEARs
%	predictions	flipped Time	(minutes)
SEARs	are	much	better	than	
expert-produced	rules
Evaluating	is	much	easier	
than	finding	them
Closing	the	loop	brings	it	down	to	1.4%
Oversensitivity	in	images
Adversaries	are	indistinguishable	to	humans…
But	unlikely in	the	real	world	(except	for	attacks)
“panda”
57.7%	confidence
“gibbon”
99.3%	confidence
Evaluating	Implication	Consistency
Validation
Data
(x, y)
Implication
Generation
Implications
(x,y), (x’,y’)
Model
f
Consistency
#	y y’ correct
#	y correct
based	on	parses,
POS,	WordNet,	etc.
Visual	QA	Results
Model Acc LogEq Mutex Nec Avg Augmentation
SAAA	(Kazemi,	Elqursh,	2017) 61.5 76.6 42.3 90.2 72.7 94.4
Count	(Zhang	et	al.,	2018) 65.2 81.2 42.8 92.0 75.0 94.1
BAN (Kim	et	al.,	2018) 64.5 73.1 50.4 87.3 72.5 95.0
Good	at	answer	w/	numbers,	but	not	questions	w/	numbers
e.g.	How	many	birds?	1 (12%)	→	Are	there	2	birds?	yes (<1%)
Transformation	“Rules”:	VisualQA
Visual7a-Telling	[Zhu	et	al	2016]

More Related Content

PDF
Rsqrd AI: Making Conversational AI Work for Everybody
PDF
Rsqrd AI: Application of Explanation Model in Healthcare
PDF
Rsqrd AI: Recent Advances in Explainable Machine Learning Research
PDF
Rsqrd AI: Incorporating Priors with Feature Attribution on Text Classification
PPTX
Rsqrd AI: A Survey of The Current Ecosystem of Explainability Techniques
PPTX
Rsqrd AI: Explaining ML Models w/ Geometric Intuition
PDF
Rsqrd AI: Errudite- Scalable, Reproducible, and Testable Error Analysis
PDF
Rsqrd AI: Exploring Machine Learning Model Predictions
Rsqrd AI: Making Conversational AI Work for Everybody
Rsqrd AI: Application of Explanation Model in Healthcare
Rsqrd AI: Recent Advances in Explainable Machine Learning Research
Rsqrd AI: Incorporating Priors with Feature Attribution on Text Classification
Rsqrd AI: A Survey of The Current Ecosystem of Explainability Techniques
Rsqrd AI: Explaining ML Models w/ Geometric Intuition
Rsqrd AI: Errudite- Scalable, Reproducible, and Testable Error Analysis
Rsqrd AI: Exploring Machine Learning Model Predictions

Recently uploaded (20)

PPTX
Final SEM Unit 1 for mit wpu at pune .pptx
PDF
A comparative study of natural language inference in Swahili using monolingua...
PPTX
TLE Review Electricity (Electricity).pptx
PDF
From MVP to Full-Scale Product A Startup’s Software Journey.pdf
PPTX
The various Industrial Revolutions .pptx
PDF
Microsoft Solutions Partner Drive Digital Transformation with D365.pdf
PDF
A contest of sentiment analysis: k-nearest neighbor versus neural network
PPTX
MicrosoftCybserSecurityReferenceArchitecture-April-2025.pptx
PDF
A novel scalable deep ensemble learning framework for big data classification...
PDF
WOOl fibre morphology and structure.pdf for textiles
PDF
Video forgery: An extensive analysis of inter-and intra-frame manipulation al...
PDF
DP Operators-handbook-extract for the Mautical Institute
PDF
project resource management chapter-09.pdf
PDF
ENT215_Completing-a-large-scale-migration-and-modernization-with-AWS.pdf
PDF
DASA ADMISSION 2024_FirstRound_FirstRank_LastRank.pdf
PDF
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
PDF
Web App vs Mobile App What Should You Build First.pdf
PDF
STKI Israel Market Study 2025 version august
PPTX
Group 1 Presentation -Planning and Decision Making .pptx
PDF
1 - Historical Antecedents, Social Consideration.pdf
Final SEM Unit 1 for mit wpu at pune .pptx
A comparative study of natural language inference in Swahili using monolingua...
TLE Review Electricity (Electricity).pptx
From MVP to Full-Scale Product A Startup’s Software Journey.pdf
The various Industrial Revolutions .pptx
Microsoft Solutions Partner Drive Digital Transformation with D365.pdf
A contest of sentiment analysis: k-nearest neighbor versus neural network
MicrosoftCybserSecurityReferenceArchitecture-April-2025.pptx
A novel scalable deep ensemble learning framework for big data classification...
WOOl fibre morphology and structure.pdf for textiles
Video forgery: An extensive analysis of inter-and intra-frame manipulation al...
DP Operators-handbook-extract for the Mautical Institute
project resource management chapter-09.pdf
ENT215_Completing-a-large-scale-migration-and-modernization-with-AWS.pdf
DASA ADMISSION 2024_FirstRound_FirstRank_LastRank.pdf
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
Web App vs Mobile App What Should You Build First.pdf
STKI Israel Market Study 2025 version august
Group 1 Presentation -Planning and Decision Making .pptx
1 - Historical Antecedents, Social Consideration.pdf
Ad
Ad

Rsqrd AI: Discovering Natural Bugs Using Adversarial Perturbations