On Security and Sparsity of Linear Classifiers for Adversarial Settings

Pattern Recognition and Applications Lab
Department of Electrical and Electronic Engineering
University of Cagliari, Italy
Ambra Demontis, Paolo Russu, Battista Biggio, Giorgio Fumera, Fabio Roli
battista.biggio@diee.unica.it
S+SSPR, Merida, Mexico, Dec. 1 2016

http://pralab.diee.unica.it
Recent Applications of Machine Learning
• Consumer technologies for personal applications

iPhone 5s with Fingerprint Recognition…

… Cracked a Few Days After Its Release
EU FP7 Project: TABULA RASA

New Challenges for Machine Learning
• The use of machine learning opens up great new possibilities but also new security risks
• Proliferation and sophistication of attacks and cyberthreats
– Skilled / economically-motivated attackers (e.g., ransomware)
• Several security systems use machine learning to detect attacks
– but … is machine learning secure enough?

Classifier Evasion

Is Machine Learning Secure Enough?
• Problem: how to evade a linear (trained) classifier?
Original spam email x:
  "Start 2007 with a bang! Make WBFS YOUR PORTFOLIO’s first winner of the year ..."
Manipulated spam email x’:
  "St4rt 2007 with a b4ng! Make WBFS YOUR PORTFOLIO’s first winner of the year ... campus"

feature       x     w     x’
start         1    +2     0
bang          1    +1     0
portfolio     1    +1     1
winner        1    +1     1
year          1    +1     1
...          ...   ...   ...
university    0    -3     0
campus        0    -4     1

f(x) = sign(w^T x): w^T x = +6 > 0, SPAM (correctly classified)
f(x’) = sign(w^T x’): w^T x’ = +3 - 4 < 0, HAM (misclassified email)
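The spam example can be reproduced in a few lines of Python. This is only a sketch: the vocabulary and weights are the ones shown on the slide (the elided "..." features are left out), while the tokenizer and the function names `featurize`, `score`, and `classify` are assumptions made for illustration.

```python
import re

# Vocabulary and weights as listed on the slide; the elided "..." features are omitted.
VOCAB = ["start", "bang", "portfolio", "winner", "year", "university", "campus"]
WEIGHTS = {"start": 2, "bang": 1, "portfolio": 1, "winner": 1, "year": 1,
           "university": -3, "campus": -4}


def featurize(text):
    """Binary bag-of-words: 1 if the vocabulary word occurs in the text (assumed tokenizer)."""
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    return [1 if word in tokens else 0 for word in VOCAB]


def score(x):
    """Linear discriminant w^T x (no bias term, as on the slide)."""
    return sum(WEIGHTS[word] * xi for word, xi in zip(VOCAB, x))


def classify(x):
    """Positive score means spam, otherwise ham."""
    return "SPAM" if score(x) > 0 else "HAM"


spam = "Start 2007 with a bang! Make WBFS YOUR PORTFOLIO's first winner of the year"
evasive = "St4rt 2007 with a b4ng! Make WBFS YOUR PORTFOLIO's first winner of the year campus"

for email in (spam, evasive):
    x = featurize(email)
    print(x, score(x), classify(x))
```

Obfuscating two "bad" words and appending one "good" word changes only three features, yet it is enough to flip the decision.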

Evasion of Linear Classifiers
• Formalized as an optimization problem
– Goal: to minimize the discriminant function
• i.e., to be classified as legitimate with the maximum confidence
– Constraints on input data manipulation
• e.g., number of words to be modified in each spam email
$$\min_{x'} \; w^\top x' \quad \text{s.t.} \quad d(x, x') \le d_{\max}$$

Dense and Sparse Evasion Attacks
• L2-norm noise corresponds to dense evasion attacks
– All features are modified by a small amount
$$\min_{x'} \; w^\top x' \quad \text{s.t.} \quad \|x - x'\|_2^2 \le d_{\max}$$
• L1-norm noise corresponds to sparse evasion attacks
– Few features are significantly modified
$$\min_{x'} \; w^\top x' \quad \text{s.t.} \quad \|x - x'\|_1 \le d_{\max}$$
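For a linear discriminant both problems have a simple closed-form solution when the features are real-valued and unbounded. The sketch below makes exactly that assumption (no box constraints and no integer features, unlike the spam and PDF settings); the function names and the example vectors are illustrative, not taken from the slides.

```python
import math


def score(w, x):
    """Linear discriminant w^T x."""
    return sum(wk * xk for wk, xk in zip(w, x))


def dense_attack(w, x, d_max):
    """Minimise w^T x' s.t. ||x - x'||_2^2 <= d_max: move against w on every feature."""
    norm = math.sqrt(sum(wk * wk for wk in w))
    if norm == 0:
        return list(x)
    step = math.sqrt(d_max) / norm  # radius of the l2 ball is sqrt(d_max)
    return [xk - step * wk for wk, xk in zip(w, x)]


def sparse_attack(w, x, d_max):
    """Minimise w^T x' s.t. ||x - x'||_1 <= d_max: spend the budget on the largest |w_k|."""
    k = max(range(len(w)), key=lambda j: abs(w[j]))
    x_adv = list(x)
    if w[k] != 0:
        x_adv[k] -= d_max * math.copysign(1.0, w[k])
    return x_adv


w = [2, 1, 1, 1, 1, -3, -4]
x = [1, 1, 1, 1, 1, 0, 0]
x_dense = dense_attack(w, x, 1.0)
x_sparse = sparse_attack(w, x, 1.0)
print(score(w, x), score(w, x_dense), score(w, x_sparse))
print(sum(a != b for a, b in zip(x, x_dense)), "features changed (dense)")
print(sum(a != b for a, b in zip(x, x_sparse)), "feature changed (sparse)")
```

Under these assumptions the dense attack lowers the score by sqrt(d_max) times the l2 norm of w, and the sparse attack lowers it by d_max times the largest absolute weight.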

Examples on Handwritten Digits (9 vs 8)
[Figure: 28x28 digit images. Sparse evasion attacks (l1-norm constrained): original sample and manipulated sample, SVM g(x) = −0.216. Dense evasion attacks (l2-norm constrained): original sample and manipulated sample, cSVM g(x) = 0.242.]

Robustness and Regularization

• SVM learning is equivalent to a robust optimization problem [Xu et al., JMLR 2009]
$$\min_{w,b} \; \underbrace{\tfrac{1}{2} w^\top w}_{\text{1/margin}} + C \sum_i \underbrace{\max\bigl(0, 1 - y_i f(x_i)\bigr)}_{\text{classification error on training data (hinge loss)}}$$
is equivalent to
$$\min_{w,b} \; \max_{u_i \in U} \sum_i \max\bigl(0, 1 - y_i f(x_i + \underbrace{u_i}_{\text{bounded perturbation}})\bigr)$$

Generalizing to Other Norms
• Optimal regularizer should use dual norm of noise uncertainty sets
– l2-norm regularization is optimal against l2-norm noise!
$$\min_{w,b} \; \tfrac{1}{2} w^\top w + C \sum_i \max\bigl(0, 1 - y_i f(x_i)\bigr)$$
– Infinity-norm regularization is optimal against l1-norm noise!
$$\min_{w,b} \; \|w\|_\infty + C \sum_i \max\bigl(0, 1 - y_i f(x_i)\bigr), \quad \|w\|_\infty = \max_{k=1,\ldots,d} |w_k|$$
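The role of the dual norm can be checked numerically. For a linear f, the largest drop in the margin y f(x) that a perturbation u with norm at most eps can cause is eps times the dual norm of w (the infinity norm for l1 noise, the l2 norm for l2 noise). The sketch below assumes a per-sample budget `eps`, which is a simplification of the uncertainty sets used by Xu et al.; all names and example vectors are illustrative.

```python
import math


def dual_norm(w, noise_norm):
    """Dual norm of w: infinity norm for l1 noise, l2 norm for l2 noise."""
    if noise_norm == "l1":
        return max(abs(wk) for wk in w)
    if noise_norm == "l2":
        return math.sqrt(sum(wk * wk for wk in w))
    raise ValueError("noise_norm must be 'l1' or 'l2'")


def worst_case_hinge(w, b, x, y, eps, noise_norm):
    """max over ||u|| <= eps of max(0, 1 - y f(x + u)) for f(x) = w^T x + b."""
    margin = y * (sum(wk * xk for wk, xk in zip(w, x)) + b)
    return max(0.0, 1.0 - margin + eps * dual_norm(w, noise_norm))


x = [1, 1, 1, 1]
w_even = [1, 1, 1, 1]    # evenly spread weights
w_peaked = [4, 0, 0, 0]  # same score on x, concentrated on one feature

for w in (w_even, w_peaked):
    print(w, worst_case_hinge(w, 0.0, x, +1, 1.0, "l1"))
```

Both weight vectors score x identically without noise, but only the evenly spread one keeps a zero worst-case hinge loss under a unit l1 perturbation.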

Interesting Fact
• Infinity-norm SVM is more secure against L1 attacks as it bounds
the maximum absolute value of the feature weights
• This explains the heuristic intuition of using more uniform feature
weights in previous work [Kolcz and Teo, 2009; Biggio et al., 2010]
[Figure: two plots of feature weights]

Security and Sparsity of Linear Classifiers

Security vs Sparsity
• Problem: SVM and Infinity-norm SVM provide dense solutions!
• Trade-off between security (to l2 or l1 attacks) and sparsity
– Sparsity reduces computational complexity at test time!
[Figure: two plots of feature weights]

Elastic-Net Regularization
[H. Zou & T. Hastie, 2005]
• Originally proposed for feature selection
– to group correlated features together
• Trade-off between sparsity and security against l2-norm attacks
$$\|w\|_{\text{e-net}} = \underbrace{(1 - \lambda) \|w\|_1}_{\text{l1}} + \underbrace{\frac{\lambda}{2} \|w\|_2^2}_{\text{l2}}$$

Octagonal Regularization
• Trade-off between sparsity and security against l1-norm attacks
$$\|w\|_{\text{8gon}} = \underbrace{(1 - \rho) \|w\|_1}_{\text{l1}} + \underbrace{\rho \|w\|_\infty}_{\text{infinity (max)}}$$
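Both mixed regularizers, the elastic net and the octagonal one, are convex combinations of two norms and are easy to evaluate directly. The following sketch computes them for a given weight vector; the function names and the example vector are illustrative assumptions.

```python
def l1_norm(w):
    return sum(abs(wk) for wk in w)


def l2_norm_squared(w):
    return sum(wk * wk for wk in w)


def linf_norm(w):
    return max(abs(wk) for wk in w)


def elastic_net(w, lam):
    """(1 - lam) * ||w||_1 + (lam / 2) * ||w||_2^2, with lam in [0, 1]."""
    return (1 - lam) * l1_norm(w) + 0.5 * lam * l2_norm_squared(w)


def octagonal(w, rho):
    """(1 - rho) * ||w||_1 + rho * ||w||_inf, with rho in [0, 1]."""
    return (1 - rho) * l1_norm(w) + rho * linf_norm(w)


w = [3, -1, 0, 0]
for mix in (0.0, 0.5, 1.0):
    print(mix, elastic_net(w, mix), octagonal(w, mix))
```

Setting the mixing parameter to 0 recovers the pure l1 penalty in both cases; setting it to 1 gives half the squared l2 norm for the elastic net and the infinity norm for the octagonal regularizer.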

Experimental Analysis

Linear Classifiers
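• SVM
– quadratic prog.
$$\min_{w,b} \; \frac{1}{2} \|w\|_2^2 + C \sum_{i=1}^{n} \max\bigl(0, 1 - y_i f(x_i)\bigr)$$
• Infinity-norm SVM
– linear prog.
$$\min_{w,b} \; \|w\|_\infty + C \sum_{i=1}^{n} \max\bigl(0, 1 - y_i f(x_i)\bigr)$$
• 1-norm SVM
– linear prog.
$$\min_{w,b} \; \|w\|_1 + C \sum_{i=1}^{n} \max\bigl(0, 1 - y_i f(x_i)\bigr)$$
• Elastic-net SVM
– quadratic prog.
$$\min_{w,b} \; (1 - \lambda) \|w\|_1 + \frac{\lambda}{2} \|w\|_2^2 + C \sum_{i=1}^{n} \max\bigl(0, 1 - y_i f(x_i)\bigr)$$
• Octagonal SVM
– linear prog.
$$\min_{w,b} \; (1 - \rho) \|w\|_1 + \rho \|w\|_\infty + C \sum_{i=1}^{n} \max\bigl(0, 1 - y_i f(x_i)\bigr)$$
In all cases, $f(x) = w^\top x + b$.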

Security and Sparsity Measures
• Sparsity
– Fraction of weights equal to zero
$$S = \frac{1}{d} \left| \{ w_k \mid w_k = 0, \; k = 1, \ldots, d \} \right|$$
• Security (Weight Evenness)
– E=1/d if only one weight is different from zero
– E=1 if all weights are equal in absolute value
$$E = \frac{1}{d} \frac{\|w\|_1}{\|w\|_\infty} \in \left[ \frac{1}{d}, 1 \right]$$
• Parameter selection with 5-fold cross-validation optimizing:
AUC + 0.1 S + 0.1 E
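The two measures and the model-selection criterion are straightforward to compute from a weight vector. The sketch below follows the definitions on this slide; the function names, the optional zero tolerance `tol`, and the example vectors are assumptions added for illustration.

```python
def sparsity(w, tol=0.0):
    """S: fraction of weights equal to zero (|w_k| <= tol; tol is an added assumption)."""
    return sum(1 for wk in w if abs(wk) <= tol) / len(w)


def evenness(w):
    """E = ||w||_1 / (d * ||w||_inf), which lies in [1/d, 1]."""
    peak = max(abs(wk) for wk in w)
    if peak == 0:
        raise ValueError("evenness is undefined for an all-zero weight vector")
    return sum(abs(wk) for wk in w) / (len(w) * peak)


def selection_score(auc, w):
    """Cross-validation criterion from the slide: AUC + 0.1 S + 0.1 E."""
    return auc + 0.1 * sparsity(w) + 0.1 * evenness(w)


w_single = [0, 0, 5, 0]    # only one non-zero weight
w_even = [2, -2, 2, -2]    # all weights equal in absolute value

print(sparsity(w_single), evenness(w_single))
print(sparsity(w_even), evenness(w_even))
print(selection_score(0.9, w_even))
```

The two example vectors hit the extremes stated above: a single non-zero weight gives E = 1/d with high sparsity, while equal absolute weights give E = 1 with zero sparsity.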

Results on Spam Filtering
Sparse Evasion Attack
• 5000 samples from TREC 07 (spam/ham emails)
• 200 features (words) selected to maximize information gain
• Results averaged on 5 repetitions, using 500 TR/TS samples
• (S,E) measures reported in the legend (in %)
[Figure: Spam Filtering. y-axis: AUC10% (0 to 1); x-axis: d max, the maximum number of words modified in each spam (0 to 40).]
Classifier   S (%)   E (%)
SVM            0      37
∞-norm         4      96
1-norm        86       4
el-net        67       6
8gon          12      88

Results on PDF Malware Detection
Sparse Evasion Attack
• PDF: hierarchy of interconnected objects (keyword/value pairs)
• Features: keyword count
• 11,500 samples
• 5 repetitions, 500 TR/TS samples
• 114 features (keywords) selected with information gain
Example PDF objects:
  13 0 obj
  << /Kids [ 1 0 R 11 0 R ]
  /Type /Page
  ... >> endobj
  17 0 obj
  << /Type /Encoding ...>>
  endobj
Resulting keyword counts:
  /Type 2
  /Page 1
  /Encoding 1
  …
[Figure: PDF Malware Detection. y-axis: AUC10% (0 to 1); x-axis: d max, the maximum number of keywords added in each malicious PDF file (0 to 80).]
Classifier   S (%)   E (%)
SVM            0      47
∞-norm         0     100
1-norm        91       2
el-net        55      13
8gon          69      29

Conclusions and Future Work
• We have shed light on the theoretical and practical implications
of sparsity and security in linear classifiers
• We have defined a novel regularizer to tune the trade-off
between sparsity and security against sparse evasion attacks
• Future work
– To investigate a similar trade-off for
• poisoning (training) attacks
• nonlinear classifiers

Any questions?
Thanks for your attention!

Limited-Knowledge (LK) attacks
[Figure: the attacker collects surrogate training data from PD(X,Y), sends queries to the targeted classifier f(x) to get labels, and uses them to learn a surrogate classifier f’(x).]
