ANOTHER BRICK OFF THE WALL: DECONSTRUCTING WEB APPLICATION FIREWALLS USING AUTOMATA LEARNING

Another Brick off The Wall:
Deconstructing Web Application
Firewalls Using Automata Learning
George Argyros, Ioannis Stais
Joint Work with:
Suman Jana, Angelos D. Keromytis, Aggelos Kiayias

Overview
• A journey in the world of:
- Code Injection attacks.
- Web Application Firewalls.
- Parsers.
- Learning algorithms.
• And newly discovered vulnerabilities :)

Code Injection Attacks
• SQLi, XSS, XML, etc…
• Not going anywhere anytime soon.
• 14% increase in total web attacks in
Q2 2016 [1]
• 150% - 200% increase in SQLi and
XSS attacks in 2015 [2]
[1] akamai’s [state of the internet] / security Q2 2016 executive review
[2] Imperva: 2015 Web Application Attack Report (WAAR)

Code Injection is a Parsing
Problem
Web Application
Language
Runtime
Input data Injection attack

Problem
Web Application
Language
Runtime
Input data
Input data is parsed
incorrectly
Injection attack

Problem
Web application parsers are doing a really bad
job in parsing user inputs.
Web Application
Language
Runtime
Input data
Input data is parsed
incorrectly
Injection attack

Web Application Firewalls
(or solving parsing problems with parsing)

Web Application Firewalls
• Monitor trafﬁc at the Application
Layer: Both HTTP Requests and
Responses.
• Detect and Prevent Attacks.
• Cost-effective compliance with PCI
DSS requirement 6.6 [1]
[1] PCI DSS v3.2

WAFs Internals
Rulesets
Matching
Normalization
Attack
Mitigation
User
Input

WAFs Internals
Rulesets
Matching
Normalization
Attack
Mitigation
User
Input
<ScRipt>alert(1);</ScRipT>

WAFs Internals
Rulesets
Matching
Normalization
Attack
Mitigation
User
Input
<script>alert(1);</script>
Lower Case

WAFs Internals
Rulesets
Matching
Normalization
Attack
Mitigation
User
Input
Lower Case
Matched Rule:
<script>.*</script>

WAFs Internals
Rulesets
Matching
Normalization
Attack
Mitigation
Event
Correlation
Tokenising
User
Input
Lower Case
Matched Rule:
<script>.*</script>

WAFs Internals
Rulesets
Matching
Normalization
Attack
Mitigation
Event
Correlation
Tokenising
User
Input
Lower Case
Matched Rule:
<script>.*</script>
1.<script>
2. alert(1);
3.</script>

WAFs Internals
Rulesets
Matching
Normalization
Attack
Mitigation
Event
Correlation
Tokenising
User
Input
Lower Case
Matched Rule:
<script>.*</script>
1.<script>
2. alert(1);
3.</script>
1. 4 Rules Matched
2. Session/User history

WAF Rulesets
• Signatures: Strings or Regular Expressions
E.g., [PHPIDS Rule 54] Detects Postgres pg_sleep injection, waitfor delay attacks and
database shutdown attempts:
(?:select\s*pg_sleep)|(?:waitfor\s*delay\s?"+\s?\d)|(?:;\s*shutdown\s*(?:;|--|#|\/\*|{))

WAF Rulesets
• Rules: Logical expressions and Condition/Control Variables
E.g., ModSecurity CRS Rule 981254:
SecRule REQUEST_COOKIES|!REQUEST_COOKIES:/__utm/|!REQUEST_COOKIES:/_pk_ref/|REQUEST_COOKIES_NAMES|ARGS_NAMES|ARGS|XML:/* \
"(?i:(?:select\s*?pg_sleep)|(?:waitfor\s*?delay\s?[\"'`´’‘]+\s?\d)|(?:;\s*?shutdown\s*?(?:;|--|#|\/\*|{)))" \
"phase:2,capture,t:none,t:urlDecodeUni,block,setvar:tx.sql_injection_score=+1,setvar:tx.anomaly_score=+%{tx.critical_anomaly_score},setvar:'tx.%{tx.msg}-OWASP_CRS/WEB_ATTACK/SQLI-%{matched_var_name}=%{tx.0}'"

WAF Rulesets
• Virtual Patches: Application Speciﬁc Patches
E.g., ModSecurity: Turns off autocomplete for the forms on login and signup pages
SecRule REQUEST_URI "^(/login|/signup)" "id:1000,phase:4,chain,nolog,pass"
SecRule REQUEST_METHOD "@streq GET" "chain"
SecRule STREAM_OUTPUT_BODY "@rsub s/<form /<form autocomplete=\"off\" /"

WAF Rulesets
• Virtual Patches: Application Speciﬁc Patches
• PHPIDS has more than 420K states
• Shared between different WAFs and Log Auditing Software: PHPIDS,
Expose, ModSecurity

Why Bypasses Exist
- Simple hacks:
• Lack of support for different protocols, encodings, contents, etc
• Restrictions on length, character sets, byte ranges, types of
parameters, etc

Why Bypasses Exist
- Rulesets sharing mistakes:
• Normalisation and Rulesets Failure
PHPIDS 0.7.0
Rulesets
Matching
Normalization
’ ” ` ”
User
Input

Why Bypasses Exist
PHPIDS 0.7.0
Rulesets
Matching
Normalization
’ ” ` ”
User
Input
x' onclick='a()'>

Why Bypasses Exist
PHPIDS 0.7.0
Rulesets
Matching
Normalization
’ ” ` ”
User
Input
x' onclick='a()'>
"\s*(src|style|on\w+)\s*=\s*")
MATCHED!

Why Bypasses Exist
Rulesets
Matching
Normalization
’ ” ` ”
User
Input
x' onclick='a()'>
MATCHED!

Why Bypasses Exist
Rulesets
Matching
Normalization
’ ” ` ”
User
Input
x' onclick='a()'>
MATCHED!
Expose 2.4.0

Why Bypasses Exist
Rulesets
Matching
’ ” ` ”
User
Input
x' onclick='a()'>
MATCHED!
Expose 2.4.0

Why Bypasses Exist
Rulesets
Matching
’ ” ` ”
User
Input
x' onclick='a()'>
MATCHED!
Expose 2.4.0
BYPASS!
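The mismatch shown in these slides can be reproduced in a few lines. This is a minimal sketch, assuming that normalization simply maps every quote variant to a double quote. The function names and the exact quote set are illustrative and are not taken from PHPIDS or Expose.

```python
import re

# Rule written for *normalized* input: it expects a double quote
# around the attribute assignment.
RULE = re.compile(r'"\s*(src|style|on\w+)\s*=\s*"')

# Hypothetical quote set; a real normalizer may cover more characters.
QUOTES = "'`\u2019\u2018\u201d\u201c\u00b4"

def normalize(payload: str) -> str:
    """Map every quote variant to a plain double quote."""
    return payload.translate({ord(q): '"' for q in QUOTES})

def blocked_with_normalization(payload: str) -> bool:
    """Ruleset used together with the normalizer it was written for."""
    return RULE.search(normalize(payload)) is not None

def blocked_without_normalization(payload: str) -> bool:
    """Same ruleset reused in a product that lacks the normalization stage."""
    return RULE.search(payload) is not None

payload = "x' onclick='a()'>"
print(blocked_with_normalization(payload))     # rule fires
print(blocked_without_normalization(payload))  # rule stays silent: bypass
```

The rule itself is unchanged between the two calls; only the missing preprocessing step turns a match into a bypass.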

Why Bypasses Exist
- Critical WAF components are not being updated:
• E.g., ModSecurity libinjection library

Why Bypasses Exist
- The Real Fundamental Reasons:
• Insufﬁcient Signatures & Weak Rules
• Detecting vulnerabilities without context is HARD

Our Goal
1. Formalize knowledge of code injection attack variations
using context free grammars and automata.
2. Use learning algorithms to expand this knowledge by
inferring system specifications.

Using parsers to
break parsers

Regular Expressions and
Finite Automata
Every regular expression can be converted to a
Deterministic Finite Automaton.
(.*)man
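As a concrete illustration of the conversion, the sketch below builds a DFA for (.*)man over a small alphabet and checks it against Python's regex engine. The alphabet and the function names are assumptions made for this example, and the construction is the textbook suffix-tracking automaton, not necessarily the one pictured on the slide.

```python
import itertools
import re

PATTERN = "man"
ALPHABET = "amnx"  # small illustrative alphabet

def build_dfa(pattern: str, alphabet: str) -> dict:
    """State k means: the longest suffix of the input read so far that is
    also a prefix of `pattern` has length k. State len(pattern) accepts."""
    delta = {}
    for state in range(len(pattern) + 1):
        for ch in alphabet:
            seen = pattern[:state] + ch
            k = min(len(seen), len(pattern))
            # Shrink k until pattern[:k] is a suffix of what we have seen.
            while k > 0 and not seen.endswith(pattern[:k]):
                k -= 1
            delta[(state, ch)] = k
    return delta

def accepts(delta: dict, word: str) -> bool:
    state = 0
    for ch in word:
        state = delta[(state, ch)]
    return state == len(PATTERN)

def agrees_with_regex(delta: dict, max_len: int) -> bool:
    """Compare the DFA with re.fullmatch on every word up to max_len."""
    regex = re.compile(r"(.*)man")
    for n in range(max_len + 1):
        for letters in itertools.product(ALPHABET, repeat=n):
            word = "".join(letters)
            if accepts(delta, word) != (regex.fullmatch(word) is not None):
                return False
    return True

DFA = build_dfa(PATTERN, ALPHABET)
print(accepts(DFA, "xmman"), accepts(DFA, "mana"))
```

Four states are enough here: one per matched prefix length of "man".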

Context Free Grammars
• Strictly more expressive than regular expressions.
• Mostly used to write programming
language parsers.
• Equivalent to a nondeterministic finite automaton with a stack (a pushdown automaton).
• Can be used to count.
- Example: matching parentheses.
E → N
E → E Op E
E → ( E )
N → N N
Op → +
Op → -
Op → *
Op → /
N → [0-9]
Non-terminals: E, N, Op
Terminals: +, -, *, /, (, ), 0-9
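The "counting" ability can be seen in a small recognizer for the expression grammar above. This is a sketch that uses an equivalent formulation without left recursion (an expression is a term followed by any number of "Op term" pairs); the function names are chosen for this example only.

```python
DIGITS = "0123456789"
OPERATORS = "+-*/"

def parse_term(s: str, i: int) -> int:
    """Parse N or ( E ) starting at s[i]; return the next index or -1."""
    if i < len(s) and s[i] in DIGITS:          # N -> N N | [0-9]
        while i < len(s) and s[i] in DIGITS:
            i += 1
        return i
    if i < len(s) and s[i] == "(":             # E -> ( E )
        j = parse_expr(s, i + 1)
        if j != -1 and j < len(s) and s[j] == ")":
            return j + 1
    return -1

def parse_expr(s: str, i: int = 0) -> int:
    """Parse E starting at s[i]; return the next index or -1."""
    i = parse_term(s, i)
    while i != -1 and i < len(s) and s[i] in OPERATORS:   # E -> E Op E
        i = parse_term(s, i + 1)
    return i

def in_language(s: str) -> bool:
    """True if the whole string is derivable from E."""
    return parse_expr(s) == len(s)

print(in_language("((1+2)*3)"), in_language("((1+2)*3"))
```

The recursion depth plays the role of the stack: each open parenthesis must be closed by the matching call, which a plain regular expression cannot enforce for arbitrary nesting.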

Attack of the Grammars
• Context Free Grammars can be used to encode attack vectors.
• Assume we would like to inject code into the query:
- “SELECT * FROM users WHERE id=$id;”
• The valid sufﬁxes (injections) for this query can be encoded as a
CFG!

Cross checking regular expressions with grammars is easy!
Why should I care?
Context Free
Grammar G
Regular Expression F
vs
SQL Injections WAF Filter
Find an SQL Injection attack in the Grammar G
which is not rejected by the ﬁlter F
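A toy version of this cross-check is sketched below. The grammar is a tiny, non-recursive fragment written for this example, the filter is a made-up signature, and exhaustive enumeration stands in for the automata-theoretic check (intersecting G with the complement of F) that scales to real grammars.

```python
import re

# Hypothetical attack grammar G: suffixes that extend a WHERE clause.
GRAMMAR = {
    "S": [["1", "cond"]],
    "cond": [["or", "pred"], ["and", "pred"]],
    "pred": [["atom", "=", "atom"],
             ["atom", "like", "atom"],
             ["exists", "(", "select", "atom", ")"]],
    "atom": [["1"], ["a"]],
}

# Hypothetical WAF filter F: only knows about "or/and <x> =" patterns.
FILTER = re.compile(r"\b(or|and)\b\s+\w+\s*=", re.IGNORECASE)

def expand(symbols):
    """Yield every terminal token list derivable from `symbols`.
    Terminates because this toy grammar has no recursion."""
    if not symbols:
        yield []
        return
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        for production in GRAMMAR[head]:
            yield from expand(production + rest)
    else:
        for tail in expand(rest):
            yield [head] + tail

def find_bypasses():
    """Strings in G that the filter F does not reject."""
    attacks = [" ".join(tokens) for tokens in expand(["S"])]
    return [a for a in attacks if not FILTER.search(a)]

BYPASSES = find_bypasses()
print(len(BYPASSES), BYPASSES[:3])
```

Every string returned is a grammar-valid injection that the signature misses, which is exactly the object the slide asks for.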

However…
• In reality, we do not know the language parsed by most
implementations.
- MySQL is parsing a different SQL ﬂavor than MS-SQL.
- Browsers are deﬁnitely not parsing the HTML standard.
- WAFs are doing much more than a simple RE matching.

Learning to Parse
• Our Approach: Use Learning algorithms in order to infer the
speciﬁcations of parsers and WAFs.
- Cross check the inferred models for vulnerabilities.
• By using learning we can actively ﬁgure out important details of the
systems.

Learning Automata
• Active Learning algorithm.
- Instead of learning from a corpus of data, the learner queries the program
with inputs of its choice.
• Eventually a model is generated.
• Inconsistencies discovered in the model are used to refine it.

Learning Model
Learning
Algorithm
Parser P

Learning Model
Learning
Algorithm
Parser P
Membership Query

Learning Model
Learning
Algorithm
string s
Is s accepted by P?
Parser P
Membership Query

Learning Model
Learning
Algorithm
Model H
string s
Is s accepted by P?
Parser P
Membership Query

Learning Model
Learning
Algorithm
Model H
string s
Is s accepted by P?
Parser P

Learning Model
Learning
Algorithm
Model H
string s
Is s accepted by P?
Parser P
Equivalence Query

Learning Model
Learning
Algorithm
Model H
string s
Is s accepted by P?
Parser P
Equivalence
Oracle
Is H a correct model of P?
Yes, or provide counterexample.
Equivalence Query

Learning Model
Learning
Algorithm
Model H
string s
Is s accepted by P?
Parser P
Equivalence
Oracle
Is H a correct model of P?
Yes, or provide counterexample.
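The two query types can be sketched as plain functions. In a black-box setting no exact equivalence oracle exists, so this example approximates it by random testing; the target regex, the alphabet and all names are assumptions made for illustration.

```python
import random
import re

# Stand-in for the black-box parser P.
TARGET = re.compile(r".*<a.*", re.DOTALL)
ALPHABET = "<a"

def membership_query(s: str) -> bool:
    """Is s accepted by P?"""
    return TARGET.fullmatch(s) is not None

def equivalence_query(hypothesis, samples: int = 2000, max_len: int = 8, seed: int = 0):
    """Approximate equivalence oracle: sample random strings and return the
    first one on which the hypothesis H and P disagree, or None if none is found."""
    rng = random.Random(seed)
    for _ in range(samples):
        length = rng.randint(0, max_len)
        s = "".join(rng.choice(ALPHABET) for _ in range(length))
        if hypothesis(s) != membership_query(s):
            return s
    return None

def reject_all(s: str) -> bool:
    """A first, trivial model H that rejects everything."""
    return False

def exact_model(s: str) -> bool:
    """A model H that matches P exactly."""
    return "<a" in s

print(equivalence_query(reject_all) is not None)
print(equivalence_query(exact_model))
```

A returned string plays the role of the counterexample; None only means that no disagreement was found within the sampling budget, not that H is proven correct.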

Learning DFAs
• Angluin’s algorithm is an active learning algorithm for learning
DFAs.
• Learns the target DFA using a table data structure called the
observation table.
• Let’s use it to learn the regular expression (.*)<a(.*)
- Aggressive ﬁltering of anchor tags.

Learning DFAs
OT ε
ε
ε<
εa

Learning DFAs
OT ε
ε
ε<
εa
Empty string

Learning DFAs
OT ε
ε
ε<
εa
Strings for “testing”
states.
(Distinguishing strings)
Empty string

Learning DFAs
OT ε
ε
ε<
εa
Strings for “testing” states.
(Distinguishing strings)
Strings accessing different states in the target automaton.
(Access Strings)
Empty string

Learning DFAs
OT ε
ε
ε<
εa
Strings for “testing” states.
(Distinguishing strings)
Strings accessing different states in the target automaton.
(Access Strings)
Strings which transition from the above states.
Empty string

Learning DFAs
OT ε
ε
ε<
εa
Strings for “testing” states.
(Distinguishing strings)
Strings accessing different states in the target automaton.
(Access Strings)
Strings which transition from the above states.
An entry is filled by concatenating the row and column strings and recording the output of the automaton.
Empty string

Learning DFAs
OT ε
ε
ε<
εa
Model:
Target: < a
<,a
<
a
> ε < <a
q_0
q_0
trans.

Learning DFAs
OT ε
ε
ε<
εa
Model:
Target: < a
<,a
<
a
> ε < <a
OT ε
ε 0
ε< 0
εa 0
q_0
q_0
trans.

Learning DFAs
OT ε
ε
ε<
εa
Model:
Target: < a
<,a
<
a
> ε < <a
a,<
> ε
OT ε
ε 0
ε< 0
εa 0
q_0
q_0
trans.

Equivalence Query
< a
<,a
<
a
> ε < <a
a,<
> ε =?
Counterexample
analysis
CE: <aaa<<a
Add a new column with
character “a” in the OT.

Learning DFAs
OT ε a
ε 0
ε< 0
εa 0
Model:
Target: < a
<,a
<
a
> ε < <a
q_0
q_0
trans.

Learning DFAs
OT ε a
ε 0
ε< 0
εa 0
Model:
Target: < a
<,a
<
a
> ε < <a
OT ε a
ε 0 0
ε< 0 1
εa 0 0
q_0
q_0
trans.

Learning DFAs
OT ε a
ε 0
ε< 0
εa 0
Model:
Target: < a
<,a
<
a
> ε < <a
OT ε a
ε 0 0
ε< 0 1
εa 0 0
Must be a
new state
q_0
q_0
trans.

Learning DFAs
OT ε a
ε 0 0
ε< 0 1
εa 0 0
ε< 0 1
<a
<<
Model:
Target: < a
<,a
<
a
> ε < <a
q_0
q_1
q_0
Trans.
q_1
Trans.

Learning DFAs
OT ε a
ε 0 0
ε< 0 1
εa 0 0
ε< 0 1
<a
<<
Model:
Target: < a
<,a
<
a
> ε < <a
OT ε a
ε 0 0
ε< 0 1
εa 0 0
ε< 0 1
<a 1 1
<< 0 1
q_0
q_1
q_0
Trans.
q_1
Trans.

Learning DFAs
OT ε a
ε 0 0
ε< 0 1
εa 0 0
ε< 0 1
<a
<<
Model:
Target: < a
<,a
<
a
> ε < <a
OT ε a
ε 0 0
ε< 0 1
εa 0 0
ε< 0 1
<a 1 1
<< 0 1
Must be a
new state
q_0
q_1
q_0
Trans.
q_1
Trans.

Learning DFAs
OT ε a
ε 0 0
ε< 0 1
<a 1 1
εa 0 0
ε< 0 1
<a 1 1
<< 0 1
<aa
<a<
Model:
Target:
< a
<,a
<
a
> ε < <a
q_0
q_1
q_2
q_2
trans.
q_1
trans.
q_0
trans.

Learning DFAs
OT ε a
ε 0 0
ε< 0 1
<a 1 1
εa 0 0
ε< 0 1
<a 1 1
<< 0 1
<aa
<a<
Model:
Target:
< a
<,a
<
a
> ε < <a
OT ε a
ε 0 0
ε< 0 1
<a 1 1
εa 0 0
ε< 0 1
<a 1 1
<< 0 1
<aa 1 1
<a< 1 1
q_0
q_1
q_2
q_2
trans.
q_1
trans.
q_0
trans.

Learning DFAs
OT ε a
ε 0 0
ε< 0 1
<a 1 1
εa 0 0
ε< 0 1
<a 1 1
<< 0 1
<aa
<a<
Model:
Target:
< a
<,a
<
a
> ε < <a
OT ε a
ε 0 0
ε< 0 1
<a 1 1
εa 0 0
ε< 0 1
<a 1 1
<< 0 1
<aa 1 1
<a< 1 1
states
q_0
q_1
q_2
q_2
trans.
q_1
trans.
q_0
trans.

Learning DFAs
OT ε a
ε 0 0
ε< 0 1
<a 1 1
εa 0 0
ε< 0 1
<a 1 1
<< 0 1
<aa
<a<
Model:
Target:
< a
<,a
<
a
> ε < <a
OT ε a
ε 0 0
ε< 0 1
<a 1 1
εa 0 0
ε< 0 1
<a 1 1
<< 0 1
<aa 1 1
<a< 1 1
states
transitions
q_0
q_1
q_2
q_2
trans.
q_1
trans.
q_0
trans.

Learning DFAs
OT ε a
ε 0 0
ε< 0 1
<a 1 1
εa 0 0
ε< 0 1
<a 1 1
<< 0 1
<aa
<a<
Model:
Target:
< a
<,a
<
a
> ε < <a
< a
<,a
<
a
> ε < <a
OT ε a
ε 0 0
ε< 0 1
<a 1 1
εa 0 0
ε< 0 1
<a 1 1
<< 0 1
<aa 1 1
<a< 1 1
states
transitions
q_0
q_1
q_2
q_2
trans.
q_1
trans.
q_0
trans.
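The final observation table and the model read off from it can be recomputed with a short script. This is a sketch of the last step only (table filling and hypothesis construction), assuming the access strings and the distinguishing string "a" have already been found as in the walkthrough; the names are chosen for this example.

```python
from itertools import product

ALPHABET = "<a"
ACCESS = ["", "<", "<a"]   # upper rows: one access string per state
SUFFIXES = ["", "a"]       # columns: distinguishing strings

def member(s: str) -> bool:
    """Membership query against the target (.*)<a(.*)."""
    return "<a" in s

def row(prefix: str) -> tuple:
    """One table row: the target's answer for prefix + each column string."""
    return tuple(int(member(prefix + suffix)) for suffix in SUFFIXES)

def build_hypothesis():
    """Distinct upper rows become states; lower rows give the transitions."""
    state_of = {row(s): i for i, s in enumerate(ACCESS)}
    delta = {(i, c): state_of[row(s + c)]
             for i, s in enumerate(ACCESS) for c in ALPHABET}
    accepting = {i for i, s in enumerate(ACCESS) if row(s)[0] == 1}
    return delta, accepting

def run(delta, accepting, word: str) -> bool:
    state = 0
    for c in word:
        state = delta[(state, c)]
    return state in accepting

def hypothesis_is_exact(max_len: int) -> bool:
    """Check the learned model against the target on all short words."""
    delta, accepting = build_hypothesis()
    return all(run(delta, accepting, "".join(w)) == member("".join(w))
               for n in range(max_len + 1)
               for w in product(ALPHABET, repeat=n))

for prefix in ACCESS + [s + c for s in ACCESS for c in ALPHABET]:
    print(repr(prefix), row(prefix))
```

The printed rows match the table on the slide: (0, 0) for ε and a, (0, 1) for < and <<, and (1, 1) for every string that already contains <a.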

Learning DFAs
• This algorithm is inefﬁcient for large alphabets/automata.
• For just one PHPIDS Rule (id. 72):
• ((={s}*(top|this|window|content|self|frames|_content))|(/{s}*[gimx]*{s}*[)}])|([^ ]{s}*={s}*script)|(.{s}
*constructor)|(default{s}+xml{s}+namespace{s}*=)|(/{s}*+[^+]+{s}*+{s}*/))
- 72 states when represented as a DFA.
- The OT will have ~650k entries.
• We need a faster algorithm in order to check real systems!

Symbolic Finite Automata
✓ Efﬁcient modeling of large
alphabets.
✓ We designed a novel,
efﬁcient learning algorithm.
✓ Details in the whitepaper!

Bootstrapping Automata Learning
• Similar in concept to seed inputs in fuzzers.
- Provide sample inputs and the learning algorithm will
discover additional states in the parser.
• Utilize previously inferred models, speciﬁcations, etc.
• Seed inputs are guiding the learning algorithm.
• Details in the white paper!

Grammar Oriented Filter Auditing
(GOFA)

• Assume that we are given a grammar with attacks.
• How do we utilize it with the learning algorithm?
Main idea:
Use the grammar to drive the learning procedure.

…
select_exp: SELECT name
any_all_some: ANY | ALL
column_ref: name
parameter: name
Context Free
Grammar G
Learning
Algorithm

…
column_ref: name
parameter: name
Context Free
Grammar G
Step 1:
Learn a model of the WAF.
Learning
Algorithm

…
column_ref: name
parameter: name
Context Free
Grammar G
Step 1:
Learn a model of the WAF.
Learning
Algorithm
WAF
Model

Context Free
Grammar G
Learning
Algorithm
vs
Step 2:
Find a vulnerability in the model using the grammar.
WAF
Model

Context Free
Grammar G
Learning
Algorithm
vs
WAF
Model

Context Free
Grammar G
Learning
Algorithm
vs
Step 3:
Verify WAF vulnerability.
WAF
Model

Context Free
Grammar G
Learning
Algorithm
vs
Step 3:
Verify WAF vulnerability.
Candidate Bypass
WAF
Model

Context Free
Grammar G
Learning
Algorithm
vs
Candidate Bypass
WAF
Model

Context Free
Grammar G
Learning
Algorithm
vs
Candidate Bypass
Step 4:
or reﬁne model and repeat.
WAF
Model

Context Free
Grammar G
Learning
Algorithm
vs
Candidate Bypass
Step 4:
or reﬁne model and repeat.
counterexample (false positive)
WAF
Model

GOFA SQL Injections
• Grammar for extending search conditions:
select * from users where user = admin and email = $_GET[c]

GOFA SQL Injections
S: A main 
main: search_condition  
search_condition: OR predicate | AND predicate  
predicate: comparison_predicate | between_predicate | like_predicate | test_for_null | in_predicate
| all_or_any_predicate | existence_test 
comparison_predicate: scalar_exp comparison scalar_exp | scalar_exp COMPARISON subquery 
between_predicate: scalar_exp BETWEEN scalar_exp AND scalar_exp 
like_predicate: scalar_exp LIKE atom  
test_for_null: column_ref IS NULL 
in_predicate: scalar_exp IN ( subquery ) | scalar_exp IN ( atom )  
all_or_any_predicate: scalar_exp comparison any_all_some subquery 
existence_test: EXISTS subquery 
scalar_exp: scalar_exp op scalar_exp | atom | column_ref | ( scalar_exp )  
atom: parameter | intnum  
subquery: select_exp 
select_exp: SELECT name 
any_all_some: ANY | ALL | SOME 
column_ref: name 
parameter: name 
intnum: 1 
op: + | - | * | /  
comparison: = | < | >  
name: A

GOFA SQL Injections
• Authentication bypass using the vector: or exists (select 1)
Example:
select * from users where username = $_GET['u'] and password = $_GET['p'];
select * from users where username = admin and password = a or exists (select 1)
Affected: ModSecurity Latest CRS, PHPIDS, WebCastellum, Expose
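Why this vector bypasses authentication can be checked against any SQL engine. The sketch below uses an in-memory SQLite database rather than MySQL, with a made-up table, password and string-quoted values; only the injected suffix comes from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('admin', 's3cret')")

def login_rows(password_fragment: str):
    """Deliberately vulnerable: the fragment is concatenated into the query."""
    query = ("SELECT username FROM users "
             "WHERE username = 'admin' AND password = " + password_fragment)
    return conn.execute(query).fetchall()

# Wrong password, no injection: no row comes back.
print(login_rows("'a'"))

# Injected suffix: AND binds tighter than OR, so the WHERE clause becomes
# (username = 'admin' AND password = 'a') OR EXISTS (SELECT 1),
# which is true for every row.
print(login_rows("'a' or exists (select 1)"))
```

The payload contains no comparison of the "1=1" kind, which is the pattern many signatures look for.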

GOFA SQL Injections
• Authentication bypass using the vector: 1 or a = 1
1 or a like 1
Example:
select * from users where username = $_GET['u'] and password = $_GET['p'];
select * from users where username = admin and password = 1 or isAdmin like 1
Affected: ModSecurity Latest CRS, PHPIDS (only for statement with ‘like’),
WebCastellum, Expose

GOFA SQL Injections
• Columns/variables ﬁngerprinting using the vectors: and exists (select a)
a or a > any select a
Example:
select * from users where username = admin and id = $_GET['u'];
select * from users where username = admin and id = 1 and exists (select email)

GOFA SQL Injections
• Grammar for extending select queries:
select * from users where user = $_GET[c]

GOFA SQL Injections
S: A main 
main: query_exp 
query_exp: groupby_exp | order_exp | limit_exp | procedure_exp | into_exp | for_exp |
lock_exp | ; select_exp | union_exp | join_exp 
groupby_exp: GROUP BY column_ref ascdesc_exp 
order_exp: ORDER BY column_ref ascdesc_exp 
limit_exp: LIMIT intnum 
into_exp: INTO output_exp intnum 
procedure_exp: PROCEDURE name ( literal ) 
literal: string | intnum 
select_exp: SELECT name 
union_exp: UNION select_exp 
ascdesc_exp: ASC | DESC 
column_ref: name 
join_exp: JOIN name ON name 
for_exp: FOR UPDATE 
lock_exp: LOCK IN SHARE MODE 
output_exp: OUTFILE | DUMPFILE 
string: name 
intnum: 1 
name: A

GOFA SQL Injections
• Data retrieval bypass using the vector: 1 right join a on a = a
Example:
select * from articles left join authors on author.id=$_GET['id']
select * from articles left join authors on author.id= 1 right join users on author.id =
users.id
Affected: ModSecurity Latest CRS, WebCastellum

GOFA SQL Injections
• Columns/variables ﬁngerprinting using the vectors: a group by a asc
Example:
select * from users where username = $_GET['u'];
select * from users where username = admin group by email asc

GOFA SQL Injections
• Columns/variables ﬁngerprinting using the vectors: procedure a (a)
Example:
select * from users where username = $_GET['u'];
select * from users where username = admin procedure analyze()
Affected: libInjection

SFADiff: Learning Attack Vectors

SFADiff
• Available grammars are not always good for ﬁnding vulnerabilities.
• Most XSS bypasses result from attack vectors deviating from the HTML
standard.
- <IMG SRC="jav	ascript:alert('XSS');">
- Tons of other examples.
• Use the same learning approach to infer the HTML parser speciﬁcation!

SFADiff
WAF
Browser
Automata
Learner
Automata
Learner

SFADiff
WAF
Browser
WAF
model
HTML
Model
Automata
Learner
Automata
Learner

SFADiff
WAF
Browser
vs
WAF
model
HTML
Model
Automata
Learner
Automata
Learner

SFADiff
WAF
Browser
vs
WAF
model
HTML
Model
Automata
Learner
Automata
Learner
candidate bypasses
candidate bypasses

SFADiff
WAF
Browser
vs
WAF
model
HTML
Model
counterexamples
Automata
Learner
Automata
Learner
candidate bypasses
candidate bypasses

SFADiff
WAF
Browser
vs
WAF
model
HTML
Model
counterexamples
Bypasses
Automata
Learner
Automata
Learner
candidate bypasses
candidate bypasses


SFADiff XSS Bypass
• XSS Attack vectors in PHPIDS 0.7/ Expose 2.4.0
<p onmouseover=-a() ></p>
<p onmouseover=(a()) ></p>
<p onmouseover=;a() ></p>
<p onmouseover=!a() ></p>
• Other types of events can also be used for the attack (e.g. "onClick").
• Rules 71, 27, 2 and 65 are related to this insufﬁcient pattern match.

Generating Program Fingerprints
P_T
P_1 P_2 P_N…

P_T
P_1 P_2 P_N…
Which program is running in
the Black-box?

P_T
SFADiff
P_1 P_2 P_N…

P_T
SFADiff
P_1 P_2 P_N…
Input causing difference in P_1, P_2

P_T
SFADiff
P_1 P_2 P_N…
P_i
Input causing difference in P_1, P_2

P_T
SFADiff
P_1 P_2 P_N…
P_i

P_T
SFADiff
SFADiff
P_1 P_2 P_N…
P_i

P_T
SFADiff
SFADiff
P_1 P_2 P_N…
P_i
Input causing difference

P_T
SFADiff
SFADiff
P_1 P_2 P_N…
P_i
P_j

P_T
SFADiff
SFADiff
SFADiff
P_1 P_2 P_N…
P_i
P_j

P_T
SFADiff
SFADiff
SFADiff
P_1 P_2 P_N…
P_i
P_j
P_T
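The elimination procedure in these slides can be sketched as a loop over candidate programs. Everything concrete here is hypothetical: the three filters stand in for known WAF versions P_1..P_N, and a fixed probe list stands in for SFADiff, which would learn models and derive the distinguishing input from them.

```python
import re

# Hypothetical known programs (True means "input is blocked").
KNOWN = {
    "filter_A": lambda s: re.search(r"<script", s) is not None,
    "filter_B": lambda s: re.search(r"<script", s, re.IGNORECASE) is not None,
    "filter_C": lambda s: re.search(r"(?i)<script|onerror", s) is not None,
}
PROBES = ["<script>", "<SCRIPT>", "onerror=", "hello"]

def distinguishing_input(p, q):
    """Stand-in for SFADiff: an input on which p and q disagree, or None."""
    for s in PROBES:
        if p(s) != q(s):
            return s
    return None

def identify(target):
    """Name of the known program that behaves like the black box, or None."""
    candidates = dict(KNOWN)
    while len(candidates) > 1:
        (_, p1), (n2, p2) = list(candidates.items())[:2]
        probe = distinguishing_input(p1, p2)
        if probe is None:
            # The two cannot be told apart with our probes; keep one of them.
            del candidates[n2]
            continue
        verdict = target(probe)  # a single query to the black box P_T
        # Keep only the candidates that agree with the black box on this input.
        candidates = {n: p for n, p in candidates.items() if p(probe) == verdict}
    return next(iter(candidates), None)

print(identify(KNOWN["filter_C"]))
```

Each round costs one query to the black box and removes at least one candidate, so the number of queries is bounded by the number of known programs.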

“etc/<”
“:%0o”
“:/B”
“%23%0A”
“;”
WebCastellum 1.8.4
“etc/,#”
PHPIDS 0.6.5
“:et#”
PHPIDS 0.5.0
PHPIDS 0.6.4
ModSecurity 2.9.1
PHPIDS 0.6.3
Expose 2.4.0
PHPIDS 0.4.0
✘
✔
✔
✘
✘
✔
✔
✘
✔
✔
✘
✘
✘
✔

Modular Design
• Core Modules:
• Use automata models and operations
• Extend the SFA learning algorithm
• Built-in Query Handlers:
• Perform membership queries
• Modules (and Built-in Modules):
• Use the Built-in Query Handlers
• Extend the Core Modules: GOFA, SFADiff
• Library:
• Set of grammars, filters, fingerprints trees and configurations

Core Modules
• Extend SFA Learning algorithm:
• Accept the Alphabet, a Seed and/or a Tests ﬁle and a Query handler.
• Initialise learning and manage results and models
• The Alphabet: Set of characters to be used
• The Seed File: Knowledge of what the examined inputs should look like
• The Tests File: Knowledge of specialised attacks
• The Query Handler/Function: Knowledge of how to perform queries for selected
inputs

Core Modules
• GOFA:
• Grammar Oriented Filter Auditing.
• SFADiff:
• A black-box differential testing framework based on Symbolic
Finite Automata (SFA) learning.
Simple Structure: Class with ﬁve (5) basic functions:
setup(), learn(), query(), getresults(), stats()

Built-in Query Handlers
• HTTP Request Handler:
• Perform queries on WAF ﬁlters and Sanitizers
• SQL Query Handler:
• Perform queries on MySQL Parser
• Browser Parser Handler:
• Perform queries on Browser JavaScript Parsers
• Browser Filter Handler:
• Perform queries on Browser Anti-XSS Filters

HTTP Request Handler
• Targets WAF Filter
• Requires URL, HTTP Request Type, Parameter and Block
or Bypass Signature
Core Module
GOFA
Initialize
WAF
HTTP
Request Handler
MODULE
HTTP
GET /?parameter=Payload
Block/Bypass Signature
Query
True/False
HTTP
Protocol
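A membership query over HTTP can be sketched with the standard library alone. This is not LightBulb's handler; the function names, the parameter handling and the convention that True means "not blocked" are assumptions made for this example.

```python
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen

def build_query_url(base_url: str, parameter: str, payload: str) -> str:
    """Build GET /?parameter=Payload with the payload URL-encoded."""
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode({parameter: payload})

def is_blocked(body: str, block_signature: str) -> bool:
    """The WAF's block response is recognised by a signature string."""
    return block_signature in body

def membership_query(base_url: str, parameter: str, payload: str,
                     block_signature: str, timeout: float = 10.0) -> bool:
    """Return True if the payload passes the WAF, False if it is blocked."""
    url = build_query_url(base_url, parameter, payload)
    try:
        with urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except HTTPError as err:
        # A 403 or similar block page still carries a body worth inspecting.
        body = err.read().decode("utf-8", errors="replace")
    return not is_blocked(body, block_signature)

print(build_query_url("http://waf.example/", "q", "<a b>"))
```

Only run such queries against systems you are authorised to test; the learner issues a large number of them.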

MySQL Query Handler
Core Module
GOFA
Initialize
MySQL
Database
SQL Handler
MODULE
Preﬁx Query + Payload
Result or Empty
Query
True/False
MySQL DB
Driver
• Targets MySQL Database Parser
• Requires Database Credentials
• Requires Prefix Query: e.g., “SELECT a FROM a WHERE a=**”

Browser Parser Handler
Core Module
GOFA
Initialize
Browser
Handler
MODULE
Query
True/False
HTTP Protocol and WebSockets
• Targets the Browser HTML and JavaScript Parsing Engine
• Requires web sockets port, web browser port, host and trigger delay
• Inputs must trigger function a() (e.g., <script>a();</script>)
WEB
BROWSER
Web Server
Web Socket
Server
HTTP GET
Payload
Payload
True/False
True/False
HTML Page

Browser Filter Handler
Core Module
GOFA
Initialize
Browser
Filter
Handler
MODULE
Query
True/False
HTTP Protocol, WebSockets &
Cross Origin Message Events
• Targets the Browser Anti-XSS Filter, HTML and JavaScript Parsing Engine
Web Server
Web Socket
Server
Payload
True/False
HTTP GET /?parameter=Payload
Payload
True/False
IFRAME
HTTP
GET
WEB
BROWSER
Query
HTML Page
True/False
LoadQuery
HTML Page

Using GOFA module and HTTP Handler

use HTTPHandler as my_query_handler
define URL http://83.212.105.5/PHPIDS07/
define BLOCK impact
back

Query Handler was created.
We can now perform membership requests.

use GOFA as my_gofa
define TESTS_FILE {library}/regex/PHPIDS070/12.y
define HANDLER my_query_handler
back

Algorithm was selected and populated.
Now we can learn application states.

start my_gofa

Built-in Modules
• WAF Fingerprints Tree Generator:
• Automatically generates a fingerprints tree for a set of WAFs
• WAF Distinguisher:
• Identifies a WAF using a set of fingerprints trees
• Model Operations:
• Perform automata operations on stored models, input filters and
grammars
• Browser and WAF Differential Testing:
• Queries both Browser and WAF using a predefined set of strings

Built-in Rulesets Library
• Regular Expressions
• Set of WAF filters, and attack models in the form of regular
expressions
• Grammars:
• Set of grammars that can be used for the GOFA algorithm.
• Fingerprints Trees:
• Set of fingerprints trees for a predefined number of WAFs.
• Configurations:
• Sample configurations for WAF fingerprints tree generation

Grab LightBulb:
https://github.com/lightbulb-framework/

Future Work
• Currently building many optimizations.
- Learning will be much faster in the next months.
- Cross checking models is also getting better.
• Incorporate fuzzers to improve models.
• New ideas?

Conclusions
• Current state of WAFs is still (very) ugly.
- Plenty of low-hanging fruit.
• Our vision is to enforce a standard for such products.
- WAFs must effectively defend against inferred language speciﬁcations.
- Learning can run continuously with the assistance of fuzzers.
• We have a similar line of work on sanitizers.
