Risk-Based Attack Surface Approximation: How Much Data is Enough? [ICSE - SEIP 2017]

Risk-Based Attack Surface Approximation:
How Much Data is Enough?
Chris Theisen, Brendan Murphy, Kim Herzig, Laurie Williams
North Carolina State University
Microsoft Research

Introduction
What is the “Attack Surface”? Quoting the Open Web Application
Security Project…
• All paths for data and commands in a software system
• The data that travels these paths
• The code that implements and protects both
Concept used for security effort prioritization.

Crashes represent activity that puts the system under
stress.
Stack Traces tell us what happened.
foo!foobarDeviceQueueRequest+0x68
foo!fooDeviceSetup+0x72
foo!fooAllDone+0xA8
bar!barDeviceQueueRequest+0xB6
bar!barDeviceSetup+0x08
bar!barAllDone+0xFF
center!processAction+0x1034
center!dontDoAnything+0x1030
Risk-Based Attack Surface Approximation
(RASA)
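A minimal sketch of how frames like the ones above could be split into their parts. The `binary!function+0xOFFSET` pattern is inferred from the example trace only; `parse_frame` and `binaries_on_trace` are hypothetical helper names, not part of any RASA tooling described in the talk.

```python
import re
from collections import Counter

# Frame layout assumed from the slide's example: binary!function+0xOFFSET
FRAME_RE = re.compile(
    r"^(?P<binary>[^!\s]+)!(?P<function>[^+\s]+)\+0x(?P<offset>[0-9A-Fa-f]+)$"
)


def parse_frame(line):
    """Split one stack frame into (binary, function, offset), or return None."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        return None
    return match["binary"], match["function"], int(match["offset"], 16)


def binaries_on_trace(trace_text):
    """Count how many frames of a trace belong to each binary."""
    counts = Counter()
    for line in trace_text.splitlines():
        frame = parse_frame(line)
        if frame is not None:
            counts[frame[0]] += 1
    return counts


EXAMPLE_TRACE = """\
foo!foobarDeviceQueueRequest+0x68
foo!fooDeviceSetup+0x72
foo!fooAllDone+0xA8
bar!barDeviceQueueRequest+0xB6
bar!barDeviceSetup+0x08
bar!barAllDone+0xFF
center!processAction+0x1034
center!dontDoAnything+0x1030
"""

if __name__ == "__main__":
    # Any binary that shows up on a crash trace is a candidate for the
    # approximated attack surface.
    print(binaries_on_trace(EXAMPLE_TRACE))
```

Real crash dumps usually carry more fields per frame, so the regular expression would need adjusting for an actual data source.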

Previously…
• Previous RASA study used tens of millions of crashes.
• Previous study was per binary.
[SEIP ’15] Crashes
  %binaries          48.4%
  %vulnerabilities   94.6%
[SEIP ’15] Chris Theisen, Kim Herzig, Pat Morrison, Brendan Murphy, and Laurie Williams, “Approximating Attack Surfaces with Stack Traces”, in Companion Proceedings of the 37th International Conference on Software Engineering (2015).

Great! All done, right?

Practitioner Problems
• Practitioners had some issues with it…
– “Binary prioritization isn’t actionable.”
– “We don’t have that much data!”
– “We don’t store every crash we received, we don’t
see the value in that.”
– “We don’t have historical vulnerabilities to use as a
goodness measure.”

Research Questions
• RQ1: Can the RASA approach be implemented at the
source code file level with actionable results?
• RQ2: How does random sampling of crash dump stack
traces affect RASA?

Data Sources
• Mozilla Firefox
– ~1M crashes
– Vulnerability data from Mozilla Security
Blog and bug tracker
• Windows 8.1
– ~9M crashes
– Vulnerability data from internal data
sources

Methodology - RASA

Methodology - Sampling
10% of…
20% of…
• Sample at each “level”
• Record the standard deviation of the files and
vulnerabilities covered
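The sampling procedure above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: each crash is represented as a list of source files on its stack trace, and the function names, the number of repeats per level, and the toy data are all hypothetical.

```python
import random
import statistics


def coverage(crashes, all_files, vulnerable_files):
    """Return (% of all files, % of vulnerable files) seen on the given crashes."""
    seen = set()
    for files_on_trace in crashes:
        seen.update(files_on_trace)
    pct_files = 100.0 * len(seen & all_files) / len(all_files)
    pct_vulns = 100.0 * len(seen & vulnerable_files) / len(vulnerable_files)
    return pct_files, pct_vulns


def sample_levels(crashes, all_files, vulnerable_files,
                  levels=(0.1, 0.2), repeats=10, seed=0):
    """Draw `repeats` random samples per level and summarize their coverage.

    `repeats` must be at least 2 so a standard deviation can be computed.
    """
    rng = random.Random(seed)
    summary = {}
    for level in levels:
        size = max(1, round(level * len(crashes)))
        runs = [coverage(rng.sample(crashes, size), all_files, vulnerable_files)
                for _ in range(repeats)]
        file_pcts = [run[0] for run in runs]
        vuln_pcts = [run[1] for run in runs]
        summary[level] = {
            "files_mean": statistics.mean(file_pcts),
            "files_stdev": statistics.stdev(file_pcts),
            "vulns_mean": statistics.mean(vuln_pcts),
            "vulns_stdev": statistics.stdev(vuln_pcts),
        }
    return summary


if __name__ == "__main__":
    # Toy data only; the real studies used crash data from Firefox and Windows.
    files = {"a.cpp", "b.cpp", "c.cpp", "d.cpp"}
    vulnerable = {"a.cpp", "b.cpp"}
    crash_list = [["a.cpp"], ["a.cpp", "c.cpp"], ["b.cpp"], ["a.cpp"]]
    for level, stats in sample_levels(crash_list, files, vulnerable,
                                      levels=(0.5, 1.0), repeats=5).items():
        print(level, stats)
```

A small standard deviation across repeats at a given level would suggest that the particular crashes drawn matter little at that sample size.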

[Figure: Files and Vulnerabilities covered vs. Random Sample Size; axis ticks run 12% to 17% and 70% to 75%.]

[Figure: Files and Vulnerabilities covered vs. Random Sample Size; axis ticks run 10% to 26% and 30% to 46%.]

Why Does Sampling Work?
• Crashes tend not to happen in isolation.
– If something crashes once, it will likely crash again.
• For Firefox, only 6 vulnerable files in the data set
appeared in just one crash.
– That is 6 out of ~300 vulnerable files and 50,000 total files.
• If foo.cpp crashes many times, random sampling is unlikely
to remove every occurrence of foo.cpp from the dataset.
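The intuition in the last bullet can be made concrete with a small calculation. The sketch below assumes a simplified model that is not stated in the talk: each crash is kept independently with probability equal to the sample rate, so a file that appears in k crashes stays in the sample with probability 1 - (1 - rate)^k. The function name is hypothetical.

```python
def survival_probability(crash_count, sample_rate):
    """Probability that at least one of a file's crashes is kept.

    Assumes each crash is kept independently with probability `sample_rate`,
    which approximates drawing a fixed-size random sample from a large data set.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    return 1.0 - (1.0 - sample_rate) ** crash_count


if __name__ == "__main__":
    # Compare a file seen in a single crash with files seen many times.
    for crash_count in (1, 10, 50):
        print(crash_count, round(survival_probability(crash_count, 0.10), 3))
```

Under this model a file seen in a single crash is kept only as often as the sample rate itself, while a file seen in dozens of crashes is almost always kept, which is consistent with so few vulnerable files depending on a single crash.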

Future Work
• We have a list of vulnerable files; now what?
– Further prioritization to assist developers.
• We’re looking at:
– How the attack surface changes over time.
– How the complexity of the attack surface predicts
vulnerabilities.
– How proximity to the boundary of a software
system predicts vulnerabilities.

Conclusions
• “Binary prioritization isn’t actionable.”
– RASA can prioritize security effort effectively at the
source code file level.
• “We don’t have that much data!”
– Orders of magnitude less data required compared
to previous studies.
• “We don’t store every crash we received, we don’t see
the value in that.”
– A naïve approach like random sampling still works.
• “We don’t have historical vulnerabilities to use as a
goodness measure.”
– RASA met the earlier complaints using less data and naïve
sampling, which is evidence that it will work on new systems.

crtheise@ncsu.edu
@theisencr
theisencr.github.io
Expected Graduation: May 2018
Data Science, Security Analytics,
Security Education
