Risk-Based Attack Surface Approximation:
How Much Data is Enough?
Chris Theisen, Brendan Murphy, Kim Herzig, Laurie Williams
North Carolina State University
Microsoft Research
Risk-Based Attack Surface Approximation: How Much Data is Enough? [ICSE - SEIP 2017]
Introduction
What is the “Attack Surface”? Quoting the Open Web Application
Security Project…
• All paths for data and commands in a software system
• The data that travels these paths
• The code that implements and protects both
Concept used for security effort prioritization.
Introduction | Background | Methodology | Results | Conclusion
Crashes represent activity that puts the system under stress.
Stack Traces tell us what happened.
foo!foobarDeviceQueueRequest+0x68
foo!fooDeviceSetup+0x72
foo!fooAllDone+0xA8
bar!barDeviceQueueRequest+0xB6
bar!barDeviceSetup+0x08
bar!barAllDone+0xFF
center!processAction+0x1034
center!dontDoAnything+0x1030
Risk-Based Attack Surface Approximation
(RASA)
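As a rough illustration of how frames like the ones above could be turned into structured data, here is a minimal Python sketch. It assumes every frame has the shape `binary!function+0xOFFSET` shown on the slide; the `Frame` type, the regular expression, and the helper names are hypothetical and are not taken from the authors' tooling.

```python
import re
from typing import List, NamedTuple, Optional


class Frame(NamedTuple):
    binary: str
    function: str
    offset: int


# Hypothetical pattern for frames shaped like "binary!function+0xOFFSET".
FRAME_RE = re.compile(
    r"^(?P<binary>[^!\s]+)!(?P<function>[^+\s]+)\+0x(?P<offset>[0-9A-Fa-f]+)$"
)


def parse_frame(line: str) -> Optional[Frame]:
    """Parse one stack frame; return None if the line does not match."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        return None
    return Frame(match["binary"], match["function"], int(match["offset"], 16))


def parse_trace(text: str) -> List[Frame]:
    """Parse every recognizable frame in a multi-line stack trace."""
    frames = []
    for line in text.splitlines():
        frame = parse_frame(line)
        if frame is not None:
            frames.append(frame)
    return frames


# The example trace from the slide.
TRACE = """\
foo!foobarDeviceQueueRequest+0x68
foo!fooDeviceSetup+0x72
foo!fooAllDone+0xA8
bar!barDeviceQueueRequest+0xB6
bar!barDeviceSetup+0x08
bar!barAllDone+0xFF
center!processAction+0x1034
center!dontDoAnything+0x1030
"""

frames = parse_trace(TRACE)
# Binaries that appear anywhere in the trace.
binaries = sorted({frame.binary for frame in frames})
```

Real crash dumps usually carry more fields per frame, so a production parser would likely need a more forgiving pattern than this one.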
Previously…
• Previous RASA study used tens of millions of crashes.
• Previous study was per binary.
[SEIP ’15]          Crashes
%binaries           48.4%
%vulnerabilities    94.6%
[SEIP ’15] Chris Theisen, Kim Herzig, Pat Morrison, Brendan Murphy, and Laurie Williams, “Approximating Attack Surfaces with Stack Traces”, in Companion Proceedings of the 37th International Conference on Software Engineering (2015).
Great! All done, right?
Practitioner Problems
• Previous RASA study used tens of millions of crashes.
• Previous study was per binary.
• Practitioners had some issues with it…
– “Binary prioritization isn’t actionable.”
– “We don’t have that much data!”
– “We don’t store every crash we received, we don’t
see the value in that.”
– “We don’t have historical vulnerabilities to use as a
goodness measure.”
Research Questions
• RQ1: Can the RASA approach be implemented at the
source code file level with actionable results?
• RQ2: How does random sampling of crash dump stack
traces affect RASA?
Data Sources
• Mozilla Firefox
– ~1M crashes
– Vulnerability data from Mozilla Security
Blog and bug tracker
• Windows 8.1
– ~9M crashes
– Vulnerability data from internal data
sources
Methodology - RASA
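The RASA steps (mine files from crash stack traces, map them to files in source control, and treat the matched set as the approximated attack surface) can be sketched in Python as below. This is only an illustration under stated assumptions: each crash is already reduced to a list of file names, vulnerabilities are approximated by a set of vulnerable files, and all file names and data are made up.

```python
from typing import Iterable, Set, Tuple


def attack_surface(crashes: Iterable[Iterable[str]], source_files: Set[str]) -> Set[str]:
    """Files that appear in at least one crash and also exist in source control."""
    surface: Set[str] = set()
    for files_in_crash in crashes:
        surface.update(name for name in files_in_crash if name in source_files)
    return surface


def coverage(
    surface: Set[str], source_files: Set[str], vulnerable_files: Set[str]
) -> Tuple[float, float]:
    """Return (% of files on the surface, % of vulnerable files on the surface)."""
    pct_files = 100.0 * len(surface & source_files) / len(source_files)
    pct_vulns = 100.0 * len(surface & vulnerable_files) / len(vulnerable_files)
    return pct_files, pct_vulns


# Made-up example data, not taken from the study.
SOURCE_FILES = {"a.cpp", "b.cpp", "c.cpp", "d.cpp"}
VULNERABLE_FILES = {"a.cpp", "d.cpp"}
CRASHES = [["a.cpp", "b.cpp"], ["a.cpp"], ["x.cpp"]]  # x.cpp is not in source control

surface = attack_surface(CRASHES, SOURCE_FILES)
pct_files, pct_vulns = coverage(surface, SOURCE_FILES, VULNERABLE_FILES)
```

The two percentages correspond to the "files" and "vulnerabilities" measures used in the results, with the caveat that this sketch counts vulnerable files rather than individual vulnerabilities.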
Methodology - Sampling
10% of…
20% of…
• Sample at each “level”
• Record standard deviation of files and vulnerabilities covered
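The sampling procedure above can be sketched in Python as follows. This is a minimal illustration, not the study's code: the sampling levels, the number of repeats per level, the seed, and the crash data are all assumptions chosen for the example, and vulnerabilities are again approximated by a set of vulnerable files.

```python
import random
import statistics
from typing import Dict, List, Sequence, Set


def surface_of(crashes: Sequence[Sequence[str]]) -> Set[str]:
    """Union of all files seen in the given crashes."""
    return {name for crash in crashes for name in crash}


def sample_levels(
    crashes: List[List[str]],
    vulnerable_files: Set[str],
    levels: Sequence[float] = (0.1, 0.2),
    repeats: int = 5,
    seed: int = 0,
) -> Dict[float, Dict[str, float]]:
    """Draw `repeats` random samples per level; record mean and stdev of coverage."""
    rng = random.Random(seed)
    results: Dict[float, Dict[str, float]] = {}
    for level in levels:
        size = max(1, round(level * len(crashes)))
        files_seen, vulns_seen = [], []
        for _ in range(repeats):
            surface = surface_of(rng.sample(crashes, size))
            files_seen.append(len(surface))
            vulns_seen.append(len(surface & vulnerable_files))
        results[level] = {
            "files_mean": statistics.mean(files_seen),
            "files_stdev": statistics.stdev(files_seen),  # needs repeats >= 2
            "vulns_mean": statistics.mean(vulns_seen),
            "vulns_stdev": statistics.stdev(vulns_seen),
        }
    return results


# Made-up data: 40 crashes; foo.cpp crashes often, rare.cpp only once.
CRASHES = [["foo.cpp", "bar.cpp"]] * 30 + [["baz.cpp"]] * 9 + [["rare.cpp"]]
VULNERABLE = {"foo.cpp", "rare.cpp"}

if __name__ == "__main__":
    for level, stats in sample_levels(CRASHES, VULNERABLE).items():
        print(level, stats)
```

A small standard deviation at a given level suggests that the approximated attack surface is stable across different random samples of that size.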
[Chart: Files and Vulnerabilities (percent) versus Random Sample Size. Axis ticks recovered from the slide: 12% to 17% and 70% to 75%.]
[Chart: Files and Vulnerabilities (percent) versus Random Sample Size. Axis ticks recovered from the slide: 10% to 26% and 30% to 46%.]
Why Does Sampling Work?
• Crashes tend not to happen in isolation.
– If something crashes once, it will likely crash again.
• For Firefox, only 6 files in the data set with a vulnerability
had only one crash occurrence.
– Against ~300 vulnerable files, 50,000 total files
• If foo.cpp crashes many times, random sampling is unlikely
to remove every foo.cpp crash from the dataset.
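The intuition above can be made concrete with a small calculation. As a simplifying assumption (not stated on the slide), suppose each crash is kept independently with probability equal to the sampling rate; then a file with k crashes disappears from the sample with probability (1 - rate)^k. The function name below is hypothetical.

```python
def prob_file_dropped(crash_count: int, sample_rate: float) -> float:
    """Probability that none of a file's crashes survive sampling.

    Assumes each crash is kept independently with probability `sample_rate`,
    which approximates sampling without replacement on a large dataset.
    """
    if crash_count < 0 or not 0.0 <= sample_rate <= 1.0:
        raise ValueError("crash_count must be >= 0 and sample_rate in [0, 1]")
    return (1.0 - sample_rate) ** crash_count


if __name__ == "__main__":
    # At a 10% sample, the drop probability shrinks quickly as crashes repeat.
    for crashes in (1, 5, 20, 100):
        print(crashes, prob_file_dropped(crashes, 0.10))
```

Under this assumption, only files with very few crash occurrences are at real risk of being lost, which is consistent with the small number of single-crash vulnerable files reported for Firefox.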
Future Work
• We have a list of vulnerable files; now what?
– Further prioritization to assist developers.
• We’re looking at:
– How the attack surface changes over time.
– How the complexity of the attack surface predicts
vulnerabilities.
– How proximity to the boundary of a software
system predicts vulnerabilities.
Conclusions
• “Binary prioritization isn’t actionable.”
– RASA can prioritize security effort effectively at the
source code file level.
• “We don’t have that much data!”
– Orders of magnitude less data required compared
to previous studies.
• “We don’t store every crash we received, we don’t see
the value in that.”
– A naïve approach like random sampling still works.
• “We don’t have historical vulnerabilities to use as a
goodness measure.”
– Satisfied previous complaints with less data, naïve
sampling; evidence it will work on new systems.
crtheise@ncsu.edu
@theisencr
theisencr.github.io
Expected Graduation: May 2018
Data Science, Security Analytics,
Security Education

Editor's Notes

  • #3: In a 2010 paper, Tom Zimmermann compared finding vulnerabilities in code to finding “a needle in a haystack.” Based on my experience as a security engineer, it can be more like finding a needle in a field of them! It is the most difficult part of my job.
  • #4: One prioritization technique is to identify the attack surface. Commonly used for prioritization of effort, but typically based on the common knowledge of the team; imperfect.
  • #5: We developed an approach called Risk-Based Attack Surface Approximation, or RASA… The hypothesis was that code that crashes shares properties with vulnerable code. Attempting to crash systems is one of the primary tools in forensics/red-team activity; we’re reverse-engineering what attackers do.
  • #8: So we evangelized this approach at industry days at NCSU, tech talks, etc. Got great feedback on making RASA actionable. I want to highlight a few of the words from the previous slide.
  • #12: When we followed up with, “well, just run it and see!” the response we got was… So we need to run studies at a lower level of granularity, with a lot less data, while still having historical vulnerabilities to compare against.
  • #15: Mine out individual code artifacts (files, in this case), from the stack traces in crash dumps.
  • #16: Map code mined from each crash to code in source control; use binary/function mapping for files, if files aren’t in the crash.
  • #17: The resultant pairing between code in crashes and code in source control is our approximation of the attack surface. Simple by design; not language limited, works in multiple domains, just need crashes with stack traces. Highly flexible!
  • #18: To limit data, we do random sampling of crashes placed into the process at the first step. 10% of crashes, 20% of crashes, et cetera.
  • #20: Do multiple samples at each “level” (10%, 20%, etc.); also look at how the attack surface changes from sample to sample at each sampling level. Trying to answer: does our approximation change greatly with different samplings of crash dump stack traces?
  • #21: Chart of percentage of files on the attack surface vs. vulnerabilities covered by the attack surface. Vulnerabilities are 5 times as likely to be in code that crashes as in code that does not! Great place to start to run other tools, like static analysis, vulnerability prediction models, etc. Limits the work you need to do. Tiny variations: we can use a quarter of the data available and our metrics change by less than a percentage point, with no change between samples.
  • #22: Comparison against previous study; we see a similar 2:1 ratio from vulnerabilities to files, so vulnerabilities are twice as dense in crashing code across all samples.
  • #24: Shorten or identify keywords
  • #31: Make a chart of this data