Binary Analysis for Botnet
Reverse Engineering & Defense
Dawn Song
UC Berkeley
Binary Analysis Is Important for Botnet Defense
• Botnet programs: no source code, only binary
• Botnet defense needs internal understanding of
botnet programs
– C&C reverse engineering
• Different possible commands, encryption/decryption
– Botnet traffic rewriting
– Botnet infiltration
– Botnet vulnerability discovery
BitBlaze Binary Analysis Infrastructure: Architecture
• The first infrastructure:
– Novel fusion of static, dynamic, formal analysis methods
• Loop extended symbolic execution
• Grammar-aware symbolic execution
– Whole system analysis (including OS kernel)
– Analyzing packed/encrypted/obfuscated code
• Components of the BitBlaze Binary Analysis Infrastructure:
– Vine: Static Analysis Component
– TEMU: Dynamic Analysis Component
– Rudder: Mixed Execution Component
BitBlaze: Security Solutions via Program Binary Analysis
• Unified platform to accurately analyze security properties of binaries
• Security evaluation & audit of third-party code
• Defense against morphing threats
• Faster & deeper analysis of malware
[Figure: BitBlaze Binary Analysis Infrastructure, with Dissecting Malware, Detecting Vulnerabilities, Generating Filters]
The BitBlaze Approach & Research Foci
• Semantics based, focus on root cause: automatically extracting security-related properties from binary code for effective vulnerability detection & defense
1. Build a unified binary analysis platform for security
– Identify & cater to common needs of different security applications
– Leverage recent advances in program analysis, formal methods, and binary instrumentation/analysis techniques for new capabilities
2. Solve real-world security problems via binary analysis
– Extracting security-related models for vulnerability detection
– Generating vulnerability signatures to filter out exploits
– Dissecting malware for real-time diagnosis & offense, e.g., botnet infiltration
– More than a dozen security applications & publications
Plans
• Building on BitBlaze to develop new techniques
• Automatic reverse engineering of botnet C&C protocols
• Automatic rewriting of botnet traffic to facilitate botnet infiltration
• Vulnerability discovery in botnets
Preliminary Work
• Dispatcher: Enabling Active Botnet Infiltration
using Automatic Protocol Reverse-Engineering
• Binary code extraction and interface identification
for botnet traffic rewriting
• Botnet analysis for vulnerability discovery
Dispatcher: Enabling Active Botnet
Infiltration using Automatic Protocol
Reverse-Engineering
Juan Caballero
Pongsin Poosankam
Christian Kreibich
Dawn Song
Automatic Protocol Reverse-Engineering
• Process of extracting the application-level protocol
used by a program, without the specification
– Automatic process
– Many undocumented protocols (C&C, Skype, Yahoo)
• Encompasses extracting:
1. the Protocol Grammar
2. the Protocol State Machine
• Message format extraction is prerequisite
Challenges for Active Botnet Infiltration
1. Understand both sides of C&C protocol
– Message structure
– Field semantics
2. Access to one side of dialog only
3. Handle encryption/obfuscation
• Goal: Rewrite C&C messages on either dialog side
Technical Contributions
1. Buffer deconstruction, a technique to extract the format of sent messages
– Earlier work only handles received messages
2. Field semantics inference techniques, for
messages sent and received
3. Designing and developing Dispatcher
4. Extending a technique to handle encryption
5. Rewriting a botnet dialog using information
extracted by Dispatcher
Message Format Extraction
• Extract format of a single message
• Required by Grammar and State Machine extraction
GET / HTTP/1.1 [Polyglot]
HTTP/1.1 200 OK [Dispatcher]
Message Field Tree
Message: HTTP/1.1 200 OK\r\n\r\n
MSG [0:18]
– Status Line [0:16]
    • Version [0:7]
    • Delimiter [8:8]
    • Status-Code [9:11]
    • Delimiter [12:12]
    • Reason [13:14]
    • Delimiter [15:16]
– Delimiter [17:18]
Example field attributes:
– Field Range: [3:3]
– Field Boundary: Fixed
– Field Semantics: Delimiter
– Field Keywords: <none>
– Target: Version
Message format extraction has 2 steps:
1. Extract tree structure
2. Extract field attributes
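The message field tree for the HTTP example can be written down as a small data structure. The following is only an illustrative sketch: the class name FieldNode, the helper leaves, and the attribute names are assumptions made for this example, not part of Dispatcher. Step 1 builds the tree with ranges only, and step 2 fills in attributes afterwards.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FieldNode:
    """One node of a message field tree (hypothetical representation)."""
    name: str
    start: int                       # first byte offset, inclusive
    end: int                         # last byte offset, inclusive
    boundary: Optional[str] = None   # attribute filled in during step 2
    semantics: Optional[str] = None  # attribute filled in during step 2
    children: List["FieldNode"] = field(default_factory=list)

    def bytes_of(self, msg: bytes) -> bytes:
        """Return the bytes of the message covered by this field."""
        return msg[self.start:self.end + 1]


def leaves(node: FieldNode) -> List[FieldNode]:
    """Collect leaf fields in left-to-right order."""
    if not node.children:
        return [node]
    result = []
    for child in node.children:
        result.extend(leaves(child))
    return result


MSG = b"HTTP/1.1 200 OK\r\n\r\n"

# Step 1: tree structure (ranges taken from the slide).
tree = FieldNode("MSG", 0, 18, children=[
    FieldNode("Status Line", 0, 16, children=[
        FieldNode("Version", 0, 7),
        FieldNode("Delimiter", 8, 8),
        FieldNode("Status-Code", 9, 11),
        FieldNode("Delimiter", 12, 12),
        FieldNode("Reason", 13, 14),
        FieldNode("Delimiter", 15, 16),
    ]),
    FieldNode("Delimiter", 17, 18),
])

# Step 2: field attributes, here only for the delimiter fields.
for leaf in leaves(tree):
    if leaf.name == "Delimiter":
        leaf.boundary = "Fixed"
        leaf.semantics = "Delimiter"

for leaf in leaves(tree):
    print(leaf.name, (leaf.start, leaf.end), leaf.bytes_of(MSG))
```

Concatenating the leaf fields in order reproduces the 19-byte message, which is a quick sanity check on the ranges.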
Sent vs. Received
• Both protocol directions from single binary
• Different problems
– Taint information harder to leverage
– Focus on how message is constructed,
not processed
• Different techniques needed:
– Tree structure → Buffer Deconstruction
– Field attributes → New heuristics
Outline
• Introduction
• Problem
• Techniques
– Buffer Deconstruction
– Field Semantics Inference
– Handling encryption
• Evaluation
Buffer Deconstruction
• Intuition
– Programs keep fields in separate memory buffers
– Combine those buffers to construct sent message
• Output buffer
– Holds message when “send” function invoked
– Or holds unencrypted message before encryption
• Recursive process
– Decompose a buffer into buffers used to fill it
– Starts with output buffer
– Stops when there’s nothing to recurse
Buffer Deconstruction
• Message field tree = inverse of output buffer structure
• Output is structure of message field tree
– No field attributes, except range
• Example for HTTP/1.1 200 OK\r\n\r\n (buffer and size in bytes → field and range):
Output Buffer (19) → MSG [0:18]
– A(17) → Status Line [0:16]
    • C(8) → Version [0:7]
    • D(1) → Delimiter [8:8]
    • E(3) → Status Code [9:11]
    • F(1) → Delimiter [12:12]
    • G(2) → Reason [13:14]
    • H(2) → Delimiter [15:16]
– B(2) → Delimiter [17:18]
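The recursion can be sketched in a few lines. This is a simplified illustration, not Dispatcher's implementation: it assumes a hypothetical log of buffer-to-buffer copies (the copies dictionary) has already been recovered from an execution trace, and the offsets of buffers A to H inside their parents are assumptions derived from the sizes and ranges in the slide's example.

```python
from typing import Dict, List, Tuple

# Hypothetical copy log: destination buffer -> list of
# (offset inside destination, source buffer, length in bytes).
CopyLog = Dict[str, List[Tuple[int, str, int]]]


def deconstruct(buf: str, length: int, copies: CopyLog, base: int = 0) -> dict:
    """Recursively decompose a buffer into the buffers used to fill it.

    base is the offset of this buffer inside the output buffer, so every
    node's range is expressed in message offsets.
    """
    node = {"buffer": buf, "range": (base, base + length - 1), "children": []}
    # Recursion stops when no copies into this buffer were recorded.
    for offset, src, n in sorted(copies.get(buf, [])):
        node["children"].append(deconstruct(src, n, copies, base + offset))
    return node


def leaf_ranges(node: dict) -> List[Tuple[int, int]]:
    """Ranges of the leaf buffers, i.e. the leaf fields of the message."""
    if not node["children"]:
        return [node["range"]]
    return [r for child in node["children"] for r in leaf_ranges(child)]


# Copy log for the HTTP/1.1 200 OK example (offsets are assumed).
copies: CopyLog = {
    "Output": [(0, "A", 17), (17, "B", 2)],
    "A": [(0, "C", 8), (8, "D", 1), (9, "E", 3),
          (12, "F", 1), (13, "G", 2), (15, "H", 2)],
}

tree = deconstruct("Output", 19, copies)
print(tree["range"])
print(leaf_ranges(tree))
```

The result carries only ranges, matching the point that buffer deconstruction yields the tree structure without other field attributes.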
Field Attributes Inference
• Attributes capture extra information
– E.g., inter-field relationships
Attribute          Value
Field Range        [StartOffset : EndOffset]
Field Boundary     Fixed, Length, Delimiter
Field Semantics    IP address, Timestamp, …
Field Keywords     <list of keywords in field>
• Techniques identify
– Keywords
– Length fields
– Delimiters
– Variable-length fields
– Arrays
Field Semantics
• A field attribute in the message field tree
• Captures the type of data in the field
• Programs contain much semantic info → leverage it!
• Semantics in well-defined functions and instructions
– Prototype
• Similar to type inference
• Differs for received and sent messages
Field semantics identified:
Cookies             Keyboard input
Error codes         Keywords
File data           Length
File information    Padding
Filenames           Ports
Hash / Checksum     Registry data
Hostnames           Sleep timers
Host information    Stored data
IP addresses        Timestamps
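Field Semantic Inference
Received message:
GET /index.html HTTP/1.1
Function prototype (IN: path; OUT: buf and the return value):
int stat(const char *path, struct stat *buf);
struct stat {
    …
    off_t st_size; /* total size in bytes */
    …
};
Call made by the program:
stat("index.html", &file_info);
Sent message:
HTTP/1.1 200 OK
Content-Length: 25

<html>Hello world!</html>
Inferred semantics:
– index.html in the received message: File path (IN argument path)
– 25 in the sent Content-Length header: File length (OUT field st_size)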
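The idea of reading semantics off a known prototype can be illustrated with a toy model. Everything here is a hypothetical simplification: the PROTOTYPES table, the shape of the call record, and the names in_taint and out_written are assumptions for this sketch. A real system would obtain the byte ranges from taint tracking on an execution trace rather than from string searches.

```python
# Hypothetical prototype table: for each known function, the semantics of
# its IN arguments and OUT values.
PROTOTYPES = {
    "stat": {
        "in": {"path": "File path"},
        "out": {"st_size": "File length"},
    },
}

received = b"GET /index.html HTTP/1.1"
sent = (b"HTTP/1.1 200 OK\r\n"
        b"Content-Length: 25\r\n\r\n"
        b"<html>Hello world!</html>")

# Stand-in for taint information: which received bytes reached the IN
# argument, and where the OUT value was later written in the sent message.
path_start = received.index(b"index.html")
size_start = sent.index(b"25")
call = {
    "func": "stat",
    "in_taint": {"path": (path_start, path_start + len(b"index.html") - 1)},
    "out_written": {"st_size": (size_start, size_start + len(b"25") - 1)},
}


def infer_semantics(call: dict):
    """Map byte ranges to semantics using the callee's prototype."""
    proto = PROTOTYPES[call["func"]]
    recv_labels = {rng: proto["in"][arg]
                   for arg, rng in call["in_taint"].items()
                   if arg in proto["in"]}
    sent_labels = {rng: proto["out"][val]
                   for val, rng in call["out_written"].items()
                   if val in proto["out"]}
    return recv_labels, sent_labels


recv_labels, sent_labels = infer_semantics(call)
print(recv_labels)
print(sent_labels)
```

The sketch also shows why received and sent messages differ: for received messages the label comes from where message bytes are consumed, while for sent messages it comes from where a function's output ends up in the message.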
Detecting Encoding Functions
• Encoding functions = (de)compression,
(de)(en)cryption, (de)obfuscation…
• High ratio of arithmetic & bitwise instructions
• Use read/write set to identify buffers
• Work-in-progress on extracting and reusing
encoding functions
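The instruction-ratio heuristic might be sketched as follows. The mnemonic set, the minimum function length, and the ratio threshold are illustrative assumptions chosen for this example and are not values given in the slides; the input is assumed to be the list of x86 mnemonics executed by one function.

```python
# Assumed set of arithmetic and bitwise x86 mnemonics (not exhaustive).
ARITH_BITWISE = {
    "add", "sub", "mul", "imul", "div", "idiv", "inc", "dec", "neg",
    "and", "or", "xor", "not", "shl", "shr", "sar", "rol", "ror",
}


def encoding_ratio(mnemonics):
    """Fraction of instructions that are arithmetic or bitwise."""
    if not mnemonics:
        return 0.0
    hits = sum(1 for m in mnemonics if m.lower() in ARITH_BITWISE)
    return hits / len(mnemonics)


def looks_like_encoding(mnemonics, min_instructions=20, threshold=0.55):
    """Flag a function as a candidate encoding function.

    min_instructions and threshold are placeholder values; very short
    functions are skipped because their ratio is not meaningful.
    """
    return (len(mnemonics) >= min_instructions
            and encoding_ratio(mnemonics) > threshold)


# Two made-up traces: a cipher-like loop and a parsing routine.
cipher_like = ["mov", "xor", "rol", "add", "xor"] * 8
parser_like = ["mov", "cmp", "jne", "push", "call", "add"] * 8

print(encoding_ratio(cipher_like), looks_like_encoding(cipher_like))
print(encoding_ratio(parser_like), looks_like_encoding(parser_like))
```

Once a candidate function is flagged, its read and write sets would identify the buffers holding the encoded and decoded data, which this sketch does not model.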
MegaD C&C protocol
type MegaD_Message = record {
    msg_len : uint16;
    encrypted_payload : bytestring &length = 8*msg_len;
} &byteorder = bigendian;
type encrypted_payload = record {
    version : uint16;
    mtype : uint16;
    data : MegaD_data (mtype);
};
type MegaD_data (msg_type: uint16) =
    case msg_type of {
        0x00 -> m00 : msg_0;
        […]
        default -> unknown : bytestring &restofdata;
    };
• C&C on tcp/443 using proprietary encryption
• Use Dispatcher’s output to generate grammar
– 15 different messages seen (7 recv, 8 sent)
– 11 field semantics
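To make the outer record of the grammar concrete, here is a minimal parsing sketch. It is not a MegaD implementation: the function names are hypothetical, the proprietary decryption step is deliberately left out (the second function assumes the caller already holds a decrypted payload), and using big-endian byte order for the inner record is an assumption carried over from the outer one.

```python
import struct


def parse_megad_message(data: bytes):
    """Split one message into msg_len and the still-encrypted payload.

    Follows the outer record: a big-endian uint16 length followed by a
    payload of 8 * msg_len bytes.
    """
    if len(data) < 2:
        raise ValueError("message shorter than the length field")
    (msg_len,) = struct.unpack_from(">H", data, 0)
    payload_len = 8 * msg_len
    payload = data[2:2 + payload_len]
    if len(payload) != payload_len:
        raise ValueError("truncated payload")
    return msg_len, payload


def parse_decrypted_payload(plain: bytes):
    """Read version and mtype from an already decrypted payload.

    Big-endian order for these fields is an assumption of this sketch.
    """
    version, mtype = struct.unpack_from(">HH", plain, 0)
    return {"version": version, "mtype": mtype, "data": plain[4:]}


# Made-up bytes for illustration only; real traffic would be encrypted.
example = struct.pack(">H", 1) + bytes([0, 1, 0, 0, 1, 2, 3, 4])
msg_len, payload = parse_megad_message(example)
print(msg_len, len(payload))
print(parse_decrypted_payload(payload))
```

The length check reflects the grammar's `&length = 8*msg_len` constraint, meaning payloads always come in multiples of 8 bytes.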
MegaD Dialog
[Figure: C&C Server, SMTP Test Server]
MegaD Rewriting
[Figure: Template Server, C&C Server, SMTP Test Server]
Summary
• Buffer deconstruction, a technique to extract
the format of sent messages
• Field semantics inference techniques, for
messages sent and received
• Designed and developed Dispatcher
• Extended technique to handle encryption
• Rewrote MegaD dialog using information
extracted by Dispatcher


Editor's Notes

  • #13: Dynamic binary analysis
  • #14: This should probably be a reply
  • #18: Field = contiguous sequence of bytes in a message. Buffer = contiguous sequence of bytes in memory.
  • #21: stat is a well-known Unix function, part of the IEEE Std 1003.1, 2004 Edition standard.
  • #23: MegaD is a prevalent spam botnet that accounted for 35.4% of all spam on the Internet in a December 2008 study, and still accounts for 4-5 [Marshal86e]. Grammar available in Appendix A. Field semantics: IP addresses, ports, hostnames, length, sleep timers, error codes, keywords, cookies, stored data, padding, and host information.