Translation cache policies for dynamic binary translation
Saber Ferjani
Ecole Nationale des Sciences de l'Informatique
Responsible: Prof. Frédéric Pétrot
Supervisor: Luc Michel
TIMA Laboratory - SLS Group
Grenoble, France
DBT (dynamic binary translation) is a CPU simulation technique: it reads a short sequence of target code, translates it, and executes it on a different CPU (the host).
[Figure: a host machine CPU running a simulated target; target asm code passes through translation]
Translation cache: a buffer on the host machine that stores the translated blocks (TBs).
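The two definitions above can be illustrated with a minimal sketch of a translate-then-execute loop backed by a translation cache. This is not Qemu's code: the class `TranslationCache`, the function `run`, and the toy target program in `make_toy_translator` are hypothetical names, and a translated block is modelled as a plain Python callable that returns the next target address.

```python
from typing import Callable, Dict, Optional

# Assumption: a translated block (TB) is modelled as a host callable that
# executes the block and returns the next target address (None = stop).
TranslatedBlock = Callable[[], Optional[int]]


class TranslationCache:
    """Maps a target address to the host code translated for it."""

    def __init__(self) -> None:
        self.blocks: Dict[int, TranslatedBlock] = {}
        self.translations = 0  # number of translations actually performed

    def lookup_or_translate(self, pc, translate):
        tb = self.blocks.get(pc)
        if tb is None:  # miss: translate once, then reuse on later visits
            tb = translate(pc)
            self.blocks[pc] = tb
            self.translations += 1
        return tb


def run(entry_pc, translate, cache):
    """Execute blocks until one returns None; return the number executed."""
    pc, executed = entry_pc, 0
    while pc is not None:
        tb = cache.lookup_or_translate(pc, translate)
        pc = tb()
        executed += 1
    return executed


def make_toy_translator(loop_iterations):
    """Hypothetical target: block 0x100 loops, then jumps to 0x200 and exits."""
    state = {"i": 0}

    def translate(pc):
        if pc == 0x100:
            def block():
                state["i"] += 1
                return 0x100 if state["i"] < loop_iterations else 0x200
            return block
        return lambda: None  # the block at 0x200 ends the program

    return translate
```

With a loop of five iterations, the block at 0x100 is executed five times but translated only once, which is the saving the translation cache is meant to provide.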
Outline
1. Virtualization and simulation techniques
2. Qemu Internals
3. Typical cache algorithms
4. Cache algorithm proposal
5. Simulation results
6. Conclusion & Perspectives
1. Virtualization and simulation techniques
1.1. Just In Time Compiler
1.2. Hosted & Native Hypervisors
1.3. Virtualization tools
VirtualBox
Virtual PC
VMware
Xen
Bochs
Valgrind
Qemu
KVM
1.4. Simulation techniques
• Interpretive technique ► Extremely slow!
• Native simulation ► Needs source code!
• Binary translation:
  • Static ► Cannot handle indirect branches
  • Dynamic ► Quite fast & flexible
2. Qemu internals
2.1. Overview
• Generic, open-source machine emulator
• Created by Fabrice Bellard in 2003
• Supported targets: IA32, ARM, SPARC, MIPS, PPC…
2.2. Execution flow example
2.3. Main execution loop
2.4. Translation cache size
2.5. TB allocation
3. Typical cache algorithms
Optimal cache algorithm (offline)
Basic cache algorithms: Flush, Random, FIFO, LRU, LFU
Advanced cache algorithms: LRFU, 2Q, LIRS, ARC
Qemu constraints:
• TBs are not movable
• TB size is variable
• TB size is unpredictable
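The constraints listed above make the simplest policy, Flush, easy to implement: blocks of varying size are appended one after another and everything is discarded when the buffer is full. The following is a minimal sketch of that policy under those assumptions, not Qemu's actual allocator; the class name `FlushCache` and its fields are hypothetical.

```python
class FlushCache:
    """Fixed-size code buffer with bump allocation and a full flush when full."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.next_free = 0   # offset of the first unused byte
        self.blocks = {}     # target address -> (offset, size)
        self.flushes = 0

    def insert(self, pc: int, size: int) -> int:
        """Store a freshly translated block and return its offset."""
        if size > self.capacity:
            raise ValueError("block larger than the whole cache")
        if self.next_free + size > self.capacity:
            # Blocks cannot be moved or compacted, so drop all of them.
            self.blocks.clear()
            self.next_free = 0
            self.flushes += 1
        offset = self.next_free
        self.blocks[pc] = (offset, size)
        self.next_free += size
        return offset

    def lookup(self, pc: int):
        """Return (offset, size) for a cached block, or None on a miss."""
        return self.blocks.get(pc)
```

Because the size of a block is only known once it has been translated, the sketch takes `size` as an argument at insertion time. Evicting a single block, as LRU or LFU would, leaves a hole that the next block may not fit into, which is why the page-replacement style algorithms do not transfer directly.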
4. Cache algorithm proposal
4.1. Algorithm design
4.2. Data structure
• Constant insertion overhead
• Frequently referenced TBs are selected for re-translation into a separate cache area
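One way to read the two points above is a cache split into two bump-allocated areas, where a new translation goes to the separate area only if its address is already known to be frequently referenced. The sketch below follows that reading; it is an illustration under stated assumptions, not the author's implementation. The names `Area`, `TwoAreaCache`, `cold`, `hot` and `hot_addresses` are hypothetical.

```python
class Area:
    """Bump-allocated region: inserting is a pointer increment, hence O(1)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.used = 0
        self.blocks = {}   # target address -> offset inside the area
        self.flushes = 0

    def insert(self, pc: int, size: int) -> None:
        if size > self.capacity:
            raise ValueError("block larger than the area")
        if self.used + size > self.capacity:
            # No room left: flush the whole area and start again at offset 0.
            self.blocks.clear()
            self.used = 0
            self.flushes += 1
        self.blocks[pc] = self.used
        self.used += size


class TwoAreaCache:
    """Regular area plus a separate area reserved for frequently used TBs."""

    def __init__(self, cold_capacity: int, hot_capacity: int) -> None:
        self.cold = Area(cold_capacity)
        self.hot = Area(hot_capacity)
        # Assumption: addresses identified earlier as frequently referenced.
        self.hot_addresses = set()

    def insert(self, pc: int, size: int) -> str:
        """Place a new translation and return the name of the area used."""
        name = "hot" if pc in self.hot_addresses else "cold"
        getattr(self, name).insert(pc, size)
        return name
```

In this sketch, flushing the regular area leaves the blocks in the separate area untouched, so frequently referenced code does not have to be translated again after every flush.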
4.3. HST update
• Before a CSA flush, add to the HST the addresses of all TBs that were executed more than F_th times
• The HST is used as a circular buffer
• The HST size is fixed to half of the HSA size
[Figure: HST entries @HS1 to @HS5 arranged as a circular buffer]
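The update rule above can be sketched as follows. The slides do not spell out the acronyms or the entry format, so this sketch assumes that the HST stores target addresses and that per-TB execution counters are available in a dictionary; `HotSpotTable`, `update_hst_before_flush`, `exec_counts` and `f_th` are hypothetical names.

```python
class HotSpotTable:
    """Fixed-size circular buffer of addresses; the oldest entry is overwritten."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.slots = [None] * capacity
        self.head = 0  # index of the next slot to overwrite

    def add(self, pc: int) -> None:
        if pc in self.slots:  # linear scan: cost grows with the table size
            return
        self.slots[self.head] = pc
        self.head = (self.head + 1) % len(self.slots)

    def __contains__(self, pc: int) -> bool:
        return pc in self.slots


def update_hst_before_flush(hst, exec_counts, f_th):
    """Record every TB executed more than f_th times, then reset the counters."""
    for pc, count in exec_counts.items():
        if count > f_th:
            hst.add(pc)
    exec_counts.clear()  # the flushed TBs no longer exist in the cache
```

The linear scan in `add` and `__contains__` is deliberate: it makes the lookup cost proportional to the table size, which matches the behaviour of a plain circular buffer without an index.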
5. Simulation results
5.1. Qemu log
Qemu monitor: back-end configuration console interface
Log options:
• out_asm: show generated host code
• in_asm: show target assembly code
• exec: show trace before each executed TB
• …etc
Format of the log generated by "log exec":
Trace (Host Address) [(Target Address)]
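A trace in the format given above can be turned into per-TB execution counts with a few lines of parsing. The sketch below assumes hexadecimal addresses, with the host address prefixed by `0x` and the target address in square brackets; the exact spelling may differ between Qemu versions, and the lines in `SAMPLE_LOG` are made up for illustration.

```python
import re
from collections import Counter

# Assumed line shape: "Trace <host address> [<target address>]", both in hex.
TRACE_RE = re.compile(r"^Trace\s+(0x[0-9a-fA-F]+)\s*\[(?:0x)?([0-9a-fA-F]+)\]")

# Made-up example lines, not taken from a real Qemu run.
SAMPLE_LOG = [
    "Trace 0x40000000 [000ffff0]",
    "Trace 0x40000040 [000fe05b]",
    "Trace 0x40000000 [000ffff0]",
    "some unrelated log line",
]


def parse_trace(lines):
    """Yield (host_address, target_address) pairs from exec-log lines."""
    for line in lines:
        match = TRACE_RE.match(line.strip())
        if match:
            yield int(match.group(1), 16), int(match.group(2), 16)


def execution_counts(lines):
    """Count how many times each target address was executed."""
    return Counter(target for _host, target in parse_trace(lines))
```

Calling `execution_counts(SAMPLE_LOG)` counts the block at target address `0xffff0` twice and ignores the unrelated line. Counts of this kind are the input a trace-driven cache simulator needs.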
5.2. TB-trace: Translation cache simulator
5.3. Simulated cache algorithms
• A-LRU
• A-LFU
• A-2Q
[Figure: diagrams of the three variants; labels: LRU, LFU, CSA, HSA, HST]
5.4. Guest machines used with Qemu
• LZMA benchmark
• Linux kernel
• Windows XP start-up
5.5. Guest 1: LZMA benchmark over Debian
[Figure: number of CSA flushes for LRU, LFU and 2Q with quota = 0.25, 0.375 and 0.5; plotted values range from 50 to 89]
[Figure: hotspot hit rate for LRU, LFU and 2Q with quota = 0.25, 0.375 and 0.5; plotted values range from 18.5% to 91.3%]
5.6. Guest 2: Linux kernel 2.6.20
[Figure: number of CSA flushes for LRU, LFU and 2Q with quota = 0.25, 0.375 and 0.5; plotted values range from 15 to 23; two data points are annotated "+1 HSA flush"]
[Figure: hotspot hit rate for LRU, LFU and 2Q with quota = 0.25, 0.375 and 0.5; plotted values range from 24.1% to 65.2%]
5.7. Guest 3: Windows XP start-up
[Figure: number of CSA flushes for LRU, LFU and 2Q with quota = 0.25, 0.375 and 0.5; plotted values range from 15 to 24; three data points are annotated "+1 HSA flush"]
[Figure: hotspot hit rate for LRU, LFU and 2Q with quota = 0.25, 0.375 and 0.5; plotted values range from 16.0% to 64.7%]
6. Conclusion & Perspectives
6.1. Conclusion
• Qemu's translation cache is inefficient
• Cache algorithms based on page replacement cannot be used
Advantages of the proposed algorithm:
• Reduces unneeded re-translations
• TB insertion overhead is constant
Drawbacks:
• Invalidated TBs remain allocated
• Address lookup cost depends on HST size
6.2. Perspectives
• Use a hash function for the HST to accelerate TB lookup before each new translation
• Use an op-code buffer to accelerate re-translation of hot-spot TBs
• Estimate the size of the next translation and try to overwrite invalidated TBs
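The first perspective, hashing the HST, could look like the sketch below: the circular buffer still decides which entry is overwritten, while a hash-based set answers membership queries in constant time on average. This is only one possible design, written under the assumption that the HST stores addresses; `HashedHotSpotTable` and its fields are hypothetical names.

```python
class HashedHotSpotTable:
    """Circular buffer of addresses with a hash set for fast membership tests."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.slots = [None] * capacity
        self.head = 0          # index of the next slot to overwrite
        self.members = set()   # hash index over the addresses in self.slots

    def add(self, pc: int) -> None:
        if pc in self.members:
            return
        evicted = self.slots[self.head]
        if evicted is not None:
            self.members.discard(evicted)  # keep the index in sync
        self.slots[self.head] = pc
        self.members.add(pc)
        self.head = (self.head + 1) % len(self.slots)

    def __contains__(self, pc: int) -> bool:
        # Average O(1), independent of the table size.
        return pc in self.members
```

The trade-off is extra memory for the index and the bookkeeping needed on every overwrite, in exchange for a lookup cost that no longer grows with the table size.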
Questions?

